This will be my take on in memory data structures. I will be up front that a lot of it is really just my opinion. Regardless, all the code has been tested and should at least give you some starting points. The purpose of data structures is to take data and store it in memory in a fashion that makes it easy to access and manipulate the data at need.
Simple Data Structures Plain variable The simplest data structure is a simple variable. Here is an example:
Option Explicit
Dim strFoo
strFoo = “FOO”
WScript.Echo strFoo
Here we have stored data (the string FOO) in a simple variable and we can retrieve or modify this data at will.
Arrays
Typing a new variable every time you need to store a little piece of data can get old. Easier is to put all the data into a structure where it is simple to access each piece. Here is a simple example:
Option Explicit
Dim arrFoo
arrFoo = Array(“FOO”, “BAR”, “FOOBAR”)
WScript.Echo arrFoo(0)
I’m not going to go into the way data structures are stored and accessed by the script engine in memory, but if you take the time to look it up it will make things much more clear. One way to think of an array is a series of buckets. We created a single variable (arrFoo) but this time instead of putting a single value into the variable, we put a series of 3 buckets in it. Notice that VBScript (like most languages) starts the numbering of the buckets at 0 instead of one. This is the cause of most problems for people when they first start working with arrays. To access one of the buckets in the array, you just use the variable name and tell it the bucket that you want. So to get the contents of the second bucket, we would use arrFoo(1).
While this example saves us a little typing by not having to type new names for a bunch of variables, the real benefit of arrays (and pretty much the rest of the data structures) is the ability to build them at run time without having to know what the contents will be ahead of time. Let’s say for instance that you wanted the user to type in a sentence and you wanted to tell them the third word in the sentence (don’t ask me why you want to do this, but you do just trust me).
Option Explicit
Dim arrFoo
Dim strFoo
strFoo = InputBox("Please type a sentence")
arrFoo = Split(strFoo, " ")
WScript.Echo arrFoo(2)
First, notice that this works completely without any knowledge ahead of time of what the user is going to type in. This is just one example of creating an array without having any prior knowledge of the contents or more importantly the size (number of buckets) of the array. We will see a different method of building arrays in a moment. Second, it is important to talk about potential errors here. In this example, if the user had have typed just two words, you would have gotten the dreaded Subscript Out Of Range error. This is the evil side of building arrays without having concrete knowledge of what is going into them. You have to make some assumptions in your code. In this case we assumed that the sentence entered would have at least 3 words. Unfortunately assumptions have a habit of being wrong.
So, before we go any further, lets introduce you to your new best array friend. The UBound() function. This function will tell you the largest available subscript (index) for an array. We can use this information to put in a little user-proofing:
Option Explicit
Dim arrFoo
Dim strFoo
strFoo = InputBox("Please type a sentence")
arrFoo = Split(strFoo, " ")
If UBound(arrFoo) >= 2 Then
WScript.Echo arrFoo(2)
End If
Now we are making sure that there are at least 3 buckets before we try to get the contents of the third bucket.
So far we have created an array from known elements and we have used a function that returns an array. Both of these are fairly simple. Now we will look at a more common way to build arrays. We will look at this both so you can understand how to do it and so that you will understand what makes other data structures easier later. Arrays in VBScript suffer from having a static length. That is when the array is created it is created with a specific number of buckets. If you want to put more data into the array it can be done, but you need to explicitly add a bucket to the array. Let’s say that you wanted to put the name of every file in C:\temp into an array. Your choices would be to count how many files are there then create an array with that many buckets; or add buckets as you list the names. We will focus on the second method since if you understand that, you will understand the first method too.
Option Explicit
Dim oFSO
Dim arrFiles : arrFiles = Array()
Dim oFile
Dim strFileName
Set oFSO = CreateObject("Scripting.Filesystemobject")
For Each oFile in oFSO.GetFolder("C:\Temp").Files
ReDim Preserve arrFiles(UBound(arrFiles)+1)
arrFiles(UBound(arrFiles)) = oFile.Name
Next
For Each strFileName In arrFiles
WScript.Echo strFileName
Next
There is quite a bit of new stuff to talk about here. First, this:
Dim arrFiles : arrFiles = Array()
This is creating a variable named arrFiles then it is telling the script engine to assign an empty array to it. You can get a similar result from:
Dim arrFiles()
But not exactly the same thing. For instance if you use the second method then try to UBound(arrFiles), you will get an error. The way that I do it in the sample though, avoids this error. This is important for when we get ready to add buckets to the array. Speaking of which, the next thing new in the sample is:
ReDim Preserve arrFiles(UBound(arrFiles)+1)
This is the add a bucket step. The ReDim command changes the number of buckets in an array. The Preserve keyword tells ReDim to not empty the buckets as it adds the new one. If you leave that keyword off when you ReDim, you will get an array of the length that you asked for, but all the elements in the array will be empty. The last part of the line is simply saying “Look at the size of the array right now and make it one bigger than that”. It is often common to see code where the author uses an increment variable (n, I, j, k, etc.) to keep track of what size the array is so that they can just add one to make the array larger. I always found this silly since you can simply ask the array how big it is (UBound()) whenever you feel like it. Either way works just fine though. It is also important to know that the first time that you try to ReDim the array, using UBound() will only work if you first created the array the way that I did in the sample (Dim arrFiles : arrFiles = Array()). When you create the array with that method, then the array has a UBound() of -1 (zero buckets). So the first time that you ReDim it to UBound() +1, the new UBound is 0 (one bucket). If instead you create the array by:
Dim arrFiles()
Then the UBound() of the array is actually undefined.
Lastly I want to touch on array iteration. Here is how I iterated through each element of the array:
For Each strFileName In arrFiles
WScript.Echo strFileName
Next
In older versions of WSH you would have had to do it like this:
For n = 0 To UBound(arrFiles)
WScript.Echo arrFiles(n)
Next
Either method is still valid.
That is all on arrays for now, but we will revisit them shortly.
ArrayLists
You can think of ArrayLists as arrays on steroids. First, they are only available to you if .Net is installed on the machine. I can’t remember if a specific version needs to be installed or not. Here is how you would do the last example of listing the files in C:\temp using an arraylist:
Option Explicit
Dim oFSO
Dim alFiles
Dim oFile
Dim strFileName
Set oFSO = CreateObject("Scripting.Filesystemobject")
Set alFiles = CreateObject("System.Collections.ArrayList")
For Each oFile in oFSO.GetFolder("C:\Temp").Files
alFiles.Add(oFile.Name)
Next
For Each strFileName In alFiles
WScript.Echo strFileName
Next
Already our code is a little simpler since we don’t have to worry about adding buckets explicitly. The real benefit of arraylists comes from all the functionality that is built in. To find all the functionality in an arraylist, you should read the class documentation on MSDN (just google for System.Collections.ArrayList). I’ll hit some of the more handy features here.
Let’s say you wanted the arraylist to be sorted:
alFiles.Sort()
Reverse the order:
alFiles.Reverse()
Remove the 5th item in the arraylist:
alFiles.RemoveAt(4)
All in all, anytime that I am tempted to use an array I almost always reach for an arraylist instead.
Less Simple Data Structures
So far all the data structures have been pretty much un-relational. Often however it is desirable to maintain some relationship between different pieces of data.
Multi-Dimensional Arrays
Pretty impressive name. All it really means is that instead of one set of buckets, you have multiple sets. You use them pretty much the same way as single dimensional arrays, you just have to specify the index of each dimension in the array to access a specific piece of data. Let’s take out example that listed all the files in C:\temp and store not only the name but the last modified date/time:
Option Explicit
Dim arrFiles
Dim oFSO
Dim nFileCount
Dim oFile
Set oFSO = CreateObject("Scripting.FileSystemObject")
nFileCount = oFSO.GetFolder("C:\Temp").Files.Count
ReDim arrFiles(nFileCount-1,1)
nFileCount = 0
For Each oFile in oFSO.GetFolder("C:\temp").Files
arrFiles(nFileCount, 0) = oFile.Name
arrFiles(nFileCount, 1) = oFile.DateLastModified
nFileCount = nFileCount + 1
Next
For nFileCount = 0 To UBound(arrFiles, 1)
WScript.Echo arrFiles(nFileCount, 0) & " - " & arrFiles(nFileCount, 1)
Next
The important things to notice here are first the way that the array is created. In this case we had to know how many things we were putting in the array before we created it. This is because the ReDim statement only works on the last dimension of an array. Also notice that we had to use an actual For statement to iterate through the items in the array rather than a For Each statement like we did for the simple array. Now we will look at the cousin of the Multi-Dimensional Array, the Matched Arrays
Matched Arrays Matched arrays serve much the same function of muli-dimensional arrays, but they are subtly different in that they are multiple completely separate structures. Here is the same example as we did for multi-dimesional arrays done with matched arrays:
Option Explicit
Dim arrFileNames
Dim arrFileDates
Dim oFSO
Dim nFileCount
Dim oFile
Set oFSO = CreateObject("Scripting.FileSystemObject")
nFileCount = oFSO.GetFolder("C:\Temp").Files.Count
ReDim arrFileNames(nFileCount-1)
ReDim arrFileDates(nFileCount-1)
nFileCount = 0
For Each oFile in oFSO.GetFolder("C:\temp").Files
arrFileNames(nFileCount) = oFile.Name
arrFileDates(nFileCount) = oFile.DateLastModified
nFileCount = nFileCount + 1
Next
For nFileCount = 0 To UBound(arrFileNames)
WScript.Echo arrFileNames(nFileCount) & " - " & arrFileDates(nFileCount)
Next
You will notice that the code is very similar. All we are doing is storing the two pieces of data that we want for each file in two different arrays. This is a very venerable practice from long ago. That doesn’t make it bad, but it is easily abused.
Dictionaries
This is my personal go-to data structure. That may be because I cut my scripting teeth on Perl and a dictionary is essentially a hash. If you use PHP, all arrays in PHP are associative arrays (dictionaries/hashes). Here is our C:\Temp file information script done with a dictionary:
Option Explicit
Dim dicFiles
Dim oFSO
Dim strFileName
Dim oFile
Set oFSO = CreateObject("Scripting.FilesystemObject")
Set dicFiles = CreateObject("Scripting.Dictionary")
For Each oFile In oFSO.GetFolder("C:\temp").Files
dicFiles.Add oFile.Name, oFile.DateLastModified
Next
For Each strFileName in dicFiles.Keys
WScript.Echo strFileName & " - " & dicFiles(strFileName)
Next
Dictionaries are really just a list of Key-Value pairs. There are definitely some gotchas to watch out for and some situations where dictionaries are not the right answer. The first (and most frequent) gotcha is that all the keys in a dictionary must be unique. The values can be anything, it’s just the key that must be unique. The other nice thing that dictionaries allow is the ability to store and retrieve data using a key. This means that if I want to know the datelastmodified for test.vbs, I would just use this:
dicFiles(“test.vbs”)
To do that with an array I would need to loop through the array looking for the name and keep count of how many items I went through to get there then access another array to get the actual date.
There is a lot more to dictionaries, but for right now I am just trying to touch briefly on several different data structures. The dictionary is completely documented in the WSH documentation.
Disconnected ADO Recordsets ADO Recordsets are commonly used to access the data in a database. They can just as easily be built from scratch and store data that you put into them. Here is how to do our list the files in C:\temp example using a disconnected ADO recordset:
Option Explicit
Const adVARCHAR = 200
Const adOPENSTATIC = 3
Const adUSECLIENT = 3
Dim oFSO
Dim oRS
Dim oFile
Set oFSO = CreateObject("Scripting.FileSystemObject")
Set oRS = CreateObject("ADODB.RecordSet")
oRS.CursorType = adOPENSTATIC
oRs.CursorLocation = adUSECLIENT
oRS.Fields.Append "Name", adVARCHAR,255
oRS.Fields.Append "Date", adVARCHAR,255
oRS.Open
For Each oFile In oFSO.GetFolder("C:\temp").Files
oRS.AddNew Array("Name", "Date"), Array(oFile.Name, oFile.DateLastModified)
Next
oRs.MoveFirst
Do While Not oRS.EOF
WScript.Echo oRS("Name") & " - " & oRS("Date")
oRS.MoveNext
Loop
oRS.Close
I’m not going to try to explain all the intricacies of ADO recordsets here. It is a very powerful technology and there are a lot of good sites and books that would do a much better job at it than I would. Essentially you create a recordset, then add the fields that you want each record in the set to hold, then you addrecords populating the fields that you specified. With a recordset there are all kinds of neat things that you can do. I suggest that you go learn them. I generally do not use recordsets just because the set up is more effort than I want to put into a data structure.
Holding XML Data in Memory
XML provides a very structured way to hold data in memory. To use the bucket analogy, it would be as many named buckets as you want, each capable of hold as many more named buckets as you want and each bucket having data potentially written on the side of it instead of put in it. Yes the bucket analogy does fall apart here. Oh well it was good while it lasted. Here is the code to do our little file example:
Option Explicit
Dim oFSO
Dim arrFiles : arrFiles = Array()
Dim oFile
Dim strFileName
Dim oXML
Dim oRoot
Dim oFileNode
Dim oNameNode
Dim oSizeNode
Set oXML = CreateObject("Microsoft.XMLDOM")
Set oRoot = oXML.CreateNode("element", "Files", "")
oXML.AppendChild(oRoot)
Set oFSO = CreateObject("Scripting.Filesystemobject")
For Each oFile in oFSO.GetFolder("C:\Temp").Files
Set oFileNode = oXML.CreateNode("element", "File", "")
Set oNameNode = oXML.CreateNode("element", "Name", "")
Set oSizeNode = oXML.CreateNode("element", "Size", "")
oNameNode.Text = oFile.Name
oSizeNode.Text = oFile.Size
oFileNode.AppendChild(oNameNode)
oFileNode.AppendChild(oSizeNode)
oRoot.AppendChild(oFileNode)
Next
For Each oFileNode In oRoot.SelectNodes("File")
WScript.Echo oFileNode.SelectSingleNode("Name").Text & " - " & oFileNode.SelectSingleNode("Size").Text
'WScript.Echo oFileNode.Text
Next
One of the great powers of XMl is that it comes with it's own query language (XPath) which can be quite powerful. Let's say that you wanted to know the names of any files that had 'WMI' anywhere in their name, but only if the file size was over 200 bytes:
'Show the name of any file that has WMI in the name and the size is greater the 2000 bytes.
oXML.setProperty "SelectionLanguage", "XPath"
For Each oFileNode in oRoot.SelectNodes("File[contains(Name, 'WMI')][Size > 2000]")
WScript.Echo oFileNode.SelectSingleNode("Name").Text
Next
Another powerful advantage of XML is that if you want to have data persistence, it is trivial to save your XML out to a file then load it to work with again later.
Complex Data Structures The way to really complex data structures is to combine as many of the above structures as you need to hold the data in a fashion that makes sense to you. Again, having my roots in Perl makes me prone to creating data structures that are sometimes more complex than they need to be. For instance, let’s say for our file listing example, we wanted the Name, LastModifiedDate, and the Size of each file. One way to do that would be a Dictionary of Dictionaries:
Option Explicit
Dim dicFiles
Dim oFSO
Dim strFileName
Dim oFile
Dim dicTemp
Set oFSO = CreateObject("Scripting.FilesystemObject")
Set dicFiles = CreateObject("Scripting.Dictionary")
For Each oFile In oFSO.GetFolder("C:\temp").Files
Set dicTemp = CreateObject("Scripting.Dictionary")
dicTemp.Add "LastModified", oFile.DateLastModified
dicTemp.Add "Size", oFile.Size
dicFiles.Add oFile.Name, dicTemp
Next
For Each strFileName in dicFiles.Keys
WScript.Echo strFileName & " - " & dicFiles(strFileName)("LastModified") & " - " & dicFiles(strFileName)("Size")
Next
In this example, we created a temporary dictionary with the LastModified date and Size then we added this temp dictionary to the main dictionary with the filename as the key.
It should be noted that for all of the data structures that we have seen so far, pretty much anything can be used for the values. This brings us to how I would really list the files in C:\temp if I had to write that code. The way that I would most likely do it is with a dictionary of file objects:
Option Explicit
Dim dicFiles
Dim oFSO
Dim strFileName
Dim oFile
Set oFSO = CreateObject("Scripting.FilesystemObject")
Set dicFiles = CreateObject("Scripting.Dictionary")
For Each oFile In oFSO.GetFolder("C:\temp").Files
dicFiles.Add oFile.Name, oFile
Next
For Each strFileName in dicFiles.Keys
WScript.Echo strFileName & " - " & dicFiles(strFileName).DateLastModified & " - " & dicFiles(strFileName).Size
Next
Adding the entire file object as the value portion of the dictionary allows me to access any of the properties (or methods) of the file object later on in the code at ease.
Synapses
So to wrap things up, here is the bucket analogy applied to each of the structures that I have discussed:
Plain Variable – This is one bucket that holds one thing.
Array – A collection of numbered buckets from 0 to x.
ArrayList – A collection of numbered buckets from 0 to x where the collection can do fancy things like sort by the contents of the buckets.
Multi-Dimensional Arrays – This is like a collection of numbered buckets where each one has inside of it another collection of numbered buckets.
Matched Arrays – This is two or more collections of buckets where in each collection the nth items are all related in some fashion.
Dictionary – This is a collection of buckets where the buckets all have unique names and can be accessed via these names.
Disconnected ADO Recordsets – This is a collection of shelves. On each shelf are one or more named buckets.
I know that there are probably better ways to explain each of these structures and I will be happy to try to explain any questions that this post brought up.
EDIT: I just realized that I completely left out holding XML data in memory. This is a valuable tool since it comes with a built in query language. When I get a chance I'll add it. Not sue how I'll express the bucket analogy though. :) EDIT: Added the XML section.
<message edited by ebgreen on Sunday, February 14, 2010 6:08 AM>