In Memory Data Structures

Author Message
ebgreen

  • Total Posts : 8227
  • Scores: 98
  • Reward points : 0
  • Joined: 7/12/2005
  • Status: offline
In Memory Data Structures Friday, February 05, 2010 7:42 AM (permalink)
2
This will be my take on in memory data structures. I will be up front that a lot of it is really just my opinion. Regardless, all the code has been tested and should at least give you some starting points. The purpose of data structures is to take data and store it in memory in a fashion that makes it easy to access and manipulate the data at need.
 
Simple Data Structures
 
Plain variable
The simplest data structure is a simple variable. Here is an example:
 
Option Explicit
Dim strFoo
strFoo = “FOO”
WScript.Echo strFoo
 
Here we have stored data (the string FOO) in a simple variable and we can retrieve or modify this data at will.
 
Arrays
Typing a new variable every time you need to store a little piece of data can get old. Easier is to put all the data into a structure where it is simple to access each piece. Here is a simple example:
 
Option Explicit
Dim arrFoo
arrFoo = Array(“FOO”, “BAR”, “FOOBAR”)
WScript.Echo arrFoo(0)
 
I’m not going to go into the way data structures are stored and accessed by the script engine in memory, but if you take the time to look it up it will make things much more clear. One way to think of an array is a series of buckets. We created a single variable (arrFoo) but this time instead of putting a single value into the variable, we put a series of 3 buckets in it. Notice that VBScript (like most languages) starts the numbering of the buckets at 0 instead of one. This is the cause of most problems for people when they first start working with arrays. To access one of the buckets in the array, you just use the variable name and tell it the bucket that you want. So to get the contents of the second bucket, we would use arrFoo(1).
While this example saves us a little typing by not having to type new names for a bunch of variables, the real benefit of arrays (and pretty much the rest of the data structures) is the ability to build them at run time without having to know what the contents will be ahead of time. Let’s say for instance that you wanted the user to type in a sentence and you wanted to tell them the third word in the sentence (don’t ask me why you want to do this, but you do just trust me).
 
Option Explicit
Dim arrFoo
Dim strFoo
strFoo = InputBox("Please type a sentence")
arrFoo = Split(strFoo, " ")
WScript.Echo arrFoo(2)
 
First, notice that this works completely without any knowledge ahead of time of what the user is going to type in. This is just one example of creating an array without having any prior knowledge of the contents or more importantly the size (number of buckets) of the array. We will see a different method of building arrays in a moment. Second, it is important to talk about potential errors here. In this example, if the user had have typed just two words, you would have gotten the dreaded Subscript Out Of Range error. This is the evil side of building arrays without having concrete knowledge of what is going into them. You have to make some assumptions in your code. In this case we assumed that the sentence entered would have at least 3 words. Unfortunately assumptions have a habit of being wrong.
So, before we go any further, lets introduce you to your new best array friend. The UBound() function. This function will tell you the largest available subscript (index) for an array. We can use this information to put in a little user-proofing:
 
Option Explicit
Dim arrFoo
Dim strFoo
strFoo = InputBox("Please type a sentence")
arrFoo = Split(strFoo, " ")
If UBound(arrFoo) >= 2 Then
 WScript.Echo arrFoo(2)
End If
 
Now we are making sure that there are at least 3 buckets before we try to get the contents of the third bucket.
So far we have created an array from known elements and we have used a function that returns an array. Both of these are fairly simple. Now we will look at a more common way to build arrays. We will look at this both so you can understand how to do it and so that you will understand what makes other data structures easier later. Arrays in VBScript suffer from having a static length. That is when  the array is created it is created with a specific number of buckets. If you want to put more data into the array it can be done, but you need to explicitly add a bucket to the array. Let’s say that you wanted to put the name of every file in C:\temp into an array. Your choices would be to count how many files are there then create an array with that many buckets; or add buckets as you list the names. We will focus on the second method since if you understand that, you will understand the first method too.
 
Option Explicit
Dim oFSO
Dim arrFiles : arrFiles = Array()
Dim oFile
Dim strFileName
Set oFSO = CreateObject("Scripting.Filesystemobject")
For Each oFile in oFSO.GetFolder("C:\Temp").Files
 ReDim Preserve arrFiles(UBound(arrFiles)+1)
 arrFiles(UBound(arrFiles)) = oFile.Name
Next
For Each strFileName In arrFiles
 WScript.Echo strFileName
Next
 
There is quite a bit of new stuff to talk about here. First, this:
 
Dim arrFiles : arrFiles = Array()
 
This is creating a variable named arrFiles then it is telling the script engine to assign an empty array to it. You can get a similar result from:
 
Dim arrFiles()
 
But not exactly the same thing. For instance if you use the second method then try to UBound(arrFiles), you will get an error. The way that I do it in the sample though, avoids this error. This is important for when we get ready to add buckets to the array. Speaking of which, the next thing new in the sample is:
ReDim Preserve arrFiles(UBound(arrFiles)+1)
This is the add a bucket step. The ReDim command changes the number of buckets in an array. The Preserve keyword tells ReDim to not empty the buckets as it adds the new one. If you leave that keyword off when you ReDim, you will get an array of the length that you asked for, but all the elements in the array will be empty. The last part of the line is simply saying “Look at the size of the array right now and make it one bigger than that”. It is often common to see code where the author uses an increment variable (n, I, j, k, etc.) to keep track of what size the array is so that they can just add one to make the array larger. I always found this silly since you can simply ask the array how big it is (UBound()) whenever you feel like it. Either way works just fine though. It is also important to know that the first time that you try to ReDim the array, using UBound() will only work if you first created the array the way that I did in the sample (Dim arrFiles : arrFiles = Array()). When you create the array with that method, then the array has a UBound() of -1 (zero buckets). So the first time that you ReDim it to UBound() +1, the new UBound is 0 (one bucket). If instead you create the array by:
 
Dim arrFiles()
 
Then the UBound() of the array is actually undefined.
Lastly I want to touch on array iteration. Here is how I iterated through each element of the array:
 
For Each strFileName In arrFiles
 WScript.Echo strFileName
Next
 
In older versions of WSH you would have had to do it like this:
 
For n = 0 To UBound(arrFiles)
 WScript.Echo arrFiles(n)
Next
 
Either method is still valid.
That is all on arrays for now, but we will revisit them shortly.
 
ArrayLists
You can think of ArrayLists as arrays on steroids. First, they are only available to you if .Net is installed on the machine. I can’t remember if a specific version needs to be installed or not. Here is how you would do the last example of listing the files in C:\temp using an arraylist:
 
Option Explicit
Dim oFSO
Dim alFiles
Dim oFile
Dim strFileName
Set oFSO = CreateObject("Scripting.Filesystemobject")
Set alFiles = CreateObject("System.Collections.ArrayList")
For Each oFile in oFSO.GetFolder("C:\Temp").Files
 alFiles.Add(oFile.Name)
Next
For Each strFileName In alFiles
 WScript.Echo strFileName
Next
 
Already our code is a little simpler since we don’t have to worry about adding buckets explicitly. The real benefit of arraylists comes from all the functionality that is built in. To find all the functionality in an arraylist, you should read the class documentation on MSDN (just google for System.Collections.ArrayList). I’ll hit some of the more handy features here.
Let’s say you wanted the arraylist to be sorted:
 
alFiles.Sort()
 
Reverse the order:
 
alFiles.Reverse()
 
Remove the 5th item in the arraylist:
 
alFiles.RemoveAt(4)
 
All in all, anytime that I am tempted to use an array I almost always reach for an arraylist instead.
 
Less Simple Data Structures
So far all the data structures have been pretty much un-relational. Often however it is desirable to maintain some relationship between different pieces of data.
 
Multi-Dimensional Arrays
Pretty impressive name. All it really means is that instead of one set of buckets, you have multiple sets. You use them pretty much the same way as single dimensional arrays, you just have to specify the index of each dimension in the array to access a specific piece of data. Let’s take out example that listed all the files in C:\temp and store not only the name but the last modified date/time:
 
Option Explicit
Dim arrFiles
Dim oFSO
Dim nFileCount
Dim oFile
Set oFSO = CreateObject("Scripting.FileSystemObject")
nFileCount = oFSO.GetFolder("C:\Temp").Files.Count
ReDim arrFiles(nFileCount-1,1)
nFileCount = 0
For Each oFile in oFSO.GetFolder("C:\temp").Files
 arrFiles(nFileCount, 0) = oFile.Name
 arrFiles(nFileCount, 1) = oFile.DateLastModified
 nFileCount = nFileCount + 1
Next
For nFileCount = 0 To UBound(arrFiles, 1)
 WScript.Echo arrFiles(nFileCount, 0) & " - " & arrFiles(nFileCount, 1)
Next
 
The important things to notice here are first the way that the array is created. In this case we had to know how many things we were putting in the array before we created it. This is because the ReDim statement only works on the last dimension of an array. Also notice that we had to use an actual For statement to iterate through the items in the array rather than a For Each statement like we did for the simple array. Now we will look at the cousin of the Multi-Dimensional Array, the Matched Arrays
 
Matched Arrays
Matched arrays serve much the same function of muli-dimensional arrays, but they are subtly different in that they are multiple completely separate structures. Here is the same example as we did for multi-dimesional arrays done with matched arrays:
 
Option Explicit
Dim arrFileNames
Dim arrFileDates
Dim oFSO
Dim nFileCount
Dim oFile
Set oFSO = CreateObject("Scripting.FileSystemObject")
nFileCount = oFSO.GetFolder("C:\Temp").Files.Count
ReDim arrFileNames(nFileCount-1)
ReDim arrFileDates(nFileCount-1)
nFileCount = 0
For Each oFile in oFSO.GetFolder("C:\temp").Files
 arrFileNames(nFileCount) = oFile.Name
 arrFileDates(nFileCount) = oFile.DateLastModified
 nFileCount = nFileCount + 1
Next
For nFileCount = 0 To UBound(arrFileNames)
 WScript.Echo arrFileNames(nFileCount) & " - " & arrFileDates(nFileCount)
Next
 
You will notice that the code is very similar. All we are doing is storing the two pieces of data that we want for each file in two different arrays. This is a very venerable practice from long ago. That doesn’t make it bad, but it is easily abused.
 
Dictionaries
This is my personal go-to data structure. That may be because I cut my scripting teeth on Perl and a dictionary is essentially a hash. If you use PHP, all arrays in PHP are associative arrays (dictionaries/hashes). Here is our C:\Temp file information script done with a dictionary:
 
Option Explicit
Dim dicFiles
Dim oFSO
Dim strFileName
Dim oFile
Set oFSO = CreateObject("Scripting.FilesystemObject")
Set dicFiles = CreateObject("Scripting.Dictionary")
For Each oFile In oFSO.GetFolder("C:\temp").Files
 dicFiles.Add oFile.Name, oFile.DateLastModified
Next
For Each strFileName in dicFiles.Keys
 WScript.Echo strFileName & " - " & dicFiles(strFileName)
Next
 
Dictionaries are really just a list of Key-Value pairs. There are definitely some gotchas to watch out for and some situations where dictionaries are not the right answer. The first (and most frequent) gotcha is that all the keys in a dictionary must be unique. The values can be anything, it’s just the key that must be unique. The other nice thing that dictionaries allow is the ability to store and retrieve data using a key. This means that if I want to know the datelastmodified for test.vbs, I would just use this:
dicFiles(“test.vbs”)
To do that with an array I would need to loop through the array looking for the name and keep count of how many items I went through to get there then access another array to get the actual date.
There is a lot more to dictionaries, but for right now I am just trying to touch briefly on several different data structures. The dictionary is completely documented in the WSH documentation.
 
Disconnected ADO Recordsets
ADO Recordsets are commonly used to access the data in a database. They can just as easily be built from scratch and store data that you put into them. Here is how to do our list the files in C:\temp example using a disconnected ADO recordset:
 
Option Explicit
Const adVARCHAR = 200
Const adOPENSTATIC = 3
Const adUSECLIENT = 3
Dim oFSO
Dim oRS
Dim oFile
Set oFSO = CreateObject("Scripting.FileSystemObject")
Set oRS = CreateObject("ADODB.RecordSet")
oRS.CursorType = adOPENSTATIC
oRs.CursorLocation = adUSECLIENT
oRS.Fields.Append "Name", adVARCHAR,255
oRS.Fields.Append "Date", adVARCHAR,255
oRS.Open
For Each oFile In oFSO.GetFolder("C:\temp").Files
 oRS.AddNew Array("Name", "Date"), Array(oFile.Name, oFile.DateLastModified)
Next
oRs.MoveFirst
Do While Not oRS.EOF
 WScript.Echo oRS("Name") & " - " & oRS("Date")
 oRS.MoveNext
Loop
oRS.Close
 
I’m not going to try to explain all the intricacies of ADO recordsets here. It is a very powerful technology and there are a lot of good sites and books that would do a much better job at it than I would. Essentially you create a recordset, then add the fields that you want each record in the set to hold, then you addrecords populating the fields that you specified. With a recordset there are all kinds of neat things that you can do. I suggest that you go learn them. I generally do not use recordsets just because the set up is more effort than I want to put into a data structure.

Holding XML Data in Memory
XML provides a very structured way to hold data in memory. To use the bucket analogy, it would be as many named buckets as you want, each capable of hold as many more named buckets as you want and each bucket having data potentially written on the side of it instead of put in it. Yes the bucket analogy does fall apart here. Oh well it was good while it lasted. Here is the code to do our little file example:

Option Explicit
Dim oFSO
Dim arrFiles : arrFiles = Array()
Dim oFile
Dim strFileName
Dim oXML
Dim oRoot
Dim oFileNode
Dim oNameNode
Dim oSizeNode

Set oXML = CreateObject("Microsoft.XMLDOM")
Set oRoot = oXML.CreateNode("element", "Files", "")
oXML.AppendChild(oRoot)


Set oFSO = CreateObject("Scripting.Filesystemobject")
For Each oFile in oFSO.GetFolder("C:\Temp").Files
    Set oFileNode = oXML.CreateNode("element", "File", "")
    Set oNameNode = oXML.CreateNode("element", "Name", "")
    Set oSizeNode = oXML.CreateNode("element", "Size", "")
    oNameNode.Text = oFile.Name
    oSizeNode.Text = oFile.Size
    oFileNode.AppendChild(oNameNode)
    oFileNode.AppendChild(oSizeNode)
    oRoot.AppendChild(oFileNode)
Next

For Each oFileNode In oRoot.SelectNodes("File")
    WScript.Echo oFileNode.SelectSingleNode("Name").Text & " - " & oFileNode.SelectSingleNode("Size").Text
    'WScript.Echo oFileNode.Text
Next

One of the great powers of XMl is that it comes with it's own query language (XPath) which can be quite powerful. Let's say that you wanted to know the names of any files that had 'WMI' anywhere in their name, but only if the file size was over 200 bytes:

'Show the name of any file that has WMI in the name and the size is greater the 2000 bytes.
oXML.setProperty "SelectionLanguage", "XPath"
For Each oFileNode in oRoot.SelectNodes("File[contains(Name, 'WMI')][Size > 2000]")
    WScript.Echo oFileNode.SelectSingleNode("Name").Text
Next

Another powerful advantage of XML is that if you want to have data persistence, it is trivial to save your XML out to a file then load it to work with again later.

 
Complex Data Structures
The way to really complex data structures is to combine as many of the above structures as you need to hold the data in a fashion that makes sense to you. Again, having my roots in Perl makes me prone to creating data structures that are sometimes more complex than they need to be. For instance, let’s say for our file listing example, we wanted the Name, LastModifiedDate, and the Size of each file. One way to do that would be a Dictionary of Dictionaries:
Option Explicit
Dim dicFiles
Dim oFSO
Dim strFileName
Dim oFile
Dim dicTemp
Set oFSO = CreateObject("Scripting.FilesystemObject")
Set dicFiles = CreateObject("Scripting.Dictionary")
For Each oFile In oFSO.GetFolder("C:\temp").Files
 Set dicTemp = CreateObject("Scripting.Dictionary")
 dicTemp.Add "LastModified", oFile.DateLastModified
 dicTemp.Add "Size", oFile.Size
 dicFiles.Add oFile.Name, dicTemp
Next
For Each strFileName in dicFiles.Keys
 WScript.Echo strFileName & " - " & dicFiles(strFileName)("LastModified") & " - " & dicFiles(strFileName)("Size")
Next
 
In this example, we created a temporary dictionary with the LastModified date and Size then we added this temp dictionary to the main dictionary with the filename as the key.
It should be noted that for all of the data structures that we have seen so far, pretty much anything can be used for the values. This brings us to how I would really list the files in C:\temp if I had to write that code. The way that I would most likely do it is with a dictionary of file objects:
 
Option Explicit
Dim dicFiles
Dim oFSO
Dim strFileName
Dim oFile
Set oFSO = CreateObject("Scripting.FilesystemObject")
Set dicFiles = CreateObject("Scripting.Dictionary")
For Each oFile In oFSO.GetFolder("C:\temp").Files
 dicFiles.Add oFile.Name, oFile
Next
For Each strFileName in dicFiles.Keys
 WScript.Echo strFileName & " - " & dicFiles(strFileName).DateLastModified & " - " & dicFiles(strFileName).Size
Next
 
Adding the entire file object as the value portion of the dictionary allows me to access any of the properties (or methods) of the file object later on in the code at ease.
 
Synapses
So to wrap things up, here is the bucket analogy applied to each of the structures that I have discussed:
 
Plain Variable – This is one bucket that holds one thing.
Array – A collection of numbered buckets from 0 to x.
ArrayList – A collection of numbered buckets from 0 to x where the collection can do fancy things like sort by the contents of the buckets.
Multi-Dimensional Arrays – This is like a collection of numbered buckets where each one has inside of it another collection of numbered buckets.
Matched Arrays – This is two or more collections of buckets where in each collection the nth items are all related in some fashion.
Dictionary – This is a collection of buckets where the buckets all have unique names and can be accessed via these names.
Disconnected ADO Recordsets – This is a collection of shelves. On each shelf are one or more named buckets.
I know that there are probably better ways to explain each of these structures and I will be happy to try to explain any questions that this post brought up.


EDIT: I just realized that I completely left out holding XML data in memory. This is a valuable tool since it comes with a built in query language. When I get a chance I'll add it. Not sue how I'll express the bucket analogy though. :)

EDIT: Added the XML section.
<message edited by ebgreen on Sunday, February 14, 2010 6:08 AM>
"... when you are good and crazy, oooh, oooh, oooh, the sky is the limit!" - The Tick
Goog places to start:http://www.visualbasicscript.com/m_24727/tm.htm
http://www.visualbasicscript.com/m_47117/tm.htm
 
#1

    Online Bookmarks Sharing: Share/Bookmark

    Jump to:

    Current active users

    There are 0 members and 1 guests.

    Icon Legend and Permission

    • New Messages
    • No New Messages
    • Hot Topic w/ New Messages
    • Hot Topic w/o New Messages
    • Locked w/ New Messages
    • Locked w/o New Messages
    • Read Message
    • Post New Thread
    • Reply to message
    • Post New Poll
    • Submit Vote
    • Post reward post
    • Delete my own posts
    • Delete my own threads
    • Rate post

    2000-2012 ASPPlayground.NET Forum Version 3.9