Fastest way to get a sorted directory to an array

AMBience

Friday, December 09, 2011 12:54 PM
This sub uses the good old DOS DIR command to populate a string array with folder and file names. That may sound a bit crude, but it's much, much faster than the FileSystemObject (FSO) and has the added benefit of DIR's powerful multi-key sorting, attribute filters and filename wildcards (as well as a few options of my own).

As you may know, the FSO Files collection uses the same kind of order that DOS does by default (later-created files and folders always come last), so you have to sort the files yourself. Using the FSO to populate an array from, say, "C:\Windows\System32" (a very good test folder with 2000+ files in it) and then sorting it is very slow in VBS: about 8 seconds on my PC. This sub takes 0.4 seconds to do exactly the same thing. It works very simply, like this:

Get your temp folder,
Create a BAT file there, this DIRs your requested folder with > redirection to a text file,
Run the BAT hidden and wait for it to finish,
Load the large text file it created with the FSO and grab filenames to an array,
Delete temp files.

If you comment out the delete temp files at the end and look at \TEMP\DirToArray.txt you'll see there's info that I've skipped grabbing (all the volume stuff, file sizes, dates & times) so the sub is open for expansion.
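
As a rough sketch of that kind of expansion, the helper below splits one line of the DIR output into its date/time text, size and name. ParseDirLine is a hypothetical name, and the column positions are an assumption carried over from the sub (24-hour clock, /-C switch, name starting at character 37); other regional settings would need different offsets.

```vbscript
Option Explicit

' Hypothetical helper, not part of the original sub.
' Returns Array(dateTimeText, size, name) for a file or folder line,
' or Empty for header, summary and blank lines.
' Assumed layout: date and time in columns 1-17, size or <DIR> in
' columns 18-35, name from column 37 onwards.
Function ParseDirLine(strLine)
    Dim strSize, dblSize
    ParseDirLine = Empty
    If Len(strLine) < 37 Then Exit Function
    If Not IsNumeric(Left(strLine, 2)) Then Exit Function

    strSize = Trim(Mid(strLine, 18, 18))
    If IsNumeric(strSize) Then
        dblSize = CDbl(strSize)   ' file size in bytes
    Else
        dblSize = -1              ' <DIR> and similar markers: no size
    End If
    ParseDirLine = Array(Left(strLine, 17), dblSize, Mid(strLine, 37))
End Function
```

Inside the read loop you could call ParseDirLine(strLineRead), check the result with IsArray(), and store the three parts instead of just the name.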

Have fun!

 Option Explicit

'------------------ Demo ------------------

Dim strDemo()

DirToArray "C:\", strDemo, True, 3, "*.*", "GEN", ""  
 
msgbox strDemo(0), 64, "First item"
msgbox strDemo(UBound(strDemo)), 64, "Last item"


'--------------- Dir To Array ---------------

'*****************
'By Alan Bond 2011
'*****************

'strDir = The folder you want to read
'strArray = The string array you want to populate
'blnFullPath = Add the full path to each file & folder (boolean)
'intFilesFolders = Filter types, 1=just files, 2=just folders, 3=both files and folders
'strWildcard = Wildcard for file types. *.* is default
'strSort = Sort order, default is "GEN" [G]roup folders 1st, sort by [E]xtension then by [N]ame
'strAttrib = Filter by attributes, default is show all types (empty "")

'Use minus before a letter to omit attribute filters or reverse the sort order:-
'strSort = "-N" (show reverse sorted names)
'strAttrib = "H" (show only hidden files\folders)
'strAttrib = "-H" (show everything except hidden files\folders)
'strAttrib = "-H-S" (show everything except hidden AND system files\folders, the Windows default)

'For full descriptions of sort order (/O) & attributes (/A) type DIR /? at a command prompt

Sub DirToArray(strDir, strArray, blnFullPath, intFilesFolders, strWildcard, strSort, strAttrib)
    Dim objWSS, objFSO
    Dim strTempDir, objFile, strLineRead, strName, intItem, blnSkipNext

    '---- FileSystem to create a BAT (and read its output), and Shell to run the BAT
    Set objFSO = CreateObject("Scripting.FileSystemObject")
    Set objWSS = CreateObject("WScript.Shell")

    '---- Path doesn't exist (silent error)
    If Not objFSO.FolderExists(strDir) Then
        ReDim strArray(0) : strArray(0) = "*PATH-ERROR*"
        Exit Sub
    End If

    '---- Add slash to requested dir if not found
    If Right(strDir, 1) <> "\" Then strDir = strDir & "\"

    '---- Get your temp folder
    strTempDir = objFSO.GetSpecialFolder(2) & "\"

    '---- Make a BAT file in temp that has DIR and redirection to a text file
    Set objFile = objFSO.CreateTextFile(strTempDir & "DirToArray.BAT", True)
    objFile.Write "DIR " & Chr(34) & strDir & strWildcard & Chr(34) & " /O" & strSort & _
        " /A" & strAttrib & " /-C > " & Chr(34) & strTempDir & "DirToArray.txt" & Chr(34)
    objFile.Close

    '---- Run the BAT hidden (window style 0) and wait until it's finished
    'The path is quoted in case the temp folder contains spaces
    objWSS.Run Chr(34) & strTempDir & "DirToArray.BAT" & Chr(34), 0, True

    '---- Open the DOS output for reading
    intItem = 0
    Set objFile = objFSO.OpenTextFile(strTempDir & "DirToArray.TXT", 1)
    While Not objFile.AtEndOfStream
        strLineRead = objFile.ReadLine

        '---- 1st two digits of the date on the left is used to determine if it's a file\folder
        If IsNumeric(Left(strLineRead, 2)) Then

            '---- Folders & file names are char 37 to the end of the line
            '(the column positions used here assume a 24-hour time format in the DIR output)
            strName = Mid(strLineRead, 37)

            '---- Ignore "." and ".." (which unfortunately have dates) and process filenames
            If strName <> "." And strName <> ".." Then
                blnSkipNext = False

                '---- If item is a folder then add a slash to the end (this is the best way to determine
                'a file that has no extension from a folder without using more arrays, variables etc)
                If Mid(strLineRead, 22, 1) = "<" Then
                    strName = strName & "\"
                    If intFilesFolders = 1 Then blnSkipNext = True 'You don't want folders
                Else
                    If intFilesFolders = 2 Then blnSkipNext = True 'You don't want files
                End If

                '---- If not skipping a file\folder
                If blnSkipNext = False Then

                    '---- If you want a full path
                    If blnFullPath Then strName = strDir & strName

                    '---- Redim, set and go to next array item
                    ReDim Preserve strArray(intItem)
                    strArray(intItem) = strName : intItem = intItem + 1
                End If
            End If
        End If
    Wend
    objFile.Close

    '---- Your wildcard, filter or empty folder has no items at all (not an error, but stops UBound problems)
    If intItem = 0 Then ReDim strArray(0) : strArray(0) = "*EMPTY*"

    '---- Delete temp files (comment these out if you wish to inspect them)
    objFSO.DeleteFile strTempDir & "DirToArray.TXT"
    objFSO.DeleteFile strTempDir & "DirToArray.BAT"
End Sub

 
#1
    59cobalt

    Saturday, December 10, 2011 1:52 AM
    AMBience
    As you may know the FSO files collection uses the same kind of sort order that DOS does (later created files & folders are always last) so you have to sort the files yourself. Using the FSO to populate an array with say "C:\Windows\System32" (a very good test folder that has 2000+ files in it) and then sorting it is very slow in VBS...about 8 seconds on my PC. This sub takes 0.4 seconds to do exactly the same thing.
    The quick draft below takes about 2 seconds for sorting by (access|modification|creation) time, and about 5 seconds for sorting by name. Although it obviously takes somewhat longer than just leaving the sorting to "dir", the script below is certainly a lot easier to maintain than a script that reads the output from text files created by batch scripts generated on the fly. I strongly advise against using that kind of approach just to save a couple of seconds of sorting time. If your time constraints are that narrow, any scripting language would be the wrong choice anyway.
    Const LAST_ACCESSED = &h01
    Const LAST_MODIFIED = &h02
    Const CREATED       = &h04
    Const FILENAME      = &h08

    Sub QuickSort(arr, left, right, prop)
        Dim pivot, leftIndex, rightIndex, buffer

        leftIndex = left
        rightIndex = right

        If right - left > 0 Then
            pivot = Int((left + right) / 2)

            While leftIndex <= pivot And rightIndex >= pivot
                On Error Resume Next
                While GetProp(arr(leftIndex), prop) < GetProp(arr(pivot), prop) And leftIndex <= pivot
                    leftIndex = leftIndex + 1
                Wend
                While GetProp(arr(rightIndex), prop) > GetProp(arr(pivot), prop) And rightIndex >= pivot
                    rightIndex = rightIndex - 1
                Wend

                ' Swap the two File objects
                Set buffer = arr(leftIndex)
                Set arr(leftIndex) = arr(rightIndex)
                Set arr(rightIndex) = buffer

                leftIndex = leftIndex + 1
                rightIndex = rightIndex - 1
                If leftIndex - 1 = pivot Then
                    rightIndex = rightIndex + 1
                    pivot = rightIndex
                ElseIf rightIndex + 1 = pivot Then
                    leftIndex = leftIndex - 1
                    pivot = leftIndex
                End If
            Wend

            QuickSort arr, left, pivot-1, prop
            QuickSort arr, pivot+1, right, prop
        End If
    End Sub

    ' Returns the property of File object f that is used as the sort key
    Function GetProp(f, prop)
        Select Case prop
            Case CREATED
                GetProp = f.DateCreated
            Case LAST_ACCESSED
                GetProp = f.DateLastAccessed
            Case LAST_MODIFIED
                GetProp = f.DateLastModified
            Case Else
                GetProp = f.Name
        End Select
    End Function

    WScript.StdErr.WriteLine Time

    Set fso = CreateObject("Scripting.FileSystemObject")
    Set fldr = fso.GetFolder("C:\WINNT\system32")

    arr = Array()

    For Each f In fldr.Files
        ReDim Preserve arr(UBound(arr)+1)
        Set arr(UBound(arr)) = f
    Next

    QuickSort arr, 0, UBound(arr), FILENAME

    For i = 0 To UBound(arr)
        WScript.Echo arr(i).Name
    Next

    WScript.StdErr.WriteLine Time

     
    #2
      ehvbs

      Saturday, December 10, 2011 10:41 AM
      Speed is just one feature that makes the "shell out to dir" approach attractive;
      depending on your task and the context, filtering by wildcards, ordering, or
      even the recursive traversal (/s) may make your task easier.

      If you (can) get rid of the .bat files and the array, the code is easy and
      maintainable - e.g.:

      Const cnWshFinished = 1 ' Exec.Status value once the process has ended

      Dim sDir  : sDir   = "C:\WINDOWS\system32"
      Dim sPat  : sPat   = "\x*.dll"
      Dim sOrd  : sOrd   = "/o:n"
      Dim sCmd  : sCmd   = "%comspec% /c dir /b " & sOrd & " """ & sDir & sPat & """"
      WScript.Echo sCmd
      Dim oExec : Set oExec = CreateObject("WScript.Shell").Exec(sCmd)

      ' Read output while the process is running ...
      Do Until oExec.Status = cnWshFinished
          If Not oExec.StdOut.AtEndOfStream Then
              WScript.Echo oExec.StdOut.ReadLine()
          End If
      Loop
      ' ... then drain whatever is left in the pipe
      Do Until oExec.StdOut.AtEndOfStream
          WScript.Echo oExec.StdOut.ReadLine()
      Loop
      


      Instead of an array, you could use a disconnected recordset:

      ' ADO constants used below
      Const adInteger    = 3
      Const adDate       = 7
      Const adVarWChar   = 202
      Const adClipString = 2

      Sub AddFileTo(oRS, oFile)
          oRS.AddNew
          oRS.Fields( "sPath"  ).Value = oFile.Path
          oRS.Fields( "sName"  ).Value = oFile.Name
          oRS.Fields( "lSize"  ).Value = oFile.Size
          oRS.Fields( "dtLMod" ).Value = oFile.DateLastModified
          oRS.Update
      End Sub

      Dim goFS : Set goFS = CreateObject("Scripting.FileSystemObject")
      Dim sDir : sDir    = "C:\WINDOWS\system32"
      Dim oRS  : Set oRS = CreateObject("ADODB.RecordSet")

      oRS.Fields.Append "sPath" , adVarWChar, 255
      oRS.Fields.Append "sName" , adVarWChar, 255
      oRS.Fields.Append "lSize" , adInteger
      oRS.Fields.Append "dtLMod", adDate

      oRS.Open

      Dim oFile
      For Each oFile In goFS.GetFolder(sDir).Files
          AddFileTo oRS, oFile
      Next

      oRS.Sort = "dtLMod" ' "lSize" ' "sName"

      oRS.MoveFirst
      Do Until oRS.EOF
          ' GetString with NumRows = 1 also advances to the next record
          WScript.Echo oRS.GetString( adClipString, 1, vbCrLf, vbCrLf, "<Null>")
      Loop
      


      That's especially attractive, if you need the data in a collection that
      can be processed/displayed in different orders. Another sample, this time
      using dir /s:

      ' Note: cRE (with its initPF method) is not defined in this post; it appears
      ' to be a RegExp wrapper class. cnWshFinished (1), adVarWChar (202),
      ' adInteger (3), adDate (7) and adClipString (2) must be declared as constants.
      Dim sDir  : sDir      = "C:\Perl"
      Dim sPat  : sPat      = "" ' "\*.xls" ' "\x*.dll"
      Dim sCmd  : sCmd      = "%comspec% /c dir /a:-d /-c /s /t:a """ & sDir & sPat & """"
      WScript.Echo sCmd
      Dim reCut : Set reCut = New cRE.initPF(_
          "^(?:\sDirectory\sof\s(.*?))\s*$|^(\d\d).(\d\d).(\d{4})\s+(\d\d):(\d\d)\s+(\d+)\s+(.*?)\s*$", "gimx" _
      )
      Dim oMTS, sStaticPath
      Dim oExec : Set oExec = CreateObject("WScript.Shell").Exec(sCmd)
      Dim oRS   : Set oRS   = CreateObject("ADODB.RecordSet")

      oRS.Fields.Append "sPath" , adVarWChar, 255
      oRS.Fields.Append "sName" , adVarWChar, 255
      oRS.Fields.Append "lSize" , adInteger
      oRS.Fields.Append "dtLMod", adDate

      oRS.Open

      Do Until oExec.Status = cnWshFinished
          If Not oExec.StdOut.AtEndOfStream Then
              AddMtsTo oRS, reCut.Execute(oExec.StdOut.ReadLine()), sStaticPath
          End If
      Loop
      Do Until oExec.StdOut.AtEndOfStream
          AddMtsTo oRS, reCut.Execute(oExec.StdOut.ReadLine()), sStaticPath
      Loop

      If Not oRS.EOF Then
          oRS.Sort = "dtLMod" ' "lSize" ' "sName"
          oRS.MoveFirst
          Do Until oRS.EOF
              WScript.Echo oRS.GetString( adClipString, 1, vbCrLf, vbCrLf, "<Null>")
          Loop
      End If
      


      and the corresponding Add Sub:

      Sub AddMtsTo(oRS, oMTS, sStaticPath)
          If 1 = oMTS.Count Then
              If "" = oMTS(0).SubMatches(0) Then
                  ' File line: build the date from the dd.mm.yyyy hh:mm submatches
                  Dim dtX : dtX = DateSerial( _
                      CInt(oMTS(0).SubMatches(3)), CByte(oMTS(0).SubMatches(2)), CByte(oMTS(0).SubMatches(1)) _
                  ) + TimeSerial( _
                      CByte(oMTS(0).SubMatches(4)), CByte(oMTS(0).SubMatches(5)), 0 _
                  )
                  oRS.AddNew
                  oRS.Fields( "sPath"  ).Value = sStaticPath
                  oRS.Fields( "sName"  ).Value = oMTS(0).SubMatches(7)
                  oRS.Fields( "lSize"  ).Value = CLng(oMTS(0).SubMatches(6))
                  oRS.Fields( "dtLMod" ).Value = dtX
                  oRS.Update
              Else
                  ' "Directory of ..." line: remember the path for the files that follow
                  sStaticPath = oMTS(0).SubMatches(0)
              End If
          End If
      End Sub
      

       
      #3
        59cobalt

        Saturday, December 10, 2011 12:31 PM
        While I'll agree that the best approach will depend on the actual requirements, I seriously doubt that I would consider the OP's approach (generating batch scripts and reading output files created by those batch scripts) for any kind of requirement.

        ehvbs
        Speed is just one feature that make the "shell out to dir" approach attractive; depending on your task and the context, filtering by wildcards, ordering, or even the recursive traversal (/s) may make your task easier.
        True, although directory traversal or filtering what is added to the array with a regular expression are not that complicated.
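
To illustrate the filtering part, here is a minimal sketch that only adds file names matching a regular expression to the array. FilterFiles and its parameter names are hypothetical and not taken from any of the posts above; the sketch uses the built-in RegExp object and the same ReDim Preserve pattern as the earlier sample.

```vbscript
Option Explicit

' Hypothetical helper: returns an array with the names of all files in
' strDir whose name matches the regular expression strPattern.
Function FilterFiles(strDir, strPattern)
    Dim objFSO, objRE, objFile, arrNames
    Set objFSO = CreateObject("Scripting.FileSystemObject")
    Set objRE = New RegExp
    objRE.Pattern = strPattern
    objRE.IgnoreCase = True   ' file names are case-insensitive on Windows

    arrNames = Array()
    For Each objFile In objFSO.GetFolder(strDir).Files
        If objRE.Test(objFile.Name) Then
            ReDim Preserve arrNames(UBound(arrNames) + 1)
            arrNames(UBound(arrNames)) = objFile.Name
        End If
    Next
    FilterFiles = arrNames
End Function
```

A call such as FilterFiles("C:\WINDOWS\system32", "^x.*\.dll$") would correspond roughly to the x*.dll wildcard used earlier in the thread. The result is unsorted and would still have to be sorted separately.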

        ehvbs
        That's especially attractive, if you need the data in a collection that can be processed/displayed in different orders. Another sample, this time using dir /s:

        [...]
        Dim reCut : Set reCut = New cRE.initPF(_
        "^(?:\sDirectory\sof\s(.*?))\s*$|^(\d\d).(\d\d).(\d{4})\s+(\d\d):(\d\d)\s+(\d+)\s+(.*?)\s*$", "gimx" _
        )
        I would tend to disagree with the above, though. This code will break whenever the date or time format is changed. I'd prefer to avoid having to depend on a particular date/time/number format if possible.

        And about the processing/displaying in different orders: as you have probably seen, my sample code already allows ordering by any of the three dates (the filename is actually a fallback option). Extending GetProp() with other attributes, like size, would be a trivial change.
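
As a sketch of such an extension, the variant of GetProp() below adds a size key. The FILESIZE constant and its value are assumptions made for this illustration; the other constants are the ones from the earlier post.

```vbscript
Const LAST_ACCESSED = &h01
Const LAST_MODIFIED = &h02
Const CREATED       = &h04
Const FILENAME      = &h08
Const FILESIZE      = &h10   ' hypothetical extra key, not in the original post

' Returns the property of f that is used as the sort key.
Function GetProp(f, prop)
    Select Case prop
        Case CREATED
            GetProp = f.DateCreated
        Case LAST_ACCESSED
            GetProp = f.DateLastAccessed
        Case LAST_MODIFIED
            GetProp = f.DateLastModified
        Case FILESIZE
            GetProp = f.Size
        Case Else
            GetProp = f.Name
    End Select
End Function
```

Sorting by size would then be a matter of calling QuickSort arr, 0, UBound(arr), FILESIZE.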
         
        #4
          ZvbscriptUser

          Saturday, December 10, 2011 6:09 PM
          59cobalt: While I like the GetFolder approach, there are times when performance requires the use of the "dir" command, namely when the drive being accessed is a network drive.

          For quite a while I had used GetFolder and the "For Each F in fldrs" approach on a network drive. Then, for some reason, my company switched the connection to a slow Ethernet connection. I found that loading an array with file information was taking minutes. When I used the command prompt and dir, it took about 10 seconds.

          I tested it tonight. I started a separate system in a virtual drive and mapped a drive letter to a network connection. I monitored the connection to the network drive in Wireshark. Both dir /b and Set Fldr = fso.GetFolder("T:") caused the download of the directory information, and both transfers were around the same size.

          But when loading the individual files into the array using "For Each f in Fldr.files", Wireshark reports a "QUERY_PATH_INFO" network request and a reply for each file being added to the array. With about 2000 files on an overloaded network connection, the transaction became bogged down.

          So I found the lesser of two evils in my HTA was to tolerate 5 brief command windows, using the Exec approach to send the dir output to stdout, as opposed to waiting 2-3 minutes for the array to load using the "For Each" approach.

          I'd love to hear of a way to avoid the network transmissions in the "For Each" loop.
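
For reference, the Exec approach described in this post could look roughly like the sketch below, which returns the bare names from "dir /b" as an array. DirNames and its parameters are hypothetical names; reading everything with ReadAll is a simplification that assumes the command writes little or nothing to stderr.

```vbscript
Option Explicit

' Hypothetical helper: runs "dir /b" on strDir and returns the listed names
' as an array. strSwitches takes extra DIR switches such as "/o:n" or "/a:-d".
Function DirNames(strDir, strSwitches)
    Dim objExec, strOut
    Set objExec = CreateObject("WScript.Shell").Exec( _
        "%comspec% /c dir /b " & strSwitches & " """ & strDir & """")

    ' ReadAll blocks until the command has finished writing its output
    strOut = objExec.StdOut.ReadAll()
    If Right(strOut, 2) = vbCrLf Then strOut = Left(strOut, Len(strOut) - 2)

    If strOut = "" Then
        DirNames = Array()              ' empty folder or nothing matched
    Else
        DirNames = Split(strOut, vbCrLf)
    End If
End Function
```

Because only names come back, no per-file lookups are made; sizes and dates would need the long DIR format and some parsing instead.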
           
          #5
            59cobalt

            Monday, December 12, 2011 7:00 AM
            That's probably the file objects refreshing their data every time they're touched. It's not that much of an issue in the For Each loop, but will kill you in the sorting procedure. I don't think there's a way to suppress these calls, though. If you can't run the script on the remote server, I guess you'll have to either resort to one of the methods ehvbs outlined, or build your own file equivalent objects with static information.
            Class MyFile
                Dim Name, Path, Size, DateCreated, DateLastModified, DateLastAccessed
            End Class

            For Each f In fldr.Files
                Set newFile = New MyFile
                With f
                    newFile.Name             = .Name
                    newFile.Path             = .Path
                    newFile.Size             = .Size
                    newFile.DateCreated      = .DateCreated
                    newFile.DateLastModified = .DateLastModified
                    newFile.DateLastAccessed = .DateLastAccessed
                End With
                ReDim Preserve arr(UBound(arr)+1)
                Set arr(UBound(arr)) = newFile
            Next
            The array buildup will take some more time that way, due to creating the MyFile objects and assigning them the respective values of the File objects. However, the time scales with only O(n) and can even be reduced to some extent by omitting attributes that are not needed. Sorting the MyFile objects takes virtually no time, since no additional (expensive) remote lookups are required.
            Repeated dynamic lookups during the quicksort would be far more expensive, because the quicksort algorithm scales with something between O(n*log(n)) in the best and O(n²) in the worst case.
            In a quick test setup I generated 2000 files with random names and measured the time for array buildup and sorting for both File and MyFile objects. The results were:
            • File objects:
              • array buildup: 2 seconds
              • sorting: 2 minutes 11 seconds
            • MyFile objects:
              • array buildup: 8-20 seconds, depending on the number of attributes
              • sorting: less than a second
             
            #6
