All Forums >> [Scripting] >> Post a VBScript >> Super-Fast Sort an array [updated] Do you like VisualBasicScript.com? Link to us and help spread the word about our forum. Thanks!
To run the script below you have to create a text file named "test.txt" and copy paste the text to sort, preferably with many many lines! (And have this text file near the vbs file.)
My add-on is splitting the whole text into small ones, themselves sorted in an array after their first letter only. Then it splits these small texts into small arrays and send these arrays for sorting with the InsertionSort function. Only one small problem (the world is not perfect): It works only if the first letter of your items are different. If you have a list in where all the items begin with the same letter for example:
item1 item32 item75 item6 etc
It will take much more time! Depending on the purpose of your script it can be an issue or not be an issue.
I also tried to recurse my SortArray function to do a complete sorting with this method but it didn't work.
< Message edited by Fredledingue -- 7/18/2006 7:44:11 AM >
First off... wow. I am constantly amazed by the scripting prowess displayed by some of the folks in these forums. if I ever had to write a sort routine by scratch, I don't think I could do it, especially not after seeing the code that you put together. Insertion sorts, bubble sorts... meh... I look att that code for that function and just go 'duh'..... it doesn't make any sense to me.
But since I am pretty code-dumb, and can't write my own sort routines, I've always had to rely on the built-in ones that vbscript gives us:
Now, the above code is pretty quick, and - I think - does a pretty good job of sorting text. But, since I am always on the lookout for a better way of doing things, I gave your function a try. I grabbed a copy of a log file and performed two different timings on it, and our GAL dump, removed the first two columns so that each line started with a different letter, and performed the final test on it.
31k file(486 lines) - log file Super fast sort = 0.046875 fSortArray = 0.03125
250K file(3888 lines) - log file Super fast sort = 3.6875 fsortarray = 0.140625
2meg file(7998 long lines) - GAL dump Super fast sort = 7.265625 fSortArray = 0.28125
I still feel that my testing is not right, since I am getting such wildly varying results, so I would appreciate it if you and others would give this a try and post the results.
_____________________________
"There's the one man who learns by reading, the two men that learn by watching, and the rest of us have to pee on the electric fence for ourselves." - Roy Rogers
"Would you like to touch my monkey?" - Dieter (Mike Meyers)
I couldn't test your function because wscript can't create the ActiveX component "System.Collections.ArrayList". :( Probably because I run w98se. I thought I had some ActiveX update, but no.
Anyway if there is a pre-build ActiveX function that is called ".Sort" and used for sorting arrays, it's likely to be much faster than anything we could do in raw VBS. It's very simple: in Excell you can insert realy huge amount of lines and it will sort them in a fraction of a hundred of second. I'v been always wondering how is it possible. Hence my pasionate interrest for sort scripts.
Now it's surprising that you got so poor results with my function. I tested it on 2000 lines and got no more than 0.27 sec.
Then I tested it on a divX log of 5000 lines: InsertionSort: 9.7 seconds Super-Fast Sort: 1.7 seconds
The DivX log (and maybe other log files) are not optimized for using with my function because each line starts with a number, so the big array can be split in only 10 smaller arrays. And "1" and "2" take the bulk of the text while "9" takes very little, so the division is not optimized neither. Also it takes a lot of time to append lines to small arrays when they grow not that small. And the fast concatenation "flush" method with dictionary is not possible here.
ehvbs
Very well observed! I found it too yesterday and updated the code
(1) For "System.Collections.ArrayList" you need the dot.net runtime (1.1 is the one I'm sure about). You'll have to check/decide, whether you can/want to install it.
(2) I think something went wrong with your update of the code; I still see the
TempArray(TempArrayNum) = TempArray(TempArrayNum) & Word & VBCrlf
My add-on is splitting the whole text into small ones, themselves sorted in an array after their first letter only. Then it splits these small texts into small arrays and send these arrays for sorting with the InsertionSort function.
Based on this description it sounds sort of like a radix sort. Yours is more of a hybrid since you're only sorting into "buckets" once then you sort what's in each "bucket". With a radix sort the following list would take four passes: Dog, Feet, Junk, Fun, Cat, Dirt, Fool
First Sort would be based on the least significant factor. Group None: Dog, Fun, Cat Group L: Fool Group K: Junk Group T: Feet, Dirt
Second Sort (next least significant): Group G: Dog Group E: Feet Group N: Junk, Fun Group O: Fool Group R: Dirt Group T: Cat
Third Sort (next least significant): Group A: Cat Group E: Feet Group I: Dirt Group O: Dog, Fool Gruop U: Junk, Fun
Final Sort (most significant): Group C: Cat Group D: Dirt, Dog Group F: Feet, Fool, Fun Group J: Junk
Final List Cat, Dirt, Dog, Feet, Fool, Fun, Junk
The radix sort is stable - meaning two items that are the same will not swap places - which is why, in this example, the final sort of "D" and "F" preserves the ordering from the previous sorts.
With your sort it would do the following: Split into C, D, F, J lists. Cat Dog, Dirt Feet, Fun, Fool Junk
C is sorted
Sort D (one swap) Dirt, Dog (swapped)
Sort F (one pass, one swap) Feet (pass), Fool, Fun (swapped)
Due to the requirements of needing .NET installed to use the arraysort ActiveX component, I came up with a workaround. This function will accept either an array or a string, and will sort it. If the arraysort activex is available, it will use it for the array, but if it isn't it will use the DOS sort routine to perform the sort. kinda kludgy, but... I was able to sort a 40,000 line (750KB) list in 0.53125 seconds with this, using the DOS sort routine. A 4.4MB file containing our company's GAL (8,000 lines, each line around 600 characters long) was sorted in 0.7 seconds. I think that this is an acceptable amount of overhead for a sort routine, don't you?
You can modify the sort command based on your particular OS, as some have different switches/arguments. This function has been tested on XP, 2003, and Windows NT 4.0.
< Message edited by DiGiTAL.SkReAM -- 10/10/2006 1:26:34 AM >
_____________________________
"There's the one man who learns by reading, the two men that learn by watching, and the rest of us have to pee on the electric fence for ourselves." - Roy Rogers
"Would you like to touch my monkey?" - Dieter (Mike Meyers)
Not to be facetious but....instead of going through all the trouble of using ActiveX controls and command prompt workarounds.....why not use JScript in part to do the the dirty work? I'm able to read a 16mb textfile with 133,350lines, sort it, and rewrite the result to a new file in an average of 9 seconds:
Now if you NEED/want to use as much vbscript as possible, maybe use a .wsf instead?
Just an idea for an alternative to activeX and other external additions. plus using the JScript .sort method will allow you a much easier way to custom sort your results
ORIGINAL: TNO Not to be facetious but....instead of going through all the trouble of using ActiveX controls and command prompt workarounds.....why not use JScript in part to do the the dirty work? I'm able to read a 16mb textfile with 133,350lines, sort it, and rewrite the result to a new file in an average of 9 seconds:
Now if you NEED/want to use as much vbscript as possible, maybe use a .wsf instead?
Just an idea for an alternative to activeX and other external additions. plus using the JScript .sort method will allow you a much easier way to custom sort your results
Yer right int hat the jscript .sort method would be "better" or at least integral to the language. However, in my case, I am limited to vbs as the 'standard' scripting language and cannot deviate. Also, I owuldn't know HOW to go about writing a .vbs file that includes a jscript command. I know that it can be done in ASP, but as I am writing in pure .vbs for CLI execution, I don't think it is possible. It would be great if you could prove me wrong, as there are a number of things that jscript does faster than vbs.
_____________________________
"There's the one man who learns by reading, the two men that learn by watching, and the rest of us have to pee on the electric fence for ourselves." - Roy Rogers
"Would you like to touch my monkey?" - Dieter (Mike Meyers)
...However, in my case, I am limited to vbs as the 'standard' scripting language and cannot deviate. Also, I owuldn't know HOW to go about writing a .vbs file that includes a jscript command. I know that it can be done in ASP, but as I am writing in pure .vbs for CLI execution, I don't think it is possible. It would be great if you could prove me wrong, as there are a number of things that jscript does faster than vbs.
Back when I was playing with script components and such for DHTML behaviors I learned about the Windows Script Components (*.wsc). Though with my new JScript library extensions I can avoid using these files, I can see such cases where it would still be a good idea, such as this. This will allow you to treat a pure scripting file as an ActiveX component that can be used in any COM enabled language (talk about deployability )
Now heres how we make our component:
First make a text file with the wsc extension in this case I'm going to call it sort.wsc
I'm also going to make my vbs file: monkeys.vbs
Next, inside my wsc file I'm going to put the following, which is the most basic format for a single JScript method:
Explanation: MyComponent.Sort is the name of my component when I reference it in my vbscript, it can be pretty much whatever I want it to be. FileSort is the method of the MyComponent.Sort (simply put, the name of the function I will be calling). And the parameters of my function are fin & fout. Since XML is used to define a wsc file, its obvious that throwing any kind of script inside of it would screw up the XML parser with its special charcters (&, {,}, etc.. ), so a <![CDATA[ ... ]]> section is included so it doesn't go psycho when it encounters a special character.
So now lets throw in the JScript that does all the dirty work for the sorting:
function FileSort(fin,fout){ var fso; fso = new ActiveXObject("Scripting.FileSystemObject"); var ForReading = 1, ForWriting = 2; var objFile; var fulltext=fso.OpenTextFile(fin, ForReading).ReadAll(); fulltext=fulltext.split("\n"); fulltext=fulltext.sort(); fulltext=fulltext.join("\n"); objFile=fso.CreateTextFile(fout) objFile.Write(fulltext) objFile.Close(); }
]]> </script> </component> </package>
So now in our vbs file I'll put this:
quote:
' Creates instance of my script component. Set obj = CreateObject("MyComponent.Sort")
'Sort the file FileSort("File To Be Sorted","Sorted File") Call obj.FileSort("M94503VD.txt","Sorted.txt") MsgBox("Complete")
Now the final step is to register our wsc file so we can use it: Right Click on the wsc file and hit Register so we can reference it from anywhere regardless of the wsc file's location.
Now save the vbs file, double click it and let the magic begin
This was just a crash course on what is possible now to "Extend" the life and functionality of your scripts in a zero-deployable, zero-plugin, zero-third-party environment. Now you can rely on nothing but your current version of WSH to accomplish your tasks.
<grovel>That just rules! That xml has to be used kinda sucks, but the rest... awesome!</grovel>
_____________________________
"There's the one man who learns by reading, the two men that learn by watching, and the rest of us have to pee on the electric fence for ourselves." - Roy Rogers
"Would you like to touch my monkey?" - Dieter (Mike Meyers)
Function InsertionSort(arrX) Dim nUB : nUB = UBound(arrX) Dim i, j, v For i=1 To nUB v=arrX(i) j=i Do While arrX(j-1) > v arrX(j) = arrX(j-1) j=j-1 If j<=0 Then Exit Do End If Loop arrX(j) = v Next End Function
I was wondering if someone could point out what this function is actually doing, or point me to where i can find out. I know that it sorts an array, which is what I want, and I have succesfully adapted it to work in the modified script engine in the program i need it for, But, I want to know what its actually doing. I dont want to just take code and use it i want to learn how to write it myself. Any help would be greatly apreciated.
There are lot's of web site dedicated to sorting algorithms. Just google for something like "sort bubble insertion quick". Add "demo" if you want some kind of visualisation.