Comparing 2 files
 Comparing 2 files - 1/27/2006 9:33:49 AM   
  hamboy

 

Posts: 94
Score: 6
Joined: 7/11/2005
Status: offline
Hi, I'm totally new to VBScript and a beginner in programming, so hopefully someone can give me a hand getting started...


I want to compare 2 files to see if the data in the two is identical. However, the files are formatted differently, so I guess I will need to parse the strings...

Can someone show me the first step in how to set up the code in VBScript?

My two files look like this:
map.txt
---------
"P-MON-OMP-1:314" : "P-PQ0506"
"P-MON-OMP-1:315" : "P-PQ0507"
"P-MON-OMP-1:316" : "P-PQ0503"
..etc

report.csv
----------
"P-PQ0502","MAGOG - RTE 112/AUT 10","MON1_ECP","CELL 310"
"P-PQ0503","ROCK FOREST - BOURQUE/MI VALLON","MON1_ECP","CELL 316"
"P-PQ0504","SHERBROOKE - PEPIN/PORTLAND","MON1_ECP","CELL 312"
"P-PQ0505","FLEURIMONT - DES JONQUILLES/PAPINEAU","MON1_ECP","CELL 313"
"P-PQ0506","SHERBROOKE - MARQUETTE/CATHEDRALE","MON1_ECP","CELL 314"
"P-PQ0507","MAGOG - SHERBROOKE/ST-PATRICE","MON1_ECP","CELL 315"
"P-PQ0508","ROCK FOREST - BOURQUE/JOYAL","MON1_ECP","CELL 317"
"P-PQ0509","BROMPTONVILLE - ECOSSAIS/AUT 10","MON1_ECP","CELL 311"
etc...

< Message edited by hamboy -- 1/27/2006 10:00:21 AM >
 
 
Post #: 1
 
 RE: Comparing 2 files - 1/27/2006 10:05:20 AM   
  hamboy

 

Would it be a good idea to put all of these into arrays?

i.e.
"P-MON-OMP-1:314" : "P-PQ0506"
parse the line above and put it into the array line1:
line1(0)=P
line1(1)=MON
line1(2)=OMP
line1(3)=1
line1(4)=314
line1(5)=P-PQ0506

and "P-MON-OMP-1:315" : "P-PQ0507"
would be array line2...etc.

Then I can access lineX(5) and compare it with the CSV file... I have the idea, but I'm not sure how to code it.
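The array idea works. As a minimal sketch (written in Python purely for illustration, and assuming every map.txt line has exactly the shape shown above), a regular expression can split one line into the same six parts; `parse_map_line` and `MAP_LINE` are hypothetical names:

```python
import re

# Assumed shape of one map.txt record:  "P-MON-OMP-1:314" : "P-PQ0506"
# Groups: prefix, site, system, unit, cell, Element ID (may be empty).
MAP_LINE = re.compile(
    r'^"(\w+)-(\w+)-(\w+)-(\w+):(\d+)"\s*:\s*"([^"]*)"\s*$'
)

def parse_map_line(line):
    """Split one map.txt line into a list like the lineX array above.

    Returns None when the line does not look like a map record.
    """
    match = MAP_LINE.match(line.strip())
    if match is None:
        return None
    return list(match.groups())

parts = parse_map_line('"P-MON-OMP-1:314" : "P-PQ0506"')
print(parts)     # ['P', 'MON', 'OMP', '1', '314', 'P-PQ0506']
print(parts[5])  # P-PQ0506
```

Index 5 then plays the role of lineX(5): it is the key to look up in report.csv.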

(in reply to hamboy)
 
 
Post #: 2
 
 RE: Comparing 2 files - 1/28/2006 4:30:56 AM   
  ehvbs

 

Posts: 2204
Score: 50
Joined: 6/22/2005
From: Germany
Status: offline
Hi hamboy,

I believe that your problem could be solved in a very nice (interesting, elegant, reusable)
way by using ADO/Text Driver to treat your two files as tables and asking your question(s)
in SQL. But to think about it, I need some more information.
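To illustrate the "files as tables" idea outside ADO: a minimal Python sketch that loads both files into an in-memory SQLite database (a stand-in for the ADO/Text Driver approach, not the same technology) and joins them on the Element ID. The table and column names are assumptions; the sample rows are copied from hamboy's first post.

```python
import csv
import io
import re
import sqlite3

# Sample rows copied from the thread; real code would read map.txt/report.csv.
MAP_TXT = '''"P-MON-OMP-1:314" : "P-PQ0506"
"P-MON-OMP-1:315" : "P-PQ0507"
"P-MON-OMP-1:316" : "P-PQ0503"
'''
REPORT_CSV = '''"P-PQ0503","ROCK FOREST - BOURQUE/MI VALLON","MON1_ECP","CELL 316"
"P-PQ0506","SHERBROOKE - MARQUETTE/CATHEDRALE","MON1_ECP","CELL 314"
"P-PQ0507","MAGOG - SHERBROOKE/ST-PATRICE","MON1_ECP","CELL 315"
'''

def load_tables(con, map_text, report_text):
    """Load both files into SQL tables (table/column names are assumptions)."""
    con.execute("CREATE TABLE map (map_key TEXT, element_id TEXT)")
    con.execute("CREATE TABLE report (element_id TEXT, location TEXT,"
                " reporting_system TEXT, cell_number TEXT)")
    for line in map_text.splitlines():
        m = re.match(r'"([^"]*)"\s*:\s*"([^"]*)"', line)
        if m:
            con.execute("INSERT INTO map VALUES (?, ?)", m.groups())
    rows = csv.reader(io.StringIO(report_text))
    con.executemany("INSERT INTO report VALUES (?, ?, ?, ?)", rows)

con = sqlite3.connect(":memory:")
load_tables(con, MAP_TXT, REPORT_CSV)

# LEFT JOIN keeps map records with no counterpart in report (NULL columns).
query = """
    SELECT m.map_key, m.element_id, r.reporting_system, r.cell_number
    FROM map AS m LEFT JOIN report AS r ON r.element_id = m.element_id
    ORDER BY m.map_key
"""
results = con.execute(query).fetchall()
for row in results:
    print(row)
```

Once both files are tables, questions like "which map records have no report row?" become a WHERE clause on the joined result (for example `r.element_id IS NULL`).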

Please give meaningful names/short descriptions to the columns of your files. (I can see
that map.Col2 and report.Col1 provide the relation between the files, but not knowing
the meaning, I can't guess whether they are unique.)

Please give exhaustive sample data. Is it possible that map.txt contains lines like:
   "P-MON-OMP-1:316" : "P-PQ0503"
   "P-MON-OMP-1:316" : "P-PQ0512"
   "P-MON-OMP-1:345" : "P-PQ0503"

Please state your question(s) explicitly: What do you mean by compare? Is that good:
   "P-MON-OMP-1:316" : "P-PQ0503"
   "P-PQ0503","ROCK FOREST - BOURQUE/MI VALLON","MON1_ECP","CELL 316"
Would that be bad:
   "P-MON-OMP-1:316" : "P-PQ0503"
   "P-PQ0503","ROCK FOREST - BOURQUE/MI VALLON","MON1_ECP","CELL 415"
Are you interested in lines from map that have no counterpart in report? Or vice
versa?

ehvbs

(in reply to hamboy)
 
 
Post #: 3
 
 RE: Comparing 2 files - 1/30/2006 7:54:09 AM   
  hamboy

 

Hi ehvbs,
Basically in both files, each line can be considered as a record.
For example, the line ("P-MON-OMP-1:314" : "P-PQ0506") is one record.
From it, I need to retrieve certain parts (from left to right): "MON", "1", "314", "P-PQ0506".
In report.csv, using "P-PQ0506" as the key, we get
"P-PQ0506","SHERBROOKE - MARQUETTE/CATHEDRALE","MON1_ECP","CELL 314"

Now from the data we got from map.txt ("MON", "1", "314", "P-PQ0506"),
we need to concatenate "MON" and "1" -> "MON1"
and concatenate "CELL" with "314" -> "CELL 314"
As you can see, the report.csv line with "P-PQ0506" has "MON1" after the 2nd comma
and "CELL 314" after the 3rd comma, which is the same as what we got from map.txt. This is considered matching data.
Note: IMO it is probably better to cut the "MON1_ECP" string from report.csv down to "MON1" for easier comparison.
Unmatching data is anything like "CELL 382" instead of "CELL 314", or "XXX1" instead of "MON1". Any variation in the 4 values
collected constitutes unmatching data.
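The matching rule above can be written as a small comparison. This is a Python sketch, assuming a map.txt line has already been split into six parts (prefix, site, system, unit, cell, Element ID) and a report.csv row into its four columns; the function names are hypothetical:

```python
def expected_values(map_parts):
    """Build the values report.csv should contain for one map record.

    map_parts is a list like ['P', 'MON', 'OMP', '1', '314', 'P-PQ0506'].
    """
    _prefix, site, _system, unit, cell, element_id = map_parts
    return element_id, site + unit, "CELL " + cell

def report_values(report_row):
    """Pick the comparable fields from one report.csv row.

    'MON1_ECP' is cut at the first underscore to get 'MON1'.
    """
    element_id, _location, reporting_system, cell_number = report_row
    return element_id, reporting_system.split("_", 1)[0], cell_number

def is_match(map_parts, report_row):
    """All three values must be exactly equal for a matching record."""
    return expected_values(map_parts) == report_values(report_row)

parts = ["P", "MON", "OMP", "1", "314", "P-PQ0506"]
good = ["P-PQ0506", "SHERBROOKE - MARQUETTE/CATHEDRALE", "MON1_ECP", "CELL 314"]
bad = ["P-PQ0506", "SHERBROOKE - MARQUETTE/CATHEDRALE", "MON1_ECP", "CELL 382"]
print(is_match(parts, good))  # True
print(is_match(parts, bad))   # False
```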
Yes it is possible for map.txt to have:
  "P-MON-OMP-1:316" : "P-PQ0503"
  "P-MON-OMP-1:316" : "P-PQ0512"
  "P-MON-OMP-1:345" : "P-PQ0503"
Values and letters are not restricted. What's important is that the correct parts are taken (as shown in the example above) so they can be properly compared with report.csv.
What I mean by compare is that the 4 parts that are retrieved (i.e. "MON", "1", "314", "P-PQ0506"), after being manipulated into ("MON1", "CELL 314", "P-PQ0506"), must be exactly the same as the corresponding parts in report.csv. Anything different is considered unmatching, and the "P-PQ0506" string (for example, if report.csv had "CELL 382" instead of "CELL 314") should be listed in an unmatch.txt output file.
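Putting the pieces together, here is a minimal Python sketch that reads both files and writes the unmatched Element IDs to unmatch.txt. It assumes the line formats shown earlier in the thread; `find_unmatched` and `write_unmatched` are hypothetical names, and Element IDs missing from report.csv are simply skipped here rather than reported.

```python
import csv
import re

# Captures site, unit, cell and Element ID from a line such as
#   "P-MON-OMP-1:314" : "P-PQ0506"
MAP_LINE = re.compile(r'^"\w+-(\w+)-\w+-(\w+):(\d+)"\s*:\s*"([^"]*)"\s*$')

def find_unmatched(map_lines, report_lines):
    """Return Element IDs whose report.csv values differ from map.txt.

    Both arguments are iterables of text lines (e.g. open file objects).
    IDs missing from report.csv are skipped; they belong in a separate
    "Not Found" list.
    """
    # Index report rows by Element ID (first column).
    report = {row[0]: row for row in csv.reader(report_lines) if len(row) >= 4}
    unmatched = []
    for line in map_lines:
        m = MAP_LINE.match(line.strip())
        if m is None:
            continue  # not a map record
        site, unit, cell, element_id = m.groups()
        row = report.get(element_id)
        if row is None:
            continue
        expected = (site + unit, "CELL " + cell)
        actual = (row[2].split("_", 1)[0], row[3])
        if expected != actual:
            unmatched.append(element_id)
    return unmatched

def write_unmatched(map_path, report_path, out_path="unmatch.txt"):
    """Read both files and write one unmatched Element ID per line."""
    with open(map_path) as m, open(report_path, newline="") as r:
        ids = find_unmatched(m, r)
    with open(out_path, "w") as out:
        out.writelines(element_id + "\n" for element_id in ids)
    return ids
```

Run from the folder holding the files, `write_unmatched("map.txt", "report.csv")` would create unmatch.txt there. Because each map.txt line is checked on its own, duplicate keys such as the two "P-MON-OMP-1:316" lines are handled without special treatment.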

Please let me know if you have any other questions

(in reply to ehvbs)
 
 
Post #: 4
 
 RE: Comparing 2 files - 1/30/2006 9:30:12 AM   
  ehvbs

 

Hi hamboy,

thanks for the info; while I work through it carefully, you could look at this
first step of my efforts:

If possible at all, please copy map.txt and report.csv to a new directory
and save this


      

as schema.ini to this folder. Then save this


      

to frstest.vbs and this


      

to library.vbs. Now open a DOS box (command prompt), change to
that directory and execute cscript frstest.vbs.

If you get output like


      

all is well - if not, please tell me about the error messages.

ehvbs

 

(in reply to hamboy)
 
 
Post #: 5
 
 RE: Comparing 2 files - 1/31/2006 4:00:26 AM   
  hamboy

 

Wow, cool, thanks ehvbs... it ran with no problems, and the output was exactly the same as yours.

Just to make things easier, I reformatted report.csv and map.txt. Hopefully this will make it easier to help me and also help me understand VBScript better.

Report.csv & map.txt are attached. Same thing as before: take the map.txt data and compare it against report.csv. Anything that's different (i.e. CELL Number and/or Reporting System) should be output by listing its corresponding Element ID in an "Unmatching" list. If, for example, P-MB0010 from map.txt cannot be found in report.csv, list it under a "Not Found" list.
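The two output lists described here can be separated with a dictionary lookup. A Python sketch, assuming both files have already been reduced to (reporting system, cell number) pairs keyed by Element ID; `classify` is a hypothetical name and the data is illustrative, built from records mentioned in the thread:

```python
def classify(map_records, report):
    """Split map.txt records into "Unmatching" and "Not Found" lists.

    map_records: iterable of (element_id, reporting_system, cell_number)
                 built from map.txt, e.g. ("P-PQ0506", "MON1", "CELL 314").
    report:      dict of Element ID -> (reporting_system, cell_number)
                 built from report.csv ("MON1_ECP" already cut to "MON1").
    """
    unmatching, not_found = [], []
    for element_id, system, cell in map_records:
        if element_id not in report:
            not_found.append(element_id)   # no such Element ID in report.csv
        elif report[element_id] != (system, cell):
            unmatching.append(element_id)  # found, but values differ
    return unmatching, not_found

report = {"P-PQ0506": ("MON1", "CELL 382"), "P-PQ0507": ("MON1", "CELL 315")}
records = [
    ("P-PQ0506", "MON1", "CELL 314"),
    ("P-PQ0507", "MON1", "CELL 315"),
    ("P-MON2TEST", "MON2", "CELL 222"),
]
unmatching, not_found = classify(records, report)
print("Unmatching:", unmatching)  # Unmatching: ['P-PQ0506']
print("Not Found:", not_found)    # Not Found: ['P-MON2TEST']
```

Checking "not in report" first matters: a missing ID is never compared, so it cannot be misreported as unmatching.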

In the meantime, I'll look over your code and try to learn from it. Thanks!

http://s46.yousendit.com/d.aspx?id=13P54JDUTKTR90FJ5FT3KXECAQ

http://s46.yousendit.com/d.aspx?id=1H0KALNF28D5L1PCE845C687LS

< Message edited by hamboy -- 1/31/2006 4:12:35 AM >

(in reply to ehvbs)
 
 
Post #: 6
 
 RE: Comparing 2 files - 1/31/2006 4:33:50 AM   
  ehvbs

 

Hi hamboy,

I'm glad about your success. But if you followed the link to the schema.ini docs,
you'll have realized by now that changing the file formats in the middle of the
project is only the second-best idea. I downloaded your files and think the new formats
need reworking. Do you have complete control over the format? Which one is
easier for you? If I were you, I'd prefer to stick to the old one (no reformatting
needed). But that's your decision.

ehvbs
 

(in reply to hamboy)
 
 
Post #: 7
 
 RE: Comparing 2 files - 1/31/2006 5:29:48 AM   
  hamboy

 

You're right... it took a while to reformat everything so coding would be easier, but I guess that defeats the purpose of automating and making use of programming. I don't have control over how the data is output, but I am able to modify it, e.g. using Excel. It would be easier if everything was automated instead of going through and reformatting it every time.

The original files can be found below:
http://s65.yousendit.com/d.aspx?id=25KCMB709PYRI1RKR71H6YJBTN
http://s65.yousendit.com/d.aspx?id=1LA3YBSFCHWRC38Y1UL1DLK7GS

< Message edited by hamboy -- 3/1/2006 6:43:56 AM >

(in reply to ehvbs)
 
 
Post #: 8
 
 RE: Comparing 2 files - 1/31/2006 6:10:06 AM   
  ehvbs

 

Hi hamboy,

It will take some time to analyse the original format. In the meantime you may do this:

(1)  Replace getdispSq() in library.vbs with this more stable version (it handles empty recordsets):

      

(2) In schema.ini, disable the 'very old' defs for map.txt by commenting out the file name, and add
      the new defs:

      

(3) In frstest.vbs insert the new sub doOLdFormat()


      

and insert a call to it before the call to doMain00():

    ExecuteGlobal goFS.OpenTextFile( "library.vbs" ).ReadAll

   WScript.Quit doOLdFormat()   ' WScript.Quit ends the script here, so doMain00() is not reached
   WScript.Quit doMain00()

(4) Expected output (shortened):


      

I will wrestle with the original files!

ehvbs

(in reply to hamboy)
 
 
Post #: 9
 
 RE: Comparing 2 files - 1/31/2006 8:39:34 AM   
  ehvbs

 

What I did to play with original_Report.csv:

(1) defs for original_Report.csv in schema.ini:

      

(2) some additions to library.vbs:

      

(3) added to frstest.vbs:

   ExecuteGlobal goFS.OpenTextFile( "library.vbs" ).ReadAll
   WScript.Quit doOrgReport()


      

You can see from this:

(a) Element ID is Primary Key in original_Report.csv
(b) if you play around with the different sSQL statements

    sSQL    =   "SELECT TOP 10 " + Join( aFields, ", " ) + " FROM original_Report.csv"
    sSQL    =   "SELECT        " + Join( aFields, ", " ) + " FROM original_Report.csv"
    sSQL    =   "SELECT        " + Join( aFields, ", " ) + " FROM original_Report.csv " _
              + "WHERE  IIF( Cell_Number IS NULL, FALSE " _
              + "        , IIF( NOT ISNUMERIC( MID( Cell_Number, 6 ) ), FALSE  " _
              + "           , 0 < CLNG( MID( Cell_Number, 6 ) ) ) )"

     you'll realize there is a lot of junk in original_Report.csv. Am I right in assuming
     that the records which don't get caught by the last statement can be ignored
     when we have to think about matches?
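For readers who prefer to see the filter outside SQL, here is a rough Python equivalent of the last WHERE clause. It is an approximation: Jet's ISNUMERIC and CLNG also accept forms such as decimals, which this sketch rejects, and `is_good_cell` is a hypothetical name.

```python
def is_good_cell(cell_number):
    """Approximate the last WHERE clause used on original_Report.csv.

    Keeps a record only when Cell_Number is present and the text from the
    6th character on (MID(Cell_Number, 6), i.e. after "CELL ") is a whole
    number greater than zero.
    """
    if cell_number is None:
        return False
    tail = cell_number[5:].strip()  # MID is 1-based, Python slices are 0-based
    return tail.isdecimal() and int(tail) > 0

samples = ["CELL 314", "CELL 0", "CELL", None, "CELL x1"]
print([is_good_cell(s) for s in samples])
# [True, False, False, False, False]
```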

(in reply to ehvbs)
 
 
Post #: 10
 
 RE: Comparing 2 files - 1/31/2006 9:33:43 AM   
  hamboy

 

Yes, that's correct. When we compare, we use map.txt to compare against report.csv, since map.txt has fewer "records".
All of the junk data in report.csv that is not in map.txt can be ignored.

Conversely, if an Element ID in map.txt is not found in report.csv, then output that Element ID to a "Not Found" list.

(in reply to ehvbs)
 
 
Post #: 11
 
 RE: Comparing 2 files - 1/31/2006 9:46:40 AM   
  ehvbs

 

Thanks! I'm working on original_map.txt now. Does the code I posted work for you?

(in reply to hamboy)
 
 
Post #: 12
 
 RE: Comparing 2 files - 1/31/2006 9:56:29 AM   
  hamboy

 

Thanks ehvbs! The code is working perfectly.

I notice the output of report.csv contains 1239 records. In map.txt, every Element ID starts with either P- or Q-.
If I filter report.csv down to only the records that start with "P-" or "Q-", there's a total of 1792 records.

Are we losing any records from report.csv that may be in map.txt?
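One way to answer this question directly, rather than by comparing totals: collect the Element IDs that appear in report.csv but are removed by the filter, and intersect them with the IDs from map.txt. A Python sketch; `lost_by_filter` is a hypothetical name, and `keep` stands for whatever filter is applied (the one below is only an example):

```python
def lost_by_filter(map_ids, report_rows, keep):
    """Return map.txt Element IDs that exist in report.csv but fail keep().

    map_ids:     Element IDs taken from map.txt.
    report_rows: parsed report.csv rows (lists of column values).
    keep:        the filter predicate applied to each row.
    An empty result means the filter drops nothing that map.txt needs.
    """
    all_ids = {row[0] for row in report_rows if row}
    kept_ids = {row[0] for row in report_rows if row and keep(row)}
    return sorted(set(map_ids) & (all_ids - kept_ids))

# Illustrative rows; the second one has an empty Cell Number.
rows = [
    ["P-PQ0506", "SHERBROOKE - MARQUETTE/CATHEDRALE", "MON1_ECP", "CELL 314"],
    ["P-PQ0507", "MAGOG - SHERBROOKE/ST-PATRICE", "MON1_ECP", ""],
]

def keep(row):
    """Example filter: require a non-empty Cell Number."""
    return row[3] != ""

print(lost_by_filter(["P-PQ0506", "P-PQ0507", "P-MON2TEST"], rows, keep))
# ['P-PQ0507']
```

IDs like "P-MON2TEST" that are absent from report.csv altogether are not "lost" by the filter; they belong in the "Not Found" list instead.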

< Message edited by hamboy -- 1/31/2006 9:59:00 AM >

(in reply to ehvbs)
 
 
Post #: 13
 
 RE: Comparing 2 files - 1/31/2006 10:08:32 AM   
  ehvbs

 

We start with 1956 records in original_report.csv. If we filter the junk out of original_report.csv and
export only the good records to another file, it's difficult to say how many there will be. Perhaps a
good way to gain confidence in our work would be to create test_map.txt and test_report.csv
with about 10 representative records and to specify the desired output. Would you be
willing to do this? You can post these examples here, I think.

(in reply to hamboy)
 
 
Post #: 14
 
 RE: Comparing 2 files - 1/31/2006 10:38:00 AM   
  hamboy

 

Nice filtering, ehvbs...
I skimmed through and looked at a few unique entries in map.txt that should be kept, and your output has them all.

map.txt will not need to be filtered for "junk" data since it's the main file used for the comparison, so all of its records are relevant.
If any "Element ID" from map.txt does not exist in report.csv, e.g. "P-MON-OMP-2:222" : "P-MON2TEST" or "P-TOR-OMP-3:1" : "", then just output the string in a "Not Found" list... or however you prefer.

(in reply to ehvbs)
 
 
Post #: 15
 
 RE: Comparing 2 files - 1/31/2006 11:02:18 AM   
  ehvbs

 

Hi hamboy,

I have to stop now (it's very late in Germany) and I will be away tomorrow; therefore I will
post all my code here; I hope you can use it. The day after tomorrow I should be back.