RE: Error-check a text file - 6/1/2007 12:12:00 AM
SAPIENScripter
Posts: 276
Score: 2
Joined: 11/1/2006
From: SAPIEN Technologies
Status: offline
I think regular expressions are the only way you can do this, but probably only if there is always the same number of columns on every line of the file. If row one has 6 columns and row two has 5 columns, I don't see how you could have any standardized method. Assuming this is not an issue, you could try writing one long pattern that takes everything into account, including spaces (\s), but that might be overwhelming.

An easier approach might be to take each line and split it into an array on the space character. Then use a shorter regex pattern for each element, where you can use ^ and $ to match the entire string. You should know that the first string block needs to match RegexA, the second RegexB and so on. This doesn't help with spaces, but maybe that's OK. I might just create a new file using the Write method to create a known good file with single spaces.

One alternative might be to make a first pass on the file, line by line, using a broad regex pattern that looks for the right number of columns separated by a single space.

Good luck with this. You've got your hands full.
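To make the split-and-check idea concrete, here is a minimal sketch in Python rather than VBScript. The three column patterns and the function names check_line and check_lines are made up for illustration; swap in whatever rules your file really follows.

```python
import re

# Hypothetical per-column patterns; replace them with the rules for your file.
COLUMN_PATTERNS = [
    re.compile(r"\d{1,6}"),     # column 1: 1 to 6 digits
    re.compile(r"[A-Z]{2,4}"),  # column 2: 2 to 4 uppercase letters
    re.compile(r"[0-9A-Z]+"),   # column 3: uppercase letters and digits
]


def check_line(line):
    """Return a list of problems found in one line (empty list means OK)."""
    # Split on a single space only, so a double space shows up as an extra,
    # empty column instead of being silently swallowed.
    fields = line.rstrip("\n").split(" ")
    if len(fields) != len(COLUMN_PATTERNS):
        return ["expected %d columns, found %d" % (len(COLUMN_PATTERNS), len(fields))]
    problems = []
    for number, (field, pattern) in enumerate(zip(fields, COLUMN_PATTERNS), start=1):
        # fullmatch requires the whole field to match, much like anchoring
        # the pattern with ^ and $.
        if not pattern.fullmatch(field):
            problems.append("column %d: %r does not match %s" % (number, field, pattern.pattern))
    return problems


def check_lines(lines):
    """Map 1-based line numbers to their problems, skipping clean lines."""
    report = {}
    for line_number, line in enumerate(lines, start=1):
        problems = check_line(line)
        if problems:
            report[line_number] = problems
    return report
```

Because the split is on exactly one space, the column count check doubles as the "single space" check, so the separate first pass may not be needed in this form.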
_____________________________
Jeffery Hicks Windows PowerShell MVP SAPIEN Technologies - Scripting, Simplified. www.SAPIEN.com Follow Me: http://www.twitter.com/JeffHicks
RE: Error-check a text file - 6/3/2007 8:57:56 PM
ehvbs
Posts: 2223
Score: 50
Joined: 6/22/2005
From: Germany
Status: offline
Hi markmcrobie,

"I'm unsure what the first character does (^)"
^ means start of string, so "^a" will match "ab" but not "ba", while "a" will match both. In character class definitions, [^...] means Not (match characters not in the list).

"I guess \d{1,6} looks for digits only, 1 to 6 chars in length"
Yes: \d means digit; {<minimum, at least>,<maximum, at most>} says how often the preceding item may repeat.

"I'm unsure of the next character (-)"
Outside of character class definitions [], - just means a literal -.

"I guess \s means space"
Yes and no: \s means whitespace (including tabs).

"Unsure of the ("
Plain () are used to capture the contained part of the match for further use (\1..., submatches); at the same time they bracket the scope of operators like *, + or {min,max}.

"I guess [0-9A-Z] looks for alphanumeric characters, and I guess the + means any continuous sequence of these"
Yes: a character class (set) definition matches any character in the list at the given position (or, negated with [^...], any character not in the list). * = zero or more, + = one or more.

"Unsure of the )*"
) closes the scope/capture opened by (; the resulting part may occur zero or more times (*).

"Again, I guess [0-9A-Z] looks for alphanumeric characters, and I guess the + means any continuous sequence of these"
Yes.

"Unsure of the $"
$ means "end of string" (cf. ^: start of string).

The VBScript Docs contain an "Introduction to RegExps" chapter; there is an interactive RegExps tester posted by mikesok (?) in the "Post your Script" forum.

Have fun working with RegExps!

ehvbs
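If it helps to see the pieces working together, here is a small sketch in Python (the constructs used are the same ones VBScript's RegExp understands). The full pattern is only a guess assembled from the fragments discussed in this thread, not necessarily the one you were given; in particular the \s inside the group is an assumption, and the names PATTERN and is_valid are made up.

```python
import re

# Hypothetical pattern built from the fragments discussed in this thread.
# The \s inside the group is an assumption so that repeated blocks are
# separated by whitespace.
PATTERN = re.compile(r"^\d{1,6}-\s([0-9A-Z]+\s)*[0-9A-Z]+$")


def is_valid(text):
    """True if the whole string fits the hypothetical pattern."""
    return PATTERN.search(text) is not None


# ^ anchors the match at the start of the string.
starts_with_a = [bool(re.search(r"^a", s)) for s in ("ab", "ba")]
contains_a = [bool(re.search(r"a", s)) for s in ("ab", "ba")]

samples = [
    "123- ABC",         # digits, literal -, whitespace, one block
    "123- ABC DEF 42",  # the (...)* group repeats for "ABC " and "DEF "
    "1234567- ABC",     # seven digits: too many for \d{1,6}
    "123-ABC",          # no whitespace after the -
    "123- abc",         # lowercase letters are not in [0-9A-Z]
]
results = {s: is_valid(s) for s in samples}
```

Only the first two samples pass. Note that the match is case sensitive here; a case-insensitive option would let the last sample through as well.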