RE: Help With RegEx - 6/20/2006 5:33:34 AM
ebgreen
Posts: 4595
Score: 29
Joined: 7/12/2005
Status: offline
Here is my solution for using the DOM:

' Sample input: three sibling divs; only the one with the target ID is kept
arrTest = Array("<div id=""123"" class=""someClass"">", _
                "some text", _
                "</div>", _
                "<div id=""456"" class=""someClass"">", _
                "some text", _
                "</div>", _
                "<div id=""789"" class=""someClass"">", _
                "some text", _
                "</div>")
strTest = Join(arrTest, vbCrLf)
strTestID = "456"

Set oIE = CreateObject("InternetExplorer.Application")
oIE.Navigate "About:Blank"
' Wait until the blank page has loaded before touching Document.Body
Do While oIE.Busy Or oIE.ReadyState <> 4
    WScript.Sleep 50
Loop

' Load the HTML to clean up into the blank page
oIE.Document.Body.InsertAdjacentHTML "afterbegin", strTest
Set colDivs = oIE.Document.GetElementsByTagName("DIV")

' Remove every div whose ID is not the one we want to keep
For Each oDiv In colDivs
    If oDiv.ID <> strTestID Then
        oIE.Document.Body.RemoveChild oDiv
    End If
Next

strResult = oIE.Document.Body.InnerHTML
WScript.Echo strResult
oIE.Quit
_____________________________
"... when you are good and crazy, oooh, oooh, oooh, the sky is the limit!" - The Tick
Good places to start: http://www.visualbasicscript.com/m_24727/tm.htm http://www.visualbasicscript.com/m_47117/tm.htm
RE: Help With RegEx - 6/21/2006 12:12:05 AM
ebgreen
Hey ehvbs,

> 1) Though I never used HTMLFile before, wasn't sure that it is a 'state of the art' component, and couldn't find any decent documentation, I decided to use it instead of "InternetExplorer.Application", because (a) reading some postings from S. Fulton and M. Harris made me realize that loading a complete HTML page into IE could trigger scripts - something that shouldn't be done by a 'pimp my pages automagically' script

You are correct that loading an entire page would cause the OnLoad event to fire, so any code in the OnLoad sub would run. In this case my reading of the situation is that the OP had a string of HTML that he just wanted to clean up. He therefore has complete control over the HTML being loaded and can simply not load any code that he doesn't want to run.

> (b) I never succeeded in replacing the About:Blank page with a complete HTML page - do you know the magic incantation?

I'm not positive what you mean, but if you look at my code, I use this:

oIE.Document.Body.InsertAdjacentHTML "afterbegin", strTest

to load the HTML that needs to be cleaned up into the About:Blank page. Is that what you are asking?

> (2) There is a second difference between our solutions: I use
> oDIV.parentElement.removeChild oDIV
> while you use
> oIE.Document.Body.RemoveChild oDiv
> This will fail if you get an oDIV (from GetElementsByTagName()) which isn't a child of Body.

Again, my solution is purely for cleaning up a string of HTML that the user has total control of. Since I explicitly load the HTML into an otherwise blank page, everything (including all of the oDivs) is a child of .Body. If I were doing this same procedure on a complete, already assembled page, then I would certainly use .ParentElement. Your solution is therefore the more general of our two.

> Now I'm waiting for some RegExp lover to show us how to get rid of those DIVs by RegExp force! (That's only half a joke, because you can't use DOM reliably, if your input isn't decent HTML; so (pre)processing with string/RegExp operations may be necessary for getting a job done.)

I can do it with regex if you want, but there are a few things that I would need to know. As was pointed out in the other forum where the OP posted this question, unless this is exactly the way the data will always be formatted, nested divs will be the death of any regex solution. (NOTE that nested divs would probably break my IE solution as well, since I used .Body.RemoveChild, but using .ParentElement.RemoveChild should accommodate nested divs.)
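
For the flat, non-nested sample data used earlier in this thread, a regex approach might look roughly like the sketch below. It is only an illustration under strong assumptions that are not confirmed by the OP: the divs are never nested, the id attribute is always written with double quotes, and the ID to keep contains no regex metacharacters. The pattern itself is my own guess at a workable expression, not something taken from the other forum.

```vbscript
Option Explicit
Dim arrTest, strTest, strTestID, oRE, strResult

' Same flat sample data as in the DOM example (no nested divs)
arrTest = Array("<div id=""123"" class=""someClass"">", "some text", "</div>", _
                "<div id=""456"" class=""someClass"">", "some text", "</div>", _
                "<div id=""789"" class=""someClass"">", "some text", "</div>")
strTest = Join(arrTest, vbCrLf)
strTestID = "456"   ' assumed to contain no regex metacharacters

Set oRE = New RegExp
oRE.Global = True       ' replace every match, not just the first
oRE.IgnoreCase = True   ' match <DIV> as well as <div>

' <div\b            opening tag name
' (?![^>]*\bid="456")  negative lookahead: skip tags that carry the ID to keep
' [^>]*>            rest of the opening tag
' [\s\S]*?</div>    shortest run of anything (including line breaks) up to the first </div>
' \s*               trailing whitespace / line break after the block
oRE.Pattern = "<div\b(?![^>]*\bid=""" & strTestID & """)[^>]*>[\s\S]*?</div>\s*"

strResult = oRE.Replace(strTest, "")
WScript.Echo strResult
```

Because the lazy match stops at the first closing tag it finds, a single nested div is enough to make this remove the wrong span, which is exactly the failure mode described above.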
RE: Help With RegEx - 6/21/2006 2:24:18 AM
ebgreen
I've done some further research and to handle nested divs, you would need to use this code (I'm not using code tags because they are annoying right now):

' Sample input: three divs nested inside an outer wrapper div
arrTest = Array("<div>", _
                "<div id=""123"" class=""someClass"">", _
                "some text", _
                "</div>", _
                "<div id=""456"" class=""someClass"">", _
                "some text", _
                "</div>", _
                "<div id=""789"" class=""someClass"">", _
                "some text", _
                "</div>", _
                "</div>")
strTest = Join(arrTest, vbCrLf)
strTestID = "456"

Set oIE = CreateObject("InternetExplorer.Application")
oIE.Navigate "About:Blank"
' Wait until the blank page has loaded before touching Document.Body
Do While oIE.Busy Or oIE.ReadyState <> 4
    WScript.Sleep 50
Loop

oIE.Document.Body.InsertAdjacentHTML "afterbegin", strTest
Set colDivs = oIE.Document.GetElementsByTagName("DIV")

' Skip the div to keep and any div with more than one child node (a container),
' and remove the rest from whatever parent they actually have
For Each oDiv In colDivs
    If oDiv.ID <> strTestID And Not (oDiv.ChildNodes.Length > 1) Then
        oDiv.ParentElement.RemoveChild oDiv
    End If
Next

strResult = oIE.Document.Body.InnerHTML
WScript.Echo strResult
oIE.Quit
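
One thing worth hedging on: GetElementsByTagName returns a live collection, so removing elements while enumerating it may, depending on how the enumerator behaves, skip items. A possible variation, sketched below, copies the collection into a plain array first and only then removes elements. The array name arrDivs and the shortened sample string are my own illustrative choices, and I have not verified this against every IE version.

```vbscript
Option Explicit
Dim oIE, colDivs, arrDivs, oDiv, i, strTest, strTestID

strTestID = "456"
' Shortened nested sample (hypothetical, for illustration only)
strTest = "<div><div id=""123"">some text</div>" & _
          "<div id=""456"">some text</div>" & _
          "<div id=""789"">some text</div></div>"

Set oIE = CreateObject("InternetExplorer.Application")
oIE.Navigate "about:blank"
Do While oIE.Busy Or oIE.ReadyState <> 4
    WScript.Sleep 50
Loop

oIE.Document.Body.InsertAdjacentHTML "afterbegin", strTest
Set colDivs = oIE.Document.GetElementsByTagName("DIV")

If colDivs.Length > 0 Then
    ' Snapshot the live collection so later removals cannot shift the items
    ReDim arrDivs(colDivs.Length - 1)
    For i = 0 To colDivs.Length - 1
        Set arrDivs(i) = colDivs.Item(i)
    Next

    ' Same test as in the post: keep the target ID and any container div
    For Each oDiv In arrDivs
        If oDiv.ID <> strTestID And oDiv.ChildNodes.Length <= 1 Then
            oDiv.ParentElement.RemoveChild oDiv
        End If
    Next
End If

WScript.Echo oIE.Document.Body.InnerHTML
oIE.Quit
```

The snapshot keeps document order, so the outer wrapper is still examined before its children are removed, which the child-count test relies on.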