Help With RegEx

 
 Help With RegEx - 6/19/2006 10:59:24 AM   
  SmiLie

 

Posts: 2
Score: 0
Joined: 6/19/2006
Status: offline
Hi Guys,

New here, need some help with Regular Expressions and asp.
I have an HTML on the back-end that looks like this:

<div id="123" class="someClass">
...
</div>
<div id="456" class="someClass">
...
</div>
<div id="789" class="someClass">
...
</div>

I need to remove all <div></div> blocks (including text inside) from that HTML with a regular expression, but only leave one that has provided ID (let's say 456).
Here's my function that doesn't work:

Private Function RemoveInactiveGroups(ByVal strHTML, ByVal intActiveGroupID)    'as string
   Set oRegEx = New RegExp
   With oRegEx
       .Pattern = "<div id=""[^" & intActiveGroupID & "]""[^<]*?</div>"
       .IgnoreCase = True
       .Global = True
       .MultiLine = True
   End With
   RemoveInactiveGroups = oRegEx.Replace(strHTML, "")
   Set oRegEx = nothing
End Function

Please help!

P.S. The text editor on "Post New Thread" page doesn't work in FireFox, in case you didn't know
 
 
Post #: 1
 
 RE: Help With RegEx - 6/19/2006 11:18:13 PM   
  ehvbs

 

Posts: 2012
Score: 48
Joined: 6/22/2005
From: Germany
Status: offline
Hi SmiLie,

you shouldn't use RegExps, but DOM to inspect/edit HTML. If you experiment with this
code

      
you'll see

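(1)  "multiline" - it doesn't mean what (I think that) you think it means: it only
       changes where ^ and $ match, not what [^<] or . can match
       [^456] - is a character class: it matches any single character other than
       4, 5 or 6; it does not mean "anything but the string 456"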

(2)  it's easy to fall into the gap between logical specs (HTML) (a div with an id) and
       physical text (RegExp) (<div id ..., div     id ...)

(3)  you can't use [^<]*?</div> to get everything including the closing </div> because
       there may be tags (including DIVs) in your DIVs.

(4)  "<div[\s\S]*?</div>" will handle included tags other than DIV correctly, but
       fails for nested DIVs - because RegExps aren't suitable for nested structures
       like HTML

(5)  Using DOM is easy - as long as you have valid HTML. Just one loop with an easy to
       understand if clause to get at the DIV you want to keep; easy to modify if your
       specs change/become clearer:

       For Each oDIV In oDOC.getElementsByTagName( "div" )
          - work on all DIVs, will delete the DIV from Hell if sKeepId = 456
   
       For Each oDIV In oDOC.getElementById( "bdyALL" ).childNodes 
          - work on all DIV  children of body, will keep the DIV from Hell if sKeepId = 222
 

(in reply to SmiLie)
 
 
Post #: 2
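
Points (1) to (4) above can be tried out with a small script. What follows is only a sketch, not the code from the post: it swaps the character class [^456] for a negative lookahead, (?!456"), which rejects the literal ID as a whole, and uses [\s\S]*? for the body. It assumes that id is the first attribute, that values are double-quoted, that the ID contains digits only, and that there are no nested DIVs; with nesting it fails exactly as described in (4).

```vbscript
Option Explicit

' Sketch only: removes every flat <div id="..."> block except the one
' whose id equals intActiveGroupID. Assumes no nested DIVs.
Function RemoveInactiveGroups(ByVal strHTML, ByVal intActiveGroupID)
    Dim oRegEx
    Set oRegEx = New RegExp
    With oRegEx
        ' <div\s+id\s*=\s*"   opening tag up to the quote before the id
        ' (?!456")             negative lookahead: the id must not be 456
        ' [^"]*"[^>]*>         rest of the id and of the opening tag
        ' [\s\S]*?</div>\s*    shortest body up to the first </div>
        .Pattern = "<div\s+id\s*=\s*""(?!" & intActiveGroupID & """)" & _
                   "[^""]*""[^>]*>[\s\S]*?</div>\s*"
        .IgnoreCase = True
        .Global = True
    End With
    RemoveInactiveGroups = oRegEx.Replace(strHTML, "")
    Set oRegEx = Nothing
End Function

' Usage with the HTML from the original question
Dim sSample
sSample = Join(Array( _
    "<div id=""123"" class=""someClass"">", "...", "</div>", _
    "<div id=""456"" class=""someClass"">", "...", "</div>", _
    "<div id=""789"" class=""someClass"">", "...", "</div>"), vbCrLf)
WScript.Echo RemoveInactiveGroups(sSample, 456)
```

Run with cscript, this should echo only the block with id 456. Inside an ASP page the WScript.Echo line would become Response.Write.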
 
 RE: Help With RegEx - 6/20/2006 5:33:34 AM   
  ebgreen


Posts: 4595
Score: 29
Joined: 7/12/2005
Status: offline
Here is my solution for using the DOM:

' Build the sample HTML from the original question
arrTest = Array("<div id=""123"" class=""someClass"">", _
               "some text", _
               "</div>", _
               "<div id=""456"" class=""someClass"">", _
               "some text", _
               "</div>", _
               "<div id=""789"" class=""someClass"">", _
               "some text", _
               "</div>")
strTest = Join(arrTest, VbCrLf)
strTestID = "456"
Set oIE = CreateObject("InternetExplorer.Application")
oIE.Navigate "About:Blank"
' Wait until the blank page has finished loading before touching Document.Body
Do While oIE.Busy Or oIE.ReadyState <> 4
   WScript.Sleep 50
Loop
' Load the HTML fragment into the empty body
oIE.Document.Body.InsertAdjacentHTML "afterbegin", strTest
Set colDivs = oIE.Document.GetElementsByTagName("DIV")
' Remove every DIV except the one with the wanted ID
For Each oDiv In colDivs
   If oDiv.ID <> strTestID Then
       oIE.Document.Body.RemoveChild oDiv
   End If
Next
strResult = oIE.Document.Body.InnerHTML
WScript.Echo strResult
oIE.Quit

_____________________________

"... when you are good and crazy, oooh, oooh, oooh, the sky is the limit!" - The Tick
Good places to start: http://www.visualbasicscript.com/m_24727/tm.htm
http://www.visualbasicscript.com/m_47117/tm.htm

(in reply to ehvbs)
 
 
Post #: 3
 
 RE: Help With RegEx - 6/20/2006 8:08:57 AM   
  ehvbs

 

Posts: 2012
Score: 48
Joined: 6/22/2005
From: Germany
Status: offline
Hi ebgreen,

thanks for your interest/code/ideas. Seeing more than one way to solve
a problem really is the fun part of programming!

(1) Though I never used HTMLFile before, wasn't sure that it is a 'state
    of the art' component, and couldn't find any decent documentation, I
    decided to use it instead of "InternetExplorer.Application", because

      (a) reading some postings from S. Fulton and M. Harris made me
          realize that loading a complete HTML page into IE could
          trigger scripts - something that shouldn't be done by a
          'pimp my pages automagically' script

      (b) I never succeeded in replacing the About:Blank page with a
          complete HTML page - do you know the magic incantation?

(2) There is a second difference between our solutions: I use

        oDIV.parentElement.removeChild oDIV

    while you use

        oIE.Document.Body.RemoveChild oDiv

    This will fail if you get an oDIV (from GetElementsByTagName())
    which isn't a child of Body.

Now I'm waiting for some RegExp lover to show us how to get rid of
those DIVs by RegExp force! (That's only half a joke, because you can't
use DOM reliably, if your input isn't decent HTML; so (pre)processing
with string/RegExp operations may be necessary for getting a job done.)

ehvbs

(in reply to ebgreen)
 
 
Post #: 4
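
The two points discussed above (HTMLFile instead of InternetExplorer.Application, and removing through the node's own parent instead of Body) can be combined in one small sketch. It is an illustration under assumptions, not the code from post #2: the helper name KeepOnlyGroup and its sClass parameter are made up here, the "htmlfile" component is assumed to be installed, and the input is assumed to be valid HTML. Only DIVs that carry the given class are candidates for removal, so a wrapper DIV without that class survives together with its children.

```vbscript
Option Explicit

' Hypothetical helper: removes all DIVs of class sClass except the one
' whose id equals sKeepId, and returns the remaining body HTML.
Function KeepOnlyGroup(ByVal sHTML, ByVal sClass, ByVal sKeepId)
    Dim oDoc, colDivs, oDiv, i
    Set oDoc = CreateObject("htmlfile")
    oDoc.write "<html><body>" & sHTML & "</body></html>"
    oDoc.close

    Set colDivs = oDoc.getElementsByTagName("div")
    ' Walk the collection backwards by index, so that removing an item
    ' cannot disturb the items that are still to be visited.
    For i = colDivs.length - 1 To 0 Step -1
        Set oDiv = colDivs.item(i)
        If oDiv.className = sClass And oDiv.id <> sKeepId Then
            ' Remove via the node's own parent, which need not be Body
            oDiv.parentNode.removeChild oDiv
        End If
    Next

    KeepOnlyGroup = oDoc.body.innerHTML
End Function

' Usage: the three DIVs from the question inside a wrapper DIV
Dim sSample
sSample = Join(Array( _
    "<div>", _
    "<div id=""123"" class=""someClass"">some text</div>", _
    "<div id=""456"" class=""someClass"">some text</div>", _
    "<div id=""789"" class=""someClass"">some text</div>", _
    "</div>"), vbCrLf)
WScript.Echo KeepOnlyGroup(sSample, "someClass", "456")
```

The HTML comes back as the component serializes it, so tag case and attribute quoting may differ from the input. If the DIV to keep is itself nested inside another DIV of the same class, it would still be removed along with its parent.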
 
 RE: Help With RegEx - 6/20/2006 12:28:46 PM   
  SmiLie

 

Posts: 2
Score: 0
Joined: 6/19/2006
Status: offline
Thanks guys,

but the question remains, will this solution (DOM) scale into a speed-hungry web app?
ebgreen, if this is you posting on Tek-Tips, I've asked the same question there as well.

Still unable to use FireFox  

(in reply to ehvbs)
 
 
Post #: 5
 
 RE: Help With RegEx - 6/20/2006 6:12:34 PM   
  ehvbs

 

Posts: 2012
Score: 48
Joined: 6/22/2005
From: Germany
Status: offline
Hi SmiLie,

if this modification has to be done at delivery time consider this approach:

css:

.someClassHide
{ ...
    display: none;
}

.someClassShow
{ ...
    display: block;
}

format for all DIVs

<div id = "nnn" class = "someClassHide" ...

simple string op to change the class name for DIV with selected Id:

sSelectId = "456"
sSearch   = "<div id = """ & sSelectId & """ class = ""someClassHide"""
sReplace  = "<div id = """ & sSelectId & """ class = ""someClassShow"""
sHTML     = ... ? ...
' Replace takes the string to search in as its first argument
sHTML     = Replace( sHTML, sSearch, sReplace )

Not tested, needs improving, but should be as fast as possible.

(in reply to SmiLie)
 
 
Post #: 6
 
 RE: Help With RegEx - 6/21/2006 12:12:05 AM   
  ebgreen


Posts: 4595
Score: 29
Joined: 7/12/2005
Status: offline
Hey ehvbs,

(1) Though I never used HTMLFile before, wasn't sure that it is a 'state
    of the art' component, and couldn't find any decent documentation, I
    decided to use it instead of "InternetExplorer.Application", because

      (a) reading some postings from S. Fulton and M. Harris made me
          realize that loading a complete HTML page into IE could
          trigger scripts - something that shouldn't be done by a
          'pimp my pages automagically' script

You are correct that loading an entire page would cause the OnLoad event to fire so any code in the OnLoad sub would fire. In this case my reading of the situation is that the OP had a string of HTML that he just wanted to clean up. In that case he has complete control over the HTML being loaded and can simply not load any code that he doesn't want to run.

      (b) I never succeeded in replacing the About:Blank page with a
          complete HTML page - do you know the magic incantation?
I'm not positive what you mean, but if you look at my code, I use this:
oIE.Document.Body.InsertAdjacentHTML "afterbegin", strTest
to load the HTML that needs to be cleaned up into the About:Blank page. Is that what you are asking?


(2) There is a second difference between our solutions: I use

        oDIV.parentElement.removeChild oDIV

    while you use

        oIE.Document.Body.RemoveChild oDiv

    This will fail if you get an oDIV (from GetElementsByTagName())
    which isn't a child of Body.
Again, my solution is purely for cleaning up a string of HTML that the user has total control of. In this case, since I explicitly load the HTML into an otherwise blank page, everything (including all of the oDivs) is a child of .Body. If I were doing this same procedure on a complete, already assembled page, then I would certainly use .ParentElement. Your solution is therefore the more general of our two.

Now I'm waiting for some RegExp lover to show us how to get rid of
those DIVs by RegExp force! (That's only half a joke, because you can't
use DOM reliably, if your input isn't decent HTML; so (pre)processing
with string/RegExp operations may be necessary for getting a job done.)
I can do it with regex if you want, but there are a few things that I would need to know. As was pointed out in the other forum where the OP posted this question, unless this is exactly the way the data will always be formatted, nested divs will be the death of any regex solution. (NOTE that nested divs would probably break my IE solution as well, since I used .Body.RemoveChild, but using .ParentElement.RemoveChild should accommodate nested divs.)


(in reply to ehvbs)
 
 
Post #: 7
 
 RE: Help With RegEx - 6/21/2006 2:19:39 AM   
  ehvbs

 

Posts: 2012
Score: 48
Joined: 6/22/2005
From: Germany
Status: offline
Thanks ebgreen! While I wait for some response from SmiLie, I'll think about your comments.

ehvbs

(in reply to ebgreen)
 
 
Post #: 8
 
 RE: Help With RegEx - 6/21/2006 2:24:18 AM   
  ebgreen


Posts: 4595
Score: 29
Joined: 7/12/2005
Status: offline
I've done some further research and to handle nested divs, you would need to use this code (I'm not using code tags because they are annoying right now):

' Same sample as before, now wrapped in an outer DIV
arrTest = Array("<div>", _
               "<div id=""123"" class=""someClass"">", _
               "some text", _
               "</div>", _
               "<div id=""456"" class=""someClass"">", _
               "some text", _
               "</div>", _
               "<div id=""789"" class=""someClass"">", _
               "some text", _
               "</div>", _
               "</div>")
strTest = Join(arrTest, VbCrLf)
strTestID = "456"
Set oIE = CreateObject("InternetExplorer.Application")
oIE.Navigate "About:Blank"
' Wait until the blank page has finished loading before touching Document.Body
Do While oIE.Busy Or oIE.ReadyState <> 4
   WScript.Sleep 50
Loop
oIE.Document.Body.InsertAdjacentHTML "afterbegin", strTest
Set colDivs = oIE.Document.GetElementsByTagName("DIV")
For Each oDiv In colDivs
   ' Skip the wanted DIV and any DIV with more than one child node (here: the wrapper)
   If oDiv.ID <> strTestID And Not oDiv.ChildNodes.Length > 1 Then
       ' Remove via the DIV's own parent, which need not be Body
       oDiv.ParentElement.RemoveChild oDiv
   End If
Next
strResult = oIE.Document.Body.InnerHTML
WScript.Echo strResult
oIE.Quit


(in reply to ehvbs)
 
 
Post #: 9
 
 
 
  
