parse the html

Reynaldo
Halfop
Posts: 54
Joined: Wed May 11, 2005 2:51 am


Post by Reynaldo »

I've parsed the HTML into $html:

    regsub -all "\n" $html "" html
    set nopage [string first "<div id=date>" $html 0]
    set news [string range $html $nopage [expr {[string first "<ul>" $html $nopage] - 1}]]
the output $html:

<div id=tanggal> 
Tuesday              , 
27/09/2005 09:11              EST</div>
Today news is bla bla bla bla.              </a></div>
<div id=summary> 
Bla bla bla bla bla news today.            </div>
<div id=titlebiru>Read also :</div>
<ul>

How can I use regexp to extract the date, the news topic, and the news text into variables, so that the output will be:
Tuesday, 27/09/2005 09:11 EST, Today news is bla bla bla. Bla bla bla bla news today.
De Kus
Revered One
Posts: 1361
Joined: Sun Dec 15, 2002 11:41 am
Location: Germany

Post by De Kus »

I'd suggest running a regexp on the complete HTML, without the string range and regsub changes.
If you are too unfamiliar with regexp and don't want to link to the complete page, I'd recommend trying this one:
http://forum.egghelp.org/viewtopic.php?t=9972
De Kus
StarZ|De_Kus, De_Kus or DeKus on IRC
Copyright © 2005-2009 by De Kus - published under The MIT License
Love hurts, love strengthens...
demond
Revered One
Posts: 3073
Joined: Sat Jun 12, 2004 9:58 am
Location: San Francisco, CA

Post by demond »

suppose you want to extract what is contained between some opening and closing tags:

[demond@whitepine demond]$ tclsh8.4
% set str "<tag attr=foo>some text</tag>"
<tag attr=foo>some text</tag>
% regexp {<tag.*?>(.*?)</tag>} $str -> str
1
% set str
some text
%
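
Applied to the fragment posted at the top of the thread, the same idea might look like the sketch below. It assumes the div ids shown in that sample output (tanggal, summary) and relies on '.' matching newlines, which is Tcl's default. The proc `tidy` and the variable names are hypothetical, made up for illustration only.

```tcl
# Sample fragment, copied from the first post.
set html {<div id=tanggal>
Tuesday              ,
27/09/2005 09:11              EST</div>
Today news is bla bla bla bla.              </a></div>
<div id=summary>
Bla bla bla bla bla news today.            </div>
<div id=titlebiru>Read also :</div>
<ul>}

# Hypothetical helper: trim, collapse whitespace runs, and turn " ," into ",".
proc tidy {text} {
    regsub -all {\s+} [string trim $text] " " text
    regsub -all { ,} $text "," text
    return $text
}

# Every quantifier is non-greedy, so each capture stops at the first closing marker.
set re {<div id=tanggal>(.*?)</div>(.*?)</a></div>.*?<div id=summary>(.*?)</div>}
if {[regexp $re $html -> date topic summary]} {
    set line "[tidy $date], [tidy $topic] [tidy $summary]"
    puts $line
}
```

If the real page uses different ids or nests more markup inside these divs, the pattern would need adjusting.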
connection, sharing, dcc problems? click <here>
before asking for scripting help, read <this>
use [code] tag when posting logs, code

Post by De Kus »

Tip: use '[^>]' or '[^<]' instead of '.' in cases where <tag> </tag> is not unique, because regex tends to match the widest match, not the shortest match.
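
A small sketch of the difference, using a made-up string with two tags (the variable names are only for illustration):

```tcl
set str "<tag>first</tag> and <tag>second</tag>"

# Greedy '.*' runs on to the last closing tag.
regexp {<tag>(.*)</tag>} $str -> greedy

# A negated class cannot cross a '<', so the capture ends at the first closing tag.
regexp {<tag[^>]*>([^<]*)</tag>} $str -> bounded

puts $greedy    ;# first</tag> and <tag>second
puts $bounded   ;# first
```

The negated-class form only works as long as the wanted text contains no nested tags.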

Post by demond »

De Kus wrote: Tip: use '[^>]' or '[^<]' instead of '.' in cases where <tag> </tag> is not unique, because regex tends to match the widest match, not the shortest match.
There's no need for that: '*?' is a non-greedy quantifier, see the docs.
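
A minimal sketch of the non-greedy form on a made-up string with two tags; `-all -inline` is used here only to show that each occurrence is matched separately, and the variable names are illustrative:

```tcl
set str "<tag>first</tag> and <tag>second</tag>"

# '*?' stops at the first closing tag.
regexp {<tag.*?>(.*?)</tag>} $str -> lazy
puts $lazy        ;# first

# -all -inline returns the full match and the capture for every occurrence.
set contents {}
foreach {full inner} [regexp -all -inline {<tag.*?>(.*?)</tag>} $str] {
    lappend contents $inner
}
puts $contents    ;# first second
```

The re_syntax documentation says an expression takes its overall preference from its first quantifier, so it seems safest not to mix greedy and non-greedy quantifiers in one pattern.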

Post by De Kus »

Ah, sorry, I must have missed it. Hmm, I should test whether it is actually faster than the other variant.
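
One way to check would be Tcl's built-in `time` command. This is a rough sketch only: the iteration count is arbitrary, the variable names are made up, and the numbers will vary by machine and Tcl version.

```tcl
set str "<tag attr=foo>some text</tag>"
set lazy    {<tag.*?>(.*?)</tag>}
set bounded {<tag[^>]*>([^<]*)</tag>}

# time runs the script the given number of times and reports microseconds per iteration.
puts "non-greedy:    [time {regexp $lazy $str -> a} 10000]"
puts "negated class: [time {regexp $bounded $str -> b} 10000]"
```

Both patterns capture the same text for this input, so only the timing differs.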