Reynaldo Halfop
Joined: 11 May 2005 Posts: 54
|
Posted: Tue Sep 27, 2005 1:41 am Post subject: parse the html |
|
|
I've read the HTML into $html and trimmed it like this:
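Code: | # strip newlines so the page becomes one long string
regsub -all "\n" $html "" html
set nopage [string first "<div id=date>" $html 0]
# keep everything from the date block up to just before the next <ul>
set news [string range $html $nopage [expr {[string first "<ul>" $html $nopage] - 1}]] |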
The resulting $html:
Code: |
<div id=tanggal>
Tuesday ,
27/09/2005 09:11 EST</div>
Today news is bla bla bla bla. </a></div>
<div id=summary>
Bla bla bla bla bla news today. </div>
<div id=titlebiru>Read also :</div>
<ul>
|
How can I use regexp to extract the date, the news topic, and the news into a variable, so that the output will be:
Tuesday, 27/09/2005 09:11 EST, Today news is bla bla bla. Bla bla bla bla news today. |
|
De Kus Revered One

Joined: 15 Dec 2002 Posts: 1361 Location: Germany
|
Posted: Tue Sep 27, 2005 4:06 am Post subject: |
|
|
I'd suggest running a regexp on the complete HTML, without the string range and regsub changes.
If you are too unfamiliar with regexp and don't want to link to the complete page, I'd recommend trying this one:
http://forum.egghelp.org/viewtopic.php?t=9972
_________________ De Kus
StarZ|De_Kus, De_Kus or DeKus on IRC
Copyright © 2005-2009 by De Kus - published under The MIT License
Love hurts, love strengthens... |
|
demond Revered One

Joined: 12 Jun 2004 Posts: 3073 Location: San Francisco, CA
|
Posted: Wed Sep 28, 2005 12:51 am Post subject: |
|
|
Suppose you want to extract what is contained between an opening and a closing tag:
Code: |
[demond@whitepine demond]$ tclsh8.4
% set str "<tag attr=foo>some text</tag>"
<tag attr=foo>some text</tag>
% regexp {<tag.*?>(.*?)</tag>} $str -> str
1
% set str
some text
%
|
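Applied to the sample from the first post, the same capture-group idea might look like the sketch below. The sample text is copied from that post as-is (the opening tags around the headline are missing there too), the helper name `tidy` is made up for illustration, and the pattern assumes the three blocks always appear in this order. It uses `[^<]*` for every capture so that all quantifiers in the pattern have the same greediness.

```tcl
# Sample input copied from the first post; a real page may differ.
set html {<div id=tanggal>
Tuesday ,
27/09/2005 09:11 EST</div>
Today news is bla bla bla bla. </a></div>
<div id=summary>
Bla bla bla bla bla news today. </div>
<div id=titlebiru>Read also :</div>
<ul>}

# Hypothetical helper: collapse whitespace runs (including newlines),
# drop the space before a comma, and trim both ends.
proc tidy {text} {
    regsub -all {\s+} $text " " text
    regsub -all { ,} $text "," text
    return [string trim $text]
}

set result ""
# Three capture groups: date, headline, summary.
if {[regexp {<div id=tanggal>([^<]*)</div>([^<]*)</a></div>\s*<div id=summary>([^<]*)</div>} $html -> date title summary]} {
    set result "[tidy $date], [tidy $title] [tidy $summary]"
}
puts $result
```

If the page layout changes, regexp returns 0 and `result` stays empty, so the caller can detect the failure instead of using stale variables.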
_________________ connection, sharing, dcc problems? click <here>
before asking for scripting help, read <this>
use [code] tag when posting logs, code |
|
De Kus Revered One

Joined: 15 Dec 2002 Posts: 1361 Location: Germany
|
Posted: Wed Sep 28, 2005 3:40 am Post subject: |
|
|
Tip: use '[^>]' or '[^<]' instead of '.' in cases where <tag> </tag> is not unique, because regex tends to match the widest match, not the shortest match.
|
demond Revered One

Joined: 12 Jun 2004 Posts: 3073 Location: San Francisco, CA
|
Posted: Wed Sep 28, 2005 4:01 am Post subject: |
|
|
De Kus wrote: | Tip: use '[^>]' or '[^<]' instead of '.' in cases where <tag> </tag> is not unique, because regex tends to match the widest match, not the shortest match. |
There's no need for that; '*?' is a non-greedy quantifier, see the docs.
|
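The difference discussed in the last two posts can be seen with a string that contains the same tag twice. This is only a small sketch; the test string and the variable names `greedy`, `lazy` and `negated` are made up for illustration. Each pattern uses a single quantifier, so greedy and non-greedy quantifiers are never mixed in one expression.

```tcl
# The same tag appears twice, so the choice of quantifier matters.
set str "<b>one</b> and <b>two</b>"

# Greedy '.*' runs on to the last </b> in the string.
regexp {<b>(.*)</b>} $str -> greedy

# Non-greedy '.*?' stops at the first </b>.
regexp {<b>(.*?)</b>} $str -> lazy

# A negated class cannot cross a '<', so it also stops at the first </b>.
regexp {<b>([^<]*)</b>} $str -> negated

puts "greedy:  $greedy"
puts "lazy:    $lazy"
puts "negated: $negated"
```

Here the non-greedy and the negated-class variants both capture `one`, while the greedy one captures `one</b> and <b>two`.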
De Kus Revered One

Joined: 15 Dec 2002 Posts: 1361 Location: Germany
|
Posted: Wed Sep 28, 2005 5:02 am Post subject: |
|
|
Ah, sorry, I must have missed that. Hmm, I should test whether it is actually faster than the other variant.
|
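One way to try that comparison is Tcl's built-in `time` command, as in the rough sketch below. The test string and the iteration count of 10000 are arbitrary assumptions, and the numbers will vary with the Tcl version and the input, so this only shows how one might measure, not which variant wins.

```tcl
# Arbitrary test input with two tags, based on the earlier example.
set str "<tag attr=foo>some text</tag> <tag attr=bar>more text</tag>"

# time runs the script the given number of times and reports
# the average number of microseconds per iteration.
set lazyTime    [time {regexp {<tag.*?>(.*?)</tag>} $str -> a} 10000]
set negatedTime [time {regexp {<tag[^>]*>([^<]*)</tag>} $str -> b} 10000]

puts "non-greedy:    $lazyTime"
puts "negated class: $negatedTime"
```

Both patterns capture the same text from this input, so the timing compares equivalent work.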