egghelp.org community
Discussion of eggdrop bots, shell accounts and tcl scripts.
 

parse the html

 
Reynaldo
Posted: Tue Sep 27, 2005 1:41 am    Post subject: parse the html

I've fetched the HTML into $html and trimmed it like this:

Code:
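    # Strip newlines so the page can be searched as one long string
    regsub -all "\n" $html "" html
    # Find the start of the date block, then keep everything up to the next <ul>
    set nopage [string first "<div id=date>" $html 0]
    set news [string range $html $nopage [expr {[string first "<ul>" $html $nopage] - 1}]]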


The output $html:
Code:

<div id=tanggal>
Tuesday              ,
27/09/2005 09:11              EST</div>
Today news is bla bla bla bla.              </a></div>
<div id=summary>
Bla bla bla bla bla news today.            </div>
<div id=titlebiru>Read also :</div>
<ul>



How can I use regexp to pull the date, the news topic, and the news text into variables, so that the output will be:
Tuesday, 27/09/2005 09:11 EST, Today news is bla bla bla. Bla bla bla bla news today.
De Kus
Posted: Tue Sep 27, 2005 4:06 am

I'd suggest running a regexp on the complete HTML, without the string range and regsub changes.
If you are too unfamiliar with regexp and don't want to link to the complete page, I'd recommend you try out this one:
http://forum.egghelp.org/viewtopic.php?t=9972
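
For illustration, here is a minimal sketch of the "one regexp over the whole HTML" idea. It assumes the page really looks like the sample posted above (ids tanggal and summary, topic ending in </a></div>); the names tidy, re and line are made up for this example, and a real page will probably need a different pattern.

```tcl
# Sample input copied from the first post; the real page will differ.
set html {<div id=tanggal>
Tuesday              ,
27/09/2005 09:11              EST</div>
Today news is bla bla bla bla.              </a></div>
<div id=summary>
Bla bla bla bla bla news today.            </div>
<div id=titlebiru>Read also :</div>
<ul>}

# Hypothetical helper: collapse whitespace runs (newlines included),
# trim both ends, and drop the space left in front of a comma.
proc tidy {text} {
    regsub -all {\s+} $text " " text
    return [string map {" ," ","} [string trim $text]]
}

# Only non-greedy quantifiers are used, so each group stops at the
# first closing tag that lets the rest of the pattern match.
set re {<div id=tanggal>(.*?)</div>(.*?)</a></div>.*?<div id=summary>(.*?)</div>}

if {[regexp $re $html -> date topic summary]} {
    set line "[tidy $date], [tidy $topic] [tidy $summary]"
    puts $line
} else {
    puts "pattern did not match"
}
```

No regsub on "\n" is needed beforehand, because '.' matches newlines in Tcl unless the -line or -linestop switch is given.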
_________________
De Kus
StarZ|De_Kus, De_Kus or DeKus on IRC
Copyright © 2005-2009 by De Kus - published under The MIT License
Love hurts, love strengthens...
demond
Posted: Wed Sep 28, 2005 12:51 am

Suppose you want to extract what is contained between an opening and a closing tag:
Code:

[demond@whitepine demond]$ tclsh8.4
% set str "<tag attr=foo>some text</tag>"
<tag attr=foo>some text</tag>
% regexp {<tag.*?>(.*?)</tag>} $str -> str
1
% set str
some text
%

_________________
connection, sharing, dcc problems? click <here>
before asking for scripting help, read <this>
use [code] tag when posting logs, code
De Kus
Posted: Wed Sep 28, 2005 3:40 am

Tip: use '[^>]' or '[^<]' instead of '.' in cases where <tag> </tag> is not unique, because a regex tends to find the widest match, not the shortest one.
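
A small sketch of what that tip looks like in practice. The input string is made up for the example; it just repeats the same tag twice so the difference shows up.

```tcl
# Hypothetical input with two occurrences of the same tag.
set str "<tag attr=foo>first</tag><tag attr=bar>second</tag>"

# Greedy '.*' inside the group runs on to the LAST </tag>.
regexp {<tag[^>]*>(.*)</tag>} $str -> wide

# '[^<]*' cannot cross a '<', so the group ends at the first </tag>.
regexp {<tag[^>]*>([^<]*)</tag>} $str -> narrow

puts "wide:   $wide"
puts "narrow: $narrow"
```

Here wide ends up holding the markup between the two tags as well, while narrow holds only the text of the first one. The trade-off is that '[^<]*' stops matching as soon as the wanted text contains nested markup.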
demond
Posted: Wed Sep 28, 2005 4:01 am

De Kus wrote:
Tip: use '[^>]' or '[^<]' instead of '.' in cases where <tag> </tag> is not unique, because a regex tends to find the widest match, not the shortest one.


There's no need for that: '*?' is a non-greedy quantifier, see the docs.
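
To illustrate, a short sketch with the same tag repeated twice (the input string and the variable names are invented for the example). One caveat worth checking in the re_syntax manual: mixing greedy and non-greedy quantifiers in a single Tcl pattern can give surprising results, so this sketch uses non-greedy ones only.

```tcl
# Hypothetical input with two occurrences of the same tag.
set str "<tag attr=foo>first</tag><tag attr=bar>second</tag>"

# Non-greedy '.*?' stops at the first place the rest of the pattern fits.
regexp {<tag.*?>(.*?)</tag>} $str -> text
puts $text

# With -all -inline every match is returned as a flat list of
# full-match / group pairs, so all tag bodies can be collected.
set all {}
foreach {full inner} [regexp -all -inline {<tag.*?>(.*?)</tag>} $str] {
    lappend all $inner
}
puts $all
```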
De Kus
Posted: Wed Sep 28, 2005 5:02 am

Ah, sorry, I must have missed it. Hmm, I should test whether it is actually faster than the other variant.
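
One rough way to try that is Tcl's built-in time command. This is only a sketch: the iteration count of 10000 is an arbitrary choice, the input is the short string from the tclsh example earlier in the thread, and results on a real page may look quite different.

```tcl
set str "<tag attr=foo>some text</tag>"

# time runs the script the given number of times and reports the
# average cost as "N microseconds per iteration".
set lazy    [time {regexp {<tag.*?>(.*?)</tag>} $str -> a} 10000]
set negated [time {regexp {<tag[^>]*>([^<]*)</tag>} $str -> b} 10000]

puts "non-greedy:    $lazy"
puts "negated class: $negated"
```

Both patterns extract the same text here, so the comparison is only about speed.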
All times are GMT - 4 Hours