egghelp.org community
Discussion of eggdrop bots, shell accounts and tcl scripts.
 
Parse url from web content

 
Elfriede
Halfop


Joined: 07 Aug 2007
Posts: 67

Posted: Fri May 21, 2010 7:35 am    Post subject: Parse url from web content

Hopefully someone can tell me what's wrong with this. I'm trying to parse a URL out of a web page, but all I get is "Data:" many times ^^ I've searched a lot on this forum, but I don't get how to parse it :/ I just want to output the first matching URL.

Code:

bind pub - !geturl geturl:proc
proc geturl:proc {nick host handle channel text} {
   set url [lindex $text 0]
   set token [::http::geturl $url]
   set content [::http::data $token]
   ::http::cleanup $content
   foreach line [split $content \n] {
      if {[regexp -nocase {http(.*?)} $content match url]} {
         sendmsg #test "Data: [join $url]"
      }
   }
}
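Separately from the regex problem, the script passes the page content to ::http::cleanup, which expects the token returned by ::http::geturl, so the token is never actually released. A minimal sketch of a fetch that checks the result and always cleans up the token might look like this; fetchPage is a hypothetical helper name (not part of the http package), and the sketch assumes Tcl 8.6 for try/finally.

```tcl
package require http

# fetchPage is a hypothetical helper: fetch $url and return the body,
# raising an error on failure. The http token is always cleaned up.
proc fetchPage {url {timeoutMs 10000}} {
    set token [::http::geturl $url -timeout $timeoutMs]
    try {
        # "ok" means the transfer completed; values such as "timeout"
        # or "eof" mean it did not.
        if {[::http::status $token] ne "ok"} {
            error "fetch failed: [::http::status $token]"
        }
        # Only accept a plain 200 response
        if {[::http::ncode $token] != 200} {
            error "HTTP status [::http::ncode $token]"
        }
        return [::http::data $token]
    } finally {
        # cleanup takes the token, not the page content
        ::http::cleanup $token
    }
}
```

Inside an eggdrop handler, a call would typically be wrapped in catch so that a dead link does not raise an error in the bot.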
nml375
Revered One


Joined: 04 Aug 2006
Posts: 2857

Posted: Fri May 21, 2010 2:26 pm

Try using the greedy quantifier * instead of the non-greedy *?.
Also, the captured $url is most likely not a list, so don't use join on it. Similarly, $text is a string, not a list, so use split before using lindex on it.
Next, match against $line, not $content, in your regular expression; otherwise the foreach loop would be pretty pointless...
Code:
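package require http

bind pub - !geturl geturl:proc
proc geturl:proc {nick host handle channel text} {
   # $text is a plain string: split it into a list before using lindex
   set url [lindex [split $text] 0]
   set token [::http::geturl $url]
   set content [::http::data $token]
   # cleanup takes the token, not the page content
   ::http::cleanup $token
   foreach line [split $content \n] {
      # match against the current $line, not the whole $content
      if {[regexp -nocase {http(.*)} $line match url]} {
         # sendmsg is not a stock eggdrop command (eggdrop provides putserv/puthelp);
         # it is assumed to be a helper defined elsewhere in your scripts
         sendmsg #test "Data: $url"
      }
   }
}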
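The effect of the quantifier change, and of split versus lindex, is easy to check in a plain tclsh. This is a minimal sketch with a made-up sample line. Tcl's regex engine takes the greediness of the whole pattern from its first quantifier, so with a lone trailing *? the shortest match wins and the capture is empty. That is why the original loop printed nothing but "Data:" (and printed it once per line, since it matched $content every time).

```tcl
# Hypothetical sample line, as it might appear in a fetched page
set line {<a href="http://imdb.de/title/tt0000001/">Title</a>}

# Non-greedy *? at the end of the pattern: the capture group is empty
regexp -nocase {http(.*?)} $line match url
puts "non-greedy: match=<$match> url=<$url>"

# Greedy *: the capture runs to the end of the line, and "http" itself
# sits outside the parentheses, so it is not part of $url
regexp -nocase {http(.*)} $line match url
puts "greedy:     match=<$match> url=<$url>"

# lindex on a raw string fails when the text is not a well-formed list;
# split turns any string into a list of words first
set text "{http://example.com/ extra words"
puts "lindex without split raised an error: [catch {lindex $text 0}]"
puts "first word after split: [lindex [split $text] 0]"
```

The greedy result also shows the two problems reported in the next post: the leading http is missing from $url, and everything after the URL up to the end of the line is included.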

_________________
NML_375, idling at #eggdrop@IrcNET
Elfriede
Halfop


Joined: 07 Aug 2007
Posts: 67

Posted: Fri May 21, 2010 3:45 pm

Many thanks for your answer, but the output currently looks like:

Data: ://imdb.de/title/... Û

The http part is cut off, and there's a space after the URL where the output should end. Can you please fix that as well? :)

PS: How do I stop, e.g., on the first match? ^^
nml375
Revered One


Joined: 04 Aug 2006
Posts: 2857

Posted: Fri May 21, 2010 5:04 pm

If you want the whole match, including the leading http, use $match instead of $url in your sendmsg command.

To stop further processing within the foreach loop, use the break command right after the sendmsg command.
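Using $match and break together can be sketched as a small helper. This is a minimal sketch, not the code from the thread: the name firstUrl and the tighter pattern https?://[^\s"'<>]+ are assumptions. Instead of .* the pattern stops at whitespace, quotes and angle brackets, so trailing markup and the space after the URL are left out, and the http prefix is kept because the whole match is used.

```tcl
# firstUrl is a hypothetical helper: it returns the first http(s) URL
# found in $content, or an empty string if there is none.
proc firstUrl {content} {
    set found ""
    foreach line [split $content \n] {
        # The URL ends at whitespace, a quote or an angle bracket
        if {[regexp -nocase -- {https?://[^\s"'<>]+} $line match]} {
            set found $match
            # Stop the loop at the first matching line
            break
        }
    }
    return $found
}

set page "<p>no link here</p>\n<a href=\"http://imdb.de/title/tt0000001/\">One</a> <a href=\"http://example.com/two\">Two</a>\n"
puts [firstUrl $page]
```

In the eggdrop handler, the result could then be sent with something like putserv "PRIVMSG #test :Data: [firstUrl $content]".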
Elfriede
Halfop


Joined: 07 Aug 2007
Posts: 67

Posted: Sat May 22, 2010 3:21 am

Many thanks!!! Now it's working the way I wanted :)
All times are GMT - 4 Hours