egghelp.org community
Discussion of eggdrop bots, shell accounts and tcl scripts.
 

Scraping a little bit of text

 
paulOr
Voice


Joined: 01 Nov 2008
Posts: 10

Posted: Mon Feb 23, 2009 4:29 pm    Post subject: Scraping a little bit of text

Code:
package require http
setudef flag serverinfo
variable serverquery "http://www.imghostr.net/"
variable servertimeout 10

bind pub - "!imghostr" checkserver


proc checkserver {nick host hand chan rest} {
       # chanset catch, use .chanset #yourchan +serverinfo to enable
       if {[lsearch -exact [channel info $chan] +serverinfo] == -1} { return 0 }
       # browser agent
       set http [::http::config -useragent "Mozilla"]

       # get url with error control
       catch {set http [::http::geturl "$::serverquery" -timeout [expr 1000 * $::servertimeout]]} error

       # case 1, no socket
       if {[string match -nocase "*couldn't open socket*" $error]} {
              putserv "privmsg $chan : Cannot open socket. Try again later."
              ::http::cleanup $http
              return 0
       }

       # case 2, timeout
       if { [::http::status $http] == "timeout" } {
              putserv "privmsg $chan : Website has timed out. Try again later."
              ::http::cleanup $http
              return 0
       }

       # case 3, success, get html
       set html [::http::data $http]

       # scrape the page
       if {![regexp -- {<li><label>Currently Hosting:</label>.*?</li>} $html - s_login]} {set s_login Unknown}

       # reformat scraped information and message to irc.
       puthelp "privmsg $chan :images : $s_login"
       return 1
}


So I did some searching and found what I think should do the job. I've added in the HTML surrounding what I want to show.

http://imghostr.net <-- I want the current image count: Currently Hosting ### Images.

Can anyone see where I'm going wrong?
Papillon
Owner


Joined: 15 Feb 2002
Posts: 724
Location: *.no

Posted: Mon Feb 23, 2009 6:20 pm

Try:
Code:
if {![regexp -- {<li><label>Currently Hosting:</label>(.+?)</li>} $html - s_login]} {set s_login Unknown}
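
A minimal sketch of why the capture group matters, runnable in plain tclsh. The HTML fragment in `html` is made up for illustration (modelled on the pattern above, not copied from the real page). Without parentheses in the pattern, regexp has no submatch to store, so s_login is set to an empty string even though the match succeeds. The sketch uses a non-greedy `(.+?)` so the capture stops at the first `</li>`.

```tcl
# Hypothetical fragment; the real page markup may differ.
set html {<ul><li><label>Currently Hosting:</label> 1234 Images</li><li>Other</li></ul>}

# No capture group: the match succeeds, but the submatch variable is set to "".
set foundPlain [regexp -- {<li><label>Currently Hosting:</label>.*?</li>} $html - plain]
puts "no group:   found=$foundPlain s_login='$plain'"

# With a capture group, the variable receives the text between the tags.
set foundGroup [regexp -- {<li><label>Currently Hosting:</label>(.+?)</li>} $html - s_login]
puts "with group: found=$foundGroup s_login='[string trim $s_login]'"
```

The `-` after `$html` is just a throwaway variable name for the full match; only the second variable receives the captured part.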

_________________
Elen sila lúmenn' omentielvo
arfer
Master


Joined: 26 Nov 2004
Posts: 436
Location: Manchester, UK

Posted: Mon Feb 23, 2009 7:38 pm

Code:

package require http

setudef flag images

set vTimeout 10
set vUrl http://imghostr.net/

bind PUB - !images pImages

proc pImages {nick uhost hand channel txt} {
    global vTimeout vUrl
    if {[channel get $channel images]} {
        set agent [::http::config -useragent "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"]
        if {![catch {set http [::http::geturl $vUrl -timeout [expr {$vTimeout * 1000}]]}]} {
            switch -- [::http::status $http] {
                "timeout" {putserv "PRIVMSG $channel :attempt to scrape $vUrl timed out after $vTimeout seconds"}
                "error" {putserv "PRIVMSG $channel :attempt to scrape $vUrl returned error [::http::error $http]"}
                "ok" {
                    switch -- [::http::ncode $http] {
                        200 {
                            regexp -- {Currently Hosting:\</label\>(.+?)Images} [::http::data $http] -> images
                            if {([info exists images]) && ([regexp -- {[0-9]+} $images])} {
                                putserv "PRIVMSG $channel :$vUrl is currently hosting [string trim $images] images"
                            } else {putserv "PRIVMSG $channel :the number of images hosted by $vUrl could not be found"}
                        }
                        default {putserv "PRIVMSG $channel :attempt to scrape $vUrl returned ncode [::http::ncode $http]"}
                    }
                }
            }
            ::http::cleanup $http
        } else {putserv "PRIVMSG $channel :attempted connection to $vUrl failed"}
    }
    return 0
}
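
One way to check the scraping part without hitting the site is to pull the pattern match into its own proc and feed it test strings. A rough sketch, assuming the page contains something like `Currently Hosting:</label> 1234 Images`. Both that markup and the proc name `extractImageCount` are made up for illustration and are not part of the script above.

```tcl
# Hypothetical helper: returns the image count as a string of digits,
# or "" if the expected text is not found in the HTML.
proc extractImageCount {html} {
    # Capture only the digits between the label and the word "Images".
    if {[regexp -- {Currently Hosting:</label>\s*([0-9]+)\s*Images} $html -> count]} {
        return $count
    }
    return ""
}

# Made-up fragments for illustration; the real page markup is an assumption.
set ok  {<li><label>Currently Hosting:</label> 1234 Images</li>}
set bad {<li><label>Currently Hosting:</label> n/a</li>}

puts "ok:  '[extractImageCount $ok]'"
puts "bad: '[extractImageCount $bad]'"
```

Capturing `[0-9]+` directly means the caller only has to test for an empty string, rather than running a second regexp on the captured text.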

_________________
I must have had nothing to do
All times are GMT - 4 Hours