paulOr Voice
Joined: 01 Nov 2008 Posts: 10
|
Posted: Mon Feb 23, 2009 4:29 pm Post subject: Scraping a little bit of text |
|
|
| Code: |
package require http
setudef flag serverinfo
variable serverquery "http://www.imghostr.net/"
variable servertimeout 10
bind pub - "!imghostr" checkserver
proc checkserver {nick host hand chan rest} {
    # chanset catch, use .chanset #yourchan +serverinfo to enable
    if {[lsearch -exact [channel info $chan] +serverinfo] == -1} { return 0 }
    # browser agent
    set http [::http::config -useragent "Mozilla"]
    # get url with error control
    catch {set http [::http::geturl "$::serverquery" -timeout [expr 1000 * $::servertimeout]]} error
    # case 1, no socket
    if {[string match -nocase "*couldn't open socket*" $error]} {
        putserv "privmsg $chan : Cannot open socket. Try again later."
        ::http::cleanup $http
        return 0
    }
    # case 2, timeout
    if { [::http::status $http] == "timeout" } {
        putserv "privmsg $chan : Website has timed out. Try again later."
        ::http::cleanup $http
        return 0
    }
    # case 3, success, get html
    set html [::http::data $http]
    # scrape the page
    if {![regexp -- {<li><label>Currently Hosting:</label>.*?</li>} $html - s_login]} {set s_login Unknown}
    # reformat scraped information and message to irc.
    puthelp "privmsg $chan :images : $s_login"
    return 1
}
|
So I did some searching and found what I think should do the job. I've added the HTML surrounding the text I want to show.
http://imghostr.net <-- I want the current image count: Currently Hosting ### Images.
Can anyone see where I'm going wrong? |
|
Papillon Owner

Joined: 15 Feb 2002 Posts: 724 Location: *.no
|
Posted: Mon Feb 23, 2009 6:20 pm Post subject: |
|
|
Try:
| Code: |
# (.+?) captures the text into s_login; non-greedy so it stops at the first </li>
if {![regexp -- {<li><label>Currently Hosting:</label>(.+?)</li>} $html - s_login]} {set s_login Unknown}
|
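To see why the pattern without parentheses leaves s_login empty, here is a minimal sketch that runs in plain tclsh, outside eggdrop. The HTML fragment is an assumption (the real page markup may differ), and the variable names `withoutGroup` and `withGroup` are only for illustration. The second pattern uses a non-greedy `(.+?)` so the capture stops at the first `</li>`.

```tcl
# Hypothetical fragment; the real imghostr.net markup is assumed, not verified.
set html {<ul><li><label>Currently Hosting:</label> 1234 Images</li><li><label>Other:</label> x</li></ul>}

# No parentheses: there is no sub-match, so regexp sets s_login to the empty string.
regexp -- {<li><label>Currently Hosting:</label>.*?</li>} $html - s_login
set withoutGroup $s_login

# With a capturing group, s_login receives the text between </label> and </li>.
regexp -- {<li><label>Currently Hosting:</label>(.+?)</li>} $html - s_login
set withGroup [string trim $s_login]

puts "without group: '$withoutGroup'"
puts "with group:    '$withGroup'"
```

In both calls the first variable (`-`) receives the whole match; only the parenthesised part ends up in s_login.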
_________________ Elen sila lúmenn' omentielvo |
|
arfer Master

Joined: 26 Nov 2004 Posts: 436 Location: Manchester, UK
|
Posted: Mon Feb 23, 2009 7:38 pm Post subject: |
|
|
| Code: |
package require http
setudef flag images
set vTimeout 10
set vUrl http://imghostr.net/
bind PUB - !images pImages
proc pImages {nick uhost hand channel txt} {
    global vTimeout vUrl
    if {[channel get $channel images]} {
        set agent [::http::config -useragent "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"]
        if {![catch {set http [::http::geturl $vUrl -timeout [expr {$vTimeout * 1000}]]}]} {
            switch -- [::http::status $http] {
                "timeout" {putserv "PRIVMSG $channel :attempt to scrape $vUrl timed out after $vTimeout seconds"}
                "error" {putserv "PRIVMSG $channel :attempt to scrape $vUrl returned error [::http::error $http]"}
                "ok" {
                    switch -- [::http::ncode $http] {
                        200 {
                            regexp -- {Currently Hosting:\</label\>(.+?)Images} [::http::data $http] -> images
                            if {([info exists images]) && ([regexp -- {[0-9]+} $images])} {
                                putserv "PRIVMSG $channel :$vUrl is currently hosting [string trim $images] images"
                            } else {putserv "PRIVMSG $channel :the number of images hosted by $vUrl could not be found"}
                        }
                        default {putserv "PRIVMSG $channel :attempt to scrape $vUrl returned ncode [::http::ncode $http]"}
                    }
                }
            }
            ::http::cleanup $http
        } else {putserv "PRIVMSG $channel :attempted connection to $vUrl failed"}
    }
    return 0
}
|
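The parsing step in the script above can be tried without a bot or a network connection. This is a minimal sketch, assuming a made-up HTML fragment and a hypothetical helper name `extractImageCount` (neither comes from the original script). It reuses the same pattern and digit check.

```tcl
# Hypothetical helper: returns the image count as a string of digits,
# or an empty string when the counter cannot be found.
proc extractImageCount {html} {
    # Same pattern as in pImages; (.+?) is non-greedy, so it stops at the first "Images".
    if {![regexp -- {Currently Hosting:\</label\>(.+?)Images} $html -> images]} {
        return ""
    }
    # Keep only the first run of digits from the captured text.
    if {![regexp -- {[0-9]+} $images count]} {
        return ""
    }
    return $count
}

# Assumed sample markup, not copied from the real site.
set sample {<li><label>Currently Hosting:</label> 5821 Images</li>}
set found [extractImageCount $sample]
set missing [extractImageCount {<p>no counter here</p>}]

puts "found:   '$found'"
puts "missing: '$missing'"
```

Keeping the extraction in its own proc makes it easy to adjust the pattern if the site changes its markup, without touching the http handling.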
_________________ I must have had nothing to do |
|