danzigrules Voice
Joined: 02 Aug 2007 Posts: 17
Posted: Sun Aug 12, 2007 7:06 am
It is working great. Haven't had one problem with it at all.
Thanks again, rosc
danzigrules

rosc2112 Revered One
Joined: 19 Feb 2006 Posts: 1454 Location: Northeast Pennsylvania
Posted: Sun Aug 12, 2007 11:37 am
Cool. I was worried it would choke on Tcl special chars in the URLs. I'm still a little concerned the script might be breakable, i.e. made to throw an error, if someone makes a really screwed-up webpage title and then gets the script to grab it.
I tried it on my own test page with some Tcl special chars and it didn't give an error, but it did spit out {} chars, which is how Tcl protects/splits chars.
But I guess there's sufficient protection between the delay time config and the user flag permissions, so you don't get clobbered by idiots spamming URLs in a chan.

danzigrules Voice
Joined: 02 Aug 2007 Posts: 17
Posted: Sun Aug 12, 2007 8:30 pm
ok, may have found the first problem, well not really a problem, but is this the type of special characters you are talking about?
[danzigrules]http://www.break.com/rushhour3/man-vs-kids-karate-war.html
[GodBot] Man Vs Kids Karate War Video
Not a big deal to me, but thought I would post it here.

rosc2112 Revered One
Joined: 19 Feb 2006 Posts: 1454 Location: Northeast Pennsylvania
Posted: Sun Aug 12, 2007 11:44 pm
Those are just HTML codes, which you can add to the [string map] line near the bottom. You'll prolly end up adding a hundred or more eventually (my dictionary script has about as many). Just put
"&nbsp;" " "
at the end of the map on the line with "return [string map {", and so forth and so on for other codes. That will replace &nbsp; with a space. Read the manpage for 'string' to learn the proper syntax if you run into probs adding to the string map.
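A minimal sketch of what such a map might look like with a few more entities added. The proc name clean_title and the sample title are made up for illustration, and the real script's line carries its own pairs, so treat this as an example under those assumptions and not as the script's actual code:

```tcl
# Hypothetical helper (not from the script): replaces a few HTML
# entities in a title. [string map] takes a flat list of
# "search replace" pairs and makes a single pass over the string.
proc clean_title {title} {
    return [string map {
        "&nbsp;" " "
        "&amp;"  "&"
        "&quot;" "\""
        "&lt;"   "<"
        "&gt;"   ">"
    } $title]
}

# Sample input is invented for the example.
puts [clean_title "Some&nbsp;Title &amp; More"]
# prints: Some Title & More
```

Because the map is applied in one pass, replaced text is not scanned again, so the order of the pairs matters less than with chained replacements.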

flashy Voice
Joined: 01 May 2006 Posts: 24
Posted: Sun Aug 19, 2007 12:03 am
Can you make it ignore images posted on a channel, i.e. .jpg's, please?
..jpg - No title found.

rosc2112 Revered One
Joined: 19 Feb 2006 Posts: 1454 Location: Northeast Pennsylvania
Posted: Sun Aug 19, 2007 3:32 am
Change the pubm bind's mask to be more specific, e.g.:
bind pubm $urltitle(pubmflags) {*://*.htm?} pubm:urltitle
should work.
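To get a feel for what that mask lets through, here is a rough check using Tcl's own string match. It is only an approximation: eggdrop uses its own wildcard matcher for binds and matches pubm masks against the channel name plus the text. The proc name mask_matches and the sample lines are made up for the example:

```tcl
# Hypothetical helper: approximates the bind mask with Tcl glob matching.
# "*" matches any run of characters, "?" matches exactly one character.
proc mask_matches {text} {
    return [string match -nocase {*://*.htm?} $text]
}

# Invented sample lines in "channel text" form.
foreach line {
    "#chan http://example.com/page.html"
    "#chan http://example.com/pic.jpg"
    "#chan http://example.com/page.htm"
    "#chan look at http://example.com/page.html please"
} {
    puts "[mask_matches $line] $line"
}
```

Under Tcl's string match, only the first line matches. The .jpg link is skipped as intended, but so are a bare .htm link (the "?" needs one more character) and a link followed by more text (the mask has no trailing "*"), so the mask may need loosening depending on what should trigger the bot.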

flashy Voice
Joined: 01 May 2006 Posts: 24
Posted: Sun Aug 19, 2007 5:26 am
Thank you, will try.

cruxing Voice
Joined: 05 Sep 2007 Posts: 9
Posted: Sat Jan 12, 2008 4:50 pm
I actually handled this by simply changing the "No title found" error message output to "". That puts it under the character limit, so nothing gets reported because the string is too short. It seemed like the easiest/best way, even if a bit hacky, since it still lets htm/html/xml/xhtml/php/asp etc. all report back without getting real complex with more regex or whatever.
I do have another question though: if a webpage is lagged, there's a long delay, which is natural, but the title ends up getting pasted 2-3 times. Usually 3, simply because the lag rarely lands in that narrow in-between window that gives 2.
I'm very much a novice, and I can't for the life of me figure out why this is occurring. As a temporary workaround I've shortened the timeout to about 2000, which makes it error out on slow sites instead of reporting the title 3 times a few seconds later, but I'd really appreciate it if someone could take a look and see if they catch something simple.
Thanks!

rosc2112 Revered One
Joined: 19 Feb 2006 Posts: 1454 Location: Northeast Pennsylvania
Posted: Sat Jan 12, 2008 7:13 pm
Are people triggering it multiple times in their impatience?

cruxing Voice
Joined: 05 Sep 2007 Posts: 9
Posted: Sun Jan 13, 2008 5:03 am
Ah, no, definitely not. It happens even when I'm the only person triggering it. I added some echoes to watch the process and figure out where it was looping, and as far as I can tell the entire cycle is getting rerun.
It only happens on slow pages: the slower the site, the more times it echoes. So of course if I try to test it now nothing seems wrong, since it's the middle of the night...
Essentially, the proc pubm:urltitle runs, followed right away by the proc urltitle, then there's a brief pause (perhaps 0.5-1 second), then the catch and cleanup, after which it rolls right back into pubm:urltitle instantly and the cycle repeats.
The cycle seems to end once the very first http get succeeds. I've had it report 2 and 3 times, depending on the lag of the site. I've yet to pull off 4, but I'm not sure I've found a site slow enough for that yet.
I'm not that coding-inclined, unfortunately, but a friend and I have dug through it and we can't for the life of us figure out what would cause it to repeat like that.

rosc2112 Revered One
Joined: 19 Feb 2006 Posts: 1454 Location: Northeast Pennsylvania
Posted: Sun Jan 13, 2008 11:37 am
Give me a URL to duplicate the problem.

cruxing Voice
Joined: 05 Sep 2007 Posts: 9
Posted: Sun Jan 13, 2008 2:06 pm
http://www.webware.com/8301-1_109-9848317-2.html was the first one I noticed it with, and I was using it last night to test, although it seems faster now. It isn't duplicating as of this moment.
4chan links during the day usually trigger it as well, as would anything recently slashdotted/wanged/farked, likely.
[10:30] <@a> http://www.webware.com/8301-1_109-9848317-2.html
[10:30] <@bot> URL: Bloggers behaving badly: Gizmodo messes with CES flat screens | Webware : Cool Web apps for everyone
[10:30] <@bot> URL: Bloggers behaving badly: Gizmodo messes with CES flat screens | Webware : Cool Web apps for everyone
[10:40] <@c> http://www.webware.com/8301-1_109-9848317-2.html
[10:40] <@bot> URL: Bloggers behaving badly: Gizmodo messes with CES flat screens | Webware : Cool Web apps for everyone
[10:40] <@a> weird
[13:58] <@c> hm hm hm
[13:59] <@c> http://www.dieselsweeties.com/archive.php?s=1924
[13:59] <@bot> URL: diesel sweeties: pixelated robot romance web comic
[13:59] <@c> called only once that time?
[13:59] <@a> yup
[13:59] <@bot> http://www.webware.com/8301-1_109-9848317-2.html
[13:59] <@c> so what is the order of the logs
[13:59] <@bot> URL: Bloggers behaving badly: Gizmodo messes with CES flat screens | Webware : Cool Web apps for everyone
[13:59] <@a> heh this one called 3 times
[13:59] <@bot> URL: Bloggers behaving badly: Gizmodo messes with CES flat screens | Webware : Cool Web apps for everyone
[14:00] <@bot> URL: Bloggers behaving badly: Gizmodo messes with CES flat screens | Webware : Cool Web apps for everyone
Any link that has done this has been notably slow for all of us when loading it, so I'm certain it's not limited to the shell the bot resides on.

rosc2112 Revered One
Joined: 19 Feb 2006 Posts: 1454 Location: Northeast Pennsylvania
Posted: Sun Jan 13, 2008 3:22 pm
I am not able to reproduce the problem with any of those URLs.
Have you modified the script?

cruxing Voice
Joined: 05 Sep 2007 Posts: 9
Posted: Sun Jan 13, 2008 3:49 pm
The only modifications I've made are to the output: the shortened "URL:" prefix, and squelching the error messages by making them "". I can't imagine those would affect it, but hey, who knows. I certainly don't!
As I mentioned, none of those links are currently slow enough to duplicate it. They're all returning only 1 hit for me as well.
It's difficult to test until a website is sufficiently slow, which makes it a little trickier, heh.
Modifications made:
| Code: |
# Output line, with the shortened "URL:" prefix
puthelp "PRIVMSG $chan :URL: $urtitle"
if {[string match -nocase "*couldn't open socket*" $error]} {
    return "wtf, srsly."
}
# Timeout message squelched to an empty string
if { [::http::status $http] == "timeout" } {
    return ""
}
and...
if {[regexp -nocase {<title>(.*?)</title>} $data match title]} {
    return [string map { {href=} "" \" "" } $title]
} else {
    # "No title found" message squelched to an empty string
    return ""
}