egghelp.org community
Discussion of eggdrop bots, shell accounts and tcl scripts.
 

URL Title grabber
rosc2112
Revered One


Joined: 19 Feb 2006
Posts: 1454
Location: Northeast Pennsylvania

Posted: Sat Aug 11, 2007 1:49 pm    Post subject: URL Title grabber

Just a little script to grab url titles when a url is posted in channel:

http://members.dandy.net/~fbn/urltitle.tcl.txt

Uploaded to archive as well.
danzigrules
Voice


Joined: 02 Aug 2007
Posts: 17

Posted: Sun Aug 12, 2007 7:06 am

It is working great. Haven't had one problem with it at all.


Thanks again, rosc


danzigrules
rosc2112
Revered One


Joined: 19 Feb 2006
Posts: 1454
Location: Northeast Pennsylvania

Posted: Sun Aug 12, 2007 11:37 am

Cool, I was worried it would choke on Tcl special chars in the URLs. I'm still a little concerned the script might be breakable: someone could make a really screwed-up webpage title, get the script to grab it, and trigger an error.

I tried it on my own test page with some Tcl special chars and it didn't give an error, but it did spit out {} chars, which is how Tcl protects/splits chars.

But I guess there's sufficient protection between the delay time config and the user flag permissions, so you don't get clobbered by idiots spamming URLs in a chan.
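
For anyone wondering where the {} chars come from: that is general Tcl list quoting, not something specific to this script. Here is a minimal sketch, assuming a made-up title and made-up variable names (none of this is taken from urltitle.tcl):

```tcl
# Hypothetical page title containing Tcl special characters.
set title {Hello [World]}

# Printing a list directly shows Tcl's list quoting: the second
# element gets wrapped in braces because it has a space and brackets.
set asList [list "URL:" $title]
puts $asList

# Joining the list back into a plain string drops the protective braces.
set asText [join $asList " "]
puts $asText
```

The first puts shows the braced form, the second shows the plain text. So if braces turn up in channel output, it is worth checking whether a list is being printed where a string was intended.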
danzigrules
Voice


Joined: 02 Aug 2007
Posts: 17

Posted: Sun Aug 12, 2007 8:30 pm

ok, may have found the first problem, well not really a problem, but is this the type of special characters you are talking about?

[danzigrules]http://www.break.com/rushhour3/man-vs-kids-karate-war.html
[GodBot] Man Vs Kids Karate War Video

Not a big deal to me, but thought I would post it here.
rosc2112
Revered One


Joined: 19 Feb 2006
Posts: 1454
Location: Northeast Pennsylvania

Posted: Sun Aug 12, 2007 11:44 pm

Those are just HTML codes, which you can add to the [string map] line near the bottom. You'll prolly end up adding a hundred or more eventually (my dictionary script has about as many). Just put

"&nbsp;" " "

inside the braces at the end of the line with "return [string map {", and so on for the other codes; that pair will replace &nbsp; with a space. Read the manpage for 'string' to know the proper syntax if you run into probs adding to the string map.
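
A minimal sketch of what a longer map could look like, assuming a hypothetical helper name (decode_entities) and an illustrative handful of entities; the script's real map and proc names may differ:

```tcl
# Hypothetical helper, not part of urltitle.tcl: replaces a few common
# HTML entities with their plain-text characters in a single pass.
proc decode_entities {text} {
    return [string map {
        "&nbsp;" " "
        "&amp;"  "&"
        "&quot;" "\""
        "&lt;"   "<"
        "&gt;"   ">"
    } $text]
}

# Made-up sample title for illustration.
puts [decode_entities "Man&nbsp;Vs&nbsp;Kids &amp; Karate"]
```

Each pair is "what to find" followed by "what to put instead", so new codes are added by appending another pair before the closing brace.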
flashy
Voice


Joined: 01 May 2006
Posts: 24

Posted: Sun Aug 19, 2007 12:03 am

can u make it ignore images posted on a channel ie .jpg's please.
..jpg - No title found.
rosc2112
Revered One


Joined: 19 Feb 2006
Posts: 1454
Location: Northeast Pennsylvania

Posted: Sun Aug 19, 2007 3:32 am

Change the pubm bind's mask to be more specific, eg:

bind pubm $urltitle(pubmflags) {*://*.htm?} pubm:urltitle

should work.
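
To get a feel for what a mask like that accepts, here is a rough sketch using Tcl's own [string match]. This is only an approximation: Eggdrop matches bind masks with its own wildcard code, so treat the results as a guide to how * and ? behave rather than a guarantee. The proc name and sample lines are made up for illustration.

```tcl
# Approximation only: Tcl glob matching standing in for Eggdrop's matcher.
set mask {*://*.htm?}

# Hypothetical helper: returns 1 if the text matches the mask, else 0.
proc mask_matches {mask text} {
    return [string match -nocase $mask $text]
}

# Print each sample line next to its match result.
foreach line {
    "http://example.com/page.html"
    "http://example.com/photo.jpg"
    "http://example.com/page.htm"
    "see http://example.com/page.html please"
} {
    puts [format "%-45s %d" $line [mask_matches $mask $line]]
}
```

Under [string match], only the first sample matches: the ? needs exactly one character after ".htm", and nothing may follow the URL. If that is too strict, a looser mask or a check inside the proc may be a better fit.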
flashy
Voice


Joined: 01 May 2006
Posts: 24

Posted: Sun Aug 19, 2007 5:26 am

thank you will try.
cruxing
Voice


Joined: 05 Sep 2007
Posts: 9

Posted: Sat Jan 12, 2008 4:50 pm

I actually edited this by simply changing the error message output for "No title found" to "". This puts it under the character limit, so it simply doesn't get reported since the string is too short. Seemed like the easiest/best way, even if a bit hacky, since it allows for htm/html/xml/xhtml/php/asp etc. all to report back without having to get real complex with more regex or whatever.

I do have another question though -- if a webpage is lagged, there seems to be a long delay, which is natural, but the title ends up getting pasted 2-3 times. Usually 3, simply because the lag is rarely in that special in-between point for 2.

I'm very novice, and I can't for the life of me figure out why this is occurring. As a temporary workaround I've just shortened the timeout to about 2000, which causes it to just error out on the slow sites instead of reporting it 3 times a few seconds later, but I'd really appreciate it if someone could take a look and see if they catch something simple.

Thanks!
rosc2112
Revered One


Joined: 19 Feb 2006
Posts: 1454
Location: Northeast Pennsylvania

Posted: Sat Jan 12, 2008 7:13 pm

Are people triggering it multiple times in their impatience?
cruxing
Voice


Joined: 05 Sep 2007
Posts: 9

Posted: Sun Jan 13, 2008 5:03 am

Ah, no, def. not. It's occurring just fine with myself as the only person triggering it. I added some echos to watch the process and try and figure out where it was looping, and to the best of my observation the entire cycle is getting rerun.

It only happens on slow pages: the slower the site, the more it echoes. So of course if I try to test it now nothing seems wrong since it's the middle of the night... :)

Essentially, the proc pubm:urltitle occurs, followed right away by the proc urltitle, then a brief pause (perhaps .5-1 second), then the catch and cleanup, where it then rolls right back into the proc pubm:urltitle instantly and the cycle repeats.

The cycle seems to 'end' whenever the very first http get comes back successfully, and I've had it report 2 and 3 times, depending on the lag of the site. I've yet to pull off 4, but I'm not sure if I've found a site slow enough to accomplish that yet.

I'm not that coding inclined, unfortunately, but a friend and I have dug through it and we can't for the life of us figure out what would cause it to repeat like that.
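
Not a diagnosis of the repeat itself, but one possible stopgap is a guard at the top of the handler that refuses to process the same URL in the same channel twice within a short window. A minimal sketch, assuming hypothetical names (lastSeen, should_announce) and an arbitrary 30-second window, none of which come from the script:

```tcl
# Hypothetical guard state: maps "channel,url" to the time it was last handled.
array set lastSeen {}

# Returns 1 if this channel/url pair should be processed, 0 if it was
# already handled less than $window seconds ago. $now is a time in seconds.
proc should_announce {chan url now {window 30}} {
    global lastSeen
    set key "$chan,$url"
    if {[info exists lastSeen($key)] && ($now - $lastSeen($key)) < $window} {
        return 0
    }
    set lastSeen($key) $now
    return 1
}

# Example: the first call passes, an immediate repeat is suppressed.
set now [clock seconds]
puts [should_announce #chan http://example.com/slow.html $now]
puts [should_announce #chan http://example.com/slow.html $now]
```

Because the time is recorded before any fetch would start, a re-entrant call for the same line would be turned away even while the first request is still waiting on a slow site.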
rosc2112
Revered One


Joined: 19 Feb 2006
Posts: 1454
Location: Northeast Pennsylvania

Posted: Sun Jan 13, 2008 11:37 am

Give me a url to duplicate the problem.
cruxing
Voice


Joined: 05 Sep 2007
Posts: 9

Posted: Sun Jan 13, 2008 2:06 pm

http://www.webware.com/8301-1_109-9848317-2.html was the first one I noticed it with and was using it last night to test, although it seems faster now. Doesn't appear to be duplicating it as of this moment.

4chan links during the day usually work as well. As would anything recently slashdotted/wanged/farked, likely.

[10:30] <@a> http://www.webware.com/8301-1_109-9848317-2.html
[10:30] <@bot> URL: Bloggers behaving badly: Gizmodo messes with CES flat screens | Webware : Cool Web apps for everyone
[10:30] <@bot> URL: Bloggers behaving badly: Gizmodo messes with CES flat screens | Webware : Cool Web apps for everyone

[10:40] <@c> http://www.webware.com/8301-1_109-9848317-2.html
[10:40] <@bot> URL: Bloggers behaving badly: Gizmodo messes with CES flat screens | Webware : Cool Web apps for everyone
[10:40] <@a> weird

[13:58] <@c> hm hm hm
[13:59] <@c> http://www.dieselsweeties.com/archive.php?s=1924
[13:59] <@bot> URL: diesel sweeties: pixelated robot romance web comic
[13:59] <@c> called only once that time?
[13:59] <@a> yup
[13:59] <@bot> http://www.webware.com/8301-1_109-9848317-2.html
[13:59] <@c> so what is the order of the logs
[13:59] <@bot> URL: Bloggers behaving badly: Gizmodo messes with CES flat screens | Webware : Cool Web apps for everyone
[13:59] <@a> heh this one called 3 times
[13:59] <@bot> URL: Bloggers behaving badly: Gizmodo messes with CES flat screens | Webware : Cool Web apps for everyone
[14:00] <@bot> URL: Bloggers behaving badly: Gizmodo messes with CES flat screens | Webware : Cool Web apps for everyone

Any link that has done this has been notably slow for all of us when loading it, so I'm certain it's not limited to the shell the bot resides on.
rosc2112
Revered One


Joined: 19 Feb 2006
Posts: 1454
Location: Northeast Pennsylvania

Posted: Sun Jan 13, 2008 3:22 pm

I am not able to reproduce the problem with any of those urls.

Have you modified the script?
cruxing
Voice


Joined: 05 Sep 2007
Posts: 9

Posted: Sun Jan 13, 2008 3:49 pm

The only modifications I've made have been output ones. The shortened URL: and squelching the error messages by making them "". While I can't imagine they'd impact it, hey, who knows. I certainly don't!

Presently, as I mentioned, none of those links are currently running slow enough to duplicate. They're all only returning 1 hit for me as well.

It is somewhat difficult to test until there's a website sufficiently slow, which makes it a little trickier, heh.

Modifications made:
Code:

puthelp "PRIVMSG $chan :URL: $urtitle"

if {[string match -nocase "*couldn't open socket*" $error]} {
   return "wtf, srsly."
}

if { [::http::status $http] == "timeout" } {
   return ""
}

and...

if {[regexp -nocase {<title>(.*?)</title>} $data match title]} {
   return [string map { {href=} "" \" "" } $title]
} else {
   return ""
}
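
The title-extraction fragment above can be tried outside the bot. A minimal sketch, assuming a hypothetical wrapper name (extract_title) and a made-up HTML string; the regexp and string map are copied from the fragment:

```tcl
# Hypothetical wrapper around the fragment above: returns the page title
# with href= and double quotes stripped, or "" when no <title> is found.
proc extract_title {data} {
    if {[regexp -nocase {<title>(.*?)</title>} $data match title]} {
        return [string map { {href=} "" \" "" } $title]
    } else {
        return ""
    }
}

# Made-up page source for illustration.
set page {<html><head><TITLE>Example "quoted" page</TITLE></head></html>}
puts [extract_title $page]
```

With the empty-string fallback, a page without a title tag returns "", which is what keeps the "No title found" message out of the channel in the modified version.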

All times are GMT - 4 Hours
Page 1 of 2

 