
Durby - 0.3.0. New project page on code.google.com

Support & discussion of released scripts, and announcements of new releases.
lee8oi
Halfop
Posts: 63
Joined: Sat Jun 04, 2011 2:05 pm
Location: Michigan,United States.

Durby - 0.3.0. New project page on code.google.com

Post by lee8oi »

Download the latest version of Durby on google code at https://code.google.com/p/durby/

Straight from the Readme:

There's a short story here. Webby script is a project by speechles designed to handle the task of grabbing information from web links and testing regexps in channels. It was used to replace the problematic HTTP information-grabbing code in other scripts such as the unofficial Incith google script.

'Durby' was originally a project called durltitle, which was intended to be a script that grabs URLs from channel messages and returns the title content. After working with speechles on fixing the elusive utf-8 bugs in webby, I decided it would be smarter to base a title grabber script on a proven system instead of reinventing the wheel. So Durltitle merged with Webby, and thus 'Durby' was created.

Durby adds these new features & changes to webby:

Urlwatch - for grabbing urls from channel messages and returning the information automatically.

Pattern Ignore - Allows you to configure the script to ignore urls that match predefined ignore patterns.

Nick Ignore - Allows you to configure the script to ignore requests & urls posted in channel by certain nicks. Useful for ignoring other bots.

Verbose Mode - Can be enabled by default or used on demand with the --verbose switch to append the URL's type info and description to the results. Durby defaults to simply showing the title and tiny URL.

Title Collection - Enabled by default. Sets durby to collect titles and display the results at once instead of posting each result individually when urlwatch finds multiple links. (Verbose mode disables this feature).
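
As a rough illustration of how the Urlwatch, Pattern Ignore, and Nick Ignore features described above could fit together, here is a minimal sketch. The variable and proc names (ignore_patterns, ignore_nicks, find_urls, is_ignored, urls_to_fetch) and the sample patterns are assumptions made for this example, not Durby's actual configuration or code.

```tcl
# Hypothetical configuration; these names are illustrative, not Durby's own.
set ignore_patterns {*.jpg *.gif *.png}
set ignore_nicks    {otherbot}

# Return every http/https URL found in a channel message.
proc find_urls {text} {
    return [regexp -all -inline -nocase {https?://[^\s<>"]+} $text]
}

# True when the nick is on the ignore list or the URL matches an ignore pattern.
proc is_ignored {nick url} {
    global ignore_patterns ignore_nicks
    if {[lsearch -exact -nocase $ignore_nicks $nick] >= 0} {
        return 1
    }
    foreach pattern $ignore_patterns {
        if {[string match -nocase $pattern $url]} {
            return 1
        }
    }
    return 0
}

# Collect the URLs from one message that are worth fetching a title for.
proc urls_to_fetch {nick text} {
    set wanted {}
    foreach url [find_urls $text] {
        if {![is_ignored $nick $url]} {
            lappend wanted $url
        }
    }
    return $wanted
}
```

In an eggdrop, a proc like urls_to_fetch would be called from a pubm bind with the speaker's nick and the message text; glob patterns such as *.jpg are one simple way to skip image links.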
Last edited by lee8oi on Wed May 02, 2012 1:59 pm, edited 29 times in total.
lee8oi
Halfop
Posts: 63
Joined: Sat Jun 04, 2011 2:05 pm
Location: Michigan,United States.

Durltitle - Modern fork of 'urltitle'. Latest version: 0.1.3

Post by lee8oi »

Old message. No longer applies to current project.
Last edited by lee8oi on Mon Oct 31, 2011 1:33 pm, edited 1 time in total.
lee8oi
Halfop
Posts: 63
Joined: Sat Jun 04, 2011 2:05 pm
Location: Michigan,United States.

Durltitle - Modern fork of 'urltitle'. Latest version: 0.2.2

Post by lee8oi »

Old message. No longer applies to project.
Last edited by lee8oi on Mon Oct 31, 2011 1:33 pm, edited 2 times in total.
Anahel
Halfop
Posts: 48
Joined: Fri Jul 03, 2009 6:18 pm
Location: Dom!

Post by Anahel »

Nice script, lee8oi, but I have a problem with Unicode:

Code:

<~tomek> http://fishki.net/comment.php?id=20554
<+Nekomimi> URL Title: Неприятности случаются везде (20 фото) - Fishki.Net | Фишкина картинка
This is the title when I was using v0.1, where everything is fine, but v0.2 breaks it (yet 0.2 displays titles for Japanese sites when 0.1 couldn't):

Code:

<~tomek> http://fishki.net/comment.php?id=20554
<+Nekomimi> Tytul: Íåïðèÿòíîñòè ñëó÷àþòñÿ âåçäå (20 ôîòî) - Fishki.Net | Ôèøêèíà êàðòèíêà
my bot is patched to use utf-8

and second thing:

Code:

<~tomek> http://i.huffpost.com/gadgets/slideshows/193500/slide_193500_409426_large.jpg
<+Nekomimi> Tytul: Unable to retrieve title for http://i.huffpost.com/gadgets/slideshows/193500/slide_193500_409426_large.jpg
Both v0.1 and v0.2 try to display a title for images (when they should ignore all jpg/gif/png and other image files).
lee8oi
Halfop
Posts: 63
Joined: Sat Jun 04, 2011 2:05 pm
Location: Michigan,United States.

Thanks for the helpful reply.

Post by lee8oi »

Thanks for your helpful reply. I hadn't fully considered the utf-8 or language related issues. I will certainly look into this right away. If necessary I'll try a new approach starting from 0.1 again. Thanks again Anahel!
lee8oi
Halfop
Posts: 63
Joined: Sat Jun 04, 2011 2:05 pm
Location: Michigan,United States.

Durltitle - Modern fork of 'urltitle'. Latest version: 0.2.3

Post by lee8oi »

Old message. No longer applies to current project.
Last edited by lee8oi on Mon Oct 31, 2011 1:34 pm, edited 2 times in total.
Anahel
Halfop
Posts: 48
Joined: Fri Jul 03, 2009 6:18 pm
Location: Dom!

Re: Durltitle - Modern fork of 'urltitle'. Latest version: 0

Post by Anahel »

lee8oi wrote:Update for v0.2
3.Added a utf-8 fix. Appears to work correctly on bots with and without the utf-8 hack/patch found on http://eggwiki.org/Utf-8
you didn't update your github
lee8oi
Halfop
Posts: 63
Joined: Sat Jun 04, 2011 2:05 pm
Location: Michigan,United States.

Re: Durltitle - Modern fork of 'urltitle'. Latest version: 0

Post by lee8oi »

version removed.
Last edited by lee8oi on Tue Oct 25, 2011 3:00 pm, edited 1 time in total.
speechles
Revered One
Posts: 1398
Joined: Sat Aug 26, 2006 10:19 pm
Location: emerald triangle, california (coastal redwoods)

Post by speechles »

Code:

<speechl3s> !webby http://fishki.net/comment.php?id=20554 
<sp33chy> Íåïðèÿòíîñòè ñëó÷àþòñÿ âåçäå (20 ôîòî) - Fishki.Net | Ôèøêèíà êàðòèíêà ( http://is.gd/aZpbBM )( 200; text/html; cp1251; 76204 bytes )
<sp33chy> Íåïðèÿòíîñòè ñëó÷àþòñÿ âåçäå (20 ôîòî) - Ôèøêè.ÍÅÒ - Ñàéò Õîðîøåãî Íàñòðîåíèÿ!
What if it isn't utf-8? Forcing detection isn't a fix-all for this situation.

Code:

catch {exec $urltitle(curlpath) -U "tcluser:anonymous" --location --max-redirs \
$urltitle(maxdirs) --connect-timeout $urltitle(timeout) --max-time \
$urltitle(maxtime) --insecure "$url"} html
...
return [htmlparse::mapEscapes $title]
You do realize that using "curl" and "html-parse" seriously limits the audience the script will reach? The more things you require, the more people stop using it.

Not enough people realize this, seriously. This isn't some random rabbit-punch jab at you; it is straight-up words of wisdom. The less you require, down to the bare absolute minimum, the easier it is for others to reproduce the usefulness of that script. Most people get angry at having to "jump through hoops" to make a script useful, abandon it, and move on to the next.

Rather than use the same old methods like everybody else does (which SCREAMS laziness to me), evolve your skills, reduce your dependencies, become unique. That's what I have done, and you can as well. Remove that "curl" limitation, and ditch your dependence on html-parse (it is flawed with patched/unpatched bots). I told HM2K on EFnet the same: his scripts are so dependent on libs that it is implausible a new user would ever try to use anything of his.

Check out my google, webby, and birdy scripts for a better way of converting HTML entities. Learn the http package and how to follow redirects, associate cookies, etc. Basically, make a pure-Tcl curl. Webby does this; it's really not that hard. You make a far better impact when you keep all this in mind. Make it simple to use, useful out of the box, and respectful of every language, and you shall have every eggdrop user using your stuff... ;)
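
To make the "converting HTML entities without html-parse" idea concrete, here is a minimal pure-Tcl sketch. It is not the code from the google, webby, or birdy scripts; the proc name decode_entities and its short table of named entities are assumptions for illustration, and a real script would carry a much longer table.

```tcl
# Decode numeric (&#65; / &#x41;) and a few named (&amp;) HTML entities.
# Hypothetical helper; unknown named entities are left untouched.
proc decode_entities {text} {
    set named [dict create amp & lt < gt > quot \" apos ' nbsp " "]
    set re {&(?:#[xX]([0-9a-fA-F]+)|#([0-9]+)|([a-zA-Z]+));}
    set out ""
    set pos 0
    while {[regexp -indices -start $pos $re $text all hex dec name]} {
        lassign $all first last
        # Copy the plain text that precedes this entity.
        append out [string range $text $pos [expr {$first - 1}]]
        if {[lindex $hex 0] >= 0} {
            scan [string range $text {*}$hex] %x code
            append out [format %c $code]
        } elseif {[lindex $dec 0] >= 0} {
            # scan %d reads decimal, so leading zeros are harmless.
            scan [string range $text {*}$dec] %d code
            append out [format %c $code]
        } else {
            set key [string range $text {*}$name]
            if {[dict exists $named $key]} {
                append out [dict get $named $key]
            } else {
                append out [string range $text $first $last]
            }
        }
        set pos [expr {$last + 1}]
    }
    append out [string range $text $pos end]
    return $out
}
```

Each entity is decoded exactly once in a single left-to-right pass, so input such as "&amp;lt;" comes out as "&lt;" rather than "<". The sketch needs Tcl 8.5 or later (dict, lassign, {*}).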
Last edited by speechles on Tue Oct 25, 2011 2:43 pm, edited 1 time in total.
Anahel
Halfop
Posts: 48
Joined: Fri Jul 03, 2009 6:18 pm
Location: Dom!

Re: Durltitle - Modern fork of 'urltitle'. Latest version: 0

Post by Anahel »

lee8oi wrote:
Anahel wrote:
lee8oi wrote:Update for v0.2
3.Added a utf-8 fix. Appears to work correctly on bots with and without the utf-8 hack/patch found on http://eggwiki.org/Utf-8
you didn't update your github
lol whoops. hehe. Well, that's ok. I have another update to add AND github is updated :)

4. Added ignore system. Patterns can be added in the configuration section. URLs are compared to the patterns to determine whether to ignore the URL or retrieve the title.
this update broke it even more:

Code:

<~tomek> http://fishki.net/comment.php?id=20554
<+Nekomimi> URL Title: Íåïðèÿòíîñòè ñëó÷àþòñÿ âåçäå (20 ôîòî) - Fishki.Net | Ã&#148;èøêèíà êàðòèíêà
<~tomek> http://www.nicovideo.jp/video_top
<+Nekomimi> URL Title: ã&#131;&#139;ã&#130;³ã&#131;&#139;ã&#130;³å&#139;&#149;ç&#148;»(å&#142;&#159;宿)
lee8oi
Halfop
Posts: 63
Joined: Sat Jun 04, 2011 2:05 pm
Location: Michigan,United States.

back to the drawing board. The return to 0.1

Post by lee8oi »

OK. Thanks, speechles. Great words of wisdom. Right now I'm mostly just experimenting. I'll take another look at 0.1, work with the http package, and figure things out. I did poke around in webby some, but I'm not quite advanced enough to sort that mess out. I liked 'urltitle' because it gave me a simpler starting point to work from. Easier to wrap my head around what's happening. But thanks again for the heads-up. It's probably a good time to scrap what I have so far before I find myself disappointed later. This script isn't being submitted to the archive anyway. So no big loss. Just a learning experience. Forgive me if my next attempt seems lazy too....
speechles
Revered One
Posts: 1398
Joined: Sat Aug 26, 2006 10:19 pm
Location: emerald triangle, california (coastal redwoods)

Re: back to the drawing board. The return to 0.1

Post by speechles »

lee8oi wrote:OK. Thanks, speechles. Great words of wisdom. Right now I'm mostly just experimenting. I'll take another look at 0.1, work with the http package, and figure things out. I did poke around in webby some, but I'm not quite advanced enough to sort that mess out. I liked 'urltitle' because it gave me a simpler starting point to work from. Easier to wrap my head around what's happening. But thanks again for the heads-up. It's probably a good time to scrap what I have so far before I find myself disappointed later. This script isn't being submitted to the archive anyway. So no big loss. Just a learning experience. Forgive me if my next attempt seems lazy too....
Wait, then collaboration is in order. Notice "webby" and its reply? It pretty much isn't readable Russian either, when it could be...

And lazy means that you aren't learning the core items Tcl offers. The http package within Tcl is more powerful than cURL could ever hope to be. Expecting "curl" to be available on every system eggdrop runs on will add comments to your thread here on egghelp, mostly of the form "where do I get curl for Windows?" or for <insert OS here>. It just adds another complexity and dynamic that you hop-skip past. So it is hurting you and your scripts more than the users. It shows that you know how to walk and run in Tcl, but you lack the skills to build a bike or a car.
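
For readers wondering what replacing curl's --location and --max-redirs with the http package might look like, here is a minimal sketch. It is not Webby's implementation; the proc names fetch and resolve_location and the default limits are assumptions for this example. It handles plain http only (https would additionally need a TLS socket registered with http::register) and does not deal with cookies.

```tcl
package require http

# Resolve a Location header value against the URL that produced it.
proc resolve_location {base location} {
    if {[regexp -nocase {^https?://} $location]} {
        return $location
    }
    if {![regexp -nocase {^(https?:)(//[^/?#]+)([^?#]*)} $base -> scheme host path]} {
        error "unsupported base URL: $base"
    }
    if {[string range $location 0 1] eq "//"} {
        return $scheme$location
    }
    if {[string index $location 0] eq "/"} {
        return $scheme$host$location
    }
    # Relative path: replace everything after the last slash of the base path.
    set dir [string range $path 0 [string last / $path]]
    if {$dir eq ""} {
        set dir /
    }
    return $scheme$host$dir$location
}

# Fetch a URL, following at most max_redirects redirects.
# Returns a list: final URL, status code, response headers, body.
proc fetch {url {max_redirects 5} {timeout_ms 10000}} {
    for {set hop 0} {$hop <= $max_redirects} {incr hop} {
        set token  [http::geturl $url -timeout $timeout_ms]
        set status [http::status $token]
        set code   [http::ncode $token]
        set meta   [http::meta $token]
        set body   [http::data $token]
        http::cleanup $token
        if {$status ne "ok"} {
            error "request for $url failed: $status"
        }
        # Header names may arrive in any case, so compare case-insensitively.
        set location ""
        foreach {name value} $meta {
            if {[string equal -nocase $name location]} {
                set location $value
            }
        }
        if {$code >= 300 && $code < 400 && $location ne ""} {
            set url [resolve_location $url $location]
            continue
        }
        return [list $url $code $meta $body]
    }
    error "gave up after $max_redirects redirects"
}
```

A call such as `lassign [fetch $url] final code meta body` would return the page after redirects. This synchronous form blocks the bot while it waits; the -command option of http::geturl is the usual route to a non-blocking version.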

Code:

set text [encoding convertto utf-8 [encoding convertfrom cp1251 $text]]
There are ways to get there. A patched bot would not require the "convertto" part. I just haven't had the time to test it all out on a variety of platforms and bots, both patched and unpatched, yet. This Thursday and Friday I have days off from my real-life full-time job. Get together with me then, here or via IRC, and we can help each other. Webby can detect charset conflicts and reports the charset it detected for just this reason, so that one day I could find a reason to fix it. I expected to have had it done long ago, but it never quite "got there"... heh (read this as LAZINESS) ... yeah, I'm human. I suffer from it too.
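
The garbled Russian titles earlier in the thread look like cp1251 bytes that were decoded as iso8859-1. The sketch below reproduces that effect on a short sample word and then reverses it. The proc names redecode and for_irc are assumptions for illustration; for_irc simply follows the statement above that a patched (utf-8) bot does not need the convertto step, which has not been verified here on real bots.

```tcl
# The Russian word "Privet", written with \u escapes so the script does not
# depend on the encoding of the file it is saved in.
set original "\u041f\u0440\u0438\u0432\u0435\u0442"

# What a cp1251 web server actually sends: raw bytes.
set bytes [encoding convertto cp1251 $original]

# What a fetcher hands back when it wrongly assumes iso8859-1.
set misread [encoding convertfrom iso8859-1 $bytes]

# Repair: turn the text back into raw bytes using the wrong encoding,
# then decode those bytes with the right one.
proc redecode {text wrong right} {
    return [encoding convertfrom $right [encoding convertto $wrong $text]]
}
set repaired [redecode $misread iso8859-1 cp1251]

# Hypothetical output helper: only an unpatched bot (system encoding other
# than utf-8) gets the extra convertto step described in the post.
proc for_irc {text} {
    if {[encoding system] eq "utf-8"} {
        return $text
    }
    return [encoding convertto utf-8 $text]
}
```

Here $misread holds the same kind of accented-Latin mojibake shown in the channel logs, and $repaired equals $original again. The repair only works when the wrong and right encodings are known, which is why detecting the page's real charset matters.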
Last edited by speechles on Tue Oct 25, 2011 3:16 pm, edited 1 time in total.
lee8oi
Halfop
Posts: 63
Joined: Sat Jun 04, 2011 2:05 pm
Location: Michigan,United States.

Re: back to the drawing board. The return to 0.1

Post by lee8oi »

speechles wrote:
lee8oi wrote:OK. Thanks, speechles. Great words of wisdom. Right now I'm mostly just experimenting. I'll take another look at 0.1, work with the http package, and figure things out. I did poke around in webby some, but I'm not quite advanced enough to sort that mess out. I liked 'urltitle' because it gave me a simpler starting point to work from. Easier to wrap my head around what's happening. But thanks again for the heads-up. It's probably a good time to scrap what I have so far before I find myself disappointed later. This script isn't being submitted to the archive anyway. So no big loss. Just a learning experience. Forgive me if my next attempt seems lazy too....
Wait, then collaboration is in order. Notice "webby" and its reply? It pretty much isn't readable Russian either, when it could be...

And lazy means that you aren't learning the core items Tcl offers. The http package within Tcl is more powerful than cURL could ever hope to be. Expecting "curl" to be available on every system eggdrop runs on will add comments to your thread here on egghelp, mostly of the form "where do I get curl for Windows?" or for <insert OS here>. It just adds another complexity and dynamic that you hop-skip past. So it is hurting you and your scripts more than the users. It shows that you know how to walk and run in Tcl, but you lack the skills to build a bike or a car.

Code:

set text [encoding convertto utf-8 [encoding convertfrom cp1251 $text]]
There are ways to get there. I just haven't had the time to test it. This Thursday and Friday I have some days off from my real-life full time job. At that point get together with me, here or via iRC and we can help each other. Webby can detect charset conflicts and reports charset it detected for just this reason. So that I could one day, find a reason to fix it. I
I would certainly love to work with you. When I use:

Code:

set urtitle [encoding convertto utf-8 $urtitle]

it seems to work in my irc client. Even trying the link Anahel is testing with correctly shows the text for me. I added this to 0.1.3 and tested it. It works for me. But it broke for Anahel. Any ideas?

p.s. github is updated to 0.1.4 now with the above code bit added for utf-8 fix.
speechles
Revered One
Posts: 1398
Joined: Sat Aug 26, 2006 10:19 pm
Location: emerald triangle, california (coastal redwoods)

Re: back to the drawing board. The return to 0.1

Post by speechles »

Code:

set urtitle [encoding convertto utf-8 $urtitle]

it seems to work in my irc client. Even trying the link Anahel is testing with correctly shows the text for me. I added this to 0.1.3 and tested it. It works for me. But it broke for Anahel. Any ideas?

------------------------

A patched bot has a default system encoding of "utf-8".
An unpatched bot usually has a default system encoding of iso8859-1.

The http package will fall back to the default if it cannot detect the charset. That is why some troublesome sites that never work on an unpatched bot work on a patched one with no encoding changes required.

But building a "fits all" website charset detector is my intent: a piece of code that every eggdrop developer can use and submit changes to. So that the last metroid is in captivity. The world can be at peace..
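
One possible shape for such a detector, offered only as a starting sketch and not as Webby's code: look for a charset in the Content-Type header first, then in a meta tag of the page, and fall back to the system encoding as described above. The proc names charset_to_encoding and detect_encoding and the small alias table are assumptions for this example.

```tcl
# Map a charset label from the web onto a Tcl encoding name, or "" if unknown.
proc charset_to_encoding {charset} {
    set name [string tolower [string trim $charset]]
    # A few labels whose Tcl names differ; a real table would be longer.
    set aliases {
        windows-1251 cp1251
        windows-1252 cp1252
        utf8         utf-8
        shift_jis    shiftjis
        latin1       iso8859-1
    }
    if {[dict exists $aliases $name]} {
        set name [dict get $aliases $name]
    }
    # iso-8859-N is spelled iso8859-N in Tcl.
    regsub {^iso-8859-} $name {iso8859-} name
    if {[lsearch -exact [encoding names] $name] >= 0} {
        return $name
    }
    return ""
}

# Pick an encoding for a page from its Content-Type header and its HTML.
proc detect_encoding {content_type html} {
    # 1. charset parameter of the HTTP Content-Type header.
    if {[regexp -nocase {charset\s*=\s*["']?([^"';\s]+)} $content_type -> cs]} {
        set enc [charset_to_encoding $cs]
        if {$enc ne ""} {
            return $enc
        }
    }
    # 2. <meta charset=...> or <meta http-equiv ... charset=...> in the page.
    if {[regexp -nocase {<meta[^>]+charset\s*=\s*["']?([^"'>;\s/]+)} $html -> cs]} {
        set enc [charset_to_encoding $cs]
        if {$enc ne ""} {
            return $enc
        }
    }
    # 3. Nothing usable found: fall back to the bot's system encoding.
    return [encoding system]
}
```

The result can be passed to encoding convertfrom on a body fetched with the -binary option of http::geturl, so that the conversion happens once with the detected charset rather than with a guessed default.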

Time for work.. bbl ;)
lee8oi
Halfop
Posts: 63
Joined: Sat Jun 04, 2011 2:05 pm
Location: Michigan,United States.

Re: back to the drawing board. The return to 0.1

Post by lee8oi »

I just ran 0.1.4 on both patched and unpatched bots. It returns the title text correctly for me. Of course, this version doesn't have good redirect support. What channel are you hiding in anyway? I want to talk to you.