This is the new home of the egghelp.org community forum.

links - url and img collector

Support & discussion of released scripts, and announcements of new releases.
ngtjah
Voice
Posts: 5
Joined: Sun Mar 30, 2014 11:27 pm

links - url and img collector

Post by ngtjah »

A web page and scripts that collect and display links and images from an IRC channel, using an eggdrop bot's channel log.

https://github.com/ngtjah/links

It can also do a few other things, like integrate with Twitter and Pocket.

This is a little project I've been working on for a few years, driven by my small IRC community, and I thought this might be a good place to share it and get some feedback.

Check out the GitHub wiki for more information.

-ngtjah
Nocty
Voice
Posts: 15
Joined: Tue Jun 17, 2014 3:18 pm

Works! Except for YouTube...

Post by Nocty »

So, thanks to your excellent guide, I was able to follow the directions and, with a little troubleshooting and only minimal Linux knowledge (running this on Debian), get this script mostly working.

For some reason, however, it doesn't seem to like the YouTube links I tested it with, and decides they are error pages.

Output of Links_logs.log

Code:

This URL: https://www.youtube.com/watch?v=HMUDVMiITOU&feature=kp doesn't really exist!! Return Code: 501

$VAR1 = bless( {
                 'isdupe' => 0,
                 'date' => '2014-06-17 14:10:31',
                 'parseline' => '[14:10:31:2014-06-17] <Nocty> https://www.youtube.com/watch?v=HMUDVMiITOU&feature=kp',
                 'body' => 'https://www.youtube.com/watch?v=HMUDVMiITOU&feature=kp',
                 'mimetype_returncode' => 501,
                 'type' => 'irc',
                 'announcer' => 'Nocty',
                 'mimetype' => 'text/plain',
                 'www_url' => 'https://www.youtube.com/watch?v=HMUDVMiITOU&feature=kp'
               }, 'Links' );
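The dumps in this thread show how a raw log line (parseline) ends up split into date, announcer, body and www_url. As a hypothetical illustration of that mapping, here is a small Perl sketch that assumes the log format visible in these dumps ("[HH:MM:SS:YYYY-MM-DD] <nick> message", with an optional @, + or % mode prefix on the nick). It is not the code in Links.pm, and parse_log_line is a made-up name.

```perl
use strict;
use warnings;

# Hypothetical parser for one channel-log line in the format seen in the
# dumps: "[HH:MM:SS:YYYY-MM-DD] <nick> message". Not the Links.pm code.
sub parse_log_line {
    my ($line) = @_;
    my ($time, $date, $nick, $body) =
        $line =~ /^\[(\d{2}:\d{2}:\d{2}):(\d{4}-\d{2}-\d{2})\]\s+<[\@+%]?([^>]+)>\s+(.*)$/
        or return;    # not a "<nick> message" line
    # Take the first http(s) URL in the message, ignoring trailing words.
    my ($url) = $body =~ m{(https?://\S+)};
    return {
        date      => "$date $time",    # same "YYYY-MM-DD HH:MM:SS" shape as the dump
        announcer => $nick,            # mode prefix (@, +, %) stripped
        body      => $body,
        www_url   => $url,             # undef if the message has no URL
    };
}
```

Lines that don't match the "<nick> message" pattern return nothing, so a caller can simply skip them.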

Post by ngtjah »

Hi Nocty, glad you were able to get it working! I was hoping I hadn't missed anything in the guide. You are the first user I've heard back from since releasing this, so thanks for the feedback.

I will take a look at this problem. I have seen similar issues with a 5xx error on my site before. I suspect something is going on with the "feature=kp" in the URL. Does it work OK without that?

If you run into any more issues, please feel free to report them on the GitHub page under Issues, and I should get back to you a bit faster.

-ngtjah

Re: Works! Except for YouTube...

Post by ngtjah »

Hey Nocty,

I found the issue. Some of my recent code didn't make it to the repository. If you grab the new Links.pm file you should be good to go.

-ngtjah

Thanks!

Post by Nocty »

:D

Awesome, I'll grab it to test now. I've been looking to replace my current link logger because it's ugly as sin, but as you might imagine, YouTube links are pretty common in the channel, so they need to work.

I'll test the new .pm and let you know how it goes, but I'm pretty sure you're right: I noticed it was catching some YT links but not all of them.

Thanks again for all your hard work!

Same error

Post by Nocty »

I'm still getting the same error with the updated Links.pm; it seems to choke on any URL with an & parameter:

Code:

 This URL: https://www.youtube.com/watch?v=HMUDVMiITOU&t=10 doesn't really exist!! Return Code: 500

$VAR1 = bless( {
                 'isdupe' => 0,
                 'date' => '2014-07-24 11:30:46',
                 'parseline' => '[11:30:46:2014-07-24] <@Nocty> https://www.youtube.com/watch?v=HMUDVMiITOU&t=10',
                 'body' => 'https://www.youtube.com/watch?v=HMUDVMiITOU&t=10',
                 'mimetype_returncode' => 500,
                 'type' => 'irc',
                 'announcer' => 'Nocty',
                 'mimetype' => 'text/plain',
                 'www_url' => 'https://www.youtube.com/watch?v=HMUDVMiITOU&t=10'
               }, 'Links' ); 

You could potentially strip everything from the first & onward in a YouTube URL before parsing, since those options only tell the browser to skip to a particular point in the video, enable HD, etc., and wouldn't affect the video title being parsed.

It might be worth doing regardless, since it would also mean that

https://www.youtube.com/watch?v=HMUDVMiITOU&t=10 (skipping to 10 seconds in)
and
https://www.youtube.com/watch?v=HMUDVMiITOU&t=20 (skipping to 20 seconds in)

would both be truncated to

https://www.youtube.com/watch?v=HMUDVMiITOU

and would not result in multiple DB entries for the same video with different timestamps.
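Independently of the HTTPS problem that turns out to be the real cause later in the thread, the normalization suggested here is easy to sketch. The Perl helper below is hypothetical (canonical_youtube_url is not part of Links.pm): rather than cutting at the first &, it keeps only the v= parameter, so it also handles URLs where v= is not the first parameter. It assumes www.youtube.com and m.youtube.com watch URLs, rewrites both to www.youtube.com, and leaves everything else untouched; youtu.be short links are not handled.

```perl
use strict;
use warnings;

# Hypothetical helper: reduce a YouTube watch URL to just its video id, so
# "...watch?v=ID&t=10" and "...watch?v=ID&t=20" both become "...watch?v=ID".
sub canonical_youtube_url {
    my ($url) = @_;
    return $url
        unless $url =~ m{^(https?)://(?:www\.|m\.)?youtube\.com/watch\?(.*)\z}i;
    my ($scheme, $query) = (lc($1), $2);
    $query =~ s/#.*//;    # drop any #fragment
    for my $pair (split /[&;]/, $query) {
        my ($key, $value) = split /=/, $pair, 2;
        next unless defined $key && defined $value && length $value;
        # Keep only the video id; t=, feature=, hd= etc. are discarded.
        return "$scheme://www.youtube.com/watch?v=$value" if $key eq 'v';
    }
    return $url;    # no usable v= parameter: leave the URL alone
}
```

Running the result through the existing duplicate check would then treat the &t=10 and &t=20 variants as the same site.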

Post by ngtjah »

Interesting...
I can't seem to replicate the issue on my system; possibly there are some differences in our Perl modules. I do see that you are now receiving a 500 error, where before it was a 501. A 500 is a less specific error than a 501, so that doesn't really help... hmmm.

Could it be that it works with HTTP but not HTTPS?

Would you also paste the full log for this entry, starting from "Initialize Links Object"? Could you also show me the log from a YouTube link that does work?

I could create an option to strip the URL parameters like you suggested, but let's make sure we know where the issue is first.

thanks!

You're onto something

Post by Nocty »

Actually, it looks like you're definitely onto something: none of my HTTPS links seem to be parsing correctly.

Code:

 Initialize Links Object
Checking existance of site in database..
not dupe
This URL: https://fbcdn-sphotos-e-a.akamaihd.net/hphotos-ak-xfp1/t1.0-9/1513695_314741348650880_1231833456_n.jpg doesn't really exist!! Return Code: 501

$VAR1 = bless( {
                 'isdupe' => 0,
                 'date' => '2014-07-18 03:01:17',
                 'parseline' => '[03:01:17:2014-07-18] <Frank> https://fbcdn-sphotos-e-a.akamaihd.net/hphotos-ak-xfp1/t1.0-9/1513695_314741348650880_1231833456_n.jpg btfo',
                 'body' => 'https://fbcdn-sphotos-e-a.akamaihd.net/hphotos-ak-xfp1/t1.0-9/1513695_314741348650880_1231833456_n.jpg btfo',
                 'mimetype_returncode' => 501,
                 'type' => 'irc',
                 'announcer' => 'Frank',
                 'mimetype' => 'text/plain',
                 'www_url' => 'https://fbcdn-sphotos-e-a.akamaihd.net/hphotos-ak-xfp1/t1.0-9/1513695_314741348650880_1231833456_n.jpg'
               }, 'Links' );
0 entries added/updated to the database.
done. 

As for my versions:

Version: 5.836-1 (libwww-perl)
Version: 1.30-1 (libmime-types-perl)

Perplexing - EDIT: FIXED!

Post by Nocty »

Yeah, something is definitely not right on my end. I wrote a simple Perl script to compare MIME type responses, based on this:

http://stackoverflow.com/questions/5237 ... pe-in-perl

and it returns different MIME types for the same link over HTTP and HTTPS:
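The test script itself isn't included in the post, and the Stack Overflow link is truncated, so the following is only a hypothetical sketch of that kind of comparison, not Nocty's actual test.pl. It assumes LWP::UserAgent is installed and uses a HEAD request (an assumption; the original may have used GET). The helper names http_variant and mime_type_of are made up for illustration. LWP::UserAgent is loaded with require inside the sub so the helpers compile even on a box without it.

```perl
use strict;
use warnings;

# Return the http:// form of an https:// URL (other URLs are unchanged).
sub http_variant {
    my ($url) = @_;
    (my $http = $url) =~ s{^https://}{http://}i;
    return $http;
}

# HEAD the URL and return (status code, MIME type).
sub mime_type_of {
    my ($url) = @_;
    require LWP::UserAgent;
    my $ua  = LWP::UserAgent->new(timeout => 15);
    my $res = $ua->head($url);
    # In scalar context content_type gives the bare type, without ";charset=...".
    return ($res->code, scalar $res->content_type);
}

# Usage: perl test.pl 'https://example.com/some/image.jpg'
if (@ARGV) {
    my $url = shift @ARGV;
    for my $try ($url, http_variant($url)) {
        my ($code, $type) = mime_type_of($try);
        print "Trying $try\n";
        print "The type is $type (HTTP status $code)\n";
    }
}
```

Printing the status code next to the type makes it easier to tell a real server answer from an error response that LWP generated locally.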

Code:

user@AALurker:~/links$ perl test.pl
Trying https://fbcdn-sphotos-e-a.akamaihd.net/hphotos-ak-xfp1/t1.0-9/1513695_314741348650880_1231833456_n.jpg
The type is text/plain
Trying http://fbcdn-sphotos-e-a.akamaihd.net/hphotos-ak-xfp1/t1.0-9/1513695_314741348650880_1231833456_n.jpg
The type is image/jpeg

Any thoughts? I'm going to try installing LWP from somewhere other than the Debian package manager.

EDIT: Fixed it! :P

I think the version of LWP installed via Debian's software center was either too old to support HTTPS or didn't include the LWP::Protocol::https module:

http://search.cpan.org/~mschilli/LWP-Pr ... ttps-6.06/

As a caveat, the HTTPS module was only bundled with LWP by default for a brief window: my version was too old, and 6.02 and later no longer include it. From the module's documentation:
This module used to be bundled with the libwww-perl, but it was unbundled in v6.02 in order to be able to declare its dependencies properly for the CPAN tool-chain. Applications that need https support can just declare their dependency on LWP::Protocol::https and will no longer need to know what underlying modules to install.

I was able to resolve this issue by reinstalling via the CPAN shell, using sudo or a root console:

Code:

root@AALurker: perl -MCPAN -eshell (may need to initialize, answer all questions with default answer)
cpan> install Bundle::LWP (again answer default or "yes" for all questions)
cpan> install LWP::Protocol::https (again answer default or "yes" for all questions)

Once I did the above, the links_logs.log output was:

Code:

Initialize Links Object
Checking existance of site in database..
not dupe
Remote Server Mime Type: text/html 
Title: Blaze Loves His Kennel (ORIGINAL) Husky Says No to Kennel - Funny - YouTube
Entering site...
MYSQL:INSERT INTO links (site, announcer, edate, type, title, filename, twidth, theight, width, height, appid) VALUES ('https://www.youtube.com/watch?v=hCRDskZrUMU', 'Nocty', '2014-07-25 10:39:46', 'irc', 'Blaze Loves His Kennel (ORIGINAL) Husky Says No to Kennel - Funny - YouTube', NULL, NULL, NULL, NULL, NULL, NULL)
Announcer : Nocty   URL : https://www.youtube.com/watch?v=hCRDskZrUMU

$VAR1 = bless( {
                 'isdupe' => 0,
                 'date' => '2014-07-25 10:39:46',
                 'parseline' => '[10:39:46:2014-07-25] <@Nocty> https://www.youtube.com/watch?v=hCRDskZrUMU',
                 'body' => 'https://www.youtube.com/watch?v=hCRDskZrUMU',
                 'www_img' => 'https://www.youtube.com/watch?v=hCRDskZrUMU',
                 'mimetype_returncode' => '200',
                 'title' => 'Blaze Loves His Kennel (ORIGINAL) Husky Says No to Kennel - Funny - YouTube',
                 'type' => 'irc',
                 'announcer' => 'Nocty',
                 'mimetype' => 'text/html',
                 'www_url' => 'https://www.youtube.com/watch?v=hCRDskZrUMU'
               }, 'Links' );
1 entry added/updated to the database.
done.
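To check whether a given machine has this problem, before or after reinstalling, a quick diagnostic can ask LWP directly. This is a sketch rather than part of the links scripts: https_support_report is a hypothetical name, while LWP->VERSION and LWP::UserAgent's is_protocol_supported method are standard. When no https handler can be loaded, LWP typically answers an https request with a locally generated 501 whose body is text/plain, which fits the 501 and text/plain in the dumps above.

```perl
use strict;
use warnings;

# Report whether the installed LWP can fetch https:// URLs. Each module is
# loaded with require inside eval, so the check runs even if it is missing.
sub https_support_report {
    my %report = (
        lwp_version       => undef,
        https_module      => 0,
        ua_supports_https => 0,
    );
    $report{lwp_version} = LWP->VERSION if eval { require LWP; 1 };
    # Separate distribution since LWP 6.02; pulls in the SSL socket layer.
    $report{https_module} = (eval { require LWP::Protocol::https; 1 }) ? 1 : 0;
    if (eval { require LWP::UserAgent; 1 }) {
        $report{ua_supports_https} =
            LWP::UserAgent->new->is_protocol_supported('https') ? 1 : 0;
    }
    return \%report;
}

my $r = https_support_report();
print "LWP version:                  ", ($r->{lwp_version} // 'not installed'), "\n";
print "LWP::Protocol::https loads:   ", ($r->{https_module} ? 'yes' : 'no'), "\n";
print "UserAgent supports https:     ", ($r->{ua_supports_https} ? 'yes' : 'no'), "\n";
```

If the last line says "no", https links will fail the MIME check regardless of what the links scripts do with the URL.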

Post by ngtjah »

NICE! Enjoy!

Thanks!

Post by Nocty »

Thanks a ton for your hard work, great script! I posted the same fix as a comment on your GitHub in case anyone else is a Linux scrub like me and runs into the same issue.