ngtjah Voice
Joined: 30 Mar 2014 Posts: 5
|
Posted: Mon Mar 31, 2014 12:02 am Post subject: links - url and img collector |
|
|
A web page and scripts that collect and display links and images from an IRC channel with the help of an eggdrop bot's channel log.
https://github.com/ngtjah/links
It can also do a few other things, such as integrating with Twitter and Pocket.
This is a small project I've been working on for a few years, driven by my little IRC community. I thought this might be a good place to share it and get some feedback.
Check out the GitHub wiki for more information.
-ngtjah |
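For readers wondering what "collecting links from a channel log" involves, here is a minimal Python sketch. It is not the project's code (that is Perl, in Links.pm). The line format is assumed from the `parseline` values shown in the debug dumps later in this thread, and the names `parse_line` and `LinkHit` are made up for illustration.

```python
import re
from typing import NamedTuple, Optional

# Assumed line shape, taken from the 'parseline' values in this thread:
#   [HH:MM:SS:YYYY-MM-DD] <nick> message text
LINE_RE = re.compile(
    r"^\[(?P<time>\d{2}:\d{2}:\d{2}):(?P<date>\d{4}-\d{2}-\d{2})\]\s+"
    r"<(?P<nick>[^>]+)>\s+(?P<body>.*)$"
)
URL_RE = re.compile(r"https?://\S+")


class LinkHit(NamedTuple):
    """Hypothetical record holding the fields a link collector would store."""
    date: str
    announcer: str
    url: str


def parse_line(line: str) -> Optional[LinkHit]:
    """Return the first URL on a log line with its poster, or None."""
    match = LINE_RE.match(line)
    if match is None:
        return None
    url = URL_RE.search(match["body"])
    if url is None:
        return None
    # Drop IRC status prefixes such as '@' (op) or '+' (voice) from the nick.
    announcer = match["nick"].lstrip("@+")
    return LinkHit(f"{match['date']} {match['time']}", announcer, url.group(0))
```

Only the first URL on a line is returned, and trailing chatter after the URL is ignored. A real collector would probably want every URL on the line.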
|
Nocty Voice
Joined: 17 Jun 2014 Posts: 15
|
Posted: Tue Jun 17, 2014 3:28 pm Post subject: Works! Except for YouTube... |
|
|
So thanks to your excellent guide I was able to follow the directions and, with a little bit of troubleshooting and only minimal Linux knowledge (running this on Debian), I was able to get this script mostly working.
For some reason, however, it does not seem to like the YouTube links I tested it with, and seems to decide they are error pages.
Output of Links_logs.log
| Code: |
This URL: https://www.youtube.com/watch?v=HMUDVMiITOU&feature=kp doesn't really exist!! Return Code: 501
$VAR1 = bless( {
'isdupe' => 0,
'date' => '2014-06-17 14:10:31',
'parseline' => '[14:10:31:2014-06-17] <Nocty> https://www.youtube.com/watch?v=HMUDVMiITOU&feature=kp',
'body' => 'https://www.youtube.com/watch?v=HMUDVMiITOU&feature=kp',
'mimetype_returncode' => 501,
'type' => 'irc',
'announcer' => 'Nocty',
'mimetype' => 'text/plain',
'www_url' => 'https://www.youtube.com/watch?v=HMUDVMiITOU&feature=kp'
}, 'Links' );
|
|
|
ngtjah Voice
Joined: 30 Mar 2014 Posts: 5
|
Posted: Mon Jul 21, 2014 10:59 pm Post subject: |
|
|
Hi Nocty, Glad you were able to get it working! I was hoping I had not missed anything in the guide. You are the first user I have heard back from since releasing this. Thanks for the feedback.
I will take a look at this problem. I have seen some issues like this before with a 5xx error on my site. I'm thinking something must be going on with the "feature=kp" in the URL. Does it work OK without that?
If you have any more issues, please feel free to report them on the GitHub page under Issues, and I should get back to you a bit faster.
-ngtjah |
|
ngtjah Voice
Joined: 30 Mar 2014 Posts: 5
|
Posted: Wed Jul 23, 2014 3:26 pm Post subject: Re: Works! Except for YouTube... |
|
|
Hey Nocty,
I found the issue. Some of my recent code didn't make it to the repository. If you grab the new Links.pm file you should be good to go.
-ngtjah |
|
Nocty Voice
Joined: 17 Jun 2014 Posts: 15
|
Posted: Thu Jul 24, 2014 11:24 am Post subject: Thanks! |
|
|
:D
Awesome, will grab it to test now. I had been looking to replace my current link logger because it's ugly as sin, but as you might imagine, YouTube links are pretty common in the channel.
I will grab the new .pm and let you know how things work, but I'm pretty sure you're correct; I noticed that it was indeed catching some YT links but not all of them.
Thanks again for all your hard work! |
|
Nocty Voice
Joined: 17 Jun 2014 Posts: 15
|
Posted: Thu Jul 24, 2014 11:49 am Post subject: Same error |
|
|
Still getting the same error with the updated Links.pm; it seems not to like anything from the & parameter onward in the URL:
| Code: | This URL: https://www.youtube.com/watch?v=HMUDVMiITOU&t=10 doesn't really exist!! Return Code: 500
$VAR1 = bless( {
'isdupe' => 0,
'date' => '2014-07-24 11:30:46',
'parseline' => '[11:30:46:2014-07-24] <@Nocty> https://www.youtube.com/watch?v=HMUDVMiITOU&t=10',
'body' => 'https://www.youtube.com/watch?v=HMUDVMiITOU&t=10',
'mimetype_returncode' => 500,
'type' => 'irc',
'announcer' => 'Nocty',
'mimetype' => 'text/plain',
'www_url' => 'https://www.youtube.com/watch?v=HMUDVMiITOU&t=10'
}, 'Links' ); |
You could potentially strip anything starting at the first & in a YouTube URL for the parsing, since these options in the URL only instruct the browser to skip to a particular point in the video, enable HD, etc, and wouldn't adversely affect the video title being parsed.
It actually might be a good thing to do regardless, since this would also make it so that
https://www.youtube.com/watch?v=HMUDVMiITOU&t=10 (skipping to 10 seconds in)
and
https://www.youtube.com/watch?v=HMUDVMiITOU&t=20 (skipping to 20 seconds in)
would both be truncated to
https://www.youtube.com/watch?v=HMUDVMiITOU
and the DB would not end up with multiple entries for the same video that differ only in timestamp. |
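The truncation idea above can be sketched as follows. This is a Python illustration, not the project's Perl code, and `canonical_youtube_url` and `YOUTUBE_HOSTS` are hypothetical names. One caveat with cutting at the first `&`: it only works when `v=` happens to be the first parameter, so this sketch parses the query string and keeps just `v` instead.

```python
from urllib.parse import parse_qs, urlencode, urlsplit, urlunsplit

# Hostnames treated as YouTube here; an assumption, extend as needed.
YOUTUBE_HOSTS = {"youtube.com", "www.youtube.com", "m.youtube.com"}


def canonical_youtube_url(url: str) -> str:
    """Reduce a YouTube watch URL to its v= parameter; pass other URLs through."""
    parts = urlsplit(url)
    if parts.hostname not in YOUTUBE_HOSTS or parts.path != "/watch":
        return url
    video_ids = parse_qs(parts.query).get("v")
    if not video_ids:
        return url  # no video id found, leave the URL untouched
    query = urlencode({"v": video_ids[0]})
    # Rebuild the URL without the extra parameters or any fragment.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, query, ""))
```

With this, the `&t=10` and `&t=20` links above both map to the same string, so a dupe check on the stored URL would catch them.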
|
ngtjah Voice
Joined: 30 Mar 2014 Posts: 5
|
Posted: Thu Jul 24, 2014 8:32 pm Post subject: |
|
|
Interesting...
I can't seem to replicate the issue on my system... possibly some differences in our Perl modules... I do see that you are now receiving a 500 error, where before it was a 501. 500 is a less specific error than 501, so that doesn't really help... hmmm..
Could it be that it works with http and not https?
Would you also paste the full log for this entry, starting from "Initialize Links Object"? Can you show me the log from a YouTube link that does work as well?
I could create an option to strip the URL parameters like you suggested, but let's make sure we know where the issue is first.
thanks! |
|
Nocty Voice
Joined: 17 Jun 2014 Posts: 15
|
Posted: Fri Jul 25, 2014 8:46 am Post subject: You're onto something |
|
|
Actually it looks like you're definitely onto something; it would appear that none of my HTTPS links are parsing correctly.
| Code: | Initialize Links Object
Checking existance of site in database..
not dupe
This URL: https://fbcdn-sphotos-e-a.akamaihd.net/hphotos-ak-xfp1/t1.0-9/1513695_314741348650880_1231833456_n.jpg doesn't really exist!! Return Code: 501
$VAR1 = bless( {
'isdupe' => 0,
'date' => '2014-07-18 03:01:17',
'parseline' => '[03:01:17:2014-07-18] <Frank> https://fbcdn-sphotos-e-a.akamaihd.net/hphotos-ak-xfp1/t1.0-9/1513695_314741348650880_1231833456_n.jpg btfo',
'body' => 'https://fbcdn-sphotos-e-a.akamaihd.net/hphotos-ak-xfp1/t1.0-9/1513695_314741348650880_1231833456_n.jpg btfo',
'mimetype_returncode' => 501,
'type' => 'irc',
'announcer' => 'Frank',
'mimetype' => 'text/plain',
'www_url' => 'https://fbcdn-sphotos-e-a.akamaihd.net/hphotos-ak-xfp1/t1.0-9/1513695_314741348650880_1231833456_n.jpg'
}, 'Links' );
0 entries added/updated to the database.
done. |
As for my versions:
Version: 5.836-1 (libwww-perl)
Version: 1.30-1 (libmime-types-perl) |
|
Nocty Voice
Joined: 17 Jun 2014 Posts: 15
|
Posted: Fri Jul 25, 2014 11:06 am Post subject: Perplexing - EDIT: FIXED! |
|
|
Yeah, something is definitely not right on my end. I wrote a simple Perl script to compare MIME type responses, based on this:
http://stackoverflow.com/questions/523773/how-do-i-find-a-links-content-type-in-perl
and it returns different types for the same link over HTTP vs HTTPS:
| Code: | user@AALurker:~/links$ perl test.pl
Trying https://fbcdn-sphotos-e-a.akamaihd.net/hphotos-ak-xfp1/t1.0-9/1513695_314741348650880_1231833456_n.jpg
The type is text/plain
Trying http://fbcdn-sphotos-e-a.akamaihd.net/hphotos-ak-xfp1/t1.0-9/1513695_314741348650880_1231833456_n.jpg
The type is image/jpeg |
Any thoughts? I'm going to try installing LWP from something other than the Debian package manager.
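The test.pl script itself isn't shown in the post, so the following is only a rough Python analogue of that kind of check, with hypothetical names (`head_content_type`, `compare_schemes`). It asks for the Content-Type of the same resource over both schemes so the two answers can be compared side by side. As far as I know, when LWP cannot handle a URL scheme it builds an error response itself, with a 501 status and a text/plain body, which would explain the `text/plain` seen for the HTTPS link above; treat that as background, not something this sketch demonstrates.

```python
from typing import Callable, Dict
from urllib.request import Request, urlopen


def head_content_type(url: str, timeout: float = 10.0) -> str:
    """Send a HEAD request and return the Content-Type without parameters."""
    request = Request(url, method="HEAD")
    with urlopen(request, timeout=timeout) as response:
        return response.headers.get_content_type()


def compare_schemes(
    host_and_path: str,
    fetch: Callable[[str], str] = head_content_type,
) -> Dict[str, str]:
    """Fetch the type of the same resource over https and http.

    `fetch` is injectable so the comparison can be tried without a network.
    """
    results = {}
    for scheme in ("https", "http"):
        url = f"{scheme}://{host_and_path}"
        try:
            results[scheme] = fetch(url)
        except OSError as exc:  # URLError, HTTPError, TLS and timeout errors
            results[scheme] = f"error: {exc}"
    return results
```

Calling `compare_schemes("example.com/picture.jpg")` (a placeholder address) returns a dict with one entry per scheme. If the two entries disagree, the client's HTTPS support is the first thing to check.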
EDIT: Fixed it!
I think the version of LWP installed via Debian's software center was either too old to support HTTPS or didn't include
http://search.cpan.org/~mschilli/LWP-Protocol-https-6.06/
As a caveat, the HTTPS module is not always included by default: my install evidently lacked working HTTPS support, and from 6.02 onward it is no longer bundled at all, because
| Quote: | | This module used to be bundled with the libwww-perl, but it was unbundled in v6.02 in order to be able to declare its dependencies properly for the CPAN tool-chain. Applications that need https support can just declare their dependency on LWP::Protocol::https and will no longer need to know what underlying modules to install. |
I was able to resolve this issue by re-installing via the CPAN shell. Using sudo or at a root console:
| Code: | root@AALurker: perl -MCPAN -eshell (may need to initialize, answer all questions with default answer)
cpan> install Bundle::LWP (again answer default or "yes" for all questions)
cpan> install LWP::Protocol::https (again answer default or "yes" for all questions) |
Once I did the above, links_logs.log output was:
| Code: | Initialize Links Object
Checking existance of site in database..
not dupe
Remote Server Mime Type: text/html
Title: Blaze Loves His Kennel (ORIGINAL) Husky Says No to Kennel - Funny - YouTube
Entering site...
MYSQL:INSERT INTO links (site, announcer, edate, type, title, filename, twidth, theight, width, height, appid) VALUES ('https://www.youtube.com/watch?v=hCRDskZrUMU', 'Nocty', '2014-07-25 10:39:46', 'irc', 'Blaze Loves His Kennel (ORIGINAL) Husky Says No to Kennel - Funny - YouTube', NULL, NULL, NULL, NULL, NULL, NULL)
Announcer : Nocty URL : https://www.youtube.com/watch?v=hCRDskZrUMU
$VAR1 = bless( {
'isdupe' => 0,
'date' => '2014-07-25 10:39:46',
'parseline' => '[10:39:46:2014-07-25] <@Nocty> https://www.youtube.com/watch?v=hCRDskZrUMU',
'body' => 'https://www.youtube.com/watch?v=hCRDskZrUMU',
'www_img' => 'https://www.youtube.com/watch?v=hCRDskZrUMU',
'mimetype_returncode' => '200',
'title' => 'Blaze Loves His Kennel (ORIGINAL) Husky Says No to Kennel - Funny - YouTube',
'type' => 'irc',
'announcer' => 'Nocty',
'mimetype' => 'text/html',
'www_url' => 'https://www.youtube.com/watch?v=hCRDskZrUMU'
}, 'Links' );
1 entry added/updated to the database.
done. |
|
|
ngtjah Voice
Joined: 30 Mar 2014 Posts: 5
|
Posted: Fri Jul 25, 2014 11:46 am Post subject: |
|
|
| NICE! Enjoy! |
|
Nocty Voice
Joined: 17 Jun 2014 Posts: 15
|
Posted: Fri Jul 25, 2014 11:59 am Post subject: Thanks! |
|
|
| Thanks a ton for your hard work, great script! I posted the same fix as a comment on your GitHub in case anyone else is a Linux scrub like me and has the same issue. |
|