| Author |
Message |
Goga Halfop
Joined: 19 Sep 2020 Posts: 78
|
Posted: Mon Jan 25, 2021 5:58 am Post subject: urltitle grabber |
|
|
| Code: | # Script to grab titles from webpages - Copyright C.Leonhardt (rosc2112 at yahoo com) Aug.11.2007
# http://members.dandy.net/~fbn/urltitle.tcl.txt
# Loosely based on the tinyurl script by Jer and other bits and pieces of my own..
################################################################################################################
# Usage:
# 1) Set the configs below
# 2) .chanset #channelname +urltitle ;# enable script
# 3) .chanset #channelname +logurltitle ;# enable logging
# Then just input a url in channel and the script will retrieve the title from the corresponding page.
# When reporting bugs, PLEASE include the .set errorInfo debug info!
# Read here: http://forum.egghelp.org/viewtopic.php?t=10215
################################################################################################################
# Configs:
set urltitle(ignore) "bdkqr|dkqr" ;# User flags script will ignore input from
set urltitle(pubmflags) "-|-" ;# user flags required for channel eggdrop use
set urltitle(length) 5 ;# minimum url length to trigger channel eggdrop use
set urltitle(delay) 1 ;# minimum seconds to wait before another eggdrop use
set urltitle(timeout) 60000 ;# geturl timeout (1/1000ths of a second)
################################################################################################################
# Script begins:
package require http ;# You need the http package..
set urltitle(last) 111 ;# Internal variable, stores time of last eggdrop use, don't change..
setudef flag urltitle ;# Channel flag to enable script.
setudef flag logurltitle ;# Channel flag to enable logging of script.
set urltitlever "0.01a"
bind pubm $urltitle(pubmflags) {*://*} pubm:urltitle
proc pubm:urltitle {nick host user chan text} {
global urltitle
if {([channel get $chan urltitle]) && ([expr [unixtime] - $urltitle(delay)] > $urltitle(last)) && \
(![matchattr $user $urltitle(ignore)])} {
foreach word [split $text] {
if {[string length $word] >= $urltitle(length) && \
[regexp {^(f|ht)tp(s|)://} $word] && \
![regexp {://([^/:]*:([^/]*@|\d+(/|$))|.*/\.)} $word]} {
set urltitle(last) [unixtime]
set urtitle [urltitle $word]
if {[string length $urtitle]} {
puthelp "PRIVMSG $chan :$nick: URL Title for $word - \002$urtitle\002"
}
break
}
}
}
if {[channel get $chan logurltitle]} {
foreach word [split $text] {
if {[string match "*://*" $word]} {
putlog "<$nick:$chan> $word -> $urtitle"
}
}
}
# change to return 0 if you want the pubm trigger logged additionally..
return 1
}
proc urltitle {url} {
if {[info exists url] && [string length $url]} {
catch {set http [::http::geturl $url -timeout $::urltitle(timeout)]} error
if {[string match -nocase "*couldn't open socket*" $error]} {
return "Error: couldn't connect..Try again later"
}
if { [::http::status $http] == "timeout" } {
return "Error: connection timed out while trying to contact $url"
}
set data [split [::http::data $http] \n]
::http::cleanup $http
set title ""
if {[regexp -nocase {<title>(.*?)</title>} $data match title]} {
return [string map { {href=} "" \" "" } $title]
} else {
return "No title found."
}
}
}
putlog "Url Title Grabber $urltitlever (rosc) script loaded.."
|
I'm using the Tcl script above as a URL title grabber, but it only works when the posted URL uses HTTP. If the URL uses HTTPS, the bot doesn't reply at all. |
|
|
 |
CrazyCat Revered One

Joined: 13 Jan 2002 Posts: 1032 Location: France
|
Posted: Mon Jan 25, 2021 8:49 am Post subject: Re: urltitle grabber |
|
|
Use the code button next time, please.
The script doesn't load the tls package, so it can't fetch https URLs. Here is a small modification (not tested):
| Code: | # Script to grab titles from webpages - Copyright C.Leonhardt (rosc2112 at yahoo com) Aug.11.2007
# http://members.dandy.net/~fbn/urltitle.tcl.txt
# Loosely based on the tinyurl script by Jer and other bits and pieces of my own..
################################################################################################################
# Usage:
# 1) Set the configs below
# 2) .chanset #channelname +urltitle ;# enable script
# 3) .chanset #channelname +logurltitle ;# enable logging
# Then just input a url in channel and the script will retrieve the title from the corresponding page.
# When reporting bugs, PLEASE include the .set errorInfo debug info!
# Read here: http://forum.egghelp.org/viewtopic.php?t=10215
################################################################################################################
# Configs:
set urltitle(ignore) "bdkqr|dkqr" ;# User flags script will ignore input from
set urltitle(pubmflags) "-|-" ;# user flags required for channel eggdrop use
set urltitle(length) 5 ;# minimum url length to trigger channel eggdrop use
set urltitle(delay) 1 ;# minimum seconds to wait before another eggdrop use
set urltitle(timeout) 60000 ;# geturl timeout (1/1000ths of a second)
################################################################################################################
# Script begins:
package require http ;# You need the http package..
package require tls
set urltitle(last) 111 ;# Internal variable, stores time of last eggdrop use, don't change..
setudef flag urltitle ;# Channel flag to enable script.
setudef flag logurltitle ;# Channel flag to enable logging of script.
set urltitlever "0.01a"
bind pubm $urltitle(pubmflags) {*://*} pubm:urltitle
proc pubm:urltitle {nick host user chan text} {
global urltitle
if {([channel get $chan urltitle]) && ([expr [unixtime] - $urltitle(delay)] > $urltitle(last)) && \
(![matchattr $user $urltitle(ignore)])} {
foreach word [split $text] {
if {[string length $word] >= $urltitle(length) && \
[regexp {^(f|ht)tp(s|)://} $word] && \
![regexp {://([^/:]*:([^/]*@|\d+(/|$))|.*/\.)} $word]} {
set urltitle(last) [unixtime]
set urtitle [urltitle $word]
if {[string length $urtitle]} {
puthelp "PRIVMSG $chan :$nick: URL Title for $word - \002$urtitle\002"
}
break
}
}
}
if {[channel get $chan logurltitle]} {
foreach word [split $text] {
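if {[string match "*://*" $word]} {
# urtitle is only set when a title lookup ran above, so give it a default before logging.
if {![info exists urtitle]} { set urtitle "" }
putlog "<$nick:$chan> $word -> $urtitle"
}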
}
}
# change to return 0 if you want the pubm trigger logged additionally..
return 1
}
proc urltitle {url} {
if {[info exists url] && [string length $url]} {
if {[string match "https://*" $url]} {
::http::register https 443 ::tls::socket
set secure 1
}
catch {set http [::http::geturl $url -timeout $::urltitle(timeout)]} error
if {[string match -nocase "*couldn't open socket*" $error]} {
return "Error: couldn't connect..Try again later"
}
if { [::http::status $http] == "timeout" } {
return "Error: connection timed out while trying to contact $url"
}
set data [split [::http::data $http] \n]
::http::cleanup $http
set title ""
if {[regexp -nocase {<title>(.*?)</title>} $data match title]} {
return [string map { {href=} "" \" "" } $title]
} else {
return "No title found."
}
# https stays registered for later lookups (both branches above return).
}
}
putlog "Url Title Grabber $urltitlever (rosc) script loaded.." |
_________________ https://www.eggdrop.fr - French IRC network
Offer me a coffee - Do not ask me help in PM, we are a community. |
|
|
 |
ComputerTech Master

Joined: 22 Feb 2020 Posts: 393
|
Posted: Mon Jan 25, 2021 3:33 pm Post subject: |
|
|
Tested it, CrazyCat. It works perfectly with both http and https. _________________ ComputerTech |
|
|
 |
m4s Halfop

Joined: 30 Jan 2017 Posts: 97
|
|
|
 |
ComputerTech Master

Joined: 22 Feb 2020 Posts: 393
|
|
|
 |
m4s Halfop

Joined: 30 Jan 2017 Posts: 97
|
Posted: Sun Feb 28, 2021 10:06 am Post subject: |
|
|
I've been using m00nie's YouTube script for a long time, which also requires the http package.
It works. |
|
|
 |
ComputerTech Master

Joined: 22 Feb 2020 Posts: 393
|
Posted: Sun Feb 28, 2021 10:17 am Post subject: |
|
|
I can confirm it works:
| Code: |
<ComputerTech> http://forum.egghelp.org/viewtopic.php?p=109443#109443
<Tech> ComputerTech: URL Title for http://forum.egghelp.org/viewtopic.php?p=109443#109443 - egghelp.org community :: View topic - urltitle grabber
|
However, for your URL https://www.politico.eu/article/trapped-in-germany-covid-coronavirus-nightmare/
I get:
| Code: |
<Tech> [14:18:42] Tcl error [pubm:urltitle]: can't read "http": no such variable
|
Perhaps it's due to an SSL issue with that site, because
| Code: |
<ComputerTech> https://www.google.co.uk/
<Tech> ComputerTech: URL Title for https://www.google.co.uk/ - Google
<ComputerTech> https://www.anope.org/
<Tech> ComputerTech: URL Title for https://www.anope.org/ - Anope IRC Services
|
other https URLs work. _________________ ComputerTech |
|
|
 |
m4s Halfop

Joined: 30 Jan 2017 Posts: 97
|
Posted: Sun Feb 28, 2021 10:57 am Post subject: .set errorInfo |
|
|
After running .set errorInfo I got:
| Code: |
Currently: can't read "http": no such variable
Currently: while executing
Currently: "::http::status $http"
Currently: (procedure "urltitle" line 11)
Currently: invoked from within
Currently: "urltitle $word"
Currently: (procedure "pubm:urltitle" line 7)
Currently: invoked from within
Currently: "pubm:urltitle $_pubm1 $_pubm2 $_pubm3 $_pubm4 $_pubm5"
|
|
|
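A note on the traceback above: the script wraps ::http::geturl in catch but only checks for the "couldn't open socket" message. Any other failure (an unsupported URL type, a failed TLS handshake, and so on) leaves the http variable unset, and the following ::http::status $http call then raises exactly this error. Below is a minimal sketch of a guarded fetch, assuming Tcl 8.5 or later. The name safe_geturl is a hypothetical helper, not part of the posted script.

```tcl
package require http

# Hypothetical helper (not part of the posted script): fetch a URL and
# return a two-element list, either {ok <body>} or {error <message>}.
# The token is only used when geturl actually returned one.
proc safe_geturl {url {timeout 60000}} {
    if {[catch {::http::geturl $url -timeout $timeout} token]} {
        # geturl itself failed; $token holds the error message here.
        return [list error $token]
    }
    set status [::http::status $token]
    if {$status ne "ok"} {
        # timeout, eof, error or reset: free the token and report it.
        ::http::cleanup $token
        return [list error "request ended with status $status"]
    }
    set body [::http::data $token]
    ::http::cleanup $token
    return [list ok $body]
}
```

A caller could then branch on the first element, for example with lassign [safe_geturl $url] state value, and only run the title regexp when state is ok.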
|
 |
m4s Halfop

Joined: 30 Jan 2017 Posts: 97
|
Posted: Tue Mar 02, 2021 2:57 pm Post subject: |
|
|
I changed this line:
| Code: | ::http::register https 443 ::tls::socket |
to
| Code: | http::register https 443 [list ::tls::socket -autoservername true] |
and the "can't read "http": no such variable" message disappeared, but now I get strange characters in the title: https://i.imgur.com/e9cOqzI.jpg
Any idea how to avoid this? |
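One possible cause of the strange characters (an assumption, since the screenshot only shows the symptom) is an encoding mismatch. When the server does not declare a charset, the http package decodes the body as iso8859-1, so a UTF-8 page comes out garbled. The sketch below re-decodes such text. The name fix_utf8 is hypothetical, and the helper should only be applied when the body really was decoded as iso8859-1. HTML entities such as &amp;#8217; are a separate problem and are not handled here.

```tcl
# Hypothetical helper: undo an iso8859-1 decoding of text that was
# really UTF-8. Converting back to iso8859-1 recovers the original
# bytes, which are then decoded as UTF-8.
proc fix_utf8 {text} {
    return [encoding convertfrom utf-8 [encoding convertto iso8859-1 $text]]
}

# "Café" read as iso8859-1 shows up as "CafÃ©" (\u00c3\u00a9).
set garbled "Caf\u00c3\u00a9"
set fixed [fix_utf8 $garbled]
```

In the posted script this would wrap the title just before it is returned. Whether the bot then displays it correctly also depends on the eggdrop build and the IRC client.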
|
|
 |
Carlin0 Voice

Joined: 04 Dec 2018 Posts: 24 Location: Italy
|
|
|
 |
m4s Halfop

Joined: 30 Jan 2017 Posts: 97
|
Posted: Wed Mar 03, 2021 12:22 pm Post subject: |
|
|
Hello,
it doesn't work for me.
The error message is:
Tcl error [::urlmagic::find_urls]: can't read "settings(content-length)": no such element in array
After running .set errorInfo:
| Code: |
Currently: can't read "settings(content-length)": no such element in array
Currently: while executing
Currently: "set content_length $settings(content-length)"
Currently: (procedure "::urlmagic::get_title" line 12)
Currently: invoked from within
Currently: "${ns}::get_title $url"
Currently: (procedure "::urlmagic::find_urls" line 16)
Currently: invoked from within
Currently: "::urlmagic::find_urls $_pubm1 $_pubm2 $_pubm3 $_pubm4 $_pubm5"
|
|
|
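The trace suggests that settings(content-length) is read unconditionally. A server is not required to send a Content-Length header (chunked responses usually omit it), so the array element may simply not exist. Without the urlmagic source at hand, this is only a sketch of the general guard. The name array_get_default is a hypothetical helper, and the settings array here is a stand-in for the real one.

```tcl
# Hypothetical helper: read an array element, falling back to a default
# when the element does not exist (e.g. a header the server never sent).
proc array_get_default {arrayName key default} {
    upvar 1 $arrayName arr
    if {[info exists arr($key)]} {
        return $arr($key)
    }
    return $default
}

# Stand-in array; the real "settings" array belongs to urlmagic.
array set settings {content-type text/html}
set content_length [array_get_default settings content-length 0]
```

With a guard like this the lookup yields 0 instead of raising "no such element in array" when the header is missing.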
|
 |
Carlin0 Voice

Joined: 04 Dec 2018 Posts: 24 Location: Italy
|
Posted: Wed Mar 03, 2021 12:35 pm Post subject: |
|
|
On my Debian 10 VPS it works fine.
Tested with eggdrop 1.8.4 and 1.9.0-rc3. |
|
|
 |
|