urltitle grabber

Post by Goga »

Code:

# Script to grab titles from webpages - Copyright C.Leonhardt (rosc2112 at yahoo com) Aug.11.2007 
# http://members.dandy.net/~fbn/urltitle.tcl.txt
# Loosely based on the tinyurl script by Jer and other bits and pieces of my own..

################################################################################################################

# Usage: 

# 1) Set the configs below
# 2) .chanset #channelname +urltitle        ;# enable script
# 3) .chanset #channelname +logurltitle     ;# enable logging
# Then just input a url in channel and the script will retrieve the title from the corresponding page.

# When reporting bugs, PLEASE include the .set errorInfo debug info! 
# Read here: http://forum.egghelp.org/viewtopic.php?t=10215

################################################################################################################

# Configs:

set urltitle(ignore) "bdkqr|dkqr" 	;# User flags script will ignore input from
set urltitle(pubmflags) "-|-" 		;# user flags required for channel eggdrop use
set urltitle(length) 5	 		;# minimum url length to trigger channel eggdrop use
set urltitle(delay) 1 			;# minimum seconds to wait before another eggdrop use
set urltitle(timeout) 60000 		;# geturl timeout (1/1000ths of a second)

################################################################################################################
# Script begins:

package require http			;# You need the http package..
set urltitle(last) 111 			;# Internal variable, stores time of last eggdrop use, don't change..
setudef flag urltitle			;# Channel flag to enable script.
setudef flag logurltitle		;# Channel flag to enable logging of script.

set urltitlever "0.01a"
bind pubm $urltitle(pubmflags) {*://*} pubm:urltitle
proc pubm:urltitle {nick host user chan text} {
	global urltitle
	if {([channel get $chan urltitle]) && ([expr [unixtime] - $urltitle(delay)] > $urltitle(last)) && \
	(![matchattr $user $urltitle(ignore)])} {
		foreach word [split $text] {
			if {[string length $word] >= $urltitle(length) && \
			[regexp {^(f|ht)tp(s|)://} $word] && \
			![regexp {://([^/:]*:([^/]*@|\d+(/|$))|.*/\.)} $word]} {
				set urltitle(last) [unixtime]
				set urtitle [urltitle $word]
				if {[string length $urtitle]} {
					puthelp "PRIVMSG $chan :$nick: URL Title for $word - \002$urtitle\002"
				}
				break
			}
		}
        }
	if {[channel get $chan logurltitle]} {
		foreach word [split $text] {
			if {[string match "*://*" $word]} {
				putlog "<$nick:$chan> $word -> $urtitle"
			}
		}
	}
	# change to return 0 if you want the pubm trigger logged additionally..
	return 1
}

proc urltitle {url} {
	if {[info exists url] && [string length $url]} {
		catch {set http [::http::geturl $url -timeout $::urltitle(timeout)]} error
		if {[string match -nocase "*couldn't open socket*" $error]} {
			return "Error: couldn't connect..Try again later"
		}
		if { [::http::status $http] == "timeout" } {
			return "Error: connection timed out while trying to contact $url"
		}
		set data [split [::http::data $http] \n]
		::http::cleanup $http
		set title ""
		if {[regexp -nocase {<title>(.*?)</title>} $data match title]} {
			return [string map { {href=} "" \" "" } $title]
		} else {
			return "No title found."
		}
	}
}

putlog "Url Title Grabber $urltitlever (rosc) script loaded.."

I'm using the Tcl script above as a URL title grabber, but it only works when the posted link is HTTP. If a link is posted with HTTPS, the bot doesn't reply at all.

Re: urltitle grabber

Post by CrazyCat »

Use the code button next time, please.

The script doesn't load the tls package, so it can't fetch https URLs. Here is a small modification (not tested):

Code:

# Script to grab titles from webpages - Copyright C.Leonhardt (rosc2112 at yahoo com) Aug.11.2007 
# http://members.dandy.net/~fbn/urltitle.tcl.txt
# Loosely based on the tinyurl script by Jer and other bits and pieces of my own..

################################################################################################################

# Usage: 

# 1) Set the configs below
# 2) .chanset #channelname +urltitle        ;# enable script
# 3) .chanset #channelname +logurltitle     ;# enable logging
# Then just input a url in channel and the script will retrieve the title from the corresponding page.

# When reporting bugs, PLEASE include the .set errorInfo debug info! 
# Read here: http://forum.egghelp.org/viewtopic.php?t=10215

################################################################################################################

# Configs:

set urltitle(ignore) "bdkqr|dkqr" 	;# User flags script will ignore input from
set urltitle(pubmflags) "-|-" 		;# user flags required for channel eggdrop use
set urltitle(length) 5	 		;# minimum url length to trigger channel eggdrop use
set urltitle(delay) 1 			;# minimum seconds to wait before another eggdrop use
set urltitle(timeout) 60000 		;# geturl timeout (1/1000ths of a second)

################################################################################################################
# Script begins:

package require http			;# You need the http package..
package require tls
set urltitle(last) 111 			;# Internal variable, stores time of last eggdrop use, don't change..
setudef flag urltitle			;# Channel flag to enable script.
setudef flag logurltitle		;# Channel flag to enable logging of script.

set urltitlever "0.01a"
bind pubm $urltitle(pubmflags) {*://*} pubm:urltitle
proc pubm:urltitle {nick host user chan text} {
	global urltitle
	if {([channel get $chan urltitle]) && ([expr [unixtime] - $urltitle(delay)] > $urltitle(last)) && \
	(![matchattr $user $urltitle(ignore)])} {
		foreach word [split $text] {
			if {[string length $word] >= $urltitle(length) && \
			[regexp {^(f|ht)tp(s|)://} $word] && \
			![regexp {://([^/:]*:([^/]*@|\d+(/|$))|.*/\.)} $word]} {
				set urltitle(last) [unixtime]
				set urtitle [urltitle $word]
				if {[string length $urtitle]} {
					puthelp "PRIVMSG $chan :$nick: URL Title for $word - \002$urtitle\002"
				}
				break
			}
		}
        }
	if {[channel get $chan logurltitle]} {
		foreach word [split $text] {
			if {[string match "*://*" $word]} {
				putlog "<$nick:$chan> $word -> $urtitle"
			}
		}
	}
	# change to return 0 if you want the pubm trigger logged additionally..
	return 1
}

proc urltitle {url} {
	if {[info exists url] && [string length $url]} {
		if {[string match "https://*" $url]} {
			::http::register https 443 ::tls::socket
			set secure 1
		}
		catch {set http [::http::geturl $url -timeout $::urltitle(timeout)]} error
		if {[string match -nocase "*couldn't open socket*" $error]} {
			return "Error: couldn't connect..Try again later"
		}
		if { [::http::status $http] == "timeout" } {
			return "Error: connection timed out while trying to contact $url"
		}
		set data [split [::http::data $http] \n]
		::http::cleanup $http
		set title ""
		if {[regexp -nocase {<title>(.*?)</title>} $data match title]} {
			return [string map { {href=} "" \" "" } $title]
		} else {
			return "No title found."
		}
		if {[info exists secure]} {
			::http::unregister https
			unset secure
		}
	}
}

putlog "Url Title Grabber $urltitlever (rosc) script loaded.."
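
One thing to note about the modified script above: both branches of the title check return, so the `::http::unregister https` lines placed after them are never reached. A minimal sketch of an alternative, assuming it is acceptable to leave the handler registered for as long as the script is loaded, is to register it once at load time. The proc name `urltitle:register_https` and the variable `urltitle_https` are hypothetical, not part of the original script:

```tcl
package require http

# Hypothetical helper: register the https handler once, at load time.
# Returns 1 if the tls package was found and registered, 0 otherwise.
proc urltitle:register_https {} {
	if {[catch {package require tls} err]} {
		# tls is not installed; https URLs will still fail in geturl.
		return 0
	}
	::http::register https 443 ::tls::socket
	return 1
}

# Remember the outcome so the script can report it if needed.
set urltitle_https [urltitle:register_https]
```

With this in place, the per-request `register`/`unregister` pair inside `urltitle` would no longer be needed.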

Post by ComputerTech »

Tested it, CrazyCat. It works perfectly with both http and https :)

Error

Post by m4s »

Hello,

I got this error message in DCC:

Tcl error [pubm:urltitle]: can't read "http": no such variable

The website I used:
https://www.politico.eu/article/trapped ... nightmare/

Can you please check the script again?
Last edited by m4s on Sun Feb 28, 2021 10:03 am, edited 1 time in total.

Post by ComputerTech »

Just tested it again, and it works :lol:

m4s, you sure you have loaded the http.tcl package?

if not

https://core.tcl-lang.org/tcllib/dir?ci ... me=modules
Last edited by ComputerTech on Sun Feb 28, 2021 10:15 am, edited 1 time in total.

Post by m4s »

ComputerTech wrote: m4s, you sure you have loaded the http.tcl package?

if not

https://core.tcl-lang.org/tcllib/dir?ci ... me=modules

I've been using m00nie's YouTube script for a long time, and it also requires the http package. It works.

Post by ComputerTech »

I can confirm it works:

Code:

<ComputerTech> http://forum.egghelp.org/viewtopic.php?p=109443#109443
<Tech> ComputerTech: URL Title for http://forum.egghelp.org/viewtopic.php?p=109443#109443 - egghelp.org community :: View topic - urltitle grabber

However, with your URL https://www.politico.eu/article/trapped ... nightmare/ I do get:

Code:

<Tech> [14:18:42] Tcl error [pubm:urltitle]: can't read "http": no such variable

Perhaps this is due to an SSL issue with that site, because

Code:

<ComputerTech> https://www.google.co.uk/
<Tech> ComputerTech: URL Title for https://www.google.co.uk/ - Google
<ComputerTech> https://www.anope.org/
<Tech> ComputerTech: URL Title for https://www.anope.org/ - Anope IRC Services

other https URLs work :)

.set errorInfo

Post by m4s »

After .set errorInfo I got

Code:

Currently: can't read "http": no such variable
Currently:     while executing
Currently: "::http::status $http"
Currently:     (procedure "urltitle" line 11)
Currently:     invoked from within
Currently: "urltitle $word"
Currently:     (procedure "pubm:urltitle" line 7)
Currently:     invoked from within
Currently: "pubm:urltitle $_pubm1 $_pubm2 $_pubm3 $_pubm4 $_pubm5"
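
The trace points at `::http::status $http`. In the script, `set http [::http::geturl ...]` runs inside `catch`, so when `geturl` raises any error the variable `http` is never set, and only the "couldn't open socket" message is handled afterwards. A minimal sketch of a guard that checks the result of `catch` itself instead of matching one message; `urltitle:fetch` is a hypothetical name and the two-element return list is an assumption made for illustration:

```tcl
package require http

# Hypothetical helper: returns {ok body} on success, {error message} otherwise.
proc urltitle:fetch {url {timeout 60000}} {
	# catch returns non-zero for ANY geturl failure (unsupported scheme,
	# failed TLS handshake, refused connection, ...). In that case $tok
	# holds the error message, not an http token.
	if {[catch {::http::geturl $url -timeout $timeout} tok]} {
		return [list error "Error: $tok"]
	}
	set status [::http::status $tok]
	if {$status ne "ok"} {
		# Covers "timeout", "eof" and "error"; free the token before leaving.
		::http::cleanup $tok
		return [list error "Error: request ended with status \"$status\""]
	}
	set body [::http::data $tok]
	::http::cleanup $tok
	return [list ok $body]
}
```

A caller would check `[lindex $result 0]` before using the body, so a failed request produces a message in channel rather than a Tcl error. This does not explain why the request to that particular site fails; it only surfaces the underlying error text.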

Post by m4s »

I changed this line:

Code:

::http::register https 443 ::tls::socket
to

Code:

http::register https 443 [list ::tls::socket -autoservername true]

After that change, the "can't read "http": no such variable" message disappeared, but the title now comes back with strange characters: https://i.imgur.com/e9cOqzI.jpg

Any idea how to avoid this?
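
Without the page source it is hard to say where the strange characters come from. One common cause, assumed here, is a UTF-8 page whose bytes ended up decoded as ISO 8859-1, for example because the server sent no charset. A minimal sketch that extracts the title from the raw body (rather than from the `split` list the script uses) and re-decodes it under that assumption; both proc names are hypothetical:

```tcl
# Hypothetical helper: pull the <title> text out of a raw HTML string.
proc urltitle:extract {html} {
	# Allow attributes on the tag and newlines inside the title. Every
	# quantifier is non-greedy so the match stops at the first </title>.
	if {![regexp -nocase {<title[^>]*?>(.*?)</title>} $html -> title]} {
		return ""
	}
	# Collapse runs of whitespace (including newlines) to single spaces.
	return [string trim [regsub -all {\s+} $title " "]]
}

# Hypothetical helper: undo UTF-8 text that was decoded as ISO 8859-1.
# Only valid under the assumption stated above.
proc urltitle:fix_utf8 {text} {
	return [encoding convertfrom utf-8 [encoding convertto iso8859-1 $text]]
}
```

If the page really was ISO 8859-1, the second conversion would damage it, so it is only worth applying when the charset is known to be missing or wrong. If the imgur screenshot shows something else (compressed data or HTML entities, for instance), a different fix would be needed.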

Post by Carlin0 »


Post by m4s »

Hello,

It doesn't work for me. Error message:
Tcl error [::urlmagic::find_urls]: can't read "settings(content-length)": no such element in array

After .set errorInfo:

Code:

Currently: can't read "settings(content-length)": no such element in array
Currently:     while executing
Currently: "set content_length $settings(content-length)"
Currently:     (procedure "::urlmagic::get_title" line 12)
Currently:     invoked from within
Currently: "${ns}::get_title $url"
Currently:     (procedure "::urlmagic::find_urls" line 16)
Currently:     invoked from within
Currently: "::urlmagic::find_urls $_pubm1 $_pubm2 $_pubm3 $_pubm4 $_pubm5"

Post by Carlin0 »

On my Debian 10 VPS it works fine.

Tested with eggdrop 1.8.4 and 1.9.0-rc3.