egghelp.org community
Discussion of eggdrop bots, shell accounts and tcl scripts.
 

urltitle grabber

 
Goga
Posted: Mon Jan 25, 2021 5:58 am    Post subject: urltitle grabber

Code:
# Script to grab titles from webpages - Copyright C.Leonhardt (rosc2112 at yahoo com) Aug.11.2007
# http://members.dandy.net/~fbn/urltitle.tcl.txt
# Loosely based on the tinyurl script by Jer and other bits and pieces of my own..

################################################################################################################

# Usage:

# 1) Set the configs below
# 2) .chanset #channelname +urltitle        ;# enable script
# 3) .chanset #channelname +logurltitle     ;# enable logging
# Then just input a url in channel and the script will retrieve the title from the corresponding page.

# When reporting bugs, PLEASE include the .set errorInfo debug info!
# Read here: http://forum.egghelp.org/viewtopic.php?t=10215

################################################################################################################

# Configs:

set urltitle(ignore) "bdkqr|dkqr"    ;# User flags script will ignore input from
set urltitle(pubmflags) "-|-"       ;# user flags required for channel eggdrop use
set urltitle(length) 5          ;# minimum url length to trigger channel eggdrop use
set urltitle(delay) 1          ;# minimum seconds to wait before another eggdrop use
set urltitle(timeout) 60000       ;# geturl timeout (1/1000ths of a second)

################################################################################################################
# Script begins:

package require http         ;# You need the http package..
set urltitle(last) 111          ;# Internal variable, stores time of last eggdrop use, don't change..
setudef flag urltitle         ;# Channel flag to enable script.
setudef flag logurltitle      ;# Channel flag to enable logging of script.

set urltitlever "0.01a"
bind pubm $urltitle(pubmflags) {*://*} pubm:urltitle
proc pubm:urltitle {nick host user chan text} {
   global urltitle
   if {([channel get $chan urltitle]) && ([expr [unixtime] - $urltitle(delay)] > $urltitle(last)) && \
   (![matchattr $user $urltitle(ignore)])} {
      foreach word [split $text] {
         if {[string length $word] >= $urltitle(length) && \
         [regexp {^(f|ht)tp(s|)://} $word] && \
         ![regexp {://([^/:]*:([^/]*@|\d+(/|$))|.*/\.)} $word]} {
            set urltitle(last) [unixtime]
            set urtitle [urltitle $word]
            if {[string length $urtitle]} {
               puthelp "PRIVMSG $chan :$nick: URL Title for $word - \002$urtitle\002"
            }
            break
         }
      }
   }
   if {[channel get $chan logurltitle]} {
      foreach word [split $text] {
         if {[string match "*://*" $word]} {
            putlog "<$nick:$chan> $word -> $urtitle"
         }
      }
   }
   # change to return 0 if you want the pubm trigger logged additionally..
   return 1
}

proc urltitle {url} {
   if {[info exists url] && [string length $url]} {
      catch {set http [::http::geturl $url -timeout $::urltitle(timeout)]} error
      if {[string match -nocase "*couldn't open socket*" $error]} {
         return "Error: couldn't connect..Try again later"
      }
      if { [::http::status $http] == "timeout" } {
         return "Error: connection timed out while trying to contact $url"
      }
      set data [split [::http::data $http] \n]
      ::http::cleanup $http
      set title ""
      if {[regexp -nocase {<title>(.*?)</title>} $data match title]} {
         return [string map { {href=} "" \" "" } $title]
      } else {
         return "No title found."
      }
   }
}

putlog "Url Title Grabber $urltitlever (rosc) script loaded.."

I'm using the Tcl script above as a URL title grabber, but it only works when the posted link is HTTP. If the link is HTTPS, it doesn't reply at all.

CrazyCat
Posted: Mon Jan 25, 2021 8:49 am    Post subject: Re: urltitle grabber

Use the code button next time, please.

The script doesn't use TLS. Here is a small modification (not tested):

Code:
# Script to grab titles from webpages - Copyright C.Leonhardt (rosc2112 at yahoo com) Aug.11.2007
# http://members.dandy.net/~fbn/urltitle.tcl.txt
# Loosely based on the tinyurl script by Jer and other bits and pieces of my own..

################################################################################################################

# Usage:

# 1) Set the configs below
# 2) .chanset #channelname +urltitle        ;# enable script
# 3) .chanset #channelname +logurltitle     ;# enable logging
# Then just input a url in channel and the script will retrieve the title from the corresponding page.

# When reporting bugs, PLEASE include the .set errorInfo debug info!
# Read here: http://forum.egghelp.org/viewtopic.php?t=10215

################################################################################################################

# Configs:

set urltitle(ignore) "bdkqr|dkqr"    ;# User flags script will ignore input from
set urltitle(pubmflags) "-|-"       ;# user flags required for channel eggdrop use
set urltitle(length) 5          ;# minimum url length to trigger channel eggdrop use
set urltitle(delay) 1          ;# minimum seconds to wait before another eggdrop use
set urltitle(timeout) 60000       ;# geturl timeout (1/1000ths of a second)

################################################################################################################
# Script begins:

package require http         ;# You need the http package..
package require tls
set urltitle(last) 111          ;# Internal variable, stores time of last eggdrop use, don't change..
setudef flag urltitle         ;# Channel flag to enable script.
setudef flag logurltitle      ;# Channel flag to enable logging of script.

set urltitlever "0.01a"
bind pubm $urltitle(pubmflags) {*://*} pubm:urltitle
proc pubm:urltitle {nick host user chan text} {
   global urltitle
   if {([channel get $chan urltitle]) && ([expr [unixtime] - $urltitle(delay)] > $urltitle(last)) && \
   (![matchattr $user $urltitle(ignore)])} {
      foreach word [split $text] {
         if {[string length $word] >= $urltitle(length) && \
         [regexp {^(f|ht)tp(s|)://} $word] && \
         ![regexp {://([^/:]*:([^/]*@|\d+(/|$))|.*/\.)} $word]} {
            set urltitle(last) [unixtime]
            set urtitle [urltitle $word]
            if {[string length $urtitle]} {
               puthelp "PRIVMSG $chan :$nick: URL Title for $word - \002$urtitle\002"
            }
            break
         }
      }
   }
   if {[channel get $chan logurltitle]} {
      foreach word [split $text] {
         if {[string match "*://*" $word]} {
            putlog "<$nick:$chan> $word -> $urtitle"
         }
      }
   }
   # change to return 0 if you want the pubm trigger logged additionally..
   return 1
}

proc urltitle {url} {
   if {[info exists url] && [string length $url]} {
      if {[string match "https://*" $url]} {
         ::http::register https 443 ::tls::socket
         set secure 1
      }
      catch {set http [::http::geturl $url -timeout $::urltitle(timeout)]} error
      if {[string match -nocase "*couldn't open socket*" $error]} {
         return "Error: couldn't connect..Try again later"
      }
      if { [::http::status $http] == "timeout" } {
         return "Error: connection timed out while trying to contact $url"
      }
      set data [split [::http::data $http] \n]
      ::http::cleanup $http
      set title ""
      if {[regexp -nocase {<title>(.*?)</title>} $data match title]} {
         return [string map { {href=} "" \" "" } $title]
      } else {
         return "No title found."
      }
      if {[info exists secure]} {
         ::http::unregister https
         unset secure
      }
   }
}

putlog "Url Title Grabber $urltitlever (rosc) script loaded.."
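
A note on the https registration in the script above: both branches of the final if/else in urltitle return, so the ::http::unregister block after them is never reached, and the scheme is simply re-registered on every https URL. That is harmless, but registering once when the script loads is simpler. A minimal sketch, assuming the tls package may or may not be installed; urltitle_register_https and urltitle_https_ok are hypothetical names, not part of the posted script:

```tcl
package require http

# Hypothetical helper (not in the posted script): register the https
# scheme once at load time. Returns 1 if TLS is available, 0 otherwise.
proc urltitle_register_https {} {
    if {[catch {package require tls} err]} {
        # Without tls, ::http::geturl rejects https URLs as unsupported.
        return 0
    }
    ::http::register https 443 ::tls::socket
    return 1
}

# Remember the outcome so the script can report it or skip https links.
set urltitle_https_ok [urltitle_register_https]
```

With this in place, the per-call register/unregister lines inside urltitle would no longer be needed.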

ComputerTech
Posted: Mon Jan 25, 2021 3:33 pm

Tested it, CrazyCat. It works perfectly with both http and https :)

m4s
Posted: Sun Feb 28, 2021 9:38 am    Post subject: Error

Hello,

I got this error message in DCC:

Tcl error [pubm:urltitle]: can't read "http": no such variable

The website I used:
https://www.politico.eu/article/trapped-in-germany-covid-coronavirus-nightmare/

Can you please check this script again?


Last edited by m4s on Sun Feb 28, 2021 10:03 am; edited 1 time in total

ComputerTech
Posted: Sun Feb 28, 2021 9:55 am

Just tested it again, and it works :D

m4s, are you sure you have loaded the http package?

If not:

https://core.tcl-lang.org/tcllib/dir?ci=e3475de99399b361&name=modules


Last edited by ComputerTech on Sun Feb 28, 2021 10:15 am; edited 1 time in total

m4s
Posted: Sun Feb 28, 2021 10:06 am

ComputerTech wrote:
m4s, you sure you have loaded the http.tcl package?

if not

https://core.tcl-lang.org/tcllib/dir?ci=e3475de99399b361&name=modules


I've been using m00nie's YouTube script for a long time, which also requires the http package, and it works.

ComputerTech
Posted: Sun Feb 28, 2021 10:17 am

I can confirm it works:

Code:

<ComputerTech> http://forum.egghelp.org/viewtopic.php?p=109443#109443
<Tech> ComputerTech: URL Title for http://forum.egghelp.org/viewtopic.php?p=109443#109443 - egghelp.org community :: View topic - urltitle grabber

However, yes, for your URL https://www.politico.eu/article/trapped-in-germany-covid-coronavirus-nightmare/ I get:
Code:

<Tech> [14:18:42] Tcl error [pubm:urltitle]: can't read "http": no such variable


Perhaps it is due to an SSL issue with that site, because
Code:

<ComputerTech> https://www.google.co.uk/
<Tech> ComputerTech: URL Title for https://www.google.co.uk/ - Google
<ComputerTech> https://www.anope.org/
<Tech> ComputerTech: URL Title for https://www.anope.org/ - Anope IRC Services

other https URLs work :)

m4s
Posted: Sun Feb 28, 2021 10:57 am    Post subject: .set errorInfo

After .set errorInfo I got:
Code:

Currently: can't read "http": no such variable
Currently:     while executing
Currently: "::http::status $http"
Currently:     (procedure "urltitle" line 11)
Currently:     invoked from within
Currently: "urltitle $word"
Currently:     (procedure "pubm:urltitle" line 7)
Currently:     invoked from within
Currently: "pubm:urltitle $_pubm1 $_pubm2 $_pubm3 $_pubm4 $_pubm5"
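
Reading the trace against the script: ::http::status $http fails because http was never set. The catch around ::http::geturl swallowed an error whose text did not contain "couldn't open socket", so the script carried on without a token. Checking the return value of catch itself avoids that. A minimal sketch of the idea; safe_fetch is a hypothetical helper name and the 10000 ms default timeout is an assumption, neither comes from the posted script:

```tcl
package require http

# Hypothetical helper: fetch a URL and return a two-element list,
# either {ok <body>} or {error <message>}. It never touches the
# token unless ::http::geturl actually returned one.
proc safe_fetch {url {timeout 10000}} {
    # catch returns non-zero on ANY error, whatever the message says
    if {[catch {::http::geturl $url -timeout $timeout} token]} {
        return [list error $token]
    }
    set status [::http::status $token]
    if {$status ne "ok"} {
        # covers "timeout", "eof", "error" and "reset"
        ::http::cleanup $token
        return [list error "request ended with status '$status'"]
    }
    set body [::http::data $token]
    ::http::cleanup $token
    return [list ok $body]
}
```

Inside urltitle, the result could then be unpacked with lassign [safe_fetch $url $::urltitle(timeout)] state payload, returning an error string to the channel when state is error. That also surfaces the real TLS error message instead of the misleading "no such variable".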

m4s
Posted: Tue Mar 02, 2021 2:57 pm

I changed this line:

Code:
::http::register https 443 ::tls::socket

to
Code:
http::register https 443 [list ::tls::socket -autoservername true]


and the "can't read "http": no such variable" message disappeared, but now I get a result with strange characters in the title: https://i.imgur.com/e9cOqzI.jpg

Any idea to avoid this?
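
The screenshot is not reproduced here, so the following is only a guess at the cause: the usual source of such characters is a UTF-8 page whose bytes were decoded as ISO 8859-1, which as far as I know is what the http package falls back to when the server's Content-Type header names no charset. A minimal sketch of reversing that, assuming this is what happened; fix_utf8_title is a hypothetical helper name:

```tcl
# Hypothetical helper: undo a wrong ISO 8859-1 decode of UTF-8 bytes.
# Only apply it to text known to be mis-decoded this way; characters
# outside ISO 8859-1 cannot be mapped back to bytes and would be lost.
proc fix_utf8_title {title} {
    # Turn the mis-decoded characters back into the original bytes...
    set bytes [encoding convertto iso8859-1 $title]
    # ...then decode those bytes as the UTF-8 they really were.
    return [encoding convertfrom utf-8 $bytes]
}

# Example: simulate the damage, then repair it.
set original "caf\u00e9"
set broken [encoding convertfrom iso8859-1 [encoding convertto utf-8 $original]]
set repaired [fix_utf8_title $broken]
```

If the title still looks wrong after this, the cause may lie elsewhere, for instance HTML entities in the title or the bot's own output encoding.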

Carlin0
Posted: Wed Mar 03, 2021 11:34 am

This works best:

http://tclarchive.org/search.php?str=urlmagic&cb1=t&cb2=a&cb3=d&sub.x=0&sub.y=0

m4s
Posted: Wed Mar 03, 2021 12:22 pm

pachisapiu wrote:
this works best

http://tclarchive.org/search.php?str=urlmagic&cb1=t&cb2=a&cb3=d&sub.x=0&sub.y=0


Hello,

Not for me. Error message:
Tcl error [::urlmagic::find_urls]: can't read "settings(content-length)": no such element in array

After .set errorInfo:
Code:

Currently: can't read "settings(content-length)": no such element in array
Currently:     while executing
Currently: "set content_length $settings(content-length)"
Currently:     (procedure "::urlmagic::get_title" line 12)
Currently:     invoked from within
Currently: "${ns}::get_title $url"
Currently:     (procedure "::urlmagic::find_urls" line 16)
Currently:     invoked from within
Currently: "::urlmagic::find_urls $_pubm1 $_pubm2 $_pubm3 $_pubm4 $_pubm5"
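
The trace says the script reads settings(content-length) unconditionally. Servers are not obliged to send a Content-Length header (responses using chunked transfer encoding normally omit it), so that element can legitimately be missing. The usual guard is info exists. A minimal sketch of the pattern, not a patch for urlmagic itself, whose source beyond the trace above I have not seen; array_get_default and the sample array contents are made up for illustration:

```tcl
# Hypothetical helper: read an array element, or return a default
# when the element does not exist.
proc array_get_default {arrayName key default} {
    upvar 1 $arrayName arr
    if {[info exists arr($key)]} {
        return $arr($key)
    }
    return $default
}

# Made-up header array with no content-length entry, as in the trace.
array set settings {content-type text/html}
set content_length [array_get_default settings content-length 0]
set content_type   [array_get_default settings content-type ""]
```

Here content_length falls back to 0 instead of raising "no such element in array".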

Carlin0
Posted: Wed Mar 03, 2021 12:35 pm

On my Debian 10 VPS it works fine.

Tested with eggdrop 1.8.4 and 1.9.0-rc3.

All times are GMT - 4 Hours

 