
Quick REGEXP Help!

FTL25
Voice
Posts: 17
Joined: Mon Nov 14, 2005 5:39 pm


Post by FTL25 »

I have the following line of XML I'm reading from:

<pubDate>Tue, 15 Nov 2005 13:44:46 PST</pubDate>

In the code I use:

regexp {<pubDate>(.*)</pubDate>} $body - date

to get it, and...

puthelp "PRIVMSG $channel :HeadLine: $date"

to output it.

As output, I expect to get:

[7:25pm]«@ BOTNICK» HeadLine: Tue, 15 Nov 2005 13:44:46 PST

instead, I get...

[7:25pm]«@ BOTNICK» HeadLine: Tue, 15 Nov 2005 13:44:46 PST</pubDate>

How do I get rid of that trailing </pubDate>?

Post by FTL25 »

Maybe I should have been more clear...

I'm trying to write my own RSS news feed bot that reads this site's feed (the $rssfeed URL in the code below).

What I want to do is have the bot output

The Title
The Date
The Description
The Link

Of the most recent item in the news feed. It looks like that will always be:

The 3rd "<title>" on that page for the Title of the article.

The 1st "<pubDate>" on that page for the article Date.

The 2nd "<description>" on that page for the article Description.

The 3rd "<link>" on that page, for the Link to the article.

This is how I was trying to get the info for the bot to pull. So far I could only get the "<pubDate>" one to work, because it's always the 1st "<pubDate>" on the page lol... I haven't figured out how to do the others yet. I guess I'll have to use a loop to get to the right occurrence of each tag on the page. I'll figure that out later :wink: For now I just want to figure out how to get rid of that "</pubDate>" at the end of my output!
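A loop is not strictly needed to reach the 3rd <title>. A minimal sketch, assuming plain tags with no attributes: `regexp -all -inline` with a non-greedy `(.*?)` collects every match at once, and `lindex` picks the one you want. `nth_tag` is a hypothetical helper name, and the sample string only imitates the top of the ESPN feed. Counting occurrences is fragile, though: it breaks as soon as the feed adds or drops a tag before the item.

```tcl
# Hypothetical helper (not part of egghttp or eggdrop): return the text
# captured from the Nth (1-based) <tag>...</tag> pair in $xml, or "" if
# there are fewer than N. Assumes the tag is written without attributes.
proc nth_tag {xml tag n} {
    # With one capture group, -all -inline returns a flat list:
    #   {match1 capture1 match2 capture2 ...}
    # so the Nth capture sits at index 2N-1.
    set hits [regexp -all -inline "<${tag}>(.*?)</${tag}>" $xml]
    return [lindex $hits [expr {2 * $n - 1}]]
}

# Small sample shaped like the top of the ESPN feed (values shortened)
set sample "<channel><title>ESPN.com</title>
<image><title>ESPN logo</title></image>
<item><title>Latest headline</title></item></channel>"

puts [nth_tag $sample title 3]   ;# prints: Latest headline
```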

Here's the code I'm using:

Code:

set rssfeed "http://sports.espn.go.com/espn/rss/news"
set trigger "!latest"
set channel "#chan"

if {![info exists egghttp(version)]} {
  putlog "egghttp.tcl was NOT successfully loaded."
  putlog "egghttp_example.tcl has not been loaded as a result."
} else {
  proc your_callbackproc {sock} {

    global rssfeed channel

    set headers [egghttp:headers $sock]
    set body [egghttp:data $sock]

    regexp {<title>(.*)</title>} $body - title
    regexp {<pubDate>(.*)</pubDate>} $body - date
    regexp {<description>(.*)</description>} $body - desc
    regexp {<link>(.*)</link>} $body - link

    puthelp "PRIVMSG $channel :Latest Top Headline: $title"
    puthelp "PRIVMSG $channel :Published: $date"
    puthelp "PRIVMSG $channel :Description: $desc"
    puthelp "PRIVMSG $channel :Link: $link"
  }

  bind pub -|* $trigger top:trigger
  proc top:trigger {nick host hand chan text} {
    global rssfeed 
    set sock [egghttp:geturl $rssfeed your_callbackproc]
    return 1
  }  

  putlog "egghttp_example.tcl has been successfully loaded."
}
Here's what the output looks like:
[9:57pm] <Me> !latest
[9:57pm]«@ Bot» Latest Top Headline: ESPN.com</title>
[9:57pm]«@ Bot» Published: Tue, 15 Nov 2005 16:49:23 PST</pubDate>
[9:57pm]«@ Bot» Description: Latest news from ESPN.com</description>
[9:57pm]«@ Bot» Link: http://espn.go.com/</link> <description>Latest news from ESPN.com</description>
I'm new to this, so go easy :wink:

It's really working exactly as it's told to... just not the way I want it to!
demond
Revered One
Posts: 3073
Joined: Sat Jun 12, 2004 9:58 am
Location: San Francisco, CA

Post by demond »

have a look at the rssnews.tcl source

among other things, it does exactly what you need to do - parse XML and extract tag contents
connection, sharing, dcc problems? click <here>
before asking for scripting help, read <this>
use [code] tag when posting logs, code

Post by FTL25 »

Thanks demond. I've looked at it more than once, and I'm starting to understand it more as I learn more :)

I've replaced
regexp {<title>(.*)</title>} $body - title
regexp {<pubDate>(.*)</pubDate>} $body - date
regexp {<description>(.*)</description>} $body - desc
regexp {<link>(.*)</link>} $body - link
with
regexp {<title>(.*?)</title>} $body - title
regexp {<pubDate>(.*?)</pubDate>} $body - date
regexp {<description>(.*?)</description>} $body - desc
regexp {<link>(.*?)</link>} $body - link
and it got rid of the tags at the end of each line of output, the ones that werent supposed to be there.
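Why the `?` helps can be shown in plain tclsh, without eggdrop. A minimal sketch with a made-up two-title string: greedy `.*` runs to the last `</title>` in the text, while non-greedy `.*?` stops at the first one. The greedy capture also spans a newline; when such a string is passed to puthelp, likely only the part before the first newline reaches the channel, which would explain why the bot showed just "ESPN.com</title>".

```tcl
# Two <title> tags, like the channel title and the image title in the feed
set body "<title>ESPN.com</title>\n<image><title>ESPN logo</title></image>"

# Greedy: .* grabs as much as it can, so the capture runs to the LAST </title>
regexp {<title>(.*)</title>} $body - greedy

# Non-greedy: .*? grabs as little as it can, so it stops at the FIRST </title>
regexp {<title>(.*?)</title>} $body - lazy

puts "greedy: $greedy"
puts "lazy:   $lazy"
```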

So now it looks right...
[10:13pm] <Me> !latest
[10:13pm]«@ Bot» Latest Top Headline: ESPN.com
[10:13pm]«@ Bot» Published: Tue, 15 Nov 2005 16:49:23 PST
[10:13pm]«@ Bot» Description: Latest news from ESPN.com
[10:13pm]«@ Bot» Link: http://espn.go.com/
instead of...
[9:57pm] <Me> !latest
[9:57pm]«@ Bot» Latest Top Headline: ESPN.com</title>
[9:57pm]«@ Bot» Published: Tue, 15 Nov 2005 16:49:23 PST</pubDate>
[9:57pm]«@ Bot» Description: Latest news from ESPN.com</description>
[9:57pm]«@ Bot» Link: http://espn.go.com/</link> <description>Latest news from ESPN.com</description>
Now I just need to have it get the right ones for Headline, Description and Link... any tips?

Post by FTL25 »

demond, what part of your rssnews code gets you to the right part of the XML? Because of all the repeated patterns, you can't just use the first <title>, for example; you have to make sure you're at the right spot to get the text you want. This is the part I still can't figure out!

Post by demond »

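RSS feeds have a standard XML structure: the <title>, <link>, <description> and <pubDate> of each news entry are enclosed in <item> tags, while the feed's own <title>, <link> and <description> sit directly under <channel>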
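Put into code, that means cutting out the first <item>...</item> block and only then searching for the tags inside it, so the channel's and the image's own <title> and <link> are never seen. A minimal sketch, assuming tags without attributes; `first_item` is a hypothetical helper, the sample string and `http://example.com/story` are made up, and CDATA wrappers are not removed here. It returns a flat name/value list so it also works with `array set` on older Tcl versions without `dict`.

```tcl
# Hypothetical helper: return title/link/pubDate/description of the first
# <item> as a flat name/value list (usable with "array set"). Searching only
# inside the <item> skips the channel's and the image's own tags.
proc first_item {xml} {
    set result {}
    # Non-greedy, so only the first <item>...</item> is captured
    if {![regexp {<item>(.*?)</item>} $xml - item]} {
        return $result
    }
    foreach tag {title link pubDate description} {
        # Tags missing from the item are simply left out of the result
        if {[regexp "<${tag}>(.*?)</${tag}>" $item - value]} {
            lappend result $tag [string trim $value]
        }
    }
    return $result
}

set sample "<channel><title>ESPN.com</title><link>http://espn.go.com/</link>
<item><title>Latest headline</title>
<pubDate>Wed, 16 Nov 2005 09:24:27 PST</pubDate>
<link>http://example.com/story</link></item></channel>"

array set info [first_item $sample]
puts "$info(title) | $info(pubDate) | $info(link)"
```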

Post by FTL25 »

Okay, this is the part of the XML I'm looking at, from the feed URL I'm getting the news from:

Code:

<?xml version="1.0" encoding="iso-8859-1" ?>
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:sy="http://purl.org/rss/1.0/modules/syndication/" xmlns:admin="http://webns.net/mvcb/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:atom="http://purl.org/atom/ns#" xmlns:content="http://purl.org/rss/1.0/modules/content/">
<channel>
  <title>ESPN.com</title>
  <link>http://espn.go.com/</link>
  <description>Latest news from ESPN.com</description>
  <language>en-us</language>
  <atom:link rel="start" href="http://sports.espn.go.com/espn/rss/news?null" />
  <lastBuildDate>Wed, 16 Nov 2005 20:15:52 PST</lastBuildDate>
  <docs>http://backend.userland.com/rss</docs>
  <managingEditor>webmaster@espn.go.com</managingEditor>
  <image>
    <url>http://espn-att.starwave.com/i/tvlistings/tv_espn_original.gif</url>
    <title>ESPN logo</title>
    <link>http://espn.go.com</link>
    <width>84</width>
    <height>34</height>
  </image>
  <ttl>30</ttl>
  <dc:rights>Copyright 2005</dc:rights>
  <admin:generatorAgent rdf:resource="http://espn.go.com/rss/?v=0.9beta" />
  <admin:errorReportsTo rdf:resource="mailto:customer.service@espn.go.com" />
  <sy:updatePeriod>hourly</sy:updatePeriod>
  <sy:updateFrequency>1</sy:updateFrequency>
  <sy:updateBase>2000-01-01T12:00+00:00</sy:updateBase>
  <item>
    <dc:creator>
      <![CDATA[ John Carroll
      ]]>
    </dc:creator>
    <title>
      <![CDATA[ Carroll: Comparing Brown and Jackson and their iffy teams
      ]]>
    </title>
    <description>
      <![CDATA[ L.A. Showdown: Brown, Jackson meet again<br /><br /> by John Carroll<br/><br/>When Phil Jackson and Larry Brown walk onto the Staples Center floor tonight, it will be the first time these two coaches have met since June 16, 2004.  That was Game 5 of the NBA Finals and the Detroit Pistons, the team Brown coached, won 100-87, clinching...
      ]]>
    </description>
    <pubDate>Wed, 16 Nov 2005 09:24:27 PST</pubDate>
    <guid>http://insider.espn.go.com/nba/insider/columns/story?columnist=carroll_john&id=2225946&campaign=rss&source=ESPNHeadlines</guid>
    <link>http://insider.espn.go.com/nba/insider/columns/story?columnist=carroll_john&id=2225946&campaign=rss&source=ESPNHeadlines</link>
  </item>
I'm trying to use

Code:

<![CDATA[ Carroll: Comparing Brown and Jackson and their iffy teams ]]> 
For the $title

Code:

<![CDATA[ L.A. Showdown: Brown, Jackson meet again<br /><br /> by John Carroll<br/><br/>When Phil Jackson and Larry Brown walk onto the Staples Center floor tonight, it will be the first time these two coaches have met since June 16, 2004.  That was Game 5 of the NBA Finals and the Detroit Pistons, the team Brown coached, won 100-87, clinching... ]]>
For the $description

and...

Code:

<link>http://insider.espn.go.com/nba/insider/columns/story?columnist=carroll_john&id=2225946&campaign=rss&source=ESPNHeadlines</link> 
For the $link

As for the $date in my script, I can get that, because there is only one "<pubDate> * </pubDate>".

The others ($title, $description and $link) are the ones I'm having trouble with, because I don't know how to tell the script to use the ones I want from the XML above :cry:
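Besides finding the right tag, the title and description in this feed are wrapped in `<![CDATA[ ... ]]>` with line breaks around them, which has to be stripped before printing. A minimal sketch, assuming tags without attributes and that you pass in only the <item> section (so the first match is the one you want); `tag_text` is a hypothetical helper name and the sample copies the item shown above.

```tcl
# Hypothetical helper: text of the first <tag>...</tag> in $xml with an
# optional <![CDATA[ ... ]]> wrapper and surrounding whitespace removed.
proc tag_text {xml tag} {
    if {![regexp "<${tag}>(.*?)</${tag}>" $xml - text]} {
        return ""
    }
    set text [string trim $text]
    # If the content is wrapped in CDATA, keep only what is inside it;
    # regexp leaves $text untouched when there is no match
    regexp {^<!\[CDATA\[(.*)\]\]>$} $text - text
    return [string trim $text]
}

# Shaped like the <item> in the ESPN feed
set item {<title>
  <![CDATA[ Carroll: Comparing Brown and Jackson and their iffy teams
  ]]>
  </title>
  <pubDate>Wed, 16 Nov 2005 09:24:27 PST</pubDate>}

puts [tag_text $item title]     ;# prints: Carroll: Comparing Brown and Jackson and their iffy teams
puts [tag_text $item pubDate]   ;# prints: Wed, 16 Nov 2005 09:24:27 PST
```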

Post by demond »

if you can't grasp the rssnews.tcl code (a pretty streamlined use of Tcl and regexps that does exactly what you want), you probably need to study Tcl and regexps in greater detail

Post by FTL25 »

Okay... I've looked around and haven't found too much on regexp.

From the "Enhancing Your Eggdrop" page:
If you have some experience writing Tcl scripts and would like to write your own for Eggdrop, have a read through the Beginners Guide to TCL, and be sure to check out tcl-commands.doc in the /doc directory which contains information on all of Eggdrop's built-in Tcl commands. If you're completely new to Tcl, try the excellent Guide to TCL scripting for Eggdrop 1.6. And download yourself a copy of the Tcl Manual for quick reference.
I've read both of those guides and looked over tcl-commands.doc briefly.

Do you know of any other guides or any good reads on this stuff anywhere else? I'll try all I can to learn it!

Post by demond »

in these days when more & more people seem to be too lazy to help themselves, it's refreshing to see a person like yourself with a genuine desire to learn and code :)

the aforementioned beginner's guides will help you write the most basic scripts only and not much more than that; if you are serious about scripting, you need to be a decent programmer in the first place, i.e. to understand data structures & algorithms, memory management, operating system's facilities, networking and how computers operate in general (needless to say, you ought to be proficient in at least one real programming language, preferably C)

once you have the programming basics, you should buy a Tcl book and/or explore and study in great detail Tcl Developer Site and The Tcler's Wiki; these sites feature tons of learning resources for those who are eager to become serious scripters

of course, to be able to produce quality & powerful eggdrop scripts, you must also know tcl-commands.doc inside & out, even by heart :) better yet, you should dig into eggdrop's C source code and grasp its internals (prerequisite of which is knowing and understanding the IRC protocol as defined in RFC1459 and other technical documents)

I know that's not an easy path, and the learning curve could be steep & long; but if you are really serious, this is the way

Post by FTL25 »

You're right, it's not the easiest path, but it's what I like to do!

I'm a computer science major right now. I've only taken the COBOL and advanced COBOL courses so far, but C++ is coming up either this spring or next fall, and VB definitely this spring. I tried Sams Teach Yourself C++ a couple of years back, but lost interest :( I'm more of a visual learner, and it's a lot easier when someone is actually teaching it to me and is there to show me some things. After learning some basics of Tcl though, I realize how much COBOL sucks!!! haha

But anyways, thanks for those links... The Tcler's Wiki looks especially cool. Back to reading...

:)