| Author |
Message |
FTL25 Voice
Joined: 14 Nov 2005 Posts: 17
|
Posted: Tue Nov 15, 2005 8:34 pm Post subject: Quick REGEXP Help! |
|
|
I have the following line of XML I'm reading from:
<pubDate>Tue, 15 Nov 2005 13:44:46 PST</pubDate>
In the code I use:
regexp {<pubDate>(.*)</pubDate>} $body - date
to get it, and...
puthelp "PRIVMSG $channel :HeadLine: $date"
to output it.
As output, I expect to get:
[7:25pm]«@ BOTNICK» HeadLine: Tue, 15 Nov 2005 13:44:46 PST
instead, I get...
[7:25pm]«@ BOTNICK» HeadLine: Tue, 15 Nov 2005 13:44:46 PST</pubDate>
How do I get rid of that </pubDate> ??? |
|
|
 |
FTL25 Voice
Joined: 14 Nov 2005 Posts: 17
|
Posted: Tue Nov 15, 2005 10:48 pm Post subject: |
|
|
Maybe I should have been more clear...
I'm trying to write my own rss news feed bot, from this site.
What I want to do is have the bot output
The Title
The Date
The Description
The Link
Of the most recent news feed. It looks like that will always be:
The 3rd "<title>" on that page for the Title of the article.
The 1st "<pubDate>" on that page for the article Date.
The 2nd "<description>" on that page for the article Description.
The 3rd "<link>" on that page, for the Link to the article.
This is how I was trying to get my info for the bot to pull. So far I could only get the "<pubDate>" one to work, because it's always the 1st "<pubDate>" on the page lol... I haven't figured out how to do the others yet. I guess I'll have to use a loop to bring it to the correct trigger on the page. I'll figure that out later. For now I just want to figure out how to get rid of that "</pubDate>" at the end of my output!
Here's the code I'm using:
| Code: |
set rssfeed "http://sports.espn.go.com/espn/rss/news"
set trigger "!latest"
set channel "#chan"
if {![info exists egghttp(version)]} {
    putlog "egghttp.tcl was NOT successfully loaded."
    putlog "egghttp_example.tcl has not been loaded as a result."
} else {
    proc your_callbackproc {sock} {
        global rssfeed channel
        set headers [egghttp:headers $sock]
        set body [egghttp:data $sock]
        regexp {<title>(.*)</title>} $body - title
        regexp {<pubDate>(.*)</pubDate>} $body - date
        regexp {<description>(.*)</description>} $body - desc
        regexp {<link>(.*)</link>} $body - link
        puthelp "PRIVMSG $channel :Latest Top Headline: $title"
        puthelp "PRIVMSG $channel :Published: $date"
        puthelp "PRIVMSG $channel :Description: $desc"
        puthelp "PRIVMSG $channel :Link: $link"
    }
    bind pub -|* $trigger top:trigger
    proc top:trigger {nick host hand chan text} {
        global rssfeed
        set sock [egghttp:geturl $rssfeed your_callbackproc]
        return 1
    }
    putlog "egghttp_example.tcl has been successfully loaded."
}
|
Here's what the output looks like:
| Quote: | [9:57pm] <Me> !latest
[9:57pm]«@ Bot» Latest Top Headline: ESPN.com</title>
[9:57pm]«@ Bot» Published: Tue, 15 Nov 2005 16:49:23 PST</pubDate>
[9:57pm]«@ Bot» Description: Latest news from ESPN.com</description>
[9:57pm]«@ Bot» Link: http://espn.go.com/</link> <description>Latest news from ESPN.com</description>
|
I'm new to this, so go easy
It's really working exactly as it's told to... just not working the way I want it to! |
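For readers following along: the trailing tags in the output above are consistent with greedy matching. In Tcl's regular expressions, (.*) takes the longest match it can, so it runs to the last closing tag in the body, while (.*?) takes the shortest and stops at the first one. Below is a minimal sketch of the difference. The two-entry body is made up for illustration and is not the real ESPN feed.

```tcl
# Hypothetical body with two <pubDate> tags, standing in for the real feed.
set body {<pubDate>Tue, 15 Nov 2005 16:49:23 PST</pubDate><item><pubDate>Mon, 14 Nov 2005 10:00:00 PST</pubDate></item>}

# Greedy: (.*) extends to the LAST </pubDate> in the body,
# so the capture still contains the first closing tag.
regexp {<pubDate>(.*)</pubDate>} $body - greedy

# Non-greedy: (.*?) stops at the FIRST </pubDate>.
regexp {<pubDate>(.*?)</pubDate>} $body - lazy

puts "greedy: $greedy"
puts "lazy:   $lazy"
```

On the real feed the greedy capture would span much more text than a single date, which is why stray closing tags appear in the channel output.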
|
|
 |
demond Revered One

Joined: 12 Jun 2004 Posts: 3073 Location: San Francisco, CA
|
Posted: Tue Nov 15, 2005 11:10 pm Post subject: |
|
|
have a look at rssnews.tcl source
among other things, it does exactly what you need to do - parse XML and extract tag contents _________________ connection, sharing, dcc problems? click <here>
before asking for scripting help, read <this>
use [code] tag when posting logs, code |
|
|
 |
FTL25 Voice
Joined: 14 Nov 2005 Posts: 17
|
Posted: Tue Nov 15, 2005 11:26 pm Post subject: |
|
|
Thanks demond. I've looked at it more than once. Starting to understand it more as I learn more.
I've replaced
| Quote: | regexp {<title>(.*)</title>} $body - title
regexp {<pubDate>(.*)</pubDate>} $body - date
regexp {<description>(.*)</description>} $body - desc
regexp {<link>(.*)</link>} $body - link |
with
| Quote: | regexp {<title>(.*?)</title>} $body - title
regexp {<pubDate>(.*?)</pubDate>} $body - date
regexp {<description>(.*?)</description>} $body - desc
regexp {<link>(.*?)</link>} $body - link |
and it got rid of the tags at the end of each line of output, the ones that weren't supposed to be there.
So now it looks right... | Quote: | [10:13pm] <Me> !latest
[10:13pm]«@ Bot» Latest Top Headline: ESPN.com
[10:13pm]«@ Bot» Published: Tue, 15 Nov 2005 16:49:23 PST
[10:13pm]«@ Bot» Description: Latest news from ESPN.com
[10:13pm]«@ Bot» Link: http://espn.go.com/ |
instead of... | Quote: | [9:57pm] <Me> !latest
[9:57pm]«@ Bot» Latest Top Headline: ESPN.com</title>
[9:57pm]«@ Bot» Published: Tue, 15 Nov 2005 16:49:23 PST</pubDate>
[9:57pm]«@ Bot» Description: Latest news from ESPN.com</description>
[9:57pm]«@ Bot» Link: http://espn.go.com/</link> <description>Latest news from ESPN.com</description> |
Now I just need to have it get the right ones for Headline, Description and Link... any tips? |
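One possible way to pick "the Nth occurrence" of a tag, as described earlier in the thread, is to collect every match with regexp -all -inline and index into the resulting list. This is only a sketch: the body below is a made-up three-title sample, and relying on a fixed position is fragile if the feed layout ever changes.

```tcl
# Hypothetical body: channel title, image title, then the first article title.
set body {<title>ESPN.com</title><title>ESPN logo</title><title>Top story</title>}

# -all -inline returns a flat list: full match, capture, full match, capture, ...
set matches [regexp -all -inline {<title>(.*?)</title>} $body]

# The capture of the Nth match sits at index 2*N - 1, so the 3rd title is index 5.
set n 3
set third [lindex $matches [expr {2 * $n - 1}]]

puts "3rd title: $third"
```

The same pattern would apply to the 2nd <description> and 3rd <link> by changing the tag name and the value of n.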
|
|
 |
FTL25 Voice
Joined: 14 Nov 2005 Posts: 17
|
Posted: Wed Nov 16, 2005 11:32 am Post subject: |
|
|
demond, what part of your rssnews code gets you to the right part of the XML code? Because of all the repeated patterns, you can't just use the first <title>, for example... you've got to make sure you're at the right spot to get the text you want. This is the part I still can't figure out! |
|
|
 |
demond Revered One

Joined: 12 Jun 2004 Posts: 3073 Location: San Francisco, CA
|
Posted: Wed Nov 16, 2005 11:12 pm Post subject: |
|
|
RSS feeds have standard XML structure, for example <title> tags are enclosed by <item> tags |
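The hint about <item> tags can be sketched as a two-step match: first isolate one <item> block, then search for <title> and <link> inside that block only, so the channel-level tags can no longer match. This is not taken from rssnews.tcl. The feed body and the example.com links are made up for illustration.

```tcl
# Hypothetical, trimmed-down feed body used only for illustration.
set body {<channel><title>ESPN.com</title><link>http://espn.go.com/</link>
<item><title>First headline</title><link>http://example.com/1</link></item>
<item><title>Second headline</title><link>http://example.com/2</link></item>
</channel>}

set title ""
set link ""

# Step 1: isolate the first <item>...</item> block (non-greedy).
if {[regexp {<item>(.*?)</item>} $body - item]} {
    # Step 2: match inside that block only.
    regexp {<title>(.*?)</title>} $item - title
    regexp {<link>(.*?)</link>} $item - link
}

puts "Title: $title"
puts "Link: $link"
```

Because the first <item> in an RSS 2.0 feed is normally the most recent article, this avoids counting how many <title> tags come before it.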
|
|
 |
FTL25 Voice
Joined: 14 Nov 2005 Posts: 17
|
Posted: Thu Nov 17, 2005 12:22 am Post subject: |
|
|
Okay, this is the part of the code I'm looking at from the URL I'm getting the news from:
| Code: | <?xml version="1.0" encoding="iso-8859-1" ?>
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:sy="http://purl.org/rss/1.0/modules/syndication/" xmlns:admin="http://webns.net/mvcb/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:atom="http://purl.org/atom/ns#" xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <channel>
    <title>ESPN.com</title>
    <link>http://espn.go.com/</link>
    <description>Latest news from ESPN.com</description>
    <language>en-us</language>
    <atom:link rel="start" href="http://sports.espn.go.com/espn/rss/news?null" />
    <lastBuildDate>Wed, 16 Nov 2005 20:15:52 PST</lastBuildDate>
    <docs>http://backend.userland.com/rss</docs>
    <managingEditor>webmaster@espn.go.com</managingEditor>
    <image>
      <url>http://espn-att.starwave.com/i/tvlistings/tv_espn_original.gif</url>
      <title>ESPN logo</title>
      <link>http://espn.go.com</link>
      <width>84</width>
      <height>34</height>
    </image>
    <ttl>30</ttl>
    <dc:rights>Copyright 2005</dc:rights>
    <admin:generatorAgent rdf:resource="http://espn.go.com/rss/?v=0.9beta" />
    <admin:errorReportsTo rdf:resource="mailto:customer.service@espn.go.com" />
    <sy:updatePeriod>hourly</sy:updatePeriod>
    <sy:updateFrequency>1</sy:updateFrequency>
    <sy:updateBase>2000-01-01T12:00+00:00</sy:updateBase>
    <item>
      <dc:creator>
        <![CDATA[ John Carroll ]]>
      </dc:creator>
      <title>
        <![CDATA[ Carroll: Comparing Brown and Jackson and their iffy teams ]]>
      </title>
      <description>
        <![CDATA[ L.A. Showdown: Brown, Jackson meet again<br /><br /> by John Carroll<br/><br/>When Phil Jackson and Larry Brown walk onto the Staples Center floor tonight, it will be the first time these two coaches have met since June 16, 2004. That was Game 5 of the NBA Finals and the Detroit Pistons, the team Brown coached, won 100-87, clinching... ]]>
      </description>
      <pubDate>Wed, 16 Nov 2005 09:24:27 PST</pubDate>
      <guid>http://insider.espn.go.com/nba/insider/columns/story?columnist=carroll_john&id=2225946&campaign=rss&source=ESPNHeadlines</guid>
      <link>http://insider.espn.go.com/nba/insider/columns/story?columnist=carroll_john&id=2225946&campaign=rss&source=ESPNHeadlines</link>
    </item> |
I'm trying to use
| Code: | <![CDATA[ Carroll: Comparing Brown and Jackson and their iffy teams ]]>
|
For the $title
| Code: | <![CDATA[ L.A. Showdown: Brown, Jackson meet again<br /><br /> by John Carroll<br/><br/>When Phil Jackson and Larry Brown walk onto the Staples Center floor tonight, it will be the first time these two coaches have met since June 16, 2004. That was Game 5 of the NBA Finals and the Detroit Pistons, the team Brown coached, won 100-87, clinching... ]]>
|
For the $description
and...
| Code: | <link>http://insider.espn.go.com/nba/insider/columns/story?columnist=carroll_john&id=2225946&campaign=rss&source=ESPNHeadlines</link>
|
For the $link
As far as the $date in my script... I can get that, because there is only one "<pubDate> * </pubDate>".
The others ( $title, $description and $link ) are the ones I'm having trouble with, because I don't know how to tell the script to use the ones I want to use in the above XML code. |
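Once a <title> or <description> has been pulled out of an <item>, the text still carries the CDATA wrapper and, for descriptions, embedded HTML such as <br />. A small helper can tidy that up before the text is sent to the channel. The proc name rss_clean is hypothetical and the sample string is a shortened version of the description quoted above. This is a sketch, not the approach used by rssnews.tcl.

```tcl
# Hypothetical helper: unwrap CDATA, drop embedded HTML tags, tidy whitespace.
proc rss_clean {text} {
    # Remove the CDATA markers first, so the tag-stripping regsub below
    # does not treat "<![CDATA[ ... >" as one long tag.
    set text [string map {{<![CDATA[} {} {]]>} {}} $text]
    regsub -all {<[^>]+>} $text { } text   ;# replace tags like <br /> with a space
    regsub -all {\s+} $text { } text       ;# collapse newlines and runs of spaces
    return [string trim $text]
}

# Shortened sample of the description from the feed excerpt.
set raw {<![CDATA[ L.A. Showdown: Brown, Jackson meet again<br /><br /> by John Carroll
]]>}

set desc [rss_clean $raw]
puts $desc
```

Collapsing the whitespace onto one line matters for IRC output, since each PRIVMSG is a single line of text.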
|
|
 |
demond Revered One

Joined: 12 Jun 2004 Posts: 3073 Location: San Francisco, CA
|
Posted: Thu Nov 17, 2005 1:11 am Post subject: |
|
|
if you can't grasp rssnews.tcl code - pretty streamlined use of Tcl and regexps which does exactly what you want - you probably need to study Tcl and regexps in greater detail |
|
|
 |
FTL25 Voice
Joined: 14 Nov 2005 Posts: 17
|
Posted: Thu Nov 17, 2005 8:08 am Post subject: |
|
|
Okay... I've looked around and haven't found too much on regexp.
From the "Enhancing Your Eggdrop" Page... | Quote: | If you have some experience writing Tcl scripts and would like to write your own for Eggdrop, have a read through the Beginners Guide to TCL, and be sure to check out tcl-commands.doc in the /doc directory which contains information on all of Eggdrop's built-in Tcl commands. If you're completely new to Tcl, try the excellent Guide to TCL scripting for Eggdrop 1.6. And download yourself a copy of the Tcl Manual for quick reference. |
I've read both of those guides and looked over tcl-commands.doc briefly.
Do you know of any other guides or any good reads on this stuff anywhere else? I'll try all I can to learn it! |
|
|
 |
demond Revered One

Joined: 12 Jun 2004 Posts: 3073 Location: San Francisco, CA
|
Posted: Fri Nov 18, 2005 1:17 am Post subject: |
|
|
in these days when more & more people seem to be too lazy to help themselves, it's refreshing to see a person like yourself with a genuine desire to learn and code
the aforementioned beginner's guides will help you write the most basic scripts only and not much more than that; if you are serious about scripting, you need to be a decent programmer in the first place, i.e. to understand data structures & algorithms, memory management, operating system's facilities, networking and how computers operate in general (needless to say, you ought to be proficient in at least one real programming language, preferably C)
once you have the programming basics, you should buy a Tcl book and/or explore and study in great detail Tcl Developer Site and The Tcler's Wiki; these sites feature tons of learning resources for those who are eager to become serious scripters
of course, to be able to produce quality & powerful eggdrop scripts, you must also know tcl-commands.doc inside & out, even by heart; better yet, you should dig into eggdrop's C source code and grasp its internals (prerequisite of which is knowing and understanding the IRC protocol as defined in RFC1459 and other technical documents)
I know that's not an easy path, and the learning curve could be steep & long; but if you are really serious, this is the way |
|
|
 |
FTL25 Voice
Joined: 14 Nov 2005 Posts: 17
|
Posted: Fri Nov 18, 2005 1:47 am Post subject: |
|
|
You're right, it's not the easiest path, but it's what I like to do!
I'm majoring in computer science right now. I've only taken the COBOL and advanced COBOL courses so far, but C++ is coming up either this Spring or next Fall semester, and definitely VB this Spring. I tried Sams Teach Yourself C++ a couple of years back, but lost interest. I'm more of a visual learner, and it's a lot easier when someone is actually teaching it to me and is there to show me some things. After learning some basics of Tcl though, I realize how much COBOL sucks!!! haha
But anyways, thanks for those links... The Tcler's Wiki looks especially cool. Back to reading...
 |
|
|
 |