egghelp.org community
Discussion of eggdrop bots, shell accounts and tcl scripts.
 

Quick REGEXP Help!

 
FTL25
Voice


Joined: 14 Nov 2005
Posts: 17

Posted: Tue Nov 15, 2005 8:34 pm    Post subject: Quick REGEXP Help!

I have the following line of XML I'm reading from:

<pubDate>Tue, 15 Nov 2005 13:44:46 PST</pubDate>

In the code I use:

regexp {<pubDate>(.*)</pubDate>} $body - date

to get it, and...

puthelp "PRIVMSG $channel :HeadLine: $date"

to output it.

As output, I expect to get:

[7:25pm]«@ BOTNICK» HeadLine: Tue, 15 Nov 2005 13:44:46 PST

instead, I get...

[7:25pm]«@ BOTNICK» HeadLine: Tue, 15 Nov 2005 13:44:46 PST</pubDate>

How do I get rid of that </pubDate> ???
FTL25

Posted: Tue Nov 15, 2005 10:48 pm

Maybe I should have been more clear...

I'm trying to write my own rss news feed bot, from this site.

What I want to do is have the bot output

The Title
The Date
The Description
The Link

Of the most recent news feed. It looks like that will always be:

The 3rd "<title>" on that page for the Title of the article.

The 1st "<pubDate>" on that page for the article Date.

The 2nd "<description>" on that page for the article Description.

The 3rd "<link>" on that page, for the Link to the article.

This is how I was trying to get my info for the bot to pull. So far I could only get the "<pubDate>" one to work, because it's always the 1st "<pubDate>" on the page lol... I haven't figured out how to do the others yet. I guess I'll have to use a loop to bring it to the correct trigger on the page. I'll figure that out later ;) For now I just want to figure out how to get rid of that "</pubDate>" at the end of my output!

Here's the code I'm using:

Code:

set rssfeed "http://sports.espn.go.com/espn/rss/news"
set trigger "!latest"
set channel "#chan"

if {![info exists egghttp(version)]} {
  putlog "egghttp.tcl was NOT successfully loaded."
  putlog "egghttp_example.tcl has not been loaded as a result."
} else {
  proc your_callbackproc {sock} {

    global rssfeed channel

    set headers [egghttp:headers $sock]
    set body [egghttp:data $sock]

    regexp {<title>(.*)</title>} $body - title
    regexp {<pubDate>(.*)</pubDate>} $body - date
    regexp {<description>(.*)</description>} $body - desc
    regexp {<link>(.*)</link>} $body - link

    puthelp "PRIVMSG $channel :Latest Top Headline: $title"
    puthelp "PRIVMSG $channel :Published: $date"
    puthelp "PRIVMSG $channel :Description: $desc"
    puthelp "PRIVMSG $channel :Link: $link"
  }

  bind pub -|* $trigger top:trigger
  proc top:trigger {nick host hand chan text} {
    global rssfeed
    set sock [egghttp:geturl $rssfeed your_callbackproc]
    return 1
  } 

  putlog "egghttp_example.tcl has been successfully loaded."
}


Here's what the output looks like:

Quote:
[9:57pm] <Me> !latest
[9:57pm]«@ Bot» Latest Top Headline: ESPN.com</title>
[9:57pm]«@ Bot» Published: Tue, 15 Nov 2005 16:49:23 PST</pubDate>
[9:57pm]«@ Bot» Description: Latest news from ESPN.com</description>
[9:57pm]«@ Bot» Link: http://espn.go.com/</link> <description>Latest news from ESPN.com</description>

I'm new to this, so go easy ;)

It's really working exactly as it's told to... just not working the way I want it to!
demond
Revered One


Joined: 12 Jun 2004
Posts: 3073
Location: San Francisco, CA

Posted: Tue Nov 15, 2005 11:10 pm

have a look at rssnews.tcl source

among other things, it does exactly what you need to do - parse XML and extract tag contents
_________________
connection, sharing, dcc problems? click <here>
before asking for scripting help, read <this>
use [code] tag when posting logs, code
FTL25

Posted: Tue Nov 15, 2005 11:26 pm

Thanks demond. I've looked at it more than once. Starting to understand it more as I learn more :)

I've replaced
Quote:
regexp {<title>(.*)</title>} $body - title
regexp {<pubDate>(.*)</pubDate>} $body - date
regexp {<description>(.*)</description>} $body - desc
regexp {<link>(.*)</link>} $body - link


with
Quote:
regexp {<title>(.*?)</title>} $body - title
regexp {<pubDate>(.*?)</pubDate>} $body - date
regexp {<description>(.*?)</description>} $body - desc
regexp {<link>(.*?)</link>} $body - link


and it got rid of the tags at the end of each line of output, the ones that weren't supposed to be there.
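A minimal sketch of why the `?` makes the difference, assuming a shortened, made-up body with two <title> elements (the `body`, `greedy` and `lazy` names are only for illustration). A plain `.*` is greedy and runs to the last closing tag in the string, while `.*?` stops at the first one. In Tcl, `.` also matches newlines by default, so the greedy capture can span many lines of the feed.

```tcl
# Shortened, hypothetical feed body with two <title> elements
set body "<title>ESPN.com</title>\n<link>http://espn.go.com/</link>\n<title>ESPN logo</title>"

# Greedy: .* extends to the LAST </title> in the string,
# so the capture swallows everything in between
regexp {<title>(.*)</title>} $body - greedy

# Non-greedy: .*? stops at the FIRST </title>
regexp {<title>(.*?)</title>} $body - lazy

puts "greedy: $greedy"
puts "lazy:   $lazy"    ;# prints "lazy:   ESPN.com"
```

The bot output before the fix only showed one stray closing tag per line, presumably because the rest of the greedy capture sat on later lines that were cut off when the message was sent to IRC; that part is a guess, not something the thread confirms.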

So now it looks right...
Quote:
[10:13pm] <Me> !latest
[10:13pm]«@ Bot» Latest Top Headline: ESPN.com
[10:13pm]«@ Bot» Published: Tue, 15 Nov 2005 16:49:23 PST
[10:13pm]«@ Bot» Description: Latest news from ESPN.com
[10:13pm]«@ Bot» Link: http://espn.go.com/


instead of...
Quote:
[9:57pm] <Me> !latest
[9:57pm]«@ Bot» Latest Top Headline: ESPN.com</title>
[9:57pm]«@ Bot» Published: Tue, 15 Nov 2005 16:49:23 PST</pubDate>
[9:57pm]«@ Bot» Description: Latest news from ESPN.com</description>
[9:57pm]«@ Bot» Link: http://espn.go.com/</link> <description>Latest news from ESPN.com</description>


Now I just need to have it get the right ones for Headline, Description and Link... any tips?
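One way to pick "the 3rd <title>" without writing a manual loop over the page is `regexp -all -inline`, which returns every match as a list. This is only a sketch under the assumption that the position of the wanted tag never changes; the shortened `body` below is made up for illustration.

```tcl
# Hypothetical, shortened feed body: channel title, image title, first item title
set body {<channel><title>ESPN.com</title><image><title>ESPN logo</title></image><item><title>Carroll: Comparing Brown and Jackson</title></item></channel>}

# -all -inline returns a flat list: full match, capture, full match, capture, ...
set matches [regexp -all -inline {<title>(.*?)</title>} $body]

# Keep only the captured groups (every second element)
set titles {}
foreach {full captured} $matches {
    lappend titles $captured
}

# The 3rd <title> on the page is at list index 2
set title [lindex $titles 2]
puts $title    ;# prints "Carroll: Comparing Brown and Jackson"
```

Counting positions like this is fragile: if the feed adds or drops a tag near the top, index 2 points at the wrong element.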
FTL25

Posted: Wed Nov 16, 2005 11:32 am

demond, what part of your rssnews code gets you to the right part of the XML code? Because of all the repeated patterns, you can't just use the first <title>, for example... you've got to make sure you're at the right spot to get the text you want. This is the part I still can't figure out!
demond

Posted: Wed Nov 16, 2005 11:12 pm

RSS feeds have standard XML structure, for example <title> tags are enclosed by <item> tags
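The hint about <item> can be turned into a two-step match: first cut out the first <item> block, then search only inside it. The sketch below is an assumption about how one might do that with plain regexps, not a copy of what rssnews.tcl does; the `body` string and the example.com links are made up.

```tcl
# Hypothetical, shortened feed: channel-level tags followed by two items
set body {<channel><title>ESPN.com</title><link>http://espn.go.com/</link><item><title>First story</title><link>http://example.com/1</link></item><item><title>Second story</title><link>http://example.com/2</link></item></channel>}

# Step 1: capture the contents of the first <item> ... </item>
if {[regexp {<item>(.*?)</item>} $body - item]} {
    # Step 2: search only inside that item, so the channel-level
    # <title> and <link> can no longer match
    regexp {<title>(.*?)</title>} $item - title
    regexp {<link>(.*?)</link>} $item - link
    puts "Title: $title"    ;# prints "Title: First story"
    puts "Link: $link"      ;# prints "Link: http://example.com/1"
}
```

Because the search is scoped to the item, it no longer matters how many <title> or <link> tags appear in the channel header.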
FTL25

Posted: Thu Nov 17, 2005 12:22 am

Okay, this is the part of the code I'm looking at from the URL I'm getting the news from:

Code:
<?xml version="1.0" encoding="iso-8859-1" ?>
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:sy="http://purl.org/rss/1.0/modules/syndication/" xmlns:admin="http://webns.net/mvcb/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:atom="http://purl.org/atom/ns#" xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <channel>
    <title>ESPN.com</title>
    <link>http://espn.go.com/</link>
    <description>Latest news from ESPN.com</description>
    <language>en-us</language>
    <atom:link rel="start" href="http://sports.espn.go.com/espn/rss/news?null" />
    <lastBuildDate>Wed, 16 Nov 2005 20:15:52 PST</lastBuildDate>
    <docs>http://backend.userland.com/rss</docs>
    <managingEditor>webmaster@espn.go.com</managingEditor>
    <image>
      <url>http://espn-att.starwave.com/i/tvlistings/tv_espn_original.gif</url>
      <title>ESPN logo</title>
      <link>http://espn.go.com</link>
      <width>84</width>
      <height>34</height>
    </image>
    <ttl>30</ttl>
    <dc:rights>Copyright 2005</dc:rights>
    <admin:generatorAgent rdf:resource="http://espn.go.com/rss/?v=0.9beta" />
    <admin:errorReportsTo rdf:resource="mailto:customer.service@espn.go.com" />
    <sy:updatePeriod>hourly</sy:updatePeriod>
    <sy:updateFrequency>1</sy:updateFrequency>
    <sy:updateBase>2000-01-01T12:00+00:00</sy:updateBase>
    <item>
      <dc:creator>
        <![CDATA[ John Carroll
        ]]>
      </dc:creator>
      <title>
        <![CDATA[ Carroll: Comparing Brown and Jackson and their iffy teams
        ]]>
      </title>
      <description>
        <![CDATA[ L.A. Showdown: Brown, Jackson meet again<br /><br /> by John Carroll<br/><br/>When Phil Jackson and Larry Brown walk onto the Staples Center floor tonight, it will be the first time these two coaches have met since June 16, 2004.  That was Game 5 of the NBA Finals and the Detroit Pistons, the team Brown coached, won 100-87, clinching...
        ]]>
      </description>
      <pubDate>Wed, 16 Nov 2005 09:24:27 PST</pubDate>
      <guid>http://insider.espn.go.com/nba/insider/columns/story?columnist=carroll_john&id=2225946&campaign=rss&source=ESPNHeadlines</guid>
      <link>http://insider.espn.go.com/nba/insider/columns/story?columnist=carroll_john&id=2225946&campaign=rss&source=ESPNHeadlines</link>
    </item>


I'm trying to use

Code:
<![CDATA[ Carroll: Comparing Brown and Jackson and their iffy teams ]]>


For the $title

Code:
<![CDATA[ L.A. Showdown: Brown, Jackson meet again<br /><br /> by John Carroll<br/><br/>When Phil Jackson and Larry Brown walk onto the Staples Center floor tonight, it will be the first time these two coaches have met since June 16, 2004.  That was Game 5 of the NBA Finals and the Detroit Pistons, the team Brown coached, won 100-87, clinching... ]]>


For the $description

and...

Code:
<link>http://insider.espn.go.com/nba/insider/columns/story?columnist=carroll_john&id=2225946&campaign=rss&source=ESPNHeadlines</link>


For the $link

As far as the $date in my script... I can get that, because there is only one "<pubDate> * </pubDate>"

The others ( $title $description and $link ) are the ones I'm having trouble with because I don't know how to tell the script to use the ones I want to use in that above XML code :(
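The title and description in that item are wrapped in <![CDATA[ ... ]]> and padded with whitespace, so a plain tag match would still leave the wrapper in the output. A small helper can unwrap it. This is a sketch only: `tag_text` is a hypothetical name, the `item` string is a cut-down copy of the item above, and the link is shortened for readability.

```tcl
# Hypothetical helper: return the trimmed text of the first <tag> in $xml,
# unwrapping a CDATA section if present. Returns "" when the tag is missing.
proc tag_text {xml tag} {
    if {![regexp "<$tag>(.*?)</$tag>" $xml - text]} {
        return ""
    }
    # Drop an optional <![CDATA[ ... ]]> wrapper; if there is none,
    # the regexp does not match and $text is left unchanged
    regexp {^\s*<!\[CDATA\[(.*)\]\]>\s*$} $text - text
    return [string trim $text]
}

# Cut-down copy of the first <item> from the feed (link shortened)
set item {<item>
<title>
<![CDATA[ Carroll: Comparing Brown and Jackson and their iffy teams
  ]]>
</title>
<pubDate>Wed, 16 Nov 2005 09:24:27 PST</pubDate>
<link>http://insider.espn.go.com/nba/insider/columns/story?columnist=carroll_john</link>
</item>}

puts [tag_text $item title]      ;# prints "Carroll: Comparing Brown and Jackson and their iffy teams"
puts [tag_text $item pubDate]    ;# prints "Wed, 16 Nov 2005 09:24:27 PST"
puts [tag_text $item link]
```

The helper does not strip the HTML inside the description (the <br /> tags), and it returns multi-line text as-is, so anything sent with puthelp would still need newlines removed first.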
demond

Posted: Thu Nov 17, 2005 1:11 am

if you can't grasp rssnews.tcl code - pretty streamlined use of Tcl and regexps which does exactly what you want - you probably need to study Tcl and regexps in greater detail
FTL25

Posted: Thu Nov 17, 2005 8:08 am

Okay... I've looked around and haven't found too much on regexp.

From the "Enhancing Your Eggdrop" Page...
Quote:
If you have some experience writing Tcl scripts and would like to write your own for Eggdrop, have a read through the Beginners Guide to TCL, and be sure to check out tcl-commands.doc in the /doc directory which contains information on all of Eggdrop's built-in Tcl commands. If you're completely new to Tcl, try the excellent Guide to TCL scripting for Eggdrop 1.6. And download yourself a copy of the Tcl Manual for quick reference.
I've read both of those guides and looked over tcl-commands.doc briefly.

Do you know of any other guides or any good reads on this stuff anywhere else? I'll try all I can to learn it!
demond

Posted: Fri Nov 18, 2005 1:17 am

in these days when more & more people seem to be too lazy to help themselves, it's refreshing to see a person like yourself with a genuine desire to learn and code :)

the aforementioned beginner's guides will help you write the most basic scripts only and not much more than that; if you are serious about scripting, you need to be a decent programmer in the first place, i.e. to understand data structures & algorithms, memory management, operating system's facilities, networking and how computers operate in general (needless to say, you ought to be proficient in at least one real programming language, preferably C)

once you have the programming basics, you should buy a Tcl book and/or explore and study in great detail Tcl Developer Site and The Tcler's Wiki; these sites feature tons of learning resources for those who are eager to become serious scripters

of course, to be able to produce quality & powerful eggdrop scripts, you must also know tcl-commands.doc inside & out, even by heart :) better yet, you should dig into eggdrop's C source code and grasp its internals (prerequisite of which is knowing and understanding the IRC protocol as defined in RFC1459 and other technical documents)

I know that's not an easy path, and the learning curve could be steep & long; but if you are really serious, this is the way
FTL25

Posted: Fri Nov 18, 2005 1:47 am

You're right, it's not the easiest path, but it's what I like to do!

I'm majoring in computer science right now. Only took the COBOL and advanced COBOL courses so far, but C++ is coming up either this Spring or next Fall semester, and definitely VB this Spring. I tried Sams Teach Yourself C++ a couple of years back, but lost interest :( I'm more of a visual learner and it's a lot easier when someone is actually teaching it to me and there to show me some things. After learning some basics of Tcl though, I realize how much COBOL sucks!!! haha

But anyways, thanks for those links... The Tcler's Wiki looks especially cool. Back to reading...

:)
All times are GMT - 4 Hours