egghelp.org community
Discussion of eggdrop bots, shell accounts and tcl scripts.
 
parsing another website

 
theice
Voice


Joined: 13 Mar 2008
Posts: 36

Posted: Sun Mar 23, 2008 12:48 am    Post subject: parsing another website

Code:
set title [lrange $text 0 end]

putserv "PRIVMSG $c :$title:"
regexp {<td><b>"<a href="/wiki/.*?" title="$title">.*?</a>"</b></td>(.*?)</tr>} $data - data
regexp {<td><b><a href="/wiki/.*?" title="(.+?)">.*?</a></b></td>.*?<td>(.+?)</td>.*?<td>(.+?)</td>.*?<td>(.+?)</td>.*?<td>(.+?)</td>.*?<td>(.+?)</td>} $data - artist guitar bass drums vocals band
putserv "PRIVMSG $c :by-$artist , Difficulties: Guitar-$guitar , Bass-$bass , VoX-$vocals , Drums-$drums , Band-$band"

http::cleanup $data
         
}


working partially:

http://en.wikipedia.org/wiki/List_of_songs_in_Rock_Band

I'm trying to grab the information from the site. The problem is, it's using different types of HTML coding for each title =[

Code:
[00:47] <@|ICE|> .song Black Hole Sun
[00:47] <+ICEdrop> Black Hole Sun:
[00:47] <+ICEdrop> by-Jet (band) , Difficulties: Guitar-Tier 6 , Bass-Tier 6 , VoX-Tier 7 , Drums-Tier 5 , Band-Tier 6


Instead of grabbing the row for the correct $title, it grabs the very first one, "Are You Gonna Be My Girl".
speechles
Revered One


Joined: 26 Aug 2006
Posts: 1398
Location: emerald triangle, california (coastal redwoods)

Posted: Sun Mar 23, 2008 11:13 pm    Post subject: Re: parsing another website

theice wrote:
Code:
regexp {<td><b>"<a href="/wiki/.*?" title="$title">.*?</a>"</b></td>(.*?)</tr>} $data - data

This is wrong and will never work: no substitution takes place within curly braces, so $title reaches regexp as literal text instead of the song name. The type of regexp you want is known as a dynamic regexp. Look at the wikipedia/wikimedia portion of the unofficial google script; it uses these for #subtag look-ups. To use them correctly, first build your regexp into a variable, using double quotes so that $title is substituted, then pass that variable to regexp.

Code:
set dynregex "<td><b>\"<a href=\"/wiki/.*?\" title=\"$title\">.*?</a>\"</b></td>(.*?)</tr>"
if {![regexp "$dynregex" $data - data]} {
  #notfound
} {
  #found
}


Notice, you MUST escape quotes within other quotes, but within curly braces there is no need.
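
One thing the snippet above does not cover: a title that contains regexp metacharacters (parentheses, question marks and so on) would change the meaning of the pattern once it is substituted in. A minimal sketch of one way to guard against that, assuming a helper named escape_re (hypothetical, not part of the original script) and made-up sample rows rather than real page data:

```tcl
# Hypothetical helper (not from the thread): backslash-escape every
# non-word character so the title is matched literally inside a regexp.
proc escape_re {s} {
    regsub -all {\W} $s {\\&} escaped
    return $escaped
}

# Made-up sample rows that imitate the table markup; not real page data.
set data ""
append data {<tr><td><b>"<a href="/wiki/A" title="First Song">First Song</a>"</b></td><td>Artist A</td></tr>} "\n"
append data {<tr><td><b>"<a href="/wiki/B" title="Song (Live)">Song (Live)</a>"</b></td><td>Artist B</td></tr>}

set title "Song (Live)"

# Braces: no substitution, so the pattern still contains the text $title.
set braced {title="$title"}

# Quotes: [escape_re $title] is substituted before regexp sees the pattern.
set dynregex "<td><b>\"<a href=\"/wiki/.*?\" title=\"[escape_re $title]\">.*?</a>\"</b></td>(.*?)</tr>"

# Capture into a separate variable (row) so $data stays intact.
if {[regexp -- $dynregex $data - row]} {
    puts "found: $row"
} else {
    puts "not found"
}
```

Every quantifier in the pattern is kept non-greedy on purpose, since mixing greedy and non-greedy quantifiers in a single Tcl regexp can give surprising match lengths.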

Also, what is the purpose of this beauty?!
Code:
set title [lrange $text 0 end]
Remember, do not confuse lists with strings, or vice versa. When you do, unexpected behavior occurs, and you will be constantly fighting it later with code kludges and messy filters to compensate. It's always better to do it correctly to begin with.
Code:
set title [join [lrange [split $text] 0 end]]
Notice the split (to protect against special characters mischievous users may try as input), then an lrange on the list that split creates, and afterwards a join to turn this list back into a string. Remember the #1 rule of Tcl: never confuse a list and a string.
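
To see why this matters, here is a small sketch with a made-up input containing an unbalanced brace. Running lrange on the raw string raises an error, while the split version handles it:

```tcl
# Made-up input with an unbalanced brace, as a user might type it.
set text "Black {Hole Sun"

# Treating the raw string as a list raises an error, caught here.
set failed [catch {lrange $text 0 end} err]
puts "lrange on the raw string failed: $failed"

# split always yields a well-formed list; join turns it back into a string.
set words [split $text]
set title [join [lrange $words 0 end]]
puts "words: [llength $words], title: $title"
```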
metroid
Owner


Joined: 16 Jun 2004
Posts: 771

Posted: Wed Apr 02, 2008 3:23 pm

Though you told him how to use split and join properly, you still didn't fix that nasty lrange.

Using lrange $var 0 end is the exact same as not doing anything at all.

In this case, you can just use set title $text, because "set title [join [lrange [split $text] 0 end]]" quite simply does the exact same thing.
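
A quick way to check this is to run the round trip over a few inputs. This is only a sketch, and the proc name roundtrip and the sample strings are made up for illustration. For ordinary space-separated text the result equals the input. The one difference I am aware of is that split also breaks on tabs and newlines by default, which join then writes back as spaces:

```tcl
# Hypothetical helper: the split / lrange / join chain from the thread.
proc roundtrip {text} {
    return [join [lrange [split $text] 0 end]]
}

# Made-up samples: plain text, a double space, a stray brace, and a tab.
foreach sample [list "Black Hole Sun" "two  spaces" "brace \{inside" "tab\there"] {
    set same [string equal [roundtrip $sample] $sample]
    puts "[list $sample] unchanged: $same"
}
```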