Metuant Voice
Joined: 28 Jul 2007 Posts: 3
|
Posted: Sat Jul 28, 2007 6:38 pm Post subject: Help with regexp/regsub |
Hi,
I'm trying to parse some information from a website using regsub and regexp, but I'm completely useless at regular expressions, so now that they've updated their website my regexp no longer matches.
The block of information I'm trying to parse (which is sometimes repeated multiple times - hence the while in the code) is:
<tr>
<td class='tablebottom'><img src="/img/member.gif" alt="[M]"/></td>
<!--name--><td class='tablebottom'>Abyssal whip</td>
<td class="tablebottom" title="Former average price: 1,650,000gp [decreased by 50,000gp]"><img src="/img/market/p_d.gif" alt="This price has decreased" /></td>
<!--price--><td class="tablebottom">1,550,000gp - 1,650,000gp</td>
<td class="tablebottom" width="20"><a href="/priceguide.php?report=45&par=" title="Report Incorrect Price"><img src="/img/!.gif" alt="[!]" border="0" /></a></td>
<td class="tablebottom"><a href="/priceguide.php?category=45">Obsidian & Abyssal</a></td>
</tr>
</table></form><br />
I'm trying to grab the item name (Abyssal whip) and its price (1,550,000gp - 1,650,000gp), using this code:
| Code: |
while {[regexp "<!--name--><td class=\'tablebottom\'>(.*?)</td>\n\n<!--price--><td class=\"tablebottom\">(.*?)</td>\n<td class=\"tablebottom\" width=\"20\">" $data junk tname tprice]} {
regsub "<!--name--><td class=\'tablebottom\'>[addslashes $tname]</td>\n\n<!--price--><td class=\"tablebottom\">[addslashes $tprice]</td>\n<td class=\"tablebottom\" width=\"20\">" $data - data
if {$i == 0 || ([string match [string tolower [string range $item 0 1]] [string tolower [string range $tname 0 1]]] && [string length $tname] < [string length $name])} {
set name $tname
set price $tprice |
I'm assuming that you can't just use \n\n to skip the line of unwanted markup between the name and the price, as I'd hoped.
Any help is appreciated.
speechles Revered One

Joined: 26 Aug 2006 Posts: 1398 Location: emerald triangle, california (coastal redwoods)
|
Posted: Sun Jul 29, 2007 3:56 am Post subject: |
Perhaps sanitize the data before you attempt to parse it, so that tabs, newlines, carriage returns, and vertical tabs are eliminated before you get to that step.
| Code: |
regsub -all "\t" $data "" data
regsub -all "\n" $data "" data
regsub -all "\r" $data "" data
regsub -all "\v" $data "" data |
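The four substitutions above can also be done in a single pass. A minimal sketch using Tcl's built-in string map, assuming the only goal is to delete those four characters; the sample value of data is made up for illustration:

```tcl
# Hypothetical sample input containing a tab, CR, LF and vertical tab.
set data "<td>Abyssal whip</td>\t\r\n\v<td>1,550,000gp</td>"

# string map replaces each key in the list with the value that follows it;
# mapping every whitespace character to "" deletes it in one pass.
set data [string map [list "\t" "" "\n" "" "\r" "" "\v" ""] $data]

puts $data
```

The result is the same as running the four regsub calls in sequence, just without going through the regular expression engine.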
You can then use a non-greedy .*? wherever the markup between the name and price cells should be skipped, rather than matching literal newlines. This snippet should work:
| Code: |
while {[regexp "<!--name--><td class=\'tablebottom\'>(.*?)</td>.*?<!--price--><td class=\"tablebottom\">(.*?)</td>.*?<td class=\"tablebottom\" width=\"20\">" $data junk tname tprice]} {
regsub "<!--name--><td class=\'tablebottom\'>[addslashes $tname]</td>.*?<!--price--><td class=\"tablebottom\">[addslashes $tprice]</td>.*?<td class=\"tablebottom\" width=\"20\">" $data - data
if {$i == 0 || ([string match [string tolower [string range $item 0 1]] [string tolower [string range $tname 0 1]]] && [string length $tname] < [string length $name])} {
set name $tname
set price $tprice |
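As an alternative to the while/regsub loop above, every name and price pair can be collected in one call. This is only a sketch, assuming a Tcl version whose regexp supports the -all and -inline options; the sample data is trimmed from the markup quoted in the question, and the variable names pattern and items are mine:

```tcl
# Sample input trimmed from the markup quoted in the question.
set data {<tr>
<td class='tablebottom'><img src="/img/member.gif" alt="[M]"/></td>
<!--name--><td class='tablebottom'>Abyssal whip</td>
<td class="tablebottom" title="Former average price: 1,650,000gp [decreased by 50,000gp]"><img src="/img/market/p_d.gif" alt="This price has decreased" /></td>
<!--price--><td class="tablebottom">1,550,000gp - 1,650,000gp</td>
<td class="tablebottom" width="20"></td>
</tr>}

# Braces keep Tcl from substituting anything inside the pattern, so the
# quotes need no backslashes. Each .*? is non-greedy and stops at the
# first place where the rest of the pattern can match.
set pattern {<!--name--><td class='tablebottom'>(.*?)</td>.*?<!--price--><td class="tablebottom">(.*?)</td>}

# -all -inline returns a flat list: whole match, name, price, repeated
# once per matching row.
set items {}
foreach {whole tname tprice} [regexp -all -inline -- $pattern $data] {
    lappend items [list $tname $tprice]
    puts "$tname: $tprice"
}
```

Because . also matches a newline in Tcl's regexp unless the -line option is given, this pattern works on the raw page as well as on the sanitized data. The items list can then be filtered with the same string match and string length checks as in the loop above.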
Metuant Voice
Joined: 28 Jul 2007 Posts: 3
|
Posted: Sun Jul 29, 2007 7:03 am Post subject: |
Thanks for the help - it works great :]