| Author |
Message |
Wannabe Voice
Joined: 10 Feb 2006 Posts: 17
|
Posted: Thu May 03, 2007 2:48 pm Post subject: RegExp help please [SOLVED] |
|
|
Hey, I'm still learning regular expressions and I'm pretty much stumped on this one. Several people have tried to help me already and it's just not right yet. I'm using the http package to read a website and then parse it for a specific piece of info; the information I'm after is stored in a table.
The two lines that I'm looking at are:
<td width="45%"><font face="Verdana, Arial, sans-serif" size=2 class="fontNormal">Kills per Death:</font></td>
<td width="55%"><font face="Verdana, Arial, sans-serif" size=2 class="fontNormal">0.6193</font></td>
What I need to do is check that the first line contains Kills per Death:, and then pull the value from the next line, which in this case is 0.6193. Several sections of the table are identical apart from the text Kills per Death:, which is why I need to check both lines.
I have tried many different regexps to get this working, and all return nothing.
I'd be very grateful for any help.
Last edited by Wannabe on Thu May 03, 2007 8:11 pm; edited 1 time in total |
|
Sir_Fz Revered One

Joined: 27 Apr 2003 Posts: 3793 Location: Lebanon
|
Posted: Thu May 03, 2007 2:59 pm Post subject: |
|
|
| Code: | | regexp {\d+\.\d+} {<td width="55%"><font face="Verdana, Arial, sans-serif" size=2 class="fontNormal">0.6193</font></td>} value |
this will store 0.6193 in value. _________________ Follow me on GitHub
- Opposing
Public Tcl scripts
Last edited by Sir_Fz on Thu May 03, 2007 3:07 pm; edited 1 time in total |
|
Wannabe Voice
Joined: 10 Feb 2006 Posts: 17
|
Posted: Thu May 03, 2007 3:07 pm Post subject: |
|
|
That gives me a result of 4.01; I'm not sure where it's getting that from, but it's not correct.
In case it helps, the regexp needs to search through the entire source of this page: http://ns.wireplay.co.uk/hlstats.php?mode=playerinfo&player=55
I really don't understand the regexp you gave me at all; I struggle to get my head around it.
EDIT:
OK, I found that it matched the 4.01 in this line:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
Again, I don't really have a clue how the regexp works, or I'd try fixing it myself. |
|
Sir_Fz Revered One

Joined: 27 Apr 2003 Posts: 3793 Location: Lebanon
|
Posted: Thu May 03, 2007 3:13 pm Post subject: |
|
|
Well, \d matches any digit, \. matches a literal period '.', and + (or {1,}) means one or more of the preceding item. An alternative regexp you can try is:
| Code: | | regexp {<.+><.+>(.+)<.+><.+>} {<td width="55%"><font face="Verdana, Arial, sans-serif" size=2 class="fontNormal">0.6193</font></td>} grbg value |
$value should contain 0.6193. |
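To make the pieces above concrete, here is a small sketch that can be pasted into tclsh. The sample strings are copied from this thread; the variable names (doctype, cell, page and so on) are only for illustration. It also shows why a bare \d+\.\d+ picks up the 4.01 from the DOCTYPE when it is run against the whole page: regexp reports the first match it finds.

```tcl
# Sample strings copied from the posts above; the variable names are made up.
set doctype {<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">}
set cell {<td width="55%"><font face="Verdana, Arial, sans-serif" size=2 class="fontNormal">0.6193</font></td>}

# \d+ = one or more digits, \. = a literal period, \d+ = one or more digits.
regexp {\d+\.\d+} $cell cellValue
puts "cell alone: $cellValue"

# On the whole page the DOCTYPE comes first, so its version number matches first.
set page "$doctype\n$cell"
regexp {\d+\.\d+} $page pageValue
puts "whole page: $pageValue"

# The tag-based pattern: two tags, a captured group, then two more tags.
regexp {<.+><.+>(.+)<.+><.+>} $cell whole captured
puts "captured: $captured"
```

So the pattern itself is fine for a single cell; when the input is the full page, it needs some surrounding context to land on the right number.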
|
Wannabe Voice
Joined: 10 Feb 2006 Posts: 17
|
Posted: Thu May 03, 2007 3:24 pm Post subject: |
|
|
I think I explained badly. The regexp works on the entire source of the website, not just the two lines I posted; that's the reason I wanted to match the words Kills per Death:, so that I was sure it was the right data.
The problem I have is that I don't know how to deal with the two separate lines. But thanks for explaining that regexp, it actually makes sense to me now. |
|
Sir_Fz Revered One

Joined: 27 Apr 2003 Posts: 3793 Location: Lebanon
|
Posted: Thu May 03, 2007 3:30 pm Post subject: |
|
|
If you had provided code, it would've been easier. The concept is simple; this should explain it:
| Code: | # variable $lines is a list containing the html source
set notFound 1
foreach line $lines {
    if {$notFound && [regexp {Kills\sper\sDeath:} $line]} {
        set notFound 0
    } elseif {!$notFound} {
        regexp {\d+\.\d+} $line value
        break
    }
}
# $value contains the number. |
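As an alternative to the loop, the label and the value can also be matched by one regexp run against the whole page source, because \s matches line breaks as well as spaces. This is only a sketch and not something posted in this thread: it assumes nothing but whitespace sits between the two cells, as in the lines quoted in the first post. The "Some other stat" row in the sample is invented to show that look-alike rows are skipped.

```tcl
# Stand-in for the page source. The DOCTYPE and the "Kills per Death" rows are
# copied from this thread; the "Some other stat" rows are invented.
set html [join {
    {<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">}
    {<td width="45%"><font face="Verdana, Arial, sans-serif" size=2 class="fontNormal">Some other stat:</font></td>}
    {<td width="55%"><font face="Verdana, Arial, sans-serif" size=2 class="fontNormal">12.5</font></td>}
    {<td width="45%"><font face="Verdana, Arial, sans-serif" size=2 class="fontNormal">Kills per Death:</font></td>}
    {<td width="55%"><font face="Verdana, Arial, sans-serif" size=2 class="fontNormal">0.6193</font></td>}
} \n]

# The label and its closing tags, any whitespace (including the line break),
# the opening tags of the next cell, then the number in a capture group.
set pattern {Kills per Death:</font></td>\s*<td[^>]*><font[^>]*>(\d+\.\d+)</font>}

# -> is just a throwaway variable name for the full match.
if {[regexp $pattern $html -> value]} {
    puts "Kills per Death = $value"
} else {
    puts "Kills per Death not found"
}
```

Because the label is part of the pattern, neither the DOCTYPE version nor the numbers in other rows can match.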
|
Wannabe Voice
Joined: 10 Feb 2006 Posts: 17
|
Posted: Thu May 03, 2007 3:49 pm Post subject: |
|
|
I've attempted to do what you suggested, but it never seems to find the Kills per Death: line.
I'm wondering if I've split the file wrong. I've written $::html to a text document, and it comes out exactly as it is in the source, so I'm not sure why it wouldn't work. My code is:
| Code: |
set notFound 1
set lines [split $::html \n]
foreach line $lines {
    if {$notFound && [regexp {Kills\sper\sDeath:} $line]} {
        set notFound 0
    } elseif {!$notFound} {
        putquick "PRIVMSG $chan : Line $line found"
        regexp {\d+\.\d+} $line value
        putquick "PRIVMSG $chan : Value is $value"
        break
    }
} |
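One thing worth guarding against in a loop like this: regexp returns 0 and does not set the match variable when nothing matches, so reading $value afterwards raises a "no such variable" error. Below is a minimal sketch of the same two-step search with that check added. The helper name numberAfterLabel and the sample lines are hypothetical, not from this thread.

```tcl
# Hypothetical helper: return the first decimal number on the line that
# follows the first line matching $labelRe, or "" if there is none.
proc numberAfterLabel {lines labelRe} {
    set found 0
    foreach line $lines {
        if {$found} {
            # regexp returns 1 on a match and 0 otherwise; value is only
            # set in the first case, so test the result before using it.
            if {[regexp {\d+\.\d+} $line value]} {
                return $value
            }
            return ""
        }
        if {[regexp -- $labelRe $line]} {
            set found 1
        }
    }
    return ""
}

# Made-up sample lines standing in for [split $::html \n].
set lines [list \
    {<td>Kills per Death:</td>} \
    {<td>0.6193</td>}]

puts "found: [numberAfterLabel $lines {Kills\sper\sDeath:}]"
puts "missing label gives: '[numberAfterLabel $lines {No such label}]'"
```

An empty result then tells the caller that either the label or the number was not there, instead of the script dying with an error.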
|
|
Sir_Fz Revered One

Joined: 27 Apr 2003 Posts: 3793 Location: Lebanon
|
Posted: Thu May 03, 2007 6:36 pm Post subject: |
|
|
Worked fine for me; tested it on tclsh
| Code: | package require http
proc bla {} {
    set url "http://ns.wireplay.co.uk/hlstats.php?mode=playerinfo&player=55"
    set token [::http::geturl $url]
    set content [::http::data $token]
    ::http::cleanup $token
    # Scan line by line: find the label first, then read the number from the next line.
    set notFound 1
    foreach line [split $content \n] {
        if {$notFound && [regexp {Kills\sper\sDeath:} $line]} {
            set notFound 0
        } elseif {!$notFound} {
            regexp {\d+\.\d+} $line value
            puts $value
            break
        }
    }
} |
| Quote: | % package require http
2.5.2
% bla
0.6649 |
Last edited by Sir_Fz on Thu May 03, 2007 8:58 pm; edited 1 time in total |
|
Wannabe Voice
Joined: 10 Feb 2006 Posts: 17
|
Posted: Thu May 03, 2007 8:10 pm Post subject: |
|
|
Yep, sorry about that. I accidentally deleted a character when removing some trash code, which made it check the wrong html variable, hence no result. It's all fixed and working now, thanks. |
|
Sir_Fz Revered One

Joined: 27 Apr 2003 Posts: 3793 Location: Lebanon
|
Posted: Thu May 03, 2007 8:54 pm Post subject: |
|
|
This is a much faster way to grab the information; lsearch finds the label line directly instead of looping over every line with regexp:
| Code: | package require http
proc blo {} {
    set url "http://ns.wireplay.co.uk/hlstats.php?mode=playerinfo&player=55"
    set token [::http::geturl $url]
    set content [split [::http::data $token] \n]
    ::http::cleanup $token
    # lsearch returns the index of the label line, or -1 if it is not there.
    if {[set i [lsearch -glob $content {*Kills per Death:*}]]!=-1} {
        # incr moves i on to the following line, which holds the value.
        regexp {\d+\.\d+} [lindex $content [incr i]] value
        puts $value
    }
} |
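For anyone who wants to try the lsearch approach without fetching the page, here is a small offline sketch. The list below is a made-up stand-in for the split page source (the "Kills:" row is invented), and it also checks the regexp result before printing.

```tcl
# Made-up stand-in for [split [::http::data $token] \n].
set content [list \
    {<td>Kills:</td>} \
    {<td>120</td>} \
    {<td>Kills per Death:</td>} \
    {<td>0.6193</td>}]

# lsearch returns the index of the first element matching the glob, or -1.
set i [lsearch -glob $content {*Kills per Death:*}]
puts "label found at index $i"

if {$i != -1} {
    # The value sits on the following line, i.e. at index i+1.
    set next [lindex $content [expr {$i + 1}]]
    if {[regexp {\d+\.\d+} $next value]} {
        puts "value: $value"
    }
}
```

Note that the glob pattern only matches the full label, so the shorter "Kills:" row is not picked up by mistake.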
|