Jarek Voice
Joined: 19 Nov 2007 Posts: 3
|
Posted: Mon Nov 19, 2007 10:45 am Post subject: Regexp Problem |
|
|
Hi Folks.
I'd like to get the profile id value out of this line:
| Code: |
<td align="center"><a class="profil_link" href="javascript:;" onclick="window.open('/profile/index.php?profile_id=20129','_blank','width=730,height=600,status=no,toolbars=no,scrollbars=yes');"><img class="td_border" src="/pictures/60x80/11-07/20129_47400a57eefb8.jpg" width="60" height="80" border="0" alt="jaroslove"></a></td>
|
How do I have to build the regular expression to get the value "20129"?
Thanks. |
|
rosc2112 Revered One

Joined: 19 Feb 2006 Posts: 1454 Location: Northeast Pennsylvania
|
Posted: Mon Nov 19, 2007 12:33 pm Post subject: |
|
|
Assuming the data is in a var called $html:
| Code: |
regexp {'/profile/index.php?profile_id=(.*?)'} $html fullmatch exactmatch
# the data you want will be in $exactmatch var.
|
|
|
Jarek Voice
Joined: 19 Nov 2007 Posts: 3
|
Posted: Mon Nov 19, 2007 1:08 pm Post subject: |
|
|
Hm, $exactmatch is empty after doing this.
My proc looks like this:
| Code: |
proc poloniaflirt::internalCom { suche } {
    set fullmatch ""
    set exactmatch ""
    set log1 [open pf.txt a]
    set log2 [open reg.txt a]
    set pfsearchurl "http://www.polonia-flirt.de/search/index.php"
    set pfquery [::http::formatQuery sea_nickname "$suche" send "send"]
    set page [http::config -useragent "Mozilla/4.0 (compatible\; MSIE 6.0\; Windows NT 5.0)"]
    set page [::http::geturl $pfsearchurl -query $pfquery -timeout $poloniaflirt::pftimeout]
    set html [::http::data $page]
    puts $log1 "$html"
    close $log1
    regexp {'/profile/index.php?profile_id=(.*?)'} $html fullmatch exactmatch
    puts $log2 "$exactmatch"
    close $log2
    return $page
}
|
|
|
speechles Revered One

Joined: 26 Aug 2006 Posts: 1398 Location: emerald triangle, california (coastal redwoods)
|
Posted: Mon Nov 19, 2007 6:21 pm Post subject: |
|
|
| Jarek wrote: |
| Code: |
regexp {'/profile/index.php?profile_id=(.*?)'} $html fullmatch exactmatch
|
| Code: |
regexp {'/profile/index\.php\?profile_id=(.*?)'} $html fullmatch exactmatch
|
You need to \escape the period(.) and you need to \escape the question mark(?) |
|
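For anyone reading along, here is a minimal sketch of the escaped pattern in use. It assumes the link looks like the one in the first post (shortened here), and the variable name profileId is only illustrative. Capturing [0-9]+ instead of (.*?) is a swapped-in variation, not what was posted above; it also drops the surrounding quotes from the pattern. Checking the return value of regexp tells you whether anything matched before you use the variable.

```tcl
# Shortened copy of the link from the first post, used as sample input.
set html {<a class="profil_link" href="javascript:;" onclick="window.open('/profile/index.php?profile_id=20129','_blank','width=730,height=600');">}

# "\." and "\?" match the literal period and question mark.
# "->" is just a throwaway variable name for the full match.
if {[regexp {/profile/index\.php\?profile_id=([0-9]+)} $html -> profileId]} {
    puts "profile id: $profileId"
} else {
    puts "no profile id found"
}
```

Because regexp returns 0 when nothing matches and leaves the capture variable untouched, the else branch is where a failed match shows up instead of a silently empty variable.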
rosc2112 Revered One

Joined: 19 Feb 2006 Posts: 1454 Location: Northeast Pennsylvania
|
Posted: Mon Nov 19, 2007 7:29 pm Post subject: |
|
|
I don't think that would make much difference, as both . (dot) and ? are wildcard chars, so it should have matched the string.
Is the var $html empty of data? You don't handle any error conditions, so it could be that the data is not being retrieved.
Here is an example of getting html data and handling error conditions, then fishing out the data you want:
| Code: |
set xeurl "http://www.xe.com/ucc/convert.cgi"
set xequery [::http::formatQuery Amount $amount From $fromcur To $tocur]
# geturl raises an error if the connection fails; $page is not set in
# that case, so there is no token to clean up.
if {[catch {set page [::http::geturl $xeurl -query $xequery -timeout $xeutimeout]} error]} {
    puthelp "PRIVMSG $nick :Error: couldn't connect to XE.com. Try again later"
    return
}
if {[::http::status $page] eq "timeout"} {
    puthelp "PRIVMSG $nick :Error: Connection timed out to XE.com."
    ::http::cleanup $page
    return
}
set html [::http::data $page]
::http::cleanup $page
if {[regexp {>Live rates at (.*?)</span>} $html match xetime]} {
    # some of the IF above has been deleted for this example
    # manipulate the data:
    regsub -all {<!.*?>} $fromamount {} fromamount
    regsub -all {<!.*?>} $toamount {} toamount
    puthelp "PRIVMSG $chan :XE.COM: \002$fromamount\002 equals \002$toamount\002 as of $xetime"
} else {
    puthelp "PRIVMSG $chan :Could not obtain results from XE.com, sorry!"
}
|
|
|
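To tell "no data came back" apart from "the pattern did not match", something like the following could help while debugging. This is only a sketch: diagnoseMatch is a hypothetical helper name, not part of either script in this thread, and the timestamp in the last call is made-up sample text.

```tcl
# Hypothetical helper: report why an extraction failed.
# Returns a two-element list: a status word and the captured value.
proc diagnoseMatch {pattern html} {
    # Empty (or whitespace-only) input means the fetch gave us nothing.
    if {[string length [string trim $html]] == 0} {
        return [list nodata ""]
    }
    # Data arrived, but the pattern does not occur in it.
    if {![regexp -- $pattern $html -> value]} {
        return [list nomatch ""]
    }
    return [list ok $value]
}

set pattern {>Live rates at (.*?)</span>}
puts [diagnoseMatch $pattern ""]
puts [diagnoseMatch $pattern "<span>no rates here</span>"]
puts [diagnoseMatch $pattern "<span>Live rates at 2007.11.19 17:29:00 UTC</span>"]
```

A "nodata" result points at the HTTP request, while "nomatch" points at the regular expression.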
nml375 Revered One
Joined: 04 Aug 2006 Posts: 2857
|
Posted: Mon Nov 19, 2007 9:56 pm Post subject: |
|
|
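Half right, half wrong...
. matches any character, including a literal period, so the pattern still works without escaping it.
? however does not match any character by itself; it is a quantifier that matches 0 or 1 occurrences of the preceding atom (in this case the character p). Unescaped, the pattern looks for "ph" or "php" followed directly by "profile_id", so it can never match the literal ? in the URL. In this case it must be escaped.
_________________
NML_375, idling at #eggdrop@IrcNET |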
|
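The difference between the two characters can be checked directly in tclsh. A small sketch, assuming a shortened copy of the link from the first post; the third test string is made up purely to show what the unescaped pattern does accept.

```tcl
# Shortened sample modelled on the link from the first post.
set html {onclick="window.open('/profile/index.php?profile_id=20129','_blank');"}

# Unescaped: "p?" means "an optional p", so nothing in the pattern
# consumes the literal "?" that sits before "profile_id".
set unescaped [regexp {'/profile/index.php?profile_id=(.*?)'} $html -> id1]

# Only the question mark escaped: the bare "." still matches the period.
set qOnly [regexp {'/profile/index.php\?profile_id=(.*?)'} $html -> id2]

# Made-up string with no "?" at all, which the unescaped pattern accepts.
set odd [regexp {'/profile/index.php?profile_id=(.*?)'} {'/profile/indexXphprofile_id=7'} -> id3]

puts "unescaped pattern matched: $unescaped"
puts "question mark escaped, matched: $qOnly"
puts "unescaped pattern on made-up string matched: $odd"
```

Escaping the period as well is still worthwhile, since it stops the pattern from accepting strings like the made-up one where some other character stands in for the dot.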
Jarek Voice
Joined: 19 Nov 2007 Posts: 3
|
Posted: Tue Nov 20, 2007 8:23 am Post subject: |
|
|
| Quote: | You need to \escape the period(.) and you need to \escape the question mark(?)
|
Thanks, mate! This was the right thing. I had to escape the special chars. Now it works! |
|