| Author |
Message |
Koepi Voice
Joined: 31 Aug 2003 Posts: 26
|
Posted: Sat Dec 03, 2005 3:20 pm Post subject: parse website |
|
|
Hi,
I would like to parse a website. The content I need is between <td class="txt" height="1"> and <br>.
| Code: | <td class="txt" height="1">
TEXT
TEXT
TEXT
TEXT
TEXT
TEXT
<br> |
Can someone explain how I can grab these lines?
Greetings, Koepi |
|
|
 |
demond Revered One

Joined: 12 Jun 2004 Posts: 3073 Location: San Francisco, CA
|
Posted: Sat Dec 03, 2005 4:07 pm Post subject: |
|
|
| Code: |
regexp {<td.*?>(.*?)<br>} $string -> text
|
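A minimal sketch of how the pattern above behaves on a multi-line string. The sample text is made up for illustration, modelled on the snippet from the first post. It relies on the fact that in Tcl's regexp "." also matches newlines unless -line or -linestop is given, so the non-greedy (.*?) can span several lines.

```tcl
# Sample input modelled on the first post (the TEXT lines are invented)
set string "<td class=\"txt\" height=\"1\">\nTEXT one\nTEXT two\n<br>"

set lines {}
# "->" receives the whole match, "text" receives the first capture group
if {[regexp {<td.*?>(.*?)<br>} $string -> text]} {
    # Split the captured block into lines and drop the empty ones
    # left over from the newlines around the text
    foreach line [split $text \n] {
        set line [string trim $line]
        if {$line ne ""} { lappend lines $line }
    }
}
puts [join $lines " | "]
```

The captured block still contains the surrounding newlines, which is why each line is trimmed and empty lines are skipped before use.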
_________________ connection, sharing, dcc problems? click <here>
before asking for scripting help, read <this>
use [code] tag when posting logs, code |
|
|
 |
Koepi Voice
Joined: 31 Aug 2003 Posts: 26
|
Posted: Sun Dec 04, 2005 2:57 pm Post subject: |
|
|
Thanks for your answer.
Until now I have been parsing the website line by line.
The website content is:
| Code: |
<table border="0" cellpadding="0" cellspacing="0" width="600">
<tbody><tr>
<td><img src="woche_dateien/p_0.gif" alt="" border="0" height="1" width="10"></td>
<td class="head" colspan="2">TITLE1</td>
</tr>
<tr>
<td><img src="woche_dateien/p_0.gif" alt="" border="0" height="1" width="1"></td>
<td class="head" colspan="2">DATE</td>
</tr>
<tr>
<td colspan="3"><img src="woche_dateien/p_0.gif" alt="" border="0" height="24" width="1"></td>
</tr>
<tr>
<td><img src="woche_dateien/p_0.gif" alt="" border="0" height="1" width="1"></td>
<td class="head" colspan="2">TITLE2</td>
</tr>
<tr>
<td><img src="woche_dateien/p_0.gif" alt="" border="0" height="1" width="10"></td>
<td class="fliesstxt" height="1">
TEXT
TEXT
TEXT
TEXT
TEXT
<br>
</td><td><img src="woche_dateien/p_0.gif" alt="" border="0" height="1" width="10"></td>
</tr>
<tr>
<td colspan="3"><img src="woche_dateien/p_0.gif" alt="" border="0" height="24" width="1"></td>
</tr>
<tr>
<td><img src="woche_dateien/p_0.gif" alt="" border="0" height="1" width="1"></td>
<td class="head" colspan="2">TITLE3</td>
</tr>
<tr>
<td><img src="woche_dateien/p_0.gif" alt="" border="0" height="1" width="10"></td>
<td class="fliesstxt" height="1">
TEXT
TEXT
TEXT
TEXT
TEXT<br>
</td><td><img src="woche_dateien/p_0.gif" alt="" border="0" height="1" width="10"></td>
</tr>
<tr>
<td colspan="3"><img src="woche_dateien/p_0.gif" alt="" border="0" height="24" width="1"></td>
</tr>
<tr>
<td><img src="woche_dateien/p_0.gif" alt="" border="0" height="1" width="1"></td>
<td class="head" colspan="2">TITLE4</td>
</tr>
<tr>
<td><img src="woche_dateien/p_0.gif" alt="" border="0" height="1" width="10"></td>
<td class="fliesstxt" height="1">
TEXT
TEXT
TEXT
TEXT
TEXT<br>
</td><td><img src="woche_dateien/p_0.gif" alt="" border="0" height="1" width="10"></td>
</tr>
<tr>
<td colspan="3"><img src="woche_dateien/p_0.gif" alt="" border="0" height="24" width="1"></td>
</tr>
<tr>
<td><img src="woche_dateien/p_0.gif" alt="" border="0" height="4" width="10"></td>
<td><img src="woche_dateien/p_0.gif" alt="" border="0" height="1" width="580"></td>
<td><img src="woche_dateien/p_0.gif" alt="" border="0" height="1" width="10"></td>
</tr>
</tbody></table>
|
I want the content in this order:
TITLE1
DATE
TITLE2
TEXT
TEXT
TEXT
TEXT
TEXT
TITLE3
TEXT
TEXT
TEXT
TEXT
TEXT
TITLE4
TEXT
TEXT
TEXT
TEXT
TEXT
I can grab the title and date with:
| Code: |
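package require http
# Fetch the page, keep the body, then release the http token
set data [http::geturl $url]
set data2 [http::data $data]
http::cleanup $data
foreach line [split $data2 \n] {
    # "->" takes the full match; "a" takes the captured cell text
    if {[regexp -nocase {<td class="head" colspan="2">(.*?)</td>} $line -> a]} {
        puthelp "PRIVMSG $chan :$a"
    }
}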
|
But now I don't know how to integrate
| Code: | regexp {<td.*?>(.*?)<br>} $string -> text |
My script searches the content line by line, so it can't find <td.*?>(.*?)<br>, because that match spans several lines.
I hope my code is not too ugly ... |
|
|
 |
demond Revered One

Joined: 12 Jun 2004 Posts: 3073 Location: San Francisco, CA
|
Posted: Sun Dec 04, 2005 8:39 pm Post subject: |
|
|
Then simply apply regexp -all, or call [regexp] in a loop. |
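One way the regexp -all suggestion could look when applied to the whole page instead of to single lines. This is only a sketch: extract_cells is a hypothetical helper name, the sample HTML is shortened from the page posted above, and the pattern uses negated character classes ([^>]* and [^<]*) rather than the non-greedy .*? from the earlier reply, so it assumes the wanted text contains no inline tags.

```tcl
# Hypothetical helper: returns the wanted cell texts in document order
proc extract_cells {html} {
    set out {}
    # regexp -all -inline returns a flat list: full match, class, cell text, ...
    foreach {match class body} [regexp -all -inline \
            {<td class="([a-z]+)"[^>]*>([^<]*)<} $html] {
        # Keep only the two cell classes used by the page in this thread
        if {$class ne "head" && $class ne "fliesstxt"} { continue }
        # A text cell spans several lines; emit each non-empty line on its own
        foreach line [split $body \n] {
            set line [string trim $line]
            if {$line ne ""} { lappend out $line }
        }
    }
    return $out
}

# Shortened sample modelled on the posted page (TEXT lines are invented)
set html {<tr>
<td><img src="woche_dateien/p_0.gif" alt="" border="0" height="1" width="10"></td>
<td class="head" colspan="2">TITLE2</td>
</tr>
<tr>
<td class="fliesstxt" height="1">
TEXT A
TEXT B<br>
</td><td><img src="woche_dateien/p_0.gif" alt="" border="0" height="1" width="10"></td>
</tr>}

foreach line [extract_cells $html] {
    puts $line   ;# in an eggdrop script: puthelp "PRIVMSG $chan :$line"
}
```

Because the match runs over the complete page, titles and text blocks come back in the order they appear, which is the order asked for above. In the real script, $html would be the value returned by http::data.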
|
|
 |
|