parse website

Koepi
Voice
Posts: 26
Joined: Sun Aug 31, 2003 1:21 am

Post by Koepi »

Hi,

I'd like to parse a website. The content I need is between <td class="txt" height="1"> and <br>

Code: Select all

<td class="txt" height="1">
TEXT
TEXT
TEXT
TEXT
TEXT
TEXT
<br>
Can someone explain how I can grab these lines?

Greetings, Koepi
demond
Revered One
Posts: 3073
Joined: Sat Jun 12, 2004 9:58 am
Location: San Francisco, CA
Contact:

Post by demond »

Code: Select all

regexp {<td.*?>(.*?)<br>} $string -> text
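
A minimal sketch of how this one-liner behaves on the snippet from the question. The sample string and the "lines" variable are made up for illustration; in a real script $string would hold the page body. In Tcl's regexp, "." also matches newlines by default, which is why the capture can span several lines:

```tcl
# Sample input modelled on the question (assumption: the real page looks like this).
set string {<td class="txt" height="1">
TEXT
TEXT
<br>}

# "->" is just a throwaway variable name that receives the full match;
# "text" receives the captured group between the tag and <br>.
if {[regexp {<td.*?>(.*?)<br>} $string -> text]} {
    # Drop the surrounding newlines, then split into individual lines.
    set lines [split [string trim $text] \n]
    puts $lines
}
```

Each element of the resulting list is one line of content, ready to be sent to the channel one by one.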
Koepi
Voice
Posts: 26
Joined: Sun Aug 31, 2003 1:21 am

Post by Koepi »

Thanks for your answer. :)

Until now I have been parsing the website line by line.

The website content is:

Code: Select all

<table border="0" cellpadding="0" cellspacing="0" width="600">
	  <tbody><tr>
	  	<td><img src="woche_dateien/p_0.gif" alt="" border="0" height="1" width="10"></td>
		<td class="head" colspan="2">TITLE1</td>
	  </tr>
	  <tr>

	  	<td><img src="woche_dateien/p_0.gif" alt="" border="0" height="1" width="1"></td>
		<td class="head" colspan="2">DATE</td>
	  </tr>
	  <tr>
		 <td colspan="3"><img src="woche_dateien/p_0.gif" alt="" border="0" height="24" width="1"></td>
	 </tr>
  
	  <tr>
	  	<td><img src="woche_dateien/p_0.gif" alt="" border="0" height="1" width="1"></td>

		<td class="head" colspan="2">TITLE2</td>
	   </tr>
	  <tr>
		<td><img src="woche_dateien/p_0.gif" alt="" border="0" height="1" width="10"></td>
		<td class="fliesstxt" height="1">
TEXT
TEXT
TEXT
TEXT
TEXT
<br>
		</td><td><img src="woche_dateien/p_0.gif" alt="" border="0" height="1" width="10"></td>
	   </tr>

	   <tr>
		 <td colspan="3"><img src="woche_dateien/p_0.gif" alt="" border="0" height="24" width="1"></td>
	 </tr>
  
	  <tr>
	  	<td><img src="woche_dateien/p_0.gif" alt="" border="0" height="1" width="1"></td>
		<td class="head" colspan="2">TITLE3</td>
	   </tr>
	  <tr>

		<td><img src="woche_dateien/p_0.gif" alt="" border="0" height="1" width="10"></td>
		<td class="fliesstxt" height="1">
TEXT
TEXT
TEXT
TEXT
TEXT<br>
		</td><td><img src="woche_dateien/p_0.gif" alt="" border="0" height="1" width="10"></td>
	   </tr>
	   <tr>
		 <td colspan="3"><img src="woche_dateien/p_0.gif" alt="" border="0" height="24" width="1"></td>
	 </tr>
  
	  <tr>

	  	<td><img src="woche_dateien/p_0.gif" alt="" border="0" height="1" width="1"></td>
		<td class="head" colspan="2">TITLE4</td>
	   </tr>
	  <tr>
		<td><img src="woche_dateien/p_0.gif" alt="" border="0" height="1" width="10"></td>
		<td class="fliesstxt" height="1">
TEXT
TEXT
TEXT
TEXT
TEXT<br>
		</td><td><img src="woche_dateien/p_0.gif" alt="" border="0" height="1" width="10"></td>

	   </tr>
	   <tr>
		 <td colspan="3"><img src="woche_dateien/p_0.gif" alt="" border="0" height="24" width="1"></td>
	 </tr>

	  <tr>
		 <td><img src="woche_dateien/p_0.gif" alt="" border="0" height="4" width="10"></td>
		 <td><img src="woche_dateien/p_0.gif" alt="" border="0" height="1" width="580"></td>
		 <td><img src="woche_dateien/p_0.gif" alt="" border="0" height="1" width="10"></td>

	   </tr>
 </tbody></table>
I want the content in this order:

TITLE1
DATE
TITLE2
TEXT
TEXT
TEXT
TEXT
TEXT
TITLE3
TEXT
TEXT
TEXT
TEXT
TEXT
TITLE4
TEXT
TEXT
TEXT
TEXT
TEXT

I can grab the title and date with:

Code: Select all

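   package require http
   set data [http::geturl $url]
   set data2 [http::data $data]
   http::cleanup $data
   foreach line [split $data2 \n] {
    # "->" receives the full match (tags included); "a" receives only the captured text
    if {[regexp -nocase {<td class="head" colspan="2">(.*?)</td>} $line -> a]} {
     puthelp "PRIVMSG $chan :$a"
    }
   }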
but now I don't know how to integrate

Code: Select all

regexp {<td.*?>(.*?)<br>} $string -> text
My script searches the content line by line, so it can't match <td.*?>(.*?)<br>, because the opening tag and the <br> are on different lines.
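
To illustrate the problem, here is a small sketch with a made-up sample string (the variable names "html", "pattern" and "hits" are only for this example). No single line contains both the opening tag and the <br>, so the per-line test never succeeds, while the same pattern matches the unsplit string:

```tcl
# Made-up sample in the shape of the page above.
set html {<td class="fliesstxt" height="1">
TEXT
TEXT<br>}
set pattern {<td.*?>(.*?)<br>}

# Line by line: count how many single lines match the pattern.
set hits 0
foreach line [split $html \n] {
    if {[regexp $pattern $line]} { incr hits }
}
puts "line by line: $hits"

# Whole string: regexp returns 1 when the pattern matches.
puts "whole page: [regexp $pattern $html]"
```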

I hope my code is not too ugly ... :oops:
demond
Revered One
Posts: 3073
Joined: Sat Jun 12, 2004 9:58 am
Location: San Francisco, CA
Contact:

Post by demond »

Then simply apply regexp -all to the whole page instead of to single lines, or call [regexp] in a loop.
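
A rough sketch of the regexp -all route, under some assumptions: the proc name "extract_cells" and the sample page are hypothetical, and the pattern is narrowed to cells that carry a class attribute (instead of the generic <td.*?>) so that the image cells are skipped. With -inline, regexp returns every match together with its captured groups as one flat list:

```tcl
# Hypothetical helper (not from the thread): returns titles and text lines in page order.
proc extract_cells {html} {
    set result {}
    # All quantifiers are non-greedy on purpose: Tcl's regexp gives a whole
    # branch a single preference, so mixing greedy and non-greedy quantifiers
    # in one pattern can produce surprising matches.
    set pattern {<td class="(\w+?)".*?>(.*?)</td>}
    # -all -inline yields: fullMatch class content fullMatch class content ...
    foreach {match class content} [regexp -all -inline $pattern $html] {
        # Keep only the title/date cells and the text cells.
        if {[lsearch -exact {head fliesstxt} $class] < 0} { continue }
        # Remove the trailing <br> and split the cell into clean lines.
        set content [string map {<br> ""} $content]
        foreach line [split [string trim $content] \n] {
            set line [string trim $line]
            if {$line ne ""} { lappend result $line }
        }
    }
    return $result
}

# Shortened sample in the shape of the page posted above.
set page {<tr><td><img src="p_0.gif"></td>
<td class="head" colspan="2">TITLE1</td></tr>
<tr><td class="fliesstxt" height="1">
TEXT A
TEXT B<br>
		</td><td><img src="p_0.gif"></td></tr>}

puts [join [extract_cells $page] \n]
```

In the bot script, the returned list could be walked with foreach and each element sent with puthelp, in place of the per-line loop. The "call [regexp] in a loop" variant would do the same job with regexp -indices and -start to continue after each match.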