egghelp.org community
Discussion of eggdrop bots, shell accounts and tcl scripts.
 

parse website

 
Koepi
Voice


Joined: 31 Aug 2003
Posts: 26

Posted: Sat Dec 03, 2005 3:20 pm    Post subject: parse website

Hi,

I'd like to parse a website. The content I need is between <td class="txt" height="1"> and <br>:

Code:
<td class="txt" height="1">
TEXT
TEXT
TEXT
TEXT
TEXT
TEXT
<br>


Can someone explain to me how I can grab those lines?

Greets, Koepi
demond
Revered One


Joined: 12 Jun 2004
Posts: 3073
Location: San Francisco, CA

Posted: Sat Dec 03, 2005 4:07 pm

Code:

regexp {<td.*?>(.*?)<br>} $string -> text
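A minimal standalone check of the regexp above against the snippet from the question (the variable names `html` and `lines` and the blank-line trimming are illustrative additions, not part of the original answer). It relies on two Tcl regexp rules: the first quantifier (`.*?`) makes the whole expression prefer the shortest match, and `.` matches newlines unless `-line` is given, so `(.*?)` can span the TEXT lines.

```tcl
# Sample input copied from the question.
set html {<td class="txt" height="1">
TEXT
TEXT
TEXT
TEXT
TEXT
TEXT
<br>}

set lines {}
# Non-greedy match: capture everything between the opening <td ...> tag
# and the first following <br>. By default "." also matches newlines.
if {[regexp {<td.*?>(.*?)<br>} $html -> text]} {
    # Drop blank lines and surrounding whitespace from the captured block.
    foreach line [split $text \n] {
        set line [string trim $line]
        if {$line ne ""} { lappend lines $line }
    }
}
puts [join $lines \n]
```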

_________________
connection, sharing, dcc problems? click <here>
before asking for scripting help, read <this>
use [code] tag when posting logs, code
Koepi
Voice


Joined: 31 Aug 2003
Posts: 26

Posted: Sun Dec 04, 2005 2:57 pm

Thanks for your answer. :)

Until now I have been parsing the website line by line.

The page source is:

Code:

<table border="0" cellpadding="0" cellspacing="0" width="600">
     <tbody><tr>
        <td><img src="woche_dateien/p_0.gif" alt="" border="0" height="1" width="10"></td>
      <td class="head" colspan="2">TITLE1</td>
     </tr>
     <tr>

        <td><img src="woche_dateien/p_0.gif" alt="" border="0" height="1" width="1"></td>
      <td class="head" colspan="2">DATE</td>
     </tr>
     <tr>
       <td colspan="3"><img src="woche_dateien/p_0.gif" alt="" border="0" height="24" width="1"></td>
    </tr>
 
     <tr>
        <td><img src="woche_dateien/p_0.gif" alt="" border="0" height="1" width="1"></td>

      <td class="head" colspan="2">TITLE2</td>
      </tr>
     <tr>
      <td><img src="woche_dateien/p_0.gif" alt="" border="0" height="1" width="10"></td>
      <td class="fliesstxt" height="1">
TEXT
TEXT
TEXT
TEXT
TEXT
<br>
      </td><td><img src="woche_dateien/p_0.gif" alt="" border="0" height="1" width="10"></td>
      </tr>

      <tr>
       <td colspan="3"><img src="woche_dateien/p_0.gif" alt="" border="0" height="24" width="1"></td>
    </tr>
 
     <tr>
        <td><img src="woche_dateien/p_0.gif" alt="" border="0" height="1" width="1"></td>
      <td class="head" colspan="2">TITLE3</td>
      </tr>
     <tr>

      <td><img src="woche_dateien/p_0.gif" alt="" border="0" height="1" width="10"></td>
      <td class="fliesstxt" height="1">
TEXT
TEXT
TEXT
TEXT
TEXT<br>
      </td><td><img src="woche_dateien/p_0.gif" alt="" border="0" height="1" width="10"></td>
      </tr>
      <tr>
       <td colspan="3"><img src="woche_dateien/p_0.gif" alt="" border="0" height="24" width="1"></td>
    </tr>
 
     <tr>

        <td><img src="woche_dateien/p_0.gif" alt="" border="0" height="1" width="1"></td>
      <td class="head" colspan="2">TITLE4</td>
      </tr>
     <tr>
      <td><img src="woche_dateien/p_0.gif" alt="" border="0" height="1" width="10"></td>
      <td class="fliesstxt" height="1">
TEXT
TEXT
TEXT
TEXT
TEXT<br>
      </td><td><img src="woche_dateien/p_0.gif" alt="" border="0" height="1" width="10"></td>

      </tr>
      <tr>
       <td colspan="3"><img src="woche_dateien/p_0.gif" alt="" border="0" height="24" width="1"></td>
    </tr>

     <tr>
       <td><img src="woche_dateien/p_0.gif" alt="" border="0" height="4" width="10"></td>
       <td><img src="woche_dateien/p_0.gif" alt="" border="0" height="1" width="580"></td>
       <td><img src="woche_dateien/p_0.gif" alt="" border="0" height="1" width="10"></td>

      </tr>
 </tbody></table>


I want the content in this order:

TITLE1
DATE
TITLE2
TEXT
TEXT
TEXT
TEXT
TEXT
TITLE3
TEXT
TEXT
TEXT
TEXT
TEXT
TITLE4
TEXT
TEXT
TEXT
TEXT
TEXT

I can grab the title and date with:
Code:

   package require http
   set data [http::geturl $url]
   set data2 [http::data $data]
   http::cleanup $data
   foreach line [split $data2 \n] {
      # "-> a" stores only the captured title in a, not the whole match
      if {[regexp -nocase {<td class="head" colspan="2">(.*?)</td>} $line -> a]} {
         puthelp "PRIVMSG $chan :$a"
      }
   }


But now I don't know how to integrate
Code:
regexp {<td.*?>(.*?)<br>} $string -> text


My script searches the content line by line, so it can never match <td.*?>(.*?)<br>: the opening tag and the <br> are on different lines.

Hope my code is not too ugly... :-/
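A small sketch of why the line-by-line loop misses the text block (the shortened sample string is assumed, modeled on the page above): no single line contains both the opening tag and the <br>, while the same pattern matches once when applied to the whole string.

```tcl
# Shortened sample modeled on one text cell of the page.
set page "<td class=\"fliesstxt\" height=\"1\">\nTEXT\nTEXT\n<br>"
set re {<td class="fliesstxt" height="1">(.*?)<br>}

# Line by line: no single line holds both the tag and the <br>.
set lineHits 0
foreach line [split $page \n] {
    if {[regexp $re $line]} { incr lineHits }
}

# Whole string: the match is free to span the newlines.
set wholeHit [regexp $re $page -> block]

puts "line-by-line matches: $lineHits"
puts "whole-page match: $wholeHit"
```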
demond
Revered One


Joined: 12 Jun 2004
Posts: 3073
Location: San Francisco, CA

Posted: Sun Dec 04, 2005 8:39 pm

then simply run it on the whole page instead of line by line, and use regexp -all -inline (or call [regexp] in a loop) to get every match
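A sketch of the whole-page approach, assuming the layout shown above (`<td class="head" ...>` for the titles and the date, `<td class="fliesstxt" ...>` for the text blocks). `parsePage` is a hypothetical helper name. It swaps `.*?` for `[^<]*`, which stops at the next tag and so avoids Tcl's rule that the first quantifier sets the greediness of the whole expression; this assumes the text cells contain no nested tags, which holds for the sample but may not for the live page.

```tcl
# Hypothetical helper: return titles, date and text lines in page order.
# Assumes the cell text contains no nested HTML tags, as in the sample page.
proc parsePage {html} {
    set out {}
    # Match head or fliesstxt cells; the group captures text up to the next "<".
    # [^<]* cannot run past the next tag, so the result does not depend on
    # Tcl's greediness rules.
    set re {<td class="(?:head|fliesstxt)"[^>]*>([^<]*)}
    foreach {-> text} [regexp -all -inline $re $html] {
        foreach line [split $text \n] {
            set line [string trim $line]
            if {$line ne ""} { lappend out $line }
        }
    }
    return $out
}

# Usage with a shortened sample of the page.
set sample {<td class="head" colspan="2">TITLE1</td>
<td class="head" colspan="2">DATE</td>
<td><img src="x.gif" alt="" border="0" height="1" width="10"></td>
<td class="head" colspan="2">TITLE2</td>
<td class="fliesstxt" height="1">
TEXT A
TEXT B
<br>
</td>}
foreach line [parsePage $sample] { puts $line }
```

In the bot, pass `$data2` from the http fetch instead of `$sample` and replace `puts` with `puthelp "PRIVMSG $chan :$line"`.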
All times are GMT - 4 Hours

Forum hosting provided by Reverse.net

Powered by phpBB © 2001, 2005 phpBB Group
subGreen style by ktauber