egghelp.org community
Discussion of eggdrop bots, shell accounts and tcl scripts.
 

html parser

 
romprod
Halfop


Joined: 19 Oct 2001
Posts: 49

Posted: Mon Nov 29, 2010 2:45 pm    Post subject: html parser

I'm trying to create a basic script to rip info from a URL and spit it out to a channel, but for some reason it isn't working. Can anyone point out the obvious to me, please? It's driving me crazy! :)

Code:
# Config
set url "http://feed43.com/3222412860174114.xml"
set dcctrigger "test"
# End of config

if {![info exists egghttp(version)]} {
  putlog "egghttp.tcl was NOT successfully loaded."
  putlog "egghttp_example.tcl has not been loaded as a result."
} else {
  proc your_callbackproc {sock} {
    global url
    set headers [egghttp:headers $sock]
    set body [egghttp:data $sock]
 
    regsub -all "\n" $body "" body
    regsub -all -nocase {<br>} $body "<br>\n" body

    regexp {<b>(.*)<br/>} $body - team

    putlog "Team: $team"
  }

  bind dcc o|o $dcctrigger our:dcctrigger
  proc our:dcctrigger {hand idx text} {
    global url
    set sock [egghttp:geturl $url your_callbackproc]
    return 1
  } 

  putlog "egghttp_example.tcl has been successfully loaded."
}
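
A note on the callback above: `regexp` returns 0 and leaves `team` unset when nothing matches, so `putlog "Team: $team"` raises an error whenever the page does not contain the expected markup. Below is a minimal sketch of a guarded version in plain Tcl. The proc name `extract_team` and the sample string are hypothetical, made up for illustration, and the non-greedy `(.*?)` is a swapped-in choice so that the capture stops at the first `<br/>` rather than the last.

```tcl
# Hypothetical helper: return the text between <b> and the first <br/>,
# or an empty string when the pattern does not match.
proc extract_team {body} {
    # regexp returns 1 on a match and 0 otherwise; only read $team on success.
    if {[regexp -nocase {<b>(.*?)<br/>} $body -> team]} {
        return [string trim $team]
    }
    return ""
}

# Made-up sample input, not taken from the real feed.
set sample "<b>Arsenal v Fulham<br/><b>Chelsea v Everton<br/>"
puts [extract_team $sample]
puts [string length [extract_team "no markup here"]]
```

Inside the bot, the callback could call such a helper and only `putlog` when the result is non-empty.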
romprod
Posted: Tue Nov 30, 2010 10:57 am

The above script didn't work because of the page it was getting data from. I've changed the source now and it is working, but I'm unable to make it loop through to the next line of text. I'll also include a sample of the HTML I'm trying to parse.


Code:
set rssfeed "http://www.fred.co.uk"
set trigger "!latest"
set channel "#12321"

if {![info exists egghttp(version)]} {
  putlog "egghttp.tcl was NOT successfully loaded."
  putlog "egghttp_example.tcl has not been loaded as a result."
} else {
  proc your_callbackproc {sock} {

    global rssfeed channel

    set headers [egghttp:headers $sock]
    set body [egghttp:data $sock]

    regexp {"><h2>(.*?)</h2>} $body - date
    puthelp "PRIVMSG $channel : $date"

    set xml { $body } 
    foreach line [split $xml "\n"] {
      regexp {<td valign="top" class="tblRow colmNum000">(.*?)</td><td valign="top" class="tblRow">(.*?)</td></tr>} $body - time1 game1
      puthelp "PRIVMSG $channel : $time1 $game1"
    }
  }

  bind pub -|* $trigger top:trigger
  proc top:trigger {nick host hand chan text} {
    global rssfeed
    set sock [egghttp:geturl $rssfeed your_callbackproc]
    return 1
  }
}
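
Two details in the callback above appear to explain why only the first fixture is printed. Braces suppress substitution in Tcl, so `set xml { $body }` stores the literal text ` $body ` rather than the page, and the `foreach` therefore runs exactly once. Inside the loop, `regexp` is applied to the whole `$body` without `-all` or `-start`, so it reports the first match on every call. A small sketch showing both effects, using a made-up two-row `body`:

```tcl
# Made-up stand-in for the downloaded page.
set body "<td>a1</td>\n<td>a2</td>"

# Braces prevent variable substitution: xml holds the 7 characters " $body ".
set xml { $body }
puts [string length $xml]
puts [llength [split $xml "\n"]]

# Without braces the variable is substituted as expected.
set xml $body
puts [llength [split $xml "\n"]]

# Matching against the whole body returns the first hit on every call.
regexp {<td>(.*?)</td>} $body -> first
regexp {<td>(.*?)</td>} $body -> again
puts "$first $again"
```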


HTML that I need to parse:

Code:
<div class="content"><h1>Barclays Premier League fixtures</h1></div><div class="tblContain"><h2>4 Dec 2010</h2><table class="tblResults" cellpadding="0" cellspacing="2" border="0"><tr>
<td valign="top" class="tblRow colmNum000">zzz1</td><td valign="top" class="tblRow">xxx1</td></tr></table><table class="tblResults" cellpadding="0" cellspacing="2" border="0"><tr>
<td valign="top" class="tblRow colmNum000">zzz2</td><td valign="top" class="tblRow">xxx2</td></tr></table><table class="tblResults" cellpadding="0" cellspacing="2" border="0"><tr>
<td valign="top" class="tblRow colmNum000">zzz3</td><td valign="top" class="tblRow">xxx3</td></tr></table><table class="tblResults" cellpadding="0" cellspacing="2" border="0"><tr>
<td valign="top" class="tblRow colmNum000">zzz4</td><td valign="top" class="tblRow">xxx4</td></tr></table><table class="tblResults" cellpadding="0" cellspacing="2" border="0"><tr>
<td valign="top" class="tblRow colmNum000">zzz5</td><td valign="top" class="tblRow">xxx5</td></tr></table><table class="tblResults" cellpadding="0" cellspacing="2" border="0"><tr>
<td valign="top" class="tblRow colmNum000">zzz6</td><td valign="top" class="tblRow">xxx6</td></tr></table><table class="tblResults" cellpadding="0" cellspacing="2" border="0"><tr>
<td valign="top" class="tblRow colmNum000">zzz7</td><td valign="top" class="tblRow">xxx7</td></tr></table><br/><h2>5 Dec 2010</h2><table class="tblResults" cellpadding="0" cellspacing="2" border="0"><tr>
<td valign="top" class="tblRow colmNum000">zzz8</td><td valign="top" class="tblRow">xxx8</td></tr></table><table class="tblResults" cellpadding="0" cellspacing="2" border="0"><tr>
<td valign="top" class="tblRow colmNum000">zzz9</td><td valign="top" class="tblRow">xxx9</td></tr></table><br/><h2>6 Dec 2010</h2><table class="tblResults" cellpadding="0" cellspacing="2" border="0"><tr>
<td valign="top" class="tblRow colmNum000">zzz10</td><td valign="top" class="tblRow">xxx10</td></tr></table><br/><h2>11 Dec 2010</h2><table class="tblResults" cellpadding="0" cellspacing="2" border="0"><tr>
<td valign="top" class="tblRow colmNum000">zzz11</td><td valign="top" class="tblRow">xxx11</td></tr></table><table class="tblResults" cellpadding="0" cellspacing="2" border="0"><tr>
<td valign="top" class="tblRow colmNum000">zzz12</td><td valign="top" class="tblRow">xxx12</td></tr></table></div><div class="content infoArea">
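
The sample above interleaves `<h2>` date headings with one-row tables, so one possible way to keep each fixture attached to its date is to split the page at every `<h2>` and scan each piece separately. The following is a sketch under that assumption, not a general HTML parser. The proc name `fixtures_by_date` is hypothetical, the `html` value is a shortened copy of the sample, and `[^<]*` is used in place of `(.*?)` so a capture can never run across a tag.

```tcl
# Hypothetical helper: return a flat list {date fixtures date fixtures ...},
# where each fixtures value is a list of "time game" strings.
proc fixtures_by_date {html} {
    set row {<td valign="top" class="tblRow colmNum000">([^<]*)</td><td valign="top" class="tblRow">([^<]*)</td>}
    set result {}
    # Mark every <h2> with a control character, then split on that marker.
    # lrange ... 1 end drops the text that precedes the first heading.
    set marked [string map [list "<h2>" \u0001] $html]
    foreach section [lrange [split $marked \u0001] 1 end] {
        # Each section starts with the date text followed by </h2>.
        if {![regexp {^([^<]*)</h2>} $section -> date]} {
            continue
        }
        set fixtures {}
        foreach {match time game} [regexp -all -inline $row $section] {
            lappend fixtures "$time $game"
        }
        lappend result $date $fixtures
    }
    return $result
}

# Shortened copy of the sample above.
set html {<div class="tblContain"><h2>4 Dec 2010</h2><table class="tblResults"><tr>
<td valign="top" class="tblRow colmNum000">zzz1</td><td valign="top" class="tblRow">xxx1</td></tr></table><br/><h2>5 Dec 2010</h2><table class="tblResults"><tr>
<td valign="top" class="tblRow colmNum000">zzz8</td><td valign="top" class="tblRow">xxx8</td></tr></table></div>}

foreach {date fixtures} [fixtures_by_date $html] {
    puts $date
    foreach fixture $fixtures {
        puts "  $fixture"
    }
}
```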


The only output I get now is:

Code:
[02:43:56] <@nick> !latest
[02:44:01] <+bot> 4 Dec 2010
[02:44:03] <+bot> zzz1 xxx1


But I would like:

Code:
[02:43:56] <@nick> !latest
[02:44:01] <+bot> 4 Dec 2010
[02:44:03] <+bot> zzz1 xxx1
[02:44:03] <+bot> zzz2 xxx2
[02:44:03] <+bot> zzz3 xxx3
[02:44:03] <+bot> zzz4 xxx4
[02:44:03] <+bot> zzz5 xxx5
etc etc etc


Thanks in advance! :)
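
One way to get the output described above is to let `regexp -all -inline` collect every row in a single call and then walk the result three items at a time (full match, time, game). The sketch below assumes the row markup stays exactly as in the sample. The proc name `announce_fixtures` is hypothetical, and the `puthelp` stand-in exists only so the snippet runs outside eggdrop; inside the bot the real `puthelp` is used. Note that it announces every row on the page, across all dates.

```tcl
# Outside eggdrop there is no puthelp, so define a stand-in that prints the
# queued line. Inside the bot the real command is used instead.
if {[llength [info commands puthelp]] == 0} {
    proc puthelp {text} { puts $text }
}

# Hypothetical helper: announce the first date heading and every fixture row.
proc announce_fixtures {body channel} {
    if {[regexp {<h2>(.*?)</h2>} $body -> date]} {
        puthelp "PRIVMSG $channel :$date"
    }
    set row {<td valign="top" class="tblRow colmNum000">(.*?)</td><td valign="top" class="tblRow">(.*?)</td></tr>}
    # -all -inline returns a flat list: full match, time, game, full match, ...
    set matches [regexp -all -inline $row $body]
    foreach {match time game} $matches {
        puthelp "PRIVMSG $channel :$time $game"
    }
    # Number of rows announced.
    return [expr {[llength $matches] / 3}]
}

# Shortened copy of the sample HTML.
set body {<h2>4 Dec 2010</h2><table><tr>
<td valign="top" class="tblRow colmNum000">zzz1</td><td valign="top" class="tblRow">xxx1</td></tr></table><table><tr>
<td valign="top" class="tblRow colmNum000">zzz2</td><td valign="top" class="tblRow">xxx2</td></tr></table>}
announce_fixtures $body "#12321"
```

In the original callback this would replace the `set xml` line and the `foreach` loop, with `$body` and `$channel` passed in from the callback.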
All times are GMT - 4 Hours