improving performance of randomquote.tcl + large log file

Help for those learning Tcl or writing their own scripts.
stock
Voice
Posts: 3
Joined: Tue Oct 25, 2005 7:48 pm

Post by stock »

running randomquote.tcl on an 80 MB log file sort of sends my computer to a bad place. i don't really know Tcl, nor any programming at all, so i'm mostly wondering about ways to speed up getting a random quote. would doing something like 'wc -l filename', and then having it grab a random range of, say, 100 lines with 'head' and 'tail', be a good approach to investigate, or is that completely the wrong way to think about this issue? below is the script i'm currently using. it comes from mel.sf.net, though i changed it to use 'split' instead of regsub (or at least i think i did).
thanks for any ideas.

Code:

#### Random Quotes add-on for mEL 1.6.0
#### (c)2002 Jules <mel@angelbears.org>
####
#### !quote ?nickname?
#### Searches the channel log for a random quote and displays it
#### in the channel.
####
#### mEL 2.0 needed
#### source it after mel2.tcl

bind pub - "!rq" ::mel::rquote
putlog "Loading Random Quote script"
namespace eval mel {
        proc rq_fixquote {line} {
                regsub -all {\[} $line {\\[} line
                regsub -all {\]} $line {\\]} line
                regsub -all {\"} $line {\\"} line
                return $line
        }
        
        proc rquote {nick host handle channel text} {
                variable actives
                foreach v $actives {variable $v}
                if {$text == ""} {set rq_query [string tolower $nick]} else {set rq_query [lindex [string tolower [split $text]] 0]}
                foreach rq_nr [array names channels] { lappend rq_chanlist [string tolower $channels($rq_nr)] }
                if {$unixnames == 1} {set rq_chan [string tolower [string range $channel 1 end]]} else {set rq_chan [string tolower $channel]}
                if {[lsearch -exact $rq_chanlist [string tolower $channel]] == -1 || ![file exists [file join $statslogdir $rq_chan].log]} {
                        putserv "PRIVMSG $channel : No random quote found for $rq_query!"                 
                        putlog "RandomQuote: No such channel or logfile: $channel"
                        return 0
                }
                set rq_lines -1
                set rq_read [open [file join $statslogdir $rq_chan].log r]
                while {![eof $rq_read]} {
                        set rq_data [split [gets $rq_read]]
                        if {[eof $rq_read]} {break}
                        if {[string match -nocase [split $rq_query] [string trim [lindex $rq_data 1] <>]]} {      
                                incr rq_lines
                                set rq_userlines($rq_lines) $rq_data
                        }
                }
                close $rq_read
                if {$rq_lines == -1} {putserv "PRIVMSG $channel : No random quote found for $rq_query!" ; return 0}
                # rq_lines is the highest index and rand N returns 0..N-1, so pass index + 1
                putserv "PRIVMSG $channel : [join $rq_userlines([rand [expr {$rq_lines + 1}]])]"
                array unset rq_userlines
                return 0
        }

}
egghead
Master
Posts: 481
Joined: Mon Oct 29, 2001 8:00 pm

Post by egghead »

If the log file only contains quotes, then a good way to go is:
1. to determine the size of the file,
2. set the read pointer to a value randomly between 0 and the filesize,
3. retrieve the line at the pointer location.
Tcl has the functions for each of the three steps.
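The three steps above map onto `file size`, `seek` and `gets`. A minimal sketch in plain Tcl, assuming one quote per line; the proc name `random_line` is made up for illustration, and it uses the `rand()` math function rather than eggdrop's `rand` command so that it also runs in a bare tclsh:

```tcl
# Hypothetical helper: return one line read from a random byte offset.
proc random_line {path} {
    # Step 1: determine the size of the file.
    set size [file size $path]
    if {$size == 0} {return ""}
    set fh [open $path r]
    # Step 2: move the read pointer to a random byte offset in [0, size).
    seek $fh [expr {int(rand() * $size)}]
    # The offset almost always lands mid-line, so discard the partial line.
    gets $fh
    # Step 3: read the next complete line; wrap to the start if we hit EOF.
    if {[gets $fh line] < 0} {
        seek $fh 0
        gets $fh line
    }
    close $fh
    return $line
}
```

Because the offset is picked by byte, a line that follows a long line is picked more often than one that follows a short line. For a quote bot that bias is probably acceptable.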
De Kus
Revered One
Posts: 1361
Joined: Sun Dec 15, 2002 11:41 am
Location: Germany

Post by De Kus »

Code:

set size [expr {[file size [file join $statslogdir $rq_chan].log] -1}]
set rq_read [open [file join $statslogdir $rq_chan].log r]
for {set i 0} {$i < 20} {incr i} {
  seek $rq_read [rand $size]
  gets $rq_read
  set y 0
  while {![eof $rq_read] && $y < 100} {
    set rq_data [split [gets $rq_read]]
    if {[string match -nocase $rq_query [string trim [lindex $rq_data 1] <>]]} {
      incr rq_lines
      set rq_userlines($rq_lines) $rq_data
    }
    incr y
  }
  if {$rq_lines >= 2} {break}
}
close $rq_read
you mean something like that? that should search 100 lines at each of up to 20 random positions without parsing the whole file. depending on the speed of your system you could try 50 positions and 200 lines or something like that.
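For reference, the same sampling idea as a standalone proc that can be tried outside the bot. This is a sketch, not the mEL code: the name `sample_quotes` and its `positions`, `window` and `enough` parameters are invented here, the log format is assumed to be `[hh:mm] <nick> text`, and it compares nicks with `string equal -nocase` instead of `string match` so that nicks containing glob characters such as `[` are matched literally.

```tcl
# Hypothetical helper: collect log lines spoken by $nick by scanning a window
# of $window lines at each of up to $positions random offsets.
proc sample_quotes {path nick {positions 20} {window 100} {enough 3}} {
    set matches {}
    set size [file size $path]
    if {$size == 0} {return $matches}
    set fh [open $path r]
    for {set i 0} {$i < $positions} {incr i} {
        # Uses Tcl's rand() math function, not eggdrop's rand command.
        seek $fh [expr {int(rand() * $size)}]
        gets $fh    ;# skip the partial line at the landing point
        for {set y 0} {$y < $window && [gets $fh line] >= 0} {incr y} {
            # Assumed log format: "[hh:mm] <nick> text", so the nick is word 1.
            set speaker [string trim [lindex [split $line] 1] <>]
            if {[string equal -nocase $nick $speaker]} {
                lappend matches $line
            }
        }
        # Stop early once enough candidate lines have been collected.
        if {[llength $matches] >= $enough} {break}
    }
    close $fh
    return $matches
}
```

Windows can overlap, so the same line may be collected more than once. A quote can then be picked from the result with `lindex $matches [expr {int(rand() * [llength $matches])}]`.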
Copyright © 2005-2009 by De Kus - published under The MIT License
stock
Voice
Posts: 3
Joined: Tue Oct 25, 2005 7:48 pm

Post by stock »

that was pretty much it exactly, De Kus. thank you.

i tried changing it a bit, so it would perform the part you wrote three times. i know i could get the same outcome by changing the number of lines searched and the number of random searches, but i wanted to see if i could make the while loop work. does the following look at least somewhat like a correct way to have the tcl loop through the random search three times?

Code:

    set rq_lines -1
# trying to setup that while loop
    set tries 0
    while {$tries < 3} {
# done according to http://forum.egghelp.org/viewtopic.php?p=56844#56844
      set size [expr {[file size [file join $statslogdir $rq_chan].log] -1}]
      set rq_read [open [file join $statslogdir $rq_chan].log r]
      for {set i 0} {$i < 40} {incr i} {
        seek $rq_read [rand $size]
        gets $rq_read
        set y 0
        while {![eof $rq_read] && $y < 400} {
          set rq_data [split [gets $rq_read]]
          if {[string match -nocase [split $rq_query] [string trim [lindex $rq_data 1] <>]]} {
            incr rq_lines
            set rq_userlines($rq_lines) $rq_data
          }
          incr y
        }
        if {$rq_lines > 1} {break}
      }
      close $rq_read
      unset i y
# more edits by me in an attempt to get this thing to loop three times
# while looking for a quote   
      if {$rq_lines == -1} {incr tries}
      if {$tries == 3} {putserv "PRIVMSG $channel : no random quote found for $rq_query."}
# rq_lines is the highest index and rand N returns 0..N-1, so pass index + 1
      if {$rq_lines != -1} {putserv "PRIVMSG $channel : [join $rq_userlines([rand [expr {$rq_lines + 1}]])]"; set tries 3}
    }
    array unset rq_userlines
    return 0            
  }
}
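One possible way to structure the three attempts, as a sketch rather than a drop-in fix: put the retry in a small helper and stop at the first attempt that finds something. `first_nonempty` is a hypothetical name, and the script passed to it stands in for whatever performs the random search.

```tcl
# Hypothetical helper: run $script up to $max times in the caller's scope and
# return the first non-empty result, or "" if every attempt came back empty.
proc first_nonempty {max script} {
    for {set tries 0} {$tries < $max} {incr tries} {
        set result [uplevel 1 $script]
        if {$result ne ""} {return $result}
    }
    return ""
}

# Stand-in for the search: comes back empty on the first attempt only.
set attempts 0
set found [first_nonempty 3 {
    incr attempts
    expr {$attempts < 2 ? "" : "a quote"}
}]
```

With a `for` loop the attempt counter advances on every pass, so the loop cannot run forever, and the "nothing found" message can be sent once after the helper returns an empty string.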