stock Voice
Joined: 25 Oct 2005 Posts: 3
|
Posted: Tue Oct 25, 2005 10:17 pm Post subject: improving performance of randomquote.tcl + large log file |
|
|
running randomquote.tcl on a log file of 80 mb sort of sends my computer to a bad place. i don't really know tcl, nor any programming at all, so i'm mostly wondering about ways to speed up getting a random quote. would doing something like 'wc -l filename', and then having it somehow grab a random range of, say, 100 lines with 'head' and 'tail' be a good approach to investigate, or is that completely the wrong way to think about this issue? below is the script i'm currently using--it comes from mel.sf.net, though i changed it to use 'split' instead of regsub (or at least i think i did).
thanks for any ideas.
| Code: |
#### Random Quotes add-on for mEL 1.6.0
#### (c)2002 Jules <mel@angelbears.org>
####
#### !rq ?nickname?
#### Searches the channel log for a random quote and displays it
#### in the channel.
####
#### mEL 2.0 needed
#### source it after mel2.tcl
bind pub - "!rq" ::mel::rquote
putlog "Loading Random Quote script"
namespace eval mel {
    proc rq_fixquote {line} {
        regsub -all {\[} $line {\\[} line
        regsub -all {\]} $line {\\]} line
        regsub -all {\"} $line {\\"} line
        return $line
    }
    proc rquote {nick host handle channel text} {
        variable actives
        foreach v $actives {variable $v}
        if {$text == ""} {set rq_query [string tolower $nick]} else {set rq_query [lindex [string tolower [split $text]] 0]}
        foreach rq_nr [array names channels] { lappend rq_chanlist [string tolower $channels($rq_nr)] }
        if {$unixnames == 1} {set rq_chan [string tolower [string range $channel 1 end]]} else {set rq_chan [string tolower $channel]}
        if {[lsearch -exact $rq_chanlist [string tolower $channel]] == -1 || ![file exists [file join $statslogdir $rq_chan].log]} {
            putserv "PRIVMSG $channel : No random quote found for $rq_query!"
            putlog "RandomQuote: No such channel or logfile: $channel"
            return 0
        }
        set rq_lines -1
        set rq_read [open [file join $statslogdir $rq_chan].log r]
        # reads the whole log line by line and keeps every line spoken by $rq_query
        while {![eof $rq_read]} {
            set rq_data [split [gets $rq_read]]
            if {[eof $rq_read]} {break}
            if {[string match -nocase [split $rq_query] [string trim [lindex $rq_data 1] <>]]} {
                incr rq_lines
                set rq_userlines($rq_lines) $rq_data
            }
        }
        close $rq_read
        if {$rq_lines == -1} {putserv "PRIVMSG $channel : No random quote found for $rq_query!" ; return 0}
        # rq_lines is the highest index, so there are rq_lines + 1 quotes;
        # [rand N] returns 0..N-1 and fails for N = 0
        set rq_pick [rand [expr {$rq_lines + 1}]]
        putserv "PRIVMSG $channel : [join $rq_userlines($rq_pick)]"
        array unset rq_userlines
        return 0
    }
}
|
|
|
|
 |
egghead Master
Joined: 29 Oct 2001 Posts: 481
|
Posted: Wed Oct 26, 2005 4:33 am Post subject: |
|
|
If the log file only contains quotes, then a good way to go is:
1. to determine the size of the file,
2. set the read pointer to a value randomly between 0 and the filesize,
3. retrieve the line at the pointer location.
Tcl has the functions for each of the three steps. |
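The three steps can be sketched in plain Tcl roughly as follows. This is only an illustration under a few assumptions: `random_line` is a hypothetical helper name (it is not part of randomquote.tcl), it uses `expr`'s `rand()` rather than eggdrop's `rand` command so it runs in any tclsh, and it assumes a single-byte log encoding, since a random byte offset could otherwise land inside a multi-byte character.

```tcl
# Hypothetical helper, not part of randomquote.tcl:
# return one random line from a file without reading the whole file.
proc random_line {path} {
    # Step 1: file size in bytes.
    set size [file size $path]
    if {$size == 0} {return ""}
    set fh [open $path r]
    # Step 2: jump to a random byte offset in 0..size-1.
    seek $fh [expr {int(rand() * $size)}]
    # Step 3: the offset usually lands mid-line, so discard the rest of
    # that line and read the next complete one.
    gets $fh
    if {[gets $fh line] < 0} {
        # The offset was inside the last line: wrap around to the first.
        seek $fh 0
        gets $fh line
    }
    close $fh
    return $line
}
```

Note that this does not pick lines uniformly: a line following a long line is more likely to be chosen than one following a short line. For a quote bot that is probably acceptable.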
|
|
 |
De Kus Revered One

Joined: 15 Dec 2002 Posts: 1361 Location: Germany
|
Posted: Wed Oct 26, 2005 8:20 am Post subject: |
|
|
| Code: |
set size [expr {[file size [file join $statslogdir $rq_chan].log] - 1}]
set rq_read [open [file join $statslogdir $rq_chan].log r]
for {set i 0} {$i < 20} {incr i} {
    # jump to a random offset and throw away the partial line found there
    seek $rq_read [rand $size]
    gets $rq_read
    set y 0
    while {![eof $rq_read] && $y < 100} {
        set rq_data [split [gets $rq_read]]
        if {[string match -nocase $rq_query [string trim [lindex $rq_data 1] <>]]} {
            incr rq_lines
            set rq_userlines($rq_lines) $rq_data
        }
        incr y
    }
    # rq_lines starts at -1, so this stops once three quotes are collected
    if {$rq_lines >= 2} {break}
}
close $rq_read
|
Do you mean something like that? It should search up to 100 lines at each of up to 20 random positions without parsing the whole file. Depending on the speed of your system, you could try 50 positions and 200 lines, or something like that.
_________________
De Kus
StarZ|De_Kus, De_Kus or DeKus on IRC
Copyright © 2005-2009 by De Kus - published under The MIT License
Love hurts, love strengthens... |
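For trying the sampling idea outside of eggdrop, here is a self-contained sketch in plain Tcl. Everything in it is an assumption for illustration: `sample_quotes` and `pick_quote` are hypothetical names, `expr`'s `rand()` stands in for eggdrop's `rand`, the nick is compared with `string equal -nocase` instead of `string match` (so nicks containing `[` or `*` are not treated as patterns), and the log is assumed to have the nick as its second whitespace-separated field wrapped in `<>`, as in the script above.

```tcl
# Hypothetical helper: collect lines spoken by $nick from a few random
# windows of the log instead of scanning the whole file.
# Assumed line format: "[12:00] <nick> text ..."
proc sample_quotes {path nick {positions 20} {window 100} {wanted 3}} {
    set found {}
    set size [file size $path]
    if {$size == 0} {return $found}
    set fh [open $path r]
    for {set i 0} {$i < $positions} {incr i} {
        seek $fh [expr {int(rand() * $size)}]
        gets $fh                               ;# discard the partial line
        for {set y 0} {$y < $window} {incr y} {
            if {[gets $fh line] < 0} {break}   ;# reached end of file
            set who [string trim [lindex [split $line] 1] <>]
            if {[string equal -nocase $nick $who]} {
                lappend found $line
            }
        }
        if {[llength $found] >= $wanted} {break}
    }
    close $fh
    return $found
}

# Pick one collected quote; every index 0..n-1 can be chosen.
proc pick_quote {quotes} {
    if {[llength $quotes] == 0} {return ""}
    return [lindex $quotes [expr {int(rand() * [llength $quotes])}]]
}
```

Usage would be something like `pick_quote [sample_quotes $logfile $nick]`. Overlapping windows can collect the same line twice, and a nick that rarely speaks may not be found at all, so an empty result does not prove there are no quotes in the log.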
|
|
 |
stock Voice
Joined: 25 Oct 2005 Posts: 3
|
Posted: Thu Oct 27, 2005 5:12 am Post subject: |
|
|
that was pretty much it exactly, De Kus. thank you.
i tried changing it a bit, so it would perform the part you wrote three times. i know i could get the same outcome by changing the number of lines searched and the number of random searches, but i wanted to see if i could make the while loop work. does the following look at least somewhat like a correct way to have the tcl loop through the random search three times?
| Code: |
        set rq_lines -1
        # trying to setup that while loop
        set tries 0
        while {$tries < 3} {
            # done according to http://forum.egghelp.org/viewtopic.php?p=56844#56844
            set size [expr {[file size [file join $statslogdir $rq_chan].log] - 1}]
            set rq_read [open [file join $statslogdir $rq_chan].log r]
            for {set i 0} {$i < 40} {incr i} {
                seek $rq_read [rand $size]
                gets $rq_read
                set y 0
                while {![eof $rq_read] && $y < 400} {
                    set rq_data [split [gets $rq_read]]
                    if {[string match -nocase [split $rq_query] [string trim [lindex $rq_data 1] <>]]} {
                        incr rq_lines
                        set rq_userlines($rq_lines) $rq_data
                    }
                    incr y
                }
                if {$rq_lines > 1} {break}
            }
            close $rq_read
            unset i y
            # more edits by me in an attempt to get this thing to loop three times
            # while looking for a quote
            if {$rq_lines == -1} {incr tries}
            if {$tries == 3} {putserv "PRIVMSG $channel : no random quote found for $rq_query."}
            if {$rq_lines != -1} {
                # rq_lines is the highest index, so pick from rq_lines + 1 entries;
                # [rand N] returns 0..N-1 and fails for N = 0
                set rq_pick [rand [expr {$rq_lines + 1}]]
                putserv "PRIVMSG $channel : [join $rq_userlines($rq_pick)]"
                set tries 3
            }
        }
        array unset rq_userlines
        return 0
    }
}
|
|
|
|
 |
|