bras Voice
Joined: 03 Feb 2006 Posts: 7
|
Posted: Mon Feb 06, 2006 12:59 am Post subject: Stripping out character |
|
|
Hi,
I'm writing a script, but I'm having trouble removing a character from a text. The text is «TIME»
I know that « is 171 and » is 187 in ASCII code; however, I don't know how to represent them in a replacevar procedure. I tried:
set echo [replacevar $echo "\0171" ""]
set echo [replacevar $echo "\0187" ""]
Obviously that didn't work. Could anyone help me? |
|
|
 |
demond Revered One

Joined: 12 Jun 2004 Posts: 3073 Location: San Francisco, CA
|
Posted: Mon Feb 06, 2006 1:51 am Post subject: |
|
|
\0171 and \0187 are invalid character escapes; they should be \253 and \273 (171 decimal is 253 octal, and 187 decimal is 273 octal)
| Code: |
string map {\253 {} \273 {}} $str
|
_________________ connection, sharing, dcc problems? click <here>
before asking for scripting help, read <this>
use [code] tag when posting logs, code |
|
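A note on the escapes discussed above: Tcl's backslash-digit escapes are octal and take at most three digits, so "\0171" is read as \017 followed by a literal 1, not as character 171. Below is a minimal sketch of deriving the octal form from a decimal code and stripping both characters with string map. The proc name strip_marks and the sample string are made up for illustration.

```tcl
# Decimal character codes as given in the question.
set codes {171 187}

# format %o prints the octal digits needed for a \ooo escape.
foreach code $codes {
    puts "decimal $code = octal [format %o $code]"
}

# Hypothetical helper: remove characters 171 and 187 from a string.
proc strip_marks {str} {
    # format %c turns a code number into the character itself,
    # so no hand-written escape is needed.
    return [string map [list [format %c 171] "" [format %c 187] ""] $str]
}

puts [strip_marks "\253TIME\273"]
```

Building the map with format %c avoids escape-counting mistakes altogether, at the cost of being slightly more verbose than the literal {\253 {} \273 {}} form.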
|
 |
bras Voice
Joined: 03 Feb 2006 Posts: 7
|
Posted: Mon Feb 06, 2006 9:09 am Post subject: |
|
|
Hi demond, thanks very much for taking the time to help me. You were right about the codes; however, I don't know why I don't seem to be able to strip them out. Here is what I'm doing:
| Code: |
bind pubm "m|m" *\00312TIME* dotime
proc replacevar {strin what withwhat} {
    set output $strin
    set replacement $withwhat
    set cutpos 0
    while { [string first $what $output] != -1 } {
        set cutstart [expr [string first $what $output] - 1]
        set cutstop [expr $cutstart + [string length $what] + 1]
        set output [string range $output 0 $cutstart]$replacement[string range $output $cutstop end]
    }
    return $output
}
proc dotime { nick host handle channel text } {
    set text [split $text]
    set time [lrange $text 5 end]
    set echo $time
    set echo [replacevar $echo "\253" ""]
    set echo [replacevar $echo "\273" ""]
    putserv "PRIVMSG #newsnet :$echo"
}
|
I don't know why the replacevar proc is not working for these characters. It has always worked for me. An example of the text I'm stripping them from would be:
In Rio de Janeiro : 23h 12m 30s TIME
What I want is only the time, which is not always in this format; that's why I'm trying to work with what is between : and »
Would you have any idea why I can't strip out « and »?
Thanks again! |
|
|
 |
demond Revered One

Joined: 12 Jun 2004 Posts: 3073 Location: San Francisco, CA
|
Posted: Mon Feb 06, 2006 11:54 pm Post subject: |
|
|
get rid of that [replacevar] proc; Tcl has a built-in command for replacing string(s) within a string, called [string map] (there is also [string replace], of course, but it doesn't suit what you need) |
|
|
 |
bras Voice
Joined: 03 Feb 2006 Posts: 7
|
Posted: Tue Feb 07, 2006 12:44 am Post subject: |
|
|
I tried string map too; it didn't work either.
| Quote: |
set data [string map {"\273" ""} $time]
|
It can remove everything else except those characters, and I can't understand why. Thanks anyway for your patience, demond. |
|
|
 |
demond Revered One

Joined: 12 Jun 2004 Posts: 3073 Location: San Francisco, CA
|
Posted: Tue Feb 07, 2006 1:19 am Post subject: |
|
|
really?
| Code: |
% set a foo\273bar
foo?bar
% string map {\273 {}} $a
foobar
|
|
 |
spock Master
Joined: 12 Dec 2002 Posts: 319
|
Posted: Tue Feb 07, 2006 1:23 am Post subject: |
|
|
try \xAB and \xBB
actually f*** that, if demond's suggestion doesn't work then mine won't either (PEBKAC) _________________ photon? |
|
|
 |
bras Voice
Joined: 03 Feb 2006 Posts: 7
|
Posted: Tue Feb 07, 2006 10:09 am Post subject: |
|
|
Yep... neither worked for me...
I found out that it's happening because there are color escapes near the characters I'm working with. It's not \003 though... are there any other ways to end a color escape besides \003 (and if so, which)? |
|
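On the question of what else can end a color: in the mIRC-style formatting convention, \017 (hex 0F) resets all formatting, including color, and \002, \026 and \037 toggle bold, reverse and underline. A rough sketch of stripping these before looking for other characters follows. The proc name strip_irc_codes is hypothetical, and the pattern only covers the common codes listed here.

```tcl
# Hypothetical helper: strip common mIRC-style formatting codes.
proc strip_irc_codes {str} {
    # \003 optionally followed by fg[,bg] colour numbers (1-2 digits each).
    regsub -all "\003(\[0-9\]{1,2}(,\[0-9\]{1,2})?)?" $str "" str
    # \002 bold, \017 reset (also ends colour), \026 reverse, \037 underline.
    return [string map [list \002 "" \017 "" \026 "" \037 ""] $str]
}

# Example: a colour start, the two marked characters, then a reset code.
puts [strip_irc_codes "\00312\253TIME\273\017 23h 12m 30s"]
```

Running this kind of cleanup first means a later string map for \253 and \273 only has to deal with plain text.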
|
 |
bras Voice
Joined: 03 Feb 2006 Posts: 7
|
Posted: Tue Feb 07, 2006 5:46 pm Post subject: |
|
|
| Just to show what I'm talking about... forgot about the image |
|
|
 |
demond Revered One

Joined: 12 Jun 2004 Posts: 3073 Location: San Francisco, CA
|
Posted: Tue Feb 07, 2006 11:43 pm Post subject: |
|
|
you simply don't know your codes
print them out with:
| Code: |
foreach c [split $str {}] {binary scan $c H2 x; putlog "$c \\x$x"}
|
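The one-liner above logs through Eggdrop's putlog. For trying the same idea outside the bot, here is a small sketch that returns the codes as a list instead. The proc name charcodes is made up, and it uses scan %c rather than binary scan so that it reports the full character code Tcl sees.

```tcl
# Hypothetical helper: return one \xNN-style code per character of str.
proc charcodes {str} {
    set out {}
    foreach c [split $str {}] {
        # scan %c yields the numeric code of the character.
        scan $c %c code
        lappend out [format "\\x%02X" $code]
    }
    return $out
}

# Example: a colour code, two digits, and the two marked characters.
puts [join [charcodes "\00312\253TIME\273"] " "]
```

Dumping the incoming text this way shows exactly which control codes or multi-byte sequences surround the characters that refuse to match.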
|
 |
awyeah Revered One

Joined: 26 Apr 2004 Posts: 1580 Location: Switzerland
|
Posted: Tue Jul 10, 2007 6:50 am Post subject: |
|
|
Actually, he's right. Today I worked on this, researched the topic for 2-3 hours, and tested on my bot.
The only codes that can be removed, stripped, or detected in a string or list are in the following range:
| Code: |
In octal: \300-\377
In hexadecimal: \xC0-\xFF
|
I tried everything from regexp and regsub to string map, but the codes in the range:
| Code: |
In octal: \200-\277
In hexadecimal: \x80-\xBF
|
were not detected in any way. I also performed some tests for this; here is one of them.
In this one I use the whole range, 128 characters as you can see, and for regexp matching I used \200-\277 and \300-\377. All of them should be detected, but only \300-\377 were.
| Code: |
<awyeah> .tcl string length ""
<adapter> Tcl: 128
<awyeah> !test ""
<adapter> Remaining: ""
|
I also used regsub to substitute, as well as string map; they gave me similar results.
So my conclusion, after spending the whole afternoon working on this, was that:
In the character range:
| Code: |
octal: \200-\277 and \300-\377
hexadecimal: \x80-\xFF
|
Only the range:
| Code: |
In octal: \300-\377
In hexadecimal: \xC0-\xFF
|
is detectable. _________________ ·awyeah·
==================================
Facebook: jawad@idsia.ch (Jay Dee)
PS: Guys, I don't accept script helps or requests personally anymore.
==================================
Last edited by awyeah on Tue Jul 10, 2007 8:25 pm; edited 1 time in total |
|
|
 |
awyeah Revered One

Joined: 26 Apr 2004 Posts: 1580 Location: Switzerland
|
Posted: Tue Jul 10, 2007 8:14 am Post subject: |
|
|
A follow-up to my previous post. For testing:
In the partyline I got this:
| Code: |
<awyeah> .tcl string map {"" "" "" "" "" "" "" "" "" "" "" ""} "werytyrtewretrwerwetertfg"
<adapter> Tcl: werytyrtewretrwerwetertfg
<awyeah> .tcl string match "**" "werytyrtewretrwerwetertfg"
<adapter> Tcl: 0
<awyeah> .tcl string match "**" "werytyrtewretrwerwetertfg"
<adapter> Tcl: 1
|
This indicates everything is working correctly in the partyline.
Now look at what happens when I load the Tcl script into the bot and then test.
For this proc (Tcl script loaded into the bot):
| Code: |
bind pub - !test testing
proc testing {n u h c t} {
    set i [string map {"\x8A" "" "\x8C" "" "\x8E" "" "\x9C" "" "\x9E" "" "\x9F" ""} $t]
    putserv "PRIVMSG #adapter :String map: $i"
    if {[string match -nocase "*\x8C*" $t] || [string match -nocase "*\x9E*" $t]} {
        putserv "PRIVMSG #adapter :Match found"
    } else {
        putserv "PRIVMSG #adapter :No match found"
    }
}
|
and for the same string, I got these results:
| Code: |
<awyeah> !test "werytyrtewretrwerwetertfg"
<adapter> String map: "werytyrtewretrwerwetertfg"
<adapter> No match found
|
This means there is definitely something wrong.
I also checked with this proc:
| Code: |
bind pub - !test testing
proc testing {n u h c t} {
    set i [string map {"" "" "" "" "" "" "" "" "" "" "" ""} $t]
    putserv "PRIVMSG #adapter :String map: $t"
    if {[string match -nocase "**" $t]} {
        putserv "PRIVMSG #adapter :Match found"
    } else {
        putserv "PRIVMSG #adapter :No match found"
    }
}
|
It also gave me the same result as above:
| Code: |
<awyeah> !test "werytyrtewretrwerwetertfg"
<adapter> String map: "werytyrtewretrwerwetertfg"
<adapter> No match found
|
Furthermore, as a conclusion from what I've read, there might be two identifiable problems in this case:
1) http://www.ascii.cl/htmlcodes.htm << this page lists the characters in the range \x80-\xBF (or \200-\277) as NOT defined in the HTML 4 standard
2) From: /eggdrop/docs/known-problems
| Quote: |
* High-bit characters are being filtered from channel names. This is a
fault of the Tcl interpreter, and not Eggdrop. The Tcl interpreter
filters the characters when it reads a file for interpreting. Update
your Tcl to version 8.1 or higher.
* Version 8.1 of Tcl doesn't support unicode characters, for example, .
If those characters are handled in a script as text, you run into errors.
Eggdrop can't handle these errors at the moment.
|
However, strange as it may seem, my shell provider has Tcl version 8.4, patched up to 8.4.11.
I think these two are the basic problems that make my aim unachievable. If anyone has anything to say or any comment regarding my conclusion, please follow up on my post.
Thanks,
JD |
|
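Since the quoted notes depend on the Tcl version, it can help to check what the interpreter itself reports before drawing conclusions. A minimal sketch using standard Tcl introspection commands is shown below; inside Eggdrop, putlog would take the place of puts, and the three encoding names checked are just the ones relevant to this thread.

```tcl
# Report the interpreter version and the encoding Tcl assumes by default.
puts "Tcl patchlevel: [info patchlevel]"
puts "System encoding: [encoding system]"

# Check that the encodings discussed in this thread are available.
foreach name {iso8859-1 cp1252 utf-8} {
    set ok [expr {[lsearch -exact [encoding names] $name] >= 0}]
    puts "$name available: $ok"
}
```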
|
 |
awyeah Revered One

Joined: 26 Apr 2004 Posts: 1580 Location: Switzerland
|
Posted: Wed Jul 11, 2007 3:02 am Post subject: |
|
|
Actually, I got it in fact. It's quite easy: I read up today about encodings for different character sets, and then tested some. The two main ones that can be used in this case are cp1252 and iso8859-1.
I tried cp1252 in the proc below; it did not strip the characters completely, stripping some and leaving some weird characters, as you can see in the output.
| Code: |
bind pub - !test testing
proc testing {n u h c t} {
    regsub -all {[\200-\377]} [encoding convertfrom cp1252 $t] {} a
    putserv "privmsg #adapter :CP1252: $a"
    regsub -all {[\200-\377]} [encoding convertfrom iso8859-1 $t] {} b
    putserv "privmsg #adapter :ISO8859-1: $b"
}
|
When I used iso8859-1, everything was stripped off completely, as I wanted; see the results below.
| Code: |
<awyeah> !test "dffdgdffgddsderyrtdfdfertdfseerftdstrydsrtsdfrtyrtdsffsddsfsddfsdtrysdfsdtytrrtjhmjhmmkhjrtmkhjk,hjh,kluihjkhjkuytiuyikwefsewrddssdfdfsffsfssdsdfsddstyfrtsdsdfsd"
<adapter> CP1552: "dffdgdffg&d dsderyrt!df`9R}dfertdfse""a:Serft~dsxtrydsrtsdfrtyrtdsffsddsfsddfsdtrysdfsdtytrrtjhmjhmmkhjrtmkhjk,hjh,kluihjkhjkuytiuyikwefsewrddssdfdfsffsfssdsdfsddstyfrtsdsdfsd"
<adapter> ISO8859-1: "dffdgdffgddsderyrtdfdfertdfseerftdstrydsrtsdfrtyrtdsffsddsfsddfsdtrysdfsdtytrrtjhmjhmmkhjrtmkhjk,hjh,kluihjkhjkuytiuyikwefsewrddssdfdfsffsfssdsdfsddstyfrtsdsdfsd"
|
Hence, to be able to handle the complete range \200-\377 (or \x80-\xFF), you need to decode the text inside the proc with encoding convertfrom iso8859-1.
Mission successful! |
|
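One plausible reason for the leftover characters in the cp1252 run: cp1252 maps several bytes in the 0x80-0x9F block to code points above U+00FF (byte 0x8A becomes U+0160, for instance), so a \200-\377 range no longer covers them, while iso8859-1 maps every byte to the code point of the same value. This is an illustrative sketch with a single byte, not a reproduction of the test above.

```tcl
# Byte 0x8A (octal 212), one of the characters used in the earlier tests.
set byte "\212"

foreach enc {cp1252 iso8859-1} {
    set decoded [encoding convertfrom $enc $byte]
    scan $decoded %c code
    # Build the bracket range from literal characters U+0080..U+00FF.
    regsub -all "\[\u0080-\u00FF\]" $decoded "" stripped
    puts [format "%-10s U+%04X  left after regsub: %d char(s)" \
        $enc $code [string length $stripped]]
}
```

If cp1252 really is the right encoding for the incoming text, the range in the pattern would need to include those higher code points as well.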
|
 |
|