egghelp.org community
Discussion of eggdrop bots, shell accounts and tcl scripts.
 

Getting eggdrop ready for UTF-8
De Kus
Revered One


Joined: 15 Dec 2002
Posts: 1361
Location: Germany

Posted: Thu Aug 23, 2007 5:15 am    Post subject: Getting eggdrop ready for UTF-8

This has been discussed often, and every topic I found ended with some kind of workaround, such as downgrading to TCL 8.3. However, I want to get it done with TCL 8.4 (or TCL 8.5 if that would end the hassle, though I don't really want to install it on a production server yet) and the latest eggdrop version, 1.6.18 or 1.6.19 CVS.

The first thing you notice when it comes to UTF-8: something isn't right. I use de_DE.utf8 as the environment locale, and the config, encoded in UTF-8, is read fine. So far so good. But when it comes to actual server traffic, something between input and output gets out of sync. The bot joins a channel whose bytes correspond to a UTF-8 to ISO-8859-1 conversion of the name in the config. If I want the bot to join a UTF-8 channel name, I simply double-encode it to UTF-8 in the config. This works fine for joining the channel: the bot is there and ChanServ gives it op, but:
It thinks it's not on the channel and tries to join it every minute.
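
For illustration, here is a minimal sketch in C of what "double encoding" means in the workaround above. It is not eggdrop code: the function name `latin1_to_utf8` is made up for this example, and it only assumes that each input byte is treated as one ISO-8859-1 character and re-encoded as UTF-8.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper (not part of eggdrop): re-encode a buffer as UTF-8
 * while treating every input byte as one ISO-8859-1 character. Feeding it
 * text that is already UTF-8 produces the "double-encoded" form.
 * Returns the number of bytes written (not counting the terminating NUL),
 * or 0 if the output buffer is too small.
 */
size_t latin1_to_utf8(const unsigned char *in, size_t in_len,
                      unsigned char *out, size_t out_size)
{
    size_t o = 0;

    assert(in != NULL && out != NULL);
    if (out_size == 0)
        return 0;
    for (size_t i = 0; i < in_len; i++) {
        if (in[i] < 0x80) {
            /* ASCII is copied unchanged; keep one byte free for the NUL. */
            if (o + 1 >= out_size)
                return 0;
            out[o++] = in[i];
        } else {
            /* U+0080..U+00FF become a two-byte UTF-8 sequence. */
            if (o + 2 >= out_size)
                return 0;
            out[o++] = (unsigned char) (0xC0 | (in[i] >> 6));
            out[o++] = (unsigned char) (0x80 | (in[i] & 0x3F));
        }
    }
    out[o] = '\0';
    return o;
}
```

With this sketch, the UTF-8 bytes of "ü" (C3 BC) turn into C3 83 C2 BC. If the bot then collapses each character back to a single byte on output, the server receives C3 BC again, which would explain why the double-encoded name joins the right channel.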

I tried to fix that by changing some of the iso8859-1 strings in tcl.c near line 660 (except for the get_encoding part) to UTF-8, but I have the impression that it changed nothing, because the bot was already using UTF-8 from the environment.

So my main question: can anyone help me tweak eggdrop so that it reads and writes the same bytes from and to the server? I don't care about internal encodings. Since the bot is meant for German usage only, I wouldn't mind conversion to an ANSI encoding either (though I'd prefer Windows-1252 to ISO-8859-1, because the latter doesn't support 4-byte variants and converts them to ?). I also wouldn't mind if there were no conversion at all (everything read and written as raw bytes).
_________________
De Kus
StarZ|De_Kus, De_Kus or DeKus on IRC
Copyright 2005-2009 by De Kus - published under The MIT License
Love hurts, love strengthens...
8an
Voice


Joined: 27 Dec 2007
Posts: 1

Posted: Thu Dec 27, 2007 11:03 am    Post subject: Patch

The problem wasn't in the TCL locale, but in the conversion from a TCL object back to a C string. This patch should fix it, but it may break something else:
Code:
--- eggdrop1.6.18.orig/src/tcl.c   2006-03-28 04:35:50.000000000 +0200
+++ eggdrop1.6.18/src/tcl.c   2007-12-26 23:52:50.000000000 +0100
@@ -339,7 +339,7 @@
   utftot += sizeof(char *) * objc;
   objc -= 5;
   for (i = 0; i < objc; i++) {
-    byteptr = (char *) Tcl_GetByteArrayFromObj(objv[i], &len);
+    byteptr = (char *) Tcl_GetStringFromObj(objv[i], &len);
     strings[i] = (char *) nmalloc(len + 1);
     utftot += len + 1;
     strncpy(strings[i], byteptr, len);
djevrek
Voice


Joined: 31 Jul 2007
Posts: 11

Posted: Thu Feb 21, 2008 6:47 am

Doesn't work for me.

I try to make my bot join #Србија and it joins #!@18X0 instead. When I create another channel, apply all the settings to it, and have the server redirect users from that new channel to #Србија, the bot still doesn't work: it joins, but none of the settings or anything else work. I also tried the patch at http://www.egghelp.org/files/patches/eggdrop1.6.18-sp.0007.desc, with the same result, plus a different error when starting the channel module.
thommey
Halfop


Joined: 01 Apr 2008
Posts: 73

Posted: Tue Apr 01, 2008 3:20 pm

Hi,

Interesting topic. I tested the patch below and I could bind on UTF-8 commands and join UTF-8 channels (I didn't have much time to test further).

BE AWARE: This patch FORCES UTF-8 support, so only apply it if your system supports it :) [This patch overrides eggdrop's own mechanism for detecting which encoding it should use based on environment variables (LC_ALL,...). So this is totally a HACK and nothing to put into production code, for compatibility reasons.]
Code:

--- eggdrop1.6.18.original/src/main.h   2006-03-28 04:35:50.000000000 +0200
+++ eggdrop1.6.18.utf8/src/main.h       2008-04-01 20:57:29.000000000 +0200
@@ -44,7 +44,7 @@
 #endif

 #if (((TCL_MAJOR_VERSION == 8) && (TCL_MINOR_VERSION >= 1)) || (TCL_MAJOR_VERSION > 8))
-#  define USE_TCL_BYTE_ARRAYS
+#  undef USE_TCL_BYTE_ARRAYS
 #  define USE_TCL_ENCODING
 #endif

diff -ur eggdrop1.6.18.original/src/tcl.c eggdrop1.6.18.utf8/src/tcl.c
--- eggdrop1.6.18.original/src/tcl.c    2006-03-28 04:35:50.000000000 +0200
+++ eggdrop1.6.18.utf8/src/tcl.c        2008-04-01 20:55:48.000000000 +0200
@@ -650,7 +650,7 @@
   if (encoding == NULL) {
     encoding = "iso8859-1";
   }
-
+  encoding = "utf-8";
   Tcl_SetSystemEncoding(NULL, encoding);


PS: Please tell me if it worked or not :)
incith
Master


Joined: 23 Apr 2005
Posts: 275
Location: Canada

Posted: Mon Jun 16, 2008 12:32 pm

This works; the bot can output UTF-8 properly now. I am posting this right away, without any further testing. I ran a !weather and the output still works.
_________________
; Answer a few unanswered posts!
De Kus
Revered One


Joined: 15 Dec 2002
Posts: 1361
Location: Germany

Posted: Mon Jun 16, 2008 12:35 pm

Well, my problem wasn't the UTF-8 output. I didn't test the hack, but since my bot's TCL encoding is already UTF-8, I am pretty sure it wouldn't change a thing.
incith
Master


Joined: 23 Apr 2005
Posts: 275
Location: Canada

Posted: Mon Jun 16, 2008 1:27 pm

Simply forcing the encoding to UTF-8 did not fix the output for me. I had to undef the USE_TCL_BYTE_ARRAYS line.

This is bizarre, since I am on Tcl 8.4.

Oops, never mind: the check is TCL_MINOR_VERSION >= 1, so 8.4 is covered.
thommey
Halfop


Joined: 01 Apr 2008
Posts: 73

Posted: Mon Jun 16, 2008 1:32 pm

De Kus, forcing the TCL encoding to UTF-8 is not the important part of the patch; the other change is. And as it seems to work for 2 users now (including me), it's worth a try, isn't it? As incith mentioned, the key is undefining USE_TCL_BYTE_ARRAYS. That's the "clean" way of making eggdrop use GetStringFromObj instead of GetByteArrayFromObj, which other users already found to be the source of the problem.
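
To illustrate why the byte-array path mangles names, here is a minimal sketch in plain C. It assumes (as I understand the Tcl 8.x behaviour) that converting a string object to a byte array keeps only the low 8 bits of each Unicode character. It does not call Tcl at all; `low_byte_truncate` is a made-up name that only simulates that truncation.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical simulation (not Tcl or eggdrop code): decode UTF-8 and keep
 * only the low 8 bits of every code point. Handles 1- to 3-byte sequences;
 * anything else (4-byte sequences, malformed input) becomes '?'.
 * Returns the number of bytes written, not counting the terminating NUL.
 */
size_t low_byte_truncate(const char *utf8, char *out, size_t out_size)
{
    const unsigned char *p = (const unsigned char *) utf8;
    size_t o = 0;

    assert(utf8 != NULL && out != NULL);
    if (out_size == 0)
        return 0;
    while (*p != '\0' && o + 1 < out_size) {
        unsigned long cp;

        if (p[0] < 0x80) {
            /* One byte: plain ASCII. */
            cp = p[0];
            p += 1;
        } else if ((p[0] & 0xE0) == 0xC0 && (p[1] & 0xC0) == 0x80) {
            /* Two bytes: U+0080..U+07FF (covers Latin-1 and Cyrillic). */
            cp = ((unsigned long) (p[0] & 0x1F) << 6) | (p[1] & 0x3F);
            p += 2;
        } else if ((p[0] & 0xF0) == 0xE0 && (p[1] & 0xC0) == 0x80 &&
                   (p[2] & 0xC0) == 0x80) {
            /* Three bytes: U+0800..U+FFFF. */
            cp = ((unsigned long) (p[0] & 0x0F) << 12) |
                 ((unsigned long) (p[1] & 0x3F) << 6) | (p[2] & 0x3F);
            p += 3;
        } else {
            /* Not handled in this sketch. */
            cp = '?';
            p += 1;
        }
        /* The lossy step: everything above bit 7 is thrown away. */
        out[o++] = (char) (cp & 0xFF);
    }
    out[o] = '\0';
    return o;
}
```

Calling `low_byte_truncate` on the UTF-8 bytes of #Србија yields #!@18X0 (С is U+0421, whose low byte 0x21 is "!", and so on), which is exactly the channel name djevrek's bot ended up in. Characters up to U+00FF survive as their ISO-8859-1 byte, which would also fit the UTF-8 to ISO-8859-1 behaviour De Kus described.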


PS: Thanks for the feedback :)
MellowB
Voice


Joined: 23 Jan 2008
Posts: 24
Location: Germany

Posted: Thu Jul 17, 2008 2:21 pm

Yep, can confirm that this is working. My eggdrop (1.6.19) is accepting and outputting UTF-8 correctly now, at least if the script that's used supports this.
Unfortunately most scripts, like the modded version of incith's google tcl by speechles, do not, since they use their own workarounds and thus break it again. (It works semi-fine with an unpatched bot and all the workarounds in the script, but still not perfectly, so using this patch would be much better.)

So yeah, thanks for the tip there thommey, this sure could be helpful in the future!
_________________
On the keyboard of life, always keep one finger on the ESC key.
moff
Voice


Joined: 24 Jul 2008
Posts: 27

Posted: Thu Jul 24, 2008 9:26 pm

OK, sorry guys, I'm new to this...
Eggdrop is compiled and runs fine, but how do I install the UTF-8 patch/hack?

thanks!
moff
Voice


Joined: 24 Jul 2008
Posts: 27

Posted: Fri Jul 25, 2008 8:31 pm

Code:
moff@HAL-9000:~/eggsource/eggdrop1.6.19$ patch -p0 < utf8patch.patch
can't find file to patch at input line 4
Perhaps you used the wrong -p or --strip option?
The text leading up to this was:
--------------------------
|
|--- eggdrop1.6.19.original/src/main.h   2006-03-28 04:35:50.000000000 +0200
|+++ eggdrop1.6.19.utf8/src/main.h       2008-04-01 20:57:29.000000000 +0200
--------------------------
File to patch:
moff@HAL-9000:~/eggsource/eggdrop1.6.19$ patch -p1 < utf8patch.patch
patching file src/main.h
patching file src/tcl.c
Hunk #1 FAILED at 650.
1 out of 1 hunk FAILED -- saving rejects to file src/tcl.c.rej
moff@HAL-9000:~/eggsource/eggdrop1.6.19$


I get these errors... can someone help me please?
De Kus
Revered One


Joined: 15 Dec 2002
Posts: 1361
Location: Germany

Posted: Mon Aug 04, 2008 11:05 am

I know I am a little late with feedback, but the project was frozen for a while, so I was only able to confirm it just now.

And yes, it also fixed the issue with the bot being unable to "listen" to a UTF-8 channel. I am truly amazed that such a simple thing can fix such a troublesome issue. The only thing that doesn't seem possible is to enter both the UTF-8 and the ISO-8859-1 name; at least for me, it seems to ignore the ISO one.

PS: I only modified main.h and skipped the change in tcl.c. I should mention that I put "export LANG=de_DE.utf8" in the .bashrc, so the locale of the environment was already UTF-8.

moff wrote:
I get these errors... can someone help me please?

You should be fine, since the important change in main.h was applied without errors. Just make sure that your bot runs in a shell with a UTF-8 enabled environment.
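
If you want to check that from code rather than by eye, here is a minimal sketch in C, assuming a POSIX system. The two function names are made up for this example and are not part of eggdrop; `setlocale` and `nl_langinfo(CODESET)` are the standard calls it relies on.

```c
#include <assert.h>
#include <langinfo.h>
#include <locale.h>
#include <strings.h>

/* Return 1 if the codeset name denotes UTF-8 ("UTF-8" or "utf8", any case). */
int is_utf8_codeset(const char *codeset)
{
    assert(codeset != NULL);
    return strcasecmp(codeset, "UTF-8") == 0 ||
           strcasecmp(codeset, "utf8") == 0;
}

/*
 * Return 1 if the character type locale taken from the environment
 * (LC_ALL, LC_CTYPE, LANG) uses UTF-8, else 0.
 * Note: this switches the calling process's LC_CTYPE locale.
 */
int environment_is_utf8(void)
{
    if (setlocale(LC_CTYPE, "") == NULL)   /* unknown or uninstalled locale */
        return 0;
    return is_utf8_codeset(nl_langinfo(CODESET));
}
```

Run in the same shell that starts the bot, `environment_is_utf8()` should return 1 once something like "export LANG=de_DE.utf8" is in effect and that locale is actually installed on the system.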
speechles
Revered One


Joined: 26 Aug 2006
Posts: 1398
Location: emerald triangle, california (coastal redwoods)

Posted: Mon Sep 28, 2009 1:31 am

MellowB wrote:
Yep, can confirm that this is working. My eggdrop (1.6.19) is accepting and outputting UTF-8 correctly now, at least if the script that's used supports this.
Unfortunately most scripts, like the modded version of incith's google tcl by speechles, do not, since they use their own workarounds and thus break it again. (It works semi-fine with an unpatched bot and all the workarounds in the script, but still not perfectly, so using this patch would be much better.)

So yeah, thanks for the tip there thommey, this sure could be helpful in the future!


The future is now! Well, at least it is for the script mentioned above. Lately some development time has been found, and that investment of time has led us to where we are today. See here for details, but suffice it to say that the modded version of incith google I've provided does in fact now fully support this patch method. So to all those using this script who want a truly multi-language, UTF-8 compliant script with perfect rendering of every character in both input and output: I can now safely suggest you rush to patch your bots. Enjoy ;)
_________________
speechles' eggdrop tcl archive
De Kus
Revered One


Joined: 15 Dec 2002
Posts: 1361
Location: Germany

Posted: Mon Sep 28, 2009 1:52 pm

Are you aware that the real problem has never been the messages going in and out, but the channel/user names?! I see only multilingual messages, but no channel names. Or am I just not looking closely enough?
shadrach
Halfop


Joined: 14 Dec 2007
Posts: 74

Posted: Mon Sep 28, 2009 2:04 pm

Trying to patch eggdrop to utf8. Can someone tell me what the problem in the execution is here?

Code:
[*******@liberty (~/eggdrop1.6.19)]$ patch -p1 < utf8patch.patch
Hmm...  Looks like a unified diff to me...
The text leading up to this was:
--------------------------
|
|--- eggdrop1.6.19.original/src/main.h   2006-03-28 04:35:50.000000000 +0200
|+++ eggdrop1.6.19.utf8/src/main.h       2008-04-01 20:57:29.000000000 +0200
--------------------------
Patching file src/main.h using Plan A...
Hunk #1 succeeded at 44.
Hmm...  The next patch looks like a unified diff to me...
The text leading up to this was:
--------------------------
|diff -ur eggdrop1.6.19.original/src/tcl.c eggdrop1.6.19.utf8/src/tcl.c
|--- eggdrop1.6.19.original/src/tcl.c    2006-03-28 04:35:50.000000000 +0200
|+++ eggdrop1.6.19.utf8/src/tcl.c        2008-04-01 20:55:48.000000000 +0200
--------------------------
Patching file src/tcl.c using Plan A...
Hunk #1 failed at 650.
1 out of 1 hunks failed--saving rejects to src/tcl.c.rej
done
[*******@liberty (~/eggdrop1.6.19)]$
All times are GMT - 4 Hours
Page 1 of 2

 