thommey Halfop
Joined: 01 Apr 2008 Posts: 73
|
Posted: Tue Apr 01, 2008 3:20 pm Post subject: |
|
|
Hi,
interesting topic. I tested this patch and could bind on UTF-8 commands and join UTF-8 channels (I didn't have much time to test further).
BE AWARE: This patch FORCES UTF-8 support, so only apply it if your system supports it :) [The patch overrides eggdrop's own mechanism for detecting which encoding to use from environment variables (LC_ALL, ...). So this is totally a HACK and, for compatibility reasons, nothing to put into production code.]
| Code: |
--- eggdrop1.6.18.original/src/main.h 2006-03-28 04:35:50.000000000 +0200
+++ eggdrop1.6.18.utf8/src/main.h 2008-04-01 20:57:29.000000000 +0200
@@ -44,7 +44,7 @@
#endif
#if (((TCL_MAJOR_VERSION == 8) && (TCL_MINOR_VERSION >= 1)) || (TCL_MAJOR_VERSION > 8))
-# define USE_TCL_BYTE_ARRAYS
+# undef USE_TCL_BYTE_ARRAYS
# define USE_TCL_ENCODING
#endif
diff -ur eggdrop1.6.18.original/src/tcl.c eggdrop1.6.18.utf8/src/tcl.c
--- eggdrop1.6.18.original/src/tcl.c 2006-03-28 04:35:50.000000000 +0200
+++ eggdrop1.6.18.utf8/src/tcl.c 2008-04-01 20:55:48.000000000 +0200
@@ -650,7 +650,7 @@
if (encoding == NULL) {
encoding = "iso8859-1";
}
-
+ encoding = "utf-8";
Tcl_SetSystemEncoding(NULL, encoding);
|
PS: Please tell me if it worked or not :) |
|
incith Master

Joined: 23 Apr 2005 Posts: 275 Location: Canada
|
Posted: Mon Jun 16, 2008 12:32 pm Post subject: |
|
|
This works; the bot can output UTF-8 properly now. I am posting this right away, without any further testing. I ran a !weather and the output still works. _________________ ; Answer a few unanswered posts! |
|
De Kus Revered One

Joined: 15 Dec 2002 Posts: 1361 Location: Germany
|
Posted: Mon Jun 16, 2008 12:35 pm Post subject: |
|
|
Well, my problem wasn't the UTF-8 output. I didn't test the hack, but since my bot's Tcl encoding is already UTF-8, I am pretty sure it wouldn't change a thing. _________________ De Kus
StarZ|De_Kus, De_Kus or DeKus on IRC
Copyright © 2005-2009 by De Kus - published under The MIT License
Love hurts, love strengthens... |
|
incith Master

Joined: 23 Apr 2005 Posts: 275 Location: Canada
|
Posted: Mon Jun 16, 2008 1:27 pm Post subject: |
|
|
Simply forcing the encoding to UTF-8 did not fix the output for me. I had to undef the USE_TCL_BYTE_ARRAYS line.
This seemed bizarre, since I am on Tcl 8.4.
Oops, never mind: the check is TCL_MINOR_VERSION >= 1, so it covers 8.4 as well. |
|
thommey Halfop
Joined: 01 Apr 2008 Posts: 73
|
Posted: Mon Jun 16, 2008 1:32 pm Post subject: |
|
|
De Kus, forcing the Tcl encoding to UTF-8 is not the important part there; the other change is. And as it seems to work for two users now (including me), it's worth a try, isn't it? As incith mentioned, the key is undefining USE_TCL_BYTE_ARRAYS. That's the "clean" solution: it makes eggdrop use GetStringFromObj instead of GetByteArrayFromObj, which other users had already found to be the source of the problem.
PS: Thanks for the feedback |
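To illustrate why the choice between a byte-array view and a string view matters, here is a minimal sketch in plain C. It is not eggdrop or Tcl code: the helper names `to_low_bytes` and `to_utf8` are made up for this example, and it rests on the assumption (my reading of the Tcl 8.x byte-array behaviour, not something confirmed in this thread) that the byte-array accessor keeps only the low 8 bits of each character, while the string accessor hands back UTF-8.

```c
#include <stddef.h>
#include <stdint.h>

/* Byte-array view (assumed behaviour): keep only the low 8 bits of
 * each code point. Writes one byte per code point, returns the count. */
size_t to_low_bytes(const uint32_t *cp, size_t n, unsigned char *out)
{
    for (size_t i = 0; i < n; i++)
        out[i] = (unsigned char)(cp[i] & 0xFF);
    return n;
}

/* String view: encode each code point as UTF-8 (1 to 4 bytes).
 * out must have room for 4 * n bytes. Returns the bytes written.
 * No validation (surrogates, range) is done in this sketch. */
size_t to_utf8(const uint32_t *cp, size_t n, unsigned char *out)
{
    size_t len = 0;
    for (size_t i = 0; i < n; i++) {
        uint32_t c = cp[i];
        if (c < 0x80) {
            out[len++] = (unsigned char)c;
        } else if (c < 0x800) {
            out[len++] = (unsigned char)(0xC0 | (c >> 6));
            out[len++] = (unsigned char)(0x80 | (c & 0x3F));
        } else if (c < 0x10000) {
            out[len++] = (unsigned char)(0xE0 | (c >> 12));
            out[len++] = (unsigned char)(0x80 | ((c >> 6) & 0x3F));
            out[len++] = (unsigned char)(0x80 | (c & 0x3F));
        } else {
            out[len++] = (unsigned char)(0xF0 | (c >> 18));
            out[len++] = (unsigned char)(0x80 | ((c >> 12) & 0x3F));
            out[len++] = (unsigned char)(0x80 | ((c >> 6) & 0x3F));
            out[len++] = (unsigned char)(0x80 | (c & 0x3F));
        }
    }
    return len;
}
```

For the Japanese character U+3042, `to_low_bytes` yields the single byte 0x42 (an ASCII "B"), while `to_utf8` yields the three bytes E3 81 82. Under the stated assumption, that loss of the high bits is the kind of damage the byte-array path would do to anything outside the 8-bit range.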
|
MellowB Voice
Joined: 23 Jan 2008 Posts: 24 Location: Germany
|
Posted: Thu Jul 17, 2008 2:21 pm Post subject: |
|
|
Yep, I can confirm that this is working. My eggdrop (1.6.19) now accepts and outputs UTF-8 correctly, at least if the script in use supports it.
Unfortunately most scripts, like the modded version of incith's google tcl by speechles, do not, since they use their own workarounds and thus break it again. (It works semi-fine with an unpatched bot and all the workarounds in the script, but still not perfectly, so using this patch would be much better.)
So yeah, thanks for the tip, thommey; this sure could be helpful in the future! _________________ On the keyboard of life, always keep one finger on the ESC key. |
|
moff Voice
Joined: 24 Jul 2008 Posts: 27
|
Posted: Thu Jul 24, 2008 9:26 pm Post subject: |
|
|
OK, sorry guys, I'm new to this...
eggdrop is compiled and runs fine, but how do I install the UTF-8 patch/hack?
Thanks! |
|
moff Voice
Joined: 24 Jul 2008 Posts: 27
|
Posted: Fri Jul 25, 2008 8:31 pm Post subject: |
|
|
| Code: |
moff@HAL-9000:~/eggsource/eggdrop1.6.19$ patch -p0 < utf8patch.patch
can't find file to patch at input line 4
Perhaps you used the wrong -p or --strip option?
The text leading up to this was:
--------------------------
|
|--- eggdrop1.6.19.original/src/main.h 2006-03-28 04:35:50.000000000 +0200
|+++ eggdrop1.6.19.utf8/src/main.h 2008-04-01 20:57:29.000000000 +0200
--------------------------
File to patch:
moff@HAL-9000:~/eggsource/eggdrop1.6.19$ patch -p1 < utf8patch.patch
patching file src/main.h
patching file src/tcl.c
Hunk #1 FAILED at 650.
1 out of 1 hunk FAILED -- saving rejects to file src/tcl.c.rej
moff@HAL-9000:~/eggsource/eggdrop1.6.19$
|
I get these errors... can someone help me, please? |
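The transcript above shows `-p0` failing to find the file while `-p1` finds `src/main.h`. The `-pN` option of patch strips the smallest prefix containing N leading slashes from each file name in the diff, so from inside the source directory the leading `eggdrop1.6.19.original/` component has to go. A small sketch of that path handling in C follows; `strip_components` is a hypothetical helper written for this illustration and is not taken from the patch sources.

```c
#include <stddef.h>

/* Return a pointer into `path` just past its first `n` leading
 * components, mimicking what `patch -pN` does to file names.
 * A run of adjacent slashes counts as a single slash.
 * Returning NULL when the path has fewer than `n` slashes is a
 * choice made for this sketch only. */
const char *strip_components(const char *path, int n)
{
    const char *p = path;
    while (n > 0) {
        while (*p != '\0' && *p != '/')
            p++;                /* skip one component */
        if (*p == '\0')
            return NULL;        /* ran out of slashes */
        while (*p == '/')
            p++;                /* skip the slash run */
        n--;
    }
    return p;
}
```

With n = 0 the name `eggdrop1.6.19.original/src/main.h` is returned unchanged, which does not exist relative to the source directory; with n = 1 it becomes `src/main.h`. This only covers the file-name lookup; it says nothing about why the tcl.c hunk was rejected.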
|
De Kus Revered One

Joined: 15 Dec 2002 Posts: 1361 Location: Germany
|
Posted: Mon Aug 04, 2008 11:05 am Post subject: |
|
|
I know I am a little late with feedback, but the project was frozen for a while, so I was only able to confirm it just now.
And yeah, it also fixed the issue with the bot being unable to "listen" to a UTF-8 channel. I am truly amazed that such a simple thing can fix such a troublesome issue. The only thing that does not seem possible is to enter both the UTF-8 and the ISO-8859-1 name; at least it seems to ignore the ISO one for me.
PS: I only modified main.h and skipped the change in tcl.c. I should mention that I put "export LANG=de_DE.utf8" in my .bashrc, so the locale of the environment was already UTF-8.
| moff wrote: | | I get these errors... can someone help me, please? |
You should be fine, since the important change in main.h was applied without errors. Just make sure that your bot runs in a shell with a UTF-8 enabled environment. |
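One way to check whether the environment a process starts in is UTF-8 enabled is to adopt the locale from the environment and ask for its character set. The sketch below uses the standard `setlocale` and POSIX `nl_langinfo(CODESET)` calls; the helper names `codeset_is_utf8` and `environment_is_utf8` are invented for this example, and this is not the detection code that eggdrop or Tcl actually use.

```c
#include <ctype.h>
#include <langinfo.h>
#include <locale.h>
#include <stddef.h>

/* Case-insensitive match against "utf8", ignoring hyphens, so that
 * both "UTF-8" and "utf8" are accepted. Returns 1 on match, else 0. */
int codeset_is_utf8(const char *codeset)
{
    static const char want[] = "utf8";
    size_t j = 0;

    if (codeset == NULL)
        return 0;
    for (size_t i = 0; codeset[i] != '\0'; i++) {
        if (codeset[i] == '-')
            continue;                       /* "UTF-8" vs "utf8" */
        if (want[j] == '\0')
            return 0;                       /* trailing extra characters */
        if (tolower((unsigned char)codeset[i]) != want[j])
            return 0;
        j++;
    }
    return want[j] == '\0';
}

/* Adopt the character-type locale from the environment
 * (LC_ALL, LC_CTYPE, LANG) and report whether it is UTF-8. */
int environment_is_utf8(void)
{
    if (setlocale(LC_CTYPE, "") == NULL)
        return 0;                           /* requested locale unavailable */
    return codeset_is_utf8(nl_langinfo(CODESET));
}
```

Calling `environment_is_utf8()` from a small test program started in the same shell as the bot gives a quick answer; with `LANG=de_DE.utf8` exported and that locale installed, one would expect it to report a UTF-8 codeset.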
|
speechles Revered One

Joined: 26 Aug 2006 Posts: 1398 Location: emerald triangle, california (coastal redwoods)
|
Posted: Mon Sep 28, 2009 1:31 am Post subject: |
|
|
| MellowB wrote: | Yep, I can confirm that this is working. My eggdrop (1.6.19) now accepts and outputs UTF-8 correctly, at least if the script in use supports it.
Unfortunately most scripts, like the modded version of incith's google tcl by speechles, do not, since they use their own workarounds and thus break it again. (It works semi-fine with an unpatched bot and all the workarounds in the script, but still not perfectly, so using this patch would be much better.)
So yeah, thanks for the tip, thommey; this sure could be helpful in the future! |
The future is now! Well, at least it is for the script mentioned above. Lately some development time has been found, and that investment of time has now led us to where we are today. See here for details, but suffice it to say that the modded version of incith google I've provided does in fact now fully support this patch method. So to all those using this script and wanting a truly multi-language, UTF-8 compliant script with perfect rendering of every character in both input and output: I can now safely suggest you rush to patch your bots. Enjoy _________________ speechles' eggdrop tcl archive |
|
De Kus Revered One

Joined: 15 Dec 2002 Posts: 1361 Location: Germany
|
Posted: Mon Sep 28, 2009 1:52 pm Post subject: |
|
|
Are you aware that the real problem has never been the messages going in and out, but the channel/user names?! I see only multilingual messages, but no channel names. Or am I just not looking closely enough? |
|
shadrach Halfop
Joined: 14 Dec 2007 Posts: 74
|
Posted: Mon Sep 28, 2009 2:04 pm Post subject: |
|
|
I'm trying to patch eggdrop for UTF-8. Can someone tell me what is going wrong here?
| Code: |
[*******@liberty (~/eggdrop1.6.19)]$ patch -p1 < utf8patch.patch
Hmm... Looks like a unified diff to me...
The text leading up to this was:
--------------------------
|
|--- eggdrop1.6.19.original/src/main.h 2006-03-28 04:35:50.000000000 +0200
|+++ eggdrop1.6.19.utf8/src/main.h 2008-04-01 20:57:29.000000000 +0200
--------------------------
Patching file src/main.h using Plan A...
Hunk #1 succeeded at 44.
Hmm... The next patch looks like a unified diff to me...
The text leading up to this was:
--------------------------
|diff -ur eggdrop1.6.19.original/src/tcl.c eggdrop1.6.19.utf8/src/tcl.c
|--- eggdrop1.6.19.original/src/tcl.c 2006-03-28 04:35:50.000000000 +0200
|+++ eggdrop1.6.19.utf8/src/tcl.c 2008-04-01 20:55:48.000000000 +0200
--------------------------
Patching file src/tcl.c using Plan A...
Hunk #1 failed at 650.
1 out of 1 hunks failed--saving rejects to src/tcl.c.rej
done
[*******@liberty (~/eggdrop1.6.19)]$
|
|
|
Anahel Halfop

Joined: 03 Jul 2009 Posts: 48 Location: Dom!
|
Posted: Mon Sep 28, 2009 4:00 pm Post subject: |
|
|
I had the same problem: it patched only main.h, so I manually edited tcl.c. You need to add
| Code: |
  encoding = "utf-8";
|
after this:
| Code: |
  if (encoding == NULL) {
    encoding = "iso8859-1";
  }
|
so it should look like this:
| Code: |
  if (encoding == NULL) {
    encoding = "iso8859-1";
  }
  encoding = "utf-8";
|
|
speechles Revered One

Joined: 26 Aug 2006 Posts: 1398 Location: emerald triangle, california (coastal redwoods)
|
Posted: Mon Sep 28, 2009 5:11 pm Post subject: |
|
|
| De Kus wrote: | | Are you aware that the real problem has never been the messages going in and out, but the channel/user names?! I see only multilingual messages, but no channel names. Or am I just not looking closely enough? |
You are correct in that it isn't that hard to get correct output for UTF-8, depending on how the string is manipulated. If any elements within the string are replaced with strings in any other encoding, those will break the UTF-8 byte sequences. The rest of the string beyond that point will be shown as iso8859-1 (meaning each byte is rendered on its own, rather than being combined into sequences properly).
Input has always been affected for me. I'm surprised you haven't experienced it yet. This is the same reason you cannot join a UTF-8 channel and instead get the incorrect iso8859-1 encoding used. The same thing happens when trying to read a user's input from within a bind. It seems that for UTF-8 any type of input fails (to see it fail, try nesting two languages in that UTF-8: English and Japanese, or Russian and French. Using just one makes it too easy). There are ways to work around this, but they will still fail when dealing with accented vowels, the same way eggdrop's output does for some when dealing with accented vowels (most times they use an elaborate string map to fix this condition, see for yourself). Myself, I've noticed that the Ã (byte value 195) confuses the UTF-8 string and breaks it back to iso8859-1 encoding. This happens when trying to render French accented sentences in UTF-8 on an unpatched bot.
Plus, this finally puts to rest the requests for better UTF-8 support within the script. So I felt it was worth mentioning ;P |
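The byte 195 observation can be reproduced outside the bot. In UTF-8, é (U+00E9) is the two bytes C3 A9, and C3 is decimal 195, which is Ã when read as iso8859-1. The sketch below, with a helper name `latin1_to_utf8` chosen for this example only, treats every input byte as one iso8859-1 character and re-encodes it as UTF-8, which is the kind of double encoding that produces the familiar garbling. It illustrates the effect; it is not a claim about where exactly an unpatched eggdrop does this.

```c
#include <stddef.h>

/* Treat each input byte as a single ISO-8859-1 character and encode
 * it as UTF-8. Feeding it bytes that were already UTF-8 reproduces
 * the double-encoding garble. out must have room for 2 * n bytes.
 * Returns the number of bytes written. */
size_t latin1_to_utf8(const unsigned char *in, size_t n, unsigned char *out)
{
    size_t len = 0;
    for (size_t i = 0; i < n; i++) {
        unsigned char b = in[i];
        if (b < 0x80) {
            out[len++] = b;                              /* ASCII stays as-is */
        } else {
            out[len++] = (unsigned char)(0xC0 | (b >> 6));   /* lead byte */
            out[len++] = (unsigned char)(0x80 | (b & 0x3F)); /* continuation */
        }
    }
    return len;
}
```

Passing in the two UTF-8 bytes of é (C3 A9) gives back four bytes, C3 83 C2 A9, which a UTF-8 terminal displays as "Ã©".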
|
shadrach Halfop
Joined: 14 Dec 2007 Posts: 74
|
Posted: Tue Sep 29, 2009 4:52 pm Post subject: |
|
|
| Thank you, I've got it working. |
|