Getting eggdrop ready for UTF-8

Discussion of Eggdrop's code and module programming in C.
Anahel
Halfop
Posts: 48
Joined: Fri Jul 03, 2009 6:18 pm
Location: Dom!

Post by Anahel »

I had the same problem: the patch only modified main.h, so I manually edited tcl.h. You need to add

Code: Select all

encoding = "utf-8"; 
after this:

Code: Select all

if (encoding == NULL) {
  encoding = "iso8859-1";
}

so it should look like this:

Code: Select all

if (encoding == NULL) {
  encoding = "iso8859-1";
}

encoding = "utf-8";
speechles
Revered One
Posts: 1398
Joined: Sat Aug 26, 2006 10:19 pm
Location: emerald triangle, california (coastal redwoods)

Post by speechles »

De Kus wrote:Are you aware that the real problem has never been the messages going in and out but the channel/user names?! I see only multilingual messages, but not any channel name. Or am I just not looking closely enough?
You are correct in that getting correct utf-8 output isn't that hard, depending on how the string is manipulated. If any element within the string is replaced with a string in another encoding, that encoding breaks the utf-8 byte sequences. The rest of the string beyond that point is then shown as iso8859-1 (meaning each byte is rendered on its own, rather than being decoded as part of a multi-byte sequence).

Input has always been affected for me; I'm surprised you haven't experienced it yet. This is the same reason you cannot join a utf-8 channel: the incorrect iso8859-1 encoding gets used instead. The same thing happens when trying to read a user's input from within a bind. It seems that for utf-8 any type of input fails (to see it fail, try mixing two languages in that utf-8: English and Japanese, or Russian and French. Using just one makes it too easy). There are ways to work around this, but they still fail when dealing with accented vowels, the same way eggdrop's output does for some people (most of the time they use an elaborate string map to fix this condition, see for yourself). Myself, I've noticed that the Ã (byte 195, 0xC3) confuses the utf-8 string and breaks it back to iso8859-1 encoding. This happens when trying to render French accented sentences in utf-8 on an unpatched bot.

Plus, this finally puts to rest the requests from those wishing for better utf-8 support within the script, so I felt it was worth mentioning ;P
shadrach
Halfop
Posts: 74
Joined: Fri Dec 14, 2007 6:29 pm

Post by shadrach »

Thank you, I've got it working.