awyeah Revered One

Joined: 26 Apr 2004 Posts: 1580 Location: Switzerland
|
Posted: Tue Jul 31, 2007 4:19 am Post subject: |
|
|
You would have to use the [encoding] command with convertfrom, naming your respective character encoding. _________________ ·awyeah·
==================================
Facebook: jawad@idsia.ch (Jay Dee)
PS: Guys, I don't accept script helps or requests personally anymore.
================================== |
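For anyone who has not used it: a minimal sketch of what `encoding convertfrom` does, assuming the script receives the page as raw UTF-8 bytes. The variable names are made up for illustration and are not taken from the script.

```tcl
# Raw UTF-8 bytes for the two Cyrillic letters U+0421 U+0440,
# built with [binary format] so the script file's own encoding does not matter.
set raw_bytes [binary format H* d0a1d180]

# convertfrom turns external bytes into a Tcl (Unicode) string.
set text [encoding convertfrom utf-8 $raw_bytes]

puts [string length $raw_bytes]   ;# 4 bytes on the wire
puts [string length $text]        ;# 2 characters once decoded
```

The reverse direction, `encoding convertto`, turns a Tcl string back into bytes in a named encoding.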
|
|
 |
djevrek Voice
Joined: 31 Jul 2007 Posts: 11
|
Posted: Tue Jul 31, 2007 5:49 am Post subject: |
|
|
| I don't know how to use that. Maybe the best way to resolve this is to ask the author to update the script to work with UTF-8. |
|
|
 |
speechles Revered One

Joined: 26 Aug 2006 Posts: 1398 Location: emerald triangle, california (coastal redwoods)
|
Posted: Tue Jul 31, 2007 6:21 pm Post subject: |
|
|
The problem is my shallow grasp of Tcl syntax. Don't get me wrong, I've been programming for 25+ years (yes, I'm old, I started on the Atari 400 with the membrane keyboard when it debuted and I was 8 years old), just not in Tcl. So encodings are new and strange to me, but programming and scripting are not. If someone could enlighten me on how to accomplish what you want, I could probably implement it.
I suspect it would involve some sort of list using "country:encoding" and then using this list to determine how the display should look.. I'm just not sure how to implement it, whenever I try I get this. | Quote: | | <sp33chy> Currently: can't read " 5?C1;8:0 !@18X0 X5 :>=B8=5=B0;=0 4@6020 :>X0 A5 =0;078 C XC3>8AB>G=>X 2@>?8 (=0 0;:0=A:>;C>AB@2C) 8 C A@54Z>X 2@>?8 (0=>=A:>X =878X8). # A0AB02C 5?C1;8:5 !@18X5 AC 8 425 0CB>=>:@0X8=5 >X2>48=0 8 >A>2> 8 5B>E8X0. 5?C1;8:0 !@18X0 X5 453 =0@>40 8 A28E 4@C38E 3@0R0=0 :>X8 C Z>X 6825, 70A=>20=0 =0 45X ?@82@548, ?>HB>20ZC YC4A:8E ?@020 8 2;04028=8 ?@020. !@18X0 A": no such variable |
I've had similar problems with the translate function, and clumsily added convertto to it to solve some language encoding problems, but it's far from foolproof.. heh
So for the short answer, I just need someone more advanced in the Tcl department to throw me a bone, as this old dog needs a new trick. |
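A small experiment that may help explain the garbage in the quote above. The garbled text is exactly what comes out if every Cyrillic character is cut down to the low byte of its Unicode code point. Whether that is what the bot does on output is an assumption on my part; the proc name `low_bytes` is made up for this sketch.

```tcl
# Keep only the low 8 bits of each character's code point.
proc low_bytes {text} {
    set out ""
    foreach ch [split $text ""] {
        scan $ch %c code                          ;# character -> code point
        append out [format %c [expr {$code & 0xFF}]]
    }
    return $out
}

# "Srbija" in Serbian Cyrillic, written with \u escapes.
set word "\u0421\u0440\u0431\u0438\u0458\u0430"
puts [low_bytes $word]   ;# prints !@18X0
```

That output matches the `!@18X0` seen in the garbled line, which suggests the string is still Unicode when it is sent and needs converting to bytes first.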
|
|
 |
speechles Revered One

Joined: 26 Aug 2006 Posts: 1398 Location: emerald triangle, california (coastal redwoods)
|
Posted: Wed Aug 01, 2007 2:46 am Post subject: Re: Cyrillic wikipedias |
|
|
| djevrek wrote: | | Code: | <djevrek> !w .sr Srbija
<Grgo> !@18X0 | 5?C1;8:0 !@18X0 X5 :>=B8=5=B0;=0 4@6020 :>X0 A5 =0;078 C XC3>8AB>G=>X 2@>?8 (=0 0;:0=A:>;C>AB@2C) 8 C A@54Z>X 2@>?8 (0=>=A:>X =878X8). # A0AB02C 5?C1;8:5 !@18X5 AC 8 425 0CB>=>:@0X8=5 >X2>48=0 8 >A>2> 8 5B>E8X0. 5?C1;8:0 !@18X0 X5 453 =0@>40 8 A28E 4@C38E 3@0R0=0 :>X8 C Z>X 6825, 70A=>20=0 =0 45X ?@82@548, ?>HB>20ZC YC4A:8E ?@020 8 2;04028=8 ?@020. !@18X0 A @ http://sr.wikipedia.org/wiki/Srbija |
|
| Quote: | <speechles> !w .sr Srbija
<sp33chy> Србија | Република Србија је континентална држава која се налази у југоисточној Европи (на Балканском полуострву) и у средњој Европи (Панонској низији). У саставу Републике Србије су и две аутономне покрајине Војводина и Косово и Метохија. Република Србија је демократска држава српског народа и свих других грађана који у њој живе, заснована на демократским начелима, тржишн @ http://sr.wikipedia.org/wiki/Srbija
<speechles> !w .sr Srbija#toc
<sp33chy> Србија | ToC: Географија; Историја; Територијална организација; Градови; Демографија; Народи и националне мањине; Језик; Вероисповест; Државни симболи; Политика; Правосуђе; Права грађана; Економија; Туризам; Саобраћај; Култура; Ликовне уметности; Средњи век; Модерно доба; њижевност; Музика; Класична музика; Позориште и филм; Светска културна баштина УНЕСКО-а у Србији; Фестивали; Развој науке и високог школства;
<sp33chy> Образовање; Празници; Види још; Галерија слика; Референце; Спољашње везе; Влада; Остало @ http://sr.wikipedia.org/wiki/Srbija#toc
<speechles> !w .sr Srbija#Географија
<sp33chy> Србија | Географија Србија се налази на Балкану - региону југоисточне Европе (око 80% територије) и у Панонској низији - региону средње Европе (око 20% територије). Но, географски, а и климатски, једним делом се убраја и у медитеранске земље. Укупна дужина граница са околним земљама износи 2.027 km. Дужина граница по државама суседима износи: Албанија 115 km, Босна и Херце @
<sp33chy> http://sr.wikipedia.org/wiki/Srbija#.D0.93.D0.B5.D0.BE.D0.B3.D1.80.D0.B0.D1.84.D0.B8.D1.98.D0.B0 [1 Redirect(s)] |
I think I've found an easy way to remedy this, if that looks correct to you. cp1251 = the Windows Cyrillic code page, which covers Serbian Cyrillic.
| Code: | | set html [encoding convertto "cp1251" $html] |
If I hardcode that just after loading the third and final wiki page, the 'destination' page (it's arrived at after traversing), it will work like above. I don't use recursion at all, I use prediction. At most 3 pages load in sequence with each !wiki command, and at least 2 will. So if this is indeed correct and looks Serbian, I can start making a list of "country:encodings" and get this started, hopefully... |
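To see why the choice of encoding matters to whoever reads the channel, here is a rough comparison of what `encoding convertto` produces for the same word in cp1251 and in UTF-8. It assumes the Tcl install ships the cp1251 encoding file, which a stock install does.

```tcl
# "Srbija" in Serbian Cyrillic, written with \u escapes.
set word "\u0421\u0440\u0431\u0438\u0458\u0430"

set as_cp1251 [encoding convertto cp1251 $word]
set as_utf8   [encoding convertto utf-8 $word]

# Same six characters, two different byte sequences.
puts [string length $as_cp1251]   ;# 6  (one byte per letter)
puts [string length $as_utf8]     ;# 12 (two bytes per letter)

binary scan $as_utf8 H* hex
puts $hex                         ;# hex dump of the UTF-8 bytes
```

A client set to UTF-8 will not render the cp1251 bytes as Cyrillic, and a client set to cp1251 will not render the UTF-8 bytes correctly, so the output encoding has to match what the readers' clients expect.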
|
|
 |
djevrek Voice
Joined: 31 Jul 2007 Posts: 11
|
Posted: Wed Aug 01, 2007 4:58 am Post subject: |
|
|
| No, this doesn't look good. This is not proper Serbian. Try comparing it with the online Wikipedia page at the links above. The proper encoding would be UTF-8. Can you try it and post the result here, please? If it works with UTF-8, can you tell me what you changed so I can change it too, or tell me when we can expect a new version of the script? |
|
|
 |
speechles Revered One

Joined: 26 Aug 2006 Posts: 1398 Location: emerald triangle, california (coastal redwoods)
|
Posted: Wed Aug 01, 2007 5:41 am Post subject: |
|
|
| djevrek wrote: | | No, this doesn't look good. This is not proper Serbian. Try comparing it with the online Wikipedia page at the links above. The proper encoding would be UTF-8. Can you try it and post the result here, please? If it works with UTF-8, can you tell me what you changed so I can change it too, or tell me when we can expect a new version of the script? |
Put what here? I'm telling you, if I choose UTF-8 it appears exactly the same as using standard eggdrop unicode, no difference at all. I don't know why, it just does. So the trick I used above with convertto "cp1251" is working, it just doesn't look right on my American English mIRC 6.12 client (which is what I pasted). But it would've looked right to any Serbian in the channel who saw it, you see. So give me some time to make a list of "wikipedia country:country encoding". It will be a big list. Then I'll either use a giant switch statement or a list with a foreach, I haven't decided yet. But it won't be soon (unless soon means a week; yes, it may take that long), as this list takes time to build. Rome wasn't built in a day, and this script is large in scope, and complicated, and best of all.. FREE.
| Quote: | Currently: unknown encoding "iso-8859-5"
Currently: while executing
Currently: "encoding convertto "iso-8859-5" $html" | Btw, this is why I chose "cp1251" to represent Serbian even though it's not 100% correct, it's the best possible encoding for eggdrop. |
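A note on the "unknown encoding" error above: Tcl spells its ISO names without the first hyphen (`iso8859-5`, not `iso-8859-5`), so the charset name from a web page usually cannot be passed to `encoding convertto` as-is. The helper below is a sketch only; `find_encoding` is a made-up name, and which encodings a given eggdrop's Tcl actually ships is something to check with `encoding names`.

```tcl
# Look up a charset name among the encodings this Tcl knows about,
# ignoring case, hyphens and underscores. Returns "" if nothing matches.
proc find_encoding {wanted} {
    set key [string map {- "" _ ""} [string tolower $wanted]]
    foreach name [encoding names] {
        if {[string map {- "" _ ""} [string tolower $name]] eq $key} {
            return $name
        }
    }
    return ""
}

puts [find_encoding "iso-8859-5"]   ;# the Tcl spelling, if that encoding is installed
puts [find_encoding "UTF-8"]        ;# utf-8
```

Checking the result for an empty string before calling `encoding convertto` avoids the error shown in the quote.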
|
|
 |
djevrek Voice
Joined: 31 Jul 2007 Posts: 11
|
Posted: Wed Aug 01, 2007 7:16 am Post subject: |
|
|
| I think the problem is with your mIRC program. Can you please try xchat or some other client that fully supports UTF-8? Or ... try updating mIRC to a newer version (6.21, I think). That one supports UTF-8 for sure. |
|
|
 |
speechles Revered One

Joined: 26 Aug 2006 Posts: 1398 Location: emerald triangle, california (coastal redwoods)
|
Posted: Wed Aug 01, 2007 7:29 am Post subject: |
|
|
*yawn*
| Code: | | set html [encoding convertto "utf-8" $html] |
Add that in around line 2668, then save and tell me if that works. If so, there ya go, enjoy. If not, told ya so. | Code: | regsub -all " " $html " " html
regsub -all ";;>" $html "" html
}
set html [encoding convertto "utf-8" $html]
set match "" |
That area should look like this if you did it right. |
|
|
 |
djevrek Voice
Joined: 31 Jul 2007 Posts: 11
|
Posted: Wed Aug 01, 2007 8:12 am Post subject: |
|
|
I did that, but nothing much happens. I just get the same old weird characters, not what I want to see. OK, for now I will wait for you to find some other way to fix this.
P.S. I really don't know anything about Tcl, but maybe something from here (http://www.google.com/codesearch?hl=en&lr=&q="set+html"+encoding&btnG=Search) can help you with this problem. |
|
|
 |
speechles Revered One

Joined: 26 Aug 2006 Posts: 1398 Location: emerald triangle, california (coastal redwoods)
|
Posted: Wed Aug 01, 2007 3:24 pm Post subject: |
|
|
| djevrek wrote: | I did that, but nothing much happens. I just get the same old weird characters, not what I want to see. OK, for now I will wait for you to find some other way to fix this.
P.S. I really don't know anything about Tcl, but maybe something from here (http://www.google.com/codesearch?hl=en&lr=&q="set+html"+encoding&btnG=Search) can help you with this problem. |
Okay, let's go over this, result #2 specifically. | Quote: | share/dotlrn0/packages/acs-tcl/tcl/html-email-procs.tcl - 13 identical
41: # convert text to charset
set encoding [ns_encodingforcharset $charset]
if {[lsearch [encoding names] $encoding] != -1} {
set html_body [encoding convertto $encoding $html_body]
set text_body [encoding convertto $encoding $text_body]
} else {
|
This has potential, but.. like I said, I need to make a list because.. using the bot with utf-8 and expecting multi-lingual greatness is broken or something. So...?
Edit: Update.. There isn't any way to get a useful encoding from the page itself, since it tells the bot it is utf-8, and as explained above, utf-8 does nothing with the bot... Dunno, is eggdrop buggy? Am I missing something? Since I can't answer those questions, forcing encodings is what I need to do. So to do that, I need to read this: http://meta.wikimedia.org/wiki/List_of_Wikipedias
What that means is 253 Wikipedias (one per language, not per country) currently need an entry on the big ol' "wiki country:encoding" list.. Wow! That's going to be tediously tedious.. Does anyone have such a list already? Or conversely, know an easier way to go about this? Is the big ol' list the only way to do it? |
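For what it's worth, the list idea could be sketched roughly like this. Everything here is hypothetical: the table name `wiki_encoding`, the proc `wiki_convert`, and the three sample entries are illustration only, not the script's actual data. It uses an array rather than a dict so it also runs on Tcl 8.4.

```tcl
# Hypothetical lookup table: Wikipedia language code -> Tcl encoding name.
array set wiki_encoding {
    sr cp1251
    ru cp1251
    el cp1253
}

# Convert $html for the given language code. Falls back to returning the
# text unchanged when the language is not listed or the encoding is missing.
proc wiki_convert {lang html} {
    global wiki_encoding
    if {![info exists wiki_encoding($lang)]} {
        return $html
    }
    set enc $wiki_encoding($lang)
    if {[lsearch -exact [encoding names] $enc] == -1} {
        return $html
    }
    return [encoding convertto $enc $html]
}

puts [string length [wiki_convert sr "\u0421\u0440"]]   ;# 2 bytes when cp1251 is installed
```

The `lsearch` check against `encoding names` is the same guard used in the html-email-procs.tcl snippet quoted above, so an unsupported entry degrades to "no conversion" instead of an error.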
|
|
 |
c0nv1ct Voice
Joined: 17 May 2007 Posts: 5
|
Posted: Sat Aug 04, 2007 9:19 pm Post subject: |
|
|
Thanks for the wikimedia addition! #sabayon on freenode appreciates your work
Only problem i've noticed is the weather parsing for some cities. Here's an example:
| Code: |
<c0nv1ct> !google weather eindhoven
<risponditore> Weather for Eindhoven, Netherlands: 63°F, Wind: SE at 4 mph, Humidity: 82%, <div style="padding:5px;float:left" align=center>Sun <img style="border:1px solid #bbc;margin-bottom:2px" src="/images/weather/mostly_sunny.gif" alt="Mostly Sunny" title="Mostly Sunny" width=40 height=40 border=0> <nobr>84°F | 66°F</nobr>
|
Yet Amsterdam is fine:
| Code: |
<c0nv1ct> !google weather amsterdam
<risponditore> Weather for Amsterdam, Netherlands: 63°F, Clear, Wind: SE at 9 mph, Humidity: 88%
|
|
|
|
 |
speechles Revered One

Joined: 26 Aug 2006 Posts: 1398 Location: emerald triangle, california (coastal redwoods)
|
Posted: Sat Aug 04, 2007 10:10 pm Post subject: |
|
|
| c0nv1ct wrote: | Thanks for the wikimedia addition! #sabayon on freenode appreciates your work
Only problem i've noticed is the weather parsing for some cities. |
That one is pretty easy to fix, but it requires a kludge rather than a real fix, as the method to detect weather results is a bit clumsy. | Code: | # weather!
} elseif {[string match "*/images/weather/*" $html] == 1} {
    regexp -- {<p.*?class=e>.*?<td><div.*?>(.+?)</div>.*?<td><div.*?>(.+?)<.*?>(.+?)<.*?>(.+?)<.*?>(.+?)</div>.*?</table>} $html - w1 w2 w3 w4 w5
    regsub -- {<p.*?class=e>(.*?)</table} $html {} html
    # a 5th field containing "<" is leftover html, not real data: drop it
    if {[string match "*<*" $w5]} {
        set w5 ""
    } else {
        set w5 ", ${w5}"
    }
    set desc "$w1\: $w2, $w3, $w4$w5"
    # turn the html entity into a real degree sign
    regsub -all -- {&deg;} $desc {°} desc
    set link ""
    regsub -all -- {weather} $input {} input
##NoWrap################################################################################################################################################ | Replace the entire weather section with this to fix it. Disregard the #NoWrap# it's just to defeat word wrap.
The problem was the clumsy weather parser expected to "always" get 5 results; Name, Temp, Condition, Wind, Humidity. All worked fine until it got to one which only held 4 (some weather stations only report 4) causing html spill-over. All I've done is account for this with the kludge. If 5th result contains <, must be html tags in it, so it disregards the 5th result now. Simple & Effective.
This fix will be included shortly, as soon as I finish the wikipedia encodings and get that squared away. I'll finally be able to show you a fully working multi-language wikipedia script, that yes, natively supports Serbian among others.  |
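The kludge can be tried on its own, outside the script. This is a stripped-down sketch: `format_weather` is a made-up proc name, and the field values are typed in by hand from the two examples above instead of being pulled out of the html.

```tcl
# Build the weather line from up to five fields. If the 5th field
# contains "<" it is treated as html spill-over and left out.
proc format_weather {w1 w2 w3 w4 w5} {
    if {[string match "*<*" $w5]} {
        set w5 ""
    } else {
        set w5 ", ${w5}"
    }
    return "${w1}: $w2, $w3, $w4$w5"
}

# Five real fields (Amsterdam): all of them are kept.
puts [format_weather "Weather for Amsterdam, Netherlands" "63\u00B0F" \
    "Clear" "Wind: SE at 9 mph" "Humidity: 88%"]

# Four real fields plus html junk (Eindhoven): the junk is dropped.
puts [format_weather "Weather for Eindhoven, Netherlands" "63\u00B0F" \
    "Wind: SE at 4 mph" "Humidity: 82%" "<div style=\"padding:5px\">Sun"]
```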
|
|
 |
Elfriede Halfop
Joined: 07 Aug 2007 Posts: 67
|
Posted: Tue Aug 07, 2007 5:21 am Post subject: |
|
|
The script works perfectly for me! Thx .. except for one function:
seperator == '\n'
I've changed this in the code.. I've tried " \n ", "\n" and ""
No matter what I do.. when I run e.g. !google mirc I still get the results with the |
Does anybody have an idea what I'm doing wrong?
edit:
I'm using: v1.9.6 - July 27th, 2oo7 |
|
|
 |
speechles Revered One

Joined: 26 Aug 2006 Posts: 1398 Location: emerald triangle, california (coastal redwoods)
|
Posted: Tue Aug 07, 2007 3:52 pm Post subject: |
|
|
| Elfriede wrote: | Anybody has an idea, what i'm making wrong ?
edit:
I'm using: v1.9.6 - July 27th, 2oo7 |
| Code: | # what to use to seperate results, set this to "\n" and it will output each result
# on a line of its own. the seperator will be removed from the end of the last result.
variable seperator " | " |
Change variable seperator to "\n" and issue a .rehash on your bot's partyline. Afterwards it should look similar to how it is below.
sidenote: Still working on that big ol wikipedia list (the country:encodings), and that will soon be finished, it's just tedious doing it all by hand. I should have something to show by this coming weekend and it should make Serbians happy.  |
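One thing worth checking when the "\n" setting seems to have no effect is the quoting: in double quotes Tcl turns `\n` into a real newline, in braces it stays as a backslash followed by an n. The sketch below is not the script's code; the `results` list is made up and `puts` stands in for whatever the bot uses to send a line.

```tcl
set results [list "first hit" "second hit" "third hit"]

# Double quotes: \n is one character, a real newline.
set seperator "\n"
set output [join $results $seperator]

# One output line per result.
foreach line [split $output "\n"] {
    puts $line
}

# Braces: \n is two characters and would show up literally in the output.
puts [string length "\n"]   ;# 1
puts [string length {\n}]   ;# 2
```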
|
|
 |
Elfriede Halfop
Joined: 07 Aug 2007 Posts: 67
|
Posted: Tue Aug 07, 2007 4:05 pm Post subject: |
|
|
| Code: |
# what to use to seperate results, set this to "\n" and it will output each result
# on a line of its own. the seperator will be removed from the end of the last result.
variable seperator "\n"
|
That's what my changes look like, but the results:
and yes, I've rehashed
Is there anything else I need to change or install? Are there country differences?
edit:
there must be something specific wrong on my machine.. a friend of mine has it too .. and it works as it should ..
The question is.. what causes this "error" ^^ |
|
|
 |
|
|
|
|