UNOFFICIAL incith-google 2.1x (Nov30,2o12)

awyeah
Revered One
Posts: 1580
Joined: Mon Apr 26, 2004 2:37 am
Location: Switzerland

Post by awyeah »

You would have to use the [encoding] function and convertfrom, to your respective character encoding.
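A minimal sketch of what `encoding convertfrom` does, assuming the page data arrives as undecoded UTF-8 bytes. The variable names `raw` and `text` are made up for illustration and are not taken from incith-google.

```tcl
# Sketch only: decode raw UTF-8 bytes into a Tcl string.
# $raw stands in for page data that has not been decoded yet;
# here it holds the UTF-8 bytes of the Serbian word for "Serbia".
set raw "\xD0\xA1\xD1\x80\xD0\xB1\xD0\xB8\xD1\x98\xD0\xB0"

# convertfrom turns bytes in the named encoding into characters.
set text [encoding convertfrom utf-8 $raw]

puts [string length $raw]    ;# 12 (bytes)
puts [string length $text]   ;# 6  (characters)
```

If the data is in some other encoding, the first argument changes accordingly; `encoding names` lists what the local Tcl install knows about.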
·­awyeah·

==================================
Facebook: jawad@idsia.ch (Jay Dee)
PS: Guys, I don't accept script helps or requests personally anymore.
==================================
djevrek
Voice
Posts: 11
Joined: Tue Jul 31, 2007 4:05 am

Post by djevrek »

I don't know how to use that. Maybe the best way to resolve this is to ask the author to update the script to work with UTF-8.
speechles
Revered One
Posts: 1398
Joined: Sat Aug 26, 2006 10:19 pm
Location: emerald triangle, california (coastal redwoods)

Post by speechles »

The problem is my shallow grasp of Tcl syntax. I mean, don't get me wrong, I've been programming for 25+ years (yes, I'm old; I started on the Atari 400 w/ membrane keyboard when it debuted and I was 8 years old :P), just not in Tcl, mind you. So encodings are new/strange to me, but programming/scripting is not. If someone could enlighten me on how to accomplish what you want, then by all means I could probably implement it.

I suspect it would involve some sort of list using "country:encoding" and then using this list to determine how the display should look. I'm just not sure how to implement it; whenever I try, I get this:
<sp33chy> Currently: can't read " 5?C1;8:0 !@18X0 X5 :>=B8=5=B0;=0 4@6020 :>X0 A5 =0;078 C XC3>8AB>G=>X 2@>?8 (=0 0;:0=A:>;C>AB@2C) 8 C A@54Z>X 2@>?8 (0=>=A:>X =878X8). # A0AB02C 5?C1;8:5 !@18X5 AC 8 425 0CB>=>:@0X8=5 >X2>48=0 8 >A>2> 8 5B>E8X0. 5?C1;8:0 !@18X0 X5 453 =0@>40 8 A28E 4@C38E 3@0R0=0 :>X8 C Z>X 6825, 70A=>20=0 =0 45X ?@82@548, ?>HB>20ZC YC4A:8E ?@020 8 2;04028=8 ?@020. !@18X0 A": no such variable
I've had similar problems with the translate function, and tried to clumsily add convertto to it to solve some language encoding problems, but it's far from foolproof.. heh

So for the short answer, I just need someone more advanced in the tcl department to throw me a bone of sorts, as this old dog needs a new trick. :)
speechles
Revered One
Posts: 1398
Joined: Sat Aug 26, 2006 10:19 pm
Location: emerald triangle, california (coastal redwoods)

Re: Cyrillic wikipedias

Post by speechles »

djevrek wrote:

Code: Select all

<djevrek> !w .sr Srbija
<Grgo> !@18X0 |  5?C1;8:0 !@18X0 X5 :>=B8=5=B0;=0 4@6020 :>X0 A5 =0;078 C XC3>8AB>G=>X 2@>?8 (=0 0;:0=A:>;C>AB@2C) 8 C A@54Z>X 2@>?8 (0=>=A:>X =878X8). # A0AB02C 5?C1;8:5 !@18X5 AC 8 425 0CB>=>:@0X8=5 >X2>48=0 8 >A>2> 8 5B>E8X0. 5?C1;8:0 !@18X0 X5 453 =0@>40 8 A28E 4@C38E 3@0R0=0 :>X8 C Z>X 6825, 70A=>20=0 =0 45X ?@82@548, ?>HB>20ZC YC4A:8E ?@020 8 2;04028=8 ?@020. !@18X0 A @ http://sr.wikipedia.org/wiki/Srbija
<speechles> !w .sr Srbija
<sp33chy> Ñðáè¼à | Ðåïóáëèêà Ñðáè¼à ¼å êîíòèíåíòàëíà äðæàâà êî¼à ñå íàëàçè ó ¼óãîèñòî÷íî¼ Åâðîïè (íà Áàëêàíñêîì ïîëóîñòðâó) è ó ñðåäœî¼ Åâðîïè (Ïàíîíñêî¼ íèçè¼è). Ó ñàñòàâó Ðåïóáëèêå Ñðáè¼å ñó è äâå àóòîíîìíå ïîêðà¼èíå Âî¼âîäèíà è Êîñîâî è Ìåòîõè¼à. Ðåïóáëèêà Ñðáè¼à ¼å äåìîêðàòñêà äðæàâà ñðïñêîã íàðîäà è ñâèõ äðóãèõ ãðààíà êî¼è ó œî¼ æèâå, çàñíîâàíà íà äåìîêðàòñêèì íà÷åëèìà, òðæèøí @ http://sr.wikipedia.org/wiki/Srbija

<speechles> !w .sr Srbija#toc
<sp33chy> Ñðáè¼à | ToC: Ãåîãðàôè¼à; Èñòîðè¼à; Òåðèòîðè¼àëíà îðãàíèçàöè¼à; Ãðàäîâè; Äåìîãðàôè¼à; Íàðîäè è íàöèîíàëíå ìàœèíå; £åçèê; Âåðîèñïîâåñò; Äðæàâíè ñèìáîëè; Ïîëèòèêà; Ïðàâîñóå; Ïðàâà ãðààíà; Åêîíîìè¼à; Òóðèçàì; Ñàîáðàžà¼; Êóëòóðà; Ëèêîâíå óìåòíîñòè; Ñðåäœè âåê; Ìîäåðíî äîáà; œèæåâíîñò; Ìóçèêà; Êëàñè÷íà ìóçèêà; Ïîçîðèøòå è ôèëì; Ñâåòñêà êóëòóðíà áàøòèíà ÓÍÅÑÊÎ-à ó Ñðáè¼è; Ôåñòèâàëè; Ðàçâî¼ íàóêå è âèñîêîã øêîëñòâà;
<sp33chy> Îáðàçîâàœå; Ïðàçíèöè; Âèäè ¼îø; Ãàëåðè¼à ñëèêà; Ðåôåðåíöå; Ñïîšàøœå âåçå; Âëàäà; Îñòàëî @ http://sr.wikipedia.org/wiki/Srbija#toc

<speechles> !w .sr Srbija#Ãåîãðàôè¼à
<sp33chy> Ñðáè¼à | Ãåîãðàôè¼à Ñðáè¼à ñå íàëàçè íà Áàëêàíó - ðåãèîíó ¼óãîèñòî÷íå Åâðîïå (îêî 80% òåðèòîðè¼å) è ó Ïàíîíñêî¼ íèçè¼è - ðåãèîíó ñðåäœå Åâðîïå (îêî 20% òåðèòîðè¼å). Íî, ãåîãðàôñêè, à è êëèìàòñêè, ¼åäíèì äåëîì ñå óáðà¼à è ó ìåäèòåðàíñêå çåìšå. Óêóïíà äóæèíà ãðàíèöà ñà îêîëíèì çåìšàìà èçíîñè 2.027 km. Äóæèíà ãðàíèöà ïî äðæàâàìà ñóñåäèìà èçíîñè: Àëáàíè¼à 115 km, Áîñíà è Õåðöå @
<sp33chy> http://sr.wikipedia.org/wiki/Srbija#.D0 ... 1.98.D0.B0 [1 Redirect(s)]
I think I've found an easy way to remedy this, if that looks correct to you. cp1251 = the Windows Cyrillic code page, which I'm using as the Serbian language encoding.

Code: Select all

set html [encoding convertto "cp1251" $html]
If I hardcode that just after loading the third and final wikipage, the 'destination' page (it's arrived at after traversing), it will work like above. I don't use recursion at all, I use prediction. At most 3 pages load in sequence with each !wiki command; at the least, 2 will. So if this is indeed correct and looks Serbian, I can start making a list of "country:encodings" and get this started, hopefully...
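To illustrate what the `convertto` line changes, here is a small sketch, assuming the Tcl install ships the cp1251 encoding (the paste above suggests it does on this bot). It shows that the same six letters become six cp1251 bytes or twelve UTF-8 bytes, and that a client reading the cp1251 bytes as Latin-1 sees the kind of text pasted above. The variable names are invented for the example.

```tcl
# The Serbian word for "Serbia" as Unicode escapes, so this file
# stays plain ASCII.
set word "\u0421\u0440\u0431\u0438\u0458\u0430"

# convertto turns characters into bytes in the named encoding.
set as_cp1251 [encoding convertto cp1251 $word]
set as_utf8   [encoding convertto utf-8  $word]

puts [string length $as_cp1251]   ;# 6  (one byte per letter)
puts [string length $as_utf8]     ;# 12 (two bytes per letter)

# What a client sees if it decodes the cp1251 bytes as Latin-1
# instead of cp1251: the accented-Latin text from the paste.
set seen [encoding convertfrom iso8859-1 $as_cp1251]

# Decoding with the matching encoding gets the original back.
set back [encoding convertfrom cp1251 $as_cp1251]
puts [string equal $back $word]   ;# 1
```

So whether the output "looks Serbian" depends on the bytes the bot sends and the encoding the IRC client assumes matching each other.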
djevrek
Voice
Posts: 11
Joined: Tue Jul 31, 2007 4:05 am

Post by djevrek »

No, this doesn't look good. This is not proper Serbian language. Try to compare it with the online Wikipedia page at the links above. The proper encoding would be UTF-8. Can you try it and put it here please? If it's OK with UTF-8, can you tell me what you changed so I can change it too, or can you tell me when we can expect a new version of the script?
speechles
Revered One
Posts: 1398
Joined: Sat Aug 26, 2006 10:19 pm
Location: emerald triangle, california (coastal redwoods)

Post by speechles »

djevrek wrote:No, this doesn't look good. This is not proper Serbian language. Try to compare it with the online Wikipedia page at the links above. The proper encoding would be UTF-8. Can you try it and put it here please? If it's OK with UTF-8, can you tell me what you changed so I can change it too, or can you tell me when we can expect a new version of the script?
Put what here? I'm telling you, if I choose UTF-8 it appears exactly the same as using standard eggdrop unicode, no difference at all. Don't know why either, it just does. So the trick I used above with convertto "cp1251" is working, it just doesn't look right on my American English mIRC 6.12 client (which is what I pasted). But would've looked right to any Serbian in channel that saw it, you see. So give me some time to make a list of "wikipedia country:country encoding". It will be a big list. Then I'll either use a giant case statement or a list with a foreach, haven't decided yet. But it won't be soon (unless soon means a week; yes, it may take that long), as this list takes time to build. Rome wasn't built in a day, and this script is large in scope, and complicated, and best of all.. FREE.
Currently: unknown encoding "iso-8859-5"
Currently: while executing
Currently: "encoding convertto "iso-8859-5" $html"
Btw, this is why I chose "cp1251" to represent Serbian even tho it's not 100% correct; it's the best possible encoding for eggdrop.
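One thing worth checking with that error: Tcl's own encoding names usually drop the hyphen after "iso" (for example `iso8859-5`), so the label `iso-8859-5` is not recognised as written. Whether a given encoding is installed still depends on the Tcl build, so the sketch below checks `encoding names` first. The proc name `tcl_encoding_for` and the sample `html` value are made up for illustration.

```tcl
# Hypothetical helper: map a charset label such as "ISO-8859-5" to the
# spelling Tcl uses, or return "" if this Tcl has no such encoding.
proc tcl_encoding_for {charset} {
    set name [string tolower $charset]
    # Tcl spells these iso8859-N, not iso-8859-N.
    regsub {^iso-8859-} $name {iso8859-} name
    # Windows code pages are spelled cpNNNN in Tcl.
    regsub {^windows-} $name {cp} name
    if {[lsearch -exact [encoding names] $name] != -1} {
        return $name
    }
    return ""
}

set html "<p>example page text</p>"   ;# stand-in for the fetched page
set enc [tcl_encoding_for "ISO-8859-5"]
if {$enc ne ""} {
    set html [encoding convertto $enc $html]
} else {
    # Unknown here: leave $html untouched rather than raise an error.
    puts "no Tcl encoding for ISO-8859-5 on this install"
}
```

Guarding the call this way avoids the "unknown encoding" error, at the cost of silently skipping the conversion when the encoding is missing.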
djevrek
Voice
Posts: 11
Joined: Tue Jul 31, 2007 4:05 am

Post by djevrek »

I think that the problem is with your mIRC program. Can you please try xchat or some other client that supports UTF-8 completely? Or ... try to update mIRC to a newer version (6.21, I think). That one supports UTF-8 for sure.
speechles
Revered One
Posts: 1398
Joined: Sat Aug 26, 2006 10:19 pm
Location: emerald triangle, california (coastal redwoods)

Post by speechles »

*yawn*

Code: Select all

set html [encoding convertto "utf-8" $html]
Add that in around line 2668, then save and tell me if that works. If so, there ya go, enjoy. If not, told ya so.

Code: Select all

        regsub -all " " $html " " html
        regsub -all ";;>" $html "" html
      }

      set html [encoding convertto "utf-8" $html]

      set match ""
That area should look like this if you did it right.
djevrek
Voice
Posts: 11
Joined: Tue Jul 31, 2007 4:05 am

Post by djevrek »

I did that, but nothing much happened. I just got the same old weird characters, not what I want to see. OK, for now I will wait for you to find some other way to fix this.

P.S. I really don't know anything about TCL, but maybe something from here (http://www.google.com/codesearch?hl=en& ... tnG=Search) can help you with this problem.
speechles
Revered One
Posts: 1398
Joined: Sat Aug 26, 2006 10:19 pm
Location: emerald triangle, california (coastal redwoods)

Post by speechles »

djevrek wrote:I did that, but nothing much happened. I just got the same old weird characters, not what I want to see. OK, for now I will wait for you to find some other way to fix this.

P.S. I really don't know anything about TCL, but maybe something from here (http://www.google.com/codesearch?hl=en& ... tnG=Search) can help you with this problem.
Okay, let's go over this, the #2 result specifically.
share/dotlrn0/packages/acs-tcl/tcl/html-email-procs.tcl - 13 identical

# convert text to charset
set encoding [ns_encodingforcharset $charset]
if {[lsearch [encoding names] $encoding] != -1} {
    set html_body [encoding convertto $encoding $html_body]
    set text_body [encoding convertto $encoding $text_body]
} else {
This has potential, but.. like I said, I need to make a list because.. using the bot with utf-8 and expecting multi-lingual greatness is broke or something. So...?

Edit: Update.. There isn't any way to get the real encoding from the page, since it tells the bot it is utf-8, and as explained above, utf-8 does nothing with the bot. Dunno, is eggdrop buggy? Am I missing something? Since I can't answer those questions, forcing encodings is what I need to do. So to do that, I need to read this: http://meta.wikimedia.org/wiki/List_of_Wikipedias

What that means is 253 countries presently need an entry on the big ol' "wiki country:encoding" list.. Wow! That's going to be tedious.. Does anyone have such a list already? Or, alternatively, know an easier way to go about this? Is the big ol' list the only way to do it?
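For what it's worth, the "country:encoding" list could be kept as a plain Tcl array with a default, rather than a giant case statement. This is only a sketch: the only entry taken from this thread is sr/cp1251, the other entries are unverified placeholders, and the names `wiki_encoding` and `encoding_for_wiki` are invented here.

```tcl
# Assumed, incomplete table: wiki language code -> Tcl encoding name.
# Only "sr cp1251" comes from the thread; the rest are placeholders.
array set wiki_encoding {
    sr cp1251
    ru cp1251
    el cp1253
}

# Hypothetical lookup with a default for wikis not in the table.
proc encoding_for_wiki {lang {default utf-8}} {
    global wiki_encoding
    if {[info exists wiki_encoding($lang)]} {
        return $wiki_encoding($lang)
    }
    return $default
}

puts [encoding_for_wiki sr]   ;# cp1251
puts [encoding_for_wiki en]   ;# utf-8
```

With a default in place, the table only needs entries for the wikis that actually misbehave, which may be far fewer than 253.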
c0nv1ct
Voice
Posts: 5
Joined: Thu May 17, 2007 2:34 pm

Post by c0nv1ct »

Thanks for the wikimedia addition! #sabayon on freenode appreciates your work :D

The only problem I've noticed is the weather parsing for some cities. Here's an example:

Code: Select all

<c0nv1ct> !google weather eindhoven
<risponditore> Weather for Eindhoven, Netherlands: 63°F, Wind: SE at 4 mph, Humidity: 82%, <div style="padding:5px;float:left" align=center>Sun <img style="border:1px solid #bbc;margin-bottom:2px" src="/images/weather/mostly_sunny.gif" alt="Mostly Sunny" title="Mostly Sunny" width=40 height=40 border=0> <nobr>84°F | 66°F</nobr>
Yet Amsterdam is fine:

Code: Select all

<c0nv1ct> !google weather amsterdam
<risponditore> Weather for Amsterdam, Netherlands: 63°F, Clear, Wind: SE at 9 mph, Humidity: 88%
speechles
Revered One
Posts: 1398
Joined: Sat Aug 26, 2006 10:19 pm
Location: emerald triangle, california (coastal redwoods)

Post by speechles »

c0nv1ct wrote:Thanks for the wikimedia addition! #sabayon on freenode appreciates your work :D

The only problem I've noticed is the weather parsing for some cities.
That one is pretty easy to fix, but requires a kludge rather than a real fix, as the method to detect weather results is a bit clumsy.

Code: Select all

        # weather!
        } elseif {[string match "*/images/weather/*" $html] == 1} {
          regexp -- {<p.*?class=e>.*?<td><div.*?>(.+?)</div>.*?<td><div.*?>(.+?)<.*?>(.+?)<.*?>(.+?)<.*?>(.+?)</div>.*?</table>} $html - w1 w2 w3 w4 w5
          regsub -- {<p.*?class=e>(.*?)</table} $html {} html
          if {[string match "*<*" $w5]} {
            set w5 ""
          } else {
            set w5 ", ${w5}"
          }
          set desc "$w1\: $w2, $w3, $w4$w5"
          regsub -all -- {&deg;} $desc {°} desc
          set link ""
          regsub -all -- {weather} $input {} input
##NoWrap################################################################################################################################################
Replace the entire weather section with this to fix it. Disregard the #NoWrap# line; it's just there to defeat word wrap.

The problem was that the clumsy weather parser expected to "always" get 5 results: Name, Temp, Condition, Wind, Humidity. All worked fine until it got to one which only held 4 (some weather stations only report 4), causing html spill-over. All I've done is account for this with the kludge. If the 5th result contains <, there must be html tags in it, so it disregards the 5th result now. Simple & Effective.
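The kludge on its own, pulled out into a stand-alone proc so it can be tried outside the bot. This is a sketch: the proc name `weather_desc` is made up, and the arguments stand in for the five regexp captures `w1` to `w5` in the script.

```tcl
# Hypothetical stand-alone version of the check: four fields are always
# used, the fifth only if it does not look like leftover markup.
proc weather_desc {w1 w2 w3 w4 w5} {
    if {[string match "*<*" $w5]} {
        # A "<" means the capture ran past the real fields into html.
        set w5 ""
    } else {
        set w5 ", $w5"
    }
    return "${w1}: $w2, $w3, $w4$w5"
}

# Five real fields (the Amsterdam case):
puts [weather_desc "Weather for Amsterdam, Netherlands" "63F" "Clear" \
    "Wind: SE at 9 mph" "Humidity: 88%"]

# Four real fields plus html spill-over (the Eindhoven case):
puts [weather_desc "Weather for Eindhoven, Netherlands" "63F" \
    "Wind: SE at 4 mph" "Humidity: 82%" "<div style=\"padding:5px\">Sun"]
```

The second call drops the spilled markup and prints only the four genuine fields.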

This fix will be included shortly, as soon as I finish the wikipedia encodings and get that squared away. I'll finally be able to show you a fully working multi-language wikipedia script, that yes, natively supports Serbian among others. :P
Elfriede
Halfop
Posts: 67
Joined: Tue Aug 07, 2007 4:21 am

Post by Elfriede »

The script works perfectly for me! Thx .. except for one function:

seperator == '\n'

I've changed this in the code. I've tried " \n ", "\n" and "".

No matter what I do, when I run e.g. !google mirc I still get the results with the |

Does anybody have an idea what I'm doing wrong? :)

edit:
I'm using: v1.9.6 - July 27th, 2oo7
speechles
Revered One
Posts: 1398
Joined: Sat Aug 26, 2006 10:19 pm
Location: emerald triangle, california (coastal redwoods)

Post by speechles »

Elfriede wrote:Does anybody have an idea what I'm doing wrong? :)

edit:
I'm using: v1.9.6 - July 27th, 2oo7

Code: Select all

    # what to use to seperate results, set this to "\n" and it will output each result
    # on a line of its own. the seperator will be removed from the end of the last result.
    variable seperator " | "
Change variable seperator to "\n" and issue a .rehash on your bot's partyline. Afterwards it should look similar to how it is below.
<speechles> !g mirc
<sp33chy> 12,300,000 Results
<sp33chy> mIRC - An Internet Relay Chat program @ http://www.mirc.com/
<sp33chy> Download mIRC or the mIRC FAQ. @ http://www.mirc.com/get.html
<sp33chy> #mIRC-DALnet Resource Center @ http://www.mirc.org/
<sp33chy> - mIRC Scripting Network - mIRC Scri @ http://www.mirc.net/
sidenote: Still working on that big ol wikipedia list (the country:encodings), and that will soon be finished, it's just tedious doing it all by hand. I should have something to show by this coming weekend and it should make Serbians happy. ;)
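A rough sketch of what the "\n" setting implies, assuming the script joins the results with the separator and then sends each resulting line as its own message. The proc name `output_lines` is made up and is not taken from incith-google.

```tcl
# Hypothetical helper: join results with the separator, then split on
# newlines so each piece can be sent as a separate line.
proc output_lines {results separator} {
    return [split [join $results $separator] "\n"]
}

set results [list "12,300,000 Results" \
    "mIRC - An Internet Relay Chat program @ http://www.mirc.com/"]

# With "\n" every result ends up on a line of its own.
foreach line [output_lines $results "\n"] {
    puts $line   ;# in a bot this would be the channel output call instead
}

# With " | " everything stays on a single line.
puts [llength [output_lines $results " | "]]   ;# 1
```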
Elfriede
Halfop
Posts: 67
Joined: Tue Aug 07, 2007 4:21 am

Post by Elfriede »

Code: Select all

# what to use to seperate results, set this to "\n" and it will output each result
    #  on a line of its own. the seperator will be removed from the end of the last result.
    variable seperator "\n"
That's what my changes look like, but here are the results:
|22:04:02| <~User> !g mirc
|22:04:02| <&Fantc> 12,400,000 Results | mIRC - An Internet Relay Chat program @ http://www.mirc.com/ | Download mIRC or the mIRC FAQ. @ http://www.mirc.com/get.html | #mIRC-DALnet Resource Center @ http://www.mirc.org/ | - mIRC Scripting Network - mIRC Scri @ http://www.mirc.net/
and yes, I've rehashed :)

Is there anything else I need to change or install? Are there country differences?

edit:

There must be something specifically wrong on my machine.. a friend of mine has the script too, and it works as it should.

The question is, what causes this "error"? ^^