egghelp.org community
Discussion of eggdrop bots, shell accounts and tcl scripts.
 

UNOFFICIAL incith-google 2.1x (Nov30,2o12)
awyeah
Revered One


Joined: 26 Apr 2004
Posts: 1580
Location: Switzerland

Posted: Tue Jul 31, 2007 4:19 am

You would have to use the [encoding] command with its convertfrom subcommand, passing your respective character encoding.
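A minimal sketch of that suggestion, assuming the page body reaches the script as raw UTF-8 bytes. The byte values below are the UTF-8 form of the word "Србија", written out by hand for illustration:

```tcl
# Raw UTF-8 bytes for the six-letter Cyrillic word, as they might arrive off the wire.
set raw [binary format H* d0a1d180d0b1d0b8d198d0b0]

# Decode the bytes into a proper Tcl (Unicode) string.
set text [encoding convertfrom utf-8 $raw]

puts "bytes: [string length $raw]"
puts "chars: [string length $text]"
```

With this input the byte count is 12 and the character count is 6, which is a quick way to tell whether a string has been decoded yet.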
_________________
·­awyeah·

==================================
Facebook: jawad@idsia.ch (Jay Dee)
PS: Guys, I don't accept script helps or requests personally anymore.
==================================
djevrek
Voice


Joined: 31 Jul 2007
Posts: 11

Posted: Tue Jul 31, 2007 5:49 am

I don't know how to use that. Maybe the best way to resolve this is to ask the author to update the script to work with UTF-8.
speechles
Revered One


Joined: 26 Aug 2006
Posts: 1398
Location: emerald triangle, california (coastal redwoods)

Posted: Tue Jul 31, 2007 6:21 pm

The problem is my shallow grasp of Tcl syntax. I mean, don't get me wrong, I've been programming for 25+ years (yes, I'm old; I started on the Atari 400 w/ membrane keyboard when it debuted and I was 8 years old :P), just not in Tcl, mind you. So encodings are new/strange to me, but programming/scripting is not. If someone could enlighten me on how to accomplish what you want, then by all means I could probably implement it.

I suspect it would involve some sort of list using "country:encoding" and then using this list to determine how the display should look. I'm just not sure how to implement it; whenever I try, I get this:
Quote:
<sp33chy> Currently: can't read " 5?C1;8:0 !@18X0 X5 :>=B8=5=B0;=0 4@6020 :>X0 A5 =0;078 C XC3>8AB>G=>X 2@>?8 (=0 0;:0=A:>;C>AB@2C) 8 C A@54Z>X 2@>?8 (0=>=A:>X =878X8). # A0AB02C 5?C1;8:5 !@18X5 AC 8 425 0CB>=>:@0X8=5 >X2>48=0 8 >A>2> 8 5B>E8X0. 5?C1;8:0 !@18X0 X5 453 =0@>40 8 A28E 4@C38E 3@0R0=0 :>X8 C Z>X 6825, 70A=>20=0 =0 45X ?@82@548, ?>HB>20ZC YC4A:8E ?@020 8 2;04028=8 ?@020. !@18X0 A": no such variable

I've had similar problems with the translate function, and tried to clumsily add convertto to it to solve some language encoding problems, but it's far from foolproof.. heh

So for the short answer, I just need someone more advanced in the Tcl department to throw me a bone of sorts, as this old dog needs a new trick. :)
speechles
Revered One


Joined: 26 Aug 2006
Posts: 1398
Location: emerald triangle, california (coastal redwoods)

Posted: Wed Aug 01, 2007 2:46 am    Post subject: Re: Cyrillic wikipedias

djevrek wrote:
Code:
<djevrek> !w .sr Srbija
<Grgo> !@18X0 |  5?C1;8:0 !@18X0 X5 :>=B8=5=B0;=0 4@6020 :>X0 A5 =0;078 C XC3>8AB>G=>X 2@>?8 (=0 0;:0=A:>;C>AB@2C) 8 C A@54Z>X 2@>?8 (0=>=A:>X =878X8). # A0AB02C 5?C1;8:5 !@18X5 AC 8 425 0CB>=>:@0X8=5 >X2>48=0 8 >A>2> 8 5B>E8X0. 5?C1;8:0 !@18X0 X5 453 =0@>40 8 A28E 4@C38E 3@0R0=0 :>X8 C Z>X 6825, 70A=>20=0 =0 45X ?@82@548, ?>HB>20ZC YC4A:8E ?@020 8 2;04028=8 ?@020. !@18X0 A @ http://sr.wikipedia.org/wiki/Srbija

Quote:
<speechles> !w .sr Srbija
<sp33chy> Србија | Република Србија је континентална држава која се налази у југоисточној Европи (на Балканском полуострву) и у средњој Европи (Панонској низији). У саставу Републике Србије су и две аутономне покрајине Војводина и Косово и Метохија. Република Србија је демократска држава српског народа и свих других грађана који у њој живе, заснована на демократским начелима, тржишн @ http://sr.wikipedia.org/wiki/Srbija

<speechles> !w .sr Srbija#toc
<sp33chy> Србија | ToC: Географија; Историја; Територијална организација; Градови; Демографија; Народи и националне мањине; Језик; Вероисповест; Државни симболи; Политика; Правосуђе; Права грађана; Економија; Туризам; Саобраћај; Култура; Ликовне уметности; Средњи век; Модерно доба; њижевност; Музика; Класична музика; Позориште и филм; Светска културна баштина УНЕСКО-а у Србији; Фестивали; Развој науке и високог школства;
<sp33chy> Образовање; Празници; Види још; Галерија слика; Референце; Спољашње везе; Влада; Остало @ http://sr.wikipedia.org/wiki/Srbija#toc

<speechles> !w .sr Srbija#Географија
<sp33chy> Србија | Географија Србија се налази на Балкану - региону југоисточне Европе (око 80% територије) и у Панонској низији - региону средње Европе (око 20% територије). Но, географски, а и климатски, једним делом се убраја и у медитеранске земље. Укупна дужина граница са околним земљама износи 2.027 km. Дужина граница по државама суседима износи: Албанија 115 km, Босна и Херце @
<sp33chy> http://sr.wikipedia.org/wiki/Srbija#.D0.93.D0.B5.D0.BE.D0.B3.D1.80.D0.B0.D1.84.D0.B8.D1.98.D0.B0 [1 Redirect(s)]
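Comparing the two outputs, the garbled line matches what you get when each Cyrillic character is cut down to the low byte of its Unicode code point (for example U+0421 becomes 0x21, which is "!"). The sketch below only checks that observation. `low_bytes` is a hypothetical helper, and it says nothing about where in the bot or the client the truncation actually happens:

```tcl
# Keep only the low 8 bits of each character's code point.
proc low_bytes {text} {
    set out ""
    foreach ch [split $text ""] {
        scan $ch %c code
        append out [format %c [expr {$code & 0xFF}]]
    }
    return $out
}

# The six letters of the article title, written as \u escapes.
puts [low_bytes "\u0421\u0440\u0431\u0438\u0458\u0430"]   ;# prints !@18X0
```

The result is the same "!@18X0" that appears at the start of the garbled bot line.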

I think I've found an easy way to remedy this, if that looks correct to you. cp1251 = the Windows Cyrillic code page, which covers Serbian Cyrillic.
Code:
set html [encoding convertto "cp1251" $html]

If I hardcode that just after loading the third and final wikipage, the 'destination' page (it's arrived at after traversing), it will work like above. I don't use recursion at all, I use prediction. At most 3 pages load in sequence with each !wiki command; at the least, 2 will. So if this is indeed correct and looks Serbian, I can start making a list of "country:encodings" and get this started, hopefully...
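The "country:encoding" list described above could be sketched roughly as follows. The table name `wiki_encodings`, the proc `wiki_encode`, and the entries themselves are illustrative assumptions, not taken from the script. Unknown languages, and encodings missing from the local Tcl install, fall through unchanged:

```tcl
# Hypothetical table: Wikipedia language code -> Tcl encoding name.
array set ::wiki_encodings {
    sr cp1251
    ru cp1251
    el cp1253
}

# Convert text for a given wiki language, or return it untouched
# when there is no entry or the encoding is not installed.
proc wiki_encode {lang text} {
    if {[info exists ::wiki_encodings($lang)]} {
        set enc $::wiki_encodings($lang)
        if {[lsearch -exact [encoding names] $enc] != -1} {
            return [encoding convertto $enc $text]
        }
    }
    return $text
}
```

An array is used rather than a giant switch so that adding a language is a one-line change to the table.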
djevrek
Voice


Joined: 31 Jul 2007
Posts: 11

Posted: Wed Aug 01, 2007 4:58 am

No, this doesn't look good. This is not proper Serbian. Try comparing it with the online Wikipedia page at the links above. The proper encoding would be UTF-8. Can you try it and post the result here, please? If it's OK with UTF-8, can you tell me what you changed so I can change it too, or tell me when we can expect a new version of the script?
speechles
Revered One


Joined: 26 Aug 2006
Posts: 1398
Location: emerald triangle, california (coastal redwoods)

Posted: Wed Aug 01, 2007 5:41 am

djevrek wrote:
No, this doesn't look good. This is not proper Serbian. Try comparing it with the online Wikipedia page at the links above. The proper encoding would be UTF-8. Can you try it and post the result here, please? If it's OK with UTF-8, can you tell me what you changed so I can change it too, or tell me when we can expect a new version of the script?

Put what here? I'm telling you, if I choose UTF-8 it appears exactly the same as using standard eggdrop unicode, no difference at all. Don't know why either, it just does. So the trick I used above with convertto "cp1251" is working, it just doesn't look right on my American English mIRC 6.12 client (which is what I pasted). But would've looked right to any Serbian in channel that saw it, you see. So give me some time to make a list of "wikipedia country:country encoding". It will be a big list. Then I'll either use a giant case statement or a list with a foreach, haven't decided yet. But it won't be soon (unless soon means a week; yes, it may take that long), as this list takes time to build. Rome wasn't built in a day, and this script is large in scope, and complicated, and best of all.. FREE.

Quote:
Currently: unknown encoding "iso-8859-5"
Currently: while executing
Currently: "encoding convertto "iso-8859-5" $html"
Btw, this is why I chose "cp1251" to represent Serbian even though it's not 100% correct; it's the best possible encoding for eggdrop.
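The "unknown encoding" error above may simply come down to spelling: Tcl's built-in names for the ISO 8859 family are written `iso8859-5`, not `iso-8859-5`, and `encoding names` lists what a given install actually has. A hedged sketch of normalizing a web-style charset name first (`tcl_encoding_name` is a hypothetical helper, and only two common renamings are handled):

```tcl
# Map a web-style charset name to a Tcl encoding name.
# Returns "" when the local Tcl install does not know the encoding.
proc tcl_encoding_name {charset} {
    set name [string tolower $charset]
    # Tcl spells the ISO 8859 family "iso8859-N", without the first hyphen.
    regsub {^iso-8859-} $name {iso8859-} name
    # Windows code pages are called "cpNNNN" in Tcl.
    regsub {^windows-} $name {cp} name
    if {[lsearch -exact [encoding names] $name] == -1} {
        return ""
    }
    return $name
}

puts [tcl_encoding_name "ISO-8859-5"]
puts [tcl_encoding_name "UTF-8"]
```

An empty result is the signal to fall back to some default rather than call `encoding convertto` and hit the error shown above.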
djevrek
Voice


Joined: 31 Jul 2007
Posts: 11

Posted: Wed Aug 01, 2007 7:16 am

I think the problem is with your mIRC program. Can you please try XChat or some other client that supports UTF-8 completely? Or ... try updating mIRC to a newer version (6.21, I think). That one supports UTF-8 for sure.
speechles
Revered One


Joined: 26 Aug 2006
Posts: 1398
Location: emerald triangle, california (coastal redwoods)

Posted: Wed Aug 01, 2007 7:29 am

*yawn*
Code:
set html [encoding convertto "utf-8" $html]

Add that in around line 2668, then save and tell me if that works. If so, there ya go, enjoy. If not, told ya so.
Code:
        regsub -all " " $html " " html
        regsub -all ";;>" $html "" html
      }

      set html [encoding convertto "utf-8" $html]

      set match ""

That area should look like this if you did it right.
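One possible reason this change has no visible benefit, offered as an assumption rather than a diagnosis of the script: if `$html` still holds raw UTF-8 bytes at that point, `encoding convertto` encodes those bytes a second time, whereas `encoding convertfrom` is what turns them into characters. A small sketch of the difference:

```tcl
# Two raw bytes: the UTF-8 form of the single character U+0421.
set raw [binary format H* d0a1]

# convertto treats each byte as a character and encodes it again.
set twice [encoding convertto utf-8 $raw]

# convertfrom decodes the two bytes into one character.
set chars [encoding convertfrom utf-8 $raw]

puts "after convertto:   [string length $twice]"
puts "after convertfrom: [string length $chars]"
```

The first length is 4 and the second is 1, so which call is needed depends on whether the data has already been decoded.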
djevrek
Voice


Joined: 31 Jul 2007
Posts: 11

Posted: Wed Aug 01, 2007 8:12 am

I did that, but nothing much happens. I just get the same old weird characters, not what I want to see. OK, for now I will wait for you to find some other way to fix this.

P.S. I really don't know anything about TCL, but maybe something from here (http://www.google.com/codesearch?hl=en&lr=&q="set+html"+encoding&btnG=Search) can help you with this problem.
speechles
Revered One


Joined: 26 Aug 2006
Posts: 1398
Location: emerald triangle, california (coastal redwoods)

Posted: Wed Aug 01, 2007 3:24 pm

djevrek wrote:
I did that, but nothing much happens. I just get the same old weird characters, not what I want to see. OK, for now I will wait for you to find some other way to fix this.

P.S. I really don't know anything about TCL, but maybe something from here (http://www.google.com/codesearch?hl=en&lr=&q="set+html"+encoding&btnG=Search) can help you with this problem.

Okay, let's go over this, the #2 result specifically.
Quote:
share/dotlrn0/packages/acs-tcl/tcl/html-email-procs.tcl - 13 identical

# convert text to charset
set encoding [ns_encodingforcharset $charset]
if {[lsearch [encoding names] $encoding] != -1} {
set html_body [encoding convertto $encoding $html_body]
set text_body [encoding convertto $encoding $text_body]
} else {

This has potential, but.. like I said, I need to make a list because.. using the bot with utf-8 and expecting multi-lingual greatness is broken or something. So...?

Edit: Update.. Since there isn't any way to get the page encoding from the page since it tells the bot it is utf-8, and as explained above about utf-8, with the bot it does nothing...Dunno, is eggdrop buggy? Am I missing something? Since I can't answer those questions, forcing encodings is what I need to do. So to do that, I need to read this: http://meta.wikimedia.org/wiki/List_of_Wikipedias

What that means is 253 language editions presently need an entry on the big ol' "wiki country:encoding" list.. Wow! That's going to be tediously tedious.. Does anyone have such a list already? Or conversely, know an easier way to go about this? Is the big ol' list the only way to do it?
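If the charset a page declares were trusted, it could be read from the markup instead of from a hand-built list. A rough sketch, assuming the declaration appears as `charset=...` in a meta tag. `page_charset` is a hypothetical helper, and as noted above the declared value (utf-8) may not be what the bot ultimately needs to send:

```tcl
# Pull the declared charset out of an HTML page, lowercased.
# Returns "" when no charset declaration is found.
proc page_charset {html} {
    if {[regexp -nocase {charset\s*=\s*["']?([A-Za-z0-9_-]+)} $html -> cs]} {
        return [string tolower $cs]
    }
    return ""
}

set sample {<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">}
puts [page_charset $sample]   ;# prints utf-8
```

The returned name would still need checking against `encoding names` before being passed to `encoding convertto` or `encoding convertfrom`.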
c0nv1ct
Voice


Joined: 17 May 2007
Posts: 5

Posted: Sat Aug 04, 2007 9:19 pm

Thanks for the wikimedia addition! #sabayon on freenode appreciates your work :D

The only problem I've noticed is the weather parsing for some cities. Here's an example:

Code:

<c0nv1ct> !google weather eindhoven
<risponditore> Weather for Eindhoven, Netherlands: 63°F, Wind: SE at 4 mph, Humidity: 82%, <div style="padding:5px;float:left" align=center>Sun <img style="border:1px solid #bbc;margin-bottom:2px" src="/images/weather/mostly_sunny.gif" alt="Mostly Sunny" title="Mostly Sunny" width=40 height=40 border=0> <nobr>84°F | 66°F</nobr>


Yet Amsterdam is fine:

Code:

<c0nv1ct> !google weather amsterdam
<risponditore> Weather for Amsterdam, Netherlands: 63°F, Clear, Wind: SE at 9 mph, Humidity: 88%
speechles
Revered One


Joined: 26 Aug 2006
Posts: 1398
Location: emerald triangle, california (coastal redwoods)

Posted: Sat Aug 04, 2007 10:10 pm

c0nv1ct wrote:
Thanks for the wikimedia addition! #sabayon on freenode appreciates your work :D

The only problem I've noticed is the weather parsing for some cities.

That one is pretty easy to fix, but requires a kludge rather than a real fix, as the method to detect weather results is a bit clumsy.
Code:
        # weather!
        } elseif {[string match "*/images/weather/*" $html] == 1} {
          regexp -- {<p.*?class=e>.*?<td><div.*?>(.+?)</div>.*?<td><div.*?>(.+?)<.*?>(.+?)<.*?>(.+?)<.*?>(.+?)</div>.*?</table>} $html - w1 w2 w3 w4 w5
          regsub -- {<p.*?class=e>(.*?)</table} $html {} html
          if {[string match "*<*" $w5]} {
            set w5 ""
          } else {
            set w5 ", ${w5}"
          }
          set desc "$w1\: $w2, $w3, $w4$w5"
          regsub -all -- {&deg;} $desc {°} desc
          set link ""
          regsub -all -- {weather} $input {} input
##NoWrap################################################################################################################################################
Replace the entire weather section with this to fix it. Disregard the #NoWrap# it's just to defeat word wrap.

The problem was that the clumsy weather parser expected to "always" get 5 results: Name, Temp, Condition, Wind, Humidity. All worked fine until it got to a city that only had 4 (some weather stations only report 4), causing HTML spill-over. All I've done is account for this with the kludge. If the 5th result contains <, there must be HTML tags in it, so it disregards the 5th result now. Simple & Effective.
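The same guard can be tried in isolation. The proc below is a hypothetical stand-in for the section above, with the five captured fields passed in directly and sample values taken from the Eindhoven and Amsterdam outputs earlier in the thread:

```tcl
# Build the weather line from five captured fields.
proc weather_desc {w1 w2 w3 w4 w5} {
    # A "<" in the fifth field means the regexp ran past the data into markup.
    if {[string match "*<*" $w5]} {
        set w5 ""
    } else {
        set w5 ", ${w5}"
    }
    return "${w1}: $w2, $w3, $w4$w5"
}

# Four-field station: the fifth capture is spilled HTML and is dropped.
puts [weather_desc "Weather for Eindhoven, Netherlands" "63\u00b0F" \
    "Wind: SE at 4 mph" "Humidity: 82%" "<div style=\"padding:5px\">Sun"]

# Five-field station: everything is kept.
puts [weather_desc "Weather for Amsterdam, Netherlands" "63\u00b0F" \
    "Clear" "Wind: SE at 9 mph" "Humidity: 88%"]
```

Checking the last field for markup is a heuristic; it would also discard a legitimate fifth value that happened to contain a "<".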

This fix will be included shortly, as soon as I finish the wikipedia encodings and get that squared away. I'll finally be able to show you a fully working multi-language wikipedia script, that yes, natively supports Serbian among others. :P
Elfriede
Halfop


Joined: 07 Aug 2007
Posts: 67

Posted: Tue Aug 07, 2007 5:21 am

Script works perfectly for me! Thx .. except one function:

seperator == '\n'

I've changed this in the code.. I've tried " \n ", "\n" and "".

No matter what I do.. when I run e.g. !google mirc, I still get the results with the |

Does anybody have an idea what I'm doing wrong? :)

edit:
I'm using: v1.9.6 - July 27th, 2oo7
speechles
Revered One


Joined: 26 Aug 2006
Posts: 1398
Location: emerald triangle, california (coastal redwoods)

Posted: Tue Aug 07, 2007 3:52 pm

Elfriede wrote:
Does anybody have an idea what I'm doing wrong? :)

edit:
I'm using: v1.9.6 - July 27th, 2oo7

Code:
    # what to use to seperate results, set this to "\n" and it will output each result
    # on a line of its own. the seperator will be removed from the end of the last result.
    variable seperator " | "

Change variable seperator to "\n" and issue a .rehash on your bot's partyline. Afterwards it should look similar to how it is below.
Quote:
<speechles> !g mirc
<sp33chy> 12,300,000 Results
<sp33chy> mIRC - An Internet Relay Chat program @ http://www.mirc.com/
<sp33chy> Download mIRC or the mIRC FAQ. @ http://www.mirc.com/get.html
<sp33chy> #mIRC-DALnet Resource Center @ http://www.mirc.org/
<sp33chy> - mIRC Scripting Network - mIRC Scri @ http://www.mirc.net/
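For reference, a sketch of how a "\n" separator could translate into one message per result. This is an assumption about the general approach, not the script's actual output code. `format_results` is a hypothetical helper, and on a bot each returned line would go out as its own message:

```tcl
# Join results with the configured separator, then split on newlines
# so that each resulting line can be sent separately.
proc format_results {results separator} {
    set lines {}
    foreach line [split [join $results $separator] "\n"] {
        if {$line ne ""} {
            lappend lines $line
        }
    }
    return $lines
}

set results {"12,300,000 Results" "mIRC - An Internet Relay Chat program" "Download mIRC or the mIRC FAQ."}
puts [llength [format_results $results " | "]]   ;# one combined line
puts [llength [format_results $results "\n"]]    ;# one line per result
```

With " | " everything stays on a single line, and with "\n" the three sample results come back as three lines.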

sidenote: Still working on that big ol' wikipedia list (the country:encodings), and that will soon be finished; it's just tedious doing it all by hand. I should have something to show by this coming weekend and it should make Serbians happy. ;)
Elfriede
Halfop


Joined: 07 Aug 2007
Posts: 67

Posted: Tue Aug 07, 2007 4:05 pm

Code:

# what to use to seperate results, set this to "\n" and it will output each result
    #  on a line of its own. the seperator will be removed from the end of the last result.
    variable seperator "\n"


That's what my changes look like, but the results:

Quote:

|22:04:02| <~User> !g mirc
|22:04:02| <&Fantc> 12,400,000 Results | mIRC - An Internet Relay Chat program @ http://www.mirc.com/ | Download mIRC or the mIRC FAQ. @ http://www.mirc.com/get.html | #mIRC-DALnet Resource Center @ http://www.mirc.org/ | - mIRC Scripting Network - mIRC Scri @ http://www.mirc.net/


and yes, I've rehashed :)

Is there anything else I need to change or install? Are there country differences?

edit:

There must be something specifically wrong on my machine.. a friend of mine has it too .. and it works as it should ..

The question is.. what causes this "error" ^^
All times are GMT - 4 Hours
Page 3 of 58

 