egghelp.org community
Discussion of eggdrop bots, shell accounts and tcl scripts.
 

Parsing HTML encoded in US-Ascii

 
Yourmove
Voice


Joined: 18 Jul 2006
Posts: 2

Posted: Tue Jul 18, 2006 4:19 pm    Post subject: Parsing HTML encoded in US-Ascii

I've been trying for a very long time to parse a website (http://www.anidb.info) with my eggdrop bot, but all it returns is gibberish. My other scripts that parse websites work fine. I wasn't sure what was happening at first, but then I realized that my eggdrop doesn't have a *.enc file for US-ASCII. I tried to create my own, but it seems I can't change the encoding files directory. So, after searching the forums for an answer, I came here to ask: has anyone successfully parsed a website encoded in US-ASCII, and what process did you use? I read the tutorials on characters and encoding, but they didn't help me solve the problem. The system is running TCL 8.4 and I'm using the http package.

Edit: I'm still new to TCL so please be patient...
De Kus
Revered One


Joined: 15 Dec 2002
Posts: 1361
Location: Germany

Posted: Wed Jul 19, 2006 3:35 am

I somehow doubt it's a charset problem, since the default charset, iso-8859-1, and most others include US-ASCII. I rather believe it's because the server returns a gzipped page. The server sends gzipped content even if you explicitly forbid it in the HTTP request, or even if you use an HTTP version that doesn't support it, which violates the HTTP RFCs (RFC 1945/RFC 2616) in many ways. You will most likely have to pass the content to gunzip so that you can read the uncompressed file.
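A minimal sketch of that gunzip step, written in Python purely for illustration (the bot in this thread runs TCL 8.4, so this is not code from the thread). The helper name `maybe_gunzip` and the sample HTML are hypothetical. It checks for the two-byte gzip magic number (0x1f 0x8b) and decompresses only when it is present:

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def maybe_gunzip(body: bytes) -> bytes:
    """Return the decompressed body if it looks gzipped, else the body unchanged."""
    if body[:2] == GZIP_MAGIC:
        return gzip.decompress(body)
    return body


# Stand-in for the server's reply: a gzipped HTML page (sample data, not real site output).
sample_html = b"<html><body>anime list</body></html>"
compressed = gzip.compress(sample_html)

page = maybe_gunzip(compressed)    # raw bytes of the original page
text = page.decode("iso-8859-1")   # US-ASCII content decodes cleanly as iso-8859-1
```

Read as text without decompressing, `compressed` looks like the gibberish described in the first post. Tcl 8.6 and later ship a built-in `zlib gunzip` command for the same job; Tcl 8.4 does not, so there the data would probably have to go through an external gunzip or an extension.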
PuTTY wrote:
GET /perl-bin/animedb.pl HTTP/1.1
Host: anidb.info
Accept-Encoding: chunked;q=1, *;q=0

HTTP/1.1 200 OK
Date: Wed, 19 Jul 2006 07:30:27 GMT
Server: Apache/1.3.36 (Unix) mod_perl/1.29
Set-Cookie: adbuin=1153294273-nVfC; path=/; expires=Sat, 16-Jul-2016 07:31:13 GMT
Cache-control: no-cache
Pragma: no-cache
Content-Type: text/html
Expires: Wed, 19 Jul 2006 07:31:13 GMT
X-Cache: MISS from anidb.info
Content-Encoding: gzip
Content-Length: 8216

PuTTY wrote:
GET /perl-bin/animedb.pl HTTP/1.0
Host: anidb.info

HTTP/1.1 200 OK
Date: Wed, 19 Jul 2006 07:31:57 GMT
Server: Apache/1.3.36 (Unix) mod_perl/1.29
Set-Cookie: adbuin=1153294324-QXPa; path=/; expires=Sat, 16-Jul-2016 07:32:04 GMT
Cache-control: no-cache
Pragma: no-cache
Content-Type: text/html
Expires: Wed, 19 Jul 2006 07:32:04 GMT
X-Cache: MISS from anidb.info
Connection: close
Content-Encoding: gzip
Content-Length: 8216

As you can see, it even ignores the HTTP/1.0 request and answers with HTTP/1.1, even if it's not supported. I wonder whether you can make Apache do that without hardcoding the header in the Perl scripts, which would be plainly stupid on the scripter's part. Maybe they don't care about people who can't use gzip (even old IE would choke on that, since it supported only deflate).
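The check done by eye in the two transcripts above can also be done in a script before any parsing starts. The following is only a sketch in Python, not code from this thread: `parse_head` is a hypothetical helper, and the header lines are a subset copied from the second response above.

```python
# Subset of the second response head quoted above.
RAW_RESPONSE = (
    "HTTP/1.1 200 OK\r\n"
    "Server: Apache/1.3.36 (Unix) mod_perl/1.29\r\n"
    "Content-Type: text/html\r\n"
    "Connection: close\r\n"
    "Content-Encoding: gzip\r\n"
    "Content-Length: 8216\r\n"
    "\r\n"
)


def parse_head(raw: str):
    """Split a raw response head into (status_line, headers); header names are lowercased."""
    head = raw.split("\r\n\r\n", 1)[0]
    lines = head.split("\r\n")
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return lines[0], headers


status_line, headers = parse_head(RAW_RESPONSE)
version = status_line.split(" ", 1)[0]  # protocol version the server answered with
is_gzipped = headers.get("content-encoding", "").lower() == "gzip"
```

If `is_gzipped` is true, the body has to be decompressed before it is treated as HTML, whatever the request asked for.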

Hint: if you want to show the &...; encoded Japanese characters, you will most likely have to use UTF-8 or Shift-JIS output (and of course find a library that can convert them to a native encoding supported by TCL).
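To illustrate the conversion this hint describes, here is a small Python sketch; it assumes the &...; sequences are numeric character references, and the sample fragment is made up rather than taken from the site. It decodes the references to Unicode text and then encodes that text as UTF-8 or Shift-JIS for output:

```python
import html

# Hypothetical fragment: numeric character references for three katakana characters.
fragment = "&#12450;&#12491;&#12513; list"

decoded = html.unescape(fragment)         # references become real Unicode characters
utf8_bytes = decoded.encode("utf-8")      # bytes for UTF-8 output
sjis_bytes = decoded.encode("shift_jis")  # bytes for Shift-JIS output
```

A Tcl script would need an equivalent step of its own; the Python standard library is used here only because it makes the idea short to show.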
_________________
De Kus
StarZ|De_Kus, De_Kus or DeKus on IRC
Copyright © 2005-2009 by De Kus - published under The MIT License
Love hurts, love strengthens...
Yourmove
Voice


Joined: 18 Jul 2006
Posts: 2

Posted: Wed Jul 19, 2006 8:25 am    Post subject: Ok

Oh, I didn't even notice that part. Thanks for the information. I'll try and see what I can do. I'll report back if I still get problems.

Thanks again.
All times are GMT - 4 Hours