egghelp.org community
Discussion of eggdrop bots, shell accounts and tcl scripts.
 

Help about a link

 
cerberus_gr
Halfop


Joined: 07 Feb 2003
Posts: 97
Location: 127.0.0.1

Posted: Tue Jun 20, 2006 9:50 pm    Post subject: Help about a link

Hello,

I have code that extracts all the links from a web page. A link can be in any of these formats:

1) http://www.domain/folder/file.htm
2) www.domain/folder/file.htm
3) http://domain/folder/file.htm
4) /folder/file.htm
5) file.htm (relative)

I want to create a procedure that takes two parameters, the link that was found and the URL of the HTML page it was parsed from, and returns the link in one of these formats:

1) http://domain/folder/file.htm or
2) http://www.domain/folder/file.htm


Example:
Code:

proc format_url { link parent } {
   ...
}



Thanks
SaPrOuZy
Halfop


Joined: 24 Mar 2004
Posts: 75
Location: Lebanon

Posted: Wed Jun 21, 2006 8:55 am

try to be clearer...
cerberus_gr
Halfop


Joined: 07 Feb 2003
Posts: 97
Location: 127.0.0.1

Posted: Wed Jun 21, 2006 10:30 am

Let's try again :)

I have a web page in HTML format with 100 links inside. The links don't all have the same format. For the file file.htm, the possible formats are:

1) <a href="http://www.domain/folder/file.htm">
2) <a href="www.domain/folder/file.htm">
3) <a href="http://domain/folder/file.htm">
4) <a href="/folder/file.htm">
5) <a href="file.htm"> (relative)


I have written code that extracts all the links from the web page and adds them to a list. So I have a list like the following:

Code:

(bin) 49 % echo $links
{http://www.domain/folder/file.htm www.domain/folder/file.htm http://domain/folder/file.htm /folder/file.htm file.htm}
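
For readers who want to reproduce a list like this, here is a minimal sketch of the extraction step. The proc name extract_links is hypothetical (the poster's own extraction code is not shown in the thread), and the pattern only handles double-quoted href attributes on <a> tags.

```tcl
# Hypothetical helper, not the poster's code: collect the href value of
# every <a ...> tag in an HTML string. Only double-quoted hrefs are matched.
proc extract_links {html} {
    set links {}
    # -all -inline returns, for each hit, the full match followed by the
    # captured group, flattened into one list.
    foreach {match href} [regexp -all -inline -nocase -- \
            {<a\s[^>]*href\s*=\s*"([^"]*)"} $html] {
        lappend links $href
    }
    return $links
}

set html {<a href="http://domain/a.htm">x</a> <A HREF="/b.htm">y</A>}
puts [extract_links $html]
```

A regular expression is enough for simple pages like the one described here; badly formed HTML may need a real parser.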



Now I want to create a procedure that takes each of these links and returns it in one of these formats:

http://www.domain/folder/file.htm or
http://domain/folder/file.htm

Example:
Code:

proc format_url { link_found parent_link } {

}



(bin) 50 % set a [format_url "http://www.domain/folder/file.htm" "http://www.domain/lala"]
http://www.domain/folder/file.htm

(bin) 51 % set a [format_url "www.domain/folder/file.htm" "http://www.domain/lala"]
http://www.domain/folder/file.htm

(bin) 52 % set a [format_url "http://domain/folder/file.htm" "http://www.domain/lala"]
http://www.domain/folder/file.htm

(bin) 53 % set a [format_url "/folder/file.htm" "http://www.domain/lala"]
http://www.domain/folder/file.htm

(bin) 54 % set a [format_url "file.htm" "http://www.domain/lala"]
http://www.domain/lala/file.htm


I'm not very good with regular expressions, so I need some help with this.


Last edited by cerberus_gr on Thu Jun 22, 2006 7:46 am; edited 1 time in total
user
 


Joined: 18 Mar 2003
Posts: 1452
Location: Norway

Posted: Thu Jun 22, 2006 7:19 am

Your request is weird.

1) "www.domain" != "domain"
2) links not starting with a protocol are relative, so the absolute version of "www.domain" would be "http://base.href/www.domain"
3) your last example doesn't make any sense to me at all
_________________
Have you ever read "The Manual"?
cerberus_gr
Halfop


Joined: 07 Feb 2003
Posts: 97
Location: 127.0.0.1

Posted: Thu Jun 22, 2006 7:56 am

Most of the time www.domain is the same as domain, because www is the default subdomain.

You are correct about point 2; I hadn't thought of it that way.


I'll describe exactly what I want to do:

I want to create a package that extracts data from web pages. I'm going to give it an initial web page, and the script will follow every link and check each page for data. I'll keep a list of all the links the script has found, and I'm going to visit every one.

My problem is that many pages write their links in different formats. A page could contain the same link twice ("http://domain/hello.htm" and "/hello.htm"), and I want my code to be clever enough to understand that these links are the same.

That's why I want to add links to the list in the format "http://(subdomain.)domain/file.htm": so that I can check whether a link already exists in the list and not lose time parsing it again.

So I need a procedure that returns a link in this format (the way a web browser resolves links).
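
The thread ends without an answer, so the following is only a sketch of what such a procedure might look like, not a reviewed solution. It keeps the format_url name and argument order from the examples above. It assumes that a link starting with "www." is meant as an http host (strictly, such a link is relative, as pointed out earlier in the thread), it does not fold "domain" and "www.domain" together, and it ignores query strings and fragments. The seen array and the queue list are hypothetical names used for the duplicate check.

```tcl
# Sketch only: resolve "link" against the page it was found on ("parent"),
# roughly the way a browser would.
proc format_url {link parent} {
    # Already absolute (scheme://...): keep as is.
    if {[regexp -nocase {^[a-z][a-z0-9+.-]*://} $link]} {
        return $link
    }
    # Assumption taken from the thread: a bare "www." link is an http host.
    if {[string match -nocase "www.*" $link]} {
        return "http://$link"
    }
    # Split the parent into "scheme://host" and its path.
    if {![regexp -nocase {^([a-z][a-z0-9+.-]*://[^/]+)(/.*)?$} $parent -> root path]} {
        error "parent must be an absolute URL: $parent"
    }
    if {$path eq ""} { set path "/" }
    if {[string index $link 0] eq "/"} {
        # Host-relative link: replace the whole path.
        set path $link
    } else {
        # Drop the parent's last path segment, then append the link.
        set path "[string range $path 0 [string last "/" $path]]$link"
    }
    # Remove "." and ".." segments (simplified).
    set out {}
    foreach seg [split $path "/"] {
        if {$seg eq "."} { continue }
        if {$seg eq ".."} {
            if {[llength $out] > 1} { set out [lrange $out 0 end-1] }
            continue
        }
        lappend out $seg
    }
    return "$root[join $out /]"
}

# Duplicate check: both spellings resolve to the same URL, so only one
# entry ends up in the queue.
set queue {}
foreach link {http://domain/hello.htm /hello.htm} {
    set url [format_url $link "http://domain/index.htm"]
    if {![info exists seen($url)]} {
        set seen($url) 1
        lappend queue $url
    }
}
puts $queue
```

Like a browser, this treats the last path segment of the parent as a file unless the parent ends in "/". With parent "http://www.domain/lala" the link "file.htm" resolves to "http://www.domain/file.htm"; getting "http://www.domain/lala/file.htm" requires the parent "http://www.domain/lala/".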
All times are GMT - 4 Hours