testebr Halfop
Joined: 01 Dec 2005 Posts: 86
|
Posted: Thu Apr 12, 2007 9:07 pm Post subject: file extension and bugmenot |
|
|
1 - Retrieve file extension information from www.filext.com
2 - Search for a site's logins on www.bugmenot.com |
|
|
 |
rosc2112 Revered One

Joined: 19 Feb 2006 Posts: 1454 Location: Northeast Pennsylvania
|
Posted: Fri Apr 13, 2007 3:39 am Post subject: |
|
|
On the file extension site, how much data do you want returned? First match? All matches? It could be quite lengthy. For example, a search for com returns 7 matches, and the output is quite messy. Tell me what data you want returned.
As for the bmn site, sure, that should be easy enough. Just give me some example site names (real ones) so I can test the HTML and the regexps; the only site I can think of offhand is naplesdailynews.com. I suppose there should also be a limit on the number of results returned for that one too, but I can make that a config option. |
|
|
 |
testebr Halfop
Joined: 01 Dec 2005 Posts: 86
|
Posted: Fri Apr 13, 2007 5:45 pm Post subject: |
|
|
Exact match.
Domains to test:
dreamcam.com.br
uol.com.br
playboy.com
sexyclube.com.br |
|
|
 |
rosc2112 Revered One

Joined: 19 Feb 2006 Posts: 1454 Location: Northeast Pennsylvania
|
Posted: Sat Apr 14, 2007 1:22 am Post subject: |
|
|
They're all exact matches. What I meant was: what data do you want returned? Just the "Program and/or Extension Function" field? If you want ALL matches and ALL fields, it can end up returning hundreds of lines.
Look at http://filext.com/file-extension/com for an example. |
|
|
 |
testebr Halfop
Joined: 01 Dec 2005 Posts: 86
|
Posted: Mon Apr 16, 2007 12:12 am Post subject: |
|
|
"Program and/or Extension Function" field is the best solution.
Or not? |
|
|
 |
rosc2112 Revered One

Joined: 19 Feb 2006 Posts: 1454 Location: Northeast Pennsylvania
|
Posted: Mon Apr 16, 2007 12:37 am Post subject: |
|
|
Sounds good to me. I'll also include the URLs in the output for more details. Give me a few days to work on the two scripts. |
|
|
 |
rosc2112 Revered One

Joined: 19 Feb 2006 Posts: 1454 Location: Northeast Pennsylvania
|
|
|
 |
testebr Halfop
Joined: 01 Dec 2005 Posts: 86
|
Posted: Mon Apr 23, 2007 9:25 pm Post subject: |
|
|
Very nice, works fine here.
When I grow up, I want to be like you. |
|
|
 |
rosc2112 Revered One

Joined: 19 Feb 2006 Posts: 1454 Location: Northeast Pennsylvania
|
Posted: Tue Apr 24, 2007 5:36 pm Post subject: |
|
|
It only takes a few months of RTFM.
I'll let you figure out the bmn script. It's not hard: use the filext script as a general guide to scraping webpages for content. If you need help, ask in the script help forum. About all you need to change is the URL, the regexp, and the particular format of the output.
The URLs bugmenot uses take the form:
# http://www.bugmenot.com/view/hostname
Look through the HTML source of the resulting pages, find what is common to all the output you're looking for, and grab it with a regexp. Have fun reading that manpage. |
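For anyone following along, here is a rough sketch of the three pieces mentioned above (the URL, the regexp, and the output format), written in Python rather than as an eggdrop script. Only the URL form comes from this thread. The HTML layout in `ACCOUNT_RE` and `SAMPLE_HTML`, the `MAX_RESULTS` setting, and all function names are assumptions made up for illustration; check the real page source before relying on any of it.

```python
import re
from urllib.parse import quote

# URL form quoted in the post above.
BASE_URL = "http://www.bugmenot.com/view/"

# Hypothetical config option: cap on the number of results returned.
MAX_RESULTS = 3

# Hypothetical markup. The real page layout has to be read from the
# HTML source and this pattern adjusted to whatever is common there.
ACCOUNT_RE = re.compile(
    r"<dt>Username:</dt>\s*<dd>(?P<user>[^<]*)</dd>\s*"
    r"<dt>Password:</dt>\s*<dd>(?P<password>[^<]*)</dd>",
    re.IGNORECASE,
)


def build_url(hostname):
    """Build the lookup URL for a hostname."""
    return BASE_URL + quote(hostname.strip().lower(), safe="")


def extract_accounts(html, limit=MAX_RESULTS):
    """Return up to `limit` (user, password) pairs found in the page."""
    results = []
    for match in ACCOUNT_RE.finditer(html):
        if len(results) >= limit:
            break
        results.append(
            (match.group("user").strip(), match.group("password").strip())
        )
    return results


def format_output(hostname, accounts):
    """Format one output line per account, ending with the source URL."""
    lines = [f"{hostname}: {user} / {password}" for user, password in accounts]
    lines.append(f"More details: {build_url(hostname)}")
    return lines


# Made-up sample page used instead of a live request.
SAMPLE_HTML = """
<dl><dt>Username:</dt> <dd>alice</dd>
<dt>Password:</dt> <dd>secret1</dd></dl>
<dl><dt>Username:</dt> <dd>bob</dd>
<dt>Password:</dt> <dd>secret2</dd></dl>
"""

if __name__ == "__main__":
    for line in format_output("example.com", extract_accounts(SAMPLE_HTML)):
        print(line)
```

The sketch works on a saved page string and leaves the fetching step out, so the regexp can be tuned offline against real HTML before anything is wired to a live request.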
|
|
 |
|