testebr Halfop
Joined: 01 Dec 2005 Posts: 86
|
Posted: Tue Feb 20, 2007 6:02 pm Post subject: |
|
|
incith:google-1.8.6 always returns "Sorry, no search results were found." for video search.
Is there a problem?
Thanks
euphoriac Voice
Joined: 06 Jul 2006 Posts: 4
|
Posted: Fri Feb 23, 2007 12:07 am Post subject: |
|
|
Bug spotted with 'sponsored links' in Google search.
E.g. ".g planet earth" gives:
Code:
BBC - Science & Nature - Planet Earth @ http://www.bbc.co.uk/nature/animals/planetearth/ | Earth @ http://seds.lpl.arizona.edu/nineplanets/nineplanets/earth.html | </p><p class=g><a href="/url?q=http://ww @ /search?q=planet+earth+clothing&revid=2076090134&sa=X&oi=revisions_inline&ct=revision&cd=1>planet
speechles Revered One

Joined: 26 Aug 2006 Posts: 1398 Location: emerald triangle, california (coastal redwoods)
|
Posted: Sun Feb 25, 2007 2:36 am Post subject: |
|
|
Here's something I promised long ago, but it's not quite as promised, because I did what I preferred.
I was going to make the !review parser use GameSpot's complicated form-submission engine; instead I found it much easier to use a single query and field matching. Form submission would have been nicer, but it's more susceptible to breaking.
Anyway, to make it short and sweet, here's what you get:
Code:
# !google [define:|spell:] <search terms> <1+1> <1 cm in ft> #
# <patent ##> <weather city|zip> <??? airport> #
# !images <search terms> #
# !groups <search terms> ------------------ currently broken #
# !news <search terms> #
# !local <what> near <where> #
# !localuk <what> near <where> #
# !book <search terms> -------------------- currently broken #
# !video <search terms> #
# !fight <word(s) one> vs <word(s) two> #
# !youtube <search terms> #
# !atomfilms <search terms> --------------- currently broken #
# !ifilms <search terms> ------------------ currently broken #
# !gamespot <search terms> #
# !gamefaqs <system> in <region> #
# !blog <search terms> #
# !ebay <search terms> #
# !ebayfight <word(s) one> vs <word(s) two> #
# !wikipedia <search terms> #
# !locate <ip or hostmask> #
# !review <gamename> [@ <system>] #
# !torrent <search terms> #
# !top <system> #
# !popular <system> #
If you like it, praise incith, because without his script I wouldn't have found the desire to plug sites into it. Big thanks to madwoota as well for keeping things running. What I've done is only to enhance the script for gaming channels. It's messy, dirty, hack-ridden and bloated for sure, but for the most part it works. If you find a use for it, please thank incith and madwoota rather than me.
http://ereader.kiczek.com/UNOFFICIAL-incith-google-v1.94.tcl
The version numbering is chosen to avoid conflicting with any of the official Tcl scripts.
testebr and euphoriac, see below:
Code:
<speechles> !g planet earth
<sp33chy> 68,800,000 results | Earth @ http://seds.lpl.arizona.edu/nineplanets/nineplanets/earth.html | BBC - Science & Nature - Planet Earth @ http://www.bbc.co.uk/nature/animals/planetearth/ | BBC - Science & Nature - Planet Earth @ http://www.bbc.co.uk/sn/tvradio/programmes/planetearth/ | Earth - Wikipedia, the free encycloped @ http://en.wikipedia.org/wiki/Earth
<speechles> !v funny
<sp33chy> 901,643 videos | Funny Animals (... Clips of an @ http://video.google.com/videoplay?docid=-6768191643962653988 | Funny Commercials II (So many fo @ http://video.google.com/videoplay?docid=-4686887310667479716 | Asian Backstreet Boys Funny Vid @ http://video.google.com/videoplay?docid=-5721216010568488162 | NTU Student survey - Funny comment @ http://video.google.com/videoplay?docid=4677717832230761610
incith Master

Joined: 23 Apr 2005 Posts: 275 Location: Canada
|
Posted: Sun Feb 25, 2007 4:36 am Post subject: |
|
|
Damn man, that's crazy!
I still stand by my firm beliefs that 2.0 is due out soon! =P
I'm glad this script has kept so much attention in the eggdrop community, and I'd also like to take this space to thank Google for not harassing me one bit so far for this script scraping their site (of course I'm sure they don't even notice the hits). _________________ ; Answer a few unanswered posts! |
darkwolf Voice
Joined: 26 Feb 2007 Posts: 9
|
Posted: Mon Feb 26, 2007 1:20 pm Post subject: |
|
|
Would there be any way to get better output for the gamefaqs results, please?
The best I could get is using "\n" as the separator so each entry starts on a new line, but it would be nice to have the date followed by all the games for that date, and so on.
Something like this...
With the \n:
XBOX360 North America (USA)
02/26 Major League Baseball 2K7
02/27 Bullet Witch
Dance Dance Revolution Universe
Samurai Warriors 2 Empires
03/01 Alone in the Dark
Battlefield: Bad Company
03/06 Def Jam: Icon
Tom Clancy's Ghost Recon Advanced Warfighter 2
03/13 Call of Duty 3 (Gold Edition)
with date and game for each date(The way it should be):
XBOX360 North America (USA)
02/26 Major League Baseball 2K7
02/27 Bullet Witch - Dance Dance Revolution Universe - Samurai Warriors 2 Empires
03/01 Alone in the Dark - Battlefield: Bad Company
03/06 Def Jam: Icon - Tom Clancy's Ghost Recon Advanced Warfighter 2
03/13 Call of Duty 3 (Gold Edition)
I'm not good enough at Tcl to end up with a format like this, but it would be really nice if someone could do that.
Thanks
speechles Revered One

Joined: 26 Aug 2006 Posts: 1398 Location: emerald triangle, california (coastal redwoods)
|
Posted: Mon Feb 26, 2007 6:39 pm Post subject: |
|
|
Of course there is a way to do that. I thought of doing it at first, but it looked ugly to me, so I left the way to add it back intact, merely #commented out.
Code:
} elseif {[string len $game] > 1} {
    append output "${incith::google::seperator}${game}"
    #append output ",${game}"
}
This is the part that handles games which do not have a date preceding them. Notice that at present it just adds a separator (whatever you picked); below it, #commented out, is the part which would merely add a comma and the game.
This is all you have to change:
Code:
} elseif {[string len $game] > 1} {
    #append output "${incith::google::seperator}${game}"
    append output "- ${game}"
}
Now, with the \n newline set as the separator and the hyphen as the separator for games without dates, you have what you want.
Quote: with date and game for each date(The way it should be):
The way it should be depends on taste: you have yours, and I have mine. At least we agree on something.
Last edited by speechles on Mon Feb 26, 2007 11:02 pm; edited 2 times in total
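To see the grouping idea in isolation, here is a minimal stand-alone sketch. It is not the script's actual code: the proc name `format_releases` and the flat {date game} input list are assumptions made for illustration, and it uses " - " (with a leading space) as the joiner for undated games.

```tcl
# Hypothetical helper, not part of incith-google: takes a flat list of
# {date game} pairs, where an empty date means "same date as the entry above".
proc format_releases {entries {separator "\n"}} {
    set output ""
    foreach {date game} $entries {
        if {$date ne ""} {
            # A dated entry starts a new group; no separator before the first one.
            if {$output ne ""} {
                append output $separator
            }
            append output "$date $game"
        } elseif {[string length $game] > 1} {
            # An undated entry is joined onto the current date's line.
            append output " - $game"
        }
    }
    return $output
}

set entries [list \
    "02/26" "Major League Baseball 2K7" \
    "02/27" "Bullet Witch" \
    "" "Dance Dance Revolution Universe" \
    "" "Samurai Warriors 2 Empires" \
    "03/01" "Alone in the Dark" \
    "" "Battlefield: Bad Company"]
puts [format_releases $entries]
```

With the sample data this prints one line per date, with the undated games appended to the line of the date above them.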
darkwolf Voice
Joined: 26 Feb 2007 Posts: 9
|
Posted: Mon Feb 26, 2007 11:01 pm Post subject: |
|
|
Thanks for the quick answer, and sorry about the comment; I meant the way I would like it.
Nice script!
speechles Revered One

Joined: 26 Aug 2006 Posts: 1398 Location: emerald triangle, california (coastal redwoods)
|
Posted: Mon Feb 26, 2007 11:30 pm Post subject: |
|
|
Another small annoyance found, since I prefer the bot to give the top 4 results.. heh
http://www.google.com/search?q=pft&hl=en&safe=off
If you try that URL, you will see Google inserts junk results between the 3rd and 4th results, which throws off the parser.
Edit:
Solved. Before, it was only able to parse past a couple of these, such as the 'planet earth' kind (as complained about above), but I have now got the parser to skip all those 'sponsored' links completely.
http://ereader.kiczek.com/UNOFFICIAL-incith-google-v1.94.tcl
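The post does not say how the parser skips these entries. One possible approach, sketched here purely as an assumption, is to drop any parsed result whose URL is not absolute, since the broken entry reported earlier in the thread had a relative "/search?q=..." URL. The proc name `drop_relative_results` and the flat {title url} list are hypothetical.

```tcl
# Hypothetical filter, not taken from the script: keeps only the
# {title url} pairs whose URL starts with http:// or https://.
proc drop_relative_results {results} {
    set kept [list]
    foreach {title url} $results {
        if {[regexp -nocase -- {^https?://} $url]} {
            lappend kept $title $url
        }
    }
    return $kept
}

set results [list \
    {Earth} {http://seds.lpl.arizona.edu/nineplanets/nineplanets/earth.html} \
    {junk entry} {/search?q=planet+earth+clothing}]
puts [drop_relative_results $results]
```

A filter like this only hides the symptom after parsing; skipping the sponsored block in the HTML before parsing, as the post describes, avoids mangled titles as well.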
incith Master

Joined: 23 Apr 2005 Posts: 275 Location: Canada
|
Posted: Tue Feb 27, 2007 10:59 pm Post subject: |
|
|
madwoota has updated google to 1.8.6a and submitted it to the Tcl archive. As always, it can be downloaded @ http://xrl.us/incithgoogle (the latest version he has available, released or not).
speechles Revered One

Joined: 26 Aug 2006 Posts: 1398 Location: emerald triangle, california (coastal redwoods)
|
Posted: Tue Feb 27, 2007 11:21 pm Post subject: |
|
|
That doesn't really fix it though, since you're allowing Google to inject paid results into your searches.
This is much closer to the actual Google results (yes, I have subresults enabled, I like them) when shown without helpful or paid results. You can see this by simply adding &start=1 to the query string, which makes Google skip the very first result (the one it counts as 0). Google then assumes you have already had the helpful hints and seen the paid results, and are moving through the results, starting the display at the 2nd result.
Start at 1st result --> http://www.google.com/search?hl=en&q=planet+earth&btnG=Google+Search
Start at 2nd result --> http://www.google.com/search?hl=en&q=planet+earth&btnG=Google+Search&start=1
Notice the difference?
With the top URL, Google is overly helpful, even going so far as to suggest related things you might want to click within the search results (e.g. "See results for: planet earth clothing", which is paid advertising), effectively skewing your results. The bottom URL shows how results 2-4 should actually look, which is identical to how my parser displayed them. To each his own; I'm just keeping everyone aware of the differences.
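For anyone who wants to try the &start=1 trick from a script, a minimal sketch follows. The proc name `with_start_offset` is made up for this example and is not part of either version of the script; it only builds the URL and does not fetch anything.

```tcl
# Hypothetical helper: appends a start offset to a search URL.
proc with_start_offset {url {start 1}} {
    # Use "?" if the URL has no query string yet, otherwise "&".
    set joiner [expr {[string first "?" $url] == -1 ? "?" : "&"}]
    return "${url}${joiner}start=${start}"
}

puts [with_start_offset "http://www.google.com/search?hl=en&q=planet+earth&btnG=Google+Search"]
```

For the URL above this produces the same "Start at 2nd result" address quoted in the post.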
rosc2112 Revered One

Joined: 19 Feb 2006 Posts: 1454 Location: Northeast Pennsylvania
|
Posted: Wed Feb 28, 2007 1:33 am Post subject: |
|
|
Just an idea: if you want a cleaner result from Google, without the ads and garbage, take a look at http://www.scroogle.org/scraper.html, which is a Google scraper; it produces cleaner output and no ads. It might help with all the parser conflicts, although Scroogle doesn't have the image search and other features from Google.
incith Master

Joined: 23 Apr 2005 Posts: 275 Location: Canada
|
Posted: Fri Mar 02, 2007 7:43 pm Post subject: |
|
|
I made a small fix to the script today which should solve some "Illegal characters in URL path" errors.
It can be downloaded @ http://incith.com:88/~incith/eggdrop/incith-google.tcl until madwoota adds it into the CVS (http://xrl.us/incithgoogle) or the next publicly released version of google.
diff -Narub:
Code:
--- incith-google.tcl 2007-02-27 01:12:26.000000000 -0700
+++ eggdrop/scripts/incith-google.tcl 2007-03-02 16:39:17.000000000 -0700
@@ -1048,6 +1048,7 @@
 regexp -nocase -- {^(.+?) near (.+?)$} $input - search location
 # for the rest
 regsub -all -- {\+} $input {%2B} input
+regsub -all -- {\"} $input {%22} input
 regsub -all -- { } $input {+} input
 # GOOGLE
@@ -1092,7 +1093,7 @@
 # beware, changing the useragent will result in differently formatted html from Google.
 set ua "Lynx/2.8.5rel.1 libwww-FM/2.14 SSL-MM/1.4.1 OpenSSL/0.9.7e"
 set http [::http::config -useragent $ua]
-set http [::http::geturl $query -timeout [expr 1000 * 10]]
+set http [::http::geturl "$query" -timeout [expr 1000 * 10]]
 set html [::http::data $http]
 # generic pre-parsing
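The three escaping steps from the first hunk can be tried on their own with the sketch below. The wrapper proc `encode_query` is hypothetical (the script applies these regsub calls inline), but the regsub lines are the ones shown in the diff. Order matters: literal plus signs are escaped before spaces are turned into pluses. As a side note, in Tcl `$query` and `"$query"` are passed to a command as the same single word, so the new %22 escaping is the likely source of the fix rather than the added quotes.

```tcl
# Hypothetical wrapper around the escaping steps shown in the diff.
proc encode_query {input} {
    # Escape literal plus signs first, so they are not confused with
    # the pluses that replace spaces in the last step.
    regsub -all -- {\+} $input {%2B} input
    # Escape double quotes (the line added by the fix).
    regsub -all -- {\"} $input {%22} input
    # Finally turn spaces into pluses.
    regsub -all -- { } $input {+} input
    return $input
}

puts [encode_query {"planet earth" c++}]
```

This only covers the three characters handled in the diff; other reserved characters would still pass through unescaped.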