rosc2112 Revered One

Joined: 19 Feb 2006 Posts: 1454 Location: Northeast Pennsylvania
Posted: Tue Oct 10, 2006 11:22 am Post subject: Dictionary.com script (finished/final)
I'd like to announce a test release of a dictionary.com script I'm working on. The script retrieves definitions from dictionary.com's Unabridged Dictionary v1.0.1. Other dictionaries are available from their site and will be added to the script, with the option of searching a particular one, or using the default as a fall-back. It can show suggested spellings if you misspell a word. Results include the pronunciation key, word forms, each definition, word origin, synonyms and antonyms.
Currently, the options available are:
Commandline options (privmsg $botnick or typed in channel):
To look up a word, simply use : .dict <word>
To show only the word origin : .dict dcorigin <word>
To show only synonyms/antonyms: .dict dcsyn <word>
Script Configure Options:
# Channels where we allow public use
set dcomchans "#mychan #chan2 #etc"
# Channels that only respond via privmsg
set dcquietchans "#chan2 #etc"
# Timeout for geturl
set dcomtimeout "30000"
# If you want to limit output, set the line-limit here
#(this will truncate results.) Set to 0 for no-limit.
set dclinelimit 0
# Show Word Origins when available? 1 == yes, 0 == no
set dcorigin 1
# Show Synonyms and Antonyms for words? 1 == yes, 0 == no
set dcsynant 1
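For illustration, here is a rough sketch of how a line limit like dclinelimit could be applied to a list of result lines. The proc name dc_limit_lines and the notice text are assumptions made for this example, not code taken from the script.

```tcl
# Hypothetical helper (not from dictcom.tcl): cap a list of output lines
# at $limit, where 0 means "no limit", as documented for dclinelimit.
proc dc_limit_lines {lines limit} {
    if {$limit <= 0 || [llength $lines] <= $limit} {
        return $lines
    }
    # Keep the first $limit lines and append a short notice.
    set kept [lrange $lines 0 [expr {$limit - 1}]]
    lappend kept "Output limit reached \[$limit lines max\]"
    return $kept
}

set dclinelimit 2
set out [dc_limit_lines {"1. first" "2. second" "3. third"} $dclinelimit]
foreach line $out { puts $line }
```

With the limit set to 0 the list is returned unchanged, so the admin setting alone decides whether anything is truncated.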
Dictionary.com provides the following additional databases, which will be incorporated into this script:
American Heritage Dictionary
Webster's 1913 Dictionary
WordNet v2.0
American Heritage Stedman's Medical Dictionary
Merriam-Webster Medical Dictionary
Free On-line Dictionary of Computing (FOLDOC)
Internet Jargon File
Wall Street Words
Investopedia
Merriam-Webster's Dictionary of Law
I'd appreciate people testing the script and sending me words that produce errors (showing html codes or truncated results, for example). Dictionary.com tends to use a LOT of unicode chars, and short of adding thousands of them to the string map, I've been adding them as I see them.
Keep in mind this is a preview of the script; the other databases are not yet incorporated, although the regexps are in place (I just need to finish them to format the results).
Check the url for updates, there will no doubt be many:
http://members.dandy.net/~fbn/dictcom.tcl.txt
Last edited by rosc2112 on Thu Oct 12, 2006 1:58 am; edited 4 times in total
rosc2112 Revered One

Joined: 19 Feb 2006 Posts: 1454 Location: Northeast Pennsylvania
Posted: Wed Oct 11, 2006 5:35 pm Post subject: v0.01c
History
- Oct. 08 2006 - Initial conception
- Oct. 11 2006 - Added more db's; added the ability for a user to specify a line-limit (still respecting the admin's max line-limit, with an option for the admin to allow or disallow user overrides), etc.
- Removed dcorigin/dcsynant options, added combined commandline options.
Databases: Dictionary.com Unabridged Dictionary; American Heritage Dictionary; Webster's 1913 Dictionary; American Heritage Stedman's Medical Dictionary; Merriam-Webster Medical Dictionary; Investopedia
rosc2112 Revered One

Joined: 19 Feb 2006 Posts: 1454 Location: Northeast Pennsylvania
Posted: Thu Oct 12, 2006 2:12 am Post subject: v0.01d
- Added a 'dbmatch' option to show which databases a word is found in.
- Added the rest of the db's (if dictionary.com provided the VERA, GCIDE and Ambrose Bierce's Devil's dictionaries, this script would be complete compared to db's available from dict.org.)
- Consolidated redundant regexps into a separate proc.
I found a few more db's in addition to the ones I mentioned earlier. Here's a complete list:
Databases:
Dictionary.com Unabridged Dictionary;
Webster's 1913 Unabridged Dictionary;
Webster's New Millennium Dictionary;
WordNet Dictionary;
American Heritage Dictionary;
American Heritage Stedman's Medical Dictionary;
American Heritage Dictionary of Idioms;
Merriam-Webster Medical Dictionary;
Merriam-Webster Law Dictionary;
Investopedia;
Wall Street Words;
Easton's 1897 Bible Dictionary;
Hitchcock's Bible Names Dictionary;
Free On-line Dictionary of Computing;
Jargon File;
US Gazetteer 1990 Census;
CIA 1995 World Factbook;
Atomic Elements Database
rosc2112 Revered One

Joined: 19 Feb 2006 Posts: 1454 Location: Northeast Pennsylvania
Posted: Sat Oct 14, 2006 2:02 pm Post subject: v0.02a
Oct. 12 2006 - Minor changes to dbnames.
Oct. 13 2006 - Fixed a mistake in the variable used for selecting a database.
- Added a string map for 'dbmatch' to show the dbnames as used in the script (rather than the names as known to dictionary.com).
- Added an additional error msg for when a user selects a database and the word is not found.
- Changed the proc dictmsg test to check whether the user is either on the channel or a validuser; if neither, the script quietly returns (unknown users outside of channels cannot use it).
- Made a configuration option for limiting input length.
v00j00 Voice
Joined: 18 Dec 2005 Posts: 4
Posted: Wed Nov 15, 2006 3:58 pm
Thank you!
rosc2112 Revered One

Joined: 19 Feb 2006 Posts: 1454 Location: Northeast Pennsylvania
Posted: Wed Nov 15, 2006 4:29 pm
Welcome
I just made a minor update to the script and added more unicode chars to the string map. Same url as above.
rosc2112 Revered One

Joined: 19 Feb 2006 Posts: 1454 Location: Northeast Pennsylvania
Posted: Sun Dec 03, 2006 6:21 am Post subject: v0.02k
Dictionary.com changed their html a bit; a fix has been uploaded (same url as above). There are also some new db's that I'll be adding in the next few days.
cache Master
Joined: 10 Jan 2006 Posts: 306 Location: Mass
Posted: Mon Dec 18, 2006 10:10 pm
Did the html change again? Just wondering, since I just added this and see odd characters and no spaces when it lists them, like:
1.whatever
2.whatever
rosc2112 Revered One

Joined: 19 Feb 2006 Posts: 1454 Location: Northeast Pennsylvania
Posted: Mon Dec 18, 2006 11:06 pm
Which word did you look up, and with what commandline options? I just checked and don't see any changes in the html being used. And yes, it will normally list definitions one at a time for the default dictionary.
cache Master
Joined: 10 Jan 2006 Posts: 306 Location: Mass
Posted: Mon Dec 18, 2006 11:46 pm
This is how I see it; if this is how it's supposed to run, that's fine. And thanks for all these new scripts you've been making.
| Code: |
It shows this and no spaces by numbers while it lists..
<Bot> Dictionary.com Unabridged: Results for 'chat' - [pronunciation key: chat ]
<Bot> verb (used without object)
<Bot> 1.to converse in a familiar or informal manner.
<Bot> noun
<Bot> 2.informal conversation: We had a pleasant chat.
<Bot> 3.any of several small Old World thrushes, esp. of the genus Saxicola, having a chattering cry.
<Bot> 4.yellow-breasted chat.
<Bot> Verb phrase
<Bot> Output limit reached [6 lines max]
<Bot> Origin: 140050; late ME; short for chatter
<Bot> Synonyms: 1, 2. talk, chitchat, gossip, visit.
<Bot> [End Dictionary.com Unabridged - 'chat']
 |
rosc2112 Revered One

Joined: 19 Feb 2006 Posts: 1454 Location: Northeast Pennsylvania
Posted: Tue Dec 19, 2006 2:05 am
The binary codes are unicode chars (EN DASH and EM DASH), and unfortunately I'm not able to cut/paste them into the string map to translate them. I also tried using the hex codes for them, but that doesn't work on my platform either. Curiously, when I save the page in Mozilla, I get this char:
â
because the first byte of the UTF-8 encoding of 'EN DASH' is 0xE2 (the full byte sequence is 0xE2 0x80 0x93, i.e. e28093).
If anyone has a clue about how to add these pesky unicode chars to the string map, I'd appreciate a hint. (I'm not even able to reproduce the chars using bash's \x codes, and I don't know the octal codes for the chars, so I didn't try that.)
Here's info about the chars:
http://www.fileformat.info/info/unicode/char/2013/index.htm
http://www.fileformat.info/info/unicode/char/2014/index.htm
Needless to say, dictionary.com is turning into a real pain in the rump cos they keep changing these silly little things (they were originally using the html codes for the dashes).
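One approach that may help here, sketched under the assumption that the page text reaches the script either already decoded or as raw UTF-8 bytes: Tcl's \uXXXX escapes can name these characters in plain ASCII, so nothing non-ASCII has to be pasted into the file. The proc name dc_fix_dashes and the sample string are made up for this example.

```tcl
# Sketch only: map EN DASH (U+2013) and EM DASH (U+2014) to ASCII using
# \uXXXX escapes. dc_fix_dashes is a hypothetical name, not a proc from
# dictcom.tcl.
proc dc_fix_dashes {text} {
    # Double quotes (not braces) so that \u2013 and \u2014 are substituted.
    return [string map [list "\u2013" "-" "\u2014" "--"] $text]
}

# If the data arrives as raw UTF-8 bytes (e2 80 93 for EN DASH), decode it
# to Tcl's internal Unicode form first. The bytes are built with
# "binary format" here to keep this example file pure ASCII.
set raw "1400[binary format H* e28093]50"
set decoded [encoding convertfrom utf-8 $raw]
puts [dc_fix_dashes $decoded]   ;# prints 1400-50
```

Whether the decode step is needed depends on how the fetched page is read in, which is not shown in this thread, so treat that part as a guess.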
BeBoo Halfop
Joined: 26 Sep 2007 Posts: 42
Posted: Tue Nov 20, 2007 2:31 pm
I'm getting the following error when loading it:
| Code: |
can't read "dcomdef": no such variable
while executing
"regexp {class="sectionLabel">.+?Synonyms.*?</span>(.*?)</div>} $dcomdef match dcsynon"
invoked from within
"if {[regexp {class="sectionLabel">.+?Synonyms.*?</span>(.*?)</div>} $dcomdef match dcsynon]} {
regsub -all -nocase {<b>.*?</b>} $dcsynon } dcsynon"
Any ideas?
rosc2112 Revered One

Joined: 19 Feb 2006 Posts: 1454 Location: Northeast Pennsylvania
Posted: Sat Apr 05, 2008 8:26 am
I'm assuming there's a version incompatibility between the version of tcl I'm using and the version used by the people getting that "dcomdef" var error - something about the regexp, since obviously $dcomdef IS defined and the other regexps don't throw any errors about it. I'm using tcl 8.4.11 and eggdrop 1.6.18.
Or perhaps it's a unix-vs-windrop incompatibility (if you're using windrop and get that error, I can only suggest using a real unix system, or at the least, eggdrop & tcl compiled under cygwin). I don't do windoze =)
speechles Revered One

Joined: 26 Aug 2006 Posts: 1398 Location: emerald triangle, california (coastal redwoods)
Posted: Sun Apr 06, 2008 1:43 am
rosc2112, it's this part:
| Code: |
regsub -all -nocase {<b>.*?</b>} $dcsynon {[*control-code 002 is here*]&[*control-code 002 is here*]} dcsynon
You've embedded raw, unescaped control-codes to handle the bold rather than the proper escape sequence \002. Synonyms and antonyms suffer from this. Those who aren't saving the link directly, and are instead copying and resaving it in their editor of choice, will most certainly have problems. It's a simple fix.
The problem becomes noticeable in BeBoo's error message:
| Code: |
can't read "dcomdef": no such variable
while executing
"regexp {class="sectionLabel">.+?Synonyms.*?</span>(.*?)</div>} $dcomdef match dcsynon"
invoked from within
"if {[regexp {class="sectionLabel">.+?Synonyms.*?</span>(.*?)</div>} $dcomdef match dcsynon]} {
regsub -all -nocase {<b>.*?</b>} $dcsynon } dcsynon"
Notice particularly the last line, with the strange brace all alone; this is where the embedded control-codes are causing the conflict.
Edit: It appears this embedding happens in other spots as well, but it's the exact same 'regsub' for bold, just applied to a different variable each time. When these control characters cross platforms they cause unexpected problems. This stems directly from naming the script *.tcl.txt, because the browser will display such a file itself. Named *.tcl, the browser would only offer to save it or open it with another program, not display it. But this would not even be an issue if people stopped embedding raw characters in the first place and generated them properly using escape sequences or similar means. Calling windrop inferior rather than investigating your own code is just silly. Most windrop-related script issues are usually: 1) the script author not following accepted standards, or 2) dependencies on incompatible/missing modules (read this as: the user now has to fully install cygwin to create them).
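A minimal sketch of the escape-sequence form of that regsub, assuming the intent is only to wrap the matched <b>...</b> text in IRC bold (control code 002). The sample value of dcsynon is invented for the example, and how the real script strips the tags afterwards is not shown here.

```tcl
# Invented sample input; in the script this would come from the parsed page.
set dcsynon {<b>1, 2.</b> talk, chitchat, gossip, visit.}

# Double quotes let Tcl turn \002 into the control character at parse time,
# so the file itself stays plain ASCII. In the replacement string, & stands
# for the text matched by the pattern.
regsub -all -nocase {<b>.*?</b>} $dcsynon "\002&\002" dcsynon

# Print the result with the control code made visible as the text \002.
puts [string map [list "\002" {\002}] $dcsynon]
```

Because the control character is produced by the interpreter rather than stored in the file, copying the script through a browser or editor should no longer damage this line.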