
Dictionary.com script (finished/final)

rosc2112
Revered One
Posts: 1454
Joined: Sun Feb 19, 2006 8:36 pm
Location: Northeast Pennsylvania

Dictionary.com script (finished/final)

Post by rosc2112 »

I'd like to announce a test release of a dictionary.com script I'm working on. The script retrieves definitions from dictionary.com's unabridged dictionary v1.0.1. Other dictionaries are available from their site and will be added to the script, with the option of searching a particular one, or the default as a fall-back. It can show suggested spellings if you misspell a word, and shows results including the pronunciation key, word forms, each definition, word origin, synonyms and antonyms.

Currently, the options available are:

Commandline options (privmsg $botnick or typed in channel):
To look up a word:              .dict <word>
To show only the word origin:   .dict dcorigin <word>
To show only synonyms/antonyms: .dict dcsyn <word>
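For illustration only, here is a minimal Tcl sketch of how the text after `.dict` could be split into a mode and a search word. The proc name `dcomParseArgs` and the mode names are hypothetical, not code taken from dictcom.tcl.

```tcl
# Hypothetical parser for the ".dict" arguments (not the script's own code).
# Returns a two-element list: {mode word}.
proc dcomParseArgs {text} {
    set words [split [string trim $text]]
    set mode all
    switch -exact -- [lindex $words 0] {
        dcorigin {
            set mode origin
            set words [lrange $words 1 end]
        }
        dcsyn {
            set mode synonyms
            set words [lrange $words 1 end]
        }
    }
    # Rejoin whatever is left so multi-word lookups still work.
    return [list $mode [join $words]]
}

# ".dict dcsyn happy" hands the proc the text "dcsyn happy".
puts [dcomParseArgs "dcsyn happy"]  ;# prints: synonyms happy
```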

Script Configure Options:

# Channels where we allow public use
set dcomchans "#mychan #chan2 #etc"

# Channels that only respond via privmsg
set dcquietchans "#chan2 #etc"

# Timeout for geturl, in milliseconds
set dcomtimeout "30000"

# If you want to limit output, set the line-limit here
# (this will truncate results). Set to 0 for no limit.
set dclinelimit 0

# Show Word Origins when available? 1 == yes, 0 == no
set dcorigin 1

# Show Synonyms and Antonyms for words? 1 == yes, 0 == no
set dcsynant 1
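As a hedged sketch of how the channel and line-limit settings might be applied, the Tcl below picks a reply target and truncates output. `dcomReplyTarget` and `dcomTruncate` are hypothetical names; the "Output limit reached" wording is borrowed from bot output quoted later in this thread, and the real script may count or format lines differently.

```tcl
# Example values, as in the configuration above.
set dcomchans "#mychan #chan2 #etc"
set dcquietchans "#chan2 #etc"

# Hypothetical: decide where a public request gets answered.
# Returns "" for channels the script is not enabled on.
proc dcomReplyTarget {nick chan} {
    global dcomchans dcquietchans
    set lc [string tolower $chan]
    if {[lsearch -exact [string tolower $dcomchans] $lc] < 0} {
        return ""
    }
    # Quiet channels answer the requesting nick by privmsg instead.
    if {[lsearch -exact [string tolower $dcquietchans] $lc] >= 0} {
        return $nick
    }
    return $chan
}

# Hypothetical: cut a list of output lines to the limit; 0 means no limit.
proc dcomTruncate {lines limit} {
    if {$limit <= 0 || [llength $lines] <= $limit} {
        return $lines
    }
    set out [lrange $lines 0 [expr {$limit - 1}]]
    lappend out "Output limit reached \[$limit lines max\]"
    return $out
}
```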

Dictionary.com provides the following additional databases, which will be incorporated into this script:
American Heritage Dictionary
Webster's 1913 Dictionary
WordNet v2.0
American Heritage Stedman's Medical Dictionary
Merriam Webster Medical Dictionary
Free On-line Dictionary of Computing (FOLDOC)
Internet Jargon File
Wall Street Words
Investopedia
Merriam-Webster's Dictionary of Law

I'd appreciate people testing the script and sending me words that produce errors (for example, output showing HTML codes or truncated results). Dictionary.com uses a LOT of Unicode characters, and short of adding thousands of them to the string map, I've been adding them as I see them.
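One way to keep such a string map readable, sketched in Tcl under the assumption that it is a plain `string map` call: build the map with `[list ...]` so `\uXXXX` escapes become real characters (inside braces they stay literal text), and brace the entity keys because a bare `;` would end the command. `dcomDecodeEntities` and its entity list are hypothetical examples, not the script's own map.

```tcl
# Hypothetical helper: decode a few HTML entities seen in dictionary pages.
# The map is built with [list ...] so \u2013 and \u2014 become real
# characters; keys are braced because a bare ";" would end the command.
proc dcomDecodeEntities {text} {
    set map [list \
        {&amp;}   &      \
        {&lt;}    <      \
        {&gt;}    >      \
        {&quot;}  \"     \
        {&nbsp;}  " "    \
        {&ndash;} \u2013 \
        {&mdash;} \u2014 \
        {&#8211;} \u2013 \
        {&#8212;} \u2014 \
    ]
    # string map scans the text once, so "&amp;lt;" becomes "&lt;", not "<".
    return [string map $map $text]
}
```

Because the map is applied in a single pass, already-escaped text is not decoded twice.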

Keep in mind this is a preview of the script: the other databases are not yet incorporated, although the regexps are in place (I just need to finish them to format the results ;)

Check the url for updates, there will no doubt be many :)
http://members.dandy.net/~fbn/dictcom.tcl.txt
Last edited by rosc2112 on Thu Oct 12, 2006 1:58 am, edited 4 times in total.

v0.01c

Post by rosc2112 »

History
- Oct. 08 2006 - Initial conception
- Oct. 11 2006 - Added more db's, added ability for user to specify line-limit (still respecting admin's choice of max line-limit, and added option for admin to allow user to override that limit or not), etc.
- Removed dcorigin/dcsynant options, added combined commandline options.
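The line-limit rules above can be read more than one way; this Tcl sketch assumes one reading: a user may lower the admin's limit, and may raise or remove it only when the admin allows overrides. The proc and argument names are hypothetical, not taken from the script.

```tcl
# Hypothetical: combine the admin's max line-limit with a user-requested one.
#   adminMax      - admin's limit (0 = no limit)
#   userReq       - limit the user asked for ("" if none)
#   allowOverride - 1 if the admin lets users exceed adminMax
proc dcomEffectiveLimit {adminMax userReq allowOverride} {
    # Ignore missing or malformed requests.
    if {![string is integer -strict $userReq] || $userReq < 0} {
        return $adminMax
    }
    if {$allowOverride || $adminMax == 0} {
        return $userReq
    }
    # Otherwise the user may only lower the limit, never raise or remove it.
    if {$userReq == 0 || $userReq > $adminMax} {
        return $adminMax
    }
    return $userReq
}
```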

Databases: Dictionary.com Unabridged Dictionary; American Heritage Dictionary; Webster's 1913 Dictionary; American Heritage Stedman's Medical Dictionary; Merriam-Webster Medical Dictionary; Investopedia

v0.01d

Post by rosc2112 »

- Added a 'dbmatch' option to show which databases a word is found in.
- Added the rest of the db's (if dictionary.com provided the VERA, GCIDE and Ambrose Bierce's Devil's dictionaries, this script would be complete compared to db's available from dict.org.)
- Consolidated redundant regexps into a separate proc.

I found a few more dbs in addition to the ones I mentioned earlier. Here's a complete list:

Databases:
Dictionary.com Unabridged Dictionary;
Webster's 1913 Unabridged Dictionary;
Webster's New Millennium Dictionary;
WordNet Dictionary;
American Heritage Dictionary;
American Heritage Stedman's Medical Dictionary;
American Heritage Dictionary of Idioms;
Merriam-Webster Medical Dictionary;
Merriam-Webster Law Dictionary;
Investopedia;
Wall Street Words;
Easton's 1897 Bible Dictionary;
Hitchcock's Bible Names Dictionary;
Free On-line Dictionary of Computing;
Jargon File;
US Gazetteer 1990 Census;
CIA 1995 World Factbook;
Atomic Elements Database

v0.02a

Post by rosc2112 »

- Oct. 12 2006 - Minor changes to dbnames.
- Oct. 13 2006 - Fixed a mistake in the variable used for selecting a database.
- Added a string map for 'dbmatch' to show the dbnames as used in the script (rather than the names as known to dictionary.com).
- Added an additional error msg when the user selects a database and the word is not found.
- Changed the test in proc dictmsg to check whether the user is either on the channel or a validuser; if neither, the script quietly returns (unknown users outside of channels cannot use it).
- Made a configuration option for limiting input length.
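A minimal sketch of the last two checks, assuming an Eggdrop msg handler that receives the nick, handle and text. `validuser` and `onchan` are Eggdrop Tcl commands (with no channel argument, `onchan` checks every channel the bot is on, and Eggdrop passes `*` as the handle of an unrecognized user), so this only runs inside a bot. `dcomMsgAllowed` and `dcmaxinput` are hypothetical names, not the script's own.

```tcl
# Hypothetical sketch of the dictmsg checks (not dictcom.tcl's code).
set dcmaxinput 40   ;# hypothetical input-length setting, 0 = no limit

proc dcomMsgAllowed {nick hand text} {
    global dcmaxinput
    # Known bot user, or someone sitting on one of the bot's channels?
    if {![validuser $hand] && ![onchan $nick]} {
        return 0
    }
    # Refuse over-long input instead of fetching it.
    if {$dcmaxinput > 0 && [string length $text] > $dcmaxinput} {
        return 0
    }
    return 1
}

# Inside a msg bind proc this could be used as:
#   if {![dcomMsgAllowed $nick $hand $text]} { return 0 }
```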
v00j00
Voice
Posts: 4
Joined: Sun Dec 18, 2005 8:59 am

Post by v00j00 »

Thank you!

Post by rosc2112 »

Welcome :)

I just made a minor update to the script, added more unicode chars to the string list.. Same url as above.

v0.02k

Post by rosc2112 »

Dictionary.com changed their html a bit, a fix has been uploaded (same url as above.) There are also some new db's that I'll be adding in the next few days.
cache
Master
Posts: 306
Joined: Tue Jan 10, 2006 4:59 am
Location: Mass

Post by cache »

Did the HTML change again? Just wondering, since I just added this and see odd characters, and no spaces when it lists them, like...

1.whatever
2.whatever

Post by rosc2112 »

Which word, and what commandline options did you look up? I just checked and don't see any changes in the html being used. And yes, it will normally list definitions 1 at a time for the default dictionary.

Post by cache »

This is how I see it; if this is how it's supposed to run, that's fine. And thanks for all these new scripts you've been making :)

It shows this  and no spaces after the numbers while it lists..

Code:

<Bot> Dictionary.com Unabridged: Results for 'chat' - [pronunciation key: chat ]
<Bot> verb (used without object)
<Bot> 1.to converse in a familiar or informal manner.
<Bot> noun
<Bot> 2.informal conversation: We had a pleasant chat.
<Bot> 3.any of several small Old World thrushes, esp. of the genus Saxicola, having a chattering cry.
<Bot> 4.yellow-breasted chat.
<Bot> Verb phrase
<Bot> Output limit reached [6 lines max]
<Bot> Origin: 140050; late ME; short for chatter
<Bot> Synonyms: 1, 2. talk, chitchat, gossip, visit.
<Bot> [End Dictionary.com Unabridged - 'chat']

Post by rosc2112 »

The binary codes are Unicode chars (EN DASH and EM DASH), and unfortunately I'm not able to cut/paste them into the string map to translate them. I also tried using the hex codes for them, but that doesn't work on my platform either. Curiously, when I save the page in Mozilla, I get this char:

â

That's because the first byte of the UTF-8 encoding of 'EN DASH' is 0xE2, which displays as 'â' when read as Latin-1 (the full hex code for it is 0xE2 0x80 0x93, i.e. e28093).

If anyone has a clue about how to add these pesky Unicode chars into the string map, I'd appreciate a hint. (I'm not even able to reproduce the chars using bash's \x codes; I don't know the octal codes for the chars, so I didn't try those.)
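For what it's worth, Tcl can name these characters with `\uXXXX` escapes, so nothing has to be pasted into the file. This sketch assumes Tcl 8.4 or later: it confirms the EN DASH bytes with `binary scan`, then shows `encoding convertfrom utf-8` repairing text whose UTF-8 bytes were read one character per byte (the situation that turns EN DASH into "â" plus two control characters under Latin-1). Both proc names are hypothetical.

```tcl
# Confirm the UTF-8 bytes for EN DASH (U+2013); prints e28093
binary scan [encoding convertto utf-8 \u2013] H* hex
puts $hex

# Hypothetical helper: turn undecoded UTF-8 (one character per byte, e.g.
# EN DASH arriving as \u00e2\u0080\u0093) back into real characters.
proc dcomDecodeUtf8 {raw} {
    return [encoding convertfrom utf-8 $raw]
}

# Hypothetical helper: map EN DASH and EM DASH to ASCII using escapes,
# so the script file itself stays plain ASCII.
proc dcomMapDashes {text} {
    return [string map [list \u2013 - \u2014 --] $text]
}

puts [dcomMapDashes [dcomDecodeUtf8 "1400\u00e2\u0080\u009350"]]  ;# prints 1400-50
```

EM DASH (U+2014) encodes as 0xE2 0x80 0x94, so the same decode step covers it.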

Here's info about the chars:

http://www.fileformat.info/info/unicode ... /index.htm
http://www.fileformat.info/info/unicode ... /index.htm

Needless to say, dictionary.com is turning into a real pain in the rump cos they keep changing these silly little things (they were originally using the html codes for dash..)
BeBoo
Halfop
Posts: 42
Joined: Wed Sep 26, 2007 1:44 am

Post by BeBoo »

I'm getting the following error when loading it:

Code:

can't read "dcomdef": no such variable
    while executing
"regexp {class="sectionLabel">.+?Synonyms.*?</span>(.*?)</div>} $dcomdef match dcsynon"
    invoked from within
"if {[regexp {class="sectionLabel">.+?Synonyms.*?</span>(.*?)</div>} $dcomdef match dcsynon]} {
                        regsub -all -nocase {<b>.*?</b>} $dcsynon } dcsynon"
Any ideas?

Post by rosc2112 »

I'm assuming there's a version incompatibility between the version of tcl I'm using and the version the people getting that "dcomdef" var error are using - something about the regexp, since obviously $dcomdef IS defined and the other regexps don't throw any errors about it. I'm using tcl 8.4.11 and eggdrop 1.6.18.

Or perhaps it's a unix-vs-windrop incompatibility (if you're using windrop and get that error, I can only suggest using a real unix system, or at the least, eggdrop & tcl compiled under cygwin.) I don't do windoze =)
speechles
Revered One
Posts: 1398
Joined: Sat Aug 26, 2006 10:19 pm
Location: emerald triangle, california (coastal redwoods)

Post by speechles »

rosc2112, it's this part:

Code:

regsub -all -nocase {<b>.*?</b>} $dcsynon {[*control-code 002 is here*]&[*control-code 002 is here*]} dcsynon
You've embedded unescaped control codes to handle the bold rather than the proper escape sequence \002. Synonyms and antonyms suffer from this. Those who aren't saving the link directly, but are instead copying and re-saving it in their editor of choice, will most certainly have problems. It's a simple fix.
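A sketch of the escaped version in Tcl, using a made-up sample string: the replacement is written in double quotes so the Tcl parser turns `\002` into the IRC bold character before `regsub` sees it. Inside braces, `regsub` would read `\0` as "the whole match" and leave a literal `02` behind, so the quoting matters.

```tcl
# Bold each <b>...</b> run for IRC with the \002 escape instead of a raw,
# embedded control character. Double quotes let Tcl expand \002 first.
set dcsynon "Synonyms: <b>1, 2.</b> talk, chitchat"
regsub -all -nocase {<b>.*?</b>} $dcsynon "\002&\002" dcsynon

# Show the control characters as visible text:
# prints Synonyms: ^B<b>1, 2.</b>^B talk, chitchat
puts [string map [list \002 ^B] $dcsynon]
```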

The problem becomes noticeable in BeBoo's error message:

Code:

can't read "dcomdef": no such variable
    while executing
"regexp {class="sectionLabel">.+?Synonyms.*?</span>(.*?)</div>} $dcomdef match dcsynon"
    invoked from within
"if {[regexp {class="sectionLabel">.+?Synonyms.*?</span>(.*?)</div>} $dcomdef match dcsynon]} {
                        regsub -all -nocase {<b>.*?</b>} $dcsynon } dcsynon"
Notice particularly the part above with the strange brace all alone; this is where the embedded control codes are causing conflict.

Edit: It appears this embedding happens in other spots as well, but it's the exact same 'regsub' concerning bold, each done on a different variable. When this control character crosses platforms it causes unexpected problems. This stems directly from naming the script *.tcl.txt: the browser will display such files itself. Named as *.tcl, the browser would only offer to save them or open them with another program, not display them. But this would not even be an issue if people would stop embedding in the first place and properly generate their characters using escape sequences or similar means of creation. Calling windrop inferior rather than investigating your own code is just silly. Most windrop-related script issues come down to: 1) the script author not following accepted standards, or 2) dependencies on incompatible/missing modules (read this as the user now has to fully install cygwin to create it).