egghelp.org community
Discussion of eggdrop bots, shell accounts and tcl scripts.
 

[script/library] Levenshtein's distance v1.0

 
Forum: egghelp.org community -> Script Support & Releases
Author: MenzAgitat
Posted: Sat Jul 25, 2009 5:22 pm

 
 
This script provides the package Levenshtein:
Code:
package provide Levenshtein 1.0



Description:

In information theory and computer science, the Levenshtein distance is a metric for measuring the amount of difference between two sequences (i.e., the so-called edit distance). The Levenshtein distance between two strings is the minimum number of operations needed to transform one string into the other, where an operation is an insertion, deletion, or substitution of a single character. A generalization of the Levenshtein distance (the Damerau–Levenshtein distance) also allows the transposition of two adjacent characters as an operation.
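The usual way to compute this distance is the Wagner-Fischer dynamic-programming algorithm: cell (i, j) of a matrix holds the distance between the first i characters of one string and the first j characters of the other, and each cell is the cheapest of a deletion, an insertion or a substitution (free when the characters match) applied to a neighbouring cell. Below is a minimal Tcl sketch of that technique, not the code of the released package; the namespace levsketch and the proc levsketch::distance are hypothetical names chosen so they don't clash with levenshtein::distance. It assumes Tcl 8.5 or later (for the min() function) and compares characters case-sensitively.

```tcl
namespace eval levsketch {}

# Hypothetical sketch: two-row Wagner-Fischer Levenshtein distance.
# Case-sensitive; requires Tcl 8.5+ for min().
proc levsketch::distance {s t} {
    set n [string length $t]
    # Row 0: turning "" into the first j chars of $t takes j insertions.
    set prev {}
    for {set j 0} {$j <= $n} {incr j} { lappend prev $j }
    set i 0
    foreach a [split $s ""] {
        incr i
        # Column 0: turning the first i chars of $s into "" takes i deletions.
        set cur [list $i]
        set j 0
        foreach b [split $t ""] {
            # Substitution is free when the two characters match.
            set cost [expr {$a eq $b ? 0 : 1}]
            # Cheapest of: deletion (cell above), insertion (cell to the left),
            # substitution/match (diagonal cell).
            lappend cur [expr {min([lindex $prev [expr {$j + 1}]] + 1, [lindex $cur $j] + 1, [lindex $prev $j] + $cost)}]
            incr j
        }
        set prev $cur
    }
    # The bottom-right cell is the distance between the full strings.
    return [lindex $prev end]
}
```

Only two rows of the matrix are kept, so memory grows with the length of the second string rather than with the product of both lengths. With this sketch, levsketch::distance bonjour bougeoir returns 4, as in the first example below.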

See http://en.wikipedia.org/wiki/Levenshtein_distance for details.
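The Damerau-Levenshtein generalization mentioned in the description can be approximated with the optimal string alignment variant: the same recurrence plus one extra rule, where swapping two adjacent characters costs 1. The sketch below (hypothetical name levsketch::osa_distance, Tcl 8.5+) implements that restricted variant, not the full Damerau-Levenshtein distance: it never edits the same substring twice, so for example "ca" to "abc" costs 3 here, while true Damerau-Levenshtein gives 2.

```tcl
namespace eval levsketch {}

# Hypothetical sketch: optimal string alignment (restricted Damerau-Levenshtein).
# Insertion, deletion, substitution and adjacent transposition each cost 1.
# The full matrix is kept in array d(i,j) because transposition looks two rows back.
proc levsketch::osa_distance {s t} {
    set m [string length $s]
    set n [string length $t]
    for {set i 0} {$i <= $m} {incr i} { set d($i,0) $i }
    for {set j 0} {$j <= $n} {incr j} { set d(0,$j) $j }
    for {set i 1} {$i <= $m} {incr i} {
        set im [expr {$i - 1}]
        set a [string index $s $im]
        for {set j 1} {$j <= $n} {incr j} {
            set jm [expr {$j - 1}]
            set b [string index $t $jm]
            set cost [expr {$a eq $b ? 0 : 1}]
            # Plain Levenshtein step: deletion, insertion, substitution/match.
            set v [expr {min($d($im,$j) + 1, $d($i,$jm) + 1, $d($im,$jm) + $cost)}]
            # Transposition: the last two characters of each prefix are swapped.
            if {$i > 1 && $j > 1} {
                set i2 [expr {$i - 2}]
                set j2 [expr {$j - 2}]
                if {$a eq [string index $t $j2] && [string index $s $i2] eq $b} {
                    set v [expr {min($v, $d($i2,$j2) + 1)}]
                }
            }
            set d($i,$j) $v
        }
    }
    return $d($m,$n)
}
```

For instance, "ca" and "ac" are at distance 1 with this variant, but 2 with plain Levenshtein.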


Uses:

  • Lets a spell checker suggest alternative words that are within a small Levenshtein distance of a misspelled word.
  • Gives pseudo-AI scripts some tolerance for spelling mistakes.
  • ...
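The spell-checker use case can be sketched as a linear scan over a word list, keeping the words within a maximum distance and sorting them closest first. This is a hypothetical illustration (levsketch::suggest and its arguments are not part of the package, Tcl 8.5+ assumed); it bundles its own small levsketch::distance helper so it runs on its own. Each call computes one distance per dictionary word, which is fine for short word lists.

```tcl
namespace eval levsketch {}

# Minimal two-row Levenshtein helper (hypothetical, case-sensitive).
proc levsketch::distance {s t} {
    set prev {}
    for {set j 0} {$j <= [string length $t]} {incr j} { lappend prev $j }
    set i 0
    foreach a [split $s ""] {
        set cur [list [incr i]]
        set j 0
        foreach b [split $t ""] {
            set cost [expr {$a eq $b ? 0 : 1}]
            lappend cur [expr {min([lindex $prev [expr {$j + 1}]] + 1, [lindex $cur $j] + 1, [lindex $prev $j] + $cost)}]
            incr j
        }
        set prev $cur
    }
    return [lindex $prev end]
}

# Hypothetical spell-suggestion helper: returns the words of $dict whose
# distance to $word is at most $maxdist, closest first.
proc levsketch::suggest {word dict maxdist} {
    set hits {}
    foreach candidate $dict {
        set d [distance $word $candidate]
        if {$d <= $maxdist} { lappend hits [list $d $candidate] }
    }
    set result {}
    # lsort is a stable merge sort, so equal distances keep dictionary order.
    foreach hit [lsort -integer -index 0 $hits] {
        lappend result [lindex $hit 1]
    }
    return $result
}
```

For example, levsketch::suggest bonjur {bonjours bougeoir bonjour pin} 2 returns bonjour bonjours. The fixed maxdist is a simplification; see the remark on proportional tolerance further down.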

Syntax:

levenshtein::distance <string 1> <string 2>

You can also use a public command to try it out:

!test_levenshtein <string 1> <string 2>
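For readers who want to wire up a similar test command themselves, here is a hedged sketch of an Eggdrop public-command handler. It relies on Eggdrop's bind pub and putserv commands; the trigger !levtest, the proc levsketch::pub_test and the bundled levsketch::distance helper are hypothetical and not taken from the released script. The bind is only registered when those Eggdrop commands exist, so the file can also be sourced in a plain tclsh (8.5+) for testing.

```tcl
namespace eval levsketch {}

# Minimal two-row Levenshtein helper (hypothetical, case-sensitive).
proc levsketch::distance {s t} {
    set prev {}
    for {set j 0} {$j <= [string length $t]} {incr j} { lappend prev $j }
    set i 0
    foreach a [split $s ""] {
        set cur [list [incr i]]
        set j 0
        foreach b [split $t ""] {
            set cost [expr {$a eq $b ? 0 : 1}]
            lappend cur [expr {min([lindex $prev [expr {$j + 1}]] + 1, [lindex $cur $j] + 1, [lindex $prev $j] + $cost)}]
            incr j
        }
        set prev $cur
    }
    return [lindex $prev end]
}

# Hypothetical public-command handler. Eggdrop calls "bind pub" procs with
# nick, user@host, handle, channel and the text after the command word.
proc levsketch::pub_test {nick uhost hand chan text} {
    # Split on whitespace without treating the user's text as a Tcl list.
    set words [regexp -all -inline {\S+} $text]
    if {[llength $words] != 2} {
        putserv "PRIVMSG $chan :Usage: !levtest <string 1> <string 2>"
        return
    }
    lassign $words a b
    putserv "PRIVMSG $chan :distance($a, $b) = [distance $a $b]"
}

# Register the bind only when loaded inside Eggdrop.
if {[info commands bind] ne "" && [info commands putserv] ne ""} {
    bind pub - !levtest levsketch::pub_test
}
```

On a channel, "!levtest pin pas" should make the bot answer "distance(pin, pas) = 2".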


Examples (in French, sorry):
Code:
levenshtein::distance "bonjour" "bougeoir"
-> 4
You must perform 4 single-character operations to transform the word "bonjour" into the word "bougeoir":
  • BONJOUR
  • BOUJOUR -> we replace N with U
  • BOUGOUR -> we replace J with G
  • BOUGEOUR -> we insert E
  • BOUGEOIR -> we replace U with I
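Such a list of steps can be recovered from the full distance matrix by walking back from the bottom-right cell, each time moving to a neighbouring cell whose value explains the current one. The hypothetical levsketch::edit_script below sketches this backtracking (Tcl 8.5+ assumed); when several minimal sequences exist it may pick different operations than the list above, but it always returns exactly as many operations as the distance.

```tcl
namespace eval levsketch {}

# Hypothetical helper: fills the full Levenshtein matrix d(i,j), then walks
# back from the bottom-right cell to list one minimal sequence of operations.
proc levsketch::edit_script {s t} {
    set m [string length $s]
    set n [string length $t]
    for {set i 0} {$i <= $m} {incr i} { set d($i,0) $i }
    for {set j 0} {$j <= $n} {incr j} { set d(0,$j) $j }
    for {set i 1} {$i <= $m} {incr i} {
        set im [expr {$i - 1}]
        for {set j 1} {$j <= $n} {incr j} {
            set jm [expr {$j - 1}]
            set cost [expr {[string index $s $im] eq [string index $t $jm] ? 0 : 1}]
            set d($i,$j) [expr {min($d($im,$j) + 1, $d($i,$jm) + 1, $d($im,$jm) + $cost)}]
        }
    }
    # Backtrack, preferring match/substitution, then insertion, then deletion.
    set ops {}
    set i $m
    set j $n
    while {$i > 0 || $j > 0} {
        set im [expr {$i - 1}]
        set jm [expr {$j - 1}]
        if {$i > 0 && $j > 0} {
            set a [string index $s $im]
            set b [string index $t $jm]
            set cost [expr {$a eq $b ? 0 : 1}]
            if {$d($i,$j) == $d($im,$jm) + $cost} {
                # A match costs nothing and is not reported.
                if {$cost} { set ops [linsert $ops 0 "replace $a with $b"] }
                set i $im
                set j $jm
                continue
            }
        }
        if {$j > 0 && $d($i,$j) == $d($i,$jm) + 1} {
            set ops [linsert $ops 0 "insert [string index $t $jm]"]
            set j $jm
        } else {
            set ops [linsert $ops 0 "delete [string index $s $im]"]
            set i $im
        }
    }
    return $ops
}
```

For example, levsketch::edit_script pin pas returns {replace i with a} {replace n with s}.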
Code:
levenshtein::distance "antiquaire" "antikaire"
-> 2
With a distance of only 2, we can conclude from this 10-letter example that the first word is very similar to the second.
Keep in mind that a distance of 2 between two 10-letter words means they are very similar, while the same distance between two 3-letter words means they are very different, as the following example shows:
Code:
levenshtein::distance "pin" "pas"
-> 2
As you can see, a distance of 2 is a small difference in a 10-letter word but a major change in a 3-letter one.
To keep results relevant, always scale the tolerance in proportion to the length of the words being compared.
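One simple way to tie the tolerance to the length is to divide the distance by the length of the longer word, which gives a value between 0 (identical) and 1. The sketch below uses hypothetical helpers (levsketch::normalized and levsketch::similar, Tcl 8.5+), and the default threshold of 0.25 is an arbitrary assumption to tune for your own use.

```tcl
namespace eval levsketch {}

# Minimal two-row Levenshtein helper (hypothetical, case-sensitive).
proc levsketch::distance {s t} {
    set prev {}
    for {set j 0} {$j <= [string length $t]} {incr j} { lappend prev $j }
    set i 0
    foreach a [split $s ""] {
        set cur [list [incr i]]
        set j 0
        foreach b [split $t ""] {
            set cost [expr {$a eq $b ? 0 : 1}]
            lappend cur [expr {min([lindex $prev [expr {$j + 1}]] + 1, [lindex $cur $j] + 1, [lindex $prev $j] + $cost)}]
            incr j
        }
        set prev $cur
    }
    return [lindex $prev end]
}

# Hypothetical: distance divided by the length of the longer word.
# 0.0 means identical; the value never exceeds 1.0.
proc levsketch::normalized {s t} {
    set longest [expr {max([string length $s], [string length $t])}]
    if {$longest == 0} { return 0.0 }
    return [expr {double([distance $s $t]) / $longest}]
}

# Hypothetical tolerance rule: accept when at most $ratio of the longer
# word had to change (0.25 is an arbitrary default).
proc levsketch::similar {s t {ratio 0.25}} {
    return [expr {[normalized $s $t] <= $ratio}]
}
```

With these helpers, antiquaire/antikaire gives 0.2 and is accepted, pin/pas gives about 0.67 and is rejected, and antiquaire/dimanche gives 0.8.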
Code:
levenshtein::distance "antiquaire" "dimanche"
-> 8
In this last example, the distance between the two words is 8: they are very different.


Download:

Levenshtein's distance v1.0
All times are GMT - 4 Hours