Installing the python-Levenshtein module
pip install python-Levenshtein
Using the python-Levenshtein module
import Levenshtein
Algorithm description
1). Levenshtein.hamming(str1, str2)
Calculates the Hamming distance. str1 and str2 must be of the same length. The Hamming distance is the number of positions at which the corresponding characters of two equal-length strings differ.
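To make the definition concrete, here is a minimal pure-Python sketch of the same idea. The function name hamming and the choice to raise ValueError on unequal lengths are assumptions of this sketch, not a description of how the library is implemented.

```python
def hamming(str1, str2):
    """Count the positions at which two equal-length strings differ."""
    if len(str1) != len(str2):
        # Assumption of this sketch: reject unequal lengths explicitly.
        raise ValueError("hamming() expects two strings of the same length")
    # Compare the strings position by position and count the mismatches.
    return sum(c1 != c2 for c1, c2 in zip(str1, str2))


print(hamming("karolin", "kathrin"))  # 3 (r/t, o/h and l/r differ)
```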
2). Levenshtein.distance(str1, str2)
Calculates the edit distance (also known as the Levenshtein distance). It is the minimum number of operations needed to convert one string into the other, where the allowed operations are insertion, deletion, and replacement.
The algorithm is implemented with dynamic programming.
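The dynamic programming idea can be sketched in a few lines of pure Python. This is an illustrative version that keeps only two rows of the table, not the library's own C code; the name edit_distance is made up for this example.

```python
def edit_distance(str1, str2):
    """Levenshtein distance with unit cost for insert, delete and replace."""
    # previous[j] is the distance between the str1 prefix processed so far
    # and the first j characters of str2.
    previous = list(range(len(str2) + 1))
    for i, c1 in enumerate(str1, start=1):
        current = [i]  # turning i characters into "" takes i deletions
        for j, c2 in enumerate(str2, start=1):
            current.append(min(
                previous[j] + 1,               # delete c1
                current[j - 1] + 1,            # insert c2
                previous[j - 1] + (c1 != c2),  # replace (free if equal)
            ))
        previous = current
    return previous[-1]


print(edit_distance("kitten", "sitting"))  # 3
```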
3). Levenshtein.ratio(str1, str2)
Calculates the Levenshtein ratio. The formula is r = (sum - ldist) / sum, where sum is the sum of the lengths of str1 and str2, and ldist is an edit-like distance.
Note: this edit-like distance is not the plain edit distance. In the plain edit distance each of the three operations costs 1; here deletion and insertion still cost 1, but replacement costs 2.
The purpose of this design: for ratio('a', 'c'), sum = 2. With the plain edit distance the result would be (2 - 1) / 2 = 0.5, even though 'a' and 'c' have nothing in common, which is clearly unreasonable. Costing a replacement at 2 solves this problem: the result becomes (2 - 2) / 2 = 0.
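The formula and the replacement cost of 2 can be checked with a small pure-Python sketch. The names weighted_distance and ratio, and the choice to return 1.0 for two empty strings, are assumptions of this example rather than the library's code.

```python
def weighted_distance(str1, str2):
    """Edit-like distance: insert and delete cost 1, replace costs 2."""
    previous = list(range(len(str2) + 1))
    for i, c1 in enumerate(str1, start=1):
        current = [i]
        for j, c2 in enumerate(str2, start=1):
            replace_cost = 0 if c1 == c2 else 2
            current.append(min(
                previous[j] + 1,                # delete c1
                current[j - 1] + 1,             # insert c2
                previous[j - 1] + replace_cost  # replace c1 by c2
            ))
        previous = current
    return previous[-1]


def ratio(str1, str2):
    """r = (sum - ldist) / sum, with sum = len(str1) + len(str2)."""
    total = len(str1) + len(str2)
    if total == 0:
        return 1.0  # assumption of this sketch: two empty strings are identical
    return float(total - weighted_distance(str1, str2)) / total


print(ratio("a", "c"))  # 0.0
```

With the plain edit distance in place of weighted_distance, the same call would give 0.5, which is the case the note above argues against.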
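4). Levenshtein.jaro(s1, s2)
Calculates the Jaro distance:
dj = (1/3) * (m/|s1| + m/|s2| + (m - t)/m), with dj = 0 when m = 0
where m is the number of matching characters between s1 and s2. Two characters are considered matching when they are the same and their positions differ by no more than floor(max(|s1|, |s2|) / 2) - 1.
t is half the number of transpositions, that is, half the number of matched characters that appear in a different order in the two strings.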
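As a rough illustration of how m and t are obtained, here is a pure-Python sketch of the textbook Jaro definition. It assumes the usual matching window of floor(max(|s1|, |s2|) / 2) - 1 and returns a score in which 1.0 means identical strings. The function name and the handling of empty strings are choices of this sketch, and the library's results may differ in edge cases.

```python
def jaro(s1, s2):
    """Textbook Jaro score: (m/|s1| + m/|s2| + (m - t)/m) / 3."""
    if not s1 and not s2:
        return 1.0  # assumption: two empty strings count as identical
    if not s1 or not s2:
        return 0.0
    window = max(max(len(s1), len(s2)) // 2 - 1, 0)
    matched1 = [False] * len(s1)
    matched2 = [False] * len(s2)
    m = 0
    for i, c in enumerate(s1):
        # Look for an unused, equal character of s2 inside the window.
        for j in range(max(0, i - window), min(len(s2), i + window + 1)):
            if not matched2[j] and s2[j] == c:
                matched1[i] = matched2[j] = True
                m += 1
                break
    if m == 0:
        return 0.0
    # Matched characters in their original order; t is half the number
    # of positions at which the two matched sequences disagree.
    seq1 = [c for c, ok in zip(s1, matched1) if ok]
    seq2 = [c for c, ok in zip(s2, matched2) if ok]
    t = sum(a != b for a, b in zip(seq1, seq2)) / 2.0
    return (m / float(len(s1)) + m / float(len(s2)) + (m - t) / m) / 3.0


print(round(jaro("MARTHA", "MARHTA"), 4))  # 0.9444 (m = 6, t = 1)
```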
5). Levenshtein.jaro_winkler(s1, s2)
Calculates the Jaro-Winkler distance, which extends the Jaro distance by giving extra weight to strings that share a common prefix.
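The prefix adjustment can be sketched as follows, assuming Winkler's usual formulation dw = dj + l * p * (1 - dj), where l is the length of the common prefix (capped at 4 here) and p is a scaling factor of 0.1. The function name, both parameter names and the cap are assumptions of this sketch; the library may handle the prefix differently. The Jaro score is passed in as a number so that the example stays self-contained.

```python
def jaro_winkler_from_jaro(jaro_score, s1, s2, prefix_weight=0.1, max_prefix=4):
    """Boost a Jaro score according to the length of the common prefix."""
    prefix_len = 0
    for c1, c2 in zip(s1, s2):
        if c1 != c2 or prefix_len == max_prefix:
            break
        prefix_len += 1
    return jaro_score + prefix_len * prefix_weight * (1.0 - jaro_score)


# "MARTHA" and "MARHTA" have a Jaro score of 17/18 and share the prefix "MAR".
print(round(jaro_winkler_from_jaro(17.0 / 18.0, "MARTHA", "MARHTA"), 4))  # 0.9611
```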
import Levenshtein fails with: ImportError: No module named Levenshtein
So download the python-Levenshtein source and install it (http://www.lfd.uci.edu/~gohlke/pythonlibs/#python-levenshtein also offers a precompiled EXE). The first installation attempt failed with "error: Unable to find vcvarsall.bat", even though VS2010 was installed. The following steps made the installation work:
1. Set the environment variable by executing:
SET VS90COMNTOOLS=%VS100COMNTOOLS%
2. Install again:
python setup.py install
It then compiles and installs normally.
$ python
>>> import Levenshtein
>>> help(Levenshtein.ratio)
ratio(...)
    Compute similarity of two strings.
    ratio(string1, string2)
    The similarity is a number between 0 and 1, it's usually equal or somewhat higher than difflib.SequenceMatcher.ratio(), because it's based on real minimal edit distance.
    Examples:
    >>> ratio('Hello world!', 'Holly grail!')
    0.58333333333333337
    >>> ratio('Brian', 'Jesus')
    0.0
>>> help(Levenshtein.distance)
distance(...)
    Compute absolute Levenshtein distance of two strings.
    distance(string1, string2)
    Examples (it's hard to spell Levenshtein correctly):
    >>> distance('Levenshtein', 'Lenvinsten')
    4
    >>> distance('Levenshtein', 'Levensthein')
    2
    >>> distance('Levenshtein', 'Levenshten')
    1
    >>> distance('Levenshtein', 'Levenshtein')
    0
difflib library
>>> import difflib
>>> difflib.SequenceMatcher(None, 'abcde', 'abcde').ratio()
1.0
>>> difflib.SequenceMatcher(None, 'abcde', 'zbcde').ratio()
0.80000000000000004
>>> difflib.SequenceMatcher(None, 'abcde', 'zyzzy').ratio()
0.0
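A common use of these ratios is picking the closest candidates from a list. The standard library already offers difflib.get_close_matches for this. The short sketch below uses made-up sample words and spells out the function's default n and cutoff values.

```python
import difflib

candidates = ["ape", "apple", "peach", "puppy"]

# Return at most n candidates whose SequenceMatcher ratio against the
# query is at least cutoff, best match first.
best = difflib.get_close_matches("appel", candidates, n=3, cutoff=0.6)
print(best)  # ['apple', 'ape']
```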
fuzzywuzzy
git clone git://github.com/seatgeek/fuzzywuzzy.git fuzzywuzzy
cd fuzzywuzzy
python setup.py install
>>> from fuzzywuzzy import fuzz
>>> from fuzzywuzzy import process
Simple Ratio
>>> fuzz.ratio("this is a test", "this is a test!")
Partial Ratio
>>> fuzz.partial_ratio("this is a test", "this is a test!")
Token Sort Ratio
>>> fuzz.ratio("fuzzy wuzzy is a bear", "wuzzy fuzzy is a bear")
>>> fuzz.token_sort_ratio("fuzzy wuzzy is a bear", "wuzzy fuzzy is a bear")
Token Set Ratio
>>> fuzz.token_sort_ratio("fuzzy is a bear", "fuzzy fuzzy is a bear")
>>> fuzz.token_set_ratio("fuzzy is a bear", "fuzzy fuzzy is a bear")
100
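To show what the token-based scores are getting at, here is a rough sketch built only on difflib from the standard library. It is not fuzzywuzzy's implementation: the function names are made up, the preprocessing is reduced to lowercasing and whitespace splitting, and the scores may differ from what fuzzywuzzy returns.

```python
import difflib


def simple_ratio(a, b):
    """Similarity of two strings as an integer from 0 to 100."""
    return int(round(100 * difflib.SequenceMatcher(None, a, b).ratio()))


def token_sort_sketch(a, b):
    """Sort the words of each string before comparing, so word order is ignored."""
    sorted_a = " ".join(sorted(a.lower().split()))
    sorted_b = " ".join(sorted(b.lower().split()))
    return simple_ratio(sorted_a, sorted_b)


def token_set_sketch(a, b):
    """Compare the shared words against each string's full word set, so repeats are ignored."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    common = " ".join(sorted(tokens_a & tokens_b))
    full_a = (common + " " + " ".join(sorted(tokens_a - tokens_b))).strip()
    full_b = (common + " " + " ".join(sorted(tokens_b - tokens_a))).strip()
    return max(simple_ratio(common, full_a),
               simple_ratio(common, full_b),
               simple_ratio(full_a, full_b))


print(token_sort_sketch("fuzzy wuzzy is a bear", "wuzzy fuzzy is a bear"))  # 100
print(token_set_sketch("fuzzy is a bear", "fuzzy fuzzy is a bear"))         # 100
```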
google-diff-match-patch
import diff_match_patch
textA = "the cat in the red hat"
textB = "the feline in the blue hat"
dmp = diff_match_patch.diff_match_patch()  # create a diff_match_patch object
diffs = dmp.diff_main(textA, textB)  # all 'diff' jobs start with invoking diff_main()
d_value = dmp.diff_levenshtein(diffs)  # edit distance computed from the diff
print(d_value)
maxLenth = max(len(textA), len(textB))
print(float(d_value) / float(maxLenth))  # distance normalized by the longer string
similarity = (1 - float(d_value) / float(maxLenth)) * 100  # similarity as a percentage
print(similarity)
Second installation method
Reference blog: http://blog.csdn.net/TH_NUM/article/details/77095177
Running pip install python-Levenshtein failed with the error: Microsoft Visual C++ 14.0 is required
This error mainly occurs when pip install <third-party library name> has to compile the library locally and the required compiler is not installed.
Solution:
Install a prebuilt third-party library that matches your Windows version and your Python version.
Download the required third-party library here: http://www.lfd.uci.edu/~gohlke/pythonlibs
Installation steps:
First: download the python_Levenshtein-0.12.0-cp36-cp36m-win_amd64.whl file that matches your version from the website.
Then: switch to the download directory in the console and enter pip install python_Levenshtein-0.12.0-cp36-cp36m-win_amd64.whl
The installation is then complete.
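After installing, a quick way to confirm that the module can be imported, while keeping scripts usable on machines where the build failed, is a guarded import with a standard-library fallback. This is only a sketch: the names HAVE_LEVENSHTEIN and similarity are made up for the example, and the two back ends do not return identical scores for all inputs.

```python
import difflib

try:
    import Levenshtein  # provided by the python-Levenshtein package
    HAVE_LEVENSHTEIN = True
except ImportError:
    HAVE_LEVENSHTEIN = False


def similarity(a, b):
    """Use Levenshtein.ratio when available, otherwise fall back to difflib."""
    if HAVE_LEVENSHTEIN:
        return Levenshtein.ratio(a, b)
    return difflib.SequenceMatcher(None, a, b).ratio()


print("python-Levenshtein available:", HAVE_LEVENSHTEIN)
print(similarity("abcde", "zbcde"))
```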