Python packages for similarity detection: installation and usage

Installing the python-Levenshtein module

pip install python-Levenshtein

Using the python-Levenshtein module

import Levenshtein

Algorithm description

1). Levenshtein.hamming(str1, str2)
Calculates the Hamming distance: the number of positions at which two strings of equal length hold different characters. str1 and str2 must have the same length.
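
A minimal pure-Python sketch of the Hamming distance may help to make the definition concrete. The function name hamming_distance is hypothetical and is not part of the python-Levenshtein API; raising ValueError for unequal lengths is an assumption of this sketch.

```python
def hamming_distance(s1, s2):
    """Count the positions at which two equal-length strings differ."""
    if len(s1) != len(s2):
        # Assumption of this sketch: unequal lengths are rejected outright.
        raise ValueError("Hamming distance requires strings of equal length")
    return sum(c1 != c2 for c1, c2 in zip(s1, s2))


print(hamming_distance("karolin", "kathrin"))  # 3
```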

2). Levenshtein.distance(str1, str2)
Calculates the edit distance (also known as the Levenshtein distance): the minimum number of single-character operations (insertion, deletion, and substitution) needed to convert one string into the other.
The algorithm is implemented with dynamic programming.
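
The dynamic-programming idea can be sketched in a few lines of pure Python. This is an illustrative row-by-row version, not the library's C implementation; the name edit_distance is hypothetical.

```python
def edit_distance(s1, s2):
    """Levenshtein distance with unit cost for insert, delete and substitute."""
    # previous[j] holds the distance between the processed prefix of s1
    # and the first j characters of s2.
    previous = list(range(len(s2) + 1))
    for i, c1 in enumerate(s1, start=1):
        current = [i]  # turning i characters into an empty string takes i deletions
        for j, c2 in enumerate(s2, start=1):
            substitute = previous[j - 1] + (0 if c1 == c2 else 1)
            delete = previous[j] + 1
            insert = current[j - 1] + 1
            current.append(min(substitute, delete, insert))
        previous = current
    return previous[-1]


print(edit_distance("kitten", "sitting"))  # 3
```

Only two rows of the table are kept in memory at a time, so the sketch uses memory proportional to the length of the second string.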

3). Levenshtein.ratio(str1, str2)
Calculates the Levenshtein ratio with the formula r = (sum - ldist) / sum, where sum is the total length of str1 and str2, and ldist is a weighted edit distance.
Note: this weighted edit distance is not the edit distance from 2). In 2), each of the three operations costs 1; here, deletion and insertion still cost 1, but substitution costs 2.
The purpose of this design: for ratio('a', 'c'), sum = 2. With the edit distance from 2), the result would be (2 - 1) / 2 = 0.5, which is clearly unreasonable because 'a' and 'c' share no characters. Charging 2 for a substitution gives (2 - 2) / 2 = 0, which solves this problem.
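
The ratio formula and the substitution cost of 2 can be checked with a small pure-Python sketch. The names weighted_distance and similarity_ratio are hypothetical, and returning 1.0 for two empty strings is an assumption of this sketch.

```python
def weighted_distance(s1, s2):
    """Edit distance where insert and delete cost 1 and substitute costs 2."""
    previous = list(range(len(s2) + 1))
    for i, c1 in enumerate(s1, start=1):
        current = [i]
        for j, c2 in enumerate(s2, start=1):
            substitute = previous[j - 1] + (0 if c1 == c2 else 2)
            current.append(min(substitute, previous[j] + 1, current[j - 1] + 1))
        previous = current
    return previous[-1]


def similarity_ratio(s1, s2):
    """r = (sum - ldist) / sum, with sum the combined length of both strings."""
    total = len(s1) + len(s2)
    if total == 0:
        return 1.0  # assumption: two empty strings count as identical
    return (total - weighted_distance(s1, s2)) / float(total)


print(similarity_ratio("a", "c"))          # 0.0
print(similarity_ratio("abcde", "abcde"))  # 1.0
```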

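4). Levenshtein.jaro(s1, s2)
Calculates the Jaro similarity (often called the Jaro distance):

dj = (m / |s1| + m / |s2| + (m - t) / m) / 3, with dj = 0 when m = 0

where m is the number of matching characters between s1 and s2. Two characters are considered a match when they are identical and their positions differ by no more than floor(max(|s1|, |s2|) / 2) - 1.

t is half the number of transpositions, that is, half the number of matching characters that appear in a different order in the two strings.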
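
As an illustration, the sketch below assumes the standard Jaro definition: the score is (m / |s1| + m / |s2| + (m - t) / m) / 3, and two equal characters match when their positions differ by at most floor(max(|s1|, |s2|) / 2) - 1. The function name jaro_similarity is hypothetical, and the library's own implementation may handle edge cases differently.

```python
def jaro_similarity(s1, s2):
    """Jaro similarity of two strings, between 0.0 and 1.0."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    m = 0
    for i, c in enumerate(s1):
        # Look for an unused equal character of s2 inside the window around i.
        for j in range(max(0, i - window), min(len2, i + window + 1)):
            if not matched2[j] and s2[j] == c:
                matched1[i] = matched2[j] = True
                m += 1
                break
    if m == 0:
        return 0.0
    # Matched characters in their original order; t is half the mismatches.
    seq1 = [c for c, used in zip(s1, matched1) if used]
    seq2 = [c for c, used in zip(s2, matched2) if used]
    t = sum(a != b for a, b in zip(seq1, seq2)) / 2.0
    return (m / float(len1) + m / float(len2) + (m - t) / m) / 3.0


print(round(jaro_similarity("MARTHA", "MARHTA"), 4))  # 0.9444
```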

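5). Levenshtein.jaro_winkler(s1, s2)
Calculates the Jaro–Winkler similarity:

dw = dj + l * p * (1 - dj)

where dj is the Jaro similarity from 4), l is the length of the common prefix of s1 and s2 (capped at 4 in Winkler's definition), and p is the prefix scaling factor, commonly 0.1.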
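
The Winkler adjustment can be sketched as a small function that takes an already computed Jaro score. It assumes the standard formula dw = dj + l * p * (1 - dj), with the common prefix length l capped at 4 and the scaling factor p defaulting to 0.1. The name winkler_adjust and its parameters are hypothetical and do not mirror the library's signature.

```python
def winkler_adjust(jaro_score, s1, s2, prefix_weight=0.1, max_prefix=4):
    """Boost a Jaro score for strings that share a common prefix."""
    prefix = 0
    for c1, c2 in zip(s1, s2):
        if c1 != c2 or prefix == max_prefix:
            break
        prefix += 1
    return jaro_score + prefix * prefix_weight * (1.0 - jaro_score)


# The Jaro score of "MARTHA" and "MARHTA" is 17/18; the shared prefix is "MAR".
print(round(winkler_adjust(17.0 / 18.0, "MARTHA", "MARHTA"), 4))  # 0.9611
```

The adjustment only ever raises the score, and it leaves a score of 1.0 unchanged.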

Importing Levenshtein fails with: ImportError: No module named Levenshtein

In that case, download the python-Levenshtein source and install it (precompiled installers are also available at http://www.lfd.uci.edu/~gohlke/pythonlibs/#python-levenshtein). The first installation attempt failed with "error: Unable to find vcvarsall.bat", even though VS2010 was installed. The following steps made the installation succeed:

1. Set the environment variable:

SET VS90COMNTOOLS=%VS100COMNTOOLS%

2. Install again:

python setup.py install

The module then compiles and installs normally.

$ python
>>> import Levenshtein
>>> help(Levenshtein.ratio)
ratio(...)
    Compute similarity of two strings.

    ratio(string1, string2)

    The similarity is a number between 0 and 1, it's usually equal or
    somewhat higher than difflib.SequenceMatcher.ratio(), because it's
    based on real minimal edit distance.

    Examples:
    >>> ratio('Hello world!', 'Holly grail!')
    0.58333333333333337
    >>> ratio('Brian', 'Jesus')
    0.0

>>> help(Levenshtein.distance)
distance(...)
    Compute absolute Levenshtein distance of two strings.

    distance(string1, string2)

    Examples (it's hard to spell Levenshtein correctly):
    >>> distance('Levenshtein', 'Lenvinsten')
    4
    >>> distance('Levenshtein', 'Levensthein')
    2
    >>> distance('Levenshtein', 'Levenshten')
    1
    >>> distance('Levenshtein', 'Levenshtein')
    0

difflib library
>>> import difflib

>>> difflib.SequenceMatcher(None, 'abcde', 'abcde').ratio()
1.0

>>> difflib.SequenceMatcher(None, 'abcde', 'zbcde').ratio()
0.80000000000000004

>>> difflib.SequenceMatcher(None, 'abcde', 'zyzzy').ratio()
0.0
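
To see where these numbers come from, the following sketch recomputes the ratio from the matching blocks. It relies on the documented definition 2.0 * M / T, where M is the number of matched characters and T is the combined length of both sequences. The helper name matching_ratio is hypothetical.

```python
import difflib


def matching_ratio(a, b):
    """Recompute SequenceMatcher.ratio() as 2.0 * M / T."""
    matcher = difflib.SequenceMatcher(None, a, b)
    # Each matching block reports how many consecutive characters agree.
    matched = sum(block.size for block in matcher.get_matching_blocks())
    total = len(a) + len(b)
    return 2.0 * matched / total if total else 1.0


print(matching_ratio('abcde', 'zbcde'))  # 0.8
```

For 'abcde' and 'zbcde' the only matching block is 'bcde', so M = 4, T = 10, and the ratio is 0.8.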

fuzzywuzzy

git clone git://github.com/seatgeek/fuzzywuzzy.git fuzzywuzzy
cd fuzzywuzzy
python setup.py install

>>> from fuzzywuzzy import fuzz
>>> from fuzzywuzzy import process

Simple ratio
>>> fuzz.ratio("this is a test", "this is a test!")

Partial ratio
>>> fuzz.partial_ratio("this is a test", "this is a test!")

Token sort ratio
>>> fuzz.ratio("fuzzy wuzzy is a bear", "wuzzy fuzzy is a bear")
>>> fuzz.token_sort_ratio("fuzzy wuzzy is a bear", "wuzzy fuzzy is a bear")

Token set ratio
>>> fuzz.token_sort_ratio("fuzzy is a bear", "fuzzy fuzzy is a bear")
>>> fuzz.token_set_ratio("fuzzy is a bear", "fuzzy fuzzy is a bear")
100

google-diff-match-patch

import diff_match_patch

texta = "The Cat in the Red Hat"
textb = "The feline in the Blue Hat"

dmp = diff_match_patch.diff_match_patch()  # create a diff_match_patch object
diffs = dmp.diff_main(texta, textb)  # all 'diff' jobs start with invoking diff_main()

d_value = dmp.diff_levenshtein(diffs)
print(d_value)

maxlenth = max(len(texta), len(textb))
print(float(d_value) / float(maxlenth))

# The multiplier is missing in the source; 100 (a percentage) is assumed here.
similarity = (1 - float(d_value) / float(maxlenth)) * 100
print(similarity)

Second installation method

Reference blog: http://blog.csdn.net/TH_NUM/article/details/77095177

Running pip install python-Levenshtein fails with the error: Microsoft Visual C++ 14.0 is required. The error appears when pip install has to compile a third-party library from source and the required compiler is not installed. Solution: install a prebuilt package that matches both your Windows version and your Python version.
Download the required third-party libraries here: http://www.lfd.uci.edu/~gohlke/pythonlibs
Installation steps:

1. Download the file for your version from the website, for example python_Levenshtein-0.12.0-cp36-cp36m-win_amd64.whl.

2. In the console, switch to the download directory and run:

pip install python_Levenshtein-0.12.0-cp36-cp36m-win_amd64.whl

The installation is then complete.
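
When choosing a wheel file, it helps to know which interpreter version and bitness are in use, since the file name encodes both (cp36 stands for CPython 3.6 and win_amd64 for 64-bit Windows). The sketch below prints these two values using only the standard library. The helper name interpreter_tags is hypothetical, and the "cp" prefix assumes a CPython interpreter.

```python
import struct
import sys


def interpreter_tags():
    """Return the CPython version tag and the pointer size in bits."""
    # Assumption: the interpreter is CPython, hence the "cp" prefix.
    python_tag = "cp{}{}".format(sys.version_info[0], sys.version_info[1])
    bits = struct.calcsize("P") * 8  # 32 or 64 on common platforms
    return python_tag, bits


tag, bits = interpreter_tags()
print("Python tag: {}, {}-bit interpreter".format(tag, bits))
```

A 64-bit interpreter on Windows needs a win_amd64 wheel and a 32-bit interpreter needs a win32 wheel, regardless of the Windows edition itself.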

