Computation of string similarity algorithm--levenshtein

Source: Internet
Author: User

0. This algorithm is simple to implement 1. Introduction of Baidu Encyclopedia:

Levenshtein distance, also known as the editing distance, refers to the minimum number of edit operations required between two strings, converted from one to another.

Permission edits include replacing one character with another character, inserting a character, and deleting a character.

The algorithm of editing distance is first put forward by Russian scientist Levenshtein, so it is called Levenshtein Distance.

2. Use

Fuzzy query

3. Implementation process A. First there are two strings, here write a simple ABC and ABEB. Think of the string as the structure below.

A is a mark, in order to facilitate the explanation, is not the contents of this table.

Abc A B C
Abe 0 1 2 3
A 1 A place
B 2
E 3
C. To calculate the value of a at

Its value depends on: 1 on the left, 1 on top, 0 in the upper left corner.

In accordance with the meaning of Levenshtein distance:

Both the above value and the value on the left require an additional 1, which will get 1+1=2.

A at the same as two A, the upper left corner of the value plus 0. This gets 0+0=0.

This is after three values, the left side of the calculation is 2, the top of the calculation is 2, the upper left corner of the calculation is 0, so a at the bottom of their inside the smallest 0.

D. So the table becomes the following
Abc A B C
Abe 0 1 2 3
A 1 0
B 2 Place b
E 3

At B will also get three values, the left side of the calculation is 3, the top is calculated as 1, at B because the corresponding characters are a, B, unequal, so the upper left corner should be on the basis of the current value plus 1, so get 1+1=2, in (3,1,2) selected the smallest value of B.

E. The table is updated

Abc A B C
Abe 0 1 2 3
A 1 0
B 2 1
E 3 At c

C After calculation: The value above is 2, the left value is 4, the upper left corner: A and E are not the same, so add 1, that is 2+1, the upper left corner is 3.

In (2,4,3), take the smallest value at C.

F. Then, in turn, push
A B C
0 1 2 3
A 1 Place a 0 D at 1 G at 2
B 2 Place B 1 E at 0 H at 1
E 3 C at 2 Place F 1 I at 1

I: For ABC and Abe there are 1 actions that need to be edited. This needs to be calculated.

At the same time, some additional information is obtained.

A: Indicates that a and a need to have 0 operations. string-like

B: Indicates that AB and a need to have 1 operations.

C: Indicates that Abe and a need to have 2 operations.

D: Indicates that A and AB need to have 1 operations.

E: Indicates that AB and AB need to have 0 operations. string-like

F: Indicates that Abe and AB need to have 1 operations.

G: Indicates that A and ABC require 2 operations.

H: Indicates that AB and ABC need to have 1 operations.

I: There are 1 operations required to represent Abe and ABC.

G. Calculation of similarity

First take the maximum value of two string length maxlen, with 1-(need to manipulate the number of maxlen), to obtain the similarity degree.

For example, ABC and Abe an operation with a length of 3, so the similarity is 1/3=0.333.

Computation of string similarity algorithm--levenshtein

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.