Edit Distance (Levenshtein Distance)

Source: Internet
Author: User
1. Concept

The edit distance between two strings, also known as the Levenshtein distance, was introduced by the Russian mathematician Vladimir Levenshtein in 1965.
The Levenshtein distance is a string metric that measures how much two strings differ. It is the minimum number of single-character edits (substitutions, insertions, or deletions) needed to change one string into the other.

2. Approach

Given a string A and a string B, find the minimum number of single-character operations that turn A into B.

Solution:

(1) Let m be the length of A and n the length of B, and build an (m+1) × (n+1) matrix dp.
(2) Let Ax be the first i characters of A and By the first j characters of B. Then dp[i][j] is the minimum edit distance between Ax and By.
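Because dp[i][j] is defined on prefixes, the definition can be written almost verbatim as a memoized recursive function. The following is a minimal Python sketch under that reading; the function name `edit_distance_td` is hypothetical, and since Python strings are 0-indexed, the character Ax[i] of the text is `a[i - 1]` here.

```python
from functools import lru_cache


def edit_distance_td(a: str, b: str) -> int:
    """Top-down (memoized) Levenshtein distance between a and b."""

    @lru_cache(maxsize=None)
    def dp(i: int, j: int) -> int:
        # dp(i, j): minimum number of edits to turn a[:i] into b[:j]
        if i == 0:
            return j  # insert all j characters of b[:j]
        if j == 0:
            return i  # delete all i characters of a[:i]
        # a[i - 1] is the i-th character (counting from 1) of a
        cost = 0 if a[i - 1] == b[j - 1] else 1
        return min(
            dp(i - 1, j - 1) + cost,  # keep or substitute the last character
            dp(i, j - 1) + 1,         # insert b[j - 1] at the end
            dp(i - 1, j) + 1,         # delete a[i - 1]
        )

    return dp(len(a), len(b))


print(edit_distance_td("cafe", "coffee"))  # 3
```

The recursion depth grows with len(a) + len(b), so this form suits short strings; the bottom-up matrix described in the text avoids that limit.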
For the characters Ax[i] and By[j] (counted from 1), there are the following cases:

1) Case 1: an empty prefix

If one of the prefixes is empty, for example i = 0:
To turn the empty Ax into By, we only need to insert the j characters of By, so the minimum edit distance is the length of By: dp[0][j] = j.
Conversely, when j = 0, all i characters of Ax must be deleted, so dp[i][0] = i.

2) Case 2: substitution

Suppose turning the first i-1 characters of A into the first j-1 characters of B takes dp[i-1][j-1] = opt operations. Then, for Ax and By:
If Ax[i] = By[j], no change is needed: dp[i][j] = dp[i-1][j-1] = opt
If Ax[i] != By[j], replace Ax[i] with By[j], which costs one more operation: dp[i][j] = dp[i-1][j-1] + 1 = opt + 1

3) Case 3: insertion

Suppose turning Ax (the first i characters of A) into the first j-1 characters of B takes dp[i][j-1] = opt operations. Then we only need to insert a character equal to By[j] at the end of Ax, which costs one more operation:
dp[i][j] = dp[i][j-1] + 1 = opt + 1

4) Case 4: deletion

Suppose turning the first i-1 characters of A into By takes dp[i-1][j] = opt operations. Then we only need to delete the last character of Ax, Ax[i], which costs one more operation:
dp[i][j] = dp[i-1][j] + 1 = opt + 1
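Using case 1 for the first row and column and taking the smallest of cases 2 to 4 for every other cell, the whole matrix can be filled bottom-up. A minimal Python sketch, assuming unit cost for every operation as in the text; the name `levenshtein` is hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Bottom-up Levenshtein distance using an (m+1) x (n+1) matrix."""
    m, n = len(a), len(b)
    # dp[i][j] = minimum edits to turn a[:i] into b[:j]
    dp = [[0] * (n + 1) for _ in range(m + 1)]

    # Case 1: one prefix is empty
    for j in range(n + 1):
        dp[0][j] = j  # j insertions
    for i in range(m + 1):
        dp[i][0] = i  # i deletions

    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            dp[i][j] = min(
                dp[i - 1][j - 1] + cost,  # Case 2: match / substitution
                dp[i][j - 1] + 1,         # Case 3: insertion
                dp[i - 1][j] + 1,         # Case 4: deletion
            )
    return dp[m][n]  # the same cell as dp[-1][-1]


print(levenshtein("cafe", "coffee"))  # 3
```

Time and memory are both O(m·n). Since each row depends only on the previous one, two rows are enough when only the distance (and not the edit sequence) is needed.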

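For every cell with i, j ≥ 1, take the smallest value among cases 2 to 4:
dp[i][j] = min(dp[i-1][j-1] + cost, dp[i][j-1] + 1, dp[i-1][j] + 1), where cost = 0 if Ax[i] = By[j] and 1 otherwise.
Filling the matrix this way up to its last cell, dp[m][n] (dp[-1][-1] in Python indexing), gives the minimum edit distance between A and B.

3. Worked example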

Take A = "cafe" and B = "coffee". The filled dp matrix is shown below; rows are the prefixes Ax of A, columns the prefixes By of B, and the first row and column stand for the empty prefix:

          c  o  f  f  e  e
       0  1  2  3  4  5  6
    c  1  0  1  2  3  4  5
    a  2  1  1  2  3  4  5
    f  3  2  2  1  2  3  4
    e  4  3  3  2  2  2  3
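To check the table, the following sketch builds the whole matrix for "cafe" and "coffee" and prints it with row and column labels. The helper names `build_table` and `print_table` are hypothetical.

```python
def build_table(a: str, b: str) -> list[list[int]]:
    """Return the full (len(a)+1) x (len(b)+1) edit-distance matrix."""
    m, n = len(a), len(b)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        for j in range(n + 1):
            if i == 0 or j == 0:
                dp[i][j] = i + j  # one prefix is empty: cost = length of the other
            else:
                cost = 0 if a[i - 1] == b[j - 1] else 1
                dp[i][j] = min(dp[i - 1][j - 1] + cost,  # match / substitute
                               dp[i][j - 1] + 1,         # insert
                               dp[i - 1][j] + 1)         # delete
    return dp


def print_table(a: str, b: str) -> None:
    """Print the matrix with the characters of a and b as row/column labels."""
    dp = build_table(a, b)
    col_labels = " " + b  # column 0 is the empty prefix of b
    print("   " + " ".join(f"{c:>2}" for c in col_labels))
    for i, row in enumerate(dp):
        row_label = a[i - 1] if i > 0 else " "  # row 0 is the empty prefix of a
        print(f"{row_label:>2} " + " ".join(f"{v:>2}" for v in row))


print_table("cafe", "coffee")
```

Its rows should match the table above, with dp[4][6] = 3 in the bottom-right corner.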

(1) When i = 0, Ax has length 0:
when j = 0, By also has length 0, so dp[0][0] = 0;
when j = 1, By has length 1, so dp[0][1] = 1;
and so on: every value in the first row (and, likewise, in the first column) equals the length of the non-empty prefix.
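Step (1) only fills the first row and the first column. A minimal sketch of that initialization for A = "cafe" and B = "coffee"; the function name `init_table` is hypothetical, and the interior cells stay at 0 until the recurrence fills them.

```python
def init_table(a: str, b: str) -> list[list[int]]:
    """Allocate the (m+1) x (n+1) matrix and fill only the base cases."""
    m, n = len(a), len(b)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for j in range(n + 1):
        dp[0][j] = j  # empty Ax -> By: j insertions
    for i in range(m + 1):
        dp[i][0] = i  # Ax -> empty By: i deletions
    return dp


dp = init_table("cafe", "coffee")
print(dp[0])                   # [0, 1, 2, 3, 4, 5, 6]
print([row[0] for row in dp])  # [0, 1, 2, 3, 4]
```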
(2) When i = 1 and j = 1, Ax[1] = c and By[1] = c. The characters are equal, so
dp[1][1] = dp[0][0] = 0
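(3) When i = 2 and j = 1, Ax[2] = a and By[1] = c. The characters differ, so we take the smallest of the three candidates: substitution gives dp[1][0] + 1 = 2, insertion gives dp[2][0] + 1 = 3, and deletion gives dp[1][1] + 1 = 1. Hence dp[2][1] = dp[1][1] + 1 = 1.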
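Step (3) can be checked by listing the three candidates for a single cell. In this sketch `cell_candidates` is a hypothetical helper that reads the already-computed neighbours from `dp`; indices follow the text, so Ax[i] is `a[i - 1]` in Python.

```python
def cell_candidates(dp, a: str, b: str, i: int, j: int) -> dict:
    """Return the three candidate values for dp[i][j] (i, j >= 1)."""
    cost = 0 if a[i - 1] == b[j - 1] else 1
    return {
        "substitute": dp[i - 1][j - 1] + cost,  # keep or replace Ax[i]
        "insert": dp[i][j - 1] + 1,             # append By[j]
        "delete": dp[i - 1][j] + 1,             # drop Ax[i]
    }


a, b = "cafe", "coffee"
# dp[2][1] depends only on dp[1][0] = 1, dp[2][0] = 2 (base cases)
# and dp[1][1] = 0 (step 2).
dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
for j in range(len(b) + 1):
    dp[0][j] = j
for i in range(len(a) + 1):
    dp[i][0] = i
dp[1][1] = 0

cands = cell_candidates(dp, a, b, 2, 1)
print(cands)                # {'substitute': 2, 'insert': 3, 'delete': 1}
print(min(cands.values()))  # 1
```

The winning candidate is the deletion: removing "a" from "ca" leaves "c", which already equals By.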
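(4) Continuing in the same way fills the whole matrix. The last cell gives a minimum edit distance of 3, for example: cafe → caffe (insert f) → coffe (substitute a with o) → coffee (insert e).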
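The edit sequence itself can be recovered by walking back from dp[m][n] to dp[0][0] and recording which case produced each cell. A minimal sketch; `edit_script` and `apply_ops` are hypothetical names, and when several cases tie, this version prefers match/substitution, then insertion, then deletion, so it may return a different but equally short sequence than the one in step (4).

```python
def edit_script(a: str, b: str) -> list[tuple[str, int, str]]:
    """Return one minimum list of (kind, position, char) edits turning a into b."""
    m, n = len(a), len(b)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        for j in range(n + 1):
            if i == 0 or j == 0:
                dp[i][j] = i + j
            else:
                cost = 0 if a[i - 1] == b[j - 1] else 1
                dp[i][j] = min(dp[i - 1][j - 1] + cost,
                               dp[i][j - 1] + 1,
                               dp[i - 1][j] + 1)

    # Walk back from the bottom-right cell; ops are collected right to left,
    # so positions always refer to the still-unedited prefix a[:i].
    ops = []
    i, j = m, n
    while i > 0 or j > 0:
        if i > 0 and j > 0 and dp[i][j] == dp[i - 1][j - 1] + (a[i - 1] != b[j - 1]):
            if a[i - 1] != b[j - 1]:
                ops.append(("sub", i - 1, b[j - 1]))  # Case 2
            i, j = i - 1, j - 1
        elif j > 0 and dp[i][j] == dp[i][j - 1] + 1:
            ops.append(("ins", i, b[j - 1]))          # Case 3
            j -= 1
        else:
            ops.append(("del", i - 1, ""))            # Case 4
            i -= 1
    return ops


def apply_ops(a: str, ops) -> list[str]:
    """Apply the edits in order and return every intermediate string."""
    steps, s = [a], a
    for kind, pos, ch in ops:
        if kind == "ins":
            s = s[:pos] + ch + s[pos:]
        elif kind == "sub":
            s = s[:pos] + ch + s[pos + 1:]
        else:  # "del"
            s = s[:pos] + s[pos + 1:]
        steps.append(s)
    return steps


print(apply_ops("cafe", edit_script("cafe", "coffee")))
# ['cafe', 'cafee', 'cffee', 'coffee']
```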
