First, the concept
The edit distance between strings, also known as the Levenshtein distance, was proposed by the Russian mathematician Vladimir Levenshtein in 1965.
The Levenshtein distance is a string metric that measures the degree of difference between two strings. It is the minimum number of single-character edits (modify, insert, or delete) required to change one string into the other.
Second, the idea
Suppose there are two strings, A and B. Find the minimum number of single-character operations needed to turn A into B. Solution:
(1) Assuming string A has length m and string B has length n, build a matrix dp of size (m+1) * (n+1)
(2) ax is the first i characters of string A, by is the first j characters of string B, and dp[i][j] is the minimum edit distance between ax and by
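As a minimal sketch of step (1), the table could be allocated as below. The helper name make_table is hypothetical and not part of the original text; the extra row and column stand for the empty prefixes (i = 0 and j = 0).

```python
def make_table(a: str, b: str):
    """Allocate an (m+1) x (n+1) table of zeros for strings a and b."""
    m, n = len(a), len(b)
    # Build every row separately. Writing [[0] * (n + 1)] * (m + 1) would
    # repeat a reference to the same row, so one write would change all rows.
    return [[0] * (n + 1) for _ in range(m + 1)]


dp = make_table("abc", "de")
print(len(dp), len(dp[0]))
```

Here dp[i][j] is meant to hold the distance between a[:i] and b[:j], so Python's 0-based a[i - 1] corresponds to the 1-based ax[i] used in the text.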
For the characters ax[i] and by[j] (the i-th character of A and the j-th character of B, counting from 1), there are several cases:
1) Case 1: empty string
Suppose string A or string B has length 0. For example, when i = 0:
Converting string A to string B only requires inserting the characters of B into A, so the minimum edit distance is the length of by, that is, dp[0][j] = j
Conversely, when j = 0, every character of ax has to be deleted, so dp[i][0] = i
2) Case 2: modify
Suppose converting ax[0:i-1] to by[0:j-1] requires dp[i-1][j-1] = opt character operations. Then, for the strings ax and by:
If ax[i] = by[j], no modification is needed: dp[i][j] = dp[i-1][j-1] = opt
If ax[i] != by[j], modify ax[i] to by[j], which adds 1 operation: dp[i][j] = dp[i-1][j-1] + 1 = opt + 1
3) Case 3: insert
Suppose converting ax[0:i] to by[0:j-1] requires dp[i][j-1] = opt character operations. Then you only need to insert a character equal to by[j] at the end of ax, which adds 1 operation:
dp[i][j] = dp[i][j-1] + 1 = opt + 1
4) Case 4: delete
Suppose converting ax[0:i-1] to by[0:j] requires dp[i-1][j] = opt character operations. Then you only need to delete ax[i], the last character of ax, which adds 1 operation:
dp[i][j] = dp[i-1][j] + 1 = opt + 1
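The cases above can be put together in a short sketch. The function name levenshtein is an assumption made for illustration; the code uses Python's 0-based indexing, so a[i - 1] plays the role of ax[i], and each cell takes the smallest of the modify, insert, and delete candidates.

```python
def levenshtein(a: str, b: str) -> int:
    """Return the minimum number of single-character edits turning a into b."""
    m, n = len(a), len(b)
    dp = [[0] * (n + 1) for _ in range(m + 1)]

    # Case 1: one of the prefixes is empty.
    for i in range(m + 1):
        dp[i][0] = i  # delete all i characters of a[:i]
    for j in range(n + 1):
        dp[0][j] = j  # insert all j characters of b[:j]

    for i in range(1, m + 1):
        for j in range(1, n + 1):
            # Case 2: keep the character if it matches, otherwise modify it.
            cost = 0 if a[i - 1] == b[j - 1] else 1
            modify = dp[i - 1][j - 1] + cost
            insert = dp[i][j - 1] + 1   # Case 3
            delete = dp[i - 1][j] + 1   # Case 4
            dp[i][j] = min(modify, insert, delete)

    return dp[-1][-1]


print(levenshtein("cafe", "coffee"))
```

This version keeps the whole matrix to stay close to the description; since each row only depends on the previous one, two rows would be enough if memory matters.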
Among the modify, insert, and delete candidates above, take the smallest one as dp[i][j]. Repeat this for every cell up to the last position of the dp matrix; that value, dp[-1][-1] (that is, dp[m][n]), is the minimum edit distance.
Third, in practice
Suppose string a = 'cafe' and string b = 'coffee'. The minimum edit distance table is as follows:
-  | by | c | o | f | f | e | e
Ax | 0  | 1 | 2 | 3 | 4 | 5 | 6
c  | 1  | 0 | 1 | 2 | 3 | 4 | 5
a  | 2  | 1 | 1 | 2 | 3 | 4 | 5
f  | 3  | 2 | 2 | 1 | 2 | 3 | 4
e  | 4  | 3 | 3 | 2 | 2 | 2 | 3
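The table above can be reproduced with a small sketch like the following. The helper build_table is a hypothetical name; the first row and column, labelled Ax and by, stand for the empty prefixes.

```python
def build_table(a: str, b: str):
    """Fill and return the full (m+1) x (n+1) edit distance table."""
    m, n = len(a), len(b)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i
    for j in range(n + 1):
        dp[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j - 1] + cost,  # modify or keep
                           dp[i][j - 1] + 1,         # insert
                           dp[i - 1][j] + 1)         # delete
    return dp


a, b = "cafe", "coffee"
table = build_table(a, b)

# Print a header row for b, then one row per prefix of a.
print(" | ".join(["-", "by"] + list(b)))
for label, row in zip(["Ax"] + list(a), table):
    print(" | ".join([label] + [str(value) for value in row]))
```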
(1) When i = 0, ax has length 0:
when j = 0, by has length 0, so dp[0][0] = 0;
when j = 1, by has length 1, so dp[0][1] = 1;
and so on: the dp value is the length of by
(2) When i = 1, j = 1: ax[1] = c and by[1] = c. The two characters are equal, so
dp[1][1] = dp[0][0] = 0
(3) When i = 2, j = 1: ax[2] = a and by[1] = c. The two characters are not equal.
From the table, the smallest candidate is the deletion: dp[2][1] = dp[1][1] + 1 = 1, which is the minimum edit distance for this cell
(4) And so on, until dp[4][6] = 3. The minimum edit distance is 3, for example: cafe→caffe→coffe→coffee
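One way to recover an actual sequence of edits is to walk back from the last cell of the table to dp[0][0], at each step checking which case produced the value. The sketch below is an illustration under that assumption; the function edit_script and the tuple format of its result are hypothetical. Several shortest sequences can exist, so the one it finds may differ from the sequence shown above while having the same length.

```python
def edit_script(a: str, b: str):
    """Return one shortest list of edits that turns a into b."""
    m, n = len(a), len(b)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i
    for j in range(n + 1):
        dp[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j - 1] + cost,
                           dp[i][j - 1] + 1,
                           dp[i - 1][j] + 1)

    ops = []
    i, j = m, n
    while i > 0 or j > 0:
        if (i > 0 and j > 0 and a[i - 1] == b[j - 1]
                and dp[i][j] == dp[i - 1][j - 1]):
            i, j = i - 1, j - 1                      # characters match, no edit
        elif i > 0 and j > 0 and dp[i][j] == dp[i - 1][j - 1] + 1:
            ops.append(("modify", a[i - 1], b[j - 1]))
            i, j = i - 1, j - 1
        elif j > 0 and dp[i][j] == dp[i][j - 1] + 1:
            ops.append(("insert", b[j - 1]))
            j -= 1
        else:
            ops.append(("delete", a[i - 1]))
            i -= 1
    ops.reverse()  # edits were collected from the end of the strings
    return ops


for op in edit_script("cafe", "coffee"):
    print(op)
```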