Edit distance explained: the Levenshtein distance algorithm

Source: Internet
Author: User

Algorithm fundamentals: Let d[i, j] (a two-dimensional array can hold these values) be the minimum number of steps required to convert the string s[1..i] into the string t[1..j]. In the most basic cases, when i equals 0 the prefix of s is empty, so d[0, j] = j: we insert j characters to turn s into t. When j equals 0 the prefix of t is empty, so d[i, 0] = i: we delete i characters to turn s into t.
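The base cases can be sketched in C# as follows. This is only an illustration: the class name EditDistanceBaseCases and the method name InitTable are assumptions made for this example and do not come from the original code.

```csharp
public static class EditDistanceBaseCases
{
    // Builds the (n+1) x (m+1) table and fills in only the base cases:
    //   d[i, 0] = i  (delete i characters to reach the empty string)
    //   d[0, j] = j  (insert j characters into the empty string)
    // All other cells are left at 0 and would be filled by the recurrence.
    public static int[,] InitTable(string s, string t)
    {
        int n = s.Length;
        int m = t.Length;
        var d = new int[n + 1, m + 1];

        for (int i = 0; i <= n; i++)
        {
            d[i, 0] = i;
        }
        for (int j = 0; j <= m; j++)
        {
            d[0, j] = j;
        }
        return d;
    }
}
```

For strings of lengths 3 and 2, for example, the table has 4 rows and 3 columns, with d[3, 0] = 3 and d[0, 2] = 2.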

Now consider the general case, using the idea of dynamic programming. To convert s[1..i] into t[1..j] with the fewest insert, delete, or replace operations, we must already have reached, also with the fewest operations, a state from which at most one more operation completes the conversion of s[1..i] into t[1..j]. That earlier state falls into one of the following three cases:

1) We can convert s[1..i] to t[1..j-1] in k operations.

2) We can convert s[1..i-1] to t[1..j] in k operations.

3) We can convert s[1..i-1] to t[1..j-1] in k operations.

For case 1, we only need to append t[j] to the end of the result to complete the match, so k+1 operations are required in total.

For case 2, we only need to delete s[i] from the end and then perform the k operations, so k+1 operations are required in total.

For case 3, we only need to replace s[i] with t[j] at the end so that s[1..i] == t[1..j], for k+1 operations in total. If s[i] already equals t[j], no replacement is needed and k operations suffice.

Finally, to guarantee that the number of operations is minimal, we take the smallest of the three cases above as the number of operations required to convert s[1..i] to t[1..j].
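The three cases translate directly into a recursive definition. The following is a minimal C# sketch of that recurrence with memoization, assuming the hypothetical names EditDistanceTopDown, Distance, and Solve. It is a top-down variant written for illustration, not the table-filling form used in the rest of this article.

```csharp
using System;

public static class EditDistanceTopDown
{
    public static int Distance(string s, string t)
    {
        // memo[i, j] caches d[i, j]; -1 marks "not computed yet".
        var memo = new int[s.Length + 1, t.Length + 1];
        for (int i = 0; i <= s.Length; i++)
            for (int j = 0; j <= t.Length; j++)
                memo[i, j] = -1;
        return Solve(s, t, s.Length, t.Length, memo);
    }

    // Returns d[i, j]: the minimum number of operations that turn
    // the first i characters of s into the first j characters of t.
    private static int Solve(string s, string t, int i, int j, int[,] memo)
    {
        if (i == 0) return j; // empty prefix of s: insert j characters
        if (j == 0) return i; // empty prefix of t: delete i characters
        if (memo[i, j] >= 0) return memo[i, j];

        // Case 1: s[1..i] -> t[1..j-1], then append t[j].
        int insertion = Solve(s, t, i, j - 1, memo) + 1;
        // Case 2: delete s[i], then s[1..i-1] -> t[1..j].
        int deletion = Solve(s, t, i - 1, j, memo) + 1;
        // Case 3: s[1..i-1] -> t[1..j-1], then replace s[i] with t[j]
        // only if the two characters differ (strings are 0-indexed in C#).
        int cost = s[i - 1] == t[j - 1] ? 0 : 1;
        int replacement = Solve(s, t, i - 1, j - 1, memo) + cost;

        int best = Math.Min(insertion, Math.Min(deletion, replacement));
        memo[i, j] = best;
        return best;
    }
}
```

The recursion depth grows with the combined length of the two strings, so this form is only suitable for short inputs.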

Basic steps of the algorithm:

(1) Construct a matrix with n+1 rows and m+1 columns to hold the number of operations required for each conversion. The number of operations required to convert the string s[1..n] into the string t[1..m] is the value of matrix[n][m];

(2) Initialize the first row of the matrix to 0 through m and the first column to 0 through n.

matrix[0][j] is the value in row 1, column j+1, and represents the number of operations required to convert the empty string s[1..0] into t[1..j]. Converting an empty string into a string of length j clearly takes j insert operations, so the value of matrix[0][j] is j. The other initial values follow in the same way.

(3) Scan each character s[i] of s, for i from 1 to n;

(4) Scan each character t[j] of t, for j from 1 to m;

(5) Compare the characters of s and t pairwise: if s[i] equals t[j] the cost is 0, otherwise the cost is 1 (this cost is used in step 6);

(6) a. If we can convert s[1..i-1] to t[1..j] in k operations, that is d[i-1, j] = k, then we can delete s[i] and perform those k operations, for k+1 operations in total.

b. If we can convert s[1..i] to t[1..j-1] in k operations, that is d[i, j-1] = k, then we can append t[j] to the result, for k+1 operations in total.

c. If we can convert s[1..i-1] to t[1..j-1] in k operations, that is d[i-1, j-1] = k, then we can replace s[i] with t[j] so that s[1..i] == t[1..j], for k + cost operations in total. (The cost is added here because if s[i] already equals t[j] no replacement is needed and k operations suffice; otherwise one more replacement is needed, giving k+1 operations.)

Because we want the minimum number of operations, we compare the three cases and take the smallest value as d[i, j];

(7) Repeat steps 3, 4, 5, and 6; the final result is in d[n, m].

Illustration:

The process is illustrated step by step below:

Step 1: Initialize the matrix as follows

Step 2: Start with the first character ("J") of the source string and compare it with each character of the target string in turn, from top to bottom

The value of each cell is the smallest of three candidates: the value to its left plus 1, the value above it plus 1, and the value to its upper left plus the cost, where the cost is 0 if the two characters are equal and 1 if they are not;

In the first comparison, the first character "J" of the source string is compared with the "J" of the target string. The upper-left value is 0, and because the two characters are equal, 0 is added to it. The left and upper values are both 1 and become 2 after adding 1, so the minimum, 0, is written to the cell. The scan then continues down the target string, comparing "J" → "E", "J" → "R", "J" → "R", "J" → "Y".

Step 3: Traverse the entire source string, comparing each character against the target string:

Step 4: After the last column is scanned, the last cell holds the edit distance:
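Since the matrices for these steps are not reproduced here, the following C# sketch shows one way to rebuild and print such a matrix. The class and method names (EditDistanceTrace, BuildMatrix, Format) are assumptions made for this example. The target string "JERRY" follows the characters named in the steps above; the source string "JARY" is a hypothetical stand-in, because the text only states that the source begins with "J".

```csharp
using System;
using System.Text;

public static class EditDistanceTrace
{
    // d[i, j] = edit distance between the first i characters of source
    // and the first j characters of target.
    public static int[,] BuildMatrix(string source, string target)
    {
        int n = source.Length, m = target.Length;
        var d = new int[n + 1, m + 1];
        for (int i = 0; i <= n; i++) d[i, 0] = i;
        for (int j = 0; j <= m; j++) d[0, j] = j;

        for (int i = 1; i <= n; i++)
        {
            for (int j = 1; j <= m; j++)
            {
                int cost = source[i - 1] == target[j - 1] ? 0 : 1;
                // In the printed layout (source across, target down):
                int left = d[i - 1, j] + 1;
                int top = d[i, j - 1] + 1;
                int topLeft = d[i - 1, j - 1] + cost;
                d[i, j] = Math.Min(left, Math.Min(top, topLeft));
            }
        }
        return d;
    }

    // Renders the matrix with one column per source character and
    // one row per target character, matching the scan order above.
    public static string Format(string source, string target)
    {
        int[,] d = BuildMatrix(source, target);
        var sb = new StringBuilder();

        // Header: blank label cell, blank empty-prefix cell, source characters.
        sb.Append("".PadLeft(3)).Append("".PadLeft(3));
        foreach (char c in source) sb.Append(c.ToString().PadLeft(3));
        sb.AppendLine();

        for (int j = 0; j <= target.Length; j++)
        {
            string label = j == 0 ? "" : target[j - 1].ToString();
            sb.Append(label.PadLeft(3));
            for (int i = 0; i <= source.Length; i++)
            {
                sb.Append(d[i, j].ToString().PadLeft(3));
            }
            sb.AppendLine();
        }
        return sb.ToString();
    }
}
```

Calling Console.Write(EditDistanceTrace.Format("JARY", "JERRY")) prints the full matrix. The cell for "J" against "J" holds 0, and the bottom-right cell holds the edit distance of the two strings.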

Once the edit distance is known, the similarity of the two strings is similarity = (max(x, y) - Levenshtein) / max(x, y), where x and y are the lengths of the source string and the target string.
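As a small worked example of the formula, the sketch below takes an already computed distance and the two string lengths. The names SimilarityExample and Similarity are assumptions made for this example, and treating two empty strings as fully similar is also an assumption, since the formula is undefined when max(x, y) is 0.

```csharp
using System;

public static class SimilarityExample
{
    // similarity = (max(x, y) - distance) / max(x, y)
    public static double Similarity(int distance, int lengthX, int lengthY)
    {
        int maxLength = Math.Max(lengthX, lengthY);
        if (maxLength == 0)
        {
            // Assumption: two empty strings count as identical.
            return 1.0;
        }
        return (double)(maxLength - distance) / maxLength;
    }
}
```

For "kitten" (length 6) and "sitting" (length 7), whose edit distance is 3, the formula gives (7 - 3) / 7, or roughly 0.571.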

The core code is as follows:

    public class LevenshteinDistance
    {
        private static LevenshteinDistance _instance = null;

        public static LevenshteinDistance Instance
        {
            get
            {
                if (_instance == null)
                {
                    // Store the new object so that later calls reuse it.
                    _instance = new LevenshteinDistance();
                }
                return _instance;
            }
        }

        // Returns the smallest of three values.
        public int LowerOfThree(int first, int second, int third)
        {
            int min = first;
            if (second < min) min = second;
            if (third < min) min = third;
            return min;
        }

        // Returns the edit distance between str1 and str2.
        public int Compare_Distance(string str1, string str2)
        {
            int n = str1.Length;
            int m = str2.Length;
            if (n == 0) return m;
            if (m == 0) return n;

            int[,] matrix = new int[n + 1, m + 1];

            // Base cases: first column 0..n, first row 0..m.
            for (int i = 0; i <= n; i++) matrix[i, 0] = i;
            for (int j = 0; j <= m; j++) matrix[0, j] = j;

            for (int i = 1; i <= n; i++)
            {
                char ch1 = str1[i - 1];
                for (int j = 1; j <= m; j++)
                {
                    char ch2 = str2[j - 1];
                    // Replacement cost: 0 if the characters match, otherwise 1.
                    int temp = ch1.Equals(ch2) ? 0 : 1;
                    matrix[i, j] = LowerOfThree(
                        matrix[i - 1, j] + 1,         // delete
                        matrix[i, j - 1] + 1,         // insert
                        matrix[i - 1, j - 1] + temp); // replace or match
                }
            }
            return matrix[n, m];
        }

        // Returns the similarity of str1 and str2 as a value from 0 to 1.
        public decimal LevenshteinDistancePercent(string str1, string str2)
        {
            int maxLength = str1.Length > str2.Length ? str1.Length : str2.Length;
            // Two empty strings are identical; this also avoids dividing by zero.
            if (maxLength == 0) return 1m;
            int val = Compare_Distance(str1, str2);
            return 1 - (decimal)val / maxLength;
        }
    }
