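Levenshtein distance, also known as the minimum edit distance, is the smallest number of single-character edits (insertions, deletions, or substitutions) needed to turn one string into another. The algorithm uses dynamic programming because the problem has optimal substructure: the minimum edit distance of two strings is built from the minimum edit distances of their prefixes. Let s1 = x_1…x_m and s2 = y_1…y_n, and let d[i, j] be the edit distance between the first i characters of s1 and the first j characters of s2. Then, with d[i, 0] = i and d[0, j] = j,

$$
d[i,j] = \min\bigl(d[i-1,j] + 1,\; d[i,j-1] + 1,\; d[i-1,j-1] + c_{ij}\bigr),
\qquad
c_{ij} = \begin{cases} 0, & x_i = y_j \\ 1, & x_i \neq y_j \end{cases}
$$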
Here, d[i-1, j] + 1 corresponds to deleting x_i from s1 (equivalently, inserting it into s2), and d[i, j-1] + 1 corresponds to inserting y_j into s1 (equivalently, deleting it from s2). The diagonal term aligns x_i with y_j: when x_i = y_j there is no cost, so the price is the same as the previous step d[i-1, j-1]; otherwise a substitution adds 1. d[i, j] is the smallest of these three values.
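To make the recurrence concrete, here is a minimal sketch that transcribes it almost literally as a memoized recursive function. The class `LevenshteinRecurrence` and its method `Distance` are hypothetical names introduced for illustration; s1 and s2 follow the formula's notation, and x_i is stored at index i - 1 because C# strings are zero-indexed.

```csharp
using System;

// Hypothetical helper: a direct top-down transcription of the recurrence,
// caching d[i, j] so each subproblem is solved only once.
public static class LevenshteinRecurrence
{
    public static int Distance(string s1, string s2)
    {
        // memo[i, j] caches d[i, j]; -1 means "not computed yet".
        var memo = new int[s1.Length + 1, s2.Length + 1];
        for (int i = 0; i <= s1.Length; i++)
            for (int j = 0; j <= s2.Length; j++)
                memo[i, j] = -1;
        return D(s1, s2, s1.Length, s2.Length, memo);
    }

    // d[i, j]: edit distance between the first i chars of s1 and the first j chars of s2.
    private static int D(string s1, string s2, int i, int j, int[,] memo)
    {
        if (i == 0) return j;   // d[0, j] = j: insert all j characters
        if (j == 0) return i;   // d[i, 0] = i: delete all i characters
        if (memo[i, j] >= 0) return memo[i, j];

        int cost = s1[i - 1] == s2[j - 1] ? 0 : 1;   // c_ij: 0 when x_i = y_j
        int result = Math.Min(
            Math.Min(D(s1, s2, i - 1, j, memo) + 1,     // delete x_i
                     D(s1, s2, i, j - 1, memo) + 1),    // insert y_j
            D(s1, s2, i - 1, j - 1, memo) + cost);      // match or substitute
        memo[i, j] = result;
        return result;
    }
}
```

For example, `LevenshteinRecurrence.Distance("kitten", "sitting")` should return 3. The recursion depth grows with the combined length of the strings, so this form is mainly useful for checking the formula on short inputs; the table-filling implementation avoids recursion entirely.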
Algorithm implementation (C#):
Suppose the two strings are source and target, with lengths columnSize and rowSize respectively. First allocate a (rowSize + 1) × (columnSize + 1) matrix, with one row per target character and one column per source character, then initialize its first row and first column: matrix[0, j] = j and matrix[i, 0] = i. Next, fill in the remaining elements row by row using the formula. When the loops finish, the edit distance between the two strings is matrix[rowSize, columnSize]. The code is as follows:
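As a small worked example (the strings are chosen for illustration and do not come from the original), take source = "cat" and target = "cut". Columns follow the source characters and rows follow the target characters, matching the layout described above; ε marks the empty prefix, so the first row and first column hold the initial values j and i.

$$
\begin{array}{c|cccc}
 & \varepsilon & c & a & t \\ \hline
\varepsilon & 0 & 1 & 2 & 3 \\
c & 1 & 0 & 1 & 2 \\
u & 2 & 1 & 1 & 2 \\
t & 3 & 2 & 2 & 1
\end{array}
$$

The bottom-right entry, matrix[3, 3] = 1, is the edit distance: a single substitution of a with u.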
    using System;

    public class StringComparator
    {
        public static int LevenshteinDistance(string source, string target)
        {
            int columnSize = source.Length;
            int rowSize = target.Length;
            if (columnSize == 0) return rowSize;
            if (rowSize == 0) return columnSize;

            // Rows follow target, columns follow source.
            int[,] matrix = new int[rowSize + 1, columnSize + 1];
            for (int i = 0; i <= columnSize; i++)
            {
                matrix[0, i] = i;   // first row
            }
            for (int j = 1; j <= rowSize; j++)
            {
                matrix[j, 0] = j;   // first column
            }

            for (int i = 0; i < rowSize; i++)
            {
                for (int j = 0; j < columnSize; j++)
                {
                    // Cost of aligning the two characters: 0 if equal, otherwise 1.
                    int sign = source[j].Equals(target[i]) ? 0 : 1;
                    matrix[i + 1, j + 1] = Math.Min(
                        Math.Min(matrix[i, j] + sign,      // match or substitute
                                 matrix[i + 1, j] + 1),    // one edit from the cell to the left
                        matrix[i, j + 1] + 1);             // one edit from the cell above
                }
            }
            return matrix[rowSize, columnSize];
        }

        public static float StringSimilarity(string source, string target)
        {
            int distance = LevenshteinDistance(source, target);
            float maxLength = Math.Max(source.Length, target.Length);
            if (maxLength == 0) return 1f;   // two empty strings are identical
            return (maxLength - distance) / maxLength;
        }
    }
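Because each row of the matrix depends only on the row directly above it, the full (rowSize + 1) × (columnSize + 1) table is not strictly necessary. The sketch below is a common space-saving variant, not part of the original code: it keeps just two rows, so memory drops to O(columnSize) while the recurrence and result stay the same. The class name `StringComparatorTwoRow` is hypothetical; the similarity method mirrors `StringSimilarity` above.

```csharp
using System;

// Hypothetical variant of StringComparator: same recurrence, but only the
// previous and current matrix rows are kept in memory.
public static class StringComparatorTwoRow
{
    public static int LevenshteinDistance(string source, string target)
    {
        int columnSize = source.Length;
        int rowSize = target.Length;

        // prev holds row i - 1 of the matrix; it starts as row 0: matrix[0, j] = j.
        var prev = new int[columnSize + 1];
        var curr = new int[columnSize + 1];
        for (int j = 0; j <= columnSize; j++) prev[j] = j;

        for (int i = 1; i <= rowSize; i++)
        {
            curr[0] = i;   // first column: matrix[i, 0] = i
            for (int j = 1; j <= columnSize; j++)
            {
                int sign = source[j - 1] == target[i - 1] ? 0 : 1;
                curr[j] = Math.Min(Math.Min(prev[j - 1] + sign,  // match or substitute
                                            curr[j - 1] + 1),    // from the cell to the left
                                   prev[j] + 1);                 // from the cell above
            }
            // The row just computed becomes the previous row for the next iteration.
            (prev, curr) = (curr, prev);
        }
        // After the final swap, prev holds the last row of the matrix.
        return prev[columnSize];
    }

    // Same normalization as StringSimilarity: (maxLength - distance) / maxLength.
    public static float StringSimilarity(string source, string target)
    {
        int maxLength = Math.Max(source.Length, target.Length);
        if (maxLength == 0) return 1f;   // two empty strings are identical
        int distance = LevenshteinDistance(source, target);
        return (maxLength - distance) / (float)maxLength;
    }
}
```

With this sketch, `StringComparatorTwoRow.LevenshteinDistance("kitten", "sitting")` should return 3, and `StringSimilarity("cat", "cut")` should give 2/3 (one edit over a maximum length of 3). The trade-off is that the full matrix is no longer available, so this variant cannot be used to trace back the actual sequence of edits.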
C# implementation of the Levenshtein distance (minimum edit distance) algorithm