A Daily Walkthrough of Classic Algorithm Problems, Problem 5: String Similarity
This time we look at a variant of the longest common subsequence problem: finding string similarity (edit distance). As I have said before, this is a very practical algorithm, useful in areas such as DNA comparison and web page clustering.
One: Concept
For two strings A and B, we change A into B (or B into A) through basic operations: inserting, deleting, or replacing a single character. The minimum number of steps needed for this change is called the "edit distance".
For example, for the two strings in the original illustration, working through the various operations gives an edit distance of 3. Can you see it?
Two: Analysis
This may look a bit complicated and hard to follow, so let's split the big problem. We break "string vs string" down into "character vs string", and then break that down into "character vs character".
<1> "character" vs "character"
This is the simplest case. For example, the edit distance between "A" and "B" is obviously 1.
<2> "character" vs "string"
The edit distance between "A" and "AB" is 1, and the edit distance between "A" and "ABA" is 2.
<3> "string" vs "string"
The edit distance between "ABA" and "BBA" is 1. Looking closely, we can conclude that it is the minimum taken from the set of edit distances between the 2^3 subsequences of "ABA" and the string "BBA". This means we run into repeated computation: to find the edit distance between the subsequence "AB" and "BBA", I take the minimum over the edit distances between the subsequence "A" and "BBA" and between "B" and "BBA", yet I have already computed those for "A" and "B" earlier. This repeated computation is a bit like "Fibonacci", and it fits exactly the "optimal substructure" and "overlapping subproblems" of dynamic programming, so we decide to solve the problem with dynamic programming.
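To make the repeated computation concrete, here is a minimal sketch that solves the problem by plain recursion on prefixes and counts how many calls it makes, next to the same recursion with a cache. It assumes the three operations (insert, delete, replace) each cost 1, as in the code later in this article. The names NaiveEditDistance, Recursive, Memoized and Calls are made up for illustration and are not part of the original program.

```csharp
using System;

public static class NaiveEditDistance
{
    // Number of calls made since the counter was last reset (illustration only).
    public static long Calls;

    // Plain recursion: edit distance between the first i characters of a
    // and the first j characters of b. The same (i, j) pair is solved many times.
    public static int Recursive(string a, int i, string b, int j)
    {
        Calls++;
        if (i == 0) return j; // empty prefix of a: insert j characters
        if (j == 0) return i; // empty prefix of b: delete i characters

        if (a[i - 1] == b[j - 1])
            return Recursive(a, i - 1, b, j - 1);

        int best = Math.Min(Recursive(a, i - 1, b, j - 1),
                   Math.Min(Recursive(a, i - 1, b, j), Recursive(a, i, b, j - 1)));
        return best + 1;
    }

    // Same recursion, but each (i, j) result is stored in the cache and reused.
    // The cache must have size (a.Length + 1) by (b.Length + 1).
    public static int Memoized(string a, int i, string b, int j, int?[,] cache)
    {
        Calls++;
        if (i == 0) return j;
        if (j == 0) return i;
        if (cache[i, j].HasValue) return cache[i, j].Value;

        int result;
        if (a[i - 1] == b[j - 1])
            result = Memoized(a, i - 1, b, j - 1, cache);
        else
            result = 1 + Math.Min(Memoized(a, i - 1, b, j - 1, cache),
                         Math.Min(Memoized(a, i - 1, b, j, cache), Memoized(a, i, b, j - 1, cache)));

        cache[i, j] = result;
        return result;
    }
}
```

Resetting Calls to 0 before each call and comparing the counter afterwards shows the effect: both methods return the same distance, but the cached version never makes more calls than the plain one, and the gap widens quickly as the strings get longer. The table-filling approach in the following sections does the same work bottom-up.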
Three: Formula
As with the longest common subsequence, we use a two-dimensional array to hold the minimum edit distance for the current positions in the strings X and Y.
Given two sequences X = {x1, x2, x3, ..., xi} and Y = {y1, y2, y3, ..., yj},
let c[i,j] hold the current minimum LD (edit distance) between the first i characters of X and the first j characters of Y.
Boundary: c[i,0] = i and c[0,j] = j;
①: when xi = yj, then c[i,j] = c[i-1,j-1];
②: when xi != yj, then c[i,j] = min{c[i-1,j-1], c[i-1,j], c[i,j-1]} + 1;
In the end, c[i,j] at the full lengths of X and Y holds the minimum LD.
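As a worked example of the recurrence, the sketch below fills the whole table for two strings and renders it as text, so each c[i,j] can be checked by hand. It adds 1 in the unequal case and uses c[i,0] = i and c[0,j] = j as boundary values, matching the code in the next section. The class and method names (EditDistanceTable, Build, Render) are made up for illustration.

```csharp
using System;
using System.Text;

public static class EditDistanceTable
{
    // Builds the (x.Length + 1) by (y.Length + 1) table c, where c[i, j] is the
    // edit distance between the first i characters of x and the first j of y.
    public static int[,] Build(string x, string y)
    {
        var c = new int[x.Length + 1, y.Length + 1];

        // Boundary values: turning a prefix into the empty string and back.
        for (int i = 0; i <= x.Length; i++) c[i, 0] = i;
        for (int j = 0; j <= y.Length; j++) c[0, j] = j;

        for (int i = 1; i <= x.Length; i++)
        {
            for (int j = 1; j <= y.Length; j++)
            {
                if (x[i - 1] == y[j - 1])
                    c[i, j] = c[i - 1, j - 1];
                else
                    c[i, j] = Math.Min(c[i - 1, j - 1], Math.Min(c[i - 1, j], c[i, j - 1])) + 1;
            }
        }
        return c;
    }

    // Renders the table as text: one row per line, values separated by spaces.
    public static string Render(int[,] c)
    {
        var sb = new StringBuilder();
        for (int i = 0; i < c.GetLength(0); i++)
        {
            for (int j = 0; j < c.GetLength(1); j++)
            {
                if (j > 0) sb.Append(' ');
                sb.Append(c[i, j]);
            }
            sb.Append('\n');
        }
        return sb.ToString();
    }
}
```

For X = "ABA" and Y = "BBA", Console.Write(EditDistanceTable.Render(EditDistanceTable.Build("ABA", "BBA"))) prints the rows "0 1 2 3", "1 1 2 2", "2 1 1 2" and "3 2 2 1". The bottom-right value, 1, is the edit distance mentioned in the analysis above.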
Four: Code
using System;

namespace ConsoleApplication2
{
    public class Program
    {
        static int[,] martix;

        static string str1 = string.Empty;

        static string str2 = string.Empty;

        static void Main(string[] args)
        {
            while (true)
            {
                str1 = Console.ReadLine();

                str2 = Console.ReadLine();

                // Console.ReadLine returns null at end of input; stop instead of crashing
                if (str1 == null || str2 == null)
                    break;

                martix = new int[str1.Length + 1, str2.Length + 1];

                Console.WriteLine("The edit distance between the strings {0} and {1} is: {2}\n", str1, str2, LD());
            }
        }

        /// <summary>
        /// Calculates the edit distance between str1 and str2
        /// </summary>
        /// <returns></returns>
        public static int LD()
        {
            // Initialize the boundary values (so the main loop can ignore boundary conditions)
            for (int i = 0; i <= str1.Length; i++)
            {
                martix[i, 0] = i;
            }

            for (int j = 0; j <= str2.Length; j++)
            {
                martix[0, j] = j;
            }

            // The X coordinate of the matrix
            for (int i = 1; i <= str1.Length; i++)
            {
                // The Y coordinate of the matrix
                for (int j = 1; j <= str2.Length; j++)
                {
                    // Equal case
                    if (str1[i - 1] == str2[j - 1])
                    {
                        martix[i, j] = martix[i - 1, j - 1];
                    }
                    else
                    {
                        // Take the minimum of the "upper-left", "top", and "left" cells
                        var temp1 = Math.Min(martix[i - 1, j], martix[i, j - 1]);

                        // Get the minimum value
                        var min = Math.Min(temp1, martix[i - 1, j - 1]);

                        martix[i, j] = min + 1;
                    }
                }
            }

            // Return the edit distance of the strings
            return martix[str1.Length, str2.Length];
        }
    }
}