Reference: V_july_v
Definition of the longest common subsequence:
Note the difference between the longest common substring and the longest common subsequence (LCS): a substring is a contiguous part of a string, whereas a subsequence is a new sequence obtained by deleting any number of elements from a sequence without changing the order of the remaining elements. Put more simply, the characters of a substring must occupy consecutive positions in the original string, while the characters of a subsequence need not. For example, the longest common substring of the strings ACDFG and AKDFC is DF, while their longest common subsequence is ADF. The LCS can be found using dynamic programming.
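The distinction can be checked on the example strings with a minimal Java sketch. The class name SubstringVsSubsequence and the helper isSubsequence are hypothetical names introduced here for illustration; they are not from the original article.

```java
public class SubstringVsSubsequence {
    // Hypothetical helper: returns true if s can be obtained from t
    // by deleting zero or more characters without reordering the rest.
    static boolean isSubsequence(String s, String t) {
        int i = 0; // index of the next character of s still to be matched
        for (int j = 0; j < t.length() && i < s.length(); j++) {
            if (s.charAt(i) == t.charAt(j)) {
                i++;
            }
        }
        return i == s.length();
    }

    public static void main(String[] args) {
        String a = "ACDFG";
        String b = "AKDFC";
        // DF is contiguous in both strings, so it is a common substring.
        System.out.println(a.contains("DF") && b.contains("DF"));
        // ADF appears in order in both strings, so it is a common subsequence.
        System.out.println(isSubsequence("ADF", a) && isSubsequence("ADF", b));
        // ADF is not contiguous in ACDFG, so it is not a substring of it.
        System.out.println(a.contains("ADF"));
    }
}
```

String.contains tests for a contiguous match, which is why it serves as the substring check here, while the subsequence check has to skip over unmatched characters.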
Approach to solving the LCS problem (DP algorithm):
1. The longest common subsequence problem has the optimal substructure property.
Notation:
Xi = <x1, ..., xi> is the prefix made up of the first i characters of the sequence X (1≤i≤m)
Yj = <y1, ..., yj> is the prefix made up of the first j characters of the sequence Y (1≤j≤n)
Assume Z = <z1, ..., zk> ∈ LCS(X, Y).
If xm = yn (the last characters are the same), it is not difficult to prove by contradiction that this character must be the last character of every longest common subsequence Z (of length k) of X and Y, that is, zk = xm = yn. Clearly Zk-1 ∈ LCS(Xm-1, Yn-1), i.e. the prefix Zk-1 of Z is a longest common subsequence of Xm-1 and Yn-1. The problem then reduces to finding the LCS of Xm-1 and Yn-1 (the length of LCS(X, Y) equals the length of LCS(Xm-1, Yn-1) plus 1).
If xm ≠ yn, it is likewise not difficult to prove by contradiction that either Z ∈ LCS(Xm-1, Y) or Z ∈ LCS(X, Yn-1). At least one of zk ≠ xm and zk ≠ yn must hold: if zk ≠ xm, then Z ∈ LCS(Xm-1, Y); similarly, if zk ≠ yn, then Z ∈ LCS(X, Yn-1). The problem then reduces to finding the LCS of Xm-1 and Y and the LCS of X and Yn-1. The length of LCS(X, Y) is max{length of LCS(Xm-1, Y), length of LCS(X, Yn-1)}.
In the case xm ≠ yn, computing the length of LCS(Xm-1, Y) and the length of LCS(X, Yn-1) are not independent subproblems: both require the length of LCS(Xm-1, Yn-1). Moreover, an LCS of two sequences contains an LCS of prefixes of those two sequences, so the problem has the optimal substructure property and dynamic programming is a natural method to consider.
In other words, to solve the LCS problem you need three things: 1. LCS(Xm-1, Yn-1) + 1; 2. LCS(Xm-1, Y) and LCS(X, Yn-1); 3. max{LCS(Xm-1, Y), LCS(X, Yn-1)}.
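The three cases above can be summarized as a single recurrence. The symbol c[i, j], standing for the length of an LCS of the prefixes Xi and Yj, is notation assumed here for illustration, and the base case for an empty prefix is spelled out explicitly:

```latex
c[i, j] =
\begin{cases}
0 & \text{if } i = 0 \text{ or } j = 0, \\
c[i-1, j-1] + 1 & \text{if } i, j > 0 \text{ and } x_i = y_j, \\
\max\{\, c[i-1, j],\; c[i, j-1] \,\} & \text{if } i, j > 0 \text{ and } x_i \neq y_j.
\end{cases}
```

The length of LCS(X, Y) is then c[m, n].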
2. The structure of the longest common subsequence is as follows:
Let the sequences X = <x1, x2, ..., xm> and Y = <y1, y2, ..., yn> have a longest common subsequence Z = <z1, z2, ..., zk>. Then:
- If xm = yn, then zk = xm = yn and Zk-1 is a longest common subsequence of Xm-1 and Yn-1;
- If xm ≠ yn and zk ≠ xm, then Z is a longest common subsequence of Xm-1 and Y;
- If xm ≠ yn and zk ≠ yn, then Z is a longest common subsequence of X and Yn-1.
Here Xm-1 = <x1, x2, ..., xm-1>, Yn-1 = <y1, y2, ..., yn-1>, and Zk-1 = <z1, z2, ..., zk-1>.
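The structure above translates fairly directly into a recursive procedure. The following is a minimal top-down sketch with memoization, not the article's own implementation; the class name LcsTopDown, the methods lcsLength and solve, and the memo table are all names assumed for illustration.

```java
import java.util.Arrays;

public class LcsTopDown {
    // memo[i][j] caches the LCS length of the prefixes x[0..i) and y[0..j);
    // -1 marks a subproblem that has not been solved yet.
    private static int[][] memo;

    static int lcsLength(String x, String y) {
        memo = new int[x.length() + 1][y.length() + 1];
        for (int[] row : memo) {
            Arrays.fill(row, -1);
        }
        return solve(x, y, x.length(), y.length());
    }

    private static int solve(String x, String y, int i, int j) {
        if (i == 0 || j == 0) {
            return 0; // an empty prefix has an empty LCS
        }
        if (memo[i][j] >= 0) {
            return memo[i][j]; // shared subproblem, already solved
        }
        int result;
        if (x.charAt(i - 1) == y.charAt(j - 1)) {
            // Last characters match: extend an LCS of the two shorter prefixes.
            result = solve(x, y, i - 1, j - 1) + 1;
        } else {
            // Last characters differ: drop the last character of x or of y.
            result = Math.max(solve(x, y, i - 1, j), solve(x, y, i, j - 1));
        }
        memo[i][j] = result;
        return result;
    }

    public static void main(String[] args) {
        System.out.println(lcsLength("ACDFG", "AKDFC")); // prints 3 (for example, ADF)
    }
}
```

Without the memo table the two branches of the second case would recompute the same prefix pairs many times; caching each pair keeps the work proportional to the number of (i, j) pairs.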
3. As an example, take the two given sequences X = <A, B, C, B, D, A, B> and Y = <B, D, C, A, B, A>. The corresponding DP table is shown in the following figure:
Once this figure is understood, the LCS algorithm is almost understood. Its core step is:
    if (str1.charAt(i - 1) == str2.charAt(j - 1)) {
        dp[i][j] = dp[i - 1][j - 1] + 1;
    } else {
        dp[i][j] = Math.max(dp[i - 1][j], dp[i][j - 1]);
    }
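To make the table concrete for the two example sequences, here is a minimal bottom-up sketch that fills dp with the step above and then walks back through the table to recover one LCS. The class name LcsTable, the method lcs, and the tie-breaking rule in the walk-back (prefer moving up when both neighbours are equal) are assumptions made for illustration; a different tie-breaking rule can return a different LCS of the same length.

```java
public class LcsTable {
    // Returns one longest common subsequence of x and y.
    static String lcs(String x, String y) {
        int m = x.length();
        int n = y.length();
        // dp[i][j] = LCS length of the first i characters of x and the first j of y.
        int[][] dp = new int[m + 1][n + 1];
        for (int i = 1; i <= m; i++) {
            for (int j = 1; j <= n; j++) {
                if (x.charAt(i - 1) == y.charAt(j - 1)) {
                    dp[i][j] = dp[i - 1][j - 1] + 1;
                } else {
                    dp[i][j] = Math.max(dp[i - 1][j], dp[i][j - 1]);
                }
            }
        }
        // Walk back from dp[m][n], collecting matched characters in reverse order.
        StringBuilder reversed = new StringBuilder();
        int i = m;
        int j = n;
        while (i > 0 && j > 0) {
            if (x.charAt(i - 1) == y.charAt(j - 1)) {
                reversed.append(x.charAt(i - 1));
                i--;
                j--;
            } else if (dp[i - 1][j] >= dp[i][j - 1]) {
                i--; // assumed tie-break: move up when both neighbours are equal
            } else {
                j--;
            }
        }
        return reversed.reverse().toString();
    }

    public static void main(String[] args) {
        String result = lcs("ABCBDAB", "BDCABA");
        System.out.println(result.length() + " " + result); // prints "4 BCBA"
    }
}
```

The example pair has several longest common subsequences of length 4; BCBA is simply the one this particular walk-back reaches.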
Code implementation:
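    import java.util.Random;

    public class LCS {
        public static void main(String[] args) {
            // Set the string lengths; the specific sizes can be chosen freely
            int substringLength1 = 20;
            int substringLength2 = 20;

            // Randomly generate the two strings
            String x = getRandomStrings(substringLength1);
            String y = getRandomStrings(substringLength2);

            long startTime = System.nanoTime();

            // opt[i][j] records the LCS length of the suffixes x[i..] and y[j..]
            int[][] opt = new int[substringLength1 + 1][substringLength2 + 1];

            // Dynamic programming: solve all subproblems, from the ends of the strings backwards
            for (int i = substringLength1 - 1; i >= 0; i--) {
                for (int j = substringLength2 - 1; j >= 0; j--) {
                    if (x.charAt(i) == y.charAt(j))
                        opt[i][j] = opt[i + 1][j + 1] + 1; // the formula given above, mirrored for suffixes
                    else
                        opt[i][j] = Math.max(opt[i + 1][j], opt[i][j + 1]);
                }
            }

            // The answer for the full strings is opt[0][0];
            // opt[20][20] is the empty-suffix base case and is always 0.
            System.out.println(opt[0][0]);
            System.out.println("Elapsed ns: " + (System.nanoTime() - startTime));
        }

        // Assumed helper, not shown in the original: a random lowercase string of the given length.
        static String getRandomStrings(int length) {
            Random random = new Random();
            StringBuilder sb = new StringBuilder(length);
            for (int i = 0; i < length; i++) {
                sb.append((char) ('a' + random.nextInt(26)));
            }
            return sb.toString();
        }
    }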
LCS (longest common subsequence) and DP (dynamic programming)