[100 Questions] Question 56: Longest Common Subsequence [the Essence of Dynamic Programming]

Source: Internet
Author: User

I. Question

If all the characters of string 1 appear in string 2 in the same order, string 1 is called a subsequence of string 2.

Note that the characters of the subsequence (string 1) do not have to appear consecutively in string 2.
Write a function that takes two strings, finds their longest common subsequence, and prints it.
For example, given the two strings BDCABA and ABCBDAB, both BCBA and BDAB are longest common subsequences, so the output length is 4 and either subsequence may be printed.
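
To make the definition concrete, here is a minimal sketch of a subsequence test. The function name is_subsequence is made up for this illustration and does not appear in the article's code.

```c
/* Returns 1 if every character of s appears in t in the same order
   (not necessarily consecutively), and 0 otherwise.
   is_subsequence is a hypothetical helper, not part of the original code. */
int is_subsequence(const char *s, const char *t)
{
    while (*s != '\0' && *t != '\0') {
        if (*s == *t)
            s++;        /* matched one more character of s */
        t++;            /* always move forward in t */
    }
    return *s == '\0';  /* true only if all of s was matched */
}
```

With the example strings, is_subsequence("BCBA", "BDCABA") and is_subsequence("BCBA", "ABCBDAB") both return 1, which is why BCBA counts as a common subsequence.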

 

II. Analysis

The longest common subsequence (LCS) problem is a classic dynamic programming problem.


Properties of the LCS problem: let X_m = {x_0, x_1, ..., x_{m-1}} and Y_n = {y_0, y_1, ..., y_{n-1}} be two strings, and let Z_k = {z_0, z_1, ..., z_{k-1}} be an LCS of them. Here X_{m-1} denotes X_m with its last character removed, and likewise for Y_{n-1} and Z_{k-1}. Then:

1. If x_{m-1} = y_{n-1}, then z_{k-1} = x_{m-1} = y_{n-1}, and Z_{k-1} is an LCS of X_{m-1} and Y_{n-1};
2. If x_{m-1} ≠ y_{n-1} and z_{k-1} ≠ x_{m-1}, then Z is an LCS of X_{m-1} and Y;
3. If x_{m-1} ≠ y_{n-1} and z_{k-1} ≠ y_{n-1}, then Z is an LCS of X and Y_{n-1}.

 

Below is a simple proof of these properties:
1. If z_{k-1} ≠ x_{m-1}, we can append x_{m-1} (= y_{n-1}) to Z to obtain Z', a common subsequence of X and Y of length k + 1. This contradicts the fact that Z, of length k, is an LCS of X and Y. So z_{k-1} = x_{m-1} = y_{n-1}.
Since z_{k-1} = x_{m-1} = y_{n-1}, deleting this last character from Z, X and Y gives Z_{k-1}, X_{m-1} and Y_{n-1}, and Z_{k-1} is clearly a common subsequence of X_{m-1} and Y_{n-1}. We now prove by contradiction that it is an LCS of them. Suppose X_{m-1} and Y_{n-1} have a common subsequence W longer than k - 1. Appending x_{m-1} to W gives W', a common subsequence of X and Y whose length exceeds k, which contradicts the assumption that an LCS of X and Y has length k.
2. By contradiction. Because z_{k-1} ≠ x_{m-1}, Z is a common subsequence of X_{m-1} and Y. If Z is not an LCS of X_{m-1} and Y, there is a common subsequence W of X_{m-1} and Y with length greater than k. W is then also a common subsequence of X and Y, but the longest common subsequence of X and Y has length k. Contradiction.
3. The proof is the same as for 2.

 

With these properties, we arrive at the following approach:

To find an LCS of the two strings X_m = {x_0, x_1, ..., x_{m-1}} and Y_n = {y_0, y_1, ..., y_{n-1}}:

If x_{m-1} = y_{n-1}, find an LCS of X_{m-1} and Y_{n-1}, then append x_{m-1} (= y_{n-1}) to it;

If x_{m-1} ≠ y_{n-1}, find an LCS of X_{m-1} and Y and an LCS of X and Y_{n-1}; the longer of the two is an LCS of X and Y.
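
These two cases translate almost directly into a recursive function. The sketch below is only an illustration of that idea, not the article's code: the names lcs_rec and lcs_length_naive are made up here, the strings are ordinary 0-indexed C strings, and only the length is returned.

```c
#include <string.h>

/* Length of an LCS of the first m characters of x and the first n of y.
   lcs_rec is a hypothetical name used only for this illustration. */
static int lcs_rec(const char *x, int m, const char *y, int n)
{
    if (m == 0 || n == 0)
        return 0;                          /* an empty prefix has an empty LCS */
    if (x[m - 1] == y[n - 1])              /* last characters match */
        return lcs_rec(x, m - 1, y, n - 1) + 1;
    int a = lcs_rec(x, m - 1, y, n);       /* drop the last character of x */
    int b = lcs_rec(x, m, y, n - 1);       /* drop the last character of y */
    return a > b ? a : b;                  /* keep the longer result */
}

/* Convenience wrapper that works on whole strings. */
int lcs_length_naive(const char *x, const char *y)
{
    return lcs_rec(x, (int)strlen(x), y, (int)strlen(y));
}
```

For the example strings, lcs_length_naive("BDCABA", "ABCBDAB") returns 4. The same subproblems are solved many times over, so this version is only practical for short inputs.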

 

If we let c[i, j] denote the length of an LCS of the strings x_0 ... x_i and y_0 ... y_j, we can compute c[i, j] recursively:

c[i, j] = 0                               if i < 0 or j < 0
c[i, j] = c[i-1, j-1] + 1                 if i, j >= 0 and x_i = y_j
c[i, j] = max(c[i, j-1], c[i-1, j])       if i, j >= 0 and x_i ≠ y_j

 

The recurrence above is easy to implement with a recursive function. However, as the analysis of computing the n-th Fibonacci number showed (Question 19 of this 100-question series of interview questions from Microsoft and others), direct recursion performs a great deal of repeated computation, so a bottom-up loop is more efficient.
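
As a rough illustration of the bottom-up idea, the sketch below fills the table row by row, so each entry is computed exactly once. It is a variant written for this note, not the article's code: the name lcs_length_bottom_up is hypothetical, the strings are 0-indexed C strings, and because it returns only the length it keeps just two rows of the table. Recovering the subsequence itself requires the full table.

```c
#include <stdlib.h>
#include <string.h>

/* Bottom-up LCS length using two rolling rows.
   Returns -1 if memory allocation fails.
   lcs_length_bottom_up is a hypothetical name for this illustration. */
int lcs_length_bottom_up(const char *x, const char *y)
{
    size_t m = strlen(x), n = strlen(y);
    int *prev = calloc(n + 1, sizeof *prev);   /* row i-1, starts as all zeros */
    int *curr = calloc(n + 1, sizeof *curr);   /* row i */
    if (prev == NULL || curr == NULL) {
        free(prev);
        free(curr);
        return -1;
    }

    for (size_t i = 1; i <= m; i++) {
        curr[0] = 0;                           /* empty prefix of y */
        for (size_t j = 1; j <= n; j++) {
            if (x[i - 1] == y[j - 1])
                curr[j] = prev[j - 1] + 1;     /* extend the diagonal entry */
            else
                curr[j] = prev[j] >= curr[j - 1] ? prev[j] : curr[j - 1];
        }
        int *tmp = prev;                       /* the row just filled becomes prev */
        prev = curr;
        curr = tmp;
    }

    int result = prev[n];                      /* last completed row */
    free(prev);
    free(curr);
    return result;
}
```

The running time is proportional to m * n, and the extra memory is proportional to n.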

 

To use the loop solution, we use a matrix (the array c filled by lcs_length in the code) to store each computed c[i, j], so that later computations can read the value directly from the matrix. In addition, c[i, j] is obtained from one of three entries: c[i-1, j-1], c[i, j-1] or c[i-1, j]. In the matrix this corresponds to moving to c[i, j] from the top left, the left, or above. Of these three directions, only a move from the top left indicates that a character of the LCS has been found. Therefore, we need another matrix (the array b in the code) to record the direction of each move.

 

III. Code (two methods are used)

#include <stdio.h>

int c[10][10];  // c[i][j]: LCS length of x[1..i] and y[1..j] (X has 7 elements, Y has 6)
char b[8][7];   // b[i][j]: direction used to reach c[i][j], needed to print the LCS

// Method 1: bottom-up loop. x and y are 1-indexed; index 0 is a placeholder.
void lcs_length(char x[], char y[], int m, int n)
{
    for (int i = 0; i <= m; i++)    // column 0: y has no elements
        c[i][0] = 0;
    for (int j = 0; j <= n; j++)    // row 0: x has no elements
        c[0][j] = 0;
    for (int k = 1; k <= m; k++) {
        for (int l = 1; l <= n; l++) {
            if (x[k] == y[l]) {
                c[k][l] = c[k-1][l-1] + 1;
                b[k][l] = '!';      // came from the top left
            } else if (c[k-1][l] >= c[k][l-1]) {
                c[k][l] = c[k-1][l];
                b[k][l] = '@';      // came from above
            } else {
                c[k][l] = c[k][l-1];
                b[k][l] = '#';      // came from the left
            }
        }
    }
}

// Print the LCS by following the directions stored in b.
void print_lcs(char b[][7], char x[], int i, int j)
{
    if (i == 0 || j == 0)
        return;
    if (b[i][j] == '!') {           // top left: x[i] belongs to the LCS
        print_lcs(b, x, i-1, j-1);
        printf("%c", x[i]);         // be sure to use %c, not %s
    } else if (b[i][j] == '@') {    // go up
        print_lcs(b, x, i-1, j);
    } else {                        // go left
        print_lcs(b, x, i, j-1);
    }
}

// Print the LCS using only c, without the direction array b.
void no_b_print(char x[], char y[], int m, int n)
{
    int i = m;                      // 7
    int j = n;                      // 6
    char a[10];                     // LCS characters, collected in reverse order
    int k = 0;
    while (i > 0 && j > 0) {
        if (x[i] == y[j]) {
            a[k++] = x[i];
            i--;
            j--;
        } else if (c[i-1][j] >= c[i][j-1]) {    // follow the larger neighbour
            i--;
        } else {
            j--;
        }
    }
    for (int l = k - 1; l >= 0; l--)            // print in forward order
        printf("%c", a[l]);
}

// Method 2: memoized recursion. An entry of -1 means "not computed yet".
int lookup(char x[], char y[], int i, int j)
{
    if (c[i][j] > -1)
        return c[i][j];
    if (i == 0 || j == 0) {
        c[i][j] = 0;
    } else if (x[i] == y[j]) {
        c[i][j] = lookup(x, y, i-1, j-1) + 1;
    } else {
        int up = lookup(x, y, i-1, j);
        int left = lookup(x, y, i, j-1);
        c[i][j] = up >= left ? up : left;
    }
    return c[i][j];
}

void memor_lcs(char x[], char y[], int m, int n)
{
    for (int i = 1; i <= m; i++)    // mark entries as not yet computed;
        for (int j = 1; j <= n; j++)    // row 0 and column 0 stay 0 (global array)
            c[i][j] = -1;
    lookup(x, y, m, n);
}

int main(void)
{
    char x[] = {'@', 'A', 'B', 'C', 'B', 'D', 'A', 'B'};
    char y[] = {'@', 'B', 'D', 'C', 'A', 'B', 'A'};

    // lcs_length(x, y, 7, 6);      // method 1: bottom-up loop
    memor_lcs(x, y, 7, 6);          // method 2: memoization
    // print_lcs(b, x, 7, 6);       // output using the array b (needs lcs_length)
    no_b_print(x, y, 7, 6);         // output using only c
    printf("\n");
    for (int i = 0; i < 8; i++) {   // dump the table; -1 marks entries never visited
        for (int j = 0; j < 7; j++)
            printf("%3d", c[i][j]);
        printf("\n");
    }
    return 0;
}



Extension: if the question is changed to finding the longest common substring of the two strings, how can it be solved? The definition of a substring is similar to that of a subsequence, except that its characters must appear consecutively in the other string. For example, the longest common substrings of BDCABA and ABCBDAB include BD and AB, both of length 2.
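
One common way to approach this extension, offered here as a sketch rather than as the referenced post's method, is a similar table in which each entry holds the length of the longest common run ending at x[i-1] and y[j-1]; a mismatch resets the entry to 0 instead of carrying a value over. The name longest_common_substring and its start parameter are assumptions made for this illustration.

```c
#include <stdlib.h>
#include <string.h>

/* Finds one longest common substring of x and y.
   Stores its starting index within x in *start and returns its length
   (0 if there is none, -1 if memory allocation fails).
   longest_common_substring is a hypothetical name for this illustration. */
int longest_common_substring(const char *x, const char *y, size_t *start)
{
    size_t m = strlen(x), n = strlen(y);
    int *prev = calloc(n + 1, sizeof *prev);   /* row i-1 */
    int *curr = calloc(n + 1, sizeof *curr);   /* row i; index 0 always stays 0 */
    int best = 0;
    if (prev == NULL || curr == NULL) {
        free(prev);
        free(curr);
        return -1;
    }

    *start = 0;
    for (size_t i = 1; i <= m; i++) {
        for (size_t j = 1; j <= n; j++) {
            if (x[i - 1] == y[j - 1]) {
                curr[j] = prev[j - 1] + 1;     /* extend the run ending here */
                if (curr[j] > best) {
                    best = curr[j];
                    *start = i - (size_t)best; /* the run ends at x[i-1] */
                }
            } else {
                curr[j] = 0;                   /* a mismatch breaks the run */
            }
        }
        int *tmp = prev;
        prev = curr;
        curr = tmp;
    }

    free(prev);
    free(curr);
    return best;
}
```

For BDCABA and ABCBDAB this returns 2 with *start set to 0, which corresponds to BD.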

For more on this topic, see the blog post: 24 classic algorithm series: 3. A dynamic programming algorithm solves an interview question
http://blog.csdn.net/v_JULY_v/archive/2010/12/31/6110269.aspx
 
