Question: If all the characters of string 1 appear in another string, string 2, in the same order as in string 1,
then string 1 is called a subsequence of string 2.
Note that the characters of the subsequence (string 1) need not appear consecutively in string 2.
Write a function that takes two strings, computes their longest common subsequence, and prints it.
For example, given the two strings bdcaba and abcbdab, both bcba and bdab are longest common subsequences,
so the function outputs the length 4 and prints either one of them.
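To make the definition concrete, here is a minimal C++ sketch that checks whether one string is a subsequence of another. The function name isSubsequence is hypothetical and is not part of the solution discussed in this post; it only illustrates that the matched characters must keep their order but may be separated by other characters.

```cpp
#include <cstddef>
#include <string>

// Returns true if every character of `sub` appears in `text` in the same
// order, not necessarily contiguously (a single left-to-right scan).
// `isSubsequence` is a hypothetical helper name used only for illustration.
bool isSubsequence(const std::string& sub, const std::string& text) {
    std::size_t matched = 0;  // number of characters of `sub` matched so far
    for (std::size_t i = 0; i < text.size() && matched < sub.size(); ++i) {
        if (text[i] == sub[matched]) {
            ++matched;
        }
    }
    return matched == sub.size();
}
```

With the example strings, isSubsequence("bcba", "bdcaba") and isSubsequence("bcba", "abcbdab") both hold, which is why bcba is a common subsequence of the two inputs.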
Analysis: Finding the longest common subsequence (LCS) is a classic dynamic programming problem.
The analysis below is based on another blog post.
Step 1. Characterizing a longest common subsequence
We first introduce the properties of the LCS problem. Let X_m = {x_0, x_1, ..., x_{m-1}} and Y_n = {y_0, y_1, ..., y_{n-1}} be two strings (X and Y for short),
and let Z_k = {z_0, z_1, ..., z_{k-1}} (Z for short) be any LCS of X and Y. Here X_{m-1} denotes X without its last character x_{m-1}, and likewise for Y_{n-1} and Z_{k-1}. Three properties follow:
1. If x_{m-1} = y_{n-1}, then z_{k-1} = x_{m-1} = y_{n-1}, and Z_{k-1} is an LCS of X_{m-1} and Y_{n-1};
2. If x_{m-1} ≠ y_{n-1} and z_{k-1} ≠ x_{m-1}, then Z is an LCS of X_{m-1} and Y;
3. If x_{m-1} ≠ y_{n-1} and z_{k-1} ≠ y_{n-1}, then Z is an LCS of X and Y_{n-1}.
Below are brief proofs of these properties:
1. Suppose z_{k-1} ≠ x_{m-1}. Then we can append x_{m-1} (= y_{n-1}) to Z to obtain Z', a common subsequence of X and Y of length k + 1.
This contradicts the assumption that Z, whose length is k, is an LCS of X and Y. Therefore z_{k-1} = x_{m-1} = y_{n-1}.
Since z_{k-1} = x_{m-1} = y_{n-1}, deleting this last character from Z, X, and Y gives Z_{k-1}, X_{m-1}, and Y_{n-1}, and Z_{k-1} is clearly a common subsequence of X_{m-1} and Y_{n-1}. It remains to show that Z_{k-1} is an LCS of X_{m-1} and Y_{n-1}, which follows by contradiction. Suppose X_{m-1} and Y_{n-1} have a common subsequence W longer than k - 1. Appending x_{m-1} to W gives W', a common subsequence of X and Y whose length exceeds k, which contradicts the assumption.
2. By contradiction. Because z_{k-1} ≠ x_{m-1}, Z is a common subsequence of X_{m-1} and Y. If Z were not an LCS of X_{m-1} and Y, there would be a common subsequence W of X_{m-1} and Y with length greater than k. W would then also be a common subsequence of X and Y, but the longest common subsequence of X and Y has length k. Contradiction.
3. The proof is the same as for property 2.
Step 2. A recursive solution
Based on these properties, we arrive at the following approach
to finding an LCS of two strings X_m = {x_0, x_1, ..., x_{m-1}} and Y_n = {y_0, y_1, ..., y_{n-1}}:
If x_{m-1} = y_{n-1}, find an LCS of X_{m-1} and Y_{n-1} and append x_{m-1} (= y_{n-1}) to it (property 1);
If x_{m-1} ≠ y_{n-1}, find an LCS of X_{m-1} and Y and an LCS of X and Y_{n-1}; the longer of the two is an LCS of X and Y (properties 2 and 3).
From these conclusions we obtain the following formula.
If c[i, j] denotes the length of an LCS of the prefixes x_0 ... x_i and y_0 ... y_j, then c[i, j] can be computed recursively:

$$
c[i, j] =
\begin{cases}
0 & \text{if } i < 0 \text{ or } j < 0 \\
c[i-1, j-1] + 1 & \text{if } i, j \ge 0 \text{ and } x_i = y_j \\
\max(c[i, j-1],\ c[i-1, j]) & \text{if } i, j \ge 0 \text{ and } x_i \ne y_j
\end{cases}
$$
This recurrence is straightforward to implement with a recursive function. However, as the solution for the n-th Fibonacci number (question 19 of the "Microsoft and others: 100 questions" series, v0.1) shows,
direct recursion performs a great deal of repeated computation, so a bottom-up iterative solution is more efficient.
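The direct recursion can be sketched as follows, for illustration only. The names lcsRec, lcsLengthNaive, and NaiveResult are assumptions introduced here and do not appear in the original code; the calls counter is added purely to make the amount of work visible.

```cpp
#include <algorithm>
#include <string>

// Hypothetical result type: the LCS length plus the number of recursive calls.
struct NaiveResult {
    int length;
    long calls;
};

// c(i, j): LCS length of the prefixes x[0..i] and y[0..j].
// i or j equal to -1 stands for an empty prefix (the "i < 0 or j < 0" case).
int lcsRec(const std::string& x, const std::string& y, int i, int j, long& calls) {
    ++calls;
    if (i < 0 || j < 0) {
        return 0;
    }
    if (x[i] == y[j]) {
        return lcsRec(x, y, i - 1, j - 1, calls) + 1;
    }
    // The two branches below revisit many of the same (i, j) pairs.
    return std::max(lcsRec(x, y, i, j - 1, calls),
                    lcsRec(x, y, i - 1, j, calls));
}

NaiveResult lcsLengthNaive(const std::string& x, const std::string& y) {
    long calls = 0;
    int length = lcsRec(x, y, static_cast<int>(x.size()) - 1,
                        static_cast<int>(y.size()) - 1, calls);
    return NaiveResult{length, calls};
}
```

Every mismatch spawns two further calls, so the number of calls can grow exponentially for strings with few matching characters, while only (m + 1)(n + 1) distinct (i, j) pairs exist.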
To solve the problem iteratively, we use a matrix (see lcs_length in the code at the end of this section) to store each computed c[i, j],
so that when a later computation needs a value, it can be read directly from the matrix.
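A minimal sketch of the bottom-up table, computing the length only, might look like this. The function name lcsLengthTable is hypothetical, and the table here has one extra row and column to represent the i < 0 or j < 0 base case; that layout is an assumption of this sketch and differs from the m x n matrix used in the full listing.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Bottom-up LCS length. table[i + 1][j + 1] stores c[i, j] from the
// recurrence; row 0 and column 0 stay 0 and play the role of the
// "i < 0 or j < 0" base case. `lcsLengthTable` is a hypothetical name.
int lcsLengthTable(const std::string& x, const std::string& y) {
    const std::size_t m = x.size();
    const std::size_t n = y.size();
    std::vector<std::vector<int>> table(m + 1, std::vector<int>(n + 1, 0));
    for (std::size_t i = 0; i < m; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            if (x[i] == y[j]) {
                // Extend the LCS of the two shorter prefixes by one character.
                table[i + 1][j + 1] = table[i][j] + 1;
            } else {
                // Reuse the better of the two already computed neighbours.
                table[i + 1][j + 1] = std::max(table[i + 1][j], table[i][j + 1]);
            }
        }
    }
    return table[m][n];
}
```

Each cell is computed once from values already stored in the table, so the running time is proportional to m * n.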
Each c[i, j] is derived from c[i-1, j-1], c[i, j-1], or c[i-1, j],
which corresponds to moving one cell in the matrix lcs_length from c[i-1, j-1], c[i, j-1], or c[i-1, j] to c[i, j].
There are therefore three possible moves into a cell: from the left, from above, and from the upper left. Only a move from the upper left indicates that a character of the LCS has been found.
So we need a second matrix (see lcs_direction in the code at the end of this section) to record the direction of each move.
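As a rough illustration of how the recorded directions are used, the sketch below fills both matrices and then walks back from the last cell, collecting a character on every upper-left move. It returns the LCS as a string instead of printing it. The names Move and lcsByDirections are hypothetical, and the padded (m + 1) x (n + 1) layout and the iterative walk are assumptions of this sketch rather than the approach of the full listing.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical direction labels: where the value of a cell came from.
enum class Move { None, Left, Up, UpLeft };

// Returns one LCS of x and y by following the recorded directions.
std::string lcsByDirections(const std::string& x, const std::string& y) {
    const std::size_t m = x.size();
    const std::size_t n = y.size();
    // Row 0 and column 0 are padding for the empty-prefix base case.
    std::vector<std::vector<int>> length(m + 1, std::vector<int>(n + 1, 0));
    std::vector<std::vector<Move>> direction(
        m + 1, std::vector<Move>(n + 1, Move::None));

    for (std::size_t i = 1; i <= m; ++i) {
        for (std::size_t j = 1; j <= n; ++j) {
            if (x[i - 1] == y[j - 1]) {
                length[i][j] = length[i - 1][j - 1] + 1;
                direction[i][j] = Move::UpLeft;
            } else if (length[i - 1][j] > length[i][j - 1]) {
                length[i][j] = length[i - 1][j];
                direction[i][j] = Move::Up;
            } else {
                length[i][j] = length[i][j - 1];
                direction[i][j] = Move::Left;
            }
        }
    }

    // Walk back from the bottom-right cell; characters come out reversed.
    std::string result;
    std::size_t i = m;
    std::size_t j = n;
    while (i > 0 && j > 0) {
        if (direction[i][j] == Move::UpLeft) {
            result.push_back(x[i - 1]);
            --i;
            --j;
        } else if (direction[i][j] == Move::Up) {
            --i;
        } else {
            --j;
        }
    }
    std::reverse(result.begin(), result.end());
    return result;
}
```

Which LCS is returned when several exist depends on how ties between the upper and left neighbours are broken; this sketch prefers the left neighbour on a tie.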
The revised C++ source code is as follows:
// Dynamic programming: longest common subsequence.
#include <iostream>
#include <string>
#include <vector>
using namespace std;

enum DecreaseDir { kInit = 0, kLeft, kUp, kLeftUp };

typedef vector<vector<int> > Matrix;

void lcs_print(const Matrix& lcs_direction, const string& pstr1,
               const string& pstr2, int row, int col);

// Prints one LCS of pstr1 and pstr2 and returns its length.
int LCS(const string& pstr1, const string& pstr2)
{
    int length1 = static_cast<int>(pstr1.length());
    int length2 = static_cast<int>(pstr2.length());
    if (!length1 || !length2)
        return 0;

    // lcs_length[i][j]: LCS length of pstr1[0..i] and pstr2[0..j].
    Matrix lcs_length(length1, vector<int>(length2, 0));
    // lcs_direction[i][j]: the neighbour that lcs_length[i][j] was derived from.
    Matrix lcs_direction(length1, vector<int>(length2, kInit));

    for (int i = 0; i < length1; ++i) {
        for (int j = 0; j < length2; ++j) {
            // Neighbours outside the matrix (i < 0 or j < 0) count as 0.
            int up = (i > 0) ? lcs_length[i - 1][j] : 0;
            int left = (j > 0) ? lcs_length[i][j - 1] : 0;
            int leftup = (i > 0 && j > 0) ? lcs_length[i - 1][j - 1] : 0;

            if (pstr1[i] == pstr2[j]) {
                lcs_length[i][j] = leftup + 1;
                lcs_direction[i][j] = kLeftUp;
            } else if (up > left) {
                lcs_length[i][j] = up;
                lcs_direction[i][j] = kUp;
            } else {
                lcs_length[i][j] = left;
                lcs_direction[i][j] = kLeft;
            }
        }
    }

    lcs_print(lcs_direction, pstr1, pstr2, length1 - 1, length2 - 1);
    return lcs_length[length1 - 1][length2 - 1];
}

// Follows the recorded directions back from (row, col) and prints the
// LCS characters in their original order.
void lcs_print(const Matrix& lcs_direction, const string& pstr1,
               const string& pstr2, int row, int col)
{
    int length1 = static_cast<int>(pstr1.length());
    int length2 = static_cast<int>(pstr2.length());
    if (length1 == 0 || length2 == 0 || !(row < length1 && col < length2))
        return;

    if (lcs_direction[row][col] == kLeftUp) {
        // A move from the upper left means pstr1[row] belongs to the LCS.
        if (row > 0 && col > 0)
            lcs_print(lcs_direction, pstr1, pstr2, row - 1, col - 1);
        cout << pstr1[row];
    } else if (lcs_direction[row][col] == kLeft) {
        if (col > 0)
            lcs_print(lcs_direction, pstr1, pstr2, row, col - 1);
    } else if (lcs_direction[row][col] == kUp) {
        if (row > 0)
            lcs_print(lcs_direction, pstr1, pstr2, row - 1, col);
    }
}

int main()
{
    string str1 = "bdcaba";
    string str2 = "abcbdab";
    cout << "The longest common subsequence is: " << endl;
    int length = LCS(str1, str2);
    cout << endl << "Its length is: " << length << endl;
    return 0;
}
Program output: