Http://zhedahht.blog.163.com/blog/static/254111742007376431815/

Question: If all the characters of string 1 appear in string 2 in the same order as they appear in string 1, then string 1 is called a subsequence of string 2. Note that the characters of the subsequence (string 1) need not appear consecutively in string 2. Write a function that takes two strings, computes their longest common subsequence, and prints it.

For example, given the two strings bdcaba and abcbdab, both bcba and bdab are longest common subsequences, so the function outputs the length 4 and prints either one of them.
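The definition can be checked mechanically. Below is a minimal sketch of such a check; the helper name isSubsequence is hypothetical and does not appear in the original post.

```cpp
#include <cstddef>
#include <string>

// Returns true if every character of s appears in t in the same order,
// not necessarily consecutively. (Hypothetical helper for illustration.)
bool isSubsequence(const std::string& s, const std::string& t)
{
    std::size_t matched = 0; // number of characters of s matched so far
    for (std::size_t k = 0; k < t.size() && matched < s.size(); ++k)
    {
        if (t[k] == s[matched])
            ++matched;
    }
    return matched == s.size();
}
```

With this helper, isSubsequence("bcba", "bdcaba") and isSubsequence("bcba", "abcbdab") both hold, which is what makes bcba a common subsequence of the two example strings.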

Analysis: Finding the longest common subsequence (LCS) is a classic dynamic programming problem, so companies that emphasize algorithms, such as MicroStrategy, have used it as an interview question.

A complete introduction to dynamic programming would take a long time, so I do not intend to cover its general concepts here. I will focus only on what is directly relevant to LCS. If you are not familiar with dynamic programming, refer to an algorithms textbook such as Introduction to Algorithms.

First, we introduce the key properties of the LCS problem. Let $X_m = \{x_0, x_1, \ldots, x_{m-1}\}$ and $Y_n = \{y_0, y_1, \ldots, y_{n-1}\}$ be two strings, and let $Z_k = \{z_0, z_1, \ldots, z_{k-1}\}$ be any LCS of them. Then:

1. If $x_{m-1} = y_{n-1}$, then $z_{k-1} = x_{m-1} = y_{n-1}$, and $Z_{k-1}$ is an LCS of $X_{m-1}$ and $Y_{n-1}$;

2. If $x_{m-1} \ne y_{n-1}$, then $z_{k-1} \ne x_{m-1}$ implies that $Z$ is an LCS of $X_{m-1}$ and $Y$;

3. If $x_{m-1} \ne y_{n-1}$, then $z_{k-1} \ne y_{n-1}$ implies that $Z$ is an LCS of $X$ and $Y_{n-1}$.

Below is a simple proof of these properties:

1. If $z_{k-1} \ne x_{m-1}$, we could append $x_{m-1}$ ($= y_{n-1}$) to $Z$ to get $Z'$. This would give a common subsequence $Z'$ of $X$ and $Y$ with length $k + 1$, contradicting the fact that $Z$, of length $k$, is an LCS of $X$ and $Y$. Therefore $z_{k-1} = x_{m-1} = y_{n-1}$.

Since $z_{k-1} = x_{m-1} = y_{n-1}$, deleting this last character from $Z$, $X$, and $Y$ gives $Z_{k-1}$, $X_{m-1}$, and $Y_{n-1}$, and clearly $Z_{k-1}$ is a common subsequence of $X_{m-1}$ and $Y_{n-1}$. We now show that $Z_{k-1}$ is an LCS of $X_{m-1}$ and $Y_{n-1}$, which is easy to prove by contradiction. Suppose $X_{m-1}$ and $Y_{n-1}$ have a common subsequence $W$ whose length exceeds $k - 1$. Appending $x_{m-1}$ to $W$ gives $W'$, which is a common subsequence of $X$ and $Y$ with length greater than $k$. This contradicts the assumption that $Z$ is an LCS.

2. We again argue by contradiction. Because $z_{k-1} \ne x_{m-1}$, $Z$ does not use the last character of $X$, so $Z$ is a common subsequence of $X_{m-1}$ and $Y$. Suppose $Z$ is not an LCS of $X_{m-1}$ and $Y$. Then there is a $W$ with length greater than $k$ that is an LCS of $X_{m-1}$ and $Y$, and $W$ is also a common subsequence of $X$ and $Y$. But the maximum length of a common subsequence of $X$ and $Y$ is known to be $k$. Contradiction.

3. The proof is symmetric to that of 2.

With these properties, we arrive at the following approach for finding an LCS of two strings $X_m = \{x_0, x_1, \ldots, x_{m-1}\}$ and $Y_n = \{y_0, y_1, \ldots, y_{n-1}\}$. If $x_{m-1} = y_{n-1}$, we only need to find an LCS of $X_{m-1}$ and $Y_{n-1}$ and append $x_{m-1}$ ($= y_{n-1}$) to it. If $x_{m-1} \ne y_{n-1}$, we find an LCS of $X_{m-1}$ and $Y$ and an LCS of $X$ and $Y_{n-1}$; the longer of the two is an LCS of $X$ and $Y$.

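Let $c[i, j]$ denote the length of an LCS of the prefixes $x_0 \ldots x_i$ and $y_0 \ldots y_j$. Then $c[i, j]$ can be computed recursively:

$$
c[i, j] =
\begin{cases}
0 & \text{if } i < 0 \text{ or } j < 0 \\
c[i-1, j-1] + 1 & \text{if } i, j \ge 0 \text{ and } x_i = y_j \\
\max(c[i, j-1],\ c[i-1, j]) & \text{if } i, j \ge 0 \text{ and } x_i \ne y_j
\end{cases}
$$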
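The recurrence can be transcribed almost literally into a recursive function. The sketch below is only an illustration; the name lcsLengthRecursive is an assumption and is not part of the reference code. It runs in exponential time in the worst case, so it is suitable for short inputs only.

```cpp
#include <algorithm>
#include <string>

// c(i, j): length of an LCS of x[0..i] and y[0..j].
// A negative index stands for an empty prefix, as in the recurrence.
int lcsLengthRecursive(const std::string& x, const std::string& y, int i, int j)
{
    if (i < 0 || j < 0)
        return 0;
    if (x[i] == y[j])
        return lcsLengthRecursive(x, y, i - 1, j - 1) + 1;
    return std::max(lcsLengthRecursive(x, y, i, j - 1),
                    lcsLengthRecursive(x, y, i - 1, j));
}

// Convenience overload that starts from the last character of each string.
int lcsLengthRecursive(const std::string& x, const std::string& y)
{
    return lcsLengthRecursive(x, y,
                              static_cast<int>(x.size()) - 1,
                              static_cast<int>(y.size()) - 1);
}
```

For the example strings, lcsLengthRecursive("bdcaba", "abcbdab") evaluates to 4.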

This formula is easy to implement with a recursive function. However, as the analysis of computing the n-th Fibonacci number (question 16 in this interview-question series) showed, direct recursion repeats a great deal of computation, and solving the problem bottom-up with loops is more efficient.
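As a minimal sketch of the bottom-up idea, the function below computes only the LCS length. It assumes a table padded with one extra row and column of zeros, so table[i][j] covers the first i characters of x and the first j characters of y (indices shifted by one relative to the formula). The name lcsLengthBottomUp and the use of std::vector are choices made for this sketch, not taken from the reference code.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

int lcsLengthBottomUp(const std::string& x, const std::string& y)
{
    const std::size_t m = x.size();
    const std::size_t n = y.size();

    // table[i][j] = LCS length of the first i chars of x and first j chars of y.
    // Row 0 and column 0 stay 0 and play the role of the "i < 0 or j < 0" case.
    std::vector<std::vector<int>> table(m + 1, std::vector<int>(n + 1, 0));

    for (std::size_t i = 1; i <= m; ++i)
    {
        for (std::size_t j = 1; j <= n; ++j)
        {
            if (x[i - 1] == y[j - 1])
                table[i][j] = table[i - 1][j - 1] + 1;
            else
                table[i][j] = std::max(table[i - 1][j], table[i][j - 1]);
        }
    }
    return table[m][n];
}
```

Each entry is computed once from already-filled neighbors, so the running time is proportional to the product of the two string lengths.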

To solve the problem with loops, we use a matrix (LCS_length in the code) to store each computed c[i, j], so that later steps can read the value directly from the matrix. In addition, since c[i, j] is computed from c[i-1, j-1], c[i, j-1], or c[i-1, j], filling in an entry amounts to moving within the matrix LCS_length from one of those three entries to c[i, j]. There are therefore three possible directions of origin: left, up, and upper left. Only a move from the upper left means that a character of the LCS has been found. We therefore use a second matrix (LCS_direction in the code) to record the direction of each move.

The reference code is as follows:

#include <stdio.h>
#include <string.h>

// directions of LCS generation
enum decreaseDir {kInit = 0, kLeft, kUp, kLeftUp};

// declared here because LCS calls it before its definition
void LCS_Print(int **LCS_direction, char* pStr1, char* pStr2, size_t row, size_t col);

/////////////////////////////////////////////////////////////////////////////
// Get the length of two strings' LCSs, and print one of the LCSs
// Input: pStr1 - the first string
//        pStr2 - the second string
// Output: the length of two strings' LCSs
/////////////////////////////////////////////////////////////////////////////
int LCS(char* pStr1, char* pStr2)
{
    if(!pStr1 || !pStr2)
        return 0;

    size_t length1 = strlen(pStr1);
    size_t length2 = strlen(pStr2);
    if(!length1 || !length2)
        return 0;

    size_t i, j;

    // initiate the length matrix (an array of row pointers, hence int*[])
    int **LCS_length = new int*[length1];
    for(i = 0; i < length1; ++ i)
        LCS_length[i] = new int[length2];

    for(i = 0; i < length1; ++ i)
        for(j = 0; j < length2; ++ j)
            LCS_length[i][j] = 0;

    // initiate the direction matrix
    int **LCS_direction = new int*[length1];
    for(i = 0; i < length1; ++ i)
        LCS_direction[i] = new int[length2];

    for(i = 0; i < length1; ++ i)
        for(j = 0; j < length2; ++ j)
            LCS_direction[i][j] = kInit;

    for(i = 0; i < length1; ++ i)
    {
        for(j = 0; j < length2; ++ j)
        {
            // a char of LCS is found,
            // it comes from the left up entry in the direction matrix
            if(pStr1[i] == pStr2[j])
            {
                LCS_length[i][j] = (i > 0 && j > 0) ? LCS_length[i - 1][j - 1] + 1 : 1;
                LCS_direction[i][j] = kLeftUp;
            }
            else
            {
                // entries outside the matrix count as length 0, so the first
                // row and column still carry over a match found earlier
                int up = (i > 0) ? LCS_length[i - 1][j] : 0;
                int left = (j > 0) ? LCS_length[i][j - 1] : 0;

                // it comes from the up entry in the direction matrix
                if(up > left)
                {
                    LCS_length[i][j] = up;
                    LCS_direction[i][j] = kUp;
                }
                // it comes from the left entry in the direction matrix
                else
                {
                    LCS_length[i][j] = left;
                    LCS_direction[i][j] = kLeft;
                }
            }
        }
    }

    LCS_Print(LCS_direction, pStr1, pStr2, length1 - 1, length2 - 1);
    int result = LCS_length[length1 - 1][length2 - 1];

    // release both matrices
    for(i = 0; i < length1; ++ i)
    {
        delete[] LCS_length[i];
        delete[] LCS_direction[i];
    }
    delete[] LCS_length;
    delete[] LCS_direction;

    return result;
}

/////////////////////////////////////////////////////////////////////////////
// Print a LCS for two strings
// Input: LCS_direction - a 2d matrix which records the direction of
//                        LCS generation
//        pStr1 - the first string
//        pStr2 - the second string
//        row - the row index in the matrix LCS_direction
//        col - the column index in the matrix LCS_direction
/////////////////////////////////////////////////////////////////////////////
void LCS_Print(int **LCS_direction, char* pStr1, char* pStr2, size_t row, size_t col)
{
    if(pStr1 == NULL || pStr2 == NULL)
        return;

    size_t length1 = strlen(pStr1);
    size_t length2 = strlen(pStr2);

    if(length1 == 0 || length2 == 0 || !(row < length1 && col < length2))
        return;

    // kLeftUp implies a char in the LCS is found
    if(LCS_direction[row][col] == kLeftUp)
    {
        if(row > 0 && col > 0)
            LCS_Print(LCS_direction, pStr1, pStr2, row - 1, col - 1);

        // print the char
        printf("%c", pStr1[row]);
    }
    else if(LCS_direction[row][col] == kLeft)
    {
        // move to the left entry in the direction matrix
        if(col > 0)
            LCS_Print(LCS_direction, pStr1, pStr2, row, col - 1);
    }
    else if(LCS_direction[row][col] == kUp)
    {
        // move to the up entry in the direction matrix
        if(row > 0)
            LCS_Print(LCS_direction, pStr1, pStr2, row - 1, col);
    }
}

Extension: if the question is changed to finding the longest common substring of the two strings, how can it be solved? A substring is defined like a subsequence, except that its characters must appear consecutively in the other string. For example, the longest common substrings of bdcaba and abcbdab are bd and ab, both of length 2.
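The original post leaves this extension as an exercise. One common approach, sketched here under that assumption and not taken from the post, is a similar table in which each entry holds the length of the longest common suffix of two prefixes and resets to 0 on a mismatch. The name longestCommonSubstring is hypothetical.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Returns one longest common substring of a and b (empty if none exists).
std::string longestCommonSubstring(const std::string& a, const std::string& b)
{
    const std::size_t m = a.size();
    const std::size_t n = b.size();

    // run[i][j] = length of the longest common suffix of the first i chars
    // of a and the first j chars of b; it stays 0 when a[i-1] != b[j-1].
    std::vector<std::vector<std::size_t>> run(m + 1, std::vector<std::size_t>(n + 1, 0));

    std::size_t best = 0;    // length of the best substring found so far
    std::size_t bestEnd = 0; // one past its last index in a

    for (std::size_t i = 1; i <= m; ++i)
    {
        for (std::size_t j = 1; j <= n; ++j)
        {
            if (a[i - 1] == b[j - 1])
            {
                run[i][j] = run[i - 1][j - 1] + 1;
                if (run[i][j] > best)
                {
                    best = run[i][j];
                    bestEnd = i;
                }
            }
        }
    }
    return a.substr(bestEnd - best, best);
}
```

Unlike the subsequence version, no max over the left and up neighbors is taken, because a mismatch breaks the run of consecutive characters.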