100 Selected Programmer Interview Questions (20): Longest Common Subsequence

Source: Internet
Author: User

Http://zhedahht.blog.163.com/blog/static/254111742007376431815/

Question: If all the characters of string 1 appear in string 2 in the same order as in string 1, then string 1 is called a subsequence of string 2. Note that the characters of the subsequence (string 1) are not required to appear consecutively in string 2. Write a function that takes two strings, computes their longest common subsequence, and prints it.

For example, given the two strings bdcaba and abcbdab, both bcba and bdab are longest common subsequences, so the function outputs the length 4 and prints either one of them.
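
The definition can be checked mechanically. The following is a minimal sketch, not part of the original article; the helper name isSubsequence is an assumption made for illustration. It walks through the longer string once and advances in the candidate only when the characters match, so the matched characters keep their order but need not be adjacent.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not from the original article).
// Returns true if every character of `sub` appears in `text` in the same
// order, not necessarily consecutively.
bool isSubsequence(const std::string& sub, const std::string& text)
{
    std::size_t matched = 0;  // number of characters of `sub` matched so far
    for (std::size_t i = 0; i < text.size() && matched < sub.size(); ++i)
    {
        if (text[i] == sub[matched])
            ++matched;
    }
    return matched == sub.size();
}
```

With the strings from the example, isSubsequence("bcba", "bdcaba") and isSubsequence("bcba", "abcbdab") both return true, and the same holds for bdab.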

Analysis: Finding the longest common subsequence (LCS) is a classic dynamic programming problem, so companies that place importance on algorithms, such as MicroStrategy, use it as an interview question.

A full introduction to dynamic programming would take a long time, so I do not intend to cover its general concepts here and will focus only on what is directly related to LCS. If you are not familiar with dynamic programming, refer to an algorithms textbook such as Introduction to Algorithms.

First, we introduce the properties of the LCS problem. Let $X_m = \{x_0, x_1, \ldots, x_{m-1}\}$ and $Y_n = \{y_0, y_1, \ldots, y_{n-1}\}$ be two strings, and let $Z_k = \{z_0, z_1, \ldots, z_{k-1}\}$ be an LCS of them. Then:

1. If $x_{m-1} = y_{n-1}$, then $z_{k-1} = x_{m-1} = y_{n-1}$, and $Z_{k-1}$ is an LCS of $X_{m-1}$ and $Y_{n-1}$;
2. If $x_{m-1} \ne y_{n-1}$, then $z_{k-1} \ne x_{m-1}$ implies that $Z$ is an LCS of $X_{m-1}$ and $Y$;
3. If $x_{m-1} \ne y_{n-1}$, then $z_{k-1} \ne y_{n-1}$ implies that $Z$ is an LCS of $X$ and $Y_{n-1}$.

Below is a simple proof of these properties:

1. Suppose $z_{k-1} \ne x_{m-1}$. Then we could append $x_{m-1}$ (which equals $y_{n-1}$) to $Z$ to obtain $Z'$, a common subsequence of $X$ and $Y$ of length $k + 1$. This contradicts the fact that $Z$, whose length is $k$, is an LCS of $X$ and $Y$. Therefore $z_{k-1} = x_{m-1} = y_{n-1}$.

Since $z_{k-1} = x_{m-1} = y_{n-1}$, deleting this last character from $Z$, $X$, and $Y$ gives $Z_{k-1}$, $X_{m-1}$, and $Y_{n-1}$, and $Z_{k-1}$ is clearly a common subsequence of $X_{m-1}$ and $Y_{n-1}$. We now show that $Z_{k-1}$ is an LCS of $X_{m-1}$ and $Y_{n-1}$, which is easy to prove by contradiction. Suppose $X_{m-1}$ and $Y_{n-1}$ have a common subsequence $W$ whose length exceeds $k - 1$. Appending $x_{m-1}$ to $W$ gives $W'$, a common subsequence of $X$ and $Y$ whose length exceeds $k$, which contradicts the assumption that $Z$ is an LCS.

2. Again by contradiction. Because $z_{k-1} \ne x_{m-1}$, $Z$ is a common subsequence of $X_{m-1}$ and $Y$. Suppose $Z$ is not an LCS of $X_{m-1}$ and $Y$. Then there is some $W$ of length greater than $k$ that is an LCS of $X_{m-1}$ and $Y$, and $W$ must also be a common subsequence of $X$ and $Y$. But by assumption the longest common subsequence of $X$ and $Y$ has length $k$, a contradiction.

3. The proof is the same as for 2.

These properties suggest the following approach for finding an LCS of two strings $X_m = \{x_0, x_1, \ldots, x_{m-1}\}$ and $Y_n = \{y_0, y_1, \ldots, y_{n-1}\}$. If $x_{m-1} = y_{n-1}$, it is enough to find an LCS of $X_{m-1}$ and $Y_{n-1}$ and append $x_{m-1}$ (which equals $y_{n-1}$) to it. If $x_{m-1} \ne y_{n-1}$, we find an LCS of $X_{m-1}$ and $Y$ and an LCS of $X$ and $Y_{n-1}$; the longer of the two is an LCS of $X$ and $Y$.

Let $c[i, j]$ denote the length of an LCS of the prefixes $x_0 \ldots x_i$ and $y_0 \ldots y_j$. Then $c[i, j]$ can be computed recursively:

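$$
c[i, j] =
\begin{cases}
0 & \text{if } i < 0 \text{ or } j < 0 \\
c[i-1, j-1] + 1 & \text{if } i, j \ge 0 \text{ and } x_i = y_j \\
\max(c[i, j-1],\ c[i-1, j]) & \text{if } i, j \ge 0 \text{ and } x_i \ne y_j
\end{cases}
$$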
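
The recurrence above translates almost line by line into a recursive function. This is a minimal sketch for illustration only; the name lcsLengthRecursive and the use of std::string are assumptions, not part of the original article. It computes only the length, and it recomputes the same subproblems many times, so it is practical only for short inputs.

```cpp
#include <algorithm>
#include <string>

// c(i, j): length of an LCS of the prefixes x[0..i] and y[0..j].
// i and j are signed so that -1 can stand for an empty prefix.
int lcsLengthRecursive(const std::string& x, const std::string& y, int i, int j)
{
    if (i < 0 || j < 0)
        return 0;  // one of the prefixes is empty
    if (x[i] == y[j])
        return lcsLengthRecursive(x, y, i - 1, j - 1) + 1;
    return std::max(lcsLengthRecursive(x, y, i, j - 1),
                    lcsLengthRecursive(x, y, i - 1, j));
}

// Convenience wrapper that starts from the last character of each string.
int lcsLengthRecursive(const std::string& x, const std::string& y)
{
    return lcsLengthRecursive(x, y,
                              static_cast<int>(x.size()) - 1,
                              static_cast<int>(y.size()) - 1);
}
```

For the strings bdcaba and abcbdab from the example, the wrapper returns 4.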

Implementing the formula above as a recursive function is not difficult. However, from the analysis of computing the nth Fibonacci number (question 16 in this interview series), we know that direct recursion performs a great deal of repeated computation, and that a bottom-up solution using loops is more efficient.
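
As a rough sketch of the bottom-up idea, the function below fills a table of LCS lengths row by row, so each entry is computed exactly once. The name lcsLengthBottomUp and the table layout are assumptions made for illustration: the table has one extra row and column, shifted by one relative to the indices in the recurrence, so that the "empty prefix" case needs no special handling. It returns only the length and does not recover the subsequence itself.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

int lcsLengthBottomUp(const std::string& x, const std::string& y)
{
    const std::size_t m = x.size();
    const std::size_t n = y.size();

    // table[i][j] holds the LCS length of the first i characters of x and
    // the first j characters of y. Row 0 and column 0 stay 0 (empty prefix).
    std::vector<std::vector<int>> table(m + 1, std::vector<int>(n + 1, 0));

    for (std::size_t i = 1; i <= m; ++i)
    {
        for (std::size_t j = 1; j <= n; ++j)
        {
            if (x[i - 1] == y[j - 1])
                table[i][j] = table[i - 1][j - 1] + 1;  // extend the diagonal
            else
                table[i][j] = std::max(table[i][j - 1], table[i - 1][j]);
        }
    }
    return table[m][n];
}
```

The running time and the memory use are both proportional to the product of the two string lengths.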

To solve the problem with loops, we use a matrix (see LCS_length in the code) to store each computed $c[i, j]$, so that later steps can read the value directly from the matrix when they need it. In addition, $c[i, j]$ is computed from one of $c[i-1, j-1]$, $c[i, j-1]$, or $c[i-1, j]$, which is equivalent to moving one step in the matrix LCS_length from one of those three entries to $c[i, j]$. There are therefore three possible directions of movement in the matrix: from the left, from above, and from the upper left. Only a move from the upper left means that a character of the LCS has been found. So we need a second matrix (see LCS_direction in the code) to record the direction of each move.

The reference code is as follows:

#include <cstdio>
#include <cstring>

// directions of LCS generation
enum decreaseDir {kInit = 0, kLeft, kUp, kLeftUp};

// declared here because LCS() calls it before its definition
void LCS_Print(int **LCS_direction,
               const char* pStr1, const char* pStr2,
               size_t row, size_t col);

/////////////////////////////////////////////////////////////////////////////
// Get the length of two strings' LCSs, and print one of the LCSs
// Input: pStr1         - the first string
//        pStr2         - the second string
// Output: the length of two strings' LCSs
/////////////////////////////////////////////////////////////////////////////
int LCS(const char* pStr1, const char* pStr2)
{
      if(!pStr1 || !pStr2)
            return 0;

      size_t length1 = strlen(pStr1);
      size_t length2 = strlen(pStr2);
      if(!length1 || !length2)
            return 0;

      size_t i, j;

      // initiate the length matrix (an array of row pointers)
      int **LCS_length = new int*[length1];
      for(i = 0; i < length1; ++ i)
            LCS_length[i] = new int[length2];

      for(i = 0; i < length1; ++ i)
            for(j = 0; j < length2; ++ j)
                  LCS_length[i][j] = 0;

      // initiate the direction matrix
      int **LCS_direction = new int*[length1];
      for(i = 0; i < length1; ++ i)
            LCS_direction[i] = new int[length2];

      for(i = 0; i < length1; ++ i)
            for(j = 0; j < length2; ++ j)
                  LCS_direction[i][j] = kInit;

      for(i = 0; i < length1; ++ i)
      {
            for(j = 0; j < length2; ++ j)
            {
                  // entries outside the matrix count as length 0
                  int up = (i > 0) ? LCS_length[i - 1][j] : 0;
                  int left = (j > 0) ? LCS_length[i][j - 1] : 0;

                  // a char of LCS is found,
                  // it comes from the left up entry in the direction matrix
                  if(pStr1[i] == pStr2[j])
                  {
                        int leftUp = (i > 0 && j > 0) ? LCS_length[i - 1][j - 1] : 0;
                        LCS_length[i][j] = leftUp + 1;
                        LCS_direction[i][j] = kLeftUp;
                  }
                  // it comes from the up entry in the direction matrix
                  else if(up > left)
                  {
                        LCS_length[i][j] = up;
                        LCS_direction[i][j] = kUp;
                  }
                  // it comes from the left entry in the direction matrix
                  else
                  {
                        LCS_length[i][j] = left;
                        LCS_direction[i][j] = kLeft;
                  }
            }
      }

      LCS_Print(LCS_direction, pStr1, pStr2, length1 - 1, length2 - 1);

      // save the result before the matrices are released
      int result = LCS_length[length1 - 1][length2 - 1];

      for(i = 0; i < length1; ++ i)
      {
            delete[] LCS_length[i];
            delete[] LCS_direction[i];
      }
      delete[] LCS_length;
      delete[] LCS_direction;

      return result;
}

/////////////////////////////////////////////////////////////////////////////
// Print a LCS for two strings
// Input: LCS_direction - a 2d matrix which records the direction of
//                        LCS generation
//        pStr1         - the first string
//        pStr2         - the second string
//        row           - the row index in the matrix LCS_direction
//        col           - the column index in the matrix LCS_direction
/////////////////////////////////////////////////////////////////////////////
void LCS_Print(int **LCS_direction,
               const char* pStr1, const char* pStr2,
               size_t row, size_t col)
{
      if(pStr1 == NULL || pStr2 == NULL)
            return;

      size_t length1 = strlen(pStr1);
      size_t length2 = strlen(pStr2);

      if(length1 == 0 || length2 == 0 || !(row < length1 && col < length2))
            return;

      // kLeftUp implies a char in the LCS is found
      if(LCS_direction[row][col] == kLeftUp)
      {
            if(row > 0 && col > 0)
                  LCS_Print(LCS_direction, pStr1, pStr2, row - 1, col - 1);

            // print the char
            printf("%c", pStr1[row]);
      }
      else if(LCS_direction[row][col] == kLeft)
      {
            // move to the left entry in the direction matrix
            if(col > 0)
                  LCS_Print(LCS_direction, pStr1, pStr2, row, col - 1);
      }
      else if(LCS_direction[row][col] == kUp)
      {
            // move to the up entry in the direction matrix
            if(row > 0)
                  LCS_Print(LCS_direction, pStr1, pStr2, row - 1, col);
      }
}

Extension: If the question is changed to finding the longest common substring of the two strings, how should it be solved? A substring is defined like a subsequence, except that its characters must appear consecutively in the other string. For example, the longest common substrings of bdcaba and abcbdab are bd and ab, both of length 2.
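
The article leaves this extension as an exercise. One common approach, sketched here under the assumption that a quadratic table is acceptable, tracks the length of the longest common suffix of each pair of prefixes: a match extends the run from the upper-left entry, and a mismatch resets it to 0. The function name longestCommonSubstring and the variable names are hypothetical.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Returns one longest common substring of x and y (contiguous in both).
std::string longestCommonSubstring(const std::string& x, const std::string& y)
{
    const std::size_t m = x.size();
    const std::size_t n = y.size();

    // suffixLen[i][j]: length of the longest common suffix of the first i
    // characters of x and the first j characters of y.
    std::vector<std::vector<std::size_t>> suffixLen(
        m + 1, std::vector<std::size_t>(n + 1, 0));

    std::size_t bestLen = 0;  // length of the best substring found so far
    std::size_t bestEnd = 0;  // one past its last character in x

    for (std::size_t i = 1; i <= m; ++i)
    {
        for (std::size_t j = 1; j <= n; ++j)
        {
            if (x[i - 1] == y[j - 1])
            {
                suffixLen[i][j] = suffixLen[i - 1][j - 1] + 1;
                if (suffixLen[i][j] > bestLen)
                {
                    bestLen = suffixLen[i][j];
                    bestEnd = i;
                }
            }
            // on a mismatch the entry stays 0: the common run is broken
        }
    }
    return x.substr(bestEnd - bestLen, bestLen);
}
```

For bdcaba and abcbdab this sketch returns bd, the first substring of length 2 that it encounters; ab is an equally valid answer.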
