Longest common substring (continuous) problem

Source: Internet
Author: User

Transferred from: http://blog.csdn.net/steven30832/article/details/8260189


A classic dynamic programming problem is the longest common subsequence, in which the matched characters need not be contiguous. If the matched characters must be contiguous, the result is called a common substring. How do we find the longest one?


The simplest method is brute force: take one string as the parent string, generate every substring of the other string, and search for each one in the parent string. Starting from the longest substrings reduces the number of comparisons, because the first match found is already a longest one, but the complexity is still very high.
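The brute-force idea can be sketched roughly as follows. The function name `longest_common_substring_naive` is a hypothetical helper chosen for illustration, and the sketch uses `std::string` rather than raw character arrays; it tries candidates longest first and returns the first one found in the other string.

```cpp
#include <string>

// Brute force: try every substring of `a`, longest first, and search for it in `b`.
// Returns the first longest match found, or an empty string if nothing is shared.
std::string longest_common_substring_naive(const std::string& a, const std::string& b) {
    for (std::size_t len = a.size(); len > 0; --len) {                  // longest candidates first
        for (std::size_t start = 0; start + len <= a.size(); ++start) {
            const std::string candidate = a.substr(start, len);
            if (b.find(candidate) != std::string::npos) {
                return candidate;                                       // first hit has maximal length
            }
        }
    }
    return "";
}
```

There are on the order of n^2 candidate substrings and each search is itself not free, which is why this approach becomes expensive for long inputs.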


Looking at the problem again, we can build a comparison matrix for the two strings str1 and str2.


Define LCS(i,j) = 1 when str1[i] = str2[j], and LCS(i,j) = 0 otherwise.

Example

str1 = "bab"

str2 = "caba"


Build the matrix (columns follow str1, rows follow str2):

    b  a  b
c   0  0  0
a   0  1  0
b   1  0  1
a   0  1  0
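As a minimal sketch of this step, the comparison matrix can be built with two nested loops. The helper name `comparison_matrix` is an assumption for illustration; rows follow str2 and columns follow str1 so that the result has the same layout as the matrix printed above.

```cpp
#include <string>
#include <vector>

// m[j][i] == 1 exactly when str1[i] == str2[j], otherwise 0.
// Rows follow str2 and columns follow str1.
std::vector<std::vector<int>> comparison_matrix(const std::string& str1,
                                                const std::string& str2) {
    std::vector<std::vector<int>> m(str2.size(), std::vector<int>(str1.size(), 0));
    for (std::size_t j = 0; j < str2.size(); ++j) {
        for (std::size_t i = 0; i < str1.size(); ++i) {
            if (str1[i] == str2[j]) {
                m[j][i] = 1;
            }
        }
    }
    return m;
}
```

Calling `comparison_matrix("bab", "caba")` yields the four rows of the example.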


The defining feature of a contiguous substring is this: if str1[i] and str2[j] are the last characters of a common substring of length at least 2, then str1[i] = str2[j] && str1[i-1] = str2[j-1] must hold. In the matrix, a common substring therefore appears as a diagonal run of consecutive 1s, and the longest common substring corresponds to the longest such diagonal run.

So the problem can be transformed: construct the matrix in n^2 time, then find the longest diagonal run of 1s in it. This raises a new question: how do we find the longest diagonal run of 1s quickly?
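For illustration, a direct way to search a 0/1 matrix is to walk each diagonal and count consecutive 1s. This is only a sketch under the assumption that the matrix is stored as a `std::vector<std::vector<int>>`; the name `longest_diagonal_run` is hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Length of the longest run of consecutive 1s along any
// top-left to bottom-right diagonal of a rectangular 0/1 matrix.
int longest_diagonal_run(const std::vector<std::vector<int>>& m) {
    int best = 0;
    const std::size_t rows = m.size();
    const std::size_t cols = rows ? m[0].size() : 0;
    for (std::size_t r = 0; r < rows; ++r) {
        for (std::size_t c = 0; c < cols; ++c) {
            if (r != 0 && c != 0) continue;   // diagonals start in the first row or first column
            int run = 0;
            for (std::size_t k = 0; r + k < rows && c + k < cols; ++k) {
                run = m[r + k][c + k] ? run + 1 : 0;   // extend the run or reset it at a 0
                best = std::max(best, run);
            }
        }
    }
    return best;
}
```

This separate pass over the finished matrix is the work that the DP formulation folds into the construction of the matrix itself.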

Using the DP idea: if str1[i] = str2[j], then the length of the common substring ending at str1[i] and str2[j] is the length of the common substring ending at str1[i-1] and str2[j-1], plus 1. So we can redefine LCS(i,j): when str1[i] = str2[j], LCS(i,j) = LCS(i-1,j-1) + 1, where LCS(i-1,j-1) is taken as 0 if i = 0 or j = 0; otherwise LCS(i,j) = 0. The matrix above then becomes:

    b  a  b
c   0  0  0
a   0  1  0
b   1  0  2
a   0  2  0
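The recurrence can be sketched as below. The helper name `suffix_length_matrix` is an assumption for illustration; as in the printed matrix, rows follow str2 and columns follow str1, and cells in the first row or first column treat the missing upper-left neighbour as 0.

```cpp
#include <string>
#include <vector>

// m[j][i] is the length of the common substring that ends at str1[i] and str2[j].
std::vector<std::vector<int>> suffix_length_matrix(const std::string& str1,
                                                   const std::string& str2) {
    std::vector<std::vector<int>> m(str2.size(), std::vector<int>(str1.size(), 0));
    for (std::size_t j = 0; j < str2.size(); ++j) {
        for (std::size_t i = 0; i < str1.size(); ++i) {
            if (str1[i] == str2[j]) {
                // Upper-left neighbour, or 0 on the border of the matrix.
                const int prev = (i > 0 && j > 0) ? m[j - 1][i - 1] : 0;
                m[j][i] = prev + 1;
            }
            // Mismatching cells keep their initial value 0.
        }
    }
    return m;
}
```

For "bab" and "caba" this reproduces the matrix above, with the two 2s marking the ends of "ab" and "ba".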

Now the problem is simpler: spend n^2 time constructing this matrix, then n^2 time finding its largest value. That value is the length of the longest common substring, and its position identifies the last character of that substring.


The algorithm can be improved further. The current maximum length and its position can be recorded while the matrix is being constructed, which saves the separate n^2 pass to find the maximum.
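A minimal sketch of this improvement, assuming `std::string` inputs and a hypothetical function name `longest_common_substring`: the table is padded with an extra row and column of zeros so the border needs no special case, and the best length and end position are updated as each cell is filled.

```cpp
#include <string>
#include <vector>

// Fills the DP table and tracks the maximum while filling it.
// Returns one longest common substring (the first one encountered in row order).
std::string longest_common_substring(const std::string& str1, const std::string& str2) {
    // m[j][i] refers to str2[j-1] and str1[i-1]; row 0 and column 0 stay 0.
    std::vector<std::vector<std::size_t>> m(str2.size() + 1,
                                            std::vector<std::size_t>(str1.size() + 1, 0));
    std::size_t best_len = 0;   // longest match seen so far
    std::size_t best_end = 0;   // one past the last matched character in str1
    for (std::size_t j = 1; j <= str2.size(); ++j) {
        for (std::size_t i = 1; i <= str1.size(); ++i) {
            if (str1[i - 1] == str2[j - 1]) {
                m[j][i] = m[j - 1][i - 1] + 1;
                if (m[j][i] > best_len) {
                    best_len = m[j][i];
                    best_end = i;
                }
            }
        }
    }
    return str1.substr(best_end - best_len, best_len);
}
```

When several substrings share the maximum length, which one is returned depends on the traversal order; here the strict `>` comparison keeps the first one found.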

The space can also be improved. Once row i+1 of the matrix has been computed, row i is no longer needed; even if the longest length occurred in row i, it has already been recorded in a variable. The matrix can therefore be reduced to a single vector: during one pass of the loop the vector holds row i, and after the next pass it holds row i+1. Because each entry depends on the entry to its upper left, the vector must be updated from back to front, so that the value from row i is read before it is overwritten.


The code is as follows:

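    // Longest common substring (contiguous), LCS
    // deng chao  2012.12.4

    #include <iostream>
    #include <cstring>
    using namespace std;

    // Find the longest common substring of str1 and str2.
    // lcs records the common substring; the return value is its length
    // (-1 if either input pointer is NULL).
    int lcs(const char *str1, int len1, const char *str2, int len2, char *&lcs)
    {
        if (NULL == str1 || NULL == str2)
        {
            return -1;  // null parameters
        }

        // Compressed record vector: a single row of the matrix, shifted by one
        // so that c[0] is always 0. All len2 + 1 entries must be zeroed.
        int *c = new int[len2 + 1];
        for (int i = 0; i <= len2; ++i)
        {
            c[i] = 0;
        }
        int max_len = 0;  // longest match found so far
        int pos = 0;      // index in str2 of the last character of that match
        for (int i = 0; i < len1; ++i)
        {
            // Traverse from back to front so that c[j-1] still holds
            // the value from the previous row when c[j] is updated.
            for (int j = len2; j > 0; --j)
            {
                if (str1[i] == str2[j-1])
                {
                    c[j] = c[j-1] + 1;
                    if (c[j] > max_len)
                    {
                        max_len = c[j];
                        pos = j - 1;
                    }
                }
                else
                {
                    c[j] = 0;
                }
            }
        }

        // The source listing breaks off after the loops. The lines below are an
        // assumed completion based on the contract stated in the header comment:
        // copy the match into a new buffer that the caller frees with delete[].
        lcs = new char[max_len + 1];
        if (max_len > 0)
        {
            memcpy(lcs, str2 + pos - max_len + 1, max_len);
        }
        lcs[max_len] = '\0';

        delete[] c;
        return max_len;
    }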
