In-depth analysis of the longest common subsequence

Source: Internet
Author: User

Question: If all the characters of string 1 appear in string 2 in the same order as in string 1, then string 1 is called a subsequence of string 2. Note that the characters of the subsequence (string 1) are not required to appear consecutively in string 2. Write a function that takes two strings, computes their longest common subsequence, and prints it.
For example, given the two strings BDCABA and ABCBDAB, both BCBA and BDAB are longest common subsequences, so the function outputs the length 4 and prints any one of them.
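As a quick check of the example, the following minimal sketch tests whether one string is a subsequence of another, that is, whether its characters appear in the same order but not necessarily next to each other. The function name isSubsequence is an illustrative assumption, not part of the original code.

```cpp
#include <cstddef>
#include <string>

// Returns true if every character of `sub` appears in `text` in the same
// order. The matched characters do not have to be adjacent in `text`.
bool isSubsequence(const std::string& sub, const std::string& text) {
    std::size_t matched = 0;  // number of characters of `sub` matched so far
    for (char ch : text) {
        if (matched < sub.size() && sub[matched] == ch) {
            ++matched;
        }
    }
    return matched == sub.size();
}
```

With this helper, both BCBA and BDAB are accepted as subsequences of BDCABA and of ABCBDAB, which is what makes them common subsequences.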
Analysis: The longest common subsequence (LCS) problem is a classic dynamic programming problem, so companies that value algorithms, such as MicroStrategy, have used it as an interview question.
A full introduction to dynamic programming would take a long time, so I do not intend to cover its general concepts here and will focus only on what relates directly to LCS. If you are not familiar with dynamic programming, refer to a relevant algorithms textbook.
First consider how to break the longest common subsequence problem into sub-problems. Let A = "a0, a1, ..., am-1" and B = "b0, b1, ..., bn-1", and let Z = "z0, z1, ..., zk-1" be a longest common subsequence of A and B. It is not hard to prove the following properties:
(1) If am-1 = bn-1, then zk-1 = am-1 = bn-1, and "z0, z1, ..., zk-2" is a longest common subsequence of "a0, a1, ..., am-2" and "b0, b1, ..., bn-2";
(2) If am-1 != bn-1 and zk-1 != am-1, then "z0, z1, ..., zk-1" is a longest common subsequence of "a0, a1, ..., am-2" and "b0, b1, ..., bn-1";
(3) If am-1 != bn-1 and zk-1 != bn-1, then "z0, z1, ..., zk-1" is a longest common subsequence of "a0, a1, ..., am-1" and "b0, b1, ..., bn-2".
So, when looking for a longest common subsequence of A and B: if am-1 = bn-1, solve one sub-problem, namely finding a longest common subsequence of "a0, a1, ..., am-2" and "b0, b1, ..., bn-2"; if am-1 != bn-1, solve two sub-problems, namely finding a longest common subsequence of "a0, a1, ..., am-2" and "b0, b1, ..., bn-1" and one of "a0, a1, ..., am-1" and "b0, b1, ..., bn-2", and take the longer of the two as the longest common subsequence of A and B.
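The decomposition above maps directly onto a recursive function. As a minimal sketch (the names lcsRec and lcsLengthTopDown and the memo table are illustrative assumptions, not from the original), a top-down version that caches each sub-problem so it is solved only once might look like this:

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// memo[i][j] caches the LCS length of the first i characters of a and the
// first j characters of b; -1 means "not computed yet".
static int lcsRec(const std::string& a, const std::string& b,
                  std::size_t i, std::size_t j,
                  std::vector<std::vector<int>>& memo) {
    if (i == 0 || j == 0) return 0;      // an empty prefix has an empty LCS
    int& cached = memo[i][j];
    if (cached != -1) return cached;
    if (a[i - 1] == b[j - 1]) {
        // Last characters match: one sub-problem on both shorter prefixes.
        cached = lcsRec(a, b, i - 1, j - 1, memo) + 1;
    } else {
        // Last characters differ: two sub-problems, keep the longer result.
        cached = std::max(lcsRec(a, b, i - 1, j, memo),
                          lcsRec(a, b, i, j - 1, memo));
    }
    return cached;
}

int lcsLengthTopDown(const std::string& a, const std::string& b) {
    std::vector<std::vector<int>> memo(
        a.size() + 1, std::vector<int>(b.size() + 1, -1));
    return lcsRec(a, b, a.size(), b.size(), memo);
}
```

Without the cache the same sub-problems would be solved repeatedly, which is why a table of sub-problem results is the natural next step.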
Solution:
Introduce a two-dimensional array c[][], where c[i][j] records the LCS length of the first i characters of X and the first j characters of Y (X and Y are the two input strings), and an array b[][], where b[i][j] records which sub-problem the value of c[i][j] was derived from, so that the search direction is known when the longest common subsequence is output.
The table is filled bottom-up, so c[i-1][j-1], c[i-1][j], and c[i][j-1] have already been computed before c[i][j] is. c[i][j] can then be computed according to whether X[i] = Y[j] or X[i] != Y[j].
The recursive expression of the problem is as follows:
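Reconstructed from the implementation in this article (the symbols x_i and y_j, denoting the i-th character of X and the j-th character of Y counted from 1, are notational assumptions), the recurrence reads:

```latex
c[i][j] =
\begin{cases}
0, & i = 0 \text{ or } j = 0,\\
c[i-1][j-1] + 1, & i, j > 0 \text{ and } x_i = y_j,\\
\max\bigl(c[i-1][j],\; c[i][j-1]\bigr), & i, j > 0 \text{ and } x_i \neq y_j.
\end{cases}
```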

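Backtracking to output the longest common subsequence: start from b[m][n] and follow the recorded directions. A value of 0 means the two characters matched, so move diagonally to b[i-1][j-1] and output the character; 1 means move up to b[i-1][j]; -1 means move left to b[i][j-1].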

Algorithm analysis:
Each recursive call moves at least one step up or to the left (or both at once), so i = 0 or j = 0 is reached after at most m + n calls, at which point the calls begin to return. The returns retrace the calls in the opposite direction with the same number of steps, so the time complexity of the backtracking is O(m + n). (Filling the table c beforehand takes O(mn) time.)
The complete implementation code is as follows:
/**
 * Returns the length of the longest common subsequence of two strings.
 * Author: liuzhiwei
 * Date: 2011-08-15
 **/
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
// Fills b with the search directions and returns the LCS length.
// b must have strlen(str1) + 1 rows and strlen(str2) + 1 columns.
int LCSLength(char *str1, char *str2, int **b)
{
    int i, j, length1, length2, len;
    length1 = strlen(str1);
    length2 = strlen(str2);
    // Allocate a dynamic two-dimensional array through a pointer to pointers
    int **c = new int *[length1 + 1];      // length1 + 1 rows in total
    for (i = 0; i < length1 + 1; i++)
        c[i] = new int[length2 + 1];       // length2 + 1 columns in total
    for (i = 0; i < length1 + 1; i++)
        c[i][0] = 0;                       // column 0 is initialized to 0
    for (j = 0; j < length2 + 1; j++)
        c[0][j] = 0;                       // row 0 is initialized to 0
    for (i = 1; i < length1 + 1; i++)
    {
        for (j = 1; j < length2 + 1; j++)
        {
            // Row 0 and column 0 of c stand for the empty prefix,
            // so row i of c corresponds to str1[i-1]
            if (str1[i-1] == str2[j-1])
            {
                c[i][j] = c[i-1][j-1] + 1;
                b[i][j] = 0;               // search direction when printing: diagonal
            }
            else if (c[i-1][j] > c[i][j-1])
            {
                c[i][j] = c[i-1][j];
                b[i][j] = 1;               // up
            }
            else
            {
                c[i][j] = c[i][j-1];
                b[i][j] = -1;              // left
            }
        }
    }
    /*
    for (i = 0; i < length1 + 1; i++)
    {
        for (j = 0; j < length2 + 1; j++)
            printf("%d ", c[i][j]);
        printf("\n");
    }
    */
    len = c[length1][length2];
    for (i = 0; i < length1 + 1; i++)      // release the dynamically allocated two-dimensional array
        delete[] c[i];
    delete[] c;
    return len;
}
void PrintLCS(int **b, char *str1, int i, int j)
{
    if (i == 0 || j == 0)
        return;
    if (b[i][j] == 0)
    {
        // The recursion starts from the end, so recurse toward the front first
        // and print each character on the way back
        PrintLCS(b, str1, i - 1, j - 1);
        printf("%c", str1[i-1]);           // row i of the table corresponds to str1[i-1]
    }
    else if (b[i][j] == 1)
        PrintLCS(b, str1, i - 1, j);
    else
        PrintLCS(b, str1, i, j - 1);
}
int main(void)
{
    char str1[100] = "", str2[100] = "";
    int i, length1, length2, len;
    printf("Enter the first string: ");
    fgets(str1, sizeof str1, stdin);       // gets() is unsafe and no longer part of the standard
    str1[strcspn(str1, "\n")] = '\0';      // strip the trailing newline
    printf("Enter the second string: ");
    fgets(str2, sizeof str2, stdin);
    str2[strcspn(str2, "\n")] = '\0';
    length1 = strlen(str1);
    length2 = strlen(str2);
    // Allocate a dynamic two-dimensional array through a pointer to pointers
    int **b = new int *[length1 + 1];
    for (i = 0; i < length1 + 1; i++)
        b[i] = new int[length2 + 1];
    len = LCSLength(str1, str2, b);
    printf("The length of the longest common subsequence is %d\n", len);
    printf("The longest common subsequence is: ");
    PrintLCS(b, str1, length1, length2);
    printf("\n");
    for (i = 0; i < length1 + 1; i++)      // release the dynamically allocated two-dimensional array
        delete[] b[i];
    delete[] b;
    system("pause");                       // Windows only: keeps the console window open
    return 0;
}

The second method drops the direction array b and backtracks through c directly:
/**
 * Returns the length of the longest common subsequence of two strings.
 * Author: liuzhiwei
 * Date: 2011-08-15
 **/
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
int LCSLength(char *str1, char *str2)      // returns the LCS length of the two strings and prints one LCS
{
    int i, j, length1, length2;
    length1 = strlen(str1);
    length2 = strlen(str2);
    // Allocate a dynamic two-dimensional array through a pointer to pointers
    int **c = new int *[length1 + 1];      // length1 + 1 rows in total
    for (i = 0; i < length1 + 1; i++)
        c[i] = new int[length2 + 1];       // length2 + 1 columns in total
    for (i = 0; i < length1 + 1; i++)
        c[i][0] = 0;                       // column 0 is initialized to 0
    for (j = 0; j < length2 + 1; j++)
        c[0][j] = 0;                       // row 0 is initialized to 0
    for (i = 1; i < length1 + 1; i++)
    {
        for (j = 1; j < length2 + 1; j++)
        {
            // Row 0 and column 0 of c stand for the empty prefix,
            // so row i of c corresponds to str1[i-1]
            if (str1[i-1] == str2[j-1])
                c[i][j] = c[i-1][j-1] + 1;
            else if (c[i-1][j] > c[i][j-1])
                c[i][j] = c[i-1][j];
            else
                c[i][j] = c[i][j-1];
        }
    }
    // Output one longest common subsequence, filling s from the back
    char s[100];
    int len, k;
    len = k = c[length1][length2];
    s[k--] = '\0';
    i = length1, j = length2;
    while (i > 0 && j > 0)
    {
        if (str1[i-1] == str2[j-1])
        {
            s[k--] = str1[i-1];
            i--;
            j--;
        }
        else if (c[i-1][j] < c[i][j-1])
            j--;
        else
            i--;
    }
    printf("The longest common subsequence is: ");
    puts(s);
    for (i = 0; i < length1 + 1; i++)      // release the dynamically allocated two-dimensional array
        delete[] c[i];
    delete[] c;
    return len;
}
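int main(void)
{
    char str1[100] = "", str2[100] = "";
    int len;
    printf("Enter the first string: ");
    fgets(str1, sizeof str1, stdin);       // gets() is unsafe and no longer part of the standard
    str1[strcspn(str1, "\n")] = '\0';      // strip the trailing newline
    printf("Enter the second string: ");
    fgets(str2, sizeof str2, stdin);
    str2[strcspn(str2, "\n")] = '\0';
    len = LCSLength(str1, str2);
    printf("The length of the longest common subsequence is %d\n", len);
    system("pause");                       // Windows only: keeps the console window open
    return 0;
}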

Problem extension: Let A, B, and C be three strings of length n over the same constant-size alphabet. Design an O(n^3)-time algorithm that finds the longest common subsequence of the three strings.
Idea: this follows the same approach as finding the longest common subsequence of two strings, except that a three-dimensional array must be allocated dynamically, and there are more cases to consider when the last characters of the three strings differ.
/**
 * Finds the length of the longest common subsequence of three strings.
 * Author: liuzhiwei
 * Date: 2011-08-15
 **/
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
int max1(int m, int n)
{
    if (m > n)
        return m;
    else
        return n;
}
int max2(int x, int y, int z, int k, int m, int n)
{
    int max = -1;
    if (x > max)
        max = x;
    if (y > max)
        max = y;
    if (z > max)
        max = z;
    if (k > max)
        max = k;
    if (m > max)
        max = m;
    if (n > max)
        max = n;
    return max;
}
int LCSLength(char *str1, char *str2, char *str3)  // returns the LCS length of the three strings
{
    int i, j, k, length1, length2, length3, len;
    length1 = strlen(str1);
    length2 = strlen(str2);
    length3 = strlen(str3);
    // Allocate a dynamic three-dimensional array
    int ***c = new int **[length1 + 1];    // length1 + 1 planes in total
    for (i = 0; i < length1 + 1; i++)
    {
        c[i] = new int *[length2 + 1];     // length2 + 1 rows per plane
        for (j = 0; j < length2 + 1; j++)
            c[i][j] = new int[length3 + 1];
    }
    // Any entry with an index of 0 stands for an empty prefix and is 0
    for (i = 0; i < length1 + 1; i++)
    {
        for (j = 0; j < length2 + 1; j++)
            c[i][j][0] = 0;
    }
    for (i = 0; i < length2 + 1; i++)
    {
        for (j = 0; j < length3 + 1; j++)
            c[0][i][j] = 0;
    }
    for (i = 0; i < length1 + 1; i++)
    {
        for (j = 0; j < length3 + 1; j++)
            c[i][0][j] = 0;
    }
    for (i = 1; i < length1 + 1; i++)
    {
        for (j = 1; j < length2 + 1; j++)
        {
            for (k = 1; k < length3 + 1; k++)
            {
                if (str1[i-1] == str2[j-1] && str2[j-1] == str3[k-1])
                    c[i][j][k] = c[i-1][j-1][k-1] + 1;
                else if (str1[i-1] == str2[j-1] && str1[i-1] != str3[k-1])
                    c[i][j][k] = max1(c[i][j][k-1], c[i-1][j-1][k]);
                else if (str1[i-1] == str3[k-1] && str1[i-1] != str2[j-1])
                    c[i][j][k] = max1(c[i][j-1][k], c[i-1][j][k-1]);
                else if (str2[j-1] == str3[k-1] && str1[i-1] != str2[j-1])
                    c[i][j][k] = max1(c[i-1][j][k], c[i][j-1][k-1]);
                else
                {
                    // All three last characters differ pairwise
                    c[i][j][k] = max2(c[i-1][j][k], c[i][j-1][k], c[i][j][k-1], c[i-1][j-1][k], c[i-1][j][k-1], c[i][j-1][k-1]);
                }
            }
        }
    }
    len = c[length1][length2][length3];
    for (i = 0; i < length1 + 1; i++)      // release the dynamically allocated three-dimensional array
    {
        for (j = 0; j < length2 + 1; j++)
            delete[] c[i][j];
        delete[] c[i];
    }
    delete[] c;
    return len;
}
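int main(void)
{
    char str1[100] = "", str2[100] = "", str3[100] = "";
    int len;
    printf("Enter the first string: ");
    fgets(str1, sizeof str1, stdin);       // gets() is unsafe and no longer part of the standard
    str1[strcspn(str1, "\n")] = '\0';      // strip the trailing newline
    printf("Enter the second string: ");
    fgets(str2, sizeof str2, stdin);
    str2[strcspn(str2, "\n")] = '\0';
    printf("Enter the third string: ");
    fgets(str3, sizeof str3, stdin);
    str3[strcspn(str3, "\n")] = '\0';
    len = LCSLength(str1, str2, str3);
    printf("The length of the longest common subsequence is %d\n", len);
    system("pause");                       // Windows only: keeps the console window open
    return 0;
}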
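For comparison, here is a compact sketch of the same three-string computation using std::vector instead of manual allocation. It is an alternative formulation, not the author's code: the name lcs3Length is hypothetical, and the five-way case split is replaced by a single maximum over the three neighbours that shorten exactly one string. This works because table entries never decrease as an index grows, so neighbours that shorten two strings at once can never exceed them.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Returns the length of a longest common subsequence of a, b and c.
// t[i][j][k] holds the LCS length of the first i, j and k characters.
int lcs3Length(const std::string& a, const std::string& b,
               const std::string& c) {
    using Row = std::vector<int>;
    using Plane = std::vector<Row>;
    // All entries start at 0, which covers every empty-prefix base case.
    std::vector<Plane> t(a.size() + 1,
                         Plane(b.size() + 1, Row(c.size() + 1, 0)));
    for (std::size_t i = 1; i <= a.size(); ++i) {
        for (std::size_t j = 1; j <= b.size(); ++j) {
            for (std::size_t k = 1; k <= c.size(); ++k) {
                if (a[i - 1] == b[j - 1] && b[j - 1] == c[k - 1]) {
                    // All three last characters match: extend the LCS.
                    t[i][j][k] = t[i - 1][j - 1][k - 1] + 1;
                } else {
                    // Otherwise drop the last character of one string.
                    t[i][j][k] = std::max({t[i - 1][j][k],
                                           t[i][j - 1][k],
                                           t[i][j][k - 1]});
                }
            }
        }
    }
    return t[a.size()][b.size()][c.size()];
}
```

The vectors release their memory automatically, so the explicit delete[] loops are not needed in this variant; the running time is still proportional to the product of the three lengths.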
