Algorithm Series Note 6 (Dynamic Programming: Longest Common Subsequence/Substring, LCS)

Source: Internet
Author: User

A subsequence only requires that its elements appear in the same relative order, whereas a substring must also be contiguous. For example, for the two strings ABCBDAB and BDCABA, the longest common subsequences are BCBA, BDAB, and BCAB, while the longest common substrings are only AB and BD (contiguous). As this example shows, the solution is not necessarily unique, although in many cases there is exactly one longest common substring or subsequence.

Longest common subsequence

Method 1: Brute-force enumeration

Enumerate all subsequences of string x, 2^m in total, and check whether each one also occurs as a subsequence of string y. Each check takes O(n), so the time complexity is exponential, O(n * 2^m).
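The enumeration described above can be sketched as follows. This is only an illustration, assuming x is short enough (fewer than 31 characters) that each subsequence can be encoded as a bitmask; the names is_subsequence and lcs_brute_force are hypothetical and do not come from the original article.

```cpp
#include <cstddef>
#include <string>

// Returns true if s can be obtained from y by deleting characters (one O(n) scan).
bool is_subsequence(const std::string& s, const std::string& y) {
    std::size_t k = 0;
    for (std::size_t j = 0; j < y.size() && k < s.size(); ++j)
        if (y[j] == s[k]) ++k;
    return k == s.size();
}

// Tries all 2^m subsequences of x; assumption: m < 31 so the mask fits.
std::string lcs_brute_force(const std::string& x, const std::string& y) {
    std::string best;
    const std::size_t m = x.size();
    for (unsigned long mask = 0; mask < (1UL << m); ++mask) {
        std::string cand;                       // subsequence selected by mask
        for (std::size_t i = 0; i < m; ++i)
            if (mask & (1UL << i)) cand += x[i];
        if (cand.size() > best.size() && is_subsequence(cand, y))
            best = cand;
    }
    return best;
}
```

For the strings ABCBDAB and BDCABA this returns a common subsequence of length 4, but the running time doubles with every extra character in x.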

Method 2: Dynamic programming (DP)

Place the two strings x[1...m] and y[1...n] along the x-axis and y-axis to form a two-dimensional array c[i,j], which records the length of the longest common subsequence of x[1...i] and y[1...j].

With c[i,0] = c[0,j] = 0: when x[i] == y[j], c[i,j] = c[i-1,j-1] + 1; otherwise, c[i,j] = max{c[i-1,j], c[i,j-1]}.

Computing the table bottom-up, each of the O(mn) distinct LCS subproblems is solved exactly once, so the time complexity is O(mn). Otherwise subproblems are recomputed over and over, and the time complexity remains exponential.
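The same bound can also be reached top-down, provided every subproblem result is cached. The following is a minimal sketch of that variant (not the article's bottom-up code); the function names and the use of -1 as a "not computed yet" marker are assumptions made for illustration.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Top-down LCS length with memoization; memo[i][j] == -1 means "not computed yet".
int lcs_memo(const std::string& x, const std::string& y, int i, int j,
             std::vector<std::vector<int>>& memo) {
    if (i == 0 || j == 0) return 0;          // an empty prefix has LCS length 0
    int& cached = memo[i][j];
    if (cached != -1) return cached;         // each (i, j) is solved only once
    if (x[i - 1] == y[j - 1])
        cached = lcs_memo(x, y, i - 1, j - 1, memo) + 1;
    else
        cached = std::max(lcs_memo(x, y, i - 1, j, memo),
                          lcs_memo(x, y, i, j - 1, memo));
    return cached;
}

int lcs_length_top_down(const std::string& x, const std::string& y) {
    std::vector<std::vector<int>> memo(x.size() + 1,
                                       std::vector<int>(y.size() + 1, -1));
    return lcs_memo(x, y, static_cast<int>(x.size()),
                    static_cast<int>(y.size()), memo);
}
```

Removing the cache lookup turns this back into the plain recursion, which recomputes the same (i, j) pairs many times.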

The code is as follows:

    #include <iostream>
    using namespace std;

    // Longest common subsequence (not necessarily contiguous).
    // b is a direction table that can be used for backtracking.
    void lcs_sequences(const char* str1, const char* str2, int len1, int len2)
    {
        int **c = new int*[len1 + 1];
        int **b = new int*[len1 + 1];
        int i, j;
        for (i = 0; i < len1 + 1; i++) {
            c[i] = new int[len2 + 1];
            b[i] = new int[len2 + 1];
        }
        for (i = 0; i <= len1; i++)
            for (j = 0; j <= len2; j++)
                c[i][j] = 0;
        for (i = 1; i <= len1; i++) {
            for (j = 1; j <= len2; j++) {
                if (str1[i-1] == str2[j-1]) {
                    c[i][j] = c[i-1][j-1] + 1;
                    b[i][j] = 0;            // from upper left
                } else {
                    if (c[i-1][j] > c[i][j-1]) {
                        c[i][j] = c[i-1][j];
                        b[i][j] = 1;        // from above
                    } else {
                        c[i][j] = c[i][j-1];
                        b[i][j] = 2;        // from left
                    }
                }
            }
        }
        cout << "longest common subsequence length: " << c[len1][len2] << endl;

        // Backtrack along the path.
        i = len1;
        j = len2;
        char *x = new char[c[len1][len2]];
        int k = 0;
        /* Variant that uses the direction table b:
        while (i > 0 && j > 0) {
            if (b[i][j] == 0) {             // from upper left
                x[k++] = str1[i-1];
                i--; j--;
            }
            else if (b[i][j] == 1) i--;
            else j--;
        }
        */
        // Backtrack without the direction table, using only str1, str2 and c[i][j].
        while (i > 0 && j > 0) {
            if (str1[i-1] == str2[j-1]) {
                x[k++] = str1[i-1];
                i--; j--;
            }
            else if (c[i][j] == c[i][j-1]) j--;
            else i--;
        }
        cout << "The lcs_opt is: ";
        for (i = c[len1][len2] - 1; i >= 0; i--) {   // x was filled back to front
            cout << x[i];
        }
        cout << endl;

        // Release every row of both tables (len1 + 1 rows each).
        for (i = 0; i < len1 + 1; i++) {
            delete[] c[i];
            delete[] b[i];
        }
        delete[] c;
        delete[] b;
        delete[] x;
    }

The commented-out block in the code above uses the direction array b to trace where each value came from. We can also do without the direction array and decide directly from c[i,j], str1, and str2. This saves the O(mn) space of b, but it only improves the constant factor of the space complexity, because c itself still needs O(mn).
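Backtracking from c alone can also be written so that the subsequence is returned instead of printed. This is a sketch using std::string and std::vector rather than the raw arrays of the code above; the name lcs_recover is hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Returns one longest common subsequence of a and b, backtracking over c only.
std::string lcs_recover(const std::string& a, const std::string& b) {
    const std::size_t m = a.size(), n = b.size();
    std::vector<std::vector<int>> c(m + 1, std::vector<int>(n + 1, 0));
    for (std::size_t i = 1; i <= m; ++i)
        for (std::size_t j = 1; j <= n; ++j)
            c[i][j] = (a[i - 1] == b[j - 1])
                          ? c[i - 1][j - 1] + 1
                          : std::max(c[i - 1][j], c[i][j - 1]);

    std::string out;                            // collected back to front
    std::size_t i = m, j = n;
    while (i > 0 && j > 0) {
        if (a[i - 1] == b[j - 1]) { out += a[i - 1]; --i; --j; }
        else if (c[i][j] == c[i][j - 1]) --j;   // value came from the left
        else --i;                               // value came from above
    }
    std::reverse(out.begin(), out.end());
    return out;
}
```

When several longest common subsequences exist, the tie-breaking order in the loop decides which one is returned.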

If only the length of the longest common subsequence is needed, and not the subsequence itself, the space complexity can be reduced to O(min{m,n}), provided the shorter string is laid out along the columns. A two-dimensional array is still used here, but its number of rows is fixed at 2.

The code is as follows:

    // Longest common subsequence with reduced space: a two-dimensional array with two rows.
    // Only the length of the longest common subsequence can be obtained, not the path.
    void swap(int **c, int len2)
    {
        for (int i = 0; i < len2; i++) {
            int temp = c[0][i];
            c[0][i] = c[1][i];
            c[1][i] = temp;
        }
    }

    void lcs_sequences_opt(const char* str1, const char* str2, int len1, int len2)
    {
        if (len2 <= 0) {                    // no columns: avoid indexing c[0][-1]
            cout << "the longest common subsequence length is: " << 0 << endl;
            return;
        }
        int *c[2];
        int i, j;
        for (i = 0; i < 2; i++)
            c[i] = new int[len2];
        for (j = 0; j < len2; j++)
            c[0][j] = 0;
        for (i = 0; i < len1; i++) {
            for (j = 0; j < len2; j++) {
                if (str1[i] == str2[j]) {
                    if (j == 0) c[1][j] = 1;
                    else c[1][j] = c[0][j-1] + 1;
                } else {
                    if (j == 0) c[1][j] = c[0][j];
                    else c[1][j] = c[0][j] > c[1][j-1] ? c[0][j] : c[1][j-1];
                }
            }
            // A full exchange is not required; copying c[1] into c[0] would be enough.
            swap(c, len2);
        }
        cout << "the longest common subsequence length is: " << c[0][len2-1] << endl;
        for (i = 0; i < 2; i++)
            delete[] c[i];
    }
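To actually reach O(min{m,n}) space, the shorter string has to index the columns. A possible sketch with std::vector follows; the name lcs_length_two_rows is hypothetical, and swapping the two vectors exchanges their buffers in constant time instead of copying element by element.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// LCS length using two rows of size min(m, n) + 1.
int lcs_length_two_rows(std::string a, std::string b) {
    if (a.size() < b.size()) std::swap(a, b);     // make b the shorter string
    std::vector<int> prev(b.size() + 1, 0), cur(b.size() + 1, 0);
    for (std::size_t i = 1; i <= a.size(); ++i) {
        for (std::size_t j = 1; j <= b.size(); ++j) {
            if (a[i - 1] == b[j - 1])
                cur[j] = prev[j - 1] + 1;
            else
                cur[j] = std::max(prev[j], cur[j - 1]);
        }
        std::swap(prev, cur);                     // the finished row becomes "previous"
    }
    return prev[b.size()];
}
```

Index 0 of both rows stays 0 throughout and plays the role of the empty prefix, so no special case for j == 0 is needed.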


Longest common substring

The basic solution uses a matrix to record, for every pair of positions in the two strings, whether the two characters match: 1 if they match, 0 otherwise. Then the longest run of 1s along a diagonal is found; its position is the position of the longest matching substring.
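The 0/1 matrix approach could look like the sketch below. It is deliberately unoptimized: every cell is treated as a possible start of a diagonal run, so the scan costs up to O(mn * min{m,n}). The name longest_common_substring_naive is hypothetical.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Longest common substring via a 0/1 match matrix and a scan of diagonal runs.
std::string longest_common_substring_naive(const std::string& x, const std::string& y) {
    const std::size_t m = x.size(), n = y.size();
    std::vector<std::vector<int>> match(m, std::vector<int>(n, 0));
    for (std::size_t i = 0; i < m; ++i)
        for (std::size_t j = 0; j < n; ++j)
            match[i][j] = (x[i] == y[j]) ? 1 : 0;

    std::size_t best_len = 0, best_start = 0;   // best_start indexes into x
    for (std::size_t i = 0; i < m; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            std::size_t len = 0;                // run of 1s starting at (i, j)
            while (i + len < m && j + len < n && match[i + len][j + len] == 1)
                ++len;
            if (len > best_len) { best_len = len; best_start = i; }
        }
    }
    return x.substr(best_start, best_len);
}
```

For ABCBDAB and BDCABA this returns AB, the first longest run found in row-major order.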

Optimization: when two characters match, instead of simply storing 1 in the corresponding element, we store the value of the upper-left element plus one. Two variables track the value and the position of the largest element seen so far; while the matrix is being filled, each newly computed element is compared with the current maximum and the variables are updated if necessary. Once the matrix is complete, the position and length of the longest matching substring are already known.

That is:

when x[i] == y[j], c[i,j] = c[i-1,j-1] + 1; otherwise, c[i,j] = 0.

The code is as follows:

    // Find the longest common substring.
    void lcs_string(const char* str1, const char* str2, int len1, int len2)
    {
        int **c = new int*[len1];
        int i, j;
        int maxC = 0;        // maximum value
        int position = 0;    // position in str2 where the longest match ends
        for (i = 0; i < len1; i++) {
            c[i] = new int[len2];
            for (j = 0; j < len2; j++) {
                if (str1[i] == str2[j]) {
                    if (i == 0 || j == 0) {
                        c[i][j] = 1;
                    } else {
                        c[i][j] = c[i-1][j-1] + 1;
                    }
                } else {
                    c[i][j] = 0;
                }
                if (c[i][j] > maxC) {
                    maxC = c[i][j];
                    position = j;
                }
            }
        }
        cout << "Maximum common substring length: " << maxC << endl;
        cout << "The LCS is: ";
        for (i = position - maxC + 1; i <= position; i++) {
            cout << str2[i];
        }
        cout << endl;
        for (i = 0; i < len1; i++)
            delete[] c[i];
        delete[] c;
    }

Both the time and the space complexity are O(mn). The space complexity can be reduced to O(min{m,n}) with a single one-dimensional array, but j must then be traversed from back to front, so that c[j-1] still holds the result of the previous row when c[j] is computed. Otherwise inputs such as DBB and AB go wrong: with str1 = AB and str2 = DBB, a front-to-back pass reports a length of 2 instead of 1.

The code is as follows:

    #include <cstring>   // memset

    // Find the longest common substring with reduced space: a one-dimensional array is enough.
    void lcs_string_opt(const char* str1, const char* str2, int len1, int len2)
    {
        int *c = new int[len2];
        int i, j;
        int maxC = 0;        // maximum value
        int position = 0;    // position in str2 where the longest match ends
        memset(c, 0, sizeof(int) * len2);
        for (i = 0; i < len1; i++) {
            // Traverse from back to front so that c[j-1] is still the previous row's value;
            // otherwise inputs such as DBB and AB give a wrong result.
            for (j = len2 - 1; j >= 0; j--) {
                if (str1[i] == str2[j]) {
                    if (j == 0) c[j] = 1;
                    else c[j] = c[j-1] + 1;   // the only difference from the 2-D version
                }
                else c[j] = 0;
                if (c[j] > maxC) {
                    maxC = c[j];
                    position = j;
                }
            }
        }
        cout << "Maximum common substring length: " << maxC << endl;
        cout << "The lcs_opt is: ";
        for (i = position - maxC + 1; i <= position; i++) {   // output the longest common substring
            cout << str2[i];
        }
        cout << endl;
        delete[] c;
    }
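Why the traversal direction matters can be checked with a small experiment. The sketch below computes only the length and takes a flag that selects the direction of the inner loop; the function name and the flag are hypothetical, introduced just for this comparison.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Longest common substring length with one row; `backward` selects the j order.
int lcs_substring_len_1d(const std::string& s1, const std::string& s2, bool backward) {
    const int n = static_cast<int>(s2.size());
    std::vector<int> c(n, 0);
    int best = 0;
    for (char ch : s1) {
        for (int t = 0; t < n; ++t) {
            int j = backward ? n - 1 - t : t;
            if (ch == s2[j]) c[j] = (j == 0 ? 0 : c[j - 1]) + 1;
            else c[j] = 0;
            best = std::max(best, c[j]);
        }
    }
    return best;
}
```

With s1 = AB and s2 = DBB, the backward pass gives 1, which is correct. The forward pass gives 2, because when the second B of DBB is processed, c[j-1] has already been overwritten with the value for the current row.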

Dynamic programming

Dynamic programming has two key characteristics, which we illustrate with the longest common subsequence.

1: Optimal substructure

This means that the optimal solution of the problem contains optimal solutions of its subproblems.

Take the longest common subsequence of x[1...i] and y[1...j]. When x[i] == y[j], the problem reduces to the longest common subsequence of x[1...i-1] and y[1...j-1]. When x[i] is not equal to y[j], we need the longest common subsequence of x[1...i] and y[1...j-1] as well as that of x[1...i-1] and y[1...j]. These two subproblems share a common subproblem: computing the longest common subsequence of x[1...i-1] and y[1...j-1].

2: Overlapping subproblems

As seen above, different subproblems contain a common subproblem; in other words, the subproblems overlap.

This means that the recursive solution consists of a relatively small number of distinct subproblems that are solved repeatedly. The LCS problem contains only m*n distinct subproblems.
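The overlap can be made visible by counting how often the plain recursion is entered. This is a sketch for illustration only; lcs_naive and its calls counter are hypothetical names.

```cpp
#include <algorithm>
#include <string>

// Naive recursion on prefix lengths (i, j); `calls` counts every invocation.
int lcs_naive(const std::string& x, const std::string& y, int i, int j, long& calls) {
    ++calls;
    if (i == 0 || j == 0) return 0;
    if (x[i - 1] == y[j - 1])
        return lcs_naive(x, y, i - 1, j - 1, calls) + 1;
    return std::max(lcs_naive(x, y, i - 1, j, calls),
                    lcs_naive(x, y, i, j - 1, calls));
}
```

For x = AAAA and y = BBBB, where no characters match and both branches are always taken, the function is entered 139 times, although there are only 4*4 = 16 distinct subproblems with i, j >= 1. The gap grows exponentially with the string lengths.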

References

1: http://blog.csdn.net/steven30832/article/details/8260189

2: http://blog.csdn.net/imzoer/article/details/8031478
