[Post] Longest Common Subsequence

Source: Internet
Author: User

From: http://zhedahht.blog.163.com/blog/static/254111742007376431815/

Question: If all the characters of string 1 appear in string 2 in the same order as in string 1, then string 1 is called a subsequence of string 2. Note that the characters of the subsequence (string 1) need not appear consecutively in string 2. Write a function that takes two strings, computes their longest common subsequence, and prints it. For example, given the strings bdcaba and abcbdab, both bcba and bdab are longest common subsequences, so the function outputs the length 4 and prints any one of them.
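
The definition of a subsequence can be checked mechanically. The following is a minimal sketch, not part of the original post; the function name isSubsequence is an illustrative assumption. It scans string 2 once and advances through string 1 whenever the next needed character is seen.

```cpp
#include <cstddef>
#include <string>

// Returns true if every character of `sub` appears in `text` in the same
// order, not necessarily contiguously. (Illustrative helper, hypothetical name.)
bool isSubsequence(const std::string& sub, const std::string& text) {
    std::size_t matched = 0;  // number of characters of `sub` matched so far
    for (char ch : text) {
        if (matched < sub.size() && sub[matched] == ch) {
            ++matched;
        }
    }
    return matched == sub.size();
}
```

With the strings from the question, bcba and bdab both pass this check against bdcaba and against abcbdab, which is what makes them common subsequences.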

Analysis: Finding the longest common subsequence (LCS) is a classic dynamic programming problem, which is why companies that emphasize algorithms, such as MicroStrategy, use it as an interview question. A full introduction to dynamic programming would take a long time, so I will not cover its general concepts here and will focus only on what is directly relevant to LCS. If you are not familiar with dynamic programming, refer to an algorithms textbook such as Introduction to Algorithms.

First, the key property of the LCS problem. Let X_m = {x_0, x_1, ..., x_{m-1}} and Y_n = {y_0, y_1, ..., y_{n-1}} be two strings, and let Z_k = {z_0, z_1, ..., z_{k-1}} be any LCS of them. Here X_{m-1} denotes X_m with its last character removed, and likewise for Y_{n-1} and Z_{k-1}. Then:

(1) If x_{m-1} = y_{n-1}, then z_{k-1} = x_{m-1} = y_{n-1}, and Z_{k-1} is an LCS of X_{m-1} and Y_{n-1};

(2) If x_{m-1} ≠ y_{n-1} and z_{k-1} ≠ x_{m-1}, then Z is an LCS of X_{m-1} and Y;

(3) If x_{m-1} ≠ y_{n-1} and z_{k-1} ≠ y_{n-1}, then Z is an LCS of X and Y_{n-1}.

These properties suggest the following approach to finding an LCS of X_m = {x_0, x_1, ..., x_{m-1}} and Y_n = {y_0, y_1, ..., y_{n-1}}. If x_{m-1} = y_{n-1}, we only need to find an LCS of X_{m-1} and Y_{n-1} and append x_{m-1} (which equals y_{n-1}) to it. If x_{m-1} ≠ y_{n-1}, we find an LCS of X_{m-1} and Y and an LCS of X and Y_{n-1}; the longer of the two is an LCS of X and Y. If we write c[i, j] for the length of an LCS of the prefixes ending at x_i and y_j (that is, x_0 ... x_i and y_0 ... y_j), then c[i, j] can be computed recursively:

          / 0                             if i < 0 or j < 0
c[i, j] =   c[i-1, j-1] + 1               if i, j >= 0 and x_i = y_j
          \ max(c[i, j-1], c[i-1, j])     if i, j >= 0 and x_i ≠ y_j
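
As an illustration, the recurrence can be transcribed almost literally into a recursive function. This is a sketch only, and the name lcsLengthRecursive is an assumption rather than anything from the original post. The indices i and j are the 0-based positions of the last characters of the two prefixes, so a negative index stands for an empty prefix.

```cpp
#include <algorithm>
#include <string>

// c(i, j): LCS length of x[0..i] and y[0..j], both ends inclusive.
// A negative index means the corresponding prefix is empty.
int lcsLengthRecursive(const std::string& x, const std::string& y, int i, int j) {
    if (i < 0 || j < 0) {
        return 0;  // first case of the recurrence
    }
    if (x[i] == y[j]) {
        return lcsLengthRecursive(x, y, i - 1, j - 1) + 1;  // second case
    }
    // Third case: drop the last character of one string or the other.
    return std::max(lcsLengthRecursive(x, y, i, j - 1),
                    lcsLengthRecursive(x, y, i - 1, j));
}

// Convenience overload that starts from the full strings.
int lcsLengthRecursive(const std::string& x, const std::string& y) {
    return lcsLengthRecursive(x, y,
                              static_cast<int>(x.size()) - 1,
                              static_cast<int>(y.size()) - 1);
}
```

This version recomputes the same subproblems many times and takes exponential time in the worst case, so it is only suitable for short inputs.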

This recurrence is easy to implement as a recursive function. However, as the analysis of computing the n-th Fibonacci number (Question 16 in this series) showed, direct recursion repeats a great deal of computation, so a bottom-up iterative solution is more efficient.
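
A minimal bottom-up sketch that computes only the length might look like the following. The name lcsLengthBottomUp and the use of std::vector are assumptions made for illustration. To avoid negative indices, the table is shifted by one: table[i][j] holds the LCS length of the first i characters of x and the first j characters of y, so row 0 and column 0 represent the base case.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

int lcsLengthBottomUp(const std::string& x, const std::string& y) {
    const std::size_t m = x.size();
    const std::size_t n = y.size();
    // Row 0 and column 0 stay 0: an empty prefix has an empty LCS.
    std::vector<std::vector<int>> table(m + 1, std::vector<int>(n + 1, 0));
    for (std::size_t i = 1; i <= m; ++i) {
        for (std::size_t j = 1; j <= n; ++j) {
            if (x[i - 1] == y[j - 1]) {
                table[i][j] = table[i - 1][j - 1] + 1;  // extend the diagonal
            } else {
                table[i][j] = std::max(table[i][j - 1], table[i - 1][j]);
            }
        }
    }
    return table[m][n];
}
```

Each cell is filled exactly once, so the running time is proportional to the product of the two string lengths.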

To solve the problem iteratively, we use a matrix, lcs_length, to store each computed c[i, j], so that later computations can read the value directly from the matrix. In addition, c[i, j] is computed from one of c[i-1, j-1], c[i, j-1], or c[i-1, j], which corresponds to moving one step within the matrix from that cell to c[i, j]. There are therefore three possible moves: from the left, from above, and from the upper left. Only a move from the upper left that increases the length means that a character of the LCS has been found. We therefore need a second matrix, lcs_direction, to record the direction of each move. The reference code below combines the two into a single two-dimensional array of pairs, arr, whose first member plays the role of lcs_length and whose second member plays the role of lcs_direction:

 

#include <iostream>
#include <stack>
#include <string>
#include <utility>
#include <vector>
using namespace std;

// Direction from which a cell's value was derived.
const int LEFTUP = 0;
const int LEFT = 1;
const int UP = 2;

// Stores the largest of a, b, c in *best and returns which argument supplied
// it (0 for a, 1 for b, 2 for c). On ties, a has the highest priority, then b.
int max(int a, int b, int c, int *best) {
    int res = 0;  // records which argument the maximum came from
    *best = a;
    if (b > *best) {
        *best = b;
        res = 1;
    }
    if (c > *best) {
        *best = c;
        res = 2;
    }
    return res;
}

// The author suggests passing the longer string as str1, mainly to save time
// when backtracking; the result is still correct if this is not done.
string LCS(const string &str1, const string &str2) {
    int xlen = str1.size();  // horizontal length (number of columns)
    int ylen = str2.size();  // vertical length (number of rows)
    if (xlen == 0 || ylen == 0)  // if either string is empty, return an empty string
        return "";
    // Table of pairs: first is the LCS length, second is the source direction.
    // Every cell starts as (0, LEFTUP), so row 0 and column 0 are already zero.
    vector<vector<pair<int, int> > > arr(
        ylen + 1, vector<pair<int, int> >(xlen + 1, make_pair(0, LEFTUP)));
    for (int i = 1; i <= ylen; i++) {
        char s = str2.at(i - 1);
        for (int j = 1; j <= xlen; j++) {
            int leftup = arr[i - 1][j - 1].first;
            int left = arr[i][j - 1].first;
            int up = arr[i - 1][j].first;
            if (str1.at(j - 1) == s)  // the two characters match
                leftup++;
            arr[i][j].second = max(leftup, left, up, &arr[i][j].first);
        }
    }
    // Trace back from the bottom-right cell to recover the subsequence.
    // Stop at row 0 or column 0, where no characters remain to be matched.
    stack<int> st;
    int i = ylen, j = xlen;
    while (i > 0 && j > 0) {
        if (arr[i][j].second == LEFTUP) {
            // A diagonal move contributes a character only if the length grew.
            if (arr[i][j].first == arr[i - 1][j - 1].first + 1)
                st.push(i);
            --i;
            --j;
        } else if (arr[i][j].second == LEFT) {
            --j;
        } else {  // UP
            --i;
        }
    }
    // Indices were pushed from back to front, so popping restores the order.
    string res = "";
    while (!st.empty()) {
        int index = st.top() - 1;
        res.append(str2.substr(index, 1));
        st.pop();
    }
    return res;
}

int main() {
    string str1 = "gccctagcg";
    string str2 = "gcgcaatg";
    string lcs = LCS(str1, str2);
    cout << lcs << endl;
    return 0;
}

Extension: If the question is changed to finding the longest common substring of two strings, how should it be solved? A substring is defined like a subsequence, except that its characters must appear contiguously in the other string. For example, the longest common substrings of bdcaba and abcbdab are bd and ab, both of length 2.
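
One possible approach to the extension, offered as a sketch rather than as the author's intended answer, is a similar table in which each cell records the length of the longest common suffix of two prefixes. A mismatch resets the cell to 0 instead of carrying a value over from the left or from above. The name longestCommonSubstring is an assumption for illustration.

```cpp
#include <cstddef>
#include <string>
#include <vector>

std::string longestCommonSubstring(const std::string& x, const std::string& y) {
    const std::size_t m = x.size();
    const std::size_t n = y.size();
    // len[i][j]: length of the longest common suffix of the first i
    // characters of x and the first j characters of y.
    std::vector<std::vector<int>> len(m + 1, std::vector<int>(n + 1, 0));
    int best = 0;             // length of the best substring found so far
    std::size_t bestEnd = 0;  // end position (exclusive) of that substring in x
    for (std::size_t i = 1; i <= m; ++i) {
        for (std::size_t j = 1; j <= n; ++j) {
            if (x[i - 1] == y[j - 1]) {
                len[i][j] = len[i - 1][j - 1] + 1;
                if (len[i][j] > best) {
                    best = len[i][j];
                    bestEnd = i;
                }
            }
            // On a mismatch the cell keeps its initial value 0.
        }
    }
    return x.substr(bestEnd - best, best);
}
```

When several substrings share the maximum length, this sketch returns the one that ends earliest in the first argument; for bdcaba and abcbdab that is bd.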

 
