Solving the longest common subsequence of two strings

Source: Internet
Author: User

1. Problem description

Given two strings, find their longest common subsequence (LCS). For example, string 1: BDCABA; string 2: ABCBDAB.

The length of the longest common subsequence of these two strings is 4, and one longest common subsequence is BCBA.
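To make the definition concrete: a subsequence keeps characters in their original order but does not need to be contiguous. The sketch below is a minimal Java check written for illustration. The class and method names (`SubsequenceCheck`, `isSubsequence`) are hypothetical and not from the original post. It confirms that BCBA is a subsequence of both example strings.

```java
public class SubsequenceCheck {

    // Returns true if every character of sub appears in s in the same
    // relative order (not necessarily contiguously).
    static boolean isSubsequence(String sub, String s) {
        int k = 0; // index of the next character of sub still to be matched
        for (int i = 0; i < s.length() && k < sub.length(); i++) {
            if (s.charAt(i) == sub.charAt(k)) {
                k++;
            }
        }
        return k == sub.length();
    }

    public static void main(String[] args) {
        String x = "BDCABA";
        String y = "ABCBDAB";
        String z = "BCBA";
        // BCBA is a subsequence of both strings, hence a common subsequence.
        System.out.println(isSubsequence(z, x) && isSubsequence(z, y)); // true
    }
}
```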

2. Solving it with dynamic programming

This is a dynamic programming problem. Problems that dynamic programming can solve generally have two properties: ① optimal substructure; ② overlapping subproblems.

① Optimal substructure

Let X = (x1, x2, ..., xn) and Y = (y1, y2, ..., ym) be two sequences, and denote the longest common subsequence of X and Y by LCS(X, Y). For prefixes, write X_i = (x1, ..., xi) and Y_j = (y1, ..., yj), so that X_n = X and Y_m = Y.

Finding LCS(X, Y) is an optimization problem, because we want the longest of all common subsequences of X and Y. To find it, start by comparing the last element of X with the last element of Y.

1) If x_n = y_m, i.e. the last element of X equals the last element of Y, then this element can be taken as the last element of a longest common subsequence. So we only need to find LCS(X_{n-1}, Y_{m-1}) and append x_n to it.

LCS(X_{n-1}, Y_{m-1}) is a subproblem of the original problem. Why call it a subproblem? Because it is smaller than the original problem (even one element smaller is still smaller).

And why is it an optimal subproblem? Because what we look for in X_{n-1} and Y_{m-1} is again the longest common subsequence, that is, the best one (here "best" means "longest").

2) If x_n ≠ y_m, things are a bit more involved, because this case produces two subproblems: LCS(X_{n-1}, Y_m) and LCS(X_n, Y_{m-1}).

Since the last elements of X and Y differ, they cannot both be the last element of a common subsequence (an element of a common subsequence must be equal in both sequences). So at least one of x_n and y_m is not used by the LCS and can be dropped, which gives the two subproblems:

LCS(X_{n-1}, Y_m) looks for the longest common subsequence of (x1, x2, ..., x_{n-1}) and (y1, y2, ..., y_m).

LCS(X_n, Y_{m-1}) looks for the longest common subsequence of (x1, x2, ..., x_n) and (y1, y2, ..., y_{m-1}).

Solve both subproblems; whichever common subsequence is longer is LCS(X, Y). In mathematical notation:

LCS(X, Y) = max{LCS(X_{n-1}, Y_m), LCS(X_n, Y_{m-1})}

Cases 1) and 2) cover every possibility, so we have reduced the original problem to three smaller subproblems (case 1) uses one of them, case 2) the other two).
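The two cases translate almost word for word into a plain recursive function. The following is a minimal sketch under our own naming (`NaiveLcs` and `lcs(x, y, i, j)` are hypothetical). It assumes the base case that an empty prefix has LCS length 0, and it computes only the length, not the subsequence itself. Because it caches nothing, it is exactly the kind of recursion that the discussion of overlapping subproblems warns about.

```java
public class NaiveLcs {

    // Length of LCS(X_i, Y_j), where X_i is the first i characters of x
    // and Y_j the first j characters of y. Follows cases 1) and 2) directly,
    // with no caching, so it takes exponential time in the worst case.
    static int lcs(String x, String y, int i, int j) {
        if (i == 0 || j == 0) {
            return 0; // assumed base case: an empty prefix has no common subsequence
        }
        if (x.charAt(i - 1) == y.charAt(j - 1)) {
            // Case 1): last elements equal -> LCS(X_{i-1}, Y_{j-1}) plus one
            return lcs(x, y, i - 1, j - 1) + 1;
        }
        // Case 2): last elements differ -> the longer of the two subproblems
        return Math.max(lcs(x, y, i - 1, j), lcs(x, y, i, j - 1));
    }

    public static void main(String[] args) {
        String x = "BDCABA";
        String y = "ABCBDAB";
        System.out.println(lcs(x, y, x.length(), y.length())); // 4
    }
}
```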

② Overlapping subproblems

What are overlapping subproblems? After the original problem is broken into subproblems, those subproblems contain some of the same smaller problems. Hmm? None of the three subproblems above look the same, do they?

OK, let's look. The original problem is LCS(X, Y), and its subproblems are LCS(X_{n-1}, Y_{m-1}), LCS(X_{n-1}, Y_m), and LCS(X_n, Y_{m-1}).

At first glance these three subproblems are not identical. But they do overlap, because they share a large part of their own subproblems. For example:

Take the second subproblem: LCS(X_{n-1}, Y_m) contains the first subproblem, LCS(X_{n-1}, Y_{m-1}). Why?

Because if the last elements of X_{n-1} and Y_m differ, LCS(X_{n-1}, Y_m) is decomposed into LCS(X_{n-1}, Y_{m-1}) and LCS(X_{n-2}, Y_m).

In other words, as the decomposition into subproblems continues, some subproblems show up again and again.

Because a problem like LCS has overlapping subproblems, solving it with plain recursion is a bad deal: the recursion solves the same subproblems over and over, and the total number of subproblems it visits grows exponentially.
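One way to see the repetition is to instrument the plain recursion and count how often each subproblem (i, j) is requested. The sketch below does that; the class, field, and method names are hypothetical. The input "ABC" / "DEF" is an assumption chosen for illustration: the two strings share no characters, so every call takes case 2) and branches in two.

```java
import java.util.HashMap;
import java.util.Map;

public class OverlapCounter {

    // How many times the plain recursion is asked to solve each (i, j).
    static final Map<String, Integer> visits = new HashMap<>();
    static int calls = 0;

    static int lcs(String x, String y, int i, int j) {
        calls++;
        visits.merge(i + "," + j, 1, Integer::sum); // count this request
        if (i == 0 || j == 0) {
            return 0;
        }
        if (x.charAt(i - 1) == y.charAt(j - 1)) {
            return lcs(x, y, i - 1, j - 1) + 1;
        }
        return Math.max(lcs(x, y, i - 1, j), lcs(x, y, i, j - 1));
    }

    public static void main(String[] args) {
        // No characters in common, so every non-base call branches in two.
        lcs("ABC", "DEF", 3, 3);
        System.out.println("calls = " + calls);                        // 39
        System.out.println("distinct subproblems = " + visits.size()); // 15
        System.out.println("(1,1) solved " + visits.get("1,1") + " times"); // 6
    }
}
```

For these length-3 inputs the recursion makes 39 calls even though there are only 15 distinct subproblems, and LCS(X_1, Y_1) alone is solved 6 times. The gap widens quickly as the strings get longer.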

Another article gives an example of plain recursion applied to overlapping subproblems.

So here is the question: with plain recursion there are many levels of subproblems, and their number grows exponentially, so the running time is exponential. Does using dynamic programming really turn this into polynomial time?

Yes, it does.

The key is that dynamic programming does not compute each occurrence of an overlapping subproblem separately. Once a subproblem has been solved, its later occurrences are answered directly by a "table lookup" instead of being recomputed again and again. Enough talk, let's see an example: computing the Fibonacci sequence. For the Fibonacci sequence, refer to:

fib(5) is decomposed into two subproblems, fib(4) and fib(3); solving fib(4) and fib(3) in turn decomposes them into a series of smaller problems, and so on.

In the recursion tree, the left and right subtrees, rooted at fib(4) and fib(3), overlap heavily. For example, fib(2) appears three times in total. With plain recursion fib(2) is computed three times; with DP (dynamic programming) it is computed only once, and the other two occurrences are obtained directly from the lookup table.
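The fib(2) count can be checked directly. The sketch below compares plain recursion with a memoized version that keeps its lookup table in a `HashMap`. The class, method, and counter names are hypothetical, and it assumes the convention fib(0) = 0, fib(1) = 1.

```java
import java.util.HashMap;
import java.util.Map;

public class FibDemo {

    static int naiveFib2Count = 0; // times fib(2) is computed by plain recursion
    static int memoFib2Count = 0;  // times fib(2) is computed with the lookup table

    // Plain recursion: recomputes overlapping subproblems.
    static long naiveFib(int n) {
        if (n == 2) naiveFib2Count++;
        if (n <= 1) return n;
        return naiveFib(n - 1) + naiveFib(n - 2);
    }

    // Memoized recursion: each fib(n) is computed once, later requests are looked up.
    static long memoFib(int n, Map<Integer, Long> table) {
        Long cached = table.get(n);
        if (cached != null) return cached; // table hit, no recomputation
        if (n == 2) memoFib2Count++;
        long result = (n <= 1) ? n : memoFib(n - 1, table) + memoFib(n - 2, table);
        table.put(n, result);
        return result;
    }

    public static void main(String[] args) {
        System.out.println(naiveFib(5) + " " + naiveFib2Count);                // 5 3
        System.out.println(memoFib(5, new HashMap<>()) + " " + memoFib2Count); // 5 1
    }
}
```

Both versions return fib(5) = 5, but plain recursion computes fib(2) three times while the memoized version computes it once.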

Having said all that, it is time to write down the recurrence for the longest common subsequence.

[Figure: the LCS recurrence, an image borrowed from another user's post]

c[i, j] is the length of the longest common subsequence of (x1, x2, ..., xi) and (y1, y2, ..., yj). (It is a length, so it is an integer.) For a detailed explanation of the formula, see the dynamic programming chapter of Introduction to Algorithms.
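The image itself is not preserved. Assuming it showed the standard recurrence from Introduction to Algorithms, it reads as follows; it matches the Java code below case by case (the base case, the match case, and the max in case 2)).

```latex
c[i,j] =
\begin{cases}
0 & \text{if } i = 0 \text{ or } j = 0,\\
c[i-1,j-1] + 1 & \text{if } i,j > 0 \text{ and } x_i = y_j,\\
\max\bigl(c[i,j-1],\; c[i-1,j]\bigr) & \text{if } i,j > 0 \text{ and } x_i \neq y_j.
\end{cases}
```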

3. Java implementation of LCS

 1  public class LCSequence {
 2
 3      // Length of the longest common subsequence of str1 and str2
 4      public static int lcs(String str1, String str2) {
 5          int[][] c = new int[str1.length() + 1][str2.length() + 1];
 6          for (int row = 0; row <= str1.length(); row++)
 7              c[row][0] = 0;
 8          for (int column = 0; column <= str2.length(); column++)
 9              c[0][column] = 0;
10
11          for (int i = 1; i <= str1.length(); i++)
12              for (int j = 1; j <= str2.length(); j++)
13              {
14                  if (str1.charAt(i - 1) == str2.charAt(j - 1))
15                      c[i][j] = c[i - 1][j - 1] + 1;
16                  else if (c[i][j - 1] > c[i - 1][j])
17                      c[i][j] = c[i][j - 1];
18                  else
19                      c[i][j] = c[i - 1][j];
20              }
21          return c[str1.length()][str2.length()];
22      }
23
24      // Test
25      public static void main(String[] args) {
26          String str1 = "BDCABA";
27          String str2 = "ABCBDAB";
28          int result = lcs(str1, str2);
29          System.out.println(result); // prints 4
30      }
31  }

The whole program is essentially a direct translation of the recurrence above.

① Line 5 defines a two-dimensional array c to hold the lengths of the longest common subsequences.

② Lines 6 through 9 initialize the first column and the first row to 0. Why 0? Consider what c[0, j] means: string 1 has length 0, string 2 has length j, and we want the length of their longest common subsequence. It is of course 0, because string 1 is empty. (Java already fills a new int array with zeros, so these loops just make the base case explicit.)

③ Lines 11 through 20 implement the recurrence. Lines 16 to 19 compute c[i][j] = max{c[i][j-1], c[i-1][j]}.

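④ Line 21 returns the final result. Why return c[str1.length()][str2.length()]? Recall what c[i][j] means: the LCS length of the first i characters of str1 and the first j characters of str2. With i and j equal to the full lengths, that is the answer for the two whole strings.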
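The program returns only the length. To recover an actual subsequence such as BCBA, one can walk back through the filled table starting from c[n][m]. The following is a sketch that is not part of the original post: `LcsWithTraceback` and `lcsString` are hypothetical names, and it uses `Math.max` in place of lines 16 to 19, which computes the same value.

```java
public class LcsWithTraceback {

    // Fills the same table c as lcs() above, then walks back from c[n][m]
    // to recover one longest common subsequence.
    static String lcsString(String str1, String str2) {
        int n = str1.length(), m = str2.length();
        int[][] c = new int[n + 1][m + 1]; // row 0 and column 0 are already 0
        for (int i = 1; i <= n; i++) {
            for (int j = 1; j <= m; j++) {
                if (str1.charAt(i - 1) == str2.charAt(j - 1)) {
                    c[i][j] = c[i - 1][j - 1] + 1;
                } else {
                    c[i][j] = Math.max(c[i][j - 1], c[i - 1][j]);
                }
            }
        }
        // Walk back from (n, m), collecting matched characters in reverse order.
        StringBuilder sb = new StringBuilder();
        int i = n, j = m;
        while (i > 0 && j > 0) {
            if (str1.charAt(i - 1) == str2.charAt(j - 1)) {
                sb.append(str1.charAt(i - 1)); // this character belongs to the LCS
                i--;
                j--;
            } else if (c[i][j - 1] >= c[i - 1][j]) {
                j--; // on ties, drop the last character of str2
            } else {
                i--;
            }
        }
        return sb.reverse().toString();
    }

    public static void main(String[] args) {
        System.out.println(lcsString("BDCABA", "ABCBDAB")); // BCBA
    }
}
```

Breaking ties toward j - 1, as here, yields BCBA for the example; breaking them toward i - 1 instead yields BDAB, another common subsequence of length 4.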

4. References

https://www.zhihu.com/question/23995189

http://www.cnblogs.com/huangxincheng/archive/2012/11/11/2764625.html
