Algorithm Series (V): Two Methods for the Longest Common Subsequence (LCS) Problem (Non-contiguous Subsequences)

Source: Internet
Author: User

The longest common subsequence is abbreviated LCS (Longest Common Subsequence). It is defined as follows: a sequence S is a longest common subsequence of two or more known sequences if it is a subsequence of each of them and is the longest of all sequences that satisfy this condition. (The name "longest common substring" is sometimes used loosely for it, but that name usually refers to the contiguous variant described below.)

There are usually two ways to define a subsequence. In the first, the subsequence need not be contiguous: a subsequence is any sequence obtained by deleting zero or more elements from the original sequence. In the second, the subsequence must be contiguous: a subsequence is a run of elements that appear consecutively in the original sequence. Finding the longest common subsequence is a very practical problem. It can measure how similar two passages of text are, which makes it useful for detecting plagiarism. This article explains how to solve the LCS problem by computer when the subsequence is not required to be contiguous; the contiguous case will be covered in a following article.
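To make the two definitions concrete, here is a minimal Python sketch that checks each one. The function names are hypothetical, chosen only for illustration; strings stand in for general sequences.

```python
# Hypothetical helper names, for illustration only.

def is_subsequence(s: str, t: str) -> bool:
    """Non-contiguous definition: s can be obtained from t by deleting
    zero or more characters, keeping the remaining order."""
    it = iter(t)
    # `ch in it` advances the shared iterator until ch is found,
    # so each match must come after the previous one.
    return all(ch in it for ch in s)


def is_contiguous_subsequence(s: str, t: str) -> bool:
    """Contiguous definition: s occurs as a run of consecutive characters in t."""
    return s in t


if __name__ == "__main__":
    k = "ABCBDAB"
    print(is_subsequence("BCDB", k))             # True
    print(is_contiguous_subsequence("BCDB", k))  # False
    print(is_contiguous_subsequence("BCB", k))   # True
```

"BCDB" qualifies only under the first definition, because its characters are scattered through "ABCBDAB".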

1 Dynamic programming method

The LCS problem is an optimization problem of the multistage-decision kind, so dynamic programming should be the first choice when writing a program for it; only if dynamic programming cannot be used and no other method can be found should exhaustive search be considered. Dynamic programming applies to this problem as long as we can identify the optimal substructure of the LCS and every optimal decision in that substructure satisfies the "no aftereffect" property (the decision at a stage depends only on the current state, not on how that state was reached). The key to dynamic programming is decomposing the problem: split it into subproblems according to some rule (a subproblem may itself be split again, which makes this a recursive process), define the optimal decision sequence of each subproblem, and find the recurrence that relates a subproblem's optimal decision sequence to those of its smaller subproblems (including, of course, the boundary values of the recurrence).
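For contrast with dynamic programming, the exhaustive method mentioned above can be sketched in Python as follows. This is an illustration under my own assumptions (hypothetical function names, strings as sequences), not the article's method: it tries every subsequence of the shorter string, longest first, so its running time grows roughly as 2^min(m, n), which is why it is only a fallback.

```python
from itertools import combinations

# Hypothetical helper names, for illustration only.

def _is_subsequence(s: str, t: str) -> bool:
    it = iter(t)
    return all(ch in it for ch in s)


def lcs_exhaustive(a: str, b: str) -> str:
    """Brute-force LCS: enumerate subsequences of the shorter string,
    longest first, and return the first one that is also in the other."""
    if len(a) > len(b):
        a, b = b, a  # enumerate subsequences of the shorter string
    for length in range(len(a), 0, -1):
        # combinations() yields strictly ascending index tuples,
        # i.e. exactly the subsequences of a with this length
        for idx in combinations(range(len(a)), length):
            candidate = "".join(a[i] for i in idx)
            if _is_subsequence(candidate, b):
                return candidate
    return ""


print(lcs_exhaustive("ABCDEA", "AEBCDA"))  # ABCDA
```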

If a subsequence of a given sequence is obtained by deleting some elements from that sequence, then the positions (subscripts) of the subsequence's elements in the original sequence are in strictly ascending order. For example, the sequence S = <B,C,D,B> is a (non-contiguous) subsequence of K = <A,B,C,B,D,A,B>, and the positions of the elements of S in K are i = [2,3,5,7], which is strictly ascending.
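The index property can be checked with a small Python sketch. The function name is hypothetical, and it assumes a greedy leftmost match; each search starts just after the previous match, which is what keeps the positions strictly ascending. Positions are reported 1-based to match the example above.

```python
from typing import List, Optional

# Hypothetical helper name, for illustration only.

def subsequence_positions(s: str, k: str) -> Optional[List[int]]:
    """Return the 1-based positions in k that spell s, using the leftmost
    (greedy) match, or None if s is not a subsequence of k."""
    positions = []
    start = 0
    for ch in s:
        pos = k.find(ch, start)  # search only to the right of the previous match
        if pos == -1:
            return None
        positions.append(pos + 1)  # convert to the 1-based indices used in the text
        start = pos + 1
    return positions


print(subsequence_positions("BCDB", "ABCBDAB"))  # [2, 3, 5, 7]
```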

1.1 Optimal substructure definition and boundary value

Now let us analyze the optimal substructure of the problem. First, define the problem. Suppose string str1 has length m and string str2 has length n. The subproblem is then: find the longest common subsequence of the prefix str1<1..i>, made of the 1st through i-th characters of str1 (i <= m), and the prefix str2<1..j>, made of the 1st through j-th characters of str2 (j <= n). Let the longest common subsequence of this subproblem be {z1, z2, ..., zk}, where z1 through zk are the characters matched so far, and let d[i,j] denote its length. Once the subproblem is defined, we still need the recurrence for its optimal solution d[i,j]. The analysis starts from the relationship between str1[i] and str2[j]. If str1[i] and str2[j] are equal, the longest common subsequence of the subproblem is that of the subproblem (i-1, j-1) extended by one character, zk = str1[i] = str2[j], so d[i,j] = d[i-1,j-1] + 1. If str1[i] and str2[j] differ, d[i,j] is the larger of d[i-1,j] and d[i,j-1].
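The recurrence can be transcribed almost literally into a memoized (top-down) Python function. This is a sketch with hypothetical names; it assumes the boundary value d = 0 when i = 0 or j = 0, which the next paragraph establishes, and it shifts the article's 1-based str1[i] to Python's 0-based str1[i - 1]. Because the recursion can nest up to m + n calls deep, it may hit Python's default recursion limit on long strings.

```python
from functools import lru_cache

# Hypothetical function name, for illustration only.

def lcs_length_memo(str1: str, str2: str) -> int:
    """Top-down evaluation of d[i, j] directly from the recurrence."""

    @lru_cache(maxsize=None)
    def d(i: int, j: int) -> int:
        if i == 0 or j == 0:            # an empty prefix has no common subsequence
            return 0
        if str1[i - 1] == str2[j - 1]:  # str1[i] == str2[j] in 1-based notation
            return d(i - 1, j - 1) + 1
        return max(d(i - 1, j), d(i, j - 1))

    return d(len(str1), len(str2))


print(lcs_length_memo("ABCDEA", "AEBCDA"))  # 5
```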

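Finally, determine the boundary values of d[i,j]: when str1 or str2 is empty, the length of their longest common subsequence is 0; that is, d[i,j] = 0 when i = 0 or j = 0. The full recurrence for d[i,j] is as follows:

$$
d[i,j] =
\begin{cases}
0, & i = 0 \text{ or } j = 0 \\
d[i-1,j-1] + 1, & i, j > 0 \text{ and } str1[i] = str2[j] \\
\max\left(d[i-1,j],\ d[i,j-1]\right), & i, j > 0 \text{ and } str1[i] \neq str2[j]
\end{cases}
$$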
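Filling in the table row by row (bottom-up) avoids recursion entirely and runs in O(mn) time and space. The following Python sketch uses a hypothetical function name; row 0 and column 0 of the list hold the boundary value 0, and the string indices are shifted by one because Python strings are 0-based.

```python
from typing import List

# Hypothetical function name, for illustration only.

def lcs_table(str1: str, str2: str) -> List[List[int]]:
    """Build the (m+1) x (n+1) table d, where d[i][j] is the LCS length of
    str1[:i] and str2[:j]. Row 0 and column 0 stay at the boundary value 0."""
    m, n = len(str1), len(str2)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if str1[i - 1] == str2[j - 1]:
                d[i][j] = d[i - 1][j - 1] + 1
            else:
                d[i][j] = max(d[i - 1][j], d[i][j - 1])
    return d


d = lcs_table("ABCDEA", "AEBCDA")
print(d[-1][-1])  # 5
```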

1.2 Reconstructing the longest common subsequence in reverse

Using the recurrence for the optimal substructure obtained in 1.1, compute d[i,j] in order for i from 1 to m and j from 1 to n; the final value d[m,n] is the length of the longest common subsequence. However, d[m,n] is only the length of the LCS, which indicates how similar the two strings are. To obtain the LCS itself, after computing the matrix up to d[m,n] we must analyze the decision made at each step and construct the LCS from those optimal decisions. Therefore, while computing d[i,j], we must also record each optimal decision. These decisions are stored in a matrix R, where R[i,j] records the "recursive source" of the LCS length d[i,j]. Following the recurrence above: if R[i,j] is 1, d[i,j] was obtained from d[i-1,j-1] + 1; if R[i,j] is 2, d[i,j] was obtained from d[i-1,j]; and if R[i,j] is 3, d[i,j] was obtained from d[i,j-1]. For the strings "ABCDEA" and "AEBCDA", the matrices d and R produced by the recurrence are combined into one matrix for display:

Figure (1): Schematic diagram of reconstructing the longest common subsequence in reverse
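The procedure of recording R and then walking back from R[m,n] can be sketched in Python as follows. The function names and constants are hypothetical, and one detail is my own assumption because the text does not specify it: when d[i-1,j] and d[i,j-1] are equal, the sketch records source 2 (up). Different tie-breaking can yield a different LCS of the same length when several exist; for "ABCDEA" and "AEBCDA" the LCS "ABCDA" is unique.

```python
from typing import List, Tuple

# Hypothetical names, for illustration only. R values follow the text:
DIAG, UP, LEFT = 1, 2, 3  # 1: d[i-1,j-1]+1, 2: d[i-1,j], 3: d[i,j-1]


def lcs_with_source(str1: str, str2: str) -> Tuple[List[List[int]], List[List[int]]]:
    """Fill d (LCS lengths) and r (recursive source of each d[i][j])."""
    m, n = len(str1), len(str2)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    r = [[0] * (n + 1) for _ in range(m + 1)]  # 0 on the boundary: no source
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if str1[i - 1] == str2[j - 1]:
                d[i][j], r[i][j] = d[i - 1][j - 1] + 1, DIAG
            elif d[i - 1][j] >= d[i][j - 1]:  # assumed tie-break: prefer "up"
                d[i][j], r[i][j] = d[i - 1][j], UP
            else:
                d[i][j], r[i][j] = d[i][j - 1], LEFT
    return d, r


def reconstruct_lcs(str1: str, r: List[List[int]]) -> str:
    """Walk back from r[m][n], collecting characters at diagonal steps."""
    i, j = len(r) - 1, len(r[0]) - 1
    out = []
    while i > 0 and j > 0:
        if r[i][j] == DIAG:
            out.append(str1[i - 1])
            i, j = i - 1, j - 1
        elif r[i][j] == UP:
            i -= 1
        else:
            j -= 1
    return "".join(reversed(out))  # characters were collected back to front


d, r = lcs_with_source("ABCDEA", "AEBCDA")
print(d[-1][-1], reconstruct_lcs("ABCDEA", r))  # 5 ABCDA
```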
