Implementing the Longest Common Subsequence (LCS) in JavaScript, with example code

Source: Internet
Author: User

Introduction

The Longest Common Subsequence (LCS) problem asks for the longest run of characters that can be extracted from two given sequences X and Y while keeping the characters in their original order. LCS algorithms are widely used. In version control, they find the similarities and differences between the old and new versions of a file. In software testing, they compare recorded and replayed sequences. In genetic engineering, they compare a patient's DNA strand with a healthy one. In anti-plagiarism systems, they measure how much of a paper has been copied. They are also used to measure program code similarity, to retrieve human motion sequences, and to match video segments. The LCS algorithm therefore has great practical value.

Basic Concepts

Subsequence: a subsequence of a given sequence is what remains after removing zero or more elements from it, without changing the relative order of the remaining elements. For example, the subsequences of <A, B, C, B, D, A, B> include <A, B>, <B, C, A>, and <A, B, C, D, A>.

Common subsequence: given sequences X and Y, if Z is a subsequence of both X and Y, then Z is a common subsequence of X and Y. For example, with X = [A, B, C, B, D, A, B] and Y = [B, D, C, A, B, A], the sequence Z = [B, C, A] is a common subsequence of X and Y with length 3. Z is not a longest common subsequence, however: [B, C, B, A] and [B, D, A, B] are both longest common subsequences of X and Y, with length 4, and X and Y have no common subsequence of length 5 or more. The sequences [A, B, C] and [E, F, G] have only one common subsequence, the empty sequence [].
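The definitions above can be checked mechanically. The following is a minimal sketch, not code from the article: the helper names `isSubsequence` and `isCommonSubsequence` are made up for illustration, and the sequences are written as strings rather than arrays.

```javascript
// Hypothetical helper: true if `z` can be obtained from `s` by deleting
// zero or more elements without reordering the ones that remain.
function isSubsequence(z, s) {
  let k = 0; // index of the next element of z that still has to be matched
  for (let i = 0; i < s.length && k < z.length; i++) {
    if (s[i] === z[k]) k++;
  }
  return k === z.length;
}

// Hypothetical helper: z is a common subsequence if it is a subsequence of both.
function isCommonSubsequence(z, x, y) {
  return isSubsequence(z, x) && isSubsequence(z, y);
}

const X = "ABCBDAB";
const Y = "BDCABA";
console.log(isCommonSubsequence("BCA", X, Y));      // the length-3 example
console.log(isCommonSubsequence("BCBA", X, Y));     // a length-4 example
console.log(isCommonSubsequence("BDAB", X, Y));     // another length-4 example
console.log(isCommonSubsequence("", "ABC", "EFG")); // the empty sequence is always common
```

All four calls should report `true`. The scan is greedy: it matches each element of `z` at its earliest possible position in `s`, which is enough to decide membership.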

Longest common subsequence: given sequences X and Y, the longest one (or ones, if several tie) among all their common subsequences.

Substring: what remains after removing zero or more characters from the beginning of a string, from its end, or from both. The difference from a subsequence is that a subsequence may also drop characters from the middle. How many subsequences does the string cnblogs have? Clearly $2^7 = 128$, since each of its 7 distinct characters is either kept or dropped; cb and cgs are two of them.

The difference can be shown in a diagram:

[Diagram: substring versus subsequence]

As we can see, a subsequence is not necessarily contiguous; a contiguous one is a substring.
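To make the contrast between subsequences and substrings concrete, here is a small sketch that enumerates both for the string cnblogs. The function names are illustrative assumptions, and the bitmask enumeration is only practical for short strings.

```javascript
// Enumerate every subsequence: each bit of `mask` decides whether the
// character at that position is kept or deleted.
function allSubsequences(s) {
  const result = new Set();
  for (let mask = 0; mask < (1 << s.length); mask++) {
    let sub = "";
    for (let i = 0; i < s.length; i++) {
      if (mask & (1 << i)) sub += s[i];
    }
    result.add(sub);
  }
  return result;
}

// Enumerate every substring: a contiguous slice s[start..end),
// plus the empty string.
function allSubstrings(s) {
  const result = new Set([""]);
  for (let start = 0; start < s.length; start++) {
    for (let end = start + 1; end <= s.length; end++) {
      result.add(s.slice(start, end));
    }
  }
  return result;
}

const subsequences = allSubsequences("cnblogs");
const substrings = allSubstrings("cnblogs");
console.log(subsequences.size); // 128, i.e. 2^7 because all 7 characters are distinct
console.log(substrings.size);   // 29: 28 non-empty slices plus the empty string
console.log(subsequences.has("cb"), substrings.has("cb"));    // true false
console.log(subsequences.has("cgs"), substrings.has("blog")); // true true
```

Every substring is also a subsequence, but not the other way round: cb skips the n in the middle, so it is only a subsequence.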

Problem Analysis

As before, we start from a matrix and derive the state transition equation ourselves.

First, let's recast the problem in terms familiar to front-end developers. A sequence can simply be treated as an array or a string. To keep things simple, we will just compare two strings.

Pay attention to the definition of "subsequence": it may delete several elements, none, or all of them. So our first subsequence is the empty string (if our sequence is not a string, the empty sequence plays the same role)! This really deserves attention. Many people never understood the table in "Introduction to Algorithms" because of it, and many blog authors do not really understand it either. We always compare from left to right, and the first string is laid out vertically, forming the rows of the matrix.

X "" B D C A B A
""
A
B
C
D
A
B

Let X = "ABCDAB" and Y = "BDCABA". Start with the shortest subsequence of each, that is, compare the empty string with the empty string. Because the LCS equation produces a number (a length), the table contains only numbers. The common part of two empty strings has length 0.

X "" B D C A B A
"" 0
A
B
C
D
A
B

Next we keep X at the empty string and let Y contribute "B". Obviously the length of their common part is still 0. Replacing Y's contribution with other characters such as D or C, or with runs such as DC or DDC, changes nothing: the value is still 0. So the first row is all 0. Then we keep Y at the empty string and let X vary. By the same reasoning every value is 0, so the first column is all 0.

X "" B D C A B A
"" 0 0 0 0 0 0 0
A 0
B 0
C 0
D 0
A 0
B 0

The LCS problem differs slightly from the knapsack problem. In the knapsack problem you can still set up a row -1, whereas in the longest common subsequence problem the left column and the top row are fixed from the very beginning because of the empty subsequence.
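The fixed border can be set up before any comparison is made. The sketch below is an illustration only: `createTable` is a made-up name, and `null` is used here as an assumed marker for cells that have not been filled in yet.

```javascript
// Hypothetical helper: build an (x.length + 1) by (y.length + 1) table whose
// first row and first column, the ones for the empty prefix, are already 0.
function createTable(x, y) {
  const dp = [];
  for (let i = 0; i <= x.length; i++) {
    dp[i] = new Array(y.length + 1).fill(null); // null = not filled in yet
    dp[i][0] = 0; // column for Y's empty prefix
  }
  dp[0].fill(0); // row for X's empty prefix
  return dp;
}

const table = createTable("ABCDAB", "BDCABA");
console.log(table.length, table[0].length); // 7 7
console.log(table[0].join(" "));            // 0 0 0 0 0 0 0
console.log(table.map(function (row) { return row[0]; }).join(" ")); // 0 0 0 0 0 0 0
```

The extra row and column are why the table is one larger than the strings in each direction.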

Now let's enlarge the problem slightly. This time each side contributes one character. Obviously, only when the two characters are the same can there be a common subsequence other than the empty string, and its length is then 1.

X is "A", and Y runs through the prefixes of "BDCA":

X "" B D C A B A
"" 0 0 0 0 0 0 0
A 0 0 0 0 1
B 0
C 0
D 0
A 0
B 0

Keep filling in the cells to the right. What should go there? Obviously, the LCS cannot be longer than X, which is a single character here. Every prefix of Y from the A column onward still contains A, so compared with X's "A" the result stays 1.

X "" B D C A B A
"" 0 0 0 0 0 0 0
A 0 0 0 0 1 1 1
B 0
C 0
D 0
A 0
B 0

If X contributes its first two characters, A and B, its subsequences are "", "A", "B" and "AB". The first two have already been covered, so look at B first: $X_1 = Y_0$ (counting from 0), so we have found a new common subsequence and should add 1. Why? The matrix is a state table that describes state transitions from left to right and from top to bottom, and each state is accumulated from states that already exist. What we need to establish now is how the value of the cell being filled relates to the values of the filled cells around it. For now there is too little information; this cell is an isolated point, so we simply enter 1.

X "" B D C A B A
"" 0 0 0 0 0 0 0
A 0 0 0 0 1 1 1
B 0 1
C 0
D 0
A 0
B 0

Then Y brings in D as well: {"", A, B, AB} versus {"", B, D, BD}. Obviously the value is still 1, and it stays 1 all the way up to Y's second B. At that point Y is BDCAB, and the two strings share a longer common subsequence, AB, so the value becomes 2.

X "" B D C A B A
"" 0 0 0 0 0 0 0
A 0 0 0 0 1 1 1
B 0 1 1 1 1 2
C 0
D 0
A 0
B 0

At this point we can start to summarize some rules. We will then verify them by calculation and add new rules or constraints to refine them.

Y contributes all of its characters while X still has 2 characters. Looking carefully, we enter 2.

Now look at the fifth row of the table, where X contributes one more character, C. The subsequence set of ABC is larger than that of AB, so when it is compared with the subsequence set of Y's B, the result may not grow, but it cannot be smaller than before. Obviously the newly added C contributes nothing here, because it is not a character the two have in common, so the value equals the one for AB.

X "" B D C A B A
"" 0 0 0 0 0 0 0
A 0 0 0 0 1 1 1
B 0 1 1 1 1 2 2
C 0 1
D 0
A 0
B 0

We can also conclude that when the two characters being compared are different, the cell depends on the cells to its left and above it, and takes whichever value is larger.

What if the characters being compared are the same? Bear with me: X's C now has to be compared with Y's C, that is, the subsequence set of ABC, {"", A, B, C, AB, AC, BC, ABC}, against the subsequence set of BDC, {"", B, D, C, BD, BC, DC, BDC}. The common subsequences are "", B, C and BC, so the cell is 2. Here the left, top and top-left neighbours are all 1, and the new value is the top-left value plus 1: when the characters are equal, the cell equals its top-left neighbour plus 1. Showing why this always holds requires a more rigorous mathematical argument.
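The two rules observed so far (top-left plus 1 when the characters match, otherwise the larger of top and left) can be written as a one-cell helper. This is a sketch under assumptions: `fillCell` is a hypothetical name, and the partial table is copied by hand from the walkthrough for X = "ABCDAB" and Y = "BDCABA".

```javascript
// Hypothetical helper: compute one cell from its already-filled neighbours.
// The table has an extra leading row and column for the empty prefix,
// so cell (i, j) compares x[i - 1] with y[j - 1].
function fillCell(dp, x, y, i, j) {
  if (x[i - 1] === y[j - 1]) {
    return dp[i - 1][j - 1] + 1;               // same character: top-left + 1
  }
  return Math.max(dp[i - 1][j], dp[i][j - 1]); // different: larger of top and left
}

// Rows "", A and B of the walkthrough table, plus the start of row C.
const dp = [
  [0, 0, 0, 0, 0, 0, 0],
  [0, 0, 0, 0, 1, 1, 1],
  [0, 1, 1, 1, 1, 2, 2],
  [0, 1],
];
dp[3][2] = fillCell(dp, "ABCDAB", "BDCABA", 3, 2); // C vs D: max(top 1, left 1)
dp[3][3] = fillCell(dp, "ABCDAB", "BDCABA", 3, 3); // C vs C: top-left 1, plus 1
console.log(dp[3].join(" ")); // 0 1 1 2
```

Each cell only reads its top, left and top-left neighbours, which is why the table can be filled row by row from the top-left corner.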

Suppose there are two arrays, A and B. $A[i]$ is the i-th element of A, and $A(i)$ is the prefix of A made up of its first through i-th elements. $m(i, j)$ is the length of the longest common subsequence of $A(i)$ and $B(j)$.

Because the algorithm is recursive in nature, it is enough to prove that, for any i and j:

$m(i, j) = m(i-1, j-1) + 1$ (when $A[i] = B[j]$)

$m(i, j) = \max(m(i-1, j),\ m(i, j-1))$ (when $A[i] \ne B[j]$)

The first equation is easy to prove by contradiction. When $A[i] = B[j]$, suppose $m(i, j) > m(i-1, j-1) + 1$ ($m(i, j)$ obviously cannot be smaller than $m(i-1, j-1) + 1$). Removing the last element of the corresponding LCS then leaves a common subsequence of $A(i-1)$ and $B(j-1)$ that is longer than $m(i-1, j-1)$, so $m(i-1, j-1)$ would not be the longest, which is a contradiction.

The second is a little trickier. When $A[i] \ne B[j]$, again argue by contradiction and suppose $m(i, j) > \max(m(i-1, j),\ m(i, j-1))$.

From the assumption we get $m(i, j) > m(i-1, j)$, which implies that $A[i]$ must be part of the LCS corresponding to $m(i, j)$. Because $A[i] \ne B[j]$, $B[j]$ cannot be part of that LCS. It follows that $m(i, j) = m(i, j-1)$, which contradicts the assumption.

We can now use these equations to fill in the rest of the table.
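For reference, here is a small sketch that applies the two equations to every cell and prints the finished table for the walkthrough strings X = "ABCDAB" and Y = "BDCABA". The names `buildTable` and `formatTable` are assumptions made for this illustration.

```javascript
// Fill the whole table with the two equations; row 0 and column 0 stay 0.
function buildTable(x, y) {
  const dp = [new Array(y.length + 1).fill(0)];
  for (let i = 1; i <= x.length; i++) {
    dp[i] = [0];
    for (let j = 1; j <= y.length; j++) {
      dp[i][j] = x[i - 1] === y[j - 1]
        ? dp[i - 1][j - 1] + 1
        : Math.max(dp[i - 1][j], dp[i][j - 1]);
    }
  }
  return dp;
}

// Render the table in the same layout as the hand-filled ones:
// a header row of Y's characters, then one labelled row per prefix of X.
function formatTable(dp, x, y) {
  const header = ["X", '""'].concat(y.split("")).join(" ");
  const rows = dp.map(function (row, i) {
    return [i === 0 ? '""' : x[i - 1]].concat(row).join(" ");
  });
  return [header].concat(rows).join("\n");
}

const x = "ABCDAB";
const y = "BDCABA";
const filled = buildTable(x, y);
console.log(formatTable(filled, x, y));
console.log(filled[x.length][y.length]); // 4, the bottom-right cell
```

The bottom-right cell holds the LCS length for the full strings; BDAB is one common subsequence of that length.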

Program Implementation

    // By SITU zhengmei
    function LCS(str1, str2) {
      var rows = str1.split("")
      rows.unshift("") // leading empty string for the empty prefix
      var cols = str2.split("")
      cols.unshift("")
      var m = rows.length
      var n = cols.length
      var dp = []
      for (var i = 0; i < m; i++) {
        dp[i] = []
        for (var j = 0; j < n; j++) {
          if (i === 0 || j === 0) {
            dp[i][j] = 0
            continue
          }
          if (rows[i] === cols[j]) {
            dp[i][j] = dp[i - 1][j - 1] + 1 // diagonal + 1
          } else {
            dp[i][j] = Math.max(dp[i - 1][j], dp[i][j - 1]) // max of top and left
          }
        }
        console.log(dp[i].join("")) // debug
      }
      return dp[m - 1][n - 1] // bottom-right cell
    }

The LCS function can be simplified further: shifting the indices by one avoids creating the extra arrays.

    // By SITU zhengmei
    function LCS(str1, str2) {
      var m = str1.length
      var n = str2.length
      var dp = [new Array(n + 1).fill(0)] // the first row is all 0
      for (var i = 1; i <= m; i++) { // m + 1 rows in total
        dp[i] = [0] // the first column is all 0
        for (var j = 1; j <= n; j++) { // n + 1 columns in total
          if (str1[i - 1] === str2[j - 1]) {
            // the first character of str1 sits in the second row of the table,
            // so we subtract 1; likewise for str2
            dp[i][j] = dp[i - 1][j - 1] + 1 // diagonal + 1
          } else {
            dp[i][j] = Math.max(dp[i - 1][j], dp[i][j - 1])
          }
        }
      }
      return dp[m][n]
    }

Print an LCS

Let's first see how to print a single LCS. We start searching from the bottom-right corner and stop when we reach the top row or the leftmost column, so the target string is built in reverse order. To avoid the hassle of something like a StringBuffer, we implement it recursively: each call returns exactly one string (an empty string when there is nothing to return), and printLCS(x, y, ...) + str[i] concatenates the pieces into the string we want.

We also write a method that checks whether the string we get really is a common subsequence of both inputs. As working professionals, we cannot push code online without unit tests the way we did as students.

    // By SITU zhengmei, print one LCS
    function printLCS(dp, str1, str2, i, j) {
      if (i === 0 || j === 0) {
        return ""
      }
      if (str1[i - 1] === str2[j - 1]) {
        return printLCS(dp, str1, str2, i - 1, j - 1) + str1[i - 1]
      } else {
        if (dp[i][j - 1] > dp[i - 1][j]) {
          return printLCS(dp, str1, str2, i, j - 1)
        } else {
          return printLCS(dp, str1, str2, i - 1, j)
        }
      }
    }

    // By SITU zhengmei, convert the target string into a regular expression
    // and check that it matches both inputs, i.e. that it is a common subsequence.
    // Characters are not escaped, so regex metacharacters in el need extra care.
    function validateLCS(el, str1, str2) {
      var re = new RegExp(el.split("").join(".*"))
      console.log(el, re.test(str1), re.test(str2))
      return re.test(str1) && re.test(str2)
    }
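The same walk from the bottom-right corner can also be done with a loop instead of recursion. The sketch below is an alternative, not the author's method: `lcsTable` and `traceOneLCS` are hypothetical names, and the characters are collected in reverse and flipped at the end.

```javascript
// Build the (m + 1) x (n + 1) length table as described in the article.
function lcsTable(a, b) {
  const dp = [new Array(b.length + 1).fill(0)];
  for (let i = 1; i <= a.length; i++) {
    dp[i] = [0];
    for (let j = 1; j <= b.length; j++) {
      dp[i][j] = a[i - 1] === b[j - 1]
        ? dp[i - 1][j - 1] + 1
        : Math.max(dp[i - 1][j], dp[i][j - 1]);
    }
  }
  return dp;
}

// Walk back from the bottom-right corner, collecting matched characters.
// Ties between left and top go upward, the same choice printLCS makes.
function traceOneLCS(dp, a, b) {
  const chars = [];
  let i = a.length;
  let j = b.length;
  while (i > 0 && j > 0) {
    if (a[i - 1] === b[j - 1]) {
      chars.push(a[i - 1]); // part of the LCS; move diagonally
      i--;
      j--;
    } else if (dp[i][j - 1] > dp[i - 1][j]) {
      j--; // the longer result lies to the left
    } else {
      i--; // otherwise move up
    }
  }
  return chars.reverse().join(""); // collected back to front
}

const s1 = "ABCBDAB";
const s2 = "BDCABA";
console.log(traceOneLCS(lcsTable(s1, s2), s1, s2)); // one LCS of length 4
```

Because it uses no recursion, this variant does not grow the call stack with the length of the strings.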

Usage:

    function LCS(str1, str2) {
      var m = str1.length
      var n = str2.length
      // ... (omitted: build dp as before), then add:
      var s = printLCS(dp, str1, str2, m, n)
      validateLCS(s, str1, str2)
      return dp[m][n]
    }
    var c1 = LCS("ABCBDAB", "BDCABA")
    console.log(c1) // 4  BCBA, BCAB, BDAB
    var c2 = LCS("13456778", "357486782")
    console.log(c2) // 5  34678
    var c3 = LCS("ACCGGTCGAGTGCGCGCGGAAGCCGGCCGAA", "GTCGTTCGGAATGCCGTTGCTCTGTAAA")
    console.log(c3) // 20  GTCGTCGGAAGCCGGCCGAA
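In the spirit of the unit tests mentioned earlier, one way to gain confidence in the table-based length is to compare it with a brute-force search on short inputs. This is only a sketch: the names `lcsLength`, `isSubsequence` and `bruteForceLCSLength` are assumptions, and the brute force is exponential, so it is suitable for test strings only.

```javascript
// Table-based LCS length, following the recurrence described in the article.
function lcsLength(a, b) {
  const dp = [new Array(b.length + 1).fill(0)];
  for (let i = 1; i <= a.length; i++) {
    dp[i] = [0];
    for (let j = 1; j <= b.length; j++) {
      dp[i][j] = a[i - 1] === b[j - 1]
        ? dp[i - 1][j - 1] + 1
        : Math.max(dp[i - 1][j], dp[i][j - 1]);
    }
  }
  return dp[a.length][b.length];
}

// True if z can be obtained from s by deleting characters.
function isSubsequence(z, s) {
  let k = 0;
  for (let i = 0; i < s.length && k < z.length; i++) {
    if (s[i] === z[k]) k++;
  }
  return k === z.length;
}

// Try every subsequence of a (one per bitmask) and keep the longest
// one that is also a subsequence of b.
function bruteForceLCSLength(a, b) {
  let best = 0;
  for (let mask = 0; mask < (1 << a.length); mask++) {
    let sub = "";
    for (let i = 0; i < a.length; i++) {
      if (mask & (1 << i)) sub += a[i];
    }
    if (sub.length > best && isSubsequence(sub, b)) best = sub.length;
  }
  return best;
}

const cases = [["ABCBDAB", "BDCABA"], ["13456778", "357486782"], ["ABC", "EFG"], ["", ""]];
cases.forEach(function (pair) {
  const fast = lcsLength(pair[0], pair[1]);
  const slow = bruteForceLCSLength(pair[0], pair[1]);
  console.log(JSON.stringify(pair), fast, slow, fast === slow ? "ok" : "MISMATCH");
});
```

Unlike the regular-expression check, which only confirms that a string is a common subsequence, this comparison also confirms that no longer one exists.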


Print all LCS

The idea is similar to the above. Note that the LCS method contains a Math.max, which actually merges three cases (left larger, top larger, or both equal), so the search can branch into several strings. Our method returns an ES6 Set object, which removes duplicates automatically. At each step, a new Set is used to merge the strings from the Sets returned by the recursive calls.

    // By SITU zhengmei, print all LCS
    function printAllLCS(dp, str1, str2, i, j) {
      if (i === 0 || j === 0) {
        return new Set([""])
      } else if (str1[i - 1] === str2[j - 1]) {
        var newSet = new Set()
        printAllLCS(dp, str1, str2, i - 1, j - 1).forEach(function (el) {
          newSet.add(el + str1[i - 1])
        })
        return newSet
      } else {
        var set = new Set()
        if (dp[i][j - 1] >= dp[i - 1][j]) {
          printAllLCS(dp, str1, str2, i, j - 1).forEach(function (el) {
            set.add(el)
          })
        }
        if (dp[i - 1][j] >= dp[i][j - 1]) { // must be >= here; a plain else would miss ties
          printAllLCS(dp, str1, str2, i - 1, j).forEach(function (el) {
            set.add(el)
          })
        }
        return set
      }
    }

Usage:

    function LCS(str1, str2) {
      var m = str1.length
      var n = str2.length
      // ... (omitted: build dp as before), then add:
      var s = printAllLCS(dp, str1, str2, m, n)
      console.log(s)
      s.forEach(function (el) {
        validateLCS(el, str1, str2)
        console.log("output LCS", el)
      })
      return dp[m][n]
    }
    var c1 = LCS("ABCBDAB", "BDCABA")
    console.log(c1) // 4  BCBA, BCAB, BDAB
    var c2 = LCS("13456778", "357486782")
    console.log(c2) // 5  34678
    var c3 = LCS("ACCGGTCGAGTGCGCGCGGAAGCCGGCCGAA", "GTCGTTCGGAATGCCGTTGCTCTGTAAA")
    console.log(c3) // 20  GTCGTCGGAAGCCGGCCGAA
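Since the usage above leaves out the table-building step, here is a self-contained sketch that puts the pieces together for the first example. The names `lcsTable` and `collectAllLCS` are assumptions; the backtracking follows the same Set-based idea as printAllLCS.

```javascript
// Build the (m + 1) x (n + 1) length table.
function lcsTable(a, b) {
  const dp = [new Array(b.length + 1).fill(0)];
  for (let i = 1; i <= a.length; i++) {
    dp[i] = [0];
    for (let j = 1; j <= b.length; j++) {
      dp[i][j] = a[i - 1] === b[j - 1]
        ? dp[i - 1][j - 1] + 1
        : Math.max(dp[i - 1][j], dp[i][j - 1]);
    }
  }
  return dp;
}

// Collect every distinct LCS of a[0..i) and b[0..j) into a Set.
function collectAllLCS(dp, a, b, i, j) {
  if (i === 0 || j === 0) return new Set([""]);
  const out = new Set();
  if (a[i - 1] === b[j - 1]) {
    // Matching characters: extend every LCS of the two shorter prefixes.
    collectAllLCS(dp, a, b, i - 1, j - 1).forEach(function (s) {
      out.add(s + a[i - 1]);
    });
    return out;
  }
  // On a tie both directions are explored; the Set drops duplicates.
  if (dp[i][j - 1] >= dp[i - 1][j]) {
    collectAllLCS(dp, a, b, i, j - 1).forEach(function (s) { out.add(s); });
  }
  if (dp[i - 1][j] >= dp[i][j - 1]) {
    collectAllLCS(dp, a, b, i - 1, j).forEach(function (s) { out.add(s); });
  }
  return out;
}

const a = "ABCBDAB";
const b = "BDCABA";
const all = Array.from(collectAllLCS(lcsTable(a, b), a, b, a.length, b.length)).sort();
console.log(all); // [ 'BCAB', 'BCBA', 'BDAB' ]
```

The number of distinct LCS strings can grow very quickly with the input length, so this kind of enumeration is best kept to short strings.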


Space Optimization

Use a rolling array:

    function LCS(str1, str2) {
      var m = str1.length
      var n = str2.length
      var dp = [new Array(n + 1).fill(0)], now = 1, row // the first row is all 0
      for (var i = 1; i <= m; i++) { // only 2 rows are kept
        row = dp[now] = [0] // the first column is all 0
        for (var j = 1; j <= n; j++) { // n + 1 columns in total
          if (str1[i - 1] === str2[j - 1]) {
            // the first character of str1 sits in the second row of the table,
            // so we subtract 1; likewise for str2
            dp[now][j] = dp[1 - now][j - 1] + 1 // diagonal + 1, read from the previous row
          } else {
            dp[now][j] = Math.max(dp[1 - now][j], dp[now][j - 1])
          }
        }
        now = 1 - now // 1-1 => 0; 1-0 => 1; 1-1 => 0 ...
      }
      return row ? row[n] : 0
    }
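The two rows can be squeezed into one if the top-left value is remembered in a variable before it is overwritten. This is a different technique from the two-row rolling array above, sketched here under the assumed name `lcsLengthOneRow`.

```javascript
// Single-row variant: row[j] holds the previous row's value until it is
// overwritten, and `diagonal` keeps the value that was at row[j - 1]
// before the current pass changed it.
function lcsLengthOneRow(a, b) {
  const row = new Array(b.length + 1).fill(0);
  for (let i = 1; i <= a.length; i++) {
    let diagonal = 0; // dp[i - 1][0], which is always 0
    for (let j = 1; j <= b.length; j++) {
      const above = row[j]; // dp[i - 1][j], about to be overwritten
      row[j] = a[i - 1] === b[j - 1]
        ? diagonal + 1
        : Math.max(above, row[j - 1]);
      diagonal = above; // becomes the top-left value for column j + 1
    }
  }
  return row[b.length];
}

console.log(lcsLengthOneRow("ABCBDAB", "BDCABA"));      // 4
console.log(lcsLengthOneRow("13456778", "357486782")); // 5
```

Both space-saving variants return only the length; printing an LCS still needs the full table, because the backtracking reads cells from every row.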

Dangerous recursive Solution

Each subsequence of str1 corresponds to a subsequence of the index sequence {1, 2, ..., m}, so str1 has $2^m$ different subsequences in total (likewise, str2 has $2^n$). The complexity therefore reaches an astonishing exponential time ($2^m \cdot 2^n$).

    // Warning: if the strings are too long, this will blow the stack
    function LCS(str1, str2, a, b) {
      if (a === void 0) {
        a = str1.length - 1
      }
      if (b === void 0) {
        b = str2.length - 1
      }
      if (a === -1 || b === -1) {
        return 0
      }
      if (str1[a] === str2[b]) {
        return LCS(str1, str2, a - 1, b - 1) + 1
      }
      if (str1[a] !== str2[b]) {
        var x = LCS(str1, str2, a, b - 1)
        var y = LCS(str1, str2, a - 1, b)
        return x >= y ? x : y
      }
    }
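One common way to tame this recursion is to cache each (a, b) result so that every pair of indices is solved only once. The sketch below is an assumption-laden illustration rather than the author's code: `lcsMemo` is a made-up name and the cache is a plain Map keyed by a number derived from the two indices.

```javascript
// Memoized version of the recursive solution: at most m * n distinct
// (i, j) pairs are ever computed.
function lcsMemo(a, b) {
  const memo = new Map();
  function solve(i, j) {
    if (i < 0 || j < 0) return 0;  // one side is the empty prefix
    const key = i * b.length + j;  // unique because 0 <= j < b.length
    if (memo.has(key)) return memo.get(key);
    const value = a[i] === b[j]
      ? solve(i - 1, j - 1) + 1
      : Math.max(solve(i, j - 1), solve(i - 1, j));
    memo.set(key, value);
    return value;
  }
  return solve(a.length - 1, b.length - 1);
}

console.log(lcsMemo("ABCBDAB", "BDCABA"));      // 4
console.log(lcsMemo("13456778", "357486782")); // 5
```

The recursion depth can still reach roughly the combined length of the two strings, so for very long inputs the iterative table remains the safer choice.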
