Dynamic programming algorithms: longest common subsequence and longest common substring (LCS)

Source: Internet
Author: User

1. The difference between the longest common subsequence and the longest common substring:


The longest common substring of two strings must be contiguous in both original strings. The longest common subsequence only has to keep the characters in their original order; they do not need to be contiguous.
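
To make the distinction concrete, here is a minimal Java sketch. The class and method names (SubstringVsSubsequence, isSubsequence, isSubstring) are made up for this illustration and do not come from the original article.

```java
final class SubstringVsSubsequence {

    // True if every character of candidate appears in text in the same order, gaps allowed.
    static boolean isSubsequence(String candidate, String text) {
        int matched = 0;
        for (int i = 0; i < text.length() && matched < candidate.length(); i++) {
            if (text.charAt(i) == candidate.charAt(matched)) {
                matched++;
            }
        }
        return matched == candidate.length();
    }

    // True if candidate appears in text as one contiguous block.
    static boolean isSubstring(String candidate, String text) {
        return text.contains(candidate);
    }

    public static void main(String[] args) {
        System.out.println(isSubstring("bcd", "abcde"));   // true: contiguous
        System.out.println(isSubsequence("ace", "abcde")); // true: in order, with gaps
        System.out.println(isSubstring("ace", "abcde"));   // false: not contiguous
    }
}
```

Every substring is also a subsequence, but not the other way around, which is why the two problems need different recurrences.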


2. The longest common substring




In fact, this is a sequential decision-making problem, which can be solved by dynamic programming. We use a two-dimensional matrix to record intermediate results. How is this matrix built? Take "bab" and "caba" as an example (of course we can see at a glance that the longest common substring is "ba" or "ab"). A cell is 1 when the row character equals the column character, and 0 otherwise:


      b  a  b
   c  0  0  0
   a  0  1  0
   b  1  0  1
   a  0  1  0


We can find the longest common substring by looking for the longest diagonal run of 1s in the matrix.


However, finding the longest diagonal run of 1s in a two-dimensional matrix is itself cumbersome and time-consuming. A simple trick avoids it: whenever a cell would be filled with 1, set it instead to the value of its upper-left neighbor plus 1.


      b  a  b
   c  0  0  0
   a  0  1  0
   b  1  0  2
   a  0  2  0


The largest element in this matrix is the length of the longest common substring.
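
The matrix rule above can be sketched in Java as follows. This is only an illustration under assumed names (CommonSubstringTable, longestCommonSubstring); it pads the table with an extra row and column of zeros so that the upper-left neighbor always exists, and it returns just the first longest substring it meets.

```java
final class CommonSubstringTable {

    // Returns one longest common substring of a (columns) and b (rows).
    static String longestCommonSubstring(String a, String b) {
        // table[i][j] = length of the common run ending at b[i-1] and a[j-1].
        // Row 0 and column 0 stay 0, standing in for "no upper-left neighbor".
        int[][] table = new int[b.length() + 1][a.length() + 1];
        int best = 0;    // largest element seen so far
        int bestEnd = 0; // exclusive end index of that run in a
        for (int i = 1; i <= b.length(); i++) {
            for (int j = 1; j <= a.length(); j++) {
                if (b.charAt(i - 1) == a.charAt(j - 1)) {
                    table[i][j] = table[i - 1][j - 1] + 1; // upper-left neighbor plus 1
                    if (table[i][j] > best) {
                        best = table[i][j];
                        bestEnd = j;
                    }
                }
            }
        }
        return a.substring(bestEnd - best, bestEnd);
    }

    public static void main(String[] args) {
        System.out.println(longestCommonSubstring("bab", "caba")); // prints "ab"
    }
}
```

For "bab" and "caba" the largest element is 2, reached first in the row for the second "b", so the sketch reports "ab".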


While this matrix is being built, each row depends only on the row above it, and a row is no longer needed once the next one has been computed. The program can therefore replace the matrix with a one-dimensional array.


2.1 The code is as follows:
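
A compact sketch of the one-dimensional idea, assuming hypothetical names (CommonSubstringRollingRow, longestCommonSubstringLength) and returning only the length. The inner loop runs from right to left so that the cell to the left still holds the previous row's value when it is read.

```java
final class CommonSubstringRollingRow {

    // Length of the longest common substring, using a single reusable row.
    static int longestCommonSubstringLength(String a, String b) {
        int[] row = new int[a.length() + 1]; // row[j] plays the role of table[i][j]; row[0] is always 0
        int best = 0;
        for (int i = 1; i <= b.length(); i++) {
            // Right to left: row[j - 1] has not been overwritten yet,
            // so it is still the upper-left neighbor from the previous row.
            for (int j = a.length(); j >= 1; j--) {
                if (b.charAt(i - 1) == a.charAt(j - 1)) {
                    row[j] = row[j - 1] + 1;
                    best = Math.max(best, row[j]);
                } else {
                    row[j] = 0;
                }
            }
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println(longestCommonSubstringLength("bab", "caba")); // prints 2
    }
}
```

Memory drops from one cell per pair of characters to one cell per character of the first string; the running time is unchanged.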



public class LCString2 {

    public static void getLCString(char[] str1, char[] str2) {
        int i, j;
        int len1 = str1.length;
        int len2 = str2.length;
        int maxLen = len1 > len2 ? len1 : len2;
        int[] max = new int[maxLen];      // lengths of the longest common substrings found so far
        int[] maxIndex = new int[maxLen]; // index in str1 where each of them ends
        int[] c = new int[maxLen];        // current row: number of equal values on the diagonal

        for (i = 0; i < len2; i++) {
            // Scan from right to left so that c[j - 1] still holds the previous row's value.
            for (j = len1 - 1; j >= 0; j--) {
                if (str2[i] == str1[j]) {
                    if ((i == 0) || (j == 0))
                        c[j] = 1;
                    else
                        c[j] = c[j - 1] + 1;
                } else {
                    c[j] = 0;
                }

                if (c[j] > max[0]) { // strictly longer: for now it is the only longest one, so clear the rest
                    max[0] = c[j];   // maximum diagonal value, later used as the length of the extracted substring
                    maxIndex[0] = j; // position of the maximum diagonal value
                    for (int k = 1; k < maxLen; k++) {
                        max[k] = 0;
                        maxIndex[k] = 0;
                    }
                } else if (c[j] == max[0]) { // there are several substrings of the same length
                    for (int k = 1; k < maxLen; k++) {
                        if (max[k] == 0) {
                            max[k] = c[j];
                            maxIndex[k] = j;
                            break; // one entry appended at the end, so leave the loop
                        }
                    }
                }
            }
        }

        for (j = 0; j < maxLen; j++) {
            if (max[j] > 0) {
                System.out.println("No. " + (j + 1) + " common substring:");
                for (i = maxIndex[j] - max[j] + 1; i <= maxIndex[j]; i++)
                    System.out.print(str1[i]);
                System.out.println("");
            }
        }
    }

    public static void main(String[] args) {
        // String str1 = new String("123456abcd567");
        // String str2 = new String("234dddabc45678");
        String str1 = new String("AAB12345678CDE");
        String str2 = new String("ab1234yb1234567");
        getLCString(str1.toCharArray(), str2.toCharArray());
    }
}

Ref:

The Java algorithm for LCS, taking into account that there may be several longest common substrings of the same length
http://blog.csdn.net/rabbitbug/article/details/1740557

Maximum subsequence, longest increasing subsequence, longest common substring, longest common subsequence, string edit distance
http://www.cnblogs.com/zhangchaoyang/articles/2012070.html


2.2 In fact, this is easy to write in awk:




echo "123456abcd567
234dddabc45678" | awk -v FS="" '
NR == 1 { str = $0 }
NR == 2 {
    N = NF
    for (n = 0; n++ < N;) {
        s = ""
        for (t = n; t <= N; t++) {
            s = s "" $t
            if (index(str, s)) { a[n] = t - n; b[n] = s; if (m <= a[n]) m = a[n] }
            else { t = N }
        }
    }
}
END { for (n = 0; n++ < N;) if (a[n] == m) print b[n] }'

Ref: http://bbs.chinaunix.net/thread-4055834-2-1.html
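
The awk one-liner is not dynamic programming. For every start position in the second string it keeps extending the candidate while that candidate still occurs in the first string, then prints the candidates of maximal length. A rough Java rendering of that same brute-force idea, with assumed names (CommonSubstringBruteForce, longestCommonSubstrings), might look like this:

```java
import java.util.ArrayList;
import java.util.List;

final class CommonSubstringBruteForce {

    // Returns all longest common substrings, ordered by start position in second.
    static List<String> longestCommonSubstrings(String first, String second) {
        String[] longestFrom = new String[second.length()]; // longest match starting at each index
        int best = 0;
        for (int start = 0; start < second.length(); start++) {
            for (int end = start + 1; end <= second.length(); end++) {
                String candidate = second.substring(start, end);
                if (!first.contains(candidate)) {
                    break; // a longer candidate from this start cannot match either
                }
                longestFrom[start] = candidate;
                best = Math.max(best, candidate.length());
            }
        }
        List<String> result = new ArrayList<>();
        for (String s : longestFrom) {
            if (s != null && s.length() == best) {
                result.add(s);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Same input as the awk example; prints [234, abc, 456, 567]
        System.out.println(longestCommonSubstrings("123456abcd567", "234dddabc45678"));
    }
}
```

This is shorter to write than the table-based version but does more work, because each call to contains rescans the first string.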


2.3 A Perl version (which I honestly could not follow):



#!/usr/bin/perl
use strict;
use warnings;

my $str1 = "123456abcd567";
my $str2 = "234dddabc45678";
my $str = $str1 . "\n" . $str2;
my (@substr, @result);

# Collect every piece of the first line that also occurs in the second line.
$str =~ /(.+)(?=.*\n.*\1)(*PRUNE)(?{ push @substr, $1 })(*F)/;

# Longest first, then keep all candidates as long as the longest one.
@substr = sort { length($b) <=> length($a) } @substr;
@result = grep { length($_) == length($substr[0]) } @substr;
print "@result\n";

Ref: http://bbs.chinaunix.net/thread-1333575-7-1.html



3. The longest common subsequence


import java.util.Random;

public class LCS {

    public static void main(String[] args) {
        // Randomly generated strings (alternative input):
        // String x = getRandomStrings(substringLength1);
        // String y = getRandomStrings(substringLength2);
        String x = "A1B2C3";
        String y = "1a1wbz2c123a1b2c123";

        // String lengths
        int substringLength1 = x.length();
        int substringLength2 = y.length(); // the sizes can be chosen freely when random strings are used

        // Two-dimensional array of subproblems:
        // opt[i][j] is the LCS length of the suffixes of x from i and of y from j
        int[][] opt = new int[substringLength1 + 1][substringLength2 + 1];

        // Solve all subproblems from back to front. Front to back works as well.
        for (int i = substringLength1 - 1; i >= 0; i--) {
            for (int j = substringLength2 - 1; j >= 0; j--) {
                if (x.charAt(i) == y.charAt(j))
                    opt[i][j] = opt[i + 1][j + 1] + 1; // state transition equation
                else
                    opt[i][j] = Math.max(opt[i + 1][j], opt[i][j + 1]); // state transition equation
            }
        }

        System.out.println("substring1:" + x);
        System.out.println("substring2:" + y);
        System.out.print("LCS:");

        // Walk the table from the top-left corner to print one LCS.
        int i = 0, j = 0;
        while (i < substringLength1 && j < substringLength2) {
            if (x.charAt(i) == y.charAt(j)) {
                System.out.print(x.charAt(i));
                i++;
                j++;
            } else if (opt[i + 1][j] >= opt[i][j + 1])
                i++;
            else
                j++;
        }
    }

    // Returns a random lowercase string of the given length
    public static String getRandomStrings(int length) {
        StringBuffer buffer = new StringBuffer("abcdefghijklmnopqrstuvwxyz");
        StringBuffer sb = new StringBuffer();
        Random r = new Random();
        int range = buffer.length();
        for (int i = 0; i < length; i++) {
            sb.append(buffer.charAt(r.nextInt(range)));
        }
        return sb.toString();
    }
}
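
The comment in the code above notes that the table can also be filled from front to back. A minimal sketch of that variant, with assumed names (LcsFrontToBack, lcs, len), works on prefixes instead of suffixes and recovers one LCS by walking back from the bottom-right corner:

```java
final class LcsFrontToBack {

    // Returns one longest common subsequence of x and y.
    static String lcs(String x, String y) {
        int m = x.length(), n = y.length();
        // len[i][j] = LCS length of the first i characters of x and the first j characters of y
        int[][] len = new int[m + 1][n + 1];
        for (int i = 1; i <= m; i++) {
            for (int j = 1; j <= n; j++) {
                if (x.charAt(i - 1) == y.charAt(j - 1)) {
                    len[i][j] = len[i - 1][j - 1] + 1;
                } else {
                    len[i][j] = Math.max(len[i - 1][j], len[i][j - 1]);
                }
            }
        }
        // Walk back from the bottom-right corner, collecting matches in reverse order.
        StringBuilder reversed = new StringBuilder();
        int i = m, j = n;
        while (i > 0 && j > 0) {
            if (x.charAt(i - 1) == y.charAt(j - 1)) {
                reversed.append(x.charAt(i - 1));
                i--;
                j--;
            } else if (len[i - 1][j] >= len[i][j - 1]) {
                i--;
            } else {
                j--;
            }
        }
        return reversed.reverse().toString();
    }

    public static void main(String[] args) {
        System.out.println(lcs("A1B2C3", "1a1wbz2c123a1b2c123")); // prints "123" (the comparison is case-sensitive)
    }
}
```

Both directions fill the same number of cells; they differ only in whether the answer ends up in the first or the last cell of the table.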

Ref:

The longest common subsequence and longest common substring problems for strings
http://gongqi.iteye.com/blog/1517447

Dynamic programming algorithm for solving the longest common subsequence (LCS) problem
http://blog.csdn.net/v_JULY_v/article/details/6110269

