1. The difference between the longest common subsequence and the longest common substring:
The longest common substring of two strings must be contiguous in both original strings. The longest common subsequence only has to keep the characters in their original order; they do not need to be contiguous.
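To make the distinction concrete, here is a minimal sketch. The class name SubstringVsSubsequence and the helper isSubsequence are made up for this illustration; String.contains is the standard library check for a contiguous substring.

```java
class SubstringVsSubsequence {
    // Returns true if every character of s appears in t in the same order,
    // not necessarily adjacent (that is, s is a subsequence of t).
    static boolean isSubsequence(String s, String t) {
        int i = 0;
        for (int j = 0; j < t.length() && i < s.length(); j++) {
            if (s.charAt(i) == t.charAt(j)) {
                i++;
            }
        }
        return i == s.length();
    }

    public static void main(String[] args) {
        String t = "abcde";
        System.out.println(t.contains("bcd"));       // true: contiguous in t
        System.out.println(t.contains("ace"));       // false: not contiguous
        System.out.println(isSubsequence("ace", t)); // true: order is preserved
        System.out.println(isSubsequence("aec", t)); // false: order is broken
    }
}
```

So "ace" is a subsequence of "abcde" but not a substring of it, while "bcd" is both.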
2. The longest common substring
This is in fact a sequential decision problem that can be solved with dynamic programming. We use a two-dimensional matrix to record intermediate results. How is the matrix built? Take "bab" and "caba" as an example (at a glance, the longest common substring is "ba" or "ab"). A cell is 1 where the row character equals the column character and 0 otherwise:
    b a b
c   0 0 0
a   0 1 0
b   1 0 1
a   0 1 0
The longest common substring corresponds to the longest diagonal run of 1s in the matrix.
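The 0/1 matrix described above can be sketched as follows. This is only an illustration under my own naming (MatchMatrix, build, longestDiagonalRun are hypothetical), and it scans every diagonal naively, which is exactly the cost the next step avoids.

```java
import java.util.Arrays;

class MatchMatrix {
    // Builds the 0/1 matrix: m[i][j] is 1 when b.charAt(i) == a.charAt(j).
    // Columns follow string a, rows follow string b.
    static int[][] build(String a, String b) {
        int[][] m = new int[b.length()][a.length()];
        for (int i = 0; i < b.length(); i++) {
            for (int j = 0; j < a.length(); j++) {
                m[i][j] = (b.charAt(i) == a.charAt(j)) ? 1 : 0;
            }
        }
        return m;
    }

    // Walks down and to the right from every cell and returns the longest run of 1s.
    static int longestDiagonalRun(int[][] m) {
        int best = 0;
        for (int i = 0; i < m.length; i++) {
            for (int j = 0; j < m[i].length; j++) {
                int len = 0;
                while (i + len < m.length && j + len < m[i].length
                        && m[i + len][j + len] == 1) {
                    len++;
                }
                best = Math.max(best, len);
            }
        }
        return best;
    }

    public static void main(String[] args) {
        int[][] m = build("bab", "caba");
        for (int[] row : m) {
            System.out.println(Arrays.toString(row));
        }
        System.out.println(longestDiagonalRun(m)); // 2
    }
}
```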
However, finding the longest diagonal run of 1s in a two-dimensional matrix is itself cumbersome and time-consuming. A better approach: whenever a cell would be filled with 1, set it to the value of its upper-left neighbor plus 1 instead.
    b a b
c   0 0 0
a   0 1 0
b   1 0 2
a   0 2 0
The largest element in this matrix is the length of the longest common substring.
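A minimal sketch of this "upper-left plus 1" rule, keeping the full two-dimensional table for clarity. The class and method names (LcsTable, longestCommonSubstringLength) are my own and not from the original post.

```java
class LcsTable {
    // Returns the length of the longest common substring of a and b.
    // c[i][j] is the length of the common run that ends at b[i] and a[j].
    static int longestCommonSubstringLength(String a, String b) {
        int[][] c = new int[b.length()][a.length()];
        int best = 0;
        for (int i = 0; i < b.length(); i++) {
            for (int j = 0; j < a.length(); j++) {
                if (b.charAt(i) == a.charAt(j)) {
                    // First row or column has no upper-left neighbor.
                    c[i][j] = (i == 0 || j == 0) ? 1 : c[i - 1][j - 1] + 1;
                    best = Math.max(best, c[i][j]);
                }
                // Otherwise c[i][j] keeps its default value 0.
            }
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println(longestCommonSubstringLength("bab", "caba")); // 2
    }
}
```

Tracking the maximum while filling the table avoids a second pass over the matrix.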
While building the matrix, a row is no longer needed once the next row has been computed from it, so the program can replace the matrix with a one-dimensional array.
2.1 The code is as follows:
public class LCString2 {
    public static void getLCString(char[] str1, char[] str2) {
        int i, j;
        int len1, len2;
        len1 = str1.length;
        len2 = str2.length;
        int maxLen = len1 > len2 ? len1 : len2;
        int[] max = new int[maxLen];
        int[] maxIndex = new int[maxLen];
        int[] c = new int[maxLen]; // c[j]: number of equal characters on the diagonal ending at str1[j]
        for (i = 0; i < len2; i++) {
            // Go from right to left so that c[j - 1] still holds the previous row's value.
            for (j = len1 - 1; j >= 0; j--) {
                if (str2[i] == str1[j]) {
                    if ((i == 0) || (j == 0))
                        c[j] = 1;
                    else
                        c[j] = c[j - 1] + 1;
                } else {
                    c[j] = 0;
                }
                if (c[j] > max[0]) { // strictly longer: for now this is the only longest one, so clear the later entries
                    max[0] = c[j]; // maximum diagonal value, later used as the length of the extracted substring
                    maxIndex[0] = j; // position of the maximum diagonal value in str1
                    for (int k = 1; k < maxLen; k++) {
                        max[k] = 0;
                        maxIndex[k] = 0;
                    }
                } else if (c[j] == max[0]) { // there are several substrings of the same length
                    for (int k = 1; k < maxLen; k++) {
                        if (max[k] == 0) {
                            max[k] = c[j];
                            maxIndex[k] = j;
                            break; // stop after filling the first free slot
                        }
                    }
                }
            }
        }
        for (j = 0; j < maxLen; j++) {
            if (max[j] > 0) {
                System.out.println("No. " + (j + 1) + " common substring:");
                for (i = maxIndex[j] - max[j] + 1; i <= maxIndex[j]; i++)
                    System.out.print(str1[i]);
                System.out.println("");
            }
        }
    }

    public static void main(String[] args) {
        String str1 = new String("123456abcd567");
        String str2 = new String("234dddabc45678");
        // Alternative input:
        // String str1 = new String("AAB12345678CDE");
        // String str2 = new String("ab1234yb1234567");
        getLCString(str1.toCharArray(), str2.toCharArray());
    }
}
Ref:
A Java algorithm for LCS that handles multiple longest common substrings of the same length
http://blog.csdn.net/rabbitbug/article/details/1740557
Maximum subsequence, longest increasing subsequence, longest common substring, longest common subsequence, string edit distance
http://www.cnblogs.com/zhangchaoyang/articles/2012070.html
2.2 In fact, this is easy to write in awk:
echo "123456abcd567
234dddabc45678" | awk -v FS="" 'NR==1{str=$0} NR==2{N=NF; for(n=0;n++<N;){s=""; for(t=n;t<=N;t++){s=s""$t; if(index(str,s)){a[n]=t-n; b[n]=s; if(m<=a[n]) m=a[n]} else {t=N}}}} END{for(n=0;n++<N;) if(a[n]==m) print b[n]}'
Ref: http://bbs.chinaunix.net/thread-4055834-2-1.html
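For readers who prefer Java, the idea behind the awk one-liner can be sketched roughly as follows: for every start position in the second string, keep extending the candidate while it still occurs in the first string, then keep the longest candidates. This is my own reading of the script, not a line-by-line port, and the names (ExtendAndSearch, longestCommonSubstrings) are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class ExtendAndSearch {
    // For each start position in b, grow the candidate while it still occurs in a,
    // then return every candidate that has the maximal length.
    static List<String> longestCommonSubstrings(String a, String b) {
        List<String> candidates = new ArrayList<>();
        int best = 0;
        for (int start = 0; start < b.length(); start++) {
            int end = start;
            while (end < b.length() && a.contains(b.substring(start, end + 1))) {
                end++;
            }
            if (end > start) {
                candidates.add(b.substring(start, end));
                best = Math.max(best, end - start);
            }
        }
        List<String> result = new ArrayList<>();
        for (String s : candidates) {
            if (s.length() == best) {
                result.add(s);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Prints [234, abc, 456, 567]
        System.out.println(longestCommonSubstrings("123456abcd567", "234dddabc45678"));
    }
}
```

This is simpler to read than the dynamic programming version but slower, because each extension step searches the whole first string again.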
2.3 A Perl version (I honestly could not follow it):
#!/usr/bin/perl
use strict;
use warnings;
my $str1 = "123456abcd567";
my $str2 = "234dddabc45678";
my $str = $str1 . "\n" . $str2;
my (@substr, @result);
$str =~ /(.+)(?=.*\n.*\1)(*PRUNE)(?{ push @substr, $1 })(*F)/;
@substr = sort { length($b) <=> length($a) } @substr;
@result = grep { length == length $substr[0] } @substr;
print "@result\n";
Ref: http://bbs.chinaunix.net/thread-1333575-7-1.html
3. The longest common subsequence
import java.util.Random;

public class LCS {
    public static void main(String[] args) {
        // Randomly generated strings (alternative input):
        // String x = getRandomStrings(substringLength1);
        // String y = getRandomStrings(substringLength2);
        String x = "A1B2C3";
        String y = "1a1wbz2c123a1b2c123";
        // String lengths
        int substringLength1 = x.length();
        int substringLength2 = y.length(); // the sizes can be chosen freely
        // Two-dimensional array: opt[i][j] is the LCS length of the suffixes x[i..] and y[j..]
        int[][] opt = new int[substringLength1 + 1][substringLength2 + 1];
        // Solve all subproblems by dynamic programming from back to front; front to back works as well.
        for (int i = substringLength1 - 1; i >= 0; i--) {
            for (int j = substringLength2 - 1; j >= 0; j--) {
                if (x.charAt(i) == y.charAt(j))
                    opt[i][j] = opt[i + 1][j + 1] + 1; // state transition equation
                else
                    opt[i][j] = Math.max(opt[i + 1][j], opt[i][j + 1]); // state transition equation
            }
        }
        System.out.println("substring1: " + x);
        System.out.println("substring2: " + y);
        System.out.print("LCS: ");
        // Walk the table from the top-left corner and print one LCS.
        int i = 0, j = 0;
        while (i < substringLength1 && j < substringLength2) {
            if (x.charAt(i) == y.charAt(j)) {
                System.out.print(x.charAt(i));
                i++;
                j++;
            } else if (opt[i + 1][j] >= opt[i][j + 1])
                i++;
            else
                j++;
        }
        System.out.println();
    }

    // Get a random lowercase string of the given length
    public static String getRandomStrings(int length) {
        StringBuffer buffer = new StringBuffer("abcdefghijklmnopqrstuvwxyz");
        StringBuffer sb = new StringBuffer();
        Random r = new Random();
        int range = buffer.length();
        for (int i = 0; i < length; i++) {
            sb.append(buffer.charAt(r.nextInt(range)));
        }
        return sb.toString();
    }
}
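The comment in the listing notes that the table can also be filled from front to back. A minimal sketch of that variant, assuming prefix-based indexing and returning the result as a string instead of printing it; the names LcsForward and lcs are hypothetical.

```java
class LcsForward {
    // Returns one longest common subsequence of x and y.
    static String lcs(String x, String y) {
        int n = x.length();
        int m = y.length();
        // opt[i][j] is the LCS length of the prefixes x[0..i) and y[0..j).
        int[][] opt = new int[n + 1][m + 1];
        for (int i = 1; i <= n; i++) {
            for (int j = 1; j <= m; j++) {
                if (x.charAt(i - 1) == y.charAt(j - 1)) {
                    opt[i][j] = opt[i - 1][j - 1] + 1;
                } else {
                    opt[i][j] = Math.max(opt[i - 1][j], opt[i][j - 1]);
                }
            }
        }
        // Walk back from the bottom-right corner, collecting matched characters.
        StringBuilder sb = new StringBuilder();
        int i = n;
        int j = m;
        while (i > 0 && j > 0) {
            if (x.charAt(i - 1) == y.charAt(j - 1)) {
                sb.append(x.charAt(i - 1));
                i--;
                j--;
            } else if (opt[i - 1][j] >= opt[i][j - 1]) {
                i--;
            } else {
                j--;
            }
        }
        // Characters were collected from the end, so reverse them.
        return sb.reverse().toString();
    }

    public static void main(String[] args) {
        System.out.println(lcs("A1B2C3", "1a1wbz2c123a1b2c123")); // 123
    }
}
```

With the inputs from the listing the comparison is case-sensitive, so only the digits match. When several subsequences share the maximum length, this sketch returns just one of them.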
REF:
The maximum common subsequence of a string and the maximum common substring problem
http://gongqi.iteye.com/blog/1517447
Dynamic programming algorithm for solving the longest common subsequence LCS problem
http://blog.csdn.net/v_JULY_v/article/details/6110269
Dynamic programming algorithm: Longest common sub-sequence & longest common substring (LCS)