From: http://my.oschina.net/leejun2005/blog/117167
1. The difference between the longest common subsequence and the longest common substring
The longest common substring of two strings must be contiguous in both original strings. The longest common subsequence only has to keep the characters in their original order; they do not need to be contiguous.
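To make the distinction concrete, here is a minimal Java sketch. The class and method names (SubstringVsSubsequence, isSubstring, isSubsequence) are illustrative assumptions and do not come from the original post.

```java
public class SubstringVsSubsequence {
    // True if s occurs in text as one contiguous block.
    public static boolean isSubstring(String s, String text) {
        return text.contains(s);
    }

    // True if the characters of s occur in text in the same order, gaps allowed.
    public static boolean isSubsequence(String s, String text) {
        int matched = 0;
        for (int i = 0; i < text.length() && matched < s.length(); i++) {
            if (text.charAt(i) == s.charAt(matched)) {
                matched++;
            }
        }
        return matched == s.length();
    }

    public static void main(String[] args) {
        System.out.println(isSubstring("abc", "a1b2c3"));   // false: not contiguous
        System.out.println(isSubsequence("abc", "a1b2c3")); // true: order is preserved
        System.out.println(isSubstring("1b2", "a1b2c3"));   // true: contiguous
    }
}
```

Every substring is also a subsequence, but not the other way round, which is why the two problems need different recurrences.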
2. The longest common substring
This is in fact a multi-stage decision problem, which can be solved with dynamic programming. We use a two-dimensional matrix to record intermediate results. How is this matrix constructed? Take "bab" and "caba" as an example (at a glance, the longest common substring is "ba" or "ab"). Put the characters of one string along the columns and those of the other along the rows, then write 1 in a cell when the two characters match and 0 otherwise:
    b  a  b
c   0  0  0
a   0  1  0
b   1  0  1
a   0  1  0
The longest common substring corresponds to the longest run of consecutive 1s along a diagonal of the matrix (running from upper left to lower right).
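The 0/1 matrix and the diagonal search can be sketched directly in Java. This is only an illustration under assumed names (MatchMatrix, build, longestDiagonalRun); it deliberately rescans the diagonal from every cell, so it is the naive approach rather than an efficient one.

```java
import java.util.Arrays;

public class MatchMatrix {
    // m[i][j] is 1 when rows.charAt(i) equals cols.charAt(j), otherwise 0.
    public static int[][] build(String rows, String cols) {
        int[][] m = new int[rows.length()][cols.length()];
        for (int i = 0; i < rows.length(); i++) {
            for (int j = 0; j < cols.length(); j++) {
                m[i][j] = rows.charAt(i) == cols.charAt(j) ? 1 : 0;
            }
        }
        return m;
    }

    // Length of the longest run of 1s along any upper-left to lower-right diagonal.
    public static int longestDiagonalRun(int[][] m) {
        int best = 0;
        for (int i = 0; i < m.length; i++) {
            for (int j = 0; j < m[i].length; j++) {
                int len = 0;
                while (i + len < m.length && j + len < m[i].length
                        && m[i + len][j + len] == 1) {
                    len++;
                }
                best = Math.max(best, len);
            }
        }
        return best;
    }

    public static void main(String[] args) {
        int[][] m = build("caba", "bab");
        for (int[] row : m) {
            System.out.println(Arrays.toString(row));
        }
        System.out.println(longestDiagonalRun(m)); // 2
    }
}
```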
However, searching a two-dimensional matrix for the longest diagonal run of 1s is itself cumbersome and time-consuming. A simpler approach is this: whenever a cell would be filled with 1, set it to the value of its upper-left neighbor plus 1 instead.
    b  a  b
c   0  0  0
a   0  1  0
b   1  0  2
a   0  2  0
The largest element in this matrix is the length of the longest common substring.
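The "upper-left neighbor plus 1" rule can be sketched as follows. The names (LongestCommonSubstring, longestCommonSubstring) are assumptions made for this example; the table has one extra row and column of zeros so that the upper-left neighbor always exists, and when several substrings tie for the longest only the first one found is returned.

```java
public class LongestCommonSubstring {
    // Returns one longest common substring of a and b ("" if they share no character).
    public static String longestCommonSubstring(String a, String b) {
        // len[i][j] = length of the common run ending at a[i-1] and b[j-1].
        int[][] len = new int[a.length() + 1][b.length() + 1];
        int best = 0;   // largest value seen in the table
        int endInA = 0; // exclusive end index in a of the best run
        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                if (a.charAt(i - 1) == b.charAt(j - 1)) {
                    len[i][j] = len[i - 1][j - 1] + 1; // upper-left neighbor plus 1
                    if (len[i][j] > best) {
                        best = len[i][j];
                        endInA = i;
                    }
                }
            }
        }
        return a.substring(endInA - best, endInA);
    }

    public static void main(String[] args) {
        System.out.println(longestCommonSubstring("caba", "bab")); // ab
    }
}
```

Tracking the position of the maximum while filling the table avoids a second pass over the matrix.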
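While the matrix is being built, each row depends only on the row directly above it, so a row is no longer needed once the next one has been computed. The program can therefore replace the matrix with a one-dimensional array, provided each row is filled from right to left so that the entry to the left still holds the previous row's value when it is read.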
2.1 The code is as follows:
    public class LCString2 {

        public static void getLCString(char[] str1, char[] str2) {
            int i, j;
            int len1 = str1.length;
            int len2 = str2.length;
            int maxLen = len1 > len2 ? len1 : len2;
            int[] max = new int[maxLen];      // lengths of the longest substrings found
            int[] maxIndex = new int[maxLen]; // end positions (in str1) of those substrings
            int[] c = new int[maxLen];        // number of consecutive equal values on the diagonal

            for (i = 0; i < len2; i++) {
                // Scan right to left so that c[j - 1] still belongs to the previous row.
                for (j = len1 - 1; j >= 0; j--) {
                    if (str2[i] == str1[j]) {
                        if ((i == 0) || (j == 0))
                            c[j] = 1;
                        else
                            c[j] = c[j - 1] + 1;
                    } else {
                        c[j] = 0;
                    }

                    if (c[j] > max[0]) {
                        // Strictly longer: for now this is the only longest one,
                        // so clear all entries after it.
                        max[0] = c[j];   // maximum diagonal value, later used as the substring length
                        maxIndex[0] = j; // position of that maximum
                        for (int k = 1; k < maxLen; k++) {
                            max[k] = 0;
                            maxIndex[k] = 0;
                        }
                    } else if (c[j] == max[0]) {
                        // There are several substrings of the same length.
                        for (int k = 1; k < maxLen; k++) {
                            if (max[k] == 0) {
                                max[k] = c[j];
                                maxIndex[k] = j;
                                break; // one entry appended, exit the loop
                            }
                        }
                    }
                }
            }

            for (j = 0; j < maxLen; j++) {
                if (max[j] > 0) {
                    System.out.println("Common substring " + (j + 1) + ":");
                    for (i = maxIndex[j] - max[j] + 1; i <= maxIndex[j]; i++)
                        System.out.print(str1[i]);
                    System.out.println();
                }
            }
        }

        public static void main(String[] args) {
            String str1 = "123456abcd567";
            String str2 = "234dddabc45678";
            // Alternative test data:
            // String str1 = "aab12345678cde";
            // String str2 = "ab1234yb1234567";
            getLCString(str1.toCharArray(), str2.toCharArray());
        }
    }
Ref:
The Java algorithm for LCS, taking into account that there may be several longest common substrings of the same length
http://blog.csdn.net/rabbitbug/article/details/1740557
Maximum subsequence, longest increasing subsequence, longest common substring, longest common subsequence, string edit distance
http://www.cnblogs.com/zhangchaoyang/articles/2012070.html
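2.2 In fact, this is also easy to write in awk (the two input strings are given on separate lines; an empty FS is used to split each line into single characters, as gawk does):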
    printf '123456abcd567\n234dddabc45678\n' | awk -v FS="" '
    NR == 1 { str = $0 }
    NR == 2 {
        N = NF
        for (n = 0; n++ < N;) {
            s = ""
            for (t = n; t <= N; t++) {
                s = s "" $t
                if (index(str, s)) {
                    a[n] = t - n; b[n] = s
                    if (m <= a[n]) m = a[n]
                } else {
                    t = N
                }
            }
        }
    }
    END { for (n = 0; n++ < N;) if (a[n] == m) print b[n] }'
Ref: http://bbs.chinaunix.net/thread-4055834-2-1.html
3. The longest common subsequence
    import java.util.Random;

    public class LCS {

        public static void main(String[] args) {
            // Randomly generated strings:
            // String x = getRandomStrings(substringLength1);
            // String y = getRandomStrings(substringLength2);
            String x = "a1b2c3";
            String y = "1a1wbz2c123a1b2c123";

            // String lengths (any size can be used)
            int substringLength1 = x.length();
            int substringLength2 = y.length();

            // Two-dimensional array recording the subproblems:
            // opt[i][j] is the LCS length of the suffixes x[i..] and y[j..].
            int[][] opt = new int[substringLength1 + 1][substringLength2 + 1];

            // Solve all subproblems by dynamic programming, from back to front.
            // Filling from front to back works as well.
            for (int i = substringLength1 - 1; i >= 0; i--) {
                for (int j = substringLength2 - 1; j >= 0; j--) {
                    if (x.charAt(i) == y.charAt(j))
                        opt[i][j] = opt[i + 1][j + 1] + 1; // state transition equation
                    else
                        opt[i][j] = Math.max(opt[i + 1][j], opt[i][j + 1]); // state transition equation
                }
            }

            System.out.println("substring1: " + x);
            System.out.println("substring2: " + y);
            System.out.print("LCS: ");

            // Walk forward through the table to print one LCS.
            int i = 0, j = 0;
            while (i < substringLength1 && j < substringLength2) {
                if (x.charAt(i) == y.charAt(j)) {
                    System.out.print(x.charAt(i));
                    i++;
                    j++;
                } else if (opt[i + 1][j] >= opt[i][j + 1])
                    i++;
                else
                    j++;
            }
            System.out.println();
        }

        // Get a random string of fixed length.
        public static String getRandomStrings(int length) {
            StringBuffer buffer = new StringBuffer("abcdefghijklmnopqrstuvwxyz");
            StringBuffer sb = new StringBuffer();
            Random r = new Random();
            int range = buffer.length();
            for (int i = 0; i < length; i++) {
                sb.append(buffer.charAt(r.nextInt(range)));
            }
            return sb.toString();
        }
    }
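The same recurrence can also be filled from front to back. The sketch below is one way to do that, using assumed names (LcsForward, lcs) and returning the subsequence as a String instead of printing it; when several subsequences share the maximum length it returns just one of them.

```java
public class LcsForward {
    // Returns one longest common subsequence of x and y.
    public static String lcs(String x, String y) {
        int n = x.length(), m = y.length();
        // len[i][j] = LCS length of the prefixes x[0..i) and y[0..j).
        int[][] len = new int[n + 1][m + 1];
        for (int i = 1; i <= n; i++) {
            for (int j = 1; j <= m; j++) {
                if (x.charAt(i - 1) == y.charAt(j - 1)) {
                    len[i][j] = len[i - 1][j - 1] + 1;
                } else {
                    len[i][j] = Math.max(len[i - 1][j], len[i][j - 1]);
                }
            }
        }
        // Walk back from the bottom-right corner, collecting matched characters.
        StringBuilder sb = new StringBuilder();
        int i = n, j = m;
        while (i > 0 && j > 0) {
            if (x.charAt(i - 1) == y.charAt(j - 1)) {
                sb.append(x.charAt(i - 1));
                i--;
                j--;
            } else if (len[i - 1][j] >= len[i][j - 1]) {
                i--;
            } else {
                j--;
            }
        }
        return sb.reverse().toString(); // characters were collected back to front
    }

    public static void main(String[] args) {
        System.out.println(lcs("a1b2c3", "1a1wbz2c123a1b2c123")); // a1b2c3
    }
}
```

Because the table is indexed by prefixes here, the reconstruction runs backward and the collected characters have to be reversed at the end.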
Ref:
The longest common subsequence and the longest common substring of strings
http://gongqi.iteye.com/blog/1517447
Dynamic programming algorithm for solving the longest common subsequence (LCS) problem
http://blog.csdn.net/v_JULY_v/article/details/6110269
Reading notes on Introduction to Algorithms: dynamic programming, longest common subsequence & longest common substring (LCS)