Finding the longest common substring of two strings

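The longest common substring (LCS) problem asks for the longest string that appears, as a contiguous run of characters, in both of two strings. It should not be confused with the longest common subsequence, whose characters do not have to be contiguous.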

Link: http://blog.csdn.net/zztfj/article/details/6157429

For example:

String str1 = "adbccadebbca";
String str2 = "edabccadece";
The longest common substring of str1 and str2 is bccade.
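To make the definition concrete, here is a brute-force sketch (not part of the original post) that tries substrings of the first string and checks whether the second string contains them. The class and method names are hypothetical. It is far slower than the matrix methods below and is only meant as a reference for checking them.

```java
public class BruteForceLcs {

    // Hypothetical helper: returns the first longest common substring found,
    // or "" if the two strings share no character.
    static String longestCommonSubstring(String a, String b) {
        String best = "";
        for (int start = 0; start < a.length(); start++) {
            // Only try candidates that would beat the current best.
            for (int end = start + best.length() + 1; end <= a.length(); end++) {
                String candidate = a.substring(start, end);
                if (b.contains(candidate)) {
                    best = candidate;
                } else {
                    // Any longer candidate from this start contains this one,
                    // so it cannot occur in b either.
                    break;
                }
            }
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println(longestCommonSubstring("adbccadebbca", "edabccadece"));
    }
}
```

For the example strings above, this returns bccade.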

The solution is to use a matrix that records, for every pair of positions, whether the characters of the two strings match there: an element is 1 if the two characters are equal and 0 otherwise. The longest run of consecutive 1s along a diagonal then gives the longest common substring: the length of the run is the length of the substring, and the position of the run shows where the substring occurs in each string.

The following is the matching matrix for the strings 21232523311324 and 312123223445. The former runs along the X direction (columns) and the latter along the Y direction (rows). The longest diagonal run of 1s, from row 3, column 1 to row 7, column 5 (counting from 1), corresponds to the longest common substring, 21232.


      2 1 2 3 2 5 2 3 3 1 1 3 2 4
  3 | 0 0 0 1 0 0 0 1 1 0 0 1 0 0
  1 | 0 1 0 0 0 0 0 0 0 1 1 0 0 0
  2 | 1 0 1 0 1 0 1 0 0 0 0 0 1 0
  1 | 0 1 0 0 0 0 0 0 0 1 1 0 0 0
  2 | 1 0 1 0 1 0 1 0 0 0 0 0 1 0
  3 | 0 0 0 1 0 0 0 1 1 0 0 1 0 0
  2 | 1 0 1 0 1 0 1 0 0 0 0 0 1 0
  2 | 1 0 1 0 1 0 1 0 0 0 0 0 1 0
  3 | 0 0 0 1 0 0 0 1 1 0 0 1 0 0
  4 | 0 0 0 0 0 0 0 0 0 0 0 0 0 1
  4 | 0 0 0 0 0 0 0 0 0 0 0 0 0 1
  5 | 0 0 0 0 0 1 0 0 0 0 0 0 0 0
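A minimal sketch of this first approach, written in Java like the final code: build the 0/1 matching matrix (rows from the Y string, columns from the X string), then walk every diagonal and track the longest run of consecutive 1s. The class and method names are hypothetical; this illustrates the idea and is not the original author's code.

```java
public class DiagonalScanLcs {

    // Hypothetical helper: builds the 0/1 match matrix (rows = y, columns = x)
    // and returns the longest common substring, read from x.
    static String longestCommonSubstring(String x, String y) {
        int rows = y.length(), cols = x.length();
        int[][] match = new int[rows][cols];
        for (int i = 0; i < rows; i++)
            for (int j = 0; j < cols; j++)
                match[i][j] = (y.charAt(i) == x.charAt(j)) ? 1 : 0;

        int bestLen = 0, bestEnd = -1; // bestEnd = column of the run's last 1
        // A diagonal is the set of cells with j - i == d; it starts on the
        // top row (d >= 0) or on the left column (d < 0).
        for (int d = -(rows - 1); d < cols; d++) {
            int i = Math.max(0, -d), j = i + d;
            int run = 0;
            for (; i < rows && j < cols; i++, j++) {
                run = (match[i][j] == 1) ? run + 1 : 0; // a 0 breaks the run
                if (run > bestLen) {
                    bestLen = run;
                    bestEnd = j;
                }
            }
        }
        return x.substring(bestEnd - bestLen + 1, bestEnd + 1);
    }

    public static void main(String[] args) {
        System.out.println(longestCommonSubstring("21232523311324", "312123223445"));
    }
}
```

For the two strings in the matrix, the longest run has length 5 and ends in column 5 (counting from 1), so the method returns 21232. The diagonal scan is the separate pass that the improved method below avoids.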
However, finding the longest diagonal run of 1s in a 0/1 matrix takes an extra pass over the whole matrix. By changing how the matrix is generated and keeping marker variables while it is built, that pass can be avoided. The improved matrix for the same two strings is:
      2 1 2 3 2 5 2 3 3 1 1 3 2 4
  3 | 0 0 0 1 0 0 0 1 1 0 0 1 0 0
  1 | 0 1 0 0 0 0 0 0 0 2 1 0 0 0
  2 | 1 0 2 0 1 0 1 0 0 0 0 0 1 0
  1 | 0 2 0 0 0 0 0 0 0 1 1 0 0 0
  2 | 1 0 3 0 1 0 1 0 0 0 0 0 1 0
  3 | 0 0 0 4 0 0 0 2 1 0 0 1 0 0
  2 | 1 0 1 0 5 0 1 0 0 0 0 0 2 0
  2 | 1 0 1 0 1 0 1 0 0 0 0 0 1 0
  3 | 0 0 0 2 0 0 0 2 1 0 0 1 0 0
  4 | 0 0 0 0 0 0 0 0 0 0 0 0 0 1
  4 | 0 0 0 0 0 0 0 0 0 0 0 0 0 1
  5 | 0 0 0 0 0 1 0 0 0 0 0 0 0 0
When two characters match, we do not simply store 1 in the corresponding element; instead we store the value of the element diagonally above and to the left, plus one. Each element therefore holds the length of the common substring ending at those two positions. We also keep marker variables that record the largest value generated so far and its position. Each time an element is generated, we check whether it exceeds the current maximum and, if it does, update the markers. By the time the matrix is complete, the length and end position of the longest common substring are already known. The specific algorithm is as follows:

#include <cassert>
#include <iostream>
#include <vector>

// Prints the longest common substring of a (length len1) and b (length len2).
void LCS(const char a[], const char b[], int len1, int len2) {
    assert(len1 > 0 && len2 > 0);
    int max = 0;          // length of the longest common substring so far
    int row = 0, col = 0; // row and column where that substring ends
    // c[i][j]: length of the common substring ending at b[i-1] and a[j-1];
    // row 0 and column 0 stay 0.
    std::vector<std::vector<int>> c(len2 + 1, std::vector<int>(len1 + 1, 0));
    for (int i = 1; i <= len2; ++i)
        for (int j = 1; j <= len1; ++j)
            if (b[i - 1] == a[j - 1]) {
                c[i][j] = c[i - 1][j - 1] + 1;
                if (c[i][j] > max) {
                    max = c[i][j];
                    row = i;
                    col = j;
                }
            }
    for (int k = row - max; k <= row - 1; ++k) // the substring is b[row-max .. row-1]
        std::cout << b[k];
    std::cout << '\n';
}
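Written as a formula, the matrix that LCS fills in follows the recurrence below (indices as in the code: element c[i][j] compares b[i-1] with a[j-1], and row 0 and column 0 are all zeros):

```latex
c_{i,j} =
\begin{cases}
0, & i = 0 \text{ or } j = 0,\\
c_{i-1,j-1} + 1, & i, j \ge 1 \text{ and } b_{i-1} = a_{j-1},\\
0, & i, j \ge 1 \text{ and } b_{i-1} \ne a_{j-1}.
\end{cases}
\qquad
m = \max_{i,j} c_{i,j}
```

If the maximum m is first reached in row i*, the longest common substring is b[i* - m], ..., b[i* - 1], which is exactly the range printed by the final loop of LCS.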

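This version is faster, but it needs a (len1 + 1) × (len2 + 1) matrix, which takes too much space. Notice that in the improved generation method, each row depends only on the row directly above it, so once a row has been generated, the rows before it are no longer needed. A single one-dimensional array is therefore enough, as long as each row is filled from right to left so that c[j - 1] still holds the previous row's value when c[j] is computed. The final code, which also reports every substring that ties for the longest length, is as follows: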

public class lcstring2 {

    // Prints every longest common substring of str1 and str2,
    // keeping only one row of the matrix in the array c.
    public static void getlcstring(char[] str1, char[] str2) {
        int i, j;
        int len1 = str1.length;
        int len2 = str2.length;
        int maxlen = len1 > len2 ? len1 : len2;
        int[] max = new int[maxlen];      // lengths of the longest substrings found
        int[] maxindex = new int[maxlen]; // index in str1 where each of them ends
        int[] c = new int[maxlen];        // current row of the matrix

        for (i = 0; i < len2; i++) {
            // Right to left, so c[j - 1] still holds the previous row's value.
            for (j = len1 - 1; j >= 0; j--) {
                if (str2[i] == str1[j]) {
                    if (i == 0 || j == 0)
                        c[j] = 1;
                    else
                        c[j] = c[j - 1] + 1;
                } else {
                    c[j] = 0;
                }

                if (c[j] > max[0]) {
                    // Strictly longer: this is now the only longest substring,
                    // so clear the ties recorded so far.
                    max[0] = c[j];
                    maxindex[0] = j;
                    for (int k = 1; k < maxlen; k++) {
                        max[k] = 0;
                        maxindex[k] = 0;
                    }
                } else if (c[j] > 0 && c[j] == max[0]) {
                    // Another substring of the same length: record it in the
                    // first free slot (if no slot is free, it is dropped).
                    for (int k = 1; k < maxlen; k++) {
                        if (max[k] == 0) {
                            max[k] = c[j];
                            maxindex[k] = j;
                            break;
                        }
                    }
                }
            }
        }

        for (j = 0; j < maxlen; j++) {
            if (max[j] > 0) {
                System.out.println("Common substring " + (j + 1) + ":");
                for (i = maxindex[j] - max[j] + 1; i <= maxindex[j]; i++)
                    System.out.print(str1[i]);
                System.out.println();
            }
        }
    }

    public static void main(String[] args) {
        String str1 = "adbba1234";
        String str2 = "adbbf1234sa";
        getlcstring(str1.toCharArray(), str2.toCharArray());
    }
}
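The final code prints each occurrence separately, so a substring that occurs at several positions is printed several times, and once every slot of the max array is in use, further ties are silently dropped. A possible variant, sketched here under the assumption that a list of distinct results is what you want, keeps the same right-to-left one-dimensional row but collects the matches in a LinkedHashSet. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class RollingRowLcs {

    // Hypothetical helper: returns every distinct longest common substring of
    // a and b, in the order first found, using one row of length a.length() + 1.
    static List<String> allLongestCommonSubstrings(String a, String b) {
        int[] c = new int[a.length() + 1]; // c[j]: common substring length ending at a[j-1]; c[0] stays 0
        int best = 0;
        Set<String> found = new LinkedHashSet<>();
        for (int i = 1; i <= b.length(); i++) {
            // Right to left, so c[j - 1] still belongs to the previous row.
            for (int j = a.length(); j >= 1; j--) {
                c[j] = (b.charAt(i - 1) == a.charAt(j - 1)) ? c[j - 1] + 1 : 0;
                if (c[j] > 0 && c[j] >= best) {
                    if (c[j] > best) {
                        best = c[j];
                        found.clear(); // a strictly longer match discards earlier ties
                    }
                    found.add(a.substring(j - best, j));
                }
            }
        }
        return new ArrayList<>(found);
    }

    public static void main(String[] args) {
        System.out.println(allLongestCommonSubstrings("adbba1234", "adbbf1234sa"));
    }
}
```

With the strings from the original main method, this returns [adbb, 1234]. The set removes repeated occurrences of the same substring, and because it grows as needed, no tie is lost.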
