Longest contiguous common substring of two strings

Source: Internet
Author: User

Introduction:

The longest common substring problem asks for the longest string that occurs contiguously in both of two given strings. It is sometimes also abbreviated LCS, but it is not the same as the longest common subsequence problem, whose matched characters do not have to be adjacent.

For example:

String str1 = new String("adbccadebbca");
String str2 = new String("edabccadece");
The longest common substring of str1 and str2 is "bccade".

The solution is to use a matrix that records, for every pair of positions in the two strings, whether the characters there match: 1 if they match and 0 otherwise. The longest diagonal sequence of 1s is then found, and its position gives the position of the longest common substring.
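A minimal C++ sketch of this basic method, assuming std::string inputs. The function name lcsByDiagonalScan is hypothetical. It builds the full 0/1 matrix (rows follow b, columns follow a), then walks every diagonal counting consecutive 1s and keeps the first longest run it sees.

```cpp
#include <string>
#include <vector>

// Basic method: build a 0/1 match matrix, then scan each diagonal
// for its longest run of 1s. Rows follow b, columns follow a.
// Returns the first longest common substring found (empty if none).
std::string lcsByDiagonalScan(const std::string& a, const std::string& b)
{
    const int n = static_cast<int>(a.size());
    const int m = static_cast<int>(b.size());

    // match[i][j] == 1 exactly when b[i] == a[j]
    std::vector<std::vector<int>> match(m, std::vector<int>(n, 0));
    for (int i = 0; i < m; ++i)
        for (int j = 0; j < n; ++j)
            match[i][j] = (b[i] == a[j]) ? 1 : 0;

    int bestLen = 0;
    int bestEnd = -1;  // index in a of the last character of the best run

    // Every cell on one diagonal has the same offset d = j - i.
    for (int d = -(m - 1); d <= n - 1; ++d) {
        int run = 0;  // length of the current run of 1s on this diagonal
        for (int i = 0; i < m; ++i) {
            int j = i + d;
            if (j < 0 || j >= n) continue;  // cell lies outside the matrix
            run = match[i][j] ? run + 1 : 0;
            if (run > bestLen) {
                bestLen = run;
                bestEnd = j;
            }
        }
    }
    return bestLen == 0 ? std::string() : a.substr(bestEnd - bestLen + 1, bestLen);
}
```

With the digit strings from the next example, lcsByDiagonalScan("21232523311324", "312123223445") should return "21232". Note that the diagonal scan is a second full pass over the matrix after it has been built.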

The following is the matching matrix for the strings 21232523311324 and 312123223445: the former runs along the x direction (columns) and the latter along the y direction (rows). It is easy to spot the longest diagonal run of 1s, which goes from row 3, column 1 to row 7, column 5 (counting from 1). Reading off its position gives the longest common substring: 21232.


    | 2 1 2 3 2 5 2 3 3 1 1 3 2 4
----+----------------------------
  3 | 0 0 0 1 0 0 0 1 1 0 0 1 0 0
  1 | 0 1 0 0 0 0 0 0 0 1 1 0 0 0
  2 | 1 0 1 0 1 0 1 0 0 0 0 0 1 0
  1 | 0 1 0 0 0 0 0 0 0 1 1 0 0 0
  2 | 1 0 1 0 1 0 1 0 0 0 0 0 1 0
  3 | 0 0 0 1 0 0 0 1 1 0 0 1 0 0
  2 | 1 0 1 0 1 0 1 0 0 0 0 0 1 0
  2 | 1 0 1 0 1 0 1 0 0 0 0 0 1 0
  3 | 0 0 0 1 0 0 0 1 1 0 0 1 0 0
  4 | 0 0 0 0 0 0 0 0 0 0 0 0 0 1
  4 | 0 0 0 0 0 0 0 0 0 0 0 0 0 1
  5 | 0 0 0 0 0 1 0 0 0 0 0 0 0 0

However, searching a 0/1 matrix for its longest diagonal run of 1s still takes extra time. That time can be saved by changing how the matrix is generated and by keeping marker variables. The new matrix is generated as follows:
    | 2 1 2 3 2 5 2 3 3 1 1 3 2 4
----+----------------------------
  3 | 0 0 0 1 0 0 0 1 1 0 0 1 0 0
  1 | 0 1 0 0 0 0 0 0 0 2 1 0 0 0
  2 | 1 0 2 0 1 0 1 0 0 0 0 0 1 0
  1 | 0 2 0 0 0 0 0 0 0 1 1 0 0 0
  2 | 1 0 3 0 1 0 1 0 0 0 0 0 1 0
  3 | 0 0 0 4 0 0 0 2 1 0 0 1 0 0
  2 | 1 0 1 0 5 0 1 0 0 0 0 0 2 0
  2 | 1 0 1 0 1 0 1 0 0 0 0 0 1 0
  3 | 0 0 0 2 0 0 0 2 1 0 0 1 0 0
  4 | 0 0 0 0 0 0 0 0 0 0 0 0 0 1
  4 | 0 0 0 0 0 0 0 0 0 0 0 0 0 1
  5 | 0 0 0 0 0 1 0 0 0 0 0 0 0 0

When two characters match, we no longer simply store 1; instead we store the value of the upper-left element plus one, so each element holds the length of the common substring that ends at that pair of positions (a mismatch still stores 0). Two marker variables record the largest element generated so far and its position. Each newly generated element is compared with the current maximum, and the markers are updated whenever it is larger. By the time the matrix is complete, the position and length of the longest common substring are already known.
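The improved matrix can be sketched in C++ as follows, assuming std::string inputs; lcsByDp2D is a hypothetical name. In formula form, len[i][j] = len[i-1][j-1] + 1 when b[i] == a[j] (just 1 on the first row or column), and 0 otherwise. The variables bestLen and bestEnd play the role of the two markers.

```cpp
#include <string>
#include <vector>

// len[i][j] is the length of the common substring ending at b[i] and a[j]:
// a match extends the upper-left entry by one, a mismatch leaves 0.
// bestLen and bestEnd are the two marker variables, updated while the
// matrix is filled, so no separate diagonal scan is needed afterwards.
std::string lcsByDp2D(const std::string& a, const std::string& b)
{
    const int n = static_cast<int>(a.size());
    const int m = static_cast<int>(b.size());
    std::vector<std::vector<int>> len(m, std::vector<int>(n, 0));

    int bestLen = 0;
    int bestEnd = -1;  // index in a of the last character of the best match
    for (int i = 0; i < m; ++i) {
        for (int j = 0; j < n; ++j) {
            if (b[i] != a[j]) continue;  // mismatch: entry stays 0
            len[i][j] = (i > 0 && j > 0) ? len[i - 1][j - 1] + 1 : 1;
            if (len[i][j] > bestLen) {   // strictly larger: move the markers
                bestLen = len[i][j];
                bestEnd = j;
            }
        }
    }
    return bestLen == 0 ? std::string() : a.substr(bestEnd - bestLen + 1, bestLen);
}
```

Because only a strictly larger value moves the markers, the first longest substring met in row order wins: for "abxcd" and "cdyab" this sketch returns "cd" rather than "ab".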

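This is faster, but the full matrix takes too much space (len1 × len2 elements). Notice that in the improved generation each new row depends only on the row directly before it; all earlier rows are never used again. A single one-dimensional array is therefore enough, as long as each row is filled from right to left, so that matrix[j - 1] still holds the previous row's value when matrix[j] is computed. The final code is as follows (the original source had a slight flaw, which has been fixed):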

#include <cstring>
#include <iostream>
using namespace std;

// Print every longest common substring of str1 and str2.
void getLCS(const char* str1, const char* str2)
{
    int len1 = strlen(str1);
    int len2 = strlen(str2);
    int* matrix = new int[len1]; // one row of the matrix; str1 runs along the x direction
    // Initialize the row
    for (int i = 0; i < len1; i++) {
        matrix[i] = 0;
    }
    int* maxValue = new int[len2]; // str2 runs along the y direction; room for up to len2 recorded maxima
    int* maxIndex = new int[len2];
    // Initialize maxValue and maxIndex
    for (int i = 0; i < len2; i++) {
        maxValue[i] = -1;
        maxIndex[i] = -1;
    }
    for (int i = 0; i < len2; i++) {
        // Compare str2[i] with each character of str1, right to left, so that
        // matrix[j - 1] still holds the previous row's value when matrix[j] is updated
        for (int j = len1 - 1; j >= 0; j--) {
            if (str2[i] == str1[j]) {
                if (j == 0) {
                    matrix[j] = 1;
                } else {
                    matrix[j] = matrix[j - 1] + 1;
                }
            } else {
                matrix[j] = 0;
            }
            if (matrix[j] != 0 && matrix[j] > maxValue[0]) {
                // New maximum length: record it
                maxValue[0] = matrix[j];
                maxIndex[0] = j;
                // Reset the other recorded maxima
                for (int k = 1; k < len2; k++) {
                    maxValue[k] = -1;
                    maxIndex[k] = -1;
                }
            } else if (matrix[j] == maxValue[0]) {
                // Another common substring of the same maximum length.
                // Only len2 - 1 extra slots exist, so further ties are dropped,
                // and repeated substrings are not filtered out.
                for (int k = 1; k < len2; k++) {
                    if (maxValue[k] == -1) {
                        maxValue[k] = matrix[j];
                        maxIndex[k] = j;
                        break; // record just one
                    }
                }
            }
        }
    }
    for (int i = 0; i < len2; i++) {
        if (maxValue[i] > 0) {
            cout << "Common substring " << i + 1 << ":" << endl;
            for (int j = maxIndex[i] - maxValue[i] + 1; j <= maxIndex[i]; j++) {
                cout << str1[j];
            }
            cout << endl;
        }
    }
    // Release the heap arrays
    delete[] matrix;
    delete[] maxValue;
    delete[] maxIndex;
}
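For comparison, here is a more compact sketch of the same one-dimensional approach in modern C++, assuming std::string inputs and a caller that wants the longest substrings returned rather than printed; allLongestCommonSubstrings is a hypothetical name. It keeps the right-to-left update but collects ties in a growable vector and skips repeats, whereas the fixed-size arrays in the code above can print the same substring twice and drop ties once their slots are full.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// One-dimensional (rolling row) version. row[j] holds the length of the
// common substring ending at a[j] and the current character of b.
// j runs right to left so row[j - 1] still holds the previous row's value.
// Returns every distinct longest common substring, in the order found.
std::vector<std::string> allLongestCommonSubstrings(const std::string& a,
                                                    const std::string& b)
{
    const int n = static_cast<int>(a.size());
    std::vector<int> row(n, 0);
    int bestLen = 0;
    std::vector<std::string> best;

    for (char c : b) {
        for (int j = n - 1; j >= 0; --j) {
            row[j] = (c == a[j]) ? (j > 0 ? row[j - 1] + 1 : 1) : 0;
            if (row[j] == 0 || row[j] < bestLen) continue;  // not a candidate
            std::string s = a.substr(j - row[j] + 1, row[j]);
            if (row[j] > bestLen) {
                // Strictly longer: discard earlier results
                bestLen = row[j];
                best.assign(1, s);
            } else if (std::find(best.begin(), best.end(), s) == best.end()) {
                // Same length and not seen yet
                best.push_back(s);
            }
        }
    }
    return best;
}
```

For instance, allLongestCommonSubstrings("abxcd", "cdyab") returns {"cd", "ab"}, and allLongestCommonSubstrings("abab", "ab") returns {"ab"} only once.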
