Longest common substring, longest common subsequence, and edit distance of two strings.

Source: Internet
Author: User

(1) Find the longest common substring of two strings

Problem: given two strings, find their longest common substring.

A common substring must be contiguous in both original strings. We therefore use a two-dimensional matrix to store intermediate results. How do we construct this matrix?

Assume that the two strings are "bab" and "caba".

If str1[i] == str2[j], then matrix[i][j] = 1; otherwise, matrix[i][j] = 0.

The longest run of 1s along a diagonal of the matrix then gives the longest common substring.

Here both "ab" and "ba" qualify, each with length 2.

We can simplify the search by using both the comparison str1[i] == str2[j] and the value of matrix[i-1][j-1] when we compute matrix[i][j].

If str1[i] == str2[j], then matrix[i][j] = matrix[i-1][j-1] + 1 (with matrix[i-1][j-1] taken as 0 when i = 0 or j = 0); otherwise, matrix[i][j] = 0.

[Figure omitted in the source: the matrix filled in for the two example strings.]

Therefore, we only need to find the maximum value in the matrix; that value is the length of the longest common substring.
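As an illustration of the full-matrix approach described above, here is a minimal sketch that fills the whole table and also recovers one longest common substring, not just its length. The function name longestCommonSubstring2D, the use of std::string and std::vector, and the extra row and column of zeros are assumptions made for this sketch; they are not part of the original article.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not from the original article): fills the full
// (n+1) x (m+1) table and returns one longest common substring of a and b.
std::string longestCommonSubstring2D(const std::string& a, const std::string& b) {
    const std::size_t n = a.size();
    const std::size_t m = b.size();

    // dp[i][j] = length of the common run that ends at a[i-1] and b[j-1].
    // Row 0 and column 0 stay 0, so the dp[i-1][j-1] lookup is always valid.
    std::vector<std::vector<int>> dp(n + 1, std::vector<int>(m + 1, 0));

    int best = 0;             // length of the longest run seen so far
    std::size_t bestEnd = 0;  // one past the end of that run inside a

    for (std::size_t i = 1; i <= n; ++i) {
        for (std::size_t j = 1; j <= m; ++j) {
            if (a[i - 1] == b[j - 1]) {
                dp[i][j] = dp[i - 1][j - 1] + 1;
                if (dp[i][j] > best) {  // strict '>' keeps the first run found
                    best = dp[i][j];
                    bestEnd = i;
                }
            }
            // otherwise dp[i][j] keeps its initial value 0
        }
    }
    return a.substr(bestEnd - static_cast<std::size_t>(best),
                    static_cast<std::size_t>(best));
}
```

For the example strings, longestCommonSubstring2D("caba", "bab") returns "ab"; "ba" has the same length, but the strict comparison keeps the run that is found first.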

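We can then reduce the space complexity.

Computing matrix[i][j] depends only on matrix[i-1][j-1], so a one-dimensional array that holds the results of the previous row is enough. The column index j must be traversed from high to low, so that the entry at j-1 still holds the previous row's value when the entry at j is updated.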

The implementation code is as follows:

#include <cstring>
#include <iostream>
using namespace std;

int getLongestCommonSubstring(const char* pStr1, const char* pStr2)
{
    /* check the validity of the parameters */
    if (pStr1 == NULL || pStr2 == NULL)
    {
        return -1;
    }

    int n = static_cast<int>(strlen(pStr1));
    int m = static_cast<int>(strlen(pStr2));
    int longestCommonSubstring = 0;

    /* allocate the auxiliary array and initialize it to 0 */
    int* lcs = new int[m];
    for (int i = 0; i < m; i++)
    {
        lcs[i] = 0;
    }

    /* compare pStr1[i] with pStr2[j] and update lcs;
       j runs backwards so that lcs[j - 1] still holds the previous row */
    for (int i = 0; i < n; i++)
    {
        for (int j = m - 1; j >= 0; j--)
        {
            if (pStr1[i] == pStr2[j])
            {
                /* if pStr1[i] == pStr2[j], lcs[j] = lcs[j - 1] + 1 */
                if (j == 0)
                {
                    lcs[j] = 1;
                }
                else
                {
                    lcs[j] = lcs[j - 1] + 1;
                }
            }
            else
            {
                /* if pStr1[i] != pStr2[j], lcs[j] = 0 */
                lcs[j] = 0;
            }

            /* update the length of the longest common substring */
            if (lcs[j] > longestCommonSubstring)
            {
                longestCommonSubstring = lcs[j];
            }
        }
    }

    /* the array was allocated with new[], so release it with delete[] */
    delete[] lcs;
    lcs = NULL;

    return longestCommonSubstring;
}

void test(const char* testName, const char* pStr1, const char* pStr2, int expectedLongestCommonSubstring)
{
    cout << testName << ": ";
    if (getLongestCommonSubstring(pStr1, pStr2) == expectedLongestCommonSubstring)
    {
        cout << "passed." << endl;
    }
    else
    {
        cout << "failed." << endl;
    }
}

int main()
{
    test("test1", "caba", "bab", 2);
    test("test2", "abcd", "efg", 0);
    test("test3", "abcde", "abcde", 5);
    return 0;
}

(2) Find the longest common subsequence of two strings

Problem: given two strings, find their longest common subsequence.

First, the longest common subsequence differs from the longest common substring: a subsequence does not have to be contiguous in the original string. For example, for X = {A, B, C, B, D, A, B} and Y = {B, D, C, A, B, A}, one longest common subsequence of X and Y is Z = {B, C, B, A}.

Suppose X = {x_1, x_2, x_3, ..., x_m}. The i-th prefix of X is X_i = {x_1, x_2, ..., x_i}. For example, if X = {A, B, C, B, D, A, B}, then X_4 = {A, B, C, B}.

Similarly, let Y = {y_1, y_2, y_3, ..., y_n}, and let Z = {z_1, z_2, ..., z_k} be a longest common subsequence of X and Y. Then:

If x_m = y_n, then z_k = x_m = y_n, and Z_{k-1} is a longest common subsequence of X_{m-1} and Y_{n-1}.

If x_m != y_n and z_k != x_m, then Z is a longest common subsequence of X_{m-1} and Y.

If x_m != y_n and z_k != y_n, then Z is a longest common subsequence of X and Y_{n-1}.

Therefore, we define a two-dimensional array c, where c[i][j] stores the length of a longest common subsequence of X_i and Y_j:

c[i][j] = 0                            if i = 0 or j = 0

c[i][j] = c[i-1][j-1] + 1              if i, j > 0 and x_i = y_j

c[i][j] = max(c[i][j-1], c[i-1][j])    if i, j > 0 and x_i != y_j
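The three cases of the recurrence can also be read off directly as a memoized recursion. The sketch below is only meant to show that correspondence; the names lcsLengthMemo and lcsLengthTopDown, and the use of -1 as a "not computed yet" marker, are assumptions of this example and do not come from the original article.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: evaluates c[i][j] top-down.
// memo[i][j] == -1 means the value has not been computed yet.
static int lcsLengthMemo(const std::string& x, const std::string& y,
                         std::size_t i, std::size_t j,
                         std::vector<std::vector<int>>& memo) {
    if (i == 0 || j == 0) {
        return 0;  // case 1: one of the prefixes is empty
    }
    if (memo[i][j] != -1) {
        return memo[i][j];  // already computed
    }
    int result = 0;
    if (x[i - 1] == y[j - 1]) {
        // case 2: the last characters match
        result = lcsLengthMemo(x, y, i - 1, j - 1, memo) + 1;
    } else {
        // case 3: drop the last character of one prefix or the other
        result = std::max(lcsLengthMemo(x, y, i, j - 1, memo),
                          lcsLengthMemo(x, y, i - 1, j, memo));
    }
    memo[i][j] = result;
    return result;
}

// Hypothetical entry point: length of a longest common subsequence of x and y.
int lcsLengthTopDown(const std::string& x, const std::string& y) {
    std::vector<std::vector<int>> memo(x.size() + 1,
                                       std::vector<int>(y.size() + 1, -1));
    return lcsLengthMemo(x, y, x.size(), y.size(), memo);
}
```

Each table entry is computed at most once, so the running time is still proportional to m * n. The recursion depth can reach m + n, which is one reason a bottom-up loop is usually preferred for long inputs.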

The implementation code is as follows:

#include <cstring>
#include <iostream>
using namespace std;

int max(int a, int b)
{
    return a > b ? a : b;
}

int getLongestCommonSequence(const char* pStr1, const char* pStr2)
{
    /* check the validity of the parameters */
    if (pStr1 == NULL || pStr2 == NULL)
    {
        return -1;
    }

    int m = static_cast<int>(strlen(pStr1));
    int n = static_cast<int>(strlen(pStr2));

    /* allocate the two-dimensional table lcs[m + 1][n + 1] */
    int** lcs = new int*[m + 1];
    for (int i = 0; i < m + 1; i++)
    {
        lcs[i] = new int[n + 1];
    }

    /* set lcs[i][0] and lcs[0][j] to 0 */
    for (int i = 0; i < m + 1; i++)
    {
        lcs[i][0] = 0;
    }
    for (int j = 0; j < n + 1; j++)
    {
        lcs[0][j] = 0;
    }

    /* traverse both strings and update lcs[i][j] */
    for (int i = 1; i < m + 1; i++)
    {
        for (int j = 1; j < n + 1; j++)
        {
            if (pStr1[i - 1] == pStr2[j - 1])
            {
                lcs[i][j] = lcs[i - 1][j - 1] + 1;
            }
            else
            {
                lcs[i][j] = max(lcs[i - 1][j], lcs[i][j - 1]);
            }
        }
    }

    /* the length of the longest common subsequence */
    int longestCommonSequence = lcs[m][n];

    /* release the dynamic memory */
    for (int i = 0; i < m + 1; i++)
    {
        delete[] lcs[i];
        lcs[i] = NULL;
    }
    delete[] lcs;
    lcs = NULL;

    /* return the length of the longest common subsequence */
    return longestCommonSequence;
}

void test(const char* testName, const char* pStr1, const char* pStr2, int expectedLongestCommonSequence)
{
    cout << testName << ": ";
    if (getLongestCommonSequence(pStr1, pStr2) == expectedLongestCommonSequence)
    {
        cout << "passed." << endl;
    }
    else
    {
        cout << "failed." << endl;
    }
}

int main()
{
    test("test1", "abcbdab", "bdcaba", 4);
    /* an empty string has no common subsequence with any string */
    test("test2", "", "a", 0);
    test("test3", "ab", "bc", 1);
    return 0;
}
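The function above returns only the length. One way to recover an actual subsequence is to keep the table and walk back from the entry for the two full strings, following the same three cases in reverse. The sketch below assumes std::string inputs and a hypothetical function name longestCommonSubsequence; when the two neighbouring entries tie, it moves up a row, which is an arbitrary choice, so other equally long answers are possible.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not from the original article): returns one longest
// common subsequence of x and y by backtracking through the length table.
std::string longestCommonSubsequence(const std::string& x, const std::string& y) {
    const std::size_t m = x.size();
    const std::size_t n = y.size();

    // c[i][j] = length of a longest common subsequence of the prefixes
    // x[0..i) and y[0..j); row 0 and column 0 stay 0.
    std::vector<std::vector<int>> c(m + 1, std::vector<int>(n + 1, 0));
    for (std::size_t i = 1; i <= m; ++i) {
        for (std::size_t j = 1; j <= n; ++j) {
            if (x[i - 1] == y[j - 1]) {
                c[i][j] = c[i - 1][j - 1] + 1;
            } else {
                c[i][j] = std::max(c[i - 1][j], c[i][j - 1]);
            }
        }
    }

    // Walk back from c[m][n], collecting matched characters in reverse order.
    std::string z;
    std::size_t i = m;
    std::size_t j = n;
    while (i > 0 && j > 0) {
        if (x[i - 1] == y[j - 1]) {
            z.push_back(x[i - 1]);  // this character is part of the answer
            --i;
            --j;
        } else if (c[i - 1][j] >= c[i][j - 1]) {
            --i;                    // tie or larger value above: move up
        } else {
            --j;                    // larger value to the left: move left
        }
    }
    std::reverse(z.begin(), z.end());
    return z;
}
```

With this tie-breaking rule, longestCommonSubsequence("abcbdab", "bdcaba") yields "bcba", which has the expected length of 4.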

(3) Find the edit distance between two strings

Problem: given two strings, find the minimum edit distance between them.

We define a set of operations that turn one string into another. The operations are:

1. Modify a character (for example, replace "a" with "b")

2. Insert a character (for example, convert "abdd" to "aebdd")

3. Delete a character (for example, convert "travelling" to "traveling")

Each application of one of these operations increases the edit distance by 1.

We again define a two-dimensional array: c[i][j] is the minimum edit distance between the prefixes X_i and Y_j.

c[i][j] = min{c[i-1][j] + 1, c[i][j-1] + 1, c[i-1][j-1] + 1 (if x_i != y_j), c[i-1][j-1] (if x_i = y_j)}, with the boundary values c[i][0] = i and c[0][j] = j.

The implementation code is as follows:

#include <cstring>
#include <iostream>
using namespace std;

int min(int a, int b, int c)
{
    int result = a;
    if (result > b)
    {
        result = b;
    }
    if (result > c)
    {
        result = c;
    }
    return result;
}

int getLeastestEditDistance(const char* pStr1, const char* pStr2)
{
    /* check the validity of the parameters */
    if (pStr1 == NULL || pStr2 == NULL)
    {
        return -1;
    }

    int m = static_cast<int>(strlen(pStr1));
    int n = static_cast<int>(strlen(pStr2));

    /* allocate the dynamic table led[m + 1][n + 1] */
    int** led = new int*[m + 1];
    for (int i = 0; i < m + 1; i++)
    {
        led[i] = new int[n + 1];
    }

    /* boundary values: led[i][0] = i, led[0][j] = j */
    for (int i = 0; i < m + 1; i++)
    {
        led[i][0] = i;
    }
    for (int j = 0; j < n + 1; j++)
    {
        led[0][j] = j;
    }

    /* calculate led[i][j] */
    for (int i = 1; i < m + 1; i++)
    {
        for (int j = 1; j < n + 1; j++)
        {
            if (pStr1[i - 1] == pStr2[j - 1])
            {
                led[i][j] = min(led[i - 1][j - 1], led[i - 1][j] + 1, led[i][j - 1] + 1);
            }
            else
            {
                led[i][j] = min(led[i - 1][j - 1] + 1, led[i - 1][j] + 1, led[i][j - 1] + 1);
            }
        }
    }

    /* the minimum edit distance */
    int leastestEditDistance = led[m][n];

    /* release the dynamic memory */
    for (int i = 0; i < m + 1; i++)
    {
        delete[] led[i];
        led[i] = NULL;
    }
    delete[] led;
    led = NULL;

    /* return the minimum edit distance */
    return leastestEditDistance;
}

void test(const char* testName, const char* pStr1, const char* pStr2, int expectedLeastestEditDistance)
{
    cout << testName << ": ";
    if (getLeastestEditDistance(pStr1, pStr2) == expectedLeastestEditDistance)
    {
        cout << "passed." << endl;
    }
    else
    {
        cout << "failed." << endl;
    }
}

int main()
{
    test("test1", "a", "b", 1);
    test("test2", "abdd", "aebdd", 1);
    test("test3", "travelling", "traveling", 1);
    test("test4", "abcd", "abcd", 0);
    test("test5", NULL, NULL, -1);
    return 0;
}
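The table above is filled row by row, and row i reads only row i-1 and the part of row i already computed. As with the space reduction in part (1), this suggests keeping just two rows. The following is a minimal sketch of that idea, not the article's own code; the function name editDistanceTwoRows and the use of std::string and std::vector are assumptions of this example.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper (not from the original article): minimum edit distance
// between a and b using two rows of length b.size() + 1 instead of a full table.
int editDistanceTwoRows(const std::string& a, const std::string& b) {
    const std::size_t m = a.size();
    const std::size_t n = b.size();

    std::vector<int> prev(n + 1);  // row i - 1 of the table
    std::vector<int> curr(n + 1);  // row i of the table

    // Row 0: turning the empty prefix into b[0..j) takes j insertions.
    for (std::size_t j = 0; j <= n; ++j) {
        prev[j] = static_cast<int>(j);
    }

    for (std::size_t i = 1; i <= m; ++i) {
        // Column 0: turning a[0..i) into the empty prefix takes i deletions.
        curr[0] = static_cast<int>(i);
        for (std::size_t j = 1; j <= n; ++j) {
            const int modifyCost = prev[j - 1] + (a[i - 1] == b[j - 1] ? 0 : 1);
            const int deleteCost = prev[j] + 1;
            const int insertCost = curr[j - 1] + 1;
            curr[j] = std::min({modifyCost, deleteCost, insertCost});
        }
        std::swap(prev, curr);  // the row just computed becomes the previous row
    }
    return prev[n];  // after the final swap, prev holds row m
}
```

For example, editDistanceTwoRows("abdd", "aebdd") and editDistanceTwoRows("travelling", "traveling") both give 1, matching the insert and delete examples listed earlier. The extra memory drops from (m + 1) * (n + 1) integers to 2 * (n + 1).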
