Finding the longest repeated substring with a suffix array

Source: Internet
Author: User

Problem description
Find the longest repeated substring of a string.
Example: abcdabcd
The longest repeated substring is abcd. The two occurrences of the longest repeated substring may overlap.
For example, in abcdabcda the longest repeated substring is abcda, and the two occurrences overlap at the middle a.

The intuitive solution is to first check the substrings of length n-1. If none of them is repeated, check length n-2, and keep decreasing the length until it reaches 1.
The time complexity of this method is on the order of O(n^3), which comes from three factors: the candidate lengths, the number of substrings of each length, and the check of whether a given substring is repeated.
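
As a rough illustration of this brute-force idea, the following C sketch tries lengths from n-1 down to 1 and, for each length, compares every pair of start positions. The function name `brute_force_lrs` and its out-parameter are assumptions made for this example and do not come from the original text. Since every `memcmp` call is itself linear in the length, this direct version is only practical for short inputs.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Brute force: try lengths n-1, n-2, ..., 1 and stop at the first length
 * for which two different start positions hold the same substring.
 * Returns the length found (0 if nothing repeats) and stores the start
 * of the first occurrence in *start. Hypothetical helper for illustration. */
size_t brute_force_lrs(const char *s, size_t *start)
{
    assert(s != NULL && start != NULL);
    size_t n = strlen(s);
    *start = 0;
    if (n < 2)
        return 0;
    for (size_t len = n - 1; len >= 1; --len) {
        /* i and j are the start positions of the two candidate occurrences */
        for (size_t i = 0; i + len <= n; ++i) {
            for (size_t j = i + 1; j + len <= n; ++j) {
                if (memcmp(s + i, s + j, len) == 0) {
                    *start = i;
                    return len; /* the two occurrences may overlap */
                }
            }
        }
    }
    return 0;
}
```

Because the lengths are tried from longest to shortest, the first match found is already a longest repeated substring, so the function can return immediately.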

The improved method is to use a suffix array.
A suffix array is a data structure built by collecting all suffixes of a string and then sorting them. After sorting, scan the array in order and compute the common prefix of each pair of adjacent suffixes.
The time complexity is: O(n) to generate the suffix array and O(n log n * n) to sort it, where the extra factor n appears because each string comparison is itself O(n).
Checking the adjacent suffixes in sequence takes O(n * n). The total time complexity is therefore O(n^2 log n), which is better than the O(n^3) of the first method.

For problems like finding the longest repeated substring in a given text, a suffix array completes the task efficiently. The suffix array uses the text itself plus n additional pointers (an array of pointers into the text array) to represent every substring of the n input characters.
First, as a baseline, if the input string is stored in c[0..n-1], the following code compares every pair of suffixes:

int main(void)
{
    int i, j, thislen, maxlen = -1;
    ..................
    for (i = 0; i < n; ++i) {
        for (j = i + 1; j < n; ++j) {
            if ((thislen = comlen(&c[i], &c[j])) > maxlen) {
                maxlen = thislen;
                maxi = i;
                maxj = j;
            }
        }
    }
    ..................
    return 0;
}

The comlen function returns the length of the common prefix of its two string arguments, counting from their first characters:

int comlen(char *p, char *q)
{
    int i = 0;
    while (*p && (*p++ == *q++))
        ++i;
    return i;
}

Because this algorithm examines every pair of suffixes, its running time is at least proportional to the square of n. The following describes how to use the suffix array instead.
If the program processes at most MAXCHAR characters, these characters are stored in the array c:

#define MAXCHAR 5000 // process a maximum of 5000 characters
char c[MAXCHAR], *a[MAXCHAR];

While reading the input, initialize a so that each element points to the corresponding character of the input string:

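n = 0;
// ch must be an int so that EOF can be detected; stop before the buffer is full
while (n < MAXCHAR - 1 && (ch = getchar()) != '\n' && ch != EOF) {
    a[n] = &c[n];
    c[n++] = ch;
}
c[n] = '\0'; // set the last element of c to the null character to terminate all strings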

In this way, element a[0] points to the entire string, the next element points to the suffix starting at the second character, and so on. If the input string is "banana", the array represents these suffixes:
a[0]: banana
a[1]: anana
a[2]: nana
a[3]: ana
a[4]: na
a[5]: a
Since the pointers in array a together point to every suffix of the string, a is called a "suffix array".

Second, sort the suffix array with qsort so that suffixes sharing a common prefix become adjacent.
After qsort(a, n, sizeof(char *), pstrcmp):
a[0]: a
a[1]: ana
a[2]: anana
a[3]: banana
a[4]: na
a[5]: nana
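
To make the sorting step concrete, here is a small C sketch that fills a pointer array with the suffixes of a string and sorts it with `qsort`. The comparator reuses the `pstrcmp` name from the call above; the helper `sort_suffixes` and its interface are assumptions introduced for this example.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* qsort passes pointers to the array elements, and each element is itself
 * a pointer to a suffix, so dereference once before calling strcmp. */
static int pstrcmp(const void *p1, const void *p2)
{
    return strcmp(*(const char *const *)p1, *(const char *const *)p2);
}

/* Hypothetical helper: store a pointer to every suffix of s in a[0..n-1]
 * and sort the pointers lexicographically. Returns n = strlen(s).
 * The caller must provide room for at least strlen(s) pointers. */
size_t sort_suffixes(const char *s, const char **a)
{
    assert(s != NULL && a != NULL);
    size_t n = strlen(s);
    for (size_t i = 0; i < n; ++i)
        a[i] = s + i;           /* a[i] is the suffix starting at index i */
    qsort(a, n, sizeof a[0], pstrcmp);
    return n;
}
```

For the input "banana", a[0] ends up pointing at "a" and a[5] at "nana". Only the pointers are moved during the sort; the text itself is never copied.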
Third, use the comlen function shown earlier to scan the array and compare adjacent elements to find the longest repeated substring:

for (i = 0; i < n - 1; ++i) {
    temp = comlen(a[i], a[i + 1]);
    if (temp > maxlen) {
        maxlen = temp;
        maxi = i;
    }
}
printf("%.*s\n", maxlen, a[maxi]);

The complete implementation code is as follows:

#include <cstdio>
#include <cstdlib>
#include <cstring>

#define MAXCHAR 5000 // process a maximum of 5000 characters

char c[MAXCHAR], *a[MAXCHAR];

// Length of the common prefix of p and q.
int comlen(char *p, char *q)
{
    int i = 0;
    while (*p && (*p++ == *q++))
        ++i;
    return i;
}

// qsort comparator: the elements are char *, so dereference once before strcmp.
int pstrcmp(const void *p1, const void *p2)
{
    return strcmp(*(char * const *)p1, *(char * const *)p2);
}

int main(void)
{
    int ch; // int, not char, so that EOF can be detected
    int n = 0;
    int i, temp;
    int maxlen = 0, maxi = 0;

    printf("Please input your string:\n");
    while (n < MAXCHAR - 1 && (ch = getchar()) != '\n' && ch != EOF) {
        a[n] = &c[n];
        c[n++] = ch;
    }
    c[n] = '\0'; // set the last element of c to the null character to terminate all strings

    qsort(a, n, sizeof(char *), pstrcmp);
    for (i = 0; i < n - 1; ++i) {
        temp = comlen(a[i], a[i + 1]);
        if (temp > maxlen) {
            maxlen = temp;
            maxi = i;
        }
    }
    // Print an empty line when nothing repeats (a[maxi] may be unset for empty input).
    printf("%.*s\n", maxlen, maxlen > 0 ? a[maxi] : "");
    return 0;
}
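
The program above works on a fixed-size global buffer. As a variation, the same three steps can be wrapped in a function that allocates the pointer array on the heap and reports the result through an out-parameter. This is only a sketch: the name `longest_repeated`, the helpers `cmp_suffix` and `common_len`, and the interface are assumptions for this example, and error handling is limited to a failed `malloc`.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

static int cmp_suffix(const void *p1, const void *p2)
{
    return strcmp(*(const char *const *)p1, *(const char *const *)p2);
}

/* Length of the common prefix of p and q. */
static size_t common_len(const char *p, const char *q)
{
    size_t i = 0;
    while (p[i] != '\0' && p[i] == q[i])
        ++i;
    return i;
}

/* Hypothetical wrapper: returns a pointer into s where a longest repeated
 * substring starts and stores its length in *len. Returns NULL with
 * *len == 0 if no substring repeats or if memory cannot be allocated. */
const char *longest_repeated(const char *s, size_t *len)
{
    assert(s != NULL && len != NULL);
    size_t n = strlen(s);
    const char *best = NULL;
    *len = 0;
    if (n < 2)
        return NULL;
    const char **a = malloc(n * sizeof *a);
    if (a == NULL)
        return NULL;
    for (size_t i = 0; i < n; ++i)       /* step 1: build the suffix array */
        a[i] = s + i;
    qsort(a, n, sizeof *a, cmp_suffix);  /* step 2: sort the suffixes */
    for (size_t i = 0; i + 1 < n; ++i) { /* step 3: compare neighbours */
        size_t k = common_len(a[i], a[i + 1]);
        if (k > *len) {
            *len = k;
            best = a[i];
        }
    }
    free(a);
    return best;
}
```

The result can be printed the same way as in the program, for example with `printf("%.*s\n", (int)len, p)` once `p` is known to be non-NULL.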

Method 2: KMP
The next array of KMP can also be used to find the longest repeated substring: compute the next array for every suffix of the string and take the largest value. The time complexity is somewhat high, O(n^2).

#include <iostream>
#include <cstring>
using namespace std;

const int MAXN = 100000;
int nxt[MAXN]; // named nxt because a global "next" clashes with std::next
char str[MAXN];

// Compute the (optimized) KMP next array of t, including the entry at index strlen(t).
void getnext(char *t)
{
    int len = strlen(t);
    nxt[0] = -1;
    int i = 0, j = -1;
    while (i < len) {
        if (j == -1 || t[i] == t[j]) {
            i++;
            j++;
            if (t[i] != t[j])
                nxt[i] = j;
            else
                nxt[i] = nxt[j];
        } else
            j = nxt[j];
    }
}

int main(void)
{
    int i, j, index = 0, len = 0;
    cout << "Please input your string:" << endl;
    cin >> str;
    char *s = str;
    for (i = 0; *s != '\0'; s++, ++i) {
        getnext(s);
        int slen = strlen(s);
        for (j = 1; j <= slen; ++j) {
            if (nxt[j] > len) {
                len = nxt[j];
                index = i + j; // index is one past the end of a repeated occurrence in str
            }
        }
    }
    if (len > 0) {
        for (i = index - len; i < index; ++i)
            cout << str[i];
        cout << endl;
    } else
        cout << "NONE" << endl;
    return 0;
}
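
Why does the next array reveal repeated substrings? For a string t, the border value at a position is the length of the longest proper prefix of the part read so far that is also a suffix of it, so that prefix occurs at least twice. Taking the maximum over all suffixes of the input therefore gives the longest repeated substring. The C sketch below uses the plain (unoptimized) prefix function rather than the optimized next array of the program above; the function name `lrs_prefix_function` and its interface are assumptions for illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns the length of a longest repeated substring
 * of s and stores the start of one of its occurrences in *start.
 * Runs the KMP prefix function once per suffix, so it is O(n^2) overall.
 * Returns 0 if nothing repeats or if memory cannot be allocated. */
size_t lrs_prefix_function(const char *s, size_t *start)
{
    assert(s != NULL && start != NULL);
    size_t n = strlen(s), best = 0;
    *start = 0;
    if (n < 2)
        return 0;
    size_t *pi = malloc(n * sizeof *pi);
    if (pi == NULL)
        return 0;
    for (size_t off = 0; off < n; ++off) {
        const char *t = s + off;       /* current suffix */
        size_t m = n - off;
        pi[0] = 0;
        for (size_t i = 1; i < m; ++i) {
            size_t k = pi[i - 1];
            while (k > 0 && t[i] != t[k])
                k = pi[k - 1];         /* fall back to a shorter border */
            if (t[i] == t[k])
                ++k;
            pi[i] = k;                 /* t[0..k-1] equals t[i-k+1..i] */
            if (k > best) {
                best = k;
                *start = off;          /* the prefix of this suffix repeats */
            }
        }
    }
    free(pi);
    return best;
}
```

Unlike the suffix-array method, this version needs no sorting, but it recomputes a prefix table for every suffix, which is where the quadratic cost comes from.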

Related problem: find the longest substring without repeated characters. For example, for abcdefgegcsgcasse the longest such substring is abcdefg, and its length is 7.

#include <iostream>
using namespace std;

// Idea: cnt records how many times each character occurs in the current window str[i..j].
// i and j traverse the whole string. If str[j] has not appeared in the window, its count becomes 1.
// If it has already appeared, move i to the position just after the earlier occurrence,
// decrementing the counts of the characters that leave the window. Continue until '\0'.
// The input is assumed to contain only the lowercase letters 'a'..'z'.
int find(char str[], char *output)
{
    int i = 0, j = 0;
    int cnt[26] = {0};
    int res = 0, temp = 0;
    char *out = output;
    int start = 0; // start of the best window found so far
    while (str[j] != '\0') {
        if (cnt[str[j] - 'a'] == 0) {
            cnt[str[j] - 'a']++;
        } else {
            cnt[str[j] - 'a']++;
            while (str[i] != str[j]) {
                cnt[str[i] - 'a']--;
                i++;
            }
            cnt[str[i] - 'a']--;
            i++;
        }
        j++;
        temp = j - i;
        if (temp > res) {
            res = temp;
            start = i;
        }
    }
    // Save the result in output
    for (i = 0; i < res; ++i)
        *out++ = str[start++];
    *out = '\0';
    return res;
}

int main(void)
{
    char a[] = "abcdefgegcsgcasse";
    char b[100];
    int len = find(a, b);
    cout << b << endl;
    cout << len << endl;
    return 0;
}
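
The counting version above only handles the lowercase letters a to z. A common alternative, sketched here in C under the assumption that the input is a plain byte string, remembers where each byte value was last seen and moves the window start directly past a repeated character. The name `longest_unique` and its signature are hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns the length of the longest substring of s
 * with no repeated character and stores its start index in *start. */
size_t longest_unique(const char *s, size_t *start)
{
    assert(s != NULL && start != NULL);
    size_t next_pos[UCHAR_MAX + 1]; /* 1 + index of last occurrence; 0 = unseen */
    memset(next_pos, 0, sizeof next_pos);
    size_t best = 0, lo = 0;        /* the current window is s[lo..j] */
    *start = 0;
    for (size_t j = 0; s[j] != '\0'; ++j) {
        unsigned char ch = (unsigned char)s[j];
        if (next_pos[ch] > lo)
            lo = next_pos[ch];      /* skip past the earlier occurrence */
        next_pos[ch] = j + 1;
        if (j + 1 - lo > best) {
            best = j + 1 - lo;
            *start = lo;
        }
    }
    return best;
}
```

Each character is visited once, so the scan is linear in the length of the string, and the table lookup replaces the inner loop that advances i one character at a time.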
