SPOJ REPEATS - Repeats (suffix array [consecutive repeating substring with the most repetitions])

Last Update: 2018-07-26 Source: Internet

Author: User

Tags sort time limit


SPOJ REPEATS - Repeats

Time limit: 1985 ms    Memory limit: 1536 MB

Problem Description

A string s is called a (k,l)-repeat if s is obtained by concatenating k >= 1 times some seed string t with length l >= 1. For example, the string

s = abaabaabaaba

is a (4,3)-repeat with t = aba as its seed string. That is, the seed string t is 3 characters long, and the whole string s is obtained by repeating t 4 times.

Write a program for the following task: your program is given a long string u consisting of the characters 'a' and/or 'b' as input. Your program must find some (k,l)-repeat that occurs as a substring within u with k as large as possible. For example, the input string

u = babbabaabaabaabab

contains the (4,3)-repeat s = abaabaabaaba starting at position 5. Since u contains no other contiguous substring with more than 4 repeats, your program must output the maximum k, which is 4.

Input

The first line of the input contains h, the number of test cases (h <= 20). h test cases follow. The first line of each test case is n, the length of the input string (n <= 50000); the next n lines contain the input string, one character (either 'a' or 'b') per line, in order.

Output

For each test case, output exactly one integer k on its own line: the maximum repeat count.
Sample Input
1
17
b
a
b
b
a
b
a
a
b
a
a
b
a
a
b
a
b
Sample Output
4
Hint

A (4,3)-repeat is found starting at the 5th character of the input string.
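To make the definition of a (k,l)-repeat concrete, here is a minimal brute-force sketch (the function name maxRepeatsBrute is hypothetical, not part of the original solution). It tries every start position and every seed length l and counts how many consecutive copies of that seed follow. At roughly O(n^3) it is only meant for cross-checking a fast solution on small strings, not for n = 50000.

```cpp
#include <algorithm>
#include <cassert>
#include <string>

// Hypothetical reference solver for tiny inputs: for every start position and
// seed length l, count consecutive copies of s.substr(start, l).
int maxRepeatsBrute(const std::string& s) {
    int n = static_cast<int>(s.size());
    int best = n > 0 ? 1 : 0;              // any single character is a (1,1)-repeat
    for (int start = 0; start < n; ++start) {
        for (int l = 1; start + 2 * l <= n; ++l) {
            int k = 1;                     // the seed itself is one copy
            // compare(pos1, count1, str, pos2, count2) == 0 means the next
            // block of length l equals the seed
            while (start + (k + 1) * l <= n &&
                   s.compare(start + k * l, l, s, start, l) == 0)
                ++k;
            best = std::max(best, k);
        }
    }
    return best;
}
```

On the sample string "babbabaabaabaabab" this returns 4, matching the expected output.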
Problem Solving Ideas:

Problem summary

Given a string, find the largest number of times any substring repeats consecutively.

Type
suffix array [consecutive repeating substring with the most repetitions]
Analysis

This is a straightforward suffix array problem.

Solution to "the consecutive repeating substring with the most repetitions" (from Luo Suiqian's national training team paper):

First enumerate the length L, then find how many times a substring of length L can occur consecutively. One occurrence is always possible, so only the case of at least 2 occurrences needs to be considered. Suppose some substring of length L occurs twice in a row in the original string, and call the two copies together S. Then S must contain two adjacent characters among r[0], r[L], r[2L], r[3L], .... So we only need to look at how far the characters r[L*i] and r[L*(i+1)] can be matched to the left and to the right; call this total length K. Then the substring occurs K/L + 1 times consecutively here (integer division). Finally take the maximum over all L and i, as shown in the figure.
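The idea above can be sketched directly, assuming plain character-by-character matching in place of the suffix-array LCP queries the post uses later. This is a simplification for illustration only: its worst case (for example a string of all 'a') is slower than O(n log n). The function name maxRepeatsAnchors is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <string>

// Anchor idea with naive matching (an assumption made to keep it short):
// for each seed length L and each anchor pair s[a], s[a+L], count the
// positions p with s[p] == s[p+L] to the right of a (including a) and to the
// left of a. If K such positions exist in total, the run holds K / L + 1 copies.
int maxRepeatsAnchors(const std::string& s) {
    int n = static_cast<int>(s.size());
    int best = n > 0 ? 1 : 0;
    for (int L = 1; L <= n; ++L) {
        for (int a = 0; a + L < n; a += L) {   // anchors a and a + L
            int right = 0;                      // matches at offsets 0, 1, 2, ...
            while (a + L + right < n && s[a + right] == s[a + L + right]) ++right;
            int left = 0;                       // matches at offsets -1, -2, ...
            while (a - left - 1 >= 0 && s[a - left - 1] == s[a + L - left - 1]) ++left;
            int K = left + right;               // total matched length
            best = std::max(best, K / L + 1);
        }
    }
    return best;
}
```

Every value computed is the copy count of a real run, and every run of at least two copies covers some anchor pair, so the maximum is exact even though the matching itself is naive.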

Enumerating the length L takes n iterations, and for each L there are about n/L anchor pairs to check. Therefore the time complexity of the whole procedure is O(n/1 + n/2 + n/3 + ... + n/n) = O(n log n).
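The harmonic-sum bound can be written out as follows, assuming the anchor loop of the source code below (j = 0, L, 2L, ... while j + L < n), which visits exactly floor((n-1)/L) anchors for each L:

```latex
\sum_{L=1}^{n} \left\lfloor \frac{n-1}{L} \right\rfloor
  \;\le\; n \sum_{L=1}^{n} \frac{1}{L}
  \;=\; n H_n
  \;\le\; n\,(1 + \ln n)
  \;=\; O(n \log n)
```

Each anchor costs two LCP queries, which are O(1) after the sparse-table preprocessing, so the main loop stays within the same O(n log n) as the suffix array construction.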

PS: The basic idea in Luo Suiqian's paper is already fairly clear, so here I only expand on the parts the paper leaves vague.

There are two points to discuss. The first is fairly obvious:

"S must contain two adjacent characters among r[0], r[L], r[2L], r[3L], ..."

Since S is the concatenation of two consecutive copies of a substring of length L, s[i] and s[i+L] (0 ≤ i < L) must be the same character,

and these two positions differ by exactly L.

The sampled characters r[0], r[L], r[2L], r[3L], ... are also spaced exactly L apart. Because S has length 2L, it always covers two adjacent sampled positions r[L*i] and r[L*(i+1)], and these are exactly some pair s[j], s[j+L] of S, so they hold the same character.
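The claim that a window of length 2L always covers two adjacent sampled positions can be checked exhaustively for small values. The helper below (containsAdjacentAnchors is a hypothetical name) scans the multiples x*L and reports whether some x*L and (x+1)*L both fall inside [s, s + 2L - 1].

```cpp
#include <cassert>

// Hypothetical helper: does the window [s, s + 2L - 1], i.e. a copy of S
// starting at position s, contain two adjacent sampled positions x*L and (x+1)*L?
bool containsAdjacentAnchors(int s, int L) {
    assert(s >= 0 && L >= 1);
    for (int x = 0; (x + 1) * L <= s + 2 * L - 1; ++x)
        if (x * L >= s) return true;   // both x*L and (x+1)*L lie in the window
    return false;
}
```

Looping s over 0..100 and L over 1..20 finds no counterexample, which matches the argument: the first multiple of L at or after s is at most s + L - 1, so the next one is at most s + 2L - 1.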

The second point: "just look at how far the characters r[L*i] and r[L*(i+1)] can be matched to the left and to the right." How far they match to the right follows directly from the longest common prefix; in the figure (L = 3) it is the longest common prefix of Suffix(6) and Suffix(9). For how far they match to the left, we could of course append the reversed string to the original at the start, so that the leftward match also becomes a longest common prefix query, but this is less efficient.
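Any rightward match length is a single LCP query once a suffix array, its height array, and a range-minimum structure are available. The sketch below shows one way to build such a query. Assumptions: LcpOracle is a hypothetical name, and it uses a simpler O(n log^2 n) std::sort prefix-doubling construction plus Kasai's algorithm rather than the radix-sort version in the source code below.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: LCP of any two suffixes in O(1) after preprocessing.
struct LcpOracle {
    int n;
    std::vector<int> sa, rnk, height;       // height[i] = LCP(sa[i-1], sa[i]), height[0] = 0
    std::vector<std::vector<int>> table;    // table[k][i] = min(height[i .. i + 2^k - 1])

    explicit LcpOracle(const std::string& s)
        : n(static_cast<int>(s.size())), sa(n), rnk(n), height(n, 0) {
        assert(n >= 1);
        // Prefix doubling: sort suffixes by (rank of first len chars, rank of next len chars).
        for (int i = 0; i < n; ++i) { sa[i] = i; rnk[i] = static_cast<unsigned char>(s[i]); }
        std::vector<int> tmp(n);
        for (int len = 1;; len <<= 1) {
            auto key = [&](int i) {
                return std::make_pair(rnk[i], i + len < n ? rnk[i + len] : -1);
            };
            std::sort(sa.begin(), sa.end(), [&](int a, int b) { return key(a) < key(b); });
            tmp[sa[0]] = 0;
            for (int i = 1; i < n; ++i)
                tmp[sa[i]] = tmp[sa[i - 1]] + (key(sa[i - 1]) < key(sa[i]) ? 1 : 0);
            rnk = tmp;
            if (rnk[sa[n - 1]] == n - 1) break;  // all ranks distinct: fully sorted
        }
        // Kasai: the LCP with the preceding suffix drops by at most 1 per step.
        for (int i = 0, h = 0; i < n; ++i) {
            if (rnk[i] == 0) { h = 0; continue; }
            int j = sa[rnk[i] - 1];
            while (i + h < n && j + h < n && s[i + h] == s[j + h]) ++h;
            height[rnk[i]] = h;
            if (h > 0) --h;
        }
        // Sparse table for range-minimum queries over height.
        int levels = 1;
        while ((1 << levels) <= n) ++levels;
        table.assign(levels, std::vector<int>(n, 0));
        table[0] = height;
        for (int k = 1; k < levels; ++k)
            for (int i = 0; i + (1 << k) <= n; ++i)
                table[k][i] = std::min(table[k - 1][i], table[k - 1][i + (1 << (k - 1))]);
    }

    // Length of the longest common prefix of the suffixes starting at a and b.
    int query(int a, int b) const {
        assert(a != b && 0 <= a && a < n && 0 <= b && b < n);
        int lo = std::min(rnk[a], rnk[b]) + 1, hi = std::max(rnk[a], rnk[b]);
        int k = 0;
        while ((1 << (k + 1)) <= hi - lo + 1) ++k;   // k = floor(log2(hi - lo + 1))
        return std::min(table[k][lo], table[k][hi - (1 << k) + 1]);
    }
};
```

For example, on the sample string query(4, 7) returns 9: the suffixes "abaabaabaabab" and "abaabaabab" agree on their first nine characters.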

In fact, when the enumerated repeat length is L, a run that covers the anchors r[L*j] and r[L*(j+1)] has r[L*j] inside its first copy and r[L*(j+1)] inside its second copy. If r[L*j] happens to be the first character of a copy, the answer is simply the common prefix length K divided by L, rounded down, plus one. If r[L*j] is not the first character of a copy, that result may be too small, because a few characters before r[L*j] can also belong to the first copy.
So first compute what is left over: starting from r[L*j], after the K/L whole copies are matched, K%L characters remain. If the L-K%L characters just before r[L*j] also match, these extra characters together with the K%L leftover characters complete one more copy. So we only need to check whether the suffixes starting at r[L*j-(L-K%L)] and r[L*(j+1)-(L-K%L)] match for at least L-K%L characters, that is, whether the longest common prefix of these two suffixes is at least L-K%L (this requires L*j-(L-K%L) ≥ 0).
Of course, if that common prefix is at least L-K%L, it is automatically at least L, because the K characters after it are already known to match and (L-K%L) + K ≥ L. So, for simplicity, the program just checks whether it is at least L.
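The shift trick can be sketched as follows. To keep the example self-contained it assumes a naive character-comparison LCP (naiveLcp, a hypothetical helper) in place of the suffix-array query; otherwise the loop body mirrors the main loop of the source code below and uses only rightward LCPs, with no reversed copy of the string.

```cpp
#include <algorithm>
#include <cassert>
#include <string>

// Hypothetical stand-in for the suffix-array LCP query: direct comparison.
int naiveLcp(const std::string& s, int a, int b) {
    int n = static_cast<int>(s.size()), h = 0;
    while (a + h < n && b + h < n && s[a + h] == s[b + h]) ++h;
    return h;
}

// Anchor loop with the "borrow L - K%L characters on the left" check.
int maxRepeatsShift(const std::string& s) {
    int n = static_cast<int>(s.size());
    int best = n > 0 ? 1 : 0;
    for (int L = 1; L <= n; ++L) {
        for (int j = 0; j + L < n; j += L) {     // anchors s[j] and s[j+L]
            int K = naiveLcp(s, j, j + L);        // rightward match length
            int reps = K / L + 1;                 // copies if s[j] starts a copy
            int start = j - (L - K % L);          // shift left so the K%L leftover
                                                  // characters can complete a copy
            if (start >= 0 && naiveLcp(s, start, start + L) >= L) ++reps;
            best = std::max(best, reps);
        }
    }
    return best;
}
```

On "aababab", the anchor j = 2 with L = 2 alone gives 2 copies; the shifted check from position 1 raises it to 3, the run "ababab".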

This part is a little hard to understand; if anything is unclear, questions are welcome.

Time Complexity and Optimization
O(n log n)

Problem link → SPOJ REPEATS - Repeats

Source Code

/* Sherlock and Watson and Adler */
#include <cstdio>
#include <cmath>
#include <algorithm>
using namespace std;

const int MAXN = 50005;

// sa[0..n] lists suffixes in sorted order; sa[0] is the sentinel suffix (the smallest).
// rnk[i] is the position of suffix i in sa (1..n for real suffixes).
// height[i] (i >= 1) is the LCP of suffixes sa[i-1] and sa[i].
// Prefix-doubling construction, O(n log n).
int wa[MAXN], wb[MAXN], wv[MAXN], ws_[MAXN];

// r[0..n-1] is the string to sort; r[n-1] = 0 is a unique sentinel and every
// other r[i] > 0. m is the radix-sort range: every r[i] < m (128 is enough for
// raw letters; for integer input use the maximum value + 1).
void suffix(int *r, int *sa, int n, int m) {
    int i, j, p, *x = wa, *y = wb, *t;
    // Radix sort by the first character (the alphabet is small here;
    // for large values, switch to a comparison sort).
    for (i = 0; i < m; ++i) ws_[i] = 0;
    for (i = 0; i < n; ++i) ws_[x[i] = r[i]]++;        // count each character
    for (i = 1; i < m; ++i) ws_[i] += ws_[i - 1];      // count characters <= i
    for (i = n - 1; i >= 0; --i) sa[--ws_[x[i]]] = i;  // place suffixes by rank
    // x[i] is the rank of suffix i by its first j characters.
    for (j = 1, p = 1; p < n; j *= 2, m = p) {
        // Second key: suffixes whose second half is empty (i >= n-j) come first,
        // then sa[i]-j in the order of its second half sa[i].
        for (p = 0, i = n - j; i < n; ++i) y[p++] = i;
        for (i = 0; i < n; ++i) if (sa[i] >= j) y[p++] = sa[i] - j;
        // Stable radix sort by the first key.
        for (i = 0; i < n; ++i) wv[i] = x[y[i]];
        for (i = 0; i < m; ++i) ws_[i] = 0;
        for (i = 0; i < n; ++i) ws_[wv[i]]++;
        for (i = 1; i < m; ++i) ws_[i] += ws_[i - 1];
        for (i = n - 1; i >= 0; --i) sa[--ws_[wv[i]]] = y[i];
        // Ranks for length 2j: suffixes with equal key pairs share a rank.
        t = x; x = y; y = t;
        for (x[sa[0]] = 0, i = p = 1; i < n; ++i)
            x[sa[i]] = (y[sa[i - 1]] == y[sa[i]] && y[sa[i - 1] + j] == y[sa[i] + j]) ? p - 1 : p++;
    }
}

// 'rnk' rather than 'rank', which would clash with std::rank under using namespace std.
int rnk[MAXN], height[MAXN], sa[MAXN], r[MAXN];

// O(n) height computation; the sentinel r[n] = 0 stops every scan.
void calheight(int *r, int *sa, int n) {
    int i, j, k = 0;
    for (i = 1; i <= n; i++) rnk[sa[i]] = i;
    for (i = 0; i < n; height[rnk[i++]] = k)
        for (k ? k-- : 0, j = sa[rnk[i] - 1]; r[i + k] == r[j + k]; k++);
}

int n, minnum[MAXN][16];

// Sparse table over height[1..n]: minnum[i][j] = min(height[i .. i + 2^j - 1]). O(n log n).
void RMQ() {
    int i, j;
    int m = (int)(log(n * 1.0) / log(2.0));
    for (i = 1; i <= n; i++) minnum[i][0] = height[i];
    for (j = 1; j <= m; j++)
        for (i = 1; i + (1 << j) - 1 <= n; i++)
            minnum[i][j] = min(minnum[i][j - 1], minnum[i + (1 << (j - 1))][j - 1]);
}

// Minimum of height[a..b] in O(1).
int ask_min(int a, int b) {
    int k = (int)(log(b - a + 1.0) / log(2.0));
    return min(minnum[a][k], minnum[b - (1 << k) + 1][k]);
}

// LCP of the suffixes starting at positions a and b (a != b).
int calprefix(int a, int b) {
    a = rnk[a], b = rnk[b];
    if (a > b) swap(a, b);
    return ask_min(a + 1, b);
}

char s[5];

int main() {
    int t, i, j, k, ans, best;
    scanf("%d", &t);
    while (t--) {
        best = 1;                             // one occurrence always exists
        scanf("%d", &n);
        for (i = 0; i < n; i++) {
            scanf("%s", s);
            r[i] = s[0] - 'a' + 1;            // 'a' -> 1, 'b' -> 2
        }
        r[n] = 0;                             // sentinel after the string
        suffix(r, sa, n + 1, 3);
        calheight(r, sa, n);
        RMQ();
        for (i = 1; i <= n; i++) {            // i is the seed length L
            for (j = 0; j + i < n; j += i) {  // anchors r[j] and r[j+i]
                ans = calprefix(j, j + i);    // K: rightward match length
                k = j - (i - ans % i);        // borrow i - K%i characters on the left
                ans = ans / i + 1;
                if (k >= 0 && calprefix(k, k + i) >= i) ans++;
                best = max(best, ans);
            }
        }
        printf("%d\n", best);
    }
    return 0;
}

Rookie growth notes

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only.