hiho Week 123: Suffix Array 4 • Repeated Melody

Source: Internet
Author: User

Suffix Array 4 • Repeated Melody 4

Time limit: 5000 ms
Single-test time limit: 1000 ms
Memory limit: 256 MB

Description

One of Little Hi's big hobbies is playing the piano. A musical melody can be represented as a sequence of N numbers. Little Hi has practiced many pieces and found that many of the melodies in them contain repeated parts.

We call a melody (k,l)-repeating if it consists of some string of length l repeated k times. For example, the melody abaabaabaaba is (4,3)-repeating, because it is aba repeated 4 times.
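To make the definition concrete, here is a minimal sketch of a checker. The function name isRepeating is a hypothetical helper introduced for illustration; it is not part of the original problem or solution.

```cpp
#include <string>

// Hypothetical helper: returns true if s is exactly k copies of its first l characters.
bool isRepeating(const std::string& s, int k, int l) {
    if (k <= 0 || l <= 0) return false;
    if (static_cast<long long>(k) * l != static_cast<long long>(s.size())) return false;
    for (std::size_t i = l; i < s.size(); ++i) {
        // Every character must equal the one l places earlier.
        if (s[i] != s[i - l]) return false;
    }
    return true;
}
```

Under this definition, isRepeating("abaabaabaaba", 4, 3) holds, while the same string is not (3,4)-repeating.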

Little Hi wants to find the (k,l)-repeating melody with the largest k within a piece.

Input

A single line containing a string of lowercase letters only. The length of the string does not exceed 100000.

Output

A single line containing one integer, the answer k.

Sample input
babbabaabaabaabab
Sample output
4
Hints:

Little Ho: How do we solve this problem?

Little Hi: Well, this time the task is to find the substring with the most consecutive repetitions.

Little Ho: That doesn't look easy.

Little Hi: Let's lower the difficulty first and consider how to find the maximum number of repetitions of a whole string.

Little Ho: Hmm. Take the string abababab, for example. It can be seen as (1,8) or as (2,4), and the largest is (4,2).

Little Hi: Yes. If we enumerate a candidate period length l (or, equivalently, k), can we quickly decide whether that l is valid?

Little Ho: Ah! I think we can take the LCP (longest common prefix) of the original string and the original string with its first l characters removed. If the shorter string is matched completely, then l is valid!

Little Hi: Yes, that's right. For abababab, to test whether it is (2,4), take the LCP of abababab and abab.
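The whole-string test can be sketched as follows. This uses a naive character-by-character LCP rather than a suffix array, and the function names are assumptions made for illustration.

```cpp
#include <string>

// Longest common prefix of s[a..] and s[b..], by direct comparison.
int naiveLcp(const std::string& s, int a, int b) {
    int n = static_cast<int>(s.size());
    int len = 0;
    while (a + len < n && b + len < n && s[a + len] == s[b + len]) ++len;
    return len;
}

// The whole string has period l when l divides n and the string matches
// itself shifted by l all the way to the end: LCP(s, s[l..]) == n - l.
bool wholeStringHasPeriod(const std::string& s, int l) {
    int n = static_cast<int>(s.size());
    if (l <= 0 || l > n || n % l != 0) return false;
    return naiveLcp(s, 0, l) == n - l;
}

// Largest k such that the whole string is (k, l)-repeating for some l.
int maxWholeRepeats(const std::string& s) {
    int n = static_cast<int>(s.size());
    for (int l = 1; l <= n; ++l) {
        if (wholeStringHasPeriod(s, l)) return n / l;  // smallest valid l gives largest k
    }
    return 0;  // only reached for the empty string
}
```

For abababab, periods 2, 4 and 8 pass the test, so maxWholeRepeats returns 4.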

Little Hi: It is worth mentioning that the height array lets us find the LCP we need quickly. For example, the height array for abababab is as follows (sa is the 1-based starting position of each suffix):

suffix      sa   height
ab          7    0
abab        5    2
ababab      3    4
abababab    1    6
b           8    0
bab         6    1
babab       4    3
bababab     2    5

Little Hi: To get the LCP of two suffixes, we only need the minimum of the height values between them in sorted order (excluding the first of the two and including the second). For example, the LCP of abababab and abab is the minimum of [4, 6], namely 4; the LCP of bab and bababab is the minimum of [3, 5], namely 3; and the LCP of ab and babab is the minimum of [2, 4, 6, 0, 1, 3], namely 0.
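The range-minimum rule can be illustrated with a small sketch. It builds the suffix array by sorting suffixes directly, which is only suitable for short examples, and it uses 0-based positions, unlike the 1-based table in the dialogue. The struct and function names are assumptions for illustration.

```cpp
#include <algorithm>
#include <numeric>
#include <string>
#include <utility>
#include <vector>

struct SuffixInfo {
    std::vector<int> sa;      // sa[r] = 0-based start of the suffix with rank r
    std::vector<int> rank;    // rank[i] = rank of the suffix starting at i
    std::vector<int> height;  // height[r] = LCP of suffixes sa[r-1] and sa[r]; height[0] = 0
};

// Naive construction: sort the suffixes directly, then compare neighbours.
SuffixInfo buildNaive(const std::string& s) {
    int n = static_cast<int>(s.size());
    SuffixInfo info;
    info.sa.resize(n);
    info.rank.resize(n);
    info.height.assign(n, 0);
    std::iota(info.sa.begin(), info.sa.end(), 0);
    std::sort(info.sa.begin(), info.sa.end(), [&s](int a, int b) {
        return s.compare(a, std::string::npos, s, b, std::string::npos) < 0;
    });
    for (int r = 0; r < n; ++r) info.rank[info.sa[r]] = r;
    for (int r = 1; r < n; ++r) {
        int a = info.sa[r - 1], b = info.sa[r], len = 0;
        while (a + len < n && b + len < n && s[a + len] == s[b + len]) ++len;
        info.height[r] = len;
    }
    return info;
}

// LCP of suffixes i and j = minimum of height over ranks (lo, hi].
int lcpByHeight(const SuffixInfo& info, int i, int j) {
    int n = static_cast<int>(info.sa.size());
    if (i == j) return n - i;
    int lo = info.rank[i], hi = info.rank[j];
    if (lo > hi) std::swap(lo, hi);
    return *std::min_element(info.height.begin() + lo + 1, info.height.begin() + hi + 1);
}
```

For abababab this gives the height values 0, 2, 4, 6, 0, 1, 3, 5 in sorted order, and lcpByHeight(info, 0, 4) (abababab against abab) is 4. The linear scan in lcpByHeight is what the RMQ structure discussed next is meant to replace.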

Little Hi: Finding the minimum over a section of the height array is exactly the [RMQ problem] mentioned previously. With O(n log n) preprocessing, each query can be answered in O(1). A data structure such as a segment tree also works, at O(log n) per query.
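One common way to get O(n log n) preprocessing and O(1) queries is a sparse table. The following is a minimal sketch under that assumption; the struct name and layout are illustrative and differ from the arrays used in the full program later in the post.

```cpp
#include <algorithm>
#include <vector>

struct SparseTableMin {
    std::vector<std::vector<int>> table;  // table[j][i] = min of a[i .. i + 2^j - 1]
    std::vector<int> lg;                  // lg[x] = floor(log2(x)) for x >= 1

    explicit SparseTableMin(const std::vector<int>& a) {
        int n = static_cast<int>(a.size());
        lg.assign(n + 1, 0);
        for (int x = 2; x <= n; ++x) lg[x] = lg[x / 2] + 1;
        table.push_back(a);
        for (int j = 1; (1 << j) <= n; ++j) {
            int half = 1 << (j - 1);
            std::vector<int> cur(n - (1 << j) + 1);
            for (int i = 0; i + (1 << j) <= n; ++i) {
                // A block of length 2^j is two overlapping-free blocks of length 2^(j-1).
                cur[i] = std::min(table[j - 1][i], table[j - 1][i + half]);
            }
            table.push_back(cur);
        }
    }

    // Minimum of a[lo..hi], inclusive, assuming 0 <= lo <= hi < n.
    int query(int lo, int hi) const {
        int j = lg[hi - lo + 1];
        return std::min(table[j][lo], table[j][hi - (1 << j) + 1]);
    }
};
```

Built over the height values 0, 2, 4, 6, 0, 1, 3, 5, the query over indices 2..3 returns 4 and the query over indices 1..6 returns 0, matching the examples in the dialogue.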

Little Ho: I get it. Back to the original problem: we first enumerate the l in (k,l), then enumerate the starting position i, and compute the LCP of suffix(i) and suffix(i+l), denoted LCP(l, i). Then k(l, i) equals LCP(l, i)/l + 1 (integer division). The maximum of k(l, i) over all period lengths l and starting positions i is the answer.
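The full enumeration described here can be sketched as a reference implementation. To stay short, it replaces the suffix-array LCP query with an O(n^2) dynamic-programming LCP table, so it is only suitable for small inputs; the function names are assumptions for illustration.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// lcp[i][j] = LCP of s[i..] and s[j..], filled from the back:
// lcp[i][j] = lcp[i+1][j+1] + 1 when s[i] == s[j], else 0. Uses O(n^2) memory.
std::vector<std::vector<int>> lcpTable(const std::string& s) {
    int n = static_cast<int>(s.size());
    std::vector<std::vector<int>> lcp(n + 1, std::vector<int>(n + 1, 0));
    for (int i = n - 1; i >= 0; --i)
        for (int j = n - 1; j >= 0; --j)
            if (s[i] == s[j]) lcp[i][j] = lcp[i + 1][j + 1] + 1;
    return lcp;
}

// Tries every period length l and every start i: k(l, i) = LCP(l, i) / l + 1.
int maxRepeatsBruteForce(const std::string& s) {
    int n = static_cast<int>(s.size());
    if (n == 0) return 0;
    std::vector<std::vector<int>> lcp = lcpTable(s);
    int best = 1;
    for (int l = 1; l < n; ++l)
        for (int i = 0; i + l < n; ++i)
            best = std::max(best, lcp[i][i + l] / l + 1);
    return best;
}
```

On the sample input babbabaabaabaabab this returns 4, found at l = 3 starting from the substring abaabaabaaba.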

Little Hi: You're right! But there is still room for optimization. For a fixed l, we do not need to enumerate every starting position i; we only enumerate the i that are multiples of l. If the optimal substring happens to start at a multiple of l, then the largest k we find is already the correct answer.

Little Ho: That makes sense. But what if the optimal substring does not start at a multiple of l?

Little Hi: Even then, things are not too bad. Suppose the optimal substring starts at position x. Let p be the nearest position after x that is a multiple of l. We do enumerate p, and we compute the LCP of suffix(p) and suffix(p+l), LCP(l, p), so at this point k(l, p) = LCP(l, p)/l + 1.

Little Hi: For the positions we skipped, k(l, p-1), k(l, p-2), ..., k(l, p-l+1), the upper bound is k(l, p) + 1.

Little Ho: That's right. Their starting positions are less than l away from p, so they can contain at most one more period than suffix(p) does.

Little Hi: Furthermore, if any of k(l, p-1), k(l, p-2), ..., k(l, p-l+1) equals k(l, p) + 1, then k(l, p - l + LCP(l, p) mod l) must equal k(l, p) + 1. (mod is the remainder operation.)

Little Ho: Why?

Little Hi: Take the string xaycdabcdabcd, with positions numbered from 1 (x and y each stand for an unspecified character; the actual characters affect the final answer, as we analyze below). When we consider l = 4, the first starting position we enumerate is p = 4, and we compute the LCP of cdabcdabcd and cdabcd: LCP(4, 4) = 6, so k(4, 4) = 2. By the claim above, k(4, 1), k(4, 2) and k(4, 3) can contain a 3 only if k(l, p - l + LCP(l, p) mod l) = k(4, 4 - 4 + 6 mod 4) = k(4, 2) = 3. First, k(4, 3) can never be 3: whatever character y is, the LCP of ycdabcdabcd and bcdabcd, LCP(4, 3), is at most 7, which is less than 8. Second, if k(4, 2) ≠ 3, then k(4, 1) cannot be 3 either. If k(4, 2) ≠ 3, then ay and ab do not match, so whatever character x is, xay and dab do not match, which gives LCP(4, 1) < l and k(4, 1) = 1.

Little Ho: Oh, I think I understand. Position p - l + LCP(l, p) mod l is a dividing line. Starting positions to its right cannot gain an extra period, because their LCP is not long enough. And if k(l, p - l + LCP(l, p) mod l) itself does not gain a period, there is a mismatch somewhere in the range [p - l + LCP(l, p) mod l, p], so the LCPs of the positions to its left are cut short as well, and they cannot gain a period either.
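The sampled enumeration can be sketched as below. As a stand-in for the O(1) suffix-array LCP query, this sketch uses an O(n^2) LCP table, so it demonstrates the logic rather than the final complexity. It uses 0-based positions, and the function names are assumptions for illustration.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// lcp[i][j] = LCP of s[i..] and s[j..]; a stand-in for the suffix-array query.
std::vector<std::vector<int>> lcpTable(const std::string& s) {
    int n = static_cast<int>(s.size());
    std::vector<std::vector<int>> lcp(n + 1, std::vector<int>(n + 1, 0));
    for (int i = n - 1; i >= 0; --i)
        for (int j = n - 1; j >= 0; --j)
            if (s[i] == s[j]) lcp[i][j] = lcp[i + 1][j + 1] + 1;
    return lcp;
}

int maxRepeatsSampled(const std::string& s) {
    int n = static_cast<int>(s.size());
    if (n == 0) return 0;
    std::vector<std::vector<int>> lcp = lcpTable(s);
    int best = 1;
    for (int l = 1; l < n; ++l) {
        for (int p = 0; p + l < n; p += l) {       // only starts that are multiples of l
            int r = lcp[p][p + l];
            best = std::max(best, r / l + 1);      // k(l, p)
            int q = p - l + r % l;                 // the one skipped start worth checking
            if (q >= 0) best = std::max(best, lcp[q][q + l] / l + 1);
        }
    }
    return best;
}
```

With y = b in the example, that is xabcdabcdabcd, the enumeration at l = 4 reaches the start of dabcdabcd, finds an LCP of 5, and the extra check one position to the left finds abcdabcdabcd with k = 3.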

Little hi: Yes!

Little Ho: What is the time complexity of enumerating l and then enumerating the starting positions?

Little Hi: For a fixed l, enumerating the starting positions takes O(n/l), so the total is O(n/1) + O(n/2) + O(n/3) + ... This is the classic harmonic sum, so the total complexity is O(n log n).
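As a quick sanity check of the harmonic-sum argument, the sketch below counts how many (l, p) pairs the sampled enumeration visits and compares the count with the bound n(2 + ln n), which follows from the fact that the sum of ceil(n/l) is at most n times the n-th harmonic number plus n. The function names are assumptions for illustration.

```cpp
#include <cmath>

// Number of (l, p) pairs visited when p runs over multiples of l below n.
long long sampledIterations(int n) {
    long long count = 0;
    for (int l = 1; l <= n; ++l)
        for (int p = 0; p < n; p += l) ++count;  // ceil(n / l) iterations for this l
    return count;
}

// Upper bound: sum of ceil(n/l) <= n * H_n + n <= n * (2 + ln n).
double harmonicBound(int n) {
    return n * (2.0 + std::log(static_cast<double>(n)));
}
```

For n = 8 the count is 8 + 4 + 3 + 2 + 2 + 2 + 2 + 1 = 24, and for n = 100000 it stays below the bound of roughly 1.35 million.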

Little Ho: I get it! That's neat: a seemingly simple idea, and the complexity is very low.

Little Hi: Yes. Here is a C++ implementation of the core loop:

for (int l = 1; l <= n; l++) {
    // Start positions advance in steps of l.
    for (int i = 1; i + l <= n; i += l) {
        int r = LCP(i, i + l);
        ans = max(ans, r / l + 1);
        // Also check the one skipped start that could gain an extra period.
        if (i >= l - r % l) {
            ans = max(LCP(i - l + r % l, i + r % l) / l + 1, ans);
        }
    }
}

Little Ho: OK. I'll go and implement it.

#include <cstdio>
#include <cstring>
#include <algorithm>
using namespace std;

const int N = 2e5 + 5;  // twice the maximum length, so r[a + l] in cmp stays in bounds
const int LOG = 18;     // 2^17 > 100000; the original constant was lost in extraction

int cmp(int *r, int a, int b, int l) {
    return r[a] == r[b] && r[a + l] == r[b + l];
}

int wa[N], wb[N], wss[N], wv[N];
int Rank[N];    // Rank[i]: rank of suffix i in sa[]
int height[N];  // height[i]: LCP of suffixes sa[i] and sa[i-1]
int sa[N];      // sa[i]: start index of the suffix with rank i

// Doubling algorithm. Here n is the input length plus 1: a sentinel 0 is
// appended so that cmp never runs out of bounds. m is the alphabet size.
void DA(int *r, int *sa, int n, int m) {
    int i, j, p, *x = wa, *y = wb, *t;
    for (i = 0; i < m; i++) wss[i] = 0;
    for (i = 0; i < n; i++) wss[x[i] = r[i]]++;
    for (i = 1; i < m; i++) wss[i] += wss[i - 1];
    for (i = n - 1; i >= 0; i--) sa[--wss[x[i]]] = i;  // order by first character
    for (j = 1, p = 1; p < n; j *= 2, m = p) {  // from the order for length j, get length 2*j
        for (p = 0, i = n - j; i < n; i++) y[p++] = i;  // suffixes with no second key
        for (i = 0; i < n; i++) if (sa[i] >= j) y[p++] = sa[i] - j;  // order by second key
        for (i = 0; i < n; i++) wv[i] = x[y[i]];
        for (i = 0; i < m; i++) wss[i] = 0;
        for (i = 0; i < n; i++) wss[wv[i]]++;
        for (i = 1; i < m; i++) wss[i] += wss[i - 1];
        for (i = n - 1; i >= 0; i--) sa[--wss[wv[i]]] = y[i];  // radix sort by first key
        for (t = x, x = y, y = t, p = 1, x[sa[0]] = 0, i = 1; i < n; i++)
            x[sa[i]] = cmp(y, sa[i - 1], sa[i], j) ? p - 1 : p++;  // equal keys share a rank
    }
}

// Here n is the actual string length. Valid height[] indices are 1..n;
// rank 0 belongs to the appended sentinel.
void calheight(int *r, int n) {
    int i, j, k = 0;
    for (i = 1; i <= n; i++) Rank[sa[i]] = i;
    for (i = 0; i < n; height[Rank[i++]] = k)  // uses h[i] >= h[i-1] - 1, h[i] = height[Rank[i]]
        for (k ? k-- : 0, j = sa[Rank[i] - 1]; r[i + k] == r[j + k]; k++);
}

int n;
char ss[N];
int aa[N];
int mn[N][LOG];
int Log22[N];

void pre() {  // integer floor(log2(i)), avoiding floating-point rounding
    Log22[1] = 0;
    for (int i = 2; i <= n; i++) Log22[i] = Log22[i / 2] + 1;
}

void rmq_init(int n, int *h) {
    for (int j = 1; j <= n; j++) mn[j][0] = h[j];
    int m = Log22[n];
    for (int i = 1; i <= m; i++)
        for (int j = n; j > 0; j--) {
            mn[j][i] = mn[j][i - 1];
            if (j + (1 << (i - 1)) <= n)
                mn[j][i] = min(mn[j][i], mn[j + (1 << (i - 1))][i - 1]);
        }
}

// LCP of the suffixes with ranks l and r (l != r): min of height[l+1..r].
int lcp_min(int l, int r) {
    if (l > r) swap(l, r);
    l++;  // by the definition of height, the range starts at l + 1
    int m = Log22[r - l + 1];
    return min(mn[l][m], mn[r - (1 << m) + 1][m]);
}

int solve() {
    int ans = 1;
    for (int L = 1; L < n; L++) {
        for (int j = 0; j + L < n; j += L) {
            int lcp_len = lcp_min(Rank[j], Rank[j + L]);
            ans = max(ans, lcp_len / L + 1);
            int last_possible_pos = j - (L - lcp_len % L);
            if (last_possible_pos >= 0)
                ans = max(ans, 1 + lcp_min(Rank[last_possible_pos], Rank[last_possible_pos + L]) / L);
        }
    }
    return ans;
}

int main() {
    if (scanf("%s", ss) != 1) return 0;
    n = strlen(ss);
    for (int i = 0; i < n; i++) aa[i] = ss[i] - 'a' + 1;
    aa[n] = 0;              // sentinel, smaller than every letter
    DA(aa, sa, n + 1, 27);  // codes are 0..26, so 27 buckets suffice
    calheight(aa, n);
    pre();
    rmq_init(n, height);
    printf("%d\n", solve());
    return 0;
}
