Hihocoder Suffix Automaton Three • Repeated Melody 6

Source: Internet
Author: User

Suffix Automaton Three • Repeated Melody 6

Time limit: 15000 ms
Single-point time limit: 3000 ms
Memory limit: 512 MB

Description

One of Little Hi's big hobbies is playing the piano. We know that a musical melody can be represented as a sequence of numbers.

Now Little Hi wants to know, among all melody segments of length K in a piece, how many times the most frequent one occurs. But K is not fixed: Little Hi wants the answer for every K.

Tips on how to solve problems

Input

One line containing a string S consisting of lowercase letters. The length of the string does not exceed 1000000.

Output

length(S) lines in total, one integer per line; the K-th line is the answer for length K.

Sample input
aab
Sample output
2
1
1

Little Hi: Last time we learned about the suffix automaton (SAM). Today we will use it to solve a problem.

Little Ho: Good! Let's get started!

Little Hi: For each K = 1..length(S), we want the number of occurrences of the most frequent substring of length K. Little Ho, do you have any ideas?

Little Ho: I have a naive idea. Last week we learned that for a state st in the SAM, endpos(st) is the set of end positions in S of the substrings contained in st (all substrings in st share the same end-position set). Each distinct end position corresponds to one occurrence. So if we can compute endpos while constructing the SAM, we should be able to solve this problem.
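The naive idea can be sketched directly, without any automaton, as a brute-force reference for short strings. This is only an illustrative baseline: the function name `bruteForceAnswers` is hypothetical, and the code stores every end-position set explicitly, which is exactly the cost discussed next.

```cpp
#include <algorithm>
#include <map>
#include <string>
#include <vector>

// Brute-force reference: result[k-1] is the number of occurrences of the
// most frequent substring of length k. It stores one explicit end-position
// list per distinct substring, so it is only usable on short inputs.
std::vector<int> bruteForceAnswers(const std::string& s) {
    const int n = static_cast<int>(s.size());
    std::map<std::string, std::vector<int>> endpos;  // substring -> 1-based end positions
    for (int i = 0; i < n; ++i)
        for (int len = 1; i + len <= n; ++len)
            endpos[s.substr(i, len)].push_back(i + len);

    std::vector<int> ans(n, 0);
    for (const auto& entry : endpos) {
        const int k = static_cast<int>(entry.first.size());
        ans[k - 1] = std::max(ans[k - 1], static_cast<int>(entry.second.size()));
    }
    return ans;
}
```

Calling `bruteForceAnswers("aab")` yields 2, 1, 1, matching the sample. A function like this is mainly useful for cross-checking a faster solution on small inputs.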

Little Hi: Good idea, but the complexity is rather high. Write |endpos(st)| for the size of endpos(st). For a single state st, |endpos(st)| can reach O(length(S)) in the worst case, and the sum of |endpos(st)| over all states can reach O(length(S)^2).

Little Hi: For example, for S = "aaaaa" the states are as follows, and it is easy to see that Σ|endpos(st)| = 1 + 2 + 3 + 4 + 5 = 15.

State   Substrings     endpos
S       empty string
1       a              {1,2,3,4,5}
2       aa             {2,3,4,5}
3       aaa            {3,4,5}
4       aaaa           {4,5}
5       aaaaa          {5}
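For a string made of n copies of the same letter, the sum has a closed form, which makes the quadratic growth concrete. The figure for n = 10^6 is simply the formula evaluated at the problem's maximum length:

```latex
\sum_{st \neq S} |\mathrm{endpos}(st)| \;=\; \sum_{i=1}^{n} i \;=\; \frac{n(n+1)}{2},
\qquad n = 5 \;\Rightarrow\; 15,
\qquad n = 10^{6} \;\Rightarrow\; \approx 5 \times 10^{11}.
```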

Little Ho: So if we maintain endpos for each state, the complexity is at least O(length(S)^2)? Could we maintain only the size |endpos(st)| instead of the set endpos(st) itself? Just as last week we maintained only maxlen(st) and minlen(st) rather than the actual substrings(st).

Little Hi: That is a very good idea. Unfortunately, if we build the SAM with last week's incremental method, the extra cost of maintaining |endpos(st)| is still high. For example, suppose we have already built the SAM for S = "aaaaa":

State   Substrings     endpos         |endpos|
S       empty string
1       a              {1,2,3,4,5}    5
2       aa             {2,3,4,5}      4
3       aaa            {3,4,5}        3
4       aaaa           {4,5}          2
5       aaaaa          {5}            1

When we add one more character 'a' and build the SAM for S = "aaaaaa":

State   Substrings     endpos           |endpos|
S       empty string
1       a              {1,2,3,4,5,6}    6
2       aa             {2,3,4,5,6}      5
3       aaa            {3,4,5,6}        4
4       aaaa           {4,5,6}          3
5       aaaaa          {5,6}            2
6       aaaaaa         {6}              1

You will find that the earlier states 1-5 all need to be modified: each of their |endpos| values increases by 1.

Little Ho: So if we maintain |endpos(st)| incrementally, the worst-case complexity is still O(length(S)^2). What do we do then?

Little Hi: We need to change our approach and stop trying to obtain |endpos(st)| while constructing the SAM. Instead, we construct the SAM first and then compute |endpos(st)| for every state afterwards. Let's again take S = "aabbabd" as an example.

Little Hi: This time we ignore the transition function and keep only the suffix links. In addition, if a state accepts (that is, contains) a prefix of S, we mark that state green. For example, state 4 contains "aabb" and state 7 contains "aabbab".

State   Substrings                               endpos
S       empty string                             {0,1,2,3,4,5,6,7}
1       a                                        {1,2,5}
2       aa                                       {2}
3       aab                                      {3}
4       aabb, abb, bb                            {4}
5       b                                        {3,4,6}
6       aabba, abba, bba, ba                     {5}
7       aabbab, abbab, bbab, bab                 {6}
8       ab                                       {3,6}
9       aabbabd, abbabd, bbabd, babd, abd, bd, d {7}

Little Hi: From last week's basic concepts we know that the suffix links connect all the states of the SAM into a tree. The endpos set of a father contains the endpos set of each of its sons, and the endpos sets of two states that are not in an ancestor-descendant relationship are disjoint. (Do you remember this theorem? For two substrings s1 and s2 of S with length(s1) <= length(s2), s1 is a suffix of s2 if and only if endpos(s1) ⊇ endpos(s2), and s1 is not a suffix of s2 if and only if endpos(s1) ∩ endpos(s2) = ∅.) Can we compute |endpos(st)| for all states "bottom up"?
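The theorem quoted above can be checked by brute force on a short string. The following is a small sanity-check sketch, not part of the solution; the function name `endposTheoremHolds` is hypothetical, and the check enumerates all pairs of distinct substrings, so it is only meant for strings of a few characters.

```cpp
#include <algorithm>
#include <map>
#include <set>
#include <string>

// For every pair of substrings (u, v) with |u| <= |v|, checks that
//   u is a suffix of v      => endpos(u) is a superset of endpos(v)
//   u is not a suffix of v  => endpos(u) and endpos(v) are disjoint
bool endposTheoremHolds(const std::string& s) {
    const int n = static_cast<int>(s.size());
    std::map<std::string, std::set<int>> endpos;  // substring -> 1-based end positions
    for (int i = 0; i < n; ++i)
        for (int len = 1; i + len <= n; ++len)
            endpos[s.substr(i, len)].insert(i + len);

    for (const auto& a : endpos) {
        for (const auto& b : endpos) {
            const std::string& u = a.first;
            const std::string& v = b.first;
            if (u.size() > v.size()) continue;
            const bool isSuffix = v.compare(v.size() - u.size(), u.size(), u) == 0;
            if (isSuffix) {
                // std::includes works on sorted ranges; std::set iterates in order.
                if (!std::includes(a.second.begin(), a.second.end(),
                                   b.second.begin(), b.second.end()))
                    return false;
            } else {
                for (int e : b.second)
                    if (a.second.count(e)) return false;
            }
        }
    }
    return true;
}
```

For the running example, `endposTheoremHolds("aabbabd")` returns true.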

Little Ho: That sounds interesting. Go on.

Little Hi: Let's analyze two concrete examples. The first is state 8: suppose we want |endpos(8)|. We know that state 8 has two sons, state 3 and state 7 (that is, slink[7] = slink[3] = 8), with endpos(3) = {3} and endpos(7) = {6}. What is |endpos(8)|?

Little Ho: It looks like endpos(8) = endpos(3) ∪ endpos(7). So |endpos(8)| = |endpos(7)| + |endpos(3)| = 2?

Little Hi: Now look at another example, state 1: suppose we want |endpos(1)|. We know that state 1 has two sons, state 2 and state 6, with endpos(2) = {2} and endpos(6) = {5}. What is |endpos(1)|?

Little Ho: endpos(1) is {1,2,5}, not endpos(2) ∪ endpos(6) = {2,5}. It has one more element, 1.

Little Hi: Do these two examples give you any ideas?

Little Ho: I get it. For a state st, |endpos(st)| is at least the sum of the endpos sizes of its sons. This is fairly easy to prove. Suppose x and y are two sons of st. By the definition of suffix link, the substrings in st are suffixes of the substrings in x and also suffixes of the substrings in y. So endpos(st) ⊇ endpos(x) and endpos(st) ⊇ endpos(y). Also by the definition of suffix link, a substring in x is never a suffix of a substring in y, and vice versa, so endpos(x) ∩ endpos(y) = ∅. Therefore |endpos(st)| >= |endpos(x)| + |endpos(y)|.

Little Hi: So could |endpos(st)| be much larger than the sum of the endpos sizes of st's sons?

Little Ho: At most 1 larger. And it is larger by 1 if and only if st is a green state as described above, that is, st contains a prefix of S. Looking at endpos(1) = {1,2,5}, the extra end position 1 compared with endpos(2) ∪ endpos(6) = {2,5} is there because state 1 contains "a", the prefix of S of length 1. More generally, if a state st contains a prefix S[1..l] of S, then l ∈ endpos(st), and l cannot be inherited from any son of st, so we need to add 1.
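The rule "sum of the sons, plus 1 if green" can be tried out on the "aabbabd" example with the tree hard-coded. In this sketch the names `endposSizes` and `exampleSizes` are hypothetical; the suffix links of states 2, 3, 6 and 7 are the ones stated in the dialogue, while the remaining links, the maxlen values and the green marks are derived here from the state table and should be read as assumptions.

```cpp
#include <algorithm>
#include <numeric>
#include <vector>

// Computes |endpos| for every state from the suffix-link tree alone.
// A son always has a larger maxlen than its father, so visiting states in
// order of decreasing maxlen handles every son before its father.
std::vector<int> endposSizes(const std::vector<int>& slink,
                             const std::vector<int>& maxlen,
                             const std::vector<bool>& green) {
    const int m = static_cast<int>(slink.size());
    std::vector<int> order(m);
    std::iota(order.begin(), order.end(), 0);
    std::sort(order.begin(), order.end(),
              [&maxlen](int a, int b) { return maxlen[a] > maxlen[b]; });

    std::vector<int> size(m, 0);
    for (int v : order) {
        if (green[v]) size[v] += 1;                      // the prefix's own end position
        if (slink[v] > 0) size[slink[v]] += size[v];     // the initial state 0 is skipped
    }
    return size;
}

// Index = state number from the table; index 0 is the initial state S.
std::vector<int> exampleSizes() {
    const std::vector<int> slink  = {-1, 0, 1, 8, 5, 0, 1, 8, 5, 0};
    const std::vector<int> maxlen = { 0, 1, 2, 3, 4, 1, 5, 6, 2, 7};
    const std::vector<bool> green = {false, true, true, true, true,
                                     false, true, true, false, true};
    return endposSizes(slink, maxlen, green);
}
```

`exampleSizes()` gives 3 for state 1, 2 for state 8 and 3 for state 5, which agrees with the endpos column of the table. The entry for the initial state is left at 0 because it is not needed for the answer.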

Little Hi: Right. So how do we tell which states should be marked green?

Little Ho: We can do it while building the SAM. Recall the construction algorithm: each time we add a new character we create at least one new state z (and possibly another new state y). State z is always a green state; y never is.

Little Hi: Yes, let's recap. First build the SAM and mark the green states. Then work bottom-up along the suffix-link tree to compute |endpos(st)| for every state. The bottom-up order can be obtained by topological sorting, which we have covered before, so I won't repeat it.
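The recap can be sketched end to end as follows. This is one possible arrangement under stated assumptions, not the reference solution: the struct and function names are hypothetical, the input is assumed to contain only lowercase letters, and `std::sort` by decreasing maxlen stands in for the topological sort (it costs O(n log n); a counting sort on maxlen would be linear).

```cpp
#include <algorithm>
#include <array>
#include <string>
#include <utility>
#include <vector>

struct State {
    std::array<int, 26> next;  // transitions, -1 = none
    int link;                  // suffix link, -1 for the initial state
    int maxlen;                // length of the longest substring in the state
    int cnt;                   // 1 if green at creation; |endpos| after accumulation
};

static State makeState(int maxlen, int cnt) {
    State s;
    s.next.fill(-1);
    s.link = -1;
    s.maxlen = maxlen;
    s.cnt = cnt;
    return s;
}

std::vector<State> buildSam(const std::string& s) {
    std::vector<State> st;
    st.push_back(makeState(0, 0));  // initial state
    int last = 0;
    for (char ch : s) {
        const int c = ch - 'a';
        const int cur = static_cast<int>(st.size());
        st.push_back(makeState(st[last].maxlen + 1, 1));  // new state z: green
        int p = last;
        while (p != -1 && st[p].next[c] == -1) {
            st[p].next[c] = cur;
            p = st[p].link;
        }
        if (p == -1) {
            st[cur].link = 0;
        } else {
            const int q = st[p].next[c];
            if (st[q].maxlen == st[p].maxlen + 1) {
                st[cur].link = q;
            } else {
                const int clone = static_cast<int>(st.size());
                State copy = st[q];            // split state y: same transitions and link
                copy.maxlen = st[p].maxlen + 1;
                copy.cnt = 0;                  // y is never green
                st.push_back(copy);
                while (p != -1 && st[p].next[c] == q) {
                    st[p].next[c] = clone;
                    p = st[p].link;
                }
                st[q].link = clone;
                st[cur].link = clone;
            }
        }
        last = cur;
    }
    return st;
}

// Pushes counts up the suffix links, sons before fathers.
void accumulateEndposSizes(std::vector<State>& st) {
    std::vector<int> order(st.size());
    for (std::size_t i = 0; i < order.size(); ++i) order[i] = static_cast<int>(i);
    std::sort(order.begin(), order.end(),
              [&st](int a, int b) { return st[a].maxlen > st[b].maxlen; });
    for (int v : order)
        if (st[v].link > 0) st[st[v].link].cnt += st[v].cnt;  // initial state skipped
}

// Sorted (maxlen, |endpos|) pairs of all non-initial states.
std::vector<std::pair<int, int>> endposSizesByMaxlen(const std::string& s) {
    std::vector<State> st = buildSam(s);
    accumulateEndposSizes(st);
    std::vector<std::pair<int, int>> out;
    for (std::size_t v = 1; v < st.size(); ++v)
        out.push_back(std::make_pair(st[v].maxlen, st[v].cnt));
    std::sort(out.begin(), out.end());
    return out;
}
```

For "aaaaa" this produces the pairs (1,5), (2,4), (3,3), (4,2), (5,1), the same sizes as in the earlier table. State numbers assigned by this construction need not match the numbering used in the tables, which is why the helper reports sizes by maxlen instead of by state index.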

Little Ho: Once we have |endpos(st)| for each state, we still need the maximum number of occurrences for each length. I have a question about this step. Let ans[l] be the maximum number of occurrences of a substring of length l. My idea is that for each state st we loop and use |endpos(st)| to update ans[minlen(st)] ... ans[maxlen(st)]. This step seems to be O(length(S)^2). Doesn't that throw away everything we have gained? I wrote the pseudo-code below.

    FOREACH state st:
        FOR i = minlen(st) .. maxlen(st):
            ans[i] = max(ans[i], |endpos(st)|)

Little Hi: Good question. This is the last problem we have to solve. Note that ans[1], ans[2], ..., ans[length(S)] is a monotonically non-increasing sequence, because dropping the first character of a substring of length i+1 gives a substring of length i that occurs at least as often. So for each state st we only need to update ans[maxlen(st)]. Then, for i = length(S)-1 down to 1, scanning from back to front, set ans[i] = max(ans[i], ans[i+1]). The pseudo-code is below; take a closer look.

    FOREACH state st:
        ans[maxlen(st)] = max(ans[maxlen(st)], |endpos(st)|)
    FOR i = length(S)-1 DOWNTO 1:
        ans[i] = max(ans[i], ans[i+1])
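The final step can be isolated into a small function that takes the per-state results as input. This is a sketch under the assumption that each non-initial state is given as a (maxlen, |endpos|) pair; the name `answersFromStates` is hypothetical.

```cpp
#include <algorithm>
#include <utility>
#include <vector>

// states: one (maxlen, |endpos|) pair per non-initial state; n = length(S).
// Returns ans[1..n] as a 0-based vector (index k-1 holds the answer for length k).
std::vector<int> answersFromStates(const std::vector<std::pair<int, int>>& states, int n) {
    std::vector<int> ans(n + 2, 0);
    // Each state only updates the entry for its longest substring.
    for (const auto& st : states)
        ans[st.first] = std::max(ans[st.first], st.second);
    // Scan from back to front: a shorter length is at least as good as a longer one.
    for (int i = n - 1; i >= 1; --i)
        ans[i] = std::max(ans[i], ans[i + 1]);
    return std::vector<int>(ans.begin() + 1, ans.begin() + n + 1);
}
```

With the states of "aab", namely (1,2), (2,1) and (3,1), the result is 2, 1, 1, which matches the sample output. Both loops are linear in the number of states and in n, so this step no longer dominates the running time.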

Little Ho: Little Hi, you explained that so well that I have nothing to add.

    /*************************************************************************
        > File: main.cpp
        > Author: you siki
        > Mail: [email protected]
        > Time: Friday, December 23, 2016, 15:14:18
     ************************************************************************/

    #include <bits/stdc++.h>

    //using namespace std;

    const int maxn = 2000005;

    /* AUTOMATON */

    int last = 1;        // state of the whole current prefix; state 1 is the root
    int tail = 2;        // next unused state index; index 0 means "no state"
    int fail[maxn];      // suffix link
    int step[maxn];      // maxlen of the state
    int flag[maxn];      // 1 for green (prefix) states, later |endpos|
    int next[maxn][26];  // transition function

    inline void buildAutomaton(char *s)
    {
        while (*s)
        {
            int p = last;
            int t = tail++;
            int c = *s++ - 'a';

            flag[t] = 1;  // the new state contains a prefix of S
            step[t] = step[p] + 1;

            while (p && !next[p][c])
                next[p][c] = t, p = fail[p];

            if (p)
            {
                int q = next[p][c];
                if (step[q] == step[p] + 1)
                    fail[t] = q;
                else
                {
                    // split q: the copy k is not green, so flag[k] stays 0
                    int k = tail++;
                    fail[k] = fail[q];
                    fail[q] = fail[t] = k;
                    step[k] = step[p] + 1;
                    for (int i = 0; i < 26; ++i)
                        next[k][i] = next[q][i];
                    while (p && next[p][c] == q)
                        next[p][c] = k, p = fail[p];
                }
            }
            else
                fail[t] = 1;
            last = t;
        }
    }

    int que[maxn];
    int cnt[maxn];  // number of sons in the suffix-link tree not yet processed
    int ans[maxn];

    // Declared void: the original returned int without a return statement.
    inline void solveAndPrintAnswer(char *s)
    {
        int hd = 0, tl = 0, n = strlen(s);

        for (int i = 1; i < tail; ++i)
            ++cnt[fail[i]];

        // start from the leaves of the suffix-link tree
        for (int i = 1; i < tail; ++i)
            if (!cnt[i]) que[tl++] = i;

        // topological order: add each state's count to its father
        while (hd != tl)
        {
            int t = que[hd++];
            flag[fail[t]] += flag[t];
            if (--cnt[fail[t]] == 0)
                que[tl++] = fail[t];
        }

        for (int i = 1; i < tail; ++i)
            if (ans[step[i]] < flag[i])
                ans[step[i]] = flag[i];

        // scan from back to front so that ans is non-increasing
        for (int i = n; i; --i)
            if (ans[i] < ans[i + 1])
                ans[i] = ans[i + 1];

        for (int i = 1; i <= n; ++i)
            printf("%d\n", ans[i]);
    }

    /* MAIN FUNC */

    char str[maxn];

    signed main(void)
    {
        scanf("%s", str);
        buildAutomaton(str);
        solveAndPrintAnswer(str);
    }

@Author: Yousiki
