Problem summary: given a string, count the total number of occurrences of all of its prefixes in the string itself. For example, for the string abab, the prefix a appears 2 times, ab appears 2 times, aba appears 1 time, and abab appears 1 time, for a total of 6.
The answer can be large, so it must be reported modulo 10007.
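To pin down exactly what is being counted, here is a minimal brute-force sketch (the name bruteForceCount is hypothetical, and the modulo is omitted because it is only meant for small test strings). For every prefix length it checks every start position of the string.

```cpp
#include <string>

// Brute-force reference: for every prefix s[0..len-1], count the positions
// where it occurs in s, and add up all the counts. Roughly O(n^3) in the worst
// case, so it is only useful for checking small inputs by hand.
long long bruteForceCount(const std::string& s) {
    const int n = static_cast<int>(s.size());
    long long total = 0;
    for (int len = 1; len <= n; ++len) {
        for (int start = 0; start + len <= n; ++start) {
            // compare(pos1, count1, str, pos2, count2) returns 0 when equal
            if (s.compare(start, len, s, 0, len) == 0) {
                ++total;
            }
        }
    }
    return total;
}
```

It returns 6 for abab, matching the count worked out above.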
AC Code
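```cpp
#include <iostream>
#include <string>

// No "using namespace std": with it, a global array named next
// would be ambiguous with std::next.
std::string s;
int n, next[200005];

// Builds the next array (first element -1): for i >= 1, next[i] is the
// length of the longest proper prefix of s[0..i-1] that is also its suffix.
void getNext() {
    next[0] = -1;
    int i = 0, j = -1;
    while (i < n) {
        if (j == -1 || s[i] == s[j]) {
            ++i;
            ++j;
            next[i] = j;
        } else {
            j = next[j];
        }
    }
}

int main() {
    int t;
    std::cin >> t;
    while (t--) {
        std::cin >> n >> s;
        getNext();
        int sum = 0;
        for (int i = 1; i <= n; i++) {
            int j = i;
            while (j) {  // walk the chain i, next[i], next[next[i]], ...
                sum = (sum + 1) % 10007;
                j = next[j];
            }
        }
        std::cout << sum << std::endl;
    }
    return 0;
}
```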
An overview of the KMP next array
Pasting code is not the goal; understanding the algorithm is.
The solution uses the KMP algorithm, but not all of it: it only needs KMP's next array.
The next array is also the heart of KMP itself. The getNext function in the code above builds it with the classic textbook template, which is easy to find anywhere, so the point here is to explain the idea behind the next array.
Material found online usually defines the next array in one of two ways:
- the first element of the next array is -1
- the first element of the next array is 0
The two are essentially equivalent. Here I use the convention where the first element is -1. Note that with this convention the next array has one more entry than the pattern string has characters. For example:
| Index          | 0  | 1 | 2 | 3 | 4 |
|----------------|----|---|---|---|---|
| Pattern string | a  | b | a | b |   |
| Next array     | -1 | 0 | 0 | 1 | 2 |
Here the pattern is a C++ std::string rather than a C-style string, so I left the cell at index 4 empty instead of writing '\0'. With a C-style string (a character array), that cell would hold the terminating '\0'. Either way, it is not the point.
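As a minimal sketch of the -1 convention, the getNext loop from the AC code can be rewritten to return a vector of size n + 1 instead of filling a global array (the function name buildNext is hypothetical):

```cpp
#include <string>
#include <vector>

// next array with the "-1 first" convention: nxt has s.size() + 1 entries,
// nxt[0] = -1, and nxt[i] (i >= 1) is the length of the longest proper
// prefix of s[0..i-1] that is also a suffix of it.
std::vector<int> buildNext(const std::string& s) {
    const int n = static_cast<int>(s.size());
    std::vector<int> nxt(n + 1);
    nxt[0] = -1;
    int i = 0, j = -1;
    while (i < n) {
        if (j == -1 || s[i] == s[j]) {
            ++i;
            ++j;
            nxt[i] = j;
        } else {
            j = nxt[j];  // fall back to the next shorter candidate
        }
    }
    return nxt;
}
```

For abab it yields -1 0 0 1 2, matching the table above; the extra last entry is exactly why the array needs n + 1 slots.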
There are two common ways to understand the next array in KMP (using the convention where the first element is -1; the wording differs slightly for the 0 convention):
- When the pattern string mismatches the main string at some position, the next array tells us where the pattern pointer should fall back to.
- next[i] is the length of the longest proper prefix of the pattern that is also a suffix of the substring ending just before position i (that is, of pattern[0..i-1]).
Let's look at each of these in turn.
1st
Suppose the main string is abacabab and the pattern string is abab, and we want to know whether the main string contains the pattern. We walk through both strings with two pointers (pointers in the logical sense, also called cursors).
At first, the first three characters match:
| Index          | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
|----------------|---|---|---|---|---|---|---|---|
| Main string    | a | b | a | c | a | b | a | b |
| Pattern string | a | b | a | b |   |   |   |   |
At index 3 the characters differ (c versus b): this is the mismatch.
The naive string matching algorithm would move the main-string pointer back to index 1 and reset the pattern pointer to 0, starting over from the beginning of the pattern. This is clearly inefficient.
In this situation KMP does not move the main-string pointer at all; it only moves the pattern pointer. That is why KMP is described as a matching algorithm in which the main-string pointer never backtracks. Where the pattern pointer moves to is determined by the next array.
In the example above, the mismatch happens at index 3, so we look up next[3], which is 1.
So the pattern is shifted like this:
| Index          | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
|----------------|---|---|---|---|---|---|---|---|
| Main string    | a | b | a | c | a | b | a | b |
| Pattern string |   |   | a | b | a | b |   |   |
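The pattern pointer jumps straight to index 1 while the main-string pointer stays at index 3. c still does not match b, so we look up next[1] = 0 and compare c with a; that fails too, and next[0] = -1 tells us to advance both pointers. This process repeats until the main string has been traversed; here the pattern is eventually found starting at index 4. KMP runs in O(m + n) time, where m and n are the lengths of the main string and the pattern string, respectively.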
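The matching procedure just described can be sketched as follows. This is a minimal illustration assuming the -1 convention used throughout this post; the names kmpNext and kmpSearch are hypothetical. Note that i (the main-string pointer) is only ever incremented, while j (the pattern pointer) jumps back through the next array on a mismatch.

```cpp
#include <string>
#include <vector>

// next array, -1 convention (same loop as getNext in the AC code).
static std::vector<int> kmpNext(const std::string& p) {
    const int n = static_cast<int>(p.size());
    std::vector<int> nxt(n + 1);
    nxt[0] = -1;
    int i = 0, j = -1;
    while (i < n) {
        if (j == -1 || p[i] == p[j]) {
            ++i;
            ++j;
            nxt[i] = j;
        } else {
            j = nxt[j];
        }
    }
    return nxt;
}

// Returns the index of the first occurrence of p in text, or -1 if none.
int kmpSearch(const std::string& text, const std::string& p) {
    if (p.empty()) return 0;
    const std::vector<int> nxt = kmpNext(p);
    const int m = static_cast<int>(text.size());  // main string length
    const int n = static_cast<int>(p.size());     // pattern length
    int i = 0, j = 0;
    while (i < m && j < n) {
        if (j == -1 || text[i] == p[j]) {
            ++i;  // main-string pointer only ever moves forward
            ++j;
        } else {
            j = nxt[j];  // only the pattern pointer falls back
        }
    }
    return j == n ? i - n : -1;
}
```

On the example above, kmpSearch("abacabab", "abab") returns 4; for a pattern that never occurs, such as abc, it returns -1.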
2nd
Let's look at the same example again, this time to see where the values of the pattern's next array come from.
| Index          | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
|----------------|---|---|---|---|---|---|---|---|
| Main string    | a | b | a | c | a | b | a | b |
| Pattern string | a | b | a | b |   |   |   |   |
- At index 1, look at the string before it, which is just a. A string matching itself does not count, so next[1] = 0.
- At index 2, look at ab: the suffix b does not match the prefix a, so next[2] = 0.
- At index 3, look at aba: the trailing a matches the prefix a. The match length is 1, so next[3] = 1.
- At index 4, look at abab: the trailing ab matches the prefix ab. The match length is 2, so next[4] = 2.
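The hand computation above follows the 2nd interpretation directly, and it can be written as a slow but obvious sketch: for each i, try every length k < i from longest to shortest and keep the first one where the prefix equals the suffix. The name nextByDefinition is hypothetical; it is handy for cross-checking getNext on small strings.

```cpp
#include <string>
#include <vector>

// next[i] straight from the definition: the largest k < i such that the
// prefix s[0..k-1] equals the suffix s[i-k..i-1]; next[0] = -1 by convention.
std::vector<int> nextByDefinition(const std::string& s) {
    const int n = static_cast<int>(s.size());
    std::vector<int> nxt(n + 1, 0);
    nxt[0] = -1;
    for (int i = 1; i <= n; ++i) {
        // k < i: the whole string matching itself does not count
        for (int k = i - 1; k > 0; --k) {
            if (s.compare(0, k, s, i - k, k) == 0) {
                nxt[i] = k;
                break;
            }
        }
    }
    return nxt;
}
```

For abab it gives -1 0 0 1 2, the same values as the bullet list above.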
Back to the problem
In the code:
```cpp
int sum = 0;
for (int i = 1; i <= n; i++) {
    int j = i;
    while (j) {
        sum = (sum + 1) % 10007;
        j = next[j];
    }
}
```
computes the total number of occurrences of all prefixes. Why does it work?
First, the for loop runs i from 1 to n; that part should be clear.
Recall that our next array has one more entry than the string has characters, so next[n] exists.
Because j starts at i, which is at least 1, the body of while (j) runs at least once for every i = 1, 2, ..., n, so each i adds at least 1 to sum.
This makes sense: for abab, the prefixes a, ab, aba, and abab each occur at least once (as themselves), so each contributes at least 1. A string of length n therefore adds at least n to sum.
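To experiment with the counting loop on its own, here is a minimal sketch that wraps it in a function. The name countPrefixOccurrences is hypothetical, and the next array is built with the same loop as getNext in the AC code.

```cpp
#include <string>
#include <vector>

// The AC code's counting loop as a function: for each i in 1..n, walk
// j = i, next[i], next[next[i]], ... and add 1 for every nonzero j (mod 10007).
int countPrefixOccurrences(const std::string& s) {
    const int n = static_cast<int>(s.size());
    std::vector<int> nxt(n + 1);
    nxt[0] = -1;
    int i = 0, j = -1;
    while (i < n) {  // build the -1-convention next array
        if (j == -1 || s[i] == s[j]) {
            ++i;
            ++j;
            nxt[i] = j;
        } else {
            j = nxt[j];
        }
    }
    int sum = 0;
    for (int k = 1; k <= n; ++k) {
        for (int c = k; c != 0; c = nxt[c]) {
            sum = (sum + 1) % 10007;
        }
    }
    return sum;
}
```

For abab it returns 6: the guaranteed n = 4 plus 2 more from the chain steps.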
Then comes j = next[j]. To explain it, let's reason backwards using another example: the answer for the string ababa, that is, the total number of occurrences of all its prefixes. Its next array is:
| Index          | 0  | 1 | 2 | 3 | 4 | 5 |
|----------------|----|---|---|---|---|---|
| Pattern string | a  | b | a | b | a |   |
| Next array     | -1 | 0 | 0 | 1 | 2 | 3 |
Plug it into the code above. Compared with abab there is only one extra character, and next[1] through next[4] are unchanged, so iterations i = 1 to 4 contribute the same 6 as for abab. We can therefore look directly at i = n (n = 5), starting from sum = 6 (the answer for abab):
- j = i = 5 // the longest prefix, ababa itself, of length 5
- while (j) holds: sum = 6 + 1 = 7
- j = next[5] = 3 // the prefix aba, of length 3
- while (j) holds: sum = 7 + 1 = 8
- j = next[3] = 1 // the shortest prefix, a, of length 1
- while (j) holds: sum = 8 + 1 = 9
- j = next[1] = 0
- while (j) fails, so the loop ends.
- Finally, sum = 9
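The sequence of j values in the trace above can be printed for any i with a small sketch like this one (borderChain is a hypothetical name). It returns the lengths visited by the while (j) loop, i.e. the prefixes that end at position i.

```cpp
#include <string>
#include <vector>

// Lengths visited by "j = i; while (j) { ...; j = next[j]; }" for 0 <= i <= n.
std::vector<int> borderChain(const std::string& s, int i) {
    const int n = static_cast<int>(s.size());
    std::vector<int> nxt(n + 1);
    nxt[0] = -1;
    int a = 0, b = -1;
    while (a < n) {  // -1-convention next array, as in getNext
        if (b == -1 || s[a] == s[b]) {
            ++a;
            ++b;
            nxt[a] = b;
        } else {
            b = nxt[b];
        }
    }
    std::vector<int> chain;
    for (int j = i; j != 0; j = nxt[j]) {
        chain.push_back(j);
    }
    return chain;
}
```

For ababa and i = 5 it returns 5, 3, 1: the same three values of j that incremented sum in the trace.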
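To understand the comments in the trace above, go back to the 2nd interpretation of the next array. next[5] = 3 means aba is both a prefix and a suffix of ababa, so an occurrence of the prefix aba also ends at position 5; next[3] = 1 says the same for a. In general, the chain i, next[i], next[next[i]], ... lists exactly the prefixes that end at position i, so adding 1 per step and summing over all i counts every occurrence of every prefix exactly once.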
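One consequence of this view: the while loop can walk a long chain for every i. On a string like aaa...a the chain from i has i steps, so the total work grows quadratically. A common refinement, not used in the original solution, is to memoize chain lengths: the chain from i is i followed by the chain from next[i], so chainLen[i] = chainLen[next[i]] + 1 with chainLen[0] = 0. A sketch under that assumption (countPrefixOccurrencesDP is a hypothetical name):

```cpp
#include <string>
#include <vector>

// Same answer as the while-loop version, in O(n): chainLen[i] is the number
// of nonzero values in the chain i, next[i], next[next[i]], ...
int countPrefixOccurrencesDP(const std::string& s) {
    const int n = static_cast<int>(s.size());
    std::vector<int> nxt(n + 1);
    nxt[0] = -1;
    int i = 0, j = -1;
    while (i < n) {  // -1-convention next array
        if (j == -1 || s[i] == s[j]) {
            ++i;
            ++j;
            nxt[i] = j;
        } else {
            j = nxt[j];
        }
    }
    std::vector<int> chainLen(n + 1, 0);  // chainLen[0] = 0 ends every chain
    int sum = 0;
    for (int k = 1; k <= n; ++k) {
        // 0 <= nxt[k] < k, so chainLen[nxt[k]] is already known
        chainLen[k] = chainLen[nxt[k]] + 1;
        sum = (sum + chainLen[k]) % 10007;
    }
    return sum;
}
```

Each position now costs O(1) after the next array is built, and the result matches the while-loop version (6 for abab, 9 for ababa).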
HDU 3336: interpreting the next array of the KMP algorithm