The suffix of string S of length L (1 ≤ L ≤ |S|) is the string S[|S| - L + 1 .. |S|]. Your task is, for any prefix of string S which matches a suffix of string S, to print the number of times it occurs in string S as a substring.
Input
The single line contains a sequence of characters S1S2...S|S| (1 ≤ |S| ≤ 10^5), the string S. The string consists only of uppercase English letters.
Output
In the first line, print integer K (0 ≤ K ≤ |S|), the number of prefixes that match a suffix of string S. Then print K lines, each containing two integers Li Ci. Numbers Li Ci mean that the prefix of length Li matches the suffix of length Li and occurs in string S as a substring Ci times. Print pairs Li Ci in order of increasing Li.
Sample test(s)
Input
ABACABA
Output
3
1 4
3 2
7 1
Input
AAA
Output
3
1 3
2 2
3 1
Problem summary:
Given a string of at most 10^5 characters, output every length at which the prefix of that length is identical to the suffix of that length, together with the number of times that prefix occurs in the whole string (overlapping occurrences count).
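To make the task concrete, here is a minimal brute-force sketch that follows the statement literally. It is roughly cubic in the worst case, far too slow for |S| = 10^5, and is meant only as a reference for checking faster solutions on small inputs. The function name bruteForce is made up for this illustration and does not appear in the original solutions.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper (not from the original post): returns the pairs
// (L, C) in increasing L, where the prefix of length L equals the suffix
// of length L and occurs C times in s, overlaps included.
std::vector<std::pair<int, int>> bruteForce(const std::string& s) {
    assert(!s.empty());  // the statement guarantees |S| >= 1
    const int n = static_cast<int>(s.size());
    std::vector<std::pair<int, int>> res;
    for (int len = 1; len <= n; ++len) {
        // Skip lengths where prefix and suffix differ.
        if (s.compare(0, len, s, n - len, len) != 0) continue;
        int count = 0;
        // Try every start position, so overlapping occurrences are counted.
        for (int start = 0; start + len <= n; ++start)
            if (s.compare(start, len, s, 0, len) == 0) ++count;
        res.push_back({len, count});
    }
    return res;
}
```

On the two samples this yields (1,4), (3,2), (7,1) for ABACABA and (1,3), (2,2), (3,1) for AAA.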
Ideas:
When I saw prefixes and suffixes during the contest I was delighted: I had just learned suffix arrays, and this looked like the perfect place to use them. After some thought the algorithm took shape. Suffix 0 is the entire string, so a suffix is a target suffix exactly when its longest common prefix with suffix 0 covers its whole length. What remains is counting occurrences. Once we know a suffix is a target suffix, we know its rank. Every suffix that starts with the target suffix must be ranked after it: if suffix B is a prefix of suffix A, then B is the shorter of the two and sorts first. So the rest of the work is to determine how far down the rank order the match extends, which the height array tells us. This needs binary search plus RMQ: binary search locates the far end of the range, and RMQ checks whether the minimum height over the range is still at least the target length.
Although the idea was right, my code kept failing until the end of the contest. Only while debugging afterwards did I realize how shallow my understanding of suffix arrays was. The catch is why the doubling algorithm requires txt[n-1] = 0: the height computation uses j = sa[rank[i]-1], which breaks when rank[i] = 0. Appending a 0 to the end of the original string solves this, because rank 0 then always belongs to that sentinel and the height loop can skip it. For details, see the code:
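#include <cstdio>
#include <cstring>
#include <algorithm>
using namespace std;

const int maxn = 150010;
char txt[maxn];
int sa[maxn], T1[maxn], T2[maxn], ct[maxn], he[maxn], rk[maxn], n, m;
// sa[i] is the starting position of the suffix with rank i.
int rmq[25][maxn], lg[maxn], ansn[maxn], ansp[maxn], ptr;

void getsa(char *st) // m is the alphabet size (ASCII range)
{
    int i, k, p, *x = T1, *y = T2;
    for (i = 0; i < m; i++) ct[i] = 0;
    for (i = 0; i < n; i++) ct[x[i] = st[i]]++;
    for (i = 1; i < m; i++) ct[i] += ct[i - 1];
    for (i = n - 1; i >= 0; i--) // reverse order keeps the sort stable
        sa[--ct[x[i]]] = i;
    for (k = 1, p = 1; p < n; k <<= 1, m = p) {
        // Sort by the second keyword: y[i] is the start of the suffix
        // whose second keyword has rank i.
        for (p = 0, i = n - k; i < n; i++) y[p++] = i;
        for (i = 0; i < n; i++) if (sa[i] >= k) y[p++] = sa[i] - k;
        for (i = 0; i < m; i++) ct[i] = 0;
        for (i = 0; i < n; i++) ct[x[y[i]]]++;
        for (i = 1; i < m; i++) ct[i] += ct[i - 1];
        for (i = n - 1; i >= 0; i--) sa[--ct[x[y[i]]]] = y[i]; // sort by the first keyword
        for (swap(x, y), p = 1, x[sa[0]] = 0, i = 1; i < n; i++)
            x[sa[i]] = y[sa[i - 1]] == y[sa[i]] && y[sa[i - 1] + k] == y[sa[i] + k] ? p - 1 : p++;
    }
}

// gethe, rmq_init, rmq_min and most of prermq were cut from the post;
// they are reconstructed here from how solve() and main() call them.
void gethe(char *st) // he[r] = LCP of the suffixes ranked r-1 and r
{
    int i, j, k = 0;
    for (i = 0; i < n; i++) rk[sa[i]] = i;
    for (i = 0; i < n - 1; i++) { // the sentinel at n-1 has rank 0 and is skipped
        if (k) k--;
        j = sa[rk[i] - 1];
        while (st[i + k] == st[j + k]) k++;
        he[rk[i]] = k;
    }
}
void rmq_init() // sparse table over he
{
    int i, j;
    for (i = 0; i < n; i++) rmq[0][i] = he[i];
    for (j = 1; (1 << j) <= n; j++)
        for (i = 0; i + (1 << j) <= n; i++)
            rmq[j][i] = min(rmq[j - 1][i], rmq[j - 1][i + (1 << (j - 1))]);
}
int rmq_min(int l, int r) // minimum of he[l..r], l <= r
{
    int k = lg[r - l + 1];
    return min(rmq[k][l], rmq[k][r - (1 << k) + 1]);
}
void prermq() // lg[i] = floor(log2(i))
{
    int i;
    lg[0] = -1;
    for (i = 1; i < maxn; i++) lg[i] = lg[i >> 1] + 1;
}
void solve()
{
    int low, hi, mid, p, pos, ans, tp, i;
    getsa(txt), gethe(txt), rmq_init();
    ptr = 0, pos = rk[0];
    for (i = n - 2; i > 0; i--) {
        // Suffix i is a prefix of the whole string iff its LCP with suffix 0 covers it.
        if (rk[i] < pos && (p = rmq_min(rk[i] + 1, pos)) >= n - i - 1) {
            ansp[ptr] = p, tp = rk[i] + 1;
            low = rk[i] + 1, hi = n - 1, ans = -1;
            while (low <= hi) { // furthest rank that still shares p characters
                mid = (low + hi) >> 1;
                if (rmq_min(tp, mid) >= p) ans = mid, low = mid + 1;
                else hi = mid - 1;
            }
            ansn[ptr++] = ans - rk[i] + 1;
        }
    }
}
int main()
{
    int i;
    prermq();
    while (~scanf("%s", txt)) {
        m = 150, n = strlen(txt);
        n++; // count the terminating 0 as a sentinel character
        solve();
        ansp[ptr] = n - 1; ansn[ptr++] = 1; // the whole string occurs once
        printf("%d\n", ptr);
        for (i = 0; i < ptr; i++) printf("%d %d\n", ansp[i], ansn[i]);
    }
    return 0;
}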
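The rank-range argument behind the suffix-array approach can be seen on small inputs with a much simpler sketch. The version below is an illustration only, not the solution above: it swaps in a naive O(n^2 log n) suffix sort and std::partition_point for the doubling algorithm, height array and RMQ, and the function name countViaSortedSuffixes is made up for this example. It relies on the same fact, namely that all suffixes starting with a given prefix occupy one contiguous block of ranks.

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <string>
#include <vector>

// Hypothetical illustration: number of occurrences of the prefix s[0..len)
// in s, found as the size of a contiguous block in the sorted suffix list.
int countViaSortedSuffixes(const std::string& s, int len) {
    const int n = static_cast<int>(s.size());
    assert(len >= 1 && len <= n);
    std::vector<int> sa(n);
    std::iota(sa.begin(), sa.end(), 0);
    // Naive suffix sort; fine for small n only.
    std::sort(sa.begin(), sa.end(), [&](int a, int b) {
        return s.compare(a, std::string::npos, s, b, std::string::npos) < 0;
    });
    // Suffixes whose first len characters sort before the prefix come first...
    auto lo = std::partition_point(sa.begin(), sa.end(), [&](int i) {
        return s.compare(i, len, s, 0, len) < 0;
    });
    // ...followed by the block of suffixes that start with the prefix.
    auto hi = std::partition_point(lo, sa.end(), [&](int i) {
        return s.compare(i, len, s, 0, len) == 0;
    });
    return static_cast<int>(hi - lo);
}
```

For ABACABA the sorted suffixes are A, ABA, ABACABA, ACABA, BA, BACABA, CABA, so the prefix A owns the first four ranks and the prefix ABA owns ranks two and three. The binary search with RMQ in the full solution finds the end of the same block without comparing strings.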
Next comes the second approach. After the contest I still could not get the first idea to work, so I asked in the group chat. qijie scoffed, threw out a single word, "KMP", and left. Thinking it over, I had to admit the scorn was deserved. KMP can easily compute how many times each prefix occurs in the original string. The method: build the failure (mismatch) array for the string, then match the string against itself. If position i matches position j, then prefix j occurs ending at position i, and we record that in cnt, where cnt[i] is the number of times prefix i occurs. Finally, going from the longest prefix to the shortest, accumulate cnt[next[i]] += cnt[i]. This is easy to see: if prefix j occurs ending at position i, then prefix next[j] also occurs ending at position i. That gives the number of occurrences of every prefix in the original string. What remains is to find the prefixes that also match a suffix, which is simple: matching the string against itself is exactly matching its prefixes against the rest of it, so we only need to follow the failure links from position n+1, one past the last character, to enumerate every prefix that matches a suffix. And just like that, a gorgeous O(n) solution gets accepted.
For details, see the code:
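#include <cstdio>
#include <cstring>
using namespace std;

const int maxn = 150010;
char txt[maxn];
int f[maxn], cnt[maxn], ansp[maxn], ansn[maxn], ct, n;

// f[i] is the length of the longest proper border of the prefix of length i.
void getf(char *p)
{
    int i, j;
    f[0] = f[1] = 0;
    for (i = 1; i < n; i++) {
        j = f[i];
        while (j && p[i] != p[j]) j = f[j];
        f[i + 1] = p[i] == p[j] ? j + 1 : 0;
    }
}
void KMP()
{
    int i, j, t = n;
    ct = 0;
    // The self-matching loop was cut from the post; it is reconstructed
    // here from the description above.
    for (i = 0, j = 0; i < n; i++) {
        while (j && txt[i] != txt[j]) j = f[j];
        if (txt[i] == txt[j]) j++;
        cnt[j - 1]++; // the prefix of length j ends at position i
        if (j == n) j = f[j];
    }
    // Why this works: occurrences with different endpoints are distinct,
    // and KMP visits every endpoint exactly once.
    for (j = n; j > 0; j--)
        if (f[j]) // f[j] is the next position to compare, so the shorter
                  // prefix has length f[j] and its count lives at f[j]-1
            cnt[f[j] - 1] += cnt[j - 1];
    while (t) // prefixes that match a suffix
    {
        ansp[ct] = t;
        ansn[ct++] = cnt[t - 1];
        t = f[t];
    }
    printf("%d\n", ct);
    for (i = ct - 1; i >= 0; i--) printf("%d %d\n", ansp[i], ansn[i]);
}
int main()
{
    while (~scanf("%s", txt)) {
        n = strlen(txt);
        memset(cnt, 0, sizeof cnt);
        getf(txt);
        KMP();
    }
    return 0;
}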
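The counting step of the KMP approach can also be written without the explicit self-matching pass. The sketch below is a variant under my own assumptions rather than the code above: arrays are indexed by prefix length instead of length minus one, every prefix starts with a count of 1 for the occurrence at its own endpoint (which is all the self-match contributes), and the name bordersWithCounts is invented for the example.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical variant: returns (L, C) pairs in increasing L for every
// prefix of length L that is also a suffix, with C its occurrence count.
std::vector<std::pair<int, int>> bordersWithCounts(const std::string& s) {
    assert(!s.empty());
    const int n = static_cast<int>(s.size());
    // fail[len] = length of the longest proper border of the prefix of length len.
    std::vector<int> fail(n + 1, 0);
    for (int i = 1; i < n; ++i) {
        int j = fail[i];
        while (j > 0 && s[i] != s[j]) j = fail[j];
        fail[i + 1] = (s[i] == s[j]) ? j + 1 : 0;
    }
    // Each prefix occurs once at its own endpoint; index 0 is an unused sink.
    std::vector<int> cnt(n + 1, 1);
    cnt[0] = 0;
    // Longest first: an occurrence of a prefix is also an occurrence of its border.
    for (int len = n; len >= 1; --len) cnt[fail[len]] += cnt[len];
    std::vector<std::pair<int, int>> res;
    // Walking the failure chain from n lists every prefix that is also a suffix.
    for (int len = n; len > 0; len = fail[len]) res.push_back({len, cnt[len]});
    std::reverse(res.begin(), res.end());
    return res;
}
```

Both passes are linear, so the total cost stays O(n), and the chain walk naturally produces the lengths in decreasing order, which is why the result is reversed before returning.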