[Suffix array and Statistics]

Source: Internet
Author: User

The suffix array is a powerful tool, and counting (statistics) problems are one area where it shines.

The following two examples illustrate how to use a suffix array for counting.

[Example 1] POJ 3415 http://poj.org/problem?id=3415

Problem: count the common substrings of length >= K of the two strings, where occurrences at different positions are counted separately.
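To make the counting rule concrete, here is a minimal brute-force sketch. It assumes the reading of the problem given above: every pair of start positions (i in A, j in B) whose longest common prefix has length L >= K contributes L - K + 1. The function name count_common_brute is hypothetical and the routine is only meant for checking small inputs, not for submission.

```c
#include <string.h>

/* Hypothetical reference helper (not from the original post).
   Counts triples (i, j, len) with len >= k such that
   a[i..i+len) == b[j..j+len). Far too slow for the real limits. */
long long count_common_brute(const char *a, const char *b, int k) {
    int la = (int)strlen(a), lb = (int)strlen(b);
    long long total = 0;
    for (int i = 0; i < la; i++) {
        for (int j = 0; j < lb; j++) {
            int l = 0; /* LCP of the suffixes a[i..] and b[j..] */
            while (i + l < la && j + l < lb && a[i + l] == b[j + l]) l++;
            if (l >= k) total += l - k + 1; /* lengths k..l all match */
        }
    }
    return total;
}
```

For A = "xx", B = "xx" and K = 1 the four position pairs have LCPs 2, 1, 1 and 1, so the function returns 5.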

Approach: as usual for suffix-array problems on two strings, concatenate A and B with a character in between that occurs in neither, then compute the height array of the new string. Scan the suffixes in sorted order: when we meet a suffix of A, add its LCP with every suffix of B before it, and when we meet a suffix of B, add its LCP with every suffix of A before it (an LCP of length L >= K contributes L - K + 1). The question is how to add up these LCPs, since the naive O(N^2) algorithm is clearly infeasible. The answer is the "monotonic stack": as the name suggests, its elements are kept in monotonic order. The most common use of a monotonic stack is finding the maximum-area submatrix, and the situation here is similar, because the LCP of any two suffixes is the minimum of the height values over the interval between them. For more details, refer to this blog.
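The core of the monotonic-stack idea can be isolated from the suffix array. The sketch below is an illustration rather than the code used in the solutions: it sums min(h[l..r]) over all intervals of an array h in O(n), which is what is needed once h holds the height values. The name sum_of_interval_minima and the malloc-based stack are assumptions made for this example.

```c
#include <stdlib.h>

/* Hypothetical helper: sum of min(h[l..r]) over all 0 <= l <= r < n.
   Each stack slot stores a value and the number of left endpoints
   whose interval minimum (ending at the current r) equals that value.
   Returns -1 if memory allocation fails. */
long long sum_of_interval_minima(const int *h, int n) {
    int *val = malloc(sizeof *val * (size_t)(n + 1));
    long long *cnt = malloc(sizeof *cnt * (size_t)(n + 1));
    long long total = 0;
    long long cur = 0; /* sum over l of min(h[l..r]) for the current r */
    int top = 0;
    if (!val || !cnt) { free(val); free(cnt); return -1; }
    for (int r = 0; r < n; r++) {
        long long c = 1; /* the interval [r, r] itself */
        /* Larger or equal values are capped by h[r]: merge them. */
        while (top > 0 && val[top - 1] >= h[r]) {
            top--;
            cur -= (long long)val[top] * cnt[top];
            c += cnt[top];
        }
        val[top] = h[r];
        cnt[top] = c;
        top++;
        cur += (long long)h[r] * c;
        total += cur;
    }
    free(val);
    free(cnt);
    return total;
}
```

For h = {1, 3, 0, 2} the interval minima are 1, 3, 0, 2, 1, 0, 0, 0, 0, 0, so the function returns 7. The merge step is the same trick as in the maximum-rectangle problem: equal or taller bars to the left are collapsed into one slot with a count.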

#include <stdio.h>
#include <string.h>

typedef long long ll;

#define maxn 200100

int wa[maxn], wb[maxn], wv[maxn], wss[maxn];
int r[maxn], sa[maxn];

int cmp(int *r, int a, int b, int l) {
    return r[a] == r[b] && r[a + l] == r[b + l];
}

/* Prefix-doubling suffix array. r[0..n-1] must end with a unique smallest
   sentinel (r[n-1] = 0); m is an upper bound on the values in r. */
void da(int *r, int *sa, int n, int m) {
    int i, j, p, *x = wa, *y = wb, *t;
    for (i = 0; i < m; i++) wss[i] = 0;
    for (i = 0; i < n; i++) wss[x[i] = r[i]]++;
    for (i = 1; i < m; i++) wss[i] += wss[i - 1];
    for (i = n - 1; i >= 0; i--) sa[--wss[x[i]]] = i;
    for (j = 1, p = 1; p < n; j *= 2, m = p) {
        for (p = 0, i = n - j; i < n; i++) y[p++] = i;
        for (i = 0; i < n; i++) if (sa[i] >= j) y[p++] = sa[i] - j;
        for (i = 0; i < n; i++) wv[i] = x[y[i]];
        for (i = 0; i < m; i++) wss[i] = 0;
        for (i = 0; i < n; i++) wss[wv[i]]++;
        for (i = 1; i < m; i++) wss[i] += wss[i - 1];
        for (i = n - 1; i >= 0; i--) sa[--wss[wv[i]]] = y[i];
        for (t = x, x = y, y = t, p = 1, x[sa[0]] = 0, i = 1; i < n; i++)
            x[sa[i]] = cmp(y, sa[i - 1], sa[i], j) ? p - 1 : p++;
    }
}

/* rank[i]: rank of the suffix starting at i; sa[i]: start of the suffix
   with rank i. The two arrays are inverses of each other. */
int rank[maxn], height[maxn];

/* Here n is the string length WITHOUT the extra sentinel. */
void calheight(int *r, int *sa, int n) {
    int i, j, k = 0;
    for (i = 1; i <= n; i++) rank[sa[i]] = i;
    for (i = 0; i < n; height[rank[i++]] = k)
        for (k ? k-- : 0, j = sa[rank[i] - 1]; r[i + k] == r[j + k]; k++);
}

char a[maxn], b[maxn];
/* fa: contribution of each height; fb: which string each suffix belongs to;
   fc: after merging, how many consecutive "rectangles" share the current
   minimum h */
int fa[maxn], fb[maxn], fc[maxn];
int st[maxn];

int main() {
    int K;
    int i, j;
    /* The input is assumed to end with K = 0. */
    while (scanf("%d", &K) == 1 && K) {
        int len;
        scanf("%s%s", a, b);
        int la = (int)strlen(a);
        a[la] = '#';
        a[la + 1] = '\0';
        strcat(a, b);
        len = (int)strlen(a);
        for (i = 0; i < len; i++) r[i] = a[i];
        r[len] = 0;
        da(r, sa, len + 1, 199);
        calheight(r, sa, len);
        for (i = 0; i <= len; i++) {
            fb[i] = (sa[i] < la);
            fa[i] = (height[i] >= K) ? height[i] - K + 1 : 0;
        }
        ll ans = 0;
        st[0] = -1, fa[len + 1] = 0;
        for (j = 0; j <= 1; j++) {
            ll sum = 0;
            /* rank 1 is the suffix starting at '#', so start from rank 2 */
            for (int top = 0, i = 2; i <= len; i++) {
                if (fb[i] != j) ans += sum;
                st[++top] = fa[i + 1];
                fc[top] = (fb[i] == j);
                sum += (ll)st[top] * (ll)fc[top];
                while (st[top - 1] >= st[top]) {
                    sum -= (ll)(st[top - 1] - st[top]) * (ll)fc[top - 1];
                    st[top - 1] = st[top];
                    fc[top - 1] += fc[top]; /* merge range */
                    top--;
                }
            }
        }
        /* Some older Windows-based judges expect "%I64d" instead. */
        printf("%lld\n", ans);
    }
    return 0;
}

[Example 2]

E. Prefix Sum

Time limit: 6000/3000 ms (Java/other). Memory limit: 65535/32768 K (Java/other).

Problem description

A string V is a suffix string of a string W if string V can be read from a position of string W to the end of W.
For example, string BC is a suffix string of ABC, but AB is not.
A string V is a prefix string of a string W if string V can be read from the beginning of string W.
For example, string AB is a prefix string of string ABC, but BC and ABCD are not.

For 2 strings S1 and S2, if a string S3 is a prefix of both S1 and S2, we call S3 a common prefix of S1 and S2.
The longest common prefix of 2 strings is the longest of all the common prefix strings of these 2 strings.

Your task is:
Given a string, compute the sum of the lengths of the longest common prefixes of every 2 suffixes of the string.

Input
There are multiple strings, one string per line. Each string is no longer than 10^5. The strings only contain A-Z and a-z.

Output
For each string, output the sum.

Sample input
ABC
ABABA
AABB

Sample output
0
7
2

Source: SCAUCPC 2012

This was regarded as a "hard problem" in the South China Agricultural University (Huanong) contest, but with a suffix array it is actually not difficult.

The task is simple: given a string, compute the length of the longest common prefix of every pair of its suffixes, and sum these lengths.
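As a sanity check on the task as restated here, a brute-force sketch can compare every pair of suffixes directly. The function name lcp_sum_brute is hypothetical and the routine is cubic in the worst case, so it is only useful for verifying small cases such as the samples.

```c
#include <string.h>

/* Hypothetical reference helper: sums the LCP length of every
   unordered pair of distinct suffixes of s by direct comparison. */
long long lcp_sum_brute(const char *s) {
    int n = (int)strlen(s);
    long long total = 0;
    for (int i = 0; i < n; i++) {
        for (int j = i + 1; j < n; j++) {
            int l = 0; /* LCP of suffixes s[i..] and s[j..] */
            while (j + l < n && s[i + l] == s[j + l]) l++;
            total += l;
        }
    }
    return total;
}
```

On the three sample strings ABC, ABABA and AABB it returns 0, 7 and 2, matching the sample output.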

Method: suffix array + monotonic stack optimization

#include <stdio.h>
#include <string.h>

typedef long long ll;

#define maxn 100100

int wa[maxn], wb[maxn], wv[maxn], wss[maxn];
int r[maxn], sa[maxn];

int cmp(int *r, int a, int b, int l) {
    return r[a] == r[b] && r[a + l] == r[b + l];
}

/* Prefix-doubling suffix array. r[0..n-1] must end with a unique smallest
   sentinel (r[n-1] = 0); m is an upper bound on the values in r. */
void da(int *r, int *sa, int n, int m) {
    int i, j, p, *x = wa, *y = wb, *t;
    for (i = 0; i < m; i++) wss[i] = 0;
    for (i = 0; i < n; i++) wss[x[i] = r[i]]++;
    for (i = 1; i < m; i++) wss[i] += wss[i - 1];
    for (i = n - 1; i >= 0; i--) sa[--wss[x[i]]] = i;
    for (j = 1, p = 1; p < n; j *= 2, m = p) {
        for (p = 0, i = n - j; i < n; i++) y[p++] = i;
        for (i = 0; i < n; i++) if (sa[i] >= j) y[p++] = sa[i] - j;
        for (i = 0; i < n; i++) wv[i] = x[y[i]];
        for (i = 0; i < m; i++) wss[i] = 0;
        for (i = 0; i < n; i++) wss[wv[i]]++;
        for (i = 1; i < m; i++) wss[i] += wss[i - 1];
        for (i = n - 1; i >= 0; i--) sa[--wss[wv[i]]] = y[i];
        for (t = x, x = y, y = t, p = 1, x[sa[0]] = 0, i = 1; i < n; i++)
            x[sa[i]] = cmp(y, sa[i - 1], sa[i], j) ? p - 1 : p++;
    }
}

/* rank[i]: rank of the suffix starting at i; sa[i]: start of the suffix
   with rank i. The two arrays are inverses of each other. */
int rank[maxn], height[maxn];

/* Here n is the string length WITHOUT the extra sentinel. */
void calheight(int *r, int *sa, int n) {
    int i, j, k = 0;
    for (i = 1; i <= n; i++) rank[sa[i]] = i;
    for (i = 0; i < n; height[rank[i++]] = k)
        for (k ? k-- : 0, j = sa[rank[i] - 1]; r[i + k] == r[j + k]; k++);
}

char str[maxn];
int c[maxn];  /* c[t]: how many intervals the minimum st[t] covers; merged
                 together with the stack operations */
int st[maxn]; /* handwritten stack */
ll dp[maxn];  /* dp[t]: total "rectangle" area up to stack slot t; 64-bit,
                 because the sum exceeds the int range for long strings */

int main() {
    while (scanf("%s", str) == 1) {
        int i;
        int n = (int)strlen(str);
        for (i = 0; i < n; i++) r[i] = str[i];
        r[n] = 0; /* sentinel; must be reset for every test case */
        da(r, sa, n + 1, 199);
        calheight(r, sa, n);
        ll ans = 0;
        st[0] = -1;
        height[n + 1] = 0;
        dp[0] = 0;
        for (int top = 0, i = 1; i <= n; i++) {
            st[++top] = height[i + 1];
            c[top] = 1;
            while (st[top - 1] >= st[top]) {
                st[top - 1] = st[top];
                c[top - 1] += c[top];
                top--;
            }
            ans += (ll)c[top] * st[top] + dp[top - 1];
            dp[top] = dp[top - 1] + (ll)c[top] * st[top];
        }
        /* Some older Windows-based judges expect "%I64d" instead. */
        printf("%lld\n", ans);
    }
    return 0;
}
