Construct suffix tree with suffix array

Source: Internet
Author: User

A while ago I (azui, a mere beginner) was busy preparing for the provincial team selection contest, and then quietly flopped in it, so this blog went without updates for a long time.

After the selection contest I was in a slump, idling around all day ...

While learning suffix arrays some time ago, I searched online and read that suffix arrays and suffix trees can be converted into each other. UOJ has plenty of templates that build a suffix tree from a suffix automaton and then obtain the suffix array by a DFS traversal, but material on building a suffix tree from a suffix array is really scarce.

Perhaps the experts all consider this trivial to write, so I could not find any corresponding code online. I would like to fill that gap. Experts, please go easy on me.

Let's talk about my understanding.

In principle, the suffix array and the suffix tree should be completely equivalent, but neither of them is equivalent to the suffix automaton.

The advantage of the suffix tree is that it is a compressed trie of all suffixes, which can be drawn by hand when the data is small. There are also many tree algorithms and data structures, so a suffix tree combines easily and efficiently with heavy-light decomposition, binary lifting, and virtual trees. The hierarchy of the tree also makes it convenient to aggregate statistics per node during DP. The disadvantage is that the construction is obscure: I do not know Ukkonen's algorithm, and until now I built my suffix trees offline from a suffix automaton (SAM).

The advantage of the suffix array is that it is short and simple, and its space is not multiplied by the alphabet size. The disadvantage is the lack of hierarchy: doing DP on it requires extra tools such as union-find or a monotonic stack.

Building the suffix tree from the suffix array therefore combines the strengths of both. When the string is very long, a tree built with Ukkonen's algorithm or from a SAM needs memory multiplied by the alphabet size; when the alphabet is very large, the SAM's transitions must be stored in a map, which adds a log factor to the running time. With the suffix-array construction, time and space complexity do not depend on the alphabet (of course, if character values exceed n they need to be discretized first).

I have never implemented a real suffix tree before, so some of my representations may differ from the classical ones used with Ukkonen's algorithm.

The algorithm itself is fairly simple. First build the suffix array, so that sa[i] is the starting position of the suffix of rank i, and height[i] is the length of LCP(sa[i], sa[i-1]). Traversing sa in order visits the leaves of the suffix tree in DFS order, and height gives the LCP length of each pair of adjacent leaves. To add a leaf, climb from the previous leaf to the node whose depth is exactly height[i]. If no such node exists, split the edge that covers depth height[i] and insert a new node at exactly that depth. Then hang an edge to the new leaf for the current suffix. Since an edge of the suffix tree contributes as many distinct substrings as its length, the length of this edge is n - sa[i] - height[i].
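The steps above can be sketched in a compact form. This is only an illustration under stated assumptions, not the article's code: the names SuffixTree, addNode, naiveSuffixArray, lcpLength, buildFromSuffixArray and countDistinctSubstrings are hypothetical, ranks are 0-based, the tree stores only parent pointers and depths, and the suffix array and LCP values are computed naively to keep the sketch short.

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <string>
#include <vector>

// Hypothetical layout for illustration: node 0 is the root, and depth[v] is
// the length of the string spelled from the root to v.
struct SuffixTree {
    std::vector<int> parent, depth;
    std::vector<char> isSuffixEnd;
    int addNode(int par, int d, bool end) {
        parent.push_back(par);
        depth.push_back(d);
        isSuffixEnd.push_back(end);
        return static_cast<int>(parent.size()) - 1;
    }
};

// Naive suffix sort, used only to keep the sketch short.
std::vector<int> naiveSuffixArray(const std::string& s) {
    std::vector<int> sa(s.size());
    std::iota(sa.begin(), sa.end(), 0);
    std::sort(sa.begin(), sa.end(), [&s](int a, int b) {
        return s.compare(a, std::string::npos, s, b, std::string::npos) < 0;
    });
    return sa;
}

// Length of the longest common prefix of the suffixes starting at a and b.
int lcpLength(const std::string& s, int a, int b) {
    const int n = static_cast<int>(s.size());
    int k = 0;
    while (a + k < n && b + k < n && s[a + k] == s[b + k]) ++k;
    return k;
}

SuffixTree buildFromSuffixArray(const std::string& s) {
    const int n = static_cast<int>(s.size());
    SuffixTree t;
    int last = t.addNode(-1, 0, false);  // root
    const std::vector<int> sa = naiveSuffixArray(s);
    for (int i = 0; i < n; ++i) {
        const int h = (i == 0) ? 0 : lcpLength(s, sa[i - 1], sa[i]);
        int p = last, child = -1;
        while (t.depth[p] > h) {  // climb from the previous leaf
            child = p;
            p = t.parent[p];
        }
        if (t.depth[p] < h) {  // no node at depth h: split edge p -> child
            assert(child != -1);
            const int q = t.addNode(p, h, false);
            t.parent[child] = q;
            p = q;
        }
        last = t.addNode(p, n - sa[i], true);  // edge length is n - sa[i] - h
    }
    return t;
}

// Each edge contributes as many distinct substrings as its length.
long long countDistinctSubstrings(const SuffixTree& t) {
    long long total = 0;
    for (std::size_t v = 1; v < t.parent.size(); ++v)
        total += t.depth[v] - t.depth[t.parent[v]];
    return total;
}
```

For example, buildFromSuffixArray("aab") performs one split (for the shared prefix "a"), and summing the edge lengths with countDistinctSubstrings gives the number of distinct substrings of the string.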

The tree is stored as a linked forward-star adjacency list. The edge that has to be split is always the most recently added edge of its parent, which is exactly the head of that parent's edge list, so the split costs O(1). Note, however, that a forward-star list yields edges in reverse insertion order, so the DFS order of the finished suffix tree is the reverse of the SA array. This usually does not matter when solving problems. If it does, switch to vector-based adjacency lists, or reverse each list manually; neither changes the time complexity.
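The ordering effect of a forward-star list can be seen in a small sketch. The struct ForwardStar and its members below are hypothetical names for illustration, using index-based links rather than the pointers of the article's code.

```cpp
#include <cassert>
#include <vector>

// Hypothetical index-based forward-star adjacency list.
struct ForwardStar {
    std::vector<int> head, to, next;
    explicit ForwardStar(int nodes) : head(nodes, -1) {}

    // The new edge becomes the head of u's list.
    void addEdge(int u, int v) {
        to.push_back(v);
        next.push_back(head[u]);
        head[u] = static_cast<int>(to.size()) - 1;
    }

    // Children in list order, i.e. most recently added first.
    std::vector<int> children(int u) const {
        std::vector<int> out;
        for (int e = head[u]; e != -1; e = next[e]) out.push_back(to[e]);
        return out;
    }

    // Reverse u's list in place so that iteration follows insertion order.
    void reverseList(int u) {
        int prev = -1;
        for (int e = head[u]; e != -1;) {
            const int nx = next[e];
            next[e] = prev;
            prev = e;
            e = nx;
        }
        head[u] = prev;
    }
};
```

After adding edges 0->1, 0->2, 0->3 in that order, children(0) lists 3, 2, 1; calling reverseList(0) restores 1, 2, 3. Reversing every list once costs time linear in the number of edges.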

Not counting the suffix array part, the function that builds the suffix tree runs in O(n). Although there are two nested loops, each edge of the suffix tree is visited only once when it is created and once more when the climb backtracks over it; after that it is never visited again. Since the suffix tree has no more than 2n-2 edges, the total work is linear.

Note that a suffix tree is usually built after appending a terminal character to the string, to prevent one suffix from being a prefix of another suffix. This can change the answer in some problems, so the implementation here does not do it; instead it simply marks each node that is the end of a suffix.

The code is as follows. Because the suffix array is built with the doubling algorithm, it is on the slow side. If speed matters, replace it with the O(n) suffix array induced sorting algorithm (SA-IS); SA-IS plus the build function actually runs very fast.

#include <iostream>
#include <algorithm>
#include <cstdio>
#include <cstring>
#include <assert.h>
#define rep(i,a,b) for (int i=a;i<=b;++i)
#define erp(i,a,b) for (int i=a;i>=b;--i)
#define LL long long
using namespace std;
const int MAXN = 100005;
const int MAXS = MAXN * 2;
char s[MAXN];
int n;
int r[MAXN], rnk[MAXN], height[MAXN], sa[MAXN];

namespace SA
{
int wa[MAXN], wb[MAXN], ws[MAXN], wv[MAXN];
bool cmp(int *r, int a, int b, int l) {return r[a]==r[b] && r[a+l]==r[b+l];}
void da(int *r, int n, int m)
{
    int i, j, p, *x=wa, *y=wb;
    for (i=0;i<m;++i) ws[i]=0;
    for (i=0;i<n;++i) ws[x[i]=r[i]]++;
    for (i=1;i<m;++i) ws[i] += ws[i-1];
    for (i=n-1;i>=0;--i) sa[--ws[x[i]]] = i;
    for (j=1; j<n; j*=2, m=p)
    {
        for (p=0,i=n-j;i<n;++i) y[p++]=i;
        for (i=0;i<n;++i) if (sa[i]>=j) y[p++]=sa[i]-j;
        for (i=0;i<m;++i) ws[i]=0;
        for (i=0;i<n;++i) wv[i]=x[y[i]];
        for (i=0;i<n;++i) ws[wv[i]]++;
        for (i=1;i<m;++i) ws[i]+=ws[i-1];
        for (i=n-1;i>=0;--i) sa[--ws[wv[i]]] = y[i];
        for (swap(x, y),i=p=1,x[sa[0]]=0; i<n;++i)
            x[sa[i]] = cmp(y,sa[i],sa[i-1],j) ? p-1 : p++;
    }
}
void getheight(int *r, int n)
{
    rep(i, 1, n) rnk[sa[i]]=i;
    for (int i=0,j=0,k=0; i<n; height[rnk[i++]]=k)
        for (k?k--:0,j=sa[rnk[i]-1]; r[i+k]==r[j+k]; ++k);
}
}

// Node 1 is the root by default
int last = 1, lastlen, ncnt = 1;
// These arrays are sized MAXS because the tree can have up to 2n nodes
int fa[MAXS];  // parent in the suffix tree
int dis[MAXS]; // node depth (length of the string from the root)
bool en[MAXS]; // marks whether a node is the end of a suffix
struct Ed {
    int to, len; Ed *nxt;
} edges[MAXS], *ecnt = edges, *adj[MAXS];
void adde(int a, int b, int c) // directed edge from parent a to child b
{
    fa[b] = a; // note: also records the parent
    dis[b] = dis[a] + c;
    (++ecnt)->to = b;
    ecnt->len = c;
    ecnt->nxt = adj[a];
    adj[a] = ecnt;
}

void buildsuffixtree()
{
    for (int i = 0; i<n; ++i) r[i] = s[i]+1;
    // The alphabet bound was garbled in the source; 256 is assumed here,
    // which covers ASCII input (values s[i]+1 <= 128).
    SA::da(r, n+1, 256), SA::getheight(r, n);
    adde(1, ++ncnt, n-sa[1]);
    last = ncnt;
    en[last] = 1;
    for (int i = 2; i<=n; ++i)
    {
        int h = height[i];
        int p = last, nowlen = n-sa[i]-height[i];
        int np = ++ncnt;
        while (dis[p] > h) p = fa[p];
        int br = h-dis[p];
        if (br) // split the edge
        {
            int q = ++ncnt, t = adj[p]->to;
            int len = adj[p]->len; // keep the original edge's length and target
            // Create a node in the middle of the original edge
            // so that its depth is exactly height[i]
            fa[q] = p;
            adj[p]->len = br;
            adj[p]->to = q;
            dis[q] = dis[p] + br;
            adde(q, t, len-br);
            p = q;
        }
        adde(p, np, nowlen);
        last = np;
        en[np] = 1;
    }
}

int main()
{
    scanf("%s", s), n = strlen(s);
    buildsuffixtree();
    return 0;
}

Practice problem: BZOJ 4199, NOI2015 "Wine Tasting Conference"

Analysis: the maximum product must be either the largest value times the second largest, or the smallest value times the second smallest. It is enough to record these four values during the tree DP.
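The observation about the maximum product can be sketched as a small mergeable summary. The struct Extremes and its members are hypothetical names for illustration, and the sketch assumes the values are small enough (for instance at most 10^9 in absolute value) that the product of two of them fits in a long long.

```cpp
#include <algorithm>
#include <cassert>
#include <limits>

// Hypothetical summary of a multiset: its two largest and two smallest values.
struct Extremes {
    long long max1 = std::numeric_limits<long long>::min();
    long long max2 = std::numeric_limits<long long>::min();
    long long min1 = std::numeric_limits<long long>::max();
    long long min2 = std::numeric_limits<long long>::max();
    int count = 0;

    void pushMax(long long x) {
        if (x > max1) { max2 = max1; max1 = x; }
        else if (x > max2) { max2 = x; }
    }
    void pushMin(long long x) {
        if (x < min1) { min2 = min1; min1 = x; }
        else if (x < min2) { min2 = x; }
    }
    void add(long long x) { pushMax(x); pushMin(x); ++count; }

    // Combine with a child's summary; unset sentinel slots are no-ops.
    void merge(const Extremes& o) {
        pushMax(o.max1); pushMax(o.max2);
        pushMin(o.min1); pushMin(o.min2);
        count += o.count;
    }

    // Best product of two distinct elements; requires count >= 2.
    long long bestProduct() const {
        assert(count >= 2);
        return std::max(max1 * max2, min1 * min2);
    }
};
```

In a tree DP, each node would merge the summaries of its children and query bestProduct only when its subtree holds at least two values, so the sentinel slots are never multiplied.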

Result: without fread/fwrite and without any constant-factor optimization, the running time barely makes the first page of the BZOJ ranking, which is enough to show that the approach is reliable.

The suffix array here is built with the SA-IS algorithm. I do not understand that algorithm, so, embarrassingly, I simply pasted an SA-IS template.


#include <iostream>
#include <algorithm>
#include <cstdio>
#include <cstring>
#include <assert.h>
#define rep(i,a,b) for (int i=a;i<=b;++i)
#define erp(i,a,b) for (int i=a;i>=b;--i)
#define LL long long
using namespace std;
const int MAXN = 300005;
const int MAXS = MAXN * 2;
const int inf = 0x3f3f3f3f;
const LL INF = 1ll<<61;

// Fast integer reader
template<typename T>inline void get(T &r)
{
    char c, f=0; r=0;
    do {c=getchar(); if (c=='-') f=1;} while (c<'0'||c>'9');
    do r=r*10+c-'0', c=getchar(); while (c>='0'&&c<='9');
    if (f) r=-r;
}

char s[MAXN];
int n;
int rnk[MAXN], height[MAXN], sa[MAXN];

namespace SA {
int s[MAXS], t[MAXS];
int p[MAXN], cnt[MAXN], cur[MAXN];
#define pushS(x) sa[cur[s[x]]--] = x
#define pushL(x) sa[cur[s[x]]++] = x
#define inducedSort(v) fill_n(sa, n, -1); fill_n(cnt, m, 0);                  \
    for (int i = 0; i < n; i++) cnt[s[i]]++;                                  \
    for (int i = 1; i < m; i++) cnt[i] += cnt[i-1];                           \
    for (int i = 0; i < m; i++) cur[i] = cnt[i]-1;                            \
    for (int i = n1-1; ~i; i--) pushS(v[i]);                                  \
    for (int i = 1; i < m; i++) cur[i] = cnt[i-1];                            \
    for (int i = 0; i < n; i++) if (sa[i] > 0 &&  t[sa[i]-1]) pushL(sa[i]-1); \
    for (int i = 0; i < m; i++) cur[i] = cnt[i]-1;                            \
    for (int i = n-1; ~i; i--) if (sa[i] > 0 && !t[sa[i]-1]) pushS(sa[i]-1)
void sais(int n, int m, int *s, int *t, int *p)
{
    int n1 = t[n-1] = 0, ch = rnk[0] = -1, *s1 = s+n;
    for (int i = n-2; ~i; i--) t[i] = s[i] == s[i+1] ? t[i+1] : s[i] > s[i+1];
    for (int i = 1; i < n; i++) rnk[i] = t[i-1] && !t[i] ? (p[n1] = i, n1++) : -1;
    inducedSort(p);
    for (int i = 0, x, y; i < n; i++) if (~(x = rnk[sa[i]])) {
        if (ch < 1 || p[x+1] - p[x] != p[y+1] - p[y]) ch++;
        else for (int j = p[x], k = p[y]; j <= p[x+1]; j++, k++)
            if ((s[j]<<1|t[j]) != (s[k]<<1|t[k])) {ch++; break;}
        s1[y = x] = ch;
    }
    if (ch+1 < n1) sais(n1, ch+1, s1, t+n, p+n1);
    else for (int i = 0; i < n1; i++) sa[s1[i]] = i;
    for (int i = 0; i < n1; i++) s1[i] = p[sa[i]];
    inducedSort(s1);
}
template<typename T>
int mapCharToInt(int n, const T *str)
{
    int m = *max_element(str, str+n);
    fill_n(rnk, m+1, 0);
    for (int i = 0; i < n; i++) rnk[str[i]] = 1;
    for (int i = 0; i < m; i++) rnk[i+1] += rnk[i];
    for (int i = 0; i < n; i++) s[i] = rnk[str[i]] - 1;
    return rnk[m];
}
// str[n] must be the unique smallest character (here the terminating '\0')
template<typename T>
void suffixArray(int n, const T *str)
{
    int m = mapCharToInt(++n, str);
    sais(n, m, s, t, p);
    for (int i = 0; i < n; i++) rnk[sa[i]] = i;
    for (int i = 0, h = height[0] = 0; i < n-1; i++) {
        int j = sa[rnk[i]-1];
        while (i+h < n && j+h < n && s[i+h] == s[j+h]) h++;
        if ((height[rnk[i]] = h)) h--;
    }
}
};

int last = 1, lastlen, ncnt = 1;
int fa[MAXS], dis[MAXS], en[MAXS];
LL a[MAXS];
LL bg1[MAXS], bg2[MAXS], sm1[MAXS], sm2[MAXS]; // two largest / two smallest values in the subtree
LL cnt[MAXS], mx[MAXS], f[MAXS], g[MAXS];
struct Ed {
    int to, len; Ed *nxt;
} edges[MAXS], *ecnt = edges, *adj[MAXS];
void adde(int a, int b, int c)
{
    fa[b] = a;
    dis[b] = dis[a] + c;
    (++ecnt)->to = b;
    ecnt->len = c;
    ecnt->nxt = adj[a];
    adj[a] = ecnt;
}

void buildsuffixtree()
{
    SA::suffixArray(n, s);
    adde(1, ++ncnt, n-sa[1]);
    last = ncnt;
    en[last] = 1;
    bg1[last] = sm1[last] = a[sa[1]];
    for (int i = 2; i<=n; ++i)
    {
        int h = height[i];
        int p = last, nowlen = n-sa[i]-height[i];
        int np = ++ncnt;
        while (dis[p] > h) p = fa[p];
        int br = h - dis[p];
        if (br) // split the edge
        {
            int q = ++ncnt, t = adj[p]->to;
            int len = adj[p]->len;
            fa[q] = p;
            adj[p]->len = br;
            adj[p]->to = q;
            dis[q] = dis[p] + br;
            adde(q, t, len-br);
            p = q;
        }
        adde(p, np, nowlen);
        last = np;
        bg1[np] = sm1[np] = a[sa[i]];
        en[np] = 1;
    }
}

// Returns the number of suffix ends in the subtree of u
int dfs(int u)
{
    int siz = en[u], t;
    for (Ed *p = adj[u]; p; p=p->nxt)
    {
        t = dfs(p->to);
        f[u] += 1ll * t * siz; // pairs whose LCP is exactly dis[u]
        siz += t;
        if (bg1[p->to]>bg1[u]) bg2[u]=bg1[u], bg1[u]=bg1[p->to];
        else if (bg1[p->to]>bg2[u]) bg2[u]=bg1[p->to];
        if (bg2[p->to]>bg2[u]) bg2[u] = bg2[p->to];
        if (sm1[p->to]<sm1[u]) sm2[u]=sm1[u], sm1[u]=sm1[p->to];
        else if (sm1[p->to]<sm2[u]) sm2[u]=sm1[p->to];
        if (sm2[p->to]<sm2[u]) sm2[u] = sm2[p->to];
    }
    cnt[dis[u]] += f[u];
    if (siz > 1)
    {
        g[u] = max(sm1[u]*sm2[u], bg1[u]*bg2[u]);
        mx[dis[u]] = max(mx[dis[u]], g[u]);
    }
    return siz;
}

int main()
{
    scanf("%d", &n), scanf("%s", s);
    rep(i, 0, n-1) get(a[i]);
    rep(i, 0, n) mx[i] = -INF;
    rep(i, 1, n*2) bg1[i]=bg2[i]=-INF, sm1[i]=sm2[i]=INF, g[i]=-INF;
    buildsuffixtree();
    dfs(1);
    erp(i, n-2, 0) cnt[i] += cnt[i+1], mx[i] = max(mx[i], mx[i+1]);
    rep(i, 0, n-1) printf("%lld %lld\n", cnt[i], (cnt[i]?mx[i]:0));
    return 0;
}
