How to Choose Between a Suffix Array and a Suffix Automaton

Last Update: 2017-12-13 Source: Internet

Author: User



Known members of the suffix family: suffix tree, suffix array, suffix automaton, suffix cactus, suffix prophecy, suffix splay... The suffix tree can be regarded as the ancestor of the suffix array and the suffix automaton. It is still quite powerful, it is useful for palindrome and lexicographic-order problems, and it can now be built in linear time (although in practice I never used it afterwards). What follows is a comparison of the suffix automaton and the suffix array.

Single-string problems (in all comparisons below, an inequality such as ">=" means "is better than or equal to" and "&&" means "about the same"; these are personal impressions)
- 1 Repeated substrings
  - 1.1 Longest repeated substring, occurrences may overlap: suffix automaton >= suffix array. Both solve it with basic operations, but the automaton code is slightly shorter.
  - 1.2 Longest repeated substring, occurrences may not overlap: suffix array >= suffix automaton. With the suffix array the overlap is easy to check; with the automaton you have to record the occurrence positions of every state.
  - 1.3 Longest substring occurring at least k times, occurrences may overlap: suffix automaton >= suffix array. The suffix array needs an extra binary search; the automaton needs no such check, because the occurrence count of every state comes straight out of one topological pass.
- 2 Substring counting
  - 2.1 Number of distinct substrings: suffix automaton && suffix array. Both are basic applications and easy to implement.
- 3 Cyclic problems
  - 3.1 Smallest repeating unit (minimal period): suffix array; the suffix automaton probably does not work here.
  - 3.2 Repeated substring with the largest number of consecutive repetitions: suffix array
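Entries 1.1, 1.3 and 2.1 all come down to one suffix automaton plus a topological pass over occurrence counts. The following is a minimal sketch, assuming lowercase input; `Sam`, `buildSam`, `distinctSubstrings` and `longestRepeated` are hypothetical names, and the topological order is obtained with `std::sort` by length instead of the counting sort usually seen in contest code.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <numeric>
#include <string>
#include <vector>

// Hypothetical minimal suffix automaton; assumes lowercase input ('a'..'z').
struct Sam {
    std::vector<std::array<int, 26>> next;  // 0 means "no transition" (root is never a target)
    std::vector<int> link, len, occ;        // occ becomes |endpos| after countOccurrences()
    int last = 0;

    Sam() { newState(0, 0); link[0] = -1; }

    int newState(int l, int o) {
        next.push_back({});
        link.push_back(0);
        len.push_back(l);
        occ.push_back(o);
        return static_cast<int>(len.size()) - 1;
    }

    void extend(int c) {
        int cur = newState(len[last] + 1, 1);  // a new end position, counted once
        int p = last;
        for (; p != -1 && !next[p][c]; p = link[p]) next[p][c] = cur;
        if (p == -1) {
            link[cur] = 0;
        } else {
            int q = next[p][c];
            if (len[q] == len[p] + 1) {
                link[cur] = q;
            } else {
                int nq = newState(len[p] + 1, 0);  // a clone owns no end position itself
                next[nq] = next[q];
                link[nq] = link[q];
                for (; p != -1 && next[p][c] == q; p = link[p]) next[p][c] = nq;
                link[q] = link[cur] = nq;
            }
        }
        last = cur;
    }

    // Topological pass, longest states first: add each state's count to its suffix link.
    void countOccurrences() {
        std::vector<int> order(len.size());
        std::iota(order.begin(), order.end(), 0);
        std::sort(order.begin(), order.end(), [&](int a, int b) { return len[a] > len[b]; });
        for (int v : order)
            if (link[v] > 0) occ[link[v]] += occ[v];  // the root (empty string) is skipped
    }
};

Sam buildSam(const std::string& s) {
    Sam sam;
    for (char ch : s) sam.extend(ch - 'a');
    sam.countOccurrences();
    return sam;
}

// 2.1: each non-root state v contributes len[v] - len[link[v]] distinct substrings.
long long distinctSubstrings(const std::string& s) {
    Sam sam = buildSam(s);
    long long total = 0;
    for (std::size_t v = 1; v < sam.len.size(); ++v) total += sam.len[v] - sam.len[sam.link[v]];
    return total;
}

// 1.1 (k = 2) and 1.3: longest substring occurring at least k times, overlaps allowed.
int longestRepeated(const std::string& s, int k) {
    Sam sam = buildSam(s);
    int best = 0;
    for (std::size_t v = 1; v < sam.len.size(); ++v)
        if (sam.occ[v] >= k) best = std::max(best, sam.len[v]);
    return best;
}
```

For example, `distinctSubstrings("aabbabd")` gives 23 and `longestRepeated("aabbabd", 2)` gives 2 (the substring "ab").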
Two-string problems
- 1 Common substrings
  - 1.1 Longest common substring: suffix automaton && suffix array. Both are basic applications.
- 2 Substring counting
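For the two-string entry 1.1, a common suffix automaton approach is to build the automaton of one string and scan the other, falling back along suffix links on a mismatch. This is a sketch under the assumption of lowercase input; `longestCommonSubstring` is a hypothetical name.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: length of the longest common substring of a and b
// (lowercase letters assumed). Builds a suffix automaton of a, then scans b.
int longestCommonSubstring(const std::string& a, const std::string& b) {
    std::vector<std::array<int, 26>> next(1);  // state 0 is the root; 0 also means "no edge"
    std::vector<int> link{-1}, len{0};
    auto newState = [&](int l) {
        next.push_back({});
        link.push_back(0);
        len.push_back(l);
        return static_cast<int>(len.size()) - 1;
    };
    int last = 0;
    for (char ch : a) {
        int c = ch - 'a';
        int cur = newState(len[last] + 1), p = last;
        for (; p != -1 && !next[p][c]; p = link[p]) next[p][c] = cur;
        if (p == -1) {
            link[cur] = 0;
        } else {
            int q = next[p][c];
            if (len[q] == len[p] + 1) {
                link[cur] = q;
            } else {
                int nq = newState(len[p] + 1);
                next[nq] = next[q];
                link[nq] = link[q];
                for (; p != -1 && next[p][c] == q; p = link[p]) next[p][c] = nq;
                link[q] = link[cur] = nq;
            }
        }
        last = cur;
    }
    // v = state of the longest suffix of the scanned prefix of b that occurs in a,
    // matched = its length. On a mismatch, fall back along suffix links.
    int v = 0, matched = 0, best = 0;
    for (char ch : b) {
        int c = ch - 'a';
        while (v != 0 && !next[v][c]) {
            v = link[v];
            matched = len[v];
        }
        if (next[v][c]) {
            v = next[v][c];
            ++matched;
        }
        best = std::max(best, matched);
    }
    return best;
}
```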
Multi-string problems
- 1 Common substrings
  - 1.1 Longest substring that appears in at least k of the strings: generalized suffix automaton >= suffix array (KMP can also find the longest common substring of several strings; which one is faster depends on the data)
  - 1.2 Longest substring that appears at least k times in every string: generalized suffix automaton >= suffix array
  - 1.3 Longest substring that appears in every string, either as is or reversed: generalized suffix automaton ? suffix array

Other

Minimal representation (lexicographically smallest rotation): suffix automaton
Smallest repeating unit (minimal period): suffix array
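The minimal-representation entry can be illustrated with the usual suffix automaton trick: build the automaton of s+s and follow the smallest outgoing edge |s| times from the root. This is a sketch assuming lowercase input; `minimalRotation` is a hypothetical name. (For this task alone, the classic two-pointer minimal-representation algorithm is shorter and needs no automaton.)

```cpp
#include <array>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: lexicographically smallest rotation of s (lowercase assumed).
// Every length-|s| substring of s+s is a rotation of s, so walking |s| steps from the
// root of the suffix automaton of s+s, always along the smallest edge, spells the answer.
std::string minimalRotation(const std::string& s) {
    const std::string t = s + s;
    std::vector<std::array<int, 26>> next(1);  // state 0 is the root; 0 also means "no edge"
    std::vector<int> link{-1}, len{0};
    auto newState = [&](int l) {
        next.push_back({});
        link.push_back(0);
        len.push_back(l);
        return static_cast<int>(len.size()) - 1;
    };
    int last = 0;
    for (char ch : t) {
        int c = ch - 'a';
        int cur = newState(len[last] + 1), p = last;
        for (; p != -1 && !next[p][c]; p = link[p]) next[p][c] = cur;
        if (p == -1) {
            link[cur] = 0;
        } else {
            int q = next[p][c];
            if (len[q] == len[p] + 1) {
                link[cur] = q;
            } else {
                int nq = newState(len[p] + 1);
                next[nq] = next[q];
                link[nq] = link[q];
                for (; p != -1 && next[p][c] == q; p = link[p]) next[p][c] = nq;
                link[q] = link[cur] = nq;
            }
        }
        last = cur;
    }
    std::string out;
    int v = 0;
    for (std::size_t i = 0; i < s.size(); ++i) {
        int c = 0;
        // An edge always exists: any substring of s+s shorter than |s| can be extended.
        while (!next[v][c]) ++c;
        out.push_back(static_cast<char>('a' + c));
        v = next[v][c];
    }
    return out;
}
```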

Personal impressions:

Single-string and two-string problems can basically all be solved with either a suffix array or a suffix automaton. For multi-string problems the generalized suffix automaton is also very strong, whereas with a suffix array you usually need RMQ (a Fenwick tree or a sparse table) plus binary search, and sometimes even a splay tree. Of course, a suffix array used flexibly together with other tools can handle all kinds of difficult problems, and the suffix automaton has its limits. Personally I prefer writing the suffix automaton: it feels easier to implement, and the code looks cleaner.
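The "RMQ on top of the suffix array" machinery mentioned in the paragraph above typically means an LCP array plus a sparse table (ST), so that the LCP of any two suffixes is a range minimum. The sketch below is one way to do it, not the author's code: `SuffixArray` and `lcpOf` are hypothetical names, the suffix array is built by prefix doubling with `std::sort` (O(n log² n), simple rather than fastest), and the LCP array uses Kasai's algorithm.

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: suffix array, Kasai LCP array, and a sparse table over the LCP array.
struct SuffixArray {
    std::vector<int> sa, rank, lcp;    // lcp[r] = LCP(suffix sa[r - 1], suffix sa[r]); lcp[0] = 0
    std::vector<std::vector<int>> st;  // st[j][r] = min(lcp[r .. r + 2^j - 1])

    explicit SuffixArray(const std::string& s) {
        const int n = static_cast<int>(s.size());
        if (n == 0) return;
        sa.resize(n);
        rank.resize(n);
        std::vector<int> tmp(n);
        std::iota(sa.begin(), sa.end(), 0);
        for (int i = 0; i < n; ++i) rank[i] = static_cast<unsigned char>(s[i]);
        for (int k = 1;; k <<= 1) {
            // Sort by (rank of first k chars, rank of next k chars); -1 means "past the end".
            auto key = [&](int i) { return std::make_pair(rank[i], i + k < n ? rank[i + k] : -1); };
            std::sort(sa.begin(), sa.end(), [&](int a, int b) { return key(a) < key(b); });
            tmp[sa[0]] = 0;
            for (int i = 1; i < n; ++i)
                tmp[sa[i]] = tmp[sa[i - 1]] + (key(sa[i - 1]) < key(sa[i]) ? 1 : 0);
            rank = tmp;
            if (rank[sa[n - 1]] == n - 1) break;  // all ranks distinct: fully sorted
        }
        lcp.assign(n, 0);
        for (int i = 0, h = 0; i < n; ++i) {  // Kasai: h drops by at most 1 per step
            if (rank[i] == 0) { h = 0; continue; }
            const int j = sa[rank[i] - 1];
            while (i + h < n && j + h < n && s[i + h] == s[j + h]) ++h;
            lcp[rank[i]] = h;
            if (h > 0) --h;
        }
        st.push_back(lcp);
        for (int j = 1; (1 << j) <= n; ++j) {
            const std::vector<int>& prev = st.back();
            std::vector<int> cur(n - (1 << j) + 1);
            for (int r = 0; r + (1 << j) <= n; ++r)
                cur[r] = std::min(prev[r], prev[r + (1 << (j - 1))]);
            st.push_back(std::move(cur));
        }
    }

    // LCP of the suffixes starting at text positions i and j, as an O(1) range minimum.
    int lcpOf(int i, int j) const {
        if (i == j) return static_cast<int>(sa.size()) - i;
        const int lo = std::min(rank[i], rank[j]) + 1, hi = std::max(rank[i], rank[j]);
        int k = 0;
        while ((2 << k) <= hi - lo + 1) ++k;  // largest k with 2^k <= hi - lo + 1
        return std::min(st[k][lo], st[k][hi - (1 << k) + 1]);
    }
};
```

For "banana" the suffix array is {5, 3, 1, 0, 4, 2} and `lcpOf(1, 3)` is 3 ("anana" vs "ana"); a binary search on the answer length is then usually layered on top of queries like this.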

The following compares the two approaches on multi-string problems.

Problems solved with the generalized suffix automaton:

POJ3294: Given several strings, find the longest substrings that appear in more than half of them (print all of them in lexicographic order, or "?" if there is none).

Comparison: with a suffix array you need binary search on the length plus RMQ; with the generalized suffix automaton you only need to record, for each state, which strings it occurs in, and then make one final pass.

```cpp
// POJ 3294: longest substrings occurring in more than half of the strings
// (all of them, in lexicographic order), using a generalized suffix automaton.
#include <cstdio>
#include <cstring>

const int MAXN = 350003;   // more than 2 * total input length
int n, ans, now;
char s[1010], cap[1010];   // current input string / current output path

struct SAM {
    int ch[MAXN][26], fa[MAXN], maxlen[MAXN], last, sz, root;
    int cnt[MAXN];         // number of distinct input strings containing this state
    int mark[MAXN];        // id of the last string that incremented cnt
    void init() {
        sz = 0; root = ++sz;
        memset(cnt, 0, sizeof(cnt));
        memset(mark, 0, sizeof(mark));
        memset(ch[root], 0, sizeof(ch[root]));
    }
    void add(int x) {
        int np = ++sz, p = last;
        last = np;
        memset(ch[np], 0, sizeof(ch[np]));
        maxlen[np] = maxlen[p] + 1;
        while (p && !ch[p][x]) ch[p][x] = np, p = fa[p];
        if (!p) fa[np] = root;
        else {
            int q = ch[p][x];
            if (maxlen[p] + 1 == maxlen[q]) fa[np] = q;
            else {
                int nq = ++sz;
                memcpy(ch[nq], ch[q], sizeof(ch[q]));
                cnt[nq] = cnt[q]; mark[nq] = mark[q];
                maxlen[nq] = maxlen[p] + 1;
                fa[nq] = fa[q];
                fa[q] = fa[np] = nq;
                while (p && ch[p][x] == q) ch[p][x] = nq, p = fa[p];
            }
        }
        // Walk up the suffix links, counting string `now` at most once per state.
        for (; np; np = fa[np])
            if (mark[np] != now) { cnt[np]++; mark[np] = now; }
            else break;
    }
    // Print every answer string, in lexicographic order.
    void dfs(int x, int d) {
        if (d != maxlen[x] || d > ans) return;
        if (maxlen[x] == ans && cnt[x] > n) { puts(cap); return; }
        for (int i = 0; i < 26; ++i)
            if (ch[x][i]) { cap[d] = i + 'a'; dfs(ch[x][i], d + 1); cap[d] = 0; }
    }
};
SAM sam;

int main() {
    while (scanf("%d", &n) == 1 && n) {
        sam.init();
        for (int i = 1; i <= n; i++) {
            scanf("%s", s + 1);
            sam.last = sam.root;      // restart from the root for each new string
            int len = strlen(s + 1);
            now = i;
            for (int j = 1; j <= len; j++) sam.add(s[j] - 'a');
        }
        ans = 0;
        n >>= 1;                      // "more than half" means cnt > n / 2
        for (int i = 1; i <= sam.sz; i++)
            if (sam.cnt[i] > n && sam.maxlen[i] > ans) ans = sam.maxlen[i];
        if (ans) sam.dfs(sam.root, 0);
        else puts("?");
        puts("");
    }
    return 0;
}
```


SPOJ8093: Given several template strings, for each query string report how many of the template strings contain it.

Comparison: the same idea as above. The information can be propagated in two ways: walk up the suffix links once per inserted character (as in the code below), or record the occurrences in a bitset per state and, once all strings have been added, sort the states topologically and OR the bitsets upward.

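```cpp
// SPOJ 8093: for each query string, count how many template strings contain it.
#include <cstdio>
#include <cstring>

const int N = 200003;
int ch[N][26], fa[N], len[N];
int cnt, last, root, now;
int occ[N];     // number of template strings in which this state occurs
int mark[N];    // id of the last template string that incremented occ
char s[N];

void extend(int c) {
    int p = last, np = ++cnt;
    last = np;
    len[np] = len[p] + 1;
    for (; p && !ch[p][c]; p = fa[p]) ch[p][c] = np;
    if (!p) fa[np] = root;
    else {
        int q = ch[p][c];
        if (len[q] == len[p] + 1) fa[np] = q;
        else {
            int nq = ++cnt;
            len[nq] = len[p] + 1;
            memcpy(ch[nq], ch[q], sizeof ch[nq]);
            occ[nq] = occ[q]; mark[nq] = mark[q];
            fa[nq] = fa[q];
            fa[q] = fa[np] = nq;
            for (; ch[p][c] == q; p = fa[p]) ch[p][c] = nq;
        }
    }
    // Propagate "occurs in string `now`" up the suffix links, stopping at the
    // first state that has already been counted for this string.
    for (; np; np = fa[np])
        if (mark[np] != now) { occ[np]++; mark[np] = now; }
        else break;
}

int main() {
    int n, m;
    scanf("%d%d", &n, &m);
    root = ++cnt;
    for (int i = 1; i <= n; i++) {
        scanf("%s", s + 1);
        last = root;                  // restart from the root for each template
        int l = strlen(s + 1);
        now = i;
        for (int j = 1; j <= l; j++) extend(s[j] - 'a');
    }
    for (int i = 1; i <= m; i++) {
        scanf("%s", s + 1);
        int l = strlen(s + 1), p = root;
        for (int j = 1; j <= l && p; j++) p = ch[p][s[j] - 'a'];  // p == 0: not a substring
        printf("%d\n", occ[p]);
    }
    return 0;
}
```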
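The second propagation method described for SPOJ8093 (a bitset per state, then a topological sort and an OR up the suffix-link tree) could look roughly like the sketch below. Everything here is illustrative: `GeneralSam`, `addString`, `propagate` and `countContaining` are hypothetical names, the number of template strings is capped by an assumed constant `kMaxStrings`, and `std::sort` by length stands in for a counting sort. Unlike the code above, this version reuses or splits an existing transition instead of creating a fresh state when a prefix of a new string is already present: with the original insertion scheme, that extra state can have the same length as its suffix link, and ordering states by length would then no longer be a valid topological order.

```cpp
#include <algorithm>
#include <array>
#include <bitset>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <string>
#include <vector>

// Assumption: at most kMaxStrings template strings, lowercase letters only.
constexpr std::size_t kMaxStrings = 64;

// Hypothetical generalized suffix automaton with one bitset of string ids per state.
struct GeneralSam {
    std::vector<std::array<int, 26>> next;       // 0 means "no transition"
    std::vector<int> link, len;
    std::vector<std::bitset<kMaxStrings>> seen;  // ids of template strings containing the state

    GeneralSam() { newState(0); link[0] = -1; }

    int newState(int l) {
        next.push_back({});
        link.push_back(0);
        len.push_back(l);
        seen.emplace_back();
        return static_cast<int>(len.size()) - 1;
    }

    // Split q (reached from p by c) so that the clone has length len[p] + 1.
    int split(int p, int q, int c) {
        const int nq = newState(len[p] + 1);
        next[nq] = next[q];
        link[nq] = link[q];
        link[q] = nq;
        for (; p != -1 && next[p][c] == q; p = link[p]) next[p][c] = nq;
        return nq;
    }

    // Appends c after state `last`; returns the state of the extended prefix.
    int extend(int last, int c) {
        if (const int q = next[last][c]) {  // edge exists: reuse q or split it
            return len[q] == len[last] + 1 ? q : split(last, q, c);
        }
        const int cur = newState(len[last] + 1);
        int p = last;
        for (; p != -1 && !next[p][c]; p = link[p]) next[p][c] = cur;
        int target = 0;
        if (p != -1) {
            const int q = next[p][c];
            target = (len[q] == len[p] + 1) ? q : split(p, q, c);
        }
        link[cur] = target;
        return cur;
    }

    void addString(const std::string& s, std::size_t id) {
        int last = 0;
        for (char ch : s) {
            last = extend(last, ch - 'a');
            seen[last].set(id);  // this prefix of string `id` ends in state `last`
        }
    }

    // Topological order = decreasing len; OR every state's bits into its suffix link.
    void propagate() {
        std::vector<int> order(len.size());
        std::iota(order.begin(), order.end(), 0);
        std::sort(order.begin(), order.end(), [&](int a, int b) { return len[a] > len[b]; });
        for (int v : order)
            if (link[v] >= 0) seen[link[v]] |= seen[v];
    }

    // Number of template strings that contain t.
    std::size_t countContaining(const std::string& t) const {
        int v = 0;
        for (char ch : t) {
            v = next[v][ch - 'a'];
            if (v == 0) return 0;
        }
        return seen[v].count();
    }
};
```

Usage: call `addString` once per template with ids 0, 1, 2, ..., call `propagate()` once, then answer each query with `countContaining`.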


(I am not yet very comfortable with the suffix-array solutions to these problems; I will add them after more practice.)

By the way, here is the state table of the suffix automaton for the string aabbabd (positions numbered from 1):

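| State | Substrings | endpos |
|---|---|---|
| S | empty string | {0,1,2,3,4,5,6,7} |
| 1 | a | {1,2,5} |
| 2 | aa | {2} |
| 3 | aab | {3} |
| 4 | aabb, abb, bb | {4} |
| 5 | b | {3,4,6} |
| 6 | aabba, abba, bba, ba | {5} |
| 7 | aabbab, abbab, bbab, bab | {6} |
| 8 | ab | {3,6} |
| 9 | aabbabd, abbabd, bbabd, babd, abd, bd, d | {7} |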


This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only.
