How to choose between a suffix array and a suffix automaton

Source: Internet
Author: User

Known members of the suffix family: suffix tree, suffix array, suffix automaton, suffix cactus, suffix prophecy, suffix splay? The suffix tree can be seen as the ancestor of the suffix array and the suffix automaton. It is still quite powerful and is useful for palindrome and lexicographic-order problems, and it can now be built in linear time. (In practice, though, I have not used it since.) Below is a comparison of the suffix automaton and the suffix array.
  • Single-string problems (in the comparisons below, an inequality sign means "is better than" and "&&" means "about the same"; these are personal impressions)
    • 1 Repeated substrings
      • 1.1 Longest repeated substring, occurrences may overlap: suffix automaton >= suffix array. Both are basic applications, but the suffix automaton code is slightly shorter.
      • 1.2 Longest repeated substring, occurrences may not overlap: suffix array >= suffix automaton. With the suffix array, overlap is easy to check; the suffix automaton has to track the occurrence positions of every state.
      • 1.3 Longest substring occurring at least K times, occurrences may overlap: suffix automaton >= suffix array. The suffix array needs a binary search on the answer; the suffix automaton needs no such check, because a topological pass directly gives the number of occurrences of each state.
    • 2 Substring counting problems
      • 2.1 Number of distinct substrings: suffix automaton && suffix array. Both are basic applications and easy to implement.
    • 3 Cyclic substring problems
      • 3.1 Minimum cycle: suffix array; the suffix automaton probably does not work here.
      • 3.2 Substring with the largest number of consecutive repetitions: suffix array
  • Two-string problems
    • 1 Common substring problems
      • 1.1 Longest common substring: suffix automaton && suffix array. Both are basic applications.
    • 2 Substring counting problems
      • 2.1 Common substrings of a given length: suffix automaton && suffix array. Both are basic applications.
  • Multi-string problems
    • 1 Common substring problems
      • 1.1 Longest substring that appears in at least K strings: generalized suffix automaton >= suffix array. (KMP can also find the longest common substring of several strings; which approach is faster depends on the data.)
      • 1.2 Longest substring that appears at least k times in every string: generalized suffix automaton >= suffix array
      • 1.3 Longest substring that appears in every string, either as is or reversed: generalized suffix automaton ? suffix array
    • Other
      • Minimal representation (lexicographically smallest rotation): suffix automaton
      • Minimum cycle: suffix array
Personal impressions:

Single-string and two-string problems can basically all be solved with either a suffix array or a suffix automaton. For multi-string problems the generalized suffix automaton is also very strong, whereas with a suffix array you usually need RMQ (a Fenwick tree or a sparse table) plus binary search, and sometimes even a splay tree. Of course, a suffix array combined flexibly with other tools can handle all kinds of difficult problems; the suffix automaton has its limits, after all. Personally, I prefer writing suffix automata: they feel easier to implement, and the code looks cleaner.

The following compares how multi-string problems are handled.

Problems solved with the generalized suffix automaton:

POJ3294: Problem: given several strings, find the longest substrings that appear in more than half of them (print all of them in lexicographic order, or "?" if there is none).

Comparison: with a suffix array this takes binary search + RMQ, while the generalized suffix automaton only needs to record which strings each state occurs in, and a final pass over the states is enough.

#include <cstdio>
#include <cstring>

#define MAXN 350003
int n, len, ans, now;
char s[1010], cap[1010];   // current input string; current DFS path (answer buffer)

struct SAM {
    int ch[MAXN][26], fa[MAXN], maxlen[MAXN], last, sz;
    // size[v]: number of input strings in which state v occurs; nxt[v]: last string that counted v
    int root, nxt[MAXN], size[MAXN];
    void init() {
        sz = 0; root = ++sz;
        memset(size, 0, sizeof(size));
        memset(ch[1], 0, sizeof(ch[1]));
        memset(nxt, 0, sizeof(nxt));
    }
    void add(int x) {
        int np = ++sz, p = last;
        last = np;
        memset(ch[np], 0, sizeof(ch[np]));
        maxlen[np] = maxlen[p] + 1;
        while (p && !ch[p][x]) ch[p][x] = np, p = fa[p];
        if (!p) fa[np] = 1;
        else {
            int q = ch[p][x];
            if (maxlen[p] + 1 == maxlen[q]) fa[np] = q;
            else {
                int nq = ++sz;
                memcpy(ch[nq], ch[q], sizeof(ch[q]));
                size[nq] = size[q]; nxt[nq] = nxt[q];   // the clone inherits q's occurrence info
                maxlen[nq] = maxlen[p] + 1;
                fa[nq] = fa[q];
                fa[q] = fa[np] = nq;
                while (p && ch[p][x] == q) ch[p][x] = nq, p = fa[p];
            }
        }
        // Walk up the suffix links and count string `now` once per state;
        // stop at the first state already counted for this string.
        for (; np; np = fa[np])
            if (nxt[np] != now) { size[np]++; nxt[np] = now; }
            else break;
    }
    // Print, in lexicographic order, every string of length ans occurring in more than n strings
    // (n is already halved in main). Only follow paths whose spelled string is the longest of its state.
    void dfs(int x, int d) {
        if (d != maxlen[x] || d > ans) return;
        if (maxlen[x] == ans && size[x] > n) { puts(cap); return; }
        for (int i = 0; i < 26; ++i)
            if (ch[x][i]) { cap[d] = i + 'a'; dfs(ch[x][i], d + 1); cap[d] = 0; }
    }
};
SAM sam;

int main() {
    while (~scanf("%d", &n) && n) {
        sam.init();
        for (int i = 1; i <= n; i++) {
            scanf("%s", s + 1);
            sam.last = sam.root;              // restart from the root for every string
            len = strlen(s + 1);
            now = i;
            for (int j = 1; j <= len; j++) sam.add(s[j] - 'a');
        }
        ans = 0;
        n >>= 1;                              // "more than half" becomes size > n
        for (int i = 1; i <= sam.sz; i++)
            if (sam.size[i] > n && sam.maxlen[i] > ans) ans = sam.maxlen[i];
        if (ans) sam.dfs(1, 0);
        else puts("?");
        puts("");
    }
    return 0;
}

SPOJ8093: Problem: given several template strings, report for each query string how many of the template strings it appears in.

Comparison: same idea as above. There are two ways to pass the information up: walk up the suffix links every time a character is added (as in the code below), or use a bitset per state to record which strings it occurs in, and once all strings have been added, sort the states topologically and OR the bitsets upward.

#include <cstdio>
#include <cstring>

#define N 200003
int ch[N][26], fa[N], l[N], n, m, len;
// size[v]: number of template strings containing state v; nxt[v]: last template that counted v
int cnt, np, p, nq, q, last, root, nxt[N], now, size[N];
char s[N];

void extend(int x) {
    int c = s[x] - 'a';
    p = last; np = ++cnt; last = np; l[np] = l[p] + 1;
    for (; p && !ch[p][c]; p = fa[p]) ch[p][c] = np;
    if (!p) fa[np] = root;
    else {
        q = ch[p][c];
        if (l[q] == l[p] + 1) fa[np] = q;
        else {
            nq = ++cnt; l[nq] = l[p] + 1;
            memcpy(ch[nq], ch[q], sizeof ch[nq]);
            size[nq] = size[q]; nxt[nq] = nxt[q];   // the clone inherits q's occurrence info
            fa[nq] = fa[q]; fa[q] = fa[np] = nq;
            for (; p && ch[p][c] == q; p = fa[p]) ch[p][c] = nq;
        }
    }
    // Count the current template once for every state on the suffix-link path.
    for (; np; np = fa[np])
        if (nxt[np] != now) { size[np]++; nxt[np] = now; }
        else break;
}

int main() {
    scanf("%d%d", &n, &m);
    root = ++cnt;
    for (int i = 1; i <= n; i++) {              // build the generalized SAM over all templates
        scanf("%s", s + 1);
        last = root; len = strlen(s + 1); now = i;
        for (int j = 1; j <= len; j++) extend(j);
    }
    for (int i = 1; i <= m; i++) {              // answer each query by walking from the root
        scanf("%s", s + 1);
        len = strlen(s + 1); p = root;
        for (int j = 1; j <= len; j++) p = ch[p][s[j] - 'a'];   // p becomes 0 (and stays 0) if absent
        printf("%d\n", size[p]);
    }
    return 0;
}

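(I am not yet very familiar with the suffix-array solutions to these problems; I will add more after doing more problems.)

By the way, here is the suffix automaton of the string aabbabd, listed state by state (end positions are counted from 1, and 0 stands for the position before the first character):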

State  Substrings                                  Endpos
S      (empty string)                              {0,1,2,3,4,5,6,7}
1      a                                           {1,2,5}
2      aa                                          {2}
3      aab                                         {3}
4      aabb, abb, bb                               {4}
5      b                                           {3,4,6}
6      aabba, abba, bba, ba                        {5}
7      aabbab, abbab, bbab, bab                    {6}
8      ab                                          {3,6}
9      aabbabd, abbabd, bbabd, babd, abd, bd, d    {7}
