"Algorithm" Suffix Automaton (SAM)

Source: Internet
Author: User


The purpose of a finite-state automaton is to recognize strings: if automaton $A$ accepts string $s$, we write $A(s) = \text{true}$; otherwise $A(s) = \text{false}$.

An automaton consists of an alphabet $\Sigma$ (character set), a set of states $\mathit{State}$, an initial state $\mathit{init}$, a set of accepting (end) states $\mathit{End}$, and a transition function $\mathit{trans}$.

$\mathit{trans}(s, str)$ denotes the state reached by starting in state $s$ and reading the character or string $str$ (it is undefined if some character along the way has no transition).

A string $x$ is recognized starting from state $s$ if and only if $\mathit{trans}(s, x) \in \mathit{End}$.
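To make these definitions concrete, here is a small sketch of a deterministic automaton with a partial transition function. The `Automaton` type and its `run`/`accepts` members are hypothetical names, and returning `-1` for an undefined transition is an assumed convention, not something the article specifies.

```cpp
#include <map>
#include <set>
#include <string>
#include <utility>

// Hypothetical DFA type: states are ints, trans is a partial function.
struct Automaton {
    int init = 0;
    std::set<int> endStates;                    // accepting (end) states
    std::map<std::pair<int, char>, int> trans;  // (state, char) -> next state

    // trans(s, str): the state reached from s after reading str,
    // or -1 (assumed "undefined" marker) if some character has no transition.
    int run(int s, const std::string& str) const {
        for (char c : str) {
            auto it = trans.find(std::make_pair(s, c));
            if (it == trans.end()) return -1;
            s = it->second;
        }
        return s;
    }

    // A(x) = true iff trans(init, x) is an accepting state.
    bool accepts(const std::string& x) const {
        int s = run(init, x);
        return s != -1 && endStates.count(s) > 0;
    }
};
```

For example, with $\mathit{End} = \{0\}$ and transitions $0 \xrightarrow{a} 0$, $0 \xrightarrow{b} 1$, $1 \xrightarrow{a} 1$, $1 \xrightarrow{b} 0$, this automaton accepts exactly the strings over $\{a, b\}$ with an even number of $b$.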

"Suffix Automaton (SAM)"

A SAM (suffix automaton) is an automaton that recognizes exactly the suffixes of a string $s$.

That is, $\mathit{SAM}(x) = \text{true}$ if and only if $x$ is a suffix of $s$.
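As a concrete check of this definition, the sketch below builds a suffix automaton with the incremental construction described later in this article and marks the accepting states. It is a hedged illustration rather than the author's code: `SAM`, `buildSAM`, and `accepts` are hypothetical names, the alphabet is assumed to be `'a'..'z'`, `-1` stands for a missing transition or parent, and marking $\mathit{st}(s)$ plus its ancestors in the parent tree as accepting is a standard step the article does not spell out.

```cpp
#include <array>
#include <string>
#include <vector>

// Sketch of a suffix automaton over 'a'..'z' (assumed alphabet); -1 = none.
struct SAM {
    struct Node {
        int len = 0;                  // length of the longest string in the state
        int fa = -1;                  // parent in the parent tree; -1 for init
        bool terminal = false;        // accepting state?
        std::array<int, 26> trans;
        Node() { trans.fill(-1); }
    };
    std::vector<Node> st{Node()};     // state 0 is init
    int last = 0;                     // st(T) for the text read so far

    void extend(int ch) {
        int np = (int)st.size();
        st.emplace_back();
        st[np].len = st[last].len + 1;
        int p = last;
        last = np;
        while (p != -1 && st[p].trans[ch] == -1) {  // no x edge: point it to np
            st[p].trans[ch] = np;
            p = st[p].fa;
        }
        if (p == -1) { st[np].fa = 0; return; }
        int q = st[p].trans[ch];
        if (st[p].len + 1 == st[q].len) { st[np].fa = q; return; }
        Node clone = st[q];                         // split q: clone keeps q's edges
        clone.len = st[p].len + 1;
        int nq = (int)st.size();
        st.push_back(clone);
        st[q].fa = st[np].fa = nq;
        while (p != -1 && st[p].trans[ch] == q) {   // redirect edges into the clone
            st[p].trans[ch] = nq;
            p = st[p].fa;
        }
    }

    // Accepting states: st(s) and its ancestors, i.e. the states whose
    // right sets contain |s|, which are exactly the states holding suffixes.
    void markTerminals() {
        for (int v = last; v != -1; v = st[v].fa) st[v].terminal = true;
    }

    bool accepts(const std::string& x) const {
        int v = 0;
        for (char c : x) {
            if (c < 'a' || c > 'z') return false;
            v = st[v].trans[c - 'a'];
            if (v == -1) return false;
        }
        return st[v].terminal;
    }
};

SAM buildSAM(const std::string& s) {
    SAM sam;
    for (char c : s) sam.extend(c - 'a');
    sam.markTerminals();
    return sam;
}
```

With `buildSAM("abcbc")`, `accepts` returns true for `""`, `"c"`, `"bc"`, and `"abcbc"`, and false for substrings that are not suffixes such as `"b"` and `"cb"`.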

"Minimal suffix automaton"

The minimal suffix automaton is the suffix automaton with the fewest states; its size is $O(n)$, where $n = |s|$.

Let $\mathit{st}(a) = \mathit{trans}(\mathit{init}, a)$.

If the string $a$ occurs in $s$ at $[l, r)$, then $\mathit{st}(a)$ recognizes the suffix of $s$ that starts at position $r$.

If $a$ occurs in $s$ at $[l_1, r_1), [l_2, r_2), \dots, [l_k, r_k)$, then the strings recognized from $\mathit{st}(a)$ are exactly $\{\mathit{suffix}(r_1), \mathit{suffix}(r_2), \dots, \mathit{suffix}(r_k)\}$.

Let $\mathit{right}(a) = \{r_1, r_2, \dots, r_k\}$. The set of strings recognized from $\mathit{st}(a)$ is then entirely determined by $\mathit{right}(a)$, so in the minimal automaton $\mathit{right}(a) = \mathit{right}(b)$ implies $\mathit{st}(a) = \mathit{st}(b)$.
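The $\mathit{right}$ set can be computed by brute force for small examples. This sketch uses the article's convention that an occurrence $[l, r)$ contributes its exclusive end $r$ (0-indexed); the function name `rightSet` is hypothetical and the quadratic scan is for illustration only.

```cpp
#include <cstddef>
#include <set>
#include <string>

// right(a): every r such that a occurs in s at [r - |a|, r), 0-indexed with an
// exclusive end. Brute force, for illustration only.
std::set<int> rightSet(const std::string& s, const std::string& a) {
    std::set<int> r;
    for (std::size_t l = 0; l + a.size() <= s.size(); ++l)
        if (s.compare(l, a.size(), a) == 0) r.insert((int)(l + a.size()));
    return r;
}
```

For $s = aaabaaaabaa$, both $baa$ and $aaabaa$ have $\mathit{right} = \{6, 11\}$, so they belong to the same state, while $aa$ has $\{2, 3, 6, 7, 8, 11\}$.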

So a state $v$ consists of all substrings whose $\mathit{right}$ set equals $\mathit{right}(v)$.

Conversely, a $\mathit{right}$ set together with a length determines at most one substring: the one of that length ending at the positions in the set.

The lengths of the substrings that share a $\mathit{right}$ set form an interval. In other words, if lengths $l_1 \le l_2$ both fit a $\mathit{right}$ set $R$, then every length $x$ with $l_1 \le x \le l_2$ fits it too. Let $a_{l_2}$ be the string of length $l_2$ with $\mathit{right}(a_{l_2}) = R$, and let $a_x$ and $a_{l_1}$ be its suffixes of lengths $x$ and $l_1$ (so $\mathit{right}(a_{l_1}) = R$). If $a_x$ occurred ending at a position outside $R$, its suffix $a_{l_1}$ would end there too, contradicting $\mathit{right}(a_{l_1}) = R$. If $a_x$ did not end at some position of $R$, then $a_{l_2}$, which has $a_x$ as a suffix, could not end there either, contradicting $\mathit{right}(a_{l_2}) = R$. Hence $\mathit{right}(a_x) = R$.

So $[\min(v), \max(v)]$ denotes the range of lengths of the substrings in state $v$.
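The interval property can be checked by brute force on small strings. This hedged sketch (the name `rightClassesAreIntervals` is hypothetical) groups every distinct non-empty substring by its $\mathit{right}$ set, then verifies that each group's members are all suffixes of the group's longest string and that their lengths fill $[\min, \max]$ without gaps.

```cpp
#include <cstddef>
#include <map>
#include <set>
#include <string>

// Brute force: group substrings of s by right set and check that each group
// is {suffixes of its longest string with length in [min, max]}.
bool rightClassesAreIntervals(const std::string& s) {
    std::map<std::set<int>, std::set<std::string>> groups;
    const std::size_t n = s.size();
    for (std::size_t l = 0; l < n; ++l)
        for (std::size_t len = 1; l + len <= n; ++len) {
            std::string a = s.substr(l, len);
            std::set<int> right;
            for (std::size_t i = 0; i + len <= n; ++i)
                if (s.compare(i, len, a) == 0) right.insert((int)(i + len));
            groups[right].insert(a);
        }
    for (const auto& g : groups) {
        const std::set<std::string>& strs = g.second;
        std::string longest;
        std::size_t shortest = n + 1;
        for (const auto& a : strs) {
            if (a.size() > longest.size()) longest = a;
            if (a.size() < shortest) shortest = a.size();
        }
        // Every member must be a suffix of the longest (so lengths are distinct)...
        for (const auto& a : strs)
            if (longest.compare(longest.size() - a.size(), a.size(), a) != 0)
                return false;
        // ...and distinct lengths filling [min, max] means exactly max-min+1 members.
        if (strs.size() != longest.size() - shortest + 1) return false;
    }
    return true;
}
```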

"Proof that the number of states is linear"

Take two distinct states $a$ and $b$, and suppose that $\mathit{right}(a)$ and $\mathit{right}(b)$ intersect.

Since $a \neq b$, the length intervals $[\min(a), \max(a)]$ and $[\min(b), \max(b)]$ do not intersect: if they shared a length, both states would contain the substring of that length ending at a common position, so their $\mathit{right}$ sets would be equal and they would be the same state.

Without loss of generality let $\min(a) > \max(b)$. Then every substring represented by $b$ is shorter than every substring represented by $a$, and at a common position they end at the same place, so every substring of $b$ is a suffix of the substrings of $a$. Hence wherever the strings of $a$ occur, the strings of $b$ occur too, so $\mathit{right}(a) \subseteq \mathit{right}(b)$, and since $a \neq b$ the inclusion is proper: $\mathit{right}(a) \subsetneq \mathit{right}(b)$.

So for any two states, either their $\mathit{right}$ sets are disjoint, or one is a proper subset of the other.

Ordering the $\mathit{right}$ sets of all states by inclusion therefore yields a tree, which we call the $\mathit{parent}$ tree.

In this tree, a node with at most one child must contain a position that none of its children contains (the inclusions are proper), so it is the deepest node containing that position. Each of the $n + 1$ positions has only one deepest node, so at most $n + 1$ nodes have fewer than two children; since a tree has fewer branching nodes than leaves, the total number of nodes is $O(n)$.
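Both claims can be checked by brute force on small inputs. In this sketch (`rightClasses` and `isLaminar` are hypothetical names), the distinct $\mathit{right}$ sets of all non-empty substrings are collected, every pair is tested for being disjoint or properly nested, and the number of classes plus one (for $\mathit{init}$) can be compared with the $2n$ bound mentioned later in the article.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <set>
#include <string>
#include <vector>

// Distinct right sets of all non-empty substrings of s (brute force).
std::vector<std::set<int>> rightClasses(const std::string& s) {
    std::set<std::set<int>> classes;
    const std::size_t n = s.size();
    for (std::size_t l = 0; l < n; ++l)
        for (std::size_t len = 1; l + len <= n; ++len) {
            std::set<int> right;
            for (std::size_t i = 0; i + len <= n; ++i)
                if (s.compare(i, len, s, l, len) == 0) right.insert((int)(i + len));
            classes.insert(right);
        }
    return std::vector<std::set<int>>(classes.begin(), classes.end());
}

// True if every pair of (distinct) sets is disjoint or one contains the other.
bool isLaminar(const std::vector<std::set<int>>& sets) {
    for (std::size_t i = 0; i < sets.size(); ++i)
        for (std::size_t j = i + 1; j < sets.size(); ++j) {
            const std::set<int>& a = sets[i];
            const std::set<int>& b = sets[j];
            std::vector<int> common;
            std::set_intersection(a.begin(), a.end(), b.begin(), b.end(),
                                  std::back_inserter(common));
            bool disjoint = common.empty();
            bool nested = common.size() == std::min(a.size(), b.size());
            if (!disjoint && !nested) return false;
        }
    return true;
}
```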

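Having shown that the number of states is $O(n)$, we also need to show that the number of transitions (edges) is $O(n)$.

Take any spanning tree of the SAM's transition graph rooted at $\mathit{init}$ (it is unrelated to the $\mathit{parent}$ tree).

If there are $m$ states, the spanning tree has $m - 1$ edges. For a non-tree edge $(u, c, v)$, follow tree edges from $\mathit{init}$ to $u$, take the edge, and then continue from $v$ to any accepting state; the string read is a suffix of $s$, and the first non-tree edge on its path is exactly $(u, c, v)$. So if each suffix is mapped to the first non-tree edge it encounters (one edge may correspond to several suffixes), every non-tree edge is hit by at least one suffix. There are only $n$ non-empty suffixes, so there are at most $n$ non-tree edges, and the total number of edges is also $O(n)$.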
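The bound can be observed numerically. This sketch builds a SAM with the incremental construction from later in the article (the `SAM` struct and `countTransitions` are hypothetical names, lowercase alphabet assumed) and checks that the edges beyond a spanning tree, that is $\mathit{edges} - (m - 1)$, never exceed the number $n$ of non-empty suffixes.

```cpp
#include <array>
#include <string>
#include <vector>

// Suffix automaton sketch over 'a'..'z' (assumed); -1 = none.
struct SAM {
    struct Node {
        int len = 0, fa = -1;
        std::array<int, 26> trans;
        Node() { trans.fill(-1); }
    };
    std::vector<Node> st{Node()};  // state 0 is init
    int last = 0;

    void extend(int ch) {
        int np = (int)st.size();
        st.emplace_back();
        st[np].len = st[last].len + 1;
        int p = last;
        last = np;
        while (p != -1 && st[p].trans[ch] == -1) { st[p].trans[ch] = np; p = st[p].fa; }
        if (p == -1) { st[np].fa = 0; return; }
        int q = st[p].trans[ch];
        if (st[p].len + 1 == st[q].len) { st[np].fa = q; return; }
        Node clone = st[q];
        clone.len = st[p].len + 1;
        int nq = (int)st.size();
        st.push_back(clone);
        st[q].fa = st[np].fa = nq;
        while (p != -1 && st[p].trans[ch] == q) { st[p].trans[ch] = nq; p = st[p].fa; }
    }
};

// Total number of defined transitions (edges) in the automaton.
int countTransitions(const SAM& sam) {
    int edges = 0;
    for (const auto& node : sam.st)
        for (int t : node.trans)
            if (t != -1) ++edges;
    return edges;
}
```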

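We cannot store the $\mathit{right}$ set of every state explicitly, but it can be recovered from the state's subtree in the $\mathit{parent}$ tree: $\mathit{right}(v)$ is the set of positions $i$ for which the state of the prefix $s[0, i)$ lies in the subtree of $v$.

For a state $v$ with $r_i \in \mathit{right}(v)$: $\mathit{right}(\mathit{trans}(v, x)) = \{ r_i + 1 \mid s[r_i] = x \}$.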
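A common way to use this without storing the sets is to keep only their sizes. In this sketch, which is an assumption-labeled illustration rather than the author's code (`cnt`, `computeCounts`, and `occurrences` are hypothetical names, lowercase alphabet assumed), every prefix state starts with count 1, clones start with 0, and counts are pushed up the $\mathit{parent}$ tree in order of decreasing $\mathit{len}$, so that $\mathit{cnt}(v) = |\mathit{right}(v)|$ for every non-root state.

```cpp
#include <algorithm>
#include <array>
#include <numeric>
#include <string>
#include <vector>

struct SAM {
    struct Node {
        int len = 0, fa = -1;
        int cnt = 0;                 // |right(v)| once computeCounts() has run
        std::array<int, 26> trans;
        Node() { trans.fill(-1); }
    };
    std::vector<Node> st{Node()};    // state 0 is init
    int last = 0;

    void extend(int ch) {
        int np = (int)st.size();
        st.emplace_back();
        st[np].len = st[last].len + 1;
        st[np].cnt = 1;              // the prefix state owns the new position
        int p = last;
        last = np;
        while (p != -1 && st[p].trans[ch] == -1) { st[p].trans[ch] = np; p = st[p].fa; }
        if (p == -1) { st[np].fa = 0; return; }
        int q = st[p].trans[ch];
        if (st[p].len + 1 == st[q].len) { st[np].fa = q; return; }
        Node clone = st[q];
        clone.len = st[p].len + 1;
        clone.cnt = 0;               // a clone owns no position of its own
        int nq = (int)st.size();
        st.push_back(clone);
        st[q].fa = st[np].fa = nq;
        while (p != -1 && st[p].trans[ch] == q) { st[p].trans[ch] = nq; p = st[p].fa; }
    }

    // A child always has a larger len than its parent, so visiting states by
    // decreasing len adds each subtree's total into the parent before it is used.
    void computeCounts() {
        std::vector<int> order(st.size());
        std::iota(order.begin(), order.end(), 0);
        std::sort(order.begin(), order.end(),
                  [this](int a, int b) { return st[a].len > st[b].len; });
        for (int v : order)
            if (st[v].fa != -1) st[st[v].fa].cnt += st[v].cnt;
    }

    // Number of occurrences of a non-empty lowercase string x.
    int occurrences(const std::string& x) const {
        int v = 0;
        for (char c : x) {
            v = st[v].trans[c - 'a'];
            if (v == -1) return 0;
        }
        return st[v].cnt;
    }
};
```

For $s = aaabaaaabaa$ this gives 6 occurrences of $aa$, matching $|\{2, 3, 6, 7, 8, 11\}|$.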

"Analysis of the linear-time SAM construction"

Let the current string be $T$, of length $l$, and append a new character $x$.

Let $v_1, v_2, v_3, \dots$ be all the states that contain suffixes of $T$ (that is, whose $\mathit{right}$ set contains $l$).

Among them there must be the state $p = \mathit{st}(T)$ with $\mathit{right}(p) = \{l\}$, since $T$ itself occurs only once. Because every $v_i$ has $l$ in its $\mathit{right}$ set, all of them are ancestors of $p$ (including $p$ itself) in the $\mathit{parent}$ tree.

After appending $x$, let $np = \mathit{st}(Tx)$; then $\mathit{right}(np) = \{l + 1\}$.

Order these states as $v_1 = p, v_2, \dots, v_k = \mathit{root}$, i.e. by decreasing depth, so their $\mathit{right}$ sets grow along the sequence; consequently, if $v_i$ has an $x$ edge, so does $v_{i+1}$. If $v_j$ has no $x$ edge, none of its strings followed by $x$ has occurred before, so after appending $x$ those strings end only at $l + 1$, and we can add an $x$ edge from $v_j$ directly to $np$.
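The chain $v_1, \dots, v_k$ is simply the path of parent links from $\mathit{st}(T)$ to the root. This sketch (hypothetical names `SAM` and `chainLens`, lowercase alphabet assumed) lists $\mathit{len}(v_i) = \max(v_i)$ along that path; for the article's example $T = aaabaaaabaa$ the lengths are $11, 6, 2, 1, 0$, i.e. the states of $aaaabaa..T$, $baa..aaabaa$, $aa$, $a$, and the root.

```cpp
#include <array>
#include <string>
#include <vector>

struct SAM {
    struct Node {
        int len = 0, fa = -1;
        std::array<int, 26> trans;
        Node() { trans.fill(-1); }
    };
    std::vector<Node> st{Node()};  // state 0 is init (root)
    int last = 0;                  // st(T)

    void extend(int ch) {
        int np = (int)st.size();
        st.emplace_back();
        st[np].len = st[last].len + 1;
        int p = last;
        last = np;
        while (p != -1 && st[p].trans[ch] == -1) { st[p].trans[ch] = np; p = st[p].fa; }
        if (p == -1) { st[np].fa = 0; return; }
        int q = st[p].trans[ch];
        if (st[p].len + 1 == st[q].len) { st[np].fa = q; return; }
        Node clone = st[q];
        clone.len = st[p].len + 1;
        int nq = (int)st.size();
        st.push_back(clone);
        st[q].fa = st[np].fa = nq;
        while (p != -1 && st[p].trans[ch] == q) { st[p].trans[ch] = nq; p = st[p].fa; }
    }
};

// len(v_1), len(v_2), ..., len(v_k): from st(T) up the parent tree to the root.
std::vector<int> chainLens(const SAM& sam) {
    std::vector<int> lens;
    for (int v = sam.last; v != -1; v = sam.st[v].fa) lens.push_back(sam.st[v].len);
    return lens;
}
```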

Let $v_p$ be the first state among $v_1, v_2, v_3, \dots$ that has an $x$ edge, and let $q = \mathit{trans}(v_p, x)$. Then $\mathit{right}(q) = \{ r_i + 1 \mid r_i \in \mathit{right}(v_p),\ s[r_i] = x \}$; note that at this point $x$ has not yet been appended to the string.

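Here is the difficulty: after appending $x$ to the string, we cannot simply add $l + 1$ to $\mathit{right}(q)$.

For example, let $T = aaabaaaabaa$ and $x = b$, so $Tx = aaabaaaabaab$.

Marking with parentheses the occurrences of the string of $v_p$ (here $aa$) that are followed by $b$ or end at $l$: $a(aa)baa(aa)b(aa)$.

Marking with parentheses the occurrences of the longest string of $q$ (here $aaab$) in $T$: $(aaab)a(aaab)aa$.

After appending $b$, the new end position $l + 1$ gives $aab$ but not $aaab$, so we cannot simply add $l + 1$ to $\mathit{right}(q)$: the strings of $q$ would no longer share a single $\mathit{right}$ set.

Of course, if $\mathit{len}(v_p) + 1 = \mathit{len}(q)$ (where $\mathit{len}(v) = \max(v)$), the longest string of $q$ is the longest string of $v_p$ followed by $x$, which ends at $l + 1$, and so do all its suffixes in $q$; in that case we can add $l + 1$ directly.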
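The split test can be run on the example. This hedged sketch (hypothetical names `SAM`, `SplitInfo`, and `inspect`; lowercase alphabet assumed) builds the SAM of $T$, walks up from $\mathit{st}(T)$ to find $v_p$ for a given next character, and reports $\mathit{len}(v_p)$, $\mathit{len}(q)$, and whether a clone is needed. For $T = aaabaaaabaa$, appending $b$ gives $\mathit{len}(v_p) = 2$ and $\mathit{len}(q) = 4$, so $q$ must be split, whereas appending $a$ gives $6$ and $7$, so $l + 1$ can be added to $q$ directly.

```cpp
#include <array>
#include <string>
#include <vector>

struct SAM {
    struct Node {
        int len = 0, fa = -1;
        std::array<int, 26> trans;
        Node() { trans.fill(-1); }
    };
    std::vector<Node> st{Node()};
    int last = 0;

    void extend(int ch) {
        int np = (int)st.size();
        st.emplace_back();
        st[np].len = st[last].len + 1;
        int p = last;
        last = np;
        while (p != -1 && st[p].trans[ch] == -1) { st[p].trans[ch] = np; p = st[p].fa; }
        if (p == -1) { st[np].fa = 0; return; }
        int q = st[p].trans[ch];
        if (st[p].len + 1 == st[q].len) { st[np].fa = q; return; }
        Node clone = st[q];
        clone.len = st[p].len + 1;
        int nq = (int)st.size();
        st.push_back(clone);
        st[q].fa = st[np].fa = nq;
        while (p != -1 && st[p].trans[ch] == q) { st[p].trans[ch] = nq; p = st[p].fa; }
    }
};

struct SplitInfo {
    int lenVp;        // len(v_p), or -1 if no v_p exists
    int lenQ;         // len(q), or -1 if no v_p exists
    bool needsClone;  // true iff len(v_p) + 1 != len(q)
};

// Inspect (without modifying) what appending character ch would do.
SplitInfo inspect(const SAM& sam, int ch) {
    int v = sam.last;
    while (v != -1 && sam.st[v].trans[ch] == -1) v = sam.st[v].fa;
    if (v == -1) return {-1, -1, false};
    int q = sam.st[v].trans[ch];
    return {sam.st[v].len, sam.st[q].len, sam.st[v].len + 1 != sam.st[q].len};
}
```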

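The solution is to create a new node $nq$ that takes over the strings of $q$ whose length is at most $\mathit{len}(v_p) + 1$; then $\mathit{right}(nq) = \mathit{right}(q) \cup \mathit{right}(np) = \mathit{right}(q) \cup \{l + 1\}$, which solves the problem.

So the $x$ edges of $v_1 \sim v_{p-1}$ point to $np$, the $x$ edges of $v_p$ and of the following states that pointed to $q$ are redirected to $nq$, and finally $np$, $q$, and $nq$ are linked into the $\mathit{parent}$ tree to complete the construction.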

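"Steps of the linear SAM construction"

① Create a new node $np$ representing $\mathit{st}(Tx)$, with $\mathit{len}(np) = l + 1$.

② Starting from $p = \mathit{st}(T)$, the leaf of the $\mathit{parent}$ tree whose $\mathit{right}$ set is $\{l\}$, walk up the tree to find the first state $v_p$ that has an $x$ edge. Every node passed on the way has no $x$ edge, so give it an $x$ edge to $np$, i.e. $\mathit{trans}(v_1 \sim v_{p-1}, x) = np$.

③ If there is no such $v_p$, make $\mathit{root}$ the parent of $np$ in the $\mathit{parent}$ tree, and stop.

④ Otherwise let $q = \mathit{trans}(v_p, x)$. If $\mathit{len}(v_p) + 1 = \mathit{len}(q)$, set $\mathit{fa}(np) = q$ and stop. Otherwise create a new node $nq$ as a copy of $q$ (same transitions), set $\mathit{len}(nq) = \mathit{len}(v_p) + 1$, and update:

$\mathit{fa}(nq) = \mathit{fa}(q)$ // here $\mathit{fa}(q)$ is still $q$'s parent from before $x$ was added

$\mathit{fa}(q) = \mathit{fa}(np) = nq$

⑤ Walking up from $v_p$, redirect every $x$ edge that points to $q$ so that it points to $nq$, i.e. $\mathit{trans}(v_i, x) = nq$, stopping at the first state whose $x$ edge does not point to $q$.

That's all.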
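The effect of steps ④ and ⑤ on the article's example can be checked directly. This sketch reuses the same incremental construction (hypothetical names `SAM` and `walk`, lowercase alphabet assumed) on $Tx = aaabaaaabaab$: afterwards $b$, $ab$, $aab$ all lead to the clone $nq$ with $\mathit{len}(nq) = 3$, the old $q$ keeps only $aaab$ with $\mathit{len}(q) = 4$, and both $q$ and $np = \mathit{st}(Tx)$ have $nq$ as their parent.

```cpp
#include <array>
#include <string>
#include <vector>

struct SAM {
    struct Node {
        int len = 0, fa = -1;
        std::array<int, 26> trans;
        Node() { trans.fill(-1); }
    };
    std::vector<Node> st{Node()};
    int last = 0;

    void extend(int ch) {
        int np = (int)st.size();                       // step 1
        st.emplace_back();
        st[np].len = st[last].len + 1;
        int p = last;
        last = np;
        while (p != -1 && st[p].trans[ch] == -1) { st[p].trans[ch] = np; p = st[p].fa; }  // step 2
        if (p == -1) { st[np].fa = 0; return; }        // step 3
        int q = st[p].trans[ch];
        if (st[p].len + 1 == st[q].len) { st[np].fa = q; return; }
        Node clone = st[q];                            // step 4: fa(nq) = fa(q)
        clone.len = st[p].len + 1;
        int nq = (int)st.size();
        st.push_back(clone);
        st[q].fa = st[np].fa = nq;
        while (p != -1 && st[p].trans[ch] == q) { st[p].trans[ch] = nq; p = st[p].fa; }  // step 5
    }
};

// State reached from init by reading x, or -1 if x is not a substring.
int walk(const SAM& sam, const std::string& x) {
    int v = 0;
    for (char c : x) {
        v = sam.st[v].trans[c - 'a'];
        if (v == -1) return -1;
    }
    return v;
}
```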

Note that the automaton can have up to $2n$ states, so the node array must have room for $2n$ nodes!

The code is as follows:

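```cpp
#include <iostream>
#include <cstring>
#include <cstdlib>
#include <cstdio>
#include <algorithm>
#define ll long long
using namespace std;

const int maxn = 2000010, inf = 1e9;  // room for 2n states with n <= 10^6

struct poi {
    int len, fa, trans[26];  // len = max(v); fa = parent in the parent tree
} st[maxn];

int n, tott, now, root;      // tott = number of states; now = st(T)
char s[maxn];

inline void extend(int ch) {
    int np = ++tott, p = now;                 // (1) new state np = st(Tx)
    st[np].len = st[now].len + 1;
    now = np;
    while (p && !st[p].trans[ch])             // (2) give x edges to np
        st[p].trans[ch] = np, p = st[p].fa;
    if (!p) st[np].fa = root;                 // (3) no v_p: parent is root
    else {
        int q = st[p].trans[ch];
        if (st[p].len + 1 == st[q].len) st[np].fa = q;  // no split needed
        else {
            int nq = ++tott;                  // (4) clone q
            st[nq] = st[q];
            st[nq].len = st[p].len + 1;
            st[q].fa = st[np].fa = nq;
            while (p && st[p].trans[ch] == q) // (5) redirect edges to nq
                st[p].trans[ch] = nq, p = st[p].fa;
        }
    }
}

int main() {
    scanf("%s", s + 1);
    n = strlen(s + 1);
    now = tott = root = 1;                    // state 0 means "null"
    for (int i = 1; i <= n; i++) extend(s[i] - 'a');
    return 0;
}
```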
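The listing only builds the automaton. As a hedged usage sketch in the same global-array style (root is state 1, 0 means null), the function below counts distinct substrings. It relies on a standard property not stated above: a non-root state $v$ holds exactly one substring for each length in $[\mathit{len}(\mathit{fa}(v)) + 1, \mathit{len}(v)]$, so summing $\mathit{len}(v) - \mathit{len}(\mathit{fa}(v))$ over all states counts every distinct substring once. The names `distinctSubstrings` and `distinctSubstringsBrute` are hypothetical, and `maxn` is shrunk to example size.

```cpp
#include <cstring>
#include <set>
#include <string>

const int maxn = 2010;  // example size: room for 2n states with n <= 1000
struct poi { int len, fa, trans[26]; } st[maxn];
int tott, now, root;

void extend(int ch) {  // same construction as the listing above
    int np = ++tott, p = now;
    st[np].len = st[now].len + 1;
    now = np;
    while (p && !st[p].trans[ch]) st[p].trans[ch] = np, p = st[p].fa;
    if (!p) { st[np].fa = root; return; }
    int q = st[p].trans[ch];
    if (st[p].len + 1 == st[q].len) { st[np].fa = q; return; }
    int nq = ++tott;
    st[nq] = st[q];
    st[nq].len = st[p].len + 1;
    st[q].fa = st[np].fa = nq;
    while (p && st[p].trans[ch] == q) st[p].trans[ch] = nq, p = st[p].fa;
}

// Sum of len(v) - len(fa(v)) over all non-root states.
long long distinctSubstrings(const char* s) {
    memset(st, 0, sizeof st);  // reset the global automaton
    now = tott = root = 1;
    for (int i = 0; s[i]; i++) extend(s[i] - 'a');
    long long total = 0;
    for (int v = 2; v <= tott; v++) total += st[v].len - st[st[v].fa].len;
    return total;
}

// Brute-force reference for small inputs.
long long distinctSubstringsBrute(const std::string& s) {
    std::set<std::string> all;
    for (std::size_t l = 0; l < s.size(); l++)
        for (std::size_t len = 1; l + len <= s.size(); len++) all.insert(s.substr(l, len));
    return (long long)all.size();
}
```

For example, `distinctSubstrings("abcbc")` counts the 12 distinct non-empty substrings of $abcbc$.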

"Example Time"
