"Algorithmic" suffix automata (SAM)

Last Update:2018-01-07 Source: Internet

Author: User


"Automaton"

The job of a finite-state automaton is to recognize strings. If an automaton $A$ recognizes a string $s$, we write $A(s)=true$; otherwise $A(s)=false$.

An automaton consists of five parts: $alpha$ (the character set), $state$ (the set of states), $init$ (the initial state), $end$ (the set of accepting states), and $trans$ (the state transition function).

$trans(s, str)$ denotes the state reached from the current state $s$ after reading the string or character $str$.

A string $x$ can be recognized starting from state $s$ if and only if $trans(s, x) \in end$.
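As a minimal sketch of these five components, the block below stores them in a small C++ struct. The type `Automaton`, its members `run` and `accepts`, and the sample machine `makeEndsWithAb` are hypothetical names chosen for illustration, not part of the original article.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <utility>

// Hypothetical minimal automaton; the fields mirror the five components above.
struct Automaton {
    std::set<char> alpha;                        // character set
    int stateCount = 0;                          // states are 0 .. stateCount-1
    int init = 0;                                // initial state
    std::set<int> end;                           // accepting states
    std::map<std::pair<int, char>, int> trans;   // transition function

    // trans(s, str): state reached from s after reading str, or -1 if undefined.
    int run(int s, const std::string& str) const {
        assert(s >= 0 && s < stateCount);
        for (char c : str) {
            auto it = trans.find({s, c});
            if (it == trans.end()) return -1;
            s = it->second;
        }
        return s;
    }

    // A(x) = true iff trans(init, x) is an accepting state.
    bool accepts(const std::string& x) const {
        int s = run(init, x);
        return s != -1 && end.count(s) > 0;
    }
};

// Example machine over {a, b} that recognizes exactly the strings ending in "ab".
Automaton makeEndsWithAb() {
    Automaton A;
    A.alpha = {'a', 'b'};
    A.stateCount = 3;   // 0: no progress, 1: just read 'a', 2: just read "ab"
    A.init = 0;
    A.end = {2};
    A.trans = {{{0, 'a'}, 1}, {{0, 'b'}, 0}, {{1, 'a'}, 1},
               {{1, 'b'}, 2}, {{2, 'a'}, 1}, {{2, 'b'}, 0}};
    return A;
}
```

With this machine, `makeEndsWithAb().accepts("aab")` is true and `accepts("aba")` is false.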

"Suffix Automaton (SAM)"

A suffix automaton (SAM) is an automaton that recognizes exactly the suffixes of a string $s$.

That is, $SAM(x)=true$ if and only if $x$ is a suffix of $s$.

"Minimal-state suffix automaton"

The minimal-state suffix automaton is the suffix automaton with the fewest states; its size is $O(n)$, where $n$ is the length of $s$.

Let $st(a) = trans(init, a)$.

If the string $a$ occurs in $s$ at $[l, r)$, then $st(a)$ can recognize the suffix of $s$ that starts at $r$.

If $a$ occurs in $s$ at $\{[l_1, r_1), [l_2, r_2), ..., [l_n, r_n)\}$, then the strings that $st(a)$ can recognize are $\{suffix(r_1), suffix(r_2), ..., suffix(r_n)\}$.

Let $right(a) = \{r_1, r_2, ..., r_n\}$. The strings that $st(a)$ can recognize are entirely determined by $right(a)$, which means that if $right(a) = right(b)$, then $st(a) = st(b)$.
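To make the definition of $right(a)$ concrete, here is a brute-force sketch that lists the end positions of every occurrence, using the same half-open $[l, r)$ convention. The function name `rightSet` is an assumption made for this example, and the quadratic scan is only meant for small inputs.

```cpp
#include <cassert>
#include <set>
#include <string>

// right(a): every end position r such that a occurs in s at [r - |a|, r).
// Brute force for illustration; not how a SAM computes it.
std::set<int> rightSet(const std::string& s, const std::string& a) {
    assert(!a.empty());
    std::set<int> right;
    for (std::size_t l = 0; l + a.size() <= s.size(); ++l)
        if (s.compare(l, a.size(), a) == 0)
            right.insert(static_cast<int>(l + a.size()));
    return right;
}
```

For $s = abcbc$, `rightSet(s, "bc")` and `rightSet(s, "c")` both give $\{3, 5\}$, so $bc$ and $c$ belong to the same state, while `rightSet(s, "b")` gives $\{2, 4\}$.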

So a state $s$ consists of all the strings whose $right$ set equals $right(s)$.

Given a $right$ set and a length, a substring is uniquely determined.

The lengths of the substrings that correspond to one $right$ set form an interval. In other words, if lengths $l$ and $r$ both fit a $right$ set, then so does every $x$ with $l \leq x \leq r$. Write $a_k$ for the substring of length $k$ that ends at the positions in the set, so $a_l$ is a suffix of $a_x$ and $a_x$ is a suffix of $a_r$. If $a_x$ also occurred at a position outside the $right$ set, then its suffix $a_l$ would occur there too, which contradicts $l$ fitting the set. If $a_x$ failed to occur at some position in $right$, then $a_r$, which has $a_x$ as a suffix, could not occur there either, which contradicts $r$ fitting the set.

So $[min(s), max(s)]$ denotes the range of lengths of the strings in state $s$.

"Proof that the number of states is linear"

Take two distinct states $a$ and $b$, and suppose that $right(a)$ and $right(b)$ intersect.

Because $a$ and $b$ are different states, the substrings they represent are disjoint, so the length ranges $[min(a), max(a)]$ and $[min(b), max(b)]$ do not intersect. If they shared a length, then that length together with a common end position would determine one substring belonging to both states, which would force them to be the same state.

Since the ranges are disjoint, assume without loss of generality that $min(a) > max(b)$. Then every substring represented by $b$ is shorter than every substring represented by $a$, and they share an end position, so every substring in $b$ is a suffix of every substring in $a$. Hence $b$ occurs wherever $a$ occurs, so $right(a) \subset right(b)$, that is, $right(a)$ is a proper subset of $right(b)$.

That is, the $right$ sets of two states are either disjoint, or one is a proper subset of the other.
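The disjoint-or-nested property can be checked by brute force on small strings. The sketch below is only a sanity check under that assumption, not a proof, and the helper names `allRightSets` and `rightSetsAreLaminar` are invented for this example.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// Distinct right sets over all non-empty substrings of s (brute force).
// Each distinct set corresponds to one non-root state of the minimal automaton.
std::set<std::set<int>> allRightSets(const std::string& s) {
    std::map<std::string, std::set<int>> occ;   // substring -> end positions
    for (std::size_t l = 0; l < s.size(); ++l)
        for (std::size_t r = l + 1; r <= s.size(); ++r)
            occ[s.substr(l, r - l)].insert(static_cast<int>(r));
    std::set<std::set<int>> result;
    for (const auto& kv : occ) result.insert(kv.second);
    return result;
}

// True iff every pair of distinct right sets is disjoint or nested.
bool rightSetsAreLaminar(const std::string& s) {
    const auto sets = allRightSets(s);
    for (const auto& a : sets)
        for (const auto& b : sets) {
            if (a == b) continue;
            std::size_t common = 0;
            for (int r : a) common += b.count(r);
            bool disjoint = (common == 0);
            bool nested = (common == a.size() || common == b.size());
            if (!disjoint && !nested) return false;
        }
    return true;
}
```

For $s = abcbc$ the distinct sets are $\{1\}, \{2\}, \{3\}, \{4\}, \{5\}, \{2,4\}, \{3,5\}$, seven in total, which stays below $2n = 10$.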

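Ordering the $right$ sets of all states by this containment relation gives a tree, which we call the $parent$ tree.

In this tree, every node either has at least two sons or corresponds to a prefix of $s$, and there are at most $n$ nodes of the second kind, so the number of nodes is $O(n)$.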

Having shown that the number of nodes is $O(n)$, we also need to show that the number of edges is $O(n)$.

Consider a spanning tree of the SAM's transition graph rooted at $init$ (this tree is unrelated to the $parent$ tree).

If the number of states is $m$, the spanning tree has $m-1$ edges. Map each suffix to the first non-tree edge on its path from $init$ (several suffixes may map to the same edge). Every non-tree edge is hit by at least one suffix: follow tree edges to its source, take the edge, then continue to any accepting state. Since there are $O(n)$ suffixes, there are $O(n)$ non-tree edges, so the total number of edges is also $O(n)$.

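We cannot afford to store the $right$ set of every state explicitly, but the $right$ set of a state can be recovered from its subtree in the $parent$ tree: it is the set of end positions of the prefixes of $s$ whose states lie in that subtree (in particular, it contains the $right$ sets of all leaves below it).

For a state $u$ and a character $x$, $right(trans(u, x)) = \{r_i + 1 \mid r_i \in right(u),\ s[r_i] = x\}$, where $s[r_i]$ is the character of the string $s$ at position $r_i$, i.e. the one that immediately follows an occurrence ending at $r_i$.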
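The transition rule for $right$ sets can be written directly as a filter over end positions. This is a small sketch assuming 0-indexed strings and half-open occurrences $[l, r)$, so that `s[r]` is the character right after an occurrence; `stepRight` is a hypothetical name.

```cpp
#include <cassert>
#include <set>
#include <string>

// Given right(u) and a character x, return right(trans(u, x)) for the string s.
std::set<int> stepRight(const std::string& s, const std::set<int>& right, char x) {
    std::set<int> next;
    for (int r : right) {
        assert(r >= 0 && r <= static_cast<int>(s.size()));
        // An occurrence ending at r extends by x only if s[r] exists and equals x.
        if (r < static_cast<int>(s.size()) && s[r] == x)
            next.insert(r + 1);
    }
    return next;
}
```

For $s = abcbc$, the state of $b$ has $right = \{2, 4\}$; stepping with $c$ gives $\{3, 5\}$, which is the $right$ set of $bc$. An empty result means the state has no edge for that character.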

"Analysis of the linear-time SAM construction"

Let the current string be $T$ with length $L$, and suppose we append a new character $x$.

Let $v_1, v_2, v_3, ...$ be all the states that represent a suffix of $T$ (that is, whose $right$ set contains $L$).

There must be a state $p = st(T)$ with $right(p) = \{L\}$. Because the $right$ sets of $v_1, v_2, v_3, ...$ all contain $L$, they are $p$ itself and the ancestors of $p$ in the $parent$ tree.

Suppose we append the character $x$ and let $np$ denote $st(Tx)$; then $right(np) = \{L+1\}$.

Order the states as $v_1 = p, v_2, ..., v_k = root$, that is, by decreasing depth, so the $right$ sets grow along the sequence. Consequently, if $v_i$ has an $x$ edge, then so does $v_{i+1}$. If $v_j$ has no $x$ edge, we can simply add an $x$ edge from $v_j$ to $np$, because $L$ is in $right(v_j)$ and is the only position in it that is followed by $x$ once $x$ has been appended.

Let $v_p$ be the first state among $v_1, v_2, v_3, ...$ that has an $x$ edge, and let $q = trans(v_p, x)$. Then $right(q) = \{r_i + 1 \mid r_i \in right(v_p),\ T[r_i] = x\}$; note that at this point $x$ has not yet been appended to the string.

Here comes the difficulty: after $x$ is appended, we cannot always add $L+1$ directly to $right(q)$.

Example:

$T = aaabaaaabaa$ and $x = b$, so $Tx = aaabaaaabaab$.

Here $v_p$ represents $aa$. Marking with parentheses its occurrences in $T$ that are followed by $b$, plus the one at the end of $T$: $a(aa)baa(aa)b(aa)$

Marking with parentheses the longest string represented by $q$ in $T$: $(aaab)a(aaab)aa$

After $b$ is appended, the new end position $L+1$ is an occurrence of $aab$ but not of $aaab$, so $L+1$ cannot be added directly to $right(q)$.
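The example can be replayed by brute force: compute the end positions of $aab$ and $aaab$ before and after appending $b$. The helpers `endPositions` and `gainsNewEnd` below are illustrative names, and the scan is a sketch for small inputs only.

```cpp
#include <cassert>
#include <set>
#include <string>

// End positions r of all occurrences [r - |a|, r) of a in s (brute force).
std::set<int> endPositions(const std::string& s, const std::string& a) {
    assert(!a.empty());
    std::set<int> right;
    for (std::size_t l = 0; l + a.size() <= s.size(); ++l)
        if (s.compare(l, a.size(), a) == 0)
            right.insert(static_cast<int>(l + a.size()));
    return right;
}

// True iff appending x to T adds a new end position to right(a),
// which happens exactly when a is a suffix of T + x.
bool gainsNewEnd(const std::string& T, char x, const std::string& a) {
    return endPositions(T + x, a) != endPositions(T, a);
}
```

With $T = aaabaaaabaa$, both $aab$ and $aaab$ have end positions $\{4, 9\}$ in $T$. After appending $b$, $aab$ has $\{4, 9, 12\}$ while $aaab$ still has $\{4, 9\}$, so the two strings can no longer share a state.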

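Of course, if $len(v_p) + 1 = len(q)$, where $len$ is the maximum length $max$ of a state, then $L+1$ can be added to $right(q)$ directly.

The solution to the problem above is to create a new node $nq$ with $right(nq) = right(q) \cup \{L+1\}$ (recall that $right(np) = \{L+1\}$). $nq$ takes over the strings of $q$ whose length is at most $len(v_p) + 1$, and $q$ keeps the longer ones.

So $trans(v_1 \sim v_{p-1}, x) = np$, and for every state $v_i$ among $v_p \sim v_k$ whose $x$ edge pointed to $q$, $trans(v_i, x) = nq$. Updating the $parent$ tree then completes the construction.

"Steps of the linear-time SAM construction"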

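① Create a new node $np$ to represent $st(Tx)$.

② Starting from $p = st(T)$, the leaf of the $parent$ tree whose $right$ set is $\{L\}$, walk up the tree to find the first state $v_p$ that has an $x$ edge. Every node passed on the way has no $x$ edge and gets one to $np$, that is, $trans(v_1 \sim v_{p-1}, x) = np$.

③ If there is no such $v_p$, attach $np$ to $root$ in the $parent$ tree. Otherwise let $q = trans(v_p, x)$; if $len(v_p) + 1 = len(q)$, set $fa(np) = q$ and stop.

④ Otherwise create a new node $nq$ as a copy of $q$, set $len(nq) = len(v_p) + 1$, and make the following updates:

$fa(nq) = fa(q)$ // $q$ here is still the $q$ from before $x$ was added

$fa(q) = fa(np) = nq$

⑤ Redirect to $nq$ the $x$ edges of $v_p \sim v_k$ that point to $q$, that is, $trans(v_p \sim v_k, x) = nq$.

That completes the construction.

Note that the number of nodes can reach $2n$, so size the arrays accordingly!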

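The code is as follows:

#include <iostream>
#include <cstring>
#include <cstdlib>
#include <cstdio>
#include <algorithm>
#define ll long long
using namespace std;
const int maxn = 2000010, inf = 1e9;
// trans has 26 slots: lowercase letters, inferred from s[i] - 'a' below
struct poi { int len, fa, trans[26]; } st[maxn];
int n, tott, now, root;
char s[maxn];
inline void extend(int ch)
{
    int np = ++tott, p = now;               // step 1: np = st(Tx)
    st[np].len = st[now].len + 1; now = np;
    // step 2: walk up the parent tree, adding ch edges to np
    while (p && !st[p].trans[ch]) st[p].trans[ch] = np, p = st[p].fa;
    if (!p) st[np].fa = root;               // step 3: no v_p exists
    else
    {
        int q = st[p].trans[ch];
        if (st[p].len + 1 == st[q].len) st[np].fa = q;
        else
        {
            int nq = ++tott; st[nq] = st[q]; // step 4: copy q into nq
            st[nq].len = st[p].len + 1;
            st[q].fa = st[np].fa = nq;
            // step 5: redirect ch edges from q to nq
            while (p && st[p].trans[ch] == q) st[p].trans[ch] = nq, p = st[p].fa;
        }
    }
}
int main()
{
    scanf("%s", s + 1); n = strlen(s + 1);
    now = tott = root = 1;                  // state 0 serves as "null"
    for (int i = 1; i <= n; i++) extend(s[i] - 'a');
}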
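Once the automaton is built, a few standard queries follow from the properties discussed above. The sketch below repackages the same construction with `std::vector` and adds three helpers; the struct `SuffixAutomaton` and the members `walk`, `contains`, `distinctSubstrings`, `rightSizes`, `occurrences`, and `stateCount` are names assumed for this example, and it assumes lowercase input only. It also uses the standard fact that $min(v) = len(fa(v)) + 1$, so state $v$ holds $len(v) - len(fa(v))$ strings.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <numeric>
#include <string>
#include <vector>

// Illustrative repackaging of the construction above; all names are assumptions.
struct SuffixAutomaton {
    struct State {
        int len = 0;                  // max(s): length of the longest string
        int fa = 0;                   // parent in the parent tree (0 = none)
        int cnt = 0;                  // 1 for states created as np, 0 for copies
        std::array<int, 26> trans{};  // 0 = no edge
    };
    std::vector<State> st;            // st[0] is a null sentinel, st[1] the root
    int last = 1;                     // st(T) for the text read so far

    explicit SuffixAutomaton(const std::string& text) : st(2) {
        for (char c : text) {
            assert('a' <= c && c <= 'z');
            extend(c - 'a');
        }
    }

    void extend(int ch) {
        int np = static_cast<int>(st.size());
        st.emplace_back();
        int p = last;
        st[np].len = st[p].len + 1;
        st[np].cnt = 1;
        last = np;
        while (p && !st[p].trans[ch]) { st[p].trans[ch] = np; p = st[p].fa; }
        if (!p) { st[np].fa = 1; return; }
        int q = st[p].trans[ch];
        if (st[p].len + 1 == st[q].len) { st[np].fa = q; return; }
        int nq = static_cast<int>(st.size());
        State copy = st[q];           // copy q, then adjust len and cnt
        copy.len = st[p].len + 1;
        copy.cnt = 0;
        st.push_back(copy);
        st[q].fa = st[np].fa = nq;
        while (p && st[p].trans[ch] == q) { st[p].trans[ch] = nq; p = st[p].fa; }
    }

    // State reached from the root after reading x, or 0 if x is not a substring.
    int walk(const std::string& x) const {
        int v = 1;
        for (char c : x) {
            if (c < 'a' || c > 'z') return 0;
            v = st[v].trans[c - 'a'];
            if (!v) return 0;
        }
        return v;
    }

    bool contains(const std::string& x) const { return walk(x) != 0; }

    // Distinct non-empty substrings: state v holds len(v) - len(fa(v)) strings.
    long long distinctSubstrings() const {
        long long total = 0;
        for (std::size_t v = 2; v < st.size(); ++v)
            total += st[v].len - st[st[v].fa].len;
        return total;
    }

    // |right(v)| for every state, summed up the parent tree, longest states first.
    std::vector<long long> rightSizes() const {
        std::vector<int> order(st.size() - 1);
        std::iota(order.begin(), order.end(), 1);
        std::sort(order.begin(), order.end(),
                  [this](int a, int b) { return st[a].len > st[b].len; });
        std::vector<long long> size(st.size(), 0);
        for (std::size_t v = 1; v < st.size(); ++v) size[v] = st[v].cnt;
        for (int v : order)
            if (st[v].fa) size[st[v].fa] += size[v];
        return size;
    }

    // Number of occurrences of a non-empty x in the text.
    long long occurrences(const std::string& x) const {
        int v = walk(x);
        return v ? rightSizes()[v] : 0;
    }

    std::size_t stateCount() const { return st.size() - 1; }  // excludes sentinel
};
```

For the example string $aaabaaaabaa$, `occurrences("aa")` is 6 and `occurrences("aab")` is 2, matching the sizes of the corresponding $right$ sets, and `stateCount()` stays within $2n = 22$. `rightSizes` recomputes the whole table on each call to keep the sketch short; a real program would compute it once after construction.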


This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
