"Automaton"

A finite-state automaton recognizes strings. If an automaton $A$ recognizes a string $s$, we write $A(s) = true$; otherwise $A(s) = false$.

An automaton consists of five parts: $alpha$ (the character set), $state$ (the set of states), $init$ (the initial state), $end$ (the set of end states), and $trans$ (the state transition function).

$trans(s, str)$ denotes the state reached from the current state $s$ after reading the string or character $str$.

A string $x$ can be recognized starting from state $s$ exactly when $trans(s, x) \in end$.
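The definitions above can be sketched in C++ as follows. This is only an illustration: the `Automaton` struct, the use of `-1` for "no transition", and the example automaton `endsWithAb` are assumptions made here, not part of the original text.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <utility>

// Hypothetical minimal automaton: states are ints, -1 means "no transition".
struct Automaton {
    int init = 0;                               // initial state
    std::set<int> end;                          // end (accepting) states
    std::map<std::pair<int, char>, int> edge;   // single-character transitions

    // trans(s, str): state reached from s after reading str, or -1 if stuck.
    int trans(int s, const std::string& str) const {
        assert(s >= 0);
        for (char c : str) {
            auto it = edge.find({s, c});
            if (it == edge.end()) return -1;
            s = it->second;
        }
        return s;
    }

    // A(x) = true exactly when trans(init, x) is an end state.
    bool accepts(const std::string& x) const {
        int s = trans(init, x);
        return s != -1 && end.count(s) > 0;
    }
};

// Illustrative automaton over {a, b} that accepts strings ending in "ab".
Automaton endsWithAb() {
    Automaton A;
    A.end.insert(2);
    A.edge[{0, 'a'}] = 1; A.edge[{0, 'b'}] = 0;
    A.edge[{1, 'a'}] = 1; A.edge[{1, 'b'}] = 2;
    A.edge[{2, 'a'}] = 1; A.edge[{2, 'b'}] = 0;
    return A;
}
```

For instance, `endsWithAb().accepts("bbaab")` holds, while `endsWithAb().accepts("aba")` does not.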

"Suffix Automaton (SAM)"

A suffix automaton (SAM) of a string $s$ is an automaton that recognizes exactly the suffixes of $s$.

That is, $SAM(x) = true$ if and only if $x$ is a suffix of $s$.

"Minimal-state suffix automaton"

The minimal-state suffix automaton is the suffix automaton with the fewest states, and its size is $O(n)$.

Let $st(a) = trans(init, a)$.

If the string $a$ occurs in $s$ at $[l, r)$, then $st(a)$ can recognize the suffix of $s$ that starts at position $r$.

If $a$ occurs in $s$ at $\{[l_1, r_1), [l_2, r_2), \ldots, [l_n, r_n)\}$, then the strings that $st(a)$ can recognize are $\{suffix(r_1), suffix(r_2), \ldots, suffix(r_n)\}$.

Let $right(a) = \{r_1, r_2, \ldots, r_n\}$. The strings that $st(a)$ can recognize are determined entirely by $right(a)$, so if $right(a) = right(b)$, then $st(a) = st(b)$.
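To make $right(a)$ concrete, here is a brute-force sketch that computes it directly from the definition, together with the set of suffixes that $st(a)$ would recognize. The function names `rightSet` and `recognizedSuffixes` are introduced here for illustration only, and positions are assumed to be 0-indexed with $r$ exclusive.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>

// Brute-force right(a): every r such that a occurs in s as s[r - |a| .. r).
std::set<int> rightSet(const std::string& s, const std::string& a) {
    std::set<int> result;
    for (std::size_t l = 0; l + a.size() <= s.size(); ++l)
        if (s.compare(l, a.size(), a) == 0)
            result.insert(static_cast<int>(l + a.size()));
    return result;
}

// The strings st(a) can recognize: suffix(r) for every r in right(a).
std::set<std::string> recognizedSuffixes(const std::string& s, const std::string& a) {
    std::set<std::string> result;
    for (int r : rightSet(s, a))
        result.insert(s.substr(static_cast<std::size_t>(r)));
    return result;
}
```

For $s = abcbc$, both `rightSet(s, "bc")` and `rightSet(s, "c")` are $\{3, 5\}$, so $bc$ and $c$ fall into the same state, while `rightSet(s, "b")` is $\{2, 4\}$.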

So a state $s$ consists of all the substrings whose $right$ set equals $right(s)$.

Given a $right$ set and a length, a substring is uniquely determined.

The lengths of the substrings that correspond to one $right$ set form an interval. In other words, if lengths $l$ and $r$ both fit a $right$ set, then so does every $x$ with $l \leq x \leq r$. Write $a_x$ for the substring of length $x$ determined by the set. If $a_x$ had an occurrence ending outside the $right$ set, then $a_l$, being a suffix of $a_x$, would also end at that position, so $l$ would not fit. If $a_x$ failed to end at some position in $right$, then $a_r$, which has $a_x$ as a suffix, could not end there either, so $r$ would not fit.

So $[min(s), max(s)]$ denotes the range of substring lengths of the state $s$.

"Proof that the number of states is linear"

Take two different states $a$ and $b$, and assume that $right(a)$ and $right(b)$ intersect.

Because $a$ and $b$ are different states, the substrings they represent are disjoint, so $[min(a), max(a)]$ and $[min(b), max(b)]$ do not intersect. If the two ranges shared a length, the substring of that length ending at a common position would belong to both states, so their $right$ sets would be equal and they would be the same state.

Since $right(a)$ and $right(b)$ intersect, assume without loss of generality that $min(a) > max(b)$. Then every substring represented by $b$ is shorter than every substring represented by $a$, and at a common position they end at the same place, so every substring of $b$ is a suffix of every substring of $a$. Hence $b$ occurs wherever $a$ occurs, so $right(a) \subsetneq right(b)$; that is, $right(a)$ is a proper subset of $right(b)$.

That is, the $right$ sets of two states are either disjoint, or one is a proper subset of the other.

The $right$ sets of all states therefore form a tree under inclusion, which we call the $parent$ tree.

In this tree every internal node has at least two children once each end position is counted as a leaf, and there are at most $n$ leaves, so the number of nodes is $O(n)$.
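The three properties argued above (lengths form an interval, $right$ sets are disjoint or nested, and the number of states is linear) can be checked by brute force on small strings. The sketch below is an illustration only; `StateTable` and the function names are assumptions made here, and the approach is far too slow for real inputs.

```cpp
#include <cassert>
#include <iterator>
#include <map>
#include <set>
#include <string>

// right set -> lengths of the substrings that have exactly this right set.
typedef std::map<std::set<int>, std::set<int> > StateTable;

// Group every non-empty substring of s by its right set (brute force).
StateTable statesByRight(const std::string& s) {
    const int n = static_cast<int>(s.size());
    std::map<std::string, std::set<int> > rightOf;   // substring -> end positions
    for (int l = 0; l < n; ++l)
        for (int r = l + 1; r <= n; ++r)
            rightOf[s.substr(l, r - l)].insert(r);
    StateTable states;
    for (const auto& kv : rightOf)
        states[kv.second].insert(static_cast<int>(kv.first.size()));
    return states;
}

// True if the lengths in every state form a contiguous range [min, max].
bool lengthsAreIntervals(const StateTable& states) {
    for (const auto& kv : states) {
        const std::set<int>& lens = kv.second;
        int span = *lens.rbegin() - *lens.begin() + 1;
        if (span != static_cast<int>(lens.size())) return false;
    }
    return true;
}

// True if any two right sets are either disjoint or nested.
bool rightSetsAreLaminar(const StateTable& states) {
    for (auto a = states.begin(); a != states.end(); ++a)
        for (auto b = std::next(a); b != states.end(); ++b) {
            int common = 0;
            for (int r : a->first) common += static_cast<int>(b->first.count(r));
            bool disjoint = (common == 0);
            bool nested = common == static_cast<int>(a->first.size()) ||
                          common == static_cast<int>(b->first.size());
            if (!disjoint && !nested) return false;
        }
    return true;
}
```

For $s = abcbc$ this yields 7 distinct $right$ sets, which together with the initial state gives 8 states, below the $2n$ bound.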

Having proved that the number of nodes is $O(n)$, we also need to prove that the number of edges is $O(n)$.

Consider a spanning tree of the SAM (unrelated to the $parent$ tree).

Let the number of states be $m$; the spanning tree has $m-1$ edges. Map each suffix to the first non-tree edge on its path through the automaton. Every non-tree edge is the image of at least one suffix (several suffixes may share one edge), and the number of suffixes is $O(n)$, so the number of edges is also $O(n)$.

We cannot store the $right$ set of every state explicitly, but the $right$ set of a state can be recovered as the union of the $right$ sets of the leaves in its subtree of the $parent$ tree.

For a state $v$ and each $r_i \in right(v)$, we have $right(trans(v, x)) = \{r_i + 1 \mid s[r_i] = x\}$, where $s[r_i]$ is the character of the string $s$ at position $r_i$.
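The transition rule for $right$ sets can be written out directly. The helper below is a sketch under the assumption that positions are 0-indexed and each $r_i$ is one past the end of an occurrence; the name `stepRight` is introduced here for illustration.

```cpp
#include <cassert>
#include <set>
#include <string>

// Given right(v) for some state v of the string s, compute right(trans(v, x)):
// keep every position whose next character is x, and advance it by one.
std::set<int> stepRight(const std::string& s, const std::set<int>& right, char x) {
    const int n = static_cast<int>(s.size());
    std::set<int> next;
    for (int r : right)
        if (r >= 0 && r < n && s[r] == x)
            next.insert(r + 1);
    return next;
}
```

With $s = aaabaaaabaa$ and $right(aa) = \{2, 3, 6, 7, 8, 11\}$, reading $b$ gives $\{4, 9\}$, which is $right(aab)$, and reading $a$ gives $\{3, 7, 8\}$, which is $right(aaa)$.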

"Analysis of the linear-time SAM construction"

Let the current string be $T$ with length $l$, and suppose we append a new character $x$.

Let $v_1, v_2, v_3, \ldots$ be all the states that represent suffixes of $T$ (that is, whose $right$ sets contain $l$).

There must be a state $p = st(T)$ with $right(p) = \{l\}$. Because the $right$ sets of $v_1, v_2, v_3, \ldots$ all contain $l$, these states are exactly $p$ and its ancestors in the $parent$ tree.

Suppose we append the character $x$, and let $np$ denote $st(Tx)$; then $right(np) = \{l+1\}$.

Order them as $v_1 = p, v_2, \ldots, v_k = root$, from deepest to shallowest, so the $right$ sets of $v_1, v_2, v_3, \ldots$ grow along the sequence, and if $v_i$ has an $x$ edge then so does $v_{i+1}$. If $v_j$ has no $x$ edge, we can simply add an $x$ edge from it to $np$, because $l$ is in its $right$ set.

Let $v_p$ be the first state among $v_1, v_2, v_3, \ldots$ that has an $x$ edge, and let $trans(v_p, x) = q$. Then $right(q) = \{r_i + 1 \mid r_i \in right(v_p), T[r_i] = x\}$; note that at this point $x$ has not yet been appended to the string.

Here is the difficulty: after $x$ is appended, we cannot simply add $l+1$ to $right(q)$.

Example:

$T = aaabaaaabaa$ and $x = b$, so $Tx = aaabaaaabaab$.

Mark the occurrences in $T$ of a string represented by $v_p$ with parentheses: $a(aa)baa(aa)b(aa)$

Mark the occurrences in $T$ of a string represented by $q$ with parentheses: $(aaab)a(aaab)aa$

After $b$ is appended, position $l+1$ is an end position of $aab$ but not of $aaab$, so $l+1$ cannot be added to $right(q)$ directly.

Of course, if $len(v_p) + 1 = len(q)$, where $len(s) = max(s)$, then $l+1$ can be added to $right(q)$ directly.
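The example above can be checked by brute force. The sketch below recomputes $right$ sets from the definition (0-indexed, end exclusive) and tests whether two substrings share a state; `rightSet` and `sameState` are names chosen here for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>

// Brute-force right set of a in s: every r with s[r - |a| .. r) == a.
std::set<int> rightSet(const std::string& s, const std::string& a) {
    std::set<int> result;
    for (std::size_t l = 0; l + a.size() <= s.size(); ++l)
        if (s.compare(l, a.size(), a) == 0)
            result.insert(static_cast<int>(l + a.size()));
    return result;
}

// Two substrings belong to the same state exactly when their right sets coincide.
bool sameState(const std::string& s, const std::string& a, const std::string& b) {
    return rightSet(s, a) == rightSet(s, b);
}
```

In $T$, both $aab$ and $aaab$ have the $right$ set $\{4, 9\}$, so `sameState(T, "aab", "aaab")` holds. In $Tx$, $right(aab)$ becomes $\{4, 9, 12\}$ while $right(aaab)$ stays $\{4, 9\}$, so the state has to be split.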

The solution is to create a new node $nq$ that takes over the strings of $q$ of length at most $len(v_p) + 1$, with $right(nq) = right(q) \cup right(np) = right(q) \cup \{l+1\}$.

So $trans(v_1 \sim v_{p-1}, x) = np$, and every state among $v_p \sim v_k$ whose $x$ edge pointed to $q$ now has $trans(\cdot, x) = nq$. Linking $np$ and $nq$ into the $parent$ tree completes the construction.

"Steps of the linear-time SAM construction"

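① Create a new node $np$ to represent $st(Tx)$.

② Starting from the leaf of the $parent$ tree whose $right$ set is $\{l\}$, walk upward to find the first state $v_p$ that has an $x$ edge. Every node without an $x$ edge on the way gets an $x$ edge to $np$, that is, $trans(v_1 \sim v_{p-1}, x) = np$.

③ If there is no such $v_p$, attach $np$ to $root$ in the $parent$ tree. Otherwise let $q = trans(v_p, x)$; if $len(v_p) + 1 = len(q)$, set $fa(np) = q$ and stop.

④ Otherwise create a new node $nq$ as a copy of $q$ with $len(nq) = len(v_p) + 1$, and make the following updates:

$fa(nq) = fa(q)$ // here $fa(q)$ is the parent of $q$ before $x$ was added

$fa(q) = fa(np) = nq$

⑤ Redirect the $x$ edges of $v_p \sim v_k$ that point to $q$ to $nq$, namely $trans(v_p \sim v_k, x) = nq$.

That completes the construction.

Note that the number of nodes can reach $2n$, so size the arrays accordingly!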

The code is as follows:

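    #include <cstdio>
    #include <cstring>
    using namespace std;

    const int maxn = 2000010;  // room for 2n states with n up to 10^6

    struct poi {
        int len, fa, trans[26];  // max length, parent in the parent tree, edges (0 = none)
    } st[maxn];

    int n, tott, now, root;
    char s[maxn];

    // Append character ch (0..25) to the string represented by the automaton.
    inline void extend(int ch) {
        int np = ++tott, p = now;
        st[np].len = st[now].len + 1;
        now = np;
        // Every ancestor without a ch edge gets an edge to np.
        while (p && !st[p].trans[ch]) st[p].trans[ch] = np, p = st[p].fa;
        if (!p) st[np].fa = root;  // no v_p exists
        else {
            int q = st[p].trans[ch];
            if (st[p].len + 1 == st[q].len) st[np].fa = q;
            else {
                // Split q: nq copies its edges and parent, with the shorter length.
                int nq = ++tott;
                st[nq] = st[q];
                st[nq].len = st[p].len + 1;
                st[q].fa = st[np].fa = nq;
                // Redirect the ch edges that pointed to q.
                while (p && st[p].trans[ch] == q) st[p].trans[ch] = nq, p = st[p].fa;
            }
        }
    }

    int main() {
        scanf("%s", s + 1);
        n = strlen(s + 1);
        now = tott = root = 1;  // state 1 is the root, 0 means "no state"
        for (int i = 1; i <= n; i++) extend(s[i] - 'a');
    }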
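As a usage illustration, here is a vector-based variant of the same construction with two queries added: substring membership, and the number of occurrences of a substring, which equals the size of the $right$ set of its state and is accumulated up the $parent$ tree. This is a sketch, not the author's code: the struct name, `find`, `contains`, `occurrences`, and `finalize` are names chosen here, and the input is assumed to contain only lowercase letters.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

struct SuffixAutomaton {
    struct State {
        int len = 0;          // max(s): longest substring in this state
        int fa = 0;           // parent in the parent tree (0 = none)
        int trans[26] = {};   // transitions, 0 = missing
        long long cnt = 0;    // |right(s)| once finalize() has run
    };
    std::vector<State> st;    // st[0] is a dummy, st[1] is the root
    int last = 1;

    explicit SuffixAutomaton(const std::string& s) : st(2) {
        for (char c : s) extend(c - 'a');   // assumes 'a'..'z' only
        finalize();
    }

    int newState(const State& s) { st.push_back(s); return (int)st.size() - 1; }

    void extend(int ch) {
        int np = newState(State());
        st[np].len = st[last].len + 1;
        st[np].cnt = 1;                     // np owns the new end position l+1
        int p = last;
        last = np;
        while (p && !st[p].trans[ch]) { st[p].trans[ch] = np; p = st[p].fa; }
        if (!p) { st[np].fa = 1; return; }
        int q = st[p].trans[ch];
        if (st[p].len + 1 == st[q].len) { st[np].fa = q; return; }
        State copy = st[q];                 // nq copies q's edges and parent
        copy.len = st[p].len + 1;
        copy.cnt = 0;                       // a copy owns no position of its own
        int nq = newState(copy);
        st[q].fa = st[np].fa = nq;
        while (p && st[p].trans[ch] == q) { st[p].trans[ch] = nq; p = st[p].fa; }
    }

    // Push counts from longer states to their parents (children before parents).
    void finalize() {
        std::vector<int> order;
        for (int v = 2; v < (int)st.size(); ++v) order.push_back(v);
        std::sort(order.begin(), order.end(),
                  [this](int a, int b) { return st[a].len > st[b].len; });
        for (int v : order) st[st[v].fa].cnt += st[v].cnt;
    }

    // State reached from the root by reading x, or 0 if x is not a substring.
    int find(const std::string& x) const {
        int v = 1;
        for (char c : x) {
            if (c < 'a' || c > 'z') return 0;
            v = st[v].trans[c - 'a'];
            if (!v) return 0;
        }
        return v;
    }

    bool contains(const std::string& x) const { return find(x) != 0; }

    // Number of occurrences of a non-empty substring x.
    long long occurrences(const std::string& x) const {
        int v = find(x);
        return v > 1 ? st[v].cnt : 0;
    }
};
```

For the string $aaabaaaabaa$ from the example, `occurrences("aa")` is 6 and `occurrences("aaab")` is 2, matching the sizes of the corresponding $right$ sets. Counts are seeded only on the $np$ nodes because each of them owns exactly one new end position, whereas a copied node $nq$ inherits all of its positions from its subtree.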


"Example Time"

"Algorithmic" suffix automata (SAM)