An intuitive understanding of suffix automata

Source: Internet
Author: User

Suffix automaton (SAM)

[3] gives a more rigorous definition and provides proofs. However, I did not fully understand those proofs. What follows is my own intuitive, but still reasonably rigorous, understanding.

The suffix automaton of a given string A is a deterministic finite automaton (DFA) that accepts exactly the suffixes of A, and we require it to have the minimum number of states.

Let n = |A|. The number of states St lies in [n + 1, 2n - 1], and the number of edges Eg lies in [n, 3n - 4]. Construction: the space complexity is 26 * St and the time complexity is O(3n), i.e. O(n). Query: O(|q|) for a query string q.

Because Eg <= 3n - 4 < 3 * St, the 26 * St space can be reduced to at most 3 * St by storing only the edges that actually exist.
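The bounds above can be checked on small inputs. The following is a minimal sketch, not the author's code: it stores transitions in a `HashMap` so that only existing edges take space, and it uses the incremental construction described later in this article. The names `SparseSam`, `stateCount` and `edgeCount` are hypothetical, chosen only for this illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class SparseSam {
    static final class State {
        // Sparse transitions: only edges that exist are stored.
        final Map<Character, State> go = new HashMap<>();
        State fa;   // suffix link
        int val;    // length of the longest string in this state
    }

    final List<State> states = new ArrayList<>();
    final State root = newState();
    State last = root;

    SparseSam(String a) {
        for (char c : a.toCharArray()) add(c);
    }

    private State newState() {
        State s = new State();
        states.add(s);
        return s;
    }

    void add(char x) {
        State p = last, np = newState();
        np.val = last.val + 1;
        for (; p != null && !p.go.containsKey(x); p = p.fa) p.go.put(x, np);
        if (p == null) {
            np.fa = root;
        } else {
            State q = p.go.get(x);
            if (q.val == p.val + 1) {
                np.fa = q;
            } else {
                State nq = newState();      // clone of q with a shorter length
                nq.go.putAll(q.go);
                nq.fa = q.fa;
                nq.val = p.val + 1;
                q.fa = nq;
                np.fa = nq;
                for (; p != null && p.go.get(x) == q; p = p.fa) p.go.put(x, nq);
            }
        }
        last = np;
    }

    int stateCount() { return states.size(); }

    int edgeCount() {
        int edges = 0;
        for (State s : states) edges += s.go.size();
        return edges;
    }

    public static void main(String[] args) {
        SparseSam sam = new SparseSam("abaaaba");
        System.out.println("states = " + sam.stateCount() + ", edges = " + sam.edgeCount());
    }
}
```

A hash map trades the constant-time array lookup for lower memory use; for a small fixed alphabet the 26-entry array may still be faster in practice.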

First, an intuitive picture:

<Fig1>


This is the suffix automaton of the string A = "abaaaba". The red S marks the start state, red nodes are accepting states, and black nodes are non-accepting states. The figure shows that a node can be the child of several nodes, which saves state space.

 

Definition:

State P := an equivalence class of R(u). Here, R(u) := { the right endpoints of the occurrences of u in A }.

For example, state 7 represents two suffixes, since R("aba") = R("ba") = {3, 7}. P is called an accepting state when the automaton, starting from the initial state, recognizes a substring represented by P as a suffix of A.
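The endpoint sets in this example can be reproduced by brute force. This is only an illustrative sketch: the class `EndPositions` and the method `r` are hypothetical names, positions are 1-based as in the text, and the empty string is assumed to end at every position 0..n.

```java
import java.util.TreeSet;

class EndPositions {
    // R(u): the 1-based positions in a at which an occurrence of u ends.
    static TreeSet<Integer> r(String a, String u) {
        TreeSet<Integer> ends = new TreeSet<>();
        if (u.isEmpty()) {
            // Assumption: the empty string ends at every position 0..n.
            for (int i = 0; i <= a.length(); i++) ends.add(i);
            return ends;
        }
        for (int i = a.indexOf(u); i >= 0; i = a.indexOf(u, i + 1)) {
            ends.add(i + u.length());
        }
        return ends;
    }

    public static void main(String[] args) {
        String a = "abaaaba";
        System.out.println(r(a, "aba")); // [3, 7]
        System.out.println(r(a, "ba"));  // [3, 7]
        System.out.println(r(a, "a"));   // [1, 3, 4, 5, 7]
    }
}
```

Since "aba" and "ba" have the same endpoint set they fall into the same state, while "a" has a different set and therefore belongs to another state.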

State transition Trans(P, c, Q) indicates that state P moves to state Q on character c.

 

Suffix function S(A, u) := the longest suffix v of u such that v is not in the equivalence class of u.

When u = A this is called the suffix link. [3] 1.5 states that S(A, A) is the longest suffix of A that occurs at least twice in A.

Let last be the most recently added state. Then last, S(A, last), S(A, S(A, last)), ... form the suffix path SP, which consists of the accepting states. Note that the suffix links of non-accepting states can also point to an accepting state. A suffix link points to the nearest node that accepts a shorter suffix.
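The suffix function and the suffix path can be sketched by brute force directly from the definitions. This is an illustration under stated assumptions, not an efficient method: `SuffixFunction`, `r`, `s` and `suffixPath` are hypothetical names, u is assumed to be a non-empty substring of A, and the empty string is assumed to end at every position 0..n.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

class SuffixFunction {
    // R(u): 1-based end positions of u in a (the empty string ends everywhere).
    static TreeSet<Integer> r(String a, String u) {
        TreeSet<Integer> ends = new TreeSet<>();
        if (u.isEmpty()) {
            for (int i = 0; i <= a.length(); i++) ends.add(i);
            return ends;
        }
        for (int i = a.indexOf(u); i >= 0; i = a.indexOf(u, i + 1)) {
            ends.add(i + u.length());
        }
        return ends;
    }

    // S(A, u): the longest proper suffix v of u with R(v) != R(u).
    // Assumes u is non-empty and occurs in a.
    static String s(String a, String u) {
        TreeSet<Integer> ends = r(a, u);
        for (int start = 1; start <= u.length(); start++) {
            String v = u.substring(start);
            if (!r(a, v).equals(ends)) return v;
        }
        return ""; // not reached for a non-empty substring of a
    }

    // The suffix path: A, S(A, A), S(A, S(A, A)), ... down to the empty string.
    static List<String> suffixPath(String a) {
        List<String> path = new ArrayList<>();
        String u = a;
        path.add(u);
        while (!u.isEmpty()) {
            u = s(a, u);
            path.add(u);
        }
        return path;
    }

    public static void main(String[] args) {
        for (String v : suffixPath("abaaaba")) {
            System.out.println(v.length() + " \"" + v + "\"");
        }
    }
}
```

Each string on the path is the longest member of an accepting state, so the lengths printed correspond to the L values of the states on SP.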

 

Length function L(A, P) := the maximum path length from the initial state to P, i.e. the maximum number of steps from the root node to that node.

 

Next we examine how the states of the automaton for A change after a character x is appended.

Let z be the longest suffix of Ax that occurs in A, and let zp be the longest string in A that belongs to the same equivalence class as z.

 

Corollary 2.3.12: if x does not occur in A, the equivalence classes of A remain unchanged in Ax. (i)

Corollary 2.3.11: if z = zp, the equivalence classes of A remain unchanged in Ax. (ii)

Theorem 2.3.10: if z != zp, the equivalence class of A that contains both z and zp must change in Ax: it is split, so that z and zp no longer share a class. (iii)

 

For example, take A = "ccccbbccc". x = 'd' corresponds to (i): S(Ax, Ax) = "".

The suffix links do not need to be drawn, because the suffix path SP can be read off directly: 9 -> 3 -> 2 -> 1 -> 0.

x = 'c' corresponds to (ii): z = "cccc", R(z) = {4}, zp = z, S(A, A) = "ccc", S(Ax, Ax) = z.

x = 'b' corresponds to (iii): z = "cccb", R(z) = {5} = R(zp), zp = "ccccb", S(Ax, Ax) = z.
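The three cases can be reproduced by brute force. This is a minimal sketch that reads z as the longest suffix of Ax occurring in A and zp as the longest string of A with the same endpoint set as z, which is the reading consistent with the values above. The names `LongestRepeatedSuffix`, `r`, `z` and `zp` are hypothetical.

```java
import java.util.TreeSet;

class LongestRepeatedSuffix {
    // R(u): 1-based end positions of a non-empty u in a.
    static TreeSet<Integer> r(String a, String u) {
        TreeSet<Integer> ends = new TreeSet<>();
        for (int i = a.indexOf(u); i >= 0; i = a.indexOf(u, i + 1)) {
            ends.add(i + u.length());
        }
        return ends;
    }

    // z: the longest suffix of a + x that occurs in a ("" if x is not in a).
    static String z(String a, char x) {
        String ax = a + x;
        for (int start = 1; start < ax.length(); start++) {
            String suffix = ax.substring(start);
            if (a.contains(suffix)) return suffix;
        }
        return "";
    }

    // zp: the longest string in a with the same endpoint set as z.
    // Found by extending z to the left while R stays unchanged.
    static String zp(String a, String z) {
        if (z.isEmpty()) return "";
        TreeSet<Integer> ends = r(a, z);
        int e = ends.first();
        String best = z;
        for (int len = z.length() + 1; len <= e; len++) {
            String longer = a.substring(e - len, e);
            if (!r(a, longer).equals(ends)) break;
            best = longer;
        }
        return best;
    }

    public static void main(String[] args) {
        String a = "ccccbbccc";
        for (char x : new char[] {'d', 'c', 'b'}) {
            String z = z(a, x);
            System.out.println(x + ": z = \"" + z + "\", zp = \"" + zp(a, z) + "\"");
        }
    }
}
```

Case (iii) is recognized by zp being strictly longer than z; that is the situation in which the construction below has to create an extra node.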

<Fig2>



Incremental construction:

Let the current string be A and the character to be appended be x.

Let p be the state corresponding to R(A) = {L(A)}, and let the new node np be the state corresponding to R(Ax) = {L(A) + 1}.

np is clearly an accepting state of Ax. Where should np be attached?

To save state space, we should share common prefixes as much as possible and make the graph as wide as possible, which shortens the paths.

Therefore, starting from last, follow the suffix links until reaching the first node v that has an outgoing edge labeled x.

For every v = S(A, p) on the suffix path of p that has no outgoing x edge, set Trans(v, x) = np. Then take the first v that does have an outgoing x edge, rename it p, and let q = Trans(p, x).

1. If L(q) = L(p) + 1, then q is reached only through the edge p --> q labeled x. We only need to treat q as an accepting state and set S(np) = q, because every path to q spells a suffix of Ax.

2. If L(q) != L(p) + 1, then q can also be reached through other, longer paths. Create a new node nq that plays the role of q in case 1, with L(nq) = L(p) + 1 and the same outgoing edges as q, and make the suffix links of both q and np point to nq.


The core code is only about 20 lines (the code for a suffix tree is probably much longer, which is one of the SAM's advantages):

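    void add(int x) {
        State p = last, np = new State();
        np.val = last.val + 1;
        // Walk the suffix path from last; every state without an x edge gets one to np.
        for (; (p != null) && (p.go[x] == null); p = p.fa)
            p.go[x] = np;
        if (null == p) {
            np.fa = root; // x does not occur in A: case (i)
        } else {
            State q = p.go[x];
            if (q.val == p.val + 1) {
                np.fa = q; /* S(np) = q */
            } else {
                State nq = new State();
                nq.copy(q); /* trans(nq,*) = trans(q,*) */
                nq.fa = q.fa; // nq inherits q's suffix link (redundant if copy() already copies fa)
                nq.val = p.val + 1;
                q.fa = np.fa = nq;
                // Redirect the x edges on the rest of the suffix path from q to nq.
                for (; (p != null) && (p.go[x] == q); p = p.fa)
                    p.go[x] = nq;
            }
        }
        last = np;
    }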
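The routine above relies on a `State` class and on `root` and `last` fields that the article does not show. The following self-contained sketch fills them in under stated assumptions: the alphabet is lowercase 'a'..'z', `State` has the fields `go`, `fa` and `val` used above, and `copy` is replaced by an explicit array copy. The `accept` flag and the methods `containsSubstring` and `acceptsSuffix` are hypothetical additions for querying.

```java
class SuffixAutomaton {
    static final class State {
        final State[] go = new State[26]; // transitions on 'a'..'z'
        State fa;                         // suffix link
        int val;                          // L(A, P)
        boolean accept;                   // true if the state lies on the suffix path
    }

    final State root = new State();
    State last = root;
    int stateCount = 1;

    SuffixAutomaton(String a) {
        for (char c : a.toCharArray()) add(c - 'a');
        // Accepting states are exactly those on the suffix path from last.
        for (State s = last; s != null; s = s.fa) s.accept = true;
    }

    private void add(int x) {
        State p = last, np = new State();
        stateCount++;
        np.val = last.val + 1;
        for (; p != null && p.go[x] == null; p = p.fa) p.go[x] = np;
        if (p == null) {
            np.fa = root;
        } else {
            State q = p.go[x];
            if (q.val == p.val + 1) {
                np.fa = q;
            } else {
                State nq = new State();
                stateCount++;
                System.arraycopy(q.go, 0, nq.go, 0, 26);
                nq.fa = q.fa;
                nq.val = p.val + 1;
                q.fa = nq;
                np.fa = nq;
                for (; p != null && p.go[x] == q; p = p.fa) p.go[x] = nq;
            }
        }
        last = np;
    }

    // Follows the transitions for q; returns null if q is not a substring of A.
    private State walk(String q) {
        State s = root;
        for (int i = 0; i < q.length() && s != null; i++) {
            int c = q.charAt(i) - 'a';
            if (c < 0 || c >= 26) return null; // outside the assumed alphabet
            s = s.go[c];
        }
        return s;
    }

    boolean containsSubstring(String q) {
        return walk(q) != null;
    }

    boolean acceptsSuffix(String q) {
        State s = walk(q);
        return s != null && s.accept;
    }

    public static void main(String[] args) {
        SuffixAutomaton sam = new SuffixAutomaton("abaaaba");
        System.out.println(sam.acceptsSuffix("aaba"));     // a suffix of A
        System.out.println(sam.acceptsSuffix("ab"));       // a substring, but not a suffix
        System.out.println(sam.containsSubstring("aaab")); // a substring of A
    }
}
```

Both queries take O(|q|) steps, since they only follow one transition per character of q.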


Todo

I noticed that the suffix link resembles the failure (fallback) jump in KMP. Next week I will tidy up the code and try building a SAM from a pattern string.


Ref

[1] accelerated 2, 3 https://www.cs.duke.edu/courses/fall12/compsci260/resources/suffix.trees.in.detail.pdf

[2] suffix pointer Construction

http://marknelson.us/1996/08/01/suffix-trees

[3] Algebraic Combinatorics on Words (PDF)

http://www.ctzsm.com/%E5%90%8E%E7%BC%80%E8%87%AA%E5%8A%A8%E6%9C%BA%E6%8A%A5%E5%91%8A/


