Study notes on the suffix automaton

Last Update:2018-07-25 Source: Internet

Author: User



Introduction

The three suffix sisters: suffix array, suffix automaton, suffix tree.
Suffix automaton: Suffix Automaton, also known as SAM.
The idea behind the algorithm: construct an automaton (essentially a graph) that recognizes all suffixes of a string.

Basic idea for recognizing all suffixes

Put all the suffixes into a trie, for example for the string aabbabd.
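To get a feel for how large such a trie becomes, here is a minimal sketch that inserts every suffix into a trie and counts the nodes. The function name `suffixTrieNodes` is made up for this illustration, and only lowercase letters are assumed.

```cpp
#include <array>
#include <cstddef>
#include <string>
#include <vector>

// Count the nodes of the trie that contains every suffix of s.
// Node 0 is the root (the empty string); letters are assumed to be 'a'..'z'.
std::size_t suffixTrieNodes(const std::string& s) {
    std::array<int, 26> empty;
    empty.fill(-1);                       // -1 means "no child"
    std::vector<std::array<int, 26>> next(1, empty);
    for (std::size_t start = 0; start < s.size(); ++start) {
        int cur = 0;                      // insert the suffix s[start..]
        for (std::size_t i = start; i < s.size(); ++i) {
            int c = s[i] - 'a';
            if (next[cur][c] == -1) {
                next[cur][c] = static_cast<int>(next.size());
                next.push_back(empty);    // create the missing child
            }
            cur = next[cur][c];
        }
    }
    return next.size();
}
```

Every distinct substring gets its own node, so `suffixTrieNodes("aabbabd")` returns 24 (23 distinct substrings plus the root), and in the worst case the count grows quadratically with the length of the string.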

That gives far too many states. How can the number of states be reduced?

Ways to reduce the number of states

Define the right set of a substring as the set of positions at which the substring occurs (ends) in the original string.
If the right sets of two substrings A and B are exactly the same, then one of them is obviously a suffix of the other. Say A is a suffix of B: whatever characters are appended, the two strings keep ending up in the same state, so they can be merged into one state.
In fact, a suffix automaton is an automaton that maintains the relationships between right sets.
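The merging rule can be checked by brute force on a small string. The sketch below computes the right set of every distinct substring and groups substrings whose right sets are equal; each group is one candidate state. The function names `rightSets` and `groupByRightSet` are made up for this illustration, and end positions are taken to be 1-based.

```cpp
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <vector>

// Map every distinct substring of s to its right set: the 1-based positions
// at which an occurrence of that substring ends.
std::map<std::string, std::set<int>> rightSets(const std::string& s) {
    std::map<std::string, std::set<int>> right;
    for (std::size_t i = 0; i < s.size(); ++i)
        for (std::size_t len = 1; i + len <= s.size(); ++len)
            right[s.substr(i, len)].insert(static_cast<int>(i + len));
    return right;
}

// Group substrings that share a right set; each group is one merged state.
std::map<std::set<int>, std::vector<std::string>>
groupByRightSet(const std::string& s) {
    std::map<std::set<int>, std::vector<std::string>> groups;
    const std::map<std::string, std::set<int>> right = rightSets(s);
    for (const auto& entry : right)
        groups[entry.second].push_back(entry.first);
    return groups;
}
```

For aabbab this yields 8 groups. The group with right set {4}, for instance, holds aabb, abb and bb: each is a suffix of the longer ones, and their lengths are consecutive.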
How this is done is explained below. First, let's look at some important variables.

Important variables

Here is a diagram first.

This is the suffix automaton of aabbab. The states (nodes) are labeled with numbers.
Let state (node) S be the initial state, representing the empty string. Walking from S to any node spells out a string, and different paths spell out different strings.
Each node stores some values: len, fa, son[26]. (Only lowercase letters are considered here, numbered 0...25.)
1. First, what len is: looking at the diagram above, the len of each node is the length of the longest string represented by that state (node). The strings represented by a state are all the strings spelled out by paths from S to that node. In the diagram above, len[0...8]=0,1,2,3,4,1,6,7,2.
2. Next, what son[26] is: if following character c from node x leads to node y, then son[x][c]=y. But why does son[x][c] point only to y and not to any other node? What property does such an edge have? Recall the discussion above on reducing the number of states: two states with the same right set can be merged into one. Every string represented by x, with c appended, has exactly the same right set as y, so son[x][c]=y, which amounts to merging the two states. Looking at the nodes above, a node may have several incoming edges, so it represents many strings, but the right sets of all those strings are the same.
3. Many explanations online say that this node can accept a new suffix and that fa[x] goes back to the previous node that can accept a new suffix. I could not follow that, and only understood it after reading the paper. In fact, fa[x] is the state whose right set contains the right set of x and is as small as possible. Because it is as small as possible, it is the closest to the right set of x. If x is extended by a new character to state y (the new suffix), then, since the right set of fa[x] contains the right set of x, fa[x] can also be extended by the new character to form a new state. That state may be merged with y, and this is how the suffix automaton is constructed.

Some properties of the parent tree

1. Moving up from a leaf node is a process of repeatedly merging right sets.
2. Let the shortest string of state s be min(s) and the longest be max(s), with |max(s)|=len[s]; then |min(s)|-1=len[fa[s]].
3. The "accepting a suffix" description found online can be understood this way: going up the parent tree, the right set keeps getting larger, which is the process of repeatedly moving to shorter suffixes.

Construction of the suffix automaton

Suppose the automaton for the first i-1 characters of the string s has already been constructed.
The node that represents s[1...i-1] is last. Now we add character i by creating a new state np; obviously len[np]=i=len[last]+1. Set p=last.

    p=last,np=++num;          // np is the new state for s[1...i]
    t[np].len=t[p].len+1;

Now the states have to be merged. The state reached from p by the character s[i] is np itself. Because the right set of fa[p] contains the right set of p, the state reached from fa[p] by this character may also have to be merged with np.
When does a merge happen? If son[p][s[i]]=0, add a transition on s[i] from p to np. In the current automaton the right set of that target can only be {i}, because s[i] has just been appended. fa[p] may also have no s[i] transition, and fa[fa[p]] may have none either, so keep moving up until p=0 or p has an s[i] transition.

while (p&&!t[p].son[c]) t[p].son[c]=np,p=t[p].fa;

For convenience, node 0 is not used; the empty string is represented by node 1.
If p=0, we have walked past the empty-string state (node 1). The right set of the empty string certainly contains the right set of np, so t[np].fa=1.

if (!p) t[np].fa=1;

If p is not 0, then p already has a transition on s[i]. Let q be the node reached from p by s[i].
Now there are two cases.
(Forcing the new position into q directly could require making t[q].len smaller; as the figure showed, the blue string cannot be put directly into that set, because it would conflict on the character b.)

1. t[p].len+1=t[q].len: the longest string of q is exactly one character longer than the longest string of p (that is, s reaches state p after its first j characters and state q after its first j+1 characters). The right set of p contains the right set of last; last is extended by s[i] to np and p is extended by s[i] to q, so the right set of q must contain the right set of np.
2. t[p].len+1<t[q].len (that is, t[p].len+1!=t[q].len): after s[i] is added, the strings represented by q whose length is at most t[p].len+1 gain the new position i in their right set. The strings longer than t[p].len+1 do not, because the longest string represented by p has only length t[p].len. The right sets inside q are no longer equal, so the state has to be split.
Create a new node nq. Since nq is split off from q, its son and fa are copied from q; the split is made by len: len[nq]=len[p]+1. The right set of nq now has the extra i, so it contains the right sets of both q and np and is as small as possible, which makes nq the fa of both.
The earlier states whose s[i] transition led to q reach strings whose right set has gained i, so those transitions are redirected to nq.
This completes the construction of the suffix automaton.

    else{
        q=t[p].son[c];
        if (t[p].len+1==t[q].len) t[np].fa=q;   // case 1: q can be used as is
        else{
            nq=++num;                           // case 2: split q
            t[nq]=t[q];                         // copy son[] and fa from q
            t[nq].len=t[p].len+1;
            t[q].fa=t[np].fa=nq;
            while (p&&t[p].son[c]==q) t[p].son[c]=nq,p=t[p].fa;
        }
    }
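Putting the fragments above together, one possible self-contained version looks like the sketch below. The struct name `SuffixAutomaton` and the helpers `newNode`, `build`, `contains` and `countDistinct` are names chosen for this illustration, not part of the original code. The final `last = np` is the usual bookkeeping step that the fragments leave out. As in the text, node 1 is the empty string and node 0 means "no node".

```cpp
#include <string>
#include <vector>

struct SuffixAutomaton {
    struct Node {
        int len = 0;       // length of the longest string of this state
        int fa = 0;        // parent in the parent tree (0 = none)
        int son[26] = {};  // transitions on 'a'..'z' (0 = absent)
    };
    std::vector<Node> t;
    int last = 1;

    SuffixAutomaton() : t(2) {}  // t[0] is unused, t[1] is the initial state

    int newNode() { t.emplace_back(); return static_cast<int>(t.size()) - 1; }

    // Append one character c in 0..25.
    void extend(int c) {
        int p = last, np = newNode();
        t[np].len = t[p].len + 1;
        while (p && !t[p].son[c]) { t[p].son[c] = np; p = t[p].fa; }
        if (!p) {
            t[np].fa = 1;
        } else {
            int q = t[p].son[c];
            if (t[p].len + 1 == t[q].len) {
                t[np].fa = q;                    // case 1
            } else {
                int nq = newNode();              // case 2: split q
                t[nq] = t[q];
                t[nq].len = t[p].len + 1;
                t[q].fa = t[np].fa = nq;
                while (p && t[p].son[c] == q) { t[p].son[c] = nq; p = t[p].fa; }
            }
        }
        last = np;
    }

    void build(const std::string& s) { for (char ch : s) extend(ch - 'a'); }

    // True if w is a substring of the string built so far.
    bool contains(const std::string& w) const {
        int x = 1;
        for (char ch : w) { x = t[x].son[ch - 'a']; if (!x) return false; }
        return true;
    }

    // Number of distinct non-empty substrings: sum of len[x] - len[fa[x]].
    long long countDistinct() const {
        long long ans = 0;
        for (std::size_t x = 2; x < t.size(); ++x) ans += t[x].len - t[t[x].fa].len;
        return ans;
    }
};
```

As a sanity check, building the automaton for aabbab creates 9 states (nodes 1 to 9), and `countDistinct()` returns 16, which matches a count by hand.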

Applications of the suffix automaton

First, a few properties of the suffix automaton:
1. The lengths of the strings represented by state i lie in (len[fa[i]], len[i]], that is, from len[fa[i]]+1 to len[i].
2. All strings represented by state i occur the same number of times, namely the size of the right set of i.
3. The tree formed by the fa pointers is called the parent tree. On the parent tree, the right set of a child node is a subset of the right set of its parent node.
4. The parent tree of the suffix automaton is the suffix tree of the reversed original string.
5. The longest common suffix of two strings (each represented by a state of the automaton) is given by the LCA of their states on the parent tree.

The longest common substring of two strings

Build the suffix automaton of string A, then run string B over it.

Finding the number of distinct substrings

Method one: use DFS to compute, for each node, the number of strings that can be extended from it: $sum[x] = \sum_{i} sum[t[x].son[i]] + 1$. In fact DFS is not needed: sort the nodes topologically (by len from small to large) and process them in reverse. At the end, sum[1] is the number of all substrings (including the empty string).
Method two: $ans = \sum_{x} (t[x].len - t[t[x].fa].len)$. This uses property 1: the lengths of the strings represented by node i range from len[fa[i]]+1 to len[i], so the number of distinct strings it contains is $len[i] - (len[fa[i]]+1) + 1 = len[i] - len[fa[i]]$.

Finding the k-th substring

1. The k-th among distinct substrings: preprocess how many strings can be formed from each state. This can be done with DFS, or in topological order on the automaton (which is the same as sorting by len from small to large, because a node with smaller len comes earlier in topological order) followed by accumulation (a DP on the DAG). Then walk down with DFS to find the k-th.
2. The k-th when identical substrings at different positions are counted separately: in addition to the preprocessing above, also preprocess the size of the right set of every state (the number of occurrences of each string in the original string). This changes the counts computed above and is then handled in the same DFS.
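Both variants can be sketched in one function. The code below is an illustration under assumptions: the name `kthSubstring`, the flag `countRepeats` and the fields `occ` and `sum` are made up here, k is 1-based, substrings are ranked in ascending lexicographic order, and an empty string is returned when k is out of range. The right set sizes are obtained by giving each non-cloned state one occurrence and pushing the counts up the parent tree.

```cpp
#include <algorithm>
#include <string>
#include <vector>

struct KNode { int len = 0, fa = 0, son[26] = {}; long long occ = 0, sum = 0; };

// k-th smallest substring of s (1-based), or "" if k is out of range.
// countRepeats == false: identical substrings are counted once.
// countRepeats == true : every occurrence is counted separately.
std::string kthSubstring(const std::string& s, long long k, bool countRepeats) {
    std::vector<KNode> t(2);                 // node 1 = empty string, 0 = none
    int last = 1;
    for (char ch : s) {                      // same construction as in the text
        int c = ch - 'a', p = last, np = static_cast<int>(t.size());
        t.emplace_back();
        t[np].len = t[p].len + 1;
        t[np].occ = 1;                       // np ends at the new position
        for (; p && !t[p].son[c]; p = t[p].fa) t[p].son[c] = np;
        if (!p) t[np].fa = 1;
        else {
            int q = t[p].son[c];
            if (t[p].len + 1 == t[q].len) t[np].fa = q;
            else {
                int nq = static_cast<int>(t.size());
                KNode clone = t[q];
                clone.len = t[p].len + 1;
                clone.occ = 0;               // a clone adds no new end position
                t.push_back(clone);
                t[q].fa = t[np].fa = nq;
                for (; p && t[p].son[c] == q; p = t[p].fa) t[p].son[c] = nq;
            }
        }
        last = np;
    }
    int n = static_cast<int>(t.size()) - 1;
    std::vector<int> order(n);               // nodes sorted by len, root first
    for (int i = 0; i < n; ++i) order[i] = i + 1;
    std::sort(order.begin(), order.end(),
              [&](int a, int b) { return t[a].len < t[b].len; });
    for (int i = n - 1; i >= 1; --i)         // right set sizes via parent tree
        t[t[order[i]].fa].occ += t[order[i]].occ;
    for (int i = n - 1; i >= 0; --i) {       // DP on the DAG, longest first
        int x = order[i];
        t[x].sum = (x == 1) ? 0 : (countRepeats ? t[x].occ : 1);
        for (int c = 0; c < 26; ++c)
            if (t[x].son[c]) t[x].sum += t[t[x].son[c]].sum;
    }
    if (k < 1 || k > t[1].sum) return "";
    std::string ans;
    int x = 1;
    while (k > 0) {                          // greedy walk, smallest letter first
        for (int c = 0; c < 26; ++c) {
            int y = t[x].son[c];
            if (!y) continue;
            if (k > t[y].sum) { k -= t[y].sum; continue; }
            ans += static_cast<char>('a' + c);
            k -= countRepeats ? t[y].occ : 1;
            x = y;
            break;
        }
    }
    return ans;
}
```

For example, `kthSubstring("aabbab", 7, false)` returns "ab", and `kthSubstring("aab", 3, true)` returns "aa", because the ranked list with repeats is a, a, aa, aab, ab, b.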
The original problem for this is TJOI2015 string theory.

Minimum cyclic representation

The minimum cyclic representation of a string is the lexicographically smallest string among all of its rotations.
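As a reference point for this definition, here is a brute-force sketch that simply compares all rotations. It does not use the suffix automaton and takes quadratic time, so it is only meant for checking a faster solution on small inputs. The function name `minRotationBruteForce` is made up for this illustration.

```cpp
#include <cstddef>
#include <string>

// Lexicographically smallest rotation of s, found by trying every rotation.
std::string minRotationBruteForce(const std::string& s) {
    const std::string doubled = s + s;    // rotation i is doubled[i .. i+|s|)
    std::string best = s;
    for (std::size_t i = 1; i < s.size(); ++i) {
        std::string candidate = doubled.substr(i, s.size());
        if (candidate < best) best = candidate;
    }
    return best;
}
```

For instance, `minRotationBruteForce("bca")` returns "abc" and `minRotationBruteForce("baab")` returns "aabb".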
Append a copy of the original string to itself, build the suffix automaton of the result, and then repeatedly follow the lexicographically smallest transition from the current node until the length reaches |s|.

Finding palindromic substrings

Build the suffix automaton of the original string and compute rmax, the largest element of the right set, for each node. Then run the reversed string over the suffix automaton. If the currently matched string, viewed as the range [L...R] of the original string, covers the rmax of the current node, then [L...rmax] is a palindrome.

Building a SAM on a trie

This looks very advanced, but in fact the last of each node is simply its parent node on the trie. Why build it on a trie?
For example, to build a single suffix automaton for many strings at once, there are two ways:
Method one: join all the strings with distinct separator characters, then build a suffix automaton.
Method two: put all the strings into a trie, then build the suffix automaton on the trie. (This is also called a generalized suffix automaton.)

Summary

In fact, many of the things a suffix array can do can also be done with a suffix automaton. Because the suffix automaton has a tree structure, adding heavy-light decomposition lets many data structures maintain it. Its code is short, its constant factor is small, and it is fast, but solving problems with it takes more thought.

Because I am still a beginner, this is all I know about the suffix automaton.

Recommended problems

TJOI2015 string theory
GDOI2012 string
The ZJOI2015 of the Gods

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
