[String Data Structures: Suffix Series, Part 3] Properties and Applications of the Suffix Automaton

Now that we know how to build a SAM, let's learn how to use it to solve all kinds of problems.
First, an overview of the properties of the SAM (quoted from Zhang Tianyan's paper "Suffix Automata and Their Applications" in the 2015 Chinese National Training Team collected papers):

1. The lengths of the strings represented by a state $s$ form the interval $(\mathrm{len}(\mathrm{fa}(s)), \mathrm{len}(s)]$, where $\mathrm{fa}(s)$ is the parent of $s$ in the parent tree.
2. All strings represented by a state $s$ have the same set of right endpoints (the Right set) in the original string, so they all occur the same number of times, namely $|\mathrm{Right}(s)|$.
3. In the parent tree of the suffix automaton, the Right set of each state is a subset of the Right set of its parent.
4. The parent tree of the suffix automaton is the reversed-prefix tree of the original string.
5. The longest common suffix of two prefixes of the original string is represented by the lowest common ancestor (LCA) of their states in the parent tree.
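To make properties 1 to 3 concrete, here is a minimal Python sketch that builds a SAM with the standard online algorithm (transitions stored as dicts, which is an assumption of this sketch), computes every state's Right set by pushing end positions up the parent tree, and checks the properties by brute force on small strings. The names `build_sam`, `right_sets` and `check_properties` are hypothetical helpers, not from the original paper.

```python
def build_sam(s):
    """Standard online SAM construction.

    Returns (length, link, nxt, prefix_state): length[v] is len(v), link[v]
    is the parent of v (-1 for the root), nxt[v] maps a character to a
    transition target, prefix_state[i] is the state of the prefix s[:i + 1].
    """
    length, link, nxt = [0], [-1], [{}]
    prefix_state = []
    last = 0
    for ch in s:
        cur = len(length)
        length.append(length[last] + 1)
        link.append(-1)
        nxt.append({})
        p = last
        while p != -1 and ch not in nxt[p]:
            nxt[p][ch] = cur
            p = link[p]
        if p == -1:
            link[cur] = 0
        else:
            q = nxt[p][ch]
            if length[p] + 1 == length[q]:
                link[cur] = q
            else:
                # Split q: the clone takes over the shorter strings of q.
                clone = len(length)
                length.append(length[p] + 1)
                link.append(link[q])
                nxt.append(dict(nxt[q]))
                while p != -1 and nxt[p].get(ch) == q:
                    nxt[p][ch] = clone
                    p = link[p]
                link[q] = clone
                link[cur] = clone
        prefix_state.append(cur)
        last = cur
    return length, link, nxt, prefix_state


def right_sets(length, link, prefix_state):
    """Right set of every state: seed each prefix state with its end
    position, then merge child sets into parents, longest states first."""
    right = [set() for _ in length]
    for i, v in enumerate(prefix_state):
        right[v].add(i)
    for v in sorted(range(1, len(length)), key=lambda v: -length[v]):
        right[link[v]] |= right[v]
    return right


def check_properties(s):
    """Brute-force check of properties 1 to 3; returns the number of states."""
    length, link, nxt, prefix_state = build_sam(s)
    right = right_sets(length, link, prefix_state)
    for a in range(len(s)):
        for b in range(a + 1, len(s) + 1):
            t = s[a:b]
            v = 0
            for ch in t:
                v = nxt[v][ch]
            # Property 1: |t| lies in (len(fa(v)), len(v)].
            assert length[link[v]] < len(t) <= length[v]
            # Property 2: t ends exactly at the positions in Right(v).
            ends = {e for e in range(len(t) - 1, len(s))
                    if s[e - len(t) + 1:e + 1] == t}
            assert right[v] == ends
    # Property 3: Right(v) is a subset of Right(fa(v)).
    for v in range(1, len(length)):
        assert right[v] <= right[link[v]]
    return len(length)


for word in ["aabab", "abcbc", "aaaa"]:
    print(word, check_properties(word))
```

Positions here are 0-based indices of the last character of an occurrence. With this convention the root's Right set is every position, which is why property 3 is checked with a non-strict subset.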

--------------- I'm the divider >w< ---------------
It is often said that the SAM can completely replace the suffix array (SA).
So let's look at the typical applications of SA; for each one I will only discuss how the SAM handles it:

1. Multi-pattern string matching

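Being an automaton, the SAM supports string matching natively: a pattern P occurs in the original string exactly when it can be read from the initial state along transitions, which takes O(|P|) per pattern. So string matching with a SAM is obvious ...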
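A minimal Python sketch of matching several patterns against one text: build the SAM once, then walk each pattern from the initial state. It also counts occurrences using property 2, by summing Right-set sizes up the parent tree. `build_sam`, `occurrence_counts` and `count_occurrences` are hypothetical names; the construction is the usual online algorithm with dict transitions.

```python
def build_sam(s):
    """Standard online SAM construction; prefix_state[i] is the state of s[:i + 1]."""
    length, link, nxt = [0], [-1], [{}]
    prefix_state = []
    last = 0
    for ch in s:
        cur = len(length)
        length.append(length[last] + 1)
        link.append(-1)
        nxt.append({})
        p = last
        while p != -1 and ch not in nxt[p]:
            nxt[p][ch] = cur
            p = link[p]
        if p == -1:
            link[cur] = 0
        else:
            q = nxt[p][ch]
            if length[p] + 1 == length[q]:
                link[cur] = q
            else:
                # Split q: the clone takes over the shorter strings of q.
                clone = len(length)
                length.append(length[p] + 1)
                link.append(link[q])
                nxt.append(dict(nxt[q]))
                while p != -1 and nxt[p].get(ch) == q:
                    nxt[p][ch] = clone
                    p = link[p]
                link[q] = clone
                link[cur] = clone
        prefix_state.append(cur)
        last = cur
    return length, link, nxt, prefix_state


def occurrence_counts(length, link, prefix_state):
    """|Right(v)| for every state: each prefix state owns one end position,
    and counts are summed up the parent tree, longest states first."""
    cnt = [0] * len(length)
    for v in prefix_state:
        cnt[v] = 1
    for v in sorted(range(1, len(length)), key=lambda v: -length[v]):
        cnt[link[v]] += cnt[v]
    return cnt


def count_occurrences(nxt, cnt, pattern):
    """Read a non-empty pattern from the initial state; 0 if it falls off."""
    v = 0
    for ch in pattern:
        if ch not in nxt[v]:
            return 0
        v = nxt[v][ch]
    return cnt[v]


text = "abracadabra"
length, link, nxt, prefix_state = build_sam(text)
cnt = occurrence_counts(length, link, prefix_state)
for pat in ["abra", "a", "cad", "xyz"]:
    print(pat, count_occurrences(nxt, cnt, pat))
# prints: abra 2, a 5, cad 1, xyz 0 (one pair per line)
```

Each pattern costs O(|P|) once the automaton is built, so adding patterns never requires rebuilding anything.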

2. Longest common prefix (LCP)

For this problem, SA solutions usually build the height array, which is actually quite a good approach, though slightly inferior to the SAM one.
First reverse the string, so that the longest common prefix of two suffixes becomes the longest common suffix of two prefixes of the reversed string.
By property 5, the problem then reduces directly to a plain LCA query on the parent tree ...
In case the property seems doubtful, here is a brief justification:
For a state s, the strings of its parent state are always suffixes of the strings of s; this follows from the way the SAM is built (the suffix link of s points to the state holding the longest suffix that falls outside s). So a prefix's state together with its ancestors represents exactly the suffixes of that prefix, the common ancestors of two such states (a state counting as its own ancestor) represent their common suffixes, and the deepest one, the LCA, represents the longest.
So locate the states of the two prefixes in the SAM and take their LCA; its len is the length of the longest common suffix, which is the longest common prefix of the original problem.
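The LCP argument above can be sketched in Python as follows. The SAM is built on the reversed string, so the suffix starting at i corresponds to the prefix state created at step n-1-i. As a simplifying assumption the LCA is found naively by climbing suffix links; a real implementation would use binary lifting or another fast LCA method. `build_sam` and `build_lcp` are hypothetical names.

```python
def build_sam(s):
    """Standard online SAM construction; prefix_state[i] is the state of s[:i + 1]."""
    length, link, nxt = [0], [-1], [{}]
    prefix_state = []
    last = 0
    for ch in s:
        cur = len(length)
        length.append(length[last] + 1)
        link.append(-1)
        nxt.append({})
        p = last
        while p != -1 and ch not in nxt[p]:
            nxt[p][ch] = cur
            p = link[p]
        if p == -1:
            link[cur] = 0
        else:
            q = nxt[p][ch]
            if length[p] + 1 == length[q]:
                link[cur] = q
            else:
                # Split q: the clone takes over the shorter strings of q.
                clone = len(length)
                length.append(length[p] + 1)
                link.append(link[q])
                nxt.append(dict(nxt[q]))
                while p != -1 and nxt[p].get(ch) == q:
                    nxt[p][ch] = clone
                    p = link[p]
                link[q] = clone
                link[cur] = clone
        prefix_state.append(cur)
        last = cur
    return length, link, nxt, prefix_state


def build_lcp(s):
    """Return lcp(i, j): length of the longest common prefix of s[i:] and s[j:]."""
    n = len(s)
    # Reversing turns the suffix s[i:] into the prefix r[:n - i] of r = s[::-1],
    # whose state is prefix_state[n - 1 - i].
    length, link, _, prefix_state = build_sam(s[::-1])
    depth = [0] * len(length)
    # Parents are strictly shorter, so increasing len is a top-down order.
    for v in sorted(range(1, len(length)), key=lambda v: length[v]):
        depth[v] = depth[link[v]] + 1

    def lcp(i, j):
        u, v = prefix_state[n - 1 - i], prefix_state[n - 1 - j]
        # Naive LCA: lift the deeper state, then lift both in step.
        while depth[u] > depth[v]:
            u = link[u]
        while depth[v] > depth[u]:
            v = link[v]
        while u != v:
            u, v = link[u], link[v]
        return length[u]  # property 5: len(LCA)

    return lcp


lcp = build_lcp("banana")
print(lcp(1, 3))  # "anana" vs "ana": prints 3
```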

3. Longest palindromic substring

Build a SAM on the original string S and, for each state, find Rmax, the largest position in its Right set.
Then run the reversed string through the automaton. Suppose the string t currently matched corresponds to the substring [l, r] of the original string, i.e. t is the reverse of S[l..r]. If this interval covers the Rmax of the state reached (l ≤ Rmax ≤ r), then S[l..Rmax] is a palindrome: t ends at Rmax in S, so its last Rmax-l+1 characters are S[l..Rmax], while as the reverse of S[l..r] its last Rmax-l+1 characters are the reverse of S[l..Rmax]. (This looks quite right; I'm not very good at it, but if you draw a picture yourself you'll find it really is right ...)
With the same idea we can also enumerate palindromic substrings to handle other problems >_<
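Below is a hedged Python sketch of the palindrome idea, with one deliberate change. Testing only the Rmax of the state reached can miss palindromes (for the string `caawvaac` it never reports `aa`), so this version tests every end position p of every state on the suffix-link chain of the current match against the same covering condition l ≤ p ≤ l+m-1, where m is the matched length inside that state. That makes it complete but possibly quadratic in time and memory, so treat it as an illustration of the covering argument rather than an efficient solution. `build_sam` and `longest_palindrome` are hypothetical names.

```python
def build_sam(s):
    """Standard online SAM construction; prefix_state[i] is the state of s[:i + 1]."""
    length, link, nxt = [0], [-1], [{}]
    prefix_state = []
    last = 0
    for ch in s:
        cur = len(length)
        length.append(length[last] + 1)
        link.append(-1)
        nxt.append({})
        p = last
        while p != -1 and ch not in nxt[p]:
            nxt[p][ch] = cur
            p = link[p]
        if p == -1:
            link[cur] = 0
        else:
            q = nxt[p][ch]
            if length[p] + 1 == length[q]:
                link[cur] = q
            else:
                # Split q: the clone takes over the shorter strings of q.
                clone = len(length)
                length.append(length[p] + 1)
                link.append(link[q])
                nxt.append(dict(nxt[q]))
                while p != -1 and nxt[p].get(ch) == q:
                    nxt[p][ch] = clone
                    p = link[p]
                link[q] = clone
                link[cur] = clone
        prefix_state.append(cur)
        last = cur
    return length, link, nxt, prefix_state


def longest_palindrome(s):
    """Longest palindromic substring: SAM of s, scanned with reversed s."""
    n = len(s)
    if n == 0:
        return ""
    length, link, nxt, prefix_state = build_sam(s)
    # Right set of every state (0-based end positions in s).
    right = [set() for _ in length]
    for i, v in enumerate(prefix_state):
        right[v].add(i)
    for v in sorted(range(1, len(length)), key=lambda v: -length[v]):
        right[link[v]] |= right[v]

    best_l, best_r = 0, 0  # s[0] alone is always a palindrome
    v, matched = 0, 0
    for i, ch in enumerate(reversed(s)):
        # Longest suffix of the reversed text read so far that occurs in s.
        while v != 0 and ch not in nxt[v]:
            v = link[v]
            matched = length[v]
        if ch in nxt[v]:
            v = nxt[v][ch]
            matched += 1
        else:
            matched = 0
        # The match of length m in state u is the reverse of s[l:l + m].
        l = n - 1 - i
        u, m = v, matched
        while u != 0:
            for p in right[u]:
                # Covering condition: l <= p <= l + m - 1 means s[l..p] is a palindrome.
                if l <= p <= l + m - 1 and p - l > best_r - best_l:
                    best_l, best_r = l, p
            u = link[u]
            m = length[u]
    return s[best_l:best_r + 1]


print(longest_palindrome("forgeeksskeegfor"))  # prints geeksskeeg
```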

4. Longest common substring

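Build a SAM on one of the strings and run each of the other strings through it, keeping track of the current match length at every step. For each other string, record in each state the longest match that reaches it, then push these values up the parent tree (a match ending in a state also matches in all its ancestors, capped at each ancestor's len). Finally, for each state take the min of these lengths over all the other strings, and the answer is the max of these minima over all states.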
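A minimal Python sketch of this procedure, assuming the SAM is built on the first string and only the length of the answer is needed. `build_sam` and `longest_common_substring` are hypothetical names.

```python
def build_sam(s):
    """Standard online SAM construction; prefix_state[i] is the state of s[:i + 1]."""
    length, link, nxt = [0], [-1], [{}]
    prefix_state = []
    last = 0
    for ch in s:
        cur = len(length)
        length.append(length[last] + 1)
        link.append(-1)
        nxt.append({})
        p = last
        while p != -1 and ch not in nxt[p]:
            nxt[p][ch] = cur
            p = link[p]
        if p == -1:
            link[cur] = 0
        else:
            q = nxt[p][ch]
            if length[p] + 1 == length[q]:
                link[cur] = q
            else:
                # Split q: the clone takes over the shorter strings of q.
                clone = len(length)
                length.append(length[p] + 1)
                link.append(link[q])
                nxt.append(dict(nxt[q]))
                while p != -1 and nxt[p].get(ch) == q:
                    nxt[p][ch] = clone
                    p = link[p]
                link[q] = clone
                link[cur] = clone
        prefix_state.append(cur)
        last = cur
    return length, link, nxt, prefix_state


def longest_common_substring(strings):
    """Length of the longest substring common to all the given strings."""
    first, others = strings[0], strings[1:]
    length, link, nxt, _ = build_sam(first)
    order = sorted(range(1, len(length)), key=lambda v: -length[v])
    # best[v]: min over the other strings of the longest match in state v,
    # starting from len(v), which caps every entry.
    best = length[:]
    for t in others:
        here = [0] * len(length)
        v, matched = 0, 0
        for ch in t:
            while v != 0 and ch not in nxt[v]:
                v = link[v]
                matched = length[v]
            if ch in nxt[v]:
                v = nxt[v][ch]
                matched += 1
            else:
                matched = 0  # v is the root and ch never occurs in first
            here[v] = max(here[v], matched)
        # Push matches up the parent tree, deepest states first.
        for u in order:
            if here[u]:
                par = link[u]
                here[par] = max(here[par], min(here[u], length[par]))
        for u in range(len(length)):
            best[u] = min(best[u], here[u])
    return max(best[1:], default=0)


print(longest_common_substring(["abcdef", "zbcdf", "bcdx"]))  # prints 3 ("bcd")
```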

--------------- I'm the divider >w< ---------------
These are some of the more common problems. There are probably some less common ones that the SAM does not handle as well.
In fact, a SAM can also be used to build a suffix tree and a suffix array in linear time (for a constant-size alphabet).
The linear-time SA construction (the DC3 algorithm) has a large constant factor and is hard to understand, whereas building the SA from a SAM in linear time is actually very simple and has a small constant.
Recall that the suffix array corresponds to the suffix tree: a DFS over the suffix tree that visits children in lexicographic order yields the suffix array.
So how can a SAM be used to build a suffix tree in linear time?
Recall property 4.
First, what is a reversed-prefix tree?

Insert the reverse of every prefix of the string into a trie, then compress each chain without branches into a single edge.

In fact, this structure is just the suffix tree of the reversed string, since the reverse of each prefix of the original string is a suffix of the reversed string. For the details, see the paper or work it out yourself; I won't say more = =
So if we build a suffix automaton on the reversed string, its parent tree is the suffix tree of the original string.
Then a DFS over this suffix tree, as described above, gives the SA.
The constant really is small ... basically it's just building a SAM, sweeping the parent tree, and a DFS ...
It's no slower than DC3.
At last we get a satisfying result.
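Here is a hedged Python sketch of building the SA this way. Each parent-tree edge into a state is keyed by its first character, which is `r[pos - len(parent)]` for any end position `pos` of that state in the reversed string `r`; a state created for a prefix of `r` stands for exactly one suffix of the original string. For brevity it sorts children with `sorted`, which is not strictly linear; with a constant-size alphabet, bucketing children by character would keep the whole procedure linear. `build_sam` and `suffix_array` are hypothetical names.

```python
def build_sam(s):
    """Standard online SAM construction; prefix_state[i] is the state of s[:i + 1]."""
    length, link, nxt = [0], [-1], [{}]
    prefix_state = []
    last = 0
    for ch in s:
        cur = len(length)
        length.append(length[last] + 1)
        link.append(-1)
        nxt.append({})
        p = last
        while p != -1 and ch not in nxt[p]:
            nxt[p][ch] = cur
            p = link[p]
        if p == -1:
            link[cur] = 0
        else:
            q = nxt[p][ch]
            if length[p] + 1 == length[q]:
                link[cur] = q
            else:
                # Split q: the clone takes over the shorter strings of q.
                clone = len(length)
                length.append(length[p] + 1)
                link.append(link[q])
                nxt.append(dict(nxt[q]))
                while p != -1 and nxt[p].get(ch) == q:
                    nxt[p][ch] = clone
                    p = link[p]
                link[q] = clone
                link[cur] = clone
        prefix_state.append(cur)
        last = cur
    return length, link, nxt, prefix_state


def suffix_array(s):
    """Suffix array of s via the parent tree of the SAM of reversed s."""
    n = len(s)
    r = s[::-1]
    length, link, _, prefix_state = build_sam(r)
    # Any end position (in r) of each state, pushed up from the prefix states;
    # children are longer than parents, so they are filled in first.
    pos = [-1] * len(length)
    for i, v in enumerate(prefix_state):
        pos[v] = i
    for v in sorted(range(1, len(length)), key=lambda v: -length[v]):
        if pos[link[v]] == -1:
            pos[link[v]] = pos[v]
    # Key each suffix-tree edge parent -> v by its first character.
    children = [[] for _ in length]
    for v in range(1, len(length)):
        children[link[v]].append((r[pos[v] - length[link[v]]], v))
    # The state created for the prefix r[:i + 1] stands for the suffix s[n - 1 - i:].
    suffix_of = {v: n - 1 - i for i, v in enumerate(prefix_state)}
    sa = []
    stack = [0]
    while stack:
        v = stack.pop()
        if v in suffix_of:
            # A node's own suffix is a prefix of everything below it, so it comes first.
            sa.append(suffix_of[v])
        # Push in reverse order so the smallest first character is popped next.
        for _, c in sorted(children[v], reverse=True):
            stack.append(c)
    return sa


print(suffix_array("banana"))  # prints [5, 3, 1, 0, 4, 2]
```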

In fact, the SAM has many more uses. People say it can completely replace SA and is perhaps even more versatile, but I haven't been learning the SAM for long and my understanding of it is still shallow, so this is about all I have for now QAQ please forgive me QAQ
More applications, and concrete implementations of the applications above, will appear in code in future posts.

Copyright notice: This is an original article by the blogger and may not be reproduced without the blogger's permission.
