Suffix Automaton (SAM) Learning Guide

Source: Internet
Author: User
Tags first string

* Before learning the suffix automaton, you must already be well acquainted with WA, RE, and TLE *


What is a suffix automaton?

A suffix automaton (SAM) is an automaton that can be built in O(N) time and accepts all suffixes of a string.

In the competitive programming community it was first introduced in Chen Lijie's NOI winter camp lecture in 2012.

SAM became popular after a multi-school joint training contest in 2013, where HDU 4622, a problem set by Chen Lijie, could be solved easily with SAM.

In general, every problem that can be solved with a suffix automaton can also be solved with a suffix array. However, the suffix automaton has its own advantages.

SPOJ 1812 Longest Common Substring II. Problem: given n (n <= 10) strings, each of length at most 100000, find their longest common substring. Time limit: 2 s on SPOJ.
Chen Lijie's lecture uses SPOJ 1812 as an example. Because the SPOJ judge is slow, only an O(n) algorithm can pass this problem, and that is where SAM is needed.


Construction of the suffix automaton

Refer to the various templates available on the Internet.
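As a concrete reference point, here is a minimal Python sketch of the usual online construction. The function names build_sam and contains are illustrative choices and do not come from any particular template; go, par and val follow the naming used in this guide, and state 0 is assumed to be the root.

```python
def build_sam(s):
    """Build the suffix automaton of s; state 0 is the root."""
    go, par, val = [{}], [-1], [0]   # transitions, parent pointers, max lengths
    last = 0                         # state that represents the whole prefix so far
    for c in s:
        cur = len(go)
        go.append({})
        par.append(-1)
        val.append(val[last] + 1)
        p = last
        # Give the new transition to every suffix state that lacks it.
        while p != -1 and c not in go[p]:
            go[p][c] = cur
            p = par[p]
        if p == -1:
            par[cur] = 0
        else:
            q = go[p][c]
            if val[q] == val[p] + 1:
                par[cur] = q
            else:
                # Split q: the clone keeps the strings of length <= val[p] + 1.
                clone = len(go)
                go.append(dict(go[q]))
                par.append(par[q])
                val.append(val[p] + 1)
                while p != -1 and go[p].get(c) == q:
                    go[p][c] = clone
                    p = par[p]
                par[q] = par[cur] = clone
        last = cur
    return go, par, val


def contains(sam, t):
    """Return True if t is a substring of the string the SAM was built from."""
    go, _, _ = sam
    p = 0
    for c in t:
        if c not in go[p]:
            return False
        p = go[p][c]
    return True
```

For example, contains(build_sam("abcbc"), "bcb") follows three go edges from the root and succeeds, while "cc" fails on its second character.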



Properties of the suffix automaton

A bare suffix automaton is only an automaton that accepts substrings. The key to solving problems is to maintain additional properties on its state nodes.

A constructed SAM actually contains two graphs: a DAG formed by the go arrays and a parent tree formed by the par pointers.


A SAM state node carries a lot of important information:

Max: the val variable in the code, the length of the longest string accepted by this state.

Min: the length of the shortest string accepted by this state. It equals val + 1 of the node that this state's par pointer points to.

Max - Min + 1: the number of distinct strings accepted by this state.

Right: the size of the end-position (right) set, that is, the number of times the strings of this state occur in the original string. All strings represented by a state share the same right set.

Par: the par pointer points to the state that represents the longest suffix shared by all strings of the current state that is not itself in the current state. The par pointers of all states form the parent tree, which is exactly the suffix tree of the reversed string.

Topological order of the parent tree: in the sequence, the children of state i always come after it, and its parent always comes before it.
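The properties above can be computed in a few lines. The sketch below is one possible way to do it, assuming that every non-clone state starts with right = 1 and that sorting states by decreasing val is used as the reverse topological order of the parent tree; the function name sam_state_info is made up for this example.

```python
def sam_state_info(s):
    """Return (Min, Max, Right) for every non-root state of the SAM of s."""
    go, par, val, right = [{}], [-1], [0], [0]
    last = 0
    for c in s:
        cur = len(go)
        go.append({})
        par.append(-1)
        val.append(val[last] + 1)
        right.append(1)              # a real prefix ends here
        p = last
        while p != -1 and c not in go[p]:
            go[p][c] = cur
            p = par[p]
        if p == -1:
            par[cur] = 0
        else:
            q = go[p][c]
            if val[q] == val[p] + 1:
                par[cur] = q
            else:
                clone = len(go)
                go.append(dict(go[q]))
                par.append(par[q])
                val.append(val[p] + 1)
                right.append(0)      # clones own no end position of their own
                while p != -1 and go[p].get(c) == q:
                    go[p][c] = clone
                    p = par[p]
                par[q] = par[cur] = clone
        last = cur
    # Children in the parent tree have larger val, so process them first.
    order = sorted(range(1, len(go)), key=lambda v: val[v], reverse=True)
    for v in order:
        right[par[v]] += right[v]
    return [(val[par[v]] + 1, val[v], right[v]) for v in range(1, len(go))]
```

Two sanity checks follow from the definitions: summing Max - Min + 1 over all states gives the number of distinct substrings, and summing (Max - Min + 1) * Right gives the total number of substring occurrences, n * (n + 1) / 2.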


Classic problems for the suffix automaton


UVa 719 Glass Beads: minimal cyclic representation

Traversal of the suffix automaton.

Given a string S, one operation moves its first character to the end. Find the lexicographically smallest string that can be obtained this way.

Concatenate S with itself to get SS and build the automaton. Start from the root and walk length(S) steps, always following the edge with the smallest character; the path spells the lexicographically smallest rotation.

The SAM accepts every substring of SS, and every rotation of S is a substring of SS of length length(S), so walking by this rule finds the lexicographically smallest one.
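The walk can be sketched as follows. This is an illustrative version that returns the rotation itself rather than its starting position (which is what the judge asks for); the helper names are assumptions of this example.

```python
def build_sam(s):
    """Standard online SAM construction; returns (go, par, val), root is state 0."""
    go, par, val = [{}], [-1], [0]
    last = 0
    for c in s:
        cur = len(go)
        go.append({})
        par.append(-1)
        val.append(val[last] + 1)
        p = last
        while p != -1 and c not in go[p]:
            go[p][c] = cur
            p = par[p]
        if p == -1:
            par[cur] = 0
        else:
            q = go[p][c]
            if val[q] == val[p] + 1:
                par[cur] = q
            else:
                clone = len(go)
                go.append(dict(go[q]))
                par.append(par[q])
                val.append(val[p] + 1)
                while p != -1 and go[p].get(c) == q:
                    go[p][c] = clone
                    p = par[p]
                par[q] = par[cur] = clone
        last = cur
    return go, par, val


def smallest_rotation(s):
    """Lexicographically smallest rotation of s, found by a greedy walk on SAM(s + s)."""
    go, _, _ = build_sam(s + s)
    p, out = 0, []
    for _ in range(len(s)):
        c = min(go[p])        # smallest outgoing character
        out.append(c)
        p = go[p][c]
    return "".join(out)
```

The greedy walk never gets stuck: any substring of SS shorter than S also occurs somewhere that is not the very end of SS, so it always has at least one outgoing edge.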


SPOJ 1811 Longest Common Substring

Given two strings A and B, each of length less than 100000, find their longest common substring.

Build the SAM of string A, then run B through the automaton according to the following rules.

Use the variable LCS to record the length of the current match. Its initial value is 0.

Let the current state be p and the character to match be c. If p has a go[c] edge, the transition is possible: follow it and do LCS++.

If there is no such edge, move p to its par. If that state has no go[c] edge either, repeat. If p reaches the root and still cannot move, set LCS to 0.

If p moved to an ancestor during this process and that ancestor has a go[c] edge, set LCS to the val of that ancestor plus 1 and follow the edge. The answer is the largest LCS seen during the run.

Why move to par after a mismatch? A mismatch in state p means that none of the strings of length [Min, Max] in this state, extended by c, is a substring of A. A shorter suffix of them extended by c may still be one, and the par pointer points to the state that holds those shorter suffixes.
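The matching rules translate almost line by line into code. The following is a minimal sketch; lcs_length is a name chosen for this example, and length plays the role of the LCS variable in the text.

```python
def build_sam(s):
    """Standard online SAM construction; returns (go, par, val), root is state 0."""
    go, par, val = [{}], [-1], [0]
    last = 0
    for c in s:
        cur = len(go)
        go.append({})
        par.append(-1)
        val.append(val[last] + 1)
        p = last
        while p != -1 and c not in go[p]:
            go[p][c] = cur
            p = par[p]
        if p == -1:
            par[cur] = 0
        else:
            q = go[p][c]
            if val[q] == val[p] + 1:
                par[cur] = q
            else:
                clone = len(go)
                go.append(dict(go[q]))
                par.append(par[q])
                val.append(val[p] + 1)
                while p != -1 and go[p].get(c) == q:
                    go[p][c] = clone
                    p = par[p]
                par[q] = par[cur] = clone
        last = cur
    return go, par, val


def lcs_length(a, b):
    """Length of the longest common substring of a and b."""
    go, par, val = build_sam(a)
    p, length, best = 0, 0, 0
    for c in b:
        # On a mismatch fall back to shorter suffixes via par.
        while p != 0 and c not in go[p]:
            p = par[p]
            length = val[p]
        if c in go[p]:
            p = go[p][c]
            length += 1
        else:
            length = 0            # even the root has no edge for c
        best = max(best, length)
    return best
```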


SPOJ 1812 Longest Common Substring II: longest common substring of multiple strings

In the previous problem we learned how to find the longest common substring of two strings. This problem asks for the longest common substring of multiple strings.

This problem uses the topological order of the parent tree.

First build the SAM of the first string, then match each of the other strings against it.

Each SAM state needs two more fields: LCS, the length of the longest common substring of all strings processed so far that ends in this state; and NLCS, the length of the longest match of the current string that ends in this state.

After matching each string, update LCS in every state with LCS = min(LCS, NLCS). The final answer is the maximum LCS over all states.

The matching process is the same as in the previous problem. However, the NLCS recorded when state p is reached is not enough for the ancestors of p: if a substring occurs in the current string, all of its suffixes occur in it as well, so every ancestor of p has been matched too.

Therefore, after each string is matched, update NLCS of every state in reverse topological order so that p->par->NLCS = max(p->NLCS, p->par->NLCS). A state can never match more than its own val, so the value is effectively capped at p->par->val.
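A compact sketch of the whole procedure is given below. It assumes that LCS is initialised to val in every state (so that the min with NLCS also applies the cap) and uses decreasing val as the reverse topological order; the name lcs_many is invented for this example.

```python
def build_sam(s):
    """Standard online SAM construction; returns (go, par, val), root is state 0."""
    go, par, val = [{}], [-1], [0]
    last = 0
    for c in s:
        cur = len(go)
        go.append({})
        par.append(-1)
        val.append(val[last] + 1)
        p = last
        while p != -1 and c not in go[p]:
            go[p][c] = cur
            p = par[p]
        if p == -1:
            par[cur] = 0
        else:
            q = go[p][c]
            if val[q] == val[p] + 1:
                par[cur] = q
            else:
                clone = len(go)
                go.append(dict(go[q]))
                par.append(par[q])
                val.append(val[p] + 1)
                while p != -1 and go[p].get(c) == q:
                    go[p][c] = clone
                    p = par[p]
                par[q] = par[cur] = clone
        last = cur
    return go, par, val


def lcs_many(strings):
    """Length of the longest substring common to all strings (at least one string)."""
    go, par, val = build_sam(strings[0])
    order = sorted(range(1, len(go)), key=lambda v: val[v], reverse=True)
    lcs = val[:]                       # upper bound: the longest string of each state
    for t in strings[1:]:
        nlcs = [0] * len(go)
        p = length = 0
        for c in t:
            while p != 0 and c not in go[p]:
                p = par[p]
                length = val[p]
            if c in go[p]:
                p = go[p][c]
                length += 1
            else:
                length = 0
            nlcs[p] = max(nlcs[p], length)
        for v in order:                # children before parents
            u = par[v]
            nlcs[u] = max(nlcs[u], min(nlcs[v], val[u]))
            lcs[v] = min(lcs[v], nlcs[v])
    return max(lcs)
```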


HDU 4622 Reincarnation

This is also the problem through which many newcomers first came into contact with SAM. It can be solved in several ways, but SAM is the easiest.

Given a string of length at most 2000 and Q queries, each query asks how many distinct substrings the interval [L, R] contains.

Each state of the SAM represents val - (val of its parent) distinct substrings. So while building the automaton, keep a variable total that records the number of distinct substrings the current automaton represents, and update it on every extend. Recording total after each extend gives a table of substring counts for every prefix. Rebuilding the SAM for each suffix of the string gives a two-dimensional table.

Each query is then answered by looking up the corresponding entry in the table.
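The table can be sketched as below. The function names and the 1-indexed, inclusive query convention are assumptions of this example; the point is only that total grows by val[cur] - val[par[cur]] on each extend, while a clone leaves it unchanged.

```python
def distinct_counts_by_prefix(s):
    """Entry k is the number of distinct substrings of s[:k + 1]."""
    go, par, val = [{}], [-1], [0]
    last, total, totals = 0, 0, []
    for c in s:
        cur = len(go)
        go.append({})
        par.append(-1)
        val.append(val[last] + 1)
        p = last
        while p != -1 and c not in go[p]:
            go[p][c] = cur
            p = par[p]
        if p == -1:
            par[cur] = 0
        else:
            q = go[p][c]
            if val[q] == val[p] + 1:
                par[cur] = q
            else:
                clone = len(go)
                go.append(dict(go[q]))
                par.append(par[q])
                val.append(val[p] + 1)
                while p != -1 and go[p].get(c) == q:
                    go[p][c] = clone
                    p = par[p]
                par[q] = par[cur] = clone
        last = cur
        # Only the new state adds substrings; a clone just splits an old state.
        total += val[cur] - val[par[cur]]
        totals.append(total)
    return totals


def build_table(s):
    """table[i][k] = number of distinct substrings of s[i : i + k + 1]."""
    return [distinct_counts_by_prefix(s[i:]) for i in range(len(s))]


def query(table, l, r):
    """Distinct substrings of s[l..r], with 1-indexed inclusive bounds."""
    return table[l - 1][r - l]
```

Building the table costs O(n^2) time and memory, which is acceptable for n up to 2000, and every query is a single lookup.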


HDU 4436 str2int: processing distinct substrings

N numbers are given. They are long and should be read as strings; the total length is 10^5. Compute the sum of all distinct numbers that appear as substrings of the N strings, modulo 2012.

The problem has to handle every distinct substring exactly once, which is what SAM is good at.

Concatenate the N numbers into one string, separated by a symbol that does not appear in the input (for example the value 10, one past the digit 9).

After the construction, compute sum and cnt for every state in topological order. sum is the sum of the numbers whose paths end in the state, and cnt is the number of paths that reach the state.

For a transition from state u to state v on digit k, add u->sum * 10 + u->cnt * k to v->sum and add u->cnt to v->cnt. That is, the sum of the numbers u represents multiplied by 10, plus the number of paths reaching u multiplied by the current digit k. Transitions on the separator are never followed.

The final answer is the total of sum over all states.
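Here is a hedged sketch of this dynamic programming. It makes two assumptions that the text above does not spell out: the separator is the character "#", and a path is not allowed to start with the digit 0, so that "01" and "1" are not counted as two different numbers. The name sum_distinct_numbers is illustrative.

```python
MOD = 2012


def build_sam(s):
    """Standard online SAM construction; returns (go, par, val), root is state 0."""
    go, par, val = [{}], [-1], [0]
    last = 0
    for c in s:
        cur = len(go)
        go.append({})
        par.append(-1)
        val.append(val[last] + 1)
        p = last
        while p != -1 and c not in go[p]:
            go[p][c] = cur
            p = par[p]
        if p == -1:
            par[cur] = 0
        else:
            q = go[p][c]
            if val[q] == val[p] + 1:
                par[cur] = q
            else:
                clone = len(go)
                go.append(dict(go[q]))
                par.append(par[q])
                val.append(val[p] + 1)
                while p != -1 and go[p].get(c) == q:
                    go[p][c] = clone
                    p = par[p]
                par[q] = par[cur] = clone
        last = cur
    return go, par, val


def sum_distinct_numbers(numbers):
    """Sum (mod 2012) of the distinct integers that occur as substrings."""
    go, par, val = build_sam("#".join(numbers))   # "#" is an assumed separator
    cnt = [0] * len(go)
    total = [0] * len(go)
    cnt[0] = 1
    # Every go edge leads to a state with larger val, so increasing val
    # is a valid topological order of the DAG.
    for u in sorted(range(len(go)), key=lambda v: val[v]):
        for ch, v in go[u].items():
            if ch == "#" or (u == 0 and ch == "0"):
                continue          # never cross a separator, never start with 0
            cnt[v] = (cnt[v] + cnt[u]) % MOD
            total[v] = (total[v] + total[u] * 10 + cnt[u] * int(ch)) % MOD
    return sum(total) % MOD
```

For the two numbers 101 and 123 the distinct values are 1, 10, 101, 0, 12, 123, 2, 23 and 3, which sum to 275.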


SPOJ 8222 Substrings: number of occurrences of substrings

Given a string S, let F(x) be the maximum number of occurrences among all substrings of length x. Compute F(1) .. F(length(S)).

Compute right of every state in reverse topological order; it is the number of occurrences of the strings of that state.

Then update F[val] with the right of each state, that is, with the number of occurrences of the longest string the state represents.

Finally, use F[i] to update F[i-1] by taking the maximum, because if a string of length i occurs F[i] times, then some string of length i-1 occurs at least F[i] times.
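The three steps fit in a short sketch. The helper build_sam_with_right and the function max_occurrences_by_length are names invented for this example; right is propagated up the parent tree inside the helper.

```python
def build_sam_with_right(s):
    """SAM construction that also returns right (occurrence counts) per state."""
    go, par, val, right = [{}], [-1], [0], [0]
    last = 0
    for c in s:
        cur = len(go)
        go.append({})
        par.append(-1)
        val.append(val[last] + 1)
        right.append(1)
        p = last
        while p != -1 and c not in go[p]:
            go[p][c] = cur
            p = par[p]
        if p == -1:
            par[cur] = 0
        else:
            q = go[p][c]
            if val[q] == val[p] + 1:
                par[cur] = q
            else:
                clone = len(go)
                go.append(dict(go[q]))
                par.append(par[q])
                val.append(val[p] + 1)
                right.append(0)
                while p != -1 and go[p].get(c) == q:
                    go[p][c] = clone
                    p = par[p]
                par[q] = par[cur] = clone
        last = cur
    # Reverse topological order of the parent tree: decreasing val.
    for v in sorted(range(1, len(go)), key=lambda v: val[v], reverse=True):
        right[par[v]] += right[v]
    return go, par, val, right


def max_occurrences_by_length(s):
    """Return [F(1), ..., F(len(s))]."""
    go, par, val, right = build_sam_with_right(s)
    n = len(s)
    f = [0] * (n + 2)
    for v in range(1, len(go)):
        f[val[v]] = max(f[val[v]], right[v])
    for i in range(n - 1, 0, -1):      # a shorter suffix occurs at least as often
        f[i] = max(f[i], f[i + 1])
    return f[1:n + 1]
```

For "ababa" this returns [3, 2, 2, 1, 1]: "a" occurs three times, "ab" and "aba" twice, and the two longest substrings once.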


POJ 3415 Common Substrings: substring counting

Given two strings, count the common substrings of length at least K over all substrings of the two strings. Substrings at different positions are counted separately, even if they are equal.

First build the SAM of the first string. With right and val of a state it is easy to count all the substrings it represents, with multiplicity. The question is how to count only those that meet the condition.

Run the second string through the SAM as in the LCS problem. Whenever the current match length LCS >= K, do cnt++ on the current state; cnt is then the number of times a longest match of length at least K ends in this state.

At the same moment, count the qualifying substrings in the state where the longest match ends (mi is the Min of the state): ans += (LCS - max(K, p->mi) + 1) * p->right

After the matching is complete, propagate cnt from each state to its parent in reverse topological order. cnt then means the number of times the state is covered by some match.

Then count the substrings that are not the longest match: ans += p->cnt * (p->par->val - max(K, p->par->mi) + 1) * p->par->right. That is, for every parent with val >= K, multiply the number of times it is covered by the number of its strings that meet the condition and add the product to the answer.
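The counting can be sketched as follows, under the assumptions that K >= 1 and that right has already been propagated up the parent tree. The names count_common and build_sam_with_right are illustrative, and val[par[p]] + 1 is used in place of the mi field.

```python
def build_sam_with_right(s):
    """SAM construction; also returns right per state and the reverse topological order."""
    go, par, val, right = [{}], [-1], [0], [0]
    last = 0
    for c in s:
        cur = len(go)
        go.append({})
        par.append(-1)
        val.append(val[last] + 1)
        right.append(1)
        p = last
        while p != -1 and c not in go[p]:
            go[p][c] = cur
            p = par[p]
        if p == -1:
            par[cur] = 0
        else:
            q = go[p][c]
            if val[q] == val[p] + 1:
                par[cur] = q
            else:
                clone = len(go)
                go.append(dict(go[q]))
                par.append(par[q])
                val.append(val[p] + 1)
                right.append(0)
                while p != -1 and go[p].get(c) == q:
                    go[p][c] = clone
                    p = par[p]
                par[q] = par[cur] = clone
        last = cur
    order = sorted(range(1, len(go)), key=lambda v: val[v], reverse=True)
    for v in order:
        right[par[v]] += right[v]
    return go, par, val, right, order


def count_common(a, b, k):
    """Number of (position in a, position in b, length >= k) common substrings."""
    go, par, val, right, order = build_sam_with_right(a)
    cnt = [0] * len(go)
    ans = 0
    p = length = 0
    for c in b:
        while p != 0 and c not in go[p]:
            p = par[p]
            length = val[p]
        if c in go[p]:
            p = go[p][c]
            length += 1
        else:
            length = 0
        if length >= k:
            # Strings of the current state, from length max(k, Min) up to the match.
            ans += (length - max(k, val[par[p]] + 1) + 1) * right[p]
            cnt[p] += 1
    for v in order:                    # children before parents
        u = par[v]
        if u == 0:
            continue                   # the root holds only the empty string
        if val[u] >= k:
            ans += cnt[v] * (val[u] - max(k, val[par[u]] + 1) + 1) * right[u]
        cnt[u] += cnt[v]
    return ans
```

With a = b = "xx" and K = 1 the result is 5: four pairs of single characters plus one pair of the full string.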


SPOJ 7258 Lexicographical Substring Search: k-th substring in lexicographic order

Given a string of 90000 characters and Q queries, each query gives a K and asks for the K-th smallest distinct substring in lexicographic order.

The number of distinct substrings reachable from each state is again computed in topological order.

To find the K-th smallest substring, enumerate the outgoing edges of the current state in lexicographic order. If K is larger than the number of distinct substrings that go through an edge, skip the edge and subtract that number from K. Otherwise follow the edge, append its character, and decrease K by 1. Repeat this step until K becomes 0.

The path walked so far is then the K-th substring in lexicographic order.
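A minimal sketch of the counting and the descent is shown below. Here paths[v] is assumed to count the strings readable from state v including the empty one, so paths[q] is exactly the number of substrings that start with the path to q; kth_substring is an illustrative name and returns None when k is out of range.

```python
def build_sam(s):
    """Standard online SAM construction; returns (go, par, val), root is state 0."""
    go, par, val = [{}], [-1], [0]
    last = 0
    for c in s:
        cur = len(go)
        go.append({})
        par.append(-1)
        val.append(val[last] + 1)
        p = last
        while p != -1 and c not in go[p]:
            go[p][c] = cur
            p = par[p]
        if p == -1:
            par[cur] = 0
        else:
            q = go[p][c]
            if val[q] == val[p] + 1:
                par[cur] = q
            else:
                clone = len(go)
                go.append(dict(go[q]))
                par.append(par[q])
                val.append(val[p] + 1)
                while p != -1 and go[p].get(c) == q:
                    go[p][c] = clone
                    p = par[p]
                par[q] = par[cur] = clone
        last = cur
    return go, par, val


def kth_substring(s, k):
    """k-th (1-indexed) distinct substring of s in lexicographic order, or None."""
    go, par, val = build_sam(s)
    paths = [1] * len(go)              # 1 accounts for stopping at the state itself
    # go edges lead to larger val, so decreasing val handles targets first.
    for v in sorted(range(len(go)), key=lambda v: val[v], reverse=True):
        for q in go[v].values():
            paths[v] += paths[q]
    if k < 1 or k > paths[0] - 1:
        return None
    p, out = 0, []
    while k > 0:
        for c in sorted(go[p]):
            q = go[p][c]
            if k > paths[q]:
                k -= paths[q]          # skip every substring that goes through q
            else:
                out.append(c)          # the answer goes through q
                p = q
                k -= 1                 # the path to q itself is the first of them
                break
    return "".join(out)
```

For "aaa" the distinct substrings in order are "a", "aa" and "aaa", so k = 2 gives "aa" and k = 3 gives "aaa".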


Codeforces 235C Cyclical Quest: string occurrences

* The winner of this competition is wjmzbmr Chen Lijie *

Given a string s, called the parent string, and then n query strings, n <= 10^5, with total length at most 10^6. For each query string, find the total number of occurrences in the parent string of all its distinct cyclic shifts.

Build the SAM of the parent string. For each query string, concatenate it with itself, remove the last letter, and run the result through the SAM.

This problem can be solved in several ways. A good one is to mark the states in which a sufficiently long match ends. After matching, push the marks up the parent tree in reverse topological order until they reach the state whose length range contains the length of the original query string.

Then add the number of occurrences of that state to the answer. If a state that should be added has already been marked as counted for this query, it is not added again.
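The sketch below uses a common variant of this idea rather than the batch marking described above: it climbs the par pointers immediately after each character and deduplicates states with a per-query set. All names are illustrative, and queries are assumed to be non-empty.

```python
def build_sam_with_right(s):
    """SAM construction that also returns right (occurrence counts) per state."""
    go, par, val, right = [{}], [-1], [0], [0]
    last = 0
    for c in s:
        cur = len(go)
        go.append({})
        par.append(-1)
        val.append(val[last] + 1)
        right.append(1)
        p = last
        while p != -1 and c not in go[p]:
            go[p][c] = cur
            p = par[p]
        if p == -1:
            par[cur] = 0
        else:
            q = go[p][c]
            if val[q] == val[p] + 1:
                par[cur] = q
            else:
                clone = len(go)
                go.append(dict(go[q]))
                par.append(par[q])
                val.append(val[p] + 1)
                right.append(0)
                while p != -1 and go[p].get(c) == q:
                    go[p][c] = clone
                    p = par[p]
                par[q] = par[cur] = clone
        last = cur
    for v in sorted(range(1, len(go)), key=lambda v: val[v], reverse=True):
        right[par[v]] += right[v]
    return go, par, val, right


def cyclic_occurrences(s, queries):
    """For each query, total occurrences in s of its distinct cyclic shifts."""
    go, par, val, right = build_sam_with_right(s)
    answers = []
    for x in queries:
        m = len(x)
        p = length = 0
        seen = set()                   # states already counted for this query
        total = 0
        for c in x + x[:-1]:
            while p != 0 and c not in go[p]:
                p = par[p]
                length = val[p]
            if c in go[p]:
                p = go[p][c]
                length += 1
            else:
                length = 0
            if length >= m:
                # Climb to the state whose length range contains m.
                while val[par[p]] >= m:
                    p = par[p]
                    length = val[p]
                if p not in seen:
                    seen.add(p)
                    total += right[p]
        answers.append(total)
    return answers
```

Two different strings of the same length never share a state, so one set entry per state is enough to avoid counting the same cyclic shift twice.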


Codeforces 427D Match & Catch: common substring occurrences

Given two strings S1 and S2 of length at most 5000, find the length of the shortest common substring that occurs exactly once in each of them.

Build the SAM of the first string and run the second string through it. The states with right equal to 1 hold exactly the substrings that occur once in the first string.

Each time the matching process enters a node, increment cnt on that node. cnt counts how many times a longest match ends in this state while scanning the second string.

Finally, accumulate cnt over all states in reverse topological order: if a node is hit cnt times, the strings of its parent are suffixes of it, so the parent must receive those cnt as well.

Finally, traverse all states. A state with right equal to 1 and cnt equal to 1 holds a common substring that occurs exactly once in both strings. Take the shortest of them as the answer. Note that cnt computed this way describes the shortest string of a state; a longer string in the same state may occur fewer times in S2, so a complete solution also has to take the matched length into account.
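The sketch below follows this outline but adds one refinement that is not part of the description above: for each state it remembers the two largest match lengths recorded directly in that state, so that longer strings inside a state are also considered. All names (shortest_unique_common, top1, top2, below) are assumptions of this example.

```python
def build_sam_with_right(s):
    """SAM construction; also returns right per state and the reverse topological order."""
    go, par, val, right = [{}], [-1], [0], [0]
    last = 0
    for c in s:
        cur = len(go)
        go.append({})
        par.append(-1)
        val.append(val[last] + 1)
        right.append(1)
        p = last
        while p != -1 and c not in go[p]:
            go[p][c] = cur
            p = par[p]
        if p == -1:
            par[cur] = 0
        else:
            q = go[p][c]
            if val[q] == val[p] + 1:
                par[cur] = q
            else:
                clone = len(go)
                go.append(dict(go[q]))
                par.append(par[q])
                val.append(val[p] + 1)
                right.append(0)
                while p != -1 and go[p].get(c) == q:
                    go[p][c] = clone
                    p = par[p]
                par[q] = par[cur] = clone
        last = cur
    order = sorted(range(1, len(go)), key=lambda v: val[v], reverse=True)
    for v in order:
        right[par[v]] += right[v]
    return go, par, val, right, order


def shortest_unique_common(s1, s2):
    """Length of the shortest substring occurring exactly once in s1 and in s2, or -1."""
    go, par, val, right, order = build_sam_with_right(s1)
    n = len(go)
    direct = [0] * n                   # matches that end exactly in this state
    top1, top2 = [0] * n, [0] * n      # two largest match lengths seen in the state
    p = length = 0
    for c in s2:
        while p != 0 and c not in go[p]:
            p = par[p]
            length = val[p]
        if c in go[p]:
            p = go[p][c]
            length += 1
        else:
            length = 0
        if p != 0:
            direct[p] += 1
            if length > top1[p]:
                top1[p], top2[p] = length, top1[p]
            elif length > top2[p]:
                top2[p] = length
    below = [0] * n                    # matches that end in strict descendants
    for v in order:
        below[par[v]] += below[v] + direct[v]
    best = -1
    for v in range(1, n):
        if right[v] != 1 or below[v] > 1:
            continue
        lo = val[par[v]] + 1
        if below[v] == 1:              # no direct match may cover the chosen length
            cand, hi = max(lo, top1[v] + 1), val[v]
        else:                          # exactly one direct match must cover it
            cand, hi = max(lo, top2[v] + 1), top1[v]
        if cand <= hi and (best == -1 or cand < best):
            best = cand
    return best
```

For S1 = "xab" and S2 = "xabyabxa" the only string that is unique in both is "xab", so the result is 3 even though "b" and "ab" share its state and occur twice in S2.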




