This post draws on Rujia Liu's "Introduction to Algorithm Competitions: Training Guide"; credit is hereby given.
1. Preface
I used the mornings of the past few days to read through suffix arrays. The concept itself is not hard to understand, but the material that builds on it is extensive and complicated, and I have not yet gotten to its two siblings, the suffix tree and the suffix automaton.
2. Concept
I covered the Aho-Corasick automaton earlier (http://www.cnblogs.com/jinkun113/p/4682853.html), if only briefly. It solves multi-pattern matching problems, but it requires knowing all the patterns in advance. In real applications we often cannot know the queries beforehand: a search engine, for example, cannot preprocess your query before you type it. In that situation we need to preprocess the text string instead of the queries. A suffix array, put simply, is an array that records all suffixes of a string in sorted order; various operations are then built on top of it.
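To make the "preprocess the text, answer queries later" idea concrete, here is a minimal sketch. It assumes the suffix array is built by plainly sorting the suffixes (not an efficient construction), and the helper names `build_suffix_array` and `find_occurrences` are made up for this illustration. Each query is answered by binary search over the sorted suffixes.

```python
def build_suffix_array(text):
    # Hypothetical helper: sort suffix start positions by the suffix itself.
    # This is the simplest possible construction, not an efficient one.
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text, sa, pattern):
    """Return the sorted start positions of pattern in text."""
    m = len(pattern)
    # Lower bound: first suffix whose first m characters are >= pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo
    # Upper bound: first suffix whose first m characters are > pattern.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return sorted(sa[start:lo])


text = "BANANA"
sa = build_suffix_array(text)               # done once per text
print(find_occurrences(text, sa, "ANA"))    # [1, 3]
print(find_occurrences(text, sa, "NAN"))    # [2]
print(find_occurrences(text, sa, "XYZ"))    # []
```

The text is processed once, and every later query costs O(m log n) character comparisons for a pattern of length m, without the pattern having to be known in advance.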
3. Build
First, take the string BANANA. Append a non-alphabetic character "$", a sentinel that does not occur anywhere else in the string, and then insert all of its suffixes (BANANA$, ANANA$, NANA$, ANA$, NA$, A$) into a trie. Because of the sentinel, the suffixes of the string and the leaf nodes correspond one to one:
[Figure unavailable: suffix trie of BANANA$]
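The construction can be sketched in a few lines. This is only an illustration, assuming a trie made of nested dictionaries; the function names and the `terminator` parameter are made up here. It also shows why the sentinel matters: without it, a suffix such as A is a prefix of ANA and ends inside the trie rather than at a leaf.

```python
def build_suffix_trie(text, terminator="$"):
    """Insert every suffix of text (each followed by terminator) into a trie."""
    root = {}
    for start in range(len(text)):
        node = root
        for ch in text[start:] + terminator:
            # Walk down, creating the child node if it does not exist yet.
            node = node.setdefault(ch, {})
    return root


def count_leaves(node):
    # A node without children is a leaf.
    if not node:
        return 1
    return sum(count_leaves(child) for child in node.values())


with_sentinel = build_suffix_trie("BANANA")
without_sentinel = build_suffix_trie("BANANA", terminator="")
print(count_leaves(with_sentinel))     # 6: one leaf per suffix
print(count_leaves(without_sentinel))  # 3: A, ANA and NA end at inner nodes
```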
In practice, the non-branching chains of the suffix trie are merged, which gives the so-called suffix tree. Suffix tree construction algorithms are hard to understand and easy to get wrong, however, so they are rarely used in contests, and I will not study them for now. The suffix array, by contrast, is a must-know: it is time-efficient, the code is short, and it is hard to get wrong.
When drawing the suffix trie, we place the lexicographically smaller letters on the left. Since leaf nodes and suffixes correspond one to one, we now label each leaf with the position in the original string of the first character of its suffix:
[Figure unavailable: suffix trie with each leaf labeled by the start position of its suffix]
Reading all the leaf labels from left to right gives the so-called suffix array. For BANANA it is sa[]={5,3,1,0,4,2}. From the suffix trie it is easy to see that this is simply the suffixes sorted in lexicographic order. We could therefore obtain it directly with a quicksort, which needs O(n log n) comparisons. However, comparing two arbitrary suffixes takes O(n) time, so the total cost is O(n^2 log n), which is too slow.
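The direct approach looks roughly like this. It is a sketch for illustration: the function name is made up, and Python's built-in sort stands in for quicksort. The comparison is written out character by character so that its O(n) cost per call is visible.

```python
from functools import cmp_to_key


def naive_suffix_array(s):
    n = len(s)

    def compare(i, j):
        # Compare suffix s[i:] with suffix s[j:] character by character.
        # In the worst case this loop runs O(n) times per comparison.
        while i < n and j < n:
            if s[i] != s[j]:
                return -1 if s[i] < s[j] else 1
            i += 1
            j += 1
        if i == n and j == n:
            return 0
        # The suffix that ran out first is a prefix of the other, so it is smaller.
        return -1 if i == n else 1

    # O(n log n) comparisons, each O(n): O(n^2 log n) in total.
    return sorted(range(n), key=cmp_to_key(compare))


print(naive_suffix_array("BANANA"))  # [5, 3, 1, 0, 4, 2]
```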
4. Doubling
The doubling algorithm invented by Manber and Myers is described below. Its time complexity is O(n log n) (or O(n log^2 n) if radix sort is not used).
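As a rough preview of the idea, here is a minimal sketch of prefix doubling. It is not the book's code: it uses Python's comparison sort in every round instead of radix sort, so it is the O(n log^2 n) variant, and the function name is an assumption of this example. In each round the suffixes are ranked by their first k characters, and the ranks for 2k characters are obtained by pairing the rank at position i with the rank at position i + k.

```python
def suffix_array_doubling(s):
    n = len(s)
    if n == 0:
        return []
    # Round 0: rank each suffix by its first character only.
    rank = [ord(c) for c in s]
    sa = list(range(n))
    k = 1
    while True:
        def key(i):
            # (rank of the first k characters, rank of the next k characters);
            # -1 means the second half lies past the end of the string.
            return (rank[i], rank[i + k] if i + k < n else -1)

        sa.sort(key=key)  # comparison sort: O(n log n) per round

        # Assign new ranks; suffixes with equal keys share a rank.
        new_rank = [0] * n
        for j in range(1, n):
            bump = 1 if key(sa[j]) != key(sa[j - 1]) else 0
            new_rank[sa[j]] = new_rank[sa[j - 1]] + bump
        rank = new_rank

        if rank[sa[-1]] == n - 1:  # all ranks distinct: fully sorted
            return sa
        k *= 2


print(suffix_array_doubling("BANANA"))  # [5, 3, 1, 0, 4, 2]
```

Since k doubles every round, at most about log n rounds are needed; replacing the comparison sort with radix sort on the rank pairs is what brings the total down to O(n log n).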
First, sort all the individual characters (this can also be seen as sorting the suffixes by their first character only, which makes the later steps easier to connect):
[Figure unavailable: result of sorting the individual characters]
For each letter,
[Knowledge point] suffix array