Http://chencb.ycool.com/post.1901840.html
Today I read up on suffix arrays. My one-sentence verdict: very good, very powerful.
I am too lazy to write filler, so I will just record the key points. The suffix array is the array obtained by sorting all suffixes of a string. Let the string be S, and let suffix(i) denote S[i..len(S)]. Two arrays record the sorted order of all suffixes:
- rank[i] records the position of suffix(i) after sorting, that is, suffix(i) is the rank[i]-th smallest of all suffixes.
- SA[i] records the starting position of the i-th smallest suffix, that is, suffix(SA[i]) is the i-th smallest of all suffixes. The two arrays are inverses of each other: rank[SA[i]] = i.
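To make the two definitions concrete, here is a minimal sketch that builds both arrays by sorting the suffixes directly. The function name `naive_suffix_array` and the 0-based indexing are my own assumptions for illustration, and this is the slow way, not the fast construction discussed next.

```python
def naive_suffix_array(s):
    """Build (sa, rank) by sorting suffixes directly; 0-based, for illustration only."""
    n = len(s)
    # sa[i] = starting index of the i-th smallest suffix
    sa = sorted(range(n), key=lambda i: s[i:])
    # rank[i] = position of suffix(i) in sorted order, i.e. the inverse of sa
    rank = [0] * n
    for r, start in enumerate(sa):
        rank[start] = r
    return sa, rank


sa, rank = naive_suffix_array("banana")
print(sa)    # [5, 3, 1, 0, 4, 2]
print(rank)  # [3, 2, 5, 1, 4, 0]
```

For "banana" the sorted suffixes are a, ana, anana, banana, na, nana, which is where the values above come from.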
The next question is how to compute the order of all suffixes quickly. The key is reducing the cost of comparing two suffixes.
The method is prefix doubling (the "doubling algorithm"). Define the k-prefix of a string as its first k characters, and let suffix(k, i) be the k-prefix of suffix(i). SA[k, i] and rank[k, i] are defined in the same way as SA and rank, but for the order of the k-prefixes. Then:
- If rank[k, i] = rank[k, j] and rank[k, i+k] = rank[k, j+k], then suffix(2k, i) = suffix(2k, j).
- If rank[k, i] = rank[k, j] and rank[k, i+k] < rank[k, j+k], then suffix(2k, i) < suffix(2k, j).
- If rank[k, i] < rank[k, j], then suffix(2k, i) < suffix(2k, j).
In this way, once the ranks for one prefix length are known, two prefixes of twice that length can be compared in constant time, so all suffix(2^k, i) can be sorted round by round. Finally, when 2^k > n, the order of the suffix(2^k, i) is the order of the full suffixes.
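The doubling rounds can be sketched as follows. This is only an illustration under my own assumptions: the name `suffix_array_doubling` is hypothetical, indexing is 0-based, and a missing second half (i + k past the end) is treated as rank -1. It uses Python's built-in comparison sort in each round, which gives O(n log^2 n) overall; reaching O(n log n) needs a radix sort on the rank pairs, which is left out here.

```python
def suffix_array_doubling(s):
    """Sort suffixes by prefix doubling (0-based sketch, O(n log^2 n))."""
    n = len(s)
    if n == 0:
        return []
    rank = [ord(c) for c in s]   # ranks of the 1-prefixes
    sa = list(range(n))
    k = 1
    while True:
        # A 2k-prefix is identified by the ranks of its two k-length halves.
        def key(i):
            return (rank[i], rank[i + k] if i + k < n else -1)

        sa.sort(key=key)
        # Re-rank: equal keys share a rank, a larger key gets the next rank.
        new_rank = [0] * n
        for idx in range(1, n):
            prev, cur = sa[idx - 1], sa[idx]
            new_rank[cur] = new_rank[prev] + (key(prev) != key(cur))
        rank = new_rank
        if rank[sa[-1]] == n - 1:  # all ranks distinct: order is final
            return sa
        k *= 2


print(suffix_array_doubling("banana"))  # [5, 3, 1, 0, 4, 2]
```

The loop stops early as soon as every suffix has a distinct rank, which happens at the latest when the compared prefix length exceeds n.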
So now all suffixes are sorted. What is that good for? Mainly for computing the longest common prefix (LCP) between suffixes.
Let LCP(i, j) be the length of the longest common prefix of the i-th smallest and the j-th smallest suffixes, that is, of suffix(SA[i]) and suffix(SA[j]). It has the following two properties:
- For any i <= k <= j, LCP(i, j) = min(LCP(i, k), LCP(k, j)).
- LCP(i, j) = min over i < k <= j of LCP(k-1, k).
The first property is obvious; its significance is that it can be used to prove the second. The second property gives a way to turn an LCP problem into an RMQ (range minimum query) problem:
Let height[i] = LCP(i-1, i), that is, height[i] is the LCP of the i-th smallest suffix and the (i-1)-th smallest suffix. Then LCP(i, j) is the minimum of height[i+1], ..., height[j]. With a standard RMQ algorithm, this costs O(n log n) preprocessing and O(1) per query.
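As a rough sketch of that reduction, the code below answers LCP(i, j) with a sparse table over height. The sparse table is one common RMQ structure with these bounds; the post does not say which RMQ algorithm it has in mind, so treat that choice, the helper names, and the 0-based ranks as my assumptions. height is built naively here just to keep the example self-contained.

```python
def common_prefix_len(a, b):
    """Length of the longest common prefix of two strings."""
    n = 0
    while n < len(a) and n < len(b) and a[n] == b[n]:
        n += 1
    return n


def naive_height(s, sa):
    """height[i] = LCP of the i-th and (i-1)-th smallest suffixes; height[0] = 0."""
    return [0] + [common_prefix_len(s[sa[i - 1]:], s[sa[i]:])
                  for i in range(1, len(sa))]


def build_sparse_table(values):
    """table[j][i] = min(values[i : i + 2**j]); O(n log n) to build."""
    table = [list(values)]
    j = 1
    while (1 << j) <= len(values):
        prev, half = table[-1], 1 << (j - 1)
        table.append([min(prev[i], prev[i + half])
                      for i in range(len(values) - (1 << j) + 1)])
        j += 1
    return table


def range_min(table, lo, hi):
    """Minimum of values[lo..hi] (inclusive, lo <= hi) from two overlapping blocks."""
    j = (hi - lo + 1).bit_length() - 1
    return min(table[j][lo], table[j][hi - (1 << j) + 1])


def lcp_query(table, i, j):
    """LCP of the i-th and j-th smallest suffixes, for 0-based ranks i < j."""
    return range_min(table, i + 1, j)


s = "banana"
sa = sorted(range(len(s)), key=lambda i: s[i:])
height = naive_height(s, sa)
table = build_sparse_table(height)
print(height)                   # [0, 1, 3, 0, 0, 2]
print(lcp_query(table, 0, 2))   # 1  ("a" vs "anana")
```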
Computing height uses one more array. Let h[i] = height[rank[i]], that is, h[i] is the height value of suffix(i). (Conversely, height[i] is the height value of suffix(SA[i]), so height[i] = h[SA[i]].)
The h array has a useful property: h[i] >= h[i-1] - 1.
Because of this, when comparing suffixes to compute h[i], we can skip the first h[i-1] - 1 characters, which are already known to match, and start comparing from there. The total number of character comparisons is therefore O(n), so the h array is computed in O(n) time. Once h is known, height is known, and the whole LCP problem is solved.
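A minimal sketch of this linear-time computation is below. The function name `compute_height` and the 0-based indexing are my assumptions; the running variable `h` plays the role of h[i] from the text, and results are written straight into height via rank.

```python
def compute_height(s, sa):
    """Compute height in O(n) using h[i] >= h[i-1] - 1 (0-based sketch)."""
    n = len(s)
    rank = [0] * n
    for r, start in enumerate(sa):
        rank[start] = r
    height = [0] * n
    h = 0  # number of characters already known to match
    for i in range(n):          # visit suffixes in text order, not sorted order
        if rank[i] == 0:
            h = 0               # smallest suffix has no predecessor
            continue
        j = sa[rank[i] - 1]     # suffix just before suffix(i) in sorted order
        while i + h < n and j + h < n and s[i + h] == s[j + h]:
            h += 1
        height[rank[i]] = h
        if h > 0:
            h -= 1              # suffix(i+1) keeps at least h - 1 matching characters
    return height


print(compute_height("banana", [5, 3, 1, 0, 4, 2]))  # [0, 1, 3, 0, 0, 2]
```

Since `h` drops by at most one per step and never exceeds n, the inner while loop runs O(n) times in total.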
Applications of the suffix array then use LCP to cut the cost of string comparison. And because the suffix array is sorted, binary search can be applied easily.
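As one example of the binary-search side, here is a sketch that finds every occurrence of a pattern by searching the sorted suffixes. The function name `find_occurrences` is hypothetical and indexing is 0-based. This version compares the pattern against each probed suffix directly, so it costs O(m log n) for a pattern of length m; the LCP-based speedup mentioned above is not included.

```python
def find_occurrences(s, sa, pattern):
    """Return sorted start positions of pattern in s via binary search on sa."""
    m = len(pattern)

    def prefix(idx):
        # m-prefix of the idx-th smallest suffix
        return s[sa[idx]:sa[idx] + m]

    # Lower bound: first suffix whose m-prefix is >= pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if prefix(mid) < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo
    # Upper bound: first suffix whose m-prefix is > pattern.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if prefix(mid) <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return sorted(sa[start:lo])


s = "banana"
sa = sorted(range(len(s)), key=lambda i: s[i:])
print(find_occurrences(s, sa, "ana"))  # [1, 3]
```

All suffixes that start with the pattern sit in one contiguous block of the suffix array, which is why two binary searches are enough.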
So let's summarize the key points:
- Sort the suffixes in O(n log n) time using the doubling algorithm.
- Use the property of the h array to compute height, the LCP between adjacent suffixes in sorted order, in O(n) time.
- Use the properties of LCP to turn general LCP queries into RMQ queries on the height array.
Keywords (TAG): suffix array, doubling algorithm, LCP