[Knowledge point] suffix array

Source: Internet
Author: User

Parts of this article are based on Liu Rujia's book "Algorithm Competition Introduction Classic: Training Guide", as hereby acknowledged.

1. Preface

Over the past few mornings I read through the suffix array. The concept itself is not hard to understand, but the knowledge that branches off from it is extensive and complex, and I have not yet studied its two siblings, the suffix tree and the suffix automaton.

2. Concept

I previously covered the Aho-Corasick automaton (http://www.cnblogs.com/jinkun113/p/4682853.html). To recap briefly, it solves multi-pattern matching, but only when all patterns are known in advance. In real applications we often cannot know the queries beforehand; in a search engine, for example, user queries cannot be preprocessed. In that situation we must preprocess the text instead of each query. Put simply, a suffix array stores all suffixes of a string in sorted order, and its uses follow from analyzing that array.
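To make "preprocess the text, not the query" concrete, here is a minimal C++ sketch of how a query could be answered once the sorted suffix positions `sa` are available. The function name `countOccurrences` is hypothetical, and the sketch assumes `sa` is already built (construction is covered in the sections below). Every occurrence of a pattern is a prefix of some suffix, and those suffixes form one contiguous block of `sa`, so two binary searches locate it.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical helper: counts occurrences of `pattern` in `text`, assuming
// `sa` holds the starting indices of text's suffixes in lexicographic order.
int countOccurrences(const std::string& text, const std::vector<int>& sa,
                     const std::string& pattern) {
    const std::size_t m = pattern.size();
    // Compare the first m characters of the suffix starting at i with pattern
    // (a suffix shorter than m is compared in full): <0, 0 or >0.
    auto cmpPrefix = [&](int i) { return text.compare(i, m, pattern); };
    // Truncated prefixes of sorted suffixes are still sorted, so both
    // predicates below partition sa and binary search applies.
    auto lo = std::partition_point(sa.begin(), sa.end(),
                                   [&](int i) { return cmpPrefix(i) < 0; });
    auto hi = std::partition_point(lo, sa.end(),
                                   [&](int i) { return cmpPrefix(i) == 0; });
    return static_cast<int>(hi - lo);  // size of the matching block
}
```

For banana with sa = {5, 3, 1, 0, 4, 2}, the query "ana" matches the suffixes starting at 3 and 1, so the count is 2. Each query costs O(m log n) character comparisons, independent of how many queries arrive later.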

3. Build

First, take the string banana and append a non-alphabetic character "$", a sentinel that does not occur anywhere in the string. Then insert all of its suffixes (banana$, anana$, nana$, ana$, na$, a$) into a trie. Because of the sentinel, each suffix of the string corresponds one-to-one with a leaf node, as shown below:

[Figure: suffix trie of banana$ (image not available)]

In practice, the non-branching chains of the suffix trie are merged together, which yields the so-called suffix tree. However, suffix tree construction algorithms are hard to understand and easy to get wrong, so they are rarely used in competitions, and I will set them aside for now. The suffix array, by contrast, is a must-have: it is time-efficient, its code is short, and it is hard to get wrong.

When drawing the suffix trie, we place the lexicographically smaller letter on the left at every branch. Since leaf nodes and suffixes correspond one-to-one, we now label each leaf with the position in the original string where its suffix begins:

[Figure: suffix trie of banana$ with each leaf labeled by the starting position of its suffix (image not available)]
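The trie picture can be reproduced in code. The minimal C++ sketch below (the names `TrieNode`, `collectLeaves` and `suffixArrayViaTrie` are hypothetical) inserts every suffix followed by the sentinel, then walks the trie depth-first with children in increasing character order and records the leaf labels. It assumes the sentinel is smaller than every character of the string, which holds for '$' and letters. It is only an illustration: an uncompressed suffix trie can have O(n^2) nodes.

```cpp
#include <map>
#include <string>
#include <vector>

// One trie node; children are stored as indices into a node pool.
struct TrieNode {
    std::map<char, int> next;  // std::map keeps children in character order
    int leafStart = -1;        // start position of the suffix ending here, or -1
};

// Pre-order DFS: visiting children left to right lists leaves in sorted order.
void collectLeaves(const std::vector<TrieNode>& nodes, int u, std::vector<int>& sa) {
    if (nodes[u].leafStart >= 0) sa.push_back(nodes[u].leafStart);
    for (const auto& kv : nodes[u].next) collectLeaves(nodes, kv.second, sa);
}

// Hypothetical illustration: build the suffix trie of s + sentinel and read
// the leaf labels from left to right.
std::vector<int> suffixArrayViaTrie(const std::string& s, char sentinel = '$') {
    const std::string t = s + sentinel;
    std::vector<TrieNode> nodes(1);  // nodes[0] is the root
    for (int i = 0; i < static_cast<int>(s.size()); ++i) {
        int cur = 0;
        for (std::size_t j = i; j < t.size(); ++j) {
            auto it = nodes[cur].next.find(t[j]);
            if (it == nodes[cur].next.end()) {
                nodes.emplace_back();  // may reallocate, so use indices, not references
                const int id = static_cast<int>(nodes.size()) - 1;
                nodes[cur].next[t[j]] = id;
                cur = id;
            } else {
                cur = it->second;
            }
        }
        nodes[cur].leafStart = i;  // the sentinel makes this a distinct leaf
    }
    std::vector<int> sa;
    collectLeaves(nodes, 0, sa);
    return sa;
}
```

For banana it returns {5, 3, 1, 0, 4, 2}: exactly one label per suffix, reflecting the one-to-one correspondence between leaves and suffixes.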

Reading these labels from left to right gives the suffix array. For banana, sa[] = {5, 3, 1, 0, 4, 2}. As the suffix trie makes clear, this is simply the order of the suffixes sorted lexicographically. We could therefore obtain it directly with quicksort using O(n log n) comparisons. However, comparing two suffixes takes O(n) time, so the total cost is O(n^2 log n), which is too slow for large inputs.
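The sorting approach just described can be sketched in a few lines of C++ with `std::sort` (the name `naiveSuffixArray` is hypothetical). It is a direct transcription of "sort the suffixes", so it inherits the O(n^2 log n) worst case, but it is handy as a reference checker for small inputs.

```cpp
#include <algorithm>
#include <numeric>
#include <string>
#include <vector>

// Naive suffix array: sort the starting indices by comparing the suffixes
// themselves. Each comparison may scan O(n) characters, so the total cost
// is O(n^2 log n).
std::vector<int> naiveSuffixArray(const std::string& s) {
    std::vector<int> sa(s.size());
    std::iota(sa.begin(), sa.end(), 0);  // 0, 1, ..., n-1
    std::sort(sa.begin(), sa.end(), [&](int a, int b) {
        // Compare the whole suffix starting at a with the one starting at b.
        return s.compare(a, std::string::npos, s, b, std::string::npos) < 0;
    });
    return sa;
}
```

Calling `naiveSuffixArray("banana")` gives {5, 3, 1, 0, 4, 2}, matching the sa[] above.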

4. Prefix Doubling

Below is the prefix-doubling algorithm invented by Manber and Myers. Its time complexity is O(n log n) when radix sort is used, or O(n log^2 n) with a comparison sort instead.
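As a preview of where the steps below lead, here is a minimal C++ sketch of the doubling idea. It uses `std::sort` on rank pairs rather than radix sort, so it is the O(n log^2 n) variant, not the O(n log n) one; the name `doublingSuffixArray` is hypothetical. Each round sorts suffixes by the pair (rank of their first k characters, rank of the next k characters), which orders them by their first 2k characters; k doubles until all ranks are distinct.

```cpp
#include <algorithm>
#include <numeric>
#include <string>
#include <utility>
#include <vector>

// Prefix doubling with std::sort (O(n log^2 n)). At the start of a round,
// rank[i] is the rank of the first k characters of suffix i; -1 stands for
// "the suffix ended before k more characters", which sorts first.
std::vector<int> doublingSuffixArray(const std::string& s) {
    const int n = static_cast<int>(s.size());
    std::vector<int> sa(n), rank(n), tmp(n);
    if (n == 0) return sa;
    std::iota(sa.begin(), sa.end(), 0);
    for (int i = 0; i < n; ++i) rank[i] = static_cast<unsigned char>(s[i]);  // k = 1
    for (int k = 1; ; k <<= 1) {
        auto key = [&](int i) {
            return std::make_pair(rank[i], i + k < n ? rank[i + k] : -1);
        };
        std::sort(sa.begin(), sa.end(),
                  [&](int a, int b) { return key(a) < key(b); });
        // Re-rank: equal keys share a rank, so ties carry into the next round.
        tmp[sa[0]] = 0;
        for (int i = 1; i < n; ++i)
            tmp[sa[i]] = tmp[sa[i - 1]] + (key(sa[i - 1]) < key(sa[i]) ? 1 : 0);
        rank = tmp;
        if (rank[sa[n - 1]] == n - 1) break;  // all ranks distinct: fully sorted
    }
    return sa;
}
```

`doublingSuffixArray("banana")` finishes after two rounds and gives {5, 3, 1, 0, 4, 2}, matching the sa[] from section 3. Replacing `std::sort` with two counting-sort passes (second key, then first key) is what brings the cost down to O(n log n).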

First, sort all the individual characters. (This can also be viewed as sorting the suffixes by their first character, which makes the later steps easier to connect.)

[Figure: the suffixes of banana ranked by their first character (image not available)]
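This first step can be done in linear time with a counting (bucket) sort over the alphabet, the same building block the radix-sort version of doubling relies on. A minimal C++ sketch, assuming 8-bit characters and 0-based ranks; the name `sortByFirstChar` is hypothetical. Suffixes that start with the same letter are still tied after this step, so they share a rank.

```cpp
#include <string>
#include <utility>
#include <vector>

// Order suffixes by their first character only, using a counting sort with
// 256 buckets. Returns {sa, rank}; tied suffixes get the same rank.
std::pair<std::vector<int>, std::vector<int>> sortByFirstChar(const std::string& s) {
    const int n = static_cast<int>(s.size());
    std::vector<int> cnt(256, 0), sa(n), rank(n);
    for (unsigned char c : s) ++cnt[c];                  // bucket sizes
    for (int c = 1; c < 256; ++c) cnt[c] += cnt[c - 1];  // bucket end positions
    for (int i = n - 1; i >= 0; --i)                      // stable placement
        sa[--cnt[static_cast<unsigned char>(s[i])]] = i;
    for (int i = 0; i < n; ++i) {
        if (i == 0) rank[sa[i]] = 0;
        else  // a new rank starts only where the first character changes
            rank[sa[i]] = rank[sa[i - 1]] + (s[sa[i]] != s[sa[i - 1]] ? 1 : 0);
    }
    return {sa, rank};
}
```

For banana it yields sa = {1, 3, 5, 0, 2, 4} and rank = {1, 0, 2, 0, 2, 0}: the three suffixes starting with a tie at rank 0, the one starting with b gets rank 1, and the two starting with n tie at rank 2.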

For each letter,
