I spent three hours translating this suffix tree article. If anything is unclear, contact me.

Source: Internet
Author: User

After reading many unreliable introductions to suffix trees, I consider this article one of the best I have found on the Internet so far. It describes the entire suffix tree construction process with three rules, combined with illustrations, which makes it very easy to follow. It also respects the terminology of the original author, Ukkonen, and clearly explains every concept that appears in suffix trees. The translation took me three hours, and you are welcome to revise it or discard parts of it.

The text is as follows:

Below, I will try to explain Ukkonen's algorithm, first with a simple string that contains no repeated characters, and then by describing the complete algorithm framework.

First, a few preliminary remarks.

1. What we build is essentially a search trie: there is a root node, and edges lead from it to new nodes, and so on down to the leaf nodes.

2. However, unlike in a search trie, an edge label is not a single character. Instead, each edge is labeled with a pair of integers [from, to], which are indexes into the input string. Each edge can therefore represent a substring of any length while requiring only O(1) space (two integer indexes).
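The edge representation described above can be sketched in a few lines of Python. This is only an illustration, not the original author's code: the class name `Edge`, its fields, and the half-open convention (`end` is the index just past the last character) are assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class Edge:
    """Hypothetical edge type: stores two indexes instead of a copy of the label."""
    start: int  # index of the first character of the label
    end: int    # index just past the last character (assumed half-open convention)

    def label(self, text: str) -> str:
        # The label is recovered from the input string only when it is needed.
        return text[self.start:self.end]


text = "ABC"
edge = Edge(0, 2)
print(edge.label(text))  # AB
```

However long the substring is, the edge itself only ever holds two integers.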

Basic conventions

Next, I will use a string without repeated characters to illustrate how to build a suffix tree:

 
ABC

The algorithm proceeds step by step from the left of the string to the right. Each step processes one character of the input string. A step may involve more than one operation, but the total number of operations is O(n).

We start by inserting A into the suffix tree as an edge from the root to a leaf, labeled [0, #]. This means the edge represents the substring that starts at index 0 and ends at index #. I use the symbol # for the current end index; its current value is 1, the position right after A.

This gives us the initial suffix tree: a root node with a single edge labeled [0, #].

At this point, that edge represents the substring A.

Now we advance to position 2 (right after the character B). The goal of each step is to insert all suffixes up to the current position. We can do this as follows:

1. Extend the existing A edge so that it becomes AB;

2. Insert one new edge for B.

The tree now has two edges leaving the root, labeled [0, #] and [1, #].

They represent the substrings AB and B.

We observe two things:

    1. The edge that represents AB has the same label as in the initial suffix tree: [0, #]. Its meaning changes automatically; we only need to update # from 1 to 2;
    2. Each edge requires only O(1) space, because we record just a pair of integer indexes, no matter how many characters the edge represents.
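The first observation can be made concrete with a small sketch. It assumes that every leaf edge holds a reference to one shared, mutable end marker; the names `End` and `OpenEdge` are hypothetical and chosen only for this illustration.

```python
class End:
    """Mutable holder for the current end index (the '#' in the text)."""

    def __init__(self, value: int = 0):
        self.value = value


class OpenEdge:
    """Hypothetical leaf edge labeled [start, #]."""

    def __init__(self, start: int, end: End):
        self.start = start
        self.end = end  # shared reference, not a copy of the number

    def label(self, text: str) -> str:
        return text[self.start:self.end.value]


text = "ABC"
current_end = End(1)                 # step 1: '#' is 1
a_edge = OpenEdge(0, current_end)
first = a_edge.label(text)           # the edge currently reads "A"

current_end.value = 2                # step 2: only '#' is updated
b_edge = OpenEdge(1, current_end)    # new edge for the suffix B
second = a_edge.label(text)          # the same edge now reads "AB"

print(first, second, b_edge.label(text))  # A AB B
```

The edge object for A is never touched in step 2; changing the shared marker is enough to extend it.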

Next, we increment # again and insert the character C: we append C to every existing edge and then insert one new edge for the new suffix C.

The tree now has three edges leaving the root, labeled [0, #], [1, #], and [2, #].

They represent the substrings ABC, BC, and C.

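We observe that:

    1. After each step, the tree is a correct suffix tree for the string up to the current position;
    2. The number of steps equals the length of the string;
    3. The work in each step is O(1): existing edges are extended automatically by incrementing #, and inserting the one new edge takes O(1) time. A string of length n therefore takes O(n) time.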
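Putting the three steps together, the construction for a string without repeated characters can be sketched as follows. This is a simplified illustration and not the full Ukkonen algorithm: the function name `build_simple_suffix_tree` is hypothetical, the tree is reduced to a flat list of leaf edges under the root, and strings with repeated characters are rejected.

```python
def build_simple_suffix_tree(text):
    """Return, for each step, the labels of all edges leaving the root.

    Only valid for strings without repeated characters (assumption of this sketch).
    """
    if len(set(text)) != len(text):
        raise ValueError("this sketch only handles strings without repeated characters")

    end = [0]        # shared '#': a one-element list so every edge sees updates
    edges = []       # each edge is (start, end), with 'end' the shared list
    snapshots = []
    for i in range(len(text)):
        end[0] = i + 1           # O(1): extends every existing edge at once
        edges.append((i, end))   # O(1): new edge for the new one-character suffix
        # Building the snapshot is not O(1); it exists only to display the tree.
        snapshots.append([text[start:stop[0]] for start, stop in edges])
    return snapshots


for step in build_simple_suffix_tree("ABC"):
    print(step)
# ['A']
# ['AB', 'B']
# ['ABC', 'BC', 'C']
```

Apart from the snapshot used for printing, each loop iteration performs two constant-time operations, which matches the O(n) total stated above.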

First extension: simple repetitions

The algorithm above works well because the string contains no repeated characters. Next, let's look at a more complex string:

 
ABCABXABCD

Steps 1 to 3 proceed exactly as in the previous example.
