I spent three hours translating this suffix tree article. If anything is unclear, contact me.

Last Update:2018-12-06 Source: Internet

Author: User


After reading many unreliable introductions to suffix trees, I found this to be one of the best articles on the topic on the Internet so far. It describes the entire construction of a suffix tree with three rules and accompanying figures, so it is very easy to follow. It also respects the terminology of the original author, Ukkonen, and clearly explains every concept that appears in the suffix tree. The translation took me three hours; you are welcome to modify it or discard parts of it.

The text is as follows:

Next, I will try to explain Ukkonen's algorithm, first by showing what it does on a simple string that contains no repeated characters, and then by describing the complete algorithm framework.

First, a few preliminary statements.

1. What we build is essentially a structure like a search trie, so there is a root node. Edges leave the root and lead to new nodes, further edges leave those nodes, and so on down to the leaf nodes.

2. However, unlike in a search trie, an edge label is not a single character. Instead, each edge is labeled with a pair of integers [from, to], which are indexes into the input string. In this way, each edge can carry a substring of any length but needs only O(1) space (two integer indexes).
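The index-pair labeling can be illustrated with a minimal Python sketch. The `Edge` class and its `label` method are hypothetical names introduced here for illustration, and the sketch assumes that `start` is inclusive and `end` is exclusive, which matches the [0, #] notation used below.

```python
from dataclasses import dataclass


@dataclass
class Edge:
    """Hypothetical edge that stores two indexes instead of a string."""
    start: int  # index of the first character of the label (inclusive)
    end: int    # index just past the last character (exclusive)

    def label(self, text: str) -> str:
        # The substring is recovered from the input string on demand.
        return text[self.start:self.end]


text = "ABCABXABCD"
short_edge = Edge(0, 1)   # represents a 1-character substring
long_edge = Edge(2, 10)   # represents an 8-character substring

short_label = short_edge.label(text)
long_label = long_edge.label(text)
```

Both edges store exactly two integers, regardless of how many characters their labels cover.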

Basic conventions

Next, I will use a string without repeated characters to illustrate how to create a suffix tree:

ABC

The algorithm proceeds step by step from the left to the right of the string. Each step processes one character of the input string. A step may involve more than one operation, but the total number of operations is O(n).

Starting from the left, we first insert the single character A by adding an edge from the root to a leaf and labeling it [0, #]. The label means that the edge represents the substring that starts at index 0 and ends at the current end. I use the symbol # to denote the current end, which is now position 1 (right after A).

Therefore, we have the suffix tree after initialization:

It means:

Now we advance to position 2 (right after B). The goal of each step is to insert all suffixes up to the current position. We do this by:

1. Extending the existing A edge so that it becomes AB;

2. Inserting one new edge for B.

Then it becomes like this:

It means:

We observe two things:

The edge that represents AB has the same label as in the initial tree: [0, #]. Its meaning changed automatically because we updated # from 1 to 2;
Each edge needs only O(1) space, because it stores just a pair of integer indexes, however many characters it represents.
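One way to picture how an edge labeled [0, #] grows without being touched is to let every open edge share a single mutable end marker. The sketch below is only an illustration under that assumption; `End` and `OpenEdge` are hypothetical names that do not come from the original article.

```python
class End:
    """Hypothetical mutable holder for the current end position (#)."""

    def __init__(self, value: int) -> None:
        self.value = value


class OpenEdge:
    """Hypothetical leaf edge labeled [start, #]."""

    def __init__(self, start: int, end: End) -> None:
        self.start = start
        self.end = end  # shared with every other open edge

    def label(self, text: str) -> str:
        return text[self.start:self.end.value]


text = "ABC"
end = End(1)                      # after the first step, # is 1
edge_a = OpenEdge(0, end)
label_before = edge_a.label(text)

end.value = 2                     # the second step only updates #
edge_b = OpenEdge(1, end)         # plus one new edge for B
label_after = edge_a.label(text)
label_b = edge_b.label(text)
```

The edge object `edge_a` is never modified; only the shared end marker changes, so the cost of extending all existing edges does not depend on how many there are.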

Next, we increment # again and process the character C. This appends C to every existing edge and inserts one new edge for the new suffix C.

The result looks like this:

It means:

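We observe:

After each step, the tree is a correct suffix tree up to the current position;
There are as many steps as there are characters in the string;
Each step takes O(1) work: all existing edges are updated automatically by incrementing #, and inserting the one new edge takes O(1) time. A string of length n therefore needs only O(n) time.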
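The three steps above can be replayed with a short Python sketch. It only covers the special case described so far, a string without repeated characters, and is not the full algorithm. The function name `build_no_repeat_tree` and the snapshot format are assumptions made for illustration.

```python
def build_no_repeat_tree(text: str) -> list[list[str]]:
    """Replay the simple case: every suffix is its own edge from the root.

    Returns the edge labels after each step. Only valid when the
    string contains no repeated characters.
    """
    if len(set(text)) != len(text):
        raise ValueError("this sketch only handles strings without repeats")

    end = 0                          # the current end position, written # in the text
    starts: list[int] = []           # start index of each edge [start, #]
    snapshots: list[list[str]] = []

    for i in range(len(text)):
        end = i + 1                  # O(1): extends every existing edge at once
        starts.append(i)             # O(1): one new edge for the new suffix
        # Expanding the labels is only for display and is not part of the O(1) step.
        snapshots.append([text[s:end] for s in starts])

    return snapshots


snapshots = build_no_repeat_tree("ABC")
```

Here `snapshots[0]` holds the labels after the first step, `snapshots[1]` after the second, and so on, which corresponds to the three trees shown above.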

First extension: simple repetitions

The algorithm above works so neatly only because the string contains no repeated characters. Next, let's look at a more complex string:

ABCABXABCD

Steps 1 to 3: As in the previous example:

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
