After reading many unreliable introductions to suffix trees, I found this article to be one of the best on the Internet so far. It describes the construction of the entire suffix tree with three rules, illustrated with figures, which makes it very easy to understand. It also keeps the terminology of the algorithm's author, Ukkonen, and clearly explains every concept that appears in the suffix tree. The translation took three hours; corrections and criticism are welcome.
The text is as follows:
In what follows, I will first analyze Ukkonen's algorithm on a simple string that contains no repeated characters, and then describe the complete algorithm.
First, a few preliminary remarks.
1. What we are building is basically a search trie: there is a root node, edges going out of it lead to new nodes, further edges go out of those, and so on down to the leaves.
2. However, unlike a search trie, the edge labels are not single characters. Instead, each edge is labeled with a pair of integers [from, to], which are indexes into the input string. In this way, each edge can represent a substring of arbitrary length, yet needs only O(1) space (a pair of integer indexes).
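As a minimal sketch of point 2, an edge can store just two indexes and recover its substring only when needed. The `Edge` class and its `label` method are hypothetical names introduced for illustration; the sketch also assumes the end index is exclusive, matching the `#` convention used below, where `#` points just past the last character.

```python
from dataclasses import dataclass


@dataclass
class Edge:
    """Edge label stored as two indexes into the input text.

    Assumption: `end` is exclusive, so the edge spells text[start:end].
    """
    start: int
    end: int

    def label(self, text: str) -> str:
        # The substring is rebuilt on demand; the edge itself holds only two ints,
        # no matter how long the substring is.
        return text[self.start:self.end]


text = "abcabxabcd"
edge = Edge(3, 10)
print(edge.label(text))  # abxabcd
```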
Basic conventions
I will first use a string without repeated characters to illustrate how the suffix tree is built:
ABC
The algorithm works step by step, from left to right. There is one step for each character of the input string. Each step may involve more than one operation, but the total number of operations over all steps is O(n).
We start at the left and first insert only the single character A, by creating an edge from the root node to a leaf and labeling it [0, #]. This means the edge represents the substring that starts at index 0 and ends at the current end. I use the symbol # for the current end, which is currently 1 (right after A).
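One way to model the open end # is to leave a leaf edge's end unset and read it from a shared counter. This is an illustrative sketch: the names `Edge`, `label`, `current_end` and `root_edges` are hypothetical, and using `None` to stand for # is a representation choice made here, not something stated in the original.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Edge:
    start: int
    end: Optional[int] = None  # None stands for '#', the current end

    def label(self, text: str, current_end: int) -> str:
        # An open edge [start, #] reads up to the current end (exclusive).
        stop = current_end if self.end is None else self.end
        return text[self.start:stop]


text = "ABC"
current_end = 1              # '#' after step 1: just past 'A'
root_edges = {"A": Edge(0)}  # the single edge [0, #], keyed by its first character
print(root_edges["A"].label(text, current_end))  # A
```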
So after the first step, the tree has a single edge from the root to a leaf, labeled [0, #]; with # = 1, this edge spells A.
Now we move on to position 2 (right after B). The goal of each step is to insert all suffixes up to the current position. We do this by:
1. expanding the existing A edge to AB;
2. inserting one new edge for B.
The tree now has two edges leaving the root: [0, #], which spells AB, and [1, #], which spells B.
We observe two things:
- The edge representing AB has the same label as in the initial tree: [0, #]. Its meaning changed automatically because we updated # from 1 to 2.
- Each edge needs only O(1) space, because it stores just two indexes into the text, regardless of how many characters it represents.
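The sketch below replays step 2 under the same assumptions: it increments the shared end and inserts one edge for B, without modifying the existing edge object whose label is [0, #]. The names `Edge`, `labels`, `edges` and `current_end` are hypothetical, and keying root edges by their first character is a choice of this sketch.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Edge:
    start: int  # a leaf edge [start, #]; '#' is read from current_end


def labels(edges: Dict[str, Edge], text: str, current_end: int) -> Dict[str, str]:
    # Resolve every open edge against the shared current end.
    return {first: text[e.start:current_end] for first, e in edges.items()}


text = "ABC"
edges = {"A": Edge(0)}  # tree after step 1: one edge [0, #]
current_end = 1         # '#' = 1
first_edge = edges["A"]

# Step 2: increment '#', which turns the A edge into AB without touching it ...
current_end = 2
# ... then insert one new edge [1, #] for the suffix B.
edges["B"] = Edge(1)

print(labels(edges, text, current_end))  # {'A': 'AB', 'B': 'B'}
print(edges["A"] is first_edge, edges["A"].start)  # True 0
```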
Next, we increment # again and update the tree by appending C to every existing edge and inserting one new edge for the new suffix C.
The tree now has three edges leaving the root: [0, #], [1, #] and [2, #], which spell ABC, BC and C.
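We observe that:
- After each step, we have a correct suffix tree for the text up to the current position;
- There are as many steps as there are characters in the text;
- The work in each step is O(1): incrementing # updates all existing edges automatically, and inserting the one new edge for the final character takes O(1) time. Hence a string of length n needs only O(n) time.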
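Putting these steps together, here is a minimal sketch of the construction for strings whose characters are all distinct. It is not Ukkonen's full algorithm: it assumes there are no repeats, so each step adds exactly one leaf edge at the root and no edge ever needs to be split. `build_distinct` and `suffixes_spelled` are hypothetical helper names.

```python
from typing import Dict, List, Tuple


def build_distinct(text: str) -> Tuple[Dict[str, int], int]:
    """Build the suffix tree of a string with no repeated characters.

    Returns the root edges (first character -> start index of [start, #])
    and the final value of '#'.
    """
    if len(set(text)) != len(text):
        raise ValueError("this simple scheme needs distinct characters")
    edges: Dict[str, int] = {}  # every edge is a leaf edge [start, #]
    end = 0                     # '#'
    for i, ch in enumerate(text):
        end += 1                # extends every existing edge at once
        edges[ch] = i           # one new edge [i, #] for the suffix text[i:]
    return edges, end


def suffixes_spelled(text: str) -> List[str]:
    # Read back what each root edge spells once '#' has reached the end.
    edges, end = build_distinct(text)
    return sorted(text[start:end] for start in edges.values())


print(suffixes_spelled("ABC"))  # ['ABC', 'BC', 'C']
```

Storing only a start index per edge works here because every edge is a leaf edge ending at #. Each loop iteration does one counter increment and one dictionary insert (O(1) expected time), so the whole construction is O(n).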
First extension: simple repetitions
Of course, this works so neatly only because our string contains no repeated characters. Next, let's look at a more realistic string:
abcabxabcd
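Steps 1 through 3 proceed exactly as in the previous example, leaving the root with three edges [0, #], [1, #] and [2, #], which spell abc, bc and c.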
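To see where the simple scheme stops working, the hypothetical helper `first_conflict` below replays it and reports the first step at which it would add a second root edge beginning with a character that already starts another root edge, which a suffix tree does not allow. It only detects the problem; it does not show how the full algorithm resolves it.

```python
from typing import Optional, Set


def first_conflict(text: str) -> Optional[int]:
    """Return the 1-based step at which the simple scheme breaks, or None.

    The simple scheme adds one root edge per step for the new one-character
    suffix. Two edges leaving the same node must not start with the same
    character, so the scheme fails at the first repeated character.
    """
    first_chars: Set[str] = set()
    for step, ch in enumerate(text, start=1):
        if ch in first_chars:
            return step
        first_chars.add(ch)
    return None


print(first_conflict("ABC"))         # None
print(first_conflict("abcabxabcd"))  # 4
```

For abcabxabcd the conflict appears at step 4: the new suffix a already occurs as the first character of the edge [0, #], which by then spells abca.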