1. Overview
The trie, also known as a dictionary tree, word search tree, or prefix tree, is a multi-way tree structure used for fast lookup. For example, a trie over the English letters is a 26-way tree, and a trie over the digits is a 10-way tree.
The word trie comes from reTRIEval and is pronounced /triː/ ("tree"). Some people also pronounce it /traɪ/ ("try").
The trie uses the common prefixes of strings to save storage space. For example, a trie can store the 6 strings tea, ten, to, in, inn, and int with 10 nodes (counting the root).
In this trie, the strings in, inn, and int share the common prefix "in", so "in" is stored only once to save space. Of course, if the system holds a large number of strings that have almost no common prefixes, the corresponding trie consumes a lot of memory, which is a disadvantage of the trie.
The basic properties of the trie tree can be summarized:
(1) The root node does not contain any character. Apart from the root node, each node contains exactly one character.
(2) Concatenating the characters along the path from the root node to a given node yields the string corresponding to that node.
(3) The children of any one node all contain different characters.
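The three properties can be illustrated with a minimal sketch. It assumes a nested-dict representation in which each key is one character; the names build_trie and count_nodes and the "$" end-of-word marker are hypothetical choices made for this illustration, not part of the original article.

```python
def build_trie(words):
    """Build a trie as nested dicts; each key is the single character of a child."""
    root = {}  # the root holds no character of its own
    for word in words:
        node = root
        for ch in word:
            # Children of one node are keyed by distinct characters.
            node = node.setdefault(ch, {})
        node["$"] = True  # hypothetical end-of-word marker, not a character node
    return root


def count_nodes(node):
    """Count this node plus all descendant character nodes."""
    return 1 + sum(count_nodes(child) for key, child in node.items() if key != "$")


trie = build_trie(["tea", "ten", "to", "in", "inn", "int"])
print(count_nodes(trie))  # 10: the root plus 9 character nodes
```

The count matches the 10 nodes mentioned above: the shared prefixes "t", "te", "i", and "in" are each stored once.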
2. Basic implementation of the trie tree
The insert, delete, and find operations of a letter trie are simple. Use a single loop: in the i-th iteration, find the subtree corresponding to the i-th letter of the string, then perform the corresponding operation. To implement the trie, we can store it in an ordinary array (statically allocated memory), or we can use dynamically allocated pointer-based nodes. There are three ways for a node to point to its children:
1. Give each node a small array indexed by letter. The subscript is the letter represented by the child, and the content is the position of that child in the big node array, that is, its label;
2. Give each node a linked list that records its children in a fixed order;
3. Record the tree using the left-child right-sibling representation.
Each of the three methods has its own characteristics. The first is easy to implement but needs a lot of space; the second is also easy to implement and needs relatively little space, but is slower; the third needs the least space, but is relatively slow and not easy to write.
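As an illustration of the first method, here is a minimal sketch that keeps all nodes in one big list and gives each node a 26-entry child array. It assumes lowercase English letters only; the class name ArrayTrie, the count array, and the lazy deletion strategy (unmarking a word without freeing its nodes) are assumptions of this sketch, not something the article prescribes.

```python
ALPHABET = 26  # assumption: lowercase English letters only


class ArrayTrie:
    def __init__(self):
        # children[p][c] is the index of node p's child for letter c.
        # Index 0 is the root, which is never a child, so 0 also means "no child".
        self.children = [[0] * ALPHABET]
        self.count = [0]  # count[p] = number of stored words ending at node p

    def insert(self, word):
        p = 0
        for ch in word:  # iteration i follows the child for the i-th letter
            c = ord(ch) - ord("a")
            if self.children[p][c] == 0:
                self.children.append([0] * ALPHABET)
                self.count.append(0)
                self.children[p][c] = len(self.children) - 1
            p = self.children[p][c]
        self.count[p] += 1

    def _locate(self, word):
        """Return the node index reached by word, or -1 if the path is missing."""
        p = 0
        for ch in word:
            p = self.children[p][ord(ch) - ord("a")]
            if p == 0:
                return -1
        return p

    def find(self, word):
        p = self._locate(word)
        return p != -1 and self.count[p] > 0

    def delete(self, word):
        """Lazy deletion: unmark one occurrence of word but keep its nodes."""
        p = self._locate(word)
        if p == -1 or self.count[p] == 0:
            return False
        self.count[p] -= 1
        return True


t = ArrayTrie()
for w in ["tea", "ten", "to"]:
    t.insert(w)
print(t.find("ten"), t.find("te"))  # True False
t.delete("ten")
print(t.find("ten"))  # False
```

Every node pays for 26 child slots whether or not they are used, which is the large space requirement noted above.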
3. Advanced implementation of the trie tree
You can use a double-array trie. The double-array representation can greatly reduce memory usage. For details, see references (5) and (6).
4. Applications of the trie tree
The trie is a simple yet efficient data structure with a large number of applications.
(1) String search
Store a set of known strings (a dictionary) in the trie in advance, then use it to find out whether, or how often, other strings occur.
Example:
@ Given a familiar vocabulary consisting of N words and an article written in lowercase English, output all the words in the article that are not in the vocabulary, in the order in which they first appear.
@ Given a dictionary of bad words, all in lowercase letters, and a text in which each line consists of lowercase letters, determine whether the text contains any bad word. For example, if rob is a bad word, then the text problem contains a bad word.
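The second example can be sketched as follows, assuming the bad words and the text consist only of lowercase letters. The sketch tries every start position in the text and walks the trie from there; the function names and the "$" end marker are hypothetical. This is the straightforward approach, not the faster automaton-based one.

```python
def build_trie(words):
    """Build a nested-dict trie; "$" marks the end of a stored word."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node["$"] = True
    return root


def contains_bad_word(text, trie):
    """Return True if any stored word occurs as a substring of text."""
    for start in range(len(text)):
        node = trie
        for i in range(start, len(text)):
            ch = text[i]
            if ch not in node:
                break  # no stored word continues with this character
            node = node[ch]
            if "$" in node:
                return True  # a complete bad word ends here
    return False


bad_words = build_trie(["rob"])
print(contains_bad_word("problem", bad_words))  # True
print(contains_bad_word("tea", bad_words))      # False
```

Each start position costs at most the length of the longest bad word, so the scan does not depend on how many bad words the dictionary holds.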
(2) Longest common prefix of strings
The trie uses the common prefixes of multiple strings to save storage space. Conversely, once a large number of strings are stored in a trie, we can quickly obtain the common prefix of any of them.
Example:
@ Given N lowercase English strings and Q queries, each query asks for the length of the longest common prefix of two of the strings.
Solution: first, build the letter trie for all the strings. The length of the longest common prefix of two strings is then the depth of the lowest common ancestor of the two nodes where the strings end, because that ancestor is the last node their paths share. The problem is therefore converted into an offline lowest common ancestor (LCA) problem.
The lowest common ancestor problem is itself a classic problem. You can use the following methods:
1. Use the classic Tarjan algorithm, which is based on disjoint sets (union-find);
2. Compute the Euler tour of the letter trie, which converts the problem into a typical range minimum query (RMQ) problem.
(There is plenty of material on the Internet about disjoint sets, the Tarjan algorithm, and RMQ.)
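The reduction from longest common prefix to LCA can be sketched as follows. To stay short, the sketch finds the LCA by naively climbing parent pointers rather than using the Tarjan or Euler tour plus RMQ methods listed above, so each query costs time proportional to the string lengths. The class name LcpTrie and its fields are hypothetical.

```python
class LcpTrie:
    def __init__(self):
        self.children = [{}]  # children[p] maps a character to a child index
        self.parent = [-1]    # parent[p] is p's parent index (-1 for the root)
        self.depth = [0]      # depth[p] = number of characters on the path to p

    def insert(self, word):
        """Insert word and return the index of the node where it ends."""
        p = 0
        for ch in word:
            if ch not in self.children[p]:
                self.children.append({})
                self.parent.append(p)
                self.depth.append(self.depth[p] + 1)
                self.children[p][ch] = len(self.children) - 1
            p = self.children[p][ch]
        return p

    def lca_depth(self, u, v):
        """Depth of the lowest common ancestor of nodes u and v (naive climb)."""
        while self.depth[u] > self.depth[v]:
            u = self.parent[u]
        while self.depth[v] > self.depth[u]:
            v = self.parent[v]
        while u != v:
            u = self.parent[u]
            v = self.parent[v]
        return self.depth[u]


trie = LcpTrie()
end = {w: trie.insert(w) for w in ["tea", "ten", "to", "in", "inn", "int"]}
print(trie.lca_depth(end["tea"], end["ten"]))  # 2, the prefix "te"
print(trie.lca_depth(end["in"], end["inn"]))   # 2, the prefix "in"
print(trie.lca_depth(end["tea"], end["in"]))   # 0, no common prefix
```

When one string is a prefix of the other, the LCA is the node of the shorter string itself, which is why the depth of the LCA gives the right answer in that case too.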
(3) Sorting
The trie is a multi-way tree. A pre-order traversal that visits the children of each node in alphabetical order outputs the stored strings in lexicographic order.
Example:
@ Given N distinct English names, each consisting of a single word, sort them in ascending lexicographic order.
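A minimal sketch of this sorting approach, assuming the names are distinct as the example states (duplicates would collapse into one entry). The function name trie_sort and the use of the empty string as an end-of-word key are hypothetical choices; the empty string is convenient because it sorts before every character, so a name is emitted before any longer name that extends it.

```python
def trie_sort(words):
    """Sort distinct strings by inserting them into a trie and traversing it."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[""] = True  # end-of-word key; "" sorts before any character

    result = []

    def visit(node, prefix):
        # Pre-order traversal, visiting the children in character order.
        for key in sorted(node):
            if key == "":
                result.append(prefix)
            else:
                visit(node[key], prefix + key)

    visit(root, "")
    return result


print(trie_sort(["to", "inn", "tea", "in", "ten", "int"]))
# ['in', 'inn', 'int', 'tea', 'ten', 'to']
```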
(4) Auxiliary structure for other data structures and algorithms
Examples include the suffix tree and the Aho-Corasick (AC) automaton.
5. Trie tree complexity analysis
(1) The time complexity of insertion and search is O(N), where N is the string length.
(2) In the worst case (a full 26-way tree of depth N) the space complexity is O(26^N), which is huge (a double-array trie can be used to reduce it).
6. Summary
The trie is a very important data structure. It is widely used in information retrieval, string matching, and other fields, and it is the basis of many algorithms and more complex data structures, such as the suffix tree and the AC automaton. Mastering the trie is therefore a basic and necessary skill for anyone working in IT.
7. References
(1) Wikipedia: http://en.wikipedia.org/wiki/Trie
(2) Blog: Introduction and implementation of the dictionary tree:
http://hi.baidu.com/luyade1987/blog/item/2667811631106657f2de320a.html
(3) Analysis of the application of the letter tree in informatics competitions
(4) Construction, utilization, and improvement of the trie graph
(5) An implementation of double-array trie:
http://linux.thai.net/~thep/datrie/datrie.html
(6) An efficient implementation of trie structures:
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.14.8665&rep=rep1&type=pdf
-------------------------------------
Original article. When reprinting, please credit: reposted from Dong's blog
Link: http://dongxicheng.org/structure/trietree/