1. Overview
A trie, also known as a dictionary tree, word search tree, or prefix tree, is a multi-way tree structure for fast retrieval. For example, a trie over the English letters is a 26-way tree, and a trie over the decimal digits is a 10-way tree. The word "trie" comes from "retrieve"; it is pronounced /triː/ ("tree") and is also read as /traɪ/ ("try"). A trie uses the common prefixes of strings to save storage space. For example, a trie holding the 6 strings tea, ten, to, in, inn, int needs only 10 nodes: the root plus one node for each of the prefixes t, te, tea, ten, to, i, in, inn, int.
In this trie, the strings in, inn, and int share the common prefix "in", so space is saved by storing only one copy of "in". On the other hand, if the system holds a large number of strings that share no common prefixes, the corresponding trie consumes a lot of memory, which is a disadvantage of the trie. The basic properties of a trie can be summed up as:
- The root node does not contain characters, and each node except the root node contains only one character.
- Concatenating the characters along the path from the root node to a node gives the string corresponding to that node.
- All child nodes of a given node contain different characters.
2. Basic implementation of Trie tree
The insert, delete, and find operations on a letter tree are all very simple and need only a single loop: the i-th iteration finds the subtree corresponding to the i-th letter and then performs the corresponding operation. To implement the tree, the most common approach is to store it in arrays (static memory); dynamically allocated nodes linked by pointers (dynamic memory) work as well. There are generally three ways for a node to point to its children:
- Allocate an array the size of the alphabet for each node; the index corresponds to the letter represented by the child, and the content is the child's position in the large node array, that is, its label;
- Attach a linked list to each node and record its children in some fixed order;
- Use the left-child right-sibling representation to record the tree.
Each of the three methods has its own characteristics. The first is easy to implement but needs a lot of space; the second is also fairly easy to implement and needs relatively little space, but is more time-consuming; the third needs the least space, but is relatively time-consuming and harder to write.
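As a rough illustration of the first method, the sketch below stores the whole tree in one large table and identifies each node by its label (its row index). The class name ArrayTrie, the maxNodes parameter, and the method names are assumptions made for this example, and only lowercase letters a to z are handled.

```java
// Sketch only: a trie kept in static arrays, one row per node.
// ArrayTrie, maxNodes and nodeCount are hypothetical names for illustration.
class ArrayTrie {
    private static final int ALPHABET = 26;   // assumes lowercase 'a'..'z' input
    private final int[][] next;               // next[node][letter] = label of the child, 0 = no child
    private final boolean[] isEnd;            // isEnd[node] = some word ends at this node
    private int size = 1;                     // label 0 is reserved for the root

    ArrayTrie(int maxNodes) {                 // inserting more nodes than maxNodes throws
        next = new int[maxNodes][ALPHABET];
        isEnd = new boolean[maxNodes];
    }

    void insert(String word) {
        int node = 0;
        for (int i = 0; i < word.length(); i++) {   // the i-th step follows the i-th letter
            int c = word.charAt(i) - 'a';
            if (next[node][c] == 0) {
                next[node][c] = size++;             // allocate the next free label
            }
            node = next[node][c];
        }
        isEnd[node] = true;
    }

    boolean find(String word) {
        int node = 0;
        for (int i = 0; i < word.length(); i++) {
            int c = word.charAt(i) - 'a';
            if (next[node][c] == 0) {
                return false;
            }
            node = next[node][c];
        }
        return isEnd[node];
    }

    int nodeCount() {
        return size;
    }

    public static void main(String[] args) {
        ArrayTrie trie = new ArrayTrie(64);
        for (String w : new String[] {"tea", "ten", "to", "in", "inn", "int"}) {
            trie.insert(w);
        }
        System.out.println(trie.find("ten"));    // true
        System.out.println(trie.find("te"));     // false: "te" is only a prefix
        System.out.println(trie.nodeCount());    // 10, matching the overview example
    }
}
```

Using 0 as the "no child" marker works here because the root, which has label 0, is never the child of another node.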
3. Advanced implementation of Trie tree
A trie can also be implemented with a double array (Double-Array Trie). Using a double array can greatly reduce memory usage; for specific implementation details, see references (5) and (6).
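To give a feel for the idea, here is a deliberately simplified sketch, not the implementation described in the references. It assumes the usual base/check layout: from state s, the letter with code c leads to t = base[s] + c, and the move is valid only if check[t] == s. The class name DoubleArrayTrie, the separate isEnd array, and the naive linear search for a free base value are all assumptions chosen to keep the example short.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Queue;

// Sketch only: build a plain trie first, then pack it into base/check arrays.
class DoubleArrayTrie {
    private int[] base = new int[64];
    private int[] check = new int[64];            // check[t] = parent state of t, -1 = free slot
    private boolean[] isEnd = new boolean[64];    // simplification: word ends kept in a side array

    private static final class Node {
        final Node[] next = new Node[27];         // letter codes 1..26, index 0 unused
        boolean isEnd;
        int state;                                // slot assigned in the double array
    }

    DoubleArrayTrie(String[] words) {             // assumes lowercase 'a'..'z' words
        Arrays.fill(check, -1);
        Node root = new Node();
        for (String w : words) {
            Node n = root;
            for (char ch : w.toCharArray()) {
                int c = ch - 'a' + 1;
                if (n.next[c] == null) n.next[c] = new Node();
                n = n.next[c];
            }
            n.isEnd = true;
        }
        Queue<Node> queue = new ArrayDeque<>();
        queue.add(root);                          // the root is state 0
        while (!queue.isEmpty()) {
            Node n = queue.remove();
            int b = findBase(n);
            base[n.state] = b;
            isEnd[n.state] = n.isEnd;
            for (int c = 1; c <= 26; c++) {
                if (n.next[c] != null) {
                    check[b + c] = n.state;       // claim the slot for this child
                    n.next[c].state = b + c;
                    queue.add(n.next[c]);
                }
            }
        }
    }

    // Smallest b such that every child slot b + c is still free.
    private int findBase(Node n) {
        for (int b = 0; ; b++) {
            grow(b + 26);
            boolean fits = true;
            for (int c = 1; c <= 26 && fits; c++) {
                if (n.next[c] != null && check[b + c] != -1) fits = false;
            }
            if (fits) return b;
        }
    }

    private void grow(int maxIndex) {
        if (maxIndex < check.length) return;
        int old = check.length;
        int cap = Math.max(maxIndex + 1, old * 2);
        base = Arrays.copyOf(base, cap);
        check = Arrays.copyOf(check, cap);
        isEnd = Arrays.copyOf(isEnd, cap);
        Arrays.fill(check, old, cap, -1);
    }

    boolean contains(String word) {
        int s = 0;
        for (char ch : word.toCharArray()) {
            int c = ch - 'a' + 1;
            if (c < 1 || c > 26) return false;
            int t = base[s] + c;
            if (t >= check.length || check[t] != s) return false;
            s = t;
        }
        return isEnd[s];
    }
}
```

The memory saving comes from the fact that children of different nodes interleave in the same two integer arrays instead of each node owning a mostly empty 26-slot array.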
4. Application of Trie tree
The trie is a simple and efficient data structure with many applications.
(1) String retrieval
Store information about a set of known strings (a dictionary) in a trie in advance, then look up whether other, unknown strings occur in it, or how frequently they occur.
Example:
- Given a list of n known words and an article written in lowercase English, output all new words that are not in the known-word list, in order of first occurrence.
- Given a dictionary of bad words, all in lowercase letters, and a text in which every line also consists of lowercase letters, determine whether the text contains any bad word. For example, if rob is a bad word, then the text problem contains a bad word.
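One possible way to approach the second example is sketched below: store the bad words in a trie, then try to walk the trie from every starting position of the text. The names BadWordFilter, addBadWord, and containsBadWord are assumptions made for this illustration.

```java
// Sketch only: substring search for dictionary words by walking a trie
// from each start position of the text.
class BadWordFilter {
    private static final class Node {
        final Node[] next = new Node[26];   // assumes lowercase 'a'..'z'
        boolean isEnd;                      // a bad word ends at this node
    }

    private final Node root = new Node();

    void addBadWord(String word) {
        Node node = root;
        for (char ch : word.toCharArray()) {
            int c = ch - 'a';
            if (node.next[c] == null) node.next[c] = new Node();
            node = node.next[c];
        }
        node.isEnd = true;
    }

    boolean containsBadWord(String text) {
        for (int start = 0; start < text.length(); start++) {
            Node node = root;
            for (int i = start; i < text.length(); i++) {
                int c = text.charAt(i) - 'a';
                if (c < 0 || c >= 26) break;      // not a lowercase letter: no match from here
                node = node.next[c];
                if (node == null) break;          // no bad word continues with this letter
                if (node.isEnd) return true;      // a complete bad word was matched
            }
        }
        return false;
    }

    public static void main(String[] args) {
        BadWordFilter filter = new BadWordFilter();
        filter.addBadWord("rob");
        System.out.println(filter.containsBadWord("problem"));   // true
        System.out.println(filter.containsBadWord("program"));   // false
    }
}
```

This walk costs roughly the text length times the longest bad word. The AC automaton mentioned later in this article builds on the trie to avoid restarting from every position.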
(2) Longest common prefix of strings
A trie uses the common prefixes of multiple strings to save storage space; conversely, once a large number of strings are stored in a trie, the common prefix of any of them can be obtained quickly.
Example:
- Given N strings of lowercase English letters and Q queries, each query asks for the length of the longest common prefix of two of the strings.
Solution: First build the letter tree for all the strings. The length of the longest common prefix of two strings is then the depth of the lowest common ancestor of their nodes, so the problem is converted into an offline Lowest Common Ancestor (LCA) problem.
The LCA problem is itself a classic problem, and it can be solved in the following ways:
- Use the classic Tarjan algorithm, which is based on disjoint sets (union-find).
- Compute the Euler sequence (Euler tour) of the letter tree, which converts the problem into the classic Range Minimum Query (RMQ) problem. (There is plenty of material online about union-find, the Tarjan algorithm, and the RMQ problem.)
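The reduction itself can be illustrated without either of those algorithms. The sketch below uses a naive walk toward the root with parent pointers, which is simpler but slower than Tarjan's algorithm or the RMQ approach, since each query costs time proportional to the string lengths. The names LcpTrie, add, and lcp are assumptions for this example.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only: longest common prefix length = depth of the lowest common ancestor.
class LcpTrie {
    private static final class Node {
        final Node[] next = new Node[26];   // assumes lowercase 'a'..'z'
        final Node parent;
        final int depth;                    // number of letters on the path from the root

        Node(Node parent, int depth) {
            this.parent = parent;
            this.depth = depth;
        }
    }

    private final Node root = new Node(null, 0);
    private final List<Node> terminals = new ArrayList<>();   // terminals.get(i) = end node of string i

    // Inserts s and returns its index for later queries.
    int add(String s) {
        Node node = root;
        for (char ch : s.toCharArray()) {
            int c = ch - 'a';
            if (node.next[c] == null) node.next[c] = new Node(node, node.depth + 1);
            node = node.next[c];
        }
        terminals.add(node);
        return terminals.size() - 1;
    }

    // Length of the longest common prefix of strings i and j.
    int lcp(int i, int j) {
        Node a = terminals.get(i);
        Node b = terminals.get(j);
        while (a.depth > b.depth) a = a.parent;   // lift the deeper node first
        while (b.depth > a.depth) b = b.parent;
        while (a != b) {                          // then lift both until they meet
            a = a.parent;
            b = b.parent;
        }
        return a.depth;
    }

    public static void main(String[] args) {
        LcpTrie trie = new LcpTrie();
        int tea = trie.add("tea");
        int ten = trie.add("ten");
        int in = trie.add("in");
        int inn = trie.add("inn");
        System.out.println(trie.lcp(tea, ten));   // 2 ("te")
        System.out.println(trie.lcp(in, inn));    // 2 ("in")
        System.out.println(trie.lcp(tea, in));    // 0
    }
}
```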
(3) Sorting
A trie is a multi-way tree, so a pre-order traversal of the whole tree that visits the children of each node in alphabetical order outputs the stored strings in dictionary order.
Example:
- Given n distinct English names, each consisting of a single word, sort them in dictionary order from smallest to largest.
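A minimal sketch of this idea follows, assuming the names have already been lowercased to the letters a to z. The class name TrieSort and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only: sort distinct lowercase words by inserting them into a trie
// and collecting them with a pre-order traversal.
class TrieSort {
    private static final class Node {
        final Node[] next = new Node[26];
        boolean isEnd;
    }

    static List<String> sort(String[] words) {
        Node root = new Node();
        for (String w : words) {
            Node node = root;
            for (char ch : w.toCharArray()) {
                int c = ch - 'a';
                if (node.next[c] == null) node.next[c] = new Node();
                node = node.next[c];
            }
            node.isEnd = true;
        }
        List<String> out = new ArrayList<>();
        collect(root, new StringBuilder(), out);
        return out;
    }

    // Pre-order: report the node itself first, then its children from 'a' to 'z'.
    private static void collect(Node node, StringBuilder prefix, List<String> out) {
        if (node.isEnd) out.add(prefix.toString());
        for (int c = 0; c < 26; c++) {
            if (node.next[c] != null) {
                prefix.append((char) ('a' + c));
                collect(node.next[c], prefix, out);
                prefix.setLength(prefix.length() - 1);   // undo the append before the next child
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(sort(new String[] {"tom", "ann", "bob", "anna"}));
        // prints [ann, anna, bob, tom]
    }
}
```

Reporting a node before its children is what places a word such as ann ahead of anna.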
(4) As an auxiliary structure for other data structures and algorithms
For example, suffix trees and AC automata (Aho-Corasick automata).
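5. Complexity analysis of Trie tree
- The time complexity of insertion and lookup is O(n), where n is the length of the string.
- The space complexity is on the order of 26^n in the worst case, when the trie holds every string of length n, which is very large. In general the number of nodes is bounded by the total number of characters inserted, and each node in the array representation carries 26 child slots (this can be improved by using a double array).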
6. Summary
The trie is a very important data structure. It is widely used in information retrieval, string matching, and other fields, and it is also the foundation of many algorithms and more complex data structures, such as suffix trees and AC automata. Mastering the trie is therefore basic and necessary for anyone working in IT.
7. Simple implementation
package IO;

public class Trie {
    private static final int SIZE = 26; // only lowercase letters 'a'..'z' are supported
    private TrieNode root; // the root of the dictionary tree

    Trie() { // initialize the dictionary tree
        root = new TrieNode();
    }

    private class TrieNode { // dictionary tree node
        private int num; // how many words pass through this node, i.e. how often the prefix from the root to this node occurs
        private TrieNode[] son; // all the children of the node
        private boolean isEnd; // whether a word ends at this node
        private char val; // the character stored in the node

        TrieNode() {
            num = 1;
            son = new TrieNode[SIZE];
            isEnd = false;
        }
    }

    // insert a word into the dictionary tree
    public void insert(String str) {
        if (str == null || str.length() == 0) {
            return;
        }
        TrieNode node = root;
        char[] letters = str.toCharArray();
        for (int i = 0, len = str.length(); i < len; i++) {
            int pos = letters[i] - 'a';
            if (node.son[pos] == null) {
                node.son[pos] = new TrieNode();
                node.son[pos].val = letters[i];
            } else {
                node.son[pos].num++;
            }
            node = node.son[pos];
        }
        node.isEnd = true;
    }

    // count the words that start with the given prefix
    public int countPrefix(String prefix) {
        if (prefix == null || prefix.length() == 0) {
            return -1;
        }
        TrieNode node = root;
        char[] letters = prefix.toCharArray();
        for (int i = 0, len = prefix.length(); i < len; i++) {
            int pos = letters[i] - 'a';
            if (node.son[pos] == null) {
                return 0;
            } else {
                node = node.son[pos];
            }
        }
        return node.num;
    }

    // print the words that have the specified prefix (always returns null)
    public String hasPrefix(String prefix) {
        if (prefix == null || prefix.length() == 0) {
            return null;
        }
        TrieNode node = root;
        char[] letters = prefix.toCharArray();
        for (int i = 0, len = prefix.length(); i < len; i++) {
            int pos = letters[i] - 'a';
            if (node.son[pos] == null) {
                return null;
            } else {
                node = node.son[pos];
            }
        }
        preTraverse(node, prefix);
        return null;
    }

    // print every word that passes through this node
    public void preTraverse(TrieNode node, String prefix) {
        if (node.isEnd) {
            System.out.println(prefix);
        }
        // keep descending even after a word ends, so that "inn" is still found below "in"
        for (TrieNode child : node.son) {
            if (child != null) {
                preTraverse(child, prefix + child.val);
            }
        }
    }

    // find an exact word in the dictionary tree
    public boolean has(String str) {
        if (str == null || str.length() == 0) {
            return false;
        }
        TrieNode node = root;
        char[] letters = str.toCharArray();
        for (int i = 0, len = str.length(); i < len; i++) {
            int pos = letters[i] - 'a';
            if (node.son[pos] != null) {
                node = node.son[pos];
            } else {
                return false;
            }
        }
        return node.isEnd;
    }

    // pre-order traversal of the dictionary tree
    public void preTraverse(TrieNode node) {
        if (node != null) {
            if (node != root) { // the root holds no character
                System.out.print(node.val + "-");
            }
            for (TrieNode child : node.son) {
                preTraverse(child);
            }
        }
    }

    public TrieNode getRoot() {
        return this.root;
    }

    public static void main(String[] args) {
        Trie tree = new Trie();
        String[] strs = {"banana", "band", "bee", "absolute", "acm"};
        String[] prefix = {"ba", "b", "band", "abc"};
        for (String str : strs) {
            tree.insert(str);
        }
        System.out.println(tree.has("abc"));
        tree.preTraverse(tree.getRoot());
        System.out.println();
        for (String pre : prefix) {
            int num = tree.countPrefix(pre);
            System.out.println(pre + " " + num);
        }
    }
}