A Python implementation of a prefix trie that supports Chinese and uses the word, rather than the character, as its basic unit
A trie, also called a dictionary tree or prefix tree, can be used for "predictive text" and "autocompletion", or for word-frequency statistics (inserting a word into the trie either updates its frequency or adds it with a new frequency).
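The word-frequency and autocompletion uses can be sketched with a character-level trie that keeps a counter in each node. This is a minimal illustration, not the author's code: `FreqNode`, `FreqTrie`, `add`, `count`, and `complete` are hypothetical names, and ranking completions by frequency is an assumption about what autocompletion should return.

```python
class FreqNode:
    def __init__(self):
        self.children = {}  # character -> FreqNode
        self.count = 0      # how many times the word ending here was added


class FreqTrie:
    """Character-level trie that counts word frequencies (illustrative sketch)."""

    def __init__(self):
        self.root = FreqNode()

    def add(self, word):
        # Walking the path creates any missing nodes; an existing word just
        # has its counter incremented, a new word starts at 1.
        node = self.root
        for ch in word:
            if ch not in node.children:
                node.children[ch] = FreqNode()
            node = node.children[ch]
        node.count += 1

    def _find(self, prefix):
        # Return the node reached by following `prefix`, or None.
        node = self.root
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return None
        return node

    def count(self, word):
        node = self._find(word)
        return node.count if node is not None else 0

    def complete(self, prefix, limit=3):
        """Up to `limit` (word, count) pairs starting with `prefix`, most frequent first."""
        start = self._find(prefix)
        if start is None:
            return []
        found = []
        stack = [(start, prefix)]
        while stack:  # iterative depth-first walk of the subtree under `prefix`
            node, word = stack.pop()
            if node.count:
                found.append((word, node.count))
            for ch, child in node.children.items():
                stack.append((child, word + ch))
        found.sort(key=lambda pair: (-pair[1], pair[0]))  # ties broken alphabetically
        return found[:limit]
```

For example, after adding each word of `"tea ten tea to tea ten inn"`, `complete("te")` returns `[('tea', 3), ('ten', 2)]`.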
In computer science, a trie, also known as a prefix tree or dictionary tree, is an ordered tree used to store an associative array whose keys are usually strings. Unlike a binary search tree, no node stores its key directly; the key is determined by the node's position in the tree. All descendants of a node share a common prefix, namely the string associated with that node, and the root corresponds to the empty string. In general, not every node has a value: values are associated only with leaves and with some internal nodes, namely those that correspond to keys.
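To illustrate that keys live in node positions rather than in the nodes themselves, here is a minimal sketch of an associative array backed by a character trie. `TrieMap`, `_Node`, and the `_MISSING` sentinel are hypothetical names; the sentinel marks the internal nodes that hold no value, and visiting children in sorted order (an assumption made for illustration) yields the keys in lexicographic order.

```python
_MISSING = object()  # sentinel: "no value stored at this node"


class _Node:
    def __init__(self):
        self.children = {}      # edge label (one character) -> child _Node
        self.value = _MISSING   # most internal nodes carry no value


class TrieMap:
    """Associative array backed by a character trie (illustrative sketch).

    No node stores its key: a key is the sequence of edge labels on the path
    from the root, and the root itself stands for the empty string ''."""

    def __init__(self):
        self.root = _Node()

    def __setitem__(self, key, value):
        node = self.root
        for ch in key:
            if ch not in node.children:
                node.children[ch] = _Node()
            node = node.children[ch]
        node.value = value

    def __getitem__(self, key):
        node = self.root
        for ch in key:
            node = node.children.get(ch)
            if node is None:
                raise KeyError(key)
        if node.value is _MISSING:  # a prefix of other keys, but not a key itself
            raise KeyError(key)
        return node.value

    def items(self):
        """Yield (key, value) pairs, rebuilding each key from its node's position."""
        def walk(node, prefix):
            if node.value is not _MISSING:
                yield prefix, node.value
            for ch in sorted(node.children):  # sorted children -> lexicographic keys
                yield from walk(node.children[ch], prefix + ch)
        return walk(self.root, '')
```

After storing `"to"`, `"tea"`, `"ten"`, and `"inn"`, looking up `"te"` raises `KeyError` even though that path exists, because the internal node at the end of it holds no value.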
The term trie comes from "retrieval". Following that etymology, its inventor, Edward Fredkin, pronounces it /ˈtriː/, like "tree".[1][2] However, other authors pronounce it /ˈtraɪ/, like "try".[1][2][3]
In the figure, keys are shown inside the nodes and values below them; each complete English word is associated with an integer. A trie can be viewed as a finite-state automaton, although the symbol on each edge is usually implicit in the order of the branches.
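Taking the automaton view literally, the sketch below compiles (word, value) pairs into an explicit transition table and runs it like a deterministic finite automaton. `build_dfa` and `run` are hypothetical names; state 0 is the root (the empty string), and each accepting state stores the value of the word that ends there.

```python
def build_dfa(words):
    """Compile (word, value) pairs into a trie-shaped DFA (sketch).

    States are integers and state 0 is the start state (the root).
    delta maps (state, symbol) -> next state; accept maps state -> value."""
    delta, accept = {}, {}
    next_state = 1
    for word, value in words:
        state = 0
        for symbol in word:
            if (state, symbol) not in delta:  # new branch: allocate a fresh state
                delta[(state, symbol)] = next_state
                next_state += 1
            state = delta[(state, symbol)]
        accept[state] = value
    return delta, accept


def run(delta, accept, text):
    """Feed `text` symbol by symbol; return the value if it ends in an accepting state."""
    state = 0
    for symbol in text:
        state = delta.get((state, symbol))
        if state is None:  # no transition: the automaton rejects
            return None
    return accept.get(state)
```

Every state except the start state has exactly one incoming transition, so the result is a tree-shaped, acyclic automaton. Merging states that share identical suffixes would give a smaller automaton; this sketch does not attempt that.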
Keys do not have to be stored explicitly in the nodes; the figure labels each node with its full word only to illustrate how a trie works.
Keys in a trie are usually strings, but they can be other structures too. The trie algorithms adapt easily to ordered sequences of other elements, such as digits or shapes. For example, the key in a bitwise trie is a string of bits, which can represent an integer or a memory address.
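A bitwise trie can be sketched in the same spirit for unsigned integers. Assumptions: keys have a fixed width (8 bits by default, purely for illustration), bits are consumed most significant first so that numbers sharing high-order bits share a path, and `BitTrie`, `insert`, and `in_order` are hypothetical names.

```python
class BitTrie:
    """Binary (bitwise) trie over fixed-width unsigned integers (sketch)."""

    def __init__(self, width=8):
        self.width = width          # assumed key width in bits
        self.root = [None, None]    # [child for bit 0, child for bit 1]

    def _bits(self, n):
        # Most significant bit first: numbers with equal high bits share a path.
        for i in range(self.width - 1, -1, -1):
            yield (n >> i) & 1

    def insert(self, n):
        if not 0 <= n < (1 << self.width):
            raise ValueError(f"{n} does not fit in {self.width} bits")
        node = self.root
        for b in self._bits(n):
            if node[b] is None:
                node[b] = [None, None]
            node = node[b]

    def __contains__(self, n):
        if not 0 <= n < (1 << self.width):
            return False
        node = self.root
        for b in self._bits(n):
            node = node[b]
            if node is None:
                return False
        return True

    def in_order(self):
        """Return the stored keys in ascending order (0-branch before 1-branch)."""
        out = []

        def walk(node, depth, acc):
            if depth == self.width:  # a full-length path is a stored key
                out.append(acc)
                return
            for b in (0, 1):
                if node[b] is not None:
                    walk(node[b], depth + 1, (acc << 1) | b)

        walk(self.root, 0, 0)
        return out
```

Because every key has the same length, any path that reaches full depth is a stored key, so no per-node value flag is needed.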
References: http://zh.wikipedia.org/wiki/Trie
```python
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
# * trie, prefix tree (word-level: each edge is one word, not one character)
# * author: yangxudongsuda@gmail.com
# Ported to Python 3: str is already Unicode, so the Python 2 hack
# reload(sys); sys.setdefaultencoding("UTF-8") is no longer needed.


class Node:
    def __init__(self):
        self.value = None   # None means no key ends at this node
        self.children = {}  # word -> child Node


class Trie:
    def __init__(self):
        self.root = Node()

    @staticmethod
    def _split(key, sep):
        # A key is either a list of words or a string of words separated by
        # `sep`; empty pieces (doubled or trailing separators) are skipped.
        elements = key if isinstance(key, list) else key.split(sep)
        return [e for e in elements if e]

    def insert(self, key, value=None, sep=' '):
        node = self.root
        for e in self._split(key, sep):
            if e not in node.children:
                node.children[e] = Node()
            node = node.children[e]
        node.value = value

    def search(self, key, default=None, sep=' '):
        node = self.root
        for e in self._split(key, sep):
            if e not in node.children:
                return default
            node = node.children[e]
        return node.value if node.value is not None else default

    def delete(self, key, sep=' '):
        elements = self._split(key, sep)
        if elements:
            self._delete(elements, self.root, 0)

    def _delete(self, elements, node, i):
        # Returns True if node.children[elements[i]] was removed, so that the
        # caller can prune its own child if it became an empty, valueless leaf.
        e = elements[i]
        if e not in node.children:
            return False
        child = node.children[e]
        if i + 1 == len(elements):
            child.value = None  # the key ends here: clear its value
        elif not self._delete(elements, child, i + 1):
            return False
        if not child.children and child.value is None:
            del node.children[e]
            return True
        return False

    def longest_prefix_item(self, key, sep=' '):
        # (longest leading run of words found as a path, value at its last node)
        results = []
        node = self.root
        for e in self._split(key, sep):
            if e not in node.children:
                break
            results.append(e)
            node = node.children[e]
        return (sep.join(results), node.value)

    def longest_prefix(self, key, sep=' '):
        return self.longest_prefix_item(key, sep)[0]

    def longest_prefix_value(self, key, default=None, sep=' '):
        value = self.longest_prefix_item(key, sep)[1]
        return value if value is not None else default

    def _collect_items(self, node, path, results, sep):
        # Depth-first walk; `path` holds the words from the root to `node`.
        if node.value is not None:
            results.append((sep.join(path), node.value))
        for k, child in node.children.items():
            path.append(k)
            self._collect_items(child, path, results, sep)
            path.pop()
        return results

    def items(self, prefix, sep=' '):
        elements = self._split(prefix, sep)
        node = self.root
        for e in elements:
            if e not in node.children:
                return []
            node = node.children[e]
        return self._collect_items(node, list(elements), [], sep)

    def keys(self, prefix, sep=' '):
        return [key for key, _ in self.items(prefix, sep)]


if __name__ == '__main__':
    trie = Trie()
    trie.insert('Happy Platform', 1)
    trie.insert('Happy Shopping Mall', 2)
    trie.insert('sm', 1)
    trie.insert('sm International Plaza', 2)
    trie.insert('sm City Plaza', 3)
    trie.insert('sm Plaza', 4)
    trie.insert('sm New Life Plaza', 5)
    trie.insert('sm Shopping Plaza', 6)
    trie.insert('soho Shangdu', 3)

    print(trie.search('sm'))
    print(trie.search('sm Plaza'))
    print(trie.search('sm New Oriental Plaza'))
    print(trie.search('shenma'))
    print(trie.search('Happy Platform'))
    print(trie.search('Happy Shopping Mall'))
    print(trie.longest_prefix('soho Plaza'))
    print(trie.longest_prefix('soho Shangdu Plaza'))
    print(trie.longest_prefix_value('soho Shangdu Plaza'))
    print(trie.longest_prefix_value('xx Shangdu Plaza', 90))
    print(trie.longest_prefix_item('soho Shangdu Plaza'))

    print('================== keys ==================')
    print('prefix "sm":', ' | '.join(trie.keys('sm')))
    print('================== items ==================')
    print('prefix "sm":', trie.items('sm'))

    print('================== delete ==================')
    trie.delete('sm Plaza')
    print(trie.search('sm Plaza'))
```
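Running the script prints the following. This output was produced under Python 2; the order of the keys and items entries follows dictionary iteration order, so it may differ under other Python versions (Python 3.7+ iterates in insertion order).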
```
1
4
None
None
1
2
soho
soho Shangdu
3
90
('soho Shangdu', 3)
================== keys ==================
prefix "sm": sm | sm New Life Plaza | sm City Plaza | sm Plaza | sm Shopping Plaza | sm International Plaza
================== items ==================
prefix "sm": [('sm', 1), ('sm New Life Plaza', 5), ('sm City Plaza', 3), ('sm Plaza', 4), ('sm Shopping Plaza', 6), ('sm International Plaza', 2)]
================== delete ==================
None
```