A Python prefix tree (trie) with word-level granularity, supporting Chinese

Source: Internet
Author: User

A trie, also known as a dictionary tree or prefix tree, can be used for predictive text and autocompletion. It can also be used for frequency statistics: as each word is inserted into the trie, its frequency is updated, or added if the word is new.
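A minimal sketch of both uses, under stated assumptions: phrases are already segmented into space-separated words, and the class name `FreqTrie` and its `add`/`complete` methods are hypothetical, not part of the implementation in this article. Each insertion increments a counter on the node where the phrase ends, and completion collects every counted phrase below the prefix node, most frequent first.

```python
class FreqTrie:
    """Word-level trie that counts how often each key is inserted (sketch)."""

    def __init__(self):
        self.children = {}  # word -> FreqTrie
        self.count = 0      # number of insertions of the key ending here

    def add(self, phrase):
        # One node per word; create missing nodes, then bump the end count.
        node = self
        for word in phrase.split():
            if word not in node.children:
                node.children[word] = FreqTrie()
            node = node.children[word]
        node.count += 1

    def complete(self, prefix, limit=5):
        # Descend to the prefix node; an unknown word means no completions.
        node, words = self, prefix.split()
        for word in words:
            if word not in node.children:
                return []
            node = node.children[word]
        # Collect every counted key below the prefix node.
        found, stack = [], [(node, words)]
        while stack:
            cur, path = stack.pop()
            if cur.count:
                found.append((" ".join(path), cur.count))
            for word, child in cur.children.items():
                stack.append((child, path + [word]))
        # Most frequent first; ties broken alphabetically.
        found.sort(key=lambda item: (-item[1], item[0]))
        return found[:limit]


trie = FreqTrie()
for phrase in ["SM Plaza", "SM Plaza", "SM City Plaza", "SM Plaza", "Soho Shangdu"]:
    trie.add(phrase)
print(trie.complete("SM"))  # [('SM Plaza', 3), ('SM City Plaza', 1)]
```

Keeping the counter on the end node means the same structure answers both "how often was this phrase seen?" and "what phrases start with this prefix?" with one walk from the root.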

In computer science, a trie, also called a prefix tree or dictionary tree, is an ordered tree data structure that stores an associative array whose keys are usually strings. Unlike a binary search tree, no node stores its key directly; instead, a node's position in the tree determines the key it corresponds to. All descendants of a node share a common prefix, namely the string associated with that node, and the root corresponds to the empty string.
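To make the "key is determined by position" point concrete, here is an illustrative character-level trie (the class `CharTrie` is hypothetical; the implementation later in this article uses whole words as edges instead of single characters). No node stores its key: `keys` rebuilds each key from the edge labels on the path from the root, which starts as the empty string, so every key returned below a node shares that node's prefix.

```python
class CharTrie:
    """Character-level trie: nodes hold no key, only outgoing edges."""

    def __init__(self):
        self.children = {}   # character -> CharTrie
        self.is_key = False  # True if the path to this node spells a key

    def insert(self, key):
        node = self
        for ch in key:
            if ch not in node.children:
                node.children[ch] = CharTrie()
            node = node.children[ch]
        node.is_key = True

    def keys(self, prefix=""):
        # Walk down to the node for `prefix`; a missing edge means no keys.
        node = self
        for ch in prefix:
            if ch not in node.children:
                return []
            node = node.children[ch]
        out = []

        def walk(n, path):
            # The key of a node is the path that leads to it, rebuilt here.
            if n.is_key:
                out.append(path)
            for ch, child in sorted(n.children.items()):
                walk(child, path + ch)

        walk(node, prefix)
        return out


t = CharTrie()
for word in ["tea", "ten", "to", "inn"]:
    t.insert(word)
print(t.keys("te"))  # ['tea', 'ten']
print(t.keys())      # ['inn', 'tea', 'ten', 'to']
```

For Chinese text, a character-level trie creates one edge per hanzi, while a word-level trie treats each segmented word as a single edge, which keeps paths shorter and matches on whole words only.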

In general, not every node has an associated value: only leaf nodes and those internal nodes that correspond to a stored key carry a value.
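A small sketch of that distinction, assuming a word-level trie in which a dedicated sentinel (the hypothetical `_NO_VALUE`) marks nodes that exist only as part of a longer key. Using a sentinel rather than `None` lets a key such as `SM` store `None` as a real value, which is the same reason the implementation below uses its `NULL` class.

```python
_NO_VALUE = object()  # illustrative sentinel: no key ends at this node


class Node:
    def __init__(self):
        self.value = _NO_VALUE
        self.children = {}  # word -> Node


def insert(root, words, value=None):
    node = root
    for w in words:
        node = node.children.setdefault(w, Node())
    node.value = value


def has_key(root, words):
    # A path can exist without being a key: only nodes whose value was set count.
    node = root
    for w in words:
        node = node.children.get(w)
        if node is None:
            return False
    return node.value is not _NO_VALUE


root = Node()
insert(root, ["Happy", "Platform"], 1)
insert(root, ["SM"])  # a key whose stored value is None
print(has_key(root, ["Happy"]))              # False: internal node only
print(has_key(root, ["Happy", "Platform"]))  # True
print(has_key(root, ["SM"]))                 # True, even though the value is None
```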

Reference: http://zh.wikipedia.org/wiki/Trie


#!/usr/bin/env python3
# -*- coding: utf-8 -*-
# * trie, prefix tree, can be used as a dict
# * keys are word sequences, so each edge is one (segmented) word
# * Author: [email protected]


class NULL:
    """Singleton sentinel for "no value"; pickled by reference, so identity survives pickling."""


class Node:
    def __init__(self, value=NULL):
        self.value = value   # NULL: no key ends at this node
        self.children = {}   # word -> Node


class Trie:
    def __init__(self):
        self.root = Node()

    @staticmethod
    def _split(key, sep):
        # A key is a list of words or a string of words separated by `sep`;
        # empty words (e.g. from repeated separators) are skipped everywhere.
        elements = key if isinstance(key, (list, tuple)) else key.split(sep)
        return [e for e in elements if e]

    def _find(self, elements):
        # Node reached by following `elements` from the root, or None.
        node = self.root
        for e in elements:
            node = node.children.get(e)
            if node is None:
                return None
        return node

    def _matches(self, key, sep):
        # Split `key` and list (depth, value) for every stored key that is a
        # prefix of it, shortest first; depth 0 is the empty key at the root.
        elements = self._split(key, sep)
        node = self.root
        hits = [(0, node.value)] if node.value is not NULL else []
        for depth, e in enumerate(elements, 1):
            node = node.children.get(e)
            if node is None:
                break
            if node.value is not NULL:
                hits.append((depth, node.value))
        return elements, hits

    @staticmethod
    def _no_match(default):
        if default is NULL:
            raise Exception("No item matches any prefix of the given key!")
        return default

    def insert(self, key, value=None, sep=' '):
        # key is a word sequence separated by `sep` (or a list of words)
        node = self.root
        for e in self._split(key, sep):
            if e not in node.children:
                node.children[e] = Node()
            node = node.children[e]
        node.value = value

    def get(self, key, default=None, sep=' '):
        node = self._find(self._split(key, sep))
        if node is None or node.value is NULL:
            return default
        return node.value

    def delete(self, key, sep=' '):
        # Returns True if the key existed and was removed, else False.
        elements = self._split(key, sep)
        return bool(elements) and self._delete(self.root, elements, 0)

    def _delete(self, node, elements, i):
        word = elements[i]
        child = node.children.get(word)
        if child is None:
            return False
        if i + 1 == len(elements):
            if child.value is NULL:
                return False  # the path exists, but it is not a stored key
            child.value = NULL
        elif not self._delete(child, elements, i + 1):
            return False
        # Prune the child only if it now holds neither a value nor children,
        # so shorter keys on the same path (e.g. "SM") survive.
        if child.value is NULL and not child.children:
            del node.children[word]
        return True

    def shortest_prefix(self, key, default=NULL, sep=' '):
        elements, hits = self._matches(key, sep)
        if not hits:
            return self._no_match(default)
        return sep.join(elements[:hits[0][0]])

    def longest_prefix(self, key, default=NULL, sep=' '):
        # Uses the deepest node that holds a value, not merely the deepest
        # node reached, so "Happy Platform Gourmet Food" still matches
        # "Happy Platform".
        elements, hits = self._matches(key, sep)
        if not hits:
            return self._no_match(default)
        return sep.join(elements[:hits[-1][0]])

    def longest_prefix_value(self, key, default=NULL, sep=' '):
        elements, hits = self._matches(key, sep)
        if not hits:
            return self._no_match(default)
        return hits[-1][1]

    def longest_prefix_item(self, key, default=NULL, sep=' '):
        elements, hits = self._matches(key, sep)
        if not hits:
            return self._no_match(default)
        depth, value = hits[-1]
        return sep.join(elements[:depth]), value

    def _collect_items(self, node, path, results, sep):
        # Depth-first walk; `path` holds the words that lead to `node`.
        if node.value is not NULL:
            results.append((sep.join(path), node.value))
        for word, child in node.children.items():
            path.append(word)
            self._collect_items(child, path, results, sep)
            path.pop()
        return results

    def items(self, prefix, sep=' '):
        elements = self._split(prefix, sep)
        node = self._find(elements)
        if node is None:
            return []
        return self._collect_items(node, elements, [], sep)

    def keys(self, prefix, sep=' '):
        return [key for key, _ in self.items(prefix, sep)]


if __name__ == '__main__':
    trie = Trie()
    trie.insert('Happy Platform', 1)
    trie.insert('Happy Platform xx', 10)
    trie.insert('Happy Platform xx yy', 11)
    trie.insert('Happy Platform Gourmet Shopping Plaza', 2)
    trie.insert('SM')
    trie.insert('SM International')
    trie.insert('SM International Plaza', 2)
    trie.insert('SM City Plaza', 3)
    trie.insert('SM Plaza', 4)
    trie.insert('SM New Life Plaza', 5)
    trie.insert('SM Shopping Plaza', 6)
    trie.insert('Soho Shangdu', 3)

    print(trie.get('SM'))
    print(trie.longest_prefix([], default='Empty list'))
    print(trie.longest_prefix('SM'))
    print(trie.shortest_prefix('Happy Platform'))
    print(trie.shortest_prefix('Happy Platform xx'))
    print(trie.shortest_prefix('SM'))
    print(trie.longest_prefix('SM xx', sep='&', default=None))
    print('SM Plaza --', trie.get('SM Plaza'))
    print(trie.get('SM Plaza'))
    print(trie.get('What'))
    print(trie.get('Happy Platform'))
    print(trie.get('Happy Platform Gourmet Shopping Plaza'))
    print(trie.longest_prefix('Soho Plaza', 'Default'))
    print(trie.longest_prefix('Soho Shangdu Plaza'))
    print(trie.longest_prefix_value('Soho Shangdu Plaza'))
    print(trie.longest_prefix_value('xx Shangdu Plaza', 90))
    print(trie.longest_prefix_value('xx Shangdu Plaza', 'No prefix'))
    print(trie.longest_prefix_item('Soho Shangdu Plaza'))
    print('============== keys =================')
    print('prefix "SM":', ' | '.join(trie.keys('SM')))
    print('============== items =================')
    print('prefix "SM":', trie.items('SM'))
    print('================= delete =====================')
    print(trie.delete('SM Plaza'))
    print(trie.get('SM Plaza'))
    print(trie.delete('SM International'))
    print(trie.get('SM International'))
    print(trie.delete('SM xx'))
    print(trie.delete('xx'))
    print('====== no item matches any prefix of given key ========')
    print(trie.longest_prefix_value('happy'))
    print(trie.longest_prefix_value('Soho xx'))

Execution Result:

None
Empty list
SM
Happy Platform
Happy Platform
SM
None
SM Plaza -- 4
4
None
1
2
Default
Soho Shangdu
3
90
No prefix
('Soho Shangdu', 3)
============== keys =================
prefix "SM": SM | SM International | SM International Plaza | SM City Plaza | SM Plaza | SM New Life Plaza | SM Shopping Plaza
============== items =================
prefix "SM": [('SM', None), ('SM International', None), ('SM International Plaza', 2), ('SM City Plaza', 3), ('SM Plaza', 4), ('SM New Life Plaza', 5), ('SM Shopping Plaza', 6)]
================= delete =====================
True
None
True
None
False
False
====== no item matches any prefix of given key ========
Traceback (most recent call last):
  ...
Exception: No item matches any prefix of the given key!
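The `longest_prefix*` methods are the building block for greedy longest-match lookups. As a self-contained sketch (it does not reuse the `Trie` class; `PhraseTrie` and `tag_longest` are hypothetical names, and the input is assumed to be an already segmented token list), the scan below takes the longest stored phrase starting at each position and otherwise skips one token, which is one way a dictionary of multi-word place names could be matched in segmented text.

```python
class PhraseTrie:
    """Minimal word-level trie: each edge is one token (sketch)."""

    def __init__(self):
        self.children = {}  # token -> PhraseTrie
        self.value = None
        self.is_key = False

    def insert(self, tokens, value):
        node = self
        for tok in tokens:
            node = node.children.setdefault(tok, PhraseTrie())
        node.is_key, node.value = True, value

    def longest_match(self, tokens, start):
        # Length and value of the longest stored phrase beginning at
        # tokens[start]; (0, None) if no phrase starts there.
        node, best = self, (0, None)
        for i in range(start, len(tokens)):
            node = node.children.get(tokens[i])
            if node is None:
                break
            if node.is_key:
                best = (i - start + 1, node.value)
        return best


def tag_longest(trie, tokens):
    # Greedy left-to-right scan: take the longest match, else skip one token.
    found, i = [], 0
    while i < len(tokens):
        length, value = trie.longest_match(tokens, i)
        if length:
            found.append((" ".join(tokens[i:i + length]), value))
            i += length
        else:
            i += 1
    return found


places = PhraseTrie()
places.insert(["SM"], "mall")
places.insert(["SM", "City", "Plaza"], "plaza")
places.insert(["Soho", "Shangdu"], "office")
text = "meet at SM City Plaza then Soho Shangdu or SM".split()
print(tag_longest(places, text))
# [('SM City Plaza', 'plaza'), ('Soho Shangdu', 'office'), ('SM', 'mall')]
```

Greedy longest match is simple and runs in a single pass, but it commits to the first long match it sees, so overlapping entries can occasionally hide a better split.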

