A word-granularity prefix tree (trie) in Python with support for Chinese

Source: Internet
Author: User

A trie, also known as a dictionary tree or prefix tree, can be used for predictive text and autocompletion. It can also be used to count frequencies: update the frequency of a word when it is inserted, or add it if it is not yet in the trie.
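As a minimal sketch of frequency counting and autocompletion, the following assumes space-separated, pre-segmented keys (the same convention the implementation in this article uses). The names `FreqNode`, `FreqTrie`, `add`, `count`, and `complete` are hypothetical. Each insertion bumps a counter on the node where the key ends, and completion lists the keys under a prefix, most frequent first.

```python
class FreqNode:
    def __init__(self):
        self.children = {}  # word -> FreqNode
        self.count = 0      # how many times the key ending here was inserted


class FreqTrie:
    def __init__(self):
        self.root = FreqNode()

    def add(self, key, sep=' '):
        """Insert `key`, creating nodes as needed, and bump its frequency."""
        node = self.root
        for word in key.split(sep):
            if not word:            # skip empty tokens from repeated separators
                continue
            node = node.children.setdefault(word, FreqNode())
        node.count += 1
        return node.count

    def count(self, key, sep=' '):
        """Return how often `key` was inserted (0 if never)."""
        node = self.root
        for word in key.split(sep):
            if not word:
                continue
            node = node.children.get(word)
            if node is None:
                return 0
        return node.count

    def complete(self, prefix, sep=' '):
        """Return (key, count) pairs under `prefix`, most frequent first."""
        path = [w for w in prefix.split(sep) if w]
        node = self.root
        for word in path:
            node = node.children.get(word)
            if node is None:
                return []
        found = []
        stack = [(node, path)]      # iterative depth-first walk
        while stack:
            current, words = stack.pop()
            if current.count:
                found.append((sep.join(words), current.count))
            for word, child in current.children.items():
                stack.append((child, words + [word]))
        found.sort(key=lambda kv: (-kv[1], kv[0]))
        return found


trie = FreqTrie()
for query in ['SM Square', 'SM Square', 'SM City Square']:
    trie.add(query)
print(trie.count('SM Square'))  # 2
print(trie.complete('SM'))      # [('SM Square', 2), ('SM City Square', 1)]
```

Sorting by descending count and then alphabetically keeps the suggestion order deterministic.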

In computer science, a trie, also known as a prefix tree or dictionary tree, is an ordered tree data structure that stores an associative array whose keys are usually strings. Unlike a binary search tree, no node stores its key directly; instead, the key is determined by the node's position in the tree. All descendants of a node share a common prefix, namely the string associated with that node, and the root corresponds to the empty string. In general, not every node has a value: only leaf nodes and some internal nodes that correspond to keys carry associated values.
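A minimal character-level sketch of these properties, assuming Python dicts for the child links (`CharTrie`, `get`, and `has_prefix` are hypothetical names): no node stores its key, the root stands for the empty string, and an internal node such as the one for "te" carries no value.

```python
class CharTrie:
    def __init__(self):
        self.children = {}   # character -> CharTrie
        self.value = None    # set only on nodes where a key ends

    def insert(self, key, value):
        node = self
        for ch in key:
            node = node.children.setdefault(ch, CharTrie())
        node.value = value

    def get(self, key, default=None):
        node = self
        for ch in key:
            node = node.children.get(ch)
            if node is None:
                return default
        return default if node.value is None else node.value

    def has_prefix(self, prefix):
        """True if `prefix` is a path in the trie (the root is the empty prefix)."""
        node = self
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return False
        return True


trie = CharTrie()
for word, value in [('tea', 3), ('ten', 12), ('in', 5), ('inn', 9)]:
    trie.insert(word, value)
print(trie.get('ten'))  # 12
print(trie.get('te'))   # None: shared internal node with no value
```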

The term trie comes from "retrieval". According to its etymology, the inventor of the trie, Edward Fredkin, pronounces it /ˈtriː/ ("tree").[1][2] However, other authors pronounce it /ˈtraɪ/ ("try").[1][2][3]

In the illustration, each node is labeled with its key and the value is written below the node. Each complete English word maps to a specific integer. A trie can be viewed as a deterministic finite automaton, although the symbol on each edge is usually implicit in the order of the branches.
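The automaton view can be sketched as follows, assuming states are numbered with integers and state 0 is the root; `build_dfa` and `accepts` are hypothetical helpers. Every trie node becomes a state, every edge a transition, and the nodes where keys end become accepting states.

```python
def build_dfa(keys):
    """Return (transitions, accepting) for a trie built from `keys`; state 0 is the root."""
    transitions = {}          # (state, symbol) -> next state
    accepting = set()         # states where a stored key ends
    next_state = 1
    for key in keys:
        state = 0
        for symbol in key:
            if (state, symbol) not in transitions:
                transitions[(state, symbol)] = next_state
                next_state += 1
            state = transitions[(state, symbol)]
        accepting.add(state)
    return transitions, accepting


def accepts(transitions, accepting, word):
    """Run the automaton: exactly one move per symbol, reject on a missing edge."""
    state = 0
    for symbol in word:
        state = transitions.get((state, symbol))
        if state is None:
            return False
    return state in accepting


transitions, accepting = build_dfa(['to', 'tea', 'ten', 'in', 'inn'])
print(accepts(transitions, accepting, 'tea'))  # True
print(accepts(transitions, accepting, 'te'))   # False: a prefix, not an accepting state
```

Because each (state, symbol) pair has at most one successor, the walk never backtracks, which is what makes the trie deterministic.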

Keys do not need to be stored explicitly in the nodes; the illustration shows complete words only to demonstrate how a trie works.
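To illustrate that keys need not be stored, this sketch keeps only children and an optional value in each node and rebuilds every key from the edge labels on its path. The names `Node`, `insert`, and `iter_items` are illustrative. Visiting children in sorted order also yields the keys in lexicographic order.

```python
class Node:
    __slots__ = ('children', 'value')

    def __init__(self):
        self.children = {}   # character -> Node; no key is stored anywhere
        self.value = None


def insert(root, key, value):
    node = root
    for ch in key:
        node = node.children.setdefault(ch, Node())
    node.value = value


def iter_items(node, prefix=''):
    """Yield (key, value) pairs in lexicographic order; keys are rebuilt from the path."""
    if node.value is not None:
        yield prefix, node.value
    for ch in sorted(node.children):
        yield from iter_items(node.children[ch], prefix + ch)


root = Node()
for key, value in [('ten', 12), ('tea', 3), ('A', 15), ('inn', 9), ('in', 5), ('to', 7), ('i', 11)]:
    insert(root, key, value)
print([key for key, _ in iter_items(root)])  # ['A', 'i', 'in', 'inn', 'tea', 'ten', 'to']
```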

Keys in a trie are usually strings, but they can be other structures as well. The trie algorithm is easily adapted to ordered sequences of other elements, such as permutations of digits or shapes. For example, the key in a bitwise trie is a string of bits, which can represent an integer or a memory address.
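A sketch of a bitwise trie, assuming keys are fixed-width unsigned integers read most-significant bit first. The 8-bit width and the names `BitNode`, `BitTrie`, and `longest_match` are hypothetical. Storing bit prefixes of different lengths and returning the value of the longest one that matches is the idea behind longest-prefix lookups such as IP routing tables.

```python
WIDTH = 8  # assumed key width in bits, chosen only for the example


class BitNode:
    __slots__ = ('child', 'value')

    def __init__(self):
        self.child = [None, None]   # child[0] for bit 0, child[1] for bit 1
        self.value = None


class BitTrie:
    def __init__(self, width=WIDTH):
        self.width = width
        self.root = BitNode()

    def _bits(self, number, length):
        # The top `length` bits of `number`, most significant first
        for i in range(self.width - 1, self.width - 1 - length, -1):
            yield (number >> i) & 1

    def insert(self, prefix, length, value):
        """Store `value` under the first `length` bits of `prefix`."""
        node = self.root
        for bit in self._bits(prefix, length):
            if node.child[bit] is None:
                node.child[bit] = BitNode()
            node = node.child[bit]
        node.value = value

    def longest_match(self, number, default=None):
        """Value of the longest stored bit prefix of `number`."""
        node, best = self.root, self.root.value
        for bit in self._bits(number, self.width):
            node = node.child[bit]
            if node is None:
                break
            if node.value is not None:
                best = node.value
        return default if best is None else best


table = BitTrie()
table.insert(0b10100000, 3, 'A')  # prefix 101
table.insert(0b10110000, 4, 'B')  # prefix 1011
print(table.longest_match(0b10111111))  # B
print(table.longest_match(0b10100001))  # A
```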

Reference: http://zh.wikipedia.org/wiki/Trie

```python
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
# Trie (prefix tree) with word-level granularity
# Author: [email protected]
# Python 3 strings are Unicode, so Chinese words need no special encoding setup
# (the Python 2 reload(sys)/sys.setdefaultencoding hack is not needed).


class Node:
    def __init__(self):
        self.value = None   # value of the key ending here; None for a pure prefix
        self.children = {}  # word -> Node


class Trie:
    def __init__(self):
        self.root = Node()

    @staticmethod
    def _split(key, sep):
        # A key is a word sequence separated by `sep`; empty tokens are dropped
        return [e for e in key.split(sep) if e]

    def insert(self, key, value=None, sep=' '):
        node = self.root
        for e in self._split(key, sep):
            if e not in node.children:
                node.children[e] = Node()
            node = node.children[e]
        node.value = value

    def search(self, key, default=None, sep=' '):
        node = self.root
        for e in self._split(key, sep):
            if e not in node.children:
                return default
            node = node.children[e]
        # A node reached only as a prefix of longer keys has no value
        return default if node.value is None else node.value

    def delete(self, key, sep=' '):
        elements = self._split(key, sep)
        if elements:
            self._delete(elements, self.root, 0)

    def _delete(self, elements, node, i):
        # Returns True if node.children[elements[i]] was removed,
        # so the caller can prune nodes that became empty
        e = elements[i]
        child = node.children.get(e)
        if child is None:
            return False                  # key not in the trie
        if i == len(elements) - 1:
            child.value = None            # clear the value even if longer keys pass through
        elif not self._delete(elements, child, i + 1):
            return False
        if not child.children and child.value is None:
            del node.children[e]
            return True
        return False

    def longest_prefix(self, key, sep=' '):
        # Longest leading word sequence of `key` that is a path in the trie
        results = []
        node = self.root
        for e in self._split(key, sep):
            if e not in node.children:
                break
            results.append(e)
            node = node.children[e]
        return sep.join(results)

    def longest_prefix_value(self, key, default=None, sep=' '):
        # Value of the longest stored key that is a prefix of `key`
        value = default
        node = self.root
        for e in self._split(key, sep):
            if e not in node.children:
                break
            node = node.children[e]
            if node.value is not None:
                value = node.value
        return value

    def longest_prefix_item(self, key, sep=' '):
        # (key, value) of the longest stored key that is a prefix of `key`
        item = ('', self.root.value)
        path = []
        node = self.root
        for e in self._split(key, sep):
            if e not in node.children:
                break
            path.append(e)
            node = node.children[e]
            if node.value is not None:
                item = (sep.join(path), node.value)
        return item

    def _collect_items(self, node, path, results, sep):
        # Depth-first walk that records every stored key below `node`
        if node.value is not None:
            results.append((sep.join(path), node.value))
        for word, child in node.children.items():
            path.append(word)
            self._collect_items(child, path, results, sep)
            path.pop()
        return results

    def items(self, prefix, sep=' '):
        # All (key, value) pairs whose key starts with the word sequence `prefix`
        path = self._split(prefix, sep)
        node = self.root
        for e in path:
            if e not in node.children:
                return []
            node = node.children[e]
        return self._collect_items(node, path, [], sep)

    def keys(self, prefix, sep=' '):
        return [key for key, _ in self.items(prefix, sep)]


if __name__ == '__main__':
    trie = Trie()
    trie.insert('Happy Platform', 1)
    trie.insert('Happy Platform Gourmet Shopping Plaza', 2)
    trie.insert('SM', 1)
    trie.insert('SM International Plaza', 2)
    trie.insert('SM City Square', 3)
    trie.insert('SM Square', 4)
    trie.insert('SM New Life Square', 5)
    trie.insert('SM Shopping Plaza', 6)
    trie.insert('Soho Shangdu', 3)

    print(trie.search('SM'))
    print(trie.search('SM Square'))
    print(trie.search('SM New Oriental Plaza'))
    print(trie.search('Divine Horse'))
    print(trie.search('Happy Platform'))
    print(trie.search('Happy Platform Gourmet Shopping Plaza'))
    print(trie.longest_prefix('Soho Plaza'))
    print(trie.longest_prefix('Soho Shangdu Plaza'))
    print(trie.longest_prefix_value('Soho Shangdu Plaza'))
    print(trie.longest_prefix_value('xx Shangdu Plaza', 90))
    print(trie.longest_prefix_item('Soho Shangdu Plaza'))
    print('============== keys =================')
    print('prefix "SM": ' + ' | '.join(trie.keys('SM')))
    print('============== items =================')
    print('prefix "SM":', trie.items('SM'))
    print('================= delete =====================')
    trie.delete('SM Square')
    print(trie.search('SM Square'))
```

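Running the script produces the following output. It was recorded under Python 2; the order of the keys and items lines follows dict iteration order, which is arbitrary in Python 2 and follows insertion order in Python 3.7 and later: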

```
1
4
None
None
1
2
Soho
Soho Shangdu
3
90
('Soho Shangdu', 3)
============== keys =================
prefix "SM": SM | SM New Life Square | SM City Square | SM Square | SM Shopping Plaza | SM International Plaza
============== items =================
prefix "SM": [('SM', 1), ('SM New Life Square', 5), ('SM City Square', 3), ('SM Square', 4), ('SM Shopping Plaza', 6), ('SM International Plaza', 2)]
================= delete =====================
None
```

