Implementing dictionary sorting with a trie in Python

Source: Internet
Author: User
Tags: data structures, extend, lowercase, ord
Most languages provide an API for dictionary (lexicographic) sorting, which is needed, for example, when integrating with a microblogging public platform. There are many algorithms for dictionary sorting. The most obvious approach is plain string comparison, but it is cumbersome to implement and does not perform very well. The trie is a very common tree structure with a wide range of uses, such as string retrieval, Chinese word segmentation, finding the longest common prefix of strings, and dictionary sorting; it also turns up in input methods.


What is a trie

A trie, often called a dictionary tree, word lookup tree, or prefix tree, is a multi-way tree structure for fast retrieval. A trie over the decimal digits, for example, is a 10-way tree:

Likewise, a trie over the lowercase (or the uppercase) English letters is a 26-way tree. As the figure above shows, the root node of a trie stores no data; all data is stored in its descendant nodes. Given the strings go, golang, php, python, and perl, the trie can be built as shown in the following figure:
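As a rough illustration of how these five strings share nodes, the structure can be sketched with nested dictionaries. This is only a sketch and not the implementation used later in the article: build_trie and count_nodes are hypothetical helper names, and the all-lowercase word list is an assumption.

```python
def build_trie(words):
    """Build a trie as nested dicts: each key is a single character."""
    root = {}  # the root itself holds no character
    for word in words:
        node = root
        for ch in word:
            # Reuse the child if this prefix already exists, else create it.
            node = node.setdefault(ch, {})
    return root


def count_nodes(node):
    """Count the character nodes below (and excluding) the given node."""
    return sum(1 + count_nodes(child) for child in node.values())


words = ['go', 'golang', 'php', 'python', 'perl']
trie = build_trie(words)
print(sorted(trie.keys()))       # characters on the first level
print(sorted(trie['p'].keys()))  # characters that follow the shared 'p'
print(count_nodes(trie))         # fewer nodes than total characters
```

The five words contain 21 characters in total, but the trie needs only 17 nodes, because the prefix go is stored once for both go and golang, and the leading p is stored once for php, python, and perl.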

Let's analyze the figure above. Apart from the root, each node stores exactly one character. go and golang share the prefix go, while php, perl, and python share only the prefix p. To implement dictionary sorting, the characters stored at each level are kept in dictionary order (this ties in with how the tree is traversed). First, consider how single characters are ordered. This article considers only lowercase letters; other character sets work the same way. 'a' comes before 'b', and the ASCII code of 'a' is less than that of 'b', so the dictionary order of two characters can be obtained by subtracting their ASCII codes. Python also has a built-in API for dictionary sorting, for example:

The code is as follows:

#!/usr/bin/env python
# coding: utf-8

if __name__ == '__main__':
    arr = [c for c in 'python']
    arr.sort()  # in-place sort; single characters are compared by code point
    print(arr)
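The ASCII subtraction idea can also be written out explicitly. The following is a minimal sketch, assuming Python 3; compare_chars is a hypothetical helper name, and functools.cmp_to_key is used only to plug the subtraction into the built-in sorted.

```python
from functools import cmp_to_key


def compare_chars(a, b):
    """Negative if a sorts before b, zero if equal, positive otherwise."""
    return ord(a) - ord(b)


chars = list('python')
by_subtraction = sorted(chars, key=cmp_to_key(compare_chars))
print(by_subtraction)
```

For single characters this produces the same order as the plain sort above; the explicit comparator only becomes useful when a different ordering rule is wanted.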

You can also use the bitmap I introduced in an earlier article, "Python: Implementing bitmap data structures". The implementation is as follows:

The code is as follows:

#!/usr/bin/env python
# coding: utf-8

class Bitmap(object):
    def __init__(self, max):
        # The values 0..max must all fit, so size the array for max + 1 bits
        self.size = self.calcelemindex(max + 1, True)
        self.array = [0 for i in range(self.size)]

    def calcelemindex(self, num, up=False):
        '''If up is True, round up; otherwise round down.'''
        if up:
            return (num + 31 - 1) // 31  # round up
        return num // 31

    def calcbitindex(self, num):
        return num % 31

    def set(self, num):
        elemindex = self.calcelemindex(num)
        byteindex = self.calcbitindex(num)
        elem = self.array[elemindex]
        self.array[elemindex] = elem | (1 << byteindex)

    def clean(self, i):
        elemindex = self.calcelemindex(i)
        byteindex = self.calcbitindex(i)
        elem = self.array[elemindex]
        self.array[elemindex] = elem & (~(1 << byteindex))

    def test(self, i):
        elemindex = self.calcelemindex(i)
        byteindex = self.calcbitindex(i)
        if self.array[elemindex] & (1 << byteindex):
            return True
        return False

if __name__ == '__main__':
    MAX = ord('z')
    shuffle_array = [c for c in 'python']
    result = []
    bitmap = Bitmap(MAX)
    for c in shuffle_array:
        bitmap.set(ord(c))

    # Scanning the bits in ascending order yields the characters sorted
    for i in range(MAX + 1):
        if bitmap.test(i):
            result.append(chr(i))

    print('original array: %s' % shuffle_array)
    print('sorted array: %s' % result)

Bitmap sorting cannot handle duplicate characters. There are also many mature algorithms, such as insertion sort, Shell sort, bubble sort, and heap sort, that can order characters by ASCII subtraction. For simplicity, this article uses Python's built-in sorted function to put single characters into dictionary order. Readers can also implement the single-character sort themselves, which makes it possible to customize how strings are ordered.
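The duplicate limitation is easy to see in a small sketch. This is not the Bitmap class above: bitmap_sort uses a single Python integer as the bit set, and counting_sort is a different technique (a counting sort), shown only as one possible way to keep duplicates. Both function names are hypothetical, and ASCII input is assumed.

```python
def bitmap_sort(chars):
    """Sort characters by setting one bit per character code."""
    bits = 0
    for ch in chars:
        bits |= 1 << ord(ch)  # setting the same bit twice changes nothing
    return [chr(i) for i in range(bits.bit_length()) if (bits >> i) & 1]


def counting_sort(chars):
    """Counting-sort variant: keeps duplicates by counting occurrences."""
    counts = [0] * 128  # assumption: ASCII input only
    for ch in chars:
        counts[ord(ch)] += 1
    return [chr(i) for i in range(128) for _ in range(counts[i])]


print(bitmap_sort('google'))    # duplicates collapse
print(counting_sort('google'))  # duplicates preserved
```

A bit can only record that a character is present, not how many times, which is why the repeated g and o in google appear once in the first result.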

Implementation approach

The implementation consists of two classes: the Trie class and the Node class. The Node class represents a node in the trie, and the Trie class organizes the nodes into a tree. Let's look at the Node class first:

The code is as follows:

#!/usr/bin/env python
# coding: utf-8

class Node(object):
    def __init__(self, c=None, word=None):
        self.c = c        # the single character stored by this node
        self.word = word  # the word stored by this node
        self.childs = []  # child nodes of this node

Node has three member variables. c is the character stored on the node. word is a complete word, which in this article means a string. childs holds all child nodes of this node. Since c is already stored on every node, what is the point of storing word, and on which node should it live? Take the earlier example again: go and golang share the prefix go. For string search this is not a problem, because the caller supplies the original string and the search simply follows its path through the trie. For sorting, however, no input is supplied, so there is no way to know where a word ends; the word member of Node acts as the word boundary. It is stored on the last node of the word, as shown in the figure:

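If the tree is not used for searching, the c member of the Node class is not needed for the sorted output itself, which reads only word, so in principle it could be left out. Note, however, that the Trie class in this article does read c, both in find and as the sort key, so dropping it would require keying the child nodes some other way.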
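To make the word-boundary idea concrete, here is a small sketch that hand-builds the single path for golang and marks both boundaries on it. The Node class is repeated so the snippet runs on its own; the path list and the way the nodes are wired up are illustrative assumptions, not part of the article's Trie class.

```python
class Node(object):
    def __init__(self, c=None, word=None):
        self.c = c
        self.word = word
        self.childs = []


# Hand-build the single path g -> o -> l -> a -> n -> g under the root.
root = Node()
node = root
path = []
for ch in 'golang':
    child = Node(ch)
    node.childs.append(child)
    path.append(child)
    node = child

# Mark the two word boundaries on that shared path.
path[1].word = 'go'      # the 'o' node ends the word "go"
path[5].word = 'golang'  # the final 'g' node ends the word "golang"

boundaries = [(n.c, n.word) for n in path if n.word]
print(boundaries)
```

Without the word attribute, a traversal of this path would have no way to tell that go is a stored word while g, gol, gola, and golan are only prefixes.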

Next we look at the definition of the Trie class:

The code is as follows:

#!/usr/bin/env python
# coding: utf-8

'''Dictionary sort of a string array using a trie'''

class Trie(object):
    def __init__(self):
        self.root = Node()  # reference to the root of the trie

    def add(self, word):
        '''Add a string'''
        node = self.root
        for c in word:
            pos = self.find(node, c)
            if pos < 0:
                node.childs.append(Node(c))
                # For simplicity, sort directly with Python's built-in sorted.
                # pos is stale after sorting, so call find again to get the real position.
                # A custom ordering for single characters would allow string arrays
                # to be sorted by any rule.
                node.childs = sorted(node.childs, key=lambda child: child.c)
                pos = self.find(node, c)
            node = node.childs[pos]
        node.word = word  # mark the word boundary on the last node

    def preorder(self, node):
        '''Preorder output'''
        results = []
        if node.word:
            results.append(node.word)
        for child in node.childs:
            results.extend(self.preorder(child))
        return results

    def find(self, node, c):
        '''Find the position of the child storing c (-1 if there is none)'''
        childs = node.childs
        _len = len(childs)
        if _len == 0:
            return -1
        for i in range(_len):
            if childs[i].c == c:
                return i
        return -1

    def setwords(self, words):
        for word in words:
            self.add(word)

Trie has one member variable and four methods. root refers to the root node; it stores no data itself but has child nodes. The setwords method is used for initialization: it calls add for each string to build the trie. The add method walks the word character by character; if a child node for the character already exists it is shared, otherwise a new one is created, and the walk then moves on to that child, and so on. find checks whether a child node storing a given character already exists, and preorder collects the stored words by preorder traversal. There are three kinds of tree traversal: preorder, inorder, and postorder; if you are unfamiliar with them, look them up. Let's run a test:

The code is as follows:

#!/usr/bin/env python
# coding: utf-8

'''Dictionary sort of a string array using a trie'''

class Trie(object):
    def __init__(self):
        self.root = Node()  # reference to the root of the trie

    def add(self, word):
        '''Add a string'''
        node = self.root
        for c in word:
            pos = self.find(node, c)
            if pos < 0:
                node.childs.append(Node(c))
                # For simplicity, sort directly with Python's built-in sorted.
                # pos is stale after sorting, so call find again to get the real position.
                # A custom ordering for single characters would allow string arrays
                # to be sorted by any rule.
                node.childs = sorted(node.childs, key=lambda child: child.c)
                pos = self.find(node, c)
            node = node.childs[pos]
        node.word = word  # mark the word boundary on the last node

    def preorder(self, node):
        '''Preorder output'''
        results = []
        if node.word:
            results.append(node.word)
        for child in node.childs:
            results.extend(self.preorder(child))
        return results

    def find(self, node, c):
        '''Find the position of the child storing c (-1 if there is none)'''
        childs = node.childs
        _len = len(childs)
        if _len == 0:
            return -1
        for i in range(_len):
            if childs[i].c == c:
                return i
        return -1

    def setwords(self, words):
        for word in words:
            self.add(word)

class Node(object):
    def __init__(self, c=None, word=None):
        self.c = c        # the single character stored by this node
        self.word = word  # the word stored by this node
        self.childs = []  # child nodes of this node

if __name__ == '__main__':
    words = ['python', 'function', 'php', 'food', 'kiss', 'perl', 'goal', 'go', 'golang', 'easy']
    trie = Trie()
    trie.setwords(words)
    result = trie.preorder(trie.root)
    print('original string array: %s' % words)
    print('trie sort: %s' % result)
    words.sort()
    print('Python sort: %s' % words)
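The re-sort and second find in add can be avoided if the children are keyed by character and ordered only at traversal time. The following is a sketch of that alternative under stated assumptions, not the article's implementation: DictTrie and its dict-based node layout are hypothetical, and Python 3 is assumed.

```python
class DictTrie(object):
    """Hypothetical variant: children live in a dict keyed by character."""

    def __init__(self):
        self.root = {'word': None, 'childs': {}}

    def add(self, word):
        node = self.root
        for c in word:
            # Share the existing child for c, or create a fresh one.
            node = node['childs'].setdefault(c, {'word': None, 'childs': {}})
        node['word'] = word  # word boundary on the last node

    def preorder(self, node=None):
        node = self.root if node is None else node
        results = [node['word']] if node['word'] else []
        for c in sorted(node['childs']):  # dictionary order is imposed here
            results.extend(self.preorder(node['childs'][c]))
        return results


words = ['python', 'function', 'php', 'food', 'kiss', 'perl', 'goal', 'go', 'golang', 'easy']
trie = DictTrie()
for w in words:
    trie.add(w)
result = trie.preorder()
print(result)
```

For this word list the result matches Python's own sorted(words). As with the list-based version, each distinct word is emitted once, so repeated input strings would collapse into a single entry.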

Conclusion

There are many kinds of trees. When implementing a tree structure, traversal is the hard part and takes practice. The code above was written in a hurry and is not optimized, but it can serve as a basis for sorting strings by any rule, and for string search as well.
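As one possible starting point for the string search mentioned above, the sketch below follows a query through the tree using the c member of each node. The Node class is repeated so the snippet runs on its own; add, walk, contains, and has_prefix are hypothetical free functions written for this illustration, not methods of the article's Trie class.

```python
class Node(object):
    def __init__(self, c=None, word=None):
        self.c = c
        self.word = word
        self.childs = []


def add(root, word):
    """Insert a word, creating nodes as needed (child order does not matter here)."""
    node = root
    for c in word:
        for child in node.childs:
            if child.c == c:
                node = child
                break
        else:
            child = Node(c)
            node.childs.append(child)
            node = child
    node.word = word


def walk(root, text):
    """Follow text character by character; return the last node, or None."""
    node = root
    for c in text:
        node = next((ch for ch in node.childs if ch.c == c), None)
        if node is None:
            return None
    return node


def contains(root, word):
    """True only if word was added as a complete word."""
    node = walk(root, word)
    return node is not None and node.word == word


def has_prefix(root, prefix):
    """True if any stored word starts with prefix."""
    return walk(root, prefix) is not None


root = Node()
for w in ['go', 'golang', 'php', 'python', 'perl']:
    add(root, w)
print(contains(root, 'go'), contains(root, 'gol'), has_prefix(root, 'gol'))
```

Here gol is a valid prefix but not a stored word, so the word boundary on each node is what separates an exact match from a prefix match.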
