Python Trie tree for dictionary sorting

Source: Internet
Author: User

Generally, Languages provide dictionary-based APIs. For example, you need to use dictionary-based sorting when you connect to a public platform. There are many algorithms for sorting by dictionary, and the most easy to think of is the string SEARCH method. However, this method is difficult to implement and has poor performance. The Trie tree is a commonly used tree structure. It is widely used in various aspects, such as string retrieval, Chinese word segmentation, longest common prefix of a string, and dictionary sorting, you can also see the Trie tree in the input method.


What is a Trie tree?

The Trie tree, also known as the dictionary tree, word search tree, or Prefix Tree, is a multi-tree structure used for quick search. The number dictionary is a 10-digit tree:

Similarly, the number of lowercase English letters or uppercase English letters is a 26-fold tree. As we can see, the root node of the Trie tree does not store data, and all the data is saved in its child node. There are strings go, golang, php, python, and perl. The Trie tree can be constructed as shown in:

Let's analyze the figure above. Except the root node, each subnode stores only one character. Go and golang share the go prefix. php, perl, and python only share the p prefix. To sort the dictionary, the characters stored on each layer of nodes are stored in alphabetical order (this is related to the Traversal method ). Let's take a look at how to sort a single character in the dictionary. This article only takes lower-case letters into consideration. Other methods are similar. 'A' is prior to 'B', and 'A''s ASCII code is smaller than 'B''s ASCII code. Therefore, the dictionary order can be obtained through their ASCII subtraction. In addition, python has built-in dictionary sorting APIs, such:

Copy codeThe Code is as follows:
#! /Usr/bin/env python
# Coding: utf8

If _ name _ = '_ main __':
Arr = [c for c in 'python']
Arr. sort ()
Print arr

You can also use bitmap introduced in my previous article to implement Python: bitmap data structure. The implementation code is as follows:

Copy codeThe Code is as follows:
#! /Usr/bin/env python
# Coding: utf8

Class Bitmap (object ):
Def _ init _ (self, max ):
Self. size = self. calcElemIndex (max, True)
Self. array = [0 for I in range (self. size)]

Def calcElemIndex (self, num, up = False ):
'''Up is True, it is rounded up, otherwise it is rounded down '''
If up:
Return int (num + 31-1)/31) # rounded up
Return num/31

Def calcBitIndex (self, num ):
Return num % 31

Def set (self, num ):
ElemIndex = self. calcElemIndex (num)
ByteIndex = self. calcBitIndex (num)
Elem = self. array [elemIndex]
Self. array [elemIndex] = elem | (1 <byteIndex)

Def clean (self, I ):
ElemIndex = self. calcElemIndex (I)
ByteIndex = self. calcBitIndex (I)
Elem = self. array [elemIndex]
Self. array [elemIndex] = elem &(~ (1 <byteIndex ))

Def test (self, I ):
ElemIndex = self. calcElemIndex (I)
ByteIndex = self. calcBitIndex (I)
If self. array [elemIndex] & (1 <byteIndex ):
Return True
Return False

If _ name _ = '_ main __':
MAX = ord ('Z ')
Suffle_array = [c for c in 'python']
Result = []
Bitmap = Bitmap (MAX)
For c in suffle_array:
Bitmap. set (ord (c ))

For I in range (MAX + 1 ):
If bitmap. test (I ):
Result. append (chr (I ))

Print 'original array: % s' % suffle_array
Print 'sorted array: % s' % result

The sorting of bitmap cannot contain repeated characters. In fact, there are already many mature algorithms for dictionary sorting based on ASCII code subtraction, such as insert sorting, Hill sorting, Bubble sorting, and heap sorting. In this article, we will use the sorted method provided by Python to sort the dictionary of a single character. If you can sort a single character array by yourself, you can also customize the string sorting method.

Implementation

The entire implementation includes two classes: Trie class and Node class. The Node class indicates the nodes in the Trie tree, which is organized into a Trie tree by the Trie class. Let's first look at the Node class:

Copy codeThe Code is as follows:
#! /Usr/bin/env python
# Coding: utf8

Class Node (object ):
Def _ init _ (self, c = None, word = None ):
Self. c = c # single character stored by the node
Self. word = word # words stored by nodes
Self. childs = [] # subnode of this node

Node contains three member variables. C is the character stored on each node. Word indicates a complete word. In this article, it refers to a string. Childs contains all child nodes of this node. Since c is stored in each node, what is the use of word storage? Which node should the word be stored on? For example, go and golang share the go prefix. It is easy to search strings because the original string is provided, search by path on the Trie tree. However, for sorting, no input is provided, so you cannot know where the word boundary is, and word in the Node class acts as the word boundary. Specifically, it is stored on the last node of a word ,:

If the c Member in the Node class is not used for search, it can be not defined because it is not required in sorting.

Next let's take a look at the Trie class definition:

Copy codeThe Code is as follows:
#! /Usr/bin/env python
# Coding: utf8

'''Trie tree to sort String Array dictionaries '''

Class Trie (object ):
Def _ init _ (self ):
Self. root = Node () # Trie tree root Node reference

Def add (self, word ):
'''Add string '''
Node = self. root
For c in word:
Pos = self. find (node, c)
If pos <0:
Node. childs. append (Node (c ))
# To make the chart simple, use the Python built-in sorted for sorting.
# There is a problem with pos. Because the pos after sort will change, you need to find again to obtain the real pos.
# You can sort string arrays of any rule by customizing the sorting method of single-character arrays.
Node. childs = sorted (node. childs, key = lambda child: child. c)
Pos = self. find (node, c)
Node = node. childs [pos]
Node. word = word

Def preOrder (self, node ):
'''Fifo output '''
Results = []
If node. word:
Results. append (node. word)
For child in node. childs:
Results. extend (self. preOrder (child ))
Return results

Def find (self, node, c ):
'''Position for character insertion '''
Childs = node. childs
_ Len = len (childs)
If _ len = 0:
Return-1
For I in range (_ len ):
If childs [I]. c = c:
Return I
Return-1

Def setWords (self, words ):
For word in words:
Self. add (word)

Trie contains one member variable and four methods. Root is used to reference the root node. It does not store specific data but has subnodes. The setWords method is used for initialization and the add method is called to initialize the Trie tree. This call is based on each string. The add method adds each character to a subnode. If it exists, it shares it and finds the next subnode. Find is used to find whether a subnode stores a character, and preOrder is used to obtain the stored word in sequence. There are three ways to traverse a tree: first, middle, and last. If you do not understand it, You can Google it yourself. Next let's test:

Copy codeThe Code is as follows:
#! /Usr/bin/env python
# Coding: utf8

'''Trie tree to sort String Array dictionaries '''

Class Trie (object ):
Def _ init _ (self ):
Self. root = Node () # Trie tree root Node reference

Def add (self, word ):
'''Add string '''
Node = self. root
For c in word:
Pos = self. find (node, c)
If pos <0:
Node. childs. append (Node (c ))
# To make the chart simple, use the Python built-in sorted for sorting.
# There is a problem with pos. Because the pos after sort will change, you need to find again to obtain the real pos.
# You can sort string arrays of any rule by customizing the sorting method of single-character arrays.
Node. childs = sorted (node. childs, key = lambda child: child. c)
Pos = self. find (node, c)
Node = node. childs [pos]
Node. word = word

Def preOrder (self, node ):
'''Fifo output '''
Results = []
If node. word:
Results. append (node. word)
For child in node. childs:
Results. extend (self. preOrder (child ))
Return results

Def find (self, node, c ):
'''Position for character insertion '''
Childs = node. childs
_ Len = len (childs)
If _ len = 0:
Return-1
For I in range (_ len ):
If childs [I]. c = c:
Return I
Return-1

Def setWords (self, words ):
For word in words:
Self. add (word)

Class Node (object ):
Def _ init _ (self, c = None, word = None ):
Self. c = c # single character stored by the node
Self. word = word # words stored by nodes
Self. childs = [] # subnode of this node

If _ name _ = '_ main __':
Words = ['python', 'function', 'php', 'food', 'Kiss ', 'perl', 'goal', 'Go', 'golang ', 'easy']
Trie = Trie ()
Trie. setWords (words)
Result = trie. preOrder (trie. root)
Print 'original string array: % s' % words
After sorting the print 'trie tree: % s' % result
Words. sort ()
After sorting 'python sort: % s' % words

Conclusion

There are many types of trees. In the tree structure implementation, tree traversal is a difficult issue and more exercises are required. The above code was written in a rush without any optimization. However, on this basis, you can sort strings in any way and search strings.

Related Article

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.