Trie tree

Source: Internet
Author: User
Basic concepts and properties

In computer science, a trie, also known as a prefix tree, dictionary tree, or word search tree, is an ordered tree used to store associative arrays whose keys are usually strings. Unlike in a binary search tree, a key is not stored directly in a node but is determined by the node's position in the tree. All descendants of a node share a common prefix, namely the string corresponding to that node, and the root node corresponds to the empty string. In general, not every node has an associated value: only leaf nodes and some internal nodes correspond to keys with values.

Original address: http://www.cnblogs.com/archimedes/p/trie-tree.html. If you reprint this article, please cite the source address.

In the figure, the key is marked in each node and the value is marked below the node; each complete English word is associated with an integer. A trie can be viewed as a deterministic finite automaton, although the symbols on the edges are usually implicit in the order of the branches. Keys do not need to be stored explicitly in the nodes; the figure shows the whole word in each node only to illustrate how the trie works.

Keys in a trie are usually strings, but they can be other structures as well. The trie algorithms are easily adapted to ordered sequences of other elements, such as sequences of digits or shapes. For example, the key in a bitwise trie is a sequence of bits, which can represent an integer or a memory address.
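As a rough illustration of the bitwise trie mentioned above, the following sketch stores 8-bit unsigned integers and branches on one bit per level, starting from the most significant bit. The names `BitNode`, `bit_new_node`, `bit_insert`, `bit_contains` and the 8-bit key width are assumptions made for this example, not part of the original article.

```c
#include <assert.h>
#include <stdlib.h>

#define KEY_BITS 8  /* assumed key width for this sketch */

/* One node per bit prefix: child[0] follows a 0 bit, child[1] a 1 bit. */
struct BitNode {
    struct BitNode *child[2];
    int is_key;  /* nonzero if a stored key ends at this node */
};

/* Allocate a zeroed node (no children, not a key). */
struct BitNode *bit_new_node(void) {
    struct BitNode *n = calloc(1, sizeof *n);
    assert(n != NULL);
    return n;
}

/* Insert key, creating one node per bit where needed. */
void bit_insert(struct BitNode *root, unsigned char key) {
    struct BitNode *p = root;
    for (int i = KEY_BITS - 1; i >= 0; i--) {
        int b = (key >> i) & 1;
        if (p->child[b] == NULL)
            p->child[b] = bit_new_node();
        p = p->child[b];
    }
    p->is_key = 1;
}

/* Return nonzero if key was inserted before. */
int bit_contains(const struct BitNode *root, unsigned char key) {
    const struct BitNode *p = root;
    for (int i = KEY_BITS - 1; i >= 0; i--) {
        p = p->child[(key >> i) & 1];
        if (p == NULL)
            return 0;
    }
    return p->is_key;
}
```

Because every key has the same length here, a lookup always takes exactly `KEY_BITS` steps, regardless of how many keys are stored.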

The trie is a tree structure, sometimes described as a variant of the hash tree. A typical application is counting and sorting large numbers of strings (though not only strings), so search engine systems often use it for word frequency statistics over text. Its advantage is that it minimizes unnecessary string comparisons, so queries can be more efficient than with a hash table. The core idea of the trie is to trade space for time: the common prefixes of the strings are used to reduce the cost of each query.

It has three basic properties:

(1) The root node does not contain any character. Each node except the root node contains only one character.

(2) Concatenating the characters along the path from the root node to a given node yields the string corresponding to that node.

(3) All child nodes of a given node contain different characters.

Basic idea (taking a trie over letters as an example):
1. Insert process
For a word, start at the root and, for each letter of the word in turn, follow the branch corresponding to that letter, creating the branch if it does not exist yet. When the whole word has been traversed, mark the last node in red to indicate that the word has been inserted into the trie.
2. Query process
Similarly, traverse the trie from the root, letter by letter. If a required branch does not exist, or the last node reached is not marked in red once the whole word has been traversed, the word is not in the trie; if the last node is marked in red, the word exists.

Implementation of the trie tree

The insert, delete, and find operations on a letter trie are simple. Use a single loop: in the i-th iteration, find the subtree corresponding to the i-th letter of the word, then perform the corresponding operation. A trie can be implemented either with arrays or with dynamically allocated pointers. For convenience, statically allocated arrays are usually sufficient.

1. Trie struct

    struct Trie {
        Trie *next[26];  // one child pointer per lowercase letter
        bool isWord;     // true if a word ends at this node
    } Root;

2. Insert operation

    // Insert operation (also builds the trie)
    void insert(char *tar) {
        Trie *head = &Root;
        int id;
        while (*tar) {
            id = *tar - 'a';  // index of the current lowercase letter
            if (head->next[id] == NULL)
                head->next[id] = new Trie();  // value-initialized: all children NULL
            head = head->next[id];
            tar++;
        }
        head->isWord = true;  // mark the end of the word
    }

3. Search operation

    // Search operation
    bool search(char *tar) {
        Trie *head = &Root;
        int id;
        while (*tar) {
            id = *tar - 'a';
            if (head->next[id] == NULL)
                return false;  // the path breaks: the word is absent
            head = head->next[id];
            tar++;
        }
        return head->isWord;  // present only if a word ends here
    }
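The text mentions deletion but does not show it. One simple possibility, sketched here in plain C under the assumption that keys contain only the letters a to z, is lazy deletion: clear the end-of-word flag and leave the nodes in place. The names `TrieNode`, `trie_walk`, `trie_insert`, `trie_search`, and `trie_remove` are hypothetical and chosen only for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

struct TrieNode {
    struct TrieNode *next[26];
    bool is_word;
};

/* Follow word from root; return the final node, or NULL if the path breaks. */
struct TrieNode *trie_walk(struct TrieNode *root, const char *word) {
    struct TrieNode *p = root;
    for (; *word && p != NULL; word++) {
        assert(*word >= 'a' && *word <= 'z');
        p = p->next[*word - 'a'];
    }
    return p;
}

void trie_insert(struct TrieNode *root, const char *word) {
    struct TrieNode *p = root;
    for (; *word; word++) {
        int id = *word - 'a';
        assert(id >= 0 && id < 26);
        if (p->next[id] == NULL) {
            p->next[id] = calloc(1, sizeof *p);  /* zeroed node */
            assert(p->next[id] != NULL);
        }
        p = p->next[id];
    }
    p->is_word = true;
}

bool trie_search(struct TrieNode *root, const char *word) {
    struct TrieNode *p = trie_walk(root, word);
    return p != NULL && p->is_word;
}

/* Lazy delete: clear the flag only. Returns true if the word was present. */
bool trie_remove(struct TrieNode *root, const char *word) {
    struct TrieNode *p = trie_walk(root, word);
    if (p == NULL || !p->is_word)
        return false;
    p->is_word = false;
    return true;
}
```

Lazy deletion never frees memory. A fuller version would also release nodes that no longer lie on the path of any stored word, at the cost of tracking parents or recursing.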

There are three ways for a node to point to its children:

1. Give each node a small array indexed by letter. The subscript is the letter represented by the child, and the content is the child's position in the large node array, that is, its index;

2. Keep a linked list for each node that records its children in some order;

3. Record the tree using the left-child, right-sibling representation.

Each method has its own characteristics. The first is easy to implement but needs the most space; the second is also easy to implement and needs relatively little space, but is slower; the third needs the least space, but is relatively slow and harder to write.
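The third method can be sketched as follows. This is only an illustration of the left-child, right-sibling idea: each node keeps one pointer to its first child and one to its next sibling, so finding a child means scanning a sibling list. The names `LcrsNode`, `lcrs_new`, `lcrs_find_child`, `lcrs_insert`, and `lcrs_search` are assumptions for this example.

```c
#include <assert.h>
#include <stdlib.h>

struct LcrsNode {
    char ch;
    int is_word;
    struct LcrsNode *first_child;   /* the "left son" */
    struct LcrsNode *next_sibling;  /* the "right brother" */
};

struct LcrsNode *lcrs_new(char ch) {
    struct LcrsNode *n = calloc(1, sizeof *n);
    assert(n != NULL);
    n->ch = ch;
    return n;
}

/* Linear scan of p's children for the one labelled ch; NULL if absent. */
struct LcrsNode *lcrs_find_child(const struct LcrsNode *p, char ch) {
    struct LcrsNode *c = p->first_child;
    while (c != NULL && c->ch != ch)
        c = c->next_sibling;
    return c;
}

void lcrs_insert(struct LcrsNode *root, const char *word) {
    struct LcrsNode *p = root;
    for (; *word; word++) {
        struct LcrsNode *c = lcrs_find_child(p, *word);
        if (c == NULL) {
            c = lcrs_new(*word);
            c->next_sibling = p->first_child;  /* push at the front */
            p->first_child = c;
        }
        p = c;
    }
    p->is_word = 1;
}

int lcrs_search(const struct LcrsNode *root, const char *word) {
    const struct LcrsNode *p = root;
    for (; *word && p != NULL; word++)
        p = lcrs_find_child(p, *word);
    return p != NULL && p->is_word;
}
```

Each node costs two pointers instead of 26, which is where the space saving comes from; the price is the linear scan in `lcrs_find_child`.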

The following implementation allocates nodes dynamically:

    #include <stdlib.h>

    #define MAX_NUM 26

    enum NODE_TYPE {
        COMPLETED,    // a complete string ends at this node
        UNCOMPLETED
    };

    struct Node {
        enum NODE_TYPE type;
        char ch;
        struct Node *child[MAX_NUM];  // 26-way tree: a, b, c, ..., z
    };

    struct Node *ROOT;  // tree root

    // Create a new node holding ch, with no children
    struct Node *createNewNode(char ch) {
        struct Node *new_node = (struct Node *)malloc(sizeof(struct Node));
        new_node->ch = ch;
        new_node->type = UNCOMPLETED;
        int i;
        for (i = 0; i < MAX_NUM; i++)
            new_node->child[i] = NULL;
        return new_node;
    }

    // Initialization: create an empty tree with only ROOT
    void initialization() {
        ROOT = createNewNode(' ');
    }

    // Map a lowercase letter to a child index
    int charToindex(char ch) {
        return ch - 'a';
    }

    // Return 1 if the first len characters of chars form a stored string
    int find(const char chars[], int len) {
        struct Node *ptr = ROOT;
        int i = 0;
        while (i < len) {
            if (ptr->child[charToindex(chars[i])] == NULL) {
                break;
            }
            ptr = ptr->child[charToindex(chars[i])];
            i++;
        }
        return (i == len) && (ptr->type == COMPLETED);
    }

    // Insert the first len characters of chars, creating nodes as needed
    void insert(const char chars[], int len) {
        struct Node *ptr = ROOT;
        int i;
        for (i = 0; i < len; i++) {
            if (ptr->child[charToindex(chars[i])] == NULL) {
                ptr->child[charToindex(chars[i])] = createNewNode(chars[i]);
            }
            ptr = ptr->child[charToindex(chars[i])];
        }
        ptr->type = COMPLETED;
    }

Trie tree applications

The trie is a simple and efficient data structure, and it has a large number of applications.

(1) String search

Store a set of known strings (a dictionary) in the trie in advance, then use it to find out whether, or how often, other strings occur.

Example:

1. You are given a word list of n words and an article written in lowercase English. Output every word of the article that is not in the word list, in order of first appearance.

2. You are given a dictionary of bad words, all in lowercase letters, and a text in which each line consists of lowercase letters. Determine whether the text contains any bad word. For example, if rob is a bad word, then the text problem contains a bad word.
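The second example can be approached, as one possible sketch, by storing the bad words in a trie and then trying to match from every starting position of the text. The names `BadNode`, `bad_add`, and `bad_contains` are hypothetical, and the code assumes that both the words and the text consist only of the letters a to z.

```c
#include <assert.h>
#include <stdlib.h>

struct BadNode {
    struct BadNode *child[26];
    int is_word;  /* nonzero if a bad word ends at this node */
};

/* Add one bad word to the trie rooted at root. */
void bad_add(struct BadNode *root, const char *word) {
    struct BadNode *p = root;
    for (; *word; word++) {
        int id = *word - 'a';
        assert(id >= 0 && id < 26);
        if (p->child[id] == NULL) {
            p->child[id] = calloc(1, sizeof *p);
            assert(p->child[id] != NULL);
        }
        p = p->child[id];
    }
    p->is_word = 1;
}

/* Return 1 if any stored word occurs as a substring of text, else 0. */
int bad_contains(const struct BadNode *root, const char *text) {
    for (const char *start = text; *start; start++) {
        const struct BadNode *p = root;
        for (const char *c = start; *c; c++) {
            assert(*c >= 'a' && *c <= 'z');
            p = p->child[*c - 'a'];
            if (p == NULL)
                break;          /* no bad word continues this way */
            if (p->is_word)
                return 1;       /* a bad word ends here */
        }
    }
    return 0;
}
```

Each starting position costs at most the length of the longest bad word, so the scan is bounded by the text length times that maximum. The Aho-Corasick automaton mentioned later in this article avoids restarting from every position.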

(2) Longest common prefix of strings

The trie uses the common prefixes of multiple strings to save storage space. Conversely, once a large number of strings are stored in a trie, the common prefix of any of them can be obtained quickly.

Example:

Given n lowercase English strings and Q queries, each query asks for the length of the longest common prefix of two of the strings.

Solution: first, build the letter trie for all the strings. Observe that the length of the longest common prefix of two strings equals the depth of the lowest common ancestor of the nodes where the two strings end. The problem therefore reduces to the offline lowest common ancestor (LCA) problem.

The lowest common ancestor problem is itself a classic problem. You can use the following methods:

1. Use a disjoint-set (union-find) structure with Tarjan's classic offline algorithm;

2. Compute the Euler tour of the letter trie, which turns the problem into a typical range minimum query (RMQ) problem.
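To make the reduction concrete, here is a small sketch that finds the lowest common ancestor by naively climbing parent pointers. It is not Tarjan's algorithm or the RMQ method listed above; it only demonstrates that the depth of the LCA equals the length of the longest common prefix. The names `PNode`, `pnode_new`, `pnode_insert`, and `lcp_length` are assumptions for this example, and keys are assumed to use only the letters a to z.

```c
#include <assert.h>
#include <stdlib.h>

struct PNode {
    struct PNode *child[26];
    struct PNode *parent;
    int depth;  /* number of characters on the path from the root */
};

struct PNode *pnode_new(struct PNode *parent) {
    struct PNode *n = calloc(1, sizeof *n);
    assert(n != NULL);
    n->parent = parent;
    n->depth = parent ? parent->depth + 1 : 0;
    return n;
}

/* Insert word and return the node where it ends. */
struct PNode *pnode_insert(struct PNode *root, const char *word) {
    struct PNode *p = root;
    for (; *word; word++) {
        int id = *word - 'a';
        assert(id >= 0 && id < 26);
        if (p->child[id] == NULL)
            p->child[id] = pnode_new(p);
        p = p->child[id];
    }
    return p;
}

/* Naive LCA: lift the deeper node, then climb both until they meet.
   The depth of the meeting node is the common prefix length. */
int lcp_length(const struct PNode *a, const struct PNode *b) {
    while (a->depth > b->depth) a = a->parent;
    while (b->depth > a->depth) b = b->parent;
    while (a != b) {
        a = a->parent;
        b = b->parent;
    }
    return a->depth;
}
```

A query here costs time proportional to the string lengths. The offline Tarjan and Euler tour plus RMQ approaches exist to answer many queries faster than that.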

(3) Sorting

The trie is a multi-way tree. A pre-order traversal of the whole tree that visits the children of each node in alphabetical order outputs the stored strings in lexicographic order.

Example:

You are given n distinct English names, each consisting of a single word. Sort them in ascending lexicographic order.
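A minimal sketch of this sorting idea follows. It assumes lowercase names shorter than `MAX_LEN` characters and an output buffer that the caller has made large enough; the names `SNode`, `s_insert`, `s_collect`, and `MAX_LEN` are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MAX_LEN 64  /* assumed upper bound on name length, including '\0' */

struct SNode {
    struct SNode *child[26];
    int is_word;
};

void s_insert(struct SNode *root, const char *word) {
    assert(strlen(word) < MAX_LEN);
    struct SNode *p = root;
    for (; *word; word++) {
        int id = *word - 'a';
        assert(id >= 0 && id < 26);
        if (p->child[id] == NULL) {
            p->child[id] = calloc(1, sizeof *p);
            assert(p->child[id] != NULL);
        }
        p = p->child[id];
    }
    p->is_word = 1;
}

/* Pre-order traversal: append every stored word, followed by '\n', to out.
   prefix must have room for MAX_LEN chars; out must already hold a valid
   string and be large enough for the result (caller's responsibility). */
void s_collect(const struct SNode *p, char *prefix, int depth, char *out) {
    if (p->is_word) {
        prefix[depth] = '\0';
        strcat(out, prefix);
        strcat(out, "\n");
    }
    for (int i = 0; i < 26; i++) {
        if (p->child[i] != NULL) {
            prefix[depth] = (char)('a' + i);
            s_collect(p->child[i], prefix, depth + 1, out);
        }
    }
}
```

A node's own word is emitted before its children are visited, so a name that is a prefix of another (for example ann and anna) comes first, as lexicographic order requires.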

(4) Auxiliary structure for other data structures and algorithms

Examples include the suffix tree and the Aho-Corasick (AC) automaton.

Basic dictionary tree template

    #include <stddef.h>  // NULL

    #define MAX 26  // character set size

    typedef struct trienode {
        int ncount;  // number of inserted strings that pass through this node
        struct trienode *next[MAX];
    } trienode;

    trienode memory[1000000];  // static node pool
    int allocp = 0;            // index of the next free node in the pool

    /* Initialize */
    void inittrieroot(trienode **proot) {
        *proot = NULL;
    }

    /* Create a new node */
    trienode *createtrienode() {
        int i;
        trienode *p;
        p = &memory[allocp++];
        p->ncount = 1;
        for (i = 0; i < MAX; i++)
            p->next[i] = NULL;
        return p;
    }

    /* Insert */
    void inserttrie(trienode **proot, char *s) {
        int i, k;
        trienode *p;
        if (!(p = *proot))
            p = *proot = createtrienode();
        i = 0;
        while (s[i]) {
            k = s[i++] - 'a';  // choose the branch
            if (p->next[k])
                p->next[k]->ncount++;
            else
                p->next[k] = createtrienode();
            p = p->next[k];
        }
    }

    /* Search: return the count stored at the node reached by s, or 0 if the path breaks */
    int searchtrie(trienode **proot, char *s) {
        trienode *p;
        int i, k;
        if (!(p = *proot))
            return 0;
        i = 0;
        while (s[i]) {
            k = s[i++] - 'a';
            if (p->next[k] == NULL)
                return 0;
            p = p->next[k];
        }
        return p->ncount;
    }

 
