Trie tree
--------------------------------------------------------------------------------
The Trie tree, also known as the word search tree or dictionary tree, is a tree structure. It is a variant of the hash tree: a multi-way tree built for fast lookup. A typical application is counting and sorting a large number of strings (though it is not limited to strings), so search engine systems often use it for word-frequency statistics on text.
The advantage of the Trie tree is that it minimizes unnecessary string comparisons. A lookup takes time proportional to the length of the key, regardless of how many keys are stored, so queries can be more efficient than with a hash table. The core idea of the Trie is to trade space for time: strings that share a common prefix share the nodes for that prefix, which reduces the overhead at query time. The Trie tree also has a disadvantage: it consumes a lot of memory, because in an array-based implementation every node reserves a child pointer for each possible character.
Structural features of the Trie tree:
1. The root node holds no character.
2. Every node except the root holds exactly one character.
3. Concatenating the characters along the path from the root to a node gives the string that the node represents.
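To make the prefix sharing concrete, here is a minimal sketch, separate from the implementation in this article. The names `PrefixNode` and `insertCounting` are made up for illustration, and a `std::map` is assumed for the children so that any character is accepted. The function reports how many new nodes an insertion had to create.

```cpp
#include <cstddef>
#include <map>
#include <memory>
#include <string>

// Hypothetical minimal node. The character of a node is stored as the key in
// its parent's child map, so the root itself holds no character.
struct PrefixNode {
    bool isWord = false;  // true if a stored string ends at this node
    std::map<char, std::unique_ptr<PrefixNode>> children;
};

// Inserts word below root and returns the number of nodes that were created.
std::size_t insertCounting(PrefixNode& root, const std::string& word) {
    std::size_t created = 0;
    PrefixNode* pos = &root;
    for (char c : word) {
        std::unique_ptr<PrefixNode>& child = pos->children[c];
        if (!child) {  // no node for this character yet: create one
            child.reset(new PrefixNode());
            ++created;
        }
        pos = child.get();
    }
    pos->isWord = true;
    return created;
}
```

Inserting "tea" into an empty tree creates three nodes, while inserting "ten" and "team" afterwards creates only one node each, because the nodes for "te" and "tea" are reused.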
[Figure: structure of the Trie tree]
Implementation of the Trie tree
--------------------------------------------------------------------------------
The following is a simple Trie tree implementation. It assumes that strings contain only the 26 English letters and treats upper and lower case as the same letter.
#include <cstddef>  // NULL

class Trie {
public:
    Trie();
    ~Trie();
    // Every operation returns -1 if str contains a character other than a-z or A-Z.
    int insert(const char* str);        // 1 if newly inserted, 0 if already present
    int search(const char* str) const;  // 1 if found, 0 otherwise
    int remove(const char* str);        // 1 if removed, 0 if not present
    static const int CharNum = 26;

protected:
    typedef struct s_Trie_Node {
        bool isExist;                               // true if a string ends at this node
        struct s_Trie_Node* branch[Trie::CharNum];  // one child slot per letter
        s_Trie_Node();
        ~s_Trie_Node();
    } Trie_Node;
    Trie_Node* root;

private:
    // Declared but not implemented: copying is disabled because the tree owns its nodes.
    Trie(const Trie&);
    Trie& operator=(const Trie&);
};

Trie::Trie() : root(NULL) {}

// Deleting the root releases the whole tree through the node destructor.
Trie::~Trie() {
    delete root;
}

Trie::s_Trie_Node::s_Trie_Node() : isExist(false) {
    for (int i = 0; i < Trie::CharNum; ++i) {
        branch[i] = NULL;
    }
}

// Recursively deletes all children (deleting NULL is a no-op).
Trie::s_Trie_Node::~s_Trie_Node() {
    for (int i = 0; i < Trie::CharNum; ++i) {
        delete branch[i];
    }
}

int Trie::insert(const char* str) {
    if (root == NULL) {
        root = new Trie_Node();
    }
    Trie_Node* pos = root;
    int char_pos;
    while (pos != NULL && *str != '\0') {
        // Map the letter to an index 0..25, ignoring case.
        if (*str >= 'a' && *str <= 'z') {
            char_pos = *str - 'a';
        } else if (*str >= 'A' && *str <= 'Z') {
            char_pos = *str - 'A';
        } else {
            return -1;
        }
        if (pos->branch[char_pos] == NULL) {
            pos->branch[char_pos] = new Trie_Node();
        }
        pos = pos->branch[char_pos];
        str++;
    }
    if (pos->isExist) {
        return 0;
    } else {
        pos->isExist = true;
        return 1;
    }
}

int Trie::search(const char* str) const {
    Trie_Node* pos = root;
    int char_pos;
    while (pos != NULL && *str != '\0') {
        if (*str >= 'a' && *str <= 'z') {
            char_pos = *str - 'a';
        } else if (*str >= 'A' && *str <= 'Z') {
            char_pos = *str - 'A';
        } else {
            return -1;
        }
        pos = pos->branch[char_pos];
        str++;
    }
    if (pos != NULL && pos->isExist) {
        return 1;
    } else {
        return 0;
    }
}

// Only clears the end-of-string flag; nodes that become unused are not pruned.
int Trie::remove(const char* str) {
    Trie_Node* pos = root;
    int char_pos;
    while (pos != NULL && *str != '\0') {
        if (*str >= 'a' && *str <= 'z') {
            char_pos = *str - 'a';
        } else if (*str >= 'A' && *str <= 'Z') {
            char_pos = *str - 'A';
        } else {
            return -1;
        }
        pos = pos->branch[char_pos];
        str++;
    }
    if (pos != NULL && pos->isExist) {
        pos->isExist = false;
        return 1;
    } else {
        return 0;
    }
}
Applications of the Trie tree
--------------------------------------------------------------------------------
1. Searching and counting a large number of strings
Many scenarios call for this, such as a search engine's word-frequency statistics on text or its log statistics on how often users search for each keyword. The following are two typical problems:
1. Find the 10 most frequent URLs in a large number of log files.
The Trie tree is well suited to this problem. For counting over a large number of log files, the trie tree is quite fast. By combining a trie tree with a min-heap, you can get the result with a single pass over the log files: the trie counts each URL as it is read, and the min-heap keeps only the 10 URLs with the highest counts.
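The combination of a trie and a min-heap can be sketched as follows. This is an illustration under assumptions, not a tuned log processor: `CountNode` and `UrlCounter` are hypothetical names, children are kept in a `std::map` so that any character of a URL is accepted, and the whole trie is assumed to fit in memory.

```cpp
#include <cstddef>
#include <functional>
#include <map>
#include <memory>
#include <queue>
#include <string>
#include <utility>
#include <vector>

// Hypothetical node type: count is how many times the key ending here was seen.
struct CountNode {
    std::size_t count = 0;
    std::map<char, std::unique_ptr<CountNode>> children;
};

class UrlCounter {
public:
    typedef std::pair<std::size_t, std::string> Entry;  // (count, url)

    // Called once per log line: walks or creates the path, then bumps the count.
    void add(const std::string& url) {
        CountNode* pos = &root_;
        for (char c : url) {
            std::unique_ptr<CountNode>& child = pos->children[c];
            if (!child) child.reset(new CountNode());
            pos = child.get();
        }
        ++pos->count;
    }

    // Returns up to k entries, most frequent first.
    std::vector<Entry> top(std::size_t k) const {
        MinHeap heap;
        std::string current;
        collect(root_, current, k, heap);
        std::vector<Entry> result(heap.size());
        for (std::size_t i = result.size(); i-- > 0;) {
            result[i] = heap.top();  // smallest remaining entry goes last
            heap.pop();
        }
        return result;
    }

private:
    typedef std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry>> MinHeap;

    // Depth-first walk over the trie; the heap never holds more than k entries.
    static void collect(const CountNode& node, std::string& current,
                        std::size_t k, MinHeap& heap) {
        if (node.count > 0 && k > 0) {
            heap.push(Entry(node.count, current));
            if (heap.size() > k) heap.pop();  // evict the current minimum
        }
        for (const auto& kv : node.children) {
            current.push_back(kv.first);
            collect(*kv.second, current, k, heap);
            current.pop_back();
        }
    }

    CountNode root_;
};
```

Calling `add` for every URL read from the logs and then `top(10)` gives the ten most frequent URLs. Because the heap is capped at k entries, the selection step costs O(n log k) for n distinct URLs rather than a full sort.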
2. Implement input suggestions (autocomplete) for a website
This problem requires that suggestions be displayed in real time as the user types. It is easy to implement with a trie tree, because every stored string that starts with the typed text lies in the subtree below the node reached by that text.
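A minimal sketch of prefix suggestions with a trie is shown below. `SuggestNode` and `Suggester` are hypothetical names chosen for this example. The `limit` parameter is also an assumption, added because a suggestion box usually shows only a few entries.

```cpp
#include <cstddef>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Hypothetical node type for the suggestion trie.
struct SuggestNode {
    bool isWord = false;  // true if a stored string ends at this node
    std::map<char, std::unique_ptr<SuggestNode>> children;
};

class Suggester {
public:
    void add(const std::string& word) {
        SuggestNode* pos = &root_;
        for (char c : word) {
            std::unique_ptr<SuggestNode>& child = pos->children[c];
            if (!child) child.reset(new SuggestNode());
            pos = child.get();
        }
        pos->isWord = true;
    }

    // Returns at most limit stored strings that start with prefix,
    // ordered by character value (alphabetical for lower-case ASCII).
    std::vector<std::string> suggest(const std::string& prefix,
                                     std::size_t limit) const {
        std::vector<std::string> out;
        const SuggestNode* pos = &root_;
        for (char c : prefix) {  // walk down to the node for the prefix
            auto it = pos->children.find(c);
            if (it == pos->children.end()) return out;  // nothing matches
            pos = it->second.get();
        }
        std::string current = prefix;
        collect(*pos, current, limit, out);
        return out;
    }

private:
    // Depth-first walk of the subtree, stopping once limit results are found.
    static void collect(const SuggestNode& node, std::string& current,
                        std::size_t limit, std::vector<std::string>& out) {
        if (out.size() >= limit) return;
        if (node.isWord) out.push_back(current);
        for (const auto& kv : node.children) {
            if (out.size() >= limit) return;
            current.push_back(kv.first);
            collect(*kv.second, current, limit, out);
            current.pop_back();
        }
    }

    SuggestNode root_;
};
```

With "car", "card", "care", "cat" and "dog" stored, `suggest("car", 10)` returns "car", "card" and "care". A real site would probably rank suggestions by popularity instead of by character order, which this sketch does not attempt.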
2. Sort strings
To sort a large set of strings, you only need to read each string once to build the trie tree. A pre-order traversal that visits the children of every node in alphabetical order then outputs the strings in sorted order.
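The following sketch shows sorting through a trie under a few assumptions: `SortNode`, `emitSorted` and `trieSort` are hypothetical names, the input is plain ASCII, and each node keeps a count so that duplicate strings survive the sort.

```cpp
#include <cstddef>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Hypothetical node type: count is how many input strings end at this node.
struct SortNode {
    std::size_t count = 0;
    std::map<char, std::unique_ptr<SortNode>> children;  // iterated in key order
};

// Pre-order traversal: a node is emitted before its children, and children
// are visited in ascending character order, so the output is sorted.
void emitSorted(const SortNode& node, std::string& current,
                std::vector<std::string>& out) {
    for (std::size_t i = 0; i < node.count; ++i) out.push_back(current);
    for (const auto& kv : node.children) {
        current.push_back(kv.first);
        emitSorted(*kv.second, current, out);
        current.pop_back();
    }
}

std::vector<std::string> trieSort(const std::vector<std::string>& input) {
    SortNode root;
    for (const std::string& s : input) {  // each string is read exactly once
        SortNode* pos = &root;
        for (char c : s) {
            std::unique_ptr<SortNode>& child = pos->children[c];
            if (!child) child.reset(new SortNode());
            pos = child.get();
        }
        ++pos->count;
    }
    std::vector<std::string> out;
    out.reserve(input.size());
    std::string current;
    emitSorted(root, current, out);
    return out;
}
```

A shorter string is emitted before any longer string it is a prefix of, which matches the usual lexicographic order ("app" before "apple").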
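3. Find the longest common prefix of a set of strings
Insert all the strings into a trie tree, then walk down from the root for as long as the current node has exactly one child and no string ends at it. The characters on the path walked form the longest common prefix.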
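One way to find the longest common prefix with a trie is to insert every string and then follow the single unbranched path that starts at the root. The sketch below assumes this approach. `LcpNode` and `longestCommonPrefix` are hypothetical names.

```cpp
#include <map>
#include <memory>
#include <string>
#include <vector>

// Hypothetical node type for this example.
struct LcpNode {
    bool isWord = false;  // true if an input string ends at this node
    std::map<char, std::unique_ptr<LcpNode>> children;
};

std::string longestCommonPrefix(const std::vector<std::string>& words) {
    LcpNode root;
    for (const std::string& w : words) {
        LcpNode* pos = &root;
        for (char c : w) {
            std::unique_ptr<LcpNode>& child = pos->children[c];
            if (!child) child.reset(new LcpNode());
            pos = child.get();
        }
        pos->isWord = true;
    }

    // Follow the path while it is shared by all strings: it stops at the
    // first branch (more than one child), at the end of some string
    // (isWord), or at a leaf.
    std::string prefix;
    const LcpNode* pos = &root;
    while (!pos->isWord && pos->children.size() == 1) {
        const auto& only = *pos->children.begin();
        prefix.push_back(only.first);
        pos = only.second.get();
    }
    return prefix;
}
```

For "flower", "flow" and "flight" the walk stops after "fl", where the tree branches into 'o' and 'i'. For "abc" and "ab" it stops at "ab", because a string ends there.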