We have been studying the suggest tree over the past few days. The suggest tree is based on the ternary search tree, so we first turn to a blog post about the ternary search tree.
From: http://chenzongzhi.info/?p=173
English original: http://drdobbs.com/database/184410528?pgno=1
Hash tables are often used to store a collection of strings. Although lookups in a hash table are fast, a hash table cannot express relationships between strings, such as ordering or shared prefixes. A binary search tree can be used instead, but its query speed is not ideal. A trie can also be used, but a trie wastes a lot of space (although a two-array, or double-array, representation can reduce this). The ternary search tree combines the fast queries of a trie with the space efficiency of a binary search tree.
Example: storing and searching 12 words
With a binary search tree: let n be the number of words and len the length of a word. A lookup costs O(log n * len) character comparisons, and the space is n * len.
With a trie: a lookup costs O(len). The space here is 18 * 26 pointers (18 nodes, assuming only the 26 lowercase letters), and more space is needed as the words get longer.
With a ternary search tree: the space is of the same order as for the binary search tree, and the lookup cost is close to the trie's O(len), differing only in the constant factor.
Introduction
The ternary search tree has both the space efficiency of a binary search tree and the fast queries of a trie.
Each node of a ternary search tree has three children. A search compares the current character of the query with the character stored in the node. If the query character is smaller, the search moves to the left child. If it is larger, the search moves to the right child. If the two characters are equal, the search moves to the middle child and continues with the next character of the query.
For example, to search for "ax" in the tree above, first compare 'a' with the root 'i': 'a' < 'i', so move to the left child of 'i', which is 'b'. Then 'a' < 'b', so move to the left child of 'b', which is 'a'. Now 'a' = 'a', so move to the middle child and continue with the next character, 'x'. Since 'x' > 's', move to the right child of 's', which is 't'. Since 'x' > 't' and 't' has no right child, the search ends: the string "ax" is not in the tree.
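The walk-through above can be reproduced with a small sketch. The word list below is an assumption (twelve two-letter words chosen so that the tree has root 'i', then 'b', then 'a', with 's' followed by 't'), and build_demo_tree and search_trace are hypothetical helper names that do not appear in the original post.

```c
#include <stdlib.h>

typedef struct tnode *Tptr;
typedef struct tnode {
    char s;
    Tptr lokid, eqkid, hikid;
} Tnode;

/* Insert s below p and return the (possibly new) subtree root. */
Tptr insert(Tptr p, const char *s)
{
    if (p == NULL) {
        p = malloc(sizeof(Tnode));
        if (p == NULL) exit(EXIT_FAILURE);
        p->s = *s;
        p->lokid = p->eqkid = p->hikid = NULL;
    }
    if (*s < p->s)
        p->lokid = insert(p->lokid, s);
    else if (*s > p->s)
        p->hikid = insert(p->hikid, s);
    else if (*s != '\0')
        p->eqkid = insert(p->eqkid, s + 1);
    return p;
}

/* Assumed word list; "in" is inserted first so that 'i' becomes the root. */
Tptr build_demo_tree(void)
{
    static const char *words[] = { "in", "be", "as", "at", "by", "he",
                                   "is", "it", "of", "on", "or", "to" };
    Tptr root = NULL;
    for (size_t i = 0; i < sizeof words / sizeof words[0]; i++)
        root = insert(root, words[i]);
    return root;
}

/* Same logic as a plain search, but also records the character of every
   visited node in path ('$' stands for a terminator node).
   Returns 1 if s is stored in the tree, 0 otherwise. */
int search_trace(Tptr p, const char *s, char *path)
{
    size_t n = 0;
    path[0] = '\0';
    while (p) {
        path[n++] = p->s ? p->s : '$';
        path[n] = '\0';
        if (*s < p->s) {
            p = p->lokid;
        } else if (*s > p->s) {
            p = p->hikid;
        } else {
            if (*s == '\0') return 1;
            s++;
            p = p->eqkid;
        }
    }
    return 0;
}
```

With this tree, search_trace(root, "ax", path) returns 0 and leaves "ibast" in path, which is exactly the sequence of nodes visited in the walk-through. The caller has to supply a path buffer that is large enough for the longest search path plus the terminating zero.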
Construction
The implementation is in C.
Node definition:
typedef struct tnode *Tptr;
typedef struct tnode {
    char s;                      /* character stored in this node */
    Tptr lokid, eqkid, hikid;    /* left, middle, and right children */
} Tnode;
First, we will introduce the search method:
int search(char *s)    /* s is the string to look up */
{
    Tptr p;
    p = T;    /* T is the root of the ternary search tree built earlier */
    while (p) {
        if (*s < p->s) {           /* *s is smaller than p->s: go to p->lokid */
            p = p->lokid;
        } else if (*s > p->s) {    /* *s is larger than p->s: go to p->hikid */
            p = p->hikid;
        } else {
            if (*s == '\0') {      /* the terminator matched: the string was found */
                return 1;
            }
            s++;                   /* *s == p->s: advance and go to the middle child */
            p = p->eqkid;
        }
    }
    return 0;
}
Insert a string:
/* Requires <stdlib.h> for malloc. */
Tptr insert(Tptr p, char *s)
{
    if (p == NULL) {
        p = (Tptr) malloc(sizeof(Tnode));
        p->s = *s;
        p->lokid = p->eqkid = p->hikid = NULL;
    }
    if (*s < p->s) {
        p->lokid = insert(p->lokid, s);
    } else if (*s > p->s) {
        p->hikid = insert(p->hikid, s);
    } else {
        if (*s != '\0') {
            p->eqkid = insert(p->eqkid, ++s);
        } else {
            /* insertstr points to the whole string being inserted; storing it
               here makes it easy to traverse and print all strings later */
            p->eqkid = (Tptr) insertstr;
        }
    }
    return p;
}
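As with a binary search tree, the insertion order matters. In the worst case, when strings are inserted in sorted order, a binary search tree degenerates into a linked list. The worst case of a ternary search tree is much milder than that of a binary search tree, because the chain of left and right links at each character position can never be longer than the alphabet.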
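To make the effect of sorted insertion concrete, here is a small experiment, offered as a sketch rather than a benchmark. It inserts the 676 strings "aa" to "zz" in sorted order and measures the longest search path. The helpers build_sorted_pairs and max_path are hypothetical names introduced for this illustration.

```c
#include <stdlib.h>

typedef struct tnode *Tptr;
typedef struct tnode {
    char s;
    Tptr lokid, eqkid, hikid;
} Tnode;

/* Insert s below p and return the (possibly new) subtree root. */
Tptr insert(Tptr p, const char *s)
{
    if (p == NULL) {
        p = malloc(sizeof(Tnode));
        if (p == NULL) exit(EXIT_FAILURE);
        p->s = *s;
        p->lokid = p->eqkid = p->hikid = NULL;
    }
    if (*s < p->s)
        p->lokid = insert(p->lokid, s);
    else if (*s > p->s)
        p->hikid = insert(p->hikid, s);
    else if (*s != '\0')
        p->eqkid = insert(p->eqkid, s + 1);
    return p;
}

/* Insert "aa", "ab", ..., "zz" in sorted order (the worst case). */
Tptr build_sorted_pairs(void)
{
    Tptr root = NULL;
    char w[3] = { 0 };
    for (char a = 'a'; a <= 'z'; a++) {
        for (char b = 'a'; b <= 'z'; b++) {
            w[0] = a;
            w[1] = b;
            root = insert(root, w);
        }
    }
    return root;
}

/* Number of nodes on the longest path from p down to any node. */
int max_path(Tptr p)
{
    int best, h;
    if (p == NULL) return 0;
    best = max_path(p->lokid);
    h = max_path(p->hikid);
    if (h > best) best = h;
    if (p->s != '\0') {          /* a terminator node has no middle subtree */
        h = max_path(p->eqkid);
        if (h > best) best = h;
    }
    return best + 1;
}
```

Here max_path(build_sorted_pairs()) is 53: a chain of 26 nodes for the first character, a chain of 26 for the second, and one terminator node. A plain binary search tree fed the same 676 strings in sorted order would be a single chain of 676 nodes.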
A tree also needs a traversal operation.
/* Print all strings in lexicographic order (requires <stdio.h>).
   Every node below p is visited; if p is not the root, the output is the
   set of strings that share the same prefix. */
void traverse(Tptr p)
{
    if (!p) return;
    traverse(p->lokid);
    if (p->s != '\0') {
        traverse(p->eqkid);
    } else {
        printf("%s\n", (char *) p->eqkid);    /* eqkid holds the stored string */
    }
    traverse(p->hikid);
}
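The traversal relies on the terminator node holding a pointer to the complete string. As a sketch of how the two pieces fit together, the variant below passes the whole string as an extra parameter instead of using a global, and collects the strings into an array instead of printing them. Both changes, and the names insert_str and collect, are assumptions made for this illustration and are not part of the original code.

```c
#include <stdlib.h>

typedef struct tnode *Tptr;
typedef struct tnode {
    char s;
    Tptr lokid, eqkid, hikid;
} Tnode;

/* Insert s; whole is the complete string, stored at the terminator node. */
Tptr insert_str(Tptr p, const char *s, const char *whole)
{
    if (p == NULL) {
        p = malloc(sizeof(Tnode));
        if (p == NULL) exit(EXIT_FAILURE);
        p->s = *s;
        p->lokid = p->eqkid = p->hikid = NULL;
    }
    if (*s < p->s)
        p->lokid = insert_str(p->lokid, s, whole);
    else if (*s > p->s)
        p->hikid = insert_str(p->hikid, s, whole);
    else if (*s != '\0')
        p->eqkid = insert_str(p->eqkid, s + 1, whole);
    else
        p->eqkid = (Tptr) whole;   /* terminator: eqkid is a string pointer */
    return p;
}

/* In-order walk: append every stored string to out, starting at index n.
   Returns the new number of entries. out must be large enough. */
size_t collect(Tptr p, const char **out, size_t n)
{
    if (p == NULL) return n;
    n = collect(p->lokid, out, n);
    if (p->s != '\0')
        n = collect(p->eqkid, out, n);
    else
        out[n++] = (const char *) p->eqkid;
    return collect(p->hikid, out, n);
}
```

Inserting "to", "be", "or", "not", "as", "at" in that order and then calling collect(root, out, 0) fills out with as, at, be, not, or, to. The order is lexicographic because the terminator character '\0' is smaller than every letter, so a string always precedes its own extensions.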
Applications
Here we introduce three applications: fuzzy queries, queries for all strings that share a common prefix, and near-neighbor queries (all strings whose Hamming distance from the query is within a given bound).
Fuzzy search
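In a pattern, the character '.' matches any single character. The function below only reports strings whose length equals the length of the pattern, so a call such as psearch1(root, ".a.a..") should match six-letter strings such as baxaca and cadakd.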
void psearch1(Tptr p, char *s)
{
    if (p == NULL) {
        return;
    }
    if (*s == '.' || *s < p->s) {     /* '.' or smaller: search the left subtree */
        psearch1(p->lokid, s);
    }
    if (*s == '.' || *s > p->s) {     /* '.' or larger: search the right subtree */
        psearch1(p->hikid, s);
    }
    if (*s == '.' || *s == p->s) {    /* '.' or equal: go on to the next character */
        /* at a terminator node eqkid holds a string, not a node, so stop there */
        if (*s != '\0' && p->s != '\0') {
            psearch1(p->eqkid, s + 1);
        }
    }
    if (*s == '\0' && p->s == '\0') {
        printf("%s\n", (char *) p->eqkid);
    }
}
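A usage sketch for the fuzzy query follows. It collects matches into an array instead of printing them and visits the right subtree last so that results come out in lexicographic order. Both choices are assumptions of this sketch, as are the names insert_str and psearch_collect and the sample words.

```c
#include <stdlib.h>

typedef struct tnode *Tptr;
typedef struct tnode {
    char s;
    Tptr lokid, eqkid, hikid;
} Tnode;

/* Insert s; whole is the complete string, stored at the terminator node. */
Tptr insert_str(Tptr p, const char *s, const char *whole)
{
    if (p == NULL) {
        p = malloc(sizeof(Tnode));
        if (p == NULL) exit(EXIT_FAILURE);
        p->s = *s;
        p->lokid = p->eqkid = p->hikid = NULL;
    }
    if (*s < p->s)
        p->lokid = insert_str(p->lokid, s, whole);
    else if (*s > p->s)
        p->hikid = insert_str(p->hikid, s, whole);
    else if (*s != '\0')
        p->eqkid = insert_str(p->eqkid, s + 1, whole);
    else
        p->eqkid = (Tptr) whole;
    return p;
}

/* Append every stored string that matches pattern s to out; '.' matches
   any single character. *n is the number of entries written so far. */
void psearch_collect(Tptr p, const char *s, const char **out, size_t *n)
{
    if (p == NULL) return;
    if (*s == '.' || *s < p->s)
        psearch_collect(p->lokid, s, out, n);
    if (*s == '.' || *s == p->s) {
        /* only descend while both the pattern and the stored string go on */
        if (*s != '\0' && p->s != '\0')
            psearch_collect(p->eqkid, s + 1, out, n);
    }
    if (*s == '\0' && p->s == '\0')
        out[(*n)++] = (const char *) p->eqkid;
    if (*s == '.' || *s > p->s)
        psearch_collect(p->hikid, s, out, n);
}
```

With the words baxaca, cadakd, banana, and cab stored, the pattern ".a.a.." yields banana, baxaca, cadakd, while ".a.a.a" yields only banana and baxaca. The three-letter word cab is never reported for a six-character pattern.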
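Near-neighbor search finds all stored strings within a given Hamming distance of the query, that is, strings that differ from the query in at most d character positions. For example, dobbd and hocbe are both within Hamming distance 2 of hobbe.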
/* s is the string to search for, d is the allowed Hamming distance
   (requires <stdio.h> and <string.h>) */
void nearsearch(Tptr p, char *s, int d)
{
    if (p == NULL || d < 0) return;
    if (d > 0 || *s < p->s) {
        nearsearch(p->lokid, s, d);
    }
    if (d > 0 || *s > p->s) {
        nearsearch(p->hikid, s, d);
    }
    if (p->s == '\0') {
        if ((int) strlen(s) <= d) {
            printf("%s\n", (char *) p->eqkid);
        }
    } else {
        /* a mismatch at this position uses up one unit of the distance budget */
        nearsearch(p->eqkid, *s ? s + 1 : s, (*s == p->s) ? d : d - 1);
    }
}
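The following sketch shows the near-neighbor query on a small tree. It collects results into an array and visits the right subtree last so that the output is sorted. These choices, the names insert_str and nearsearch_collect, and the sample words are assumptions made for this illustration.

```c
#include <stdlib.h>
#include <string.h>

typedef struct tnode *Tptr;
typedef struct tnode {
    char s;
    Tptr lokid, eqkid, hikid;
} Tnode;

/* Insert s; whole is the complete string, stored at the terminator node. */
Tptr insert_str(Tptr p, const char *s, const char *whole)
{
    if (p == NULL) {
        p = malloc(sizeof(Tnode));
        if (p == NULL) exit(EXIT_FAILURE);
        p->s = *s;
        p->lokid = p->eqkid = p->hikid = NULL;
    }
    if (*s < p->s)
        p->lokid = insert_str(p->lokid, s, whole);
    else if (*s > p->s)
        p->hikid = insert_str(p->hikid, s, whole);
    else if (*s != '\0')
        p->eqkid = insert_str(p->eqkid, s + 1, whole);
    else
        p->eqkid = (Tptr) whole;
    return p;
}

/* Append to out every stored string within distance d of s.
   *n is the number of entries written so far. */
void nearsearch_collect(Tptr p, const char *s, int d,
                        const char **out, size_t *n)
{
    if (p == NULL || d < 0) return;
    if (d > 0 || *s < p->s)
        nearsearch_collect(p->lokid, s, d, out, n);
    if (p->s == '\0') {
        /* leftover query characters also count against the budget */
        if ((int) strlen(s) <= d)
            out[(*n)++] = (const char *) p->eqkid;
    } else {
        nearsearch_collect(p->eqkid, *s ? s + 1 : s,
                           (*s == p->s) ? d : d - 1, out, n);
    }
    if (d > 0 || *s > p->s)
        nearsearch_collect(p->hikid, s, d, out, n);
}
```

With soda, code, coma, sofa, and cake stored, a query for "soda" with d = 2 yields code, coma, soda, sofa, and with d = 1 it yields soda and sofa. The word cake differs from soda in all four positions and is pruned as soon as the budget drops below zero.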
Prefix search: when the user types "bin" into a search engine, we want all stored strings that start with "bin", for example bing, binha, and binb.
void presearch(Tptr p, char *s)    /* s is the prefix to search for */
{
    if (p == NULL) return;
    if (*s == '\0') {              /* empty prefix: every stored string matches */
        traverse(p);
        return;
    }
    if (*s < p->s) {
        presearch(p->lokid, s);
    } else if (*s > p->s) {
        presearch(p->hikid, s);
    } else {
        if (*(s + 1) == '\0') {
            /* the prefix is used up: print every string below this node */
            traverse(p->eqkid);
            return;
        } else {
            presearch(p->eqkid, s + 1);
        }
    }
}
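As a usage sketch for prefix search, the version below first locates the subtree that holds all continuations of the prefix and then collects its strings in order. Splitting the work into prefix_node and collect is a design choice of this sketch, and those names, insert_str, and the sample words are hypothetical.

```c
#include <stdlib.h>

typedef struct tnode *Tptr;
typedef struct tnode {
    char s;
    Tptr lokid, eqkid, hikid;
} Tnode;

/* Insert s; whole is the complete string, stored at the terminator node. */
Tptr insert_str(Tptr p, const char *s, const char *whole)
{
    if (p == NULL) {
        p = malloc(sizeof(Tnode));
        if (p == NULL) exit(EXIT_FAILURE);
        p->s = *s;
        p->lokid = p->eqkid = p->hikid = NULL;
    }
    if (*s < p->s)
        p->lokid = insert_str(p->lokid, s, whole);
    else if (*s > p->s)
        p->hikid = insert_str(p->hikid, s, whole);
    else if (*s != '\0')
        p->eqkid = insert_str(p->eqkid, s + 1, whole);
    else
        p->eqkid = (Tptr) whole;
    return p;
}

/* In-order walk: append every stored string below p to out. */
size_t collect(Tptr p, const char **out, size_t n)
{
    if (p == NULL) return n;
    n = collect(p->lokid, out, n);
    if (p->s != '\0')
        n = collect(p->eqkid, out, n);
    else
        out[n++] = (const char *) p->eqkid;
    return collect(p->hikid, out, n);
}

/* Return the subtree that holds every continuation of prefix s,
   or NULL if no stored string starts with s. */
Tptr prefix_node(Tptr p, const char *s)
{
    if (*s == '\0') return p;          /* empty prefix: the whole tree */
    while (p) {
        if (*s < p->s) {
            p = p->lokid;
        } else if (*s > p->s) {
            p = p->hikid;
        } else {
            if (*++s == '\0') return p->eqkid;   /* prefix fully matched */
            p = p->eqkid;
        }
    }
    return NULL;
}
```

With bing, binha, binb, bit, and ban stored, collect(prefix_node(root, "bin"), out, 0) returns 3 and fills out with binb, bing, binha. A prefix that matches nothing, such as "x", gives a NULL subtree and therefore zero results.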
Summary
1. The ternary search tree is efficient and easy to implement.
2. The ternary search tree is often more efficient than a hash table: with a large amount of data the probability of hash collisions grows, while the lookup cost of a ternary search tree grows only logarithmically with the number of strings.
3. A ternary search tree grows and shrinks easily. When a hash table changes size, the memory has to be copied and every key rehashed.
4. The ternary search tree supports operations such as fuzzy matching, Hamming distance search, and prefix search.
5. The ternary search tree supports many other operations, such as printing all strings in lexicographic order. A trie can do this too, but it takes more time.
Reference: http://drdobbs.com/database/184410528?pgno=1