Hash tree (HashTree): similar to a trie, but branching on prime-number remainders instead of characters. Used by Sphinx?

Source: Internet
Author: User
Tags: integer division

Excerpt from: http://blog.csdn.net/yang_yulei/article/details/46337405

The theoretical basis of hash tree

"Prime number resolution theorem"
Simply put, n distinct primes can "distinguish" a run of consecutive integers whose length equals the product of those primes. "Distinguish" here means that no two integers in the run have exactly the same sequence of remainders modulo those primes. (This is a consequence of the Chinese remainder theorem.)
(For a proof of this theorem, see: http://wenku.baidu.com/view/16b2c7abd1f34693daef3e58.html)
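
The theorem can be checked by brute force on a small case. The sketch below is only an illustration, and the function name `remaindersAreDistinct` is a hypothetical helper, not something from the original post. It collects the remainder sequence of each integer in a run and reports whether any two sequences coincide.

```cpp
#include <cstdint>
#include <set>
#include <vector>

// Returns true if the `count` consecutive integers starting at `start`
// all have different remainder sequences modulo the given primes.
bool remaindersAreDistinct(std::uint64_t start, std::uint64_t count,
                           const std::vector<std::uint64_t>& primes) {
    std::set<std::vector<std::uint64_t>> seen;
    for (std::uint64_t i = 0; i < count; ++i) {
        std::vector<std::uint64_t> remainders;
        for (std::uint64_t p : primes) {
            remainders.push_back((start + i) % p);
        }
        // insert() reports via .second whether the sequence was new.
        if (!seen.insert(remainders).second) {
            return false;
        }
    }
    return true;
}
```

With the primes 2, 3 and 5 (product 30), any run of 30 consecutive integers passes the check, while a run of 31 fails because the first and last integers differ by exactly 30 and so share all three remainders.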

For example:
Taking consecutive primes starting from 2, the first 10 primes can distinguish M(10) = 2*3*5*7*11*13*17*19*23*29 = 6,469,693,230 numbers, which already exceeds the range of the commonly used 32-bit integers. The first 100 primes can distinguish about M(100) = 4.711930 × 10^219 numbers.
On current CPUs, even 100 levels of integer division cost very little. In real applications, overall speed usually depends on how many times a node's keyword has to be loaded into memory and how long each load takes. The time per load is determined by the keyword size and the hardware, so for the same keyword type and the same hardware, total running time depends mostly on the number of loads and is roughly proportional to it.
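
The size of the distinguishable range for the first few primes can be computed directly. This is a minimal sketch; `kPrimes` and `distinguishableRange` are names chosen here for illustration. A 64-bit accumulator is used because the product of the first 10 primes does not fit in 32 bits.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// The first 10 primes, in order.
constexpr std::array<std::uint64_t, 10> kPrimes = {2, 3, 5, 7, 11, 13, 17, 19, 23, 29};

// Product of the first n primes (n <= 10): the number of consecutive
// integers that those primes can distinguish.
std::uint64_t distinguishableRange(std::size_t n) {
    std::uint64_t product = 1;
    for (std::size_t i = 0; i < n && i < kPrimes.size(); ++i) {
        product *= kPrimes[i];
    }
    return product;
}
// distinguishableRange(9)  == 223092870   (fits in 32 bits)
// distinguishableRange(10) == 6469693230  (larger than 2^32 - 1 = 4294967295)
```

Nine primes are not enough to cover every 32-bit value, while ten are, which is why a 10-level tree is used in this article.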

Insert

We use the prime number resolution theorem to build a hash tree.
Take the consecutive primes starting at 2 and build a 10-level hash tree. The first level is the root node, which has 2 child nodes; each second-level node has 3 child nodes; and so on, so the number of children per node at successive levels follows the consecutive primes. At the tenth level, each node has 29 child nodes.
The children of a node, from left to right, represent the different remainder results.
For example, each second-level node has three children. From left to right they represent: remainder 0 when divided by 3, remainder 1 when divided by 3, and remainder 2 when divided by 3.
The sequence of remainders of a key determines the path it takes through the tree.
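
The routing rule above can be sketched as a function that lists which child index a key selects at each level. The name `remainderPath` is a hypothetical helper, and 32-bit unsigned keys are an assumption made for this example.

```cpp
#include <array>
#include <cstdint>
#include <vector>

// Branching factor at each level: the first 10 primes.
constexpr std::array<std::uint32_t, 10> kPrimes = {2, 3, 5, 7, 11, 13, 17, 19, 23, 29};

// Returns the child index the key selects at each level:
// key % 2 at the root, key % 3 one level down, and so on.
std::vector<std::uint32_t> remainderPath(std::uint32_t key) {
    std::vector<std::uint32_t> path;
    for (std::uint32_t p : kPrimes) {
        path.push_back(key % p);
    }
    return path;
}
// remainderPath(100) yields 0, 1, 0, 2, 1, 9, 15, 5, 8, 13
```

A key only walks as far down this path as it needs to: it stops at the first node where it can be stored or where it is found.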

Node structure: the node's keyword (unique within the whole tree), the node's data object, a flag indicating whether the node is occupied (the keyword is considered valid only when the flag is true), and the node's array of child pointers.
Node structure of the hash tree:

    // KeyType and ValueType are defined by the application.
    struct node
    {
        KeyType      key;         // the keyword, unique within the whole tree
        ValueType    value;       // the data object stored with the key
        bool         occupied;    // true if the key is valid; false if the node is empty or deleted
        struct node* subnodes[1]; // subnodes[i] is the address of the i-th child node (the same variable-length array trick used for the skip list in an earlier post)
    };

(If all nodes were built up front, the computation time and disk space consumed would be enormous. In actual use, only the root node needs to be initialized before work can start. Child nodes are created as more data is inserted into the hash tree, so the hash tree is a dynamic structure like other trees.)
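
Insertion with on-demand node creation might look like the sketch below. It is not the original author's code: the `Node` layout (a `std::vector` of owning pointers instead of the `subnodes[1]` array), the 32-bit key type, the `int` value type, and the function name `insert` are all assumptions made for illustration. The sketch also walks the whole path before reusing an unoccupied node, so that a key already stored deeper in the tree is updated rather than duplicated.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

// Branching factor at depth 0, 1, ..., 9.
constexpr std::array<std::uint32_t, 10> kPrimes = {2, 3, 5, 7, 11, 13, 17, 19, 23, 29};

struct Node {
    std::uint32_t key = 0;
    int value = 0;
    bool occupied = false;
    std::vector<std::unique_ptr<Node>> children;  // empty until first needed
};

// Inserts the key, or updates its value if it is already present.
// Returns false only if the path is exhausted, which should not happen
// for 32-bit keys because the 10 primes distinguish more than 2^32 values.
bool insert(Node& root, std::uint32_t key, int value) {
    Node* node = &root;
    Node* freeSlot = nullptr;  // first unoccupied node seen on the path
    for (std::size_t depth = 0;; ++depth) {
        if (node->occupied && node->key == key) {
            node->value = value;  // key already stored: update in place
            return true;
        }
        if (!node->occupied && freeSlot == nullptr) freeSlot = node;
        if (depth == kPrimes.size()) break;  // no primes left
        const std::uint32_t p = kPrimes[depth];
        if (node->children.empty()) {
            if (freeSlot != nullptr) break;  // nothing deeper to search
            node->children.resize(p);        // allocate pointer array lazily
        }
        std::unique_ptr<Node>& child = node->children[key % p];
        if (!child) {
            if (freeSlot != nullptr) break;
            child = std::make_unique<Node>();  // create the child on demand
        }
        node = child.get();
    }
    if (freeSlot == nullptr) return false;
    freeSlot->key = key;
    freeSlot->value = value;
    freeSlot->occupied = true;
    return true;
}
```

Starting from an empty `Node root;`, the first inserted key lands in the root itself, and a later odd key lands in `root.children[1]`, since the root branches on the remainder modulo 2.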

Let's take the insertion of 10 random numbers as an example to illustrate the hash tree insertion process. The original post walks through it with a diagram.

Some readers may wonder what happens if collisions keep occurring. First of all, if the keywords are 32-bit integers, our 10-level hash tree can distinguish them completely, which is guaranteed by the prime number resolution theorem.

(We could actually place every key-value node at the tenth-level leaf nodes of the hash tree, since a full tenth level has enough nodes to cover all the integers. But then every non-leaf node would serve only as an index to the key-value nodes, which would make the tree structure huge and waste space.)

"To clarify: this figure is built from the consecutive primes starting at 2, so the number of subtrees per node at each level, from top to bottom, is 2, 3, 5, 7, 11, 13, 17, 19, 23, 29. Each node in the first level has 2 subtrees, each node in the second level has 3, and so on.

The number on each subtree is its index in the parent node's child pointer array."


Find

Looking up a node in a hash tree works like insertion: take the remainder of the keyword modulo each prime in sequence, and follow the branch selected by each remainder until the target node is found.
For example, the smallest hash tree finds a matching object among 4G (about 4 billion) objects in no more than 10 comparisons, so the cost is bounded by a constant. In practical applications, the range of primes is adjusted so that the number of comparisons is generally no more than 5. We can therefore trade off time against space according to our own needs.
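
A lookup along the remainder path can be sketched as follows. As before, the `Node` layout, the 32-bit key type, and the function name `find` are assumptions for illustration, not the original author's code. Note that the walk continues past unoccupied nodes, because a logically deleted node may still have live descendants.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

// Branching factor at depth 0, 1, ..., 9.
constexpr std::array<std::uint32_t, 10> kPrimes = {2, 3, 5, 7, 11, 13, 17, 19, 23, 29};

struct Node {
    std::uint32_t key = 0;
    int value = 0;
    bool occupied = false;
    std::vector<std::unique_ptr<Node>> children;  // empty if no child exists yet
};

// Returns the node holding the key, or nullptr if the key is absent.
// At most kPrimes.size() + 1 nodes are examined.
const Node* find(const Node& root, std::uint32_t key) {
    const Node* node = &root;
    for (std::size_t depth = 0; node != nullptr; ++depth) {
        if (node->occupied && node->key == key) {
            return node;
        }
        if (depth == kPrimes.size() || node->children.empty()) {
            return nullptr;  // path ends here without a match
        }
        // Follow the branch selected by the remainder at this depth.
        node = node->children[key % kPrimes[depth]].get();
    }
    return nullptr;  // the selected child was never created
}
```

The number of nodes visited is bounded by the number of primes in use, independent of how many keys are stored.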

Delete

Deleting a node from a hash tree is also very simple: the tree makes no structural adjustment on deletion.
Just find the node to delete and set its occupied flag to false (the node becomes an empty node, but it is not physically removed).
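
Logical deletion can be sketched in the same style. The `Node` layout and the function name `erase` are again assumptions for illustration. The only change made to the tree is clearing the `occupied` flag; the node and its children stay in place, so keys stored deeper on the same path remain reachable.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

// Branching factor at depth 0, 1, ..., 9.
constexpr std::array<std::uint32_t, 10> kPrimes = {2, 3, 5, 7, 11, 13, 17, 19, 23, 29};

struct Node {
    std::uint32_t key = 0;
    int value = 0;
    bool occupied = false;
    std::vector<std::unique_ptr<Node>> children;  // empty if no child exists yet
};

// Marks the node holding the key as unoccupied.
// Returns true if the key was found, false otherwise.
bool erase(Node& root, std::uint32_t key) {
    Node* node = &root;
    for (std::size_t depth = 0; node != nullptr; ++depth) {
        if (node->occupied && node->key == key) {
            node->occupied = false;  // logical delete only; no restructuring
            return true;
        }
        if (depth == kPrimes.size() || node->children.empty()) {
            return false;
        }
        node = node->children[key % kPrimes[depth]].get();
    }
    return false;
}
```

One consequence of this design is that memory is never reclaimed by deletion; an emptied node can only be reused by a later insertion whose path passes through it.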

Advantages

1. Simple Structure

2. Quick Search

3. Stable structure

As the deletion algorithm shows, a hash tree makes no structural adjustments when a node is deleted.

Disadvantages

Unordered: a hash tree does not keep its keys in sorted order.

Hash trees can be used widely wherever large volumes of data need fast matching, for example: database indexing systems, receipt matching for short messages, large-scale routing matching, and information filtering. A hash tree needs no extra balancing or anti-degeneration operations, so its efficiency is very good.
