Hash (2): Linear probing and double hashing

Source: Internet
Author: User

A brief description of hashing and the chain address method (separate chaining)
Methods for resolving hash conflicts:

1. Linear probing

If we can predict the number of elements that will be stored in the table, and we have enough memory to hold all the keys with room to spare, the chain address method (separate chaining) is not worth its overhead. Instead, we rely on the empty slots in the table itself to resolve conflicts: the table length M is chosen to be greater than the number of elements N. Methods of this kind are called open addressing, and the simplest open addressing method is linear probing.


This symbol table implementation stores the elements in a hash table whose length is twice the expected number of elements.

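void HashTableInit(int max)
{
    N = 0;                       // number of stored elements
    M = 2 * max;                 // table length: twice the expected element count
    hash_table = new Item[M];
    for (int i = 0; i < M; i++)
        hash_table[i] = NULLItem;
}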

When inserting:
(1) When a conflict occurs, that is, the position to be inserted is already occupied, we check the next position in the table.
(2) If the next position is also occupied, we keep checking the following positions until an empty slot is found, and then insert the element there.

void HashTableInsert(Item item)
{
    int hash_key = Hash(GetKey(item), M);
    while (hash_table[hash_key] != NULLItem) {
        hash_key = (hash_key+1) % M;
    }
    hash_table[hash_key] = item;
    N++;
}

When searching, we probe the table positions in the same order and check whether the element at each position matches the search key. In linear probing, each probe has 3 possible results:
(1) Search hit (the key of the element at the current position matches the search key; stop searching),
(2) Search miss (the position is empty; stop searching),
(3) Mismatch (the position is non-empty but the key does not match; continue with the next position).

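Item HashTableSearch(KeyItem v)
{
    int hash_key = Hash(v, M);
    // An empty position means the search has failed
    while (hash_table[hash_key] != NULLItem) {
        if (EQ(GetKey(hash_table[hash_key]), v))   // search hit
            break;
        hash_key = (hash_key+1) % M;               // mismatch: probe the next position
    }
    return hash_table[hash_key];
}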

When deleting from a linear probing table, it is not enough to remove only the element with the given key, because the empty slot left behind would make searches for the elements after it fail (an empty slot terminates the search). Therefore, every element between the deleted position and the next empty slot to its right must be re-hashed into the table.

void HashTableDelete(Item item)
{
    int hash_key = Hash(GetKey(item), M);
    // Find the position to delete
    while (hash_table[hash_key] != NULLItem) {
        if (EQ(GetKey(item), GetKey(hash_table[hash_key])))
            break;
        else
            hash_key = (hash_key+1) % M;
    }
    if (hash_table[hash_key] == NULLItem)
        return;
    hash_table[hash_key] = NULLItem;
    N--;
    // Re-hash every element between the deleted position and the next empty slot to its right
    for (int j = (hash_key+1) % M; hash_table[j] != NULLItem; j = (j+1) % M) {
        Item moved = hash_table[j];
        hash_table[j] = NULLItem;   // clear the slot first so the element can move back toward its home position
        N--;                        // HashTableInsert increments N again
        HashTableInsert(moved);
    }
}
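The routines above depend on definitions (Item, Hash, EQ, NULLItem) that the article does not show. The following is a minimal self-contained sketch of the same three operations, under assumptions that are not in the original: keys are plain non-negative ints, -1 stands in for NULLItem, the hash function is key % M, the table length is fixed at 16, and all lp_ names are hypothetical.

```c
#define LP_M 16          /* table length M (fixed for this sketch) */
#define LP_EMPTY (-1)    /* stands in for NULLItem; keys must be non-negative */

static int lp_table[LP_M];
static int lp_n;         /* number of stored keys, N */

static int lp_hash(int key) { return key % LP_M; }

void lp_init(void)
{
    lp_n = 0;
    for (int i = 0; i < LP_M; i++)
        lp_table[i] = LP_EMPTY;
}

/* Insert: probe forward from the home slot until an empty slot is found.
   The caller must keep the table from filling up (N < M). */
void lp_insert(int key)
{
    int i = lp_hash(key);
    while (lp_table[i] != LP_EMPTY)
        i = (i + 1) % LP_M;
    lp_table[i] = key;
    lp_n++;
}

/* Search: returns the slot index on a hit, -1 on a miss. */
int lp_search(int key)
{
    int i = lp_hash(key);
    while (lp_table[i] != LP_EMPTY) {
        if (lp_table[i] == key)
            return i;
        i = (i + 1) % LP_M;
    }
    return -1;
}

/* Delete: clear the slot, then re-insert the rest of the cluster. */
void lp_delete(int key)
{
    int i = lp_search(key);
    if (i < 0)
        return;
    lp_table[i] = LP_EMPTY;
    lp_n--;
    for (int j = (i + 1) % LP_M; lp_table[j] != LP_EMPTY; j = (j + 1) % LP_M) {
        int moved = lp_table[j];
        lp_table[j] = LP_EMPTY;   /* free the slot before re-inserting */
        lp_n--;                   /* lp_insert counts the key again */
        lp_insert(moved);
    }
}
```

For example, keys 1, 17 and 33 all hash to slot 1 and end up in slots 1, 2 and 3. Deleting 17 empties slot 2, and the re-hash step moves 33 from slot 3 back to slot 2 so that it can still be found.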
Performance analysis

The performance of open addressing depends on α = N/M, the fraction of table positions that are occupied, which is called the load factor.
For a sparse table (small α), we expect to find an element within a few probes, but for a table that is close to full (large α), a search may need a large number of probes. Therefore we do not allow the table to become nearly full, in order to avoid overly long searches.
In linear probing, elements tend to gather into contiguous runs of occupied positions. This is called clustering, and it slows searches down. The average cost depends on how the clusters formed during insertion: the longer the cluster, the more probes it takes to determine whether a search succeeds (match) or fails (empty slot).
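To put rough numbers on this, the sketch below evaluates the classical approximations for linear probing that are usually attributed to Knuth: about ½(1 + 1/(1−α)) probes for a search hit and ½(1 + 1/(1−α)²) for a search miss. These formulas are not stated in this article and are brought in here only as an illustration; the function names are hypothetical.

```c
/* Approximate probes for a search hit in linear probing at load factor alpha
   (0 <= alpha < 1). Classical approximation, not derived in this article. */
double lp_probes_hit(double alpha)
{
    return 0.5 * (1.0 + 1.0 / (1.0 - alpha));
}

/* Approximate probes for a search miss (or an insertion) in linear probing. */
double lp_probes_miss(double alpha)
{
    double d = 1.0 - alpha;
    return 0.5 * (1.0 + 1.0 / (d * d));
}
```

At α = 0.5 these give about 1.5 probes for a hit and 2.5 for a miss; at α = 0.9 they give about 5.5 and 50.5, which is why the table should not be allowed to get close to full.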

2. Double hashing

With linear probing, when clustering is severe or the table is close to full, searching for a key often means checking many unrelated items (elements that were inserted earlier and happen to sit in front of the one that matches the search key). Double hashing was proposed to reduce clustering. Its basic strategy is the same as linear probing, with one difference: after a conflict it does not check each following position in turn, but uses a second hash function to produce a fixed increment for the probe sequence. (Skipping positions during search and insertion keeps clusters smaller.)

Suppose the second hash function returns the value t:
- Linear probing: check the positions after the conflicting position one by one
- Double hashing: check every t-th position

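Note: The second hash function must be chosen carefully so that it satisfies these conditions:
(1) Its value must never be 0 (an increment of 0 would probe the same position forever).
(2) Its value must be relatively prime to the table length M (otherwise the probe sequence visits only part of the table).
A common choice: #define Hash2(v) ((v % 97) + 1)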
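The requirement that the increment share no common factor with the table length can be checked directly: a probe sequence with increment step in a table of length M reaches only M / gcd(M, step) distinct slots. The sketch below counts the slots reached; the function names and the example sizes are hypothetical and chosen only for illustration.

```c
/* Greatest common divisor by Euclid's algorithm. */
int ds_gcd(int a, int b)
{
    while (b != 0) {
        int r = a % b;
        a = b;
        b = r;
    }
    return a;
}

/* Count the distinct slots visited when stepping by `step` from `start`
   in a table of length m, stopping when the sequence returns to its start. */
int slots_reached(int m, int start, int step)
{
    int first = start % m;
    int i = first;
    int count = 0;
    do {
        count++;
        i = (i + step) % m;
    } while (i != first);
    return count;
}
```

With M = 10 and an increment of 5, only 2 slots are ever probed, so an insertion can loop forever even though the table has free space. With a prime length such as M = 11, the same increment reaches all 11 slots.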

2.1 Searching and inserting

void HashTableInsert(Item item)
{
    int hash_key = Hash(GetKey(item), M);
    int hash_key2 = Hash2(GetKey(item));   // fixed increment from the second hash function
    while (hash_table[hash_key] != NULLItem) {
        // Linear probing would use: hash_key = (hash_key+1) % M;
        hash_key = (hash_key+hash_key2) % M;
    }
    hash_table[hash_key] = item;
    N++;
}

Item HashTableSearch(KeyItem v)
{
    int hash_key = Hash(v, M);
    int hash_key2 = Hash2(v);
    while (hash_table[hash_key] != NULLItem) {
        if (EQ(GetKey(hash_table[hash_key]), v))
            break;
        hash_key = (hash_key+hash_key2) % M;
    }
    return hash_table[hash_key];
}

2.2 Delete

If deletion in a double hashing table simply reused the linear probing algorithm, it would degrade the performance of the table, because the deleted key may lie on the probe sequences of keys anywhere in the table rather than within one contiguous cluster. The workaround is to replace the deleted element with a sentinel item, which marks the position as occupied but never matches any search key.
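A minimal sketch of sentinel-based deletion follows. The assumptions are not from the original: keys are distinct non-negative ints, -1 marks a never-used slot, -2 is the sentinel, the table length is the prime 13, and the second hash is (key % 11) + 1. All dh_ names are hypothetical. The sketch also assumes at least one never-used slot always remains; since sentinels accumulate, a real table would need to be rebuilt from time to time.

```c
#define DH_M 13            /* table length; prime, so every step 1..11 is coprime with it */
#define DH_EMPTY (-1)      /* never-used slot, stands in for NULLItem */
#define DH_DELETED (-2)    /* sentinel left behind by a deletion */

static int dh_table[DH_M];

static int dh_hash(int key)  { return key % DH_M; }
static int dh_hash2(int key) { return (key % 11) + 1; }   /* 1..11, never 0 */

void dh_init(void)
{
    for (int i = 0; i < DH_M; i++)
        dh_table[i] = DH_EMPTY;
}

/* Search: a sentinel counts as occupied, so probing continues past it.
   Returns the slot index on a hit, -1 on a miss. */
int dh_search(int key)
{
    int i = dh_hash(key);
    int step = dh_hash2(key);
    while (dh_table[i] != DH_EMPTY) {
        if (dh_table[i] == key)      /* the sentinel never equals a valid key */
            return i;
        i = (i + step) % DH_M;
    }
    return -1;
}

/* Insert: the first never-used or sentinel slot on the probe sequence is taken.
   Assumes the key is not already in the table. */
void dh_insert(int key)
{
    int i = dh_hash(key);
    int step = dh_hash2(key);
    while (dh_table[i] != DH_EMPTY && dh_table[i] != DH_DELETED)
        i = (i + step) % DH_M;
    dh_table[i] = key;
}

/* Delete: overwrite the element with the sentinel instead of emptying the slot. */
void dh_delete(int key)
{
    int i = dh_search(key);
    if (i >= 0)
        dh_table[i] = DH_DELETED;
}
```

Keys 5, 148 and 291 share both hash values in this sketch, so they land in slots 5, 11 and 4. After deleting 148, slot 11 holds the sentinel and a search for 291 still passes through it to reach slot 4.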

2.3 Performance Analysis

To ensure that the average cost of all searches is less than t probes, the load factor must be kept below 1 − 1/√t for linear probing and below 1 − 1/t for double hashing.
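As a quick worked check, reading the two bounds as 1 − 1/√t for linear probing and 1 − 1/t for double hashing, and taking t = 4 and t = 9 probes (values chosen only for illustration):

```latex
\begin{align*}
t = 4:&\quad \alpha_{\text{linear}} < 1 - \frac{1}{\sqrt{4}} = 0.5,
      &\alpha_{\text{double}} &< 1 - \frac{1}{4} = 0.75 \\
t = 9:&\quad \alpha_{\text{linear}} < 1 - \frac{1}{\sqrt{9}} \approx 0.667,
      &\alpha_{\text{double}} &< 1 - \frac{1}{9} \approx 0.889
\end{align*}
```

For the same probe budget, double hashing tolerates a noticeably fuller table than linear probing.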

3. Dynamic Hash Table

Because the performance of the table degrades as the number of keys in it grows, one solution is to double the table length when the table becomes too full and then re-insert all the elements. (This is an infrequent operation, so its cost is acceptable.)

If the table also supports the delete operation of the ADT, it is worth halving the table length as the number of elements shrinks. Note, however, that the thresholds for doubling and halving must be different. For example, if the table is doubled when it is half full, it should be halved when it is 1/8 full.

Changing the table length dynamically handles changes in the number of elements gracefully. The disadvantage is the cost of re-hashing and memory allocation each time the table grows or shrinks.
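The doubling and halving policy can be sketched on top of a linear probing table as follows. The assumptions are not from the original: keys are distinct non-negative ints, -1 marks an empty slot, the hash function is key % M, the table never shrinks below 4 slots, and the DynTable type and dy_ names are hypothetical.

```c
#include <stdlib.h>

#define DY_EMPTY (-1)   /* stands in for NULLItem; keys must be non-negative */
#define DY_MIN_M 4      /* smallest table length this sketch will shrink to */

typedef struct {
    int *slots;
    int m;              /* table length M */
    int n;              /* number of stored keys N */
} DynTable;

static void dy_alloc(DynTable *t, int m)
{
    t->slots = malloc((size_t)m * sizeof *t->slots);
    if (t->slots == NULL)
        abort();
    t->m = m;
    t->n = 0;
    for (int i = 0; i < m; i++)
        t->slots[i] = DY_EMPTY;
}

/* Plain linear probing insert with no resize check. */
static void dy_put(DynTable *t, int key)
{
    int i = key % t->m;
    while (t->slots[i] != DY_EMPTY)
        i = (i + 1) % t->m;
    t->slots[i] = key;
    t->n++;
}

/* Allocate a table of the new length and re-insert every key. */
static void dy_resize(DynTable *t, int new_m)
{
    int *old = t->slots;
    int old_m = t->m;
    dy_alloc(t, new_m);
    for (int i = 0; i < old_m; i++)
        if (old[i] != DY_EMPTY)
            dy_put(t, old[i]);
    free(old);
}

void dy_init(DynTable *t) { dy_alloc(t, DY_MIN_M); }

/* Returns the slot index of key, or -1 if it is absent. */
int dy_find(const DynTable *t, int key)
{
    int i = key % t->m;
    while (t->slots[i] != DY_EMPTY) {
        if (t->slots[i] == key)
            return i;
        i = (i + 1) % t->m;
    }
    return -1;
}

void dy_insert(DynTable *t, int key)
{
    if (2 * (t->n + 1) > t->m)                 /* would pass half full: double first */
        dy_resize(t, 2 * t->m);
    dy_put(t, key);
}

void dy_delete(DynTable *t, int key)
{
    int i = dy_find(t, key);
    if (i < 0)
        return;
    t->slots[i] = DY_EMPTY;
    t->n--;
    /* Re-insert the rest of the cluster, as in the fixed-size table. */
    for (int j = (i + 1) % t->m; t->slots[j] != DY_EMPTY; j = (j + 1) % t->m) {
        int moved = t->slots[j];
        t->slots[j] = DY_EMPTY;
        t->n--;
        dy_put(t, moved);
    }
    if (t->m > DY_MIN_M && 8 * t->n <= t->m)   /* one-eighth full: halve */
        dy_resize(t, t->m / 2);
}
```

Starting from 4 slots, inserting the keys 1 to 5 grows the table to 16 slots, and deleting 5, 4 and 3 shrinks it back to 8. Because the two thresholds are far apart, a sequence that alternates insertions and deletions around one size does not trigger a resize on every operation.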

4. Overview of hashing
    • Linear probing is the fastest of the three (provided there is enough memory to keep the table sparse)
    • Double hashing makes the most efficient use of memory (at the extra cost of computing the second hash value)
    • The chain address method (separate chaining) is the easiest to implement (assuming a good memory allocator is available), especially for delete operations; for tables of fixed length, it is usually the method chosen.

How to choose:
- Whether to choose linear probing or double hashing depends mainly on the cost of computing the hash function and on the load factor. When α is small, either works, although for long keys the cost of computing a second hash function counts against double hashing. When the load factor α is close to 1, double hashing performs better than linear probing.
- The chain address method requires additional memory to store the links, although in some symbol table applications the elements already have link fields allocated. It is not as fast as open addressing, but it is still much faster than sequential search.
- Dynamic hash tables are the method of choice when search is the primary operation and the number of keys cannot be predicted accurately.

Reference: "Algorithms: C Language Implementation", pp. 388-401
