Resolving hash table collisions with open addressing

Source: Internet
Author: User

In the previous blog post, we described how to resolve collisions with separate chaining (the chain address method). Here we introduce another approach: open addressing.

Basic idea: when the hash address H0 = hash(key) of a key collides with an occupied slot, another hash address H1 is generated from H0. If H1 still collides, H2 is generated from H0 in the same way, and so on, until a non-colliding address Hi is found and the element is stored there. The rehashing methods differ in how the increment sequence is chosen. There are four main kinds:


Linear probing

Quadratic probing

Pseudo-random probing

Double hashing
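
These variants differ only in the increment d_i that is added to the home address: H_i = (H0 + d_i) mod m, where m is the table length. The following is a minimal sketch of three of the four increment sequences (pseudo-random probing would draw d_i from a reproducible pseudo-random sequence instead). The function names probe_linear, probe_quadratic and probe_double, and the step parameter standing in for the result of a second hash function, are hypothetical names introduced for illustration; they are not part of the code in this post.

```c
#include <assert.h>

/* Linear probing: d_i = i, so the slots are h0, h0+1, h0+2, ... (mod m). */
unsigned int probe_linear(unsigned int h0, unsigned int i, unsigned int m)
{
    assert(m > 0);
    return (unsigned int)(((unsigned long long)(h0 % m) + i % m) % m);
}

/* Quadratic probing: d_i = +1, -1, +4, -4, +9, -9, ... (mod m).
 * i = 0 is the home slot; odd i adds k*k, even i subtracts k*k,
 * where k = (i + 1) / 2. */
unsigned int probe_quadratic(unsigned int h0, unsigned int i, unsigned int m)
{
    assert(m > 0);
    unsigned long long k = (i + 1ULL) / 2;
    unsigned int sq = (unsigned int)((k * k) % m);
    if (i % 2 == 1)
        return (unsigned int)(((unsigned long long)(h0 % m) + sq) % m);
    /* Add m before subtracting so the unsigned value cannot wrap around. */
    return (unsigned int)(((unsigned long long)(h0 % m) + m - sq) % m);
}

/* Double hashing: d_i = i * step, where step is produced by a second
 * hash function and must lie in 1..m-1 (coprime to m to reach every slot). */
unsigned int probe_double(unsigned int h0, unsigned int step,
                          unsigned int i, unsigned int m)
{
    assert(m > 1 && step > 0 && step < m);
    return (unsigned int)((h0 % m + (unsigned long long)(i % m) * step) % m);
}
```

With m = 11 and H0 = 5, for example, the quadratic sequence visits slots 5, 6, 4, 9, 1, ... while the linear sequence visits 5, 6, 7, 8, ...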


(i) Linear probing


The idea is simple: if the slot that the hash function maps to already holds data, search forward sequentially (wrapping around at the end of the table) until an empty slot is found and store the element there, or until the table turns out to be full. Note: the load factor, that is (number of elements) / (table length), must be <= 1.

Clustering

Definition: when collisions are handled with linear probing and positions i, i+1 and i+2 of the table are already occupied, the next key whose hash address is i, i+1, i+2 or i+3 ends up in position i+3. Records with different initial hash addresses therefore compete for the same subsequent address.

If the hash function is poor, or the load factor α is too large, clustering gets worse.
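
The effect can be observed by counting probes. The sketch below is a stand-alone illustration, not the implementation used in this post: it assumes non-negative int keys, a fixed table of 11 slots, and the hypothetical helper names table_clear and insert_linear.

```c
#include <assert.h>

#define SLOTS 11          /* assumed small table size for the demonstration */
#define FREE_SLOT (-1)    /* marks an unused slot; keys are assumed non-negative */

/* Reset every slot to unused. */
void table_clear(int table[SLOTS])
{
    for (int i = 0; i < SLOTS; i++)
        table[i] = FREE_SLOT;
}

/* Insert key with linear probing. Returns the number of slots examined,
 * or 0 if the table is full. */
int insert_linear(int table[SLOTS], int key)
{
    assert(key >= 0);
    int home = key % SLOTS;
    for (int probes = 1; probes <= SLOTS; probes++) {
        int pos = (home + probes - 1) % SLOTS;
        if (table[pos] == FREE_SLOT) {
            table[pos] = key;
            return probes;
        }
    }
    return 0;
}
```

Inserting 3, 14 and 25 (all with home address 3) takes 1, 2 and 3 probes and fills slots 3, 4 and 5. Keys 4 and 5 have different home addresses, yet each now needs 3 probes as well, because they run into the same occupied run of slots.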

We adapt the code from the chaining version. Each slot stores a status: EMPTY, ACTIVE or DELETED. Deletion only places a tombstone by setting the status to DELETED. When a new key is inserted, any slot that is not ACTIVE can be used; if the slot is a DELETED one, the memory of the old element must be released first, before the new element is stored.
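
Why a tombstone rather than simply resetting the slot to EMPTY? A lookup stops at the first EMPTY slot, so emptying a slot in the middle of a probe chain would hide every key stored after it. The following reduced sketch shows the rule with plain unsigned keys; the names NSLOTS, struct slot, slot_find, slot_insert and slot_remove are hypothetical and used only for this illustration (duplicate checks are omitted for brevity).

```c
#include <assert.h>
#include <stddef.h>

#define NSLOTS 7   /* assumed table size for the illustration */

enum slot_status { SLOT_EMPTY, SLOT_ACTIVE, SLOT_DELETED };

struct slot {
    enum slot_status status;
    unsigned int key;
};

/* Return the index of key, or -1 if absent.
 * A DELETED slot does not stop the search; an EMPTY slot does. */
int slot_find(const struct slot t[NSLOTS], unsigned int key)
{
    assert(t != NULL);
    unsigned int home = key % NSLOTS;
    for (unsigned int n = 0; n < NSLOTS; n++) {
        unsigned int pos = (home + n) % NSLOTS;
        if (t[pos].status == SLOT_EMPTY)
            return -1;                     /* end of the probe chain */
        if (t[pos].status == SLOT_ACTIVE && t[pos].key == key)
            return (int)pos;
    }
    return -1;                             /* scanned the whole table */
}

/* Store key in the first slot that is not ACTIVE (EMPTY or DELETED).
 * Returns the slot index, or -1 if the table is full. */
int slot_insert(struct slot t[NSLOTS], unsigned int key)
{
    assert(t != NULL);
    unsigned int home = key % NSLOTS;
    for (unsigned int n = 0; n < NSLOTS; n++) {
        unsigned int pos = (home + n) % NSLOTS;
        if (t[pos].status != SLOT_ACTIVE) {
            t[pos].status = SLOT_ACTIVE;
            t[pos].key = key;
            return (int)pos;
        }
    }
    return -1;
}

/* Tombstone the slot holding key. Returns 1 if it was present, else 0. */
int slot_remove(struct slot t[NSLOTS], unsigned int key)
{
    int pos = slot_find(t, key);
    if (pos < 0)
        return 0;
    t[pos].status = SLOT_DELETED;
    return 1;
}
```

Keys 1 and 8 share home address 1, so 8 lands in slot 2. After 1 is removed, 8 is still found because the search steps over the tombstone in slot 1, and a later insert of 15 reuses that slot.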

common.h

#ifndef _COMMON_H_
#define _COMMON_H_

#include <unistd.h>
#include <sys/types.h>
#include <stdlib.h>
#include <stdio.h>
#include <string.h>

/* Print the system error message for m and terminate the program. */
#define ERR_EXIT(m) \
    do { \
        perror(m); \
        exit(EXIT_FAILURE); \
    } while (0)

#endif /* _COMMON_H_ */

hash.h

#ifndef _HASH_H_
#define _HASH_H_

typedef struct hash hash_t;
typedef unsigned int (*hashfunc_t)(unsigned int, void *);

hash_t *hash_alloc(unsigned int buckets, hashfunc_t hash_func);
void hash_free(hash_t *hash);
void *hash_lookup_entry(hash_t *hash, void *key, unsigned int key_size);
void hash_add_entry(hash_t *hash, void *key, unsigned int key_size,
                    void *value, unsigned int value_size);
void hash_free_entry(hash_t *hash, void *key, unsigned int key_size);

#endif /* _HASH_H_ */

hash.c

#include "hash.h"
#include "common.h"
#include <assert.h>

typedef enum entry_status {
    EMPTY,
    ACTIVE,
    DELETED
} entry_status_t;

typedef struct hash_node {
    enum entry_status status;
    void *key;
    void *value;
} hash_node_t;

struct hash {
    unsigned int buckets;
    hashfunc_t hash_func;
    hash_node_t *nodes;
};

unsigned int hash_get_bucket(hash_t *hash, void *key);
hash_node_t *hash_get_node_by_key(hash_t *hash, void *key, unsigned int key_size);

hash_t *hash_alloc(unsigned int buckets, hashfunc_t hash_func)
{
    hash_t *hash = (hash_t *)malloc(sizeof(hash_t));
    assert(hash != NULL);
    hash->buckets = buckets;
    hash->hash_func = hash_func;
    size_t size = buckets * sizeof(hash_node_t);
    hash->nodes = (hash_node_t *)malloc(size);
    assert(hash->nodes != NULL);
    /* Zeroing marks every slot EMPTY with NULL key and value */
    memset(hash->nodes, 0, size);
    printf("The hash table has allocate.\n");
    return hash;
}

void hash_free(hash_t *hash)
{
    unsigned int buckets = hash->buckets;
    unsigned int i;
    for (i = 0; i < buckets; i++) {
        /* ACTIVE and DELETED slots both still own their memory */
        if (hash->nodes[i].status != EMPTY) {
            free(hash->nodes[i].key);
            free(hash->nodes[i].value);
        }
    }
    free(hash->nodes);
    free(hash);
    printf("The hash table has free.\n");
}

void *hash_lookup_entry(hash_t *hash, void *key, unsigned int key_size)
{
    hash_node_t *node = hash_get_node_by_key(hash, key, key_size);
    if (node == NULL) {
        return NULL;
    }
    return node->value;
}

void hash_add_entry(hash_t *hash, void *key, unsigned int key_size,
                    void *value, unsigned int value_size)
{
    if (hash_lookup_entry(hash, key, key_size)) {
        fprintf(stderr, "duplicate hash key\n");
        return;
    }

    unsigned int bucket = hash_get_bucket(hash, key);
    unsigned int i = bucket;
    /* The slot holds a live entry: probe forward */
    while (hash->nodes[i].status == ACTIVE) {
        i = (i + 1) % hash->buckets;
        if (i == bucket) {
            /* Wrapped around without finding a slot: the table is full */
            return;
        }
    }

    hash->nodes[i].status = ACTIVE;
    if (hash->nodes[i].key) {
        /* Release the memory of the tombstoned entry */
        free(hash->nodes[i].key);
    }
    hash->nodes[i].key = malloc(key_size);
    memcpy(hash->nodes[i].key, key, key_size);
    if (hash->nodes[i].value) {
        /* Release the memory of the tombstoned entry */
        free(hash->nodes[i].value);
    }
    hash->nodes[i].value = malloc(value_size);
    memcpy(hash->nodes[i].value, value, value_size);
}

void hash_free_entry(hash_t *hash, void *key, unsigned int key_size)
{
    hash_node_t *node = hash_get_node_by_key(hash, key, key_size);
    if (node == NULL)
        return;
    /* Tombstone: only reset the status flag */
    node->status = DELETED;
}

unsigned int hash_get_bucket(hash_t *hash, void *key)
{
    /* Return the hash address */
    unsigned int bucket = hash->hash_func(hash->buckets, key);
    if (bucket >= hash->buckets) {
        fprintf(stderr, "bad bucket lookup\n");
        exit(EXIT_FAILURE);
    }
    return bucket;
}

hash_node_t *hash_get_node_by_key(hash_t *hash, void *key, unsigned int key_size)
{
    unsigned int bucket = hash_get_bucket(hash, key);
    unsigned int i = bucket;
    while (hash->nodes[i].status != EMPTY &&
           memcmp(key, hash->nodes[i].key, key_size) != 0) {
        i = (i + 1) % hash->buckets;
        if (i == bucket) {
            /* Probed a full circle: not found, and the table is full */
            return NULL;
        }
    }
    /* The key matched, but the entry must also still be live */
    if (hash->nodes[i].status == ACTIVE) {
        return &(hash->nodes[i]);
    }
    /* Reaching this point means slot i is EMPTY or DELETED */
    return NULL;
}

main.c (test code)

#include "hash.h"
#include "common.h"

typedef struct stu {
    char sno[5];
    char name[32];
    int age;
} stu_t;

typedef struct stu2 {
    int sno;
    char name[32];
    int age;
} stu2_t;

unsigned int hash_str(unsigned int buckets, void *key)
{
    char *sno = (char *)key;
    unsigned int index = 0;
    while (*sno) {
        index = *sno + 4 * index;
        sno++;
    }
    return index % buckets;
}

unsigned int hash_int(unsigned int buckets, void *key)
{
    int *sno = (int *)key;
    return (*sno) % buckets;
}

int main(void)
{
    stu2_t stu_arr[] = {
        { 1234, "AAAA", 20 },   /* age is garbled in the source; 20 is a placeholder */
        { 4568, "BBBB", 23 },
        { 6729, "AAAA", 19 }
    };

    /* The bucket count is missing in the source; 256 is an assumed value */
    hash_t *hash = hash_alloc(256, hash_int);

    int size = sizeof(stu_arr) / sizeof(stu_arr[0]);
    int i;
    for (i = 0; i < size; i++) {
        hash_add_entry(hash, &(stu_arr[i].sno), sizeof(stu_arr[i].sno),
                       &stu_arr[i], sizeof(stu_arr[i]));
    }

    int sno = 4568;
    stu2_t *s = (stu2_t *)hash_lookup_entry(hash, &sno, sizeof(sno));
    if (s) {
        printf("%d %s %d\n", s->sno, s->name, s->age);
    } else {
        printf("not found\n");
    }

    sno = 1234;
    hash_free_entry(hash, &sno, sizeof(sno));
    s = (stu2_t *)hash_lookup_entry(hash, &sno, sizeof(sno));
    if (s) {
        printf("%d %s %d\n", s->sno, s->name, s->age);
    } else {
        printf("not found\n");
    }

    hash_free(hash);
    return 0;
}

Output:

The hash table has allocate.
4568 BBBB 23
not found
The hash table has free.

(ii) Quadratic probing

To reduce clustering and lower the average number of probes needed to complete a search, quadratic probing can be used.

It can be proved that when the table length (hash->buckets) is prime and the load factor of the table does not exceed 0.5, a new entry can always be inserted and no position is probed twice.
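
The property can be checked empirically for small tables. The sketch below tests the classical increment sequence d_i = i*i (not the alternating +k*k / -k*k variant used by the code in this post) and counts how many of the first probes revisit a slot. The function name repeated_probes is hypothetical, introduced only for this illustration.

```c
#include <assert.h>
#include <stdlib.h>

/* Count how many of the probes h0 + i*i (mod m), for i = 0..limit-1,
 * land on a slot that an earlier probe already visited.
 * Returns -1 if the scratch buffer cannot be allocated. */
int repeated_probes(unsigned int h0, unsigned int m, unsigned int limit)
{
    assert(m > 0);
    unsigned char *seen = calloc(m, 1);
    if (seen == NULL)
        return -1;
    int repeats = 0;
    for (unsigned int i = 0; i < limit; i++) {
        unsigned int pos =
            (unsigned int)((h0 % m + (unsigned long long)i * i) % m);
        if (seen[pos])
            repeats++;
        seen[pos] = 1;
    }
    free(seen);
    return repeats;
}
```

For the prime length 11, the first (11 + 1) / 2 = 6 probes hit six different slots, so a table that is at most half full always offers a free one among them. For the non-prime length 12, the same six probes already revisit two slots.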

The implementation is similar to the linear probing code above; only the probing method differs, although the data structures change slightly. In addition, the table has to be rehashed once the load factor α reaches 1/2: the table length is doubled and then rounded up to the next prime, a new table is created, and the contents of the old table are copied into it. The hash_t struct therefore needs a size member. For the same reason, in order to copy the old table contents, the hash_node_t struct must store the sizes of *key and *value.

hash.c

#include "hash.h"
#include "common.h"
#include <assert.h>

typedef enum entry_status {
    EMPTY,
    ACTIVE,
    DELETED
} entry_status_t;

typedef struct hash_node {
    enum entry_status status;
    void *key;
    unsigned int key_size;      /* needed when copying into a new table */
    void *value;
    unsigned int value_size;    /* needed when copying into a new table */
} hash_node_t;

struct hash {
    unsigned int buckets;
    unsigned int size;          /* running count; at buckets / 2 a new table is built */
    hashfunc_t hash_func;
    hash_node_t *nodes;
};

unsigned int next_prime(unsigned int n);
int is_prime(unsigned int n);

unsigned int hash_get_bucket(hash_t *hash, void *key);
hash_node_t *hash_get_node_by_key(hash_t *hash, void *key, unsigned int key_size);

hash_t *hash_alloc(unsigned int buckets, hashfunc_t hash_func)
{
    hash_t *hash = (hash_t *)malloc(sizeof(hash_t));
    assert(hash != NULL);
    hash->buckets = buckets;
    hash->size = 0;
    hash->hash_func = hash_func;
    size_t size = buckets * sizeof(hash_node_t);
    hash->nodes = (hash_node_t *)malloc(size);
    assert(hash->nodes != NULL);
    memset(hash->nodes, 0, size);
    printf("The hash table has allocate.\n");
    return hash;
}

void hash_free(hash_t *hash)
{
    unsigned int buckets = hash->buckets;
    unsigned int i;
    for (i = 0; i < buckets; i++) {
        if (hash->nodes[i].status != EMPTY) {
            free(hash->nodes[i].key);
            free(hash->nodes[i].value);
        }
    }
    free(hash->nodes);
    free(hash);
    printf("The hash table has free.\n");
}

void *hash_lookup_entry(hash_t *hash, void *key, unsigned int key_size)
{
    hash_node_t *node = hash_get_node_by_key(hash, key, key_size);
    if (node == NULL) {
        return NULL;
    }
    return node->value;
}

void hash_add_entry(hash_t *hash, void *key, unsigned int key_size,
                    void *value, unsigned int value_size)
{
    if (hash_lookup_entry(hash, key, key_size)) {
        fprintf(stderr, "duplicate hash key\n");
        return;
    }

    unsigned int bucket = hash_get_bucket(hash, key);
    unsigned int i = bucket;
    unsigned int k = 1;
    int odd = 1;
    /* Probe bucket + 1, bucket - 1, bucket + 4, bucket - 4, ... (mod buckets) */
    while (hash->nodes[i].status == ACTIVE) {
        unsigned int sq = (unsigned int)(((unsigned long long)k * k) % hash->buckets);
        if (odd) {
            i = (bucket + sq) % hash->buckets;
            odd = 0;
        } else {
            /* Add buckets before subtracting so the unsigned index cannot wrap */
            i = (bucket + hash->buckets - sq) % hash->buckets;
            odd = 1;
            ++k;
        }
    }

    hash->nodes[i].status = ACTIVE;
    if (hash->nodes[i].key) {
        /* Release the memory of the tombstoned entry */
        free(hash->nodes[i].key);
    }
    hash->nodes[i].key = malloc(key_size);
    hash->nodes[i].key_size = key_size;     /* save key_size */
    memcpy(hash->nodes[i].key, key, key_size);
    if (hash->nodes[i].value) {
        /* Release the memory of the tombstoned entry */
        free(hash->nodes[i].value);
    }
    hash->nodes[i].value = malloc(value_size);
    hash->nodes[i].value_size = value_size; /* save value_size */
    memcpy(hash->nodes[i].value, value, value_size);

    if (++(hash->size) < hash->buckets / 2)
        return;

    /* Searches need not handle a full table, provided inserts keep the
     * load factor at or below 0.5. Once that bound is reached, the table
     * length is doubled and the table is rebuilt. */
    unsigned int old_buckets = hash->buckets;
    hash->buckets = next_prime(2 * old_buckets);
    hash_node_t *p = hash->nodes;
    size_t size;
    hash->size = 0;     /* recounted from 0 while reinserting */
    size = sizeof(hash_node_t) * hash->buckets;
    hash->nodes = (hash_node_t *)malloc(size);
    assert(hash->nodes != NULL);
    memset(hash->nodes, 0, size);

    for (i = 0; i < old_buckets; i++) {
        if (p[i].status == ACTIVE) {
            hash_add_entry(hash, p[i].key, p[i].key_size,
                           p[i].value, p[i].value_size);
        }
    }

    for (i = 0; i < old_buckets; i++) {
        /* ACTIVE or DELETED */
        if (p[i].key) {
            free(p[i].key);
        }
        if (p[i].value) {
            free(p[i].value);
        }
    }
    free(p);    /* release the old table */
}

void hash_free_entry(hash_t *hash, void *key, unsigned int key_size)
{
    hash_node_t *node = hash_get_node_by_key(hash, key, key_size);
    if (node == NULL)
        return;
    /* Tombstone */
    node->status = DELETED;
}

unsigned int hash_get_bucket(hash_t *hash, void *key)
{
    unsigned int bucket = hash->hash_func(hash->buckets, key);
    if (bucket >= hash->buckets) {
        fprintf(stderr, "bad bucket lookup\n");
        exit(EXIT_FAILURE);
    }
    return bucket;
}

hash_node_t *hash_get_node_by_key(hash_t *hash, void *key, unsigned int key_size)
{
    unsigned int bucket = hash_get_bucket(hash, key);
    unsigned int i = 1;
    unsigned int pos = bucket;
    int odd = 1;
    while (hash->nodes[pos].status != EMPTY &&
           memcmp(key, hash->nodes[pos].key, key_size) != 0) {
        unsigned int sq = (unsigned int)(((unsigned long long)i * i) % hash->buckets);
        if (odd) {
            pos = (bucket + sq) % hash->buckets;
            odd = 0;
        } else {
            /* Add buckets before subtracting so the unsigned index cannot wrap */
            pos = (bucket + hash->buckets - sq) % hash->buckets;
            odd = 1;
            i++;
        }
    }

    if (hash->nodes[pos].status == ACTIVE) {
        return &(hash->nodes[pos]);
    }

    /* Reaching this point means slot pos is EMPTY or tombstoned.
     * It can be proved that when hash->buckets is prime and the load factor
     * does not exceed 0.5, a new entry can always be inserted and no position
     * is probed twice. So as long as at least half of the table is empty,
     * a full table is never an issue. */
    return NULL;
}

unsigned int next_prime(unsigned int n)
{
    /* An even number is not prime */
    if (n % 2 == 0) {
        n++;
    }
    for (; !is_prime(n); n += 2)
        ;   /* not prime, keep searching */
    return n;
}

/* Assumes n is odd, as guaranteed by next_prime */
int is_prime(unsigned int n)
{
    unsigned int i;
    for (i = 3; i * i <= n; i += 2) {
        if (n % i == 0) {
            /* Not prime */
            return 0;
        }
    }
    /* Prime */
    return 1;
}

(iii) Pseudo-random probing


(iv) Double hashing


Here is a performance analysis of the various methods under certain data:


We can draw a general conclusion:

Collisions are best handled with separate chaining (the chain address method), and the division (remainder) method gives the best-performing hash function; in general, a hash function that is matched to the characteristics of the keys performs better.
