Data Structure Foundation (18)--design and implementation of hash table

Last Update:2015-01-13 Source: Internet

Author: User

Tags rehash


Hash Table

Based on a chosen hash function H(key) and a chosen method of handling conflicts, map a set of keywords onto a finite, contiguous set of addresses (an interval), and use the "image" of each keyword in the address set as the storage location of the corresponding record in the table. A lookup table constructed this way is called a "hash table".

Methods for constructing hash functions

1. Direct addressing method (array)

The hash function is a linear function of the keyword: H(key) = key or H(key) = a*key + b

This method is only appropriate when the size of the address set equals the size of the keyword set.

2. Digit analysis method

Assume that each keyword in the keyword set consists of s digits (u1, u2, ..., us). Analyze the whole keyword set and extract the digit positions that are evenly distributed, or a combination of them, as the address.

This method is only suitable when the frequency with which each digit appears in each position of the keywords can be estimated beforehand.

3. Mid-square method

Take the middle digits of the square of the keyword as the storage address. Squaring the keyword serves to "widen the differences", and the middle digits of the square are influenced by every digit of the keyword.

This method is suitable when certain digits repeat with high frequency in every position of the keyword.
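
As a rough illustration of the mid-square idea, the sketch below squares the key and keeps a window of decimal digits from the middle of the square. The function name midSquareHash and the digits parameter are assumptions made for this example, not part of the original article.

```cpp
#include <cassert>
#include <cstdint>

// Mid-square hash (illustrative sketch): square the key, then keep `digits`
// decimal digits taken from the middle of the square.
std::uint64_t midSquareHash(std::uint32_t key, int digits)
{
    assert(digits > 0);
    std::uint64_t square = static_cast<std::uint64_t>(key) * key;

    // Count the decimal digits of the square.
    int total = 1;
    for (std::uint64_t t = square; t >= 10; t /= 10)
        ++total;

    // The square is already no wider than the requested window.
    if (digits >= total)
        return square;

    // Drop the low-order digits that lie below the middle window.
    int dropLow = (total - digits) / 2;
    for (int i = 0; i < dropLow; ++i)
        square /= 10;

    // Keep only `digits` digits of what remains.
    std::uint64_t mod = 1;
    for (int i = 0; i < digits; ++i)
        mod *= 10;
    return square % mod;
}
```

For example, 1234 squared is 1522756, and the three middle digits are 227, so midSquareHash(1234, 3) yields 227.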

4. Folding method

Divide the keyword into several sections, then take the sum of the sections as the hash address. There are two ways to add the sections: shift folding and boundary folding.

This method is suitable when the keyword has a particularly large number of digits.
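
The two folding variants can be sketched as follows, assuming decimal keys cut into sections of a fixed width starting from the low-order end. In shift folding the sections are simply added; in boundary folding every second section is reversed before adding, as if the key were folded back and forth along the section boundaries. All function names here are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// 10^width, used to cut off sections of `width` decimal digits.
static std::uint64_t powerOfTen(int width)
{
    std::uint64_t p = 1;
    for (int i = 0; i < width; ++i)
        p *= 10;
    return p;
}

// Reverse the decimal digits of a section that is `width` digits wide.
static std::uint64_t reverseDigits(std::uint64_t section, int width)
{
    std::uint64_t r = 0;
    for (int i = 0; i < width; ++i)
    {
        r = r * 10 + section % 10;
        section /= 10;
    }
    return r;
}

// Shift folding: add all sections, then discard the carry digit(s).
std::uint64_t shiftFold(std::uint64_t key, int width)
{
    assert(width > 0 && width < 19);
    const std::uint64_t base = powerOfTen(width);
    std::uint64_t sum = 0;
    for (; key > 0; key /= base)
        sum += key % base;
    return sum % base;
}

// Boundary folding: like shift folding, but every second section is reversed.
std::uint64_t boundaryFold(std::uint64_t key, int width)
{
    assert(width > 0 && width < 19);
    const std::uint64_t base = powerOfTen(width);
    std::uint64_t sum = 0;
    bool flip = false;
    for (; key > 0; key /= base)
    {
        std::uint64_t section = key % base;
        sum += flip ? reverseDigits(section, width) : section;
        flip = !flip;
    }
    return sum % base;
}
```

With the key 0442205864 and four-digit sections (5864, 4220, 04), shift folding gives 5864 + 4220 + 04 = 10088, i.e. address 0088, while boundary folding gives 5864 + 0224 + 04 = 6092.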

5. Division-remainder method

Set the hash function to: H(key) = key % p, where p ≤ m (the table length) and p should be a prime no greater than m, or a number with no prime factors less than 20.

Why restrict p in this way?

For example: given the set of keywords 12, 39, 18, 24, 33, 21, if we take p = 9, the corresponding hash function values are 3, 3, 0, 6, 6, 3.

It can be seen that if p contains the prime factor 3, then all keywords containing the prime factor 3 are mapped to addresses that are multiples of 3, which increases the likelihood of conflicts.
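
The effect of the modulus can be checked with a small experiment. The sketch below hashes the same six keywords with p = 9 and with the prime p = 11 (chosen here only for illustration) and counts how many distinct addresses are used. The helper names are assumptions for this example.

```cpp
#include <cassert>
#include <vector>

// Division-remainder hash for non-negative keys.
int divisionHash(int key, int p)
{
    assert(p > 0 && key >= 0);
    return key % p;
}

// Count how many distinct addresses the key set occupies under modulus p.
int distinctAddresses(const std::vector<int>& keys, int p)
{
    std::vector<bool> used(p, false);
    int count = 0;
    for (int k : keys)
    {
        int h = divisionHash(k, p);
        if (!used[h])
        {
            used[h] = true;
            ++count;
        }
    }
    return count;
}
```

For the keys 12, 39, 18, 24, 33, 21, p = 9 uses only 3 distinct addresses (0, 3, 6), whereas p = 11 spreads the six keys over 6 distinct addresses.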

6. Random number method

Set the hash function to: H(key) = random(key), where random is a pseudo-random function.

Typically, this method is used to construct a hash function for keywords of unequal length.

(If the keyword is not a number, it must be digitized first.)

When actually building a table, the method used to construct the hash function depends on the characteristics of the keyword set (including the range and form of the keywords). The general principle is to make the probability of conflict as small as possible (we will construct the hash function with the division-remainder method).

Ways to handle conflicts

"Handling a conflict" means finding the next hash address for a keyword whose address has collided with another.

1. Open addressing method

Obtain an address sequence for the conflicting address H(key): {H0, H1, ..., Hs | 1 ≤ s ≤ m-1}

Where: H0 = H(key)

Hi = (H(key) + di) % m, i = 1, 2, ..., s

There are three ways to choose the increment di:

1) Linear probing re-hashing
di = c * i, in the simplest case c = 1

2) Square probing re-hashing
di = 1^2, -1^2, 2^2, -2^2, ...

3) Random probing re-hashing
di is a pseudo-random sequence, or di = i × H2(key) (also known as double-hash probing)

Note: the increments di should be "complete", i.e. the resulting Hi are all distinct, and the s = m-1 resulting Hi values cover all addresses in the hash table. This requires:

※ For square probing, the table length m must be a prime of the form 4j+3 (e.g. 7, 11, 19, 23, ...);

※ For random probing, m and di must have no common factor.
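
To make the "completeness" requirement concrete, the following sketch generates the probe sequences for linear and square probing and checks whether they visit every address exactly once. It is only an illustration; the function names are hypothetical and the home address is assumed to lie in [0, m).

```cpp
#include <cassert>
#include <vector>

// Linear probing: H_i = (home + c * i) % m for i = 1 .. m-1.
std::vector<int> linearProbes(int home, int m, int c = 1)
{
    assert(m > 0 && home >= 0 && home < m && c > 0);
    std::vector<int> seq;
    seq.push_back(home);                          // H_0
    for (int i = 1; i < m; ++i)
        seq.push_back((home + c * i) % m);
    return seq;
}

// Square probing: d_i = +1^2, -1^2, +2^2, -2^2, ...
std::vector<int> quadraticProbes(int home, int m)
{
    assert(m > 0 && home >= 0 && home < m);
    std::vector<int> seq;
    seq.push_back(home);                          // H_0
    for (int k = 1; static_cast<int>(seq.size()) < m; ++k)
    {
        int sq = (k * k) % m;
        seq.push_back((home + sq) % m);
        if (static_cast<int>(seq.size()) < m)
            seq.push_back((home - sq + m) % m);   // + m keeps the value non-negative
    }
    return seq;
}

// True if the sequence visits every address 0 .. m-1 exactly once.
bool coversTable(const std::vector<int>& seq, int m)
{
    if (static_cast<int>(seq.size()) != m)
        return false;
    std::vector<bool> seen(m, false);
    for (int a : seq)
    {
        if (a < 0 || a >= m || seen[a])
            return false;
        seen[a] = true;
    }
    return true;
}
```

With m = 7 (a prime of the form 4j+3) and home address 3, square probing visits 3, 4, 2, 0, 6, 5, 1 and so covers the table. With m = 13 (a prime of the form 4j+1) the sequence repeats addresses before covering the table.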

2. Chain address method (also known as the zipper method)

All records with the same hash address are linked together in one linked list (this is the method we will use).

Design and implementation of hash table

    // Hash table design
    template <typename HashedObj>
    class HashTable
    {
    public:
        typedef typename vector<HashedObj>::size_type size_type;

    public:
        explicit HashTable(int tableSize = 101)
            : theList(tableSize), currentSize(0) {}
        ~HashTable()
        {
            makeEmpty();
        }

        // Determine whether the element x exists in the hash table
        bool contains(const HashedObj &x) const;

        void makeEmpty();
        bool insert(const HashedObj &x);
        bool remove(const HashedObj &x);

    private:
        vector< list ...

Hash function

    // If the keyword is not a number, it must be digitized first.
    template <typename Type>
    int hash(Type key)
    {
        return key;
    }

    template <>
    int hash<const string &>(const string &key)
    {
        int hashVal = 0;
        for (size_t i = 0; i < key.length(); ++i)
        {
            // Polynomial accumulation with multiplier 37
            hashVal = 37 * hashVal + key[i];
        }
        return hashVal;
    }

    // Hash function
    template <typename HashedObj>
    int HashTable ...
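
The step from a raw hash value to a bucket index can be sketched as below. This is an assumption about what such a helper might look like, not the article's own member function. It uses unsigned arithmetic so that overflow wraps around instead of being undefined, and the names stringHash and bucketIndex are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Polynomial string hash with multiplier 37, computed in unsigned arithmetic.
std::size_t stringHash(const std::string& key)
{
    std::size_t h = 0;
    for (unsigned char ch : key)
        h = 37 * h + ch;
    return h;
}

// Reduce the hash value to a bucket index with the division-remainder method.
std::size_t bucketIndex(const std::string& key, std::size_t tableSize)
{
    assert(tableSize > 0);
    return stringHash(key) % tableSize;
}
```

For the key "ab" the hash value is 37 * 97 + 98 = 3687, which lands in bucket 3687 % 101 = 51 of a 101-slot table.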

Inserting into a hash table

    // Insert
    template <typename HashedObj>
    bool HashTable ...

Re-hash

    // Determine whether n is prime
    bool is_prime(size_t n)
    {
        if (n == 1 || !n)
            return false;
        for (size_t i = 2; i * i <= n; i++)
            if (!(n % i))
                return false;
        return true;
    }

    // Look for the next prime (the smallest prime >= n)
    size_t nextPrime(size_t n)
    {
        for (size_t i = n; ; ++i)
        {
            if (is_prime(i))
                return i;
        }
    }

    // Re-hash
    template <typename HashedObj>
    void HashTable ...
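
Only the beginning of the re-hash member survives in the listing, so the following is a self-contained sketch of how re-hashing a chained table could work, not the original code. It assumes non-negative int keys, a bucket layout of vector<list<int>> (the Buckets typedef is hypothetical), and a growth policy of "next prime at or above twice the old size".

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <vector>

typedef std::vector<std::list<int> > Buckets;   // assumed bucket layout

bool isPrime(std::size_t n)
{
    if (n < 2)
        return false;
    for (std::size_t i = 2; i * i <= n; ++i)
        if (n % i == 0)
            return false;
    return true;
}

// Smallest prime greater than or equal to n.
std::size_t nextPrime(std::size_t n)
{
    while (!isPrime(n))
        ++n;
    return n;
}

// Grow the table and reinsert every key, because each key's bucket
// depends on the table size.
void rehash(Buckets& table)
{
    assert(!table.empty());
    Buckets old;
    old.swap(table);                              // table is now empty
    table.resize(nextPrime(2 * old.size()));
    for (const std::list<int>& chain : old)
        for (int key : chain)
            table[static_cast<std::size_t>(key) % table.size()].push_back(key);
}
```

A table with 5 buckets grows to 11 buckets, and keys such as 7 and 12, which shared bucket 2 before, end up in buckets 7 and 1 afterwards.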

Lookup of a hash table
The lookup process mirrors the table-building process. Assuming conflicts are handled by open addressing, the lookup proceeds as follows: for a given value K, compute the hash address i = H(K). If r[i] = NULL, the lookup fails. If r[i].key = K, the lookup succeeds. Otherwise, "seek the next address Hi" until r[Hi] = NULL (the lookup fails) or r[Hi].key = K (the lookup succeeds).
Here we use the relatively simple chain address method (also known as the zipper method) to implement lookup:

    // Lookup: determine whether the element exists in the hash table
    template <typename HashedObj>
    bool HashTable ...

Analysis of hash table lookups:
From the lookup process, the average search length (ASL) of a hash table lookup is in fact not zero. The factors that determine the ASL of a hash table lookup are:
1) the selected hash function;
2) the selected method of handling conflicts;
3) how full the hash table is, i.e. the load factor α = n/m (n: number of records, m: length of the table).
In general, the selected hash function can be assumed to be "uniform", so its influence can be ignored when discussing the ASL.
Therefore, the ASL of a hash table is a function of the conflict-handling method and the load factor. It can be shown that the following results hold for a successful search:
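Linear probing re-hashing: $S_{nl} \approx \frac{1}{2}\left(1 + \frac{1}{1-\alpha}\right)$

Random probing re-hashing: $S_{nr} \approx -\frac{1}{\alpha}\ln(1-\alpha)$

Chain address method: $S_{nc} \approx 1 + \frac{\alpha}{2}$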
It can be seen from the above results that the average search length of a hash table is a function of the load factor α, not of n. This shows that when a lookup table is built as a hash table, an appropriate load factor can always be chosen to keep the average search length within a given bound (a property specific to hash tables).
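
To get a feel for how the average search length depends only on α, the commonly quoted textbook approximations for a successful search can be evaluated directly. The sketch below assumes those standard formulas (stated in the comments); the function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Linear probing: ASL ~ (1/2) * (1 + 1 / (1 - alpha)), for 0 < alpha < 1.
double aslLinearProbing(double alpha)
{
    assert(alpha > 0.0 && alpha < 1.0);
    return 0.5 * (1.0 + 1.0 / (1.0 - alpha));
}

// Random probing: ASL ~ -(1 / alpha) * ln(1 - alpha), for 0 < alpha < 1.
double aslRandomProbing(double alpha)
{
    assert(alpha > 0.0 && alpha < 1.0);
    return -std::log(1.0 - alpha) / alpha;
}

// Chain address method: ASL ~ 1 + alpha / 2, for alpha >= 0.
double aslChaining(double alpha)
{
    assert(alpha >= 0.0);
    return 1.0 + alpha / 2.0;
}
```

At α = 0.5 these give 1.5 comparisons for linear probing, about 1.39 for random probing, and 1.25 for chaining, whether the table holds a hundred records or a million.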

Delete operation of hash table

    // Delete
    template <typename HashedObj>
    bool HashTable ...

Empty hash table

    // Empty the hash table
    template <typename HashedObj>
    void HashTable ...
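
Since the listings for insert, lookup, delete, and empty show only the start of each member definition, here is a compact, self-contained sketch of the same four operations using the chain address method. It is not the original code: the class name ChainedIntSet is hypothetical, keys are restricted to non-negative ints, and re-hashing is left out to keep the example short.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <list>
#include <vector>

// Minimal separate-chaining set of non-negative ints (illustrative only).
class ChainedIntSet
{
public:
    explicit ChainedIntSet(std::size_t tableSize = 101)
        : buckets(tableSize), count(0)
    {
        assert(tableSize > 0);
    }

    // Lookup: search only the chain that x hashes to.
    bool contains(int x) const
    {
        const std::list<int>& chain = buckets[index(x)];
        return std::find(chain.begin(), chain.end(), x) != chain.end();
    }

    // Insert: returns false if x is already present.
    bool insert(int x)
    {
        if (contains(x))
            return false;
        buckets[index(x)].push_back(x);
        ++count;
        return true;
    }

    // Delete: returns false if x was not present.
    bool remove(int x)
    {
        std::list<int>& chain = buckets[index(x)];
        std::list<int>::iterator it = std::find(chain.begin(), chain.end(), x);
        if (it == chain.end())
            return false;
        chain.erase(it);
        --count;
        return true;
    }

    // Empty: clear every chain but keep the table size.
    void makeEmpty()
    {
        for (std::list<int>& chain : buckets)
            chain.clear();
        count = 0;
    }

    std::size_t size() const { return count; }

private:
    // Division-remainder hash onto the bucket array.
    std::size_t index(int x) const
    {
        return static_cast<std::size_t>(x) % buckets.size();
    }

    std::vector<std::list<int> > buckets;
    std::size_t count;
};
```

With 7 buckets, the keys 6 and 13 collide in bucket 6 and simply share one chain; removing 6 leaves 13 reachable.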

1- Test code

    int main()
    {
        // The loop bound is missing from the source listing; 9 is an
        // assumption chosen to match the element lists in the comments.
        const int N = 9;

        HashTable<int> iTable;

        // 1 2 3 4 5 6 7 8 9
        for (int i = 0; i < N; ++i)
            iTable.insert(i + 1);
        for (int i = 0; i < N; ++i)
            if (iTable.contains(i + 1))
                cout << i + 1 << ": Contains..." << endl;
            else
                cout << i + 1 << ": Not contains" << endl;
        cout << endl;

        // 1 2
        for (int i = 0; i < N; ++i)
            iTable.remove(i + 3);
        for (int i = 0; i < N; ++i)
            if (iTable.contains(i))
                cout << i << ": Contains..." << endl;
            else
                cout << i << ": Not contains" << endl;
        cout << endl;

        // 6 8
        iTable.makeEmpty();
        iTable.insert(6);
        iTable.insert(8);
        for (int i = 0; i < N; ++i)
            if (iTable.contains(i))
                cout << i << ": Contains..." << endl;
            else
                cout << i << ": Not contains" << endl;

        return 0;
    }


 
       
2- Comparison of complexity of various algorithms


This article is an English version of an article originally published in Chinese on aliyun.com.
