Data Structure Basics (18)

Last Update:2015-01-15 Source: Internet

Author: User

Tags rehash

Hash table

Given a hash function H(key) and a chosen method for handling collisions (conflicts), a set of keys is mapped onto a finite, contiguous address set (an interval), and the "image" of each key in that address set is used as the storage location of the corresponding record. A lookup table built this way is called a "hash table".

How to construct a hash function

1. Direct addressing (array)

The hash function is a linear function of the key: H(key) = key or H(key) = a * key + b

This method is only applicable when the size of the address set equals the size of the key set.

2. Digit analysis

Assume that each key in the key set consists of s digits (u1, u2, ..., us). Analyze the whole key set and extract several evenly distributed digit positions, or combinations of them, as the address.

This method is only suitable when the frequency of each digit in each position, across all keys, can be estimated in advance.
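
As a rough illustration of digit analysis, the sketch below counts how many different digits occur at a given position of a set of fixed-length decimal keys, and then builds an address from the positions that were judged well distributed. The function names (distinctDigitsAt, digitAnalysisHash) and the choice of "number of distinct digits" as the measure of evenness are assumptions made for this example, not part of the original article.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Count how many different decimal digits appear at position pos.
// Assumes every key is a digit string longer than pos.
std::size_t distinctDigitsAt(const std::vector<std::string> &keys,
                             std::size_t pos) {
    bool seen[10] = {false};
    std::size_t count = 0;
    for (const std::string &key : keys) {
        int digit = key[pos] - '0';
        if (!seen[digit]) {
            seen[digit] = true;
            ++count;
        }
    }
    return count;
}

// Build the address from the digits at the chosen positions,
// in the order the positions are given.
int digitAnalysisHash(const std::string &key,
                      const std::vector<std::size_t> &positions) {
    int address = 0;
    for (std::size_t pos : positions)
        address = address * 10 + (key[pos] - '0');
    return address;
}
```

For keys that all start with the same prefix, the leading positions show a single digit and are useless for addressing, while positions with many distinct digits are good candidates.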

3. Mid-square method

Use the middle digits of the square of the key as the storage address. Squaring the key "amplifies the differences" between keys, and every digit of the square is influenced by every digit of the key.

This method is suitable when some digits occur with high frequency in every position of the key.
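
A minimal sketch of the mid-square idea, assuming decimal digits and a 32-bit key. The name midSquareHash and the rule for centring the extracted digits (discard half of the surplus digits from the low end) are choices made for this example only.

```cpp
#include <cstdint>

// Square the key and return `digits` decimal digits taken from the
// middle of the square. If the square has no more than `digits`
// digits, the whole square is returned.
std::uint64_t midSquareHash(std::uint32_t key, unsigned digits) {
    std::uint64_t square = static_cast<std::uint64_t>(key) * key;

    // Count the decimal digits of the square.
    unsigned total = 1;
    for (std::uint64_t t = square; t >= 10; t /= 10)
        ++total;
    if (digits >= total)
        return square;

    // Drop the low-order digits that lie below the middle window.
    unsigned drop = (total - digits) / 2;
    for (unsigned i = 0; i < drop; ++i)
        square /= 10;

    // Keep only `digits` digits of what is left.
    std::uint64_t mod = 1;
    for (unsigned i = 0; i < digits; ++i)
        mod *= 10;
    return square % mod;
}
```

For example, 1234 squared is 1522756, and the middle three digits are 227.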

4. Folding Method

Split the key into several parts and add the parts together to form the hash address. There are two ways to add them: shift folding and boundary folding.

This method is suitable when the key has a particularly large number of digits.
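
The two folding variants can be sketched as follows, assuming the key is given as a string of decimal digits and is split into groups of `width` digits starting from the right. In shift folding the groups are added as they are; in boundary folding every second group is reversed first, as if the key were folded back and forth along the group boundaries. The function name foldHash and the decision to discard the carry are assumptions of this example.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

// Folding hash over a decimal digit string.
// Assumes key contains only digits and 1 <= width <= 9.
unsigned foldHash(const std::string &key, std::size_t width, bool boundary) {
    unsigned long mod = 1;
    for (std::size_t i = 0; i < width; ++i)
        mod *= 10;

    unsigned long sum = 0;
    std::size_t end = key.size();
    bool reverseThis = false;  // the first (rightmost) group is never reversed
    while (end > 0) {
        std::size_t begin = end > width ? end - width : 0;
        std::string part = key.substr(begin, end - begin);
        if (boundary && reverseThis)
            std::reverse(part.begin(), part.end());
        sum += std::stoul(part);
        reverseThis = !reverseThis;
        end = begin;
    }
    // Discard the carry so the address has at most `width` digits.
    return static_cast<unsigned>(sum % mod);
}
```

With the key 0442205864 and width 4, the groups are 5864, 4220 and 04. Shift folding gives 5864 + 4220 + 04 = 10088, which becomes 0088 after the carry is dropped; boundary folding gives 5864 + 0224 + 04 = 6092.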

5. Division-remainder method

Let the hash function be H(key) = key % p, where p ≤ m (the table length), and p should be a prime not greater than m or a composite number with no prime factor less than 20.

Why restrict p?

For example, given the keys 12, 39, 18, 24, and 21, if we take p = 9, the corresponding hash values are 3, 3, 0, 6, 3.

As this shows, if p contains the prime factor 3, every key that contains the prime factor 3 is mapped to an address that is a multiple of 3, which increases the likelihood of collisions.
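
To see the effect of the choice of p, the small sketch below counts how many different addresses H(key) = key % p produces for a set of keys. The helper name distinctAddresses is an assumption of this example, and keys are assumed to be non-negative.

```cpp
#include <cstddef>
#include <set>
#include <vector>

// Number of distinct addresses produced by H(key) = key % p.
std::size_t distinctAddresses(const std::vector<int> &keys, int p) {
    std::set<int> used;
    for (int key : keys)
        used.insert(key % p);
    return used.size();
}
```

For the keys 12, 39, 18, 24 and 21, p = 9 yields only three distinct addresses (0, 3 and 6), while the prime p = 11 yields five (1, 6, 7, 2 and 10), one per key.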

6. Random Number Method

Let the hash function be H(key) = Random(key), where Random is a pseudo-random function.

This method is usually used to construct hash functions for keys of unequal length.
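
One possible reading of Random(key), shown here only as a sketch, is to seed a pseudo-random generator with the key and use its first output. The article does not say which generator is meant; std::mt19937 and the name randomHash are assumptions of this example. The important property is that the same key always gives the same address.

```cpp
#include <cstdint>
#include <random>

// Seed the generator with the key and reduce its first output to a
// table of m slots. Assumes m > 0.
unsigned randomHash(std::uint32_t key, unsigned m) {
    std::mt19937 gen(key);                    // same key, same seed
    return static_cast<unsigned>(gen() % m);  // so same address every time
}
```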

(If the key is not a number, it must first be converted to a number.)

When actually building a table, which method to use for constructing the hash function depends on the key set (including the range and form of the keys). The general principle is to minimize the likelihood of collisions. (Below we construct the hash function with the division-remainder method.)

Collision handling methods

"Handling a collision" really means finding another hash address for the key whose address collided.

1. Open addressing

For a colliding address H(key), derive an address sequence: {H0, H1, ..., Hs | 1 ≤ s ≤ m-1}

H0 = H(key)

Hi = (H(key) + di) % m, i = 1, 2, ..., s

There are three ways to choose the increments di:

1) Linear probing
di = c * i; in the simplest case c = 1

2) Quadratic probing
di = 1^2, -1^2, 2^2, -2^2, ...

3) Random probing
di is a pseudo-random sequence, or di = i × H2(key) (also known as double hashing)

Note: the increments di should be "complete", that is, the generated Hi are all distinct and the s = m-1 generated Hi values cover every address in the hash table. This requires:

※ For quadratic probing, the table length m must be a prime of the form 4j + 3 (such as 7, 11, 19, 23, ...);

※ For random probing, m and di must have no common factor.
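
The completeness requirement can be checked by brute force on small tables. The sketch below generates the probe sequence H0, H1, ..., H(m-1) for linear probing, quadratic probing and double hashing, and tests whether it visits every address exactly once. All function names are made up for this example, `home` stands for H(key), `step` stands for H2(key), and inputs are assumed to be small non-negative values.

```cpp
#include <vector>

// Linear probing with c = 1: d_i = i.
std::vector<int> linearProbes(int home, int m) {
    std::vector<int> seq;
    for (int i = 0; i < m; ++i)
        seq.push_back((home + i) % m);
    return seq;
}

// Quadratic probing: d_i = +1^2, -1^2, +2^2, -2^2, ...
std::vector<int> quadraticProbes(int home, int m) {
    std::vector<int> seq(1, home % m);
    for (int k = 1; static_cast<int>(seq.size()) < m; ++k) {
        int sq = (k * k) % m;
        seq.push_back((home + sq) % m);
        if (static_cast<int>(seq.size()) < m)
            seq.push_back(((home - sq) % m + m) % m);
    }
    return seq;
}

// Double hashing: d_i = i * step.
std::vector<int> doubleHashProbes(int home, int step, int m) {
    std::vector<int> seq;
    for (int i = 0; i < m; ++i)
        seq.push_back((home + i * step) % m);
    return seq;
}

// True if seq visits every address 0..m-1 exactly once.
bool coversTable(const std::vector<int> &seq, int m) {
    std::vector<bool> seen(m, false);
    for (int addr : seq) {
        if (addr < 0 || addr >= m || seen[addr])
            return false;
        seen[addr] = true;
    }
    return static_cast<int>(seq.size()) == m;
}
```

With m = 7 or m = 11 (primes of the form 4j + 3) the quadratic sequence covers the whole table, whereas with m = 13 it revisits addresses. Likewise, with m = 8 a step of 3 covers the table but a step of 2, which shares a factor with m, does not.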

2. Separate chaining (link address method, also called the zipper method)

Link all records with the same hash address into one linked list (this is the method we will use).

Design and Implementation of hash tables

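// Hash table design (separate chaining)
#include <algorithm>  // std::find
#include <cstddef>    // size_t
#include <iostream>
#include <list>
#include <string>
#include <vector>

// Named using-declarations keep the hash() helper from clashing with std::hash.
using std::cout; using std::endl; using std::find;
using std::list; using std::string; using std::vector;

template <typename HashedObj>
class HashTable {
public:
    typedef typename vector<HashedObj>::size_type size_type;

public:
    explicit HashTable(int tableSize = 101)
        : theList(tableSize), currentSize(0) {}
    ~HashTable() { makeEmpty(); }

    // Determine whether element x exists in the hash table
    bool contains(const HashedObj &x) const;
    void makeEmpty();
    bool insert(const HashedObj &x);
    bool remove(const HashedObj &x);

private:
    vector<list<HashedObj> > theList;  // one linked list per bucket
    size_type currentSize;             // number of stored elements

    void rehash();
    int myHash(const HashedObj &x) const;
};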

Hash Functions

// If the key is not a number, it must first be converted to one.
template <typename Type>
int hash(Type key) {
    return key;
}

// Overload for string keys
int hash(const string &key) {
    unsigned int hashVal = 0;  // unsigned, so overflow wraps around
    for (size_t i = 0; i < key.length(); ++i) {
        hashVal = 37 * hashVal + key[i];
    }
    return static_cast<int>(hashVal);
}

// Hash function
template <typename HashedObj>
int HashTable<HashedObj>::myHash(const HashedObj &x) const {
    // First convert the key to a number
    int hashVal = hash(x);
    // Then compute the bucket index
    hashVal %= static_cast<int>(theList.size());
    if (hashVal < 0)
        hashVal += theList.size();
    return hashVal;
}

Inserting into the hash table

// Insert
template <typename HashedObj>
bool HashTable<HashedObj>::insert(const HashedObj &x) {
    // First locate the bucket (linked list) to insert into
    list<HashedObj> &whichList = theList[myHash(x)];
    // Reject the value if it is already in the bucket
    if (find(whichList.begin(), whichList.end(), x) != whichList.end())
        return false;
    // Insert into the bucket
    whichList.push_back(x);
    // If the number of stored elements now exceeds the number of buckets
    // (load factor greater than 1), rehash
    if (++currentSize > theList.size())
        rehash();
    return true;
}

Rehash

// Determine whether n is a prime number
bool is_prime(size_t n) {
    if (n == 1 || !n)
        return false;
    for (size_t i = 2; i * i <= n; i++)
        if (!(n % i))
            return false;
    return true;
}

// Find the first prime number that is not less than n
size_t nextPrime(size_t n) {
    for (size_t i = n; ; ++i) {
        if (is_prime(i))
            return i;
    }
    return -1;  // never reached
}

// Rehash
template <typename HashedObj>
void HashTable<HashedObj>::rehash() {
    vector<list<HashedObj> > oldList = theList;
    // Resize theList to the first prime at least twice the old bucket count
    theList.resize(nextPrime(2 * theList.size()));
    // Clear the table
    for (typename vector<list<HashedObj> >::iterator iter = theList.begin();
         iter != theList.end(); ++iter)
        iter->clear();
    // insert() counts every element again, so reset the counter first
    currentSize = 0;
    // Reinsert the data from the old table into the new one
    for (size_type i = 0; i < oldList.size(); ++i) {
        typename list<HashedObj>::iterator iter = oldList[i].begin();
        while (iter != oldList[i].end()) {
            insert(*iter++);
        }
    }
}

Hash Table search

The search process mirrors the table-building process. If open addressing is used to handle collisions, the search proceeds as follows: for a given value K, compute the hash address i = H(K). If r[i] = NULL, the search fails; if r[i].key = K, the search succeeds. Otherwise, compute the next address Hi and repeat until r[Hi] = NULL (the search fails) or r[Hi].key = K (the search succeeds).
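
The open-addressing search just described can be sketched with linear probing (di = i) over a table of non-negative integer keys. This is not the implementation used in the rest of the article, which relies on chaining; the class name LinearProbeTable and the marker kEmpty (standing in for NULL) are assumptions of this example, and deletion is deliberately left out.

```cpp
#include <vector>

const int kEmpty = -1;  // marks a slot that holds no record

class LinearProbeTable {
public:
    explicit LinearProbeTable(int m) : slots(m, kEmpty) {}

    // Returns false if the key is already present or the table is full.
    bool insert(int key) {
        int m = static_cast<int>(slots.size());
        for (int i = 0; i < m; ++i) {
            int addr = (key % m + i) % m;  // Hi = (H(key) + i) % m
            if (slots[addr] == key)
                return false;
            if (slots[addr] == kEmpty) {
                slots[addr] = key;
                return true;
            }
        }
        return false;
    }

    // Returns the address holding key, or -1 if the search fails.
    int find(int key) const {
        int m = static_cast<int>(slots.size());
        for (int i = 0; i < m; ++i) {
            int addr = (key % m + i) % m;
            if (slots[addr] == kEmpty)
                return -1;    // empty slot reached: key is not in the table
            if (slots[addr] == key)
                return addr;  // found
        }
        return -1;            // every slot probed without success
    }

private:
    std::vector<int> slots;
};
```

In a table of length 7, the keys 10, 17 and 24 all hash to address 3, so they end up in addresses 3, 4 and 5. Searching for 31 probes those three slots and stops at the empty address 6.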

Here we use the simpler separate chaining (link address) method:

// Search: determine whether the given element exists in the hash table
template <typename HashedObj>
bool HashTable<HashedObj>::contains(const HashedObj &x) const {
    const list<HashedObj> &whichList = theList[myHash(x)];
    if (find(whichList.begin(), whichList.end(), x) != whichList.end())
        return true;
    return false;
}

Analysis of hash table search:

From the search process we can see that the average search length (ASL) of a hash table is not zero. The factors that determine the ASL of a hash table are:

1) the chosen hash function;

2) the chosen method for handling collisions;

3) how full the hash table is, measured by the load factor α = n/m (n: number of records, m: table length).

In general we can assume that the chosen hash function is "uniform", so its effect can be ignored when discussing ASL.

The ASL of a hash table is therefore a function of the collision handling method and the load factor. For a successful search, the results are:

Linear probing: S ≈ (1/2) * (1 + 1/(1 - α))

Random probing: S ≈ -(1/α) * ln(1 - α)

Separate chaining: S ≈ 1 + α/2

From these results we can see that the average search length of a hash table is a function of the load factor α, not of n. This means that when a hash table is used as a lookup table, an appropriate load factor can always be chosen to keep the average search length within a given bound (a property unique to hash tables).
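
To get a feel for the numbers, the sketch below evaluates the commonly quoted textbook approximations for the average length of a successful search: (1/2)(1 + 1/(1 - α)) for linear probing, -(1/α) ln(1 - α) for random probing, and 1 + α/2 for separate chaining. The function names are made up for this example, and the open-addressing formulas assume 0 < α < 1.

```cpp
#include <cmath>

// Approximate ASL of a successful search as a function of the load factor.
double aslLinearProbing(double alpha) {
    return 0.5 * (1.0 + 1.0 / (1.0 - alpha));
}

double aslRandomProbing(double alpha) {
    return -std::log(1.0 - alpha) / alpha;
}

double aslChaining(double alpha) {
    return 1.0 + alpha / 2.0;
}
```

At α = 0.5 these give 1.5, about 1.39 and 1.25 comparisons respectively, whatever the number of records n; at α = 0.9 linear probing rises to 5.5 while chaining stays at 1.45.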

Deleting from the hash table

// Delete
template <typename HashedObj>
bool HashTable<HashedObj>::remove(const HashedObj &x) {
    list<HashedObj> &whichList = theList[myHash(x)];
    typename list<HashedObj>::iterator iter =
        find(whichList.begin(), whichList.end(), x);
    // The element is not in the table
    if (iter == whichList.end())
        return false;
    whichList.erase(iter);
    --currentSize;
    return true;
}

Clear hash table

// Clear the hash table
template <typename HashedObj>
void HashTable<HashedObj>::makeEmpty() {
    for (typename vector<list<HashedObj> >::iterator iter = theList.begin();
         iter != theList.end(); ++iter) {
        iter->clear();
    }
    currentSize = 0;  // the table no longer holds any elements
}

Appendix 1-test code

int main() {
    HashTable<int> iTable;

    // Insert 1 2 3 4 5 6 7 8 9 10; all ten values should be found
    for (int i = 0; i < 10; ++i)
        iTable.insert(i + 1);
    for (int i = 0; i < 10; ++i)
        if (iTable.contains(i + 1))
            cout << i + 1 << ": contains..." << endl;
        else
            cout << i + 1 << ": not contains" << endl;
    cout << endl;

    // Remove 3..12; only 1 and 2 remain
    for (int i = 0; i < 10; ++i)
        iTable.remove(i + 3);
    for (int i = 0; i < 10; ++i)
        if (iTable.contains(i))
            cout << i << ": contains..." << endl;
        else
            cout << i << ": not contains" << endl;
    cout << endl;

    // Clear the table, then insert 6 and 8
    iTable.makeEmpty();
    iTable.insert(6);
    iTable.insert(8);
    for (int i = 0; i < 10; ++i)
        if (iTable.contains(i))
            cout << i << ": contains..." << endl;
        else
            cout << i << ": not contains" << endl;
    return 0;
}

Appendix 2-Comparison of complexity of various algorithms

This article is an English version of an article originally published in Chinese on aliyun.com.
