Introduction to Algorithms, Chapter 11 reading notes: hash tables

This chapter introduces hash tables, the design of hash functions, and the handling of hash collisions. A hash table works like the index of a dictionary: each stored element has a key associated with it. In practice hashing is very efficient. With a well-designed hash function and collision-handling scheme, the expected time to search for an element in a hash table is $O(1)$. A hash table generalizes the ordinary array: instead of using the key directly as an array index, it computes the index from the key with a hash function. The book's treatment of hash tables is heavy on reasoning and proofs, which left me dazed on first reading and proved once again how important mathematics is. In the C++ STL, the map container provides the same kind of key-based lookup, but map is implemented as a red-black tree (a topic covered later in the book), not as a hash table; the hash-based counterpart is std::unordered_map, added in C++11. The operations supported by map are documented at http://www.cplusplus.com/reference/map/.

1. Direct-address tables

When the universe U of keys is reasonably small, direct addressing is a simple and effective technique. A direct-address table is usually implemented as an array whose indices correspond to key values: the element with key k is stored in slot k. The dictionary operations on a direct-address table are plain array accesses, each taking only $O(1)$ time.
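A minimal C++ sketch of a direct-address table, assuming integer keys in the range 0 to universeSize - 1 and C++17 for std::optional. The class name DirectAddressTable and its method names are illustrative, not from the book; each operation is a single array access, which is where the $O(1)$ bound comes from.

```cpp
#include <cstddef>
#include <optional>
#include <vector>

// Direct-address table: the element with key k is stored in slot k.
// Keys are assumed to be integers in [0, universeSize); each slot is either
// empty (std::nullopt) or holds exactly one value. Requires C++17.
template <class V>
class DirectAddressTable {
public:
    explicit DirectAddressTable(std::size_t universeSize) : slots_(universeSize) {}

    // INSERT in O(1): write the value straight into slot k.
    void insert(std::size_t k, const V& value) { slots_.at(k) = value; }

    // SEARCH in O(1): an empty slot means the key is absent.
    std::optional<V> search(std::size_t k) const { return slots_.at(k); }

    // DELETE in O(1): mark slot k empty again.
    void remove(std::size_t k) { slots_.at(k).reset(); }

private:
    std::vector<std::optional<V>> slots_;
};
```

The memory cost is one slot per possible key, which is exactly the weakness discussed next.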

2. Hash tables

The drawback of direct addressing is obvious: when the universe U of keys is large, storing a table of size |U| may be impractical, or even impossible, given the memory available on a typical computer. When the set K of keys actually stored in the dictionary is much smaller than the universe U of all possible keys, a hash table needs much less storage than a direct-address table. A hash table uses a hash function h to compute the slot for key k: h maps the universe U into the slots of the hash table T[0..m-1], that is, $h: U \to \{0, 1, \dots, m-1\}$. The point of the hash function is to shrink the range of array indices that must be handled from |U| values to m values, which reduces the storage requirement.

The catch is that two keys may hash to the same slot; this is called a collision. We therefore need effective ways to resolve collisions.

3. Hash functions

A good hash function (approximately) satisfies the assumption of simple uniform hashing: each key is equally likely to hash to any of the m slots, independently of where any other key has hashed to. Most hash functions assume that the universe of keys is the set of natural numbers N = {0, 1, 2, ...}; if the keys are not natural numbers, we need a way to interpret them as natural numbers. For example, a string key can be converted to a natural number by adding up the ASCII codes of its characters. The book presents three design schemes: the division method, the multiplication method, and universal hashing.

(1) The division method

The division method maps a key k into one of m slots by taking the remainder of k divided by m; the hash function is $h(k) = k \bmod m$. m should not be a power of 2 (if $m = 2^p$, h(k) is just the p lowest-order bits of k). A good choice for m is usually a prime that is not too close to an exact power of 2.
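The division method fits in a single line. The sketch below names it hash_division (a name chosen for illustration); the table sizes used with it afterwards are assumptions picked to show the two effects described above.

```cpp
#include <cstddef>

// Division method: h(k) = k mod m.
// m should be a prime not too close to a power of 2; if m = 2^p,
// h(k) depends only on the p lowest-order bits of k.
std::size_t hash_division(unsigned long k, std::size_t m) {
    return k % m;
}
```

With m = 13, the keys 41 and 15 both land in slot 2, which is a collision. With m = 16 = 2^4, the keys 18 and 34 collide because they share the same 4 low-order bits, even though their higher bits differ.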

(2) The multiplication method

I did not fully understand this method on first reading, so I am recording the basic idea for now and will digest it properly later. The multiplication method constructs the hash function in two steps. First, multiply the key k by a constant A in the range 0 < A < 1 and extract the fractional part of kA. Then multiply this value by m and take the floor of the result. The hash function is $h(k) = \lfloor m \, (kA \bmod 1) \rfloor$, where $kA \bmod 1$ denotes the fractional part of kA.
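A sketch of the multiplication method using double-precision arithmetic. The default constant $A = (\sqrt{5} - 1)/2 \approx 0.618$ is the value Knuth suggests; any 0 < A < 1 is allowed. The function name and the floating-point approach are assumptions made for illustration (CLRS also describes a fixed-point version for $m = 2^p$).

```cpp
#include <cmath>
#include <cstddef>

// Multiplication method: h(k) = floor(m * (k*A mod 1)), with 0 < A < 1.
// The default A = (sqrt(5) - 1) / 2 follows Knuth's suggestion.
std::size_t hash_multiplication(unsigned long k, std::size_t m,
                                double A = (std::sqrt(5.0) - 1.0) / 2.0) {
    double product = static_cast<double>(k) * A;
    double fractional = product - std::floor(product);  // k*A mod 1
    return static_cast<std::size_t>(std::floor(static_cast<double>(m) * fractional));
}
```

For k = 123456 and m = 10000: kA ≈ 76300.0041151, so the fractional part is about 0.0041151 and h(k) = ⌊41.151⌋ = 41.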

(3) Universal hashing

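Universal hashing works with a family H of hash functions: at the beginning of execution, a hash function h is chosen at random from H, so the choice is independent of the keys that will be stored. H is called universal if, for each pair of distinct keys k ≠ l, at most |H|/m of the functions in H map k and l to the same slot. Because no single input can reliably trigger worst-case behavior, a universal class of hash functions gives good average-case performance.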
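One concrete universal family, described in CLRS, is $h_{a,b}(k) = ((ak + b) \bmod p) \bmod m$, where p is a prime larger than every key, $1 \le a < p$ and $0 \le b < p$. The sketch below draws a and b once, at construction, with std::mt19937_64; the class name UniversalHash and the choice of generator are assumptions. Keys must be smaller than p, and p is assumed to fit in 32 bits so the 64-bit products cannot overflow.

```cpp
#include <cstdint>
#include <random>

// h_{a,b}(k) = ((a*k + b) mod p) mod m, chosen at random from the family.
// Assumes k < p < 2^32, so a*k + b < 2^64 and uint64_t never overflows.
class UniversalHash {
public:
    // Pick a and b at random once, when the table is created.
    UniversalHash(std::uint64_t p, std::uint64_t m, std::mt19937_64& rng)
        : p_(p), m_(m),
          a_(std::uniform_int_distribution<std::uint64_t>(1, p - 1)(rng)),
          b_(std::uniform_int_distribution<std::uint64_t>(0, p - 1)(rng)) {}

    // Fixed a and b, handy for checking a hand-worked example.
    UniversalHash(std::uint64_t p, std::uint64_t m, std::uint64_t a, std::uint64_t b)
        : p_(p), m_(m), a_(a), b_(b) {}

    std::uint64_t operator()(std::uint64_t k) const {
        return ((a_ * k + b_) % p_) % m_;
    }

private:
    std::uint64_t p_, m_, a_, b_;
};
```

With p = 17, m = 6, a = 3, b = 4, key 8 hashes to ((3·8 + 4) mod 17) mod 6 = 11 mod 6 = 5.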

4. Collision resolution

There are two common kinds of methods for dealing with collisions: open addressing and chaining. Open addressing stores all the elements in the hash table T[0..m-1] itself; chaining places all the elements that hash to the same slot into one linked list and stores the head pointer of that list in the corresponding slot of T[0..m-1].

(1) Open addressing

In open addressing, all elements occupy the hash table itself: each table entry contains either an element of the dynamic set or NIL. As a result, the hash table can fill up so that no further insertions can be made. To insert an element, we successively examine, or probe, the hash table until we find an empty slot in which to put the key. Three techniques are commonly used to compute the probe sequence: linear probing, quadratic probing, and double hashing.

<1> Linear probing

Given an ordinary (auxiliary) hash function $h': U \to \{0, 1, \dots, m-1\}$, linear probing uses the hash function $h(k, i) = (h'(k) + i) \bmod m$, for $i = 0, 1, \dots, m-1$.

Probing starts at i = 0: first T[h'(k)] is examined, then T[h'(k)+1], and so on up to T[m-1]; the sequence then wraps around to T[0], T[1], ..., and ends at T[h'(k)-1]. Probing stops in one of three cases:
(1) The current slot is empty: the search is unsuccessful (for an insertion, the key is written into this slot).
(2) The current slot contains key k: the search is successful, but an insertion fails because the key is already present.
(3) Slot T[h'(k)-1] has been probed without finding either an empty slot or key k: both search and insertion fail, because the table is full.

Linear probing is easy to implement, but it suffers from clustering: runs of consecutively occupied slots build up and keep getting longer. As an example of the linear-probing process, take the keys (26, 36, 41, 38, 44, 15, 68, 12, 6, 51) and construct the hash function with the division method.

[Figure: initial hash addresses of the keys]

[Figure: linear-probing insertion process]
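The linear-probing process for the example keys can be sketched in C++. The table size is not stated in the text, so this sketch assumes m = 13 with h'(k) = k mod 13; the class name LinearProbingTable and the -1 empty marker (keys assumed non-negative) are illustrative choices. The three stopping cases above appear as comments.

```cpp
#include <cstddef>
#include <vector>

constexpr int kEmptySlot = -1;  // marks an unused slot; keys are assumed >= 0

// Open addressing with linear probing: h(k, i) = (h'(k) + i) mod m,
// using the division method h'(k) = k mod m as the auxiliary hash.
class LinearProbingTable {
public:
    explicit LinearProbingTable(std::size_t m) : slots_(m, kEmptySlot) {}

    // Returns false for a duplicate key (case 2) or a full table (case 3).
    bool insert(int k) {
        const std::size_t m = slots_.size();
        for (std::size_t i = 0; i < m; ++i) {
            std::size_t j = (static_cast<std::size_t>(k) % m + i) % m;
            if (slots_[j] == kEmptySlot) { slots_[j] = k; return true; }  // case 1
            if (slots_[j] == k) return false;                              // case 2
        }
        return false;                                                      // case 3
    }

    // Returns the slot that holds k, or -1 if k is absent.
    int search(int k) const {
        const std::size_t m = slots_.size();
        for (std::size_t i = 0; i < m; ++i) {
            std::size_t j = (static_cast<std::size_t>(k) % m + i) % m;
            if (slots_[j] == kEmptySlot) return -1;           // case 1: not found
            if (slots_[j] == k) return static_cast<int>(j);   // case 2: found
        }
        return -1;                                            // case 3: all probed
    }

    const std::vector<int>& slots() const { return slots_; }

private:
    std::vector<int> slots_;
};
```

Under the m = 13 assumption, inserting the keys in the given order fills slots 0 to 12 with 26, 12, 41, 15, 68, 44, 6, 51, (empty), (empty), 36, (empty), 38. Key 51 has to probe nine slots (12, 0, 1, ..., 7) before it finds a free one, which shows the clustering effect.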

<2> Quadratic probing

The probe sequence of quadratic probing is $h(k, i) = (h'(k) + i^2) \bmod m$, for $0 \le i \le m-1$. The first probe is T[h'(k)]; each later probe adds to this base address an offset that depends quadratically on i. The drawback of this method is that the probe sequence may not cover the entire table.
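The coverage problem can be checked directly. This sketch (function names are illustrative) counts the distinct slots that the sequence $i = 0, 1, \dots, m-1$ visits. With an assumed prime size m = 13, only 7 of the 13 slots are ever reached, because $i^2 \bmod 13$ takes only the values 0, 1, 3, 4, 9, 10, 12.

```cpp
#include <cstddef>
#include <set>

// Quadratic probing with the simplified sequence h(k, i) = (h'(k) + i*i) mod m,
// where h0 = h'(k) is the initial slot.
std::size_t quadratic_probe(std::size_t h0, std::size_t i, std::size_t m) {
    return (h0 + i * i) % m;
}

// Counts how many distinct slots the probes i = 0, 1, ..., m-1 visit
// when they start from slot h0.
std::size_t distinct_slots_visited(std::size_t h0, std::size_t m) {
    std::set<std::size_t> seen;
    for (std::size_t i = 0; i < m; ++i)
        seen.insert(quadratic_probe(h0, i, m));
    return seen.size();
}
```

A known result (Weiss, Chapter 5) is that when m is prime, the first ⌈m/2⌉ probes are all distinct, so an insertion is guaranteed to succeed as long as the table is at least half empty.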

<3> Double hashing

Double hashing is one of the best methods available for open addressing, because the permutations it produces have many of the characteristics of randomly chosen permutations. It uses the hash function $h(k, i) = (h_1(k) + i \, h_2(k)) \bmod m$, where $h_1$ and $h_2$ are auxiliary hash functions. The first probe is T[h1(k)]; each subsequent probe is offset from the previous one by h2(k), modulo m.
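A sketch of double hashing, assuming the auxiliary functions CLRS uses as an example: m prime, $h_1(k) = k \bmod m$ and $h_2(k) = 1 + (k \bmod m_2)$ with $m_2$ slightly less than m. Because $1 \le h_2(k) < m$ and m is prime, $h_2(k)$ is relatively prime to m, so every slot can be reached. The class name and the -1 empty marker are illustrative.

```cpp
#include <cstddef>
#include <vector>

constexpr int kEmptyDH = -1;  // marks an unused slot; keys are assumed >= 0

// Double hashing: h(k, i) = (h1(k) + i*h2(k)) mod m,
// with h1(k) = k mod m and h2(k) = 1 + (k mod m2).
class DoubleHashingTable {
public:
    DoubleHashingTable(std::size_t m, std::size_t m2) : slots_(m, kEmptyDH), m2_(m2) {}

    // Position of the i-th probe for key k.
    std::size_t probe(int k, std::size_t i) const {
        const std::size_t m = slots_.size();
        const std::size_t key = static_cast<std::size_t>(k);
        const std::size_t h1 = key % m;
        const std::size_t h2 = 1 + key % m2_;
        return (h1 + i * h2) % m;
    }

    // Stores k in the first empty slot of its probe sequence and returns
    // that slot, or returns -1 if k is already present or the table is full.
    int insert(int k) {
        for (std::size_t i = 0; i < slots_.size(); ++i) {
            std::size_t j = probe(k, i);
            if (slots_[j] == k) return -1;
            if (slots_[j] == kEmptyDH) { slots_[j] = k; return static_cast<int>(j); }
        }
        return -1;
    }

private:
    std::vector<int> slots_;
    std::size_t m2_;
};
```

With m = 13 and m2 = 11, key 14 has h1 = 1 and h2 = 4, so it probes slots 1, 5, 9, ... After inserting 79, 69, 72, 98 and 50 (which occupy slots 1, 4, 7, 5 and 11), key 14 ends up in slot 9.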

(2) Chaining

Chaining puts all keys that hash to the same address (synonyms) into the same linked list. If the hash table has m slots, it can be defined as an array T[0..m-1] of m head pointers, and every node whose hash address is i is inserted into the singly linked list whose head pointer is T[i]. Each entry of T starts out as a null pointer. With chaining, the load factor α (the number of stored elements divided by m) may exceed 1, but it is usually kept at α ≤ 1.

As an example of chaining, take the same keys (26, 36, 41, 38, 44, 15, 68, 12, 6, 51) and construct the hash function with the division method.

[Figure: initial hash addresses of the keys]

[Figure: final chained hash table]
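The chained table for the example keys can be sketched as follows. As with the linear-probing example, the table size is not given in the text, so m = 13 and h(k) = k mod 13 are assumptions; the class name ChainedTable is illustrative. New keys are appended at the tail of their chain (CLRS inserts at the head instead, which would reverse the order within each chain).

```cpp
#include <cstddef>
#include <list>
#include <vector>

// Chaining: slot i holds a linked list of every key whose hash address is i.
// The hash function is the division method h(k) = k mod m (keys assumed >= 0).
class ChainedTable {
public:
    explicit ChainedTable(std::size_t m) : chains_(m) {}  // every chain starts empty

    // Appends k to the chain at slot h(k) unless it is already there.
    bool insert(int k) {
        std::list<int>& chain = chains_[slot(k)];
        for (int x : chain)
            if (x == k) return false;
        chain.push_back(k);
        ++n_;
        return true;
    }

    // Searches only the one chain that k can be in.
    bool contains(int k) const {
        for (int x : chains_[slot(k)])
            if (x == k) return true;
        return false;
    }

    // Load factor alpha = n / m, the average chain length.
    double load_factor() const { return static_cast<double>(n_) / chains_.size(); }

    const std::list<int>& chain(std::size_t i) const { return chains_[i]; }

private:
    std::size_t slot(int k) const { return static_cast<std::size_t>(k) % chains_.size(); }

    std::vector<std::list<int>> chains_;
    std::size_t n_ = 0;
};
```

Under these assumptions the non-empty chains are 0: 26; 2: 41, 15; 3: 68; 5: 44; 6: 6; 10: 36; 12: 38, 12, 51, and the load factor is 10/13 ≈ 0.77.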

5. Hashing strings

Usually an element's key is converted to a number before hashing. If the key is already an integer, the hash function can simply be key mod tableSize (with tableSize chosen to be prime). In practice, however, keys are often strings, such as names or job titles, so we need a good hash function for string keys. Chapter 5 of Data Structures and Algorithm Analysis describes several approaches:

Method 1: Add up the ASCII values of all the characters in the string and use the sum as the key. The hash function is:

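```cpp
#include <cstddef>
#include <string>

// Method 1: sum the character codes, then reduce modulo the table size.
int hash(const std::string& key, int tableSize)
{
    int hashVal = 0;
    for (std::size_t i = 0; i < key.length(); i++)
        hashVal += key[i];
    return hashVal % tableSize;
}
```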

The drawback of this method is that it does not distribute keys well. Suppose, for example, that keys are strings of at most 8 characters and the table size is 10007. Since the largest ASCII code is 127, Method 1 can produce hash values no larger than 127 × 8 = 1016, so keys land only in slots 0 to 1016. Most of the slots are never used, the distribution is uneven, and performance suffers.

Method 2: Assuming that every key has at least three characters, hash only the first three characters. The hash function is:

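```cpp
#include <string>

// Method 2: hash only the first three characters (key.length() >= 3 assumed).
int hash(const std::string& key, int tableSize)
{
    // 27 represents the number of letters plus the blank; 729 = 27 * 27
    return (key[0] + 27 * key[1] + 729 * key[2]) % tableSize;
}
```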

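This method looks only at the first three characters of the string. If those characters were random, a table of size 10007 would be covered fairly evenly, but English is not random: although there are 26^3 = 17,576 possible three-letter combinations, the book reports that a large dictionary contains only 2,851 distinct ones. Even if none of them collide, only about 28% of a 10007-slot table is ever used, so this hash function is unsuitable when the table is large.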

Method 3: Use Horner's rule to compute a polynomial in a prime, usually 37 (a clever trick; I am not sure why 37 in particular). The hash value is $\sum_{i=0}^{\text{keySize}-1} \text{key}[\text{keySize}-i-1] \cdot 37^{i}$, reduced modulo tableSize. The hash function is:

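```cpp
#include <cstddef>
#include <string>

// Method 3: Horner's rule, hashVal = key[0]*37^(n-1) + ... + key[n-1].
int hash(const std::string& key, int tableSize)
{
    // Unsigned arithmetic wraps around on overflow (well defined), whereas
    // signed overflow is undefined behavior, so no negative fix-up is needed.
    unsigned int hashVal = 0;
    for (std::size_t i = 0; i < key.length(); i++)
        hashVal = 37 * hashVal + static_cast<unsigned char>(key[i]);
    return static_cast<int>(hashVal % static_cast<unsigned int>(tableSize));
}
```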

The drawback of this approach is that for long string keys the hash function takes longer to compute, and the intermediate value can overflow. In that case one can hash only some of the string's characters, for example those in the even or in the odd positions.
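A sketch of the "even positions only" variant of Method 3. The function name hash_even_positions is hypothetical; the loop is Horner's rule with base 37, but it skips every other character.

```cpp
#include <cstddef>
#include <string>

// Method 3 restricted to characters at even positions (0, 2, 4, ...),
// which halves the work for long keys. Unsigned arithmetic wraps on
// overflow, which is well defined in C++.
unsigned int hash_even_positions(const std::string& key, unsigned int tableSize) {
    unsigned int hashVal = 0;
    for (std::size_t i = 0; i < key.length(); i += 2)
        hashVal = 37 * hashVal + static_cast<unsigned char>(key[i]);
    return hashVal % tableSize;
}
```

The trade-off is that strings differing only at odd positions always collide: "abcd" and "aXcY" both hash to 97 × 37 + 99 = 3688 in a table of size 10007.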

6. Rehashing

When the hash table is full, inserting a new element fails. The remedy is to build another table at least twice as long as the current one, recompute the hash value of every element, and insert each element into the new table; this is called rehashing. The open question is when to rehash. There are three common policies:

(1) Rehash when the table is nearly full, that is, when its occupancy passes a chosen threshold such as 80%.

(2) Rehash when the insertion of a new element fails.

(3) Rehash based on the load factor: for a hash table T with m slots storing n elements, the load factor is $\alpha = n/m$ (with chaining, the average number of elements per chain). When the load factor reaches a chosen threshold, the table is rehashed.

When collisions are resolved by chaining, the third policy tends to give the best rehashing efficiency.
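A sketch of policy (3) applied to an open-addressing table with linear probing, using a threshold of 0.8 to mirror the 80% example in policy (1). The class RehashingTable and the helper next_prime are introduced here for illustration only; the new size is the first prime at least twice the old size.

```cpp
#include <cstddef>
#include <vector>

constexpr int kFree = -1;  // empty-slot marker; keys are assumed >= 0

// Helper for this sketch: smallest prime >= n (trial division).
std::size_t next_prime(std::size_t n) {
    if (n <= 2) return 2;
    for (;; ++n) {
        bool prime = true;
        for (std::size_t d = 2; d * d <= n; ++d)
            if (n % d == 0) { prime = false; break; }
        if (prime) return n;
    }
}

// Linear-probing table that rehashes before the load factor n/m would
// exceed maxLoad.
class RehashingTable {
public:
    explicit RehashingTable(std::size_t m, double maxLoad = 0.8)
        : slots_(m, kFree), maxLoad_(maxLoad) {}

    void insert(int k) {
        if (static_cast<double>(n_ + 1) / slots_.size() > maxLoad_)
            rehash();
        if (place(slots_, k)) ++n_;
    }

    std::size_t capacity() const { return slots_.size(); }
    std::size_t size() const { return n_; }

private:
    // Linear probing into `table`; returns true if k was newly stored.
    static bool place(std::vector<int>& table, int k) {
        const std::size_t m = table.size();
        for (std::size_t i = 0; i < m; ++i) {
            std::size_t j = (static_cast<std::size_t>(k) % m + i) % m;
            if (table[j] == k) return false;
            if (table[j] == kFree) { table[j] = k; return true; }
        }
        return false;  // table full; avoided while maxLoad <= 1
    }

    // Build a table of size next_prime(2m) and re-insert every element,
    // recomputing each hash value for the new size.
    void rehash() {
        std::vector<int> bigger(next_prime(2 * slots_.size()), kFree);
        for (int k : slots_)
            if (k != kFree) place(bigger, k);
        slots_.swap(bigger);
    }

    std::vector<int> slots_;
    double maxLoad_;
    std::size_t n_ = 0;
};
```

Starting from 5 slots, the first four insertions keep α ≤ 0.8; the fifth would push α to 1.0, so the table is first rehashed to next_prime(10) = 11 slots.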

7. Implementation example

After reading the chapter, I felt the urge to implement a hash table myself. In this design, the table stores elements whose keys are strings; the hash function is string-hashing Method 3, collisions are resolved by chaining, and the table is rehashed once the load factor exceeds 1 (that is, once the number of elements n exceeds the number of chains m). The implementation uses C++ with vector and list; the class skeleton is:

```cpp
template <class T>
class HashTable
{
public:
    HashTable(int size = 101);
    int insert(const T& x);
    int remove(const T& x);
    int contains(const T& x);
    void make_empty();
    void display() const;
private:
    vector<list<T> > lists;
    int currentSize;            // number of elements currently in the table
    int hash(const string& key);
    int myhash(const T& x);
    void rehash();
};
```

The complete program is as follows:

```cpp
#include <iostream>
#include <vector>
#include <list>
#include <string>
#include <algorithm>
#include <cstddef>
using namespace std;

int nextPrime(int n);

template <class T>
class HashTable
{
public:
    HashTable(int size = 101);
    int insert(const T& x);
    int remove(const T& x);
    int contains(const T& x);
    void make_empty();
    void display() const;
private:
    vector<list<T> > lists;
    int currentSize;            // number of elements currently in the table
    int hash(const string& key);
    int myhash(const T& x);
    void rehash();
};

template <class T>
HashTable<T>::HashTable(int size) : lists(size), currentSize(0)
{
}

template <class T>
int HashTable<T>::hash(const string& key)
{
    // String-hashing Method 3 (Horner's rule, base 37). Unsigned arithmetic
    // wraps on overflow instead of being undefined behavior.
    unsigned int hashVal = 0;
    for (size_t i = 0; i < key.length(); i++)
        hashVal = 37 * hashVal + static_cast<unsigned char>(key[i]);
    return static_cast<int>(hashVal % lists.size());
}

template <class T>
int HashTable<T>::myhash(const T& x)
{
    // T must provide getName(); the name is the string key that gets hashed.
    return hash(x.getName());
}

template <class T>
int HashTable<T>::insert(const T& x)
{
    list<T>& whichList = lists[myhash(x)];
    if (find(whichList.begin(), whichList.end(), x) != whichList.end())
        return 0;                                   // duplicate: not inserted
    whichList.push_back(x);
    if (++currentSize > static_cast<int>(lists.size()))
        rehash();                                   // load factor exceeded 1
    return 1;
}

template <class T>
int HashTable<T>::remove(const T& x)
{
    list<T>& whichList = lists[myhash(x)];
    typename list<T>::iterator iter = find(whichList.begin(), whichList.end(), x);
    if (iter == whichList.end())
        return 0;
    whichList.erase(iter);
    currentSize--;
    return 1;
}

template <class T>
int HashTable<T>::contains(const T& x)
{
    // Search the chain by reference, without copying it.
    const list<T>& whichList = lists[myhash(x)];
    return find(whichList.begin(), whichList.end(), x) != whichList.end() ? 1 : 0;
}

template <class T>
void HashTable<T>::make_empty()
{
    for (size_t i = 0; i < lists.size(); i++)
        lists[i].clear();
    currentSize = 0;
}

template <class T>
void HashTable<T>::rehash()
{
    vector<list<T> > oldLists = lists;

    // New table: the first prime >= twice the old size, all chains empty.
    lists.resize(nextPrime(2 * static_cast<int>(lists.size())));
    for (size_t i = 0; i < lists.size(); i++)
        lists[i].clear();

    // Re-insert every element so its hash is recomputed for the new size.
    currentSize = 0;
    for (size_t i = 0; i < oldLists.size(); i++)
    {
        typename list<T>::iterator iter = oldLists[i].begin();
        while (iter != oldLists[i].end())
            insert(*iter++);
    }
}

template <class T>
void HashTable<T>::display() const
{
    for (size_t i = 0; i < lists.size(); i++)
    {
        cout << i << ": ";
        typename list<T>::const_iterator iter = lists[i].begin();
        while (iter != lists[i].end())
        {
            cout << *iter << " ";
            ++iter;
        }
        cout << endl;
    }
}

// Returns the smallest prime >= n (n >= 2 assumed).
int nextPrime(int n)
{
    int ret = n;
    while (true)
    {
        bool isPrime = true;
        for (int i = 2; i * i <= ret; i++)   // test divisors up to sqrt(ret)
            if (ret % i == 0)
            {
                isPrime = false;
                break;
            }
        if (isPrime)
            return ret;
        ret++;
    }
}

class Employee
{
public:
    Employee() {}
    Employee(const string& n, int s = 0) : name(n), salary(s) {}
    const string& getName() const { return name; }
    bool operator==(const Employee& rhs) const
    {
        return getName() == rhs.getName();
    }
    bool operator!=(const Employee& rhs) const
    {
        return !(*this == rhs);
    }
    friend ostream& operator<<(ostream& out, const Employee& e)
    {
        out << "(" << e.name << "," << e.salary << ")";
        return out;
    }
private:
    string name;
    int salary;
};

int main()
{
    Employee e1("Tom", 6000);
    Employee e2("Anker", 7000);
    Employee e3("Jermey", 8000);
    Employee e4("Lucy", 7500);
    HashTable<Employee> emp_table(13);

    emp_table.insert(e1);
    emp_table.insert(e2);
    emp_table.insert(e3);
    emp_table.insert(e4);

    cout << "Hash table is:" << endl;
    emp_table.display();
    if (emp_table.contains(e1) == 1)
        cout << "Tom exists in the hash table" << endl;
    if (emp_table.remove(e1) == 1)
        cout << "Removed Tom from the hash table successfully" << endl;
    if (emp_table.contains(e1) == 1)
        cout << "Tom exists in the hash table" << endl;
    else
        cout << "Tom does not exist in the hash table" << endl;
    //emp_table.display();
    return 0;
}
```

[Figure: program test output]

Reference: http://www.cnblogs.com/zhanglanyun/archive/2011/09/01/2161729.html
