So far, this series has covered sequential search, binary search, block search, and binary search trees. See the author's earlier articles:
http://blog.csdn.net/u010025211/article/details/46635325
http://blog.csdn.net/u010025211/article/details/46635183
Today's blog post will cover hash lookups.
1. Why use hashing for lookup
All of the previous search methods locate an element by comparing it with the elements of a linear table or a tree, so their time complexity is O(n) or O(log n).
Is it possible, given an element x to find, to compute its position i in array a directly through some special calculation, so that the element can be fetched as a[i] without a series of comparisons?
A hash function is such a special calculation, and it can reduce the time complexity of lookup.
2. Definition of the special calculation: the hash function
The STL has a hashtable class that can be used directly, but how is hash lookup actually implemented?
Hash function construction methods:
2.1 Direct addressing method:
Take the key itself, or a linear function of the key, as the hash address. That is, H(key) = key or H(key) = a*key + b.
Example: a university has enrolled students since 1960, and array A holds the enrollment statistics of past years, with the year as the key. The hash function can be designed as H(key) = key - 1959.
With direct addressing, keys and storage addresses correspond one to one, so no collision can occur.
Key    | 1959 | 1960 | 1961 | 1962
H(Key) | 0    | 1    | 2    | 3
To look up the number of students enrolled in 1961, compute its address H(1961) = 2 and read A[2] directly; A[2] holds the 1961 enrollment.
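As a minimal sketch of direct addressing for the enrollment example, assuming the statistics live in a plain int array. The names `BASE_YEAR`, `hash_direct` and `lookup_enrollment` are illustrative and do not come from the original article:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical constant: the year mapped to slot 0, matching H(key) = key - 1959.
const int BASE_YEAR = 1959;

// Direct addressing: the key, shifted by a constant, is the array index itself.
std::size_t hash_direct(int year) {
    assert(year >= BASE_YEAR);  // years before the base year have no slot
    return static_cast<std::size_t>(year - BASE_YEAR);
}

// Look up the enrollment for a year in array a of length n.
// Returns -1 when the year falls outside the table.
int lookup_enrollment(const int a[], std::size_t n, int year) {
    if (year < BASE_YEAR) return -1;
    std::size_t i = hash_direct(year);
    return i < n ? a[i] : -1;
}
```

No comparison with other keys is needed: one subtraction and one array access answer the query.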
2.2 Division remainder method:
Choose a suitable positive integer p (p ≤ table length m), divide the key by p, and take the remainder as the hash address. That is, H(key) = key % p (p ≤ m). The crux of this method is choosing p well; in general, p is best chosen as a prime number less than or equal to the hash table length m.
Example: for m = 8, 16, 32, 128, 256, 512, take p = 7, 13, 31, 127, 251, 503.
The remainder can be taken not only of the key directly, but also of the result of a folding or mid-square operation on the key.
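A small sketch of the division remainder method follows. The helper `is_prime` is an assumption added here to check a candidate p; the article only says p should be a prime no larger than the table length:

```cpp
#include <cassert>

// Division remainder method: H(key) = key % p.
// The result is normalized so that negative keys also land in [0, p).
int hash_mod(int key, int p) {
    assert(p > 0);
    int r = key % p;
    return r < 0 ? r + p : r;
}

// Hypothetical helper: trial-division primality test for choosing p.
bool is_prime(int n) {
    if (n < 2) return false;
    for (int d = 2; d * d <= n; ++d) {
        if (n % d == 0) return false;
    }
    return true;
}
```

For example, `hash_mod(15, 13)` and `hash_mod(28, 13)` both evaluate to 2, so even a prime p does not rule out two keys sharing an address.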
2.3 Mid-square method:
Take the middle digits of the square of the key as the hash address. The middle digits of a square depend on every digit of the original number, so the chance of a collision is relatively small. The number of digits taken is determined by the table length.
Example: K = 456, K^2 = 207936. If the hash table length is m = 10^2 = 100, then 79 (the middle two digits) can be taken as the value of the hash function.
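The mid-square example can be sketched as below. The function name `hash_mid_square` and the rule for an off-center middle (lean toward the low-order side) are assumptions of this sketch, since the article does not say how to break that tie:

```cpp
#include <cassert>
#include <cstdint>

// Mid-square method: square the key and take `digits` decimal digits
// from the middle of the square.
std::uint64_t hash_mid_square(std::uint32_t key, int digits) {
    assert(digits > 0 && digits <= 9);
    std::uint64_t sq = static_cast<std::uint64_t>(key) * key;

    int n = 1;  // number of decimal digits in the square
    for (std::uint64_t t = sq; t >= 10; t /= 10) ++n;

    std::uint64_t pow10 = 1;  // 10^digits
    for (int i = 0; i < digits; ++i) pow10 *= 10;

    if (n <= digits) return sq;  // the square is already short enough

    int drop = (n - digits) / 2;  // low-order digits to discard
    for (int i = 0; i < drop; ++i) sq /= 10;
    return sq % pow10;
}
```

With K = 456 and two digits, the square 207936 loses its two low-order digits (giving 2079) and the remainder modulo 100 is 79, matching the example above.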
2.4 Folding method:
Split the key into segments with the same number of digits (the last segment may be shorter); the segment length depends on the number of digits in a hash table address. Then add the segments together, discard the carry out of the highest digit, and use the result as the hash address.
There are two kinds of folding: shift folding and boundary folding. Shift folding aligns the lowest digits of the segments and adds them. Boundary folding folds the key back and forth along the segment boundaries, so that every second segment is reversed, and then adds them.
Example: key k = 58242324169, hash table length 1000. The key is split into three-digit segments: 582, 423, 241, 69. The two kinds of folding give:
Shift folding: 582 + 423 + 241 + 69 = 1315, giving 315 after discarding the carry.
Boundary folding: 582 + 324 + 241 + 96 = 1243, giving 243 after discarding the carry.
Folding is suitable when keys have many digits and the values in each digit position are roughly uniformly distributed.
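Both kinds of folding can be sketched with one function. Passing the key as a decimal string and the name `hash_fold` are choices made for this sketch only; the key is assumed to contain nothing but decimal digits:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>

// Folding method: split the key into segments of `width` digits from the left
// (the last segment may be shorter), add them, and discard the carry.
// boundary == false: shift folding, segments are added as they are.
// boundary == true : boundary folding, every second segment is reversed first.
int hash_fold(const std::string& key, int width, bool boundary) {
    assert(width > 0 && width <= 8);
    int modulus = 1;  // 10^width, used to discard the carry
    for (int i = 0; i < width; ++i) modulus *= 10;

    long long sum = 0;
    bool reverse_this = false;
    const std::size_t step = static_cast<std::size_t>(width);
    for (std::size_t pos = 0; pos < key.size(); pos += step) {
        std::string seg = key.substr(pos, step);
        if (boundary && reverse_this) std::reverse(seg.begin(), seg.end());
        sum += std::stoll(seg);
        reverse_this = !reverse_this;
    }
    return static_cast<int>(sum % modulus);
}
```

Calling `hash_fold("58242324169", 3, false)` adds 582 + 423 + 241 + 69, while passing `true` adds 582 + 324 + 241 + 96, as in the worked example.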
2.5 Digit analysis method:
If the keys are numbers in base r and all keys that may appear in the hash table are known in advance, several digits of the key can be selected to form the hash address.
Most of the methods above can still produce collisions. For example, with the division remainder method, 15 % 13 = 2 and 28 % 13 = 2. A collision occurs when two keys map to the same address.
3. Collision resolution methods
Different keys may produce the same hash address. This phenomenon is called a "collision" (or conflict), and keys that collide under a given hash function are called "synonyms". Because the set of possible keys is usually much larger than the table length, collisions are unavoidable.
3.1 Open addressing method:
Basic approach: when a collision occurs, generate a probe sequence over the hash table by some method, then examine the cells along the probe sequence one by one until an open address (an empty cell) is found.
There are three ways to generate the probe sequence:
⑴ Linear probing:
Basic idea: treat the hash table as a circular table. For a table of length m, the probe sequence is:
H(k), H(k)+1, H(k)+2, ..., m-1, 0, 1, ..., H(k)-1
When collisions are resolved by linear probing, the formula for the next open address is:
H_i = (H(k) + i) MOD m
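To make the circular order concrete, here is a small sketch that lists the addresses linear probing visits. The name `linear_probe_sequence` and the parameter `home` (standing for H(k)) are illustrative:

```cpp
#include <cassert>
#include <vector>

// Linear probing: H_i = (H(k) + i) mod m for i = 0, 1, ..., m-1.
// `home` is the home address H(k); the result lists every cell exactly once.
std::vector<int> linear_probe_sequence(int home, int m) {
    assert(m > 0 && home >= 0 && home < m);
    std::vector<int> seq;
    for (int i = 0; i < m; ++i) {
        seq.push_back((home + i) % m);
    }
    return seq;
}
```

For a home address of 3 in a table of length 5, the cells are visited in the order 3, 4, 0, 1, 2.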
⑵ Quadratic probing:
The offsets tried by quadratic probing are 1^2, -1^2, 2^2, -2^2, ... and so on. When a collision occurs, the formula for the next open address is:
H_{2i-1} = (H(k) + i^2) MOD m
H_{2i} = (H(k) - i^2) MOD m (1 <= i <= (m-1)/2)
Advantage: reduces the likelihood of clustering.
Disadvantage: it may not probe every cell of the hash table.
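The two formulas can be sketched as a function that lists the probed addresses, starting from the home address. The name `quadratic_probe_sequence` is illustrative:

```cpp
#include <cassert>
#include <vector>

// Quadratic probing: after the home address H(k), try
// H(k) + 1^2, H(k) - 1^2, H(k) + 2^2, H(k) - 2^2, ... (all mod m),
// for 1 <= i <= (m-1)/2.
std::vector<int> quadratic_probe_sequence(int home, int m) {
    assert(m > 0 && home >= 0 && home < m);
    std::vector<int> seq;
    seq.push_back(home);
    for (int i = 1; i <= (m - 1) / 2; ++i) {
        int sq = static_cast<int>((static_cast<long long>(i) * i) % m);
        seq.push_back((home + sq) % m);
        seq.push_back((home - sq + m) % m);  // + m keeps the value non-negative
    }
    return seq;
}
```

With m = 7 and home address 3 the sequence 3, 4, 2, 0, 6, 5, 1 reaches every cell, but with m = 8 and home address 0 only cells 0, 1, 7 and 4 are ever visited, which illustrates the disadvantage noted above.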
⑶ Pseudo-random probing:
When collisions are resolved by pseudo-random probing, the formula for the next open address is:
H_i = (H(k) + r_i) MOD m
where r_1, r_2, ..., r_{m-1} is a random permutation of 1, 2, ..., m-1. How to produce the permutation is a matter of random number generation.
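One possible way to obtain the permutation r_1, ..., r_{m-1} is to shuffle 1..m-1 with a seeded generator, so that insertion and lookup replay the same sequence. The use of `std::mt19937` and `std::shuffle`, and the name `random_probe_sequence`, are choices of this sketch and are not prescribed by the article:

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <random>
#include <vector>

// Pseudo-random probing: H_i = (H(k) + r_i) mod m, where r_1..r_{m-1}
// is a permutation of 1..m-1 produced from a fixed seed.
std::vector<int> random_probe_sequence(int home, int m, unsigned seed) {
    assert(m > 0 && home >= 0 && home < m);
    std::vector<int> r(static_cast<std::size_t>(m - 1));
    std::iota(r.begin(), r.end(), 1);  // r = 1, 2, ..., m-1
    std::mt19937 gen(seed);            // same seed, same permutation
    std::shuffle(r.begin(), r.end(), gen);

    std::vector<int> seq;
    seq.push_back(home);  // the home address is tried first
    for (int ri : r) {
        seq.push_back((home + ri) % m);
    }
    return seq;
}
```

Because the offsets are a permutation of 1..m-1, the sequence visits every cell exactly once, whatever order the shuffle produces.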
3.2 Rehashing method:
Basic approach: when a collision occurs, compute a new hash address with another hash function, and repeat until no collision occurs, i.e.
H_i = RH_i(key), i = 1, 2, ..., k
where the RH_i are different hash functions.
The advantage of this method is that clustering is unlikely; the disadvantage is the additional computation time.
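A rough sketch of rehashing, assuming the hash functions RH_i are supplied by the caller as a list and that -1 marks an empty cell (so keys and hash values are assumed non-negative). The names `EMPTY` and `rehash_insert` are illustrative:

```cpp
#include <cassert>
#include <functional>
#include <vector>

const int EMPTY = -1;  // assumed sentinel for an unused cell

// Rehashing: try RH_1(key), RH_2(key), ... in turn until a cell is free
// or already holds the key. Returns the cell used, or -1 if every
// supplied hash function collides.
int rehash_insert(std::vector<int>& table,
                  const std::vector<std::function<int(int)>>& hashes,
                  int key) {
    assert(!table.empty() && key >= 0);
    const int m = static_cast<int>(table.size());
    for (const auto& h : hashes) {
        int addr = h(key) % m;
        if (table[addr] == EMPTY || table[addr] == key) {
            table[addr] = key;
            return addr;
        }
    }
    return -1;
}
```

Each extra hash function costs one more evaluation per collision, which is the added computation time mentioned above.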
3.3 Chain address method:
Basic approach: link all keys that are synonyms into the same singly linked list. If the hash addresses produced by the chosen hash function range over 0 to m-1, the hash table can be defined as an array of m linked-list head pointers.
The advantages of this method are:
① It does not produce clustering.
② Because node space is allocated dynamically, it is well suited to cases where the table length cannot be determined before the table is built.
③ Deleting a node from the table is easy.
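The three advantages can be seen in a compact sketch built on `std::forward_list`. The class name `ChainedHashSet` and the choice of key % m as the hash function are assumptions of this sketch:

```cpp
#include <cassert>
#include <cstddef>
#include <forward_list>
#include <vector>

// Chain address method: one singly linked list of synonyms per hash address.
class ChainedHashSet {
public:
    explicit ChainedHashSet(std::size_t m) : buckets_(m == 0 ? 1 : m) {}

    bool contains(int key) const {
        for (int k : buckets_[index(key)]) {
            if (k == key) return true;
        }
        return false;
    }

    // Inserts at the head of the chain; returns false if key is already stored.
    bool insert(int key) {
        if (contains(key)) return false;
        buckets_[index(key)].push_front(key);
        return true;
    }

    // Deleting only unlinks one node; no other cell is disturbed.
    bool erase(int key) {
        auto& chain = buckets_[index(key)];
        auto prev = chain.before_begin();
        for (auto it = chain.begin(); it != chain.end(); ++it, ++prev) {
            if (*it == key) {
                chain.erase_after(prev);
                return true;
            }
        }
        return false;
    }

private:
    std::size_t index(int key) const {
        long long m = static_cast<long long>(buckets_.size());
        long long r = key % m;  // may be negative for negative keys
        return static_cast<std::size_t>(r < 0 ? r + m : r);
    }

    std::vector<std::forward_list<int>> buckets_;
};
```

With 13 buckets, keys 15 and 28 end up in the same chain, and erasing 15 leaves 28 reachable without any probing or tombstones.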
3.4 Public overflow area method:
Basic approach: suppose the hash function takes values in [0..m-1]. Set up a vector hashtable[0..m-1] as the base table, each component holding one record, and another vector overtable[0..v] as the overflow table. Any record whose key is a synonym of a key already in the base table is placed in the overflow table when the collision occurs, regardless of the hash address it obtained from the hash function.
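A minimal sketch of the public overflow area, assuming key % m as the hash function, -1 as the empty marker (so keys are non-negative), and a growable `std::vector` in place of the fixed-size overtable[0..v]. The names `OverflowHashTable` and `EMPTY` are illustrative:

```cpp
#include <cassert>
#include <vector>

const int EMPTY = -1;  // assumed sentinel for an unused base cell

struct OverflowHashTable {
    std::vector<int> base;      // base table, one record per cell
    std::vector<int> overflow;  // shared overflow table for all colliding keys

    explicit OverflowHashTable(int m) : base(m, EMPTY) { assert(m > 0); }

    int hash(int key) const { return key % static_cast<int>(base.size()); }

    bool contains(int key) const {
        int a = hash(key);
        if (base[a] == key) return true;
        if (base[a] == EMPTY) return false;  // no key ever hashed to this cell
        for (int k : overflow) {             // sequential scan of the overflow table
            if (k == key) return true;
        }
        return false;
    }

    void insert(int key) {
        assert(key >= 0);
        if (contains(key)) return;
        int a = hash(key);
        if (base[a] == EMPTY) base[a] = key;
        else overflow.push_back(key);  // any collision goes to the shared area
    }
};
```

After inserting 15, 28 and 41 into a table with m = 13, cell 2 of the base table holds 15 and the overflow table holds 28 and 41 in insertion order.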
4. Implementation of hash lookup
Hashing is a typical algorithm that trades space for time.
Looking up a key in a hash table follows essentially the same process as building the table. Given a value K, compute the hash address H(K) using the hash function H chosen when the table was built. If the corresponding cell is unoccupied, the lookup fails. Otherwise, compare the key stored at that address with K: if they are equal, the lookup succeeds; if not, find the next address using the collision resolution method chosen when the table was built. Repeat until an unoccupied cell is reached (lookup fails) or a key equal to K is found (lookup succeeds).
Although a hash table establishes a direct correspondence between keys and storage locations, collisions mean that lookup is still a process of comparing keys. However, the average search length of a hash table is much smaller than that of sequential search, and it is also smaller than that of binary search.
The number of keys that must be compared with the given value during lookup depends on three factors: the hash function, the collision resolution method, and the load factor of the hash table.
The quality of the hash function primarily affects how often collisions occur. If the hash function is assumed to be uniform, however, its effect on the average search length is generally not considered.
For the same set of keys and the same hash function, different collision resolution methods produce different hash tables, and their average search lengths differ as well.
In general, for hash tables that use the same collision resolution method, the average search length depends on the load factor α of the hash table, where α = (number of records in the table) / (length of the table).
Clearly, the smaller α is, the lower the chance of collision, but the more space is wasted. By choosing a suitable load factor α, the average search length can be kept within a bounded range.
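To give a feel for how the load factor drives the average search length, the sketch below evaluates two classical textbook approximations for a successful search under a uniform hash function. These formulas are not stated in the original article and are included here as an assumption; the function names are illustrative:

```cpp
#include <cassert>

// Load factor: alpha = records stored / table length.
double load_factor(int records, int table_length) {
    assert(table_length > 0 && records >= 0);
    return static_cast<double>(records) / table_length;
}

// Classical approximation for a successful search with linear probing:
// ASL ~ (1 + 1 / (1 - alpha)) / 2, valid for 0 <= alpha < 1.
double asl_linear_probing(double alpha) {
    assert(alpha >= 0.0 && alpha < 1.0);
    return 0.5 * (1.0 + 1.0 / (1.0 - alpha));
}

// Classical approximation for a successful search with chaining:
// ASL ~ 1 + alpha / 2.
double asl_chaining(double alpha) {
    assert(alpha >= 0.0);
    return 1.0 + alpha / 2.0;
}
```

Under these approximations the expected number of comparisons depends only on α and not on the number of records, which is why fixing α keeps the average search length bounded.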
Reference code may be given later.
----------------------- Updated on 2015-06-26 -----------------------------
Notes on several collision resolution methods
For open addressing: the table can hold at most as many records as it has cells, which is a significant limitation; once the hash table is full, nothing more can be stored.
For the chain address method: the table is in effect an array of linked lists. Different hash addresses correspond to different list heads, and all keys that are synonyms are linked as nodes in the same singly linked list.
The structure is an array in which each element holds a pointer to the head of a linked list. A list may be empty, or it may contain many elements. Elements are assigned to lists according to some feature of the element; to look up an element, first find the right list from that feature, then search for the element within the list.
The method that converts an element's feature into an array index is the hash method, that is, the hash function.
Scope of application: a basic data structure for fast lookup and deletion; it usually requires that all the data fit in memory.
The public overflow area method works the same way, with the linked lists replaced by a single overflow array.
/* Open addressing method */
#include <iostream>
using namespace std;

#define TABLESIZE 10               // length of the hash table
typedef int HashTable[TABLESIZE];  // -1 marks an empty cell, so keys must be non-negative
typedef int KeyType;

// Search: returns the address of key, or the address of the first empty cell
// on its probe path if key is absent, or -1 if the table is full without key.
int search_hashtable(HashTable ht, KeyType key) {
    int address = key % TABLESIZE;
    int compareTimer = 0;  // counts probes to avoid an endless loop on a full table
    while (compareTimer < TABLESIZE && ht[address] != key && ht[address] != -1) {
        compareTimer++;
        address = (address + 1) % TABLESIZE;  // linear probing
    }
    if (compareTimer == TABLESIZE)
        return -1;
    return address;  // no match if ht[address] == -1
}

// Insert: returns 1 on success, -1 if key is already present or the table is full.
int insert_hashtable(HashTable ht, KeyType key) {
    int address = search_hashtable(ht, key);
    if (address == -1)
        return -1;  // table is full, so there is no cell to write to
    if (ht[address] == -1) {
        ht[address] = key;
        return 1;  // insert succeeded
    }
    return -1;  // key is already in the hash table
}

// Initialization: mark every cell as empty.
void initial_hashtable(HashTable ht) {
    for (int i = 0; i < TABLESIZE; i++) {
        ht[i] = -1;
    }
}
/* Chain address method */
#include <stdio.h>
#include <stdlib.h>

#define TABLESIZE 5

typedef int ElemType;

typedef struct HashNode {
    ElemType elem;
    struct HashNode *next;
} HashNode;

typedef struct {
    HashNode chainHash[TABLESIZE];  /* dummy head node of each chain */
    int count;                      /* number of stored keys */
} HashTable;

/* Map a key to a chain index; negative keys are folded into range. */
int hash_mod(ElemType key) {
    int r = key % TABLESIZE;
    return r < 0 ? r + TABLESIZE : r;
}

/* Insert key at the front of its chain. */
void InsertHash(HashTable *h, int key) {
    HashNode *p = (HashNode *)malloc(sizeof(HashNode));
    int index;
    if (p == NULL)
        return;  /* allocation failed */
    p->elem = key;
    index = hash_mod(key);
    p->next = h->chainHash[index].next;
    h->chainHash[index].next = p;
    h->count++;
}

/* Read n keys from standard input and insert them. */
void CreateHashTable(HashTable *h, int n) {
    int key;
    int i;
    for (i = 0; i < n; i++) {
        printf("Input the %d key: ", i + 1);
        if (scanf("%d", &key) != 1)
            return;
        InsertHash(h, key);
    }
}

/* Print every key, chain by chain. */
void PrintHashTable(HashTable *h) {
    int i;
    HashNode *p;
    for (i = 0; i < TABLESIZE; i++) {
        p = h->chainHash[i].next;
        while (p) {
            printf("%-5d", p->elem);
            p = p->next;
        }
    }
}

/* Return 1 if key is in the table, 0 otherwise. */
int SearchHash(HashTable *h, int key) {
    HashNode *p = h->chainHash[hash_mod(key)].next;
    while (p) {
        if (p->elem == key)
            return 1;
        p = p->next;
    }
    return 0;
}

/* Release every node that InsertHash allocated. */
void FreeHashTable(HashTable *h) {
    int i;
    for (i = 0; i < TABLESIZE; i++) {
        HashNode *p = h->chainHash[i].next;
        while (p) {
            HashNode *next = p->next;
            free(p);
            p = next;
        }
        h->chainHash[i].next = NULL;
    }
    h->count = 0;
}

int main(void) {
    int n, key;
    int i;
    HashTable H;
    printf("Input the number of keys of the hash table that we want to build: ");
    if (scanf("%d", &n) != 1)
        return 1;
    for (i = 0; i < TABLESIZE; i++)
        H.chainHash[i].next = NULL;
    H.count = 0;
    CreateHashTable(&H, n);
    printf("The hash table that we built is: ");
    PrintHashTable(&H);
    printf("\nInput the key that we want to search (-1 for exit): ");
    while (scanf("%d", &key) == 1 && key != -1) {
        if (SearchHash(&H, key))
            printf("There is a %d record in the hash table!\n", key);
        else
            printf("There is not a %d record in the hash table!\n", key);
        printf("\nInput the key that we want to search (-1 for exit): ");
    }
    FreeHashTable(&H);
    return 0;
}
Reference article: http://www.cnblogs.com/li-hao/archive/2011/10/16/2214017.html
Full code download (C++ search algorithm code, including sequential, binary, BST, and hash search): http://download.csdn.net/detail/u010025211/8841123