Classic algorithm learning: hash search
Hash search is also called hash lookup. Hashing establishes a fixed correspondence f between the key of a record and its storage location, so that each key maps to a storage location f(key). To search, compute f(key) for the given value using the same correspondence; if the record exists in the search set, it must be at position f(key). Hashing is therefore both a storage method and a search method. The sample code is uploaded to: https://github.com/chenyufeng1991/HashSearch.
Six Methods to construct hash functions:
(1) Direct addressing
Function formula: f(key) = a * key + b (a and b are constants)
The advantage of this method is that it is simple, uniform, and produces no collisions. However, you need to know the distribution of the keys in advance, so it suits small lookup tables whose keys are contiguous.
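As a minimal sketch of direct addressing, assume the keys are birth years starting from 1980, so that a = 1 and b = -1980. These constants and the function name direct_address are my own illustration and are not part of the sample repository.

```c
/* Direct addressing: f(key) = a * key + b.
 * a = 1 and b = -1980 are assumed values chosen for illustration only. */
int direct_address(int key) {
    const int a = 1;
    const int b = -1980;
    return a * key + b;
}
```

With these assumed constants, the year 1980 maps to slot 0 and 1995 maps to slot 15, so no two distinct years can share a slot.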
(2) Digit analysis
This method extracts several digits of the key to form the hash address. For example, take an 11-digit mobile phone number such as "187 *** 1234": the first three digits are the access number, which generally identifies the telecommunications company; the middle four digits indicate the region; the last four digits are the actual subscriber number.
If you want to store the mobile phone numbers of the employees in a department, using the phone number as the key, the first seven digits are very likely to be the same, so we should select the last four digits as the hash address.
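The phone number case could be sketched as follows. The function name digit_analysis_hash is hypothetical, and I assume the number is held as an integer (long long, since 11 digits do not fit in a 32-bit int).

```c
/* Digit analysis: keep only the digits that vary between keys.
 * Here the last four digits of an 11-digit phone number are the address. */
int digit_analysis_hash(long long phone) {
    return (int)(phone % 10000);
}
```

For a made-up number such as 18700001234, this yields the address 1234.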
(3) Mid-square method
The middle digits of the square of the key are used as the hash address. Because the middle digits of a square depend on every digit of the original number, the chance of collisions with the mid-square method is relatively small. The number of digits taken is determined by the table length.
For example, K = 456, K ^ 2 = 207936. If the hash table length is 100, take 79 (the two middle digits) as the hash function value.
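The example above can be reproduced with a small sketch. The function name mid_square_hash and its digits parameter are my own; I also assume that when the middle cannot be centered exactly, the extra digit is left on the high-order side.

```c
/* Mid-square method: square the key and take `digits` digits from the middle. */
int mid_square_hash(int key, int digits) {
    long long sq = (long long)key * key;
    int len = 0;
    for (long long t = sq; t > 0; t /= 10) {
        len++;                          /* count decimal digits of the square */
    }
    int drop = (len - digits) / 2;      /* low-order digits to discard */
    for (int i = 0; i < drop; i++) {
        sq /= 10;
    }
    long long mod = 1;
    for (int i = 0; i < digits; i++) {
        mod *= 10;
    }
    return (int)(sq % mod);             /* keep the next `digits` digits */
}
```

Calling mid_square_hash(456, 2) squares 456 to 207936, drops the two low-order digits to get 2079, and keeps 79.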
(4) Folding Method
The folding method divides the key from left to right into several parts with an equal number of digits (the last part may have fewer digits), then adds these parts together and, according to the hash table length, takes the last few digits of the sum as the hash address. The folding method is a good choice when keys have many digits and the digits in each position are roughly evenly distributed.
For example, if the key is 9876543210 and hash addresses are three digits long, we divide the key into four groups, 987 | 654 | 321 | 0, and add them: 987 + 654 + 321 + 0 = 1962. Taking the last three digits gives the hash address 962.
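Here is a rough sketch of that calculation. I assume the key is passed as a string of decimal digits so that it can be split from the left; the name folding_hash and the width parameter are hypothetical.

```c
#include <string.h>

/* Folding method: split the digits from left to right into groups of
 * `width` digits, add the groups, and keep the last `width` digits. */
int folding_hash(const char *digits, int width) {
    size_t n = strlen(digits);
    long sum = 0;
    int mod = 1;
    for (int i = 0; i < width; i++) {
        mod *= 10;
    }
    for (size_t i = 0; i < n; i += (size_t)width) {
        long part = 0;
        for (size_t j = i; j < i + (size_t)width && j < n; j++) {
            part = part * 10 + (digits[j] - '0');
        }
        sum += part;                    /* the last group may be shorter */
    }
    return (int)(sum % mod);
}
```

folding_hash("9876543210", 3) adds 987 + 654 + 321 + 0 = 1962 and returns 962.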
(5) Division-remainder method
Select an appropriate positive integer p (p <= table length) and divide the key by p; the remainder is used as the hash address. That is, H(key) = key % p (p <= table length). The crucial step in this method is choosing p well. Generally, p is the largest prime number that is less than or equal to the hash table length m.
For example, m = 8, p = 7.
m = 16, p = 13.
m = 32, p = 31.
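The choice of p in these examples can be sketched as below. The helper names is_prime, largest_prime_le and division_hash are my own, and I assume non-negative keys, because in C the % operator can return a negative remainder for a negative operand.

```c
/* Trial division; adequate for the small table lengths discussed here. */
int is_prime(int n) {
    if (n < 2) {
        return 0;
    }
    for (int d = 2; d * d <= n; d++) {
        if (n % d == 0) {
            return 0;
        }
    }
    return 1;
}

/* Largest prime p with p <= m, or -1 if there is none (m < 2). */
int largest_prime_le(int m) {
    for (int p = m; p >= 2; p--) {
        if (is_prime(p)) {
            return p;
        }
    }
    return -1;
}

/* Division-remainder hash: H(key) = key % p, assuming key >= 0. */
int division_hash(int key, int p) {
    return key % p;
}
```

largest_prime_le returns 7, 13 and 31 for m = 8, 16 and 32, matching the three examples.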
(6) Random Number Method
Function formula: f(key) = random(key). Here, random is a pseudo-random function. This method is suitable when the keys differ in length.
In short, the rule for a hash function is: through some transformation, distribute the keys evenly over a sequential structure of a specified size. The more sparsely the keys are spread, the lower the time cost of a search and the higher the space cost. Hash search is clearly an algorithm that trades space for time.
Having covered how to construct a hash function, we also have to cover the methods for resolving collisions (conflicts).
(1) Open addressing
When a collision occurs, a probe sequence is formed in the hash table in some way, and the slots in the probe sequence are examined one by one until an open address is found (that is, a slot that is empty). There are three different ways to form a probe sequence in a hash table:
1. Linear probing
Think of the hash table as a circular table. Assuming the table length is m, the probe sequence is:
H(k), H(k) + 1, H(k) + 2, ..., m - 1, 0, 1, ..., H(k) - 1. When linear probing is used to resolve a collision, the formula for the next open address is: H_i = (H(k) + i) MOD m.
2. Quadratic probing
The probe offsets for quadratic probing are +1^2, -1^2, +2^2, -2^2, and so on. When a collision occurs, the formulas for the next open address are:
H_(2i-1) = (H(k) + i^2) MOD m
H_(2i) = (H(k) - i^2) MOD m (i = 1, 2, ...)
Advantage: this reduces the likelihood of clustering.
Disadvantage: it does not easily probe the entire hash table space.
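The two quadratic probing formulas can be folded into one helper, sketched here. The name quadratic_probe and the probe index j are hypothetical: odd j = 2i - 1 adds i^2 and even j = 2i subtracts it. I assume 0 <= h < m and add m before the final remainder, because % in C can be negative.

```c
/* j-th probe address (j >= 1) for home address h in a table of length m. */
int quadratic_probe(int h, int j, int m) {
    int i = (j + 1) / 2;               /* j = 2i - 1 or j = 2i */
    int offset = (i * i) % m;
    if (j % 2 == 1) {
        return (h + offset) % m;       /* H(k) + i^2 */
    }
    return ((h - offset) % m + m) % m; /* H(k) - i^2, kept non-negative */
}
```

With h = 3 and m = 7, the first four probes visit slots 4, 2, 0 and 6.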
3. Pseudo-random probing
When pseudo-random probing is used to resolve a collision, the formula for the next open address is: H_i = (H(k) + R_i) MOD m,
where R_1, R_2, ..., R_(m-1) is a random permutation of 1, 2, ..., m - 1.
(2) Rehashing
When a collision occurs, use another hash function to compute a new hash address, and repeat until no collision occurs: H_i = RH_i(key), i = 1, 2, ..., k, where each RH_i is a different hash function. The advantage is that clustering is unlikely; the disadvantage is the extra computing time.
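A minimal sketch of rehashing on insertion follows. The three functions rh0, rh1 and rh2, the table size of 7 and the EMPTY marker are all assumptions made up for this illustration; a real table would choose its own family of hash functions. Keys are assumed to be non-negative.

```c
#include <stddef.h>

#define TABLE_SIZE 7
#define EMPTY (-32768)                 /* assumed marker for a free slot */

/* Three different hash functions, hypothetical and for illustration only. */
static int rh0(int key) { return key % TABLE_SIZE; }
static int rh1(int key) { return (key / TABLE_SIZE) % TABLE_SIZE; }
static int rh2(int key) { return (key * 3 + 1) % TABLE_SIZE; }

/* Try each function in turn until one yields a free slot.
 * Returns the slot used, or -1 if every function collides. */
int rehash_insert(int table[TABLE_SIZE], int key) {
    int (*rh[])(int) = { rh0, rh1, rh2 };
    for (size_t i = 0; i < sizeof rh / sizeof rh[0]; i++) {
        int addr = rh[i](key);
        if (table[addr] == EMPTY) {
            table[addr] = key;
            return addr;
        }
    }
    return -1;
}
```

Starting from an empty table, inserting 13 uses slot 6; inserting 20 collides at slot 6 under rh0 and lands in slot 2 under rh1.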
(3) Separate chaining (linked-address method)
Link all nodes whose keys are synonyms (keys with the same hash address) into the same singly linked list. If the hash addresses produced by the chosen hash function range from 0 to m - 1, the hash table can be defined as an array of m linked-list head pointers. Advantages: no clustering occurs; because node space is allocated dynamically, the method suits cases where the table length cannot be determined before the table is created; and deleting nodes from the table is easy.
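Separate chaining might look like the sketch below. The Node type, the bucket count of 7 and the function names are my own assumptions; keys are assumed to be non-negative, and new nodes are pushed onto the head of the list.

```c
#include <stdlib.h>

#define BUCKETS 7                      /* assumed number of list heads */

typedef struct Node {
    int key;
    struct Node *next;
} Node;

/* Insert at the head of the bucket's list.
 * Returns 0 on success, -1 if memory allocation fails. */
int chain_insert(Node *table[BUCKETS], int key) {
    Node *node = malloc(sizeof *node);
    if (node == NULL) {
        return -1;
    }
    int b = key % BUCKETS;
    node->key = key;
    node->next = table[b];
    table[b] = node;
    return 0;
}

/* Walk the bucket's list; returns 1 if the key is present, else 0. */
int chain_contains(Node *table[BUCKETS], int key) {
    for (Node *n = table[key % BUCKETS]; n != NULL; n = n->next) {
        if (n->key == key) {
            return 1;
        }
    }
    return 0;
}
```

The table starts as an array of NULL pointers, for example Node *table[BUCKETS] = {0};. Synonyms such as 13 and 20 (both 6 modulo 7) simply share one list.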
(4) Common overflow area method
Assume that the value range of the hash function is [0 .. m-1]. Let the vector HashTable[0 .. m-1] be the basic table, in which each component stores one record, and let another vector OverTable[0 .. v] be the overflow table. Any record whose key is a synonym of a key already in the basic table is placed in the overflow table when the collision occurs, regardless of the hash address the hash function gives it.
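The basic table plus overflow table can be sketched like this. The sizes M and V, the EMPTY marker, the struct layout and the function names are assumptions for illustration; keys are assumed to be non-negative and records are never deleted.

```c
#define M 7                            /* assumed basic table length */
#define V 7                            /* assumed overflow table length */
#define EMPTY (-32768)                 /* assumed marker for a free slot */

typedef struct {
    int base[M];                       /* basic table, one record per slot */
    int overflow[V];                   /* shared overflow table */
    int used;                          /* records stored in overflow */
} OverflowHash;

void oh_init(OverflowHash *t) {
    for (int i = 0; i < M; i++) {
        t->base[i] = EMPTY;
    }
    t->used = 0;
}

/* Returns 0 on success, -1 if the overflow table is full. */
int oh_insert(OverflowHash *t, int key) {
    int h = key % M;
    if (t->base[h] == EMPTY) {
        t->base[h] = key;
        return 0;
    }
    if (t->used == V) {
        return -1;
    }
    t->overflow[t->used++] = key;      /* collision: append to overflow */
    return 0;
}

/* Check the home slot first, then scan the overflow table. */
int oh_contains(const OverflowHash *t, int key) {
    if (t->base[key % M] == key) {
        return 1;
    }
    for (int i = 0; i < t->used; i++) {
        if (t->overflow[i] == key) {
            return 1;
        }
    }
    return 0;
}
```

After inserting 13 and then 20 (both hash to slot 6), 13 sits in the basic table and 20 is the first entry of the overflow table.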
The process of searching a hash table is basically the same as that of creating it. Given a value k, compute the hash address H(k) using the hash function H chosen when the table was created. If the slot at that address is unoccupied, the search fails. Otherwise, compare the node at that address with the given value k. If they are equal, the search succeeds. Otherwise, find the next address using the collision resolution method chosen when the table was created, and repeat until a slot is unoccupied (search failed) or the keys are equal (search successful).
The code is as follows:

    //
    //  main.c
    //  HashSearch
    //
    //  Created by chenyufeng on 16/2/17.
    //  Copyright © Chenyufengweb. All rights reserved.
    //

    #include <stdio.h>
    #include <stdlib.h>

    #define HASHSIZE 7      // length of the hash table array
    #define NULLKEY -32768  // marks an empty slot

    typedef int Status;

    typedef struct {
        int *elem;  // base address of the dynamically allocated array
        int count;  // number of slots in the table
    } HashTable;

    // Hash table length, global variable
    int m = 0;

    void InitHashTable(HashTable *hashTable);
    Status Hash(int key);
    void Insert(HashTable *hashTable, int key);
    Status Search(HashTable *hashTable, int key);
    void DisplayHashTable(HashTable *hashTable);

    int main(int argc, const char *argv[]) {
        int result;
        HashTable hashTable;
        int arr[HASHSIZE] = {13, 29, 27, 28, 26, 30, 38};

        // Initialize the hash table
        InitHashTable(&hashTable);

        // Insert the data: the hash function maps each element to a slot
        for (int i = 0; i < HASHSIZE; i++) {
            Insert(&hashTable, arr[i]);
        }

        // Print the table; the element positions differ from the original array
        DisplayHashTable(&hashTable);

        // Search for a key
        result = Search(&hashTable, 30);
        if (result == -1) {
            printf("not found!\n");
        } else {
            printf("the position in the hash table is %d\n", result);
        }

        free(hashTable.elem);
        return 0;
    }

    // Initialize an empty hash table
    void InitHashTable(HashTable *hashTable) {
        m = HASHSIZE;
        hashTable->elem = (int *)malloc(m * sizeof(int));  // allocate memory
        if (hashTable->elem == NULL) {
            exit(EXIT_FAILURE);
        }
        hashTable->count = m;
        for (int i = 0; i < m; i++) {
            hashTable->elem[i] = NULLKEY;
        }
    }

    // Hash function (division-remainder method)
    Status Hash(int key) {
        return key % m;
    }

    // Insert; assumes the table still has at least one empty slot
    void Insert(HashTable *hashTable, int key) {
        int hashAddress = Hash(key);  // compute the hash address

        // Collision: the slot already holds data
        while (hashTable->elem[hashAddress] != NULLKEY) {
            // Resolve the collision with open addressing (linear probing)
            hashAddress = (hashAddress + 1) % m;
        }

        hashTable->elem[hashAddress] = key;
    }

    // Search; returns the slot index, or -1 if the key is not found
    Status Search(HashTable *hashTable, int key) {
        int hashAddress = Hash(key);  // compute the hash address

        // Collision: the slot holds a different key
        while (hashTable->elem[hashAddress] != key) {
            // Follow the same linear probe sequence used on insertion
            hashAddress = (hashAddress + 1) % m;
            if (hashTable->elem[hashAddress] == NULLKEY || hashAddress == Hash(key)) {
                return -1;  // empty slot or back at the start: not found
            }
        }

        return hashAddress;
    }

    // Print the contents of the hash table
    void DisplayHashTable(HashTable *hashTable) {
        for (int i = 0; i < hashTable->count; i++) {
            printf("%d ", hashTable->elem[i]);
        }
        printf("\n");
    }
In C programming, we often predefine some constants, such as the result status codes of functions, as shown below:
    #define TRUE 1
    #define FALSE 0
    #define OK 1
    #define ERROR 0
    #define INFEASIBLE -1
    #define OVERFLOW -2
Because C had no built-in boolean type before C99 (which added _Bool and the <stdbool.h> header), the above predefinitions can simplify programming. We sometimes also make the following definition:
typedef int Status;
Status is used as the function return type, and its value is the function's result status code. I also used this typedef in the code above.
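As a small sketch of how Status and the status codes are typically used together, the hypothetical function safe_divide below reports success or failure through its return value and passes the actual result back through a pointer. The function is my own example and is not part of the sample repository.

```c
#include <stddef.h>

#define OK 1
#define ERROR 0

typedef int Status;

/* Returns OK and stores a / b in *out, or ERROR on invalid arguments. */
Status safe_divide(int a, int b, int *out) {
    if (b == 0 || out == NULL) {
        return ERROR;
    }
    *out = a / b;
    return OK;
}
```

The caller checks the returned Status before reading the result, for example: if (safe_divide(6, 3, &q) == OK) { ... }.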