First, the basic concept
Hashing: establish a definite correspondence F between a record's storage location and its key, so that each key corresponds to a storage location F(key).
F is called the hash function.
Hashing stores records in a contiguous block of storage called a hash table.
The storage location of the record for a given key is called its hash address.
Hashing technology is both a storage method and a search method.
Hashing is suited to finding the record whose key equals a given value; such lookups are fast.
Hashing is not suited to range lookups, to finding several records that share the same key, or to sorting and finding the maximum or minimum record.
Collision: key1 is not equal to key2, but F(key1) = F(key2).
key1 and key2 are then called synonyms for this hash function.
Second, constructing hash functions
Two principles:
- Simple calculation
- Uniform distribution of hash addresses
1. Direct addressing method
F(key) = a × key + b (a and b are constants)
Simple, uniform, and collision-free, but it requires knowing the key distribution in advance; suitable for small lookup tables with contiguous keys.
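As a minimal sketch of direct addressing, assume the keys are birth years starting at 1980, so a = 1 and b = -1980 map them to addresses 0, 1, 2, and so on. The name direct_hash and both constants are illustrative assumptions, not part of the original notes.

```c
/* Direct addressing: F(key) = a * key + b.
   COEF_A and COEF_B are assumed values for keys that are birth years
   starting at 1980. */
enum { COEF_A = 1, COEF_B = -1980 };

int direct_hash(int key)
{
    return COEF_A * key + COEF_B;
}
```

With these constants, the year 1980 lands at address 0 and 1995 at address 15; every key gets its own slot, which is why the method has no collisions.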
2. Digit analysis method
When keys have many digits, such as mobile phone numbers, the leading digits may be largely the same and only a few digits differ; extract that part of the key to compute the hash address. Suitable when the key distribution is known in advance and some of the digits are evenly distributed.
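A small sketch of digit analysis for phone numbers, assuming (hypothetically) that the last four digits are the evenly distributed part. The function name digit_hash and the choice of four digits are illustrative.

```c
#include <stdlib.h>
#include <string.h>

/* Digit analysis: use only the last four digits of a phone number,
   given as a string of decimal digits. Shorter strings are used whole. */
int digit_hash(const char *phone)
{
    size_t len = strlen(phone);
    const char *tail = (len > 4) ? phone + len - 4 : phone;
    return atoi(tail);
}
```

For example, the string "13812345678" maps to address 5678, so a table of 10000 slots would be needed under this assumption.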
3. Mid-square method
Suitable when the key distribution is unknown and the keys do not have many digits. For example, the square of 1234 is 1522756; extract the middle three digits, 227, as the hash address.
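The 1234 example can be reproduced with a short sketch. It assumes non-negative keys and a three-digit address; mid_square_hash is a hypothetical name, and for squares with an even number of digits the "middle" is taken one digit to the right of center.

```c
/* Mid-square: square the key and keep three digits from the middle.
   Assumes key >= 0. Squares with fewer than four digits are returned whole. */
int mid_square_hash(int key)
{
    long long sq = (long long)key * key;
    int digits = 0;

    for (long long t = sq; t > 0; t /= 10)
        digits++;

    /* Drop the low-order digits that sit to the right of the middle three. */
    for (int drop = (digits - 3) / 2; drop > 0; drop--)
        sq /= 10;

    return (int)(sq % 1000);
}
```

Calling mid_square_hash(1234) squares the key to 1522756, drops the trailing 56, and keeps 227.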
4. Folding method
Suitable when the key distribution is unknown and the keys have many digits.
Split the key from left to right into parts with an equal number of digits (the last part may be shorter), add the parts together, and, according to the hash table length, take the last few digits of the sum as the hash address.
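A sketch of the folding steps, assuming the key is given as a string of decimal digits and the table length calls for part_len digits of address. The name fold_hash and its parameters are illustrative.

```c
#include <string.h>

/* Folding: split the digit string into groups of part_len digits from the
   left (the last group may be shorter), sum the groups, and keep the last
   part_len digits of the sum. Assumes digits contains only '0'..'9'. */
int fold_hash(const char *digits, int part_len)
{
    long sum = 0;
    long modulus = 1;
    size_t len = strlen(digits);

    for (int i = 0; i < part_len; i++)
        modulus *= 10;

    for (size_t i = 0; i < len; i += (size_t)part_len) {
        long part = 0;
        for (size_t j = i; j < i + (size_t)part_len && j < len; j++)
            part = part * 10 + (digits[j] - '0');
        sum += part;
    }
    return (int)(sum % modulus);
}
```

For the key 9876543210 with three-digit parts, the groups are 987, 654, 321, and 0; their sum is 1962, and the last three digits give address 962.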
5. Division remainder method
Let the hash table length be m.
F(key) = key mod p (p <= m)
A poor choice of p leads to many collisions.
Usually p is a prime number less than or equal to m (preferably close to m), or a composite number with no prime factors smaller than 20.
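One way to apply the rule for choosing p is to take the largest prime not exceeding the table length. The sketch below does that by trial division; choose_modulus and mod_hash are hypothetical names, and keys are assumed to be non-negative.

```c
/* Return 1 if n is prime, 0 otherwise (trial division). */
static int is_prime(int n)
{
    if (n < 2)
        return 0;
    for (int d = 2; d * d <= n; d++)
        if (n % d == 0)
            return 0;
    return 1;
}

/* Largest prime p with p <= m, or 0 if m < 2. */
int choose_modulus(int m)
{
    for (int p = m; p >= 2; p--)
        if (is_prime(p))
            return p;
    return 0;
}

/* Division remainder method: F(key) = key mod p. Assumes key >= 0, p > 0. */
int mod_hash(int key, int p)
{
    return key % p;
}
```

For a table of length 12 this picks p = 11, so the key 29 hashes to address 7.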
6. Random number method
Suitable when the keys vary in length.
F(key) = random(key), where random is a random function.
When a key is a string, it is first converted to a number, for example from its ASCII or Unicode codes.
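A sketch of converting a string key to a number from its character codes. The technique used here is polynomial accumulation with multiplier 31 followed by the division remainder method; both the multiplier and the name string_hash are assumptions for illustration, not something the notes prescribe.

```c
/* Turn a string key into a hash address in 0..p-1 from its character codes.
   The multiplier 31 is an assumed, commonly used constant. Assumes p > 0. */
unsigned int string_hash(const char *s, unsigned int p)
{
    unsigned int h = 0;

    while (*s != '\0') {
        h = (h * 31u + (unsigned char)*s) % p;
        s++;
    }
    return h;
}
```

With p = 101, the string "a" (code 97) maps to 97, and "ab" maps to (97 × 31 + 98) mod 101 = 75.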
Third, handling hash collisions
1. Open addressing method
Once a collision occurs, probe for the next empty hash address; provided the hash table is large enough, an empty address can always be found. Probing consecutive addresses one at a time is called linear probing.
Optimization: quadratic probing
Probe in both directions, in case the addresses after the collision are all occupied while addresses before it are empty.
Squaring the offset keeps keys from clustering in one region.
The offset d can also be computed with a random function; this is called random probing.
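Quadratic probing can be sketched as follows, trying offsets +1, -1, +4, -4, +9, -9, and so on around the home address. The table size 11, the sentinel QP_EMPTY, and the function names are assumptions for illustration; keys are assumed to be non-negative.

```c
#define QP_SIZE 11
#define QP_EMPTY (-32768)   /* marks a free slot */

/* Mark every slot as free. */
void quad_init(int table[])
{
    for (int i = 0; i < QP_SIZE; i++)
        table[i] = QP_EMPTY;
}

/* Insert key with quadratic probing in both directions.
   Returns the slot used, or -1 if no free slot was found. */
int quad_insert(int table[], int key)
{
    int home = key % QP_SIZE;

    if (table[home] == QP_EMPTY) {
        table[home] = key;
        return home;
    }
    for (int q = 1; q <= QP_SIZE / 2; q++) {
        int fwd = (home + q * q) % QP_SIZE;
        int back = ((home - q * q) % QP_SIZE + QP_SIZE) % QP_SIZE;

        if (table[fwd] == QP_EMPTY) {
            table[fwd] = key;
            return fwd;
        }
        if (table[back] == QP_EMPTY) {
            table[back] = key;
            return back;
        }
    }
    return -1;
}
```

Inserting 22, 33, 44, and 55, which all have home address 0, places them at slots 0, 1, 10, and 4: the colliding keys spread out instead of piling up right after the home address.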
2. Rehashing method
Prepare several different hash functions RHi, for example the division remainder, folding, and mid-square methods; each time a collision occurs, switch to another hash function to compute a new address.
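A minimal sketch of rehashing, assuming three simple hash functions tried in order. The functions used here (key mod 12, key mod 7, and digit sum mod 12) are stand-ins chosen for brevity rather than the methods named above, and all names are hypothetical. Keys are assumed to be non-negative.

```c
#define RH_SIZE 12
#define RH_EMPTY (-32768)   /* marks a free slot */

/* Three illustrative hash functions (assumed for this sketch). */
static int rh_mod12(int key) { return key % RH_SIZE; }
static int rh_mod7(int key)  { return key % 7; }
static int rh_digit_sum(int key)
{
    int sum = 0;
    for (; key > 0; key /= 10)
        sum += key % 10;
    return sum % RH_SIZE;
}

/* Mark every slot as free. */
void rh_init(int table[])
{
    for (int i = 0; i < RH_SIZE; i++)
        table[i] = RH_EMPTY;
}

/* Try each hash function in turn until one yields a free slot.
   Returns the slot used, or -1 if every function collides. */
int rehash_insert(int table[], int key)
{
    int (*funcs[])(int) = { rh_mod12, rh_mod7, rh_digit_sum };
    int count = (int)(sizeof funcs / sizeof funcs[0]);

    for (int i = 0; i < count; i++) {
        int addr = funcs[i](key);
        if (table[addr] == RH_EMPTY) {
            table[addr] = key;
            return addr;
        }
    }
    return -1;
}
```

Inserting 12 and then 24 shows the switch: both map to 0 under the first function, so 24 falls through to the second function and lands at 24 mod 7 = 3.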
3. Chained address method
Store all records whose keys are synonyms in a singly linked list, called a synonym list.
The hash table itself stores only the head pointers of the synonym lists.
Example: the key set {12,67,56,16,25,37,22,29,15,47,48,34} hashed with key mod 12.
Drawback: a lookup may have to traverse a singly linked list, which costs time.
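The chained address method can be sketched with an array of head pointers, using key mod 12 as in the example set. The type and function names are assumptions; new nodes are inserted at the head of each synonym list, which is one of several reasonable choices. Keys are assumed to be non-negative.

```c
#include <stdlib.h>

#define CHAIN_SIZE 12

typedef struct ChainNode {
    int key;
    struct ChainNode *next;
} ChainNode;

/* Insert key at the head of its synonym list.
   Returns 0 on success, -1 if memory allocation fails. */
int chain_insert(ChainNode *table[], int key)
{
    int addr = key % CHAIN_SIZE;
    ChainNode *node = malloc(sizeof *node);

    if (node == NULL)
        return -1;
    node->key = key;
    node->next = table[addr];
    table[addr] = node;
    return 0;
}

/* Return 1 if key is present, 0 otherwise. */
int chain_search(ChainNode *table[], int key)
{
    for (ChainNode *n = table[key % CHAIN_SIZE]; n != NULL; n = n->next)
        if (n->key == key)
            return 1;
    return 0;
}

/* Release every node and reset the head pointers. */
void chain_free(ChainNode *table[])
{
    for (int i = 0; i < CHAIN_SIZE; i++) {
        while (table[i] != NULL) {
            ChainNode *next = table[i]->next;
            free(table[i]);
            table[i] = next;
        }
    }
}
```

With the twelve example keys, 12 and 48 share the list at address 0, so a search for 48 compares at most two nodes, while a search for the absent key 39 only walks the list at address 3.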
4. Common overflow area method
Keys that collide are stored in a separate overflow table.
On lookup, compute the hash address and compare against the base table first; if the entry there does not match, search the overflow table sequentially.
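A sketch of the common overflow area, assuming a base table and an overflow table of 12 slots each and the hash key mod 12. The structure layout, sizes, and names are illustrative assumptions; keys are assumed to be non-negative.

```c
#define OV_BASE_SIZE 12
#define OV_AREA_SIZE 12
#define OV_EMPTY (-32768)   /* marks a free slot in the base table */

typedef struct {
    int base[OV_BASE_SIZE];      /* one slot per hash address */
    int overflow[OV_AREA_SIZE];  /* colliding keys, in arrival order */
    int over_count;              /* number of keys in the overflow table */
} OverflowTable;

void ov_init(OverflowTable *t)
{
    for (int i = 0; i < OV_BASE_SIZE; i++)
        t->base[i] = OV_EMPTY;
    t->over_count = 0;
}

/* Returns 1 on success, 0 if the overflow table is full. */
int ov_insert(OverflowTable *t, int key)
{
    int addr = key % OV_BASE_SIZE;

    if (t->base[addr] == OV_EMPTY) {
        t->base[addr] = key;
        return 1;
    }
    if (t->over_count < OV_AREA_SIZE) {
        t->overflow[t->over_count++] = key;
        return 1;
    }
    return 0;
}

/* Compare with the base table first, then scan the overflow table. */
int ov_search(const OverflowTable *t, int key)
{
    if (t->base[key % OV_BASE_SIZE] == key)
        return 1;
    for (int i = 0; i < t->over_count; i++)
        if (t->overflow[i] == key)
            return 1;
    return 0;
}
```

This layout works best when collisions are rare compared with the base table size, since every miss in the base table costs a sequential scan.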
Fourth, hash table lookup
1. Source code
#include <stdio.h>
#include <stdlib.h>

#define OK 1
#define ERROR 0
#define TRUE 1
#define FALSE 0

#define MAXSIZE 100        /* Initial storage allocation */

#define SUCCESS 1
#define UNSUCCESS 0
#define HASHSIZE 12        /* Length of the hash table array */
#define NULLKEY (-32768)   /* Marks an empty slot */

typedef int Status;        /* Function result status code, such as OK */

typedef struct
{
    int *elem;             /* Base address of the dynamically allocated element array */
    int count;             /* Number of current data elements */
} HashTable;

int m = 0;                 /* Hash table length, global variable */

/* Initialize the hash table */
Status InitHashTable(HashTable *H)
{
    int i;
    m = HASHSIZE;
    H->count = m;
    H->elem = (int *)malloc(m * sizeof(int));
    if (H->elem == NULL)
        return ERROR;
    for (i = 0; i < m; i++)
        H->elem[i] = NULLKEY;
    return OK;
}

/* Hash function */
int Hash(int key)
{
    return key % m;        /* Division remainder method */
}

/* Insert a key into the hash table (assumes at least one free slot) */
void InsertHash(HashTable *H, int key)
{
    int addr = Hash(key);              /* Hash address */
    while (H->elem[addr] != NULLKEY)   /* Slot not empty: collision */
    {
        addr = (addr + 1) % m;         /* Linear probing of open addressing */
    }
    H->elem[addr] = key;               /* Insert the key once a free slot is found */
}

/* Search the hash table for a key */
Status SearchHash(HashTable H, int key, int *addr)
{
    *addr = Hash(key);                 /* Hash address */
    while (H.elem[*addr] != key)       /* Slot does not hold the key */
    {
        *addr = (*addr + 1) % m;       /* Linear probing of open addressing */
        if (H.elem[*addr] == NULLKEY || *addr == Hash(key))
            return UNSUCCESS;          /* Empty slot or back at the start: key not present */
    }
    return SUCCESS;
}

int main()
{
    int arr[HASHSIZE] = {12, 67, 56, 16, 25, 37, 22, 29, 15, 47, 48, 34};
    int i, p, key, result;
    HashTable H;

    key = 39;              /* Assumed value: a key that is not in arr */

    if (InitHashTable(&H) != OK)
        return 1;
    for (i = 0; i < m; i++)
        InsertHash(&H, arr[i]);

    result = SearchHash(H, key, &p);
    if (result)
        printf("Look for %d address: %d \n", key, p);
    else
        printf("Failed to find %d.\n", key);

    for (i = 0; i < m; i++)
    {
        key = arr[i];
        SearchHash(H, key, &p);
        printf("Look for %d address: %d \n", key, p);
    }

    free(H.elem);
    return 0;
}
2. Lookup performance
If there are no collisions, a lookup takes O(1) time.
The average search length depends on:
- Whether the hash function is uniform
- How collisions are handled
- The load factor of the hash table
Load factor = number of records in the table / hash table length. (It indicates how full the hash table is.)
The more records are stored in the table, the larger the load factor and the greater the likelihood of collisions.
Typically, the hash table space is set larger than the lookup collection, sacrificing space for time.
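To make the effect of the load factor concrete, the sketch below counts how many slots linear probing examines while inserting keys into a 12-slot table hashed with key mod 12. The function names and the sentinel are assumptions for illustration; the number of slots examined on insertion equals the number of comparisons a later successful search for that key needs.

```c
#define LF_SIZE 12
#define LF_EMPTY (-32768)   /* marks a free slot */

/* Load factor = number of records / hash table length. */
double load_factor(int count, int table_len)
{
    return (double)count / table_len;
}

/* Insert n keys with linear probing into a fresh table of LF_SIZE slots
   and return the total number of slots examined.
   Assumes n <= LF_SIZE and non-negative keys. */
int total_probes(const int keys[], int n)
{
    int table[LF_SIZE];
    int total = 0;

    for (int i = 0; i < LF_SIZE; i++)
        table[i] = LF_EMPTY;

    for (int i = 0; i < n; i++) {
        int addr = keys[i] % LF_SIZE;
        total++;                          /* the home slot is always examined */
        while (table[addr] != LF_EMPTY) {
            addr = (addr + 1) % LF_SIZE;
            total++;
        }
        table[addr] = keys[i];
    }
    return total;
}
```

For the keys {12,67,56,16,25,37,22,29,15,47,48,34}, inserting only the first six (load factor 0.5) examines 7 slots in total, about 1.17 per key; inserting all twelve (load factor 1.0) examines 30, or 2.5 per key, with the last key alone needing 12 probes.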
Big Talk Data Structures: hash table lookup (hash tables)