Big Talk Data Structures: hash table lookup

Source: Internet
Author: User

First, basic concepts

Hashing technique: establish a definite correspondence f between a record's storage location and its key, so that each key corresponds to a storage location f(key).

f is called the hash function.
Hashing stores records in a contiguous block of storage, which is called a hash table.
The storage location that corresponds to a key is called its hash address.

Hashing technology is both a storage method and a search method.
Hashing is well suited to finding the record whose key equals a given value, and such lookups are fast.
Hashing is not suited to range queries, to retrieving all records that share the same key, or to sorting and finding the maximum or minimum record.

Collision: key1 is not equal to key2, but f(key1) = f(key2).
key1 and key2 are then called synonyms of the hash function.

Second, constructing the hash function

Two principles:

    1. Simple to compute
    2. Hash addresses evenly distributed

1. Direct addressing method

f(key) = a × key + b (a and b are constants)

Simple, uniform, and collision-free, but it requires knowing the distribution of the keys in advance. It is suitable for small lookup tables with contiguous keys.

2. Digit analysis method

When keys have many digits, such as mobile phone numbers, the leading digits may be largely the same and only a few digits differ. Extract that part of the key to compute the hash address. This method suits cases where the key distribution is known in advance and some digits of the keys are evenly distributed.

3. Mid-square method

Suitable when the key distribution is not known and the keys do not have many digits. For example, the square of 1234 is 1522756; extract the middle three digits, 227, as the hash address.

4. Folding method

Suitable when the key distribution is not known and the keys have many digits.
Split the key from left to right into parts with the same number of digits (the last part may be shorter), add the parts together, and take the last few digits of the sum, as many as the hash table length calls for, as the hash address.

5. Division-remainder method

Hash table length: m

f(key) = key mod p (p <= m)

A poor choice of p leads to many collisions.
Usually p is a prime number less than or equal to m (preferably close to m), or a composite number with no prime factor smaller than 20.

6. Random number method

Suitable when the lengths of the keys vary.

f(key) = random(key), where random is a random function

When a key is a string, it is first converted to a number in some way, for example through its ASCII or Unicode codes.

Third, handling hash collisions

1. Open addressing method

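Once a collision occurs, look for the next empty hash address. As long as the hash table is large enough, an empty address can always be found and the record stored there. Probing the following addresses one at a time is called linear probing.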

Optimization: quadratic probing

Probe in both directions, in case the slots after the home address are all occupied while a slot before it is empty.
Using squared displacements keeps keys from clustering in one area.

The displacement d can also be computed by a random function, which is called random probing.

2. Rehash function method

Prepare several different hash functions RHi, such as division-remainder, folding, and mid-square. Each time a collision occurs, switch to another hash function.

3. Chained address method

All records whose keys are synonyms are stored in one singly linked list (a synonym chain).
The hash table itself stores only the head pointers of the synonym chains.
Example: the keys {12,67,56,16,25,37,22,29,15,47,48,34} hashed with key mod 12.

Drawback: a lookup has to traverse the singly linked list, which costs time.

4. Common overflow area method

Keys that collide are stored in a separate overflow table.

On lookup, compute the hash address and compare with the entry in the base table first. If they are not equal, search the overflow table sequentially.

Fourth, hash table lookup

1. Source code
    #include <stdio.h>
    #include <stdlib.h>

    #define OK 1
    #define ERROR 0
    #define TRUE 1
    #define FALSE 0

    #define MAXSIZE 100      /* initial allocation of storage space */

    #define SUCCESS 1
    #define UNSUCCESS 0
    #define HASHSIZE 12      /* length of the hash table array */
    #define NULLKEY -32768   /* marks an empty slot */

    typedef int Status;      /* function result status code, such as OK */

    typedef struct
    {
        int *elem;           /* base address of the dynamically allocated array */
        int count;           /* number of current data elements */
    } HashTable;

    int m = 0;               /* hash table length, global variable */

    /* Initialize the hash table */
    Status InitHashTable(HashTable *H)
    {
        int i;

        m = HASHSIZE;
        H->count = 0;
        H->elem = (int *)malloc(m * sizeof(int));
        if (H->elem == NULL)
            return ERROR;
        for (i = 0; i < m; i++)
            H->elem[i] = NULLKEY;
        return OK;
    }

    /* Hash function */
    int Hash(int key)
    {
        return key % m;      /* division-remainder method */
    }

    /* Insert a key into the hash table */
    void InsertHash(HashTable *H, int key)
    {
        int addr = Hash(key);               /* hash address */

        if (H->count >= m)                  /* table full: probing would never end */
            return;
        while (H->elem[addr] != NULLKEY)    /* slot occupied: collision */
        {
            addr = (addr + 1) % m;          /* linear probing (open addressing) */
        }
        H->elem[addr] = key;                /* insert once an empty slot is found */
        H->count++;
    }

    /* Look up a key in the hash table */
    Status SearchHash(HashTable H, int key, int *addr)
    {
        *addr = Hash(key);                  /* hash address */
        while (H.elem[*addr] != key)        /* not the key: keep probing */
        {
            *addr = (*addr + 1) % m;        /* linear probing (open addressing) */
            /* reached an empty slot, or looped back to the starting point */
            if (H.elem[*addr] == NULLKEY || *addr == Hash(key))
                return UNSUCCESS;           /* the key is not in the table */
        }
        return SUCCESS;
    }

    int main(void)
    {
        int arr[HASHSIZE] = {12, 67, 56, 16, 25, 37, 22, 29, 15, 47, 48, 34};
        int i, p, key, result;
        HashTable H;

        key = 39;                           /* a key that is not in the table */

        if (InitHashTable(&H) != OK)
            return 1;
        for (i = 0; i < m; i++)
            InsertHash(&H, arr[i]);

        result = SearchHash(H, key, &p);
        if (result)
            printf("Address of %d: %d\n", key, p);
        else
            printf("Failed to find %d.\n", key);

        for (i = 0; i < m; i++)
        {
            key = arr[i];
            SearchHash(H, key, &p);
            printf("Address of %d: %d\n", key, p);
        }

        free(H.elem);
        return 0;
    }
2. Lookup performance

If there are no collisions, a lookup takes O(1) time.
The average search length depends on:

    • Whether the hash function is uniform
    • The method used to handle collisions
    • The load factor of the hash table
      Load factor = number of records in the table / hash table length. (It indicates how full the hash table is.)
      The more records in the table, the larger the load factor and the greater the likelihood of collisions.

Typically, the hash table is made larger than the set of records to be stored, trading space for time.
