Hash Table Lookup: C Implementation (with Detailed Comments)

Source: Internet
Author: User
Tags: hash, pow, rand
"Definition":

1. Hash function: Establish a deterministic correspondence ƒ between a record's key and its storage location, so that each key corresponds to a unique storage location in the structure. This correspondence ƒ is called the hash function.

2. Collision: Different keys may yield the same hash address, i.e. key1 ≠ key2 while ƒ(key1) = ƒ(key2). This phenomenon is called a collision.

3. Hash table: Based on a chosen hash function H(key) and a method of handling collisions, map a set of keys onto a finite, contiguous range of addresses, and use each key's "image" in that address range as the storage location of its record. The resulting table is called a hash table, the mapping process is called hashing (building the hash table), and the resulting storage location is called the hash address.
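
To make the collision definition concrete, here is a minimal sketch that assumes the simple hash function H(key) = key MOD m used later in this article. The helper names `hash_mod` and `collides` are made up for this illustration and are not part of the article's code.

```c
#include <assert.h>

/* Hypothetical helper: H(key) = key MOD m, for non-negative keys only. */
unsigned hash_mod(int key, int m)
{
    assert(key >= 0 && m > 0);
    return (unsigned)(key % m);
}

/* Returns 1 when two different keys map to the same address
   in a table of length m, i.e. when they collide. */
int collides(int key1, int key2, int m)
{
    return key1 != key2 && hash_mod(key1, m) == hash_mod(key2, m);
}
```

With a table length of 11, the keys 38 and 60 from the sample data both map to address 5, so `collides(38, 60, 11)` reports a collision, while 17 and 29 map to different addresses.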

"Construction method":

Common ways to construct a hash function:

A hash function makes access to a data series more efficient, and the data elements are positioned more quickly through a hash function.

Different hash functions are used in practice depending on the situation. The factors commonly considered are:

• Time required to compute the hash function

• Length of the keys

• Size of the hash table

• Distribution of the keys

• How often records are looked up

1. Direct addressing method: Take the key itself, or the value of a linear function of the key, as the hash address. That is, H(key) = key or H(key) = a·key + b, where a and b are constants (such a hash function is sometimes called a self function). If H(key) is already occupied, move on to the next position until an empty one is found, and store the record there.

2. Digit analysis method: Analyze a set of data, for example the birth dates of a group of employees. The leading digits (the year) are roughly the same across records, so using them would make collisions very likely, whereas the trailing digits (the month and day) vary greatly. Forming the hash address from those trailing digits significantly reduces the chance of a collision. The digit analysis method therefore looks for regularities in the digits of the keys and uses the digits that vary most to construct hash addresses with a low probability of collision.

3. Mid-square method: Take the middle few digits of the square of the key as the hash address.

4. Folding method: Split the key into several parts with the same number of digits (the last part may have fewer), then take the sum of these parts, with the carry discarded, as the hash address. The parts can be added in two ways: shift folding and boundary folding. Shift folding aligns the lowest digits of all parts and adds them. Boundary folding folds the key back and forth along the part boundaries, from one end to the other, and then aligns and adds the parts.

5. Random number method: Choose a random function and take the random function value of the key as its hash address. This is usually used when the keys have different lengths.

6. Division-remainder method: Take the remainder of the key divided by some number p not greater than the hash table length m as the hash address. That is, H(key) = key MOD p, p ≤ m. The modulo can be applied to the key directly, or after a folding or mid-square operation. The choice of p is very important: generally take a prime or m. A poorly chosen p easily produces synonyms (different keys with the same hash address).
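
Three of the constructions above can be sketched in a few lines of C. This is only an illustration under stated assumptions: the function names are invented here, keys are treated as decimal numbers, the mid-square version keeps two middle digits of a square of at most eight digits, and the folding version uses 3-digit parts with shift folding.

```c
#include <assert.h>

/* Division-remainder method: H(key) = key MOD p, with p <= m. */
unsigned hash_division(unsigned key, unsigned p)
{
    assert(p > 0);
    return key % p;
}

/* Mid-square method (assumption: key < 10000, so the square has at most
   8 decimal digits). Keeps the two middle digits of the 8-digit square. */
unsigned hash_mid_square(unsigned key)
{
    unsigned long sq;
    assert(key < 10000u);
    sq = (unsigned long)key * key;
    return (unsigned)((sq / 1000ul) % 100ul);
}

/* Shift folding (assumption: 3-digit parts, split from the low end).
   Adds the parts with their lowest digits aligned and drops the carry. */
unsigned hash_fold_shift(unsigned long key)
{
    unsigned long sum = 0;
    while (key > 0) {
        sum += key % 1000ul;   /* take the lowest 3-digit part */
        key /= 1000ul;         /* move on to the next part */
    }
    return (unsigned)(sum % 1000ul);  /* discard the carry */
}
```

For example, 1234 squared is 1522756 (01522756 as eight digits), whose middle two digits are 22, and folding 123456789 gives 789 + 456 + 123 = 1368, which becomes 368 once the carry is dropped.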

Ways to handle collisions:

1. Open addressing method: Hi = (H(key) + di) MOD m, i = 1,2,...,k (k ≤ m-1), where H(key) is the hash function, m is the hash table length, and di is an increment sequence that can be chosen in the following three ways:

1.1. di = 1,2,3,...,m-1, called linear probing;

1.2. di = 1^2, -1^2, 2^2, -2^2, 3^2, ..., ±k^2 (k ≤ m/2), called quadratic probing;

1.3. di = a pseudo-random number sequence, called pseudo-random probing.

2. Rehashing: Hi = RHi(key), i = 1,2,...,k, where the RHi are different hash functions. When synonyms produce an address collision, another hash function is used to compute a new address, and so on until the collision no longer occurs. This method is less prone to clustering, but it increases the computation time.

3. Chained addressing method (separate chaining)

4. Create a public overflow area
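
The open addressing formula can be tried out with a small sketch like the following. It assumes H(key) = key MOD m and non-negative keys; the names `d_linear`, `d_quadratic`, and `probe` are hypothetical and chosen only for this example. Because C's `%` can return a negative value for a negative operand, the result is shifted back into the range 0..m-1.

```c
#include <assert.h>

/* Linear probing increment: d_i = i. */
int d_linear(int i)
{
    return i;
}

/* Quadratic probing increment: d_i = 1^2, -1^2, 2^2, -2^2, ... */
int d_quadratic(int i)
{
    int k = (i + 1) / 2;
    return (i % 2 == 1) ? k * k : -(k * k);
}

/* i-th probe address H_i = (H(key) + d_i) MOD m, with H(key) = key MOD m.
   The parameter d selects the increment sequence. */
int probe(int key, int i, int m, int (*d)(int))
{
    int a;
    assert(key >= 0 && m > 0 && i >= 1);
    a = (key % m + d(i)) % m;
    return a < 0 ? a + m : a;   /* keep the address in 0..m-1 */
}
```

For key 38 in a table of length 11, H(key) is 5, so linear probing visits addresses 6, 7, 8, ... while quadratic probing visits 6, 4, 9, 1, ...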

"Performance analysis for lookups"

The lookup process for a hash table is basically the same as the process of building it. Some keys can be found directly at the address produced by the hash function; others collide at that address and must be located by following the collision-handling method. In the collision-handling methods described above, a lookup after a collision is still a process of comparing the given value with stored keys. The efficiency of hash table lookup is therefore still measured by the average search length.

During a lookup, the number of key comparisons depends on how many collisions occur: fewer collisions mean higher search efficiency, and more collisions mean lower efficiency. The factors that affect the number of collisions are therefore the factors that affect search efficiency. There are three of them:

1. Whether the hash function is uniform;

2. The method of handling collisions;

3. The load factor of the hash table.

The load factor of the hash table is defined as: α = number of elements in the table / length of the hash table

α measures how full the hash table is. Since the table length is fixed, α is proportional to the number of elements in the table: the larger α is, the more elements the table holds and the more likely collisions become; the smaller α is, the less likely collisions are.

In fact, the average search length of a hash table is a function of the load factor α, but the function differs for different collision-handling methods.
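
As an illustration of that dependence, the sketch below computes α and two classic textbook approximations of the average search length for a successful search: ½(1 + 1/(1 − α)) for linear probing and 1 + α/2 for separate chaining. The function names are made up for this example, and both formulas assume a uniform hash function, so real tables may behave differently.

```c
#include <assert.h>

/* Load factor: alpha = number of elements / table length. */
double load_factor(int count, int table_len)
{
    assert(count >= 0 && table_len > 0);
    return (double)count / table_len;
}

/* Approximate average search length of a successful search
   with linear probing: (1 + 1/(1 - alpha)) / 2, for alpha < 1. */
double asl_linear(double alpha)
{
    assert(alpha >= 0.0 && alpha < 1.0);
    return 0.5 * (1.0 + 1.0 / (1.0 - alpha));
}

/* Approximate average search length of a successful search
   with separate chaining: 1 + alpha/2. */
double asl_chaining(double alpha)
{
    assert(alpha >= 0.0);
    return 1.0 + alpha / 2.0;
}
```

At α = 0.5 these give 1.5 comparisons for linear probing and 1.25 for chaining, and the linear probing estimate grows quickly as α approaches 1, which is one reason to rebuild the table before it fills up.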

"C Code Implementation"

/***************************ElemType.h***************************/
#ifndef ELEMTYPE_H // header guard
#define ELEMTYPE_H

#define NULL_KEY 0 // 0 marks an empty slot (no record)
typedef int KeyType; // key type
typedef struct ElemType
{ // data element (record) type
	KeyType key;
	int order;
} ElemType;

// Comparison conventions for two numeric keys, defined as macros
#define EQ(a, b) ((a) == (b))
#define LT(a, b) ((a) < (b))
#define LE(a, b) ((a) <= (b))

// Basic operation function declarations
void Visit(int p, ElemType r);

#endif

/***************************ElemType.cpp***************************/
#include "ElemType.h"
#include <stdio.h>

void Visit(int p, ElemType r)
{ // print the record stored at hash address p
	printf("address = %d (%d,%d)\n", p, r.key, r.order);
}


/***************************HashTable.h***************************/
/*
	The method used in this algorithm to construct the hash table:
	1. Hash function: the division-remainder method
	       H(key) = key MOD p (p ≤ m), where m is the table length;
	   this algorithm takes p = m.
	   (In general, p can be a prime, or a composite number with no prime factor less than 20.)
	2. Collision handling: open addressing
	       Hi = (H(key) + di) MOD m, i = 1,2,3,...,k (k <= m-1)
	   where di is an increment sequence that can be chosen in three ways:
	   (1) linear probing:        di = 1,2,3,...,m-1
	   (2) quadratic probing:     di = 1^2,-1^2,2^2,-2^2,...,±k^2 (k <= m-1)
	   (3) pseudo-random probing: di = pseudo-random number sequence
*/
#ifndef HASHTABLE_H // header guard
#define HASHTABLE_H

#include "ElemType.h"

// Internal-linkage static variable: table of hash table capacities (table length m), a suitable sequence of primes
static int hashsize[] = {11, 19, 29, 37};

typedef struct HashTable
{ // hash table storage structure
	ElemType *elem; // base address of the record storage, a dynamically allocated array
	int count;      // current number of records
	int sizeindex;  // hashsize[sizeindex] is the current table length
} HashTable;

// Status is the function return type; its value is a result status code, one of the three macros below
typedef int Status;
#define SUCCESS 1    // search succeeded
#define UNSUCCESS 0  // search failed
#define DUPLICATE -1 // duplicate record

// Basic operation function declarations
void InitHashTable(HashTable &h);
void DestroyHashTable(HashTable &h);
unsigned Hash(KeyType k);
int d(int i);
void collision(KeyType &k, int &p, int i);
Status SearchHash(HashTable h, KeyType k, int &p, int &c);
Status InsertHash(HashTable &h, ElemType e);
void RecreateHashTable(HashTable &h);
void TraverseHash(HashTable h, void (*visit)(int, ElemType));

#endif


 

/***************************HashTable.cpp***************************/
#include <stdio.h>  // NULL, printf()
#include <stdlib.h> // exit(), rand(), malloc(), realloc(), free()
#include <math.h>   // pow(); some compilers also define OVERFLOW here
#include "HashTable.h"

#ifndef OVERFLOW
#define OVERFLOW 3 // fallback exit code: OVERFLOW is not defined by standard C
#endif

static int m; // internal-linkage static variable: m is the hash table length

void InitHashTable(HashTable &h)
{ // Initialize the hash table: construct an empty hash table
	int i;
	h.sizeindex = 0;           // initial index into the table-length array is 0
	m = hashsize[h.sizeindex]; // initial table length is hashsize[0]
	h.count = 0;               // initial record count is 0
	h.elem = (ElemType *)malloc(m * sizeof(ElemType)); // dynamically allocate the record array
	if (!h.elem)
	{ // allocation failed
		exit(OVERFLOW);
	}
	for (i = 0; i < m; ++i)
	{ // mark every slot of the record array as empty
		h.elem[i].key = NULL_KEY;
	}
}

void DestroyHashTable(HashTable &h)
{ // Destroy the hash table
	free(h.elem);    // release the dynamic record array
	h.elem = NULL;   // set the pointer to NULL
	h.count = 0;     // set the record count to 0
	h.sizeindex = 0; // set the table-length index to 0
}

unsigned Hash(KeyType k)
{ // Return the hash address of k (a simple hash function built with the division-remainder method)
	return k % m; // H(key) = key MOD p (p ≤ m), with p = m
}

int d(int i)
{ // Increment sequence function, where i is the number of collisions.
  // Choose one of the three methods as needed and comment out the other two.
	return i; // linear probing: di = 1,2,3,...,m-1
	// return ((i + 1) / 2) * ((i + 1) / 2) * (int)pow(double(-1), i - 1); // quadratic probing: di = 1^2,-1^2,2^2,-2^2,...,±k^2 (k <= m-1)
	// return rand(); // pseudo-random probing: di = pseudo-random number sequence
}

void collision(KeyType &k, int &p, int i)
{ // Handle a collision by open addressing; p receives the resulting hash address and i is the number of collisions
	p = (Hash(k) + d(i)) % m;
}

Status SearchHash(HashTable h, KeyType k, int &p, int &c)
{ // Search the open-addressing hash table h for the record with key k. On success, p gives the
  // record's position in the table and SUCCESS is returned; otherwise p gives the insertion position
  // and UNSUCCESS is returned. c counts collisions; it must be 0 on entry and is used by InsertHash.
	p = Hash(k); // compute the hash address with the hash function
	while (h.elem[p].key != NULL_KEY && !EQ(h.elem[p].key, k))
	{ // the slot holds a record whose key differs from the one being sought
		c++;
		if (c < m)
		{ // fewer than m collisions so far: keep probing
			collision(k, p, c);
		}
		else
		{ // the maximum number of probes has been exceeded: the record is not in h
			break;
		}
	}
	if (EQ(h.elem[p].key, k))
	{ // search succeeded
		return SUCCESS;
	}
	else
	{ // search failed
		return UNSUCCESS;
	}
}

void RecreateHashTable(HashTable &); // declaration of RecreateHashTable()

Status InsertHash(HashTable &h, ElemType e)
{
	int p, c = 0; // the collision count starts at 0
	if (SearchHash(h, e.key, p, c))
	{ // h already holds a record with the same key as e: do not insert
		return DUPLICATE;
	}
	else if (c < hashsize[h.sizeindex] / 2)
	{ // not found, and the collision count c is below the limit (the threshold is adjustable, but at most hashsize[h.sizeindex]-1)
		h.elem[p] = e; // insert the data element e into h
		++h.count;
		return SUCCESS; // insertion succeeded
	}
	else
	{ // not found, but the collision count c has reached the limit
		RecreateHashTable(h); // rebuild the hash table
		return UNSUCCESS;     // insertion did not happen
	}
}

void RecreateHashTable(HashTable &h)
{
	int i, count = h.count; // number of records currently in h
	ElemType *p, *elem;
	if (h.sizeindex + 1 >= (int)(sizeof(hashsize) / sizeof(hashsize[0])))
	{ // no larger table length is available
		exit(OVERFLOW);
	}
	elem = (ElemType *)malloc(count * sizeof(ElemType)); // temporary space for the existing records of h
	if (!elem)
	{ // allocation failed
		exit(OVERFLOW);
	}
	p = elem;
	for (i = 0; i < m; ++i)
	{ // save all existing records to elem
		if (!EQ(h.elem[i].key, NULL_KEY))
		{ // slot i of h holds a record
			*p++ = h.elem[i];
		}
	}
	h.count = 0;               // reset the record count in preparation for the InsertHash calls below
	h.sizeindex++;             // advance the table-length index by 1
	m = hashsize[h.sizeindex]; // new storage capacity (table length)
	h.elem = (ElemType *)realloc(h.elem, m * sizeof(ElemType)); // regrow h to the new capacity
	if (!h.elem)
	{ // reallocation failed
		exit(OVERFLOW);
	}
	for (i = 0; i < m; ++i)
	{ // initialize the new hash table: mark every slot as empty
		h.elem[i].key = NULL_KEY;
	}
	for (p = elem; p < elem + count; ++p)
	{ // re-insert the records of the old table into the new hash table
		InsertHash(h, *p);
	}
	free(elem); // release the temporary storage
}

void TraverseHash(HashTable h, void (*visit)(int, ElemType))
{ // Traverse the hash table h in order of hash address
	int i;
	printf("Hash address 0 ~ %d\n", m - 1);
	for (i = 0; i < m; ++i)
	{ // for every slot of the hash table h
		if (!EQ(h.elem[i].key, NULL_KEY))
		{ // slot i of h holds a record
			visit(i, h.elem[i]); // visit the i-th record
		}
	}
}


Contents of Records.txt:

17 1
60 2
29 3
38 4
1 5
2 6
3 7
4 8
60 9
13 10

/***************************main.cpp***************************/
#include <stdio.h>
#include "HashTable.h"

#define N 15 // maximum number of records the array can hold

int main(void)
{
	ElemType r[N]; // record array
	HashTable h;
	int i, n = 0, p = 0;
	Status s;
	KeyType k;
	FILE *f; // file pointer

	f = fopen("Records.txt", "r"); // open the record file Records.txt
	if (!f)
	{ // the file could not be opened
		printf("Cannot open Records.txt\n");
		return 1;
	}
	do
	{
		i = fscanf(f, "%d%d", &r[p].key, &r[p].order); // read one record into r[p]
		if (i == 2)
		{ // both fields were read successfully
			p++;
		}
	} while (i == 2 && p < N); // stop at the end of the data or when the record array is full
	fclose(f); // close the data file

	InitHashTable(h);
	for (i = 0; i < p - 1; ++i)
	{ // insert the first p-1 records into h (the last record is held back; inserting it later forces a rebuild of the hash table)
		s = InsertHash(h, r[i]);
		if (DUPLICATE == s)
		{
			printf("A record with key %d is already in the table; cannot insert record (%d,%d)\n", r[i].key, r[i].key, r[i].order);
		}
	}
	printf("Traverse the hash table in order of hash address:\n");
	TraverseHash(h, Visit);
	printf("Enter the key of the record to find: ");
	scanf("%d", &k);
	n = 0; // the collision counter must start at 0 for every search
	s = SearchHash(h, k, p, n);
	if (SUCCESS == s)
	{ // search succeeded: output the record
		Visit(p, h.elem[p]);
	}
	else
	{ // search failed
		printf("Not found\n");
	}

	s = InsertHash(h, r[i]); // insert the last record (this requires rebuilding the hash table)
	if (UNSUCCESS == s)
	{ // the insertion did not happen because the table was rebuilt: insert again
		s = InsertHash(h, r[i]);
	}
	printf("Traverse the rebuilt hash table in order of hash address:\n");
	TraverseHash(h, Visit);
	printf("Enter the key of the record to find: ");
	scanf("%d", &k);
	n = 0; // reset the collision counter before searching again
	s = SearchHash(h, k, p, n);
	if (SUCCESS == s)
	{ // search succeeded: output the record
		Visit(p, h.elem[p]);
	}
	else
	{ // search failed
		printf("Not found\n");
	}
	DestroyHashTable(h);
	return 0;
}
/******************************* Running results *******************************/
/*
A record with key 60 is already in the table; cannot insert record (60,9)
Traverse the hash table in order of hash address:
Hash address 0 ~ 10
address = 1 (1,5)
address = 2 (2,6)
address = 3 (3,7)
address = 4 (4,8)
address = 5 (60,2)
address = 6 (17,1)
address = 7 (29,3)
address = 8 (38,4)
Enter the key of the record to find: 13
Not found
Traverse the rebuilt hash table in order of hash address:
Hash address 0 ~ 18
address = 0 (38,4)
address = 1 (1,5)
address = 2 (2,6)
address = 3 (3,7)
address = 4 (4,8)
address = 5 (60,2)
address = 10 (29,3)
address = 13 (13,10)
address = 17 (17,1)
Enter the key of the record to find: 13
address = 13 (13,10)
Press any key to continue ...
*/
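
The first traversal above can be cross-checked with a much smaller sketch. It is not the article's code: it is plain C that stores bare integer keys, uses pointer out-parameters instead of C++ references, supports linear probing only, and the names `EMPTY_KEY`, `search_linear`, and `insert_linear` are invented for the illustration.

```c
#include <assert.h>

#define EMPTY_KEY 0 /* same convention as NULL_KEY: 0 marks an empty slot */

/* Search table[0..m-1] for key with linear probing.
   Returns 1 if the key is found. *pos receives the matching slot, or the
   first free slot when the key is absent; *probes receives the number of
   collisions encountered. */
int search_linear(const int *table, int m, int key, int *pos, int *probes)
{
    int p, c = 0;
    assert(m > 0 && key > 0);
    p = key % m;
    while (table[p] != EMPTY_KEY && table[p] != key) {
        if (++c >= m) {
            break;              /* every slot has been probed */
        }
        p = (key % m + c) % m;  /* H_i = (H(key) + i) MOD m */
    }
    *pos = p;
    *probes = c;
    return table[p] == key;
}

/* Insert key unless it is a duplicate or took too many probes.
   Returns 1 if inserted, -1 for a duplicate, and 0 when the collision
   count reached m/2, which is where the article rebuilds the table. */
int insert_linear(int *table, int m, int key)
{
    int pos, probes;
    if (search_linear(table, m, key, &pos, &probes)) {
        return -1;
    }
    if (probes >= m / 2) {
        return 0;
    }
    table[pos] = key;
    return 1;
}
```

Inserting 17, 60, 29, 38, 1, 2, 3, 4 into a zeroed array of 11 slots places 60, 17, 29, 38 at addresses 5 to 8, matching the first traversal. A second 60 is rejected as a duplicate, and 13 needs 7 probes before reaching the free slot 9, which exceeds the threshold of 5 and is why the article's program rebuilds the table at that point.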


 
