Data structures: hash tables, hash functions, and collision resolution

Last Update:2016-05-10 Source: Internet

Author: User

1. Hash table

A hash table is a data structure that is accessed directly by key. It maps each key to a position in a table and reads or writes the record at that position, which speeds up lookup. The mapping function is called the hash function, and the array that holds the records is called the hash table.

For each element, the hash function h(k) is applied to the element's key k. The result is used as the address of a cell in a contiguous block of storage, and the element is stored in that cell.

A hash table stores key-value pairs. To look up an element, it computes the hash code of the key, which gives the element's position, and accesses that position directly. As long as collisions are rare, the cost of a lookup does not depend on the number of elements, so the average time complexity of a hash table lookup is O(1).
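
As a rough illustration of the idea, here is a minimal Python sketch. The table length of 11, the modulo hash function h, and the put/get helpers are all assumptions made for this example. It ignores collisions entirely, which section 3 deals with.

```python
TABLE_SIZE = 11  # hypothetical table length chosen for the example


def h(key: int) -> int:
    """Hypothetical hash function: map an integer key to a cell index."""
    return key % TABLE_SIZE


table = [None] * TABLE_SIZE  # the contiguous storage


def put(key: int, value: str) -> None:
    # Store the record in the cell addressed by h(key).
    # No collision handling in this sketch: a later key with the
    # same address would simply overwrite the earlier one.
    table[h(key)] = (key, value)


def get(key: int):
    # Jump straight to the cell; no scan over the other elements.
    entry = table[h(key)]
    if entry is not None and entry[0] == key:
        return entry[1]
    return None


put(15, "alice")  # 15 % 11 == 4
put(27, "bob")    # 27 % 11 == 5
```

A call such as get(15) computes one address and reads one cell, no matter how many records the table holds.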

2. Hash function construction methods

2.1 Direct addressing method

Use the key itself, or a linear function of the key, as the hash address, i.e.

H(key) = key or H(key) = a*key + b (a and b are integer constants)

When H(key) = key, this hash function is also called the identity function. If the hash address H(key) is already occupied, move on to the next position until an empty one is found, and store the element there.
This method is only appropriate when the address set is the same size as the key set.
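
A small sketch of direct addressing in Python follows. The employee IDs 1000 to 1009 and the choice a = 1, b = -1000 are made-up values for illustration; the point is only that each key gets its own address when the key set and the address set are the same size.

```python
def direct_address(key: int, a: int = 1, b: int = 0) -> int:
    """H(key) = a*key + b, a linear function of the key."""
    return a * key + b


# Hypothetical key set: ten employee IDs, 1000..1009.
# With a = 1 and b = -1000 they map onto addresses 0..9 exactly.
table = [None] * 10
for emp_id in range(1000, 1010):
    table[direct_address(emp_id, 1, -1000)] = f"employee-{emp_id}"
```

Because the mapping is one-to-one here, no two keys share an address and every cell of the table is used.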

2.2 Digit analysis method

Consider a set of data such as the birth dates of a group of employees. The leading digits (the year) are largely the same across records, so using them as the address would make collisions very likely. The digits for the month and day, by contrast, vary widely, so building the hash address from those trailing digits reduces the chance of a collision significantly.
The digit analysis method therefore looks for patterns in the digits of the keys and uses the digits that vary most to construct a hash address with a low probability of collision.
This method is suitable when the frequency of each digit at each position of the keys can be estimated beforehand.
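
The birth date example can be sketched in Python as follows. The YYYYMMDD string format and the four sample dates are assumptions for this illustration.

```python
def digit_analysis_hash(birth_date: str) -> int:
    """Use only the month and day digits of a YYYYMMDD string."""
    return int(birth_date[4:8])


# Hypothetical sample data: most employees were born in the same year.
dates = ["19900312", "19900725", "19911104", "19901230"]

# Addresses built from the year digits collide heavily...
year_addresses = [int(d[0:4]) for d in dates]

# ...while addresses built from month and day are all distinct here.
addresses = [digit_analysis_hash(d) for d in dates]
```

In this sample the year digits yield only two distinct addresses for four records, while the month and day digits yield four.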

2.3 Mid-square method

Take the middle digits of the square of the key as the storage address (hash address). Squaring the key widens the differences between keys, and the middle digits of the square are influenced by every digit of the key.

This method is suitable when, at every digit position of the key, certain digits repeat with high frequency.
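
A possible Python sketch of the mid-square idea is shown below. Working on the decimal string of the square and taking two middle digits by default are choices made for this example, not part of the method's definition.

```python
def mid_square_hash(key: int, digits: int = 2) -> int:
    """Square the key and return `digits` digits from the middle."""
    squared = str(key * key)
    # Start index that centers a window of `digits` characters;
    # max() guards against squares shorter than the window.
    start = max(0, (len(squared) - digits) // 2)
    return int(squared[start:start + digits])


# 1234 * 1234 = 1522756  -> middle two digits "22"
# 4321 * 4321 = 18671041 -> middle two digits "71"
a = mid_square_hash(1234)
b = mid_square_hash(4321)
```

The two keys use the same digits in a different order, yet their middle-square addresses differ, because the middle of the square depends on all digits of the key.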

2.4 Folding Method

Split the key into several segments and take their sum as the hash address. There are two ways to add the segments. Shift folding: align the segments at their lowest digit and add them. Boundary folding: fold the key back and forth along the segment boundaries, so that every other segment is reversed, then align the segments and add them.

This method is suitable when the key has a large number of digits.
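
Both ways of folding can be sketched in Python as below. Splitting the key from the left into fixed-width segments and dropping the carry with a modulo are assumptions of this example; textbooks differ on those details.

```python
def split_segments(key: str, width: int):
    """Cut the digit string into segments of `width` digits."""
    return [key[i:i + width] for i in range(0, len(key), width)]


def shift_fold(key: str, width: int) -> int:
    """Shift folding: add the segments as they are."""
    total = sum(int(seg) for seg in split_segments(key, width))
    return total % 10 ** width  # drop the carry


def boundary_fold(key: str, width: int) -> int:
    """Boundary folding: reverse every other segment, then add."""
    total = 0
    for i, seg in enumerate(split_segments(key, width)):
        total += int(seg[::-1]) if i % 2 else int(seg)
    return total % 10 ** width  # drop the carry


# "123456789" -> segments 123, 456, 789
s = shift_fold("123456789", 3)     # 123 + 456 + 789 = 1368 -> 368
b = boundary_fold("123456789", 3)  # 123 + 654 + 789 = 1566 -> 566
```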

2.5 Random number method

Set the hash function to H(key) = random(key), where random is a pseudo-random function.
This method is suitable for constructing hash functions for keys of unequal length.
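
One way to read H(key) = random(key) is to seed a pseudo-random generator with the key and take its first output. The Python sketch below does that with the standard random module; the function name random_hash and the table length 101 are assumptions for the example.

```python
import random


def random_hash(key: str, m: int) -> int:
    """Seed a generator with the key and draw one address in [0, m)."""
    rng = random.Random(key)  # same key -> same pseudo-random stream
    return rng.randrange(m)


# Keys of different lengths are handled the same way.
short_addr = random_hash("id7", 101)
long_addr = random_hash("a-much-longer-key-than-the-first", 101)
```

The hash must be repeatable, which is why the generator is seeded with the key rather than left in its default time-based state.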

2.6 Division-remainder method

Divide the key by a number p that is no larger than the hash table length m, and use the remainder as the hash address.
The hash function is H(key) = key MOD p (p ≤ m), where m is the table length and p is usually chosen to be a prime number no larger than m.
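
The following Python sketch shows the division-remainder method and one reason a prime p is usually preferred. The helper largest_prime_at_most and the sample keys are assumptions for this illustration.

```python
def is_prime(n: int) -> bool:
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


def largest_prime_at_most(m: int) -> int:
    """Pick p as the largest prime that does not exceed m."""
    for p in range(m, 1, -1):
        if is_prime(p):
            return p
    raise ValueError("no prime <= m")


def division_hash(key: int, p: int) -> int:
    """H(key) = key MOD p."""
    return key % p


m = 16
p = largest_prime_at_most(m)  # 13
keys = [16, 32, 48, 64]       # hypothetical keys, all multiples of 16
with_m = [division_hash(k, m) for k in keys]  # every key lands on 0
with_p = [division_hash(k, p) for k in keys]  # keys spread out
```

With the non-prime divisor 16, keys that share a factor with it pile up on one address; with the prime 13 the same keys are spread over different addresses.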

3. Hash table collision resolution methods

There are four main ways to handle collisions in a hash table: open addressing, rehashing, chaining (also called the zipper method), and a public overflow area.
A well-designed hash function reduces collisions, but it generally cannot avoid them altogether, so collision resolution is the other key issue in hashing.
In practice, "handling a collision" means finding the next hash address for the key that caused the collision.

3.1 Open addressing method

When a collision occurs, probe for the next empty hash address. As long as the hash table is large enough, an empty address can always be found and the record stored there.

3.1.1 Linear probing

When a collision occurs, examine the following cells of the table one after another until an empty cell is found or the whole table has been searched.

Formula:

fi(key) = (f(key) + di) MOD m (di = 1, 2, 3, ..., m-1)
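
A minimal Python sketch of insertion and search with linear probing is given below. It assumes integer keys, f(key) = key MOD m as the base hash function, and None as the marker for an empty cell; deletion is left out.

```python
def linear_probe_insert(table, key):
    """Insert key, trying offsets d = 0, 1, 2, ..., m-1 from the home cell."""
    m = len(table)
    home = key % m  # assumed base hash function f(key)
    for d in range(m):
        slot = (home + d) % m
        if table[slot] is None:
            table[slot] = key
            return slot
    raise OverflowError("hash table is full")


def linear_probe_search(table, key):
    """Follow the same probe sequence; stop at the key or an empty cell."""
    m = len(table)
    home = key % m
    for d in range(m):
        slot = (home + d) % m
        if table[slot] is None:
            return None  # an empty cell means the key was never stored
        if table[slot] == key:
            return slot
    return None


table = [None] * 7
# 10, 17 and 24 all have home cell 3 (they are synonyms modulo 7).
slots = [linear_probe_insert(table, k) for k in (10, 17, 24)]
```

The three synonyms end up in cells 3, 4 and 5, one step further each time.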

3.1.2 Quadratic probing

When a collision occurs, probe by jumping to both sides of the home position, looking for an empty cell in either direction.

Formula:

fi(key) = (f(key) + di) MOD m (di = 1^2, -1^2, 2^2, -2^2, ..., q^2, -q^2, q <= m/2)
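
The probe sequence in this formula can be sketched in Python as follows. As before, f(key) = key MOD m and None for an empty cell are assumptions of the example, and the function returns None instead of raising when the probe sequence finds no free cell.

```python
def quadratic_offsets(m):
    """Yield +1^2, -(1^2), +2^2, -(2^2), ..., up to q = m // 2."""
    for q in range(1, m // 2 + 1):
        yield q * q
        yield -(q * q)


def quadratic_probe_insert(table, key):
    m = len(table)
    home = key % m  # assumed base hash function f(key)
    if table[home] is None:
        table[home] = key
        return home
    for d in quadratic_offsets(m):
        slot = (home + d) % m  # Python's % keeps the result in 0..m-1
        if table[slot] is None:
            table[slot] = key
            return slot
    return None  # no free cell on this probe sequence


table = [None] * 11
# 14, 25, 36 and 47 all have home cell 3 modulo 11.
slots = [quadratic_probe_insert(table, k) for k in (14, 25, 36, 47)]
```

The synonyms land in cells 3, 4, 2 and 7: the probes jump to the right and left of the home cell with growing distance.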

3.1.3 Random probing

When a collision occurs, the displacement di is computed by a pseudo-random function. This is called the random probing method.

Formula:

fi(key) = (f(key) + di) MOD m (di is a pseudo-random sequence)
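
A hedged Python sketch of random probing follows. The sequence di must be reproducible, otherwise a later search could not retrace the probes, so the example builds it once from a fixed seed. Using a shuffled permutation of 1..m-1 is a design choice of this sketch, not something the method requires; a sequence with repeated values would not necessarily reach every cell.

```python
import random


def probe_offsets(m, seed=2016):
    """Build one fixed pseudo-random sequence of displacements di."""
    offsets = list(range(1, m))
    random.Random(seed).shuffle(offsets)  # seeded, so repeatable
    return offsets


def random_probe_insert(table, key, offsets):
    m = len(table)
    home = key % m  # assumed base hash function f(key)
    for d in [0] + offsets:
        slot = (home + d) % m
        if table[slot] is None:
            table[slot] = key
            return slot
    raise OverflowError("hash table is full")


table = [None] * 7
offsets = probe_offsets(len(table))
# 10, 17 and 24 all have home cell 3 modulo 7.
slots = [random_probe_insert(table, k, offsets) for k in (10, 17, 24)]
```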

Linear probing is prone to clustering: resolving a collision between synonyms (keys with the same hash address) can in turn cause collisions with non-synonyms.
The advantage of linear probing is that, as long as the hash table is not full, it is guaranteed to find a hash address that does not collide. Quadratic probing and pseudo-random probing do not necessarily offer that guarantee.

3.2 Chaining method

All records with the same hash address are linked together in one linked list. The nodes of each list are allocated dynamically, so this method suits cases where the table length cannot be determined before the table is built.
Handling collisions is simple and there is no clustering, that is, non-synonyms never collide with each other, so the average search length is short.
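
Chaining can be sketched in Python as below. The class name ChainedHashTable, the default of 7 buckets, and the use of Python lists in place of hand-written linked lists are assumptions for this example.

```python
class ChainedHashTable:
    """Each bucket holds a chain of (key, value) pairs with the same address."""

    def __init__(self, m=7):
        self.buckets = [[] for _ in range(m)]

    def _index(self, key):
        return hash(key) % len(self.buckets)

    def put(self, key, value):
        chain = self.buckets[self._index(key)]
        for i, (k, _) in enumerate(chain):
            if k == key:
                chain[i] = (key, value)  # key already present: update
                return
        chain.append((key, value))       # new node allocated on demand

    def get(self, key, default=None):
        for k, v in self.buckets[self._index(key)]:
            if k == key:
                return v
        return default


t = ChainedHashTable()
t.put(3, "a")
t.put(10, "b")  # 10 % 7 == 3, so it joins the chain of key 3
t.put(3, "c")   # updates the existing entry for key 3
```

Keys 3 and 10 share one chain, while keys with other addresses are unaffected by that collision.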

3.3 Rehashing method

This method prepares several different hash functions in advance:

Hi = RHi(key), i = 1, 2, 3, ..., n

When the hash address H1 = RH1(key) collides, compute H2 = RH2(key), and so on, until no collision occurs. This method is less prone to clustering, but it increases computation time.
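
The idea of trying one hash function after another can be sketched in Python like this. The three functions rh1, rh2 and rh3 are arbitrary choices for the example, not functions prescribed by the method.

```python
def rh1(key, m):
    return key % m


def rh2(key, m):
    return (key // m) % m


def rh3(key, m):
    return (7 * key + 1) % m


HASH_FUNCTIONS = [rh1, rh2, rh3]  # hypothetical RH1..RHn with n = 3


def rehash_insert(table, key):
    """Try RH1, RH2, ... in order; return (i, slot) of the first free cell."""
    m = len(table)
    for i, rh in enumerate(HASH_FUNCTIONS, start=1):
        slot = rh(key, m)
        if table[slot] is None:
            table[slot] = key
            return i, slot
    return None  # every prepared hash function collided


table = [None] * 11
results = [rehash_insert(table, k) for k in (14, 25, 36)]
```

Key 14 is placed by RH1, key 25 needs RH2, and key 36 needs RH3, which shows the extra computation each collision costs.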

3.4 Public overflow area

The basic idea of this method is to divide the hash table into two parts, a base table and an overflow table. Any element that collides with an entry in the base table is placed in the overflow table. (Note: with this method the elements are stored across two separate tables.)
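
A minimal Python sketch of the base table plus overflow table is shown below. The class name OverflowHashTable, the modulo hash function, and the plain list scanned sequentially for the overflow table are assumptions for this example.

```python
class OverflowHashTable:
    def __init__(self, m=7):
        self.base = [None] * m  # base table, addressed by the hash function
        self.overflow = []      # shared overflow table for all collisions

    def put(self, key):
        slot = key % len(self.base)
        if self.base[slot] is None:
            self.base[slot] = key
        elif self.base[slot] != key and key not in self.overflow:
            self.overflow.append(key)  # collided with the base table

    def contains(self, key):
        slot = key % len(self.base)
        if self.base[slot] == key:
            return True
        return key in self.overflow    # fall back to a sequential scan


t = OverflowHashTable()
for k in (3, 10, 17, 5):
    t.put(k)  # 10 and 17 collide with 3 at address 3
```

Lookups that hit the base table stay fast, while colliding keys cost a scan of the overflow table, so this layout works best when collisions are few.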

This article is an English version of an article originally written in Chinese on aliyun.com and is provided for information purposes only.
