A detailed explanation of hash algorithm principle

Last Update:2018-07-26 Source: Internet

Author: User



I. Concept

A hash table is a structure that stores data as key-value pairs indexed by key: given the key to look up, we can find its corresponding value directly.

The idea of hashing is simple. If all keys are small integers, a plain unordered array is enough: use the key as the index and store the value at that index, so the value for any key can be accessed immediately. Hashing extends this simple case to keys of more complex types.

A hash lookup takes two steps:

1. Use the hash function to convert the lookup key into an array index. Ideally, different keys map to different indices, but in practice several keys can hash to the same index, which is why a second step is needed.

2. Handle the hash collisions. There are many ways to do this; separate chaining (the zipper method) and linear probing are described later in this article.
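The two steps can be sketched in C as follows. This is a minimal illustration rather than the article's own code: the table size `TABLE_SIZE`, the remainder-based `hash_index`, and the function names `put` and `get` are all assumptions made for the example, and collisions are handled by chaining entries in a linked list (the zipper method).

```c
#include <stdlib.h>

#define TABLE_SIZE 8  /* assumed size, deliberately small so collisions occur */

struct entry {
    int key;
    int value;
    struct entry *next;   /* next entry that hashed to the same slot */
};

static struct entry *table[TABLE_SIZE];  /* all slots start out NULL */

/* Step 1: convert the key to an array index (assumed: remainder hash). */
static unsigned hash_index(int key)
{
    return (unsigned)key % TABLE_SIZE;
}

/* Insert or update a key. Returns 0 on success, -1 if allocation fails. */
int put(int key, int value)
{
    unsigned i = hash_index(key);
    struct entry *e;
    for (e = table[i]; e != NULL; e = e->next) {
        if (e->key == key) {       /* key already present: overwrite */
            e->value = value;
            return 0;
        }
    }
    e = malloc(sizeof *e);
    if (e == NULL)
        return -1;
    e->key = key;
    e->value = value;
    e->next = table[i];            /* Step 2: chain onto the slot's list */
    table[i] = e;
    return 0;
}

/* Look up a key. Returns 1 and stores the value in *out if found, else 0. */
int get(int key, int *out)
{
    const struct entry *e;
    for (e = table[hash_index(key)]; e != NULL; e = e->next) {
        if (e->key == key) {
            *out = e->value;
            return 1;
        }
    }
    return 0;
}
```

With these definitions, keys 3 and 11 both map to slot 3, yet `get` still returns the right value for each, because it compares the stored keys while walking the chain.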

A hash table is a classic example of a trade-off between time and space. With no memory limit, the key could be used directly as an array index, and every lookup would take O(1) time. With no time limit, an unordered array with sequential search would do, and it needs very little memory. A hash table uses a moderate amount of both time and space to strike a balance between these two extremes. Adjusting the hash function is enough to shift that balance toward time or toward space.

In a hash table there is a fixed relationship between a record's position in the table and its keyword. This lets us know in advance where the keyword being looked up is stored, so the record can be located directly by its index, bringing the average search length (ASL) close to 0.

1) A hash function is a mapping from the set of keywords to a set of addresses. It can be defined flexibly, as long as every address it produces stays within the allowed range.

2) Because a hash function is a compressing mapping, "conflicts" (collisions) arise easily in general, namely: key1 != key2 while f(key1) = f(key2).

3) Conflicts can only be minimized, not avoided completely, because the set of possible keywords is usually far larger (it contains every possible keyword), while the address set contains only the addresses in the hash table.
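A small C sketch makes points 2) and 3) concrete. The remainder-based mapping `address_of` and the helper `has_conflict` are hypothetical names chosen for this illustration, not functions from the article; the sketch only assumes that the address set has `m` slots.

```c
#include <stddef.h>

#define MAX_ADDRESSES 64  /* assumed upper bound on the address set size */

/* Hypothetical compressing mapping: sends any key into 0..m-1. */
unsigned address_of(unsigned key, unsigned m)
{
    return key % m;
}

/* Returns 1 if at least two of the n keys share an address, 0 if none do,
   and -1 if m is outside the range this sketch supports. */
int has_conflict(const unsigned *keys, size_t n, unsigned m)
{
    int used[MAX_ADDRESSES] = {0};
    if (m == 0 || m > MAX_ADDRESSES)
        return -1;
    for (size_t i = 0; i < n; i++) {
        unsigned a = address_of(keys[i], m);
        if (used[a])
            return 1;      /* a different key already owns this address */
        used[a] = 1;
    }
    return 0;
}
```

With m = 13, the distinct keys 12 and 25 both land on address 12, which is exactly the situation key1 != key2 with f(key1) = f(key2). Whenever there are more distinct keys than addresses, at least one such conflict is unavoidable.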

Constructing this special lookup table therefore requires choosing a "good" hash function (one with as few conflicts as possible) and a way to "handle conflicts".

II. Methods of Constructing Hash Functions

1. Direct Addressing method:

The direct addressing method uses the keyword k of the data element itself, or a linear function of it, as the hash address: H(k) = k or H(k) = a × k + b, where a and b are constants.

Example 1: a demographic table records the number of people at each age from 1 to 100. Age is the keyword and the hash function takes the keyword itself, as shown in figure (1):

Address	A1	A2	......	A99	A100
Age	1	2	......	99	100
Number	980	800	......	495	107

To find the number of people of a given age, read the corresponding entry directly. For example, to find the number of people aged 99, read the 99th entry.

Address	A0	A1	......	A19	A20
Year of birth	1980	1981	......	1999	2000
Number	980	800	......	495	107

To count births by year for the generation born from 1980 onward, as in the table above, the year of birth minus 1980 can serve as the address: f(key) = key - 1980.

This hash function is simple and never produces a conflict between different keywords, but it is a rather special case: in practice, keywords are rarely contiguous. A hash table built this way can waste a lot of space, so the method adapts poorly to other keyword sets.

This method is only appropriate when the size of the address set equals the size of the keyword set.
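The two direct-addressing examples can be written as a short C sketch. The function names below are hypothetical; the only thing taken from the text is the formula H(k) = a × k + b, with a = 1, b = 0 for the age table and a = 1, b = -1980 for the birth-year table.

```c
/* Direct addressing: H(k) = a * k + b, with constants a and b. */
int direct_address(int k, int a, int b)
{
    return a * k + b;
}

/* Example 1: ages 1..100 map to themselves (a = 1, b = 0). */
int age_address(int age)
{
    return direct_address(age, 1, 0);
}

/* Birth-year table: a year from 1980 onward maps to year - 1980
   (a = 1, b = -1980), so 1980 lands on address 0. */
int birth_year_address(int year)
{
    return direct_address(year, 1, -1980);
}
```

Because the mapping is one-to-one, an array indexed by these addresses needs no collision handling, but it must have one slot for every possible key in the range.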

2. Digital Analysis Method:

Assume every keyword in the keyword set consists of s digits (u1, u2, ..., us). Analyze the keyword set as a whole and extract the digit positions whose values are evenly distributed, or a combination of them, as the address.

In other words, the digital analysis method takes the digit positions whose values are most uniform across the keywords as the hash address. When keywords have many digits, you can analyze them, discard the unevenly distributed positions, and use the remaining ones as the hash value. The method is only suitable when all keyword values are known in advance. Analyzing the distribution maps the keyword range onto a smaller range.

Example 2: construct a hash table with n = 80 data elements and hash length m = 100. Without loss of generality, only 8 of the keywords are analyzed here:

k1=61317602 k2=61326875 k3=62739628 k4=61343634

k5=62706815 k6=62774638 k7=61381262 k8=61394220

Analyzing these 8 keywords, the values in the 1st, 2nd, 3rd, and 6th digit positions (counting from the left) are concentrated, so those positions are unsuitable for the hash address. The values in the 4th, 5th, 7th, and 8th positions are more uniform, and any two of them can form the hash address. If the last two digits are chosen, the hash addresses of the 8 keywords are: 2, 75, 28, 34, 15, 38, 62, 20.

This method is suitable when the frequency of each digit at each position of the keywords can be estimated beforehand.
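The analysis in Example 2 can be reproduced with a short C sketch. Counting the distinct digit values at each position is only a rough stand-in for "uniformity", chosen here as an assumption to keep the example small; the function names are hypothetical as well.

```c
#include <stddef.h>

#define DIGITS 8  /* every keyword in Example 2 has 8 decimal digits */

/* Digit of key at position pos, counted from the left starting at 1. */
int digit_at(unsigned long key, int pos)
{
    for (int i = DIGITS; i > pos; i--)
        key /= 10;
    return (int)(key % 10);
}

/* Number of distinct digit values seen at position pos across n keys.
   A low count means the position is concentrated and hashes poorly. */
int distinct_digits(const unsigned long *keys, size_t n, int pos)
{
    int seen[10] = {0};
    int count = 0;
    for (size_t i = 0; i < n; i++) {
        int d = digit_at(keys[i], pos);
        if (!seen[d]) {
            seen[d] = 1;
            count++;
        }
    }
    return count;
}

/* Hash address built from the 7th and 8th digits (the last two). */
int digit_hash(unsigned long key)
{
    return digit_at(key, 7) * 10 + digit_at(key, 8);
}
```

For the 8 keywords above, positions 1, 2, 3, and 6 show only 1, 2, 2, and 3 distinct values respectively, while positions 4, 5, 7, and 8 show 8, 6, 6, and 5, which agrees with the choice made in the example.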

3. Folding Method:

Divide the keyword into several parts and take the sum of the parts as the hash address.

Specifically, the folding method splits the keyword into parts with the same number of digits (the last part may have fewer), adds the parts together, and discards any carry out of the top digit. It suits keywords with many digits in which the digits at every position are roughly uniformly distributed.

There are two ways to add the parts: shift overlay and boundary overlay. Shift overlay aligns the lowest digit of each part and then adds. Boundary overlay folds the keyword back and forth along the part boundaries, starting from one end, so that every second part is reversed, and then adds.

Example 4: with a hash table of length 1000 and keyword key = 110108331119891, the allowed address space is a three-digit decimal number, and the two overlays work out as follows:

Shift overlay	Boundary overlay
8 9 1	8 9 1
1 1 9	9 1 1
3 3 1	3 3 1
1 0 8	8 0 1
+ 1 1 0	+ 1 1 0
(1) 5 5 9	(3) 0 4 4

Figure (2): hash addresses obtained by the folding method

The hash address obtained with shift overlay is 559, and the one given by boundary overlay is 44. If the keyword is a string rather than a number, it can be converted first, for example using the ASCII codes or the ordinal values of its characters.

This method is suitable when the keywords have a particularly large number of digits.
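Both overlays from Example 4 can be sketched in C. The function names and the `boundary` flag are assumptions made for this illustration. The sketch splits the keyword into groups from the low end and, for the boundary overlay, reverses every second group at full width, which matches the worked figure but is only one way to treat a shorter final group.

```c
/* Reverse the decimal digits of a group that is `width` digits wide,
   keeping leading zeros (for example, 108 with width 3 becomes 801). */
static unsigned long long reverse_group(unsigned long long g, int width)
{
    unsigned long long r = 0;
    for (int i = 0; i < width; i++) {
        r = r * 10 + g % 10;
        g /= 10;
    }
    return r;
}

/* Fold `key` into an address of `width` decimal digits.
   boundary == 0: shift overlay.
   boundary != 0: boundary overlay, where every second group
   (counting from the low end) is reversed before adding. */
unsigned long long fold_hash(unsigned long long key, int width, int boundary)
{
    unsigned long long modulus = 1;
    for (int i = 0; i < width; i++)
        modulus *= 10;

    unsigned long long sum = 0;
    int index = 0;
    while (key > 0) {
        unsigned long long group = key % modulus;
        if (boundary && index % 2 == 1)
            group = reverse_group(group, width);
        sum += group;
        key /= modulus;
        index++;
    }
    return sum % modulus;   /* discard the carry out of the top digit */
}
```

For key = 110108331119891 and a width of 3, the groups are 891, 119, 331, 108, and 110. The shift overlay sums them to 1559 and keeps 559; the boundary overlay reverses 119 and 108 to 911 and 801, sums to 3044, and keeps 44.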

4. Mid-Square Method:

This is a common way to construct a hash function. It first squares the keyword and then, depending on the size of the available address space, takes the middle digits of the square as the hash address.

Hash function: H(key) = "the middle digits of key²". Squaring amplifies the differences between keywords, and the middle digits of the square depend on every digit of the keyword, so different keywords are less likely to conflict and the resulting hash addresses are more uniform.

Example 5: with a hash table of length 1000, take the middle three digits of the square of the keyword, as shown below:

Keyword	Square of the keyword	Hash function value
1234	1522756	227
2143	4592449	924
4132	17073424	734
3214	10329796	297
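The table can be reproduced with a one-line rule in C. "Middle three digits" is ambiguous for a 7- or 8-digit square, so the sketch below assumes one particular reading that matches every row of the table: drop the two lowest decimal digits of the square and keep the next three. The function name is hypothetical.

```c
/* Decimal mid-square hash for Example 5.
   Assumed digit choice: drop the two lowest decimal digits of the
   square, then keep the next three (hash table length 1000). */
unsigned mid_square_hash(unsigned key)
{
    /* Widen before multiplying so the square cannot overflow. */
    unsigned long long square = (unsigned long long)key * key;
    return (unsigned)((square / 100) % 1000);
}
```

For example, 1234 squared is 1522756; dropping the last two digits leaves 15227, and the low three digits of that are 227.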

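A mid-square hash function in C is given below.

// Mid-square hash function; the keyword is assumed to be a 32-bit integer.
// Returns the middle 10 bits of key * key.
int Hash(int key)
{
    // Compute the square of the key; unsigned arithmetic keeps overflow well defined
    unsigned int square = (unsigned int)key * (unsigned int)key;
    // The source listing stops after the squaring step. The remaining lines are an
    // assumed completion that follows the stated intent: drop the low 11 bits ...
    square >>= 11;
    // ... and keep the next 10 bits (bits 11 to 20 of the 32-bit square)
    return (int)(square % 1024);
}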

