Integer Hash Function

Last Update:2018-12-04 Source: Internet

Author: User

I. Integer Hash Functions

There are three common methods: the direct remainder method, the product integer method, and the mid-square method. Each is discussed below. Throughout, k denotes the key, M the capacity of the hash table, and h(k) the hash function.

1. Direct Remainder Method

We divide the key k by the table capacity M and take the remainder as the position in the hash table. The function can be written as follows:

h(k) = k mod M

For example, with table capacity M = 12 and key k = 100, h(k) = 100 mod 12 = 4. The advantage of this method is that it is easy to implement and fast, which makes it very common. However, if M is chosen poorly and the keys happen to have a special structure, the data tends to cluster in a few slots of the hash table, which hurts efficiency.
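As a minimal sketch of the direct remainder method in C, assuming 32-bit unsigned keys (the names hash_mod, key, and capacity are illustrative and do not come from the original article):

```c
#include <assert.h>
#include <stdint.h>

/* Direct remainder method: the slot is the key modulo the table capacity.
   hash_mod, key and capacity are hypothetical names chosen for this sketch. */
uint32_t hash_mod(uint32_t key, uint32_t capacity)
{
    assert(capacity > 0);   /* a table needs at least one slot */
    return key % capacity;  /* always in the range [0, capacity) */
}
```

With a capacity of 12, the key 100 lands in slot 4, since 100 = 8 * 12 + 4.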

In practice, M is usually chosen as a prime number that is not too close to a power of 2. If the range of key values is small, M is usually chosen within 1.1 to 1.6 times the size of that range. For example, if the keys lie in [0, 600], M = 701 is a good choice. In a competition, you can write a prime number generator, or simply pick by hand a number that looks reasonably prime.
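The prime number generator mentioned above could look roughly like the following sketch. It uses plain trial division, which is enough for table sizes of this scale. The function names are hypothetical, and the sketch does not check how close the result is to a power of 2, so that still has to be judged by hand.

```c
#include <stdbool.h>
#include <stdint.h>

/* Trial division primality test; adequate for table-sized numbers. */
bool is_prime(uint32_t n)
{
    if (n < 2)
        return false;
    for (uint32_t d = 2; (uint64_t)d * d <= n; ++d)
        if (n % d == 0)
            return false;
    return true;
}

/* Smallest prime >= n. Assumes n is well below 2^32 so that ++n cannot wrap. */
uint32_t next_prime(uint32_t n)
{
    while (!is_prime(n))
        ++n;
    return n;
}
```

For a key range of about 600 values, next_prime(700) returns 701, which is about 1.17 times the range.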

I inserted 4000 numbers into a hash table with a capacity of 701. The results are:

Test Data	Random Data	Continuous Data
Minimum unit capacity:	0	5
Maximum unit capacity:	15	6
Expected capacity:	5.70613	5.70613
Standard Deviation:	2.4165	0.455531

It can be seen that for random data, the maximum unit capacity under the direct remainder method is nearly three times the expected capacity. On my machine (Pentium III 866 MHz, 128 MB RAM), one call to the function takes about 39 ns, that is, about 35 clock cycles.
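The continuous-data column can be reproduced with a short sketch. It assumes the continuous test data were 4000 consecutive integers starting at 0, which the article does not state. The struct and function names are illustrative only.

```c
#include <stddef.h>
#include <stdlib.h>

/* Per-slot load summary for one experiment (hypothetical helper type). */
typedef struct {
    size_t min_load;  /* fewest keys in any slot */
    size_t max_load;  /* most keys in any slot */
    double mean;      /* expected load, n / capacity */
} load_stats;

/* Hash the keys 0..n-1 by direct remainder and summarize the slot loads.
   Returns 0 on success, -1 on bad arguments or allocation failure. */
int sequential_load(size_t n, size_t capacity, load_stats *out)
{
    if (capacity == 0 || out == NULL)
        return -1;
    size_t *count = calloc(capacity, sizeof *count);
    if (count == NULL)
        return -1;
    for (size_t k = 0; k < n; ++k)
        count[k % capacity]++;
    size_t lo = count[0], hi = count[0];
    for (size_t i = 1; i < capacity; ++i) {
        if (count[i] < lo) lo = count[i];
        if (count[i] > hi) hi = count[i];
    }
    out->min_load = lo;
    out->max_load = hi;
    out->mean = (double)n / (double)capacity;
    free(count);
    return 0;
}
```

Since 4000 = 5 * 701 + 495, 495 slots receive 6 keys and the remaining 206 receive 5, which agrees with the minimum of 5 and maximum of 6 in the table.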

2. Product Integer Method

We multiply the key k by a real number A in (0, 1) (preferably an irrational number) to obtain a real number between 0 and k. We take the fractional part of that number, multiply it by M, and then take the integer part, which gives the position of k in the hash table. The function can be written as follows:

h(k) = floor(M * (kA mod 1))

Here kA mod 1 denotes the fractional part of kA, that is, kA - floor(kA). For example, with table capacity M = 12, seed A = (sqrt(5) - 1) / 2 ≈ 0.618 (a good choice), and key k = 100, we get kA ≈ 61.803, so h(k) = floor(12 * 0.803) = 9.
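A minimal floating-point sketch of the product integer method follows. The seed (sqrt(5) - 1) / 2 is an assumption of this sketch (it is a commonly recommended choice), and the names SEED_A and hash_product are illustrative. The fractional part is taken with a cast rather than the math library so the example needs no extra linking.

```c
#include <stdint.h>

/* Assumed seed: (sqrt(5) - 1) / 2, an irrational number in (0, 1). */
#define SEED_A 0.6180339887498949

/* Product integer method: floor(capacity * fractional_part(key * A)). */
uint32_t hash_product(uint32_t key, uint32_t capacity)
{
    double x = (double)key * SEED_A;             /* k * A, never negative */
    double frac = x - (double)(uint64_t)x;       /* fractional part in [0, 1) */
    return (uint32_t)(frac * (double)capacity);  /* truncation equals floor here */
}
```

With a capacity of 12 and key 100, the product is about 61.803, the fractional part about 0.803, and the resulting slot is 9.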

Inserting 4000 numbers into a hash table with a capacity of 701 gives the following results:

Test Data	Random Data	Continuous Data
Minimum unit capacity:	0	4
Maximum unit capacity:	15	7
Expected capacity:	5.70613	5.70613
Standard Deviation:	2.5069	0.619999

From the formula, we can see that this method depends very little on M, and it works well when the value of M is unsuitable for the direct remainder method. However, in the test above its performance is not very satisfactory, and it runs slowly because of the large number of floating-point operations. Even after repeated optimization, one computation still takes 892 ns on my machine, that is, 810 clock cycles, 23 times as long as the direct remainder method.
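One common way to avoid the floating-point cost, which is not what the article measured, is to keep the fractional part as a 32-bit fixed-point number. The sketch below assumes the seed (sqrt(5) - 1) / 2, stored as the integer floor(2^32 * A) = 2654435769. The function name is hypothetical.

```c
#include <stdint.h>

/* Fixed-point variant of the product integer method (a swapped-in technique,
   not the article's implementation). The low 32 bits of key * 2654435769
   represent the fractional part of key * A scaled by 2^32. */
uint32_t hash_product_fixed(uint32_t key, uint32_t capacity)
{
    uint32_t frac = key * 2654435769u;  /* unsigned multiply wraps modulo 2^32 */
    /* Scale the fraction by capacity and drop the 32 fractional bits. */
    return (uint32_t)(((uint64_t)frac * capacity) >> 32);
}
```

Because frac is always below 2^32, the result is always below capacity. For key 100 and capacity 12 it yields slot 9.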

3. Mid-Square Method

We square the key and return the middle bits of the result as the hash value. Because every digit of the key influences the middle digits of its square, this method also works well. However, it is not ideal for small key values, and it is more complicated to implement. To make full use of the hash table's space, M should be an integer power of 2. For example, with table capacity M = 2^4 = 16 and key k = 100, the square is 10000, which is 10011100010000 in binary, and its four middle bits 1000 give h(k) = 8.
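A sketch of the mid-square idea in C follows. Which bits count as "middle" depends on how wide the keys are assumed to be, so this version takes the assumed key width as a parameter. That parameter and the function name are assumptions of this sketch, not part of the original article.

```c
#include <assert.h>
#include <stdint.h>

/* Mid-square method for a table of capacity 2^bits.
   key_bits is the assumed width of the keys, so the square has up to
   2 * key_bits bits and the window is centred inside that width. */
uint32_t hash_mid_square(uint32_t key, unsigned bits, unsigned key_bits)
{
    assert(bits >= 1 && bits <= 31);
    assert(key_bits <= 32 && 2 * key_bits >= bits);
    uint64_t square = (uint64_t)key * key;       /* cannot overflow 64 bits */
    unsigned shift = (2 * key_bits - bits) / 2;  /* low bits to discard */
    return (uint32_t)((square >> shift) & ((1u << bits) - 1u));
}
```

For 7-bit keys and a 16-slot table, the key 100 squares to 10000, the lowest 5 bits are discarded, and the next 4 bits give slot 8.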

Inserting 4000 numbers into a hash table with a capacity of 512 (note that 701 is not used here, so that the table's space can be fully used) gives the following results:

Test Data	Random Data	Continuous Data
Minimum unit capacity:	0	1
Maximum unit capacity:	17	17
Expected capacity:	7.8125	7.8125
Standard Deviation:	2.95804	2.64501

The effect is worse than expected, especially for continuous data. However, since only multiplication and bitwise operations are needed, this function is the fastest. On my machine, one computation takes only 23 ns, that is, 19 clock cycles, which is faster than the direct remainder method.

Comparison of the three methods:

	Implementation difficulty	Actual effect	Running speed	Other applications
Direct remainder method	Easy	Good	Fast	Strings
Product integer method	Fairly easy	Fairly good	Slow	Floating-point numbers
Mid-square method	Medium	Fairly good	Fast	None

From this table, it is easy to see that the direct remainder method gives the best return for the effort, so it is also the method used most often in competitions.

For real-number keys, we can use the product integer method directly. For keys of other data types, we can first convert them into integers and then insert them into the hash table. Next we will look at how to convert other types of data into integers.

II. String Hash Functions

A string can itself be regarded as a large integer written in base 256 (base 128 for an ANSI string). Therefore, we can use the direct remainder method to compute the hash value directly in linear time. To ensure a good effect, M still must not be too close to a power of 2. In particular, when the string is treated as a number in base 2^p, choosing M = 2^p - 1 makes every permutation of the string hash to the same value. (Think about it: why?)
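The linear-time computation can be sketched with Horner's rule, reducing modulo the capacity after each byte so the running value never overflows. The function name is illustrative, and the string is assumed to be NUL-terminated.

```c
#include <assert.h>
#include <stdint.h>

/* Treat the string as a base-256 number and reduce it modulo capacity,
   one byte at a time (Horner's rule). */
uint32_t hash_string_mod(const char *s, uint32_t capacity)
{
    assert(capacity > 0);
    uint64_t h = 0;
    for (const unsigned char *p = (const unsigned char *)s; *p; ++p)
        h = (h * 256u + *p) % capacity;  /* h stays below capacity */
    return (uint32_t)h;
}
```

This also shows why a capacity of the form 2^p - 1 is a poor choice for base 2^p: with capacity 255, we have 256 mod 255 = 1, so the hash collapses to the sum of the bytes modulo 255 and no longer depends on their order. For instance, "abc" and "cba" both hash to (97 + 98 + 99) mod 255 = 39, whereas with capacity 1237 the strings "ab" and "ba" land in different slots.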

Common string hash functions such as ELFHash and APHash are simple and effective. These functions use bitwise operations so that every character affects the final hash value. There are also hash functions such as MD5 and SHA-1, for which finding a collision is meant to be practically impossible (although MD5 was broken some time ago).

I randomly selected 1000 different words and 1000 different sentences from one of Mark Twain's novels as test data for short and long strings respectively. I then converted them into integers with different hash functions and inserted them into a hash table with a capacity of 1237 using the direct remainder method. On a collision, the new string overwrites the old one. By counting the strings that remain in the table at the end, we can roughly gauge the actual effect of each hash function.

	Short strings	Long strings	Average	Coding difficulty
Direct remainder	667	676	671.5	Easy
P. J. Weinberger hash	683	676	679.5	Hard
ELF hash	683	676	679.5	Hard
SDBM hash	694	680	687.0	Easy
BKDR hash	665	710	687.5	Fairly easy
DJB hash	694	683	688.5	Fairly easy
AP hash	684	698	691.0	Hard
RS hash	691	693	692.0	Hard
JS hash	684	708	696.0	Fairly easy
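The measurement behind this table can be sketched as follows: since a new string overwrites the old one on a collision, the number of strings that remain equals the number of distinct occupied slots. The helper below is a hypothetical reconstruction of that counting step. It takes the already computed integer hash values and applies the direct remainder method.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Count how many slots are occupied after inserting n hash values into a
   table of the given capacity with overwrite-on-collision.
   Returns -1 on bad arguments or allocation failure. */
long count_surviving(const uint32_t *hashes, size_t n, uint32_t capacity)
{
    if (hashes == NULL || capacity == 0)
        return -1;
    unsigned char *used = calloc(capacity, 1);
    if (used == NULL)
        return -1;
    long occupied = 0;
    for (size_t i = 0; i < n; ++i) {
        uint32_t slot = hashes[i] % capacity;  /* direct remainder method */
        if (!used[slot]) {                     /* first key to reach this slot */
            used[slot] = 1;
            ++occupied;
        }
    }
    free(used);
    return occupied;
}
```

For example, the values 1, 2, 3, and 1238 with capacity 1237 occupy only 3 slots, because 1238 collides with 1.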

When 1000 random numbers are inserted into a hash table with a capacity of 1237 using the direct remainder method, the number of occupied cells also reaches only 694. The methods in the lower part of the table have therefore essentially reached the limit, and their randomness is excellent. Still, it is hard to choose among them, because no single method is perfect, simple, and practical all at once. I generally use JS hash or SDBM hash as the string hash function. The code for these two functions is as follows:

unsigned int JSHash(char *str)
{
    unsigned int hash = 1315423911; // nearly a prime - 1315423911 = 3 * 438474637
    while (*str)
    {
        hash ^= ((hash << 5) + (*str++) + (hash >> 2));
    }
    return (hash & 0x7FFFFFFF);
}

unsigned int SDBMHash(char *str)
{
    unsigned int hash = 0;
    while (*str)
    {
        // equivalent to: hash = 65599*hash + (*str++);
        hash = (*str++) + (hash << 6) + (hash << 16) - hash;
    }
    return (hash & 0x7FFFFFFF);
}

JSHash involves more complex operations. If the requirements are not particularly demanding, SDBMHash is a good choice.
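The comment in SDBMHash says the shift form is equivalent to multiplying by 65599. That follows from 2^6 + 2^16 - 1 = 64 + 65536 - 1 = 65599. The sketch below writes both forms side by side so they can be compared. It uses uint32_t and unsigned bytes, which are choices of this sketch rather than the article's exact signatures.

```c
#include <stdint.h>

/* SDBM hash written with shifts, as in the article's version. */
uint32_t sdbm_shift(const char *s)
{
    uint32_t hash = 0;
    for (const unsigned char *p = (const unsigned char *)s; *p; ++p)
        hash = *p + (hash << 6) + (hash << 16) - hash;
    return hash & 0x7FFFFFFF;
}

/* The same hash written as a multiplication by 65599. */
uint32_t sdbm_mul(const char *s)
{
    uint32_t hash = 0;
    for (const unsigned char *p = (const unsigned char *)s; *p; ++p)
        hash = 65599u * hash + *p;
    return hash & 0x7FFFFFFF;
}
```

Both functions wrap modulo 2^32 at every step, so they return the same value for any input. For the one-character string "a" the result is simply 97.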

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
