Integer Hash Function

Source: Internet
Author: User
I. Integer Hash Functions

There are three common methods: the direct remainder method, the product integer method, and the mid-square method. Each is discussed below. In what follows, k denotes the key, M the capacity of the hash table, and h(k) the hash function.

1. Direct Remainder Method

We divide the key k by the table capacity M and take the remainder as the position in the hash table. The function can be written as:

h(k) = k mod M

For example, given a table capacity M and a key value k, the key is stored at position k mod M. The advantage of this method is that it is easy to implement and fast, and it is very commonly used. However, if M is chosen poorly and the sample data happens to have a special pattern, many keys land in the same positions and efficiency suffers.
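As a minimal sketch of the direct remainder method in C (the names hash_mod, k and m are illustrative and do not come from the original article):

```c
#include <assert.h>

/* Direct remainder method: h(k) = k mod M.
   hash_mod, k and m are illustrative names, not the author's. */
unsigned int hash_mod(unsigned int k, unsigned int m)
{
    assert(m > 0);   /* the table capacity must be positive */
    return k % m;    /* position in the range 0 .. m-1 */
}
```

With the illustrative values m = 12 and k = 100, the function returns 4, because 100 = 8 * 12 + 4.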

From experience, we generally choose M to be a prime number that is not too close to a power of 2. If the value range of the key is small, we generally choose M within 1.1 to 1.6 times the size of that range; once the range is known, a prime in that interval is a good choice. During a competition, you can write a prime number generator, or simply pick by hand a number that "looks fairly prime".
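The "prime number generator" mentioned above could be as simple as the following sketch, which uses trial division to find the smallest prime not below a given number. The function names are hypothetical, and a 32-bit unsigned int is assumed.

```c
#include <assert.h>
#include <stdbool.h>

/* Trial division; fine for the table sizes used in contests. */
static bool is_prime(unsigned int n)
{
    if (n < 2)
        return false;
    for (unsigned int d = 2; d <= n / d; d++)   /* d <= n / d avoids overflow of d * d */
        if (n % d == 0)
            return false;
    return true;
}

/* Smallest prime >= n (hypothetical helper, assumes 32-bit unsigned int). */
unsigned int next_prime(unsigned int n)
{
    assert(n <= 4294967291u);   /* largest prime below 2^32, so the loop terminates */
    while (!is_prime(n))
        n++;
    return n;
}
```

For instance, for a key range of size 600 (an illustrative figure), 1.1 times the range is 660, and next_prime(660) returns 661. It is still worth checking by hand that the result is not close to a power of 2.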

I inserted 4000 numbers into a hash table of capacity 701. The results are:

Test data                Random data    Continuous data
Minimum unit capacity    0              5
Maximum unit capacity    15             6
Expected capacity        5.70613        5.70613
Standard deviation       2.4165         0.455531

It can be seen that, for random data, the maximum unit capacity of the direct remainder method is nearly three times the expected capacity. Tested on my machine (Pentium III 866 MHz, 128 MB RAM), the function takes about 39 ns per call, that is, about 35 clock cycles.
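The figures for continuous data can be checked with a short sketch. It assumes that "continuous data" means the consecutive keys 0 to n-1 (the article does not say where the sequence starts), and the struct and function names are hypothetical. It reports the variance rather than the standard deviation so that no math library is needed.

```c
#include <assert.h>

#define MAX_CAPACITY 4096u   /* hypothetical upper bound for this sketch */

struct load_stats {
    unsigned int min, max;   /* smallest and largest unit load */
    double mean, variance;   /* variance = (standard deviation)^2 */
};

/* Insert the keys 0 .. n-1 with h(k) = k mod m and summarise the loads. */
struct load_stats consecutive_load(unsigned int n, unsigned int m)
{
    assert(m > 0 && m <= MAX_CAPACITY);
    unsigned int count[MAX_CAPACITY] = {0};
    for (unsigned int k = 0; k < n; k++)
        count[k % m]++;

    struct load_stats s = { count[0], count[0], (double)n / m, 0.0 };
    for (unsigned int i = 0; i < m; i++) {
        if (count[i] < s.min) s.min = count[i];
        if (count[i] > s.max) s.max = count[i];
        double d = (double)count[i] - s.mean;
        s.variance += d * d / m;   /* population variance */
    }
    return s;
}
```

For n = 4000 and m = 701 this gives a minimum of 5, a maximum of 6, a mean of about 5.70613 and a variance of about 0.2075, whose square root is about 0.4555, in line with the continuous-data column of the table.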

2. Product Integer Method

We multiply the key k by a real number A in (0, 1) (preferably an irrational number) to obtain a real number between 0 and k. We take the fractional part of that number, multiply it by M, and then take the integer part; the result is the position in the hash table. The function can be written as:

h(k) = floor(M * (kA mod 1))

where kA mod 1 denotes the fractional part of kA, that is, kA - floor(kA). For example, given a table capacity M, a well-chosen seed A, and a key value k, the position follows directly from this formula.
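A minimal sketch of the product integer method follows. The seed value in the text was lost, so the code assumes A = (sqrt(5) - 1) / 2, a commonly suggested choice, and the names hash_mul and HASH_SEED_A are hypothetical. The fractional part is obtained with an integer cast instead of floor() so that no math library is required.

```c
#include <assert.h>

/* Assumed seed: (sqrt(5) - 1) / 2. The article's own value is not recoverable. */
#define HASH_SEED_A 0.6180339887498949

/* Product integer method: h(k) = floor(M * frac(k * A)). */
unsigned int hash_mul(unsigned int k, unsigned int m)
{
    assert(m > 0);
    double x = (double)k * HASH_SEED_A;                /* real number in [0, k) */
    double frac = x - (double)(unsigned long long)x;   /* fractional part of k * A */
    return (unsigned int)(frac * m);                   /* integer part, in 0 .. m-1 */
}
```

With the illustrative values k = 123456 and m = 10000, k * A is about 76300.0041, the fractional part is about 0.0041, and the function returns 41.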

Inserting 4000 numbers into a hash table of capacity 701 with this method gives the following results:

Test data                Random data    Continuous data
Minimum unit capacity    0              4
Maximum unit capacity    15             7
Expected capacity        5.70613        5.70613
Standard deviation       2.5069         0.619999

From the formula, we can see that this method is affected very little by the choice of M, and it performs well when the value of M is unsuitable for the direct remainder method. However, in the test above its distribution is not particularly satisfactory, and it runs slowly because of the many floating-point operations. Even after repeated optimization, one computation still takes 892 ns on my machine, that is, 810 clock cycles, about 23 times as long as the direct remainder method.

3. Mid-Square Method

We square the key and return the middle bits of the square as the hash value. Because every digit of the key affects the middle digits of its square, this method also distributes keys well. However, it is less effective for small key values, and it is more complicated to implement. To make full use of the space of the hash table, M is best chosen as an integer power of 2. For example, given such a table capacity M and a key value k, the position is taken from the middle bits of k squared.
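The mid-square idea can be sketched as follows. The article does not say how wide its keys were or which bits it extracted, so this version assumes 16-bit keys, a 32-bit square, and a window of the given number of bits centred in that square; the function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Mid-square sketch: square a 16-bit key into 32 bits and return the
   middle `bits` bits, so the table capacity is M = 2^bits. */
uint32_t hash_midsquare(uint16_t k, unsigned int bits)
{
    assert(bits >= 1 && bits <= 16);
    uint32_t sq = (uint32_t)k * k;             /* at most 65535^2, fits in 32 bits */
    unsigned int shift = (32 - bits) / 2;      /* centre the window in the square */
    uint32_t mask = ((uint32_t)1 << bits) - 1; /* keep the low `bits` bits after shifting */
    return (sq >> shift) & mask;
}
```

For a table of capacity 512 (bits = 9), the key 1234 has the square 1522756 and maps to position 231. Very small keys such as 3 map to position 0, which illustrates why the method is less effective for small values.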

Inserting 4000 numbers into a hash table of capacity 512 (note that 701 is not used here, so that the space of the hash table is fully used) gives the following results:

Test data                Random data    Continuous data
Minimum unit capacity    0              1
Maximum unit capacity    17             17
Expected capacity        7.8125         7.8125
Standard deviation       2.95804        2.64501

The results are worse than we expected, especially for continuous data. However, since it uses only multiplication and bitwise operations, this function is the fastest of the three. On my machine one computation takes only 23 ns, that is, 19 clock cycles, which is faster than the direct remainder method.

Comparing the three methods:

Method                     Implementation difficulty    Actual effect    Running speed    Other applications
Direct remainder method    Easy                         Good             Fast             Strings
Product integer method     Fairly easy                  Fairly good      Slow             Floating-point numbers
Mid-square method          Medium                       Fairly good      Fast             None

From this table, we can easily see that the direct remainder method offers the best value overall, which is why it is the method used most often in competitions.

For real-number keys, we can use the product integer method directly. For keys of other data types, we can first convert them into integers and then insert them into the hash table. Next, we look at how to convert other types of data into integers.

II. String Hash Functions

A string can itself be regarded as a large integer written in base 256 (base 128 for ANSI strings). We can therefore use the direct remainder method to compute its hash value in linear time. To get a good distribution, M should still not be too close to a power of 2. In particular, if we regard the string as a number in base 2^p and choose M = 2^p - 1, then every permutation of the string has the same hash value. (Think about it: why?)
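The following sketch treats a string as a base-256 number and reduces it modulo M one character at a time, which keeps the running time linear. The function name is hypothetical, a 32-bit unsigned int is assumed, and M is limited so that the intermediate value cannot overflow.

```c
#include <assert.h>

/* Direct remainder method on a string read as a base-256 integer.
   Horner's rule keeps the intermediate value below m at every step. */
unsigned int hash_str_mod(const char *s, unsigned int m)
{
    assert(m > 0 && m <= 0xFFFFFFu);   /* so that h * 256 + 255 fits in 32 bits */
    unsigned int h = 0;
    while (*s)
        h = (h * 256u + (unsigned char)*s++) % m;
    return h;
}
```

It also shows the problem with a poor modulus. With M = 255, both "abc" and "cba" hash to 39: since 256 leaves remainder 1 when divided by 255, each character contributes only its code, regardless of position, and (97 + 98 + 99) mod 255 = 39. With M = 1237 the same two strings hash to 496 and 444.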

Common string hash functions such as ELFHash and APHash are simple and effective. They use bitwise operations so that every character affects the final hash value. There are also hash functions represented by MD5 and SHA-1, for which finding a collision is meant to be almost impossible (although MD5 was broken some time ago).

I randomly selected 1000 distinct words and 1000 distinct sentences from one of Mark Twain's novels as test data for short and long strings, respectively. Each string was converted to an integer by the hash function under test and then inserted, using the direct remainder method, into a hash table of capacity 1237; on a collision, the new string overwrites the old one. The number of strings that "remain" in the table at the end gives a rough measure of how well each hash function works.

Hash function            Short strings    Long strings    Average    Encoding difficulty
Direct remainder         667              676             671.5      Easy
P. J. Weinberger hash    683              676             679.5      Hard
ELF hash                 683              676             679.5      Hard
SDBM hash                694              680             687.0      Easy
BKDR hash                665              710             687.5      Fairly easy
DJB hash                 694              683             688.5      Fairly easy
AP hash                  684              698             691.0      Hard
RS hash                  691              693             692.0      Hard
JS hash                  684              708             696.0      Fairly easy

Inserting 1000 random numbers into a hash table of capacity 1237 with the direct remainder method also leaves 694 units occupied. This shows that the functions toward the bottom of the table have reached that limit, so their randomness is excellent. It is still hard to choose among them, however, because none is at once perfect, simple, and practical. I generally use JS Hash or SDBM Hash as my string hash function. The code for these two functions is as follows:

    unsigned int JSHash(char *str)
    {
        unsigned int hash = 1315423911; // nearly a prime - 1315423911 = 3 * 438474637
        while (*str)
        {
            hash ^= ((hash << 5) + (*str++) + (hash >> 2));
        }
        return (hash & 0x7FFFFFFF);
    }

    unsigned int SDBMHash(char *str)
    {
        unsigned int hash = 0;
        while (*str)
        {
            // equivalent to: hash = 65599*hash + (*str++);
            hash = (*str++) + (hash << 6) + (hash << 16) - hash;
        }
        return (hash & 0x7FFFFFFF);
    }

JSHash's operations are more complicated. If the requirements on the result are not particularly high, SDBMHash is a good choice.
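The overwrite experiment described earlier can be sketched as follows. This is only an approximation of the author's test harness, which is not shown in the article: the function names are hypothetical, and sdbm_hash is a variant of the SDBM function that reads each character as unsigned.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef unsigned int (*str_hash_fn)(const char *);

/* SDBM-style hash; characters are read as unsigned in this sketch. */
unsigned int sdbm_hash(const char *str)
{
    unsigned int hash = 0;
    while (*str)
        hash = (unsigned char)*str++ + (hash << 6) + (hash << 16) - hash;
    return hash & 0x7FFFFFFF;
}

/* Insert every string at hash(string) mod capacity, overwriting on a
   collision, and return how many strings remain in the table. */
size_t count_survivors(const char **strings, size_t n,
                       str_hash_fn hash, unsigned int capacity)
{
    assert(capacity > 0);
    const char **table = calloc(capacity, sizeof *table);
    assert(table != NULL);

    for (size_t i = 0; i < n; i++)
        table[hash(strings[i]) % capacity] = strings[i];   /* new overwrites old */

    size_t survivors = 0;
    for (unsigned int j = 0; j < capacity; j++)
        if (table[j] != NULL)
            survivors++;

    free(table);
    return survivors;
}
```

Calling count_survivors with the 1000 test strings, a hash function, and a capacity of 1237 would give the kind of count reported in the table; the closer the count is to that of random numbers, the better the hash function spreads its input.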
