Comparison of common string hash functions and their C implementations

Source: Internet
Author: User

Basic concepts
A perfect hash function is a hash function with no collisions: for any key1 != key2, h(key1) != h(key2).
Let the domain be X and the range be Y, with n = |X| and m = |Y|; then necessarily m >= n. If h(key1) != h(key2) for all distinct key1, key2 in X, h is called a perfect hash function. When m = n, h is called a minimal perfect hash function (in this case h is a bijection, i.e. a one-to-one correspondence).
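For a finite key set the definition can be checked mechanically: h is perfect on the keys exactly when no two keys land in the same slot of [0, m). The following is a minimal sketch, assuming the hash is passed in as a function pointer that already reduces its result to [0, m). `is_perfect`, `slot_fn`, and `first_char_slot` are hypothetical names used only for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical callback type: maps a key to a slot, expected to be in [0, m). */
typedef unsigned int (*slot_fn)(const char *key, unsigned int m);

/* Returns true if h sends the n keys to n distinct slots in [0, m),
   i.e. h is a perfect hash function on this key set. */
bool is_perfect(const char *const keys[], size_t n, unsigned int m, slot_fn h)
{
    if (n == 0)
        return true;                 /* nothing can collide */
    if (m < n)
        return false;                /* pigeonhole: some slot must be shared */
    bool *used = calloc(m, sizeof *used);
    if (used == NULL)
        return false;                /* could not check; treat as not verified */
    bool ok = true;
    for (size_t i = 0; i < n && ok; i++) {
        unsigned int s = h(keys[i], m);
        if (s >= m || used[s])
            ok = false;              /* out of range, or a collision */
        else
            used[s] = true;
    }
    free(used);
    return ok;
}

/* Illustrative slot function: first character code modulo m. */
unsigned int first_char_slot(const char *key, unsigned int m)
{
    return (unsigned char)key[0] % m;
}
```

With m = 3 and the keys "apple", "banana", "cherry", `first_char_slot` is even a minimal perfect hash, because the ASCII codes 97, 98, and 99 are distinct modulo 3. It stops being perfect as soon as two keys share a first letter.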

When working with large-scale string data, it is often necessary to assign an integer ID to each string, which calls for a string hash function. How can one find a perfect string hash function?
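In practice, one common way to give each distinct string an integer ID is a hash table with separate chaining that hands out IDs in first-seen order. Such a table only needs a hash that spreads keys well, not a perfect one, because `strcmp` resolves collisions within a bucket. The following is a minimal sketch, assuming a BKDR-style hash with seed 131 and a fixed prime bucket count of 1031. `intern`, `struct intern_table`, and `TABLE_SIZE` are hypothetical names, the table must start zero-initialized (for example as a `static` object), and the sketch never frees its entries.

```c
#include <stdlib.h>
#include <string.h>

#define TABLE_SIZE 1031u   /* a prime bucket count (arbitrary choice for this sketch) */

/* One chained entry: a private copy of the string and the ID it was given. */
struct entry {
    char *key;
    int id;
    struct entry *next;
};

struct intern_table {
    struct entry *buckets[TABLE_SIZE];
    int next_id;            /* IDs are handed out as 0, 1, 2, ... */
};

/* BKDR-style hash with seed 131. */
static unsigned int bkdr(const char *s)
{
    unsigned int h = 0;
    while (*s)
        h = h * 131 + (unsigned char)*s++;
    return h & 0x7FFFFFFF;
}

/* Returns the ID of s, assigning a new one on first sight; -1 on allocation failure. */
int intern(struct intern_table *t, const char *s)
{
    unsigned int b = bkdr(s) % TABLE_SIZE;
    for (struct entry *e = t->buckets[b]; e != NULL; e = e->next)
        if (strcmp(e->key, s) == 0)
            return e->id;           /* already seen: reuse its ID */
    struct entry *e = malloc(sizeof *e);
    if (e == NULL)
        return -1;
    e->key = malloc(strlen(s) + 1);
    if (e->key == NULL) {
        free(e);
        return -1;
    }
    strcpy(e->key, s);
    e->id = t->next_id++;
    e->next = t->buckets[b];        /* push onto the front of the chain */
    t->buckets[b] = e;
    return e->id;
}
```

Calling `intern` with "apple", "banana", and then "apple" again returns 0, 1, and 0: the second lookup finds the existing entry instead of allocating a new ID.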

There are several commonly used string hash functions, such as BKDRHash, APHash, DJBHash, JSHash, RSHash, SDBMHash, PJWHash, and ELFHash; these are the classic ones.

Common string hash functions such as ELFHash and APHash are very simple and effective. They use bitwise operations so that every character affects the final hash value. There are also cryptographic hash functions, typified by MD5 and SHA-1, which were designed so that collisions are practically impossible to find (although collisions have since been demonstrated for both).
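To see why it matters that every character, and its position, influences the result, compare a naive sum of character codes with a DJB-style shift-and-add hash. This is a minimal sketch. `additive_hash` and `shift_add_hash` are hypothetical names chosen for illustration, and the second function is the DJB update without the final 31-bit mask.

```c
/* Naive additive hash (counter-example): the order of characters is lost,
   so every anagram of a string collides with it. */
unsigned int additive_hash(const char *s)
{
    unsigned int h = 0;
    while (*s)
        h += (unsigned char)*s++;
    return h;
}

/* Shift-and-add (DJB-style, h = h*33 + c): each step mixes the previous value,
   so a character's influence depends on where it appears. */
unsigned int shift_add_hash(const char *s)
{
    unsigned int h = 5381;
    while (*s)
        h = (h << 5) + h + (unsigned char)*s++;
    return h;
}
```

"abc" and "cba" have the same additive hash but different shift-and-add hashes.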

I ran a small evaluation of the hash functions listed above.

Hash function  Data 1  Data 2  Data 3  Data 4  Data 1 score  Data 2 score  Data 3 score  Data 4 score  Average score
BKDRHash            2       0    4774     481         96.55           100         90.95         82.05          92.64
APHash              2       3    4754     493         96.55         88.46           100         51.28          86.28
DJBHash             2       2    4975     474         96.55         92.31             0           100          83.43
JSHash              1       4    4761     506           100         84.62         96.83         17.95          81.94
RSHash              1       0    4861     505           100           100         51.58         20.51          75.96
SDBMHash            3       2    4849     504          93.1         92.31         57.01         23.08          72.41
PJWHash            30      26    4878     513             0             0         43.89             0          21.95
ELFHash            30      26    4878     513             0             0         43.89             0          21.95
Here, Data 1 is the number of hash collisions among 100,000 random strings made up of letters and digits. Data 2 is the number of hash collisions among 100,000 meaningful English sentences. Data 3 is the number of collisions when the hash values from Data 1 are reduced modulo 1000003 (a large prime) and stored in a linear table. Data 4 is the number of collisions when the hash values from Data 1 are reduced modulo 10000019 (a larger prime) and stored in a linear table.
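The article does not spell out how the collisions for Data 3 and Data 4 were counted. One plausible reading is that a key counts as a collision when its slot (hash value modulo the prime) is already occupied in a flat array. The following sketch is written under that assumption. `count_slot_collisions` is a hypothetical name, and the caller is assumed to have already computed the hash values.

```c
#include <stddef.h>
#include <stdlib.h>

/* Counts keys whose slot (hash % modulus) is already occupied in a flat table.
   Assumption: one plausible reading of the Data 3 / Data 4 metric.
   Returns -1 if the occupancy array cannot be allocated. */
long count_slot_collisions(const unsigned int hashes[], size_t n, unsigned int modulus)
{
    unsigned char *occupied = calloc(modulus, 1);
    if (occupied == NULL)
        return -1;
    long collisions = 0;
    for (size_t i = 0; i < n; i++) {
        unsigned int slot = hashes[i] % modulus;
        if (occupied[slot])
            collisions++;            /* slot taken: this key collides */
        else
            occupied[slot] = 1;      /* first key in this slot */
    }
    free(occupied);
    return collisions;
}
```

For calibration, if the hash values behaved like independent uniform random numbers, the expected count would be n - m(1 - (1 - 1/m)^n). That works out to about 4,840 for n = 100,000 and m = 1,000,003, and about 500 for m = 10,000,019. The Data 3 and Data 4 columns above sit close to those figures.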


The comparison produces the average scores shown above. Each average is the quadratic mean (root mean square) of the four scores. BKDRHash stands out both in measured results and in ease of implementation. APHash is also an excellent algorithm. DJBHash, JSHash, RSHash, and SDBMHash each have their own merits. PJWHash and ELFHash perform worst; their scores are identical, and the two algorithms are essentially the same in nature.

    // SDBM Hash Function
    unsigned int SDBMHash(char *str)
    {
        unsigned int hash = 0;
        while (*str) {
            // equivalent to: hash = 65599 * hash + (*str++);
            hash = (*str++) + (hash << 6) + (hash << 16) - hash;
        }
        return (hash & 0x7FFFFFFF);
    }

    // RS Hash Function
    unsigned int RSHash(char *str)
    {
        unsigned int b = 378551;
        unsigned int a = 63689;
        unsigned int hash = 0;
        while (*str) {
            hash = hash * a + (*str++);
            a *= b;   // the multiplier itself changes after every character
        }
        return (hash & 0x7FFFFFFF);
    }

    // JS Hash Function
    unsigned int JSHash(char *str)
    {
        unsigned int hash = 1315423911;
        while (*str) {
            hash ^= ((hash << 5) + (*str++) + (hash >> 2));
        }
        return (hash & 0x7FFFFFFF);
    }

    // P. J. Weinberger Hash Function
    unsigned int PJWHash(char *str)
    {
        unsigned int BitsInUnsignedInt = (unsigned int)(sizeof(unsigned int) * 8);
        unsigned int ThreeQuarters = (unsigned int)((BitsInUnsignedInt * 3) / 4);
        unsigned int OneEighth = (unsigned int)(BitsInUnsignedInt / 8);
        unsigned int HighBits = (unsigned int)(0xFFFFFFFF) << (BitsInUnsignedInt - OneEighth);
        unsigned int hash = 0;
        unsigned int test = 0;
        while (*str) {
            hash = (hash << OneEighth) + (*str++);
            // fold the top OneEighth bits back into the low bits, then clear them
            if ((test = hash & HighBits) != 0) {
                hash = ((hash ^ (test >> ThreeQuarters)) & (~HighBits));
            }
        }
        return (hash & 0x7FFFFFFF);
    }

    // ELF Hash Function
    unsigned int ELFHash(char *str)
    {
        unsigned int hash = 0;
        unsigned int x = 0;
        while (*str) {
            hash = (hash << 4) + (*str++);
            // fold the top 4 bits back into the low bits, then clear them
            if ((x = hash & 0xF0000000) != 0) {
                hash ^= (x >> 24);
                hash &= ~x;
            }
        }
        return (hash & 0x7FFFFFFF);
    }

    // BKDR Hash Function
    unsigned int BKDRHash(char *str)
    {
        unsigned int seed = 131; // 131 1313 13131 131313 etc..
        unsigned int hash = 0;
        while (*str) {
            hash = hash * seed + (*str++);
        }
        return (hash & 0x7FFFFFFF);
    }

    // DJB Hash Function
    unsigned int DJBHash(char *str)
    {
        unsigned int hash = 5381;
        while (*str) {
            hash += (hash << 5) + (*str++);
        }
        return (hash & 0x7FFFFFFF);
    }

    // AP Hash Function
    unsigned int APHash(char *str)
    {
        unsigned int hash = 0;
        int i;
        for (i = 0; *str; i++) {
            if ((i & 1) == 0) {
                hash ^= ((hash << 7) ^ (*str++) ^ (hash >> 3));
            } else {
                hash ^= ((hash << 11) ^ (*str++) ^ (hash >> 5));
            }
        }
        return (hash & 0x7FFFFFFF);
    }
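The comment in SDBMHash claims that its shift form equals a multiplication by 65599. The DJB update can likewise be read as hash * 33 + c. Both identities follow from 2^6 + 2^16 - 1 = 65599 and 1 + 2^5 = 33, and they stay exact because unsigned arithmetic wraps modulo 2^N. The following small check is a sketch that assumes a 32-bit `unsigned int`. The helper names are hypothetical and not part of the listing.

```c
#include <stdbool.h>

/* One SDBM step as written in the listing: shifts instead of a multiply. */
unsigned int sdbm_step_shift(unsigned int hash, unsigned char c)
{
    return c + (hash << 6) + (hash << 16) - hash;
}

/* The same step as a multiplication: 2^6 + 2^16 - 1 = 65599. */
unsigned int sdbm_step_mul(unsigned int hash, unsigned char c)
{
    return 65599u * hash + c;
}

/* One DJB step as written in the listing: hash += (hash << 5) + c. */
unsigned int djb_step_shift(unsigned int hash, unsigned char c)
{
    return hash + (hash << 5) + c;
}

/* The same step as a multiplication: 1 + 2^5 = 33. */
unsigned int djb_step_mul(unsigned int hash, unsigned char c)
{
    return 33u * hash + c;
}

/* Compares both forms on 256 spread-out 32-bit hash values and every byte value. */
bool shift_forms_agree(void)
{
    for (unsigned long long h = 0; h <= 0xFFFFFFFFull; h += 16777619ull) {
        unsigned int hv = (unsigned int)h;
        for (unsigned int c = 0; c <= 255; c++) {
            unsigned char ch = (unsigned char)c;
            if (sdbm_step_shift(hv, ch) != sdbm_step_mul(hv, ch) ||
                djb_step_shift(hv, ch) != djb_step_mul(hv, ch))
                return false;
        }
    }
    return true;
}
```

The same reading applies to BKDRHash. For a k-character string s, it evaluates the polynomial s[0]*131^(k-1) + s[1]*131^(k-2) + ... + s[k-1] modulo 2^32 and then masks the result to 31 bits.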

A hash function from Programming Pearls (Jon Bentley)

Use as the hash table size the prime closest to the number of elements:

    #define NHASH 29989
    #define MULT 31
    unsigned int hash(char *p)
    {
        unsigned int h = 0;
        for (; *p; p++)
            h = MULT * h + *p;
        return h % NHASH;
    }
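A function like this is typically paired with separate chaining, for example to count how often each word occurs. The following is a minimal sketch under that assumption. `count_word`, `struct node`, and `bin` are hypothetical names, not taken from the book. Unlike the listing, the sketch casts each character to `unsigned char` so that bytes above 127 cannot make the sum negative.

```c
#include <stdlib.h>
#include <string.h>

#define NHASH 29989   /* prime table size, as in the listing above */
#define MULT 31

/* Chained node holding one distinct word and its count (hypothetical layout). */
struct node {
    char *word;
    int count;
    struct node *next;
};

static struct node *bin[NHASH];   /* zero-initialized: every chain starts empty */

static unsigned int hash(const char *p)
{
    unsigned int h = 0;
    for (; *p; p++)
        h = MULT * h + (unsigned char)*p;
    return h % NHASH;
}

/* Increments the count for word s, inserting it on first occurrence.
   Returns the new count, or -1 if memory runs out. */
int count_word(const char *s)
{
    unsigned int h = hash(s);
    for (struct node *p = bin[h]; p != NULL; p = p->next)
        if (strcmp(s, p->word) == 0)
            return ++p->count;        /* word already in the chain */
    struct node *p = malloc(sizeof *p);
    if (p == NULL)
        return -1;
    p->word = malloc(strlen(s) + 1);
    if (p->word == NULL) {
        free(p);
        return -1;
    }
    strcpy(p->word, s);
    p->count = 1;
    p->next = bin[h];                 /* push onto the front of the chain */
    bin[h] = p;
    return 1;
}
```

Feeding the words "the", "cat", "the" returns 1, 1, and 2. The prime table size spreads keys across buckets even when the multiplier and the input share structure.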


Copyright notice: This is the blogger's original article and may not be reproduced without the blogger's permission.
