Basic concepts
A perfect hash function is a hash function with no collisions: for any two keys with key1 != key2, we have h(key1) != h(key2).
Let the domain be X and the range be Y, with n = |X| and m = |Y|. Then necessarily m >= n. If h(key1) != h(key2) for all distinct key1, key2 in X, then h is a perfect hash function. When m = n, h is called a minimal perfect hash function, and in that case it is a one-to-one mapping.
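The definition can be checked mechanically on a small key set. The following is a minimal sketch, not a construction method: `is_perfect` and the toy `first_char_hash` are hypothetical names introduced here for illustration, and the keys are assumed to be pairwise distinct.

```c
#include <stdbool.h>
#include <stddef.h>

/* Returns true if h maps the n (distinct) keys to n distinct values in [0, m).
   With m == n, a true result means h is a minimal perfect hash on this set. */
bool is_perfect(unsigned (*h)(const char *), const char *const keys[],
                size_t n, unsigned m)
{
    for (size_t i = 0; i < n; i++) {
        unsigned hi = h(keys[i]);
        if (hi >= m)
            return false;              /* value falls outside the range Y */
        for (size_t j = i + 1; j < n; j++)
            if (h(keys[j]) == hi)
                return false;          /* two keys collide */
    }
    return true;
}

/* Hypothetical toy hash for illustration only: first byte modulo 8. */
unsigned first_char_hash(const char *s)
{
    return (unsigned char)s[0] % 8u;
}
```

On the keys "apple", "banana", "cherry" with m = 8 this toy hash is perfect (values 1, 2, 3), but it stops being perfect as soon as two keys share a first letter.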
When processing large-scale string data, it is common to assign an integer ID to each string, which requires a hash function for strings. How can we find a perfect string hash function?
Commonly used string hash functions include BKDRHash, APHash, DJBHash, JSHash, RSHash, SDBMHash, PJWHash, and ELFHash, all of them classics. They are simple and effective: each uses bit operations so that every character influences the final hash value. There are also hash functions such as MD5 and SHA1, for which accidental collisions are extremely unlikely.
I ran a small evaluation of the string hash functions listed above.
| hash function | Data 1 | Data 2 | Data 3 | Data 4 | Data 1 Score | Data 2 Score | Data 3 Score | Data 4 Score | Average score |
|---|---|---|---|---|---|---|---|---|---|
| BKDRHash | 2 | 0 | 4774 | 481 | 96.55 | 100 | 90.95 | 82.05 | 92.64 |
| APHash | 2 | 3 | 4754 | 493 | 96.55 | 88.46 | 100 | 51.28 | 86.28 |
| DJBHash | 2 | 2 | 4975 | 474 | 96.55 | 92.31 | 0 | 100 | 83.43 |
| JSHash | 1 | 4 | 4761 | 506 | 100 | 84.62 | 96.83 | 17.95 | 81.94 |
| RSHash | 1 | 0 | 4861 | 505 | 100 | 100 | 51.58 | 20.51 | 75.96 |
| SDBMHash | 3 | 2 | 4849 | 504 | 93.1 | 92.31 | 57.01 | 23.08 | 72.41 |
| PJWHash | 30 | 26 | 4878 | 513 | 0 | 0 | 43.89 | 0 | 21.95 |
| ELFHash | 30 | 26 | 4878 | 513 | 0 | 0 | 43.89 | 0 | 21.95 |
Data 1 is the number of hash collisions among 100,000 random strings made up of letters and digits.
Data 2 is the number of hash collisions among 100,000 meaningful English sentences.
Data 3 is the number of collisions when the hash values from Data 1 are reduced modulo 1000003 (a large prime) and stored in a linear table.
Data 4 is the number of collisions when the hash values from Data 1 are reduced modulo 10000019 (a larger prime) and stored in a linear table.
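The text does not say exactly how collisions in the linear table were counted. One plausible reading, sketched below under that assumption, counts every hash value that lands in a slot already occupied after reduction modulo the table size. The function name `count_slot_collisions` is hypothetical.

```c
#include <stddef.h>
#include <stdlib.h>

/* Counts how many of the n hash values fall into an already occupied slot
   of a table with m slots (slot = hash % m).
   Returns (size_t)-1 if m is 0 or the table cannot be allocated. */
size_t count_slot_collisions(const unsigned *hashes, size_t n, unsigned m)
{
    if (m == 0)
        return (size_t)-1;

    unsigned char *used = calloc(m, 1);   /* one occupancy flag per slot */
    if (used == NULL)
        return (size_t)-1;

    size_t collisions = 0;
    for (size_t i = 0; i < n; i++) {
        unsigned slot = hashes[i] % m;
        if (used[slot])
            collisions++;                 /* slot was taken by an earlier value */
        else
            used[slot] = 1;
    }

    free(used);
    return collisions;
}
```

To mirror the Data 3 and Data 4 columns under this reading, one would pass the 100,000 hash values of the random strings together with m = 1000003 or m = 10000019.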
Comparing these results gives the average scores above. The average is the quadratic mean (root mean square) of the four scores.
BKDRHash stands out both in measured results and in ease of implementation. APHash is also an excellent algorithm. DJBHash, JSHash, RSHash, and SDBMHash each have their own merits. PJWHash and ELFHash perform worst; their results are identical, which reflects that the two algorithms are essentially the same.
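As a worked example of the scoring, take the BKDR hash row. The per-column scaling formula below is not stated in the text; it is inferred from the table values, where the best (lowest) collision count in a column scores 100 and the worst scores 0. Here c is the collision count of one function in one column.

```latex
% Inferred per-column score (assumption, consistent with the table values):
\[
  \text{score} = 100 \cdot \frac{c_{\max} - c}{c_{\max} - c_{\min}}
\]
% BKDR hash, Data 4: c = 481, c_min = 474, c_max = 513
\[
  100 \cdot \frac{513 - 481}{513 - 474} = 100 \cdot \frac{32}{39} \approx 82.05
\]
% Quadratic mean of the four BKDR scores:
\[
  \sqrt{\frac{96.55^2 + 100^2 + 90.95^2 + 82.05^2}{4}}
  = \sqrt{\frac{34325.71}{4}} \approx 92.64
\]
```

Because the quadratic mean weights large values more heavily than the arithmetic mean, a function with one very good column is penalized less for a weak column than it would be under a plain average.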
    // SDBM Hash function
    unsigned int SDBMHash(char *str)
    {
        unsigned int hash = 0;
        while (*str)
        {
            // equivalent to: hash = 65599 * hash + (*str++);
            hash = (*str++) + (hash << 6) + (hash << 16) - hash;
        }
        return (hash & 0x7FFFFFFF);
    }

    // RS Hash function
    unsigned int RSHash(char *str)
    {
        unsigned int b = 378551;
        unsigned int a = 63689;
        unsigned int hash = 0;
        while (*str)
        {
            hash = hash * a + (*str++);
            a *= b;
        }
        return (hash & 0x7FFFFFFF);
    }

    // JS Hash function
    unsigned int JSHash(char *str)
    {
        unsigned int hash = 1315423911;
        while (*str)
        {
            hash ^= ((hash << 5) + (*str++) + (hash >> 2));
        }
        return (hash & 0x7FFFFFFF);
    }

    // P. J. Weinberger Hash function
    unsigned int PJWHash(char *str)
    {
        unsigned int BitsInUnsignedInt = (unsigned int)(sizeof(unsigned int) * 8);
        unsigned int ThreeQuarters = (unsigned int)((BitsInUnsignedInt * 3) / 4);
        unsigned int OneEighth = (unsigned int)(BitsInUnsignedInt / 8);
        unsigned int HighBits = (unsigned int)(0xFFFFFFFF) << (BitsInUnsignedInt - OneEighth);
        unsigned int hash = 0;
        unsigned int test = 0;
        while (*str)
        {
            hash = (hash << OneEighth) + (*str++);
            if ((test = hash & HighBits) != 0)
            {
                hash = ((hash ^ (test >> ThreeQuarters)) & (~HighBits));
            }
        }
        return (hash & 0x7FFFFFFF);
    }

    // ELF Hash function
    unsigned int ELFHash(char *str)
    {
        unsigned int hash = 0;
        unsigned int x = 0;
        while (*str)
        {
            hash = (hash << 4) + (*str++);
            if ((x = hash & 0xF0000000L) != 0)
            {
                hash ^= (x >> 24);
                hash &= ~x;
            }
        }
        return (hash & 0x7FFFFFFF);
    }

    // BKDR Hash function
    unsigned int BKDRHash(char *str)
    {
        unsigned int seed = 131; // 31 131 1313 13131 131313 etc.
        unsigned int hash = 0;
        while (*str)
        {
            hash = hash * seed + (*str++);
        }
        return (hash & 0x7FFFFFFF);
    }

    // DJB Hash function
    unsigned int DJBHash(char *str)
    {
        unsigned int hash = 5381;
        while (*str)
        {
            hash += (hash << 5) + (*str++);
        }
        return (hash & 0x7FFFFFFF);
    }

    // AP Hash function
    unsigned int APHash(char *str)
    {
        unsigned int hash = 0;
        int i;
        for (i = 0; *str; i++)
        {
            if ((i & 1) == 0)
            {
                hash ^= ((hash << 7) ^ (*str++) ^ (hash >> 3));
            }
            else
            {
                hash ^= (~((hash << 11) ^ (*str++) ^ (hash >> 5)));
            }
        }
        return (hash & 0x7FFFFFFF);
    }
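The comment in the SDBM function says the shift expression is equivalent to multiplying by 65599. This follows from 2^6 + 2^16 - 1 = 64 + 65536 - 1 = 65599, and because unsigned arithmetic in C wraps around, the two forms agree even after overflow. The sketch below puts both forms side by side; the names `sdbm_shift` and `sdbm_mul` are introduced here for illustration, and bytes are read as `unsigned char`, which is an assumption that differs from the listing above for bytes of 128 and higher on platforms where `char` is signed.

```c
/* Shift form, as in the SDBM listing. */
unsigned int sdbm_shift(const char *str)
{
    unsigned int hash = 0;
    while (*str)
        hash = (unsigned char)*str++ + (hash << 6) + (hash << 16) - hash;
    return hash & 0x7FFFFFFF;
}

/* Multiplicative form: hash = 65599 * hash + c. */
unsigned int sdbm_mul(const char *str)
{
    unsigned int hash = 0;
    while (*str)
        hash = 65599u * hash + (unsigned char)*str++;
    return hash & 0x7FFFFFFF;
}
```

The same reading applies to the DJB function: `hash += (hash << 5) + c` is `hash = 33 * hash + c`, so BKDR, SDBM, and DJB are all multiplicative hashes that differ in the multiplier and the initial value.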
A hash function from Programming Pearls
The hash table size is the prime number closest to the number of elements.

    #define NHASH 29989
    #define MULT 31

    unsigned int hash(char *p)
    {
        unsigned int h = 0;
        for (; *p; p++)
            h = MULT * h + *p;
        return h % NHASH;
    }
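Since no general-purpose hash here is guaranteed to be perfect, assigning an integer ID to each string usually means resolving collisions explicitly. The following is a minimal sketch using this hash with separate chaining. The `node` type, the `bin` table, and the `string_id` function are hypothetical names introduced for illustration, and bytes are read as `unsigned char` as an assumption.

```c
#include <stdlib.h>
#include <string.h>

#define NHASH 29989
#define MULT 31

typedef struct node {
    char *key;            /* owned copy of the string */
    int id;               /* ID assigned on first insertion */
    struct node *next;    /* next entry in the same bucket */
} node;

static node *bin[NHASH];  /* bucket heads, zero-initialized */
static int next_id = 0;

static unsigned int hash(const char *p)
{
    unsigned int h = 0;
    for (; *p; p++)
        h = MULT * h + (unsigned char)*p;
    return h % NHASH;
}

/* Returns the ID of s, assigning the next free ID the first time s is seen.
   Returns -1 if memory allocation fails. */
int string_id(const char *s)
{
    unsigned int h = hash(s);
    for (node *n = bin[h]; n != NULL; n = n->next)
        if (strcmp(n->key, s) == 0)
            return n->id;             /* already known */

    node *n = malloc(sizeof *n);
    if (n == NULL)
        return -1;
    size_t len = strlen(s) + 1;
    n->key = malloc(len);
    if (n->key == NULL) {
        free(n);
        return -1;
    }
    memcpy(n->key, s, len);
    n->id = next_id++;
    n->next = bin[h];                 /* push onto the front of the chain */
    bin[h] = n;
    return n->id;
}
```

Calling `string_id` repeatedly with the same string returns the same ID, and colliding strings simply share a bucket chain, so correctness does not depend on the hash being perfect.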