Comparison of common hash algorithms

Last Update:2018-08-16 Source: Internet

Author: User


The point of a hash algorithm is to provide fast access to data. It uses an algorithm to establish a correspondence between a key and the real value (each real value has exactly one key, but one key may correspond to several real values), so that data can be located quickly in an array or a similar structure. After reading a lot of material about hashing on the Internet, I have collected and summarized it here.

A minimal table indexed directly by key looks like this:

    // HashTable.h
    template <class T>
    class HashTable {
    public:
        HashTable(int count);
        void put(T* t, int key);
        T* get(int key);
    private:
        T** tArray;
    };

    // HashTable.cpp
    template <class T>
    HashTable<T>::HashTable(int count) {
        tArray = new T*[count]();  // zero-initialized, so empty slots read as null
    }

    template <class T>
    void HashTable<T>::put(T* t, int key) {
        this->tArray[key] = t;
    }

    template <class T>
    T* HashTable<T>::get(int key) {
        return this->tArray[key];
    }

In this way, we can access data of type T quickly as long as we know the key, instead of searching for it in a data structure such as a list. The key must be a valid index, that is, in the range 0 to count - 1. Keys are usually computed by some algorithm (the so-called hash algorithm). For example, a hash for strings might mix the characters like this:

    const char* value = "Hello";
    unsigned int key = ((((27u * 'H' + 27u) * 'e' + 27u) * 'l' + 27u) * 'l' + 27u) * 27u + 'o';

The hash function process

A hash function takes an input of arbitrary length (also called the pre-image) and transforms it, through the hashing algorithm, into an output of fixed length; that output is the hash value. This conversion is a compression mapping: the space of hash values is usually much smaller than the input space, different inputs may hash to the same output, and the input cannot be uniquely determined from the hash value. Simply put, it is a function that compresses a message of any length into a fixed-length digest.
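As a rough illustration of the compression mapping described above, the sketch below hashes a string with a base-27 polynomial and then reduces the result to a table index. The names `polyHash` and `bucketIndex` are hypothetical and exist only for this example; the byte values quoted afterwards assume ASCII.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: base-27 polynomial hash over the bytes of s.
// Unsigned arithmetic wraps on overflow, which is well defined in C++.
unsigned int polyHash(const std::string& s) {
    unsigned int hash = 0;
    for (unsigned char c : s) {
        hash = hash * 27u + c;
    }
    return hash;
}

// Reduce a hash value to a valid index in a table with `count` slots.
std::size_t bucketIndex(unsigned int hash, std::size_t count) {
    assert(count > 0);  // a table needs at least one slot
    return hash % count;
}
```

With a 10-slot table, "a" (byte value 97) and "k" (byte value 107) both land in slot 7. This is a small instance of two different inputs sharing one output, so a real table needs some way of handling such collisions.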

Here is a metaphor.
We have a lot of piglets, each with a different weight. Assuming the weights are fairly evenly distributed (counting in whole kilograms), we divide the piglets by weight among 100 small pigsties.
Each piglet is then driven into the pigsty that matches its weight, and the assignment is recorded.

Now, what if we are looking for a particular piglet? Do we need to check every pigsty and every piglet?
Of course not.

We look up the piglet's weight and go to the corresponding pigsty.
That pigsty holds relatively few piglets,
so we can find the piglet we are looking for fairly quickly.

This corresponds to a hash algorithm.
Items are distributed among pigsties according to their hashcode, and items with the same hashcode go into the same pigsty.
To look something up, first find the pigsty for its hashcode, then compare the piglets inside.

So the crux of the matter is how many pigsties to build.

If every piglet has a different weight (measuring down to the milligram) and each gets its own pigsty, we can find any piglet at top speed. The drawback is that building so many pigsties costs too much.

If we group by 10 kg steps, there are only a few pigsties, so each one holds many piglets. We find the pigsty quickly, but picking out the right piglet inside it is still tiring.

So a good hashcode balances, according to the actual situation and specific needs, time cost (more pigsties, faster lookup) against space cost (fewer pigsties, less space).
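The trade-off in the metaphor can be sketched in a few lines. The code below is only an illustration under simple assumptions: weights are whole numbers, the pigsty is chosen as `weight % styCount`, and the helper names `fullestSty` and `evenHerd` are made up for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: distribute weights over `styCount` sties by
// weight % styCount and report how many piglets share the fullest sty.
std::size_t fullestSty(const std::vector<unsigned int>& weights,
                       std::size_t styCount) {
    assert(styCount > 0);
    std::vector<std::size_t> occupancy(styCount, 0);
    for (unsigned int w : weights) {
        ++occupancy[w % styCount];
    }
    return *std::max_element(occupancy.begin(), occupancy.end());
}

// Build the weights 0, 1, ..., n-1 as a simple, evenly spread herd.
std::vector<unsigned int> evenHerd(unsigned int n) {
    std::vector<unsigned int> weights(n);
    for (unsigned int i = 0; i < n; ++i) weights[i] = i;
    return weights;
}
```

For a herd of 100 evenly spread weights, 100 sties leave one piglet per sty, 10 sties leave 10 per sty, and a single sty holds all 100. More sties mean less searching inside each one, at the price of more space.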

There are many kinds of hash algorithms; for details, see the analysis of hash algorithms I wrote previously. This article collects a number of commonly used hash functions, which should meet most people's needs.


Commonly used string hash functions such as ELFHash and APHash are simple and effective. They use bitwise operations so that every character influences the final hash value. There are also cryptographic hash functions, represented by MD5 and SHA1, which were designed so that collisions are extremely hard to find, although practical collision attacks against both have since been published.

Common string hash functions include BKDRHash, APHash, DJBHash, JSHash, RSHash, SDBMHash, PJWHash and ELFHash. I ran a small evaluation of these functions.

Hash function	Data 1	Data 2	Data 3	Data 4	Data 1 score	Data 2 score	Data 3 score	Data 4 score	Average score
BKDRHash	2	0	4774	481	96.55	100	90.95	82.05	92.64
APHash	2	3	4754	493	96.55	88.46	100	51.28	86.28
DJBHash	2	2	4975	474	96.55	92.31	0	100	83.43
JSHash	1	4	4761	506	100	84.62	96.83	17.95	81.94
RSHash	1	0	4861	505	100	100	51.58	20.51	75.96
SDBMHash	3	2	4849	504	93.1	92.31	57.01	23.08	72.41
PJWHash	30	26	4878	513	0	0	43.89	0	21.95
ELFHash	30	26	4878	513	0	0	43.89	0	21.95

Data 1 is the number of hash collisions among 100,000 random strings made up of letters and digits. Data 2 is the number of hash collisions among 100,000 meaningful English sentences. Data 3 is the number of collisions when the hash values of data 1 are taken modulo 1000003 (a large prime) and stored in a linear table. Data 4 is the number of collisions when the hash values of data 1 are taken modulo 10000019 (a larger prime) and stored in a linear table.
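The measurement described above can be approximated as follows. This is a sketch, not the author's test harness: it assumes a "conflict" is a key whose hash value (or table slot, after the modulo) was already produced by an earlier key, and that all keys are distinct. The names `countConflicts`, `HashFn` and `byteSumHash` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_set>
#include <vector>

using HashFn = unsigned int (*)(const std::string&);

// Count how many keys hash to a value already produced by an earlier key.
// If modulus is non-zero, hash values are first reduced modulo it, as with
// the table sizes 1000003 and 10000019 mentioned in the text.
std::size_t countConflicts(const std::vector<std::string>& keys,
                           HashFn hashFn, unsigned int modulus) {
    assert(hashFn != nullptr);
    std::unordered_set<unsigned int> seen;
    std::size_t conflicts = 0;
    for (const std::string& key : keys) {
        unsigned int h = hashFn(key);
        if (modulus != 0) h %= modulus;
        if (!seen.insert(h).second) ++conflicts;  // value already taken
    }
    return conflicts;
}

// Deliberately weak sample hash for demonstration: sum of byte values.
unsigned int byteSumHash(const std::string& s) {
    unsigned int sum = 0;
    for (unsigned char c : s) sum += c;
    return sum;
}
```

With the weak `byteSumHash`, the keys "ab", "ba" and "c" produce one conflict on the raw hash values, because "ab" and "ba" have the same byte sum. Reducing modulo a small table size such as 96 pushes "c" into the same slot as well, giving two conflicts.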

The comparison yields the average scores above. The average is the square (quadratic) mean. BKDRHash stands out both in measured results and in ease of implementation. APHash is also an excellent algorithm. DJBHash, JSHash, RSHash and SDBMHash follow in the middle of the ranking. PJWHash and ELFHash perform worst, and their scores are identical because the two algorithms are essentially the same.
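The author does not spell out how the per-column scores are computed. The numbers in the table are consistent with min-max scaling, where the fewest conflicts in a column scores 100 and the most scores 0, followed by the square mean of the four scores. The sketch below reproduces that inferred calculation; the function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Min-max score: the fewest conflicts in a column scores 100, the most 0.
double columnScore(double conflicts, double best, double worst) {
    assert(worst > best);
    return (worst - conflicts) / (worst - best) * 100.0;
}

// Square (quadratic) mean of a set of scores.
double squareMean(const std::vector<double>& scores) {
    assert(!scores.empty());
    double sumSquares = 0.0;
    for (double s : scores) sumSquares += s * s;
    return std::sqrt(sumSquares / scores.size());
}
```

For example, BKDRHash has 481 conflicts in the Data 4 column, where the best value is 474 and the worst is 513, so its score is (513 - 481) / (513 - 474) * 100, about 82.05. The square mean of its four scores 96.55, 100, 90.95 and 82.05 is about 92.64, which matches the table.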

In programming contests, where code should be easy to write and debug, I personally think BKDRHash is the easiest to remember and use. The functions are listed below.

    #define M  249997
    #define M1 1000001
    #define M2 0xF0000000

    // RS Hash Function
    unsigned int RSHash(const char *str) {
        unsigned int b = 378551;
        unsigned int a = 63689;
        unsigned int hash = 0;
        while (*str) {
            hash = hash * a + (*str++);
            a *= b;
        }
        return (hash % M);
    }

    // JS Hash Function
    unsigned int JSHash(const char *str) {
        unsigned int hash = 1315423911;
        while (*str) hash ^= ((hash << 5) + (*str++) + (hash >> 2));
        return (hash % M);
    }

    // P. J. Weinberger Hash Function
    unsigned int PJWHash(const char *str) {
        unsigned int BitsInUnignedInt = (unsigned int)(sizeof(unsigned int) * 8);
        unsigned int ThreeQuarters = (unsigned int)((BitsInUnignedInt * 3) / 4);
        unsigned int OneEighth = (unsigned int)(BitsInUnignedInt / 8);
        unsigned int HighBits = (unsigned int)(0xFFFFFFFF) << (BitsInUnignedInt - OneEighth);
        unsigned int hash = 0;
        unsigned int test = 0;
        while (*str) {
            hash = (hash << OneEighth) + (*str++);
            if ((test = hash & HighBits) != 0) {
                // fold the overflowing high bits back into the low part
                hash = ((hash ^ (test >> ThreeQuarters)) & (~HighBits));
            }
        }
        return (hash % M);
    }

    // ELF Hash Function
    unsigned int ELFHash(const char *str) {
        unsigned int hash = 0;
        unsigned int x = 0;
        while (*str) {
            hash = (hash << 4) + (*str++);
            if ((x = hash & 0xF0000000L) != 0) {
                hash ^= (x >> 24);
                hash &= ~x;
            }
        }
        return (hash % M);
    }

    // BKDR Hash Function
    unsigned int BKDRHash(const char *str) {
        unsigned int seed = 131; // 131 1313 13131 131313 etc.
        unsigned int hash = 0;
        while (*str) hash = hash * seed + (*str++);
        return (hash % M);
    }

    // SDBM Hash Function
    unsigned int SDBMHash(const char *str) {
        unsigned int hash = 0;
        while (*str) hash = (*str++) + (hash << 6) + (hash << 16) - hash;
        return (hash % M);
    }

    // DJB Hash Function
    unsigned int DJBHash(const char *str) {
        unsigned int hash = 5381;
        while (*str) hash += (hash << 5) + (*str++);
        return (hash % M);
    }

    // AP Hash Function
    unsigned int APHash(const char *str) {
        unsigned int hash = 0;
        int i;
        for (i = 0; *str; i++) {
            if ((i & 1) == 0) {
                hash ^= ((hash << 7) ^ (*str++) ^ (hash >> 3));
            } else {
                hash ^= (~((hash << 11) ^ (*str++) ^ (hash >> 5)));
            }
        }
        return (hash % M);
    }
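To make the recommended BKDR scheme easier to try out, here is a small sketch that takes the seed and the table size as parameters instead of fixing them. The function name `bkdrHash` and both parameter names are hypothetical. It also reads each character as `unsigned char`, which gives the same result as the listing above for plain ASCII text.

```cpp
#include <cassert>
#include <string>

// Sketch of a BKDR-style hash with the seed and table size as parameters.
// The listing above fixes these to seed = 131 and M = 249997.
unsigned int bkdrHash(const std::string& s, unsigned int seed,
                      unsigned int tableSize) {
    assert(tableSize > 0);
    unsigned int hash = 0;
    for (unsigned char c : s) {
        hash = hash * seed + c;  // unsigned arithmetic wraps on overflow
    }
    return hash % tableSize;
}
```

With seed 131 and table size 249997, "a" hashes to 97 and "ab" hashes to 97 * 131 + 98 = 12805. Changing a single character changes the result, because every character is folded into the running value.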

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
