The significance of a hash algorithm is that it provides a fast way to access data. It uses an algorithm to establish a correspondence between a key and the real value (each real value has exactly one key, but one key may correspond to several real values), so that data can be accessed quickly, for example in an array. I have read a lot of material about hashing on the Internet, so this post collects and summarizes it.

// HashTable.h
template <class T>
class HashTable {
public:
    HashTable(int count);
    void put(T* t, int key);
    T* get(int key);
private:
    T** tArray;
};

// HashTable.cpp
template <class T>
HashTable<T>::HashTable(int count) {
    tArray = new T*[count]();   // value-initialized, so empty slots are null
}
template <class T>
void HashTable<T>::put(T* t, int key) {
    this->tArray[key] = t;      // the key is used directly as the array index
}
template <class T>
T* HashTable<T>::get(int key) {
    return this->tArray[key];
}

In this way, as long as we know the key, we can access data of type T quickly, rather than searching for it in a data structure such as a linked list. The key is usually computed by some algorithm (the so-called hash algorithm). For example, a hash for a string:

const char* value = "Hello";
// unsigned arithmetic, because the intermediate products exceed the range of int
unsigned int key = (((((((27u * (unsigned int)'H' + 27u) * (unsigned int)'e') + 27u) * (unsigned int)'l') + 27u) * (unsigned int)'l' + 27u) * 27u) + (unsigned int)'o';

How a hash function works

Hashing takes an input of arbitrary length (also called the pre-image) and, through a hash algorithm, transforms it into an output of fixed length; that output is the hash value. This conversion is a compression mapping: the space of hash values is usually much smaller than the input space, different inputs may hash to the same output, and the input cannot be uniquely determined from the hash value. Simply put, it is a function that compresses a message of any length into a digest of fixed length.
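To make the idea of a compression mapping concrete, here is a minimal sketch in C. The name `toy_hash` and the multiplier 31 are assumptions chosen only for illustration; they do not come from the text above. The function folds a string of any length into one fixed-width unsigned int, and two different inputs can land on the same value.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical illustration: a polynomial string hash with multiplier 31.
   Input of any length is folded into a single fixed-width unsigned int. */
unsigned int toy_hash(const char *s)
{
    unsigned int h = 0;

    assert(s != NULL);
    while (*s) {
        h = h * 31u + (unsigned char)*s++;
    }
    return h;
}
```

With this sketch, `toy_hash("Aa")` and `toy_hash("BB")` both evaluate to 2112, so the hash value alone cannot tell you which string produced it. A table built on such a function still has to compare the stored keys.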
Let me use a metaphor.
Suppose we have a lot of piglets, each with a different weight. Assuming the weights are fairly evenly distributed (measured to the kilogram), we divide the piglets by weight among 100 small pigsties.
Each pig is then driven into the pigsty that matches its weight, and this is recorded.
Now, what if we want to find a particular piglet? Do we need to search every pigsty and check every pig?
Of course not.
We look at the weight of the pig we want and go to the corresponding pigsty.
The number of piglets in that pigsty is relatively small.
So within that pigsty we can find the piglet we are looking for fairly quickly.
This corresponds to a hash algorithm.
Pigs are assigned to pigsties according to their hashcode, and pigs with the same hashcode are put in the same pigsty.
To look one up, first find the pigsty for its hashcode, then compare the pigs inside it one by one.
So the crux of the matter is how many pigsties it is appropriate to build.
If every pig has a different weight (measured to the milligram) and each gets its own pigsty, we can find any pig as fast as possible. The drawback is that building so many pigsties costs too much.
If we group by 10 kg, there are only a few pigsties, so each one holds a lot of piglets. We can find the pigsty quickly, but picking out the right pig inside it is still tiring.
So a good hashcode strikes a balance, according to the actual situation and the specific needs, between time cost (more pigsties, faster lookup) and space cost (fewer pigsties, less space).
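The trade-off can be sketched in a few lines of C. The helper name `fullest_sty` and the choice of `weight % sties` as the "hashcode" are assumptions made for this illustration. The function reports how many pigs end up in the most crowded pigsty, which is the worst-case number of comparisons for one lookup.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: put n pigs into `sties` pigsties by weight % sties
   and return the size of the fullest pigsty, i.e. the worst-case number
   of pigs to compare during a lookup. Returns -1 if allocation fails. */
int fullest_sty(const int *weights, int n, int sties)
{
    int *count;
    int i, worst = 0;

    assert(weights != NULL && n >= 0 && sties > 0);
    count = calloc((size_t)sties, sizeof *count);
    if (count == NULL) {
        return -1;
    }
    for (i = 0; i < n; i++) {
        int sty;
        assert(weights[i] >= 0);        /* weights are assumed non-negative */
        sty = weights[i] % sties;       /* the "hashcode" of this pig */
        if (++count[sty] > worst) {
            worst = count[sty];
        }
    }
    free(count);
    return worst;
}
```

For 1000 pigs with the distinct weights 0 to 999, 1000 pigsties give 1 pig per pigsty, 100 pigsties give 10, and 10 pigsties give 100: more space buys fewer comparisons.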
There are many kinds of hash algorithms; for the details, see the analysis of hash algorithms I wrote earlier. This post collects a number of commonly used hash functions, which should be enough to meet the needs of many people.
Commonly used string hash functions such as ELFHash and APHash are very simple and effective. These functions use bitwise operations so that every character affects the final hash value. There are also hash functions represented by MD5 and SHA1, for which a collision is almost impossible to find by chance, although deliberate collision attacks on both have since been published.
Common string hash functions include BKDRHash, APHash, DJBHash, JSHash, RSHash, SDBMHash, PJWHash and ELFHash. I ran a small evaluation of these functions.
Hash function | Data 1 | Data 2 | Data 3 | Data 4 | Data 1 score | Data 2 score | Data 3 score | Data 4 score | Average score
BKDRHash | 2 | 0 | 4774 | 481 | 96.55 | 100 | 90.95 | 82.05 | 92.64
APHash | 2 | 3 | 4754 | 493 | 96.55 | 88.46 | 100 | 51.28 | 86.28
DJBHash | 2 | 2 | 4975 | 474 | 96.55 | 92.31 | 0 | 100 | 83.43
JSHash | 1 | 4 | 4761 | 506 | 100 | 84.62 | 96.83 | 17.95 | 81.94
RSHash | 1 | 0 | 4861 | 505 | 100 | 100 | 51.58 | 20.51 | 75.96
SDBMHash | 3 | 2 | 4849 | 504 | 93.1 | 92.31 | 57.01 | 23.08 | 72.41
PJWHash | 30 | 26 | 4878 | 513 | 0 | 0 | 43.89 | 0 | 21.95
ELFHash | 30 | 26 | 4878 | 513 | 0 | 0 | 43.89 | 0 | 21.95
Data 1 is the number of hash collisions for 100,000 random strings made of letters and digits. Data 2 is the number of hash collisions for 100,000 meaningful English sentences. Data 3 is the number of collisions when the hash values from Data 1 are taken modulo 1000003 (a large prime) and stored in a linear table. Data 4 is the number of collisions when the hash values from Data 1 are taken modulo 10000019 (a larger prime) and stored in a linear table.
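The text does not say exactly how the collisions were counted. One plausible reading, sketched below in C, is that every hash value is reduced modulo the table size and a collision is counted whenever the resulting slot is already occupied. The function name `count_collisions` and that counting rule are assumptions, not the author's stated method.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical sketch: count how many hash values arrive at a slot that is
   already occupied after being reduced modulo table_size.
   Returns -1 if the occupancy table cannot be allocated. */
long count_collisions(const unsigned int *hashes, size_t n,
                      unsigned int table_size)
{
    unsigned char *used;
    long collisions = 0;
    size_t i;

    assert(hashes != NULL && table_size > 0);
    used = calloc(table_size, 1);
    if (used == NULL) {
        return -1;
    }
    for (i = 0; i < n; i++) {
        unsigned int slot = hashes[i] % table_size;
        if (used[slot]) {
            collisions++;       /* slot already taken by an earlier value */
        } else {
            used[slot] = 1;
        }
    }
    free(used);
    return collisions;
}
```

Passing 1000003 or 10000019 as `table_size` would correspond to the Data 3 and Data 4 setups under this assumption.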
Comparing the results gives the average scores above. The average is the quadratic mean (root mean square) of the four scores. BKDRHash stands out both in actual results and in ease of implementation. APHash is also an excellent algorithm. DJBHash, JSHash, RSHash and SDBMHash each have their own strengths. PJWHash and ELFHash perform the worst, and their scores are identical because the two algorithms are essentially the same.
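The scoring formula is not given in the text, but the numbers in the table are consistent with the following reading, which is an inference rather than the author's stated method: within each data column the fewest collisions score 100 and the most score 0, with a linear scale in between, and the average is the root mean square of the four scores. Here c_i is a function's collision count on data set i.

```latex
s_i = 100 \cdot \frac{c_i^{\max} - c_i}{c_i^{\max} - c_i^{\min}}, \qquad
\bar{s} = \sqrt{\frac{1}{4}\sum_{i=1}^{4} s_i^{2}}

% BKDRHash on Data 4: c = 481, column minimum 474, column maximum 513
s_4 = 100 \cdot \frac{513 - 481}{513 - 474} \approx 82.05

% BKDRHash average over its four scores
\bar{s} = \sqrt{\frac{96.55^{2} + 100^{2} + 90.95^{2} + 82.05^{2}}{4}} \approx 92.64
```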
In informatics competitions, where code should be easy to write and debug, I personally think BKDRHash is the easiest to remember and use.
#define M  249997
#define M1 1000001
#define M2 0xF0000000

// RS Hash Function
unsigned int RSHash(char *str) {
    unsigned int b = 378551;
    unsigned int a = 63689;
    unsigned int hash = 0;
    while (*str) {
        hash = hash * a + (*str++);
        a *= b;
    }
    return (hash % M);
}

// JS Hash Function
unsigned int JSHash(char *str) {
    unsigned int hash = 1315423911;
    while (*str)
        hash ^= ((hash << 5) + (*str++) + (hash >> 2));
    return (hash % M);
}

// P. J. Weinberger Hash Function
unsigned int PJWHash(char *str) {
    unsigned int BitsInUnsignedInt = (unsigned int)(sizeof(unsigned int) * 8);
    unsigned int ThreeQuarters = (unsigned int)((BitsInUnsignedInt * 3) / 4);
    unsigned int OneEighth = (unsigned int)(BitsInUnsignedInt / 8);
    unsigned int HighBits = (unsigned int)(0xFFFFFFFF) << (BitsInUnsignedInt - OneEighth);
    unsigned int hash = 0;
    unsigned int test = 0;
    while (*str) {
        hash = (hash << OneEighth) + (*str++);
        if ((test = hash & HighBits) != 0) {
            hash = ((hash ^ (test >> ThreeQuarters)) & (~HighBits));
        }
    }
    return (hash % M);
}

// ELF Hash Function
unsigned int ELFHash(char *str) {
    unsigned int hash = 0;
    unsigned int x = 0;
    while (*str) {
        hash = (hash << 4) + (*str++);
        if ((x = hash & 0xF0000000L) != 0) {
            hash ^= (x >> 24);
            hash &= ~x;
        }
    }
    return (hash % M);
}

// BKDR Hash Function
unsigned int BKDRHash(char *str) {
    unsigned int seed = 131;   // 131 1313 13131 131313 etc.
    unsigned int hash = 0;
    while (*str)
        hash = hash * seed + (*str++);
    return (hash % M);
}

// SDBM Hash Function
unsigned int SDBMHash(char *str) {
    unsigned int hash = 0;
    while (*str)
        hash = (*str++) + (hash << 6) + (hash << 16) - hash;
    return (hash % M);
}

// DJB Hash Function
unsigned int DJBHash(char *str) {
    unsigned int hash = 5381;
    while (*str)
        hash += (hash << 5) + (*str++);
    return (hash % M);
}

// AP Hash Function
unsigned int APHash(char *str) {
    unsigned int hash = 0;
    int i;
    for (i = 0; *str; i++) {
        if ((i & 1) == 0) {
            hash ^= ((hash << 7) ^ (*str++) ^ (hash >> 3));
        } else {
            // complement on odd positions, as in the usual AP hash
            hash ^= (~((hash << 11) ^ (*str++) ^ (hash >> 5)));
        }
    }
    return (hash % M);
}
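As a usage illustration, here is a minimal sketch in C of a string set built on the BKDR hash from the listing above. The names `bkdr_index`, `set_add`, `set_contains`, `slots` and `TABLE_SIZE` are hypothetical, and resolving collisions by linear probing is a choice made for this sketch rather than something the text prescribes. Unlike the listing, each character is cast to unsigned char so that bytes above 127 do not turn negative.

```c
#include <assert.h>
#include <string.h>

#define TABLE_SIZE 249997   /* same prime as M in the listing above */

/* BKDR hash with seed 131, reduced to a table index. */
unsigned int bkdr_index(const char *str)
{
    unsigned int seed = 131;
    unsigned int hash = 0;

    while (*str) {
        hash = hash * seed + (unsigned char)*str++;
    }
    return hash % TABLE_SIZE;
}

/* Hypothetical string set: each slot points to a caller-owned string. */
static const char *slots[TABLE_SIZE];

/* Insert s if absent. Returns 1 if inserted, 0 if already present,
   -1 if the table is full. Collisions are resolved by linear probing. */
int set_add(const char *s)
{
    unsigned int start, i;

    assert(s != NULL);
    start = bkdr_index(s);
    for (i = 0; i < TABLE_SIZE; i++) {
        unsigned int pos = (start + i) % TABLE_SIZE;
        if (slots[pos] == NULL) {
            slots[pos] = s;
            return 1;
        }
        if (strcmp(slots[pos], s) == 0) {
            return 0;
        }
    }
    return -1;
}

/* Returns 1 if s is in the set, 0 otherwise. */
int set_contains(const char *s)
{
    unsigned int start, i;

    assert(s != NULL);
    start = bkdr_index(s);
    for (i = 0; i < TABLE_SIZE; i++) {
        unsigned int pos = (start + i) % TABLE_SIZE;
        if (slots[pos] == NULL) {
            return 0;   /* an empty slot ends the probe sequence */
        }
        if (strcmp(slots[pos], s) == 0) {
            return 1;
        }
    }
    return 0;
}
```

The hash only picks the starting slot; the `strcmp` calls are what decide whether a stored string is really the one being looked for, which matches the "find the pigsty, then compare the pigs" step described earlier.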