Hash algorithm (hash algorithm) -- a comparison of hash algorithms

Source: Internet
Author: User
Tags: comparison, hash, strcmp, blizzard
Coincidentally, almost all of the popular hash map implementations use the DJB hash function, commonly known as the "Times33" algorithm: Perl, Berkeley DB, Apache, MFC, STL, and so on.

The Times33 algorithm itself is very simple: keep multiplying by 33. nHash = nHash * 33 + *key++;
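As a standalone illustration (separate from the MFC version shown later in this article), the whole algorithm is just this one loop; the function name and the test string below are placeholders of my own:

#include <cstdio>

// Minimal Times33 (DJB) hash sketch; times33_hash is an illustrative name.
unsigned int times33_hash(const char *key)
{
    unsigned int nHash = 0;
    while (*key)
        nHash = nHash * 33 + (unsigned char)*key++;  // same as (nHash << 5) + nHash + *key++
    return nHash;
}

int main()
{
    printf("%u\n", times33_hash("apache"));  // placeholder test string
    return 0;
}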

I haven't found any theory that explains why this algorithm works well; it is said that it was found to be relatively good only through testing and practice. I would appreciate it if anyone could point me to such material.

I've compared Times33 with some other hashing algorithms, and Times33 really is faster than the others I found. Some people also say that Times33 performs better on English text and worse on Chinese; I tested this and found the performance difference between ASCII and Chinese text to be below 3 per thousand, which I consider a normal measurement error.



Create the fastest hash table (a dialogue with Blizzard) http://blog.csdn.net/zeronecpp/archive/2005/04/11/342756.aspx
This is an article I saw on someone else's blog describing how Blizzard improved the hash table. The hashing algorithm it presents contains a "seed2 + (seed2 << 5)", which is equivalent to multiplying by 33, so it can in fact be seen as a variant of the Times33 algorithm. I do have doubts about the efficiency of Blizzard's implementation.

The core of the Blizzard hashing algorithm above is as follows (I assigned the simplest possible values to cryptTable and set dwHashType to 1):

inline UINT CMyMap::HashKey(LPCTSTR key) const
{
    int dwHashType = 1;
    unsigned long seed1 = 0x7FED7FED, seed2 = 0xEEEEEEEE;
    int ch;
    while (*key != 0)
    {
        ch = toupper(*key++);
        // Original Blizzard form, looking up cryptTable:
        // seed1 = cryptTable[(dwHashType << 8) + ch] ^ (seed1 + seed2);
        // Simplified form used for this test (cryptTable filled with the simplest values):
        seed1 = ((dwHashType << 8) + ch) ^ (seed1 + seed2);
        seed2 = ch + seed1 + seed2 + (seed2 << 5) + 3;
    }
    return seed1;
}

I tested it and found that Blizzard's hash algorithm actually gives a worse distribution than the classic Times33 algorithm. Its distribution was: elements=10000, good=4293 bad2=1786 bad3=528 bad4=109 vbad=22
while the classic Times33 distribution was: elements=10000, good=4443 bad2=1775 bad3=501 bad4=107 vbad=15
Explanation: this is the output of my test program; for the test I set the bucket count to 12007 via InitHashTable(). In the output, elements is the number of elements stored in the hash table, good is the number of buckets holding exactly one element, bad2 the number of buckets holding two elements, bad3 the number holding three, bad4 the number holding four, and vbad the number of buckets holding five or more elements.
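For reference, here is a sketch of how such a distribution can be counted; the key list and hash function are placeholders, and 12007 matches the bucket count I used with InitHashTable():

#include <cstdio>
#include <string>
#include <vector>

// Sketch: histogram of bucket occupancy for a given hash function and key set.
void CountDistribution(const std::vector<std::string>& keys,
                       unsigned int (*HashFunc)(const char*),
                       unsigned int nBuckets = 12007)
{
    std::vector<int> occupancy(nBuckets, 0);
    for (size_t i = 0; i < keys.size(); ++i)
        ++occupancy[HashFunc(keys[i].c_str()) % nBuckets];

    int good = 0, bad2 = 0, bad3 = 0, bad4 = 0, vbad = 0;
    for (unsigned int b = 0; b < nBuckets; ++b)
    {
        if (occupancy[b] == 1)      ++good;
        else if (occupancy[b] == 2) ++bad2;
        else if (occupancy[b] == 3) ++bad3;
        else if (occupancy[b] == 4) ++bad4;
        else if (occupancy[b] >= 5) ++vbad;
    }
    printf("elements=%u, good=%d bad2=%d bad3=%d bad4=%d vbad=%d\n",
           (unsigned int)keys.size(), good, bad2, bad3, bad4, vbad);
}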

The classic Times33 algorithm is as follows:
inline UINT CMyMap::HashKey(LPCTSTR key) const
{
    UINT nHash = 0;
    while (*key)
        nHash = (nHash << 5) + nHash + *key++;  // (nHash << 5) + nHash == nHash * 33
    return nHash;
}
From the code it is clear that the Blizzard hash requires considerably more computation than the classic Times33 algorithm.

My understanding is that this is so the same string can yield several different, independent hash values depending on dwHashType. To achieve this, Blizzard's hash algorithm pays a certain price in performance.
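To make that concrete, here is a sketch of what I mean, using the simplified form above (cryptTable replaced, as in my test); the function name, the parameterized dwHashType, and the example key are my own illustration:

#include <cctype>

// Simplified Blizzard-style hash with dwHashType as a parameter.
unsigned long BlizzardHash(const char *key, int dwHashType)
{
    unsigned long seed1 = 0x7FED7FED, seed2 = 0xEEEEEEEE;
    while (*key != 0)
    {
        int ch = toupper((unsigned char)*key++);
        // The real implementation looks up cryptTable[(dwHashType << 8) + ch] here.
        seed1 = ((unsigned long)((dwHashType << 8) + ch)) ^ (seed1 + seed2);
        seed2 = ch + seed1 + seed2 + (seed2 << 5) + 3;
    }
    return seed1;
}

// The same string produces independent values for different dwHashType, e.g.:
//   BlizzardHash("some_key", 0)  -> used for the bucket position
//   BlizzardHash("some_key", 1)  -> first check value
//   BlizzardHash("some_key", 2)  -> second check value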

//
The above is a comparison of the hash algorithms themselves
/////////////////////////////////////////
The following is a comparison of the overall structure of the hash table
//

In addition, Blizzard's algorithm still essentially places the data into hash buckets, and each hash bucket also holds a linked list.
The difference is that an ordinary hash table, having found the right bucket, compares the elements directly, while Blizzard's hash table replaces that direct element comparison with a comparison of two additional hash values. Which is better depends on the specific application.
Given the work needed to compute three hash values, I think Blizzard's approach may actually be slower than an ordinary hash table with a suitably chosen bucket count.
The hash distribution test above shows that when the bucket count is more than about 20% larger than the number of elements, the number of strcmp() calls needed to find an element is approximately (4443*1 + 1775*2*1.5 + 501*3*2 + 107*4*2.5 + 15*5*3) / 10000 = 1.4069, i.e. about 1.4. (4443 buckets hold only one element, so a single strcmp() confirms the match; 1775 buckets hold two elements, needing 1.5 strcmp() calls on average to find one; and so on.)
Between doing about 1.4 strcmp() calls and doing 2 extra HashKey() computations, I believe everyone knows which is more time-consuming.
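The expected-comparison figure above can be rechecked with a few lines of arithmetic; this sketch only restates the formula, using the Times33 distribution counts from my test:

#include <cstdio>

int main()
{
    // Buckets holding 1, 2, 3, 4 and 5 elements in the Times33 test above.
    const int count[] = { 4443, 1775, 501, 107, 15 };
    const int elements = 10000;
    double total = 0.0;
    for (int k = 1; k <= 5; ++k)
        total += count[k - 1] * k * ((k + 1) / 2.0);  // k elements, (k + 1) / 2 strcmp on average
    printf("expected strcmp per lookup: %.4f\n", total / elements);  // about 1.41
    return 0;
}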


The name "fastest hash table" therefore seems a bit of an exaggeration; perhaps there is something else I'm not seeing.
A so-called "one-way hash" is actually an irreversible hash, mainly used for cryptography, and it has nothing to do with speed. In fact, a one-way hash is usually slower, precisely in order to be irreversible. Blizzard is my favorite company and I am a hardcore Blizzard fan, but this time it seems some people have praised Blizzard in the wrong direction.

Searching Google for "hash algorithm" turns up a lot of interesting material.
http://www.partow.net/programming/hashfunctions/ is a very interesting page.
