PHP/Perl hash algorithm implementation (Times 33 hash algorithm)

Source: Internet
Author: User
The hash algorithm used by PHP and Perl is DJBX33A (Daniel J. Bernstein, Times 33 with Addition). It is also the default hash function in APR. The code is as follows:


APR_DECLARE_NONSTD(unsigned int) apr_hashfunc_default(const char *char_key,
                                                      apr_ssize_t *klen)
{
    unsigned int hash = 0;
    const unsigned char *key = (const unsigned char *)char_key;
    const unsigned char *p;
    apr_ssize_t i;

    /*
     * This is the popular 'Times 33' hash algorithm which is used by
     * Perl and also appears in Berkeley DB. This is one of the best
     * known hash functions for strings because it is both computed
     * very fast and distributes very well.
     *
     * The originator may be Dan Bernstein but the code in Berkeley DB
     * cites Chris Torek as the source. The best citation I have found
     * is "Chris Torek, Hash function for text in C, Usenet message
     * <27038@mimsy.umd.edu> in comp.lang.c, October, 1990." in Rich
     * Salz's USENIX 1992 paper about INN which can be found
     *.
     *
     * The magic of number 33, i.e. why it works better than many other
     * constants, prime or not, has never been adequately explained by
     * anyone. So I try an explanation: if one experimentally tests all
     * multipliers between 1 and 256 (as I did while writing a low-level
     * data structure library some time ago) one detects that even
     * numbers are not useable at all. The remaining 128 odd numbers
     * (except for the number 1) work more or less all equally well.
     * They all distribute in an acceptable way and this way fill a hash
     * table with an average percent of approx. 86%.
     *
     * If one compares the chi^2 values of the variants (see
     * Bob Jenkins "Hashing Frequently Asked Questions" at
     * http://burtleburtle.net/bob/hash/hashfaq.html for a description
     * of chi^2), the number 33 not even has the best value. But the
     * number 33 and a few other equally good numbers like 17, 31, 63,
     * 127 and 129 have nevertheless a great advantage to the remaining
     * numbers in the large set of possible multipliers: their multiply
     * operation can be replaced by a faster operation based on just one
     * shift plus either a single addition or subtraction operation. And
     * because a hash function has to both distribute good _and_ has to
     * be very fast to compute, those few numbers should be preferred.
     *
     *                  -- Ralf S. Engelschall
     */

    if (*klen == APR_HASH_KEY_STRING) {
        /* NUL-terminated key: hash up to the terminator, then report the length. */
        for (p = key; *p; p++) {
            hash = hash * 33 + *p;
        }
        *klen = p - key;
    }
    else {
        /* Counted key: hash exactly *klen bytes. */
        for (p = key, i = *klen; i; i--, p++) {
            hash = hash * 33 + *p;
        }
    }
    return hash;
}
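The listing above depends on APR's headers. As a minimal sketch for trying the algorithm without APR, the same loop can be written with standard C types only. The names `times33_hash` and `KEY_IS_STRING` are hypothetical stand-ins introduced here for `apr_hashfunc_default` and `APR_HASH_KEY_STRING`, and `ptrdiff_t` is assumed as a substitute for `apr_ssize_t`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for APR_HASH_KEY_STRING: "the key is NUL-terminated". */
#define KEY_IS_STRING ((ptrdiff_t)-1)

/*
 * Times 33 hash over a key.
 * If *klen is KEY_IS_STRING, the key is read up to its terminating NUL and
 * *klen is updated to the length that was found. Otherwise exactly *klen
 * bytes are hashed and *klen is left unchanged.
 */
unsigned int times33_hash(const char *char_key, ptrdiff_t *klen)
{
    const unsigned char *key = (const unsigned char *)char_key;
    const unsigned char *p;
    unsigned int hash = 0;
    ptrdiff_t i;

    assert(char_key != NULL && klen != NULL);

    if (*klen == KEY_IS_STRING) {
        for (p = key; *p; p++) {
            hash = hash * 33 + *p;
        }
        *klen = p - key;
    } else {
        for (p = key, i = *klen; i; i--, p++) {
            hash = hash * 33 + *p;
        }
    }
    return hash;
}
```

For example, with `ptrdiff_t n = KEY_IS_STRING;` the call `times33_hash("ab", &n)` sets `n` to 2 and returns 97 * 33 + 98 = 3299 on an ASCII system.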

Translation of the function comment:

This is the well-known Times 33 hash algorithm, which is used by Perl and also appears in Berkeley DB. It is one of the best known hash functions for strings: it is very fast to compute and distributes keys very well.

Dan Bernstein may have been the first to propose the algorithm, but the code in Berkeley DB cites Chris Torek as its source. The best citation I have found is "Chris Torek, Hash function for text in C, Usenet message <27038@mimsy.umd.edu> in comp.lang.c, October 1990", given in Rich Salz's USENIX 1992 paper about INN.

Why does 33 work better than many other constants, prime or not? Nobody has ever explained this adequately, so I will attempt an explanation here. If one experimentally tests every multiplier between 1 and 256 (as I did while writing a low-level data structure library some time ago), one finds that even numbers are not usable at all. The remaining 128 odd numbers (except 1) all perform about equally well: each gives an acceptable distribution and fills a hash table to about 86% on average.

If one compares the chi^2 values of the variants (see Bob Jenkins's "Hashing Frequently Asked Questions", http://burtleburtle.net/bob/hash/hashfaq.html, for a description of chi^2), the number 33 does not even have the best value. (Translator's note, gibbon: I first rendered chi^2 as "variance", the statistical term for the mean squared deviation of a random variable from its expectation. As I understand it, a smaller value should be better, but the formula used is not given here, and since a hash table benefits from keys being spread out, it is not clear to me whether "best" means a large or a small value.) But 33 and a few equally good numbers such as 17, 31, 63, 127 and 129 still have a great advantage over the remaining numbers in the large set of possible multipliers: their multiplication can be replaced by a faster operation consisting of one shift plus a single addition or subtraction. Because a hash function has to both distribute well and be very fast to compute, these few numbers should be preferred.
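Two claims in the comment can be checked with a few lines of C. The sketch below is illustrative only and is not part of APR: `times_k_hash`, `mul33`, `mul31` and `mul129` are hypothetical helper names, and 32-bit unsigned arithmetic is assumed. The `mul*` functions show the shift-plus-add or shift-plus-subtract forms of the multiplication. `times_k_hash` takes the multiplier as a parameter, which makes one plausible reason for the failure of even multipliers visible: every multiplication by an even number pushes a zero bit in at the low end, so the bytes at the start of a long key eventually drop out of the 32-bit result.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Generic "times k" hash: h = h * k + byte, in 32-bit unsigned arithmetic. */
uint32_t times_k_hash(const char *s, size_t len, uint32_t k)
{
    uint32_t h = 0;
    size_t i;

    assert(s != NULL);
    for (i = 0; i < len; i++) {
        h = h * k + (unsigned char)s[i];
    }
    return h;
}

/* 33 = 32 + 1: one shift plus one addition. */
uint32_t mul33(uint32_t h) { return (h << 5) + h; }

/* 31 = 32 - 1: one shift plus one subtraction. */
uint32_t mul31(uint32_t h) { return (h << 5) - h; }

/* 129 = 128 + 1: one shift plus one addition. */
uint32_t mul129(uint32_t h) { return (h << 7) + h; }
```

With the multiplier 32, the 8-byte keys "axxxxxxx" and "bxxxxxxx" hash to the same value, because the first byte is scaled by 32^7 = 2^35, which is 0 modulo 2^32. With 33 the two keys hash differently, since 33^7 is odd and the difference in the first byte survives.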
