PHP/Perl hash algorithm implementation (the times-33 hash algorithm)

Source: Internet
Author: User
Tags: data structures

The code is as follows:

APR_DECLARE_NONSTD(unsigned int) apr_hashfunc_default(const char *char_key,
                                                      apr_ssize_t *klen)
{
    unsigned int hash = 0;
    const unsigned char *key = (const unsigned char *)char_key;
    const unsigned char *p;
    apr_ssize_t i;

    /*
     * This is the popular "times 33" hash algorithm which is used by
     * Perl and also appears in Berkeley DB. This is one of the best
     * known hash functions for strings because it is both computed
     * very fast and distributes very well.
     *
     * The originator may be Dan Bernstein but the code in Berkeley DB
     * cites Chris Torek as the source. The best citation I have found
     * is "Chris Torek, Hash function for text in C, Usenet message
     * <27038@mimsy.umd.edu> in comp.lang.c, October, 1990." in Rich
     * Salz's USENIX 1992 paper about INN which can be found at
     * [URL missing in source].
     *
     * The magic of number 33, i.e. why it works better than many other
     * constants, prime or not, has never been adequately explained by
     * anyone. So I try an explanation: if one experimentally tests all
     * multipliers between 1 and 256 (as I did while writing a low-level
     * data structure library some time ago) one detects that even
     * numbers are not usable at all. The remaining 128 odd numbers
     * (except for the number 1) work more or less all equally well.
     * They all distribute in an acceptable way and this way fill a hash
     * table with an average percent of approx. 86%.
     *
     * If one compares the chi^2 values of the variants (see
     * Bob Jenkins' "Hashing Frequently Asked Questions" at
     * http://burtleburtle.net/bob/hash/hashfaq.html for a description
     * of chi^2), the number 33 does not even have the best value. But the
     * number 33 and a few other equally good numbers like 17, 31, 63,
     * 127 and 129 have nevertheless a great advantage over the remaining
     * numbers in the large set of possible multipliers: their multiply
     * operation can be replaced by a faster operation based on just one
     * shift plus either a single addition or subtraction operation. And
     * because a hash function has to both distribute well _and_ has to
     * be very fast to compute, those few numbers should be preferred.
     *
     *                  -- Ralf S. Engelschall
     */

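    if (*klen == APR_HASH_KEY_STRING) {
        /* NUL-terminated key: hash up to the terminator, then report the length. */
        for (p = key; *p; p++) {
            hash = hash * 33 + *p;
        }
        *klen = p - key;
    }
    else {
        /* Counted key: hash exactly *klen bytes. */
        for (p = key, i = *klen; i; i--, p++) {
            hash = hash * 33 + *p;
        }
    }
    return hash;
}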
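
The function above depends on APR's macros and types, so it does not compile on its own. As a minimal sketch of the same loop in plain C, the version below swaps in stand-ins that are assumptions of this example and not part of APR: the name times33_hash, the sentinel KEY_IS_STRING (playing the role of APR_HASH_KEY_STRING), and ptrdiff_t in place of apr_ssize_t.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sentinel standing in for APR_HASH_KEY_STRING. */
#define KEY_IS_STRING ((ptrdiff_t)-1)

/*
 * Times-33 hash of a key.
 * If *klen is KEY_IS_STRING, the key is read as a NUL-terminated string
 * and *klen is updated to its length; otherwise exactly *klen bytes are
 * hashed and *klen is left unchanged.
 */
unsigned int times33_hash(const char *char_key, ptrdiff_t *klen)
{
    const unsigned char *key = (const unsigned char *)char_key;
    const unsigned char *p;
    unsigned int hash = 0;
    ptrdiff_t i;

    assert(char_key != NULL && klen != NULL);

    if (*klen == KEY_IS_STRING) {
        for (p = key; *p; p++) {
            hash = hash * 33 + *p;
        }
        *klen = p - key;
    } else {
        for (p = key, i = *klen; i; i--, p++) {
            hash = hash * 33 + *p;
        }
    }
    return hash;
}
```

For the two-byte key "ab" (byte values 97 and 98) the loop computes (0 * 33 + 97) * 33 + 98 = 3299, whether the key is passed as a string or with an explicit length of 2.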

Translation of the function's comment: This is the well-known times-33 hash algorithm, which is used by Perl and also appears in Berkeley DB. It is one of the best known hash functions for string keys, because it is both very fast to compute and distributes well.

The algorithm may have originated with Dan Bernstein, but the code in Berkeley DB cites Chris Torek as its source. The most exact citation I found is "Chris Torek, Hash function for text in C, Usenet message <27038@mimsy.umd.edu> in comp.lang.c, October 1990", given in Rich Salz's USENIX 1992 paper about INN. That paper can be found at [URL missing in source].

Why does the magic number 33 work better than many other constants, prime or not? No one has ever adequately explained it, so let me try. If you experimentally test every multiplier from 1 to 256 (as I did some time ago while writing a low-level data structure library), you find that even numbers are not usable at all. The remaining 128 odd numbers (except 1) all perform about equally well: they distribute acceptably and fill a hash table to about 86% on average.

If you compare the chi-squared values of these variants (Gibbon: translated here as variance, a statistical term for the average squared deviation between a random variable and its mathematical expectation) (see Bob Jenkins's "Hashing Frequently Asked Questions" at http://burtleburtle.net/bob/hash/hashfaq.html for a description of chi-squared), the number 33 does not even have the best value. (Gibbon: As I understand it, common sense says that a smaller variance means more stability. But the author does not give the formula used, and in a hash table a greater degree of dispersion may be better, so it is unclear whether "best" here means a larger value or a smaller one.)

But 33 and a few other equally good numbers, such as 17, 31, 63, 127 and 129, still have a big advantage over the rest of the large set of possible multipliers: their multiplication can be replaced by a single shift plus one addition or subtraction, which is faster. Since a good hash function needs both a good distribution and high speed, the few numbers that achieve both should be preferred.
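
The shift-plus-add replacement described above can be sketched in C as follows. The function names (mul33, mul31, mul17, mult_hash, times33_shift) are hypothetical and chosen for this illustration only. Because unsigned arithmetic in C wraps modulo a power of two, each shift form gives exactly the same result as the multiplication, including on overflow.

```c
#include <assert.h>
#include <stddef.h>

/* h * 33 as one shift plus one addition, since 33 = 32 + 1. */
unsigned int mul33(unsigned int h) { return (h << 5) + h; }

/* h * 31 as one shift plus one subtraction, since 31 = 32 - 1. */
unsigned int mul31(unsigned int h) { return (h << 5) - h; }

/* h * 17 as one shift plus one addition, since 17 = 16 + 1. */
unsigned int mul17(unsigned int h) { return (h << 4) + h; }

/* Multiplicative string hash with a caller-chosen multiplier. */
unsigned int mult_hash(const char *s, unsigned int multiplier)
{
    const unsigned char *p;
    unsigned int hash = 0;

    assert(s != NULL);
    for (p = (const unsigned char *)s; *p; p++) {
        hash = hash * multiplier + *p;
    }
    return hash;
}

/* Times-33 hash written with the shift-and-add form of the multiply. */
unsigned int times33_shift(const char *s)
{
    const unsigned char *p;
    unsigned int hash = 0;

    assert(s != NULL);
    for (p = (const unsigned char *)s; *p; p++) {
        hash = mul33(hash) + *p;
    }
    return hash;
}
```

The generic mult_hash also gives one way to see why an even multiplier can behave badly, although the comment's author does not spell out his reasoning. With multiplier 32 and a table of 32 buckets, mult_hash(s, 32) & 31 depends only on the last character of s, because every earlier character has been multiplied by a power of 32 and contributes nothing to the low five bits.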
