PHP-Perl Hash Algorithm Implementation

Source: Internet
Author: User

APR_DECLARE_NONSTD(unsigned int) apr_hashfunc_default(const char *char_key,
                                                      apr_ssize_t *klen)
{
    unsigned int hash = 0;
    const unsigned char *key = (const unsigned char *)char_key;
    const unsigned char *p;
    apr_ssize_t i;

    /*
     * This is the popular `times 33' hash algorithm which is used by
     * perl and also appears in Berkeley DB. This is one of the best
     * known hash functions for strings because it is both computed
     * very fast and distributes very well.
     *
     * The originator may be Dan Bernstein but the code in Berkeley DB
     * cites Chris Torek as the source. The best citation I have found
     * is "Chris Torek, Hash function for text in C, Usenet message
     * <27038@mimsy.umd.edu> in comp.lang.c, October, 1990." in Rich
     * Salz's USENIX 1992 paper about INN which can be found at
     *.
     *
     * The magic of number 33, i.e. why it works better than many other
     * constants, prime or not, has never been adequately explained by
     * anyone. So I try an explanation: if one experimentally tests all
     * multipliers between 1 and 256 (as I did while writing a low-level
     * data structure library some time ago) one detects that even
     * numbers are not useable at all. The remaining 128 odd numbers
     * (except for the number 1) work more or less all equally well.
     * They all distribute in an acceptable way and this way fill a hash
     * table with an average percent of approx. 86%.
     *
     * If one compares the chi^2 values of the variants (see
     * Bob Jenkins ``Hashing Frequently Asked Questions'' at
     * http://burtleburtle.net/bob/hash/hashfaq.html for a description
     * of chi^2), the number 33 not even has the best value. But the
     * number 33 and a few other equally good numbers like 17, 31, 63,
     * 127 and 129 have nevertheless a great advantage to the remaining
     * numbers in the large set of possible multipliers: their multiply
     * operation can be replaced by a faster operation based on just one
     * shift plus either a single addition or subtraction operation. And
     * because a hash function has to both distribute good _and_ has to
     * be very fast to compute, those few numbers should be preferred.
     *
     * -- Ralf S. Engelschall
     */

    if (*klen == APR_HASH_KEY_STRING) {
        for (p = key; *p; p++) {
            hash = hash * 33 + *p;
        }
        *klen = p - key;
    }
    else {
        for (p = key, i = *klen; i; i--, p++) {
            hash = hash * 33 + *p;
        }
    }
    return hash;
}
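
The multiplication by 33 in the loops above can also be written as one shift plus one addition, because 33 = 2^5 + 1. The following is a minimal standalone sketch of that idea, not APR code: the function names `times33_mul` and `times33_shift` are hypothetical, and both handle only NUL-terminated strings.

```c
/* Hypothetical standalone version of the loop above: multiply by 33. */
unsigned int times33_mul(const char *s)
{
    const unsigned char *p = (const unsigned char *)s;
    unsigned int hash = 0;

    for (; *p; p++) {
        hash = hash * 33 + *p;
    }
    return hash;
}

/* Same hash, with hash * 33 rewritten as (hash << 5) + hash. */
unsigned int times33_shift(const char *s)
{
    const unsigned char *p = (const unsigned char *)s;
    unsigned int hash = 0;

    for (; *p; p++) {
        hash = (hash << 5) + hash + *p;
    }
    return hash;
}
```

Because unsigned arithmetic wraps around, the two functions return the same value for every input. Optimizing compilers commonly perform this rewrite themselves, so the shift form is mostly of interest as an explanation of why such multipliers are cheap.
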
Translation of the function's comment:

This is the well-known "times 33" hash algorithm, which is used by Perl and also appears in Berkeley DB. It is one of the best known hash functions for strings because it is very fast to compute and distributes keys very well.

The algorithm may have originated with Dan Bernstein, but the code in Berkeley DB cites Chris Torek as its source. The best citation the author found is "Chris Torek, Hash function for text in C, Usenet message <27038@mimsy.umd.edu> in comp.lang.c, October, 1990.", given in Rich Salz's USENIX 1992 paper about INN.

Why the magic number 33 works better than many other constants, prime or not, has never been adequately explained by anyone, so the author offers an explanation. If one experimentally tests every multiplier between 1 and 256 (as the author did while writing a low-level data structure library some time ago), one finds that even numbers are not usable at all. The remaining 128 odd numbers (except for 1) all work more or less equally well: they distribute keys acceptably and fill a hash table to about 86% on average.

If one compares the chi^2 values of these variants (see Bob Jenkins' "Hashing Frequently Asked Questions" at http://burtleburtle.net/bob/hash/hashfaq.html for a description of chi^2), 33 does not even have the best value.

(Translator's note, Gibbon: this is a statistical measure of how far observed values deviate from their expected values. In my understanding a smaller deviation usually means a more stable result, but the comment does not give the formula it used, so it is unclear whether the "best" value here means the largest or the smallest.)

Nevertheless, 33 and a few other equally good numbers such as 17, 31, 63, 127 and 129 have a great advantage over the remaining candidates: their multiplication can be replaced by a faster operation consisting of one shift plus a single addition or subtraction. Because a hash function has to distribute well and also be very fast to compute, these few numbers should be preferred.
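
The kind of multiplier experiment described in the comment can be sketched roughly as follows. This is not the author's test: the synthetic keys ("key0", "key1", ...), the table size, and the names `hash_with_multiplier` and `filled_buckets` are all assumptions made for illustration. The function counts how many buckets receive at least one key for a given multiplier.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hash a NUL-terminated string using an arbitrary multiplier. */
static unsigned int hash_with_multiplier(const char *s, unsigned int mult)
{
    const unsigned char *p = (const unsigned char *)s;
    unsigned int hash = 0;

    for (; *p; p++) {
        hash = hash * mult + *p;
    }
    return hash;
}

/*
 * Insert the synthetic keys "key0" .. "key<nkeys-1>" into a table of
 * nbuckets slots and return how many slots are non-empty.
 * Returns -1 if nbuckets is 0 or the allocation fails.
 */
long filled_buckets(unsigned int mult, unsigned int nkeys,
                    unsigned int nbuckets)
{
    unsigned char *used;
    char key[32];
    long filled = 0;
    unsigned int i;

    if (nbuckets == 0) {
        return -1;
    }
    used = calloc(nbuckets, 1);
    if (used == NULL) {
        return -1;
    }
    for (i = 0; i < nkeys; i++) {
        unsigned int b;

        snprintf(key, sizeof key, "key%u", i);
        b = hash_with_multiplier(key, mult) % nbuckets;
        if (!used[b]) {
            used[b] = 1;
            filled++;
        }
    }
    free(used);
    return filled;
}
```

With a 256-slot table, the multiplier 256 is an extreme case of an even multiplier: every character except the last is shifted out of the low eight bits, so `filled_buckets(256, 1000, 256)` fills only the 10 buckets that correspond to the final digit, while `filled_buckets(33, 1000, 256)` spreads the same keys over far more buckets.
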
