The Hash Algorithm in PHP
HashTable is the core of PHP. PHP arrays, associative arrays, object properties, function tables, symbol tables, and so on all use HashTable as their container.
PHP's HashTable resolves collisions by chaining (the "zipper" method), which I won't go into here. Today I am mainly concerned with PHP's hash algorithm and some of the ideas it reveals.
PHP's hash function is DJBX33A (Daniel J. Bernstein, Times 33 with Addition), currently the most widespread choice. This algorithm is used in many software projects, such as Apache, Perl, and Berkeley DB. It is one of the best hash algorithms known for strings, because it is fast to compute and distributes well (few collisions and an even spread).
The core idea of the algorithm is:
hash(i) = hash(i-1) * 33 + str[i]
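As a minimal sketch of this recurrence (not PHP's actual code), the loop below applies it one character at a time in C. The function name `times33` and the `seed` parameter are illustrative assumptions; the starting value is left to the caller because implementations differ on it.

```c
#include <stddef.h>

/* Illustrative helper (name and signature are assumptions, not PHP's API):
 * applies hash(i) = hash(i-1) * 33 + str[i] over len bytes. */
static unsigned long times33(const char *str, size_t len, unsigned long seed)
{
    unsigned long hash = seed;
    for (size_t i = 0; i < len; i++) {
        /* Cast through unsigned char so bytes >= 0x80 are not sign-extended. */
        hash = hash * 33 + (unsigned char)str[i];
    }
    return hash;
}
```

For example, `times33("ab", 2, 0)` computes `(0 * 33 + 'a') * 33 + 'b'`.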
In zend_hash.h, we can find this algorithm in PHP:
static inline ulong zend_inline_hash_func(char *arKey, uint nKeyLength)
{
    register ulong hash = 5381;

    /* variant with the hash unrolled eight times */
    for (; nKeyLength >= 8; nKeyLength -= 8) {
        hash = ((hash << 5) + hash) + *arKey++;
        hash = ((hash << 5) + hash) + *arKey++;
        hash = ((hash << 5) + hash) + *arKey++;
        hash = ((hash << 5) + hash) + *arKey++;
        hash = ((hash << 5) + hash) + *arKey++;
        hash = ((hash << 5) + hash) + *arKey++;
        hash = ((hash << 5) + hash) + *arKey++;
        hash = ((hash << 5) + hash) + *arKey++;
    }
    switch (nKeyLength) {
        case 7: hash = ((hash << 5) + hash) + *arKey++; /* fallthrough... */
        case 6: hash = ((hash << 5) + hash) + *arKey++; /* fallthrough... */
        case 5: hash = ((hash << 5) + hash) + *arKey++; /* fallthrough... */
        case 4: hash = ((hash << 5) + hash) + *arKey++; /* fallthrough... */
        case 3: hash = ((hash << 5) + hash) + *arKey++; /* fallthrough... */
        case 2: hash = ((hash << 5) + hash) + *arKey++; /* fallthrough... */
        case 1: hash = ((hash << 5) + hash) + *arKey++; break;
        case 0: break;
        EMPTY_SWITCH_DEFAULT_CASE()
    }
    return hash;
}
Compare this with the classic Times 33 algorithm used directly in Apache and Perl:
# hashing function used in Perl 5.005:
# Return the hashed value of a string: $hash = perlhash("key")
# (Defined by the PERL_HASH macro in hv.h)
sub perlhash
{
    $hash = 0;
    foreach (split //, shift) {
        $hash = $hash * 33 + ord($_);
    }
    return $hash;
}
Looking at PHP's hash function, we can see that the differences lie in the fine details.
First, the most obvious difference is that PHP does not multiply by 33 directly, but uses:
(hash << 5) + hash
This will generally be faster than a multiplication (modern compilers often perform this transformation themselves).
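To see that the two forms agree, here is a small sketch in C; the helper names `mul33` and `shift33` are hypothetical. For unsigned arithmetic, `(h << 5) + h` is `h * 32 + h`, that is `h * 33`, and the two stay equal even when the value wraps around on overflow.

```c
/* Hypothetical helpers for illustration only. */
static unsigned long mul33(unsigned long h)
{
    return h * 33;
}

static unsigned long shift33(unsigned long h)
{
    /* h << 5 is h * 32 (modulo 2^N for an N-bit unsigned long);
     * adding h once more gives h * 33. */
    return (h << 5) + h;
}
```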
Second, the distinctive idea is loop unrolling. A few days ago I read an article about Discuz's caching mechanism. One point was that Discuz adopts different caching policies depending on how popular a post is and, following user habits, caches only the first page of a post (because few people page through a post).
In a similar spirit, PHP favors string keys of fewer than 8 characters: the loop is unrolled in units of 8 to improve efficiency, so such keys skip the loop entirely and are handled by the switch alone. This is also a very fine, meticulous detail.
There are also inline, register variables, and so on. Clearly the PHP developers took great pains over hash optimization.
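The sketch below is one way to check that unrolling changes only the speed and not the result. It is modeled on the PHP listing earlier but is not a copy of it: the names `hash_simple`, `hash_unrolled`, and the `STEP` macro are assumptions, and the characters are read as `unsigned char`, which differs from the plain `char` in the listing for bytes of 0x80 and above.

```c
#include <stddef.h>

/* Reference version: one character per iteration. */
static unsigned long hash_simple(const char *key, size_t len)
{
    unsigned long hash = 5381;
    while (len-- > 0) {
        hash = ((hash << 5) + hash) + (unsigned char)*key++;
    }
    return hash;
}

/* One step of the recurrence, as a macro to keep the unrolled body short. */
#define STEP() (hash = ((hash << 5) + hash) + (unsigned char)*key++)

/* Unrolled version: full groups of 8 in the loop, then the 0..7 leftover
 * characters through a switch that deliberately falls through. */
static unsigned long hash_unrolled(const char *key, size_t len)
{
    unsigned long hash = 5381;
    for (; len >= 8; len -= 8) {
        STEP(); STEP(); STEP(); STEP();
        STEP(); STEP(); STEP(); STEP();
    }
    switch (len) {
        case 7: STEP(); /* fallthrough */
        case 6: STEP(); /* fallthrough */
        case 5: STEP(); /* fallthrough */
        case 4: STEP(); /* fallthrough */
        case 3: STEP(); /* fallthrough */
        case 2: STEP(); /* fallthrough */
        case 1: STEP(); break;
        case 0: break;
    }
    return hash;
}
```

Both functions return the same value for any key and length; the unrolled one simply performs fewer loop-condition checks per character.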
Finally, the initial hash value is set to 5381. Why 5381, when the Times 33 algorithm in Apache and the hash function in Perl both use an initial value of 0? I don't know the specific reason, but I found some properties of 5381:
Magic Constant 5381:
1. odd number
2. prime number
3. deficient number
4. 001/010/100/000/101 b (binary representation)
Given these properties, I have reason to believe that this initial value was chosen to give a better distribution.
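The listed properties can be checked mechanically. The helpers below are a hypothetical sketch (none of these names come from PHP): trial division for primality, and a brute-force sum of proper divisors to test deficiency. The bit groups 001/010/100/000/101 also read directly as the octal digits 1 2 4 0 5, so in C the literal `012405` is the same number as 5381.

```c
#include <stdbool.h>

/* Trial division; fine for small illustrative inputs. */
static bool is_prime(unsigned long n)
{
    if (n < 2) return false;
    for (unsigned long d = 2; d * d <= n; d++) {
        if (n % d == 0) return false;
    }
    return true;
}

/* Sum of the divisors of n that are smaller than n. */
static unsigned long sum_proper_divisors(unsigned long n)
{
    unsigned long sum = 0;
    for (unsigned long d = 1; d < n; d++) {
        if (n % d == 0) sum += d;
    }
    return sum;
}

/* A number is deficient when its proper divisors sum to less than itself. */
static bool is_deficient(unsigned long n)
{
    return sum_proper_divisors(n) < n;
}
```

Since 5381 is prime, its only proper divisor is 1, which is why it is also deficient. These checks confirm the properties but do not by themselves show that the seed improves distribution.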
As for why it is "Times 33" rather than times some other number, this is explained in the comments of PHP's hash algorithm. I hope it is useful to those who are interested:
DJBX33A (Daniel J. Bernstein, Times 33 with Addition)

This is Daniel J. Bernstein's popular `times 33' hash function as
posted by him years ago on comp.lang.c. It basically uses a function
like ``hash(i) = hash(i-1) * 33 + str[i]''. This is one of the best
known hash functions for strings. Because it is both computed very
fast and distributes very well.

The magic of number 33, i.e. why it works better than many other
constants, prime or not, has never been adequately explained by
anyone. So I try an explanation: if one experimentally tests all
multipliers between 1 and 256 (as RSE did now) one detects that even
numbers are not useable at all. The remaining 128 odd numbers
(except for the number 1) work more or less all equally well. They
all distribute in an acceptable way and this way fill a hash table
with an average percent of approx. 86%.

If one compares the Chi^2 values of the variants, the number 33 not
even has the best value. But the number 33 and a few other equally
good numbers like 17, 31, 63, 127 and 129 have nevertheless a great
advantage to the remaining numbers in the large set of possible
multipliers: their multiply operation can be replaced by a faster
operation based on just one shift plus either a single addition
or subtraction operation. And because a hash function has to both
distribute good _and_ has to be very fast to compute, those few
numbers should be preferred and seems to be the reason why Daniel J.
Bernstein also preferred it.

-- Ralf S. Engelschall
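One way to see why even multipliers fare so badly can be sketched as follows, under the assumption that the table size is a power of two and the bucket is taken from the low bits of the hash. The names `hash_with` and `bucket` and the table size of 1024 are hypothetical. With multiplier 32, every character more than two positions from the end is multiplied by a multiple of 1024, so it cannot influence the bucket index at all.

```c
#include <stddef.h>

/* Illustrative Times-N hash with a caller-chosen multiplier (not PHP's API). */
static unsigned long hash_with(unsigned long mult, const char *s, size_t len)
{
    unsigned long hash = 5381;
    for (size_t i = 0; i < len; i++) {
        hash = hash * mult + (unsigned char)s[i];
    }
    return hash;
}

/* Bucket index for a hypothetical table of 1024 slots: keep the low 10 bits. */
static unsigned long bucket(unsigned long hash)
{
    return hash & 1023UL;
}
```

With these helpers, `bucket(hash_with(32, "aaxy", 4))` and `bucket(hash_with(32, "bbxy", 4))` are equal, because only the last two characters survive the mask, whereas with multiplier 33 the two keys land in different buckets.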
• Author: Laruence
• Address: http://www.laruence.com/2009/07/23/994.html