PHP Hash Table (II): Hash Function

Source: Internet
Author: User
Tags bitwise

Some of the most critical aspects of a hash table are:

    1. Access via key (the key is turned into an index by a hash function)
    2. Mapping to a data structure (the storage structure of the hash table itself)
    3. Handling of mappings (collision detection and the functions that resolve collisions)

Understanding PHP's hashing algorithm

In general, for an integer index it is natural to think of a modulo operation. For example, for array(1 => 'a', 2 => 'b', 3 => 'c') we could hash with index % 3. PHP array subscripts are more flexible, though, and can also be strings, as in array('a' => 'c', 'b' => 'd'). Which hash function is used in that case? The answer is the DJBX33A algorithm.

PS: DJBX33A, also known as the time33 algorithm, is APR's default hashing algorithm; PHP, Apache, Perl, and BSDDB also use the time33 hash. As for the number 33, the DJB note says that all odd numbers between 1 and 256 give an acceptable hash distribution, with an average fill of about 86%. The numbers 33, 17, 31, 63, 127, and 129 have a further advantage when a large number of hashes must be computed: multiplication by them can be replaced with a bit shift plus an addition or subtraction, which is faster. The GCC compiler performs this conversion automatically when optimization is turned on.
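To make the shift-and-add remark concrete, here is a minimal sketch in C. The function names `times33_shift` and `times31_shift` are made up for this illustration and do not appear in the PHP source; the code only shows that 33 = 2^5 + 1 and 31 = 2^5 - 1, so both products can be written with one shift and one addition or subtraction.

```c
#include <stdint.h>

/* Hypothetical helper: multiply by 33 with a shift and an add,
 * because x * 33 == x * 32 + x == (x << 5) + x. */
static uint32_t times33_shift(uint32_t x)
{
    return (x << 5) + x;
}

/* Hypothetical helper: multiply by 31 with a shift and a subtract,
 * because x * 31 == x * 32 - x == (x << 5) - x. */
static uint32_t times31_shift(uint32_t x)
{
    return (x << 5) - x;
}
```

Both helpers use unsigned arithmetic, so on overflow they wrap around in the same way as the plain multiplication does.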

Here is the specific code implementation of this hash function:

    static inline ulong zend_inline_hash_func(char *arKey, uint nKeyLength)
    {
        register ulong hash = 5381;

        /* variant with the hash unrolled eight times */
        for (; nKeyLength >= 8; nKeyLength -= 8) {
            hash = ((hash << 5) + hash) + *arKey++;
            hash = ((hash << 5) + hash) + *arKey++;
            hash = ((hash << 5) + hash) + *arKey++;
            hash = ((hash << 5) + hash) + *arKey++;
            hash = ((hash << 5) + hash) + *arKey++;
            hash = ((hash << 5) + hash) + *arKey++;
            hash = ((hash << 5) + hash) + *arKey++;
            hash = ((hash << 5) + hash) + *arKey++;
        }
        switch (nKeyLength) {
            case 7: hash = ((hash << 5) + hash) + *arKey++; /* fallthrough... */
            case 6: hash = ((hash << 5) + hash) + *arKey++; /* fallthrough... */
            case 5: hash = ((hash << 5) + hash) + *arKey++; /* fallthrough... */
            case 4: hash = ((hash << 5) + hash) + *arKey++; /* fallthrough... */
            case 3: hash = ((hash << 5) + hash) + *arKey++; /* fallthrough... */
            case 2: hash = ((hash << 5) + hash) + *arKey++; /* fallthrough... */
            case 1: hash = ((hash << 5) + hash) + *arKey++; break;
            case 0: break;
            EMPTY_SWITCH_DEFAULT_CASE()
        }
        return hash;
    }
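The eight-way unrolling is only a speed optimization. As a rough sketch, the same recurrence can be written as a plain loop. The name `times33` and the use of `size_t` and `unsigned long` are choices made for this example, not the Zend definitions.

```c
#include <stddef.h>

/* Plain-loop sketch of DJBX33A: start from 5381 and, for every byte,
 * compute hash = hash * 33 + byte. Hypothetical name, not a Zend API. */
static unsigned long times33(const char *key, size_t len)
{
    unsigned long hash = 5381;
    size_t i;

    for (i = 0; i < len; i++) {
        /* (hash << 5) + hash is hash * 33 */
        hash = ((hash << 5) + hash) + (unsigned long)key[i];
    }
    return hash;
}
```

Calling `times33("foo", 3)` gives 193491849. An empty key returns the initial value 5381.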

nTableMask

The PHP hash table has a minimum capacity of 8 (2^3) and a maximum capacity of 0x80000000 (2^31). The requested size is rounded up to a power of 2: a hash table with 13 elements gets a length of 16, and one with 100 elements gets a length of 128. nTableMask is initialized to the rounded table length minus 1.

The mask of the hash table equals nTableSize - 1. What is it for? It turns the hash value computed by the DJB algorithm into a valid index for a table of the current nTableSize. For example, the hash value of "foo" is 193491849. If the table size is 64, this value obviously exceeds the maximum index, so it has to be corrected with the mask. In practice this is a bitwise AND with the mask, which maps a hash value of any size into the nTableSize slots.

      Hash  |   193491849 |   0b1011100010000111001110001001
    & Mask  | &        63 | & 0b0000000000000000000000111111
    ---------------------------------------------------------
    = Index | =         9 | = 0b0000000000000000000000001001
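The masking step can be checked with a few lines of C. This is a sketch only: `slot_index` is a hypothetical helper name, and it assumes the table size is a power of 2, in which case the bitwise AND gives the same result as a modulo by the table size.

```c
/* Hypothetical helper: map a hash value to a slot index.
 * Assumes table_size is a power of 2, so table_size - 1 is a mask
 * with all low bits set and h & mask equals h % table_size. */
static unsigned long slot_index(unsigned long h, unsigned long table_size)
{
    unsigned long mask = table_size - 1;

    return h & mask;
}
```

With the values from the example, `slot_index(193491849UL, 64)` is 9, the same as `193491849UL % 64`.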

The relevant code is in the _zend_hash_init function in Zend/zend_hash.c. The excerpt here keeps only the part related to this article, with a few comments added.

    ZEND_API int _zend_hash_init(HashTable *ht, uint nSize, hash_func_t pHashFunction, dtor_func_t pDestructor, zend_bool persistent ZEND_FILE_LINE_DC)
    {
        uint i = 3;
        Bucket **tmp;

        SET_INCONSISTENT(HT_OK);

        /* round the size up to a power of 2 */
        if (nSize >= 0x80000000) {
            /* prevent overflow */
            ht->nTableSize = 0x80000000;
        } else {
            while ((1U << i) < nSize) {
                i++;
            }
            ht->nTableSize = 1 << i;
        }
        ht->nTableMask = ht->nTableSize - 1;

        /* some code is omitted here ... */

        return SUCCESS;
    }
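The rounding logic can be pulled out into a small standalone function for experimenting. The sketch below mirrors the loop in the excerpt, but `round_table_size` is a name invented for this example and it assumes a 32-bit `unsigned int`.

```c
/* Hypothetical standalone version of the size rounding shown above.
 * Starts at 2^3 = 8 and doubles until the size is at least n_size;
 * requests of 2^31 or more are clamped to 2^31 to avoid overflow. */
static unsigned int round_table_size(unsigned int n_size)
{
    unsigned int i = 3;

    if (n_size >= 0x80000000u) {
        return 0x80000000u;
    }
    while ((1u << i) < n_size) {
        i++;
    }
    return 1u << i;
}
```

For example, a request for 13 elements yields 16 (mask 15), a request for 100 yields 128 (mask 127), and anything up to 8 yields the minimum size of 8.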

The hashing algorithm of the Zend HashTable is relatively simple:

hash(key) = key & nTableMask

That is, the original key is simply ANDed bitwise with the HashTable's nTableMask.

If the original key is a string, the time33 algorithm first converts the string to an integer, which is then ANDed bitwise with nTableMask:

hash(strkey) = time33(strkey) & nTableMask

Here is the hash table lookup code from the Zend source:

    ZEND_API int zend_hash_index_find(const HashTable *ht, ulong h, void **pData)
    {
        uint nIndex;
        Bucket *p;

        IS_CONSISTENT(ht);

        nIndex = h & ht->nTableMask;

        p = ht->arBuckets[nIndex];
        while (p != NULL) {
            if ((p->h == h) && (p->nKeyLength == 0)) {
                *pData = p->pData;
                return SUCCESS;
            }
            p = p->pNext;
        }
        return FAILURE;
    }

    ZEND_API int zend_hash_find(const HashTable *ht, const char *arKey, uint nKeyLength, void **pData)
    {
        ulong h;
        uint nIndex;
        Bucket *p;

        IS_CONSISTENT(ht);

        h = zend_inline_hash_func(arKey, nKeyLength);
        nIndex = h & ht->nTableMask;

        p = ht->arBuckets[nIndex];
        while (p != NULL) {
            if ((p->h == h) && (p->nKeyLength == nKeyLength)) {
                if (!memcmp(p->arKey, arKey, nKeyLength)) {
                    *pData = p->pData;
                    return SUCCESS;
                }
            }
            p = p->pNext;
        }
        return FAILURE;
    }

Here zend_hash_index_find looks up integer keys and zend_hash_find looks up string keys. The logic is basically the same, except that a string key is first converted to an integer key by zend_inline_hash_func, which wraps the time33 algorithm.

Handling hash collisions

PHP handles hash collisions with the chaining method (separate chaining). A collision occurs when different keys hash to the same slot (bucket). In that case a linked list hangs off the bucket, and the colliding elements are linked onto it one after another.

Regarding the two pairs of pointers, some foreign-language sites describe them incorrectly. The PHP function here, which walks a bucket to check whether a key exists, makes the role of the pNext pointer clear at a glance.

    ZEND_API int zend_hash_exists(const HashTable *ht, const char *arKey, uint nKeyLength)
    {
        ulong h;
        uint nIndex;
        Bucket *p;

        IS_CONSISTENT(ht);

        h = zend_inline_hash_func(arKey, nKeyLength);
        nIndex = h & ht->nTableMask;

        p = ht->arBuckets[nIndex];
        while (p != NULL) {
            if (p->arKey == arKey ||
                ((p->h == h) && (p->nKeyLength == nKeyLength) && !memcmp(p->arKey, arKey, nKeyLength))) {
                return 1;
            }
            p = p->pNext;
        }
        return 0;
    }
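To see the chaining idea in isolation, here is a deliberately small sketch of a chained table in C. It is not the Zend implementation: the types `node` and `chained_table`, the functions `hash33`, `table_insert`, and `table_find`, and the fixed size of 8 slots are all assumptions made for this example. Nodes are owned by the caller, and keys are borrowed pointers rather than copies.

```c
#include <stddef.h>
#include <string.h>

#define TABLE_SIZE 8u                 /* must be a power of 2 */
#define TABLE_MASK (TABLE_SIZE - 1u)

/* Hypothetical node type: one element in a slot's chain. */
typedef struct node {
    unsigned long h;       /* full hash of the key */
    const char *key;       /* borrowed pointer, not copied */
    size_t key_len;
    int value;
    struct node *next;     /* next element in the same slot (like pNext) */
} node;

typedef struct {
    node *slots[TABLE_SIZE];
} chained_table;

/* Plain-loop time33 hash, as described earlier in the article. */
static unsigned long hash33(const char *key, size_t len)
{
    unsigned long hash = 5381;
    size_t i;

    for (i = 0; i < len; i++) {
        hash = ((hash << 5) + hash) + (unsigned long)key[i];
    }
    return hash;
}

/* Fill in a caller-owned node and push it onto the front of its chain. */
static void table_insert(chained_table *t, node *n, const char *key, int value)
{
    unsigned long idx;

    n->key = key;
    n->key_len = strlen(key);
    n->h = hash33(key, n->key_len);
    n->value = value;

    idx = n->h & TABLE_MASK;
    n->next = t->slots[idx];
    t->slots[idx] = n;
}

/* Walk the chain of the key's slot; compare hash, length, then bytes. */
static node *table_find(const chained_table *t, const char *key)
{
    size_t len = strlen(key);
    unsigned long h = hash33(key, len);
    node *p;

    for (p = t->slots[h & TABLE_MASK]; p != NULL; p = p->next) {
        if (p->h == h && p->key_len == len && memcmp(p->key, key, len) == 0) {
            return p;
        }
    }
    return NULL;
}
```

With 8 slots, the keys "a" and "i" land in the same slot, so inserting both leaves a two-element chain there. `table_find` still returns the right node for each key because it compares the full hash and the key bytes while following `next`.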
