Hash comparison between PHP and Python (1)


PHP's array and Python's dict are both implemented as hash tables; put another way, array and dict are themselves hash structures. This article and the ones that follow compare the hash table implementations in the PHP and Python source code, both to learn from their design and to help avoid operations that reduce efficiency or cause bugs during development.

Let's start with PHP. Everything begins with PHP's built-in data type zval (see PHP_X_X/Zend/zend.h):

typedef union _zvalue_value {
    long lval;                  //long value
    double dval;                //double value
    struct {
        char *val;
        int len;
    } str;
    HashTable *ht;              //hash table value
    zend_object_value obj;
} zvalue_value;

struct _zval_struct {
    //Variable information
    zvalue_value value;         //value
    zend_uint refcount__gc;
    zend_uchar type;            //active type
    zend_uchar is_ref__gc;
};

HashTable *ht is the structure PHP uses to represent the array type. Before going into the HashTable structure, it helps to review how a hash table works. In C, arrays use natural numbers as indexes to store data. In PHP or Python, a hash table is accessed by key and value. To implement this kind of storage, every possible key has to be mapped to a natural-number index into an array (or block of memory):

index = hash(key)

hash() is the hash function. Ideally, hash() would map every key to a set of natural numbers that is evenly distributed with no overlaps. Because keys are arbitrary, this is impossible in general, so a good hash function should keep overlaps (collisions) as rare as possible. PHP's hash function for this purpose uses the DJBX33A algorithm. Its implementation in the source is as follows:

static inline ulong zend_inline_hash_func(const char *arKey, uint nKeyLength)
{
    register ulong hash = 5381;

    /* variant with the hash unrolled eight times */
    for (; nKeyLength >= 8; nKeyLength -= 8) {
        hash = ((hash << 5) + hash) + *arKey++;
        hash = ((hash << 5) + hash) + *arKey++;
        hash = ((hash << 5) + hash) + *arKey++;
        hash = ((hash << 5) + hash) + *arKey++;
        hash = ((hash << 5) + hash) + *arKey++;
        hash = ((hash << 5) + hash) + *arKey++;
        hash = ((hash << 5) + hash) + *arKey++;
        hash = ((hash << 5) + hash) + *arKey++;
    }
    switch (nKeyLength) {
        case 7: hash = ((hash << 5) + hash) + *arKey++; /* fallthrough... */
        case 6: hash = ((hash << 5) + hash) + *arKey++; /* fallthrough... */
        case 5: hash = ((hash << 5) + hash) + *arKey++; /* fallthrough... */
        case 4: hash = ((hash << 5) + hash) + *arKey++; /* fallthrough... */
        case 3: hash = ((hash << 5) + hash) + *arKey++; /* fallthrough... */
        case 2: hash = ((hash << 5) + hash) + *arKey++; /* fallthrough... */
        case 1: hash = ((hash << 5) + hash) + *arKey++; break;
        case 0: break;
EMPTY_SWITCH_DEFAULT_CASE()
    }
    return hash;
}

The DJBX33A (Daniel J. Bernstein, Times 33 with Addition) algorithm can be summarized as:

hash(i) = hash(i-1) * 33 + str[i]
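
To make the recurrence concrete, here is a minimal sketch of the same computation written as a plain loop, without the eight-way unrolling. The function name djbx33a is an assumption made for this illustration and does not appear in the PHP source; only the seed 5381 and the (hash << 5) + hash step are taken from zend_inline_hash_func above.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper (not part of the PHP source): a plain,
 * non-unrolled version of the DJBX33A recurrence
 *     hash(i) = hash(i-1) * 33 + str[i]
 */
unsigned long djbx33a(const char *key, size_t len)
{
    unsigned long hash = 5381;   /* same seed as zend_inline_hash_func */

    assert(key != NULL || len == 0);

    for (size_t i = 0; i < len; i++) {
        /* (hash << 5) + hash equals hash * 32 + hash, i.e. hash * 33 */
        hash = ((hash << 5) + hash) + (unsigned long)key[i];
    }
    return hash;
}
```

With this sketch, an empty key hashes to the seed 5381, and the two-byte keys "Ez" and "FY" hash to the same value, because 69 * 33 + 122 and 70 * 33 + 89 are both 2399. Pairs like this are the building blocks of the collisions discussed next.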

As for why 33 is used rather than some other number, the usual explanation is that it came out as the best choice when the numbers from 1 to 256 were each tested; there is no theoretical justification. The initial hash value of 5381 is apparently nothing special either. The first rule of thumb we can draw from this: when using string keys in a PHP array, a key of at most 7 bytes skips the for loop in the first step entirely. So, from an efficiency standpoint alone, naming variables with dozens of characters, or even a whole sentence, for the sake of readability is not advisable.

A clever algorithm can reduce hash collisions but cannot eliminate them (see, for example, the principle behind PHP hash table collision attacks). Since collisions are inevitable, algorithm textbooks describe many ways to resolve them. PHP uses separate chaining (the "zipper" method). To see how it is implemented, first look at the definitions (see PHP_X_X/Zend/zend_hash.h):

typedef struct bucket {
    ulong h;                        //Used for numeric indexing
    uint nKeyLength;
    void *pData;
    void *pDataPtr;
    struct bucket *pListNext;
    struct bucket *pListLast;
    struct bucket *pNext;
    struct bucket *pLast;
    const char *arKey;
} Bucket;

typedef struct _hashtable {
    uint nTableSize;
    uint nTableMask;
    uint nNumOfElements;
    ulong nNextFreeElement;
    Bucket *pInternalPointer;       //Used for element traversal
    Bucket *pListHead;
    Bucket *pListTail;
    Bucket **arBuckets;
    dtor_func_t pDestructor;
    zend_bool persistent;
    unsigned char nApplyCount;
    zend_bool bApplyProtection;
#if ZEND_DEBUG
    int inconsistent;
#endif
} HashTable;

The key of a hash table entry is stored in Bucket.arKey, and its length in Bucket.nKeyLength. The value computed by the hash function is saved in Bucket.h. To resolve collisions, the buckets that land in the same slot are linked into a list. A lookup walks that list as follows:

ZEND_API int zend_hash_exists(const HashTable *ht, const char *arKey, uint nKeyLength)
{
    ulong h;
    uint nIndex;
    Bucket *p;

    IS_CONSISTENT(ht);

    h = zend_inline_hash_func(arKey, nKeyLength);
    nIndex = h & ht->nTableMask;

    p = ht->arBuckets[nIndex];
    while (p != NULL) {
        if (p->arKey == arKey ||
            ((p->h == h) && (p->nKeyLength == nKeyLength)
             && !memcmp(p->arKey, arKey, nKeyLength))) {
                return 1;
        }
        p = p->pNext;
    }
    return 0;
}

p = p->pNext moves on to the next element in the chain, which is where entries that collide with an existing element in the same slot are stored. That covers the basic idea behind PHP's HashTable implementation; the Python part will be added when time permits.
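
The chaining idea can be sketched in a few dozen lines. The following is a simplified illustration only, not the Zend implementation: the types node and table and the functions hash33, table_init, table_find, table_set and table_free are all hypothetical names invented here, values are plain ints, and the table never resizes. It assumes, as the h & ht->nTableMask step above suggests, that the slot count is a power of two so that masking selects a slot.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified types; not the Zend Bucket/HashTable. */
typedef struct node {
    unsigned long h;      /* cached hash of the key */
    size_t key_len;
    char *key;            /* private copy of the key bytes */
    int value;
    struct node *next;    /* next node chained in the same slot */
} node;

typedef struct {
    size_t size;          /* number of slots, a power of two */
    size_t mask;          /* size - 1, so (h & mask) picks a slot */
    node **slots;
} table;

static unsigned long hash33(const char *key, size_t len)
{
    unsigned long hash = 5381;
    for (size_t i = 0; i < len; i++)
        hash = ((hash << 5) + hash) + (unsigned long)key[i];
    return hash;
}

int table_init(table *t, size_t size)
{
    assert(size != 0 && (size & (size - 1)) == 0);  /* power of two */
    t->size = size;
    t->mask = size - 1;
    t->slots = calloc(size, sizeof *t->slots);
    return t->slots != NULL ? 0 : -1;
}

/* Walk the chain of one slot, comparing hash, length, then bytes. */
node *table_find(const table *t, const char *key, size_t len)
{
    unsigned long h = hash33(key, len);
    for (node *p = t->slots[h & t->mask]; p != NULL; p = p->next) {
        if (p->h == h && p->key_len == len
            && memcmp(p->key, key, len) == 0)
            return p;
    }
    return NULL;
}

/* Update an existing key, or push a new node onto the slot's chain. */
int table_set(table *t, const char *key, size_t len, int value)
{
    node *p = table_find(t, key, len);
    if (p != NULL) {
        p->value = value;
        return 0;
    }
    p = malloc(sizeof *p);
    if (p == NULL)
        return -1;
    p->key = malloc(len != 0 ? len : 1);
    if (p->key == NULL) {
        free(p);
        return -1;
    }
    memcpy(p->key, key, len);
    p->h = hash33(key, len);
    p->key_len = len;
    p->value = value;
    p->next = t->slots[p->h & t->mask];   /* colliding keys share a chain */
    t->slots[p->h & t->mask] = p;
    return 0;
}

void table_free(table *t)
{
    for (size_t i = 0; i < t->size; i++) {
        node *p = t->slots[i];
        while (p != NULL) {
            node *next = p->next;
            free(p->key);
            free(p);
            p = next;
        }
    }
    free(t->slots);
    t->slots = NULL;
}
```

Storing the keys "Ez" and "FY" in such a table puts both in the same slot, since their hashes are equal; the second one is simply linked in front of the first, and each lookup still returns the right value after comparing the key bytes.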

A small trick for building a variable-length struct

The last member of the Bucket struct, arKey, is defined above as const char *arKey. Elsewhere you may also see it declared as char arKey[1]. Some explain that this is for a variable-length struct, and the declaration carries a comment:

char arKey[1]; /* Must be last element */

At first I even thought arKey had to store the last character of the key string in the HashTable... After some struggle I found that this is not what it means (see what-is-your-favorite-c-programming-trick). A so-called variable-length struct is just a way to size part of a struct dynamically while keeping its memory contiguous. Relying on how structs are laid out, the member that needs dynamic sizing is placed at the end of the struct; when the struct is allocated with malloc, any memory beyond sizeof(struct) can then be accessed naturally through that last member, which gives a variable-length struct. "Must be last element" does not mean that the last character of the key is stored there; it means the member must be the last one in the struct. Frustrating, but a good trick :P
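
As a rough sketch of the trick, the example below allocates a header and its key bytes in a single malloc call. It uses the C99 flexible array member form, char key[], which is the standardized spelling of the older char arKey[1] idiom. The struct entry and the function entry_new are hypothetical names for this illustration and are not taken from the PHP source.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical struct; only the layout idea mirrors Bucket. */
typedef struct entry {
    size_t key_len;
    char key[];           /* flexible array member: must be last */
} entry;

/* Allocate the header and the key bytes as one contiguous block. */
entry *entry_new(const char *key, size_t len)
{
    assert(key != NULL);

    /* sizeof *e covers the header only; the extra len + 1 bytes
       become the storage reached through e->key. */
    entry *e = malloc(sizeof *e + len + 1);
    if (e == NULL)
        return NULL;

    e->key_len = len;
    memcpy(e->key, key, len);
    e->key[len] = '\0';
    return e;
}
```

Because the key lives in the same block as the header, a single free(e) releases both, and no second pointer has to be followed to reach the key.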

