The Road Through the PHP Source, Chapter 3, Section 1: Hash Table Implementation

Source: Internet
Author: User
Tags numeric key string zend

PHP Hash Table implementation

    The previous section introduced the basic principles of hash tables and implemented a rudimentary one. Real projects demand far more of a hash table than that simple version offers, with differing requirements for performance and flexibility. Let's look at how the hash table is implemented in PHP.

    Hash table implementation in PHP

    The hash table is a very important data structure in the PHP kernel. Most of PHP's language features are built on it, such as variable scopes, function tables, and class properties and methods, and much of the Zend engine's internal data is stored in hash tables.

    Data structures and description

    As mentioned earlier, PHP's hash table resolves collisions by separate chaining: elements that hash to the same slot are stored in a linked list. Zend uses doubly linked lists to link the elements and record the relationships between them.

    PHP's hash table is implemented in Zend/zend_hash.c. As in the previous section, we look at the data structures first. PHP uses two of them to implement a hash table: the HashTable structure holds the basic information needed for the whole table, and the Bucket structure holds the actual data, as follows:
typedef struct _hashtable {
    uint nTableSize;            /* Size of the slot array; minimum 8, grows by doubling */
    uint nTableMask;            /* nTableSize - 1; speeds up index computation */
    uint nNumOfElements;        /* Number of elements currently stored; count() returns this value */
    ulong nNextFreeElement;     /* Next free numeric index */
    Bucket *pInternalPointer;   /* Current traversal position (one reason foreach is fast) */
    Bucket *pListHead;          /* First element of the array */
    Bucket *pListTail;          /* Last element of the array */
    Bucket **arBuckets;         /* The slot array */
    dtor_func_t pDestructor;    /* Callback run when an element is deleted, used to release resources */
    zend_bool persistent;       /* How Bucket memory is allocated: if true, the operating system's own
                                   allocation functions are used; otherwise PHP's allocation functions */
    unsigned char nApplyCount;  /* Number of times this table is currently being accessed recursively
                                   (guards against runaway recursion) */
    zend_bool bApplyProtection; /* Whether recursive access to this table is restricted; if so, at most
                                   3 levels of recursion are allowed */
#if ZEND_DEBUG
    int inconsistent;
#endif
} HashTable;
The nTableSize field gives the capacity of the hash table; the minimum initial capacity is 8. First look at the hash table's initialization function:
ZEND_API int _zend_hash_init(HashTable *ht, uint nSize, hash_func_t pHashFunction,
        dtor_func_t pDestructor, zend_bool persistent ZEND_FILE_LINE_DC)
{
    uint i = 3;
    //...
    if (nSize >= 0x80000000) {
        /* prevent overflow */
        ht->nTableSize = 0x80000000;
    } else {
        while ((1U << i) < nSize) {
            i++;
        }
        ht->nTableSize = 1 << i;
    }
    //...
    ht->nTableMask = ht->nTableSize - 1;

    /* Uses ecalloc() so that Bucket* == NULL */
    if (persistent) {
        tmp = (Bucket **) calloc(ht->nTableSize, sizeof(Bucket *));
        if (!tmp) {
            return FAILURE;
        }
        ht->arBuckets = tmp;
    } else {
        tmp = (Bucket **) ecalloc_rel(ht->nTableSize, sizeof(Bucket *));
        if (tmp) {
            ht->arBuckets = tmp;
        }
    }
    return SUCCESS;
}
For example, if the requested initial size is 10, the algorithm above adjusts it to 16. That is, the size is always rounded up to the nearest power of 2 that is not smaller than the requested size (and never below 8).
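The rounding rule can be tried in isolation. The following is a minimal sketch, not code from the PHP source: round_up_table_size is a hypothetical helper name, and the sketch assumes unsigned int is 32 bits wide, as uint is in the code above.

```c
#include <assert.h>

/* Hypothetical helper, not part of the Zend API: mirrors the rounding
 * loop in _zend_hash_init(). Assumes a 32-bit unsigned int. */
static unsigned int round_up_table_size(unsigned int requested)
{
    unsigned int i = 3;                 /* 1 << 3 == 8, the minimum capacity */

    if (requested >= 0x80000000u) {
        return 0x80000000u;             /* largest power of 2 that still fits */
    }
    while ((1u << i) < requested) {
        i++;
    }
    assert(i < 32);                     /* guaranteed by the check above */
    return 1u << i;
}
```

Calling round_up_table_size(10) yields 16, and any request of 8 or less yields 8.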
    Why make this adjustment? Consider how HashTable maps a hash value to a slot. In the previous section we used the modulo operation for this mapping. For example, with a table of size 8 and a hash value of 100, the slot index is 100 % 8 = 4; since indexes start at 0, this is the fifth of the eight slots (0 through 7). PHP computes the index as follows:
    h = zend_inline_hash_func(arKey, nKeyLength);
    nIndex = h & ht->nTableMask;
    As the _zend_hash_init() function above shows, ht->nTableMask is ht->nTableSize - 1. The index is computed with & rather than modulo because the modulo operation is considerably more expensive than a bitwise operation.


    The role of the mask is to map a hash value into the range of indexes the slots can hold. For example, if the hash value of a key is 21 and the table size is 8, the mask is 7, and in binary 10101 & 111 = 101, which is 5 in decimal. The binary form of 2^n - 1 is special: its low n bits are all 1, so the AND simply keeps the low n bits of the hash value. With an arbitrary number as the mask, the zero bits would discard parts of the hash value and could upset the even distribution of the values computed by the hash function.
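To make the point about power-of-2 sizes concrete, here is a small sketch. Both function names are hypothetical and not part of the Zend API. The first maps a hash to a slot the way the two lines of PHP source above do. The second counts how many slots can actually be reached when hashes are masked with size - 1, whether or not the size is a power of 2.

```c
#include <assert.h>

/* Hypothetical helper: maps a hash value to a slot the way
 * "nIndex = h & ht->nTableMask" does. table_size must be a power of 2. */
static unsigned long slot_for_hash(unsigned long h, unsigned long table_size)
{
    assert(table_size != 0 && (table_size & (table_size - 1)) == 0);
    return h & (table_size - 1);
}

/* Hypothetical helper: counts the distinct slots hit when the hashes
 * 0 .. limit-1 are masked with (table_size - 1). Works for any table_size
 * up to 64, power of 2 or not. */
static unsigned int reachable_slots(unsigned long table_size, unsigned long limit)
{
    unsigned char seen[64] = {0};
    unsigned int count = 0;
    unsigned long h;

    assert(table_size > 0 && table_size <= 64);
    for (h = 0; h < limit; h++) {
        unsigned long idx = h & (table_size - 1);
        if (!seen[idx]) {
            seen[idx] = 1;
            count++;
        }
    }
    return count;
}
```

With a size of 8 the mask is 111 in binary and all 8 slots are reachable. With a size of 10 the mask would be 9, or 1001 in binary, so only the slots 0, 1, 8 and 9 could ever be used.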

    After the table size is set, space must be requested to store the table's data. As the initialization code above shows, a different memory allocation function is called depending on whether persistence is required. As described earlier in the discussion of the PHP life cycle, persistent content remains accessible across multiple requests, whereas non-persistent storage is freed at the end of the request. The details are covered in the memory management chapter.

    The nNumOfElements field in HashTable is easy to understand: it is updated every time an element is inserted or removed with unset. This lets the count() function return the number of elements in an array quickly.

    The nNextFreeElement field is also useful. First look at a piece of PHP code:
            <?php
            $a = array(10 => 'Hello');
            $a[] = 'TIPI';
            var_dump($a);

            // Output
            array(2) {
              [10]=>
              string(5) "Hello"
              [11]=>
              string(4) "TIPI"
            }
    PHP lets you add an element to an array without specifying an index. In that case a number is used as the index by default, much like enumerations in C, and the index of the new element is determined by the nNextFreeElement field. If the array already contains numeric keys, the largest numeric key used so far plus 1 is used by default. In the example above an element with key 10 already exists, so the newly inserted element gets index 11.
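The bookkeeping behind this behaviour can be sketched as follows. This is an illustration under simplifying assumptions, not the Zend code: IndexCounter, insert_with_key and append are hypothetical names, no elements are stored, and keys are treated as unsigned, whereas PHP's integer keys are signed.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical miniature of the bookkeeping only; no elements are stored. */
typedef struct {
    unsigned long next_free;    /* plays the role of nNextFreeElement */
} IndexCounter;

/* Records an insertion with an explicit numeric key, as in $a[10] = ... */
static void insert_with_key(IndexCounter *c, unsigned long key)
{
    if (key >= c->next_free) {
        /* Stay at the maximum instead of wrapping around to 0. */
        c->next_free = (key < ULONG_MAX) ? key + 1 : ULONG_MAX;
    }
}

/* Records an append, as in $a[] = ..., and returns the key that was assigned. */
static unsigned long append(IndexCounter *c)
{
    unsigned long key = c->next_free;

    insert_with_key(c, key);
    return key;
}
```

Starting from a zeroed IndexCounter, calling insert_with_key with 10 and then append returns 11, matching the PHP example. Inserting a smaller key afterwards does not move the counter backwards.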

Data container: the slot

    Next, look at the data structure of the slot, which holds the hash table's data:
typedef struct bucket {
    ulong h;                    /* Hash value of char *key, or the user-specified numeric index */
    uint nKeyLength;            /* Length of the key; 0 if the array index is a number */
    void *pData;                /* Points to the value, usually a copy of the user's data;
                                   for pointer data it points to pDataPtr */
    void *pDataPtr;             /* For pointer data, holds the actual pointer value,
                                   and pData above points to this field */
    struct bucket *pListNext;   /* Next element in the whole hash table */
    struct bucket *pListLast;   /* Previous element in the whole hash table */
    struct bucket *pNext;       /* Next element stored in the same slot */
    struct bucket *pLast;       /* Previous element stored in the same slot */
    /* Holds the key string for this value; must be declared last
       to implement the variable-length structure */
    char arKey[1];
} Bucket;

    As the comments above indicate, the h field holds the hash value of the key. What is saved here is the hash value, not the index into the hash table, because the index depends directly on the table's capacity: when the table is enlarged, every index must be mapped again, and keeping the hash value means the key does not have to be hashed again, which is an optimization. In PHP, either a string or a number can be used as an array index. A numeric index is used as the hash value directly, so numbers do not need to be hashed. The nKeyLength field that follows h is the length of the key, and it is 0 if the index is a number. In a PHP array, an index string that can be converted to a number is converted to a numeric index. So in PHP the string indexes '10' and '11' are no different from the numeric indexes 10 and 11.

    The last field in the structure holds the key string. It is declared as an array of a single character, but the structure is in fact variable-length, mainly to increase flexibility. The following code requests the space when a new element is inserted into the hash table:



    p = (Bucket *) pemalloc(sizeof(Bucket) - 1 + nKeyLength, ht->persistent);
    if (!p) {
        return FAILURE;
    }
    memcpy(p->arKey, arKey, nKeyLength);
    As the code shows, the size requested includes the length of the string key, and the key is then copied into the newly allocated space. Later, when a lookup needs to compare keys, it compares against p->arKey, and the element whose key matches is the one being sought. The 1 is subtracted from the requested size because the one byte already declared in the structure can be used too.

In PHP 5.4 the arKey field is declared as const char *arKey instead.
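The single-allocation trick used for arKey[1] can be reproduced outside PHP. The sketch below is an illustration only: MiniBucket, mini_bucket_new and mini_bucket_key_equals are hypothetical names, and plain malloc stands in for pemalloc. Writing past a one-element trailing array is a long-standing C idiom rather than something the standard guarantees; C99 and later offer flexible array members (char arKey[];) for the same purpose.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical cut-down bucket: only the fields needed to show the trick. */
typedef struct mini_bucket {
    unsigned int nKeyLength;
    char arKey[1];              /* must stay the last member */
} MiniBucket;

/* Allocates the structure and its key in one block. key_len counts every
 * byte to be stored, including a terminating '\0' if the caller wants one.
 * Returns NULL on allocation failure; release the result with free(). */
static MiniBucket *mini_bucket_new(const char *key, unsigned int key_len)
{
    MiniBucket *p;

    assert(key != NULL && key_len > 0);
    /* One byte of arKey is already inside sizeof(MiniBucket), hence the -1. */
    p = (MiniBucket *) malloc(sizeof(MiniBucket) - 1 + key_len);
    if (!p) {
        return NULL;
    }
    p->nKeyLength = key_len;
    memcpy(p->arKey, key, key_len);
    return p;
}

/* Compares a stored key with a candidate: lengths first, then the bytes. */
static int mini_bucket_key_equals(const MiniBucket *p, const char *key,
                                  unsigned int key_len)
{
    return p->nKeyLength == key_len && memcmp(p->arKey, key, key_len) == 0;
}
```

Keeping the key inside the same block as the structure means one allocation and one free per element, and the key sits next to the fields that are read during a lookup.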


The Bucket structure maintains two doubly linked lists. The pNext and pLast pointers link the buckets that share the same slot.

The pListNext and pListLast pointers link all the elements in the hash table. The pListHead and pListTail fields of the HashTable structure point to the first and last elements of the whole table.

    PHP has a great many functions that operate on arrays, such as array_shift() and array_pop(), which remove an element from the head and the tail of the array respectively. Because the head and tail pointers are stored in the hash table, the target element can be found in constant time. PHP also has some less commonly used array functions such as next() and prev(); these rely on another pointer in the hash table, pInternalPointer, which holds the current position inside the table. It is useful when looping.

As in the lower-left corner of the diagram, suppose three elements, Bucket1, Bucket2 and Bucket3, are inserted in turn:

1. When Bucket1 is inserted, the hash table is empty, and the key hashes to the slot with index 1. That slot now holds a single element, Bucket1, whose pData or pDataPtr points to the data it stores. Since there are no links yet, its pNext, pLast, pListNext and pListLast pointers are all NULL. The HashTable structure also keeps pointers to the first and last elements of the whole table, so pListHead and pListTail both point to Bucket1.

2. When Bucket2 is inserted, its key collides with Bucket1's key, so Bucket2 is placed at the front of the slot's doubly linked list and Bucket2.pNext points to Bucket1. Because Bucket2 was inserted later, Bucket1.pListNext points to Bucket2. Bucket2 is now the last element of the hash table, so HashTable.pListTail points to Bucket2.

3. When Bucket3 is inserted, its key does not hash to slot 1. Because Bucket3 was inserted after Bucket2, Bucket2.pListNext points to Bucket3, and HashTable.pListTail is changed to point to Bucket3.

In simple terms, the Bucket structures record the order in which elements were inserted, while the HashTable structure keeps the head and tail of the whole table. These relationships between buckets are maintained throughout every operation on the hash table.
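The three insertions described above can be replayed with a stripped-down model of the two lists. This is a sketch under stated assumptions: Node, MiniTable and mini_insert are hypothetical names, the slot index is passed in by the caller instead of being computed from a key, and no data is stored.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical node carrying only the four link pointers of a Bucket. */
typedef struct node {
    struct node *pNext, *pLast;           /* chain of nodes in the same slot */
    struct node *pListNext, *pListLast;   /* global insertion-order list */
} Node;

/* Hypothetical table with a fixed array of 8 slots. */
typedef struct {
    Node *pListHead, *pListTail;
    Node *slots[8];
} MiniTable;

/* Links an already allocated node into slot idx: at the front of the
 * slot chain and at the tail of the global list. */
static void mini_insert(MiniTable *t, Node *n, unsigned int idx)
{
    assert(idx < 8);

    /* Front of the slot chain. */
    n->pLast = NULL;
    n->pNext = t->slots[idx];
    if (n->pNext) {
        n->pNext->pLast = n;
    }
    t->slots[idx] = n;

    /* Tail of the global list. */
    n->pListNext = NULL;
    n->pListLast = t->pListTail;
    if (t->pListTail) {
        t->pListTail->pListNext = n;
    } else {
        t->pListHead = n;
    }
    t->pListTail = n;
}
```

Inserting two nodes into slot 1 and a third into slot 2 of a zero-initialized MiniTable reproduces the pointer relationships listed in steps 1 to 3: the second node heads the chain of slot 1, while the global list still runs first, second, third.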

Operation interface of the hash table

PHP's hash table provides the following kinds of operation interfaces:

Initialization, such as the zend_hash_init() function, which initializes the hash table, allocates space, and so on.

Find, insert, delete, and update, which are the common operations.

Iteration and looping, used to traverse the hash table.

Copy, sort, reverse, and destroy operations.

    This section looks at the insert operation. In PHP, adding to an array (zend_hash_add) and updating an array (zend_hash_update) both end up calling the _zend_hash_add_or_update function. This resembles two public methods sharing one private method in object-oriented programming, which achieves a degree of code reuse.
ZEND_API int _zend_hash_add_or_update(HashTable *ht, const char *arKey, uint nKeyLength,
        void *pData, uint nDataSize, void **pDest, int flag ZEND_FILE_LINE_DC)
{
    //... initialization and the handling of nKeyLength <= 0 are omitted

    h = zend_inline_hash_func(arKey, nKeyLength);
    nIndex = h & ht->nTableMask;

    p = ht->arBuckets[nIndex];
    while (p != NULL) {
        if ((p->h == h) && (p->nKeyLength == nKeyLength)) {
            if (!memcmp(p->arKey, arKey, nKeyLength)) {
                /* Update operation */
                if (flag & HASH_ADD) {
                    return FAILURE;
                }
                HANDLE_BLOCK_INTERRUPTIONS();
                //... debug output omitted
                if (ht->pDestructor) {
                    ht->pDestructor(p->pData);
                }
                UPDATE_DATA(ht, p, pData, nDataSize);
                if (pDest) {
                    *pDest = p->pData;
                }
                HANDLE_UNBLOCK_INTERRUPTIONS();
                return SUCCESS;
            }
        }
        p = p->pNext;
    }

    p = (Bucket *) pemalloc(sizeof(Bucket) - 1 + nKeyLength, ht->persistent);
    if (!p) {
        return FAILURE;
    }
    memcpy(p->arKey, arKey, nKeyLength);
    p->nKeyLength = nKeyLength;
    INIT_DATA(ht, p, pData, nDataSize);
    p->h = h;
    /* Link into the slot's doubly linked list */
    CONNECT_TO_BUCKET_DLLIST(p, ht->arBuckets[nIndex]);
    if (pDest) {
        *pDest = p->pData;
    }

    HANDLE_BLOCK_INTERRUPTIONS();
    /* Add the new Bucket to the end of the array's global linked list */
    CONNECT_TO_GLOBAL_DLLIST(p, ht);
    ht->arBuckets[nIndex] = p;
    HANDLE_UNBLOCK_INTERRUPTIONS();

    ht->nNumOfElements++;
    ZEND_HASH_IF_FULL_DO_RESIZE(ht);    /* If the array is full, enlarge it */
    return SUCCESS;
}

The whole insert or update process is as follows:

1. Compute the hash value and AND it with nTableMask to obtain the slot in the arBuckets array.

2. If the slot already contains elements, traverse the slot's list looking for an element with the same key. If one is found and the call is an update, update its data; if the call is an add, return FAILURE.

3. Create a new Bucket element, initialize its data, and add it to the front of the slot's list for the current hash value (CONNECT_TO_BUCKET_DLLIST).

4. Add the new Bucket element to the end of the array's global linked list (CONNECT_TO_GLOBAL_DLLIST).

5. Increase the element count by 1 and, if the array is now full, enlarge it. The check compares nNumOfElements with nTableSize: if nNumOfElements > nTableSize, zend_hash_do_resize is called to double the capacity (nTableSize << 1).
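The resize check in the last step, together with the reason each bucket stores its hash value h, can be sketched as follows. This is a simplified illustration, not the body of zend_hash_do_resize: MiniCounters, grow_if_full and slot_after_resize are hypothetical names, and only the counters are modelled, not the buckets themselves.

```c
#include <assert.h>

/* Hypothetical miniature: tracks only the counters that drive resizing. */
typedef struct {
    unsigned int nTableSize;
    unsigned int nTableMask;
    unsigned int nNumOfElements;
} MiniCounters;

/* Called after an insertion has been counted. Doubles the table when the
 * element count exceeds the slot count and doubling does not overflow.
 * Returns 1 if the table grew, 0 otherwise. */
static int grow_if_full(MiniCounters *c)
{
    if (c->nNumOfElements > c->nTableSize && (c->nTableSize << 1) > 0) {
        c->nTableSize <<= 1;
        c->nTableMask = c->nTableSize - 1;
        return 1;
    }
    return 0;
}

/* After growing, a bucket's slot is recomputed from its stored hash h;
 * the key string itself does not have to be hashed again. */
static unsigned int slot_after_resize(unsigned long h, const MiniCounters *c)
{
    return (unsigned int) (h & c->nTableMask);
}
```

For instance, the hash values 21 and 29 share slot 5 in a table of size 8. After the table doubles to 16, 21 stays in slot 5 while 29 moves to slot 13, so a collision chain can split when the table grows.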

Resources:

Http://nikic.github.com/2012/03/28/Understanding-PHPs-internal-array-implementation.html
