How PHP arrays are built on HashTable: a detailed description

Source: Internet
Author: User

PHP arrays are stored internally in a hash structure. This article analyzes and records both the static and the dynamic structure of arrays in PHP.

The static structure is the data structure PHP uses to store array data, that is, HashTable.

The dynamic structure is the state of the stored array data while a program is running.

First, the HashTable structure in PHP is as follows (the code quoted in this article is from the PHP 5 series):

    typedef struct bucket {
        ulong h;                    /* Used for numeric indexing */
        uint nKeyLength;
        void *pData;
        void *pDataPtr;
        struct bucket *pListNext;
        struct bucket *pListLast;
        struct bucket *pNext;
        struct bucket *pLast;
        char *arKey;
    } Bucket;

    typedef struct _hashtable {
        uint nTableSize;
        uint nTableMask;
        uint nNumOfElements;
        ulong nNextFreeElement;
        Bucket *pInternalPointer;   /* Used for element traversal */
        Bucket *pListHead;
        Bucket *pListTail;
        Bucket **arBuckets;
        dtor_func_t pDestructor;
        zend_bool persistent;
        unsigned char nApplyCount;
        zend_bool bApplyProtection;
    #if ZEND_DEBUG
        int inconsistent;
    #endif
    } HashTable;

A PHP array corresponds to one HashTable internally. The four Bucket-typed pointer fields in HashTable (pInternalPointer, pListHead, pListTail, and arBuckets) record the addresses of the elements actually stored in the array. The remaining fields are largely self-explanatory from their names.

Reading these declarations alone may not be enough to understand how a PHP array actually works, so next we manually simulate some of the simplest array operations.

1. Starting from scratch

To initialize a HashTable, memory for the HashTable structure itself must already exist; the initialization function then fills in its fields. The code is as follows:

    /* pHashFunction is not used in this function; the hash function in PHP is fixed */
    int _zend_hash_init(HashTable *ht, uint nSize, hash_func_t pHashFunction, dtor_func_t pDestructor, zend_bool persistent ZEND_FILE_LINE_DC)
    {
        uint i = 3;

        SET_INCONSISTENT(HT_OK);

        if (nSize >= 0x80000000) {
            /* prevent overflow */
            ht->nTableSize = 0x80000000;
        } else {
            while ((1U << i) < nSize) {
                i++;
            }
            ht->nTableSize = 1 << i;
        }

        ht->nTableMask = 0;         /* 0 means that ht->arBuckets is uninitialized */
        ht->pDestructor = pDestructor;
        ht->arBuckets = (Bucket **) &uninitialized_bucket;  /* the actual data storage space has not been created yet */
        ht->pListHead = NULL;
        ht->pListTail = NULL;
        ht->nNumOfElements = 0;     /* no elements exist in the array yet */
        ht->nNextFreeElement = 0;
        ht->pInternalPointer = NULL;
        ht->persistent = persistent;
        ht->nApplyCount = 0;
        ht->bApplyProtection = 1;
        return SUCCESS;
    }

The code above can be thought of as building the main gate of the array: all data enters the corresponding memory through this gate. At this point, however, there are no "seats" behind the gate yet, because the bucket array itself has not been allocated.
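The sizing loop at the top of the initialization function can be tried in isolation. The following is a minimal sketch, not the engine's code: the helper name sketch_table_size is hypothetical, and it only mirrors the rounding logic quoted above (round the requested size up to a power of two, never below 8).

```c
#include <stdint.h>

/* Hypothetical helper mirroring the sizing loop quoted above:
 * round the requested size up to a power of two, with a minimum of 8. */
static uint32_t sketch_table_size(uint32_t nSize)
{
    uint32_t i = 3;                 /* 1 << 3 == 8 is the smallest table size */

    if (nSize >= 0x80000000u) {
        return 0x80000000u;         /* cap the size so the shift cannot overflow */
    }
    while ((1u << i) < nSize) {
        i++;
    }
    return 1u << i;
}
```

Under these assumptions a request for 0 or 8 elements yields a table of 8 slots, and a request for 9 yields 16. Keeping the size a power of two is what later allows a slot to be chosen with a single bitwise AND against nTableSize - 1.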


2. Data insertion

How do we put something into this empty space? That is data insertion: how data is saved into the HashTable.

A PHP array key can be either a number or a string. The code below handles string keys:

    int _zend_hash_add_or_update(HashTable *ht, const char *arKey, uint nKeyLength, void *pData, uint nDataSize, void **pDest, int flag ZEND_FILE_LINE_DC)
    {
        ulong h;
        uint nIndex;
        Bucket *p;

        IS_CONSISTENT(ht);

        if (nKeyLength <= 0) {
    #if ZEND_DEBUG
            ZEND_PUTS("zend_hash_update: Can't put in empty key\n");
    #endif
            return FAILURE;
        }

        CHECK_INIT(ht);                                 /* check whether the array space is initialized */

        h = zend_inline_hash_func(arKey, nKeyLength);   /* calculate the hash value of the string key */
        nIndex = h & ht->nTableMask;

        p = ht->arBuckets[nIndex];
        while (p != NULL) {
            if (p->arKey == arKey ||
                ((p->h == h) && (p->nKeyLength == nKeyLength) && !memcmp(p->arKey, arKey, nKeyLength))) {
                if (flag & HASH_ADD) {
                    return FAILURE;
                }
                HANDLE_BLOCK_INTERRUPTIONS();
    #if ZEND_DEBUG
                if (p->pData == pData) {
                    ZEND_PUTS("Fatal error in zend_hash_update: p->pData == pData\n");
                    HANDLE_UNBLOCK_INTERRUPTIONS();
                    return FAILURE;
                }
    #endif
                if (ht->pDestructor) {
                    ht->pDestructor(p->pData);
                }
                UPDATE_DATA(ht, p, pData, nDataSize);
                if (pDest) {
                    *pDest = p->pData;
                }
                HANDLE_UNBLOCK_INTERRUPTIONS();
                return SUCCESS;                         /* exit directly after the update */
            }
            p = p->pNext;
        }

        if (IS_INTERNED(arKey)) {
            p = (Bucket *) pemalloc(sizeof(Bucket), ht->persistent);
            if (!p) {
                return FAILURE;
            }
            p->arKey = (char *) arKey;
        } else {
            p = (Bucket *) pemalloc(sizeof(Bucket) + nKeyLength, ht->persistent);
            if (!p) {
                return FAILURE;
            }
            p->arKey = (char *) (p + 1);
            memcpy(p->arKey, arKey, nKeyLength);
        }
        p->nKeyLength = nKeyLength;
        INIT_DATA(ht, p, pData, nDataSize);
        p->h = h;
        CONNECT_TO_BUCKET_DLLIST(p, ht->arBuckets[nIndex]);
        if (pDest) {
            *pDest = p->pData;
        }

        HANDLE_BLOCK_INTERRUPTIONS();
        CONNECT_TO_GLOBAL_DLLIST(p, ht);
        ht->arBuckets[nIndex] = p;
        HANDLE_UNBLOCK_INTERRUPTIONS();

        ht->nNumOfElements++;
        ZEND_HASH_IF_FULL_DO_RESIZE(ht);                /* If the Hash table is full, resize it */
        return SUCCESS;
    }

First, CHECK_INIT checks whether the bucket array has been allocated, and allocates it if it has not. The code is as follows:

    #define CHECK_INIT(ht) do {                                             \
        if (UNEXPECTED((ht)->nTableMask == 0)) {                            \
            (ht)->arBuckets = (Bucket **) pecalloc((ht)->nTableSize, sizeof(Bucket *), (ht)->persistent);   \
            (ht)->nTableMask = (ht)->nTableSize - 1;                        \
        }                                                                   \
    } while (0)

Next, the hash value of the string key is computed and combined with nTableMask using a bitwise AND to obtain nIndex. This nIndex is the offset of the corresponding Bucket * within arBuckets, which is an array of Bucket pointers. If the slot at nIndex is not empty, some earlier key was mapped to the same slot, so the chain starting there is searched for an identical key. If an identical key is found and the flag is HASH_ADD, the operation fails; otherwise the existing value is updated in place and the function returns immediately. An update does not change the existing structure of the array.
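The slot calculation can be illustrated with a small sketch. It assumes a simplified "times 33" string hash as a stand-in for zend_inline_hash_func; the names sketch_hash and sketch_slot are hypothetical, and the loop is written plainly rather than in the form the engine uses.

```c
#include <stddef.h>

/* Simplified stand-in for the string hash (assumption: start at 5381,
 * multiply by 33 and add each byte of the key). */
static unsigned long sketch_hash(const char *arKey, size_t nKeyLength)
{
    unsigned long hash = 5381;

    for (size_t i = 0; i < nKeyLength; i++) {
        hash = ((hash << 5) + hash) + (unsigned char)arKey[i];
    }
    return hash;
}

/* Slot index: keep only the low bits of the hash, as in h & ht->nTableMask. */
static unsigned long sketch_slot(const char *arKey, size_t nKeyLength,
                                 unsigned long nTableMask)
{
    return sketch_hash(arKey, nKeyLength) & nTableMask;
}
```

Because nTableSize is a power of two and nTableMask is nTableSize - 1, the AND keeps only the low bits of the hash, so the result always falls inside the bucket array. Different keys can therefore end up in the same slot, which is why each slot holds a chain of buckets.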

When a new element has to be inserted into the HashTable, the newly constructed element is linked into the HashTable in two steps.

The code for the first step is as follows:

    #define CONNECT_TO_BUCKET_DLLIST(element, list_head)        \
        (element)->pNext = (list_head);                         \
        (element)->pLast = NULL;                                \
        if ((element)->pNext) {                                 \
            (element)->pNext->pLast = (element);                \
        }

In this step, list_head is HashTable.arBuckets[nIndex], where nIndex is the slot index computed earlier. If an earlier element was already mapped to that slot, the new element is linked in front of it. After this step, HashTable.arBuckets[nIndex] is assigned the new element, which thereby becomes the head of the chain.

If no earlier element was mapped to that slot, HashTable.arBuckets[nIndex] is NULL, so list_head is NULL and the pNext pointer of the new element is NULL as well.

The code for the second step is as follows:

    #define CONNECT_TO_GLOBAL_DLLIST(element, ht)               \
        (element)->pListLast = (ht)->pListTail;                 \
        (ht)->pListTail = (element);                            \
        (element)->pListNext = NULL;                            \
        if ((element)->pListLast != NULL) {                     \
            (element)->pListLast->pListNext = (element);        \
        }                                                       \
        if (!(ht)->pListHead) {                                 \
            (ht)->pListHead = (element);                        \
        }                                                       \
        if ((ht)->pInternalPointer == NULL) {                   \
            (ht)->pInternalPointer = (element);                 \
        }

The effect of this step on the contents of the HashTable is shown in the dynamic example that follows.

Dynamic example:

Assume the array contains no elements yet. Following the code logic, we now simulate the insertion process by hand:

1. Insert the first element A. Assume that the hash value of its key is 1.

After the insertion, the state in memory is as follows:

    HashTable.arBuckets[1] = A
    HashTable.pListHead = A
    HashTable.pListTail = A
    HashTable.pInternalPointer = A

    A.pNext = NULL
    A.pLast = NULL
    A.pListLast = NULL
    A.pListNext = NULL

2. Insert the second element B. Assume that the hash value of its key is 2.

After the insertion, the state in memory is as follows:

    HashTable.arBuckets[2] = B
    HashTable.pListHead = A
    HashTable.pListTail = B
    HashTable.pInternalPointer = A      // set only on the first insertion

    A.pNext = NULL
    A.pLast = NULL
    A.pListNext = B
    A.pListLast = NULL

    B.pListLast = A
    B.pListNext = NULL
    B.pNext = NULL
    B.pLast = NULL

3. Insert the third element C. Assume that the hash value of its key is 1, the same as that of A.

After the insertion, the state in memory is as follows:

    HashTable.arBuckets[1] = C
    HashTable.pListHead = A
    HashTable.pListTail = C
    HashTable.pInternalPointer = A      // set only on the first insertion

    A.pNext = NULL
    A.pLast = C
    A.pListNext = B
    A.pListLast = NULL

    B.pNext = NULL
    B.pLast = NULL
    B.pListLast = A
    B.pListNext = C

    C.pNext = A
    C.pLast = NULL
    C.pListNext = NULL
    C.pListLast = B

The state in memory after A, B, and C have all been inserted is therefore:

    HashTable.arBuckets[1] = C
    HashTable.arBuckets[2] = B
    HashTable.pListHead = A
    HashTable.pListTail = C
    HashTable.pInternalPointer = A

    A.pNext = NULL
    A.pLast = C
    A.pListNext = B
    A.pListLast = NULL

    B.pNext = NULL
    B.pLast = NULL
    B.pListLast = A
    B.pListNext = C

    C.pNext = A
    C.pLast = NULL
    C.pListNext = NULL
    C.pListLast = B
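The hand simulation above can be reproduced with a small stand-alone sketch. It is not the engine's code: SketchBucket, SketchTable, and sketch_insert are hypothetical names, the structures keep only the link fields discussed in this article, the bucket array is fixed at 8 slots, and key comparison, memory allocation, and resizing are left out.

```c
#include <stddef.h>

/* Simplified stand-ins for Bucket and HashTable (hypothetical names);
 * only the link fields discussed in the article are kept. */
typedef struct SketchBucket {
    unsigned long h;
    struct SketchBucket *pListNext, *pListLast;   /* global insertion-order list */
    struct SketchBucket *pNext, *pLast;           /* chain of one slot */
} SketchBucket;

typedef struct SketchTable {
    unsigned int nTableMask;
    SketchBucket *pInternalPointer, *pListHead, *pListTail;
    SketchBucket *arBuckets[8];                   /* fixed size for the sketch */
} SketchTable;

/* Link bucket p with hash value h into the table in the two steps described. */
static void sketch_insert(SketchTable *ht, SketchBucket *p, unsigned long h)
{
    unsigned int nIndex = (unsigned int)(h & ht->nTableMask);

    p->h = h;

    /* Step 1: put p at the head of the chain of its slot. */
    p->pNext = ht->arBuckets[nIndex];
    p->pLast = NULL;
    if (p->pNext) {
        p->pNext->pLast = p;
    }

    /* Step 2: append p to the tail of the global list. */
    p->pListLast = ht->pListTail;
    ht->pListTail = p;
    p->pListNext = NULL;
    if (p->pListLast != NULL) {
        p->pListLast->pListNext = p;
    }
    if (!ht->pListHead) {
        ht->pListHead = p;
    }
    if (ht->pInternalPointer == NULL) {
        ht->pInternalPointer = p;
    }

    ht->arBuckets[nIndex] = p;
}
```

Starting from a zero-initialized SketchTable with nTableMask set to 7, calling sketch_insert for A, B, and C with hash values 1, 2, and 1 should leave the pointers in the state listed above: arBuckets[1] points to C, C.pNext points to A, and the global list runs from A through B to C.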

Now that A, B, and C have been inserted, there are two tasks to carry out:

1. Look up the value of an element by its key:

To access element A, we supply its key, key_a, and compute the corresponding hash value, 1.

We then look at HashTable.arBuckets[1]. At this point HashTable.arBuckets[1] is actually C rather than A. Since the key of C is not equal to the key of A, we follow the pNext pointers until we find a match or reach NULL. C.pNext is A, so the value corresponding to key_a is found.

In short, looking up an element by key means hashing the key first and then walking the pNext chain that starts at the hashed slot until NULL is reached. If a bucket whose key equals the search key is encountered, the element is found; otherwise it does not exist.
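The lookup just described can be sketched as follows. The names LookupBucket and sketch_find are hypothetical, the hash value is passed in by the caller instead of being computed, and the comparison follows the same idea as the insertion code (equal hash, equal length, equal bytes).

```c
#include <stddef.h>
#include <string.h>

/* Simplified bucket for the lookup sketch (hypothetical name). */
typedef struct LookupBucket {
    unsigned long h;
    const char *arKey;
    unsigned int nKeyLength;
    void *pData;
    struct LookupBucket *pNext;     /* next bucket in the same slot */
} LookupBucket;

/* Walk the chain of the slot selected by h and return the data of the
 * bucket with a matching key, or NULL if no such bucket exists. */
static void *sketch_find(LookupBucket **arBuckets, unsigned int nTableMask,
                         const char *arKey, unsigned int nKeyLength,
                         unsigned long h)
{
    LookupBucket *p = arBuckets[h & nTableMask];

    while (p != NULL) {
        if (p->h == h && p->nKeyLength == nKeyLength &&
            memcmp(p->arKey, arKey, nKeyLength) == 0) {
            return p->pData;
        }
        p = p->pNext;
    }
    return NULL;
}
```

With slot 1 holding the chain C, A from the example, a search for key_a with hash value 1 first meets C, finds that the keys differ, moves on through pNext, and returns the data of A.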

2. Traverse the array:

Because the keys in our example are strings, the array cannot be traversed with an index-based for loop; only foreach can be used. How is foreach traversal implemented?

It is simple. Starting from the final HashTable state, we begin at HashTable.pListHead and follow the pListNext pointers. For the example in this article the result is:

    HashTable.pListHead ====> A
    A.pListNext         ====> B
    B.pListNext         ====> C

The final traversal order is A, B, C. In other words, the foreach traversal order is the order in which the elements were inserted into the array.
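The traversal can be sketched in a few lines. OrderBucket and sketch_traverse are hypothetical names, and each bucket carries only a label and its pListNext pointer; the point is that the walk never consults the slots, only the global list.

```c
#include <stddef.h>

/* Simplified bucket for the traversal sketch (hypothetical name). */
typedef struct OrderBucket {
    const char *name;
    struct OrderBucket *pListNext;  /* next bucket in insertion order */
} OrderBucket;

/* Follow pListNext from the list head, storing up to max names in out.
 * Returns the number of buckets visited. */
static size_t sketch_traverse(const OrderBucket *pListHead,
                              const char **out, size_t max)
{
    size_t n = 0;

    for (const OrderBucket *p = pListHead; p != NULL && n < max; p = p->pListNext) {
        out[n++] = p->name;
    }
    return n;
}
```

Since the global list is built by appending at the tail on every insertion, walking it from the head visits the elements in insertion order regardless of which slot each key was mapped to.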

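If the key of the inserted element is a number rather than a string, the step of computing a hash value is skipped and the numeric key itself is used as the hash value.

The slot is still chosen as the hash value ANDed with nTableMask, so two different numeric keys can share a slot (for example, 1 and 9 in a table of size 8). For consecutive keys starting at 0, as in an ordinary list, every key falls into its own slot, so no conflicts occur and the pNext and pLast pointers of every element remain NULL.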

With consecutive numeric keys we can also traverse the array with a for loop, looking up each index in turn.

Likewise, if we traverse such an array with foreach, the traversal order is still the order in which the elements were inserted.
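The slot calculation for numeric keys can be sketched as follows. The helper name sketch_index_slot is hypothetical; the sketch assumes only that the numeric key is used directly as the hash value and is then masked with nTableMask in the same way as a string hash.

```c
/* Hypothetical helper: slot for a numeric key, assuming the key itself
 * serves as the hash value and is masked like any other hash. */
static unsigned int sketch_index_slot(unsigned long key, unsigned int nTableMask)
{
    return (unsigned int)(key & nTableMask);
}
```

Under this assumption, with a table of 8 slots (mask 7) the keys 0 through 7 each get a slot of their own, while keys 1 and 9 both map to slot 1. Numeric keys are therefore free of conflicts only while they stay within the table size, which holds for consecutive keys starting at 0.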


P.S.

This article does not cover the whole hash structure in Zend. It only analyzes and demonstrates the key code relevant to its topic, and to keep the focus on the main points some code is not listed, such as the rehash logic and the code for numeric indexes. The details can be found in the source file Zend/zend_hash.c.
