Structure and definition of PHP arrays

Source: Internet
Author: User

Arrays are a very powerful and flexible data type in PHP. Underneath, they are implemented as a hash table (HashTable).

A hash table is a data structure that accesses values directly by key. A mapping function (the hash function) relates each key to the location of its value, so a value can be found from its key without comparing keys against one another. Because the key is mapped straight to a storage location (direct addressing), lookups are fast: in the ideal case the key is found without any comparisons, and the expected lookup time is O(1).

The array that holds the records is called the hash table. Each value is stored in this array at a position that the mapping function computes from the key. A common mapping function first turns the key into an integer, for example with the "times 33" algorithm, and then takes that integer modulo the size of the array to get the storage location in the hash table. This is the usual way to implement a hash table, and PHP's hash table follows the same overall idea, with a few special twists. Here is PHP's HashTable data structure:
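As an illustration of the generic scheme just described, here is a minimal C sketch of a "times 33" string hash followed by a modulo reduction. The function names times33 and slot_by_modulo are hypothetical, and PHP's own hash function differs in details, so treat this as a sketch of the idea rather than PHP's code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* "Times 33" string hash: start at 5381 and, for every byte,
 * multiply the running hash by 33 and add the byte value. */
uint64_t times33(const char *str, size_t len)
{
    uint64_t hash = 5381;
    for (size_t i = 0; i < len; i++) {
        hash = hash * 33 + (unsigned char)str[i];
    }
    return hash;
}

/* Generic mapping step: reduce the hash to a slot by taking it
 * modulo the table size. */
size_t slot_by_modulo(uint64_t hash, size_t table_size)
{
    assert(table_size > 0);
    return (size_t)(hash % table_size);
}
```

For example, times33("a", 1) yields 5381 * 33 + 97 = 177670, which lands in slot 6 of an 8-slot table.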

struct _zend_array {
    zend_refcounted_h gc;               // reference count
    union {
        struct {
            ZEND_ENDIAN_LOHI_4(
                zend_uchar    flags,
                zend_uchar    nApplyCount,
                zend_uchar    nIteratorsCount,
                zend_uchar    consistency)
        } v;
        uint32_t flags;
    } u;
    uint32_t          nTableMask;       // mask used to compute the hash slot; equals -nTableSize (nTableMask = -nTableSize)
    Bucket           *arData;           // array of Buckets that stores the elements; points to the first Bucket
    uint32_t          nNumUsed;         // number of Buckets used
    uint32_t          nNumOfElements;   // number of valid elements in the hash table
    uint32_t          nTableSize;       // total size of the hash table, always a power of 2
    uint32_t          nInternalPointer;
    zend_long         nNextFreeElement; // next available numeric index, e.g. after $arr[] = 1; $arr["a"] = 2; $arr[] = 3; it is 2
    dtor_func_t       pDestructor;
};

The HashTable has two very similar fields, nNumUsed and nNumOfElements. nNumOfElements is the number of elements currently in the hash table, so isn't it the same as nNumUsed? Why define both? They actually mean different things. When an element is deleted from the hash table, its Bucket is not removed; instead the zval stored in that Bucket is marked IS_UNDEF. Only when the table has to grow does PHP check how far the two counters have drifted apart; if ht->nNumUsed - ht->nNumOfElements > (ht->nNumOfElements >> 5), it removes all the deleted elements and rebuilds the hash table. Hence nNumUsed >= nNumOfElements always holds.
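The bookkeeping can be sketched with just the two counters. This is a simplified model, not PHP's code: the demo_* names are hypothetical, no real Buckets are stored, and a delete is represented only by the counter change that marking a Bucket IS_UNDEF implies.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical counters mirroring nNumUsed / nNumOfElements. */
typedef struct {
    uint32_t nNumUsed;        /* Buckets consumed so far, deleted ones included */
    uint32_t nNumOfElements;  /* Buckets that still hold a live value */
} demo_counters;

/* Appending always consumes a fresh Bucket. */
void demo_append(demo_counters *c)
{
    c->nNumUsed++;
    c->nNumOfElements++;
}

/* Deleting only marks the Bucket IS_UNDEF, so nNumUsed stays unchanged. */
void demo_delete(demo_counters *c)
{
    assert(c->nNumOfElements > 0);
    c->nNumOfElements--;
}

/* The threshold quoted above: compact (rebuild) when the number of
 * deleted Buckets exceeds 1/32 of the live elements. */
bool demo_should_compact(const demo_counters *c)
{
    return c->nNumUsed - c->nNumOfElements > (c->nNumOfElements >> 5);
}
```

With 64 Buckets used, one deletion (63 live elements) stays under the threshold because 63 >> 5 is 1; a second deletion (62 live elements) leaves 2 dead Buckets, which exceeds 1.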

Another very important field of the HashTable is arData, which points to the first Bucket of the array that stores the elements. Elements are inserted into this array sequentially: the first element goes into arData[0], the second into arData[1], and so on up to arData[nNumUsed - 1]. It is arData that preserves the order of a PHP array, and this is the first way PHP's hash table differs from an ordinary hash table implementation.
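Because elements sit in arData in insertion order, walking a PHP array can be modeled as scanning arData from index 0 up to nNumUsed - 1 and skipping deleted Buckets. The sketch below assumes a hypothetical demo_bucket with an is_undef flag standing in for a zval whose type is IS_UNDEF.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical simplified Bucket: is_undef stands in for a zval of type IS_UNDEF. */
typedef struct {
    int  is_undef;
    long value;
} demo_bucket;

/* Scan arData from 0 to nNumUsed - 1, i.e. in insertion order, copying the
 * values of live Buckets into out[]. Returns the number of values copied. */
size_t demo_collect_in_order(const demo_bucket *arData, uint32_t nNumUsed,
                             long *out)
{
    assert(out != NULL);
    size_t n = 0;
    for (uint32_t idx = 0; idx < nNumUsed; idx++) {
        if (arData[idx].is_undef) {
            continue;   /* deleted element: its Bucket remains but is skipped */
        }
        out[n++] = arData[idx].value;
    }
    return n;
}
```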

Since arData is not organized by key hash, how does the mapping function map a key to its value in arData?

In fact, the hash part is stored with arData as well; more precisely, it sits in the memory immediately before ht->arData. When the hash table allocates memory, the hash part is allocated in the same block as the Bucket array, and arData is then advanced to point at the start of the Bucket array rather than at the start of the allocated block. The hash part can therefore be reached by indexing backwards from the arData pointer, viewed as a uint32_t pointer: ((uint32_t *)arData)[-1], [-2], [-3], and so on. Each slot of the hash part is a uint32_t that holds the position of a value in the Bucket array.
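The single-allocation layout can be sketched as follows, assuming a simplified demo_bucket and a hypothetical DEMO_INVALID_IDX marker for empty slots (PHP defines its own constant for this). The hash slots are reached through negative indexes on arData viewed as a uint32_t pointer.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical simplified Bucket; PHP's real Bucket embeds a zval. */
typedef struct {
    uint64_t h;
    long     value;
} demo_bucket;

#define DEMO_INVALID_IDX UINT32_MAX   /* hypothetical "empty slot" marker */

/* Allocate nTableSize uint32_t hash slots followed by nTableSize Buckets in
 * one block, mark every slot empty, and return a pointer to the first Bucket. */
demo_bucket *demo_alloc(uint32_t nTableSize)
{
    /* A power of two >= 2 keeps the Bucket array 8-byte aligned here. */
    assert(nTableSize >= 2 && (nTableSize & (nTableSize - 1)) == 0);
    size_t hash_bytes = (size_t)nTableSize * sizeof(uint32_t);
    char *block = malloc(hash_bytes + (size_t)nTableSize * sizeof(demo_bucket));
    if (block == NULL) {
        return NULL;
    }
    demo_bucket *arData = (demo_bucket *)(block + hash_bytes);
    uint32_t *slots = (uint32_t *)arData;
    for (int32_t i = 1; i <= (int32_t)nTableSize; i++) {
        slots[-i] = DEMO_INVALID_IDX;   /* slots[-1] .. slots[-nTableSize] */
    }
    return arData;
}

/* Step back over the hash slots to recover the start of the block. */
void demo_free(demo_bucket *arData, uint32_t nTableSize)
{
    if (arData != NULL) {
        free((char *)arData - (size_t)nTableSize * sizeof(uint32_t));
    }
}
```

In this sketch, keeping both parts in one block means a single malloc and free per table, and the Bucket pointer alone is enough to reach the hash slots.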

So, overall, the HashTable relies mainly on arData for both storing and indexing elements. To insert an element, it is first appended to the Bucket array at the next sequential position, idx; then the hash value of the key is mapped to a slot nIndex in the hash part, and idx is stored in that slot. To look an element up, the key is first mapped to nIndex in the hash part, the Bucket position idx is read from that slot, and the element is then fetched from the Bucket array at idx.
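The two-step insert and lookup can be sketched in C as follows. To keep the focus on the idx indirection, this sketch assumes integer keys used directly as their hash values, a fixed table of 8 slots, a separate slots array instead of memory placed before arData, a plain modulo mapping, and no collisions (the insert asserts that the slot is free). All demo_* names and DEMO_* constants are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_SIZE  8u           /* hypothetical fixed table size */
#define DEMO_EMPTY UINT32_MAX   /* hypothetical empty-slot marker */

typedef struct {
    uint64_t h;      /* integer key, used directly as its hash value */
    long     value;
} demo_bucket;

typedef struct {
    uint32_t    slots[DEMO_SIZE];    /* hash part: slot -> Bucket position idx */
    demo_bucket arData[DEMO_SIZE];   /* Bucket array, filled in insertion order */
    uint32_t    nNumUsed;
} demo_table;

void demo_init(demo_table *t)
{
    for (uint32_t i = 0; i < DEMO_SIZE; i++) t->slots[i] = DEMO_EMPTY;
    t->nNumUsed = 0;
}

/* Insert: append the Bucket at idx, then record idx in the key's slot.
 * Collisions are not handled in this sketch. */
void demo_insert(demo_table *t, uint64_t h, long value)
{
    assert(t->nNumUsed < DEMO_SIZE);
    uint32_t nIndex = (uint32_t)(h % DEMO_SIZE);
    assert(t->slots[nIndex] == DEMO_EMPTY);
    uint32_t idx = t->nNumUsed++;
    t->arData[idx].h = h;
    t->arData[idx].value = value;
    t->slots[nIndex] = idx;
}

/* Lookup: map the key to its slot, read idx, then fetch the Bucket. */
const demo_bucket *demo_find(const demo_table *t, uint64_t h)
{
    uint32_t idx = t->slots[h % DEMO_SIZE];
    if (idx == DEMO_EMPTY || t->arData[idx].h != h) return NULL;
    return &t->arData[idx];
}
```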

The mapping function (that is, the hash function) is a key part of the hash table: it links each key to its value. A typical mapping function takes the key's hash value modulo the size of the Bucket array, i.e. key->h % ht->nTableSize, but PHP does not do this. Instead it computes:

nIndex = key->h | ht->nTableMask;

A bitwise OR is cheaper to compute than a modulo.

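nTableMask is the negative of nTableSize, that is, nTableMask = -nTableSize. Because nTableSize is a power of two, 2^n, the lowest n bits of nTableMask are all 0 and all higher bits are 1. ORing the hash with it keeps only the low n bits of the hash and sets every bit above them, so nIndex, read as a signed 32-bit integer, always falls in the range -nTableSize <= nIndex <= -1, i.e. |nIndex| <= nTableSize. These are exactly the negative offsets before arData where the hash part lives.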
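The mask arithmetic can be checked directly. This sketch, with hypothetical helper names demo_table_mask and demo_nindex, computes nTableMask = -nTableSize in 32-bit unsigned arithmetic and reads h | nTableMask as a signed 32-bit offset, as the text describes nIndex being used relative to arData.

```c
#include <assert.h>
#include <stdint.h>

/* nTableMask = -nTableSize, computed in uint32_t arithmetic. */
uint32_t demo_table_mask(uint32_t nTableSize)
{
    assert(nTableSize != 0 && (nTableSize & (nTableSize - 1)) == 0);
    return (uint32_t)0 - nTableSize;
}

/* nIndex = h | nTableMask, truncated to 32 bits and read as a signed offset.
 * The uint32_t -> int32_t conversion of values above INT32_MAX is
 * implementation-defined in C; mainstream compilers wrap it as two's complement. */
int32_t demo_nindex(uint64_t h, uint32_t nTableMask)
{
    uint32_t nIndex = (uint32_t)h | nTableMask;
    return (int32_t)nIndex;
}
```

For nTableSize = 8 the mask is 0xFFFFFFF8, so a hash of 13 (binary 1101) gives 0xFFFFFFFD, i.e. -3. In general the result equals (h % nTableSize) - nTableSize, which is why the OR can stand in for the modulo.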

A hash collision happens when different keys produce the same hash value or map to the same slot (for numeric indexes, the hash value is simply the index itself), so all of those values have to be stored under the same slot of the hash table. The usual solution is to chain the colliding Buckets into a linked list and walk that list, comparing keys.

PHP does the same, with one twist: the link to the next conflicting element is not kept as a separate pointer field of the Bucket but inside the value's zval, in u2.next, as the position of that element in the Bucket array:

struct _zval_struct {
    zend_value        value;         /* value */
    ...
    union {
        uint32_t     var_flags;
        uint32_t     next;           /* hash collision chain */
        uint32_t     cache_slot;     /* literal cache slot */
        uint32_t     lineno;         /* line number (for AST nodes) */
        uint32_t     num_args;       /* arguments number for EX(This) */
        uint32_t     fe_pos;         /* foreach position */
        uint32_t     fe_iter_idx;    /* foreach iterator index */
    } u2;
};

When a collision occurs, the position of the element already stored in that slot is saved in the new element's zval.u2.next, and the slot in the hash table is then updated to the position of the newly inserted element. In other words, a colliding element is always inserted at the head of the chain.
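Head insertion through a next field can be sketched like this. The next member of the hypothetical demo_bucket stands in for zval.u2.next, DEMO_END is a hypothetical end-of-chain and empty-slot marker, keys are integers used as their own hash, and updating an existing key is not handled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_SIZE 4u
#define DEMO_END  UINT32_MAX   /* hypothetical "no next element" / empty-slot marker */

typedef struct {
    uint64_t h;
    long     value;
    uint32_t next;   /* stands in for zval.u2.next: idx of the next Bucket in the chain */
} demo_bucket;

typedef struct {
    uint32_t    slots[DEMO_SIZE];    /* slot -> idx of the chain head */
    demo_bucket arData[DEMO_SIZE];
    uint32_t    nNumUsed;
} demo_table;

void demo_init(demo_table *t)
{
    for (uint32_t i = 0; i < DEMO_SIZE; i++) t->slots[i] = DEMO_END;
    t->nNumUsed = 0;
}

void demo_insert(demo_table *t, uint64_t h, long value)
{
    assert(t->nNumUsed < DEMO_SIZE);
    uint32_t nIndex = (uint32_t)(h % DEMO_SIZE);
    uint32_t idx = t->nNumUsed++;
    t->arData[idx].h = h;
    t->arData[idx].value = value;
    t->arData[idx].next = t->slots[nIndex];  /* old head becomes our successor */
    t->slots[nIndex] = idx;                  /* new element becomes the head */
}

/* Walk the chain from the slot's head, comparing keys. */
const demo_bucket *demo_find(const demo_table *t, uint64_t h)
{
    uint32_t idx = t->slots[h % DEMO_SIZE];
    while (idx != DEMO_END) {
        if (t->arData[idx].h == h) return &t->arData[idx];
        idx = t->arData[idx].next;
    }
    return NULL;
}
```

Inserting keys 1, 5 and 9 into this 4-slot table sends all three to slot 1; afterwards the slot holds idx 2 (key 9), whose next is 1 (key 5), whose next is 0 (key 1).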

Structure of stored elements in an array

typedef struct _Bucket {
    zval              val;   // stores the actual value; a zval is embedded rather than a pointer to one
    zend_ulong        h;     // hash value of the key computed with times 33, or the numeric index
    zend_string      *key;   // the element's key (NULL for numeric indexes)
} Bucket;
