PHP source: the internal implementation of arrays

Source: Internet
Author: User
Hash Table

Basically, everything in PHP is a hash table. Hash tables are not only used in the underlying implementation of PHP arrays. They also store object properties, methods, functions, variables and pretty much everything else.

Because hash tables are so fundamental to PHP, it's worth digging into how they actually work.

What is a hash table?

Remember that in C an array is just a block of memory that you access by index. Therefore, arrays in C can only have integer, ordered keys. For example, you cannot have a key 1332423442 right after key 0 without allocating all the slots in between. C has no such thing as an associative array.

Hash tables fill this gap. They use a hash function to convert string keys into ordinary integer keys, and the result of the hash can then be used as an index into a normal C array (that is, a block of memory). The problem is that hash functions have collisions: several string keys can end up at the same index. For example, in PHP, when an array's hash table has 64 slots, the strings "foo" and "oof" map to the same slot.
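To make the collision concrete, here is a minimal sketch of the DJBX33A ("times 33, add") hash. It reproduces the value 193491849 quoted for "foo" when mapping hashes onto slots, and then masks the hash down to a slot index. The helper names `djbx33a` and `slot_for` are hypothetical. The sketch hashes exactly the bytes it is given and assumes the table size is a power of 2, so that `size - 1` can serve as the mask.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* DJBX33A (hypothetical helper): start at 5381, then hash = hash * 33 + c
 * for each of the `len` bytes of `key`. */
static unsigned long djbx33a(const char *key, size_t len)
{
    unsigned long hash = 5381;
    for (size_t i = 0; i < len; i++) {
        hash = ((hash << 5) + hash) + (unsigned char)key[i];  /* hash * 33 + c */
    }
    return hash;
}

/* Map a key onto a slot of a table whose size is a power of 2 by keeping
 * only the low bits of its hash. */
static unsigned long slot_for(const char *key, unsigned long table_size)
{
    assert(table_size != 0 && (table_size & (table_size - 1)) == 0);
    return djbx33a(key, strlen(key)) & (table_size - 1);
}
```

The full hashes differ ("foo" gives 193491849, "oof" gives 193501641), but both end in the same six low bits. In a 64-slot table both keys land in slot 9. Doubling the table to 128 slots separates them into slots 9 and 73.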

The collision problem is solved by storing all values that may collide in a linked list, instead of storing the value directly at the generated index.

HashTable and Bucket

typedef struct _hashtable {
    uint nTableSize;
    uint nTableMask;
    uint nNumOfElements;
    ulong nNextFreeElement;
    Bucket *pInternalPointer;
    Bucket *pListHead;
    Bucket *pListTail;
    Bucket **arBuckets;
    dtor_func_t pDestructor;
    zend_bool persistent;
    unsigned char nApplyCount;
    zend_bool bApplyProtection;
#if ZEND_DEBUG
    int inconsistent;
#endif
} HashTable;

nNumOfElements
Identifies the number of values currently stored in the array. This is also what the count() function returns.

nTableSize
Represents the capacity of the hash table. It is usually the next power of 2 that is greater than or equal to nNumOfElements. For example, if the array stores 32 elements, the hash table has size 32. If one more element is added, so that the array now has 33 elements, the hash table's capacity is increased to 64. This keeps the hash table efficient in both space and time. If the hash table is too small, there will be many collisions and performance will suffer. If it is too large, it wastes memory. A power of 2 is a good compromise.

nTableMask
Is the capacity of the hash table minus one. This mask is used to fit the generated hash value to the current table size. For example, the real hash value of "foo" (using the DJBX33A hash function) is 193491849. With a hash table of capacity 64, we obviously can't use that as an index into the array. Instead, we apply the mask to the hash, which keeps only its low bits:

Hash    | 193491849 |   0b1011100010000111001110001001
& Mask  | &      63 | & 0b0000000000000000000000111111
= Index | =       9 | = 0b0000000000000000000000001001

nNextFreeElement
Is the next available integer key. It is used when you write $array[] = xyz.

pInternalPointer
Stores the current position of the array (the array's internal pointer). It is used by reset(), current(), key(), next(), prev() and end(), as well as during foreach traversal.

pListHead and pListTail
Identify the positions of the first and last elements of the array. Remember that PHP arrays are ordered collections. For example, ['foo' => 'bar', 'bar' => 'foo'] and ['bar' => 'foo', 'foo' => 'bar'] contain the same elements, but in a different order.

arBuckets
Is what we usually call "the hash table (internal C array)". It is declared as Bucket **, so it can be viewed as an array of bucket pointers. What a bucket is comes up right away.

pDestructor
Is the destructor for the values. Whenever a value is removed from the hash table, this function is called on it. A common destructor is zval_ptr_dtor. It decrements the zval's reference count and, if the count reaches 0, destroys and frees the zval.
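The sizing and key bookkeeping described for nTableSize, nTableMask and nNextFreeElement can be sketched in a few lines. This is not Zend code. The struct `toy_table` and its functions are hypothetical, the minimum size of 8 is an assumption modelled on PHP 5's zend_hash_init, and the sketch only tracks counters and keys, storing no values.

```c
#include <assert.h>

/* Hypothetical stand-in for the counters of a HashTable. */
struct toy_table {
    unsigned int table_size;    /* like nTableSize: always a power of 2 */
    unsigned int table_mask;    /* like nTableMask: table_size - 1 */
    unsigned int num_elements;  /* like nNumOfElements */
    long next_free_element;     /* like nNextFreeElement */
};

/* Smallest power of 2 >= n, but at least 8 (assumed minimum). */
static unsigned int size_for(unsigned int n)
{
    unsigned int size = 8;
    while (size < n && size < 0x80000000u) {
        size <<= 1;
    }
    return size;
}

static void toy_init(struct toy_table *t)
{
    t->table_size = size_for(0);
    t->table_mask = t->table_size - 1;
    t->num_elements = 0;
    t->next_free_element = 0;
}

/* Record a new element with integer key h (assumed not already present).
 * The table doubles once it holds more elements than it has slots. */
static void toy_insert_index(struct toy_table *t, long h)
{
    t->num_elements++;
    if (t->num_elements > t->table_size) {
        t->table_size <<= 1;
        t->table_mask = t->table_size - 1;
    }
    if (h >= t->next_free_element) {
        t->next_free_element = h + 1;   /* $array[] continues after the largest key */
    }
}

/* $array[] = ...: use next_free_element as the key and return it. */
static long toy_append(struct toy_table *t)
{
    long h = t->next_free_element;
    toy_insert_index(t, h);
    return h;
}

/* Index into arBuckets: keep only the low bits of the hash. */
static unsigned long toy_index(const struct toy_table *t, unsigned long hash)
{
    assert((t->table_size & t->table_mask) == 0);   /* size is a power of 2 */
    return hash & t->table_mask;
}
```

Appending 32 elements leaves the table at 32 slots with mask 31. The 33rd element doubles it to 64 slots with mask 63, and at that size the "foo" hash 193491849 maps to index 9. Inserting key 100 directly makes the next append use key 101.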

typedef struct bucket {
    ulong h;
    uint nKeyLength;
    void *pData;
    void *pDataPtr;
    struct bucket *pListNext;
    struct bucket *pListLast;
    struct bucket *pNext;
    struct bucket *pLast;
    const char *arKey;
} Bucket;

h
Is the hash value (the value before the mask is applied to map it to an index). For integer keys, h simply holds the integer key itself.

arKey
Holds the string key.

nKeyLength
Is the corresponding key length. For integer keys, neither arKey nor nKeyLength is used.

pData and pDataPtr
Are used to store the actual value. For PHP arrays the value is a zval, but hash tables are also used for other things. Don't worry about why there are two fields. The difference between them is who is responsible for freeing the value.

pListNext and pListLast
Point to the next and the previous element of the array. To traverse the array in order, PHP starts with the bucket that pListHead (in the HashTable structure) points to, and then follows the buckets' pListNext pointers. Reverse order works the same way: it starts at pListTail and follows the pListLast pointers. In userland code you get the same effect by calling end() and then prev().
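The ordered traversal described above is just a walk over a doubly linked list kept in insertion order, independent of which arBuckets slot each bucket lives in. Here is a minimal sketch. `toy_bucket`, `toy_list`, `list_append` and `list_keys` are hypothetical names, and each bucket carries only a string key.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct toy_bucket {
    const char *key;
    struct toy_bucket *list_next;   /* like pListNext */
    struct toy_bucket *list_last;   /* like pListLast */
};

struct toy_list {
    struct toy_bucket *head;        /* like pListHead */
    struct toy_bucket *tail;        /* like pListTail */
};

/* New elements are linked in at the tail, which preserves insertion order. */
static void list_append(struct toy_list *l, struct toy_bucket *b)
{
    b->list_next = NULL;
    b->list_last = l->tail;
    if (l->tail != NULL) {
        l->tail->list_next = b;
    } else {
        l->head = b;                /* first element is both head and tail */
    }
    l->tail = b;
}

/* Concatenate the keys front to back (forward != 0) or back to front. */
static void list_keys(const struct toy_list *l, int forward, char *out, size_t cap)
{
    assert(out != NULL && cap > 0);
    out[0] = '\0';
    const struct toy_bucket *p = forward ? l->head : l->tail;
    while (p != NULL) {
        strncat(out, p->key, cap - strlen(out) - 1);
        p = forward ? p->list_next : p->list_last;
    }
}
```

Appending "a", "b" and "c" and walking forward yields "abc". Walking backwards from the tail yields "cba", which is what end() followed by repeated prev() visits.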

pNext and pLast
Form the "list of possibly colliding values" mentioned above. The arBuckets array stores the bucket of the first possible value. If that bucket does not have the right key, PHP looks at the bucket that pNext points to. It keeps following pNext until it finds the right bucket. pLast works the same way in the reverse direction.
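Lookup through the pNext chain can be sketched like this. Hash the key, mask the hash to pick a slot, then follow the chain until a bucket with the same hash and the same key bytes turns up. This is a simplified model, not the Zend implementation. All names are hypothetical, new buckets are pushed onto the front of their chain, pLast is left out, and DJBX33A is used as in the text.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_SIZE 64   /* power of 2, so TOY_SIZE - 1 works as the mask */

struct toy_bucket {
    unsigned long h;            /* full hash before masking, like h */
    const char *key;            /* like arKey */
    size_t key_len;             /* like nKeyLength */
    int value;                  /* stands in for pData */
    struct toy_bucket *next;    /* like pNext: next bucket in the same slot */
};

static unsigned long djbx33a(const char *key, size_t len)
{
    unsigned long hash = 5381;
    for (size_t i = 0; i < len; i++) {
        hash = hash * 33 + (unsigned char)key[i];
    }
    return hash;
}

/* Link b into the chain of its slot (the caller guarantees the key is new). */
static void toy_insert(struct toy_bucket *slots[TOY_SIZE], struct toy_bucket *b)
{
    assert(b != NULL && b->key != NULL);
    b->key_len = strlen(b->key);
    b->h = djbx33a(b->key, b->key_len);
    unsigned long idx = b->h & (TOY_SIZE - 1);
    b->next = slots[idx];       /* push onto the front of the chain */
    slots[idx] = b;
}

/* Walk the slot's chain. Comparing h and the length first avoids most memcmp calls. */
static struct toy_bucket *toy_find(struct toy_bucket *slots[TOY_SIZE], const char *key)
{
    size_t len = strlen(key);
    unsigned long h = djbx33a(key, len);
    for (struct toy_bucket *p = slots[h & (TOY_SIZE - 1)]; p != NULL; p = p->next) {
        if (p->h == h && p->key_len == len && memcmp(p->key, key, len) == 0) {
            return p;
        }
    }
    return NULL;
}
```

Because "foo" and "oof" collide in a 64-slot table, inserting both puts them in the same chain at slot 9. Both are still found, because the chain walk checks the full hash and the key bytes, not just the slot.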

As you can see, PHP's hash table implementation is quite complex. This is the price it pays for its super-flexible array type.

How hash tables are used

The Zend Engine defines a large number of API functions for working with hash tables. An overview of the low-level hash table functions can be found in the zend_hash.h file. In addition, the Zend Engine defines a slightly higher-level API in the zend_API.h file.

We don't have time to go through all of the functions, but we can at least look at one example function to see how they are used. We will use array_fill_keys as the example.

Using the techniques mentioned in the second part, you can easily find out that the function is defined in the ext/standard/array.c file. Let's take a quick look at it.
Like most functions, it starts by declaring a bunch of variables and then calls zend_parse_parameters:

zval *keys, *val, **entry;
HashPosition pos;

if (zend_parse_parameters(ZEND_NUM_ARGS() TSRMLS_CC, "az", &keys, &val) == FAILURE) {
    return;
}

The "az" specifier means that the first argument must be an array (stored in the variable keys) and the second argument can be any zval (stored in the variable val).

Once the parameters are parsed, the array to be returned is initialized:

array_init_size(return_value, zend_hash_num_elements(Z_ARRVAL_P(keys)));

This line contains three important parts of the array API:

The Z_ARRVAL_P macro extracts the hash table from a zval.

zend_hash_num_elements returns the number of elements in a hash table (its nNumOfElements field).

array_init_size initializes an array with a given size hint.

So this line initializes return_value as an array with the same size as the keys array.

The size here is only an optimization. The function could also just have called array_init(return_value). In that case PHP would have to resize the array several times as more and more elements are added. By specifying the size up front, PHP allocates the right amount of memory from the start.
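To see what the size hint saves, the sketch below simulates inserting n elements one by one. It counts how often the table has to double and how many elements get rehashed along the way. It compares a small default starting size against a preallocated power-of-2 size. The starting size of 8, the "double when it would overflow" rule, and the name `simulate_inserts` are assumptions of this model, not the real zend_hash.c code.

```c
#include <assert.h>

struct growth_cost {
    unsigned int resizes;   /* how many times the table doubled */
    unsigned long moved;    /* elements rehashed across all resizes */
};

/* Simulate inserting n elements into a table that starts with `size` slots
 * (a power of 2) and doubles whenever the next element would not fit. */
static struct growth_cost simulate_inserts(unsigned int size, unsigned int n)
{
    struct growth_cost cost = { 0, 0 };
    assert(size != 0 && (size & (size - 1)) == 0);
    for (unsigned int count = 1; count <= n; count++) {
        if (count > size) {
            cost.moved += count - 1;   /* every existing element is rehashed */
            size <<= 1;
            cost.resizes++;
        }
    }
    return cost;
}
```

Filling 100 elements from 8 slots doubles four times (8, 16, 32, 64, 128) and rehashes 8 + 16 + 32 + 64 = 120 elements in this model. Starting from 128 slots, as a size hint would allow, needs no resize at all.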
After the return array has been initialized, the function iterates over the keys array with a while loop, using this code structure:

zend_hash_internal_pointer_reset_ex(Z_ARRVAL_P(keys), &pos);
while (zend_hash_get_current_data_ex(Z_ARRVAL_P(keys), (void **)&entry, &pos) == SUCCESS) {
    /* ... loop body, discussed below ... */
    zend_hash_move_forward_ex(Z_ARRVAL_P(keys), &pos);
}

This translates easily into PHP code:

reset($keys);
while (null !== key($keys)) {   // key() returns null once the pointer is past the end
    $entry = current($keys);
    // some code
    next($keys);
}

This is essentially the same as:

foreach ($keys as $entry) {
    // some code
}

The only difference is that the C loop does not use the array's internal pointer. Instead, it stores the current position in its own pos variable.
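The difference between the two loops can be modelled with a cursor that lives outside the table. In this sketch the names are hypothetical. A HashPosition in PHP 5 is essentially a Bucket pointer. The _ex-style functions only read and write the caller's `*pos`, so the table can even be passed as const, and its internal pointer stays wherever it was.

```c
#include <assert.h>
#include <stddef.h>

struct toy_bucket {
    int value;
    struct toy_bucket *list_next;          /* insertion order, like pListNext */
};

struct toy_table {
    struct toy_bucket *list_head;          /* like pListHead */
    struct toy_bucket *internal_pointer;   /* like pInternalPointer */
};

typedef struct toy_bucket *toy_position;   /* external cursor, like HashPosition */

/* Counterparts of the *_ex functions: they only touch the caller's cursor. */
static void toy_reset_ex(const struct toy_table *t, toy_position *pos)
{
    *pos = t->list_head;
}

static int toy_get_current_ex(const struct toy_table *t, int *out, const toy_position *pos)
{
    (void)t;                 /* the cursor alone identifies the element */
    if (*pos == NULL) {
        return 0;            /* past the end, like returning FAILURE */
    }
    *out = (*pos)->value;
    return 1;
}

static void toy_move_forward_ex(const struct toy_table *t, toy_position *pos)
{
    (void)t;
    if (*pos != NULL) {
        *pos = (*pos)->list_next;
    }
}

/* Sum all values with an external cursor, mirroring the array_fill_keys loop. */
static int toy_sum(const struct toy_table *t)
{
    toy_position pos;
    int value, sum = 0;
    assert(t != NULL);
    toy_reset_ex(t, &pos);
    while (toy_get_current_ex(t, &value, &pos)) {
        sum += value;
        toy_move_forward_ex(t, &pos);
    }
    return sum;
}
```

Because the loop state lives in `pos`, a userland next() or reset() on the same array (which would move pInternalPointer) cannot disturb the iteration, and the iteration cannot disturb it either.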

The code inside the loop is split into two branches: one for integer keys and one for all other keys. The integer-key branch is only two lines of code:

zval_add_ref(&val);
zend_hash_index_update(Z_ARRVAL_P(return_value), Z_LVAL_PP(entry), &val, sizeof(zval *), NULL);

This looks very straightforward. First the value's reference count is incremented, because adding a value to a hash table means adding another reference to it. Then the value is inserted into the hash table. The arguments of the zend_hash_index_update macro are:

- the hash table to update, Z_ARRVAL_P(return_value);
- the integer index, Z_LVAL_PP(entry);
- the value, &val;
- the size of the value, sizeof(zval *);
- a destination pointer, which we don't need here and therefore pass as NULL.

The branch for non-integer keys is slightly more complicated:

zval key, *key_ptr = *entry;

if (Z_TYPE_PP(entry) != IS_STRING) {
    /* convert a private copy so the keys array is left untouched */
    key = **entry;
    zval_copy_ctor(&key);
    convert_to_string(&key);
    key_ptr = &key;
}

zval_add_ref(&val);
zend_symtable_update(Z_ARRVAL_P(return_value), Z_STRVAL_P(key_ptr), Z_STRLEN_P(key_ptr) + 1, &val, sizeof(zval *), NULL);

if (key_ptr != *entry) {
    /* destroy the temporary copy */
    zval_dtor(&key);
}

First, convert_to_string converts the key to a string, unless it already is one. Before that, entry is copied into the new key variable, which is what the line key = **entry does. In addition, zval_copy_ctor has to be called. Otherwise, complex values such as strings or arrays would not be copied correctly.

This copy is necessary because it guarantees that the type conversion does not change the original array. Without it, the conversion would modify not only the local variable but also the value stored in the keys array, which would obviously be very surprising for the user.
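Why the copy matters can be shown with a tiny tagged value type. In this sketch all names are hypothetical, not the Zend API. Converting a value to a string overwrites it in place, as convert_to_string does. The array_fill_keys pattern therefore converts a copy. The copy constructor duplicates any owned string buffer, which is the job zval_copy_ctor does for complex values, so the copy and the original never share memory.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

enum toy_type { TOY_LONG, TOY_STRING };

struct toy_value {
    enum toy_type type;
    long lval;      /* valid when type == TOY_LONG */
    char *str;      /* valid when type == TOY_STRING */
};

/* Like zval_copy_ctor: after a shallow struct copy, duplicate owned memory. */
static void toy_copy_ctor(struct toy_value *v)
{
    if (v->type == TOY_STRING) {
        char *dup = malloc(strlen(v->str) + 1);
        assert(dup != NULL);
        strcpy(dup, v->str);
        v->str = dup;
    }
}

/* Like convert_to_string: converts *v in place. */
static void toy_convert_to_string(struct toy_value *v)
{
    if (v->type == TOY_LONG) {
        char buf[32];
        snprintf(buf, sizeof buf, "%ld", v->lval);
        v->str = malloc(strlen(buf) + 1);
        assert(v->str != NULL);
        strcpy(v->str, buf);
        v->type = TOY_STRING;
    }
}

/* Like zval_dtor: release the contents, not the struct itself. */
static void toy_dtor(struct toy_value *v)
{
    if (v->type == TOY_STRING) {
        free(v->str);
        v->str = NULL;
    }
}

/* The array_fill_keys pattern: convert a private copy, use it, destroy it. */
static void key_as_string(const struct toy_value *entry, char *out, size_t cap)
{
    struct toy_value key = { TOY_LONG, 0, NULL };
    const struct toy_value *key_ptr = entry;
    if (entry->type != TOY_STRING) {
        key = *entry;                  /* key = **entry */
        toy_copy_ctor(&key);           /* zval_copy_ctor(&key) */
        toy_convert_to_string(&key);   /* convert_to_string(&key) */
        key_ptr = &key;
    }
    snprintf(out, cap, "%s", key_ptr->str);   /* "use" the key */
    if (key_ptr != entry) {
        toy_dtor(&key);                /* zval_dtor(&key) */
    }
}
```

An integer entry such as 42 comes out as the key "42", while the entry itself is still the integer 42 afterwards. String entries are used directly and never copied or freed.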

Once the key has been used, the copy has to be destroyed again, which is what zval_dtor(&key) does. The difference between zval_ptr_dtor and zval_dtor is that zval_ptr_dtor only destroys the zval when its refcount drops to 0. zval_dtor destroys it immediately, regardless of the refcount. That's why zval_ptr_dtor is used for "normal" variables and zval_dtor for temporary variables that aren't used anywhere else. Moreover, zval_ptr_dtor frees the zval itself after destroying its contents, while zval_dtor does not. We never malloc()ed the key zval, so we must not free() it either. In this respect, too, zval_dtor is the right choice.
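The contrast between the two destructors can be modelled with a refcounted, heap-allocated value. This is a simplified sketch with hypothetical names, not the real zval API. `toy_dtor` releases a value's contents immediately, like zval_dtor. `toy_ptr_dtor` drops one reference and only destroys the contents and frees the container once the count reaches 0, like zval_ptr_dtor.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct toy_zval {
    unsigned int refcount;
    char *str;                 /* owned contents */
};

static struct toy_zval *toy_new(const char *s)
{
    struct toy_zval *z = malloc(sizeof *z);
    assert(z != NULL);
    z->refcount = 1;
    z->str = malloc(strlen(s) + 1);
    assert(z->str != NULL);
    strcpy(z->str, s);
    return z;
}

/* zval_add_ref counterpart: one more holder of the same value. */
static void toy_add_ref(struct toy_zval *z)
{
    z->refcount++;
}

/* zval_dtor counterpart: destroy the contents now, ignore the refcount,
 * and leave the struct itself alone (it may live on the stack). */
static void toy_dtor(struct toy_zval *z)
{
    free(z->str);
    z->str = NULL;
}

/* zval_ptr_dtor counterpart: returns 1 if the value was actually destroyed. */
static int toy_ptr_dtor(struct toy_zval *z)
{
    assert(z->refcount > 0);
    if (--z->refcount > 0) {
        return 0;              /* still referenced elsewhere */
    }
    toy_dtor(z);               /* destroy the contents ... */
    free(z);                   /* ... and free the heap-allocated container */
    return 1;
}
```

Calling `toy_ptr_dtor` on a stack temporary would free() memory that was never malloc()ed. This is exactly why array_fill_keys cleans up its stack-allocated key with zval_dtor.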

Now let's look at the two remaining (and important) lines:

zval_add_ref(&val);
zend_symtable_update(Z_ARRVAL_P(return_value), Z_STRVAL_P(key_ptr), Z_STRLEN_P(key_ptr) + 1, &val, sizeof(zval *), NULL);

This is very similar to what the integer-key branch does. The difference is that zend_symtable_update is called instead of zend_hash_index_update, and it is passed the key string and its length. The + 1 counts the terminating NUL byte, which the PHP 5 hash API includes in the key length.
