The previous article introduced the implementation of variables in PHP 7 (part I). This article continues with the internals of PHP 7; read on if you are interested.
The first and second parts of this article are translated from the blog of Nikita Popov (nikic, a member of the official PHP development team and a student at the Berlin University of Science and Technology). To read more naturally in Chinese, the translation is not word for word.
To understand this article, you should have some knowledge about the implementation of variables in PHP5. The focus of this article is to explain the zval changes in PHP 7.
The first part described the basic implementation of variables in PHP 5 and PHP 7 and how it changed. To recap: the major change is that zvals are no longer allocated individually and no longer store a reference count themselves. Simple types such as integers and floats are stored directly in the zval. Complex types are referenced through a pointer to a separate struct.
Complex zval values share a common header, whose structure is defined by zend_refcounted:
struct _zend_refcounted {
    uint32_t refcount;
    union {
        struct {
            ZEND_ENDIAN_LOHI_3(
                zend_uchar type,
                zend_uchar flags,
                uint16_t gc_info)
        } v;
        uint32_t type_info;
    } u;
};
This header stores the refcount (reference count), the type of the value, the cycle collection information gc_info, and a slot for type-specific flags.
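The union in the header lets the three small fields be read or written together as one 32-bit word. The following is a minimal sketch of that packing using plain shifts, assuming that type occupies the lowest byte, flags the next byte, and gc_info the upper 16 bits. The function names are hypothetical and are not part of the PHP source.

```c
#include <assert.h>
#include <stdint.h>

/* Pack the three header fields into one 32-bit word.
   Assumed layout: bits 0-7 type, bits 8-15 flags, bits 16-31 gc_info. */
static uint32_t pack_type_info(unsigned type, unsigned flags, unsigned gc_info) {
    assert(type <= 0xff && flags <= 0xff && gc_info <= 0xffff);
    return (uint32_t)type | ((uint32_t)flags << 8) | ((uint32_t)gc_info << 16);
}

/* Extract the individual fields again. */
static unsigned info_type(uint32_t type_info)  { return type_info & 0xff; }
static unsigned info_flags(uint32_t type_info) { return (type_info >> 8) & 0xff; }
static unsigned info_gc(uint32_t type_info)    { return type_info >> 16; }
```

Writing the whole word at once is convenient when a new value is initialized, while the individual accessors are what later code would use to inspect a single field.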
Next, each complex type is analyzed in turn and compared with its PHP 5 implementation. References are also a complex type, but they were covered in the previous part and are not repeated here. The resource type is not discussed either (the author does not think there is much to say about it).
String
PHP7 defines a new struct zend_string used to store string variables:
struct _zend_string {
    zend_refcounted gc;
    zend_ulong h; /* hash value */
    size_t len;
    char val[1];
};
In addition to the reference count header, the string contains the hash cache h, the string length len, and the string value val. The hash cache exists so that the hash does not have to be recomputed every time the string is used as a hashtable key. It is computed and stored on first use.
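The idea of a hash cache can be sketched in a few lines of C. This is an illustration only: cached_str and its functions are hypothetical names, the times-33 hash is chosen for brevity, and using 0 as the "not computed yet" marker is an assumption of this sketch rather than a statement about the engine.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical string with a cached hash; not the real zend_string. */
typedef struct {
    uint64_t h;        /* cached hash, 0 means "not computed yet" */
    size_t len;
    const char *val;
} cached_str;

static int hash_computations = 0;  /* counts real computations, for illustration */

/* Times-33 hash, chosen for illustration only. */
static uint64_t compute_hash(const char *s, size_t len) {
    uint64_t h = 5381;
    for (size_t i = 0; i < len; i++) {
        h = h * 33 + (unsigned char)s[i];
    }
    hash_computations++;
    /* Force a non-zero result so that 0 can serve as the "empty" marker. */
    return h | (UINT64_C(1) << 63);
}

/* Return the hash, computing it only on first use. */
static uint64_t cached_str_hash(cached_str *s) {
    assert(s != NULL);
    if (s->h == 0) {
        s->h = compute_hash(s->val, s->len);
    }
    return s->h;
}
```

Calling cached_str_hash() twice on the same string runs the hash loop only once; the second call just returns the stored value.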
If you are not deeply familiar with C, the definition of val may look strange: the array is declared with a single element, yet the strings we want to store are obviously longer than one character. This is the so-called "struct hack": the array is declared with only one element, but when a zend_string is actually created, enough memory is allocated to hold the whole string. The complete string can then still be accessed through val.
Strictly speaking this is undefined behavior, because reads and writes go past the end of a one-element character array, but in practice C compilers know not to interfere with code written this way. C99 does have explicit support for "flexible array members", but thanks to our good friend Microsoft, C99 cannot be relied on consistently across platforms, so the struct hack is used instead (it works around the lack of flexible array support on Windows).
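A minimal sketch of the struct hack follows. The my_string type and my_string_new() are hypothetical stand-ins (the refcount header and hash are left out); the point is only that the header and the characters are allocated as one block.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for zend_string, without the refcount header. */
typedef struct {
    size_t len;
    char val[1];   /* struct hack: the allocation really holds len + 1 bytes */
} my_string;

/* Allocate header and characters in one block and copy the C string in.
   Returns NULL if the allocation fails. */
static my_string *my_string_new(const char *src) {
    assert(src != NULL);
    size_t len = strlen(src);
    /* offsetof(..., val) is the header size; add len bytes plus the NUL. */
    my_string *s = malloc(offsetof(my_string, val) + len + 1);
    if (s == NULL) {
        return NULL;
    }
    s->len = len;
    memcpy(s->val, src, len + 1);
    return s;
}
```

The sketch also shows the cost of converting a C string into such a structure: an allocation and a copy are unavoidable, whereas going the other way is just a matter of reading val.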
The new string structure is more convenient than a plain C string. First, the length is stored directly, so it does not have to be recomputed on every use. Second, the string has its own reference count header, so it can be shared between different places without going through a zval. An important use of this is sharing hashtable keys.
However, the new string type also has a drawback: although it is easy to get a C string out of a zend_string (just use str->val), going the other way is not. To turn a C string into a zend_string you have to allocate memory for the zend_string and copy the string into it, which is inconvenient in practice.
Strings also have some specific flags (stored in the flags field of the GC header):
#define IS_STR_PERSISTENT (1<<0) /* allocated using malloc */
#define IS_STR_INTERNED   (1<<1) /* interned string */
#define IS_STR_PERMANENT  (1<<2) /* interned string surviving request boundary */
Persistent strings are allocated directly from the system allocator rather than the Zend memory manager (ZMM), so they can outlive a single request. Marking this kind of allocation with a flag lets a zval use a persistent string directly. PHP 5 did not work this way: such strings were copied into ZMM memory before use.
Interned strings are a bit special. They are not destroyed until the end of the request, so they do not need reference counting. They are also deduplicated: when a new interned string is created, the engine first checks whether an interned string with the same content already exists. All strings that appear literally in PHP source code (including string literals, variable names, and so on) are usually interned. Permanent strings are interned strings that were created before the request started. Normal interned strings are destroyed when the request ends, whereas permanent strings stay alive.
If opcache is used, interned strings are stored in shared memory (SHM) so that they can be shared among all PHP processes. In that case the permanent flag loses its meaning, because interned strings are never destroyed.
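The deduplication step for interned strings can be sketched as follows. This is a deliberately simple illustration: the table is a fixed-size array searched linearly, whereas a real engine would use a hashtable, and all names here are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define INTERN_MAX 64  /* arbitrary capacity for this sketch */

static char *intern_table[INTERN_MAX];
static size_t intern_count = 0;

/* Return the single shared copy of s, creating it if necessary.
   Returns NULL if the table is full or the allocation fails. */
static const char *intern(const char *s) {
    assert(s != NULL);
    for (size_t i = 0; i < intern_count; i++) {
        if (strcmp(intern_table[i], s) == 0) {
            return intern_table[i];   /* already interned: reuse it */
        }
    }
    if (intern_count == INTERN_MAX) {
        return NULL;
    }
    size_t n = strlen(s) + 1;
    char *copy = malloc(n);
    if (copy == NULL) {
        return NULL;
    }
    memcpy(copy, s, n);
    intern_table[intern_count++] = copy;
    return copy;
}

/* Free everything at once, like the end of a request. */
static void intern_shutdown(void) {
    for (size_t i = 0; i < intern_count; i++) {
        free(intern_table[i]);
    }
    intern_count = 0;
}
```

Because every distinct content exists only once, two interned strings can be compared by pointer, and no per-string reference count is needed: everything is released together in intern_shutdown().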
Array
Because the new array implementation was covered in an earlier article, it is not described in detail here. Some recent changes have made that description slightly inaccurate, but the basic concepts still hold.
Here we cover one array-related concept that the previous article did not mention: immutable arrays. They are essentially the array counterpart of interned strings: they are not reference counted and they live until the end of the request (or possibly longer).
Due to certain memory management concerns, immutable arrays are only used when opcache is enabled. To see the difference in practice, consider the following script:
<?php
for ($i = 0; $i < 1000000; ++$i) {
    $array[] = ['foo'];
}
var_dump(memory_get_usage());
With opcache enabled, the code above uses 32 MB of memory. Without it, every element of $array gets its own copy of ['foo'], and 390 MB is needed. A full copy is made here, rather than incrementing a reference count, because literal operands of the Zend VM are not reference counted, which prevents corruption of shared memory. I hope the memory blow-up without opcache will be improved in the future.
Objects in PHP5
To understand the PHP 7 object implementation, let's first look at PHP 5 and see where its inefficiencies are. A PHP 5 zval stores a zend_object_value structure, defined as follows:
typedef struct _zend_object_value {
    zend_object_handle handle;
    const zend_object_handlers *handlers;
} zend_object_value;
The handle is the unique ID of an object and is used to look up the object data. The handlers member is a pointer to a virtual function table that implements the various behaviors of the object. Ordinary PHP objects all share the same handler table, but objects created by PHP extensions can use a custom set of handlers to change their behavior (for example, to overload operators).
The object handle is used as an index into the "object store". The object store itself is an array of buckets, defined as follows:
typedef struct _zend_object_store_bucket {
    zend_bool destructor_called;
    zend_bool valid;
    zend_uchar apply_count;
    union _store_bucket {
        struct _store_object {
            void *object;
            zend_objects_store_dtor_t dtor;
            zend_objects_free_object_storage_t free_storage;
            zend_objects_store_clone_t clone;
            const zend_object_handlers *handlers;
            zend_uint refcount;
            gc_root_buffer *buffered;
        } obj;
        struct {
            int next;
        } free_list;
    } bucket;
} zend_object_store_bucket;
This struct contains a lot of things. The first three members are just some metadata (whether the object's destructor has been called, whether the bucket is in use, and how many times the object has been visited recursively). The union that follows distinguishes between a bucket that is in use and one that is on the free list. The most important part is the struct _store_object sub-structure:
The first member, object, is a pointer to the actual object (that is, the place where the object is finally stored). The object is not embedded directly in the object store bucket, because objects do not have a fixed size. The object pointer is followed by three handlers that manage destruction, freeing, and cloning. Note that PHP treats destroying and freeing an object as distinct steps, and the former may be skipped in some cases. The clone handler is almost never used. Because these storage handlers are not part of the normal object handlers (for whatever reason), they are duplicated for every single object rather than shared.
These object store handlers are followed by a pointer to the ordinary object handlers. It is stored here in case the object is destroyed without its zval being known (the zval is where the handlers are normally stored).
The bucket also contains a refcount field, which is somewhat odd given that the PHP 5 zval already stores a reference count. Why is another count needed? The problem is that although zvals are usually "copied" simply by incrementing the reference count, there are also cases where a hard copy is made, that is, an entirely new zval is created holding the same zend_object_value. In that case two distinct zvals use the same object store bucket, so the bucket needs a reference count as well. This "double refcounting" is one of the inherent problems of the PHP 5 implementation. The buffered pointer into the GC root buffer is duplicated for the same reason.
Now let's look at the actual object that the object store points to. For ordinary userland objects it is defined as follows:
typedef struct _zend_object {
    zend_class_entry *ce;
    HashTable *properties;
    zval **properties_table;
    HashTable *guards;
} zend_object;
The zend_class_entry pointer points to the class that the object is an instance of. The next two members store object properties in two different ways. Dynamic properties (added at runtime rather than declared in the class) are stored in properties, a hashtable that simply maps property names to values.
Declared properties are optimized: at compile time each property is assigned an index, and its value is stored at that index in properties_table. The mapping from property names to indexes is stored in a hashtable in the class entry. This way individual objects avoid the memory overhead of a hashtable, and the property index is cached in several places at runtime.
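The split between a per-class name-to-index mapping and a per-object array of slots can be sketched as follows. The types and the linear search are hypothetical simplifications; as the text says, the engine keeps the mapping in a hashtable and caches the resolved index.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical class description: declared property names in slot order. */
typedef struct {
    const char *const *prop_names;
    size_t prop_count;
} class_info;

/* Hypothetical object: one value slot per declared property. */
typedef struct {
    const class_info *ce;
    long *slots;
} simple_object;

/* Resolve a property name to its slot index, or -1 if it is not declared.
   A real engine would use a hashtable and cache the result. */
static int prop_slot(const class_info *ce, const char *name) {
    assert(ce != NULL && name != NULL);
    for (size_t i = 0; i < ce->prop_count; i++) {
        if (strcmp(ce->prop_names[i], name) == 0) {
            return (int)i;
        }
    }
    return -1;
}
```

The names are stored once per class, so each object only pays for its array of values. Once the index of a property is known, reading it is a plain array access.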
The guards hashtable is used to implement the recursion protection of magic methods such as __get. It is not discussed in detail here.
Apart from the double refcounting mentioned above, another problem with this implementation is that a minimal object with a single property takes 136 bytes of memory (not counting the zvals). There is also a lot of indirection: to fetch a property from an object zval, you first have to fetch the object store bucket, then the zend object, then the property table, and finally the zval it points to. That is at least four levels of indirection (and in practice no fewer than seven).
Objects in PHP 7
PHP 7 tries to address these problems: it removes the double reference counting, reduces memory usage, and reduces indirection. The new zend_object struct looks like this:
struct _zend_object {
    zend_refcounted gc;
    uint32_t handle;
    zend_class_entry *ce;
    const zend_object_handlers *handlers;
    HashTable *properties;
    zval properties_table[1];
};
As you can see, this structure is now (nearly) all that is left of an object: zend_object_value has been replaced by a direct pointer to the object, and the object store, while not entirely gone, is far less significant.
Besides the usual zend_refcounted header of PHP 7, the handle and the handlers of the object are now stored in the zend_object itself. The properties_table also uses the struct hack, so the zend_object and its property table are allocated as one block of memory. The property table now embeds the zvals directly instead of storing pointers to them.
The guards hashtable is no longer part of the object struct. If it is needed, that is, if the object uses __get and similar methods, it is stored in the first slot of properties_table. If no magic methods are used, the guards table is omitted entirely.
The three handlers dtor, free_storage, and clone, which used to be stored in the object store bucket, now live directly in the handlers table, whose struct is defined as follows:
struct _zend_object_handlers {
    /* offset of real object header (usually zero) */
    int offset;
    /* general object functions */
    zend_object_free_obj_t free_obj;
    zend_object_dtor_obj_t dtor_obj;
    zend_object_clone_obj_t clone_obj;
    /* individual object functions */
    // ... rest is about the same in PHP 5
};
The first member of the handler table is offset, which is clearly not a handler. It is needed because of how internal objects are represented: an internal object always embeds the standard zend_object, but usually needs to add some members of its own. In PHP 5 this was done by appending them after the standard object:
struct custom_object {
    zend_object std;
    uint32_t something;
    // ...
};
This way a zend_object* can simply be cast to struct custom_object*, which is the usual way to implement structure inheritance in C. In PHP 7 this approach has a problem: because zend_object uses the struct hack to store the property table, the PHP properties stored past the end of zend_object would overwrite the internal members added after it. PHP 7 therefore places the additional members before the standard object:
struct custom_object {
    uint32_t something;
    // ...
    zend_object std;
};
However, this means that zend_object* and struct custom_object* can no longer be converted with a simple cast, because the two are separated by an offset. This offset is what is stored in the first member of the object handler table. Its value can be determined at compile time using the offsetof() macro.
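The pointer arithmetic behind this offset can be sketched as follows. The std_object type is a simplified stand-in for zend_object, and custom_from_std() is a hypothetical helper written for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for zend_object; the real struct has more fields. */
typedef struct {
    uint32_t handle;
    long properties_table[1];  /* struct hack: grows past the end */
} std_object;

/* Custom members come first so the trailing table cannot overwrite them. */
typedef struct {
    uint32_t something;
    std_object std;
} custom_object;

/* The value that would go into the offset field of the handler table. */
static const size_t custom_object_offset = offsetof(custom_object, std);

/* Recover the enclosing custom object from a pointer to its std member. */
static custom_object *custom_from_std(std_object *obj) {
    assert(obj != NULL);
    return (custom_object *)((char *)obj - custom_object_offset);
}
```

The engine only ever sees the pointer to the embedded standard object. Subtracting the stored offset from that pointer gives the start of the allocation again, which is needed both to reach the custom members and to free the memory.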
You may wonder why PHP 7 objects still keep the handle field, given that the zend_object pointer is now stored directly (in zend_value) and the object no longer has to be looked up in the object store.
The reason is that the object store still exists, although in a greatly simplified form, so the handle is still needed. The store is now just an array of pointers to objects. When an object is created, a pointer to it is inserted into the object store and the index is saved in handle. When the object is freed, the pointer is removed from the store again.
So why is the object store still needed? During request shutdown there comes a point after which it is no longer safe to run userland code. To avoid problems, PHP runs all object destructors at an early point during shutdown and prevents them from running later, which requires a list of all active objects.
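A simplified object store of this kind can be sketched as follows. The fixed capacity, the obj type, and the function names are assumptions of this sketch; in particular, freed slots are not reused here, which a real implementation would do.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define STORE_CAPACITY 16  /* fixed size to keep the sketch short */

/* Hypothetical object: only the fields the store cares about. */
typedef struct {
    uint32_t handle;
    int destructed;
} obj;

static obj *store[STORE_CAPACITY];
static uint32_t store_top = 0;

/* Insert a pointer and record its index in the object's handle.
   Returns 0 on success, -1 if the store is full. */
static int store_put(obj *o) {
    assert(o != NULL);
    if (store_top == STORE_CAPACITY) {
        return -1;
    }
    o->handle = store_top;
    o->destructed = 0;
    store[store_top++] = o;
    return 0;
}

/* Remove an object from the store when it is freed. */
static void store_del(obj *o) {
    assert(o != NULL && o->handle < store_top && store[o->handle] == o);
    store[o->handle] = NULL;
}

/* Early shutdown step: run the destructor of every live object once.
   Returns the number of destructors that were run. */
static int store_destruct_all(void) {
    int called = 0;
    for (uint32_t i = 0; i < store_top; i++) {
        if (store[i] != NULL && !store[i]->destructed) {
            store[i]->destructed = 1;  /* stands in for calling the destructor */
            called++;
        }
    }
    return called;
}
```

The store is what makes store_destruct_all() possible: without a list of live objects there would be no way to find every object that still needs its destructor run before userland code becomes unsafe.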
The handle is also useful for debugging: it gives each object a unique ID, so it is easy to tell whether two objects are really the same object or merely have the same content. HHVM has no concept of an object store, but it still stores an object handle.
Compared with PHP 5, the current implementation has only one reference count (the zval itself is no longer counted), and memory usage is much lower: 40 bytes for the base object plus 16 bytes per declared property, and that already includes the property's zval. Indirection is also much reduced, because many intermediate structures were removed or embedded. Reading a property now takes a single level of indirection instead of four.
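The size figures can be related to the struct layout with a small sketch. The mini_zval and mini_object types below are stand-ins that mirror the field order of the PHP 7 zend_object shown earlier, with opaque pointers in place of the real types. On a typical 64-bit platform the header is expected to come out at 40 bytes and each slot at 16 bytes, which would match the figures above; this is an assumption about the platform, not a guarantee.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for a zval: an 8-byte value plus two 32-bit info words. */
typedef struct {
    union { long lval; double dval; void *ptr; } value;
    uint32_t type_info;
    uint32_t extra;
} mini_zval;

/* Stand-in mirroring the field order of the PHP 7 zend_object. */
typedef struct {
    struct { uint32_t refcount; uint32_t type_info; } gc;
    uint32_t handle;
    void *ce;
    const void *handlers;
    void *properties;
    mini_zval properties_table[1];
} mini_object;

/* Bytes needed for an object with nprops declared properties. */
static size_t object_alloc_size(size_t nprops) {
    size_t header = offsetof(mini_object, properties_table);
    assert(nprops <= (SIZE_MAX - header) / sizeof(mini_zval));
    return header + nprops * sizeof(mini_zval);
}
```

Because the property slots are embedded at the end of the object, the total size is simply the header plus one zval-sized slot per declared property, with no separate allocation for the table.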
Indirect zval
That covers all the normal zval types, but there are also a couple of special types used in specific situations. One of them is IS_INDIRECT, newly added in PHP 7.
An indirect zval means that the actual value is stored somewhere else. Note that this differs from the IS_REFERENCE type: an indirect zval points directly to another zval, rather than to a zend_reference struct that embeds a zval.
To understand when this happens, let's look at how variables are implemented in PHP (the storage of object properties works the same way).
All variables known at compile time are assigned an index, and their values are stored at that position in the compiled variables (CV) table. But PHP also allows variables to be referenced dynamically, whether local or global (for example through $GLOBALS). When this happens, PHP creates a symbol table for the script or function, which contains a mapping from variable names to their values.
This raises the question: how can both forms of access be supported at once? Ordinary variable fetches need to go through the CV table, while dynamic fetches need to go through the symbol table. In PHP 5 the CV table used doubly indirected zval** pointers. Normally these pointed into a second table of zval* pointers, which in turn pointed to the actual zvals:
+------ CV_ptr_ptr[0]
| +---- CV_ptr_ptr[1]
| | +-- CV_ptr_ptr[2]
| | |
| | +-> CV_ptr[0] --> some zval
| +---> CV_ptr[1] --> some zval
+-----> CV_ptr[2] --> some zval
When a symbol table came into use, the second table of zval* pointers was left unused and the zval** pointers were updated to point into the corresponding hashtable buckets instead. Assuming three variables $a, $b, and $c, the result looked like this:
CV_ptr_ptr[0] --> SymbolTable["a"].pDataPtr --> some zval
CV_ptr_ptr[1] --> SymbolTable["b"].pDataPtr --> some zval
CV_ptr_ptr[2] --> SymbolTable["c"].pDataPtr --> some zval
This approach no longer works in PHP 7, because a pointer into a hashtable bucket becomes invalid when the hashtable is resized. PHP 7 therefore uses the reverse strategy: for variables stored in the CV table, the symbol table holds an INDIRECT entry that points to the CV slot. The CV table is never reallocated during the lifetime of the symbol table, so there are no dangling pointers.
So if a function has the CVs $a, $b, and $c, as well as a dynamically created variable $d, its symbol table might look like this:
SymbolTable["a"].value = INDIRECT --> CV[0] = LONG 42
SymbolTable["b"].value = INDIRECT --> CV[1] = DOUBLE 42.0
SymbolTable["c"].value = INDIRECT --> CV[2] = STRING --> zend_string("42")
SymbolTable["d"].value = ARRAY --> zend_array([4, 2])
An indirect zval can also point to a zval of type IS_UNDEF. In that case the hashtable is treated as if it did not contain the associated key. So when unset($a) writes the UNDEF type into CV[0], the symbol table is treated as no longer having an entry with the key "a".
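The lookup behavior described above can be sketched with a miniature value type. Everything here is hypothetical: the value and symbol types, the T_* constants, and the linear search only illustrate how a lookup follows an INDIRECT entry and treats an UNDEF target as a missing key.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef enum { T_UNDEF, T_LONG, T_INDIRECT } val_type;

/* Hypothetical miniature zval: a long, or a pointer to another value. */
typedef struct value {
    val_type type;
    union {
        long lval;
        struct value *ind;
    } u;
} value;

/* Hypothetical symbol table entry. */
typedef struct {
    const char *name;
    value val;
} symbol;

/* Look a name up; follow INDIRECT and treat UNDEF targets as missing. */
static value *symtab_find(symbol *tab, size_t n, const char *name) {
    assert(tab != NULL && name != NULL);
    for (size_t i = 0; i < n; i++) {
        if (strcmp(tab[i].name, name) != 0) {
            continue;
        }
        value *v = &tab[i].val;
        if (v->type == T_INDIRECT) {
            v = v->u.ind;   /* the real value lives in the CV slot */
        }
        return v->type == T_UNDEF ? NULL : v;
    }
    return NULL;
}
```

With this arrangement the CV slot stays the single place where the value lives. Unsetting the variable only has to mark the slot as UNDEF; the symbol table entry can stay in place and lookups will report the key as absent.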
Constant and AST
There are two more special types, IS_CONSTANT and IS_CONSTANT_AST, which exist in both PHP 5 and PHP 7. To understand them, consider the following example:
<?php
function test($a = ANSWER, $b = ANSWER * ANSWER) {
    return $a + $b;
}
define('ANSWER', 42);
var_dump(test()); // int(42 + 42 * 42)
The default values of both parameters of test() use the constant ANSWER, but the constant is not yet defined when the function is declared. Its value only becomes known once define() has been called.
For this reason, parameter and property default values, constants, and everything else that accepts a "static expression" support "delayed binding": the expression is not evaluated until first use.
If the value is a constant (or a class constant), which is the most common case for delayed binding, a zval of type IS_CONSTANT holding the constant name is used. If the value is an expression, a zval of type IS_CONSTANT_AST is used, which points to the abstract syntax tree (AST) of the expression.
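Delayed binding for the plain constant case can be sketched as follows. The lazy_value and constant types and the lazy_resolve() function are hypothetical, and the AST case is left out; the sketch only shows a placeholder holding a constant name being replaced by the real value on first use.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef enum { V_LONG, V_CONSTANT } kind;

/* Hypothetical default value: a resolved long or a pending constant name. */
typedef struct {
    kind type;
    long lval;
    const char *name;
} lazy_value;

/* Hypothetical table entry for a defined constant. */
typedef struct {
    const char *name;
    long value;
} constant;

/* Resolve on first use.
   Returns 0 on success, -1 if the constant is not defined yet. */
static int lazy_resolve(lazy_value *v, const constant *defs, size_t n) {
    assert(v != NULL);
    if (v->type == V_LONG) {
        return 0;               /* already bound */
    }
    for (size_t i = 0; i < n; i++) {
        if (strcmp(defs[i].name, v->name) == 0) {
            v->lval = defs[i].value;
            v->type = V_LONG;   /* overwrite the placeholder */
            return 0;
        }
    }
    return -1;
}
```

Resolution fails while the constant is undefined and succeeds once it has been defined. After that the placeholder has become an ordinary value, so later uses skip the lookup entirely.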
This concludes the walkthrough of variable implementation in PHP 7. I may write two more articles later covering some virtual machine optimizations, the new calling convention, and some improvements to the compiler infrastructure (these are the original author's words).