The previous article introduced the internal implementation of variables in PHP7 (part one). This article continues with the remaining material on PHP7 internals.
The first and second parts of this series are translated from the blog of Nikita Popov (nikic, a member of the official PHP development team and a student at the Technical University of Berlin). The translation is not word for word.
To follow this article, you should have some understanding of how variables are implemented in PHP5. The focus here is on how zvals changed in PHP7.
The first part covered the implementation of the most basic variables in PHP5 and PHP7 and how it changed. To recap, the main change is that zvals are no longer allocated individually and no longer store a reference count themselves. Simple types such as integers and floats are stored directly in the zval. Complex types are reached through a pointer to a separate structure.
Complex zval values share a common header, whose structure is defined by zend_refcounted:
struct _zend_refcounted {
    uint32_t refcount;
    union {
        struct {
            ZEND_ENDIAN_LOHI_3(
                zend_uchar type,
                zend_uchar flags,
                uint16_t   gc_info)
        } v;
        uint32_t type_info;
    } u;
};
This header stores the refcount (reference count), the type of the value, the garbage collection information gc_info, and the type-specific flag bits in flags.
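As a rough illustration of how such a shared header is used, here is a minimal sketch. The names my_refcounted, my_value, my_addref and my_release are hypothetical stand-ins rather than the engine's API, and the layout is simplified (no endianness macro, no union).

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for zend_refcounted: a header shared by every
 * heap-allocated value. The layout is simplified for this sketch. */
typedef struct {
    uint32_t refcount;
    uint8_t  type;
    uint8_t  flags;
    uint16_t gc_info;
} my_refcounted;

/* A complex value begins with the header, so a pointer to the value
 * is also a valid pointer to its header. */
typedef struct {
    my_refcounted gc;
    long          payload;
} my_value;

/* Allocate a value owned by exactly one holder. */
static my_value *my_value_new(long payload) {
    my_value *v = malloc(sizeof *v);
    if (v == NULL) return NULL;
    v->gc.refcount = 1;
    v->gc.type = 0;
    v->gc.flags = 0;
    v->gc.gc_info = 0;
    v->payload = payload;
    return v;
}

/* Sharing a value only increments the counter in the header. */
static void my_addref(my_refcounted *rc) {
    rc->refcount++;
}

/* Drop one owner. Returns 1 if the value was freed, 0 otherwise. */
static int my_release(my_refcounted *rc) {
    assert(rc->refcount > 0);
    if (--rc->refcount == 0) {
        free(rc); /* the header is the first member, so this frees the value */
        return 1;
    }
    return 0;
}
```

Because the count lives in the header rather than in the zval, any code holding a pointer to the value can share it without going through a zval.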
The following sections analyze the implementation of each complex type and compare it with PHP5. References are also a complex type, but they were covered in the previous part and are not repeated here. Resources are not discussed either, because the author does not consider them interesting enough to cover.
String
A new struct zend_string is defined in PHP7 to store string variables:
struct _zend_string {
    zend_refcounted gc;
    zend_ulong      h;        /* hash value */
    size_t          len;
    char            val[1];
};
Apart from the reference-counting header, a string contains the hash cache h, the string length len, and the string value val. The hash cache avoids recomputing the hash every time the string is used as a key to look something up in a HashTable. It is initialized on first use.
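The idea of a lazily initialized hash cache can be sketched as follows. The type my_hashed_string and the functions below are hypothetical. The hash function is a simple DJB-style hash chosen for illustration, and the convention that 0 means "not computed yet" is an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical string with a cached hash. In this sketch a value of 0
 * in h means "not computed yet". */
typedef struct {
    unsigned long h;
    size_t        len;
    const char   *val;
} my_hashed_string;

/* Counts how often the hash is actually computed (for demonstration). */
static int hash_computations = 0;

/* DJB-style hash, forced to be non-zero so that 0 can mean "no cache". */
static unsigned long compute_hash(const char *s, size_t len) {
    unsigned long h = 5381;
    for (size_t i = 0; i < len; i++) {
        h = h * 33 + (unsigned char)s[i];
    }
    hash_computations++;
    return h != 0 ? h : 1;
}

/* Return the cached hash, computing it only on first use. */
static unsigned long my_string_hash(my_hashed_string *s) {
    if (s->h == 0) {
        s->h = compute_hash(s->val, s->len);
    }
    return s->h;
}
```

Calling my_string_hash twice on the same string runs the hash loop only once. The second call returns the cached value.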
If you don't know much about C, the definition of val may look odd: it is declared with a single element, yet we obviously want to store strings longer than one character. This is a technique known as the "struct hack": the array is declared with only one element, but when a zend_string is actually created, enough memory is allocated to hold the whole string. The full string can then still be accessed through val.
Strictly speaking this is undefined behavior, because we read and write past the end of a single-element array. In practice, C compilers know not to interfere with code written this way. C99 explicitly supports the same idea in the form of "flexible array members", but thanks to our good friends at Microsoft, C99 cannot be relied on by anyone who needs cross-platform compatibility (which is why the struct hack is used here: flexible array members were not supported on Windows).
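A minimal sketch of the struct hack described above, using a hypothetical my_string type rather than the real zend_string. The header and the character data live in one allocation whose size is computed with offsetof().

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical string type using the struct hack, modeled loosely on
 * zend_string (no refcount header or hash cache in this sketch). */
typedef struct {
    size_t len;
    char   val[1];   /* declared with one element, allocated with len + 1 */
} my_string;

/* Allocate header and characters in one block and copy the bytes in. */
static my_string *my_string_init(const char *src, size_t len) {
    my_string *s = malloc(offsetof(my_string, val) + len + 1);
    if (s == NULL) return NULL;
    s->len = len;
    memcpy(s->val, src, len);
    s->val[len] = '\0';   /* keep it usable as a C string */
    return s;
}
```

Note that building a my_string from a C string always involves an allocation and a copy, whereas going the other way is just reading s->val.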
The new string type is more convenient to use than a plain C string. First, the length is stored directly, so it does not have to be computed each time it is needed. Second, the string has its own reference-counting header, so the string itself can be shared in different places without involving a zval. A common use is sharing HashTable keys.
The new string type also has one significant drawback. It is easy to get a C string out of a zend_string (just use str->val), but the reverse is not direct: to turn a C string into a zend_string, you first have to allocate memory for the zend_string and then copy the string into it. This is inconvenient in practice.
Strings also have some flags of their own (stored in the flags field of the gc header):
#define IS_STR_PERSISTENT (1<<0) /* allocated using malloc */
#define IS_STR_INTERNED   (1<<1) /* interned string */
#define IS_STR_PERMANENT  (1<<2) /* interned string surviving request boundary */
Persistent strings are allocated directly with the system allocator rather than the Zend memory manager (ZMM), so they can outlive a single request. Recording the allocator as a flag lets zvals use persistent strings transparently. PHP5 did not handle this case: a persistent string had to be copied into ZMM memory before use.
Interned strings are a bit special: they are not destroyed until the end of the request, so they do not need reference counting. Interned strings are also deduplicated, so when a new interned string is created, the engine first checks whether one with the same content already exists. All strings that appear literally in PHP source code (string literals, variable names, function names, and so on) are interned. Permanent strings are interned strings that were created before the request began. Ordinary interned strings are destroyed at the end of the request, while permanent strings are kept.
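The deduplication step can be illustrated with a small sketch. It is not how the engine does it: the fixed-size table, the linear scan, and the function names intern and interned_free_all are assumptions made for brevity (a real implementation would use a hash table).

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MAX_INTERNED 64   /* fixed capacity, chosen only for this sketch */

static char  *interned[MAX_INTERNED];
static size_t interned_count = 0;

/* Return the canonical copy of s, creating it the first time it is seen.
 * Returns NULL if the table is full or allocation fails. */
static const char *intern(const char *s) {
    for (size_t i = 0; i < interned_count; i++) {
        if (strcmp(interned[i], s) == 0) {
            return interned[i];   /* already interned: reuse it */
        }
    }
    if (interned_count == MAX_INTERNED) return NULL;
    size_t n = strlen(s) + 1;
    char *copy = malloc(n);
    if (copy == NULL) return NULL;
    memcpy(copy, s, n);
    interned[interned_count++] = copy;
    return copy;
}

/* Models the end of a request: all interned strings are destroyed at
 * once, so no per-string reference count is needed. */
static void interned_free_all(void) {
    for (size_t i = 0; i < interned_count; i++) {
        free(interned[i]);
        interned[i] = NULL;
    }
    interned_count = 0;
}
```

Two calls to intern with equal content return the same pointer, so interned strings can be compared by address.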
If opcache is used, interned strings are stored in shared memory (SHM) so that they can be shared across all PHP processes. In that case the notion of permanent strings becomes irrelevant, because interned strings are never destroyed.
Array
An earlier article already covered the new array implementation, so it is not described in detail here. Some recent changes have made parts of that description inaccurate, but the basic concepts still hold.
One array-related concept was not mentioned in that article: immutable arrays. They are essentially the array counterpart of interned strings: they are not reference counted and they live until the end of the request (and possibly beyond it).
For memory-management reasons, immutable arrays are only used when opcache is enabled. To see the effect in practice, consider the following script:
<?php
for ($i = 0; $i < 1000000; ++$i) {
    $array[] = ['foo'];
}
var_dump(memory_get_usage());
With opcache enabled, this code uses 32MB of memory. Without opcache it needs 390MB, because every element of $array gets its own copy of ['foo']. The reason a full copy is made instead of incrementing a reference count is that literal operands of the Zend virtual machine are not reference counted, which prevents shared memory from being corrupted. Hopefully this memory blow-up without opcache will be improved in the future.
Objects in PHP5
Before looking at the object implementation in PHP7, let's go back to PHP5 and see where its inefficiencies lie. In PHP5 the zval stores a zend_object_value structure, defined as follows:
typedef struct _zend_object_value {
    zend_object_handle handle;
    const zend_object_handlers *handlers;
} zend_object_value;
The handle is a unique ID for the object and is used to look up the object's data. The handlers member points to a virtual function table that implements the object's various behaviors. Ordinary PHP objects all share the same handler table, but objects created by PHP extensions can use a custom table to change their behavior, for example by overloading operators.
The object handle is used as an index into the "object store", which is an array of object store buckets. A bucket is defined as follows:
typedef struct _zend_object_store_bucket {
    zend_bool destructor_called;
    zend_bool valid;
    zend_uchar apply_count;
    union _store_bucket {
        struct _store_object {
            void *object;
            zend_objects_store_dtor_t dtor;
            zend_objects_free_object_storage_t free_storage;
            zend_objects_store_clone_t clone;
            const zend_object_handlers *handlers;
            zend_uint refcount;
            gc_root_buffer *buffered;
        } obj;
        struct {
            int next;
        } free_list;
    } bucket;
} zend_object_store_bucket;
There is a lot going on in this structure. The first three members are plain metadata: whether the object's destructor has been called, whether the bucket is in use at all, and how many times the object has been visited by a recursive algorithm. The union that follows distinguishes between a bucket that is in use and one that is on the free list. The part that matters most here is the struct _store_object sub-structure.
Its first member, object, is a pointer to the actual object (finally, the place where the object is really stored). The object is not embedded directly in the object store bucket, because objects do not have a fixed size. The object pointer is followed by three handlers that manage destruction, freeing, and cloning. Note that in PHP, destroying and freeing an object are separate steps, and destruction may be skipped in some cases (an unclean shutdown). The clone handler is practically never used. Because these handlers are not part of the normal object handlers, for whatever reason, they are duplicated in every single object rather than shared.
These object store handlers are followed by a pointer to the ordinary object handlers. It is stored here because an object sometimes has to be destroyed without a zval being known (normally the handlers are reached through the zval).
The bucket also contains a refcount field, which looks odd in PHP5 because the zval already stores a reference count. Why is a second count needed? The problem is that although "copying" a zval normally just increments its reference count, a hard copy occasionally occurs, that is, a completely new zval is allocated with the same zend_object_value. Two distinct zvals then use the same object store bucket, so the bucket itself has to be reference counted as well. This "double refcounting" is an inherent problem of the PHP5 implementation. The buffered pointer into the GC root buffer is duplicated for the same reason.
Now let's look at the structure of the actual object that the pointer in the object store points to. An ordinary userland object is defined as follows:
typedef struct _zend_object {
    zend_class_entry *ce;
    HashTable *properties;
    zval **properties_table;
    HashTable *guards;
} zend_object;
The zend_class_entry pointer points to the class that the object is an instance of. The next two members store object properties in two different ways. Dynamic properties (those added at run time rather than declared in the class) all live in the properties hashtable, which simply maps property names to values.
Declared properties, however, are optimized: each one is assigned an index at compile time, and its value is stored at that index in properties_table. The mapping from property name to index is stored in a hashtable in the class entry. This spares each object the memory overhead of its own hashtable, and the index of a property is cached polymorphically at run time.
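The slot-based layout for declared properties can be sketched like this. The types my_class and my_object and the function my_class_prop_index are hypothetical, and the name lookup is a linear scan instead of the hashtable the text describes.

```c
#include <assert.h>
#include <string.h>

#define MAX_PROPS 4   /* fixed capacity, chosen only for this sketch */

/* Hypothetical class entry: the name-to-index mapping is stored once
 * per class, not once per object. */
typedef struct {
    const char *prop_names[MAX_PROPS];
    int         prop_count;
} my_class;

/* Hypothetical object: values are stored by index only. */
typedef struct {
    const my_class *ce;
    long            slots[MAX_PROPS];
} my_object;

/* Resolve a property name to its slot index, or -1 if not declared.
 * In a real engine this lookup would be done once and then cached. */
static int my_class_prop_index(const my_class *ce, const char *name) {
    for (int i = 0; i < ce->prop_count; i++) {
        if (strcmp(ce->prop_names[i], name) == 0) {
            return i;
        }
    }
    return -1;
}
```

Once the index is known, reading a declared property is a plain array access such as obj.slots[index], with no per-object hashtable involved.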
The guards hashtable is used to implement the recursion behavior of magic methods such as __get. It is not discussed further here.
Besides the double refcounting problem mentioned above, this implementation is heavy on memory: a minimal object with a single property needs 136 bytes (not counting the memory for zvals). There is also a lot of indirection. For example, to fetch a property from an object zval, you first fetch the object store bucket, then the zend object, then the properties table, and only then the zval it points to. That is at least four levels of indirection (and in practice no fewer than seven).
Objects in PHP7
The PHP7 implementation tries to address these problems: it removes the double refcounting, reduces memory usage, and cuts down on indirection. The new zend_object structure looks like this:
struct _zend_object {
    zend_refcounted   gc;
    uint32_t          handle;
    zend_class_entry *ce;
    const zend_object_handlers *handlers;
    HashTable        *properties;
    zval              properties_table[1];
};
As you can see, this structure is now nearly all there is to an object. The zend_object_value has been replaced by a direct pointer to the object, and the object store, although not removed entirely, plays a much smaller role.
Apart from the usual PHP7 zend_refcounted header, the handle and the handlers of the object now live in zend_object. The properties_table also uses the struct hack, so the zend_object and its property table are allocated as one chunk of memory. And the property table now embeds zvals directly instead of holding pointers to them.
The guards table is no longer present in the object structure. If it is needed, that is, if the object uses methods such as __get, it is stored in the first slot of properties_table. If no magic methods are used, the guards table is omitted entirely.
The dtor, free_storage, and clone handlers, which used to be stored in the object store bucket, are now part of the handlers table, whose structure begins as follows:
struct _zend_object_handlers {
    /* offset of real object header (usually zero) */
    int offset;
    /* general object functions */
    zend_object_free_obj_t free_obj;
    zend_object_dtor_obj_t dtor_obj;
    zend_object_clone_obj_t clone_obj;
    /* individual object functions */
    // ... rest is about the same as in PHP 5
};
The first member of the handler table is offset, which is clearly not a handler. The offset is required by the way internal objects are now represented: an internal object always embeds the standard zend_object, but usually needs some additional members as well. In PHP5 this was done by adding them after the standard object:
struct custom_object {
    zend_object std;
    uint32_t something;
    // ...
};
With this layout, a zend_object* can simply be cast to a struct custom_object*. This is the usual way of implementing structure inheritance in C. In PHP7, however, this approach has a problem: because zend_object uses the struct hack to store the property table, PHP stores properties past the end of zend_object and would overwrite the internal members added after it. So in PHP7 the additional members are placed before the standard object structure instead:
struct custom_object {
    uint32_t something;
    // ...
    zend_object std;
};
This means, however, that a zend_object* and a struct custom_object* can no longer be converted into each other with a simple cast, because the two are separated by an offset. That offset is what is stored in the first member of the object handler table. It can be determined at compile time with the offsetof() macro.
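The conversion via offsetof() can be sketched as follows. The type my_zend_object is a hypothetical stand-in for zend_object, and custom_from_obj is a helper name invented for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the standard object structure. */
typedef struct {
    uint32_t handle;
} my_zend_object;

/* Custom members come first, the standard object comes last, so a
 * trailing property table cannot overwrite the custom members. */
typedef struct {
    uint32_t       something;
    my_zend_object std;
} custom_object;

/* Recover the enclosing custom_object from a pointer to its embedded
 * standard object by subtracting the compile-time offset. */
static custom_object *custom_from_obj(my_zend_object *obj) {
    return (custom_object *)((char *)obj - offsetof(custom_object, std));
}
```

Going the other way needs no arithmetic: the embedded standard object is simply &custom->std.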
You may wonder why PHP7 objects still have a handle field. After all, the zend_object pointer is now stored directly (in zend_value), so there is no need to look the object up in the object store.
The reason is that the object store still exists. It has been greatly simplified, but the handle is still needed for it. The store is now just an array of pointers to objects. When an object is created, a pointer to it is inserted into the object store and its index is saved in handle. The entry is removed when the object is freed.
So why is the object store still needed? During request shutdown there comes a point after which it is no longer safe to run userland code or to access the data behind these pointers. To avoid that situation, PHP runs the destructors of all objects at an earlier point during shutdown and prevents them from running later, and this requires a list of all live objects.
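A reduced object store of this kind might look like the following sketch. The fixed capacity, the linear search for a free slot, and all names (my_object, store_put, store_del, store_call_destructors) are assumptions for illustration, not the engine's implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define STORE_SIZE 8   /* fixed capacity, chosen only for this sketch */

/* Hypothetical stand-in for zend_object: only what the store needs. */
typedef struct {
    uint32_t handle;
    int      destructor_called;
} my_object;

/* The store is just an array of pointers. NULL marks a free slot. */
static my_object *store[STORE_SIZE];

/* Insert an object and record its index as the handle.
 * Returns 1 on success, 0 if the store is full. */
static int store_put(my_object *obj) {
    for (uint32_t i = 0; i < STORE_SIZE; i++) {
        if (store[i] == NULL) {
            store[i] = obj;
            obj->handle = i;
            return 1;
        }
    }
    return 0;
}

/* Remove an object from the store when it is freed. */
static void store_del(my_object *obj) {
    assert(obj->handle < STORE_SIZE && store[obj->handle] == obj);
    store[obj->handle] = NULL;
}

/* Early shutdown pass: mark every live object as destructed exactly
 * once. Returns how many destructors would have been run. */
static int store_call_destructors(void) {
    int called = 0;
    for (uint32_t i = 0; i < STORE_SIZE; i++) {
        if (store[i] != NULL && !store[i]->destructor_called) {
            store[i]->destructor_called = 1;
            called++;
        }
    }
    return called;
}
```

A second call to store_call_destructors finds nothing left to do, which models the rule that destructors do not run again later in shutdown.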
The handle is also useful for debugging, because it gives each object a unique ID, which makes it easy to tell whether two objects are really the same object or merely have the same content. HHVM has no concept of an object store, yet it also stores an object handle.
Compared with PHP5, there is now only one reference count (the zval itself no longer has one), and memory usage is much lower: 40 bytes for the base object plus 16 bytes for each declared property, and that already includes the zval. Indirection has also improved significantly, because the intermediate structures were either removed or embedded directly. Reading a property now takes a single level of indirection instead of four.
Indirect zval
By now we have covered essentially all the normal zval types, but there are a couple of special types used only in certain situations. One of them is IS_INDIRECT, which is new in PHP7.
An indirect zval means that the real value is stored somewhere else. Note that this differs from the IS_REFERENCE type: an indirect zval points directly to another zval, rather than to a zend_reference structure that embeds a zval.
To understand when this is needed, let's look at how PHP implements variables (the same applies to object property storage).
Every variable known at compile time is assigned an index, and its value is stored at that position in the compiled variables (CV) table. But PHP also lets you reference variables dynamically, either through variable variables or, in the global scope, through $GLOBALS. When that happens, PHP creates a symbol table for the script or function that maps variable names to their values.
This raises a question: how can both kinds of access be supported at the same time? Normal variable fetches need to go through the CV table, while dynamic accesses need to go through the symbol table. In PHP5 the CV table used doubly indirected zval** pointers. Normally these pointed into an intermediate table of zval* pointers, which in turn pointed to the actual zvals:
+------ CV_ptr_ptr[0]
| +---- CV_ptr_ptr[1]
| | +-- CV_ptr_ptr[2]
| | |
| | +-> CV_ptr[0] --> some zval
| +---> CV_ptr[1] --> some zval
+-----> CV_ptr[2] --> some zval
Once the symbol table came into use, the intermediate table of zval* pointers was no longer used, and the zval** pointers were updated to point at the corresponding locations in the hashtable buckets. Assuming the three variables are $a, $b, and $c, the picture looks like this:
CV_ptr_ptr[0] --> SymbolTable["a"].pDataPtr --> some zval
CV_ptr_ptr[1] --> SymbolTable["b"].pDataPtr --> some zval
CV_ptr_ptr[2] --> SymbolTable["c"].pDataPtr --> some zval
This approach no longer works in PHP7, because pointers into hashtable buckets become invalid when the hashtable is resized. So PHP7 uses the opposite strategy: for variables stored in the CV table, the symbol table holds an INDIRECT entry that points to the CV entry. The CV table is never reallocated during the lifetime of the symbol table, so there is no problem with invalidated pointers.
So if a function has $a, $b, and $c in its CV table, plus a dynamically created variable $d, the symbol table might look like this:
SymbolTable["a"].value = INDIRECT --> CV[0] = LONG 42
SymbolTable["b"].value = INDIRECT --> CV[1] = DOUBLE 42.0
SymbolTable["c"].value = INDIRECT --> CV[2] = STRING --> zend_string("42")
SymbolTable["d"].value = ARRAY --> zend_array([4, 2])
An indirect zval can also point to a zval of type IS_UNDEF, in which case it is treated as if the hashtable did not contain the associated key. So when unset($a) marks CV[0] as UNDEF, the symbol table is treated as no longer having an entry with the key a.
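The lookup rule for indirect entries can be sketched like this. The my_zval type, its three type tags, and the function symtable_resolve are simplified, hypothetical stand-ins for the engine's structures.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily reduced set of zval types. */
typedef enum { T_UNDEF, T_LONG, T_INDIRECT } my_type;

typedef struct my_zval {
    my_type type;
    union {
        long            lval;  /* valid when type == T_LONG */
        struct my_zval *zv;    /* valid when type == T_INDIRECT */
    } value;
} my_zval;

/* Resolve a symbol table entry: follow an INDIRECT entry to the CV
 * slot it points at, and treat an UNDEF target as "key not present". */
static my_zval *symtable_resolve(my_zval *entry) {
    if (entry->type == T_INDIRECT) {
        entry = entry->value.zv;
    }
    return entry->type == T_UNDEF ? NULL : entry;
}
```

The symbol table entry itself never changes here. Unsetting the variable only flips the type of the CV slot, and the lookup then reports the key as missing.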
Constants and AST
Two more special types deserve a mention: IS_CONSTANT and IS_CONSTANT_AST, which exist in both PHP5 and PHP7. To understand them, consider the following example:
<?php
function test($a = ANSWER, $b = ANSWER * ANSWER) {
    return $a + $b;
}
define('ANSWER', 42);
var_dump(test()); // int(42 + 42 * 42)
The default values of both parameters of test() use the constant ANSWER, but that constant is not yet defined when the function is declared. Its value only becomes known once the define() call has run.
For this reason, default values of parameters and properties, constants, and everything else that accepts a "static expression" support delayed evaluation: the expression is not evaluated until it is first used.
If the value is a constant (or a class constant), which is the most common case of delayed evaluation, a zval of type IS_CONSTANT is used. If the value is an expression, a zval of type IS_CONSTANT_AST is used, which points to the abstract syntax tree (AST) of the expression.
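Delayed evaluation of a constant default value can be sketched as follows. This covers only the simple constant case, not the AST case. The types and functions (my_value, my_define, my_value_resolve) and the small fixed-size constant table are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_CONSTANTS 8   /* fixed capacity, chosen only for this sketch */

typedef enum { V_LONG, V_CONSTANT } my_kind;

/* Hypothetical value: either a concrete integer or the name of a
 * constant that has not been looked up yet. */
typedef struct {
    my_kind     kind;
    long        lval;   /* valid when kind == V_LONG */
    const char *name;   /* valid when kind == V_CONSTANT */
} my_value;

static const char *const_names[MAX_CONSTANTS];
static long        const_values[MAX_CONSTANTS];
static int         const_count = 0;

/* Register a constant at run time. Returns 1 on success, 0 if full. */
static int my_define(const char *name, long value) {
    if (const_count == MAX_CONSTANTS) return 0;
    const_names[const_count] = name;
    const_values[const_count] = value;
    const_count++;
    return 1;
}

/* Resolve a value on first use. A V_CONSTANT value is replaced in place
 * by the constant's current value. Returns 0 if it is still undefined. */
static int my_value_resolve(my_value *v) {
    if (v->kind != V_CONSTANT) return 1;   /* already concrete */
    for (int i = 0; i < const_count; i++) {
        if (strcmp(const_names[i], v->name) == 0) {
            v->kind = V_LONG;
            v->lval = const_values[i];
            return 1;
        }
    }
    return 0;
}
```

A default value declared as { V_CONSTANT, 0, "ANSWER" } cannot be resolved before my_define("ANSWER", 42) has run. After that call, resolving it turns it into a plain integer.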
This concludes the analysis of how variables are implemented in PHP7. I may write two more articles later on some of the virtual machine optimizations, the new calling convention, and some improvements to the compiler infrastructure (these are the original author's words).