The previous article introduced the internal implementation of variables in PHP7 (part 1). This article continues the topic and covers the rest of PHP7's internal value representation; interested readers are welcome to study along.
Parts one and two of this series are translated from articles by Nikita Popov (nikic, a member of the official PHP development team and a student at the Technical University of Berlin). To read more naturally for Chinese readers, the translation is not word for word.
To follow this article, you should have some understanding of how variables are implemented in PHP5; the focus here is on explaining the zval changes in PHP7.
The first part covered the basic implementation of variables in PHP5 and PHP7 and how it changed. To recap, the main change is that zvals are no longer allocated individually and no longer store a reference count themselves. Simple types such as integers and floats are stored directly in the zval, while complex types are represented by a pointer to a separate structure.
The separate structures for complex zval values all share a common header, which is defined by zend_refcounted:
struct _zend_refcounted {
    uint32_t refcount;
    union {
        struct {
            ZEND_ENDIAN_LOHI_3(
                zend_uchar    type,
                zend_uchar    flags,
                uint16_t      gc_info)
        } v;
        uint32_t type_info;
    } u;
};
This header stores the refcount (reference count), the type of the data structure, the cycle collection information gc_info, and a flags field for type-specific flags.
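As a rough illustration of how the three small fields share the single 32-bit type_info word, here is a minimal sketch that uses shifts instead of the union. The names refcounted_sketch, pack_type_info, info_type, info_flags, info_gc and header_init are hypothetical, and the bit layout (type in the low byte, flags in the next byte, gc_info in the upper 16 bits) is an assumption about what ZEND_ENDIAN_LOHI_3 produces; it is not code from the PHP source.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for zend_refcounted with only the combined word. */
typedef struct {
    uint32_t refcount;
    uint32_t type_info;
} refcounted_sketch;

/* Assumed layout: bits 0-7 type, bits 8-15 flags, bits 16-31 gc_info. */
uint32_t pack_type_info(uint8_t type, uint8_t flags, uint16_t gc_info)
{
    return (uint32_t)type | ((uint32_t)flags << 8) | ((uint32_t)gc_info << 16);
}

uint8_t info_type(uint32_t type_info)
{
    return (uint8_t)(type_info & 0xffu);
}

uint8_t info_flags(uint32_t type_info)
{
    return (uint8_t)((type_info >> 8) & 0xffu);
}

uint16_t info_gc(uint32_t type_info)
{
    return (uint16_t)(type_info >> 16);
}

/* A freshly created structure has one owner and no GC information yet. */
void header_init(refcounted_sketch *h, uint8_t type, uint8_t flags)
{
    assert(h != NULL);
    h->refcount = 1;
    h->type_info = pack_type_info(type, flags, 0);
}
```

Writing the whole word at once, as header_init does, is the kind of operation the combined type_info member makes possible; the individual members are still there when only one field is needed.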
The sections below look at each complex type in turn and compare it with the PHP5 implementation. References are also a complex type, but they were covered in the previous part and are not repeated here. Resources are not covered either (the author does not find them interesting).
String
PHP7 defines a new structure, zend_string, to represent strings:
struct _zend_string {
    zend_refcounted   gc;
    zend_ulong        h;        /* hash value */
    size_t            len;
    char              val[1];
};
In addition to the refcounted header, a string contains a hash cache h, the string length len, and the string value val. The hash cache avoids recomputing the hash every time the string is used as a key for a hashtable lookup; it is initialized the first time the hash is needed.
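The hash cache can be pictured with a small sketch. Everything here is hypothetical: cached_str stands in for the h, len and val members, the hash is a simple multiply-by-33 loop chosen for illustration, and the convention that 0 means "not computed yet" is an assumption of the sketch rather than a statement about the PHP source.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the h/len/val part of zend_string. */
typedef struct {
    unsigned long h;      /* cached hash, 0 means "not computed yet" */
    size_t        len;
    const char   *val;
} cached_str;

/* Counts how often the hash is really computed (for demonstration only). */
unsigned hash_computations = 0;

/* Multiply-by-33 hash; the top bit is forced on so the result is never 0. */
unsigned long compute_hash(const char *s, size_t len)
{
    unsigned long hash = 5381UL;
    for (size_t i = 0; i < len; i++) {
        hash = hash * 33UL + (unsigned char)s[i];
    }
    hash_computations++;
    return hash | (1UL << (sizeof(unsigned long) * 8 - 1));
}

/* Return the hash, computing and caching it on first use. */
unsigned long str_hash(cached_str *s)
{
    assert(s != NULL);
    if (s->h == 0) {
        s->h = compute_hash(s->val, s->len);
    }
    return s->h;
}
```

Calling str_hash twice on the same string runs the loop only once; the second call just returns the stored value.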
If you don't know much about C, the definition of val may look a little odd: it is declared as a char array with a single element, yet the strings we want to store are obviously longer than one character. This uses a trick known as the "struct hack": the array is declared with only one element, but when the zend_string is actually created, enough memory is allocated to hold the whole string. The full string can then still be accessed through val.
Strictly speaking this is undefined behavior, since we end up reading and writing past the end of a one-character array, but C compilers know not to break code that does this. C99 explicitly supports the pattern in the form of "flexible array members", but thanks to our good friends at Microsoft, nobody who needs cross-platform compatibility can rely on C99 (so the struct hack works around the lack of flexible array support on Windows).
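A minimal sketch of the single-allocation idea follows. It uses the C99 flexible array member form (val[] rather than val[1]) so that the example stays within defined behavior; flex_str and flex_str_new are hypothetical names and the structure is deliberately simpler than zend_string.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical simplified string; not the real zend_string. */
typedef struct {
    uint32_t refcount;
    size_t   len;
    char     val[];   /* C99 flexible array member instead of val[1] */
} flex_str;

/* Allocate the header and the characters as one block and copy the
 * C string in. Returns NULL if malloc fails; release with free(). */
flex_str *flex_str_new(const char *src, size_t len)
{
    assert(src != NULL);
    flex_str *s = malloc(sizeof(flex_str) + len + 1);
    if (s == NULL) {
        return NULL;
    }
    s->refcount = 1;
    s->len = len;
    memcpy(s->val, src, len);
    s->val[len] = '\0';
    return s;
}
```

Because the characters live in the same block as the header, creating such a string from a plain C string always involves one allocation and one copy.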
The new string type has some advantages over plain C strings. First, the length is stored directly in the string, so it no longer has to be computed or passed around every time the string is used. Second, because the string has a refcounted header, it can be shared in several places without going through a zval. This is particularly useful for sharing hashtable keys.
The new string type also has one significant drawback: while it is easy to get a C string from a zend_string (just use str->val), the reverse is not direct. To turn a C string into a zend_string, you first have to allocate memory for the zend_string and then copy the string's contents into it. This is inconvenient in practice.
Strings can also carry a few flags of their own (stored in the flags field of the GC header):
#define IS_STR_PERSISTENT   (1<<0) /* allocated using malloc */
#define IS_STR_INTERNED     (1<<1) /* interned string */
#define IS_STR_PERMANENT    (1<<2) /* interned string surviving request boundary */
Persistent strings are allocated with the normal system allocator rather than the Zend memory manager (ZMM), so they can outlive a single request. Recording the allocator as a flag lets zvals use persistent strings transparently. This was not possible in PHP5, where the string had to be copied into the ZMM before use.
Interned strings are somewhat special: they are not destroyed until the end of the request, so they do not need reference counting. Interned strings are also deduplicated, so when a new interned string is created the engine first checks whether one with the same value already exists. All strings that appear literally in PHP source code (string literals, variable names, function names, and so on) are normally interned. Permanent strings are interned strings that were created before the request started. Normal interned strings are destroyed when the request ends, whereas permanent strings are kept alive.
If opcache is used, interned strings are stored in shared memory (SHM) so that they can be shared by all PHP processes. In this case the notion of a permanent string becomes irrelevant, because interned strings are never destroyed.
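The "check first, then create" behavior of interned strings can be sketched as follows. This is a toy model under stated assumptions: the table is a small fixed array searched linearly, whereas a real engine would use a hashtable, and the names intern_table, intern and intern_clear are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define INTERN_CAPACITY 64   /* arbitrary small limit for the sketch */

char  *intern_table[INTERN_CAPACITY];
size_t intern_count = 0;

/* Return the single shared copy of `s`, creating it on first sight.
 * Returns NULL if the table is full or allocation fails. */
const char *intern(const char *s)
{
    assert(s != NULL);
    for (size_t i = 0; i < intern_count; i++) {
        if (strcmp(intern_table[i], s) == 0) {
            return intern_table[i];   /* already interned: reuse it */
        }
    }
    if (intern_count == INTERN_CAPACITY) {
        return NULL;
    }
    size_t n = strlen(s) + 1;
    char *copy = malloc(n);
    if (copy == NULL) {
        return NULL;
    }
    memcpy(copy, s, n);
    intern_table[intern_count++] = copy;
    return copy;
}

/* Models the end of a request: all interned strings are released at once. */
void intern_clear(void)
{
    for (size_t i = 0; i < intern_count; i++) {
        free(intern_table[i]);
    }
    intern_count = 0;
}
```

Since equal strings always resolve to the same pointer and are only released in bulk, no per-string reference count is needed in this model.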
Array
The new array implementation was covered in an earlier article, so it is not described in detail here. Some recent changes have made that description slightly inaccurate, but the basic concepts still hold.
One array-related concept was not mentioned there: immutable arrays. These are essentially the array counterpart of interned strings: they are not reference counted and live until the end of the request (or possibly longer).
Because of some memory management concerns, immutable arrays are only used when opcache is enabled. To see what difference this makes, consider the following script:
<?php
for ($i = 0; $i < 1000000; ++$i) {
    $array[] = ['foo'];
}
var_dump(memory_get_usage());
With opcache enabled, the code above uses 32MB of memory. Without opcache, every element of $array gets its own copy of ['foo'], so 390MB is needed. The reason a full copy is made here instead of incrementing the reference count is that literal operands of the Zend virtual machine are not reference counted, to avoid corrupting shared memory. Hopefully this memory blow-up without opcache can be improved in the future.
Objects in PHP5
Before looking at how objects are implemented in PHP7, let's first walk through PHP5 and see where its inefficiencies lie. In PHP5 the zval stores a zend_object_value structure, which is defined as follows:
typedef struct _zend_object_value {
    zend_object_handle handle;
    const zend_object_handlers *handlers;
} zend_object_value;
handle is the unique ID of an object and is used to look up the object's data. handlers is a pointer to a virtual function table that implements the various behaviors of an object. Ordinary PHP objects all share the same handler table, but objects created by PHP extensions can use a custom set of handlers that changes how they behave (for example, by overloading operators).
The object handle is used as an index into the "object store", which is an array of object store buckets. A bucket is defined as follows:
typedef struct _zend_object_store_bucket {
    zend_bool destructor_called;
    zend_bool valid;
    zend_uchar apply_count;
    union _store_bucket {
        struct _store_object {
            void *object;
            zend_objects_store_dtor_t dtor;
            zend_objects_free_object_storage_t free_storage;
            zend_objects_store_clone_t clone;
            const zend_object_handlers *handlers;
            zend_uint refcount;
            gc_root_buffer *buffered;
        } obj;
        struct {
            int next;
        } free_list;
    } bucket;
} zend_object_store_bucket;
There is a lot going on in this structure. The first three members are just metadata (whether the object's destructor has been called, whether the bucket is in use, and how many times the object has been visited by a recursive algorithm). The union that follows distinguishes between a bucket that is in use and one that sits on the free list. The most important part is the struct _store_object substructure:
The first member, object, is a pointer to the actual object (that is, the place where the object is finally stored). The object is not embedded directly in the object store bucket, because objects do not have a fixed size. The object pointer is followed by three handlers that manage destruction, freeing, and cloning. Note that PHP destroys and frees objects in separate steps, and the destruction step is skipped in some cases. The clone handler is virtually never used. Because these storage handlers are not part of the ordinary object handlers, they are duplicated in every single object rather than shared.
These object store handlers are followed by a pointer to the ordinary object handlers. It is stored here because an object is sometimes destroyed without a zval being known (the handlers are normally reached through the zval).
The bucket also contains a refcount field, which is somewhat odd given that in PHP5 the zval already stores a reference count. Why is a second count needed? The problem is that, although "copying" a zval usually just increments its reference count, a hard copy occasionally happens: a completely new zval is created with the same zend_object_value. In that case two different zvals use the same object store bucket, so the bucket itself has to be reference counted as well. This "double refcounting" is an inherent problem of the PHP5 implementation. The buffered pointer into the GC root buffer is duplicated for the same reason.
Now let's look at the actual object that the object store points to. An ordinary userland object is defined as follows:
typedef struct _zend_object {
    zend_class_entry *ce;
    HashTable *properties;
    zval **properties_table;
    HashTable *guards;
} zend_object;
The zend_class_entry pointer points to the class that the object is an instance of. The next two members store object properties in two different ways. Dynamic properties (those added at run time rather than declared in the class) all live in the properties hashtable, which simply maps property names to values.
Declared properties, however, are optimized: each property is assigned an index at compile time, and its value is stored at that index in properties_table. The mapping from property names to indexes is stored in a hashtable in the class entry. This spares every individual object the memory overhead of a hashtable, and the index of a property is additionally cached at run time.
The guards hashtable is used to implement the recursion behavior of magic methods such as __get; it is not discussed in depth here.
Besides the double refcounting problem mentioned above, this implementation is also heavy on memory: a minimal object with a single property needs 136 bytes (not counting the zvals). It also involves a lot of indirection. For example, to fetch a property from an object zval, you first have to fetch the object store bucket, then the zend_object, then the properties table, and finally the zval it points to. That is at least four levels of indirection (and in practice no fewer than seven).
Objects in PHP7
PHP7 tries to address these issues by removing the double refcounting, reducing memory usage, and reducing indirection. The new zend_object structure is as follows:
struct _zend_object {
    zend_refcounted   gc;
    uint32_t          handle;
    zend_class_entry *ce;
    const zend_object_handlers *handlers;
    HashTable        *properties;
    zval              properties_table[1];
};
Note that this structure is now almost all that is left of an object: zend_object_value has been replaced by a direct pointer to the object, and the object store, although not removed entirely, plays a much smaller role.
Besides the now customary zend_refcounted header, the handle and the handlers of the object value have moved into zend_object. properties_table also uses the struct hack, so the zend_object and its properties table are allocated as a single block of memory. And of course, the properties table now embeds zvals directly rather than holding pointers to them.
The guards table is no longer part of the object structure. If it is needed, that is, if the object uses __get or similar magic methods, it is stored in the first slot of properties_table. If no such magic methods are used, the guards table is omitted.
The dtor, free_storage, and clone handlers that used to be stored in the object store bucket now live directly in the handlers table, which is defined as follows:
struct _zend_object_handlers {
    /* offset of real object header (usually zero) */
    int                                     offset;
    /* general object functions */
    zend_object_free_obj_t                  free_obj;
    zend_object_dtor_obj_t                  dtor_obj;
    zend_object_clone_obj_t                 clone_obj;
    /* individual object functions */
    // ... rest is about the same in PHP 5
};
The first member of the handler table is offset, which is clearly not a handler. The offset is needed because of how internal objects are represented: an internal object always embeds the standard zend_object, but it usually needs to add some members of its own as well. In PHP5 this was done by adding them after the standard object:
struct custom_object {
    zend_object std;
    uint32_t something;
    // ...
};
This way, a zend_object* can simply be cast to a struct custom_object*. This is the usual way of implementing structure inheritance in C. In PHP7, however, this approach has a problem: because zend_object uses the struct hack to store the properties table, PHP stores properties past the end of the zend_object and would overwrite the internal members added after it. For this reason PHP7 places the additional members before the standard object instead:
struct custom_object {
    uint32_t something;
    // ...
    zend_object std;
};
This means, however, that a zend_object* and a struct custom_object* can no longer be converted into each other with a simple cast, because the two are separated by an offset. This offset is what is stored in the first member of the object handler table; its value can be determined at compile time using the offsetof() macro.
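The pointer adjustment can be sketched in a few lines of C. The types handlers_sketch, std_object and custom_object here are simplified, hypothetical stand-ins (they are not the real zend_object_handlers or zend_object), and custom_from_std and object_base are illustrative helper names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical handler table holding only the offset member. */
typedef struct {
    int offset;
} handlers_sketch;

/* Hypothetical stand-in for the standard object. */
typedef struct {
    uint32_t               handle;
    const handlers_sketch *handlers;
} std_object;

/* An extension object: custom members first, standard object last. */
typedef struct {
    uint32_t   something;
    std_object std;
} custom_object;

/* The offset is known at compile time via offsetof(). */
const handlers_sketch custom_handlers = {
    (int)offsetof(custom_object, std)
};

/* Typed conversion: step back from the std member to the enclosing struct. */
custom_object *custom_from_std(std_object *obj)
{
    assert(obj != NULL);
    return (custom_object *)((char *)obj - offsetof(custom_object, std));
}

/* Generic conversion: find the start of the allocation using only the
 * offset stored in the handler table, without knowing the custom type. */
void *object_base(std_object *obj)
{
    assert(obj != NULL && obj->handlers != NULL);
    return (char *)obj - obj->handlers->offset;
}
```

The second helper shows why the offset has to be stored at run time: generic code that only sees the standard object can still locate the beginning of the whole block, for instance in order to free it.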
You may be wondering: now that a pointer to the zend_object is stored directly (in zend_value), objects no longer need to be looked up in the object store, so why do PHP7 objects still have a handle field?
The reason is that the object store still exists, albeit in a greatly simplified form, so the handle is still needed. The store is now just an array of pointers to objects. When an object is created, a pointer to it is inserted into the object store and its index is saved in handle; when the object is freed, the pointer is removed from the store.
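A simplified model of such a store is sketched here. It assumes a small fixed-size array and a linear search for a free slot, which keeps the example short but is not how a production engine would manage free slots; obj_sketch, object_store, store_put, store_del and store_live_count are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define STORE_CAPACITY 16   /* fixed size keeps the sketch short */

typedef struct {
    uint32_t handle;
} obj_sketch;

/* Zero-initialized at program start: every slot is free. */
obj_sketch *object_store[STORE_CAPACITY];

/* Register a new object: take a free slot and save its index as the handle.
 * Returns 0 on success, -1 if the store is full. */
int store_put(obj_sketch *obj)
{
    assert(obj != NULL);
    for (uint32_t i = 0; i < STORE_CAPACITY; i++) {
        if (object_store[i] == NULL) {
            object_store[i] = obj;
            obj->handle = i;
            return 0;
        }
    }
    return -1;
}

/* Unregister an object when it is freed. */
void store_del(obj_sketch *obj)
{
    assert(obj != NULL && obj->handle < STORE_CAPACITY);
    assert(object_store[obj->handle] == obj);
    object_store[obj->handle] = NULL;
}

/* Walk the store and count live objects, as a shutdown pass might do. */
size_t store_live_count(void)
{
    size_t n = 0;
    for (size_t i = 0; i < STORE_CAPACITY; i++) {
        if (object_store[i] != NULL) {
            n++;
        }
    }
    return n;
}
```

In this model a handle is only unique among live objects: once an object is removed, its slot and therefore its handle can be handed out again.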
So why is the object store still needed? During request shutdown there comes a point after which it is no longer safe to run user code. To avoid problems, PHP runs all object destructors at an earlier point during shutdown and prevents them from running later, and for that it needs a list of all live objects.
The handle is also useful for debugging: it gives every object a unique ID, so it is easy to tell whether two objects are really the same object or merely have the same contents. HHVM also stores an object handle, even though it has no concept of an object store.
Compared with PHP5, there is now only one reference count (the zval itself no longer has one), and memory usage is much lower: 40 bytes for the base object and 16 bytes for each declared property, and that already includes the property's zval. Indirection has also been reduced considerably, because the intermediate structures were either removed or embedded directly, so reading a property now takes a single level of indirection rather than four.
Indirect zval
At this point we have covered all the normal zval types, but there are a couple of special types that are used only in certain situations. One of them, newly added in PHP7, is IS_INDIRECT.
An indirect zval means that the real value is stored somewhere else. Note that this differs from the IS_REFERENCE type: an indirect zval points directly to another zval, rather than to a zend_reference structure that embeds a zval.
To understand when this is needed, let's look at how PHP implements variables (the same applies to the storage of object properties).
All variables known at compile time are assigned an index, and their values are stored at that index in the compiled variables (CV) table. But PHP also lets you reference variables dynamically, either through variable variables or, in the global scope, through $GLOBALS. Whenever such an access happens, PHP creates a symbol table for the script or function, which maps variable names to their values.
This raises a question: how can both forms of access be supported at the same time? Ordinary variable fetches need to go through the CV table, while dynamically referenced variables need to go through the symbol table. In PHP5 the CV table used doubly indirected zval** pointers. Normally these pointed into an intermediate table of zval* pointers, which in turn pointed to the actual zvals:
+------ CV_ptr_ptr[0]
| +---- CV_ptr_ptr[1]
| | +-- CV_ptr_ptr[2]
| | |
| | +-> CV_ptr[0] --> some zval
| +---> CV_ptr[1] --> some zval
+-----> CV_ptr[2] --> some zval
Once a symbol table came into use, the intermediate table of zval* pointers was no longer used, and the zval** pointers were updated to point to the corresponding hashtable buckets instead. Assuming three variables $a, $b, and $c, the layout looks like this:
CV_ptr_ptr[0] --> SymbolTable["a"].pDataPtr --> some zval
CV_ptr_ptr[1] --> SymbolTable["b"].pDataPtr --> some zval
CV_ptr_ptr[2] --> SymbolTable["c"].pDataPtr --> some zval
In PHP7 this approach is no longer possible, because a pointer into a hashtable bucket becomes invalid when the hashtable is resized. PHP7 therefore uses the reverse strategy: for variables stored in the CV table, the symbol table holds an INDIRECT entry that points to the CV entry. The CV table is not reallocated during the lifetime of the symbol table, so there is no problem with invalidated pointers.
So suppose you have a function with $a, $b, and $c in the CV table, plus a dynamically created variable $d. The symbol table would then look something like this:
SymbolTable["a"].value = INDIRECT --> CV[0] = LONG
SymbolTable["b"].value = INDIRECT --> CV[1] = DOUBLE 42.0
SymbolTable["c"].value = INDIRECT --> CV[2] = STRING --> zend_string("a")
SymbolTable["d"].value = ARRAY --> zend_array([4, 2])
An indirect zval can also point to a zval of type IS_UNDEF, in which case it is treated as if the hashtable did not contain the associated key. So when unset($a) marks CV[0] as UNDEF, the symbol table is treated as no longer having the key "a".
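The lookup rule for indirect entries can be sketched as follows. The val_sketch type with its three type tags and the resolve function are hypothetical simplifications; real zvals have many more types and the real lookup lives inside the hashtable code.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { T_UNDEF, T_LONG, T_INDIRECT } val_type;

/* Hypothetical, heavily reduced zval. */
typedef struct val_sketch {
    val_type type;
    union {
        long               lval;  /* used when type == T_LONG */
        struct val_sketch *ind;   /* used when type == T_INDIRECT */
    } u;
} val_sketch;

/* Follow an INDIRECT entry to its target. An UNDEF result is reported
 * as NULL, meaning "treat the key as not present". */
val_sketch *resolve(val_sketch *entry)
{
    assert(entry != NULL);
    if (entry->type == T_INDIRECT) {
        entry = entry->u.ind;
    }
    return entry->type == T_UNDEF ? NULL : entry;
}
```

The symbol table entry itself never changes when the variable is unset; only the CV slot it points to is marked as undefined, and the lookup then reports the key as missing.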
Constants and AST
There are two more special types, IS_CONSTANT and IS_CONSTANT_AST, which exist in both PHP5 and PHP7 and deserve a mention. To see what they do, consider the following example:
<?php
function test($a = ANSWER,
              $b = ANSWER * ANSWER) {
    return $a + $b;
}
define('ANSWER', 42);
var_dump(test()); // int(42 + 42 * 42)
The default values of both parameters of test() use the constant ANSWER, but this constant is not yet defined when the function is declared. Its value only becomes available once the define() call has run.
For this reason, default values of parameters and properties, constants, and everything else that accepts a "static expression" can postpone evaluation of the expression until it is first used.
If the value is a constant (or a class constant), which is the most common case requiring late evaluation, an IS_CONSTANT zval is used. If the value is an expression, an IS_CONSTANT_AST zval is used, which points to the abstract syntax tree (AST) of the expression.
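To make the idea of postponed evaluation concrete, here is a toy sketch of an expression tree that is only evaluated on demand. It is an illustration under heavy assumptions: the constant table holds a single entry, the tree supports only constant lookups and multiplication, and all names (define_const, ast_node, ast_eval) are hypothetical rather than taken from the PHP source.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* A constant table with room for one entry, enough for the sketch. */
const char *const_name  = NULL;
long        const_value = 0;

void define_const(const char *name, long value)
{
    assert(name != NULL);
    const_name  = name;
    const_value = value;
}

typedef enum { N_CONST, N_MUL } node_kind;

typedef struct ast_node {
    node_kind              kind;
    const char            *name;   /* N_CONST: constant to look up */
    const struct ast_node *lhs;    /* N_MUL: left operand */
    const struct ast_node *rhs;    /* N_MUL: right operand */
} ast_node;

/* Evaluate the tree when the value is actually needed.
 * Returns 0 on success, -1 if a constant is still undefined. */
int ast_eval(const ast_node *n, long *out)
{
    assert(n != NULL && out != NULL);
    if (n->kind == N_CONST) {
        if (const_name == NULL || strcmp(const_name, n->name) != 0) {
            return -1;
        }
        *out = const_value;
        return 0;
    }
    long l, r;
    if (ast_eval(n->lhs, &l) != 0 || ast_eval(n->rhs, &r) != 0) {
        return -1;
    }
    *out = l * r;
    return 0;
}
```

Building the tree for ANSWER * ANSWER before the constant exists is harmless in this model; evaluation simply fails until define_const has been called and succeeds afterwards.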
This concludes the analysis of how variables are implemented in PHP7. I may write two more articles, covering the optimizations in the virtual machine, the new calling convention, and some improvements to the compiler infrastructure (these are the author's own words).