The internal implementation of variables in PHP7



To follow this article you should have some understanding of how variables are implemented in PHP5. The focus here is on how the zval changed in PHP7.
Because there is a lot of detail to cover, the article is divided into two parts. The first part describes the implementation of the zval (Zend value) in PHP5 and PHP7 and the implementation of references. The second part analyzes the individual types (strings, objects) in detail.

The zval in PHP5

The zval structure in PHP5 is defined as follows:


typedef struct _zval_struct {
    zvalue_value value;
    zend_uint refcount__gc;
    zend_uchar type;
    zend_uchar is_ref__gc;
} zval;


As shown above, a zval contains a value, a type, and two fields with a __gc suffix. The value is a union that can store different types of values:


typedef union _zvalue_value {
    long lval;                 // For bool, integer and resource types
    double dval;               // For floating-point numbers
    struct {                   // For strings
        char *val;
        int len;
    } str;
    HashTable *ht;             // For arrays
    zend_object_value obj;     // For objects
    zend_ast *ast;             // For constant expressions (PHP 5.6 only)
} zvalue_value;


The defining feature of a C union is that only one member is active at a time, and that its size matches that of its largest member (also taking memory alignment into account). All members are stored at the same location in memory and are interpreted differently as needed. When you read lval the memory is treated as a signed integer; when you read dval it is treated as a double-precision floating-point number.


The type currently stored in the union is recorded in the type field, as an integer tag:


#define IS_NULL     0      /* Doesn't use value */
#define IS_LONG     1      /* Uses lval */
#define IS_DOUBLE   2      /* Uses dval */
#define IS_BOOL     3      /* Uses lval with values 0 and 1 */
#define IS_ARRAY    4      /* Uses ht */
#define IS_OBJECT   5      /* Uses obj */
#define IS_STRING   6      /* Uses str */
#define IS_RESOURCE 7      /* Uses lval, which is the resource ID */

/* Special types used for late-binding of constants */
#define IS_CONSTANT     8
#define IS_CONSTANT_AST 9

Reference counting in PHP5

In PHP5, zvals are allocated individually on the heap, and PHP needs to know which zvals are still in use and which can be freed. This is what reference counting is for: the refcount__gc field of the zval stores how many times the zval itself is referenced. For example, after $a = $b = 42 the value 42 is referenced by two variables, so its reference count is 2. If the reference count drops to 0, the value is no longer in use and its memory can be freed.
Note that the reference count mentioned here has nothing to do with references in PHP code (those created with &); it is simply the number of places that use a value. From here on, "PHP reference" and "reference" are used to distinguish the two concepts. For now we ignore PHP references.
A concept closely related to reference counting is "copy on write": a zval can only be shared between multiple users as long as it is not changed. As soon as one of them wants to change the value, the zval must first be duplicated ("separated"), and the modification is then applied to the copy.
The following example shows copy on write and zval destruction:
$a = 42;   // $a         -> zval_1(type=IS_LONG, value=42, refcount=1)
$b = $a;   // $a, $b     -> zval_1(type=IS_LONG, value=42, refcount=2)
$c = $b;   // $a, $b, $c -> zval_1(type=IS_LONG, value=42, refcount=3)

// The following line causes a zval separation
$a += 1;   // $b, $c -> zval_1(type=IS_LONG, value=42, refcount=2)
           // $a     -> zval_2(type=IS_LONG, value=43, refcount=1)

unset($b); // $c -> zval_1(type=IS_LONG, value=42, refcount=1)
           // $a -> zval_2(type=IS_LONG, value=43, refcount=1)

unset($c); // zval_1 is destroyed, because refcount=0
           // $a -> zval_2(type=IS_LONG, value=43, refcount=1)
Reference counting has one fatal flaw: it cannot detect and free circular references. To solve this problem PHP additionally uses a cycle collector. Whenever the reference count of a zval is decremented and the zval could be part of a cycle, it is written to the "root buffer". When the buffer is full, potential cycles are marked and collected.
To support the cycle collector, the structure actually used for a zval is the following:
typedef struct _zval_gc_info {
    zval z;
    union {
        gc_root_buffer       *buffered;
        struct _zval_gc_info *next;
    } u;
} zval_gc_info;
The zval_gc_info structure embeds a normal zval and adds two pointers. Both belong to the same union u, so only one of them is in use at any time. The buffered pointer stores where the zval is referenced from the root buffer, so that the entry can be removed if the zval is destroyed before the cycle collector runs. The next pointer is used when the collector destroys values; we will not go into it here.

Motivation for change

Let's talk about memory usage; all numbers here are for 64-bit systems. First, the zvalue_value union is 16 bytes large, because both str and obj have that size. The whole zval structure takes 24 bytes (because of padding) and zval_gc_info takes 32 bytes. On top of that, allocating the zval on the heap adds another 16 bytes of allocation overhead, so each zval ends up using 48 bytes, even though it may be used from several places. (To follow the calculation, keep in mind that every pointer takes 8 bytes on a 64-bit system.)
However you look at it, this design is very inefficient. Take an integer: the value itself needs only 8 bytes, and even allowing for the type tag and alignment, another 8 bytes should be enough.
Storing an integer does need those 16 bytes in the end, but a further 16 bytes are spent on reference counting and cycle collection, and another 16 bytes on allocation overhead. On top of that, allocating and freeing a zval are fairly expensive operations, so this is worth optimizing.
Look at it this way: does an integer really need a reference count, cycle collection information, and a separate heap allocation? Of course not; none of this makes sense for an integer.
Here is a summary of the main problems with the PHP5 zval implementation:
Zvals are always allocated individually on the heap;
Zvals always store a reference count and cycle collection information, even for values such as integers that do not need them;
Objects and resources are reference counted twice (the reason is covered in the next part);
Some accesses involve more indirection than necessary. For example, reaching an object stored in a variable takes four pointer dereferences (the pointer chain has length four). This is also discussed in the next part;
Counting at the zval level also means that values can only be shared between zvals. It is not possible to share a string between a zval and a hashtable key (unless the hashtable key is also a zval).

The zval in PHP7

In PHP7 the zval has a new implementation. The most fundamental change is that zvals are no longer allocated individually on the heap and no longer store a reference count themselves. Instead, complex values such as strings, arrays, and objects store their own reference count. This has the following advantages:


Simple values need no allocation and no reference counting;


There is no more double counting. For objects, only the count stored in the object itself is used;


Because the count is stored in the value itself, the value can be shared independently of the zval structure, for example between a zval and a hashtable key;


Far fewer pointer dereferences are needed to reach a value.


Here is the current definition of the zval structure (it now lives in zend_types.h):


struct _zval_struct {
    zend_value value;                  /* value */
    union {
        struct {
            ZEND_ENDIAN_LOHI_4(
                zend_uchar type,           /* active type */
                zend_uchar type_flags,
                zend_uchar const_flags,
                zend_uchar reserved)       /* call info for EX(This) */
        } v;
        uint32_t type_info;
    } u1;
    union {
        uint32_t var_flags;
        uint32_t next;                 /* hash collision chain */
        uint32_t cache_slot;           /* literal cache slot */
        uint32_t lineno;               /* line number (for ast nodes) */
        uint32_t num_args;             /* arguments number for EX(This) */
        uint32_t fe_pos;               /* foreach position */
        uint32_t fe_iter_idx;          /* foreach iterator index */
    } u2;
};


The first member has not changed much: it is still a value union. The second member is a union of an integer holding the type information and a struct of four one-byte fields (you can ignore the ZEND_ENDIAN_LOHI_4 macro, which only ensures a consistent layout across platforms with different endianness). The important parts of this substructure are type (similar to before) and type_flags, which is explained below.


There is one small catch: value is 8 bytes large, and because of alignment, adding even a single byte to it grows the zval to 16 bytes. We obviously do not need 8 bytes just to store a type, so a second union, u2, was placed after u1. It is unused by default, but the surrounding code can use it to store 4 bytes of data. The members of the union correspond to the different uses of this extra slot.


The structure of value in PHP7 is defined as follows:


typedef union _zend_value {
    zend_long lval;            /* long value */
    double dval;               /* double value */
    zend_refcounted *counted;
    zend_string *str;
    zend_array *arr;
    zend_object *obj;
    zend_resource *res;
    zend_reference *ref;
    zend_ast_ref *ast;
    zval *zv;
    void *ptr;
    zend_class_entry *ce;
    zend_function *func;
    struct {
        uint32_t w1;
        uint32_t w2;
    } ww;
} zend_value;


The first thing to note is that the value union now takes 8 bytes instead of 16. It only stores integers (lval) and floating-point numbers (dval) directly; everything else is a pointer (as mentioned above, a pointer takes 8 bytes; the struct at the bottom consists of two 4-byte unsigned integers). All of the pointer types above (except those marked as special) share a common header, zend_refcounted, that stores the reference count:


typedef struct _zend_refcounted_h {
    uint32_t refcount;                 /* reference counter 32-bit */
    union {
        struct {
            ZEND_ENDIAN_LOHI_3(
                zend_uchar type,
                zend_uchar flags,      /* used for strings & objects */
                uint16_t gc_info)      /* keeps GC root number (or 0) and color */
        } v;
        uint32_t type_info;
    } u;
} zend_refcounted_h;


This structure obviously contains a reference count. In addition it holds type, flags, and gc_info. The type duplicates the type stored in the zval, so that the GC can distinguish different refcounted structures without having a zval. The flags are used differently by different types and are covered in the next part.


gc_info plays the same role as buffered in PHP5, but instead of a pointer into the root buffer it stores an index. Because the root buffer has a fixed size (10,000 elements), a 16-bit (2-byte) number is enough in place of a 64-bit (8-byte) pointer. gc_info also contains the "color" of the node, which is used to mark nodes during collection.


Zval Memory Management

As mentioned above, zvals are no longer allocated individually on the heap. But they obviously have to live somewhere, so where? In fact, most of the time they still live on the heap (so the point made earlier is not "no heap" but "no separate allocation"); they are simply embedded into other data structures. For example, a hashtable bucket now directly contains a zval instead of a pointer to one. Likewise, the compiled variables of a function and the properties of an object are stored as zval arrays in one contiguous chunk of memory, rather than as zval pointers scattered all over the place. What used to be zval * is now zval.


Previously, using a zval in a new place meant copying the zval * and incrementing the reference count. Now the contents of the zval are copied directly (ignoring u2), and the reference count of the structure its pointer refers to is incremented, but only if the value is refcounted.


So how does PHP know whether a value is refcounted? This cannot be determined from the type alone, because some types (such as strings and arrays) are not always refcounted. Instead, one bit of the zval's type_info field records whether the value is refcounted. There are a number of such bits:


#define IS_TYPE_CONSTANT    (1<<0)   /* special */
#define IS_TYPE_IMMUTABLE   (1<<1)   /* special */
#define IS_TYPE_REFCOUNTED  (1<<2)
#define IS_TYPE_COLLECTABLE (1<<3)
#define IS_TYPE_COPYABLE    (1<<4)
#define IS_TYPE_SYMBOLTABLE (1<<5)   /* special */


Note: in the released 7.0.0 source, the comment on these macros says that they apply to zval.u1.v.type_flags. This may be a mistake in the comment, because that field has type zend_uchar.


The three main properties of type_info are "refcounted", "collectable", and "copyable". Refcounting has already been discussed. "Collectable" indicates whether the value can take part in a cycle: strings, for instance, are usually refcounted, but there is no way to create a cycle with a string.


"Copyable" determines whether the value has to be copied when a "duplication" is performed. A duplication is a hard copy: duplicating a zval that points to an array does not simply increment the array's reference count, it creates a new, independent copy of the array. For some types, however (such as objects and resources), even a duplication should only increment the reference count; those types are not copyable. This also matches the existing semantics of objects and resources (in PHP7 as well as in PHP5).


The table below shows which flags the different types use (x marks a flag that is set). "Simple types" are types such as integers and booleans that do not use a pointer to a separate structure. The table also has a column for the "immutable" flag, which marks immutable arrays and is discussed in the next part.


Interned strings have not been mentioned before: they are strings such as function names and variable names, which are neither refcounted nor duplicated.


                | refcounted | collectable | copyable | immutable
----------------+------------+-------------+----------+----------
simple types    |            |             |          |
string          |      x     |             |     x    |
interned string |            |             |          |
array           |      x     |      x      |     x    |
immutable array |            |             |          |     x
object          |      x     |      x      |          |
resource        |      x     |             |          |
reference       |      x     |             |          |


With that in place, a few examples show how zval memory management works in practice.


First the behavior for integers, based on the PHP5 example above:


$a = 42;   // $a = zval_1(type=IS_LONG, value=42)

$b = $a;   // $a = zval_1(type=IS_LONG, value=42)
           // $b = zval_2(type=IS_LONG, value=42)

$a += 1;   // $a = zval_1(type=IS_LONG, value=43)
           // $b = zval_2(type=IS_LONG, value=42)

unset($a); // $a = zval_1(type=IS_UNDEF)
           // $b = zval_2(type=IS_LONG, value=42)
This is actually quite simple. Integers are no longer shared, so each variable gets its own zval. Because the zval is embedded rather than allocated, the comments use = instead of the pointer arrow ->. When a variable is unset, its zval is marked as IS_UNDEF. Now a more complex case:
$a = [];   // $a = zval_1(type=IS_ARRAY) -> zend_array_1(refcount=1, value=[])

$b = $a;   // $a = zval_1(type=IS_ARRAY) -> zend_array_1(refcount=2, value=[])
           // $b = zval_2(type=IS_ARRAY) ---^

// Zval separation happens here
$a[] = 1;  // $a = zval_1(type=IS_ARRAY) -> zend_array_2(refcount=1, value=[1])
           // $b = zval_2(type=IS_ARRAY) -> zend_array_1(refcount=1, value=[])

unset($a); // $a = zval_1(type=IS_UNDEF), zend_array_2 is destroyed
           // $b = zval_2(type=IS_ARRAY) -> zend_array_1(refcount=1, value=[])
In this case each variable still has its own (embedded) zval, but both point to the same refcounted zend_array structure. As in PHP5, the array is only duplicated once one of the variables modifies it.

Types

Let's take a look at the types PHP7 supports (the type tags used by zvals):


/* regular data types */
#define IS_UNDEF     0
#define IS_NULL      1
#define IS_FALSE     2
#define IS_TRUE      3
#define IS_LONG      4
#define IS_DOUBLE    5
#define IS_STRING    6
#define IS_ARRAY     7
#define IS_OBJECT    8
#define IS_RESOURCE  9
#define IS_REFERENCE 10

/* constant expressions */
#define IS_CONSTANT     11
#define IS_CONSTANT_AST 12

/* internal types */
#define IS_INDIRECT 15
#define IS_PTR      17
This list is similar to the one used by PHP5, with a few additions:
IS_UNDEF is used where previously a NULL zval pointer was used (it does not conflict with IS_NULL). For example, in the examples above a variable is set to IS_UNDEF when it is unset;
IS_BOOL has been split into IS_FALSE and IS_TRUE. The boolean value is now encoded directly in the type, which speeds up type checks. This change is transparent to userland, where there is still only a single "boolean" type;
PHP references no longer use an is_ref flag on the zval, but a separate IS_REFERENCE type. This is covered in the section on references below;
IS_INDIRECT and IS_PTR are special internal types.
The list above actually omits two fake types, which are ignored here.
IS_LONG now holds a zend_long value rather than a native C long. The reason is that on 64-bit Windows (LLP64) a long is only 32 bits wide, so PHP5 was limited to 32-bit integers on Windows. PHP7 lets you use 64-bit integers on any 64-bit operating system, including Windows.
The details of zend_refcounted are covered in the next part. For now, let's look at how PHP references are implemented.

References

PHP7 handles PHP references (the & operator) in a completely different way than PHP5 (this change was also the source of a large number of bugs during PHP7 development). Let's start with how PHP references were implemented in PHP5.
Normally, the copy-on-write principle means that a zval must be separated before it is modified, so that only the value of that one PHP variable changes. This is what by-value semantics mean.
For PHP references this rule does not apply. If a PHP variable is a PHP reference, you explicitly want several PHP variables to point to the same value. The is_ref flag in PHP5 indicates whether a zval is a PHP reference, and therefore whether it must be separated before modification. For example:
$a = [];   // $a     -> zval_1(type=IS_ARRAY, refcount=1, is_ref=0) -> HashTable_1(value=[])
$b =& $a;  // $a, $b -> zval_1(type=IS_ARRAY, refcount=2, is_ref=1) -> HashTable_1(value=[])

$b[] = 1;  // $a, $b -> zval_1(type=IS_ARRAY, refcount=2, is_ref=1) -> HashTable_1(value=[1])
           // Because is_ref=1, PHP does not separate the zval
A significant problem with this design is that a value cannot be shared between a PHP reference variable and a non-reference variable. Consider the following situation:
$a = [];   // $a         -> zval_1(type=IS_ARRAY, refcount=1, is_ref=0) -> HashTable_1(value=[])
$b = $a;   // $a, $b     -> zval_1(type=IS_ARRAY, refcount=2, is_ref=0) -> HashTable_1(value=[])
$c = $b;   // $a, $b, $c -> zval_1(type=IS_ARRAY, refcount=3, is_ref=0) -> HashTable_1(value=[])

$d =& $c;  // $a, $b -> zval_1(type=IS_ARRAY, refcount=2, is_ref=0) -> HashTable_1(value=[])
           // $c, $d -> zval_2(type=IS_ARRAY, refcount=2, is_ref=1) -> HashTable_2(value=[])
           // $d is a reference to $c, but not to $a and $b, so the zval has to be copied here.
           // We now have two zvals, one with is_ref=0 and one with is_ref=1.

$d[] = 1;  // $a, $b -> zval_1(type=IS_ARRAY, refcount=2, is_ref=0) -> HashTable_1(value=[])
           // $c, $d -> zval_2(type=IS_ARRAY, refcount=2, is_ref=1) -> HashTable_2(value=[1])


Because there are two separate zvals, the statement $d[] = 1 does not modify the values of $a and $b.


This behavior also makes references slower than ordinary values in PHP5. Consider the following example:


$array = range(0, 1000000);
$ref =& $array;
var_dump(count($array)); // <-- separation occurs here


Because count() accepts its argument by value but $array is a PHP reference, the entire array is copied before it is passed to count(). If $array were not a reference, the value would simply be shared.


Now let's look at how PHP references are implemented in PHP7. Because zvals are no longer allocated individually, the PHP5 approach cannot be used. Instead a new IS_REFERENCE type was added, which uses the zend_reference structure to hold the referenced value:


struct _zend_reference {
    zend_refcounted gc;
    zval val;
};


Essentially a zend_reference is just a reference-counted zval. All variables in a reference set store a zval of type IS_REFERENCE that points to the same zend_reference. The val member behaves like any other zval; in particular it can share the complex value it points to, so an array can be shared between reference variables and value variables.


Let's go through the examples again, this time with PHP7 semantics. For brevity the zvals of the individual variables are no longer written out; only the structures they point to are shown:


$a = [];   // $a                                     -> zend_array_1(refcount=1, value=[])

$b =& $a;  // $a, $b -> zend_reference_1(refcount=2) -> zend_array_1(refcount=1, value=[])

$b[] = 1;  // $a, $b -> zend_reference_1(refcount=2) -> zend_array_1(refcount=1, value=[1])
The by-reference assignment creates a new zend_reference. Note that its reference count is 2 (because two variables are part of the PHP reference), while the reference count of the value itself stays at 1 (because only the zend_reference points to it). Now consider a mix of references and non-references:
$a = [];   // $a         -> zend_array_1(refcount=1, value=[])
$b = $a;   // $a, $b     -> zend_array_1(refcount=2, value=[])
$c = $b;   // $a, $b, $c -> zend_array_1(refcount=3, value=[])

$d =& $c;  // $a, $b                                 -> zend_array_1(refcount=3, value=[])
           // $c, $d -> zend_reference_1(refcount=2) ---^
           // Note that all variables share the same zend_array, even though some are
           // PHP references and some are not.

$d[] = 1;  // $a, $b                                 -> zend_array_1(refcount=2, value=[])
           // $c, $d -> zend_reference_1(refcount=2) -> zend_array_2(refcount=1, value=[1])
           // Only at this point, when the assignment happens, is the zend_array duplicated.
The important difference to PHP5 is that all variables can share the same array, whether or not they are PHP references. The array is only separated once a modification is made. This also means that it is now safe to pass a large referenced array to count(); it will not be copied. References are still slower than normal values, however, because the zend_reference structure has to be allocated, it adds a level of indirection, and the engine has to handle it.



The first part covered how basic variables are implemented in PHP5 and PHP7, and what changed. To recap, the main change is that zvals are no longer allocated individually and no longer store a reference count themselves. Simple types such as integers and floating-point numbers are stored directly in the zval. Complex types are stored in a separate structure that the zval points to.


Complex zval values share a common header, defined by zend_refcounted:


struct _zend_refcounted {
    uint32_t refcount;
    union {
        struct {
            ZEND_ENDIAN_LOHI_3(
                zend_uchar type,
                zend_uchar flags,
                uint16_t gc_info)
        } v;
        uint32_t type_info;
    } u;
};


This header stores the refcount (reference count), the type of the value, the cycle collection information gc_info, and a slot for type-specific flags.


The following sections analyze the implementation of each complex type and compare it with PHP5. References are also a complex type, but they were covered in the first part and are not repeated here. Resources are not covered either (the author does not find them interesting enough to discuss).


Strings

PHP7 defines a new structure, zend_string, to store strings:


struct _zend_string {
    zend_refcounted gc;
    zend_ulong h;        /* hash value */
    size_t len;
    char val[1];
};


Besides the refcounted header, a string contains a hash cache h, the string length len, and the string value val. The hash cache exists so that the hash of a string does not have to be recomputed every time it is used as a hashtable key; it is computed on first use and then reused.


If you are not very familiar with C, the definition of val may look odd: it is declared as an array with a single element, yet we obviously want to store strings longer than one character. This uses a technique known as the "struct hack": the array is declared with one element, but when the zend_string is created, enough memory is allocated to hold the whole string. The full string can then be accessed through val.


Strictly speaking this is not a well-defined construct, because we read and write past the end of a one-element array, but C compilers know not to get in the way when you do this. C99 does define "flexible array members" for exactly this purpose, but thanks to our good friends at Microsoft, nobody who needs cross-platform compatibility can actually rely on C99 (so the struct hack is there to work around the missing flexible array support on Windows).


The new string structure is more convenient to use than a plain C string. First, the length is stored directly, so it does not have to be recomputed on every use. Second, the string has its own refcounted header, so it can be shared in different places without going through a zval. An important use is sharing hashtable keys.


The new string type also has one major drawback: while it is easy to get a C string out of a zend_string (just use str->val), going the other way requires allocating memory for a zend_string and copying the C string into it. This is rather inconvenient in practice.


Strings also have some flags of their own (stored in the flags field of the GC header):


#define IS_STR_PERSISTENT (1<<0) /* allocated using malloc */
#define IS_STR_INTERNED   (1<<1) /* interned string */
#define IS_STR_PERMANENT  (1<<2) /* interned string surviving request boundary */


Persistent strings are allocated with the normal system allocator instead of the Zend memory manager (ZMM), so they can live longer than a single request. Recording the allocator as a flag lets a zval use a persistent string transparently. PHP5 did not do this; persistent strings were copied into the ZMM before use.


Interned strings are special: they live until the end of the request, so they need no reference counting. They are also deduplicated, so when a new interned string is created, the engine first checks whether an identical one already exists. All immutable strings in PHP source code are interned (including string literals, variable names, function names, and so on). Permanent strings are interned strings that were created before the request started. Normal interned strings are destroyed at the end of the request, whereas permanent strings stay alive.


If opcache is used, interned strings are stored in shared memory (SHM) and shared between all PHP processes. In that case permanent strings become meaningless, because interned strings are never destroyed anyway.


Array

The new array implementation was covered in an earlier article, so it is not described in detail here. Some recent changes have made that description slightly inaccurate, but the basic concepts still hold.
One array-related concept was not mentioned there: immutable arrays. They are essentially the array counterpart of interned strings: they are not refcounted and live until the end of the request (or possibly longer).
Because of some memory management concerns, immutable arrays are only used when opcache is enabled. To see what difference this makes, consider the following script:
for ($i = 0; $i < 1000000; ++$i) {
    $array[] = ['foo'];
}
var_dump(memory_get_usage());
With opcache enabled, this code uses 32MB of memory. Without opcache, every element of $array gets its own copy of ['foo'], so it needs 390MB. The reason a full copy is made here, rather than just incrementing a reference count, is to avoid corrupting shared memory when the Zend VM executes its operands. Hopefully the memory blow-up without opcache can be improved in the future.

Objects in PHP5

Before looking at the object implementation in PHP7, let's first go over how it worked in PHP5 and where the inefficiencies were. In PHP5 the zval stores a zend_object_value structure, defined as follows:


typedef struct _zend_object_value {
    zend_object_handle handle;
    const zend_object_handlers *handlers;
} zend_object_value;


The handle is a unique ID for the object, which is used to look up the object data. The handlers are a table of virtual function pointers that implement the various behaviors of an object. Normal PHP objects all share the same handler table, but objects created by PHP extensions can use custom handlers to change their behavior (for example, to overload operators).


The object handle is used as an index into the "object store", which is an array of object store buckets. A bucket is defined as follows:


typedef struct _zend_object_store_bucket {
    zend_bool destructor_called;
    zend_bool valid;
    zend_uchar apply_count;
    union _store_bucket {
        struct _store_object {
            void *object;
            zend_objects_store_dtor_t dtor;
            zend_objects_free_object_storage_t free_storage;
            zend_objects_store_clone_t clone;
            const zend_object_handlers *handlers;
            zend_uint refcount;
            gc_root_buffer *buffered;
        } obj;
        struct {
            int next;
        } free_list;
    } bucket;
} zend_object_store_bucket;


This structure contains a lot of things. The first three members are just plain metadata (whether the object's destructor was invoked, whether the Bucke was used, and how many times the object was called recursively). The next consortium is used to distinguish between the state in use or the idle state of the bucket. The most important of the above structure is the struct _store_object substructure:


The first member object is a pointer to the actual object (that is, the location of the object's final store). objects are not actually embedded directly into the bucket of object storage, because objects are not fixed-length. Below the object pointer are three action handles (handler) for managing object destruction, release, and cloning. Note that PHP destroys and frees objects in a different step, which in some cases may be skipped (not fully released). Cloning is virtually impossible to use because the operations contained here are not part of the ordinary object itself, so (at any point) they are copied (duplicate) in each object individually rather than shared.


These object store handlers are followed by a pointer to the ordinary object handlers. It is stored here because an object sometimes has to be destroyed when no zval for it is known (normally the handlers are obtained from the zval).


The bucket also contains a refcount field, which is somewhat odd in PHP5 because the zval itself already stores a reference count. Why is an extra count needed? Although "copying" a zval usually just increments its reference count, a hard copy occasionally occurs, in which an entirely new zval is created with the same zend_object_value. In that case two different zvals use the same object store bucket, so the bucket itself also needs to be reference counted. This "double counting" is an inherent problem of the PHP5 implementation. The buffered pointer into the GC root buffer is duplicated for the same reason.


Now look at the structure of the actual object that the object store points to. An ordinary userland object is defined as follows:


typedef struct _zend_object {
    zend_class_entry *ce;
    HashTable *properties;
    zval **properties_table;
    HashTable *guards;
} zend_object;


The zend_class_entry pointer points to the class of which the object is an instance. The next two members store object properties in two different ways. Dynamic properties (those added at run time rather than declared in the class) all live in the properties hash table, which simply maps property names to values.


There is, however, an optimization for declared properties: each property is assigned an index during compilation, and its value is stored at that index in properties_table. The mapping between property names and indexes is stored in a hash table in the class entry. This spares each object the memory overhead of its own hash table, and the index of a property is additionally cached at run time.
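The split between a per-class name table and per-object value slots can be sketched roughly as follows. This is only an illustration under simplifying assumptions: the toy_* types and functions are hypothetical, a linear search stands in for the class entry's hash table, and plain long values stand in for zvals.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified model; these are not the real Zend structures. */
typedef struct {
    const char *name;               /* name of a declared property */
} toy_property_info;

typedef struct {
    const toy_property_info *props; /* per-class name table, shared by all objects */
    size_t num_props;
} toy_class;

typedef struct {
    const toy_class *ce;
    long slots[4];                  /* per-object values, one slot per declared property */
} toy_object;

/* Resolve a declared property name to its slot index, or -1 if it is not
 * declared. A linear search stands in for the hash table lookup here. */
int toy_slot_index(const toy_class *ce, const char *name) {
    for (size_t i = 0; i < ce->num_props; i++) {
        if (strcmp(ce->props[i].name, name) == 0) {
            return (int)i;
        }
    }
    return -1;
}

/* Return a pointer to the value slot of a declared property, or NULL. */
long *toy_get_property(toy_object *obj, const char *name) {
    assert(obj != NULL && obj->ce != NULL);
    int idx = toy_slot_index(obj->ce, name);
    return idx < 0 ? NULL : &obj->slots[idx];
}
```

The point of the layout is that the name table exists once per class, while each object only carries an array of values.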


The guards hash table is used to implement the recursion behavior of magic methods such as __get; we will not discuss it in depth here.


In addition to the double counting problem mentioned above, this implementation uses a lot of memory: a minimal object with only one property requires 136 bytes (not counting the zvals). There is also a lot of indirection: for example, to fetch a property from an object zval, you first have to fetch the object store bucket, then the zend_object, then the properties table, and finally the zval that the table entry points to. That is at least four levels of indirection (and in practice no fewer than seven).


Objects in PHP7

The PHP7 implementation attempts to address these issues by removing the double reference counting, reducing memory usage, and reducing indirection. The new zend_object structure is as follows:


struct _zend_object {
    zend_refcounted gc;
    uint32_t handle;
    zend_class_entry *ce;
    const zend_object_handlers *handlers;
    HashTable *properties;
    zval properties_table[1];
};


You can see that this structure is now almost everything that is left of an object: zend_object_value has been replaced by a direct pointer to the object, and the object store, although not removed completely, is far less significant.


Besides the zend_refcounted header that is customary in PHP7, the handle and the handlers of the object value now live in zend_object. The properties_table also uses the C struct hack, so that the zend_object and the properties table are allocated as a single block of memory. And of course, the properties table now embeds the zvals directly rather than holding pointers to them.
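The single-block allocation can be illustrated with a small sketch. It assumes hypothetical toy_zval and toy_object types, and it uses a C99 flexible array member where the structure above declares properties_table[1]; the idea of sizing one allocation to cover both the header and the trailing table is the same.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for a zval: a value plus a type tag. */
typedef struct {
    int64_t value;
    uint32_t type;
} toy_zval;

/* Hypothetical stand-in for the object header. */
typedef struct {
    uint32_t refcount;
    uint32_t handle;
    uint32_t num_slots;
    toy_zval properties_table[];  /* trailing table, allocated together with the header */
} toy_object;

/* Allocate the header and num_slots property slots as one zeroed block.
 * Returns NULL if the allocation fails. */
toy_object *toy_object_new(uint32_t handle, uint32_t num_slots) {
    assert(num_slots >= 1);
    size_t size = sizeof(toy_object) + (size_t)num_slots * sizeof(toy_zval);
    toy_object *obj = calloc(1, size);
    if (obj == NULL) {
        return NULL;
    }
    obj->refcount = 1;
    obj->handle = handle;
    obj->num_slots = num_slots;
    return obj;
}
```

Because the slots follow the header directly, reading a property needs only the object pointer and an index, and a single free() releases everything.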


The guards table is no longer present in the object structure. When it is needed, that is, when the object uses __get and similar magic methods, it is stored in the first slot of properties_table. If no magic methods are used, the guards table is omitted entirely.


The dtor, free_storage, and clone handlers, which were previously stored in the object store bucket, have now been moved into the handlers table, whose structure begins as follows:


struct _zend_object_handlers {
    /* offset of real object header (usually zero) */
    int offset;
    /* general object functions */
    zend_object_free_obj_t free_obj;
    zend_object_dtor_obj_t dtor_obj;
    zend_object_clone_obj_t clone_obj;
    /* individual object functions */
    // ... rest is about the same in PHP 5
};


The first member of the handler table is offset, which is obviously not a handler. The offset is needed because of the way internal objects are represented: an internal object always embeds the standard zend_object, but usually adds some members of its own. In PHP5 this was done by adding the members after the standard object:


struct custom_object {
    zend_object std;
    uint32_t something;
    // ...
};


This way, given a zend_object*, you can simply cast it to a struct custom_object*. This is the usual way of implementing structure inheritance in C. In PHP7, however, this approach has a problem: because zend_object uses the struct hack to store the properties table, PHP stores properties past the end of the zend_object and would overwrite the internal members added after it. For this reason, a PHP7 implementation places its own members before the standard object structure:


struct custom_object {
    uint32_t something;
    // ...
    zend_object std;
};


This means, however, that a zend_object* and a struct custom_object* can no longer be converted into each other with a simple cast, because the two are separated by an offset. This offset is what is stored in the first member of the object handler table; its value can be determined at compile time with the offsetof() macro.
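A rough sketch of the conversion follows. The toy_object and toy_handlers types and the two helper functions are hypothetical stand-ins rather than the engine's own definitions; only the pointer arithmetic with offsetof() is the technique being shown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for zend_object. */
typedef struct {
    uint32_t handle;
} toy_object;

/* Extension-specific members come first, the standard object comes last. */
typedef struct {
    uint32_t something;
    toy_object std;
} custom_object;

/* Hypothetical stand-in for the handler table; only the offset is modeled. */
typedef struct {
    int offset;
} toy_handlers;

/* The offset of the embedded standard object is known at compile time. */
static const toy_handlers custom_handlers = {
    (int)offsetof(custom_object, std)
};

/* Step back from the embedded standard object to the enclosing structure. */
custom_object *custom_from_obj(toy_object *obj) {
    assert(obj != NULL);
    return (custom_object *)((char *)obj - custom_handlers.offset);
}

/* The opposite direction needs no arithmetic: take the member's address. */
toy_object *obj_from_custom(custom_object *c) {
    return &c->std;
}
```

Storing the offset in the handler table lets generic code, which only sees the standard object pointer, find the start of the full allocation without knowing the extension's structure.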


You may wonder why PHP7 objects still have a handle field: now that the zend_object pointer is stored directly (in zend_value), the handle is no longer needed to look up the object in the object store.


The handle is still needed because the object store still exists, although in a greatly simplified form. It is now just an array of pointers to objects. When an object is created, a pointer to it is inserted into the object store and its index is saved in handle; the entry is removed when the object is freed.
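As a minimal sketch of such a store, assume a fixed-size array and a linear scan for a free slot. Both are simplifications chosen for brevity, and the toy_* names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_STORE_SIZE 8

/* Hypothetical stand-in for zend_object; only the handle is modeled. */
typedef struct {
    uint32_t handle;
} toy_object;

/* The store is just an array of object pointers; NULL marks a free slot. */
typedef struct {
    toy_object *slots[TOY_STORE_SIZE];
} toy_store;

/* Insert obj into the first free slot and record the index as its handle.
 * Returns 0 on success, -1 if the store is full. */
int toy_store_put(toy_store *store, toy_object *obj) {
    assert(store != NULL && obj != NULL);
    for (uint32_t i = 0; i < TOY_STORE_SIZE; i++) {
        if (store->slots[i] == NULL) {
            store->slots[i] = obj;
            obj->handle = i;
            return 0;
        }
    }
    return -1;
}

/* Remove obj from the store when it is freed; its slot can then be reused. */
void toy_store_del(toy_store *store, toy_object *obj) {
    assert(store != NULL && obj != NULL);
    assert(obj->handle < TOY_STORE_SIZE);
    assert(store->slots[obj->handle] == obj);
    store->slots[obj->handle] = NULL;
}
```

Walking the array and skipping NULL entries yields the list of live objects at any point in time.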


So why is the object store still needed? During request shutdown there comes a point after which it is no longer safe to execute userland code. To avoid this situation, PHP runs all object destructors at an earlier point during shutdown and prevents them from running later, and for that it needs a list of all live objects.


The handle is also useful for debugging: it gives each object a unique ID, so it is easy to tell whether two objects are really the same object or merely have the same content. HHVM still stores an object handle even though it has no concept of an object store.


Compared with PHP5, the implementation now has only one reference count (the zval itself no longer has one), and memory usage is much lower: 40 bytes for the base object and 16 bytes for each declared property, and this already includes the property's zval. Indirection has also been reduced considerably, because many intermediate structures were either removed or embedded directly, so reading a property now takes a single level of indirection instead of four.


Indirect zval

By now we have covered essentially all the normal zval types, but there are a couple of special types used only in certain situations. One of them, newly added in PHP7, is IS_INDIRECT.


An indirect zval means that the real value is stored elsewhere. Note that this differs from the IS_REFERENCE type: an indirect zval points directly to another zval, rather than to a zend_reference structure that embeds a zval.


To understand when this is needed, let's take a look at how PHP implements variables (the same applies to the storage of object properties).


All variables known at compile time are assigned an index, and their values are stored at that index in the compiled variables (CV) table. But PHP also allows variables to be referenced dynamically, either through variable variables or, in the global scope, through $GLOBALS. Whenever this happens, PHP creates a symbol table for the script or function, which contains a mapping from variable names to their values.


But this raises a question: how can both forms of access be supported at the same time? Ordinary variable fetches need table-based access through the CV table, while dynamically referenced variables need access through the symbol table. In PHP5, the CV table used doubly indirected zval** pointers. Normally these pointed into a second table of zval* pointers, which in turn pointed to the actual zvals:


+------ CV_ptr_ptr[0]
| +---- CV_ptr_ptr[1]
| | +-- CV_ptr_ptr[2]
| | |
| | +-> CV_ptr[0] --> some zval
| +---> CV_ptr[1] --> some zval
+-----> CV_ptr[2] --> some zval


Once a symbol table came into use, the intermediate table of zval* pointers was left unused, and the zval** pointers were updated to point to the corresponding hash table buckets instead. Assuming three variables $a, $b, and $c, a simple schematic looks like this:


CV_ptr_ptr[0] --> SymbolTable["a"].pDataPtr --> some zval
CV_ptr_ptr[1] --> SymbolTable["b"].pDataPtr --> some zval
CV_ptr_ptr[2] --> SymbolTable["c"].pDataPtr --> some zval


In PHP7 this approach is no longer possible, because a pointer into a hash table bucket becomes invalid when the hash table is resized. So PHP7 uses the reverse strategy: for variables stored in the CV table, the symbol table holds an INDIRECT entry that points to the CV table entry. The CV table is not reallocated during the lifetime of the symbol table, so there is no problem with invalidated pointers.


So suppose you have a function with $a, $b, and $c in the CV table, plus a dynamically created variable $d. The symbol table might then look like this:


SymbolTable["a"].value = INDIRECT --> CV[0] = LONG 42
SymbolTable["b"].value = INDIRECT --> CV[1] = DOUBLE 42.0
SymbolTable["c"].value = INDIRECT --> CV[2] = STRING --> zend_string("42")
SymbolTable["d"].value = ARRAY --> zend_array([4, 2])


An indirect zval can also point to a zval of type IS_UNDEF, in which case it is treated as if the hash table did not contain the associated key. So when unset($a) marks the type of CV[0] as UNDEF, the symbol table is treated as no longer having an entry with the key "a".
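The lookup rule can be sketched as follows. The toy_zval type, its three type tags, and toy_symtable_resolve() are hypothetical and model only a single symbol table entry, not the hash table around it.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { TOY_UNDEF, TOY_LONG, TOY_INDIRECT } toy_type;

/* Hypothetical, simplified zval: a type tag plus a small value union. */
typedef struct toy_zval {
    toy_type type;
    union {
        long lval;            /* for TOY_LONG */
        struct toy_zval *zv;  /* for TOY_INDIRECT: points at a CV slot */
    } value;
} toy_zval;

/* Resolve a symbol table entry to the zval that holds the actual value.
 * Returns NULL if there is no entry, or if the resolved zval is UNDEF,
 * which is treated the same as a missing key. */
toy_zval *toy_symtable_resolve(toy_zval *entry) {
    if (entry == NULL) {
        return NULL;
    }
    if (entry->type == TOY_INDIRECT) {
        entry = entry->value.zv;
        assert(entry != NULL);
    }
    return entry->type == TOY_UNDEF ? NULL : entry;
}
```

Because the symbol table entry only points at the CV slot, unsetting the variable changes the slot alone and the entry itself never has to be touched.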


Constants and AST

There are two more special types, IS_CONSTANT and IS_CONSTANT_AST, which exist in both PHP5 and PHP7 and deserve a mention. To understand them, let's take a look at the following example:
function test($a = ANSWER,
              $b = ANSWER * ANSWER) {
    return $a + $b;
}

define('ANSWER', 42);
var_dump(test()); // int(42 + 42 * 42)
The default values of both parameters of the test() function use the constant ANSWER, but this constant is not yet defined when the function is declared. Its value is known only after the define() call has run.
For this reason, default values of parameters and properties, constants, and everything else that accepts a "static expression" can postpone evaluation of the expression until first use.
If the value is a constant (or a class constant), which is the most common case that needs late evaluation, this is signaled with a zval of type IS_CONSTANT. If the value is an expression, a zval of type IS_CONSTANT_AST is used, which points to the abstract syntax tree (AST) of the expression.
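Late evaluation of a plain constant can be sketched as follows. This is a toy model only: the toy_zval and toy_constant types and toy_update_constant() are hypothetical, and a small array stands in for the table of defined constants.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef enum { TOY_LONG, TOY_CONSTANT } toy_type;

/* Hypothetical zval: either a resolved integer or the name of a constant. */
typedef struct {
    toy_type type;
    union {
        long lval;         /* for TOY_LONG */
        const char *name;  /* for TOY_CONSTANT: not yet resolved */
    } value;
} toy_zval;

/* One entry in a toy table of constants that have been defined so far. */
typedef struct {
    const char *name;
    long value;
} toy_constant;

/* Resolve zv in place on first use. Returns 0 if zv now holds a concrete
 * value, or -1 if the constant it names has not been defined yet. */
int toy_update_constant(toy_zval *zv, const toy_constant *defs, size_t n) {
    assert(zv != NULL);
    if (zv->type != TOY_CONSTANT) {
        return 0;  /* already a concrete value */
    }
    for (size_t i = 0; i < n; i++) {
        if (strcmp(defs[i].name, zv->value.name) == 0) {
            zv->type = TOY_LONG;
            zv->value.lval = defs[i].value;
            return 0;
        }
    }
    return -1;
}
```

The default value keeps only the constant's name until the first call, which is why the constant may be defined after the function is declared.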
This concludes our analysis of how variables are implemented in PHP7. I may also write two more articles, on optimizations in the virtual machine, in particular the new calling convention, and on improvements to the compiler infrastructure (these are the original author's words).
