The implementation of variables within PHP7

The first and second parts of this article are translated from the blog of Nikita Popov (nikic, a member of the official PHP development team and a student at the Technical University of Berlin). To better suit Chinese reading habits, the translation is not verbatim.

To follow this article you should have some understanding of how variables are implemented in PHP5; the focus here is on how the zval changed in PHP7.

Because there is a lot of detail to cover, the article is split into two parts. The first part describes how the implementation of the zval (Zend value) differs between PHP5 and PHP7 and how references are implemented. The second part analyzes the details of individual types such as strings and objects.

The Zval in PHP5

The zval structure in PHP5 is defined as follows:

typedef struct _zval_struct {
    zvalue_value value;
    zend_uint refcount__gc;
    zend_uchar type;
    zend_uchar is_ref__gc;
} zval;

As shown above, a zval consists of a value, a type, and two fields with a __gc suffix. The value is a union that stores the different types of values:

typedef union _zvalue_value {
    long lval;                 // for bool, integer and resource types
    double dval;               // for floating-point types
    struct {                   // for strings
        char *val;
        int len;
    } str;
    HashTable *ht;             // for arrays
    zend_object_value obj;     // for objects
    zend_ast *ast;             // for constant expressions (PHP 5.6 only)
} zvalue_value;

A C union has only one active member at a time, and its size matches that of its largest member (taking memory alignment into account). All members are stored at the same location in memory, and that storage is interpreted differently as needed. When lval is accessed, the storage is treated as a signed integer; when dval is accessed, it is treated as a double-precision floating-point number.
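To make the union behaviour concrete, here is a minimal C sketch. The type demo_value is a cut-down stand-in invented for this illustration (it is not the real zvalue_value), and demo_store_long is a hypothetical helper. The compile-time checks show that every member starts at offset 0 and that the union is at least as large as its largest member.

```c
#include <stddef.h>

/* Cut-down stand-in for zvalue_value; not the real PHP definition. */
typedef union demo_value {
    long   lval;            /* integer view of the storage */
    double dval;            /* floating-point view of the same storage */
    struct {
        char *val;
        int   len;
    } str;                  /* largest member: a pointer plus an int */
} demo_value;

/* All members start at offset 0, so they overlap in memory. */
_Static_assert(offsetof(demo_value, lval) == 0, "lval starts at offset 0");
_Static_assert(offsetof(demo_value, dval) == 0, "dval starts at offset 0");
_Static_assert(offsetof(demo_value, str)  == 0, "str starts at offset 0");

/* The union must be able to hold its largest member. */
_Static_assert(sizeof(demo_value) >= sizeof(char *) + sizeof(int),
               "union must be able to hold the str member");

/* Hypothetical helper: store an integer and read it back via lval. */
long demo_store_long(demo_value *v, long n)
{
    v->lval = n;
    return v->lval;
}
```

Which member is currently valid is not recorded by the union itself, which is why the zval needs a separate type field.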

The data type currently stored in the union is recorded in the type field, as an integer tag:

#define IS_NULL     0      /* Doesn't use value */
#define IS_LONG     1      /* Uses lval */
#define IS_DOUBLE   2      /* Uses dval */
#define IS_BOOL     3      /* Uses lval with values 0 and 1 */
#define IS_ARRAY    4      /* Uses ht */
#define IS_OBJECT   5      /* Uses obj */
#define IS_STRING   6      /* Uses str */
#define IS_RESOURCE 7      /* Uses lval, which is the resource ID */

/* Special types used for late-binding of constants */
#define IS_CONSTANT 8
#define IS_CONSTANT_AST 9

Reference count in PHP5

In PHP5, the memory for a zval is allocated individually on the heap (with a few exceptions), and PHP needs to know which zvals are still in use and which can be freed. This is what reference counting is for: refcount__gc in the zval stores the number of times the zval itself is referenced. For example, after $a = $b = 42 the value 42 is referenced by two variables, so its reference count is 2. When the reference count drops to 0, the value is no longer in use and its memory can be freed.

Note that the references counted here are not references in PHP code (those created with &), but the number of places in which a value is used. Where both concepts appear together, the rest of this article uses "reference" for the former and "PHP reference" for the latter. For now, PHP references are ignored.

A concept closely related to reference counting is "copy-on-write": a zval is shared between several users only as long as it is not changed. As soon as one of them modifies the value, the zval has to be copied ("separated"), and the modification is applied to the copy.

Here is an example of "copy-on-write" and Zval destruction:

$a = 42;   // $a         -> zval_1(type=IS_LONG, value=42, refcount=1)
$b = $a;   // $a, $b     -> zval_1(type=IS_LONG, value=42, refcount=2)
$c = $b;   // $a, $b, $c -> zval_1(type=IS_LONG, value=42, refcount=3)

// The following line causes a zval separation
$a += 1;   // $b, $c -> zval_1(type=IS_LONG, value=42, refcount=2)
           // $a     -> zval_2(type=IS_LONG, value=43, refcount=1)

unset($b); // $c -> zval_1(type=IS_LONG, value=42, refcount=1)
           // $a -> zval_2(type=IS_LONG, value=43, refcount=1)

unset($c); // zval_1 is destroyed, because refcount=0
           // $a -> zval_2(type=IS_LONG, value=43, refcount=1)

Reference counting has one fatal flaw: it cannot detect and free circular references. To solve this, PHP additionally uses a cycle collector. Whenever the reference count of a zval is decremented and the zval might be part of a cycle, the zval is written to the "root buffer". When the root buffer is full, potential cycles are marked and collected.

To support cycle collection, the zval structure that is actually used looks like this:

typedef struct _zval_gc_info {
    zval z;
    union {
        gc_root_buffer       *buffered;
        struct _zval_gc_info *next;
    } u;
} zval_gc_info;

The zval_gc_info structure embeds a normal zval and adds two pointers. Both belong to the same union u, so only one of them is in use at any time. The buffered pointer records where in the root buffer the zval is referenced, so that the entry can be removed if the zval is destroyed before the cycle collector runs. The next pointer is used when the collector destroys values; it is not covered further here.

Motivation for the change

Let's talk about memory usage; all sizes are for 64-bit systems. First, the zvalue_value union takes 16 bytes, because both its str and obj members have that size. The whole zval struct takes 24 bytes (because of alignment padding), and zval_gc_info takes 32 bytes. On top of that, allocating the zval on the heap (as opposed to the stack) costs another 16 bytes of allocation overhead. So each zval ends up using 48 bytes, even though that one zval may be used in several places. (To follow the calculation, keep in mind that every pointer takes 8 bytes on a 64-bit system.)
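The size arithmetic above can be checked with a small C sketch. The demo5_ types below are simplified stand-ins written for this illustration (the real PHP5 headers are not used, and pointer targets are reduced to void), and the 16-byte allocation overhead is passed in as a parameter because it depends on the allocator.

```c
#include <stddef.h>

/* Stand-in for zend_object_value: an integer handle plus a pointer. */
typedef struct demo5_object_value {
    unsigned int handle;
    const void  *handlers;
} demo5_object_value;

/* Stand-in for zvalue_value. */
typedef union demo5_value {
    long   lval;
    double dval;
    struct { char *val; int len; } str;
    void  *ht;
    demo5_object_value obj;
} demo5_value;

/* Stand-in for the PHP5 zval. */
typedef struct demo5_zval {
    demo5_value   value;
    unsigned int  refcount__gc;
    unsigned char type;
    unsigned char is_ref__gc;
} demo5_zval;

/* Stand-in for zval_gc_info. */
typedef struct demo5_zval_gc_info {
    demo5_zval z;
    union {
        void *buffered;
        struct demo5_zval_gc_info *next;
    } u;
} demo5_zval_gc_info;

/* Bytes used per heap-allocated zval for an assumed allocator overhead. */
size_t demo5_bytes_per_zval(size_t alloc_overhead)
{
    return sizeof(demo5_zval_gc_info) + alloc_overhead;
}
```

On a typical 64-bit build, sizeof reports 16, 24 and 32 bytes for the three types, and demo5_bytes_per_zval(16) gives the 48 bytes mentioned in the text.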

At this point we can consider the many ways in which this zval design is inefficient. Take the simple case of an integer, which by itself needs only 8 bytes. The type tag has to be stored in any case; it is a single byte, but because of memory alignment it effectively costs another 8 bytes.

To these 16 bytes that storing the integer really requires, we add 16 bytes for reference counting and cycle collection and another 16 bytes of allocation overhead. In addition, allocating and later freeing the zval are both expensive operations, so this is worth optimizing.

Looking at it this way: does an integer really need to store a reference count and cycle collection information, and be allocated separately on the heap? The answer is of course no; this is not a good way to handle it.

Here is a summary of the main problems of zval implementation in PHP5:

    • A zval (almost) always requires a separate heap allocation;
    • A zval always stores a reference count and cycle collection information, even for values such as integers that do not need them;
    • For objects and resources, counting references on the zval directly leads to double reference counting (the reason is explained in the next part);
    • Some accesses involve too much indirection. For example, accessing an object stored in a variable currently dereferences four pointers (the pointer chain has length four). This is also discussed in the next part;
    • Counting references on the zval directly also means that values can only be shared between zvals. It is not possible to share a string between a zval and a hashtable key (unless the hashtable key is also stored as a zval).

The Zval in PHP7

In PHP7 the zval has a new implementation. The most fundamental change is that the memory for a zval is no longer allocated individually on the heap, and the zval no longer stores a reference count itself. Instead, the complex values a zval points to, such as strings, arrays and objects, store their own reference counts. This has the following benefits:

    • Simple values need neither a separate allocation nor reference counting;
    • There is no more double reference counting. For an object, only the count stored in the object itself is used;
    • Because the count is now stored in the value itself, the value can be shared with non-zval structures, for example between a zval and a hashtable key;
    • Fewer pointers have to be dereferenced for indirect access.

Let's look at the current definition of the zval struct (now in the zend_types.h file):

struct _zval_struct {
    zend_value        value;            /* value */
    union {
        struct {
            ZEND_ENDIAN_LOHI_4(
                zend_uchar    type,         /* active type */
                zend_uchar    type_flags,
                zend_uchar    const_flags,
                zend_uchar    reserved)     /* call info for EX(This) */
        } v;
        uint32_t type_info;
    } u1;
    union {
        uint32_t     var_flags;
        uint32_t     next;                 /* hash collision chain */
        uint32_t     cache_slot;           /* literal cache slot */
        uint32_t     lineno;               /* line number (for ast nodes) */
        uint32_t     num_args;             /* arguments number for EX(This) */
        uint32_t     fe_pos;               /* foreach position */
        uint32_t     fe_iter_idx;          /* foreach iterator index */
    } u2;
};

The first member has not changed much; it is still a value union. The second member is a union of an integer holding the type information and a struct of four single-byte fields (you can ignore the ZEND_ENDIAN_LOHI_4 macro, which only ensures a consistent layout across platforms with different endianness). The important parts of this substructure are type (similar to what it was before) and type_flags, which is explained below.

There is a small problem here: value takes 8 bytes, but because of memory alignment, adding even a single byte grows the zval to 16 bytes (one byte of use costs 8 extra bytes). We obviously do not need 8 bytes to store a type, which is why the u2 union was added after u1. It is unused by default, but the surrounding code can use it to store 4 bytes of data when needed. The different members of this union correspond to different uses of that extra slot.
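The padding argument can be illustrated with a simplified C sketch. The demo7_ types are stand-ins invented for this example (only a few union members are kept, and the endianness macro is left out); demo7_zval_no_u2 is a hypothetical variant used only to show that dropping u2 would not make the struct smaller on a 64-bit platform.

```c
#include <stdint.h>

/* Stand-in for zend_value: 8 bytes, holding a number or a pointer. */
typedef union demo7_value {
    int64_t lval;
    double  dval;
    void   *ptr;
} demo7_value;

/* Stand-in for the PHP7 zval. */
typedef struct demo7_zval {
    demo7_value value;          /* 8 bytes */
    union {
        struct {
            uint8_t type;
            uint8_t type_flags;
            uint8_t const_flags;
            uint8_t reserved;
        } v;
        uint32_t type_info;
    } u1;                       /* 4 bytes of type information */
    union {
        uint32_t next;
        uint32_t lineno;
    } u2;                       /* 4 bytes that would otherwise be padding */
} demo7_zval;

/* Hypothetical variant without u2, for comparison only. */
typedef struct demo7_zval_no_u2 {
    demo7_value value;
    uint32_t    type_info;
} demo7_zval_no_u2;

/* Returns 1 when u2 fits entirely into what would otherwise be padding. */
int demo7_u2_is_free(void)
{
    return sizeof(demo7_zval) == sizeof(demo7_zval_no_u2);
}
```

On common 64-bit platforms both structs come out at 16 bytes, so the extra 4-byte slot costs nothing.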

The structure of value in PHP7 is defined as follows:

typedef union _zend_value {
    zend_long         lval;             /* long value */
    double            dval;             /* double value */
    zend_refcounted  *counted;
    zend_string      *str;
    zend_array       *arr;
    zend_object      *obj;
    zend_resource    *res;
    zend_reference   *ref;
    zend_ast_ref     *ast;
    zval             *zv;
    void             *ptr;
    zend_class_entry *ce;
    zend_function    *func;
    struct {
        uint32_t w1;
        uint32_t w2;
    } ww;
} zend_value;

The first thing to note is that the value union now takes 8 bytes instead of 16. It stores only integers (lval) and floating-point numbers (dval) directly; everything else is stored as a pointer (as mentioned above, a pointer takes 8 bytes; the struct at the bottom consists of two 4-byte unsigned integers). All of the pointer types above (apart from those with special purposes) share a common header, zend_refcounted, which stores the reference count:

typedef struct _zend_refcounted_h {
    uint32_t         refcount;          /* reference counter 32-bit */
    union {
        struct {
            ZEND_ENDIAN_LOHI_3(
                zend_uchar    type,
                zend_uchar    flags,    /* used for strings & objects */
                uint16_t      gc_info)  /* keeps GC root number (or 0) and color */
        } v;
        uint32_t type_info;
    } u;
} zend_refcounted_h;

The struct of course contains a field for the reference count. In addition there are type, flags and gc_info. The type duplicates the type stored in the zval, which lets the GC tell the different refcounted structures apart without needing a zval. The flags serve different purposes for different data types; this is covered in the next part.

gc_info plays the same role as the buffered field in PHP5, but instead of a pointer into the root buffer it holds an index into it. Because the root buffer has a fixed size (10,000 elements), a 16-bit (2-byte) number is enough where a 64-bit (8-byte) pointer was used before. gc_info also contains the "color" bits used to mark nodes during collection.
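As an illustration of how an index and a color can share one 16-bit field, here is a minimal C sketch. The split used here (low 14 bits for the root buffer index, top 2 bits for the color) and all DEMO_GC_ names and helper functions are assumptions made for this example; the text above only states that both pieces of information live in gc_info.

```c
#include <stdint.h>

/* Assumed layout for this sketch: 14 index bits, 2 color bits. */
#define DEMO_GC_INDEX_MASK 0x3FFFu
#define DEMO_GC_COLOR_MASK 0xC000u

/* Hypothetical color values stored in the top two bits. */
#define DEMO_GC_BLACK  0x0000u
#define DEMO_GC_GREY   0x4000u
#define DEMO_GC_WHITE  0x8000u
#define DEMO_GC_PURPLE 0xC000u

/* Pack a root buffer index and a color into one 16-bit value. */
uint16_t demo_gc_pack(uint16_t root_index, uint16_t color)
{
    return (uint16_t)((root_index & DEMO_GC_INDEX_MASK) |
                      (color & DEMO_GC_COLOR_MASK));
}

/* Extract the root buffer index again. */
uint16_t demo_gc_index(uint16_t gc_info)
{
    return (uint16_t)(gc_info & DEMO_GC_INDEX_MASK);
}

/* Extract the color again. */
uint16_t demo_gc_color(uint16_t gc_info)
{
    return (uint16_t)(gc_info & DEMO_GC_COLOR_MASK);
}
```

Fourteen bits can address 16,384 slots, which is enough for a root buffer of 10,000 elements.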

Zval Memory Management

As mentioned above, the memory for a zval is no longer allocated individually on the heap. But it obviously has to be stored somewhere, so where? In fact zvals still live mostly on the heap (so the point made earlier is about separate allocation, not about the heap as such), but they are embedded directly in other data structures. For example, a hashtable bucket now contains a zval field rather than a pointer to one. Likewise, the compiled-variable table of a function and the property table of an object are stored as zval arrays allocated as one block of memory, instead of as pointers to zvals scattered around the heap. What used to be a zval * is now a zval.

Previously, using a zval in a new place meant copying the zval * and incrementing its reference count. Now the contents of the zval are copied directly (ignoring u2), and in some cases the reference count of the structure it points to is incremented (if that structure is refcounted).

So how does PHP know whether a value is refcounted? This cannot be determined from the type alone, because some types, such as strings and arrays, are not always refcounted. Instead, a bit in the zval's type_info field records whether the value is refcounted. The bits that can be set are:

#define IS_TYPE_CONSTANT            (1<<0)   /* special */
#define IS_TYPE_IMMUTABLE           (1<<1)   /* special */
#define IS_TYPE_REFCOUNTED          (1<<2)
#define IS_TYPE_COLLECTABLE         (1<<3)
#define IS_TYPE_COPYABLE            (1<<4)
#define IS_TYPE_SYMBOLTABLE         (1<<5)   /* special */

Note: in the released 7.0.0 source, the comment on these macros says that they apply to zval.u1.v.type_flags. That field is a single zend_uchar, one of the four bytes that share storage with type_info in the u1 union.

The three main properties recorded in type_info are "refcounted", "collectable" and "copyable". Refcounting has already been covered above. "Collectable" marks whether the value can take part in a cycle: strings, for example, are usually refcounted, but there is no way to create a circular reference with a string.
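The copy rule described earlier (copy the zval contents, then increment a reference count only if the value is refcounted) can be sketched in C as follows. The copy_ types and the copy_zval function are hypothetical names for this illustration, the zval is reduced to the fields needed here, and the bit position chosen for COPY_TYPE_REFCOUNTED is an assumption.

```c
#include <stdint.h>

/* Assumed flag bit marking a value that carries its own refcount. */
#define COPY_TYPE_REFCOUNTED (1u << 2)

/* Stand-in for the common refcounted header. */
typedef struct copy_refcounted {
    uint32_t refcount;
} copy_refcounted;

/* Reduced zval: a value slot plus type and flag bytes. */
typedef struct copy_zval_t {
    union {
        int64_t          lval;
        copy_refcounted *counted;
    } value;
    uint8_t type;
    uint8_t type_flags;
} copy_zval_t;

/* Copy src into dst. The zval itself is copied by value; the refcount of
 * the pointed-to structure is incremented only for refcounted values. */
void copy_zval(copy_zval_t *dst, const copy_zval_t *src)
{
    *dst = *src;
    if (src->type_flags & COPY_TYPE_REFCOUNTED) {
        src->value.counted->refcount++;
    }
}
```

For an integer the function is a plain 16-byte copy with no counting at all, which is the saving the new design is after.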

"Copyable" determines whether the value has to be copied when a "duplication" is performed (the original uses the term "duplication", which is hard to render well in Chinese). A duplication is a deep copy: duplicating a zval that points to an array does not simply increment the reference count of the array, but creates a new, independent array with the same contents. For some types, however, such as objects and resources, even a duplication only increments the reference count; these types are non-copyable. This matches the existing semantics of objects and resources (existing in PHP7 as well, not just in PHP5).

The table below shows which flags are used by the different types (an x marks a flag that is set). "Simple types" refers to types such as integers or booleans that do not use a pointer to a separate struct. The table also has a column for the "immutable" flag, which marks immutable arrays and is described in detail in the next part.

Interned strings have not been mentioned before. They are strings such as function names and variable names, which are stored only once (never duplicated) and need no reference counting.

                | refcounted | collectable | copyable | immutable
----------------+------------+-------------+----------+----------
simple types    |            |             |          |
string          |      x     |             |     x    |
interned string |            |             |          |
array           |      x     |      x      |     x    |
immutable array |            |             |          |     x
object          |      x     |      x      |          |
resource        |      x     |             |          |
reference       |      x     |             |          |
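The table can also be read as a lookup from type to flags, from which the behaviour of a "duplication" follows. The C sketch below encodes the three main flags for the rows of the table; all DEMO_ names are invented for this illustration, the bit values are assumptions, and the immutable array row is left out because immutable arrays are only covered in the next part.

```c
#include <stdint.h>

/* Assumed bit values for the three main flags. */
#define DEMO_REFCOUNTED  (1u << 2)
#define DEMO_COLLECTABLE (1u << 3)
#define DEMO_COPYABLE    (1u << 4)

typedef enum demo_kind {
    DEMO_KIND_SIMPLE,
    DEMO_KIND_STRING,
    DEMO_KIND_INTERNED_STRING,
    DEMO_KIND_ARRAY,
    DEMO_KIND_OBJECT,
    DEMO_KIND_RESOURCE,
    DEMO_KIND_REFERENCE,
    DEMO_KIND_COUNT
} demo_kind;

/* Flags per kind, following the table above. */
static const uint8_t demo_kind_flags[DEMO_KIND_COUNT] = {
    [DEMO_KIND_SIMPLE]          = 0,
    [DEMO_KIND_STRING]          = DEMO_REFCOUNTED | DEMO_COPYABLE,
    [DEMO_KIND_INTERNED_STRING] = 0,
    [DEMO_KIND_ARRAY]           = DEMO_REFCOUNTED | DEMO_COLLECTABLE | DEMO_COPYABLE,
    [DEMO_KIND_OBJECT]          = DEMO_REFCOUNTED | DEMO_COLLECTABLE,
    [DEMO_KIND_RESOURCE]        = DEMO_REFCOUNTED,
    [DEMO_KIND_REFERENCE]       = DEMO_REFCOUNTED
};

typedef enum demo_dup_action {
    DEMO_DUP_PLAIN_COPY,   /* copy the zval, nothing else to do */
    DEMO_DUP_ADDREF,       /* copy the zval and increment the refcount */
    DEMO_DUP_DEEP_COPY     /* create a new, independent value */
} demo_dup_action;

/* What a duplication has to do for a value of the given kind. */
demo_dup_action demo_dup_action_for(demo_kind kind)
{
    uint8_t flags = demo_kind_flags[kind];
    if (flags & DEMO_COPYABLE) {
        return DEMO_DUP_DEEP_COPY;
    }
    if (flags & DEMO_REFCOUNTED) {
        return DEMO_DUP_ADDREF;
    }
    return DEMO_DUP_PLAIN_COPY;
}

/* Whether a value of the given kind can take part in a cycle. */
int demo_can_form_cycle(demo_kind kind)
{
    return (demo_kind_flags[kind] & DEMO_COLLECTABLE) != 0;
}
```

Under these rules, duplicating an array yields a deep copy, while duplicating an object only increments its reference count.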

With that in place, a few examples show how zval memory management works.

The following shows the behavior for integers, simplified from the PHP5 example above:

$a = 42;   // $a = zval_1(type=IS_LONG, value=42)

$b = $a;   // $a = zval_1(type=IS_LONG, value=42)
           // $b = zval_2(type=IS_LONG, value=42)

$a += 1;   // $a = zval_1(type=IS_LONG, value=43)
           // $b = zval_2(type=IS_LONG, value=42)

unset($a); // $a = zval_1(type=IS_UNDEF)
           // $b = zval_2(type=IS_LONG, value=42)

This is actually quite simple. Integers are no longer shared, so the two variables each get their own zval. Because the zval is now embedded, no separate allocation is needed, which is why the comments use = instead of the pointer symbol ->. When a variable is unset, its zval is marked IS_UNDEF. Now a more complex case:

$a = [];   // $a = zval_1(type=IS_ARRAY) -> zend_array_1(refcount=1, value=[])

$b = $a;   // $a = zval_1(type=IS_ARRAY) -> zend_array_1(refcount=2, value=[])
           // $b = zval_2(type=IS_ARRAY) ---^

// Zval separation occurs here
$a[] = 1;  // $a = zval_1(type=IS_ARRAY) -> zend_array_2(refcount=1, value=[1])
           // $b = zval_2(type=IS_ARRAY) -> zend_array_1(refcount=1, value=[])

unset($a); // $a = zval_1(type=IS_UNDEF),   zend_array_2 is destroyed
           // $b = zval_2(type=IS_ARRAY) -> zend_array_1(refcount=1, value=[])

In this case each variable has its own zval, but both point to the same (refcounted) zend_array struct. The array is copied only when one of the variables modifies it. This is similar to the PHP5 behavior.
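The array example can be mirrored in a small C sketch of copy-on-write with the count stored in the array itself. Everything prefixed cow_ is invented for this illustration: the array has a fixed capacity to keep the code short, and the zval is reduced to a single pointer.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define COW_CAPACITY 8

/* Stand-in for zend_array: the refcount lives in the array. */
typedef struct cow_array {
    uint32_t refcount;
    size_t   len;
    long     items[COW_CAPACITY];
} cow_array;

/* Reduced zval: embedded in its owner, it only points to the array. */
typedef struct cow_zval {
    cow_array *arr;
} cow_zval;

/* $a = []: create an empty array with refcount 1. */
cow_array *cow_array_new(void)
{
    cow_array *a = calloc(1, sizeof *a);
    if (a != NULL) {
        a->refcount = 1;
    }
    return a;
}

/* $b = $a: copy the zval and increment the array's refcount. */
void cow_assign(cow_zval *dst, const cow_zval *src)
{
    dst->arr = src->arr;
    dst->arr->refcount++;
}

/* Make sure zv is the only user of its array before a write. */
int cow_separate(cow_zval *zv)
{
    if (zv->arr->refcount > 1) {
        cow_array *copy = malloc(sizeof *copy);
        if (copy == NULL) {
            return -1;
        }
        memcpy(copy, zv->arr, sizeof *copy);
        copy->refcount = 1;
        zv->arr->refcount--;
        zv->arr = copy;
    }
    return 0;
}

/* $zv[] = value: separate if shared, then append. */
int cow_append(cow_zval *zv, long value)
{
    if (zv->arr->len >= COW_CAPACITY) {
        return -1;
    }
    if (cow_separate(zv) != 0) {
        return -1;
    }
    zv->arr->items[zv->arr->len++] = value;
    return 0;
}

/* unset($zv): drop one use and free the array when nobody is left. */
void cow_unset(cow_zval *zv)
{
    if (--zv->arr->refcount == 0) {
        free(zv->arr);
    }
    zv->arr = NULL;
}
```

Calling cow_assign followed by cow_append on the first variable reproduces the sequence in the example: the refcount goes to 2, then the write creates a second array and both arrays end up with a refcount of 1.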

Types

Let's take a look at the types PHP7 supports (the type tags used by the zval):

/* regular data types */
#define IS_UNDEF                    0
#define IS_NULL                     1
#define IS_FALSE                    2
#define IS_TRUE                     3
#define IS_LONG                     4
#define IS_DOUBLE                   5
#define IS_STRING                   6
#define IS_ARRAY                    7
#define IS_OBJECT                   8
#define IS_RESOURCE                 9
#define IS_REFERENCE                10

/* constant expressions */
#define IS_CONSTANT                 11
#define IS_CONSTANT_AST             12

/* internal types */
#define IS_INDIRECT                 15
#define IS_PTR                      17

This list is similar to that used by PHP5, but adds several items:

    • IS_UNDEF is used where a NULL zval pointer was used before (it is not to be confused with IS_NULL). For example, in the examples above a variable that has been unset is marked IS_UNDEF;
    • IS_BOOL has been split into IS_FALSE and IS_TRUE. The value of a boolean is now encoded directly in the type, which allows a number of type checks to be optimized. The change is transparent to users: in PHP scripts there is still a single "boolean" type;
    • PHP references no longer use an is_ref flag but the IS_REFERENCE type instead. This is covered further below;
    • IS_INDIRECT and IS_PTR are special internal types.
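As a rough illustration of why encoding the boolean value in the type helps, the C sketch below answers boolean questions by looking at the type byte alone, without reading the value slot. The tv_ names and helper functions are hypothetical; only the numeric type tags follow the list above.

```c
#include <stdint.h>

/* Type tags as listed above, with a TV_ prefix for this sketch. */
enum {
    TV_IS_UNDEF = 0,
    TV_IS_NULL  = 1,
    TV_IS_FALSE = 2,
    TV_IS_TRUE  = 3,
    TV_IS_LONG  = 4
};

/* Reduced zval: a value slot and a type byte. */
typedef struct tv_zval {
    int64_t lval;
    uint8_t type;
} tv_zval;

/* Is this a boolean at all? Only the type byte is inspected. */
int tv_is_bool(const tv_zval *zv)
{
    return zv->type == TV_IS_FALSE || zv->type == TV_IS_TRUE;
}

/* Is this the boolean true? A single comparison, no value access. */
int tv_is_true(const tv_zval *zv)
{
    return zv->type == TV_IS_TRUE;
}

/* Has the slot been unset? This is distinct from holding null. */
int tv_is_undef(const tv_zval *zv)
{
    return zv->type == TV_IS_UNDEF;
}
```

With the PHP5 layout, the second check would have needed both a comparison of the type with IS_BOOL and a read of lval.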

The full list actually contains two more fake types, which are ignored here.

The IS_LONG type stores a zend_long value rather than a native C long. The reason is that on 64-bit Windows (LLP64) a long is only 32 bits wide, so PHP5 could only use 32-bit numbers on Windows. PHP7 lets you use 64-bit numbers on 64-bit operating systems, including Windows.
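The idea behind a dedicated integer type can be sketched in C as follows. The name demo_long and the way the width is selected (from UINTPTR_MAX) are assumptions made for this illustration, not the definition used by PHP.

```c
#include <stdint.h>

/* Hypothetical stand-in for zend_long: 64 bits wide on 64-bit targets,
 * regardless of how wide the native C long happens to be. */
#if UINTPTR_MAX > 0xFFFFFFFFu
typedef int64_t demo_long;
#else
typedef int32_t demo_long;
#endif

/* Returns 1 if the native long is narrower than demo_long,
 * as is the case on 64-bit Windows (LLP64). */
int demo_native_long_is_narrower(void)
{
    return sizeof(long) < sizeof(demo_long);
}
```

On LP64 systems such as 64-bit Linux the helper returns 0, because long is already 64 bits wide there.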

The contents of zend_refcounted will be covered in the next part. Now let's look at how PHP references are implemented.

References

PHP7 handles PHP references (those created with the & symbol) completely differently from PHP5 (this change was also the source of a number of bugs during PHP7 development). Let's start with how PHP references are implemented in PHP5.

Normally, the copy-on-write principle means that a zval has to be separated before it is modified, so that only the value of the one PHP variable being written to changes. This is what passing by value means.

This rule does not apply to PHP references, however. If a PHP variable is a PHP reference, you want several PHP variables to point to the same value. The is_ref flag in PHP5 indicates whether a PHP variable is a PHP reference, that is, whether the zval must not be separated when it is modified. For example:

$a = [];  // $a     -> zval_1(type=IS_ARRAY, refcount=1, is_ref=0) -> HashTable_1(value=[])
$b =& $a; // $a, $b -> zval_1(type=IS_ARRAY, refcount=2, is_ref=1) -> HashTable_1(value=[])

$b[] = 1; // $a = $b = zval_1(type=IS_ARRAY, refcount=2, is_ref=1) -> HashTable_1(value=[1])
          // Because is_ref is 1, PHP does not separate the zval

A big problem with this design is that the same value cannot be shared between a variable that is a PHP reference and one that is not. Consider the following:

$a = [];  // $a         -> zval_1(type=IS_ARRAY, refcount=1, is_ref=0) -> HashTable_1(value=[])
$b = $a;  // $a, $b     -> zval_1(type=IS_ARRAY, refcount=2, is_ref=0) -> HashTable_1(value=[])
$c = $b;  // $a, $b, $c -> zval_1(type=IS_ARRAY, refcount=3, is_ref=0) -> HashTable_1(value=[])

$d =& $c; // $a, $b -> zval_1(type=IS_ARRAY, refcount=2, is_ref=0) -> HashTable_1(value=[])
          // $c, $d -> zval_2(type=IS_ARRAY, refcount=2, is_ref=1) -> HashTable_2(value=[])
          // $d is a reference to $c, but not to $a and $b, so the zval has to be copied here.
          // We now have two zvals, one with is_ref=0 and one with is_ref=1.

$d[] = 1; // $a, $b -> zval_1(type=IS_ARRAY, refcount=2, is_ref=0) -> HashTable_1(value=[])
          // $c, $d -> zval_2(type=IS_ARRAY, refcount=2, is_ref=1) -> HashTable_2(value=[1])
          // Because there are two separate zvals, $d[] = 1 does not modify $a and $b.

This behavior also makes using references in PHP slower than using normal values. For example:

$array = range(0, 1000000);
$ref =& $array;
var_dump(count($array));

Because count() takes its argument by value, but $array is a PHP reference, a complete copy of the array is made before count() executes. If $array were not a reference, this would not happen.

Now let's look at the implementation of PHP references in PHP7. Because zvals are no longer allocated individually, the PHP5 approach cannot be used. Instead, an IS_REFERENCE type was added, and the referenced value is stored in a dedicated zend_reference structure:

struct _zend_reference {
    zend_refcounted   gc;
    zval              val;
};

Essentially a zend_reference is just a zval with a reference count added. Every variable in a reference set stores a zval of type IS_REFERENCE that points to the same zend_reference. The val member behaves like any other zval; in particular, it can share the complex value it points to. An array, for example, can be shared between variables that are references and variables that are plain values.

Let's go through the examples again, this time with PHP7 semantics. For brevity the individual zvals are no longer written out; only the structures they point to are shown:

$a = [];  // $a                                     -> zend_array_1(refcount=1, value=[])
$b =& $a; // $a, $b -> zend_reference_1(refcount=2) -> zend_array_1(refcount=1, value=[])

$b[] = 1; // $a, $b -> zend_reference_1(refcount=2) -> zend_array_1(refcount=1, value=[1])

In the example above a zend_reference is created. Note that its reference count is 2 (because two variables use this PHP reference), while the reference count of the value itself is 1 (because only the zend_reference points to it). Now let's look at a case that mixes references and non-references:

$a = [];  // $a         -> zend_array_1(refcount=1, value=[])
$b = $a;  // $a, $b     -> zend_array_1(refcount=2, value=[])
$c = $b;  // $a, $b, $c -> zend_array_1(refcount=3, value=[])

$d =& $c; // $a, $b                                 -> zend_array_1(refcount=3, value=[])
          // $c, $d -> zend_reference_1(refcount=2) ---^
          // Note that all variables share the same zend_array, even though some are
          // PHP references and some are not.

$d[] = 1; // $a, $b                                 -> zend_array_1(refcount=2, value=[])
          // $c, $d -> zend_reference_1(refcount=2) -> zend_array_2(refcount=1, value=[1])
          // Only at this point, when the write happens, is the zend_array duplicated.

The biggest difference from PHP5 is that all variables can share the same array, even though some are PHP references and some are not. The array is separated only when it is modified. This also means that passing a large referenced array to count() is safe and no longer causes a copy. References will still be slower than normal values, though, because the zend_reference struct has to be allocated (and adds a level of indirection) and because the engine's handling of references is not particularly fast either.
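The mixed case can be traced with a simplified C sketch in which a reference is a refcounted wrapper around the shared array. The ref_ names are invented for this illustration, the wrapped value is restricted to a small fixed-capacity array, and the variables themselves are not modelled; only the counts shown in the example are.

```c
#include <stdint.h>
#include <stdlib.h>

#define REF_CAPACITY 4

/* Stand-in for zend_array with its own refcount. */
typedef struct ref_array {
    uint32_t refcount;
    size_t   len;
    long     items[REF_CAPACITY];
} ref_array;

/* Stand-in for zend_reference: a refcount plus the wrapped value. */
typedef struct ref_reference {
    uint32_t   refcount;   /* number of variables in the reference set */
    ref_array *val;        /* the array the reference currently holds */
} ref_reference;

/* $a = []: create an empty array with refcount 1. */
ref_array *ref_array_new(void)
{
    ref_array *a = calloc(1, sizeof *a);
    if (a != NULL) {
        a->refcount = 1;
    }
    return a;
}

/* $d =& $c, where $c currently holds arr. The use of the array held by $c
 * moves into the reference, so the array's own refcount stays the same. */
ref_reference *ref_make(ref_array *arr)
{
    ref_reference *r = malloc(sizeof *r);
    if (r != NULL) {
        r->refcount = 2;
        r->val = arr;
    }
    return r;
}

/* $d[] = value: separate the array if plain variables still share it,
 * then write through the reference. */
int ref_append(ref_reference *r, long value)
{
    if (r->val->len >= REF_CAPACITY) {
        return -1;
    }
    if (r->val->refcount > 1) {
        ref_array *copy = malloc(sizeof *copy);
        if (copy == NULL) {
            return -1;
        }
        *copy = *r->val;
        copy->refcount = 1;
        r->val->refcount--;
        r->val = copy;
    }
    r->val->items[r->val->len++] = value;
    return 0;
}
```

Following the example, an array used by three variables keeps a refcount of 3 after the reference is created, and only the write through the reference produces a second array.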

Conclusion

To sum up, the most important change in PHP7 is that zvals are no longer allocated individually on the heap and no longer store a reference count themselves. Instead, the complex values that zvals point to (such as strings, arrays and objects) store their own reference counts. This leads to fewer memory allocations, less pointer indirection and lower memory usage.

The second part of the article discusses the complex types.
