Variable implementation in PHP 7 (1)

Source: Internet
Author: User

Because there is a lot of detail to cover, this article is split into two parts. The first part describes how zval (zend value) is implemented in PHP 5 and PHP 7, and how references are implemented. The second part analyzes the details of individual types (strings, objects, and so on).

Zval in PHP 5

The zval struct in PHP5 is defined as follows:

typedef struct _zval_struct {
    zvalue_value value;
    zend_uint refcount__gc;
    zend_uchar type;
    zend_uchar is_ref__gc;
} zval;

As shown above, a zval contains a value, a type, and two fields with the __gc suffix. value is a union used to store values of different types:

typedef union _zvalue_value {
    long lval;                 // used for the bool, integer, and resource types
    double dval;               // used for the floating point type
    struct {                   // used for strings
        char *val;
        int len;
    } str;
    HashTable *ht;             // used for arrays
    zend_object_value obj;     // used for objects
    zend_ast *ast;             // used for constant expressions (only available in PHP 5.6)
} zvalue_value;

A C union has the property that only one member is valid at a time, and its size matches that of its largest member (alignment also has to be taken into account). All members are stored at the same memory location, and the stored bytes are interpreted according to the member you access: through lval they are treated as a signed integer, through dval as a double-precision floating point number.
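
The overlap described above can be checked with a few lines of C. This is only a sketch: demo_value and demo_str are hypothetical stand-ins for part of zvalue_value, not types from the PHP source.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical string payload: pointer plus length. */
typedef struct demo_str {
    char *val;
    int len;
} demo_str;

/* Hypothetical stand-in for a subset of zvalue_value. */
typedef union demo_value {
    long lval;
    double dval;
    demo_str str;
} demo_value;

/* Every member of a union starts at offset 0, so they all overlap. */
static_assert(offsetof(demo_value, lval) == 0, "lval starts at offset 0");
static_assert(offsetof(demo_value, dval) == 0, "dval starts at offset 0");
static_assert(offsetof(demo_value, str) == 0, "str starts at offset 0");

/* Size of the largest member; the union is at least this large. */
size_t demo_largest_member(void) {
    size_t largest = sizeof(long);
    if (sizeof(double) > largest) largest = sizeof(double);
    if (sizeof(demo_str) > largest) largest = sizeof(demo_str);
    return largest;
}

/* Store a long, then reuse the same bytes for a double. */
double demo_overwrite(void) {
    demo_value v;
    v.lval = 42;   /* lval is the valid member */
    v.dval = 1.5;  /* the bytes now hold a double; lval is no longer meaningful */
    return v.dval;
}
```

On a typical 64-bit platform the largest member here is the string struct, which is why the real union ends up at 16 bytes.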

The type currently stored in the union is recorded in the type field as an integer tag:

#define IS_NULL     0 /* Doesn't use value */
#define IS_LONG     1 /* Uses lval */
#define IS_DOUBLE   2 /* Uses dval */
#define IS_BOOL     3 /* Uses lval with values 0 and 1 */
#define IS_ARRAY    4 /* Uses ht */
#define IS_OBJECT   5 /* Uses obj */
#define IS_STRING   6 /* Uses str */
#define IS_RESOURCE 7 /* Uses lval, which is the resource ID */
/* Special types used for late-binding of constants */
#define IS_CONSTANT     8
#define IS_CONSTANT_AST 9
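
To illustrate how the type tag selects the valid union member, here is a minimal sketch. The demo_zval struct, the DEMO_ constants, and demo_to_string are hypothetical names made up for this example; only the tag values follow the list above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical tags; the numbers follow the PHP 5 list above. */
enum { DEMO_IS_NULL = 0, DEMO_IS_LONG = 1, DEMO_IS_DOUBLE = 2, DEMO_IS_BOOL = 3 };

typedef struct demo_zval {
    union {
        long lval;
        double dval;
    } value;
    unsigned char type;   /* says which union member is currently valid */
} demo_zval;

/* Format the value into buf according to the type tag; returns buf. */
const char *demo_to_string(const demo_zval *z, char *buf, size_t size) {
    assert(z != NULL && buf != NULL && size > 0);
    switch (z->type) {
    case DEMO_IS_NULL:
        snprintf(buf, size, "null");
        break;
    case DEMO_IS_LONG:
        snprintf(buf, size, "%ld", z->value.lval);
        break;
    case DEMO_IS_DOUBLE:
        snprintf(buf, size, "%g", z->value.dval);
        break;
    case DEMO_IS_BOOL:
        snprintf(buf, size, "%s", z->value.lval ? "true" : "false");
        break;
    default:
        snprintf(buf, size, "unknown");
        break;
    }
    return buf;
}
```

Reading a member that does not match the tag would interpret the bytes incorrectly, which is why every access goes through the type field first.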

Reference counting in PHP 5

In PHP 5, zvals are allocated individually on the heap (with a few exceptions), so PHP needs to know which zvals are still in use and which can be freed. Reference counting is used for this: the refcount__gc field of a zval stores how many times the zval itself is referenced. For example, in the statement $a = $b = 42, the value 42 is referenced by two variables, so its reference count is 2. When the reference count drops to 0, the value is no longer used and its memory can be freed.

Note that the reference count discussed here has nothing to do with references in PHP code (created with &); it is the number of places that use the value. Where both concepts appear together, "PHP reference" and "reference" are used to tell them apart. PHP references are ignored for now.

A concept closely related to reference counting is copy-on-write: a zval is shared between multiple users only as long as it is not changed. Once one of the users changes the value, the zval has to be copied ("separated") first, and the change is applied to the copy.

The following example shows copy-on-write and zval destruction:

<?php
$a = 42;   // $a         -> zval_1(type=IS_LONG, value=42, refcount=1)
$b = $a;   // $a, $b     -> zval_1(type=IS_LONG, value=42, refcount=2)
$c = $b;   // $a, $b, $c -> zval_1(type=IS_LONG, value=42, refcount=3)

// The following line causes a zval separation
$a += 1;   // $b, $c -> zval_1(type=IS_LONG, value=42, refcount=2)
           // $a     -> zval_2(type=IS_LONG, value=43, refcount=1)

unset($b); // $c -> zval_1(type=IS_LONG, value=42, refcount=1)
           // $a -> zval_2(type=IS_LONG, value=43, refcount=1)

unset($c); // zval_1 is destroyed, because refcount=0
           // $a -> zval_2(type=IS_LONG, value=43, refcount=1)
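
The same sequence can be sketched in C. This is a simplified model under stated assumptions, not the engine's code: demo_zval5 and the demo_ functions are hypothetical, and only an integer payload is handled.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical PHP 5 style zval: heap allocated, refcount stored inside. */
typedef struct demo_zval5 {
    long lval;
    unsigned int refcount;
} demo_zval5;

demo_zval5 *demo_new(long v) {
    demo_zval5 *z = malloc(sizeof *z);
    if (z == NULL) abort();
    z->lval = v;
    z->refcount = 1;
    return z;
}

/* "$b = $a": share the zval and increment its refcount. */
demo_zval5 *demo_assign(demo_zval5 *src) {
    assert(src != NULL);
    src->refcount++;
    return src;
}

/* Separate before writing: if the zval is shared, detach a private copy. */
demo_zval5 *demo_separate(demo_zval5 *z) {
    assert(z != NULL && z->refcount > 0);
    if (z->refcount > 1) {
        z->refcount--;
        return demo_new(z->lval);
    }
    return z;
}

/* "unset($x)": drop one use; returns 1 if the zval was destroyed. */
int demo_unset(demo_zval5 *z) {
    assert(z != NULL && z->refcount > 0);
    if (--z->refcount == 0) {
        free(z);
        return 1;
    }
    return 0;
}
```

A write such as $a += 1 corresponds to calling demo_separate() on the variable's zval and then modifying the result, so the other users keep seeing 42.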

Reference counting has one fatal flaw: it cannot detect and release cyclic references, so the memory they use leaks. To solve this, PHP additionally uses a cycle collector. Whenever the refcount of a zval is decremented and the zval might be part of a cycle, the zval is written to the "root buffer". When the buffer is full, potential cycles are marked and collected.

To support the cycle collector, the structure actually used for a zval is the following:

typedef struct _zval_gc_info {
    zval z;
    union {
        gc_root_buffer *buffered;
        struct _zval_gc_info *next;
    } u;
} zval_gc_info;

The zval_gc_info struct embeds a normal zval and adds two pointers. They belong to the same union u, so only one of them is in use at any time. The buffered pointer stores where the zval is referenced in the root buffer, so that the entry can be removed from the buffer if the zval is destroyed before the cycle collector runs. next is used when the collector destroys values and is not discussed further here.

Motivation for modification

Now consider memory usage on a 64-bit system. The zvalue_value union takes 16 bytes, because both its str and obj members are that large. The whole zval struct takes 24 bytes (because of padding), and zval_gc_info takes 32 bytes. On top of that, allocating the zval on the heap adds another 16 bytes of allocation overhead, so each zval uses 48 bytes in total, although a single zval can be used in several places. (To follow the calculation, remember that every pointer takes 8 bytes on a 64-bit system.)
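
The arithmetic can be reproduced with stand-in types. The demo_ structs below only mirror the field layout quoted in this article; the layout of demo_object_value (an integer handle plus a pointer) is an assumption about zend_object_value, and the 16-byte allocation overhead is simply the figure used in the text.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed layout of zend_object_value: integer handle plus handlers pointer. */
typedef struct demo_object_value {
    unsigned int handle;
    const void *handlers;
} demo_object_value;

typedef union demo_zvalue_value {
    long lval;
    double dval;
    struct {
        char *val;
        int len;
    } str;
    void *ht;
    demo_object_value obj;
    void *ast;
} demo_zvalue_value;

typedef struct demo_zval {
    demo_zvalue_value value;
    unsigned int refcount__gc;
    unsigned char type;
    unsigned char is_ref__gc;
} demo_zval;

typedef struct demo_zval_gc_info {
    demo_zval z;
    union {
        void *buffered;
        struct demo_zval_gc_info *next;
    } u;
} demo_zval_gc_info;

/* The GC wrapper adds one pointer-sized union on top of the zval. */
static_assert(sizeof(demo_zval_gc_info) >= sizeof(demo_zval) + sizeof(void *),
              "gc info adds at least one pointer");

/* Allocation overhead per heap block, as stated in the text. */
enum { DEMO_ALLOC_OVERHEAD = 16 };

/* Total bytes used per separately allocated zval. */
size_t demo_total_per_zval(void) {
    return sizeof(demo_zval_gc_info) + DEMO_ALLOC_OVERHEAD;
}
```

With 8-byte pointers and 8-byte alignment, the three sizeof values come out as 16, 24, and 32, and demo_total_per_zval() gives the 48 bytes mentioned above.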

At this point we can see how inefficient this zval design is. Take an integer: the value itself needs only 8 bytes. The type tag needs a single byte, but because of padding it adds another 8 bytes.

So storing an integer really needs 16 bytes, yet another 16 bytes are spent on reference counting and cycle collection, and a further 16 bytes on allocation overhead. In addition, allocating and freeing the zval are themselves expensive operations that are worth optimizing away.

This raises the question: does a simple integer really need to store reference counting and cycle collection information and be allocated separately on the heap? The answer is, of course, no; it makes no sense.

To summarize, the main problems with the zval implementation in PHP 5 are:

Zvals (almost) always require a separate heap allocation;
Zvals always store reference counting and cycle collection information, even for values that do not need it, such as integers;
Reference counting the zval directly leads to double counting for objects and resources (the reason is explained in the next part);
Some accesses involve too much indirection. For example, accessing the object stored in a variable requires dereferencing four pointers (a pointer chain of length four). This is also discussed in the next part;
Reference counting the zval directly means that values can only be shared between zvals. Sharing a string between a zval and a hashtable key, for example, is not possible (unless the hashtable key is stored as a zval as well).

Zval in PHP 7

In PHP 7, zvals have a new implementation. The most fundamental change is that zvals are no longer allocated individually on the heap and no longer store a reference count themselves. Instead, the complex values they point to (such as strings, arrays, and objects) store the reference count. This has the following benefits:

Simple values need neither a separate allocation nor reference counting;
Double counting no longer occurs. For an object, only the count stored in the object itself is used;
Because the count is stored in the value itself, the value can be shared with non-zval structures, for example between a zval and a hashtable key;
Fewer pointers have to be followed for indirect access.

Let's take a look at the definition of the zval struct (now in the zend_types.h file):

struct _zval_struct {
    zend_value value;                /* value */
    union {
        struct {
            ZEND_ENDIAN_LOHI_4(
                zend_uchar type,         /* active type */
                zend_uchar type_flags,
                zend_uchar const_flags,
                zend_uchar reserved)     /* call info for EX(This) */
        } v;
        uint32_t type_info;
    } u1;
    union {
        uint32_t var_flags;
        uint32_t next;               /* hash collision chain */
        uint32_t cache_slot;         /* literal cache slot */
        uint32_t lineno;             /* line number (for ast nodes) */
        uint32_t num_args;           /* arguments number for EX(This) */
        uint32_t fe_pos;             /* foreach position */
        uint32_t fe_iter_idx;        /* foreach iterator index */
    } u2;
};

The first member of the struct is still a value union. The second member is a union of an integer holding the type information and a struct of four single-byte fields (the ZEND_ENDIAN_LOHI_4 macro can be ignored; it only keeps the layout consistent across platforms with different endianness). The important parts of this substructure are type (which works as before) and type_flags, which is explained later.

One small problem remains: value takes 8 bytes, and because of struct padding, adding even a single byte grows the zval to 16 bytes. We obviously do not need 8 bytes to store a type, which is why the zval contains the additional u2 union after u1. It is unused by default, but the surrounding code can use it to store 4 bytes of data. The different union members correspond to different uses of this extra slot.
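
The padding argument can be checked with a reduced model of the struct. The demo_ types below are hypothetical and keep only enough members to reproduce the layout; they are not the real PHP definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef union demo_value7 {
    int64_t lval;
    double dval;
    void *ptr;              /* stands in for all pointer members */
} demo_value7;

static_assert(sizeof(demo_value7) == 8, "value is expected to take 8 bytes");

typedef struct demo_zval7 {
    demo_value7 value;      /* 8 bytes */
    union {
        struct {
            uint8_t type;
            uint8_t type_flags;
            uint8_t const_flags;
            uint8_t reserved;
        } v;
        uint32_t type_info;
    } u1;                   /* 4 bytes */
    union {
        uint32_t next;
        uint32_t lineno;
    } u2;                   /* 4 bytes that would otherwise be padding */
} demo_zval7;

/* The same struct without u2, to show what padding does. */
typedef struct demo_zval7_no_u2 {
    demo_value7 value;
    uint32_t type_info;
} demo_zval7_no_u2;

/* Bytes lost to padding when u2 is left out. */
size_t demo_padding_without_u2(void) {
    return sizeof(demo_zval7_no_u2) - sizeof(demo_value7) - sizeof(uint32_t);
}
```

On a platform where value is 8-byte aligned, both structs are 16 bytes, so u2 occupies space that would otherwise go unused.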

The value union in PHP 7 is defined as follows:

typedef union _zend_value {
    zend_long lval;              /* long value */
    double dval;                 /* double value */
    zend_refcounted *counted;
    zend_string *str;
    zend_array *arr;
    zend_object *obj;
    zend_resource *res;
    zend_reference *ref;
    zend_ast_ref *ast;
    zval *zv;
    void *ptr;
    zend_class_entry *ce;
    zend_function *func;
    struct {
        uint32_t w1;
        uint32_t w2;
    } ww;
} zend_value;

Note that the value union now takes 8 bytes instead of 16. Only integers (lval) and floating point numbers (dval) are stored directly; everything else is a pointer (as mentioned above, a pointer takes 8 bytes, as does the struct at the bottom, which consists of two 4-byte unsigned integers). All of the pointer types above (apart from the special-purpose ones) use reference counting and share a common header, zend_refcounted:

typedef struct _zend_refcounted_h {
    uint32_t refcount;               /* reference counter 32-bit */
    union {
        struct {
            ZEND_ENDIAN_LOHI_3(
                zend_uchar type,
                zend_uchar flags,        /* used for strings & objects */
                uint16_t gc_info)        /* keeps GC root number (or 0) and color */
        } v;
        uint32_t type_info;
    } u;
} zend_refcounted_h;

As expected, this struct contains the reference count. In addition it has type, flags, and gc_info. type duplicates the type stored in the zval, which lets the GC tell different refcounted structures apart without needing a zval. flags serves different purposes for different data types and is explained per type in the next part.

gc_info plays the same role as buffered in PHP 5, but it stores an index into the root buffer instead of a pointer. Because the root buffer has a fixed size (10000 elements), a 16-bit (2-byte) number is enough to replace the 64-bit (8-byte) pointer. gc_info also encodes the "color" of the node, which is used to mark nodes during collection.
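
A shared header means that generic code can manage the reference count of any refcounted structure without knowing its concrete type. The sketch below illustrates the idea with hypothetical types (demo_refcounted, demo_string, demo_array); the real structures have more fields.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical header modeled on zend_refcounted_h. */
typedef struct demo_refcounted {
    uint32_t refcount;
    uint8_t type;
    uint8_t flags;
    uint16_t gc_info;   /* root buffer index and color instead of a pointer */
} demo_refcounted;

/* Two different structures that both begin with the header. */
typedef struct demo_string {
    demo_refcounted gc;
    uint32_t len;
    char val[8];
} demo_string;

typedef struct demo_array {
    demo_refcounted gc;
    uint32_t count;
} demo_array;

/* Works for any structure whose first member is demo_refcounted. */
uint32_t demo_addref(demo_refcounted *p) {
    assert(p != NULL);
    return ++p->refcount;
}
```

Because the header is the first member, a pointer to a demo_string or demo_array can be treated as a pointer to demo_refcounted, which is the role the counted member plays in the value union.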

Zval memory management

As mentioned above, zvals are no longer allocated individually on the heap. They still have to be stored somewhere, though, so where do they live? Most zvals are still part of heap-allocated structures (so the point is not that they are off the heap, but that they are not allocated individually); they are embedded directly in those structures. For example, a hashtable bucket now embeds a zval instead of a pointer to one. The compiled variables table of a function and the property table of an object are zval arrays allocated in one chunk, instead of pointers to zvals scattered all over memory. What used to be a zval * is now a zval.

Previously, using a zval in a new place meant copying the zval * and incrementing its reference count. Now it means copying the contents of the zval (ignoring u2) and, if the value it points to is refcounted, incrementing that value's reference count.
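
The copy operation just described might look roughly like this. It is a sketch with hypothetical names (demo_zval, demo_copy, the DEMO_ macros), and it assumes that the refcounted flag sits in the byte directly after the type within type_info.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_IS_LONG  4
#define DEMO_IS_ARRAY 7
#define DEMO_TYPE_REFCOUNTED (1u << 2)  /* flag bit for refcounted values */
#define DEMO_FLAGS_SHIFT 8              /* assumption: flags follow the type byte */

typedef struct demo_counted {
    uint32_t refcount;
} demo_counted;

typedef struct demo_zval {
    union {
        int64_t lval;
        demo_counted *counted;
    } value;
    uint32_t type_info;
    uint32_t u2;
} demo_zval;

int demo_is_refcounted(const demo_zval *z) {
    return (z->type_info & (DEMO_TYPE_REFCOUNTED << DEMO_FLAGS_SHIFT)) != 0;
}

/* Copy value and type_info, leave u2 alone, and increment the refcount
   only when the value actually has one. */
void demo_copy(demo_zval *dst, const demo_zval *src) {
    assert(dst != NULL && src != NULL);
    dst->value = src->value;
    dst->type_info = src->type_info;
    if (demo_is_refcounted(src)) {
        src->value.counted->refcount++;
    }
}
```

For an integer the copy is just two stores; no allocation and no counter update take place.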

So how does PHP know whether a value is refcounted? The type alone does not tell, because some types (such as strings and arrays) are not always refcounted. Instead, a bit in the type_info field records whether the zval is refcounted. Several other bits encode further properties of the type:

#define IS_TYPE_CONSTANT    (1<<0)   /* special */
#define IS_TYPE_IMMUTABLE   (1<<1)   /* special */
#define IS_TYPE_REFCOUNTED  (1<<2)
#define IS_TYPE_COLLECTABLE (1<<3)
#define IS_TYPE_COPYABLE    (1<<4)
#define IS_TYPE_SYMBOLTABLE (1<<5)   /* special */

Note: in the released version of 7.0.0, the source comment on these macros says they are used by zval.u1.v.type_flags. That field is a zend_uchar, a single byte inside the u1 union, so the same bits are also reachable through type_info.

The three main properties recorded in type_info are "refcounted", "collectable", and "copyable". Refcounting has been covered above. "Collectable" marks whether the zval can take part in a cycle. For example, strings are usually refcounted, but there is no way to create a cycle with a string.

"Copyable" indicates whether the value has to be copied when a "duplication" is performed. A duplication is a deep copy: duplicating a zval that points to an array creates a new, independent array with the same values instead of just incrementing the array's reference count. For some types (such as objects and resources), however, even a duplication only increments the reference count. Such types are non-copyable. This matches the existing semantics of objects and resources (in PHP 7 just as in PHP 5).

The following table lists which properties each type has (x marks a property that applies). Simple types are types such as integers and booleans, which do not use a pointer to a separate structure. The table also contains the immutable flag, which marks immutable arrays and is described in detail in the next part.

Interned strings have not been mentioned before. They are non-refcounted, deduplicated strings such as function names and variable names.

                | refcounted | collectable | copyable | immutable
----------------+------------+-------------+----------+----------
Simple types    |            |             |          |
String          |     x      |             |    x     |
Interned string |            |             |          |
Array           |     x      |      x      |    x     |
Immutable array |            |             |          |     x
Object          |     x      |      x      |          |
Resource        |     x      |             |          |
Reference       |     x      |             |          |
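
The properties of each type can also be expressed as a flag lookup. This is an illustrative sketch: the demo_kind enum and demo_flags function are hypothetical, the bit positions follow the IS_TYPE_ macros shown earlier, and the flag combinations follow the description in this section (strings are refcounted and copyable but not collectable, objects are collectable but not copyable, and so on).

```c
#include <assert.h>

#define DEMO_IMMUTABLE   (1u << 1)
#define DEMO_REFCOUNTED  (1u << 2)
#define DEMO_COLLECTABLE (1u << 3)
#define DEMO_COPYABLE    (1u << 4)

typedef enum {
    DEMO_SIMPLE,
    DEMO_STRING,
    DEMO_INTERNED_STRING,
    DEMO_ARRAY,
    DEMO_IMMUTABLE_ARRAY,
    DEMO_OBJECT,
    DEMO_RESOURCE,
    DEMO_REFERENCE,
    DEMO_KIND_COUNT
} demo_kind;

/* Return the property flags for a kind of value. */
unsigned demo_flags(demo_kind k) {
    static const unsigned table[DEMO_KIND_COUNT] = {
        [DEMO_SIMPLE]          = 0,
        [DEMO_STRING]          = DEMO_REFCOUNTED | DEMO_COPYABLE,
        [DEMO_INTERNED_STRING] = 0,
        [DEMO_ARRAY]           = DEMO_REFCOUNTED | DEMO_COLLECTABLE | DEMO_COPYABLE,
        [DEMO_IMMUTABLE_ARRAY] = DEMO_IMMUTABLE,
        [DEMO_OBJECT]          = DEMO_REFCOUNTED | DEMO_COLLECTABLE,
        [DEMO_RESOURCE]        = DEMO_REFCOUNTED,
        [DEMO_REFERENCE]       = DEMO_REFCOUNTED,
    };
    assert((unsigned)k < DEMO_KIND_COUNT);
    return table[k];
}
```

A duplication routine would then deep-copy only when DEMO_COPYABLE is set and fall back to incrementing the reference count otherwise.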

With this in place, let's look at a few examples of how zval memory management works in practice.

The following shows how integers behave, based on a simplified version of the PHP 5 example above:

<?php
$a = 42;   // $a = zval_1(type=IS_LONG, value=42)

$b = $a;   // $a = zval_1(type=IS_LONG, value=42)
           // $b = zval_2(type=IS_LONG, value=42)

$a += 1;   // $a = zval_1(type=IS_LONG, value=43)
           // $b = zval_2(type=IS_LONG, value=42)

unset($a); // $a = zval_1(type=IS_UNDEF)
           // $b = zval_2(type=IS_LONG, value=42)

This is considerably simpler. Integers are no longer shared; each variable directly gets its own zval. Because the zval is embedded and needs no separate allocation, the comments use = instead of the pointer arrow ->. When a variable is unset, its zval is marked as IS_UNDEF. Now consider a more complex case:

<?php
$a = [];   // $a = zval_1(type=IS_ARRAY) -> zend_array_1(refcount=1, value=[])

$b = $a;   // $a = zval_1(type=IS_ARRAY) -> zend_array_1(refcount=2, value=[])
           // $b = zval_2(type=IS_ARRAY) ---^

// zval separation happens here
$a[] = 1;  // $a = zval_1(type=IS_ARRAY) -> zend_array_2(refcount=1, value=[1])
           // $b = zval_2(type=IS_ARRAY) -> zend_array_1(refcount=1, value=[])

unset($a); // $a = zval_1(type=IS_UNDEF), zend_array_2 is destroyed
           // $b = zval_2(type=IS_ARRAY) -> zend_array_1(refcount=1, value=[])

In this case each variable has its own zval, but both point to the same (refcounted) zend_array structure. The array is duplicated only when one of the variables modifies it. This is similar to PHP 5.
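
The array example can be modeled in C as follows. This is a sketch with hypothetical types: demo_zval is embedded by value and holds only a pointer, demo_array carries the refcount, and the fixed capacity of four items exists only to keep the code short.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct demo_array {
    unsigned refcount;
    size_t count;
    long items[4];      /* fixed capacity keeps the sketch short */
} demo_array;

/* Embedded zval: no allocation of its own; the type tag is omitted. */
typedef struct demo_zval {
    demo_array *arr;    /* NULL plays the role of IS_UNDEF */
} demo_zval;

demo_array *demo_array_new(void) {
    demo_array *a = calloc(1, sizeof *a);
    if (a == NULL) abort();
    a->refcount = 1;
    return a;
}

/* "$b = $a": copy the zval contents and increment the array's refcount. */
void demo_assign(demo_zval *dst, const demo_zval *src) {
    dst->arr = src->arr;
    dst->arr->refcount++;
}

/* "$a[] = v": separate first if the array is shared, then append. */
void demo_push(demo_zval *z, long v) {
    if (z->arr->refcount > 1) {
        demo_array *copy = demo_array_new();
        copy->count = z->arr->count;
        memcpy(copy->items, z->arr->items, sizeof copy->items);
        z->arr->refcount--;
        z->arr = copy;
    }
    assert(z->arr->count < 4);
    z->arr->items[z->arr->count++] = v;
}

/* "unset($a)": release the array and mark the zval as undefined. */
void demo_unset(demo_zval *z) {
    if (--z->arr->refcount == 0) free(z->arr);
    z->arr = NULL;
}
```

Only the array is ever allocated or freed here; the two zvals are ordinary struct values owned by whatever contains them.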

Types

Let's take a look at the types supported by PHP 7 (the type tags used by zval):

/* regular data types */
#define IS_UNDEF        0
#define IS_NULL         1
#define IS_FALSE        2
#define IS_TRUE         3
#define IS_LONG         4
#define IS_DOUBLE       5
#define IS_STRING       6
#define IS_ARRAY        7
#define IS_OBJECT       8
#define IS_RESOURCE     9
#define IS_REFERENCE    10

/* constant expressions */
#define IS_CONSTANT     11
#define IS_CONSTANT_AST 12

/* internal types */
#define IS_INDIRECT     15
#define IS_PTR          17

This list is similar to the one in PHP 5, with a few changes:

IS_UNDEF is used where a NULL zval pointer was used previously (it is not to be confused with IS_NULL). For example, in the examples above it marks variables that have been unset;
IS_BOOL has been split into IS_FALSE and IS_TRUE. The boolean value is now encoded directly in the type, which allows type checks to be optimized. This change is transparent to users: in PHP scripts there is still only one "boolean" type;
PHP references no longer use an is_ref flag on the zval; they use the new IS_REFERENCE type instead. How this works is described in a later section;
IS_INDIRECT and IS_PTR are special internal types.

Strictly speaking, the list above should also contain two fake types, which are omitted here.

The IS_LONG type holds a zend_long value rather than a native C long. The reason is that on 64-bit Windows (LLP64) a long is only 32 bits wide, so PHP 5 on Windows always used 32-bit integers. PHP 7 lets you use 64-bit integers on 64-bit operating systems, including Windows.
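
The difference between a fixed-width integer and the native long can be seen directly. In this sketch, demo_long is a hypothetical alias in the spirit of zend_long on a 64-bit build; it is not the actual PHP typedef.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Hypothetical alias: always 64 bits wide, whatever the platform's long is. */
typedef int64_t demo_long;

static_assert(sizeof(demo_long) * CHAR_BIT == 64, "demo_long must be 64 bits");

/* Width in bits of the fixed-width type. */
int demo_long_bits(void) {
    return (int)(sizeof(demo_long) * CHAR_BIT);
}

/* Width in bits of the native long: 64 on LP64 systems, 32 on LLP64 Windows. */
int native_long_bits(void) {
    return (int)(sizeof(long) * CHAR_BIT);
}
```

On Linux and macOS the two functions usually agree; on 64-bit Windows only the fixed-width alias provides 64 bits.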

The details of the individual zend_refcounted types are discussed in the next part. For now, let's look at how PHP references are implemented.

References

PHP 7 handles PHP references (the & operator) in a completely different way from PHP 5 (this change was the root cause of a large number of bugs during PHP 7 development). Let's start with how PHP references were implemented in PHP 5.

Normally, the copy-on-write principle means that a zval has to be separated before it is modified, so that the change affects only the one PHP variable being written and not every place that shares the zval. This is what by-value passing semantics mean.

For PHP references this rule does not apply. If a value is a PHP reference, a change is supposed to be visible through every variable that refers to it. The is_ref flag in PHP 5 zvals records whether a value is a PHP reference, and therefore whether it can be modified without separating it first. For example:

<?php
$a = [];   // $a     -> zval_1(type=IS_ARRAY, refcount=1, is_ref=0) -> HashTable_1(value=[])
$b =& $a;  // $a, $b -> zval_1(type=IS_ARRAY, refcount=2, is_ref=1) -> HashTable_1(value=[])

$b[] = 1;  // $a = $b = zval_1(type=IS_ARRAY, refcount=2, is_ref=1) -> HashTable_1(value=[1])
           // Because is_ref is 1, PHP does not separate the zval.


A significant problem with this design is that a value cannot be shared between a variable that is a PHP reference and one that is not. For example:

<?php
$a = [];   // $a         -> zval_1(type=IS_ARRAY, refcount=1, is_ref=0) -> HashTable_1(value=[])
$b = $a;   // $a, $b     -> zval_1(type=IS_ARRAY, refcount=2, is_ref=0) -> HashTable_1(value=[])
$c = $b;   // $a, $b, $c -> zval_1(type=IS_ARRAY, refcount=3, is_ref=0) -> HashTable_1(value=[])

$d =& $c;  // $a, $b -> zval_1(type=IS_ARRAY, refcount=2, is_ref=0) -> HashTable_1(value=[])
           // $c, $d -> zval_2(type=IS_ARRAY, refcount=2, is_ref=1) -> HashTable_2(value=[])
           // $d is a reference to $c, but not to $a and $b, so the zval has to be copied here.
           // We now have two zvals: one with is_ref=0 and one with is_ref=1.

$d[] = 1;  // $a, $b -> zval_1(type=IS_ARRAY, refcount=2, is_ref=0) -> HashTable_1(value=[])
           // $c, $d -> zval_2(type=IS_ARRAY, refcount=2, is_ref=1) -> HashTable_2(value=[1])
           // Because the two zvals are separate, $d[] = 1 does not change $a and $b.

This behavior also makes PHP references slower than normal values. For example:

<?php
$array = range(0, 1000000);
$ref =& $array;
var_dump(count($array)); // <-- separation happens here

Because count() takes its argument by value but $array is a PHP reference, the array is copied in full before it is passed to count(). If $array were not a reference, this would not happen.

Now let's look at how PHP references are implemented in PHP 7. Because zvals are no longer allocated individually, the PHP 5 approach cannot be used. Instead, an IS_REFERENCE type has been added, which uses the zend_reference structure to store the referenced value:

struct _zend_reference {
    zend_refcounted gc;
    zval val;
};

In essence, a zend_reference is just a zval with a reference count. Every variable in a reference set stores a zval of type IS_REFERENCE that points to the same zend_reference. val behaves like any other zval; in particular, the complex value it points to can be shared. For example, an array can be shared between a variable that is a reference and one that is a plain value.

Let's go through the examples again, this time with PHP 7 semantics. For brevity the individual zvals are no longer written out; only the structures they point to are shown:

<?php
$a = [];   // $a     -> zend_array_1(refcount=1, value=[])
$b =& $a;  // $a, $b -> zend_reference_1(refcount=2) -> zend_array_1(refcount=1, value=[])

$b[] = 1;  // $a, $b -> zend_reference_1(refcount=2) -> zend_array_1(refcount=1, value=[1])

In the example above, the by-reference assignment creates a zend_reference. Note that its reference count is 2 (because two variables use this PHP reference), while the reference count of the value itself is 1 (because only the zend_reference points to it). Now consider the case where references and non-references are mixed:

<?php
$a = [];   // $a         -> zend_array_1(refcount=1, value=[])
$b = $a;   // $a, $b     -> zend_array_1(refcount=2, value=[])
$c = $b;   // $a, $b, $c -> zend_array_1(refcount=3, value=[])

$d =& $c;  // $a, $b                                 -> zend_array_1(refcount=3, value=[])
           // $c, $d -> zend_reference_1(refcount=2) ---^
           // Note that all variables share the same zend_array, even though some are
           // PHP references and some are not.

$d[] = 1;  // $a, $b                                 -> zend_array_1(refcount=2, value=[])
           // $c, $d -> zend_reference_1(refcount=2) -> zend_array_2(refcount=1, value=[1])
           // Only now, when the assignment happens, is the zend_array duplicated.

The important difference from PHP 5 is that all variables can share the same array, even though some are PHP references and some are not. The array is separated only when it is modified. This also means that passing a large referenced array to count() is safe and does not copy it. References are still slower than normal values, however, because the zend_reference structure has to be allocated (and adds a level of indirection), and because the engine generally does not handle them on its fast paths.
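
The mixed case can be sketched as follows. The types are hypothetical simplifications: demo_reference stores only a pointer to the array instead of an embedded zval, and an is_reference field stands in for the IS_REFERENCE type tag.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct demo_array {
    unsigned refcount;
    size_t count;
} demo_array;

/* Hypothetical zend_reference: a refcount wrapped around a value slot. */
typedef struct demo_reference {
    unsigned refcount;
    demo_array *val;
} demo_reference;

typedef struct demo_zval {
    int is_reference;   /* stands in for type == IS_REFERENCE */
    union {
        demo_array *arr;
        demo_reference *ref;
    } value;
} demo_zval;

/* "$d =& $c": wrap the current value of $c in a reference shared by both. */
void demo_make_ref(demo_zval *d, demo_zval *c) {
    assert(d != NULL && c != NULL && !c->is_reference);
    demo_reference *r = malloc(sizeof *r);
    if (r == NULL) abort();
    r->refcount = 2;            /* used by $c and $d */
    r->val = c->value.arr;      /* array refcount unchanged: the reference
                                   takes over the use previously held by $c */
    c->is_reference = 1;
    d->is_reference = 1;
    c->value.ref = r;
    d->value.ref = r;
}

/* Follow the reference, if there is one, to reach the array. */
demo_array *demo_deref(const demo_zval *z) {
    return z->is_reference ? z->value.ref->val : z->value.arr;
}
```

After demo_make_ref(), a plain variable and the reference set still reach the same array, so a by-value consumer such as count() needs no copy.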

Conclusion

To sum up, the most important change in PHP 7 is that zvals are no longer allocated individually on the heap and no longer store a reference count themselves. Instead, the complex values they point to (such as strings, arrays, and objects) store the reference count. This results in fewer allocations, less pointer indirection, and lower memory usage.

The next article, Variable implementation in PHP 7 (2), continues with the details of individual types.
