Implementation of variables in PHP7 (Part 1)

Source: Internet
Author: User

Because there is a lot of detail to cover, this article is split into two parts. This first part describes how the zval (Zend value) is implemented in PHP5 and PHP7, and how references are implemented. The second part analyzes the details of the individual types (strings, objects).

The zval in PHP5

The zval structure in PHP5 is defined as follows:

typedef struct _zval_struct {
    zvalue_value value;
    zend_uint refcount__gc;
    zend_uchar type;
    zend_uchar is_ref__gc;
} zval;

As shown above, a zval consists of a value, a type, and two fields with a __gc suffix. The value is a union that stores the different types of values:

typedef union _zvalue_value {
    long lval;                 // for bool, integer and resource types
    double dval;               // for floating-point numbers
    struct {                   // for strings
        char *val;
        int len;
    } str;
    HashTable *ht;             // for arrays
    zend_object_value obj;     // for objects
    zend_ast *ast;             // for constant expressions (PHP 5.6 only)
} zvalue_value;

A C union has only one active member at a time, and its size matches that of its largest member (taking memory alignment into account). All members are stored at the same location in memory, and the stored bytes are interpreted differently depending on which member is accessed. When you read lval, the value is interpreted as a signed integer; when you read dval, it is interpreted as a double-precision floating-point number.
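
The behavior of a union described above can be illustrated with a minimal C sketch. The `demo_value` type and `demo_overwrite` function are hypothetical names made up for this illustration; they are not PHP's actual definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for zvalue_value; names are for illustration only. */
typedef union demo_value {
    long lval;          /* integer payload */
    double dval;        /* floating-point payload */
    struct {
        char *val;
        int len;
    } str;              /* string payload: the largest member */
} demo_value;

/* Store an integer, then a double, in the same storage. */
double demo_overwrite(void) {
    demo_value v;
    v.lval = 42;        /* lval is the active member */
    assert(v.lval == 42);
    v.dval = 1.5;       /* dval now occupies the same bytes; lval is no longer valid */
    return v.dval;
}
```

All three members start at offset 0, and `sizeof(demo_value)` equals the size of the largest member, `str`. This is why a separate type tag is needed to know which member is currently meaningful.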

The type field records which member of the union is currently in use, as an integer tag:

#define IS_NULL     0      /* doesn't use value */
#define IS_LONG     1      /* uses lval */
#define IS_DOUBLE   2      /* uses dval */
#define IS_BOOL     3      /* uses lval with values 0 and 1 */
#define IS_ARRAY    4      /* uses ht */
#define IS_OBJECT   5      /* uses obj */
#define IS_STRING   6      /* uses str */
#define IS_RESOURCE 7      /* uses lval, which is the resource ID */

/* Special types used for late-binding of constants */
#define IS_CONSTANT     8
#define IS_CONSTANT_AST 9

Reference counting in PHP5

In PHP5, each zval is allocated separately on the heap, and PHP needs to know which zvals are still in use and which can be released. This is what reference counting is for: the refcount__gc field of the zval stores how many times the zval itself is referenced. For example, after $a = $b = 42 the value 42 is referenced by two variables, so its reference count is 2. When the reference count drops to 0, the value is no longer in use and its memory can be freed.

Note that the reference count discussed here has nothing to do with references in PHP code (created with &); it is simply the number of places that use the value. Where both concepts appear together below, the terms "PHP reference" and "reference" are used to tell them apart. For now, PHP references are ignored.

A concept closely related to reference counting is "copy-on-write": a zval is shared between multiple users only as long as it is not changed. As soon as one of the users wants to change the value, the zval has to be copied ("separated"), and the change is applied to the copy.

Here is an example of copy-on-write and zval destruction:

<?php
$a = 42;   // $a         -> zval_1(type=IS_LONG, value=42, refcount=1)
$b = $a;   // $a, $b     -> zval_1(type=IS_LONG, value=42, refcount=2)
$c = $b;   // $a, $b, $c -> zval_1(type=IS_LONG, value=42, refcount=3)

// The following line causes a zval separation
$a += 1;   // $b, $c -> zval_1(type=IS_LONG, value=42, refcount=2)
           // $a     -> zval_2(type=IS_LONG, value=43, refcount=1)

unset($b); // $c -> zval_1(type=IS_LONG, value=42, refcount=1)
           // $a -> zval_2(type=IS_LONG, value=43, refcount=1)

unset($c); // zval_1 is destroyed, because refcount=0
           // $a -> zval_2(type=IS_LONG, value=43, refcount=1)
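
The same sequence can be sketched in C to make the bookkeeping explicit. This is a heavily simplified illustration, not the engine's code: `demo_zval` and the `demo_*` helpers are hypothetical names, and only an integer payload is modeled.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, heavily simplified PHP5-style value: heap-allocated and refcounted. */
typedef struct demo_zval {
    long value;
    unsigned refcount;
} demo_zval;

demo_zval *demo_new(long value) {
    demo_zval *z = malloc(sizeof *z);
    assert(z != NULL);
    z->value = value;
    z->refcount = 1;
    return z;
}

/* "$b = $a": share the zval and bump its count. */
demo_zval *demo_share(demo_zval *z) {
    z->refcount++;
    return z;
}

/* "unset($x)": drop one user; free at zero. Returns 1 if the zval was freed. */
int demo_release(demo_zval *z) {
    if (--z->refcount == 0) {
        free(z);
        return 1;
    }
    return 0;
}

/* Separate before a write: if the zval is shared, detach onto a private copy. */
demo_zval *demo_separate(demo_zval *z) {
    if (z->refcount == 1) {
        return z;               /* sole user: safe to modify in place */
    }
    z->refcount--;
    return demo_new(z->value);
}
```

A caller that wants to modify `$a` would first run `a = demo_separate(a);` and only then change `a->value`, which leaves the zval still shared by `$b` and `$c` untouched.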

Reference counting has one fatal flaw: it cannot detect and free circular references, so the memory they use leaks. To solve this problem, PHP additionally uses a cycle collector. Whenever the reference count of a zval is decremented, the zval might be part of a cycle, so it is written to the root buffer. When the root buffer is full, potential cycles are marked and collected.

To support the cycle collector, the zval structure that is actually used looks like this:

typedef struct _zval_gc_info {
    zval z;
    union {
        gc_root_buffer       *buffered;
        struct _zval_gc_info *next;
    } u;
} zval_gc_info;

A normal zval structure is embedded in the zval_gc_info structure, and two pointers are added. Both belong to the same union u, so only one of them is in use at any time. The buffered pointer stores where the zval is referenced in the root buffer, so that the entry can be removed from the buffer if the zval is destroyed before the cycle collector runs. The next pointer is used when the collector destroys values; it is not covered in detail here.

Motivation for the change

The memory figures below all refer to 64-bit systems. First, because str and obj have the same size, the zvalue_value union occupies 16 bytes. The whole zval structure occupies 24 bytes (because of memory alignment), and zval_gc_info occupies 32 bytes. On top of that, allocating the zval on the heap adds another 16 bytes of allocation overhead, so each zval costs 48 bytes, even though that zval may be used in several places. (To follow this calculation, keep in mind that every pointer occupies 8 bytes on a 64-bit system.)
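
The figures quoted above can be checked with a small C sketch. The `mock_*` types below are stand-ins written for this illustration (the real definitions live in the PHP 5 headers), the byte counts in the comments assume a typical 64-bit LP64 platform such as Linux on x86-64, and the 16-byte allocation overhead is taken from the text as given rather than measured.

```c
#include <assert.h>
#include <stddef.h>

/* Mock types mirroring the layouts shown above. Sizes in the comments assume
   a typical 64-bit (LP64) platform. */
typedef struct mock_object_value {
    unsigned int handle;
    const void *handlers;
} mock_object_value;                    /* 4 + 4 padding + 8 = 16 bytes */

typedef union mock_zvalue_value {
    long lval;
    double dval;
    struct { char *val; int len; } str; /* 8 + 4 + 4 padding = 16 bytes */
    void *ht;
    mock_object_value obj;
} mock_zvalue_value;                    /* 16 bytes */

typedef struct mock_zval {
    mock_zvalue_value value;            /* 16 bytes */
    unsigned int refcount__gc;          /* 4 bytes */
    unsigned char type;                 /* 1 byte */
    unsigned char is_ref__gc;           /* 1 byte, then 2 bytes of padding */
} mock_zval;                            /* 24 bytes */

typedef struct mock_zval_gc_info {
    mock_zval z;                        /* 24 bytes */
    union {
        void *buffered;
        struct mock_zval_gc_info *next;
    } u;                                /* 8 bytes */
} mock_zval_gc_info;                    /* 32 bytes */

/* Per-allocation overhead quoted in the text; an assumption, not measured. */
enum { MOCK_ALLOC_OVERHEAD = 16 };

size_t mock_total_bytes_per_zval(void) {
    assert(sizeof(mock_zval_gc_info) >= sizeof(mock_zval));
    return sizeof(mock_zval_gc_info) + MOCK_ALLOC_OVERHEAD;
}
```

On such a platform `mock_total_bytes_per_zval()` yields 48, of which only 8 bytes carry the payload when the value is an integer.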

From whatever angle you look at it, this zval design is inefficient. For example, a zval that stores an integer only needs 8 bytes for the value itself; even allowing for some additional information and memory alignment, an extra 8 bytes should be enough.

So storing an integer really needs 16 bytes, but on top of that another 16 bytes are spent on reference counting and cycle collection, and another 16 bytes on allocation overhead. Allocating and freeing such zvals is also an expensive operation, so it is worth optimizing.

Looking at it this way: does an integer really need to store a reference count and cycle-collection information, and does it need a separate heap allocation? The answer is of course no.

Here is a summary of the main problems with the zval implementation in PHP5:

Zvals are always allocated separately on the heap;
Zvals always store a reference count and cycle-collection information, even for values such as integers that do not need them;
Objects and resources are reference counted twice when used directly (the reason is given in the next part);
Some accesses involve more indirection than necessary. For example, accessing an object stored in a variable currently goes through four pointers (the pointer chain has length four). This is also discussed in the next part;
Counting references directly on the zval also means that values can only be shared between zvals. It is not possible to share a string between a zval and a hashtable key (unless the hashtable key is also a zval).

The zval in PHP7

In PHP7 the zval has a new implementation. The most fundamental change is that zvals are no longer allocated separately on the heap and no longer store a reference count themselves. Instead, the reference count of a complex data type, such as a string, array or object, is stored by the value itself. This has the following benefits:

Simple data types need neither a separate allocation nor a reference count;
There is no more double reference counting. For an object, only the count stored in the object itself is used;
Because the count is stored in the value itself, the value can be shared with structures other than zvals, for example between a zval and a hashtable key;
Fewer pointers need to be followed for indirect access.

Here is the current definition of the zval structure (now in the zend_types.h file):

struct _zval_struct {
    zend_value value;                       /* value */
    union {
        struct {
            ZEND_ENDIAN_LOHI_4(
                zend_uchar type,            /* active type */
                zend_uchar type_flags,
                zend_uchar const_flags,
                zend_uchar reserved)        /* call info for EX(This) */
        } v;
        uint32_t type_info;
    } u1;
    union {
        uint32_t var_flags;
        uint32_t next;                      /* hash collision chain */
        uint32_t cache_slot;                /* literal cache slot */
        uint32_t lineno;                    /* line number (for ast nodes) */
        uint32_t num_args;                  /* arguments number for EX(This) */
        uint32_t fe_pos;                    /* foreach position */
        uint32_t fe_iter_idx;               /* foreach iterator index */
    } u2;
};


The first member of the structure has not changed much and is still a value union. The second member is a union of an integer holding the type information and a struct of four single-byte fields (you can ignore the ZEND_ENDIAN_LOHI_4 macro, which only keeps the layout consistent across platforms with different endianness). The important parts of this substructure are type (similar to before) and type_flags, which is explained below.

There is a small problem here: value is 8 bytes large, but because of struct alignment, adding even a single byte to it grows the zval to 16 bytes. We clearly do not need 8 bytes just for a type field, so a union called u2 was added after u1. It is unused by default, and code that needs it can use it to store 4 bytes of data. The members of this union serve different usage scenarios.
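
The alignment argument can be made concrete with a C sketch. The `mock_*` types are simplified stand-ins written for this illustration: the endianness macro is left out, only two of the u2 members are shown, and the sizes in the comments assume a typical 64-bit platform where 8-byte values are 8-byte aligned.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for zend_value. */
typedef union mock_zend_value {
    int64_t lval;
    double dval;
    void *ptr;
} mock_zend_value;                      /* 8 bytes */

/* Simplified stand-in for the PHP 7 zval. */
typedef struct mock_zval7 {
    mock_zend_value value;              /* 8 bytes */
    union {
        struct {
            uint8_t type;
            uint8_t type_flags;
            uint8_t const_flags;
            uint8_t reserved;
        } v;
        uint32_t type_info;
    } u1;                               /* 4 bytes */
    union {
        uint32_t next;
        uint32_t cache_slot;
    } u2;                               /* 4 bytes that would otherwise be padding */
} mock_zval7;                           /* 16 bytes */

/* The same struct without u2 is no smaller: alignment pads it back to 16 bytes. */
typedef struct mock_zval7_no_u2 {
    mock_zend_value value;
    uint32_t type_info;
} mock_zval7_no_u2;

/* Number of bytes that u2 adds to the zval (0 on typical 64-bit targets). */
size_t mock_u2_cost(void) {
    assert(sizeof(mock_zval7) >= sizeof(mock_zval7_no_u2));
    return sizeof(mock_zval7) - sizeof(mock_zval7_no_u2);
}
```

In other words, u2 reuses space that padding would have consumed anyway, which is why it can be offered to other code as free scratch storage.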

The structure of value in PHP7 is defined as follows:

typedef union _zend_value {
    zend_long         lval;             /* long value */
    double            dval;             /* double value */
    zend_refcounted  *counted;
    zend_string      *str;
    zend_array       *arr;
    zend_object      *obj;
    zend_resource    *res;
    zend_reference   *ref;
    zend_ast_ref     *ast;
    zval             *zv;
    void             *ptr;
    zend_class_entry *ce;
    zend_function    *func;
    struct {
        uint32_t w1;
        uint32_t w2;
    } ww;
} zend_value;

The first thing to note is that the value union now needs 8 bytes of memory instead of 16. It stores integers (lval) and floating-point numbers (dval) directly, and a pointer in all other cases (as mentioned above, a pointer occupies 8 bytes; the struct at the bottom consists of two 4-byte unsigned integers). All of the pointer types above (apart from a few special ones) share an identical header (zend_refcounted) that stores the reference count:

typedef struct _zend_refcounted_h {
    uint32_t refcount;                  /* reference counter 32-bit */
    union {
        struct {
            ZEND_ENDIAN_LOHI_3(
                zend_uchar type,
                zend_uchar flags,       /* used for strings & objects */
                uint16_t   gc_info)     /* keeps GC root number (or 0) and color */
        } v;
        uint32_t type_info;
    } u;
} zend_refcounted_h;

This structure, of course, contains a field that stores the reference count. In addition it has type, flags and gc_info. The type field duplicates the type stored in the zval, which lets the GC work on a refcounted structure alone, without needing the zval. The flags field is used differently by different data types, which is covered in the next part.
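
The practical benefit of a shared header is that one piece of code can adjust the count of any refcounted value without knowing its concrete type. A minimal sketch, using hypothetical `demo_*` types rather than the engine's own structures:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical miniature of the shared header. */
typedef struct demo_refcounted {
    uint32_t refcount;
    uint32_t type_info;         /* type, flags and gc_info packed together */
} demo_refcounted;

/* Two unrelated payload types that both begin with the same header. */
typedef struct demo_string {
    demo_refcounted gc;
    size_t len;
} demo_string;

typedef struct demo_array {
    demo_refcounted gc;
    uint32_t num_elements;
} demo_array;

/* Works on any structure whose first member is a demo_refcounted, because a
   pointer to a struct may be converted to a pointer to its first member. */
uint32_t demo_addref(void *counted) {
    demo_refcounted *gc = counted;
    assert(gc != NULL);
    return ++gc->refcount;
}

uint32_t demo_delref(void *counted) {
    demo_refcounted *gc = counted;
    assert(gc != NULL && gc->refcount > 0);
    return --gc->refcount;
}
```

Both `demo_addref(&some_string)` and `demo_addref(&some_array)` are valid calls, which mirrors how a zval can hold a generic pointer to "something counted" alongside the typed pointers.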

gc_info serves the same purpose as buffered did in PHP5, but instead of a pointer into the root buffer it stores an index. Because the root buffer has a fixed size (10,000 elements), a 16-bit (2-byte) number is enough instead of a 64-bit (8-byte) pointer. gc_info also contains the "color" bits, which are used to mark nodes during cycle collection.

Zval Memory Management

As mentioned above, zvals are no longer allocated separately on the heap. But they obviously have to be stored somewhere, so where? In fact, most of the time they still live on the heap (so the key point of the earlier statement is not "heap" but "separately"); they are simply embedded in other data structures. For example, a hashtable bucket now contains a zval field directly instead of a pointer to one. Likewise, the compiled variables of a function and the properties of an object are stored as zval arrays in one contiguous chunk of memory, rather than as zval pointers scattered all over the place. What used to be zval * is now zval.

Previously, when a zval was used in a new place, the zval * was copied and the reference count was incremented. Now the contents of the zval are copied directly (ignoring u2), and in some cases the reference count of the structure it points to is incremented (if that structure is refcounted).

So how does PHP know whether a value is refcounted? This cannot be determined from the type alone, because some types (such as strings and arrays) are not always refcounted. Instead, the type_info field records whether the value is refcounted. The following flags are defined for it:

#define IS_TYPE_CONSTANT     (1<<0)   /* special */
#define IS_TYPE_IMMUTABLE    (1<<1)   /* special */
#define IS_TYPE_REFCOUNTED   (1<<2)
#define IS_TYPE_COLLECTABLE  (1<<3)
#define IS_TYPE_COPYABLE     (1<<4)
#define IS_TYPE_SYMBOLTABLE  (1<<5)   /* special */

Note: in the official 7.0.0 release, the comment above these macros says they apply to zval.u1.v.type_flags. This does not contradict the text: u1 is a union, so the single-byte type_flags field is one of the bytes that make up type_info.

The three main properties that type_info can carry are "refcounted", "collectable" and "copyable". Refcounting has already been discussed above. "Collectable" marks whether the zval can take part in a cycle. For example, strings are usually refcounted, but you cannot create a circular reference with a string.

"Copyable" indicates whether the value has to be copied when a "duplication" is performed. A duplication is a hard copy: duplicating a zval that points to an array does not merely increment the array's reference count, it creates a new and independent copy of the array. For some types (such as objects and resources), however, even a duplication should only increment the reference count; such types are not copyable. This matches the semantics that objects and resources already have (in PHP7 as well as PHP5).

The following table shows which flags the different types use (marked with x). "Simple types" are types such as integers and booleans that do not use a pointer to a separate structure. The table also has a column for the "immutable" flag, which marks immutable arrays and is described in the next part.

Interned strings have not been mentioned before: they are strings such as function names and variable names that are neither refcounted nor duplicated.

                | refcounted | collectable | copyable | immutable
----------------+------------+-------------+----------+----------
simple types    |            |             |          |
string          |      x     |             |     x    |
interned string |            |             |          |
array           |      x     |      x      |     x    |
immutable array |            |             |          |     x
object          |      x     |      x      |          |
resource        |      x     |             |          |
reference       |      x     |             |          |
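
The table can also be read as a set of bit combinations. The sketch below encodes each row with the flag values listed earlier; the `demo_kind` enum and the two helper functions are hypothetical names for this illustration and are not part of the engine.

```c
#include <assert.h>
#include <stdint.h>

/* Flag bits as listed above. */
#define IS_TYPE_IMMUTABLE   (1 << 1)
#define IS_TYPE_REFCOUNTED  (1 << 2)
#define IS_TYPE_COLLECTABLE (1 << 3)
#define IS_TYPE_COPYABLE    (1 << 4)

/* Hypothetical row labels mirroring the table; not an engine API. */
enum demo_kind {
    DEMO_SIMPLE, DEMO_STRING, DEMO_INTERNED_STRING, DEMO_ARRAY,
    DEMO_IMMUTABLE_ARRAY, DEMO_OBJECT, DEMO_RESOURCE, DEMO_REFERENCE
};

uint8_t demo_type_flags(enum demo_kind kind) {
    assert(kind >= DEMO_SIMPLE && kind <= DEMO_REFERENCE);
    switch (kind) {
    case DEMO_STRING:
        return IS_TYPE_REFCOUNTED | IS_TYPE_COPYABLE;
    case DEMO_ARRAY:
        return IS_TYPE_REFCOUNTED | IS_TYPE_COLLECTABLE | IS_TYPE_COPYABLE;
    case DEMO_IMMUTABLE_ARRAY:
        return IS_TYPE_IMMUTABLE;
    case DEMO_OBJECT:
        return IS_TYPE_REFCOUNTED | IS_TYPE_COLLECTABLE;
    case DEMO_RESOURCE:
    case DEMO_REFERENCE:
        return IS_TYPE_REFCOUNTED;
    default:
        return 0;               /* simple types and interned strings */
    }
}

/* Copying a value only needs to touch a refcount when this bit is set. */
int demo_is_refcounted(enum demo_kind kind) {
    return (demo_type_flags(kind) & IS_TYPE_REFCOUNTED) != 0;
}
```

With this encoding, deciding whether a copy must increment a count is a single bit test, independent of the value's type.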

A few examples help to show how zval memory management works in practice.

Here is the behavior for integers, based on the PHP5 example above:

<?php
$a = 42;   // $a = zval_1(type=IS_LONG, value=42)
$b = $a;   // $a = zval_1(type=IS_LONG, value=42)
           // $b = zval_2(type=IS_LONG, value=42)
$a += 1;   // $a = zval_1(type=IS_LONG, value=43)
           // $b = zval_2(type=IS_LONG, value=42)
unset($a); // $a = zval_1(type=IS_UNDEF)
           // $b = zval_2(type=IS_LONG, value=42)

This case is quite simple. Integers are no longer shared, so each variable directly gets its own separate zval. Because the zval is now embedded, no separate allocation is needed, which is why the comments use = instead of the pointer symbol ->. On unset, the variable's zval is marked as IS_UNDEF. Now let's look at a more complex case:

<?php
$a = [];   // $a = zval_1(type=IS_ARRAY) -> zend_array_1(refcount=1, value=[])
$b = $a;   // $a = zval_1(type=IS_ARRAY) -> zend_array_1(refcount=2, value=[])
           // $b = zval_2(type=IS_ARRAY) ---^
// zval separation happens here
$a[] = 1;  // $a = zval_1(type=IS_ARRAY) -> zend_array_2(refcount=1, value=[1])
           // $b = zval_2(type=IS_ARRAY) -> zend_array_1(refcount=1, value=[])
unset($a); // $a = zval_1(type=IS_UNDEF), zend_array_2 is destroyed
           // $b = zval_2(type=IS_ARRAY) -> zend_array_1(refcount=1, value=[])

In this case each variable still has its own zval, but both point to the same (refcounted) zend_array structure. The array is duplicated only when one of the variables is modified. This is similar to the situation in PHP5.
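
The array case can be sketched in C as follows. This is a toy model under stated assumptions: `demo_array` has a fixed capacity of 8 elements, `demo_zval7` carries only the pointer (no type tag), and all `demo_*` names are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical miniature of the PHP 7 layout: a small by-value "zval" that
   points at a separately allocated, refcounted array. */
typedef struct demo_array {
    unsigned refcount;
    size_t len;
    long items[8];              /* fixed capacity keeps the sketch short */
} demo_array;

typedef struct demo_zval7 {
    demo_array *arr;            /* NULL stands in for IS_UNDEF here */
} demo_zval7;

demo_zval7 demo_array_new(void) {
    demo_zval7 z;
    z.arr = calloc(1, sizeof *z.arr);
    assert(z.arr != NULL);
    z.arr->refcount = 1;
    return z;
}

/* "$b = $a": copy the zval itself and bump the count of what it points to. */
demo_zval7 demo_copy(demo_zval7 src) {
    src.arr->refcount++;
    return src;
}

/* "$a[] = v": duplicate the array first if anyone else still uses it. */
void demo_append(demo_zval7 *z, long v) {
    if (z->arr->refcount > 1) {
        demo_array *dup = malloc(sizeof *dup);
        assert(dup != NULL);
        memcpy(dup, z->arr, sizeof *dup);
        dup->refcount = 1;
        z->arr->refcount--;
        z->arr = dup;
    }
    assert(z->arr->len < 8);
    z->arr->items[z->arr->len++] = v;
}

/* "unset($a)": drop this user and mark the slot undefined. */
void demo_unset(demo_zval7 *z) {
    if (--z->arr->refcount == 0) {
        free(z->arr);
    }
    z->arr = NULL;
}
```

Note that `demo_copy` returns the zval by value: there is no allocation for the zval itself, only a counter increment on the shared array.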

Types

Let's take a look at the types that PHP7 supports (the type tags used by the zval):

/* regular data types */
#define IS_UNDEF         0
#define IS_NULL          1
#define IS_FALSE         2
#define IS_TRUE          3
#define IS_LONG          4
#define IS_DOUBLE        5
#define IS_STRING        6
#define IS_ARRAY         7
#define IS_OBJECT        8
#define IS_RESOURCE      9
#define IS_REFERENCE     10

/* constant expressions */
#define IS_CONSTANT      11
#define IS_CONSTANT_AST  12

/* internal types */
#define IS_INDIRECT      15
#define IS_PTR           17

This list is similar to the one used by PHP5, but adds a few items:

IS_UNDEF replaces what used to be a NULL zval pointer (it does not conflict with IS_NULL). For example, it is what a variable is set to by unset in the examples above;
IS_BOOL has been split into IS_FALSE and IS_TRUE. The boolean value is now encoded directly in the type, which makes type checks cheaper. This change is transparent to the user: in PHP scripts there is still only a single "boolean" type;
PHP references are no longer marked with an is_ref flag, but use the IS_REFERENCE type. This is described later in this article;
IS_INDIRECT and IS_PTR are special internal types.

The actual list contains two more fake types, which are omitted here.

The IS_LONG type now holds a zend_long value rather than a native C long. The reason is that long is only 32 bits wide on 64-bit Windows (LLP64), so PHP5 could only use 32-bit integers on Windows. PHP7 lets you use 64-bit integers on any 64-bit operating system, including Windows.
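
The difference between a native long and a fixed-width integer can be observed directly in C. In the sketch below, `demo_zend_long` is an assumed stand-in defined as `int64_t`; the engine's real typedef is selected per platform and is not reproduced here.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Assumed stand-in for a 64-bit build, in the spirit of zend_long. */
typedef int64_t demo_zend_long;

static_assert(sizeof(demo_zend_long) * CHAR_BIT == 64,
              "demo_zend_long must be 64 bits wide");

/* Width in bits of the stand-in type. */
int demo_zend_long_bits(void) {
    return (int)(sizeof(demo_zend_long) * CHAR_BIT);
}

/* Width in bits of native long: 64 on LP64 systems such as 64-bit Linux,
   32 on LLP64 systems such as 64-bit Windows. */
int demo_native_long_bits(void) {
    return (int)(sizeof(long) * CHAR_BIT);
}
```

Code that stores integers in `long` therefore gets a different range depending on the platform, while code that uses the fixed-width type does not.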

The details of zend_refcounted are covered in the next part. For now, let's look at how PHP references are implemented.

References

PHP7 handles PHP & references in a completely different way from PHP5 (this change was also the root cause of a large number of bugs during PHP7 development). Let's start with how PHP references are implemented in PHP5.

Normally, the copy-on-write principle means that a zval has to be separated before it is modified, so that a change only ever affects the value of the one PHP variable being modified. This is what by-value semantics means.

This rule does not apply to PHP references. If a PHP variable is a PHP reference, you want several PHP variables to point to the same value. The is_ref flag in PHP5 indicates whether a PHP variable is a PHP reference, and therefore whether it must not be separated on modification. For example:

<?php
$a = [];  // $a     -> zval_1(type=IS_ARRAY, refcount=1, is_ref=0) -> HashTable_1(value=[])
$b =& $a; // $a, $b -> zval_1(type=IS_ARRAY, refcount=2, is_ref=1) -> HashTable_1(value=[])

$b[] = 1; // $a, $b -> zval_1(type=IS_ARRAY, refcount=2, is_ref=1) -> HashTable_1(value=[1])
          // Because is_ref is 1, PHP does not separate the zval


A big problem with this design is that a value cannot be shared between a PHP variable that is a reference and one that is not. Consider the following case:

<?php
$a = [];  // $a         -> zval_1(type=IS_ARRAY, refcount=1, is_ref=0) -> HashTable_1(value=[])
$b = $a;  // $a, $b     -> zval_1(type=IS_ARRAY, refcount=2, is_ref=0) -> HashTable_1(value=[])
$c = $b;  // $a, $b, $c -> zval_1(type=IS_ARRAY, refcount=3, is_ref=0) -> HashTable_1(value=[])
$d =& $c; // $a, $b -> zval_1(type=IS_ARRAY, refcount=2, is_ref=0) -> HashTable_1(value=[])
          // $c, $d -> zval_2(type=IS_ARRAY, refcount=2, is_ref=1) -> HashTable_2(value=[])
          // $d is a reference to $c, but not to $a and $b, so the zval has to be copied here.
          // We now have two zvals, one with is_ref=0 and one with is_ref=1.
$d[] = 1; // $a, $b -> zval_1(type=IS_ARRAY, refcount=2, is_ref=0) -> HashTable_1(value=[])
          // $c, $d -> zval_2(type=IS_ARRAY, refcount=2, is_ref=1) -> HashTable_2(value=[1])
          // Because there are two separate zvals, $d[] = 1 does not modify $a and $b.

This behavior is also what makes using references in PHP slower than using normal values. Consider the following example:

<?php
$array = range(0, 1000000);
$ref =& $array;
var_dump(count($array)); // <-- separation occurs here

Because count() accepts its argument by value while $array is a PHP reference, a full copy of the array is made before it is passed to count(). If $array were not a reference, the value would be shared and no copy would be made.

Now let's look at how PHP references are implemented in PHP7. Because zvals are no longer allocated separately, the PHP5 approach cannot be used any more. Instead, an IS_REFERENCE type was added, and a zend_reference structure is used to hold the referenced value:

struct _zend_reference {
    zend_refcounted gc;
    zval            val;
};

In essence, a zend_reference is just a zval with a reference count added. Every variable in a reference set stores a zval of type IS_REFERENCE that points to the same zend_reference. The val inside behaves like any other zval; in particular, it can share a pointer to the complex value it holds. For example, an array can be shared between reference variables and value variables.

Let's look at the examples again, this time with PHP7 semantics. For brevity the individual zvals are no longer written out; only the structures they point to are shown:

<?php
$a = [];  // $a                                     -> zend_array_1(refcount=1, value=[])
$b =& $a; // $a, $b -> zend_reference_1(refcount=2) -> zend_array_1(refcount=1, value=[])
$b[] = 1; // $a, $b -> zend_reference_1(refcount=2) -> zend_array_1(refcount=1, value=[1])

In the example above, assigning by reference creates a zend_reference. Note that its reference count is 2 (because two variables use this PHP reference), but the reference count of the value itself is 1 (because only the zend_reference points to it). Now let's look at a mix of references and non-references:

<?php
$a = [];  // $a         -> zend_array_1(refcount=1, value=[])
$b = $a;  // $a, $b     -> zend_array_1(refcount=2, value=[])
$c = $b;  // $a, $b, $c -> zend_array_1(refcount=3, value=[])
$d =& $c; // $a, $b                                 -> zend_array_1(refcount=3, value=[])
          // $c, $d -> zend_reference_1(refcount=2) ---^
          // Note that all variables share the same zend_array, even though some are
          // PHP references and some are not.
$d[] = 1; // $a, $b                                 -> zend_array_1(refcount=2, value=[])
          // $c, $d -> zend_reference_1(refcount=2) -> zend_array_2(refcount=1, value=[1])
          // Only now, when a write happens, is the zend_array duplicated.

The biggest difference from PHP5 is that all variables can share the same array, whether or not they are PHP references. The array is separated only when it is modified through one of them. This also means that it is safe to pass a large referenced array to count(): no copy is made. References are still slower than normal values, however, because the zend_reference structure has to be allocated, it adds a level of indirection, and the engine has to handle it specially.
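
The refcount bookkeeping in the mixed example can be sketched in C. This is an illustration only: `demo_arr` reduces an array to a single `last` field, `demo_ref` models the reference box, and all names are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical miniature array; only the refcount and one payload field matter. */
typedef struct demo_arr {
    unsigned refcount;
    long last;
} demo_arr;

/* Modeled on zend_reference: a refcounted box that holds one value slot. */
typedef struct demo_ref {
    unsigned refcount;          /* how many variables are bound by reference */
    demo_arr *val;              /* the wrapped value, itself refcounted */
} demo_ref;

demo_arr *demo_arr_new(void) {
    demo_arr *a = calloc(1, sizeof *a);
    assert(a != NULL);
    a->refcount = 1;
    return a;
}

/* "$d =& $c": wrap the value $c holds in a box shared by $c and $d. The box
   takes over $c's count on the array, so the array's refcount is unchanged. */
demo_ref *demo_make_ref(demo_arr *held_by_c) {
    demo_ref *r = malloc(sizeof *r);
    assert(r != NULL);
    r->refcount = 2;            /* $c and $d */
    r->val = held_by_c;
    return r;
}

/* Write through the reference: separate the array only if others share it. */
void demo_ref_write(demo_ref *r, long v) {
    if (r->val->refcount > 1) {
        demo_arr *dup = demo_arr_new();
        dup->last = r->val->last;
        r->val->refcount--;
        r->val = dup;
    }
    r->val->last = v;
}
```

Creating the reference costs one small allocation for the box but leaves the array shared; the array is copied only inside `demo_ref_write`, and only if other users remain.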

Conclusion

To sum up, the most important change in PHP7 is that zvals are no longer allocated separately on the heap and no longer store a reference count themselves. Instead, the complex types that a zval points to, such as strings, arrays and objects, store the reference count themselves. This leads to fewer allocations, less indirection and lower memory usage.

The next article, part 2 of the internal implementation of variables in PHP7, covers the individual types in detail.
