PHP7 has been released. As promised, I am starting this series of articles, and today I would like to talk with you about the changes to zval. Before we get to those changes, let's look at what zval looked like in PHP5.
PHP5
Zval Review
In the PHP5 era, zval was defined as follows:
struct _zval_struct {
    union {
        long lval;
        double dval;
        struct {
            char *val;
            int len;
        } str;
        HashTable *ht;
        zend_object_value obj;
        zend_ast *ast;
    } value;
    zend_uint refcount__gc;
    zend_uchar type;
    zend_uchar is_ref__gc;
};
Readers familiar with the PHP5 kernel should know this structure well. Because a zval can represent every data type in PHP, it contains a type field that says what kind of value the zval stores; the common options are IS_NULL, IS_LONG, IS_STRING, IS_ARRAY, IS_OBJECT and so on.
Depending on the value of the type field, we interpret value, which is a union, in different ways. For example, if type is IS_STRING we should read zval.value through value.str, and if type is IS_LONG we should read it through value.lval.
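The idea of reading the union through the type tag can be sketched in a few lines of C. This is only an illustration: demo5_zval, the DEMO5_* tags and demo5_zval_format are hypothetical names, and the struct is a cut-down stand-in, not the engine's real definition.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical type tags, named after the PHP5 constants for readability. */
enum { DEMO5_IS_LONG = 1, DEMO5_IS_DOUBLE = 2, DEMO5_IS_STRING = 6 };

/* Cut-down stand-in for the PHP5 zval: a union plus a type tag. */
typedef struct {
    union {
        long lval;
        double dval;
        struct { char *val; int len; } str;
    } value;
    unsigned char type;
} demo5_zval;

/* Render the value into buf; the type tag decides which union member is read. */
static int demo5_zval_format(const demo5_zval *z, char *buf, size_t size)
{
    switch (z->type) {
    case DEMO5_IS_LONG:
        return snprintf(buf, size, "%ld", z->value.lval);
    case DEMO5_IS_DOUBLE:
        return snprintf(buf, size, "%g", z->value.dval);
    case DEMO5_IS_STRING:
        return snprintf(buf, size, "%.*s", z->value.str.len, z->value.str.val);
    default:
        return snprintf(buf, size, "(unknown)");
    }
}
```

Reading a member that does not match the tag would silently misinterpret the stored bytes, which is why every access goes through the type field first.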
In addition, we know that PHP uses reference counting for basic garbage collection, so zval has a refcount__gc field that records the number of references to this zval. One thing to note: before 5.3 this field was simply called refcount. In 5.3, when a new garbage collection algorithm was introduced to deal with circular references, the author added a large number of macros for operating on refcount and, so that mistakes would surface sooner, renamed the field to refcount__gc, forcing everyone to go through the macros.
Likewise is_ref: this value indicates whether a PHP value is a reference. Here we can see that being a reference is just a flag.
That is the zval of the PHP5 era. In 2013, when we were building an Opcache JIT for PHP5, the JIT performed poorly on real projects, and we came to realize that this structure had many problems. The PHPNG project started with rewriting this structure.
Existing problems
PHP5's zval definition was born together with Zend Engine 2, and over time the limitations of its design became more and more apparent:
First, the size of this struct is 24 bytes (on a 64-bit system). If we look closely at the zval.value union, zend_object_value is its largest member and forces the whole value to take 16 bytes. This should be easy to optimize away, for example by moving it out and using a pointer instead, because IS_OBJECT is after all not the most common type.
Second, every field of this struct has a clearly defined meaning and no field is reserved for custom use. As a result, many PHP5-era optimizations that needed to store some zval-related information had to resort to mapping through other structures, or to wrapping and patching zval from the outside in order to extend it. For example, when the new GC was introduced in 5.3 specifically to solve circular references, it had to use the following rather hacky approach:
/* The following macroses override macroses from zend_alloc.h */
#undef  ALLOC_ZVAL
#define ALLOC_ZVAL(z)                                   \
    do {                                                \
        (z) = (zval*)emalloc(sizeof(zval_gc_info));     \
        GC_ZVAL_INIT(z);                                \
    } while (0)
It hijacks the allocation of zval with zval_gc_info:
typedef struct _zval_gc_info {
    zval z;
    union {
        gc_root_buffer       *buffered;
        struct _zval_gc_info *next;
    } u;
} zval_gc_info;
zval_gc_info is then used to extend zval. So in the PHP5 era, when we request a zval, 32 bytes are actually allocated, yet the GC only needs to care about the IS_ARRAY and IS_OBJECT types, which leads to a lot of wasted memory.
Another example is the taint extension I wrote earlier. I needed to store a mark for certain strings, but zval has no place for it, so I had to use an unusual method:
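The 24-byte and 32-byte figures can be checked with a small sketch. The structs below are simplified stand-ins (all demo_* names are hypothetical, and opaque pointers replace the real engine types); the sizes in the comments assume a typical 64-bit LP64 platform with 8-byte long and pointers.

```c
#include <stddef.h>

/* Stand-in for zend_object_value: a handle plus a handlers pointer. */
typedef struct {
    unsigned int handle;
    const void *handlers;
} demo_object_value;                 /* 16 bytes on LP64 */

/* Stand-in for the PHP5 zval. */
typedef struct {
    union {
        long lval;
        double dval;
        struct { char *val; int len; } str;
        void *ht;
        demo_object_value obj;
    } value;                         /* 16 bytes on LP64 */
    unsigned int refcount__gc;
    unsigned char type;
    unsigned char is_ref__gc;
} demo_php5_zval;                    /* 16 + 4 + 1 + 1, padded to 24 */

/* Stand-in for zval_gc_info: the zval plus one extra pointer-sized union. */
typedef struct demo_zval_gc_info {
    demo_php5_zval z;
    union {
        void *buffered;
        struct demo_zval_gc_info *next;
    } u;
} demo_zval_gc_info;                 /* 24 + 8 = 32 on LP64 */

static size_t demo_php5_zval_size(void) { return sizeof(demo_php5_zval); }
static size_t demo_gc_info_size(void)   { return sizeof(demo_zval_gc_info); }
```

Every allocation pays for the extra GC word, whether or not the value is of a type the collector ever looks at.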
Z_STRVAL_PP(ppzval) = erealloc(Z_STRVAL_PP(ppzval), Z_STRLEN_PP(ppzval) + 1 + PHP_TAINT_MAGIC_LENGTH);
PHP_TAINT_MARK(*ppzval, PHP_TAINT_MAGIC_POSSIBLE);
That is, the string is lengthened by the size of an int and a magic number is written after it as the mark. The safety and stability of this approach cannot, technically, be guaranteed.
Third, most zvals in PHP are passed by value with copy-on-write, but there are two exceptions, objects and resources, which are always passed by reference. This creates a problem: besides the reference count in zval, objects and resources also need a global reference count so that their memory can be reclaimed correctly. So in the PHP5 era an object, for example, has two sets of reference counts, one in the zval and the other on the obj itself:
typedef struct _zend_object_store_bucket {
    zend_bool destructor_called;
    zend_bool valid;
    union _store_bucket {
        struct _store_object {
            void *object;
            zend_objects_store_dtor_t dtor;
            zend_objects_free_object_storage_t free_storage;
            zend_objects_store_clone_t clone;
            const zend_object_handlers *handlers;
            zend_uint refcount;
            gc_root_buffer *buffered;
        } obj;
        struct {
            int next;
        } free_list;
    } bucket;
} zend_object_store_bucket;
Besides the two sets of reference counts mentioned above, to get at an object we have to go through the following:
EG(objects_store).object_buckets[Z_OBJ_HANDLE_P(z)].bucket.obj
Only after a lengthy chain of memory reads do we get the actual object itself. The efficiency can be imagined.
All of this is because the Zend engine's original design did not take the later arrival of objects into account. When a good design meets a case it did not anticipate, the whole structure becomes complicated and harder to maintain; this is a good example.
Fourth, we know that a great deal of computation in PHP is string-oriented, but because the reference count lives on the zval, copying a zval of string type leaves us no option but to copy the string itself. When we add a string zval to an array as a key, we again have no choice but to copy the string. Although interned strings were introduced in PHP 5.4, they could not solve the problem at its root.
As another example, a large number of PHP structures are built on HashTable, and insert, delete, update and lookup operations on HashTable take up a lot of CPU time. A string lookup first has to compute the string's hash value; in theory we could compute a string's hash once and store it, avoiding the recomputation, and so on.
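A rough sketch of what the two wishes above amount to: a string that carries its own reference count, so a "copy" is just a counter increment, and that remembers its hash after the first computation. All demo_* names are hypothetical. The times-33 hash is a simple stand-in, and reserving 0 to mean "not computed yet" (by forcing the top bit on) is a choice made for this sketch only.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* A string that owns its reference count and caches its hash. */
typedef struct {
    uint32_t refcount;
    uint64_t hash;      /* 0 means "not computed yet" */
    size_t   len;
    char     val[];     /* string bytes follow the header */
} demo_string;

/* Counts real hash computations, only so the caching can be observed. */
static unsigned int demo_hash_computations = 0;

static demo_string *demo_string_new(const char *src, size_t len)
{
    demo_string *s = malloc(sizeof(*s) + len + 1);
    if (s == NULL) {
        return NULL;
    }
    s->refcount = 1;
    s->hash = 0;
    s->len = len;
    memcpy(s->val, src, len);
    s->val[len] = '\0';
    return s;
}

/* Compute the hash on first use, then reuse the stored value. */
static uint64_t demo_string_hash(demo_string *s)
{
    if (s->hash == 0) {
        uint64_t h = 5381;
        size_t i;
        for (i = 0; i < s->len; i++) {
            h = h * 33 + (unsigned char)s->val[i];
        }
        s->hash = h | (UINT64_C(1) << 63);  /* never 0, so 0 stays free as the marker */
        demo_hash_computations++;
    }
    return s->hash;
}

/* "Copying" shares the bytes and only bumps the counter. */
static demo_string *demo_string_copy(demo_string *s)
{
    s->refcount++;
    return s;
}

static void demo_string_release(demo_string *s)
{
    if (--s->refcount == 0) {
        free(s);
    }
}
```

With this layout, using the same string as an array key twice costs one hash computation and no byte copies.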
Fifth, there is the matter of references. In the PHP5 era we use separation on write (copy-on-write), but combined with references it has a classic performance problem:
When we call array_count, the array is simply passed by value, but because $b is a reference to it, the array has to be separated, which causes the array to be copied and drags performance down badly. Here is a simple test:
Running this example under 5.6 gives the following result:
$ php-5.6/sapi/cli/php /tmp/1.php
Used 0.00045204162597656S
Used 4.2051479816437S
A difference of about 10,000 times. This means that if, somewhere in a large body of code, I accidentally turn a variable into a reference (for example with foreach as &$v), I can trigger this problem and cause a serious performance hit that is nevertheless very hard to track down.
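Why the reference forces a copy can be modelled in C. The sketch below is a simplified model of the PHP5 rule, not engine code: because is_ref is a flag on the zval itself, a by-value parameter cannot share a zval that is marked as a reference and has to get its own copy. All demo_* names are hypothetical.

```c
#include <stdlib.h>
#include <string.h>

/* Simplified PHP5-style array zval: the data plus refcount and is_ref flag. */
typedef struct {
    long *elems;
    size_t len;
    unsigned int refcount;
    unsigned char is_ref;
} demo_array_zval;

static demo_array_zval *demo_array_zval_new(size_t len)
{
    demo_array_zval *z = malloc(sizeof(*z));
    if (z == NULL) {
        return NULL;
    }
    z->elems = calloc(len ? len : 1, sizeof(long));
    if (z->elems == NULL) {
        free(z);
        return NULL;
    }
    z->len = len;
    z->refcount = 1;
    z->is_ref = 0;
    return z;
}

/* Pass a value to a by-value parameter. */
static demo_array_zval *demo_pass_by_value(demo_array_zval *arg)
{
    demo_array_zval *copy;

    if (!arg->is_ref) {
        arg->refcount++;          /* cheap: share the zval */
        return arg;
    }
    /* The zval is a reference: separate it by copying every element. */
    copy = demo_array_zval_new(arg->len);
    if (copy != NULL) {
        memcpy(copy->elems, arg->elems, arg->len * sizeof(long));
    }
    return copy;
}

static void demo_array_zval_release(demo_array_zval *z)
{
    if (--z->refcount == 0) {
        free(z->elems);
        free(z);
    }
}
```

In this model the first kind of call is constant time, while the second is linear in the size of the array on every single call.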
Sixth, and most important. Why is it important? Because fixing it contributed a great deal of the performance improvement. In the PHP5 era we used to call MAKE_STD_ZVAL to allocate a zval on the heap, operate on it, and finally "copy" its value into return_value through RETURN_ZVAL and then destroy the zval, as in the pathinfo function:
PHP_FUNCTION(pathinfo)
{
.....
    MAKE_STD_ZVAL(tmp);
    array_init(tmp);
...
    if (opt == PHP_PATHINFO_ALL) {
        RETURN_ZVAL(tmp, 0, 1);
    } else {
...
}
This tmp variable is purely temporary, so why should we allocate it on the heap? MAKE_STD_ZVAL/ALLOC_ZVAL are everywhere in PHP5 and are a very common idiom. If we could allocate such a variable on the stack, it would be a clear win both for memory allocation and for cache friendliness.
There are many more, which I will not list in detail, but I believe you have reached the same conclusion we did: zval has to change, right?
PHP7
The zval now
In PHP7, zval became the structure below. Note that this is the structure as it stands now; it differs slightly from the PHPNG days because we have added some new interpretations (fields in the unions), but the overall size and layout are the same as in PHPNG:
struct _zval_struct {
    union {
        zend_long         lval;             /* long value */
        double            dval;             /* double value */
        zend_refcounted  *counted;
        zend_string      *str;
        zend_array       *arr;
        zend_object      *obj;
        zend_resource    *res;
        zend_reference   *ref;
        zend_ast_ref     *ast;
        zval             *zv;
        void             *ptr;
        zend_class_entry *ce;
        zend_function    *func;
        struct {
            uint32_t w1;
            uint32_t w2;
        } ww;
    } value;
    union {
        struct {
            ZEND_ENDIAN_LOHI_4(
                zend_uchar    type,         /* active type */
                zend_uchar    type_flags,
                zend_uchar    const_flags,
                zend_uchar    reserved)     /* call info for EX(This) */
        } v;
        uint32_t type_info;
    } u1;
    union {
        uint32_t     var_flags;
        uint32_t     next;                  /* hash collision chain */
        uint32_t     cache_slot;            /* literal cache slot */
        uint32_t     lineno;                /* line number (for ast nodes) */
        uint32_t     num_args;              /* arguments number for EX(This) */
        uint32_t     fe_pos;                /* foreach position */
        uint32_t     fe_iter_idx;           /* foreach iterator index */
    } u2;
};
Although it looks large, look closely and you will see that it is all unions. On a 64-bit system this new zval needs only 16 bytes (the size of two pointers). It is divided into two parts, the value and the extension fields, and the extension fields are in turn divided into u1 and u2, where u1 holds the type info and u2 holds various auxiliary fields.
The value part is the size of a size_t (one pointer) and can hold a pointer, a long, or a double.
The type info part holds the type of the zval. The auxiliary extension field is used in several other places; for example next replaces the chaining pointer that HashTable used to keep. This will be explained in detail later, when HashTable is introduced.
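The 8 + 4 + 4 layout can be reproduced with a reduced stand-in. The sketch below keeps only a few union members; demo7_zval and the other demo7 names are hypothetical, and fixed-width types are used so the 16-byte result does not depend on the width of long.

```c
#include <stddef.h>
#include <stdint.h>

enum { DEMO7_IS_LONG = 4, DEMO7_IS_DOUBLE = 5 };

/* Reduced stand-in for the PHP7 zval: 8-byte value, 4-byte u1, 4-byte u2. */
typedef struct {
    union {
        int64_t lval;
        double  dval;
        void   *ptr;
    } value;
    union {
        struct {
            uint8_t type;
            uint8_t type_flags;
            uint8_t const_flags;
            uint8_t reserved;
        } v;
        uint32_t type_info;
    } u1;
    union {
        uint32_t next;        /* e.g. hash collision chain */
        uint32_t cache_slot;
    } u2;
} demo7_zval;

static size_t demo7_zval_size(void) { return sizeof(demo7_zval); }

/* Store an integer directly in the zval: no allocation, no reference count. */
static void demo7_set_long(demo7_zval *z, int64_t n)
{
    z->value.lval = n;
    z->u1.type_info = 0;          /* clear all four type bytes at once */
    z->u1.v.type = DEMO7_IS_LONG;
}
```

Because u2 sits in what would otherwise be padding, the auxiliary fields cost nothing extra in size.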
Type
The zval types have been adjusted considerably in PHP7. Overall there are the following 17 types:
/* regular data types */
#define IS_UNDEF            0
#define IS_NULL             1
#define IS_FALSE            2
#define IS_TRUE             3
#define IS_LONG             4
#define IS_DOUBLE           5
#define IS_STRING           6
#define IS_ARRAY            7
#define IS_OBJECT           8
#define IS_RESOURCE         9
#define IS_REFERENCE        10
/* constant expressions */
#define IS_CONSTANT         11
#define IS_CONSTANT_AST     12
/* fake types */
#define _IS_BOOL            13
#define IS_CALLABLE         14
/* internal types */
#define IS_INDIRECT         15
#define IS_PTR              17
The IS_BOOL type of PHP5 has now been split into the two types IS_FALSE and IS_TRUE. And whereas a reference used to be a flag, it is now a new type.
IS_INDIRECT and IS_PTR are reserved for internal use and are never visible to users. They will also be described later, together with HashTable.
Starting with PHP7, types whose value fits directly in the value field of zval are no longer reference counted; they are simply assigned when copied, which eliminates a large number of reference counting operations. These types are:
IS_LONG
IS_DOUBLE
Then there are, of course, the types that have no value at all, only a type, and need no reference counting either:
IS_NULL
IS_FALSE
IS_TRUE
For complex types, which do not fit in a size_t, value holds a pointer to the actual data, and the reference count lives on that data instead of on the zval. Take IS_ARRAY as an example:
struct _zend_array {
    zend_refcounted_h gc;
    union {
        struct {
            ZEND_ENDIAN_LOHI_4(
                zend_uchar    flags,
                zend_uchar    nApplyCount,
                zend_uchar    nIteratorsCount,
                zend_uchar    reserve)
        } v;
        uint32_t flags;
    } u;
    uint32_t          nTableMask;
    Bucket           *arData;
    uint32_t          nNumUsed;
    uint32_t          nNumOfElements;
    uint32_t          nTableSize;
    uint32_t          nInternalPointer;
    zend_long         nNextFreeElement;
    dtor_func_t       pDestructor;
};
zval.value.arr points to a structure like the one above, which holds the actual array, and the reference count part is stored in the zend_refcounted_h structure:
typedef struct _zend_refcounted_h {
    uint32_t         refcount;          /* reference counter 32-bit */
    union {
        struct {
            ZEND_ENDIAN_LOHI_3(
                zend_uchar    type,
                zend_uchar    flags,    /* used for strings & objects */
                uint16_t      gc_info)  /* keeps GC root number (or 0) and color */
        } v;
        uint32_t type_info;
    } u;
} zend_refcounted_h;
The definition of every complex type starts with the zend_refcounted_h structure, which contains GC-related information in addition to the reference count. So when the GC runs, it does not need to care what the concrete type is; it can treat all of them as a zend_refcounted* structure.
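The effect on copying can be sketched as follows. This is a minimal model with hypothetical demo_* names, not the engine's macros: a value that fits in the zval is copied by plain assignment, while a complex value is shared through its pointer and only the counter in its header changes. Because the header is the first member, the same code can treat any complex type as a bare header.

```c
#include <stdint.h>
#include <stdlib.h>

enum { DEMO_T_UNDEF = 0, DEMO_T_LONG = 4, DEMO_T_ARRAY = 7 };

/* Common header placed first in every reference-counted type. */
typedef struct {
    uint32_t refcount;
    uint32_t type_info;
} demo_refcounted_h;

/* A complex type: the header comes first, the payload follows. */
typedef struct {
    demo_refcounted_h gc;
    uint32_t n_elements;
} demo_array;

typedef struct {
    union {
        int64_t lval;
        demo_refcounted_h *counted;   /* generic view of any complex value */
        demo_array *arr;
    } value;
    uint32_t type_info;
} demo_zval;

static demo_array *demo_array_new(void)
{
    demo_array *a = calloc(1, sizeof(*a));
    if (a != NULL) {
        a->gc.refcount = 1;
        a->gc.type_info = DEMO_T_ARRAY;
    }
    return a;
}

static int demo_zval_is_refcounted(const demo_zval *z)
{
    return z->type_info == DEMO_T_ARRAY;
}

/* Copy a zval: scalars are just assigned, complex values are shared. */
static void demo_zval_copy(demo_zval *dst, const demo_zval *src)
{
    *dst = *src;
    if (demo_zval_is_refcounted(dst)) {
        dst->value.counted->refcount++;
    }
}

/* Drop one holder; free the shared data when the last holder is gone. */
static void demo_zval_release(demo_zval *z)
{
    if (demo_zval_is_refcounted(z) && --z->value.counted->refcount == 0) {
        free(z->value.counted);
    }
    z->type_info = DEMO_T_UNDEF;
}
```

Note that demo_zval_copy and demo_zval_release never need to know that the pointer refers to an array; they only touch the header.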
One more thing to explain: you may be curious about the ZEND_ENDIAN_LOHI_4 macro. Its purpose is to simplify assignment. It guarantees that on both big-endian and little-endian machines the fields it defines are stored in the same order, so that when assigning we do not have to assign each field individually but can assign them all at once. For example, for the array structure above, a single statement such as:
arr1.u.flags = arr2.u.flags;
does in one step what would otherwise take the following sequence of assignments:
arr1.u.v.flags           = arr2.u.v.flags;
arr1.u.v.nApplyCount     = arr2.u.v.nApplyCount;
arr1.u.v.nIteratorsCount = arr2.u.v.nIteratorsCount;
arr1.u.v.reserve         = arr2.u.v.reserve;
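A macro of this kind can be sketched as below. DEMO_ENDIAN_LOHI_4 and the demo_* names are hypothetical, and the byte-order test relies on the __BYTE_ORDER__ macro predefined by GCC and Clang (that is an assumption of this sketch; when it is not defined, the code falls back to the little-endian ordering).

```c
#include <stdint.h>

/* Declare four fields so that the first argument always lands in the
 * lowest-order byte of the overlapping 32-bit word. */
#if defined(__BYTE_ORDER__) && defined(__ORDER_BIG_ENDIAN__) && \
    __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
# define DEMO_ENDIAN_LOHI_4(a, b, c, d) d; c; b; a;
#else
# define DEMO_ENDIAN_LOHI_4(a, b, c, d) a; b; c; d;
#endif

typedef union {
    struct {
        DEMO_ENDIAN_LOHI_4(
            uint8_t flags,
            uint8_t apply_count,
            uint8_t iterators_count,
            uint8_t reserve)
    } v;
    uint32_t flags;   /* all four bytes viewed as one word */
} demo_array_flags;

/* One 32-bit store copies all four byte-sized fields. */
static void demo_copy_flags(demo_array_flags *dst, const demo_array_flags *src)
{
    dst->flags = src->flags;
}
```

Since the first field is pinned to the low byte on either byte order, a constant such as 0x04030201 written to the word means the same four field values everywhere.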
Another question you may have is why the type is not placed at the front of zval, since we know that when we get hold of a zval the first thing we do is read its type. One reason is that the difference between the two layouts is small; the other is that, with a future JIT in mind, if the type of a zval can be obtained by type inference there is no need to read its type value at all.
Flags
To be continued.