This article gives a detailed analysis of the PHP virtual machine as implemented in the PHP 7 source code. It is intended as a reference for readers who want to understand how the virtual machine is designed.
1. Starting from the physical machine
A virtual machine is also a computer, and its design has a lot in common with that of a physical machine;
1.1 von Neumann architecture
Von Neumann is rightly regarded as the father of the digital computer. Today's computers are based on the von Neumann architecture, whose main design ideas are the following:
Instructions and data are stored together in the same memory; both are simply data in memory. (In the protected mode of modern CPUs, every memory segment has a segment descriptor that records the segment's access rights: readable, writable, executable. This indirectly specifies which memory holds instructions and which holds data.);
Memory is a one-dimensional, linearly addressed structure accessed by address, and every unit has a fixed number of bits;
Data is represented in binary;
An instruction consists of an opcode and operands. The opcode indicates the type of operation, and an operand gives either the operand value itself or its address. An operand has no data type of its own; its data type is determined by the opcode. A computer of any architecture exposes a set of instructions to the outside;
The control unit executes instructions and directly issues control signals that drive the operation of the computer. The instruction counter holds the memory address of the instruction to be executed. There is only one instruction counter; it normally advances in ascending order, but the order of execution may change depending on the result of an operation or on external conditions at the time;
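The points above can be made concrete with a toy stored-program machine. The following is a minimal sketch, not a model of any real CPU: the opcodes, the two-cell instruction format and the single accumulator are all assumptions made for illustration. Instructions and data live in the same array, and an instruction counter (`pc`) selects the next instruction.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical opcodes for a toy machine; not taken from any real CPU. */
enum { OP_HALT = 0, OP_LOAD = 1, OP_ADD = 2, OP_STORE = 3, OP_JNZ = 4 };

/* Runs a program held in `mem`; instructions and data share this one array.
 * Each instruction occupies two cells: an opcode followed by one operand. */
static int32_t run(int32_t *mem, uint32_t mem_size)
{
    uint32_t pc = 0;   /* instruction counter */
    int32_t acc = 0;   /* single accumulator register */

    for (;;) {
        assert(pc + 1 < mem_size);
        int32_t opcode = mem[pc];
        int32_t operand = mem[pc + 1];
        pc += 2;       /* normally advances in ascending order */

        switch (opcode) {
        case OP_LOAD:  acc = mem[operand]; break;
        case OP_ADD:   acc += mem[operand]; break;
        case OP_STORE: mem[operand] = acc; break;
        case OP_JNZ:   if (acc != 0) pc = (uint32_t)operand; break;
        default:       return acc;   /* OP_HALT or an unknown opcode stops the machine */
        }
    }
}

/* Adds the values stored in data cells 8 and 9 and writes the sum to cell 10. */
static int32_t demo_add(void)
{
    int32_t mem[11] = {
        OP_LOAD, 8,
        OP_ADD, 9,
        OP_STORE, 10,
        OP_HALT, 0,
        /* data cells start at index 8 */
        20, 22, 0
    };
    run(mem, 11);
    return mem[10];
}
```

Calling `demo_add()` runs the four instructions in order and leaves the sum of cells 8 and 9 in cell 10. `OP_JNZ` is the only instruction that overrides the ascending order of the instruction counter.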
1.2 Introduction to assembly language
A computer of any architecture provides a set of instructions;
An instruction consists of an opcode and operands. The opcode is the operation type; an operand can be an immediate value or a storage address; each instruction can have 0, 1 or 2 operands;
An instruction is a string of binary digits; assembly language is the textual form of binary instructions;
    push %ebx
    mov %eax, [%esp+8]
    mov %ebx, [%esp+12]
    add %eax, %ebx
    pop %ebx
push, mov, add, pop and so on are opcodes;
%ebx is a register; [%esp+12] is a memory address;
An operand is only a storage location from which data can be accessed. It has no data type of its own; its data type is determined by the opcode;
For example, movb transfers a byte, movw transfers a word, and movl transfers a double word.
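The idea that memory is untyped and the opcode picks the interpretation can be sketched in a few lines of C. The names `MOV_B`, `MOV_W`, `MOV_L` and `load` are hypothetical and only stand in for the movb / movw / movl family; the same four bytes yield different values depending on the width requested.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical width selectors standing in for movb / movw / movl. */
enum { MOV_B = 1, MOV_W = 2, MOV_L = 4 };

/* Loads `width` bytes from `addr`; the memory itself carries no type. */
static uint32_t load(const void *addr, int width)
{
    uint8_t b;
    uint16_t w;
    uint32_t l;

    assert(width == MOV_B || width == MOV_W || width == MOV_L);
    switch (width) {
    case MOV_B: memcpy(&b, addr, sizeof b); return b;
    case MOV_W: memcpy(&w, addr, sizeof w); return w;
    default:    memcpy(&l, addr, sizeof l); return l;
    }
}
```

With four bytes that all hold 0x01, `load(cells, MOV_B)` reads 0x01, `load(cells, MOV_W)` reads 0x0101 and `load(cells, MOV_L)` reads 0x01010101, although the bytes in memory never change.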
1.3 Function call stack
A procedure (function) encapsulates code. All it exposes to the outside is a set of parameters and an optional return value, and it can be called from different places in a program. Suppose procedure P calls procedure Q, and Q returns to P after it finishes executing. To make this work, three things must be considered:
Instruction jump: on entry to Q, the program counter must be set to the start address of Q's code; on return, it must be set to the address of the instruction in P that follows the call to Q;
Data transfer: P can pass one or more parameters to Q, and Q can return a value to P;
Memory allocation and release: when Q starts executing, it may need to allocate memory for its local variables, and it must release that memory before returning;
Procedure calls in most languages rely on the memory management discipline provided by the stack data structure, as shown in the figure:
Function call and return are a series of push and pop operations;
While a function executes, it has its own private stack frame, and its local variables are allocated in that frame;
The stack overflow errors one usually encounters are caused by calls nested too deeply, which keep pushing frames onto the stack;
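The push and pop discipline described above can be sketched with an explicit stack. This is a simplified illustration under stated assumptions: the capacity `STACK_SLOTS`, the frame layout (one return address followed by the locals) and the names `push_frame` / `pop_frame` are invented for the example and do not describe any particular calling convention.

```c
#include <assert.h>
#include <stddef.h>

#define STACK_SLOTS 64   /* hypothetical capacity, chosen only for this sketch */

typedef struct {
    long slots[STACK_SLOTS];
    size_t top;          /* index of the next free slot */
} call_stack;

/* Pushes a frame: the return address followed by `n_locals` zeroed locals.
 * Returns the index of the first local, or -1 if the stack would overflow. */
static long push_frame(call_stack *s, long return_addr, size_t n_locals)
{
    if (s->top + 1 + n_locals > STACK_SLOTS) {
        return -1;       /* stack overflow: calls nested too deeply */
    }
    s->slots[s->top++] = return_addr;
    long first_local = (long)s->top;
    for (size_t i = 0; i < n_locals; i++) {
        s->slots[s->top++] = 0;
    }
    return first_local;
}

/* Pops the frame whose locals start at `first_local`: the locals are released
 * and the saved return address is handed back to the caller. */
static long pop_frame(call_stack *s, long first_local)
{
    assert(first_local >= 1 && (size_t)first_local <= s->top);
    s->top = (size_t)first_local - 1;
    return s->slots[s->top];
}
```

When P calls Q, a frame for Q is pushed on top of P's frame; when Q returns, `pop_frame` releases Q's locals and yields the address at which P resumes. A frame that does not fit makes `push_frame` fail, which is the overflow case mentioned above.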
2. PHP virtual machine
A virtual machine is also a computer, modelled on the design of a physical machine. When designing a virtual machine, three elements should be considered first: instructions, data storage, and function stack frames;
The following sections analyze the design of the PHP virtual machine in detail from these three angles;
2.1 Instructions
2.1.1 Instruction types
A computer of any architecture has to expose a set of instructions, which represents the set of operation types the computer supports;
The PHP virtual machine provides 186 instructions, defined in the zend_vm_opcodes.h file;
    // Addition, subtraction, multiplication, division, etc.
    #define ZEND_ADD 1
    #define ZEND_SUB 2
    #define ZEND_MUL 3
    #define ZEND_DIV 4
    #define ZEND_MOD 5
    #define ZEND_SL 6
    #define ZEND_SR 7
    #define ZEND_CONCAT 8
    #define ZEND_BW_OR 9
    #define ZEND_BW_AND 10
    ...
2.1.2 Instructions
2.1.2.1 Representation of an instruction
An instruction consists of an opcode and operands. The opcode indicates the operation type of the instruction, and an operand gives either the operand value itself or its address;
The PHP virtual machine defines the instruction format as: opcode, operand 1, operand 2, return value. It uses struct _zend_op to represent an instruction:
    struct _zend_op {
        const void *handler;      // pointer to the function that executes this instruction
        znode_op op1;             // operand 1
        znode_op op2;             // operand 2
        znode_op result;          // return value
        uint32_t extended_value;  // extended value
        uint32_t lineno;          // line number
        zend_uchar opcode;        // instruction type
        // type of operand 1 (not a data type such as string or array; it says whether
        // the operand is a constant, a temporary variable, a compiled variable, etc.)
        zend_uchar op1_type;
        zend_uchar op2_type;      // type of operand 2
        zend_uchar result_type;   // type of the return value
    };
2.1.2.2 Representation of an operand
As shown above, operands are represented by znode_op, whose definition is given below;
constant, var, num and so on are all of type uint32_t. How can such a field represent an operand? (It is not a pointer, so it cannot hold an address, and it cannot represent every data type either);
In fact, most operands are expressed as relative addresses: constant and the other fields are offsets relative to the start address of the stack frame;
In addition, the _znode_op union has a zval *zv field, which can also represent an operand. It is a pointer to a zval structure, and every data type supported by the PHP virtual machine is represented by the zval structure;
    typedef union _znode_op {
        uint32_t constant;
        uint32_t var;
        uint32_t num;
        uint32_t opline_num;
    #if ZEND_USE_ABS_JMP_ADDR
        zend_op *jmp_addr;
    #else
        uint32_t jmp_offset;
    #endif
    #if ZEND_USE_ABS_CONST_ADDR
        zval *zv;
    #endif
    } znode_op;
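How a plain uint32_t can stand for a variable becomes clearer once the offset is turned back into an address. The sketch below uses simplified stand-in types; `toy_slot`, `toy_frame`, `var_offset`, `frame_var` and both slot counts are assumptions for illustration and are not the Zend definitions. It only shows the arithmetic: frame start address plus byte offset gives the address of the slot.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified stand-in for a zval: a value plus a type tag. */
typedef struct {
    int64_t value;
    uint32_t type;
    uint32_t extra;
} toy_slot;

#define FRAME_HEADER_SLOTS 5   /* assumed header size, in slots */
#define FRAME_VAR_SLOTS 4      /* assumed number of variable slots */

/* A frame is one contiguous run of slots: header first, variables after it. */
typedef struct {
    toy_slot slots[FRAME_HEADER_SLOTS + FRAME_VAR_SLOTS];
} toy_frame;

/* Compile time: turn variable number n into a byte offset from the frame start. */
static uint32_t var_offset(uint32_t n)
{
    assert(n < FRAME_VAR_SLOTS);
    return (uint32_t)((FRAME_HEADER_SLOTS + n) * sizeof(toy_slot));
}

/* Run time: turn the uint32_t operand back into the address of a slot. */
static toy_slot *frame_var(toy_frame *frame, uint32_t offset)
{
    return (toy_slot *)((char *)frame + offset);
}
```

Because the operand stores an offset rather than an absolute address, the same compiled instruction works for every invocation of the function, whichever frame happens to be current.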
2.2 Data storage
The PHP virtual machine supports multiple data types: integer, float, string, array, object and so on. How does it store and represent them?
Section 2.1.2.2 showed that _znode_op represents an operand, and that an operand is either an offset (from which an address is computed, namely the start address of a zval structure) or a zval pointer. The PHP virtual machine uses the zval structure to represent and store all kinds of data;
    struct _zval_struct {
        zend_value value;                // stores the actual value
        union {
            struct {                     // flag bits
                ZEND_ENDIAN_LOHI_4(
                    zend_uchar type,         // important: indicates the variable type
                    zend_uchar type_flags,
                    zend_uchar const_flags,
                    zend_uchar reserved)     /* call info for EX(This) */
            } v;
            uint32_t type_info;
        } u1;
        union {                          // other useful information
            uint32_t next;               /* hash collision chain */
            uint32_t cache_slot;         /* literal cache slot */
            uint32_t lineno;             /* line number (for ast nodes) */
            uint32_t num_args;           /* arguments number for EX(This) */
            uint32_t fe_pos;             /* foreach position */
            uint32_t fe_iter_idx;        /* foreach iterator index */
            uint32_t access_flags;       /* class constant access flags */
            uint32_t property_guard;     /* single property guard */
        } u2;
    };
zval.u1.v.type represents the data type; the zend_types.h file defines the following types:
    #define IS_UNDEF 0
    #define IS_NULL 1
    #define IS_FALSE 2
    #define IS_TRUE 3
    #define IS_LONG 4
    #define IS_DOUBLE 5
    #define IS_STRING 6
    #define IS_ARRAY 7
    #define IS_OBJECT 8
    #define IS_RESOURCE 9
    #define IS_REFERENCE 10
    ...
zend_value stores the actual data content; its definition is given below;
zend_value is a union, so it is only as large as its largest member: 8 bytes on a 64-bit platform, while the whole zval occupies 16 bytes. long and double values are stored directly in the union; references, strings, arrays and so on are stored through pointers;
In the code, the zval.u1.v.type field determines the data type, which in turn determines which field of zend_value is operated on;
As can be seen, a string is represented by zend_string, an array by zend_array, and so on;
    typedef union _zend_value {
        zend_long lval;
        double dval;
        zend_refcounted *counted;
        zend_string *str;
        zend_array *arr;
        zend_object *obj;
        zend_resource *res;
        zend_reference *ref;
        zend_ast_ref *ast;
        zval *zv;
        void *ptr;
        zend_class_entry *ce;
        zend_function *func;
        struct {
            uint32_t w1;
            uint32_t w2;
        } ww;
    } zend_value;
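The pairing of a type tag with a union is a tagged-union pattern, and a reduced version shows how the tag selects the member to read. The sketch below is an illustration only: `toy_value`, `toy_zval` and `toy_as_double` are hypothetical names, the layout is not the real zval layout, and the conversion is not PHP's conversion logic. The three tag values reuse the IS_LONG, IS_DOUBLE and IS_STRING numbers listed above.

```c
#include <assert.h>
#include <stdint.h>

/* Type tags reusing three of the IS_* values listed above. */
enum { TOY_IS_LONG = 4, TOY_IS_DOUBLE = 5, TOY_IS_STRING = 6 };

typedef union {
    int64_t lval;       /* stored inline */
    double dval;        /* stored inline */
    const char *str;    /* stored through a pointer */
} toy_value;

typedef struct {
    toy_value value;
    uint8_t type;       /* selects which union member is valid */
} toy_zval;

/* Reads a numeric value, touching only the member named by the tag. */
static double toy_as_double(const toy_zval *zv)
{
    if (zv->type == TOY_IS_LONG) {
        return (double)zv->value.lval;
    }
    /* Strings and other types are left out of this sketch. */
    assert(zv->type == TOY_IS_DOUBLE);
    return zv->value.dval;
}
```

Reading a member other than the one named by the tag would reinterpret the same bytes as a different type, which is why every access in the virtual machine is guarded by a check on the type field.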
For example, the structure of a string in PHP 7 is shown in the following diagram:
2.3 Instructions revisited
Section 2.1.2.1 pointed out that an instruction is represented by struct _zend_op. Its two most important properties are the handler function and the operands (two operands and one return value);
When the operand types (constant, temporary variable, etc.) differ, the handler function for the same instruction differs as well. The operand types are defined in the Zend/zend_compile.h file:
    // Constant
    #define IS_CONST (1<<0)
    // Temporary variable holding the intermediate result of an operation;
    // cannot be reused by the handlers of other instructions
    #define IS_TMP_VAR (1<<1)
    // Not a variable declared in PHP code; typically a returned temporary value.
    // For example, in $a = time() the return value of time() has type IS_VAR;
    // a variable of this type can be reused by the handlers of other instructions
    #define IS_VAR (1<<2)
    #define IS_UNUSED (1<<3) /* Unused variable */
    // Compiled variable: a variable declared in PHP code
    #define IS_CV (1<<4) /* Compiled variable */
Handler functions are named according to the rule: ZEND_[opcode]_SPEC_(operand 1 type)_(operand 2 type)_(return value type)_HANDLER
For example, the assignment statement has the following handler functions:
ZEND_ASSIGN_SPEC_VAR_CONST_RETVAL_UNUSED_HANDLER, ZEND_ASSIGN_SPEC_VAR_TMP_RETVAL_UNUSED_HANDLER, ZEND_ASSIGN_SPEC_VAR_VAR_RETVAL_UNUSED_HANDLER, ZEND_ASSIGN_SPEC_VAR_CV_RETVAL_UNUSED_HANDLER, ...
For $a = 1, the handler function is ZEND_ASSIGN_SPEC_CV_CONST_RETVAL_UNUSED_HANDLER, which is implemented as follows:
    static ZEND_OPCODE_HANDLER_RET ZEND_FASTCALL ZEND_ASSIGN_SPEC_CV_CONST_RETVAL_UNUSED_HANDLER(ZEND_OPCODE_HANDLER_ARGS)
    {
        USE_OPLINE
        zval *value;
        zval *variable_ptr;

        SAVE_OPLINE();
        // Get the value corresponding to op2, which is 1
        value = EX_CONSTANT(opline->op2);
        // Get the position of op1 in execute_data, i.e. $a
        // (execute_data is similar to a function stack frame; analyzed in detail later)
        variable_ptr = _get_zval_ptr_cv_undef_BP_VAR_W(execute_data, opline->op1.var);
        // Assignment
        value = zend_assign_to_variable(variable_ptr, value, IS_CONST);
        if (UNEXPECTED(0)) {
            ZVAL_COPY(EX_VAR(opline->result.var), value);
        }
        ZEND_VM_NEXT_OPCODE_CHECK_EXCEPTION();
    }
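The reason for having one handler per combination of operand types is that the choice can be made once, when the instruction is emitted, instead of on every execution. The following sketch shows that idea with a small lookup table. It is a simplified illustration, not how the Zend engine generates or selects its handlers; `toy_op`, `emit_add`, the two operand kinds and the handler names are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Operand kinds, reduced to two for brevity (hypothetical numbering). */
enum { KIND_CONST = 0, KIND_CV = 1, KIND_COUNT = 2 };

typedef struct toy_op toy_op;
typedef int (*toy_handler)(const toy_op *op, const int *consts, const int *cvs);

struct toy_op {
    toy_handler handler;   /* chosen once, when the instruction is emitted */
    int op1, op2;          /* index into consts[] or cvs[], depending on kind */
};

/* One specialized ADD handler per operand-kind pair: no kind checks at run time. */
static int add_const_const(const toy_op *op, const int *c, const int *v)
{
    (void)v;
    return c[op->op1] + c[op->op2];
}

static int add_const_cv(const toy_op *op, const int *c, const int *v)
{
    return c[op->op1] + v[op->op2];
}

static int add_cv_const(const toy_op *op, const int *c, const int *v)
{
    return v[op->op1] + c[op->op2];
}

static int add_cv_cv(const toy_op *op, const int *c, const int *v)
{
    (void)c;
    return v[op->op1] + v[op->op2];
}

static const toy_handler add_handlers[KIND_COUNT][KIND_COUNT] = {
    { add_const_const, add_const_cv },
    { add_cv_const,    add_cv_cv    },
};

/* "Compiles" an ADD: picks the handler from the operand kinds and stores it. */
static toy_op emit_add(int op1_kind, int op1, int op2_kind, int op2)
{
    assert(op1_kind >= 0 && op1_kind < KIND_COUNT);
    assert(op2_kind >= 0 && op2_kind < KIND_COUNT);
    toy_op op = { add_handlers[op1_kind][op2_kind], op1, op2 };
    return op;
}
```

At run time the executor simply calls `op.handler(&op, consts, cvs)`. The price of this design is the number of handlers, which grows with every operand kind added; that growth is visible in the long list of ZEND_ASSIGN handler names above.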
2.4 Function stack frame
2.4.1 Instruction Set
The structure and representation of a single instruction were analyzed above. The PHP virtual machine uses _zend_op_array to represent a collection of instructions:
    struct _zend_op_array {
        ...
        // last is the total number of instructions; opcodes is the array that stores them
        uint32_t last;
        zend_op *opcodes;
        // number of variables of type IS_CV
        int last_var;
        // number of variables of type IS_VAR and IS_TMP_VAR
        uint32_t T;
        // array that stores the IS_CV variables
        zend_string **vars;
        ...
        // static variables
        HashTable *static_variables;
        // number of constants; array of constants
        int last_literal;
        zval *literals;
        ...
    };
Note: last_var is the number of IS_CV variables, which are stored in the vars array. Every time an IS_CV variable (something like $something) is encountered during compilation, the vars array is traversed to check whether the variable already exists. If it does not, it is appended to vars, the current value of last_var becomes the operand number of the variable, and last_var is incremented. If it does, the previously assigned operand number is reused.
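The lookup described in the note can be sketched as a linear search with insertion on a miss. This is a minimal illustration under stated assumptions: `toy_op_array`, `toy_lookup_cv` and the fixed capacity `MAX_VARS` are hypothetical, names are plain C strings rather than zend_string values, and the function returns a simple index, whereas the text above explains that the real operand is expressed as an offset.

```c
#include <assert.h>
#include <string.h>

#define MAX_VARS 16   /* hypothetical fixed capacity for this sketch */

typedef struct {
    const char *vars[MAX_VARS];  /* names of the compiled variables seen so far */
    int last_var;                /* how many entries of vars[] are in use */
} toy_op_array;

/* Returns the number assigned to `name`, registering the name on first sight. */
static int toy_lookup_cv(toy_op_array *op_array, const char *name)
{
    for (int i = 0; i < op_array->last_var; i++) {
        if (strcmp(op_array->vars[i], name) == 0) {
            return i;            /* already known: reuse its number */
        }
    }
    assert(op_array->last_var < MAX_VARS);
    int n = op_array->last_var++;
    op_array->vars[n] = name;    /* the caller keeps `name` alive */
    return n;
}
```

Looking up "a", then "b", then "a" again returns 0, 1 and 0, and leaves last_var at 2. Every occurrence of the same variable in a function therefore compiles to the same operand.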
2.4.2 Function stack frame
The PHP virtual machine implements a function stack frame structure similar to that of the physical machine described in section 1.3;
_zend_vm_stack represents the stack structure. Multiple stacks are linked through the prev field into a singly linked list; top and end are zval pointers that mark the two boundaries of the stack's memory;
    struct _zend_vm_stack {
        zval *top;
        zval *end;
        zend_vm_stack prev;
    };
Now consider how to design the frame used while a function executes. The frame has to give access to the function's compiled instructions and has to store the function's local variables. (Section 2.1.2.2 explained that an operand is represented by znode_op, which holds a uint32_t internally; that value is the offset of the zval variable relative to the start address of the current function's stack frame.);
The PHP virtual machine uses struct _zend_execute_data to store the data needed to execute the current function;
    struct _zend_execute_data {
        // current instruction
        const zend_op *opline;
        // stack frame of the call made by the current function
        zend_execute_data *call;
        // return value of the function
        zval *return_value;
        zend_function *func;
        zval This; /* this + call_info + num_args */
        // stack frame of the function that called the current function
        zend_execute_data *prev_execute_data;
        // symbol table
        zend_array *symbol_table;
    #if ZEND_EX_USE_RUN_TIME_CACHE
        void **run_time_cache;
    #endif
    #if ZEND_EX_USE_LITERALS
        // constant array
        zval *literals;
    #endif
    };
When a function starts executing, a stack frame has to be allocated for it on the stack. The code is as follows:
    static zend_always_inline zend_execute_data *zend_vm_stack_push_call_frame(uint32_t call_info, zend_function *func, uint32_t num_args, zend_class_entry *called_scope, zend_object *object)
    {
        // Calculate the memory size required by the current function's stack frame
        uint32_t used_stack = zend_vm_calc_used_stack(num_args, func);
        // Allocate space according to the frame size and push the frame onto the stack
        return zend_vm_stack_push_call_frame_ex(used_stack, call_info, func, num_args, called_scope, object);
    }

    // Calculate the function stack frame size
    static zend_always_inline uint32_t zend_vm_calc_used_stack(uint32_t num_args, zend_function *func)
    {
        // size of _zend_execute_data (80 bytes / 16 bytes = 5) + number of arguments
        uint32_t used_stack = ZEND_CALL_FRAME_SLOT + num_args;

        if (EXPECTED(ZEND_USER_CODE(func->type))) {
            // number of compiled and temporary variables of the current function
            used_stack += func->op_array.last_var + func->op_array.T - MIN(func->op_array.num_args, num_args);
        }
        // multiply by 16 bytes
        return used_stack * sizeof(zval);
    }

    // Push onto the stack
    static zend_always_inline zend_execute_data *zend_vm_stack_push_call_frame_ex(uint32_t used_stack, uint32_t call_info, zend_function *func, uint32_t num_args, zend_class_entry *called_scope, zend_object *object)
    {
        // the new frame starts at the current top of the stack, right after the previous frame
        zend_execute_data *call = (zend_execute_data*)EG(vm_stack_top);
        // move the top pointer of the function call stack
        EG(vm_stack_top) = (zval*)((char*)call + used_stack);
        // initialize the current function's stack frame
        zend_vm_init_call_frame(call, call_info, func, num_args, called_scope, object);
        // return the start address of the current function's stack frame
        return call;
    }
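The size calculation in zend_vm_calc_used_stack is easier to follow with concrete numbers. The sketch below restates the user-function branch of the formula with plain integers. `toy_calc_used_stack` and its parameter names are hypothetical, and the two constants are taken from the comments in the code above (a 16-byte zval and a 5-slot frame header) rather than from the headers. One way to read the subtraction, stated here as an interpretation, is that declared parameters which were actually passed already have a slot among the compiled variables and must not be counted twice.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_ZVAL_SIZE 16u        /* sizeof(zval), per the comments above */
#define TOY_CALL_FRAME_SLOT 5u   /* 80-byte header / 16 bytes, per the comments above */

/* Frame size in bytes for a user function, following the formula above. */
static uint32_t toy_calc_used_stack(uint32_t passed_args, uint32_t declared_args,
                                    uint32_t last_var, uint32_t T)
{
    uint32_t min_args = passed_args < declared_args ? passed_args : declared_args;

    /* Assumption: declared parameters are themselves compiled variables. */
    assert(last_var >= min_args);

    uint32_t slots = TOY_CALL_FRAME_SLOT + passed_args + last_var + T - min_args;
    return slots * TOY_ZVAL_SIZE;
}
```

For a function with 2 declared parameters, 3 compiled variables and 1 temporary, a call with 2 arguments needs 5 + 2 + 3 + 1 - 2 = 9 slots, or 144 bytes. A call with 4 arguments needs 11 slots, or 176 bytes, because the two extra arguments get slots of their own.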
From the analysis above, the structure of the function stack frame can be drawn as follows:
Summary
The PHP virtual machine is also a computer, and three things about it deserve attention: the instruction set (including the instruction handler functions), data storage (zval), and the function stack frame;
With these in place, the virtual machine can accept instructions and execute them;
However, the PHP virtual machine exists to execute PHP code, so how is PHP code converted into instructions that the virtual machine can recognize? By compilation;
The PHP virtual machine also provides a compiler that converts PHP code into a collection of instructions it can recognize;
In theory you could define any language of your own: as long as you implement a compiler that converts it into instructions the PHP virtual machine recognizes, the PHP virtual machine can execute it;