PHP is an interpreted language. As with Java, Python, Ruby, JavaScript and other such languages, the code we write is not compiled into machine code; instead it is compiled into intermediate code that runs on a virtual machine (VM). The virtual machine that runs PHP is called the Zend virtual machine. Today we will dig into the kernel and explore how the Zend virtual machine works.
OPCODE
What is an OPCODE? It is an instruction that the virtual machine can recognize and process. The Zend virtual machine defines a series of OPCODEs, and through them it can do many things. Here are a few examples:
ZEND_ADD
Adds the two operands.
ZEND_NEW
Creates a PHP object.
ZEND_ECHO
Writes content to standard output.
ZEND_EXIT
Exits PHP.
PHP defines 186 such operations (as PHP evolves, more kinds of OPCODE will certainly be added). The definitions and implementations of all OPCODEs can be found in the source file zend/zend_vm_def.h (the content of this file is not native C code but a template, as explained below).
Let's look at how PHP designs the OPCODE data structure:
struct _zend_op {
    const void *handler;
    znode_op op1;
    znode_op op2;
    znode_op result;
    uint32_t extended_value;
    uint32_t lineno;
    zend_uchar opcode;
    zend_uchar op1_type;
    zend_uchar op2_type;
    zend_uchar result_type;
};
Look closely at the OPCODE data structure: doesn't it feel a lot like assembly language? Each OPCODE contains two operands, op1 and op2. The handler pointer points to the function that performs the OPCODE's operation, and the result of that function is stored in result.
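To make the assembly-like layout concrete, here is a minimal sketch in C of an instruction record with a handler pointer, two operands and a result slot. It is an illustration only, not Zend source: the names mini_op, mini_add and mini_run_one are hypothetical, and operands are plain indexes into an int array instead of znode_op and zval.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified instruction record: NOT the real zend_op. */
typedef struct mini_op mini_op;
typedef void (*mini_handler)(const mini_op *op, int *slots);

struct mini_op {
    mini_handler handler; /* function that carries out the operation */
    int op1;              /* index of the first operand slot */
    int op2;              /* index of the second operand slot */
    int result;           /* index of the slot that receives the result */
};

/* Handler for an ADD-like instruction: slots[result] = slots[op1] + slots[op2]. */
void mini_add(const mini_op *op, int *slots)
{
    slots[op->result] = slots[op->op1] + slots[op->op2];
}

/* Execute one instruction by calling through its handler pointer. */
void mini_run_one(const mini_op *op, int *slots)
{
    assert(op != NULL && op->handler != NULL);
    op->handler(op, slots);
}
```

With slots holding {1, 2, 0} and an instruction {mini_add, 0, 1, 2}, calling mini_run_one leaves 3 in slots[2]. The caller never needs to know which operation it is running; it only follows the handler pointer.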
Let's give a simple example:
<?php
$b = 1;
$a = $b + 2;
Using the VLD extension, we can see that compiling the code above produces a ZEND_ADD OPCODE.
compiled vars:  !0 = $b, !1 = $a
line     #* E I O op                           fetch          ext  return  operands
-------------------------------------------------------------------------------------
   2     0  E >   ASSIGN                                                   !0, 1
   3     1        ADD                                              ~       !0, 2
         2        ASSIGN                                                   !1, ~
   8     3      > RETURN                                                   1
The second row is the OPCODE of the ZEND_ADD instruction. It receives two operands: op1 is the variable $b, op2 is the numeric constant 2, and the result is stored in a temporary variable. In the zend/zend_vm_def.h file we can find the function that implements the ZEND_ADD instruction:
ZEND_VM_HANDLER(1, ZEND_ADD, CONST|TMPVAR|CV, CONST|TMPVAR|CV)
{
    USE_OPLINE
    zend_free_op free_op1, free_op2;
    zval *op1, *op2, *result;

    op1 = GET_OP1_ZVAL_PTR_UNDEF(BP_VAR_R);
    op2 = GET_OP2_ZVAL_PTR_UNDEF(BP_VAR_R);
    if (EXPECTED(Z_TYPE_INFO_P(op1) == IS_LONG)) {
        if (EXPECTED(Z_TYPE_INFO_P(op2) == IS_LONG)) {
            result = EX_VAR(opline->result.var);
            fast_long_add_function(result, op1, op2);
            ZEND_VM_NEXT_OPCODE();
        } else if (EXPECTED(Z_TYPE_INFO_P(op2) == IS_DOUBLE)) {
            result = EX_VAR(opline->result.var);
            ZVAL_DOUBLE(result, ((double)Z_LVAL_P(op1)) + Z_DVAL_P(op2));
            ZEND_VM_NEXT_OPCODE();
        }
    } else if (EXPECTED(Z_TYPE_INFO_P(op1) == IS_DOUBLE)) {
        ...
    }
    ...
}
The code above is not native C code, but a template.
Why is it done this way? Because PHP is a weakly typed language, while C, the language it is implemented in, is strongly typed. A weakly typed language supports automatic type conversion, and that conversion is implemented, as in the code above, by checking what kind of operands were passed in and handling each case separately. Imagine if every OPCODE handler had to work out the kind of its operands each time it ran: performance would be bound to suffer badly (a single request may need to process thousands of OPCODEs).
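The run-time checking described here can be sketched in a few lines of C. This is a simplified illustration under my own assumptions, not the engine's code: mini_val is a hypothetical tagged value loosely modelled on the idea of a zval, and unlike the real fast_long_add_function this sketch makes no attempt to handle integer overflow.

```c
#include <assert.h>

/* Hypothetical tagged value: a type tag plus a union holding the payload. */
typedef enum { T_LONG, T_DOUBLE } mini_type;

typedef struct {
    mini_type type;
    union {
        long lval;
        double dval;
    } value;
} mini_val;

/* Adds two values, choosing the arithmetic by inspecting both type tags. */
mini_val mini_val_add(mini_val a, mini_val b)
{
    mini_val r;

    assert(a.type == T_LONG || a.type == T_DOUBLE);
    assert(b.type == T_LONG || b.type == T_DOUBLE);

    if (a.type == T_LONG && b.type == T_LONG) {
        /* long + long stays a long (overflow is ignored in this sketch) */
        r.type = T_LONG;
        r.value.lval = a.value.lval + b.value.lval;
    } else {
        /* any double operand promotes the whole addition to double */
        double x = (a.type == T_LONG) ? (double)a.value.lval : a.value.dval;
        double y = (b.type == T_LONG) ? (double)b.value.lval : b.value.dval;
        r.type = T_DOUBLE;
        r.value.dval = x + y;
    }
    return r;
}
```

Every call pays for the tag checks before any arithmetic happens, which is the cost the paragraph above is worried about.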
Is there a way around this? We find that at compile time the kind of each operand (for example, a constant or a variable) can already be determined. So in the C code that PHP actually executes, each handler is split into separate functions for the different kinds of operands, and the virtual machine calls the matching one directly. This code lives in zend/zend_vm_execute.h. The expanded file is quite large, and in it we notice code like this:
if (IS_CONST == IS_CV) {
It looks meaningless, doesn't it? However, the C compiler will optimize this check away automatically. In most cases, when we want to understand the logic of an OPCODE handler, it is still easier to read the template file zend/zend_vm_def.h. By the way, the program that generates the C code from the template is written in PHP.
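The idea of expanding one template into several specialized functions can be imitated with a C macro. This is only an analogy: PHP's real generator is a PHP script that rewrites the template, not a macro, and every name below (mini_ctx, KIND_CONST, KIND_CV, DEFINE_ADD_HANDLER) is hypothetical. What it does show is why a comparison between two constants costs nothing once the template has been expanded.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical operand kinds, standing in for IS_CONST and IS_CV. */
enum { KIND_CONST = 1, KIND_CV = 2 };

typedef struct {
    int literals[4]; /* compile-time constants */
    int vars[4];     /* compiled variables */
} mini_ctx;

/*
 * "Template": each use expands into one function for a fixed pair of
 * operand kinds. Inside an expansion OP1_KIND and OP2_KIND are constants,
 * so a test such as (KIND_CONST == KIND_CV) is a constant expression that
 * the C compiler can fold away.
 */
#define DEFINE_ADD_HANDLER(NAME, OP1_KIND, OP2_KIND)                          \
    int NAME(const mini_ctx *ctx, int op1, int op2)                           \
    {                                                                         \
        int a, b;                                                             \
        assert(ctx != NULL);                                                  \
        a = ((OP1_KIND) == KIND_CV) ? ctx->vars[op1] : ctx->literals[op1];    \
        b = ((OP2_KIND) == KIND_CV) ? ctx->vars[op2] : ctx->literals[op2];    \
        return a + b;                                                         \
    }

/* Two specializations generated from the same template. */
DEFINE_ADD_HANDLER(add_cv_const, KIND_CV, KIND_CONST)
DEFINE_ADD_HANDLER(add_cv_cv, KIND_CV, KIND_CV)
```

In add_cv_const the first operand is always read from vars and the second from literals; the selection is decided when the code is compiled, not each time the handler runs.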
Execution process
To be precise, PHP's execution is divided into two phases: compilation and execution. I will not go into the compilation phase in detail here, but will focus on execution.
After a series of compilation steps such as lexical and syntax analysis, we get a data structure called the op array, which is defined as follows:
struct _zend_op_array {
    /* Common elements */
    zend_uchar type;
    zend_uchar arg_flags[3]; /* bitset of arg_info.pass_by_reference */
    uint32_t fn_flags;
    zend_string *function_name;
    zend_class_entry *scope;
    zend_function *prototype;
    uint32_t num_args;
    uint32_t required_num_args;
    zend_arg_info *arg_info;
    /* END of common elements */

    uint32_t *refcount;

    uint32_t last;
    zend_op *opcodes;

    int last_var;
    uint32_t T;
    zend_string **vars;

    int last_live_range;
    int last_try_catch;
    zend_live_range *live_range;
    zend_try_catch_element *try_catch_array;

    /* static variables support */
    HashTable *static_variables;

    zend_string *filename;
    uint32_t line_start;
    uint32_t line_end;
    zend_string *doc_comment;
    uint32_t early_binding; /* the linked list of delayed declarations */

    int last_literal;
    zval *literals;

    int cache_size;
    void **run_time_cache;

    void *reserved[ZEND_MAX_RESERVED_RESOURCES];
};
That's a lot, right? Put simply, it is essentially an OPCODE array plus the set of environment data needed during execution. Here are several of the more important fields:
opcodes
The array that holds the OPCODE.
filename
The file name of the currently executing script.
function_name
The name of the currently executing method.
static_variables
Static variable list.
last_try_catch
try_catch_array
The information needed for try-catch-finally jumps if an exception occurs in the current context.
literals
The collection of all constant literals, such as the string foo or the number 23.
Why generate so much data? Because the more information is produced at compile time, the less time is needed at execution time.
Next, let's look at how PHP executes OPCODEs. Execution happens inside a loop, which lives in the execute_ex function in zend/zend_vm_execute.h:
ZEND_API void execute_ex(zend_execute_data *ex)
{
    DCL_OPLINE

    zend_execute_data *execute_data = ex;

    LOAD_OPLINE();
    ZEND_VM_LOOP_INTERRUPT_CHECK();

    while (1) {
        if (UNEXPECTED((ret = ((opcode_handler_t)OPLINE->handler)(ZEND_OPCODE_HANDLER_ARGS_PASSTHRU)) != 0)) {
            if (EXPECTED(ret > 0)) {
                execute_data = EG(current_execute_data);
                ZEND_VM_LOOP_INTERRUPT_CHECK();
            } else {
                return;
            }
        }
    }
    zend_error_noreturn(E_CORE_ERROR, "Arrived at end of main loop which shouldn't happen");
}
Here I have removed some branches that depend on the build environment, keeping only the main flow. As you can see, in an infinite loop the virtual machine keeps calling the handler function specified by each OPCODE to process the instructions, until a handler's result ret is less than 0. Notice that the main loop does not advance the current pointer into the OPCODE array; that step is placed at the end of the specific function that executes each instruction. That is why we see a call to this macro at the end of most OPCODE implementation functions:
ZEND_VM_NEXT_OPCODE_CHECK_EXCEPTION();
In the earlier simple example, the OPCODE array printed by VLD ends with a ZEND_RETURN instruction, yet there is no such statement in the PHP code we wrote. At compile time, the virtual machine automatically appends this instruction to the end of the OPCODE array. The handler for ZEND_RETURN returns -1; when the loop sees a result less than 0 it exits, which ends the program.
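The shape of this loop can be reproduced in a small C sketch. It is a toy under stated assumptions, not the engine: loop_op, loop_state and the handlers are hypothetical names, and a single int accumulator stands in for all of the VM's variables. The points it keeps from the text are that the loop only dispatches, each handler advances the instruction pointer itself, and a negative return value stops execution.

```c
#include <assert.h>
#include <stddef.h>

typedef struct loop_op loop_op;

/* Hypothetical execution state: current instruction plus one accumulator. */
typedef struct {
    const loop_op *opline; /* instruction about to be executed */
    int acc;               /* stands in for the VM's variables */
} loop_state;

typedef int (*loop_handler)(loop_state *st);

struct loop_op {
    loop_handler handler;
    int operand;
};

/* Adds the operand, then advances opline itself before returning. */
int loop_add(loop_state *st)
{
    st->acc += st->opline->operand;
    st->opline++;
    return 0; /* 0: keep going */
}

/* Ends execution: a negative return value tells the loop to stop. */
int loop_return(loop_state *st)
{
    (void)st;
    return -1;
}

/* The main loop only dispatches; it never moves opline on its own. */
int loop_execute(const loop_op *ops)
{
    loop_state st;

    assert(ops != NULL);
    st.opline = ops;
    st.acc = 0;

    while (1) {
        int ret = st.opline->handler(&st);
        if (ret < 0) {
            return st.acc;
        }
    }
}
```

Running the three-instruction program {loop_add 1, loop_add 2, loop_return} through loop_execute yields 3. Without the trailing loop_return the loop would run off the end of the array, which is why the compiler always appends a return instruction.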
Method invocation
If we call a user-defined function, how does the virtual machine handle it?
<?php
function foo() {
    echo 'test';
}

foo();
We view the generated OPCODEs through VLD. There are two OPCODE instruction sequences, because we have defined a PHP function. In the first one, calling the user-defined function executes two OPCODE instructions: INIT_FCALL and DO_FCALL.
compiled vars:  none
line     #* E I O op                           fetch          ext  return  operands
-------------------------------------------------------------------------------------
   2     0  E >   NOP
   6     1        INIT_FCALL                                               'foo'
         2        DO_FCALL                                      0
         3      > RETURN                                                   1

compiled vars:  none
line     #* E I O op                           fetch          ext  return  operands
-------------------------------------------------------------------------------------
   3     0  E >   ECHO                                                     'test'
   4     1      > RETURN                                                   null
INIT_FCALL prepares the context data required to execute the function, and DO_FCALL is responsible for executing it. The DO_FCALL handler deals with a lot of logic depending on the kind of call; I have extracted the part that executes a user-defined function:
ZEND_VM_HANDLER(ZEND_DO_FCALL, ANY, ANY, SPEC(RETVAL))
{
    USE_OPLINE
    zend_execute_data *call = EX(call);
    zend_function *fbc = call->func;
    zend_object *object;
    zval *ret;

    ...

    if (EXPECTED(fbc->type == ZEND_USER_FUNCTION)) {
        ret = NULL;
        if (RETURN_VALUE_USED(opline)) {
            ret = EX_VAR(opline->result.var);
            ZVAL_NULL(ret);
        }

        call->prev_execute_data = execute_data;
        i_init_func_execute_data(call, &fbc->op_array, ret);

        if (EXPECTED(zend_execute_ex == execute_ex)) {
            ZEND_VM_ENTER();
        } else {
            ZEND_ADD_CALL_FLAG(call, ZEND_CALL_TOP);
            zend_execute_ex(call);
        }
    }

    ...

    ZEND_VM_SET_OPCODE(opline + 1);
    ZEND_VM_CONTINUE();
}
As you can see, DO_FCALL first saves the context data of the caller into call->prev_execute_data, and then calls i_init_func_execute_data, which assigns the user-defined function's op_array (every user-defined function gets one at compile time, and its data structure contains the function's OPCODE array) to the new execution context object.
Then execution of the user-defined function begins. zend_execute_ex is by default the execute_ex function mentioned earlier, and in that case the handler simply re-enters the same loop through ZEND_VM_ENTER(); otherwise it calls zend_execute_ex directly. An extension may overwrite the zend_execute_ex pointer, an API that lets PHP extension developers extend functionality by replacing the function (that is not the topic of this article, so I will not go into it). Either way, the only difference is that the context data is replaced with that of the function being called.
We can think of the outermost code as a default function (similar to main() in C); it is essentially no different from a user-defined function.
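The frame switching described above can be sketched as follows. This is a heavily simplified model, with assumptions worth stating: call_frame, call_vm and the handlers are hypothetical names, frames come from a small fixed pool, and a single call instruction both prepares and enters the callee (the engine splits this between INIT_FCALL and DO_FCALL). The prev pointer plays the role of prev_execute_data.

```c
#include <assert.h>
#include <stddef.h>

#define CALL_MAX_FRAMES 8

typedef struct call_frame call_frame;
typedef struct call_op call_op;
typedef struct call_vm call_vm;
typedef int (*call_handler)(call_vm *vm);

struct call_op {
    call_handler handler;
    const call_op *callee; /* for the call instruction: the callee's op array */
};

struct call_frame {
    const call_op *opline; /* next instruction in this frame */
    call_frame *prev;      /* caller's frame, like prev_execute_data */
};

struct call_vm {
    call_frame *current;                /* frame being executed */
    call_frame frames[CALL_MAX_FRAMES]; /* fixed pool, enough for this sketch */
    int depth;                          /* frames in use */
    int echoed;                         /* counts ECHO-like instructions run */
};

/* ECHO stand-in: record that it ran, then advance. */
int call_echo(call_vm *vm)
{
    vm->echoed++;
    vm->current->opline++;
    return 0;
}

/* Call stand-in: remember the caller, then switch to a fresh frame. */
int call_do_fcall(call_vm *vm)
{
    call_frame *callee;

    assert(vm->depth < CALL_MAX_FRAMES);
    callee = &vm->frames[vm->depth++];
    callee->opline = vm->current->opline->callee;
    callee->prev = vm->current;
    vm->current->opline++; /* the caller resumes after the call */
    vm->current = callee;
    return 0;
}

/* RETURN stand-in: go back to the caller, or stop at the outermost frame. */
int call_return(call_vm *vm)
{
    call_frame *prev = vm->current->prev;

    vm->depth--;
    vm->current = prev;
    return prev == NULL ? -1 : 0;
}

/* One loop runs top-level code and function bodies alike. */
int call_execute(const call_op *main_ops)
{
    call_vm vm = {0};

    vm.current = &vm.frames[vm.depth++];
    vm.current->opline = main_ops;
    vm.current->prev = NULL;

    while (vm.current->opline->handler(&vm) >= 0) {
        /* handlers do all the work, including switching frames */
    }
    return vm.echoed;
}
```

If the top-level array is {call foo, echo, return} and foo is {echo, return}, call_execute reports 2 echoes: one inside foo and one after control returns to the caller. The loop itself has no notion of "function"; it only follows whichever frame is current.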
Logical jump
We know that instructions are executed sequentially, yet our programs usually contain many conditionals and loops. How is this implemented with OPCODEs?
<?php
$a = 10;
if ($a == 10) {
    echo 'success';
} else {
    echo 'failure';
}
We again look at the OPCODEs through VLD (it has to be said that the VLD extension is a wonderful tool for analyzing PHP).
compiled vars:  !0 = $a
line     #* E I O op                           fetch          ext  return  operands
-------------------------------------------------------------------------------------
   2     0  E >   ASSIGN                                                   !0, 10
   3     1        IS_EQUAL                                         ~       !0, 10
         2      > JMPZ                                                     ~, ->5
   4     3    >   ECHO                                                     'success'
         4      > JMP                                                      ->6
   6     5    >   ECHO                                                     'failure'
   7     6    > > RETURN                                                   1
We see that JMPZ and JMP control the execution flow. The logic of JMP is very simple: it points the current OPCODE pointer at the OPCODE to jump to.
ZEND_VM_HANDLER(ZEND_JMP, JMP_ADDR, ANY)
{
    USE_OPLINE

    ZEND_VM_SET_OPCODE(OP_JMP_ADDR(opline, opline->op1));
    ZEND_VM_CONTINUE();
}
JMPZ just adds one check and decides whether to jump depending on the result, so I will not list it here. Loops are handled in much the same way as conditionals.
<?php
$a = [1, 2, 3];
foreach ($a as $n) {
    echo $n;
}
compiled vars:  !0 = $a, !1 = $n
line     #* E I O op                           fetch          ext  return  operands
-------------------------------------------------------------------------------------
   2     0  E >   ASSIGN                                                   !0, <array>
   3     1      > FE_RESET_R                                       $       !0, ->5
         2    > > FE_FETCH_R                                               $, !1, ->5
   4     3    >   ECHO                                                     !1
         4      > JMP                                                      ->2
         5    >   FE_FREE                                                  $
   5     6      > RETURN                                                   1
A loop needs nothing more than a JMP instruction to go back to its start. The FE_FETCH_R instruction checks whether the end of the array has been reached and, if so, jumps out of the loop.
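A loop built from one conditional jump and one unconditional jump can be sketched in C like this. The opcodes (J_JMPZ, J_JMP and so on) and the jump_op layout are hypothetical and use a switch instead of handler pointers to keep the sketch short; jump targets are indexes into the instruction array.

```c
#include <assert.h>

/* Hypothetical opcodes for this sketch only. */
enum { J_DEC, J_ADD, J_JMPZ, J_JMP, J_RETURN };

typedef struct {
    int opcode;
    int target; /* jump destination (index into the op array), if any */
} jump_op;

/*
 * Runs ops with a counter n and returns the running sum.
 * J_JMPZ leaves the loop when n reaches zero; J_JMP goes back to the top.
 */
int jump_execute(const jump_op *ops, int n)
{
    int pc = 0;  /* index of the current instruction */
    int sum = 0;

    assert(n >= 0); /* a negative counter would never reach zero */

    for (;;) {
        const jump_op *op = &ops[pc];

        switch (op->opcode) {
        case J_JMPZ: /* conditional: jump only if the tested value is zero */
            pc = (n == 0) ? op->target : pc + 1;
            break;
        case J_ADD:
            sum += n;
            pc++;
            break;
        case J_DEC:
            n--;
            pc++;
            break;
        case J_JMP:  /* unconditional: just move the instruction pointer */
            pc = op->target;
            break;
        case J_RETURN:
        default:
            return sum;
        }
    }
}
```

For the program {JMPZ ->4, ADD, DEC, JMP ->0, RETURN} and n = 3, the result is 6 (3 + 2 + 1). The backward JMP plays the same role as JMP ->2 in the foreach output, and the JMPZ exit plays the role of FE_FETCH_R jumping to ->5.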
Conclusion
By understanding the Zend virtual machine, I believe you now have a deeper understanding of how PHP works. A single line of code that we write ends up, by the time the machine executes it, as a long series of instructions, each built on complex processing logic. For those who used to write code casually: will you now find yourself converting it into OPCODEs in your head and savoring it once more?