PHP Internals: The Zend Virtual Machine in Detail

Source: Internet
Author: User
PHP is an interpreted language. As with Java, Python, Ruby, JavaScript, and other such languages, the code we write is not compiled into machine code. Instead, it is compiled into intermediate code that runs on a virtual machine (VM). The virtual machine that runs PHP is called the Zend virtual machine. Today we will dig into the internals and explore how the Zend virtual machine works.

OPCODE

What is an OPCODE? It is an instruction that the virtual machine can recognize and process. The Zend virtual machine defines a whole series of OPCODEs, and through them it can do many things. A few examples:

    • ZEND_ADD: adds two operands.

    • ZEND_NEW: creates a PHP object.

    • ZEND_ECHO: writes content to standard output.

    • ZEND_EXIT: exits PHP.

PHP defines 186 such operations (and as PHP evolves it will certainly support more). The definitions and implementations of all OPCODEs can be found in the source file Zend/zend_vm_def.h (the contents of this file are not plain C code but a template, as explained below).

Let's look at how PHP designs the OPCODE data structure:

    struct _zend_op {
        const void *handler;
        znode_op op1;
        znode_op op2;
        znode_op result;
        uint32_t extended_value;
        uint32_t lineno;
        zend_uchar opcode;
        zend_uchar op1_type;
        zend_uchar op2_type;
        zend_uchar result_type;
    };

Look closely at this structure: doesn't it feel a bit like assembly language? Each OPCODE has two operands, op1 and op2. The handler pointer points to the function that performs the operation, and the result of that function is stored in result.
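To make the roles of handler, op1, op2 and result concrete, here is a minimal sketch in C. It is not the Zend implementation: toy_op, toy_add and toy_run_one are hypothetical names, and operands are reduced to indexes into an array of longs instead of znode_op and zval.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, heavily simplified stand-in for zend_op: the operands and
 * the result are plain slot indexes into an array of longs. */
typedef struct toy_op toy_op;
typedef void (*toy_handler)(const toy_op *op, long *slots);

struct toy_op {
    toy_handler handler; /* function that performs the operation */
    uint32_t op1;        /* slot index of the first operand */
    uint32_t op2;        /* slot index of the second operand */
    uint32_t result;     /* slot index that receives the result */
};

/* Handler for a toy ADD: slots[result] = slots[op1] + slots[op2]. */
void toy_add(const toy_op *op, long *slots)
{
    slots[op->result] = slots[op->op1] + slots[op->op2];
}

/* Runs one instruction through its handler pointer and returns the
 * value left in the result slot. */
long toy_run_one(const toy_op *op, long *slots)
{
    assert(op->handler != NULL);
    op->handler(op, slots);
    return slots[op->result];
}
```

With slots holding {1, 2, 0} and an instruction { toy_add, 0, 1, 2 }, calling toy_run_one stores the sum in slot 2. The caller never needs to know which operation it is running; it only follows the handler pointer.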

Let's give a simple example:

    <?php
    $b = 1;
    $a = $b + 2;

Using the VLD extension, we can see that compiling the code above produces a ZEND_ADD instruction.

    compiled vars:  !0 = $b, !1 = $a
    line     #* E I O op                           fetch          ext  return  operands
    -------------------------------------------------------------------------------------
       2     0  E >   ASSIGN                                                   !0, 1
       3     1        ADD                                              ~       !0, 2
             2        ASSIGN                                                   !1, ~
       8     3      > RETURN                                                   1

Instruction 1 (the second row) is the ZEND_ADD OPCODE. It takes two operands: op1 is the variable $b and op2 is the numeric constant 2. The result is stored in a temporary variable. In Zend/zend_vm_def.h we can find the implementation that corresponds to the ZEND_ADD instruction:

    ZEND_VM_HANDLER(1, ZEND_ADD, CONST|TMPVAR|CV, CONST|TMPVAR|CV)
    {
        USE_OPLINE
        zend_free_op free_op1, free_op2;
        zval *op1, *op2, *result;

        op1 = GET_OP1_ZVAL_PTR_UNDEF(BP_VAR_R);
        op2 = GET_OP2_ZVAL_PTR_UNDEF(BP_VAR_R);
        if (EXPECTED(Z_TYPE_INFO_P(op1) == IS_LONG)) {
            if (EXPECTED(Z_TYPE_INFO_P(op2) == IS_LONG)) {
                result = EX_VAR(opline->result.var);
                fast_long_add_function(result, op1, op2);
                ZEND_VM_NEXT_OPCODE();
            } else if (EXPECTED(Z_TYPE_INFO_P(op2) == IS_DOUBLE)) {
                result = EX_VAR(opline->result.var);
                ZVAL_DOUBLE(result, ((double)Z_LVAL_P(op1)) + Z_DVAL_P(op2));
                ZEND_VM_NEXT_OPCODE();
            }
        } else if (EXPECTED(Z_TYPE_INFO_P(op1) == IS_DOUBLE)) {
            ...
        }
        ...
    }

The code above is not plain C; it is a template.

Why is it done this way? PHP is a weakly typed language, while C, the language it is implemented in, is strongly typed. A weakly typed language converts between types automatically, and the implementation of that conversion, as in the code above, handles the different kinds of arguments by checking them one by one. Imagine if every OPCODE handler had to work out what kind of operands it was given each time it ran: performance would be a serious problem (a single request may execute thousands of OPCODEs).

Is there a way around this? At compile time we can already tell what kind each operand is (for example, a constant or a variable). So in the C code that PHP actually runs, the different operand kinds are split into separate functions that the virtual machine calls directly. This generated code lives in Zend/zend_vm_execute.h. The expanded file is very large, and in it we notice code like this:

    if (IS_CONST == IS_CV) {

It looks meaningless, doesn't it? The C compiler, however, optimizes such a comparison away automatically. In most cases, when we want to understand the logic of an OPCODE handler, it is easier to read the template file Zend/zend_vm_def.h. Incidentally, the program that generates the C code from the template is itself written in PHP.
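The idea of expanding one template into a function per operand kind can be sketched in plain C with a macro. This is only an illustration under simplifying assumptions: KIND_CONST, KIND_CV, toy_operand, DEFINE_ADD and select_add are hypothetical names, there are only two operand kinds, and the real generator is a PHP script rather than the C preprocessor.

```c
#include <assert.h>

/* Hypothetical operand kinds, loosely modelled on IS_CONST and IS_CV. */
enum { KIND_CONST = 0, KIND_CV = 1, KIND_COUNT = 2 };

typedef struct {
    int  kind;  /* KIND_CONST or KIND_CV */
    long value; /* the literal for KIND_CONST, a variable slot for KIND_CV */
} toy_operand;

/* Inside a specialized handler `kind` is a compile-time constant, so the
 * C compiler can drop the branch that can never be taken. */
#define FETCH(kind, operand, vars) \
    ((kind) == KIND_CONST ? (operand).value : (vars)[(operand).value])

/* Template: expands into one ADD handler per operand-kind combination. */
#define DEFINE_ADD(name, K1, K2)                              \
    long name(toy_operand a, toy_operand b, const long *vars) \
    {                                                         \
        return FETCH(K1, a, vars) + FETCH(K2, b, vars);       \
    }

DEFINE_ADD(add_const_const, KIND_CONST, KIND_CONST)
DEFINE_ADD(add_const_cv,    KIND_CONST, KIND_CV)
DEFINE_ADD(add_cv_const,    KIND_CV,    KIND_CONST)
DEFINE_ADD(add_cv_cv,       KIND_CV,    KIND_CV)

typedef long (*add_handler)(toy_operand, toy_operand, const long *);

/* Chosen once, when the instruction is compiled, not on every execution. */
add_handler select_add(int kind1, int kind2)
{
    static const add_handler table[KIND_COUNT][KIND_COUNT] = {
        { add_const_const, add_const_cv },
        { add_cv_const,    add_cv_cv    },
    };
    assert(kind1 >= 0 && kind1 < KIND_COUNT);
    assert(kind2 >= 0 && kind2 < KIND_COUNT);
    return table[kind1][kind2];
}
```

After expansion, each handler contains comparisons between two constants, the same shape as the odd-looking condition quoted above, and the compiler removes them. The cost of deciding the operand kind is paid once in select_add instead of on every execution.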

Execution process

Strictly speaking, running PHP code is divided into two phases: compilation and execution. I will not go into compilation in detail here, but focus on execution.

After a series of compilation steps such as lexical and syntax analysis, we get a data structure called the op array (zend_op_array), defined as follows:

    struct _zend_op_array {
        /* Common elements */
        zend_uchar type;
        zend_uchar arg_flags[3]; /* bitset of arg_info.pass_by_reference */
        uint32_t fn_flags;
        zend_string *function_name;
        zend_class_entry *scope;
        zend_function *prototype;
        uint32_t num_args;
        uint32_t required_num_args;
        zend_arg_info *arg_info;
        /* END of common elements */

        uint32_t *refcount;

        uint32_t last;
        zend_op *opcodes;

        int last_var;
        uint32_t T;
        zend_string **vars;

        int last_live_range;
        int last_try_catch;
        zend_live_range *live_range;
        zend_try_catch_element *try_catch_array;

        /* static variables support */
        HashTable *static_variables;

        zend_string *filename;
        uint32_t line_start;
        uint32_t line_end;
        zend_string *doc_comment;
        uint32_t early_binding; /* the linked list of delayed declarations */

        int last_literal;
        zval *literals;

        int cache_size;
        void **run_time_cache;

        void *reserved[ZEND_MAX_RESERVED_RESOURCES];
    };

That's a lot, isn't it? Put simply, it is essentially an array of OPCODEs plus the environment data needed to execute them. A few of the more important fields:

    • opcodes: the array that holds the OPCODEs.

    • filename: the file name of the currently executing script.

    • function_name: the name of the currently executing function.

    • static_variables: the list of static variables.

    • last_try_catch, try_catch_array: the information needed for try-catch-finally jumps if an exception occurs in the current context.

    • literals: the collection of all constant literals, such as the string foo or the number 23.

Why generate so much data? Because the more information is produced at compile time, the less time is needed at execution time.

Next, let's look at how PHP executes OPCODEs. Execution happens in a loop, inside the function execute_ex in Zend/zend_vm_execute.h:

    ZEND_API void execute_ex(zend_execute_data *ex)
    {
        DCL_OPLINE

        zend_execute_data *execute_data = ex;

        LOAD_OPLINE();
        ZEND_VM_LOOP_INTERRUPT_CHECK();

        while (1) {
            if (UNEXPECTED((ret = ((opcode_handler_t)OPLINE->handler)(ZEND_OPCODE_HANDLER_ARGS_PASSTHRU)) != 0)) {
                if (EXPECTED(ret > 0)) {
                    execute_data = EG(current_execute_data);
                    ZEND_VM_LOOP_INTERRUPT_CHECK();
                } else {
                    return;
                }
            }
        }
        zend_error_noreturn(E_CORE_ERROR, "Arrived at end of main loop which shouldn't happen");
    }

Here I removed some branches that depend on the build environment and kept only the main flow. As you can see, in an infinite loop the virtual machine keeps calling the handler function specified by each OPCODE, until a handler returns a value ret that is less than 0. Note that the main loop does not advance the current position in the OPCODE array itself; that is done at the end of each handler function. This is why we see a call to this macro at the end of most OPCODE handlers:

    ZEND_VM_NEXT_OPCODE_CHECK_EXCEPTION();

In the simple example earlier, the OPCODE array printed by VLD ends with a ZEND_RETURN instruction, yet there is no such statement in the PHP code we wrote. At compile time, the virtual machine automatically appends this instruction to the end of the OPCODE array. The handler for ZEND_RETURN returns -1; when the main loop sees a result less than 0, it exits, which ends the program.
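The shape of this loop can be reproduced in a few lines of C. The sketch below is a toy model, not the Zend code: toy_vm, toy_op, toy_add, toy_return and toy_execute are hypothetical names, and values are reduced to a single accumulator. What it keeps from the description above is that each handler advances the instruction pointer itself and that a negative return value stops the loop.

```c
#include <assert.h>
#include <stddef.h>

typedef struct toy_vm toy_vm;
typedef int (*toy_handler)(toy_vm *vm);

typedef struct {
    toy_handler handler; /* function that executes this instruction */
    long operand;        /* single immediate operand */
} toy_op;

struct toy_vm {
    const toy_op *opline; /* current instruction */
    long acc;             /* one accumulator instead of zvals */
};

/* Adds the operand to the accumulator, then advances opline itself. */
int toy_add(toy_vm *vm)
{
    vm->acc += vm->opline->operand;
    vm->opline++;
    return 0; /* 0: keep going with the next instruction */
}

/* Plays the role of ZEND_RETURN: a negative result stops the loop. */
int toy_return(toy_vm *vm)
{
    (void)vm;
    return -1;
}

/* The main loop never moves opline; it only calls handlers. */
long toy_execute(const toy_op *ops)
{
    toy_vm vm = { ops, 0 };
    assert(ops != NULL);
    while (1) {
        int ret = vm.opline->handler(&vm);
        if (ret < 0) {
            return vm.acc;
        }
    }
}
```

A program made of two toy_add instructions followed by toy_return runs until the last handler returns -1. If the final toy_return were missing, the loop would run past the end of the array, which is one way to see why the compiler always appends a RETURN.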

Function calls

If we call a user-defined function, how does the virtual machine handle it?

    <?php
    function foo() {
        echo 'Test';
    }

    foo();

We view the generated OPCODEs through VLD. Because we defined a PHP function, there are two OPCODE arrays: one for the top-level script and one for foo. In the first, calling the custom function takes two OPCODEs: INIT_FCALL and DO_FCALL.

    compiled vars:  none
    line     #* E I O op                           fetch          ext  return  operands
    -------------------------------------------------------------------------------------
       2     0  E >   NOP
       6     1        INIT_FCALL                                               'foo'
             2        DO_FCALL                                      0
             3      > RETURN                                                   1

    compiled vars:  none
    line     #* E I O op                           fetch          ext  return  operands
    -------------------------------------------------------------------------------------
       3     0  E >   ECHO                                                     'Test'
       4     1      > RETURN                                                   null

INIT_FCALL prepares the context data needed to execute the function, and DO_FCALL is responsible for executing it. The DO_FCALL handler contains a lot of logic for the different kinds of calls; I have extracted the part that executes a user-defined function:

    ZEND_VM_HANDLER(ZEND_DO_FCALL, ANY, ANY, SPEC(RETVAL))
    {
        USE_OPLINE
        zend_execute_data *call = EX(call);
        zend_function *fbc = call->func;
        zend_object *object;
        zval *ret;

        ...

        if (EXPECTED(fbc->type == ZEND_USER_FUNCTION)) {
            ret = NULL;
            if (RETURN_VALUE_USED(opline)) {
                ret = EX_VAR(opline->result.var);
                ZVAL_NULL(ret);
            }

            call->prev_execute_data = execute_data;
            i_init_func_execute_data(call, &fbc->op_array, ret);

            if (EXPECTED(zend_execute_ex == execute_ex)) {
                ZEND_VM_ENTER();
            } else {
                ZEND_ADD_CALL_FLAG(call, ZEND_CALL_TOP);
                zend_execute_ex(call);
            }
        }

        ...

        ZEND_VM_SET_OPCODE(opline + 1);
        ZEND_VM_CONTINUE();
    }

As you can see, DO_FCALL first saves the context data of the caller into call->prev_execute_data. It then calls i_init_func_execute_data, which assigns the custom function's op_array to the new execution context object (each user-defined function gets its own op_array at compile time, and that structure contains the function's OPCODE array).

Then execution of the custom function begins through zend_execute_ex. zend_execute_ex is in fact the execute_ex function mentioned earlier, only with the context data replaced by that of the function being called. This is the default; an extension may override the zend_execute_ex pointer, an API that lets PHP extension developers extend behavior by replacing the function, which is outside the scope of this article. When the default is in place, the handler does not make a nested C call: ZEND_VM_ENTER() returns to the main loop, which picks up the new context and carries on.

We can think of the outermost code as a default function (similar to main() in C); in essence it is no different from a user-defined function.
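The following C sketch shows how a call can be handled by switching execution contexts inside one loop instead of recursing. It is a toy model under stated assumptions: vm, frame, op, op_call, op_echo, op_return and run are hypothetical names, frames live in a small fixed array, and the caller's instruction pointer is advanced before entering the callee, which simplifies what the real engine does. The prev field plays the role of prev_execute_data.

```c
#include <assert.h>
#include <stddef.h>

enum { MAX_FRAMES = 8 };

typedef struct vm vm;
typedef int (*handler_fn)(vm *m);

typedef struct op {
    handler_fn handler;
    const struct op *target; /* CALL only: first instruction of the callee */
} op;

typedef struct {
    const op *opline; /* where this frame is executing */
    int prev;         /* index of the caller's frame, -1 for top-level code */
} frame;

struct vm {
    frame frames[MAX_FRAMES];
    int current; /* index of the running frame */
    int echoed;  /* number of ECHO instructions executed */
};

int op_echo(vm *m)
{
    m->echoed++;
    m->frames[m->current].opline++;
    return 0;
}

/* Sets up a new frame that remembers its caller, then switches to it. */
int op_call(vm *m)
{
    frame *caller = &m->frames[m->current];
    int next = m->current + 1;
    assert(next < MAX_FRAMES);
    m->frames[next].opline = caller->opline->target;
    m->frames[next].prev = m->current;
    caller->opline++; /* the caller resumes after the call */
    m->current = next;
    return 1; /* > 0: the execution context changed */
}

/* Switches back to the caller, or stops the loop for top-level code. */
int op_return(vm *m)
{
    int prev = m->frames[m->current].prev;
    if (prev < 0) {
        return -1;
    }
    m->current = prev;
    return 1;
}

/* One loop runs both the top-level code and every function it calls. */
int run(const op *main_code)
{
    vm m;
    m.current = 0;
    m.echoed = 0;
    m.frames[0].opline = main_code;
    m.frames[0].prev = -1;
    while (1) {
        const op *opline = m.frames[m.current].opline;
        if (opline->handler(&m) < 0) {
            return m.echoed;
        }
    }
}
```

In this model the top-level code is just the frame whose prev is -1, which matches the remark that the outermost code behaves like any other function. The same op_return handler serves both cases and only the saved caller decides whether the loop continues.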

Logical Jump

We know that instructions are executed sequentially, yet our programs usually contain many conditionals and loops. How are these implemented with OPCODEs?

    <?php
    $a = 10;
    if ($a == ...) {
        echo 'success';
    } else {
        echo 'failure';
    }

Again we look at the OPCODEs through VLD (it must be said that the VLD extension is a superb tool for analyzing PHP).

    compiled vars:  !0 = $a
    line     #* E I O op                           fetch          ext  return  operands
    -------------------------------------------------------------------------------------
       2     0  E >   ASSIGN                                                   !0, 10
       3     1        IS_EQUAL                                                 !0, ...
             2      > JMPZ                                                     ..., ->5
       4     3    >   ECHO                                                     'success'
             4      > JMP                                                      ->6
       6     5    >   ECHO                                                     'failure'
       7     6    > > RETURN                                                   1

As we can see, JMPZ and JMP control the flow of execution. The logic of JMP is very simple: it points the current OPCODE pointer at the OPCODE to jump to.

    ZEND_VM_HANDLER(ZEND_JMP, JMP_ADDR, ANY)
    {
        USE_OPLINE

        ZEND_VM_SET_OPCODE(OP_JMP_ADDR(opline, opline->op1));
        ZEND_VM_CONTINUE();
    }
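How an unconditional and a conditional jump combine into an if/else can be sketched with a tiny interpreter. Note the swapped-in technique: for brevity this sketch dispatches with a switch statement rather than handler pointers, and the opcode names, the op struct, run and the compared constant 10 are all assumptions made for the illustration.

```c
#include <assert.h>
#include <stddef.h>

enum opcode { OP_IS_EQUAL, OP_JMPZ, OP_JMP, OP_ECHO, OP_RETURN };

typedef struct {
    enum opcode code;
    long arg; /* constant to compare with, jump target, or value to "echo" */
} op;

/* Returns the value passed to the ECHO that ran, or -1 if none ran. */
long run(const op *ops, long a)
{
    int pc = 0;       /* index of the current instruction */
    long tmp = 0;     /* temporary written by IS_EQUAL */
    long echoed = -1;

    assert(ops != NULL);
    while (1) {
        const op *o = &ops[pc];
        switch (o->code) {
        case OP_IS_EQUAL: /* compare and fall through to the next op */
            tmp = (a == o->arg);
            pc++;
            break;
        case OP_JMPZ: /* jump only when the temporary is zero */
            pc = (tmp == 0) ? (int)o->arg : pc + 1;
            break;
        case OP_JMP: /* always jump */
            pc = (int)o->arg;
            break;
        case OP_ECHO:
            echoed = o->arg;
            pc++;
            break;
        case OP_RETURN:
            return echoed;
        }
    }
}
```

Laid out as IS_EQUAL, JMPZ to the else branch, ECHO for success, JMP past the else branch, ECHO for failure, RETURN, the program mirrors the structure of the VLD listing: a jump is nothing more than an assignment to the instruction index.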

JMPZ just adds one more check and decides whether to jump based on the result, so its code is not listed again here. Loops are handled in much the same way as conditionals.

    <?php
    $a = [1, 2, 3];
    foreach ($a as $n) {
        echo $n;
    }

    compiled vars:  !0 = $a, !1 = $n
    line     #* E I O op                           fetch          ext  return  operands
    -------------------------------------------------------------------------------------
       2     0  E >   ASSIGN                                                   !0, <array>
       3     1      > FE_RESET_R                                               !0, ->5
             2    > > FE_FETCH_R                                               $, !1, ->5
       4     3    >   ECHO                                                     !1
             4      > JMP                                                      ->2
             5    >   FE_FREE                                                  $
       5     6      > RETURN                                                   1

The loop needs only a JMP instruction to go back to the top. FE_FETCH_R checks whether the end of the array has been reached and, if it has, jumps out of the loop.
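The control flow of this listing can be traced with a small C function. It is a sketch, not engine code: run_foreach is a hypothetical name, the instruction numbers in the case labels follow the listing above, ECHO is replaced by adding to a sum so the result can be checked, and the iterator is a plain array index.

```c
#include <assert.h>
#include <stddef.h>

/* Walks the array the way the opcode sequence does and returns the sum
 * of the values that would have been echoed. */
long run_foreach(const long *items, size_t count)
{
    size_t pos = 0; /* iterator state, as set up by FE_RESET_R */
    long n = 0;     /* loop variable $n */
    long sum = 0;   /* stands in for the output of ECHO */
    int pc = 2;     /* start at FE_FETCH_R */

    assert(items != NULL || count == 0);
    while (1) {
        switch (pc) {
        case 2: /* FE_FETCH_R: fetch the next element or leave the loop */
            if (pos == count) {
                pc = 5;
                break;
            }
            n = items[pos++];
            pc = 3;
            break;
        case 3: /* ECHO */
            sum += n;
            pc = 4;
            break;
        case 4: /* JMP ->2 */
            pc = 2;
            break;
        default: /* 5 and 6: FE_FREE, then RETURN */
            return sum;
        }
    }
}
```

The only backward edge is the jump at instruction 4, and the only exit is taken inside the fetch step, which is exactly the division of labor between JMP and FE_FETCH_R described above.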

Conclusion

Having looked at the Zend virtual machine, I believe you now have a deeper understanding of how PHP works. Each line of code we write ends up, by the time the machine executes it, as a long series of instructions, each of which is built on complex processing logic. If you used to write code casually, will you now find yourself translating it into OPCODEs in your head and thinking it over once more?
