Analyze the ZendVM engine from the PHP syntax sugar

Source: Internet
Author: User

1.

Let's start with a piece of syntactic sugar available since PHP 5.3. We usually write a default value like this:

    <?php
    $a = 0;
    $b = $a ? $a : 1;

With the syntactic sugar (the short ternary operator ?:), it can be written as:

    <?php
    $a = 0;
    $b = $a ?: 1;

In both cases the result is $b = 1. The second form is more concise, but it is usually not recommended to use too much syntactic sugar, especially sugar that is easy to confuse. For example, PHP 7 adds the null coalescing operator ??, used as follows:

    <?php
    $b = $a ?? 1;

This is equivalent to:

    <?php
    $b = isset($a) ? $a : 1;

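Do you find ?: and ?? easy to confuse? If so, I suggest you don't use them at all: readable, maintainable code is more important. The difference is that ?: falls back whenever the left operand is falsy (0, '', '0', null, ...), while ?? falls back only when the left operand is unset or null, and does not emit an undefined-variable notice.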
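To make the difference concrete, here is a small C model of the two checks. It is only a sketch: the value type below is a hypothetical stand-in for PHP's zval with just four kinds (undefined, null, int, string), and the truthiness rules cover only those kinds.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical, heavily simplified stand-in for a PHP value (not the real zval). */
typedef enum { V_UNDEF, V_NULL, V_INT, V_STRING } vtype;

typedef struct {
    vtype type;
    long ival;        /* used when type == V_INT */
    const char *sval; /* used when type == V_STRING */
} value;

/* Truthiness used by ?: (simplified: 0, "", "0", null and undefined are false). */
static bool is_truthy(value v) {
    switch (v.type) {
    case V_INT:
        return v.ival != 0;
    case V_STRING:
        return v.sval[0] != '\0' && strcmp(v.sval, "0") != 0;
    default: /* V_UNDEF, V_NULL */
        return false;
    }
}

/* isset() semantics used by ??: defined and not null. */
static bool is_set(value v) {
    return v.type != V_UNDEF && v.type != V_NULL;
}

/* $a ?: $fallback: falls back when $a is falsy
   (real PHP would also emit a notice if $a were undefined). */
static value elvis(value a, value fallback) {
    return is_truthy(a) ? a : fallback;
}

/* $a ?? $fallback: falls back only when $a is undefined or null. */
static value coalesce(value a, value fallback) {
    return is_set(a) ? a : fallback;
}
```

With $a = 0, elvis() returns the fallback 1 while coalesce() keeps the 0, which is exactly the case where swapping one operator for the other silently changes behavior.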

Syntactic sugar is not the focus of this article. Our goal is to start from syntactic sugar and talk about how Zend VM parses and executes code.

2.

The analysis is based on the PHP source branch remotes/origin/PHP-5.6.14. For how to view opcodes with vld, please refer to an article I wrote earlier:
http://www.yinqisen.cn/blog-680.html

Take the ?: example again:

    <?php
    $a = 0;
    $b = $a ?: 1;

The corresponding opcodes are as follows:

    number of ops:  5
    compiled vars:  !0 = $a, !1 = $b
    line     #* E I O op                           fetch          ext  return  operands
    -------------------------------------------------------------------------------------
       2     0  E >   ASSIGN                                                   !0, 0
       3     1        JMP_SET_VAR                                      $1      !0
             2        QM_ASSIGN_VAR                                    $1      1
             3        ASSIGN                                                   !1, $1
       4     4      > RETURN                                                   1

    branch: #  0; line:     2-    4; sop:     0; eop:     4; out1:  -2
    path #1: 0,

The grammar rule for ?: is at vim Zend/zend_language_parser.y +834:

    |   expr '?' ':' { zend_do_jmp_set(&$1, &$2, &$3 TSRMLS_CC); }
            expr     { zend_do_jmp_set_else(&$$, &$5, &$2, &$3 TSRMLS_CC); }

If you like, you can try redefining the ?: syntactic sugar yourself. The grammar follows BNF rules and is parsed with bison; if you are interested, you can dig into this further on your own.

The vld output above reflects the work of zend_do_jmp_set_else, which finishes compiling the ?: expression. Its code is in Zend/zend_compile.c:

    void zend_do_jmp_set_else(znode *result, const znode *false_value, const znode *jmp_token, const znode *colon_token TSRMLS_DC)
    {
        zend_op *opline = get_next_op(CG(active_op_array) TSRMLS_CC);

        SET_NODE(opline->result, colon_token);
        if (colon_token->op_type == IS_TMP_VAR) {
            if (false_value->op_type == IS_VAR || false_value->op_type == IS_CV) {
                CG(active_op_array)->opcodes[jmp_token->u.op.opline_num].opcode = ZEND_JMP_SET_VAR;
                CG(active_op_array)->opcodes[jmp_token->u.op.opline_num].result_type = IS_VAR;
                opline->opcode = ZEND_QM_ASSIGN_VAR;
                opline->result_type = IS_VAR;
            } else {
                opline->opcode = ZEND_QM_ASSIGN;
            }
        } else {
            opline->opcode = ZEND_QM_ASSIGN_VAR;
        }
        opline->extended_value = 0;
        SET_NODE(opline->op1, false_value);
        SET_UNUSED(opline->op2);

        GET_NODE(result, opline->result);

        CG(active_op_array)->opcodes[jmp_token->u.op.opline_num].op2.opline_num = get_next_op_number(CG(active_op_array));

        DEC_BPC(CG(active_op_array));
    }
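The last statement of zend_do_jmp_set_else is a backpatch: the jump op was emitted by zend_do_jmp_set before its target existed, so its op2.opline_num is filled in only after the fallback op has been emitted. The pattern can be sketched as follows; the op array, the opcode names and the compile_short_ternary function are hypothetical simplifications, not Zend code.

```c
/* Hypothetical opcode kinds, named after the Zend ones for readability. */
typedef enum { OP_JMP_SET, OP_QM_ASSIGN, OP_ASSIGN } opkind;

typedef struct {
    opkind kind;
    int jump_target; /* plays the role of op2.opline_num; -1 until backpatched */
} op;

typedef struct {
    op ops[16];
    int count;
} op_array;

/* Like get_next_op(): append an op and return its index (no overflow check, sketch only). */
static int emit(op_array *a, opkind kind) {
    a->ops[a->count].kind = kind;
    a->ops[a->count].jump_target = -1;
    return a->count++;
}

/* Compile "$b = $a ?: 1" in the order used by zend_do_jmp_set()/zend_do_jmp_set_else().
   Simplification: the final assignment is emitted here too. */
static void compile_short_ternary(op_array *a) {
    int jmp = emit(a, OP_JMP_SET); /* target unknown: the fallback is not compiled yet */
    emit(a, OP_QM_ASSIGN);         /* fallback value goes into the same result slot */
    /* Backpatch, like opcodes[...].op2.opline_num = get_next_op_number(...) */
    a->ops[jmp].jump_target = a->count;
    emit(a, OP_ASSIGN);            /* assign the shared result to $b */
}
```

Here the jump target comes out as 2, the ASSIGN right after QM_ASSIGN; in the vld listing, which starts with the extra ASSIGN of $a, the corresponding index is 3.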

3.

There are two key opcodes here: ZEND_JMP_SET_VAR and ZEND_QM_ASSIGN_VAR. How do we continue reading the code from here? First, some background on PHP opcodes.

PHP 5.6 has 167 opcodes, which means the VM can perform 167 different operations. The official documentation is here: http://php.net/manual/en/internals2.opcodes.list.php

Internally, PHP represents an opcode with the struct _zend_op (vim Zend/zend_compile.h +111):

    struct _zend_op {
        opcode_handler_t handler;
        znode_op op1;
        znode_op op2;
        znode_op result;
        ulong extended_value;
        uint lineno;
        zend_uchar opcode;
        zend_uchar op1_type;
        zend_uchar op2_type;
        zend_uchar result_type;
    };

PHP 7.0 is slightly different. The main change is that platform-dependent types such as uint are replaced with uint32_t, so the field sizes are stated explicitly in bytes, also on 64-bit systems.

You can think of an opcode as a calculator: it accepts only two operands (op1, op2), performs one operation (the handler, such as addition, subtraction, multiplication or division), and returns a result to you, sometimes with a little extra information, for example for handling arithmetic overflow (extended_value).

The Zend VM works the same way for every opcode. Each opcode has a handler, a function pointer to the address of a C function that contains the code for executing that opcode. The function works on op1 and op2, produces a result when it finishes, and sometimes attaches an extra piece of information (extended_value).
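The calculator analogy maps directly onto a struct that carries a function pointer. The sketch below assumes plain long operands instead of znode_op and zval, and a hypothetical handler signature; it only shows the idea of an op that knows which C function executes it.

```c
/* Hypothetical, simplified opcode: real Zend ops reference zvals via znode_op. */
typedef struct mini_op mini_op;
typedef long (*mini_handler_t)(const mini_op *op);

struct mini_op {
    mini_handler_t handler; /* like _zend_op.handler: the function that does the work */
    long op1;               /* first operand */
    long op2;               /* second operand */
    long result;            /* filled in after the handler runs */
};

static long add_handler(const mini_op *op) { return op->op1 + op->op2; }
static long sub_handler(const mini_op *op) { return op->op1 - op->op2; }

/* Execute one op: call its handler with its operands and store the result. */
static void run_op(mini_op *op) {
    op->result = op->handler(op);
}
```

Swapping add_handler for sub_handler changes what the op does without touching the operands, which is the role the handler pointer plays in _zend_op.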

Let's use the opcode ZEND_JMP_SET_VAR from our example to illustrate (vim Zend/zend_vm_def.h +4942):

    ZEND_VM_HANDLER(158, ZEND_JMP_SET_VAR, CONST|TMP|VAR|CV, ANY)
    {
        USE_OPLINE
        zend_free_op free_op1;
        zval *value, *ret;

        SAVE_OPLINE();
        value = GET_OP1_ZVAL_PTR(BP_VAR_R);

        if (i_zend_is_true(value)) {
            if (OP1_TYPE == IS_VAR || OP1_TYPE == IS_CV) {
                Z_ADDREF_P(value);
                EX_T(opline->result.var).var.ptr = value;
                EX_T(opline->result.var).var.ptr_ptr = &EX_T(opline->result.var).var.ptr;
            } else {
                ALLOC_ZVAL(ret);
                INIT_PZVAL_COPY(ret, value);
                EX_T(opline->result.var).var.ptr = ret;
                EX_T(opline->result.var).var.ptr_ptr = &EX_T(opline->result.var).var.ptr;
                if (!IS_OP1_TMP_FREE()) {
                    zval_copy_ctor(EX_T(opline->result.var).var.ptr);
                }
            }
            FREE_OP1_IF_VAR();
    #if DEBUG_ZEND>=2
            printf("Conditional jmp to %d\n", opline->op2.opline_num);
    #endif
            ZEND_VM_JMP(opline->op2.jmp_addr);
        }

        FREE_OP1();
        CHECK_EXCEPTION();
        ZEND_VM_NEXT_OPCODE();
    }

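i_zend_is_true determines whether the operand is true. So ZEND_JMP_SET_VAR is a conditional assignment: if op1 is true, its value is stored in the result and execution jumps to op2.jmp_addr, skipping the QM_ASSIGN_VAR that follows (this is the op2.opline_num that zend_do_jmp_set_else filled in); otherwise execution continues with the next opcode, which assigns the fallback value. This should be easy to follow; now for the key points.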
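To watch the conditional jump at work, here is a tiny interpreter that runs the same five-op sequence vld printed for $a = 0; $b = $a ?: 1;. It is a hypothetical mini-VM, assuming integer-only variables and a single temporary slot; the op names are borrowed from Zend for readability, and a non-zero test stands in for i_zend_is_true().

```c
/* Hypothetical mini-VM: compiled variables and the one temporary are plain longs. */
typedef enum { ASSIGN_CONST, JMP_SET, QM_ASSIGN, ASSIGN_TMP, RETURN_OP } kind_t;

typedef struct {
    kind_t kind;
    int var;    /* CV index used by ASSIGN_CONST, JMP_SET and ASSIGN_TMP */
    long value; /* constant operand used by ASSIGN_CONST and QM_ASSIGN */
    int target; /* jump target used by JMP_SET */
} vm_op;

/* $a = 0; $b = $a ?: 1;  laid out like the vld listing ($a is cv[0], $b is cv[1]). */
static vm_op example[] = {
    { ASSIGN_CONST, 0, 0, 0 }, /* ASSIGN        !0, 0                       */
    { JMP_SET,      0, 0, 3 }, /* JMP_SET_VAR   $1 = !0, jump to 3 if true  */
    { QM_ASSIGN,    0, 1, 0 }, /* QM_ASSIGN_VAR $1 = 1                      */
    { ASSIGN_TMP,   1, 0, 0 }, /* ASSIGN        !1, $1                      */
    { RETURN_OP,    0, 0, 0 }, /* RETURN                                    */
};

/* Runs ops until RETURN_OP. */
static void run(const vm_op *ops, long *cv) {
    long tmp = 0; /* the shared result slot $1 */
    int pc = 0;
    for (;;) {
        const vm_op *op = &ops[pc];
        switch (op->kind) {
        case ASSIGN_CONST:
            cv[op->var] = op->value;
            pc++;
            break;
        case JMP_SET:
            if (cv[op->var] != 0) { /* stands in for i_zend_is_true() */
                tmp = cv[op->var];  /* copy op1 into the result */
                pc = op->target;    /* skip the fallback, like ZEND_VM_JMP */
            } else {
                pc++;               /* fall through, like ZEND_VM_NEXT_OPCODE */
            }
            break;
        case QM_ASSIGN:
            tmp = op->value;
            pc++;
            break;
        case ASSIGN_TMP:
            cv[op->var] = tmp;
            pc++;
            break;
        case RETURN_OP:
            return;
        }
    }
}
```

Setting example[0].value to 5 before calling run() makes the JMP_SET step jump over QM_ASSIGN, so $b ends up as 5 instead of 1.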

Note that zend_vm_def.h is not a C header file that can be compiled directly; it can only be called a template. The file that is actually compiled is zend_vm_execute.h (more than 45,000 lines), which is not written by hand but generated from zend_vm_def.h by the script zend_vm_gen.php. (Interestingly, this is a chicken-and-egg question: generating it requires PHP, so where does that PHP come from?) I guess the generator is a later addition and was not used in early PHP versions.

From the ZEND_JMP_SET_VAR template above, the generator produces a separate handler function for each operand type (CONST|TMP|VAR|CV), all with the same functionality:

    static int ZEND_FASTCALL  ZEND_JMP_SET_VAR_SPEC_CONST_HANDLER(ZEND_OPCODE_HANDLER_ARGS)
    static int ZEND_FASTCALL  ZEND_JMP_SET_VAR_SPEC_TMP_HANDLER(ZEND_OPCODE_HANDLER_ARGS)
    static int ZEND_FASTCALL  ZEND_JMP_SET_VAR_SPEC_VAR_HANDLER(ZEND_OPCODE_HANDLER_ARGS)
    static int ZEND_FASTCALL  ZEND_JMP_SET_VAR_SPEC_CV_HANDLER(ZEND_OPCODE_HANDLER_ARGS)

The purpose is to fix the handler at compile time and improve runtime performance. Otherwise the operand type would have to be checked at run time, which performs worse. Admittedly, this sometimes generates some seemingly useless code, but don't worry: the C compiler optimizes it further.
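The trade-off between a generic handler and specialized ones can be sketched in C as well. Below, one template macro is expanded once per operand kind, loosely imitating how zend_vm_gen.php expands zend_vm_def.h into the _SPEC_CONST/_SPEC_CV handlers, and the specialized handler is picked once from a table instead of testing the operand kind every time the op runs. The operand kinds, the fetch_* functions and the table are hypothetical.

```c
/* Hypothetical operand kinds, loosely mirroring CONST and CV. */
typedef enum { K_CONST = 0, K_CV = 1 } operand_kind;

typedef struct spec_op spec_op;
typedef long (*spec_handler_t)(const spec_op *op, const long *cv);

struct spec_op {
    operand_kind op1_kind;
    long op1;               /* the constant itself, or a CV index */
    spec_handler_t handler; /* chosen once, at "compile" time */
};

/* Generic handler: re-checks the operand kind on every execution. */
static long fetch_generic(const spec_op *op, const long *cv) {
    return op->op1_kind == K_CONST ? op->op1 : cv[op->op1];
}

/* One template, expanded once per operand kind (a preprocessor analogue
   of zend_vm_gen.php generating the *_SPEC_* handlers). */
#define DEFINE_FETCH(SPEC, EXPR)                                   \
    static long fetch_##SPEC(const spec_op *op, const long *cv) { \
        (void)cv;                                                  \
        return (EXPR);                                             \
    }

DEFINE_FETCH(CONST, op->op1)  /* no kind check at run time */
DEFINE_FETCH(CV, cv[op->op1]) /* no kind check at run time */

/* Lookup table indexed by operand kind. */
static const spec_handler_t fetch_specs[] = { fetch_CONST, fetch_CV };

/* "Compile" step: pick the specialized handler once. */
static void set_handler(spec_op *op) {
    op->handler = fetch_specs[op->op1_kind];
}
```

Both paths return the same values; the specialized one simply pays for the kind check once in set_handler() rather than on every execution.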

zend_vm_gen.php also accepts some parameters; they are described in detail in Zend/README.ZEND_VM in the PHP source code.

4.

At this point we know how an opcode maps to its handler. But there is one more process in the big picture, syntax parsing: how are the opcodes produced by parsing chained together and executed?

After parsing, all the opcodes are stored in one large array (the opcodes field of the op array). From the handler code above we can see that each handler calls ZEND_VM_NEXT_OPCODE() when it finishes, which moves on to the next opcode, and execution continues until it finally exits. The loop is at vim Zend/zend_vm_execute.h +337:

    ZEND_API void execute_ex(zend_execute_data *execute_data TSRMLS_DC)
    {
        DCL_OPLINE
        zend_bool original_in_execution;

        original_in_execution = EG(in_execution);
        EG(in_execution) = 1;

        if (0) {
    zend_vm_enter:
            execute_data = i_create_execute_data_from_op_array(EG(active_op_array), 1 TSRMLS_CC);
        }

        LOAD_REGS();
        LOAD_OPLINE();

        while (1) {
            int ret;
    #ifdef ZEND_WIN32
            if (EG(timed_out)) {
                zend_timeout(0);
            }
    #endif

            if ((ret = OPLINE->handler(execute_data TSRMLS_CC)) > 0) {
                switch (ret) {
                    case 1:
                        EG(in_execution) = original_in_execution;
                        return;
                    case 2:
                        goto zend_vm_enter;
                        break;
                    case 3:
                        execute_data = EG(current_execute_data);
                        break;
                    default:
                        break;
                }
            }
        }
        zend_error_noreturn(E_ERROR, "Arrived at end of main loop which shouldn't happen");
    }

The macro definition (vim Zend/zend_execute.c +1772):

    #define ZEND_VM_NEXT_OPCODE() \
        CHECK_SYMBOL_TABLES() \
        ZEND_VM_INC_OPCODE(); \
        ZEND_VM_CONTINUE()

The return-code macros it relies on are defined in Zend/zend_vm_execute.h, around line 329, just above execute_ex:

    #define ZEND_VM_CONTINUE()         return 0
    #define ZEND_VM_RETURN()           return 1
    #define ZEND_VM_ENTER()            return 2
    #define ZEND_VM_LEAVE()            return 3

The while loop is endless and executes one handler per iteration. Except in a few cases, handlers call ZEND_VM_NEXT_OPCODE(), which expands to ZEND_VM_CONTINUE(), i.e. return 0, so the loop continues.

The exceptions include cases such as yield (generators/coroutines): the handler returns 1, and execute_ex returns directly, leaving the loop. We will have the opportunity to analyze yield separately in the future.
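Stripped of the Zend macros, the shape of execute_ex is a loop that keeps calling the current handler until one of them returns 1. A minimal sketch, assuming hypothetical handlers that operate on a single accumulator and an "op array" that is just a list of handlers; only the return-code convention (0 to continue, 1 to return) is taken from the macros above.

```c
/* Return codes mirroring ZEND_VM_CONTINUE() (0) and ZEND_VM_RETURN() (1). */
enum { VM_CONTINUE = 0, VM_RETURN = 1 };

typedef struct exec_data exec_data;
typedef int (*loop_handler_t)(exec_data *ex);

struct exec_data {
    const loop_handler_t *opline; /* current op, like OPLINE in execute_ex */
    long acc;                     /* hypothetical state the handlers operate on */
};

/* Like ZEND_VM_NEXT_OPCODE(): advance to the next op, then "return 0". */
static int next_opcode(exec_data *ex) {
    ex->opline++;
    return VM_CONTINUE;
}

static int inc_handler(exec_data *ex) { ex->acc += 1; return next_opcode(ex); }
static int dbl_handler(exec_data *ex) { ex->acc *= 2; return next_opcode(ex); }
static int return_handler(exec_data *ex) { (void)ex; return VM_RETURN; }

/* The endless loop: call the current handler until one returns VM_RETURN. */
static long execute(const loop_handler_t *ops) {
    exec_data ex = { ops, 0 };
    for (;;) {
        if ((*ex.opline)(&ex) == VM_RETURN) {
            return ex.acc;
        }
    }
}
```

The loop itself never decides what comes next; each handler moves the instruction pointer and reports back through its return value, as the Zend handlers do.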

After reading the above, I hope you have a more detailed understanding of how the PHP Zend engine parses and executes code. Next, based on these principles, let's briefly talk about PHP optimization.

5. PHP Optimization considerations

5.1 echo output

    <?php
    $foo = 'foo';
    $bar = 'bar';
    echo $foo . $bar;

View the opcodes with vld:

    number of ops:  5
    compiled vars:  !0 = $foo, !1 = $bar
    line     #* E I O op                           fetch          ext  return  operands
    -------------------------------------------------------------------------------------
       2     0  E >   ASSIGN                                                   !0, 'foo'
       3     1        ASSIGN                                                   !1, 'bar'
       4     2        CONCAT                                           ~2      !0, !1
             3        ECHO                                                     ~2
       5     4      > RETURN                                                   1

    branch: #  0; line:     2-    5; sop:     0; eop:     4; out1:  -2
    path #1: 0,

ZEND_CONCAT concatenates the values of $foo and $bar into the temporary variable ~2, which is then echoed. This requires allocating memory for the temporary variable and releasing it after use, as well as calling the concatenation function to do the work.

If it is written like this:

    <?php
    $foo = 'foo';
    $bar = 'bar';
    echo $foo, $bar;

Corresponding opcodes:

    number of ops:  5
    compiled vars:  !0 = $foo, !1 = $bar
    line     #* E I O op                           fetch          ext  return  operands
    -------------------------------------------------------------------------------------
       2     0  E >   ASSIGN                                                   !0, 'foo'
       3     1        ASSIGN                                                   !1, 'bar'
       4     2        ECHO                                                     !0
             3        ECHO                                                     !1
       5     4      > RETURN                                                   1

    branch: #  0; line:     2-    5; sop:     0; eop:     4; out1:  -2
    path #1: 0,

No memory allocation or concatenation call is needed, so it is more efficient. If you want to learn about the concatenation process, use the approach described in this article to find the handler for the ZEND_CONCAT opcode; it does quite a lot of work.
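The difference can be imitated in C: concatenating first needs a temporary buffer (an allocation, two copies and a free), while echoing each operand writes it straight to the output. The output sink and the allocation counter below are hypothetical stand-ins for PHP's output layer and memory manager.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical output sink and allocation counter, standing in for
   PHP's output layer and memory manager. */
static char out[256];
static size_t out_len;
static int allocations;

/* ECHO: append a string to the output (silently drops it if the sink is full). */
static void echo(const char *s) {
    size_t n = strlen(s);
    if (out_len + n >= sizeof out) {
        return;
    }
    memcpy(out + out_len, s, n + 1); /* copies the terminating NUL too */
    out_len += n;
}

/* echo $foo . $bar;  CONCAT into a temporary, ECHO it, then free it. */
static void echo_concat(const char *a, const char *b) {
    size_t la = strlen(a), lb = strlen(b);
    char *tmp = malloc(la + lb + 1); /* the temporary ~2 */
    if (tmp == NULL) {
        return;
    }
    allocations++;
    memcpy(tmp, a, la);
    memcpy(tmp + la, b, lb + 1);
    echo(tmp);
    free(tmp);
}

/* echo $foo, $bar;  two ECHO ops, no temporary. */
static void echo_list(const char *a, const char *b) {
    echo(a);
    echo(b);
}
```

Both calls produce the same "foobar" output; only echo_concat() touches the allocator.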

5.2 define() and const

Since PHP 5.3, the const keyword can also be used outside classes to declare constants. It is quite different from define() and is closer to #define in C.

define() is a function call, so it carries function-call overhead.

const is a keyword that compiles directly to an opcode; it is determined at compile time and does not need dynamic allocation at run time.

The value of a const is fixed and cannot be changed at run time. So, like #define in C, it is determined at compile time, and there are restrictions on the types of value it can hold.

Let's look at the code directly and compare the opcodes.

define() example:

    <?php
    define('FOO', 'foo');
    echo FOO;

define() opcodes:

    number of ops:  6
    compiled vars:  none
    line     #* E I O op                           fetch          ext  return  operands
    -------------------------------------------------------------------------------------
       2     0  E >   SEND_VAL                                                 'FOO'
             1        SEND_VAL                                                 'foo'
             2        DO_FCALL                                      2          'define'
       3     3        FETCH_CONSTANT                                   ~1      'FOO'
             4        ECHO                                                     ~1
       4     5      > RETURN                                                   1

const example:

    <?php
    const FOO = 'foo';
    echo FOO;

const opcodes:

    number of ops:  4
    compiled vars:  none
    line     #* E I O op                           fetch          ext  return  operands
    -------------------------------------------------------------------------------------
       2     0  E >   DECLARE_CONST                                            'FOO', 'foo'
       3     1        FETCH_CONSTANT                                   ~0      'FOO'
             2        ECHO                                                     ~0
       4     3      > RETURN                                                   1

5.3 Cost of dynamic functions

    <?php
    function foo() {}
    foo();

Corresponding opcodes:

    number of ops:  3
    compiled vars:  none
    line     #* E I O op                           fetch          ext  return  operands
    -------------------------------------------------------------------------------------
       2     0  E >   NOP
       3     1        DO_FCALL                                      0          'foo'
       4     2      > RETURN                                                   1

Dynamic call code:

    <?php
    function foo() {}
    $a = 'foo';
    $a();

Opcodes:

    number of ops:  5
    compiled vars:  !0 = $a
    line     #* E I O op                           fetch          ext  return  operands
    -------------------------------------------------------------------------------------
       2     0  E >   NOP
       3     1        ASSIGN                                                   !0, 'foo'
       4     2        INIT_FCALL_BY_NAME                                       !0
             3        DO_FCALL_BY_NAME                              0
       5     4      > RETURN                                                   1

To see what INIT_FCALL_BY_NAME does, open vim Zend/zend_vm_def.h +2630; the code is too long to list here. Dynamic features are convenient, but they cost performance, so weigh the pros and cons before using them.
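Part of the extra cost is visible from the opcodes: foo() carries its target name as a literal, so its lookup can be cached, whereas for $a() the engine has to take whatever string is in $a, lowercase it (PHP function names are case-insensitive) and look it up each time the call runs. A rough sketch of that run-time path, assuming a hypothetical function table that is a small array searched linearly; the real engine uses a hash table (EG(function_table)), so lookups are cheaper, but they still happen on every dynamic call.

```c
#include <ctype.h>
#include <string.h>

typedef int (*php_fn)(void);

/* Stand-in for the user function foo(). */
static int foo_impl(void) { return 42; }

/* Hypothetical function table keyed by lowercase name (a hash table in the real engine). */
static const struct {
    const char *name;
    php_fn fn;
} function_table[] = {
    { "foo", foo_impl },
};

/* The run-time work behind $a(): lowercase the name, then search the table.
   Returns NULL when no function matches. */
static php_fn lookup_function(const char *name) {
    char lc[64];
    size_t n = strlen(name);
    if (n >= sizeof lc) {
        return NULL; /* longer than any name in this toy table */
    }
    for (size_t i = 0; i <= n; i++) { /* i == n copies the terminating NUL */
        lc[i] = (char)tolower((unsigned char)name[i]);
    }
    for (size_t k = 0; k < sizeof function_table / sizeof function_table[0]; k++) {
        if (strcmp(function_table[k].name, lc) == 0) {
            return function_table[k].fn;
        }
    }
    return NULL;
}

/* $a(): resolve the name every time the call executes. */
static int call_by_name(const char *name) {
    php_fn fn = lookup_function(name);
    return fn != NULL ? fn() : -1; /* -1 stands in for "Call to undefined function" */
}

/* foo(): the target is known when compiling, so no string lookup happens here. */
static int call_direct(void) {
    return foo_impl();
}
```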

5.4 Cost of deferred class declarations

First, the code:

    <?php
    class Bar {}
    class Foo extends Bar {}

Corresponding opcodes:

    number of ops:  4
    compiled vars:  none
    line     #* E I O op                           fetch          ext  return  operands
    -------------------------------------------------------------------------------------
       2     0  E >   NOP
       3     1        NOP
             2        NOP
       4     3      > RETURN

Now change the declaration order:

    <?php
    class Foo extends Bar {}
    class Bar {}

Corresponding opcodes:

    number of ops:  4
    compiled vars:  none
    line     #* E I O op                           fetch          ext  return  operands
    -------------------------------------------------------------------------------------
       2     0  E >   FETCH_CLASS                                   0  :0      'Bar'
             1        DECLARE_INHERITED_CLASS                                  '%00foo%2FUsers%2Fqisen%2Ftmp%2Fvld.php0x103d58020', 'foo'
       3     2        NOP
       4     3      > RETURN                                                   1

In some compiled languages, the second version would be a compile error, but a dynamic language like PHP postpones the class declaration to run time (FETCH_CLASS and DECLARE_INHERITED_CLASS instead of NOPs). If you are not careful, it is easy to step on this landmine.
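Roughly, the compiler can bind a subclass immediately only when its parent is already known at the moment the subclass is compiled; otherwise the declaration is left for run time. A toy model of that decision, assuming a hypothetical single-pass compiler that sees declarations in source order and a tiny class table of names (compared case-sensitively here for simplicity; real PHP class names are case-insensitive):

```c
#include <string.h>

/* Hypothetical compile-time class table: just the names declared so far. */
static const char *class_table[8];
static int class_count;

static int class_known(const char *name) {
    for (int i = 0; i < class_count; i++) {
        if (strcmp(class_table[i], name) == 0) {
            return 1;
        }
    }
    return 0;
}

typedef enum { EMIT_NOP, EMIT_RUNTIME_DECLARE } decl_result;

/* Compile "class name [extends parent]" (parent == NULL means no parent).
   If the parent is already known, bind now and leave a NOP behind;
   otherwise defer to a run-time DECLARE_INHERITED_CLASS. */
static decl_result compile_class(const char *name, const char *parent) {
    if (parent != NULL && !class_known(parent)) {
        return EMIT_RUNTIME_DECLARE;
    }
    if (class_count < (int)(sizeof class_table / sizeof class_table[0])) {
        class_table[class_count++] = name;
    }
    return EMIT_NOP;
}
```

Compiling Bar before Foo leaves only NOPs, matching the first listing; compiling Foo first yields a run-time declaration, matching the FETCH_CLASS/DECLARE_INHERITED_CLASS pair in the second.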

So, knowing how the Zend VM works, we should be careful with dynamic features: when you can do without them, don't use them.
