Analysis of the PHP Zend Engine: Hello World (II)

Preface

This time I trace the execution process of the Zend virtual machine, using Hello World as the example. The PHP version of Hello World:

<?php echo 'Hello World'; ?>

In the lexical analysis phase discussed in the previous article, the script above is turned into a token sequence: T_OPEN_TAG, T_ECHO, T_CONSTANT_ENCAPSED_STRING, ';', T_CLOSE_TAG. But how does the Zend virtual machine process this token sequence when the script is executed?

To trace the execution path, we start from the command line. The do_cli function in $PHPSRC/sapi/cli/php_cli.c receives the command-line arguments (php -f HelloWorld.php means "execute the file HelloWorld.php"). From there we follow php_execute_script, defined in $PHPSRC/main/main.c, which in turn calls zend_execute_scripts(). In the definition of zend_execute_scripts we find:

EG(active_op_array) =
zend_compile_file(file_handle, type TSRMLS_CC);
zend_execute(EG(active_op_array) TSRMLS_CC);

First, zend_compile_file parses the file into opcodes (this step goes through lexical and syntax analysis); then zend_execute runs the generated intermediate code (the so-called runtime). This is similar to how C is compiled: the source is first compiled into assembly and then converted into machine code. The opcodes here play the role of the assembly produced during C compilation.

This suggests an optimization. Every time a PHP file is run, it has to go through lexical and syntax analysis to obtain its opcodes, yet as long as the script file has not changed, the generated opcodes do not change either. To reduce the execution time of PHP scripts, the opcodes can therefore be cached (for example, in shared memory).

Here is a flowchart of the process; we will follow it to see what Zend does:

[Figure: flowchart of the execution process]

Let's first look at how the opcodes are compiled.

Lexical and syntax analysis -> opcode

From the previous section we know that zend_compile_file (actually compile_file(), defined at line 555 of Zend/zend_language_scanner.c) compiles the script file into opcodes. The compilation itself is driven by the zendparse API. The PHP parser is generated with bison: after installing bison, run

bison -o zend_language_parser.c zend_language_parser.y

in the $PHPSRC/Zend directory, and the parser zend_language_parser.c is generated there. The zendparse used here is simply the yyparse of that generated parser.

We skip the generated parser and instead follow the bison grammar file for the Hello World example (I have removed the rules we are not concerned with):
start:
        top_statement_list      { zend_do_end_compilation(TSRMLS_C); }
;

top_statement_list:
        top_statement_list      { zend_do_extended_info(TSRMLS_C); } top_statement { HANDLE_INTERACTIVE(); }
    |   /* empty */
;

top_statement:
        statement               { zend_verify_namespace(TSRMLS_C); }
;

statement:
        unticked_statement      { DO_TICKS(); }
    |   T_STRING ':'            { zend_do_label(&$1 TSRMLS_CC); }
;

unticked_statement:
    |   T_ECHO echo_expr_list ';'
;

echo_expr_list:
        echo_expr_list ',' expr { zend_do_echo(&$3 TSRMLS_CC); }
    |   expr                    { zend_do_echo(&$1 TSRMLS_CC); }
;

expr:
        r_variable              { $$ = $1; }
    |   expr_without_variable   { $$ = $1; }
;

expr_without_variable:
    |   scalar                  { $$ = $1; }
;

scalar:
    |   common_scalar           { $$ = $1; }
;

common_scalar:
    |   T_CONSTANT_ENCAPSED_STRING { $$ = $1; }
;
Parsing starts at start and proceeds from the top down. A PHP script corresponds to a top_statement_list, which is then broken into individual statements. We find that echo 'Hello World' is an unticked_statement (note the echo_expr_list rule: the grammar also accepts echo 'hello', 'world'). The parse of this line finally ends when the recursion reaches the T_CONSTANT_ENCAPSED_STRING token.

Here we set aside compiler theory and the question of how the parser backtracks during syntax analysis, and look at what is specific to the Zend Engine.

The code in the { } block after a rule is the action executed when that rule is matched. We can see that executing echo calls the zend_do_echo function. Inside the action blocks we see $$, $1, $2, $3 and so on; these stand for the rule's return value, its first component, its second component, and so on. The return value and the components all have the type YYSTYPE, which is defined on line 43: #define YYSTYPE znode.

znode is defined in zend_compile.h. Reading that file, I noticed the zend_op structure and realized that it is the opcode structure corresponding to each statement. An opcode is structured much like an assembly instruction: one operator and two operands. In the Zend Engine, the most important part of every opcode is its handler; we will see below how handlers are set up in Zend.

Now let's look at the opcodes generated for the Hello World example. Install vld and run:

php -dvld.active=1 HelloWorld.php

This prints the opcode list compiled from the PHP file. We can see that the echo statement has the opcode type ECHO, has no return value, and has a single operand, "Hello World".

After syntax analysis, an opcode has been compiled for each statement, and Zend puts them into an op_array (really just a list of opcodes).
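To make the shape of an opcode concrete, here is a minimal C sketch. It is not Zend's code: it assumes a much simplified instruction with one operator and two operands, stored in a growable array that stands in for op_array. Every sketch_* name is hypothetical; only the number 40 is taken from the ZEND_ECHO definition this article refers to.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified stand-ins for Zend's zend_op / zend_op_array. */
enum sketch_opcode { SKETCH_NOP = 0, SKETCH_ECHO = 40 };
enum sketch_operand_type { SKETCH_UNUSED = 0, SKETCH_CONST = 1 };

typedef struct {
    enum sketch_operand_type type;
    const char *str;              /* string constant when type == SKETCH_CONST */
} sketch_operand;

typedef struct {
    int opcode;                   /* the operator */
    sketch_operand op1;           /* first operand */
    sketch_operand op2;           /* second operand */
} sketch_op;

typedef struct {
    sketch_op *ops;               /* the list of compiled instructions */
    size_t count;
    size_t capacity;
} sketch_op_array;

/* Append a zeroed instruction at the end of the array and return it,
 * or NULL if memory could not be allocated. */
sketch_op *sketch_next_op(sketch_op_array *arr)
{
    if (arr->count == arr->capacity) {
        size_t new_cap = arr->capacity ? arr->capacity * 2 : 4;
        sketch_op *grown = realloc(arr->ops, new_cap * sizeof *grown);
        if (grown == NULL) {
            return NULL;
        }
        arr->ops = grown;
        arr->capacity = new_cap;
    }
    sketch_op *op = &arr->ops[arr->count++];
    memset(op, 0, sizeof *op);
    return op;
}

/* Record "echo <text>": one operator, op1 holds the constant, op2 is unused.
 * Returns 0 on success, -1 on allocation failure. */
int sketch_emit_echo(sketch_op_array *arr, const char *text)
{
    sketch_op *op = sketch_next_op(arr);
    if (op == NULL) {
        return -1;
    }
    op->opcode = SKETCH_ECHO;
    op->op1.type = SKETCH_CONST;
    op->op1.str = text;
    op->op2.type = SKETCH_UNUSED;
    return 0;
}
```

Calling sketch_emit_echo(&arr, "Hello World") on a zero-initialized sketch_op_array leaves one instruction in arr.ops whose op1 holds the string and whose op2 is marked unused; the caller frees arr.ops when done.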
Let's look at what zend_do_echo does. First it calls get_next_op to create a new opcode at the end of the current op_array and sets its opcode type to ZEND_ECHO. Then it sets the first operand, op1, and marks the second operand, op2, as unused.

After all these steps we have an op_array in which every opcode carries its own type. Next, let's see how each opcode is bound to its handler. zend_vm_def.h defines the handler for ZEND_ECHO; note the value 40 there, which we will use shortly. Because the operand of echo can be a constant, a variable, and so on, there is a separate handler for each case. All the handlers for every opcode are defined in zend_vm_execute.h. We only look at the echo-related handlers; note the following code:
void zend_init_opcodes_handlers(void)
{
    static const opcode_handler_t labels[] = {  // line 40913
        // ...
        // line 41914
        ZEND_ECHO_SPEC_CONST_HANDLER,
        // ...
    };
    // ...
}
Keep labels and these line numbers in mind for a moment. Looking at the return statement at the end of the function that fetches the handler, we can work out which handler is selected: the opcode is multiplied by 25, the type code of op1 by 5, and the type code of op2 is added as is. For the echo above the opcode is 40, so (assuming the type codes of op1 and op2 are both 0) the handler is:

zend_opcode_handlers[40*25 + 0*5 + 0] = zend_opcode_handlers[1000] = labels[1000] = ZEND_ECHO_SPEC_CONST_HANDLER

(Why labels[1000]? Because line 41914 - line 40913 - 1 = 1000.)

Executing the opcodes on the VM

Earlier we explained that zend_compile_file compiles a script into a list of opcodes:

EG(active_op_array) =
zend_compile_file(file_handle, type TSRMLS_CC);
zend_execute(EG(active_op_array) TSRMLS_CC);

After that, the Zend Engine uses zend_execute to execute the returned opcodes. Following zend_execute, we eventually arrive at line 337 of Zend/zend_vm_execute.h. There we can see that the virtual machine loops over the current opcode list, calls the handler of each opcode line, and decides what to do next based on the handler's return value (for example, for a function call).

In this article we only care about what relates to Hello World. We know that the handler for echo is ZEND_ECHO_SPEC_CONST_HANDLER. Tracing it to the end, you will find that it calls zend_write, which is set as follows:

zend_write = (zend_write_func_t) utility_functions->write_function;

Here utility_functions holds a set of basic handlers, and each SAPI layer replaces these function pointers with its own. In command-line mode, for example, sapi_cli_single_write ends up being called. From its source we can see that the final write operation calls write/fwrite to write to the standard output stream (that is, the terminal screen).

Conclusion

Finally, following the process described above, the flowchart is as follows:

[Figure: flowchart of the whole process]
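The execution side described in this section (a flat handler table indexed by opcode and operand types, a loop that calls each opcode's handler, and an output function pointer that the SAPI replaces) can be sketched in C as follows. This is a simplified illustration under stated assumptions, not the engine's implementation: all demo_* names are hypothetical, the loop simply walks an array, and a non-zero handler return value is assumed to mean "stop".

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical constants: 40 is the article's number for ZEND_ECHO;
 * 5 type codes per operand gives the factor 25 = 5 * 5 used in the index. */
enum { DEMO_ECHO = 40, DEMO_TYPE_CODES = 5, DEMO_CONST_CODE = 0 };

typedef struct demo_op demo_op;
typedef int (*demo_handler)(const demo_op *op);
typedef size_t (*demo_write_func)(const char *str, size_t len);

struct demo_op {
    int opcode;
    int op1_code;            /* operand-type code of op1, 0..4 */
    int op2_code;            /* operand-type code of op2, 0..4 */
    const char *op1_str;     /* constant string operand */
    demo_handler handler;    /* bound before execution */
};

/* Output hook playing the role of zend_write; the embedding layer sets it. */
demo_write_func demo_write;

/* Back end 1: write to the standard output stream, as a CLI would. */
size_t demo_stdout_write(const char *str, size_t len)
{
    return fwrite(str, 1, len, stdout);
}

/* Back end 2: collect output in memory, standing in for a different SAPI. */
char demo_buffer[256];
size_t demo_buffer_len;

size_t demo_buffer_write(const char *str, size_t len)
{
    size_t room = sizeof demo_buffer - 1 - demo_buffer_len;
    if (len > room) {
        len = room;          /* truncate instead of overflowing */
    }
    memcpy(demo_buffer + demo_buffer_len, str, len);
    demo_buffer_len += len;
    demo_buffer[demo_buffer_len] = '\0';
    return len;
}

/* Handler for "echo <constant>": forwards op1 to the output hook. */
int demo_echo_const_handler(const demo_op *op)
{
    demo_write(op->op1_str, strlen(op->op1_str));
    return 0;                /* 0 = continue with the next opcode */
}

/* Index into the flat table: opcode * 25 + op1 code * 5 + op2 code. */
size_t demo_handler_index(int opcode, int op1_code, int op2_code)
{
    return (size_t)opcode * DEMO_TYPE_CODES * DEMO_TYPE_CODES
         + (size_t)op1_code * DEMO_TYPE_CODES
         + (size_t)op2_code;
}

demo_handler demo_handlers[(DEMO_ECHO + 1) * DEMO_TYPE_CODES * DEMO_TYPE_CODES];

/* Register the echo handler for a constant op1 and any op2 code
 * (an assumption of this sketch). */
void demo_init_handlers(void)
{
    for (int op2 = 0; op2 < DEMO_TYPE_CODES; op2++) {
        demo_handlers[demo_handler_index(DEMO_ECHO, DEMO_CONST_CODE, op2)] =
            demo_echo_const_handler;
    }
}

/* Look up and store the handler; returns -1 if none is registered. */
int demo_bind_handler(demo_op *op)
{
    if (op->opcode < 0 || op->opcode > DEMO_ECHO ||
        op->op1_code < 0 || op->op1_code >= DEMO_TYPE_CODES ||
        op->op2_code < 0 || op->op2_code >= DEMO_TYPE_CODES) {
        return -1;
    }
    op->handler = demo_handlers[
        demo_handler_index(op->opcode, op->op1_code, op->op2_code)];
    return op->handler != NULL ? 0 : -1;
}

/* Walk the opcode list and call each handler until one asks to stop. */
void demo_execute(const demo_op *ops, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        if (ops[i].handler == NULL || ops[i].handler(&ops[i]) != 0) {
            break;
        }
    }
}
```

To use the sketch, call demo_init_handlers(), point demo_write at demo_stdout_write or demo_buffer_write, bind each instruction with demo_bind_handler, and pass the array to demo_execute. Swapping the value of demo_write changes where the output goes without touching the handler, which is the role the article attributes to the SAPI layer.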