Preface

This time, using Hello World as the example, we start looking at how the Zend virtual machine executes a script. The PHP version of Hello World:

    <?php echo 'Hello World'; ?>

In the previous article, the lexical analysis phase turned the script above into a token sequence: T_OPEN_TAG, T_ECHO, T_CONSTANT_ENCAPSED_STRING, ';', T_CLOSE_TAG. But how does the Zend virtual machine process this token sequence during execution?
Tracing the execution path

Let's start from the command line, where the do_cli function in $PHPSRC/sapi/cli/php_cli.c receives the command-line arguments (php -f helloworld.php means: execute the file helloworld.php). From there we trace php_execute_script to its definition in $PHPSRC/main/main.c, which in turn calls zend_execute_scripts() <Zend/zend.c>. Inside the definition of zend_execute_scripts we find:

    EG(active_op_array) = zend_compile_file(file_handle, type TSRMLS_CC);
    zend_execute(EG(active_op_array) TSRMLS_CC);

First, the file is compiled into opcodes (intermediate code) by zend_compile_file; this step includes lexical and syntax analysis. The generated intermediate code is then executed by zend_execute; this is the so-called runtime. The process is much like a C compiler, which first compiles the source into assembly and then into machine code; the opcodes play a role similar to the assembly a C compiler generates along the way.

This also suggests an optimization. Every time a PHP file is run, it has to go through lexical and syntax analysis to produce the corresponding opcodes. But if the script file has not changed, the generated opcodes do not change either, so to reduce the execution time of PHP scripts the opcodes can be cached (for example, in shared memory).

Here is a flowchart of the process; we will follow it to see what Zend does:

[Figure: flowchart of script compilation and execution]

Let's look at how the opcodes are compiled first.
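To make the caching idea above a little more concrete, here is a minimal sketch in C. It is only a toy model and not how any real PHP opcode cache works: the names toy_cache, toy_cache_lookup and toy_cache_store are hypothetical, the "compiled" pointer merely stands in for an op_array, and the cache lives in ordinary process memory rather than shared memory. The assumption is that a script counts as unchanged when its path and modification time are the same as when it was compiled.

```c
#include <stddef.h>
#include <string.h>

#define TOY_CACHE_SLOTS 8
#define TOY_PATH_MAX    128

/* One cached compilation result (hypothetical layout). */
typedef struct {
    char        path[TOY_PATH_MAX]; /* script path, truncated if too long */
    long        mtime;              /* modification time seen at compile time */
    const void *compiled;           /* stands in for the compiled op_array */
    int         used;               /* 0 while the slot is empty */
} toy_cache_entry;

typedef struct {
    toy_cache_entry slots[TOY_CACHE_SLOTS];
    int             next;           /* next slot to overwrite (round-robin) */
} toy_cache;

/* Return the cached result if the script is unchanged, otherwise NULL,
   in which case the caller would compile the script again. */
const void *toy_cache_lookup(const toy_cache *c, const char *path, long mtime)
{
    for (int i = 0; i < TOY_CACHE_SLOTS; i++) {
        const toy_cache_entry *e = &c->slots[i];
        if (e->used && e->mtime == mtime && strcmp(e->path, path) == 0)
            return e->compiled;
    }
    return NULL;
}

/* Remember a freshly compiled result, replacing any older entry
   stored for the same path. */
void toy_cache_store(toy_cache *c, const char *path, long mtime,
                     const void *compiled)
{
    toy_cache_entry *e = NULL;

    for (int i = 0; i < TOY_CACHE_SLOTS; i++) {
        if (c->slots[i].used && strcmp(c->slots[i].path, path) == 0) {
            e = &c->slots[i];
            break;
        }
    }
    if (e == NULL) {
        e = &c->slots[c->next];
        c->next = (c->next + 1) % TOY_CACHE_SLOTS;
    }
    strncpy(e->path, path, TOY_PATH_MAX - 1);
    e->path[TOY_PATH_MAX - 1] = '\0';
    e->mtime = mtime;
    e->compiled = compiled;
    e->used = 1;
}
```

In this sketch the caller would try toy_cache_lookup first and only fall back to compiling (followed by toy_cache_store) on a miss, so an unchanged script skips lexical and syntax analysis entirely.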
Lexical and syntax analysis -> opcode

From the previous section we know that the script file is compiled into opcodes by zend_compile_file (which is actually compile_file(), defined at line 555 of Zend/zend_language_scanner.c). The opcodes are really produced by the zendparse API. PHP's parser is built with bison; once bison is installed, run the following in the $PHPSRC/Zend directory:

    bison -o zend_language_parser.c zend_language_parser.y

This generates the parser zend_language_parser.c in the Zend directory, and the zendparse mentioned here is the yyparse inside that parser. We will ignore the generated parser and instead follow the Hello World example through the bison declaration file (I have removed the rules we do not care about):
    start:
            top_statement_list { zend_do_end_compilation(TSRMLS_C); }
    ;

    top_statement_list:
            top_statement_list { zend_do_extended_info(TSRMLS_C); } top_statement { HANDLE_INTERACTIVE(); }
        |   /* empty */
    ;

    top_statement:
            statement { zend_verify_namespace(TSRMLS_C); }
    ;

    statement:
            unticked_statement { DO_TICKS(); }
        |   T_STRING ':' { zend_do_label(&$1 TSRMLS_CC); }
    ;

    unticked_statement:
        |   T_ECHO echo_expr_list ';'
    ;

    echo_expr_list:
            echo_expr_list ',' expr { zend_do_echo(&$3 TSRMLS_CC); }
        |   expr { zend_do_echo(&$1 TSRMLS_CC); }
    ;

    expr:
            r_variable { $$ = $1; }
        |   expr_without_variable { $$ = $1; }
    ;

    expr_without_variable:
        |   scalar { $$ = $1; }
    ;

    scalar:
        |   common_scalar { $$ = $1; }
    ;

    common_scalar:
        |   T_CONSTANT_ENCAPSED_STRING { $$ = $1; }
    ;
Reading the grammar top-down from start: a PHP script corresponds to a top_statement_list, which is split into individual statements, and echo 'Hello World' turns out to be an unticked_statement (look at the echo_expr_list rule: the grammar also supports echo 'Hello', 'World'). The recursion finally reaches T_CONSTANT_ENCAPSED_STRING, which ends the parsing of this line. Here we skip the compiler-theory details of how the generated parser actually matches these rules and focus on the Zend engine itself.

The code inside the "{}" block after a rule is the action executed when that rule is matched, and we can see that echo ends up calling the zend_do_echo function. In the action blocks we see $$, $1, $2, $3. These correspond to the rule's return value, parameter 1, parameter 2, and so on: $$ is the value of the rule itself, and $n is the value of the n-th symbol on its right-hand side. Both the return value and the parameters are of type YYSTYPE, which is defined on line 43: #define YYSTYPE znode. znode itself is defined in zend_compile.h.

Also notice the zend_op structure: tracing further shows that this is the opcode structure that each statement finally becomes. An opcode is structured much like an assembly instruction: one operator and two operands. In the Zend engine, the most important part of each opcode is its handler, and we will see below how Zend produces it.

Let's pause here and look at what the opcodes for the Hello World example actually are. Install VLD, then run:

    php -dvld.active=1 helloworld.php

This prints the list of opcodes compiled from the PHP file. The opcode for the echo statement is ECHO; it has no return value and a single operand, 'Hello World'. So after parsing, Zend compiles an opcode for each statement and puts it into an op_array (which is really just a list of opcodes).
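As a rough picture of "one operator, two operands" and of an op_array as a list of opcodes, here is a small sketch in C. It is a simplified model, not the real zend_op or zend_op_array: every toy_* name is hypothetical, the array has a fixed capacity, and operands can only be a constant string or unused. The value 40 for ECHO is the number this article uses for that opcode.

```c
#include <stddef.h>

#define TOY_MAX_OPS 16
#define TOY_ECHO    40   /* ECHO's opcode number as quoted in this article */

enum toy_operand_type { TOY_CONST, TOY_UNUSED };

/* An operand: either a constant string or nothing at all. */
typedef struct {
    enum toy_operand_type type;
    const char *str;     /* meaningful only when type == TOY_CONST */
} toy_operand;

/* One instruction: an operator plus two operands. */
typedef struct {
    unsigned char opcode;
    toy_operand   op1;
    toy_operand   op2;
} toy_op;

/* The opcode list produced for one script. */
typedef struct {
    toy_op ops[TOY_MAX_OPS];
    size_t last;         /* number of instructions emitted so far */
} toy_op_array;

/* Reserve the next free slot at the end of the list, or NULL when full. */
toy_op *toy_get_next_op(toy_op_array *arr)
{
    if (arr->last >= TOY_MAX_OPS)
        return NULL;
    return &arr->ops[arr->last++];
}

/* Emit the instruction for: echo <constant string>;
   Returns 0 on success and -1 when the list is full. */
int toy_emit_echo(toy_op_array *arr, const char *text)
{
    toy_op *op = toy_get_next_op(arr);
    if (op == NULL)
        return -1;
    op->opcode   = TOY_ECHO;
    op->op1.type = TOY_CONST;   /* the string to print */
    op->op1.str  = text;
    op->op2.type = TOY_UNUSED;  /* echo needs no second operand */
    op->op2.str  = NULL;
    return 0;
}
```

Calling toy_emit_echo on an empty toy_op_array with "Hello World" leaves a single instruction in the list, matching the shape of the VLD output described above: opcode ECHO, one operand, no result.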
Looking back at what zend_do_echo does: it first obtains a new opcode at the end of the current op_array via get_next_op, then sets its opcode type to ZEND_ECHO, then sets its first operand op1, and finally marks the second operand op2 as unused.

After all these steps we have an op_array in which every opcode carries its own type. Next, let's see how each opcode is bound to its handler. zend_vm_def.h defines the handler for ZEND_ECHO; note the number 40 there, which we will need shortly. Because the argument of echo can be of several kinds (a constant, a variable, and so on), there is a separate handler for each kind. zend_vm_execute.h defines the handlers for all opcodes; we only care about the echo-related ones. Note this code:

    void zend_init_opcodes_handlers(void)
    {
        static const opcode_handler_t labels[] = {    // line 40913
            /* entries for earlier opcodes not shown */
            ZEND_ECHO_SPEC_CONST_HANDLER,             // line 41914
            ZEND_ECHO_SPEC_CONST_HANDLER,
            ZEND_ECHO_SPEC_CONST_HANDLER,
            ZEND_ECHO_SPEC_CONST_HANDLER,
            ZEND_ECHO_SPEC_CONST_HANDLER
        };

Keep the labels array and these line numbers in mind for a moment. Now look at the return statement at the end of the function that looks up a handler: the index is computed from the opcode and the types of its two operands. For the echo above, whose opcode is 40 (and assuming the types of both operands, op1 and op2, map to 0), the handler is:

    zend_opcode_handlers[40*25 + 0*5 + 0] = zend_opcode_handlers[1000] = labels[1000] = ZEND_ECHO_SPEC_CONST_HANDLER

(How do we know that labels[1000] is this entry? Because line 41914 - line 40913 - 1 = 1000.)
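The index arithmetic above can be written down as a tiny C function. This is a sketch under stated assumptions rather than the engine's code: toy_handler_index and the KIND_* names are hypothetical, and the numbering of the five operand kinds (CONST = 0 through CV = 4) is assumed here to match the order in which the handler table is laid out.

```c
#include <stddef.h>

/* Operand kinds after decoding, assumed to be numbered in table order. */
enum {
    KIND_CONST  = 0,
    KIND_TMP    = 1,
    KIND_VAR    = 2,
    KIND_UNUSED = 3,
    KIND_CV     = 4
};

#define KINDS 5  /* number of operand kinds */

/* Position of a handler in a flat table laid out as
   25 slots per opcode, 5 slots per op1 kind, 1 slot per op2 kind. */
size_t toy_handler_index(unsigned opcode, unsigned op1_kind, unsigned op2_kind)
{
    return (size_t)opcode * KINDS * KINDS
         + (size_t)op1_kind * KINDS
         + (size_t)op2_kind;
}
```

With opcode 40 and both kinds equal to 0 this gives 1000, the index used in the text. Under the numbering assumed here, a constant first operand with an unused second operand would give 1003, which still falls inside the run of five ZEND_ECHO_SPEC_CONST_HANDLER entries, so the same handler is selected either way.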
Virtual machine executes the opcode

We have already explained that zend_compile_file compiles a script into an opcode list:

    EG(active_op_array) = zend_compile_file(file_handle, type TSRMLS_CC);
    zend_execute(EG(active_op_array) TSRMLS_CC);

After that, the Zend engine executes the returned opcodes with zend_execute. We locate zend_execute at line 337 of Zend/zend_vm_execute.h. There, the virtual machine loops over the current opcode list, calls the handler of each opcode, and decides what to do next (for example, a function call) based on the handler's return value; we will expand on this later.

In this article we only care about what relates to Hello World. We know that ECHO's handler is ZEND_ECHO_SPEC_CONST_HANDLER, and tracing it to the end we find that the output goes through zend_write:

    zend_write = (zend_write_func_t) utility_functions->write_function;

Here utility_functions contains some basic handlers, and each SAPI layer sets these underlying function pointers. For example, in command-line mode the call finally ends up in sapi_cli_single_write. From the source we can see that the final write operation calls write/fwrite to write to the standard output stream (that is, the terminal screen).
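The execution loop and the replaceable write function described above can be sketched in C as follows. This is a deliberately simplified model, not the code in zend_vm_execute.h: all toy_* names are hypothetical, the loop itself advances to the next instruction, and a nonzero handler return value only ever means "stop". The two write functions illustrate how different front ends could plug in their own output routine: one writes to standard output, the other appends to a memory buffer.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

struct toy_vm;
struct toy_op;

/* A handler runs one instruction; a nonzero result stops the loop. */
typedef int (*toy_handler)(struct toy_vm *vm, const struct toy_op *op);
/* Output hook, playing the role of the replaceable write function. */
typedef size_t (*toy_write_fn)(const char *str, size_t len);

struct toy_op {
    toy_handler handler;   /* bound when the instruction is created */
    const char *op1;       /* constant string operand, may be NULL */
};

struct toy_vm {
    toy_write_fn write;    /* chosen by the front end */
};

/* Command-line style output: write straight to standard output. */
size_t toy_stdout_write(const char *str, size_t len)
{
    return fwrite(str, 1, len, stdout);
}

/* Alternative output: append to a fixed buffer, truncating when full. */
char   toy_buf[64];
size_t toy_buf_len;

size_t toy_buffer_write(const char *str, size_t len)
{
    size_t room = sizeof toy_buf - 1 - toy_buf_len;
    if (len > room)
        len = room;
    memcpy(toy_buf + toy_buf_len, str, len);
    toy_buf_len += len;
    toy_buf[toy_buf_len] = '\0';
    return len;
}

/* ECHO: send the first operand to whatever write function is installed. */
int toy_echo_handler(struct toy_vm *vm, const struct toy_op *op)
{
    vm->write(op->op1, strlen(op->op1));
    return 0;
}

/* RETURN: tell the loop that the script has finished. */
int toy_return_handler(struct toy_vm *vm, const struct toy_op *op)
{
    (void)vm;
    (void)op;
    return 1;
}

/* Walk the opcode list, calling each handler until one asks to stop.
   The list must end with an instruction whose handler returns nonzero. */
void toy_execute(struct toy_vm *vm, const struct toy_op *ops)
{
    const struct toy_op *op = ops;
    while (1) {
        if (op->handler(vm, op) != 0)
            return;
        op++;
    }
}
```

A two-instruction program consisting of an echo of "Hello World" followed by a return, run with toy_stdout_write installed, prints the string to the terminal; swapping in toy_buffer_write sends the same output to toy_buf without touching the handlers, which is the point of routing output through a function pointer.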
Conclusion

Finally, following the process described above, the expanded flowchart is: