A deep understanding of the PHP code execution process

Source: Internet
Author: User
I. Preface

Language is a means of expression and communication. Each language has its own symbols, expressions, and rules, and a programming language likewise consists of specific symbols, expressions, and rules. The role of any language is communication, whether it is a natural language or a programming language. The difference is that a natural language is a tool for communication between people, while a programming language is a channel of communication between people and machines.

PHP, too, is a set of instructions that follow certain rules. After programmers express their ideas in PHP, the PHP virtual machine (specifically, PHP's language engine, Zend) translates these PHP instructions into operations implemented in C (which can be thought of as a lower-level instruction set). The C code is in turn compiled into assembly language, and the assembly is finally turned into machine code that follows the processor's rules and is executed. This is a process in which a high-level abstraction is made progressively more concrete and detailed.

Converting one language into another is called compilation; the two languages are called the source language and the target language, respectively. Compilation usually goes from a higher-level source language to a lower-level target language. The conversion is carried out by a compiler, whose work is usually divided into a series of phases: lexical analysis, syntax analysis, semantic analysis, intermediate code generation, code optimization, and target code generation. The earlier phases (lexical, syntax, and semantic analysis) analyze the source program and can be called the compiler's front end. The later phases (intermediate code generation, code optimization, and target code generation) construct the target program and can be called the compiler's back end.

A language is usually called a compiled language because a translation step takes place before the program runs; the key point is that an equivalent program in a completely different form is generated. PHP is called an interpreted language because no such program is generated: it produces the intermediate code Opcode, which is only an internal data structure of PHP.

II. PHP code execution process

For example, consider this simple program:

<?php
    echo "Hello World!";
    $a = 1 + 1;
    echo $a;
?>

What is the execution process of this simple program? Much like the compilation process described earlier, it is divided into steps, four in this case. (This covers only PHP's language engine, Zend; what the web server does is not included.)

1. Scanning (Lexing): converts PHP code into language fragments (Tokens).
2. Parsing: converts Tokens into simple, meaningful expressions.
3. Compilation: compiles the expressions into Opcodes.
4. Execution: executes the Opcodes in order, one at a time, thereby carrying out what the PHP script does.


Note 1: Opcode is the intermediate language that PHP scripts are compiled into, like Java's bytecode or .NET's MSIL.

Note 2: Opcode caches such as APC let PHP cache its Opcodes. When a request arrives, the first three steps do not need to be repeated, which greatly improves PHP's execution speed.

1. Scanning (Lexing): converts PHP code into language fragments (Tokens)

So what is lexing? Anyone who has studied compiler principles will be familiar with the lexical analysis step. Lex is the classic tool for it: it takes a table of rules and generates a lexical analyzer from them.

PHP originally used Flex and later switched to re2c. (MySQL, by contrast, uses a hand-written lexical scanner rather than Flex.) Lex is also the standard lexical analyzer generator on UNIX systems. These tools read an input stream that describes the rules of the lexical analyzer and output C source code that implements it. Here we only cover the tool PHP currently uses, re2c. The file Zend/zend_language_scanner.l in the source tree is the re2c rule file; if you modify it, you need re2c installed to regenerate the scanner. The generated Zend/zend_language_scanner.c, built from the rules in Zend/zend_language_scanner.l, performs lexical analysis on the input PHP code and produces one "word" (token) at a time.

Since PHP 4.2.0, a function named token_get_all has been available; it scans a piece of PHP code into Tokens.

The following code uses token_get_all to process the PHP code from the beginning of this section.

<?php
echo "<pre>";
// Nowdoc (PHP 5.3+) keeps the sample verbatim, so $a is not interpolated.
$phpcode = <<<'PHPCODE'
<?php
    echo "Hello World!";
    $a = 1 + 1;
    echo $a;
?>
PHPCODE;
// $tokens = token_get_all($phpcode);
// print_r($tokens);
$tokens = token_get_all($phpcode);
foreach ($tokens as $key => $token) {
    // Single-character tokens such as ";" are plain strings, so only arrays are renamed.
    if (is_array($token)) {
        $tokens[$key][0] = token_name($token[0]);
    }
}
print_r($tokens);
?>

Note: For ease of reading, I use the token_name function to convert each token ID into its symbolic name.

If you want to see the raw numeric token IDs, uncomment the two commented-out lines in the code above.

For the list of parser tokens, see: http://www.php.net/manual/zh/tokens.php

The result is as follows (condensed to one token per line; the contents of whitespace tokens are not visible):

Array
(
    [0] => Array ( [0] => T_OPEN_TAG [1] => <?php [2] => 1 )
    [1] => Array ( [0] => T_WHITESPACE [1] => [2] => 2 )
    [2] => Array ( [0] => T_ECHO [1] => echo [2] => 2 )
    [3] => Array ( [0] => T_WHITESPACE [1] => [2] => 2 )
    [4] => Array ( [0] => T_CONSTANT_ENCAPSED_STRING [1] => "Hello World!" [2] => 2 )
    [5] => ;
    [6] => Array ( [0] => T_WHITESPACE [1] => [2] => 2 )
    [7] => Array ( [0] => T_VARIABLE [1] => $a [2] => 3 )
    [8] => Array ( [0] => T_WHITESPACE [1] => [2] => 3 )
    [9] => =
    [10] => Array ( [0] => T_WHITESPACE [1] => [2] => 3 )
    [11] => Array ( [0] => T_LNUMBER [1] => 1 [2] => 3 )
    [12] => Array ( [0] => T_WHITESPACE [1] => [2] => 3 )
    [13] => +
    [14] => Array ( [0] => T_WHITESPACE [1] => [2] => 3 )
    [15] => Array ( [0] => T_LNUMBER [1] => 1 [2] => 3 )
    [16] => ;
    [17] => Array ( [0] => T_WHITESPACE [1] => [2] => 3 )
    [18] => Array ( [0] => T_ECHO [1] => echo [2] => 4 )
    [19] => Array ( [0] => T_WHITESPACE [1] => [2] => 4 )
    [20] => Array ( [0] => T_VARIABLE [1] => $a [2] => 4 )
    [21] => ;
    [22] => Array ( [0] => T_WHITESPACE [1] => [2] => 4 )
    [23] => Array ( [0] => T_CLOSE_TAG [1] => ?> [2] => 5 )
)

Analyzing the returned result, we can see that every string, character, and space in the source code is returned as is, and each one appears in its original order.

Single characters such as ; are returned as plain strings, while other elements, such as tags, multi-character operators, and statements, are converted into an array of three parts:

1. The token ID (the code Zend uses internally for the token, such as T_ECHO or T_STRING).

2. The original content in the source code.

3. The line number in the source code.
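
To make the three-part token shape concrete, here is a minimal sketch in C of a hand-written scanner for a tiny PHP-like subset. It is only an illustration under stated assumptions: it is not how the re2c-generated Zend/zend_language_scanner.c works internally, and the names toy_token, toy_scan, and the TOY_* IDs are hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical token IDs for illustration; PHP itself uses T_ECHO, T_LNUMBER, ... */
enum toy_id { TOY_WHITESPACE, TOY_ECHO, TOY_STRING, TOY_LNUMBER, TOY_VARIABLE, TOY_CHAR };

/* One token: the same three parts listed above (ID, original text, line). */
typedef struct {
    enum toy_id id;
    const char *text;   /* points into the source buffer */
    size_t      len;    /* length of the original text */
    int         line;   /* line on which the token starts */
} toy_token;

/* Scan src into out[0..max-1]; returns the number of tokens produced. */
size_t toy_scan(const char *src, toy_token *out, size_t max)
{
    size_t n = 0;
    int line = 1;
    const char *p = src;

    assert(src != NULL && out != NULL);
    while (*p != '\0' && n < max) {
        const char *start = p;
        toy_token t;
        t.text = start;
        t.line = line;

        if (isspace((unsigned char)*p)) {
            /* A run of whitespace is one token; count the newlines it contains. */
            while (isspace((unsigned char)*p)) {
                if (*p == '\n') line++;
                p++;
            }
            t.id = TOY_WHITESPACE;
        } else if (isdigit((unsigned char)*p)) {
            while (isdigit((unsigned char)*p)) p++;
            t.id = TOY_LNUMBER;
        } else if (*p == '$' && (isalpha((unsigned char)p[1]) || p[1] == '_')) {
            p++;
            while (isalnum((unsigned char)*p) || *p == '_') p++;
            t.id = TOY_VARIABLE;
        } else if (isalpha((unsigned char)*p)) {
            while (isalpha((unsigned char)*p)) p++;
            /* "echo" is the only keyword this sketch knows; other words are TOY_STRING. */
            t.id = ((size_t)(p - start) == 4 && strncmp(start, "echo", 4) == 0)
                       ? TOY_ECHO : TOY_STRING;
        } else {
            p++;                      /* single-character token such as ; = + */
            t.id = TOY_CHAR;
        }
        t.len = (size_t)(p - start);
        out[n++] = t;
    }
    return n;
}
```

Calling toy_scan on a buffer such as "echo 1;" fills the caller's array with one entry per token, whitespace included, in source order.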

2. Parsing: converts Tokens into simple, meaningful expressions

Next comes the parsing phase. Parsing first discards the superfluous whitespace in the Tokens array, then converts the remaining Tokens into simple expressions, one by one:

1. echo a constant string
2. add two numbers together
3. store the result of the prior expression to a variable
4. echo a variable

Bison is a general-purpose parser generator. It converts the description of an LALR(1) context-free grammar into a C program that parses that grammar. It can be used to build interpreters, compilers, protocol implementations, and many other programs. Bison is upward compatible with Yacc: all well-written Yacc grammars should work in Bison without modification. Beyond that compatibility, it also has many features that Yacc lacks.

The parser file that Bison generates is C code defining a function named yyparse, which implements the grammar. This function alone is not a complete C program that can do all the work of syntax analysis. We must also supply some additional functions, such as the lexical analyzer and the error-reporting function that the parser calls when it encounters an error. A complete C program must also start with a function named main, so if we want to build an executable and run the parser, we need a main function that calls yyparse somewhere, directly or indirectly; otherwise the parser will never run.

In the PHP source code, lexical analysis ends up calling the lex_scan function generated from the re2c rules, while the function handed to Bison is zendlex, and yyparse is renamed zendparse.
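
The division of labor between the parser function and the lexer function can be sketched in a few lines of C. This is a hand-written parser for a one-rule grammar, not Bison's LALR(1) output, and the names toy_lex, toy_parse, and toy_lexer are hypothetical; toy_lex plays the role of yylex/zendlex and toy_parse the role of yyparse/zendparse. To stay short, the sketch adds the numbers as it parses instead of building expressions for a later compilation step.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical token kinds; a Bison grammar would declare these with %token. */
enum { TOY_END = 0, TOY_NUM, TOY_PLUS, TOY_ERROR };

typedef struct {
    const char *cursor;  /* next unread character */
    long        value;   /* semantic value of the last TOY_NUM (similar to yylval) */
} toy_lexer;

/* Lexer function: returns the next token kind. Whitespace is skipped here,
 * so it never reaches the parser. */
int toy_lex(toy_lexer *lx)
{
    while (isspace((unsigned char)*lx->cursor)) lx->cursor++;
    if (*lx->cursor == '\0') return TOY_END;
    if (*lx->cursor == '+') { lx->cursor++; return TOY_PLUS; }
    if (isdigit((unsigned char)*lx->cursor)) {
        lx->value = 0;
        while (isdigit((unsigned char)*lx->cursor)) {
            lx->value = lx->value * 10 + (*lx->cursor - '0');
            lx->cursor++;
        }
        return TOY_NUM;
    }
    return TOY_ERROR;
}

/* Parser function for the grammar  expr: NUM ('+' NUM)*
 * Returns 0 on success (as yyparse does) and stores the sum in *result. */
int toy_parse(const char *src, long *result)
{
    toy_lexer lx;
    int tok;

    assert(src != NULL && result != NULL);
    lx.cursor = src;
    lx.value = 0;

    if (toy_lex(&lx) != TOY_NUM) return 1;        /* must start with a number */
    *result = lx.value;
    while ((tok = toy_lex(&lx)) == TOY_PLUS) {    /* pull tokens on demand */
        if (toy_lex(&lx) != TOY_NUM) return 1;    /* '+' must be followed by a number */
        *result += lx.value;
    }
    return tok == TOY_END ? 0 : 1;                /* trailing garbage is a syntax error */
}
```

As in the Bison setup described above, the parser never touches the raw text: it only asks the lexer for the next token, and a caller (for example a main function) has to invoke toy_parse for anything to happen.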

3. Compilation: compiles the expressions into Opcodes

Next comes the compilation stage. It compiles the Tokens into op_arrays, and each op in an op_array has five main parts: the opcode number, the result, the first operand, the second operand, and an extended value.

In the PHP implementation, the opcode structure is as follows:

struct _zend_op {
    opcode_handler_t handler; // handler called when the opcode is executed
    znode result;
    znode op1;
    znode op2;
    ulong extended_value;
    uint lineno;
    zend_uchar opcode; // opcode number
};

As with a CPU instruction, there is an opcode field that identifies the instruction, along with the operands that the opcode operates on.

PHP is not as low-level as assembly, so other information may be needed while a script runs; the extended_value field stores this information.

The result field stores the result of executing the instruction.

A PHP script is compiled into opcodes, which are saved in an op_array. Its internal storage structure is as follows:

struct _zend_op_array {
    /* Common elements */
    zend_uchar type;
    char *function_name; // for a user-defined function, its name is saved here
    zend_class_entry *scope;
    zend_uint fn_flags;
    union _zend_function *prototype;
    zend_uint num_args;
    zend_uint required_num_args;
    zend_arg_info *arg_info;
    zend_bool pass_rest_by_reference;
    unsigned char return_reference;
    /* END of common elements */
    zend_bool done_pass_two;
    zend_uint *refcount;
    zend_op *opcodes; // opcode array
    zend_uint last, size;
    zend_compiled_variable *vars;
    int last_var, size_var;
    // ...
};
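
The opcodes, last, and size fields suggest a growable array: last counts the ops in use and size the allocated capacity. Here is a minimal sketch in C of appending ops to such an array. The types toy_op and toy_op_array, the function toy_emit, and the doubling growth policy are assumptions made for illustration, not the engine's actual code.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-ins for zend_op and zend_op_array. */
typedef struct {
    unsigned char opcode;   /* which instruction */
    int op1, op2;           /* operands (plain ints here, znode in PHP) */
    int result;             /* where the result goes */
    unsigned int lineno;    /* source line of the statement */
} toy_op;

typedef struct {
    toy_op      *opcodes;   /* opcode array */
    unsigned int last;      /* number of ops in use */
    unsigned int size;      /* allocated capacity */
} toy_op_array;

/* Append one op, growing the buffer when last reaches size.
 * Returns a pointer to the new op, or NULL if allocation fails. */
toy_op *toy_emit(toy_op_array *arr, unsigned char opcode, unsigned int lineno)
{
    toy_op *op;

    assert(arr != NULL);
    if (arr->last == arr->size) {
        /* Assumed policy: start at 4 slots, then double. */
        unsigned int new_size = arr->size ? arr->size * 2 : 4;
        toy_op *grown = realloc(arr->opcodes, new_size * sizeof *grown);
        if (grown == NULL) return NULL;
        arr->opcodes = grown;
        arr->size = new_size;
    }
    op = &arr->opcodes[arr->last++];
    op->opcode = opcode;
    op->op1 = op->op2 = op->result = 0;
    op->lineno = lineno;
    return op;
}
```

A compiler front end would start from a zero-initialized toy_op_array, call toy_emit once per instruction, fill in the operands on the returned op, and free the opcodes buffer when the array is no longer needed.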

The opcodes are stored in the op_array and, at run time, are executed by the following execute function:

ZEND_API void execute(zend_op_array *op_array TSRMLS_DC)
{
    // ... loop over the opcodes in op_array, or go on to execute the opcodes of another op_array
}

As mentioned above, each opcode has an opcode_handler_t function pointer field that is used to execute it.

PHP can dispatch opcodes in three ways: CALL, SWITCH, and GOTO. PHP uses CALL by default, that is, dispatch by function call. Because every PHP program executes opcodes constantly, SWITCH or GOTO dispatch can be used instead. GOTO is generally more efficient, but whether it actually improves performance depends on the CPU.
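
The difference between CALL and SWITCH dispatch can be sketched with a toy virtual machine in C. Everything here (toy_op, toy_vm, the TOY_* opcodes, and the sample program) is hypothetical and much simpler than the Zend executor. GOTO dispatch is left out because it relies on a compiler extension (computed goto) rather than standard C.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical mini instruction set. */
enum { TOY_ADD, TOY_MUL, TOY_RETURN };

typedef struct toy_vm toy_vm;
typedef struct toy_op toy_op;
typedef void (*toy_handler)(toy_vm *vm, const toy_op *op);

struct toy_op {
    toy_handler handler;   /* used by CALL dispatch */
    int         opcode;    /* used by SWITCH dispatch */
    long        operand;
};

struct toy_vm { long acc; int running; };

static void do_add(toy_vm *vm, const toy_op *op)    { vm->acc += op->operand; }
static void do_mul(toy_vm *vm, const toy_op *op)    { vm->acc *= op->operand; }
static void do_return(toy_vm *vm, const toy_op *op) { (void)op; vm->running = 0; }

/* CALL dispatch: each op carries a function pointer that is invoked directly. */
long toy_execute_call(const toy_op *ops)
{
    toy_vm vm = { 0, 1 };

    assert(ops != NULL);
    while (vm.running) {
        ops->handler(&vm, ops);
        ops++;
    }
    return vm.acc;
}

/* SWITCH dispatch: one switch on the opcode number, with no indirect call. */
long toy_execute_switch(const toy_op *ops)
{
    long acc = 0;

    assert(ops != NULL);
    for (;; ops++) {
        switch (ops->opcode) {
        case TOY_ADD:    acc += ops->operand; break;
        case TOY_MUL:    acc *= ops->operand; break;
        case TOY_RETURN: return acc;
        default:         return acc;   /* unknown opcode: stop */
        }
    }
}

/* Fill a caller-supplied buffer of 4 ops with the program  acc = (0 + 1 + 1) * 3. */
void toy_sample_program(toy_op *ops)
{
    ops[0].handler = do_add;    ops[0].opcode = TOY_ADD;    ops[0].operand = 1;
    ops[1].handler = do_add;    ops[1].opcode = TOY_ADD;    ops[1].operand = 1;
    ops[2].handler = do_mul;    ops[2].opcode = TOY_MUL;    ops[2].operand = 3;
    ops[3].handler = do_return; ops[3].opcode = TOY_RETURN; ops[3].operand = 0;
}
```

Both executors run the same op sequence and produce the same result; they differ only in how the next handler is reached, which is why the choice affects speed but not behavior.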

For the example above, our PHP code is compiled into:

* ZEND_ECHO     'Hello World%21'
* ZEND_ADD       ~0 1 1
* ZEND_ASSIGN  !0 ~0
* ZEND_ECHO     !0
* ZEND_RETURN  1

You may ask: where did our $a go? To answer that, we need to look at operands. Each operand consists of the following two parts:

A) op_type: one of IS_CONST, IS_TMP_VAR, IS_VAR, IS_UNUSED, or IS_CV.
B) u: a union that, depending on op_type, stores the operand's value (const) or lvalue (var) in the appropriate type.

As for var operands, each kind is different.

IS_TMP_VAR: as the name suggests, this is a temporary variable that holds the result of one opcode for a following opcode to use. The u of such an operand stores a handle (an integer) into the variable table. These operands usually start with ~; for example, ~0 denotes temporary variable number 0 in the variable table.

IS_VAR: this is a variable in the usual sense. These start with $.

IS_CV: this is a cache mechanism used by the compiler since ZE2.1/PHP 5.1. Such an operand stores the address of the variable it refers to. The first time a variable is referenced, it is cached as a CV, so later references to it do not need to search the active symbol table again. CV operands start with !.

So in our example, $a has been optimized into !0.
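
The caching idea behind CV operands can be sketched in C as follows. This is a simplified model under stated assumptions, not the engine's data layout: toy_frame, toy_symbol, toy_lookup, and toy_cv_fetch are hypothetical names, the symbol table is a small linear array, and values are plain integers. The lookups counter exists only to make the effect of the cache visible.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_MAX_VARS 8

/* Hypothetical symbol table entry: a variable name and an integer value. */
typedef struct { const char *name; long value; } toy_symbol;

typedef struct {
    toy_symbol  symbols[TOY_MAX_VARS];   /* stands in for the active symbol table */
    size_t      symbol_count;
    const char *cv_names[TOY_MAX_VARS];  /* names of compiled variables !0, !1, ... */
    long       *cv_cache[TOY_MAX_VARS];  /* cached addresses, NULL until first use */
    unsigned    lookups;                 /* number of by-name searches performed */
} toy_frame;

/* Slow path: search the symbol table by name, creating the entry if missing. */
static long *toy_lookup(toy_frame *f, const char *name)
{
    size_t i;

    f->lookups++;
    for (i = 0; i < f->symbol_count; i++) {
        if (strcmp(f->symbols[i].name, name) == 0)
            return &f->symbols[i].value;
    }
    assert(f->symbol_count < TOY_MAX_VARS);
    f->symbols[f->symbol_count].name = name;
    f->symbols[f->symbol_count].value = 0;
    return &f->symbols[f->symbol_count++].value;
}

/* Fetch compiled variable !index: a by-name lookup the first time,
 * the cached address on every later call. */
long *toy_cv_fetch(toy_frame *f, size_t index)
{
    assert(f != NULL && index < TOY_MAX_VARS && f->cv_names[index] != NULL);
    if (f->cv_cache[index] == NULL)
        f->cv_cache[index] = toy_lookup(f, f->cv_names[index]);
    return f->cv_cache[index];
}
```

With a zero-initialized frame whose cv_names[0] is "a", the first toy_cv_fetch(&f, 0) searches the symbol table, and every later fetch of slot 0 returns the stored address without searching again.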
For example, the VLD extension can be used to view these opcodes.
(Note: laruence's blog post is several years old. Although this article largely follows it, PHP has changed a lot since then, so don't be surprised if the results and descriptions here differ from those in his post. All results here were actually run and verified by me, on PHP 5.4.)
TIPI: http://www.php-internals.com/





