[PHP] An in-depth understanding of the principles of PHP opcodes, by laruence (http://www.laruence.com/)
Original address: http://www.laruence.com/2008/06/18/221.html
Recently I wanted to give a talk to my Yahoo colleagues about the internal mechanism by which PHP and Apache process requests. I had just written up some notes on opcodes, so I sent those to them. This article is based on Sara Golemon's "Understanding Opcodes".
An opcode is the intermediate language that a PHP script is compiled into, much like bytecode in Java or MSIL in .NET. For example, suppose you wrote the following PHP code:
<?php
echo "Hello World";
$a = 1 + 1;
echo $a;
PHP (more precisely, its language engine, Zend) executes this code in the following four steps:
1. Scanning (lexing): converts the PHP source code into a sequence of language fragments (tokens).
2. Parsing: converts the tokens into simple, meaningful expressions.
3. Compilation: compiles the expressions into opcodes.
4. Execution: executes the opcodes one at a time, in order, to carry out what the PHP script does.
Note: opcode caches such as APC let PHP cache the compiled opcodes. When a request comes in, the first three steps do not have to be repeated, which greatly improves PHP's execution speed.
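The idea behind the note can be sketched in a few lines. This is only a toy model, not how APC is implemented: ToyOpcodeCache and opsFor are hypothetical names, and the "compilation" here is just splitting the source on semicolons as a stand-in for steps 1 to 3.

```php
<?php
// Toy stand-in for an opcode cache. Nothing here is a real APC or PHP API.
final class ToyOpcodeCache
{
    private $compiled = [];     // script path => list of "ops"
    public $compileCount = 0;   // how many times steps 1-3 actually ran

    public function opsFor(string $path, string $source): array
    {
        if (!isset($this->compiled[$path])) {
            // Cache miss: pretend to scan, parse and compile the script.
            $this->compileCount++;
            $this->compiled[$path] = preg_split(
                '/\s*;\s*/', trim($source), -1, PREG_SPLIT_NO_EMPTY
            );
        }
        // Cache hit: go straight to the stored result.
        return $this->compiled[$path];
    }
}

$cache  = new ToyOpcodeCache();
$source = 'echo "Hello World"; $a = 1 + 1; echo $a;';

$first  = $cache->opsFor('hello.php', $source);  // compiles
$second = $cache->opsFor('hello.php', $source);  // served from the cache

echo count($first), ' ops, compiled ', $cache->compileCount, " time(s)\n";
```

The second call returns the same list without doing the "compile" work again, which is the saving the note describes.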
So what is lexing? Anyone who has studied compiler principles will know something about the lexical analysis step. Lex is a rules table for lexical analysis: Zend/zend_language_scanner.c performs lexical analysis on the input PHP code according to Zend/zend_language_scanner.l (the Lex file) and produces the "words" (tokens) one by one. Since PHP 4.2 there has been a function called token_get_all, which scans a piece of PHP code into tokens.
If you run this function on the PHP code from the beginning of this article, you get the following result:
Array
(
    [0] => Array ( [0] => 367 [1] => <?php )
    [1] => Array ( [0] => 316 [1] => echo )
    [2] => Array ( [0] => 370 [1] => )
    [3] => Array ( [0] => 315 [1] => "Hello World" )
    [4] => ;
    [5] => Array ( [0] => 370 [1] => )
    [6] => =
    [7] => Array ( [0] => 370 [1] => )
    [8] => Array ( [0] => 305 [1] => 1 )
    [9] => Array ( [0] => 370 [1] => )
    [10] => +
    [11] => Array ( [0] => 370 [1] => )
    [12] => Array ( [0] => 305 [1] => 1 )
    [13] => ;
    [14] => Array ( [0] => 370 [1] => )
    [15] => Array ( [0] => 316 [1] => echo )
    [16] => Array ( [0] => 370 [1] => )
    [17] => ;
)
Looking at the result, we can see that every string, character, and space in the source code comes back unchanged, in the same order in which it appears in the source. Single characters such as ; and = are returned as plain strings. Everything else, such as tags, keywords, and literals, is converted into an array with two parts: the token ID (the code Zend uses internally for that kind of token, for example T_ECHO or T_STRING) and the original text from the source code.
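To reproduce a listing like this yourself, here is a minimal sketch using token_get_all together with token_name, which turns a numeric token ID into its symbolic name. The numeric IDs differ between PHP versions, so printing the names is more portable than comparing against 316 or 370. The $lines variable and the CHAR label are just choices made for this example.

```php
<?php
// The script from the start of the article, including the opening tag:
// without it, token_get_all() would treat the whole text as inline HTML.
$source = '<?php echo "Hello World"; $a = 1 + 1; echo $a;';

$lines = [];
foreach (token_get_all($source) as $token) {
    if (is_array($token)) {
        // Array tokens carry the token ID at index 0 and the source text at index 1.
        $lines[] = token_name($token[0]) . ' ' . var_export($token[1], true);
    } else {
        // Single-character tokens such as ; = + come back as plain strings.
        $lines[] = 'CHAR ' . var_export($token, true);
    }
}

echo implode("\n", $lines), "\n";
```

Each output line shows either a named token with its original text or a single raw character, which makes the two shapes of token described above easy to tell apart.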
Next comes the parsing phase. Parsing first discards the superfluous whitespace in the token array, and then converts the remaining tokens, one by one, into simple expressions:
1. echo a constant string
2. add two numbers together
3. store the result of the prior expression to a variable
4. echo a variable
After that comes the compilation phase, which compiles the tokens into op_arrays. Each op in an op_array contains the following five parts:
1. The opcode number, which identifies the operation type of the op, such as add or echo.
2. The result node, which stores the result of the opcode.
3. Operand 1, the first operand given to the opcode.
4. Operand 2, the second operand.
5. The extended value, an integer used to distinguish overloaded operators.
For example, our PHP code will be compiled into:
ZEND_ECHO     'Hello World'
ZEND_ADD      ~0 1 1
ZEND_ASSIGN   !0 ~0
ZEND_ECHO     !0
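To see how such a listing gets executed one op at a time, here is a toy interpreter for exactly these four ops. It is a sketch under simplifying assumptions: real ops are C structs inside the Zend engine, while here each op is a plain array of opcode name, result, operand 1, and operand 2, and the operand kinds 'const', 'tmp', and 'cv' as well as the function run_ops are names invented for this example.

```php
<?php
// Each op is [opcode, result, op1, op2]; each operand is [kind, value].
// 'tmp' 0 plays the role of ~0 and 'cv' 0 the role of !0.
$ops = [
    ['ZEND_ECHO',   null,       ['const', 'Hello World'], null],
    ['ZEND_ADD',    ['tmp', 0], ['const', 1],             ['const', 1]],
    ['ZEND_ASSIGN', null,       ['cv', 0],                ['tmp', 0]],
    ['ZEND_ECHO',   null,       ['cv', 0],                null],
];

function run_ops(array $ops): string
{
    $slots  = ['tmp' => [], 'cv' => []];  // storage for ~N and !N
    $output = '';

    // Read an operand: constants carry their value, the rest name a slot.
    $read = function (array $operand) use (&$slots) {
        [$kind, $value] = $operand;
        return $kind === 'const' ? $value : $slots[$kind][$value];
    };

    foreach ($ops as [$opcode, $result, $op1, $op2]) {
        switch ($opcode) {
            case 'ZEND_ECHO':
                $output .= $read($op1);
                break;
            case 'ZEND_ADD':
                $slots[$result[0]][$result[1]] = $read($op1) + $read($op2);
                break;
            case 'ZEND_ASSIGN':
                $slots[$op1[0]][$op1[1]] = $read($op2);
                break;
            default:
                throw new RuntimeException("unknown opcode $opcode");
        }
    }
    return $output;
}

echo run_ops($ops), "\n";
```

Running it prints Hello World2, the same output the original script produces, because ZEND_ADD leaves 2 in the temporary slot, ZEND_ASSIGN copies it into the variable slot, and the final ZEND_ECHO reads it back.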
You may ask: where did our $a go?
To answer that, we need to look at operands. Each operand consists of the following two parts:
A) op_type: IS_CONST, IS_TMP_VAR, IS_VAR, IS_UNUSED, or IS_CV
B) u: a union that, depending on op_type, stores the operand's value (const) or lvalue (var) in the appropriate type.
As for the var kinds, each one is different.
IS_TMP_VAR, as the name suggests, is a temporary variable that holds the result of one op for use by a later op. The u of such an operand stores a handle (an integer) pointing into the variable table. This kind of operand is conventionally written with a ~ prefix; for example, ~0 denotes the unnamed temporary variable in slot 0 of the variable table.
IS_VAR is a variable in the ordinary sense. These are written with a $ prefix.
IS_CV (compiled variable) denotes a caching mechanism used by the compiler since ZE 2.1/PHP 5.1. An operand of this kind stores the address of the variable it refers to: the first time a variable is referenced, it is cached in a CV slot, so later references to the same variable no longer need to look it up in the active symbol table. CV operands are written with a ! prefix.
So in our example, $a has been optimized into !0.
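The saving that a CV brings can be illustrated with a small model. This is a sketch of the idea as described above, not the engine's actual memory layout: ToyFrame, $cvNames, and the $lookups counter are hypothetical names, and a PHP reference stands in for the stored address.

```php
<?php
// Toy model of compiled-variable caching, for illustration only.
final class ToyFrame
{
    private $symbols = [];  // stand-in for the active symbol table (name => value)
    private $cv = [];       // CV slots: slot number => reference into $symbols
    private $cvNames;       // slot number => variable name, fixed at compile time
    public $lookups = 0;    // how many by-name symbol table lookups happened

    public function __construct(array $cvNames)
    {
        $this->cvNames = $cvNames;
    }

    private function resolve(int $slot): void
    {
        if (array_key_exists($slot, $this->cv)) {
            return;                        // already cached: no lookup needed
        }
        $this->lookups++;                  // first reference: one lookup by name
        $name = $this->cvNames[$slot];
        if (!array_key_exists($name, $this->symbols)) {
            $this->symbols[$name] = null;  // create the variable on first use
        }
        $this->cv[$slot] = &$this->symbols[$name];
    }

    public function set(int $slot, $value): void
    {
        $this->resolve($slot);
        $this->cv[$slot] = $value;         // writes through to the symbol table
    }

    public function get(int $slot)
    {
        $this->resolve($slot);
        return $this->cv[$slot];
    }
}

$frame = new ToyFrame([0 => 'a']);  // !0 stands for $a
$frame->set(0, 1 + 1);              // ZEND_ASSIGN !0 ~0
$value = $frame->get(0);            // ZEND_ECHO !0 reads the same slot
echo $value, ' after ', $frame->lookups, " lookup(s)\n";
```

Although $a is touched twice, the by-name lookup happens only once; the second access goes straight through the cached slot.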