Opcode is the intermediate language that a PHP script is compiled into, like Java's bytecode or .NET's MSIL.
For example, suppose you write the following PHP code:
<?php
echo "Hello World";
$a = 1 + 1;
echo $a;
?>
PHP executes this code in the following 4 steps (to be exact, it is Zend, PHP's language engine, that does so):
- Scanning (lexing): converts the PHP code into language fragments (tokens)
- Parsing: converts the tokens into simple, meaningful expressions
- Compilation: compiles the expressions into opcodes
- Execution: executes the opcodes sequentially, one at a time, thereby carrying out what the PHP script does
Some caches, such as APC, let PHP cache the opcodes, so that the first 3 steps do not have to be repeated on every request, which can greatly improve PHP's execution speed.
So what is lexing?
Readers who have studied compiler principles should be familiar with the lexical analysis step. Lex is a table-driven lexical analyzer generator.
zend/zend_language_scanner.c, which is generated from zend/zend_language_scanner.l (a Lex file), performs lexical analysis on the input PHP code to obtain individual "words" (tokens).
Starting with PHP 4.2, a function called token_get_all is available; it scans a piece of PHP code into tokens.
If we use this function to process the PHP code shown at the beginning, we get the following result:
Array
(
    [0] => Array
        (
            [0] => 367
            [1] => <?php
        )
    [1] => Array
        (
            [0] => 316
            [1] => echo
        )
    [2] => Array
        (
            [0] => 370
            [1] =>
        )
    [3] => Array
        (
            [0] => 315
            [1] => "Hello World"
        )
    [4] => ;
    [5] => Array
        (
            [0] => 370
            [1] =>
        )
    [6] => =
    [7] => Array
        (
            [0] => 370
            [1] =>
        )
    [8] => Array
        (
            [0] => 305
            [1] => 1
        )
    [9] => Array
        (
            [0] => 370
            [1] =>
        )
    [10] => +
    [11] => Array
        (
            [0] => 370
            [1] =>
        )
    [12] => Array
        (
            [0] => 305
            [1] => 1
        )
    [13] => ;
    [14] => Array
        (
            [0] => 370
            [1] =>
        )
    [15] => Array
        (
            [0] => 316
            [1] => echo
        )
    [16] => Array
        (
            [0] => 370
            [1] =>
        )
    [17] => ;
)
Analyzing the returned result, we find that the strings, characters, and whitespace in the source are all returned as is, and every character of the source code appears in its original order. Other items, such as tags, operators, and statements, are converted into a two-element array: the token ID (the code Zend uses internally for that token, such as T_ECHO or T_STRING) and the original content from the source.
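Numeric token IDs differ between PHP versions, so it can help to print token names instead. The following is a minimal sketch, assuming the tokenizer extension is enabled; describe_tokens is a hypothetical helper name, not part of PHP, and the "CHAR" label is just an illustrative marker for single-character tokens.

```php
<?php
// The source is kept in a single-quoted string so that $a is not interpolated.
$source = '<?php
echo "Hello World";
$a = 1 + 1;
echo $a;
?>';

// Hypothetical helper: returns one readable line per token.
function describe_tokens(string $source): array
{
    $lines = [];
    foreach (token_get_all($source) as $token) {
        if (is_array($token)) {
            // [0] is the token ID, [1] is the original source text.
            $lines[] = token_name($token[0]) . ' ' . json_encode($token[1]);
        } else {
            // Single-character tokens such as ";" or "=" come back as plain strings.
            $lines[] = 'CHAR ' . json_encode($token);
        }
    }
    return $lines;
}

echo implode("\n", describe_tokens($source)), "\n";
```

Using token_name keeps the output readable even when the numeric IDs are not the same as the ones listed above.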
Next comes the parsing stage. Parsing first discards the extra whitespace in the token array, and then converts the remaining tokens into simple expressions, one by one:
- echo a constant string
- add two numbers together
- store the result of the previous expression in a variable
- echo a variable
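The whitespace-discarding part of this stage can be imitated in userland. This is only an illustration built on token_get_all, not how the Zend engine does it internally; significant_tokens is a hypothetical helper name.

```php
<?php
// Hypothetical helper: returns the source text of every token
// except whitespace, in source order.
function significant_tokens(string $source): array
{
    $kept = [];
    foreach (token_get_all($source) as $token) {
        if (is_array($token) && $token[0] === T_WHITESPACE) {
            continue; // drop whitespace, as the parsing stage does
        }
        // Array tokens carry their text in [1]; single characters are plain strings.
        $kept[] = is_array($token) ? $token[1] : $token;
    }
    return $kept;
}

print_r(significant_tokens('<?php $a = 1 + 1;'));
```

What remains is the sequence the expressions above are built from: a variable, an assignment, two numbers, and an addition.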
Then comes the compilation stage, which compiles the tokens into an op_array. Each entry in the op_array contains the following 5 parts:
- Opcode number: identifies the type of operation of the entry, such as add or echo
- Result: stores the result of the opcode
- Operand 1: the first operand of the opcode
- Operand 2: the second operand of the opcode
- Extended value: an integer used to distinguish overloaded operators
For example, our PHP code will be parsed into:
* ZEND_ECHO 'Hello World'
* ZEND_ADD ~0 1 1
* ZEND_ASSIGN !0 ~0
* ZEND_ECHO !0
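To make the execution step concrete, here is a toy interpreter for these four opcodes. It is a sketch under stated assumptions: the array layout, the operand tags 'const', 'tmp' and 'cv', and the functions op and run_ops are all hypothetical and only mirror the five parts listed above; the real engine uses C structures.

```php
<?php
// Hypothetical constructor for one op_array entry (the five parts).
function op(string $opcode, ?array $result, ?array $op1, ?array $op2 = null): array
{
    return ['opcode' => $opcode, 'result' => $result,
            'op1' => $op1, 'op2' => $op2, 'extended_value' => 0];
}

// Operands are written as ['const', value], ['tmp', n] (~n) or ['cv', n] (!n).
$ops = [
    op('ZEND_ECHO',   null,       ['const', 'Hello World']),
    op('ZEND_ADD',    ['tmp', 0], ['const', 1], ['const', 1]),
    op('ZEND_ASSIGN', null,       ['cv', 0],    ['tmp', 0]),
    op('ZEND_ECHO',   null,       ['cv', 0]),
];

// Executes the ops one at a time and returns everything that was echoed.
function run_ops(array $ops): string
{
    $tmp = []; // temporary variables (~n)
    $cv  = []; // compiled variables (!n)
    $out = '';
    $read = function (array $operand) use (&$tmp, &$cv) {
        [$type, $value] = $operand;
        switch ($type) {
            case 'const': return $value;
            case 'tmp':   return $tmp[$value];
            case 'cv':    return $cv[$value];
        }
        throw new InvalidArgumentException("unknown operand type: $type");
    };
    foreach ($ops as $op) {
        switch ($op['opcode']) {
            case 'ZEND_ECHO':
                $out .= $read($op['op1']);
                break;
            case 'ZEND_ADD':
                $tmp[$op['result'][1]] = $read($op['op1']) + $read($op['op2']);
                break;
            case 'ZEND_ASSIGN':
                $cv[$op['op1'][1]] = $read($op['op2']);
                break;
            default:
                throw new InvalidArgumentException("unknown opcode: {$op['opcode']}");
        }
    }
    return $out;
}

echo run_ops($ops), "\n"; // prints "Hello World2"
```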
You might ask: where is our $a?
To answer that, we need to look at operands. Each operand consists of the following two parts:
a) op_type: one of IS_CONST, IS_TMP_VAR, IS_VAR, IS_UNUSED, or IS_CV
b) u: a union that holds the value (const) or the lvalue (var) of this operand, in a different type depending on op_type.
As for var, each kind of var is different.
IS_TMP_VAR: as the name implies, this is a temporary variable that holds the result of one op so that a later op can use it. The u of this operand holds a handle (an integer) pointing into the variable table. Such operands are generally written with a ~ prefix; for example, ~0 denotes temporary variable number 0 in the variable table.
IS_VAR: this is a variable in the general sense; these are written with a $ prefix.
IS_CV: this is a cache mechanism used by the compiler since ZE2.1/PHP5.1. A variable of this kind holds the address of the variable it refers to. When a variable is referenced for the first time, it is cached as a CV, and later references to it no longer need to look it up in the active symbol table. CV variables are written with a ! prefix.
So it seems that our $a has been optimized to !0.
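The mapping from $a to !0 can be imitated with a small sketch. It assumes that slots are numbered in order of first appearance, which is an assumption of this example; cv_slots is a hypothetical helper built on token_get_all, not an engine API.

```php
<?php
// Hypothetical helper: assigns a "!n" slot to each variable name,
// numbered in order of first appearance in the source.
function cv_slots(string $source): array
{
    $slots = [];
    foreach (token_get_all($source) as $token) {
        if (is_array($token) && $token[0] === T_VARIABLE
            && !isset($slots[$token[1]])) {
            // First reference: cache the name under the next free slot.
            $slots[$token[1]] = '!' . count($slots);
        }
        // Later references reuse the cached slot; no new lookup is needed.
    }
    return $slots;
}

print_r(cv_slots('<?php $a = 1 + 1; echo $a; $b = $a;'));
```

With this input, $a maps to !0 and $b to !1, matching the !0 seen in the opcodes for our script.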