PHP kernel exploration: the Zend virtual machine
We have already learned that executing a PHP file on the server side consists of two major phases:
- The file is handed to the php program for execution. After the php program completes its basic preparation, it starts the PHP and Zend engines and loads the registered extension modules.
- After initialization, the script file is read. The Zend engine performs lexical analysis and syntax analysis on it, then compiles it into opcodes and executes them. If an opcode cache such as APC is installed, the compilation step may be skipped and the opcodes are read directly from the cache for execution.
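The second phase, including the opcode cache shortcut, can be sketched roughly as follows. This is a toy Python model rather than engine code: `run_script`, `fake_compile`, and the dictionary cache are hypothetical names introduced for illustration, and a real cache such as APC also has to handle invalidation, which is ignored here.

```python
from typing import Callable, Dict, List

Opcodes = List[str]


def run_script(path: str, source: str, cache: Dict[str, Opcodes],
               compile_fn: Callable[[str], Opcodes],
               execute_fn: Callable[[Opcodes], None]) -> bool:
    """Compile (unless cached) and execute; return True if compilation was skipped."""
    cached = path in cache
    if not cached:
        # Lexical analysis, syntax analysis and opcode generation happen here.
        cache[path] = compile_fn(source)
    execute_fn(cache[path])
    return cached


compile_calls: List[str] = []   # records every real compilation
executed: List[Opcodes] = []    # records every execution


def fake_compile(source: str) -> Opcodes:
    """Hypothetical compiler stub: always yields the same two opcodes."""
    compile_calls.append(source)
    return ["ECHO", "RETURN"]


shared_cache: Dict[str, Opcodes] = {}
first = run_script("a.php", "<?php echo 1;", shared_cache, fake_compile, executed.append)
second = run_script("a.php", "<?php echo 1;", shared_cache, fake_compile, executed.append)
```

In this model the second request executes the same opcodes without calling the compiler again, which is the saving an opcode cache is meant to provide.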
In the second phase, lexical analysis, syntax analysis, intermediate code compilation, and intermediate code execution are collectively referred to as the Zend virtual machine. Unlike languages such as Java and C#, which require an explicit compilation step, PHP does not need to be compiled manually before it runs, so we call it an interpreted language. Java has its own Java virtual machine, which implements one language across multiple platforms; C# has its own .NET virtual machine, which implements multiple languages on a single platform. PHP, like them, has its own Zend virtual machine. They are essentially the same: all of them are abstract computers. Each is written in a lower-level language to abstract another language, and each has its own instruction set and its own memory management system. They ultimately translate a language at a higher level of abstraction into one at a lower level, and provide auxiliary facilities such as memory management and garbage collection, so that programmers spend less effort on implementation details and can invest more time and energy in business logic.
In terms of abstraction level, the Zend virtual machine sits higher than those of Java and similar languages. "Higher" here does not mean more powerful or more efficient; it simply means that the Zend virtual machine is farther away from the real machine. Over the years, language development has consisted of ever more abstraction and ever more distance from the machine, without any fundamental change.
Here we discuss the implementation principles and key data structures of the Zend virtual machine, starting from the past and present of virtual machines, and then present a syntax implementation example and a description of the source code encryption and decryption process.
Wikipedia defines a virtual machine as follows: in computer architecture, a virtual machine is a special piece of software that creates an environment between the computer platform and the end user, and the end user operates software within the environment it creates. In computer science, a virtual machine is a software implementation of a computer that can run programs like a real machine.
A virtual machine is an abstract computer that has its own instruction set and its own memory management system. Languages implemented on such a virtual machine are clearer and easier to learn than languages at a lower level of abstraction.
How is a PHP file parsed? What intermediate code is generated, how does it map to the actual PHP code, and how is it executed? What intermediate data is stored during execution? Can the virtual machine as a whole be optimized, and if so, how?
Zend virtual machine architecture
Abstracting the implementation of the Zend virtual machine at the conceptual level, we can divide its architecture into the interpretation layer, the execution engine, and the intermediate data layer.
[Figure: Zend virtual machine architecture]
When a piece of PHP code enters the Zend virtual machine, it is processed in two steps: compilation and execution. For an interpreted language this is a creative move, but the current implementation does not go all the way. Although the code passes through these two steps, in a normal run they happen back to back. In other words, PHP does not, as Java does, produce an intermediate file that stores the compiled result. Repeating the compilation on every execution is a heavy performance cost for PHP scripts. Caching solutions such as APC and eAccelerator exist, but they do not change the fundamental design: the two steps cannot be separated and developed independently.
Interpretation layer
The interpretation layer is where the Zend virtual machine carries out compilation. It consists of three parts: lexical analysis, syntax analysis, and intermediate code compilation. Lexical analysis strips whitespace and comments from the PHP source file to be executed, splits it into tokens, and handles the program's hierarchical structure.
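As a rough illustration of the lexical step, the sketch below splits a tiny PHP-like snippet into tokens while dropping whitespace and comments. It is a hand-written Python model, not the Zend lexer: the token names and the `tokenize` function are hypothetical, and only a small subset of the syntax is recognized.

```python
import re

# Each entry is (token name, regular expression). The order matters:
# comments must be tried before the "/" symbol.
TOKEN_SPEC = [
    ("COMMENT",    r"//[^\n]*|#[^\n]*|/\*.*?\*/"),
    ("WHITESPACE", r"\s+"),
    ("VARIABLE",   r"\$[A-Za-z_]\w*"),
    ("NUMBER",     r"\d+"),
    ("NAME",       r"[A-Za-z_]\w*"),
    ("SYMBOL",     r"[=+\-*/;()]"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC),
                    re.DOTALL)


def tokenize(code):
    """Return a list of (kind, text) pairs, skipping whitespace and comments."""
    tokens = []
    pos = 0
    while pos < len(code):
        match = MASTER.match(code, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {code[pos]!r} at offset {pos}")
        if match.lastgroup not in ("COMMENT", "WHITESPACE"):
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens
```

Calling `tokenize("$a = 1 + 2; // sum\necho $a;")` yields nine tokens, starting with `("VARIABLE", "$a")`; the comment and all whitespace are gone.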
Syntax analysis takes the token sequence and executes actions according to the defined grammar rules. The Zend virtual machine currently uses Bison, with the grammar described in BNF.
Intermediate code compilation generates intermediate code from the syntax analysis result, using the opcodes defined by the Zend virtual machine. In PHP 5.3.1, the Zend virtual machine supports 135 instructions (see the Zend/zend_vm_opcodes.h file). Whether it is a simple output statement or a complex recursive call, the Zend virtual machine eventually converts all the PHP code we write into a sequence of these 135 instructions, which the execution engine then executes in order.
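To make the idea of turning a token sequence into an instruction sequence concrete, here is a minimal sketch. It deliberately swaps in a hand-written recursive-descent parser instead of a Bison-generated one, handles only arithmetic expressions, and emits simplified tuples of the form (opcode, op1, op2, result). The opcode names and the `~N` temporaries are illustrative assumptions, not the layout used by the engine.

```python
from typing import List, Tuple

Op = Tuple[str, str, str, str]  # (opcode, op1, op2, result)


def compile_expr(tokens: List[str]) -> Tuple[List[Op], str]:
    """Compile a flat token list such as ["1", "+", "2"] into toy opcodes."""
    ops: List[Op] = []
    pos = 0
    temp_count = 0

    def new_temp() -> str:
        nonlocal temp_count
        name = f"~{temp_count}"
        temp_count += 1
        return name

    def factor() -> str:
        nonlocal pos
        if pos >= len(tokens):
            raise SyntaxError("unexpected end of input")
        tok = tokens[pos]
        pos += 1
        return tok  # a number literal or a $variable name

    def binary(next_level, table) -> str:
        # Left-associative loop shared by the "+ -" and "* /" levels.
        nonlocal pos
        left = next_level()
        while pos < len(tokens) and tokens[pos] in table:
            opcode = table[tokens[pos]]
            pos += 1
            right = next_level()
            result = new_temp()
            ops.append((opcode, left, right, result))
            left = result
        return left

    def term() -> str:
        return binary(factor, {"*": "MUL", "/": "DIV"})

    def expr() -> str:
        return binary(term, {"+": "ADD", "-": "SUB"})

    result = expr()
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]!r}")
    return ops, result
```

For the tokens of `1 + 2 * 3`, the multiplication is emitted first and its temporary result feeds the addition, so operator precedence ends up encoded purely in the order of the instruction sequence.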
Intermediate Data Layer
When the Zend virtual machine executes a piece of PHP code, it needs memory to store many things: the intermediate code, PHP's built-in function list, the user-defined function list, PHP's built-in classes, user-defined classes, constants, objects created by the program, arguments passed to functions or methods, return values, local variables, and the intermediate results of some operations. We call the place where all this data is stored the intermediate data layer.
If PHP is attached to the Apache2 server as a module (mod), some data in the intermediate data layer may be shared by multiple threads, for example PHP's built-in function list. Considering only a single process: when the process is created, the various function lists, class lists, constant lists, and so on are loaded. After the PHP code has been compiled by the interpretation layer, the user-defined functions, classes, and constants are added to those lists; these entries differ only in the values assigned to certain fields of their structures.
When the execution engine executes the generated intermediate code, a new execution data structure (zend_execute_data) is added to the Zend virtual machine's stack. It contains a snapshot of the active symbol table of the current execution, along with some local variables.
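The stack of execution records can be pictured with the following sketch. It is a simplified Python model under stated assumptions: `ExecuteData`, `ExecuteStack`, and every field name are hypothetical stand-ins chosen for illustration and do not reproduce the actual zend_execute_data structure.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class ExecuteData:
    """Hypothetical stand-in for one execution record (not the real C struct)."""
    function_name: str
    opline: int = 0                      # index of the instruction being run
    symbol_table: Dict[str, object] = field(default_factory=dict)
    prev: Optional["ExecuteData"] = None  # record of the caller


class ExecuteStack:
    """Linked stack of execution records; the newest record is `current`."""

    def __init__(self) -> None:
        self.current: Optional[ExecuteData] = None

    def push(self, function_name: str, local_vars: Dict[str, object]) -> ExecuteData:
        # Each call gets its own symbol table, so locals do not leak between calls.
        frame = ExecuteData(function_name, symbol_table=dict(local_vars),
                            prev=self.current)
        self.current = frame
        return frame

    def pop(self) -> ExecuteData:
        if self.current is None:
            raise RuntimeError("execution stack is empty")
        frame = self.current
        self.current = frame.prev  # resume the caller's record
        return frame

    def depth(self) -> int:
        count, frame = 0, self.current
        while frame is not None:
            count += 1
            frame = frame.prev
        return count
```

Pushing a record for a called function and popping it on return leaves the caller's symbol table untouched, which is the property the per-call record is meant to provide in this model.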
Execution Engine
The execution engine of the Zend virtual machine is a very simple implementation: it walks the intermediate code sequence and calls the corresponding handler for each instruction, one step at a time (via EX(opline)). The execution engine has no variable that, like a PC register, stores the next instruction. When an instruction has finished all of its work, the instruction itself moves the sequence pointer forward to the next instruction, and then ends with a return statement. In essence, this is a nested function call.
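The dispatch style described above, where each handler does its work and advances the instruction pointer itself, can be sketched as follows. This is a toy Python model: the opcode names, the tuple layout, the `Frame` class, and the `CONTINUE`/`STOP` status values are all assumptions made for illustration.

```python
CONTINUE, STOP = 0, 1


class Frame:
    """Per-execution state: the instruction list, current position, and variables."""

    def __init__(self, ops):
        self.ops = ops
        self.opline = 0
        self.vars = {}
        self.output = []


def value_of(frame, operand):
    # In this toy encoding, strings name variables and ints are literals.
    return frame.vars[operand] if isinstance(operand, str) else operand


def handle_assign(frame, op):
    _, target, source = op
    frame.vars[target] = value_of(frame, source)
    frame.opline += 1          # the handler itself moves to the next instruction
    return CONTINUE


def handle_add(frame, op):
    _, result, left, right = op
    frame.vars[result] = value_of(frame, left) + value_of(frame, right)
    frame.opline += 1
    return CONTINUE


def handle_echo(frame, op):
    frame.output.append(value_of(frame, op[1]))
    frame.opline += 1
    return CONTINUE


def handle_jmp(frame, op):
    frame.opline = op[1]       # a jump simply picks a different next instruction
    return CONTINUE


def handle_return(frame, op):
    return STOP


HANDLERS = {"ASSIGN": handle_assign, "ADD": handle_add, "ECHO": handle_echo,
            "JMP": handle_jmp, "RETURN": handle_return}


def execute(ops):
    """Call the handler of the current instruction until one reports STOP."""
    frame = Frame(ops)
    while True:
        op = frame.ops[frame.opline]
        if HANDLERS[op[0]](frame, op) == STOP:
            return frame


program = [
    ("ASSIGN", "$a", 1),
    ("ADD", "~0", "$a", 2),
    ("JMP", 4),
    ("ECHO", 0),        # skipped by the jump above
    ("ECHO", "~0"),
    ("RETURN",),
]
```

Note that the loop in `execute` never computes the next instruction; it only reads whatever position the previous handler left behind, which mirrors the absence of a separate program counter described above.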
Back to the questions posed at the beginning: after PHP completes lexical analysis, syntax analysis, and intermediate code generation, the PHP file has been parsed into PHP's intermediate code, opcode. The generated intermediate code does not correspond one-to-one with the actual PHP code. It is generated from the user's PHP code according to PHP's syntax rules and some internal conventions, and it also relies on some global variables to pass and associate data. The intermediate code is executed step by step in the order of the instruction sequence, relying on global variables maintained during execution. Function calls and jumps shift the execution position, but execution eventually returns to the point where it left off.