To understand how PHP works, start with its modules. PHP has three of them: the kernel, the Zend Engine, and the extension layer. The PHP kernel handles requests, file streams, error handling, and related operations. The Zend Engine (ZE) compiles source files into instructions (opcodes) and runs them on its virtual machine. The extension layer is a set of functions, class libraries, and streams that PHP uses to perform specific operations. For example, the MySQL extension is needed to connect to a MySQL database. While ZE executes a program it may need several extensions; each time, ZE hands control to the extension and takes it back once the specific task has been processed.
Finally, ZE returns the results of the program to the PHP kernel, which passes them to the SAPI layer, and they are eventually output to the browser.
PHP is simple to use, but mastering it is not easy. Besides using it, we should also know how it works.
PHP is a dynamic language for web development. More specifically, it is a software framework that implements a large number of components in C. More narrowly still, you can think of it as a powerful UI framework.
Why study PHP's underlying implementation? To use a dynamic language well you first need to understand it; its memory management and framework model are worth learning from; and with that knowledge we can build more powerful features through extension development and optimize the performance of our programs.
1. PHP Design Concepts and Features
Multi-process model: because PHP uses a multi-process model, requests do not interfere with each other, so one request hanging does not affect the overall service. PHP has also long supported a multithreaded model.
Weakly typed language: unlike C++, Java, C#, and similar languages, PHP is weakly typed. The type of a variable is not fixed up front, and implicit or explicit type conversions may happen at run time. This flexibility is convenient and efficient in web development, and is covered in detail in the section on PHP variables below.
The engine (Zend) + component (ext) model reduces internal coupling.
The middle tier (SAPI) isolates the web server from PHP.
The syntax is simple and flexible, with few rules. The downside is a mix of coding styles, but even a poor programmer is unlikely to write a program so outrageous that it does serious harm.
2. PHP four-tier system
The core architecture of PHP is as follows:
As you can see from the diagram, PHP is a 4-tier system from bottom to top:
Zend Engine: implemented entirely in C, Zend is the core of PHP. It translates PHP code into executable opcodes (through lexical analysis, parsing, and the rest of the compilation process) and executes them with the corresponding handlers. It also implements the basic data structures (such as HashTable and OO support), handles memory allocation and management, and provides API functions for external callers. All peripheral functionality is built around Zend.
Extensions: built around the Zend Engine, extensions provide basic services in a component-based way. The common built-in functions (such as the array family), the standard library, and so on are all implemented as extensions. Users can also write their own extensions for added functionality or performance optimization (a PHP middle layer and rich-text parsing are typical applications of extensions).
SAPI: SAPI stands for Server Application Programming Interface. Through a series of hook functions, SAPI lets PHP exchange data with its surroundings. This is a very elegant and successful part of PHP's design: because SAPI decouples PHP itself from the upper-layer application, PHP no longer has to deal with compatibility for each kind of application, and each application can handle things in the way that suits its own characteristics.
Upper application: this is the PHP program we usually write. Different SAPIs give different application modes, such as a web application served through a web server, or a script run from the command line.
If PHP is a car, then the frame of the car is PHP itself, Zend is the engine, the components under ext are the wheels, and SAPI is the road: a car can run on different kinds of roads, and the execution of a PHP program is the car driving on the road. So what we need is an engine with excellent performance, the right wheels, and the right road.
3. SAPI
As mentioned earlier, SAPI provides a series of interfaces through which external applications exchange data with PHP, and each application can implement its own handling according to its characteristics. Some common SAPIs are:
apache2handler: used when Apache is the web server and PHP runs through mod_php. At the time of writing this is the most widely used mode.
cgi: another way for the web server and PHP to interact directly, using the well-known FastCGI protocol. FastCGI + PHP has seen more and more use in recent years, and it is the only mode supported by asynchronous web servers.
cli: the application mode for command-line invocation.
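The idea of a SAPI as a set of hook functions can be sketched in C. This is only a toy illustration and not Zend's real SAPI structure: the names toy_sapi, cli_write, buffer_write, and run_script are hypothetical and exist only for this example.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, simplified SAPI: a named set of hooks the "engine" calls. */
typedef struct toy_sapi {
    const char *name;
    /* Hook the engine uses to emit output; returns the bytes accepted. */
    size_t (*write)(struct toy_sapi *self, const char *data, size_t len);
    char buffer[256];   /* used only by the buffering SAPI */
    size_t used;
} toy_sapi;

/* CLI-style hook: write straight to standard output. */
size_t cli_write(toy_sapi *self, const char *data, size_t len) {
    (void)self;
    return fwrite(data, 1, len, stdout);
}

/* Web-server-style hook: collect output in a response buffer. */
size_t buffer_write(toy_sapi *self, const char *data, size_t len) {
    size_t room = sizeof(self->buffer) - 1 - self->used;
    if (len > room) len = room;          /* truncate if the buffer is full */
    memcpy(self->buffer + self->used, data, len);
    self->used += len;
    self->buffer[self->used] = '\0';
    return len;
}

/* The "engine" only knows the hook, not where the output ends up. */
size_t run_script(toy_sapi *sapi, const char *output) {
    assert(sapi != NULL && sapi->write != NULL);
    return sapi->write(sapi, output, strlen(output));
}
```

The same run_script call works with either hook, which is the decoupling described above: the engine side does not change when the environment does.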
4. PHP Execution Flow & Opcodes
Let's take a look at the process through which PHP code executes.
As the diagram shows, PHP implements a typical dynamic-language execution process: given a piece of code, it goes through lexical analysis, parsing, and other stages, the source program is translated into instructions (opcodes), and the Zend virtual machine then executes those instructions in sequence. PHP itself is implemented in C, so what is ultimately called are C functions; in effect, we can think of PHP as a piece of software written in C.
The core of PHP execution is these translated instructions, that is, opcodes.
An opcode is the most basic unit of PHP program execution. An opcode consists of two operands (op1, op2), a return value, and a handler function. A PHP program is ultimately translated into a sequence of opcode handler functions that are executed in order.
A few common handler functions:
ZEND_ASSIGN_SPEC_CV_CV_HANDLER: variable assignment ($a = $b)
ZEND_DO_FCALL_BY_NAME_SPEC_HANDLER: function call
ZEND_CONCAT_SPEC_CV_CV_HANDLER: string concatenation ($a . $b)
ZEND_ADD_SPEC_CV_CONST_HANDLER: addition ($a + 2)
ZEND_IS_EQUAL_SPEC_CV_CONST: equality check ($a == 1)
ZEND_IS_IDENTICAL_SPEC_CV_CONST: identity check ($a === 1)
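The "operands + result + handler" shape of an opcode can be sketched as a tiny interpreter in C. This is a simplified illustration, not the Zend VM: toy_op, toy_assign, toy_add, and toy_execute are hypothetical names, and variables are plain long slots rather than zvals.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy opcode: two operands, a result slot, and a handler. */
typedef struct toy_op toy_op;
typedef void (*toy_handler)(toy_op *op, long *vars);

struct toy_op {
    toy_handler handler; /* function that carries out the instruction */
    int op1;             /* index of the first operand variable */
    int op2;             /* index of the second operand variable */
    int result;          /* index of the variable receiving the result */
};

/* Handler for assignment: vars[result] = vars[op1]. */
void toy_assign(toy_op *op, long *vars) {
    vars[op->result] = vars[op->op1];
}

/* Handler for addition: vars[result] = vars[op1] + vars[op2]. */
void toy_add(toy_op *op, long *vars) {
    vars[op->result] = vars[op->op1] + vars[op->op2];
}

/* The "virtual machine": run each opcode's handler in order. */
void toy_execute(toy_op *ops, size_t count, long *vars) {
    for (size_t i = 0; i < count; i++) {
        assert(ops[i].handler != NULL);
        ops[i].handler(&ops[i], vars);
    }
}
```

A program is then just an array of toy_op values; running it means calling the handlers one after another, which mirrors the "sequence of handler functions" described above.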
5. HashTable: the Core Data Structure
HashTable is the core data structure of Zend and is used to implement almost all common functionality in PHP. The PHP array is its typical application; in addition, inside Zend, the function symbol table, global variables, and so on are also implemented with hash tables.
PHP's hash table has the following features:
Supports typical key->value lookups
Can be used as an array
Adding and removing nodes is O(1)
Keys support mixed types: associative (string) keys and numeric index keys can coexist in one array
Values support mixed types: array("string", 2332)
Supports linear traversal, such as foreach
The Zend hash table implements a typical hash structure and, by attaching a doubly linked list, also supports forward and backward traversal of the array. Its structure is as follows:
As you can see, the hash table contains both a key->value hash structure and a doubly linked list, which lets it support fast lookup and linear traversal conveniently.
Hash structure: Zend's hash structure is a typical hash table model that resolves collisions with linked lists. Note that the Zend hash table is a self-growing data structure: when the table is full, it dynamically doubles in size and repositions its elements. The initial size is 8. In addition, Zend has optimized key->value lookups by trading space for time. For example, each element stores the key length in a field nKeyLength so that mismatches can be ruled out quickly.
Doubly linked list: the Zend hash table implements linear traversal of its elements through a linked list. In theory a singly linked list would be enough for traversal; the main reason for using a doubly linked list is fast deletion without having to walk the list. The Zend hash table is a composite structure: used as an array, it supports the usual associative arrays, can also be used as a sequentially indexed array, and even allows a mixture of the two.
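The combination of collision chains, doubling growth, and an insertion-order list can be sketched in C as follows. This is a minimal sketch under simplifying assumptions, not Zend's implementation: all toy_* names are hypothetical, values are plain int, only string keys are handled, and deletion and freeing are left out to keep it short.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy bucket: lives in a collision chain and in one insertion-order list. */
typedef struct toy_bucket {
    unsigned long h;              /* cached hash of the key */
    char *key;
    int value;
    struct toy_bucket *next;      /* next bucket in the same slot */
    struct toy_bucket *list_prev; /* previous element in insertion order */
    struct toy_bucket *list_next; /* next element in insertion order */
} toy_bucket;

typedef struct {
    toy_bucket **slots;
    unsigned long mask;           /* table size - 1; size is a power of two */
    size_t count;
    toy_bucket *head, *tail;      /* ends of the insertion-order list */
} toy_table;

/* Simple "times 33" string hash, used here only for illustration. */
unsigned long toy_hash(const char *key) {
    unsigned long h = 5381;
    while (*key) h = h * 33 + (unsigned char)*key++;
    return h;
}

void toy_init(toy_table *t) {
    t->mask = 7;                  /* initial size 8, as described above */
    t->slots = calloc(t->mask + 1, sizeof(toy_bucket *));
    assert(t->slots != NULL);
    t->count = 0;
    t->head = t->tail = NULL;
}

/* Double the slot array and re-link every bucket by walking the list. */
static void toy_grow(toy_table *t) {
    unsigned long size = (t->mask + 1) * 2;
    free(t->slots);
    t->slots = calloc(size, sizeof(toy_bucket *));
    assert(t->slots != NULL);
    t->mask = size - 1;
    for (toy_bucket *p = t->head; p; p = p->list_next) {
        p->next = t->slots[p->h & t->mask];
        t->slots[p->h & t->mask] = p;
    }
}

toy_bucket *toy_find(toy_table *t, const char *key) {
    unsigned long h = toy_hash(key);
    for (toy_bucket *p = t->slots[h & t->mask]; p; p = p->next) {
        if (p->h == h && strcmp(p->key, key) == 0) return p;
    }
    return NULL;
}

void toy_set(toy_table *t, const char *key, int value) {
    toy_bucket *p = toy_find(t, key);
    if (p) { p->value = value; return; }  /* existing key: overwrite */
    if (t->count > t->mask) toy_grow(t);  /* table full: double it */
    p = malloc(sizeof *p);
    assert(p != NULL);
    p->h = toy_hash(key);
    p->key = malloc(strlen(key) + 1);
    assert(p->key != NULL);
    strcpy(p->key, key);
    p->value = value;
    /* Link into the collision chain of its slot. */
    p->next = t->slots[p->h & t->mask];
    t->slots[p->h & t->mask] = p;
    /* Append to the insertion-order list. */
    p->list_prev = t->tail;
    p->list_next = NULL;
    if (t->tail) t->tail->list_next = p; else t->head = p;
    t->tail = p;
    t->count++;
}
```

Walking from head through list_next visits elements in insertion order regardless of which slot they hashed to, which is what a foreach-style traversal relies on; the list_prev pointer is what would make unlinking a known bucket from that list O(1).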
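PHP associative arrays: associative arrays are the typical application of the hash table. A single lookup goes through the following steps (as the code shows, this is an ordinary hash lookup with some quick checks added to speed it up):
    h = hash value of the key;           /* compute the key's hash */
    index = h & nTableMask;              /* map the hash to a bucket slot */
    Bucket *p = arBuckets[index];
    while (p) {                          /* walk the collision chain */
        /* cheap checks first: cached hash and key length */
        if ((p->h == h) && (p->nKeyLength == nKeyLength)) {
            /* only then compare the key bytes themselves */
            if (memcmp(p->arKey, arKey, nKeyLength) == 0) {
                return p->data;
            }
        }
        p = p->next;
    }
    return NULL;                         /* not found */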
PHP indexed arrays: an indexed array is the ordinary array accessed by subscript, for example $arr[0]. Zend HashTable normalizes this internally: an index key is also given a hash value h and an nKeyLength (of 0). The internal member nNextFreeElement tracks the largest index assigned so far and is incremented automatically after each push. It is this normalization that lets PHP mix associative and non-associative keys. Because of how push works, the order of index keys in a PHP array is determined not by the size of the subscript but by the order of insertion. For example: $arr[1] = 2; $arr[2] = 3; For a key of type double, Zend HashTable treats it as an index key.
6. PHP variables
PHP is a weakly typed language and does not strictly distinguish between variable types. You do not need to specify a type when declaring a variable, and PHP may implicitly convert a variable's type while the program runs. As in strongly typed languages, you can also perform explicit type conversions in your program. PHP variables can be divided into simple types (int, string, bool), collection types (array, resource, object), and constants (const). All of these share the same underlying structure, zval.
zval is another very important data structure in Zend, used to represent and implement PHP variables. Its data structure is as follows:
A zval mainly consists of three parts:
type: specifies the type of the variable (integer, string, array, and so on)
refcount & is_ref: used to implement reference counting (described later)
value: the core part, which stores the variable's actual data
zvalue holds the variable's actual data. Because many types have to be stored, zvalue is a union, and this is how weak typing is implemented.
The PHP variable types and their actual storage correspond as follows:
IS_LONG -> lval
IS_DOUBLE -> dval
IS_ARRAY -> ht
IS_STRING -> str
IS_RESOURCE -> lval
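The "type tag plus union" layout described above can be sketched in C. This is a simplified model rather than the real zval definition: toy_zval, toy_value, the TOY_* tags, and the helper functions are hypothetical names, and only three of the types are modeled.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical type tags, loosely modeled on IS_LONG, IS_DOUBLE, IS_STRING. */
typedef enum { TOY_LONG, TOY_DOUBLE, TOY_STRING } toy_type;

/* One storage area shared by all types: only the member that matches
   the type tag is meaningful at any given time. */
typedef union {
    long lval;
    double dval;
    struct { const char *val; size_t len; } str;
} toy_value;

typedef struct {
    toy_value value;       /* the actual data */
    toy_type type;         /* which union member is in use */
    unsigned int refcount; /* how many variables share this value */
    unsigned char is_ref;  /* 1 if bound by reference assignment */
} toy_zval;

toy_zval toy_make_long(long n) {
    toy_zval z = { .type = TOY_LONG, .refcount = 1, .is_ref = 0 };
    z.value.lval = n;
    return z;
}

toy_zval toy_make_double(double d) {
    toy_zval z = { .type = TOY_DOUBLE, .refcount = 1, .is_ref = 0 };
    z.value.dval = d;
    return z;
}

/* Implicit conversion to a floating-point number, driven by the type tag. */
double toy_to_double(const toy_zval *z) {
    assert(z != NULL);
    switch (z->type) {
    case TOY_LONG:   return (double)z->value.lval;
    case TOY_DOUBLE: return z->value.dval;
    default:         return 0.0; /* strings are not parsed in this sketch */
    }
}
```

Because every variable has the same struct type regardless of what it holds, code that receives a toy_zval decides how to read it at run time from the tag, which is the mechanism behind weak typing described above.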
Reference counting is widely used in areas such as memory reclamation and string manipulation, and PHP variables are a typical application of it. A zval's reference count is implemented by the members is_ref and refcount. Through reference counting, multiple variables can share the same data, which avoids the heavy cost of frequent copying.
On assignment, Zend points the variable at the same zval and increments refcount; correspondingly, unset decrements refcount. The zval is actually destroyed only when refcount drops to 0. For a reference assignment, Zend sets is_ref to 1.
PHP variables share data through reference counting, so what happens when one of the variables is changed? When a write is attempted and Zend finds that the zval the variable points to is shared by more than one variable, it makes a copy of the zval with refcount 1 and decrements the refcount of the original. This process is called "zval separation". The copy is made only when a write happens, which is why this is also called copy-on-write.
For reference variables the requirement is the opposite of the non-reference case: variables bound by reference assignment must stay tied together, so modifying one of them modifies all of them.
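Sharing on assignment and separation on write can be sketched in C like this. It is a minimal model of the non-reference case only, assuming a value is just a long; toy_val, toy_var, and the toy_var_* functions are hypothetical names, and is_ref is not modeled.

```c
#include <assert.h>
#include <stdlib.h>

/* A shared value with its reference count. */
typedef struct {
    long value;
    unsigned int refcount;
} toy_val;

/* A "variable" is just a pointer to a possibly shared value. */
typedef struct { toy_val *val; } toy_var;

void toy_var_init(toy_var *v, long n) {
    v->val = malloc(sizeof *v->val);
    assert(v->val != NULL);
    v->val->value = n;
    v->val->refcount = 1;
}

/* unset: decrement the count and destroy the value only at zero. */
void toy_var_unset(toy_var *v) {
    if (v->val && --v->val->refcount == 0) free(v->val);
    v->val = NULL;
}

/* $dst = $src: share the same value and bump the count.
   Assumes dst does not currently hold a value. */
void toy_var_assign(toy_var *dst, const toy_var *src) {
    dst->val = src->val;
    dst->val->refcount++;
}

/* Write: if the value is shared, separate first (copy-on-write). */
void toy_var_write(toy_var *v, long n) {
    if (v->val->refcount > 1) {
        v->val->refcount--;   /* leave the shared value to the others */
        toy_var_init(v, n);   /* private copy with refcount 1 */
        return;
    }
    v->val->value = n;        /* sole owner: modify in place */
}
```

After an assignment both variables point at one value with a count of 2; the first write through either of them splits them apart, and the other variable keeps seeing the old value.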
Integers and floating-point numbers are among the basic types in PHP and are simple variables. Their values are stored directly in zvalue, as long and double respectively.
As the zvalue structure shows, unlike C and other strongly typed languages, PHP does not distinguish between int, unsigned int, long, long long, and so on: there is only one integer type, long. It follows that the range of PHP integers is not fixed but depends on the word size the compiler targets.
Floating-point numbers are similar: PHP does not distinguish between float and double, and there is only the double type.
What happens when an integer goes out of range in PHP? It is automatically converted to double. Be careful here, because many subtle bugs come from this.
Like integers, strings are a basic type and simple variable in PHP. The zvalue structure shows that a PHP string consists of a pointer to the actual data plus a length, similar to string in C++. Because the length is stored in its own field, unlike in C the string can hold binary data (including \0), and getting the string length with strlen is an O(1) operation.
When a string is added to, modified, or appended to, PHP reallocates memory to produce the new string. Finally, for safety, PHP still appends a terminating \0 when it creates a string.
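A pointer-plus-length string can be sketched in C as follows. The names toy_string, toy_string_new, and toy_strlen are hypothetical and only illustrate the layout described above.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* A string stored as pointer + explicit length. */
typedef struct {
    char *val;   /* the bytes, followed by one terminating '\0' */
    size_t len;  /* number of bytes, not counting the terminator */
} toy_string;

/* Copy len bytes, which may include '\0' bytes, and terminate the copy. */
toy_string toy_string_new(const char *bytes, size_t len) {
    toy_string s;
    s.val = malloc(len + 1);
    assert(s.val != NULL);
    memcpy(s.val, bytes, len);
    s.val[len] = '\0';   /* safety terminator for C functions */
    s.len = len;
    return s;
}

/* Length is read from the field: O(1), no scan for '\0'. */
size_t toy_strlen(const toy_string *s) {
    return s->len;
}
```

For the five bytes "ab\0cd", toy_strlen reports 5 while C's own strlen stops at the embedded zero byte and reports 2, which shows why an explicit length is needed for binary data.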
Common string concatenation methods and how their speed compares:
Suppose there are four variables: $strA = '123'; $strB = '456'; $intA = 123; $intB = 456;
The following concatenation methods are compared and explained:
$res = $strA . $strB and $res = "$strA$strB"
In this case Zend mallocs a new block of memory and fills it in; the speed is average.
$strA = $strA . $strB
This is the fastest: Zend reallocs directly on top of the current $strA, avoiding a repeated copy.
$res = $intA . $intB
This is slower because an implicit type conversion is needed; try to avoid it in real programs.
$strA = sprintf("%s%s", $strA, $strB);
This is the slowest, because sprintf is not a language construct in PHP: recognizing and processing the format string takes extra time, and the mechanism itself also mallocs. However, sprintf is the most readable, so in practice choose according to the situation.
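The difference between building a new string and appending in place can be sketched in C. This only illustrates the malloc-versus-realloc distinction made above and makes no claim about actual timings; toy_str, toy_str_from, toy_concat_new, and toy_append are hypothetical names.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct { char *val; size_t len; } toy_str;

/* Build a toy_str from a C string literal. */
toy_str toy_str_from(const char *text) {
    toy_str s;
    s.len = strlen(text);
    s.val = malloc(s.len + 1);
    assert(s.val != NULL);
    memcpy(s.val, text, s.len + 1);
    return s;
}

/* $res = $a . $b: allocate a fresh buffer and copy both operands. */
toy_str toy_concat_new(const toy_str *a, const toy_str *b) {
    toy_str r;
    r.len = a->len + b->len;
    r.val = malloc(r.len + 1);
    assert(r.val != NULL);
    memcpy(r.val, a->val, a->len);
    memcpy(r.val + a->len, b->val, b->len);
    r.val[r.len] = '\0';
    return r;
}

/* $a = $a . $b: grow a's own buffer and copy only b's bytes.
   Assumes a and b are different strings. */
void toy_append(toy_str *a, const toy_str *b) {
    char *grown = realloc(a->val, a->len + b->len + 1);
    assert(grown != NULL);
    memcpy(grown + a->len, b->val, b->len);
    a->len += b->len;
    grown[a->len] = '\0';
    a->val = grown;
}
```

toy_concat_new always copies both operands, while toy_append copies only the second one whenever realloc can extend the existing block, which is the saving the comparison above refers to.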
PHP arrays are, naturally, implemented with the Zend HashTable.
How is foreach implemented? A foreach over an array is done by walking the doubly linked list in the HashTable. For indexed arrays, foreach is much more efficient than a for loop because it skips the key->value lookup. count reads HashTable->nNumOfElements directly, an O(1) operation. A numeric string key such as '123' is converted by Zend to its integer form, so $arr['123'] and $arr[123] are equivalent.
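The key normalization mentioned above can be approximated in C. The sketch below assumes the rule is "a string that is the canonical decimal form of an integer fitting in a long becomes an integer key"; this is a simplification for illustration, the exact edge cases in Zend may differ, and toy_numeric_key is a hypothetical name.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Returns 1 and stores the integer in *out if key is a canonical decimal
   integer such as "123" or "-7"; returns 0 if it should stay a string key
   (for example "0123", "1.5", "-0", "abc", or a value too large for long). */
int toy_numeric_key(const char *key, long *out) {
    assert(key != NULL && out != NULL);
    const char *p = key;
    if (*p == '-') p++;
    if (*p == '\0') return 0;                 /* "" or "-" */
    if (*p == '0' && p[1] != '\0') return 0;  /* leading zero, e.g. "0123" */
    if (p != key && *p == '0') return 0;      /* "-0" is not canonical */
    for (const char *q = p; *q; q++) {
        if (*q < '0' || *q > '9') return 0;   /* any non-digit: string key */
    }
    errno = 0;
    long n = strtol(key, NULL, 10);
    if (errno == ERANGE) return 0;            /* does not fit in a long */
    *out = n;
    return 1;
}
```

With this rule "123" and 123 end up as the same integer key, while "0123" remains a distinct string key.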
The resource type is one of the most complex variable types in PHP and is a composite structure.
PHP's zval can represent a wide range of data types, but it has difficulty describing custom data types adequately. Since there is no effective way to represent these composite structures, traditional operators cannot be used on them either. To solve this, the pointer is simply referred to through an essentially arbitrary identifier (label); this is called a resource.
In a zval, for a resource, lval is used like a pointer that leads to where the resource resides. A resource can be any composite structure: the familiar mysqli, fsock, memcached, and so on are all resources.
How resources are used:
Registration: to use a custom data type as a resource, it must first be registered, and Zend assigns it a globally unique label.
Fetching a resource variable: for resources, Zend maintains a hash table mapping id -> actual data. The zval of a resource records only its id. On fetch, the id is looked up in the hash table and the actual value is returned.
Resource destruction: resource data types vary widely, and Zend itself has no way of destroying them. The user therefore has to supply a destructor function when registering the resource. When a resource is unset, Zend calls that function to destroy it and removes it from the global resource table.
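The register / fetch / destroy cycle can be sketched in C. This is a toy model, not Zend's resource API: it uses a small fixed array indexed by id where Zend uses a hash table, and toy_resource_register, toy_resource_fetch, toy_resource_destroy, and the sample destructor toy_count_dtor are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_RESOURCES 16

typedef void (*toy_dtor)(void *data);

typedef struct {
    void *data;     /* the actual resource, opaque to the engine */
    toy_dtor dtor;  /* destructor supplied at registration time */
    int in_use;
} toy_resource;

static toy_resource toy_resources[TOY_MAX_RESOURCES];
static int toy_next_id = 0;  /* ids are handed out once and never reused */

/* Register a resource and return its id, or -1 if the table is full. */
int toy_resource_register(void *data, toy_dtor dtor) {
    assert(data != NULL);
    if (toy_next_id >= TOY_MAX_RESOURCES) return -1;
    int id = toy_next_id++;
    toy_resources[id].data = data;
    toy_resources[id].dtor = dtor;
    toy_resources[id].in_use = 1;
    return id;
}

/* Look up the actual data behind an id; NULL if unknown or destroyed. */
void *toy_resource_fetch(int id) {
    if (id < 0 || id >= TOY_MAX_RESOURCES || !toy_resources[id].in_use) {
        return NULL;
    }
    return toy_resources[id].data;
}

/* Destroy: call the user's destructor, then drop the table entry. */
void toy_resource_destroy(int id) {
    if (id < 0 || id >= TOY_MAX_RESOURCES || !toy_resources[id].in_use) {
        return;
    }
    if (toy_resources[id].dtor) toy_resources[id].dtor(toy_resources[id].data);
    toy_resources[id].in_use = 0;
    toy_resources[id].data = NULL;
}

/* Sample destructor for demonstration: counts how often it was called. */
void toy_count_dtor(void *data) {
    (*(int *)data)++;
}
```

The variable side only ever holds the integer id; what the data is and how to release it is known only to the code that registered it, which is why the destructor has to be supplied at registration.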
A resource can live for a long time: not only after all variables referencing it have gone out of scope, but even after a request has ended and a new one has begun. Such resources are called persistent resources, because they persist for the whole life cycle of the SAPI unless explicitly destroyed. In many cases persistent resources improve performance to some degree, for example the familiar mysql_pconnect. Persistent resources allocate memory with pemalloc so that it is not freed at the end of the request.
For Zend itself, there is no essential difference between the two kinds of resources.
How are local and global variables implemented in PHP? For a given request, PHP sees two symbol tables at any moment (symbol_table and active_symbol_table). The former maintains global variables. The latter is a pointer to the currently active symbol table: when the program enters a function, Zend allocates a symbol table x for it and points active_symbol_table at x. This is how global and local variables are kept apart.
Getting a variable's value: PHP symbol tables are implemented with hash tables. Each variable is given a unique identifier, and a lookup finds the corresponding zval in the table and returns it.
Using global variables in functions: inside a function, a global variable can be used after explicitly declaring it with global. This creates in active_symbol_table a reference to the variable of the same name in symbol_table; if symbol_table has no variable with that name, it is created first.
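The two-table scheme and the effect of a global declaration can be sketched in C. This is a deliberately small model: tables are fixed arrays searched linearly rather than hash tables, values are plain long, and toy_symtab, toy_lookup, toy_define, and toy_bind_global are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_MAX_VARS 8

/* An entry maps a name to a slot; the slot may live in another table. */
typedef struct { const char *name; long *slot; } toy_entry;

typedef struct {
    toy_entry entries[TOY_MAX_VARS];
    long storage[TOY_MAX_VARS];  /* backing store for variables defined here */
    int count;
} toy_symtab;

/* Find a variable in one table; NULL if it is not visible there. */
long *toy_lookup(toy_symtab *t, const char *name) {
    for (int i = 0; i < t->count; i++) {
        if (strcmp(t->entries[i].name, name) == 0) return t->entries[i].slot;
    }
    return NULL;
}

/* Return the variable's slot, creating it (as 0) if it does not exist. */
long *toy_define(toy_symtab *t, const char *name) {
    long *slot = toy_lookup(t, name);
    if (slot) return slot;
    assert(t->count < TOY_MAX_VARS);
    t->storage[t->count] = 0;
    t->entries[t->count].name = name;
    t->entries[t->count].slot = &t->storage[t->count];
    return t->entries[t->count++].slot;
}

/* "global $name;": make the active table's entry refer to the global
   variable, creating the global first if needed.
   Assumes name is not already present in the active table. */
void toy_bind_global(toy_symtab *active, toy_symtab *globals, const char *name) {
    long *shared = toy_define(globals, name);
    assert(active->count < TOY_MAX_VARS);
    active->entries[active->count].name = name;
    active->entries[active->count].slot = shared;
    active->count++;
}
```

Entering a function corresponds to pointing an "active" pointer at a fresh toy_symtab: names defined there stay local, and only names bound with toy_bind_global read and write the global storage.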