To explain how PHP operates, start with its modules. PHP has three of them: the PHP kernel, the Zend Engine, and the extension layer. The PHP kernel handles requests, file streams, error handling and other related operations. The Zend Engine (ZE) compiles source files into intermediate instructions (opcodes) and then runs them on its virtual machine. The extension layer is a set of functions, class libraries, and streams that PHP uses to perform specific tasks. For example, a MySQL extension is needed to connect to a MySQL database. While ZE executes a program it may need to call on several extensions; each time, ZE hands control to the extension and gets it back once the extension has finished its task.
Finally, ZE returns the program's result to the PHP kernel, which passes it to the SAPI layer, and it is eventually output to the browser.
PHP is easy to pick up, but mastering it is not. Besides knowing how to use it, we also need to know how it works underneath.
PHP is a dynamic language well suited to web development. More specifically, it is a software framework made up of a large number of components written in C. More narrowly still, you can think of it as a powerful UI framework.
Why understand PHP's underlying implementation? To use a dynamic language well you first have to understand it; its memory management and framework model are worth borrowing from; and by developing extensions we can implement more powerful features and optimize the performance of our programs.
1. PHP design concepts and characteristics
Multi-process model: because PHP uses a multi-process model, requests do not interfere with one another, so one request hanging does not affect the service as a whole. PHP has since come to support a multithreaded model as well.
Weakly typed language: unlike C/C++, Java, and C#, PHP is weakly typed. A variable's type is not fixed at the outset; implicit or explicit type conversions may occur at run time. This flexibility is convenient and efficient in web development and is covered in detail in the section on PHP variables.
Engine (Zend) + components (ext) model, which reduces internal coupling.
A middle layer (SAPI) isolates the web server from PHP.
Simple, flexible syntax without many rules. The drawback is a mix of coding styles, but even a poor programmer is unlikely to write something so outrageous that it endangers the whole system.
2. PHP's four-tier system
PHP's core architecture is shown in the following diagram:
As the diagram shows, PHP is a four-tier system, from bottom to top:
Zend Engine: implemented entirely in pure C, Zend is the kernel of PHP. It translates PHP code into executable opcodes (through lexical analysis, parsing and the rest of the compilation process) and executes them with the corresponding handlers. It also implements the basic data structures (such as HashTable and OO support) and memory allocation and management, and it exposes API functions for external callers. It is the core of everything; all peripheral functionality is built around Zend.
Extensions: built around the Zend Engine, extensions provide the basic services as components. The familiar built-in functions (such as the array family), the standard library and so on are all implemented as extensions. Users can also write their own extensions to add functionality or optimize performance (a PHP middle tier or rich-text parsing is a typical use of an extension).
SAPI: the full name is Server Application Programming Interface. Through a set of hook functions, SAPI lets PHP exchange data with the outside world. This is a very elegant and successful part of PHP's design: SAPI decouples and isolates PHP itself from the application above it, so PHP no longer has to consider how to stay compatible with different applications, and each application can implement its own handling according to its characteristics.
Upper-layer application: this is the PHP program we normally write. Different SAPIs give different application modes, such as a web application served through a web server or a script run from the command line.
If PHP is a car, then the car's frame is PHP itself, Zend is the engine, the components under ext are the wheels, and SAPI can be seen as the road, since the car can run on different kinds of road. One execution of a PHP program is the car running on the road. So what we need is a high-performance engine, the right wheels, and the right road.
3. SAPI
As mentioned above, SAPI provides a set of interfaces that let external applications exchange data with PHP, with handling implemented to suit each application's characteristics. Some common SAPIs are:
apache2handler: the handler used when Apache is the web server and PHP runs as mod_php. It is currently the most widely used.
cgi: another way for the web server and PHP to interact directly, using the well-known FastCGI protocol. FastCGI + PHP has been used more and more in recent years, and it is the only way asynchronous web servers support.
cli: the mode for invoking PHP from the command line.
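The hook idea behind SAPI can be sketched in C as a table of function pointers. This is only an illustration under stated assumptions: it is not PHP's real SAPI structure, and the names sapi_hooks, cli_write, buffer_write and engine_echo are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, simplified stand-in for a SAPI module: the engine only
 * knows this table of hooks, never the concrete server behind it. */
typedef struct {
    const char *name;
    /* Hook the engine calls to emit output; returns bytes written. */
    size_t (*write)(const char *data, size_t len);
} sapi_hooks;

/* "Command line" flavour: output goes straight to stdout. */
size_t cli_write(const char *data, size_t len) {
    return fwrite(data, 1, len, stdout);
}

/* "Web server" flavour: output is collected in a response buffer. */
char response_buf[256];
size_t response_len = 0;

size_t buffer_write(const char *data, size_t len) {
    if (len > sizeof(response_buf) - response_len) {
        len = sizeof(response_buf) - response_len;   /* truncate when full */
    }
    memcpy(response_buf + response_len, data, len);
    response_len += len;
    return len;
}

/* Engine side: the same call works with whichever SAPI is plugged in. */
size_t engine_echo(const sapi_hooks *sapi, const char *text) {
    assert(sapi != NULL && sapi->write != NULL);
    return sapi->write(text, strlen(text));
}
```

Calling engine_echo with a sapi_hooks value whose write member is buffer_write fills response_buf, while one built around cli_write prints to the terminal; the engine code is identical in both cases, which is the decoupling the text describes.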
4. PHP's execution process & opcode
Let's first look at the process PHP code goes through when it executes.
As the diagram shows, PHP implements a typical dynamic-language execution process: after a piece of code is received, it goes through lexical analysis, parsing and other stages, and the source program is translated into a series of instructions (opcodes); the Zend virtual machine then executes these instructions in order. PHP itself is implemented in C, so what is ultimately called are C functions; in effect, we can think of PHP as a piece of software written in C.
The core of PHP execution is these translated instructions, the opcodes.
An opcode is the most basic unit of PHP program execution. It consists of two operands (op1, op2), a return value, and a handler function. A PHP program is ultimately translated into the sequential execution of a set of opcode handler functions.
A few common handler functions:
ZEND_ASSIGN_SPEC_CV_CV_HANDLER: variable assignment ($a = $b)
ZEND_DO_FCALL_BY_NAME_SPEC_HANDLER: function call
ZEND_CONCAT_SPEC_CV_CV_HANDLER: string concatenation ($a . $b)
ZEND_ADD_SPEC_CV_CONST_HANDLER: addition ($a + 2)
ZEND_IS_EQUAL_SPEC_CV_CONST: equality test ($a == 1)
ZEND_IS_IDENTICAL_SPEC_CV_CONST: identity test ($a === 1)
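The opcode model (two operands, a result, and a handler, executed in sequence) can be sketched as a toy virtual machine in C. The layout below is an assumption made for illustration, not the real Zend opcode structure; mini_op, the handle_* functions and execute are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical miniature of the opcode model: two operands, a result slot
 * and a handler function per instruction. */
typedef struct mini_op mini_op;
typedef void (*op_handler)(const mini_op *op, long *vars);

struct mini_op {
    op_handler handler;  /* processing function */
    int op1;             /* index of the first operand in vars[] */
    int op2;             /* a constant (unused by the assignment handler) */
    int result;          /* index of the result slot in vars[] */
};

/* $result = $op1 (variable assignment) */
void handle_assign(const mini_op *op, long *vars) {
    vars[op->result] = vars[op->op1];
}

/* $result = $op1 + <constant op2> */
void handle_add_const(const mini_op *op, long *vars) {
    vars[op->result] = vars[op->op1] + op->op2;
}

/* $result = ($op1 == <constant op2>) */
void handle_is_equal_const(const mini_op *op, long *vars) {
    vars[op->result] = (vars[op->op1] == op->op2);
}

/* The "virtual machine": run the handlers one after another. */
void execute(const mini_op *ops, size_t count, long *vars) {
    for (size_t i = 0; i < count; i++) {
        assert(ops[i].handler != NULL);
        ops[i].handler(&ops[i], vars);
    }
}
```

A program is then just an array of mini_op values passed to execute together with the variable slots; choosing a specialized handler per operand kind, as the handler names above suggest, keeps each handler free of type dispatch.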
5. HashTable: the core data structure
HashTable is Zend's core data structure; PHP uses it to implement almost all common functionality. The PHP array is its typical application, and inside Zend the function symbol table, global variables and so on are also implemented on top of hash tables.
PHP's hash table has the following features:
Supports typical key->value lookup
Can be used as an array
Adding and deleting nodes is O(1)
Keys support mixed types: associative and indexed keys can coexist in one array
Values support mixed types: array("string", 2332)
Supports linear traversal, such as foreach
The Zend hash table implements a typical hash table structure, and by attaching a doubly linked list it also provides forward and backward traversal of the array. Its structure is shown in the following diagram:
As you can see, the hash table contains both a key->value hash structure and a doubly linked list, so it easily supports both fast lookup and linear traversal.
Hash structure: Zend's hash structure is a typical hash table model that resolves collisions with linked lists. Note that the Zend hash table is a self-growing data structure: when the table is full, it expands dynamically to twice its size and repositions the elements. The initial size is 8. In addition, Zend applies some optimizations to fast key->value lookup, trading space for time. For example, each element stores a variable nKeyLength holding the length of the key, so that mismatches can be ruled out quickly.
Doubly linked list: the Zend hash table implements linear traversal of its elements through a linked list. In theory a singly linked list is enough for traversal; the main reason for using a doubly linked list is to delete quickly without having to traverse. The Zend hash table is a composite structure: used as an array, it supports ordinary associative keys, sequential index numbers, and even a mixture of the two.
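The combination described above (chained buckets, doubling when full, and a doubly linked list for insertion order) can be sketched in C as follows. This is a simplified illustration, not Zend's implementation: the type and function names are hypothetical, the hash function is a generic "times 33" hash chosen for the example, and allocation failures simply abort via assert.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct bucket {
    unsigned long h;                       /* cached hash of the key */
    char *key;
    long value;
    struct bucket *chain_next;             /* collision chain in one slot */
    struct bucket *list_prev, *list_next;  /* insertion-order list */
} bucket;

typedef struct {
    bucket **slots;
    unsigned long size;                    /* always a power of two */
    unsigned long count;
    bucket *head, *tail;                   /* ends of the order list */
} table;

unsigned long hash_key(const char *key) {
    unsigned long h = 5381;
    while (*key) h = h * 33 + (unsigned char)*key++;
    return h;
}

void table_init(table *t) {
    t->size = 8;                           /* initial size, as in the text */
    t->count = 0;
    t->head = t->tail = NULL;
    t->slots = calloc(t->size, sizeof(bucket *));
    assert(t->slots != NULL);
}

bucket *table_find(const table *t, const char *key) {
    unsigned long h = hash_key(key);
    for (bucket *p = t->slots[h & (t->size - 1)]; p; p = p->chain_next) {
        if (p->h == h && strcmp(p->key, key) == 0) return p;
    }
    return NULL;
}

/* Double the slot array and rebuild the chains by walking the order list. */
void table_grow(table *t) {
    free(t->slots);
    t->size *= 2;
    t->slots = calloc(t->size, sizeof(bucket *));
    assert(t->slots != NULL);
    for (bucket *p = t->head; p; p = p->list_next) {
        unsigned long i = p->h & (t->size - 1);
        p->chain_next = t->slots[i];
        t->slots[i] = p;
    }
}

void table_set(table *t, const char *key, long value) {
    bucket *p = table_find(t, key);
    if (p) { p->value = value; return; }       /* update keeps its position */
    if (t->count == t->size) table_grow(t);    /* full: double the table */
    p = malloc(sizeof *p);
    assert(p != NULL);
    p->h = hash_key(key);
    p->key = malloc(strlen(key) + 1);
    assert(p->key != NULL);
    strcpy(p->key, key);
    p->value = value;
    unsigned long i = p->h & (t->size - 1);
    p->chain_next = t->slots[i];
    t->slots[i] = p;
    p->list_prev = t->tail;                    /* append to the order list */
    p->list_next = NULL;
    if (t->tail) t->tail->list_next = p; else t->head = p;
    t->tail = p;
    t->count++;
}

/* Unlinking from the order list needs no traversal thanks to list_prev. */
int table_del(table *t, const char *key) {
    unsigned long h = hash_key(key);
    bucket **pp = &t->slots[h & (t->size - 1)];
    while (*pp && !((*pp)->h == h && strcmp((*pp)->key, key) == 0)) {
        pp = &(*pp)->chain_next;
    }
    bucket *p = *pp;
    if (!p) return 0;
    *pp = p->chain_next;
    if (p->list_prev) p->list_prev->list_next = p->list_next; else t->head = p->list_next;
    if (p->list_next) p->list_next->list_prev = p->list_prev; else t->tail = p->list_prev;
    free(p->key);
    free(p);
    t->count--;
    return 1;
}
```

Walking from head through list_next visits the elements in insertion order regardless of which slot each key hashed to, which is what a foreach-style traversal needs.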
PHP associative arrays: an associative array is a typical hash table application. A lookup goes through the following steps (as the code shows, it is an ordinary hash lookup with a few quick checks added to speed it up):
    h = getKeyHashValue(key);              /* hash the key once */
    index = h & nTableMask;                /* map the hash to a slot */
    Bucket *p = arBuckets[index];
    while (p) {                            /* walk the collision chain */
        /* cheap checks first (cached hash, key length), then the key bytes */
        if ((p->h == h) && (p->nKeyLength == nKeyLength)
                && !memcmp(p->arKey, key, nKeyLength)) {
            return p->data;
        }
        p = p->next;
    }
    return NULL;                           /* not found */
PHP indexed arrays: an indexed array is the ordinary array we are used to, accessed by subscript, for example $arr[0]. The Zend HashTable normalizes this internally: an index key is likewise assigned a hash value and an nKeyLength (which is 0). The internal member nNextFreeElement tracks the next free index (one past the largest index assigned so far) and is advanced automatically after each push. It is this normalization that lets PHP mix associative and non-associative keys. Because of how push works, the order of index keys in a PHP array is determined not by the size of the subscript but by the order in which elements were pushed. For example: $arr[1] = 2; $arr[2] = 3;. A key of type double is treated by the Zend HashTable as an index key.
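How the next free index moves, and why iteration order follows insertion rather than subscript size, can be shown with a small C sketch. It is deliberately simplified: a fixed-size array with linear search stands in for the hash table, and int_array, arr_set and arr_push are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

#define ARR_CAPACITY 16

typedef struct {
    long keys[ARR_CAPACITY];     /* keys in insertion order */
    long values[ARR_CAPACITY];
    int count;
    long next_free;              /* plays the role of nNextFreeElement */
} int_array;

void arr_init(int_array *a) {
    assert(a != NULL);
    a->count = 0;
    a->next_free = 0;
}

/* $arr[index] = value; returns 0 when the fixed capacity is exhausted. */
int arr_set(int_array *a, long index, long value) {
    for (int i = 0; i < a->count; i++) {
        if (a->keys[i] == index) { a->values[i] = value; return 1; }
    }
    if (a->count == ARR_CAPACITY) return 0;
    a->keys[a->count] = index;          /* new keys go to the end */
    a->values[a->count] = value;
    a->count++;
    if (index >= a->next_free) a->next_free = index + 1;
    return 1;
}

/* $arr[] = value; uses whatever index is next free. */
int arr_push(int_array *a, long value) {
    return arr_set(a, a->next_free, value);
}
```

Setting index 5, pushing, setting index 2 and pushing again leaves the keys stored as 5, 6, 2, 7, so a traversal sees them in that order even though 2 is the smallest subscript.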
6. PHP variables
PHP is a weakly typed language and does not strictly distinguish variable types. A variable is declared without specifying a type, and PHP may convert its type implicitly while the program runs. As in strongly typed languages, explicit type conversions can also be written in the program. PHP variables can be divided into simple types (int, string, bool), collection types (array, resource, object), and constants (const). Underneath, all of these share the same structure, the zval.
The zval is another very important data structure in Zend; it identifies and implements PHP variables. Its data structure is as follows:
A zval consists mainly of three parts:
type: specifies the type of the variable (integer, string, array, and so on)
refcount & is_ref: used to implement reference counting (described later)
value: the core part, which stores the variable's actual data
zvalue holds the variable's actual data. Because several types have to be stored, zvalue is a union, and this is how weak typing is implemented.
PHP variable types map to their actual storage as follows:
IS_LONG -> lval
IS_DOUBLE -> dval
IS_ARRAY -> ht
IS_STRING -> str
IS_RESOURCE -> lval
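The three parts of a zval (type tag, reference-count fields, and a union holding the value) can be modelled with a tagged union in C. The sketch below is an assumption-laden simplification rather than the actual Zend definition; value_type, value_union, variable and the helper functions are hypothetical names, and arrays and objects are left out.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { T_LONG, T_DOUBLE, T_STRING } value_type;

/* One storage area shared by all types: only one member is valid at a time. */
typedef union {
    long lval;                               /* integers (and resource ids) */
    double dval;                             /* floating-point numbers */
    struct { char *val; size_t len; } str;   /* strings: pointer + length */
} value_union;

typedef struct {
    value_union value;       /* the actual data */
    unsigned int refcount;   /* how many variables share this value */
    unsigned char is_ref;    /* set for reference assignments */
    value_type type;         /* tells which union member is valid */
} variable;

void set_long(variable *v, long n)     { v->type = T_LONG;   v->value.lval = n; }
void set_double(variable *v, double d) { v->type = T_DOUBLE; v->value.dval = d; }

/* Implicit conversion to a number, driven by the type tag. */
double to_number(const variable *v) {
    assert(v != NULL);
    switch (v->type) {
    case T_LONG:   return (double)v->value.lval;
    case T_DOUBLE: return v->value.dval;
    default:       return 0.0;   /* string parsing is left out of this sketch */
    }
}
```

Because the same variable can be re-tagged at any time with set_long or set_double, its type is a run-time property of the value rather than of the name, which is the essence of the weak typing described above.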
Reference counting is widely used in memory reclamation, string handling and elsewhere, and PHP variables are a typical application of it. A zval's reference count is implemented through the members is_ref and ref_count. With reference counting, several variables can share the same data, avoiding the heavy cost of frequent copying.
On assignment, Zend points the variable at the same zval and increments ref_count; on unset, the corresponding ref_count is decremented. Only when ref_count drops to 0 is the zval actually destroyed. For a reference assignment, Zend sets is_ref to 1.
PHP variables share data through reference counting, so what happens when the value of one of them is changed? When a write to a variable is attempted and Zend finds that the zval it points to is shared by several variables, it makes a copy with a ref_count of 1 and decrements the ref_count of the original zval. This process is called "zval separation". Zend copies only when a write takes place, which is why this is also known as copy-on-write.
Reference variables are the opposite of non-reference variables: variables bound by reference assignment are tied together, and modifying one of them modifies all of them.
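The assignment, unset and separation rules for non-reference variables can be sketched in a few lines of C. This is a minimal model under stated assumptions: shared_val and the val_* functions are hypothetical names, the value is a plain long, and is_ref handling is omitted.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct {
    long data;
    unsigned int refcount;
} shared_val;

shared_val *val_new(long data) {
    shared_val *v = malloc(sizeof *v);
    assert(v != NULL);            /* sketch: abort on allocation failure */
    v->data = data;
    v->refcount = 1;
    return v;
}

/* $b = $a : no copy, both variables point at the same value. */
shared_val *val_assign(shared_val *src) {
    src->refcount++;
    return src;
}

/* unset($x) : destroy only when the last user goes away.
 * Returns 1 if the value was really freed. */
int val_unset(shared_val *v) {
    if (--v->refcount == 0) { free(v); return 1; }
    return 0;
}

/* Write through a variable: separate first if the value is shared. */
void val_write(shared_val **var, long data) {
    if ((*var)->refcount > 1) {
        (*var)->refcount--;             /* leave the shared value */
        *var = val_new((*var)->data);   /* private copy with refcount 1 */
    }
    (*var)->data = data;
}
```

After an assignment both variables hold the same pointer; the first val_write through either one makes them diverge, so the cost of copying is paid only when a write actually happens.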
Integers and floating-point numbers are among PHP's basic types and are simple variables. Their values are stored directly in the zvalue, as a long and a double respectively.
As the zvalue structure shows, unlike a strongly typed language such as C, PHP does not distinguish between int, unsigned int, long, long long and so on. It has only one integer type, long. It follows that the range of a PHP integer depends on the word size the compiler targets rather than being fixed.
Floating-point numbers are similar: PHP does not distinguish between float and double and has only double.
What happens in PHP when an integer goes out of range? It is automatically converted to a double. Be careful here, because this is the source of many subtle pitfalls.
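The overflow-to-double behaviour can be illustrated with an addition routine that checks the range before adding. This is a sketch of the idea only, not the engine's code; the number type and add_longs are hypothetical names.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

typedef struct {
    int is_double;   /* 0: lval is valid, 1: dval is valid */
    long lval;
    double dval;
} number;

/* Add two integers; fall back to a double when the sum does not fit. */
void add_longs(long a, long b, number *out) {
    assert(out != NULL);
    out->is_double = 0;
    out->lval = 0;
    out->dval = 0.0;
    /* The range test is done before adding, so no signed overflow occurs. */
    if ((b > 0 && a > LONG_MAX - b) || (b < 0 && a < LONG_MIN - b)) {
        out->is_double = 1;
        out->dval = (double)a + (double)b;
    } else {
        out->lval = a + b;
    }
}
```

The pitfall mentioned above follows directly: once the result has become a double, very large values lose integer precision, so comparisons and increments near the limit may not behave as integer arithmetic would.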
Like integers, strings are a basic type and a simple variable in PHP. The zvalue structure shows that a PHP string consists of a pointer to the actual data plus a length, similar to string in C++. Because the length is held in an actual variable, a PHP string, unlike a C string, can contain binary data (including \0), and getting the length with strlen is an O(1) operation.
When a string is created, modified, or appended to, PHP reallocates memory to produce the new string. Finally, for safety, PHP still adds a \0 at the end of every string it generates.
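A pointer-plus-length string of this kind can be sketched in C as follows. The names cstring, cstring_new, cstring_len and cstring_free are hypothetical and the code is an illustration rather than the engine's own string handling.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    char *val;     /* the bytes, which may include '\0' */
    size_t len;    /* explicit length */
} cstring;

/* Copy len bytes and add a terminating '\0' after them for safety. */
cstring cstring_new(const char *bytes, size_t len) {
    cstring s;
    s.val = malloc(len + 1);
    assert(s.val != NULL);        /* sketch: abort on allocation failure */
    memcpy(s.val, bytes, len);
    s.val[len] = '\0';
    s.len = len;
    return s;
}

/* O(1): the length is stored, not computed by scanning for '\0'. */
size_t cstring_len(const cstring *s) {
    return s->len;
}

void cstring_free(cstring *s) {
    free(s->val);
    s->val = NULL;
    s->len = 0;
}
```

For a five-byte value with an embedded \0, cstring_len reports 5 while C's strlen on the same buffer stops at the embedded terminator, which shows why the stored length is what makes the string binary-safe.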
Common string concatenation methods and how their speed compares:
Assume the following four variables: $strA = '123'; $strB = '456'; $intA = 123; $intB = 456;
The concatenation methods compare as follows:
$res = $strA . $strB and $res = "$strA$strB"
In this case Zend mallocs a new block of memory and fills it in; speed is average.
$strA = $strA . $strB
This is the fastest: Zend reallocs directly on top of the current $strA, avoiding a repeated copy.
$res = $intA . $intB
This is slower because an implicit type conversion is needed; try to avoid it when writing real programs.
$strA = sprintf("%s%s", $strA, $strB);
This is the slowest, because sprintf is not a language construct in PHP: recognizing and processing the format takes extra time, and it also allocates with malloc. However, sprintf is the most readable form, so in practice choose according to the situation.
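The difference between the malloc path and the realloc path can be made concrete in C. The sketch below only illustrates the two allocation strategies named in the text; buf, buf_from, concat_new and append_in_place are hypothetical names, and no timing claims are derived from it.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct { char *val; size_t len; } buf;

buf buf_from(const char *text) {
    buf b;
    b.len = strlen(text);
    b.val = malloc(b.len + 1);
    assert(b.val != NULL);
    memcpy(b.val, text, b.len + 1);
    return b;
}

/* $res = $a . $b : allocate a fresh block and copy both operands. */
buf concat_new(const buf *a, const buf *b) {
    buf r;
    r.len = a->len + b->len;
    r.val = malloc(r.len + 1);
    assert(r.val != NULL);
    memcpy(r.val, a->val, a->len);
    memcpy(r.val + a->len, b->val, b->len);
    r.val[r.len] = '\0';
    return r;
}

/* $a = $a . $b : grow a's own block; a's bytes are copied only if the
 * allocator has to move the block. b must not be the same buffer as a. */
void append_in_place(buf *a, const buf *b) {
    char *p = realloc(a->val, a->len + b->len + 1);
    if (!p) return;                       /* keep a unchanged on failure */
    memcpy(p + a->len, b->val, b->len);
    a->len += b->len;
    p[a->len] = '\0';
    a->val = p;
}
```

concat_new always copies both operands into new memory, whereas append_in_place copies only the appended bytes when the block can be extended where it is.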
PHP arrays are implemented directly on the Zend HashTable.
How is foreach implemented? A foreach over an array is carried out by walking the doubly linked list in the HashTable. For an indexed array, foreach is much more efficient than a for loop because it skips the key->value lookup. count reads HashTable->nNumOfElements directly, an O(1) operation. A string key such as '123' is converted by Zend to its integer form, so $arr['123'] and $arr[123] are equivalent.
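The string-to-integer key normalization can be sketched as a small C function that recognizes canonical decimal integers. The exact rules here are an assumption for illustration (only plain decimal strings without leading zeros are converted, and the most negative long is rejected for simplicity); key_to_index is a hypothetical name, not a Zend function.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Returns 1 and stores the integer in *index when the key is a canonical
 * decimal integer such as "123", "-5" or "0". Returns 0 for keys such as
 * "0123", "-0", "1.5", "abc", or values too large for a long. */
int key_to_index(const char *key, size_t len, long *index) {
    size_t i = 0;
    int negative = 0;
    unsigned long n = 0;
    const unsigned long limit = (unsigned long)LONG_MAX;

    assert(key != NULL && index != NULL);
    if (len == 0) return 0;
    if (key[0] == '-') { negative = 1; i = 1; }
    if (i == len) return 0;                                    /* just "-" */
    if (key[i] == '0' && (len - i > 1 || negative)) return 0;  /* "012", "-0" */

    for (; i < len; i++) {
        unsigned long d;
        if (key[i] < '0' || key[i] > '9') return 0;   /* not a digit */
        d = (unsigned long)(key[i] - '0');
        if (n > (limit - d) / 10) return 0;           /* would exceed LONG_MAX */
        n = n * 10 + d;
    }
    *index = negative ? -(long)n : (long)n;
    return 1;
}
```

A hash table would call such a check before hashing a string key: when it succeeds the element is stored under the integer key, which is why '123' and 123 address the same element while a key like '0123' stays a string.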
The resource type is the most complex kind of variable in PHP and is also a composite structure.
PHP's zval can represent a wide range of data types, but it struggles to describe custom data types fully. Because there is no effective way to represent these composite structures, the traditional operators cannot be used on them either. To solve this, it is enough to refer to the pointer through an essentially arbitrary identifier (label); this approach is called a resource.
In the zval of a resource, lval holds the resource's identifier, which Zend uses to reach the address where the resource is stored. A resource can be any composite structure; the familiar mysqli, fsock, memcached and so on are all resources.
How resources are used:
Registration: to use a custom data type as a resource, you first register it, and Zend assigns it a globally unique label.
Fetching a resource variable: Zend maintains a hash table mapping id -> actual data for resources. The zval of a resource records only its id; on fetch, the id is looked up in the hash table and the actual value is returned.
Destroying a resource: resource data types vary widely, and Zend has no way of destroying them itself. The user therefore has to supply a destructor function when registering the resource. When the resource is unset, Zend calls that function to carry out the destruction and at the same time deletes the resource from the global resource table.
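The register, fetch and destroy steps can be sketched in C with a small id-to-pointer table. This is a simplified model under stated assumptions: a fixed array replaces the hash table, type registration and resource creation are merged into one call, ids of deleted resources are reused, and all names (resource_list, resource_register, resource_fetch, resource_delete, counting_free) are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

#define MAX_RESOURCES 32

typedef void (*resource_dtor)(void *ptr);

/* Slot number = resource id. */
typedef struct {
    void *ptr;            /* the actual data; NULL marks a free slot */
    resource_dtor dtor;   /* supplied by whoever registered the resource */
} resource_entry;

resource_entry resource_list[MAX_RESOURCES];

/* Register data plus its destructor; returns the id, or -1 if full. */
int resource_register(void *ptr, resource_dtor dtor) {
    assert(ptr != NULL);
    for (int id = 0; id < MAX_RESOURCES; id++) {
        if (resource_list[id].ptr == NULL) {
            resource_list[id].ptr = ptr;
            resource_list[id].dtor = dtor;
            return id;
        }
    }
    return -1;
}

/* Fetch: translate the id stored in the variable back into a pointer. */
void *resource_fetch(int id) {
    if (id < 0 || id >= MAX_RESOURCES) return NULL;
    return resource_list[id].ptr;
}

/* Destroy: run the user-supplied destructor, then drop the table entry. */
int resource_delete(int id) {
    void *ptr = resource_fetch(id);
    if (ptr == NULL) return 0;
    if (resource_list[id].dtor) resource_list[id].dtor(ptr);
    resource_list[id].ptr = NULL;
    return 1;
}

/* Example destructor for heap-allocated data; counts its calls. */
int freed_count = 0;
void counting_free(void *ptr) {
    free(ptr);
    freed_count++;
}
```

The variable itself only ever carries the integer id; the table owns the pointer and the knowledge of how to release it, so the engine can clean up any resource type without understanding its contents.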
A resource can live for a long time: not only after all the variables referencing it have gone out of scope, but even after one request has finished and a new one has begun. Such resources are called persistent resources, because they persist for the whole life cycle of the SAPI unless deliberately destroyed. In many cases persistent resources improve performance to some extent. Our familiar mysql_pconnect is an example. Persistent resources allocate their memory through pemalloc, so it is not released when the request ends.
Zend itself makes no distinction between the two kinds of resource.
How are local and global variables implemented in PHP? Within a request, PHP can see two symbol tables at any time, symbol_table and active_symbol_table. The former maintains the global variables. The latter is a pointer to the currently active variable symbol table: when the program enters a function, Zend allocates a symbol table x for it and points active_symbol_table at x. This is how global and local variables are told apart.
Getting a variable's value: PHP's symbol tables are implemented with hash tables. Each variable is assigned a unique identifier, and on access the corresponding zval is found in the table by that identifier and returned.
Using global variables in a function: inside a function we can use a global variable by declaring it explicitly with global. This creates, in active_symbol_table, a reference to the variable of the same name in symbol_table; if symbol_table has no variable of that name, it is created first.
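The two-table scheme and the effect of a global declaration can be sketched in C. This is an illustrative model only: small fixed arrays with linear search replace the hash tables, values are plain longs, nested calls are not handled, and symtab, global_table, active_table and the functions below are hypothetical names.

```c
#include <assert.h>
#include <string.h>

#define MAX_VARS 8

/* Symbol table: name -> pointer to the value's storage. */
typedef struct {
    const char *names[MAX_VARS];
    long *slots[MAX_VARS];
    long storage[MAX_VARS];    /* values owned by this table */
    int count;
} symtab;

symtab global_table;                     /* role of symbol_table */
symtab *active_table = &global_table;    /* role of active_symbol_table */

long *symtab_find(symtab *t, const char *name) {
    for (int i = 0; i < t->count; i++) {
        if (strcmp(t->names[i], name) == 0) return t->slots[i];
    }
    return NULL;
}

/* Look a variable up, creating it with value 0 if it does not exist. */
long *symtab_get(symtab *t, const char *name) {
    long *slot = symtab_find(t, name);
    if (slot) return slot;
    assert(t->count < MAX_VARS);
    t->names[t->count] = name;
    t->storage[t->count] = 0;
    t->slots[t->count] = &t->storage[t->count];
    return t->slots[t->count++];
}

/* Entering or leaving a function switches the active table. */
void enter_function(symtab *locals) {
    locals->count = 0;
    active_table = locals;
}

void leave_function(void) {
    active_table = &global_table;
}

/* `global $name;` : make the local name refer to the global's storage,
 * creating the global first if it does not exist. */
void declare_global(const char *name) {
    long *slot;
    if (active_table == &global_table) return;   /* already global scope */
    slot = symtab_get(&global_table, name);
    assert(active_table->count < MAX_VARS);
    active_table->names[active_table->count] = name;
    active_table->slots[active_table->count] = slot;
    active_table->count++;
}
```

Inside a function, a plain lookup through active_table creates or finds a local variable and never touches the global of the same name; after declare_global, the same lookup returns the global's storage, so a write made in the function is visible once the function has returned.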