PHP is simple to pick up, but mastering it is not easy. Beyond using it, we also need to understand how it works underneath.
PHP is a dynamic language suited to web development. More precisely, it is a software framework with a large number of components implemented in C. In a narrower sense, it can be regarded as a powerful UI framework.
Why study PHP's underlying implementation? To use a dynamic language well, we first need to understand it. PHP's memory management and framework model are worth learning from, and extension development lets us add more powerful functionality and optimize the performance of our programs.
1. PHP design concept and features
- Multi-process model: because PHP uses a multi-process model, requests do not interfere with each other, so a failure in one request does not affect the overall service. PHP has since added support for multi-threaded models as well.
- Weakly typed language: unlike C/C++, Java, and C#, PHP is weakly typed. A variable's type is not fixed up front; it is determined at runtime, where implicit or explicit type conversion may occur. This flexibility is convenient and efficient in web development. The details are covered in the section on PHP variables below.
- The engine (Zend) + component (ext) model reduces internal coupling.
- The middle layer (SAPI) isolates the web server from PHP.
- The syntax is simple and flexible, with few rules. The downside is a mix of coding styles, but even a poor programmer is unlikely to write something so outrageous that it damages the whole system.
2. The four-layer PHP system
The core architecture of PHP is as follows:
From the figure, we can see that PHP is a four-layer system. From bottom to top:
- Zend Engine: Zend is implemented in pure C and is the kernel of PHP. It translates PHP code (lexical analysis, syntax parsing, and the other compilation steps) into instructions it can execute and implements the corresponding handlers. It also implements the basic data structures (such as HashTable and OO), allocates and manages memory, and exposes API methods for external callers. It is the core of everything; all peripheral functionality is built around Zend.
- Extensions: built around the Zend Engine, extensions provide basic services as components. The common built-in functions (such as the array family) and the standard library are implemented as extensions. You can also write your own extension to add functionality or optimize performance (for example, the PHP middle layer used by the Post Bar and its rich text parsing are typical applications of extensions).
- SAPI: short for Server Application Programming Interface. Through a series of hook functions, SAPI lets PHP exchange data with the outside world. This is an elegant and successful part of PHP's design: SAPI decouples PHP itself from the upper-layer applications, so PHP no longer needs to worry about compatibility with each application, and each application can implement its own handling based on its own characteristics.
- Upper-layer applications: these are the PHP programs we usually write. Different SAPIs give different application modes, for example a web application served through a web server, or a script run from the command line.
If PHP is a car, the car's frame is PHP itself, Zend is the engine, and the components under ext are the wheels. SAPI can be seen as the road: a car can run on different types of roads, and executing a PHP program is the car driving on the road. So we need a high-performance engine, the right wheels, and the right road.
3. SAPI
As mentioned above, SAPI lets external applications exchange data with PHP through a series of interfaces and implement specific handling based on each application's characteristics. Some common SAPIs include:
- apache2handler: the handler used when Apache is the web server and PHP runs in mod_php mode. It is also the most widely used one.
- cgi: another way for the web server and PHP to interact directly, namely the well-known FastCGI protocol. FastCGI + PHP has been adopted more and more in recent years, and it is also the only method supported by asynchronous web servers.
- cli: the mode used for command-line invocation.
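The hook idea behind SAPI can be sketched in C as a table of function pointers that the engine calls without knowing which host sits behind it. This is only an illustration: `sapi_hooks`, `run_request`, `buffer_host`, and `buffer_write` are hypothetical names invented here, and PHP's real SAPI structure has many more members.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified hook table (not PHP's real SAPI structure). */
typedef struct sapi_hooks {
    const char *name;
    /* Hand output back to the host; returns the number of bytes accepted. */
    size_t (*write_output)(const char *buf, size_t len, void *ctx);
    void *ctx; /* host-specific state, opaque to the engine */
} sapi_hooks;

/* "Engine" side: it only talks to the host through the hooks. */
size_t run_request(const sapi_hooks *sapi, const char *body)
{
    assert(sapi != NULL && sapi->write_output != NULL);
    return sapi->write_output(body, strlen(body), sapi->ctx);
}

/* One possible host: collects output in a fixed buffer (stands in for a web server). */
typedef struct buffer_host {
    char data[256];
    size_t used;
} buffer_host;

size_t buffer_write(const char *buf, size_t len, void *ctx)
{
    buffer_host *host = ctx;
    size_t room = sizeof(host->data) - 1 - host->used;
    if (len > room) len = room;          /* truncate instead of overflowing */
    memcpy(host->data + host->used, buf, len);
    host->used += len;
    host->data[host->used] = '\0';
    return len;
}
```

A command-line host would supply a different `write_output` (for example one that writes to standard output) while `run_request` stays unchanged, which is the decoupling the text describes.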
4. PHP execution process & opcode
Let's take a look at the PHP code execution process.
As shown in the figure, PHP follows a typical dynamic-language execution process: once a piece of code is obtained, it goes through lexical analysis, syntax parsing, and other stages and is translated into instructions (opcodes); the Zend virtual machine then executes these instructions in order. PHP itself is implemented in C, so every function that ends up being called is a C function. In effect, we can regard PHP as a piece of software written in C.
At the core of PHP execution are these translated instructions, the opcodes.
An opcode is the most basic unit of PHP program execution. It consists of two operands (op1, op2), a return value, and a handler function. A PHP program is ultimately translated into a sequence of opcode handlers that are executed in order.
Several common handler functions:
- ZEND_ASSIGN_SPEC_CV_CV_HANDLER: variable assignment ($a = $b)
- ZEND_DO_FCALL_BY_NAME_SPEC_HANDLER: function call
- ZEND_CONCAT_SPEC_CV_CV_HANDLER: string concatenation ($a . $b)
- ZEND_ADD_SPEC_CV_CONST_HANDLER: addition ($a + 2)
- ZEND_IS_EQUAL_SPEC_CV_CONST: equality ($a == 1)
- ZEND_IS_IDENTICAL_SPEC_CV_CONST: identity ($a === 1)
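To make the "two operands, a result, and a handler" shape concrete, here is a minimal sketch of an opcode array being executed in order. The names `op`, `handle_assign`, `handle_add_const`, and `execute` are hypothetical and the variables are plain longs; the real Zend opcode and handler types are considerably richer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified opcode: two operands, a result slot, and a handler. */
typedef struct op op;
typedef void (*op_handler)(const op *o, long *vars);

struct op {
    int op1;            /* index of the first operand variable */
    int op2;            /* second operand: unused or an immediate constant here */
    int result;         /* index of the variable that receives the result */
    op_handler handler; /* C function that carries out the instruction */
};

/* vars[result] = vars[op1]; op2 is unused. */
void handle_assign(const op *o, long *vars)
{
    vars[o->result] = vars[o->op1];
}

/* vars[result] = vars[op1] + op2, where op2 is a constant. */
void handle_add_const(const op *o, long *vars)
{
    vars[o->result] = vars[o->op1] + o->op2;
}

/* Execute the opcodes one after another, as a virtual machine loop would. */
void execute(const op *ops, size_t count, long *vars)
{
    for (size_t i = 0; i < count; i++) {
        assert(ops[i].handler != NULL);
        ops[i].handler(&ops[i], vars);
    }
}
```

With `vars = {5, 0, 0}`, the two-instruction program `{0, 0, 1, handle_assign}`, `{1, 2, 2, handle_add_const}` corresponds roughly to `$b = $a; $c = $b + 2;`.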
5. HashTable: the core data structure
HashTable is the core data structure of Zend and is used to implement almost all common functionality in PHP. The PHP array is its typical application. In addition, inside Zend, the function symbol table and global variables are also implemented on top of HashTable.
The PHP hash table has the following features:
- Supports typical key->value lookups
- Can be used as an array
- Adding and deleting nodes is O(1)
- Keys can be of mixed types: associative (string) keys and numeric index keys can coexist in the same array
- Values can be of mixed types: array("string", 2332)
- Supports linear traversal, such as foreach
The Zend hash table implements a typical hash structure and adds a doubly linked list on top of it to support forward and reverse traversal of arrays. Its structure is as follows:
As you can see, the hash table contains both a key->value hash structure and a doubly linked list, which makes fast lookup and linear traversal equally convenient.
- Hash structure: the Zend hash structure is a typical hash table that resolves collisions with linked lists. Note that the Zend hash table grows automatically: when the table is full, it doubles in size and repositions its elements. The initial size is 8. Zend also applies some optimizations to key->value lookup that trade space for time. For example, each element stores its key length in nKeyLength so that mismatches can be rejected quickly.
- Doubly linked list: the Zend hash table uses a linked list to implement linear traversal of its elements. In theory a singly linked list would be enough for traversal; the list is doubly linked so that an element can be deleted quickly without walking the list. The Zend hash table is a composite structure: used as an array, it supports ordinary associative arrays, sequentially indexed arrays, and even a mixture of the two.
- PHP associative array: the associative array is a typical HashTable application. A lookup goes through the following steps (as the code shows, this is an ordinary hash lookup with a few quick checks added to speed up the search):
    h = getKeyHashValue(key);        /* hash the key once */
    index = h & nTableMask;          /* pick the slot */
    Bucket *p = arBucket[index];
    while (p) {
        /* quick checks first: cached hash and key length
           (a full key comparison is still needed to confirm the match) */
        if ((p->h == h) && (p->nKeyLength == nKeyLength)) {
            RETURN p->data;
        }
        p = p->next;                 /* follow the collision chain */
    }
    RETURN FAILURE;
- PHP index array: an index array is the ordinary array accessed by subscript, for example $arr[0]. Zend HashTable normalizes it internally: an index key is also given a hash value and an nKeyLength (of 0). The internal member nNextFreeElement tracks the next free id (one past the largest id allocated so far) and is advanced automatically after each push. This is what lets PHP mix associative and non-associative keys. Because of how push works, the order of index keys in a PHP array is determined by insertion order, not by subscript value, for example $arr[1] = 2; $arr[2] = 3;. Keys of type double are also treated as index keys by Zend HashTable.
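The combination described above (collision chains for lookup plus a doubly linked list for ordered traversal, an initial size of 8, and doubling when full) can be sketched as follows. This is a simplified illustration, not the Zend source: `bucket`, `table`, `table_new`, `table_find`, and `table_set` are hypothetical names, the hash function is a DJB-style one picked for brevity, values are plain longs, and deletion and freeing are left out.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical bucket: chained for collisions, doubly linked for ordered traversal. */
typedef struct bucket {
    unsigned long h;          /* cached hash of the key */
    size_t key_len;           /* cached key length for quick rejection */
    char *key;
    long value;
    struct bucket *next;      /* next bucket in the same slot (collision chain) */
    struct bucket *list_prev; /* previous element in insertion order */
    struct bucket *list_next; /* next element in insertion order */
} bucket;

typedef struct table {
    size_t size;              /* number of slots, always a power of two */
    size_t count;             /* number of stored elements */
    bucket **slots;
    bucket *head, *tail;      /* ends of the insertion-order list */
} table;

static unsigned long hash_key(const char *key, size_t len)
{
    unsigned long h = 5381;
    for (size_t i = 0; i < len; i++) h = h * 33 + (unsigned char)key[i];
    return h;
}

table *table_new(void)
{
    table *t = calloc(1, sizeof(*t));
    if (t == NULL) return NULL;
    t->size = 8;              /* initial size mentioned in the text */
    t->slots = calloc(t->size, sizeof(*t->slots));
    if (t->slots == NULL) { free(t); return NULL; }
    return t;
}

bucket *table_find(const table *t, const char *key)
{
    assert(t != NULL && key != NULL);
    size_t len = strlen(key);
    unsigned long h = hash_key(key, len);
    for (bucket *p = t->slots[h & (t->size - 1)]; p != NULL; p = p->next) {
        /* cheap checks first, full comparison last */
        if (p->h == h && p->key_len == len && memcmp(p->key, key, len) == 0) return p;
    }
    return NULL;
}

/* Double the slot array and relink every element by walking the ordered list. */
static int table_grow(table *t)
{
    bucket **slots = calloc(t->size * 2, sizeof(*slots));
    if (slots == NULL) return -1;
    free(t->slots);
    t->slots = slots;
    t->size *= 2;
    for (bucket *p = t->head; p != NULL; p = p->list_next) {
        size_t i = p->h & (t->size - 1);
        p->next = t->slots[i];
        t->slots[i] = p;
    }
    return 0;
}

/* Insert or update; returns 0 on success, -1 on allocation failure. */
int table_set(table *t, const char *key, long value)
{
    bucket *p = table_find(t, key);
    if (p != NULL) { p->value = value; return 0; }
    if (t->count == t->size && table_grow(t) != 0) return -1;
    p = calloc(1, sizeof(*p));
    if (p == NULL) return -1;
    p->key_len = strlen(key);
    p->key = malloc(p->key_len + 1);
    if (p->key == NULL) { free(p); return -1; }
    memcpy(p->key, key, p->key_len + 1);
    p->h = hash_key(key, p->key_len);
    p->value = value;
    size_t i = p->h & (t->size - 1);
    p->next = t->slots[i];    /* push onto the slot's collision chain */
    t->slots[i] = p;
    p->list_prev = t->tail;   /* append to the insertion-order list */
    if (t->tail != NULL) t->tail->list_next = p; else t->head = p;
    t->tail = p;
    t->count++;
    return 0;
}
```

Walking from `head` through `list_next` visits elements in insertion order regardless of which slot each one hashed to, which is what a foreach-style traversal needs.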
6. PHP variables
PHP is a weakly typed language and does not strictly distinguish variable types. You do not specify a type when declaring a variable, and PHP may implicitly convert a variable's type while the program runs. As in strongly typed languages, a program can also perform explicit type conversion. PHP variables can be divided into simple types (int, string, bool), compound types (array, resource, object), and constants (const). All of these share the same underlying structure, zval.
zval is another important data structure in Zend. It identifies and implements PHP variables. Its data structure is as follows:
A zval consists of three parts:
- type: specifies the type of the variable (integer, string, array, and so on)
- refcount & is_ref: used to implement reference counting (details later)
- value: the core part, which stores the variable's actual data
zvalue stores the actual data of a variable. Because it has to hold several types, zvalue is a union, which is also how weak typing is achieved.
PHP variable types map to their actual storage as follows:
- IS_LONG -> lval
- IS_DOUBLE -> dval
- IS_ARRAY -> ht
- IS_STRING -> str
- IS_RESOURCE -> lval
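The type-to-storage mapping above is the classic tagged-union pattern. The sketch below shows the idea with only three types; `value_type`, `value_data`, `value`, `make_long`, `make_double`, and `to_number` are hypothetical names chosen for illustration, and the real zval layout differs between PHP versions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical type tags, standing in for the IS_* constants. */
typedef enum { T_LONG, T_DOUBLE, T_STRING } value_type;

/* One storage area shared by all types, in the spirit of zvalue. */
typedef union {
    long lval;
    double dval;
    struct { const char *val; size_t len; } str;
} value_data;

/* Tag + reference-count fields + union, in the spirit of zval. */
typedef struct {
    value_type type;
    unsigned int refcount;
    unsigned char is_ref;
    value_data value;
} value;

value make_long(long n)
{
    value v = { T_LONG, 1, 0, { 0 } };
    v.value.lval = n;
    return v;
}

value make_double(double d)
{
    value v = { T_DOUBLE, 1, 0, { 0 } };
    v.value.dval = d;
    return v;
}

/* Implicit conversion to a number, chosen by inspecting the tag at runtime. */
double to_number(const value *v)
{
    assert(v != NULL);
    switch (v->type) {
    case T_LONG:   return (double)v->value.lval;
    case T_DOUBLE: return v->value.dval;
    default:       return 0.0; /* strings are not parsed in this sketch */
    }
}
```

Because every variable carries its tag, the same variable slot can hold a long now and a double later, and operations decide what to do by checking `type` at runtime.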
Reference counting is widely used in memory reclamation, string operations, and elsewhere, and PHP variables are a typical application of it. A zval's reference count is implemented by the members is_ref and refcount. Through reference counting, multiple variables can share the same data, which avoids the heavy cost of frequent copying.
On assignment, Zend points the variable at the same zval and increments refcount; on unset, it decrements refcount. The zval is destroyed only when refcount drops to 0. If the value is assigned by reference, Zend sets is_ref to 1.
If PHP variables share data through reference counting, what happens when one of them is modified? When Zend tries to write to a variable and finds that the zval it points to is shared by several variables, it copies the data into a new zval with refcount 1 and decrements the refcount of the original. This process is called "zval separation". Zend therefore copies only when a write occurs, which is why the scheme is also called copy-on-write.
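The assignment, unset, and separation rules can be sketched in a few functions. This is a minimal model, assuming a value that holds a single long; `value`, `value_new`, `value_assign`, `value_unset`, and `value_write` are hypothetical names, and by-reference variables (is_ref) are deliberately left out.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, simplified value with a reference count. */
typedef struct value {
    long data;
    unsigned int refcount;
} value;

value *value_new(long data)
{
    value *v = malloc(sizeof(*v));
    if (v != NULL) { v->data = data; v->refcount = 1; }
    return v;
}

/* $b = $a: share the same value and bump the count. */
value *value_assign(value *src)
{
    src->refcount++;
    return src;
}

/* unset($a): drop one reference; free when the count reaches 0. Returns 1 if freed. */
int value_unset(value *v)
{
    assert(v->refcount > 0);
    if (--v->refcount == 0) { free(v); return 1; }
    return 0;
}

/* Write through one variable: separate first if the value is shared. */
value *value_write(value *v, long data)
{
    if (v->refcount > 1) {
        value *copy = value_new(data);
        if (copy == NULL) return NULL;
        v->refcount--;        /* the writer no longer shares the original */
        return copy;
    }
    v->data = data;           /* sole owner: modify in place */
    return v;
}
```

The caller stores the pointer returned by `value_write` back into the variable being written, so after a separation the two variables point at different values.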
Reference variables behave the opposite way from non-reference variables: variables bound by reference must stay bundled together, so modifying one of them modifies all of them.
Integers and floating-point numbers are basic types in PHP and are simple variables. Their values are stored directly in zvalue, as long and double respectively.
From the zvalue structure we can see that, unlike strongly typed languages such as C, PHP does not distinguish int, unsigned int, long, long long, and so on. It has only one integer type, long. It follows that the range of a PHP integer is not fixed but depends on the word size the compiler targets.
Floating-point numbers are similar: PHP does not distinguish float from double and has only double.
What happens when an integer goes out of range in PHP? It is automatically converted to double. Keep this in mind, because many tricky bugs come from it.
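The overflow-to-double behavior can be illustrated with an addition that checks the range before adding. This is a sketch of the idea only; `number` and `add_longs` are hypothetical names, and the check shown is one portable way to detect overflow, not necessarily how Zend does it.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical result type: either a long or, after overflow, a double. */
typedef struct {
    int is_double;
    long lval;
    double dval;
} number;

/* Add two longs; fall back to double when the sum would not fit in a long. */
number add_longs(long a, long b)
{
    number r = { 0, 0, 0.0 };
    if ((b > 0 && a > LONG_MAX - b) || (b < 0 && a < LONG_MIN - b)) {
        r.is_double = 1;
        r.dval = (double)a + (double)b; /* may lose precision for large values */
    } else {
        r.lval = a + b;
    }
    return r;
}
```

The comment about precision is the source of the tricky bugs mentioned above: once a value has become a double, nearby large integers may no longer be distinguishable.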
Like integers, strings are basic, simple variables in PHP. The zvalue structure shows that a PHP string is a struct made up of a pointer to the actual data and a length, similar to string in C++. Because the length is stored in a separate field, a PHP string, unlike a C string, can hold binary data (including \0), and getting its length with strlen is an O(1) operation.
When a string is created, modified, or appended to, PHP reallocates memory to produce the new string. Finally, for safety, PHP still appends a \0 to the end of every string it generates.
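A pointer-plus-length string is easy to model in C. The sketch below, with hypothetical names `str`, `str_new`, `str_len`, and `str_concat`, shows why embedded \0 bytes are safe, why the length lookup is O(1), and how concatenation allocates a fresh buffer with a trailing \0.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified string: pointer plus explicit length. */
typedef struct {
    char *val;
    size_t len;
} str;

/* Copy len bytes (which may include '\0') and add a trailing '\0' for safety. */
str str_new(const char *bytes, size_t len)
{
    str s = { malloc(len + 1), 0 };
    if (s.val != NULL) {
        memcpy(s.val, bytes, len);
        s.val[len] = '\0';
        s.len = len;
    }
    return s;
}

/* O(1): the length is stored, not computed by scanning for '\0'. */
size_t str_len(const str *s)
{
    assert(s != NULL);
    return s->len;
}

/* Concatenation allocates a new buffer and copies both operands into it. */
str str_concat(const str *a, const str *b)
{
    str s = { malloc(a->len + b->len + 1), 0 };
    if (s.val != NULL) {
        memcpy(s.val, a->val, a->len);
        memcpy(s.val + a->len, b->val, b->len);
        s.len = a->len + b->len;
        s.val[s.len] = '\0';
    }
    return s;
}
```

For a string built from the five bytes `a b \0 c d`, `str_len` reports 5 while C's `strlen` on the same buffer stops at the embedded \0 and reports 2.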
Common string concatenation methods and their relative speed:
Suppose there are four variables: $strA = '000000'; $strB = '000000'; $intA = 123; $intB = 456;
We now compare the following ways of concatenating strings:
$res = $strA . $strB and $res = "$strA$strB"
In this case Zend mallocs a new block of memory and fills it in. The speed is average.
$strA = $strA . $strB
This is the fastest. Zend reallocs the current $strA in place, avoiding a repeated copy.
$res = $intA . $intB
This is slower because of the implicit type conversion. Avoid it where possible in real code.
$strA = sprintf("%s%s", $strA, $strB);
This is the slowest. sprintf is not a language construct in PHP, so recognizing and processing the format string takes considerable time, and it also has to malloc. However, sprintf is the most readable, so choose between these flexibly based on the situation.
PHP arrays are implemented by Zend HashTable.
How is foreach implemented? foreach over an array walks the doubly linked list in the HashTable. For index arrays, foreach is much more efficient than a for loop because it skips the key->value lookup. count simply reads HashTable->nNumOfElements, an O(1) operation. For a string key such as '123', Zend converts it to its integer form, so $arr['123'] is equivalent to $arr[123].
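The conversion of a key like '123' to the integer 123 amounts to checking whether the string is a canonical decimal integer. The function below is a sketch of such a check under assumptions made here, not the Zend implementation: `key_as_index` is a hypothetical name, and the rules that leading zeros, "-0", and out-of-range values stay string keys are simplifications chosen for this example.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/*
 * Return 1 and store the integer in *out if key is a canonical decimal
 * integer such as "123" or "-5"; return 0 for "0123", "1.5", "", "abc", etc.
 * Simplification: values whose magnitude exceeds LONG_MAX stay strings.
 */
int key_as_index(const char *key, size_t len, long *out)
{
    size_t i = 0;
    int negative = 0;
    unsigned long n = 0;
    const unsigned long limit = (unsigned long)LONG_MAX;

    assert(out != NULL);
    if (len > 0 && key[0] == '-') { negative = 1; i = 1; }
    if (i == len) return 0;                      /* empty, or just "-" */
    if (key[i] == '0' && len - i > 1) return 0;  /* leading zero: stays a string */
    if (key[i] == '0' && negative) return 0;     /* "-0" stays a string */

    for (; i < len; i++) {
        unsigned int digit;
        if (key[i] < '0' || key[i] > '9') return 0;
        digit = (unsigned int)(key[i] - '0');
        if (n > (limit - digit) / 10) return 0;  /* would overflow: stays a string */
        n = n * 10 + digit;
    }
    *out = negative ? -(long)n : (long)n;
    return 1;
}
```

A hash table front end would call this on every string key and, when it returns 1, use the integer path instead, so that both spellings of the key land on the same element.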
Resource variables are the most complex kind of variable in PHP, and they are also a compound structure.
PHP's zval can represent a wide range of data types, but it cannot fully describe custom data types. With no effective way to describe such compound structures, the ordinary operators cannot be applied to them either. The solution is simply to refer to the underlying pointer through an essentially arbitrary identifier (label). This approach is called a resource.
In a zval, a resource uses lval as the handle that leads to the resource's data. A resource can be any compound structure; the familiar mysqli, fsock, and memcached handles are all resources.
How to use resources:
- Registration: to use a custom data type as a resource, you must first register it, and Zend assigns it a globally unique identifier.
- Fetching a resource variable: Zend maintains a hash table mapping id -> actual data for resources. The zval of a resource records only its id. On fetch, the actual value is looked up in the hash table by id and returned.
- Destruction: resource data types vary widely, so Zend cannot destroy them itself. You must therefore supply a destructor function when registering the resource. When the resource is unset, Zend calls that function to destruct it and removes it from the global resource table.
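The register, fetch, and destroy steps can be sketched with a small table that maps an id to a pointer and its destructor. All names here (`resource_register`, `resource_fetch`, `resource_destroy`, `connection`, `connection_close`) are hypothetical, and a fixed-size array stands in for the hash table that Zend uses.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_RESOURCES 16

typedef void (*resource_dtor)(void *ptr);

/* Hypothetical, simplified global resource table: id -> pointer + destructor. */
typedef struct {
    void *ptr;
    resource_dtor dtor;
} resource_entry;

static resource_entry resource_table[MAX_RESOURCES];
static long next_id = 1; /* ids are handed out in order and never reused */

/* Register a pointer with its destructor; returns an id (> 0), or 0 if the table is full. */
long resource_register(void *ptr, resource_dtor dtor)
{
    assert(ptr != NULL);
    if (next_id > MAX_RESOURCES) return 0;
    resource_table[next_id - 1].ptr = ptr;
    resource_table[next_id - 1].dtor = dtor;
    return next_id++;
}

/* Fetch the data behind an id; NULL if the id is unknown or already destroyed. */
void *resource_fetch(long id)
{
    if (id < 1 || id >= next_id) return NULL;
    return resource_table[id - 1].ptr;
}

/* Destroy: call the registered destructor, then drop the entry. Returns 1 if destroyed. */
int resource_destroy(long id)
{
    void *ptr = resource_fetch(id);
    if (ptr == NULL) return 0;
    if (resource_table[id - 1].dtor != NULL) resource_table[id - 1].dtor(ptr);
    resource_table[id - 1].ptr = NULL;
    return 1;
}

/* Example resource type: a made-up connection whose destructor marks it closed. */
typedef struct { int open; } connection;

void connection_close(void *ptr)
{
    ((connection *)ptr)->open = 0;
}
```

The variable itself would hold only the id returned by `resource_register`; everything type-specific, including cleanup, lives behind the table entry.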
A resource can live for a long time: not only after all the variables that reference it have gone out of scope, but even after a request has finished and a new one has begun. Such resources are called persistent resources because they persist for the entire lifecycle of the SAPI unless explicitly destroyed. In many cases, persistent resources improve performance to some degree, for example the commonly used mysql_pconnect. Persistent resources allocate memory with pemalloc so that it is not released when the request ends. Zend itself does not distinguish between the two kinds.
How are local and global variables implemented in PHP? For any request, PHP sees two symbol tables at all times (symbol_table and active_symbol_table). The former maintains global variables. The latter is a pointer to the currently active variable symbol table: when the program enters a function, Zend allocates a symbol table x for it and points active_symbol_table at x. This is how global and local variables are kept apart.
Getting a variable's value: PHP symbol tables are implemented with HashTable. Each variable is assigned a unique identifier, and the corresponding zval is looked up in the table by that identifier and returned.
Using global variables in a function: inside a function, we can use a global variable by declaring it explicitly with global. This creates, in active_symbol_table, a reference to the variable of the same name in symbol_table. If symbol_table has no variable of that name, it is created first.