I. Basic knowledge
This chapter briefly introduces some internal mechanisms of the Zend Engine. They are closely related to writing extensions, and they can also help us write more efficient PHP code.
1.1 Storage of PHP variables
1.1.1 zval structure
Zend uses the zval structure to store the values of PHP variables. The structure is as follows:
typedef union _zvalue_value {
    long lval;                  /* long value */
    double dval;                /* double value */
    struct {
        char *val;
        int len;
    } str;
    HashTable *ht;              /* hash table value */
    zend_object_value obj;
} zvalue_value;

struct _zval_struct {
    /* Variable information */
    zvalue_value value;         /* value */
    zend_uint refcount;
    zend_uchar type;            /* active type */
    zend_uchar is_ref;
};

typedef struct _zval_struct zval;

Zend uses the type field to decide which member of value holds the data. The possible values are:
IS_NULL: N/A (no member is used)
IS_LONG: value.lval
IS_DOUBLE: value.dval
IS_STRING: value.str
IS_ARRAY: value.ht
IS_OBJECT: value.obj
IS_BOOL: value.lval
IS_RESOURCE: value.lval
This table reveals two interesting points. First, a PHP array is actually a HashTable, which explains why PHP supports associative arrays. Second, a resource is just a long value: it usually stores a pointer, an internal array index, or something only its creator understands, so it can be thought of as a handle.
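The type-based dispatch can be sketched with a simplified tagged union. This is a minimal illustration, not Zend code: `mini_zval`, `mini_type`, and `mini_zval_format` are hypothetical names, only three payload types are modeled, and refcount/is_ref are omitted.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-ins for a few of Zend's IS_* type codes. */
enum mini_type { MINI_NULL, MINI_LONG, MINI_DOUBLE, MINI_STRING };

/* Simplified zval: a union of possible payloads plus a type tag.
   refcount and is_ref are left out of this sketch. */
typedef struct {
    union {
        long lval;                                /* MINI_LONG */
        double dval;                              /* MINI_DOUBLE */
        struct { const char *val; int len; } str; /* MINI_STRING */
    } value;
    unsigned char type;                           /* selects the live member */
} mini_zval;

/* Formats v into buf, reading only the union member selected by type.
   Returns the snprintf result, or -1 for an unknown type. */
int mini_zval_format(const mini_zval *v, char *buf, size_t size) {
    switch (v->type) {
    case MINI_NULL:
        return snprintf(buf, size, "NULL");
    case MINI_LONG:
        return snprintf(buf, size, "%ld", v->value.lval);
    case MINI_DOUBLE:
        return snprintf(buf, size, "%g", v->value.dval);
    case MINI_STRING:
        /* strings carry an explicit length, like value.str.len */
        return snprintf(buf, size, "%.*s", v->value.str.len, v->value.str.val);
    default:
        return -1;
    }
}
```

Reading a member other than the one named by the tag would reinterpret unrelated bytes, which is why every access goes through the type switch.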
1.1.2 Reference counting
Reference counting is widely used in garbage collection, memory pools, and string handling, and Zend implements a typical version of it. Through this mechanism, multiple PHP variables can share the same zval. The two remaining members of zval, is_ref and refcount, exist to support this sharing.
refcount, obviously, does the counting: it is incremented and decremented as references are added and removed, and once it drops to zero, Zend frees the zval.
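A minimal sketch of the counting rule, assuming a value that holds only a long and is allocated with plain malloc/free rather than Zend's allocator. `rc_val`, `rc_new`, `rc_addref`, and `rc_delref` are hypothetical names, not Zend macros.

```c
#include <stdlib.h>

/* Hypothetical refcounted value modeled on zval's refcount field. */
typedef struct {
    long lval;
    unsigned refcount;
} rc_val;

/* A freshly created value is owned by exactly one variable. */
rc_val *rc_new(long v) {
    rc_val *p = malloc(sizeof *p);
    if (p) {
        p->lval = v;
        p->refcount = 1;
    }
    return p;
}

/* Another variable starts sharing the value. */
void rc_addref(rc_val *p) {
    p->refcount++;
}

/* A variable stops using the value. When the count reaches zero the
   value is freed; returns 1 in that case, 0 otherwise. */
int rc_delref(rc_val *p) {
    if (--p->refcount == 0) {
        free(p);
        return 1;
    }
    return 0;
}
```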
What about is_ref?
1.1.3 zval states
In PHP, there are two kinds of variables, references and non-references, and Zend stores both using reference counting. Non-reference variables must be independent of one another: modifying one must not affect the others. Zend resolves this conflict with the Copy-On-Write mechanism. If, when a variable is about to be modified, Zend finds that the zval it points to is shared by several variables, it copies the zval into a new zval whose refcount is 1 and decrements the refcount of the original. This process is called "zval separation". Reference variables have the opposite requirement: variables that reference the same value are bound together, so modifying one of them modifies all of them.
Zend therefore needs to record which of these two situations a zval is in, and is_ref serves exactly this purpose. It indicates whether all the variables pointing to the zval were assigned by reference: either all of them are references or none of them are. When a variable is modified, Zend performs Copy-On-Write only if the zval's is_ref is 0, that is, when it is not a reference.
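The separation rule can be sketched as follows. This is a simplified model, not Zend's implementation: `cow_zval`, `assign_by_value`, and `separate_on_write` are hypothetical names, each variable is modeled as a slot holding a `cow_zval *`, only a long payload is copied, and the sketch assumes the source of a by-value assignment has is_ref == 0.

```c
#include <stdlib.h>

/* Hypothetical simplified zval carrying the two sharing fields. */
typedef struct {
    long lval;
    unsigned refcount;
    unsigned char is_ref;
} cow_zval;

/* Non-reference assignment ($dst = $src) when src->is_ref == 0:
   share the zval and bump its count instead of copying it. */
void assign_by_value(cow_zval **dst, cow_zval *src) {
    src->refcount++;
    *dst = src;
}

/* Called before writing through *var. If the zval is shared and is not a
   reference, separate it: give this variable a private copy with
   refcount 1 and drop one count from the original.
   Returns 1 if a copy was made, 0 if none was needed, -1 on failure. */
int separate_on_write(cow_zval **var) {
    cow_zval *z = *var;
    if (z->is_ref || z->refcount == 1)
        return 0;              /* references and sole owners write in place */
    cow_zval *copy = malloc(sizeof *copy);
    if (!copy)
        return -1;
    copy->lval = z->lval;
    copy->refcount = 1;
    copy->is_ref = 0;
    z->refcount--;
    *var = copy;
    return 1;
}
```

The copy is deferred until a write actually happens, so by-value assignments that are only ever read cost a single counter increment.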
1.1.4 zval state switching
If all assignments involving a zval were by reference, or all were not, a single is_ref flag would be sufficient. Reality is not that tidy, though, and PHP cannot impose such a restriction on users. When reference and non-reference assignments are mixed, Zend must handle the situation specially.
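Consider PHP code in which the first three statements bind the variables a, b, and c to one another by reference, and the fourth statement assigns c to a new variable d by value.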
The entire process is as follows:
The first three statements point a, b, and c at the same zval, with is_ref = 1 and refcount = 3. The fourth statement is a non-reference assignment, which would normally only increment the reference count. Here, however, the target zval is a reference, so simply incrementing its count would be wrong: d would become bound to a, b, and c. Zend's solution is to create a separate copy of the zval for d.
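A sketch of this state switch under simplifying assumptions: variables are slots holding `sw_zval *`, only a long payload exists, and reference assignment assumes the source zval is not also shared by non-reference variables (which would require separating it first). `sw_zval`, `sw_new`, `assign_by_ref`, and `assign_by_value` are hypothetical names.

```c
#include <stdlib.h>

/* Hypothetical simplified zval with the two sharing fields. */
typedef struct {
    long lval;
    unsigned refcount;
    unsigned char is_ref;
} sw_zval;

sw_zval *sw_new(long v) {
    sw_zval *z = malloc(sizeof *z);
    if (z) {
        z->lval = v;
        z->refcount = 1;
        z->is_ref = 0;
    }
    return z;
}

/* $dst = &$src: mark the zval as a reference set and bind dst to it. */
void assign_by_ref(sw_zval **dst, sw_zval *src) {
    src->is_ref = 1;
    src->refcount++;
    *dst = src;
}

/* $dst = $src: share the zval only if it is not a reference. For a
   reference zval, sharing would bind dst into the reference set, so
   dst gets its own copy instead. Returns 1 if a copy was made. */
int assign_by_value(sw_zval **dst, sw_zval *src) {
    if (!src->is_ref) {
        src->refcount++;
        *dst = src;
        return 0;
    }
    sw_zval *copy = sw_new(src->lval);
    if (!copy)
        return -1;
    *dst = copy;
    return 1;
}
```

In a test mirroring the scenario above (three variables bound by reference, then a by-value assignment), the fourth variable ends up with its own zval while the shared one keeps refcount 3.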
1.1.5 Parameter passing
Passing parameters to PHP functions works the same way as variable assignment: pass-by-value is equivalent to non-reference assignment, and pass-by-reference is equivalent to reference assignment, which may also trigger a zval state switch. This will be discussed later.
1.2 HashTable structure
HashTable is the most important and most widely used data structure in the Zend Engine; it is used to store almost everything.
1.2.1 Data structure
The HashTable data structure is defined as follows:
typedef struct bucket {
    ulong h;                    // stores the hash value
    uint nKeyLength;
    void *pData;                // points to the value, which is a copy of the user data
    void *pDataPtr;
    struct bucket *pListNext;   // pListNext and pListLast form the doubly
    struct bucket *pListLast;   // linked list of the entire HashTable
    struct bucket *pNext;       // pNext and pLast form the doubly linked
    struct bucket *pLast;       // list of one hash chain
    char arKey[1];              // key
} Bucket;

typedef struct _hashtable {
    uint nTableSize;
    uint nTableMask;
    uint nNumOfElements;
    ulong nNextFreeElement;
    Bucket *pInternalPointer;   /* used for element traversal */
    Bucket *pListHead;
    Bucket *pListTail;
    Bucket **arBuckets;         // hash array
    dtor_func_t pDestructor;    // set when the HashTable is initialized; called when a Bucket is destroyed
    zend_bool persistent;       // whether to use the C memory allocation routines
    unsigned char nApplyCount;
    zend_bool bApplyProtection;
#if ZEND_DEBUG
    int inconsistent;
#endif
} HashTable;
Overall, Zend's HashTable is a chained hash table that is additionally optimized for linear traversal, as shown in the figure below:
A HashTable combines two data structures: a chained hash table and a doubly linked list. The former provides fast lookup by key; the latter makes linear traversal and sorting convenient. Every Bucket belongs to both structures at once.
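A minimal sketch of one bucket living in both structures, assuming NUL-terminated string keys, a long payload, and no duplicate-key check. `mbucket`, `mtable`, `mhash`, `mtable_init`, `mtable_add`, and `mtable_find` are hypothetical names; the link fields borrow the names from the Bucket definition. The hash is the DJB "times 33" function, which Zend also uses, though Zend's exact key handling may differ.

```c
#include <stdlib.h>
#include <string.h>

typedef struct mbucket {
    unsigned long h;
    const char *key;
    long value;
    struct mbucket *pNext, *pLast;         /* links within one hash chain */
    struct mbucket *pListNext, *pListLast; /* table-wide insertion order */
} mbucket;

typedef struct {
    unsigned nTableSize, nTableMask;
    mbucket **arBuckets;
    mbucket *pListHead, *pListTail;
} mtable;

/* DJB "times 33" string hash. */
unsigned long mhash(const char *s) {
    unsigned long h = 5381;
    while (*s)
        h = h * 33 + (unsigned char)*s++;
    return h;
}

/* size_pow2 must be a power of two so that size - 1 works as a mask. */
int mtable_init(mtable *t, unsigned size_pow2) {
    t->nTableSize = size_pow2;
    t->nTableMask = size_pow2 - 1;
    t->arBuckets = calloc(size_pow2, sizeof *t->arBuckets);
    t->pListHead = t->pListTail = NULL;
    return t->arBuckets ? 0 : -1;
}

/* Links the new bucket at the head of its hash chain and at the tail of
   the ordered list. */
mbucket *mtable_add(mtable *t, const char *key, long value) {
    mbucket *p = malloc(sizeof *p);
    if (!p)
        return NULL;
    p->h = mhash(key);
    p->key = key;
    p->value = value;

    unsigned idx = (unsigned)(p->h & t->nTableMask);
    p->pLast = NULL;
    p->pNext = t->arBuckets[idx];
    if (p->pNext)
        p->pNext->pLast = p;
    t->arBuckets[idx] = p;

    p->pListNext = NULL;
    p->pListLast = t->pListTail;
    if (t->pListTail)
        t->pListTail->pListNext = p;
    else
        t->pListHead = p;
    t->pListTail = p;
    return p;
}

/* Key lookup only walks one hash chain. */
mbucket *mtable_find(const mtable *t, const char *key) {
    unsigned long h = mhash(key);
    for (mbucket *p = t->arBuckets[h & t->nTableMask]; p; p = p->pNext)
        if (p->h == h && strcmp(p->key, key) == 0)
            return p;
    return NULL;
}
```

Traversal in insertion order follows pListHead and pListNext, independently of which chains the buckets hash into.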
Some notes on the data structure:
Why are the hash chains doubly linked?
Operations by key only ever walk a chain forward, so a singly linked chain would be enough for them. However, Zend sometimes needs to delete a given Bucket from its hash chain, and a doubly linked chain makes this very efficient: the Bucket already knows its predecessor, so there is no need to search the chain from its head.
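The constant-time removal can be sketched on a bare chain node. `node` and `chain_unlink` are hypothetical names; the link fields reuse the pNext/pLast naming from Bucket.

```c
#include <stddef.h>

/* Hypothetical node of one hash chain. */
typedef struct node {
    struct node *pNext, *pLast;
    int id;
} node;

/* Removes p from the chain whose head pointer is *head. Because p knows
   its predecessor through pLast, no walk from the head is needed: O(1). */
void chain_unlink(node **head, node *p) {
    if (p->pLast)
        p->pLast->pNext = p->pNext;
    else
        *head = p->pNext;          /* p was the first node in the chain */
    if (p->pNext)
        p->pNext->pLast = p->pLast;
    p->pNext = p->pLast = NULL;
}
```

With a singly linked chain, the same operation would have to scan from the head to find the node whose pNext is p.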
What does nTableMask do?
This value converts a hash value into an index into the arBuckets array. When a HashTable is initialized, Zend allocates nTableSize slots for the arBuckets array, where nTableSize is the smallest power of two, 2^n (binary 10...0), that is not less than the size the user requested. nTableMask = nTableSize - 1, which in binary is 01...1. As a result, h & nTableMask always falls in [0, nTableSize - 1], and Zend uses it as the index into arBuckets.
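The size rounding and masking can be sketched as two small functions. `table_size_for` and `bucket_index` are hypothetical names, and the minimum size of 8 is an assumption of this sketch.

```c
/* Rounds a requested size up to a power of two, starting from an
   assumed minimum of 8 and capped at 2^31 to avoid overflow. */
unsigned table_size_for(unsigned requested) {
    unsigned size = 8;
    while (size < requested && size < 0x80000000u)
        size <<= 1;
    return size;
}

/* With size = 2^n, mask = size - 1 has the low n bits set, so h & mask
   keeps exactly those bits: the result is always in [0, size - 1]. */
unsigned bucket_index(unsigned long h, unsigned size) {
    unsigned mask = size - 1;
    return (unsigned)(h & mask);
}
```

For example, with nTableSize = 16 the mask is binary 1111, so a hash of 0x12345 maps to slot 0x5. A bitwise AND is used instead of h % nTableSize because it avoids a division, which only works because the size is a power of two.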
What is pDataPtr?
Normally, when a key-value pair is inserted, Zend copies the value and points pData at the copy. Copying requires calling Zend's internal allocation routine emalloc, which is relatively expensive and uses more memory than the value itself (the extra memory holds the allocator's bookkeeping "cookie"), so for small values the waste is considerable. Since HashTables mostly store pointer values, Zend introduced pDataPtr: when the value is the size of a pointer, Zend copies it directly into pDataPtr and points pData at pDataPtr. This avoids an emalloc call and improves the cache hit rate.
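A sketch of the pDataPtr trick, assuming plain malloc/free in place of Zend's allocator. `slot`, `slot_store`, and `slot_free` are hypothetical names; only the two field names come from Bucket.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical slot with Bucket-style pData/pDataPtr fields. */
typedef struct {
    void *pData;     /* always points at the stored copy of the value */
    void *pDataPtr;  /* inline storage used when the value is pointer-sized */
} slot;

/* Stores a copy of size bytes from value. A pointer-sized value is copied
   into pDataPtr itself, so no heap allocation is needed. Note that this
   makes the slot self-referential: it must not be moved after storing. */
int slot_store(slot *s, const void *value, size_t size) {
    if (size == sizeof(void *)) {
        memcpy(&s->pDataPtr, value, sizeof(void *));
        s->pData = &s->pDataPtr;
    } else {
        s->pData = malloc(size);   /* the allocation the trick avoids */
        if (!s->pData)
            return -1;
        memcpy(s->pData, value, size);
        s->pDataPtr = NULL;
    }
    return 0;
}

/* Frees the value only if it lives on the heap. */
void slot_free(slot *s) {
    if (s->pData != &s->pDataPtr)
        free(s->pData);
    s->pData = NULL;
}
```

Readers always go through pData, so they never need to know which of the two storage modes was chosen.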
Why is arKey declared with a size of only 1? Why not use a pointer to manage the key?
arKey holds the key, but its declared size is only 1, which is not enough to hold the key. The following line can be found in the HashTable code that inserts new elements:
p = (Bucket *) pemalloc(sizeof(Bucket) - 1 + nKeyLength, ht->persistent);
This shows that Zend allocates a single block large enough for both the Bucket itself and its key.
The upper part of the block is the Bucket and the lower part is the key. Because arKey happens to be the last member of Bucket, the key can be accessed through arKey. This technique is very common in memory management routines: an allocator actually reserves more memory than requested, and the leading part, usually called a cookie, stores information about the block, such as its size and pointers to the previous and next blocks. Baidu's Transmit program also uses this technique.
Not managing the key through a pointer saves one emalloc call and improves the cache hit rate. Another reason is that keys are fixed in most cases, so the whole Bucket never has to be reallocated because a key grew longer. This also explains why the value is not allocated inline in the same way: values are variable.
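The single-allocation layout can be sketched with a C99 flexible array member, which is the standardized form of the arKey[1] trick (indexing past a declared size of 1 is not strictly valid ISO C). `kbucket` and `kbucket_new` are hypothetical names, and plain malloc stands in for pemalloc.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical bucket whose key bytes follow the struct in one block. */
typedef struct kbucket {
    unsigned long h;
    unsigned nKeyLength;
    char arKey[];        /* flexible array member: nKeyLength bytes */
} kbucket;

/* One allocation holds the struct header and the key. With a flexible
   array member, sizeof(kbucket) excludes arKey, so no "- 1" is needed. */
kbucket *kbucket_new(const char *key, unsigned nKeyLength, unsigned long h) {
    kbucket *p = malloc(sizeof(kbucket) + nKeyLength);
    if (!p)
        return NULL;
    p->h = h;
    p->nKeyLength = nKeyLength;
    memcpy(p->arKey, key, nKeyLength);
    return p;
}
```

A single free releases both the bucket and its key, which is the other half of the saving: one allocation, one deallocation, and the key sits in the same cache lines as the bucket header.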
1.2.2 PHP arrays
One question about HashTable remains unanswered: what is nNextFreeElement?
Unlike ordinary hash tables, Zend's HashTable lets the user specify the hash value directly and ignore the key, or even omit the key altogether (in which case nKeyLength is 0). HashTable also supports an append operation in which no hash value is given, only the value; Zend then uses nNextFreeElement as the hash and increments nNextFreeElement.
This behavior looks strange: without a key, the value cannot be looked up by key, so the structure is arguably no longer a hash table. The key to understanding it is that PHP arrays are implemented with HashTable. Associative arrays add elements to the HashTable as ordinary key-value mappings, with the user-specified string as the key; non-associative (indexed) arrays use the array index directly as the hash value, and no key exists. nNextFreeElement is needed when an array mixes associative and indexed elements, or when array_push is used.
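The counter logic alone can be sketched as follows; element storage is omitted. `arr_meta`, `index_update`, and `next_index_insert` are hypothetical names, and the sketch simplifies by using an unsigned counter and ignoring negative indices.

```c
/* Hypothetical array metadata: only the append counter is modeled. */
typedef struct {
    unsigned long nNextFreeElement;
} arr_meta;

/* $arr[h] = ...; with an explicit integer index: an index at or above
   the counter moves the counter just past it. String keys never reach
   this function, so they leave the counter untouched. */
void index_update(arr_meta *m, unsigned long h) {
    if (h >= m->nNextFreeElement)
        m->nNextFreeElement = h + 1;
}

/* $arr[] = ...; or array_push(): use the counter as the hash value,
   then increment it. Returns the index the new element receives. */
unsigned long next_index_insert(arr_meta *m) {
    return m->nNextFreeElement++;
}
```

This matches the familiar PHP behavior where appending after an explicit index 5 produces index 6, while a smaller explicit index such as 2 does not pull the counter back.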
Now consider the values. PHP array values use the general zval structure directly: pData points to a zval *, and, as described in the previous section, that zval * is stored directly in pDataPtr. Because zval is used as-is, array elements can be of any PHP type.
Array traversal operations such as foreach and each go through the HashTable's doubly linked list, with pInternalPointer recording the current position as a cursor.
1.2.3 Variable symbol table
In addition to arrays, HashTable is used to store many other kinds of data, such as PHP functions, variable symbols, loaded modules, and class members.
A variable symbol table is essentially an associative array whose key is the variable name (so very long variable names are clearly not a good idea) and whose value is a zval *.
At any given time, PHP code can see two variable symbol tables: symbol_table and active_symbol_table. The former stores global variables and is called the global symbol table. The latter is a pointer to the currently active symbol table, which is usually the global symbol table. However, every time a PHP function is entered (meaning a function written in PHP code), Zend creates a symbol table for that function and points active_symbol_table at this local table. Zend always accesses variables through active_symbol_table, which is how the scope of local variables is implemented.
If a variable declared global is accessed inside a function, Zend handles it specially: it creates, in active_symbol_table, a reference to the variable of the same name in symbol_table. If no variable of that name exists in symbol_table yet, it is created first.
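A sketch of how a global import can make a local entry share the global one. This is a simplified model, not Zend's code: the tables are small fixed arrays instead of HashTables, a shared `long` cell stands in for the shared reference zval, and a newly created global starts at 0 where PHP would use NULL. `symtab`, `sym_find`, `sym_add`, and `import_global` are hypothetical names.

```c
#include <stdlib.h>
#include <string.h>

#define MAX_SYMS 16

/* Hypothetical symbol table: names mapped to pointers to value cells. */
typedef struct {
    const char *names[MAX_SYMS];
    long *cells[MAX_SYMS];
    int count;
} symtab;

/* Returns the slot holding name's cell pointer, or NULL if absent. */
long **sym_find(symtab *t, const char *name) {
    for (int i = 0; i < t->count; i++)
        if (strcmp(t->names[i], name) == 0)
            return &t->cells[i];
    return NULL;
}

long **sym_add(symtab *t, const char *name, long *cell) {
    if (t->count == MAX_SYMS)
        return NULL;
    t->names[t->count] = name;
    t->cells[t->count] = cell;
    return &t->cells[t->count++];
}

/* Models `global $name;`: create the global entry if it is missing,
   then make the active table's entry share the same cell. */
long *import_global(symtab *global, symtab *active, const char *name) {
    long **g = sym_find(global, name);
    if (!g) {
        long *cell = calloc(1, sizeof *cell);
        if (!cell)
            return NULL;
        g = sym_add(global, name, cell);
        if (!g) {
            free(cell);
            return NULL;
        }
    }
    long **l = sym_find(active, name);
    if (l)
        *l = *g;
    else if (!sym_add(active, name, *g))
        return NULL;
    return *g;
}
```

Because all lookups go through the active table, a write made inside the "function" through the shared cell is visible in the global table afterwards.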
1.3 Memory and files
The resources a program owns are generally memory and files. For ordinary programs these resources are tied to the process: when the process exits, the operating system or the C library automatically reclaims any resources we did not release explicitly.
PHP programs are different because they are page-based. A running page acquires resources such as memory or files, but when the page finishes, the operating system or C library does not necessarily know that these resources should be reclaimed. For example, if PHP is compiled into Apache as a module and Apache runs in prefork or worker mode, Apache processes or threads are reused, so memory allocated for a PHP page would stay allocated until the process eventually dumps core.
To solve this problem, Zend provides a set of memory allocation APIs that serve the same purposes as their counterparts in C. The difference is that these functions allocate memory from Zend's memory pool, and that memory can be reclaimed automatically at the end of each page. In our modules, memory allocated for a page should use these APIs rather than the C routines; otherwise Zend may try to efree our memory at the end of the page, and the result is usually a crash.
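The idea of page-scoped memory can be sketched as an allocator that remembers every block it hands out and releases all of them when the request ends. This is a minimal sketch of the concept only, not Zend's memory manager: `block`, `request_pool`, `pool_alloc`, and `pool_end_request` are hypothetical names, and freeing individual blocks (the role of efree) is not modeled.

```c
#include <stddef.h>
#include <stdlib.h>

/* Header placed in front of every block, like an allocator "cookie";
   the union with max_align_t keeps the user memory suitably aligned. */
typedef union block {
    union block *next;
    max_align_t align;
} block;

typedef struct {
    block *head;   /* most recently allocated block */
    size_t live;   /* number of blocks currently allocated */
} request_pool;

/* Allocates size bytes and links the block into the pool's list. */
void *pool_alloc(request_pool *p, size_t size) {
    block *b = malloc(sizeof(block) + size);
    if (!b)
        return NULL;
    b->next = p->head;
    p->head = b;
    p->live++;
    return b + 1;      /* user memory starts right after the header */
}

/* Called once when the page finishes: frees everything still allocated. */
void pool_end_request(request_pool *p) {
    while (p->head) {
        block *next = p->head->next;
        free(p->head);
        p->head = next;
    }
    p->live = 0;
}
```

Memory obtained with plain malloc never enters such a list, which is why module code should use the e* routines for per-page allocations.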
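emalloc() (counterpart of malloc())
efree() (counterpart of free())
estrdup() (counterpart of strdup())
estrndup() (counterpart of strndup())
ecalloc() (counterpart of calloc())
erealloc() (counterpart of realloc())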
In addition, Zend provides a set of macros, such as VCWD_xxx, that replace the corresponding file APIs of the C library and the operating system. These macros support PHP's virtual working directory, so always use them in module code. For the macro definitions, see "TSRM/tsrm_virtual_cwd.h" in the PHP source code. You may notice that none of these macros provides a close operation: close acts on an already opened resource and does not involve a file path, so you can call the C or operating system routine directly. Likewise, operations such as read and write use the C or operating system routines directly.