PHP Kernel Introduction and Extension Development Guide: Basic Knowledge

Source: Internet
Author: User
1 Basic Knowledge
This chapter briefly describes some internal mechanisms of the Zend engine. They are closely related to extension development and can also help us write more efficient PHP code.
1.1 Storage of PHP Variables
1.1.1 Zval Structure
Zend uses the zval structure to store the values of PHP variables. It is defined as follows:

typedef union _zvalue_value {
    long lval;                  /* long value */
    double dval;                /* double value */
    struct {
        char *val;
        int len;
    } str;
    HashTable *ht;              /* hash table value */
    zend_object_value obj;
} zvalue_value;

struct _zval_struct {
    /* Variable information */
    zvalue_value value;         /* value */
    zend_uint refcount;
    zend_uchar type;            /* active type */
    zend_uchar is_ref;
};

typedef struct _zval_struct zval;
Zend decides which member of value to access based on the type field. The possible values are:

IS_NULL: no value (n/a)
IS_LONG: value.lval
IS_DOUBLE: value.dval
IS_STRING: value.str
IS_ARRAY: value.ht
IS_OBJECT: value.obj
IS_BOOL: value.lval
IS_RESOURCE: value.lval

Two things are worth noting here. First, a PHP array is really a HashTable, which explains why PHP supports associative arrays. Second, a resource is just a long value, usually holding a pointer, an index into an internal array, or something that only its creator understands; it can be thought of as a handle.

1.1.2 Reference Counting

Reference counting is widely used in garbage collection, memory pools, string handling and elsewhere, and Zend implements a typical reference-counting scheme. Through reference counting, multiple PHP variables can share a single zval; the two remaining zval members, is_ref and refcount, exist to support this sharing.

As the name suggests, refcount is the counter: it is incremented and decremented as references are added and removed, and once it drops to zero Zend reclaims the zval.
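A minimal sketch of that bookkeeping, using a hypothetical, simplified `mini_zval` struct rather than Zend's real zval or its macros: sharing a value bumps `refcount`, and releasing it frees the value only when the count reaches zero.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-in for a zval: a long payload
 * plus the two bookkeeping fields discussed in the text. */
typedef struct {
    long lval;
    unsigned refcount;
    unsigned char is_ref;
} mini_zval;

/* Allocate a new value owned by exactly one variable. */
static mini_zval *mini_zval_new(long v) {
    mini_zval *z = malloc(sizeof *z);
    assert(z != NULL);
    z->lval = v;
    z->refcount = 1;
    z->is_ref = 0;
    return z;
}

/* Another variable starts sharing the value: bump the counter. */
static mini_zval *mini_zval_share(mini_zval *z) {
    z->refcount++;
    return z;
}

/* A variable stops using the value; free it once nobody is left.
 * Returns 1 if the value was freed, 0 otherwise. */
static int mini_zval_release(mini_zval *z) {
    assert(z->refcount > 0);
    if (--z->refcount == 0) {
        free(z);
        return 1;
    }
    return 0;
}
```

Sharing costs only an increment, which is why a plain assignment such as `$b = $a` is cheap until one side is written.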

So what is is_ref for?

1.1.3 Zval State

PHP has two kinds of variables, references and non-references, and Zend stores both using reference counting. Non-reference variables must be independent of each other: modifying one must not affect the others. Zend resolves this with copy-on-write: when a variable is about to be written and Zend finds that the zval it points to is shared by several variables, it makes a copy of the zval with refcount 1 for that variable and decrements the refcount of the original. This is called "zval separation". Reference variables behave the opposite way: variables assigned by reference must stay bound together, so modifying one of them modifies all of them.

Clearly the state of a zval must be recorded to handle both cases, and that is what is_ref does: it indicates whether all the variables pointing to the zval were assigned by reference. Either all of them are references or none are. When a variable is modified, Zend performs copy-on-write only if it finds that the zval's is_ref is 0, that is, the zval is not a reference.
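The separation rule can be sketched as follows. This is a hypothetical model (`cow_zval`, `cow_separate` and `cow_assign_long` are illustrative names, not Zend's API): a write first separates the zval when it is shared by non-reference variables, while a reference set is written in place.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, simplified zval: a long payload plus refcount/is_ref. */
typedef struct {
    long lval;
    unsigned refcount;
    unsigned char is_ref;
} cow_zval;

static cow_zval *cow_new(long v) {
    cow_zval *z = malloc(sizeof *z);
    assert(z != NULL);
    z->lval = v;
    z->refcount = 1;
    z->is_ref = 0;
    return z;
}

/* Prepare *slot for writing. If the zval is shared by several
 * non-reference variables (is_ref == 0, refcount > 1), separate it:
 * this variable gets a private copy with refcount 1 and the original
 * loses one count. A reference set (is_ref == 1) is left shared so
 * every bound variable sees the write. */
static void cow_separate(cow_zval **slot) {
    cow_zval *z = *slot;
    if (z->is_ref || z->refcount == 1)
        return;
    cow_zval *copy = cow_new(z->lval);
    z->refcount--;
    *slot = copy;
}

/* Assign a new long to the variable stored in *slot. */
static void cow_assign_long(cow_zval **slot, long v) {
    cow_separate(slot);
    (*slot)->lval = v;
}
```

With `$b = $a; $b = 2;`, the write to `$b` triggers the copy, so `$a` keeps its old value and both zvals end up with refcount 1.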

1.1.4 Zval State Switching

When all assignments involving a zval are by reference, or all are by value, a single is_ref flag is enough. The world is not that tidy, though: PHP cannot impose such a restriction on users, so when reference and non-reference assignments are mixed, special handling is needed.

Case 1: consider the following PHP code:

<?php
$a = 1;
$b = &$a;
$c = &$b;
$d = $c;  // a non-reference assignment amid reference assignments




The first three statements make $a, $b and $c point to the same zval, with is_ref=1 and refcount=3. The fourth statement is a non-reference assignment, which would normally just increment the reference count; but the target zval belongs to reference variables, so simply incrementing the count would clearly be wrong. Zend's solution is to create a separate copy of the zval for $d.
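A sketch of this case under simplifying assumptions: `mix_zval`, `mix_assign_ref` and `mix_assign_value` are hypothetical names, and the reference helper skips the separation a real engine would need if the source were already shared by value. It replays the four PHP statements above.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, simplified zval. */
typedef struct {
    long lval;
    unsigned refcount;
    unsigned char is_ref;
} mix_zval;

static mix_zval *mix_new(long v) {
    mix_zval *z = malloc(sizeof *z);
    assert(z != NULL);
    z->lval = v;
    z->refcount = 1;
    z->is_ref = 0;
    return z;
}

/* $dst = &$src: mark the zval as a reference set and share it.
 * Simplified: assumes src is not already shared by non-reference
 * variables (that case would require separating it first). */
static mix_zval *mix_assign_ref(mix_zval *src) {
    src->is_ref = 1;
    src->refcount++;
    return src;
}

/* $dst = $src: normally just share the zval, but if it belongs to a
 * reference set, sharing would wrongly bind $dst to that set, so
 * $dst gets its own copy instead. */
static mix_zval *mix_assign_value(mix_zval *src) {
    if (src->is_ref)
        return mix_new(src->lval);
    src->refcount++;
    return src;
}
```

After `$d = $c`, the reference set still has refcount 3, while `$d` owns an independent zval with is_ref 0 and refcount 1.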


1.1.5 Parameter Passing

Passing parameters to PHP functions works just like variable assignment: passing by value is equivalent to a non-reference assignment, and passing by reference is equivalent to a reference assignment, so it may also trigger a zval state switch. This will come up again later.

1.2 HashTable Structure

HashTable is the most important and most widely used data structure in the Zend engine; it is used to store almost everything.

1.2.1 Data Structure

The HashTable data structures are defined as follows:

typedef struct bucket {
    ulong h;                      /* stores the hash */
    uint nKeyLength;
    void *pData;                  /* points to the value, a copy of the user's data */
    void *pDataPtr;
    struct bucket *pListNext;     /* pListNext and pListLast form the doubly */
    struct bucket *pListLast;     /* linked list over the whole HashTable */
    struct bucket *pNext;         /* pNext and pLast form the doubly linked */
    struct bucket *pLast;         /* list of a single hash slot */
    char arKey[1];                /* key */
} Bucket;

typedef struct _hashtable {
    uint nTableSize;
    uint nTableMask;
    uint nNumOfElements;
    ulong nNextFreeElement;
    Bucket *pInternalPointer;     /* used for element traversal */
    Bucket *pListHead;
    Bucket *pListTail;
    Bucket **arBuckets;           /* hash array */
    dtor_func_t pDestructor;      /* set at initialization, called when a bucket is destroyed */
    zend_bool persistent;         /* whether to use the C memory allocation routines */
    unsigned char nApplyCount;
    zend_bool bApplyProtection;
#if ZEND_DEBUG
    int inconsistent;
#endif
} HashTable;

In general, Zend's HashTable is a chained hash table that is additionally optimized for linear traversal.


A HashTable combines two data structures, a chained hash table and a doubly linked list. The former provides fast key-value lookup; the latter makes linear traversal and sorting easy. Each Bucket belongs to both structures at once.
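A hypothetical miniature of this dual structure (integer keys only, fixed 8-slot table; `node`, `table`, `table_insert` and `table_find` are illustrative names, not Zend's): each node is pushed onto one slot's collision chain and appended to a table-wide insertion-order list.

```c
#include <assert.h>
#include <stdlib.h>

/* Every node sits on two lists at once: a per-slot collision chain
 * (next/last) and a table-wide insertion-order list
 * (list_next/list_last), like pNext/pLast and pListNext/pListLast. */
typedef struct node {
    unsigned long h;
    long value;
    struct node *next, *last;            /* collision chain of one slot */
    struct node *list_next, *list_last;  /* global insertion-order list */
} node;

#define TABLE_SIZE 8u                    /* a power of two */
#define TABLE_MASK (TABLE_SIZE - 1)

typedef struct {
    node *slots[TABLE_SIZE];
    node *head, *tail;
} table;

static void table_insert(table *t, unsigned long h, long value) {
    node *n = calloc(1, sizeof *n);
    assert(n != NULL);
    n->h = h;
    n->value = value;
    /* push onto the front of the slot's collision chain */
    unsigned long idx = h & TABLE_MASK;
    n->next = t->slots[idx];
    if (n->next)
        n->next->last = n;
    t->slots[idx] = n;
    /* append to the global list, preserving insertion order */
    n->list_last = t->tail;
    if (t->tail)
        t->tail->list_next = n;
    else
        t->head = n;
    t->tail = n;
}

/* Key lookup walks only one collision chain. */
static node *table_find(const table *t, unsigned long h) {
    for (node *n = t->slots[h & TABLE_MASK]; n; n = n->next)
        if (n->h == h)
            return n;
    return NULL;
}
```

Lookup touches only one chain, while walking `head`/`list_next` visits elements in insertion order regardless of which slot they hash to.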
A few notes on this data structure:
Why does the chained hash use a doubly linked list?
An ordinary chained hash only needs to operate by key, for which a singly linked list is enough. However, Zend sometimes needs to remove a given Bucket from its chain, and a doubly linked list makes that very efficient.
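A sketch of why the back pointer matters, using a hypothetical `chain_node` type: given only a pointer to the node, it can be unlinked in O(1), whereas a singly linked chain would have to be rescanned from its head to find the predecessor.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical chain node with forward and back pointers,
 * like Bucket's pNext/pLast. */
typedef struct chain_node {
    int key;
    struct chain_node *next, *last;
} chain_node;

/* Unlink n from the chain whose head pointer is *head in O(1):
 * the back pointer gives the predecessor directly. */
static void chain_unlink(chain_node **head, chain_node *n) {
    assert(head != NULL && n != NULL);
    if (n->last)
        n->last->next = n->next;
    else
        *head = n->next;            /* n was the first node */
    if (n->next)
        n->next->last = n->last;
    n->next = n->last = NULL;
}
```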
What is nTableMask?
This value converts a hash value into an index into the arBuckets array. When a HashTable is initialized, Zend allocates arBuckets with nTableSize slots, where nTableSize is the smallest power of two 2^n that is not less than the size the user asked for, i.e. binary 10...0. nTableMask = nTableSize - 1, which is binary 01...1, so h & nTableMask always falls in [0, nTableSize - 1] and Zend can use it directly as an index into arBuckets.
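The arithmetic can be sketched with two hypothetical helpers, `table_size_for` and `bucket_index`. The minimum size of 8 is an assumption made for illustration.

```c
#include <assert.h>

/* Round a requested size up to the smallest power of two >= requested,
 * as described for nTableSize. The minimum of 8 is an assumption. */
static unsigned table_size_for(unsigned requested) {
    unsigned size = 8;
    while (size < requested) {
        assert(size <= (~0u >> 1));  /* doubling must not overflow */
        size <<= 1;
    }
    return size;
}

/* With size == 2^n, mask == size - 1 has the low n bits set, so
 * h & mask always lands in [0, size - 1]: a cheap modulo. */
static unsigned bucket_index(unsigned long h, unsigned size) {
    unsigned mask = size - 1;
    return (unsigned)(h & mask);
}
```

For example, a requested size of 100 rounds up to 128, giving a mask of 127, and hash 130 lands in slot 2.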
What is pDataPtr?
Normally, when a key-value pair is inserted, Zend copies the value and points pData at the copy. The copy requires calling Zend's internal routine emalloc to allocate memory, which is time-consuming and takes a block larger than the value itself (the extra memory holds a header, the "cookie"). That is a big waste when the value is small. Since HashTable is frequently used to hold pointer values, Zend introduced pDataPtr: when the value is the size of a pointer, Zend copies it directly into pDataPtr and points pData at pDataPtr. This avoids the emalloc call and also improves the cache hit rate.
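A hypothetical sketch of the pDataPtr trick (`slot`, `slot_store`, `slot_is_inline` and `slot_free` are illustrative names). Pointer-sized values are stored inline with no allocation; anything else gets a heap copy, with plain malloc standing in for emalloc.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical slot mirroring the pData / pDataPtr pair. */
typedef struct {
    void *pData;     /* always points at the stored bytes */
    void *pDataPtr;  /* inline storage for pointer-sized values */
} slot;

/* Store a copy of `size` bytes. A pointer-sized value is copied into
 * pDataPtr itself, so no allocation is needed; anything else gets a
 * heap copy (malloc here, where Zend would use emalloc). */
static void slot_store(slot *s, const void *data, size_t size) {
    if (size == sizeof(void *)) {
        memcpy(&s->pDataPtr, data, size);
        s->pData = &s->pDataPtr;
    } else {
        s->pData = malloc(size);
        assert(s->pData != NULL);
        memcpy(s->pData, data, size);
        s->pDataPtr = NULL;
    }
}

static int slot_is_inline(const slot *s) {
    return s->pData == &s->pDataPtr;
}

/* Only heap copies need freeing; inline values live in the slot. */
static void slot_free(slot *s) {
    if (!slot_is_inline(s))
        free(s->pData);
    s->pData = NULL;
}
```

Because pData points back into the slot itself, the slot must not be moved while in use. This is fine for a Bucket that stays where it was allocated.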
Why is arKey only one byte long? Why not manage the key through a pointer?
arKey holds the key, but its declared size is only 1, which is not enough to hold a key. The following code can be found in the HashTable source:
p = (Bucket *) pemalloc(sizeof(Bucket) - 1 + nKeyLength, ht->persistent);
As you can see, Zend allocates for each Bucket a single block of memory large enough to hold both the Bucket itself and its key.
The upper part of the block is the Bucket and the lower part is the key. Since arKey is "conveniently" the last member of Bucket, the key can be accessed through arKey. This technique is most common in memory management routines: when allocating memory they actually allocate more than the requested size, and the extra leading part, usually called a cookie, stores information about the block such as its size and pointers to the previous and next blocks. Baidu's Transmit program uses this technique as well.
Not managing the key through a separate pointer saves an emalloc call and also improves the cache hit rate. Another essential reason is that the key is fixed in most cases, so the whole Bucket never needs to be reallocated because its key grew longer. This also explains why the value is not allocated in the same block as well: the value can change.
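The same single-allocation layout can be sketched with a hypothetical `kbucket` type. This uses a C99 flexible array member (`char arKey[];`), the standard-conforming form of the `arKey[1]` idiom: strictly speaking, indexing past a declared one-element array is undefined in ISO C. As in Zend, the key length here includes the terminating NUL.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical bucket whose key bytes live in the same allocation,
 * directly after the header fields. */
typedef struct kbucket {
    unsigned long h;
    unsigned nKeyLength;
    char arKey[];       /* flexible array member: holds nKeyLength bytes */
} kbucket;

/* One malloc for header + key. Zend's version subtracts 1 because
 * its arKey[1] already contributes one byte to sizeof(Bucket). */
static kbucket *kbucket_new(const char *key, unsigned key_len, unsigned long h) {
    kbucket *b = malloc(sizeof(kbucket) + key_len);
    assert(b != NULL);
    b->h = h;
    b->nKeyLength = key_len;
    memcpy(b->arKey, key, key_len);
    return b;
}
```

A single `free(b)` releases both the header and the key, which is the point of the layout.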
1.2.2 PHP Array
One question about HashTable is still unanswered: what is nNextFreeElement?
Unlike an ordinary hash table, Zend's HashTable lets the user specify the hash value directly, ignoring the key or even omitting it entirely (in which case nKeyLength is 0). HashTable also supports an append operation: the user supplies only a value and no hash value, and Zend uses nNextFreeElement as the hash and then increments nNextFreeElement.
This behavior looks strange, because such a value cannot be accessed by key, so the structure is hardly a hash at all. The key to understanding it is that PHP arrays are implemented with HashTable. Associative arrays add elements through ordinary key-value mappings, using the user-specified string as the key. Non-associative arrays use the array index as the hash value and have no key. When associative and non-associative elements are mixed in one array, or when array_push is used, nNextFreeElement is needed.
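A hypothetical model of just the counter (`next_free`, `note_index` and `append_index` are illustrative names, and no keys are actually stored). An explicit integer index moves the counter past it, and an append takes the counter's value. This matches PHP's observable behavior, where `$a[5] = 'x'; $a[] = 'y';` puts `'y'` at index 6.

```c
#include <assert.h>

/* Hypothetical model of nNextFreeElement alone. */
typedef struct {
    long nNextFreeElement;
} next_free;

/* Record an explicit integer index, e.g. $a[5] = v:
 * the next append must land after the largest index seen. */
static void note_index(next_free *t, long index) {
    if (index >= t->nNextFreeElement)
        t->nNextFreeElement = index + 1;
}

/* Append without a key ($a[] = v): use the counter as the
 * hash/index, then increment it. */
static long append_index(next_free *t) {
    return t->nNextFreeElement++;
}
```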
As for values, PHP arrays store the general-purpose zval structure directly: pData points to a zval*, and, as described above, that zval* is stored directly in pDataPtr. Because zvals are used directly, array elements can be of any PHP type.
Array traversal, i.e. foreach and each, walks the HashTable's doubly linked list, with pInternalPointer recording the current position as a cursor.
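A hypothetical sketch of cursor-based traversal over the insertion-order list. `ct_reset`, `ct_current` and `ct_next` are illustrative names, loosely analogous to PHP's `reset()`, `current()` and `next()`; they are not Zend functions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical list node, linked like pListNext. */
typedef struct lnode {
    long value;
    struct lnode *pListNext;
} lnode;

/* A table reduced to its list head and its traversal cursor. */
typedef struct {
    lnode *pListHead;
    lnode *pInternalPointer;
} cursor_table;

/* Move the cursor back to the first element. */
static void ct_reset(cursor_table *t) {
    t->pInternalPointer = t->pListHead;
}

/* Store the current value in *out and return 1, or return 0 at the end. */
static int ct_current(const cursor_table *t, long *out) {
    if (!t->pInternalPointer)
        return 0;
    *out = t->pInternalPointer->value;
    return 1;
}

/* Advance the cursor along the linked list. */
static void ct_next(cursor_table *t) {
    if (t->pInternalPointer)
        t->pInternalPointer = t->pInternalPointer->pListNext;
}
```

A loop of the form `ct_reset; while (ct_current(...)) { ...; ct_next(...); }` visits every element in insertion order without touching the hash slots at all.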
1.2.3 Variable Symbol table
Besides arrays, HashTable is also used to store much other data, such as PHP functions, variable symbols, loaded modules and class members.
A variable symbol table is essentially an associative array whose keys are variable names (so, evidently, very long variable names are not a good idea) and whose values are zval*.
At any moment PHP code can see two variable symbol tables, symbol_table and active_symbol_table. The former stores global variables and is called the global symbol table; the latter is a pointer to the currently active symbol table, which is usually the global one. However, every time execution enters a PHP function (meaning a function the user wrote in PHP code), Zend creates a local symbol table for that function and points active_symbol_table at it. Zend always accesses variables through active_symbol_table, which is how the scope of local variables is enforced.
However, if a variable declared global is accessed inside a function, Zend handles it specially: it creates, in active_symbol_table, a reference to the variable of the same name in symbol_table, first creating that variable in symbol_table if it does not exist yet.
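A heavily simplified, hypothetical sketch of the `global` handling just described. A fixed array with linear search stands in for the HashTable. A missing global is created with value 0 because this sketch has no null type. The separation the real engine would need for a global already shared by value is omitted.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MAX_SYMS 16

/* Hypothetical, simplified zval and symbol table. */
typedef struct { long lval; unsigned refcount; unsigned char is_ref; } sym_zval;

typedef struct {
    const char *names[MAX_SYMS];   /* names must outlive the table */
    sym_zval *values[MAX_SYMS];
    int count;
} sym_table;

/* Return the address of the value slot for name, or NULL. */
static sym_zval **sym_find(sym_table *t, const char *name) {
    for (int i = 0; i < t->count; i++)
        if (strcmp(t->names[i], name) == 0)
            return &t->values[i];
    return NULL;
}

static sym_zval **sym_add(sym_table *t, const char *name, sym_zval *z) {
    assert(t->count < MAX_SYMS);
    t->names[t->count] = name;
    t->values[t->count] = z;
    return &t->values[t->count++];
}

/* `global $name;` inside a function: make the active (local) table's
 * entry a reference to the global zval, creating the global first if
 * it does not exist yet. */
static void sym_import_global(sym_table *global, sym_table *active, const char *name) {
    sym_zval **g = sym_find(global, name);
    if (!g) {
        sym_zval *z = calloc(1, sizeof *z);
        assert(z != NULL);
        z->refcount = 1;
        g = sym_add(global, name, z);
    }
    (*g)->is_ref = 1;       /* both names are now bound by reference */
    (*g)->refcount++;
    sym_add(active, name, *g);
}
```

Writes through the local entry are then visible in the global table, because both entries point at the same zval with is_ref set.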
1.3 Memory and files
A program's resources generally include memory and files. For ordinary programs these resources are tied to the process: when the process ends, the operating system or the C library automatically reclaims whatever was not explicitly released.
PHP programs are special, however, because they are page-based. Running a page also acquires resources such as memory and files, but when the page finishes, the operating system or C library has no way of knowing that those resources should be reclaimed. For example, suppose PHP is compiled as a module into Apache and Apache runs in prefork or worker mode. The Apache processes or threads are then reused, so memory allocated by a PHP page and never freed stays resident until the process itself dies.
To solve this problem, Zend provides a set of memory allocation APIs that work like the corresponding C functions, except that they allocate from Zend's own memory pool, so the memory can be reclaimed automatically when the page ends. In our modules, memory allocated for a page should come from these APIs rather than from the C routines; otherwise Zend will try to efree our memory at the end of the page, and the result is usually a crash.
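To illustrate the idea only (this is not Zend's actual memory manager), here is a hypothetical per-request allocator: `req_alloc`, `req_free` and `req_shutdown` are illustrative names. Each block carries a header (the "cookie") linking it into a request-wide list, so anything the page forgot to free can be reclaimed at the end of the request.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Header placed in front of every user block. The union with
 * max_align_t keeps the user area suitably aligned. */
typedef union block_header {
    struct {
        union block_header *prev, *next;
        size_t size;            /* bookkeeping stored in the cookie */
    } s;
    max_align_t align;
} block_header;

static block_header *request_blocks = NULL;  /* all live blocks */

static void *req_alloc(size_t size) {
    block_header *h = malloc(sizeof *h + size);
    if (!h)
        return NULL;
    h->s.size = size;
    h->s.prev = NULL;
    h->s.next = request_blocks;
    if (request_blocks)
        request_blocks->s.prev = h;
    request_blocks = h;
    return h + 1;               /* user memory starts after the header */
}

static void req_free(void *p) {
    if (!p)
        return;
    block_header *h = (block_header *)p - 1;
    if (h->s.prev)
        h->s.prev->s.next = h->s.next;
    else
        request_blocks = h->s.next;
    if (h->s.next)
        h->s.next->s.prev = h->s.prev;
    free(h);
}

/* End of page/request: release every block still live.
 * Returns how many leaked blocks were reclaimed. */
static size_t req_shutdown(void) {
    size_t n = 0;
    while (request_blocks) {
        block_header *next = request_blocks->s.next;
        free(request_blocks);
        request_blocks = next;
        n++;
    }
    return n;
}
```

This also shows why mixing allocators crashes. Passing a plain `malloc` pointer to `req_free` would make it read a header that was never written.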
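emalloc() (counterpart of malloc())
efree() (counterpart of free())
estrdup() (counterpart of strdup())
estrndup() (counterpart of strndup())
ecalloc() (counterpart of calloc())
erealloc() (counterpart of realloc())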
In addition, Zend provides a set of VCWD_xxx macros that replace the corresponding file APIs of the C library and the operating system. They support PHP's virtual working directory and should always be used in module code. See "TSRM/tsrm_virtual_cwd.h" in the PHP source for the macro definitions. You may notice that these macros provide no close operation: close acts on an already-open resource and involves no file path, so the C or OS routine can be used directly. Likewise, operations such as read and write use the C or OS routines directly.
