Garbage collection is a form of dynamic storage management that automatically releases allocated memory blocks the program no longer needs. The process of reclaiming memory automatically is called garbage collection. It lets programmers spend less attention on memory allocation and more on business logic. Garbage collection is a common feature of many popular languages, such as Python, PHP, Eiffel, C#, Ruby, and so on. Although it is popular today, it is not a new idea: the Lisp system developed at MIT in the 1960s already had it. However, because the technology of the time was immature, garbage collection remained a technique that looked attractive but saw little practical use. It was not until the emergence of Java in the 1990s that garbage collection became widely used.
PHP also implements dynamic memory management at the language level, as described in detail in the previous chapter. Dynamic memory management saves developers from tedious manual memory handling. In addition, PHP provides a language-level garbage collection mechanism, so programmers do not have to worry much about memory allocation.
Before PHP 5.3, PHP only had simple garbage collection based on reference counting: when the reference count of a variable drops to 0, PHP destroys the variable in memory. Strictly speaking, what is freed this way can hardly be called garbage. In addition, PHP releases everything a process or thread has allocated at the end of its lifecycle, so early PHP did not need to worry much about memory leaks. However, as PHP developed, the number of PHP developers grew, and the range of workloads carried by PHP expanded, a more complete garbage collection mechanism was introduced in PHP 5.3. The new mechanism solves the memory leak that reference counting cannot handle: circular references. The garbage collection mechanism in PHP 5.3 uses the synchronous algorithm from the paper "Concurrent Cycle Collection in Reference Counted Systems". We will not repeat the description of the algorithm here; the official PHP documentation illustrates it in the section "Collecting Cycles".
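To make the circular reference problem concrete, here is a minimal sketch in C of plain reference counting. The node type, the helpers node_new, node_set_child and node_release, and the live_nodes counter are all hypothetical names invented for this illustration; they are not part of PHP's zval API.

```c
#include <stdlib.h>

/* Hypothetical toy value: a refcount plus one outgoing reference. */
typedef struct node {
    int refcount;
    struct node *child;
} node;

static int live_nodes = 0;   /* number of nodes not yet freed */

static node *node_new(void) {
    node *n = malloc(sizeof *n);
    if (n == NULL) abort();
    n->refcount = 1;          /* the creator holds one reference */
    n->child = NULL;
    live_nodes++;
    return n;
}

/* Drop one reference; free the node (and release its child) at zero. */
static void node_release(node *n) {
    if (n == NULL) return;
    if (--n->refcount == 0) {
        node_release(n->child);
        free(n);
        live_nodes--;
    }
}

/* Point parent->child at child, taking a new reference to child. */
static void node_set_child(node *parent, node *child) {
    child->refcount++;
    parent->child = child;
}

/* Chain a -> b, then drop both external references: everything is freed.
 * Returns the number of nodes still alive. */
static int leak_without_cycle(void) {
    node *a = node_new(), *b = node_new();
    node_set_child(a, b);
    node_release(b);
    node_release(a);
    return live_nodes;
}

/* Cycle a -> b -> a, then drop both external references: neither count
 * reaches zero. Returns the number of nodes leaked by this call. */
static int leak_with_cycle(void) {
    int before = live_nodes;
    node *a = node_new(), *b = node_new();
    node_set_child(a, b);
    node_set_child(b, a);
    node_release(a);
    node_release(b);
    return live_nodes - before;
}
```

In the chain case every count reaches zero and both nodes are freed. In the cycle case each node still holds the other, so both counts stay at 1 and nothing is ever released, which is the kind of leak the PHP 5.3 collector is meant to find.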
As mentioned above, PHP's main memory management method is reference counting. The garbage collection mechanism was introduced to break circular references that reference counting cannot release, and so to prevent memory leaks. It is built on top of PHP's dynamic memory management. PHP 5.3 introduced the mechanism together with some changes to the basic structure used to store variables, as shown below:
The code is as follows:
struct _zval_struct {
    /* Variable information */
    zvalue_value value;     /* value */
    zend_uint refcount__gc;
    zend_uchar type;        /* active type */
    zend_uchar is_ref__gc;
};
Compared with versions earlier than PHP 5.3, the reference counting field refcount and the reference flag field is_ref both gain the suffix __gc for the new garbage collection mechanism. The heavy use of macros is a distinctive feature of the PHP source code style. These macros act as an interface layer that hides some of the underlying implementation, and ALLOC_ZVAL is one example. Before PHP 5.3, this macro directly called PHP's memory management allocation function emalloc, and the size of the allocated memory was determined by the variable type. After the garbage collection mechanism was introduced, the ALLOC_ZVAL macro uses the new garbage collection unit structure directly. The allocated size is always the same: the size of the zval_gc_info struct. After the memory is allocated, the macro initializes the garbage collection information of this struct, as the following code shows:
The code is as follows:
/* The following macroses override macroses from zend_alloc.h */
#undef  ALLOC_ZVAL
#define ALLOC_ZVAL(z)                                   \
    do {                                                \
        (z) = (zval*)emalloc(sizeof(zval_gc_info));     \
        GC_ZVAL_INIT(z);                                \
    } while (0)
The zend_gc.h file is included by zend.h (#include "zend_gc.h"), so that its definitions replace ALLOC_ZVAL and the other macros from zend_alloc.h, which zend.h also includes. In the new macro, the key changes are the size of the allocated memory and what is stored in it: garbage collection data is added to what used to be a plain memory allocation. All of this data is contained in the zval_gc_info structure:
The code is as follows:
typedef struct _zval_gc_info {
    zval z;
    union {
        gc_root_buffer       *buffered;
        struct _zval_gc_info *next;
    } u;
} zval_gc_info;
A zval_gc_info structure is allocated for every variable stored in a zval container. Because its first member is a zval, the structure starts at the same address as the zval it contains, so a zval_gc_info pointer can be cast to a zval pointer and used as a zval. Behind the zval member is a union, u, which holds either the buffered field, a pointer to a gc_root_buffer, or the next field, a pointer to a zval_gc_info. The first refers to the root node that the garbage collection mechanism has buffered for this zval; the second links to the next node in a zval_gc_info list. A node tracked by the garbage collection mechanism is therefore either a buffered root or a list node, and the union reflects this. ALLOC_ZVAL calls GC_ZVAL_INIT to initialize the zval_gc_info that replaces the zval. It sets the buffered field of member u to NULL; this field has a value only while the zval is in the garbage collection buffer and is NULL otherwise. All variables in PHP exist as zvals, so replacing zval with zval_gc_info integrates the garbage collection mechanism into the existing system.
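The trick of allocating a larger structure and handing out a pointer to its first member can be sketched in C as follows. The names toy_zval, toy_gc_info, toy_alloc_zval and toy_gc_info_of are hypothetical stand-ins with simplified fields, and plain malloc is used in place of emalloc; this is only an illustration of the layout, not the Zend implementation.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for zval and zval_gc_info (fields simplified). */
typedef struct toy_zval {
    long value;
    unsigned int refcount;
} toy_zval;

typedef struct toy_gc_info {
    toy_zval z;                       /* must be the first member */
    union {
        void *buffered;               /* root-buffer slot, or NULL */
        struct toy_gc_info *next;     /* link in a list of values to free */
    } u;
} toy_gc_info;

/* Allocate the larger struct but hand out a pointer to the embedded value. */
static toy_zval *toy_alloc_zval(void) {
    toy_gc_info *info = malloc(sizeof *info);
    if (info == NULL) abort();
    info->z.value = 0;
    info->z.refcount = 1;
    info->u.buffered = NULL;          /* not in the root buffer yet */
    return (toy_zval *)info;          /* same address as &info->z */
}

/* Recover the GC bookkeeping from a plain value pointer. */
static toy_gc_info *toy_gc_info_of(toy_zval *zv) {
    return (toy_gc_info *)zv;
}
```

Code that only knows about toy_zval keeps working unchanged, while the collector can cast the same pointer back to reach the extra union. C guarantees that a pointer to a structure and a pointer to its first member convert to each other, which is what makes the cast safe.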
PHP's garbage collection mechanism is enabled by default in PHP 5.3, but it can be disabled through the configuration file. The corresponding configuration directive is zend.enable_gc. This directive is not present in php.ini by default; to disable the mechanism, add zend.enable_gc = 0 or zend.enable_gc = off to php.ini. Besides editing php.ini, you can call gc_enable() and gc_disable() to turn the garbage collection mechanism on and off. Calling these functions has the same effect as changing the configuration directive. In addition to these two functions, PHP provides gc_collect_cycles(), which forces a cycle collection even when the root buffer is not yet full. Whether the mechanism is enabled is reflected in several operations and fields in the PHP source code. The zend.c file contains the following code:
The code is as follows:
static ZEND_INI_MH(OnUpdateGCEnabled) /* {{{ */
{
    OnUpdateBool(entry, new_value, new_value_length, mh_arg1, mh_arg2, mh_arg3, stage TSRMLS_CC);
    if (GC_G(gc_enabled)) {
        gc_init(TSRMLS_C);
    }
    return SUCCESS;
}
/* }}} */
ZEND_INI_BEGIN()
    ZEND_INI_ENTRY("error_reporting", NULL, ZEND_INI_ALL, OnUpdateErrorReporting)
    STD_ZEND_INI_BOOLEAN("zend.enable_gc", "1", ZEND_INI_ALL, OnUpdateGCEnabled, gc_enabled, zend_gc_globals, gc_globals)
#ifdef ZEND_MULTIBYTE
    STD_ZEND_INI_BOOLEAN("detect_unicode", "1", ZEND_INI_ALL, OnUpdateBool, detect_unicode, zend_compiler_globals, compiler_globals)
#endif
ZEND_INI_END()
The handler for zend.enable_gc is ZEND_INI_MH(OnUpdateGCEnabled). If the garbage collection mechanism is enabled, that is, GC_G(gc_enabled) is true, the gc_init function is called to initialize it. The gc_init function is at line 121 of Zend/zend_gc.c. It checks whether the garbage collection mechanism is enabled and, if so, initializes the whole mechanism: it calls malloc directly to allocate space for 10000 gc_root_buffer entries for the buffer list. The value 10000 is hard-coded in the source as the macro GC_ROOT_BUFFER_MAX_ENTRIES; to change it, you need to modify the source code and recompile PHP. After preallocating the memory, gc_init calls gc_reset to reset the global variables used by the mechanism. For example, it sets the run count (gc_runs) and the collected garbage count (collected) to 0, and makes the previous and next pointers of the doubly linked list's head node point to the head node itself. The global variables used by the garbage collection mechanism, including several others that are used frequently, are described below:
The code is as follows:
typedef struct _zend_gc_globals {
    zend_bool gc_enabled;          /* whether the garbage collection mechanism is enabled */
    zend_bool gc_active;           /* whether a collection is in progress */
    gc_root_buffer *buf;           /* preallocated array of buffers (10000 entries by default) */
    gc_root_buffer roots;          /* list of possible roots of cycles */
    gc_root_buffer *unused;        /* list of unused buffers */
    gc_root_buffer *first_unused;  /* pointer to the first unused buffer */
    gc_root_buffer *last_unused;   /* pointer to the last unused buffer, marking the end of the array */
    zval_gc_info *zval_to_free;    /* temporary list of zvals to free */
    zval_gc_info *free_list;       /* temporary variable: head of the list being freed */
    zval_gc_info *next_to_free;    /* temporary variable: next zval to free */
    zend_uint gc_runs;             /* number of gc runs */
    zend_uint collected;           /* number of garbage items collected */
    /* omitted... */
} zend_gc_globals;
When we use unset to clear the memory occupied by a variable (which may only decrease its reference count by one), the entry for the variable name is deleted from the current symbol table, which is a hash table. Once that is done, a destructor is called for the item deleted from the symbol table: temporary variables call zval_dtor, and ordinary variables call zval_ptr_dtor.
Of course, unset cannot be found in PHP's function list, because it is a language construct. Its intermediate code (opcode) is ZEND_UNSET, and the implementation can be found in the Zend/zend_vm_execute.h file.
zval_ptr_dtor is not a function but a macro that looks like one. In the Zend/zend_variables.h file, this macro points to the function _zval_ptr_dtor. The function is at line 424 of Zend/zend_execute_API.c:
The code is as follows:
ZEND_API void _zval_ptr_dtor(zval **zval_ptr ZEND_FILE_LINE_DC) /* {{{ */
{
#if DEBUG_ZEND>=2
    printf("Reducing refcount for %x (%x): %d->%d\n", *zval_ptr, zval_ptr, Z_REFCOUNT_PP(zval_ptr), Z_REFCOUNT_PP(zval_ptr) - 1);
#endif
    Z_DELREF_PP(zval_ptr);
    if (Z_REFCOUNT_PP(zval_ptr) == 0) {
        TSRMLS_FETCH();
        if (*zval_ptr != &EG(uninitialized_zval)) {
            GC_REMOVE_ZVAL_FROM_BUFFER(*zval_ptr);
            zval_dtor(*zval_ptr);
            efree_rel(*zval_ptr);
        }
    } else {
        TSRMLS_FETCH();
        if (Z_REFCOUNT_PP(zval_ptr) == 1) {
            Z_UNSET_ISREF_PP(zval_ptr);
        }
        GC_ZVAL_CHECK_POSSIBLE_ROOT(*zval_ptr);
    }
}
/* }}} */
The code shows clearly how the zval destructor works. It handles the reference count in two cases:
If the reference count is 1 before the call, that is, 0 after the decrement, the variable is destroyed directly. If the variable is in the garbage collection buffer, it is removed from the buffer first.
If the reference count is greater than 1 before the call, that is, still greater than 0 after the decrement, the variable is submitted to the garbage collection buffer as a possible root. If exactly one reference remains, the variable's is_ref flag is cleared.
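The two branches can be sketched on a toy value in C. The toy_val type and toy_ptr_dtor function are hypothetical; instead of freeing memory or touching a real root buffer, the sketch only records in flags which decision was taken.

```c
#include <stdbool.h>

/* Hypothetical toy value; the flags record what the sketch decided. */
typedef struct toy_val {
    unsigned int refcount;
    bool is_ref;
    bool destroyed;       /* set instead of actually freeing memory */
    bool possible_root;   /* set instead of inserting into a root buffer */
} toy_val;

/* Mirror the two branches of the destructor on a toy value. */
static void toy_ptr_dtor(toy_val *v) {
    v->refcount--;
    if (v->refcount == 0) {
        /* No references left: drop it from the buffer and destroy it. */
        v->possible_root = false;
        v->destroyed = true;
    } else {
        /* A single remaining holder no longer needs the reference flag. */
        if (v->refcount == 1) {
            v->is_ref = false;
        }
        /* Still referenced: it may be part of a cycle, so remember it. */
        v->possible_root = true;
    }
}
```

A value that survives the decrement is the interesting case: it cannot be freed yet, but because something still points to it, it might only be kept alive by a cycle, which is why it is recorded as a possible root.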
The GC_ZVAL_CHECK_POSSIBLE_ROOT macro adds a variable to the garbage collection buffer. The corresponding function, gc_zval_check_possible_root, only handles arrays and objects: for variables of these types it calls the gc_zval_possible_root function.
The code is as follows:
ZEND_API void gc_zval_possible_root(zval *zv TSRMLS_DC)
{
    if (UNEXPECTED(GC_G(free_list) != NULL &&
                   GC_ZVAL_ADDRESS(zv) != NULL &&
                   GC_ZVAL_GET_COLOR(zv) == GC_BLACK) &&
                   (GC_ZVAL_ADDRESS(zv) < GC_G(buf) ||
                    GC_ZVAL_ADDRESS(zv) >= GC_G(last_unused))) {
        /* The given zval is a garbage that is going to be deleted by
         * currently running GC */
        return;
    }
    if (zv->type == IS_OBJECT) {
        GC_ZOBJ_CHECK_POSSIBLE_ROOT(zv);
        return;
    }
    GC_BENCH_INC(zval_possible_root);
    if (GC_ZVAL_GET_COLOR(zv) != GC_PURPLE) {
        GC_ZVAL_SET_PURPLE(zv);
        if (!GC_ZVAL_ADDRESS(zv)) {
            gc_root_buffer *newRoot = GC_G(unused);
            if (newRoot) {
                GC_G(unused) = newRoot->prev;
            } else if (GC_G(first_unused) != GC_G(last_unused)) {
                newRoot = GC_G(first_unused);
                GC_G(first_unused)++;
            } else {
                if (!GC_G(gc_enabled)) {
                    GC_ZVAL_SET_BLACK(zv);
                    return;
                }
                zv->refcount__gc++;
                gc_collect_cycles(TSRMLS_C);
                zv->refcount__gc--;
                newRoot = GC_G(unused);
                if (!newRoot) {
                    return;
                }
                GC_ZVAL_SET_PURPLE(zv);
                GC_G(unused) = newRoot->prev;
            }
            newRoot->next = GC_G(roots).next;
            newRoot->prev = &GC_G(roots);
            GC_G(roots).next->prev = newRoot;
            GC_G(roots).next = newRoot;
            GC_ZVAL_SET_ADDRESS(zv, newRoot);
            newRoot->handle = 0;
            newRoot->u.pz = zv;
            GC_BENCH_INC(zval_buffered);
            GC_BENCH_INC(root_buf_length);
            GC_BENCH_PEAK(root_buf_peak, root_buf_length);
        }
    }
}
As mentioned above, the gc_zval_check_possible_root function only passes arrays and objects on for garbage collection. Inside gc_zval_possible_root, however, variables of the object type are handed to the GC_ZOBJ_CHECK_POSSIBLE_ROOT macro. For the other variable types that can be garbage collected, the process is as follows:
Check whether the zval is garbage that the currently running collection is already going to free. If so, return immediately, which avoids redundant work.
If the zval is an object, hand it to GC_ZOBJ_CHECK_POSSIBLE_ROOT and return without performing the remaining steps.
Check whether the node is already marked purple. If it is, do not add it to the node buffer again; this ensures that a node is added to the buffer only once.
Mark the node purple, indicating that it has been added to the buffer and does not need to be added again.
Find a slot for the new node. If the buffer is full, run garbage collection.
Add the new node to the doubly linked list of the buffer.
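The slot selection and list insertion in the last two steps can be sketched in C as follows. The toy_gc and toy_root types and the functions toy_gc_reset, toy_gc_add_root and toy_gc_remove_root are hypothetical simplifications: the buffer holds only four slots, and when it is full the sketch returns NULL where the listing above would run a collection. The removal helper is an assumption added so that slot reuse can be demonstrated.

```c
#include <stddef.h>

#define TOY_MAX_ROOTS 4   /* hypothetical; the text cites 10000 for PHP */

typedef struct toy_root {
    struct toy_root *prev;
    struct toy_root *next;
    void *payload;
} toy_root;

typedef struct toy_gc {
    toy_root buf[TOY_MAX_ROOTS];  /* preallocated slots */
    toy_root roots;               /* sentinel of the circular list */
    toy_root *unused;             /* recycled slots, chained through prev */
    toy_root *first_unused;       /* next never-used slot */
    toy_root *last_unused;        /* one past the final slot */
} toy_gc;

/* Empty list: the sentinel points at itself in both directions. */
static void toy_gc_reset(toy_gc *gc) {
    gc->roots.prev = gc->roots.next = &gc->roots;
    gc->unused = NULL;
    gc->first_unused = gc->buf;
    gc->last_unused = gc->buf + TOY_MAX_ROOTS;
}

/* Take a slot and link it right after the sentinel; NULL when full. */
static toy_root *toy_gc_add_root(toy_gc *gc, void *payload) {
    toy_root *slot = gc->unused;
    if (slot != NULL) {
        gc->unused = slot->prev;              /* reuse a recycled slot */
    } else if (gc->first_unused != gc->last_unused) {
        slot = gc->first_unused++;            /* take a fresh slot */
    } else {
        return NULL;                          /* full: a collection would run here */
    }
    slot->next = gc->roots.next;
    slot->prev = &gc->roots;
    gc->roots.next->prev = slot;
    gc->roots.next = slot;
    slot->payload = payload;
    return slot;
}

/* Unlink a slot and push it onto the recycled list. */
static void toy_gc_remove_root(toy_gc *gc, toy_root *slot) {
    slot->prev->next = slot->next;
    slot->next->prev = slot->prev;
    slot->prev = gc->unused;
    gc->unused = slot;
}
```

Using a sentinel node that points to itself when the list is empty means insertion and removal need no special case for an empty list, and chaining recycled slots through prev avoids any extra allocation.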
When the buffer is full, gc_zval_possible_root calls the gc_collect_cycles function to perform garbage collection.
The most critical steps are the three calls that gc_collect_cycles makes in order: gc_mark_roots, gc_scan_roots, and gc_collect_roots.
The first call is step B of the algorithm in the official documentation. The algorithm runs a depth-first search from every possible root and decrements by 1 the reference count of each variable container it finds. To make sure it does not decrement the same variable container twice, it marks each container it has already decremented as gray.
The second call is step C of the algorithm. The algorithm runs a depth-first search from each root node again and checks the reference count of each variable container. If the reference count is 0, the variable container is marked white. If the reference count is greater than 0, the algorithm reverts the decrement with a depth-first search from that point on (that is, it adds 1 back to each reference count) and marks those containers black again.
The third call, at line 630, is the last step of the algorithm, step D. The algorithm walks the root buffer, removes the variable container roots (zval roots) from it, and checks which variable containers were marked white in the previous step. Every variable container marked white is freed. In [gc_collect_cycles() -> gc_collect_roots() -> zval_collect_white()], we can see that the nodes marked white are added to the global zval_to_free list. This list is used in subsequent operations.
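The three steps can be sketched in C on a toy object graph. This is a simplified reading of the synchronous algorithm, not the code in zend_gc.c: the obj type and the functions mark_grey, scan, scan_black, collect_white and collect_cycles are hypothetical, each object has at most two outgoing references, and garbage is only flagged in a freed field instead of being released.

```c
#include <stddef.h>

enum color { BLACK, WHITE, GREY, PURPLE };

#define MAX_KIDS 2

typedef struct obj {
    int refcount;
    enum color color;
    int freed;                    /* set instead of calling free() */
    struct obj *kids[MAX_KIDS];
} obj;

/* Step B: subtract the references that come from inside the subgraph. */
static void mark_grey(obj *o) {
    if (o->color == GREY) return;
    o->color = GREY;
    for (int i = 0; i < MAX_KIDS; i++) {
        if (o->kids[i]) {
            o->kids[i]->refcount--;
            mark_grey(o->kids[i]);
        }
    }
}

/* Undo step B for a subgraph that turned out to be externally referenced. */
static void scan_black(obj *o) {
    o->color = BLACK;
    for (int i = 0; i < MAX_KIDS; i++) {
        if (o->kids[i]) {
            o->kids[i]->refcount++;
            if (o->kids[i]->color != BLACK) scan_black(o->kids[i]);
        }
    }
}

/* Step C: grey nodes with no remaining references become white. */
static void scan(obj *o) {
    if (o->color != GREY) return;
    if (o->refcount > 0) {
        scan_black(o);
    } else {
        o->color = WHITE;
        for (int i = 0; i < MAX_KIDS; i++)
            if (o->kids[i]) scan(o->kids[i]);
    }
}

/* Step D: everything still white is garbage. */
static void collect_white(obj *o) {
    if (o->color != WHITE) return;
    o->color = BLACK;
    o->freed = 1;
    for (int i = 0; i < MAX_KIDS; i++)
        if (o->kids[i]) collect_white(o->kids[i]);
}

/* Run the three passes over every possible root, one pass at a time. */
static void collect_cycles(obj **roots, size_t n) {
    for (size_t i = 0; i < n; i++) mark_grey(roots[i]);
    for (size_t i = 0; i < n; i++) scan(roots[i]);
    for (size_t i = 0; i < n; i++) collect_white(roots[i]);
}
```

For two objects that reference only each other, the first pass brings both counts to 0, so both turn white and are flagged as garbage. If one of them is also referenced from outside the cycle, its count stays above 0 after the first pass, and the second pass restores the counts of everything reachable from it and marks it black again.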
During execution, PHP's garbage collection mechanism marks nodes with four colors:
GC_WHITE (white): garbage
GC_PURPLE (purple): already placed in the buffer
GC_GREY (gray): the refcount has been decremented by one
GC_BLACK (black): the default color, normal
The color definitions and the macros that operate on them are as follows:
The code is as follows:
#define GC_COLOR  0x03
#define GC_BLACK  0x00
#define GC_WHITE  0x01
#define GC_GREY   0x02
#define GC_PURPLE 0x03
#define GC_ADDRESS(v) \
    ((gc_root_buffer*)(((zend_uintptr_t)(v)) & ~GC_COLOR))
#define GC_SET_ADDRESS(v, a) \
    (v) = ((gc_root_buffer*)((((zend_uintptr_t)(v)) & GC_COLOR) | ((zend_uintptr_t)(a))))
#define GC_GET_COLOR(v) \
    (((zend_uintptr_t)(v)) & GC_COLOR)
#define GC_SET_COLOR(v, c) \
    (v) = ((gc_root_buffer*)((((zend_uintptr_t)(v)) & ~GC_COLOR) | (c)))
#define GC_SET_BLACK(v) \
    (v) = ((gc_root_buffer*)(((zend_uintptr_t)(v)) & ~GC_COLOR))
#define GC_SET_PURPLE(v) \
    (v) = ((gc_root_buffer*)(((zend_uintptr_t)(v)) | GC_PURPLE))
Bit-based flags like these are used frequently in the PHP source code, for example in memory management, because they are efficient and save space. For fields in a database design, however, this approach is usually not advisable; a more intuitive and readable representation is preferable.
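The idea behind these macros is that a sufficiently aligned pointer always has its two lowest bits clear, so those bits can carry the color. A minimal sketch in C, written with functions instead of macros; the names tag_address, tag_color, tag_with_color and the TOY_ constants are hypothetical, and the sketch assumes the tagged pointer refers to storage aligned to at least 4 bytes.

```c
#include <stdint.h>

#define TOY_COLOR_MASK ((uintptr_t)0x03)

enum { TOY_BLACK = 0, TOY_WHITE = 1, TOY_GREY = 2, TOY_PURPLE = 3 };

/* Extract the address part by clearing the two colour bits. */
static void *tag_address(void *tagged) {
    return (void *)((uintptr_t)tagged & ~TOY_COLOR_MASK);
}

/* Extract the colour stored in the two lowest bits. */
static unsigned tag_color(void *tagged) {
    return (unsigned)((uintptr_t)tagged & TOY_COLOR_MASK);
}

/* Keep the address part and replace the colour bits. */
static void *tag_with_color(void *tagged, unsigned color) {
    return (void *)(((uintptr_t)tagged & ~TOY_COLOR_MASK) |
                    (color & TOY_COLOR_MASK));
}
```

Because black is 0, a black pointer is identical to the untagged address. A tagged pointer must always be passed through tag_address before it is dereferenced.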