On the evolution of the garbage collection algorithm in PHP5


Preface

PHP is a managed language: when programming in PHP, programmers do not need to allocate and release memory by hand (unless they are writing PHP or Zend extensions in C), which means that PHP itself implements a garbage collection (GC) mechanism. If you visit the official PHP website (php.net) you will see that two branches of PHP5, PHP5.2 and PHP5.3, are currently updated separately, because many projects still run on 5.2 and 5.3 is not fully compatible with it. PHP5.3 makes many improvements over PHP5.2, and the garbage collection algorithm is one of the larger changes. This article discusses the garbage collection mechanisms of PHP5.2 and PHP5.3 in turn, and then looks at how this change affects programmers writing PHP and what they need to watch out for.

Internal representation of PHP variables and associated memory objects

Garbage collection ultimately operates on variables and their associated memory objects, so before discussing PHP's garbage collection mechanism, let us briefly look at how variables and their memory objects are represented inside PHP (that is, in its C source code).

The official PHP documentation groups the types in PHP as follows: the scalar types are boolean, integer, float and string; the compound types are array and object; and resource and NULL are special types that belong to neither group.

Internally, PHP represents all of these types with a single structure called zval, whose name in the PHP source code is "_zval_struct". zval is defined in the file "Zend/zend.h" of the PHP source code; the relevant code is excerpted below.

    typedef union _zvalue_value {
        long lval;                  /* long value */
        double dval;                /* double value */
        struct {
            char *val;
            int len;
        } str;
        HashTable *ht;              /* hash table value */
        zend_object_value obj;
    } zvalue_value;

    struct _zval_struct {
        /* Variable information */
        zvalue_value value;         /* value */
        zend_uint refcount__gc;
        zend_uchar type;            /* active type */
        zend_uchar is_ref__gc;
    };

The union "_zvalue_value" holds the value of any PHP variable; a union is used because a zval can only hold one type of value at a time. Notice that _zvalue_value has only 5 fields while PHP has 8 data types, so how does PHP represent 8 types with 5 fields? This is a clever part of PHP's design: it reduces the number of fields by reusing them. Boolean, integer and resource values (for a resource, only its identifier needs to be stored) all go in the lval field; dval stores floats, str stores strings, ht stores arrays (arrays in PHP are in fact hash tables), and obj stores objects. NULL needs no value field of its own, because the type tag described next is enough to identify it. In this way 5 fields hold the values of 8 types.

Which type the value (_zvalue_value) of a given zval currently represents is determined by the type field of "_zval_struct". _zval_struct is the concrete C implementation of zval, and each zval represents the memory object of one variable. Besides value and type, _zval_struct has two more fields, refcount__gc and is_ref__gc, and their suffix suggests that they are related to garbage collection. Indeed, PHP's garbage collection relies on both of them. refcount__gc records how many variables currently refer to this zval, while is_ref__gc records whether the zval is referenced by reference; this sounds awkward, and it is tied to the "copy-on-write" mechanism of zvals in PHP. Since that topic is not the focus of this article it is not covered in detail here; readers only need to remember what the refcount__gc field does.

Garbage collection algorithm in PHP5.2: reference counting

The memory reclamation algorithm used in PHP5.2 is the well-known reference counting. The idea is intuitive and concise: a counter is allocated for each memory object. When a memory object is created, its counter is initialized to 1 (at that moment exactly one variable refers to it); each time another variable comes to refer to the object, the counter is incremented by 1; each time a variable stops referring to it, the counter is decremented by 1; and when the garbage collection mechanism runs, it destroys every memory object whose counter is 0 and reclaims its memory. In PHP the memory object is the zval, and the counter is refcount__gc.

For example, the following PHP code demonstrates how the PHP5.2 counter works (counter values are obtained by Xdebug):
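
The counting behaviour can also be observed without Xdebug. The sketch below assumes a hypothetical class `Tracked` whose destructor records the moment an instance is actually freed, so the effect of the counter becomes visible. If Xdebug happens to be loaded, `xdebug_debug_zval()` additionally prints the raw counter; the exact numbers it reports differ between PHP versions, so none are quoted here.

```php
<?php
// Hypothetical helper class: records the moment an instance is destroyed.
class Tracked
{
    public static $log = array();
    private $name;

    public function __construct($name)
    {
        $this->name = $name;
    }

    public function __destruct()
    {
        self::$log[] = $this->name . ' destroyed';
    }
}

$a = new Tracked('obj');   // one variable refers to the object
$b = $a;                   // a second variable refers to the same object

if (function_exists('xdebug_debug_zval')) {
    xdebug_debug_zval('a');   // only with Xdebug: prints refcount and is_ref
}

unset($a);                 // one reference gone; the object is still in use
$aliveAfterFirstUnset = (count(Tracked::$log) === 0);

unset($b);                 // last reference gone; the object is freed at once
$freedAfterSecondUnset = (Tracked::$log === array('obj destroyed'));

var_dump($aliveAfterFirstUnset, $freedAfterSecondUnset);
```

Both flags come out true: the object survives the first `unset()` and is destroyed by the second, which is the moment its counter reaches 0.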

 
Reference counting is simple, intuitive and easy to implement, but it has a fatal flaw: it easily leaks memory. Many readers will already have realized that a circular reference defeats reference counting. For example, the following code:

    <?php
    $a = array();
    $a[0] = &$a;    // the first element refers to $a by reference
    unset($a);      // the symbol $a is destroyed

This code first creates the array a and then makes the first element of a refer to a by reference, at which point the refcount of a's zval becomes 2. We then destroy the variable a, so the refcount of the zval that a originally pointed to drops to 1, but we can no longer operate on it, because it now forms a circular self-reference (in the original figure, the part that no longer exists is drawn in gray). Because the refcount of the zval that a used to point to is 1 (it is still referenced by the first element of its own HashTable), the zval is not destroyed by the GC, and this memory is leaked.

Note in particular that PHP stores variable symbols in symbol tables: there is a global symbol table, and each compound value such as an array or object has its own symbol table. In the code above, a and a[0] are therefore two symbols: a is stored in the global symbol table and a[0] in the symbol table of the array itself, and both refer to the same zval (until the symbol a is destroyed). Readers should keep this relationship between symbols and zvals in mind.

This leak may not matter when PHP is used only for dynamic page scripts, because the lifetime of such a script is short and PHP guarantees that all of its resources are released when the script ends. But PHP is no longer used only for dynamic pages. In long-running scenarios, such as automated test scripts or daemon processes, the leaks that accumulate over many loop iterations can become severe. This is not an exaggeration: a company where I interned used daemon processes written in PHP to interact with data storage servers.

Because of this flaw in reference counting, PHP5.3 improved the garbage collection algorithm.

Garbage collection algorithm in PHP5.3: Concurrent Cycle Collection in Reference Counted Systems

PHP5.3's garbage collection algorithm is still based on reference counting, but instead of using the plain count as the only criterion for reclamation, it uses a synchronous collection algorithm that was presented by IBM engineers in the paper "Concurrent Cycle Collection in Reference Counted Systems".

The algorithm is quite complex, as the paper's length of 29 pages suggests, so I do not intend (and am not able) to discuss it in full. Interested readers can read the paper mentioned above, which I strongly recommend.

Here, I can only describe the basic idea of this algorithm in general.

First, PHP allocates a fixed-size "root buffer" that holds a fixed number of zvals, 10,000 by default. To change this number you have to modify the constant GC_ROOT_BUFFER_MAX_ENTRIES in Zend/zend_gc.c in the source code and then recompile.

From the above we know that if a zval is referenced at all, it is referenced either by a symbol in the global symbol table or by a symbol inside another zval that represents a compound type. Some zvals are therefore possible roots. How PHP finds these possible roots is a complex question that is not discussed here; in short, PHP has a way to find the possible root zvals and put them into the root buffer.

When the root buffer is full, PHP performs garbage collection using the following algorithm:

1. For each root zval in the root buffer, traverse depth-first all the zvals that can be reached from it and decrement the refcount of each by 1. To avoid decrementing the same zval more than once (different roots can reach the same zval), each zval is marked as "decremented" once its refcount has been reduced.

2. Traverse depth-first from each root zval in the buffer again. If the refcount of a zval is not 0, add 1 back to restore it, together with the zvals reachable from it; otherwise leave it at 0.

3. Remove all roots from the root buffer (the zvals are only removed from the buffer, not destroyed), then destroy every zval whose refcount is 0 and reclaim its memory.
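
The three steps can be sketched on a toy object graph. This is a simplified model under stated assumptions, not Zend's implementation: the heap layout, the colour names and the functions `make_heap()`, `mark_grey()`, `scan()`, `scan_black()` and `collect_cycles()` are all hypothetical. Following the trial-deletion idea of the paper, each traversed reference decrements the refcount of its target; in this toy heap every node has exactly one incoming reference, so the result matches the per-zval description above.

```php
<?php
// Toy heap: each node has a refcount, outgoing references and a colour.
function make_heap()
{
    return array(
        // A and B only reference each other: an unreachable cycle.
        'A' => array('rc' => 1, 'children' => array('B'), 'color' => 'black'),
        'B' => array('rc' => 1, 'children' => array('A'), 'color' => 'black'),
        // C is still held by a live variable and references D.
        'C' => array('rc' => 1, 'children' => array('D'), 'color' => 'black'),
        'D' => array('rc' => 1, 'children' => array(), 'color' => 'black'),
    );
}

// Step 1: subtract the references that come from inside the traversed graph.
function mark_grey(array &$heap, $id)
{
    if ($heap[$id]['color'] === 'grey') {
        return; // already visited, do not decrement again
    }
    $heap[$id]['color'] = 'grey';
    foreach ($heap[$id]['children'] as $child) {
        $heap[$child]['rc']--;
        mark_grey($heap, $child);
    }
}

// Step 2: nodes left at 0 are garbage (white); the others are restored.
function scan(array &$heap, $id)
{
    if ($heap[$id]['color'] !== 'grey') {
        return;
    }
    if ($heap[$id]['rc'] > 0) {
        scan_black($heap, $id); // externally referenced: undo step 1
        return;
    }
    $heap[$id]['color'] = 'white';
    foreach ($heap[$id]['children'] as $child) {
        scan($heap, $child);
    }
}

// Restore the counts of everything reachable from a live node.
function scan_black(array &$heap, $id)
{
    $heap[$id]['color'] = 'black';
    foreach ($heap[$id]['children'] as $child) {
        $heap[$child]['rc']++;
        if ($heap[$child]['color'] !== 'black') {
            scan_black($heap, $child);
        }
    }
}

// Step 3: free the white nodes; returns array(remaining heap, freed ids).
function collect_cycles(array $heap, array $roots)
{
    foreach ($roots as $id) {
        mark_grey($heap, $id);
    }
    foreach ($roots as $id) {
        scan($heap, $id);
    }
    $collected = array();
    foreach ($heap as $id => $node) {
        if ($node['color'] === 'white') {
            $collected[] = $id;
            unset($heap[$id]);
        }
    }
    return array($heap, $collected);
}

list($remaining, $collected) = collect_cycles(make_heap(), array('A', 'C'));
echo 'collected: ', implode(', ', $collected), "\n";             // collected: A, B
echo 'remaining: ', implode(', ', array_keys($remaining)), "\n"; // remaining: C, D
```

The cycle A, B ends up with a count of 0 and is freed, while C keeps its external reference and D's count is restored through C.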

It does not matter if you did not follow every detail; just remember that the PHP5.3 garbage collection algorithm has the following characteristics:

1. A collection cycle is not started every time a refcount is decremented; garbage collection begins only once the root buffer is full.

2. It solves the circular reference problem.

3. It always keeps the amount of leaked memory below a threshold.
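
The first and third characteristics can be observed with a small experiment. The sketch below assumes a hypothetical class `SelfRef` whose instances point at themselves and count their own destruction; it creates more unreachable cycles than the default root buffer holds and never asks for a collection explicitly. The exact number destroyed depends on the PHP version, so only the fact that some were destroyed is checked.

```php
<?php
// Hypothetical class: every instance points at itself, forming a cycle.
class SelfRef
{
    public static $destroyed = 0;
    public $self;

    public function __construct()
    {
        $this->self = $this;
    }

    public function __destruct()
    {
        self::$destroyed++;
    }
}

gc_enable();   // make sure the cycle collector is switched on

for ($i = 0; $i < 25000; $i++) {
    $tmp = new SelfRef();   // the previous instance becomes unreachable here
}
unset($tmp);

// No collection was requested explicitly, yet some cycles are already gone.
$destroyedAutomatically = SelfRef::$destroyed;
echo $destroyedAutomatically > 0
    ? "some cycles were collected automatically\n"
    : "nothing collected yet\n";
```

With reference counting alone none of these destructors would run before the script ends, because every instance still holds a reference to itself.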

Performance comparison of PHP5.2 and PHP5.3 garbage collection algorithms

Because my resources are currently limited, I will not design new experiments but refer directly to those in the PHP manual. For the comparison of the two, see the relevant chapter of the manual: http://www.php.net/manual/en/features.gc.performance-considerations.php.

First, the memory leak test. The experiment code and the chart of its results below are taken directly from the PHP manual:

 
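    <?php
    class Foo
    {
        public $var = '3.14159265359';
        public $self;
    }

    $baseMemory = memory_get_usage();

    for ($i = 0; $i <= 100000; $i++)
    {
        $a = new Foo;
        $a->self = $a;
        if ($i % 500 === 0)
        {
            echo sprintf('%8d: ', $i), memory_get_usage() - $baseMemory, "\n";
        }
    }
    ?>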

Figure: PHP memory leak test

You can see that, in a scenario where leaked memory can accumulate, PHP5.2 leaks more and more memory as the script runs, while PHP5.3 always keeps the leaked memory below a threshold (which is related to the root buffer size).

In addition, there is a comparison of performance:

 
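    <?php
    class Foo
    {
        public $var = '3.14159265359';
        public $self;
    }

    for ($i = 0; $i <= 1000000; $i++)
    {
        $a = new Foo;
        $a->self = $a;
    }

    echo memory_get_peak_usage(), "\n";
    ?>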

This script runs 1 million loop iterations, which makes the running time long enough to compare.

The script is then run in CLI mode, once with memory reclamation turned off and once with it turned on:

    time php -dzend.enable_gc=0 -dmemory_limit=-1 -n example2.php
    # and
    time php -dzend.enable_gc=1 -dmemory_limit=-1 -n example2.php

On my machine the run times were 6.4s and 7.2s respectively, so the PHP5.3 garbage collection mechanism is somewhat slower, but the impact is small.

PHP configuration related to garbage collection algorithm

You can turn the PHP garbage collection mechanism on or off by changing zend.enable_gc in php.ini, or at run time by calling gc_enable() or gc_disable(). Even when the garbage collection mechanism is turned off in PHP5.3, PHP still records possible roots in the root buffer; it just does not run garbage collection automatically when the root buffer is full. You can also force memory reclamation at any time by calling gc_collect_cycles() manually.
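
A minimal sketch of these run-time controls follows. It assumes a hypothetical class `CycleNode` that counts destructor calls, and creates fewer cycles than the root buffer holds so that the only collection is the one requested by hand. The value returned by gc_collect_cycles() is version-dependent, so it is stored but not interpreted.

```php
<?php
// Hypothetical class used only to observe when cyclic garbage is freed.
class CycleNode
{
    public static $destroyed = 0;
    public $self;

    public function __destruct()
    {
        self::$destroyed++;
    }
}

gc_enable();                           // start from a known state
gc_disable();                          // run-time equivalent of zend.enable_gc=0
$enabledAfterDisable = gc_enabled();   // false

for ($i = 0; $i < 100; $i++) {
    $n = new CycleNode();
    $n->self = $n;                     // self-reference: the count never reaches 0
}
unset($n);
$destroyedBefore = CycleNode::$destroyed;   // the cycles are still in memory

$collected = gc_collect_cycles();           // force a collection run by hand
$destroyedAfter = CycleNode::$destroyed;    // all 100 destructors have now run

gc_enable();                                // switch automatic collection back on
printf("before: %d, after: %d\n", $destroyedBefore, $destroyedAfter);
```

This pattern can be useful in a long-running script that wants to choose its own collection points, for example between work items rather than in the middle of one.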
