Evolution of the Garbage Collection Algorithm in PHP5

PHP is a managed language: in PHP programming, programmers do not need to allocate and release memory by hand (except when writing PHP or Zend extensions in C), which means that PHP itself implements a garbage collection mechanism. On the official PHP website (php.net), the two PHP5 branches, PHP5.2 and PHP5.3, are currently updated separately, because many projects still run on 5.2 and 5.3 is not fully compatible with it. PHP5.3 makes many improvements over PHP5.2, and the garbage collection algorithm is one of the larger changes. This article discusses the garbage collection mechanisms of PHP5.2 and PHP5.3 in turn, and then looks at how this change affects PHP programming and what to watch out for.

Internal representation of PHP variables and associated memory objects

Garbage collection is essentially an operation on variables and the memory objects associated with them. Before discussing PHP's garbage collection mechanism, we therefore briefly introduce how variables and their memory objects are represented internally in PHP (that is, in its C source code).

In the official PHP documentation, variable types fall into two groups: scalar types and complex types. Scalar types include boolean, integer, floating point, and string; complex types include arrays, objects, and resources. NULL is special: it does not belong to either group and forms a type of its own.

All of these types are represented internally by a single structure, zval, which is named "_zval_struct" in the PHP source code. It is defined in the file "Zend/zend.h". The following is an excerpt of the relevant code.

typedef union _zvalue_value {
    long lval;                  /* long value */
    double dval;                /* double value */
    struct {
        char *val;
        int len;
    } str;
    HashTable *ht;              /* hash table value */
    zend_object_value obj;
} zvalue_value;

struct _zval_struct {
    /* Variable information */
    zvalue_value value;         /* value */
    zend_uint refcount__gc;
    zend_uchar type;            /* active type */
    zend_uchar is_ref__gc;
};

The union "_zvalue_value" represents the value of every variable in PHP. A union is used because a zval holds a value of only one type at a time. Note that _zvalue_value has only five fields, while PHP has eight data types when NULL is counted. How does PHP represent eight types with five fields? It does so by reusing fields. Boolean, integer, and resource values (for a resource, only the resource identifier needs to be stored) all go into lval; dval stores floating point values; str stores strings; ht stores arrays (an array in PHP is actually a hash table); and obj stores objects. NULL needs no value field of its own, because the type tag of the zval (IS_NULL) is enough to identify it. In this way, eight types are stored with five fields.
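As a rough illustration of the field reuse described above, the following sketch maps a PHP value to the union member that, according to this description, would hold it. The function name zvalue_field_for is made up for this example and is not part of PHP; it only mirrors the mapping given in the text and does not inspect the engine.

```php
<?php
// Illustration only: returns the name of the _zvalue_value member that the
// text above associates with the type of $value, or null for NULL.
function zvalue_field_for($value)
{
    switch (gettype($value)) {
        case 'boolean':
        case 'integer':
        case 'resource':
            return 'lval';   // booleans, integers and resource ids share lval
        case 'double':
            return 'dval';   // floating point
        case 'string':
            return 'str';    // pointer plus length
        case 'array':
            return 'ht';     // arrays are hash tables
        case 'object':
            return 'obj';
        default:
            return null;     // NULL: identified by the type tag alone
    }
}

$samples = array(true, 42, 3.14, 'text', array(1, 2), new stdClass(), null);
foreach ($samples as $sample) {
    $field = zvalue_field_for($sample);
    echo gettype($sample), ' -> ', ($field === null ? '(no field)' : $field), "\n";
}
```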

Which kind of value a zval currently holds is determined by the type field of "_zval_struct". _zval_struct is the concrete C implementation of zval, and each zval represents the memory object of one variable. Besides value and type, _zval_struct has two more fields, refcount__gc and is_ref__gc. Their suffix suggests that they are related to garbage collection, and indeed PHP's garbage collection relies on them. refcount__gc records how many variables currently reference this zval, while is_ref__gc records whether the zval is referenced by reference. This sounds convoluted; it is tied to the copy-on-write mechanism of zval in PHP. That topic is not the focus of this article, so we will not go into detail here. You only need to remember the role of the refcount__gc field.
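The visible effect of the difference between a plain assignment and a reference can be shown in a few lines. This is only a behavioural sketch with made-up variable names; it shows what a script can observe, not how the engine updates refcount__gc and is_ref__gc internally.

```php
<?php
$original = array(1, 2, 3);

// Plain assignment: the two names behave as independent values. The engine may
// share storage until one of them is written to (copy-on-write).
$copy = $original;
$copy[] = 4;            // the write affects $copy only

// Reference assignment: both names are bound to the same container.
$alias = &$original;
$alias[] = 99;          // the write is visible through $original as well

echo implode(',', $original), "\n";
echo implode(',', $copy), "\n";
```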

Garbage collection algorithm in PHP5.2 -- Reference Counting

The memory reclamation algorithm used in PHP5.2 is the well-known reference counting. Its idea is intuitive and simple: every memory object gets a counter. When a memory object is created, its counter is initialized to 1 (at that moment there is always one variable referencing it). Each time another variable comes to reference the object, the counter is incremented by 1; each time a variable that references it goes away, the counter is decremented by 1. When the garbage collection mechanism runs, it destroys every memory object whose counter is 0 and reclaims the memory it occupied. In PHP, the memory object is the zval and the counter is refcount__gc.

For example, the following PHP code shows how the PHP5.2 counter behaves (the counter values can be observed with Xdebug, xdebug.org):


<?php
$val1 = 100;   // zval(val1).refcount__gc = 1
$val2 = $val1; // zval(val1).refcount__gc = 2, zval(val2).refcount__gc = 2 (because of copy-on-write, val2 and val1 currently reference the same zval)
$val2 = 200;   // zval(val1).refcount__gc = 1, zval(val2).refcount__gc = 1 (val2 now gets a new zval of its own)
unset($val1);  // zval(val1).refcount__gc = 0 (the zval that $val1 referenced is no longer reachable and will be reclaimed by GC)

?>
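The counting rule can also be replayed with an explicit counter in userland. The sketch below is a toy model, not PHP's implementation: CountedValue and ToySymbolTable are hypothetical names, and the counter starts at 0 and reaches 1 on the first binding, which amounts to the "initialized to 1" rule above.

```php
<?php
// Toy stand-in for a zval: a value plus an explicit reference counter.
final class CountedValue
{
    public $value;
    public $refcount = 0;
    public $destroyed = false;

    public function __construct($value)
    {
        $this->value = $value;
    }
}

// Hypothetical symbol table: variable name => CountedValue.
final class ToySymbolTable
{
    private $symbols = array();

    // Bind a name to a container, releasing whatever it referenced before.
    public function bind($name, CountedValue $target)
    {
        $this->unbind($name);
        $this->symbols[$name] = $target;
        $target->refcount++;
    }

    // Drop a name; a container whose counter reaches 0 is "destroyed".
    public function unbind($name)
    {
        if (!isset($this->symbols[$name])) {
            return;
        }
        $old = $this->symbols[$name];
        unset($this->symbols[$name]);
        if (--$old->refcount === 0) {
            $old->destroyed = true;
        }
    }

    public function get($name)
    {
        return $this->symbols[$name];
    }
}

$table  = new ToySymbolTable();
$first  = new CountedValue(100);
$second = new CountedValue(200);

$table->bind('val1', $first);                 // $val1 = 100;   first: 1
$table->bind('val2', $table->get('val1'));    // $val2 = $val1; first: 2
$table->bind('val2', $second);                // $val2 = 200;   first: 1, second: 1
$table->unbind('val1');                       // unset($val1);  first: 0, destroyed
```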

Reference counting is simple, intuitive, and easy to implement, but it has a fatal defect: it can easily leak memory. Many readers may already have realized that when circular references exist, reference counting cannot reclaim them. Consider the following code:


<?php
$a = array();
$a[] = &$a;
unset($a);

?>

This code first creates the array a and then makes the first element of a point to a by reference, so the refcount of a's zval becomes 2. We then destroy the variable a. At this point the refcount of the zval that a originally pointed to is 1, but we can no longer operate on it, because it forms a circular self-reference.

(In the figure of the original article, the gray part marks what no longer exists.) Because the refcount of the zval that a pointed to is 1 (it is still referenced by the first element of its own HashTable), the GC never destroys this zval, and this memory is leaked.

Note that PHP stores variable symbols in symbol tables. There is one global symbol table, and every complex type, such as an array or an object, has a symbol table of its own. In the code above, a and a[0] are therefore two symbols: a is stored in the global symbol table and a[0] in the symbol table of the array, and both reference the same zval (symbol a is, of course, destroyed later). Readers should keep this relationship between symbols and zvals in mind.

This leak may not matter when PHP is used only for dynamic page scripts, because such scripts are short-lived and PHP guarantees that all resources are released when the script finishes. But PHP is no longer used only for dynamic pages. When PHP runs in long-lived scenarios, such as automated test scripts or daemon processes, the leaks accumulated over many iterations can become serious. This is not far-fetched: a company where I once interned used a daemon process written in PHP to talk to its data storage servers.
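The leak described above can be made visible with a destructor. The following is a minimal sketch, assuming PHP 5.3 or later with the cycle collector switched off through gc_disable() to approximate PHP5.2 behaviour; Tracked is a hypothetical helper class, and an object is used instead of an array because an object can report its own destruction.

```php
<?php
// Hypothetical helper class: counts how many instances have been destroyed.
final class Tracked
{
    public static $destroyed = 0;
    public $self = null;

    public function __destruct()
    {
        self::$destroyed++;
    }
}

gc_disable();                 // no cycle collection: plain reference counting only

$plain = new Tracked();
unset($plain);                // refcount drops to 0: destroyed immediately
$afterPlain = Tracked::$destroyed;

$cyclic = new Tracked();
$cyclic->self = $cyclic;      // the object now references itself
unset($cyclic);               // refcount drops to 1, not 0: the object stays alive
$afterCyclic = Tracked::$destroyed;

echo "destroyed after plain unset:  ", $afterPlain, "\n";
echo "destroyed after cyclic unset: ", $afterCyclic, "\n";

gc_enable();                  // restore the default setting
```

The second counter does not move: the self-referencing object is unreachable from the script but is only released when the script ends.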

Due to this defect in Reference Counting, PHP5.3 improved the garbage collection algorithm.

Garbage Collection algorithm in PHP5.3 -- Concurrent Cycle Collection in Reference Counted Systems

PHP5.3's garbage collection algorithm is still based on reference counting, but the count alone is no longer the criterion for collection. Instead, PHP5.3 uses a synchronous cycle collection algorithm, which was proposed by IBM engineers in the paper Concurrent Cycle Collection in Reference Counted Systems.

The algorithm is fairly complex, as the paper's 29 pages suggest, so I do not plan to (and could not) discuss it in full here. If you are interested, read the paper mentioned above; I strongly recommend it, it is excellent.

Here, I can only briefly describe the basic idea of this algorithm.

First, PHP allocates a fixed-size "root buffer" that holds a fixed number of zvals, 10,000 by default. To change this number, you have to modify the constant GC_ROOT_BUFFER_MAX_ENTRIES in the source file Zend/zend_gc.c and recompile PHP.

As we saw above, a zval can be referenced either by a symbol in the global symbol table or by a symbol inside another zval that represents a complex type. Some zvals are therefore possible roots. We will not discuss here how PHP discovers these possible roots, which is a complicated matter. In short, PHP can find the possible root zvals and put them into the root buffer.

When the root buffer is full, PHP runs garbage collection. The algorithm works as follows:

1. Starting from each zval in the root buffer, traverse every zval that can be reached, depth-first, and decrement the refcount of each by 1. To avoid decrementing the same zval more than once (different roots may reach the same zval), each zval is marked as "already decremented" the first time it is decremented.

2. Traverse depth-first again from each zval in the root buffer. If the refcount of a zval is not 0, add 1 back to it; otherwise leave it at 0.

3. Empty the root buffer (note that these zvals are only removed from the buffer, not destroyed), then destroy every zval whose refcount is 0 and reclaim its memory.
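The three steps can be tried out on a toy object graph. The sketch below follows the synchronous algorithm of the cited paper in simplified form (gray, white, and black marking); it is not Zend's C implementation. ToyNode and all toy_* functions are hypothetical names, and the simulated counter $rc has nothing to do with PHP's own refcount of these objects.

```php
<?php
// Toy model only: each ToyNode stands in for a zval that may reference others.
final class ToyNode
{
    public $name;
    public $rc = 0;              // simulated refcount
    public $color = 'black';     // black = presumed live, gray = visited, white = garbage
    public $children = array();  // outgoing references

    public function __construct($name)
    {
        $this->name = $name;
    }
}

function toy_link(ToyNode $from, ToyNode $to)
{
    $from->children[] = $to;
    $to->rc++;
}

// Step 1: subtract every reference that originates inside the subgraph.
function toy_mark_gray(ToyNode $node)
{
    if ($node->color === 'gray') {
        return;                  // already visited: never subtract twice
    }
    $node->color = 'gray';
    foreach ($node->children as $child) {
        $child->rc--;
        toy_mark_gray($child);
    }
}

// Step 2a: a node still referenced from outside is live; undo step 1 below it.
function toy_scan_black(ToyNode $node)
{
    $node->color = 'black';
    foreach ($node->children as $child) {
        $child->rc++;
        if ($child->color !== 'black') {
            toy_scan_black($child);
        }
    }
}

// Step 2b: nodes whose count stayed at 0 become garbage candidates (white).
function toy_scan(ToyNode $node)
{
    if ($node->color !== 'gray') {
        return;
    }
    if ($node->rc > 0) {
        toy_scan_black($node);
        return;
    }
    $node->color = 'white';
    foreach ($node->children as $child) {
        toy_scan($child);
    }
}

// Step 3: gather the white nodes; a real collector would free them here.
function toy_collect_white(ToyNode $node, array &$freed)
{
    if ($node->color !== 'white') {
        return;
    }
    $node->color = 'freed';
    $freed[] = $node->name;
    foreach ($node->children as $child) {
        toy_collect_white($child, $freed);
    }
}

function toy_run(array $roots)
{
    foreach ($roots as $root) { toy_mark_gray($root); }
    foreach ($roots as $root) { toy_scan($root); }
    $freed = array();
    foreach ($roots as $root) { toy_collect_white($root, $freed); }
    return $freed;
}

// Garbage cycle: a <-> b, the outside reference to a has already been dropped.
$a = new ToyNode('a');
$b = new ToyNode('b');
toy_link($a, $b);
toy_link($b, $a);
$garbage = toy_run(array($a));   // both nodes are reported as garbage

// Live cycle: c <-> d, with one reference to c still held from outside.
$c = new ToyNode('c');
$d = new ToyNode('d');
toy_link($c, $d);
toy_link($d, $c);
$c->rc++;                        // the surviving outside reference
$live = toy_run(array($c));      // nothing is reported, counters are restored
```

In the first graph every count reaches 0 once internal references are subtracted, so the whole cycle is collected. In the second, the outside reference keeps c above 0, and the counts are put back.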

It does not matter if you cannot follow every detail. Just remember that PHP5.3's garbage collection algorithm has the following properties:

1. A collection cycle is not started every time a refcount is decremented. Garbage collection starts only when the root buffer is full.

2. Circular references can be collected.

3. Leaked memory can always be kept below a threshold.
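The first and third properties can be observed from a script. The sketch below assumes PHP 5.3 or later with the collector enabled; AutoProbe is a hypothetical helper class. The exact iteration at which the first automatic run happens depends on the root buffer size of the PHP build, so the script reports it rather than assuming a value.

```php
<?php
// Hypothetical probe class: counts destructor calls so collection becomes visible.
final class AutoProbe
{
    public static $destroyed = 0;
    public $self = null;

    public function __destruct()
    {
        self::$destroyed++;
    }
}

gc_enable();
gc_collect_cycles();          // start with an empty root buffer

$firstSeenAt = null;          // iteration at which a destructor had first run
for ($i = 1; $i <= 25000; $i++) {
    $probe = new AutoProbe();
    $probe->self = $probe;    // self-referencing cycle
    unset($probe);            // refcount 2 -> 1: recorded as a possible root

    if ($firstSeenAt === null && AutoProbe::$destroyed > 0) {
        $firstSeenAt = $i;
    }
}

echo "destructors run so far: ", AutoProbe::$destroyed, "\n";
echo "first automatic collection noticed at iteration: ", $firstSeenAt, "\n";
```

No collection is requested inside the loop, yet destructors run once enough possible roots have accumulated, which keeps the number of leaked cycles bounded.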

Performance Comparison between PHP5.2 and PHP5.3 garbage collection algorithms

Given my current circumstances, I will not design a new experiment but will cite the one in the PHP Manual directly. For more on the performance comparison between the two, see the relevant chapter of the PHP Manual: http://www.php.net/manual/en/features.gc.performance.

The first experiment is the memory leak test. The code from the PHP Manual is reproduced below (the result chart appears in the manual):

<?php
class Foo
{
    public $var = '3.1415962654';
}

$baseMemory = memory_get_usage();

for ($i = 0; $i <= 100000; $i++)
{
    $a = new Foo;
    $a->self = $a;
    if ($i % 500 === 0)
    {
        echo sprintf('%8d: ', $i), memory_get_usage() - $baseMemory, "\n";
    }
}
?>

The manual's results show that, in a scenario that leaks memory cumulatively, memory usage under PHP5.2 keeps growing, while PHP5.3 always keeps the leaked memory below a threshold (which depends on the root buffer size).

Performance comparison:

<?php
class Foo
{
    public $var = '3.1415962654';
}

for ($i = 0; $i <= 1000000; $i++)
{
    $a = new Foo;
    $a->self = $a;
}

echo memory_get_peak_usage(), "\n";
?>

This script runs 1,000,000 iterations so that the running time is long enough to compare. It is then run in CLI mode, once with garbage collection disabled and once with it enabled:

time php -dzend.enable_gc=0 -dmemory_limit=-1 -n example2.php
# And
time php -dzend.enable_gc=1 -dmemory_limit=-1 -n example2.php

On my machine, the running times were 6.4 s and 7.2 s respectively. The run with PHP5.3's garbage collection enabled is slower, but the difference is small.

PHP configuration related to the garbage collection algorithm

You can enable or disable PHP's garbage collection mechanism by setting zend.enable_gc in php.ini, or at run time by calling gc_enable() or gc_disable(). Even when garbage collection is disabled in PHP5.3, PHP still records possible roots in the root buffer; it just does not run garbage collection automatically when the root buffer is full. You can, of course, call gc_collect_cycles() at any time to force a collection.
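A short sketch of these functions used together, assuming PHP 5.3 or later. CycleProbe is a hypothetical helper class; gc_enable(), gc_disable(), gc_enabled(), and gc_collect_cycles() are the built-in functions named above.

```php
<?php
// Hypothetical probe class: counts destructor calls.
final class CycleProbe
{
    public static $destroyed = 0;
    public $self = null;

    public function __destruct()
    {
        self::$destroyed++;
    }
}

gc_enable();
gc_collect_cycles();                      // start from an empty root buffer
gc_disable();                             // automatic collection is now off
$enabledWhileOff = gc_enabled();

for ($i = 0; $i < 100; $i++) {
    $probe = new CycleProbe();
    $probe->self = $probe;                // self-referencing cycle
    unset($probe);                        // still recorded as a possible root
}

$beforeCollect = CycleProbe::$destroyed;  // cycles are still alive here
$collected     = gc_collect_cycles();     // forced run, works while disabled
$afterCollect  = CycleProbe::$destroyed;

echo "destroyed before forced run: ", $beforeCollect, "\n";
echo "destroyed after forced run:  ", $afterCollect, "\n";
echo "gc_collect_cycles() reported: ", $collected, "\n";

gc_enable();                              // restore the default setting
```

Disabling automatic collection and forcing a run at a convenient point can suit long-running scripts that want to choose when the collection cost is paid.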

This article is released under the Attribution-NonCommercial 3.0 license. You are welcome to reprint and adapt it, provided that you keep the attribution to the author, Zhang Yang (including links), and do not use it for commercial purposes.
