PHP Garbage Collection Mechanism: A Detailed Tutorial

Source: Internet
Author: User

This section describes the characteristics of the new garbage collection mechanism (GC) introduced in PHP 5.3.

Each PHP variable is stored in a variable container called a "zval". Besides the variable's type and value, a zval container holds two additional pieces of information. The first is "is_ref", a boolean that indicates whether the variable is part of a "reference set". Using this flag, the PHP engine can tell ordinary variables apart from references. Because PHP lets users create references with the & operator, a zval container also has an internal reference-counting mechanism to optimize memory usage. The second piece of additional information is "refcount", which records how many variable names (also called symbols) point to this zval container. All symbols are stored in a symbol table, and there is one symbol table per scope: one for the main script (for example, the script requested by the browser) and one for every function or method.
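
To make the two extra fields concrete, here is a small toy model written in PHP itself. The classes ZvalContainer and SymbolTable are hypothetical and only mirror the description above; the engine's real zval is a C structure whose layout differs between PHP versions. The sketch assumes PHP 8.0 or later (constructor property promotion and the mixed type).

```php
<?php
// Toy model (PHP 8.0+). ZvalContainer and SymbolTable are hypothetical
// classes that mirror the prose above; they are not the engine's C structs.

final class ZvalContainer
{
    public function __construct(
        public string $type,
        public mixed $value,
        public bool $isRef = false, // part of a reference set?
        public int $refcount = 0,   // number of symbols pointing here
    ) {}
}

// One symbol table per scope: symbol name => container.
final class SymbolTable
{
    /** @var array<string, ZvalContainer> */
    private array $symbols = [];

    public function bind(string $name, ZvalContainer $zval): void
    {
        $zval->refcount++;           // count the new symbol first,
        $this->unbind($name);        // then release what the name held before
        $this->symbols[$name] = $zval;
    }

    public function unbind(string $name): void
    {
        if (isset($this->symbols[$name])) {
            $this->symbols[$name]->refcount--;
            unset($this->symbols[$name]);
        }
    }

    public function lookup(string $name): ZvalContainer
    {
        return $this->symbols[$name];
    }
}

$main = new SymbolTable();                       // scope of the main script
$main->bind('a', new ZvalContainer('string', 'new string'));
$main->bind('b', $main->lookup('a'));            // like $b = $a: share, don't copy

$zval = $main->lookup('a');
printf("a: (refcount=%d, is_ref=%d)='%s'\n", $zval->refcount, (int) $zval->isRef, $zval->value);
```

Binding b to the container that a already uses raises its refcount to 2 without copying the value, so the script prints a: (refcount=2, is_ref=0)='new string', mirroring what Xdebug reports in Example #3.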

A new zval container is created when a variable is assigned a constant value, as in the following example:

Example #1 Creating a new zval container

<?php
$a = "new string";
?>

In the example above, the new variable a is created in the current scope, together with a variable container of type string holding the value "new string". Of the two additional pieces of information, "is_ref" is set to FALSE by default because no user-defined reference has been created, and "refcount" is set to 1 because only one symbol uses this container. Note that when "refcount" is 1, "is_ref" is always FALSE. If you have installed Xdebug, you can display the "refcount" and "is_ref" values by calling xdebug_debug_zval().

Example #2 Displaying zval information

<?php
$a = "new string";
xdebug_debug_zval('a');
?>

The above example will output:

a: (refcount=1, is_ref=0)='new string'

Assigning this variable to another variable increases the refcount.

Example #3 Increasing the refcount of a zval

<?php
$a = "new string";
$b = $a;
xdebug_debug_zval('a');
?>

The above example will output:

a: (refcount=2, is_ref=0)='new string'

At this point the refcount is 2, because the same variable container is linked to both a and b. PHP does not copy a container when it does not have to. A container is destroyed once its "refcount" reaches zero. The "refcount" is decremented by one whenever a symbol linked to the container leaves its scope (for example, when a function ends) or when unset() is called on the symbol, as the following example illustrates:

Example #4 Decreasing the refcount of a zval

<?php
$a = "new string";
$c = $b = $a;
xdebug_debug_zval('a');
unset($b, $c);
xdebug_debug_zval('a');
?>

The above example will output:

a: (refcount=3, is_ref=0)='new string'

a: (refcount=1, is_ref=0)='new string'

If we now call unset($a), the variable container, including its type and value, is removed from memory.
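
Plain PHP code cannot read a refcount directly, but for objects the moment it reaches zero is observable: PHP calls __destruct() as soon as the last reference goes away. The sketch below relies on that and assumes PHP 8.0 or later; Tracer is a hypothetical class, and because the engine's bookkeeping for objects differs in detail from the string zvals above, treat it as an illustration of the rule rather than of the exact internals.

```php
<?php
// PHP 8.0+ sketch: for objects, the moment the refcount reaches zero is
// observable, because __destruct() runs right then. Tracer is a
// hypothetical class used only for this demonstration.
final class Tracer
{
    /** @var list<string> */
    public static array $log = [];

    public function __construct(private string $name) {}

    public function __destruct()
    {
        self::$log[] = "destroyed {$this->name}";
    }
}

$a = new Tracer('x');   // one symbol points to the object
$c = $b = $a;           // three symbols now share it; nothing is copied
unset($b, $c);          // back to one symbol: the object survives
Tracer::$log[] = 'after unset($b, $c)';
unset($a);              // last symbol gone: the destructor runs immediately
Tracer::$log[] = 'after unset($a)';

print_r(Tracer::$log);
```

The log shows the object surviving unset($b, $c) and being destroyed during unset($a), before the next statement runs.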

Compound types

Things get a little more complex with compound types such as arrays and objects. Unlike scalar values, arrays and objects store their elements or properties in a symbol table of their own. This means that the following example creates three zval containers:

Example #5 Creating an array zval

<?php
$a = array('meaning' => 'life', 'number' => 42);
xdebug_debug_zval('a');
?>

The above example will output something similar to:

a: (refcount=1, is_ref=0)=array (
   'meaning' => (refcount=1, is_ref=0)='life',
   'number' => (refcount=1, is_ref=0)=42
)

The three zval containers are: a, meaning, and number. The same rules for increasing and decreasing the "refcount" apply as described above. Next, we add another element to the array and set its value to the value of an existing element:

Example #6 Adding an already existing element to an array

<?php
$a = array('meaning' => 'life', 'number' => 42);
$a['life'] = $a['meaning'];
xdebug_debug_zval('a');
?>

The above example will output something similar to:

a: (refcount=1, is_ref=0)=array (
   'meaning' => (refcount=2, is_ref=0)='life',
   'number' => (refcount=1, is_ref=0)=42,
   'life' => (refcount=2, is_ref=0)='life'
)

From the Xdebug output above, we can see that both the original and the newly added array element are linked to the same zval container, whose "refcount" is 2. Although Xdebug's output shows two zval containers with the value 'life', they are in fact the same one. xdebug_debug_zval() does not show this, but you could see it by also displaying the memory pointer.

Removing an element from an array is like removing a symbol from a scope. After the removal, the "refcount" of the container the element pointed to is decremented, and, again, once the "refcount" reaches zero the container is removed from memory. The following example illustrates this:

Example #7 Removing an element from an array

<?php
$a = array('meaning' => 'life', 'number' => 42);
$a['life'] = $a['meaning'];
unset($a['meaning'], $a['number']);
xdebug_debug_zval('a');
?>

The above example will output something similar to:

a: (refcount=1, is_ref=0)=array (
   'life' => (refcount=1, is_ref=0)='life'
)

Things get interesting when we add the array itself as an element of the array, as the next example shows. Note that we use the reference operator here; otherwise PHP would create a copy.

Example #8 Adding the array itself as an element of itself

<?php
$a = array('one');
$a[] =& $a;
xdebug_debug_zval('a');
?>

The above example will output something similar to:

a: (refcount=2, is_ref=1)=array (
   0 => (refcount=1, is_ref=0)='one',
   1 => (refcount=2, is_ref=1)=...
)

You can see that the array variable (a) and the array's second element (1) now point to a variable container with a "refcount" of 2. The "..." in the output indicates recursion, which in this case means the "..." points back to the original array.

As before, calling unset() on a variable removes the symbol and decrements by one the refcount of the variable container it points to. So if we call unset() on $a after running the code above, the refcount of the container that both $a and element "1" point to drops from 2 to 1:

Example #9 Unsetting $a

(refcount=1, is_ref=1)=array (
   0 => (refcount=1, is_ref=0)='one',
   1 => (refcount=1, is_ref=1)=...
)

Cleanup problems

Although no symbol in any scope points to this structure any more, it cannot be cleaned up, because array element "1" still points to the array itself. Since there is no external symbol pointing to it, the user has no way to clean up the structure, and the result is a memory leak. Fortunately, PHP cleans up this data structure at the end of the request, but until then it can take up a lot of valuable memory. This situation arises often when implementing parsing algorithms, or other code in which a child element points back to its parent. The same can of course happen with objects, where it is actually more likely, because objects are always implicitly used by reference.

This might not matter if it happens only once or twice, but with thousands or even hundreds of thousands of such leaks it obviously becomes a big problem. This is especially true for long-running scripts, such as daemons where the request essentially never ends, or large sets of unit tests. The latter caused problems when running the unit tests for the Template component of the eZ Components library: the run needed around 2 GB of memory, which the test server did not have.
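
The same problem can be reproduced with two objects that point at each other. This is a sketch assuming PHP 8.0 or later; Node is a hypothetical class whose destructor records when a node is really released, and gc_collect_cycles() is the built-in function described later in this section.

```php
<?php
// PHP 8.0+ sketch of the cleanup problem with objects. Node is a
// hypothetical class; its destructor records when a node is released.
final class Node
{
    /** @var list<string> */
    public static array $log = [];
    public ?Node $other = null;

    public function __construct(private string $name) {}

    public function __destruct()
    {
        self::$log[] = "destroyed {$this->name}";
    }
}

gc_enable();                     // on by default; made explicit here

$parent = new Node('parent');
$child = new Node('child');
$parent->other = $child;         // parent -> child
$child->other = $parent;         // child -> parent: a cycle

unset($parent, $child);          // each refcount drops from 2 to 1, not to 0
$destroyedBeforeGc = count(Node::$log);   // 0: the cycle keeps both alive

gc_collect_cycles();             // the cycle collector finds the garbage cycle
$destroyedAfterGc = count(Node::$log);

echo "before: $destroyedBeforeGc, after: $destroyedAfterGc\n";
```

After unset() both destructors are still pending, because each object keeps the other's refcount at 1; only the cycle collector runs them. Depending on the PHP version, the memory itself may be reclaimed in the same collection run or in the next one.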

Traditionally, reference-counting memory mechanisms, such as the one PHP used before, fail to address these circular-reference memory leaks. As of PHP 5.3.0, however, PHP implements the synchronous algorithm from the paper "Concurrent Cycle Collection in Reference Counted Systems" to address this memory leak problem.

A full description of the algorithm is somewhat beyond the scope of this section, so only the basics are explained here. First, we need to establish some ground rules. If a refcount is increased, the container is still in use and is therefore not garbage. If a refcount is decreased to zero, the container can be freed. This means a garbage cycle can only be created when a refcount is decreased to a non-zero value. Second, within a garbage cycle, you can find out which parts are garbage by decreasing each refcount by one and then checking which containers end up with a refcount of zero.

To avoid having to check every garbage cycle in which a refcount might have been decreased, the algorithm puts all possible roots (zval containers) into the root buffer and marks them "purple" (suspected garbage). It also makes sure that each possible garbage root ends up in the buffer only once. Only when the root buffer is full does the collection mechanism run over all the distinct variable containers in it. See step A in the diagram.
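
The ground rules and step A can be modelled in a few lines of PHP. This is a toy sketch assuming PHP 8.0 or later; Box and RootBuffer are hypothetical names, the capacity of 3 stands in for the real buffer size, and collect() only empties the buffer where the real algorithm would run steps B to D.

```php
<?php
// Toy model (PHP 8.0+) of the ground rules and the root buffer. Box and
// RootBuffer are hypothetical; the capacity is tiny so a run is easy to see.
final class Box
{
    public string $color = 'black';  // 'purple' = possible garbage root
    public function __construct(public string $name, public int $rc = 1) {}
}

final class RootBuffer
{
    /** @var array<int, Box> possible roots keyed by object id */
    public array $roots = [];
    public int $runs = 0;

    public function __construct(private int $capacity) {}

    public function incref(Box $box): void
    {
        $box->rc++;
        $box->color = 'black';       // still in use: not garbage
    }

    public function decref(Box $box): void
    {
        $id = spl_object_id($box);
        $box->rc--;
        if ($box->rc === 0) {        // dropped to zero: free at once
            $box->color = 'freed';
            unset($this->roots[$id]);
            return;
        }
        $box->color = 'purple';      // dropped to non-zero: may be a cycle
        $this->roots[$id] = $box;    // same key, so a root is buffered once
        if (count($this->roots) >= $this->capacity) {
            $this->collect();
        }
    }

    private function collect(): void
    {
        $this->runs++;               // steps B-D would analyze the roots here
        $this->roots = [];
    }
}

$buffer = new RootBuffer(capacity: 3);
$shared = new Box('shared', rc: 3);
$buffer->decref($shared);            // 3 -> 2: purple, buffered
$buffer->decref($shared);            // 2 -> 1: already buffered, not added again
$temp = new Box('temp');
$buffer->decref($temp);              // 1 -> 0: freed, never buffered
echo count($buffer->roots), " root(s) buffered, {$buffer->runs} run(s)\n";
```

Keying the buffer by spl_object_id() is one simple way to guarantee that a possible root is buffered only once; the script prints 1 root(s) buffered, 0 run(s).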

In step B, the algorithm simulates the deletion of each purple variable. It performs a depth-first traversal from each purple root and, for every reference it follows, decrements by one the refcount of the container it reaches, including ordinary containers that are not purple. Each container is traversed (simulated-deleted) only once, after which it is marked grey. Note that the grey marking prevents a container from being traversed twice; it does not prevent its refcount from being decremented twice, since a container reached through several references is decremented once per reference.

In step C, the algorithm simulates restoring each purple variable. Restoration is conditional: a container is restored only if its refcount is greater than 0. Again, each container is restored only once, after which it is marked black; this is essentially the reverse of step B. The containers that could not be restored are marked white (blue in the diagram); they are the garbage, and step D walks over them and actually frees them.

Simulated deletion, simulated restoration, and real deletion each amount to a simple traversal (typically a depth-first search). The cost is therefore proportional to the number of nodes these operations touch, not just to the number of purple, suspected-garbage roots.
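
Steps B to D can be sketched as three depth-first traversals over a hand-built object graph. This is a toy model assuming PHP 8.0 or later: Obj, markGray(), scan(), scanBlack(), collectWhite() and collectCycles() are hypothetical names loosely following the color scheme above, the graph is made of plain PHP objects rather than real zvals, and the roots are passed in directly instead of coming from a root buffer.

```php
<?php
// Toy sketch (PHP 8.0+) of steps B-D of synchronous cycle collection on a
// hand-built graph. All names are hypothetical; rc counts every incoming
// reference, whether internal to the graph or from an outside symbol.
final class Obj
{
    public string $color = 'black';
    /** @var list<Obj> */
    public array $children = [];
    public function __construct(public string $name, public int $rc) {}
}

// Step B: simulated deletion. Each node is traversed once (marked grey);
// every internal reference it holds decrements the target's rc.
function markGray(Obj $s): void
{
    if ($s->color === 'gray') {
        return;
    }
    $s->color = 'gray';
    foreach ($s->children as $t) {
        $t->rc--;
        markGray($t);
    }
}

// Step C: a grey node with rc > 0 is still referenced from outside, so it
// and everything it reaches are restored (black); the rest turns white.
function scan(Obj $s): void
{
    if ($s->color !== 'gray') {
        return;
    }
    if ($s->rc > 0) {
        scanBlack($s);
        return;
    }
    $s->color = 'white';
    foreach ($s->children as $t) {
        scan($t);
    }
}

function scanBlack(Obj $s): void
{
    $s->color = 'black';
    foreach ($s->children as $t) {
        $t->rc++;                    // undo the simulated deletion
        if ($t->color !== 'black') {
            scanBlack($t);
        }
    }
}

// Step D: white nodes are garbage and are really freed.
function collectWhite(Obj $s, array &$freed): void
{
    if ($s->color !== 'white') {
        return;
    }
    $s->color = 'freed';
    $freed[] = $s->name;
    foreach ($s->children as $t) {
        collectWhite($t, $freed);
    }
}

/**
 * @param list<Obj> $roots the possible roots to analyze
 * @return list<string> names of the nodes that were freed
 */
function collectCycles(array $roots): array
{
    foreach ($roots as $r) { markGray($r); }
    foreach ($roots as $r) { scan($r); }
    $freed = [];
    foreach ($roots as $r) { collectWhite($r, $freed); }
    return $freed;
}

// A dead cycle with no outside references ...
$parent = new Obj('parent', 1);
$child = new Obj('child', 1);
$parent->children[] = $child;
$child->children[] = $parent;
// ... and a live cycle: x is also held by one symbol outside the cycle.
$x = new Obj('x', 2);
$y = new Obj('y', 1);
$x->children[] = $y;
$y->children[] = $x;

print_r(collectCycles([$parent, $y]));
```

The dead parent/child cycle drops to zero in step B and is freed in step D, so the script prints the names parent and child. In the x/y cycle, y also dips to zero, but x keeps a refcount of 1 from its outside symbol, so step C restores both and their refcounts end up back at 2 and 1.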

Now that you have a basic understanding of how the algorithm works, let's look at how it integrates with PHP. By default, PHP's garbage collection mechanism is enabled. You can change this with the php.ini setting zend.enable_gc.

When the garbage collection mechanism is enabled, the cycle-finding algorithm described above runs whenever the root buffer fills up. The root buffer has a fixed size of 10,000 possible roots. You can change this by modifying the constant GC_ROOT_BUFFER_MAX_ENTRIES in Zend/zend_gc.c in the PHP source code and recompiling PHP. When the garbage collection mechanism is disabled, the cycle-finding algorithm never runs. However, possible roots are still recorded in the root buffer, regardless of whether garbage collection has been enabled in the configuration.

If the root buffer becomes full of possible roots while the garbage collection mechanism is disabled, further possible roots are simply not recorded. Possible roots that are not recorded are never analyzed by the algorithm, so if they are part of a circular reference cycle, they will never be cleaned up and will cause a memory leak.

Possible roots are recorded even when the garbage collection mechanism is disabled because recording them is faster than checking, every time a possible root is found, whether the mechanism is turned on. The garbage collection and analysis mechanism itself, however, can take a considerable amount of time.
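
The difference between an enabled and a disabled collector once the buffer is full can be shown with a simple counting model. This is a hypothetical sketch assuming PHP 8.0 or later, not engine code: recordRoots() treats every possible root as distinct and assumes that a collection run empties the whole buffer.

```php
<?php
// Hypothetical counting model (not engine code) of how possible roots are
// recorded. Roots are always offered to the buffer; what happens when it
// is full depends on whether the collector is enabled.
/** @return array{runs: int, buffered: int, dropped: int} */
function recordRoots(int $capacity, bool $gcEnabled, int $possibleRoots): array
{
    $buffered = 0;   // roots currently held in the buffer
    $runs = 0;       // how many times the collector ran
    $dropped = 0;    // roots that were never recorded
    for ($i = 0; $i < $possibleRoots; $i++) {
        if ($buffered === $capacity) {
            if (!$gcEnabled) {
                $dropped++;      // no room and no collector: lost for good
                continue;
            }
            $runs++;             // analyze the buffer, then start it afresh
            $buffered = 0;
        }
        $buffered++;
    }
    return ['runs' => $runs, 'buffered' => $buffered, 'dropped' => $dropped];
}

print_r(recordRoots(10000, true, 25000));
print_r(recordRoots(10000, false, 25000));
```

With a capacity of 10,000 and 25,000 possible roots, the enabled collector runs twice and ends with 5,000 buffered roots, while the disabled one records the first 10,000 and drops the remaining 15,000, which can never be analyzed.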

Besides changing the zend.enable_gc setting, you can also turn the garbage collection mechanism on and off by calling gc_enable() and gc_disable() respectively. Calling these functions has the same effect as turning the mechanism on or off through the configuration setting. You can also force a collection of cycles even if the possible root buffer is not yet full, by calling gc_collect_cycles(). This function returns the number of cycles the algorithm collected.

The reason for letting you turn the garbage collection mechanism on and off, and trigger collection yourself, is that some parts of your application may be highly time-sensitive, and there you may not want the mechanism to run. Of course, turning off garbage collection for parts of your application carries the risk of a memory leak, because some possible roots might not fit into the limited root buffer. It is therefore probably wise to call gc_collect_cycles() just before calling gc_disable(), to free the memory that could otherwise be lost through the possible roots already recorded in the buffer. This empties the buffer, leaving more room to store possible roots while the mechanism is turned off.
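
The advice above can be wrapped in a small helper. This is a sketch assuming PHP 8.0 or later: withoutGc() and timeCriticalWork() are hypothetical names, while gc_enabled(), gc_collect_cycles(), gc_disable() and gc_enable() are built-in functions.

```php
<?php
// Sketch (PHP 8.0+) of "collect, then disable" around time-sensitive code.
// withoutGc() and timeCriticalWork() are hypothetical helpers.
function withoutGc(callable $work): mixed
{
    $wasEnabled = gc_enabled();
    gc_collect_cycles();         // empty the root buffer before switching off
    gc_disable();
    try {
        return $work();          // the collector will not run automatically here
    } finally {
        if ($wasEnabled) {
            gc_enable();         // restore the previous setting
        }
    }
}

function timeCriticalWork(): int
{
    $sum = 0;
    for ($i = 0; $i < 100000; $i++) {
        $sum += $i;
    }
    return $sum;
}

$result = withoutGc('timeCriticalWork');
var_dump($result, gc_enabled());
```

The try/finally block restores the previous setting even if the time-critical code throws, and a collector that was already disabled before the call stays disabled afterwards.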

Performance considerations

In the previous section we briefly mentioned that recording possible roots has a slight performance impact, but this only matters when comparing PHP 5.2 with PHP 5.3. Although recording possible roots is slightly slower than not recording them at all, as in PHP 5.2, other changes to the PHP runtime in PHP 5.3 reduced this performance loss.

Performance is affected in two major areas. The first is reduced memory usage, and the second is the run-time delay the garbage collection mechanism adds when it cleans up leaked memory. We will look at both.

Savings in memory footprint

First of all, the whole reason for implementing the garbage collection mechanism is to reduce memory usage by cleaning up circularly referenced variables as soon as the prerequisites are met. In PHP, this happens once the root buffer is full or when gc_collect_cycles() is called. The accompanying graph shows the memory usage of the following script in PHP 5.2 and PHP 5.3, excluding the base memory that PHP itself uses when the script starts.

Example #1 Memory Usage Example

<?php
class Foo
{
    public $var = '3.1415962654';
}

$baseMemory = memory_get_usage();

for ($i = 0; $i <= 100000; $i++)
{
    $a = new Foo;
    $a->self = $a;
    if ($i % 500 === 0)
    {
        echo sprintf('%8d: ', $i), memory_get_usage() - $baseMemory, "\n";
    }
}
?>

In this very theoretical example, we create an object in which a property is set to point back to the object itself. When the $a variable in the script is reassigned in the next iteration of the loop, a memory leak would typically occur. In this case two zval containers are leaked (the object zval and the property zval), but only one possible root is found: the variable that was unset. Once the root buffer is full after 10,000 iterations (with a total of 10,000 possible roots), the garbage collection mechanism kicks in and frees the memory associated with those possible roots. This is easy to see in the jagged memory-usage graph for PHP 5.3: after every 10,000 iterations the mechanism runs and frees the memory associated with the circularly referenced variables. The mechanism itself does not have much work to do in this example, because the leaked structure is extremely simple. The diagram shows that the maximum memory usage in PHP 5.3 is about 9 MB, whereas in PHP 5.2 the memory usage keeps increasing.

Increased execution time (run-time slowdowns)

The second area where the garbage collection mechanism affects performance is the time it takes to free leaked memory. To see how long this takes, we modified the script above slightly, allowing more iterations and removing the intermediate memory-usage output. The second script is as follows:

Example #2 GC performance influences

<?php
class Foo
{
    public $var = '3.1415962654';
}

for ($i = 0; $i <= 1000000; $i++)
{
    $a = new Foo;
    $a->self = $a;
}

echo memory_get_peak_usage(), "\n";
?>

We will run this script twice: once with the garbage collection mechanism enabled through the zend.enable_gc setting, and once with it disabled.

Example #3 Running the above script

time php -dzend.enable_gc=0 -dmemory_limit=-1 -n example2.php
# and
time php -dzend.enable_gc=1 -dmemory_limit=-1 -n example2.php

On my machine, the first command (garbage collection disabled) takes about 10.7 seconds, while the second (enabled) takes about 11.4 seconds, roughly 7% longer. However, the peak memory usage of the script drops by 98%, from 931 MB to 10 MB. This benchmark is not very scientific, nor is it representative of real-world applications, but it does demonstrate the memory-usage benefits of the garbage collection mechanism. The good news is that, for this script, the slowdown stays at the same 7%, while the memory savings grow as more circular references are found during execution.

PHP's internal GC statistics

It is possible to get a bit more information about how the garbage collection mechanism runs from within PHP. To do so, however, you need to recompile PHP to enable the benchmark and data-collecting code. Set the CFLAGS environment variable to -DGC_BENCH=1 before running ./configure with your usual options. The following sequence should do the trick:

Example #4 Recompiling PHP to enable GC benchmarking

export CFLAGS=-DGC_BENCH=1
./config.nice
make clean
make

When you run the example code above again with the newly built PHP binary, you will see the following information once PHP has finished executing:

Example #5 GC statistics

GC Statistics
-------------
Runs:               110
Collected:          2072204
Root buffer length: 0
Root buffer peak:   10000

      Possible            Remove from    Marked
          Root  Buffered       buffer      grey
      --------  --------  -----------  --------
ZVAL   7175487   1491291      1241690   3611871
ZOBJ  28506264   1527980       677581   1025731

The most important statistics are in the first block. You can see that the garbage collection mechanism ran 110 times, and that over those 110 runs more than 2 million memory allocations were freed. As long as the garbage collection mechanism has run at least once, the root buffer peak is always 10000.
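
On PHP 7.3 and later, similar counters are available at run time through the built-in gc_status() function, without a GC_BENCH build. The sketch below reuses the self-referencing loop from the examples above with fewer iterations (an arbitrary choice); SelfRef is a hypothetical copy of Foo with the self property declared. Newer engines may adjust the collection threshold at run time, so the numbers will not match the statistics shown here.

```php
<?php
// PHP 7.3+ sketch: read collector counters with gc_status(). SelfRef is a
// hypothetical copy of Foo; 50000 iterations is an arbitrary choice.
final class SelfRef
{
    public $var = '3.1415962654';
    public $self;                // declared to avoid dynamic-property deprecations on PHP 8.2+
}

gc_enable();
for ($i = 0; $i <= 50000; $i++) {
    $a = new SelfRef;
    $a->self = $a;               // each overwritten object becomes an unreachable cycle
}

$status = gc_status();
printf(
    "runs: %d, collected: %d, threshold: %d, roots: %d\n",
    $status['runs'],
    $status['collected'],
    $status['threshold'],
    $status['roots'],
);
```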

Conclusion

In general, PHP's garbage collection mechanism only causes a slowdown when the cycle collection algorithm actually runs. In normal (smaller) scripts there should be no performance impact at all.

However, in cases where the cycle collection mechanism does run in normal scripts, the memory it saves allows more of those scripts to run concurrently on your server, because less memory is used in total.

The benefits are most noticeable in long-running scripts, such as lengthy test suites or daemon scripts. Likewise, for PHP-GTK applications, which generally run longer than web scripts, the new garbage collection mechanism should make a real difference to memory leaks that were previously hard to deal with.
