Evolution of the garbage collection algorithm in PHP5
PHP is a managed language: when programming in PHP, programmers do not need to allocate and release memory manually (except in PHP or Zend extensions written in C), which means that PHP itself implements a garbage collection mechanism. If you visit the official PHP website (php.net), you can see that the two branches of PHP5, PHP5.2 and PHP5.3, are updated separately, because many projects still run on PHP5.2 and version 5.3 is not fully compatible with 5.2. PHP5.3 makes many improvements over PHP5.2, and the garbage collection algorithm is one of the larger changes. This article discusses the garbage collection mechanisms of PHP5.2 and PHP5.3 in turn, along with the impact of this evolution on PHP programming and the issues that need attention.
Internal representation of PHP variables and associated memory objects
Garbage collection is essentially an operation on variables and their associated memory objects, so before discussing PHP's garbage collection mechanism, we first briefly introduce the internal representation of variables and their memory objects in PHP (that is, their representation in the C source code).
In the official PHP documentation, variable types are divided into two groups: scalar types and complex types. Scalar types are boolean, integer, floating point, and string; complex types are array, object, and resource. NULL is special: it does not belong to either group and forms a type of its own.
All of these types are represented internally by a single structure, zval, whose name in the PHP source code is "_zval_struct". It is defined in the file "Zend/zend.h" of the PHP source code. The following is an excerpt of the relevant code.
    typedef union _zvalue_value {
        long lval;                  /* long value */
        double dval;                /* double value */
        struct {
            char *val;
            int len;
        } str;
        HashTable *ht;              /* hash table value */
        zend_object_value obj;
    } zvalue_value;

    struct _zval_struct {
        /* Variable information */
        zvalue_value value;         /* value */
        zend_uint refcount__gc;
        zend_uchar type;            /* active type */
        zend_uchar is_ref__gc;
    };
The union "_zvalue_value" holds the value of every PHP variable. A union is used because a zval can only represent one type of value at a time. Notice that _zvalue_value has only five fields, while PHP, counting NULL, has eight data types. How does PHP represent eight types with five fields? This is a neat point in PHP's design: it reduces the number of fields by reusing them. Boolean, integer, and resource values (for a resource, only its identifier needs to be stored) all go in the lval field; dval stores floating point values; str stores strings; ht stores arrays (a PHP array is actually a hash table); and obj stores objects. NULL needs no stored value at all, because it is identified by the type field alone. In this way, five fields store eight types of values.
The type field of "_zval_struct" determines which kind of value a zval currently holds. _zval_struct is the concrete C implementation of zval, and each zval represents the memory object of a variable. Besides value and type, _zval_struct has two more fields, refcount__gc and is_ref__gc. Their suffix suggests that both are related to garbage collection, and indeed PHP's garbage collection relies on these two fields. refcount__gc records how many variables currently refer to this zval, while is_ref__gc records whether the zval is bound by reference (with &). This may sound confusing; it is tied to the copy-on-write mechanism of zvals in PHP. That topic is not the focus of this article, so we will not go into detail here; for now you only need to remember the role of the refcount__gc field.
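The way a type field selects which union member is meaningful can be sketched as a small tagged union. This is a simplified illustration under stated assumptions, not the Zend source: the toy_* names and TOY_* tags are hypothetical, and arrays, objects, and resources are left out.

```c
#include <string.h>

/* Hypothetical type tags for illustration; the real Zend constants differ. */
enum toy_type { TOY_NULL, TOY_LONG, TOY_DOUBLE, TOY_BOOL, TOY_STRING };

/* One storage slot shared by all types, in the spirit of _zvalue_value. */
typedef union {
    long lval;                                  /* integers and booleans */
    double dval;                                /* floating point numbers */
    struct { const char *val; int len; } str;   /* string pointer and length */
} toy_value;

typedef struct {
    toy_value value;
    unsigned char type;     /* tells which union member is currently valid */
} toy_zval;

toy_zval toy_make_null(void) {
    toy_zval z = { {0}, TOY_NULL };             /* NULL needs only the tag */
    return z;
}

toy_zval toy_make_long(long n) {
    toy_zval z = { {0}, TOY_LONG };
    z.value.lval = n;
    return z;
}

toy_zval toy_make_bool(int b) {
    toy_zval z = { {0}, TOY_BOOL };
    z.value.lval = b ? 1 : 0;                   /* reuses lval, like an integer */
    return z;
}

toy_zval toy_make_string(const char *s) {
    toy_zval z = { {0}, TOY_STRING };
    z.value.str.val = s;
    z.value.str.len = (int)strlen(s);
    return z;
}

/* Dispatch on the tag, never on the contents of the union. */
const char *toy_type_name(const toy_zval *z) {
    switch (z->type) {
    case TOY_NULL:   return "NULL";
    case TOY_LONG:   return "integer";
    case TOY_DOUBLE: return "double";
    case TOY_BOOL:   return "boolean";
    case TOY_STRING: return "string";
    default:         return "unknown";
    }
}
```

In this sketch an integer 1 and a boolean true carry the same bits in lval and differ only in the tag, which is the field reuse described above.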
Garbage collection algorithm in PHP5.2 -- Reference Counting
The memory reclamation algorithm used in PHP5.2 is the well-known reference counting. Its idea is intuitive and concise: each memory object gets a counter. When a memory object is created, its counter is initialized to 1 (so at that point there is always one variable referencing the object). Each time a new variable references the memory object, the counter is incremented by 1; each time a variable that references it goes away, the counter is decremented by 1. Once the counter drops to 0, the memory object is destroyed and the memory it occupied is reclaimed. In PHP, the memory object is the zval and the counter is refcount__gc.
For example, the following PHP code shows how the counter works in PHP5.2 (the counter values were obtained with xdebug):
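The counting rules above can be sketched in a few lines of C. This is a minimal model, not PHP's implementation: rc_obj, rc_new, rc_addref, rc_release, and the rc_live_objects counter are hypothetical names introduced only for illustration.

```c
#include <assert.h>
#include <stdlib.h>

/* A memory object with its own counter, standing in for a zval. */
typedef struct {
    long value;
    unsigned int refcount;
} rc_obj;

/* Number of objects currently allocated, kept only to observe the effect. */
int rc_live_objects = 0;

/* Creating an object starts its counter at 1. */
rc_obj *rc_new(long value) {
    rc_obj *o = malloc(sizeof *o);
    if (o == NULL) {
        return NULL;
    }
    o->value = value;
    o->refcount = 1;
    rc_live_objects++;
    return o;
}

/* A new variable references the object: increment the counter. */
rc_obj *rc_addref(rc_obj *o) {
    o->refcount++;
    return o;
}

/* A referencing variable goes away: decrement, and free at zero. */
void rc_release(rc_obj *o) {
    assert(o->refcount > 0);
    if (--o->refcount == 0) {
        rc_live_objects--;
        free(o);
    }
}
```

A caller would pair every rc_new or rc_addref with exactly one rc_release; the object is freed by whichever release brings the counter to 0.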
    <?php

    $val1 = 100;   // zval(val1).refcount__gc = 1
    $val2 = $val1; // zval(val1).refcount__gc = 2, zval(val2).refcount__gc = 2 (copy-on-write: val2 and val1 currently share one zval)
    $val2 = 200;   // zval(val1).refcount__gc = 1, zval(val2).refcount__gc = 1 (val2 now gets a new zval of its own)
    unset($val1);  // zval(val1).refcount__gc = 0 (the zval that $val1 referenced is no longer reachable and is reclaimed)

    ?>
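The counter changes in this example come from sharing on assignment and separating on write. A rough C model of that behavior follows, assuming a much simpler design than the Zend engine: cow_val, cow_var, and the cow_* functions are hypothetical, only integer values are handled, and reference variables (is_ref__gc) are ignored.

```c
#include <stdlib.h>

/* A shared value with a counter, standing in for a zval. */
typedef struct {
    long value;
    unsigned int refcount;
} cow_val;

/* A variable name bound to a value. */
typedef struct {
    cow_val *zv;
} cow_var;

/* $v = n; for a fresh variable: allocate a value with refcount 1. */
void cow_init(cow_var *v, long n) {
    v->zv = malloc(sizeof *v->zv);
    if (v->zv == NULL) {
        abort();
    }
    v->zv->value = n;
    v->zv->refcount = 1;
}

/* unset($v): drop one reference and free the value at zero. */
void cow_unset(cow_var *v) {
    if (v->zv != NULL && --v->zv->refcount == 0) {
        free(v->zv);
    }
    v->zv = NULL;
}

/* $dst = $src; for an unbound dst: share the value, no copy yet. */
void cow_assign(cow_var *dst, const cow_var *src) {
    dst->zv = src->zv;
    dst->zv->refcount++;
}

/* $v = n; for a bound variable: separate first if the value is shared. */
void cow_write(cow_var *v, long n) {
    if (v->zv->refcount > 1) {
        v->zv->refcount--;      /* leave the shared value to the others */
        cow_init(v, n);         /* and take a private one */
    } else {
        v->zv->value = n;       /* sole owner: overwrite in place */
    }
}
```

Calling cow_init, cow_assign, cow_write, and cow_unset in the order of the PHP statements above reproduces the 1, 2, 1, 0 sequence of the counter for the first value.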
Reference counting is simple, intuitive, and easy to implement, but it has a fatal flaw: it can easily leak memory. As many readers may already have realized, reference counting leaks memory when circular references exist. Consider the following code:
    <?php

    $a = array();
    $a[] = &$a;
    unset($a);

    ?>
This code first creates the array a and then makes the first element of a point to a itself by reference, so the refcount of a's zval becomes 2. We then destroy the variable a, and the refcount of the zval that a originally pointed to drops to 1. We can no longer operate on this zval, yet because it forms a circular self-reference its counter never reaches 0, so its memory is never reclaimed.
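The leak can be reproduced with a toy counter in C. This is a sketch under simplifying assumptions, not PHP's array implementation: cyc_node and the cyc_* functions are hypothetical, each node has a single slot in place of array elements, and a destroyed flag stands in for actually freeing memory.

```c
#include <stddef.h>

/* A container with one slot, standing in for an array zval. */
typedef struct cyc_node {
    unsigned int refcount;
    int destroyed;              /* set to 1 where real code would free memory */
    struct cyc_node *slot;      /* the single "element"; may point to a node */
} cyc_node;

/* Drop one reference; at zero, destroy the node and release its element. */
void cyc_release(cyc_node *n) {
    if (n != NULL && --n->refcount == 0) {
        n->destroyed = 1;
        cyc_release(n->slot);
    }
}

/* Plain case: $a = array(); unset($a); */
int cyc_plain_is_destroyed(void) {
    cyc_node a = { 1, 0, NULL };    /* created with refcount 1 */
    cyc_release(&a);                /* unset: 1 -> 0, node is destroyed */
    return a.destroyed;
}

/* Cyclic case: $a = array(); $a[] = &$a; unset($a); */
int cyc_self_reference_is_destroyed(void) {
    cyc_node a = { 1, 0, NULL };    /* created with refcount 1 */
    a.slot = &a;                    /* the element refers to the node itself */
    a.refcount++;                   /* refcount is now 2 */
    cyc_release(&a);                /* unset: 2 -> 1, never reaches 0 */
    return a.destroyed;
}
```

In this model cyc_plain_is_destroyed returns 1 and cyc_self_reference_is_destroyed returns 0: the only remaining reference to the cyclic node is the one it holds on itself, so no release call can ever bring the counter to 0.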