References and reference counting of PHP variables inside the engine
Inside the engine, a PHP variable is stored in a "zval" structure, which holds the type and value of the variable. Both were introduced in the previous article on the internal storage of variables. The structure has two further fields. The first is "is_ref" (named is_ref__gc in version 5.3.2), a Boolean flag that marks whether a variable is a reference. Through this field the engine distinguishes ordinary variables from reference variables. In PHP code, a reference variable is created with the & operator, and the is_ref field of its zval is 1.

The second field is "refcount" (named refcount__gc in version 5.3.2), a counter that records how many variable names point to this zval container. When it reaches 0, no variable points to the zval any longer and the zval can be released, which is an internal memory optimization of the engine. Consider a script that first executes $a = "Hello world" and then $b = $a.
The code has two variables, $a and $b. $a is assigned to $b by normal assignment, so $b has the same value as $a, and later modifications to $b do not affect $a. If $a and $b corresponded to two different zvals here, memory would obviously be wasted, and the PHP developers do not let that happen. Instead, $a and $b point to the same zval. Its type is STRING and its value is "Hello world". Two variables point to it, so its refcount is 2, and because this is a normal value assignment, its is_ref field is 0. This saves memory.
After $a = "Hello world" is executed, the zval information for $a is: a: (refcount=1, is_ref=0)="Hello world"
After $b = $a is executed, the zval information for $a is: a: (refcount=2, is_ref=0)="Hello world"
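The refcount and is_ref values above are internal engine state. As a rough way to look at them from a script, the built-in debug_zval_dump() can be used. This is only a minimal sketch: the figure it prints depends on the PHP version and on how the value was created, and it may differ from the values listed above, so no expected refcount is given here.

```php
<?php
$a = "Hello world";
$b = $a;             // normal assignment

var_dump($a === $b); // bool(true)

// Prints the value together with a refcount figure.
// The figure is version dependent and is not asserted here.
debug_zval_dump($a);
```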
Now modify the previous code so that the second statement is $b = &$a. In this way, $a is assigned to $b by reference.
After $a = "Hello world" is executed, the zval information for $a is: a: (refcount=1, is_ref=0)="Hello world"
But after $b = &$a is executed, the zval information for $a is: a: (refcount=2, is_ref=1)="Hello world"
The is_ref field is now set to 1, so the zval shared by $a and $b is a reference. This gives a basic picture of references and reference counting inside the engine. The next section describes variable separation.
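The visible effect of is_ref = 1 is that both names address the same container. A minimal sketch that checks only the behavior observable from a script, not the internal fields:

```php
<?php
$a = "Hello world";
$b = &$a;            // reference assignment: $b is a second name for $a's container

$a = "changed via a";
var_dump($b);        // string(13) "changed via a"
```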
Variable separation (copy-on-write)
Return to the first piece of code, where $a is assigned to $b in the ordinary way and both variables internally point to the same zval. If we now change the value of $b to "new string", the value of $a is still "Hello world".
$a and $b clearly pointed to the same zval, so why does $a remain unchanged? This is the copy-on-write technique. Put simply, when a new value is assigned to $b, $b is separated from the previous zval. After the separation, $a and $b point to different zvals.
A well-known application of copy-on-write is in the Unix operating system kernel. When a process calls fork to create a child process, parent and child have the same address space contents. In earlier systems, the child copied the entire address space of the parent during fork, which could be very expensive for large programs. Worse, many programs call exec in the child immediately after fork to run another program, so a great deal of time was spent copying an address space that was replaced before it was ever touched. This was an obvious waste of resources, so later systems adopted copy-on-write. After fork, the address space of the child simply points to that of the parent. Only when the child needs to write to the address space is the affected part separated (usually in units of memory pages). It then does not matter if the child calls exec right away, because nothing has to be copied from the address space of the parent, which saves memory and improves speed.
After $b is separated from the zval that $a points to, the refcount of that zval is reduced by 1, from 2 to 1, indicating that one variable, $a, still points to it. $b now points to a new zval whose refcount is 1 and whose value is the string "new string". The process is roughly as follows:
$a = "Hello world";  // a: (refcount=1, is_ref=0)="Hello world"
$b = $a;             // a,b: (refcount=2, is_ref=0)="Hello world"
$b = "new string";   // a: (refcount=1, is_ref=0)="Hello world"
                     // b: (refcount=1, is_ref=0)="new string" (separation occurs)
This separation logic can be stated as follows: when an ordinary value assignment is performed on an ordinary variable a (is_ref = 0) and the refcount of the zval that a points to is greater than 1, a new zval must be allocated for a, and the refcount of the previous zval is reduced by 1.
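The delayed copy can also be observed indirectly through memory usage. The sketch below uses a large array instead of a string, on the assumption that arrays are shared in the same way. The byte counts it prints vary by PHP version and platform, so only the element counts are stated as expected results.

```php
<?php
// Copy-on-write observed from a script with a large array.
$a = range(1, 100000);

$before = memory_get_usage();
$b = $a;                          // normal assignment: no copy is needed yet
$afterAssign = memory_get_usage();

$b[] = 0;                         // first write to $b: $b is separated from $a
$afterWrite = memory_get_usage();

printf("assignment: %d bytes, first write: %d bytes\n",
    $afterAssign - $before,
    $afterWrite - $afterAssign);

var_dump(count($a)); // int(100000), $a is untouched
var_dump(count($b)); // int(100001)
```

If the engine shares the container as described, the first figure should be small compared with the second, because the copy is paid for only at the write.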
That covers normal value assignment. For reference assignment, the sequence of changes is:
$a = "Hello world";  // a: (refcount=1, is_ref=0)="Hello world"
$b = &$a;            // a,b: (refcount=2, is_ref=1)="Hello world"
$b = "new string";   // a,b: (refcount=2, is_ref=1)="new string"
As you can see, assigning a value to a zval that is a reference does not cause a separation. With references, a separation may instead occur when the reference variable is created, so the timing differs:
1. With normal assignment, the separation takes place at $b = "new string", that is, only when a new value is assigned to the variable.
2. With reference assignment, the separation may take place at $b = &$a, that is, when the reference variable is created.
Case 1 needs little further explanation. For case 2, note that a separation only may occur: whether it does depends on the refcount of the zval that $a currently points to. In the previous example, the refcount of that zval is 1 when $b = &$a executes, so no separation is needed. If the refcount were 2, a zval would have to be separated. For example, suppose $c = $a is executed before $b = &$a.
When the reference assignment is performed, the zval that $a points to has refcount = 2, because both $a and $c point to it, so $b = &$a requires a separation. The separation produces a new zval with is_ref = 1 and refcount = 2, since the two variables $a and $b point to it. The refcount of the original zval is reduced by 1, so only $c points to zval1, with value "Hello world" and is_ref = 0, while $a and $b point to zval2, with value "Hello world" and is_ref = 1. Operations on $c then act on zval1 and operations on $a and $b act on zval2, which matches the semantics of references.
This process is roughly as follows:
$a = "Hello world";  // a: (refcount=1, is_ref=0)="Hello world"
$c = $a;             // a,c: (refcount=2, is_ref=0)="Hello world"
$b = &$a;            // c: (refcount=1, is_ref=0)="Hello world"
                     // a,b: (refcount=2, is_ref=1)="Hello world" (separation occurs)
$b = "new string";   // c: (refcount=1, is_ref=0)="Hello world"
                     // a,b: (refcount=2, is_ref=1)="new string"
What would happen without this separation? $a, $b, and $c would all point to the same zval, and a modification to $b would also affect $c, which clearly contradicts the semantics of the PHP language.
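The guarantee that the separation protects can be checked from a script. A minimal sketch that observes only the resulting values, not the internal zvals:

```php
<?php
$a = "Hello world";
$c = $a;             // $c takes the value by normal assignment
$b = &$a;            // $a and $b become a reference pair
$b = "new string";   // written through the reference

var_dump($a);        // string(10) "new string"
var_dump($c);        // string(11) "Hello world"
```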
This separation logic can be stated as follows: when a reference to an ordinary variable a (is_ref = 0) is assigned to another variable b and the refcount of a's zval is greater than 1, a separation is performed on a. After the separation, the new zval has is_ref equal to 1 and refcount equal to 2.
With this knowledge and the separation logic above, other situations are easy to analyze. For example, if a reference to a variable a that is already a reference (is_ref = 1) is assigned to variable b, the refcount of the zval that b previously pointed to is reduced by 1, b is then pointed at the zval of a, and the refcount of that zval is increased by 1, without any separation.
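To make the rules concrete, here is a toy model written in PHP itself. It is only a sketch: Zval, SymbolTable, and their method names are hypothetical illustration classes, not the engine's C structures, and the handling of by-value assignment across a reference is an assumption of the model rather than something described above.

```php
<?php
// Toy model of the rules in this section. Zval and SymbolTable are
// hypothetical illustration classes, not the engine's C structures.
final class Zval
{
    public $value;
    public $refcount = 1;
    public $isRef = false;

    public function __construct($value)
    {
        $this->value = $value;
    }
}

final class SymbolTable
{
    /** @var Zval[] variable name => container */
    public $vars = [];

    /** Models `$name = <new value>;` */
    public function assign(string $name, $value): void
    {
        $z = $this->vars[$name] ?? null;
        if ($z !== null && ($z->isRef || $z->refcount === 1)) {
            $z->value = $value;            // sole owner or reference: write in place
            return;
        }
        if ($z !== null) {
            $z->refcount--;                // separation: leave the shared zval
        }
        $this->vars[$name] = new Zval($value);
    }

    /** Models `$dst = $src;` (assumes $src exists) */
    public function copy(string $dst, string $src): void
    {
        $from = $this->vars[$src];
        $to = $this->vars[$dst] ?? null;
        if ($from === $to) {
            return;                        // already the same container
        }
        if ($from->isRef || ($to !== null && $to->isRef)) {
            // Assumption of this model: values cross a reference by copy.
            $this->assign($dst, $from->value);
            return;
        }
        $this->remove($dst);
        $from->refcount++;                 // share the container
        $this->vars[$dst] = $from;
    }

    /** Models `$dst = &$src;` (assumes $src exists) */
    public function bind(string $dst, string $src): void
    {
        $z = $this->vars[$src];
        if (!$z->isRef && $z->refcount > 1) {
            $z->refcount--;                // separation at reference creation
            $z = $this->vars[$src] = new Zval($z->value);
        }
        $z->isRef = true;
        if (($this->vars[$dst] ?? null) !== $z) {
            $this->remove($dst);
            $z->refcount++;
            $this->vars[$dst] = $z;
        }
    }

    /** Models `unset($name);` */
    public function remove(string $name): void
    {
        if (isset($this->vars[$name])) {
            $this->vars[$name]->refcount--; // at 0 nothing points to the zval
            unset($this->vars[$name]);
        }
    }

    /** Formats one variable in the notation used in this article. */
    public function describe(string $name): string
    {
        $z = $this->vars[$name];
        return sprintf('%s: (refcount=%d, is_ref=%d)="%s"',
            $name, $z->refcount, (int) $z->isRef, $z->value);
    }
}

$t = new SymbolTable();
$t->assign('a', 'Hello world');
$t->copy('c', 'a');
$t->bind('b', 'a');
$t->assign('b', 'new string');

echo $t->describe('c'), "\n"; // c: (refcount=1, is_ref=0)="Hello world"
echo $t->describe('a'), "\n"; // a: (refcount=2, is_ref=1)="new string"
```

The four calls at the end replay the three-variable example, and the model ends in the same state as the annotated listing for it.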
Combining these theories with actual code makes it easier for you to understand this process.
What unset does
unset() is not a function but a language construct. The difference is visible in the compiled opcodes: unset does not compile to a function-call opcode. So what does unset do? The answer is in the handler of the opcode that unset compiles to. Its main operation is to delete the symbols named in its arguments from the current symbol table. For example, executing unset($a) in global code deletes the symbol a from the global symbol table. The global symbol table is a hash table, and when it is created it is given a destructor for its entries. When a is deleted from the symbol table, this destructor is called on the entry stored under a (here a zval pointer). The destructor reduces the refcount of the zval corresponding to a by 1 and, if the refcount reaches 0, releases the zval. Therefore, calling unset does not necessarily release the memory occupied by the variable. The zval is released only when no other variable points to it. Otherwise, only the refcount is decremented.
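The effect is easy to check from a script: unset removes a name, not necessarily the value behind it. A minimal sketch that again observes only script-level behavior:

```php
<?php
$a = "Hello world";
$b = $a;             // $a and $b may share one container internally

unset($a);           // removes the name "a" from the current symbol table

var_dump(isset($a)); // bool(false)
var_dump($b);        // string(11) "Hello world"
```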