This article analyzes several puzzling behaviors of foreach in PHP in detail.
Preface:
The foreach construct was introduced in PHP 4 as a simple way to traverse arrays. Compared with a traditional for loop, foreach makes it easier to get key-value pairs. Before PHP 5, foreach could only be used on arrays; since PHP 5 it can also traverse objects (see traversing objects). This article discusses only arrays. The engine source quoted below comes from PHP 5.3. PHP 7 reworked foreach so that it no longer moves the array's internal pointer, so Questions 2 and 3 give different results there.
Although foreach is simple, it can behave in unexpected ways, especially when the code involves references.
The cases below will help us understand what foreach really does.
Question 1:
The code is as follows:
$arr = array(1, 2, 3);
foreach ($arr as $k => &$v) {
    $v = $v * 2;
}
// Now $arr is array(2, 4, 6)
foreach ($arr as $k => $v) {
    echo "$k", "=>", "$v";
}
Let's start with simple code. If we run the code above, the second loop prints the pairs 0=>2, 1=>4, 2=>4.
Why not 0=>2, 1=>4, 2=>6?
In fact, we can think of foreach ($arr as $k => $v) as implicitly assigning the array's current key and current value to $k and $v at the start of each iteration. Expanded, it looks like this:
The code is as follows:
foreach ($arr as $k => $v) {
    // Two values are implicitly assigned before the user code runs
    $v = currentVal();
    $k = currentKey();
    // Then the user code runs
    ......
}
With this model in mind, let's analyze the first foreach again:
1st iteration: because $v is a reference, $v = &$arr[0], so $v = $v * 2 is equivalent to $arr[0] = $arr[0] * 2, and $arr becomes 2, 2, 3.
2nd iteration: $v = &$arr[1], and $arr becomes 2, 4, 3.
3rd iteration: $v = &$arr[2], and $arr becomes 2, 4, 6.
Then the code enters the second foreach:
1st iteration: the implicit assignment $v = $arr[0] runs. Because $v still references $arr[2], this is equivalent to $arr[2] = $arr[0], and $arr becomes 2, 4, 2.
2nd iteration: $v = $arr[1], that is, $arr[2] = $arr[1], and $arr becomes 2, 4, 4.
3rd iteration: $v = $arr[2], that is, $arr[2] = $arr[2], and $arr stays 2, 4, 4.
That completes the analysis.
How can this problem be avoided? The PHP manual gives a warning:
Warning: the reference to the last array element held by $value remains even after the foreach loop. It is recommended to destroy it with unset().
The code is as follows:
$arr = array(1, 2, 3);
foreach ($arr as $k => &$v) {
    $v = $v * 2;
}
unset($v);
foreach ($arr as $k => $v) {
    echo "$k", "=>", "$v";
}
// Prints the pairs 0=>2, 1=>4, 2=>6
This issue shows that references often come with side effects. If you do not want unintended writes to change the array's contents, unset such references as soon as you are done with them.
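The trace of Question 1 and the effect of unset() can be checked with a small C model. This is not PHP source. It assumes a PHP reference can be stood in for by a plain pointer: `$v = &$arr[i]` becomes `v = &arr[i]`, and the hidden `$v = currentVal()` of a by-value foreach becomes `*v = arr[i]` for as long as `v` still aliases an array slot. The function name `q1_model` and its `unset_between_loops` flag are hypothetical, made up for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical C model of Question 1: pointers stand in for PHP references. */
void q1_model(int arr[], size_t n, int unset_between_loops) {
    int fresh = 0;   /* a brand-new $v, as after unset($v) */
    int *v = NULL;   /* what the name $v currently refers to */

    /* foreach ($arr as $k => &$v) { $v = $v * 2; } */
    for (size_t i = 0; i < n; i++) {
        v = &arr[i];     /* implicit $v = &$arr[i] */
        *v = *v * 2;     /* user code */
    }

    if (unset_between_loops)
        v = &fresh;      /* unset($v): $v stops aliasing the last element */

    /* foreach ($arr as $k => $v) { ... } */
    for (size_t i = 0; i < n; i++) {
        assert(v != NULL);
        *v = arr[i];     /* implicit $v = $arr[i], written through the alias */
    }
}
```

With `unset_between_loops` set to 0, `{1, 2, 3}` ends as `{2, 4, 4}`, matching the trace. With 1 (the unset($v) fix), it ends as `{2, 4, 6}`.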
Question 2:
The code is as follows:
$arr = array('a', 'b', 'c');
foreach ($arr as $k => $v) {
    echo key($arr), "=>", current($arr);
}
// Prints 1=>b1=>b1=>b
This one is even stranger. According to the manual, key() and current() return the key and the value of the array's current element, respectively.
So why is key($arr) always 1 and current($arr) always 'b'?
Use vld to view the compiled opcodes:
The ASSIGN instruction on line 1 assigns array('a', 'b', 'c') to $arr.
Since $arr is a CV and array('a', 'b', 'c') is a TMP, the handler actually executed for ASSIGN is ZEND_ASSIGN_SPEC_CV_TMP_HANDLER. Note that CV (compiled variable) is a variable cache added in PHP 5.1. It uses an array to store zval ** pointers. When a cached variable is used again, the engine does not need to search the active symbol table; it reads the pointer directly from the CV array. Because array access is much faster than a hash-table lookup, this improves efficiency.
The code is as follows:
static int ZEND_FASTCALL ZEND_ASSIGN_SPEC_CV_TMP_HANDLER(ZEND_OPCODE_HANDLER_ARGS)
{
    zend_op *opline = EX(opline);
    zend_free_op free_op2;
    zval *value = _get_zval_ptr_tmp(&opline->op2, EX(Ts), &free_op2 TSRMLS_CC);
    // Create the zval ** pointer for $arr in the CV array
    zval **variable_ptr_ptr = _get_zval_ptr_ptr_cv(&opline->op1, EX(Ts), BP_VAR_W TSRMLS_CC);
    if (IS_CV == IS_VAR && !variable_ptr_ptr) {
        ......
    }
    else {
        // Assign the array value to $arr
        value = zend_assign_to_variable(variable_ptr_ptr, value, 1 TSRMLS_CC);
        if (!RETURN_VALUE_UNUSED(&opline->result)) {
            AI_SET_PTR(EX_T(opline->result.u.var).var, value);
            PZVAL_LOCK(value);
        }
    }
    ZEND_VM_NEXT_OPCODE();
}
After ASSIGN completes, a zval ** pointer to the actual array has been added to the CV array, which means $arr is now cached in the CV.
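The CV idea can be sketched in C. This is a hypothetical simplification, not Zend's data structures. The "symbol table" here is a tiny array searched by name (the real one is a hash table). `cv_fetch` caches the found location in a slot array, so later uses of the same compiled variable skip the search entirely. All type and function names are made up for the illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_VARS 8

/* A toy symbol table: names and values side by side, searched linearly. */
typedef struct {
    const char *names[MAX_VARS];
    int values[MAX_VARS];
    size_t count;
    size_t lookups;    /* number of by-name searches performed */
} symtab;

/* Find (or create) a variable by name: the slow path. */
int *symtab_find(symtab *st, const char *name) {
    st->lookups++;
    for (size_t i = 0; i < st->count; i++)
        if (strcmp(st->names[i], name) == 0)
            return &st->values[i];
    assert(st->count < MAX_VARS);
    st->names[st->count] = name;
    st->values[st->count] = 0;
    return &st->values[st->count++];
}

/* cv[index] caches where compiled variable `index` lives (NULL = not fetched yet). */
int *cv_fetch(int **cv, size_t index, symtab *st, const char *name) {
    if (cv[index] == NULL)
        cv[index] = symtab_find(st, name);   /* first use: search by name */
    return cv[index];                        /* later uses: plain array access */
}
```

After the first `cv_fetch` for "arr", `st.lookups` stays at 1 however many more times slot 0 is fetched.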
Next comes the loop itself. Let's look at the FE_RESET instruction, whose handler is ZEND_FE_RESET_SPEC_CV_HANDLER:
The code is as follows:
static int ZEND_FASTCALL ZEND_FE_RESET_SPEC_CV_HANDLER(ZEND_OPCODE_HANDLER_ARGS)
{
    ......
    if (......) {
        ......
    } else {
        // Get the pointer to the array through the CV array
        array_ptr = _get_zval_ptr_cv(&opline->op1, EX(Ts), BP_VAR_R TSRMLS_CC);
        ......
    }
    ......
    // Save the array pointer in zend_execute_data->Ts (Ts holds temp_variable entries during execution)
    AI_SET_PTR(EX_T(opline->result.u.var).var, array_ptr);
    PZVAL_LOCK(array_ptr);
    if (iter) {
        ......
    } else if ((fe_ht = HASH_OF(array_ptr)) != NULL) {
        // Reset the array's internal pointer
        zend_hash_internal_pointer_reset(fe_ht);
        if (ce) {
            ......
        }
        is_empty = zend_hash_has_more_elements(fe_ht) != SUCCESS;
        // Save the internal pointer in EX_T(opline->result.u.var).fe.fe_pos
        zend_hash_get_pointer(fe_ht, &EX_T(opline->result.u.var).fe.fe_pos);
    } else {
        ......
    }
    ......
}
FE_RESET thus stores two important pointers in zend_execute_data->Ts:
• EX_T(opline->result.u.var).var: pointer to the array
• EX_T(opline->result.u.var).fe.fe_pos: pointer to the current element inside the array
After the FE_RESET command is executed, the actual memory is as follows:
Next, let's look at FE_FETCH, whose handler is ZEND_FE_FETCH_SPEC_VAR_HANDLER:
The code is as follows:
static int ZEND_FASTCALL ZEND_FE_FETCH_SPEC_VAR_HANDLER(ZEND_OPCODE_HANDLER_ARGS)
{
    zend_op *opline = EX(opline);
    // Note: the array pointer is taken from EX_T(opline->op1.u.var).var.ptr
    zval *array = EX_T(opline->op1.u.var).var.ptr;
    ......
    switch (zend_iterator_unwrap(array, &iter TSRMLS_CC)) {
        default:
        case ZEND_ITER_INVALID:
            ......
        case ZEND_ITER_PLAIN_OBJECT: {
            ......
        }
        case ZEND_ITER_PLAIN_ARRAY:
            fe_ht = HASH_OF(array);
            // Note:
            // FE_RESET saved the position of the array's element in EX_T(opline->op1.u.var).fe.fe_pos
            // Restore that position here
            zend_hash_set_pointer(fe_ht, &EX_T(opline->op1.u.var).fe.fe_pos);
            // Get the element value
            if (zend_hash_get_current_data(fe_ht, (void **) &value) == FAILURE) {
                ZEND_VM_JMP(EX(op_array)->opcodes + opline->op2.u.opline_num);
            }
            if (use_key) {
                key_type = zend_hash_get_current_key_ex(fe_ht, &str_key, &str_key_len, &int_key, 1, NULL);
            }
            // Move the array's internal pointer to the next element
            zend_hash_move_forward(fe_ht);
            // Save the moved pointer in EX_T(opline->op1.u.var).fe.fe_pos
            zend_hash_get_pointer(fe_ht, &EX_T(opline->op1.u.var).fe.fe_pos);
            break;
        case ZEND_ITER_OBJECT:
            ......
    }
    ......
}
From the FE_FETCH implementation we can see roughly what foreach ($arr as $k => $v) does: it fetches the array element using the position saved in zend_execute_data->Ts, then moves the pointer to the next position and saves it again.
Put simply, FE_FETCH has already moved the array's internal pointer to the second element during the first iteration. So when key($arr) and current($arr) are called inside the loop body, they return 1 and 'b'.
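The interplay between FE_RESET and FE_FETCH can be modelled in a few lines of C. This is a sketch under simplifying assumptions. The array is a packed list with keys 0..n-1. The internal pointer is an index, with n standing for NULL. `fe_state` plays the role of the two `EX_T(...)` slots. All names are hypothetical.

```c
#include <assert.h>

/* Packed array with keys 0..n-1; `internal` is the internal pointer
   read by key()/current(), and n stands for "NULL / past the end". */
typedef struct {
    const char *vals[8];
    int n;
    int internal;
} fe_array;

/* What FE_RESET leaves in zend_execute_data->Ts. */
typedef struct {
    fe_array *array;   /* like EX_T(...).var: the array pointer */
    int fe_pos;        /* like EX_T(...).fe.fe_pos: the saved position */
} fe_state;

void fe_reset(fe_state *st, fe_array *a) {
    assert(a->n >= 0 && a->n <= 8);
    st->array = a;
    a->internal = 0;            /* zend_hash_internal_pointer_reset */
    st->fe_pos = a->internal;   /* zend_hash_get_pointer */
}

/* One FE_FETCH; returns 0 when the loop ends (the ZEND_VM_JMP branch). */
int fe_fetch(fe_state *st, int *key, const char **value) {
    fe_array *a = st->array;
    a->internal = st->fe_pos;   /* zend_hash_set_pointer */
    if (a->internal >= a->n)
        return 0;               /* zend_hash_get_current_data failed */
    *key = a->internal;
    *value = a->vals[a->internal];
    a->internal++;              /* zend_hash_move_forward */
    st->fe_pos = a->internal;   /* zend_hash_get_pointer */
    return 1;
}

/* key($arr) in this model; -1 stands in for PHP's NULL. */
int model_key(const fe_array *a) {
    return a->internal < a->n ? a->internal : -1;
}
```

After `fe_reset` and one `fe_fetch` on `{"a", "b", "c"}`, the fetched key is 0, but `model_key` already reports 1. That is what key($arr) sees inside the first iteration.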
But why is 1=>b printed three times?
Let's look at the SEND_REF instructions on lines 9 and 13 of the opcode listing. They push $arr onto the stack as a by-reference argument, and the DO_FCALL that follows each one calls key and current respectively. PHP is not compiled to native machine code, so it uses opcodes to simulate what a real CPU and memory would do.
Look at SEND_REF in the PHP source:
The code is as follows:
static int ZEND_FASTCALL ZEND_SEND_REF_SPEC_CV_HANDLER(ZEND_OPCODE_HANDLER_ARGS)
{
    ......
    // Get the $arr pointer from the CV
    varptr_ptr = _get_zval_ptr_ptr_cv(&opline->op1, EX(Ts), BP_VAR_W TSRMLS_CC);
    ......
    // Variable separation: here a new copy of the array is made for the key function
    SEPARATE_ZVAL_TO_MAKE_IS_REF(varptr_ptr);
    varptr = *varptr_ptr;
    Z_ADDREF_P(varptr);
    // Push onto the stack
    zend_vm_stack_push(varptr TSRMLS_CC);
    ZEND_VM_NEXT_OPCODE();
}
In the above code, SEPARATE_ZVAL_TO_MAKE_IS_REF is a macro:
The code is as follows:
#define SEPARATE_ZVAL_TO_MAKE_IS_REF(ppzv) \
    if (!PZVAL_IS_REF(*ppzv)) {            \
        SEPARATE_ZVAL(ppzv);               \
        Z_SET_ISREF_PP((ppzv));            \
    }
SEPARATE_ZVAL_TO_MAKE_IS_REF works like this: if the variable is not a reference, SEPARATE_ZVAL makes a new copy of it in memory. SEPARATE_ZVAL copies only a shared zval (refcount > 1); $arr is shared here because FE_RESET took its own reference to it (see PZVAL_LOCK in the FE_RESET code). In this example, it copies array('a', 'b', 'c'). The memory after separation is:
Note: after separation, the pointer in the CV array points to the newly copied data, while the pointer saved in zend_execute_data->Ts still reaches the old data.
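The two separation macros can be modelled as follows. This is a simplified sketch: the `zv` struct and the function names are hypothetical. It mirrors the PHP 5 behaviour described here. SEPARATE_ZVAL copies only a shared value and gives the copy refcount 1. SEPARATE_ZVAL_TO_MAKE_IS_REF skips values that are already references and flags the result as a reference.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Simplified zval: only the fields the macros look at, plus a payload. */
typedef struct {
    int data[3];   /* stand-in for the array payload */
    int refcount;
    int is_ref;
} zv;

/* SEPARATE_ZVAL: copy only when the zval is shared (refcount > 1). */
void separate_zval(zv **pp) {
    zv *orig = *pp;
    if (orig->refcount > 1) {
        zv *copy = malloc(sizeof *copy);
        assert(copy != NULL);
        memcpy(copy, orig, sizeof *copy);
        orig->refcount--;
        copy->refcount = 1;
        copy->is_ref = 0;
        *pp = copy;    /* the CV slot now points at the new copy */
    }
}

/* SEPARATE_ZVAL_TO_MAKE_IS_REF: non-references are separated, then flagged as references. */
void separate_zval_to_make_is_ref(zv **pp) {
    if (!(*pp)->is_ref) {
        separate_zval(pp);
        (*pp)->is_ref = 1;
    }
}
```

Calling `separate_zval_to_make_is_ref` on a value with refcount 2 and is_ref 0 produces a fresh copy with refcount 1 and is_ref 1, and drops the original's refcount to 1. Calling it again on the copy changes nothing.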
The remaining iterations are not described in detail. In summary:
• The foreach construct traverses the blue array at the bottom, visiting a, b, and c in turn.
• key and current use the yellow array at the top, whose internal pointer always points to b.
Now we understand why key and current always return the array's second element: no code ever acts on the copied array, so its internal pointer never moves.
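Putting the pieces of Question 2 together, here is a hypothetical C simulation of the PHP 5 behaviour traced above. It keeps one pointer for the CV slot and one for the Ts slot. It separates at the first SEND_REF and builds the string the echo statements would print. It runs SEND_REF once per iteration. In the real opcodes it runs before key and again before current, and the second run is a no-op because the array is by then a reference. The names are made up for the sketch.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical model of $arr = array('a', 'b', 'c') as a zval. */
typedef struct {
    char vals[3];   /* values at keys 0, 1, 2 */
    int internal;   /* internal pointer; 3 means past the end (NULL) */
    int refcount;
    int is_ref;
} q2_array;

/* Simulates Question 2 and writes what the echo statements print into out. */
void q2_run(char *out, size_t out_size) {
    q2_array original = {{'a', 'b', 'c'}, 0, 1, 0};  /* after ASSIGN */
    q2_array copy;
    q2_array *cv = &original;   /* the CV slot for $arr */

    /* FE_RESET (by value): keep the pointer in Ts, take a reference, reset. */
    q2_array *ts = cv;
    ts->refcount++;             /* PZVAL_LOCK: the array is now shared */
    ts->internal = 0;
    int fe_pos = 0;

    out[0] = '\0';
    for (;;) {
        /* FE_FETCH always works on the array saved in Ts. */
        ts->internal = fe_pos;
        if (ts->internal >= 3)
            break;
        ts->internal++;
        fe_pos = ts->internal;

        /* SEND_REF $arr: not a reference and shared, so it is separated;
           the copy keeps the internal position it had at that moment. */
        if (!cv->is_ref) {
            if (cv->refcount > 1) {
                cv->refcount--;
                copy = *cv;
                copy.refcount = 1;
                cv = &copy;
            }
            cv->is_ref = 1;
        }

        /* key($arr) and current($arr) read the array the CV slot points to. */
        char piece[8];
        if (cv->internal < 3)
            snprintf(piece, sizeof piece, "%d=>%c", cv->internal, cv->vals[cv->internal]);
        else
            snprintf(piece, sizeof piece, "=>");
        assert(strlen(out) + strlen(piece) < out_size);
        strcat(out, piece);
    }
}
```

`q2_run` fills the buffer with `1=>b1=>b1=>b`. FE_FETCH keeps advancing the original array, while key and current keep reading the copy frozen at position 1.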
Question 3:
The code is as follows:
$arr = array('a', 'b', 'c');
foreach ($arr as $k => &$v) {
    echo key($arr), '=>', current($arr);
} // Prints 1=>b2=>c=>
This question differs from Question 2 in only one way: foreach takes its values by reference. Viewing it with vld shows the same opcodes as Question 2, so we trace the opcode handlers step by step as we did for Question 2.
First, foreach calls FE_RESET:
The code is as follows:
static int ZEND_FASTCALL ZEND_FE_RESET_SPEC_CV_HANDLER(ZEND_OPCODE_HANDLER_ARGS)
{
    ......
    if (opline->extended_value & ZEND_FE_RESET_VARIABLE) {
        // Get the variable from the CV
        array_ptr_ptr = _get_zval_ptr_ptr_cv(&opline->op1, EX(Ts), BP_VAR_R TSRMLS_CC);
        if (array_ptr_ptr == NULL || array_ptr_ptr == &EG(uninitialized_zval_ptr)) {
            ......
        }
        else if (Z_TYPE_PP(array_ptr_ptr) == IS_OBJECT) {
            ......
        }
        else {
            // Array traversal
            if (Z_TYPE_PP(array_ptr_ptr) == IS_ARRAY) {
                SEPARATE_ZVAL_IF_NOT_REF(array_ptr_ptr);
                if (opline->extended_value & ZEND_FE_FETCH_BYREF) {
                    // Mark the zval holding the array as is_ref
                    Z_SET_ISREF_PP(array_ptr_ptr);
                }
            }
            array_ptr = *array_ptr_ptr;
            Z_ADDREF_P(array_ptr);
        }
    } else {
        ......
    }
    ......
}
We analyzed part of FE_RESET in Question 2. Note that in this example foreach takes values by reference, so FE_RESET takes a different branch than before.
At the end, FE_RESET sets the array's is_ref to true. At this point there is still only one copy of the array data in memory.
Next, let's analyze SEND_REF:
The code is as follows:
static int ZEND_FASTCALL ZEND_SEND_REF_SPEC_CV_HANDLER(ZEND_OPCODE_HANDLER_ARGS)
{
    ......
    // Get the $arr pointer from the CV
    varptr_ptr = _get_zval_ptr_ptr_cv(&opline->op1, EX(Ts), BP_VAR_W TSRMLS_CC);
    ......
    // Variable separation: the CV variable is already a reference, so no new array is copied
    SEPARATE_ZVAL_TO_MAKE_IS_REF(varptr_ptr);
    varptr = *varptr_ptr;
    Z_ADDREF_P(varptr);
    // Push onto the stack
    zend_vm_stack_push(varptr TSRMLS_CC);
    ZEND_VM_NEXT_OPCODE();
}
The SEPARATE_ZVAL_TO_MAKE_IS_REF macro only separates variables whose is_ref is false. Since is_ref = true has already been set on the array, it is not copied. In other words, memory still holds only one array.
This explains why the first two iterations print 1=>b and 2=>c: key and current now read the same internal pointer that FE_FETCH advances. In the third iteration, FE_FETCH moves the pointer forward once more:
The code is as follows:
ZEND_API int zend_hash_move_forward_ex(HashTable *ht, HashPosition *pos)
{
    HashPosition *current = pos ? pos : &ht->pInternalPointer;
    IS_CONSISTENT(ht);
    if (*current) {
        *current = (*current)->pListNext;
        return SUCCESS;
    } else
        return FAILURE;
}
Since the internal pointer already points to the last element of the array, moving it forward sets it to NULL. When key and current are then called on the array, they return NULL and false respectively, indicating failure. So echo prints nothing for them, and only "=>" appears.
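A hypothetical C simulation of Question 3 under the PHP 5 behaviour traced above makes the difference visible. No copy is ever made, so key and current read the same pointer FE_FETCH keeps advancing. The final past-the-end position (3 plays the role of NULL) makes key and current fail, and only "=>" is printed. The reference bookkeeping for $v is left out, and the names are made up.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical model of $arr = array('a', 'b', 'c') as a zval. */
typedef struct {
    char vals[3];   /* values at keys 0, 1, 2 */
    int internal;   /* internal pointer; 3 = past the end (NULL) */
    int refcount;
    int is_ref;
} q3_array;

/* Simulates Question 3 and writes what the echo statements print into out. */
void q3_run(char *out, size_t out_size) {
    q3_array arr = {{'a', 'b', 'c'}, 0, 1, 0};   /* after ASSIGN */
    q3_array *cv = &arr;

    /* FE_RESET by reference: SEPARATE_ZVAL_IF_NOT_REF only copies a shared
       value, and $arr is not shared yet (refcount 1), so nothing is copied. */
    assert(!cv->is_ref && cv->refcount == 1);
    cv->is_ref = 1;             /* Z_SET_ISREF_PP */
    cv->refcount++;             /* Z_ADDREF_P */
    q3_array *ts = cv;
    ts->internal = 0;
    int fe_pos = 0;

    out[0] = '\0';
    for (;;) {
        /* FE_FETCH: restore, stop past the end, advance, save. */
        ts->internal = fe_pos;
        if (ts->internal >= 3)
            break;
        ts->internal++;         /* after the last element this becomes "NULL" */
        fe_pos = ts->internal;

        /* SEND_REF: is_ref is already set, so no separation happens and
           key()/current() see the very array FE_FETCH just moved. */
        assert(cv == ts);

        char piece[8];
        if (cv->internal < 3)
            snprintf(piece, sizeof piece, "%d=>%c", cv->internal, cv->vals[cv->internal]);
        else
            snprintf(piece, sizeof piece, "=>");  /* key() NULL, current() false */
        assert(strlen(out) + strlen(piece) < out_size);
        strcat(out, piece);
    }
}
```

`q3_run` fills the buffer with `1=>b2=>c=>`, the output given for Question 3.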
Question 4:
The code is as follows:
$arr = array(1, 2, 3);
$tmp = $arr;
foreach ($tmp as $k => &$v) {
    $v *= 2;
}
var_dump($arr, $tmp); // What does this print?
This question has little to do with foreach itself, but since foreach is involved, let's discuss it here too :)
The code first creates an array $arr and then assigns it to $tmp. In the foreach loop that follows, modifying $v affects the array $tmp but not $arr.
Why?
Because in PHP, assignment copies the value of one variable into another, so modifying one does not affect the other.
An aside: this does not apply to objects. Since PHP 5, assigning an object copies its handle, so both variables refer to the same object, much as if it had been assigned by reference. For example:
The code is as follows:
class A {
    public $foo = 1;
}
$a1 = $a2 = new A;
$a1->foo = 100;
echo $a2->foo; // Outputs 100: $a1 and $a2 refer to the same object
Back to the code in the question: we can now confirm that $tmp = $arr is a value copy. The whole $arr array is copied to $tmp, so in theory, after the assignment executes, there are two identical arrays in memory.
Some readers may wonder: if the array is large, isn't this operation very slow?
Fortunately, PHP handles this more cleverly. In fact, after $tmp = $arr executes, there is still only one array in memory. Look at the implementation of zend_assign_to_variable in the PHP source (from PHP 5.3.26):
The code is as follows:
static inline zval* zend_assign_to_variable(zval **variable_ptr_ptr, zval *value, int is_tmp_var TSRMLS_DC)
{
    zval *variable_ptr = *variable_ptr_ptr;
    zval garbage;
    ......
    // The left-hand value is an object
    if (Z_TYPE_P(variable_ptr) == IS_OBJECT && Z_OBJ_HANDLER_P(variable_ptr, set)) {
        ......
    }
    // The left-hand value is a reference
    if (PZVAL_IS_REF(variable_ptr)) {
        ......
    } else {
        // The left-hand value had refcount__gc == 1
        if (Z_DELREF_P(variable_ptr) == 0) {
            ......
        } else {
            GC_ZVAL_CHECK_POSSIBLE_ROOT(*variable_ptr_ptr);
            // Not a temporary variable
            if (!is_tmp_var) {
                if (PZVAL_IS_REF(value) && Z_REFCOUNT_P(value) > 0) {
                    ALLOC_ZVAL(variable_ptr);
                    *variable_ptr_ptr = variable_ptr;
                    *variable_ptr = *value;
                    Z_SET_REFCOUNT_P(variable_ptr, 1);
                    zval_copy_ctor(variable_ptr);
                } else {
                    // $tmp = $arr ends up here:
                    // value points to $arr's array data; variable_ptr_ptr points to $tmp's data pointer.
                    // Only the pointer is copied; the array itself is not
                    *variable_ptr_ptr = value;
                    // Increment value's refcount__gc: in this example it goes from 1 to 2
                    Z_ADDREF_P(value);
                }
            } else {
                ......
            }
        }
        Z_UNSET_ISREF_PP(variable_ptr_ptr);
    }
    return *variable_ptr_ptr;
}
So $tmp = $arr essentially copies the array pointer and increments the array's refcount by 1. Memory at this point is shown in the chart; there is still only one array:
Since there is only one array, why does $arr stay unchanged when $tmp is modified in the foreach loop?
Let's look again at ZEND_FE_RESET_SPEC_CV_HANDLER in the PHP source, the handler for the FE_RESET opcode. Before foreach starts, this function points the array's internal pointer at its first element.
The code is as follows:
static int ZEND_FASTCALL ZEND_FE_RESET_SPEC_CV_HANDLER(ZEND_OPCODE_HANDLER_ARGS)
{
    zend_op *opline = EX(opline);
    zval *array_ptr, **array_ptr_ptr;
    HashTable *fe_ht;
    zend_object_iterator *iter = NULL;
    zend_class_entry *ce = NULL;
    zend_bool is_empty = 0;
    // FE_RESET on a variable
    if (opline->extended_value & ZEND_FE_RESET_VARIABLE) {
        array_ptr_ptr = _get_zval_ptr_ptr_cv(&opline->op1, EX(Ts), BP_VAR_R TSRMLS_CC);
        if (array_ptr_ptr == NULL || array_ptr_ptr == &EG(uninitialized_zval_ptr)) {
            ......
        }
        // foreach over an object
        else if (Z_TYPE_PP(array_ptr_ptr) == IS_OBJECT) {
            ......
        }
        else {
            // This example takes this branch
            if (Z_TYPE_PP(array_ptr_ptr) == IS_ARRAY) {
                // Note SEPARATE_ZVAL_IF_NOT_REF:
                // it copies the array here,
                // which truly separates $tmp and $arr into two arrays in memory
                SEPARATE_ZVAL_IF_NOT_REF(array_ptr_ptr);
                if (opline->extended_value & ZEND_FE_FETCH_BYREF) {
                    Z_SET_ISREF_PP(array_ptr_ptr);
                }
            }
            array_ptr = *array_ptr_ptr;
            Z_ADDREF_P(array_ptr);
        }
    } else {
        ......
    }
    // Reset the array's internal pointer
    ......
}
The code shows that variable separation happens not when the assignment executes but when the variable is later used (here, when foreach prepares to write through it). This is PHP's copy-on-write mechanism.
After FE_RESET, memory changes as follows:
This explains why foreach does not affect the original $arr. For the details of how refcount and is_ref change, read the implementations of the FE_RESET handler and ZEND_SWITCH_FREE_SPEC_VAR_HANDLER (both in php-src/Zend/zend_vm_execute.h) :)
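The copy-on-write story of Question 4 fits in a short hypothetical C sketch. Assignment only shares the array and bumps its refcount, as zend_assign_to_variable does. The by-reference FE_RESET then runs SEPARATE_ZVAL_IF_NOT_REF, which copies because the array is shared. FE_RESET's own Z_ADDREF_P and the per-element reference handling of $v are left out. The `q4_` names are made up.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Simplified zval holding a three-element int array (hypothetical). */
typedef struct {
    int data[3];
    int refcount;
    int is_ref;
} q4_zval;

/* SEPARATE_ZVAL_IF_NOT_REF: give *pp its own copy if it is shared and not a reference. */
void q4_separate_if_not_ref(q4_zval **pp) {
    q4_zval *orig = *pp;
    if (!orig->is_ref && orig->refcount > 1) {
        q4_zval *copy = malloc(sizeof *copy);
        assert(copy != NULL);
        memcpy(copy, orig, sizeof *copy);
        orig->refcount--;          /* the old array loses one user */
        copy->refcount = 1;
        copy->is_ref = 0;
        *pp = copy;
    }
}

/* Runs Question 4 against *arr and returns the zval $tmp ends up with.
   If it differs from arr, it was heap-allocated and the caller frees it. */
q4_zval *q4_run(q4_zval *arr) {
    /* $tmp = $arr; -> share the pointer and bump the refcount, no copy yet */
    q4_zval *tmp = arr;
    arr->refcount++;

    /* FE_RESET by reference: the delayed copy happens here */
    q4_separate_if_not_ref(&tmp);
    tmp->is_ref = 1;

    /* foreach ($tmp as $k => &$v) { $v *= 2; } writes only to tmp */
    for (int i = 0; i < 3; i++)
        tmp->data[i] *= 2;
    return tmp;
}
```

After `q4_run`, $arr's data is still `{1, 2, 3}` with refcount 1. The returned copy for $tmp holds `{2, 4, 6}`.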