This article takes an in-depth look at how PHP's foreach statement loops over arrays. foreach is one of the most commonly used control statements in PHP for iterating over arrays.
Because it is so convenient and easy to use, it naturally hides a fairly complicated implementation behind the scenes (transparent to users).
Today, we will analyze how foreach traverses arrays (and objects).
We know that PHP is a scripting language: the PHP code a user writes is ultimately interpreted and executed by the PHP interpreter.
More specifically, all PHP code written by users is compiled into opcodes, the instructions of the Zend Engine (ZE) virtual machine. Leaving the details aside, any PHP script we write is translated into a sequence of instructions, and each instruction is then executed by the corresponding C function.
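The idea that each instruction is executed by a corresponding C function can be illustrated with a toy virtual machine. This is a minimal sketch under our own assumptions, not Zend Engine code: the instruction set (`VM_PUSH`, `VM_ADD`, `VM_PRINT`, `VM_HALT`), the `vm_state` struct and the handler names are all hypothetical. What it shows is the dispatch pattern: a table of handler functions indexed by opcode, and a loop that runs one handler per instruction.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical toy instruction set; not the Zend Engine's opcodes. */
typedef enum { VM_PUSH, VM_ADD, VM_PRINT, VM_HALT } vm_opcode;

typedef struct { vm_opcode code; int operand; } vm_instr;

#define STACK_MAX 16

typedef struct {
    int stack[STACK_MAX];
    int sp;        /* number of values on the stack */
    int running;   /* cleared by VM_HALT */
} vm_state;

/* Every opcode has its own C handler function. */
typedef void (*vm_handler)(vm_state *vm, const vm_instr *op);

static void h_push(vm_state *vm, const vm_instr *op) {
    assert(vm->sp < STACK_MAX);
    vm->stack[vm->sp++] = op->operand;
}

static void h_add(vm_state *vm, const vm_instr *op) {
    (void)op;
    assert(vm->sp >= 2);
    vm->sp--;
    vm->stack[vm->sp - 1] += vm->stack[vm->sp];  /* pop two, push their sum */
}

static void h_print(vm_state *vm, const vm_instr *op) {
    (void)op;
    assert(vm->sp >= 1);
    printf("%d\n", vm->stack[vm->sp - 1]);
}

static void h_halt(vm_state *vm, const vm_instr *op) {
    (void)op;
    vm->running = 0;
}

/* Handler table indexed by opcode (order must match vm_opcode). */
static const vm_handler handlers[] = { h_push, h_add, h_print, h_halt };

/* Executes instructions one by one until VM_HALT; returns the top of the stack (0 if empty). */
static int run(const vm_instr *ops) {
    vm_state vm = { .sp = 0, .running = 1 };
    for (int pc = 0; vm.running; pc++) {
        handlers[ops[pc].code](&vm, &ops[pc]);
    }
    return vm.sp > 0 ? vm.stack[vm.sp - 1] : 0;
}
```

For example, running PUSH 40, PUSH 2, ADD, PRINT, HALT prints 42 and `run` returns 42. The real engine's instructions (oplines) carry much more information (op1, op2, result, extended_value), but the principle of one handler per opcode is the same.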
So what is foreach translated into?
foreach($arr as $key => $val){ echo $key . '=>' . $val . "\n";}
In the lexical analysis phase, foreach is recognized as the token T_FOREACH.
In the syntax analysis phase, it is matched by the following grammar rules:
unticked_statement: // statement not bound to ticks
        // omitted
    |   T_FOREACH '(' variable T_AS
            { zend_do_foreach_begin(&$1, &$2, &$3, &$4, 1 TSRMLS_CC); }
        foreach_variable foreach_optional_arg ')'
            { zend_do_foreach_cont(&$1, &$2, &$4, &$6, &$7 TSRMLS_CC); }
        foreach_statement
            { zend_do_foreach_end(&$1, &$4 TSRMLS_CC); }
    |   T_FOREACH '(' expr_without_variable T_AS
            { zend_do_foreach_begin(&$1, &$2, &$3, &$4, 0 TSRMLS_CC); }
        variable foreach_optional_arg ')'
            { zend_check_writable_variable(&$6); zend_do_foreach_cont(&$1, &$2, &$4, &$6, &$7 TSRMLS_CC); }
        foreach_statement
            { zend_do_foreach_end(&$1, &$4 TSRMLS_CC); }
        // omitted
;
Looking closely at this rule, we can see that:
foreach($arr as $key => $val){echo $key . '=>' . $val . "\n";}
is parsed as follows:
T_FOREACH '(' variable T_AS
    { zend_do_foreach_begin('foreach', '(', $arr, 'as', 1 TSRMLS_CC); }
foreach_variable foreach_optional_arg(T_DOUBLE_ARROW foreach_variable) ')'
    { zend_do_foreach_cont('foreach', '(', 'as', $key, $val TSRMLS_CC); }
foreach_statement
    { zend_do_foreach_end('foreach', 'as'); }
Then, let's take a look at foreach_statement:
It is simply a code block. In our example it contains echo $key . '=>' . $val . "\n";, which matches the rule:
T_ECHO expr;
Clearly, the core of the foreach implementation lies in the following three functions:
- zend_do_foreach_begin
- zend_do_foreach_cont
- zend_do_foreach_end
zend_do_foreach_begin (its code is long, so it is summarized here in pseudocode; the full source is in the appendix) mainly does the following:
1. Records the current opline number (needed later for jumps)
2. Emits FE_RESET to reset the array (pointing its internal pointer at the first element)
3. Allocates a temporary variable to hold the array being iterated
4. Emits the FE_FETCH opcode that fetches the current element, taking the temporary variable from step 3 as its operand, followed by a ZEND_OP_DATA opline that will carry the key
5. Records the opline number of FE_FETCH (the start of the loop, which later jumps return to)
For zend_do_foreach_cont:
1. Checks u.EA.type of the foreach_variable to determine whether the value is taken by reference (the key can never be a reference).
2. Adjusts the FE_FETCH opline generated in zend_do_foreach_begin accordingly (ZEND_FE_FETCH_BYREF for references, ZEND_FE_FETCH_WITH_KEY when a key is used).
3. Using the opline numbers recorded in zend_do_foreach_begin, emits the assignments of the fetched value (and key) to the loop variables, then opens the loop with do_begin_loop (which sets up break/continue handling for nested loops).
Finally, zend_do_foreach_end:
1. Emits a ZEND_JMP opline that jumps back to the FE_FETCH opline recorded in zend_do_foreach_begin.
2. Sets the jump target (op2) of FE_RESET and FE_FETCH to the current opline number, i.e. the opline right after the JMP, so that the loop can be exited.
3. Closes the loop (do_end_loop).
4. Frees the temporary variable holding the iterated array.
The opcodes of foreach_statement (the loop body) are, of course, emitted between zend_do_foreach_cont and zend_do_foreach_end during parsing.
This is how foreach is compiled into opcodes.
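The division of work among the three functions comes down to a classic compiler technique, backpatching: emit jumps whose targets are not known yet, remember where they are, and fill in the targets once the end of the loop body has been emitted. The sketch below models only that idea. `toy_op_array`, `foreach_begin`, `foreach_end` and `compile_foreach_echo` are hypothetical, simplified names of our own, not Zend APIs, and the value/key assignment done by zend_do_foreach_cont is folded into a single body opcode.

```c
#include <assert.h>

/* Hypothetical toy opcodes; only the ones needed for a foreach loop. */
typedef enum { TOY_FE_RESET, TOY_FE_FETCH, TOY_ECHO, TOY_JMP, TOY_SWITCH_FREE } toy_opcode;

typedef struct {
    toy_opcode code;
    int target;            /* jump target (op number); -1 while unknown or unused */
} toy_op;

#define TOY_MAX_OPS 64

typedef struct {
    toy_op ops[TOY_MAX_OPS];
    int last;              /* number of ops emitted so far */
} toy_op_array;

/* Plays the role of get_next_op_number(): the index the next op will get. */
static int next_op_number(const toy_op_array *a) { return a->last; }

/* Appends one op and returns its index. */
static int emit(toy_op_array *a, toy_opcode code, int target) {
    assert(a->last < TOY_MAX_OPS);
    a->ops[a->last].code = code;
    a->ops[a->last].target = target;
    return a->last++;
}

/* Positions remembered between the phases (foreach_token / as_token in the real code). */
typedef struct { int reset_op; int fetch_op; } toy_foreach;

/* Like zend_do_foreach_begin: emit FE_RESET and FE_FETCH with exit targets still unknown. */
static toy_foreach foreach_begin(toy_op_array *a) {
    toy_foreach f;
    f.reset_op = emit(a, TOY_FE_RESET, -1);
    f.fetch_op = emit(a, TOY_FE_FETCH, -1);   /* the loop restarts here */
    return f;
}

/* Like zend_do_foreach_end: jump back to FE_FETCH, then backpatch both exit targets. */
static void foreach_end(toy_op_array *a, toy_foreach f) {
    emit(a, TOY_JMP, f.fetch_op);
    int exit_op = next_op_number(a);          /* first op after the JMP */
    a->ops[f.reset_op].target = exit_op;      /* empty array: skip the body */
    a->ops[f.fetch_op].target = exit_op;      /* elements exhausted: leave the loop */
    emit(a, TOY_SWITCH_FREE, -1);             /* free the iterated array */
}

/* "Compiles" foreach ($arr as $v) { echo $v; }. The body is emitted between the
   begin and end phases, just as the parser emits foreach_statement between them. */
static void compile_foreach_echo(toy_op_array *a) {
    toy_foreach f = foreach_begin(a);
    emit(a, TOY_ECHO, -1);                    /* loop body (assignment folded in) */
    foreach_end(a, f);
}
```

Compiling with `compile_foreach_echo` yields FE_RESET ->4, FE_FETCH ->4, the body, JMP ->1 and SWITCH_FREE: the same shape as the real opcode dump for the sample code, just without the key handling and the concatenations.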
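For example, for the sample code at the beginning, the generated opcodes are as follows. (The script used for this dump also assigns $arr = range(1, 100) on line 2 and uses '-' instead of '=>' as the separator.)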
filename:       /home/huixinchen/foreach.php
function name:  (null)
number of ops:  17
compiled vars:  !0 = $arr, !1 = $key, !2 = $val
line  #    op                    fetch  ext  return  operands
------------------------------------------------------------------------
   2  0    SEND_VAL                                  1
      1    SEND_VAL                                  100
      2    DO_FCALL                     2            'range'
      3    ASSIGN                                    !0, $0
   3  4    FE_RESET                          $2      !0, ->14
      5    FE_FETCH                          $3      $2, ->14
      6    ZEND_OP_DATA                      ~5
      7    ASSIGN                                    !2, $3
      8    ASSIGN                                    !1, ~5
   4  9    CONCAT                            ~7      !1, '-'
      10   CONCAT                            ~8      ~7, !2
      11   CONCAT                            ~9      ~8, '%0A'
      12   ECHO                                      ~9
   5  13   JMP                                       ->5
      14   SWITCH_FREE                               $2
   7  15   RETURN                                    1
      16*  ZEND_HANDLE_EXCEPTION
Note that the op2 operand of FE_FETCH is 14, the opline right after the JMP. In other words, once the last element has been fetched, the next FE_FETCH fails and jumps to opline 14, which ends the loop. FE_RESET carries the same target, so an empty array skips the loop body entirely.
The op1 operand of opline 13 (JMP) points to FE_FETCH: it jumps back to opline 5 unconditionally, which is what makes the loop.
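To see how these jump targets produce the loop at run time, here is a toy interpreter for the same control flow. It is a sketch, not the Zend VM: the opcodes (`VM_FE_RESET`, `VM_FE_FETCH`, ...), the `vm_iter` struct and the `VM_ACCUMULATE` body, which simply sums the values instead of concatenating and echoing them, are hypothetical. FE_RESET jumps past the loop when the array is empty, FE_FETCH jumps to the same exit once the elements are exhausted, and JMP returns unconditionally to FE_FETCH.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy opcodes mirroring the foreach part of the dump. */
typedef enum { VM_FE_RESET, VM_FE_FETCH, VM_ACCUMULATE, VM_JMP, VM_SWITCH_FREE, VM_RETURN } vm_opcode;

typedef struct { vm_opcode code; int target; } vm_op;

/* Stands in for the temporary that FE_RESET produces. */
typedef struct {
    const int *data;
    size_t len;
    size_t pos;       /* internal pointer */
    int current;      /* value fetched by the last FE_FETCH */
} vm_iter;

/* Runs ops over arr[0..len-1]; returns the sum accumulated by the loop body and
   stores the number of body executions in *iterations. */
static long run_foreach(const vm_op *ops, const int *arr, size_t len, int *iterations) {
    vm_iter it = { arr, len, 0, 0 };
    long sum = 0;
    int pc = 0;

    *iterations = 0;
    for (;;) {
        const vm_op *op = &ops[pc];
        switch (op->code) {
        case VM_FE_RESET:             /* reset the pointer; skip the loop if empty */
            it.pos = 0;
            pc = (it.len == 0) ? op->target : pc + 1;
            break;
        case VM_FE_FETCH:             /* fetch the current element, or exit when exhausted */
            if (it.pos >= it.len) {
                pc = op->target;
            } else {
                it.current = it.data[it.pos++];
                pc++;
            }
            break;
        case VM_ACCUMULATE:           /* loop body */
            sum += it.current;
            (*iterations)++;
            pc++;
            break;
        case VM_JMP:                  /* unconditional jump back to FE_FETCH */
            pc = op->target;
            break;
        case VM_SWITCH_FREE:          /* nothing to free in this toy */
            pc++;
            break;
        case VM_RETURN:
            return sum;
        }
    }
}

/* Same shape as the dump: FE_RESET and FE_FETCH exit to op 4, JMP goes back to op 1. */
static const vm_op foreach_program[] = {
    { VM_FE_RESET,     4 },  /* 0 */
    { VM_FE_FETCH,     4 },  /* 1: loop start */
    { VM_ACCUMULATE,  -1 },  /* 2: loop body */
    { VM_JMP,          1 },  /* 3: back to FE_FETCH */
    { VM_SWITCH_FREE, -1 },  /* 4: exit target */
    { VM_RETURN,      -1 },  /* 5 */
};
```

Running `foreach_program` over {1, 2, 3} executes the body three times and returns 6; over an empty array, FE_RESET jumps straight to op 4 and the body never runs.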
Appendix:
void zend_do_foreach_begin(znode *foreach_token, znode *open_brackets_token, znode *array, znode *as_token, int variable TSRMLS_DC)
{
    zend_op *opline;
    zend_bool is_variable;
    zend_bool push_container = 0;
    zend_op dummy_opline;

    if (variable) { // the array is a variable, not an anonymous array expression
        if (zend_is_function_or_method_call(array)) { // it is a function's return value
            is_variable = 0;
        } else {
            is_variable = 1;
        }
        /* record, in open_brackets_token, the opline number where fetching the array starts */
        open_brackets_token->u.opline_num = get_next_op_number(CG(active_op_array));
        zend_do_end_variable_parse(array, BP_VAR_W, 0 TSRMLS_CC); // fetch the array/object
        if (CG(active_op_array)->last > 0 &&
            CG(active_op_array)->opcodes[CG(active_op_array)->last-1].opcode == ZEND_FETCH_OBJ_W) {
            /* Only lock the container if we are fetching from a real container and not $this */
            if (CG(active_op_array)->opcodes[CG(active_op_array)->last-1].op1.op_type == IS_VAR) {
                CG(active_op_array)->opcodes[CG(active_op_array)->last-1].extended_value |= ZEND_FETCH_ADD_LOCK;
                push_container = 1;
            }
        }
    } else {
        is_variable = 0;
        open_brackets_token->u.opline_num = get_next_op_number(CG(active_op_array));
    }

    foreach_token->u.opline_num = get_next_op_number(CG(active_op_array)); // record the FE_RESET opline number

    opline = get_next_op(CG(active_op_array) TSRMLS_CC); // generate the opcode that resets the array
    opline->opcode = ZEND_FE_RESET;
    opline->result.op_type = IS_VAR;
    opline->result.u.var = get_temporary_variable(CG(active_op_array));
    opline->op1 = *array;
    SET_UNUSED(opline->op2);
    opline->extended_value = is_variable ? ZEND_FE_RESET_VARIABLE : 0;

    dummy_opline.result = opline->result;
    if (push_container) {
        dummy_opline.op1 = CG(active_op_array)->opcodes[CG(active_op_array)->last-2].op1;
    } else {
        znode tmp;

        tmp.op_type = IS_UNUSED;
        dummy_opline.op1 = tmp;
    }
    zend_stack_push(&CG(foreach_copy_stack), (void *) &dummy_opline, sizeof(zend_op));

    as_token->u.opline_num = get_next_op_number(CG(active_op_array)); // record the starting point of the loop

    opline = get_next_op(CG(active_op_array) TSRMLS_CC);
    opline->opcode = ZEND_FE_FETCH;
    opline->result.op_type = IS_VAR;
    opline->result.u.var = get_temporary_variable(CG(active_op_array));
    opline->op1 = dummy_opline.result; // the array being iterated
    opline->extended_value = 0;
    SET_UNUSED(opline->op2);

    opline = get_next_op(CG(active_op_array) TSRMLS_CC);
    opline->opcode = ZEND_OP_DATA; // auxiliary operand that carries the key; stays unused when foreach has no key
    SET_UNUSED(opline->op1);
    SET_UNUSED(opline->op2);
    SET_UNUSED(opline->result);
}

void zend_do_foreach_cont(znode *foreach_token, const znode *open_brackets_token, const znode *as_token, znode *value, znode *key TSRMLS_DC)
{
    zend_op *opline;
    znode dummy, value_node;
    zend_bool assign_by_ref = 0;

    opline = &CG(active_op_array)->opcodes[as_token->u.opline_num]; // get the FE_FETCH opline

    if (key->op_type != IS_UNUSED) {
        znode *tmp;

        // swap key and value
        tmp = key;
        key = value;
        value = tmp;

        opline->extended_value |= ZEND_FE_FETCH_WITH_KEY; // both key and value are fetched
    }

    if ((key->op_type != IS_UNUSED) && (key->u.EA.type & ZEND_PARSED_REFERENCE_VARIABLE)) {
        // the key cannot be fetched by reference
        zend_error(E_COMPILE_ERROR, "Key element cannot be a reference");
    }

    if (value->u.EA.type & ZEND_PARSED_REFERENCE_VARIABLE) { // the value is fetched by reference
        assign_by_ref = 1;
        if (!(opline-1)->extended_value) { // is the array an anonymous (temporary) array?
            zend_error(E_COMPILE_ERROR, "Cannot create references to elements of a temporary array expression");
        }
        opline->extended_value |= ZEND_FE_FETCH_BYREF; // fetch by reference
        CG(active_op_array)->opcodes[foreach_token->u.opline_num].extended_value |= ZEND_FE_RESET_REFERENCE; // reset the original array
    } else {
        zend_op *foreach_copy;
        zend_op *fetch = &CG(active_op_array)->opcodes[foreach_token->u.opline_num];
        zend_op *end = &CG(active_op_array)->opcodes[open_brackets_token->u.opline_num];

        /* Change "write context" into "read context" */
        fetch->extended_value = 0; /* reset ZEND_FE_RESET_VARIABLE */
        while (fetch != end) {
            --fetch;
            if (fetch->opcode == ZEND_FETCH_DIM_W && fetch->op2.op_type == IS_UNUSED) {
                zend_error(E_COMPILE_ERROR, "Cannot use [] for reading");
            }
            fetch->opcode -= 3; /* FETCH_W -> FETCH_R */
        }
        /* prevent double SWITCH_FREE */
        zend_stack_top(&CG(foreach_copy_stack), (void **) &foreach_copy);
        foreach_copy->op1.op_type = IS_UNUSED;
    }

    value_node = opline->result;

    if (assign_by_ref) {
        zend_do_end_variable_parse(value, BP_VAR_W, 0 TSRMLS_CC); // fetch the value (by reference)
        zend_do_assign_ref(NULL, value, &value_node TSRMLS_CC); // the value node is of type IS_VAR
    } else {
        zend_do_assign(&dummy, value, &value_node TSRMLS_CC); // assign a copy of the value
        zend_do_free(&dummy TSRMLS_CC);
    }

    if (key->op_type != IS_UNUSED) {
        znode key_node;

        opline = &CG(active_op_array)->opcodes[as_token->u.opline_num+1];
        opline->result.op_type = IS_TMP_VAR;
        opline->result.u.EA.type = 0;
        opline->result.u.opline_num = get_temporary_variable(CG(active_op_array));
        key_node = opline->result;

        zend_do_assign(&dummy, key, &key_node TSRMLS_CC);
        zend_do_free(&dummy TSRMLS_CC);
    }

    do_begin_loop(TSRMLS_C);
    INC_BPC(CG(active_op_array));
}

void zend_do_foreach_end(znode *foreach_token, znode *as_token TSRMLS_DC)
{
    zend_op *container_ptr;
    zend_op *opline = get_next_op(CG(active_op_array) TSRMLS_CC); // generate the JMP opcode

    opline->opcode = ZEND_JMP;
    opline->op1.u.opline_num = as_token->u.opline_num; // make JMP jump to the FE_FETCH opline
    SET_UNUSED(opline->op1);
    SET_UNUSED(opline->op2);

    CG(active_op_array)->opcodes[foreach_token->u.opline_num].op2.u.opline_num = get_next_op_number(CG(active_op_array)); // set the opline that jumps out of the loop
    CG(active_op_array)->opcodes[as_token->u.opline_num].op2.u.opline_num = get_next_op_number(CG(active_op_array)); // same as above

    do_end_loop(as_token->u.opline_num, 1 TSRMLS_CC); // for nested loops

    zend_stack_top(&CG(foreach_copy_stack), (void **) &container_ptr);
    generate_free_foreach_copy(container_ptr TSRMLS_CC);
    zend_stack_del_top(&CG(foreach_copy_stack));

    DEC_BPC(CG(active_op_array)); // for PHP interactive mode
}
It is also worth asking whether foreach works on values or on references.
To traverse an array in PHP you can use either for or foreach. The syntax of foreach is foreach ($arr as $k => $v): it walks the array, assigning each index to $k and each value to $v. The question is whether $v receives a copy of the value or a reference to it. Consider the following example:
$arr = array(
    array('id' => 1, 'name' => 'name1'),
    array('id' => 2, 'name' => 'name2'),
);
foreach ($arr as $obj) {
    $obj['id'] = $obj['id'];
    $obj['name'] = $obj['name'] . '-modify';
}
print_r($arr);
// Output:
// Array ( [0] => Array ( [id] => 1 [name] => name1 ) [1] => Array ( [id] => 2 [name] => name2 ) )
Modifying $obj inside the foreach loop does not affect the elements of $arr, so the assignment here is by value, not by reference. What if we do need to modify the elements of $arr? Put an & in front of the loop variable, for example:
foreach ($arr as &$obj) {
    $obj['id'] = $obj['id'];
    $obj['name'] = $obj['name'] . '-modify';
}
unset($obj); // $obj still references the last element; break the reference
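A rough C analogy may help here, and it is only an analogy: PHP 5 implements this with reference-counted, copy-on-write zvals, not with struct copies. Iterating by value is like copying each element into a local variable; iterating with & is like taking a pointer to the element. The `record` struct and the helper names below are hypothetical stand-ins for the PHP arrays in the example.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for array('id' => ..., 'name' => ...). */
typedef struct { int id; char name[32]; } record;

/* Appends "-modify", checking that the buffer is large enough. */
static void append_modify(char *name, size_t cap) {
    static const char suffix[] = "-modify";
    assert(strlen(name) + strlen(suffix) < cap);   /* room for suffix and NUL */
    strcat(name, suffix);
}

/* Like foreach ($arr as $obj): $obj is a copy, so the change is lost. */
static void modify_by_value(record *arr, size_t n) {
    for (size_t i = 0; i < n; i++) {
        record obj = arr[i];                        /* copy of the element */
        append_modify(obj.name, sizeof obj.name);   /* modifies the copy only */
    }
}

/* Like foreach ($arr as &$obj): $obj designates the element itself. */
static void modify_by_reference(record *arr, size_t n) {
    for (size_t i = 0; i < n; i++) {
        record *obj = &arr[i];                      /* pointer to the element */
        append_modify(obj->name, sizeof obj->name); /* modifies the array */
    }
}
```

After `modify_by_value` the array still holds "name1" and "name2"; after `modify_by_reference` it holds "name1-modify" and "name2-modify", matching the two PHP loops.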
Now let's look at another example, in which the array stores objects:
$arr = array(
    (object) array('id' => 1, 'name' => 'name1'),
    (object) array('id' => 2, 'name' => 'name2'),
);
foreach ($arr as $obj) {
    $obj->name = $obj->name . '-modify';
}
print_r($arr);
// Output:
// Array ( [0] => stdClass Object ( [id] => 1 [name] => name1-modify ) [1] => stdClass Object ( [id] => 2 [name] => name2-modify ) )
This time the objects in the original array have been modified, so here the assignment behaves like passing a reference rather than a value.
From the above we can conclude that if the array elements are ordinary values (scalars or arrays), foreach assigns them by value, whereas for object elements what gets copied is the object's handle (its "address"), so changes made through the loop variable affect the original objects.
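The object case differs because, since PHP 5, a variable holding an object does not contain the object itself but a handle (identifier) to it; foreach still copies the loop variable by value, but what it copies is the handle. In C terms (again only an analogy, with hypothetical names such as `object`, `new_object` and `modify_objects`), the array holds pointers, and a copied pointer still reaches the shared object:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a stdClass object with id and name properties. */
typedef struct { int id; char name[32]; } object;

/* Appends "-modify", checking that the buffer is large enough. */
static void append_modify(char *name, size_t cap) {
    static const char suffix[] = "-modify";
    assert(strlen(name) + strlen(suffix) < cap);   /* room for suffix and NUL */
    strcat(name, suffix);
}

/* Hypothetical helper: allocates an object, like (object) array(...). */
static object *new_object(int id, const char *name) {
    object *o = malloc(sizeof *o);
    if (o == NULL) {
        abort();
    }
    assert(strlen(name) < sizeof o->name);
    o->id = id;
    strcpy(o->name, name);
    return o;
}

/* Like foreach ($arr as $obj) over objects: each handle is copied by value,
   but the copy still points at the same object, so the change is visible. */
static void modify_objects(object **arr, size_t n) {
    for (size_t i = 0; i < n; i++) {
        object *obj = arr[i];                       /* copy of the handle */
        append_modify(obj->name, sizeof obj->name); /* modifies the shared object */
    }
}
```

After `modify_objects`, both objects reachable from the array carry the "-modify" suffix, just as print_r shows for the stdClass objects. Reassigning `obj` to a different object inside the loop, by contrast, would not change the array, which is why "the handle is copied by value" is more precise than "the object is passed by reference".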