PHP Kernel Exploration: Variables (2) - Understanding References
Main content of this article:
Introduction
Symbol table and zval
How references work
Back to the original question
I. Introduction
I wrote an article about references a long time ago, but many of the principles in it were not explained clearly. Recently, while going through the talks by Derick Rethans (home: http://derickrethans.nl/, GitHub: https://github.com/derickr), I found an article on the PHP reference mechanism, namely this PDF. It explains reference counting, passing by reference, returning by reference, and the global keyword in terms of zvals and symbol tables. If you have time, I recommend reading the original; you will get a lot out of it.
Now to today's topic.
We know that many languages provide a reference mechanism that lets us access the same content under different names (or symbols). The PHP manual defines references as follows: "References in PHP are a means to access the same variable content by different names. They are not like C pointers; instead, they are symbol table aliases." In other words, a reference implements a kind of "binding". For example, this interview question, which we run into often, is all about references:
$a = array(1, 2, 3);
foreach ($a as &$v) {
    $v *= $v;
}
foreach ($a as $v) {
    echo $v;
}
Leaving the output of this puzzle aside for now, today we follow in Derick Rethans's footsteps and unveil the mysteries of references step by step.
II. Symbol table and zval
Before we get into how references work, we need a brief description of two terms that appear repeatedly in this article: 1. symbol table, 2. zval.
1. Symbol table
A computer language is a tool for people to communicate with machines. Unfortunately, the high-level languages we rely on and take pride in cannot be executed directly by a computer, because a computer only understands some form of machine language. A high-level language must therefore be compiled (or interpreted) before the computer can understand and execute it. This involves many complex stages, such as lexical analysis, syntax analysis, semantic analysis, intermediate code generation, and optimization. During these stages the compiler may repeatedly need information about the identifiers in the source program (for example, for type checking and semantic checking during semantic analysis), and this information is stored in symbol tables. A symbol table stores the names and attributes of the identifiers in the source program; the attributes may include type, storage class, scope, storage allocation information, and other details. Many compilers implement their symbol tables as hash tables so that entries can be inserted and looked up efficiently. Put simply, a symbol table is a hash table or map that stores each symbol's name together with its attributes. For example, for the program:
$str = 'this is a test';
function foo($a, $b) {
    $tmp = 12;
    return $tmp + $a + $b;
}
function to() {
}
One possible symbol table (not the actual one) has a structure similar to this:
We do not need to focus on the exact structure of the symbol table; it is enough to know that each function, class, and namespace has its own symbol table, separate from the global symbol table. This reminds me of something. When I first started programming in PHP and read the manual page for extract(), I could not understand the sentence "Import variables into the current symbol table from an array", and I was equally puzzled by the advice from more experienced developers: "We do not recommend using extract($_POST) or extract($_GET) to extract variables." In fact, abusing extract() not only causes serious security problems but also pollutes the current symbol table (the active symbol table).
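The effect of extract() on the current symbol table can be seen in a few lines. This is only a minimal sketch: the array $request is a made-up stand-in for untrusted input such as $_POST, and the variable names $isAdmin and $name are hypothetical.

```php
<?php
// Hypothetical stand-in for untrusted input such as $_POST.
$request = ['isAdmin' => true, 'name' => 'guest'];

$isAdmin = false;                 // a variable the script relies on

extract($request);                // imports every key into the current symbol table
$afterPlainExtract = $isAdmin;    // the existing variable was overwritten
var_dump($afterPlainExtract);     // bool(true)

// EXTR_SKIP leaves variables that already exist untouched.
$isAdmin = false;
extract($request, EXTR_SKIP);
$afterSkipExtract = $isAdmin;
var_dump($afterSkipExtract);      // bool(false)
```

Passing EXTR_SKIP (or using a prefix flag) limits the damage, but reading the keys you need explicitly avoids the problem altogether.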
So what is the active symbol table?
Almost all PHP code starts executing in the global scope and is scanned and executed in order. When a function is called, execution enters the function body; when the function finishes, control returns to the caller and execution continues. This means there must be a mechanism to tell apart the symbol tables used at different stages; otherwise compilation and execution would become confused. The active symbol table is the marker for the symbol table currently in use. (At any moment there is at least the global symbol table and the active symbol table; usually the active symbol table is the global symbol table.) A symbol table is not built all at once; it is added to and updated as the compiler scans the program. When a function is called, Zend (the PHP language engine) creates a symbol table for that function and points the active symbol table at it. In other words, the symbol table used at any moment is the current active symbol table.
That is all we need to know about symbol tables. The key points are:
(1) A symbol table records the name-attribute pairs of the symbols in a program. This information is crucial for compilation and execution.
(2) A symbol table is similar to a map or hash table.
(3) A symbol table is not created in full at the start; it is added to and updated continually.
(4) The active symbol table is a pointer to the symbol table currently in use.
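From userland, the switch between symbol tables can be observed with get_defined_vars(), which returns the variables of the scope it is called in. The sketch below reuses the sample program from above; the helper variables $localNames, $strIsGlobal and $tmpIsGlobal are added only for illustration.

```php
<?php
$str = 'this is a test';

function foo($a, $b)
{
    $tmp = 12;
    // During the call the active symbol table is foo's own table:
    // it holds a, b and tmp, but not the global str.
    return array_keys(get_defined_vars());
}

$localNames = foo(1, 2);
print_r($localNames);

// Back in the calling scope, the active table is the outer one again:
// str is known here, tmp is not.
$strIsGlobal = array_key_exists('str', get_defined_vars());
$tmpIsGlobal = array_key_exists('tmp', get_defined_vars());
var_dump($strIsGlobal, $tmpIsGlobal);
```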
For more information, see:
1. http://www.scs.stanford.edu/11wi-cs140/pintos/specs/sysv-abi-update.html/ch4.symtab.html
2. http://arantxa.ii.uam.es/~Modonnel/Compilers/04_symboltablesiers
2. Zval
In the previous post (PHP Kernel Exploration: Variables (1) - zval), we covered the structure and basic principles of zval. If you are not familiar with zval, read that post first. For convenience, here is the zval structure again:
struct _zval_struct {
    zvalue_value value;       /* value */
    zend_uint refcount__gc;   /* variable ref count */
    zend_uchar type;          /* active type */
    zend_uchar is_ref__gc;    /* if it is a ref variable */
};
typedef struct _zval_struct zval;
III. References
1. Reference count
As mentioned in the previous section, a zval is the real container underlying a PHP variable. To save space, not every variable gets its own zval container. For example, for the assign-by-value operation $a = $b (assuming neither $a nor $b is a reference variable), Zend does not allocate a new zval for $a; instead it points both the a and b symbols in the symbol table at the same zval. The zval is separated only when one of the variables is changed. This is the COW (copy-on-write) mechanism, which saves memory and improves efficiency to some extent.
To implement this mechanism, the reference state of a zval must be tracked. In the zval structure, refcount__gc does the counting: it records how many variables point to the zval. In the assignment above, $a = $b increases the refcount of the zval that $b originally pointed to. This was explained in detail in the previous post (PHP Kernel Exploration: Variables (1) - zval), so I will not repeat it here.
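Copy-on-write is not visible in the values themselves, but it can be made visible indirectly through memory usage. The following is a rough sketch, assuming a large array so that the copy clearly stands out; the exact byte counts depend on the PHP version and build, so the script only compares the two costs and does not rely on specific numbers.

```php
<?php
$a = range(1, 100000);                 // a large array

$before = memory_get_usage();
$b = $a;                               // assign by value: both names share the data
$afterAssign = memory_get_usage();
$b[] = 0;                              // first write: $b is separated and gets its own copy
$afterWrite = memory_get_usage();

$costOfAssign = $afterAssign - $before;
$costOfWrite  = $afterWrite - $afterAssign;

echo "assignment: {$costOfAssign} bytes\n";
echo "first write: {$costOfWrite} bytes\n";

// The original is untouched by the write to $b.
echo count($a), ' ', count($b), "\n";
```

The assignment should cost next to nothing, while the first write pays for the whole copy.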
2. Function Parameters
During script execution, the global symbol table almost always exists. However, in addition to the global symbol table, other symbol tables are actually generated: for example, during a function call, Zend creates the internal symbol table of the function to store the information of the internal variables of the function. After the function call is completed, Zend deletes the symbol table. Next we will take a simple function call as an example to introduce the variable and zval status changes during parameter passing. The test script we use is:
function do_zval_test($s) {
    $s = "change";
    return $s;
}
$a = "before";
$b = do_zval_test($a);
Let's analyze it step by step:
(1). $a = "before";
This allocates a new zval (refcount = 1, is_ref = 0) for the variable $a, as shown below:
(2). Function call do_zval_test($a)
Because of the function call, Zend creates a separate symbol table for do_zval_test (containing the symbol s used inside the function). Since $s is a function parameter, no new zval is created for it; it points to the zval of $a instead. At this point the refcount of the zval that $a points to should be 3 (for $a, $s, and the function call stack):
a: (refcount=3, is_ref=0)='before'
As shown below:
(3). Execute $s = "change" inside the function
Because the value of $s changes, zval separation takes place and a new zval is copied specifically for $s:
(4). The function returns: return $s; $b = do_zval_test($a).
$b shares a zval with $s. The symbol table inside the function is about to be destroyed:
(5). Destroy the symbol table inside the function and return to the global environment:
Incidentally, when you inspect a zval's refcount with debug_zval_dump() or a similar function, passing the variable to the function itself increases the refcount by 1, so the actual refcount is the printed value minus 1. For example:
$src = "string";
debug_zval_dump($src);
The result is:
string(6) "string" refcount(2)
3. Assignment by reference
As before, let's go straight to the code and analyze it step by step (this example is fairly simple, but we include it for completeness):
$a = "simple test";
$b = &$a;
$c = &$a;
$b = 42;
unset($c);
unset($b);
The correspondence between the variables and the zval is shown below (note that unset only removes the variable from the symbol table and decrements the refcount of the corresponding zval):
One thing worth noting in the last step: after unset($b), the zval's is_ref goes back to 0.
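The visible effect of these steps can be checked without looking at the zval at all. A small sketch, using the same variable names as the script above:

```php
<?php
$a = "simple test";
$b = &$a;
$c = &$a;

$b = 42;                 // written through the alias $b
var_dump($a, $c);        // both names now see the integer 42

unset($c);               // removes only the name c from the symbol table
unset($b);               // removes only the name b
var_dump($a);            // the value itself survives under the name a
var_dump(isset($b), isset($c));
```

unset() never destroys the shared value as long as at least one name still points to it.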
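What happens when a script mixes assign-by-reference and assign-by-value?
Our test scripts:
(1). Assign by value first, then by reference
$a = "src";
$b = $a;
$c = &$b;
Here $c = &$b forces a separation: $b and $c end up sharing a new zval with is_ref = 1, while $a keeps the original one. For the detailed process, see:
(2). Assign by reference first, then by value
$a = "src";
$b = &$a;
$c = $a;
Here $a and $b already share a zval with is_ref = 1, so $c = $a cannot share it and $c receives its own copy. For the detailed process, see: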
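Both cases can be verified by writing through one of the aliases and checking which variables change. In this sketch the second case uses the names $x, $y and $z (not in the original scripts) so that both cases fit in one file.

```php
<?php
// (1) By value first, then by reference.
$a = "src";
$b = $a;          // $a and $b share a value until one of them is written
$c = &$b;         // $b and $c become aliases; $a stays independent
$c = "changed";
var_dump($a, $b, $c);

// (2) By reference first, then by value.
$x = "src";
$y = &$x;         // $x and $y are aliases
$z = $x;          // $z gets its own value
$y = "changed";
var_dump($x, $y, $z);
```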
4. Passing by reference
Likewise, parameters can be passed to a function by reference, so that the function can modify the value of the variable. As an example, we reuse the script from section 2 (Function parameters), but change the parameter to a reference:
function do_zval_test(&$s) {
    $s = "after";
    return $s;
}
$a = "before";
$b = do_zval_test($a);
The process is basically the same as the parameter passing described above. The difference is that passing by reference changes the value of $a. In addition, after the function call ends, the is_ref of $a goes back to 0:
Compared with ordinary passing by value, passing by reference differs as follows:
(1) In step 3, $s = "after"; does not create a new zval for $s; $s points to the same zval as $a, and this zval has is_ref = 1.
(2) Still in step 3, after $s = "after"; executes, the value of $a is changed indirectly, because the zval's is_ref is 1.
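The two ways of passing a parameter can be put side by side. This is a minimal sketch; the function names by_value and by_reference are made up for the comparison and stand for the two versions of do_zval_test.

```php
<?php
function by_value($s)
{
    $s = "after";      // changes the function's own copy
    return $s;
}

function by_reference(&$s)
{
    $s = "after";      // changes the caller's variable
    return $s;
}

$a = "before";
$b = by_value($a);
var_dump($a);          // the caller's variable is unchanged

$c = "before";
$d = by_reference($c);
var_dump($c);          // the caller's variable was modified

$d = "other";          // the return value was copied, so $c is not affected
var_dump($c);
```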
5. Returning by reference
Another feature PHP supports is returning by reference. In C/C++, returning a value from a function produces a copy of that value, whereas returning a reference does not, so returning a reference saves memory and improves efficiency to some extent. In PHP this is not quite the case. So what is returning by reference for? The Chinese edition of the PHP manual explains it in a sentence that is hard to follow. The English manual puts it this way: "Returning by reference is useful when you want to use a function to find to which variable a reference should be bound." Picking out the key points, we get the following:
(1). The reference is bound to a variable.
(2). The variable is not known in advance; it is found by a function (otherwise an ordinary reference would do).
This also shows the limitation of returning by reference: the function must return a variable, not an expression. Otherwise the following notice appears:
PHP Notice: Only variable references should be returned by reference in xxx (see the note in the PHP manual).
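The notice is easy to reproduce. The sketch below collects it with an error handler instead of printing it; the function next_id and the variable names are hypothetical and exist only to return an expression from a function declared with &.

```php
<?php
$notices = [];
set_error_handler(function ($errno, $errstr) use (&$notices) {
    $notices[] = $errstr;      // collect the message instead of printing it
    return true;
});

// Hypothetical function: declared to return by reference,
// but it returns an expression rather than a variable.
function &next_id(array $ids)
{
    return max($ids) + 1;
}

$id = &next_id([1, 2, 3]);

restore_error_handler();

print_r($notices);
var_dump($id);                 // the value is still returned
```

The call still yields a value, but there is no variable for the reference to be bound to, which is what the notice complains about.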
So how does returning by reference work? Take the following example:
function &find_node($key, &$tree) {
    $item = &$tree[$key];
    return $item;
}
$tree = array(1 => 'one', 2 => 'two', 3 => 'three');
$node =& find_node(3, $tree);
$node = 'new';
What has Zend done? Let's look at it step by step.
(1). $tree = array(1 => 'one', 2 => 'two', 3 => 'three')
As before, this adds the tree symbol to the global symbol table and creates the zval for the variable. A zval is also created for each element of the array $tree:
tree: (refcount=1, is_ref=0)=array (
    1 => (refcount=1, is_ref=0)='one',
    2 => (refcount=1, is_ref=0)='two',
    3 => (refcount=1, is_ref=0)='three'
)
As shown below:
(2). find_node(3, $tree)
Because of the function call, Zend enters the function and creates its internal symbol table. Since the argument is passed by reference, the zval's is_ref is set to 1 and its refcount rises to 3 (the global tree, the internal tree, and the function stack):
(3). $item = &$tree[$key];
Because item is a reference to $tree[$key] (here $key is 3), the is_ref and refcount of the zval that $tree[$key] points to are updated:
(4). return $item, and bind the reference:
(5). The function returns and the local symbol table is destroyed.
The zval of tree goes back to is_ref = 0 and refcount = 1. $tree[3] is now bound to the variable $node, so any change to $node indirectly changes $tree[3]:
(6). Changing the value of $node is reflected in the node of $tree: $node = 'new':
Note: to return by reference, you must use the & symbol explicitly both in the function definition and in the function call.
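The note about using & in both places can be checked directly. This sketch reuses find_node from above; the second call and the variable $copy are added for illustration to show what happens when the & is left out at the call site.

```php
<?php
function &find_node($key, &$tree)
{
    $item = &$tree[$key];
    return $item;
}

$tree = array(1 => 'one', 2 => 'two', 3 => 'three');

// With & at the call site, $node is bound to $tree[3].
$node = &find_node(3, $tree);
$node = 'new';

// Without &, the returned value is simply copied into $copy.
$copy = find_node(2, $tree);
$copy = 'ignored';

var_dump($tree[3]);    // changed through $node
var_dump($tree[2]);    // untouched by the write to $copy
```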
6. The global keyword
PHP lets us use the global keyword inside a function to refer to a global variable (without the global keyword, the name refers to a local variable of the function). For example:
$var = "outside";
function inside() {
    $var = "inside";
    echo $var;
    global $var;
    echo $var;
}
inside();
The output is insideoutside
All we know so far is that the global keyword binds a local variable to a global one. What is the actual mechanism?
We use the following script for testing:
$var = "one";
function update_var($value) {
    global $var;
    unset($var);
    global $var;
    $var = $value;
}
update_var('four');
echo $var;
The specific analysis process is as follows:
(1). $var = 'one';
As before, this adds the var symbol to the global symbol table and creates the corresponding zval:
(2). update_var('four')
Because a string literal is passed rather than a variable, a new zval is created. This zval has is_ref = 0 and refcount = 2 (for the parameter $value and the function stack), as shown below:
(3). global $var
The statement global $var actually does two things:
(1) It inserts a local var symbol into the function's internal symbol table.
(2) It creates a reference between the local $var and the global $var.
(4). unset($var);
Note that unset only removes the var symbol from the function's internal symbol table; it does not remove the global one. It also updates the refcount and the is_ref flag of the original zval (the reference is unbound):
(5). global $var
Same as step (3): a reference is created again between the local $var and the global $var:
(6). $var = $value;
This changes the value of the zval that $var points to. Because of the reference, the value of the global $var changes as well:
(7). The function returns and the local symbol table is destroyed (we are back where we started, but everything is quite different):
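The steps above can be confirmed by running the script and looking at the global afterwards. The sketch assumes the code runs at the top level of a script, so that $var really is a global; the function unset_only is an extra, hypothetical helper that shows unset() on its own leaves the global untouched.

```php
<?php
$var = "one";

function update_var($value)
{
    global $var;     // local name var bound to the global var
    unset($var);     // removes only the local name
    global $var;     // bind again
    $var = $value;   // written through to the global
}

function unset_only()
{
    global $var;
    unset($var);     // breaks the binding; the global keeps its value
}

update_var('four');
unset_only();
echo $var, "\n";
```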
Accordingly, we can summarize the processes and features of the global keyword:
(1) When global is declared inside a function, a local variable is created in the function and bound by reference to the global variable.
(2) Any change to that variable inside the function indirectly changes the value of the global variable.
(3) Calling unset on the local variable does not affect the global variable; it only breaks the binding to it.
IV. Back to the original question
Now that we have a basic understanding of references, let's go back to the original question:
$a = array(1, 2, 3);
foreach ($a as &$v) {
    $v *= $v;
}
foreach ($a as $v) {
    echo $v;
}
What happened in this process?
(1). $a = array(1, 2, 3);
This creates the zval of $a in the global symbol table, along with a zval for each element:
(2). foreach ($a as &$v) { $v *= $v; }
Because this binds by reference, it is equivalent to doing the following for the elements of the array in turn:
$v = &$a[0];
$v = &$a[1];
$v = &$a[2];
The execution process is as follows:
We find that after the foreach finishes, $v = &$a[2] still holds.
(3). The second foreach loop
foreach ($a as $v) {
    echo $v;
}
This time it is an ordinary assign-by-value, so it is similar to executing:
$v = $a[0];
$v = $a[1];
$v = $a[2];
Do not forget that $v is now a reference to $a[2], so each of these assignments indirectly changes the value of $a[2].
The process is as follows:
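Therefore, the output is 144: after the first loop the array is (1, 4, 9); the second loop sets $a[2] to 1 and then to 4, and the last iteration echoes $a[2] itself, which is now 4.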
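The result, and the usual way to avoid it, can be reproduced as follows. This is a sketch that collects the output in strings so it can be compared; the second half uses a fresh array $b and the loop variable $w (names not in the original) and breaks the leftover reference with unset() before the second loop.

```php
<?php
// The original puzzle: $v is still bound to $a[2] when the second loop starts.
$a = array(1, 2, 3);
foreach ($a as &$v) {
    $v *= $v;
}
$out = '';
foreach ($a as $v) {
    $out .= $v;
}
echo $out, "\n";

// Breaking the reference after the first loop gives the expected squares.
$b = array(1, 2, 3);
foreach ($b as &$w) {
    $w *= $w;
}
unset($w);
$fixed = '';
foreach ($b as $w) {
    $fixed .= $w;
}
echo $fixed, "\n";
```

Calling unset() on the loop variable right after a by-reference foreach is a cheap habit that prevents this class of bug.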
Appendix: the zval debugging method used in this article
If you want to see how a zval changes during some operation, the best way is to add debugging code before and after it. For example:
$a = 123;
xdebug_debug_zval('a');
$b = &$a;
xdebug_debug_zval('a');
Combined with a diagram, this gives an intuitive picture of how the zval is updated.
References:
http://en.wikipedia.org/wiki/Symbol_table
http://arantxa.ii.uam.es/~Modonnel/Compilers/04_symboltablesi.htm
http://web.cs.wpi.edu/~Kal/courses/cs4533/module5/myst.html
http://www.cs.dartmouth.edu/~Mckeeman/cs48/mxcom/doc/TypeInference.pdf
http://www.cs.cornell.edu/courses/cs412/2008sp/lectures/lec12.pdf
http://php.net/manual/zh/language.references.return.php
http://stackoverflow.com/questions/10057671/how-foreach-actually-works
This article was written in a hurry, so it inevitably contains mistakes. Discussion is welcome.