Main content of this article:
- Introduction
- Symbol table and zval
- How references work
- Back to the original question
I. Introduction
I wrote an article about references a long time ago, but many of the principles in it were not explained clearly. Recently, while reading the talks of Derick Rethans (home: http://derickrethans.nl/, GitHub: https://github.com/derickr), I found an article about PHP's reference mechanism (a PDF). It explains reference counting, passing by reference, returning by reference, and the global keyword in terms of zvals and symbol tables. I recommend reading the original when you have time; you will learn a lot.
Now to today's topic.
Many languages provide a reference mechanism, which lets us access the same content through different names (or symbols). The PHP manual defines references as follows: "References in PHP mean accessing the same variable content by different names. They are not like C pointers; instead, references are symbol table aliases." In other words, a reference implements a kind of "binding". The interview question we often meet is a good example:
```php
$a = array(1, 2, 3);
foreach ($a as &$v) {
    $v *= $v;
}
foreach ($a as $v) {
    echo $v;
}
```
Setting this question's output aside for now, let's follow in the footsteps of Derick Rethans and uncover how references work, step by step.
II. Symbol Tables and zval
Before explaining how references work, we need to briefly describe the terms that appear repeatedly in this text. The two most important are: 1. the symbol table, and 2. the zval.
1. Symbol table
A computer language is a tool for people to communicate with machines. Unfortunately, the high-level languages we depend on and take pride in cannot be executed directly by a computer, which only understands some form of machine language. High-level languages must therefore be compiled (or interpreted) before a computer can understand and execute them. This process involves lexical analysis, syntax analysis, semantic analysis, intermediate code generation, optimization and more. Along the way the compiler may need to consult information about the identifiers in the source program again and again (for example, to type-check variables and perform other checks during semantic analysis). This information is stored in symbol tables. A symbol table stores the names and attributes of identifiers in the source program, and the attributes may include the type, storage class, scope, storage allocation, and other details. Many compilers implement symbol tables as hash tables so that entries can be inserted and looked up efficiently. Put simply, a symbol table is a hash table or map from a symbol name to that symbol's attributes. For example, for the program:
```php
$str = 'this is a test';

function foo($a, $b)
{
    $tmp = 12;
    return $tmp + $a + $b;
}

function to()
{
}
```
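One possible symbol table (not the actual one) would look roughly like this. The global table contains str (a variable holding a string) and the functions foo and to. foo has its own table containing a, b and tmp, while the table for to is empty.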
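As a rough illustration of "a hash table from symbol names to attributes", here is a minimal chained hash table in C, the language of the zval struct later in this article. It is a hypothetical sketch: the attribute set (sym_kind), the bucket count, and the names symtab_put and symtab_get are assumptions for illustration, not Zend's HashTable API.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical attributes; a real compiler also records type, scope, storage, ... */
typedef enum { SYM_VARIABLE, SYM_FUNCTION } sym_kind;

typedef struct symbol {
    char name[32];
    sym_kind kind;
    struct symbol *next;            /* chain for names that land in the same bucket */
} symbol;

#define NBUCKETS 16

typedef struct {
    symbol *buckets[NBUCKETS];
} symtab;

/* djb2 string hash, reduced to a bucket index */
static unsigned long sym_hash(const char *s) {
    unsigned long h = 5381;
    while (*s) h = h * 33 + (unsigned char)*s++;
    return h % NBUCKETS;
}

/* Insert a name, or update its attributes if it is already present. */
static void symtab_put(symtab *t, const char *name, sym_kind kind) {
    unsigned long i = sym_hash(name);
    for (symbol *s = t->buckets[i]; s != NULL; s = s->next) {
        if (strcmp(s->name, name) == 0) {
            s->kind = kind;
            return;
        }
    }
    symbol *s = malloc(sizeof *s);
    assert(s != NULL);
    strncpy(s->name, name, sizeof s->name - 1);
    s->name[sizeof s->name - 1] = '\0';
    s->kind = kind;
    s->next = t->buckets[i];        /* push onto the front of the chain */
    t->buckets[i] = s;
}

/* Look a name up; returns NULL if the symbol is unknown. */
static symbol *symtab_get(const symtab *t, const char *name) {
    for (symbol *s = t->buckets[sym_hash(name)]; s != NULL; s = s->next) {
        if (strcmp(s->name, name) == 0) return s;
    }
    return NULL;
}
```

For the script above, a global table would hold str, foo and to. foo would get a second, separate table holding a, b and tmp.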
We don't need to focus on the exact structure of a symbol table. It is enough to know that every function, class, and namespace has its own independent symbol table, separate from the global symbol table. This reminds me of something from when I first started programming in PHP. I could not understand the sentence "Import variables from an array into the current symbol table" in the manual for extract(). I was also puzzled by the advice from more experienced developers not to use extract($_POST) or extract($_GET) to extract variables. In fact, abusing extract() not only causes serious security problems but also pollutes the current symbol table (the active symbol table).
So what is the active symbol table?
Almost all PHP code execution starts from the global scope and proceeds sequentially. When a function is called, execution enters the function, and when the function finishes, control returns to the caller. So there must be a mechanism that tells which symbol table to use at each stage; otherwise compilation and execution would become confused. The active symbol table marks the symbol table currently in use. There is always at least the global symbol table, and most of the time the active symbol table is the global one. Symbol tables are not built all at once; entries are continually added and updated as the compiler scans the code. When a function is called, Zend (the engine that interprets PHP) creates a symbol table for the function and points the active symbol table at it. In other words, the symbol table in use at any moment is the current active symbol table.
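The active symbol table can be pictured as a pointer that moves when a call starts and ends. The C sketch below is a hypothetical model, not Zend code. Each table is reduced to a label, and enter_function/leave_function stand in for the engine creating a function's table on call and discarding it on return.

```c
#include <assert.h>
#include <stddef.h>

/* A symbol table reduced to a label; the real thing is a hash table. */
typedef struct { const char *owner; } symtab;

#define MAX_DEPTH 64

static symtab global_table = { "global" };
static symtab *call_stack[MAX_DEPTH] = { &global_table };
static size_t depth = 1;

/* The active symbol table: a pointer to whichever table is in use now. */
static symtab *active = &global_table;

/* Function call: the callee's table goes on the stack and becomes active. */
static void enter_function(symtab *fn_table) {
    assert(depth < MAX_DEPTH);
    call_stack[depth++] = fn_table;
    active = fn_table;
}

/* Return: drop the callee's table and reactivate the caller's. */
static void leave_function(void) {
    assert(depth > 1);              /* the global table is never popped */
    depth--;
    active = call_stack[depth - 1];
}
```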
That covers symbol tables. The key points:
- A symbol table records name-attribute pairs for the symbols in a program; this information is crucial for compilation and execution.
- A symbol table is similar to a map or hash table.
- A symbol table is not built all at once; it is continually added to and updated.
- The active symbol table is a pointer to the symbol table currently in use.
For more information, see:
1. http://www.scs.stanford.edu/11wi-cs140/pintos/specs/sysv-abi-update.html/ch4.symtab.html
2. http://arantxa.ii.uam.es/~modonnel/Compilers/04_symboltablesiers
2. zval
The earlier article "Exploring the PHP Kernel: Variables (1) zval" covered the structure and basic principles of the zval. If you are not familiar with zval, read that article first. For convenience, here is the zval structure again:
```c
struct _zval_struct {
    zvalue_value value;     /* value */
    zend_uint refcount__gc; /* variable ref count */
    zend_uchar type;        /* active type */
    zend_uchar is_ref__gc;  /* if it is a ref variable */
};
typedef struct _zval_struct zval;
```
III. How References Work
1. Reference counting
As mentioned in the previous section, the zval is the real underlying container of a PHP variable, but not every variable has its own zval. Take the by-value assignment $a = $b, assuming neither $a nor $b is a reference. Zend does not allocate new space for $a. Instead, it points the symbols a and b in the symbol table at the same zval, and the zval is separated only when one of the variables is modified. This is called copy-on-write (COW), and it saves memory and improves efficiency to some extent.
To implement this mechanism, the zval has to track how it is being referenced. The refcount__gc field of the zval is the counter: it records how many variables point to this zval. In the assignment above, $a = $b increments the refcount of $b's original zval. See "Exploring the PHP Kernel: Variables (1) zval" for details.
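Copy-on-write can be sketched with a stripped-down zval. This is a hypothetical C model that assumes string values only and plain refcount/is_ref fields (the real struct uses zvalue_value and the __gc field names). assign_by_value and write_var are illustrative helpers, not Zend functions.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Simplified stand-in for the PHP 5 zval: short strings only. */
typedef struct {
    char value[32];
    unsigned refcount;  /* number of symbols pointing at this zval */
    int is_ref;         /* stays 0 here: only by-value sharing is modeled */
} zval;

static zval *zval_new(const char *s) {
    zval *z = malloc(sizeof *z);
    assert(z != NULL);
    strncpy(z->value, s, sizeof z->value - 1);
    z->value[sizeof z->value - 1] = '\0';
    z->refcount = 1;
    z->is_ref = 0;
    return z;
}

/* $dst = $src by value: share the zval instead of copying it. */
static void assign_by_value(zval **dst, zval *src) {
    *dst = src;
    src->refcount++;
}

/* $var = "s": separate first if the zval is shared (copy-on-write). */
static void write_var(zval **var, const char *s) {
    if ((*var)->refcount > 1 && !(*var)->is_ref) {
        (*var)->refcount--;         /* this symbol stops sharing */
        *var = zval_new(s);         /* and gets a private zval */
    } else {
        strncpy((*var)->value, s, sizeof (*var)->value - 1);
        (*var)->value[sizeof (*var)->value - 1] = '\0';
    }
}
```

After $a = $b, both symbols hold the same zval with refcount 2. The first write gives the writer a private zval and leaves the other zval at refcount 1.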
2. Function parameters
The global symbol table exists almost throughout script execution, but other symbol tables are created as well. For example, during a function call Zend creates the function's internal symbol table to hold the function's local variables, and deletes it when the call finishes. Let's take a simple function call as an example to walk through parameter passing and the resulting changes to variables and zvals. The test script is:
```php
function do_zval_test($s)
{
    $s = "change";
    return $s;
}

$a = "before";
$b = do_zval_test($a);
```
Let's analyze it step by step:
(1) $a = "before";
This creates a new zval for $a (refcount=1, is_ref=0).
(2) Calling do_zval_test($a)
Because of the function call, Zend creates a separate symbol table for do_zval_test, containing the symbol s. Since $s is a function parameter, Zend does not create a new zval for it; s points to $a's zval. The refcount of that zval is now 3 ($a, $s, and the function call stack):
a: (refcount=3, is_ref=0)='before'
(3) Executing $s = "change" inside the function
Because the value of $s changes, the zval is separated: s gets its own new zval (refcount=1, is_ref=0), and the refcount of $a's zval drops back to 2.
(4) The function returns: return $s; and $b = do_zval_test($a);
$b shares the zval of $s, and the function's symbol table is about to be destroyed.
(5) The function's symbol table is destroyed and execution returns to the global scope. $a's zval ('before') and $b's zval ('change') each end up with refcount=1.
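The five steps above can be replayed in the same kind of hypothetical C model. As in the PHP 5 description, the call stack counts as one holder, so the parameter's zval reaches refcount 3. zval_share, zval_release and this C version of do_zval_test are illustrative helpers, not Zend functions.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Simplified PHP 5-style zval holding a short string. */
typedef struct {
    char value[32];
    unsigned refcount;
    int is_ref;
} zval;

static zval *zval_new(const char *s) {
    zval *z = malloc(sizeof *z);
    assert(z != NULL);
    strncpy(z->value, s, sizeof z->value - 1);
    z->value[sizeof z->value - 1] = '\0';
    z->refcount = 1;
    z->is_ref = 0;
    return z;
}

/* Another holder starts pointing at z. */
static zval *zval_share(zval *z) {
    z->refcount++;
    return z;
}

/* One holder lets go; the zval is freed when nobody holds it. */
static void zval_release(zval *z) {
    if (--z->refcount == 0) free(z);
}

/* Mirrors: function do_zval_test($s) { $s = "change"; return $s; } */
static zval *do_zval_test(zval *arg) {
    zval *stack_slot = zval_share(arg);   /* argument on the call stack */
    zval *s = zval_share(arg);            /* parameter $s: no new zval */
    assert(arg->refcount == 3);           /* $a, $s and the call stack */

    /* $s = "change": the zval is shared by value, so separate it */
    zval_release(s);
    s = zval_new("change");

    zval *ret = zval_share(s);            /* $b will share $s's zval */
    zval_release(s);                      /* local symbol table destroyed */
    zval_release(stack_slot);             /* argument popped */
    return ret;
}
```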
By the way, when you inspect a zval's refcount with debug_zval_dump() or similar functions, passing the variable to the function itself adds 1 to the refcount. The real refcount is therefore the printed value minus 1. For example (PHP 5):
```php
$src = "string";
debug_zval_dump($src);
```
The result is:
string(6) "string" refcount(2)
3. References
As before, let's go straight to the code and analyze it step by step (this example is simple, but we include it for completeness):
```php
$a = "simple test";
$b = &$a;
$c = &$a;
$b = 42;
unset($c);
unset($b);
```
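The mapping between the variables and the zval evolves as follows. $b = &$a makes a and b share one zval (refcount=2, is_ref=1). $c = &$a raises its refcount to 3. $b = 42 changes the shared zval, so $a and $c also read 42. Each unset() only removes the variable from the symbol table and decrements the refcount of the corresponding zval.
One detail in the last step is worth noting: after unset($b), the refcount drops to 1 and the zval's is_ref becomes 0 again.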
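The reference walkthrough can also be checked with a small hypothetical C model. The model makes three assumptions. Values are strings, so 42 is stored as "42". assign_by_ref skips separation, because $a is never shared by value in this script. unset_var clears is_ref once a single holder remains, matching the behaviour described above.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Simplified zval: string value plus the two bookkeeping fields. */
typedef struct {
    char value[32];
    unsigned refcount;
    int is_ref;
} zval;

static zval *zval_new(const char *s) {
    zval *z = malloc(sizeof *z);
    assert(z != NULL);
    strncpy(z->value, s, sizeof z->value - 1);
    z->value[sizeof z->value - 1] = '\0';
    z->refcount = 1;
    z->is_ref = 0;
    return z;
}

/* $dst = &$src; assumes src is not shared by value (true in this script). */
static void assign_by_ref(zval **dst, zval *src) {
    src->is_ref = 1;
    src->refcount++;
    *dst = src;
}

/* Writing through a reference changes the shared zval in place. */
static void write_ref(zval *z, const char *s) {
    strncpy(z->value, s, sizeof z->value - 1);
    z->value[sizeof z->value - 1] = '\0';
}

/* unset($var): drop the symbol and one refcount; a lone survivor is no longer a reference. */
static void unset_var(zval **var) {
    zval *z = *var;
    *var = NULL;
    if (--z->refcount == 0) {
        free(z);
    } else if (z->refcount == 1) {
        z->is_ref = 0;
    }
}
```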
What happens when by-reference and by-value assignments are combined?
Our test scripts:
(1) Normal assignment first, then reference assignment
```php
$a = "src";
$b = $a;
$c = &$b;
```
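Here $b = $a makes a and b share one zval (refcount=2, is_ref=0). Because that zval is shared by value, $c = &$b must separate it first. a keeps the original zval (refcount=1, is_ref=0), while b and c share a new copy (refcount=2, is_ref=1).
(2) Reference assignment first, then normal assignment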
```php
$a = "src";
$b = &$a;
$c = $a;
```
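Now a and b share one zval (refcount=2, is_ref=1). Because that zval is a reference, $c = $a cannot simply share it, so c gets its own copy (refcount=1, is_ref=0).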
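Both orders can be expressed with two helpers in the same hypothetical C model. assign_by_value copies a zval that belongs to a reference set. assign_by_ref separates a zval that is shared by value before marking it is_ref. These helpers sketch the rules described above; they are not Zend's actual code.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    char value[32];
    unsigned refcount;
    int is_ref;
} zval;

static zval *zval_new(const char *s) {
    zval *z = malloc(sizeof *z);
    assert(z != NULL);
    strncpy(z->value, s, sizeof z->value - 1);
    z->value[sizeof z->value - 1] = '\0';
    z->refcount = 1;
    z->is_ref = 0;
    return z;
}

/* $dst = $src: share the zval, unless it belongs to a reference set. */
static void assign_by_value(zval **dst, zval *src) {
    if (src->is_ref) {
        *dst = zval_new(src->value);      /* copy: dst must not join the reference */
    } else {
        src->refcount++;
        *dst = src;
    }
}

/* $dst = &$src: separate a zval shared by value first, then mark it is_ref. */
static void assign_by_ref(zval **dst, zval **src) {
    zval *z = *src;
    if (z->refcount > 1 && !z->is_ref) {
        z->refcount--;                    /* other holders keep the old zval */
        z = zval_new(z->value);
        *src = z;                         /* $src now owns a private copy */
    }
    z->is_ref = 1;
    z->refcount++;
    *dst = z;
}
```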
4. Passing by reference
Parameters can also be passed to a function by reference, so that the function can modify the caller's variable. As an example, we reuse the script from section 2 (function parameters), but with the parameter passed by reference:
```php
function do_zval_test(&$s)
{
    $s = "after";
    return $s;
}

$a = "before";
$b = do_zval_test($a);
```
The process is essentially the same as passing by value above. The difference is that passing by reference changes the value of $a, and after the call ends, $a's zval has its is_ref restored to 0.
Compared with ordinary pass-by-value, passing by reference differs in two ways:
(1) At step 3 ($s = "after";), no new zval is created for $s. s still points to the same zval as $a, and that zval has is_ref=1.
(2) After $s = "after"; executes, the value of $a changes indirectly, because the zval's is_ref is 1.
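Here is passing by reference in the same hypothetical C model. The argument's zval is marked is_ref before the call, so the write inside the function happens in place. The sketch assumes that returning a reference from a function that returns by value hands back a copy, so $b gets its own zval. This is how PHP 5 is understood to behave here.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    char value[32];
    unsigned refcount;
    int is_ref;
} zval;

static zval *zval_new(const char *s) {
    zval *z = malloc(sizeof *z);
    assert(z != NULL);
    strncpy(z->value, s, sizeof z->value - 1);
    z->value[sizeof z->value - 1] = '\0';
    z->refcount = 1;
    z->is_ref = 0;
    return z;
}

static zval *zval_share(zval *z) {
    z->refcount++;
    return z;
}

/* Release one holder; a zval left with a single holder loses is_ref. */
static void zval_release(zval *z) {
    if (--z->refcount == 0) free(z);
    else if (z->refcount == 1) z->is_ref = 0;
}

/* Mirrors: function do_zval_test(&$s) { $s = "after"; return $s; }
   Assumes arg is not shared by value before the call. */
static zval *do_zval_test_by_ref(zval *arg) {
    arg->is_ref = 1;                      /* by-reference argument */
    zval *stack_slot = zval_share(arg);   /* call stack */
    zval *s = zval_share(arg);            /* $s: the same zval as $a */
    assert(arg->refcount == 3 && arg->is_ref == 1);

    /* $s = "after": is_ref == 1, so no separation; $a sees the change */
    strncpy(s->value, "after", sizeof s->value - 1);
    s->value[sizeof s->value - 1] = '\0';

    zval *ret = zval_new(s->value);       /* by-value return of a reference: copy */
    zval_release(s);                      /* local symbol table destroyed */
    zval_release(stack_slot);             /* argument popped */
    return ret;
}
```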
5. Returning by reference
PHP also supports returning by reference. In C/C++, returning a value from a function actually produces a copy of it, whereas returning a reference produces no copy. This can save memory and improve efficiency to some extent. In PHP this is not quite the case. So what is returning by reference? The Chinese edition of the PHP manual describes it in a sentence that is hard to follow; the English manual says: "Returning by reference is useful when you want to use a function to find to which variable a reference should be bound." Extracting the key points, we get:
(1) Returning by reference binds a reference to a variable.
(2) That variable is not fixed in advance but is found by the function (otherwise an ordinary reference would do).
This also shows the limitation of returning by reference: the function must return a variable, not an expression. Otherwise you get a notice like:
PHP Notice: Only variable references should be returned by reference in xxx (see the note in the PHP manual).
So how does returning by reference work? Take the following example:
```php
function &find_node($key, &$tree)
{
    $item = &$tree[$key];
    return $item;
}

$tree = array(1 => 'one', 2 => 'two', 3 => 'three');
$node =& find_node(3, $tree);
$node = 'new';
```
What does Zend do? Let's go through it step by step.
(1) $tree = array(1 => 'one', 2 => 'two', 3 => 'three')
As before, this adds the symbol tree to the global symbol table and creates the variable's zval. A zval is also created for each element of $tree:
```
tree: (refcount=1, is_ref=0)=array (
    1 => (refcount=1, is_ref=0)='one',
    2 => (refcount=1, is_ref=0)='two',
    3 => (refcount=1, is_ref=0)='three'
)
```
(2) find_node(3, $tree)
Because the function is called, Zend enters it and creates its internal symbol table. Since the parameter is passed by reference, the is_ref of the $tree zval is set to 1 and its refcount rises to 3 (the global tree, the local tree, and the function stack).
(3) $item = &$tree[$key];
Because item becomes a reference to $tree[$key] (here $key is 3), the is_ref and refcount of the zval that $tree[$key] points to are updated (is_ref=1, refcount=2).
(4) return $item returns a reference, which is bound to $node.
(5) The function returns and its local symbol table is destroyed.
The zval of tree gets is_ref=0 again and refcount=1. $tree[3] is now bound to the variable $node, so any change to $node indirectly changes $tree[3].
(6) Changing $node ($node = 'new') is reflected in the $tree node.
Note: to return by reference, you must explicitly use the & symbol both in the function definition and at the call site ($node =& find_node(3, $tree)).
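Returning by reference has a loose C analogy: a function that returns the address of a slot it has located. This is only an analogy, since PHP references are symbol table aliases rather than pointers. find_node here is a hypothetical C counterpart of the PHP function above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* C analogy of: function &find_node($key, &$tree) { $item = &$tree[$key]; return $item; }
   It returns the address of the slot rather than a PHP reference. */
static const char **find_node(int key, const char **tree) {
    const char **item = &tree[key];   /* like $item = &$tree[$key] */
    return item;                      /* the caller decides whether to bind or copy */
}
```

Dropping the & at the PHP call site corresponds to dereferencing the returned pointer immediately. The caller then gets a copy of the value and no binding.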
6. The global keyword
PHP lets us use the global keyword inside a function to refer to a global variable (without global, the name refers to a local variable of the function). For example:
```php
$var = "outside";

function inside()
{
    $var = "inside";
    echo $var;
    global $var;
    echo $var;
}

inside();
```
The output is insideoutside.
We know that the global keyword creates a reference between a local variable and the global variable, but what exactly is the mechanism?
Let's test it with the following script:
```php
$var = "one";

function update_var($value)
{
    global $var;
    unset($var);
    global $var;
    $var = $value;
}

update_var('four');
echo $var;
```
The analysis goes as follows:
(1) $var = 'one';
As before, this adds the symbol var to the global symbol table and creates the corresponding zval (refcount=1, is_ref=0).
(2) update_var('four')
Because the argument is a string literal rather than a variable, a new zval is created with is_ref=0 and refcount=2 (the $value parameter and the function stack).
(3) global $var
The global $var statement actually does two things:
(1) It inserts a local symbol var into the function's symbol table.
(2) It creates a reference between the local $var and the global $var, so the global zval now has is_ref=1 and refcount=2.
(4) unset($var);
Note that unset only deletes the var symbol from the function's internal symbol table, not the global one. It also updates the refcount and the is_ref flag of the original zval (the reference is broken), so the global zval is back to refcount=1, is_ref=0.
(5) global $var
Same as step 3: the reference between the local $var and the global $var is created again.
(6) $var = $value;
The value of the zval that $var points to changes; because of the reference, the global $var changes too.
(7) The function returns and its local symbol table is destroyed (we are back where we started, but everything is quite different).
From this we can summarize how the global keyword works:
- Declaring global inside a function creates a reference between a local variable and the global variable.
- Any change to that variable inside the function indirectly changes the global variable.
- Unsetting the local variable inside the function does not affect the global one; it only breaks the link to the global variable.
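The PHP manual describes global $var as the same as $var =& $GLOBALS["var"]. The hypothetical C model below follows that reading. bind_global is a by-reference assignment from the global slot, and unset_var only drops the local symbol. The $value parameter is not modeled.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    char value[32];
    unsigned refcount;
    int is_ref;
} zval;

static zval *zval_new(const char *s) {
    zval *z = malloc(sizeof *z);
    assert(z != NULL);
    strncpy(z->value, s, sizeof z->value - 1);
    z->value[sizeof z->value - 1] = '\0';
    z->refcount = 1;
    z->is_ref = 0;
    return z;
}

/* global $var, read as $var = &$GLOBALS['var']: bind the local slot by reference. */
static void bind_global(zval **local, zval **global_slot) {
    zval *z = *global_slot;
    if (z->refcount > 1 && !z->is_ref) {  /* shared by value: separate first */
        z->refcount--;
        z = zval_new(z->value);
        *global_slot = z;
    }
    z->is_ref = 1;
    z->refcount++;
    *local = z;
}

/* unset($var) inside the function: only the local symbol goes away. */
static void unset_var(zval **var) {
    zval *z = *var;
    *var = NULL;
    if (--z->refcount == 0) free(z);
    else if (z->refcount == 1) z->is_ref = 0;
}

/* $var = ...: write through the reference, so the global sees it too. */
static void write_ref(zval *z, const char *s) {
    strncpy(z->value, s, sizeof z->value - 1);
    z->value[sizeof z->value - 1] = '\0';
}
```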
IV. Back to the Original Question
Now that we have a basic understanding of references, let's return to the original question:
```php
$a = array(1, 2, 3);
foreach ($a as &$v) {
    $v *= $v;
}
foreach ($a as $v) {
    echo $v;
}
```
What happens during this process?
(1) $a = array(1, 2, 3);
This creates the zval of $a in the global symbol table, along with a zval for each element.
(2) foreach ($a as &$v) { $v *= $v; }
Because this binds by reference, it is equivalent to executing the following for the elements of the array, each binding followed by $v *= $v:
```php
$v = &$a[0];
$v = &$a[1];
$v = &$a[2];
```
After each binding, $v *= $v squares the element in place, so $a becomes array(1, 4, 9).
After the foreach finishes, $v is still a reference to $a[2], as if $v = &$a[2] were still in effect.
(3) The second foreach loop
```php
foreach ($a as $v) {
    echo $v;
}
```
This time the loop uses ordinary by-value assignment, so it is similar to executing:
```php
$v = $a[0];
$v = $a[1];
$v = $a[2];
```
Don't forget that $v is still a reference to $a[2], so each assignment indirectly changes the value of $a[2]:
$v = $a[0] sets $a[2] to 1 and echoes 1; $v = $a[1] sets $a[2] to 4 and echoes 4; $v = $a[2] assigns $a[2] (now 4) to itself and echoes 4.
Therefore, the output is 144.
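The two loops can be replayed in C with a pointer playing the role of $v. This is an analogy: it assumes that binding $v by reference behaves like keeping the address of the last element. replay_foreach is a hypothetical helper that records the digits PHP would echo.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Replays both loops; the pointer v plays the role of the PHP variable $v.
   The digits PHP would echo are written into out. */
static void replay_foreach(int a[], int n, char *out, size_t out_size) {
    int *v = NULL;

    /* foreach ($a as &$v) { $v *= $v; }  ==  $v = &$a[i]; $v *= $v; */
    for (int i = 0; i < n; i++) {
        v = &a[i];
        *v *= *v;
    }
    /* the loop is over, but v still points at a[n - 1] */

    /* foreach ($a as $v) { echo $v; }  ==  $v = $a[i]; echo $v; */
    size_t len = 0;
    out[0] = '\0';
    for (int i = 0; i < n; i++) {
        *v = a[i];                  /* assigning to $v writes into a[n - 1] */
        int written = snprintf(out + len, out_size - len, "%d", *v);
        assert(written > 0 && (size_t)written < out_size - len);
        len += (size_t)written;
    }
}
```

The usual fix in PHP is to call unset($v) after the first loop, as the PHP manual recommends for foreach by reference, so that the second loop no longer writes into $a[2].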
Appendix: how the zvals in this article were inspected
To observe how zvals change during a process, the best approach is to add debugging code before and after it. For example (xdebug_debug_zval() is provided by the Xdebug extension):
```php
$a = 123;
xdebug_debug_zval('a');
$b = &$a;
xdebug_debug_zval('a');
```
Combined with diagrams, this gives an intuitive picture of how the zvals are updated.
References:
- http://en.wikipedia.org/wiki/Symbol_table
- http://arantxa.ii.uam.es/~modonnel/Compilers/04_symboltablesiers
- http://web.cs.wpi.edu/~kal/courses/cs4533/module5/myst.html
- http://www.cs.dartmouth.edu/~mckeeman/cs48/mxcom/doc/typeinferenceeman
- http://www.cs.cornell.edu/courses/cs412/2008sp/lectures/lec12.pdf
- http://php.net/manual/zh/language.references.return.php
- http://stackoverflow.com/questions/10057671/how-foreach-actually-works
This article was written in a hurry, so it inevitably contains errors; discussion is welcome.