PHP variable implementation (PHP source code for PHP developers, Part 3)
Article from: http://www.aintnot.com/2016/02/12/phps-source-code-for-php-developers-part3-variables-ch
Original article: http://blog.ircmaxell.com/2012/03/phps-source-code-for-php-developers_21.html
In the third article in the "PHP source code for PHP developers" series, we build on the previous articles to help you understand how PHP works internally. In the first article, we introduced how to view the PHP source code, how it is structured, and some C pointer basics for PHP developers. The second article introduced functions. This time, we go deep into one of PHP's most useful structures: variables.
Enter ZVAL
In the PHP core code, variables are called ZVALs. This structure is important because PHP is weakly typed while C is strongly typed. So how does the ZVAL bridge that gap? To answer this question, we need to look carefully at the ZVAL type definition. To find it, search for zval in the definition search box on the lxr page. At first glance, nothing useful seems to turn up. But there is a typedef line in the zend.h file (typedef is how you define a new type name in C). This may be what we are looking for, although at first it seems unrelated. To confirm, let's click _zval_struct on that line.
struct _zval_struct {
    /* Variable information */
    zvalue_value value;      /* value */
    zend_uint refcount__gc;
    zend_uchar type;         /* active type */
    zend_uchar is_ref__gc;
};
And there it is: the foundation of PHP, the zval. It looks simple, right? It is, but there is some magic going on that makes it all work. Note that this is a structure, or struct. You can think of it as a PHP class that only has public properties. Here we have four properties: value, refcount__gc, type and is_ref__gc. Let's look at them one by one (out of order).
Value
The first element we will look at is the value variable. Its type is zvalue_value. I don't know about you, but I had never heard of zvalue_value before. So let's figure out what it is. As elsewhere on the site, you can click a type to view its definition. Once you click it, you will see the following definition:
typedef union _zvalue_value {
    long lval;                  /* long value */
    double dval;                /* double value */
    struct {
        char *val;
        int len;
    } str;
    HashTable *ht;              /* hash table value */
    zend_object_value obj;
} zvalue_value;
Now there is some black magic here. Do you see that this is defined as a union? That means it is not really a struct, but a single type. Yet there are several variables of different types inside it. If there are multiple types, how can it be a single type? I'm glad you asked. To understand this, we need to think back to what we said about C types in the first article.
In C, a variable is just a label for a region of memory, and its type only tells the compiler how much memory is used and how to interpret it. Nothing in C separates a 4-byte string from an integer value: both are just blocks of memory. The compiler tries to interpret a memory segment as a variable of a specific type, but this does not always work out (incidentally, when a variable "overruns" the memory segment it was given, that is when you get a segmentation fault).
As we know, a union is a single type that is interpreted differently depending on how it is accessed. This lets us define one value that can hold several types. Note that all of the members share the same memory. In this example, with a 64-bit compiler on a typical LP64 platform, long and double each occupy 64 bits. The string struct occupies 96 bits (64 bits for the character pointer and 32 bits for the integer length). The HashTable pointer takes 64 bits, and zend_object_value takes 96 bits (32 bits for the object handle and 64 bits for a pointer). The whole union takes up the size of its largest member, so here it is 96 bits (ignoring alignment padding, which on typical 64-bit platforms rounds the string and object structs, and therefore the union, up to 128 bits).
Now, if we look at the union again, we see only five PHP data types (long = int, double = float, str = string, hashtable = array, zend_object_value = object). So where are the remaining data types? It turns out the union is already enough to store them: BOOL is stored as a long (int), NULL does not use the value at all, and RESOURCE is also stored as a long.
TYPE
Because the value union does not record how it should be accessed, we need another way to keep track of the variable's type, so that we know how to read the value. The type field handles this with a single byte (zend_uchar is an unsigned char, i.e. one byte of memory). It holds one of the Zend type constants. In practice, that means setting zval.type = IS_LONG to mark an integer. So this field and the value field together are enough to tell us the type and the value of a PHP variable.
IS_REF
This field indicates whether the variable is a reference, that is, whether you did something like $foo = &$bar with it. If it is 0, the variable is not a reference; if it is 1, the variable is a reference. It does not do much on its own. So, before we finish with _zval_struct, let's look at its fourth member.
REFCOUNT
This variable counts how many PHP variables point to this variable container. That is, if refcount is 1, one PHP variable uses this container; if refcount is 2, two PHP variables point to the same container. On its own, refcount does not carry much information, but together with is_ref it forms the basis of the garbage collector and of copy-on-write. It allows one zval container to hold the value of one or more PHP variables. The semantics of refcount are beyond the scope of this article. If you want to go further, I recommend reading this document.
That is all there is to the ZVAL.
How does it work?
In PHP, a zval is passed to functions like any other C variable: as a memory segment or as a pointer to one (or a pointer to a pointer, and so on). Once we have a variable, we want to access the data in it. How do we do that? We use the macros defined in the zend_operators.h file, which make it easier to get at the data in a zval. An important thing to note is that each macro comes in several variants, distinguished by their suffix. For example, to get a zval's type there is the Z_TYPE(zval) macro, which returns an integer representing the type of its zval argument. There is also a Z_TYPE_P(zval_p) macro, which does the same thing but takes a pointer to a zval. Apart from the kind of argument they take, the two are identical. In fact, we could write Z_TYPE(*zval_p) instead, but the _P and _PP (pointer-to-pointer) variants make things easier.
We can use the VAL macros to get a zval's value: Z_LVAL(zval) returns an integer value (used for both integers and resources), and Z_DVAL(zval) returns a floating-point value. There are many more. Note that in C you need to (or at least should) use these macros to read a zval's value. So when we see a function using them, we know it is extracting a value from a zval.
What about the type?
So far we have talked about zval types and values. We all know that PHP takes care of types for us, so if we like, we can treat a string as an integer. The functions that perform this step belong to the convert_to_type family. To convert a zval to a string, for example, call the convert_to_string function. It changes the type of the zval we pass to it. So if you see a function calling these functions, you know it is converting the data type of its argument.
zend_parse_parameters
In the previous article we looked at the zend_parse_parameters function. Now that we know how PHP variables are represented in C, let's take a closer look at it.
ZEND_API int zend_parse_parameters(int num_args TSRMLS_DC, const char *type_spec, ...)
{
    va_list va;
    int retval;

    RETURN_IF_ZERO_ARGS(num_args, type_spec, 0);

    va_start(va, type_spec);
    retval = zend_parse_va_args(num_args, type_spec, &va, 0 TSRMLS_CC);
    va_end(va);

    return retval;
}
On the surface, this looks confusing. Note that va_list is simply a list of variable arguments (C's "varargs"), set up with va_start and released with va_end. So it plays a role similar to PHP's func_get_args() function. With that in mind, we can see that zend_parse_parameters just calls the zend_parse_va_args function. Let's keep going and look at that function...
This function looks interesting. At first glance, it seems to do a lot. But let's look more closely. First, there is a for loop. This loop walks the type_spec string passed in from zend_parse_parameters. Inside the loop, we can see that it only counts the number of expected arguments. How it does that is left as an exercise for the reader.
Further down, there are some sanity checks (that the arguments were passed correctly), along with error checks that a sufficient number of arguments were passed. Then we reach the loop we are interested in, the one that actually parses the arguments. Inside it there are three if statements. The first handles the optional-argument marker. The second handles var-args (a variable number of arguments). The third if statement is the one we care about: it calls the zend_parse_arg() function. Let's take a closer look at that function...
Here we find something quite interesting. This function calls another function (zend_parse_arg_impl) and handles the error messages it produces. This is a common pattern in PHP: error handling is pulled out into the calling function. That way, the implementation and the error handling are kept separate and can each be reused as much as possible. You can dig into this function further; it is quite easy to follow. But for now, let's take a closer look at zend_parse_arg_impl()...
We have now reached the point where PHP internal functions actually parse their arguments. Let's look at the first branch of the switch statement, which parses integer arguments. What follows is easy to understand. Let's start with the first line of the branch:
long *p = va_arg(*va, long *);
If you remember what we said earlier, va_arg is how C reads variable arguments. So here we declare a pointer to an integer (long is an integer type in C) and fetch it with va_arg. This is the pointer that was passed to the zend_parse_parameters function, so it is where the result is stored once the branch finishes. Next, we switch on the type of the variable (zval) that was actually passed. Let's look at the IS_STRING branch (this runs when a string is passed where an integer is expected).
case IS_STRING:
    {
        double d;
        int type;

        if ((type = is_numeric_string(Z_STRVAL_PP(arg), Z_STRLEN_PP(arg), p, &d, -1)) == 0) {
            return "long";
        } else if (type == IS_DOUBLE) {
            if (c == 'L') {
                if (d > LONG_MAX) {
                    *p = LONG_MAX;
                    break;
                } else if (d < LONG_MIN) {
                    *p = LONG_MIN;
                    break;
                }
            }

            *p = zend_dval_to_lval(d);
        }
    }
    break;
This is less complicated than it looks. Everything comes down to the is_numeric_string function. Broadly, this function checks whether the string is numeric. If not, it returns 0. If so, it parses the string into one of the variables (an integer into p, or a floating-point number into d) and returns the resulting type. So if the string is not numeric, the branch returns the string "long", which the error handler uses to build its message. Otherwise, if the string holds a double, it first checks (for the 'L' specifier) whether the number is too large or too small to store as an integer, clamping it to LONG_MAX or LONG_MIN, and otherwise uses the zend_dval_to_lval function to convert the floating-point value to an integer. That's it: we have parsed our string argument. Now let's look at the other branches:
case IS_DOUBLE:
    if (c == 'L') {
        if (Z_DVAL_PP(arg) > LONG_MAX) {
            *p = LONG_MAX;
            break;
        } else if (Z_DVAL_PP(arg) < LONG_MIN) {
            *p = LONG_MIN;
            break;
        }
    }
    /* no break: fall through and convert like the cases below */
case IS_NULL:
case IS_LONG:
case IS_BOOL:
    convert_to_long_ex(arg);
    *p = Z_LVAL_PP(arg);
    break;
Here we see how floating-point arguments are handled. This is similar to parsing a floating-point number out of a string (coincidence?). Note that if the specifier is not an uppercase 'L', a double is handled the same way as the other types below it (this case has no break, so it falls through). Next comes an interesting function, convert_to_long_ex(). It belongs to the same convert_to_type() family mentioned earlier and converts its argument to a specific type. The only difference is that it separates (copies) the passed-in variable if it is not a reference, since it is changing the type. This is copy-on-write at work. So when we pass a float to a parameter that expects an integer, without using a reference, the function sees an integer while our own variable still holds the floating-point value.
case IS_ARRAY:
case IS_OBJECT:
case IS_RESOURCE:
default:
    return "long";
Finally, we have the remaining case branches. If you pass an array, an object, a resource, or any other unknown type where an integer is expected, you get an error.
The rest is left to the reader. Reading the zend_parse_arg_impl function is a really useful way to better understand PHP's type juggling system. Read it a piece at a time, and try to trace the state and type of the various arguments in C.
Next part
The next part will be posted on NikiC's blog (this series alternates between our blogs). In it, he covers everything about arrays.