Original: http://blog.ircmaxell.com/2012/03/phps-source-code-for-php-developers_21.html
In this third article of the "PHP's Source Code for PHP Developers" series, we build on the previous articles to help you understand how PHP works internally. The first article showed how to view PHP's source code, how it is structured, and the basics of C pointers for PHP developers. The second article covered functions. This time, we're going to dive into one of PHP's most useful constructs: variables.
Enter the zval
In PHP's core code, variables are called zvals. This structure is important for several reasons, not least because PHP is weakly typed while C is strongly typed. So how does the zval solve that problem? To answer this question, we need to look carefully at the definition of the zval type. To find it, let's search for zval in the definition search box on the LXR page. At first glance, we don't seem to find anything useful. But there is a typedef line in zend.h (typedef is C's way of defining new data types). That may be what we're looking for, so let's follow it. At first, this looks like a dead end: there is nothing useful here. But just to be sure, let's click on the _zval_struct line.
    struct _zval_struct {
        /* Variable information */
        zvalue_value value;     /* value */
        zend_uint refcount__gc;
        zend_uchar type;        /* active type */
        zend_uchar is_ref__gc;
    };
And there we have the foundation of PHP: the zval. It looks simple, doesn't it? It is, but there is some magic here that is worth understanding. Note that this is a struct, or structure. You can think of it as a PHP class that has only public properties. Here we have four properties: value, refcount__gc, type, and is_ref__gc. Let's go through them one by one (though not in order).
Value
The first element we'll talk about is value, whose type is zvalue_value. I don't know about you, but I had never heard of a zvalue_value. So let's figure out what it is. As with the rest of the site, you can click on a type to see its definition. Once you click through, you'll see that it is defined as follows:
    typedef union _zvalue_value {
        long lval;              /* long value */
        double dval;            /* double value */
        struct {
            char *val;
            int len;
        } str;
        HashTable *ht;          /* hash table value */
        zend_object_value obj;
    } zvalue_value;
Now, there's some trickery here. See that union definition? It means this isn't really a struct but a single type. Yet there are several differently typed members inside it. If it contains multiple types, how can it be a single type? I'm glad you asked. To understand this, we need to recall what we said about C types in the first article.
In C, a variable is just a label for a range of memory addresses. Likewise, a type is just a way of identifying how a piece of memory is to be interpreted. Nothing in C itself distinguishes a 4-byte string from an integer: both are just a chunk of memory. The compiler tries to enforce types by tracking which memory segments are identified as variables and treating those variables as specific types, but this does not always succeed (incidentally, when a variable "overruns" the memory segment allocated to it, you can get a segmentation fault).
So, as we now know, a union is a single type that is interpreted in different ways depending on how it is accessed. This lets us define one value that supports multiple types. One thing to note is that all of the members share the same piece of memory. In this example, on a typical 64-bit Unix build, a long and a double each take 64 bits. The string structure takes 96 bits (64 bits for the character pointer and 32 bits for the integer length). The HashTable pointer takes 64 bits, and zend_object_value takes 96 bits (32 bits for the handle and 64 bits for the pointer). The whole union is as large as its largest member, so here it holds 96 bits of data (in practice, alignment padding usually rounds the actual size up to 128 bits).
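To make the shared-storage idea concrete, here is a minimal sketch using a stand-in union. The name demo_value and both helper functions are made up for this illustration, and the void * members are assumptions standing in for HashTable * and the object handlers pointer. It relies only on standard C guarantees: a union is at least as large as its largest member, and every member starts at the union's own address.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for zvalue_value; not the real PHP definition. */
typedef union demo_value {
    long lval;
    double dval;
    struct {
        char *val;
        int len;
    } str;
    void *ht;                    /* stands in for HashTable * */
    struct {
        unsigned int handle;
        const void *handlers;
    } obj;                       /* stands in for zend_object_value */
} demo_value;

/* Size of the whole union: at least the size of its largest member. */
size_t demo_value_size(void)
{
    return sizeof(demo_value);
}

/* Returns 1 if every member starts at the same address as the union. */
int demo_members_share_storage(void)
{
    demo_value v;
    void *base = &v;

    memset(&v, 0, sizeof v);
    assert(base != NULL);
    return (void *)&v.lval == base
        && (void *)&v.dval == base
        && (void *)&v.str == base
        && (void *)&v.ht == base
        && (void *)&v.obj == base;
}
```

On a typical 64-bit Linux compiler, demo_value_size() tends to report 16 bytes rather than 12, because the union is padded to satisfy pointer alignment.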
Now, if we look at the union again, we can see that only five PHP data types are represented here (long == int, double == float, str == string, HashTable == array, zend_object_value == object). So where do the remaining data types go? As it turns out, this structure is sufficient to store them too. A bool is stored as a long (int), null needs no data segment at all, and a resource is also stored as a long.
Type
Because the value union does not record how it should be accessed, we need some other way of tracking the variable's type, so that we know how to access the value. That is the job of the type byte (zend_uchar is an unsigned char, one byte of memory). It holds one of the Zend type constants. There is really no magic to it: setting zval.type = IS_LONG marks the data as an integer. So this field and the value field together are enough to tell us the type and value of a PHP variable.
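The combination of a type byte and a value union is often called a tagged union. Here is a small sketch of the idea; demo_zval, the DEMO_* constants and demo_as_double are hypothetical names for this example, not PHP's own (the real constants are IS_LONG, IS_DOUBLE and so on).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical type tags for this sketch. */
enum demo_type { DEMO_NULL, DEMO_LONG, DEMO_DOUBLE, DEMO_BOOL };

typedef struct demo_zval {
    union {
        long lval;
        double dval;
    } value;
    unsigned char type;   /* tells us which union member is valid */
} demo_zval;

/* Reads the value through the member that matches the type tag. */
double demo_as_double(const demo_zval *z)
{
    assert(z != NULL);
    switch (z->type) {
        case DEMO_LONG:
            return (double) z->value.lval;
        case DEMO_DOUBLE:
            return z->value.dval;
        case DEMO_BOOL:
            /* booleans reuse the long slot */
            return z->value.lval ? 1.0 : 0.0;
        default:
            /* null carries no data at all */
            return 0.0;
    }
}
```

Reading value.dval while the tag says DEMO_LONG would simply reinterpret the same bytes, which is why every access has to check the tag first.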
Is_ref
This field identifies whether the variable is a reference, that is, whether you did $foo = &$bar on it. If it is 0, the variable is not a reference; if it is 1, it is. It doesn't do much more than that. So, before we finish with _zval_struct, let's look at its fourth member.
Refcount
This field counts how many PHP variables point to this variable container. That is, if refcount is 1, one PHP variable uses the container. If refcount is 2, two PHP variables point to the same container. On its own, refcount doesn't tell us much, but together with is_ref it forms the basis of the garbage collector and of copy-on-write. It lets the same zval container back one or more PHP variables. The exact semantics of refcount are beyond the scope of this article; if you want to go further, I recommend reading the documentation on it.
That's all there is to a zval.
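As a rough illustration of how a counter and a reference flag can support copy-on-write, here is a deliberately simplified sketch. Everything in it (demo_zval, demo_new, demo_share, demo_write, demo_release) is invented for this example, and it only models a single long value; PHP's real rules have more cases.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct demo_zval {
    long value;
    unsigned int refcount;   /* how many variables point at this container */
    unsigned char is_ref;    /* 1 if those variables are PHP references */
} demo_zval;

/* Creates a container used by exactly one variable; NULL if out of memory. */
demo_zval *demo_new(long value)
{
    demo_zval *z = malloc(sizeof *z);
    if (z == NULL) {
        return NULL;
    }
    z->value = value;
    z->refcount = 1;
    z->is_ref = 0;
    return z;
}

/* Models "$b = $a": both variables now share one container. */
demo_zval *demo_share(demo_zval *z)
{
    assert(z != NULL);
    z->refcount++;
    return z;
}

/* Writes through one variable and returns the container that variable
 * should point at afterwards (NULL if the copy could not be allocated). */
demo_zval *demo_write(demo_zval *z, long value)
{
    assert(z != NULL);
    if (z->refcount > 1 && !z->is_ref) {
        /* shared but not a reference: separate before writing */
        demo_zval *copy = demo_new(value);
        if (copy != NULL) {
            z->refcount--;
        }
        return copy;
    }
    z->value = value;
    return z;
}

/* Drops one variable's use of the container, freeing it at zero. */
void demo_release(demo_zval *z)
{
    assert(z != NULL && z->refcount > 0);
    if (--z->refcount == 0) {
        free(z);
    }
}
```

The point of the design is that the assignment itself is cheap (one increment); the copy only happens if and when one of the sharing variables is written to.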
How does it work?
Inside PHP, zvals are passed to functions either as a memory segment or as a pointer to one (or a pointer to a pointer, and so on), just like any other C variable. Once we have the variable, we want to access the data inside it. How do we do that? We use the macros defined in zend_operators.h, which make working with zvals much easier. It is important to note that each macro comes in several variants that differ only in their suffix. For example, to get the type of a zval, there is the Z_TYPE(zval) macro, which returns an integer representing the type of its zval argument. There is also a Z_TYPE_P(zval_p) macro, which does the same thing as Z_TYPE(zval) but takes a pointer to a zval, and a Z_TYPE_PP(zval_pp) macro, which takes a pointer to a pointer. The macros are identical apart from the kind of argument they accept; we could just as well write Z_TYPE(*zval_p), but the _P and _PP variants make things easier.
We can use the VAL family of macros to get the value of a zval. You can call Z_LVAL(zval) to get the integer value (for integers and resources), or Z_DVAL(zval) to get the floating-point value. There are many others. The key point is that, to get at the value of a zval in C, you need to (or at least should) use a macro. So when we see a function using them, we know it is extracting the value from a zval.
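The suffix convention is easy to reproduce on a toy structure. The sketch below defines hypothetical DEMO_* macros on a made-up demo_zval type; the layering (the _P and _PP variants dereference and then delegate to the plain macro) follows the description above, but none of these names come from PHP.

```c
#include <assert.h>
#include <stddef.h>

enum demo_type { DEMO_NULL, DEMO_LONG, DEMO_DOUBLE };

typedef struct demo_zval {
    union {
        long lval;
        double dval;
    } value;
    unsigned char type;
} demo_zval;

/* Plain variants take the struct itself. */
#define DEMO_TYPE(z)       ((z).type)
#define DEMO_LVAL(z)       ((z).value.lval)

/* _P variants take a pointer, _PP variants a pointer to a pointer. */
#define DEMO_TYPE_P(zp)    DEMO_TYPE(*(zp))
#define DEMO_LVAL_P(zp)    DEMO_LVAL(*(zp))
#define DEMO_TYPE_PP(zpp)  DEMO_TYPE(**(zpp))
#define DEMO_LVAL_PP(zpp)  DEMO_LVAL(**(zpp))

/* Stores a long through a pointer to a pointer. */
void demo_set_long(demo_zval **zpp, long value)
{
    assert(zpp != NULL && *zpp != NULL);
    DEMO_TYPE_PP(zpp) = DEMO_LONG;
    DEMO_LVAL_PP(zpp) = value;
}
```

Because the macros expand to plain member accesses, they work on both sides of an assignment, and all three variants end up touching the same fields.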
So, what about the type?
So far, we have talked about the type and the value of a zval. As we all know, PHP does type juggling for us, so if we like, we can use a string as an integer. Internally, this step is done by the convert_to_type family of functions. To convert a zval to a string, for example, you call convert_to_string. It changes the type of the zval we pass to it. So if you see a function calling one of these, you know it is converting the data type of its argument.
zend_parse_parameters
In the previous article we introduced the zend_parse_parameters function. Now that we know how PHP variables are represented in C, let's take a closer look at it.
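To show what "changing the type of the zval in place" means, here is a minimal sketch of a conversion to an integer on a toy structure. demo_zval and demo_convert_to_long are hypothetical names, the string case leans on the standard strtol function rather than PHP's own parsing rules, and the sketch assumes any double it is given fits in a long.

```c
#include <assert.h>
#include <stdlib.h>

enum demo_type { DEMO_NULL, DEMO_LONG, DEMO_DOUBLE, DEMO_STRING };

typedef struct demo_zval {
    union {
        long lval;
        double dval;
        struct {
            char *val;   /* NUL-terminated, owned by the caller in this sketch */
            int len;
        } str;
    } value;
    unsigned char type;
} demo_zval;

/* Converts the zval in place: afterwards its type is always DEMO_LONG. */
void demo_convert_to_long(demo_zval *z)
{
    long result = 0;

    assert(z != NULL);
    switch (z->type) {
        case DEMO_LONG:
            return;                               /* nothing to do */
        case DEMO_DOUBLE:
            result = (long) z->value.dval;        /* assumes it fits in a long */
            break;
        case DEMO_STRING:
            result = strtol(z->value.str.val, NULL, 10);
            break;
        default:
            break;                                /* null becomes 0 */
    }
    z->value.lval = result;
    z->type = DEMO_LONG;
}
```

Note that the original string or float is gone after the call, because the long overwrites the same union storage. That is why callers that must not disturb a shared variable need to separate it first.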
    ZEND_API int zend_parse_parameters(int num_args TSRMLS_DC, const char *type_spec, ...)
    {
        va_list va;
        int retval;

        RETURN_IF_ZERO_ARGS(num_args, type_spec, 0);

        va_start(va, type_spec);
        retval = zend_parse_va_args(num_args, type_spec, &va, 0 TSRMLS_CC);
        va_end(va);

        return retval;
    }
Now, on the surface this looks confusing. The thing to understand is that va_list is just the type used to work with a variable argument list declared with '...'. So it is similar in spirit to PHP's func_get_args() function. With that in mind, we can see that zend_parse_parameters immediately calls zend_parse_va_args. Let's move on and look at that function...
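The same wrapper-plus-worker shape can be shown with standard C alone. In this sketch, demo_sum and demo_sum_va are made-up functions: the public one accepts '...', starts a va_list, and hands a pointer to it to a worker that does the real job, much as zend_parse_parameters hands off to zend_parse_va_args.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>

/* Worker: consumes `count` long arguments from an already started va_list. */
static long demo_sum_va(int count, va_list *va)
{
    long total = 0;
    int i;

    assert(va != NULL);
    for (i = 0; i < count; i++) {
        total += va_arg(*va, long);
    }
    return total;
}

/* Public variadic wrapper: sets up the va_list and delegates. */
long demo_sum(int count, ...)
{
    va_list va;
    long result;

    va_start(va, count);
    result = demo_sum_va(count, &va);
    va_end(va);

    return result;
}
```

A call looks like demo_sum(3, 1L, 2L, 3L). The L suffixes matter: variadic arguments carry no type information, so the caller and va_arg must agree on each argument's type, which is the job the type_spec string does for zend_parse_parameters.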
This function looks very interesting. At first glance it seems to do a lot, but take a closer look. First, we see a for loop. It iterates over the type_spec string passed in from zend_parse_parameters. Inside the loop, we can see that it just counts the number of arguments it expects to receive. Working out exactly how it does that is left as an exercise for the reader.
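For readers who want a head start on that exercise, here is a heavily simplified sketch of such a counting loop. demo_count_spec is a hypothetical function; it assumes that '|' marks the start of the optional arguments and that '/' and '!' are modifiers rather than arguments, and unlike the real code it treats every other character as one argument without validating it or handling var-args.

```c
#include <assert.h>
#include <stddef.h>

/* Counts the required and maximum number of arguments in a type spec. */
void demo_count_spec(const char *spec, int *min_args, int *max_args)
{
    int count = 0;
    int min = -1;          /* -1 means no '|' has been seen yet */
    const char *c;

    assert(spec != NULL && min_args != NULL && max_args != NULL);
    for (c = spec; *c != '\0'; c++) {
        switch (*c) {
            case '|':
                min = count;   /* everything after this is optional */
                break;
            case '/':
            case '!':
                break;         /* modifiers: they do not consume an argument */
            default:
                count++;       /* simplification: any other char is one argument */
                break;
        }
    }
    *min_args = (min < 0) ? count : min;
    *max_args = count;
}
```

With the spec "ls|b", this reports two required arguments and at most three.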
Reading on, we see some sanity checks (that the parameters were passed in correctly) and error checks that enough arguments were passed. Next comes the loop we're really interested in, the one that actually parses the arguments. Inside it there are three if statements. The first handles the optional-parameter marker. The second handles var-args (a variable number of arguments). The third is the one we care about: as you can see, it calls zend_parse_arg(). Let's take a closer look at that function...
Reading on, we find something quite interesting. This function calls another function (zend_parse_arg_impl) and then handles any error information that comes back. This is a very common pattern in PHP: the error handling is pulled out into the parent function. That keeps the implementation and the error handling separate and maximizes reuse. You can dig further into this function; it is very easy to understand. But let's take a closer look at zend_parse_arg_impl()...
Now we've really arrived at how PHP's internal functions parse their parameters. Let's look at the first branch of the switch statement, which parses integer parameters. What follows should be easy to understand. Let's start with the first line of the branch:
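The split between a worker that only reports what it expected and a parent that turns that into an error can be sketched as follows. All names here (demo_zval, demo_parse_long_impl, demo_parse_long) and the wording of the message are assumptions for illustration; the only idea taken from the text is that the worker returns the name of the expected type on failure and the wrapper does the reporting.

```c
#include <assert.h>
#include <stdio.h>

enum demo_type { DEMO_LONG, DEMO_ARRAY };

typedef struct demo_zval {
    union {
        long lval;
    } value;
    unsigned char type;
} demo_zval;

/* Worker: returns NULL on success, or the name of the type it expected. */
static const char *demo_parse_long_impl(const demo_zval *arg, long *out)
{
    switch (arg->type) {
        case DEMO_LONG:
            *out = arg->value.lval;
            return NULL;
        default:
            return "long";
    }
}

/* Wrapper: owns all error reporting. Returns 0 on success, -1 on failure. */
int demo_parse_long(const demo_zval *arg, int arg_num, long *out,
                    char *err, size_t err_size)
{
    const char *expected;

    assert(arg != NULL && out != NULL && err != NULL);
    expected = demo_parse_long_impl(arg, out);
    if (expected != NULL) {
        snprintf(err, err_size, "expects parameter %d to be %s",
                 arg_num, expected);
        return -1;
    }
    return 0;
}
```

The worker never formats a message, so it can stay a tight switch statement, and a different caller could reuse it with a quieter or stricter error policy.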
    long *p = va_arg(*va, long *);
If you remember what we said earlier, va_arg is how C reads variable arguments. So this line declares a pointer to an integer (in C, a long is an integer) and gets that pointer from va_arg. In other words, it fetches a pointer that was passed as an argument to zend_parse_parameters. This is the pointer through which the result will be written once the branch finishes. Next, we see a switch on the type of the passed-in variable (the zval). Let's look at the IS_STRING branch first (this is the branch taken when a string is passed to an integer parameter).
    case IS_STRING:
        {
            double d;
            int type;

            if ((type = is_numeric_string(Z_STRVAL_PP(arg), Z_STRLEN_PP(arg), p, &d, -1)) == 0) {
                return "long";
            } else if (type == IS_DOUBLE) {
                if (c == 'L') {
                    if (d > LONG_MAX) {
                        *p = LONG_MAX;
                        break;
                    } else if (d < LONG_MIN) {
                        *p = LONG_MIN;
                        break;
                    }
                }

                *p = zend_dval_to_lval(d);
            }
        }
        break;
Now, there is less going on here than it seems. Everything comes down to the is_numeric_string function. Roughly speaking, it checks whether the string contains only numeric characters and returns 0 if it does not. If it does, it parses the string into a variable (an integer or a float, p or d) and returns the data type. So we can see that if the string is not purely numeric, the branch returns the string "long". That string is used by the wrapping error-handling function. Otherwise, if the string represents a double (float), the code checks whether the number is too large to be stored as an integer, and then uses zend_dval_to_lval to convert the float to an integer. And that's it: we have parsed our string argument. Now let's look at the other branches:
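To get a feel for a function that both classifies and parses a numeric string, here is a stand-in built on the standard strtol and strtod functions. demo_numeric_string and its return constants are hypothetical, and the standard functions accept a slightly different grammar than PHP does (for example, strtod also accepts "inf" and hexadecimal floats), so treat this as a sketch of the shape rather than of the exact rules.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical return values; 0 means "not a numeric string". */
enum { DEMO_NOT_NUMERIC = 0, DEMO_IS_LONG = 1, DEMO_IS_DOUBLE = 2 };

/* Tries to read the whole string as a long, then as a double.
 * Writes to *lval or *dval and returns which one was filled in. */
int demo_numeric_string(const char *s, long *lval, double *dval)
{
    char *end;
    long l;
    double d;

    assert(s != NULL && lval != NULL && dval != NULL);
    if (*s == '\0') {
        return DEMO_NOT_NUMERIC;
    }

    errno = 0;
    l = strtol(s, &end, 10);
    if (end != s && *end == '\0' && errno != ERANGE) {
        *lval = l;                 /* the whole string is an in-range integer */
        return DEMO_IS_LONG;
    }

    d = strtod(s, &end);
    if (end != s && *end == '\0') {
        *dval = d;                 /* the whole string is a float */
        return DEMO_IS_DOUBLE;
    }

    return DEMO_NOT_NUMERIC;       /* trailing garbage or no digits at all */
}
```

An integer that overflows a long falls through to the double attempt, which mirrors the situation the IS_DOUBLE check in the branch above has to deal with.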
    case IS_DOUBLE:
        if (c == 'L') {
            if (Z_DVAL_PP(arg) > LONG_MAX) {
                *p = LONG_MAX;
                break;
            } else if (Z_DVAL_PP(arg) < LONG_MIN) {
                *p = LONG_MIN;
                break;
            }
        }
    case IS_NULL:
    case IS_LONG:
    case IS_BOOL:
        convert_to_long_ex(arg);
        *p = Z_LVAL_PP(arg);
        break;
Here we can see the handling of floating-point numbers, which looks a lot like the handling of floats parsed from a string (coincidence?). One important thing to note is that if the type specifier is not an uppercase 'L', the float is treated the same way as the other types of variable (the case has no break, so it falls through). We also meet an interesting function here, convert_to_long_ex(). It belongs to the convert_to_type() family we talked about earlier, which converts its argument to a specific type. The only difference is that it separates (copies) the passed-in variable if it is not a reference, since it is about to change the type. This is copy-on-write at work. So when we pass a float to a non-reference integer parameter, the function sees an integer, while the caller's variable still holds the float.
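The fall-through is easier to see on a cut-down version. In this sketch, demo_zval and demo_parse_long are hypothetical: the function returns NULL on success or the expected type name on failure, saturates out-of-range floats only for the 'L' specifier, and otherwise lets the float case fall into the shared conversion code. Unlike the real code, it reads from a const argument instead of converting a separated copy, it uses >= for the upper bound so the cast below it stays in range, and it assumes any float that reaches the plain cast is finite and fits in a long.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

enum demo_type { DEMO_NULL, DEMO_LONG, DEMO_DOUBLE, DEMO_ARRAY };

typedef struct demo_zval {
    union {
        long lval;
        double dval;
    } value;
    unsigned char type;
} demo_zval;

/* Returns NULL on success, or the expected type name on failure. */
const char *demo_parse_long(const demo_zval *arg, char spec, long *out)
{
    assert(arg != NULL && out != NULL);
    switch (arg->type) {
        case DEMO_DOUBLE:
            if (spec == 'L') {
                /* saturating mode: clamp instead of converting */
                if (arg->value.dval >= (double) LONG_MAX) {
                    *out = LONG_MAX;
                    break;
                } else if (arg->value.dval < (double) LONG_MIN) {
                    *out = LONG_MIN;
                    break;
                }
            }
            /* fall through: in-range floats share the code below */
        case DEMO_NULL:
        case DEMO_LONG:
            if (arg->type == DEMO_DOUBLE) {
                *out = (long) arg->value.dval;   /* assumes it fits in a long */
            } else if (arg->type == DEMO_LONG) {
                *out = arg->value.lval;
            } else {
                *out = 0;                        /* null becomes 0 */
            }
            break;
        default:
            return "long";
    }
    return NULL;
}
```

Because the argument is never modified, the caller's float is still intact after the call, which is the same observable effect that separation gives in the real code.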
    case IS_ARRAY:
    case IS_OBJECT:
    case IS_RESOURCE:
    default:
        return "long";
Finally, we have the remaining case branches. As we can see, passing an array, an object, a resource, or any other unknown type to an integer parameter results in an error.
The rest we leave to the reader. Reading through zend_parse_arg_impl is really useful for getting a better understanding of PHP's type juggling system. Read it piece by piece, and try to keep track of the state and type of the various arguments in C.
The next part of this series will be on Nikic's blog (we'll be jumping back and forth between blogs over the course of the series). In that article, he covers everything about arrays.