Article from: http://www.aintnot.com/2016/02/12/phps-source-code-for-php-developers-part3-variables-ch
Original: http://blog.ircmaxell.com/2012/03/phps-source-code-for-php-developers_21.html
In this third article in the "PHP's Source Code for PHP Developers" series, we build on the previous articles to help you understand how PHP works internally. The first article showed how to view PHP's source code, how it is structured, and some basics of C pointers for PHP developers. The second article covered functions. This time, we're going to dive into one of the most useful structures in PHP: variables.
Enter Zval
In PHP's core code, variables are called zvals. This structure is important for several reasons, not least because PHP is weakly typed while C is strongly typed. So how does the zval solve that problem? To answer this question, we need to look carefully at the definition of the zval type. To find it, let's search for zval in the identifier search box on the LXR page. At first glance, there doesn't seem to be anything useful. But there is a typedef line in the zend.h file (a typedef in C is a way to define a new name for a data type). This may be what we're looking for, so let's keep looking. At first it seems irrelevant; there is nothing useful here. But just to be sure, let's click on the _zval_struct line.
struct _zval_struct {
    /* Variable information */
    zvalue_value value;     /* value */
    zend_uint refcount__gc;
    zend_uchar type;        /* active type */
    zend_uchar is_ref__gc;
};
And there we have it: the foundation of PHP, the zval. It looks simple, doesn't it? It is, but there is some magic here that makes it work. Note that this is a struct, or structure. Basically, you can think of it as a PHP class that has only public properties. Here we have four properties: value, refcount__gc, type, and is_ref__gc. Let's look at these properties one by one (though not in this order).
Value
The first element we'll talk about is value, which has the type zvalue_value. I don't know about you, but I had never heard of zvalue_value before. So let's try to figure out what it is. As everywhere else on the LXR site, you can click on a type to see its definition. Once you've clicked, you'll see that it is defined as follows:
typedef union _zvalue_value {
    long lval;                  /* long value */
    double dval;                /* double value */
    struct {
        char *val;
        int len;
    } str;
    HashTable *ht;              /* hash table value */
    zend_object_value obj;
} zvalue_value;
Now, there's some black magic here. See that union definition? It means this is not really a struct but a single type. Yet there are multiple variables of different types inside it. If there are multiple types in there, how can it be a single type? I'm glad you asked. To understand this, we need to recall how types work in C, which we talked about in the first article.
In C, a variable is simply a label for a memory address. You could also say that a type is just a way of identifying how a piece of memory will be used. Nothing in C distinguishes a 4-byte string from an integer value; both are just chunks of memory. The compiler tries to sort this out by tracking which memory segments are used as which variables and converting those variables to specific types, but this does not always succeed (by the way, when a variable writes past the memory segment it was given, the result can be a segmentation fault).
So, as we now know, a union is a single type that is interpreted differently depending on how it is accessed. This lets us define one value that supports multiple types. One thing to note is that the data for all of the types must be stored in the same piece of memory. In this example, with a 64-bit compiler, a long and a double each take 64 bits. The string structure takes 96 bits (64 bits for the character pointer and 32 bits for the integer length). The HashTable pointer takes 64 bits, and zend_object_value takes 96 bits (32 bits for the handle and the remaining 64 bits for the pointer). The whole union takes the size of its largest member, so here it is 96 bits (in practice, alignment padding typically rounds this up to 128 bits).
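The size rule described above can be checked with a small standalone sketch. The demo_value union below is a hypothetical stand-in, not PHP's real zvalue_value: a void pointer takes the place of HashTable *, and zend_object_value is left out. The sketch only assumes a standard C compiler.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for zvalue_value; not PHP's real definition. */
typedef union demo_value {
    long lval;
    double dval;
    struct {
        char *val;
        int len;
    } str;
    void *ht;               /* stands in for HashTable * */
} demo_value;

/* Returns the size in bytes of the largest member of demo_value. */
size_t demo_largest_member(void)
{
    size_t max = sizeof(long);

    if (sizeof(double) > max) max = sizeof(double);
    if (sizeof(((demo_value *)0)->str) > max) max = sizeof(((demo_value *)0)->str);
    if (sizeof(void *) > max) max = sizeof(void *);
    return max;
}

/* Returns 1 if all members start at the same address, 0 otherwise. */
int demo_members_share_address(void)
{
    demo_value v;

    v.lval = 42;
    assert(v.lval == 42);
    v.dval = 1.5;           /* reuses the bytes that held lval */
    return (void *)&v.lval == (void *)&v.dval
        && (void *)&v.dval == (void *)&v.str;
}
```

Comparing sizeof(demo_value) with demo_largest_member() on your own platform shows whether the compiler added padding on top of the largest member.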
Now, if we look at the union again, we can see that only 5 PHP data types are represented here (long = int, double = float, str = string, HashTable = array, zend_object_value = object). So where did the rest of the data types go? As it turns out, this structure is enough to store the remaining types too. BOOL is stored using the long (int), NULL does not use the data segment at all, and RESOURCE also uses the long.
Type
Because the value union does not control how it is accessed, we need some other way to record the type of the variable, so that we know how to access the value. That is what the type byte is for (zend_uchar is an unsigned char, or one byte of memory). It holds one of the Zend type constants. There is really no magic to it: setting zval.type = IS_LONG marks the data as an integer. So this field together with the value field is enough to tell us the type and value of a PHP variable.
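The pairing of a type byte with a value union is often called a tagged union. Here is a minimal sketch of the idea, assuming a hypothetical demo_zval with made-up DEMO_* constants; none of these names come from PHP's source.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical type tags; PHP's real constants are IS_LONG, IS_DOUBLE, ... */
enum { DEMO_NULL, DEMO_LONG, DEMO_DOUBLE, DEMO_BOOL };

/* Hypothetical miniature zval: a value union plus a one-byte type tag. */
typedef struct demo_zval {
    union {
        long lval;
        double dval;
    } value;
    unsigned char type;
} demo_zval;

/* Reads the value as a double, using the tag to pick the union member. */
double demo_to_double(const demo_zval *z)
{
    assert(z != NULL);
    switch (z->type) {
    case DEMO_LONG:
    case DEMO_BOOL:                 /* booleans live in lval too */
        return (double)z->value.lval;
    case DEMO_DOUBLE:
        return z->value.dval;
    case DEMO_NULL:                 /* NULL carries no data at all */
    default:
        return 0.0;
    }
}
```

Without the tag, the reader of the union would have no way to know which member was written last.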
Is_ref
This field indicates whether the variable is a reference, that is, whether you have executed something like $foo = &$bar on the variable. If it is 0, the variable is not a reference; if it is 1, it is. It does not do much on its own. So, before we finish with _zval_struct, let's take a look at its fourth member.
RefCount
This field counts the number of PHP variables that point to this internal variable container. That is, if refcount is 1, one PHP variable uses this container. If refcount is 2, two PHP variables point to the same container. On its own, refcount does not carry much useful information, but used together with is_ref it forms the basis of the garbage collector and of copy-on-write. It allows us to use the same zval container for one or more PHP variables. The exact semantics of refcount are beyond the scope of this article; if you want to go further, I recommend reading the documentation on it.
That's all there is to the zval.
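To make the counting idea concrete, here is a rough sketch of reference counting with separation before a write. It is a simplification under stated assumptions: demo_box and its functions are hypothetical names, the container holds only a long, and is_ref is ignored entirely, so it does not reproduce PHP's actual rules.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical container: one value shared by refcount "variables". */
typedef struct demo_box {
    long value;
    unsigned int refcount;
} demo_box;

/* Allocates a container used by exactly one variable; NULL on failure. */
demo_box *demo_box_new(long value)
{
    demo_box *b = malloc(sizeof *b);

    if (b != NULL) {
        b->value = value;
        b->refcount = 1;
    }
    return b;
}

/* A second variable starts using the same container: no copy is made. */
demo_box *demo_box_share(demo_box *b)
{
    assert(b != NULL);
    b->refcount++;
    return b;
}

/* A variable stops using the container; the last user frees it. */
void demo_box_release(demo_box *b)
{
    assert(b != NULL && b->refcount > 0);
    if (--b->refcount == 0) {
        free(b);
    }
}

/* Call before writing: returns a container that only the caller uses. */
demo_box *demo_box_separate(demo_box *b)
{
    demo_box *copy;

    assert(b != NULL);
    if (b->refcount == 1) {
        return b;                   /* already exclusive, write in place */
    }
    copy = demo_box_new(b->value);  /* copy happens only on write */
    if (copy != NULL) {
        b->refcount--;
    }
    return copy;
}
```

Two variables that share a container cost one allocation until one of them is modified, which is the point of copy-on-write.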
How does it work?
Inside PHP, zvals are passed to functions like any other C variable: either as a memory segment or as a pointer to a memory segment (or a pointer to a pointer, and so on). Once we have the variable, we want to access the data inside it. How do we do that? We use macros defined in the zend_operators.h file, which make it easier to access the data in a zval. It is important to note that each macro comes in several copies that differ only in their suffix. For example, to determine the type of a zval, there is the Z_TYPE(zval) macro, which returns an integer representing the type of its zval argument. But there is also a Z_TYPE_P(zval_p) macro, which does the same thing as Z_TYPE(zval) but takes a pointer to a zval, and a Z_TYPE_PP(zval_pp) macro, which takes a pointer to a pointer. Apart from the kind of argument they accept, the macros are identical; in fact, we could just write Z_TYPE(*zval_p), but the _P and _PP variants make things easier.
We can use the VAL family of macros to get the value of a zval. Call Z_LVAL(zval) to get the integer value (for integer and resource data). Call Z_DVAL(zval) to get the floating-point value. There are many others as well. The key point is that to get the value of a zval in C, you need to (or at least should) use a macro. So, when we see a function using them, we know that it is extracting a value from a zval.
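The plain, _P and _PP naming pattern can be imitated in a few lines. The sketch below uses hypothetical DEMO_* macros on a hypothetical demo_zval; it mirrors the layering described above but is not copied from zend_operators.h.

```c
#include <assert.h>

#define DEMO_IS_LONG   1
#define DEMO_IS_DOUBLE 2

/* Hypothetical miniature zval. */
typedef struct demo_zval {
    union {
        long lval;
        double dval;
    } value;
    unsigned char type;
} demo_zval;

/* Base macros take a demo_zval; _P takes a pointer; _PP a pointer to pointer. */
#define DEMO_TYPE(z)      ((z).type)
#define DEMO_TYPE_P(zp)   DEMO_TYPE(*(zp))
#define DEMO_TYPE_PP(zpp) DEMO_TYPE(**(zpp))

#define DEMO_LVAL(z)      ((z).value.lval)
#define DEMO_LVAL_P(zp)   DEMO_LVAL(*(zp))
#define DEMO_LVAL_PP(zpp) DEMO_LVAL(**(zpp))

/* Reads an integer through a pointer to a pointer, as argument parsers do. */
long demo_read_long(demo_zval **zpp)
{
    assert(DEMO_TYPE_PP(zpp) == DEMO_IS_LONG);
    return DEMO_LVAL_PP(zpp);
}
```

Because every variant expands to the base macro, a change to the struct layout only needs to be made in one place.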
So, what about the type?
So far, we have only talked about the type and the value of a zval. As we all know, PHP does type juggling for us. So, if we like, we can treat a string as an integer. Internally this is done by the convert_to_type family of functions. To convert a zval to a string, the convert_to_string function is called. It changes the type of the zval that we pass to it. So, if you see a function calling one of these, you know that it is converting the data type of its argument.
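The key property of these functions is that they modify the zval in place instead of returning a new one. A minimal sketch of that behaviour follows, with a hypothetical demo_convert_to_long on a hypothetical demo_zval. It assumes any double it receives fits in a long and handles only four made-up types.

```c
#include <assert.h>
#include <stddef.h>

enum { DEMO_IS_NULL, DEMO_IS_LONG, DEMO_IS_DOUBLE, DEMO_IS_BOOL };

/* Hypothetical miniature zval. */
typedef struct demo_zval {
    union {
        long lval;
        double dval;
    } value;
    unsigned char type;
} demo_zval;

/* Converts *z to an integer in place: both value and type tag change.
 * Assumption: a double passed in is within the range of long. */
void demo_convert_to_long(demo_zval *z)
{
    assert(z != NULL);
    switch (z->type) {
    case DEMO_IS_DOUBLE:
        z->value.lval = (long)z->value.dval;    /* truncates toward zero */
        break;
    case DEMO_IS_BOOL:
    case DEMO_IS_LONG:
        break;                                  /* already stored in lval */
    case DEMO_IS_NULL:
    default:
        z->value.lval = 0;
        break;
    }
    z->type = DEMO_IS_LONG;
}
```

After the call the original double is gone, which is why callers that must keep the caller's value need a copy first.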
Zend_parse_parameters
The previous article introduced the zend_parse_parameters function. Now that we know how PHP variables are represented in C, let's take a closer look at it.
ZEND_API int zend_parse_parameters(int num_args TSRMLS_DC, const char *type_spec, ...)
{
    va_list va;
    int retval;
    RETURN_IF_ZERO_ARGS(num_args, type_spec, 0);
    va_start(va, type_spec);
    retval = zend_parse_va_args(num_args, type_spec, &va, 0 TSRMLS_CC);
    va_end(va);
    return retval;
}
At first glance, this looks confusing. The point to understand is that the va_list type is just a variable argument list, the C counterpart of '...'. It is therefore similar in spirit to PHP's func_get_args(). With that in mind, we can see that zend_parse_parameters immediately calls zend_parse_va_args. Let's go down and look at that function ...
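The wrapper pattern used here, a variadic function that starts a va_list and hands a pointer to it to a worker, is standard C. A small sketch with hypothetical demo_sum and demo_vsum functions shows the same shape; it assumes every variadic argument is a long.

```c
#include <assert.h>
#include <stdarg.h>

/* Worker: consumes 'count' long arguments from an already started list. */
long demo_vsum(int count, va_list *va)
{
    long total = 0;
    int i;

    assert(count >= 0);
    for (i = 0; i < count; i++) {
        total += va_arg(*va, long);     /* fetch the next argument as long */
    }
    return total;
}

/* Variadic front end: starts the list, delegates, then cleans up. */
long demo_sum(int count, ...)
{
    va_list va;
    long total;

    va_start(va, count);                /* 'count' is the last named argument */
    total = demo_vsum(count, &va);
    va_end(va);
    return total;
}
```

A call such as demo_sum(3, 1L, 2L, 3L) must pass long values explicitly, because C performs no type checking on variadic arguments. That is also why zend_parse_parameters needs the type_spec string.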
This function looks very interesting. At first glance, it seems to do a lot. But take a closer look. First, we see a for loop. This loop iterates over the type_spec string passed in from zend_parse_parameters. Inside the loop, we can see that it simply counts the number of parameters that are expected. Working out exactly how it does this is left to the reader.
Looking further down, we see some sanity checks (that the parameters were passed correctly) and an error check that enough parameters were passed. Next comes the loop we are interested in. This loop is what actually parses the parameters. Inside it, there are three if statements. The first handles the optional-parameter specifier. The second handles var-args (a variable number of arguments). The third if statement is the one we care about. As you can see, this is where zend_parse_arg() is called. Let's take a closer look at that function ...
Looking further down, we find something very interesting. This function calls another function (zend_parse_arg_impl) and then deals with any error information. This is a very common pattern in PHP: the error-handling work is pulled up into the parent function. It keeps the implementation and the error handling separate and maximizes reuse. You can dig further into this function; it is very easy to understand. But let's take a closer look at zend_parse_arg_impl() ...
Now we are really at the point where PHP's internals parse the parameters. Let's look at the first branch of the switch statement, which parses integer parameters. From here on, things should be easy to follow. So, let's start with the first line of the branch:
long *p = va_arg(*va, long *);
If you remember what we said earlier, va_arg is how the C language handles variable arguments. So this line defines a pointer to an integer (long is an integer type in C). In short, it fetches the pointer with va_arg, which means it gets the pointer that was passed as an argument to zend_parse_parameters. This is the pointer through which the result will be stored when the branch finishes. Next, we see a switch that branches on the type of the variable (zval) passed in. Let's look at the IS_STRING branch first (this is what runs when a string is passed to an integer parameter).
case IS_STRING:
    {
        double d;
        int type;
        if ((type = is_numeric_string(Z_STRVAL_PP(arg), Z_STRLEN_PP(arg), p, &d, -1)) == 0) {
            return "long";
        } else if (type == IS_DOUBLE) {
            if (c == 'L') {
                if (d > LONG_MAX) {
                    *p = LONG_MAX;
                    break;
                } else if (d < LONG_MIN) {
                    *p = LONG_MIN;
                    break;
                }
            }
            *p = zend_dval_to_lval(d);
        }
    }
    break;
Now, this is not as complicated as it looks. It all comes down to the is_numeric_string function. Basically, the function checks whether the string is numeric and returns 0 if it is not. If it is, it parses the string into a variable (an integer or a float, p or d) and returns the data type. So we can see that if the string is not purely numeric, the branch returns the string "long". This string is used by the wrapping error-handling function. Otherwise, if the string represents a double (float), the code checks (for the 'L' specifier) whether the number is too large to be stored as an integer and then uses the zend_dval_to_lval function to convert the float to an integer. That's all there is to it. We have parsed our string argument. Now let's look at the other branches:
case IS_DOUBLE:
    if (c == 'L') {
        if (Z_DVAL_PP(arg) > LONG_MAX) {
            *p = LONG_MAX;
            break;
        } else if (Z_DVAL_PP(arg) < LONG_MIN) {
            *p = LONG_MIN;
            break;
        }
    }
case IS_NULL:
case IS_LONG:
case IS_BOOL:
    convert_to_long_ex(arg);
    *p = Z_LVAL_PP(arg);
    break;
Here we see the handling of floating-point numbers, which is similar to the handling of floats in the string branch (coincidence?). One important thing to note is that unless the specifier is an uppercase 'L' and the value is out of range, a float is treated the same way as the other types (the case has no break, so it falls through). Now we also meet an interesting function, convert_to_long_ex(). It belongs to the convert_to_type() family that we talked about earlier, which converts its argument to a specific type. The only difference is that, because it changes the data type, it first separates (copies) the passed-in variable if that variable is not a reference. This is copy-on-write at work. So when we pass a float by value to an integer parameter, the function sees it as an integer, but our original variable still holds the float.
case IS_ARRAY:
case IS_OBJECT:
case IS_RESOURCE:
default:
    return "long";
Finally, we have three more case branches. We can see that if you pass an array, an object, a resource, or any other unknown type to an integer parameter, you get an error.
We leave the remaining sections to the reader. Reading through the zend_parse_arg_impl function is really useful for better understanding PHP's type juggling system. Read it one section at a time, and try to keep track of the state and type of the various parameters in C.
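The whole integer branch can be condensed into a standalone sketch. Everything below is hypothetical: demo_arg, the DEMO_* constants and demo_parse_long are made-up names, strtol and strtod stand in for is_numeric_string (they accept slightly different input), and returning 0 for an out-of-range double under 'l' is a placeholder choice rather than what zend_dval_to_lval does. It also uses a helper function where PHP uses a fall-through case.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stddef.h>
#include <stdlib.h>

enum demo_type { DEMO_IS_LONG, DEMO_IS_DOUBLE, DEMO_IS_STRING, DEMO_IS_ARRAY };

/* Hypothetical miniature argument: a type tag plus a value union. */
typedef struct demo_arg {
    enum demo_type type;
    union {
        long lval;
        double dval;
        const char *sval;
    } value;
} demo_arg;

/* Saturates for 'L'; otherwise truncates, or yields 0 if d does not fit. */
static long demo_double_param(double d, char spec)
{
    /* -(double)LONG_MIN is exactly LONG_MAX + 1 on two's-complement systems. */
    if (spec == 'L') {
        if (d >= -(double)LONG_MIN) return LONG_MAX;
        if (d < (double)LONG_MIN) return LONG_MIN;
    }
    if (d >= (double)LONG_MIN && d < -(double)LONG_MIN) {
        return (long)d;
    }
    return 0;                           /* placeholder, not PHP's behaviour */
}

/* Returns NULL on success, or the expected type name on failure. */
const char *demo_parse_long(const demo_arg *arg, char spec, long *p)
{
    char *end;
    long l;
    double d;

    assert(arg != NULL && p != NULL);
    switch (arg->type) {
    case DEMO_IS_STRING:
        errno = 0;
        l = strtol(arg->value.sval, &end, 10);
        if (end != arg->value.sval && *end == '\0' && errno != ERANGE) {
            *p = l;                     /* the whole string is an integer */
            break;
        }
        d = strtod(arg->value.sval, &end);
        if (end == arg->value.sval || *end != '\0') {
            return "long";              /* not a numeric string */
        }
        *p = demo_double_param(d, spec);
        break;
    case DEMO_IS_DOUBLE:
        *p = demo_double_param(arg->value.dval, spec);
        break;
    case DEMO_IS_LONG:
        *p = arg->value.lval;
        break;
    default:
        return "long";                  /* arrays and the like are rejected */
    }
    return NULL;
}
```

As in the PHP source, the returned string names the type that was expected, so a caller can build an error message from it without the parsing code knowing anything about error reporting.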
Next section
The next article will be on Nikic's blog (we'll be jumping back and forth throughout this series). In it, he covers everything about arrays.