One feature common to every programming language is the ability to store and retrieve information, and PHP is no exception. Although many languages require that all variables be declared before use and that their type be fixed, PHP lets programmers create variables as they use them and store any type of information the language can express. A variable's type is also converted automatically as needed.
Because you have already used PHP from user space, you know this concept as "loose typing". In this chapter you will see how that information is encoded in PHP's parent language, C, which is strictly typed.
Of course, encoding data is only half the work. To keep track of all these pieces of information, each variable also needs a label and a container. From user space you know these as variable names and scopes.
Data type
The unit of data storage in PHP is the zval, also known as the Zend Value. It is a struct with only four members, defined in Zend/zend.h as follows:
typedef struct _zval_struct {
    zvalue_value value;
    zend_uint refcount;
    zend_uchar type;
    zend_uchar is_ref;
} zval;
Most of these members have storage types that are easy to guess: refcount is an unsigned integer, and type and is_ref are unsigned characters. The value member, however, is actually a union. In PHP 5 it is defined as follows:
typedef union _zvalue_value {
    long lval;
    double dval;
    struct {
        char *val;
        int len;
    } str;
    HashTable *ht;
    zend_object_value obj;
} zvalue_value;
This union lets Zend store the many different types of data a PHP variable can hold in a single, unified structure.
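The idea of a union paired with a type tag can be tried outside the engine. The following is a minimal sketch, not engine code: mini_zval, the MINI_* constants, and the helper functions are hypothetical names invented for illustration, and only three of the union members are modeled.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the engine's type constants. */
enum mini_type { MINI_NULL, MINI_LONG, MINI_DOUBLE, MINI_STRING };

/* A cut-down value container: one union for the data, one tag that
 * records which union member is currently valid. */
typedef struct {
    union {
        long lval;
        double dval;
        struct {
            const char *val;
            int len;
        } str;
    } value;
    unsigned char type;
} mini_zval;

/* Store an integer and tag the container accordingly. */
static mini_zval mini_long(long l)
{
    mini_zval z;
    z.type = MINI_LONG;
    z.value.lval = l;
    return z;
}

/* Store a floating-point number and tag the container accordingly. */
static mini_zval mini_double(double d)
{
    mini_zval z;
    z.type = MINI_DOUBLE;
    z.value.dval = d;
    return z;
}

/* Read the value as a double, consulting the tag before touching the union. */
static double mini_as_double(const mini_zval *z)
{
    switch (z->type) {
        case MINI_LONG:
            return (double)z->value.lval;
        case MINI_DOUBLE:
            return z->value.dval;
        default:
            return 0.0;
    }
}
```

Because all members share the same storage, reading a member other than the one the tag names yields meaningless data, which is why the tag is always checked first.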
Zend currently defines the eight data types listed in the following table:
Type value | Purpose
IS_NULL | This type is assigned automatically to uninitialized variables when they are first used. It can also be assigned explicitly in user space with the built-in NULL constant. It provides a special "no data" value that is distinct from Boolean FALSE and integer 0.
IS_BOOL | A Boolean variable is in one of two possible states: TRUE or FALSE. Conditional expressions in user space control structures (if, while, the ternary operator, for) are implicitly converted to Boolean during evaluation.
IS_LONG | Integer values, stored using the host system's signed long type (the lval member of the union).
IS_DOUBLE | Floating-point data, stored using the host system's signed double type. Floating-point numbers are not stored with exact precision. Instead, a formula expresses the value with limited precision: value = sign * mantissa * 2^exponent (from the BSD Library Functions Manual, float(3)). This notation lets the computer store a wide range of values, positive or negative: 8 bytes can represent magnitudes from 2.225*10^(-308) to 1.798*10^(308). Unfortunately, a number that is clean in decimal is not always clean as a binary fraction. Decimal 0.5 is exactly 0.1 in binary, but decimal 0.8 is the endlessly repeating binary fraction 0.1100110011..., and converting it back to decimal cannot recover the binary digits that were discarded. Similarly, 1/3 can be written in decimal as 0.333333; the two values are very close, but not equal, because 3*0.333333 is not 1.0. This imprecision is a frequent source of confusion when working with floating-point numbers on computers. (These range limits are those of common 32-bit platforms; the range may differ from system to system.)
IS_STRING | The most universal data type in PHP is the string, which is stored just as an experienced C programmer would expect: a block of memory large enough to hold all the bytes of the string is allocated, and a pointer to it is saved in the host zval. Note that the length of a PHP string is always stored explicitly in the zval. This allows a string to contain NULL bytes without being truncated. This property is referred to from here on as "binary safety", because such a string can safely hold any kind of binary data. Note also that the memory allocated for a PHP string is always at least its length plus one. The last byte holds a terminating NULL character, so functions that do not care about binary safety can be passed the string pointer directly.
IS_ARRAY | An array is a special-purpose variable whose only function is to organize other variables. Unlike the array concept in C, a PHP array is not a vector of a single data type (such as zval arrayofzvals[];). It is a complex set of data buckets, implemented internally as a HashTable. Each HashTable element (bucket) holds two pieces of information: a label and data. For PHP arrays, the label is the associative or numeric key of the array, and the data is the variable (zval) that the key refers to.
IS_OBJECT | Objects take the multi-element data storage of arrays and add methods, access modifiers, scoped constants, and special event handlers. For an extension developer, writing object-oriented code that works equally well in PHP 4 and PHP 5 is a real challenge, because the internal object model changed considerably between Zend Engine 1 (PHP 4) and Zend Engine 2 (PHP 5).
IS_RESOURCE | Some data types cannot be mapped to user space. For example, stdio's FILE pointer or libmysqlclient's connection handle cannot simply be mapped to an array of scalar values. To shield user space script writers from these problems, PHP provides a generic resource data type. The implementation of resource types is covered in Chapter 9, "The Resource Data Type"; for now it is enough to know that they exist.
The IS_* constants in the table above are stored in the type member of the zval struct. They determine which part of the value member should be examined when reading the variable's value.
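As an aside on the IS_DOUBLE entry in the table: the claim that decimal 0.5 is exactly binary 0.1 while decimal 0.8 repeats forever can be checked with integer arithmetic alone. The sketch below is only an illustration; binary_fraction is a hypothetical helper, not part of PHP or Zend.

```c
#include <stddef.h>

/* Writes up to max_digits binary digits of the fraction num/den
 * (with 0 <= num < den) into out, which must hold max_digits + 1 bytes.
 * Returns 1 if the expansion terminated exactly, 0 if it was cut off. */
static int binary_fraction(unsigned num, unsigned den, char *out, size_t max_digits)
{
    size_t i = 0;

    while (num != 0 && i < max_digits) {
        num *= 2;                 /* shift one binary place to the left */
        if (num >= den) {
            out[i++] = '1';
            num -= den;           /* keep only the remaining fraction */
        } else {
            out[i++] = '0';
        }
    }
    out[i] = '\0';
    return num == 0;
}
```

Calling it with 5/10 stops after a single digit, while 8/10 is still producing digits when the buffer runs out. Any storage format with a fixed number of binary digits therefore has to drop part of the second value.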
The most obvious method to check a data type is as follows:
void describe_zval(zval *foo)
{
    if (foo->type == IS_NULL) {
        php_printf("The variable is NULL");
    } else {
        php_printf("The variable is of type %d", foo->type);
    }
}
Obvious, but wrong.
Well, not wrong, but certainly not the preferred approach. The Zend header files contain a large set of zval access macros that extension authors are expected to use when examining zval data. The main reason is to avoid incompatibilities when the engine API changes, but as a side benefit the code often becomes easier to read. Here is the same function again, this time using the Z_TYPE_P() macro:
void describe_zval(zval *foo)
{
    if (Z_TYPE_P(foo) == IS_NULL) {
        php_printf("The variable is NULL");
    } else {
        php_printf("The variable is of type %d", Z_TYPE_P(foo));
    }
}
The _P suffix of this macro indicates that the parameter passed has one level of indirection, that is, a zval*. Two more macros in the set, Z_TYPE() and Z_TYPE_PP(), expect parameters of type zval (no indirection) and zval** (two levels of indirection), respectively.
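One way such a family of macros can be layered is sketched below. This is an illustration under stated assumptions rather than a copy of the Zend headers: demo_zval and the DEMO_TYPE* names are hypothetical, and only the type member is modeled.

```c
/* Hypothetical stand-in for zval; only the type tag is modeled. */
typedef struct {
    unsigned char type;
} demo_zval;

/* The base macro takes the struct itself; each suffix adds one dereference. */
#define DEMO_TYPE(zv)        ((zv).type)
#define DEMO_TYPE_P(zv_p)    DEMO_TYPE(*(zv_p))
#define DEMO_TYPE_PP(zv_pp)  DEMO_TYPE(**(zv_pp))

/* Returns 1 when all three access paths report the same type. */
static int demo_same_type(demo_zval zv, demo_zval *pzv, demo_zval **ppzv)
{
    return DEMO_TYPE(zv) == DEMO_TYPE_P(pzv)
        && DEMO_TYPE_P(pzv) == DEMO_TYPE_PP(ppzv);
}
```

Because each macro expands to an lvalue, the same names work for assignment as well as for reading, for example DEMO_TYPE_P(p) = 5.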
Note:
This example uses a special output function, php_printf(), to display a piece of data. Its syntax is identical to that of stdio's printf(). However, it adds special handling for the web server SAPI and uses PHP's output buffering mechanism to improve performance. You will learn more about this function and its relative PHPWRITE() in Chapter 5, "Your First Extension".
Data value
As with types, zval values can be examined with a set of three macros. These macros always start with Z_ and optionally end with _P or _PP, depending on their level of indirection.
For the simple scalar types (Boolean, long, and double) the macros are abbreviated BVAL, LVAL, and DVAL.
void display_values(zval boolzv, zval *longpzv, zval **doubleppzv)
{
    if (Z_TYPE(boolzv) == IS_BOOL) {
        php_printf("The value of the boolean is: %s\n",
            Z_BVAL(boolzv) ? "true" : "false");
    }
    if (Z_TYPE_P(longpzv) == IS_LONG) {
        php_printf("The value of the long is: %ld\n", Z_LVAL_P(longpzv));
    }
    if (Z_TYPE_PP(doubleppzv) == IS_DOUBLE) {
        php_printf("The value of the double is: %f\n", Z_DVAL_PP(doubleppzv));
    }
}
Because the string variable contains two members, it has a pair of macros representing char * (STRVAL) and int (STRLEN) members respectively:
void display_string(zval *zstr)
{
    if (Z_TYPE_P(zstr) != IS_STRING) {
        php_printf("The wrong datatype was passed!\n");
        return;
    }
    PHPWRITE(Z_STRVAL_P(zstr), Z_STRLEN_P(zstr));
}
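Passing the length alongside the pointer is what keeps string output binary safe. The standalone sketch below contrasts the two views of the same bytes; demo_str and both helpers are hypothetical names, not Zend API.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the str member of the value union. */
typedef struct {
    const char *val;
    int len;
} demo_str;

/* Measures the string the way a plain C string function would:
 * it stops at the first NUL byte. */
static size_t c_view_len(const demo_str *s)
{
    return strlen(s->val);
}

/* Measures the string the binary-safe way: it trusts the stored length. */
static size_t binary_safe_len(const demo_str *s)
{
    return (size_t)s->len;
}
```

For the five bytes 'a', 'b', NUL, 'c', 'd', the first helper reports 2 and the second reports 5, so only code that honors the stored length sees the whole value.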
The array data type is stored as a HashTable* and can be accessed using Z_ARRVAL(zv), Z_ARRVAL_P(pzv), and Z_ARRVAL_PP(ppzv). When reading older PHP core and PECL module code, you may encounter the HASH_OF() macro, which expects a zval* parameter. This macro is generally equivalent to Z_ARRVAL_P(). However, it is obsolete and should not be used in new code.
The internal representation of an object is more complex and has a correspondingly large number of access macros: OBJ_HANDLE returns the handle identifier, OBJ_HT returns the handler table, OBJCE returns the class definition, OBJPROP returns the HashTable of properties, and OBJ_HANDLER is used to work with a specific handler method in the OBJ_HT table. Do not be put off by so many object access macros. They are covered in detail in Chapter 10, "PHP4 Objects", and Chapter 11, "PHP5 Objects".
Within a zval, the resource data type is stored as a simple integer, which can be accessed through the RESVAL family of macros. This integer is passed to the zend_fetch_resource() function, which looks up the resource object in the list of registered resources. Resource data types are discussed in depth in Chapter 9.
Data creation
Now that you know how to retrieve data from a zval, it is time to create some of your own. Although a zval can be declared as a direct variable at the top of a function, its storage is then local to that function. For the data to leave the function and reach user space, it would have to be copied.
Because most of the zvals you create are meant to reach user space, you will want to allocate a block of memory for each one and assign it to a zval* pointer. As with the previous "obvious" solution, using malloc(sizeof(zval)) is not the right answer. Instead, use another Zend macro: MAKE_STD_ZVAL(pzv). This macro allocates space in an optimized way alongside other zvals, automatically handles out-of-memory errors (explained in the next chapter), and initializes the refcount and is_ref members of the new zval.
In addition to MAKE_STD_ZVAL(), you may encounter other zval* creation macros, such as ALLOC_INIT_ZVAL(). The only difference between this macro and MAKE_STD_ZVAL() is that it also initializes the data type of the zval* to IS_NULL.
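The difference between the two creation macros can be pictured with a plain C sketch. It is an assumption-laden illustration, not the engine's implementation: demo_zval, demo_make_std(), and demo_alloc_init() are hypothetical, malloc() stands in for the engine's allocator, abort() stands in for its out-of-memory handling, and the initial values refcount = 1 and is_ref = 0 reflect the PHP 5 behavior as far as I know.

```c
#include <stdlib.h>

enum { DEMO_IS_NULL = 0 };   /* hypothetical type constant */

/* Hypothetical stand-in for zval with a single data member. */
typedef struct {
    long lval;
    unsigned int refcount;
    unsigned char type;
    unsigned char is_ref;
} demo_zval;

/* Allocates a value and sets only its bookkeeping members.
 * The type is deliberately left for the caller to set. */
static demo_zval *demo_make_std(void)
{
    demo_zval *z = malloc(sizeof *z);

    if (z == NULL) {
        abort();             /* stand-in for the engine's error handling */
    }
    z->refcount = 1;
    z->is_ref = 0;
    return z;
}

/* Same as above, but additionally marks the new value as NULL. */
static demo_zval *demo_alloc_init(void)
{
    demo_zval *z = demo_make_std();

    z->type = DEMO_IS_NULL;
    z->lval = 0;
    return z;
}
```

With the first helper the caller must set a type before the value is read; with the second the value is already a valid NULL.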
Once storage space is available, you can fill your new zval with some information. After reading the earlier section on data values, you are probably ready to use the Z_TYPE_P() and Z_SOMEVAL_P() macros to set up your new variable. That seems like the "obvious" solution, doesn't it?
Once again, the obvious solution is not the correct one.
Zend exposes another set of macros for setting zval* values. The following are the new macros, each paired with its expansion in terms you are already familiar with:
ZVAL_NULL(pzv);
    Z_TYPE_P(pzv) = IS_NULL;
Although this macro saves nothing over the more direct version, it is included for completeness.
ZVAL_BOOL(pzv, b);
    Z_TYPE_P(pzv) = IS_BOOL;
    Z_BVAL_P(pzv) = b ? 1 : 0;
ZVAL_TRUE(pzv);
    ZVAL_BOOL(pzv, 1);
ZVAL_FALSE(pzv);
    ZVAL_BOOL(pzv, 0);
Note that any non-zero value given to ZVAL_BOOL() produces a true value. When hard-coding values in internal code, it is considered good practice to use 1 explicitly for true. The ZVAL_TRUE() and ZVAL_FALSE() macros are provided as a convenience and can sometimes make code more readable.
ZVAL_LONG(pzv, l);
    Z_TYPE_P(pzv) = IS_LONG;
    Z_LVAL_P(pzv) = l;
ZVAL_DOUBLE(pzv, d);
    Z_TYPE_P(pzv) = IS_DOUBLE;
    Z_DVAL_P(pzv) = d;
The basic scalar macros are as simple as they come: set the zval's type and assign it a value.
ZVAL_STRINGL(pzv, str, len, dup);
    Z_TYPE_P(pzv) = IS_STRING;
    Z_STRLEN_P(pzv) = len;
    if (dup) {
        /* estrndup() copies len bytes and appends the terminating NULL */
        Z_STRVAL_P(pzv) = estrndup(str, len);
    } else {
        Z_STRVAL_P(pzv) = str;
    }
ZVAL_STRING(pzv, str, dup);
    ZVAL_STRINGL(pzv, str, strlen(str), dup);
Here is where zval creation becomes interesting. Strings, like arrays, objects, and resources, need additional memory allocated for their data storage. The next chapter explores the pitfalls of memory management. For now, just note that when dup is 1, new memory is allocated and the string's contents are copied, and when dup is 0, the zval is simply pointed at the existing string data.
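The practical consequence of the dup flag is easiest to see when the original buffer changes afterward. The sketch below uses plain malloc() in place of the engine's allocator; demo_str and demo_set_stringl() are hypothetical names chosen for illustration.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the string part of a zval. */
typedef struct {
    char *val;
    int len;
} demo_str;

/* dup != 0: take a private, NUL-terminated copy of the bytes.
 * dup == 0: borrow the caller's buffer without copying. */
static void demo_set_stringl(demo_str *dst, char *str, int len, int dup)
{
    dst->len = len;
    if (dup) {
        dst->val = malloc((size_t)len + 1);
        if (dst->val == NULL) {
            abort();         /* stand-in for the engine's error handling */
        }
        memcpy(dst->val, str, (size_t)len);
        dst->val[len] = '\0';
    } else {
        dst->val = str;
    }
}
```

If the caller later overwrites its buffer, the duplicated string keeps its old contents while the borrowed one changes with it. Ownership also differs: only the duplicated copy may be freed by whoever holds the demo_str.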
ZVAL_RESOURCE(pzv, res);
    Z_TYPE_P(pzv) = IS_RESOURCE;
    Z_RESVAL_P(pzv) = res;
Recall that a resource stores only a simple integer in the zval, which is used as a lookup key in a resource table managed by Zend. The ZVAL_RESOURCE() macro is therefore much like ZVAL_LONG(), but with a different type.
Review exercise: data types, values, and creation
static void eae_001_zval_dump_real(zval *z, int level)
{
    HashTable *ht;
    int ret;
    char *key;
    ulong index; /* zend_hash_get_current_key() expects a ulong * for numeric keys */
    zval **pData;

    switch (Z_TYPE_P(z)) {
        case IS_NULL:
            php_printf("%*stype = null, refcount = %d%s\n", level * 4, "",
                Z_REFCOUNT_P(z), Z_ISREF_P(z) ? ", is_ref " : "");
            break;
        case IS_BOOL:
            php_printf("%*stype = bool, refcount = %d%s, value = %s\n", level * 4, "",
                Z_REFCOUNT_P(z), Z_ISREF_P(z) ? ", is_ref " : "",
                Z_BVAL_P(z) ? "true" : "false");
            break;
        case IS_LONG:
            php_printf("%*stype = long, refcount = %d%s, value = %ld\n", level * 4, "",
                Z_REFCOUNT_P(z), Z_ISREF_P(z) ? ", is_ref " : "", Z_LVAL_P(z));
            break;
        case IS_STRING:
            php_printf("%*stype = string, refcount = %d%s, value = \"%s\", len = %d\n", level * 4, "",
                Z_REFCOUNT_P(z), Z_ISREF_P(z) ? ", is_ref " : "",
                Z_STRVAL_P(z), Z_STRLEN_P(z));
            break;
        case IS_DOUBLE:
            php_printf("%*stype = double, refcount = %d%s, value = %0.6f\n", level * 4, "",
                Z_REFCOUNT_P(z), Z_ISREF_P(z) ? ", is_ref " : "", Z_DVAL_P(z));
            break;
        case IS_RESOURCE:
            /* the resource id is stored in a long, so print it with %ld */
            php_printf("%*stype = resource, refcount = %d%s, resource_id = %ld\n", level * 4, "",
                Z_REFCOUNT_P(z), Z_ISREF_P(z) ? ", is_ref " : "", Z_RESVAL_P(z));
            break;
        case IS_ARRAY:
            ht = Z_ARRVAL_P(z);
            zend_hash_internal_pointer_reset(ht);
            /* zend_hash_has_more_elements() returns SUCCESS or FAILURE */
            php_printf("%*stype = array, refcount = %d%s, value = %s\n", level * 4, "",
                Z_REFCOUNT_P(z), Z_ISREF_P(z) ? ", is_ref " : "",
                zend_hash_has_more_elements(ht) == SUCCESS ? "" : "empty");
            while (HASH_KEY_NON_EXISTANT != (ret = zend_hash_get_current_key(ht, &key, &index, 0))) {
                if (HASH_KEY_IS_STRING == ret) {
                    php_printf("%*skey is string \"%s\"", (level + 1) * 4, "", key);
                } else if (HASH_KEY_IS_LONG == ret) {
                    php_printf("%*skey is long %lu", (level + 1) * 4, "", index);
                }
                /* the element is stored as a zval *, so it comes back as a zval ** */
                if (zend_hash_get_current_data(ht, (void **)&pData) == SUCCESS) {
                    eae_001_zval_dump_real(*pData, level + 1);
                }
                zend_hash_move_forward(ht);
            }
            zend_hash_internal_pointer_end(Z_ARRVAL_P(z));
            break;
        case IS_OBJECT:
            php_printf("%*stype = object, refcount = %d%s\n", level * 4, "",
                Z_REFCOUNT_P(z), Z_ISREF_P(z) ? ", is_ref " : "");
            break;
        default:
            php_printf("%*sunknown type, refcount = %d%s\n", level * 4, "",
                Z_REFCOUNT_P(z), Z_ISREF_P(z) ? ", is_ref " : "");
            break;
    }
}

PHP_FUNCTION(eae_001_zval_dump)
{
    zval *z;

    if (zend_parse_parameters(ZEND_NUM_ARGS() TSRMLS_CC, "z", &z) == FAILURE) {
        return;
    }
    eae_001_zval_dump_real(z, 0);
    RETURN_NULL();
}

PHP_FUNCTION(eae_001_zval_make)
{
    zval *z;

    MAKE_STD_ZVAL(z);

    /* reuse one zval, giving it each scalar type in turn */
    ZVAL_NULL(z);
    eae_001_zval_dump_real(z, 0);
    ZVAL_TRUE(z);
    eae_001_zval_dump_real(z, 0);
    ZVAL_FALSE(z);
    eae_001_zval_dump_real(z, 0);
    ZVAL_LONG(z, 100);
    eae_001_zval_dump_real(z, 0);
    ZVAL_DOUBLE(z, 100.0);
    eae_001_zval_dump_real(z, 0);
    /* dup = 0: z points at the literal itself, no copy is made */
    ZVAL_STRING(z, "100", 0);
    eae_001_zval_dump_real(z, 0);
}
Data storage
You have already used PHP from the user space side, so you should be familiar with arrays: any number of PHP variables (zvals) can be placed in a single container (an array), each with a numeric or string name (its label, or key).
Unsurprisingly, every variable in a PHP script can itself be found in an array. When you create a variable and assign it a value, Zend places that value in an internal array called a symbol table.
One symbol table defines the global scope. It is initialized when a request starts, just before the extensions' RINIT methods are called, and it is destroyed after script execution completes and the subsequent RSHUTDOWN methods have run.
When a user space function or object method is called, a new symbol table is allocated for the lifetime of that function or method and becomes the active symbol table. If script execution is not currently inside a function or method, the global symbol table is considered active.
If you look at the implementation of the executor globals structure (defined in Zend/zend_globals.h), you will see the following two element definitions:
struct _zend_executor_globals {
    ...
    HashTable symbol_table;
    HashTable *active_symbol_table;
    ...
};
symbol_table, accessed as EG(symbol_table), is always the global variable scope. It is much like the $GLOBALS variable in user space, which corresponds to the global scope of a PHP script. In fact, $GLOBALS is just a user space wrapper around EG(symbol_table).
The other element, active_symbol_table, is accessed as EG(active_symbol_table) and represents whichever variable scope is active at the moment.
One key point to note here: EG(symbol_table), unlike almost every other HashTable you will encounter when working with the PHP and Zend APIs, is a direct variable. Almost all functions that operate on HashTables expect an indirect HashTable* as their parameter, so you need to take its address when using it: &EG(symbol_table).
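The distinction between the embedded table and the pointer can be reproduced in a few lines of plain C. In this sketch demo_table, demo_globals, and the helper functions are hypothetical stand-ins, not the engine's types.

```c
#include <stddef.h>

/* Hypothetical stand-in for HashTable. */
typedef struct {
    int count;
} demo_table;

/* Mirrors the shape of the two members discussed above:
 * one table embedded directly, one pointer to a table. */
struct demo_globals {
    demo_table  symbol_table;
    demo_table *active_symbol_table;
};

/* Like most table functions, this one takes a pointer. */
static int demo_count(const demo_table *ht)
{
    return ht->count;
}

/* Makes the embedded global table the active one. */
static void demo_enter_global_scope(struct demo_globals *eg)
{
    eg->active_symbol_table = &eg->symbol_table;
}
```

A call such as demo_count(&eg.symbol_table) needs the address-of operator, while demo_count(eg.active_symbol_table) does not, which is the same asymmetry as between &EG(symbol_table) and EG(active_symbol_table).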
Consider the following two code blocks, which are functionally equivalent.
/* PHP implementation */
$foo = 'bar';
/* C implementation */
{
    zval *fooval;

    MAKE_STD_ZVAL(fooval);
    ZVAL_STRING(fooval, "bar", 1);
    ZEND_SET_SYMBOL(EG(active_symbol_table), "foo", fooval);
}
First, a new zval is allocated with MAKE_STD_ZVAL() and its value is initialized to the string "bar". Next comes a new macro call, which adds fooval to the currently active symbol table under the variable name "foo". Because no user space function is active at the moment, EG(active_symbol_table) == &EG(symbol_table), which ultimately means the variable is stored in the global scope.
Data retrieval
To retrieve a variable from user space, you look it up in the symbol table that stores it. The following code segment shows how to use the zend_hash_find() function for this purpose:
{
    zval **fooval;

    if (zend_hash_find(EG(active_symbol_table), "foo", sizeof("foo"),
                       (void**)&fooval) == SUCCESS) {
        php_printf("Got the value of $foo!");
    } else {
        php_printf("$foo is not defined.");
    }
}
A few things in this example may look strange. Why is fooval declared with two levels of indirection? Why is sizeof() used to determine the length of "foo"? Why is &fooval, which evaluates to a zval***, cast to void**? If you asked yourself all three of these questions, give yourself a pat on the back.
First, it is worth knowing that HashTables are not used only for user space variables. The HashTable structure is used throughout the engine and can even store non-pointer data. A HashTable bucket has a fixed size, however, so to store data of any size, the HashTable allocates a block of memory to hold the data being stored. For variables, what is stored is a zval*, so the HashTable storage mechanism allocates a block of memory large enough to hold a pointer. The bucket uses that new pointer to carry the zval*, so what you effectively end up with inside the HashTable is a zval**. A HashTable is perfectly capable of storing a complete zval, so why store a zval* like this? The reason is discussed in the next chapter.
When retrieving data, the HashTable knows only that it has a pointer to something. To place that pointer in the calling function's local storage, the caller passes the address of its local pointer variable, which results in a variable of unknown type with two levels of indirection (that is, void**). In this case you know that the "unknown type" is zval*, so the type you pass to zend_hash_find() looks different to the compiler: it has three levels of indirection rather than two. The cast is there to suppress the compiler warning this would otherwise cause.
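The levels of indirection can be traced in a self-contained sketch. Everything here is hypothetical and much simpler than the real HashTable: demo_table holds a single entry, demo_find() plays the role of a generic lookup, and its return values 0 and -1 merely stand in for SUCCESS and FAILURE.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for zval. */
typedef struct {
    long lval;
} demo_zval;

/* A one-entry table. The slot holds a demo_zval *, in the same way the
 * text describes a bucket holding a zval * for a variable. */
typedef struct {
    const char *key;
    demo_zval  *slot;
} demo_table;

/* Generic lookup: it does not know what the slot holds, so it reports the
 * address of the slot through a void **. Returns 0 on success and -1 on
 * failure, leaving *pData untouched on failure. */
static int demo_find(demo_table *t, const char *key, void **pData)
{
    if (t->key != NULL && strcmp(t->key, key) == 0) {
        *pData = &t->slot;
        return 0;
    }
    return -1;
}

/* Caller side: the local variable is a demo_zval **, so its address is a
 * demo_zval *** and needs the cast to void **. */
static int demo_fetch_long(demo_table *t, const char *key, long *out)
{
    demo_zval **found;

    if (demo_find(t, key, (void **)&found) != 0) {
        return -1;
    }
    *out = (*found)->lval;
    return 0;
}
```

After a successful lookup, found points at the table's slot, *found is the stored pointer, and **found is the value itself.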
In the previous example, sizeof() is used so that the length includes the terminating NULL byte of the "foo" constant used as the variable's label. Writing 4 here would work equally well. However, that is discouraged, because changing the label name may change its length, and places where the length is hard-coded are much easier to find when they contain the label text that is being replaced anyway. (strlen("foo") + 1) would also solve the problem, but not every compiler optimizes it away, so the resulting binary might end up running a pointless string-length loop at run time, and where is the fun in that?
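The equivalence of the two spellings, and one trap to avoid, can be checked with standard C alone. DEMO_LABEL and the three helper functions below are hypothetical names used only for this illustration.

```c
#include <stddef.h>
#include <string.h>

#define DEMO_LABEL "foo"   /* hypothetical variable label */

/* Length including the terminating NUL, known at compile time. */
static size_t label_len_sizeof(void)
{
    return sizeof(DEMO_LABEL);
}

/* The same value, but possibly computed by a loop at run time. */
static size_t label_len_strlen(void)
{
    return strlen(DEMO_LABEL) + 1;
}

/* Trap: once the literal has decayed to a pointer, sizeof reports the
 * size of the pointer, not the length of the string. */
static size_t label_len_pointer(void)
{
    const char *p = DEMO_LABEL;
    return sizeof(p);
}
```

The first two functions both return 4 for "foo". The third returns the platform's pointer size, so the sizeof() idiom is only safe when applied directly to a string literal or a char array.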
If zend_hash_find() finds the item you are looking for, it fills the pointer you provided (its fourth parameter) with the address of the bucket pointer that was allocated when the requested data was first added to the HashTable, and it returns the integer constant SUCCESS. If zend_hash_find() cannot locate the data, it leaves the pointer untouched and returns the integer constant FAILURE.
For user space variables stored in a symbol table, SUCCESS or FAILURE effectively tells you whether the variable is set (isset).
Type conversion
Now that you can fetch variables from the symbol table, you will want to do something with them. A direct approach is to check the variable's type and take a type-specific action. A simple switch statement like the one in the following code would work:
void display_zval(zval *value)
{
    switch (Z_TYPE_P(value)) {
        case IS_NULL:
            /* NULLs are echoed as nothing */
            break;
        case IS_BOOL:
            if (Z_BVAL_P(value)) {
                php_printf("1");
            }
            break;
        case IS_LONG:
            php_printf("%ld", Z_LVAL_P(value));
            break;
        case IS_DOUBLE:
            php_printf("%f", Z_DVAL_P(value));
            break;
        case IS_STRING:
            PHPWRITE(Z_STRVAL_P(value), Z_STRLEN_P(value));
            break;
        case IS_RESOURCE:
            php_printf("Resource #%ld", Z_RESVAL_P(value));
            break;
        case IS_ARRAY:
            php_printf("Array");
            break;
        case IS_OBJECT:
            php_printf("Object");
            break;
        default:
            /* Should never happen in practice,
             * but it's dangerous to make assumptions */
            php_printf("Unknown");
            break;
    }
}
Yes, it is simple and it is correct. It is not hard to see, though, that coding this way quickly becomes unmanageable. Fortunately, the routine the engine itself uses when a script outputs a variable is also available to extensions and embedding environments. Using the convert_to_*() family of functions exposed by Zend, the example becomes very simple:
void display_zval(zval *value)
{
    convert_to_string(value);
    PHPWRITE(Z_STRVAL_P(value), Z_STRLEN_P(value));
}
As you might guess, there are a number of these functions for converting to most data types. One notable exception is convert_to_resource(), which would make no sense, because a resource by definition cannot be mapped to a value that real user space code can express.
If you are worried that convert_to_string() irreversibly changes the zval passed into the function, good for you. In real code this is typically a bad idea, and of course it is not how the engine outputs variables. In the next chapter you will see a safe way to use the conversion functions, one that does not destroy a value's existing contents.
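The danger of converting in place can be shown with a standalone sketch. It is not the engine's code and not necessarily the technique the next chapter describes: demo_zval, demo_convert_to_string(), and demo_as_string() are hypothetical, and copying before converting is just one simple way to leave the caller's value alone.

```c
#include <stdio.h>

enum { DEMO_LONG, DEMO_STRING };   /* hypothetical type constants */

/* Hypothetical stand-in for zval with a small fixed string buffer. */
typedef struct {
    int type;
    long lval;
    char sval[32];
} demo_zval;

/* In-place conversion: the caller's value changes type for good. */
static void demo_convert_to_string(demo_zval *z)
{
    if (z->type == DEMO_LONG) {
        snprintf(z->sval, sizeof z->sval, "%ld", z->lval);
        z->type = DEMO_STRING;
    }
}

/* Non-destructive variant: convert a private copy and return it,
 * leaving the caller's value untouched. */
static demo_zval demo_as_string(const demo_zval *z)
{
    demo_zval copy = *z;

    demo_convert_to_string(&copy);
    return copy;
}
```

A display routine built on demo_as_string() can print an integer as text while the variable still holds an integer afterward. One built on demo_convert_to_string() silently turns the caller's integer into a string.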
Summary
This chapter covered the internal representation of PHP variables. You learned how to distinguish types, how to set and retrieve values, and how to add variables to a symbol table and fetch them back out. In the next chapter you will learn how to copy a zval and how to destroy it when it is no longer needed, and, most importantly, how to avoid making copies when they are not needed.
You will also get a look at a corner of Zend's per-request memory management layer and learn about persistent and non-persistent allocation. By the end of the next chapter you will have what you need to create a working extension and experiment with your own code.
The above is a translation of Chapter 2, on the inner workings of variables, of "Extending and Embedding PHP".