Memory management has a significant impact on long-running programs such as server daemons, so understanding how PHP allocates and releases memory is essential when writing them. This article focuses on PHP memory management.
I. Memory
In PHP, populating a string variable is simple. A single statement such as "<?php $str = 'Hello World'; ?>" is enough, and the string can then be freely modified, copied, and moved. In C, although you can write a simple static string such as "char *str = "hello world";", you cannot modify that string, because it lives in program space. To create a modifiable string, you must allocate a block of memory and copy the contents into it with a function such as strdup():
{
    char *str;
    str = strdup("hello world");
    if (!str) {
        fprintf(stderr, "Unable to allocate memory!");
    }
}
For reasons analyzed later, the traditional memory management functions (malloc(), free(), strdup(), realloc(), calloc(), and so on) are almost never used directly in the PHP source code.
II. Releasing Memory
On almost all platforms, memory management follows a request-and-release model. First, an application asks the layer below it (usually the operating system): "I want to use some memory." If space is available, the operating system hands it to the program and marks it so that the same memory is not given to any other program.
When the application has finished with the memory, it is expected to return it to the OS so that it can be allocated elsewhere. If the program does not return the memory, the OS has no way of knowing that it is no longer in use and cannot hand it to another process. If a block of memory is not released and the owning application has lost track of it, the application is said to have "leaked", because that memory is no longer available to other programs.
In a typical client application, small and infrequent memory leaks can sometimes be tolerated, because the leaked memory is implicitly returned to the OS when the process ends. This is no great feat: the OS knows which program it gave the memory to, and it can be certain that the memory is no longer needed once the program has terminated.
Long-running server daemons, including web servers such as Apache and, by extension, PHP extension modules, are designed to run for a long time. Because the OS cannot reclaim memory while the process is still running, any leak, no matter how small, accumulates over repeated operations and eventually exhausts all system resources.
Consider the userspace stristr() function. To find a string with a case-insensitive search, it actually creates a lowercase copy of each of the two strings and then performs an ordinary case-sensitive search to find the relative offset. Once the offset has been located, the lowercase copies are no longer needed. If stristr() did not free them, every script using it would leak some memory on each call. Eventually the web server process would own all of the system's memory without being able to use any of it.
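The pattern described above can be sketched in standalone C. This is only an illustration, not PHP's actual stristr() implementation; the names lower_copy() and ci_find() are hypothetical, and plain malloc()/free() are used instead of the engine's allocators.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return a malloc'd lowercase copy of s, or NULL. */
static char *lower_copy(const char *s)
{
    size_t len = strlen(s);
    char *copy = malloc(len + 1);
    size_t i;

    if (!copy) {
        return NULL;
    }
    for (i = 0; i < len; i++) {
        copy[i] = (char)tolower((unsigned char)s[i]);
    }
    copy[len] = '\0';
    return copy;
}

/* Hypothetical stristr()-style search: returns the offset of needle in
 * haystack, or -1 if it is absent or an allocation failed. */
long ci_find(const char *haystack, const char *needle)
{
    long offset = -1;
    char *h = lower_copy(haystack);
    char *n = lower_copy(needle);

    if (h && n) {
        const char *hit = strstr(h, n);
        if (hit) {
            offset = (long)(hit - h);
        }
    }
    /* Both temporary copies are released on every path. Omitting these
     * two calls would leak memory on each invocation. */
    free(h);
    free(n);
    return offset;
}
```

For example, ci_find("Hello World", "WORLD") yields 6, and both lowercase copies are gone by the time the function returns.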
It is tempting to say that the ideal solution is simply to write good, clean, consistent code. That is certainly true, but in an environment like the PHP interpreter it is only half of the story.
III. Error Handling
To let userspace scripts, and the extension functions they rely on, bail out of an active request, there has to be a way to jump out of the request entirely. The Zend Engine implements this by setting a bailout address at the beginning of a request; on any die() or exit() call, or on any critical error (E_ERROR), it performs a longjmp() to that bailout address.
Although this bailout process simplifies program flow, it almost always means that resource cleanup code (such as free() calls) is skipped, which can lead to memory leaks. Consider the following simplified version of the engine code that handles function calls:
void call_function(const char *fname, int fname_len TSRMLS_DC)
{
    zend_function *fe;
    char *lcase_fname;
    /* PHP function names are case-insensitive.
     * To simplify locating them in the function table,
     * all function names are implicitly translated to lowercase.
     */
    lcase_fname = estrndup(fname, fname_len);
    zend_str_tolower(lcase_fname, fname_len);
    /* Execute the function only when the lookup succeeds */
    if (zend_hash_find(EG(function_table), lcase_fname, fname_len + 1, (void **)&fe) == SUCCESS) {
        zend_execute(fe->op_array TSRMLS_CC);
    } else {
        php_error_docref(NULL TSRMLS_CC, E_ERROR, "Call to undefined function: %s()", fname);
    }
    efree(lcase_fname);
}
When the php_error_docref() line is reached, the internal error handler sees that the error level is critical and calls longjmp() to interrupt the current program flow, leaving call_function() without ever reaching the efree(lcase_fname) line. You could move the efree() line above the php_error_docref() line, but what about the code that called call_function() in the first place? fname itself is most likely an allocated string, and it cannot be freed until the error message has used it.
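The effect can be reproduced outside the engine with plain setjmp()/longjmp(). The sketch below is a standalone illustration, not Zend code; tracked_alloc(), tracked_free(), risky_call(), and run_request() are hypothetical names, and the counter exists only to make the skipped cleanup visible.

```c
#include <setjmp.h>
#include <stdlib.h>

static jmp_buf bailout;          /* hypothetical "bailout address" */
static int live_blocks = 0;      /* blocks allocated but not yet freed */
static void *last_block = NULL;  /* kept only so the demo can clean up */

static void *tracked_alloc(size_t size)
{
    void *p = malloc(size);
    if (p) {
        live_blocks++;
        last_block = p;
    }
    return p;
}

static void tracked_free(void *p)
{
    if (p) {
        live_blocks--;
        last_block = NULL;
        free(p);
    }
}

/* Mimics call_function(): a fatal error jumps away before the cleanup. */
static void risky_call(int fail)
{
    char *tmp = tracked_alloc(16);
    if (fail) {
        longjmp(bailout, 1);     /* never returns; tracked_free() is skipped */
    }
    tracked_free(tmp);
}

/* Runs one simulated request and returns the number of leaked blocks. */
int run_request(int fail)
{
    int leaked;

    live_blocks = 0;
    if (setjmp(bailout) == 0) {
        risky_call(fail);
    }
    leaked = live_blocks;
    /* Demo-only cleanup: real code would have lost this pointer. */
    free(last_block);
    last_block = NULL;
    return leaked;
}
```

run_request(0) reports no leaked blocks, while run_request(1) reports one, because the jump bypasses the line that would have released the block.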
Note that php_error_docref() is the internal equivalent of the userspace trigger_error() function. Its first parameter is an optional documentation reference that is appended to docref_root (if that is set in php.ini). The third parameter can be any of the familiar E_* family of constants and indicates the severity of the error. The fourth and last parameter follows printf()-style formatting with a variable argument list.
IV. Zend Memory Manager
One solution to the memory leaks caused by request bailouts is the Zend Memory Manager (ZendMM) layer. This part of the engine plays much the same role as the operating system's memory manager: it allocates memory to the calling program. The difference is that it sits low enough in the process space to be "request aware", so that when a request ends it can do what the OS does when a process terminates, namely implicitly release all memory owned by that request. The accompanying figure in the original shows the relationship between ZendMM, the OS, and the PHP process.
In addition to implicit memory cleanup, ZendMM can limit the memory used by each request according to the memory_limit setting in php.ini. If a script tries to request more memory than is available in the system, or more than the maximum it is allowed to use at one time, ZendMM automatically raises an E_ERROR and starts the bailout process. An additional advantage of this approach is that the return values of most memory allocation calls do not need to be checked, because a failure jumps straight to the bailout part of the engine.
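The "no need to check the return value" property can be sketched in standalone C as follows. This is a toy model under stated assumptions, not ZendMM itself: limited_alloc() and try_request() are hypothetical names, the limit is a hard-coded stand-in for memory_limit, and only a single allocation per request is modeled.

```c
#include <setjmp.h>
#include <stdlib.h>
#include <string.h>

static jmp_buf bailout;            /* hypothetical bailout address */
static size_t memory_limit = 64;   /* stand-in for memory_limit, in bytes */
static size_t memory_used = 0;     /* bytes currently owned by the request */

/* Hypothetical allocator: it never returns NULL, it bails out instead. */
static void *limited_alloc(size_t size)
{
    void *p;

    if (size > memory_limit - memory_used || (p = malloc(size)) == NULL) {
        longjmp(bailout, 1);
    }
    memory_used += size;
    return p;
}

/* Simulates one request that needs `size` bytes.
 * Returns 1 if the request completed, 0 if the allocator bailed out. */
int try_request(size_t size)
{
    char *buf;

    memory_used = 0;
    if (setjmp(bailout) != 0) {
        return 0;                  /* control arrives here after a bailout */
    }
    buf = limited_alloc(size);
    memset(buf, 0, size);          /* no NULL check: failure never returns here */
    free(buf);
    memory_used -= size;
    return 1;
}
```

With the 64-byte limit assumed here, try_request(16) completes and try_request(65) is cut short by the bailout.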
Hooking PHP's internal code up to the OS's actual memory management layer is not complicated: all memory allocated internally must be requested through a specific set of alternative functions. For example, PHP code calls emalloc(16) rather than malloc(16) to allocate a 16-byte block. Besides performing the actual allocation, ZendMM tags the block with information about the request it is bound to, so that when the request bails out ZendMM can release it implicitly.
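A minimal sketch of such request-bound tagging, in standalone C, might look like this. It is not how ZendMM is actually implemented; req_malloc(), req_free(), and req_shutdown() are hypothetical stand-ins for emalloc(), efree(), and the end-of-request cleanup, and the tag is simply a list header placed in front of each block.

```c
#include <stdint.h>
#include <stdlib.h>

/* Header stored in front of every block so the block can be found later. */
typedef struct block_header {
    struct block_header *prev;
    struct block_header *next;
} block_header;

static block_header *request_blocks = NULL;  /* blocks owned by the request */

/* Hypothetical emalloc() stand-in: allocate and register the block. */
void *req_malloc(size_t size)
{
    block_header *h;

    if (size > SIZE_MAX - sizeof(block_header)) {
        return NULL;
    }
    h = malloc(sizeof(block_header) + size);
    if (!h) {
        return NULL;
    }
    h->prev = NULL;
    h->next = request_blocks;
    if (request_blocks) {
        request_blocks->prev = h;
    }
    request_blocks = h;
    return h + 1;                 /* caller sees the memory after the header */
}

/* Hypothetical efree() stand-in: unregister and release one block. */
void req_free(void *ptr)
{
    block_header *h;

    if (!ptr) {
        return;
    }
    h = (block_header *)ptr - 1;
    if (h->prev) {
        h->prev->next = h->next;
    } else {
        request_blocks = h->next;
    }
    if (h->next) {
        h->next->prev = h->prev;
    }
    free(h);
}

/* End of request: release everything still registered.
 * Returns the number of blocks that had not been freed explicitly. */
int req_shutdown(void)
{
    int released = 0;

    while (request_blocks) {
        block_header *next = request_blocks->next;
        free(request_blocks);
        request_blocks = next;
        released++;
    }
    return released;
}
```

If a request allocates three blocks, frees one, and then bails out, req_shutdown() still finds and releases the other two.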
Of course, memory sometimes needs to be allocated for longer than the duration of a single request. Allocations of this kind (called "persistent allocations" because they survive the end of a request) can be made with the traditional memory allocators, because those do not add the per-request information that ZendMM uses. Sometimes, however, it is not known until runtime whether a particular allocation needs to be persistent. For this reason ZendMM exports a set of helper macros that behave like the other memory allocation functions but take an extra final parameter indicating whether the allocation is persistent.
If you really do want a persistent allocation, this parameter should be set to 1, in which case the request is passed through to the traditional malloc() family of allocators. If runtime logic determines that the block does not need to be persistent, the parameter can be set to 0 and the call is routed to the per-request memory allocator functions.
For example, pemalloc(buffer_len, 1) maps to malloc(buffer_len), while pemalloc(buffer_len, 0) maps to emalloc(buffer_len), by way of the following #define in Zend/zend_alloc.h:
    #define pemalloc(size, persistent) ((persistent) ? malloc(size) : emalloc(size))
Each allocator function provided by ZendMM has a more traditional counterpart. The table that accompanies the original article lists every allocator function supported by ZendMM together with its e/pe counterpart.
You may notice that even pefree() requires the persistent flag. This is because, at the time pefree() is called, it has no way of knowing whether ptr was a persistent allocation. Calling free() on a non-persistent allocation can lead to a double free, while calling efree() on a persistent allocation will most likely cause a segmentation fault, because the memory manager will look for management information that does not exist. Your code is therefore expected to remember whether each data structure it allocates is persistent.
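One way to "remember" the flag is to store it next to the pointer. The following standalone sketch assumes hypothetical toy_emalloc()/toy_efree() stand-ins for the per-request allocator and a hypothetical toy_buffer structure; the macros only mirror the shape of the pemalloc() definition quoted above.

```c
#include <stdlib.h>

static int request_allocs = 0;   /* live per-request blocks in this sketch */

/* Hypothetical stand-ins for emalloc() and efree(). */
static void *toy_emalloc(size_t size)
{
    void *p = malloc(size);
    if (p) {
        request_allocs++;
    }
    return p;
}

static void toy_efree(void *p)
{
    if (p) {
        request_allocs--;
        free(p);
    }
}

/* Same shape as the pemalloc() macro: the flag selects the allocator. */
#define toy_pemalloc(size, persistent) \
    ((persistent) ? malloc(size) : toy_emalloc(size))
#define toy_pefree(ptr, persistent) \
    ((persistent) ? free(ptr) : toy_efree(ptr))

/* The structure records how its buffer was allocated. */
typedef struct {
    char *data;
    int persistent;
} toy_buffer;

/* Returns 1 on success, 0 if the allocation failed. */
int toy_buffer_init(toy_buffer *buf, size_t len, int persistent)
{
    buf->data = toy_pemalloc(len, persistent);
    buf->persistent = persistent;
    return buf->data != NULL;
}

void toy_buffer_destroy(toy_buffer *buf)
{
    /* The stored flag guarantees the matching deallocator is used. */
    toy_pefree(buf->data, buf->persistent);
    buf->data = NULL;
}
```

Because the flag travels with the buffer, toy_buffer_destroy() can never pair a persistent block with the per-request deallocator or the other way around.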
In addition to the core set of allocator functions, there are a few other very convenient ZendMM-specific functions, such as:
    void *estrndup(void *ptr, int len);
This function allocates len + 1 bytes of memory and copies len bytes from ptr into the newly allocated block. Its behavior is roughly as follows:
void *estrndup(void *ptr, int len)
{
    char *dst = emalloc(len + 1);
    memcpy(dst, ptr, len);
    dst[len] = 0;
    return dst;
}
The NUL byte implicitly placed at the end of the buffer ensures that code using estrndup() to duplicate strings need not worry about passing the resulting buffer to a function, such as printf(), that expects a NUL terminator. When estrndup() is used to copy non-string data, the last byte is essentially wasted, but the advantages far outweigh that cost.
    void *safe_emalloc(size_t size, size_t count, size_t addtl);
    void *safe_pemalloc(size_t size, size_t count, size_t addtl, char persistent);
These functions allocate a block whose final size is ((size * count) + addtl). You may ask why an extra function is needed instead of plain emalloc()/pemalloc(). The reason is simple: safety. Although the chance is usually small, the result of this calculation can overflow the integer limits of the host platform. That could lead to an attempt to allocate a negative number of bytes or, worse, to an allocation that is smaller than the calling code requires. safe_emalloc() avoids this trap by checking for integer overflow and explicitly failing when such an overflow occurs.
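The kind of overflow check described here can be sketched in portable C as follows. This is an illustration of the idea rather than the engine's actual code; checked_size() and checked_alloc() are hypothetical names, and the sketch returns NULL where, according to the text above, safe_emalloc() would fail explicitly.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: store size * count + addtl in *total.
 * Returns 1 on success, 0 if the result would not fit in a size_t. */
int checked_size(size_t size, size_t count, size_t addtl, size_t *total)
{
    /* size * count + addtl <= SIZE_MAX  is equivalent to
     * size <= (SIZE_MAX - addtl) / count  when count is non-zero. */
    if (count != 0 && size > (SIZE_MAX - addtl) / count) {
        return 0;
    }
    *total = size * count + addtl;
    return 1;
}

/* Hypothetical allocator built on the check above. */
void *checked_alloc(size_t size, size_t count, size_t addtl)
{
    size_t total;

    if (!checked_size(size, count, addtl, &total)) {
        return NULL;   /* refuse rather than allocate a too-small block */
    }
    return malloc(total);
}
```

A request for 10 elements of 4 bytes plus 2 extra bytes yields a total of 42, whereas a request such as checked_alloc(SIZE_MAX, 2, 0) is rejected instead of silently wrapping around.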
Note that not every memory allocation routine has a p* counterpart. For example, pestrndup() does not exist, and safe_pemalloc() did not exist before PHP 5.1.
V. Reference Counting
Careful memory allocation and release is vital to the long-term performance of a multi-request process like PHP, but it is only half of the picture. For a server that handles thousands of hits per second to run efficiently, each request must use as little memory as possible and perform as little unnecessary data copying as possible. Consider the following PHP code snippet:
<?php
$a = 'Hello World';
$b = $a;
unset($a);
?>
After the first statement, a single variable has been created and a 12-byte block of memory has been assigned to it to hold the string 'Hello World' together with its trailing NUL character. Now look at the next two lines: $b is set to the same value as $a, and then $a is unset.
If PHP copied the contents on every variable assignment, an extra 12 bytes would have to be allocated for the duplicate string in this example, and extra processor time would be spent on the copy. This seems wasteful at first glance, because by the third line the original variable is unset, which makes the copy completely unnecessary. Take it a step further and imagine loading the contents of a 10 MB file into two variables: that would take 20 MB where 10 MB would have been enough. Would the engine really waste so much time and memory on such a useless effort?
You can be sure that PHP's designers were well aware of this.
Remember that, in the engine, variable names and their values are two separate concepts. The value itself is a nameless zval* (in this case holding a string), which was assigned to the variable $a through zend_hash_add(). What would happen if two variable names pointed to the same value?
{
    zval *helloval;
    MAKE_STD_ZVAL(helloval);
    ZVAL_STRING(helloval, "Hello World", 1);
    zend_hash_add(EG(active_symbol_table), "a", sizeof("a"), &helloval, sizeof(zval *), NULL);
    zend_hash_add(EG(active_symbol_table), "b", sizeof("b"), &helloval, sizeof(zval *), NULL);
}
At this point, you could inspect either $a or $b and see that both contain the string 'Hello World'. Unfortunately, the third line, unset($a);, comes next. unset() does not know that the data pointed to by $a is also used by another variable, so it blindly frees the memory. Any subsequent access to $b would then read memory that has already been freed and crash the engine.
This problem is solved by refcount, one of the four members of a zval. When a variable is first created and assigned a value, its refcount is initialized to 1, because the value is assumed to be used only by the variable it was created for. When the code snippet goes on to assign helloval to $b, it needs to increase refcount to 2, because the value is now referenced by two variables:
{
    zval *helloval;
    MAKE_STD_ZVAL(helloval);
    ZVAL_STRING(helloval, "Hello World", 1);
    zend_hash_add(EG(active_symbol_table), "a", sizeof("a"), &helloval, sizeof(zval *), NULL);
    ZVAL_ADDREF(helloval);
    zend_hash_add(EG(active_symbol_table), "b", sizeof("b"), &helloval, sizeof(zval *), NULL);
}
Now, when unset() removes the $a copy of the variable, it can see from refcount that someone else is still interested in the data, so it only decrements refcount and otherwise leaves the value alone.
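The bookkeeping can be modeled without the engine. The sketch below is a standalone toy, not the zval implementation; toy_value, toy_new(), toy_addref(), and toy_release() are hypothetical names chosen for illustration.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical reference-counted string value. */
typedef struct {
    char *str;
    unsigned refcount;
} toy_value;

/* Create a value owned by exactly one variable (refcount 1). */
toy_value *toy_new(const char *s)
{
    size_t len = strlen(s);
    toy_value *v = malloc(sizeof *v);

    if (!v) {
        return NULL;
    }
    v->str = malloc(len + 1);
    if (!v->str) {
        free(v);
        return NULL;
    }
    memcpy(v->str, s, len + 1);
    v->refcount = 1;
    return v;
}

/* A second variable starts pointing at the same value. */
void toy_addref(toy_value *v)
{
    v->refcount++;
}

/* One variable lets go of the value (as unset() does).
 * Returns 1 if the value was actually freed, 0 if others still use it. */
int toy_release(toy_value *v)
{
    if (--v->refcount > 0) {
        return 0;
    }
    free(v->str);
    free(v);
    return 1;
}
```

Releasing the value through the first name leaves it intact for the second; only the final release frees the memory.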
VI. Copy on Write
Saving memory through reference counting is a good idea, but what happens when you only want to change the value of one of the variables? Consider the following code snippet:
<?php
$a = 1;
$b = $a;
$b += 5;
?>
Following the logic of this code, you would expect $a to still equal 1 and $b to end up as 6. You also know by now that Zend tries to save memory by making $a and $b refer to the same zval (see the second line). So what happens when the third line is reached and the value of $b has to change?
The answer is that Zend checks refcount and makes sure the value is separated if refcount is greater than 1. In the Zend Engine, separation destroys a reference pair and is exactly the opposite of the process you just saw:
zval *get_var_and_separate(char *varname, int varname_len TSRMLS_DC)
{
    zval **varval, *varcopy;
    if (zend_hash_find(EG(active_symbol_table), varname, varname_len + 1, (void **)&varval) == FAILURE) {
        /* The variable does not exist at all; fail out */
        return NULL;
    }
    if ((*varval)->refcount < 2) {
        /* varname is the only actual reference,
         * so no separation is required.
         */
        return *varval;
    }
    /* Otherwise, make a copy of the zval value
     * (copy the structure itself, not just the pointer)
     */
    MAKE_STD_ZVAL(varcopy);
    *varcopy = **varval;
    /* Duplicate any allocated structures within the zval */
    zval_copy_ctor(varcopy);
    /* Delete the old version of varname.
     * This decrements the refcount of varval in the process.
     */
    zend_hash_del(EG(active_symbol_table), varname, varname_len + 1);
    /* Initialize the reference count of the newly created value
     * and attach it to the varname variable.
     */
    varcopy->refcount = 1;
    varcopy->is_ref = 0;
    zend_hash_add(EG(active_symbol_table), varname, varname_len + 1, &varcopy, sizeof(zval *), NULL);
    /* Return the new zval* */
    return varcopy;
}
Now that the engine has a zval* that it knows is owned only by $b, it can convert the value to a long and add 5 to it as the script requested.
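The same separation step can be followed end to end in a standalone toy model. This is a sketch under simplifying assumptions (integer values only, variables modeled as plain pointers instead of symbol table entries); toy_zval and toy_separate() are hypothetical names.

```c
#include <stdlib.h>

/* Hypothetical, stripped-down stand-in for a zval holding a long. */
typedef struct {
    long lval;
    unsigned refcount;
} toy_zval;

/* Make sure the variable stored in *slot owns its value exclusively.
 * Returns the value that is now safe to modify, or NULL on failure. */
toy_zval *toy_separate(toy_zval **slot)
{
    toy_zval *copy;

    if ((*slot)->refcount < 2) {
        return *slot;              /* sole owner: nothing to separate */
    }
    copy = malloc(sizeof *copy);
    if (!copy) {
        return NULL;
    }
    copy->lval = (*slot)->lval;    /* duplicate the contents */
    copy->refcount = 1;            /* the copy has exactly one owner */
    (*slot)->refcount--;           /* the old value loses this owner */
    *slot = copy;                  /* the variable now points at the copy */
    return copy;
}
```

Modeling $a and $b as two pointers to one value with a refcount of 2, calling toy_separate() on $b's pointer and then adding 5 leaves $a at 1 and $b at 6, each with a refcount of 1.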
VII. Change on Write
The concept of reference counting also opens up a new way of manipulating data, one that userspace scripters know as "references". Consider the following userspace code snippet:
<?php
$a = 1;
$b = &$a;
$b += 5;
?>
In this PHP code, the value of $a ends up as 6, even though it started as 1 and was never (directly) changed. This happens because when the engine adds 5 to the value of $b, it notices that $b is a reference to $a and concludes: "I can change this value without separating it, because I want the change to be visible through every variable in the reference."
But how does the engine know? It simply checks the fourth and last member of the zval structure, is_ref. This is a simple on/off bit that states whether the value is part of a userspace-style reference set. In the previous snippet, when the first line is executed, the value created for $a gets a refcount of 1 and an is_ref of 0, because it is owned by a single variable ($a) and no other variable holds a change-on-write reference to it. On the second line, the refcount of this value is incremented to 2 as before, except that this time is_ref is also set to 1, because the script used an ampersand (&) to request a full reference.
Finally, on the third line, the engine once again fetches the value associated with $b and checks whether separation is necessary. This time the value is not separated, because of a check that was left out earlier. The refcount check in get_var_and_separate() actually looks like this:
    if ((*varval)->is_ref || (*varval)->refcount < 2) {
        /* varname is the only actual reference,
         * or it is a full reference to other variables.
         * Either way: no separation is performed.
         */
        return *varval;
    }
This time, even though refcount is 2, no separation takes place, because the value is a full reference. The engine is free to modify it in place, since the change is meant to be visible through every variable in the reference set.
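The decision can be reduced to a small predicate. The following standalone sketch uses a hypothetical toy_zval structure and a hypothetical toy_needs_separation() function; it mirrors the condition shown above with the result inverted (1 means "separate first").

```c
/* Hypothetical, stripped-down stand-in for a zval. */
typedef struct {
    long lval;
    unsigned refcount;
    unsigned char is_ref;
} toy_zval;

/* Returns 1 when a write must first separate the value,
 * 0 when the value may be modified in place. */
int toy_needs_separation(const toy_zval *v)
{
    if (v->is_ref) {
        return 0;              /* full reference: every name sees the change */
    }
    return v->refcount >= 2;   /* shared copy-on-write value */
}
```

A value with a single owner and a value in a reference set are both written in place; only a value shared by copy-on-write variables is separated first.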
VIII. Separation
Even with the copying and referencing techniques discussed above, some situations cannot be handled through is_ref and refcount alone. Consider the following block of PHP code:
<?php
$a = 1;
$b = $a;
$c = &$a;
?>
Here a single value needs to be associated with three different variables. Two of them use change-on-write full referencing, while the third is in a separable copy-on-write context. Using only is_ref and refcount to describe this relationship, which values would work?
The answer is: none. In this case the value must be duplicated into two separate zval*s, even though both contain exactly the same data (see the accompanying figure in the original).
Similarly, the following code block causes the same conflict and forces the value to be separated into a copy (see the accompanying figure in the original).
<?php
$a = 1;
$b = &$a;
$c = $a;
?>
Note that in both cases $b remains associated with the original zval, because at the time the separation occurs the engine cannot know the name of the third variable involved in the operation.
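Both cases can be traced with a small standalone model. This is a sketch of the rules as described in this article, not the engine's code; toy_zval, toy_new(), toy_assign_value(), and toy_assign_ref() are hypothetical names, and variables are modeled as plain pointers.

```c
#include <stdlib.h>

/* Hypothetical, stripped-down stand-in for a zval. */
typedef struct {
    long lval;
    unsigned refcount;
    unsigned char is_ref;
} toy_zval;

/* New value owned by one variable, not part of a reference set. */
toy_zval *toy_new(long lval)
{
    toy_zval *v = malloc(sizeof *v);
    if (v) {
        v->lval = lval;
        v->refcount = 1;
        v->is_ref = 0;
    }
    return v;
}

/* Models "$dst = $src;" and returns the value $dst should point at. */
toy_zval *toy_assign_value(toy_zval *src)
{
    if (src->is_ref) {
        /* A reference set cannot also be shared copy-on-write,
         * so the new variable gets its own copy. */
        return toy_new(src->lval);
    }
    src->refcount++;
    return src;
}

/* Models "$dst = &$src;". *src_slot is the source variable; it may be
 * re-pointed at a fresh copy. Returns the value $dst should point at. */
toy_zval *toy_assign_ref(toy_zval **src_slot)
{
    toy_zval *v = *src_slot;

    if (!v->is_ref && v->refcount > 1) {
        /* Shared copy-on-write value: pull the source variable out
         * before turning it into a reference. */
        toy_zval *copy = toy_new(v->lval);
        if (!copy) {
            return NULL;
        }
        v->refcount--;
        *src_slot = v = copy;
    }
    v->is_ref = 1;
    v->refcount++;
    return v;
}
```

In the first snippet, toy_assign_ref() separates $a from the value it shared with $b, so $b keeps the original while $a and $c share the new reference. In the second, toy_assign_value() hands $c a private copy while $a and $b stay on the original.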
IX. Summary
PHP is a managed language. From the ordinary user's point of view, this careful control of resources and memory means easier prototyping and fewer crashes. Once you delve beneath the surface, however, all of those guarantees disappear, and the integrity of the runtime environment ultimately depends on developers who take that responsibility seriously.