In-Depth Discussion of Memory Management Issues in PHP

I. Memory

In PHP, populating a string variable is as simple as writing <?php $str = 'Hello World'; ?>, and the string can then be freely modified, copied, and moved around. In C, on the other hand, although you could start with a simple static string such as char *str = "Hello World";, that string cannot be modified because it lives in program space. To produce a string you can manipulate, you have to allocate a block of memory and copy the contents into it using a function such as strdup().

{
    char *str;
    str = strdup("Hello World");
    if (!str) {
        fprintf(stderr, "Unable to allocate memory!");
    }
}

For reasons analyzed in the sections that follow, the traditional memory management functions (malloc(), free(), strdup(), realloc(), calloc(), and so on) are almost never used directly in the PHP source code.

II. Freeing Memory

On almost all platforms, memory management follows a request-and-release model. First, an application asks the layer below it (usually the operating system): "I want to use some memory." If space is available, the operating system hands it to the program and makes a note that this memory must not be given to any other program.
When the application has finished with the memory, it is expected to return it to the OS so that it can be allocated elsewhere. If the program does not give the memory back, the OS has no way of knowing that it is no longer in use and cannot allocate it to another process. If a block of memory is not freed and the owning application loses track of it, we say that the application has "leaked" it, because the memory is no longer available to any other program.

In a typical client application, small and infrequent memory leaks can sometimes be tolerated, because the leaked memory is implicitly returned to the OS when the process ends. This is not difficult for the OS, since it knows which program it gave the memory to and can be certain that the memory is no longer needed once that program terminates.

Long-running server daemons, including web servers such as Apache and, by extension, the PHP module running inside them, are typically designed to run for a very long time. Because the OS cannot reclaim the memory until the process exits, any leak, no matter how small, adds up over repeated operations and eventually exhausts all system resources.

Now consider the userspace stristr() function. To find a string using a case-insensitive search, it actually creates a lowercase copy of each of the two strings and then performs a more traditional case-sensitive search to find the relative offset. Once the offset has been located, it has no further use for the lowercase copies. If it did not free them, every script that used stristr() would leak some memory each time it was called. Eventually the web server process would own all of the system memory without being able to use any of it.
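The pattern described above can be sketched in plain C. This is only an illustration of "make lowercase copies, search, then free the copies", not PHP's actual stristr() implementation; the names lower_copy() and find_ci() are hypothetical.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Return a newly allocated lowercase copy of s, or NULL on failure. */
static char *lower_copy(const char *s)
{
    size_t len = strlen(s);
    char *copy = malloc(len + 1);
    if (!copy) {
        return NULL;
    }
    for (size_t i = 0; i <= len; i++) {   /* <= also copies the NUL byte */
        copy[i] = (char)tolower((unsigned char)s[i]);
    }
    return copy;
}

/* Hypothetical helper: offset of needle in haystack, ignoring case,
 * or -1 if it is not found or memory could not be allocated. */
long find_ci(const char *haystack, const char *needle)
{
    long offset = -1;
    char *h = lower_copy(haystack);
    char *n = lower_copy(needle);

    if (h && n) {
        char *hit = strstr(h, n);
        if (hit) {
            offset = (long)(hit - h);
        }
    }
    /* Release both temporaries on every path; free(NULL) is a no-op. */
    free(h);
    free(n);
    return offset;
}
```

Dropping either free() call would leak one lowercase copy per call, which is exactly the slow leak the paragraph above warns about.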

The ideal solution, you might safely say, is to write good, clean, consistent code. That is certainly true, but in an environment like the PHP interpreter it is only half of the answer.

III. Error Handling

To bail out of an active request, covering both the userspace script and the extension functions it relies on, there has to be a way to jump out of the request entirely. The Zend Engine implements this by setting a bailout address at the start of a request and then, on any die() or exit() call or on any critical error (E_ERROR), performing a longjmp() to that bailout address.

Although this bailout process simplifies program flow, in the vast majority of cases it means that resource cleanup code (such as free() calls) is skipped, and memory ends up being leaked. Consider the following simplified version of the engine code that handles function calls:

void call_function(const char *fname, int fname_len TSRMLS_DC)
{
    zend_function *fe;
    char *lcase_fname;
    /* PHP function names are case-insensitive.
     * To simplify locating them in the function table,
     * all function names are implicitly translated to lowercase
     */
    lcase_fname = estrndup(fname, fname_len);
    zend_str_tolower(lcase_fname, fname_len);
    if (zend_hash_find(EG(function_table), lcase_fname, fname_len + 1, (void **)&fe) == SUCCESS) {
        zend_execute(fe->op_array TSRMLS_CC);
    } else {
        php_error_docref(NULL TSRMLS_CC, E_ERROR, "Call to undefined function: %s()", fname);
    }
    efree(lcase_fname);
}


When execution reaches the php_error_docref() line, the internal error handler sees that the error level is critical and calls longjmp() to interrupt the current program flow and leave call_function() without ever reaching the efree(lcase_fname) line. You might think of moving the efree() line above the php_error_docref() line, but what about the code that called call_function() in the first place? fname itself is most likely an allocated string, and it cannot be freed until the error message has finished using it.
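The skipped-cleanup problem is easy to reproduce outside the engine. The following is a minimal sketch using only the standard setjmp()/longjmp() pair; the names bailout, fatal_error(), do_work(), and run_request() are hypothetical stand-ins, and the counter exists only so the leak can be observed.

```c
#include <setjmp.h>
#include <stdlib.h>

static jmp_buf bailout;        /* stand-in for the engine's bailout address */
static int blocks_in_use = 0;  /* allocations that have not been freed yet */

static void fatal_error(void)
{
    longjmp(bailout, 1);       /* never returns to its caller */
}

static void do_work(int fail)
{
    char *tmp = malloc(32);
    if (!tmp) {
        return;
    }
    blocks_in_use++;
    if (fail) {
        fatal_error();         /* jumps away: the cleanup below is skipped */
    }
    free(tmp);
    blocks_in_use--;
}

/* Runs one "request" and returns how many blocks it left behind. */
int run_request(int fail)
{
    blocks_in_use = 0;
    if (setjmp(bailout) == 0) {
        do_work(fail);
    }
    return blocks_in_use;
}
```

A request that completes normally leaves nothing behind, while a request that bails out leaves its temporary block allocated with no pointer left to free it.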

Note that php_error_docref() is the internal equivalent of trigger_error(). Its first parameter is an optional documentation reference that is appended to the docref root configured in php.ini. The third parameter can be any of the familiar E_* family of constants and indicates the severity of the error. The fourth and subsequent parameters follow printf()-style formatting with a variable argument list.

IV. Zend Memory Manager

One solution to the memory leaks caused by request bailout is the Zend Memory Manager (ZendMM) layer. This part of the engine plays much the same role as the operating system normally would, handing out memory to the calling program. The difference is that it sits low enough in the process space to be request-aware, so that when a request ends it can do what the OS does when a process ends: implicitly free all the memory owned by that request. Figure 1 shows the relationship between ZendMM, the OS, and the PHP process.


Figure 1. The Zend memory manager implements memory allocations for each request in lieu of system calls.


In addition to providing implicit memory cleanup, ZendMM enforces the memory_limit setting in php.ini on each request. If a script asks for more memory than is available to the system as a whole, or more than it is allowed to ask for at once, ZendMM automatically issues an E_ERROR message and starts the bailout process. A side benefit is that the return value of most memory allocation calls does not need to be checked, because a failure results in an immediate longjmp() to the shutdown part of the engine.
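To see why callers of such an allocator never need to test for NULL, here is a toy model in standard C. It is a sketch under stated assumptions, not ZendMM itself: memory_limit here is a hard-coded hypothetical cap, and limited_alloc() and run_limited_request() are invented names.

```c
#include <setjmp.h>
#include <stdlib.h>

static jmp_buf bailout;
static size_t memory_limit = 1024;  /* hypothetical per-request cap in bytes */
static size_t memory_used = 0;      /* bytes handed out during this request */

/* Either returns usable memory or abandons the request; never NULL. */
void *limited_alloc(size_t size)
{
    void *p;
    if (size > memory_limit - memory_used) {
        longjmp(bailout, 1);        /* over the limit: bail out */
    }
    p = malloc(size);
    if (!p) {
        longjmp(bailout, 1);        /* system out of memory: same path */
    }
    memory_used += size;
    return p;
}

/* Returns 1 if the request completed, 0 if it bailed out. */
int run_limited_request(size_t first, size_t second)
{
    /* volatile: these are changed between setjmp() and a possible longjmp() */
    void *volatile a = NULL;
    void *volatile b = NULL;
    volatile int completed = 0;

    memory_used = 0;
    if (setjmp(bailout) == 0) {
        a = limited_alloc(first);   /* no NULL checks needed here */
        b = limited_alloc(second);
        completed = 1;
    }
    free(a);
    free(b);
    return completed;
}
```

With the 1024-byte cap assumed here, a request for 100 and then 200 bytes completes, while a request for 800 and then 400 bytes bails out at the second allocation.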

Hooking PHP's internal code up to the OS's actual memory management layer is not complicated: all memory allocated internally is requested through a specific set of alternative functions. For example, rather than calling malloc(16) to allocate a 16-byte block, PHP code calls emalloc(16). Besides performing the actual allocation, ZendMM flags the block with information about the request it is bound to, so that if the request bails out, ZendMM can free it implicitly.
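One way to picture "flagging" a block is a small header placed in front of every allocation that links it into a per-request list. The sketch below is a simplified model of that idea and makes no claim about ZendMM's real data structures; toy_emalloc(), toy_efree(), and toy_request_shutdown() are hypothetical names.

```c
#include <stdlib.h>

/* Header stored in front of every request-bound block. */
typedef struct block_header {
    struct block_header *prev;
    struct block_header *next;
} block_header;

static block_header *request_blocks = NULL;  /* blocks owned by the request */

void *toy_emalloc(size_t size)
{
    block_header *h = malloc(sizeof(block_header) + size);
    if (!h) {
        abort();                 /* the real engine would bail out instead */
    }
    h->prev = NULL;
    h->next = request_blocks;
    if (request_blocks) {
        request_blocks->prev = h;
    }
    request_blocks = h;
    return h + 1;                /* caller's memory starts after the header */
}

void toy_efree(void *ptr)
{
    block_header *h = (block_header *)ptr - 1;
    if (h->prev) {
        h->prev->next = h->next;
    } else {
        request_blocks = h->next;
    }
    if (h->next) {
        h->next->prev = h->prev;
    }
    free(h);
}

/* Frees whatever the request forgot to free; returns how many blocks that was. */
size_t toy_request_shutdown(void)
{
    size_t leaked = 0;
    while (request_blocks) {
        block_header *next = request_blocks->next;
        free(request_blocks);
        request_blocks = next;
        leaked++;
    }
    return leaked;
}
```

Because every live block is reachable from request_blocks, the shutdown routine can release memory even when the code that allocated it was skipped over by a bailout.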

Often, memory needs to be allocated for longer than the duration of a single request. These allocations, called persistent allocations because they persist beyond the end of a request, can be made with the traditional memory allocators, because they do not need the per-request information that ZendMM adds. Sometimes, however, it is not known until runtime whether a particular allocation needs to be persistent, so ZendMM exports a set of helper macros that behave like the other allocator functions but take an extra final parameter indicating persistence.

If you really do want a persistent allocation, set this parameter to 1, in which case the request is passed through to the traditional malloc() family of allocators. If runtime logic determines that the block does not need to be persistent, set the parameter to 0, and the call is routed to the per-request memory allocator functions.

For example, pemalloc(buffer_len, 1) maps to malloc(buffer_len), and pemalloc(buffer_len, 0) maps to emalloc(buffer_len), by way of the following #define in Zend/zend_alloc.h:

    #define pemalloc(size, persistent) ((persistent) ? malloc(size) : emalloc(size))

Each of the allocator functions provided by ZendMM has a more traditional counterpart, as the table below shows.

Table 1 shows each traditional allocator function alongside the e/pe counterparts that ZendMM provides:

Table 1. Traditional allocators and their PHP-specific counterparts

Traditional allocator                      e/pe counterparts
void *malloc(size_t count);                void *emalloc(size_t count);
                                           void *pemalloc(size_t count, char persistent);
void *calloc(size_t nmemb, size_t size);   void *ecalloc(size_t nmemb, size_t size);
                                           void *pecalloc(size_t nmemb, size_t size, char persistent);
void *realloc(void *ptr, size_t count);    void *erealloc(void *ptr, size_t count);
                                           void *perealloc(void *ptr, size_t count, char persistent);
char *strdup(const char *s);               char *estrdup(const char *s);
                                           char *pestrdup(const char *s, char persistent);
void free(void *ptr);                      void efree(void *ptr);
                                           void pefree(void *ptr, char persistent);

You may notice that even pefree() requires the persistent flag. That is because, at the time pefree() is called, it has no way of knowing whether ptr was a persistent allocation. Calling free() on a non-persistent allocation can lead to a double free, while calling efree() on a persistent one will most likely cause a segmentation fault as the memory manager goes looking for management information that does not exist. Your code is therefore expected to remember whether the data structures it allocated are persistent.
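A common way to "remember" persistence is to store the flag next to the pointer. The sketch below shows that bookkeeping pattern only. To stay buildable without PHP headers it uses hypothetical stand-ins (toy_pemalloc(), toy_pefree()) that both call malloc()/free() and merely count blocks per pool; in an extension, the engine's pemalloc() and pefree() would take their place.

```c
#include <stdlib.h>
#include <string.h>

/* Counters standing in for the two separate pools (assumption for the demo). */
static int request_blocks = 0;
static int persistent_blocks = 0;

static void *toy_pemalloc(size_t size, int persistent)
{
    void *p = malloc(size);
    if (p) {
        if (persistent) persistent_blocks++; else request_blocks++;
    }
    return p;
}

static void toy_pefree(void *ptr, int persistent)
{
    if (!ptr) {
        return;
    }
    if (persistent) persistent_blocks--; else request_blocks--;
    free(ptr);
}

/* The structure records how its buffer was allocated. */
typedef struct {
    char *data;
    size_t len;
    int persistent;
} toy_buffer;

int toy_buffer_init(toy_buffer *b, const char *src, int persistent)
{
    size_t len = strlen(src);
    b->data = toy_pemalloc(len + 1, persistent);
    if (!b->data) {
        return -1;
    }
    memcpy(b->data, src, len + 1);
    b->len = len;
    b->persistent = persistent;
    return 0;
}

void toy_buffer_destroy(toy_buffer *b)
{
    /* Free with the same flag that was used to allocate. */
    toy_pefree(b->data, b->persistent);
    b->data = NULL;
    b->len = 0;
}
```

Keeping the flag inside the structure means the destroy routine can never pass the wrong value, whichever pool the buffer came from.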

In addition to the core set of allocator functions, there are a few other very handy ZendMM-specific functions, such as:

    void *estrndup(void *ptr, int len);

This function allocates len + 1 bytes of memory and copies len bytes from ptr into the newly allocated block. Its behavior can be described roughly as follows:

    void *estrndup(void *ptr, int len)
    {
        char *dst = emalloc(len + 1);
        memcpy(dst, ptr, len);
        dst[len] = 0;
        return dst;
    }

Here, the NUL byte implicitly placed at the end of the buffer means that any function using estrndup() to duplicate a string can pass the resulting buffer to a function that expects NUL-terminated strings, such as printf(), without further thought. When estrndup() is used to copy non-string data, the final byte is essentially wasted, but the convenience far outweighs that small cost.

    void *safe_emalloc(size_t size, size_t count, size_t addtl);
    void *safe_pemalloc(size_t size, size_t count, size_t addtl, char persistent);

The amount of memory allocated by these functions is ((size * count) + addtl). You may ask, "Why provide an extra function? Why not just use emalloc() or pemalloc() and do the arithmetic yourself?" The reason is simple: safety. However unlikely the circumstances, the result of such a calculation could overflow the integer limits of the host platform. That could lead to a request for a negative number of bytes or, worse, for a positive number of bytes far smaller than the calling routine expects. safe_emalloc() avoids this trap by checking for integer overflow and explicitly failing if one occurs.

Note that not every memory allocation routine has a p* counterpart. For example, there is no pestrndup(), and safe_pemalloc() did not exist before PHP 5.1.
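The overflow check itself can be written in a few lines of portable C. This is a sketch of the idea rather than the engine's implementation: checked_alloc() is a hypothetical name, it falls back on plain malloc(), and it returns NULL where the text says the engine's function fails explicitly.

```c
#include <stdint.h>
#include <stdlib.h>

/* Allocates size * count + addtl bytes, or returns NULL if computing
 * that total would overflow size_t. */
void *checked_alloc(size_t size, size_t count, size_t addtl)
{
    /* size * count + addtl fits in size_t exactly when
     * size <= (SIZE_MAX - addtl) / count, for count > 0. */
    if (count != 0 && size > (SIZE_MAX - addtl) / count) {
        return NULL;
    }
    return malloc(size * count + addtl);
}
```

Without the check, a product that wraps around would silently produce a small allocation, and the caller would then write past the end of it.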

V. Reference Counting

Careful memory allocation and freeing is vital to the long-term performance of a multi-request process like PHP, but it is only half of the picture. For a server that handles thousands of hits per second to run efficiently, each request needs to use as little memory as possible and perform as little unnecessary data copying as possible. Consider the following PHP code snippet:

    <?php
    $a = 'Hello World';
    $b = $a;
    unset($a);
    ?>

After the first statement, a single variable has been created and a 12-byte block of memory has been assigned to it, holding the string "Hello World" plus a trailing NUL byte. Now look at the next two lines: $b is set to the same value as $a, and then $a is unset (freed).

If PHP copied the variable's contents on every assignment, the example above would need an extra 12 bytes for the duplicated string, plus extra processor time for the copy itself. That seems wasteful from the outset, because by the third line the original variable has been unset, making the whole duplication unnecessary. Take it one step further and imagine what would happen if the contents of a 10MB file were loaded into two variables: 20MB would be used when 10MB would have been enough. Would the engine really waste that much time and memory on such a pointless exercise?

You can be sure that PHP's designers thought of this.

Remember that, in the engine, variable names and their values are two separate concepts. The value itself is a nameless zval* (holding, in this case, a string value), which was assigned to the variable $a using zend_hash_add(). What would happen if two variable names pointed to the same value?

    {
        zval *helloval;
        MAKE_STD_ZVAL(helloval);
        ZVAL_STRING(helloval, "Hello World", 1);
        zend_hash_add(EG(active_symbol_table), "a", sizeof("a"), &helloval, sizeof(zval*), NULL);
        zend_hash_add(EG(active_symbol_table), "b", sizeof("b"), &helloval, sizeof(zval*), NULL);
    }

At this point you could inspect either $a or $b and see that both contain the string "Hello World". Unfortunately, execution then continues to the third line, unset($a);. In this situation unset() has no idea that the data pointed to by $a is also in use by another variable, so it blindly frees the memory. Any later access to $b would read memory that has already been freed and crash the engine.

This problem is solved with the help of another member of the zval, refcount. When a variable is first created and given a value, its refcount is initialized to 1, because it is assumed to be in use only by the variable it was created for. When the code snippet goes on to assign helloval to $b, it needs to increase refcount to 2, because the value is now referenced by two variables:

    {
        zval *helloval;
        MAKE_STD_ZVAL(helloval);
        ZVAL_STRING(helloval, "Hello World", 1);
        zend_hash_add(EG(active_symbol_table), "a", sizeof("a"), &helloval, sizeof(zval*), NULL);
        ZVAL_ADDREF(helloval);
        zend_hash_add(EG(active_symbol_table), "b", sizeof("b"), &helloval, sizeof(zval*), NULL);
    }

Now, when unset() deletes the $a copy of the variable, it can see from the refcount that someone else is still interested in the data, so it simply decrements refcount and otherwise leaves the value alone.
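The same bookkeeping can be modeled in standalone C, independent of the engine. The toy_val type and the toy_new(), toy_addref(), and toy_release() functions below are hypothetical; they mimic the refcount rule described above without using any Zend API, and live_values exists only so the effect can be observed.

```c
#include <stdlib.h>
#include <string.h>

typedef struct {
    char *str;
    int refcount;
} toy_val;

static int live_values = 0;   /* values currently allocated, for the demo */

/* Create a value owned by one variable: refcount starts at 1. */
toy_val *toy_new(const char *s)
{
    size_t len = strlen(s);
    toy_val *v = malloc(sizeof *v);
    if (!v) {
        return NULL;
    }
    v->str = malloc(len + 1);
    if (!v->str) {
        free(v);
        return NULL;
    }
    memcpy(v->str, s, len + 1);
    v->refcount = 1;
    live_values++;
    return v;
}

/* A second variable starts pointing at the same value. */
toy_val *toy_addref(toy_val *v)
{
    v->refcount++;
    return v;
}

/* One variable lets go; the data is destroyed only when the last one does.
 * Returns the number of references that remain. */
int toy_release(toy_val *v)
{
    if (--v->refcount > 0) {
        return v->refcount;
    }
    free(v->str);
    free(v);
    live_values--;
    return 0;
}
```

Releasing the first variable leaves the string intact for the second one; only the final release frees the memory.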

VI. Copy on Write

Saving memory through reference counting is certainly a good idea, but what happens when you only want to change the value of one of those variables? Consider the following code snippet:

    <?php
    $a = 1;
    $b = $a;
    $b += 5;
    ?>

Following the logic of this code, you would expect $a still to equal 1 and $b to end up as 6. You also know by now that Zend does its best to save memory by having $a and $b refer to the same zval (see the second line). So what happens when execution reaches the third line and the value of $b has to change?

The answer is that Zend looks at refcount and, if it is greater than 1, separates the value. In the Zend Engine, separation is the process of breaking up a reference pair, the opposite of the process you just saw:

    zval *get_var_and_separate(char *varname, int varname_len TSRMLS_DC)
    {
        zval **varval, *varcopy;
        if (zend_hash_find(EG(active_symbol_table), varname, varname_len + 1, (void **)&varval) == FAILURE) {
            /* Variable does not exist at all: fail out */
            return NULL;
        }
        if ((*varval)->refcount < 2) {
            /* varname is the only actual reference,
             * so no separation is needed
             */
            return *varval;
        }
        /* Otherwise, make a copy of the zval* value */
        MAKE_STD_ZVAL(varcopy);
        *varcopy = **varval;
        /* Duplicate any allocated structures within the zval* */
        zval_copy_ctor(varcopy);
        /* Remove the old version of varname;
         * this decreases the refcount of varval in the process
         */
        zend_hash_del(EG(active_symbol_table), varname, varname_len + 1);
        /* Initialize the reference count of the newly created value
         * and attach it to the varname variable
         */
        varcopy->refcount = 1;
        varcopy->is_ref = 0;
        zend_hash_add(EG(active_symbol_table), varname, varname_len + 1, &varcopy, sizeof(zval*), NULL);
        /* Return the new zval* */
        return varcopy;
    }

Now that the engine holds a zval* that it knows is owned only by $b, it can convert the value to a long and add 5 to it, as the script requested.
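Stripped of the symbol table, separation comes down to a few pointer operations. The following standalone sketch models a variable as a pointer slot and uses a hypothetical toy_val type and toy_separate() function; it illustrates the copy-on-write rule only and does not use any engine API.

```c
#include <stdlib.h>

typedef struct {
    long lval;
    int refcount;
} toy_val;

/* Returns a value that the variable in *slot may modify safely, splitting
 * off a private copy first if the value is shared. NULL on allocation failure. */
toy_val *toy_separate(toy_val **slot)
{
    toy_val *v = *slot;
    toy_val *copy;

    if (v->refcount < 2) {
        return v;               /* sole owner: nothing to separate */
    }
    copy = malloc(sizeof *copy);
    if (!copy) {
        return NULL;
    }
    copy->lval = v->lval;       /* duplicate the contents */
    copy->refcount = 1;         /* the copy has exactly one owner */
    v->refcount--;              /* the original loses this variable */
    *slot = copy;               /* the variable now points at its own copy */
    return copy;
}
```

After separation, writing through the returned pointer changes only the variable that asked for it, while the other variable keeps seeing the original value.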

VII. Change on Write

The reference counting concept also opens up a new way of manipulating data, one that userspace scripters know as "references". Consider the following userspace code snippet:

    <?php
    $a = 1;
    $b = &$a;
    $b += 5;
    ?>

In this PHP code, the value of $a ends up as 6 even though it started out as 1 and was never (directly) changed. This happens because, when the engine goes to add 5 to $b, it notices that $b is a reference to $a and says, in effect, "It is fine for me to change the value without separating it, because I want every referencing variable to see the change."

But how does the engine know? Simple: it looks at the fourth and final member of the zval structure, is_ref. This is a simple on/off bit that records whether the value is part of a userspace-style reference set. In the preceding snippet, when the first line is executed, the value created for $a gets a refcount of 1 and an is_ref of 0, because it is owned by only one variable ($a) and no other variable has a change-on-write reference to it. On the second line, the refcount of this value is incremented to 2 as before, except that this time is_ref is also set to 1, because the script used an ampersand (&) to request a full reference.

Finally, on the third line, the engine once again fetches the value associated with $b and checks whether separation is necessary. This time the value is not separated, because of a check that was not shown earlier. Here is the refcount check from get_var_and_separate() again, with that check included:

    if ((*varval)->is_ref || (*varval)->refcount < 2) {
        /* varname is the only actual reference,
         * or it is a full reference to other variables.
         * Either way: no separation is needed
         */
        return *varval;
    }

This time, even though refcount is 2, separation is short-circuited because the value is a full reference. The engine can modify it freely without worrying about other variables' values appearing to change on their own; here, the shared change is exactly what the script asked for.
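Both behaviors can be compared side by side in a small standalone model. As before, toy_val and toy_add() are hypothetical names and nothing here is engine code; the only point is how the is_ref flag changes the outcome of the same write.

```c
#include <stdlib.h>

typedef struct {
    long lval;
    int refcount;
    int is_ref;
} toy_val;

/* Adds n to the variable stored in *slot.
 * Returns 0 on success, -1 if a needed copy could not be allocated. */
int toy_add(toy_val **slot, long n)
{
    toy_val *v = *slot;

    /* Separate only when the value is shared AND not a full reference. */
    if (!v->is_ref && v->refcount > 1) {
        toy_val *copy = malloc(sizeof *copy);
        if (!copy) {
            return -1;
        }
        *copy = *v;
        copy->refcount = 1;
        copy->is_ref = 0;
        v->refcount--;
        *slot = copy;
        v = copy;
    }
    v->lval += n;   /* change-on-write: every referencing variable sees it */
    return 0;
}
```

With is_ref set, both variables keep pointing at one value and both observe the new number; with is_ref clear, the writer gets its own copy and the other variable is untouched.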

VIII. Separation Issues

For all the copying and referencing techniques discussed so far, some combinations cannot be handled by is_ref and refcount alone. Consider the following block of PHP code:

    <?php
    $a = 1;
    $b = $a;
    $c = &$a;
    ?>

Here you have a single value that needs to be associated with three different variables. Two of them ($a and $c) form a change-on-write full reference, while the third ($b) is in a separable copy-on-write context. Using only is_ref and refcount to describe this relationship, which values would work?

The answer is: none. In this case the value must be duplicated into two discrete zval*s, even though both contain exactly the same data (see Figure 2).


Figure 2: Forcing separation when referencing


Similarly, the following block of code causes the same conflict and forces the value to separate into a copy (see Figure 3).

    <?php
    $a = 1;
    $b = &$a;
    $c = $a;
    ?>


Figure 3: Force detach when copying


Note that in both cases $b ends up associated with the original zval, because at the point where separation occurs the engine does not know the name of the third variable involved in the operation.
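The rule behind both forced separations can be condensed into one predicate. This is a simplified model of the behavior described in this section, not engine code; can_share() is a hypothetical name, and it assumes a value is fully described by its refcount and is_ref.

```c
/* Decide whether a new variable can be attached to an existing value.
 *   refcount, is_ref: current state of the value
 *   assign_by_ref:    1 for "$new = &$old", 0 for "$new = $old"
 * Returns 1 if the value can be shared, 0 if a separate copy is required. */
int can_share(int refcount, int is_ref, int assign_by_ref)
{
    if (refcount < 2) {
        /* Only one owner so far: is_ref can simply be set to match. */
        return 1;
    }
    /* Already shared: a single is_ref bit cannot describe a mix of
     * copy-on-write and change-on-write users, so the kinds must agree. */
    return is_ref == assign_by_ref;
}
```

In the first example of this section the value is a shared copy-on-write value (refcount 2, is_ref 0) when a reference is requested, and in the second it is a shared reference (refcount 2, is_ref 1) when a plain copy is requested, so both fall into the mismatched case.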

IX. Summary

PHP is a managed language. From the ordinary user's point of view, this careful control of resources and memory makes prototyping easier and leads to fewer crashes. Once you dig under the hood, however, all bets are off, and it is ultimately up to a responsible developer to maintain the integrity of the whole runtime environment.
