I. Memory
In PHP, populating a string variable is straightforward. A single statement, <?php $str = 'Hello World'; ?>, is all it takes, and the string can then be freely modified, copied, and moved around. In C it is different. You can write a simple static string such as char *str = "Hello World";, but that string cannot be modified, because it lives in the program's read-only image. To create a string you can manipulate, you must allocate a block of memory and copy the contents into it with a function such as strdup():
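#include <stdio.h>
#include <stdlib.h>
#include <string.h>   /* declares strdup() on POSIX systems */
{
    char *str;
    str = strdup("Hello World");
    if (!str) {
        fprintf(stderr, "Unable to allocate memory!\n");
    } else {
        str[0] = 'J';   /* the heap copy can be modified */
        free(str);      /* and must be released when no longer needed */
    }
}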
For reasons examined below, the PHP source code rarely calls the traditional memory-management functions (malloc(), free(), strdup(), realloc(), calloc(), and so on) directly.
II. Freeing Memory
On almost every platform, memory management follows a request-and-release pattern. First, an application asks the layer below it (usually the operating system): "I want to use some memory." If space is available, the operating system hands it to the program and marks it so that it does not give the same memory to another program.
When the application has finished with that memory, it should return it to the OS so that it can be allocated to other programs. If the program never returns it, the OS cannot know that the block is no longer in use, so it cannot hand it to another process. If a block of memory is never released and its owning application loses track of it, we say the application "leaks" memory, because that memory is no longer available to any other program.
In a typical client application, small and infrequent leaks can sometimes be tolerated, because the leaked memory is implicitly returned to the OS when the process exits. This is harmless: the OS knows which program it gave the memory to, and it can be sure that the memory is no longer needed once that program terminates.
Long-running server daemons, including web servers such as Apache and the PHP modules they load, are designed to run for a very long time. Because the OS reclaims a process's memory only when the process exits, any leak, no matter how small, accumulates with every repeated operation and eventually exhausts system resources.
Consider the user-space stristr() function. To perform a case-insensitive search, it creates a lowercase copy of each of the two strings and then performs a traditional case-sensitive search to find the relative offset. Once the offset has been found, however, it no longer needs these lowercase copies. If it did not release them, every script that uses stristr() would leak some memory on each call. Eventually the web server process would own all of the system's memory yet be unable to use it.
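To make the stristr() case concrete, here is a minimal standalone C sketch of a case-insensitive search built the way the text describes: lowercase copies of both strings, then an ordinary strstr(). The function names are hypothetical and this is not PHP's actual implementation; it uses plain malloc()/free() rather than ZendMM. The point is that both temporary copies are freed on every path.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a freshly allocated lowercase copy of s,
   or NULL if allocation fails. */
static char *lowercase_copy(const char *s)
{
    size_t len = strlen(s);
    char *copy = malloc(len + 1);

    if (copy == NULL)
        return NULL;
    for (size_t i = 0; i <= len; i++)   /* <= also copies the '\0' */
        copy[i] = (char)tolower((unsigned char)s[i]);
    return copy;
}

/* Hypothetical case-insensitive search: returns the offset of needle in
   haystack, or -1 if it is absent (or if an allocation fails). */
static long stristr_offset(const char *haystack, const char *needle)
{
    char *lh = lowercase_copy(haystack);
    char *ln = lowercase_copy(needle);
    long offset = -1;

    if (lh != NULL && ln != NULL) {
        char *hit = strstr(lh, ln);
        if (hit != NULL)
            offset = (long)(hit - lh);
    }
    /* Both copies are released on every path; skipping these two calls
       is exactly the per-call leak described above. free(NULL) is a no-op. */
    free(lh);
    free(ln);
    return offset;
}
```

For example, stristr_offset("Hello World", "WORLD") finds the match at offset 6.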
You might argue that the ideal solution is simply to write good, clean, consistent code. That is certainly true, but in an environment like the PHP interpreter it is only half the answer.
III. Error Handling
To bail out of an active request, that is, out of a user-space script and the extension functions it depends on, the engine needs a way to leave the request completely. The Zend engine does this by setting a "bailout" address (a place to bounce back to) at the start of each request, and then calling longjmp() to jump to that address on any die() or exit() call, or whenever a fatal error (E_ERROR) is raised.
Although this bailout simplifies program flow, in most cases it means skipping resource-cleanup code (such as free() calls), which ultimately leads to memory leaks. Consider the following simplified version of the engine code that handles function calls:
void call_function(const char *fname, int fname_len TSRMLS_DC)
{
    zend_function *fe;
    char *lcase_fname;

    /* PHP function names are case-insensitive.
     * To simplify locating them in the function table,
     * all function names are implicitly converted to lowercase.
     */
    lcase_fname = estrndup(fname, fname_len);
    zend_str_tolower(lcase_fname, fname_len);

    if (zend_hash_find(EG(function_table), lcase_fname, fname_len + 1,
                       (void **)&fe) == SUCCESS) {
        zend_execute(&fe->op_array TSRMLS_CC);
    } else {
        php_error_docref(NULL TSRMLS_CC, E_ERROR,
                         "Call to undefined function: %s()", fname);
    }
    efree(lcase_fname);
}
When execution reaches the php_error_docref() line, the internal error handler sees that the error level is fatal and calls longjmp() to interrupt the current program flow and leave call_function(). The efree(lcase_fname) line is never executed. You might be tempted to move the efree() call above the php_error_docref() call, but what about the code that called call_function() in the first place? fname itself is quite possibly an allocated string, and it cannot be freed before the error-message processing has used it.
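The following standalone C sketch imitates the bailout with setjmp()/longjmp() to show exactly which line gets skipped. Everything in it is hypothetical and far simpler than the Zend engine: bailout_address stands in for the engine's bailout address, fatal_error() for an E_ERROR, and live_blocks counts blocks that were allocated but not yet freed.

```c
#include <setjmp.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for the engine's bailout state. */
static jmp_buf bailout_address;
static int live_blocks = 0;     /* allocated but not yet freed */

static char *tracked_strdup(const char *s)
{
    char *copy = malloc(strlen(s) + 1);

    if (copy != NULL) {
        strcpy(copy, s);
        live_blocks++;
    }
    return copy;
}

static void tracked_free(char *p)
{
    if (p != NULL) {
        free(p);
        live_blocks--;
    }
}

/* Plays the role of a fatal E_ERROR: control never returns to the caller. */
static void fatal_error(void)
{
    longjmp(bailout_address, 1);
}

/* Simplified call_function(): the cleanup line is skipped on bailout. */
static void call_function_sketch(const char *fname, int is_defined)
{
    char *fname_copy = tracked_strdup(fname);

    if (fname_copy == NULL || !is_defined)
        fatal_error();             /* jumps straight to bailout_address */
    tracked_free(fname_copy);      /* never reached after a bailout */
}

/* Runs one "request"; returns 1 if it bailed out, 0 otherwise. */
static int run_request(const char *fname, int is_defined)
{
    if (setjmp(bailout_address) != 0)
        return 1;
    call_function_sketch(fname, is_defined);
    return 0;
}
```

After a normal call live_blocks returns to 0; after a bailout it stays at 1, which is the leak described above.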
Note that php_error_docref() is the internal equivalent of the user-space trigger_error() function. Its first parameter is an optional documentation reference, which is appended to docref_root if that is enabled in php.ini. The second, TSRMLS_CC, passes thread-safety context. The third parameter can be any of the familiar E_* constants indicating the severity of the error. The fourth and subsequent parameters follow printf()-style formatting with a variable argument list.
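The "printf() style with a variable argument list" convention can be sketched in standard C with va_list and vsnprintf(). format_error() below is hypothetical: it keeps only a severity label and the format arguments, omits the docref and TSRMLS parameters, and formats into a caller-supplied buffer instead of reporting anything.

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical: formats "<severity>: <message>" into buf, where the message
   is built from a printf()-style format plus variable arguments.
   Returns the total length snprintf()/vsnprintf() report, or a negative
   value on an encoding error. */
static int format_error(char *buf, size_t len, const char *severity,
                        const char *fmt, ...)
{
    int prefix = snprintf(buf, len, "%s: ", severity);
    int body;
    va_list args;

    if (prefix < 0 || (size_t)prefix >= len)
        return prefix;              /* buffer too small even for the prefix */
    va_start(args, fmt);
    body = vsnprintf(buf + prefix, len - (size_t)prefix, fmt, args);
    va_end(args);
    return body < 0 ? body : prefix + body;
}
```

Called as format_error(msg, sizeof msg, "Fatal error", "Call to undefined function: %s()", "foo"), it produces "Fatal error: Call to undefined function: foo()".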
IV. The Zend Memory Manager
One way to solve the memory leaks caused by bailing out of a request is the Zend Memory Management (ZendMM) layer. This part of the engine behaves much like the operating system's memory manager: it allocates memory to calling code. The difference is that it operates at a low level within the process and is "request aware", so when a request ends it can do what the OS does when a process ends, namely implicitly release all memory owned by that request. Figure 1 shows the relationship between ZendMM, the OS, and the PHP process.
Figure 1. The Zend Memory Manager handles per-request memory allocation in place of direct system calls.
Besides implicit cleanup, ZendMM also enforces the memory_limit setting in php.ini. If a script tries to allocate more memory than the system has available, or more than its memory_limit allows, ZendMM automatically raises an E_ERROR and starts the bailout process. An added benefit is that the return value of most allocation calls does not need to be checked, because a failure jumps straight to the engine's exit section.
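A minimal sketch of how a limit-enforcing allocator lets callers skip return-value checks. It is a toy, not ZendMM: limited_alloc(), end_request(), run_script(), the fixed 64-byte limit, and the 16-slot ownership table are all hypothetical, and setjmp()/longjmp() stand in for the engine's bailout.

```c
#include <setjmp.h>
#include <stdlib.h>

/* Hypothetical per-request state. */
enum { MAX_OWNED = 16 };
static jmp_buf request_bailout;
static size_t memory_limit = 64;   /* bytes; stands in for memory_limit */
static size_t memory_used = 0;
static void *owned[MAX_OWNED];     /* blocks handed out this request */
static int owned_count = 0;

/* Never returns NULL: on failure it jumps to the request's bailout point. */
static void *limited_alloc(size_t size)
{
    void *p;

    if (size > memory_limit - memory_used || owned_count == MAX_OWNED)
        longjmp(request_bailout, 1);      /* "E_ERROR": limit exceeded */
    p = malloc(size);
    if (p == NULL)
        longjmp(request_bailout, 1);      /* system out of memory */
    memory_used += size;
    owned[owned_count++] = p;
    return p;
}

/* Frees whatever the request still owns, as happens at request shutdown. */
static void end_request(void)
{
    while (owned_count > 0)
        free(owned[--owned_count]);
    memory_used = 0;
}

/* Fake "script" allocating `count` blocks of `size` bytes. Returns 0 on
   normal completion, 1 if the memory limit triggered a bailout. */
static int run_script(size_t size, int count)
{
    volatile int status = 0;

    if (setjmp(request_bailout) == 0) {
        for (int i = 0; i < count; i++) {
            char *buf = limited_alloc(size);   /* no NULL check needed */
            buf[0] = 'x';
        }
    } else {
        status = 1;
    }
    end_request();
    return status;
}
```

With a 64-byte limit, run_script(16, 2) completes, while run_script(16, 5) bails out on its fifth allocation; either way end_request() leaves nothing behind.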
The principle of hooking PHP's internal code into the OS's actual memory-management layer is not complicated: all internally allocated memory goes through a specific set of alternative functions. For example, instead of calling malloc(16) to allocate a 16-byte block, PHP code calls emalloc(16). Besides performing the actual allocation, ZendMM tags the block with the request it belongs to, so that when the request bails out, ZendMM can release the block implicitly.
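The tagging idea can be sketched with an intrusive linked list: every block carries a small header that links it into the current request's list, so request shutdown can walk the list and free whatever is left. req_alloc(), req_free(), and req_shutdown() are hypothetical names, and the real ZendMM uses a different internal layout; this only illustrates the "request aware" bookkeeping.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical block header linking each allocation into the request's list. */
typedef union block_header {
    struct {
        union block_header *prev;
        union block_header *next;
    } link;
    long double align;   /* rough stand-in for max_align_t */
} block_header;

static block_header *request_blocks = NULL;   /* every live block of this request */
static size_t request_block_count = 0;

/* emalloc()-like: allocate size bytes and remember the block for this request. */
static void *req_alloc(size_t size)
{
    block_header *h = malloc(sizeof *h + size);

    if (h == NULL)
        return NULL;              /* the real ZendMM would bail out instead */
    h->link.prev = NULL;
    h->link.next = request_blocks;
    if (request_blocks != NULL)
        request_blocks->link.prev = h;
    request_blocks = h;
    request_block_count++;
    return h + 1;                 /* caller's memory starts after the header */
}

/* efree()-like: unlink the block and give it back. */
static void req_free(void *ptr)
{
    block_header *h;

    if (ptr == NULL)
        return;
    h = (block_header *)ptr - 1;
    if (h->link.prev != NULL)
        h->link.prev->link.next = h->link.next;
    else
        request_blocks = h->link.next;
    if (h->link.next != NULL)
        h->link.next->link.prev = h->link.prev;
    request_block_count--;
    free(h);
}

/* Request shutdown: release whatever the request did not (or could not) free. */
static void req_shutdown(void)
{
    while (request_blocks != NULL) {
        block_header *next = request_blocks->link.next;
        free(request_blocks);
        request_blocks = next;
    }
    request_block_count = 0;
}
```

A block freed explicitly with req_free() is unlinked immediately; anything still on the list when req_shutdown() runs is reclaimed there, just as a bailed-out request's memory would be.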
Often, memory must outlive a single request. Such allocations are called "persistent allocations" because they survive past the end of a request, and they can be made with the traditional allocators, since they do not need the extra per-request information that ZendMM attaches. Sometimes, however, whether a particular allocation must be persistent is not known until run time, so ZendMM exports a set of helper macros that behave like the other allocation functions but take an additional final parameter indicating whether the allocation is persistent.
If you do want a persistent allocation, set this parameter to 1; the call is then passed to the traditional malloc() family of allocators. If the runtime logic decides that the block does not need to be persistent, set the parameter to 0, and the call is routed to the per-request allocator instead.
For example, pemalloc(buffer_len, 1) maps to malloc(buffer_len), and pemalloc(buffer_len, 0) maps to emalloc(buffer_len), through a definition like the following:
/* Simplified; from Zend/zend_alloc.h: */
#define pemalloc(size, persistent) ((persistent) ? malloc(size) : emalloc(size))
Each allocator function provided by ZendMM has a more traditional counterpart. Table 1 lists each traditional allocator alongside its e/pe equivalents:
Table 1. Traditional allocators versus their PHP-specific counterparts.
| Traditional allocator | ZendMM e/pe counterparts |
|---|---|
| void *malloc(size_t count); | void *emalloc(size_t count); void *pemalloc(size_t count, char persistent); |
| void *calloc(size_t num, size_t count); | void *ecalloc(size_t num, size_t count); void *pecalloc(size_t num, size_t count, char persistent); |
| void *realloc(void *ptr, size_t count); | void *erealloc(void *ptr, size_t count); void *perealloc(void *ptr, size_t count, char persistent); |
| char *strdup(const char *ptr); | char *estrdup(const char *ptr); char *pestrdup(const char *ptr, char persistent); |
| void free(void *ptr); | void efree(void *ptr); void pefree(void *ptr, char persistent); |
You may notice that even pefree() requires the persistent flag. This is because pefree() cannot tell on its own whether ptr was a persistent allocation. Calling free() on a non-persistent allocation can cause a double free, while calling efree() on a persistent allocation can cause a segmentation fault, because the memory manager looks for management information that does not exist. Your code therefore needs to remember whether each data structure it allocates is persistent.
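A standalone sketch of the persistent-flag pattern, including why the same flag must be passed again on release. sketch_emalloc() and sketch_efree() are hypothetical counting wrappers around malloc()/free() that stand in for the per-request allocator, and the two macros mirror the shape of pemalloc()/pefree() rather than their actual definitions.

```c
#include <stdlib.h>

/* Hypothetical per-request allocator: just counts live per-request blocks. */
static int request_live = 0;

static void *sketch_emalloc(size_t size)
{
    void *p = malloc(size);

    if (p != NULL)
        request_live++;
    return p;
}

static void sketch_efree(void *ptr)
{
    if (ptr != NULL) {
        request_live--;
        free(ptr);
    }
}

/* The persistent flag picks the allocator at run time. Fully parenthesized
   so the macro expands safely inside larger expressions. */
#define sketch_pemalloc(size, persistent) \
    ((persistent) ? malloc(size) : sketch_emalloc(size))

/* The pointer alone does not say which allocator produced it, so the
   caller must supply the same flag it used when allocating. */
#define sketch_pefree(ptr, persistent) \
    ((persistent) ? free(ptr) : sketch_efree(ptr))
```

Passing the wrong flag to sketch_pefree() would corrupt request_live here; in the real engine the equivalent mistake is the double free or segmentation fault described above.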
Beyond the core allocator functions, ZendMM provides a few other very convenient functions, such as:
char *estrndup(const char *ptr, int len);
This function allocates len+1 bytes of memory and copies len bytes from ptr into the newly allocated block. Its behavior can be roughly described as follows:
char *estrndup(const char *ptr, int len)
{
    char *dst = emalloc(len + 1);

    memcpy(dst, ptr, len);
    dst[len] = '\0';   /* always NUL-terminate the copy */
    return dst;
}
The NUL byte implicitly placed at the end of the buffer means that any code using estrndup() to copy a string need not worry about passing the result to functions such as printf() that expect a NUL terminator. When estrndup() is used to copy non-string data, that last byte is wasted, but the benefit clearly outweighs the cost.
void *safe_emalloc(size_t size, size_t count, size_t addtl);
void *safe_pemalloc(size_t size, size_t count, size_t addtl, char persistent);
The total size allocated by these functions is ((size * count) + addtl). You might ask: "Why provide extra functions at all? Why not just use emalloc()/pemalloc()?" The reason is simple: security. Although the odds are sometimes quite small, it is exactly this "very unlikely" case, an integer overflow on the host platform while computing the size, that causes trouble. The overflow can wrap the size around to a nonsensical value or, worse, produce an allocation smaller than the caller needs. safe_emalloc() avoids this trap by checking for integer overflow and explicitly failing when it occurs.
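The overflow check behind safe_emalloc() can be sketched in portable C by testing each step against SIZE_MAX before multiplying or adding. checked_alloc_size() and safe_alloc_sketch() are hypothetical; where this sketch returns NULL, ZendMM raises a fatal error and ends the request instead.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical: computes size * count + addtl into *out.
   Returns 1 on success, 0 (leaving *out untouched) if the result would overflow. */
static int checked_alloc_size(size_t size, size_t count, size_t addtl, size_t *out)
{
    size_t product;

    if (size != 0 && count > SIZE_MAX / size)
        return 0;                       /* size * count would wrap */
    product = size * count;
    if (addtl > SIZE_MAX - product)
        return 0;                       /* + addtl would wrap */
    *out = product + addtl;
    return 1;
}

/* Hypothetical safe allocator: refuses sizes that cannot be represented. */
static void *safe_alloc_sketch(size_t size, size_t count, size_t addtl)
{
    size_t total;

    if (!checked_alloc_size(size, count, addtl, &total))
        return NULL;                    /* ZendMM would bail out here */
    return malloc(total);
}
```

For instance, unchecked arithmetic computes (SIZE_MAX / 2 + 1) * 2 as 0, which would silently allocate an empty block; checked_alloc_size() rejects it instead.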
Note that not every allocation routine has a p* counterpart. For example, there is no pestrndup(), and safe_pemalloc() did not exist before PHP 5.1.
V. Reference Counting
Careful memory allocation and release have a major impact on the long-term health of PHP as a multi-request process, but they are only half the problem. To run a server efficiently while it handles thousands of hits per second, each request must use as little memory as possible and avoid unnecessary data copying. Consider the following PHP snippet:
<?php
$a = 'Hello World';
$b = $a;
unset($a);
?>
After the first line executes, only one variable exists, and a 12-byte block of memory has been allocated to hold the string "Hello World", including the trailing NUL byte. Now look at the next two lines: $b is set to the same value as $a, and then $a is unset.
If PHP copied a variable's contents on every assignment, the example above would need another 12 bytes for the copied string, plus extra processor time to perform the copy. This already looks wasteful: by the third line the original variable is released, so the whole copy was unnecessary. Now take it a step further and imagine loading the contents of a 10MB file into two variables. That would take 20MB where 10MB would do. Would the engine really waste that much time and memory on such a pointless effort?
Of course not; PHP's designers have long been aware of this.
Remember that in the engine, variable names and their values are two different things. The value itself is an unnamed zval* (in this case holding a string), which is attached to the variable $a with zend_hash_add(). What happens if two variable names point to the same value?
{
    zval *helloval;

    MAKE_STD_ZVAL(helloval);
    ZVAL_STRING(helloval, "Hello World", 1);
    zend_hash_add(EG(active_symbol_table), "a", sizeof("a"),
                  &helloval, sizeof(zval*), NULL);
    zend_hash_add(EG(active_symbol_table), "b", sizeof("b"),
                  &helloval, sizeof(zval*), NULL);
}
At this point you could inspect $a or $b and see that both contain the string "Hello World". Unfortunately, you then go on to execute the third line, unset($a);. unset() does not know that the data $a points to is also used by another variable, so it blindly frees the memory. Any later access to $b would then read freed memory and crash the engine.
This problem is solved by the zval's refcount member. When a variable is first created and assigned a value, its refcount is initialized to 1, because the value is assumed to be used only by the variable it was created for. When your snippet then assigns helloval to $b, it must increase refcount to 2, since the value is now referenced by two variables:
{
    zval *helloval;

    MAKE_STD_ZVAL(helloval);
    ZVAL_STRING(helloval, "Hello World", 1);
    zend_hash_add(EG(active_symbol_table), "a", sizeof("a"),
                  &helloval, sizeof(zval*), NULL);
    ZVAL_ADDREF(helloval);
    zend_hash_add(EG(active_symbol_table), "b", sizeof("b"),
                  &helloval, sizeof(zval*), NULL);
}
Now, when unset() removes $a's reference to the value, it can tell from refcount that someone else is still interested in the data, so it simply decrements refcount and leaves the value alone.
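The refcount bookkeeping can be modeled outside the engine with a toy value type. toy_val and its helpers are hypothetical and much simpler than a real zval (no type tag, no symbol table); they show only the rule described above: add a reference when sharing, drop one on unset, and free the value when the count reaches zero.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical toy value: a string payload plus a reference count. */
typedef struct {
    char *str;
    unsigned refcount;
} toy_val;

static toy_val *toy_new_string(const char *s)
{
    toy_val *v = malloc(sizeof *v);

    if (v == NULL)
        return NULL;
    v->str = malloc(strlen(s) + 1);
    if (v->str == NULL) {
        free(v);
        return NULL;
    }
    strcpy(v->str, s);
    v->refcount = 1;          /* owned by the variable that created it */
    return v;
}

/* Like ZVAL_ADDREF(): one more variable now points at this value. */
static void toy_addref(toy_val *v)
{
    v->refcount++;
}

/* What unset() does: drop one reference, free only when none remain.
   Returns 1 if the value was destroyed. */
static int toy_delref(toy_val *v)
{
    if (--v->refcount > 0)
        return 0;
    free(v->str);
    free(v);
    return 1;
}
```

Unsetting the first variable only lowers the count to 1, so the second variable still sees "Hello World"; the memory is freed when the last reference goes away.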
VI. Copy on Write
Saving memory through reference counting is a good idea, but what happens when you want to change the value of just one of the variables? Consider this snippet:
<?php
$a = 1;
$b = $a;
$b += 5;
?>
Following the logic of the code, you know that $a is still 1 and $b ends up as 6. You also know that Zend tries to conserve memory by having $a and $b reference the same zval (see the second line). So what happens on the third line, when the value of $b has to change?
The answer is that Zend checks refcount and, if it is greater than 1, separates the value. In the Zend engine, separation is the process of breaking up a reference pair, the opposite of the process you just saw:
zval *get_var_and_separate(char *varname, int varname_len TSRMLS_DC)
{
    zval **varval, *varcopy;

    if (zend_hash_find(EG(active_symbol_table), varname, varname_len + 1,
                       (void **)&varval) == FAILURE) {
        /* Variable doesn't exist at all: fail and return */
        return NULL;
    }
    if ((*varval)->refcount < 2) {
        /* varname is the only actual reference,
         * so no separation is needed
         */
        return *varval;
    }
    /* Otherwise, make a copy of the zval* value */
    MAKE_STD_ZVAL(varcopy);
    *varcopy = **varval;
    /* Duplicate any allocated structures within the zval* */
    zval_copy_ctor(varcopy);
    /* Remove the old version of varname.
     * This decrements varval's refcount in the process
     */
    zend_hash_del(EG(active_symbol_table), varname, varname_len + 1);
    /* Initialize the new value's reference count
     * and attach it to the varname variable
     */
    varcopy->refcount = 1;
    varcopy->is_ref = 0;
    zend_hash_add(EG(active_symbol_table), varname, varname_len + 1,
                  &varcopy, sizeof(zval*), NULL);
    /* Return the new zval* */
    return varcopy;
}
Now that the engine has a zval* that it knows is owned only by $b, it can convert the value to a long and add 5 to it, as the script requests.
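The two copying steps in get_var_and_separate() do different jobs: the struct assignment copies the zval itself, while zval_copy_ctor() duplicates anything the zval points to. This toy sketch shows why the second step matters. toy_zval and both helpers are hypothetical, and only string payloads are handled.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical toy zval holding only a heap-allocated string. */
typedef struct {
    char *str;
    size_t len;
    unsigned refcount;
    unsigned char is_ref;
} toy_zval;

/* Step 1, like `*varcopy = **varval;`: copies the struct, so dst->str
   still points at the very same buffer as src->str. */
static void toy_shallow_copy(toy_zval *dst, const toy_zval *src)
{
    *dst = *src;
}

/* Step 2, the job zval_copy_ctor() does for strings: give dst a private
   buffer with the same contents. Returns 0 if allocation fails. */
static int toy_copy_ctor(toy_zval *dst)
{
    char *own = malloc(dst->len + 1);

    if (own == NULL)
        return 0;
    memcpy(own, dst->str, dst->len + 1);   /* includes the '\0' */
    dst->str = own;
    return 1;
}
```

Without the second step, the two zvals would share one string buffer, so modifying or freeing either copy would corrupt the other.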
VII. Change on Write
Reference counting also opens up a new way of manipulating data, in the form of what user-space scripts call "references". Consider the following user-space snippet:
<?php
$a = 1;
$b = &$a;
$b += 5;
?>
In the code above, $a ends up as 6, even though it started as 1 and was never changed directly. This happens because when the engine goes to add 5 to the value of $b, it notices that $b is a reference to $a and concludes: "I can change the value without separating it, because I want every reference variable to see the change."
But how does the engine know? Quite simply, it looks at the zval's last member, is_ref. This is a simple on/off flag that records whether the value is part of a user-space-style reference set. In the snippet above, when the first line executes, the value created for $a has a refcount of 1 and an is_ref of 0, because it is owned by only one variable ($a) and no other variable refers to it. On the second line, the value's refcount is incremented to 2, and this time is_ref is also set to 1 (because the script used the & symbol to request a full reference).
Finally, on the third line, the engine again fetches the value associated with $b and checks whether it needs to be separated. This time the value is not separated, thanks to a check the earlier version did not include. Here is the refcount check from get_var_and_separate() with that extra condition added:
    if ((*varval)->is_ref || (*varval)->refcount < 2) {
        /* varname is the only actual reference,
         * or it is a full reference to other variables.
         * Either way: no separation
         */
        return *varval;
    }
This time, although refcount is 2, no separation occurs because the value is a full reference. The engine can modify it freely, because every variable in the reference set is supposed to see the change.
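The combined rule (separate only when the value is shared and is not a full reference) can be sketched on a toy long-valued zval. toy_zval, toy_new(), and toy_add_assign() are hypothetical; toy_add_assign() plays the role of $var += n applied to the variable stored in *slot.

```c
#include <stdlib.h>

/* Hypothetical toy zval: a long payload plus the two bookkeeping fields. */
typedef struct {
    long value;
    unsigned refcount;
    unsigned char is_ref;
} toy_zval;

static toy_zval *toy_new(long value)
{
    toy_zval *z = malloc(sizeof *z);

    if (z != NULL) {
        z->value = value;
        z->refcount = 1;
        z->is_ref = 0;
    }
    return z;
}

/* Performs `$var += n`. Returns 0 if a needed separation fails to allocate. */
static int toy_add_assign(toy_zval **slot, long n)
{
    toy_zval *v = *slot;

    if (!v->is_ref && v->refcount > 1) {
        /* Copy on write: shared through plain assignment, so separate first. */
        toy_zval *copy = toy_new(v->value);
        if (copy == NULL)
            return 0;
        v->refcount--;            /* the other variable keeps the original */
        *slot = copy;
        v = copy;
    }
    /* Sole owner, or a full reference (change on write): modify in place. */
    v->value += n;
    return 1;
}
```

With $b = $a, adding 5 through $b leaves $a at 1; with $b = &$a, the same call changes the shared value to 6.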
VIII. Separation Issues
Even with the copying and referencing techniques discussed above, some problems cannot be solved with is_ref and refcount alone. Consider the following PHP block:
<?php
$a = 1;
$b = $a;
$c = &$a;
?>
Here a single value would need to be associated with three different variables. Two of them ($a and $c) form a "change-on-write" full reference, while the third ($b) is in a separable "copy-on-write" context. If only is_ref and refcount are used to describe the relationship, what values would work?
The answer: none. In this case the value must be duplicated into two separate zval*s, even though both contain exactly the same data (see Figure 2).
Figure 2. Forced separation on reference assignment
Similarly, the following block causes the same conflict and forces the value to be separated into a copy (see Figure 3).
Figure 3. Forced separation on copy assignment
<?php
$a = 1;
$b = &$a;
$c = $a;
?>
Note that in both cases $b remains associated with the original zval, because at the moment the separation happens, the engine cannot know the name of the third variable involved in the operation.
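Both forced separations can be traced with a toy model. toy_assign_value() and toy_assign_ref() are hypothetical stand-ins for plain and reference assignment, and they assume the destination is a fresh variable with no previous value. In each case $b keeps the original zval.

```c
#include <stdlib.h>

/* Hypothetical toy zval: a long payload plus the two bookkeeping fields. */
typedef struct {
    long value;
    unsigned refcount;
    unsigned char is_ref;
} toy_zval;

static toy_zval *toy_new(long value)
{
    toy_zval *z = malloc(sizeof *z);

    if (z != NULL) {
        z->value = value;
        z->refcount = 1;
        z->is_ref = 0;
    }
    return z;
}

/* $dst = $src; (by value). *dst is assumed to be an empty slot. */
static int toy_assign_value(toy_zval **dst, toy_zval **src)
{
    if ((*src)->is_ref) {
        /* $src belongs to a reference set: sharing would tie $dst to it. */
        toy_zval *copy = toy_new((*src)->value);
        if (copy == NULL)
            return 0;
        *dst = copy;
    } else {
        /* Plain value: share it and rely on copy on write later. */
        (*src)->refcount++;
        *dst = *src;
    }
    return 1;
}

/* $dst = &$src; (by reference). *dst is assumed to be an empty slot. */
static int toy_assign_ref(toy_zval **dst, toy_zval **src)
{
    if (!(*src)->is_ref && (*src)->refcount > 1) {
        /* Shared with a copy-on-write variable: give $src its own zval first. */
        toy_zval *copy = toy_new((*src)->value);
        if (copy == NULL)
            return 0;
        (*src)->refcount--;       /* the other variable keeps the original */
        *src = copy;
    }
    (*src)->is_ref = 1;
    (*src)->refcount++;
    *dst = *src;
    return 1;
}
```

In the first block, $c = &$a moves $a onto a new zval shared with $c, leaving $b alone on the original. In the second, $c = $a receives a private copy, while $a and $b stay on the original reference set.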
IX. Summary
PHP is a managed language. From the ordinary user's point of view, this careful control of resources and memory makes prototyping easier and leads to fewer conflicts. Once we look under the hood, however, those promises seem to vanish, and in the end it is up to truly responsible developers to keep the environment consistent throughout run time.