When people think of a deadlock in PHP, they usually picture very slow page loads or a blank white page. In a test environment you restart the php-fpm process and the problem goes away, only to come back some time later. For this issue we invited a PHP expert from Brother Lian PHP Education (www.lampbrother.net) to share how a long-hidden PHP deadlock was dug out layer by layer. Thanks to Tetsushige for bringing us this experience. Ready for this technical trip?
---------------
Discovering the problem
Recently, disk space alarms fired on many of our production machines. The log files had already been cleaned up, yet the disk space was not released. Running ps aux | grep php-cgi showed that many processes had been started days, weeks, or even months earlier. Our production php-cgi processes have a maximum request count and are normally restarted within one day. The preliminary conclusion was that these long-lived CGI processes were faulty.
Running lsof -p [PID] showed that the long-running CGI processes held open handles to log files that had already been deleted from the file system. Because the handles were never closed, the disk space could not be freed. This essentially settled the cause of the disk space anomaly: the CGI processes were not closing their file handles.
Further analysis with strace -p [PID] showed that every abnormal process was blocked in a futex wait. In other words, the abnormal CGI processes were deadlocked. A deadlocked process never closes its open file handles, which is what kept the disk space from being released.
Why is the CGI process deadlocked?
What is a deadlock?
Anyone who has studied operating systems knows the concept of multithreading. When multiple threads access a shared resource, each thread must lock the resource before using it and release the lock afterwards. If the lock is never released, the next thread that tries to acquire it waits forever, and that thread is deadlocked. So is the CGI deadlock caused by multithreaded access to a shared resource? The answer is no.
1. CGI is a single-threaded process, as ps shows. (A process state of Sl would indicate a multithreaded process.)
2. Even if it were multithreaded, the deadlock occurs where PHP's shutdown phase calls the time functions in glibc, not in code belonging to a PHP module. The time-related functions in glibc are thread-safe and do not deadlock under multithreaded use.
What is the cause of the deadlock?
Analyzing how deadlocks arise on Linux shows that, besides multithreading, signal handlers can also cause them. So is the CGI deadlock caused by signal handling? Before answering, we need to introduce one concept.
Function reentrancy and signal safety
A function is reentrant if it executes normally and returns the correct result no matter how many times it is entered, including when it is entered again before an earlier call has finished. So are thread-safe functions reentrant? The answer is no. Consider a thread-safe function that acquires a global lock when it accesses a shared resource. If the process is interrupted by a signal before the function completes, the lock is still held. If the signal handler then calls the same function, it waits for a lock that can never be released, and the result is a deadlock. So which functions may be called from a signal handler? Besides functions that use no global locks, there is a set of signal-safe system calls. Calling any other, non-signal-safe function has unpredictable consequences, such as a deadlock. See man signal. Before analyzing the cause of the deadlock, let's look at how CGI executes and check whether a deadlock is possible.
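The hazard described above can be sketched in a few lines of C. This is a minimal illustration, not code from PHP or glibc: lock_held is a hypothetical stand-in for a library-internal lock, and the handler only records that it found the lock held, where a real non-reentrant function would block forever.

```c
#include <assert.h>
#include <signal.h>

/* Hypothetical stand-in for a library-internal lock; 1 means "held". */
static volatile sig_atomic_t lock_held = 0;
/* Set by the handler when it finds the lock already held. */
static volatile sig_atomic_t would_deadlock = 0;

/* Signal handler. A real non-reentrant function would wait for the lock
 * here, and the interrupted code that holds the lock cannot resume until
 * the handler returns, so the wait would never end. */
static void on_signal(int signo)
{
    (void)signo;
    if (lock_held) {
        would_deadlock = 1;   /* record the hazard instead of blocking */
    }
}

/* Hypothetical function that is thread-safe but not signal-safe. */
static void locked_operation(void)
{
    assert(!lock_held);       /* this sketch assumes no nested calls */
    lock_held = 1;            /* acquire */
    raise(SIGUSR1);           /* simulate a signal arriving mid-critical-section */
    lock_held = 0;            /* release */
}

/* Returns 1 if the handler ran while the lock was held, -1 on setup failure. */
int demonstrate_reentrancy_hazard(void)
{
    if (signal(SIGUSR1, on_signal) == SIG_ERR) {
        return -1;
    }
    would_deadlock = 0;
    locked_operation();
    return would_deadlock;
}
```

Calling demonstrate_reentrancy_hazard() returns 1, because the handler always runs while the simulated lock is held. In a real program the signal arrives at an arbitrary moment, which is why such deadlocks appear only with a certain probability.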
The php-cgi execution process
The time functions in glibc use a global lock to guarantee thread safety, but they are not guaranteed to be signal safe. Based on the analysis above, we initially suspected that the php-cgi process received a signal and then executed a non-signal-safe function in the signal handler. Specifically, the main flow was executing a glibc time function when it was interrupted, and control entered the handler before the lock taken by that function was released. The handler path then called a glibc time function as well, which produced the deadlock.
The execution flow of php-cgi is shown in the figure:
[Figure: php-cgi execution flow]
Further analysis found that the sapi_globals of every deadlocked CGI process contained the same error message:
"Max execution timeout of 60 seconds exceeded".
60 seconds is the execution timeout configured for our php-cgi. This confirmed that the CGI did hit a timeout during execution and then entered the shutdown phase through longjmp. The shutdown phase called a glibc time function, which led to the deadlock.
void zend_set_timeout(long seconds)
{
    TSRMLS_FETCH();
    EG(timeout_seconds) = seconds;
    if (!seconds) {
        return;
    }
    ......
    setitimer(ITIMER_PROF, &t_r, NULL);
    signal(SIGPROF, zend_timeout); /* registers zend_timeout as the SIGPROF handler */
    sigemptyset(&sigset);
    sigaddset(&sigset, SIGPROF);
    ......
}
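To make the mechanism easier to follow, here is a small standalone sketch of the same idea: a profiling timer delivers SIGPROF, and the handler jumps out of the interrupted code into a shutdown path. It is an illustration under stated assumptions, not the PHP source. The names run_until_timeout, on_timeout, and shutdown_point are hypothetical, and the busy loop stands in for a script that never finishes.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <string.h>
#include <sys/time.h>

/* Where the handler jumps to when the timer fires. */
static sigjmp_buf shutdown_point;

/* SIGPROF handler: abandon the interrupted code and jump to shutdown. */
static void on_timeout(int signo)
{
    (void)signo;
    siglongjmp(shutdown_point, 1);
}

/* Runs a CPU-bound loop under a one-shot profiling timer of `usec`
 * microseconds. Returns 1 once the timeout path has been taken,
 * or -1 if the handler or timer could not be set up. */
int run_until_timeout(long usec)
{
    struct sigaction sa;
    struct itimerval timer;
    volatile unsigned long spins = 0;

    assert(usec > 0 && usec < 1000000);

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_timeout;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGPROF, &sa, NULL) != 0) {
        return -1;
    }

    /* Saving the signal mask (second argument 1) restores it after the jump. */
    if (sigsetjmp(shutdown_point, 1) != 0) {
        /* "Shutdown" path, reached only through siglongjmp in the handler.
         * Whatever the interrupted code held (locks, half-updated state)
         * is still in that condition here. */
        return 1;
    }

    memset(&timer, 0, sizeof timer);
    timer.it_value.tv_usec = usec;   /* it_interval stays 0: fires once */
    if (setitimer(ITIMER_PROF, &timer, NULL) != 0) {
        return -1;
    }

    for (;;) {
        spins++;                     /* consumes CPU time until SIGPROF arrives */
    }
}
```

For example, run_until_timeout(50000) returns 1 after roughly 50 ms of CPU time. The point to notice is that the shutdown path runs after the interrupted code was abandoned mid-flight, so any lock that code held is never released.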
Debugging with gdb showed that every php-cgi process was blocked in zend_request_shutdown. zend_request_shutdown invokes the shutdown function registered by the user's PHP script. If the CGI execution times out, the timer raises a SIGPROF signal that interrupts the running code. Suppose the script is inside a time function call at that moment and the lock has not yet been released. Execution then enters the timeout handler and jumps on to zend_request_shutdown. If the custom shutdown function also calls a time function, a deadlock results. We found the evidence in the code:
register_shutdown_function('Simplewebsvc::Shutdown');
Our PHP code uses the Qalarm system. At the end of CGI execution (shutdown), Qalarm injects a hook function that checks whether the execution was normal and, if not, sends an alarm. The Qalarm alarm handler happens to call a time function, so there is a certain probability that a deadlock will occur.
Conclusion
From the analysis above, the cause of the CGI deadlock is that a non-signal-safe function is called on the signal handler path.
Solutions
Remove or simplify the hook function that Qalarm registers for shutdown, and avoid calls to non-signal-safe functions in it.
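As a general illustration of avoiding unsafe calls on the handler path, one common pattern is to let the handler do nothing but set a flag and to run the shutdown hook later from the normal control flow. The sketch below shows that pattern in plain C. It is not a patch to PHP or Qalarm: run_request, shutdown_hook, and the use of raise() to stand in for the timer are all hypothetical.

```c
#include <assert.h>
#include <signal.h>
#include <time.h>

/* Set by the handler, polled by the work loop. */
static volatile sig_atomic_t timed_out = 0;

/* The handler does the minimum: it only records that the timeout fired. */
static void on_timeout(int signo)
{
    (void)signo;
    timed_out = 1;
}

/* Hypothetical shutdown hook. It runs in the normal control flow, not in
 * the handler, so calling time() here cannot re-enter an interrupted call. */
static time_t shutdown_hook(void)
{
    return time(NULL);
}

/* Performs up to `steps` units of work and stops early once the timeout
 * flag is seen. `signal_at_step` simulates the timer firing during that
 * step (a negative value means it never fires). Returns the number of
 * steps completed, or -1 on setup failure. */
int run_request(int steps, int signal_at_step, time_t *finished_at)
{
    int done = 0;

    assert(finished_at != NULL);
    timed_out = 0;
    if (signal(SIGUSR1, on_timeout) == SIG_ERR) {
        return -1;
    }

    for (int i = 0; i < steps && !timed_out; i++) {
        if (i == signal_at_step) {
            raise(SIGUSR1);          /* stands in for the timer signal */
        }
        done++;                      /* the interrupted step still completes */
    }

    *finished_at = shutdown_hook();  /* shutdown runs after the work unwound */
    return done;
}
```

With 10 steps and the signal raised during step 3, run_request returns 4: the step in progress finishes, the loop notices the flag, and only then does the shutdown hook run. The trade-off is that the work loop must poll the flag, so a step that blocks for a long time delays the timeout.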
Analyze the deadlock problem with PHP