[Recommended Article] An Analysis of a PHP Deadlock Problem

Source: Internet
Author: User
Tags: signal handler
Background: When people hear "deadlock" in PHP, they tend to think of very slow pages or blank white pages. In a test environment (I have a real one in which I ran into exactly this problem), you restart the php-fpm processes, and after a while the same symptoms come back. You look at the log and find many entries reading "Max execution timeout of ... seconds exceeded", and you suspect that some PHP daemon is responsible. To get the test environment working again, you decide that starting more php-fpm processes might help, so you start more, without ever facing the root cause. Why not? Because PHP on the company's machines is installed by the operations team, and you have neither the means nor the time to install a debug build of PHP. And if you hand the problem to the operations staff, do you really expect them to find it?

So the problem drags on unsolved, until one day you find that the disk is full. Checked as a whole, the disk is full, yet checked directory by directory with du, nothing occupies much space. It never occurred to me that a PHP deadlock could also eat up disk space. I have run into every one of these situations; in the end I rebooted the operating system and the disk space came back. That is why I think this is a good article and am reposting it. I also want to point out that the code quality of PHP extensions needs strict control, and that locking in PHP itself should be kept to a minimum: apart from cookie/session and cache locks, avoid locks wherever possible. This is only my personal view; the original article follows.

Introduction:

In this issue we invited Xu Tiecheng, a technical expert from the Cloud Disk Service team, who dug out a long-hidden PHP deadlock problem layer by layer. Thanks to Tiecheng for sharing this experience with us. Ready for the technical journey?

---------------

Discovering the problem

Recently, disk space alarms fired on many of our production machines. The log files had already been cleaned up, but the disk space was not released. Running ps aux | grep php-cgi showed that many processes had been started days, weeks, or even months earlier. Our production php-cgi processes have a limit on the number of requests they execute and are normally restarted within a day, so the preliminary conclusion was that these CGI processes were faulty.

Running lsof -p [PID] showed that the long-running CGI processes held open handles to log files that had already been deleted from the file system. Because the handles were never closed, the disk space could not be freed. This essentially settles the disk space anomaly: it is caused by the CGI processes not closing their file handles.

Further analysis with strace -p [PID] showed that every abnormal process was blocked in a futex wait. In other words, the abnormal CGI processes were deadlocked. A deadlocked process never closes its open file handles, which is what caused the disk space anomaly.
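The effect described above, where a deleted log file keeps consuming space while a process still holds it open, can be reproduced in a few lines. The following is a minimal sketch, not the code from the affected service; the function name write_to_deleted_file and the /tmp path template are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Creates a temporary file, deletes its name, and keeps writing through the
 * still-open descriptor. Returns the file size seen through the descriptor,
 * or -1 on error. *links receives the remaining hard-link count. */
long write_to_deleted_file(long *links)
{
    char path[] = "/tmp/deleted_log_XXXXXX";   /* hypothetical log file name */
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;

    /* Delete the name, as a log-cleanup job would. The inode survives
     * because this process still holds a descriptor on it. */
    if (unlink(path) != 0) {
        close(fd);
        return -1;
    }

    const char line[] = "still logging after delete\n";
    ssize_t written = write(fd, line, strlen(line));
    if (written != (ssize_t)strlen(line)) {
        close(fd);
        return -1;
    }

    struct stat st;
    if (fstat(fd, &st) != 0) {
        close(fd);
        return -1;
    }
    *links = (long)st.st_nlink;   /* 0 means no directory entry is left */

    close(fd);                    /* only now can the space be reclaimed */
    return (long)st.st_size;
}
```

A deadlocked process never reaches the final close(), so the blocks stay allocated even though no directory lists the file. That matches the symptom of a full disk with no large directory to blame.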

Why is the CGI process deadlocked?

What is a deadlock?

Anyone who has studied operating systems knows the concept of multithreading. A thread that accesses a shared resource must first lock it and must release the lock when it is done. If the lock is never released, the next thread that tries to acquire it waits forever, and that thread is deadlocked. So is the CGI deadlock caused by multiple threads accessing a shared resource? The answer is no.

1. CGI is a single-threaded process, as ps shows. (A process state of Sl would indicate a multithreaded process.)

2. Even if it were multithreaded, the deadlock occurs during PHP's shutdown process, at a call to a time function in glibc, so it is not caused by a PHP module. The time-related functions in glibc are thread-safe and do not deadlock under multithreading.

What is the cause of the deadlock?

An analysis of how deadlocks arise on Linux shows that, besides multithreading, signal handlers can also cause a deadlock. So is the CGI deadlock caused by signal handling? Before answering, we need to introduce one concept.

Function reentrancy and signal safety

A function is reentrant if it executes correctly and returns the right result no matter how many times it is entered, even when it is entered again before an earlier call has finished. So are thread-safe functions reentrant? The answer is no. Consider a thread-safe function that acquires a global lock when it accesses a shared resource. Suppose the process is interrupted by a signal before the function completes, so the lock has not yet been released, and the signal handler then calls the same function again: the result is a deadlock. So which functions may be called in a signal handler? Besides functions that use no global lock, only the signal-safe system calls may be used. Calling any other function that is not signal safe has unpredictable consequences, such as a deadlock. See man signal for details. Before analyzing the cause of the deadlock, let us look at how php-cgi executes and whether a deadlock is possible.
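As an illustration of the rule about what a signal handler may call, here is a minimal sketch of a handler that restricts itself to signal-safe operations: it sets a volatile sig_atomic_t flag and calls write(2). The names on_signal and deliver_and_check are hypothetical, and SIGUSR1 is chosen only to make the example self-triggering.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

static volatile sig_atomic_t got_signal = 0;

/* Uses only async-signal-safe operations: a flag store and write(2).
 * No locks are taken, so interrupting any code with this handler is safe. */
static void on_signal(int signo)
{
    static const char msg[] = "signal received\n";
    (void)signo;
    got_signal = 1;
    ssize_t n = write(STDERR_FILENO, msg, sizeof msg - 1);
    (void)n;
}

/* Installs the handler, delivers SIGUSR1 to this process, and returns the
 * flag value (1 if the handler ran), or -1 if installation failed. */
int deliver_and_check(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_signal;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR1, &sa, NULL) != 0)
        return -1;

    raise(SIGUSR1);   /* in a single-threaded process the handler runs before raise() returns */
    return got_signal;
}
```

Any work that needs locking functions is then done by the normal program flow after it notices the flag, not inside the handler.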

The php-cgi execution process

The time functions in glibc use a global lock to make them thread-safe, but they are not guaranteed to be signal safe. Based on the analysis so far, our initial suspicion was that the php-cgi process received a signal and then executed a non-signal-safe function in the signal handler. Before the interruption, the main flow was executing a time function in glibc, and the signal arrived before the lock taken by that function was released. The handler then called a glibc time function as well, which led to the deadlock.
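The suspected sequence can be sketched with a toy lock. This is an illustration under stated assumptions, not glibc's implementation: toy_lock is a hypothetical stand-in for the internal glibc lock, built on a C11 atomic_flag, and the handler uses a non-blocking attempt so that the example reports the problem instead of hanging.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stdatomic.h>
#include <string.h>

/* Toy non-recursive lock standing in for the internal glibc lock. */
static atomic_flag toy_lock = ATOMIC_FLAG_INIT;
static volatile sig_atomic_t handler_saw_lock_held = 0;

/* Returns 1 if the lock was acquired, 0 if it is already held. */
static int toy_trylock(void)
{
    return !atomic_flag_test_and_set(&toy_lock);
}

static void toy_unlock(void)
{
    atomic_flag_clear(&toy_lock);
}

/* The handler wants the same lock. A blocking acquire here would never
 * return, because the owner is the very code this handler interrupted. */
static void on_signal(int signo)
{
    (void)signo;
    if (toy_trylock())
        toy_unlock();
    else
        handler_saw_lock_held = 1;
}

/* Returns 1 if the handler found the lock held, 0 if not, -1 on error. */
int interrupt_while_locked(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_signal;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR1, &sa, NULL) != 0)
        return -1;

    if (!toy_trylock())   /* main flow enters the "time function" */
        return -1;
    raise(SIGUSR1);       /* signal arrives before the lock is released */
    toy_unlock();
    return handler_saw_lock_held;
}
```

With a real blocking lock the handler would wait forever at the point where this sketch sets handler_saw_lock_held, and the interrupted code could never resume to release it.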

The execution flow of php-cgi is shown in the following figure:

[Figure: php-cgi execution flow]

Further analysis found that the sapi_globals of every deadlocked CGI process recorded the error message

"Max execution timeout of 60 seconds exceeded".

60s is the execution timeout configured for our php-cgi. This confirmed that the CGI did hit a timeout during execution and then entered the shutdown process through longjmp. A glibc time function was called during shutdown, which led to the deadlock.

void zend_set_timeout(long seconds)
{
    TSRMLS_FETCH();

    EG(timeout_seconds) = seconds;
    if (!seconds) {
        return;
    }
    ......
    setitimer(ITIMER_PROF, &t_r, NULL);
    signal(SIGPROF, zend_timeout);  /* zend_timeout, the Zend timeout handler, is installed here */
    sigemptyset(&sigset);
    sigaddset(&sigset, SIGPROF);
    ......
}
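To show the timer mechanism in isolation, the sketch below arms a one-shot ITIMER_PROF timer and waits for SIGPROF, in the same spirit as the excerpt above. It is an illustration only: it uses sigaction rather than signal, the handler merely sets a flag, and the names on_sigprof and run_until_cpu_timeout are hypothetical.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <signal.h>
#include <string.h>
#include <sys/time.h>
#include <time.h>

static volatile sig_atomic_t timed_out = 0;

/* Signal-safe handler: it only records that the timer fired. */
static void on_sigprof(int signo)
{
    (void)signo;
    timed_out = 1;
}

/* Arms a one-shot CPU-time timer and burns CPU until it fires.
 * Returns 1 if SIGPROF arrived, 0 if the safety cap was hit, -1 on error. */
int run_until_cpu_timeout(long usec)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigprof;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGPROF, &sa, NULL) != 0)
        return -1;

    struct itimerval t;
    memset(&t, 0, sizeof t);              /* it_interval = 0: fire once */
    t.it_value.tv_sec = usec / 1000000;
    t.it_value.tv_usec = usec % 1000000;
    if (setitimer(ITIMER_PROF, &t, NULL) != 0)
        return -1;

    /* ITIMER_PROF counts CPU time used by the process, so the loop must
     * actually consume CPU; sleeping would not advance the timer. */
    clock_t cap = clock() + 5 * CLOCKS_PER_SEC;   /* safety cap: 5 CPU-seconds */
    while (!timed_out && clock() < cap)
        ;
    return timed_out ? 1 : 0;
}
```

For example, run_until_cpu_timeout(20000) requests a timeout after about 20 ms of CPU time. Because the signal can arrive at any instruction, it can land while the interrupted code holds a lock.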

Debugging with gdb showed that every php-cgi process was blocked in zend_request_shutdown. zend_request_shutdown invokes the shutdown functions registered by user PHP scripts. If the CGI execution times out, the timer raises SIGPROF and interrupts the execution flow. If the script is inside a time function call at that moment, the lock has not yet been released. Execution then enters the timeout handler and jumps on to zend_request_shutdown. If the user-defined shutdown function calls a time function, a deadlock results. We found evidence of this in the code:

register_shutdown_function('Simplewebsvc::Shutdown');

Our PHP code uses the Qalarm system. Qalarm injects a hook function that runs at the end of CGI execution (shutdown) to check whether the execution was normal and, if it was not, to send an alarm. It is precisely this Qalarm alarm handler that calls the time function, so there is a certain probability that a deadlock will occur.
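The whole chain (timeout signal, jump out of the interrupted code, shutdown hook needing the same lock) can be mimicked in a small sketch. It is a simplified model and not PHP's code: time_lock stands in for the glibc lock, bailout for the engine's longjmp target, shutdown_hook_can_lock for the registered shutdown function, and SIGUSR1 replaces SIGPROF so that the timing is deterministic. All of these names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <stdatomic.h>
#include <string.h>

static atomic_flag time_lock = ATOMIC_FLAG_INIT;  /* stand-in for the glibc lock */
static sigjmp_buf bailout;                        /* stand-in for the bailout point */

static void on_timeout(int signo)
{
    (void)signo;
    siglongjmp(bailout, 1);   /* abandon the interrupted code, lock and all */
}

/* Hypothetical shutdown hook: it needs the same lock to read the time.
 * Returns 1 if the lock is free, 0 if a blocking acquire would hang. */
static int shutdown_hook_can_lock(void)
{
    if (atomic_flag_test_and_set(&time_lock))
        return 0;
    atomic_flag_clear(&time_lock);
    return 1;
}

/* Returns 1 when the shutdown hook would deadlock, 0 when it would not,
 * -1 on error. */
int simulate_request(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_timeout;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR1, &sa, NULL) != 0)
        return -1;

    if (sigsetjmp(bailout, 1) == 0) {
        /* Request body: inside the "time function", holding its lock. */
        (void)atomic_flag_test_and_set(&time_lock);
        raise(SIGUSR1);                   /* the timeout fires right here */
        atomic_flag_clear(&time_lock);    /* never reached */
    }

    /* Shutdown phase, reached through siglongjmp with the lock still held. */
    int would_deadlock = !shutdown_hook_can_lock();
    atomic_flag_clear(&time_lock);        /* reset so the sketch can be rerun */
    return would_deadlock;
}
```

The jump skips the unlock in the request body, so nobody is left to release the lock. In the real process the shutdown hook blocks on it forever, which is consistent with the futex wait seen in strace.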

Conclusion

From the analysis above, the cause of the CGI deadlock is the use of a non-signal-safe function in the signal handler.

Solutions

Remove or simplify the hook function that Qalarm registers for shutdown, and avoid calls to unsafe functions.

From: http://www.v2gg.com/lady/shishangzixun/20140924/57266.html
