Root cause analysis: why PHP fails to recover after an overload

Source: Internet
Author: User

Recently our PHP servers have repeatedly been unable to serve requests after an overload. Any request sent to an affected machine drives the php process handling it to 100% CPU. Our load-balancing policy lowers a machine's weight once its PHP requests time out, which reduces the probability that requests are sent to it. The effect lags somewhat, but it should eventually relieve the pressure and let the service recover. Recently this policy stopped working.

In the failed state, every request sent to php-fpm uses 100% CPU, even a request for an empty PHP file, which pointed to eAccelerator as a possible cause. Our php-fpm request_terminate_timeout is set to 5s, so any request that runs longer than 5s is killed by php-fpm, and a large number of these 5s timeouts occur around the time the problem appears. My first conjecture was that a child process killed while writing to eAccelerator's shared memory leaves it corrupted, so that every later request fails. But that does not explain why a new file also gets stuck. So I read the eAccelerator code and found the following:

[cpp]
/* Atomically decrement / increment the lock counter (x86 inline assembly). */
#define spinlock_try_lock(rw) asm volatile("lock ; decl %0" : "=m" ((rw)->lock) : : "memory")
#define _spinlock_unlock(rw)  asm volatile("lock ; incl %0" : "=m" ((rw)->lock) : : "memory")

static int mm_do_lock(mm_mutex *lock, int kind)
{
    while (1) {
        spinlock_try_lock(lock);
        if (lock->lock == 0) {
            /* We own the lock: record the owner. */
            lock->pid = getpid();
            lock->locked = 1;
            return 1;
        }
        /* Someone else holds it: undo the decrement and retry. */
        _spinlock_unlock(lock);
        sched_yield();
    }
    return 1;
}

static int mm_do_unlock(mm_mutex *lock)
{
    /* Only the recorded owner may release the lock. */
    if (lock->locked && (lock->pid == getpid())) {
        lock->pid = 0;
        lock->locked = 0;
        _spinlock_unlock(lock);
    }
    return 1;
}
[cpp]

Here mm_mutex points into shared memory. In other words, eAccelerator uses shared memory as a lock between processes and implements it as a spinlock. That explains everything.
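To see why a lock like this cannot recover on its own, the protocol can be exercised in a small standalone program. The following is only a sketch under stated assumptions: `demo_mutex`, `demo_try_lock` and `lock_survives_killed_holder` are hypothetical names, the struct layout is a stand-in for the real mm_mutex, GCC/Clang atomic builtins replace the inline assembly, and the retry loop is bounded so the demonstration terminates instead of spinning forever.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE          /* for MAP_ANONYMOUS on glibc */
#endif
#include <sched.h>
#include <signal.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for mm_mutex; the real layout is an assumption. */
typedef struct {
    int   lock;     /* 1 = free, 0 = held, below 0 = held and contended */
    pid_t pid;      /* recorded owner */
    int   locked;
} demo_mutex;

/* Decrement / test / increment, as in the quoted code, but giving up
 * after max_tries attempts. Returns 1 on success, 0 on failure. */
static int demo_try_lock(demo_mutex *m, int max_tries)
{
    for (int i = 0; i < max_tries; i++) {
        if (__atomic_sub_fetch(&m->lock, 1, __ATOMIC_SEQ_CST) == 0) {
            m->pid = getpid();
            m->locked = 1;
            return 1;
        }
        __atomic_add_fetch(&m->lock, 1, __ATOMIC_SEQ_CST);
        sched_yield();
    }
    return 0;
}

/* A child takes the lock and is killed with SIGKILL before unlocking.
 * Returns 1 if the parent can still take the lock, 0 if it cannot,
 * -1 if the setup itself failed. */
static int lock_survives_killed_holder(void)
{
    demo_mutex *m = mmap(NULL, sizeof *m, PROT_READ | PROT_WRITE,
                         MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (m == MAP_FAILED)
        return -1;
    m->lock = 1;
    m->pid = 0;
    m->locked = 0;

    pid_t child = fork();
    if (child < 0) {
        munmap(m, sizeof *m);
        return -1;
    }
    if (child == 0) {
        demo_try_lock(m, 1);     /* take the lock ... */
        for (;;)
            pause();             /* ... and never release it */
    }

    /* Wait until the child owns the lock, then kill and reap it. */
    while (__atomic_load_n(&m->locked, __ATOMIC_SEQ_CST) == 0)
        sched_yield();
    kill(child, SIGKILL);
    waitpid(child, NULL, 0);

    int ok = demo_try_lock(m, 1000);
    munmap(m, sizeof *m);
    return ok;
}
```

Calling `lock_survives_killed_holder()` returns 0: the counter stays at 0 after the holder dies, so every later decrement yields a negative value and no attempt can succeed. With the unbounded `while (1)` of the original code, the caller would spin at 100% CPU instead of returning.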
Suppose php-fpm kills a process after it has taken the lock but before it has unlocked. The lock is never released, so no other php-fpm child process can acquire it, and they all get stuck in this while (1) loop.

That was my guess. How could I confirm it? My first idea was to read the shared memory directly, but it turned out that the shared memory is created with IPC_PRIVATE, so there was no way to read it from outside. The only option was to wait for the problem to occur in production and attach gdb to inspect the memory. That gave us the final evidence:

[html]
(gdb) p *mm->lock
$8 = {lock = 4294966693, pid = 21775, locked = 1}
[html]

Here we can see that the lock is recorded as held by process 21775, but that process had been killed long before.

With the cause confirmed, let's look back at the conditions under which the problem occurs:

1. The request runs long enough to be killed by php-fpm.
2. At the moment the process is killed, PHP is in the middle of a require and eAccelerator holds the lock.

From this we can see that certain setups increase the probability:

1. request_terminate_timeout is short.
2. Files are loaded through autoload, or required in the middle of the execution logic. If all files are loaded before the request logic starts, the process should not be killed inside a require unless the loading itself times out.

There is an ugly way to avoid the problem while still using autoload: in the autoload function, check how long the request has been running and, if it has run too long, exit directly instead of calling require.

Personally, I think the best solution is to set request_terminate_timeout long enough, for example 30s or 300s, and to make all timeout decisions at the application layer instead of relying on php-fpm. The php-fpm timeout should only be the last layer of insurance, used as a last resort.
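The gdb check above compares the recorded owner pid against the list of live processes by hand. The same check can be sketched in code, assuming the standard POSIX behaviour that `kill(pid, 0)` sends no signal and fails with `ESRCH` when no such process exists. `owner_is_gone` and `spawn_and_reap` are hypothetical helper names, not part of eAccelerator or php-fpm, and the check is only a heuristic because pids can be reused.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns 1 if no process with this pid exists any more, 0 otherwise.
 * A pid of 0 or below means "no recorded owner" and is reported as 0. */
static int owner_is_gone(pid_t pid)
{
    if (pid <= 0)
        return 0;
    /* Signal 0 performs the existence and permission checks only. */
    return kill(pid, 0) == -1 && errno == ESRCH;
}

/* Demonstration helper: fork a child that exits immediately, reap it,
 * and return its pid (now referring to a dead process), or -1. */
static pid_t spawn_and_reap(void)
{
    pid_t child = fork();
    if (child == 0)
        _exit(0);
    if (child > 0)
        waitpid(child, NULL, 0);
    return child;
}
```

For a pid returned by `spawn_and_reap()`, `owner_is_gone` reports 1, while for `getpid()` it reports 0. A lock implementation could use such a check to detect a stale owner, but breaking the lock safely would still require care, since the dead process may have left the protected data half-written.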
In addition, PHP has its own timeout setting, max_execution_time, but in CGI mode this limit counts CPU time rather than elapsed time, so it is of little help here.
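The difference between CPU time and elapsed time is easy to underestimate, so here is a small sketch of it. It does not use PHP's own timer; it only compares two standard POSIX clocks around a sleep, on the assumption that a CPU-time limit behaves like `CLOCK_PROCESS_CPUTIME_ID`. The names `elapsed`, `clock_seconds` and `measure_sleep` are hypothetical.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <time.h>

typedef struct {
    double cpu_s;    /* CPU time consumed by this process */
    double wall_s;   /* elapsed (wall-clock) time */
} elapsed;

/* Read a POSIX clock and return its value in seconds. */
static double clock_seconds(clockid_t id)
{
    struct timespec ts;
    clock_gettime(id, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* Sleep for ms milliseconds and report how much CPU time and how much
 * wall-clock time passed in the meantime. */
static elapsed measure_sleep(long ms)
{
    double cpu0 = clock_seconds(CLOCK_PROCESS_CPUTIME_ID);
    double wall0 = clock_seconds(CLOCK_MONOTONIC);

    struct timespec req = { .tv_sec = ms / 1000,
                            .tv_nsec = (ms % 1000) * 1000000L };
    nanosleep(&req, NULL);

    elapsed e;
    e.cpu_s = clock_seconds(CLOCK_PROCESS_CPUTIME_ID) - cpu0;
    e.wall_s = clock_seconds(CLOCK_MONOTONIC) - wall0;
    return e;
}
```

A call such as `measure_sleep(200)` reports roughly 0.2 seconds of wall-clock time but almost no CPU time. A request that spends its time blocked on I/O or sleeping would therefore never reach a limit that is measured in CPU time.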
