502 Bad Gateway Cause Analysis
Note: the environment is an LNMP stack (Linux, Nginx, MySQL, PHP).
The crontab contains a task that restarts php every 20 minutes. In addition, a Python script checks every minute whether the php-cgi process exists; if it does not, the script restarts php and notifies the administrator by email. Both the crontab task and the Python script call /usr/local/webserver/php/restart_php.sh with root permission.

According to the email notifications, the crontab restart of php failed on two occasions (the site returned 502 Bad Gateway because nginx could not reach the backend it proxies to). One minute later, the Python script detected that no php-cgi process was listening on port 9000 and called restart_php.sh again.

Let's look at the php log (/usr/local/webserver/php/log/php-fpm.log) to see why the crontab restart failed.

A description of the SIGCHLD signal: "A child process (other than init) does not disappear immediately after it exits. Instead, it leaves behind a data structure called a zombie process, which waits to be handled by the parent process. Every child process goes through this stage. When the child process exits, it sends a SIGCHLD signal to its parent process."

So what happens if the parent process does not wait for the child processes to finish exiting? Let's look at restart_php.sh: the php-fpm stop command sends a SIGTERM signal to the php-cgi child processes. To make sure that all processes are terminated, killall is called twice, and then the php-fpm start command is run. At that moment, however, not all child processes have necessarily terminated. If php-fpm start runs while a child process still holds the bound port (9000), the port binding fails.

The log confirms this: after the port binding fails, messages about child processes exiting continue to be printed one after another, which again shows that the child processes had not all been killed when php-fpm start ran.
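The port binding failure described above can be reproduced in isolation. The following is a minimal Python sketch, not part of the original scripts: it holds a port with one listening socket (standing in for a php-cgi child that has not yet exited) and then tries to bind the same port a second time (standing in for php-fpm start). The function names try_bind and demo are hypothetical, and an OS-assigned port is used instead of 9000 so that the sketch does not interfere with a running service.

```python
import errno
import socket


def try_bind(host: str, port: int) -> int:
    """Try to bind and listen on host:port.

    Returns 0 on success, or the errno of the failure.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind((host, port))
        sock.listen(1)
        return 0
    except OSError as exc:
        return exc.errno
    finally:
        sock.close()


def demo() -> int:
    """Hold a port with one listener, then attempt a second bind to it."""
    # Stand-in for the old child process that still owns the listening socket.
    holder = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    holder.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    holder.listen(1)
    port = holder.getsockname()[1]
    try:
        # Stand-in for "php-fpm start" running too early.
        return try_bind("127.0.0.1", port)
    finally:
        holder.close()


if __name__ == "__main__":
    result = demo()
    # Print the symbolic errno name if there is one.
    print(errno.errorcode.get(result, result))
```

As long as the first socket is open, the second bind is refused with an "address already in use" error; once the holder is closed, the same bind succeeds. This is the situation restart_php.sh runs into when it starts php-fpm before the old child processes have gone away.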
The Python script then calls restart_php.sh again. Note that four child processes ran for only 59 seconds (by comparison with the earlier log, a normally running child process should run for 1200 seconds, i.e. 20 minutes); these are the four child processes spawned by the failed restart. (A normal restart would produce 75 child processes; because the port binding failed, the others could not be started.) This restart then succeeds.

Why does this restart succeed? The second restart happens one minute after the first. If the first restart had succeeded, the second would not run at all; because the first restart failed (leaving only four child processes), the second restart only has to kill those four. With so few processes, they are killed off quickly and cleanly, so no port binding error occurs when php-fpm start runs.

Once the cause is known, the solution is simple. There is a single goal: make sure the child processes have completely exited before php-fpm start runs.
(1) Use the -w option of killall.
(2) Use commands such as ps and pgrep to check whether any php-cgi child process still exists.
(3) Sleep for several seconds before php-fpm start.

I am using solution (1), and no problems have occurred in testing so far. The log shows that php-fpm start runs about two seconds after php-fpm has fully exited.

In addition, the logs show that the 502 Bad Gateway problem was rare in 2015, and that port 9000 binding errors have occurred frequently since February 22, 2016. I don't know what changed.
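The goal stated above (do not start php-fpm until the old processes have released the port) can also be checked directly on the port itself, as a variation on solution (2). The following is a minimal Python sketch under stated assumptions, not the script used on this site: the function names port_is_free and wait_for_port_release are hypothetical, and the listen address 127.0.0.1 in the usage note is an assumption, since only the port number 9000 is known here.

```python
import socket
import time


def port_is_free(host: str, port: int) -> bool:
    """Return True if a test bind to host:port succeeds right now."""
    probe = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Assumption: SO_REUSEADDR keeps leftover TIME_WAIT connections from
    # making the port look busy; a live listener still blocks the bind.
    probe.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    try:
        probe.bind((host, port))
        return True
    except OSError:
        return False
    finally:
        probe.close()


def wait_for_port_release(host: str, port: int,
                          timeout: float = 10.0,
                          interval: float = 0.2) -> bool:
    """Poll until the port can be bound, or give up after `timeout` seconds.

    Returns True if the port was released in time, False otherwise.
    """
    deadline = time.monotonic() + timeout
    while True:
        if port_is_free(host, port):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

A restart wrapper could call wait_for_port_release("127.0.0.1", 9000) after php-fpm stop and run php-fpm start only if it returns True. Unlike a fixed sleep, this waits only as long as needed and reports a failure when the old processes never let go of the port. There is still a small window between the probe and the real start, so this is a mitigation and not a guarantee.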