Article Source: "https://www.centos.bz/2015/04/handle-nginx-write-io-problem/"
Symptoms
A server load alarm fired unexpectedly, and shortly afterwards the website became slow.
Fault analysis
- 1, Logged on to the server and ran top: iowait on the CPU line was above 70%, so the high load was caused by IO.
- 2, Ran iotop -o and found that Nginx write IO was especially heavy. The top output from the previous step also showed Nginx processes in state D (uninterruptible sleep), meaning they were blocked waiting for IO.
- 3, At this point it was clear that Nginx was performing a large number of writes to the file system and that this was driving the load up. It was still unknown which files Nginx was writing, so the investigation had to continue.
- 4, We picked the PID of one Nginx worker process and listed its open files with lsof -p PID. Besides the usual system libraries and log files, there was a considerable number of fastcgi_temp/xxx files, so the problem was probably related to them.
- 5, Tracing the same process with strace -p PID showed a large number of write operations on file descriptors that matched the files listed by lsof.
- 6, The output of iostat 1 showed that the partition with heavy write IO was the one holding the fastcgi_temp directory.
- 7, We suspected that an external client was uploading many large files to php-fpm, so we used a small utility from ezhttp to watch real-time traffic and found that inbound traffic was indeed substantial.
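The steps above can be condensed into a short shell sketch. The commands are the ones named in the list; the helper functions find_d_state and count_temp_files, the 'nginx: worker' process title and the 'fastcgi_temp/' path pattern are illustrative assumptions, not part of the original investigation. Most of these commands need root.

```shell
#!/bin/sh
# Diagnostic sketch for the steps above. Helper names are hypothetical.

# Print PID, state and command of processes in uninterruptible sleep (state D).
# Expects `ps -eo pid,stat,comm` output on stdin.
find_d_state() {
    awk 'NR > 1 && $2 ~ /^D/ { print $1, $2, $3 }'
}

# Count open files that live under a fastcgi_temp directory.
# Expects `lsof -p PID` output on stdin.
count_temp_files() {
    grep -c 'fastcgi_temp/' || true
}

# Typical sequence (commented out so that sourcing this file runs nothing):
#   top -b -n 1 | head -n 5               # "wa" on the CPU line is iowait
#   iotop -o                              # show only processes doing I/O
#   ps -eo pid,stat,comm | find_d_state   # processes stuck in state D
#   PID=$(pgrep -f 'nginx: worker' | head -n 1)
#   lsof -p "$PID" | count_temp_files     # open fastcgi_temp files
#   strace -p "$PID" -e trace=write       # fd numbers being written to
#   iostat 1                              # per-device I/O, every second
```

The fd numbers that strace prints in its write() lines can be matched against the FD column of the lsof output to confirm which files are being written.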
Analysis results
Based on the analysis above, it is very likely that some program on this machine is uploading a large number of large files over HTTP. Because we are not familiar with the application logic, this is only speculation. To restore service as quickly as possible, the following workaround was applied.
Solution
Since the heavy IO pressure was clearly on fastcgi_temp and the underlying problem could not be fixed quickly, we pointed fastcgi_temp at /dev/shm, that is, mapped it into memory. After Nginx was restarted, the service returned to normal. The root cause still has to be resolved together with the developers.
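A minimal sketch of this workaround follows. The article only says that fastcgi_temp was pointed at /dev/shm; the subdirectory name /dev/shm/fastcgi_temp and the helper free_kb are assumptions made for illustration. The nginx directive involved is fastcgi_temp_path.

```shell
#!/bin/sh
# Sketch of the workaround: move FastCGI temp files onto tmpfs.
# The subdirectory name is an assumption; the article only says /dev/shm.

# Print the free space, in KiB, of the filesystem holding a directory.
# /dev/shm is RAM-backed, so temp files written there consume memory;
# check the headroom before pointing nginx at it.
free_kb() {
    df -Pk "${1:-/dev/shm}" | awk 'NR == 2 { print $4 }'
}

# Steps (commented out so that sourcing this file changes nothing):
#   free_kb /dev/shm
#   # then, in nginx.conf (http, server or location context):
#   #     fastcgi_temp_path /dev/shm/fastcgi_temp;
#   nginx -t            # validate the configuration
#   nginx -s reload     # or restart nginx, as was done here
```

Because /dev/shm is emptied on reboot and is limited by available memory, this is best treated as a stopgap until the source of the writes is found.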
Troubleshooting high write IO caused by Nginx