This article mainly introduces the implementation of PHP through the Strace location fault causes of the method, has a certain reference value, now share to everyone, the need for friends can refer to
In this paper, the author introduces a method to locate the fault in PHP through Strace. Share to everyone for your reference, as follows:
As the saying goes: Not afraid of thieves to steal, afraid of thieves. In the face of failure, I also have a similar feeling: not afraid of failure, afraid you do not know the cause of the failure, but go out to find the door.
11 long holiday is not over, the server is now high load, nginx error log:
Connect () failed (110:connection timed out) while connecting to upstream
Connect () failed (111:connection refused) while connecting to upstream
It seems that upstream is out of the question, in this case upstream is PHP (version: 5.2.5). Unfortunately the monitoring is not perfect, I do not know exactly what is the problem, but under the forced to constantly restart PHP to alleviate the failure.
If it's a chore to manually restart the service every time, fortunately it can be done by cron setup every minute:
#/bin/bashload=$ (awk ' {print '} '/proc/loadavg) if [$ (echo $LOAD > | BC) = 1]; Then /etc/init.d/php-fpm Restartfi
Unfortunately, this is only a stopgap measure, and if you want to solve it, you have to find out what is the real reason for the failure.
Don't talk about it, it's Strace's turn to play, count the time-consuming situations of each system call:
Shell> strace-c-P $ (pgrep-n php-cgi)% time seconds Usecs/call calls errors syscall--------------------------- -----------------------------------30.53 0.023554 179 brk 14.71 0.011350, Bayi mlock 12 .0.009798 658 Recvfrom 8.96 0.006910 7 927 read 6.61 0.005097, 119 accept 5.57 0.004294 4 977 poll 3.13 0.002415 7 359 write 2.82 0.002177 7 311 sendto 2 .0.002033 2 1201 1 stat 2.27 0.001750 1 2312 gettimeofday 2.11 0.001626 1 1428 RT _sigaction 1.55 0.001199 2 730 fstat 1.29 0.000998-Connect 1.03 0.000792 4 17 8 Shutdown 1.00 0.000773 2 492 open 0.93 0.000720 1 711 close 0.49 0.000381 2 23 8 chdir 0.35 0.000271 3 Select 0.29 0.000224 1 357 setitimer 0.21 0.000159 2 Bayi Munlock 0.17 0.000133 2 getsockopt 0.14 0.000110 1 149 lseek 0.14 0.000106 1 121 mmap 0.11 0 .000086 1 121 Munmap 0.09 0.000072 0 238 rt_sigprocmask 0.08 0.000063 4 Lstat 0.07 0.000054 0 313 uname 0.00 0.000000 0 1 Access 0.00 0.000000 0 Socket 0.00 0.000000 0 101 setsockopt 0.00 0.000000 0 277 fcntl----------------------------------- ---------------------------100.00 0.077145 13066 118 Total
It looked "brk" very suspicious, it took 30% of the time, to be safe, to confirm separately:
Shell> Strace-t-E brk-p $ (pgrep-n php-cgi) brk (0x1f18000) = 0x1f18000 <0.024025>brk (0x1f58000) = 0x1f58000 < ; 0.015503>brk (0x1f98000) = 0x1f98000 <0.013037>brk (0x1fd8000) = 0x1fd8000 <0.000056>brk (0x2018000) = 0x2018000 <0.012635>
Note: There are two options related to Strace and operation time, respectively, "-r" and "-t", the difference being that "-r" represents relative time, and "-t" represents absolute time. Simple statistics can be used "-r", but it is important to note that in the multi-tasking background, the CPU may be switched out to do other things at any time, so the relative timing is not necessarily accurate, at this time the best use of "-t", at the end of the line can see the operation time, can be found indeed very slow.
Before we continue to locate the cause of the failure, we first check its meaning by "man brk":
BRK () Sets the end of the data segment to the value specified by End_data_segment, when that value is reasonable, the Syst EM does has enough memory and the process does not exceed it max data size (see Setrlimit (2)).
The simple point is that when the memory is not enough to apply for new memory (data segment), but why?
Shell> STRACE-T-P $ (pgrep-n php-cgi) 2>&1 | Grep-b brkstat ("/path/to/script.php", {...}) = 0 <0.000064>brk (0x1d9a000) = 0x1d9a000 <0.000067>brk ( 0x1dda000) = 0x1dda000 <0.001134>brk (0x1e1a000) = 0x1e1a000 <0.000065>brk (0x1e5a000) = 0x1e5a000 < 0.012396>BRK (0x1e9a000) = 0x1e9a000 <0.000092>
Through "grep" we can easily get the relevant context, repeatedly run several times, found that whenever some PHP script request, there will be a number of time-consuming "brk", and these PHP scripts have a common feature, is very large, even hundreds of K, why this large PHP script? In fact, the programmer in order to avoid database operations, the very large array of variables through "var_export" persisted to the PHP file, and then in the program through the "include" to get the corresponding variable, because the variable is too large, so PHP has to frequently execute " brk", unfortunately in this example environment, this operation is slow, resulting in processing requests for too long, in addition to the number of PHP process is limited, so on Nginx caused the request congestion, resulting in high load failure.
The following need to verify that the inference seems to be correct, first of all to find out what is involved in the problem script:
shell> find/path-name "*.php" | Xargs grep "script.php"
Just put them all disabled, and see if the server can be slow, perhaps people think this is too Lu Meng, but special circumstances must make a special decision, not like a woman indecisive, not long, the server load back to normal, and then count the time of the system call:
Shell> strace-c-P $ (pgrep-n php-cgi)% time seconds Usecs/call calls errors syscall--------------------------- -----------------------------------24.50 0.001521 138 2 recvfrom 16.11 0.001000 PT 7.86 0.000488 8 sendto 7.35 0.000456 1 rt_sigaction 6.73 0.000418 2 198 Poll 5.72 0.000355 1 285 stat 4.54 0.000282 0 573 gettimeofday 4.41 0.000274 7 42 Shutdown 4.40 0.000273 2 137 open 3.72 0.000231 1 197 fstat 2.93 0.000182 1 187 Close 2.56 0.000159 2 Setitimer 2.13 0.000132 1 244 Read 1.71 0.000106 4 30 Munmap 1.16 0.000072 1 chdir 1.13 0.000070 4 setsockopt 1.05 0.000065 1 100 Write 1.05 0.000065 1 lseek 0.95 0.000059 1 uname 0.00 0.000000 0 30 Mmap 0.00 0.000000 0 rt_sigprocmask 0.00 0.000000 0 3 2 Access 0.00 0.000000 0 9 Select 0.00 0.0000 0 Socket 0.00 0.000000 0 Connect 0.00 0.000000 0 getsockopt 0.00 0. 000000 0 fcntl 0.00 0.000000 0 9 Mlock 0.00 0.000000 0 9 Munlock----------- ---------------------------------------------------100.00 0.006208 3119 Total
Obviously, "brk" is gone and replaced by "recvfrom" and "accept", but these operations are time-consuming, so locating "brk" is the cause of the failure.
...
Hug failure, every failure is a experience. is the so-called: the day will be down to the great people also, will first bitter his mind, labor its bones, hungry its body skin, empty its body, the line to whisk its own, so tempted to endure sex, gain it can not.