PHP solves the fault location by strace positioning and resolves

Source: Internet
Author: User
This time for everyone to bring PHP through the Strace location troubleshooting and solve the problem, PHP through the location of Strace location and solve the problem of the points of attention, the following is the actual case, to see.

As the saying goes: Not afraid of thieves to steal, afraid of thieves. In the face of failure, I also have a similar feeling: not afraid of failure, afraid you do not know the cause of the failure, but go out to find the door.

11 long holiday is not over, the server is now high load, nginx error log:

Connect () failed (110:connection timed out) while connecting to upstream
Connect () failed (111:connection refused) while connecting to upstream

It seems that upstream is out of the question, in this case upstream is PHP (version: 5.2.5). Unfortunately the monitoring is not perfect, I do not know exactly what is the problem, but under the forced to constantly restart PHP to alleviate the failure.

If it's a chore to manually restart the service every time, fortunately it can be done by cron setup every minute:

#/bin/bashload=$ (awk ' {print '} '/proc/loadavg) if [$ (echo $LOAD > | BC) = 1]; Then  /etc/init.d/php-fpm Restartfi

Unfortunately, this is only a stopgap measure, and if you want to solve it, you have to find out what is the real reason for the failure.

Don't talk about it, it's Strace's turn to play, count the time-consuming situations of each system call:

Shell> strace-c-P $ (pgrep-n php-cgi)% time seconds Usecs/call calls errors syscall--------------------------- -----------------------------------30.53 0.023554 179 brk 14.71 0.011350, Bayi mlock 12  .0.009798 658 Recvfrom 8.96 0.006910 7 927 read 6.61 0.005097, 119 accept 5.57 0.004294 4 977 poll 3.13 0.002415 7 359 write 2.82 0.002177 7 311 sendto 2 .0.002033 2 1201 1 stat 2.27 0.001750 1 2312 gettimeofday 2.11 0.001626 1 1428 RT _sigaction 1.55 0.001199 2 730 fstat 1.29 0.000998-Connect 1.03 0.000792 4 17 8 Shutdown 1.00 0.000773 2 492 open 0.93 0.000720 1 711 close 0.49 0.000381 2 23    8 chdir 0.35 0.000271 3 Select 0.29 0.000224 1 357 setitimer 0.21 0.000159 2 Bayi Munlock 0.17 0.000133 2 getsockopt 0.14 0.000110 1 149 lseek 0.14 0.000106 1 121 mmap 0.11 0 .000086 1 121 Munmap 0.09 0.000072 0 238 rt_sigprocmask 0.08 0.000063 4 Lstat 0.07 0.000054 0 313 uname 0.00 0.000000 0 1 Access 0.00 0.000000 0 Socket 0.00 0.000000 0 101 setsockopt 0.00 0.000000 0 277 fcntl----------------------------------- ---------------------------100.00 0.077145 13066 118 Total

It looked "brk" very suspicious, it took 30% of the time, to be safe, to confirm separately:

Shell> Strace-t-E brk-p $ (pgrep-n php-cgi) brk (0x1f18000) = 0x1f18000 <0.024025>brk (0x1f58000) = 0x1f58000 < ; 0.015503>brk (0x1f98000) = 0x1f98000 <0.013037>brk (0x1fd8000) = 0x1fd8000 <0.000056>brk (0x2018000) = 0x2018000 <0.012635>

Note: There are two options related to Strace and operation time, respectively, "-r" and "-t", the difference being that "-r" represents relative time, and "-t" represents absolute time. Simple statistics can be used "-r", but it is important to note that in the multi-tasking background, the CPU may be switched out to do other things at any time, so the relative timing is not necessarily accurate, at this time the best use of "-t", at the end of the line can see the operation time, can be found indeed very slow.

Before we continue to locate the cause of the failure, we first check its meaning by "man brk":

BRK () Sets the end of the data segment to the value specified by End_data_segment, when that value is reasonable, the Syst EM does has enough memory and the process does not exceed it max data size (see Setrlimit (2)).

The simple point is that when the memory is not enough to apply for new memory (data segment), but why?

Shell> STRACE-T-P $ (pgrep-n php-cgi) 2>&1 | Grep-b brkstat ("/path/to/script.php", {...}) = 0 <0.000064>brk (0x1d9a000) = 0x1d9a000 <0.000067>brk ( 0x1dda000) = 0x1dda000 <0.001134>brk (0x1e1a000) = 0x1e1a000 <0.000065>brk (0x1e5a000) = 0x1e5a000 < 0.012396>BRK (0x1e9a000) = 0x1e9a000 <0.000092>

Through "grep" we can easily get the relevant context, repeatedly run several times, found that whenever some PHP script request, there will be a number of time-consuming "brk", and these PHP scripts have a common feature, is very large, even hundreds of K, why this large PHP script? In fact, the programmer in order to avoid database operations, the very large array of variables through "var_export" persisted to the PHP file, and then in the program through the "include" to get the corresponding variable, because the variable is too large, so PHP has to frequently execute " brk", unfortunately in this example environment, this operation is slow, resulting in processing requests for too long, in addition to the number of PHP process is limited, so on Nginx caused the request congestion, resulting in high load failure.

The following need to verify that the inference seems to be correct, first of all to find out what is involved in the problem script:

shell> find/path-name "*.php" | Xargs grep "script.php"

Just put them all disabled, and see if the server can be slow, perhaps people think this is too Lu Meng, but special circumstances must make a special decision, not like a woman indecisive, not long, the server load back to normal, and then count the time of the system call:

Shell> strace-c-P $ (pgrep-n php-cgi)% time seconds Usecs/call calls errors syscall--------------------------- -----------------------------------24.50 0.001521 138 2 recvfrom 16.11 0.001000      PT 7.86 0.000488 8 sendto 7.35 0.000456 1 rt_sigaction 6.73 0.000418 2 198      Poll 5.72 0.000355 1 285 stat 4.54 0.000282 0 573 gettimeofday 4.41 0.000274 7 42      Shutdown 4.40 0.000273 2 137 open 3.72 0.000231 1 197 fstat 2.93 0.000182 1 187      Close 2.56 0.000159 2 Setitimer 2.13 0.000132 1 244 Read 1.71 0.000106 4 30      Munmap 1.16 0.000072 1 chdir 1.13 0.000070 4 setsockopt 1.05 0.000065 1 100      Write 1.05 0.000065 1 lseek 0.95 0.000059 1 uname 0.00 0.000000 0 30     Mmap 0.00 0.000000 0 rt_sigprocmask 0.00 0.000000 0 3 2 Access 0.00 0.000000 0 9 Select 0.00 0.0000 0 Socket 0.00 0.000000 0 Connect 0.00 0.000000 0 getsockopt 0.00 0. 000000 0 fcntl 0.00 0.000000 0 9 Mlock 0.00 0.000000 0 9 Munlock----------- ---------------------------------------------------100.00 0.006208 3119 Total

Obviously, "brk" is gone and replaced by "recvfrom" and "accept", but these operations are time-consuming, so locating "brk" is the cause of the failure.

Hug failure, every failure is a experience. is the so-called: the day will be down to the great people also, will first bitter his mind, labor its bones, hungry its body skin, empty its body, the line to whisk its own, so tempted to endure sex, gain it can not.

Believe that you have read the case of this article you have mastered the method, more exciting please pay attention to the PHP Chinese network other related articles!

Recommended reading:

PHP using file_get_contents to send HTTP request steps in detail

Php+mysql handling High concurrent lock transaction steps

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.