This is an optimization exercise from a while back. Looking at it again recently, I found that some of its general optimization methods can be reused. Any system that runs long enough will always run into problems and bottlenecks. Problems are nothing to fear, because we have our "tiger-taming" routine: locate the problem -> analyze it -> propose solutions -> put them into practice -> get feedback on the results -> summarize and optimize.
Problem description: The system is developed with PHP5 and Zend Framework. As data volume and traffic grew (by tens of times), the load on the back-end Apache servers became too high. During peak periods (for example around 10 o'clock every day, and especially on Fridays), the machines' CPU load would soar above 170. The excessive load slowed request processing accordingly, so the problem had to be solved urgently.
Problem analysis: After several consecutive days of observation and analysis, we saw that when CPU utilization reached 100%, system (kernel) CPU time accounted for a large share of it, while user CPU time was not very high. Meanwhile, the CPU load on the front-end HAProxy and Squid cache machines was very low, and the Memcached and Squid hit ratios generally reached about 60%.
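To check the same user-versus-system split on a Linux machine without extra tooling, one option is to sample the aggregate "cpu" line of /proc/stat twice and compare the deltas. The sketch below assumes the standard Linux field order (user, nice, system, idle, iowait, irq, softirq, steal); the function names are hypothetical, and this is not the tooling used in the original investigation.

```python
import time

# Standard Linux field order of the aggregate "cpu" line in /proc/stat.
FIELDS = ["user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal"]


def parse_cpu_line(line):
    """Turn the aggregate 'cpu' line of /proc/stat into a dict of jiffy counters."""
    parts = line.split()
    if not parts or parts[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    # zip() stops at FIELDS, so trailing guest counters are ignored.
    return dict(zip(FIELDS, (int(v) for v in parts[1:])))


def cpu_shares(before, after):
    """User/system/idle shares (0..1) of CPU time between two samples."""
    a, b = parse_cpu_line(before), parse_cpu_line(after)
    delta = {k: b[k] - a.get(k, 0) for k in b}
    total = sum(delta.values())
    if total <= 0:
        raise ValueError("the two samples must be taken at different times")
    return {
        "user": (delta["user"] + delta.get("nice", 0)) / total,
        "system": (delta["system"] + delta.get("irq", 0) + delta.get("softirq", 0)) / total,
        "idle": (delta["idle"] + delta.get("iowait", 0)) / total,
    }


def sample_live(interval=1.0, path="/proc/stat"):
    """Sample /proc/stat twice, interval seconds apart (Linux only)."""
    with open(path) as f:
        first = f.readline()  # the first line is the aggregate "cpu" line
    time.sleep(interval)
    with open(path) as f:
        second = f.readline()
    return cpu_shares(first, second)
```

A "system" share far above "user" during peak hours is the pattern described above, and it points toward syscalls rather than PHP code itself.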
Analyzing the back-end access logs showed that a large share of requests carried the User-Agent of a search engine crawler.
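A rough way to measure that crawler share is to pull the User-Agent out of each access-log line and match it against crawler signatures. A minimal sketch, assuming Apache's combined log format (where the User-Agent is the last quoted field); the signature pattern and the sample lines are made up for illustration.

```python
import re

# Illustrative crawler signatures; adjust to what actually appears in your logs.
CRAWLER_UA = re.compile(r"bot|spider|crawl|slurp", re.IGNORECASE)
# In the combined log format the User-Agent is the last quoted field on the line.
LAST_QUOTED = re.compile(r'"([^"]*)"\s*$')

# Made-up sample lines in combined log format.
SAMPLE_LINES = [
    '1.2.3.4 - - [01/Jan/2000:10:00:01 +0800] "GET /a HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '1.2.3.5 - - [01/Jan/2000:10:00:02 +0800] "GET /b HTTP/1.1" 200 734 "-" "Baiduspider+(+http://www.baidu.com/search/spider.htm)"',
    '1.2.3.6 - - [01/Jan/2000:10:00:03 +0800] "GET /c HTTP/1.1" 200 901 "http://example.com/" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)"',
    '1.2.3.7 - - [01/Jan/2000:10:00:04 +0800] "GET /d HTTP/1.1" 304 0 "-" "Mozilla/5.0 (Windows NT 6.1; rv:2.0) Gecko/20100101 Firefox/4.0"',
]


def user_agent(line):
    """Return the User-Agent field of a combined-format line, or '' if absent."""
    match = LAST_QUOTED.search(line)
    return match.group(1) if match else ""


def crawler_share(lines):
    """Fraction of non-empty log lines whose User-Agent looks like a crawler."""
    total = crawlers = 0
    for line in lines:
        if not line.strip():
            continue
        total += 1
        if CRAWLER_UA.search(user_agent(line)):
            crawlers += 1
    return crawlers / total if total else 0.0
```

Against a real log, something like crawler_share(open("access.log")) would do; the file name is hypothetical.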
We also enabled Xdebug in the Apache configuration, captured a set of profiling data for the main pages during off-peak hours, and analyzed it with KCachegrind (for how to configure Xdebug, a quick search on Soso will turn it up). We found that:
The performance data was unstable: measurements of the same request varied considerably.
The slow spots were scattered across the code rather than concentrated in one place.
Most Memcached accesses were slow (over 100 ms).
Based on this preliminary analysis, we made a series of adjustments to the existing code.
First, we considered whether we could raise the hit ratio of the front-end Squid cache, thereby reducing the number of requests that reach the back-end Apache servers.
A significant portion of requests came from crawlers, but Squid only cached requests that carried a language cookie, and crawler requests carried no cookies at all. So we decided to treat crawler requests as using the default language, zh_CN, and modified the HAProxy configuration to send every request whose User-Agent matches a common crawler to the Squid cache.
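The routing decision can be sketched in Python. In HAProxy itself this kind of rule is written as an acl on the User-Agent header (for example with hdr_sub(User-Agent)) combined with a use_backend rule; the backend names, the cookie name "language", and the crawler pattern below are hypothetical.

```python
import re

# Hypothetical values: crawler signatures, cookie name and backend names are illustrative.
CRAWLER_UA = re.compile(r"bot|spider|crawl|slurp", re.IGNORECASE)
DEFAULT_LANGUAGE = "zh_CN"
LANGUAGE_COOKIE = "language"


def route_request(user_agent, cookies):
    """Return (backend, language) for an incoming request."""
    language = cookies.get(LANGUAGE_COOKIE)
    if CRAWLER_UA.search(user_agent or ""):
        # Crawlers send no cookies: pin them to the default language so their
        # responses become cacheable, and send them to the Squid tier.
        return "squid_cache", language or DEFAULT_LANGUAGE
    # Other visitors keep the existing routing and their own language cookie.
    return "default_pool", language
```

Pinning the language is what makes the crawler responses shareable: without it, a cookie-less request has no cache key that Squid is configured to use.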
We also modified the PHP code to give some pages a longer cache lifetime.
After these two steps, the number of requests reaching Apache did drop somewhat, but it did nothing for the high CPU load, so we looked for another approach.
Second, the Xdebug profiling results showed that interaction with Memcached took a long time. We reasoned that if Memcached could respond to requests more quickly, each request would complete sooner, which would in turn reduce concurrency.
Analyzing the code, we found that the production memcached was using poll(), while the number of memcached connections stayed at around 1,000 during busy periods and memcached's CPU utilization was about 30%. The poll() approach is clearly inefficient with that many concurrent connections, so we recompiled memcached to process requests with epoll() instead. After the switch, memcached's CPU usage fell from about 30% to about 3%, a tenfold reduction!
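The difference between the two mechanisms can be seen with Python's select.epoll (Linux only): descriptors are registered once, and each wait returns only those with pending data, whereas poll() hands the whole descriptor list to the kernel and has it scanned on every call. This sketches the kernel interface, not memcached's own code; memcached gets its event loop from libevent, so in practice which mechanism it uses depends on how memcached and libevent were built. The connection count and the message are made up.

```python
import select
import socket


class EpollWatcher:
    """Register sockets once; each wait() returns only the fds that are readable."""

    def __init__(self):
        self._ep = select.epoll()  # the interest set lives in the kernel

    def watch(self, sock):
        self._ep.register(sock.fileno(), select.EPOLLIN)

    def wait(self, timeout=0.0):
        # Cost grows with the number of ready fds, not with the number watched.
        events = self._ep.poll(timeout)
        return sorted(fd for fd, mask in events if mask & select.EPOLLIN)

    def close(self):
        self._ep.close()


def demo(n_connections=200, n_active=3):
    """Open many idle connections, make a few readable, and check what epoll reports."""
    pairs = [socket.socketpair() for _ in range(n_connections)]
    watcher = EpollWatcher()
    try:
        for server_side, _ in pairs:
            watcher.watch(server_side)
        for _, client_side in pairs[:n_active]:
            client_side.sendall(b"get some_key\r\n")
        ready = watcher.wait(timeout=1.0)
        expected = sorted(server_side.fileno() for server_side, _ in pairs[:n_active])
        return len(ready), ready == expected
    finally:
        watcher.close()
        for server_side, client_side in pairs:
            server_side.close()
            client_side.close()
```

With 200 mostly idle connections, demo() reports exactly the 3 that received data; with around 1,000 connections per poll() call, avoiding the full scan is where the savings come from.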
In addition, the memcached hit ratio was not particularly high and the number of items was relatively large, so we decided to partition the cached content. We originally intended to partition it manually, but later found that the latest PHP memcache extension supports automatic partitioning based on the cache key, so new memcached instances can be added without modifying program code (only the configuration file needs to change :-)). So we upgraded the PHP memcache extension on every Apache server and added a new memcached instance in the configuration file, which completed the partitioning of the memcached content. The effect was significant: page load times dropped considerably compared with before the change.
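Key-based partitioning of this kind is commonly done with consistent hashing: each server occupies several points on a hash ring, and a key goes to the first server point at or after the key's own position, so adding a server only moves the keys that land on its new points. The sketch below is a generic illustration using MD5 and made-up server addresses; it is not the memcache extension's own code, whose behavior is selected through its memcache.hash_strategy setting (check the extension's documentation for the options in your version).

```python
import bisect
import hashlib


class ConsistentHashRing:
    """Map cache keys to servers; adding a server only moves the keys it takes over."""

    def __init__(self, servers=(), replicas=100):
        self.replicas = replicas   # virtual nodes per server, smooths the distribution
        self._points = []          # sorted positions on the 32-bit ring
        self._owner = {}           # ring position -> server
        for server in servers:
            self.add_server(server)

    @staticmethod
    def _hash(text):
        # First 4 bytes of MD5 as an unsigned 32-bit ring position.
        return int.from_bytes(hashlib.md5(text.encode("utf-8")).digest()[:4], "big")

    def add_server(self, server):
        for i in range(self.replicas):
            point = self._hash(f"{server}#{i}")
            if point not in self._owner:   # skip the (rare) collision
                self._owner[point] = server
                bisect.insort(self._points, point)

    def server_for(self, key):
        if not self._points:
            raise LookupError("no servers configured")
        # First server point at or after the key's position, wrapping around.
        idx = bisect.bisect_left(self._points, self._hash(key)) % len(self._points)
        return self._owner[self._points[idx]]


def moved_fraction(n_keys=5000):
    """Share of keys that change server when a fourth server joins three."""
    ring = ConsistentHashRing(["10.0.0.1:11211", "10.0.0.2:11211", "10.0.0.3:11211"])
    keys = [f"page:{i}" for i in range(n_keys)]
    before = {k: ring.server_for(k) for k in keys}
    ring.add_server("10.0.0.4:11211")
    moved = [k for k in keys if ring.server_for(k) != before[k]]
    all_to_new = all(ring.server_for(k) == "10.0.0.4:11211" for k in moved)
    return len(moved) / n_keys, all_to_new
```

Roughly a quarter of the keys move, and all of them to the new server. With a plain hash(key) % N scheme, going from 3 to 4 servers would instead remap about three quarters of the keys, emptying most of the cache at once.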
After these two adjustments, memcached was more efficient than before, but the Apache load was still high. Nothing for it: we had to find another way!
Digging deeper: as mentioned earlier, system CPU usage was very high, and the only way to find the cause was to go down into the kernel. So began our strace journey. To borrow Nike's slogan: Just strace it!
During peak hours we ran strace against the httpd processes, using the following invocations:
strace -p PID -c    (print a summary of syscall counts, errors, and time)
strace -p PID -o output.log    (write the trace to a file to study at leisure)
strace -p PID -e trace=file    (show only filesystem-related syscalls)
strace -p PID -e trace=lstat64,stat64,open,getcwd    (trace only these syscalls)
...
From the strace analysis above, we reached the following conclusions:
There really are a lot of lstat64, stat64, open, and similar syscalls.
These syscalls really do take up a lot of time: they eat more than 60% of it. Orz
The vast majority of these syscalls fail. It is a war of one defeat after another.
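To reproduce figures like these from a saved trace, a small parser can tally calls, failures (return value -1), and time per syscall. A sketch assuming the log was written with strace -o plus -T (which appends each call's duration in angle brackets); strace -c reports similar totals directly. The sample lines are invented.

```python
import re
from collections import defaultdict

# One completed syscall per line, as written by: strace -p PID -T -o output.log
LINE_RE = re.compile(
    r"^(?:\[pid\s+\d+\]\s+|\d+\s+)?"                # optional pid prefix (strace -f)
    r"(?P<name>\w+)\("                              # syscall name
    r".*\)\s+=\s+(?P<ret>0x[0-9a-fA-F]+|-?\d+|\?)"  # return value
    r"(?:\s+(?P<errno>E[A-Z0-9]+))?"                 # errno name on failure
    r".*?(?:<(?P<secs>\d+\.\d+)>)?\s*$"             # optional -T duration
)

# Invented sample in strace -T output format.
SAMPLE_LOG = """\
lstat64("/usr/share/pear/Zend/Db.php", 0xbfd4e2c0) = -1 ENOENT (No such file or directory) <0.000020>
lstat64("/usr/share/php/Zend/Db.php", 0xbfd4e2c0) = -1 ENOENT (No such file or directory) <0.000020>
open("/var/www/lib/Zend/Db.php", O_RDONLY) = 14 <0.000030>
read(14, "<?php\\n"..., 8192) = 8192 <0.000030>
"""


def summarize(lines):
    """Per-syscall call count, failure count (return value -1) and total seconds."""
    stats = defaultdict(lambda: {"calls": 0, "errors": 0, "seconds": 0.0})
    for line in lines:
        match = LINE_RE.match(line)
        if not match:
            continue  # signals, "<... resumed>" fragments, exit lines, etc.
        entry = stats[match.group("name")]
        entry["calls"] += 1
        if match.group("ret") == "-1":
            entry["errors"] += 1
        if match.group("secs"):
            entry["seconds"] += float(match.group("secs"))
    return dict(stats)


def time_share(stats, names):
    """Fraction of traced syscall time spent in the given syscalls."""
    total = sum(entry["seconds"] for entry in stats.values())
    if total == 0:
        return 0.0
    return sum(stats[n]["seconds"] for n in names if n in stats) / total
```

Running summarize() on a real output.log and calling time_share(stats, ["lstat64", "stat64", "open"]) gives the kind of share quoted above, and the errors column shows how many of those lookups came back ENOENT.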
With this data in hand, we had a direction: find out where these pointless system calls were coming from.
Analysis showed that they come from PHP loading classes: for a relative path, PHP searches the directories listed in include_path one after another for the corresponding file, trying each directory until the file is found. That is obviously inefficient. Is there a better way? Yes, there is, and more than one!
Pass an absolute path when calling require_once() (the folks who wrote Zend Framework apparently didn't grasp this from the start; more on that below).
Use __autoload() to load classes lazily, that is, only when a class is actually needed, instead of indiscriminately calling require_once on every class file that might possibly be used.
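The cost of a relative include can be modeled by walking include_path in order and counting misses; each miss stands in, roughly, for one failed stat/open in the strace output. This is a simplified model: real PHP also checks the calling script's directory and the current working directory before giving up, and has a realpath cache, all of which the sketch ignores. The directory layout is hypothetical.

```python
import os
import tempfile


def resolve_include(path, include_path):
    """Simplified model of PHP resolving an include; returns (found_path, failed_lookups).

    Absolute paths are checked directly; relative ones are tried against each
    include_path entry in order. Each miss stands in for a failed stat/open.
    """
    if os.path.isabs(path):
        return (path, 0) if os.path.isfile(path) else (None, 1)
    failed = 0
    for directory in include_path.split(os.pathsep):
        candidate = os.path.join(directory, path)
        if os.path.isfile(candidate):
            return candidate, failed
        failed += 1
    return None, failed


def demo(n_dirs=5):
    """Put Zend/Db.php only in the last include_path entry and count misses."""
    with tempfile.TemporaryDirectory() as root:
        dirs = [os.path.join(root, f"lib{i}") for i in range(n_dirs)]
        for d in dirs:
            os.makedirs(d)
        target = os.path.join(dirs[-1], "Zend", "Db.php")
        os.makedirs(os.path.dirname(target))
        open(target, "w").close()
        include_path = os.pathsep.join(dirs)
        _, relative_misses = resolve_include("Zend/Db.php", include_path)
        found, absolute_misses = resolve_include(target, include_path)
        return relative_misses, absolute_misses, found == target
```

With five include_path entries, the relative include fails four times before succeeding, while the absolute path succeeds on the first check. Multiplied by every class file loaded on every request, those misses add up quickly.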
With the problem found, there was still another one to deal with. Our own code already paid attention to using absolute paths, so the only improvement possible there was switching to lazy loading. Zend Framework, however, uses relative paths in a large number of its require_once calls, and that is what caused the problem: it is the root cause of the high CPU load this article is about.
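The difference between loading everything up front and loading on demand can be sketched with a tiny loader that consults a class -> absolute path map only the first time a class is requested, which is the role __autoload() plays in PHP. The class names, paths, and the load_file callable are hypothetical stand-ins; nothing is read from disk.

```python
class LazyClassLoader:
    """Load a class file only the first time the class is requested (autoload-style)."""

    def __init__(self, class_map, load_file):
        self._class_map = class_map   # class name -> absolute file path
        self._load_file = load_file   # callable doing the real read/compile work
        self._loaded = {}

    def get(self, class_name):
        if class_name not in self._loaded:
            # Absolute path from the map: no include_path search needed.
            self._loaded[class_name] = self._load_file(self._class_map[class_name])
        return self._loaded[class_name]


def eager_load(class_map, load_file):
    """The 'require_once everything that might be used' approach."""
    return {name: load_file(path) for name, path in class_map.items()}


def demo():
    """Count file loads for eager loading vs. lazily loading two classes."""
    class_map = {
        "Zend_Db": "/var/www/lib/Zend/Db.php",
        "Zend_Db_Table": "/var/www/lib/Zend/Db/Table.php",
        "Zend_Mail": "/var/www/lib/Zend/Mail.php",
        "Zend_Pdf": "/var/www/lib/Zend/Pdf.php",
    }
    eager_calls, lazy_calls = [], []
    eager_load(class_map, lambda p: eager_calls.append(p) or p)
    loader = LazyClassLoader(class_map, lambda p: lazy_calls.append(p) or p)
    loader.get("Zend_Db")
    loader.get("Zend_Db_Table")
    loader.get("Zend_Db")  # second request is served from the cache
    return len(eager_calls), len(lazy_calls)
```

Eager loading touches all four files; the lazy loader touches only the two classes the request actually used.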
OK, now that the problem was found, we solved it. We wrote a script that automatically generates a class -> file path mapping file covering every class in our own code and in Zend Framework, then commented out all the require_once calls in our code and in the Zend Framework library. After detailed testing, we deployed the change. The results were astonishing: the load dropped to below 3! Problem solved.
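The article doesn't show its script, so here is a hypothetical Python sketch of the steps described: scan .php files for class and interface declarations to build a class -> absolute path map, emit that map as a PHP file returning an array, and comment out require_once statements. The regexes are deliberately simple (one declaration or statement at the start of a line) and would need checking against a real code base before use.

```python
import os
import re
import tempfile

# Deliberately simple patterns: one declaration / statement at the start of a line.
DECLARATION = re.compile(
    r"^[ \t]*(?:abstract[ \t]+|final[ \t]+)?(?:class|interface)[ \t]+([A-Za-z_][A-Za-z0-9_]*)",
    re.MULTILINE,
)
REQUIRE_ONCE = re.compile(r"^([ \t]*)(require_once\b)", re.MULTILINE)


def build_class_map(root_dirs):
    """Map every declared class/interface name to the absolute path of its file."""
    class_map = {}
    for root in root_dirs:
        for dirpath, _, filenames in os.walk(root):
            for filename in sorted(filenames):
                if not filename.endswith(".php"):
                    continue
                path = os.path.abspath(os.path.join(dirpath, filename))
                with open(path, encoding="utf-8", errors="replace") as f:
                    for name in DECLARATION.findall(f.read()):
                        class_map.setdefault(name, path)  # first declaration wins
    return class_map


def php_quote(text):
    """Single-quoted PHP string literal (escapes backslashes and quotes)."""
    return "'" + text.replace("\\", "\\\\").replace("'", "\\'") + "'"


def render_php_map(class_map):
    """PHP source for a file that returns the map as an array."""
    entries = [f"    {php_quote(n)} => {php_quote(class_map[n])}," for n in sorted(class_map)]
    return "<?php\nreturn array(\n" + "\n".join(entries) + "\n);\n"


def comment_out_require_once(source):
    """Prefix line-leading require_once statements with '// '."""
    return REQUIRE_ONCE.sub(r"\1// \2", source)


def demo():
    """Build a map for a tiny fake library tree."""
    files = {
        os.path.join("Zend", "Db.php"): "<?php\nrequire_once 'Zend/Db/Exception.php';\nclass Zend_Db\n{\n}\n",
        os.path.join("Zend", "Db", "Exception.php"): "<?php\nclass Zend_Db_Exception extends Exception {}\n",
        os.path.join("Zend", "Db", "Table.php"): "<?php\nabstract class Zend_Db_Table_Abstract {}\n",
    }
    with tempfile.TemporaryDirectory() as root:
        for rel, text in files.items():
            path = os.path.join(root, rel)
            os.makedirs(os.path.dirname(path), exist_ok=True)
            with open(path, "w", encoding="utf-8") as f:
                f.write(text)
        class_map = build_class_map([root])
    return sorted(class_map), all(os.path.isabs(p) for p in class_map.values())
```

On the PHP side, the generated file would be required once by absolute path and consulted from an __autoload() (or spl_autoload_register()) callback. The files that make up the autoloader itself still need to be loaded explicitly, so they are worth excluding from the comment-out pass.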
Summary:
Anyone who writes code knows there will always be problems, and every problem has a cause (even if it hasn't been found yet). Solving problems at the root is the way to go. Which particular problem we solved here matters less; I hope you take away the approach and get good at using tools. OK, that's the case study.