Optimization Practices for a PHP Application

Last Update:2014-05-15 Source: Internet

Author: User

Some common optimization methods can be reused. After a system has been running for a long time, problems and bottlenecks of one kind or another are bound to appear. Problems are nothing to fear, because we have the tools to "fight the tiger": locate the problem -> analyze the problem -> propose a solution -> put it into practice -> collect feedback on the results -> summarize and optimize again.
Problem Description: The system is developed with PHP5 + Zend Framework. As data size and access volume grew (to the tens of millions), the load on the backend apache servers became too high. During peak hours (for example, from the end of the working day through the evening, especially on Fridays), the machine's CPU load soared to more than 170. With the load that high, request processing slowed down accordingly, so the problem had to be solved urgently.
Problem Analysis: After several consecutive days of observation, we found that when CPU usage reached 100%, system (kernel) CPU time accounted for a large share while user CPU time was not very high. In addition, the front-end haproxy and squid cache had low CPU load, and the hit ratios of memcached and squid were generally around 60%.
Analysis of the backend access log showed that a considerable number of requests carried the User-Agent of a search crawler.
At the same time, xdebug was configured on apache, and a set of performance data was collected for the main pages during idle time. The data was analyzed with kcachegrind (search soso for how to configure xdebug), with these findings:
The performance data is not stable; measurements for identical requests differ considerably.
The slow points are scattered.
Most accesses to memcached are slow (over … ms).
Solution: Based on the preliminary analysis above, we made a series of step-by-step adjustments to the existing program.
The first consideration was whether the hit ratio of the front-end squid cache could be raised, reducing the number of requests that pass through squid to the backend apache.
A considerable number of requests come from crawlers. The squid cache only caches requests that carry a language cookie, but crawler requests carry no cookies. We therefore set the default language of all crawler requests to zh_CN and modified the haproxy configuration to forward requests whose User-Agent is a common crawler to the squid cache.
We also modified the PHP code to set a longer cache time for some pages.
After these two steps, the number of requests reaching apache did drop, but this did very little for the high CPU load, so we looked for another approach.
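The crawler rule described above was implemented in the haproxy and squid configuration, which the article does not show. As a rough illustration of the decision logic only, here is a minimal PHP sketch. The function names, the cookie name `language`, and the list of crawler substrings are all assumptions made for this example, and the code targets current PHP versions (7.1 or later) rather than the PHP5 of the original system.

```php
<?php
// Sketch only: the article made this change in haproxy/squid configuration.
// All names below (functions, cookie name, crawler list) are hypothetical.

/** Return true when the User-Agent looks like a common search crawler. */
function isCrawler(string $userAgent): bool
{
    // Illustrative substrings, not the list used by the article's authors.
    $needles = ['googlebot', 'bingbot', 'baiduspider', 'sogou', 'yandexbot'];
    $ua = strtolower($userAgent);
    foreach ($needles as $needle) {
        if (strpos($ua, $needle) !== false) {
            return true;
        }
    }
    return false;
}

/**
 * Pick the language that makes a request cacheable: the cookie value when
 * present, zh_CN for cookie-less crawlers, null otherwise (not cacheable).
 */
function cacheLanguage(array $cookies, string $userAgent): ?string
{
    if (isset($cookies['language']) && $cookies['language'] !== '') {
        return $cookies['language'];
    }
    return isCrawler($userAgent) ? 'zh_CN' : null;
}
```

With this rule, a crawler request that has no cookie is treated like a zh_CN visitor and can be served from the cache, while ordinary cookie-less requests keep their previous behavior.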
Second, the xdebug profiling results showed that interaction with memcached took a long time, so we asked whether memcached could be made to respond faster. Each request would then complete sooner, which lowers concurrency.
Code analysis showed that the online memcached used poll(), while the number of memcached connections stayed at around 1000 during busy hours and memcached CPU usage was around 30%. poll() is clearly inefficient at handling that many concurrent connections, so we recompiled memcached to use epoll(). After the switch, memcached CPU usage fell from about 30% to about 3%, a tenfold reduction.
In addition, the memcached hit ratio was not particularly high, and the number of items being evicted was relatively high, so we decided to partition the cache content. We originally intended to partition manually, but then found that the latest PHP memcache extension can partition automatically based on the cache key, and a new memcached instance can be added without modifying program code (only the configuration file needs to change :-)). So we upgraded the PHP memcache extension on every apache server and added a new memcached instance to the configuration file, which completed the partitioning of memcached content. After the change, page load time was much shorter than before.
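To make the idea of partitioning by cache key concrete, here is a simplified sketch of how a key can be mapped to one of several servers. It is not the algorithm used by the memcache extension, which performs the distribution internally; the function name `pickServer`, the server addresses, and the sample keys are placeholders chosen for this example.

```php
<?php
// Simplified illustration of key-based partitioning. The memcache extension
// does this internally with its own hashing; this is not that algorithm.

/** Map a cache key to one of the configured servers. */
function pickServer(string $key, array $servers): string
{
    if ($servers === []) {
        throw new InvalidArgumentException('at least one server is required');
    }
    // Mask the sign bit so the index is non-negative on 32-bit builds too.
    $index = (crc32($key) & 0x7fffffff) % count($servers);
    return $servers[$index];
}

// Placeholder addresses: the same key always lands on the same server.
$servers = ['10.0.0.1:11211', '10.0.0.2:11211'];
foreach (['user:42', 'page:/index', 'lang:zh_CN'] as $key) {
    echo $key, ' -> ', pickServer($key, $servers), PHP_EOL;
}
```

Because each key is stored on exactly one instance, adding an instance increases the total cache capacity, which is what reduces evictions. A plain modulo scheme like this one remaps most keys when the server list changes, so a production setup would normally rely on the extension's own distribution strategy.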
After these two steps memcached was more efficient than before, but the apache load was still high. No luck!
Further in-depth analysis: As mentioned above, system CPU usage was very high. To find the cause we had to go down to the kernel, and so our strace journey began. To borrow a Nike slogan: Just strace it!
During peak hours we ran strace against the httpd processes, using commands similar to the following:
strace -p PID -c  (print a summary)
strace -p PID -o output.log  (write the trace to a file and study it at leisure)
strace -p PID -e trace=file  (only show syscalls related to filesystem operations)
strace -p PID -e trace=lstat64,stat64,open,getcwd  (only trace these syscalls)
...
The strace analysis led to the following conclusions:
There are a great many syscalls such as lstat64, stat64, and open.
These syscalls take a lot of time! More than 60% of the time is consumed by them, orz
Most of these syscalls fail.
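Conclusions like these can be read off the table that strace -c prints. As a sketch of how that table might be summarized, the following PHP parses the summary and adds up the time share and failure rate of selected syscalls. The function names are invented for this example, and the sample table contains made-up numbers that only mimic the column layout; they are not the measurements from the article.

```php
<?php
/**
 * Parse the table printed by `strace -c` into rows keyed by syscall name.
 * Each row holds 'percent' (float), 'calls' (int) and 'errors' (int).
 */
function parseStraceSummary(string $text): array
{
    $rows = [];
    foreach (preg_split('/\R/', $text) as $line) {
        $fields = preg_split('/\s+/', trim($line));
        // Data rows start with a numeric "% time" value; skip everything else.
        if (count($fields) < 5 || !is_numeric($fields[0])) {
            continue;
        }
        $name = end($fields);
        if ($name === 'total') {
            continue;
        }
        // The errors column is left blank when a syscall never failed.
        $hasErrors = count($fields) === 6;
        $rows[$name] = [
            'percent' => (float) $fields[0],
            'calls'   => (int) $fields[3],
            'errors'  => $hasErrors ? (int) $fields[4] : 0,
        ];
    }
    return $rows;
}

/** Sum the time share and compute the failure rate of the given syscalls. */
function summarize(array $rows, array $names): array
{
    $percent = 0.0;
    $calls = 0;
    $errors = 0;
    foreach ($names as $name) {
        if (isset($rows[$name])) {
            $percent += $rows[$name]['percent'];
            $calls += $rows[$name]['calls'];
            $errors += $rows[$name]['errors'];
        }
    }
    $failureRate = $calls > 0 ? $errors / $calls : 0.0;
    return ['percent' => $percent, 'failureRate' => $failureRate];
}

// Made-up sample in the strace -c layout, for illustration only.
$sample = <<<TXT
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 40.00    0.004000           8       500       450 lstat64
 25.00    0.002500           5       500       400 open
 35.00    0.003500           7       500           read
------ ----------- ----------- --------- --------- ----------------
100.00    0.010000                  1500       850 total
TXT;

$result = summarize(parseStraceSummary($sample), ['lstat64', 'stat64', 'open']);
printf('%.1f%% of time, failure rate %.2f' . PHP_EOL, $result['percent'], $result['failureRate']);
```

A high time share combined with a high failure rate for stat-like calls is the pattern that points to path lookups that keep missing.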
With this data the direction was clear: find out where these pointless system calls come from.
Analysis showed that when PHP loads a class, it searches the series of directories defined in include_path for the file that holds the class, trying one directory after another until the file is found. This is obviously inefficient. Is there a better way? Yes, and there is more than one:
When calling require_once(), pass an absolute path as the parameter (the developers of Zend Framework did not follow this principle at first; it was changed later).
Use __autoload() to load classes lazily. That is, a class is loaded only when it is actually needed, instead of calling require_once up front for every class file that might be used.
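To show why relative paths are expensive, the directory-by-directory lookup can be mimicked in user code. The sketch below is a simplification of what PHP does internally (the real resolution also considers the calling script's directory and the working directory), and `probeIncludePath` and the file name `Zend/Db.php` are names chosen for this illustration.

```php
<?php
/**
 * List the candidate paths tried for a relative include, in include_path
 * order, stopping at the first one that exists.
 * Returns [list of probed paths, resolved path or null].
 */
function probeIncludePath(string $relativeFile, array $includeDirs): array
{
    $probed = [];
    foreach ($includeDirs as $dir) {
        $candidate = rtrim($dir, '/\\') . DIRECTORY_SEPARATOR . $relativeFile;
        $probed[] = $candidate;
        // Every miss here corresponds to a failed filesystem lookup.
        if (is_file($candidate)) {
            return [$probed, $candidate];
        }
    }
    return [$probed, null];
}

// Probe the directories of the current include_path for a sample file name.
$dirs = explode(PATH_SEPARATOR, get_include_path());
[$probed, $found] = probeIncludePath('Zend/Db.php', $dirs);
printf(
    '%d candidate(s) probed, result: %s' . PHP_EOL,
    count($probed),
    $found === null ? 'not found' : $found
);
```

With an absolute path there is exactly one candidate and no misses. With a relative path, a file in the last of N include_path directories costs N - 1 failed lookups for every require_once that names it, which matches the many failing stat and open calls seen in strace.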
The cause was found, but it still had to be fixed. Absolute paths are something to keep in mind while writing code; in existing code, the main thing that can be improved is switching to lazy loading. A large number of require_once calls in Zend Framework use relative paths, and that is the root cause of the problem, meaning the high CPU load discussed here.
OK, with the cause found, on to the fix. We wrote a script that automatically generates the Class -> File Path mapping, producing mapping files for all classes in our code and all classes in Zend Framework. We then commented out every require_once in our code and in the Zend Framework library, tested thoroughly, and went live. The results were startling: the load dropped to less than 3!! Problem solved.
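The article does not include the mapping script itself. The following is a minimal sketch of what such a generator and the matching lazy loader might look like. It assumes classes without namespaces (as in Zend Framework 1 style names such as Zend_Db), uses spl_autoload_register() in place of the __autoload() function mentioned above, since __autoload() was removed in PHP 8, and all function names and paths are hypothetical.

```php
<?php
/** Extract the class and interface names declared in a PHP source string. */
function declaredClasses(string $source): array
{
    $names = [];
    $tokens = token_get_all($source);
    $count = count($tokens);
    for ($i = 0; $i < $count; $i++) {
        $isDeclaration = is_array($tokens[$i])
            && ($tokens[$i][0] === T_CLASS || $tokens[$i][0] === T_INTERFACE);
        if (!$isDeclaration) {
            continue;
        }
        // The name is the first non-whitespace token, and it must be a
        // T_STRING. This skips "Foo::class" and anonymous classes.
        for ($j = $i + 1; $j < $count; $j++) {
            if (is_array($tokens[$j]) && $tokens[$j][0] === T_WHITESPACE) {
                continue;
            }
            if (is_array($tokens[$j]) && $tokens[$j][0] === T_STRING) {
                $names[] = $tokens[$j][1];
            }
            break;
        }
    }
    return $names;
}

/** Scan a directory tree and build a class name => absolute file path map. */
function buildClassMap(string $rootDir): array
{
    $map = [];
    $files = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($rootDir, FilesystemIterator::SKIP_DOTS)
    );
    foreach ($files as $file) {
        if (!$file->isFile() || $file->getExtension() !== 'php') {
            continue;
        }
        $source = file_get_contents($file->getPathname());
        foreach (declaredClasses($source) as $class) {
            $map[$class] = $file->getRealPath();
        }
    }
    ksort($map);
    return $map;
}

/** Register a lazy loader that requires a class file only on first use. */
function registerClassMap(array $map): void
{
    spl_autoload_register(function (string $class) use ($map): void {
        if (isset($map[$class])) {
            require $map[$class]; // absolute path, no include_path search
        }
    });
}

// Hypothetical usage: generate the map once at deploy time, load it at runtime.
// file_put_contents('classmap.php',
//     '<?php return ' . var_export(buildClassMap('/path/to/library'), true) . ';');
// registerClassMap(require 'classmap.php');
```

Generating the map ahead of time keeps the directory scan out of the request path: at runtime each class costs one array lookup and one require with an absolute path.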
Summary:
Everyone who writes code knows that problems appear wherever they can, and every problem has a cause (even if it has not been found yet). Solving a problem at its root is what counts. Which particular problem was solved here is not the important part; I hope you take away the approach and become good at using tools. OK, that's all.

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
