Performance optimization for CentOS-based Web servers

Last Update:2014-05-24 Source: Internet

Author: User

Tags database load balancing


1. Website Optimization Case Based on Dynamic Content

1. Website running environment description
Hardware environment: one IBM x3850 server with a single dual-core 3.0 GHz Xeon CPU, 2 GB of memory, and three 72 GB SCSI disks.
Operating system: CentOS 5.4.
Website architecture: Web applications are based on the LAMP architecture, and all services are deployed on one server.
2. Performance Problems and Countermeasures
Symptom description
At around 10 A.M. and around 3 P.M., the website's pages cannot be opened. After the service is restarted, the website works normally for a while, but responses then slow down until pages cannot be opened at all.
Check Configuration
First, check the system resource status. When the service fails, the system load is extremely high and memory is almost exhausted. Then check the Apache configuration file httpd.conf: the "MaxClients" option is set to 2000, and Apache's KeepAlive feature is enabled.
Handling measures
Based on these checks, the preliminary diagnosis is that Apache's "MaxClients" option is misconfigured: the system has only 2 GB of memory, so with "MaxClients" set to 2000, too many request-handling processes consume the system memory. The "MaxClients" value in httpd.conf was therefore reduced from 2000 to 1500. The website still went down frequently, so the value was reduced again to 1024. After a period of observation, the interval between outages was longer and outages were less frequent than before, but the system load was still high and pages loaded extremely slowly.
3. First analysis optimization
Because the website stops responding when system resources are exhausted, we analyzed resource usage in depth with a combination of commands such as uptime, vmstat, top, and ps, and drew the following conclusions:
Conclusion description
The average system load is very high. The "load average" value reported by uptime is over 10, and CPU resources are heavily consumed. This is the main cause of slow responses and long periods with no response. The system resources are consumed mainly by user processes.
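As a rough illustration of how the uptime reading can be checked, the sketch below parses a load-average line and compares it with the number of CPU cores. The sample line is made up to resemble this case, and the "load above core count" threshold is a common rule of thumb assumed here, not something the article states.

```python
import re
from typing import Tuple

def parse_load_averages(uptime_output: str) -> Tuple[float, float, float]:
    """Extract the 1-, 5-, and 15-minute load averages from `uptime` output."""
    match = re.search(
        r"load averages?:\s*([\d.]+),?\s+([\d.]+),?\s+([\d.]+)", uptime_output
    )
    if match is None:
        raise ValueError("no load average found in uptime output")
    one, five, fifteen = (float(value) for value in match.groups())
    return (one, five, fifteen)

def is_overloaded(load_1min: float, cpu_cores: int) -> bool:
    """Rule of thumb (an assumption): load above the core count means work is queuing."""
    return load_1min > cpu_cores

# Hypothetical sample line; the numbers are invented to resemble the case above.
sample = " 10:02:11 up 35 days,  2:14,  3 users,  load average: 10.42, 9.87, 8.15"
one_minute, _, _ = parse_load_averages(sample)
print(one_minute, is_overloaded(one_minute, cpu_cores=2))  # single dual-core CPU
```

On the dual-core server described here, a load average above 10 means several runnable processes are waiting for each core, which matches the slow responses.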
Cause Analysis
The top command shows that each Apache child process consumes about 6 to 8 MB of memory, which is abnormal. In our experience, each Apache child process normally consumes about 1 MB. The Apache logs show that the homepage is accessed most often, which suggests a problem in the homepage code. A check of the homepage's PHP code found that the page is very large, contains many images, and is generated dynamically. Every visit to the homepage therefore triggers multiple database queries, and querying a database is CPU-intensive. The PHP code on the homepage also has no caching mechanism, so every user request repeats the queries, which puts a very high query load on the database.
Handling measures
Modify the homepage's PHP code: reduce the page size and add caching for frequently performed operations to minimize the program's database access.
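The caching idea can be sketched as follows. The site in this case is written in PHP; Python is used here only for illustration. The class name TTLCache, the 60-second lifetime, and the load_homepage_items function are all hypothetical stand-ins, not details from the original site.

```python
import time
from typing import Any, Callable, Dict, Tuple

class TTLCache:
    """Keep computed values for a fixed number of seconds."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self.ttl_seconds = ttl_seconds
        self.clock = clock
        self._entries: Dict[str, Tuple[float, Any]] = {}

    def get_or_compute(self, key: str, compute: Callable[[], Any]) -> Any:
        now = self.clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self.ttl_seconds:
            return entry[1]          # fresh entry: skip the database
        value = compute()            # missing or stale: run the query once
        self._entries[key] = (now, value)
        return value

query_count = 0

def load_homepage_items():
    """Hypothetical stand-in for the homepage's database queries."""
    global query_count
    query_count += 1
    return ["item-1", "item-2", "item-3"]

cache = TTLCache(ttl_seconds=60)
for _ in range(1000):                # 1000 simulated page views
    cache.get_or_compute("homepage", load_homepage_items)
print(query_count)                   # the query ran once, not 1000 times
```

The trade-off is staleness: visitors may see homepage data up to one lifetime old, so the lifetime should match how often the content actually changes.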
4. Second analysis optimization
After this simple optimization, outages became much less frequent, but the website was still occasionally unreachable during peak hours. Analysis of system resource usage showed that memory consumption was too high and that processes were waiting on disk I/O, which led to the following conclusion:
Cause Analysis
The excessive memory consumption is caused by the large number of processes serving user requests. Before the PHP code was optimized, each Apache child process consumed 6 to 8 MB of memory, so with Apache's maximum number of clients set to 1024, memory exhaustion was inevitable. Once physical memory is exhausted, the system falls back on virtual memory (swap); frequent swapping causes disk I/O waits, and CPU resources are eventually exhausted as well.
Handling measures
After the PHP code was optimized, each Apache child process consumes roughly 1 to 2 MB of memory. The "MaxClients" option in httpd.conf was therefore set to 600, and the "KeepAlive" feature was disabled. As a result, the number of Apache processes dropped sharply and now stays between about 500 and 600. Virtual memory is still used occasionally, but the Web service runs normally and outages are rare.
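The arithmetic behind these MaxClients choices can be written out as a small sketch. The per-process sizes and the 2 GB total come from the case above; the 512 MB reserve for MySQL and the operating system is an assumed figure for illustration only.

```python
def apache_memory_mb(max_clients: int, mb_per_process: float) -> float:
    """Worst-case memory if every allowed Apache child process is running."""
    return max_clients * mb_per_process

def max_clients_that_fit(memory_mb: float, mb_per_process: float, reserved_mb: float = 0) -> int:
    """Largest MaxClients whose worst case fits in memory after a reserve."""
    return int((memory_mb - reserved_mb) // mb_per_process)

TOTAL_MB = 2 * 1024  # the 2 GB server from the case

# Before the PHP fix: 6 to 8 MB per process with MaxClients at 1024.
print(apache_memory_mb(1024, 6), apache_memory_mb(1024, 8))   # 6144 8192
# After the fix: 1 to 2 MB per process with MaxClients at 600.
print(apache_memory_mb(600, 1), apache_memory_mb(600, 2))     # 600 1200
# Assumed 512 MB reserve for MySQL and the OS (not a figure from the article).
print(max_clients_that_fit(TOTAL_MB, 2, reserved_mb=512))     # 768
```

With 1024 clients at 6 to 8 MB each, the worst case is three to four times the physical memory, which explains the swapping. At 600 clients and 1 to 2 MB each, the worst case fits in 2 GB but leaves little room for MySQL, consistent with the occasional use of virtual memory.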
5. Third analysis optimization
After the previous two optimizations, the website basically runs normally, but it is still sometimes unreachable during peak hours. Checking system resources again shows that the cause is still CPU exhaustion, but for a different reason than in the previous two rounds:
Cause Analysis
The background logs show that the PHP programs access the database very frequently and that many SQL statements contain clauses such as where and order by. There are too many database queries, and most of them are complex queries that generally have to scan the whole table, while many tables have no indexes. This code puts a heavy load on MySQL, and because MySQL and Apache are deployed on the same server, it also explains the high CPU consumption on the server.
Handling measures
Optimize the SQL statements in the program: add matching conditions to the where clause to reduce full-table scans, create indexes on the fields used in the where and order by clauses, and add more caching to the program. After this optimization, the website basically runs normally with no downtime.
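The effect of indexing the where and order by fields can be observed in a query plan. The sketch below uses Python's built-in sqlite3 module rather than MySQL, purely so that it runs without a database server; the articles table, its columns, and the index name are hypothetical. The exact plan wording varies between SQLite versions, and MySQL's EXPLAIN output looks different, but the principle is the same.

```python
import sqlite3

def query_plan(conn: sqlite3.Connection, sql: str) -> str:
    """Return SQLite's plan for a statement as one line of text."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return "; ".join(row[-1] for row in rows)   # last column holds the description

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE articles (id INTEGER PRIMARY KEY, category TEXT, created_at TEXT, title TEXT)"
)
conn.executemany(
    "INSERT INTO articles (category, created_at, title) VALUES (?, ?, ?)",
    [(f"cat-{i % 20}", f"2014-05-{i % 28 + 1:02d}", f"title {i}") for i in range(2000)],
)

sql = "SELECT title FROM articles WHERE category = 'cat-3' ORDER BY created_at"

plan_before = query_plan(conn, sql)   # no index yet: the whole table is scanned
conn.execute(
    "CREATE INDEX idx_articles_category_created ON articles (category, created_at)"
)
plan_after = query_plan(conn, sql)    # the index on the where/order by fields is used

print("before:", plan_before)
print("after: ", plan_after)
```

Putting the where column first and the order by column second in one composite index lets the database both find the matching rows and return them already sorted.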

6. Fourth analysis optimization
After the preceding three optimizations, little room is left to optimize the program code, the operating system, or Apache. To avoid outages and keep the website stable, efficient, and fast, the next step is to optimize the website architecture: deploy the Web tier and the database separately by adding a dedicated database server for MySQL. As traffic grows, if the front end cannot keep up with requests, add more Web servers and balance the load among them to remove the front-end bottleneck. If the database still faces read and write pressure, add another MySQL server and split the MySQL deployment. The result is a high-performance, high-reliability website system.

2. Website Optimization Case Based on Dynamic and Static Content

1. Website running environment description
Hardware environment: two IBM x3850 servers, each with a single dual-core 3.0 GHz Xeon CPU, 4 GB of memory, and three 72 GB SCSI disks.
Operating system: CentOS 5.4.
Website architecture: the Web application is an e-commerce application based on the J2EE architecture. The Web application server is Tomcat, and the database is MySQL. The Web server and the database are deployed independently on two servers.

2. Performance Problems and Solutions
Symptom description
During peak traffic, the website's pages cannot be opened. After the Java service is restarted, the website runs normally for a while, but responses then slow down until pages cannot be opened at all.
Check Configuration
First, check the system resource status. When the service fails, the system load is extremely high and the CPU is fully loaded: Java processes occupy 99% of the system's CPU, while memory is barely used. The application server information shows that only one Tomcat instance is running the Java program. Then check the Tomcat configuration file server.xml: its parameters are all at their defaults, with no tuning.
Handling measures
The default parameters in server.xml must be adjusted to suit the application. For example, the values of Tomcat parameters such as connectionTimeout, maxKeepAliveRequests, and maxProcessors can be increased. After these changes, the interval between outages was longer and outages were less frequent than before. However, the Java process still consumed a lot of CPU and pages loaded extremely slowly.

3. First analysis optimization
Because the Java process consumes a lot of CPU, the next step is to find out what causes this heavy consumption. The lsof and netstat commands show a large number of Java requests in a waiting state. The Tomcat log contains many error messages reporting that database connections timed out, that the database could not be reached, and that static website resources could not be accessed. The following conclusions are drawn:
Cause Analysis
Tomcat is a Java container that uses a connection/thread model to process requests. It is mainly used to run dynamic applications such as JSP and servlets. Although it can also act as an HTTP server, it handles static resources far less efficiently than Apache or Nginx. From the symptoms above, we can tentatively conclude that Tomcat cannot respond to client requests in time, so the request queue keeps growing until Tomcat crashes completely. In a normal request, the server receives the request and passes it to Tomcat; Tomcat compiles the page, accesses the database, and performs other operations, then returns the result to the client; once the client has received it, Tomcat closes the connection and the request is complete. Under high concurrency, many requests are handed to Tomcat at once: before Tomcat finishes the first request, the second has arrived, then the third, and so on. The backlog keeps growing until Tomcat stops responding, the Java process hangs, and its resources cannot be released. This is the root cause.
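The queue build-up described above can be illustrated with a deliberately simple model: a fixed number of requests arrive each second and the server completes a fixed number each second. The rates below are invented for illustration; the article gives no request-rate figures, and real traffic is far less regular.

```python
def simulate_backlog(arrivals_per_sec: int, capacity_per_sec: int, seconds: int) -> list:
    """Return the number of waiting requests at the end of each second."""
    backlog = 0
    history = []
    for _ in range(seconds):
        backlog += arrivals_per_sec                 # new requests join the queue
        backlog -= min(backlog, capacity_per_sec)   # the server finishes what it can
        history.append(backlog)
    return history

# Arrivals below capacity: the queue stays empty.
print(simulate_backlog(arrivals_per_sec=80, capacity_per_sec=100, seconds=5))   # [0, 0, 0, 0, 0]
# Arrivals above capacity: the queue grows every second without bound.
print(simulate_backlog(arrivals_per_sec=120, capacity_per_sec=100, seconds=5))  # [20, 40, 60, 80, 100]
```

Once arrivals exceed capacity even slightly, the backlog never recovers on its own, which is why the fix has to either raise capacity or take work (such as static files) away from Tomcat.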
Handling measures
Optimizing Tomcat's performance requires restructuring the architecture. First, add Apache in front of Tomcat: Apache serves static resources and Tomcat handles dynamic requests, with the Mod_JK module handling communication between the Apache server and the Tomcat server. The advantage of Mod_JK is that it allows detailed resource-handling rules: based on which parts of the website are dynamic and which are static, all static resource files are served by Apache, and dynamic requests are forwarded through Mod_JK to Tomcat. Integrating Apache, JK, and Tomcat in this way can greatly improve the performance of Tomcat applications.
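The dynamic/static split can be pictured as a routing decision made for each request. The sketch below is not Mod_JK configuration; it only models the rule in Python, and the list of static file extensions is an assumption that a real site would adjust.

```python
import posixpath
from urllib.parse import urlsplit

# Assumed extension list for illustration; a real site would tune this.
STATIC_EXTENSIONS = {".html", ".htm", ".css", ".js", ".png", ".jpg", ".jpeg", ".gif", ".ico"}

def choose_backend(url: str) -> str:
    """Return "apache" for static files and "tomcat" for everything else."""
    path = urlsplit(url).path                          # drop any query string
    extension = posixpath.splitext(path)[1].lower()    # ".PNG" -> ".png"
    return "apache" if extension in STATIC_EXTENSIONS else "tomcat"

for url in ["/images/logo.PNG", "/css/site.css?v=3", "/cart/checkout.jsp", "/servlet/order"]:
    print(url, "->", choose_backend(url))
```

Defaulting unknown paths to Tomcat is the cautious choice here: a dynamic request sent to Apache by mistake would fail, while a static file sent to Tomcat is merely served more slowly.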

4. Second analysis optimization
After the previous optimization, the Java process's resource usage occasionally rises but drops again automatically after a while, which is normal; under high-concurrency access, its resource usage may rise and fall repeatedly. A comprehensive analysis of the Tomcat logs leads to the following conclusion:
For higher and more stable performance, a single Tomcat application server is sometimes not enough. In that case, run a Tomcat-based load balancing system together with the Mod_JK module: Apache schedules user requests at the front end, and multiple back-end Tomcat servers process the dynamic application. Distributing the load evenly across multiple Tomcat servers substantially improves the overall performance of the website.
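Even distribution across several Tomcat servers can be sketched with the simplest scheduling strategy, round robin. This is an illustrative model, not a description of how Mod_JK balances load internally, and the worker names are hypothetical.

```python
from collections import Counter
from itertools import cycle

class RoundRobinBalancer:
    """Hand out backends in a fixed rotation so each gets an equal share."""

    def __init__(self, backends):
        backends = list(backends)
        if not backends:
            raise ValueError("at least one backend is required")
        self._rotation = cycle(backends)

    def next_backend(self) -> str:
        return next(self._rotation)

# Hypothetical worker names; the article does not name its Tomcat instances.
balancer = RoundRobinBalancer(["tomcat-1", "tomcat-2", "tomcat-3"])
assignments = Counter(balancer.next_backend() for _ in range(9))
print(assignments)   # each of the three workers receives 3 of the 9 requests
```

Round robin assumes the servers are equally powerful and requests cost about the same; when that does not hold, weighted or load-aware strategies are usually a better fit.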

I. System resource characteristics of several typical applications

1.1 Web applications based on static content
The main characteristics of such applications are a large proportion of small files and frequent read operations. The Web server is generally Apache or Nginx, because both HTTP servers serve static resources quickly and efficiently. When there are many concurrent requests, a single Web server cannot support the volume of client access, and a load-balanced cluster of multiple Web servers is required. For even more efficient access, a cache server can be placed at the front end to cache static resource files in memory so that they are read directly from memory. Because reading from memory is much more efficient than reading from the hard disk, a cache server in front of the Web servers can greatly improve concurrent access performance. Common cache software includes Squid and Varnish.
Although a cache server improves access performance, it requires a large amount of memory. When system memory is sufficient, the random-read pressure on the disk is relieved. When memory is insufficient, the system uses virtual memory, which increases disk I/O, and as disk I/O increases, CPU overhead increases as well.
Under high-concurrency access, another problem is the network bandwidth bottleneck. If client traffic is heavy and bandwidth is insufficient, the network becomes congested and access is affected. Network bandwidth must therefore also be considered when building a Web-based network application.
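A back-of-the-envelope bandwidth check can make this concrete. All figures in the sketch below (500 clients, 50 KB responses, a 100 Mbps link, and an 80% usable share of the link) are assumptions chosen for illustration, and 1 KB is taken as 1000 bytes.

```python
def required_mbps(concurrent_clients: int, kb_per_response: float,
                  responses_per_client_per_sec: float) -> float:
    """Outbound bandwidth in megabits per second for a given request mix."""
    kilobytes_per_sec = concurrent_clients * kb_per_response * responses_per_client_per_sec
    return kilobytes_per_sec * 8 / 1000   # KB/s -> kilobits/s -> megabits/s

def link_is_sufficient(needed_mbps: float, link_mbps: float, headroom: float = 0.8) -> bool:
    """Treat the link as full at `headroom` of its nominal rate (an assumed margin)."""
    return needed_mbps <= link_mbps * headroom

needed = required_mbps(concurrent_clients=500, kb_per_response=50,
                       responses_per_client_per_sec=1)
print(needed, link_is_sufficient(needed, link_mbps=100))   # 200.0 False
```

In this made-up scenario the traffic needs about twice what the link can carry, so no amount of server tuning would help until bandwidth is added or responses are made smaller.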

1.2 Web applications focusing on Dynamic Content
One feature of such applications is frequent write operations, and dynamic programs such as Java, PHP, Perl, and CGI can consume CPU resources heavily. Because executing dynamic programs involves compiling code and reading from databases, and these operations consume CPU, a Web application based on dynamic programs should run on multiple high-performance CPUs, which greatly improves overall system performance.
When a dynamic-content Web application is accessed with high concurrency, the system runs a large number of processes, so load distribution needs attention. Too many processes consume a large amount of system memory. If memory runs short, virtual memory is used, and heavier use of virtual memory leads to frequent disk writes, which in turn consume CPU. A balance between hardware and software resources is therefore needed. On the hardware side, configure large memory and high-performance CPUs; on the software side, Memcached can be used to speed up access between programs and databases.

1.3 Database applications

The main characteristic of database applications is heavy consumption of memory and disk I/O, while CPU consumption is comparatively small. The most basic practice is therefore to give the database server a large amount of memory and a disk array with fast reads and writes, for example by choosing a RAID level such as RAID 5 or RAID 01 for the database server's disks. Separating the Web server from the DB server is another common practice for optimizing database applications. If client requests to the database are too heavy, a database load balancing solution, implemented in software or hardware, can also improve database access performance.
Tables that have grown too large can be split: a large table is divided into multiple smaller tables that are then associated through indexes. This avoids the performance problems of querying a large table, where a query that scans the entire table causes a sharp increase in disk reads and makes read operations wait. In addition, complex query statements with many where clauses and sorting operations such as order by and group by can easily cause CPU bottlenecks. Finally, large-volume or frequent data updates can cause a surge in disk writes and a write bottleneck, which the program code should also avoid.
In day-to-day use, another method that can significantly improve database server performance is read/write splitting. Reading from and writing to the same database at the same time is very inefficient. A good practice is to size for the read and write load, create two database servers with identical structures, and periodically copy the data from the server responsible for writes to the server responsible for reads, so that the two servers together improve overall system performance.
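On the application side, read/write splitting comes down to choosing a server for each statement. The sketch below is a minimal model of that choice; the server names are hypothetical, and classifying statements by their first keyword is a simplification that real routing layers refine. Because the read server is refreshed at intervals, reads routed this way may briefly return stale data.

```python
class ReadWriteRouter:
    """Send writes to the write server and spread reads across read servers."""

    WRITE_VERBS = {"INSERT", "UPDATE", "DELETE", "REPLACE", "CREATE", "ALTER", "DROP"}

    def __init__(self, primary: str, replicas):
        self.primary = primary
        self.replicas = list(replicas)
        self._next_replica = 0

    def route(self, sql: str) -> str:
        words = sql.split(None, 1)
        verb = words[0].upper() if words else ""
        if verb in self.WRITE_VERBS or not self.replicas:
            return self.primary                      # writes always go to the write server
        server = self.replicas[self._next_replica % len(self.replicas)]
        self._next_replica += 1                      # rotate through the read servers
        return server

# Hypothetical server names for the two-server setup described above.
router = ReadWriteRouter("db-write", ["db-read"])
print(router.route("SELECT * FROM products WHERE id = 7"))             # db-read
print(router.route("update products set stock = 3 where id = 7"))      # db-write
```

Code that must read its own write immediately (for example, showing an order right after placing it) should read from the write server instead.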
Caching can also improve database performance. A cache is a temporary in-memory container for database data or objects. It can greatly reduce database read operations by serving data from memory instead. For example, a data cache layer can be added between the Web server and the DB server to keep copies of frequently requested objects in system memory, so that programs can obtain data without accessing the database. Memcached, which is widely used today, is based on this principle.
1.4 Software download applications

Static resource download servers are characterized by high bandwidth consumption and high storage performance requirements. When download volume is very high, multiple servers at multiple sites can share the download load. For the HTTP server, we recommend Lighttpd rather than the traditional Apache, for the sake of high performance and fewer deployed servers. The reason is that Apache uses blocking I/O, so its performance is relatively poor and its concurrency is limited, whereas Lighttpd uses asynchronous I/O and handles concurrent downloads far better than Apache.

1.5 Streaming media service applications
Streaming media is mainly used in video conferencing, video on demand, distance education, and online live broadcasting. The main performance bottlenecks of such applications are network bandwidth and storage system bandwidth (mainly for read operations). With a massive number of users, the primary problems for streaming media applications are how to deliver high-definition, smooth video to users and how to save as much network bandwidth as possible.
To optimize a streaming media server, consider the storage policy, transmission policy, scheduling policy, proxy server cache policy, and the architecture of the streaming media server itself. For storage, optimize the video encoding format to save space and improve storage performance. For transmission, intelligent streaming technology can control the transmission rate to keep playback as smooth as possible. For scheduling, static and dynamic scheduling can be combined. For proxy servers, management policies such as segment caching and dynamic caching can be used. In the server architecture, memory pool and thread pool techniques can reduce the performance impact of memory consumption and excessive threads.
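The thread pool idea mentioned above can be sketched with Python's standard library: a fixed set of worker threads is reused for every task instead of creating one thread per request. The serve_segment function and the pool size of 4 are hypothetical stand-ins; a real streaming server would read and transmit media data in that function.

```python
from concurrent.futures import ThreadPoolExecutor
import threading

def serve_segment(segment_id: int) -> str:
    """Hypothetical stand-in for reading and sending one media segment."""
    return f"segment-{segment_id} served by {threading.current_thread().name}"

def serve_all(segment_ids, max_workers: int = 4):
    """Process every segment with at most `max_workers` reusable threads."""
    with ThreadPoolExecutor(max_workers=max_workers, thread_name_prefix="stream") as pool:
        return list(pool.map(serve_segment, segment_ids))   # results keep input order

results = serve_all(range(20))
workers_used = {line.split(" served by ")[1] for line in results}
print(len(results), "segments handled by", len(workers_used), "thread(s)")
```

Capping the number of threads bounds both memory use and context-switching overhead, at the cost of making excess requests wait in the pool's queue.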

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
