Performance Analysis and Optimization of CentOS servers

Last Update:2014-07-24 Source: Internet

Author: User

For a Linux system administrator, one of the most important tasks is to tune the system configuration so that applications run in their optimal state. However, hardware, software, and network environments are complex and changeable, which makes system optimization unusually complicated. Locating the performance problem is the central difficulty in performance optimization. Starting from the system itself, this article focuses on performance problems caused by improper software and hardware configuration, and gives general methods and procedures for diagnosing system faults and optimizing performance.

I. Purpose of System Performance Analysis

1.1 Identify system performance bottlenecks
System performance refers to the effectiveness, stability, and response speed with which the operating system completes tasks. Linux administrators often encounter problems such as system instability and slow response. For example, if a Web service runs on Linux, pages may often fail to open or open slowly. Faced with this, some people blame Linux itself, but these are only surface symptoms. How well the operating system completes a task depends on the system settings, network topology, routing devices, routing policies, access devices, physical lines, and more; a problem in any one of these affects the performance of the whole system. Therefore, when a Linux application misbehaves, check the application, operating system, server hardware, and network environment together, locate where the problem lies, and then fix it in a focused way.

1.2 Provide performance optimization solutions
Finding performance bottlenecks is a complex and time-consuming process that requires looking at applications, the operating system, server hardware, and the network environment. The application and the operating system have the greatest impact on performance, because problems in these two areas are well hidden and hard to detect, whereas hardware and network problems can usually be located quickly. Once the bottleneck has been found, solving it is comparatively fast and easy. If the problem is in the hardware, replace the hardware when it is a physical fault, or upgrade it when its performance no longer meets requirements. If it is a network problem, such as insufficient bandwidth or an unstable link, optimize and upgrade the network. If it is an application problem, modify or optimize the software. If it is an operating system configuration problem, adjust the system parameters and configuration.
In short, once the performance bottleneck has been found, a targeted, well-founded optimization plan can be drawn up.

1.3 Balance the use of system hardware and software resources
Linux is an open-source product and a platform on which open-source software is put into practice. It is supported by a great deal of open-source software; common examples include Apache, Tomcat, MySQL, and PHP. The core idea of open-source software is freedom and openness, and Linux, as an open-source platform, aims to achieve optimal application performance at the lowest cost with the support of this software. However, system performance problems are not isolated: removing one bottleneck may expose another. The ultimate goal of performance optimization is therefore to keep the use of the various system resources reasonable and in balance, within certain limits; the system runs best precisely when its resources are in balance. Excessive use of any one resource upsets this balance and leads to slow response or excessive load. For example, excessive CPU use leaves a large number of processes waiting, which slows application response, and a sharp increase in the number of processes increases memory use. When physical memory is exhausted the system falls back on virtual memory, which in turn increases disk I/O and CPU overhead. Optimizing system performance thus means finding a balance among hardware, the operating system, and application software.

II. Personnel involved in system performance analysis

2.1 Linux System Administrator
System administrators carry important responsibilities during performance optimization. First, they must know the current running state of the operating system, such as system load, memory status, process status, and CPU load; this information is the basis for detecting and judging system performance. Second, they must know the system hardware, such as disk I/O, CPU model, memory size, and network card bandwidth, and use that information to evaluate how system resources are being used. Third, they need to know how applications use system resources and, going further, how efficiently those applications run, including problems such as program bugs and memory overflow. Monitoring system resources reveals whether an application is behaving abnormally; if it is, the problem should be reported to the developers at once so that the program can be improved or upgraded.
Performance optimization is a complex and tedious process. System administrators can optimize server performance only when they understand the hardware, the network, the operating system configuration, and the applications, which requires solid theoretical knowledge, rich practical experience, and careful problem analysis.
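As a rough illustration of gathering the memory status mentioned above, the following Python sketch parses text in the format of /proc/meminfo and derives how much swap is in use. The function names and the sample figures are hypothetical and chosen only for illustration; on a real server the text would come from reading /proc/meminfo.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of {field: kilobytes}."""
    info = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines that are not "Name: value" pairs
        fields = rest.split()
        if fields and fields[0].isdigit():
            info[name.strip()] = int(fields[0])
    return info


def swap_used_kb(info):
    """Swap currently in use; steady growth suggests memory pressure."""
    return info["SwapTotal"] - info["SwapFree"]


# Hypothetical sample; on a real server use open("/proc/meminfo").read()
SAMPLE = """MemTotal:        2058524 kB
MemFree:           61320 kB
SwapTotal:       4194296 kB
SwapFree:        3145720 kB
"""

info = parse_meminfo(SAMPLE)
print("free memory (kB):", info["MemFree"])
print("swap in use (kB):", swap_used_kb(info))
```

Sampling these values at intervals, rather than once, makes it easier to see whether an application is steadily consuming memory.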

2.2 System architecture designer
The second role involved in performance optimization is the application architecture designer. If, after a comprehensive assessment, the system administrator finds that the application's execution efficiency is affecting performance, the architecture designer must step in promptly to understand more deeply how the program is running. First, the architecture designer should trace the program's execution efficiency and, if there is a problem, find out where it occurs. Second, if the architecture itself is at fault, it must be optimized or improved, or a better application architecture designed.

2.3 Software developers
The last role involved in performance optimization is the program developer. Once the system administrator or architecture designer has found a bottleneck in the program or its structure, developers need to step in promptly and make the corresponding changes. Changes should take the program's execution efficiency as the benchmark, improve its logic, and optimize the code in a targeted way. For example, suppose the system administrator notices that an SQL statement is consuming a lot of system resources, captures the statement, and finds that it executes very inefficiently because of the code the developers wrote. The administrator then feeds this back to the developers, who can optimize that SQL statement specifically and thereby optimize the program code.
From the above, the general process of system performance optimization is as follows. The system administrator first checks the overall state of the system and makes a comprehensive judgment covering system hardware, network equipment, operating system configuration, application architecture, and program code. Problems with hardware, network equipment, or operating system configuration are resolved by the system administrator according to the situation. Problems with the program structure are passed to the architecture designer. Problems with code execution are handed to the developers for code optimization. This completes one round of system performance optimization.

III. Factors affecting Linux performance

3.1 System hardware resources
1. CPU
The CPU is the foundation of stable operating system operation, and its speed and performance largely determine the overall performance of the system. In general, the more CPUs and the higher the clock speed, the better the server performs, but this is not the whole story.
At present, most CPUs can run only one thread at a time, whereas hyper-threading processors can run multiple threads at a time, so hyper-threading can be used to improve system performance. On Linux, hyper-threading is supported only when an SMP kernel is running. However, the more CPUs are installed, the smaller the gain from hyper-threading. In addition, the Linux kernel treats each core of a multi-core processor as a separate CPU; for example, two 4-core CPUs appear to Linux as eight single-core CPUs. From a performance perspective, however, the two are not equivalent: according to test results, the overall performance of two 4-core CPUs is 25% to 30% lower than that of eight single-core CPUs.
Applications likely to hit CPU bottlenecks include mail servers and dynamic Web servers. For such applications, CPU configuration and performance should be the primary consideration.
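To see how Linux counts cores and hyper-threads as separate CPUs, the following Python sketch tallies sockets, physical cores, and logical CPUs from text in the format of /proc/cpuinfo. It assumes the x86 field names "processor", "physical id", and "core id"; the function names and the generated sample are hypothetical.

```python
def cpu_topology(cpuinfo_text):
    """Return (sockets, physical_cores, logical_cpus) from /proc/cpuinfo text."""
    logical = 0
    sockets = set()
    cores = set()
    # Each logical CPU is described by one block; blocks are separated by blank lines.
    for block in cpuinfo_text.strip().split("\n\n"):
        fields = {}
        for line in block.splitlines():
            key, sep, value = line.partition(":")
            if sep:
                fields[key.strip()] = value.strip()
        if "processor" not in fields:
            continue
        logical += 1
        socket = fields.get("physical id", "0")
        sockets.add(socket)
        cores.add((socket, fields.get("core id", fields["processor"])))
    return len(sockets), len(cores), logical


def make_entry(processor, physical_id, core_id):
    """Build one hypothetical cpuinfo block for the sample below."""
    return (f"processor\t: {processor}\n"
            f"physical id\t: {physical_id}\n"
            f"core id\t\t: {core_id}\n")


# Hypothetical sample: one socket, two cores, hyper-threading enabled.
SAMPLE = "\n".join(make_entry(p, 0, p % 2) for p in range(4))

sockets, cores, logical = cpu_topology(SAMPLE)
print(sockets, cores, logical)
print("hyper-threading visible:", logical > cores)
```

When the logical CPU count exceeds the physical core count, the extra CPUs the kernel reports are hyper-threads rather than additional cores.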
2. Memory
Memory size is another important factor affecting Linux performance. If there is too little memory, processes are blocked and applications become slow or even unresponsive; if there is too much, resources are wasted. Linux uses both physical memory and virtual memory. Virtual memory can relieve a shortage of physical memory, but application performance drops significantly when too much of it is in use, so physical memory must be large enough to guarantee high application performance. Too much physical memory, on the other hand, is wasted: for example, on a Linux system running on a 32-bit processor, physical memory beyond 8 GB goes unused. To use more memory, install a 64-bit operating system and enable the Linux large-memory kernel.
Because of the processor's limited addressing range, a single application process on a 32-bit Linux system can use only 2 GB of memory, so even if the system has more memory, an application cannot claim it all for itself. The solution is to use a 64-bit processor and install a 64-bit operating system, under which the memory requirements of almost any application can be met.
Applications likely to hit memory bottlenecks include print servers, database servers, and static Web servers. For such applications, memory size should be the primary consideration.
3. Disk I/O performance
Disk I/O performance directly affects application performance. In an application with frequent reads and writes, inadequate disk I/O performance causes the application to stall. Fortunately, today's disks use many techniques to improve I/O performance, the most common being RAID.
RAID stands for Redundant Array of Independent Disks. It combines multiple independent disks (physical disks) in different ways into a disk group (a logical disk) that offers higher I/O performance and data redundancy than a single disk.
A disk group built with RAID behaves like one large disk: you can partition it, format it, and create file systems on it exactly as with a single physical disk. The only differences are that the I/O performance of a RAID group is much higher than that of a single disk and that data safety is greatly improved.
Depending on how the disks are combined, RAID is divided into RAID 0, RAID 1, RAID 2, RAID 3, RAID 4, RAID 5, RAID 6, RAID 7, RAID 0+1, and RAID 10. The commonly used levels are RAID 0, RAID 1, RAID 5, and RAID 0+1, introduced briefly below.
RAID 0: Combines multiple disks into one larger disk group to improve disk performance and throughput. It is inexpensive and requires at least two disks, but provides no fault tolerance or data recovery, so it is suitable only where data safety requirements are low.
RAID 1: Disk mirroring. Data on one disk is mirrored to another, so data can be recovered to the greatest possible extent and redundancy is high. Disk utilization is only 50%, however, which makes it the most expensive level; it is mostly used to store important data.
RAID 5: Uses disk striping with parity, which improves reliability. Read efficiency is very high and write efficiency is average. It requires at least three disks and tolerates the failure of one disk without affecting data availability.
RAID 0+1: Combines RAID 0 and RAID 1 and requires at least four disks. Data is striped across multiple disks and each disk has its own mirror, which provides full redundancy, tolerates the failure of one disk without affecting data availability, and delivers fast reads and writes.
Once you understand the characteristics of each RAID level, you can choose the level that suits the application so that it gets the best possible disk performance.
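The trade-offs among the four common levels can be made concrete with a little arithmetic. The following Python sketch is a simplified model, assuming equal-sized disks and a two-disk mirror for RAID 1; it returns the usable capacity and the number of disk failures each level is guaranteed to survive. The function name and the 72 GB disk size are illustrative only.

```python
def raid_summary(level, disks, disk_gb):
    """Return (usable_gb, guaranteed_disk_failures_tolerated).

    Simplified model: all disks are the same size, and RAID 1 is a
    two-disk mirror as in the description above.
    """
    if level == "0":
        if disks < 2:
            raise ValueError("RAID 0 needs at least two disks")
        return disks * disk_gb, 0
    if level == "1":
        if disks != 2:
            raise ValueError("this sketch models RAID 1 as a two-disk mirror")
        return disk_gb, 1  # 50% utilization
    if level == "5":
        if disks < 3:
            raise ValueError("RAID 5 needs at least three disks")
        return (disks - 1) * disk_gb, 1  # one disk's worth of parity
    if level == "0+1":
        if disks < 4 or disks % 2:
            raise ValueError("RAID 0+1 needs an even number of disks, at least four")
        return (disks // 2) * disk_gb, 1
    raise ValueError(f"unsupported RAID level: {level}")


for level, disks in [("0", 2), ("1", 2), ("5", 3), ("0+1", 4)]:
    usable, tolerated = raid_summary(level, disks, 72)
    print(f"RAID {level}: {disks} x 72 GB -> {usable} GB usable, "
          f"survives {tolerated} disk failure(s)")
```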
4. Network bandwidth
Linux applications are generally network-based, so network bandwidth is also an important factor affecting performance. A slow or unstable network congests access to network applications, whereas stable, high-speed bandwidth lets applications run smoothly over the network. Fortunately, today's networks are generally gigabit or fiber-optic, so bandwidth has a steadily decreasing impact on application performance.

3.2 Operating system resources

Performance optimization at the operating system level also has many aspects. It can be approached through system installation, kernel parameters, network parameters, and the file system.
1. System installation optimization
System optimization can begin when the operating system is installed. Disk layout and swap allocation at install time directly affect system performance. Disk layout can follow the needs of the application: for applications that write frequently but have low data safety requirements, build the disks into RAID 0; for applications that need data safety but have no special read/write requirements, use RAID 1; for applications that have high read requirements, no special write requirements, and need data safety, choose RAID 5; for applications with high read/write requirements and high data safety requirements, choose RAID 0+1. Setting the RAID level according to the application's needs optimizes the system at the disk level.
As memory prices have fallen and capacities have grown, the old rule that swap must be twice the physical memory no longer applies, but swap still cannot be ignored. As a rule of thumb, if physical memory is smaller than 4 GB, set swap to twice the memory size; if physical memory is between 4 GB and 16 GB, set swap equal to or slightly smaller than physical memory; if physical memory is larger than 16 GB, swap could in principle be set to 0, but this is not recommended because a swap area of some size still serves a purpose.
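The swap rule of thumb above can be written as a small Python function. This is only a sketch under stated assumptions: the text does not say which bracket exactly 4 GB and 16 GB fall into, so they are placed in the middle bracket here, and the 2 GB kept on large-memory machines is a hypothetical placeholder, since the text recommends keeping some swap without naming a size.

```python
def suggest_swap_gb(ram_gb, large_ram_swap_gb=2.0):
    """Suggest a swap size in GB from physical memory size in GB.

    large_ram_swap_gb is an illustrative assumption: the rule of thumb
    only says that some swap should be kept above 16 GB of RAM.
    """
    if ram_gb <= 0:
        raise ValueError("ram_gb must be positive")
    if ram_gb < 4:
        return 2 * ram_gb          # small memory: twice the RAM
    if ram_gb <= 16:
        return ram_gb              # medium memory: equal to RAM (or slightly less)
    return large_ram_swap_gb       # large memory: keep a modest swap area


for ram in (2, 8, 32):
    print(f"{ram} GB RAM -> {suggest_swap_gb(ram)} GB swap")
```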
2. Kernel parameter optimization
Optimization does not end once the system is installed; the next step is to tune the kernel parameters. Kernel parameter tuning should take into account the applications deployed on the system. For example, if an Oracle database is deployed, tune the shared memory segment parameters (kernel.shmmax, kernel.shmmni, kernel.shmall), the semaphore parameter (kernel.sem), the file handle limit (fs.file-max), and others. If a Web application is deployed, tune the network parameters to suit it, for example net.ipv4.ip_local_port_range, net.ipv4.tcp_tw_reuse, and net.core.somaxconn.
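Before changing any of these parameters it helps to record their current values. Each dotted parameter name corresponds to a file under /proc/sys, with the dots replaced by directory separators. The following Python sketch reads the parameters named above that way; the helper names are hypothetical, and a parameter that is absent on a given kernel is reported as None.

```python
from pathlib import Path

PROC_SYS = Path("/proc/sys")


def sysctl_path(name, root=PROC_SYS):
    """Map a dotted parameter name to its file under /proc/sys."""
    return root.joinpath(*name.split("."))


def read_sysctl(name, root=PROC_SYS):
    """Return the current value as a string, or None if it cannot be read."""
    try:
        return sysctl_path(name, root).read_text().strip()
    except OSError:
        return None


PARAMETERS = (
    "kernel.shmmax", "kernel.shmmni", "kernel.shmall", "kernel.sem",
    "fs.file-max", "net.ipv4.ip_local_port_range",
    "net.ipv4.tcp_tw_reuse", "net.core.somaxconn",
)

for name in PARAMETERS:
    print(f"{name} = {read_sysctl(name)}")
```

Suitable values depend on the workload and the amount of memory installed, so none are suggested here.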
3. File system optimization
File system optimization is another focus of system resource optimization. The file systems available on Linux include ext2, ext3, XFS, and ReiserFS; choose among them according to the application.
The standard Linux file systems began with VFS, ext, and ext2. ext2 can be regarded as the standard Linux file system, and ext3 adds journaling on top of ext2. From the earliest of these through ext3 the design philosophy changed little: all follow the early UNIX family design based on superblocks and inodes.
XFS is an advanced journaling file system developed by SGI and later ported to Linux. By distributing disk requests, locating data, and maintaining cache consistency, it provides low-latency, high-bandwidth access to file system data. XFS is therefore highly scalable and robust, with excellent journaling and fast write performance.
ReiserFS is a high-performance journaling file system developed under the leadership of Hans Reiser. It manages data, including file data, file names, and journal support, through a fully balanced tree structure. Compared with ext2/ext3, it greatly improves access performance and safety. Its advantages include efficient use of disk space, an advanced journal management mechanism, a distinctive search method, and support for massive disk storage.

3.3 Application software resources
Application optimization is the core of the entire optimization effort. If an application has bugs, the application system as a whole will still perform poorly even when everything else is in its optimal state. Application optimization is therefore the top priority in performance optimization, and it places higher demands on architecture designers and developers.

I. Resource usage characteristics of several typical applications

1.1 Web applications based on static content
These applications are characterized mainly by a large number of small files and frequent read operations. The Web server is generally Apache or Nginx, because both handle static resources quickly and efficiently. Under heavy concurrency a single Web server cannot support a large number of client requests, and a load-balanced cluster of multiple Web servers is required. For even more efficient access, a cache server can be placed in front; that is, static resource files are cached in operating system memory and read from there. Because reading from memory is far more efficient than reading from disk, a cache server in front of the Web tier can greatly improve concurrent access performance. Common cache software includes Squid and Varnish.
Although a cache server improves access performance, it needs a large amount of memory. When memory is sufficient, the random-read pressure on the disk is relieved. When memory is insufficient, the system uses virtual memory, which increases disk I/O, and increased disk I/O in turn raises CPU overhead.
Under highly concurrent access, network bandwidth is another potential bottleneck. If client traffic is heavy and bandwidth is insufficient, the network becomes congested and access suffers. Network bandwidth must therefore also be considered when building a Web-based network application.

1.2 Web applications based on dynamic content
These applications are characterized by frequent write operations, and dynamic programs such as those written in Java, PHP, Perl, or CGI consume CPU resources heavily. Executing a dynamic program involves compiling code and reading from the database, both of which consume CPU resources, so for a Web application based on dynamic programs you should choose multiple high-performance CPUs, which greatly improves the overall performance of the system.
When a dynamic-content Web application is accessed with high concurrency, the system runs a large number of processes, so pay attention to load distribution. Too many processes consume a large amount of memory. If memory runs short, virtual memory is used, and heavier use of virtual memory causes frequent disk writes, which in turn consume CPU resources. A balance between hardware and software resources is therefore needed: on the hardware side, configure plenty of memory and high-performance CPUs; on the software side, Memcached can be used to speed up access between programs and the database.

1.3 Database applications

Database applications mainly consume memory and disk I/O, while their CPU consumption is not very large. The most basic practice is therefore to give the database server plenty of memory and a disk array with fast reads and writes, for example by choosing a RAID level such as RAID 5 or RAID 0+1 for its disks. Separating the Web server from the DB server is another common optimization. If client requests to the database are too heavy, consider a database load balancing solution, implemented in software or hardware, to improve database access performance.
Tables that have grown too large can be split into several smaller tables that are then associated through indexes. This avoids the performance problems of querying large tables: when a table is too large, queries that traverse the whole table cause a sharp increase in disk reads and leave read operations waiting. Complex query statements, with many where clauses and order by or group by sorting, can also easily cause CPU bottlenecks. Finally, large-volume or frequent data updates can cause a surge in disk writes and a write bottleneck, which should also be avoided in program code.
In everyday use, another method that can significantly improve database server performance is read/write splitting. Reading from and writing to the same database at the same time is a very inefficient access pattern. A better practice is to set up two database servers with identical structures, sized for the read and write load respectively, copy the data from the server responsible for writes to the server responsible for reads at regular intervals, and let the two cooperate to improve overall system performance.
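On the application side, read/write splitting comes down to choosing a server per statement. The following Python sketch shows one simple way to make that choice, by inspecting the first keyword of the SQL text. The server names "db-write" and "db-read" and the set of read-only keywords are assumptions for illustration; a real deployment would hand the statement to the connection for the chosen server.

```python
# Keywords treated as read-only in this sketch (an illustrative assumption).
READ_VERBS = {"SELECT", "SHOW"}


def pick_server(sql, writer="db-write", reader="db-read"):
    """Route read-only statements to the reader and everything else to the writer."""
    words = sql.split()
    verb = words[0].upper() if words else ""
    return reader if verb in READ_VERBS else writer


for statement in (
    "SELECT title FROM articles WHERE id = 7",
    "UPDATE articles SET title = 'x' WHERE id = 7",
    "insert into articles (title) values ('y')",
):
    print(pick_server(statement), "<-", statement)
```

Because the read server receives copies only at intervals, a read issued right after a write may still return the old data; reads that must see the latest write can be sent to the write server instead.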
Caching can also improve database performance. A cache is a temporary in-memory container for database data or objects; using one greatly reduces database reads because the data is served from memory. For example, a data cache layer can be added between the Web server and the DB server to hold copies of frequently requested objects in memory, so that programs get their data without accessing the database. Memcached, which is widely used today, is based on this principle.
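The principle of such a cache layer can be sketched in a few lines of Python. Here an ordinary dictionary stands in for the cache server, and a hypothetical function plays the role of the database query; this is not the Memcached client API, only an illustration of looking in the cache first and falling back to the database on a miss.

```python
class CachedLookup:
    """Look in an in-memory cache first; query the database only on a miss."""

    def __init__(self, load_from_db):
        self._load = load_from_db   # function that performs the real query
        self._cache = {}            # stand-in for a cache server
        self.db_hits = 0            # how many times the database was queried

    def get(self, key):
        if key in self._cache:
            return self._cache[key]
        self.db_hits += 1
        value = self._load(key)
        self._cache[key] = value
        return value

    def invalidate(self, key):
        """Drop a cached copy, for example after the row has been updated."""
        self._cache.pop(key, None)


# Hypothetical "database": a dict of article titles keyed by id.
FAKE_DB = {1: "Performance tuning", 2: "RAID levels"}

lookup = CachedLookup(FAKE_DB.__getitem__)
for _ in range(3):
    lookup.get(1)
print("database queries for three requests:", lookup.db_hits)
```

The invalidate step matters as much as the lookup: without it, updated rows would keep being served from stale cached copies.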
1.4 Software download applications

Static resource download servers are characterized by high bandwidth consumption and high storage performance requirements. When download volume is extremely high, multiple servers at multiple points can share the download load. For the HTTP server, we recommend Lighttpd rather than the traditional Apache, both for performance and to reduce the number of servers deployed. The reason is that Apache uses blocking I/O, so its performance is relatively poor and its concurrency limited, whereas Lighttpd uses asynchronous I/O and handles concurrent downloads far better than Apache.

1.5 Streaming media service applications
Streaming media is used mainly for video conferencing, video on demand, distance education, and live online broadcasting. The main performance bottlenecks for such applications are network bandwidth and storage system bandwidth (mainly reads). With a massive number of users, the primary problems for streaming media applications are how to deliver high-definition, smooth pictures and how to save as much network bandwidth as possible.
A streaming media server can be optimized through its storage policy, transmission policy, scheduling policy, proxy server cache policy, and architecture design. For storage, optimize the video encoding format to save space and improve storage performance. For transmission, intelligent streaming technology can control the transmission rate to keep playback as smooth as possible. For scheduling, static and dynamic scheduling can be used. For proxy servers, management policies such as segment caching and dynamic caching can be used. In the server architecture, memory pool and thread pool techniques can reduce the performance impact of memory consumption and excessive threads.

Performance analysis and optimization case for a Web application

1. Optimization case for a website based on dynamic content

1. Website running environment
Hardware environment: one IBM x3850 server with a single dual-core 3.0 GHz Xeon CPU, 2 GB of memory, and three 72 GB SCSI disks.
Operating system: CentOS 5.4.
Website architecture: the Web application is based on the LAMP architecture, and all services are deployed on one server.
2. Performance problems and countermeasures
Symptom description
At around 10 a.m. and 3 p.m., the website's pages cannot be opened. After the service is restarted, the website works normally for a while, but responses soon slow down again until pages can no longer be opened at all.
Check configuration
First, check the system resource status: when the service fails, the system load is extremely high and memory is almost exhausted. Then check the Apache configuration file httpd.conf: the "MaxClients" option is set to 2000, and Apache's KeepAlive feature is enabled.
Handling measures
Based on these checks, the preliminary conclusion is that Apache's "MaxClients" option is misconfigured: the system has only 2 GB of memory, and with "MaxClients" set to 2000, too many user access processes consume system memory. The "MaxClients" option in httpd.conf was therefore reduced from 2000 to 1500. The website still went down frequently, so the value was reduced again to 1024. After a period of observation, the interval between outages was longer and outages were less frequent than before, but the system load was still high and pages loaded extremely slowly.
3. First analysis and optimization
Since the website stops responding because system resources are exhausted, resource usage was analyzed in depth with a combination of commands such as uptime, vmstat, top, and ps, leading to the following conclusions:
Conclusion description
The average system load is very high: the "load average" reported by uptime is above 10, and CPU resources are also heavily consumed. This is the main cause of the slow response and long periods of unresponsiveness. The main reason for the high resource consumption is that user processes consume a lot of resources.
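A load average is easier to judge when it is divided by the number of CPUs available. The following Python sketch extracts the three load averages from a line of uptime output and computes the load per CPU. The sample line is hypothetical, and the count of two logical CPUs is an assumption based on the single dual-core processor described above.

```python
import re

LOAD_PATTERN = re.compile(
    r"load averages?:\s*([\d.]+),?\s+([\d.]+),?\s+([\d.]+)"
)


def parse_load_average(uptime_line):
    """Return the 1-, 5-, and 15-minute load averages from uptime output."""
    match = LOAD_PATTERN.search(uptime_line)
    if match is None:
        raise ValueError("no load average found in uptime output")
    return tuple(float(value) for value in match.groups())


def load_per_cpu(load, logical_cpus):
    """Load divided by CPU count; values well above 1 mean processes are queuing."""
    return load / logical_cpus


# Hypothetical uptime output, for illustration only.
SAMPLE = " 10:02:13 up 35 days,  2:14,  3 users,  load average: 10.52, 9.87, 8.45"

one, five, fifteen = parse_load_average(SAMPLE)
print("1-minute load per CPU:", load_per_cpu(one, 2))
```

With a load above 10 on two CPUs, roughly five runnable processes compete for each CPU, which is consistent with the slow responses described.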
Cause analysis
The top command shows that each Apache child process consumes about 6 to 8 MB of memory, which is abnormal; in our experience, each Apache child process normally consumes about 1 MB. The Apache logs show that the home page is accessed most often, which suggests that the home page code may have problems. Inspecting the PHP code for the home page shows that the page is very large, contains many images, and is generated dynamically. Every visit to the home page therefore triggers multiple database queries, and querying a database is very CPU-intensive. The home page PHP code also has no caching, so every user request repeats the queries, which makes the database query load very high.
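Finding the most frequently requested page from the Apache logs can be done with a short script. The following Python sketch counts request paths in access log lines, assuming the request is logged in quotes as in the common log format; the sample lines, addresses, and paths are hypothetical.

```python
import re
from collections import Counter

# Matches the quoted request, e.g. "GET /index.php HTTP/1.1"
REQUEST = re.compile(r'"(?:GET|POST|HEAD) (\S+) [^"]*"')


def top_paths(log_lines, n=3):
    """Return the n most requested paths with their request counts."""
    counts = Counter()
    for line in log_lines:
        match = REQUEST.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts.most_common(n)


# Hypothetical access log lines, for illustration only.
SAMPLE = [
    '10.0.0.1 - - [24/Jul/2014:10:00:01 +0800] "GET /index.php HTTP/1.1" 200 51234',
    '10.0.0.2 - - [24/Jul/2014:10:00:02 +0800] "GET /index.php HTTP/1.1" 200 51234',
    '10.0.0.3 - - [24/Jul/2014:10:00:02 +0800] "GET /news.php?id=7 HTTP/1.1" 200 8120',
    '10.0.0.1 - - [24/Jul/2014:10:00:03 +0800] "GET /index.php HTTP/1.1" 200 51234',
]

for path, hits in top_paths(SAMPLE):
    print(hits, path)
```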
Handling measures
Modify the home page PHP code to reduce the page size and add caching for frequently accessed operations, minimizing the program's access to the database.
4. Second analysis and optimization
After this simple optimization, service outages decreased significantly, but the website was still occasionally unreachable during peak hours. Again starting from system resource usage, we found that memory consumption was too high and that there was disk I/O wait, which led to the following conclusion:
Cause analysis
The excessive memory consumption is caused by too many user access processes. Before the PHP code was optimized, each Apache child process consumed 6 to 8 MB of memory; with Apache's maximum number of clients set to 1024, memory is bound to run out. Once physical memory is exhausted, virtual memory comes into use, and heavy use of virtual memory inevitably causes disk I/O wait and eventually exhausts CPU resources.
Handling measures
After the PHP code optimization, the memory consumed by each Apache child process stays at about 1 to 2 MB. The "MaxClients" option in the Apache configuration file httpd.conf was therefore set to 600, and the "KeepAlive" feature was disabled. As a result, the number of Apache processes dropped sharply, staying between about 500 and 600. Virtual memory is still used occasionally, but the Web service runs normally and outages are rare.
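The reasoning behind these MaxClients values is simple multiplication, sketched below in Python. The function names are hypothetical, and the model is deliberately crude: it treats the per-process figure reported by top as fully private memory and ignores what MySQL and the operating system need on the same 2 GB server.

```python
PHYSICAL_MEMORY_MB = 2 * 1024  # the 2 GB server described above


def apache_memory_mb(max_clients, per_child_mb):
    """Worst-case memory if every allowed child process is running."""
    return max_clients * per_child_mb


def max_clients_for(budget_mb, per_child_mb):
    """Largest MaxClients whose worst case fits in the memory budget."""
    return int(budget_mb // per_child_mb)


# (MaxClients, MB per child) at each stage of the case study
for max_clients, per_child in [(2000, 6), (1024, 6), (600, 2)]:
    need = apache_memory_mb(max_clients, per_child)
    verdict = "fits" if need <= PHYSICAL_MEMORY_MB else "exceeds"
    print(f"MaxClients {max_clients} x {per_child} MB = {need} MB "
          f"({verdict} {PHYSICAL_MEMORY_MB} MB)")

print("ceiling at 2 MB per child:", max_clients_for(PHYSICAL_MEMORY_MB, 2))
```

Even at the low end of 6 MB per child, 1024 clients would need about 6 GB, three times the physical memory, whereas 600 clients at 1 to 2 MB each need at most about 1.2 GB.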
5. Third analysis optimization
After the previous two optimizations, the website basically runs normally, but it still sometimes cannot be accessed during peak hours. Checking system resources again shows that CPU resources are still being exhausted, but the cause differs from the previous two cases:
Cause Analysis
The background logs show that the PHP programs access the database frequently and that many SQL statements contain clauses such as where and order by. There are also too many database queries, most of them complex queries that generally have to traverse the whole table, while many tables have no indexes. Such code puts a high load on the MySQL database, and because MySQL and Apache are deployed on the same server, this also explains the high CPU consumption on the server.
Handling measures
Optimize the SQL statements in the program: add matching conditions to the where clauses to reduce full-table traversal, create indexes on the fields used in the where and order by clauses, and extend the program's caching mechanism. After this optimization, the website basically runs normally and no longer goes down.
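The effect of an index on a where and order by query can be observed in a small experiment. The sketch below uses SQLite from Python's standard library instead of MySQL, purely so that it runs without a database server; the table, columns, and index name are invented for the example, and MySQL's own EXPLAIN output looks different.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE articles ("
    "id INTEGER PRIMARY KEY, category TEXT, created_at TEXT, title TEXT)"
)
conn.executemany(
    "INSERT INTO articles (category, created_at, title) VALUES (?, ?, ?)",
    [(f"cat{i % 10}", f"2014-01-{i % 28 + 1:02d}", f"title {i}")
     for i in range(1000)],
)

QUERY = "SELECT id, title FROM articles WHERE category = ? ORDER BY created_at"


def query_plan(connection, sql, params):
    """Return SQLite's description of how it would execute the query."""
    rows = connection.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)  # last column is the detail text


plan_before = query_plan(conn, QUERY, ("cat3",))

# Index the where column first, then the order by column.
conn.execute(
    "CREATE INDEX idx_articles_category_created "
    "ON articles (category, created_at)"
)
plan_after = query_plan(conn, QUERY, ("cat3",))

print("without index:", plan_before)
print("with index:   ", plan_after)
```

Without the index, the plan reports a scan of the whole table; with it, the plan names the index. The exact wording of the plan text varies between SQLite versions.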

6. Fourth analysis optimization
After the preceding three optimizations, there is less and less room to optimize the program code, the operating system, and Apache. To avoid service outages and keep the website stable, efficient, and fast, the next step is to optimize the website architecture: deploy the Web tier and the database separately by adding a dedicated database server for MySQL. As traffic grows, if the front end cannot keep up with requests, you can add more Web servers and deploy load balancing among them to remove the front-end bottleneck. If the database still faces read and write pressure, you can add another MySQL server and deploy MySQL with reads and writes separated. This builds a high-performance, high-reliability website system.

2. Cases of website Optimization Based on Dynamic and Static content

1. Website running environment description
Hardware environment: Two IBM x3850 servers, each with a single dual-core Xeon 3.0 GHz CPU, 4 GB of memory, and three 72 GB SCSI disks.
Operating System: CentOS 5.4.
Website architecture: The Web application is an e-commerce application based on the J2EE architecture. The Web application server is Tomcat, and the database is MySQL. The Web application and the database are deployed on two separate servers.

2. Performance Problems and Solutions
Symptom description
During peak hours, web pages cannot be opened. After the Java service is restarted, the website runs normally for a while, but the response soon slows down again until pages cannot be opened at all.
Check Configuration
First, check the system resource status. When the service fails, the system load is extremely high and the CPU is running at full capacity: Java processes occupy 99% of the CPU, while memory is hardly used. Checking the application server shows that only one Tomcat instance is running the Java program. The parameters in the Tomcat configuration file server.xml are still the defaults, without any optimization.
Handling measures
The default parameters in server.xml must be adjusted to the characteristics of the application. For example, you can increase the values of Tomcat parameters such as connectionTimeout, maxKeepAliveRequests, and maxProcessors. After the change, continued observation shows that the interval between outages is longer and outages are less frequent than before. However, the Java process still consumes a lot of CPU resources, and page access remains extremely slow.
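Before changing any values, it helps to see which of these parameters are set explicitly at all. The following sketch parses a server.xml fragment and reports the three parameters named above for each Connector element. The sample XML is made up for illustration, and which attributes a Connector actually accepts depends on the Tomcat version, so check the documentation for the version in use.

```python
import xml.etree.ElementTree as ET

# Made-up fragment resembling an untuned server.xml.
SAMPLE_SERVER_XML = """
<Server port="8005" shutdown="SHUTDOWN">
  <Service name="Catalina">
    <Connector port="8080" protocol="HTTP/1.1"
               connectionTimeout="20000" redirectPort="8443" />
  </Service>
</Server>
"""

# Parameter names as given in the article.
PARAMS_OF_INTEREST = ("connectionTimeout", "maxKeepAliveRequests", "maxProcessors")


def connector_settings(server_xml: str) -> list:
    """Return (port, {parameter: value or None}) for every Connector."""
    root = ET.fromstring(server_xml)
    report = []
    for connector in root.iter("Connector"):
        settings = {name: connector.get(name) for name in PARAMS_OF_INTEREST}
        report.append((connector.get("port"), settings))
    return report


for port, settings in connector_settings(SAMPLE_SERVER_XML):
    for name, value in settings.items():
        shown = value if value is not None else "not set (default applies)"
        print(f"connector {port}: {name} = {shown}")
```

To inspect a real installation, read the contents of its server.xml file and pass them to connector_settings instead of the sample string.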

3. First analysis optimization
Since the Java process consumes a lot of CPU resources, we need to find out what causes it. The lsof and netstat commands show a large number of Java requests in a waiting state. The Tomcat log contains many error messages reporting that database connections timed out, that the database could not be connected, and that static website resources could not be accessed. From this, the following conclusions are drawn:
Cause Analysis
Tomcat is a Java container that uses a connection/thread model to process requests. It is mainly used for dynamic applications such as JSP and servlets. Although it can also act as an HTTP server, it handles static resources very inefficiently, far less well than Apache or Nginx. From the symptoms observed above, we can tentatively conclude that Tomcat cannot respond to client requests in time, so the request queue keeps growing until Tomcat stops responding completely. In a normal request, the server passes the request to Tomcat; Tomcat performs compilation, database access, and other operations, and returns the result to the client; once the client has received it, Tomcat closes the connection and the request is complete. Under highly concurrent access, many requests reach Tomcat at the same moment: before Tomcat has finished the first request, the second has already arrived, followed by the third, and so on. The backlog keeps growing until Tomcat stops responding, the Java process hangs, and its resources cannot be released. This is the root cause.
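The pile-up described above can be made concrete with a toy model: if requests arrive faster than they can be completed, the number of waiting requests grows without bound. The rates below are invented numbers, not measurements from this site.

```python
def backlog_over_time(arrivals_per_sec: int, capacity_per_sec: int,
                      seconds: int) -> list:
    """Return the number of waiting requests at the end of each second."""
    backlog = 0
    history = []
    for _ in range(seconds):
        # New requests join the queue; the server drains what it can.
        backlog = max(0, backlog + arrivals_per_sec - capacity_per_sec)
        history.append(backlog)
    return history


# Arrival rate below capacity: the queue stays empty.
print(backlog_over_time(80, 100, 5))   # [0, 0, 0, 0, 0]
# Arrival rate above capacity: the queue grows every second.
print(backlog_over_time(120, 100, 5))  # [20, 40, 60, 80, 100]
```

The model ignores timeouts and thread limits, but it shows why raising configuration limits alone only delays the failure: as long as capacity stays below the arrival rate, the backlog keeps growing.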
Handling measures
To improve Tomcat performance, the architecture has to be restructured. First, add Apache: Apache serves static resources and Tomcat handles dynamic requests, with the Mod_JK module providing communication between the Apache server and the Tomcat server. The advantage of Mod_JK is that it lets you define detailed resource handling rules. Based on which parts of the website are static and which are dynamic, all static resource files are handled by Apache, and dynamic requests are forwarded to Tomcat through Mod_JK. Integrating Apache + JK + Tomcat can greatly improve the performance of Tomcat applications.
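The separation rule itself is simple, as the sketch below suggests: decide by file extension whether a request is static or dynamic. This is only an illustration of the idea in Python, not Mod_JK configuration syntax; the extension list and the backend names are assumptions for the example.

```python
from pathlib import PurePosixPath
from urllib.parse import urlsplit

# Assumed list of extensions treated as static content.
STATIC_EXTENSIONS = {".html", ".htm", ".css", ".js", ".png", ".jpg", ".gif", ".ico"}


def choose_backend(url: str) -> str:
    """Return which server should handle the request for this URL."""
    path = urlsplit(url).path                     # drop the query string
    suffix = PurePosixPath(path).suffix.lower()   # e.g. ".png", or "" if none
    return "apache-static" if suffix in STATIC_EXTENSIONS else "tomcat-dynamic"


for url in ("/images/logo.png", "/css/site.css?v=3",
            "/cart/view.jsp", "/order/submit"):
    print(url, "->", choose_backend(url))
```

In this sketch, anything that is not recognizably static goes to Tomcat. Real rules are often written the other way around, forwarding only known dynamic patterns and letting Apache serve everything else.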

4. Second analysis optimization
After the previous optimization, the Java process's resource usage occasionally rises but falls back automatically after a while, which is normal. Under highly concurrent access, however, its resource usage sometimes still fluctuates sharply. A comprehensive analysis of the Tomcat logs leads to the following conclusion:
To achieve higher and more stable performance, a single Tomcat application server is sometimes not enough. In that case, run a Tomcat-based load balancing system in conjunction with the Mod_JK module: Apache schedules user requests at the front end, and multiple back-end Tomcat servers process the dynamic applications. Distributing the load evenly across multiple Tomcat servers substantially improves the overall performance of the website.
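The simplest way to distribute load evenly is round-robin scheduling, sketched below. It is shown only to illustrate the idea; Mod_JK's load balancer has its own configuration and balancing methods, and the backend names here are placeholders.

```python
from collections import Counter
from itertools import cycle


class RoundRobinBalancer:
    """Hand out backends in a fixed rotating order."""

    def __init__(self, backends) -> None:
        backends = list(backends)
        if not backends:
            raise ValueError("at least one backend is required")
        self._rotation = cycle(backends)

    def pick(self) -> str:
        return next(self._rotation)


# Placeholder names for three back-end Tomcat servers.
balancer = RoundRobinBalancer(["tomcat-1", "tomcat-2", "tomcat-3"])
assigned = Counter(balancer.pick() for _ in range(9))

print(dict(assigned))  # {'tomcat-1': 3, 'tomcat-2': 3, 'tomcat-3': 3}
```

Round-robin assumes that all back-end servers are equally powerful and that requests cost about the same; weighted or load-aware methods are preferable when that does not hold.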

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
