Using IBM performance analysis tools to address performance issues in a production environment

Source: Internet
Author: User

Preface

Enterprise application software usually has requirements for concurrent users and response time: a large number of users must be able to complete their business operations at the same time, with short response times. These two performance indicators often determine whether an application can go live successfully, and therefore whether the project passes acceptance, wins the customer's recognition, and can continue to grow in its industry. This shows how important performance is to an application system. It has also become an unspoken pain of the software industry: before most application systems go live, the project team has to go through a painful process of remaking itself.

Setting up a production environment involves many aspects, such as storage planning, operating system parameter adjustment, database tuning, and application tuning. These aspects affect one another, and only continuous adjustment and optimization can maximize resource utilization and meet the customer's requirements for throughput and response time. After countless practical experiences, many software experts agree that optimizing the application itself is essential. Otherwise, even a large amount of memory will be used up. A defect that leads to OOM (Out Of Memory) errors in particular will greedily eat memory until the system goes down.

Memory leaks: a tough nut to crack

OOM has many causes, which fall roughly into two categories. The first is limited physical memory. When this happens the cause is easy to find, but it rarely occurs in a real production environment, because production hardware is usually purchased at the start of the project according to the system requirements and is sized to meet them.

The second cause is that the application itself uses or configures resources improperly, so that memory usage keeps growing until the JVM heap is exhausted. Examples include failing to release JDBC connection pool objects correctly, or using a cache without limiting its size. This article does not cover every such situation. Instead, it takes one project as a case study, explores how to solve this kind of problem, and sums up some best practices for development engineers.
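
The two examples of improper resource use named above can be illustrated with a minimal sketch. It is not code from the project: the class names `ConnectionUsage` and `BoundedCache`, the 5-second validation timeout, and the cache capacity are all assumptions made for illustration. The first class returns a pooled JDBC connection with try-with-resources, and the second caps a cache by evicting the least recently used entry.

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.sql.DataSource;

/** Hypothetical helper: shows a pooled connection being returned reliably. */
final class ConnectionUsage {
    static boolean ping(DataSource pool) throws SQLException {
        // try-with-resources calls close() even when an exception is thrown;
        // for a pooled DataSource, close() hands the connection back to the pool.
        try (Connection connection = pool.getConnection()) {
            return connection.isValid(5); // timeout in seconds (illustrative value)
        }
    }
}

/** Hypothetical size-limited cache; not thread-safe on its own. */
final class BoundedCache<K, V> extends LinkedHashMap<K, V> {
    private static final long serialVersionUID = 1L;
    private final int maxEntries;

    BoundedCache(int maxEntries) {
        // accessOrder = true keeps entries ordered from least to most recently used
        super(16, 0.75f, true);
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Invoked after each insertion; returning true evicts the eldest entry
        return size() > maxEntries;
    }
}
```

With `new BoundedCache<String, Object>(1000)`, the cache can never hold more than 1000 entries, so it cannot grow until the heap is exhausted. If several threads share the cache, it would need to be wrapped, for example with `Collections.synchronizedMap`.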

Introduction to the project background

Project background:

There are 500 intranet users who need to work online at the same time (the peaks are the hour around noon and the end of the working day at 6 p.m.).

The production environment uses a traditional master-slave configuration that provides high availability (HA); it is not clustered.

The server is an IBM p570 running AIX with 8 CPUs and 16 GB of memory, but only half of those resources (4 CPUs, 8 GB) are available to the new system.

The project started at the beginning of March. The author and the architect had been to the customer site once or twice for simple deployments, mainly to install software, deploy the application, and test whether the application would run, as preparation for going live. On the day the application went live for its trial run, the project team stayed at the customer site and watched the number of logged-in users with little confidence. At its peak the count reached 440 and the system began to respond a little slowly, but it held up. In the end we attributed the slowdown to the currently limited resources and assumed that once the other half of the resources became available there would certainly be no problem. (Note that when resources are added, most of the tuning work at the system level, the database level, and so on has to be done again, which is why we recommend provisioning all available resources in one step.) As a temporary workaround for the limited resources, we agreed with the customer to have the system scheduler restart the application server at 12:30 noon and at 11 o'clock, so that memory was in effect cleaned up manually every few hours.

During the trial run, new sub-applications were still entering joint testing, and the customer raised change requests every day. Urgent changes had to be made and put into use right away. There was not enough time for regression testing after each change, so newly deployed code inevitably had problems of one kind or another. We ran into this several times and finally had to restart the application server while the business system was in use. The interrupted transactions left inconsistent data, which then had to be corrected, adding to the workload. Several times the application also became unusually slow while running. Because the business could not be interrupted for long and the system had to be restored quickly, we often restarted the application server to free memory. Only when we checked the logs afterwards did we find that an OOM error had been plainly recorded. This caught the attention of the project manager, who asked the architect to study the issue further and confirm it.

A few months passed and the problems kept occurring. Through coordination between the customer and our company, several experts were brought in, including operating system experts and database experts. After their inspection, most of them reached the same conclusion: most of the parameters that needed adjusting had already been adjusted, and the cause should be sought in the application itself. It seemed we would have to solve it on our own. (The end result also showed that such an obscure and well-hidden OOM problem is hard to spot at a glance, and that the role of tools cannot be ignored.)

In the skeleton code of the underlying framework, mainly the DAO layer and the database interface, we added logging that captures every SQL statement running for more than 10 seconds and records it in the application log for later reference. At the same time, following the suggestions of database monitoring tools, we optimized all of the offending SQL statements by creating indexes or by modifying the data structure (mainly adding redundant fields to avoid queries that join multiple tables). After a few days there were essentially no SQL statements running longer than 10 seconds, so the SQL optimization at the application layer could be said to have met the standard.
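
The slow-statement logging described above could look roughly like the following sketch. The project's actual DAO code is not shown in this article, so the class name `SlowSqlLogger`, the `timed` method, and the use of `java.util.logging` are assumptions. The idea is only that the DAO layer wraps each database call and logs the statement when it runs longer than a threshold, such as the 10 seconds mentioned above.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.logging.Logger;

/** Hypothetical DAO-layer helper that logs statements exceeding a time threshold. */
final class SlowSqlLogger {
    private static final Logger LOG = Logger.getLogger(SlowSqlLogger.class.getName());

    private final long thresholdMillis;
    private final AtomicInteger slowCount = new AtomicInteger();

    SlowSqlLogger(long thresholdMillis) {
        this.thresholdMillis = thresholdMillis;
    }

    /** Runs the query, measures its duration, and logs the SQL text if it was slow. */
    <T> T timed(String sql, Callable<T> query) throws Exception {
        long start = System.nanoTime();
        try {
            return query.call();
        } finally {
            // The finally block also records statements that failed with an exception
            long elapsedMillis = (System.nanoTime() - start) / 1_000_000;
            if (elapsedMillis > thresholdMillis) {
                slowCount.incrementAndGet();
                LOG.warning("Slow SQL (" + elapsedMillis + " ms): " + sql);
            }
        }
    }

    /** Number of slow statements seen so far. */
    int slowCount() {
        return slowCount.get();
    }
}
```

A DAO method would then call something like `slowSql.timed(sql, () -> statement.executeQuery())` with a logger created as `new SlowSqlLogger(10_000)`. Keeping the timing in one wrapper means individual DAO methods do not need their own measurement code.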

However, the downtime problem was still not completely resolved. Several times, brief monitoring through the console showed threads waiting, and on two or three occasions heapdump files several gigabytes in size were produced, each accompanied by a javacore file. Because every outage required urgent handling, long-term monitoring was not possible, and only the application server logs and the resulting heapdump files were available for further study. Checking the logs, we found that several of the outages occurred at the same two business functions, but the code handling those functions had been checked and analyzed many times without finding a cause. Our only hope lay in the heapdump and javacore files from the outages, so we began to focus on analyzing these OOM-generated files.
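
When there is only time for a brief look at a running server, a snapshot of thread states and heap usage can be taken from inside the JVM with the standard `java.lang.management` API. The sketch below is a generic illustration and not the console the project team used; the class name `JvmSnapshot` and its methods are assumptions.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.EnumMap;
import java.util.Map;

/** Hypothetical helper that summarizes thread states and heap usage of the current JVM. */
final class JvmSnapshot {
    /** Counts live threads per state (RUNNABLE, WAITING, BLOCKED, ...). */
    static Map<Thread.State, Integer> threadsByState() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        Map<Thread.State, Integer> counts = new EnumMap<>(Thread.State.class);
        for (ThreadInfo info : threads.getThreadInfo(threads.getAllThreadIds())) {
            if (info == null) {
                continue; // the thread ended between the two calls
            }
            counts.merge(info.getThreadState(), 1, Integer::sum);
        }
        return counts;
    }

    /** Heap memory currently in use, in bytes. */
    static long heapUsedBytes() {
        return ManagementFactory.getMemoryMXBean().getHeapMemoryUsage().getUsed();
    }
}
```

A large and growing number of `WAITING` or `BLOCKED` threads, together with heap usage that does not fall after garbage collection, would be a hint to look more closely at the heapdump and javacore files.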
