How to locate performance bottlenecks in performance testing

Source: Internet
Author: User
Tags: high CPU usage, JProfiler, JConsole

What performance testing is and what its basic purpose is should already be clear to everyone, so I will not go into detail. In short, performance testing is one part of the testing process that helps our functionality run better. If functional testing asks whether the product is available, easy to use, and meets users' needs, then performance testing simply makes those goals run more smoothly. No professional definition is needed; it comes down to one phrase: easy to use.

Therefore, one transitional piece of work in performance testing is locating problems: locating them in the functionality, locating them under load, and, most importantly, locating the "bottleneck". I have not worked in performance testing for long and am no expert, but in my own understanding, bottlenecks occur in the following areas:

    • 1. Network bottlenecks, such as bandwidth, traffic, and other aspects of the network environment
    • 2. Application service bottlenecks, such as the basic configuration of the middleware, caching, etc.
    • 3. System bottlenecks, the most common case: CPU, memory, disk, and other resources of the application server, database server, and client
    • 4. Database bottlenecks; in Oracle, for example, the default parameters set under SYS
    • 5. Bottlenecks in the application itself

Network bottlenecks are rarer now, but they do exist. First think about what happens to the application or system if the network is congested or disconnected, if bandwidth is consumed by other traffic, or if the link is rate-limited. For the web, the symptoms are timeouts and HTTP 400/500-class errors. Client programs may also time out, drop the connection, or fail to fetch information the server has published. A more visible symptom is slow transaction commits; if the transaction-handling code is imperfect, the typical errors are incomplete data submissions, or duplicate submissions caused by network problems combined with code defects. Putting this together: first consider whether the network has a bottleneck, then, once a network problem is found, consider how to optimize, whether that means optimizing some of the interaction code, the interfaces, and so on.
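The symptom classification above can be sketched as a small triage function. This is an illustrative heuristic of my own, not output from LoadRunner or JMeter; the categories and wording are assumptions.

```python
# Sketch: map one failed request from a load test to a coarse bottleneck
# hypothesis. The mapping below is an illustrative assumption, not a standard.

def classify_failure(status_code, elapsed_s, timeout_s):
    """Classify a failed request by probable cause, per the symptoms in the text."""
    if elapsed_s >= timeout_s:
        return "timeout: suspect congestion, bandwidth limits, or a slow backend"
    if status_code is not None and 500 <= status_code < 600:
        return "server error: suspect the application or middleware tier"
    if status_code is not None and 400 <= status_code < 500:
        return "client error: suspect incomplete or duplicate submissions"
    return "unclassified: inspect the logs for this request"
```

Running each failure through such a function during result analysis helps separate network-side symptoms from application-side ones before deeper digging.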

Locating application service bottlenecks is more complex and takes study, but plenty of material is available online. Servers such as Tomcat and WebLogic ship with default settings, and architects and operations staff also tune some values through testing; these are generally sufficient for releasing the application, so not many settings need changing. The most basic ones we know are the JAVA_OPTS settings and parameters such as maxThreads and connection timeouts. We run performance tests with tools such as LoadRunner, JMeter, or WebLOAD, especially against the application services, and if the application services have a bottleneck, the logs configured in log4j.properties will generally record it. We then use the logs to further pin down the application service problem.
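Scanning those logs for error bursts can be partly automated. A minimal sketch, assuming a hypothetical log4j-style line layout (timestamp, thread in brackets, level); real layouts vary with the configured pattern:

```python
import re

# Sketch: count log entries per level to spot error bursts during a load test.
# The line pattern is an assumed log4j-style layout, e.g.:
#   "2023-01-01 12:00:00 [http-1] ERROR something failed"
LOG_LINE = re.compile(r"^(?P<ts>\S+ \S+) \[(?P<thread>[^\]]+)\] (?P<level>\w+) ")

def count_levels(lines):
    """Return a dict of log level -> number of entries; non-matching lines are skipped."""
    counts = {}
    for line in lines:
        m = LOG_LINE.match(line)
        if m:
            level = m.group("level")
            counts[level] = counts.get(level, 0) + 1
    return counts
```

Comparing the ERROR/WARN counts between a baseline run and a loaded run points at the time window where the application service started to struggle.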

System bottlenecks, although relatively complex to locate, have plenty of prior experience to draw on, so I will not elaborate. Colleagues who use LoadRunner can read system-resource indicators from its counters, and together with tools such as Nagios and Cacti it becomes obvious which resources are sufficient and which clearly are not. In general, however, system bottlenecks are caused by the application itself, so the analysis needs to classify and locate the bottleneck within the application.

Almost everything now depends on a database in the background. I do not really know database administration, but the bottleneck-locating work a DBA does might look like this: query views such as V$SYSTEM_EVENT, V$SYSSTAT, and V$SQL, compare the results against monitoring data from normal daily operation, and look for anomalies. Beyond that I know little.

Application bottlenecks are what the test process needs to pay the most attention to, and locating them requires testers and developers working together. I mostly do the execution side: for example, while a script is running, the developers use tools such as JProfiler to look at the heap walker and thread-profiling views to determine where the problem is. That is roughly how it goes; I have no hands-on experience with it myself.

Refine the analysis step by step: first monitor the common CPU, memory, and disk performance indicators and analyze them together; then, based on the specific situation of the system under test, make an initial localization of the problem; and finally choose more detailed monitoring indicators to analyze.

When you suspect that there is not enough memory:

Method 1:

“Monitoring indicators”: Memory\Available MBytes, Memory\Pages/sec, Memory\Page Reads/sec, Memory\Page Faults/sec

"Reference value":

If Page Reads/sec stays above 5 for a sustained period, memory may be insufficient.

Pages/sec is recommended to stay in the 0-20 range (this value stays high whenever the server lacks the memory to handle its workload; a sustained value above 80 indicates a problem).
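These two rules of thumb can be expressed directly in code. A minimal sketch applying the thresholds quoted above to one sample of the counters:

```python
def memory_pressure(pages_per_sec, page_reads_per_sec):
    """Flag likely memory shortage from one sample of the Windows perfmon
    counters, using the rule-of-thumb thresholds quoted in the text."""
    findings = []
    if page_reads_per_sec > 5:
        findings.append("Page Reads/sec above 5: possible memory shortage")
    if pages_per_sec > 80:
        findings.append("Pages/sec above 80: memory problem likely")
    elif pages_per_sec > 20:
        findings.append("Pages/sec above the 0-20 comfort range: watch memory")
    return findings
```

In practice you would apply this over a sampled window rather than a single reading, since both counters spike briefly during normal operation.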

Method 2: Analyze performance bottlenecks based on the PhysicalDisk counters

“Monitoring indicators”: Memory\Available MBytes, Memory\Page Reads/sec, PhysicalDisk\%Disk Time, and PhysicalDisk\Avg. Disk Queue Length

“Reference value”: the recommended threshold for %Disk Time is 90%

When memory is low, some processes are paged out to disk to keep running, and performance drops sharply. A memory-starved system often shows high CPU utilization as well, because the CPU must constantly scan memory and move pages out to the disk.

When a memory leak is suspected:

“Monitoring indicators”: Memory\Available MBytes, Process\Private Bytes, Process\Working Set, PhysicalDisk\%Disk Time

“Description”:

In Windows resource monitoring, if the values of the Process\Private Bytes and Process\Working Set counters keep rising over a long period while the value of the Memory\Available Bytes counter keeps falling, there is a good chance of a memory leak. Memory-leak testing should run over a long period, to observe how the application responds once all memory is exhausted.
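That trend rule can be sketched as a check over sampled counter values. This simplification treats "keeps rising/falling" as monotonic over the window, which is stricter than real, noisy counter data:

```python
def looks_like_leak(private_bytes, working_set, available_bytes):
    """Heuristic: sustained growth of Process\\Private Bytes and Process\\Working Set
    together with a sustained fall of Memory\\Available Bytes suggests a leak.
    'Sustained' is simplified here to 'monotonic over the sampled window'."""
    rising = lambda xs: all(a <= b for a, b in zip(xs, xs[1:])) and xs[-1] > xs[0]
    falling = lambda xs: all(a >= b for a, b in zip(xs, xs[1:])) and xs[-1] < xs[0]
    return rising(private_bytes) and rising(working_set) and falling(available_bytes)
```

With real data you would smooth the samples first (e.g. a moving average) so short-lived GC dips do not mask a long-term upward trend.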

CPU Analysis

"Monitoring Indicators":

System\%Total Processor Time (overall CPU) and Processor\%Processor Time (per CPU)

Processor\%User Time and Processor\%Privileged Time

System\Processor Queue Length

System\Context Switches/sec and Processor\%Privileged Time

"Reference value":

System\%Total Processor Time should not continuously exceed 90%. If the server is dedicated to SQL Server, the maximum acceptable limit is 80-85%, and a reasonable range is 60% to 70%.

Processor\%Processor Time less than 75%

System\Processor Queue Length less than the total number of CPUs + 1
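The three reference values can be checked mechanically against one sample. A minimal sketch (the function name and return format are my own):

```python
def cpu_findings(total_pct, per_cpu_pct, queue_length, n_cpus, dedicated_sql=False):
    """Apply the CPU rules of thumb from the text to one counter sample."""
    limit = 85 if dedicated_sql else 90  # lower acceptable ceiling for a SQL Server box
    findings = []
    if total_pct > limit:
        findings.append("total processor time above the acceptable limit")
    if any(p > 75 for p in per_cpu_pct):
        findings.append("a CPU above 75% busy")
    if queue_length >= n_cpus + 1:  # rule: queue should stay below CPUs + 1
        findings.append("processor queue longer than CPUs + 1: waiting threads")
    return findings
```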

CPU Bottleneck Problems

1. System\%Total Processor Time: if this value continuously exceeds 90% and the processor is saturated, the entire system is facing a processor bottleneck.

Note: on some multi-CPU systems, even if the overall figure is not large, an extremely uneven load across the CPUs should also be regarded as creating a processor bottleneck for the system.

2. Excluding memory factors, if the Processor\%Processor Time counter is high while the network card and disk values are low, a CPU bottleneck can be identified. (When memory is low, some processes are paged out to disk to keep running, performance drops sharply, and a memory-starved system often shows high CPU utilization, because the CPU must constantly scan memory and move pages out to the disk.)
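This differential diagnosis (genuine CPU bottleneck versus paging-induced CPU load) can be sketched as well. The 50% disk/NIC cutoffs and the 100 MB low-memory threshold are illustrative assumptions, not values from the text:

```python
def diagnose_high_cpu(cpu_pct, disk_pct, nic_util_pct, available_mb, low_mem_mb=100):
    """Distinguish a genuine CPU bottleneck from memory-starvation-induced CPU load.
    The low_mem_mb cutoff and the 50% 'low' thresholds are illustrative assumptions."""
    if available_mb < low_mem_mb:
        return "memory-starved: high CPU likely caused by paging; rule out memory first"
    if cpu_pct > 90 and disk_pct < 50 and nic_util_pct < 50:
        return "CPU bottleneck: CPU saturated while disk and network are quiet"
    return "inconclusive: correlate with more counters"
```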

Causes of high CPU usage:

Frequently executed programs with complex logic consume heavy CPU

Complex database queries (many WHERE clauses, ORDER BY and GROUP BY sorts, and so on) make the CPU prone to bottlenecks

Insufficient memory and disk I/O problems increase CPU overhead

Disk I/O analysis

“Monitoring indicators”: PhysicalDisk\%Disk Time, PhysicalDisk\%Idle Time, PhysicalDisk\Avg. Disk Queue Length, PhysicalDisk\Avg. Disk sec/Transfer

“Reference value”: the recommended threshold for %Disk Time is 90%

In Windows resource monitoring, if %Disk Time and Avg. Disk Queue Length are high while Page Reads/sec is low, there may be a disk bottleneck.

Processor\%Privileged Time: if this value stays high, and among the PhysicalDisk counters only %Disk Time is large while the other values are moderate, the disk may be the bottleneck; if several values are large together, the disk is not the bottleneck. If the value continuously exceeds 80%, a memory problem such as a leak may be the cause. If this counter (Processor\%Privileged Time) is high at the same time as the PhysicalDisk counters, consider a faster or more efficient disk subsystem.

Avg. Disk sec/Transfer: in general, below 15 ms is best, 15-30 ms is good, and 30-60 ms is acceptable; above 60 ms you need to consider replacing the disk or changing the RAID configuration.
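Since the counter reports seconds while the bands are quoted in milliseconds, a tiny grading helper avoids unit mistakes during analysis (the labels are my own shorthand for the bands above):

```python
def rate_disk_latency(sec_per_transfer):
    """Grade PhysicalDisk\\Avg. Disk sec/Transfer per the bands quoted in the text.
    Input is in seconds, as the counter reports it."""
    ms = sec_per_transfer * 1000.0
    if ms < 15:
        return "best"
    if ms <= 30:
        return "good"
    if ms <= 60:
        return "acceptable"
    return "consider faster disks or a different RAID level"
```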

Average Transaction Response Time: if the system processes transactions more and more slowly as the test runs, it indicates that the application will likewise trend downward as production time goes on.

Transactions per Second (TPS): as the load increases, if the TPS curve changes slowly or flattens out, the server has likely begun to hit a bottleneck.
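Detecting that flattening can be automated over raw transaction timestamps. A minimal sketch; the window size and 5% tolerance are illustrative assumptions:

```python
def tps_per_second(completion_times):
    """Bucket transaction completion timestamps (in seconds) into a per-second TPS series."""
    buckets = {}
    for t in completion_times:
        s = int(t)
        buckets[s] = buckets.get(s, 0) + 1
    return [buckets.get(s, 0) for s in range(min(buckets), max(buckets) + 1)]

def plateaued(tps, window=3, tolerance=0.05):
    """True if the last `window` samples vary by less than `tolerance` of their mean,
    i.e. TPS has stopped growing even though load keeps rising - the flattening
    the text describes. Window and tolerance are illustrative choices."""
    tail = tps[-window:]
    mean = sum(tail) / len(tail)
    return mean > 0 and max(tail) - min(tail) <= tolerance * mean
```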

Hits per Second: looking at hits per second tells you whether the system is stable. A falling hit rate usually indicates that the server is responding slowly, and further analysis is needed to find the system bottleneck.

Throughput: from the server's throughput you can assess the amount of load generated by the virtual users, the server's ability to handle that traffic, and whether there is a bottleneck.

Connections: when the number of connections reaches a steady state while transaction response time rises rapidly, adding connections can improve performance significantly (transaction response time will drop).

Time to First Buffer Breakdown (Over Time): can be used to determine when, during a scenario or session-step run, a server or network problem occurred.

Performance issues that you have encountered:

    • 1. Processing failures under high concurrency (for example: database connection pool too small, server connection limit exceeded, insufficient database lock control)
    • 2. Memory leaks (for example: over a long run, memory is not released properly and outages occur)
    • 3. Abnormal CPU usage (for example: high concurrency leading to high CPU usage)
    • 4. Excessive log printing leaving the server with no disk space

How to locate these performance issues:

1. Check the system logs. Logs are the magic weapon for locating problems; if logging is thorough, problems are easy to find through the logs.

For example, when the system goes down, if the system log printed an out-of-memory error while a certain method was executing, we can follow that lead and quickly locate the problem that caused the memory overflow.

2. Use performance monitoring tools. For example, for a B/S project developed in Java, you can monitor server performance through the JDK's JConsole, or through JProfiler. JConsole can remotely monitor the server's CPU, memory, thread state and more, and plot them as charts over time.
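The text monitors a Java server with JConsole/JProfiler; as a loose, swapped-in analogy for readers working in Python, the stdlib tracemalloc module gives a similar "which code allocates the most memory" view. This is my own sketch, not anything the original tooling provides:

```python
import tracemalloc

def top_allocators(workload, limit=3):
    """Run `workload` under tracemalloc and return its top allocation sites,
    roughly analogous to a profiler's heap view for a Java process."""
    tracemalloc.start()
    result = workload()  # hold the result so its allocations appear in the snapshot
    snapshot = tracemalloc.take_snapshot()
    tracemalloc.stop()
    del result
    return snapshot.statistics("lineno")[:limit]
```

Each returned statistic carries the source line, total size, and allocation count, which is the same kind of lead a heap walker gives you when hunting a leak.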

Spotlight can be used to monitor database usage.

The performance points we need to focus on are: CPU load, memory utilization, network I/O, etc.

3. Tools and logs are just means; beyond them, reasonable performance-test scenarios need to be designed.

Specific scenarios are: Performance testing, load testing, stress testing, stability testing, surge testing, etc.

Well-designed test scenarios help you find and locate bottlenecks faster.

4. Understand the system's parameter configuration, so that performance tuning can be done later.

In addition, a digression: a word about the performance-testing tools themselves.

When I first used LoadRunner and JMeter for high-concurrency tests, the server was never the thing that got crushed; the two tools themselves fell over first.

If you run into this problem, it can be solved by remotely invoking multiple load-generator clients, spreading the load that the performance-testing tool's own client has to bear.

The point is that when doing performance testing, we must make sure the bottleneck does not occur in our own test scripts and test tools.
