Monitoring and analysis of key metrics in performance testing

Source: Internet
Author: User
Tags: memory usage, CPU usage

First, what are the key metrics to monitor for software performance testing?

Software performance testing has three main purposes:

1. Evaluate the current performance of the system and determine whether it meets the expected performance requirements.

2. Look for possible performance problems in the software system, locate performance bottlenecks, and resolve them.

3. Establish the performance characteristics of the software system, predict how much load it can sustain, and evaluate its performance before the application is deployed.

For users, the most important questions about the current system are:

1. Does it meet the performance requirements for going live?

2. What is the maximum load the system can bear?

3. How stable is the system?

To meet these testing goals and answer these user concerns, you must first decide which key metrics to collect and monitor during a performance test. Performance test metrics generally fall into two groups, resource metrics and system metrics, as shown in the figure below. Resource metrics relate directly to hardware resource consumption, while system metrics relate directly to user scenarios and requirements.

Description of the key performance test monitoring metrics:

1. Resource Metrics

CPU usage: the percentage of CPU time consumed by user processes and system processes. Sustained usage of up to 85% is generally considered acceptable.
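As a rough illustration of this definition, CPU usage over an interval can be derived from two samples of cumulative CPU time counters. The sketch below assumes Linux-style tick counters (user, system, idle) of the kind exposed by /proc/stat; `CpuSample` and `cpu_usage_percent` are hypothetical names introduced here, not part of any tool mentioned in this article.

```python
from dataclasses import dataclass


@dataclass
class CpuSample:
    """Cumulative CPU time counters (in ticks) at one point in time."""
    user: int
    system: int
    idle: int


def cpu_usage_percent(before: CpuSample, after: CpuSample) -> float:
    """Share of the interval spent in user plus system time, as a percentage."""
    busy = (after.user - before.user) + (after.system - before.system)
    total = busy + (after.idle - before.idle)
    if total <= 0:
        # No ticks elapsed between the samples, so nothing can be computed.
        return 0.0
    return 100.0 * busy / total


# Assumption: 85% is the sustained-usage guideline quoted in the text.
SUSTAINED_CPU_LIMIT = 85.0

if __name__ == "__main__":
    usage = cpu_usage_percent(CpuSample(100, 50, 850), CpuSample(170, 80, 950))
    print(f"CPU usage: {usage:.1f}% (limit {SUSTAINED_CPU_LIMIT}%)")
```

A real monitor would take many such samples and look at how long usage stays above the limit, not at a single interval.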

Memory utilization: memory utilization = (1 - free memory / total memory size) * 100%. At least 10% of memory should generally remain available, and 85% is an acceptable upper limit for memory usage.
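The memory formula can be sketched directly in code. The function names below are hypothetical, and the two thresholds are simply the 10% and 85% guidelines quoted in the text, applied together.

```python
def memory_utilization_percent(free_bytes: float, total_bytes: float) -> float:
    """Memory utilization = (1 - free memory / total memory size) * 100%."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return (1 - free_bytes / total_bytes) * 100


def memory_within_limits(free_bytes: float, total_bytes: float,
                         min_free_percent: float = 10.0,
                         max_used_percent: float = 85.0) -> bool:
    """True when both guidelines from the text hold for this sample."""
    used = memory_utilization_percent(free_bytes, total_bytes)
    free = 100 - used
    return free >= min_free_percent and used <= max_used_percent


if __name__ == "__main__":
    gib = 1024 ** 3
    print(memory_utilization_percent(2 * gib, 8 * gib))
    print(memory_within_limits(2 * gib, 8 * gib))
```

Which counter counts as "free" (for example, whether reclaimable cache is included) depends on the operating system and is left to the caller here.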

Disk I/O: disks are mainly used to store and retrieve data, so I/O comes in two forms: storing data is a write I/O operation, and retrieving data is a read I/O operation. Disk read and write performance is generally measured with % Disk Time (the percentage of elapsed time the disk spends servicing read and write requests).

Network bandwidth: usually measured with the Bytes Total/sec counter, which is the rate at which bytes are sent and received, including framing characters. To judge whether network connection speed is a bottleneck, compare the value of this counter with the current network bandwidth.
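A minimal sketch of that comparison follows. It assumes the counter is in bytes per second and the link bandwidth is in bits per second, so the counter is multiplied by 8 before dividing; `link_utilization_percent` is a hypothetical helper name.

```python
def link_utilization_percent(bytes_total_per_sec: float,
                             bandwidth_bits_per_sec: float) -> float:
    """Share of the link capacity used by the observed traffic, in percent."""
    if bandwidth_bits_per_sec <= 0:
        raise ValueError("bandwidth_bits_per_sec must be positive")
    # Convert bytes/sec to bits/sec so both sides use the same unit.
    return 100.0 * (bytes_total_per_sec * 8) / bandwidth_bits_per_sec


if __name__ == "__main__":
    # 6.25 MB/s of traffic on a 100 Mbit/s link.
    print(link_utilization_percent(6_250_000, 100_000_000))
```

The text does not give a utilization threshold, so the sketch only reports the ratio and leaves the judgment to the tester.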

2. System Metrics

Concurrent users: the number of users submitting requests to the system at the same physical moment.

Number of online users: the number of users accessing the system during a given period. These users do not necessarily submit requests to the system at the same time.

Average response time: the mean of the response times of the transactions the system processes. The response time of a transaction is the time from the client submitting the request to the client receiving the server's response. For pages that are expected to respond quickly, the response time is generally around 3 seconds.
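To make the definition concrete, the sketch below computes the average from pairs of timestamps. The `(submitted_at, received_at)` pair layout and the function name are assumptions made for illustration; load testing tools record these values in their own formats.

```python
from statistics import mean
from typing import Iterable, Tuple


def average_response_time(samples: Iterable[Tuple[float, float]]) -> float:
    """Mean of (received_at - submitted_at) over all transactions, in seconds."""
    durations = [received_at - submitted_at
                 for submitted_at, received_at in samples]
    if not durations:
        raise ValueError("at least one transaction sample is required")
    return mean(durations)


if __name__ == "__main__":
    # Three transactions taking 2 s, 4 s, and 3 s.
    print(average_response_time([(0.0, 2.0), (1.0, 5.0), (2.0, 5.0)]))
```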

Transaction success rate: in performance testing, transactions are defined to measure the performance of one or more business processes. Actions such as user login, saving an order, and submitting an order can each be defined as a transaction, as shown in the following illustration:

The number of defined transactions that a system can successfully complete per unit of time partly reflects its processing capacity. This is generally measured with the transaction success rate, and the formula is as follows:

Transaction success rate = (number of successful transactions / total number of transactions) * 100%

Time-out error rate: the ratio of transactions that fail because of timeouts or other internal system errors to the total number of transactions.
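Both ratios can be computed from simple transaction counts. The sketch below assumes the success rate is the number of successful transactions divided by the total, which is the usual reading of the term; the function names are hypothetical.

```python
def transaction_success_rate(successful: int, total: int) -> float:
    """Successful transactions as a percentage of all transactions."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * successful / total


def timeout_error_rate(failed: int, total: int) -> float:
    """Transactions failed by timeouts or internal errors, as a percentage."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * failed / total


if __name__ == "__main__":
    total, failed = 2000, 100
    print(transaction_success_rate(total - failed, total))
    print(timeout_error_rate(failed, total))
```

When every transaction either succeeds or fails for one of these reasons, the two percentages add up to 100.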

Second, how are the key metrics monitored?

1. Resource Metrics Monitoring

This mainly means monitoring resource usage on the server platform (Windows, Linux, UNIX, and so on).

You can use the system's own performance monitoring tools or third-party tools. Windows, for example, comes with the "System Performance Monitor", as shown in the following figure:

On Linux, commands such as free, vmstat, sar, and iostat can be used to monitor memory, CPU, disk I/O, and other resource usage, as shown in the following figure:

Third-party monitoring tools are also available. Spotlight, for example, is a visual tool developed by Quest Software for monitoring multiple system platforms and databases, as shown in the following figure:

nmon is a free tool provided by IBM for monitoring AIX and Linux system resources. The resource data it collects can be analyzed statistically and presented intuitively in Excel, as shown in the following figure:

2. System Metrics Monitoring

System metrics are typically monitored graphically through performance testing tools such as LoadRunner and JMeter. The figure below shows graphs of the number of concurrent users and the average response time.

Third, how are the monitored metrics analyzed?

Once the key metrics have been collected as described in the second part, how do you analyze them and determine whether there is a performance bottleneck? The following covers both resource metrics and system metrics.

1. Resource Metrics Analysis

Determining whether the CPU is a bottleneck: a CPU running at full capacity is not by itself proof of a CPU bottleneck. Linux, for example, always tries to keep the CPU as busy as possible so that task throughput is maximized. The CPU is generally judged to be a bottleneck on two signs: CPU idle time stays at 0, and the run queue is larger than the number of CPU cores (a rule of thumb is 3 to 4 times the core count). High CPU consumption may be caused by a poorly designed application or by insufficient hardware resources, and each case needs its own analysis. If problem SQL statements are the cause, for example, you need to trace and optimize the statements that drive CPU usage too high.
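The two signs described above can be sketched as a small check over sampled values. This is one possible reading, with several assumptions: the samples come from a tool such as vmstat, "stays at 0" is approximated by every idle sample being at or below a small tolerance, the run queue is averaged over the samples, and both signs must hold together. All names are hypothetical.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Sequence


@dataclass(frozen=True)
class CpuSignals:
    idle_exhausted: bool       # idle time stayed at (about) 0 for every sample
    run_queue_saturated: bool  # average run queue exceeded factor * cores

    @property
    def likely_bottleneck(self) -> bool:
        # Assumption: require both signs before calling it a bottleneck.
        return self.idle_exhausted and self.run_queue_saturated


def cpu_signals(idle_percents: Sequence[float],
                run_queue_lengths: Sequence[float],
                cores: int,
                queue_factor: float = 3.0,
                idle_tolerance: float = 1.0) -> CpuSignals:
    """Evaluate both CPU bottleneck signs over a series of samples."""
    if not idle_percents or not run_queue_lengths:
        raise ValueError("at least one sample of each kind is required")
    idle_exhausted = all(idle <= idle_tolerance for idle in idle_percents)
    run_queue_saturated = mean(run_queue_lengths) > cores * queue_factor
    return CpuSignals(idle_exhausted, run_queue_saturated)


if __name__ == "__main__":
    signals = cpu_signals([0, 0, 0], [20, 18, 22], cores=4)
    print(signals, signals.likely_bottleneck)
```

The `queue_factor` default of 3 is the lower end of the 3 to 4 rule of thumb; a stricter check would pass 4.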

Determining whether memory is a bottleneck: at least 10% of memory should generally remain available, and 85% is an acceptable upper limit for memory usage. When free memory becomes small, the system starts using the disk paging file frequently. Too little free memory may be caused by insufficient memory or by a memory leak, and needs to be monitored and analyzed against the system's actual behavior.

Determining whether disk I/O is a bottleneck: disk I/O is more likely to become a bottleneck on database servers, file servers, and streaming media servers. Disk I/O is generally analyzed and judged from the following aspects:

① Calculating the number of I/Os per disk

The number of I/Os per disk can be compared against the I/O capability of the disk. If the calculated number of I/Os per disk exceeds the disk's nominal I/O capability, there is a genuine disk performance bottleneck. The per-disk I/O calculation method is as follows:

② Monitoring disk reads and writes: if the disk performs large-volume read and write operations for a long time and CPU wait exceeds 20%, there is a problem with disk I/O, and you should consider improving disk read and write performance.
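For item ①, the calculation itself is not reproduced in the text. One commonly cited way to estimate it, shown here purely as an assumption and not as the author's formula, spreads the logical reads plus penalty-weighted writes across the disks in the array, using the usual RAID write penalties. All names and the sample figures are hypothetical.

```python
# Commonly cited physical I/Os per logical write for each RAID level
# (an assumption for this sketch, not taken from the article).
RAID_WRITE_PENALTY = {"raid0": 1, "raid1": 2, "raid5": 4, "raid10": 2}


def io_per_disk(reads_per_sec: float, writes_per_sec: float,
                disks: int, raid_level: str) -> float:
    """Estimated physical I/Os per second that each disk must serve."""
    if disks <= 0:
        raise ValueError("disks must be positive")
    penalty = RAID_WRITE_PENALTY[raid_level]
    return (reads_per_sec + penalty * writes_per_sec) / disks


def disk_is_bottleneck(reads_per_sec: float, writes_per_sec: float,
                       disks: int, raid_level: str,
                       nominal_iops_per_disk: float) -> bool:
    """True when the estimated per-disk load exceeds the nominal capability."""
    load = io_per_disk(reads_per_sec, writes_per_sec, disks, raid_level)
    return load > nominal_iops_per_disk


if __name__ == "__main__":
    # 300 reads/s and 100 writes/s on a 4-disk RAID 5 array.
    print(io_per_disk(300, 100, 4, "raid5"))
    print(disk_is_bottleneck(300, 100, 4, "raid5", nominal_iops_per_disk=150))
```

The nominal I/O capability has to come from the disk vendor's specification or from a baseline measurement.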

Determining whether network bandwidth is a bottleneck: the first test is whether network bandwidth affects the performance of transaction execution. For example, bandwidth is a bottleneck if reducing it makes the number of concurrent users, response time, transaction pass rate, and other performance metrics unacceptable, or if increasing it improves those metrics significantly.

In an actual performance test, if connection timeouts are reported constantly while manual access works normally, you can ping the application server IP or the gateway IP. Severe latency or packet loss means the network is unstable and needs to be checked.

This analysis of the four resource metrics shows that they are interdependent and cannot be investigated in isolation. A performance problem in one area often leads to problems in others. For example, heavy disk reads and writes inevitably consume CPU and I/O resources; insufficient memory causes memory pages to be written to disk and read back frequently, creating a disk I/O bottleneck; and heavy network traffic can overload the CPU. When analyzing performance problems, you need to consider all of these areas.

2. System Metrics Analysis

Concurrent users: the number of users the system can support is an important indicator of its capacity, and the number of concurrent users measures the system's parallel processing ability under highly concurrent access. Concurrency testing generally shows whether the system has deadlocks or resource contention, and whether response slows over time because requests are waiting in a queue.

In general, business functions with high throughput, heavy database I/O, or high business risk are selected for concurrent user access testing.

To determine the maximum number of concurrent users that the system can withstand, the following conditions usually have to be met:

1. The average response time of business operations is within a reasonable range.

2. The business success rate is within a reasonable range.

3. The system runs without faults (no abnormal downtime).

4. System resource usage is within a reasonable range.
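The four conditions above can be sketched as a check applied to each step of a stepped load test. Everything here is illustrative: the `LoadStep` and `Limits` types are hypothetical, the 3 second and 85% defaults reuse the guideline values quoted earlier in this article, and the 99% success threshold is an assumption, since the text does not define "a reasonable range".

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class LoadStep:
    """Measurements taken at one concurrency level of a stepped load test."""
    users: int
    avg_response_s: float
    success_rate_pct: float
    crashed: bool
    cpu_pct: float
    memory_pct: float


@dataclass
class Limits:
    max_avg_response_s: float = 3.0     # guideline quoted in the article
    min_success_rate_pct: float = 99.0  # assumption, not from the article
    max_cpu_pct: float = 85.0           # guideline quoted in the article
    max_memory_pct: float = 85.0        # guideline quoted in the article


def step_is_acceptable(step: LoadStep, limits: Limits) -> bool:
    """Apply conditions 1 to 4 to a single load step."""
    return (step.avg_response_s <= limits.max_avg_response_s
            and step.success_rate_pct >= limits.min_success_rate_pct
            and not step.crashed
            and step.cpu_pct <= limits.max_cpu_pct
            and step.memory_pct <= limits.max_memory_pct)


def max_concurrent_users(steps: Iterable[LoadStep],
                         limits: Optional[Limits] = None) -> Optional[int]:
    """Highest user count reached before the first step that breaks a limit."""
    limits = limits or Limits()
    best = None
    for step in sorted(steps, key=lambda s: s.users):
        if not step_is_acceptable(step, limits):
            break
        best = step.users
    return best


if __name__ == "__main__":
    results = [
        LoadStep(100, 1.2, 100.0, False, 40.0, 50.0),
        LoadStep(200, 2.5, 99.5, False, 70.0, 60.0),
        LoadStep(300, 4.1, 97.0, False, 92.0, 70.0),
    ]
    print(max_concurrent_users(results))
```

Stopping at the first failing step is a deliberate choice: a higher step that happens to pass after a lower one has failed is treated as unreliable.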

Average response time: for client users, the most direct experience is whether pages load quickly or slowly, that is, the length of the response time. For example, if during a sustained concurrent test users perceive the application as very slow and the monitored average response time also grows steadily, use the resource metrics to rule out resource constraints first, and then look at the application itself, for example by analyzing slow pages with a page breakdown tool (such as HttpWatch or the Page Component Breakdown in LoadRunner Analysis).

Transaction success rate and time-out error rate: the higher the transaction success rate, the greater the system's processing capacity. Transactions fail mainly because the system responds slowly and access to a business function times out, or because a business function is faulty and cannot be accessed normally. Each failure needs to be analyzed from its transaction error message.

To sum up, software performance testing is a process of execution and monitoring -> analysis -> tuning. Monitoring provides reference data for analysis, analysis serves tuning, and tuning resolves the system's current performance bottlenecks to give users a better, faster experience. Because analysis and tuning depend on the specific problem, this article does not go into them in depth and covers only the monitoring and analysis of common key metrics. In practice, work from both resource metrics and system metrics, checking layer by layer and troubleshooting step by step, so that performance problems have nowhere to hide. Once the cause of a problem is found, the problem can be solved.
