Discussion on monitoring and analyzing key indicators in software performance testing
First, which key indicators does a software performance test need to monitor?
Software performance testing has three main purposes:
Ø Evaluate the current performance of the system to determine whether it meets the expected performance requirements.
Ø Identify performance problems that may exist in the software system, locate the performance bottlenecks, and solve them.
Ø Assess the system's capacity before deployment: predict how much load pressure it can withstand and evaluate its performance before it goes live.
For the current system, users care most about:
Ø Whether it meets the performance requirements for going live.
Ø What the system's maximum load is.
Ø How stable the system is.
To serve these testing purposes and answer the user's concerns, you must first decide which key indicators to collect and monitor during performance testing. Performance test monitoring indicators typically fall into two groups, resource indicators and system indicators, as shown in the following figure. Resource indicators relate directly to the consumption of hardware resources, while system indicators relate directly to user scenarios and requirements.
Description of the key indicators monitored in performance testing:
Ø Resource Indicators
CPU Usage: the percentage of CPU time consumed by user processes and system processes. Over a sustained period, the generally acceptable upper limit is 85%.
Memory Utilization: memory utilization = (1 - free memory / total memory) * 100%. At least 10% of memory should generally remain available, and the acceptable upper limit for memory usage is 85%.
Disk I/O: disks are mainly used to store and retrieve data, so there are two kinds of I/O operations: writing data is write I/O, and reading data is read I/O. The counter % Disk Time (the percentage of time the disk spends on read and write operations) is generally used to measure disk read/write performance.
Network bandwidth: generally measured with the counter Bytes Total/sec, which represents the rate at which bytes are sent and received, including framing characters. To determine whether the network connection speed is a bottleneck, compare the value of this counter with the bandwidth of the current network.
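The memory utilization formula and the bandwidth comparison described above can be sketched in a few lines of Python. This is only an illustration: the function names and the sample values are hypothetical, and the conversion assumes the counter is in bytes per second while the link speed is quoted in bits per second.

```python
def memory_utilization(free_bytes: float, total_bytes: float) -> float:
    """Memory utilization in percent: (1 - free / total) * 100."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return (1 - free_bytes / total_bytes) * 100


def bandwidth_utilization(bytes_total_per_sec: float, link_bits_per_sec: float) -> float:
    """Share of the link capacity in use, in percent.

    Assumption: the counter is in bytes/sec and the link speed in bits/sec,
    so the byte rate is multiplied by 8 before the comparison.
    """
    if link_bits_per_sec <= 0:
        raise ValueError("link_bits_per_sec must be positive")
    return bytes_total_per_sec * 8 / link_bits_per_sec * 100


# Hypothetical sample values, for illustration only.
mem = memory_utilization(free_bytes=2 * 1024**3, total_bytes=16 * 1024**3)
net = bandwidth_utilization(bytes_total_per_sec=6_250_000,
                            link_bits_per_sec=100_000_000)

print(f"memory utilization: {mem:.1f}% (limit from the text: 85%)")
print(f"bandwidth utilization: {net:.1f}%")
```

With these sample numbers the memory figure is above the 85% limit quoted in the text, so it would be flagged for a closer look.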
Ø System Indicators
Concurrent Users: the number of users submitting requests to the system at the same physical moment.
Online Users: the number of users accessing the system during a period of time. These users do not necessarily submit requests to the system at the same moment.
Average response time: the average time the system takes to process transactions. The response time of a transaction is the time from the client submitting a request to the client receiving the server's response. For pages that are expected to respond quickly, the response time is generally about 3 seconds.
Transaction success rate: in performance testing, a transaction is defined to measure the performance of one or more business processes. For example, user login, saving an order, and submitting an order can each be defined as a transaction, as shown in the following illustration:
The number of defined transactions that the system can successfully complete per unit of time reflects, to some extent, the system's processing capacity. It is generally measured by the transaction success rate; the formula is as follows:
Timeout Error Rate: the ratio of transactions that fail because of a timeout or another internal system error to the total number of transactions.
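As a rough sketch of how these system indicators could be computed from raw results, the Python below assumes that the transaction success rate is successful transactions divided by total transactions, times 100 (the original formula is not reproduced in the text, so this is an assumption), and that the timeout error rate is the failed share of the total. The record type, field names, and sample data are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TransactionResult:
    name: str               # e.g. "login", "save_order", "submit_order"
    response_time_s: float  # time from request sent to response received
    status: str             # "ok", "timeout", or "error"


def summarize(results):
    """Return average response time, success rate and timeout/error rate."""
    total = len(results)
    if total == 0:
        raise ValueError("no transactions recorded")
    passed = sum(1 for r in results if r.status == "ok")
    return {
        "avg_response_time_s": sum(r.response_time_s for r in results) / total,
        # Assumed formula: successful / total * 100.
        "success_rate_pct": passed / total * 100,
        # Failed because of a timeout or another error, over the total.
        "timeout_error_rate_pct": (total - passed) / total * 100,
    }


# Hypothetical sample data, for illustration only.
sample = [
    TransactionResult("login", 1.5, "ok"),
    TransactionResult("save_order", 2.5, "ok"),
    TransactionResult("submit_order", 30.0, "timeout"),
    TransactionResult("login", 2.0, "ok"),
]
summary = summarize(sample)
print(summary)
```

Note that a single timed-out transaction dominates the average here, which is one reason to read the average response time together with the success rate.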
Second, how to monitor the key indicators.
Ø Resource Indicator monitoring
This mainly means monitoring resource usage on the server platform (Windows, Linux, UNIX, etc.).
You can use the operating system's built-in performance monitoring tools or third-party tools. One example is Performance Monitor, which ships with Windows, as shown in the following illustration:
On Linux, commands such as free, vmstat, sar, and iostat can be used to monitor memory, CPU, disk I/O, and other resource usage, as shown in the following figure:
Third-party monitoring tools include Spotlight, a visual tool developed by Quest that monitors a variety of system platforms and databases, as shown in the following illustration:
nmon is a free tool provided by IBM for monitoring AIX and Linux system resources. The collected resource data can be turned into intuitive charts in Excel, as shown in the following illustration:
Ø System Indicator monitoring
System indicators are generally monitored graphically through performance testing tools (such as LoadRunner or JMeter), as in the following figure, which shows the number of concurrent users and the average response time.
Third, how to analyze the monitored key indicators.
Once the key performance indicators have been collected through the monitoring described in the second part, how do you analyze them and determine whether there is a performance bottleneck? The following discusses this from two angles: resource indicators and system indicators.
Ø Resource Indicator analysis
How to determine whether the CPU is a bottleneck: a fully loaded CPU does not by itself prove a CPU bottleneck. Linux, for example, always tries to keep the CPU as busy as possible in order to maximize task throughput. The CPU is therefore generally judged to be a bottleneck on two grounds: first, CPU idle time stays at 0; second, the run queue is longer than the number of CPU cores (a rule of thumb is 3 to 4 times the core count). High CPU consumption may come from an inefficient application or from insufficient hardware resources, so each case needs its own analysis. If SQL statements are the problem, for example, you need to trace and optimize the statements that use too much CPU.
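The two CPU rules of thumb above could be checked against sampled data roughly as follows. This is a sketch under stated assumptions: the samples are taken to come from a tool such as vmstat, "idle stays at 0" is approximated as every sample being at or below 1% idle, and the run queue is averaged before it is compared with 3 times the core count. The function and parameter names are hypothetical, and the two signals are reported separately so that you can decide how to combine them.

```python
def cpu_bottleneck_signals(idle_pct_samples, run_queue_samples, cpu_cores,
                           idle_floor_pct=1.0, queue_factor=3.0):
    """Evaluate the two CPU rules of thumb on sampled monitoring data."""
    if not idle_pct_samples or not run_queue_samples:
        raise ValueError("need at least one sample of each kind")
    if cpu_cores <= 0:
        raise ValueError("cpu_cores must be positive")

    # Rule 1: idle time stays at (approximately) zero for the whole window.
    idle_exhausted = all(idle <= idle_floor_pct for idle in idle_pct_samples)

    # Rule 2: the average run queue exceeds queue_factor times the core count.
    avg_run_queue = sum(run_queue_samples) / len(run_queue_samples)
    run_queue_overloaded = avg_run_queue > queue_factor * cpu_cores

    return {
        "idle_exhausted": idle_exhausted,
        "run_queue_overloaded": run_queue_overloaded,
        "avg_run_queue": avg_run_queue,
    }


# Hypothetical samples from a 4-core server, for illustration only.
signals = cpu_bottleneck_signals(
    idle_pct_samples=[0, 0, 1, 0],
    run_queue_samples=[14, 18, 16, 15],
    cpu_cores=4,
)
print(signals)
```

Both signals are true for this sample, which by the rules of thumb above would point to a CPU bottleneck that still needs a root-cause analysis.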
How to determine whether memory is a bottleneck: at least 10% of memory should generally remain available, and the acceptable upper limit for memory usage is 85%. When free memory becomes low, the system starts paging to disk frequently. Very low free memory may indicate insufficient memory or a memory leak, and needs to be monitored and analyzed according to the actual situation of the system.
How to determine whether disk I/O is a bottleneck: disk I/O is more likely to become a bottleneck on database servers, file servers, and streaming media servers. It is generally analyzed from the following aspects:
① Calculate the I/O count per disk
The per-disk I/O count can be compared with the disk's I/O capability: if the calculated per-disk I/O count exceeds the disk's nominal I/O capability, the disk is a performance bottleneck. The per-disk I/O count is calculated as shown in the following table:
RAID type | Calculation method |
RAID0 | (reads + writes) / number of disks |
RAID1 | (reads + 2 * writes) / 2 |
RAID5 | [reads + (4 * writes)] / number of disks |
RAID10 | [reads + (2 * writes)] / number of disks |
② Monitor disk reads and writes: if the disk spends long periods on large read and write operations and the CPU's I/O wait exceeds 20%, there is a disk I/O problem, and you should consider improving disk read/write performance.
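The per-disk I/O formulas in the table and the 20% I/O wait rule can be put together in a small Python sketch. The function names, the sample workload, and the nominal per-disk capability of 180 I/O operations per second are hypothetical values chosen for illustration.

```python
def per_disk_io(raid_type: str, reads: float, writes: float, num_disks: int) -> float:
    """Per-disk I/O count for the RAID levels listed in the table."""
    if num_disks <= 0:
        raise ValueError("num_disks must be positive")
    raid = raid_type.upper()
    if raid == "RAID0":
        return (reads + writes) / num_disks
    if raid == "RAID1":
        return (reads + 2 * writes) / 2
    if raid == "RAID5":
        return (reads + 4 * writes) / num_disks
    if raid == "RAID10":
        return (reads + 2 * writes) / num_disks
    raise ValueError(f"unsupported RAID type: {raid_type}")


def disk_io_suspect(io_wait_pct: float, threshold_pct: float = 20.0) -> bool:
    """True when CPU I/O wait exceeds the 20% threshold from the text."""
    return io_wait_pct > threshold_pct


# Hypothetical workload: 600 reads/s and 200 writes/s on a 4-disk RAID5 array.
load = per_disk_io("RAID5", reads=600, writes=200, num_disks=4)
nominal_capability = 180  # assumed per-disk capability, for illustration only

print(f"per-disk I/O: {load:.0f}, nominal capability: {nominal_capability}")
print("disk bottleneck by formula:", load > nominal_capability)
print("disk bottleneck by I/O wait:", disk_io_suspect(io_wait_pct=27.5))
```

The same workload on RAID10 yields a lower per-disk figure because each write costs two disk operations instead of four.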
How to determine whether network bandwidth is a bottleneck: the first question is whether network bandwidth affects the performance of the system's transactions. For example, bandwidth is a bottleneck if reducing it makes the number of concurrent users, response time, transaction pass rate, and other performance indicators unacceptable, or if increasing it significantly improves those indicators.
In an actual performance test, if connection timeouts are reported constantly while manual access works normally, you can ping the application server IP or the gateway IP. Serious latency or packet loss means the network is unstable and needs to be checked.
The four resource indicators are in fact interdependent and cannot be investigated in isolation. A performance problem in one area often triggers problems in others: heavy disk reads and writes inevitably consume CPU and I/O resources; insufficient memory leads to frequent paging between memory and disk, which creates a disk I/O bottleneck; and heavy network traffic can also overload the CPU. You therefore need to consider all aspects when analyzing performance problems.
Ø System Indicator analysis
Concurrent Users: the number of users a system can support is an important indicator of its capacity. The concurrent user count measures the system's parallel processing capability under highly concurrent access. If the system has deadlocks or resource contention, requests queue up under concurrent access and the system's response slows down over time.
In general, concurrent user tests are run against business functions with high throughput, heavy database I/O, and high commercial risk.
The maximum number of concurrent users that the system can withstand is usually the largest number for which all of the following conditions hold:
1. The average response time of business operations is within a reasonable range.
2. The business success rate is within a reasonable range.
3. The system runs without faults (no abnormal downtime).
4. System resource indicators stay within a reasonable range.
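The four conditions above can be applied to the results of a stepped load test to estimate the maximum supported number of concurrent users. The Python below is a minimal sketch: the record type and function names are hypothetical, the 3-second and 85% limits are taken from earlier in this article, and the 99% success-rate limit is an assumed value, since the text only says "within a reasonable range".

```python
from dataclasses import dataclass


@dataclass
class LoadStepResult:
    concurrent_users: int
    avg_response_time_s: float
    success_rate_pct: float
    had_outage: bool
    cpu_pct: float
    memory_pct: float


def step_is_acceptable(step, max_response_s=3.0, min_success_pct=99.0,
                       max_resource_pct=85.0):
    """Check the four conditions for one load step."""
    return (
        step.avg_response_time_s <= max_response_s      # condition 1
        and step.success_rate_pct >= min_success_pct    # condition 2 (assumed limit)
        and not step.had_outage                         # condition 3
        and step.cpu_pct <= max_resource_pct            # condition 4
        and step.memory_pct <= max_resource_pct
    )


def max_supported_users(steps):
    """Largest user count reached before the first failing step."""
    supported = 0
    for step in sorted(steps, key=lambda s: s.concurrent_users):
        if not step_is_acceptable(step):
            break
        supported = step.concurrent_users
    return supported


# Hypothetical results of a stepped load test, for illustration only.
steps = [
    LoadStepResult(100, 1.2, 100.0, False, 35.0, 50.0),
    LoadStepResult(200, 2.4, 99.6, False, 62.0, 64.0),
    LoadStepResult(400, 4.2, 97.0, False, 91.0, 78.0),
]
print("maximum supported concurrent users:", max_supported_users(steps))
```

Stopping at the first failing step is a deliberate choice: a later step that happens to pass is not treated as evidence that the system can sustain that load.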
Average response time: for client users, the most direct experience is whether pages load quickly or slowly, that is, the length of the response time. For example, if during a concurrent performance test users perceive the application as very slow and the monitored average response time keeps growing, first use the resource indicators to rule out resource constraints, and then locate the problem in the application itself, for example by analyzing slow pages with page breakdown tools (such as HttpWatch or the page component breakdown in LoadRunner Analysis).
Transaction success rate and timeout error rate: the higher the transaction success rate, the greater the system's processing capacity. Transactions fail mainly because the system responds slowly, so access to a business function times out, or because the business function itself is faulty and cannot be accessed normally.
To sum up, software performance testing is a continuous cycle of execution -> monitoring -> analysis -> tuning: monitoring provides reference data for analysis, analysis serves tuning, and tuning resolves the system's current performance bottlenecks so that users get a better, faster experience. Because analysis and tuning depend on the specific problem, this article does not go into them in detail and covers only the general key indicators for monitoring and analysis. In practice, work from both resource indicators and system indicators, checking layer by layer and troubleshooting step by step, so that performance problems have nowhere to hide. Once the cause is found, the performance problem can be solved.
Note: some of the material in this article was collated from the Web and from books.