Third, how to analyze the key monitoring indicators? Once the key performance indicators described in the second part have been collected, how do you analyze them and decide whether a performance bottleneck exists? The following covers two aspects: resource indicators and system indicators.
· Resource Indicator Analysis
How to determine whether the CPU is a bottleneck: A CPU running at full capacity is not, by itself, proof of a CPU bottleneck. Linux, for example, always tries to keep the CPU as busy as possible so that task throughput is maximized. A CPU bottleneck is generally judged from two aspects:
- CPU idle time stays at 0
- The run queue is larger than the number of CPU cores (as a rule of thumb, 3-4 times the core count)
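The two criteria above can be sketched as a small check over sampled metrics. This is a minimal illustration, not a definitive rule: the function name, the parameters, and the choice to require both conditions across the whole sampling window are assumptions; only the "idle is 0" and "run queue above 3-4 times the core count" thresholds come from the text.

```python
def cpu_is_bottleneck(idle_percent_samples, run_queue_samples, cpu_cores,
                      queue_factor=3.0, idle_threshold=0.0):
    """Return True when both CPU bottleneck signals hold for every sample.

    idle_percent_samples: CPU idle percentages collected over the window.
    run_queue_samples:    run queue lengths collected over the same window.
    queue_factor:         rule-of-thumb multiplier (the text suggests 3-4).
    idle_threshold:       idle percentage treated as "0" (assumed parameter).
    """
    if not idle_percent_samples or not run_queue_samples:
        raise ValueError("need at least one sample of each metric")
    if cpu_cores <= 0:
        raise ValueError("cpu_cores must be positive")

    # Signal 1: the CPU never has idle time during the window.
    idle_exhausted = all(idle <= idle_threshold for idle in idle_percent_samples)

    # Signal 2: the run queue stays above queue_factor times the core count.
    queue_limit = queue_factor * cpu_cores
    queue_saturated = all(length > queue_limit for length in run_queue_samples)

    return idle_exhausted and queue_saturated
```

On Linux, the idle percentage and run queue length could be taken from tools such as vmstat; a single busy sample is deliberately not enough to flag a bottleneck here.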
What are the main causes of high CPU consumption?
- The application itself may be poorly designed
- Hardware resources may be insufficient, etc.
Each problem needs its own analysis. For a problematic SQL statement, for example, you need to trace and optimize the SQL that is driving CPU usage too high.
How to determine whether memory is a bottleneck: Typically at least 10% of memory should remain available, and memory usage should be capped at 85%. When free memory becomes small, the system starts to use the disk paging file frequently. Too little free memory may be caused by insufficient memory or by a memory leak, and needs to be monitored and analyzed against the actual behavior of the system.

How to determine whether disk I/O is a bottleneck: Disk I/O is more likely to become a bottleneck on database servers, file servers, and streaming media servers. It is generally judged from the following aspects:

① Calculate the number of I/Os per disk. The number of I/Os per disk can be compared against the I/O capability of the disk. If the computed per-disk I/O exceeds the nominal I/O capability of the disk, there is a real disk performance bottleneck. The per-disk I/O is calculated as follows:

RAID type   Calculation method
RAID0       (reads + writes) / number of disks
RAID1       (reads + 2 * writes) / 2
RAID5       (reads + 4 * writes) / number of disks
RAID10      (reads + 2 * writes) / number of disks

② Monitor disk reads and writes. If the disk performs large-volume read and write operations for a long time and CPU wait exceeds 20%, disk I/O has a problem, and you should consider improving disk read and write performance.

How to determine whether network bandwidth is a bottleneck: The first test of whether network bandwidth is a system performance bottleneck is whether it affects the performance of transaction execution. For example, bandwidth is a bottleneck if reducing it makes indicators such as the number of concurrent users, response time, and transaction pass rate unacceptable, or if increasing it improves those indicators significantly.
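The per-disk I/O formulas above can be written out as a short calculation. This is a minimal sketch: the function names and the rated_iops parameter (the disk's nominal I/O capability, which you would take from the vendor specification) are assumptions introduced for illustration; the formulas themselves follow the RAID table in the text.

```python
def per_disk_iops(raid_type, reads, writes, disks):
    """Estimate I/O operations per second landing on each physical disk.

    reads / writes: logical read and write operations per second.
    disks:          number of disks in the array.
    """
    if disks <= 0:
        raise ValueError("disks must be positive")

    raid = raid_type.upper()
    if raid == "RAID0":
        return (reads + writes) / disks
    if raid == "RAID1":
        # The text's RAID1 formula divides by 2, not by the disk count.
        return (reads + 2 * writes) / 2
    if raid == "RAID5":
        return (reads + 4 * writes) / disks
    if raid == "RAID10":
        return (reads + 2 * writes) / disks
    raise ValueError(f"unsupported RAID type: {raid_type}")


def disk_is_bottleneck(raid_type, reads, writes, disks, rated_iops):
    """Return True when the per-disk load exceeds the disk's nominal capability."""
    return per_disk_iops(raid_type, reads, writes, disks) > rated_iops
```

For example, 600 reads/s and 100 writes/s on a 4-disk RAID5 array give (600 + 4 * 100) / 4 = 250 I/Os per disk, which would exceed a disk rated at 180 I/Os per second.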
In an actual performance test, if connection time-outs are reported constantly while manual access works normally, ping the application server IP or the gateway IP. Severe latency or packet loss means the network is unstable and needs to be checked.

The four resource indicators analyzed above are in fact interdependent and cannot be investigated in isolation. A performance problem in one area often causes problems in others. For example, heavy disk reads and writes inevitably consume CPU and I/O resources; insufficient memory causes memory pages to be written to disk and read back frequently, creating a disk I/O bottleneck; and a large volume of network traffic can also overload the CPU. Performance problems therefore need to be considered from all sides.
• System Indicator Analysis
Concurrent users: The number of users the system can support is an important sign of system capacity, and the number of concurrent users measures the system's parallel processing ability under high concurrency. In general, if the system has deadlocks or resource contention, its responses slow down over time under concurrent access because requests are waiting in a queue. Normally, concurrent user testing targets business functions with high throughput, heavy database I/O, or high business risk. To determine the maximum number of concurrent users the system can withstand, the following conditions usually have to be met:
1. The average response time of business operations is within a reasonable range
2. The business success rate is within a reasonable range
3. The system runs without faults (no abnormal downtime)
4. The system resource indicators are within a reasonable range

Average response time: For client users, the most direct experience is whether a page loads quickly or slowly, that is, the length of the response time. For example, if during a sustained concurrent performance test the customer perceives access to the application as very slow and the monitored average response time also grows steadily, first use the monitored resource indicators to rule out resource constraints, and then locate the problem in the application itself, for instance by using a page breakdown tool (such as HttpWatch, or the Page Component Breakdown in LoadRunner Analysis) to analyze the pages that respond slowly.

Transaction success rate and time-out errors: The higher the transaction success rate, the greater the system's processing capacity. Failed transactions are mainly caused by slow system responses that make access to a business function time out, or by a business function that is faulty and cannot be accessed normally. Analyze each case according to the transaction error message.
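The four conditions for the maximum number of concurrent users can be sketched as a filter over load test results. This is only an illustration under stated assumptions: the LoadStepResult type, its field names, and the default thresholds are all hypothetical (the text only says "within a reasonable range", apart from the 85% memory cap mentioned earlier), so they should be replaced with the limits agreed for the system under test.

```python
from dataclasses import dataclass


@dataclass
class LoadStepResult:
    """Hypothetical summary of one load test run at a fixed concurrency level."""
    concurrent_users: int
    avg_response_seconds: float
    success_rate: float        # fraction of successful transactions, 0.0 to 1.0
    had_outage: bool           # True if the system crashed or went down
    peak_cpu_percent: float
    peak_memory_percent: float


def max_supported_users(results,
                        max_avg_response_seconds=3.0,   # assumed limit
                        min_success_rate=0.99,          # assumed limit
                        max_cpu_percent=85.0,           # assumed limit
                        max_memory_percent=85.0):       # cap mentioned in the text
    """Return the highest concurrency level that meets all four conditions.

    Levels are checked in ascending order and the search stops at the first
    level that violates a condition. Returns 0 if no level passes.
    """
    best = 0
    for step in sorted(results, key=lambda r: r.concurrent_users):
        acceptable = (
            step.avg_response_seconds <= max_avg_response_seconds   # condition 1
            and step.success_rate >= min_success_rate               # condition 2
            and not step.had_outage                                 # condition 3
            and step.peak_cpu_percent <= max_cpu_percent            # condition 4
            and step.peak_memory_percent <= max_memory_percent
        )
        if not acceptable:
            break
        best = step.concurrent_users
    return best
```

Stopping at the first failing level is a design choice: a higher level that happens to pass after a lower one failed is treated as unreliable rather than as the system's capacity.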
In summary, software performance testing is an ongoing cycle of execute and monitor -> analyze -> tune. Monitoring provides reference data for analysis, analysis serves tuning, and tuning resolves the system's current performance bottlenecks so that users get a better, faster experience. Because analysis and tuning depend on the specific problem at hand, this article does not go into them in detail and only covers monitoring and analysis of the common key indicators. In practice, it is recommended to work from both resource indicators and system indicators, checking layer by layer and troubleshooting step by step, so that performance problems have nowhere to hide. Once the cause is found, the performance problem can be solved.
Original: http://www.51testing.com/html/18/n-3549018-2.html
[Go] How to analyze key metrics for monitoring