Performance Test Result Analysis

Last Update:2014-09-14 Source: Internet

Author: User



Performance testing engineers can generally master the use of test tools for load and stress testing, but many do not know where to start when analyzing the results those tools collect. I hope the following helps with that analysis work. Analysis principles:

1. Analyze each problem specifically, because application systems, testing purposes, and performance concerns all differ.

2. Look for the bottleneck in the following order, from easy to difficult:

Server hardware bottleneck -> network bottleneck (can be ignored on a LAN) -> server operating system bottleneck (parameter configuration) -> middleware bottleneck (parameter configuration, database, web server) -> application bottleneck (SQL statements, database design, business logic, algorithms, etc.)
Note: Not every analysis needs to go through the whole sequence. Decide the depth of analysis from the purpose and requirements of the test. For tests with modest requirements, it is enough to identify where the system's hardware bottleneck will be under the heavy load (number of concurrent users and data volume) that the application is expected to face in the future.

3. Segmented elimination is very effective: examine the system segment by segment and rule out each part in turn.
Sources of analysis information:
1. Error messages during scenario runs
2. Metric data collected from the test results

1. Error message analysis
Analysis examples:
1. Error: Failed to connect to server "10.10.10.30:8080": [10060] Connection timed out
Error: Server "10.10.10.30" has shut down the connection prematurely
Analysis:
A. The application service has died.
(With a small number of users: a program problem, such as how the program handles the database.)
B. The application service has not died.
(Check the application service parameter settings.)
For example, if many client connections to a WebLogic application server are refused and no error appears on the server, the AcceptBacklog attribute of the Server element in WebLogic may be set too low. If connection refused messages are received when connecting, increase the value by 25% each time.
C. Database connections
(1. The relevant performance parameter in the application service may be too small. 2. The maximum number of connections allowed by the database, which is related to hardware memory.)

2. Error: Page download timeout (120 seconds) has expired
Possible causes:
A. A server bottleneck caused by application service parameters that are set too high
B. Too many images on the page
C. The program queries too many fields when it processes a table
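
The "increase by 25% each time" rule for the accept backlog in error 1 above is plain arithmetic. The sketch below only illustrates the progression; the starting value of 50 and the helper name `backlog_steps` are assumptions for the example, not values taken from WebLogic or from this article.

```python
import math


def backlog_steps(start, rounds):
    """Return successive backlog values, each 25% larger than the last.

    Values are rounded up because the setting is an integer.
    """
    values = [start]
    for _ in range(rounds):
        values.append(math.ceil(values[-1] * 1.25))
    return values


# Hypothetical starting value of 50, raised three times.
print(backlog_steps(50, 3))
```

After each increase, rerun the scenario and stop raising the value once the connection refused errors disappear.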

2. Monitoring metric data analysis
1. Maximum number of concurrent users:
The maximum number of concurrent users that the application system can withstand in the current environment (hardware, network, and software environment, including parameter configuration).
If the business operations of more than three users fail during the scenario run, or a server shuts down, the system cannot withstand the load of the current number of concurrent users in the current environment. The maximum number of concurrent users is then the previous number of concurrent users at which this did not occur.
If the maximum number of concurrent users meets the performance requirement, the resources on each server are in good condition, and business operation response times also meet user requirements, the result is acceptable. Otherwise, analyze the cause further based on each server's resource usage and the business operation response times.
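
The rule above can be sketched in a few lines of Python. This is a minimal illustration, assuming the results of each load step have already been exported from the test tool; the `LoadStep` record and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class LoadStep:
    users: int             # concurrent users in this step
    failed_users: int      # users whose business operations failed
    server_shutdown: bool  # whether a server shut down during the step


def max_concurrent_users(steps, failure_limit=3):
    """Return the user count of the last step before the first overloaded one.

    A step counts as overloaded when more than `failure_limit` users failed
    or a server shut down. Returns None if even the smallest step fails.
    """
    best = None
    for step in sorted(steps, key=lambda s: s.users):
        if step.failed_users > failure_limit or step.server_shutdown:
            break
        best = step.users
    return best


steps = [
    LoadStep(100, 0, False),
    LoadStep(200, 2, False),
    LoadStep(300, 5, False),
    LoadStep(400, 0, True),
]
print(max_concurrent_users(steps))
```

The loop stops at the first overloaded step on purpose: a later step that happens to pass does not raise the reported maximum.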

2. Business operation response time:
Analysis of a scenario run should begin with the Average Transaction Response Time graph and the Transaction Performance Summary graph. The Transaction Performance Summary graph shows which transactions had long response times during the scenario run.
Break down the transactions and analyze the performance of each page component. Which page components make the transaction response time too long? Is the problem related to the network or to the server?
If the server time is too long, use the corresponding server graphs to identify the problematic server measurements and find the cause of the server performance degradation. If the network time is too long, use the Network Monitor graphs to identify the network problems causing the bottleneck.

3. Server resource monitoring metrics:

Memory:
1. In UNIX resource monitoring, watch the paging rate. If this value is occasionally high, threads were competing for memory at that time. If it stays high, memory may be the bottleneck, or the memory access hit rate may be low.
2. In Windows resource monitoring, if the Process\Private Bytes and Process\Working Set counters keep increasing over a long period while the Memory\Available Bytes counter keeps decreasing, there may be a memory leak.
Symptoms that memory resources have become a system performance bottleneck:
High pageout rate
Processes going inactive
High disk activity in the swap area
High global system CPU utilization
Out of memory errors
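
The Windows memory leak check above compares the direction of three counters over time. The following sketch assumes the counter samples have been exported as equally spaced lists of numbers; the function names and the 90% tolerance are assumptions chosen for illustration, not part of any monitoring tool.

```python
def moves_steadily(samples, direction, tolerance=0.9):
    """True if at least `tolerance` of consecutive steps go in `direction`.

    `direction` is +1 for rising and -1 for falling.
    """
    deltas = [b - a for a, b in zip(samples, samples[1:])]
    if not deltas:
        return False
    matching = sum(1 for d in deltas if d * direction > 0)
    return matching / len(deltas) >= tolerance


def leak_suspected(private_bytes, working_set, available_bytes):
    """Apply the rule: both process counters rise while available memory falls."""
    return (moves_steadily(private_bytes, +1)
            and moves_steadily(working_set, +1)
            and moves_steadily(available_bytes, -1))


print(leak_suspected([100, 110, 120, 135], [80, 85, 92, 99], [900, 880, 850, 820]))
```

A positive result is only a hint: a process that is still warming up its caches shows the same pattern, so the test has to run long enough for the counters to level off.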

Processor:
1. In UNIX resource monitoring (and likewise on Windows), watch CPU utilization. If the value stays above 95%, the CPU is the bottleneck; consider adding processors or switching to a faster one. If the server is dedicated to SQL Server, the maximum acceptable limit is 80-85%.
A reasonable operating range is 60% to 70%.
2. In Windows resource monitoring, if System\Processor Queue Length is greater than 2 while processor utilization (% Processor Time) stays low, the processor is congested.
Symptoms that CPU resources have become a system performance bottleneck:
Slow response time
Zero percent idle CPU
High percent user CPU
High percent system CPU
Large run queue size sustained over time
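
The two processor rules above can be expressed as a small check over sampled data. This is a sketch under stated assumptions: the article does not say how low "low" utilization is, so the 60% cutoff (the lower edge of the reasonable range) is an assumption, as is the use of 85% for a dedicated SQL Server machine.

```python
def classify_cpu(utilization_pct, queue_lengths, dedicated_sql_server=False,
                 low_utilization_pct=60):
    """Apply the CPU rules of thumb to samples from one monitoring run."""
    findings = []
    limit = 85 if dedicated_sql_server else 95
    # Rule 1: utilization stays above the limit for the whole run.
    if utilization_pct and min(utilization_pct) > limit:
        findings.append("cpu bottleneck")
    # Rule 2: queue length stays above 2 while utilization stays low.
    if (queue_lengths and min(queue_lengths) > 2
            and max(utilization_pct, default=100) < low_utilization_pct):
        findings.append("processor congestion")
    return findings


print(classify_cpu([96, 97, 99], [1, 1, 2]))
print(classify_cpu([30, 35, 40], [3, 4, 5]))
```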

Disk I/O:
1. In UNIX resource monitoring (and likewise on Windows), watch the disk rate. If this value stays high, I/O is the problem; consider switching to a faster disk system.
2. In Windows resource monitoring, if % Disk Time and Avg. Disk Queue Length are very high while Page Reads/sec is very low, there may be a disk bottleneck.
Symptoms that I/O resources have become a system performance bottleneck:
High disk utilization
Large disk queue length
Large percentage of time waiting for disk I/O
Large physical I/O rate (not sufficient in itself)
Low buffer cache hit ratio (not sufficient in itself)
Large run queue with idle CPU

4. Database server:
SQL Server database:
1. In SQL Server resource monitoring, watch the cache hit ratio. The higher the value, the better. If it stays below 80%, consider adding memory.
2. If the Full Scans/sec counter shows a value higher than 1 or 2, analyze your queries to determine whether the full table scans are really needed and whether the SQL can be optimized.
3. Number of Deadlocks/sec: deadlocks harm application scalability and lead to a poor user experience. This counter must stay at 0.
4. Lock Requests/sec: optimizing queries to reduce the number of reads lowers the value of this counter.
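
The first three SQL Server rules have explicit thresholds, so they are easy to check mechanically. The sketch below assumes the counter samples were exported as lists of numbers; the function name and warning strings are hypothetical, and the full scan limit of 2 takes the upper end of the "1 or 2" guideline.

```python
def review_sql_server(cache_hit_pct, full_scans_per_sec, deadlocks_per_sec,
                      full_scan_limit=2):
    """Return a list of warnings based on the rule-of-thumb thresholds."""
    warnings = []
    # Sustained low cache hit ratio: every sample is below 80%.
    if cache_hit_pct and max(cache_hit_pct) < 80:
        warnings.append("cache hit ratio stays below 80%")
    if any(v > full_scan_limit for v in full_scans_per_sec):
        warnings.append("full scans/sec above limit")
    # Any deadlock at all breaks the "must be 0" rule.
    if any(v > 0 for v in deadlocks_per_sec):
        warnings.append("deadlocks detected")
    return warnings


print(review_sql_server([72, 75, 78], [0, 3, 1], [0, 0, 0]))
```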

Oracle database:
1. If free memory is close to 0 and the library cache or data dictionary cache hit ratio is below 0.90, increase shared_pool_size.
Hit ratios of the library cache (shared SQL area) and the data dictionary cache:
select sum(pins - reloads) / sum(pins) from v$librarycache;
select sum(gets - getmisses) / sum(gets) from v$rowcache;
Free memory: select * from v$sgastat where name = 'free memory';

2. If the buffer cache hit ratio is below 0.90, increase the db_block_buffers parameter (unit: blocks).
Buffer cache hit ratio:
select name, value from v$sysstat where name in ('db block gets',
'consistent gets', 'physical reads');
Hit ratio = 1 - (physical reads / (db block gets + consistent gets))

3. If the number of redo log space requests is large, increase the log_buffer parameter.
Log buffer requests:
select name, value from v$sysstat where name = 'redo log space requests';

4. If the in-memory sort hit ratio is below 0.95, increase sort_area_size to avoid disk sorts.
In-memory sort hit ratio (the query returns a percentage, so 0.95 corresponds to 95):
select round((100 * b.value) / decode((a.value + b.value), 0, 1, (a.value + b.value)), 2) from v$sysstat a, v$sysstat b where a.name = 'sorts (disk)' and b.name = 'sorts (memory)';
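
The arithmetic behind the Oracle ratios above can be checked by hand once the raw statistics have been read from the views. The sketch below does not connect to a database; it only reproduces the formulas on numbers passed in. The function names, the sample figures, and the `shared_pool_free_is_low` flag (standing in for "free memory is close to 0", which the article does not quantify) are assumptions.

```python
def library_cache_hit_ratio(pins, reloads):
    """sum(pins - reloads) / sum(pins), as in the v$librarycache query."""
    return (pins - reloads) / pins


def buffer_cache_hit_ratio(db_block_gets, consistent_gets, physical_reads):
    """1 - physical reads / (db block gets + consistent gets)."""
    return 1 - physical_reads / (db_block_gets + consistent_gets)


def memory_sort_hit_pct(sorts_disk, sorts_memory):
    """Percentage of sorts done in memory; a zero total divides by 1, like decode()."""
    total = sorts_disk + sorts_memory
    return round(100 * sorts_memory / (total if total else 1), 2)


def tuning_hints(lib_hit, shared_pool_free_is_low, buf_hit, sort_pct):
    """Name the parameters the rules of thumb suggest increasing."""
    hints = []
    if shared_pool_free_is_low and lib_hit < 0.90:
        hints.append("shared_pool_size")
    if buf_hit < 0.90:
        hints.append("db_block_buffers")
    if sort_pct < 95:
        hints.append("sort_area_size")
    return hints


# Hypothetical figures, not measurements from a real system.
buf = buffer_cache_hit_ratio(20000, 180000, 30000)
print(round(buf, 4), memory_sort_hit_pct(20, 980))
print(tuning_hints(0.85, True, buf, memory_sort_hit_pct(20, 980)))
```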
Note: The SQL Server and Oracle analysis above is only simple, basic analysis. Oracle analysis and tuning in particular is a specialized skill; consult the relevant material for further analysis.

Analyzing the results is the most important part of performance testing. In real work, the analysis of test results is complex and requires a lot of specialized knowledge, so I often feel that I don't know where to start with the data. This is also where I felt awkward and stuck while learning performance testing. So, after studying a book on web performance testing practice, I made the following notes.

Note: This covers only part of the web application performance analysis in Chapter 1 of the book.

I hope to discuss it with you:

I. Basic knowledge of performance analysis:

1. Several important performance indicators: response time, throughput, throughput rate, TPS (transactions processed per second), and hit rate.

2. System bottlenecks fall into two categories: network and server. Server bottlenecks mainly involve the application server, web server, database server, and operating system.

3. A conventional, rough performance analysis method:

As the load on the system increases (or the number of concurrent users increases), if the throughput rate and TPS curves follow roughly the same shape, the system is basically stable. If the throughput rate curve rises to a certain level as the load increases and then changes slowly or even flattens, a network bandwidth bottleneck is likely. Similarly, if the hit rate/TPS curve changes slowly or flattens, the server is starting to become a bottleneck.

4. I agree with the following basic principle of performance analysis:

-- From the outside in, from the surface to the core, layer by layer.

The analysis can be divided into the following three steps:

Step 1: Compare the response time with the performance users expect to determine whether a bottleneck exists;

Step 2: Compare TN (network response time) with TS (server response time) to determine whether the bottleneck is on the network or on the server;

Step 3: Analyze the response time of progressively finer components until the root cause of the performance bottleneck is identified.
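
Two of the ideas above lend themselves to a short sketch: spotting where a throughput or TPS curve flattens as load grows (item 3), and the first two analysis steps (item 4). The function names and the 5% minimum gain used to call a curve "flat" are assumptions for illustration; the article gives no numeric definition of flattening.

```python
def first_plateau(loads, values, min_gain=0.05):
    """Return the load level after which the curve stops growing.

    Growth between two consecutive load levels counts as flat when the
    relative gain is below `min_gain`. Returns None if the curve keeps rising.
    """
    points = list(zip(loads, values))
    for (load0, v0), (_, v1) in zip(points, points[1:]):
        if v0 > 0 and (v1 - v0) / v0 < min_gain:
            return load0
    return None


def locate_bottleneck(response_time, expected_time, network_time, server_time):
    """Step 1: is there a bottleneck? Step 2: network (TN) or server (TS)?"""
    if response_time <= expected_time:
        return "no bottleneck"
    return "network" if network_time > server_time else "server"


# Hypothetical TPS readings at 50, 100, 150, and 200 concurrent users.
print(first_plateau([50, 100, 150, 200], [100, 195, 200, 201]))
print(locate_bottleneck(5.0, 3.0, 1.0, 4.0))
```

Step 3 is not automated here: once the side is known, the breakdown continues by hand into page components and server measurements.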

II. Specific analysis methods, using web applications as an example:

1. User transaction analysis:

A. Transaction Summary graph: displays the numbers of transactions that passed and failed in the scenario. From the pass and fail data you can directly judge whether the system is running normally. A large number of failed transactions indicates that the system has a bottleneck or that the program has a problem during execution.

B. Average Transaction Response Time graph:

Displays the average time taken to execute transactions in each second of the test scenario run, together with the maximum, minimum, and average values for the transactions in the scenario. Use it to analyze the system's performance trend. If the response times of all transactions stay basically flat, system performance is basically stable; if the average transaction response time gradually gets slower, performance is trending downward, and a memory leak is one possible cause.

C. Transactions per Second (TPS) graph:

Displays the number of transactions that passed, failed, and stopped in each second. Use it to determine the actual transaction load on the system at any given point in time. If the number of transactions the application passes per unit of time keeps decreasing as the test progresses, the server has a bottleneck.

D. Total Transactions per Second graph:

Displays the total number of transactions that passed, failed, and stopped in each second. If the curve is close to a straight line under constant load, performance is basically stable. If the total number of transactions passed per unit of time keeps decreasing under the same load, overall performance is declining; the cause may be a memory leak or a defect in the program.

E. Transaction Performance Summary graph:

Displays the minimum, maximum, and average execution time of the transactions. Use it to judge directly whether response times meet the customer's requirements, focusing on the average and maximum execution times.

F. Transaction Response Time Under Load graph:

Shows the relationship between transaction response time and the number of users at any point in time, which gives you performance data on how the system handles concurrent users.

G. Transaction Response Time (Percentile) graph:

A comprehensive graph derived from the test results, meant to be read as a whole. Even if the maximum response time of a transaction is long, system performance still meets requirements as long as most transactions have acceptable response times.

H. Transaction Response Time (Distribution) graph:

Shows how many transactions completed at each response time during the test. If minimum and maximum acceptable transaction response times have been defined for the system in advance, use this graph to judge whether system performance is within the acceptable range.
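
The percentile and distribution views in items G and H can be reproduced from raw response times. The sketch below uses the nearest-rank percentile definition, which is one common choice and not necessarily the one the test tool uses; the sample times and the acceptable range of 0.5 to 2.0 seconds are invented for the example.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample with pct% of samples at or below it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def share_within(samples, low, high):
    """Fraction of samples inside the acceptable range [low, high]."""
    inside = sum(1 for s in samples if low <= s <= high)
    return inside / len(samples)


# Hypothetical response times in seconds, with one slow outlier.
times = [0.8, 1.1, 0.9, 1.0, 1.2, 0.7, 1.3, 0.9, 1.0, 9.5]
print(percentile(times, 90), max(times), share_within(times, 0.5, 2.0))
```

Here the maximum is dominated by a single outlier while the 90th percentile stays low, which is the situation item G describes.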

The analysis in this step can only indicate where the bottleneck may be; locating it requires further investigation.

Without screenshots this may be a little hard to follow. If you are familiar with these graphs, it should be relatively simple.
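
When the graphs are not at hand, the downward trend described for the TPS graphs in items C and D can be estimated from exported samples. The sketch below fits a least-squares line to the samples and looks at its slope; this is a swapped-in simplification rather than anything the test tool does, and the function names and `tolerance` parameter are assumptions.

```python
def slope(values):
    """Least-squares slope of the values against their sample index."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    numerator = sum((i - mean_x) * (v - mean_y) for i, v in enumerate(values))
    denominator = sum((i - mean_x) ** 2 for i in range(n))
    return numerator / denominator


def tps_is_declining(tps_samples, tolerance=0.0):
    """True if TPS falls by more than `tolerance` per sample on average."""
    return slope(tps_samples) < -tolerance


# Hypothetical TPS samples taken at constant load.
print(tps_is_declining([50, 48, 45, 41, 38]))
```

The same helper applied to average response times, with the sign reversed, gives a rough check for the slowdown described in item B.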


This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
