After spending many days on the forum, I have noticed that more and more performance testing engineers can use test tools to run load and stress tests, but most of them cannot analyze the results those tools collect. Below I have compiled my own work experience together with related material I gathered, in the hope that it helps you analyze your test results.
Analysis principles:
• Analyze each problem on its own terms (application systems, test objectives, and performance concerns all differ)
• Find bottlenecks in the following order, from easy to difficult.
Server hardware bottleneck -> network bottleneck (can usually be ignored on a LAN) -> server operating system bottleneck (parameter configuration) -> middleware bottleneck (parameter configuration, database, Web server, etc.) -> application bottleneck (SQL statements, database design, business logic, algorithms, etc.)
Note: Not every analysis needs to go through the full sequence above; decide how deep to go based on the test objectives and requirements. For some low-demand cases, it is enough to analyze the system's hardware bottlenecks under the load the application is expected to face in the future (number of concurrent users, data volume).
• Eliminating possible causes segment by segment is effective
Information sources for analysis:
• 1 Error messages reported while the scenario runs
• 2 Monitoring metrics collected during the test
A. Error Message Analysis
Analysis Examples:
1 Error: Failed to connect to server "10.10.10.30:8080": [10060] Connection timed out
Error: Server "10.10.10.30" has shut down the connection prematurely
Analysis:
A, the application service has died.
(If this happens with only a few users, the problem lies in the program itself, for example in how the program handles the database.)
B, the application service has not died.
(The application service parameters are set incorrectly.)
Example: If many client connections to the WebLogic application server are refused while the server side logs no errors, the AcceptBacklog attribute of the Server element in WebLogic may be set too low. If clients receive a "connection refused" message when connecting, increase the value by 25% at a time.
C, database connections
(1. The application service's performance parameters may be set too small; 2. The maximum number of connections the database allows at startup may be too low (this limit is related to the available hardware memory).)
2 Error: Page download timeout (seconds) has expired
Analysis: possible causes include:
A, application service parameters are set too large, causing a server bottleneck
B, too many images on the page
C, the program queries too many fields when processing a table
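The two examples above can be turned into a quick triage helper for exported error text. This is a minimal Python sketch, assuming you have the error lines from the run as plain strings; the pattern table and the `candidate_causes` function are hypothetical and cover only these two messages, so extend them for your own scripts.

```python
# Candidate causes for the connection errors in example 1 (hypothetical table).
CONNECTION_CAUSES = [
    "application service has died",
    "application service parameter setting (e.g. WebLogic AcceptBacklog too low)",
    "database connection limits",
]

# Map a lowercase fragment of the error text to its candidate causes.
CANDIDATE_CAUSES = {
    "failed to connect to server": CONNECTION_CAUSES,
    "shut down the connection prematurely": CONNECTION_CAUSES,
    "download timeout": [
        "application service parameters set too large",
        "too many images on the page",
        "too many fields queried when processing a table",
    ],
}


def candidate_causes(message: str) -> list[str]:
    """Return the candidate causes whose pattern appears in the error message."""
    text = message.lower()
    causes: list[str] = []
    for pattern, reasons in CANDIDATE_CAUSES.items():
        if pattern in text:
            for reason in reasons:
                if reason not in causes:  # keep first-seen order, drop duplicates
                    causes.append(reason)
    return causes


if __name__ == "__main__":
    print(candidate_causes("Error: Page download timeout (seconds) has expired"))
```

The output is only a list of places to look first; confirming the cause still requires the server-side checks described in this article.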
B. Monitoring Metrics Data Analysis
1. Maximum number of concurrent users:
The maximum number of concurrent users the application system can withstand in the current environment (hardware environment, network environment, and software environment including parameter configuration).
During the scenario run, if a business operation fails for more than 3 users, or a server shuts down, then in the current environment the system cannot withstand the current concurrent-user load; the maximum number of concurrent users is the previous load level, at which none of these failures occurred.
If the measured maximum number of concurrent users meets the performance requirements, server resources remain healthy, and business operation response times meet user requirements, the system passes. Otherwise, analyze the cause further based on each server's resource usage and the response times of the business operations.
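The rule for the maximum number of concurrent users can be written as a small function over stepped load results. This is a sketch, assuming you have recorded, for each load level, the user count, how many users failed a business operation, and whether any server shut down; the `LoadStep` type and `max_concurrent_users` function are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LoadStep:
    users: int          # concurrent users at this load level
    failed_users: int   # users whose business operation failed at this level
    server_down: bool   # True if any server shut down at this level


def max_concurrent_users(steps: list[LoadStep], failure_threshold: int = 3) -> Optional[int]:
    """Return the last user count before the first level that breaks the rule.

    A level breaks when more than `failure_threshold` users fail a business
    operation or a server shuts down. Returns None if the lowest level breaks.
    """
    best: Optional[int] = None
    for step in sorted(steps, key=lambda s: s.users):
        if step.failed_users > failure_threshold or step.server_down:
            break
        best = step.users
    return best


steps = [LoadStep(50, 0, False), LoadStep(100, 2, False), LoadStep(150, 5, False)]
print(max_concurrent_users(steps))  # 100: the 150-user level had 5 failures
```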
2. Business Operation Response Time:
• Start the analysis of a scenario run with the Average Transaction Response Time graph and the Transaction Performance Summary graph. The Transaction Performance Summary graph shows which transactions took too long to respond during scenario execution.
• Break transactions down and analyze the performance of each page component. Which page components are causing the long transaction response time? Is the problem related to the network or to the server?
• If the server is taking too long, use the corresponding server graphs to identify the problematic server metrics and pinpoint the cause of the performance degradation. If the network is taking too long, use the Network Monitor graphs to identify the network issues causing the bottleneck.
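Outside the analysis tool, the same "which transactions are too slow" question can be answered from raw response times. This is a minimal sketch, assuming you have exported per-transaction response times in seconds and have a response-time requirement `limit_s`; `summarize` and `TxSummary` are hypothetical, and the 90th percentile uses the nearest-rank method.

```python
import math
from dataclasses import dataclass
from statistics import mean


@dataclass
class TxSummary:
    name: str
    avg: float       # average response time in seconds
    p90: float       # 90th percentile (nearest rank) in seconds
    too_slow: bool   # average exceeds the requirement


def percentile(values: list[float], pct: int) -> float:
    """Nearest-rank percentile (pct in 1..100) of a non-empty list."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def summarize(samples: dict[str, list[float]], limit_s: float) -> list[TxSummary]:
    """Summarize per-transaction response times, slowest average first."""
    rows = [
        TxSummary(name, mean(times), percentile(times, 90), mean(times) > limit_s)
        for name, times in samples.items()
    ]
    return sorted(rows, key=lambda r: r.avg, reverse=True)


for row in summarize({"login": [1.0, 2.0, 3.0, 4.0], "search": [0.5, 0.7]}, limit_s=2.0):
    print(row)
```

Transactions flagged `too_slow` are the ones to break down into page components next.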
3. Server Resource monitoring metrics:
Memory:
1 In UNIX resource monitoring, if the memory paging rate (Paging rate) is occasionally high, threads were competing for memory at that moment. If it stays high, memory may be the bottleneck, or the memory hit ratio may be low.
2 In Windows resource monitoring, if the Process\Private Bytes and Process\Working Set counters keep rising over a long period while Memory\Available Bytes keeps falling, a memory leak is likely.
Symptoms of a memory bottleneck:
Very high page-out rate (pageout);
Processes entering the inactive state;
High activity on all disks in the swap area;
Possibly high overall system CPU utilization;
Out-of-memory errors (out of memory errors)
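The memory-leak rule compares the trends of three Windows counters. This sketch assumes you have sampled Process\Private Bytes, Process\Working Set, and Memory\Available Bytes at the same interval over a long run; it uses a least-squares slope as the trend, which is one reasonable choice rather than a standard, and `likely_memory_leak` is a hypothetical name.

```python
def slope(values: list[float]) -> float:
    """Least-squares slope of the samples against their index (units per sample)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    num = sum((i - mean_x) * (v - mean_y) for i, v in enumerate(values))
    den = sum((i - mean_x) ** 2 for i in range(n))
    return num / den


def likely_memory_leak(private_bytes: list[float],
                       working_set: list[float],
                       available_bytes: list[float]) -> bool:
    # Heuristic: both Process counters trend upward while
    # Memory Available Bytes trends downward over the same window.
    return (slope(private_bytes) > 0
            and slope(working_set) > 0
            and slope(available_bytes) < 0)
```

A positive slope over a short window can be normal warm-up, so apply this to samples taken over a long period, as the rule says.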
Processor:
1 In UNIX resource monitoring (similar on Windows), if CPU utilization stays above 95%, the CPU is the bottleneck. Consider adding a processor or replacing it with a faster one. If the server is dedicated to SQL Server, the maximum acceptable limit is 80-85%.
A reasonable utilization range is 60% to 70%.
2 In Windows resource monitoring, if System\Processor Queue Length is greater than 2 while processor utilization (% Processor Time) stays low, processor blocking is occurring.
Symptoms of a CPU bottleneck:
Very slow response times (slow response time)
CPU idle time of 0% (zero percent idle CPU)
Excessive user-mode CPU time (high percent user CPU)
Excessive system-mode CPU time (high percent system CPU)
A long run queue sustained over time (large run queue size sustained over time)
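The two processor rules can be checked against sampled counter values. This is a sketch under stated assumptions: "stays above" is interpreted as at least `sustained_share` (assumed 80%) of samples exceeding the limit, the dedicated SQL Server limit uses the upper end (85%) of the quoted range, and "stays low" is assumed to mean an average below the 60% lower bound of the reasonable range. All names are hypothetical.

```python
from statistics import mean


def share_above(samples: list[float], limit: float) -> float:
    """Fraction of samples strictly above the limit."""
    return sum(1 for s in samples if s > limit) / len(samples)


def cpu_findings(cpu_pct: list[float], queue_len: list[float], *,
                 dedicated_sql_server: bool = False,
                 sustained_share: float = 0.8,    # assumption: "stays" = 80% of samples
                 low_cpu_pct: float = 60.0) -> list[str]:  # assumption: "low" < 60%
    findings = []
    limit = 85.0 if dedicated_sql_server else 95.0
    if share_above(cpu_pct, limit) >= sustained_share:
        findings.append("CPU bottleneck")
    # Long processor queue while the CPU itself is mostly idle.
    if share_above(queue_len, 2) >= sustained_share and mean(cpu_pct) < low_cpu_pct:
        findings.append("processor blocking")
    return findings
```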
Disk I/O:
1 In UNIX resource monitoring (similar on Windows), if the disk swap rate stays high, there is an I/O problem. Consider switching to a faster disk system.
2 In Windows resource monitoring, if % Disk Time and Avg. Disk Queue Length are high while Page Reads/sec is low, there may be a disk bottleneck.
Symptoms of an I/O bottleneck:
High disk utilization (high disk utilization)
Long disk wait queue (large disk queue length)
High percentage of time spent waiting for disk I/O (large percentage of time waiting for disk I/O)
High physical I/O rate (large physical I/O rates; not sufficient in itself)
Low cache hit ratio (low cache hit ratio; not sufficient in itself)
Long run queue while the CPU is idle (large run queue with idle CPU)
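The Windows disk rule combines three counters. The text gives no numeric thresholds, so this sketch makes the caller supply them explicitly; averaging the samples is an assumption, and `possible_disk_bottleneck` is a hypothetical name.

```python
from statistics import mean


def possible_disk_bottleneck(disk_time_pct: list[float],
                             avg_disk_queue: list[float],
                             page_reads_per_sec: list[float], *,
                             disk_time_high: float,
                             queue_high: float,
                             page_reads_low: float) -> bool:
    # High % Disk Time and Avg. Disk Queue Length with low Page Reads/sec:
    # the disk is busy, but not because memory is paging to it, so the
    # disk subsystem itself is the more likely bottleneck.
    return (mean(disk_time_pct) > disk_time_high
            and mean(avg_disk_queue) > queue_high
            and mean(page_reads_per_sec) < page_reads_low)
```

If Page Reads/sec is high instead, the disk activity is more likely a consequence of memory pressure, which points back to the memory checks.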
4. Database server:
SQL Server database:
1 In SQL Server resource monitoring, the higher the Cache Hit Ratio, the better. If it stays below 80%, consider adding memory.
2 If the Full Scans/sec counter is consistently higher than 1 or 2, analyze your queries to determine whether full table scans are really required and whether the SQL can be optimized.
3 Number of Deadlocks/sec: Deadlocks severely harm application scalability and lead to a poor user experience. This counter must be 0.
4 Lock Requests/sec: reduce this counter by optimizing queries to reduce the number of reads.
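The first three SQL Server rules can be applied to sampled counter values. This sketch assumes "stays below 80%" means every sample in the window, and uses the upper value (2) of the "1 or 2" Full Scans/sec guideline as the default limit; `sql_server_findings` is a hypothetical name. Lock Requests/sec has no fixed threshold in the text, so it is left out.

```python
from statistics import mean


def sql_server_findings(cache_hit_pct: list[float],
                        full_scans_per_sec: list[float],
                        deadlocks_per_sec: list[float], *,
                        full_scan_limit: float = 2.0) -> list[str]:
    findings = []
    # Assumption: "stays below 80%" means every sample in the window.
    if all(v < 80 for v in cache_hit_pct):
        findings.append("cache hit ratio below 80%: consider adding memory")
    if mean(full_scans_per_sec) > full_scan_limit:
        findings.append("frequent full scans: review queries")
    # Any deadlock at all violates the "must be 0" rule.
    if any(v > 0 for v in deadlocks_per_sec):
        findings.append("deadlocks detected: must be 0")
    return findings
```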
Oracle Database:
1 If free memory is close to 0 and the library cache or data dictionary cache hit ratio is below 0.90, increase shared_pool_size.
Library cache (shared SQL area) and data dictionary cache hit ratios:
SELECT SUM(pins - reloads) / SUM(pins) FROM v$librarycache;
SELECT SUM(gets - getmisses) / SUM(gets) FROM v$rowcache;
Free memory: SELECT * FROM v$sgastat WHERE name = 'free memory';
2 If the data buffer cache hit ratio is below 0.90, increase the db_block_buffers parameter (unit: blocks).
Buffer Cache Hit Ratio:
SELECT name, value FROM v$sysstat WHERE name IN ('db block gets',
'consistent gets', 'physical reads');
Hit Ratio = 1 - (physical reads / (db block gets + consistent gets))
3 If the value of redo log space requests is large, increase the log_buffer parameter.
Log buffer space requests:
SELECT name, value FROM v$sysstat WHERE name = 'redo log space requests';
4 If the in-memory sort ratio is below 0.95 (the query below reports it as a percentage, so below 95), increase sort_area_size to avoid sorting on disk.
Memory Sort Hit Ratio:
SELECT ROUND((100 * b.value) / DECODE((a.value + b.value), 0, 1, (a.value + b.value)), 2) FROM v$sysstat a, v$sysstat b WHERE a.name = 'sorts (disk)' AND b.name = 'sorts (memory)';
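The Oracle rules reduce to arithmetic on the values the queries above return. This is a sketch, assuming you copy those values in by hand; whether free memory is "close to 0" is left as a judgment the caller passes in, the redo log rule is omitted because "large" has no number in the text, and all function names are hypothetical.

```python
def library_cache_hit_ratio(pins: int, reloads: int) -> float:
    # (SUM(pins) - SUM(reloads)) / SUM(pins) from v$librarycache
    return (pins - reloads) / pins


def dictionary_cache_hit_ratio(gets: int, getmisses: int) -> float:
    # (SUM(gets) - SUM(getmisses)) / SUM(gets) from v$rowcache
    return (gets - getmisses) / gets


def buffer_cache_hit_ratio(db_block_gets: int, consistent_gets: int,
                           physical_reads: int) -> float:
    # 1 - physical reads / (db block gets + consistent gets)
    return 1 - physical_reads / (db_block_gets + consistent_gets)


def memory_sort_pct(sorts_disk: int, sorts_memory: int) -> float:
    # Mirrors the v$sysstat query: DECODE turns a zero denominator into 1.
    total = sorts_disk + sorts_memory
    return round(100 * sorts_memory / (total if total != 0 else 1), 2)


def oracle_findings(*, free_memory_near_zero: bool, pins: int, reloads: int,
                    gets: int, getmisses: int, db_block_gets: int,
                    consistent_gets: int, physical_reads: int,
                    sorts_disk: int, sorts_memory: int) -> list[str]:
    findings = []
    if free_memory_near_zero and (library_cache_hit_ratio(pins, reloads) < 0.90
                                  or dictionary_cache_hit_ratio(gets, getmisses) < 0.90):
        findings.append("increase shared_pool_size")
    if buffer_cache_hit_ratio(db_block_gets, consistent_gets, physical_reads) < 0.90:
        findings.append("increase db_block_buffers")
    if memory_sort_pct(sorts_disk, sorts_memory) < 95:  # 0.95 expressed as a percentage
        findings.append("increase sort_area_size")
    return findings
```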
Note: The SQL Server and Oracle checks above are only a few simple, basic ones. Analyzing and tuning Oracle databases in particular is a specialized field; consult the relevant documentation for deeper analysis.
Remarks:
The above is only my personal experience plus some material I compiled; it does not represent an expert's view. If you have different opinions or more in-depth analysis, please speak up so that together we can advance performance testing work in our country.
[Repost] Performance Test (Concurrent Load Stress) Result Analysis: A Brief Guide