Author:
Xiaoyou Tianya Xia
Analysis principles:
• Analyze each problem on its own terms, because application systems, test purposes, and performance concerns all differ.
• Locate bottlenecks in the following order, from easy to difficult:
Server hardware bottleneck -> network bottleneck (can usually be ignored on a LAN) -> server operating system bottleneck (parameter configuration) -> middleware bottleneck (parameter configuration, database, web server, etc.) -> application bottleneck (SQL statements, database design, business logic, algorithms, etc.)
Note: Not every analysis needs to go through all of these steps. How deep to go depends on the purpose and requirements of the test. For tests with modest requirements, it is enough to find where the system's hardware bottleneck will appear under the heavy load (number of concurrent users and data volume) that the application is expected to face in the future.
• Eliminating candidates segment by segment is very effective.
Source of the analysis information:
• 1. Error messages reported while the scenario runs
• 2. Metric data collected from the test results
1. Error message analysis
Example analysis:
1. • Error: Failed to connect to server "10.10.10.30:8080": [10060] Connection timed out
• Error: Server "10.10.10.30" has shut down the connection prematurely
Analysis:
• A. The application service has died.
(If this happens with only a few users, the problem is in the program, typically in how the program handles the database.)
• B. The application service has not died.
(Check the application service parameter settings.)
For example, if many clients are refused connections to the WebLogic application server while no error appears on the server, the AcceptBacklog attribute of the Server element in the WebLogic configuration may be set too low. If connection-refused messages keep appearing, increase the value by 25% each time.
• C. Database connections
(1. The relevant performance parameter in the application service may be set too low. 2. The maximum number of connections allowed by the database, which depends on hardware memory, may be too low.)
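The 25% tuning step for the accept backlog can be sketched as follows. This is only an illustration: the starting value of 50 and the helper names `next_backlog` and `backlog_schedule` are assumptions made here, not part of WebLogic, and each value would still have to be applied in the server configuration and retested.

```python
import math

def next_backlog(current: int, step: float = 0.25) -> int:
    """Return the value after one increase of `step` (25% by default), rounded up."""
    return math.ceil(current * (1 + step))

def backlog_schedule(start: int, rounds: int) -> list:
    """List the values to try over successive test runs, starting from `start`."""
    values = [start]
    for _ in range(rounds):
        values.append(next_backlog(values[-1]))
    return values

# Hypothetical starting value; use whatever the server is currently set to.
print(backlog_schedule(50, 3))
```

Rerun the scenario after each increase and stop as soon as the connection-refused messages disappear.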
2. Error: Page download timeout (120 seconds) has expired
Possible causes:
• A. Application service parameters are set too high, causing a server bottleneck
• B. There are too many images on the page
• C. The program checks too many fields when it processes a table
2. Monitoring metric data analysis
1. Maximum number of concurrent users:
This is the maximum number of concurrent users that the application system can withstand in the current environment (hardware environment, network environment, and software environment, including parameter configuration).
During a scenario run, if business operations fail for more than three users, or a server goes down, the system cannot withstand the load of the current number of concurrent users in the current environment. The maximum number of concurrent users is then the previous level of concurrency, at which this did not happen.
If the maximum number of concurrent users meets the performance requirement, the resources on every server are healthy, and the business operation response time also meets user requirements, the result is acceptable. Otherwise, analyze the cause further based on each server's resources and the response time of the business operations.
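The rule for picking the maximum number of concurrent users can be sketched in a few lines. The `LoadStep` record, its field names, and the sample runs are hypothetical and introduced only for illustration. The threshold of three failed users comes from the text above.

```python
from dataclasses import dataclass

@dataclass
class LoadStep:
    users: int          # concurrent users in this scenario run
    failed_users: int   # users whose business operation failed
    server_down: bool   # whether any server went down during the run

def step_passed(step: LoadStep, max_failed: int = 3) -> bool:
    """A run fails if more than `max_failed` users failed or a server went down."""
    return step.failed_users <= max_failed and not step.server_down

def max_concurrent_users(steps):
    """Return the highest user count reached before the first failing run, or None."""
    best = None
    for step in sorted(steps, key=lambda s: s.users):
        if not step_passed(step):
            break
        best = step.users
    return best

# Hypothetical results from four runs at increasing load.
runs = [
    LoadStep(50, 0, False),
    LoadStep(100, 2, False),
    LoadStep(150, 5, False),
    LoadStep(200, 0, True),
]
print(max_concurrent_users(runs))
```

With these sample runs the first failure appears at 150 users, so the function reports 100, the last level that passed.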
2. Business operation response time:
• Start the analysis of a scenario run with the Average Transaction Response Time graph and the Transaction Performance Summary graph. The Transaction Performance Summary graph shows which transactions had long response times during the run.
• Break those transactions down and analyze the performance of each page component. Which page components make the transaction response time too long? Is the problem related to the network or to the server?
• If server time is too long, use the corresponding server graphs to identify the problematic server metrics and find the cause of the degradation. If network time is too long, use the Network Monitor graphs to identify the network problems behind the bottleneck.
3. Server resource monitoring metrics:
Memory:
1. In UNIX resource monitoring, watch the paging rate. If the value is high only occasionally, threads were competing for memory at that moment. If it stays high, memory may be the bottleneck; a low memory access hit rate can also be the cause.
2. In Windows resource monitoring, if the Process\Private Bytes and Process\Working Set counters keep increasing over a long period while the Memory\Available Bytes counter keeps decreasing, there may be a memory leak.
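The Windows memory-leak pattern can be checked mechanically once the counter samples have been exported. The sketch below assumes equally spaced samples and uses a hand-written least-squares slope. The function names and the sample values (in MB) are hypothetical, and a trend over a short window is only a hint, not proof of a leak.

```python
def slope(samples):
    """Least-squares slope of equally spaced samples (units per sampling interval)."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_x = (n - 1) / 2
    mean_y = sum(samples) / n
    num = sum((i - mean_x) * (y - mean_y) for i, y in enumerate(samples))
    den = sum((i - mean_x) ** 2 for i in range(n))
    return num / den

def possible_leak(private_bytes, working_set, available_bytes):
    """Flag the pattern: private bytes and working set trend up, available bytes trends down."""
    return (
        slope(private_bytes) > 0
        and slope(working_set) > 0
        and slope(available_bytes) < 0
    )

# Hypothetical samples in MB, taken at a fixed interval during a long run.
private = [210, 225, 241, 260, 278]
working = [180, 190, 205, 214, 230]
available = [900, 880, 861, 840, 822]
print(possible_leak(private, working, available))
```

In practice the run should be long enough that warm-up growth (caches filling, pools being created) has levelled off before the trend is judged.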
Symptoms that memory is the system performance bottleneck:
High pageout rate
Processes going inactive
High disk activity in the swap area
High global system CPU utilization
Out-of-memory errors
Processor:
1. In UNIX resource monitoring (the same applies to Windows), watch CPU utilization. If the value stays above 95%, the CPU is the bottleneck; consider adding processors or switching to faster ones. If the server is dedicated to SQL Server, the maximum acceptable limit is 80-85%.
The reasonable operating range is 60% to 70%.
2. In Windows resource monitoring, if System\Processor Queue Length is greater than 2 while processor utilization (% Processor Time) stays low, the processor is congested.
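The processor thresholds above can be written as a small classification function. This is a sketch under stated assumptions: the text does not say what counts as "low" utilization, so the 50% cut-off (`LOW_UTILIZATION`) is a placeholder chosen here, the upper end of the 80-85% range is used for a dedicated SQL Server machine, and the inputs are taken to be sustained averages rather than momentary spikes.

```python
LOW_UTILIZATION = 50.0  # assumption: the source does not define "low"

def classify_cpu(avg_utilization, queue_length, dedicated_sql_server=False):
    """Map sustained CPU utilization (%) and processor queue length to a rough verdict."""
    # 95% in general; 85% (top of the 80-85% range) for a dedicated SQL Server host.
    limit = 85.0 if dedicated_sql_server else 95.0
    if avg_utilization > limit:
        return "cpu bottleneck"
    if queue_length > 2 and avg_utilization < LOW_UTILIZATION:
        return "processor congestion"
    return "ok"

print(classify_cpu(97.0, 1))
print(classify_cpu(88.0, 1, dedicated_sql_server=True))
print(classify_cpu(30.0, 4))
print(classify_cpu(65.0, 1))
```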
Symptoms that the CPU is the system performance bottleneck:
Slow response time
Zero percent idle CPU
High percent user CPU
High percent system CPU
Large run queue size sustained over time
Disk I/O:
1. In UNIX resource monitoring (the same applies to Windows), watch the disk rate. If the value stays high, disk I/O may be the problem; consider switching to a faster disk system.
2. In Windows resource monitoring, if the % Disk Time and Avg. Disk Queue Length values are very high while Page Reads/sec is very low, there may be a disk bottleneck.
Symptoms that I/O is the system performance bottleneck:
High disk utilization
Large disk queue length
Large percentage of time spent waiting for disk I/O
Large physical I/O rate (not sufficient in itself)
Low buffer cache hit ratio (not sufficient in itself)
Large run queue while the CPU is idle
4. Database Server:
SQL Server database:
1. Cache Hit Ratio in SQL Server resource monitoring: the higher the value, the better. If it stays below 80%, consider adding memory.
2. If the Full Scans/sec counter shows a value higher than 1 or 2, analyze your queries to determine whether the full table scans are really needed and whether the SQL can be optimized.
3. Number of Deadlocks/sec: deadlocks hurt the scalability of an application and lead to a poor user experience. This counter should be 0.
4. Lock Requests/sec: optimizing queries to reduce the number of reads lowers the value of this counter.
Oracle Database:
1. If free memory is close to 0 and the hit rate of the library cache or the data dictionary cache is below 0.90, increase shared_pool_size.
Hit rates of the library cache (shared SQL area) and the data dictionary cache:
select (sum(pins - reloads)) / sum(pins) from v$librarycache;
select (sum(gets - getmisses)) / sum(gets) from v$rowcache;
Free memory: select * from v$sgastat where name = 'free memory';
2. If the buffer cache hit rate is below 0.90, increase the db_block_buffers parameter (unit: blocks).
Buffer cache hit rate:
select name, value from v$sysstat where name in ('db block gets',
'consistent gets', 'physical reads');
Hit ratio = 1 - (physical reads / (db block gets + consistent gets))
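As a worked example of the hit ratio formula, the sketch below plugs three counter values into it. The numbers are hypothetical, chosen only to show a ratio that falls below the 0.90 threshold, and the function name is introduced here for illustration.

```python
def buffer_cache_hit_ratio(db_block_gets, consistent_gets, physical_reads):
    """Hit ratio = 1 - physical reads / (db block gets + consistent gets)."""
    logical_reads = db_block_gets + consistent_gets
    if logical_reads == 0:
        raise ValueError("no logical reads recorded yet")
    return 1 - physical_reads / logical_reads

# Hypothetical counter values, as they might be read from v$sysstat.
stats = {"db block gets": 12_000, "consistent gets": 488_000, "physical reads": 75_000}
ratio = buffer_cache_hit_ratio(
    stats["db block gets"], stats["consistent gets"], stats["physical reads"]
)
print(f"hit ratio = {ratio:.2f}")
print("increase db_block_buffers" if ratio < 0.90 else "buffer cache looks adequate")
```

Because these counters accumulate from instance startup, the ratio describes the whole uptime. To measure a test run alone, take the values before and after the run and apply the formula to the differences.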
3. If the number of redo log space requests is large, increase the log_buffer parameter.
Log buffer requests:
select name, value from v$sysstat where name = 'redo log space requests';
4. If the in-memory sort hit rate is below 0.95 (95 on the percentage scale returned by the query), increase sort_area_size to avoid disk sorts.
In-memory sort hit rate:
select round((100 * b.value) / decode((a.value + b.value), 0, 1, (a.value + b.value)), 2) from v$sysstat a, v$sysstat b where a.name = 'sorts (disk)' and b.name = 'sorts (memory)';
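The arithmetic in the sort query can be mirrored in a few lines, which makes the role of the decode() guard easier to see: it substitutes 1 for the divisor when no sorts have been recorded, so the query does not divide by zero. The function name and the sample counts are hypothetical. Note that the result is a percentage, so the 0.95 threshold corresponds to 95 here.

```python
def memory_sort_hit_rate(sorts_memory, sorts_disk):
    """Percentage of sorts done in memory, with the same zero-total guard as decode()."""
    total = sorts_memory + sorts_disk
    divisor = total if total != 0 else 1  # decode(total, 0, 1, total)
    return round(100 * sorts_memory / divisor, 2)

# Hypothetical counts for 'sorts (memory)' and 'sorts (disk)'.
rate = memory_sort_hit_rate(9_400, 600)
print(rate)
print("increase sort_area_size" if rate < 95 else "sort area looks adequate")
```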
Note: The SQL Server and Oracle analysis above is only simple, basic analysis. Analysis and tuning of Oracle databases in particular is a specialized discipline; consult the relevant material for deeper analysis.
Note:
The above is only my personal experience combined with some documentation, and it does not represent expert opinion. I hope everyone will share their own views to help advance performance testing in our country.