Reposted from: http://coolshell.cn/articles/17381.html
I recently came across a performance test report for Alibaba's middleware Dubbo. I think the report was written by people who do not really understand performance testing, and it is likely to mislead readers, so I want to write this article and do a little educating.
The main problems with this test report are as follows:
1) It reports only averages. To be honest, averages are very unreliable.
2) Response time (latency) is not tied to throughput (TPS/QPS). Testing latency only at low request rates is completely wrong.
3) Response time and throughput are not linked to the success rate.
Why averages are unreliable
As for why averages are unreliable: if you read the news, you often see average salaries, average house prices, average spending, and so on, and you already know why those averages are unreliable. (These are statistical games; science and engineering students have an innate immunity to them.)
Software performance testing is the same: averages are unreliable there too. For the details, see the article "Why Averages Suck and Percentiles are Great"; I will only summarize briefly here.
We know that in a performance test the measured values are not all identical; they are scattered high and low. If you take the average, situations like this arise: suppose you run the test 10 times, 9 runs take 1ms and 1 run takes 1s; the average is then about 100ms, which clearly fails to reflect the real behavior of the system. That 1s request may well be an outlier, noise that should be discarded. This is why some judged competitions drop the highest and lowest scores before averaging.
In addition, the median is somewhat more reliable than the average. The median of a data set is the value that sits in the middle once the data are sorted by size, which means at least 50% of the data are at or below it and at least 50% are at or above it.
Of course, the most correct statistical practice is to report the percentile distribution. TP stands for "top percentile": TP50 means that 50% of requests take less than a certain amount of time, and TP90 means that 90% of requests take less than a certain amount of time.
For example, take the data set [10ms, 1s, 200ms, 100ms] and sort it ascending: [10ms, 100ms, 200ms, 1s]. TP50 is the time below which 50% of requests fall, i.e. the ceil(4 * 0.5) = 2nd value, which is 100ms; TP90 is the time below which 90% of requests fall, i.e. the ceil(4 * 0.9) = 4th value, which is 1s. So TP50 is 100ms and TP90 is 1s.
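As a minimal sketch (my own illustration, not code from the report), the nearest-rank percentile used above can be computed like this in Go:

```go
package main

import (
	"fmt"
	"math"
	"sort"
	"time"
)

// percentile returns the nearest-rank percentile (e.g. p=0.5 for TP50)
// of a set of latency samples, matching the ceil(n*p) rule used above.
func percentile(samples []time.Duration, p float64) time.Duration {
	sorted := append([]time.Duration(nil), samples...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i] < sorted[j] })
	rank := int(math.Ceil(float64(len(sorted)) * p)) // 1-based rank
	if rank < 1 {
		rank = 1
	}
	return sorted[rank-1]
}

func main() {
	samples := []time.Duration{
		10 * time.Millisecond, time.Second,
		200 * time.Millisecond, 100 * time.Millisecond,
	}
	fmt.Println("TP50:", percentile(samples, 0.50)) // 100ms
	fmt.Println("TP90:", percentile(samples, 0.90)) // 1s
}
```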
When I did performance testing on financial systems at Reuters, the response-time requirement was: 99.9% of requests must complete in under 1ms, and the average response time must also be under 1ms. Both conditions had to hold.
Why response time (latency) must be tied to throughput
Looking only at throughput without looking at response time is meaningless. My system may sustain 100,000 requests, but if the response time has reached 5 seconds, the system is unusable and that throughput number is worthless.
We know that as concurrency (load) rises, the system becomes less and less stable: response time fluctuates more and gets slower, throughput drops lower and lower, and CPU usage becomes erratic (as a typical throughput-versus-latency curve shows). Once the system is unstable, the throughput number is meaningless; throughput only means something while the system is stable.
Therefore, a throughput figure must be capped by a response-time requirement. For example: with TP99 under 100ms, the maximum load the system can sustain is 1000 QPS. This means we have to keep testing at different concurrency levels to find the highest throughput at which the software is still stable.
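To make this concrete, here is a rough sketch (my own, with a hypothetical sendRequest standing in for a real client call) that steps the number of concurrent workers up and records TP99 at each level, stopping when the latency cap is broken:

```go
package main

import (
	"fmt"
	"math"
	"sort"
	"sync"
	"time"
)

// sendRequest is a hypothetical stand-in for one call to the system
// under test; replace it with a real client call.
func sendRequest() time.Duration {
	start := time.Now()
	time.Sleep(time.Millisecond) // simulated work
	return time.Since(start)
}

// tp99 returns the nearest-rank 99th-percentile latency.
func tp99(samples []time.Duration) time.Duration {
	sort.Slice(samples, func(i, j int) bool { return samples[i] < samples[j] })
	rank := int(math.Ceil(float64(len(samples)) * 0.99))
	return samples[rank-1]
}

func main() {
	const sla = 100 * time.Millisecond
	for workers := 10; workers <= 320; workers *= 2 {
		var (
			mu      sync.Mutex
			samples []time.Duration
			wg      sync.WaitGroup
		)
		start := time.Now()
		for w := 0; w < workers; w++ {
			wg.Add(1)
			go func() {
				defer wg.Done()
				for i := 0; i < 100; i++ { // fixed work per worker
					d := sendRequest()
					mu.Lock()
					samples = append(samples, d)
					mu.Unlock()
				}
			}()
		}
		wg.Wait()
		qps := float64(len(samples)) / time.Since(start).Seconds()
		p := tp99(samples)
		fmt.Printf("workers=%4d  qps=%8.0f  tp99=%v\n", workers, qps, p)
		if p > sla {
			fmt.Println("TP99 exceeded the SLA; the previous level is the usable maximum")
			break
		}
	}
}
```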
Why response time and throughput must be linked to the success rate
This should not be hard to understand: if the requests are not even succeeding, what is the point of a performance test? For example, if I claim my system can handle 100,000 concurrent requests but the failure rate is 40%, then that 100,000 figure is a complete joke.
The tolerance for failures in a performance test should be very low. For some critical systems, 100% of requests must succeed; there is no room for ambiguity.
How to do performance testing rigorously
In general, performance testing must consider several factors together: throughput, latency (response time), resource utilization (CPU/memory/IO/bandwidth ...), success rate, and system stability.
The performance testing methodology below comes largely from my former employer, Thomson Reuters, a company that builds real-time financial data systems.
First, you must define the system's response-time (latency) requirement, with TP99 recommended, together with a success-rate requirement. For example, the Reuters definition: the 99.9th-percentile response time must be within 1ms, the average response time within 1ms, and 100% of requests must succeed.
Second, under that response-time constraint, find the highest throughput. The test data should mix payloads of various sizes, large and small. It is best to use real data from production.
Third, run a soak test at that throughput. For example, drive the system continuously for 7 days at the throughput found in step two, then collect CPU, memory, disk/network IO, and other metrics to check that the system is stable, e.g. that CPU and memory usage stay flat; a sketch of the metric sampling follows below. This value is then the system's performance.
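As one illustration (not Reuters' actual tooling), a soak-test harness can sample resource metrics on a fixed interval while the load runs. The sketch below logs Go runtime memory stats; system-wide CPU/disk/network figures are usually gathered in parallel by external tools such as sar, vmstat, or iostat:

```go
package main

import (
	"log"
	"runtime"
	"time"
)

// sampleMetrics logs process-level metrics once per interval for the
// duration of the soak test, so the curves can be checked for drift.
func sampleMetrics(interval, total time.Duration) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	deadline := time.Now().Add(total)
	for now := range ticker.C {
		if now.After(deadline) {
			return
		}
		var m runtime.MemStats
		runtime.ReadMemStats(&m)
		log.Printf("heap=%dMB gc=%d goroutines=%d",
			m.HeapAlloc/(1<<20), m.NumGC, runtime.NumGoroutine())
	}
}

func main() {
	// A real soak test would use something like 7*24*time.Hour alongside
	// the load generator; shortened here so the example terminates quickly.
	sampleMetrics(time.Second, 5*time.Second)
}
```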
Fourth, find the system's limit. For example: with the success rate held at 100% (regardless of response time), find the throughput the system can sustain for 10 minutes.
Fifth, do a burst test. Run the step-two throughput for 5 minutes, then the step-four limit for 1 minute, then the step-two throughput again for 5 minutes, then the step-four limit for 1 minute, and repeat this cycle for some period of time, say 2 days; a sketch of this schedule follows below. Collect the system metrics (CPU, memory, disk/network IO), watch their curves together with the corresponding response times, and make sure the system is stable.
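A burst schedule is just an alternation between two load levels. The sketch below (my own illustration, with runAtQPS as a hypothetical hook into a load generator) encodes the 5-minute/1-minute cycle:

```go
package main

import (
	"fmt"
	"time"
)

// runAtQPS is a hypothetical hook into a load generator: drive the
// system at the given request rate for the given duration.
func runAtQPS(qps int, d time.Duration) {
	fmt.Printf("driving %d qps for %v\n", qps, d)
	time.Sleep(d) // placeholder for the actual load loop
}

func main() {
	const (
		normalQPS = 1000 // throughput found in step two
		burstQPS  = 3000 // limit found in step four
	)
	// Shorten these durations for a dry run of the harness itself.
	deadline := time.Now().Add(2 * 24 * time.Hour)
	for time.Now().Before(deadline) {
		runAtQPS(normalQPS, 5*time.Minute)
		runAtQPS(burstQPS, 1*time.Minute)
	}
}
```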
Sixth, test low throughput and small network packets. Sometimes latency rises at low throughput; for example, leaving the TCP_NODELAY option off can increase latency (see the "TCP 的那些事儿" articles on this blog). Small network packets can leave the bandwidth underutilized, which also hurts performance. So, depending on the actual situation, the performance test should also cover these two scenarios.
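For reference, this is how TCP_NODELAY is toggled in Go (Go's net package actually enables it by default, while many other stacks leave Nagle's algorithm on; the endpoint below is a hypothetical stand-in):

```go
package main

import (
	"log"
	"net"
)

func main() {
	// Hypothetical endpoint; substitute the system under test.
	conn, err := net.Dial("tcp", "127.0.0.1:8080")
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()

	// Disable Nagle's algorithm so small writes are sent immediately
	// instead of being buffered, which lowers latency at low throughput.
	if tcp, ok := conn.(*net.TCPConn); ok {
		if err := tcp.SetNoDelay(true); err != nil {
			log.Fatal(err)
		}
	}
}
```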
(Note: at Reuters, 66.7% of the step-two throughput is taken as the system's soft alarm line and 80% as its hard alarm line; the limit value is reserved only for absorbing sudden bursts.)
Tedious, isn't it? Yes. But this is engineering; engineering is a science, and science is rigorous.
You are welcome to share your own performance-testing experience and methods.
(End of full text)