Have you ever run a stress test on your application, only to find that you do not understand what the results mean? Perhaps the problem is not in the application. Perhaps it lies in how the stress-testing tool is configured. If you have been in this situation, or are about to run a stress test, consider the following.
How to perform a test?
I often come across development teams that receive a performance requirement such as "the client handles 20 customers every hour." The team then tries to turn this requirement into a test. A common way to execute such a test is to hit the server repeatedly in an endless loop and observe the effect. Generally, things do not go smoothly, which is when I "meet them" as a performance consultant. The first question I usually ask is: "How did you perform the test?" The usual answer is: "We put the request in a loop and count the number of requests the server can process." That answer tells me that the first thing to fix is the test tool itself.
If you do not see the problem with the test above, do not worry; many people are in the same position. Running a realistic stress test is not as simple as it looks, and the problems can be subtle, often identifiable only with some digging. However, this does not mean you must dive deeply into Markov chains, state-transition models, queuing theory, probability distributions, and so on. Let's instead explain how to solve this common stress-testing problem in a less boring, easier-to-understand way.
The test method will affect the test.
The first thing we need to understand is that although tests are usually defined in terms of client activity, they must be viewed from a server-centric perspective. From the server's point of view, all we can see is how frequently clients arrive and how long it takes to process each request. Consider a typical example: a bank teller. The teller usually does not know when you arrived or where you came from. All they know is that you are here and you want something done. How many people end up in line depends on the rate at which people arrive and the time it takes to serve each of them.
More important than how many people are in line is whether that number will shrink, stay the same, or grow as people continue to arrive. Put another way: do people enter the queue faster than, slower than, or at the same rate as they leave it? If the departure rate exceeds the arrival rate, requests are processed faster than they are submitted. In the second case, one customer has just been served as the next one arrives. In the last case, people arrive faster than they leave. In mathematical terms, the first system is converging, the second is stable, and the third is diverging. In all three cases, the number of people in the room is governed by Little's Law.
Only do what you can
In layman's terms, Little's Law says that you can only do so much work. The mathematical version says: the number of requests in the system equals the rate at which requests arrive multiplied by the time they spend in the system. If the time in the system depends on how fast the system can push requests out (usually called the service time), then by observing the rate at which requests arrive (the inter-arrival time) and comparing it with the service time, you can determine the state of the system.
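As a concrete illustration (the numbers below are invented for the example, not taken from the article's tests), Little's Law and the converging/stable/diverging distinction can be sketched in a few lines of Python:

```python
# Little's Law: L = lambda * W, where
#   L = average number of requests in the system
#   lambda = average arrival rate (requests/sec)
#   W = average time each request spends in the system (sec)

def requests_in_system(arrival_rate, time_in_system):
    """Average number of requests in the system (Little's Law)."""
    return arrival_rate * time_in_system

def classify(arrival_interval, service_time):
    """Compare arrival spacing with service time to label the system."""
    if arrival_interval > service_time:
        return "converging"   # requests leave faster than they arrive
    if arrival_interval == service_time:
        return "stable"       # one request leaves just as the next arrives
    return "diverging"        # arrivals outpace departures; the queue grows

# Example: 50 requests/sec arriving, 100 ms spent in the system
print(requests_in_system(50, 0.1))   # 5.0 requests in flight on average
print(classify(0.02, 0.1))           # arrivals every 20 ms, 100 ms service: diverging
```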
In each case, Little's Law describes how the system handles the workload. Although the instantaneous state may fluctuate, the overall trend is determined by the average state. For example, a converging system may see momentary backups when many people arrive at once, but the queue will tend to drain, because a converging system trends toward idle. The third scenario, however, is diverging: the number of queued requests will grow without bound. Or will it? The answer depends on how we define the domain of users that send requests.
In principle, at any instant, any user in the world might send a request; this is certainly the server-centric view. But most systems are sized on the assumption that at any given point in time only a fraction of the user domain will be sending requests. Experience suggests that roughly 10% of an Internet application's users are active at any point in time. We need this information if we want to define a realistic stress test. For example, if there are 1,000 users in the user base and we estimate 10% concurrent use, we expect about 100 users to be active at any time, so our test should simulate 100 users each repeatedly executing some series of requests. The danger of defining a test this way is that it reflects the client's perspective.
When we switch from a server-centric to a client-centric perspective, we lose sight of the rate at which requests are actually being sent to the server. If we also cap or fix the number of users (threads) executing requests, the picture becomes even murkier: the server appears to be processing a steady stream of requests, while request processing times seem to grow longer and longer.
Everyone can participate
If we let the simulated threads send requests as fast as they can, it is as if every user in the simulated domain (or even more) were sending requests at the same time. Assume a single-server model, because it is easiest to reason about; a multi-server model works the same way, only faster. The system queues requests and processes one at a time. As soon as a thread's request completes, the thread immediately returns to the back of the queue to submit the next one. Although this sequence of events looks like a stable system, we are actually driving a diverging one. The only reason it looks stable is that we have limited the number of threads sending requests. As noted above, in a diverging system each subsequent request waits longer than the one before, which means the average response time should grow without bound. But because we have artificially capped the number of clients, the average response time instead plateaus at a value determined by the number of clients multiplied by the time to process a single request. Response time in such a system includes the time spent in the queue, and that queueing time artificially inflates the measured values. The net result is that the test itself limits your ability to determine the scalability of the system.
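The plateau described above can be sketched with simple arithmetic. This is an idealized model, assuming a single server, zero think time, and a constant service time (the 50 ms figure is an assumption for illustration): in a closed loop, every thread is either being served or waiting, so a new request sees roughly all the other threads ahead of it in line.

```python
# Idealized closed-loop model: with N looping threads and no think time,
# a request waits behind the other (N - 1) requests, so the measured
# response time settles at roughly N * service_time, while the server's
# throughput stays pinned at 1 / service_time.

def closed_loop_response_time(num_threads, service_time):
    """Approximate steady-state response time: queue wait plus service."""
    return num_threads * service_time

service_time = 0.05  # 50 ms per request (assumed for illustration)
for n in (1, 10, 50):
    print(n, "threads ->", closed_loop_response_time(n, service_time), "s")
# Response time scales with the thread count even though the server is
# never processing more than 1 / 0.05 = 20 requests per second.
```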
How to Fix
To fix the stress test, you need to know the rate at which each user/thread should send requests. The sum of all users' rates translates into the rate at which the server receives requests. Once this value is determined, you can tune the tool's request rate. The following table lists several configurations that maintain 50 requests per second (RPS). From the server's perspective, the tool must deliver one request every 20 ms. That figure corresponds to a single thread. If the tool is configured with two threads, each thread should maintain a 40 ms interval between requests. The table also lists the intervals for five and ten threads.
| Number of threads | Thread rate | Inter-request interval |
|-------------------|-------------|------------------------|
| 1                 | 50/sec      | 20 ms                  |
| 2                 | 25/sec      | 40 ms                  |
| 5                 | 10/sec      | 100 ms                 |
| 10                | 5/sec       | 200 ms                 |
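The intervals in the table follow directly from the target rate: each thread's inter-request interval is the number of threads divided by the aggregate RPS. A minimal sketch reproducing the table:

```python
# Reproduce the table above: the interval each thread must keep between
# requests so that the whole pool delivers an aggregate 50 requests/sec.

TARGET_RPS = 50

def inter_request_interval_ms(num_threads, target_rps=TARGET_RPS):
    """Milliseconds each thread must wait between requests."""
    return num_threads / target_rps * 1000

for threads in (1, 2, 5, 10):
    per_thread_rate = TARGET_RPS / threads
    print(threads, "threads:", per_thread_rate, "req/sec each,",
          inter_request_interval_ms(threads), "ms apart")
```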
Choice
In theory, the table shows how one, two, five, or ten threads can each sustain 50 RPS. But what if the service time is longer than the inter-request interval? In that case a thread is still parked at the server and cannot queue its next request, so the tool cannot deliver the expected 50 RPS. To avoid this, we need to build some slack into the system. Using a very large number of threads is usually not feasible, because we are likely constrained by the available hardware and/or licenses (for commercial load-testing tools). A common solution is to balance the inter-request interval against the extra (computing/licensing) resources. Always remember: if the testing tool is starved of resources (hardware, software, or threads), the validity of the test suffers.
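The slack check can be made explicit. This is a sketch under the assumption of a fixed average service time; by Little's Law, the minimum thread count is the target rate times the service time, and a thread only has slack if its interval exceeds the service time (the 120 ms figure is an invented example):

```python
import math

def min_threads(target_rps, service_time):
    """Smallest thread count able to deliver target_rps when each request
    takes service_time seconds (Little's Law: N = lambda * W)."""
    return math.ceil(target_rps * service_time)

def has_slack(num_threads, target_rps, service_time):
    """True if each thread's interval leaves room beyond the service time."""
    interval = num_threads / target_rps
    return interval > service_time

# If responses take 120 ms, 5 threads cannot hold 50 RPS (interval is
# only 100 ms), so at least ceil(50 * 0.12) = 6 threads are needed --
# and a few more than that to leave real slack.
print(min_threads(50, 0.12))     # 6
print(has_slack(5, 50, 0.12))    # False: interval shorter than service time
print(has_slack(10, 50, 0.12))   # True: 200 ms interval > 120 ms service
```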
Think twice
To demonstrate how the stress-testing tool affects test results, we used Apache JMeter to run a load test against a sample Web application. Beyond knowing that the application's entry point is a Servlet, the application's functionality and implementation details are unimportant for this discussion.
Figure 1 shows the effect of increasing the number of threads on the average response time. The pink line shows unadjusted threads; the blue line shows the same threads after a few milliseconds of idle time were added between consecutive requests. The results of the two cases differ only slightly. Both clearly show response time rising as system load increases. Since we already know that server performance degrades as overall load grows, this is not surprising. The problem becomes visible only when we look at the results shown in Figure 2.
Figure 2 shows that the ability to maintain a stable request rate is initially limited by the number of threads. That alone does not indicate a problem, because it is reasonable that the target load cannot be sustained until a certain thread count is reached. Figure 2 also shows that once the server's capacity to process requests is exceeded, adding threads does not significantly change the overall rate at which the tool sends requests to the server. Yet the rise in response time caused by these "extra" threads seems to imply that they are adding load to the system.
The question is: why do threads that add no load to the server appear to degrade its performance? One plausible answer is that they do not degrade the server at all; they simply go back into the queue the moment the server cannot serve them. Because the timer measuring response time must start when a request is sent and stop when the response is received, the measured response time includes all the time the thread spends waiting in the queue for service, plus the service time itself. And because each thread re-enters the queue as soon as it leaves, every thread must wait for the others to complete before it can be served. In this scenario, the more threads there are, the longer the queue and the longer the response times.
Little's Law tells us that such a system is diverging, and so we can conclude that the tool is impeding our ability to find the real bottleneck (if any).
Slow down and do more
Little's Law has two components: the time spent in the system and the arrival rate. Looking at the world from the tool's side, we cannot control the service time, but we can control the arrival rate. Since the work above shows that we are sending requests too fast, and the rate is the only thing we control, the only move available is to slow down. We can achieve this by inserting an interval between consecutive requests, which lowers the rate at which a single thread issues them. The interval reduces the time threads spend in the queue and yields more realistic response times.
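A paced worker along these lines can be sketched as follows. This is not JMeter configuration but an illustration of the idea; `send` is a placeholder for the real request call (for example, an HTTP GET against the application under test), since the target application is not part of this sketch:

```python
import random
import time

def paced_worker(num_requests, think_range=(3.0, 6.0), send=None):
    """Issue requests with a random pause (think time) between them.

    `send` stands in for the real request call; it is a hypothetical
    hook, not an API from any particular load-testing library.
    """
    response_times = []
    for _ in range(num_requests):
        start = time.monotonic()
        if send is not None:
            send()
        response_times.append(time.monotonic() - start)
        # The think time is what keeps the thread out of the server's
        # queue and the measured response times realistic.
        time.sleep(random.uniform(*think_range))
    return response_times

# Quick demo with a tiny think time so it runs fast:
times = paced_worker(3, think_range=(0.01, 0.02))
print(len(times))  # 3
```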
For the test, we start with 50 paced threads generating 9 requests per second. If we find we cannot maintain a reasonable request rate, these values can be adjusted, using response time to evaluate the effect. Finally, we set the interval, drawing on data from previous runs to guide the decision.
Looking back at Figure 1, we see that 8 to 9 RPS produces response times of 2 to 3 seconds. Little's Law tells us we need enough threads that each can re-enter the system 2 to 3 seconds after sending its request (allowing for the average response time to rise). The average interval should therefore be about 3 seconds. To put this into practice, we ran a series of tests exploring a range of values.
The first test used a random interval between 2 and 5 seconds; values in this range average 3.5 seconds. From this we can calculate a theoretical request rate: 50 (the number of threads) divided by 3.5 + 2 (the estimated target response time), which gives 9.1 RPS. The second test used random values between 3 and 6 seconds, and the final test used values between 4 and 7 seconds. The results of these tests are shown in Figure 3.
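The arithmetic behind that theoretical rate is just Little's Law applied to a closed system with think time: each thread completes one request per (think time + response time) seconds, so the aggregate rate is the thread count divided by that sum.

```python
def theoretical_rps(num_threads, avg_think_time, avg_response_time):
    """Expected aggregate request rate for a closed system with pacing:
    each thread completes one request per (think + response) seconds."""
    return num_threads / (avg_think_time + avg_response_time)

# First test: 50 threads, think time uniform in [2, 5] s (mean 3.5 s),
# estimated 2 s response time -> about 9.1 requests per second.
rate = theoretical_rps(50, 3.5, 2.0)
print(round(rate, 1))  # 9.1
```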
Figure 3 shows that increasing the interval shortens the average response time. This must be read together with Figure 4, which shows that when the interval grows to 4-7 seconds, the target request rate can no longer be maintained. We could add more threads to raise the load, but there is little value in doing so at this stage, because these tests have already given us an effective configuration.
This series of tests moved the stress test toward a better configuration. Our conclusion: configure the tool to use 50 threads, with each thread pausing 3 to 6 seconds between requests.
Conclusion
Before starting a performance-tuning exercise (or benchmarking), you need to confirm that the tool is not affecting the test. A well-configured tool will not make us measure things that should not be measured. A testing tool that fails to deliver the intended load, or that measures incidental response times, will mislead your tuning of the application. The key to detecting this is to measure the effect of running the tool at a normal pace, which can be gauged by the number of transactions or requests per second the tool can sustain. The tool's threads should not spin around immediately to send the next request; if that happens, slow the tool down to avoid artificially saturating the server. It usually takes several runs to find a properly balanced tool configuration. In the early stages of testing, do not focus on response times (they will improve as the application is tuned); focus on configuring the tool. Finally, do not be afraid to slow down, because doing so may help you figure out what really affects your application's performance.
[Reprinted] Adjusting Stress-Testing Tools