This article is part of the "Reactive Spring" series (see the series index).
Previously: Reactive Streams | Reactor 3 Quick Start | Spring WebFlux Quick Start
Source code for this article
1.4 The benefits of asynchronous non-blocking, seen through load testing
So far we have mostly just been singing the praises of asynchronous non-blocking; now let's actually feel the performance of reactive programming in a high-concurrency environment. The benefits of asynchronous non-blocking show up in I/O operations: whether file I/O, network I/O, or database reads and writes, blocking can occur.
Our tests cover three things:
- First, build Web services based on WebMVC and WebFlux to compare the performance gains of asynchronous non-blocking. We simulate a simple scenario with a processing delay, start the services, and then use Gatling to run and analyze the load tests;
- Since microservice architectures are used more and more widely, we build on the projects from the first test to further observe the test data when calling a service that has latency. This is really a test of the client side: the blocking RestTemplate versus the non-blocking WebClient;
- Performance testing and analysis of MongoDB's synchronous and asynchronous database drivers.
Note: this section does not perform rigorous load testing of specific business scenarios driven by performance-tuning requirements. The test scenarios here are simple and crude; please take the results in that spirit.
Also: since this section is mainly about horizontal comparison, the exact hardware configuration is not important, but testing on Linux is recommended. I originally ran the tests on Windows 10, and once the number of users went up, many requests failed. The test data below was collected on a laptop running Deepin Linux (Debian-based).
So let's start by building this simple, crude test environment.
1.4.1 Load testing an API with latency
1) Build the projects to be tested
We create two projects, based on WebMVC and WebFlux respectively: `mvc-with-latency` and `webflux-with-latency`.
To simulate blocking, each project exposes an API with latency, `/hello/{latency}`. For example, `/hello/100` returns its response after a 100ms delay.
In `mvc-with-latency`, create `HelloController.java`:

```java
import java.util.concurrent.TimeUnit;

import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.RestController;

@RestController
public class HelloController {

    @GetMapping("/hello/{latency}")
    public String hello(@PathVariable long latency) {
        try {
            TimeUnit.MILLISECONDS.sleep(latency);   // 1
        } catch (InterruptedException e) {
            return "Error during thread sleep";
        }
        return "Welcome to reactive world ~";
    }
}
```
- (1) Use `sleep` to simulate blocking in the business scenario.
In `webflux-with-latency`, create `HelloController.java`:

```java
import java.time.Duration;

import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.RestController;
import reactor.core.publisher.Mono;

@RestController
public class HelloController {

    @GetMapping("/hello/{latency}")
    public Mono<String> hello(@PathVariable int latency) {
        return Mono.just("Welcome to reactive world ~")
                .delayElement(Duration.ofMillis(latency));   // 1
    }
}
```
- (1) Use the `delayElement` operator to implement the delay.
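Unlike `sleep`, `delayElement` does not block the thread handling the request: the element is emitted after the delay on one of Reactor's timer threads, so the calling thread stays free to serve other requests in the meantime.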
Then configure ports 8091 and 8092 respectively in each project's `application.properties`:

```properties
server.port=8091
```

Start both applications.
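Before load testing, a quick sanity check that both endpoints respond (assuming the ports configured above):

```bash
curl http://localhost:8091/hello/100   # mvc-with-latency, returns after ~100ms
curl http://localhost:8092/hello/100   # webflux-with-latency, returns after ~100ms
```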
2) Write the load test script
In this section we use Gatling for the load tests. Create a test project `gatling-scripts`, and add the Gatling dependency and plugin to its POM (Gradle does not have this plugin yet, so it has to be a Maven project):
```xml
<dependencies>
    <dependency>
        <groupId>io.gatling.highcharts</groupId>
        <artifactId>gatling-charts-highcharts</artifactId>
        <version>2.3.0</version>
        <scope>test</scope>
    </dependency>
</dependencies>

<build>
    <plugins>
        <plugin>
            <groupId>io.gatling</groupId>
            <artifactId>gatling-maven-plugin</artifactId>
            <version>2.2.4</version>
        </plugin>
    </plugins>
</build>
```
Create a test class under `src/test`; Gatling test classes are written in Scala:
```scala
import io.gatling.core.scenario.Simulation
import io.gatling.core.Predef._
import io.gatling.http.Predef._
import scala.concurrent.duration._

class LoadSimulation extends Simulation {

  // read baseUrl, path and the number of simulated users from system properties
  val baseUrl = System.getProperty("base.url")
  val testPath = System.getProperty("test.path")
  val sim_users = System.getProperty("sim.users").toInt

  val httpConf = http.baseURL(baseUrl)

  // define the simulated request, repeated 30 times
  val helloRequest = repeat(30) {
    // custom test name
    exec(http("hello-with-latency")
      // execute the GET request
      .get(testPath))
      // simulate user think time, random 1~2 seconds
      .pause(1 second, 2 seconds)
  }

  // define the simulated scenario
  val scn = scenario("hello")
    // the scenario executes the request defined above
    .exec(helloRequest)

  // ramp the number of concurrent users evenly up to sim_users over 30 seconds
  setUp(scn.inject(rampUsers(sim_users).over(30 seconds)).protocols(httpConf))
}
```
As the code above shows, the scenario for this test is:
- The specified number of users is ramped up at a constant rate over 30 seconds;
- Each user requests the specified URL 30 times, with a random think time of 1~2 seconds between requests.
The URL and the number of users are passed in as the variables `base.url`, `test.path`, and `sim.users`. Using the Maven plugin, start the test with the following command:

```
mvn gatling:test -Dgatling.simulationClass=test.load.sims.LoadSimulation -Dbase.url=http://localhost:8091/ -Dtest.path=hello/100 -Dsim.users=300
```
This runs a test with 300 users against http://localhost:8091/hello/100.
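The corresponding command for the WebFlux project just points at port 8092 (assuming the port configured above):

```
mvn gatling:test -Dgatling.simulationClass=test.load.sims.LoadSimulation -Dbase.url=http://localhost:8092/ -Dtest.path=hello/100 -Dsim.users=300
```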
3) Observe the number of threads
Before testing, we open JConsole to watch how the application's threads change (connect to MvcWithLatencyApplication):
(The screenshot resolution is not great.) Right after startup, with no requests coming in, there are 10 worker threads by default, and the total thread count is 31~33.
When a test with 2,500 users runs, for example, the worker threads increase to 200 and the total thread count peaks at 223; the difference is exactly those 190 extra worker threads. As shown below:
Since the worker threads are gradually scaled back down to 10 after the load passes, catching the maximum worker-thread count directly is unreliable; instead we can use "peak total thread count - 23" to estimate the number of worker threads in use during the test.
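If you prefer numbers over screenshots, here is a minimal sketch (my own helper, not part of the original projects) that logs the same counts from inside the application under test via JMX:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

// Call ThreadCountLogger.start() from the application under test (e.g. in its main method)
// to print the same numbers JConsole shows, once per second, on a daemon thread.
public class ThreadCountLogger {

    public static void start() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        Thread t = new Thread(() -> {
            while (!Thread.currentThread().isInterrupted()) {
                // getThreadCount() = live threads now, getPeakThreadCount() = highest seen so far;
                // "peak - 23" then approximates the worker threads used during the test
                System.out.printf("live=%d peak=%d%n",
                        threads.getThreadCount(), threads.getPeakThreadCount());
                try {
                    Thread.sleep(1000);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        }, "thread-count-logger");
        t.setDaemon(true);
        t.start();
    }
}
```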
4) Load test
First we test `mvc-with-latency`:
- -Dbase.url=http://localhost:8091/;
- -Dtest.path=hello/100 (100ms latency);
- -Dsim.users=1000/2000/3000/.../10000 (a sketch of a loop that runs this whole series is shown below).
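For reference, a simple shell loop that runs the series (my own convenience script, not part of the original setup):

```bash
for users in 1000 2000 3000 4000 5000 6000 7000 8000 9000 10000; do
  mvn gatling:test -Dgatling.simulationClass=test.load.sims.LoadSimulation \
      -Dbase.url=http://localhost:8091/ -Dtest.path=hello/100 -Dsim.users=$users
done
```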
The test data is as follows (Tomcat max threads = 200, latency = 100ms):
The above data shows that:
- The thread count reached the default maximum of 200 when the number of users got close to 3,000;
- Before the thread count reaches 200, the 95% response time is normal (a little over 100ms); after that it climbs in a straight line;
- Once the thread count reaches 200, the growth in throughput gradually slows down (a rough back-of-envelope check follows below).
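As a rough back-of-envelope check (my own estimate, not part of the measured data): with 200 worker threads each blocked for about 100ms per request, the server can complete at most about 200 / 0.1s = 2,000 requests per second; anything beyond that can only wait in the queue.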
The reason is that once all available threads are blocked, newly arriving requests can only wait in a queue, so response times start to climb as soon as the maximum thread count is reached. Take the 6,000-user report as an example:
This graph shows how request response times change over time; it can be roughly divided into five segments:
- A. Idle threads are still available, and requests return in a little over 100ms;
- B. The threads are all busy and new requests start to queue; since the number of users is still ramping up during phases A and B, more and more requests pile up in the queue;
- C. Requests per second stabilize, but because of the queuing, response times stay high for a while;
- D. Some users have finished their requests, so requests per second gradually drop and the queuing gradually eases;
- E. Once the load drops back below what the threads can handle and the queue is drained, requests return in normal time.
The response time for all requests is distributed as shown:
The difference between the response time of the A/E segments and that of the C segment is the average time spent waiting in the queue. Under sustained high concurrency, most requests fall into the C segment, and the waiting time grows roughly linearly as the backlog of requests grows.
Increasing the number of threads the Servlet container uses to handle requests can alleviate this problem. The maximum of 200 threads is Tomcat's default; we raise it to 400 and test again by adding the following to `application.properties`:

```properties
server.tomcat.max-threads=400
```
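(If you are on a newer Spring Boot version, 2.3 or later, the equivalent property is `server.tomcat.threads.max`.)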
The test data is as follows:
Because the number of worker threads is doubled, the request queuing is roughly cut in half; comparing the data:
- "The maximum number of threads 200 user 5000" "95% response Time" is exactly the same as "maximum number of threads 400 user 10000", I swear to God, this is absolutely true data, more coincidentally, the throughput is exactly 1:2 of the relationship! This coincidence is also because the test scenario is too simple and rude, haha;
- The slope of the "95% Response Time" curve is also twice times the relationship.
This confirms the analysis above: increasing the thread count can indeed improve throughput to a certain extent and reduce the response latency caused by blocking, but there are trade-offs to weigh:
- Threads come at a cost: by default the JVM allocates a 1MB stack for each new thread, so more threads means more memory;
- More threads also mean more thread context-switching overhead.
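(The per-thread stack size can be tuned with the JVM's `-Xss` option, e.g. `-Xss512k`, trading maximum stack depth for memory per thread.)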
Now let's look at the test data for `webflux-with-latency`:
- Thread counts are not recorded this time, because for a WebFlux application running on Netty with asynchronous I/O, the number of worker threads stays fixed; usually this fixed number equals the number of CPU cores (in JConsole you can see the `reactor-http-nio-X` and `parallel-X` threads; my machine is a quad-core, eight-thread i7, so X runs from 1 to 8). Because everything is asynchronous and non-blocking, the program logic is event-driven and does not need a large pool of threads for concurrency;
- As the number of users increases, throughput grows roughly linearly;
- The 95% response time stays in a controlled range of a little over 100ms, with no queuing delay.
It is clear that non-blocking request handling avoids queuing for threads, allowing a large number of requests to be handled by a small, fixed number of threads.
Going one step further, I tested 20,000 users directly:
- The `mvc-with-latency` test could not complete because too many requests failed;
- `webflux-with-latency` handled 20,000 users without batting an eye: throughput reached 7,228 req/sec (wow, exactly twice the 10,000-user figure; how convenient, and again absolutely genuine data!), and the 95% response time was only 117ms.
Finally, two charts of throughput and response time give a more intuitive feel for how far ahead the asynchronous non-blocking WebFlux pulls:
At this point we can better appreciate Node.js's pride, but our mighty Java also has Vert.x and, now, Spring WebFlux.
In this section we did server-side performance testing; the next section continues by analyzing the performance of Spring WebFlux's client-side tool, WebClient, and the performance boost it can bring to a system built on a microservice architecture.
(6) Spring WebFlux performance testing — from the "Reactive Spring" series