Index of this series of articles: "Response Spring's word Wizard"
Previous articles: Spring WebFlux Quick Start | Spring WebFlux Performance Test
1.4.3 Netflix's asynchronous case
The previous two sections used Gatling and simple examples to show the performance strengths of Spring WebFlux on both the server side and the client side. With that background, the case below is not hard to understand.
Netflix is the American streaming giant and the world's largest paid video site. "House of Cards", which Obama follows and Xi has mentioned, is one of its original productions, and a few months ago it also bought the streaming rights to the Chinese drama "White Nights Chasing the Murderer". Still, developers who use Spring Cloud are probably even more familiar with the name Netflix than people who binge-watch its shows, because Netflix has open-sourced almost all of its microservice architecture technology stack, and Spring Cloud integrates Netflix's open source components. This architecture has proven reliable over years of production use in Netflix's massively distributed microservices environment.
By 2016, Netflix already had more than 83 million subscribers who streamed 120 million hours of video a day, accounting for about one third of peak download traffic on the North American internet. To support this volume of user access, Netflix set out in 2009 to evolve toward a cloud-native microservices ecosystem. It finished migrating all of its applications to the cloud in 2016, by which time it ran 500+ microservices, and its traffic grew more than 1000-fold over the 7 years of that evolution. It was a change on the scale of swapping out a rocket's engine.
Here is how this veteran practitioner of microservices architecture used asynchronous, non-blocking technology to improve the performance of its API gateway (see "Zuul 2: The Netflix Journey to Asynchronous, Non-Blocking Systems").
In a microservices architecture, a component such as an API gateway is usually indispensable. Anyone who uses Spring Cloud should be familiar with the Zuul component. Zuul is an API gateway server open-sourced by Netflix. It provides dynamic routing, monitoring, resiliency, security, and other edge services on the cloud platform, and acts as the "front desk" for all service APIs.
If you are not familiar with it, that is fine. Let's start with one of Zuul's basic functions, routing.
As the "front desk", Zuul forwards each incoming request to a specific service according to certain rules. For this function alone, Zuul usually carries heavy traffic. As in the second test of the previous section, it contains very little business logic of its own, so its response time is determined mainly by the response time of the service it routes to.
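As a rough illustration of rule-based routing, here is a minimal sketch in plain Java. It is not Zuul's implementation: the class RouteTableSketch, the prefix-matching rule, and the service names are all hypothetical, chosen only to show a gateway mapping a request path to a backend service.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch of prefix-based routing; not Zuul's actual code.
class RouteTableSketch {
    // Path prefix -> backend service name, checked in insertion order.
    private final Map<String, String> routes = new LinkedHashMap<>();

    RouteTableSketch addRoute(String pathPrefix, String serviceId) {
        routes.put(pathPrefix, serviceId);
        return this;
    }

    // Returns the first service whose prefix matches the request path, if any.
    Optional<String> resolve(String requestPath) {
        for (Map.Entry<String, String> route : routes.entrySet()) {
            if (requestPath.startsWith(route.getKey())) {
                return Optional.of(route.getValue());
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        RouteTableSketch table = new RouteTableSketch()
                .addRoute("/users/", "user-service")    // made-up service names
                .addRoute("/orders/", "order-service");
        System.out.println(table.resolve("/users/42"));
        System.out.println(table.resolve("/unknown"));
    }
}
```

The gateway itself does almost no work per request beyond this lookup, which is why its latency is dominated by the backend it forwards to.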
Before 2016, Netflix's Zuul 1 was essentially a servlet-based web application, so it was multithreaded and blocking: each HTTP connection was handled by its own thread. In Zuul 1, an I/O operation is executed on a worker thread, and the thread handling the request stays blocked until the worker thread notifies it that the I/O operation has completed.
This approach is similar to the restTemplate-as-caller project in the last test of the previous section, except that Zuul 1 does not use a Reactor Scheduler. It probably uses an ExecutorService thread pool instead, implemented in a similar way.
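The blocking pattern described above can be sketched with the JDK's ExecutorService. This is an assumption-laden illustration, not Zuul 1 code: BlockingGatewaySketch, callBackend, and handle are hypothetical names, and Thread.sleep stands in for a slow backend call.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch of the blocking, thread-based model; not Zuul 1 code.
class BlockingGatewaySketch {
    // Stand-in for a backend call: blocks the worker thread for latencyMs.
    static String callBackend(String path, long latencyMs) throws InterruptedException {
        Thread.sleep(latencyMs);
        return "response for " + path;
    }

    // Handles one request: hands the I/O to a worker thread,
    // then blocks the calling (request) thread until the worker is done.
    static String handle(ExecutorService ioPool, String path, long latencyMs) {
        Future<String> pending = ioPool.submit(() -> callBackend(path, latencyMs));
        try {
            return pending.get(); // the request thread is parked here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("backend call failed", e.getCause());
        }
    }

    public static void main(String[] args) {
        ExecutorService ioPool = Executors.newFixedThreadPool(4); // pool size is arbitrary
        try {
            System.out.println(handle(ioPool, "/users/42", 100));
        } finally {
            ioPool.shutdown();
        }
    }
}
```

Each in-flight request ties up two threads for the full backend latency, one waiting and one sleeping, which is why slow backends exhaust the pool long before the CPU is busy.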
Netflix's services are hosted on AWS. This model normally works well on multi-core AWS cloud instances, but it can set off a chain reaction when a backend service runs into trouble.
Recall the restTemplate-as-caller result from the last test: with a 100 ms delay in service B and 6000 concurrent users, the 95th percentile response time of service A reaches 236 ms. If a service C depends on service A, its response time is longer still. On top of that, impatient users may refresh the page repeatedly, which makes the situation worse. If you look at the CPU load at this point, it may not be high at all, because almost all requests are blocked and queued.
This is a vicious cycle. In a distributed system with many dependencies, similar congestion and even call failures are inevitable. Netflix developed the Hystrix component (also integrated in Spring Cloud) to address this problem. It provides circuit breaking, isolation, fallback, caching, monitoring, and other functions, so that the system stays available when one or more dependencies time out, throw exceptions, or otherwise fail.
But Hystrix is not a fundamental solution. The root of the problem is the synchronous, blocking service invocation. So Netflix developed Zuul 2, which is based on Netty and processes requests in an asynchronous, non-blocking manner: one thread is dedicated to each CPU core, and the whole lifetime of every request is handled through events and callbacks on the event loop.
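To make the event-and-callback model concrete, here is a minimal sketch using only the JDK. It does not use Netty and is not Zuul 2 code: a single-threaded ScheduledExecutorService plays the role of one event loop, a scheduled task simulates the backend response arriving, and EventLoopSketch, callBackend, and serveAll are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of an event loop with callbacks; not Netty or Zuul 2 code.
class EventLoopSketch {
    // Simulated non-blocking backend call: the response shows up as an event
    // on the loop after latencyMs, and no thread sleeps while waiting for it.
    static CompletableFuture<String> callBackend(ScheduledExecutorService loop,
                                                 String path, long latencyMs) {
        CompletableFuture<String> response = new CompletableFuture<>();
        loop.schedule(() -> { response.complete("response for " + path); },
                latencyMs, TimeUnit.MILLISECONDS);
        return response;
    }

    // Starts `count` requests at once on a single loop thread and returns
    // the elapsed milliseconds until every one of them has completed.
    static long serveAll(int count, long latencyMs) {
        ScheduledExecutorService loop = Executors.newSingleThreadScheduledExecutor();
        try {
            long start = System.nanoTime();
            List<CompletableFuture<String>> inFlight = new ArrayList<>();
            for (int i = 0; i < count; i++) {
                // The callback is registered now and runs later, when the response arrives.
                inFlight.add(callBackend(loop, "/item/" + i, latencyMs)
                        .thenApply(String::toUpperCase));
            }
            CompletableFuture.allOf(inFlight.toArray(new CompletableFuture[0])).join();
            return (System.nanoTime() - start) / 1_000_000;
        } finally {
            loop.shutdown();
        }
    }

    public static void main(String[] args) {
        // 1000 requests with 100 ms simulated latency, served by one thread.
        System.out.println("elapsed ms: " + serveAll(1000, 100));
    }
}
```

Because waiting costs no thread, the total time stays close to a single backend latency rather than growing with the number of requests, which is the property the blocking model lacks.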
From the point of view of resource cost, there is no separate thread for each request, so the cost of a connection is just a file descriptor and a callback listener. This avoids the waste caused by CPU thread context switching and by the large amount of memory needed for thread stacks, so the cost of an HTTP connection drops significantly. In addition, because the work stays on one thread per core, the CPU can make better use of its L1 and L2 caches than it can when bouncing among hundreds of threads, which improves performance further.
Zuul 2 did not disappoint Netflix: it comfortably handles the persistent connections of 83 million users (with multiple devices or browsers per user).
Unlike compute-intensive applications, web applications are usually highly concurrent and I/O-intensive. This is especially true in microservices architectures, where CPU execution time is typically much shorter than the time spent blocked. The larger that gap, the greater the performance gain from asynchronous, non-blocking processing. As this case shows, replacing blocking, multithreaded processing with an asynchronous, non-blocking approach is an effective way to improve performance.
(8) Netflix's asynchronous transformation of its API gateway, from the series "Response Spring's word Wizard"