Netflix Officially Open Sources Its API Gateway Zuul 2


On May 21, Netflix announced on its official blog that it had open sourced its microservices gateway component Zuul 2. Netflix is a model for the microservices community, with successful large-scale, production-grade microservices deployments and a significant number of open source microservice components (see its GitHub homepage) that are highly regarded by industry peers. Zuul is a gateway component that Netflix open sourced on June 12, 2013; it currently has more than 4,000 stars on GitHub, and companies including Riot, Ctrip, and PPDAI have used it in production environments.

Zuul is the name of a monster (the Zerg in StarCraft also have a Zuul); Netflix named its gateway Zuul to evoke a gatekeeper beast. Around 2013, InfoQ interviewed former Netflix architecture director Adrian Cockcroft and asked him: "Netflix has open sourced so many projects; which one do you think is the most indispensable?" Adrian replied: "Among the NetflixOSS open source projects, there is one that is easy to overlook, yet it is one of Netflix's most powerful infrastructure services: the Zuul gateway service. The Zuul gateway is primarily used for intelligent routing, and it also supports authentication, region- and content-aware routing, and aggregating multiple underlying services into a unified external API. One highlight of the Zuul gateway is its dynamic programmability: configuration changes can take effect in seconds." From Adrian's answer, we can sense how important the Zuul gateway is to a microservices infrastructure.

Because Zuul was open sourced relatively early, its architecture had some problems, so in September 2016 Netflix announced that it would redesign Zuul. The original Zuul used a synchronous blocking architecture; the redesigned version, called Zuul 2, adopts an asynchronous non-blocking architecture. The main architectural difference between Zuul 2 and Zuul 1 is that Zuul 2 runs on an asynchronous, non-blocking framework, namely Netty. Zuul 1 relies on multithreading to scale throughput, while Zuul 2's Netty framework relies on an event loop and callbacks.

The following is an introduction to Zuul 2 from the official Netflix blog, for readers' reference.

The Netflix Cloud Gateway team runs and maintains more than 80 Zuul 2 clusters, distributing traffic to approximately 100 (and growing) backend service clusters at more than 1 million requests per second. Almost all of this traffic comes from the client devices and browsers that power the familiar discovery and playback experience.

This article details some of the interesting features of Zuul 2 that Netflix released today, and discusses some of the other projects we are building with Zuul 2.

How Zuul 2 Works

The following is a general architectural diagram of Zuul 2:

The Netty handlers at the front and back of the filter chain are primarily responsible for handling the network protocol, the web server, connection management, and proxying. With that internal work abstracted away, all the major work is handed to the filters. Inbound filters run before the request is proxied and can be used for authentication, routing, or decorating the request. Endpoint filters can be used to return a static response or to proxy the request to a backend service. Outbound filters run after a response is returned and can be used for things such as gzip compression, metrics, or adding and removing custom headers.

The functionality of Zuul is almost entirely dependent on the logic of each filter. This means it can be deployed in a variety of contexts, solving different problems according to which filters are configured and running.
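
To make the inbound → endpoint → outbound flow concrete, here is a minimal, self-contained Java sketch of the filter-chain idea. It is an illustration only, not the actual Zuul 2 API: the class name and the string-based request/response model are invented for brevity.

```java
import java.util.List;
import java.util.function.UnaryOperator;

// Illustrative sketch of Zuul 2's three filter stages (not the real Zuul API):
// inbound filters run before proxying, one endpoint filter produces the
// response, and outbound filters run on the way back out. Requests and
// responses are modeled as plain strings for brevity.
public class FilterChainSketch {
    static String run(String request,
                      List<UnaryOperator<String>> inbound,
                      UnaryOperator<String> endpoint,
                      List<UnaryOperator<String>> outbound) {
        for (UnaryOperator<String> f : inbound) {
            request = f.apply(request);            // e.g. authentication, routing, decoration
        }
        String response = endpoint.apply(request); // proxy to a backend, or a static response
        for (UnaryOperator<String> f : outbound) {
            response = f.apply(response);          // e.g. gzip, metrics, custom headers
        }
        return response;
    }

    public static void main(String[] args) {
        String out = run("GET /play",
                List.of(req -> req + " [authenticated]"),
                req -> "response-for(" + req + ")",
                List.of(resp -> resp + " [gzipped]"));
        System.out.println(out); // response-for(GET /play [authenticated]) [gzipped]
    }
}
```

Because each stage is just a list of functions, swapping filters in and out reconfigures the gateway without touching the core pipeline, which mirrors the point made above.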

We use Zuul at the front door for all external traffic entering Netflix's cloud services, and we have also started using it to route internal traffic. The core architecture is the same in both roles, though far fewer functional filters are needed when Zuul serves as an internal traffic gateway.

Official Open Source

The Zuul code that runs today is the most stable and resilient version of Zuul. After several stages of code base evolution and refactoring, we are delighted to share it with you.

Today we will release many core features. Here are some of the most exciting:

Server protocols

    • HTTP/2: full server support for inbound HTTP/2 connections

    • Mutual TLS: support for running Zuul in more secure scenarios

Resiliency features

    • Adaptive retries: the core retry logic used by Netflix for enhanced resiliency and availability

    • Origin concurrency protection: configurable concurrency limits to avoid overloading origins and to isolate individual origins behind Zuul

Operational features

    • Request passport: tracks all lifecycle events for each request, which is invaluable for debugging asynchronous requests

    • Status categories: an enumeration of possible success and failure states for requests, finer-grained than HTTP status codes

    • Request attempts: tracks each proxy attempt and its status, especially useful for debugging retries and routing

We are also working on some upcoming features, including:

    • WebSocket/SSE: support for side-channel push notifications

    • Throttling and rate limiting: protection against abusive client connections and requests, helping defend against large-scale attacks

    • Brownout filters: disabling certain CPU-intensive features when Zuul is overloaded

    • Configurable routing: file-based routing configuration, instead of having to create routing filters in Zuul

How Netflix Uses Zuul 2

At Netflix, we have built several major features on Zuul 2 that are not yet open source. Each one deserves its own blog post, but for now here is a brief introduction.

Self-service routing

The feature most widely used by our partners is self-service routing. We give users an application and APIs to create routing rules based on any criteria in the request URL, path, query parameters, or headers, and we then publish those routing rules to all Zuul instances.
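
Such a rule can be thought of as a predicate over the request plus the target origin for matching traffic. The sketch below is a hypothetical model of that idea, not Zuul's implementation; all names (RoutingRules, Rule, Request) are invented for illustration.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical model of a self-service routing rule (names invented, not the
// Zuul implementation): a predicate over the request plus the origin that
// matching traffic should be routed to. The first matching rule wins.
public class RoutingRules {
    record Request(String host, String path, Map<String, String> headers) {}
    record Rule(Predicate<Request> matches, String origin) {}

    static String route(List<Rule> rules, Request req, String defaultOrigin) {
        return rules.stream()
                .filter(r -> r.matches().test(req))
                .map(Rule::origin)
                .findFirst()
                .orElse(defaultOrigin);
    }

    public static void main(String[] args) {
        List<Rule> rules = List.of(
                // Split traffic: map a path prefix to a staging cluster.
                new Rule(r -> r.path().startsWith("/api/v2"), "staging-cluster"),
                // Onboard a new service: map a hostname to its origin.
                new Rule(r -> r.host().equals("beta.example.com"), "beta-origin"));
        Request req = new Request("www.example.com", "/api/v2/play", Map.of());
        System.out.println(route(rules, req, "prod-cluster")); // staging-cluster
    }
}
```

Publishing a new rule to every Zuul instance then amounts to distributing a new entry in this list, which is what makes the feature self-service.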

The primary use case is to route traffic to a specific test or staging cluster. However, there are many use cases for actual production traffic. For example:

    • Services that need to split traffic create routing rules that map certain paths or prefixes to separate origins

    • Developers onboard new services by creating a route that maps a new hostname to their new origin

    • Developers run load tests by routing a percentage of existing traffic to a small cluster and verifying that the application degrades gracefully under load

    • Teams refactoring an application migrate to a new origin gradually by creating rules that map traffic incrementally, one path at a time

    • Teams test changes (canary testing) by sending a small percentage of traffic to an instrumented cluster running the new build

    • If a team's tests require multiple consecutive requests to hit the new build, they run sticky canary tests that route the same users to the new build for short periods

    • Security teams create rules that reject "malicious" requests in all Zuul clusters based on paths or headers

As you can see, we use self-service routing extensively, and are increasing the customization and scope of routing to support more use cases.

Resilient Load Balancing

The other key feature we have been investing in is making load balancing smarter. We are able to route around failures, slowness, GC problems, and the various other issues that frequently arise when running large numbers of nodes. The goal of this work is to improve the resiliency, availability, and quality of service of all Netflix services.

Here are a few of the cases we've dealt with:

Cold instances

When new origin instances start up, we send them reduced traffic for a period of time, ramping up until they are warm. We observed this problem in applications with large codebases and huge in-memory metadata spaces: these applications need significant time to JIT-compile their code before they are ready to handle a lot of traffic.

If we happen to hit a cold instance that is responding slowly, we also bias traffic toward older instances, and we can always retry against already-warm instances. This has given us an order-of-magnitude improvement in availability.
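
One simple way to express such a ramp-up is a traffic weight that grows with instance age. This is an illustrative sketch only: Netflix has not published its exact rule, and the warm-up window and traffic floor below are assumed tunables.

```java
// Illustrative cold-start ramp-up (Netflix has not published its exact rule;
// the warm-up window and traffic floor here are assumed tunables): a newly
// started instance's traffic weight grows linearly with its age until the
// warm-up window has elapsed.
public class ColdStartWeight {
    static double trafficWeight(long instanceAgeSeconds, long warmupSeconds) {
        if (instanceAgeSeconds >= warmupSeconds) return 1.0; // fully warm
        double ramp = (double) instanceAgeSeconds / warmupSeconds;
        return Math.max(0.1, ramp); // floor, so brand-new instances still receive warm-up traffic
    }

    public static void main(String[] args) {
        System.out.println(trafficWeight(30, 120));  // 0.25: a quarter of normal traffic
        System.out.println(trafficWeight(300, 120)); // 1.0: past the warm-up window
    }
}
```

A load balancer would multiply each instance's base share of traffic by this weight, so a cold instance slowly earns its full share as its JIT warms up.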

High error rate

Errors happen all the time for a variety of reasons, whether because of bugs in code, bad instances, or invalid configuration properties. Fortunately, as a proxy we can detect errors reliably, whether they are 5xx errors or problems connecting to the service.

We track error rates for each origin; a high error rate means the whole service is likely in trouble, so we limit device retries and disable internal retries to give the service room to recover. In addition, we track consecutive failures per instance and blacklist failing instances for a period of time.
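
The per-instance part of this can be sketched as a small bookkeeping class. This is an illustration of the idea only, not Netflix's implementation; the failure threshold and cooldown length are assumed tunables.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of per-instance failure tracking (threshold and cooldown are assumed
// tunables, not Netflix-published values): after a run of consecutive failures
// an instance is blacklisted for a cooldown period; any success resets the
// failure streak.
public class InstanceHealth {
    static final int MAX_CONSECUTIVE_FAILURES = 3;
    static final long BLACKLIST_MILLIS = 30_000;

    private final Map<String, Integer> consecutiveFailures = new HashMap<>();
    private final Map<String, Long> blacklistedUntil = new HashMap<>();

    void recordResult(String instance, boolean success, long nowMillis) {
        if (success) {
            consecutiveFailures.put(instance, 0); // a success resets the streak
            return;
        }
        int failures = consecutiveFailures.merge(instance, 1, Integer::sum);
        if (failures >= MAX_CONSECUTIVE_FAILURES) {
            blacklistedUntil.put(instance, nowMillis + BLACKLIST_MILLIS);
        }
    }

    boolean isAvailable(String instance, long nowMillis) {
        return blacklistedUntil.getOrDefault(instance, 0L) <= nowMillis;
    }

    public static void main(String[] args) {
        InstanceHealth health = new InstanceHealth();
        for (int i = 0; i < 3; i++) health.recordResult("i-abc", false, 1_000);
        System.out.println(health.isAvailable("i-abc", 2_000));  // false: blacklisted
        System.out.println(health.isAvailable("i-abc", 40_000)); // true: cooldown elapsed
    }
}
```

Using a cooldown rather than a permanent blacklist lets a recovered instance automatically rejoin the pool, which matches the "for a period of time" behavior described above.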

Overloaded instances

With this approach, we send less traffic to servers in a cluster that are throttling or refusing connections, and we mitigate the impact by retrying those failed requests on other servers.

We are now adding a further approach that aims to avoid overloading servers in the first place. Origins report their current utilization to Zuul, and Zuul uses that utilization as a factor in its load-balancing decisions, reducing error rates, retries, and latency.

Each origin adds a header to all responses stating its utilization as a percentage, along with a target utilization it would like for the cluster as a whole. How the percentage is computed is entirely up to each application, and engineers can use whatever metric suits them best. Letting each application decide yields a general solution, as opposed to us imposing a single approach.

With this feature, we assign a score to each instance (a combination of the instance's utilization and other factors) and make a choice-of-two load-balancing decision.
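
The choice-of-two decision itself is simple: sample two instances at random and route to the one with the better score. The sketch below illustrates that mechanism only; it uses the reported utilization alone as the score, whereas, as noted above, the real scheme combines utilization with other factors.

```java
import java.util.List;
import java.util.Random;

// Sketch of "choice-of-two" load balancing: pick two instances at random and
// send the request to the one with the better (lower) score. Here the score
// is just the server-reported utilization; the real scheme combines it with
// other factors.
public class TwoChoiceBalancer {
    record Instance(String id, double utilization) {}

    static Instance choose(List<Instance> instances, Random rng) {
        Instance a = instances.get(rng.nextInt(instances.size()));
        Instance b = instances.get(rng.nextInt(instances.size()));
        return a.utilization() <= b.utilization() ? a : b;
    }

    public static void main(String[] args) {
        List<Instance> pool = List.of(
                new Instance("i-hot", 0.90),   // heavily loaded
                new Instance("i-cool", 0.20)); // lightly loaded
        // i-cool wins whenever at least one of the two random draws picks it,
        // so over many requests it receives roughly three quarters of the traffic.
        System.out.println(choose(pool, new Random()).id());
    }
}
```

Sampling two candidates instead of scanning the whole pool keeps the decision cheap at gateway scale while still steering traffic strongly away from overloaded instances.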

Anomaly detection and contextual alerting

As we have grown from a handful of origins to an environment where anyone can quickly spin up a container cluster and deploy it behind Zuul, we have found it necessary to automatically detect and pinpoint origin failures.

Thanks to Mantis real-time event streams, we built an anomaly detector that aggregates error rates per service and notifies us in real time when a service is in trouble. Taking all the anomalies within a given time window, it builds a timeline of all the problematic origins. We then create a contextual alert e-mail containing the timeline of events and the services affected. This lets operators quickly correlate these events, orient themselves to debug a specific app or feature, and ultimately find the root cause.

In fact, sending these notifications to the origin teams themselves proved very useful. In addition to Zuul, we have added more internal applications to build an even broader event timeline. This has been a tremendous help during production incidents, helping operators quickly detect and fix problems before they become serious outages.

Original address: https://medium.com/netflix-techblog/open-sourcing-zuul-2-82ea476cb2b3
