Use Netflix Hystrix to write resilient, fault-tolerant applications


Resilience is the ability to provide and maintain an acceptable level of service and normal operation in the face of various failures and challenges in a complex network environment.
– Wikipedia

Ever since service-oriented architectures and, more recently, microservices became widely known and adopted, many application developers have broken monolithic APIs into simple, single-purpose microservices. However, this conversion comes at a cost: extra overhead is needed to keep response times consistent and to stay resilient when dependencies become unavailable. For example, a monolithic web application that performs a retry is, to some extent, resilient, because it can recover when a dependency such as a database or another service is unavailable. That recovery comes without any additional network overhead or code complexity.

For a service that orchestrates several dependencies, each invocation is expensive, and a failure degrades the user experience. Attempts to recover from the failure make things worse, because they put even more pressure on the back-end services.

Circuit breaker pattern

Consider a typical use case: on Black Friday, the servers of an e-commerce website are overloaded, and under the heavy load the payment vendor's system goes offline for a few seconds. Because requests are highly concurrent, users begin to see a long, unresponsive checkout. Threads on every application server block while they wait for a response from the vendor, and after a long wait the end result is still a failure.

These events leave shopping carts in an invalid state. Users then try to refresh or resubmit their orders, which further increases the load on application servers that have already accumulated a large number of waiting threads, and the network becomes congested.

A circuit breaker is a simple construct that constantly watches for failures. In the scenario above, when the breaker detects long waits on calls to the vendor interface, it fails fast: it returns an error response to the user instead of keeping threads waiting. The circuit breaker thus prevents users from waiting too long.
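The fail-fast idea can be illustrated with a bounded wait. The following is a minimal sketch that uses only java.util.concurrent, not Hystrix; the names BoundedCall and callWithTimeout, the timeout, and the fallback value are hypothetical and chosen for illustration. Instead of blocking for as long as the vendor takes, the caller waits a fixed time and then gets a fallback.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical helper: bounds how long a caller waits for a dependency. */
final class BoundedCall {
    private BoundedCall() {}

    /** Runs the task on the pool and returns the fallback if it is slow or fails. */
    static <T> T callWithTimeout(ExecutorService pool, Callable<T> task,
                                 long timeoutMillis, T fallback) {
        Future<T> future = pool.submit(task);
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            future.cancel(true); // interrupt the worker so blocked threads do not pile up
            return fallback;
        } catch (ExecutionException e) {
            return fallback;     // the dependency failed outright
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            return fallback;
        }
    }
}
```

A checkout handler could call BoundedCall.callWithTimeout(pool, paymentCall, 500, "payment unavailable") and show the fallback message right away. A timeout alone still sends every request to the failing vendor, which is the gap a circuit breaker closes.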

The basic idea behind the circuit breaker is very simple. You wrap a protected function call in a circuit breaker object, which monitors for failures. Once the failures reach a certain threshold, the circuit breaker trips, and all further calls to the circuit breaker return with an error, without the protected call being made at all. Usually you'll also want some kind of monitor alert if the circuit breaker trips.
– Martin Fowler
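The description above maps onto a very small wrapper class. This is a hedged sketch rather than Hystrix's or Fowler's implementation: the class name SimpleCircuitBreaker is hypothetical, counting consecutive failures is an assumption (real breakers often use a failure rate over a time window), and the sketch deliberately has no recovery step, so a tripped breaker stays tripped.

```java
import java.util.function.Supplier;

/** Illustrative only: trips after a fixed number of consecutive failures. */
final class SimpleCircuitBreaker {
    private final int failureThreshold;
    private int consecutiveFailures = 0;

    SimpleCircuitBreaker(int failureThreshold) {
        this.failureThreshold = failureThreshold;
    }

    synchronized boolean isTripped() {
        return consecutiveFailures >= failureThreshold;
    }

    /** Runs the protected call unless the breaker has tripped. */
    <T> T call(Supplier<T> protectedCall) {
        if (isTripped()) {
            // Fail fast: the protected call is not made at all.
            throw new IllegalStateException("circuit open: call rejected");
        }
        try {
            T result = protectedCall.get();
            recordSuccess();
            return result;
        } catch (RuntimeException e) {
            recordFailure();
            throw e;
        }
    }

    private synchronized void recordSuccess() {
        consecutiveFailures = 0;
    }

    private synchronized void recordFailure() {
        consecutiveFailures++;
    }
}
```

With a threshold of 2, the first two failing calls reach the dependency; every later call is rejected immediately without touching it.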

Recovery time is critical for the underlying resources. A fail-fast circuit breaker protects overloaded systems and lets downstream services recover quickly.

A circuit breaker is always active in the system, constantly monitoring calls to dependencies. When the failure rate is high, it stops failing calls from going through for a short period of time, rather than simply returning a standard error for each one.

eBay and circuit breakers

Early on, we used a simple scheme called AUTO_MARK_DOWN to keep calls from waiting on a slow dependency. It short-circuited the failing calls until they were restored through MARK_UP. An automated inspection system periodically checked the AUTO_MARK_DOWN state of each dependency on each machine and performed MARK_UP.

However, the automated inspection system and the MARK_UP facility were not embedded in the application; they lived outside it. Because there was no continuous feedback on request volume and failure rates, a dependency that was still failing could be marked up without verification. Relying on this setup also produced false positives, which could not be evaluated because the checking system sat outside the client.

Another major flaw in this design was that it offered no comprehensive, real-time monitoring of all application dependencies. The old system was slow and unstable: without constant telemetry, it blindly marked up every dependency in the AUTO_MARK_DOWN state and relied on the application to mark a dependency down again on further failures. The results were unpredictable and difficult to assess correctly.

Recovery in a circuit breaker

The circuit breaker takes care of tripping the dependency at the right time. A more sophisticated system also has to keep checking whether the dependency is available again and, if it is, let calls to it go through.

This behavior can be achieved in two ways:
1. Allow all calls to go through at a regular time interval and check for errors.
2. Allow a single call to go through, at a more frequent rate, to gauge availability.

AUTO_MARK_DOWN was a variation of the first approach: the circuit was closed again without any proof of recovery, and errors were relied on to identify a problem.

The second approach is a more sophisticated mechanism. It does not let multiple calls go through at the same time, because those calls might take a long time to execute and still fail. Letting only a single call through means the check can run more often, so the circuit closes sooner and the system recovers faster.

The ideal circuit breaker

A harmonious system is one that has an ideal circuit breaker, real-time monitoring, and fast recovery, so that the application becomes truly resilient and fault tolerant.

Circuit Breaker + real-time monitoring + recovery = resilient and fault-tolerant
– Anonymous

Take the e-commerce website above as an example. In a resilient system, the circuit breaker continuously evaluates the system. When the payment processor fails, the breaker identifies the long wait times on calls to the vendor, breaks the circuit, and fails fast. As a result, users are informed of the problem and the vendor has enough time to recover.

Meanwhile, the circuit breaker keeps sending requests to check whether the vendor system has recovered. If it has, the circuit breaker closes the circuit and allows the remaining calls to go through, which effectively eliminates the network congestion and the long waits.

Netflix Hystrix

Hystrix is a latency and fault tolerance library designed to isolate points of access to remote systems, services and 3rd party libraries, stop cascading failure and enable resilience in complex distributed systems where failure is inevitable.
– Netflix

Since its inception in 2012, Hystrix has become the solution of choice for many systems trying to improve their ability to withstand failures. It has a fairly mature API and a highly tunable configuration system that lets application developers get the most out of their calls to dependent services.

Hystrix circuit breaker states

The following state diagram describes how a resilient, fault-tolerant system behaves in each state of the circuit breaker's life cycle.

Normal operation (Closed)

When the system runs normally, a success counter measures the health of the resilient system and a failure counter tracks any failures. This design ensures that when the failure threshold is reached, the breaker opens the circuit to prevent further calls to the dependent resource.

Failure state (Open)

In this state, every call to the dependency is short-circuited and a HystrixRuntimeException is thrown with the failure type SHORTCIRCUIT, which states the cause clearly. Once the sleep window expires, the Hystrix circuit breaker moves to the half-open state.

Half-open state

In this state, Hystrix lets the first request through to check whether the dependency is available, and makes all other requests fail fast until the response comes back. If the call succeeds, the breaker is reset to the closed state; if it fails, the breaker returns to the open state, and the cycle continues.

How to use Hystrix

The Hystrix GitHub wiki has comprehensive documentation on how to use the library. Usage is simple: create a command class with the Hystrix library that wraps the service call, then invoke it.

import com.netflix.hystrix.HystrixCommand;
import com.netflix.hystrix.HystrixCommandGroupKey;

public class CommandHelloWorld extends HystrixCommand<String> {

    private final String name;

    public CommandHelloWorld(String name) {
        // Commands in the same group share reporting and default settings.
        super(HystrixCommandGroupKey.Factory.asKey("ExampleGroup"));
        this.name = name;
    }

    @Override
    protected String run() {
        // A real example would do work like a network call here.
        return "Hello " + name + "!";
    }
}

Reference: https://github.com/netflix/hystrix/wiki/getting-started
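A Hystrix command such as the one above is invoked through methods on the command object: execute() blocks for the result, queue() returns a Future, and observe() returns an RxJava Observable. The sketch below imitates only the first two with plain java.util.concurrent so that it runs without the Hystrix dependency. SketchCommand and GreetingCommand are hypothetical stand-ins, not Hystrix classes, and the sketch has no circuit breaker, timeout, or fallback.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;

/** Hypothetical stand-in for a command object; not part of Hystrix. */
abstract class SketchCommand<T> {
    private final ExecutorService pool;

    SketchCommand(ExecutorService pool) {
        this.pool = pool;
    }

    /** The wrapped dependency call, supplied by subclasses. */
    protected abstract T run() throws Exception;

    /** Asynchronous invocation: returns immediately with a Future. */
    Future<T> queue() {
        Callable<T> task = this::run;
        return pool.submit(task);
    }

    /** Synchronous invocation: blocks until the result is available. */
    T execute() {
        try {
            return queue().get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("command failed", e.getCause());
        }
    }
}

final class GreetingCommand extends SketchCommand<String> {
    private final String name;

    GreetingCommand(ExecutorService pool, String name) {
        super(pool);
        this.name = name;
    }

    @Override
    protected String run() {
        return "Hello " + name + "!";
    }
}
```

Calling new GreetingCommand(pool, "World").execute() returns the greeting directly, while queue() hands back a Future that the caller can resolve later, which is useful when several dependencies are called in parallel.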

Internally, this class uses the RxJava library to invoke the service dependency asynchronously. This design makes efficient use of the application's threads, maximizes performance, and manages calls to dependent resources intelligently. For application developers who manage dependencies through parallel processing with deferred calls, Hystrix also exposes a Future.

How eBay uses Hystrix

At eBay, many applications have started using Hystrix, either as a standalone library or through our platform wrapper. The platform wrapper exposes the Hystrix configuration through JMX beans, which makes centralized management convenient. For critical systems, our wrapper also injects custom Hystrix plugins that capture the published metrics in real time and feed them to our monitoring system.

The Hystrix dashboard, as part of the core server monitoring system, lets teams view how their application's dependencies behave at different times.

The execution hooks provided by Hystrix are a key part of this integration, because they enable real-time monitoring and alerting, especially for errors and fallback failures. This helps us investigate and resolve problems faster, with little impact on users.

eBay example: Secure Token service

eBay hosts a range of internal and external API services. All of these services are authenticated by tokens, and the Secure Token service issues and validates those tokens. Calls to the token service are now wrapped in the Hystrix circuit breaker, which keeps the Secure Token service highly available. When the token service is overloaded, the circuit breaker opens the circuit, which relieves pressure on the token service and lets other services keep functioning normally.

A circuit breaker is provided by default in the Hystrix library. Its behavior can be summarized as follows:
1. The circuit breaker tracks the status of every call.
2. In the closed state, requests are allowed through.
3. In the open state, all requests fail.
4. In the half-open state (entered when the sleep window elapses), a single request is allowed through, and the breaker moves to the closed or open state depending on whether that request succeeds or fails.
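The four points above can be sketched as a small state machine. This is an illustrative simplification, not the Hystrix implementation: ThreeStateBreaker is a hypothetical class, it trips on a count of failures rather than on rolling statistics, it returns a caller-supplied fallback instead of throwing when the circuit is open, and the clock is injected so the sleep window can be exercised without real waiting.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

/** Illustrative closed / open / half-open breaker; not Hystrix code. */
final class ThreeStateBreaker {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private final int failureThreshold;
    private final long sleepWindowMillis;
    private final LongSupplier clock; // current time in milliseconds
    private State state = State.CLOSED;
    private int failures = 0;
    private long openedAt = 0;

    ThreeStateBreaker(int failureThreshold, long sleepWindowMillis, LongSupplier clock) {
        this.failureThreshold = failureThreshold;
        this.sleepWindowMillis = sleepWindowMillis;
        this.clock = clock;
    }

    synchronized State currentState() {
        return state;
    }

    /** Closed: allow. Open: allow one probe once the sleep window has elapsed. */
    private synchronized boolean allowRequest() {
        if (state == State.CLOSED) {
            return true;
        }
        if (state == State.OPEN && clock.getAsLong() - openedAt >= sleepWindowMillis) {
            state = State.HALF_OPEN; // this caller becomes the single probe
            return true;
        }
        return false; // open inside the window, or a probe is already in flight
    }

    private synchronized void onSuccess() {
        failures = 0;
        state = State.CLOSED;
    }

    private synchronized void onFailure() {
        failures++;
        if (state == State.HALF_OPEN || failures >= failureThreshold) {
            state = State.OPEN;
            openedAt = clock.getAsLong(); // restart the sleep window
        }
    }

    /** Runs the protected call, or fails fast to the fallback. */
    <T> T call(Supplier<T> protectedCall, Supplier<T> fallback) {
        if (!allowRequest()) {
            return fallback.get();
        }
        try {
            T result = protectedCall.get();
            onSuccess();
            return result;
        } catch (RuntimeException e) {
            onFailure();
            return fallback.get();
        }
    }
}
```

In production code the clock would typically be System::currentTimeMillis. A failed probe reopens the circuit and restarts the sleep window, which gives the loop between the open and half-open states described above.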

Summary

Hystrix is more than a circuit breaker: it is a complete library with extensive monitoring capabilities that can be plugged into existing systems easily. We have started exploring the library's request collapsing and request caching features for future use. Of course, there are other Java implementations, such as the Akka and Spring circuit breakers. In our resilient environment, however, Hystrix has proven to be a mature library that has maintained high availability over time.

References

1. http://wiki.ittc.ku.edu/resilinets_wiki/index.php/Definitions#Resilience
2. Martin Fowler
3. Hystrix GitHub
4. http://techblog.netflix.com/2012/11/hystrix.html
5. https://github.com/Netflix/Hystrix/wiki
6. http://techblog.netflix.com/2011/12/making-netflix-api-more-resilient.html
