1. Circuit Breaker
In a microservices architecture, the system consists of many services that may depend on one another. When one service fails or the network becomes unavailable, the failure can spread along those dependencies and eventually paralyze the whole system, which makes such an architecture less stable than a traditional one. The circuit breaker pattern was introduced to address this problem.
A circuit breaker is originally an electrical switch that protects a circuit from overload. When a short circuit occurs, the breaker promptly cuts off the power to prevent overload, overheating, or even fire.
The circuit breaker pattern plays a similar role in a distributed architecture. When a microservice fails, the circuit breaker's fault monitoring returns an error response to the caller immediately instead of making it wait for a long time. The calling thread is therefore not tied up for long periods by the failing service, and the failure does not spread through the distributed system.
Netflix Hystrix
Spring Cloud uses Hystrix to implement the circuit breaker. Hystrix is part of Netflix's suite of tools for distributed systems. It aims to provide greater tolerance for latency and failure by controlling the points at which an application accesses remote systems, services, and third-party libraries. Hystrix features thread and semaphore isolation, fallbacks, circuit breaking, request caching, request collapsing, and monitoring and configuration.
In the previous article, we created a service registry, two service providers, and two service consumers.
How the Circuit Breaker Works
The fallback (service degradation) logic in the service consumer is triggered when a call to a dependent service made through a Hystrix command times out; in other words, a call that times out is routed to the fallback logic. However, even with the Hystrix timeout limiting how long each call can wait, slow calls can still pile up.
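The timeout-then-fallback behavior described above can be sketched in plain Java. This is only an illustration and not how Hystrix is implemented: the class `TimeoutFallback`, the method `callWithFallback`, and the simulated `slowService` are hypothetical names chosen for this example.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

// Hypothetical helper, not part of Hystrix: runs a call with a timeout and
// returns the fallback value if the call is too slow or throws.
final class TimeoutFallback {

    static <T> T callWithFallback(Supplier<T> primary, Supplier<T> fallback, long timeoutMillis) {
        // Run the primary logic on another thread so the caller can stop waiting.
        CompletableFuture<T> future = CompletableFuture.supplyAsync(primary);
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException | ExecutionException e) {
            // Marks the future as cancelled; CompletableFuture does not
            // interrupt a task that is already running.
            future.cancel(true);
            return fallback.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return fallback.get();
        }
    }

    // Simulates a slow dependency (hypothetical).
    static String slowService() {
        try {
            Thread.sleep(500);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return "real response";
    }
}
```

Calling `TimeoutFallback.callWithFallback(TimeoutFallback::slowService, () -> "fallback", 50)` returns the fallback value, but only after the caller has waited for the full timeout. That wait on every request is why calls can still accumulate while the dependency is down.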
This is where the circuit breaker comes in. It has three important parameters:
Snapshot Time Window
To decide whether to open, the circuit breaker needs statistics on requests and errors. The time range covered by these statistics is the snapshot time window, which defaults to the most recent 10 seconds.
Minimum number of requests
Within the snapshot time window, the total number of requests must reach a minimum before the circuit breaker is eligible to open. The default is 20, which means that if a Hystrix command is called fewer than 20 times in 10 seconds, the breaker will not open even if every request times out or otherwise fails.
Error percentage threshold
When the total number of requests in the snapshot time window reaches the minimum, the circuit breaker opens if the error percentage reaches the threshold, which defaults to 50%. For example, if 16 of 30 calls end in timeout exceptions, the error rate is about 53%, which is above the 50% threshold, so the circuit breaker opens.
To summarize, the circuit breaker opens when, within the snapshot time window (10 seconds by default), at least 20 service calls occur and the error rate of those calls is 50% or more.
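The interaction of the three parameters can be illustrated with a small plain-Java sketch. It is a simplified model and not Hystrix's actual implementation: the class `RollingHealth` and its methods are hypothetical, timestamps are passed in explicitly, and calls are assumed to be recorded in chronological order.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch of the open condition; not Hystrix's real code.
final class RollingHealth {

    private static final class Call {
        final long timestampMillis;
        final boolean failed;

        Call(long timestampMillis, boolean failed) {
            this.timestampMillis = timestampMillis;
            this.failed = failed;
        }
    }

    private final long windowMillis;         // snapshot time window, e.g. 10_000
    private final int minRequests;           // minimum number of requests, e.g. 20
    private final int errorThresholdPercent; // error percentage threshold, e.g. 50
    private final Deque<Call> calls = new ArrayDeque<>();

    RollingHealth(long windowMillis, int minRequests, int errorThresholdPercent) {
        this.windowMillis = windowMillis;
        this.minRequests = minRequests;
        this.errorThresholdPercent = errorThresholdPercent;
    }

    // Records the outcome of one call; assumes non-decreasing timestamps.
    void record(long nowMillis, boolean failed) {
        calls.addLast(new Call(nowMillis, failed));
    }

    boolean shouldOpen(long nowMillis) {
        // Drop calls that have fallen out of the snapshot time window.
        while (!calls.isEmpty() && calls.peekFirst().timestampMillis <= nowMillis - windowMillis) {
            calls.removeFirst();
        }
        int total = calls.size();
        if (total < minRequests) {
            return false; // too few requests to judge, even if all of them failed
        }
        int failures = 0;
        for (Call call : calls) {
            if (call.failed) {
                failures++;
            }
        }
        int errorPercent = failures * 100 / total;
        return errorPercent >= errorThresholdPercent;
    }
}
```

With the defaults from the text, `new RollingHealth(10_000, 20, 50)` reports that the breaker should open after 30 calls with 16 failures inside the window, but not after 19 calls that all failed.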
As long as this condition is not met, the circuit breaker stays closed, and a failed service call only triggers service degradation, that is, a call to the fallback function. Each such request is delayed by roughly the Hystrix timeout: with a timeout of 5 seconds, each request takes about 5 seconds before the fallback is returned. Once the circuit breaker finds that, within 10 seconds, the total number of requests has reached 20 and the error rate has reached 50%, it opens. Subsequent requests no longer invoke the main logic; the fallback logic is called directly, without the 5-second wait. In this way the circuit breaker detects errors automatically and makes the fallback logic the primary logic, which reduces response latency.
Opening the circuit breaker is not the end of the story. Once the fallback logic has become the primary logic, how is the original main logic restored? Hystrix handles this automatically. When the circuit breaker opens and cuts off the main logic, Hystrix starts a sleep time window, during which the fallback logic temporarily serves as the main logic. When the sleep window expires, the circuit breaker enters the half-open state and lets one request through to the original main logic. If that request returns normally, the circuit breaker closes and the main logic is restored. If the request still fails, the circuit breaker returns to the open state and the sleep window starts over.
In other words, the circuit breaker periodically retries the original main logic: if it is available, the circuit breaker closes; if not, it stays open.
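The open, half-open, and closed transitions can be sketched as a small state machine in plain Java. This is a minimal, single-threaded illustration and not Hystrix's implementation: the class `SimpleBreaker`, its `trip` method, and the injected clock are hypothetical, and deciding when to trip in the closed state is left to whatever tracks the error statistics.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

// Hypothetical, non-thread-safe sketch of the breaker states; not Hystrix's real code.
final class SimpleBreaker {

    enum State { CLOSED, OPEN, HALF_OPEN }

    private final long sleepWindowMillis;
    private final LongSupplier clock; // injected so the example is easy to test
    private State state = State.CLOSED;
    private long openedAtMillis;

    SimpleBreaker(long sleepWindowMillis, LongSupplier clock) {
        this.sleepWindowMillis = sleepWindowMillis;
        this.clock = clock;
    }

    // Opens the breaker and (re)starts the sleep time window.
    void trip() {
        state = State.OPEN;
        openedAtMillis = clock.getAsLong();
    }

    State state() {
        return state;
    }

    <T> T call(Supplier<T> primary, Supplier<T> fallback) {
        if (state == State.OPEN) {
            if (clock.getAsLong() - openedAtMillis < sleepWindowMillis) {
                return fallback.get(); // fail fast: the main logic is not invoked
            }
            state = State.HALF_OPEN; // sleep window expired: allow one trial request
        }
        try {
            T result = primary.get();
            if (state == State.HALF_OPEN) {
                state = State.CLOSED; // trial succeeded: restore the main logic
            }
            return result;
        } catch (RuntimeException e) {
            if (state == State.HALF_OPEN) {
                trip(); // trial failed: reopen and restart the sleep window
            }
            return fallback.get();
        }
    }
}
```

A caller would construct it with a real clock, for example `new SimpleBreaker(5000, System::currentTimeMillis)`, where the 5-second sleep window is an assumed value for illustration. While the breaker is open and the sleep window has not expired, `call` returns the fallback without touching the primary logic.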
Through this mechanism, the Hystrix circuit breaker handles failures of dependent resources by automatically switching to the fallback strategy and automatically restoring the main logic. Our microservices are thus well protected when they depend on external services or resources, and business requirements that have fallback logic get automatic switchover and recovery. This is smarter and more efficient than the traditional approach, in which operators watch the monitoring and flip switches by hand.
Spring Cloud Basics: Circuit Breaker