I previously explained the self-preservation mode of the Eureka registry. In that mode failed nodes are not evicted, and leaving the original configuration in place without ever removing them did not feel right, so I studied it in more depth. To repeat an earlier point: consumers should implement the Ribbon retry mechanism regardless of whether failed instances are evicted promptly.
Some background: in a microservice architecture there is the CAP principle (consistency, availability, partition tolerance). The three properties conflict, so only two can be fully satisfied and the third has to be partly given up. Eureka gives up strong consistency, so once it enters self-preservation mode, the registry's view of failed nodes is not guaranteed to be consistent.
The following are the approaches I have verified for evicting failed instances in a timely manner.
1. Turn off self-preservation mode by setting eureka.server.enable-self-preservation=false. However, this goes against the trade-off Eureka makes under the CAP principle, so I do not recommend it.
2. Before explaining the second approach, let's first look at the source code.
The following code decides whether the eviction step that refreshes the service list is entered. The code is in AbstractInstanceRegistry.java and PeerAwareInstanceRegistryImpl.java.
public void evict(long additionalLeaseMs) {
    logger.debug("Running the evict task");
    // The key point is the return value of isLeaseExpirationEnabled()
    if (!isLeaseExpirationEnabled()) {
        logger.debug("DS: lease expiration is currently disabled.");
        return;
    }
    // ...
}
public boolean isLeaseExpirationEnabled() {
    // isSelfPreservationModeEnabled() reflects whether self-preservation mode is configured on; the default is true
    if (!isSelfPreservationModeEnabled()) {
        return true;
    }
    return numberOfRenewsPerMinThreshold > 0 && getNumOfRenewsInLastMin() > numberOfRenewsPerMinThreshold;
}
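To make the three outcomes of this check easier to see, here is a minimal standalone sketch of the same decision. The class name LeaseExpirationSketch and the parameter names are made up for illustration; the registry state that Eureka keeps in fields is passed in as plain arguments, so this is not Eureka's actual class.

```java
/** Illustrative stand-in for the check in isLeaseExpirationEnabled(); not Eureka code. */
final class LeaseExpirationSketch {

    /**
     * @param selfPreservationEnabled value of eureka.server.enable-self-preservation
     * @param renewsPerMinThreshold   stand-in for numberOfRenewsPerMinThreshold
     * @param renewsInLastMin         stand-in for getNumOfRenewsInLastMin()
     * @return true if expired leases may be evicted
     */
    static boolean isLeaseExpirationEnabled(boolean selfPreservationEnabled,
                                            int renewsPerMinThreshold,
                                            int renewsInLastMin) {
        if (!selfPreservationEnabled) {
            // Self-preservation is off: eviction is always allowed.
            return true;
        }
        // Self-preservation is on: evict only while renewals stay above the threshold.
        return renewsPerMinThreshold > 0 && renewsInLastMin > renewsPerMinThreshold;
    }

    public static void main(String[] args) {
        System.out.println(isLeaseExpirationEnabled(false, 6, 0)); // true
        System.out.println(isLeaseExpirationEnabled(true, 6, 8));  // true
        System.out.println(isLeaseExpirationEnabled(true, 6, 6));  // false: renewals must exceed the threshold
    }
}
```

Note that the comparison is strict, so a renewal count exactly equal to the threshold already blocks eviction.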
The key parts are numberOfRenewsPerMinThreshold and the getNumOfRenewsInLastMin() method. getNumOfRenewsInLastMin() returns the total number of heartbeats counted in the last minute, so the main question is where numberOfRenewsPerMinThreshold is initialized.
// PeerAwareInstanceRegistryImpl.java
public void openForTraffic(ApplicationInfoManager applicationInfoManager, int count) {
    // Renewals happen every 30 seconds and for a minute it should be a factor of 2.
    this.expectedNumberOfRenewsPerMin = count * 2;
    this.numberOfRenewsPerMinThreshold =
            (int) (this.expectedNumberOfRenewsPerMin * serverConfig.getRenewalPercentThreshold());
    // ....
}
count is the total number of instances registered in the registry, including those from the other registry in a high-availability setup. As we can see, expectedNumberOfRenewsPerMin is initialized in a hard-coded way from the number of instances. Why * 2? Because an instance sends a heartbeat every 30s by default, so the registry should receive two heartbeats per instance within one minute. Because this factor is hard-coded, changing the heartbeat interval on the instance side is not recommended! numberOfRenewsPerMinThreshold is the expected total number of heartbeats multiplied by the minimum proportion of heartbeats that must arrive; the default is 0.85. expectedNumberOfRenewsPerMin is refreshed every 15 minutes by default.
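The arithmetic can be checked with a few lines of Java. This is only a sketch of the calculation described above: RenewalThresholdSketch and its method names are hypothetical, and the percentage is passed in directly instead of being read from serverConfig.

```java
/** Illustrative calculation of the renewal threshold; not Eureka code. */
final class RenewalThresholdSketch {

    /** Two heartbeats per instance per minute, matching the hard-coded factor of 2. */
    static int expectedRenewsPerMin(int instanceCount) {
        return instanceCount * 2;
    }

    /** Expected renewals times the configured percentage, truncated to int as in openForTraffic. */
    static int renewsPerMinThreshold(int instanceCount, double renewalPercentThreshold) {
        return (int) (expectedRenewsPerMin(instanceCount) * renewalPercentThreshold);
    }

    public static void main(String[] args) {
        // 4 instances with the default 0.85: 8 expected renewals, 6.8 truncated to 6
        System.out.println(expectedRenewsPerMin(4));        // 8
        System.out.println(renewsPerMinThreshold(4, 0.85)); // 6
        // Same 4 instances with 0.5: threshold drops to 4
        System.out.println(renewsPerMinThreshold(4, 0.5));  // 4
    }
}
```

Because of the cast to int, the threshold is rounded down, which matters for small clusters where a single heartbeat is a large share of the total.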
From the analysis above, we have a general understanding of the conditions under which eviction runs without turning off self-preservation mode. This gives the following two schemes for evicting failed nodes (personally tested and effective).
- Lower the minimum ratio of successful heartbeats per minute so that the registry does not enter self-preservation mode and failed nodes are evicted promptly (eviction still has to wait for the instance's lease expiration time and for the registry's eviction cycle to run). The ratio is configured with eureka.server.renewal-percent-threshold. For example, with 4 service instances the registry expects 4*2=8 heartbeats per minute. If one instance goes down unexpectedly, it actually receives 6 heartbeats per minute. As long as 8*x<6 the registry does not enter self-preservation mode, so x<0.75, and a value below 0.75 roughly achieves eviction. Because network instability can also cause missed heartbeats, set the value well below 0.75, such as 0.5. As for why the default is 0.85, I assume there is some reasoning behind it. Judging from the results, self-preservation mode suits large clusters: when one or a few instances go down, the registry does not enter self-preservation mode, so the default ratio is enough to evict them. If the cluster is small, for example 2 instances, one going down already triggers self-preservation mode; there the mode has little value and you can simply turn it off.
- Shorten the heartbeat interval of the instances. The registry stays out of self-preservation mode as long as the actual number of heartbeats per minute > total number of instances * 2 * 0.85. With a shorter heartbeat interval, the actual number of heartbeats stays above this threshold even when some instances are down, so failed instances are evicted in time. The instance heartbeat interval is set with eureka.instance.lease-renewal-interval-in-seconds, for example 5; use the calculation above to pick an appropriate value.
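Both schemes can be compared with a small simulation. The sketch below assumes the threshold formula shown earlier (count * 2 * percentage, truncated to int) and assumes that every live instance delivers exactly 60 / interval heartbeats per minute with no network loss. EvictionTuningSketch and maxFailuresStillEvicted are hypothetical names used only for this illustration.

```java
/** Illustrative model of the two tuning options; not Eureka code. */
final class EvictionTuningSketch {

    /** Threshold as in openForTraffic: (int) (count * 2 * percent). */
    static int threshold(int instanceCount, double renewalPercentThreshold) {
        return (int) (instanceCount * 2 * renewalPercentThreshold);
    }

    /**
     * Largest number of simultaneously failed instances for which the registry
     * still evicts, i.e. renewals in the last minute stay above the threshold.
     * Returns -1 if eviction is blocked even when every instance is healthy.
     */
    static int maxFailuresStillEvicted(int instanceCount,
                                       double renewalPercentThreshold,
                                       int renewalIntervalSeconds) {
        int threshold = threshold(instanceCount, renewalPercentThreshold);
        int heartbeatsPerInstance = 60 / renewalIntervalSeconds;
        int tolerated = -1;
        for (int failed = 0; failed <= instanceCount; failed++) {
            int received = (instanceCount - failed) * heartbeatsPerInstance;
            if (threshold > 0 && received > threshold) {
                tolerated = failed;
            }
        }
        return tolerated;
    }

    public static void main(String[] args) {
        // Defaults (0.85, 30s): in a 4-instance cluster a single failure already blocks eviction.
        System.out.println(maxFailuresStillEvicted(4, 0.85, 30));  // 0
        // Scheme 1: lower the ratio to 0.5, one failed instance can now be evicted.
        System.out.println(maxFailuresStillEvicted(4, 0.5, 30));   // 1
        // Scheme 2: keep 0.85 but send heartbeats every 5s.
        System.out.println(maxFailuresStillEvicted(4, 0.85, 5));   // 3
        // A larger cluster tolerates a few failures with the defaults.
        System.out.println(maxFailuresStillEvicted(20, 0.85, 30)); // 2
    }
}
```

Under these assumptions a 5s interval keeps eviction enabled until almost every instance is gone, which means self-preservation is close to being switched off in practice. That is worth weighing against the earlier warning about changing the heartbeat interval.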
Spring Cloud Eureka's self-preservation mode and evicting offline instances