Consensus Algorithm Quest (Extended Version), Part 8

Source: Internet
Author: User
Tags: switches

6 Cluster Membership Changes

Up until now we have assumed that the cluster configuration (the set of servers participating in the consensus algorithm) is fixed. In practice, it will occasionally be necessary to change the configuration, for example to replace servers when they fail or to change the degree of replication. Although this can be done by taking the entire cluster off-line, updating configuration files, and then restarting the cluster, this would leave the cluster unavailable during the changeover. In addition, if there are any manual steps, they risk operator error. In order to avoid these issues, we decided to automate configuration changes and incorporate them into the Raft consensus algorithm.

For the configuration change mechanism to be safe, there must be no point during the transition where it is possible for two leaders to be elected for the same term. Unfortunately, any approach where servers switch directly from the old configuration to the new configuration is unsafe. It isn't possible to atomically switch all of the servers at once, so the cluster could potentially split into two independent majorities during the transition (see Figure 10).

In order to ensure safety, configuration changes must use a two-phase approach. There are a variety of ways to implement the two phases. For example, some systems (e.g., [22]) use the first phase to disable the old configuration so it cannot process client requests; then the second phase enables the new configuration. In Raft the cluster first switches to a transitional configuration we call joint consensus; once the joint consensus has been committed, the system then transitions to the new configuration. The joint consensus combines both the old and new configurations:

    • Log entries are replicated to all servers in both configurations.

    • Any server from either configuration may serve as leader.

    • Agreement (for elections and entry commitment) requires separate majorities from both the old and new configurations.

The joint consensus allows individual servers to transition between configurations at different times without compromising safety. Furthermore, joint consensus allows the cluster to continue servicing client requests throughout the configuration change.
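The quorum rule above can be illustrated with a small sketch. The following Go fragment is only an illustration under assumed names, not Raft's actual implementation: the Config type, the majority helper, and the jointQuorum function are hypothetical identifiers chosen for this example.

    package raft

    // Config is a hypothetical cluster configuration: the set of voting
    // servers, keyed by server ID.
    type Config map[string]bool

    // majority reports whether the servers in acks form a majority of cfg.
    func majority(cfg Config, acks map[string]bool) bool {
        count := 0
        for id := range cfg {
            if acks[id] {
                count++
            }
        }
        return count*2 > len(cfg)
    }

    // jointQuorum applies the joint-consensus rule: an election or an entry
    // commitment succeeds only with separate majorities from both C_old and C_new.
    func jointQuorum(cOld, cNew Config, acks map[string]bool) bool {
        return majority(cOld, acks) && majority(cNew, acks)
    }

For example, in a change from a three-server cluster {a, b, c} to {a, b, d}, acknowledgements from a and d alone would not suffice: they are a majority of C_new but not of C_old.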

Cluster configurations are stored and communicated using special entries in the replicated log; Figure 11 illustrates the configuration change process. When the leader receives a request to change the configuration from C_old to C_new, it stores the configuration for joint consensus (C_old,new in the figure) as a log entry and replicates that entry using the mechanisms described previously. Once a given server adds the new configuration entry to its log, it uses that configuration for all future decisions (a server always uses the latest configuration in its log, regardless of whether the entry is committed). This means the leader will use the rules of C_old,new to determine when the log entry for C_old,new is committed. If the leader crashes, a new leader may be chosen under either C_old or C_old,new, depending on whether the winning candidate has received C_old,new. In any case, C_new cannot make unilateral decisions during this period.
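The rule that a server always obeys the latest configuration entry in its log, committed or not, might be modelled roughly as below. This continues the hypothetical package from the previous sketch; the EntryType encoding and the LogEntry shape are assumptions made for illustration, not definitions from the paper or any particular library.

    // EntryType distinguishes ordinary commands from configuration entries
    // (a hypothetical encoding; real implementations differ).
    type EntryType int

    const (
        EntryCommand     EntryType = iota // normal state-machine command
        EntryConfigJoint                  // the transitional C_old,new configuration
        EntryConfigNew                    // the final C_new configuration
    )

    // LogEntry is a minimal log entry carrying an optional configuration.
    type LogEntry struct {
        Term uint64
        Type EntryType
        Old  Config // set for EntryConfigJoint
        New  Config // set for EntryConfigJoint and EntryConfigNew
    }

    // latestConfig scans the log from the tail and returns the most recent
    // configuration entry, committed or not, because a server always uses the
    // latest configuration in its log. For a joint entry both sets are in force;
    // for a plain C_new entry both return values are the same set, so the joint
    // quorum check above degenerates to a single majority.
    func latestConfig(log []LogEntry) (cOld, cNew Config) {
        for i := len(log) - 1; i >= 0; i-- {
            switch log[i].Type {
            case EntryConfigJoint:
                return log[i].Old, log[i].New
            case EntryConfigNew:
                return log[i].New, log[i].New
            }
        }
        return nil, nil // no configuration entry yet: fall back to the bootstrap configuration (not shown)
    }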

Once C_old,new has been committed, neither C_old nor C_new can make decisions without approval of the other, and the Leader Completeness Property ensures that only servers with the C_old,new log entry can be elected as leader. It is now safe for the leader to create a log entry describing C_new and replicate it to the cluster. Again, this configuration takes effect on each server as soon as it is seen. When the new configuration has been committed under the rules of C_new, the old configuration is irrelevant and servers not in the new configuration can be shut down. As shown in Figure 11, there is no time when C_old and C_new can both make unilateral decisions; this guarantees safety.
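Putting the two phases together, the leader-side flow might look roughly like the sketch below. Leader, appendAndReplicate, and waitCommitted are hypothetical stand-ins for the normal log replication machinery; the point is only the ordering: C_new is proposed only after C_old,new has committed under joint majorities.

    // Leader is a minimal stand-in for leader state; the function fields
    // abstract over the real replication machinery (both names are hypothetical).
    type Leader struct {
        currentTerm        uint64
        appendAndReplicate func(LogEntry) uint64    // append an entry, start replication, return its index
        waitCommitted      func(index uint64) error // block until that index is committed under the current rules
    }

    // reconfigure drives the two-phase change: C_old,new must commit (under
    // joint majorities) before C_new is even proposed, so there is no point at
    // which C_old and C_new can both decide unilaterally.
    func (l *Leader) reconfigure(cOld, cNew Config) error {
        joint := LogEntry{Term: l.currentTerm, Type: EntryConfigJoint, Old: cOld, New: cNew}
        if err := l.waitCommitted(l.appendAndReplicate(joint)); err != nil {
            return err
        }
        final := LogEntry{Term: l.currentTerm, Type: EntryConfigNew, New: cNew}
        return l.waitCommitted(l.appendAndReplicate(final))
    }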

There are three more issues to address for reconfiguration. The first issue is that new servers may not initially store any log entries. If they are added to the cluster in this state, it could take quite a while for them to catch up, during which time it might not be possible to commit new log entries. In order to avoid availability gaps, Raft introduces an additional phase before the configuration change, in which the new servers join the cluster as non-voting members (the leader replicates log entries to them, but they are not considered for majorities). Once the new servers have caught up with the rest of the cluster, the reconfiguration can proceed as described above.
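One possible way for a leader to decide that this catch-up (non-voting) phase is over is sketched below; the learner bookkeeping, the matchIndex map, and the maxLag threshold are assumptions for illustration, not part of the algorithm's specification.

    // caughtUp reports whether every non-voting (learner) server has replicated
    // to within maxLag entries of the leader's last log index. A leader might
    // wait for this before appending the C_old,new entry, so that the change
    // does not stall commitment. matchIndex maps each learner to the highest
    // log index known to be replicated on it.
    func caughtUp(learners []string, matchIndex map[string]uint64, lastIndex, maxLag uint64) bool {
        for _, id := range learners {
            if matchIndex[id]+maxLag < lastIndex {
                return false
            }
        }
        return true
    }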

The second issue is that the cluster leader may not be part of the new configuration. In that case, the leader steps down (returns to follower state) once it has committed the C_new log entry. This means that there will be a period of time (while it is committing C_new) when the leader is managing a cluster that does not include itself; it replicates log entries but does not count itself in majorities. The leader transition occurs when C_new is committed, because this is the first point at which the new configuration can operate independently (it will always be possible to choose a leader from C_new). Before this point, it may be the case that only a server from C_old can be elected leader.
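These two behaviours of an outgoing leader could be expressed roughly as follows; both helper names are hypothetical.

    // commitAcks returns the acknowledgements the leader may count when deciding
    // commitment under cfg: it includes itself only if it is a member of cfg,
    // which is how a leader outside the configuration keeps replicating entries
    // without counting itself in majorities.
    func commitAcks(selfID string, cfg Config, followerAcks map[string]bool) map[string]bool {
        acks := make(map[string]bool, len(followerAcks)+1)
        for id, ok := range followerAcks {
            acks[id] = ok
        }
        if cfg[selfID] {
            acks[selfID] = true
        }
        return acks
    }

    // shouldStepDown captures the other half of the rule: once the C_new entry is
    // committed, a leader that is not a member of C_new returns to follower state.
    func shouldStepDown(selfID string, cNew Config, cNewCommitted bool) bool {
        return cNewCommitted && !cNew[selfID]
    }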

The third issue is that removed servers (those not in C_new) can disrupt the cluster. These servers will not receive heartbeats, so they will time out and start new elections. They will then send RequestVote RPCs with new term numbers, and this will cause the current leader to revert to follower state. A new leader will eventually be elected, but the removed servers will time out again and the process will repeat, resulting in poor availability.

To prevent this problem, servers disregard RequestVote RPCs when they believe a current leader exists. Specifically, if a server receives a RequestVote RPC within the minimum election timeout of hearing from a current leader, it does not update its term or grant its vote. This does not affect normal elections, where each server waits at least a minimum election timeout before starting an election. However, it helps avoid disruptions from removed servers: if a leader is able to get heartbeats to its cluster, then it will not be deposed by larger term numbers.
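The mitigation amounts to a small guard in the RequestVote handler, sketched below as another file in the same hypothetical package; the parameter names are assumptions, and minElectionTimeout stands for the lower bound of the randomized election timeout range.

    package raft

    import "time"

    // shouldIgnoreVote reports whether an incoming RequestVote RPC should be
    // ignored outright: if this server has heard from a current leader within
    // the minimum election timeout, it neither updates its term nor grants its
    // vote. lastLeaderContact is the time of the most recent heartbeat
    // (AppendEntries) from the leader.
    func shouldIgnoreVote(now, lastLeaderContact time.Time, minElectionTimeout time.Duration) bool {
        return now.Sub(lastLeaderContact) < minElectionTimeout
    }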

