Why Raft is an easier-to-understand distributed consensus algorithm: (1) while a Leader is in place, the Leader synchronizes the log to the Followers; (2) when the Leader fails, a new Leader is chosen by the leader election algorithm.

Source: Internet
Author: User

An easy-to-understand description of the Raft protocol

Although the Raft paper is easier to read than even the simplified Paxos paper, it still branches out a good deal and is fairly long. After finishing it and thinking it over, I felt that writing up my notes was the surest way to make the material truly my own. Here I borrow the first kind of minimalist thinking from the black-and-white chess analogy to describe, and check as a proof of concept, how the Raft protocol works.

There are three types of roles in a cluster organized by the Raft protocol:

    1. Leader
    2. Follower (the masses)
    3. Candidate

As in a democratic society, the Leader is elected by popular vote. At the start there is no Leader and every participant in the cluster is a Follower. A general election is then launched in which any Follower may run, at which point it becomes a Candidate. The democratically elected Leader begins its term of office, the election ends, and every Candidate other than the Leader returns to the Follower role and obeys the Leader. This introduces the concept of a "term of office", which Raft calls a term. These are essentially all the core concepts and terms of the Raft protocol, and they map so closely onto real-world democracy that they are easy to grasp. The transitions among the three roles are easiest to follow in the context of the election process.
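
The role changes described above can be sketched as a small transition table. This is only an illustration: the event names (`election_timeout`, `won_majority`, `saw_leader`, `saw_higher_term`) are hypothetical labels chosen for this sketch, not identifiers from the Raft paper or from any library.

```python
from enum import Enum


class Role(Enum):
    FOLLOWER = "follower"
    CANDIDATE = "candidate"
    LEADER = "leader"


# (current role, event) -> next role. Event names are hypothetical.
TRANSITIONS = {
    # A Follower that hears nothing from a Leader runs for election.
    (Role.FOLLOWER, "election_timeout"): Role.CANDIDATE,
    # A Candidate that wins a majority of votes starts its term as Leader.
    (Role.CANDIDATE, "won_majority"): Role.LEADER,
    # A split vote: the Candidate times out and starts another election.
    (Role.CANDIDATE, "election_timeout"): Role.CANDIDATE,
    # Someone else won: the losing Candidate returns to the masses.
    (Role.CANDIDATE, "saw_leader"): Role.FOLLOWER,
    # A Leader that learns of a newer term steps down.
    (Role.LEADER, "saw_higher_term"): Role.FOLLOWER,
}


def next_role(role, event):
    """Return the role after `event`; unknown events leave the role unchanged."""
    return TRANSITIONS.get((role, event), role)
```

For example, `next_role(Role.FOLLOWER, "election_timeout")` yields `Role.CANDIDATE`, while an event that does not apply to the current role leaves it as it is.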

Leader election process

Under minimalist thinking, the smallest Raft cluster needs three participants (say A, B, and C) so that a majority vote is possible. A, B, and C all start as Followers, and three scenarios are possible once an election is initiated. In the first two a Leader is elected. In the third the vote is split (split vote): every participant votes for itself and no one wins a majority. Each participant then rests for a random period (the election timeout) before starting a new vote, and this repeats until one of them receives a majority. The key is the random timeout: the node that wakes from its timeout first requests votes from the other two, which are still waiting out their own timeouts, so they can only vote for it and agreement is reached quickly.

After being elected, the Leader maintains its rule by periodically sending heartbeat messages to all Followers. If a Follower receives no heartbeat from the Leader for some time, it assumes the Leader may have failed and initiates a new leader election.
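
A toy simulation may make the three scenarios concrete. It is a sketch under simplifying assumptions, not a Raft implementation: there is no network and no term counter, the 150 to 300 ms timeout range is an illustrative choice, and a node that has not yet timed out is assumed to vote for the alphabetically first Candidate as a stand-in for "whichever request arrives first". The function names are hypothetical.

```python
import random


def majority(cluster_size):
    """Smallest number of votes that is more than half the cluster."""
    return cluster_size // 2 + 1


def run_round(timeouts):
    """Run one election round.

    `timeouts` maps node name -> election timeout in ms.
    Returns the elected Leader, or None on a split vote.
    """
    first = min(timeouts.values())
    # Every node whose timeout fires first becomes a Candidate.
    candidates = sorted(n for n, t in timeouts.items() if t == first)
    votes = {c: 1 for c in candidates}  # each Candidate votes for itself
    for voter in timeouts:
        if voter not in votes:
            # Assumption: the first request to arrive comes from candidates[0].
            votes[candidates[0]] += 1
    needed = majority(len(timeouts))
    for candidate, count in votes.items():
        if count >= needed:
            return candidate
    return None


def elect(nodes, rng, max_rounds=100):
    """Repeat rounds with fresh random timeouts until a Leader emerges."""
    for round_no in range(1, max_rounds + 1):
        timeouts = {n: rng.randint(150, 300) for n in nodes}
        leader = run_round(timeouts)
        if leader is not None:
            return leader, round_no
    return None, max_rounds
```

With timeouts `{"A": 150, "B": 200, "C": 250}` node A wakes first and collects every vote. With `{"A": 150, "B": 150, "C": 150}` all three vote for themselves and the round is split, which is exactly the case the random timeout is meant to make rare.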
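
Heartbeat-based failure detection on the Follower side can be sketched as a simple timer. The class and method names are hypothetical, and time is passed in explicitly (rather than read from a clock) so the behavior is easy to follow.

```python
class FollowerTimer:
    """Tracks when a Follower last heard from the Leader."""

    def __init__(self, election_timeout, now=0.0):
        self.election_timeout = election_timeout
        self.last_heartbeat = now

    def on_heartbeat(self, now):
        # Each heartbeat from the Leader resets the countdown.
        self.last_heartbeat = now

    def should_start_election(self, now):
        # No heartbeat for a full timeout: assume the Leader has failed.
        return now - self.last_heartbeat >= self.election_timeout


timer = FollowerTimer(election_timeout=0.3)
timer.on_heartbeat(now=0.1)
still_following = not timer.should_start_election(now=0.2)
leader_presumed_dead = timer.should_start_election(now=0.5)
```

Here the Follower keeps following at time 0.2 because a heartbeat arrived at 0.1, but by 0.5 a full timeout has passed in silence and it would start an election.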

The effect of the Leader node on consistency

The Raft protocol relies heavily on the availability of the Leader node to keep the cluster's data consistent. Data flows in one direction only, from the Leader node to the Follower nodes. When a Client submits data to the cluster's Leader node, the data the Leader receives starts in the uncommitted state. The Leader then replicates the data to all Follower nodes and waits for their responses. Only after confirming that a majority of the nodes in the cluster (more than half) has received the data does it confirm to the Client that the data was received. Once the ACK has been sent to the Client, the data is in the committed state, and the Leader notifies the Followers that the data has been committed.

The Leader may fail at any stage of this process. The cases below show how the Raft protocol guarantees data consistency at each stage.
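
The commit rule in this paragraph can be sketched as follows. This is a minimal in-memory model under stated assumptions: the `Leader` class, its method names, and the dictionary layout of a log entry are all hypothetical, and real replication would happen over RPCs rather than direct method calls.

```python
class Leader:
    """Tracks which nodes hold each log entry and when it becomes committed."""

    def __init__(self, follower_count):
        self.cluster_size = follower_count + 1  # Followers plus the Leader
        self.log = []

    def receive(self, data):
        """Accept data from a Client; it starts out uncommitted."""
        entry = {"data": data, "acks": {"leader"}, "committed": False}
        self.log.append(entry)
        return len(self.log) - 1  # index of the new entry

    def on_follower_ack(self, index, follower):
        """Record a Follower's response.

        Returns True exactly once, when the entry first reaches a majority;
        the caller can then ACK the Client and notify the Followers.
        """
        entry = self.log[index]
        entry["acks"].add(follower)
        reached_majority = len(entry["acks"]) > self.cluster_size // 2
        if reached_majority and not entry["committed"]:
            entry["committed"] = True
            return True
        return False
```

In a three-node cluster (`Leader(follower_count=2)`), one Follower response is enough: the Leader plus one Follower is two of three nodes. In a five-node cluster two Follower responses are needed.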

1. Before the data reaches the Leader node

If the Leader fails at this stage, consistency is clearly unaffected, so there is little to say.

2. The data reaches the Leader node but has not been replicated to the Follower nodes

If the Leader fails at this stage, the data is still uncommitted. The Client receives no ACK, treats the request as timed out and failed, and can safely retry. The Follower nodes do not have the data, so after a new Leader is elected the Client's retry can succeed. When the original Leader recovers, it rejoins the cluster as a Follower and resynchronizes its data from the Leader of the current term, which forces its data to match the Leader's.

3. The data reaches the Leader node and is successfully replicated to all Follower nodes, but the Followers have not yet acknowledged receipt to the Leader

If the Leader fails at this stage, the data on the Follower nodes is uncommitted but consistent, so the newly elected Leader can complete the commit. The Client does not know whether its submission succeeded and may retry it. For this case Raft requires RPC requests to be idempotent, that is, the implementation needs an internal deduplication mechanism.
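
One common way to get this kind of idempotence is to tag every request with a client ID and a per-client sequence number and to remember the result of each request already applied. The sketch below assumes that scheme; the class name and the `(client_id, seq)` key are illustrative choices, not something the text above prescribes.

```python
class DedupStateMachine:
    """Applies each (client_id, seq) request at most once."""

    def __init__(self):
        self.values = []   # the replicated data, in apply order
        self.results = {}  # (client_id, seq) -> result of the first apply

    def apply(self, client_id, seq, value):
        key = (client_id, seq)
        if key in self.results:
            # A retry of a request that already took effect:
            # return the stored result instead of applying it again.
            return self.results[key]
        self.values.append(value)
        self.results[key] = len(self.values) - 1  # position of the value
        return self.results[key]


sm = DedupStateMachine()
first = sm.apply("client-1", 1, "x")
retry = sm.apply("client-1", 1, "x")  # Client retried after a timeout
```

After the retry, `sm.values` still holds a single `"x"`, and both calls return the same position, so the Client cannot tell the difference between its first attempt succeeding and its retry succeeding.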

4. The data reaches the Leader node and is successfully replicated to only some of the Follower nodes, but those Followers have not yet acknowledged receipt to the Leader

If the Leader fails at this stage, the data on the Follower nodes is uncommitted and inconsistent. The Raft protocol requires that a vote be cast only for a node whose data is at least as up to date as the voter's own, so a node with the latest data is elected Leader and then forces its data onto the Followers. No data is lost and the cluster eventually becomes consistent.

5. The data reaches the Leader node and is successfully replicated to all or a majority of the Follower nodes; it is committed on the Leader but still uncommitted on the Followers

If the Leader fails at this stage, a new Leader is elected and processing then proceeds much as in stage 3.

6. The data reaches the Leader node and is successfully replicated to all or a majority of the Follower nodes; it is committed on every node, but the Leader has not yet responded to the Client

If the Leader fails at this stage, the data inside the cluster is in fact already consistent, and repeated Client retries, handled by the idempotency strategy, do not affect consistency.

7. A network partition causes split brain, and two Leaders appear

A network partition separates the original Leader from the Follower nodes. The Followers, no longer receiving the Leader's heartbeat, start an election and produce a new Leader, so there are now two Leaders. The original Leader is alone in its partition, and data submitted to it cannot be replicated to a majority of nodes, so it can never be committed. Data submitted to the new Leader can be committed successfully. After the network recovers, the old Leader discovers that the cluster has a Leader with a newer term, automatically steps down to Follower, and synchronizes its data from the new Leader, which restores consistency across the cluster.

This exhaustive analysis of every situation a minimal cluster (3 nodes) can face shows that the Raft protocol handles the consistency problem well and is easy to understand.
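
The "latest data" rule can be written as a comparison a voter runs before granting its vote. In the Raft paper the comparison uses the term and index of the last log entry: a later last term wins, and with equal terms the longer log wins. The function and parameter names below are hypothetical.

```python
def candidate_is_up_to_date(cand_last_term, cand_last_index,
                            my_last_term, my_last_index):
    """Return True if the Candidate's log is at least as up to date as mine."""
    if cand_last_term != my_last_term:
        # The log whose last entry has the later term is more up to date.
        return cand_last_term > my_last_term
    # Same last term: the log that is at least as long is up to date.
    return cand_last_index >= my_last_index
```

A Follower that already holds the entry the failed Leader was replicating will therefore refuse to vote for a Candidate that lacks it, which is what steers the election toward a node with the latest data.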
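
How the old Leader "discovers" the new one is, in Raft, a matter of comparing terms: any message carrying a higher term than the receiver's own makes the receiver adopt that term and become a Follower. The sketch below models only that rule; the `Node` class and its fields are hypothetical, and log resynchronization is left out.

```python
class Node:
    def __init__(self, name, role, term):
        self.name = name
        self.role = role  # "leader", "candidate", or "follower"
        self.term = term

    def on_message(self, sender_term):
        """Handle the term carried by any incoming message.

        Returns True if this node adopted a newer term and stepped down.
        """
        if sender_term > self.term:
            self.term = sender_term
            self.role = "follower"
            return True
        return False  # same or stale term: no role change


# Before the partition A led in term 1; the other side elected B in term 2.
old_leader = Node("A", "leader", term=1)
new_leader = Node("B", "leader", term=2)

# The network heals and A receives a heartbeat from B.
stepped_down = old_leader.on_message(new_leader.term)
# B in turn ignores anything stale still arriving from term 1.
ignored_stale = not new_leader.on_message(1)
```

After the exchange A is a Follower in term 2 and B is still the Leader, so the cluster is back to a single Leader.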

Summary

Correctness, efficiency, and simplicity are the main design goals of an algorithm.
Although these are worthy goals, none of them is achieved until developers write a usable implementation.
So we believe understandability is equally important.

Thinking it over: the Paxos algorithm was made public by Leslie Lamport on his own website as early as 1990. When did most of us first hear of it? When did a usable implementation appear? The Raft algorithm was published in 2013, and the Raft website in reference [5] already lists open-source implementations in many different languages. That is why understandability matters.

References

[1]. Leslie Lamport, Robert Shostak, Marshall Pease. The Byzantine Generals Problem. 1982
[2]. Leslie Lamport. The Part-Time Parliament. 1998
[3]. Leslie Lamport. Paxos Made Simple. 2001
[4]. Diego Ongaro and John Ousterhout. Raft Paper. 2013
[5]. Raft Website. The Raft Consensus Algorithm
[6]. Raft Demo. Raft Animated Demo
