Original: http://daizuozhuo.github.io/consensus-algorithm/
The Raft protocol is indeed much easier to understand than the Paxos protocol.
The consistency problem
A consistency algorithm solves the consistency problem, so what is the consistency problem? In a distributed system, the consistency problem (the consensus problem) is this: given a set of servers and a set of operations, we need a protocol that makes all servers agree on the final result. In more detail, when one server receives a set of instructions from a client, it must communicate with the other servers to ensure that every server receives the same instructions in the same order, so that all servers produce the same results and the cluster looks like a single machine.
A consistency algorithm used in production needs the following properties:
- Safety: it never returns a wrong result.
- Availability: it keeps working as long as a majority of the machines are healthy. For example, a cluster of five machines can tolerate up to two failures.
- It does not rely on timing to ensure consistency; that is, the system is asynchronous.
- In general, running time is determined by the majority of the machines, so a few slow machines do not hurt overall efficiency.
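The availability figure above (five machines, up to two failures) follows from simple majority arithmetic. A minimal sketch, with the hypothetical helper names `majority` and `tolerated_failures` chosen only for illustration:

```python
def majority(cluster_size: int) -> int:
    """Smallest number of servers that forms a majority of the cluster."""
    return cluster_size // 2 + 1


def tolerated_failures(cluster_size: int) -> int:
    """How many servers may fail while a majority is still running."""
    return cluster_size - majority(cluster_size)


if __name__ == "__main__":
    for n in (3, 4, 5):
        print(n, majority(n), tolerated_failures(n))
```

Note that a cluster of four tolerates no more failures than a cluster of three, which is one reason odd cluster sizes are the usual choice.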
Why solve the consistency problem?
We can say that the reliability of a distributed system reaches 99.99...%, but we cannot say that it reaches 100%. Why? Because the consistency problem cannot be solved completely. The following four problems in distributed systems are all related to it:
- Reliable multicast
- Membership protocol (failure detector): managing the members of a cluster
- Leader election
- Mutual exclusion (mutexes), such as exclusive access to and allocation of resources
Raft consistency algorithm
Earlier I introduced some textbook election algorithms. They are also consistency algorithms, in the sense that all servers end up agreeing on who the leader is. The two mainstream consistency algorithms in practical use today are Paxos and Raft: ZooKeeper chose a Paxos-style protocol (Zab), and etcd uses Raft. As a Go enthusiast, I'll take a look at Raft.
Raft was proposed because Paxos is too hard to understand and too hard to implement. Its goal is to be as reliable as Paxos while being as simple to understand as possible. But Raft's paper, In Search of an Understandable Consensus Algorithm, still has 18 pages, and I'm going to try to make it easier to understand than that.
Raft divides the consistency problem into three smaller problems:
- Leader election
- Log replication
- Safety
Basic concepts
Each server is in one of three states: leader, follower, or candidate.
- Follower: does not send requests; it only replies to requests from the leader and from candidates.
- Leader: handles requests from clients.
- Candidate: a candidate for leader.
Raft divides time into terms. Each term starts with an election. Each term has at most one leader, and possibly none.
RPC implementation
The algorithm requires two types of RPC. RequestVote RPC: initiated by candidates during an election. A server that receives this RPC grants its vote only if the candidate's term and log are at least as up to date as its own. The candidate that receives votes from a majority of the servers is elected leader.
AppendEntries RPC: initiated by the leader to distribute log entries, forcing each follower's log to be consistent with its own.
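The RequestVote rule described above can be sketched as a small handler. This is only an illustration under stated assumptions: the names `VoterState`, `RequestVote`, and `handle_request_vote` are hypothetical, and the "one vote per term" bookkeeping comes from the Raft paper rather than from the text above. Logs are compared by the term of the last entry first, then by its index.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class VoterState:
    current_term: int
    voted_for: Optional[int]  # candidate voted for in current_term, if any
    last_log_term: int
    last_log_index: int


@dataclass
class RequestVote:
    term: int
    candidate_id: int
    last_log_term: int
    last_log_index: int


def handle_request_vote(voter: VoterState, req: RequestVote) -> bool:
    """Return True if the voter grants its vote to the candidate."""
    # A candidate from an older term is rejected outright.
    if req.term < voter.current_term:
        return False
    # Seeing a newer term moves the voter into it and clears its old vote.
    if req.term > voter.current_term:
        voter.current_term = req.term
        voter.voted_for = None
    # Assumption from the Raft paper: at most one vote per term.
    if voter.voted_for not in (None, req.candidate_id):
        return False
    # The candidate's log must be at least as up to date as the voter's:
    # a later last term wins; with equal terms, the longer log wins.
    candidate_log = (req.last_log_term, req.last_log_index)
    voter_log = (voter.last_log_term, voter.last_log_index)
    if candidate_log < voter_log:
        return False
    voter.voted_for = req.candidate_id
    return True
```

A voter with last entry (term 2, index 5) therefore refuses a candidate whose last entry is (term 1, index 10): the candidate's log is longer, but its last term is older.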
Leader election
If a follower receives no message from the leader within the election timeout, it enters a new term, becomes a candidate, votes for itself, and sends RequestVote RPCs. It stays in this state until one of the following three things happens:
- It wins the election
- Another server wins the election
- A period of time passes with no election result
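The transition into the candidate state and the three ways out of it can be sketched as a tiny state machine. The class and method names (`Role`, `Server`, `on_election_timeout`, and so on) are hypothetical and exist only for this illustration; a real implementation would also persist state and send the RPCs.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Role(Enum):
    FOLLOWER = "follower"
    CANDIDATE = "candidate"
    LEADER = "leader"


@dataclass
class Server:
    server_id: int
    role: Role = Role.FOLLOWER
    current_term: int = 0
    voted_for: Optional[int] = None
    votes_received: int = 0

    def on_election_timeout(self) -> None:
        """No leader message arrived in time: start (or restart) an election.

        Calling this again while still a candidate covers outcome 3:
        no result, so a new term and a new election begin.
        """
        self.current_term += 1
        self.role = Role.CANDIDATE
        self.voted_for = self.server_id
        self.votes_received = 1  # the candidate's own vote

    def on_vote_granted(self, cluster_size: int) -> None:
        """Outcome 1: votes from a majority make the candidate the leader."""
        if self.role is not Role.CANDIDATE:
            return
        self.votes_received += 1
        if self.votes_received > cluster_size // 2:
            self.role = Role.LEADER

    def on_leader_message(self, leader_term: int) -> None:
        """Outcome 2: another server won, so fall back to follower."""
        if leader_term < self.current_term:
            return  # stale leader from an older term; ignore it
        if leader_term > self.current_term:
            self.voted_for = None
        self.current_term = leader_term
        self.role = Role.FOLLOWER
```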
Why does the third case exist? If every server becomes a candidate at the same time and votes for itself, no server can obtain a majority of the votes, so the servers move on to the next term and hold another election. To keep this from happening repeatedly, each server's election timeout is set to a different random value, so that in the next election the server whose timeout expires first starts the election ahead of the others.
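A minimal sketch of randomized election timeouts follows. The 150 to 300 ms range is the one suggested in the Raft paper and is treated here as an assumption; the function names are hypothetical.

```python
import random
from typing import Dict, Iterable, Tuple


def random_election_timeout(rng: random.Random,
                            low_ms: float = 150.0,
                            high_ms: float = 300.0) -> float:
    """Pick an election timeout uniformly at random from [low_ms, high_ms]."""
    return rng.uniform(low_ms, high_ms)


def first_to_time_out(server_ids: Iterable[int],
                      rng: random.Random) -> Tuple[int, Dict[int, float]]:
    """Give every server its own timeout and report which one expires first."""
    timeouts = {sid: random_election_timeout(rng) for sid in server_ids}
    first = min(timeouts, key=timeouts.get)
    return first, timeouts


if __name__ == "__main__":
    first, timeouts = first_to_time_out(range(1, 6), random.Random())
    print("server", first, "starts the election first:", timeouts)
```

Because the timeouts rarely coincide, one server usually becomes a candidate and collects votes before any other server times out.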
Log replication
Once a leader has been chosen, it can start distributing the log.
Each log entry has a log index and a term number. When a majority of the servers have copied an entry, the entry is said to be committed and can be executed. The leader remembers the highest log index that has been committed and sends it along with subsequent AppendEntries RPCs. This plays a role similar to the sequence number of a TCP segment.
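The "copied by a majority" rule can be turned into a small calculation. This is a simplified sketch: `commit_index` is a hypothetical helper, and it ignores the extra rule in the Raft paper that a leader only counts replicas for entries from its own term.

```python
from typing import Sequence


def commit_index(match_index: Sequence[int]) -> int:
    """Highest log index stored on a majority of servers.

    match_index holds, for every server in the cluster (leader included),
    the highest log index known to be stored on that server.
    """
    ranked = sorted(match_index, reverse=True)
    majority = len(ranked) // 2 + 1
    # The majority-th largest value is present on at least `majority` servers.
    return ranked[majority - 1]


if __name__ == "__main__":
    # Leader and one follower hold index 7, the others lag behind.
    print(commit_index([7, 7, 5, 4, 3]))
```

With the values above, index 5 is the highest one stored on at least three of the five servers, so entries up to index 5 count as committed.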
When a new leader is elected, its log and the followers' logs may be inconsistent, so it forces every follower's log to match its own. The leader first finds the latest entry on which its log and the follower's log agree, then overwrites everything in the follower's log after that point.
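The repair step can be sketched on plain lists. This is an illustration, not the wire protocol: in Raft the leader discovers the matching point by probing backwards with AppendEntries RPCs, whereas the hypothetical helpers below compare the two logs directly. Each entry is modeled as a `(term, command)` pair.

```python
from typing import List, Tuple

Entry = Tuple[int, str]  # (term, command)


def matching_prefix_length(leader_log: List[Entry],
                           follower_log: List[Entry]) -> int:
    """Number of leading entries on which the two logs agree."""
    count = 0
    for leader_entry, follower_entry in zip(leader_log, follower_log):
        if leader_entry != follower_entry:
            break
        count += 1
    return count


def repair_follower_log(leader_log: List[Entry],
                        follower_log: List[Entry]) -> List[Entry]:
    """Keep the agreed prefix, then overwrite the rest with the leader's entries."""
    keep = matching_prefix_length(leader_log, follower_log)
    return follower_log[:keep] + leader_log[keep:]


if __name__ == "__main__":
    leader = [(1, "x=1"), (1, "y=2"), (3, "x=5")]
    follower = [(1, "x=1"), (1, "y=2"), (2, "y=9"), (2, "z=0")]
    print(repair_follower_log(leader, follower))
```

In this example the follower keeps its first two entries, drops the two conflicting entries from term 2, and receives the leader's entry from term 3.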
Safety
But so far safety is not guaranteed. For example, suppose a follower goes offline while the leader is committing log entries, and that follower is later elected leader. It would then overwrite entries that the other followers have already committed. Since those entries have already been executed, different machines would end up executing different instructions. Adding one more restriction to the election prevents this, namely:
Leader completeness property: for any term, the leader must contain all log entries committed in previous terms.
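To see how an election restriction can enforce this property, here is a small vote-counting sketch. It assumes the restriction is the "log at least as up to date" check from the RequestVote RPC, that every server is reachable, and that no server has voted for anyone else; the names `votes_for` and `can_win` are hypothetical. Each log is summarized by its last entry as `(last term, last index)`.

```python
from typing import List, Tuple

LogSummary = Tuple[int, int]  # (term of last entry, index of last entry)


def votes_for(candidate: LogSummary, others: List[LogSummary]) -> int:
    """Count the candidate's own vote plus every server whose log is not newer."""
    return 1 + sum(1 for other in others if candidate >= other)


def can_win(candidate: LogSummary, others: List[LogSummary]) -> bool:
    """True if the candidate collects votes from a majority of the cluster."""
    cluster_size = len(others) + 1
    return votes_for(candidate, others) > cluster_size // 2


if __name__ == "__main__":
    # Five servers. The entry at (term 2, index 3) is committed on three of
    # them; the remaining two were offline and stopped at (term 1, index 2).
    up_to_date = (2, 3)
    stale = (1, 2)
    print(can_win(stale, [up_to_date, up_to_date, up_to_date, stale]))
    print(can_win(up_to_date, [up_to_date, up_to_date, stale, stale]))
```

The stale server gets only two votes out of five, because any majority must include at least one server that holds the committed entry and will refuse to vote for it.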
This is the complete Raft algorithm.
Note: Images are from the paper In Search of an Understandable Consensus Algorithm.
[Reprint] Consistency problem and Raft consistency algorithm