Zookeeper
Zookeeper is built on Zab, a protocol derived from a simplified version of Paxos. I find it genuinely hard to understand: I read "From Paxos to Zookeeper" many times and thought I finally had it, only to be lost again a few months later, so here I am tidying up my notes (this only expresses my own understanding).
There are three states in the Zab protocol; at any moment each node is in one of the following three (a small sketch follows the list):
1. Looking: the node is in the election state, either at startup or after the leader has crashed;
2. Following: the node is a follower; the follower is in the data synchronization phase with the leader;
3. Leading: the node is the leader; the cluster currently has a leader main process.
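As a minimal illustration (my own, not Zookeeper's actual Java source), the three states could be modeled like this in Go:

```go
package main

import "fmt"

// ServerState models the three Zab node states described above.
type ServerState int

const (
	Looking   ServerState = iota // electing a leader (startup, or the leader crashed)
	Following                    // follower, synchronizing data from the leader
	Leading                      // leader, the cluster currently has a leader process
)

func (s ServerState) String() string {
	return [...]string{"LOOKING", "FOLLOWING", "LEADING"}[s]
}

func main() {
	state := Looking // every node starts out looking for a leader
	fmt.Println("initial state:", state)
}
```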
At the beginning every node is in the Looking state and wants to become the leader, so each node sends the cluster a proposal electing itself as leader; the proposal number is its ZXID.
(Zab uses the ZXID as the transaction number. A ZXID is a 64-bit number: the low 32 bits are an incrementing counter, and the leader increments it by one each time a client's transaction request makes it generate a new transaction; the high 32 bits are the epoch number of the leader's period. When a new leader is elected, it takes the largest transaction proposal ZXID in its local log, parses out the epoch, adds one to it as the new epoch, and restarts the low 32 bits from 0 to generate new ZXIDs. Zab uses the epoch to distinguish different leader periods.) If a received proposal's ZXID is larger than a node's own, that proposer's data is more up to date, so the node votes to agree; otherwise it keeps voting for itself. The first node to gather agreement from a majority is elected leader.
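To make the ZXID layout and the vote comparison concrete, here is a small Go sketch of my own; the helper names (`makeZXID`, `newEpoch`) are invented for illustration:

```go
package main

import "fmt"

// A ZXID is a 64-bit transaction id: the high 32 bits hold the leader
// epoch, the low 32 bits hold a per-epoch counter.
type ZXID uint64

func makeZXID(epoch, counter uint32) ZXID {
	return ZXID(uint64(epoch)<<32 | uint64(counter))
}

func (z ZXID) Epoch() uint32   { return uint32(z >> 32) }
func (z ZXID) Counter() uint32 { return uint32(z) }

// newEpoch is what a freshly elected leader does: take the highest ZXID
// it has logged, bump the epoch by one, and reset the counter to 0.
func newEpoch(maxLogged ZXID) ZXID {
	return makeZXID(maxLogged.Epoch()+1, 0)
}

func main() {
	mine := makeZXID(3, 17)
	theirs := makeZXID(3, 42)
	// During election a node agrees to a proposal whose ZXID is larger
	// than its own (that proposer's data is newer); otherwise it keeps
	// voting for itself.
	if theirs > mine {
		fmt.Println("vote for the other node, its data is newer")
	}
	fmt.Printf("next epoch starts at zxid %x\n", uint64(newEpoch(theirs)))
}
```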
Now the cluster can be managed by the leader. Zookeeper also uses a two-phase commit: when a client submits a transaction request, the leader generates a transaction proposal for it and sends it to every follower in the cluster; only after receiving feedback from more than half of the followers does it start committing the transaction. Zab uses an atomic broadcast protocol for this.
The leader assigns a queue to each follower, places the transaction proposals into the queue in ZXID order, and sends them following the queue's FIFO rule. When a follower receives a proposal it writes the transaction to local disk as a transaction log, and once that succeeds it sends an ACK back to the leader. After the leader has received ACKs from more than half of the followers it commits the transaction and broadcasts a commit message to all followers; each follower commits the transaction after it receives the commit.
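Here is a rough Go sketch of the leader side of this atomic broadcast; real Zookeeper uses one FIFO queue per follower and asynchronous I/O, while this toy version sends synchronously, and the types (`Proposal`, `follower`) are made up for illustration:

```go
package main

import "fmt"

// Proposal is a transaction proposal identified by its ZXID.
type Proposal struct {
	ZXID uint64
	Data string
}

// follower models the follower side: append the proposal to the local
// transaction log (a stand-in for writing it to disk) and return an ACK.
type follower struct {
	id  int
	log []Proposal
}

func (f *follower) propose(p Proposal) bool {
	f.log = append(f.log, p)
	return true // ACK
}

func (f *follower) commit(zxid uint64) {
	fmt.Printf("follower %d committed zxid %d\n", f.id, zxid)
}

// broadcast sketches the leader's flow: send the proposal to every
// follower in ZXID order, count ACKs, and broadcast COMMIT once a
// majority of the cluster (leader included) has acked.
func broadcast(p Proposal, followers []*follower) bool {
	acks := 1 // the leader counts itself
	for _, f := range followers {
		if f.propose(p) {
			acks++
		}
	}
	if acks <= (len(followers)+1)/2 {
		return false // no quorum, cannot commit
	}
	for _, f := range followers {
		f.commit(p.ZXID)
	}
	return true
}

func main() {
	fs := []*follower{{id: 1}, {id: 2}}
	broadcast(Proposal{ZXID: 1, Data: "create /node"}, fs)
}
```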
If the leader goes down, the cluster enters the recovery phase: the node with the largest ZXID is elected leader in the same way as above, and then every follower sends its last ZXID to the leader. The leader compares each follower's ZXID with its own and takes different steps to synchronize data across the cluster: if the gap is small, the leader sends a DIFF instruction that replays the proposals from Follower.lastZxid to Leader.lastZxid onto the follower; if the gap is large, the leader sends its snapshot to the follower directly.
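A simplified sketch of that decision, assuming the leader keeps a window of recent proposals in memory; the names and threshold logic here are my own, and the real protocol also has a TRUNC case that is omitted:

```go
package main

import "fmt"

// syncMode is how the leader brings a follower up to date during recovery.
type syncMode string

const (
	syncDiff syncMode = "DIFF" // replay proposals from follower.lastZxid to leader.lastZxid
	syncSnap syncMode = "SNAP" // ship a full snapshot when the gap is too large
)

// chooseSync is a simplified stand-in for the leader's decision: if the
// follower's last ZXID still falls inside the proposals the leader keeps
// in memory, send a DIFF; otherwise send a full snapshot.
func chooseSync(followerLast, leaderOldestKept, leaderLast uint64) syncMode {
	if followerLast >= leaderOldestKept && followerLast <= leaderLast {
		return syncDiff
	}
	return syncSnap
}

func main() {
	fmt.Println(chooseSync(95, 90, 100)) // small gap -> DIFF
	fmt.Println(chooseSync(10, 90, 100)) // large gap -> SNAP
}
```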
Etcd
Etcd is a highly available key-value store and service-discovery system that has become popular recently; it is widely used by Kubernetes and other systems.
Compared with Zookeeper it is simpler and more efficient for smaller clusters, and its implementation language, Go, has built-in concurrency support, which I find quite attractive (I don't really know Go, but I caught a glimpse of it while learning Docker).
Etcd's consensus is based on Raft. In Raft, at any time each server plays one of the following roles:
Leader: handles all client interactions and log replication; there is normally only one leader at a time.
Follower: like a voter, completely passive;
Candidate: like a proposer, it can be elected as the new leader.
1. Leader election phase
Just like in Zookeeper, every node wants to elect itself as leader, but etcd is much simpler. Each node maintains a timer; if the timer expires without the node having received either a heartbeat from a leader or a vote request from a candidate, the node automatically sends the other nodes a request to elect itself as leader, and if it gets a majority of the votes it is elected. If by coincidence two nodes start an election at the same time and receive the same number of votes, the two nodes compete again after roughly 300 milliseconds; because the timeouts are randomized, the probability of tying again is very low.
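A minimal Go sketch of that randomized election timer, with made-up timeout values (real etcd makes these configurable):

```go
package main

import (
	"fmt"
	"math/rand"
	"time"
)

type role int

const (
	followerRole role = iota
	candidateRole
	leaderRole
)

// electionTimeout returns a randomized timeout; the randomization is what
// makes repeated split votes unlikely, since two candidates rarely time
// out at the same moment twice in a row. Values are illustrative.
func electionTimeout() time.Duration {
	return time.Duration(150+rand.Intn(150)) * time.Millisecond
}

func main() {
	heartbeat := make(chan struct{}) // would be signalled by the leader's heartbeats
	state := followerRole

	select {
	case <-heartbeat:
		// heard from the leader in time: stay a follower
	case <-time.After(electionTimeout()):
		// timer expired with no heartbeat and no vote request:
		// switch to candidate and request votes from the other nodes
		state = candidateRole
	}
	if state == candidateRole {
		fmt.Println("timed out, requesting votes as a candidate")
	}
}
```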
2. Message Synchronization Phase
1. Assume a leader has been elected and it receives a client request; the leader appends the request to its log as a new entry, for example "GO":
2. The leader asks the followers to follow its instruction and append this new entry to their own logs:
3. Once a majority of the follower servers have written the entry to their disk files and confirmed that the append succeeded, the leader commits the entry and responds "committed OK":
4. In the next heartbeat, the leader notifies all followers to update their committed entries.
Repeat this process for each new log entry (a sketch of one replication round follows).
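Here is a toy Go sketch of one replication round covering steps 1-4, under the simplifying assumption that every reachable follower always accepts the entry; real Raft's AppendEntries also carries the term and previous log index, which are omitted here:

```go
package main

import "fmt"

type entry struct {
	index int
	data  string
}

type node struct {
	id          int
	log         []entry
	commitIndex int
}

// appendEntry plays the follower's part of step 2: append the entry to the
// local log (a stand-in for writing it to disk) and acknowledge.
func (n *node) appendEntry(e entry) bool {
	n.log = append(n.log, e)
	return true
}

// replicate plays the leader's part of steps 1-4: append locally, push the
// entry to every follower, and once a majority has acked, advance the
// commit index; followers learn the new commit index on the next heartbeat.
func replicate(leader *node, followers []*node, data string) bool {
	e := entry{index: len(leader.log) + 1, data: data}
	leader.log = append(leader.log, e) // step 1: leader appends "GO"

	acks := 1 // the leader counts itself
	for _, f := range followers {
		if f.appendEntry(e) { // step 2: followers append
			acks++
		}
	}
	if acks <= (len(followers)+1)/2 {
		return false // no majority, the entry stays uncommitted
	}
	leader.commitIndex = e.index // step 3: committed, respond OK
	for _, f := range followers { // step 4: piggyback on the next heartbeat
		f.commitIndex = leader.commitIndex
	}
	return true
}

func main() {
	leader := &node{id: 0}
	followers := []*node{{id: 1}, {id: 2}}
	fmt.Println("committed:", replicate(leader, followers, "GO"))
}
```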
3. Network partitioning issues
If a network partition or network communication failure occurs during this process, so that the leader cannot reach a majority of the followers, then the leader can only update the followers it can still reach. The majority of followers, having lost contact with the leader, re-elect a candidate as the new leader, and that leader now represents the cluster to the outside world; if an outside request adds a new log entry, the new leader notifies the majority of followers following the steps above. When the network failure is repaired, the original leader becomes a follower: any update it accepted while partitioned never counted as a commit, so it is rolled back, and the node accepts the new leader's updates.
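A small Go sketch of that step-down rule, with simplified bookkeeping of my own (real Raft truncates conflicting entries when the new leader replicates over them, rather than in one eager rollback):

```go
package main

import "fmt"

type server struct {
	term        int
	isLeader    bool
	log         []string
	commitIndex int
}

// onHearHigherTerm is what the old leader does when the partition heals and
// it sees a message carrying a newer term: step down to follower, adopt the
// new term, and discard entries it never managed to commit (they were only
// accepted by the minority side, so they are rolled back).
func (s *server) onHearHigherTerm(newTerm int) {
	if newTerm <= s.term {
		return // not from a newer leader, ignore
	}
	s.term = newTerm
	s.isLeader = false
	s.log = s.log[:s.commitIndex] // drop uncommitted entries, keep committed ones
}

func main() {
	old := &server{term: 1, isLeader: true,
		log: []string{"SET x=1", "SET x=2"}, commitIndex: 1}
	old.onHearHigherTerm(2) // the healed cluster's leader is at term 2
	fmt.Printf("leader=%v term=%d log=%v\n", old.isLeader, old.term, old.log)
}
```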
After all this talk, here is an animated demonstration!
Everything clicks the moment you watch it; I don't know which genius made it:
http://thesecretlivesofdata.com/raft/
Comparison of Zookeeper and ETCD