One: Zab protocol overview
---> ZooKeeper does not fully adopt the Paxos algorithm. Instead, it uses a protocol called ZooKeeper Atomic Broadcast (Zab) as the core algorithm for its data consistency.
---> The Zab protocol is an atomic broadcast protocol with crash recovery support, designed specifically for the distributed coordination service ZooKeeper.
---> ZooKeeper implements a primary-backup system architecture to keep the replicas in the cluster consistent. Specifically, ZooKeeper uses a single primary process to receive and process all client transaction requests, and uses the Zab atomic broadcast protocol to broadcast the resulting server state changes to all replica processes in the form of transaction proposals. This primary-backup model ensures that only one primary process in the cluster broadcasts state changes at any given time, so that a large number of concurrent client requests can be handled well.
---> All transaction requests must be coordinated by a single, globally unique server called the leader server; the remaining servers are follower servers. The leader server converts each client transaction request into a transaction proposal and distributes the proposal to all follower servers in the cluster. The leader then waits for feedback from the followers. Once more than half of the follower servers have responded with a correct acknowledgment, the leader sends a commit message to all follower servers, asking them to commit the proposal.
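The commit rule in the paragraph above can be sketched as a simple majority test. The Python snippet below only illustrates the rule as this article states it (more than half of the followers); the function name `has_quorum` is hypothetical and is not part of ZooKeeper.

```python
def has_quorum(ack_count: int, follower_count: int) -> bool:
    """Return True once more than half of the followers have sent an ACK."""
    return 2 * ack_count > follower_count


# With 4 followers, 2 ACKs are exactly half (not enough); 3 ACKs are a majority.
print(has_quorum(2, 4))  # False
print(has_quorum(3, 4))  # True
```

Comparing `2 * ack_count` with `follower_count` avoids rounding mistakes for both odd and even cluster sizes.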
Two: Zab protocol introduction
---> The Zab protocol has two basic modes: crash recovery mode and message broadcast mode.
---> Crash recovery mode
The Zab protocol puts the ZooKeeper cluster into crash recovery mode in the following situations:
(1) When the service framework is starting up.
(2) When the leader server suffers a network interruption, crashes and exits, restarts, or hits a similar failure.
(3) When there is no longer a majority (more than half) of the servers in the cluster in normal communication with the leader server.
What does the Zab protocol do once it enters crash recovery mode?
(1) When the leader has a problem, the cluster enters crash recovery mode and elects a new leader server. Once the new leader is elected and more than half of the machines in the cluster have completed state synchronization (data synchronization) with it, the cluster exits crash recovery mode and enters message broadcast mode.
(2) When a new machine joins the cluster and a leader server is already handling message broadcast, the new server automatically enters data recovery mode: it locates the leader server, synchronizes its data with it, and then enters message broadcast mode to take part in the broadcast process.
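The switch between the two modes can be sketched as a small state machine. This is a simplified illustration, not ZooKeeper's implementation: the names `Mode` and `next_mode` and all parameters are hypothetical, and the sketch assumes that the server counts include the leader itself.

```python
from enum import Enum


class Mode(Enum):
    CRASH_RECOVERY = "crash recovery"
    MESSAGE_BROADCAST = "message broadcast"


def next_mode(mode, leader_ok, in_contact, synced, cluster_size):
    """Return the mode the cluster should be in after re-checking its state.

    leader_ok   -- True if a leader exists and has not failed
    in_contact  -- servers in normal communication with the leader
    synced      -- servers that finished data synchronization with the leader
    """
    if mode is Mode.MESSAGE_BROADCAST:
        # Leave broadcast when the leader fails or loses its majority.
        if not leader_ok or 2 * in_contact <= cluster_size:
            return Mode.CRASH_RECOVERY
        return Mode.MESSAGE_BROADCAST
    # In crash recovery: leave only when a leader exists
    # and more than half of the servers are synchronized with it.
    if leader_ok and 2 * synced > cluster_size:
        return Mode.MESSAGE_BROADCAST
    return Mode.CRASH_RECOVERY


# A cluster that is starting up begins in crash recovery mode.
mode = Mode.CRASH_RECOVERY
mode = next_mode(mode, leader_ok=True, in_contact=5, synced=3, cluster_size=5)
print(mode.name)  # MESSAGE_BROADCAST
mode = next_mode(mode, leader_ok=False, in_contact=0, synced=0, cluster_size=5)
print(mode.name)  # CRASH_RECOVERY
```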
Two: Zab's message broadcast
---> The Zab protocol differs from the two-phase commit protocol: Zab removes the abort logic from the two-phase commit process.
---> The leader starts committing a proposal once more than half of the follower servers have returned an ACK, without waiting for every follower server in the cluster to respond.
---> To guarantee transaction commits and data consistency when the single leader goes down, Zab introduces the crash recovery mode.
---> Zab's message broadcast runs over TCP, which is FIFO (first in, first out), so messages are received in the order in which they were sent during the broadcast.
---> During message broadcast, the leader server processes each transaction request as follows:
(1) The leader server generates a globally monotonically increasing transaction ID (the ZXID) for the transaction request, which guarantees the causal order of the messages.
(2) The leader server generates a corresponding proposal for the transaction, to be broadcast.
(3) The leader server allocates a separate queue for each follower server, places the transaction proposals to be broadcast into these queues, and sends them in FIFO order.
(4) After receiving a transaction proposal, each follower server first writes it to its local disk as a transaction log entry, and returns an ACK response to the leader server once the write succeeds.
(5) When the leader server has received ACK responses from more than half of the followers, it commits the transaction itself and broadcasts a commit message to all follower servers. Each follower server commits the transaction after it receives the commit message.
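The five steps above can be sketched as a small in-memory simulation. It is a minimal illustration under stated assumptions, not ZooKeeper's code: the classes `Leader` and `Follower` and their methods are hypothetical, the ZXID is a plain counter, the network is replaced by direct method calls, and the quorum is counted over followers as the text describes.

```python
from collections import deque


class Follower:
    def __init__(self, name, alive=True):
        self.name = name
        self.alive = alive
        self.log = []        # stands in for the on-disk transaction log
        self.committed = []  # ZXIDs this follower has committed

    def on_proposal(self, zxid, data):
        """Step 4: write the proposal to the log, then acknowledge it."""
        if not self.alive:
            return None      # an unreachable follower sends no ACK
        self.log.append((zxid, data))
        return zxid          # the ACK carries the ZXID it acknowledges

    def on_commit(self, zxid):
        if self.alive:
            self.committed.append(zxid)


class Leader:
    def __init__(self, followers):
        self.followers = followers
        self.last_zxid = 0
        self.committed = []
        # Step 3: one FIFO queue per follower.
        self.queues = {f.name: deque() for f in followers}

    def broadcast(self, data):
        self.last_zxid += 1                   # step 1: next ZXID
        zxid = self.last_zxid
        proposal = (zxid, data)               # step 2: build the proposal
        for f in self.followers:              # step 3: enqueue per follower
            self.queues[f.name].append(proposal)

        acks = 0
        for f in self.followers:
            queue = self.queues[f.name]
            while queue:                      # drain each queue in FIFO order
                queued_zxid, queued_data = queue.popleft()
                if f.on_proposal(queued_zxid, queued_data) == zxid:
                    acks += 1

        if 2 * acks > len(self.followers):    # step 5: more than half ACKed
            self.committed.append(zxid)
            for f in self.followers:
                f.on_commit(zxid)
            return True
        return False


followers = [Follower("f1"), Follower("f2"), Follower("f3", alive=False)]
leader = Leader(followers)
print(leader.broadcast("set /a 1"))  # True: 2 of 3 followers acknowledged
print(followers[0].committed)        # [1]
```

The commit goes through even though one follower is down, which is the difference from two-phase commit noted earlier: the leader does not wait for every follower.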
Three: Zab's crash recovery
---> Basic properties that keep data consistent when the single leader fails and a new leader is elected
The Zab protocol must ensure that transactions already committed on the leader server are eventually committed by all servers.
(1) Suppose a transaction has been committed on the leader server after receiving ACKs from more than half of the follower servers, but the leader server crashes before it sends the commit message to all follower machines.
(2) For example, Server1 is the leader and has committed transaction C2, but it crashes while notifying the follower servers to commit. Zab must ensure that C2 is still committed on Server2 and Server3.
The Zab protocol must ensure that transactions that were only proposed on the leader server are discarded.
(1) Suppose the initial leader server, Server1, proposes a transaction Proposal3 (P3) and crashes before the proposal has been sent to the followers and acknowledged. The P3 transaction must be discarded.
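The two guarantees can be illustrated with a toy recovery step. The sketch assumes that each server's log is just an ascending list of ZXIDs and that the survivor with the largest ZXID becomes the new leader; the functions `recover` and `rejoin` are hypothetical names, and real Zab uses epochs rather than a plain list comparison.

```python
def recover(survivor_logs):
    """Pick a new leader among the survivors and sync everyone to its log.

    survivor_logs maps a server name to the ascending list of ZXIDs in its log.
    Returns (new_leader_name, synced_logs).
    """
    def last_zxid(name):
        log = survivor_logs[name]
        return log[-1] if log else 0

    new_leader = max(survivor_logs, key=last_zxid)
    history = survivor_logs[new_leader]
    synced = {name: list(history) for name in survivor_logs}
    return new_leader, synced


def rejoin(old_log, leader_history):
    """Return (log after sync, discarded ZXIDs) for a server that rejoins."""
    discarded = [z for z in old_log if z not in leader_history]
    return list(leader_history), discarded


# Server1 was the leader: it committed 2 (C2) and only proposed 3 (P3)
# locally before it crashed.
server1_log = [1, 2, 3]
# Both followers logged and acknowledged 2, but never saw 3.
survivors = {"Server2": [1, 2], "Server3": [1, 2]}

leader, logs = recover(survivors)
print(logs)                               # {'Server2': [1, 2], 'Server3': [1, 2]}
print(rejoin(server1_log, logs[leader]))  # ([1, 2], [3])
```

C2 survives because it is in the followers' logs, while P3 exists only on the crashed leader and is dropped when Server1 rejoins.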
Four: Zab's leader election algorithm requirements
---> Data synchronization
(1) After the old leader goes down and a new leader is elected, the old leader does not become the leader again when it recovers.
(2) When the old leader goes down, the new leader is chosen from the remaining follower servers, and it must be the follower with the largest transaction ID (that is, the follower server whose data is most up to date).
(3) The transaction ID (ZXID) is a 64-bit number. The low 32 bits are a simple monotonically increasing counter, and the high 32 bits hold the epoch number, which identifies one leader's term from its election to its failure.
(4) After a new leader is elected, it reads the old leader's epoch number from the latest transaction proposal, increments it by 1, and uses the result as the high 32 bits of new transaction IDs. The low 32 bits restart counting from 0.
(5) The new leader compares its transaction IDs with those on every follower machine to make sure that each follower's data is synchronized with its own. Transactions left over from the old leader that the new leader does not have are discarded. Each follower server is added to the list of available follower servers once its data is synchronized, and message broadcast then starts.
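The ZXID layout and the election rule above can be sketched with a few lines of bit arithmetic. The helper names (`make_zxid`, `epoch_of`, `counter_of`, `first_zxid_of_new_leader`, `elect`) are hypothetical and only illustrate the 32/32-bit split described in this section.

```python
COUNTER_BITS = 32
COUNTER_MASK = (1 << COUNTER_BITS) - 1


def make_zxid(epoch, counter):
    """Pack the epoch into the high 32 bits and the counter into the low 32."""
    return (epoch << COUNTER_BITS) | (counter & COUNTER_MASK)


def epoch_of(zxid):
    return zxid >> COUNTER_BITS


def counter_of(zxid):
    return zxid & COUNTER_MASK


def first_zxid_of_new_leader(last_zxid):
    """New epoch = old epoch + 1; the counter restarts from 0."""
    return make_zxid(epoch_of(last_zxid) + 1, 0)


def elect(last_zxids):
    """Choose the server whose last ZXID is the largest."""
    return max(last_zxids, key=last_zxids.get)


old = make_zxid(1, 5)
print(hex(old))                            # 0x100000005
print(hex(first_zxid_of_new_leader(old)))  # 0x200000000
print(elect({"Server2": make_zxid(1, 5), "Server3": make_zxid(1, 7)}))  # Server3
```

Because the epoch sits in the high bits, any ZXID issued by a newer leader compares greater than every ZXID from an older epoch, so a plain integer comparison is enough to find the most up-to-date server.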
Three: Zookeeper's Zab protocol