Learn the Zab (ZooKeeper Atomic Broadcast) protocol in 10 minutes

Source: Internet
Author: User
Tags: rounds, zookeeper

ZooKeeper uses Zab (ZooKeeper Atomic Broadcast) to implement a master/standby system architecture that keeps data consistent across the replicas in a cluster.

The Zab protocol defines four phases: election, discovery, synchronization (sync), and broadcast.

Election selects the master. Discovery and synchronization are the data-recovery phases that run once a master has been elected.

Broadcast is the normal operating phase: once the master and slaves have been chosen and their data synchronized, the master accepts writes and replicates them to the slaves.

The following is a brief introduction to the Zab protocol, intended to convey its essence quickly; the details can then be studied in the paper. It mainly covers two phases: election and broadcast.

Basic concepts

First, some basic ZooKeeper (ZK) concepts.

A ZK cluster has three roles:

    • Leader: what we call the master;

    • Follower: what we call the slave;

    • Observer: can be thought of as a copy of the master's data that does not take part in voting; this article ignores it.

A node in a ZK cluster is in one of three states:

    • Looking: the election state; there is currently no leader;

    • Leading: the state of the leader;

    • Following: the state of a follower.

Every successfully written message has a globally unique identifier called the zxid. It is a 64-bit positive integer: the high 32 bits are the epoch, which identifies the election era, and the low 32 bits are an auto-incrementing ID that increases by one with each write. It works like an ancient Chinese era name: in "Wanli year 15", Wanli is the epoch and 15 is the ID.
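The layout of the zxid can be sketched as simple bit packing. This is a minimal illustration, not ZooKeeper's own code; the function names make_zxid, split_zxid and next_zxid are hypothetical.

```python
from typing import Tuple

EPOCH_SHIFT = 32                 # the epoch lives in the high 32 bits
COUNTER_MASK = (1 << 32) - 1     # the per-epoch counter lives in the low 32 bits


def make_zxid(epoch: int, counter: int) -> int:
    """Pack an epoch and a counter into one 64-bit identifier."""
    return (epoch << EPOCH_SHIFT) | (counter & COUNTER_MASK)


def split_zxid(zxid: int) -> Tuple[int, int]:
    """Return (epoch, counter) for a packed identifier."""
    return zxid >> EPOCH_SHIFT, zxid & COUNTER_MASK


def next_zxid(zxid: int) -> int:
    """Each write within the same epoch adds one to the counter."""
    return zxid + 1
```

Because the epoch occupies the high bits, any zxid from a later epoch compares greater than every zxid from an earlier one, so plain integer comparison orders writes across elections.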

A ZK cluster usually has an odd number of machines (2n+1), with exactly one leader; the rest are followers. Electing a leader or writing data succeeds only when at least n+1 machines agree.
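The majority rule is plain arithmetic. A minimal sketch, with the hypothetical helper names quorum_size and has_quorum:

```python
def quorum_size(cluster_size: int) -> int:
    """Smallest number of machines that is more than half of the cluster."""
    return cluster_size // 2 + 1


def has_quorum(agreeing: int, cluster_size: int) -> bool:
    """True when enough machines agree to elect a leader or accept a write."""
    return agreeing >= quorum_size(cluster_size)
```

For a cluster of 2n+1 = 5 machines the quorum is n+1 = 3, so the cluster tolerates two failures. A sixth machine raises the quorum to 4 and still tolerates only two failures, which is one reason odd sizes are the usual choice.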

Voting priority: compare zxids first; if they are equal, compare machine IDs. In both comparisons the larger value wins.
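This ordering maps directly onto tuple comparison. A small sketch, where the Vote type and the higher_priority function are hypothetical names introduced for illustration:

```python
from typing import NamedTuple


class Vote(NamedTuple):
    zxid: int        # last zxid seen by the candidate
    server_id: int   # machine ID of the candidate


def higher_priority(a: Vote, b: Vote) -> bool:
    """True when vote a beats vote b: larger zxid first, then larger machine ID."""
    return (a.zxid, a.server_id) > (b.zxid, b.server_id)
```

For example, higher_priority(Vote(7, 1), Vote(5, 3)) is true because the zxid dominates, while Vote(7, 3) beats Vote(7, 2) only on the machine ID.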

Election

An election for a new leader is triggered when the cluster first starts, when the leader crashes, or when the leader loses contact with half or more of the followers. There are two election algorithms: Fast Paxos and Basic Paxos.

Fast Paxos

By default, Zab uses the Fast Paxos algorithm.

Each election increments the election round number. Like the epoch field in the zxid, the round number prevents different rounds of elections from interfering with each other.

Each node that enters the Looking state first votes for itself and then sends its vote to the other machines. The vote contains <election round, zxid of the node voted for, ID of the node voted for>.

When another node in the Looking state receives a vote, it does the following:

1 First check whether the vote is valid by comparing the round number in the vote with the round number recorded locally:

1.1 If the vote's round is smaller than the local round, discard the vote.

1.2 If the vote's round is larger than the local round:

    1. The node's own vote is out of date, so it clears its local voting information.

    2. It updates its round number and its vote to the values it received.

    3. It notifies all other nodes of its new vote.

1.3 If the vote's round equals the local round, compare the received vote with the node's own vote according to the voting priority:

  1.3.1 If the received vote has higher priority, adopt it as the node's own vote and send the new vote out.

  1.3.2 If the received vote has lower priority, ignore it.

  1.3.3 If the priorities are equal, update the vote recorded for the corresponding node.

2 After handling each vote, review the list of votes received so far to see whether any candidate has more than half of them. If so, stop voting, declare the candidate elected, and update the node's own state; the discovery and synchronization phases then follow. Otherwise, keep collecting votes.
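The vote-handling steps above can be sketched as a small in-memory simulation. This is an illustration under stated assumptions, not ZooKeeper's implementation: the names Ballot, LookingNode and run_election are hypothetical, message delivery is a simple loop, and a node keeps listening after it sees a majority instead of terminating, so that a late higher-priority ballot can still correct the result.

```python
from collections import namedtuple

# <election round, zxid of the candidate, ID of the candidate>
Ballot = namedtuple("Ballot", "round zxid candidate")


class LookingNode:
    def __init__(self, node_id, zxid, cluster_size):
        self.node_id = node_id
        self.cluster_size = cluster_size
        self.round = 1
        self.vote = (zxid, node_id)            # every node first votes for itself
        self.received = {node_id: self.vote}   # sender ID -> that sender's latest vote
        self.outbox = [Ballot(self.round, *self.vote)]
        self.leader = None

    def _adopt(self, vote):
        """Switch to a new vote and queue it for broadcast."""
        self.vote = vote
        self.received[self.node_id] = vote
        self.outbox.append(Ballot(self.round, *vote))

    def on_ballot(self, sender, ballot):
        incoming = (ballot.zxid, ballot.candidate)
        if ballot.round < self.round:
            return                             # stale round: discard
        if ballot.round > self.round:
            self.round = ballot.round          # own vote is out of date
            self.received = {}                 # clear local voting information
            self._adopt(incoming)
        elif incoming < self.vote:
            return                             # lower priority: ignore
        elif incoming > self.vote:
            self._adopt(incoming)              # higher priority: follow it
        self.received[sender] = incoming       # record the sender's vote
        supporters = sum(1 for v in self.received.values() if v == self.vote)
        if supporters > self.cluster_size // 2:
            self.leader = self.vote[1]         # more than half agree


def run_election(nodes):
    """Deliver every queued ballot to every other node until none remain."""
    while any(n.outbox for n in nodes):
        for sender in nodes:
            pending, sender.outbox = sender.outbox, []
            for ballot in pending:
                for receiver in nodes:
                    if receiver is not sender:
                        receiver.on_ballot(sender.node_id, ballot)
```

Running run_election on three nodes with IDs 1, 2, 3 and zxids 5, 7, 7 leaves all three with leader 3: node 1 loses on zxid, and node 2 loses the tie on machine ID.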

Basic Paxos

1 Each Looking node sends a request to the other nodes asking for their votes.

The other nodes reply with their votes, <ZK ID, zxid>. In the first round every node votes for itself.

2 After receiving the replies, a node updates its own vote if any received vote has a larger zxid than the vote it currently holds.

3 Once all nodes have replied, the votes are counted. If one node has more than half of the votes, the election succeeds. Otherwise the next round of queries starts, and this repeats until a leader is elected.
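The query-based rounds can be sketched as a synchronous simulation. It assumes every node is reachable and answers in every round, and it breaks zxid ties by machine ID as in the voting priority described earlier; the function name basic_election is hypothetical.

```python
from collections import Counter


def basic_election(zxids):
    """Simulate query rounds; zxids maps node ID -> that node's last zxid.

    Returns (elected node ID, number of rounds used).
    """
    votes = {node: node for node in zxids}     # first round: everyone votes for itself
    rounds = 0
    while True:
        rounds += 1
        replies = dict(votes)                  # what each node answers when asked
        candidate, count = Counter(replies.values()).most_common(1)[0]
        if count > len(zxids) // 2:
            return candidate, rounds           # more than half: election succeeds
        # No majority yet: every node switches to the best vote it was told about.
        best = max(replies.values(), key=lambda c: (zxids[c], c))
        votes = {node: best for node in zxids}
```

With basic_election({1: 5, 2: 7, 3: 7}) the first round yields three different self-votes and no majority; every node then switches to node 3, and the second round elects it.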

Basic Paxos and Fast Paxos differences

The fast variant pushes actively: whenever a node's vote changes, it immediately sends the new vote to the other nodes. A node may learn that its own vote has lower priority before it has even finished notifying every node of that vote; it then updates its vote and notifies all nodes again.

The basic variant has to ask every node before it knows the new result, and then ask the other nodes again for the updated election results.

Fast is quicker than basic because a node does not have to exchange vote information with every node to learn whether its own vote needs updating, which reduces the number of interactions.

Broadcast: master-slave synchronization

Master-slave data synchronization is relatively simple. When a slave receives a write, it forwards the write to the master, so that every write is performed on the master.

The master first proposes a transaction, waits for replies from more than half of the nodes, and then sends the commit. When the master receives a write, it creates a transaction locally, assigns it a zxid, and sends it to all follower nodes. When a follower receives the proposal, it writes the transaction to its log on local disk and, once that succeeds, acknowledges to the leader. After the leader has received acknowledgements from more than half of the nodes, it commits the transaction and notifies all followers to commit. Each follower commits the transaction on receiving that notification; once committed, the data can be served to clients.
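The propose-then-commit flow can be sketched with in-memory objects. This is a simplified model, not ZooKeeper's code: the Follower and Leader classes are hypothetical, the "disk log" is a Python list, calls are synchronous, and the leader is assumed to count itself toward the majority.

```python
class Follower:
    def __init__(self, up=True):
        self.up = up           # False simulates an unreachable follower
        self.log = []          # proposals written to the local "disk" log
        self.committed = []    # transactions visible to clients

    def propose(self, zxid, data):
        """Log the proposal and acknowledge; an unreachable follower never acks."""
        if not self.up:
            return False
        self.log.append((zxid, data))
        return True

    def commit(self, zxid):
        """Make a previously logged proposal visible."""
        if self.up:
            self.committed.extend(e for e in self.log if e[0] == zxid)


class Leader:
    def __init__(self, epoch, followers):
        self.epoch = epoch
        self.counter = 0
        self.followers = followers
        self.committed = []

    def write(self, data):
        """Return the zxid if the write commits, or None without a majority."""
        self.counter += 1
        zxid = (self.epoch << 32) | self.counter   # epoch high, counter low
        # Assumption: the leader's own log write counts as one acknowledgement.
        acks = 1 + sum(1 for f in self.followers if f.propose(zxid, data))
        if acks <= (len(self.followers) + 1) // 2:
            return None                            # not more than half: no commit
        self.committed.append((zxid, data))
        for f in self.followers:
            f.commit(zxid)
        return zxid
```

In a five-node cluster where two followers are unreachable, the leader still collects three acknowledgements and commits; with three followers unreachable it collects only two and the write is not committed.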

Summary

Because only the master accepts writes and then synchronizes them to the slaves, the globally generated zxids are guaranteed not to conflict. The globally unique zxid also provides the priority used in elections and in data synchronization. The key to the election is understanding the Fast Paxos principle: zxids increase monotonically, which ensures that the node with the highest priority becomes the master. Master-slave synchronization uses two stages, propose and commit; data is considered written once more than half of the nodes have written it successfully.
