ZooKeeper: the very important Zab protocol - "Big data five minutes a day"


The previous article, "Paxos and consistency", said that Zab makes important changes on top of Paxos to solve a series of problems. This article covers Zab itself.

Zab stands for ZooKeeper Atomic Broadcast, that is, ZooKeeper's "atomic" "broadcast" protocol. It specifies two modes: crash recovery and message broadcast.

Recovery mode

When is it entered?

    • When the whole service framework is starting up
    • When the leader server suffers a network outage, crashes, exits, restarts, or hits a similar abnormal condition
    • When a new server joins a cluster that is in its normal state (broadcast mode); the new server first synchronizes its data with the leader and then enters message broadcast mode

In all three cases, Zab enters recovery mode.

What does it do?

A new leader server is elected, and more than half of the machines in the cluster synchronize their state with that leader. Once this work is done, Zab exits crash recovery mode.

Broadcast mode

When is it entered?

When the cluster state is stable, a leader exists, and more than half of the machines have finished synchronizing their state with it, Zab exits crash recovery mode and enters message broadcast mode.

What does it do?

Normal message synchronization: day-to-day production data is synchronized from the leader to the learners.

In summary, the two modes of the Zab protocol go through three steps in practice. Let me walk through in detail what these two processes do.

1. Crash recovery state: the leader election process

Entering crash recovery mode means that something is wrong with the cluster at that point, so a leader election has to start.

The default election algorithm used by ZooKeeper is FastLeaderElection. It is commonly described as an implementation in the style of Fast Paxos, and it addresses the slow convergence of the older LeaderElection algorithm (as mentioned in the previous article).

States in the Zab protocol

    • LOOKING: the cluster currently has no leader, and the server is ready for an election
    • FOLLOWING: a leader already exists, and the current server is a follower
    • LEADING: the current server is the sole leader and maintains heartbeats with the followers
    • OBSERVING: the current server's role is observer

Voting process

The basis of the vote

A vote is based on the following two IDs. Voting means sending the (myid, zxid) pair to all servers.

myid: set by the user in the configuration. Every node is given a unique value, counting up from 1.

zxid: the zxid has 64 bits, divided into two parts:

    1. The high 32 bits are the leader's epoch: the election clock, which increases by 1 each time a new leader is elected.

    2. The low 32 bits are the transaction ID within the epoch: the cluster increases it by 1 for each update operation from a user.

Note: each time the epoch changes, ZooKeeper resets the low 32-bit counter. This makes it easy to tell which data is newest and keeps the zxid globally increasing. (This scheme can in fact run into problems, although the probability is small. I will not go into that here; a later article covers it in detail.)
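The zxid layout described above can be sketched with plain bit operations. This is only an illustration of the 32/32 split; the function names (make_zxid, split_zxid, next_zxid, new_epoch_zxid) are made up for this example and are not ZooKeeper APIs.

```python
COUNTER_BITS = 32
COUNTER_MASK = (1 << COUNTER_BITS) - 1  # low 32 bits: transaction counter


def make_zxid(epoch: int, counter: int) -> int:
    """Pack an epoch (high 32 bits) and a counter (low 32 bits) into one zxid."""
    if not (0 <= epoch <= COUNTER_MASK and 0 <= counter <= COUNTER_MASK):
        raise ValueError("epoch and counter must each fit in 32 bits")
    return (epoch << COUNTER_BITS) | counter


def split_zxid(zxid: int) -> tuple:
    """Return (epoch, counter) for a zxid."""
    return zxid >> COUNTER_BITS, zxid & COUNTER_MASK


def next_zxid(zxid: int) -> int:
    """zxid for the next update inside the same epoch."""
    epoch, counter = split_zxid(zxid)
    if counter == COUNTER_MASK:
        raise OverflowError("transaction counter exhausted within this epoch")
    return make_zxid(epoch, counter + 1)


def new_epoch_zxid(zxid: int) -> int:
    """zxid after a new leader is elected: epoch + 1, counter reset to 0."""
    epoch, _ = split_zxid(zxid)
    return make_zxid(epoch + 1, 0)
```

Because the epoch sits in the high bits, a zxid from a newer epoch compares greater than any zxid from an older epoch even though its counter has been reset to 0.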

About sending votes

In the first round every server votes for itself. Each server then sends the information above to all other servers. The ballot box records only the last vote from each voter.

About receiving votes

A server tries to get votes from the other servers and records them in its own ballot box. If it receives no external votes, it checks whether it still has valid connections to the other servers in the cluster. If it does, it sends its own vote again; if not, it reconnects right away.

About election rounds

All valid votes must belong to the same round. Each time a server starts a new round of voting, it increases its own logicClock by 1.

    • The received logicClock is greater than its own: the server is behind. It updates its logicClock and then processes the vote normally.
    • The received logicClock is smaller than its own: the vote is ignored.
    • The received logicClock equals its own: the vote is processed normally.
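The three cases above can be sketched as a small class. This is a simplified illustration, not ZooKeeper's implementation: the Voter class and its method names are hypothetical, and clearing the ballot box when catching up to a newer round is an assumption added here so that votes from different rounds are never mixed.

```python
from dataclasses import dataclass, field


@dataclass
class Voter:
    logic_clock: int = 0
    # ballot box: sender myid -> last vote received from that sender
    ballot_box: dict = field(default_factory=dict)

    def start_round(self) -> None:
        """Start a new round of voting: bump the clock, start with an empty box."""
        self.logic_clock += 1
        self.ballot_box.clear()

    def receive(self, sender: int, vote: tuple, sender_clock: int) -> str:
        """Handle one incoming vote according to the sender's logic clock."""
        if sender_clock < self.logic_clock:
            return "ignored"  # vote from an older round
        if sender_clock > self.logic_clock:
            # We are behind: adopt the newer round first.
            self.logic_clock = sender_clock
            self.ballot_box.clear()  # assumption: older-round votes are stale
            outcome = "caught_up"
        else:
            outcome = "accepted"
        self.ballot_box[sender] = vote  # only the last vote per sender is kept
        return outcome
```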

About judging votes

A server compares its own (myid, zxid) with the one it received:

    • First, compare the election clock (epoch) in the high 32 bits of the zxid
    • If they are equal, compare the transaction ID in the low 32 bits of the zxid
    • If they are still equal, compare the user-configured myid
      The (myid, zxid) that wins this comparison is the one the server votes for.
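The comparison order above maps naturally onto tuple comparison. The sketch below assumes that the larger value wins at each step; Vote, vote_key and pick_vote are illustrative names, not ZooKeeper classes.

```python
from typing import NamedTuple


class Vote(NamedTuple):
    myid: int
    zxid: int


def vote_key(vote: Vote) -> tuple:
    """Sort key: epoch (high 32 bits), then transaction ID (low 32 bits), then myid."""
    epoch = vote.zxid >> 32
    counter = vote.zxid & 0xFFFFFFFF
    return (epoch, counter, vote.myid)


def pick_vote(own: Vote, received: Vote) -> Vote:
    """Return the vote to support after comparing our vote with a received one."""
    return received if vote_key(received) > vote_key(own) else own
```

For example, a vote with epoch 2 and transaction ID 5 beats a vote with epoch 1 and transaction ID 900, because the epoch is compared first.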

About the end of the election

Once more than half of the servers have chosen the same candidate, voting ends, and each server updates its own state to leader or follower according to the result.
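The "more than half" rule can be sketched as a tally over the ballot box. In this hypothetical example the ballot box maps each voter's myid to the myid of the candidate it last voted for, and cluster_size is the number of voting servers; both names are assumptions made for illustration.

```python
from collections import Counter
from typing import Optional


def elected_leader(ballot_box: dict, cluster_size: int) -> Optional[int]:
    """Return the candidate backed by more than half of the cluster, if any."""
    if not ballot_box:
        return None
    candidate, votes = Counter(ballot_box.values()).most_common(1)[0]
    return candidate if votes > cluster_size // 2 else None


def next_state(myid: int, leader: int) -> str:
    """State a server moves to once the election has a result."""
    return "LEADING" if myid == leader else "FOLLOWING"
```

With five servers, three matching votes are enough to end the election; two are not, so the servers keep exchanging votes.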

Two more questions

As stated above, Zab is an atomic broadcast protocol, and its atomicity shows in the crash recovery process. During leader election, ZooKeeper guarantees two things:

    • Committed data is not lost
    • Uncommitted data is discarded

The (myid, zxid) ballot design solves exactly these two problems.

    • Committed data has already been written to more than half of the followers that enter the election, and a server can become leader only if it has the highest transaction ID, which means its data is the most up to date.
    • Uncommitted data exists only on the old leader. That leader cannot take part in the first election round, so its epoch ends up smaller and its uncommitted data is eventually discarded.

2. Message broadcast state: data synchronization

For example, when a client initiates a request, a read request is answered directly by a follower or observer, while a write request is forwarded to the leader.

The leader first allocates a globally unique, monotonically increasing transaction ID (that is, a zxid) for the transaction.

The leader then sends the proposal to the followers. It assigns a separate queue to each follower, places the transaction proposals to be broadcast into those queues, and sends the messages in FIFO order.

After receiving a transaction proposal, each follower first writes it to local disk as a transaction log entry and, once the write succeeds, replies to the leader with an ACK.

When the leader has received ACKs from more than half of the followers, it broadcasts a commit message to all followers to tell them to commit the transaction, and the leader itself also commits the transaction.
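The propose, ACK, commit flow above can be sketched as a small in-memory simulation. It is a toy model under stated assumptions, not ZooKeeper code: Leader, Follower and the reachable parameter are hypothetical, the follower "log" is a Python list standing in for the on-disk transaction log, and the quorum test follows this article's wording (more than half of the followers).

```python
from collections import deque


class Follower:
    def __init__(self, name: str):
        self.name = name
        self.log = []        # stands in for the on-disk transaction log
        self.committed = []  # zxids this follower has committed

    def on_proposal(self, zxid: int, data: str) -> int:
        self.log.append((zxid, data))  # "write to disk" first
        return zxid                    # then ACK by echoing the zxid

    def on_commit(self, zxid: int) -> None:
        self.committed.append(zxid)


class Leader:
    def __init__(self, followers: list, epoch: int = 1):
        self.followers = followers
        self.queues = {f.name: deque() for f in followers}  # one FIFO per follower
        self.epoch = epoch
        self.counter = 0
        self.committed = []

    def write(self, data: str, reachable: set):
        """Propose one write; return its zxid if committed, else None."""
        self.counter += 1
        zxid = (self.epoch << 32) | self.counter
        acks = 0
        for f in self.followers:
            queue = self.queues[f.name]
            queue.append((zxid, data))
            if f.name not in reachable:
                continue  # the proposal stays queued for this follower
            while queue:  # deliver strictly in FIFO order
                z, d = queue.popleft()
                if f.on_proposal(z, d) == zxid:
                    acks += 1
        if acks * 2 <= len(self.followers):
            return None  # not more than half of the followers ACKed
        for f in self.followers:
            if f.name in reachable:
                f.on_commit(zxid)
        self.committed.append(zxid)  # the leader commits as well
        return zxid
```

A short usage example: with three followers, a write that reaches two of them is committed, while a later write that reaches only the third follower is delivered after the queued earlier proposal but is not committed.

```python
f1, f2, f3 = Follower("f1"), Follower("f2"), Follower("f3")
leader = Leader([f1, f2, f3])
first = leader.write("a", reachable={"f1", "f2"})
second = leader.write("b", reachable={"f3"})
```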

Postscript

ZooKeeper: operation and application scenarios - "Big data five minutes a day"

ZooKeeper: architecture design and role division - "Big data five minutes a day"

ZooKeeper: Paxos and consistency - "Big data five minutes a day"

The last few articles have been heavy on theory, so next I will write some simple code: a distributed lock implementation with ZooKeeper. Thanks for reading.
