Zookeeper in-depth understanding (V) Paxos algorithm

Source: Internet
Author: User
Tags zookeeper

The Paxos algorithm is a highly fault-tolerant consensus algorithm based on message passing. It was proposed in 1990 by Leslie Lamport (the "La" in LaTeX, who later joined Microsoft Research). [1]

Problem and assumptions

Nodes in a distributed system communicate under one of two models: shared memory or message passing. In a distributed system based on message passing, the following faults are unavoidable: processes may run slowly, crash, or restart, and messages may be delayed, lost, or duplicated. Basic Paxos does not consider message tampering, that is, Byzantine faults. The problem Paxos solves is how a distributed system in which these faults can occur agrees on a single value, such that none of the faults above can break the consistency of the decision. A typical scenario is a distributed database: if every node starts in the same initial state and executes the same sequence of operations, all nodes end in the same state. To ensure that every node executes the same command sequence, a consensus algorithm is run for each command so that every node sees the same command at that position. A general-purpose consensus algorithm applies to many scenarios and is a fundamental problem in distributed computing, which is why research on consensus algorithms has continued without pause since the 1980s.
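The database scenario can be illustrated with a minimal sketch. The operation format `(kind, key, amount)` and the helper names `apply_op` and `replay` are assumptions made for this example only; they show that replicas replaying the same log reach the same state, while a reordered log may not.

```python
from functools import reduce


def apply_op(state, op):
    """Apply one deterministic operation (kind, key, amount) and return the new state."""
    kind, key, amount = op
    new_state = dict(state)          # copy so that replicas never share state
    if kind == "set":
        new_state[key] = amount
    elif kind == "add":
        new_state[key] = new_state.get(key, 0) + amount
    return new_state


def replay(initial_state, log):
    """Run every operation of the log, in order, starting from initial_state."""
    return reduce(apply_op, log, initial_state)


log = [("set", "x", 1), ("add", "x", 4), ("set", "y", 7)]

replica_a = replay({}, log)
replica_b = replay({}, log)
# Same operations, but the first two are swapped on this replica.
reordered = replay({}, [log[1], log[0], log[2]])

print(replica_a == replica_b)   # identical logs give identical states
print(replica_a == reordered)   # a different order can give a different state
```

This is why agreeing on the command at each position of the log is enough to keep the replicas consistent.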

To describe the algorithm, Lamport invented a fictional Greek city-state called Paxos. The island makes its laws through a parliamentary democracy, but nobody is willing to devote all of their time and energy to it. So neither the members, nor the speaker, nor the attendants who carry notes can promise to be present when someone needs them, and none of them can promise how long approving a resolution or delivering a message will take. The story does assume that there are no Byzantine failures: a message may be delivered twice, but it is never delivered incorrectly, and any message is eventually delivered if one waits long enough. In addition, the members on the island of Paxos never oppose a resolution proposed by another member.

Mapped onto a distributed system, the members correspond to the nodes and the enacted laws correspond to the system state. Every node must reach the same state. For example, in a symmetric multiprocessor system where each processor has its own cache, every processor that reads a given byte of memory must read the same value; otherwise the system violates its consistency requirement. The consistency requirement corresponds to there being only one version of each law. The unreliability of members and attendants corresponds to the unreliability of nodes and message channels.

Algorithm

Derivation and proof of the algorithm

First, the members' roles are divided into proposers, acceptors, and learners (one member may hold several roles). A proposer issues a proposal, which consists of a proposal number and a proposed value. An acceptor that receives a proposal may accept it; once a majority of acceptors has accepted a proposal, that proposal is approved (chosen). Learners can only learn proposals that have been chosen. With the roles defined, the problem can be stated more precisely:

    1. A resolution (value) can be chosen only after a proposer has proposed it (a resolution that has not yet been chosen is called a "proposal");

    2. In one execution instance of the Paxos algorithm, only one value is chosen;

    3. Learners can learn only a value that has been chosen.

Progress must also be guaranteed; this point is discussed later.

Lamport derives the Paxos algorithm by progressively strengthening the three constraints above (mainly the second).

To choose a value, a proposer first sends the value to the acceptors, and the acceptors then accept it. To satisfy the constraint that only one value is chosen, a value becomes the formal resolution (is "chosen") only when a majority of acceptors has accepted it. This works because any two majorities, whether counted by head or by weight, share at least one acceptor, so if each acceptor can accept only one value, constraint 2 is guaranteed.
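The claim that any two majorities overlap can be checked by brute force for small groups. The sketch below counts majorities by head only, and the helper names `majorities` and `all_majorities_intersect` are assumptions made for this illustration.

```python
from itertools import combinations


def majorities(acceptors):
    """Yield every subset that contains more than half of the acceptors."""
    need = len(acceptors) // 2 + 1
    for size in range(need, len(acceptors) + 1):
        for group in combinations(acceptors, size):
            yield frozenset(group)


def all_majorities_intersect(acceptors):
    """Return True if every pair of majorities shares at least one acceptor."""
    groups = list(majorities(acceptors))
    return all(a & b for a in groups for b in groups)


members = ["A1", "A2", "A3", "A4", "A5"]
print(all_majorities_intersect(members))
```

The check succeeds for any group size, because two subsets that each hold more than half of the members cannot be disjoint.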

Requiring a majority yields an obvious new constraint:

P1: An acceptor must accept the first proposal it receives.

Note that P1 is incomplete. If exactly half of the acceptors accept a proposal with value A and the other half accept a proposal with value B, no majority forms and no value can be chosen.

Constraint 2 does not require that only one proposal be chosen, so several proposals may be chosen. As long as those proposals carry the same value, choosing several of them does not violate constraint 2. This gives constraint P2:

P2: Once a proposal with value v is chosen, every proposal chosen afterwards must have value v.

Note: each proposal can be assigned a number so that the proposals are totally ordered; "afterwards" then refers to all proposals with a larger number.

If both P1 and P2 are guaranteed, then constraint 2 can be guaranteed.

A value can be chosen only if acceptors have accepted it. P2 can therefore be strengthened to:

P2a: Once a proposal with value v is chosen, every proposal that any acceptor accepts afterwards must have value v.

Because communication is asynchronous, P2a and P1 can conflict. Suppose a value has been chosen, and then a proposer and an acceptor wake up from hibernation and the proposer issues a proposal with a new value. According to P1 the acceptor should accept it; according to P2a it should not. In this scenario P2a and P1 contradict each other. A different approach is needed, one that constrains the behavior of the proposer instead:

P2b: Once a proposal with value v is chosen, every proposal that any proposer issues afterwards must have value v.

Since an acceptor can accept only proposals that some proposer has issued, P2b implies P2a and is the stronger constraint.

P2b alone, however, does not suggest a way to implement it, so it must be strengthened further.

Suppose a proposal numbered m with value v has been chosen, and consider under what conditions every proposal numbered n (n > m) also has value v. Because m has been chosen, there must be a majority C of acceptors that have all accepted it. Since any majority shares at least one member with C, we can find a constraint P2c that implies P2b:

P2c: If a proposal numbered n has value v, then there is a majority of acceptors such that either none of them has accepted any proposal numbered less than n, or the highest-numbered proposal among all proposals numbered less than n that they have accepted has value v.

That P2c implies P2b can be proved by mathematical induction:

Base case: suppose proposal m with value v has been chosen, and let n = m+1. Arguing by contradiction, suppose proposal n has a value w other than v. By P2c there is a majority S1 such that either none of its members has accepted any proposal numbered less than n, or the highest-numbered proposal numbered less than n that they have accepted has value w. But S1 shares at least one acceptor with the majority C that chose proposal m, and that acceptor has accepted m, which is the highest-numbered proposal below n and has value v. Neither condition can hold, which contradicts the assumption, so proposal n must have value v.

Inductive step: suppose all proposals numbered (m+1)..(n-1) have value v. Arguing by contradiction, suppose the new proposal n has a value w' other than v. By P2c there is a majority S2 such that either none of its members has accepted any proposal numbered less than n, or the highest-numbered proposal numbered less than n that they have accepted has value w'. S2 shares at least one acceptor with the majority C that chose m, so at least one member of S2 has accepted m. It follows that the highest-numbered proposal below n accepted within S2 has a number in the range m..(n-1). By the induction hypothesis, together with the fact that m itself has value v, every proposal in m..(n-1) has value v, so that proposal has value v. This is a contradiction, which overturns the assumption that proposal n does not have value v. By mathematical induction, if P2c is satisfied then P2b is satisfied.

P2c can be implemented with message passing. Introducing P2c also resolves the incompleteness of P1 noted earlier.

The content of the algorithm

To satisfy P2c, before issuing a proposal a proposer first communicates with enough acceptors to form a majority and obtains the proposal each of them most recently accepted (the prepare process). It then decides the value of its proposal from the information collected, forms the proposal, and starts the vote. Once a majority of acceptors has accepted the proposal, it is chosen, and the proposer informs the learners. Refining this outline gives the Paxos algorithm.

Within one Paxos instance every proposal needs a distinct number, and the numbers must be totally ordered. There are many ways to achieve this, for example by concatenating a sequence number with the proposer's name. How the numbers are generated is outside the scope of the Paxos algorithm.
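One way to realize the "sequence number plus proposer name" idea is a tuple that is compared field by field. This is only a sketch: the type `ProposalNumber`, its fields, and the helper `next_number` are hypothetical names chosen for the example and are not part of Paxos itself.

```python
from typing import NamedTuple


class ProposalNumber(NamedTuple):
    """Hypothetical proposal number: a sequence number plus the proposer's name."""
    seq: int        # sequence number picked by the proposer
    proposer: str   # proposer name, used only to break ties between equal seq values


def next_number(highest_seen: ProposalNumber, proposer: str) -> ProposalNumber:
    """Return a number larger than the highest number this proposer has seen."""
    return ProposalNumber(highest_seen.seq + 1, proposer)


a = ProposalNumber(1, "A1")
b = ProposalNumber(1, "A5")
c = next_number(b, "A1")

# Tuples compare field by field, so any two distinct numbers are ordered
# and two different proposers can never produce the same number.
print(a < b < c)
```

Because the proposer's name is part of the number, uniqueness holds as long as each proposer never reuses one of its own sequence numbers.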

Suppose an acceptor whose most recently accepted proposal is numbered m answers a proposer's prepare request for proposal n (n > m), but then, before the vote on n begins, accepts another proposal numbered less than n (for example, n-1). If proposals n-1 and m have different values, the vote on n would violate P2c. The acceptor's answer in the prepare process must therefore also carry a promise: it will no longer accept any proposal numbered less than n. This strengthens P1:

P1a: An acceptor accepts a proposal numbered n if and only if it has not responded to a prepare request with a number greater than n.

It is now possible to propose a complete algorithm.

The proposal and approval of the resolution

The adoption of a resolution is divided into two phases:

    1. Prepare phase:

      • A proposer selects a proposal number n and sends a prepare request to a majority of the acceptors;

      • When an acceptor receives a prepare request whose number n is greater than that of every prepare request it has already answered, it replies to the proposer with the proposal it last accepted (if any) and promises not to accept any proposal numbered less than n;

    2. Approval phase:

      • When a proposer has received prepare responses from a majority of the acceptors, it enters the approval phase. It sends an accept request to the acceptors that answered the prepare request, containing the number n and the value determined by P2c (if P2c shows that no value has been accepted yet, the proposer is free to choose the value);

      • An acceptor that receives an accept request accepts it, provided that doing so does not break a promise it has made to another proposer.
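The two phases can be sketched in a few lines. This is a simplified, single-process illustration and not a production implementation: messages are modeled as direct method calls that never get lost, proposal numbers are plain integers, and the names `Acceptor`, `on_prepare`, `on_accept`, and `propose` are assumptions made for the example.

```python
class Acceptor:
    def __init__(self):
        self.promised = 0      # highest prepare number answered so far
        self.accepted = None   # (number, value) of the last accepted proposal

    def on_prepare(self, n):
        """Phase 1: promise to ignore proposals below n and report the last accepted one."""
        if n > self.promised:
            self.promised = n
            return ("promise", self.accepted)
        return ("reject", None)

    def on_accept(self, n, value):
        """Phase 2: accept unless a higher-numbered prepare was already answered (P1a)."""
        if n >= self.promised:
            self.promised = n
            self.accepted = (n, value)
            return True
        return False


def propose(acceptors, n, my_value):
    """Run one proposal numbered n; return the chosen value, or None if it fails."""
    majority = len(acceptors) // 2 + 1
    replies = [(a, a.on_prepare(n)) for a in acceptors]
    promised = [(a, prev) for a, (kind, prev) in replies if kind == "promise"]
    if len(promised) < majority:
        return None
    # P2c: reuse the value of the highest-numbered accepted proposal, if there is one.
    previous = [prev for _, prev in promised if prev is not None]
    value = max(previous, key=lambda p: p[0])[1] if previous else my_value
    votes = sum(a.on_accept(n, value) for a, _ in promised)
    return value if votes >= majority else None


acceptors = [Acceptor() for _ in range(5)]
first = propose(acceptors, 1, "10%")    # nothing accepted yet: free choice of value
second = propose(acceptors, 2, "20%")   # must adopt the value that was already accepted
stale = propose(acceptors, 1, "15%")    # number too low: every acceptor rejects it
print(first, second, stale)
```

In this run the second proposer wants 20% but ends up proposing 10%, which is how P2c keeps a chosen value from changing.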

The process remains correct no matter when it is interrupted. For example, if a proposer learns that another proposer has already issued a higher-numbered proposal, it should abandon its own. As an optimization, an acceptor that sees a higher-numbered proposal during the prepare process should therefore notify the lower-numbered proposer so that it abandons its proposal.

Example

A practical example describes the process above more clearly:

Five members, A1, A2, A3, A4, and A5, must reach a resolution on the tax rate. Member A1 decides to set the tax rate at 10% and sends a draft to everyone. The draft reads:

What is the current tax rate? If there is no decision yet, I propose setting it at 10%. Time: March 15, 3rd year of the current parliament; Sponsored by: A1

In the simplest case, nobody competes with A1, and the message reaches the other members promptly and without trouble.

So A2 through A5 respond:

I have received your proposal, awaiting final approval.

After receiving 2 replies, A1 issues the final resolution:

The tax rate has been set at 10%; new proposals may not reopen this issue.

In effect, this degenerates into a two-phase commit protocol.

Now suppose that, at the same time as A1 makes its proposal, A5 decides to set the tax rate at 20%:

What is the current tax rate? If there is no decision yet, I propose setting it at 20%. Time: March 15, 3rd year of the current parliament; Sponsored by: A5

Drafts are delivered to the other members' desks by attendants. A1's draft is to be carried to A2 through A5 by four attendants. The attendants responsible for A2 and A3 deliver the draft without trouble, but the attendants responsible for A4 and A5 are not at work. A5's draft, meanwhile, is delivered smoothly to A3 and A4.

Now A1, A2, and A3 have received A1's proposal; A3, A4, and A5 have received A5's proposal. According to the protocol, A1, A2, A4, and A5 each accept the proposal they received, and the attendants carry this reply back to the sponsor:

I have received your proposal, awaiting final approval.

A3's behavior then decides which proposal is approved.

Situation One

Suppose A1's proposal reaches A3 first, and A5's attendant decides to take a break. A3 accepts and sends the attendant back. A1 has now received replies from two attendants; counting itself, that already forms a majority, so the 10% tax rate becomes the resolution. A1 sends attendants to deliver the resolution to all members:

The tax rate has been set at 10%; new proposals may not reopen this issue.

Much later, A3 receives A5's proposal. Since the tax rate has already been settled, he decides to ignore it, though he cannot resist complaining:

The tax rate has been set at 10% in the previous ballot, you should not bother me again!

This reply may help A5, because A5 may have been cut off from the outside world for a long time for some reason. More likely, though, it makes no difference to A5, because A5 may already have received the resolution from A1.

Situation Two

Again suppose A1's proposal reaches A3 first, but this time A5's attendant is not on a break, only delayed on the way. A3 still replies "accepted" to A1, but before the resolution is formed it also receives A5's proposal. The protocol can handle this in two ways:

1. If A5's proposal were earlier, tradition says the earlier sponsor should preside over the vote. Here the two proposals carry the same time (March 15, 3rd year of the current parliament), but A5 is an important figure who must not be offended. So A3 replies:

I have received your proposal and await final approval, but someone before you has proposed setting the tax rate at 10%; please take note.

Now both A1 and A5 have received enough replies, and two proposals on the tax rate are in progress at the same time. But A5 knows that a tax rate of 10% was proposed earlier. So both A1 and A5 broadcast to all members:

The tax rate has been set at 10%; new proposals may not reopen this issue.

Consistency has been ensured.

2. A5 is an insignificant figure. A3 then simply ignores him, and A1 soon broadcasts that the tax rate has been set at 10%.

Situation Three

This situation shows why it is meaningful to decide whether to respond based on the time of the proposal and the power of its sponsor. Together, the time and the sponsor's power form the basis for numbering proposals, and such numbering satisfies the requirement that any two proposals can be ordered.

A1 and A5 again make the proposals above at the same time. This time A1 can reach A2 and A3 normally, and so can A5. A2 receives A1's proposal first; A3 receives A5's proposal first. A5 is the more powerful of the two.

In this case A2, having already replied to A1, sees a new proposal for a 20% tax rate from A5, who is more powerful than A1, and replies to A5:

I have received your proposal pending final approval.

A3, having already replied to A5, sees that the new sponsor A1 is a minor figure and ignores him.

A1 does not reach a majority but A5 does, so A5 presides over the vote, and the content of the resolution is the 20% tax rate that A5 proposed.

If A3 decided to treat every member equally and replied to A1 that "someone before you has proposed setting the tax rate at 20%", confusion would result: both A1 and A5 would try to preside over the vote, but their proposals have different contents.

In this situation, if A3 replies to A1 at all, it can only say:

A more important member is handling this matter; please wait for his decision.

Also in this case, A4 has lost contact with the outside world. When he regains contact and needs to know the tax rate, he (in the simplest protocol) makes a proposal:

What is the current tax rate? If there is no decision yet, I propose setting it at 15%. Time: April 1, 3rd year of the current parliament; Sponsored by: A4

At this point (in the simplest protocol) the other members reply:

The tax rate has been set at 20% in the previous ballot, you should not bother me again!

Release of the resolution

The obvious approach is for the acceptors to notify all learners whenever they accept a value. However, this generates too many messages.

Because Byzantine failures are assumed not to occur, a learner can obtain the chosen resolution from other learners. The acceptors can therefore send their accept notifications to a single designated learner, and the other learners ask it for the resolution that has been passed. This reduces the number of messages, but if the designated learner fails, the whole mechanism fails.

The acceptors therefore send their accept notifications to a subset of the learners, and those learners then notify all the other learners.
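A learner that receives accept notifications has to work out when a value has actually been chosen. One possible way to do this is to count the distinct acceptors that reported each proposal, as in the sketch below. The class name `Learner`, the method `on_accepted`, and the notification fields are assumptions made for this example; the text above does not prescribe a message format.

```python
from collections import defaultdict


class Learner:
    def __init__(self, acceptor_count):
        self.majority = acceptor_count // 2 + 1
        self.voters = defaultdict(set)   # (number, value) -> acceptors that accepted it
        self.chosen = None

    def on_accepted(self, acceptor_id, number, value):
        """Record one accept notification; repeats from the same acceptor count once."""
        self.voters[(number, value)].add(acceptor_id)
        if len(self.voters[(number, value)]) >= self.majority:
            self.chosen = value
        return self.chosen


learner = Learner(acceptor_count=5)
results = [
    learner.on_accepted("A1", 1, "10%"),
    learner.on_accepted("A1", 1, "10%"),   # duplicated message, ignored by the set
    learner.on_accepted("A2", 1, "10%"),
    learner.on_accepted("A3", 1, "10%"),   # third distinct acceptor: majority of 5
]
print(results)
```

Storing acceptor identifiers in a set makes the learner tolerant of duplicated messages, which the failure model explicitly allows.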

However, because message delivery is unreliable, it is possible that no learner receives the notification that a resolution has been chosen. When a learner needs to find out the resolution, it can have a proposer issue a new proposal. Note that a learner may also act as a proposer.

Guaranteeing progress

Following the process above, a proposer abandons its proposal when it discovers a higher-numbered one. In other words, issuing a higher-numbered proposal terminates the earlier proposal. If two proposers keep reacting to this by switching to ever higher-numbered proposals, they can fall into a livelock, which violates the progress requirement. The remedy is to elect a leader and allow only the leader to issue proposals. Because message passing is unreliable, however, several proposers may each believe they are the leader. Lamport describes and solves this problem in his paper The Part-Time Parliament.
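The livelock can be reproduced with a small deterministic simulation. The sketch assumes a worst-case interleaving in which each proposer's prepare request always arrives just before the other proposer's accept request; the names `Acceptor` and `duel` and the values "P" and "Q" are hypothetical and exist only for this example.

```python
class Acceptor:
    def __init__(self):
        self.promised = 0      # highest prepare number answered so far
        self.accepted = None   # (number, value) of the last accepted proposal

    def prepare(self, n):
        if n > self.promised:
            self.promised = n
            return True
        return False

    def accept(self, n, value):
        if n >= self.promised:
            self.promised, self.accepted = n, (n, value)
            return True
        return False


def duel(rounds):
    """Two proposers keep preempting each other; return what was accepted, if anything."""
    acceptors = [Acceptor() for _ in range(3)]
    n_p, n_q = 1, 2
    for a in acceptors:
        a.prepare(n_p)                                    # P prepares its first proposal
    for _ in range(rounds):
        for a in acceptors:
            a.prepare(n_q)                                # Q prepares a higher number
        if any([a.accept(n_p, "P") for a in acceptors]):  # P's accept arrives too late
            return "accepted"
        n_p = n_q + 1                                     # P retries with a higher number
        for a in acceptors:
            a.prepare(n_p)
        if any([a.accept(n_q, "Q") for a in acceptors]):  # now Q's accept is too late
            return "accepted"
        n_q = n_p + 1                                     # Q retries, and the cycle repeats
    return "no value accepted"


print(duel(rounds=100))
```

However many rounds are simulated, no acceptor ever accepts a proposal under this interleaving, which is why allowing only a single leader to issue proposals restores progress.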
