[Repost] Paxos algorithm 3: Implementation discussion

--Reposted from: {Old yards' column}

The first two discussions of the Paxos algorithm gave us a rough understanding of how the theory behind Paxos took shape, but it is still a long way from an executable program, because many details and error cases have not been considered. The authors of Google Chubby say that Paxos is far from simple to implement: the fault tolerance of Paxos as described is limited to server crashes, whereas a real implementation must also deal with disk corruption, file corruption, loss of leader identity, and many other errors.

1. Functions of the Paxos roles

In the Paxos algorithm there are five roles: client, proposer, proposer leader, acceptor, and learner. These can be reduced to three main roles: proposer, acceptor, and learner. Roles exist only logically; in an actual implementation, a node can play several roles at once.

In our discussion we first assume that there is no proposer leader. If the algorithm is correct in that case, where livelock is possible, then the algorithm with a leader is certainly correct as well.

In addition to the five roles, there are three important concepts: instance, proposal, and value. They denote, respectively, one Paxos election process, a proposal, and the value carried by a proposal.

There are also four key steps:

    • (Phase 1): Prepare
    • (Phase 1): Prepare Ack
    • (Phase 2): Accept
    • (Phase 2): Accept Ack

The acceptor, in turn, has three actions: promise, accept, and reject.

First, a diagram gives a more intuitive picture of what the roles do (for the specific functions of each role, Lamport's paper is a sufficient reference):

[Figure: relationships between the Paxos roles]

The diagram is not very strict; it is only meant to show the relationships between the roles.

2. Proposer

The proposer, acceptor, and learner all deal with proposal numbers. Numbers should be changed only by the proposer; for the other roles they are read-only, which guarantees a single source of data. When multiple proposers submit proposals at the same time, each proposer's numbers must be unique and comparable; how to achieve this was covered earlier. It is also important to stress that it is not enough for each proposer to increase its number by its own rule alone: it must also know the largest number seen "outside". For example, with three proposers P1, P2, P3 (see: Paxos algorithm, "Numbering problem: uniqueness of numbers"):

    • P3's current number is its initial number, 2
    • When P3 submits a proposal, it finds that a larger number, 16, already exists (16 was proposed by P2, by the rule: 5*3+1)
    • When P3 issues a new number, it must be greater than 16 and must still follow the rule above: 5*3+2 = 17. It cannot simply take the next number in its own sequence: 1*3+2 = 5

This requires the acceptor to include the current maximum number in its reject message. A proposer may go down and resume service after a restart; the reject message helps it quickly find the next correct number. But when multiple acceptors each reply with their own reject messages, things become more complicated.
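The numbering rule in the example above can be sketched as follows. This is a minimal sketch, assuming numbers have the form round * N + id, where N is the number of proposers and id is a zero-based proposer index (so P2 has id 1 and P3 has id 2, which matches 5*3+1 = 16 and 5*3+2 = 17). The function name `next_number` is hypothetical.

```python
def next_number(proposer_id: int, num_proposers: int, max_seen: int) -> int:
    """Smallest number > max_seen of the form round * num_proposers + proposer_id.

    max_seen is the largest number the proposer knows of, for example the
    maximum carried in a reject message (an assumption of this sketch).
    """
    if not 0 <= proposer_id < num_proposers:
        raise ValueError("proposer_id must be in [0, num_proposers)")
    round_no = max_seen // num_proposers
    candidate = round_no * num_proposers + proposer_id
    if candidate <= max_seen:
        candidate += num_proposers  # move to the next round
    return candidate


# P3 (id 2 of 3 proposers) learns from a reject that 16 is already in use.
print(next_number(2, 3, 16))  # 17
# Without outside knowledge, P3 would only step from 2 to 5.
print(next_number(2, 3, 2))   # 5
```

Because the result depends only on the proposer's id and the largest number reported by the acceptors, a restarted proposer can recover a valid number without remembering its previous one.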

When a proposer sends a proposal to an acceptor, there are three possible results:

    • Timeout: no response is received from the acceptor
    • Reject: the number is not large enough and is refused. The reply carries the current maximum number
    • Promise: the acceptor accepts and guarantees not to approve any proposal numbered lower than this one. The reply carries the current maximum number and the corresponding value

The condition for deciding whether Phase 2 can proceed is that a majority of acceptors must promise the current proposal.

The following discusses proposer behavior in Phase 1 and Phase 2, respectively:

Phase1-prepare: Send prepare to the acceptors

The proposer selects a proposal number locally and sends it to the acceptors. The responses fall into several cases:

(a) No response received from the majority

Messages were lost or servers went down, so no majority responded. Under reliable message transfer (TCP), this should be reported as downtime that prevents the remaining servers from continuing to provide service; in practice, a majority is unlikely to go down at the same time.

(b) Reject received from the majority

Acceptors can suffer various errors, such as lost messages or downtime, so the maximum number seen by each acceptor may differ, and the maximum numbers carried in their reject messages to the proposer will not agree. In this situation the proposer should take the largest of them as the basis for comparison, recalculate its number, and repeat the Phase 1 prepare stage.
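Cases (a) and (b), together with the majority-promise condition, can be sketched as one decision function. This is only an illustration: the `Phase1Reply` type, the outcome labels, and the convention that a timeout is simply a missing reply are assumptions of this sketch, not part of the original text.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Phase1Reply:
    kind: str        # "promise" or "reject"; a timeout is a missing reply
    max_number: int  # largest proposal number the acceptor has seen


def phase1_outcome(replies: List[Phase1Reply],
                   num_acceptors: int) -> Tuple[str, Optional[int]]:
    majority = num_acceptors // 2 + 1
    if len(replies) < majority:
        return ("no_majority", None)        # case (a): report unavailability
    promises = sum(1 for r in replies if r.kind == "promise")
    if promises >= majority:
        return ("go_to_phase2", None)       # a majority promised
    # Case (b): rejects may carry different maxima; take the largest one
    # as the basis for recalculating the proposal number.
    highest = max(r.max_number for r in replies if r.kind == "reject")
    return ("retry_phase1", highest)


replies = [Phase1Reply("reject", 16), Phase1Reply("reject", 13)]
print(phase1_outcome(replies, 3))  # ('retry_phase1', 16)
```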

(c) Promise received from the majority

Depending on the values they contain, these promises fall into three cases:

    • The majority returns the same value, indicating that a final resolution has been reached before
    • The values differ, and no final resolution has been reached
    • All returned values are null

The all-null case is easy to handle: the proposer is free to choose any value. The case where the majority agrees is also easy: submit the value of that resolution. When the values differ, there are two ways to handle it:

    • Scenario 1: Freely choose a value. Reason: no resolution has been reached before anyway, so there should be no restriction on which value is submitted this time.
    • Scenario 2: Select the value with the largest proposal number among the promises. Reason: this election (instance) already has proposals; although none was approved, later proposals should continue from the earlier ones. Lamport chooses this approach in the "Paxos Made Simple" paper.

In fact, the essence of the question is: within one instance, can an acceptor accept multiple values? Constraint P2 only requires that if a value V has been chosen, then every value chosen afterwards must also be V. Conversely, if value V has not been accepted by a majority, nothing restricts an acceptor to accepting only one value.

It may seem that both ways work: whichever value is picked, the subsequent Paxos process will reach agreement on it. In fact, value V may already have become the final resolution without the acceptors knowing it; if the proposer then chooses some other value instead of V, a single instance will end up with two resolutions.
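Scenario 2 can be sketched in a few lines. The sketch assumes each promise carries a pair (accepted number, accepted value), with (None, None) when the acceptor has accepted nothing yet; the function name `choose_value` and this representation are hypothetical.

```python
from typing import List, Optional, Tuple

# (accepted_number, accepted_value); (None, None) if nothing was accepted
Promise = Tuple[Optional[int], Optional[str]]


def choose_value(promises: List[Promise], own_value: str) -> str:
    """Pick the value to send in Phase 2 according to Scenario 2."""
    accepted = [(n, v) for n, v in promises if n is not None]
    if not accepted:
        return own_value                          # all null: free choice
    # Otherwise continue with the value of the highest-numbered proposal.
    return max(accepted, key=lambda p: p[0])[1]


print(choose_value([(None, None), (None, None)], "x"))     # x
print(choose_value([(1, "a"), (2, "b"), (None, None)], "x"))  # b
```

Note that the same rule also covers the case where the majority already agrees on one value, since the highest-numbered accepted value is then that value.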

Could the following situation arise: A, B, C, D are the promises of a majority, where the (proposal number, value) of A, B, and C is (1,1) and that of D is (2,2)? That is, the numbers are inconsistent, the smaller-numbered proposal has reached the final resolution, and the larger-numbered one has not?

Let the smaller-numbered proposal be P1 with value V1, and the larger-numbered proposal be P2 with value V2.

    • If P1 was chosen as the final resolution, it must have completed Phase 1 and Phase 2. So there is a majority of acceptors C1 for which P1 is the maximum number, and each acceptor in C1 has accepted V1
    • When P2 executes Phase 1, a majority C2 responds with promises, so C1 and C2 share a common member whose maximum number is P1 and which has accepted V1
    • By the rule, P2 can only choose V1 to continue Phase 2, that is, V1 = V2. Whether or not Phase 2 succeeds, no acceptor will ever be left holding a value like (2,2)

That is, simply selecting the value according to "Scenario 2" guarantees the correctness of the result. The key lies in the majority, which has two crucial effects:

    • In Phase 1, rejecting smaller-numbered proposals
    • In Phase 2, forcing the proposal to take the specified value

The majority works because any two majorities share at least one common member, and this common member has a decisive influence on subsequent proposals: if the majority rejects a subsequent proposal, that proposal cannot go on because it cannot form a new majority. This is the essence of the Paxos algorithm.
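The claim that any two majorities share a member follows from counting: two sets that each contain more than half of the N acceptors have sizes that sum to more than N, so they cannot be disjoint. The sketch below checks this by brute force for a small set of acceptors; the acceptor names and function names are hypothetical.

```python
from itertools import combinations
from typing import FrozenSet, List, Sequence


def majorities(acceptors: Sequence[str]) -> List[FrozenSet[str]]:
    """All subsets that contain more than half of the acceptors."""
    need = len(acceptors) // 2 + 1
    return [frozenset(c)
            for size in range(need, len(acceptors) + 1)
            for c in combinations(acceptors, size)]


def all_majorities_intersect(acceptors: Sequence[str]) -> bool:
    quorums = majorities(acceptors)
    # An empty intersection is falsy, so all() fails on any disjoint pair.
    return all(q1 & q2 for q1, q2 in combinations(quorums, 2))


print(all_majorities_intersect(["A", "B", "C", "D", "E"]))  # True
```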

Phase2-accept: Send accept to the acceptors

If everything goes well, the proposer chooses a value and sends it to the acceptors. This process is relatively simple.

Accept also has two possible outcomes:

(a) A majority of acceptors accept the value

Once a majority has accepted the value, the final resolution has been reached; what remains is for the learners to learn it and close the election (instance).

(b) A majority of acceptors reject the value or time out

This means the acceptors are unavailable or the submitted number is not large enough; the proposer goes back to Phase 1.
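The two Phase 2 outcomes reduce to counting accept acknowledgements against the majority. A minimal sketch, assuming the proposer collects replies in a dictionary keyed by a hypothetical acceptor id, with True for an accept and False for a reject; acceptors that timed out are simply absent.

```python
from typing import Dict


def phase2_outcome(acks: Dict[str, bool], num_acceptors: int) -> str:
    """Return "chosen" for case (a), "restart_phase1" for case (b)."""
    majority = num_acceptors // 2 + 1
    accepted = sum(1 for ok in acks.values() if ok)
    return "chosen" if accepted >= majority else "restart_phase1"


print(phase2_outcome({"A": True, "B": True, "C": False}, 3))  # chosen
print(phase2_outcome({"A": True}, 3))                         # restart_phase1
```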

That is roughly how the proposer works, but there are several issues to consider when actually programming it:

    • Duplicate messages from acceptors
    • Messages that had timed out suddenly arriving
    • Message persistence

The first two issues are relatively simple; persistence needs more discussion.
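The first two issues can be handled with a small filter on incoming replies. The sketch below assumes each reply echoes the proposal number it answers and identifies its sender; both fields, and the `ReplyCollector` class itself, are assumptions for illustration.

```python
from typing import Dict


class ReplyCollector:
    """Keeps at most one reply per acceptor for the current proposal number."""

    def __init__(self, current_number: int) -> None:
        self.current_number = current_number
        self.replies: Dict[str, str] = {}  # acceptor id -> reply kind

    def add(self, acceptor_id: str, number: int, kind: str) -> bool:
        """Record a reply; return False if it is stale or a duplicate."""
        if number != self.current_number:
            return False  # late reply to an earlier, timed-out round
        if acceptor_id in self.replies:
            return False  # duplicate from the same acceptor
        self.replies[acceptor_id] = kind
        return True


collector = ReplyCollector(current_number=17)
print(collector.add("A", 17, "promise"))  # True
print(collector.add("A", 17, "promise"))  # False (duplicate)
print(collector.add("B", 16, "promise"))  # False (late reply to number 16)
```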

The purpose of persistence is to let the proposer continue to participate in the Paxos process when its server "wakes up" again.

From the earlier analysis, the correctness of the proposer's work is guaranteed by the correctness of its number, and the correctness of the number is guaranteed jointly by the proposer's initialization of the number and the acceptors' reject messages. So as long as the acceptors work properly, the proposer does not need to persist its current number.

3. Acceptor

The acceptor's behavior is relatively simple: it decides whether to accept a proposal according to the proposal's number. The number it compares against depends on the promise and accept messages it has handled, so the acceptor must persist the messages it receives. From the earlier discussion, the acceptor's persistence also affects the correctness of the proposer.
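A single-instance acceptor that persists its state before replying might look like the sketch below. The JSON file layout, the reply tuples, and the class and method names are all assumptions of this sketch; a real implementation would also have to handle the disk and file corruption mentioned at the start of the article.

```python
import json
import os
import tempfile
from typing import Optional, Tuple


class Acceptor:
    """Single-instance acceptor that writes its state to disk before replying."""

    def __init__(self, path: str) -> None:
        self.path = path
        self.promised = -1                           # highest number promised
        self.accepted_number: Optional[int] = None
        self.accepted_value: Optional[str] = None
        if os.path.exists(path):                     # recover after a restart
            with open(path) as f:
                state = json.load(f)
            self.promised = state["promised"]
            self.accepted_number = state["accepted_number"]
            self.accepted_value = state["accepted_value"]

    def _persist(self) -> None:
        state = {"promised": self.promised,
                 "accepted_number": self.accepted_number,
                 "accepted_value": self.accepted_value}
        tmp = self.path + ".tmp"
        with open(tmp, "w") as f:
            json.dump(state, f)
            f.flush()
            os.fsync(f.fileno())                     # force the write to disk
        os.replace(tmp, self.path)                   # swap in the new state file

    def on_prepare(self, number: int) -> Tuple[str, int, Optional[int], Optional[str]]:
        if number <= self.promised:
            return ("reject", self.promised, None, None)
        self.promised = number
        self._persist()
        return ("promise", number, self.accepted_number, self.accepted_value)

    def on_accept(self, number: int, value: str) -> Tuple[str, int]:
        if number < self.promised:
            return ("reject", self.promised)
        self.promised = number
        self.accepted_number = number
        self.accepted_value = value
        self._persist()
        return ("accepted", number)


path = os.path.join(tempfile.mkdtemp(), "acceptor.json")
acceptor = Acceptor(path)
print(acceptor.on_prepare(16))        # ('promise', 16, None, None)
print(acceptor.on_accept(16, "v1"))   # ('accepted', 16)
restarted = Acceptor(path)            # simulate a crash and restart
print(restarted.on_prepare(5))        # ('reject', 16, None, None)
```

The restarted acceptor still rejects number 5 and reports 16, which is what allows a proposer to recover its numbering from reject messages alone.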

One important concept was not discussed in detail when describing how the acceptor decides on a proposal: the instance. Every decision on a proposal is made within one instance, that is, one Paxos process. When this instance is declared finished (the final resolution is chosen), the Paxos process moves on to the next instance. Several questions follow:

    1. When is an instance closed? Who closes it?
    2. Does the acceptor's behavior depend on whether the instance is closed?
    3. Can a majority of acceptors agree on two different values within the same instance?

According to the discussion of the roles in section 1, whether a resolution has been chosen is decided by the learner: when the learner learns that some value V has been accepted by a majority, it considers the resolution chosen and announces the closure of the current instance. As mentioned in section 2, for network reasons an acceptor may not know that the instance has been closed and will keep answering the proposer's questions about it. In other words, an acceptor can never know for certain whether an instance is closed, so the correctness of the acceptor program cannot depend on whether the instance is closed. But when an acceptor already knows that the instance is closed, it can supply more information when rejecting a proposer, for example telling the proposer to choose a higher instance and resubmit its request.

Of course, as long as the proposer chooses the value of its proposal in the manner described in section 2, two resolutions can never arise from the same instance.

4. Learner

The learner's main duty is to learn resolutions, but resolutions must be learned one by one in order, without skipping numbers. For example, if the learner already knows resolutions 100 and 102, it must also know resolution 101 before it can learn them together. Simply waiting for resolution 101 to arrive is obviously not a good approach; the learner should also actively ask the acceptors about resolution 101. Since the acceptors persist their messages, this is not difficult.
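The in-order rule can be sketched as a function that separates the resolutions that are ready to be learned from the gaps the learner should ask the acceptors about. The function name and the dictionary of known resolutions (instance number to value) are hypothetical.

```python
from typing import Dict, List, Tuple


def split_learnable(known: Dict[int, str],
                    last_learned: int) -> Tuple[List[int], List[int]]:
    """Return (instances learnable now, missing instances to query)."""
    ready = []
    nxt = last_learned + 1
    while nxt in known:            # consecutive run starting after last_learned
        ready.append(nxt)
        nxt += 1
    highest = max(known, default=nxt - 1)
    missing = [i for i in range(nxt, highest + 1) if i not in known]
    return ready, missing


# Resolutions 100 and 102 are known, 101 is not: only 100 can be learned,
# and the learner should ask the acceptors about 101.
print(split_learnable({100: "a", 102: "c"}, last_learned=99))  # ([100], [101])
```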

The learner also periodically checks the values it has received. Once it sees that a resolution has been reached, it closes the corresponding instance and notifies the acceptors, proposers, and so on (any number of parties can be notified as needed).

Another question for the learner is whether to use one server as the learner or several. With N acceptors and M learners there are N*M messages; if M is large the traffic is very large, and if M = 1 the traffic is small but the learner becomes a single point of failure. The tradeoff is to pick a relatively small M and have these learners notify the other learners.
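A rough message count makes the tradeoff concrete. The sketch assumes that a small group of learners receives messages directly from every acceptor and that each remaining learner is then notified exactly once by one of them; that forwarding cost is an assumption of this sketch, not something the original text specifies.

```python
def learner_messages(n_acceptors: int, n_learners: int, n_direct: int) -> int:
    """Messages per resolution when n_direct learners hear from all acceptors
    and each remaining learner is notified once (assumed forwarding scheme)."""
    if not 1 <= n_direct <= n_learners:
        raise ValueError("n_direct must be between 1 and n_learners")
    direct = n_acceptors * n_direct
    forwarded = n_learners - n_direct
    return direct + forwarded


print(learner_messages(5, 10, 10))  # 50: every learner hears from every acceptor
print(learner_messages(5, 10, 2))   # 18: two direct learners forward to the rest
print(learner_messages(5, 10, 1))   # 14: least traffic, but a single point
```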

The learner in Paxos is rather abstract: easy to understand, but hard to picture what it actually does, because the application scenario of Paxos is not made clear. There are generally two scenarios for using Paxos:

    • Paxos as a separate service, such as Google's Chubby and Hadoop's ZooKeeper
    • Paxos as part of the application, such as Keyspace and BerkeleyDB

If Paxos is used as a separate service, the learner's job is to inform the client once a resolution is reached; if it is part of the application, the learner executes the business logic directly, such as starting data replication.

Persistence:

All the information the learner relies on is the value and the instance, both of which are persisted in the acceptors. So the learner does not need to persist messages; when a learner is added or restarted, it can retrieve the information from the acceptors.

Error handling:

A learner that has been restarted or newly added does not know "what happened before". The solution is to have the learner keep listening to messages until the value for some instance is agreed, and then request all earlier instances from the acceptors.

At this point we have further discussed the functions of the Paxos roles and analyzed possible implementations, which brings us one step closer to turning the Paxos algorithm into an executable program and gives us a rough idea of how to implement Paxos. Many problems still need further discussion, such as error handling: although some errors and ways of handling them have been mentioned, there has been no systematic treatment of all errors.

The next discussion will focus on error handling in a distributed environment.