[Repost] Paxos algorithm 2: the algorithm process (implementation)

Source: Internet
Author: User

Refer to the previous article: paxos algorithm 1.

 

1. Number Processing

According to P2c, a proposer first consults the acceptors to learn the highest-numbered proposal each has accepted, along with its value, and only then decides which value to submit. So far we have emphasized higher-numbered proposals without explaining how to handle lower-numbered ones.

| -------- lower number (l < n) -------- | -------- current number (n) -------- | -------- higher number (h > n) -------- |

P2c guarantees correctness for a higher number h issued after the current number n. A lower-numbered proposal l may also have satisfied P2c at some point in time. However, because network communication is unreliable, l may be delayed and arrive at the same time as h, and l and h may carry different values, which clearly violates P2c. The solution is that an acceptor does not accept any proposal with an outdated number. A more precise statement is:

 

P1a: an acceptor can accept a proposal numbered n iff it has not responded to a prepare request having a number greater than n.

Clearly, the first proposal an acceptor receives meets this condition; that is, P1a subsumes P1.

For more about proposal numbers, see "On the numbering problem: unique number" below.

 

2. Formation of the Paxos algorithm

Combining P2c and P1a yields the Paxos algorithm. The algorithm is divided into two phases:

Phase1: Prepare

(A) The proposer selects a proposal number n and sends a prepare request with it to a majority of acceptors.

(B) If an acceptor finds that n is greater than the number of every prepare request it has already responded to, it replies with the highest-numbered proposal it has accepted and the corresponding value (if any). The reply also carries a promise: proposals numbered less than n will not be accepted. (The promise can also include a number, which will be introduced later.)

 

Phase2: accept

(A) If the proposer receives responses from a majority, it sends an accept request (proposal numbered n, value v) to a majority of acceptors (which may differ from the majority used in the prepare phase).

The key is what the value v is. If the acceptors' responses contain values, the proposer takes the value of the highest-numbered proposal among them as v. If no response contains a value, the proposer may choose v freely.

(B) On receiving the accept request, the acceptor checks it. If it has not responded to a prepare request numbered greater than n, it accepts the corresponding value; otherwise it rejects the request or does not respond.
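
The two phases can be sketched in Python as follows. This is only a minimal in-memory sketch under several assumptions, not a definitive implementation: the names Acceptor, on_prepare, on_accept, and propose are hypothetical, messages are modeled as direct method calls, proposal numbers are assumed to be unique non-negative integers, and nothing is persisted.

```python
from dataclasses import dataclass
from typing import Any, List, Optional, Tuple


@dataclass
class Acceptor:
    promised: int = -1        # highest prepare number responded to so far
    accepted_n: int = -1      # number of the highest accepted proposal (-1: none)
    accepted_v: Any = None    # value of that proposal, if any

    def on_prepare(self, n: int) -> Optional[Tuple[int, Any]]:
        """Phase 1(B): promise and report the accepted proposal, or None to reject."""
        if n <= self.promised:
            return None
        self.promised = n
        return (self.accepted_n, self.accepted_v)

    def on_accept(self, n: int, v: Any) -> bool:
        """Phase 2(B): accept unless a higher-numbered prepare was answered (P1a)."""
        if n < self.promised:
            return False
        self.promised = n
        self.accepted_n, self.accepted_v = n, v
        return True


def propose(acceptors: List[Acceptor], n: int, my_value: Any) -> Optional[Any]:
    """Run one prepare/accept round; return the value if a majority accepted it."""
    majority = len(acceptors) // 2 + 1

    # Phase 1(A): this sketch contacts every acceptor; a majority would suffice.
    replies = (a.on_prepare(n) for a in acceptors)
    promises = [r for r in replies if r is not None]
    if len(promises) < majority:
        return None

    # Phase 2(A): adopt the value of the highest-numbered accepted proposal,
    # or use our own value if no acceptor reported one.
    best_n, best_v = max(promises, key=lambda p: p[0])
    value = best_v if best_n >= 0 else my_value

    acks = sum(a.on_accept(n, value) for a in acceptors)
    return value if acks >= majority else None
```

With three acceptors, propose(acceptors, 1, "x") gets "x" chosen. A later propose(acceptors, 2, "y") also returns "x", because the second proposer must adopt the value it learns in the prepare phase.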

A diagram of the overall process illustrates this (picture from http://coderxy.com/archives/121).

The algorithm process looks very simple, but understanding how the algorithm was arrived at is difficult. On closer inspection, the algorithm raises further questions:

On the numbering problem: unique number

One important factor in Paxos running correctly is the proposal number, which must be comparable, that is, ordered. With a single proposer this is easy. But what if several proposers propose at the same time? Lamport does not address this problem and only requires that the numbers be totally ordered, but an implementation must deal with it. The problem looks simple but is a little tricky, because it is itself a distributed problem.

 

The following method is given in Google's Chubby paper:

Assume that there are n proposers, each assigned an id i_r (0 <= i_r < n). Any proposal number s that a proposer picks must be greater than the largest number it knows of, and must satisfy: s % n = i_r, that is, s = m * n + i_r

 

The largest number a proposer knows of comes from two sources: the number it obtains by incrementing its own number, and the number it learns when an acceptor rejects its proposal.

Take three proposers P1, P2, and P3 as an example. Starting with m = 0, their numbers are 0, 1, and 2 respectively.

When P1 submits, it finds that P2 has already submitted. P2's number 1 is greater than P1's 0, so P1 recalculates its number: new P1 = 1*3 + 0 = 3

P3 submits with number 2 and finds that it is smaller than P1's 3. Therefore, P3 renumbers: new P3 = 1*3 + 2 = 5
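
The numbering rule can be sketched as a small Python function. The name next_proposal_number and its parameters are hypothetical; the function assumes non-negative proposal numbers and uses -1 to mean "no number known yet".

```python
def next_proposal_number(known_max: int, n: int, i_r: int) -> int:
    """Return the smallest s > known_max with s % n == i_r.

    known_max: largest proposal number this proposer knows of (-1 if none)
    n:         total number of proposers
    i_r:       this proposer's id, 0 <= i_r < n
    """
    if not 0 <= i_r < n:
        raise ValueError("proposer id must satisfy 0 <= i_r < n")
    s = known_max + 1
    while s % n != i_r:   # at most n - 1 steps
        s += 1
    return s
```

For the three-proposer example, a proposer with id 0 that has seen number 1 gets next_proposal_number(1, 3, 0) == 3, and a proposer with id 2 that has seen number 3 gets next_proposal_number(3, 3, 2) == 5. Because each proposer only ever produces numbers congruent to its own id, two proposers can never pick the same number.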

 

The whole Paxos algorithm essentially revolves around the proposal number: proposers are busy picking larger numbers to submit proposals with, and acceptors check whether the number of a submitted proposal is the largest they have seen. Once the number is determined, the corresponding value is determined too. In Paxos, therefore, nothing is more important than the proposal number.

 

Live lock

When a proposal submitted by a proposer is rejected, the reason may be that the acceptor has already promised a higher-numbered proposal, so the proposer increases its number and submits again. If two proposers each keep finding that their number is too low and respond by proposing a higher one, the result is an endless loop, known as a livelock.
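
The following Python sketch replays such a duel against a single acceptor. It is a deliberately simplified model under stated assumptions: the function simulate and its schedule parameter are hypothetical, only the acceptor's promised number is tracked, and each proposer picks numbers congruent to its own id.

```python
from typing import Dict, List, Optional, Tuple


def simulate(schedule: List[int], n_proposers: int = 2) -> Tuple[Optional[int], int]:
    """Replay proposer turns against one acceptor.

    On its turn a proposer first sends the accept for its outstanding
    prepare (if any). If that accept is rejected, or it has no outstanding
    prepare, it sends a prepare with a higher number.
    Returns (number of the accepted proposal or None, count of rejected accepts).
    """
    promised = -1                        # highest number the acceptor has promised
    outstanding: Dict[int, int] = {}     # proposer id -> number it last prepared
    rejected = 0
    for p in schedule:
        n = outstanding.get(p)
        if n is not None:
            if n >= promised:            # no higher prepare arrived in between
                return n, rejected
            rejected += 1                # overtaken by the other proposer
        n = promised + 1                 # pick a higher number owned by p
        while n % n_proposers != p:
            n += 1
        promised = n
        outstanding[p] = n
    return None, rejected
```

When the two proposers strictly alternate, as in simulate([0, 1] * 5), every accept is overtaken by the other proposer's prepare and nothing is ever accepted, however long the schedule runs. When only one proposer takes turns, as in simulate([0, 0]), its first accept succeeds.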

 

Leader Election

 

In theory, the livelock problem exists. The solution Lamport gives is to elect one proposer as the leader and have all proposals submitted by the leader. When the leader goes down, another leader is elected immediately.

The leader solves the problem because it controls the pace of submission. If the previous proposal has no result yet, the next proposal waits rather than rushing to raise its number and resubmit. This effectively converts a distributed problem into a single-point problem, and the robustness of that single point is ensured by the election mechanism.

 

The problem seems to be getting more and more complicated, because a leader election algorithm is now required as well. In Fast Paxos, however, Lamport treats this problem as relatively simple: because a failed leader election does not affect the system, he does not discuss it. He does note elsewhere that the result of Fischer, Lynch, and Paterson implies that a reliable election algorithm must use randomness or timeouts (leases).

 

Paxos is itself an election algorithm, so can Paxos be used to elect the leader? Electing a leader is itself a case of choosing a proposal, so would Paxos be used recursively when the leader is elected? A simplified version of Paxos called PaxosLease can be used for leader election, as in systems such as Keyspace, libpaxos, ZooKeeper, and Google Chubby. PaxosLease will be discussed in detail later.

Although Lamport mentions randomness and timeout mechanisms, I personally think PaxosLease is a more robust and elegant approach.

 

Puzzles brought by leader

The leader solves the livelock problem, but it raises a question:

Now that there is a leader, the leader only needs to maintain a queue and all proposals can be numbered globally. Apart from the fact that the leader can be elected, this is very similar to the single-point MQ mentioned in paxos algorithm 1.

Does that mean that choosing one of several MQs as the master is equivalent to implementing the Paxos algorithm? MQs already support a master-master mode. Is Paxos just a dual-master mode?

 

As far as numbering goes, this is true: as long as a single master is elected to receive all proposals, the numbering problem is solved without going through the acceptors. However, Paxos requires that whatever error occurs, a value can be chosen in each election and learned by the learners. For example, the leader, the acceptors, and the learners may all go down and later "wake up", and the algorithm must remain correct throughout.

If there is only one master, the election result cannot be learned by the learners when that machine goes down. In other words, the leader election mechanism is mainly about ensuring the correctness of the algorithm under exceptional circumstances; Paxos is not master-master.

Here the "learner" role is mentioned for the first time. After a value is chosen, the learner's job is to learn the final resolution. Learning is also part of the algorithm, so it too must be correct under all circumstances. Much of the remaining work centers on the learner.

 

Paxos and two-phase commit

Google once said that other distributed algorithms are simplified forms of Paxos.

Assume that the leader submits only one proposal to the acceptors:

  • Send prepare to a majority of acceptors
  • Receive responses from a majority
  • Send accept, with the value to be approved, to a majority

This is in fact a two-phase commit. The whole Paxos algorithm can be seen as multiple two-phase commits that execute in interleaved fashion and affect one another.

 

How to select multiple values

The Paxos algorithm is described as a "single election" process. As mentioned above, in practice Paxos runs round after round, and each round has a dedicated name: instance. Each instance chooses exactly one value.

 

Within an instance, a proposal may have to be submitted several times before the acceptors approve it. Generally, if the acceptors do not accept it, the proposer increases the number and submits again. If the acceptors have not yet chosen a value (that is, no value has been accepted by a majority), the proposer may submit any value; otherwise it must submit the value already chosen, as described in P2c.

 

Another point worth raising is that in the prepare phase only the proposal number is submitted, and the value to submit is determined afterwards. That is, the value and the number are submitted separately, which is a little different from how we would usually think about it.

 

3. Learning resolutions

After a resolution is finally chosen, the most important thing is to let the learners learn it. Once a learner has learned the resolution, it decides how to handle it.

The first problem in learning is how a learner knows that a resolution has been chosen. The simple approach is for every acceptor that accepts a proposal to inform every learner, but this generates a lot of traffic. A simple optimization is to inform only one learner and have that learner notify the others. This reduces traffic, but the drawback is just as obvious: it creates a single point of failure. A compromise is to inform a small set of learners, at the cost of introducing a distributed problem among the learners.

In any case, each acceptor must send its acceptance to the learners. Otherwise a learner cannot know whether a value is the final resolution. The optimization problem thus reduces to whether to inform one learner or several.
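
As a rough illustration of how a learner can tell that a value has been chosen, the Python sketch below counts acceptance messages per proposal number and reports a value once a majority of acceptors has accepted the same proposal. The class name Learner, the method on_accepted, and the message fields are hypothetical, and the sketch covers a single instance only.

```python
from collections import defaultdict
from typing import Any, Dict, Optional, Set


class Learner:
    def __init__(self, n_acceptors: int) -> None:
        self.majority = n_acceptors // 2 + 1
        # proposal number -> ids of the acceptors that accepted it
        self.votes: Dict[int, Set[str]] = defaultdict(set)
        # None doubles as "nothing chosen yet", so None is not a usable value here
        self.chosen: Optional[Any] = None

    def on_accepted(self, acceptor_id: str, n: int, value: Any) -> Optional[Any]:
        """Record that acceptor_id accepted proposal (n, value).

        Returns the chosen value once some proposal number has been
        accepted by a majority, otherwise None.
        """
        self.votes[n].add(acceptor_id)   # a set ignores duplicate messages
        if self.chosen is None and len(self.votes[n]) >= self.majority:
            self.chosen = value
        return self.chosen
```

Votes are counted per proposal number rather than per value, so acceptances of two different proposals are never added together, even if they happen to carry the same value.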

 

Can a leader be elected among the learners, like the leader among the proposers? Because each acceptor has persistent storage, this can be done, but it makes the system more and more complex. This issue will be discussed in detail later.

When learners learn resolutions, another important problem is learning them in order. The election algorithm above spends a lot of effort on globally numbering all proposals so that they can be applied in order. However, resolutions may arrive at a learner out of order. A learner may receive resolution 10 before resolution 9 has arrived. It must then either wait for resolution 9 to arrive, or actively ask the acceptors for resolutions 9 and 10.
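
One possible way to handle out-of-order arrival is to buffer resolutions until the gap before them is closed. The Python sketch below assumes consecutive integer instance ids; the class OrderedLog and its methods are hypothetical names used only for illustration.

```python
from typing import Any, Dict, List


class OrderedLog:
    def __init__(self, first_id: int = 1) -> None:
        self.next_id = first_id              # next resolution to apply
        self.pending: Dict[int, Any] = {}    # arrived, but an earlier one is missing
        self.applied: List[Any] = []         # resolutions applied so far, in order

    def on_resolution(self, instance_id: int, value: Any) -> List[Any]:
        """Buffer a resolution and apply every one that is now in order."""
        if instance_id >= self.next_id:      # ignore already-applied duplicates
            self.pending[instance_id] = value
        newly_applied = []
        while self.next_id in self.pending:
            newly_applied.append(self.pending.pop(self.next_id))
            self.next_id += 1
        self.applied.extend(newly_applied)
        return newly_applied

    def missing(self) -> List[int]:
        """Instance ids that could be requested from the acceptors to close gaps."""
        if not self.pending:
            return []
        return [i for i in range(self.next_id, max(self.pending))
                if i not in self.pending]
```

If a log created with first_id=9 receives resolution 10 first, nothing is applied and missing() reports [9]. Once resolution 9 arrives, both are applied in order.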

 

4. Exceptions and persistent storage

Many exceptions may occur while the algorithm runs: a proposer going down, an acceptor going down after receiving a proposal, a proposer going down after receiving a message, an acceptor going down after accepting, a learner going down, and many other errors such as storage failure.

Whatever the error, the Paxos algorithm must remain correct. This requires proposers, acceptors, and learners to use persistent storage so that a server can still participate correctly in Paxos after it wakes up.

  • The proposer stores the highest proposal number it has submitted and the resolution number (instance id)
  • The acceptor stores the highest number it has promised, the highest number it has accepted together with its value, and the resolution number
  • The learner stores the resolutions it has learned and their numbers
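
As one possible illustration of the acceptor's storage requirement, the Python sketch below writes the state to a temporary file, forces it to disk, and then renames it over the previous file, so that a crash leaves either the old or the new state. The function names, the JSON layout, and the field names are assumptions made for this example; a real system would also need to save the state before replying to any message.

```python
import json
import os
import tempfile


def save_state(path: str, state: dict) -> None:
    """Write state to disk so that a crash never leaves a half-written file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
        f.flush()
        os.fsync(f.fileno())     # force the bytes to disk before renaming
    os.replace(tmp, path)        # rename over the old file (atomic on POSIX)


def load_state(path: str) -> dict:
    """Return the stored state, or a fresh state if nothing was saved yet."""
    try:
        with open(path) as f:
            return json.load(f)
    except FileNotFoundError:
        # Hypothetical field names: instance id, promised number,
        # accepted number and accepted value.
        return {"instance": 0, "promised": -1, "accepted_n": -1, "accepted_v": None}
```

After a restart, an acceptor would call load_state and continue from the promised and accepted numbers it finds there, which is what allows it to keep its earlier promises.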

The above is a rough introduction to the Paxos algorithm. The goal is a general understanding of it: what problems the algorithm solves, what its role is, how it came about, how it executes, what its core is, and what fault tolerance it requires.

However, it is difficult to turn the description above into an executable program, because many problems remain to be solved:

  • The leader election algorithm
  • What is the impact on the system when the leader is down but a new leader has not yet been elected?
  • Whether the correctness of the algorithm can be ensured when more errors occur in interleaved fashion
  • How learners learn, and how a learner determines when a resolution has arrived
  • Where are the instance number and the proposal number maintained?
  • Performance

The questions pile up like snowflakes. Implementation can be discussed only after these questions have been solved one by one. Of course, the most important question is this: the Paxos algorithm has been proved correct, but how can a program be proved correct?
