Source: Internet
Author: User

-- Reposted from: {Old yards ' column}

Paxos is as famous for being hard to understand as it is for being widely used. In my experience, the difficulty is not that the algorithm is so advanced that one's IQ falls short, but that Lamport's presentation is obscure and lacks a complete application scenario. Had the master presented the algorithm in a different order, it might have been easier to accept:

- First, present a scenario for the algorithm, with a case that most readers can understand.
- Next, describe how the Paxos algorithm solves the problem.
- Finally, give the origin of the algorithm (that is, the Greek city-state metaphor and the algorithm's process).

Lamport presented the origin of the algorithm first, without any supporting scenario, and this has left many readers stuck in the mire: full of doubts, they cannot go on to the specific content of the algorithm, let alone grasp its essence. This article re-describes the Paxos algorithm using a different way of presenting it.

All of our descriptions assume that readers are already familiar with Lamport's paxos-simple, so the various concepts are not explained again.

Besides Lamport's papers, a relatively concise Chinese description of the Paxos algorithm is http://zh.wikipedia.org/zh-cn/Paxos%E7%AE%97%E6%B3%95. Its translation is faithful for the most part, but it is ambiguous on some key details and misreads the original in places, which may leave readers more confused about Paxos. Still, reading it is a quick way to get a general understanding of the algorithm.

**1. Application Scenarios**

(1) Consistency in distributed systems

The Paxos algorithm mainly solves the consistency problem. "Consistency" is interpreted differently in different scenarios:

- NoSQL field: consistency emphasizes that a read returns the latest write, that is, read-write consistency.

- Database field: consistency emphasizes that all data ends up in a consistent state. If a transaction succeeds, every table reflects the SQL operations in the transaction: what should be modified is modified, what should be added is added, what should be deleted is deleted, and nothing else is modified or deleted. If the transaction fails, all data remains in its initial state.

- State machine: consistency emphasizes that state machines starting from the same initial state must end in the same state after executing the same sequence of commands, that is, sequential consistency. The consistency in the Paxos algorithm refers to this case, and we discuss this scenario further below.
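The state machine case can be illustrated with a minimal sketch. The command names and the key-value state below are hypothetical and chosen only for illustration; the point is that replicas agree exactly when they apply the same commands in the same order.

```python
from typing import Callable, Dict, List

# A command is a function that mutates a key-value state in place.
Command = Callable[[Dict[str, int]], None]

def set_x(state: Dict[str, int]) -> None:
    state["x"] = 1

def double_x(state: Dict[str, int]) -> None:
    state["x"] = state.get("x", 0) * 2

def add_three(state: Dict[str, int]) -> None:
    state["x"] = state.get("x", 0) + 3

def run_replica(commands: List[Command]) -> Dict[str, int]:
    """Apply the commands in order, starting from the same empty state."""
    state: Dict[str, int] = {}
    for command in commands:
        command(state)
    return state

log = [set_x, double_x, add_three]
replica_a = run_replica(log)                            # same order
replica_b = run_replica(log)                            # same order
reordered = run_replica([set_x, add_three, double_x])   # different order
```

Here `replica_a` and `replica_b` end in the same state, while `reordered` diverges even though it executed the same set of commands.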

(2) MQ

Suppose the log messages of all systems are written to one MQ server, which then sends each log entry asynchronously to multiple log servers to be written to files (multiple log servers are used to back up the log files and prevent data loss). The data on all log servers must be consistent: the log content and order must be exactly the same. Because MQ itself orders its entries, the order in which data enters the queue amounts to a globally unique number. No matter how many files the data is written to, as long as each file is written in number order, the contents of every file are bound to be consistent. But a single MQ server is obviously a single point of failure: if it goes down, the availability of the entire system suffers.
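As a rough sketch of why a single queue is enough for consistency, the hypothetical `SingleQueue` below stamps each entry with the next number, and each log server orders what it received by that number. The class and function names are assumptions made for this example, not part of any real MQ product.

```python
import random
from typing import List, Tuple

class SingleQueue:
    """Hypothetical single MQ server: stamps each entry with the next number."""

    def __init__(self) -> None:
        self._next = 0

    def publish(self, entry: str) -> Tuple[int, str]:
        numbered = (self._next, entry)
        self._next += 1
        return numbered

def write_log(deliveries: List[Tuple[int, str]]) -> List[str]:
    """A log server orders whatever it received by sequence number."""
    return [entry for _, entry in sorted(deliveries)]

queue = SingleQueue()
numbered = [queue.publish(entry) for entry in ["login", "pay", "logout"]]

# Asynchronous delivery may reach each log server in a different order.
server_1 = random.sample(numbered, len(numbered))
server_2 = random.sample(numbered, len(numbered))
file_1 = write_log(server_1)
file_2 = write_log(server_2)
```

Whatever order the shuffled deliveries arrive in, both files come out identical, because the number assigned by the one queue fixes the order.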

(3) Multiple MQ

To remove the MQ single point of failure, the obvious option is to use multiple MQ servers, or even an MQ cluster. Clients can then access any MQ server, and different clients may access different ones, so the content and order of the data on different MQ servers may be inconsistent. If this is not resolved, each MQ server writes different content to the log servers, which is obviously not the result we expect.

(4) Data updates in NoSQL

NoSQL systems generally guarantee availability through data replication. But when clients operate on data, operations on the same item may arrive at one server or at several, for example: insert, update A, update B, ..., update N. After an insert followed by successive updates, the replica servers must perform the same sequence. If the thread pool, the network, server resources, or other causes make a replica receive the updates in a different order, the replicated data loses its meaning; in a field such as finance the consequences can be serious.

These inconsistency problems are what the Paxos algorithm sets out to solve. Paxos is not the only solution, of course, and such problems were being solved before Paxos appeared: for example, a dual-master MQ setup addresses the MQ single point of failure, and a master server addresses NoSQL replication. But these workarounds have flaws: they are either hard to scale horizontally or they hurt availability. Other algorithms besides Paxos also try to solve such problems, for example the viewstamped replication algorithm.

What the scenarios above have in common is the wish that the state of multiple servers stay the same, that is, consistency. With that in mind, look again at how the Chinese wiki begins:

In a distributed database system, if the initial state of every node is the same and every node executes the same sequence of operations, then all nodes end up in a consistent state. To ensure that every node executes the same sequence of commands, a "consistency algorithm" is executed for each instruction to ensure that the instruction seen by every node is the same.

You may now have a deeper understanding of this description.

**2. How Paxos solves this kind of problem**

Paxos approaches this type of problem by trying to assign global numbers to the data on all servers. If the numbering succeeds, all operations are executed in number order and consistency follows automatically. When a server in the cluster receives some data, how is it numbered? By voting: all servers vote on which data on which server should be ranked first, which second, and so on. As soon as a majority of servers agree on which data goes in a given position, that position is settled.

Obviously, to give each piece of data a unique number, each vote may produce only one piece of data; otherwise the vote is meaningless. All of Paxos's effort goes into making a single vote produce only one piece of data. From here on we call the data being voted on the value. The core and essence of the Paxos algorithm is to ensure that each vote produces only one value.
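The "one vote, at most one value" requirement can be sketched with a naive majority tally. This is not the Paxos protocol itself, only an illustration of the majority rule; the server names, ballots, and the `tally` helper are hypothetical.

```python
from collections import Counter
from typing import Dict, Optional

def tally(votes: Dict[str, str], cluster_size: int) -> Optional[str]:
    """Return the value backed by a strict majority of the cluster, else None."""
    if not votes:
        return None
    value, count = Counter(votes.values()).most_common(1)[0]
    return value if count > cluster_size // 2 else None

# Hypothetical ballots: server name -> the data it wants in this position.
slot_1 = tally({"s1": "update A", "s2": "update A", "s3": "insert"}, cluster_size=3)
slot_2 = tally({"s1": "insert", "s2": "update B", "s3": "update A"}, cluster_size=3)
```

Because two different values cannot both hold a strict majority, a vote yields at most one value. The second ballot shows the other side of the coin: a naive vote may yield no value at all, which is the gap the rest of the algorithm has to close.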

**3. Paxos algorithm**

We supplement the concepts of the original paper:

- **promise**: an acceptor's commitment to a proposer: if no proposal with a larger number arrives, it will accept the proposal this proposer submits.
- **accept**: the acceptor has not seen a proposal with a larger number than the current one, so it approves this proposal.
- **chosen**: when a majority of acceptors accept a proposal, that proposal is the final choice, also called the resolution.

In other words, an acceptor has two actions toward a proposer: promise and accept.
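The two acceptor actions can be sketched as a small class. The method and field names (`on_prepare`, `on_accept`, `promised_n`, `accepted`) are assumptions made for this sketch and do not come from the original paper.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class Acceptor:
    promised_n: int = -1                         # highest number promised so far
    accepted: Optional[Tuple[int, str]] = None   # (number, value) last accepted

    def on_prepare(self, n: int) -> Tuple[bool, Optional[Tuple[int, str]]]:
        """Promise: from now on, refuse proposals numbered lower than n."""
        if n > self.promised_n:
            self.promised_n = n
            return True, self.accepted
        return False, None

    def on_accept(self, n: int, value: str) -> bool:
        """Accept unless a higher-numbered proposal has already been promised."""
        if n >= self.promised_n:
            self.promised_n = n
            self.accepted = (n, value)
            return True
        return False

acceptor = Acceptor()
first_promise = acceptor.on_prepare(1)       # nothing accepted yet
first_accept = acceptor.on_accept(1, "v")    # no larger number seen
second_promise = acceptor.on_prepare(2)      # reports what was accepted
stale_accept = acceptor.on_accept(1, "w")    # number 1 is now too small
```

A proposal counts as chosen only when a majority of such acceptors have accepted it; that counting is left out of this sketch.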

The explanations below mainly revolve around "only a single value is chosen". First look at condition P1:

P1: An acceptor must accept the first proposal that it receives.

At first glance this condition is obvious: before any value has been chosen, an acceptor should accept the first proposal. On reflection, though, P1 feels far from rigorous. Is it an informal description of the problem or a strict mathematical requirement? These doubts boil down to two questions:

(1) What is this condition essentially guaranteeing?

(2) What about a second proposal?

In the later steps of the algorithm, whether an acceptor approves a value has nothing to do with whether the proposal is the first one it receives; it depends on the proposal number. Doesn't that mean P1 is not guaranteed? At first I could not make sense of this either. After discussing it with friends, I found that "accept" in P1 actually refers to the acceptor's "promise" to the proposer: there is an ambiguity between the prose description and the step-by-step description of the algorithm. This is why I think the algorithm should be stated in mathematical notation rather than in natural language.

So P1 emphasizes that the first proposal is promised, but it says nothing about the second, which remains an open question.

It is also clear that P1 alone cannot make Paxos work, because a majority may never form. The discussion should therefore go on to make up for the shortcomings of P1, and we expect the later conditions to explain:

- How to solve the problem that P1 may fail to form a majority
- How to handle a second proposal
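The first shortcoming can be seen in a small sketch in which every acceptor follows P1 and nothing else. The acceptor names, arrival orders, and the `p1_only` helper are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Optional

def p1_only(arrival_order: Dict[str, List[str]]) -> Optional[str]:
    """Each acceptor accepts only the first value it receives (P1 alone)."""
    accepted = {acceptor: values[0] for acceptor, values in arrival_order.items()}
    value, count = Counter(accepted.values()).most_common(1)[0]
    return value if count > len(arrival_order) // 2 else None

# Three proposers, and each acceptor happens to hear a different one first.
split = p1_only({"a1": ["x", "y", "z"], "a2": ["y", "z", "x"], "a3": ["z", "x", "y"]})

# Two of the three acceptors happen to hear "x" first.
lucky = p1_only({"a1": ["x", "y"], "a2": ["x", "y"], "a3": ["y", "x"]})
```

In the split case every value has exactly one supporter, so no majority forms and nothing is chosen. With P1 alone, whether a value is chosen depends on the luck of message arrival.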

So constraint P2 appears:

P2: If a proposal with value v is chosen, then every higher-numbered proposal that is chosen has value v.

The appearance of P2 is surprising. P2 does not continue down the road of P1, nor does it address the two gaps in P1 listed above; it approaches the question of how to guarantee that only one value is chosen from the other side. P1 discusses how to choose; P2 discusses that, once a value has been chosen, later choices must be the same. In other words, P1 is still about making the choice while P2 assumes the choice has been made. There is a gap in between: how the choice is made is not discussed.

In fact, from the way Lamport keeps strengthening P2 later on, one can see that P2 contains P1 (through the proposal number: the first time there is no earlier number, so the proposal is chosen), and that the strengthened P2 gives the concrete process of how to choose. In hindsight, P1 says how to choose the first time and P2 covers all the choices, so the two conditions overlap somewhat. It is therefore inaccurate to treat P1 and P2 as two independent conditions, and the Chinese wiki's statement that "if P1 and P2 are guaranteed, then constraint 2 can be guaranteed" subtly affects the reader's understanding.

This is not to say that P1 is useless. Rather, P2 is the unknown problem and P1 is the known part of it. From a contract point of view, P1 is an invariant: no strengthening of P2 may go so far that it violates this invariant. P1 is the bottom line for strengthening P2.

Are there other invariants that must be respected? Has some unknown invariant been broken in the process of strengthening P2? These hard questions concern the correctness of the Paxos algorithm, and the rigorous mathematical proofs from MIT are beyond the scope of this article.

In addition, the Chinese wiki describes P2 as: "P2: Once a value is chosen, any value chosen afterwards must be the same as this value." The original uses "higher-numbered" to describe later proposals in terms of proposal numbers, whereas the Chinese uses "

Setting P1 aside for the moment and looking closely at P2: to ensure that only one value is chosen each time, P2 requires that, once a value has been chosen, any value approved for a later proposer's submission must match it. That is, once a value has really been chosen, later proposers cannot submit a different value and disturb the earlier result. This is only a general description, but if it can be realized, the Paxos algorithm is guaranteed, which is why P2 is also called the "safety property".

The discussion that follows takes "If a proposal with value v is chosen" as given and asks how to guarantee "then every higher-numbered proposal that is chosen has value v". How "a proposal with value v is chosen" comes about is set aside for now.

P2 states, at the level of ideas, how to solve the problem, but realizing it takes many detailed steps. Lamport implements P2 by strengthening it step by step, mainly as follows:

- Requirement on the overall result (P2)
- Requirement on acceptors (P2a)
- Requirement on proposers (P2b)
- Requirement on both acceptors and proposers (P2c)

It is not clear how Lamport came to divide the process so cleanly. Judging from his published papers, he has deep and long-standing expertise in distributed systems, and a result like this owes much to that foundation and to great effort behind the scenes. For us, though, only the result of that process is visible, so we always feel that we know what but not why.

We continue along the line of thought:

P2a: If a proposal with value v is chosen, then every higher-numbered proposal accepted by any acceptor has value v.

This condition restricts acceptors. Obviously, if P2a is satisfied then P2 is satisfied. But this strengthening breaks the bottom line of the P1 invariant (see the original paper for details), so P2a is not workable by itself, and the strengthening turns to the proposer side instead.

P2b: If a proposal with value v is chosen, then every higher-numbered proposal issued by any proposer has value v.

This condition restricts proposers. If proposers can be restricted, the restriction on acceptors is of course satisfied as well. At the same time, because proposers are restricted to submitting value v, P1 is also kept (the first proposal an acceptor receives is certain to have value v).

However, P2b is hard to implement directly: multiple proposers can submit proposals with arbitrary values, and there is no direct way to stop a proposer from submitting a particular value. So an equivalent condition for P2b is needed:

P2c: For any v and n, if a proposal with value v and number n is issued, then there is a set S consisting of a majority of acceptors such that either (a) no acceptor in S has accepted any proposal numbered less than n, or (b) v is the value of the highest-numbered proposal among all proposals numbered less than n accepted by the acceptors in S.
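To make the two cases of P2c concrete, here is a minimal sketch that checks the condition for one candidate set S. The `accepted` bookkeeping, the acceptor names, and the function `p2c_allows` are hypothetical; a real acceptor only reports its own state in reply to a proposer.

```python
from typing import Dict, Set, Tuple

# accepted[a] = set of (number, value) proposals that acceptor a has accepted
Accepted = Dict[str, Set[Tuple[int, str]]]

def p2c_allows(v: str, n: int, s: Set[str], accepted: Accepted, total: int) -> bool:
    """Check P2c for a proposal (n, v) against one candidate set S."""
    if len(s) <= total // 2:             # S must be a majority of acceptors
        return False
    lower = [p for a in s for p in accepted.get(a, set()) if p[0] < n]
    if not lower:                        # case (a): nothing accepted below n
        return True
    highest = max(lower, key=lambda p: p[0])
    return highest[1] == v               # case (b): value of the highest-numbered one

# Hypothetical state: neither proposal has been accepted by a majority.
state: Accepted = {"a1": {(1, "x")}, "a2": {(2, "y")}, "a3": set()}

via_a1_a2_y = p2c_allows("y", 3, {"a1", "a2"}, state, total=3)
via_a1_a2_x = p2c_allows("x", 3, {"a1", "a2"}, state, total=3)
via_a1_a3_x = p2c_allows("x", 3, {"a1", "a3"}, state, total=3)
minority = p2c_allows("x", 3, {"a3"}, state, total=3)
case_a = p2c_allows("z", 1, {"a2", "a3"}, state, total=3)
```

In this state no value has been chosen yet, so which value is permitted for number 3 depends on which majority S the proposer happened to hear from.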

According to the original paper, P2c implies P2b, but deriving P2b from P2c is the hardest part to understand.

First, understand what P2c is for. Because P2b is hard to implement directly, P2c is there to solve the problem P2b poses, namely "if value v is chosen, every higher-numbered proposal has value v". That is:

- R: "For any v and n, if a proposal with value v and number n is issued" is the result, and
- C: "then there is a set S consisting ..." is the condition

What has to be shown is that if C holds, then the result R holds. The original wording, however, reads "if R holds, then condition C exists", which makes it easy to confuse cause and effect. Once again, if mathematical symbols had been used, much of this ambiguity would disappear.

What P2c does is this: instead of trying to satisfy P2b directly, it looks for a sufficient condition for P2b; if the sufficient condition can be met, P2b obviously follows. One point needs emphasis. A proposer can submit any value, so how can it be forced to submit value v? In fact, "for any v and n, if a proposal with value v and number n is issued" in the original means "if proposal number n submits value v, **and value v can be accepted by acceptors**". To be accepted, a proposal cannot carry just any value; it must carry a restricted one, and the premise of this whole discussion is that value v is to be accepted. Now let us check whether the result R holds when condition C is satisfied.

(a) No acceptor in S has accepted any proposal numbered less than n

If this condition holds, then n is the first proposal seen in S and, by P1, it must be accepted, so the result R holds.

(b) v is the value of the highest-numbered proposal among all proposals numbered less than n accepted by the acceptors in S

The proof first assumes that the proposal numbered n is chosen with value x. Then there must be a set C in which every acceptor accepts value x, while every acceptor in the set S has accepted value v. Because S and C are both majorities, they share a common member u that has accepted both x and v. To guarantee that the choice is unique, it must be that x = v.
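The step "S and C are both majorities, so they share a member" can be checked exhaustively for a small cluster. The sketch below assumes five hypothetically named acceptors; the general argument is simply that two sets, each larger than half of the acceptors, cannot be disjoint.

```python
from itertools import combinations
from typing import FrozenSet, List

def majorities(acceptors: List[str]) -> List[FrozenSet[str]]:
    """All subsets that contain more than half of the acceptors."""
    need = len(acceptors) // 2 + 1
    return [
        frozenset(combo)
        for size in range(need, len(acceptors) + 1)
        for combo in combinations(acceptors, size)
    ]

quorums = majorities(["a1", "a2", "a3", "a4", "a5"])

# Every pair of majorities has at least one acceptor in common.
all_intersect = all(s & c for s in quorums for c in quorums)
```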

You may find this proof not quite rigorous: between "the highest number less than n" and n there may be many proposals, and those proposals also carry values. Might those values differ from v?

This is where the mathematical induction in the original paper comes in: assume the proposal numbered m has value v; then, by the argument above, the proposal numbered n = m+1 also has value v; carrying the recursion onward, every proposal numbered n > m has value v. In the Chinese wiki, the inductive proof does not need to push forward through m..n-1 and then argue by contradiction for n; mathematical induction alone is fully able to produce the final result.

In other words, P2c is a strengthening of P2b: satisfying P2c satisfies P2b.

Looking closely at P2c, we find that it can be satisfied as long as a proposer, before submitting a proposal, first consults the acceptors to learn the highest-numbered proposal each has accepted and its value v, and then, based on their answers, picks a new number and a value to submit. Through the numbering, conditions (a) and (b) can be handled uniformly.
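A minimal single-process sketch of this "consult first, then submit" procedure follows. It ignores networking, failures, and retries, and the names `Acceptor`, `prepare`, `accept`, and `propose` are assumptions made for the example rather than anything defined in the original paper.

```python
from typing import List, Optional, Tuple

class Acceptor:
    def __init__(self) -> None:
        self.promised_n = -1
        self.accepted: Optional[Tuple[int, str]] = None

    def prepare(self, n: int) -> Tuple[bool, Optional[Tuple[int, str]]]:
        """Promise number n and report the proposal accepted so far, if any."""
        if n > self.promised_n:
            self.promised_n = n
            return True, self.accepted
        return False, None

    def accept(self, n: int, value: str) -> bool:
        if n >= self.promised_n:
            self.promised_n = n
            self.accepted = (n, value)
            return True
        return False

def propose(acceptors: List[Acceptor], n: int, my_value: str) -> Optional[str]:
    """Consult first, then submit; return the chosen value or None."""
    replies = [a.prepare(n) for a in acceptors]
    granted = [acc for ok, acc in replies if ok]
    if len(granted) <= len(acceptors) // 2:
        return None                       # no majority promised
    previous = [acc for acc in granted if acc is not None]
    # Case (b): adopt the value of the highest-numbered accepted proposal.
    # Case (a): nothing accepted yet, so the proposer may use its own value.
    value = max(previous, key=lambda p: p[0])[1] if previous else my_value
    votes = sum(a.accept(n, value) for a in acceptors)
    return value if votes > len(acceptors) // 2 else None

cluster = [Acceptor() for _ in range(3)]
first = propose(cluster, 1, "x")    # nothing accepted before: free choice
second = propose(cluster, 2, "y")   # must adopt the already chosen value
stale = propose(cluster, 1, "z")    # number too small: no promises
```

The second proposer wanted "y" but ends up submitting "x", which is how the consultation step keeps an earlier choice from being overturned.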

In fact, the idea P2c expresses is very simple: if a value v was chosen earlier, submit that value v; otherwise the proposer decides which value to submit. Concretely: consult beforehand, decide during, submit afterwards, all of which can be done in a message-passing model. Lamport expresses this through conditions, sets, and inductive proofs without stating the purpose, which makes it hard to understand. You may have a further doubt: can only one value ever be chosen, from beginning to end? In fact, "chosen" here refers to a single election, not the whole election cycle. Paxos can be run many times, and each run chooses exactly one value.
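The idea of running many independent elections, one per position, can be sketched as follows. `SlotLog` is a hypothetical stand-in that models only the outcome of each election (the first chosen value sticks); it does not implement the Paxos protocol.

```python
from typing import Dict, List

class SlotLog:
    """Hypothetical stand-in: models only the outcome of each election."""

    def __init__(self) -> None:
        self._chosen: Dict[int, str] = {}

    def run_election(self, slot: int, proposed: str) -> str:
        # A real implementation would run a full Paxos round for this slot.
        # Once a slot has a chosen value, later proposals cannot change it.
        return self._chosen.setdefault(slot, proposed)

    def ordered(self) -> List[str]:
        return [self._chosen[slot] for slot in sorted(self._chosen)]

log = SlotLog()
log.run_election(1, "insert")
log.run_election(2, "update A")
late = log.run_election(1, "update B")   # slot 1 is already decided
result = log.ordered()
```

Each slot holds exactly one value, and reading the slots in number order gives every server the same sequence of operations.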

That P2c is what gets satisfied also shows, indirectly, that submitting a correct value v requires restricting proposers and acceptors at the same time; restricting only one side does not solve the problem.

To restate the chain of implications between the conditions: P2c => P2b => P2a => P2. P2c thus ultimately guarantees P2: once a value v is chosen, every higher-numbered proposal that is chosen has value v. P2c not only guarantees the result of P2 but also answers the question of how to choose, through the stages described above, filling the gap between P1 and P2. The two gaps in P1 also seem, intuitively, to be resolved; for the details, see the chapter on the algorithm's process.

**The problems with P1:**

P2c also solves the P1 problem. Because the values proposers submit are constrained by the acceptors, two different values will not be submitted in one election; even if they were, one of them would be rejected because of its proposal number, so a majority can form.
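The rejection by proposal number can be sketched with two competing proposers whose consultation steps interleave. The `Acceptor` class and the `majority` helper are assumptions made for this example; message loss and retries are ignored.

```python
from typing import List, Optional, Tuple

class Acceptor:
    def __init__(self) -> None:
        self.promised_n = -1
        self.accepted: Optional[Tuple[int, str]] = None

    def prepare(self, n: int) -> bool:
        if n > self.promised_n:
            self.promised_n = n
            return True
        return False

    def accept(self, n: int, value: str) -> bool:
        if n >= self.promised_n:
            self.promised_n = n
            self.accepted = (n, value)
            return True
        return False

def majority(results: List[bool]) -> bool:
    return sum(results) > len(results) // 2

cluster = [Acceptor() for _ in range(3)]

# Two proposers consult the acceptors before either one submits.
p1_promised = majority([a.prepare(1) for a in cluster])
p2_promised = majority([a.prepare(2) for a in cluster])

# Proposal 1 is now stale everywhere; only proposal 2 can gather a majority.
p1_accepted = majority([a.accept(1, "x") for a in cluster])
p2_accepted = majority([a.accept(2, "y") for a in cluster])
```

Both proposers obtain promises, but only the higher-numbered proposal is accepted, so the acceptors do not split between "x" and "y".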

The other question, how to handle the second proposal, is now also answered.

Once again: P2 contains P1, and P1 is merely an invariant of the unknown problem P2.

[Repost] Paxos algorithm 1: Algorithm formation theory