Architecture Design: System Storage (24)--Data consistency and Paxos algorithm (Part 2)

Source: Internet
Author: User
Tags: rounds, zookeeper

(Continued from "Architecture Design: System Storage (23)--Data consistency and Paxos algorithm (Part 1)")

2-1-1. Prepare (Preparation) Phase

First, there are several data attributes that need to be persisted on the acceptor role:

    • PrepareVote holds the largest voting-round number that the current acceptor has authorized so far.
    • AcceptedVote holds the voting-round number under which the current acceptor completed the assignment phase.
    • AcceptedValue holds the value assigned to the current acceptor in the assignment phase.
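As a minimal sketch, the three persisted attributes could be grouped as follows. The class name `AcceptorState`, the field names, and the initial values (0 and `None`) are assumptions made for illustration; the article does not prescribe them, and real persistence to disk is left out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AcceptorState:
    """Hypothetical container for the attributes an acceptor persists."""
    prepare_vote: int = 0                  # PrepareVote: largest round number authorized so far
    accepted_vote: int = 0                 # AcceptedVote: round number of the last completed assignment
    accepted_value: Optional[str] = None   # AcceptedValue: value assigned in that round, if any


state = AcceptorState()
print(state)
```

Starting at 0 simply means that any positive round number can be authorized by a fresh acceptor.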

1. In the first phase, the proposer and the acceptors communicate over the network at least once. The main purpose is to determine whether the current voting round for proposal X can be authorized. In other words, the acceptor applies its preparation-phase rule: it checks whether the number of the current round is greater than the PrepareVote it has recorded. The process is simple: the proposer sends every acceptor a request to start a new round of voting on proposal X and waits for each acceptor to respond. A timeout applies: an acceptor that has not responded within this time is treated as having rejected the request. If at least N/2 + 1 nodes fail to respond within the specified time, the whole election system is considered faulty; the operation aborts with an error, and the exception information is returned to the client.

2. On receiving the request to authorize a new round of voting, each acceptor decides whether it can grant the authorization. The decision depends only on PrepareVote: if the requested round number is less than or equal to PrepareVote, the acceptor rejects the request; otherwise it grants the authorization and updates PrepareVote to the new round number.

This implies that any newly authorized round number must be greater than the previous value of PrepareVote.
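The acceptor's first-phase rule can be sketched as below. The class `Acceptor`, the method name `on_prepare`, and the shape of the reply tuple are hypothetical; returning the stored attributes together with the decision is a choice made for this sketch.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Acceptor:
    prepare_vote: int = 0                  # PrepareVote
    accepted_vote: int = 0                 # AcceptedVote
    accepted_value: Optional[str] = None   # AcceptedValue

    def on_prepare(self, vote: int) -> Tuple[bool, int, int, Optional[str]]:
        """Grant the round only if its number is strictly greater than PrepareVote."""
        granted = vote > self.prepare_vote
        if granted:
            self.prepare_vote = vote
        return granted, self.prepare_vote, self.accepted_vote, self.accepted_value


acceptor = Acceptor()
first = acceptor.on_prepare(5)    # granted: 5 > 0
second = acceptor.on_prepare(5)   # rejected: 5 is not greater than 5
third = acceptor.on_prepare(3)    # rejected: 3 < 5
```

A repeated request with the same round number is rejected, which is exactly the "strictly greater" consequence noted above.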

3. The proposer collects the responses of all acceptors and decides the next operation based on the summary. Whether an acceptor authorizes or rejects the round, its response should preferably include the PrepareVote, AcceptedVote, and AcceptedValue it currently records, so that the proposer can analyze why it was rejected. The figure below shows the scenarios the proposer may encounter when summarizing the responses of all acceptors:

[Figure: scenarios the proposer may encounter when summarizing the responses of all acceptors]

There is of course a third scenario: at least N/2 + 1 acceptor nodes return no response within the specified time. In that case the Paxos system is considered to have crashed, so it is not discussed further.

Note that in every scenario, as soon as at least N/2 + 1 acceptor nodes hold the same AcceptedValue, proposal X is considered to have reached a final consistent result, and the whole Paxos process ends.

3.1. In scenario one, at least N/2 + 1 acceptors in the responses received by the proposer allow this round of voting. These acceptors form a set Q, and the next step operates on this set.

What matters is whether any acceptor in set Q already has an AcceptedValue. If no acceptor in Q has a value in its AcceptedValue attribute, the proposer will propose its own value in the next step as the assignment target for every acceptor in Q. If at least one acceptor in Q has a value in its AcceptedValue attribute, the proposer selects the AcceptedValue with the largest AcceptedVote and uses it as the assignment target in the next step.

3.2. In scenario two, fewer than N/2 + 1 acceptors in the responses received by the proposer allow this round of voting, whether because the acceptor's authorization rule was not satisfied or because acceptors timed out. In this scenario, proposal X is considered to have reached a final consistent result, with no further voting needed, only if at least N/2 + 1 acceptors hold the same AcceptedValue.

Otherwise, the proposer increases its own round number and starts a new vote. The rule for increasing the number is discussed later; for now, readers can assume it is simply +1.
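The proposer's first-phase summary described in step 3 can be sketched as a single function. The `Reply` tuple, the function name `summarize_prepare`, and the action strings are hypothetical; a `None` entry models an acceptor that timed out.

```python
from collections import Counter
from typing import List, NamedTuple, Optional, Tuple


class Reply(NamedTuple):
    granted: bool
    accepted_vote: int
    accepted_value: Optional[str]


def summarize_prepare(n: int, replies: List[Optional[Reply]],
                      own_value: str) -> Tuple[str, Optional[str]]:
    """Return (action, value) for a proposer that asked n acceptors."""
    majority = n // 2 + 1
    answered = [r for r in replies if r is not None]
    # Scenario three: too many acceptors gave no feedback in time.
    if len(replies) - len(answered) >= majority:
        return ("crashed", None)
    # In any scenario, a value held by a majority is already the final result.
    counts = Counter(r.accepted_value for r in answered if r.accepted_value is not None)
    for value, count in counts.items():
        if count >= majority:
            return ("decided", value)
    # Scenario two: no authorized majority, so retry with a larger round number.
    q = [r for r in answered if r.granted]
    if len(q) < majority:
        return ("retry", None)
    # Scenario one: pick the AcceptedValue with the largest AcceptedVote in Q,
    # or fall back to the proposer's own value when Q holds no value at all.
    with_value = [r for r in q if r.accepted_value is not None]
    if not with_value:
        return ("assign", own_value)
    return ("assign", max(with_value, key=lambda r: r.accepted_vote).accepted_value)


empty_q = [Reply(True, 0, None)] * 3 + [None, None]
mixed_q = [Reply(True, 2, "X1"), Reply(True, 4, "X3"), Reply(True, 0, None), None, None]
print(summarize_prepare(5, empty_q, "X2"))
print(summarize_prepare(5, mixed_q, "X2"))
```

With five acceptors, the first call proposes the proposer's own value, while the second adopts the value that carries the largest AcceptedVote.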

2-1-2. Accept (Assignment) Phase

Once at least N/2 + 1 acceptor nodes have authorized the round, the proposer can enter the second phase, the assignment phase. The second phase operates on the majority set Q formed in the previous phase:

1. The proposer sends the value determined in step 3.1 of the previous phase, together with its own round number (vote), to each acceptor in set Q, and waits for the replies.

2. On receiving the assignment request, each acceptor applies its decision rule to confirm whether to perform the assignment. The rule has already been stated, but to repeat it: if the received vote is less than the acceptor's current PrepareVote, no assignment is made. Why would PrepareVote on the acceptor have changed? Because in the gap between the proposer's first and second phases, one or more other proposers used a larger vote to initiate a newer round of voting and obtained this acceptor's authorization.

If the received vote equals the acceptor's current PrepareVote, the acceptor sets its AcceptedVote attribute to vote and its AcceptedValue attribute to value.

Note one case: could an acceptor receive, in the second phase, an assignment request whose vote is greater than its current PrepareVote? No. An acceptor only ever replaces PrepareVote with a larger value, so any vote that the acceptor previously authorized must be less than or equal to its current PrepareVote.
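The acceptor's second-phase rule can be sketched as follows. The class `Acceptor` and the method name `on_accept` are hypothetical. The text argues that a vote greater than PrepareVote cannot arrive; this sketch simply rejects it, which is an assumption of the sketch.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Acceptor:
    prepare_vote: int = 0                  # PrepareVote
    accepted_vote: int = 0                 # AcceptedVote
    accepted_value: Optional[str] = None   # AcceptedValue

    def on_accept(self, vote: int, value: str) -> Tuple[bool, int, Optional[str]]:
        """Assign only when the vote matches the round this acceptor last authorized."""
        assigned = vote == self.prepare_vote
        if assigned:
            self.accepted_vote = vote
            self.accepted_value = value
        # The current AcceptedVote and AcceptedValue are returned even on rejection.
        return assigned, self.accepted_vote, self.accepted_value


acceptor = Acceptor(prepare_vote=5)      # round 5 was authorized in the first phase
stale = acceptor.on_accept(3, "X1")      # rejected: a newer round has been authorized
fresh = acceptor.on_accept(5, "X2")      # assigned: vote equals PrepareVote
```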

3. After handling the assignment request, the acceptor returns its AcceptedValue and AcceptedVote attributes to the proposer. In other words, even if the acceptor rejects the second-phase assignment, it still returns its AcceptedValue to the proposer. The following are the scenarios the proposer may encounter when summarizing the results:

3.1. The proposer receives the results from all acceptors in set Q and summarizes them. If all assignment results are the same value, proposal X is considered to have reached a final consistent result. The entire voting process is over, and value is the final agreed value.

3.2. If, after receiving the assignment results of all acceptors in set Q and comparing them, the proposer finds that any assignment result is inconsistent with the others, the assignment operation has failed. The proposer then increases its own round number, goes back to the first phase, and starts voting again.
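The proposer-side summary of the second phase can be sketched like this. The function name `summarize_accept` and the action strings are hypothetical, and the "+1" increment is only the simplification the text asks readers to assume for now.

```python
from typing import List, Optional, Tuple


def summarize_accept(vote: int, results: List[Optional[str]]) -> Tuple[str, int]:
    """results holds the AcceptedValue returned by each acceptor in set Q."""
    agreed = (
        bool(results)
        and results[0] is not None
        and all(v == results[0] for v in results)
    )
    if agreed:
        return ("done", vote)      # step 3.1: the value is final
    return ("retry", vote + 1)     # step 3.2: back to the first phase with a larger number


print(summarize_accept(5, ["X1", "X1", "X1"]))
print(summarize_accept(5, ["X1", "X3", "X1"]))
```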

2-1-3. Analyzing an extreme situation

A core idea of the Paxos algorithm is to form a majority resolution. To achieve this, every acceptor must carry out its authorization and assignment operations strictly according to its two working principles.

That is why, when introducing the Paxos algorithm, we keep mentioning the boundary node count N/2 + 1.

Once majorities are formed, the key vote that ultimately produces agreement falls on a single node. Suppose the voting rounds for X1 and X2 have both been authorized by a majority of acceptors. The set of acceptors that authorized X1 and the set of acceptors that authorized X2 must then intersect in at least one acceptor node, and the core of the contest is to win that node. Fortunately, the round number used for X1 in the assignment phase and the round number used for X2 are always different, so one is always larger than the other, and the acceptor's working principle means it only accepts the assignment from the round with the larger number.
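The claim that any two majorities share at least one acceptor can be checked by brute force for small N. This is only an illustrative check with hypothetical function names, not part of the algorithm itself.

```python
from itertools import combinations
from typing import List, Set


def majorities(n: int) -> List[Set[int]]:
    """All acceptor subsets of the minimum majority size N/2 + 1."""
    return [set(c) for c in combinations(range(1, n + 1), n // 2 + 1)]


def all_majorities_intersect(n: int) -> bool:
    sets = majorities(n)
    return all(a & b for a in sets for b in sets)


print(all_majorities_intersect(5), all_majorities_intersect(6))
```

The reason is simple counting: two sets of size N/2 + 1 have more than N members in total, so they cannot be disjoint.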

This principle extends to any number of proposers. Some readers will then ask: could there be a special case in which the AcceptedValue attribute of every acceptor has been assigned, yet no value has reached a majority?

The answer is no. In this section we use proof by contradiction to show why. First, for three nodes to hold X1 while Proposer1 cannot continue assigning, the only possibility is that Proposer1 is in its second phase: when it assigned to Acceptor1~Acceptor3, PrepareVote on those three nodes equaled Proposer1's vote, but when it went on to assign to the remaining three nodes, it found that PrepareVote on Acceptor4~Acceptor6 had changed.

[Figure: Acceptor1~Acceptor3 hold X1, while PrepareVote on Acceptor4~Acceptor6 has changed]

Now consider ProposerB, which launched the round of voting numbered V2. There are two possible cases:

In the first case, ProposerB is still in the first phase and has not yet received authorization results from at least N/2 + 1 acceptor nodes (there are 6 acceptor nodes, so at least 4 authorization results are required). ProposerB waits for the summary; if the wait times out without the necessary number of results, ProposerB increases its round number and launches the vote again. This still guarantees that the acceptors cannot be assigned by the previous proposer, because ProposerB's round number remains the largest.

The other case is even simpler. ProposerB has received authorization results from at least 4 acceptor nodes, and at least one of these 4 nodes carries the AcceptedValue X1: the acceptors currently assigned X1 are exactly half of the nodes (3 nodes), so any N/2 + 1 nodes must overlap with them in at least one acceptor (note that N is even here; with an odd N, the extreme situation discussed in this section cannot arise). This means that when ProposerB enters the second-phase assignment operation, the value it passes to the remaining acceptor nodes is X1, not the X2 it originally proposed, so the extreme case described at the start of this section does not occur.
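The second case can be checked exhaustively for the six-acceptor example. The dictionary `accepted` and the helper `value_for_phase_two` are hypothetical; because X1 is the only value present, the sketch does not need to compare AcceptedVote.

```python
from itertools import combinations
from typing import Dict, Optional, Tuple

# Acceptor1~Acceptor3 hold X1; Acceptor4~Acceptor6 have no AcceptedValue yet.
accepted: Dict[int, Optional[str]] = {1: "X1", 2: "X1", 3: "X1", 4: None, 5: None, 6: None}


def value_for_phase_two(q: Tuple[int, ...], own_value: str) -> str:
    """Value ProposerB must send if the acceptors in q authorized its round."""
    held = [accepted[a] for a in q if accepted[a] is not None]
    return held[0] if held else own_value


# Try every possible set of 4 authorizing acceptors (N/2 + 1 with N = 6).
chosen = {value_for_phase_two(q, "X2") for q in combinations(accepted, 4)}
print(chosen)
```

Whichever four acceptors answer, at least one of them holds X1, so ProposerB never gets to send X2.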

2-1-4. Initial value and increment rule for round numbers

The design discussed so far depends on how a round number is determined. Round numbers issued by different proposers need not be globally increasing, but each proposer must ensure that the round numbers it issues itself are increasing (must the increment be 1? Not necessarily). This is because the acceptor's first-phase rule is to authorize a new round of voting only when its vote is greater than the current PrepareVote. For a proposer, the worst case is therefore that however it launches new rounds of voting, it never obtains the acceptors' authorization, and the final result is dictated entirely by the values proposed by other proposers.

Clearly, however ProposerB launches a new round of voting, it will be rejected by the acceptors in the first phase, because at least one other proposer's round number is always larger than its own. Handling the numbering this simply does reduce the difficulty of writing the code, but the result is only a pseudo-implementation of the algorithm (many systems currently work this way in order to balance performance, maintainability, and ease of implementation). Because the initiation and assignment of every round follow a single proposer, no voting conflict ever occurs while the algorithm runs. It is also unfair to clients whose choices cannot be confirmed under concurrency, because the value finally assigned always follows one single client (incidentally, the client is also a role in Paxos: it is responsible for submitting proposals that require a vote to the appropriate proposer).

So we need a way of increasing round numbers that lets voting requests from different proposers genuinely compete.

Some readers may first think of using the ZooKeeper distributed coordinator to generate the numbers, which guarantees that they are unique and increasing. Many academic materials and web articles mention this approach as well.

But if ZooKeeper could be used to solve the problem, why would we implement the Paxos algorithm at all? All conflict resolution could simply be handed over to ZooKeeper. The reason for implementing Paxos is to build a coordination component that works at the same layer as ZooKeeper, so the numbering problem has to be solved by ourselves.

Here is a numbering scheme for voting rounds that effectively reduces number collisions and creates genuine competition in the voting on proposal X. Its precondition is that every proposer works in a LAN environment and can discover the others by multicast. Each proposer maintains a list of the proposer set, sorted by ProposerID. Round numbers can then be assigned by the remainder principle:

[Figure: round numbers assigned to each proposer by the remainder principle]

(N × V) + index + 1. In this formula, N is the size of the proposer set, V is the voting round (starting from 0), and index is the position of the current proposer in the sorted set (starting from 0). For example, with N == 4, the proposer at the first position of the sorted list can use the round numbers 1, 5, 9, 13, ...; the proposer at the second position can use 2, 6, 10, 14, 18, ...; and the proposer at the third position can use 3, 7, 11, 15, 19, ...
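The numbering formula can be written out directly. The function names `round_number` and `first_rounds` are hypothetical; the values reproduce the N == 4 example above.

```python
from typing import List


def round_number(n: int, v: int, index: int) -> int:
    """(N x V) + index + 1, with V and index both starting from 0."""
    return n * v + index + 1


def first_rounds(n: int, index: int, count: int) -> List[int]:
    """The first `count` round numbers available to the proposer at `index`."""
    return [round_number(n, v, index) for v in range(count)]


print(first_rounds(4, 0, 4))   # [1, 5, 9, 13]
print(first_rounds(4, 1, 5))   # [2, 6, 10, 14, 18]
print(first_rounds(4, 2, 5))   # [3, 7, 11, 15, 19]
```

As long as all proposers agree on N and on the sorted order, the sequences of different proposers never overlap, because each number has a distinct remainder modulo N.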

Can this numbering scheme still produce duplicates? Of course it can, especially during the start-up phase of the whole Paxos system. For example, all N proposer nodes have been started, but one proposer has temporarily discovered only N-2 proposer nodes, so the number it computes for its own use may duplicate a number computed by another proposer. How often this happens also depends on the method used to discover nodes. For example, in the Basic-Paxos implementation code that may be given in this series, proposer nodes are discovered by multicast, and while the Paxos system is starting up, the discovery progress on each proposer node is not consistent.
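A small sketch of how such a collision can arise, assuming N = 4 in total and one proposer that has so far discovered only N-2 = 2 nodes. The indexes chosen for the two proposers are hypothetical.

```python
from typing import Set


def rounds(n: int, index: int, count: int) -> Set[int]:
    """Round numbers (N x V) + index + 1 for V = 0 .. count-1."""
    return {n * v + index + 1 for v in range(count)}


full_view = rounds(4, 0, 5)      # proposer that sees all 4 nodes, at index 0
partial_view = rounds(2, 0, 5)   # proposer that sees only 2 nodes and places itself at index 0
collisions = sorted(full_view & partial_view)
print(collisions)                # [1, 5, 9]
```

The two proposers disagree on N, so their sequences are no longer separated by remainder and can share round numbers.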

The good news is that the acceptor's first-phase rule only authorizes voting requests whose number is larger than its current PrepareVote (at this point AcceptedVote <= PrepareVote). That is, when two or more proposers holding the same round number ask the same acceptor for authorization, the acceptor authorizes only one of them and rejects the other requests. This guarantees that a request with the same round number is never authorized twice on the same acceptor, so proposers holding the same round number can each determine whether they have obtained an authorized majority. If none of them has, new and larger round numbers will be generated once the proposer set has stabilized.
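The effect of two proposers racing with the same round number can be sketched as follows. The function `count_grants`, the proposer names "A" and "B", and the arrival orders are all hypothetical; every acceptor is assumed to start with PrepareVote 0.

```python
from typing import Dict, List


def count_grants(vote: int, arrivals: List[List[str]]) -> Dict[str, int]:
    """arrivals[i] lists proposer names in the order their requests reach acceptor i."""
    grants: Dict[str, int] = {}
    for order in arrivals:
        prepare_vote = 0                      # this acceptor has authorized nothing yet
        for proposer in order:
            if vote > prepare_vote:           # first-phase rule: strictly greater
                prepare_vote = vote
                grants[proposer] = grants.get(proposer, 0) + 1
    return grants


arrivals = [["A", "B"], ["A", "B"], ["B", "A"], ["B", "A"], ["A", "B"]]
grants = count_grants(5, arrivals)
majority = len(arrivals) // 2 + 1
winners = [p for p, g in grants.items() if g >= majority]
print(grants, winners)
```

Each acceptor grants the shared number only to whichever request arrives first, so at most one proposer can collect a majority; with an even split, neither does and both must retry with larger numbers.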

===============
(Continued in the next part)
