4. Distributed Transaction Consistency Protocols: Paxos in Plain Language

Source: Internet
Author: User

Reprint Address: http://www.lxway.com/4618606.htm

Wikipedia's introduction: Paxos is a message-passing-based, highly fault-tolerant consensus algorithm proposed in 1990 by Leslie Lamport (the "La" in LaTeX, now at Microsoft Research).

The Paxos algorithm has been applied in Google's Chubby, Megastore, Spanner and other systems, and Hadoop's ZooKeeper uses a Paxos-style algorithm as well. In all of these systems the algorithm used is not exactly the same as the original Paxos that Lamport proposed; that will be analyzed later. The purpose of this blog is to let a complete beginner understand the idea behind Paxos within half an hour. The beginner may not be interested in math and may not know the complicated theory of distributed systems; he is just an entry-level programmer. I wanted to write this post because I have read many online articles and blogs introducing Paxos, as well as Lamport's papers, and still found it hard to understand; most of them are too complicated. I have always believed that complex, advanced theory is built on some common rules, and that we have already met, and even used, these rules in everyday life. So let us set the Paxos algorithm itself aside for now and start from a small matter in daily life.

Suppose a group of friends wants to take a trip together over the Mid-Autumn Festival. There are 25 of them, scattered across different provinces, and they have to decide where to go: Lhasa, Kunming, Sanya or somewhere else. (The meeting time, the Mid-Autumn Festival, is already fixed; only the destination remains to be decided.) The most direct way is of course to create a QQ group where everyone votes and the minority obeys the majority. This resembles reaching consistency through "shared memory": it is simple to implement, but it is not the scenario Paxos addresses, because from Paxos's point of view it has a big problem: what if the QQ server goes down? Paxos's principle is that fault tolerance must be strong. So the Paxos scenario is more like these 25 people being able to reach each other only by text message. The core problem to solve is that even if some of the people (in Paxos, fewer than half) go "missing", the others can still agree on the destination. So how should this be designed?

These 25 people find another 5 people (the 5 could of course be chosen from among the 25; taking 5 extra people just makes the description easier), say one each in Beijing, Shanghai, Guangzhou, Shenzhen and Chengdu. The 25 people text them their preferred destinations. The 5 cannot communicate with each other; they only receive the messages sent by the 25. We call the 25 people "donkey friends" (Chinese slang for travel companions) and the 5 people "captains".

First, the donkey friend's logic. A donkey friend can text any of the 5 captains, and the texting process is divided into two steps:

First step (application phase): the donkey friend asks the 5 captains for the right to communicate with them about the destination. Each captain keeps receiving messages from different donkey friends, but cannot talk with several of them at once; at any moment a captain can communicate with only one donkey friend. By what principle can this be kept fair and impartial? Every text message carries its sending time, and the captain's rule is to agree to communicate with the donkey friend whose message was sent most recently; if a newer message arrives, the captain switches to the donkey friend who sent the newer one. In short, a person with a say can make the wisest choice only by always listening to the newest voice. After sending the text, the donkey friend waits for the captains' replies. Some captains may reply that the message is too old and refuse to communicate; others may reply that it is the newest message they have received and agree to communicate. The captains who agree must also report the destination they have already decided on, if any. How a captain decides on a destination is described later.

For the donkey friend, more than half of the captains must agree to communicate in the first step before he can move on to the next step. Otherwise he is not even qualified to communicate and can only keep sending texts over and over. The newer the message you send, the more likely you are to win the right to communicate.

Once more than half of the captains (that is, 3 or more) have agreed, you can communicate with the captains in substance, that is, enter the second step. Because a captain can communicate with only one donkey friend at any moment, two donkey friends can never both hold this right at the same time. Of course, you can still grab the right to communicate by sending messages like mad.

The captains send the donkey friend who wins the right to communicate (call him A) the destinations they have decided on (some may not have decided yet). Note that each captain decides on a destination independently; the captains do not communicate with one another.

Second step (communication phase): this lucky donkey friend A has received the destinations the captains sent him, and several situations are possible:

The first case: none of the captains communicating with A (not necessarily all 5, but more than half) has decided where to go yet. Donkey friend A is elated and sends these captains a second text message telling them the destination he wants (say, the Maldives).

He may get one of two results. In the first, more than half of the captains agree, so A's proposal of the Maldives is accepted by more than half of the captains and the whole decision process is over; the other donkey friends will learn the news sooner or later, and A can start packing for the Maldives. In the second, something goes wrong: a captain may be unreachable (for example, busy on the phone with a girlfriend), or another donkey friend may have seized the right to communicate (captains always favor the new over the old and only talk to whoever sent them the newest message, so they no longer agree to A's proposal), and so on. Either way, poor A has to start again from the first step and send the captains a new application message.

The second case: at least one captain has already decided on a destination. A may receive several destinations from different captains, decided by different captains with different donkey friends at different times. A first checks whether some destination has already been agreed to by more than half of the captains (for example, 3 captains agreed to Sanya, 1 agreed to Kunming, and the other did not reply to A). If so, there is nothing more to argue about: the whole decision process has already reached agreement, so pack up and get ready for Sanya. If no destination has more than half (for example, 1 captain agreed to Kunming, 1 agreed to Sanya, and the rest have not decided or did not reply), A, as a well-behaved donkey friend, still does not follow his own wishes, even though he may originally have wanted to go to the Maldives. This is the key to Paxos: later proposers endorse what earlier ones have settled, otherwise the decision process would never end. In his second message A tells the captains that the destination he wants is the most recently decided one among the destinations he received. (For example, the Beijing captain decided on Kunming 1 minute ago and the Shanghai captain decided on Sanya 1 hour ago, so A backs Kunming.) Donkey friend A's thinking is: since the captains have already made decisions, I will simply back the latest one.
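The two cases above boil down to one selection rule, which can be sketched in a few lines of Python. This is only an illustration of the story, not a reference implementation: the names `Reply`, `choose_destination`, `decided_at` and `total_captains` are made up for this example, and `decided_at` is assumed to be a plain number where larger means more recent.

```python
from typing import NamedTuple, Optional


class Reply(NamedTuple):
    """What one captain sends back after agreeing to communicate (hypothetical shape)."""
    captain: str
    destination: Optional[str]  # None: this captain has not decided yet
    decided_at: Optional[int]   # assumed timestamp of the decision, larger = newer


def choose_destination(replies, own_choice, total_captains=5):
    """Return the destination the donkey friend puts in the second message."""
    decided = [r for r in replies if r.destination is not None]

    # First case: no captain has decided, so propose our own wish.
    if not decided:
        return own_choice

    # Second case, part 1: some destination already has more than half
    # of all captains behind it, so the decision is effectively made.
    counts = {}
    for r in decided:
        counts[r.destination] = counts.get(r.destination, 0) + 1
    for destination, votes in counts.items():
        if votes > total_captains // 2:
            return destination

    # Second case, part 2: no majority yet, so back the newest decision
    # instead of our own wish.
    return max(decided, key=lambda r: r.decided_at).destination
```

For example, if the Beijing captain reports Kunming decided at time 59 and the Shanghai captain reports Sanya decided at time 0, the function returns Kunming, even though the caller wanted the Maldives.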

From this logic we can see that once more than half of the captains have agreed on a destination, say Kunming, any later donkey friend B who keeps texting and wins the right to communicate must have heard from more than half of the captains, because more than half agreed to communicate with him. B is therefore bound to receive Kunming from at least one captain (otherwise it would not be true that more than half of the captains had agreed on Kunming, which contradicts the assumption). B will back that latest destination, and the result will not change, because every later donkey friend will also back Kunming. More and more captains come to agree, and in the end agreement is certain.
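The argument rests on one counting fact: any two groups of more than half of the 5 captains must share at least one captain. A small brute-force check in Python makes this concrete. The helper name `all_groups_overlap` is invented for this sketch.

```python
from itertools import combinations

captains = ["Beijing", "Shanghai", "Guangzhou", "Shenzhen", "Chengdu"]


def all_groups_overlap(groups):
    """True if every pair of groups shares at least one captain."""
    return all(a & b for a in groups for b in groups)


# Every possible group of 3 captains (the smallest majority of 5).
majorities = [set(group) for group in combinations(captains, 3)]
# Every possible group of 2 captains (not a majority).
minorities = [set(group) for group in combinations(captains, 2)]

print(len(majorities))                 # 10
print(all_groups_overlap(majorities))  # True
print(all_groups_overlap(minorities))  # False
```

So the captains who agreed on Kunming and the captains who agree to communicate with B always have at least one member in common, which is why B cannot miss the earlier decision. With groups of only 2, two disjoint groups exist and the guarantee disappears.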

Having looked at the donkey friend's logic, what is the captain's logic?

The captain's logic is relatively simple.

In the application phase, a captain agrees to communicate only with the newest application message. The captain knows the time of the newest message he has received and does not respond to older ones. When he agrees to communicate, he sends the donkey friend the destination he has decided on (or the fact that he has not decided yet).

In the communication phase, donkey friend C sends the destination he wants, attaching the sending time of his own application message (say, 3 minutes ago). The captain checks it: if that time (3 minutes ago) is indeed the time of the newest application he has received so far (meaning no other donkey friend has applied to him in the meantime), the captain agrees to C's destination (say Kunming; even if he decided on Sanya 1 hour ago, C is newer, so the decision is updated to Kunming). If it is not the newest, then within those 3 minutes another donkey friend D has applied to him; since the captain favors the new over the old, he has already agreed to communicate with D, so he rejects C's proposal and waits for D to send his.
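The captain's two rules can be sketched as a small Python class. This is a minimal illustration of the story as told here, under stated assumptions: the class and method names are invented, sending times are plain integers where larger means newer, and "no response" is modeled as returning `None`.

```python
class Captain:
    """One captain. All names here are hypothetical, chosen for this sketch."""

    def __init__(self):
        self.latest_application = -1  # sending time of the newest application seen
        self.destination = None       # destination decided so far, if any
        self.decided_at = None        # application time attached to that decision

    def on_application(self, sent_at):
        """Application phase: agree only if this message is the newest seen."""
        if sent_at <= self.latest_application:
            return None  # too old: the captain does not respond
        self.latest_application = sent_at
        # Agree to communicate and report the current decision (may be None).
        return (self.destination, self.decided_at)

    def on_proposal(self, sent_at, destination):
        """Communication phase: accept only if no newer application has arrived."""
        if sent_at != self.latest_application:
            return False  # someone newer applied in the meantime
        self.destination = destination
        self.decided_at = sent_at
        return True
```

A short walk-through: a captain who granted an application sent at time 10 accepts a proposal tagged 10. If an application sent at time 20 arrives first, the proposal tagged 10 is rejected and only the one tagged 20 can succeed, overwriting any earlier decision.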

The basic idea of Paxos is roughly the process above. As you can see, the donkey friend's strategy is the key to Paxos. Now let us map this onto the theory.

Paxos is mainly used to keep replicas (or state) consistent in distributed storage. To keep the replicas consistent, every replica must apply updates in the same order. Inserts, deletes and updates are usually issued concurrently by many clients, and which client goes first and which goes later is exactly this update order. In a non-distributed setting you can use a lock: whoever acquires the lock first operates first. In a distributed setting with multiple replicas, if you rely on acquiring a lock, updating all replicas synchronously and then releasing the lock, some node has to hand out the lock (and if several nodes hand out locks, you need distributed lock management, and deciding which client gets the lock becomes the hard part). That node becomes a single point of failure, which is unreliable and defeats the purpose of having multiple distributed replicas; performance is also poor, and deadlocks and similar problems can occur.

So, in the end, the only way to solve the problem at its root is to solve consistency under distributed conditions.

As in the example above, Paxos solves this problem with an election in which the minority obeys the majority: among 2N+1 nodes, once more than N agree on a decision, the system is considered to have reached consensus, and by the Paxos principle a consensus, once reached, does not change. A client therefore does not have to communicate with every server; communicating with a majority is enough. Nor do all servers have to be working: some may go down, and as long as more than half survive the whole process can continue, so fault tolerance is quite good. This is why earlier blog posts said that a service like ZooKeeper should be deployed on an odd number of machines. The claim has a real basis: with 5 machines, the vote is over as soon as any client gets any 3 of them to agree; but what about 6 machines? Then at least 4 must agree.
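The arithmetic behind the odd-number advice is easy to check. The sketch below uses two helper names invented for this example, `majority` and `tolerated_failures`, and simply applies the "more than half" rule.

```python
def majority(nodes: int) -> int:
    """Smallest number of nodes that is more than half of the total."""
    return nodes // 2 + 1


def tolerated_failures(nodes: int) -> int:
    """How many nodes may be down while a majority can still be formed."""
    return nodes - majority(nodes)


for n in (5, 6, 7):
    print(n, majority(n), tolerated_failures(n))
# 5 3 2
# 6 4 2
# 7 4 3
```

Going from 5 to 6 machines raises the required majority from 3 to 4 but still tolerates only 2 failures, so the sixth machine adds cost without adding fault tolerance.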

The acceptor in Paxos corresponds to the captain above, the proposer corresponds to the donkey friend, and the epoch number corresponds to the sending time of the application message in the example. There are already many formal descriptions of Paxos, so its correctness proof is not repeated here; it is fairly complex and will be analyzed later when time permits. In addition, the most time-consuming part of Paxos is that more than half of the acceptors must agree to communicate before the second step can begin. Imagine that at the start every donkey friend is frantically texting the captains, and each captain's newest message comes from a different donkey friend; it is then hard to reach a state in which more than half of the captains agree to communicate with the same donkey friend. To reduce this time, improvements such as Fast Paxos exist, which will be analyzed when time permits.

Some questions to think about: before Paxos, or in systems other than Chubby and ZooKeeper, distributed systems such as HDFS, distributed databases and Amazon Dynamo also face this kind of consistency problem. Their approaches differ, and a comparative analysis will follow when time permits.

Finally, a few words about the term "consistency".

The consistency in Paxos, as I understand it, refers to consistency among redundant replicas (or states; either way it arises from redundancy). This is not the same thing as the consistency in the ACID properties of a relational database. A relational database may not have replicas at all, so what replica consistency could there be? By the classic definition, the C in ACID means that executing a transaction must take the database from one consistent state to another. What counts as a consistent state depends on business constraints. Take the classic transfer transaction: after it completes, it must not be the case that money was deducted from one account but not added to the other. If the two balances add up to the same total as before the transfer, the state is consistent.
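The transfer example can be written down as a tiny Python sketch. It uses an in-memory dictionary rather than a real database, and the function name `transfer` and the account names are made up for illustration. The point is only that the business constraint (the total stays the same) holds before and after.

```python
def transfer(accounts, src, dst, amount):
    """Return new balances after moving `amount` from src to dst.

    Either both balances change or, on error, neither does.
    """
    if amount <= 0 or accounts[src] < amount:
        raise ValueError("invalid transfer")
    total_before = sum(accounts.values())

    updated = dict(accounts)  # work on a copy, hand it back only when complete
    updated[src] -= amount
    updated[dst] += amount

    # Business constraint: money is neither created nor destroyed.
    assert sum(updated.values()) == total_before
    return updated


balances = {"alice": 100, "bob": 50}
after = transfer(balances, "alice", "bob", 30)
print(after)  # {'alice': 70, 'bob': 80}
```

Nothing here involves replicas: the constraint is about one logical copy of the data, which is why it is a different notion from the consistency Paxos provides.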

Many blog posts confuse these two kinds of consistency. In addition, the consistency in the CAP principle, in my view, refers to replica consistency, which is close to the consistency in Paxos: it deals with the need to keep multiple copies consistent because redundant data exists. The strong consistency of NoSQL also refers to replica consistency, and eventual consistency likewise means that the replicas become identical only after some delay.

Of course, if the database itself is distributed and has redundant replicas, then besides the consistency of the transaction's business logic, replica consistency must also be solved, and the Paxos protocol can be used for that. However, solving replica consistency does not fully solve business-logic consistency. And if the database is distributed but has no replicas, transactional consistency still has to be designed according to the business constraints.

In addition, whenever Paxos is discussed, the Byzantine generals problem also comes up, which concerns trying to reach consistency by message passing when messages may be tampered with. Paxos does not handle that case. It uses message passing to solve the consistency problem, so it assumes that the channel is reliable. Reliable here mainly means that messages are not tampered with; message loss is allowed.

Consistency, ACID transactions, CAP, NoSQL and so on will be analyzed in detail later. Some examples in this article may be inappropriate; if anything is wrong, criticism is welcome.
