According to Wikipedia, the Paxos algorithm is a message-passing-based, highly fault-tolerant consistency algorithm proposed in 1990 by Leslie Lamport (the "La" in LaTeX, now at Microsoft Research).
Paxos is currently used in Google systems such as Chubby, Megastore, and Spanner, and ZooKeeper in the Hadoop ecosystem also uses the Paxos algorithm. In all of these systems, the algorithm actually used is not exactly the same as the original Paxos proposed by Lamport; that will be analyzed later. The purpose of this blog post is to let an ordinary programmer, call him Tom, understand the idea behind Paxos within half an hour. Tom may not be interested in mathematics or familiar with complex distributed-systems theory; he is just an entry-level programmer. I am writing this post because I have read many articles and blogs about Paxos on the Internet, as well as Lamport's papers, and most of them are hard to follow and overly complicated. I have always believed that complex, advanced theories rest on a few general rules that we have already met, and even used, in daily life. So let us set the Paxos algorithm itself aside for now and start from everyday life.
Suppose a group of friends decides to travel together over the Mid-Autumn Festival. They are scattered across the country, say 25 people in different provinces, and they need to settle on a destination: Lhasa, Kunming, Sanya, and so on (the date is already fixed by the festival, so only the destination remains to be decided). The most direct method is, of course, to create a QQ group where everyone votes and the minority yields to the majority. This is similar to reaching consistency through "shared memory": it is easy to implement, but it is not the scenario Paxos addresses, because from the Paxos point of view it has a big problem: what if the QQ server fails? Paxos is highly fault tolerant. Its scenario is closer to one where the 25 people can only send text messages. To reach agreement, the 25 people pick five others (these five could be chosen from among the 25, but for ease of description they are treated as separate people), for example one each in Beijing, Shanghai, Guangzhou, Shenzhen, and Chengdu, and send them text messages stating their preferred destinations. The five do not communicate with each other; they only receive text messages from the 25 and reply to them. The 25 people are called friends, and the five are called captains.
Let's first look at the friends' logic. A friend can send text messages to any of the five captains, and the exchange is divided into two steps:
Step 1 (application phase): The friend texts the five captains to request the right to discuss the destination with them. Each captain keeps receiving text messages from different friends, but he cannot talk with several friends at once; he can talk with only one friend at any time. What rule can he follow to stay fair, just, and open when messages arrive at nearly the same moment? The captain's rule is to agree to talk with whichever friend sent the newest text message; if a newer message arrives, he switches to the friend who sent it. In short, as the one holding the right to decide, a captain can make the best choice only by always listening to the latest voice. After sending the text message, the friend waits for the captains' replies. Some captains may reply, "Your message is too old, I won't talk with you." Others may reply, "Your message is the newest I have received, I agree to talk with you." Captains of the latter kind also send back the travel destination they have decided on. How a captain decides on a destination is covered below.
A friend can proceed to the next step only after more than half of the captains have agreed to talk with him. Otherwise he does not even qualify to discuss anything and has to keep sending requests; the newer his text message, the more likely he is to win the right to talk.
Once more than half of the captains (at least three of the five) have agreed, the friend can talk with the captains in substance, that is, proceed to step 2. Because a captain talks with only one friend at any time, two friends cannot both hold this status at the same moment. Of course, a friend can keep grabbing the right to talk by texting frantically.
To a friend who has obtained the right to talk (call him A), the captains send the travel destinations they have each decided on (or report that they have not decided yet). Note that each captain decides on a destination by himself; the captains have no need to communicate with each other.
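Step 1 from the friend's side can be sketched roughly as follows. This is only an illustration under assumptions: the function name `win_right_to_talk`, the reply format `("agree" | "too_old", decided)`, the use of increasing integers as message times, and the `ask` helper that stands in for real text messages are all made up for this example.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical reply format: (status, destination the captain has decided on, if any).
Reply = Tuple[str, Optional[str]]

def win_right_to_talk(ask_captains: Callable[[int], List[Reply]],
                      total_captains: int,
                      first_time: int,
                      max_tries: int = 5):
    """Keep applying with newer message times until more than half agree."""
    request_time = first_time
    for _ in range(max_tries):
        replies = ask_captains(request_time)
        agreed = [decided for status, decided in replies if status == "agree"]
        if len(agreed) > total_captains // 2:
            return request_time, agreed   # may proceed to step 2
        request_time += 1                 # too old for a majority: send a newer message
    return None                           # gave up after max_tries attempts

# Stand-in for five captains: the newest application time each one has seen so far.
seen = [4, 4, 4, 0, 0]

def ask(request_time: int) -> List[Reply]:
    replies = []
    for i, newest in enumerate(seen):
        if request_time > newest:
            seen[i] = request_time
            replies.append(("agree", None))
        else:
            replies.append(("too_old", None))
    return replies

result = win_right_to_talk(ask, total_captains=5, first_time=3)
print(result)
```

In this run the applications at times 3 and 4 each win only two captains, so the friend has to try again; the application at time 5 is newer than anything the captains have seen and wins all five.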
Step 2 (communication phase): The lucky friend A has received the captains' travel destinations. There are several possible situations:
Case 1: None of the captains talking with A (not necessarily all five, but more than half) has decided on a destination yet. A, delighted, sends a second text message to these captains telling them the destination he wants (say, the Maldives).
Two outcomes are possible. In the first, more than half of the captains agree, which means the Maldives has been accepted by more than half of the captains and the whole decision process is complete; the other friends will learn the result sooner or later, and A can start packing for the Maldives. In the second, the proposal fails: a captain may have become unavailable (for example, he is busy on the phone with his girlfriend), or A may have been preempted by another friend (a captain talks only with the friend who texted most recently, so once he has switched to someone newer he no longer agrees to A's suggestion). Either way, poor A has to go back to step 1 and text the captains again.
Case 2: At least one captain has already decided on a destination. A may receive several destinations from different captains; they are decisions the captains reached with other friends at different times. A looks them over first. If one destination already has more than half of the captains behind it (for example, three captains have agreed on Sanya, one on Kunming, and one has not decided), there is nothing more to discuss: the decision process has already reached agreement, so A packs for Sanya and is done. If no destination has more than half (for example, one captain agreed on Kunming, one on Sanya, two on Lhasa, and one did not reply), then A, being a well-behaved friend, does not push his own wish, even though he may have wanted the Maldives (this is the key to Paxos: later comers go along with earlier ones, otherwise the decision process would never end). In his second text message, A tells the captains that the destination he wants is the most recently decided one among those he received. (For example, if the Beijing captain decided on Kunming one minute ago and the Shanghai captain decided on Sanya one hour ago, A proposes Kunming.) A's reasoning is that since some captains have already made decisions, he will simply go with the latest one.
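The choice A makes before sending the second text message can be written down in a few lines. This is a minimal sketch, not a full implementation: the name `pick_destination` is hypothetical, each captain's reply is assumed to be either `None` (not decided) or a pair `(decision_time, destination)`, and a larger `decision_time` is assumed to mean a more recent decision. The shortcut where more than half already agree and A simply stops is left out.

```python
from typing import List, Optional, Tuple

# (decision_time, destination); a larger decision_time means a more recent decision.
Decision = Tuple[int, str]

def pick_destination(replies: List[Optional[Decision]], my_wish: str) -> str:
    """Choose what friend A proposes in the second text message."""
    decided = [reply for reply in replies if reply is not None]
    if not decided:
        # Case 1: no captain has decided yet, so A may propose his own wish.
        return my_wish
    # Case 2: go along with the most recently decided destination.
    return max(decided, key=lambda reply: reply[0])[1]

# Beijing decided on Kunming at minute 59, Shanghai on Sanya at minute 0,
# and a third captain has not decided yet.
print(pick_destination([(59, "Kunming"), (0, "Sanya"), None], "Maldives"))  # Kunming
print(pick_destination([None, None, None], "Maldives"))                     # Maldives
```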
From the logic above we can see that if, at some point, more than half of the captains have agreed on a destination such as Kunming, and friend B texts afterwards, then because B needs more than half of the captains to agree to talk with him, at least one of the captains who reply to B must report Kunming. B goes along with the latest destination and does not change it. Since every later friend proposes Kunming as well, more and more captains agree on Kunming, and agreement is eventually reached.
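The reason B always hears about Kunming is that any two groups containing more than half of the captains must share at least one captain. A small brute-force check for the five captains in the example (the helper names `majorities` and `every_pair_overlaps` are made up here) looks like this:

```python
from itertools import combinations

captains = ["Beijing", "Shanghai", "Guangzhou", "Shenzhen", "Chengdu"]
majority = len(captains) // 2 + 1   # 3 of 5

def majorities(members, size):
    """All groups of exactly `size` captains."""
    return [set(group) for group in combinations(members, size)]

def every_pair_overlaps(groups):
    """True if any two groups have at least one captain in common."""
    return all(a & b for a in groups for b in groups)

groups = majorities(captains, majority)
print(len(groups), every_pair_overlaps(groups))  # 10 True
```

Only groups of exactly three are checked; any larger group contains one of them, so it overlaps too. With groups of two the check fails, which is why "more than half" is the threshold.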
Having gone through the friends' logic, what is the captains' logic?
The captains' logic is relatively simple.
During the application phase, a captain agrees to talk only with the friend who sent the most recent application text message. The captain remembers the time of the newest application he has received and ignores older ones. When he agrees to talk, he sends that friend the travel destination he has decided on (or a note that he has not decided yet).
During the communication phase, friend C sends the destination he wants, together with the time of his application text message (say, 3 minutes ago). The captain checks this time. If it (3 minutes ago) is indeed the time of the newest application he has received (meaning no other friend has applied to him since), the captain agrees to C's destination (say, Kunming; even if he decided on Sanya one hour ago, C's request is newer, so he updates his decision to Kunming). If it is not the newest, another friend D must have applied within the last three minutes; D is newer, so the captain has agreed to talk with D and will not agree to C's destination. He waits for D to send his destination later.
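The captain's two rules fit in a small class. This is a sketch under assumptions rather than a definitive implementation: the class and method names are hypothetical, message times are modeled as integers where larger means newer, and the second rule follows this post's description (the time must equal the newest application). Standard descriptions of Paxos state the acceptor's rule slightly differently: it accepts unless it has already promised a higher-numbered proposal.

```python
from typing import Optional, Tuple

class Captain:
    """Hypothetical captain (acceptor); times are integers, larger means newer."""

    def __init__(self) -> None:
        self.newest_application: Optional[int] = None
        # (application_time, destination) of the last destination agreed to.
        self.decided: Optional[Tuple[int, str]] = None

    def on_application(self, application_time: int):
        """Application phase: agree only if this application is the newest seen."""
        if self.newest_application is None or application_time > self.newest_application:
            self.newest_application = application_time
            return True, self.decided   # agree, and report any earlier decision
        return False, None              # too old, refuse

    def on_destination(self, application_time: int, destination: str) -> bool:
        """Communication phase: agree only with the friend who applied most recently."""
        if application_time == self.newest_application:
            self.decided = (application_time, destination)
            return True
        return False

captain = Captain()
captain.on_application(3)                     # friend C applies at time 3
captain.on_application(5)                     # friend D applies later, at time 5
print(captain.on_destination(3, "Kunming"))   # False: C has been preempted by D
print(captain.on_destination(5, "Sanya"))     # True: D is the newest applicant
```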
That, roughly, is the basic idea of Paxos. Now let's sort out what it is for.
Paxos is mainly used to keep replicas (or state) consistent in distributed storage. For replicas to be consistent, every replica must apply updates in the same order. Data is usually inserted, deleted, modified, and queried by multiple clients concurrently, so which client goes first and which goes last? That is the update order. In a non-distributed system you can use a lock: whoever acquires the lock first operates first. In a distributed system, however, there are multiple replicas. If you acquire a lock, update all replicas synchronously, and then release the lock, some node has to hand out the lock (with several lock-granting nodes you would need distributed lock management, and deciding which client gets the lock becomes a hard problem of its own). That node becomes a single point of failure, so reliability is poor and the point of having multiple distributed replicas is lost. Performance is also poor, and deadlocks and similar problems can occur.
So, to put it bluntly, consistency only becomes a real problem that needs solving under distributed conditions.
As the example above shows, Paxos solves this problem by voting, with the minority yielding to the majority. As long as n + 1 or more of 2n + 1 nodes agree on a decision, the system is considered to have reached agreement, and by the Paxos rules that agreement will, in theory, never change afterwards. A client therefore does not need to talk to all servers; talking to a majority is enough. Nor do all servers have to be up: some may be down, and as long as more than half are alive the whole process can continue, so fault tolerance is quite good.
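The arithmetic behind "more than half" is easy to check. In this small sketch the function names `majority` and `tolerated_failures` are made up for illustration; for 2n + 1 nodes it gives a majority of n + 1 and room for n nodes to be down.

```python
def majority(total_nodes: int) -> int:
    """Smallest number of nodes that is more than half of total_nodes."""
    return total_nodes // 2 + 1

def tolerated_failures(total_nodes: int) -> int:
    """How many nodes can be down while a majority is still alive."""
    return total_nodes - majority(total_nodes)

for n in (1, 2, 3):
    total = 2 * n + 1
    print(total, majority(total), tolerated_failures(total))
# 3 2 1
# 5 3 2
# 7 4 3
```

With the five captains from the example, three agreeing captains are enough, and the trip can still be decided if two captains are unreachable.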
In Paxos terms, the acceptor corresponds to the captain above, the proposer corresponds to the friend, and the epoch number (the proposal number) corresponds to the sending time of the application text message. There are already many formal descriptions of Paxos, so I will not repeat them here. The proof of Paxos's correctness is complicated and will be analyzed later. The most time-consuming part of Paxos is getting more than half of the acceptors to agree to talk before step 2 can begin. Imagine that at the start all the friends text the captains frantically and each captain's newest message comes from a different friend; it is then hard for any one friend to get more than half of the captains to agree. To reduce this time, Paxos has been refined into variants such as Fast Paxos, which I will analyze when I have time.
Some questions to think about: before Paxos, and apart from systems such as Chubby and ZooKeeper, other distributed systems faced the same consistency problem. HDFS, distributed databases, Amazon Dynamo, and others each solve it differently; a comparative analysis will follow when time permits.
Finally, let's talk about the term "consistency".
As I understand it, consistency in Paxos means consistency among redundant replicas (or states; in both cases the need arises from redundancy). This is not the same thing as the consistency in ACID for relational databases. A relational database may not even have a standby copy, so how could that be replica consistency? By the classic definition, the C in ACID means that executing a transaction must take the database from one consistent state to another. What is a consistent state? That depends on business constraints. In the classic transfer transaction, for example, it must not happen that money is deducted from one account without the other account's balance increasing; if the sum of the two balances after the transfer equals the sum before it, the state is consistent.
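The transfer example can be made concrete. The sketch below is only an illustration of the business constraint, not of how a real database implements transactions: the `transfer` function and the in-memory `accounts` dictionary are assumptions made up for this example.

```python
from typing import Dict

def transfer(accounts: Dict[str, int], src: str, dst: str, amount: int) -> Dict[str, int]:
    """Return new balances after the transfer; the input is left untouched on failure."""
    if amount <= 0:
        raise ValueError("amount must be positive")
    if accounts[src] < amount:
        raise ValueError("insufficient funds")
    updated = dict(accounts)        # work on a copy so a failure changes nothing
    updated[src] -= amount
    updated[dst] += amount
    # Business constraint: the total amount of money must not change.
    assert sum(updated.values()) == sum(accounts.values())
    return updated

before = {"alice": 100, "bob": 50}
after = transfer(before, "alice", "bob", 30)
print(after, sum(before.values()) == sum(after.values()))
```

Nothing here involves replicas: there is a single copy of the data, and consistency means only that the balances obey the rule before and after the transaction.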
Many blog posts confuse these two kinds of consistency. The consistency in the CAP principle, by the way, is replica consistency, the same kind as in Paxos: both deal with the fact that "redundant data exists, so multiple copies must be kept consistent". The strong consistency that NoSQL gives up is also replica consistency, and eventual consistency means that the replicas become fully identical only after some delay.
Of course, if the database itself is distributed and keeps redundant replicas, it has to solve the replica consistency problem in addition to transaction consistency at the business-logic level, and the Paxos protocol can be used for that. But solving replica consistency does not by itself solve business-logic consistency. And if a distributed database has no replicas, transaction consistency still has to be designed according to business constraints.
In addition, discussions of Paxos often bring up the Byzantine generals problem. That problem concerns reaching agreement through message passing when messages or participants may be corrupted, which is much harder and is outside what Paxos handles. Paxos solves the consistency problem through message passing, so it assumes the channel is reliable in one specific sense: messages are not tampered with. Message loss is allowed.
Transaction consistency, ACID, CAP, NoSQL, and related issues will be analyzed in detail later.