The Paxos algorithm in distributed databases

Source: Internet
Author: User

Http://baike.baidu.com/link?url=ChmfvtXRZQl7X1VmRU6ypsmZ4b4MbQX1pelw_VenRLnFpq7rMvYfDDmg3Rg1Aw6YyobKozdN599x2sCiJNNHV_

Paxos is a consensus (consistency) algorithm based on message passing, proposed in 1990 by Leslie Lamport (the "La" in LaTeX, of Microsoft Research). It is widely regarded as one of the most effective algorithms of its kind.

Contents

1 Overview

2 Background

3 Math problem

  • Problem description
  • Mathematical basics

4 Several algorithms

  • Preliminary protocol
  • Basic protocol
  • Complete Synod protocol
1 Overview

Paxos is a consensus algorithm based on message passing, proposed in 1990 by Leslie Lamport (the "La" in LaTeX, of Microsoft Research). [1] It is widely regarded as one of the most effective algorithms of its kind.

2 Background

The problem Paxos solves is how the nodes of a distributed system can agree on a single value (a decree). A typical scenario is a distributed database: if every node starts in the same state and executes the same sequence of operations, all nodes end in the same state. To make sure every node executes the same sequence of commands, a consensus algorithm is run for each instruction, so that all nodes see the same instruction at each position. A general consensus algorithm applies to many scenarios and is an important problem in distributed computing, and research on such algorithms has continued since the 1980s.

Nodes can communicate under two models: shared memory and message passing. Paxos is a consensus algorithm built on the message-passing model.

Paxos is not limited to distributed systems; it can be used wherever multiple processes need to agree on something. Agreement can be reached through shared memory (which requires locks) or through message passing, and Paxos uses the latter. Typical cases where Paxos applies: multiple processes or threads on one machine agreeing on a piece of data; multiple clients concurrently reading and writing data in a distributed file system or distributed database; multiple replicas in distributed storage responding consistently to read and write requests.

Lamport's first paper on Paxos, "The Part-Time Parliament", is hard to follow, partly because Lamport presents and explains the problem as a story.
When reading the paper, the reader therefore has to look through the story to see what the author actually means. For example, the paper contains many passages about records of the Paxon civilization that have not yet been discovered and about scholarly speculation on them. What these passages map to in a real system is usually simple, basic knowledge, but a reader who struggles to work out what each of them stands for has been led astray.

The rest of this article is organized as follows: the second section corresponds to Sections 1.1-2.1 of the original paper, and the third section corresponds to Sections 2.2-3.2. [1]

3 Math problem

Problem description

Because Lamport presents Paxos as a story [2], we first summarize it. On the Greek island of Paxos, legislators (called priests below) pass decrees by voting in the parliamentary Chamber. They exchange information by having messengers carry notes, and each legislator records the decrees that have passed in his own ledger. The difficulty is that legislators and messengers are unreliable: either may leave the Chamber at any time, and at any moment a legislator, new or returning after a temporary absence, may enter the Chamber and take part in voting. The question is how to keep the voting process working and make sure that the decrees that pass never contradict one another.

Note: the Chamber corresponds to the distributed system, each priest to a node or process, message delivery by messengers to network communication, and a decree to the value whose consistency must be guaranteed. Priests and messengers entering and leaving correspond to nodes and network links failing and rejoining, and a priest's ledger corresponds to a node's persistent storage.
The requirement on the voting process can be stated as a progress requirement: if a majority of the priests stay in the Chamber long enough and no one enters or leaves during that time, then any proposed decree should be passed and recorded in the ledger of every priest in the Chamber.

Mathematical basics

Decrees on Paxos are passed by ballots (votes). Each ballot involves a group of priests called a quorum, and the ballot succeeds and passes its decree only if every priest in the quorum votes for it. Each ballot $B$ consists of the following:

  • $B_{dec}$: the decree being voted on
  • $B_{qrm}$: the quorum, a non-empty set of priests
  • $B_{vot}$: the set of priests who voted for the decree
  • $B_{bal}$: the ballot number

With these definitions, a ballot $B$ is successful if and only if $B_{qrm} \subseteq B_{vot}$. We then let $\mathcal{B}$ denote a set of ballots and claim that consistency is guaranteed if $\mathcal{B}$ satisfies the three conditions below.

In practice each ballot can be viewed as a read or write request. A decree passing with the approval of every quorum priest corresponds to every node involved in the request acknowledging it (for example, an update to a value), which is what ensures consistency. Note that in practice the ordering of ballot numbers need not reflect the order in which the ballots were initiated.

The three important definitions are:

  • B1($\mathcal{B}$): each ballot in $\mathcal{B}$ has a unique ballot number.
  • B2($\mathcal{B}$): the quorums of any two ballots in $\mathcal{B}$ have at least one priest in common.
  • B3($\mathcal{B}$): for every ballot $B$ in $\mathcal{B}$, if any priest in $B$'s quorum voted in an earlier ballot in $\mathcal{B}$, then the decree of $B$ equals the decree of the latest of those earlier ballots. In other words, the new decree must equal the decree most recently voted for by any priest taking part in the new ballot.
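The ballot structure and the three conditions can be made concrete with a small checker. This is only an illustrative sketch: the `Ballot` class, the function names and the sample ballots are hypothetical and do not come from the original paper, which states the conditions purely mathematically.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class Ballot:
    bal: int                # B_bal: ballot number
    dec: str                # B_dec: decree being voted on
    qrm: FrozenSet[str]     # B_qrm: quorum (non-empty set of priests)
    vot: FrozenSet[str]     # B_vot: priests who voted for the decree


def successful(b: Ballot) -> bool:
    """A ballot succeeds when every quorum member voted for it."""
    return b.qrm <= b.vot


def b1(ballots: List[Ballot]) -> bool:
    """B1: every ballot has a unique ballot number."""
    return len({b.bal for b in ballots}) == len(ballots)


def b2(ballots: List[Ballot]) -> bool:
    """B2: the quorums of any two ballots share at least one priest."""
    return all(x.qrm & y.qrm for x in ballots for y in ballots)


def b3(ballots: List[Ballot]) -> bool:
    """B3: if a quorum member voted in an earlier ballot, the decree must
    equal the decree of the latest such earlier ballot."""
    for b in ballots:
        earlier = [e for e in ballots if e.bal < b.bal and e.vot & b.qrm]
        if earlier and max(earlier, key=lambda e: e.bal).dec != b.dec:
            return False
    return True


# Hypothetical sample data: ballot 2 fails (only A voted), ballot 5 succeeds.
history = [
    Ballot(2, "alpha", frozenset({"A", "B", "C"}), frozenset({"A"})),
    Ballot(5, "alpha", frozenset({"A", "C", "D"}), frozenset({"A", "C", "D"})),
]
# A later ballot proposing a different decree would break B3, because
# C and D already voted for "alpha" in ballot 5.
conflicting = Ballot(9, "beta", frozenset({"B", "C", "D"}), frozenset())
```

Calling `b3(history + [conflicting])` shows why a new ballot cannot freely pick its decree once quorum members have voted before.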
Note: at this point most readers are probably confused, so consider a distributed key-value database in which each key-value pair has multiple replicas. When a client issues an update(key, value) operation, some node initiates a consensus round among the relevant nodes, that is, a ballot $B$, to update the replicas that store the pair. Note that the quorum ($B_{qrm}$) is a majority subset of the nodes holding a replica of this pair, because some of those nodes may be unreachable at any given moment. $\mathcal{B}$ is the series of update operations on one key-value pair, and different decrees are different values for that pair.

In these terms, B1 means each update operation is uniquely identified, so there is only one update per ballot number; B2 means any two update operations must have a participating node in common; B3 means the value written by an update must agree with the latest value previously voted for by any participating node. The reason is that a node that has already voted has confirmed that the value may be changed, while the other priests (nodes) of the quorum may not have confirmed it yet.

Here is why B1-B3 imply consistency.

Lemma 1.1. If B1($\mathcal{B}$), B2($\mathcal{B}$) and B3($\mathcal{B}$) hold, then for any $B$ and $B'$ in $\mathcal{B}$:
$$\big((B_{qrm} \subseteq B_{vot}) \land (B'_{bal} > B_{bal})\big) \Rightarrow (B'_{dec} = B_{dec}).$$
The proof is omitted; interested readers can refer to the original paper.

Theorem 1.2. If B1($\mathcal{B}$), B2($\mathcal{B}$) and B3($\mathcal{B}$) hold, then for any $B$ and $B'$ in $\mathcal{B}$:
$$\big((B_{qrm} \subseteq B_{vot}) \land (B'_{qrm} \subseteq B'_{vot})\big) \Rightarrow (B'_{dec} = B_{dec}).$$
Proof: if $B'_{bal} = B_{bal}$, then $B' = B$ by B1($\mathcal{B}$). If $B'_{bal} \neq B_{bal}$, then one of the two ballot numbers is larger and the other smaller, and the result follows from Lemma 1.1.

Theorem 1.3. Let $b$ be a ballot number and $Q$ a set of priests such that $b > B_{bal}$ and $Q \cap B_{qrm} \neq \emptyset$ for all $B$ in $\mathcal{B}$. Then there is a ballot $B'$ with $B'_{bal} = b$ and $B'_{qrm} = B'_{vot} = Q$ such that, if B1($\mathcal{B}$), B2($\mathcal{B}$) and B3($\mathcal{B}$) hold, then B1($\mathcal{B} \cup \{B'\}$), B2($\mathcal{B} \cup \{B'\}$) and B3($\mathcal{B} \cup \{B'\}$) also hold. The proof is omitted; see the original paper. This theorem concerns adding a new ballot to the set $\mathcal{B}$.
When a new successful ballot is added to the set $\mathcal{B}$, as long as its quorum intersects the quorum of every ballot already in the set, the enlarged set of ballots still satisfies the consistency conditions.

4 Several algorithms

The above shows that a protocol guarantees consistency if it satisfies the constraints B1-B3. The preliminary protocol is derived directly from these constraints. The basic protocol is a restricted version of the preliminary protocol and still guarantees consistency. The complete Synod protocol restricts the basic protocol further so that it meets both the consistency and the progress requirements. The three algorithms are described in turn below.

Preliminary protocol

To satisfy B1, ballot numbers must be unique and drawn from an ordered set. One way is for each priest to use an incrementing number as his ballot number, but then a priest cannot immediately know whether the number he picked has already been used by another priest. Another way is to use a number combined with the priest's name as the ballot number, so that no other priest can ever use the same ballot number.

To satisfy B2, the quorum of every ballot must be a majority set Q, so that any two ballots have a priest in common. What counts as a majority set is a flexible choice. In the paper Lamport uses a weight analogy: heavier priests are more likely to stay in the Chamber, so a majority set can be defined as a set of priests holding more than half of the total weight. How a majority set is defined in a real system depends on the specific circumstances.

To satisfy B3, before initiating a ballot with number $b$, priest $p$ must find MaxVote($b$, $q$, $\mathcal{B}$) for each priest $q$ in the quorum.

From these requirements we obtain the preliminary protocol:

1. Priest p chooses a new ballot number b and sends NextBallot(b) to the other priests.
2. On receiving NextBallot(b), priest q replies to p with LastVote(b, v), where v = MaxVote($b$, $q$, $\mathcal{B}$) is the vote cast by q with the largest ballot number less than b. To preserve B3, q must not vote in any ballot numbered between $v_{bal}$ and b. (If q voted in such a ballot after sending LastVote(b, v), then v would no longer be q's largest vote below b.)
3. After receiving LastVote(b, v) from every priest q in some majority set Q, priest p initiates a new ballot with number b, quorum Q, and a decree d chosen to satisfy B3. He then notes the ballot on the back of his ledger and sends BeginBallot(b, d) to every priest in Q.
4. On receiving BeginBallot(b, d), priest q decides whether to vote for the ballot. If he does, he sends Voted(b, q) to p.
5. If p receives Voted(b, q) from every priest q in Q, he writes decree d in his ledger and sends a Success(d) message to every priest q.
6. On receiving Success(d), priest q writes decree d in his own ledger.

Note: in step 1, the initiating priest announces that he wants the next ballot to be numbered b. By replying with LastVote(b, v), priest q promises p that v = MaxVote($b$, $q$, $\mathcal{B}$) will not change afterwards, that is, q will not vote in any ballot numbered between $v_{bal}$ and b. Step 3 requires decree d to satisfy B3. This can be confusing at first, since in a real system the value is supplied by the client and would not seem to be dictated by B3.
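How MaxVote and B3 together determine the decree in step 3 can be sketched as follows. This is a minimal illustration under stated assumptions: the `Vote` tuple, the `NULL_BAL` sentinel for "has not voted yet" and the function names are hypothetical, and the client's value is only used when no quorum member has voted before.

```python
from typing import Iterable, List, NamedTuple, Optional


class Vote(NamedTuple):
    priest: str
    bal: int                 # ballot number the vote was cast in
    dec: Optional[str]       # decree voted for (None for the null vote)


NULL_BAL = -1                # assumed sentinel: smaller than any real ballot number


def max_vote(b: int, q: str, votes: List[Vote]) -> Vote:
    """Vote cast by q with the largest ballot number below b, or a null vote."""
    candidates = [v for v in votes if v.priest == q and v.bal < b]
    return max(candidates, key=lambda v: v.bal, default=Vote(q, NULL_BAL, None))


def choose_decree(b: int, quorum: Iterable[str], votes: List[Vote],
                  proposed: str) -> str:
    """Pick the decree for ballot b so that B3 holds.

    If any quorum member voted before, reuse the decree of the latest such
    vote; otherwise the initiator is free to use the client's proposal.
    """
    last_votes = [max_vote(b, q, votes) for q in quorum]
    latest = max(last_votes, key=lambda v: v.bal)
    return proposed if latest.bal == NULL_BAL else latest.dec


# Hypothetical vote history: B's vote in ballot 9 is ignored for b = 8.
votes = [Vote("A", 2, "alpha"), Vote("B", 5, "beta"), Vote("B", 9, "gamma")]
decree = choose_decree(8, ["A", "B", "C"], votes, proposed="delta")
```

In this sketch the client's value ("delta") is used only when the quorum reports no earlier votes, which is one way to read the relationship between client-supplied values and B3.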
The key-value database example helps clarify how B3 constrains the value chosen in step 3. Before a node (priest) initiates the first update, $\mathcal{B}$ is the empty set. The update (ballot) then proceeds until every member of the quorum has voted for the decree, that is, the update counts as successful once a majority set of nodes has updated the value of the key-value pair. B3 covers the case where an earlier update did not complete: the new ballot must then carry over the earlier value.

Step 4 allows a priest to send Voted(b, q) not at all or several times, which corresponds to messages being lost or duplicated by the network. Once a priest has voted, he has confirmed that the value may be changed.

Since priest q writes decree d into his ledger only in step 6, it is possible that p writes the decree into his own ledger in step 5 and sends Success(d), but, because of a communication failure or because a priest has left the Chamber, some priest never writes it into his ledger, so the ledgers disagree. In a real system the write should therefore happen in step 4: when q sends his vote to p, he also records the decree in his ledger. If a decree written in step 4 turns out not to pass, later ballots restore consistency. The paper also notes that a decree is considered passed when it is first written into a ledger.

Basic protocol

The preliminary protocol requires each priest to keep (i) every ballot he has initiated, (ii) every vote he has cast, and (iii) every LastVote message he has sent. To reduce the data a priest must keep, we restrict the preliminary protocol to obtain the basic protocol.
First we introduce three new variables:

  • lastTried[p]: the number of the last ballot that priest p tried to initiate
  • prevVote[p]: the most recent vote cast by priest p
  • nextBal[p]: the largest ballot number b that priest p has received and agreed to participate in

In the preliminary protocol a priest may have several ballots in progress at once. The basic protocol requires each priest to conduct only one ballot at a time, the one numbered lastTried[p]; once he initiates a new ballot, information about his earlier ballots no longer matters. The preliminary protocol forbids a priest from voting in ballots numbered between $v_{bal}$ and b; the basic protocol makes the stricter requirement that he not vote in any ballot numbered less than b.

The basic protocol can then be summarized in the following steps:

1. Priest p chooses a ballot number b greater than lastTried[p], sets lastTried[p] to b, and sends NextBallot(b) to the other priests.
2. On receiving NextBallot(b) with b > nextBal[q], priest q sets nextBal[q] = b and then sends LastVote(b, v) to p, where v = prevVote[q]. (If b is less than or equal to nextBal[q], q does not reply.)
3. After receiving LastVote(b, v) from every priest in some majority set Q, priest p initiates a ballot with number b, quorum Q and decree d (chosen to satisfy B3), and sends BeginBallot(b, d) to every priest in Q. (If replies do not arrive from every priest of any majority set, p returns to step 1.)
4. On receiving BeginBallot(b, d) with b = nextBal[q], priest q casts his vote, sets prevVote[q] to this vote, and sends Voted(b, q) to p. (If b is not equal to nextBal[q], q ignores the message; it means that q has meanwhile received a ballot with a larger number.)
5. When p has received Voted(b, q) from every priest q in the quorum Q, with b = lastTried[p], the ballot has succeeded. He writes decree d in his ledger and sends Success(d) to every priest in Q.
6. On receiving Success(d), each priest q writes the decree in his ledger.

The basic protocol is a restricted version of the preliminary protocol. Neither places any requirement on what a priest must do, so neither guarantees progress. The protocol that does guarantee progress is the complete Synod protocol.

Complete Synod protocol

The basic protocol guarantees consistency but not progress, because it only states what a priest may do, not what he must do. To meet the progress requirement stated earlier, we add requirements that make the priests carry out steps 2-6 as soon as possible.

Consider step 2: if priest q receives a ballot number b larger than any he has received before, he abandons all earlier ballots. But before the ballot numbered b completes, he may receive an even larger number b', and so on, so that no decree is ever passed and progress is not guaranteed. To achieve progress, new ballots must therefore keep being initiated until one succeeds, but not so frequently that they keep pre-empting one another.

This first requires a bound on how long a messenger takes to deliver a message and how long a priest takes to process it. In a networked system this is usually realized with timeouts: if a priest gets no response within a certain time, he assumes that the messenger or the corresponding priest has left the Chamber. Assuming a priest performs an action within 7 minutes and a messenger delivers a message within 4 minutes, a priest p who sends a message to priest q can expect the reply within 22 minutes (7 + 4 + 7 + 4 minutes).
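Stepping back to the six steps of the basic protocol listed above, the following single-process sketch shows how lastTried, prevVote and nextBal interact. It is a simplified illustration, not Lamport's formulation: messengers are replaced by direct method calls, the initiating priest also acts as a voter, the quorum is whichever priests answered step 2, and the `present` flag and the -1 "no vote yet" sentinel are assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Vote = Tuple[int, Optional[str]]          # (ballot number, decree)
NO_VOTE: Vote = (-1, None)                # assumed sentinel for "never voted"


@dataclass
class Priest:
    name: str
    last_tried: int = -1                  # lastTried[p]
    prev_vote: Vote = NO_VOTE             # prevVote[p]
    next_bal: int = -1                    # nextBal[p]
    ledger: List[str] = field(default_factory=list)
    present: bool = True                  # False models a priest outside the Chamber

    def on_next_ballot(self, b: int) -> Optional[Vote]:
        """Step 2: promise ballot b and report the last vote, or stay silent."""
        if not self.present or b <= self.next_bal:
            return None
        self.next_bal = b
        return self.prev_vote

    def on_begin_ballot(self, b: int, d: str) -> bool:
        """Step 4: vote only if b is still the ballot this priest promised."""
        if not self.present or b != self.next_bal:
            return False
        self.prev_vote = (b, d)
        return True

    def on_success(self, d: str) -> None:
        """Step 6: record the passed decree."""
        if self.present and d not in self.ledger:
            self.ledger.append(d)


def run_ballot(p: Priest, others: List[Priest], b: int, proposed: str) -> Optional[str]:
    """Priest p conducts ballot b; returns the passed decree or None."""
    everyone = [p] + others
    majority = len(everyone) // 2 + 1
    if b <= p.last_tried:
        raise ValueError("ballot number must exceed lastTried[p]")
    p.last_tried = b                                      # step 1
    replies = {}
    for q in everyone:                                    # steps 1-2
        v = q.on_next_ballot(b)
        if v is not None:
            replies[q.name] = v
    if len(replies) < majority:
        return None                                       # no majority set answered
    quorum = [q for q in everyone if q.name in replies]   # step 3
    latest = max(replies.values(), key=lambda v: v[0])
    d = proposed if latest[0] == -1 else latest[1]        # decree chosen per B3
    voted = [q for q in quorum if q.on_begin_ballot(b, d)]  # step 4
    if len(voted) < len(quorum) or b != p.last_tried:
        return None                                       # step 5 condition not met
    for q in everyone:                                    # steps 5-6
        q.on_success(d)
    return d
```

With three priests, a first call such as `run_ballot(a, [b, c], 1, "x")` passes "x"; a later ballot with a larger number and a different proposal still yields "x", because the LastVote replies force the earlier decree to be reused.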
Given the timing assumptions above (7 minutes per action, 4 minutes per delivery), a priest p who has initiated a ballot expects to receive the replies of steps 2 and 4 within 22 minutes. If they do not arrive, either some priests or messengers have left the Chamber, or another priest has initiated a ballot with a larger number. In both cases p should abandon the ballot and start a new one. So that the new ballot is not pre-empted in the same way, p first learns the latest ballot numbers from the other priests and then picks a larger number for the new ballot.

Assuming that priest p is the only priest initiating ballots and that a majority of the priests are in the Chamber, a decree is guaranteed to pass within 99 minutes: 22 minutes to discover that a ballot with a larger number exists, 22 minutes to learn the largest number and choose a larger one, and 55 minutes to complete steps 1-6 of a successful ballot.

(Question: since only priest p can initiate ballots, the numbers are under his control, so finding and choosing a larger number in the first two phases seems unnecessary. Answer: not every ballot was initiated by the president; other priests may have initiated ballots earlier, and priests who wish to propose a decree obtain their ballot numbers from the president.)

From the process above we see that the complete Synod protocol needs a procedure for electing a president. The president election algorithm is not the focus of the paper, so the paper simply assumes that electing a president takes T minutes, which means a decree can be passed within T + 99 minutes.
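The 22, 55 and 99 minute figures can be reproduced with simple arithmetic. The sketch below assumes one reading of the numbers: every message costs one priest action plus one delivery (7 + 4 = 11 minutes), a request and its reply cost two such hops, and steps 1-6 involve five messages (NextBallot, LastVote, BeginBallot, Voted, Success). The function and parameter names are hypothetical.

```python
def timing_bounds(action_min: int = 7, delivery_min: int = 4) -> dict:
    """Worst-case times, in minutes, under the stated timing assumptions."""
    hop = action_min + delivery_min       # one action followed by one delivery
    round_trip = 2 * hop                  # request plus reply: 7 + 4 + 7 + 4
    full_ballot = 5 * hop                 # five messages in steps 1-6 (assumed reading)
    # Discover a larger ballot number, learn the largest one, then run a ballot.
    decree_bound = round_trip + round_trip + full_ballot
    return {
        "round_trip": round_trip,
        "full_ballot": full_ballot,
        "decree_bound": decree_bound,
    }


bounds = timing_bounds()
```

Adding the president-election time T to `bounds["decree_bound"]` gives the T + 99 minute figure quoted above.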
The paper's method of selecting a president is by name: each priest sends his name to all the priests in the Chamber, and a priest who, within T - 11 minutes, receives no name that comes later in the alphabet than his own considers himself president. (Broadcasting weight might work just as well, since heavier priests are said to stay in the Chamber longer.) There is one more detail: during the president election every priest p also sends lastTried[p] to the other priests, so that the president can choose a sufficiently large number for his first ballot. With an elected president and timeouts, the complete Synod protocol can guarantee progress.

Multi-decree parliament

In the complete Synod protocol of the previous section, a president is elected, every priest who wishes to propose a decree informs him, the president assigns a ballot number, and only one decree is passed at a time. The multi-decree parliament (The Multi-Decree Parliament) selects a single president for a whole series of decrees and needs to execute the first two steps only once. Concretely, in step 1 the president sends NextBallot(b, n) instead of NextBallot(b), indicating that he intends to use ballot number b for all decrees numbered above n, where the decrees numbered up to n are already recorded contiguously in his ledger, with b > n. On receiving the message, priest q returns to the president every decree numbered greater than $n$ that already appears in his ledger; for decree numbers not yet in his ledger he returns the usual LastVote information.

Next we look at properties of the multi-decree parliament, starting with the ordering of decrees. Ballots for different decree numbers are conducted concurrently, each initiated by a priest who considers himself president, so without knowing how the president is elected we cannot say exactly in what order decrees are passed.
A decree is said to be proposed when it is chosen in step 3 of the complete Synod protocol, and passed when it is first written into a ledger. Before a president can propose a new decree, he must learn from every priest in a majority set which decrees they have voted for. Every decree that has passed was voted for by at least one priest in any majority set, so the president always learns all previously passed decrees before initiating a new ballot. The president does not fill gaps in the decree numbers with important decrees, and he does not propose decrees out of order, so the protocol satisfies the decree-ordering property: if decrees A and B are both important and A is passed before B is proposed, then A has a lower decree number than B.

The second property is that once a president has been elected and no one enters or leaves the Chamber, a decree is passed by the following steps (corresponding to steps 3-5 of the complete Synod protocol):

3. The president sends BeginBallot to each priest in a quorum.
4. Each priest in the quorum sends a Voted message to the president.
5. The president sends a Success message to each priest.

This requires only three message passes per decree. By combining BeginBallot and Success messages (for example, sending the Success of one decree together with the BeginBallot of the next), the number of messages can be reduced further.

Preliminary
adj. preliminary, initial, preparatory

Synod
n. general assembly; religious council; church court

Ballot
n. vote; voting paper; total votes cast
vi. to vote; to decide by drawing lots
vt. to put to a vote

Tried
adj. reliable; tested

President
n. president; chairman; principal

Decree
n. decree; verdict

Parliament
n. parliament; congress

OceanBase introduced the Paxos protocol: after a transaction finishes executing on the primary replica, it must be synchronized to more than half of the replicas (counting the primary itself), for example 2 of 3 replicas or 3 of 5 replicas, before it is considered successful. This way, the failure of a minority of replicas (for example 1 of 3, or 2 of 5) does not affect the business. SQL Server AlwaysOn is also said to use the Paxos protocol.
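The "more than half" rule can be checked with a few lines of arithmetic. The sketch below is illustrative only and does not describe OceanBase's actual implementation; the function names are hypothetical. It also brute-forces the fact that any two majorities share at least one replica, which is the practical form of condition B2.

```python
from itertools import combinations


def majority(n: int) -> int:
    """Smallest number of replicas that is more than half of n."""
    return n // 2 + 1


def tolerated_failures(n: int) -> int:
    """Replicas that may be down while a majority can still be formed."""
    return n - majority(n)


def majorities_always_intersect(n: int) -> bool:
    """Check by enumeration that any two minimal majorities share a replica."""
    quorums = [set(c) for c in combinations(range(n), majority(n))]
    return all(a & b for a in quorums for b in quorums)


for replicas in (3, 5):
    print(replicas, majority(replicas), tolerated_failures(replicas),
          majorities_always_intersect(replicas))
```

For 3 replicas this gives a majority of 2 with 1 tolerated failure, and for 5 replicas a majority of 3 with 2 tolerated failures, matching the figures in the paragraph above.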
