About distributed transactions


A few days ago I read a blog post about optimistic and pessimistic locks, including a discussion and some examples from a real project. It sparked my desire to write a post of my own. Since optimistic and pessimistic locking is about concurrency control, I decided to write about distributed transactions, which I personally see as follow-up reading to optimistic and pessimistic locks.

Optimistic and pessimistic locks are mostly concurrency control solutions for reads and writes against a single database, but distributed scenarios now take up more and more of our work. So what does concurrency control look like in a distributed setting? That is where distributed transactions come in, and I believe a good grasp of them pays off many times over in day-to-day work.

Before diving into the methods themselves, let's look at how the techniques behind distributed transactions evolved.

Data mirroring

If we have a system that provides a data service, how should we deploy it so that the data service stays highly available?

Typically we deploy two copies of the data, so that if one system goes down the other can still serve requests; that is how high availability of the data is achieved.

However, using data replicas (for a highly available data service, replicas are essentially the only option) introduces the "data consistency" problem, which is in fact the thorniest problem in distributed transactions. Solving data consistency in turn creates performance problems, so balancing consistency against performance is a key issue when we design distributed business systems.

Data consistency comes in several levels:

1) Weak consistency: after a new value is written, a read against a replica may or may not return it.

2) Eventual consistency: after a new value is written, a read may not see it immediately, but it is guaranteed to become visible after some time window.

3) Strong consistency: once new data is written, any read on any replica returns the new value. A tiny illustration of the difference follows.
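To make the difference concrete, here is a minimal, purely illustrative Python sketch. It fakes a primary and a replica with two in-memory dicts and a pending-replication queue; none of this corresponds to a real storage engine, it only shows when a newly written value becomes visible on the copy.

    # Two in-memory "copies" of the data and a queue of writes not yet replicated.
    primary = {}
    replica = {}
    pending = []

    def write(key, value):
        # The write is acknowledged as soon as the primary has it.
        primary[key] = value
        pending.append((key, value))

    def sync_replica():
        # Periodic replication: after this runs, the replica has caught up.
        while pending:
            k, v = pending.pop(0)
            replica[k] = v

    write("x", 1)
    print(replica.get("x"))  # None -> weak/eventual consistency: the replica may lag
    sync_replica()
    print(replica.get("x"))  # 1    -> eventual consistency: visible after the sync window
    # Strong consistency would require write() to block until both copies are updated.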

To solve these problems, clever people came up with many approaches, each of which marked a milestone in thinking about the problem.

Master-slave model

This is a model we use all the time when deploying databases: a main (master) library and a standby (slave) library. The slave is the master's backup: the master serves the data, and when it fails the slave immediately steps in as the data provider until the master recovers. It is a classic model. So how does data consistency work in this model?

Data synchronization between master and slave happens either by the master pushing data to the slave or by the slave pulling data from the master. Either way it is usually periodic, and anything periodic can only give eventual consistency. Consider the write path: the master writes first; if the master write succeeds, the slave writes; if the slave write also succeeds, the transaction commits; if the slave write fails, the master is told to roll back and the write is reported as failed. A minimal sketch of this write path follows.
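This is a small sketch of that write path, assuming hypothetical write()/rollback() methods on the two nodes; real master-slave replication (binlog shipping and so on) is of course far more involved.

    class Node:
        """A stand-in for a database node that can be forced to fail."""
        def __init__(self, fail=False):
            self.data = {}
            self.fail = fail

        def write(self, key, value):
            if self.fail:
                return False
            self.data[key] = value
            return True

        def rollback(self, key):
            self.data.pop(key, None)

    def replicated_write(master, slave, key, value):
        if not master.write(key, value):
            return False              # master write failed, nothing to undo
        if slave.write(key, value):
            return True               # both copies updated: commit
        master.rollback(key)          # slave failed: undo the master write
        return False                  # report the write as failed

    print(replicated_write(Node(), Node(), "x", 1))            # True
    print(replicated_write(Node(), Node(fail=True), "x", 1))   # False, master rolled back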

There is an ugly scenario: the master goes down and the slave takes over, but the slave can only serve reads until the master recovers, so writes have to wait. If instead the slave is allowed to serve writes while the master is down, then after the master recovers its data and the slave's data are inconsistent, and you have to synchronize the slave's data back to the master, which is painful just to think about.
So master-slave is not a perfect solution. Is there anything better?

2PC protocol (two phase commit)

Wikipedia defines the 2PC protocol:

A two-phase commit is an algorithm, in the fields of computer networking and databases, designed to keep all nodes in a distributed system consistent when committing a transaction. Two-phase commit is also commonly referred to as a protocol. In a distributed system, each node knows whether its own operation succeeded or failed, but it cannot know whether the operations on the other nodes succeeded or failed. When a transaction spans multiple nodes, preserving the ACID properties of the transaction requires introducing a component that acts as a coordinator, which collects the outcomes from all nodes (called participants) and ultimately tells them whether to actually commit the result of the operation (for example, writing the updated data to disk). The two-phase commit algorithm can therefore be summarized as: the participants notify the coordinator of the success or failure of their operations, and the coordinator, based on the feedback from all participants, decides whether the participants should commit or abort.

Two-phase commit, as the name says, consists of two phases: the vote phase and the decision phase.

Both phases involve one important role: the coordinator.

Vote phase:

    • The coordinator asks each node whether it can commit the data.
    • Each node begins its preparation work for the commit.
    • Each node responds to the coordinator: success if the preparation went fine, failure if it did not.

Decision phase:

    • If every node responds OK, the coordinator notifies all nodes to perform the commit. Each node returns "commit complete" to the coordinator once its commit finishes, and the whole transaction ends when the coordinator has received "commit complete" from every node.
    • If one or more nodes respond that preparation failed, the coordinator notifies all nodes to roll back. Each node undoes its preparation and responds "rollback complete", and the whole transaction ends when the coordinator has received "rollback complete" from every node. A minimal sketch of the coordinator logic follows this list.
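Here is a minimal sketch of the two phases from the coordinator's point of view. The participants are assumed to expose hypothetical prepare()/commit()/rollback() calls; a real coordinator also has to log its decision durably and deal with the timeouts discussed below.

    def two_phase_commit(participants):
        # Phase 1 (vote): ask every node to prepare and collect the votes.
        votes = [p.prepare() for p in participants]

        # Phase 2 (decision): commit only if every single vote was "yes".
        if all(votes):
            for p in participants:
                p.commit()
            return "committed"
        # Any "no" vote (or failure to prepare) aborts the whole transaction.
        for p in participants:
            p.rollback()
        return "aborted"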

The 2PC protocol looks like a fine design for handling distributed transactions: to avoid the consistency problems caused by a failed commit on some node, it splits the work into a vote and a decision. But the protocol still has problems:
Timeout problem

Vote phase:

    • The coordinator's "can you commit?" message to a node may be late or lost: the node may receive it late, prepare too late, and reply "ready to commit" too late as well. The coordinator therefore needs a timeout mechanism, for example treating anything past a threshold as a failure, so the rest of the protocol is not held up.
    • A node's response to the coordinator may also time out, which the same coordinator-side timeout has to cover. A small sketch of this timeout follows.
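Below is a minimal sketch of that coordinator-side timeout, using threads to stand in for remote calls: each prepare call gets a fixed time budget, and a vote that does not arrive in time is simply counted as a "no", so the transaction aborts instead of hanging. The prepare functions here are made-up placeholders.

    from concurrent.futures import ThreadPoolExecutor, TimeoutError
    import time

    def fast_prepare():
        return True                       # this participant answers in time

    def slow_prepare():
        time.sleep(5)                     # this participant answers far too late
        return True

    def vote_with_timeout(prepare_calls, timeout=1.0):
        with ThreadPoolExecutor() as pool:
            futures = [pool.submit(call) for call in prepare_calls]
            votes = []
            for f in futures:
                try:
                    votes.append(f.result(timeout=timeout))
                except TimeoutError:
                    votes.append(False)   # a missing vote counts as "abort"
        return all(votes)

    print(vote_with_timeout([fast_prepare, slow_prepare]))  # False -> abort, don't hang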

Decision phase:

    • Once the vote phase is done, the coordinator tells every node to commit or to roll back. A node that never receives that message is stuck: it does not know what to do next, it does not know what the other nodes are doing, and it sits in an unknown state. Even a node-side timeout does not help, because when the timer fires the node still cannot tell whether it should commit or roll back. This is the most painful problem in 2PC.

To solve these problems, especially the "stuck in an unknown state" case, the 3PC protocol (three-phase commit) came along later, but 3PC only solves part of the problem.

3PC protocol (three-phase commit)

In 2PC, once the vote phase completes, a node that never receives the decision in the second phase ends up in an unknown state, which blocks the whole transaction. The 3PC protocol therefore adds one more phase: before any node does its preparation work, the coordinator first asks whether everyone agrees to commit.

    1. The coordinator asks each node whether it agrees to commit the data.
    2. The node receives the question and replies agree or disagree, without doing any preparation work yet.
    3. Only after the coordinator has received agreement from every node does it send the "prepare to commit" message; if any node disagrees, it aborts directly.
    4. After receiving the "prepare to commit" message, the node performs the pre-commit work and returns success to the coordinator when it finishes, or failure if the work fails.
    5. When the coordinator has received success from every node it commits and ends the whole transaction; if any node returns failure, it notifies all nodes to roll back (which is an extremely complicated process). A minimal sketch of this flow follows.
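A minimal sketch of the three phases from the coordinator's side, again assuming hypothetical can_commit()/pre_commit()/do_commit()/abort() methods on the participants. The genuinely hard parts of 3PC, participant-side timeouts and recovery after a coordinator failure, are left out.

    def three_phase_commit(participants):
        # Phase 1 (CanCommit): a cheap question, nothing has been prepared yet.
        if not all(p.can_commit() for p in participants):
            for p in participants:
                p.abort()
            return "aborted"

        # Phase 2 (PreCommit): every node now does its actual preparation work.
        if not all(p.pre_commit() for p in participants):
            for p in participants:
                p.abort()
            return "aborted"

        # Phase 3 (DoCommit): the real commit.
        for p in participants:
            p.do_commit()
        return "committed"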

[Figure: state diagram of the 3PC coordinator and node]

Whether it is 2PC or 3PC, both are flawed solutions, and implementing 3PC is actually very complex, because the design has to handle every kind of failure that can occur along the way.

In fact, the network is an inherently unreliable environment, so trying to establish agreement and coordinate work over an unreliable channel is a huge challenge. The unreliability of the network forces you to handle communication failures and timeouts, which is exactly where the pain above comes from. The way out is to design a solution that tolerates this unreliable environment as much as possible, or that reduces the impact of the unreliability to an acceptable level.

Fortunately, there are always more solutions than setbacks: the Paxos algorithm comes to the rescue!

Paxos algorithm

Given the unreliable environment above, is there a better approach, one that tolerates this environment or minimizes its impact, and still reaches agreement on an operation or a value across the distributed system? The answer is yes: the Paxos algorithm can do it.

Wikipedia on the Paxos algorithm:

The problem the Paxos algorithm solves is how a distributed system in which the above anomalies can occur agrees on a value, ensuring that no matter which anomaly happens, the consistency of the decision is never broken. A typical scenario: in a distributed database system, if the nodes start from the same initial state and execute the same sequence of operations, they will end up in the same state. To make sure every node executes the same sequence of commands, a "consensus algorithm" is run for every instruction so that every node sees the same instructions. A general consensus algorithm can be applied in many scenarios and is an important problem in distributed computing, which is why research on consensus algorithms has never stopped since the 1980s.

The Paxos algorithm is essentially a democratic election: there is no coordinator role, the nodes "vote" among themselves, and an operation is carried out only if it wins more than half of the votes. The concrete flow is as follows:

Suppose there are three nodes: A1, A2 and A3.
1. A1 wants to change a value. Before performing the operation, A1 sends a message to A1, A2 and A3 asking them to vote (the message has two parts: the content of the requested operation and a Sequence, where the Sequence is a unique, self-incrementing number maintained by Paxos).
2. A node that receives this message first compares the Sequence with the sequences it has already received from other nodes; if this one is the largest, it responds OK to A1.
3. If A1 receives OK from more than half of the nodes, it sends the accept request to all nodes; otherwise it does not.
4. When the other nodes receive the accept request, they again compare the sequence first: if it is still the largest they perform the operation, and if it is not they reject it.

The Sequence above is built from the time and a weight of the requesting node; using time plus the requester's weight as the request number is quite meaningful. A minimal sketch of this prepare/accept exchange follows.
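Here is a minimal, single-round sketch of that exchange. It is deliberately simplified: acceptors only remember the highest sequence number they have promised, and the promise does not carry back any previously accepted value, which full Paxos requires. All names are illustrative.

    class Acceptor:
        def __init__(self):
            self.promised = -1     # highest sequence number promised so far
            self.accepted = None   # (sequence, value) finally accepted, if any

        def prepare(self, seq):
            # Promise only if this is the largest sequence number seen so far.
            if seq > self.promised:
                self.promised = seq
                return True
            return False

        def accept(self, seq, value):
            # Accept only if no higher-numbered prepare arrived in the meantime.
            if seq >= self.promised:
                self.promised = seq
                self.accepted = (seq, value)
                return True
            return False

    def propose(acceptors, seq, value):
        # Phase 1: ask everyone to promise; a majority of OKs is required.
        if sum(a.prepare(seq) for a in acceptors) <= len(acceptors) // 2:
            return False
        # Phase 2: send the accept request and again require a majority.
        return sum(a.accept(seq, value) for a in acceptors) > len(acceptors) // 2

    nodes = [Acceptor() for _ in range(3)]   # A1, A2, A3 from the example above
    print(propose(nodes, seq=1, value="X"))  # True: a majority promised and accepted
    print(propose(nodes, seq=0, value="Y"))  # False: an older sequence number is rejected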

That description may sound abstract; the examples below should make it clear right away:

Example 1: there are five nodes: A1, A2, A3, A4, A5. A1 wants to change a value to X and sends the request to all nodes; nothing abnormal happens. [Figure: timing diagram for this case]

Example 2: an abnormal case. There are three nodes, A1, A3 and A5. A1 and A5 request a change to the value at the same time, but A5's request takes a long time to arrive, so A3 only receives it after A1's proposal has already gone through. What does A3 do? [Figure: timing diagram for this case]

Example 3: another abnormal case. There are three nodes, A1, A3 and A5. A1 and A5 request a change at the same time, but A5's request is only a little slower: A3 has already replied OK to A1 when A5's request arrives. What does A3 do then? [Figure: timing diagram for this case]

There are many more abnormal cases that you can work through yourself, and you will find that as long as the "more than half" principle is followed, things basically work out. This is why ZooKeeper uses a Paxos-style algorithm.

Mike Burrows, the author of Google Chubby, said that there is only one consensus algorithm in the world, Paxos, and that all other algorithms are flawed versions of it.

Summary

This is my personal understanding of how distributed systems handle transactions. We may know each of these points individually, but stringing them together is what builds deeper understanding. There is a lot more to distributed transactions, such as Vector Clocks and the NWR model, which I did not have time to cover here; I will share them in later study notes.

References

2PC (Wikipedia): http://zh.wikipedia.org/wiki/two phase commit

Paxos (Wikipedia): http://zh.wikipedia.org/zh/Paxos algorithm

Paxos Made Simple: http://research.microsoft.com/en-us/um/people/lamport/pubs/paxos-simple.pdf
