Distributed Transactions (DTS): What You Need to Know

Source: Internet
Author: User

Most systems in cloud computing, big data, and the Internet industry have now adopted SOA or microservice architectures. A business operation that spans an end-to-end call chain is often completed by several services and database instances working together. In business scenarios with strict consistency requirements, keeping data consistent across those services is therefore a key concern of RPC calls.

I. Characteristics of distributed, SOA, and microservice architectures

In a large distributed system, consistency, availability, and partition tolerance cannot all be satisfied at the same time. In most cases only two of the three can be met, and the system settles for eventual consistency (the BASE theory).

(1) CAP properties:

A. Consistency: the same data is identical on all nodes of the distributed system.

B. Availability: every active node in the distributed system can handle operations and respond to queries.

C. Partition tolerance: if a network failure leaves a subset of nodes unable to communicate, the system still works.

(2) ACID properties:

A. Atomicity

All operations in a transaction either complete or do not happen at all; a transaction never stops at an intermediate step. If an error occurs during execution, the transaction is rolled back to the state before it began, as if it had never been executed.
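
As a minimal sketch of atomicity, the following Python example uses the standard sqlite3 module. The `account` table, the starting balances, and the function names are assumptions made for illustration; they do not come from any particular system. Both updates are applied first, and a failed balance check then rolls both of them back together.

```python
import sqlite3


def open_bank():
    """Create an in-memory database with a hypothetical `account` table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE account (id TEXT PRIMARY KEY, balance INTEGER);
        INSERT INTO account VALUES ('A', 500), ('B', 0);
    """)
    return conn


def balance(conn, account_id):
    """Return the current balance of one account."""
    row = conn.execute(
        "SELECT balance FROM account WHERE id = ?", (account_id,)
    ).fetchone()
    return row[0]


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst; return True if committed, False if rolled back."""
    try:
        with conn:  # commits if the block succeeds, rolls back if it raises
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE id = ?", (amount, src)
            )
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE id = ?", (amount, dst)
            )
            if balance(conn, src) < 0:
                # Both updates above are undone by the rollback.
                raise ValueError("insufficient funds")
        return True
    except ValueError:
        return False
```

After a failed `transfer`, both balances are exactly what they were before the call, which is the "as if it had never been executed" behavior described above.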

B. Consistency

Transactional consistency means that the database must be in a consistent state both before and after a transaction executes. If the transaction completes successfully, all of its changes are applied correctly and the system is in a valid state. If an error occurs in the transaction, all of its changes are automatically rolled back and the system returns to its original state.

C. Isolation

In a concurrent environment, when different transactions operate on the same data at the same time, each transaction has its own complete data space. Modifications made by one concurrent transaction must be isolated from those made by any other. A transaction that reads data sees it either in the state before another transaction modified it or in the state after that modification, never in an intermediate state.

D. Durability

Once a transaction completes successfully, the updates it made to the database must be persisted. Even if the system crashes, the database can be restored to the state at the end of the transaction when it restarts.

II. A basic introduction to distributed transactions

Distributed Transaction Service (DTS) is a distributed transaction framework used to ensure the eventual consistency of end-to-end business operations in large distributed and microservice environments.

The CAP theorem states that a large distributed system or set of microservices can guarantee only two of the three properties: consistency, availability, and partition tolerance. Because packet loss and network failures are common in distributed systems, partition tolerance has to be satisfied. To preserve high availability as well, most systems relax the strong consistency requirement to eventual consistency and rely on idempotent mechanisms to make the data eventually consistent.

III. Commonly used distributed transaction techniques

(1) Local message table (the classic eBay pattern)

The core idea of this scheme is that a distributed system handles a task asynchronously by way of a message log. The message log can be stored in a local text file, a database, or a message queue, and the message is then retried either automatically by a scheduled task driven by business rules or manually. Take a cross-bank transfer in an online payment system as an example:

Step one, shown in the pseudo-code below: deduct 1000 yuan from the account of user A and, in the same local transaction, write a transaction message (local transaction ID, paying account, receiving account, amount, status, and so on) into the message table:

BEGIN TRANSACTION;

         -- Debit the paying account.
         UPDATE user_account SET amount = amount - 1000 WHERE userId = 'A';

         -- Record the transfer message in the same local transaction.
         INSERT INTO trans_message (xid, payaccount, recaccount, amount, status) VALUES (UUID(), 'A', 'B', 1000, 1);

COMMIT;

Step two: notify the other side so that 1000 yuan is added to the account of user B. This is usually done by sending an asynchronous message through a message queue (MQ); the receiving side subscribes to and listens for the message, which automatically triggers the credit operation. To keep the operation idempotent and prevent duplicate transfers, the side that executes the credit needs an additional trans_recv_log table. When the second step receives the message, it checks trans_recv_log to see whether the message has already been processed. If not, it increases the balance of account B by 1000 yuan and adds a record to trans_recv_log. After the operation finishes, a callback updates the status value in trans_message.
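
The two steps can be sketched in Python with the standard sqlite3 module. This is an illustration under stated assumptions rather than a production design: two in-memory databases stand in for the paying and receiving banks, `relay_pending` stands in for the MQ delivery plus the scheduled retry task, and the `DONE = 2` status code and all function names are hypothetical.

```python
import sqlite3
import uuid

PENDING, DONE = 1, 2  # status 1 follows the pseudo-code above; DONE = 2 is an assumption


def open_payer_db():
    """Paying side: account table plus the local message table."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE user_account (userId TEXT PRIMARY KEY, amount INTEGER);
        CREATE TABLE trans_message (xid TEXT PRIMARY KEY, payaccount TEXT,
            recaccount TEXT, amount INTEGER, status INTEGER);
        INSERT INTO user_account VALUES ('A', 5000);
    """)
    return db


def open_payee_db():
    """Receiving side: account table plus the idempotency log."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE user_account (userId TEXT PRIMARY KEY, amount INTEGER);
        CREATE TABLE trans_recv_log (xid TEXT PRIMARY KEY);
        INSERT INTO user_account VALUES ('B', 0);
    """)
    return db


def balance(db, user):
    return db.execute(
        "SELECT amount FROM user_account WHERE userId = ?", (user,)
    ).fetchone()[0]


def debit(payer_db, payer, payee, amount):
    """Step one: the debit and the message row commit in one local transaction."""
    xid = str(uuid.uuid4())
    with payer_db:  # commit on success, roll back on exception
        payer_db.execute(
            "UPDATE user_account SET amount = amount - ? WHERE userId = ?", (amount, payer)
        )
        payer_db.execute(
            "INSERT INTO trans_message VALUES (?, ?, ?, ?, ?)",
            (xid, payer, payee, amount, PENDING),
        )
    return xid


def credit_once(payee_db, xid, payee, amount):
    """Step two: credit the payee only if this xid is not yet in trans_recv_log."""
    with payee_db:
        seen = payee_db.execute(
            "SELECT 1 FROM trans_recv_log WHERE xid = ?", (xid,)
        ).fetchone()
        if seen:
            return False  # duplicate delivery: nothing to do
        payee_db.execute(
            "UPDATE user_account SET amount = amount + ? WHERE userId = ?", (amount, payee)
        )
        payee_db.execute("INSERT INTO trans_recv_log VALUES (?)", (xid,))
    return True


def relay_pending(payer_db, payee_db):
    """Deliver every pending message, then mark it done (the callback step)."""
    rows = payer_db.execute(
        "SELECT xid, recaccount, amount FROM trans_message WHERE status = ?", (PENDING,)
    ).fetchall()
    for xid, payee, amount in rows:
        credit_once(payee_db, xid, payee, amount)
        with payer_db:
            payer_db.execute(
                "UPDATE trans_message SET status = ? WHERE xid = ?", (DONE, xid)
            )
    return len(rows)
```

Because the credit and the trans_recv_log record are written in the same local transaction, delivering the same xid a second time leaves the balance of B unchanged.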

(2) Message middleware

A. Non-transactional message middleware

Staying with the cross-bank transfer above, it is hard to guarantee that posting the message to MQ succeeds after the debit has completed, so this kind of consistency seems difficult to ensure. The following pseudo-code shows how exceptions during message delivery are handled:

try {
    // Throws an exception if the database update fails.
    boolean result = dao.update(model);
    if (result) {
        // Throws an exception if MQ times out or the receiver fails to process the message.
        mq.send(model);
    }
} catch (Exception ex) {
    // Roll back the database update on any exception.
    rollback();
}

The code above can end in one of three ways:

1. The database operation succeeds and the message is posted to MQ successfully. This is the normal case and everything is fine.
2. The database operation fails, and no message is posted to MQ.
3. The database operation succeeds but posting the message to MQ throws an exception, so the database update that was just executed is rolled back.

This analysis shows that the reliability of message sending is largely ensured. Now consider the consumer side:

1. After the receiver takes a message off the queue, the corresponding business operation must succeed. If the business operation fails, the message must not be discarded or lost. Message consumption has to stay consistent with the business operation.
2. Keep message handling idempotent wherever possible, so that a message delivered more than once has no further effect on the business.
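
The two consumer-side requirements can be sketched in Python as follows. This is a simplified in-memory model: the `Consumer` class, the `drain` helper, and the `max_attempts` limit are hypothetical names introduced for illustration and are not the API of any real message queue. A message is acknowledged only after the business handler succeeds, and a set of processed message IDs makes duplicate deliveries harmless.

```python
from collections import deque


class Consumer:
    """Acknowledge a message only after the business handler has succeeded."""

    def __init__(self, handler):
        self.handler = handler
        self.processed_ids = set()  # assumption: a real system would persist this

    def on_message(self, msg_id, payload):
        """Return True to acknowledge the message, False to request redelivery."""
        if msg_id in self.processed_ids:
            return True  # duplicate: acknowledge without re-running the business logic
        try:
            self.handler(payload)
        except Exception:
            return False  # business failed: no ack, so the message is not lost
        self.processed_ids.add(msg_id)
        return True


def drain(queue, consumer, max_attempts=5):
    """Deliver queued (msg_id, payload, attempts) tuples until acked or exhausted.

    Returns the IDs of messages that were never processed successfully.
    """
    dead = []
    while queue:
        msg_id, payload, attempts = queue.popleft()
        if consumer.on_message(msg_id, payload):
            continue
        if attempts + 1 < max_attempts:
            queue.append((msg_id, payload, attempts + 1))  # redeliver later
        else:
            dead.append(msg_id)  # give up and leave it for manual handling
    return dead
```

In a real system the processed-ID record and the business update would usually be written in the same database transaction, so that a crash between the two cannot cause a double credit.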

B. Message middleware with transaction support

Apache RocketMQ, an open-source message middleware, supports a transactional message mechanism that keeps the asynchronous sending of a message consistent with the result of the local transaction.

In the first phase, RocketMQ sends a prepared message before the local transaction executes and gets back the address of that message.

In the second phase, the local transaction is executed.

In the third phase, a confirmation message is sent: the message is located through the address obtained in the first phase and its state is modified. If the local transaction succeeded, the state is changed to committed; otherwise it is changed to rolled back.

If the confirmation message in the third phase fails to be sent, a scheduled task in RocketMQ scans the transactional messages in the cluster. When it finds a message still in the prepared state, it asks the message sender whether the local transaction executed successfully. RocketMQ then decides whether to roll the message back or continue with the confirmation, based on the policy set by the sender. This ensures that sending the message and the local transaction succeed or fail together.
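
The three phases and the scheduled check-back can be modeled with a small in-memory simulation in Python. `ToyBroker`, `send_in_transaction`, and the `confirm_lost` switch are hypothetical names invented for this sketch. They only mimic the flow described above and are not the RocketMQ client API.

```python
PREPARED, COMMITTED, ROLLED_BACK = "PREPARED", "COMMITTED", "ROLLED_BACK"


class ToyBroker:
    """In-memory stand-in for a broker that supports transactional messages."""

    def __init__(self):
        self.state = {}      # msg_id -> PREPARED / COMMITTED / ROLLED_BACK
        self.body = {}       # msg_id -> message body
        self.checkers = {}   # msg_id -> callback asking the sender about its local transaction
        self.delivered = []  # bodies that have become visible to consumers

    def send_prepared(self, msg_id, body, check_local_tx):
        """Phase one: store the message without delivering it."""
        self.state[msg_id] = PREPARED
        self.body[msg_id] = body
        self.checkers[msg_id] = check_local_tx

    def confirm(self, msg_id, committed):
        """Phase three: commit (deliver) or roll back a prepared message."""
        if self.state.get(msg_id) != PREPARED:
            return
        if committed:
            self.state[msg_id] = COMMITTED
            self.delivered.append(self.body[msg_id])
        else:
            self.state[msg_id] = ROLLED_BACK

    def scan_prepared(self):
        """Scheduled task: ask the sender about messages still in the prepared state."""
        for msg_id, state in list(self.state.items()):
            if state == PREPARED:
                self.confirm(msg_id, self.checkers[msg_id](msg_id))


def send_in_transaction(broker, msg_id, body, local_tx, check_local_tx, confirm_lost=False):
    """Run the three phases; confirm_lost simulates a lost confirmation message."""
    broker.send_prepared(msg_id, body, check_local_tx)  # phase one
    try:
        local_tx()                                      # phase two
        ok = True
    except Exception:
        ok = False
    if not confirm_lost:
        broker.confirm(msg_id, ok)                      # phase three
    return ok
```

When `confirm_lost` is true the message stays in the prepared state and is not delivered until `scan_prepared` asks the sender for the outcome of the local transaction.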

Back to the transfer example: suppose the balance of user A has been reduced and the message has been sent successfully. When user B, as the consumer, starts to consume the message, two problems can arise: consumption failure and consumption timeout. The way to handle a timeout is to keep retrying until the consumer consumes the message successfully. Duplicate messages are likely to occur during this process, and they need to be handled with the idempotent scheme described above.

IV. Distributed transactions: the 2PC protocol

To solve the consistency problem in large distributed and microservice systems, the best-known approaches are the two-phase commit protocol (2PC) and the three-phase commit protocol (3PC). Because of performance issues, the three-phase commit protocol is currently used less often, so this article mainly covers the two-phase protocol.

2PC protocol

The two-phase commit protocol is a classic solution for data consistency in distributed systems. In a large cluster environment, an individual microservice can safeguard its own availability through code quality, mock testing, and similar means, but it cannot guarantee the availability of other services. An end-to-end business operation often spans multiple nodes and applications. To guarantee the ACID properties of the global transaction, a coordinating component (the transaction manager, called TM here) is introduced to track the operation results of all service participants (the resource managers, called RM). Based on the feedback from all participants, it decides whether the whole distributed transaction is committed or rolled back.

Phase one is the prepare phase. The transaction coordinator sends a prepare request to every service application, and each application preprocesses the request on receipt. The preprocessing may be a pre-check or temporary storage of the request, and can be understood as a tentative commit. The general steps are:

A. The transaction coordinator asks all participant services whether the operation can be committed.

B. Each participant prepares to execute the transaction, for example by locking resources, reserving resources, and writing a rollback/retry log.

C. Each participant responds to the coordinator: "can commit" if its preparation succeeded, otherwise a refusal to commit.

Phase two is the commit/rollback phase, in which the transaction is actually committed or rolled back. If the transaction coordinator finds that any participant failed in the prepare phase, it requires all participants to roll back. If the coordinator finds that every participant prepared successfully, it sends a commit request to all participants, and they then commit formally. This ensures that the participants either all commit or all roll back. The specific steps are:

A. If all participants respond "can commit", the coordinator sends a "formal commit" command to all participants. Each participant completes the formal commit, releases all resources, and responds "done". The coordinator collects the "done" responses from every service and ends the transaction.

B. If any participant responds "refuse to commit", the coordinator sends a "rollback" command to all participants. Each participant rolls back, releases all resources, and responds "rollback complete". The coordinator collects the "rollback complete" responses from every service application and cancels the whole distributed transaction.
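
The two phases can be sketched as a small single-process simulation in Python. The `Participant` class, its state names, and the `two_phase_commit` function are hypothetical, and the sketch deliberately ignores network failures, timeouts, and coordinator crashes.

```python
class Participant:
    """A service (RM) taking part in the transaction; can_commit is its phase-one vote."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "INIT"

    def prepare(self):
        """Phase one: lock or reserve resources, write logs, then vote."""
        self.state = "PREPARED" if self.can_commit else "REFUSED"
        return self.can_commit

    def commit(self):
        """Phase two, success path: formally commit and release resources."""
        self.state = "COMMITTED"

    def rollback(self):
        """Phase two, failure path: undo the tentative work and release resources."""
        self.state = "ROLLED_BACK"


def two_phase_commit(participants):
    """Coordinator (TM): commit only if every participant votes yes in phase one."""
    # Phase one: ask every participant and collect all votes.
    votes = [p.prepare() for p in participants]

    # Phase two: one global decision applied to all participants.
    if all(votes):
        for p in participants:
            p.commit()
        return "COMMITTED"
    for p in participants:
        p.rollback()
    return "ROLLED_BACK"
```

A single refusing participant is enough to roll back every participant, including those that had already prepared successfully.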

[Figure: an example of phase two in the success case and in the failure case.]

The two-phase commit protocol addresses strong data consistency in distributed and microservice architectures, and its principle is simple, but it has the following shortcomings:

A. Single point of failure: the coordinator plays a critical role throughout both phases. If the node hosting the coordinator service goes down, the normal operation of the whole distributed system can be affected.

B. Synchronous blocking: during two-phase commit, all service participants must follow the coordinator's unified scheduling and are blocked in the meantime, which reduces the efficiency of the whole system to some extent.
