Repost: Distributed transactions: just a trade-off among consistency, throughput, and complexity

Source: http://mp.weixin.qq.com/s?__biz=mza5nzc4ota1mw==&mid=2659598134&idx=1&sn=F5f73354d162a7561b3d73c204a4d1f5&scene=0#wechat_redirect

This is a well-worn topic, and the need for distributed transactions has been made many times: there is no simple solution, such as using database transactions, to the problem of distributed data consistency. With microservice architectures now popular, a single transaction often has to span multiple services and multiple databases, and traditional techniques can no longer cope with these complex scenarios. This article looks at how to guarantee data consistency for transactions across microservices, combining theory with practice as far as possible and drawing on the distributed transaction mechanisms we use in our own products. We hope it helps you.

When it comes to distributed transactions, you cannot avoid talking about CAP... and, of course, BASE...

From an architectural standpoint, balancing business splitting (data partitioning), data consistency, and performance (availability) is always an art:

    • In a microservice architecture, the business application is split into multiple services to achieve higher performance and flexibility; a transaction then has to be orchestrated across multiple microservices, and the data consistency problem arises;

    • To solve the data consistency problem, different transaction mechanisms must be adopted, and these in turn bring performance (availability) problems.

In the computer world, solving one problem tends to create another, which again shows that IT architecture is always an art of balance.

The core idea of BASE is to let the system reach eventual consistency in a way that suits the characteristics of the business. In the Internet domain, strong consistency is often sacrificed in exchange for high system availability, and only the eventual consistency of the data is guaranteed; the time it takes to converge must, however, be acceptable to users. In fields such as finance and trading, strong consistency is still required to guarantee the accuracy and reliability of transactions.

Next, we introduce the transaction processing patterns commonly used in the industry: two-phase commit and three-phase commit, Saga long transactions, the compensation pattern, the reliable event pattern (local event table, external event table), the reliable event pattern (non-transactional messages, transactional messages), and TCC. Different transaction models provide different levels of data consistency. Readers already familiar with these distributed transaction patterns can use this as a reference and choose the model that fits their own business requirements.

Two-phase commit, three-phase commit

This distributed transaction solution is fairly mature on a variety of technology platforms; a typical example is JTA transactions in the Java EE architecture (every full application server provides an implementation; Tomcat does not). A minimal JTA sketch follows the list below.

However, two-phase commit and three-phase commit have the following limitations and are not a good fit for a microservice architecture:

    • All participants must be transactional resources (databases, message queues, EJB components, etc.), which limits where it can be used (most microservice architectures communicate over HTTP); it is better suited to traditional monolithic applications;

    • Because it enforces strong consistency, resources are held within the transaction while participants wait; the performance impact is large and throughput is low, so it is not suitable for high-concurrency, high-performance business scenarios.
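For reference, here is a minimal sketch of two-phase commit through JTA, assuming a Java EE application server that exposes a UserTransaction and two XA-capable DataSources in JNDI; the JNDI names, table names, and SQL are illustrative assumptions.

```java
// Minimal JTA sketch: one transaction spanning two XA resources.
import java.sql.Connection;
import java.sql.Statement;
import javax.naming.InitialContext;
import javax.sql.DataSource;
import javax.transaction.UserTransaction;

public class JtaTwoPhaseCommitSketch {
    public void placeOrder() throws Exception {
        InitialContext ctx = new InitialContext();
        UserTransaction utx = (UserTransaction) ctx.lookup("java:comp/UserTransaction");
        DataSource orderDs = (DataSource) ctx.lookup("jdbc/OrderXADS"); // hypothetical XA DataSource
        DataSource stockDs = (DataSource) ctx.lookup("jdbc/StockXADS"); // hypothetical XA DataSource

        utx.begin();
        try (Connection orderConn = orderDs.getConnection();
             Connection stockConn = stockDs.getConnection();
             Statement orderStmt = orderConn.createStatement();
             Statement stockStmt = stockConn.createStatement()) {
            orderStmt.executeUpdate("INSERT INTO orders(id, sku, qty) VALUES (1, 'BOOK-1', 2)");
            stockStmt.executeUpdate("UPDATE stock SET qty = qty - 2 WHERE sku = 'BOOK-1'");
            // The transaction manager drives prepare/commit on both resources (2PC).
            utx.commit();
        } catch (Exception e) {
            // Either both databases commit or both roll back.
            utx.rollback();
            throw e;
        }
    }
}
```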

Saga long transactions

In the Saga transaction model, a long transaction consists of a set of sub-transactions with a predefined execution order and a corresponding set of compensating sub-transactions. A complete transaction is made up of business activities T1, T2, ..., Tn, each of which can be local or remote; under a Saga, all business activities either succeed or are all rolled back, with no intermediate state.

The implementation mechanism of the Saga transaction model:

    • Each business activity is an atomic operation;

    • Each business activity provides both a forward (positive) action and a compensating (negative) action;

    • If any business activity fails, the compensating actions are executed in real time, in reverse order of execution, rolling back the transaction;

    • If a rollback fails, the pending transaction is recorded in a log and retried according to a retry policy;

    • For cases that still fail after retries, a scheduled reversal (correction) service periodically re-runs the failed rollbacks;

    • Business that still fails after the scheduled reversals is left for manual intervention.

The Saga long transaction model is well suited to scenarios with high data consistency requirements. Because it relies on a compensation mechanism, each atomic operation executes its task first and resources are released in real time rather than being locked for long periods, so performance is relatively well guaranteed.
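As a rough illustration of the mechanism above, the sketch below runs the forward actions in order and, on failure, executes the compensating actions in reverse order; the step interface and retry handling are illustrative assumptions, not our framework's actual API.

```java
// Saga orchestration sketch: forward actions in order, compensation in reverse order.
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

public class SagaSketch {

    /** One atomic business activity: a forward action and its compensation. */
    interface SagaStep {
        void execute();     // forward (positive) action
        void compensate();  // reverse (negative) action, must be idempotent
    }

    public void run(List<SagaStep> steps) {
        Deque<SagaStep> completed = new ArrayDeque<>();
        try {
            for (SagaStep step : steps) {
                step.execute();
                completed.push(step);           // remember what has succeeded so far
            }
        } catch (RuntimeException failure) {
            // Roll back in reverse order; a real framework would persist a pending-
            // compensation log and retry (or escalate to manual handling) on failure.
            while (!completed.isEmpty()) {
                SagaStep done = completed.pop();
                try {
                    done.compensate();
                } catch (RuntimeException compensateFailure) {
                    // record to a pending-transaction log for retry / scheduled reversal
                }
            }
            throw failure;
        }
    }
}
```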

Implementing the Saga long transaction model directly in the business code is both complex and difficult. In our actual work, we developed a framework supporting the Saga transaction model to enable rapid business delivery.

Developer: the business only needs to orchestrate the transaction; each atomic operation provides its forward and compensating operations;

Configurator: you can set the transaction rollback policy per exception type (which exceptions are included in transaction management and which are not), whether the execution record of each atomic operation is persisted (caching, a database, or other extensible persistence modes, depending on the performance required), and the reversal options (retry count, timeout, real-time reversal, scheduled reversal, etc.);

Saga transaction framework: provides the transaction guarantee mechanism; it is responsible for persisting the execution records of atomic operations and for their execution order, and supplies basic capabilities such as real-time reversal, scheduled reversal, and transaction interceptors;

The core of the Saga framework is IBusinessActivity and IAtomicAction. IBusinessActivity handles enlist(), delist(), prepare(), commit(), rollback(), and similar operations on the atomic activities; IAtomicAction mainly carries the state context and the forward and compensating operations.
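Purely as an illustration, the two core interfaces might be shaped roughly as follows; the signatures are inferred from the description above and the real framework's code may differ.

```java
// Hypothetical shapes of the framework's two core interfaces.
import java.util.HashMap;
import java.util.Map;

/** Hypothetical state context shared by the atomic actions in one business activity. */
class TransactionContext {
    private final Map<String, Object> state = new HashMap<>();
    public Object get(String key)             { return state.get(key); }
    public void put(String key, Object value) { state.put(key, value); }
}

/** One atomic operation: a forward action plus its compensating action. */
interface IAtomicAction {
    void doAction(TransactionContext ctx);    // positive (forward) operation
    void undoAction(TransactionContext ctx);  // negative (compensating) operation
}

/** The business activity that orchestrates the enlisted atomic actions. */
interface IBusinessActivity {
    void enlist(IAtomicAction action);   // register an atomic action with the activity
    void delist(IAtomicAction action);   // remove an atomic action from the activity
    void prepare();                      // persist the activity's state/context
    void commit();                       // run the positive operations in order
    void rollback();                     // run the negative operations in reverse order
}
```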

Due to the length of this article, the implementation details are not covered here; we may describe the concrete framework behind the Saga long transaction model in detail later. Saga long transactions require every operation to provide a compensating operation and support a high level of transactional consistency; because resources are not locked across the whole transaction lifecycle, the performance impact is small, making the model suitable for scenarios with high data consistency requirements.

Compensation mode

The Saga long transaction model is essentially a sophisticated implementation of the compensation mechanism. If the business scenario does not need the support of a full Saga transaction framework, a simple compensation pattern can be implemented directly in the business code. Compensation usually also aims for eventual consistency: the cancellation service must be invoked at least once and must therefore be idempotent. For more on the compensation pattern, see our colleague Tian Xiangyang's article "Data consistency assurance under MicroServices Architecture (iii)": http://dwz.cn/3TVJaB

The compensation mechanism is not recommended for complex scenarios that require orchestrating many operations. Its advantages are that rollback is easy to provide and it depends on very few external services, making it simpler to use than a Saga long transaction; its disadvantages are that it is intrusive to the business code and highly coupled, and it cannot handle operations for which no compensating action can be provided.
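A minimal sketch of the compensation pattern implemented directly in business code, assuming two hypothetical remote services: if the second call fails, the first booking is cancelled, and the cancel call is retried because it may be invoked more than once and must therefore be idempotent.

```java
// Simple in-business compensation: cancel the first step when the second step fails.
public class BookingWithCompensation {

    private final FlightService flightService;   // hypothetical remote service
    private final HotelService hotelService;     // hypothetical remote service

    public BookingWithCompensation(FlightService flights, HotelService hotels) {
        this.flightService = flights;
        this.hotelService = hotels;
    }

    public void book(String tripId) {
        String flightOrderId = flightService.book(tripId);
        try {
            hotelService.book(tripId);
        } catch (RuntimeException e) {
            cancelFlightWithRetry(flightOrderId);   // compensation, at-least-once
            throw e;
        }
    }

    private void cancelFlightWithRetry(String flightOrderId) {
        for (int attempt = 1; attempt <= 3; attempt++) {
            try {
                flightService.cancel(flightOrderId); // must be idempotent
                return;
            } catch (RuntimeException retryable) {
                // back off and retry; escalate to manual handling after the last attempt
            }
        }
    }

    interface FlightService { String book(String tripId); void cancel(String orderId); }
    interface HotelService  { void book(String tripId); }
}
```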

Reliable event pattern (local event table, external event table)

The reliable event pattern belongs to event-driven architecture: when something significant happens, such as a business entity being updated, an event is published to the message broker. The broker pushes the event to the microservices that subscribe to it; when a subscribing microservice receives the event, it can complete its own business operation or publish further events.

The reliable event pattern must ensure reliable event delivery and avoid duplicate consumption. Reliable delivery is defined as:

    • Each service performs its business operation and publishes the event atomically;

    • The message broker ensures that the event is delivered at least once; avoiding duplicate consumption therefore requires the consuming service to be idempotent (a minimal idempotent-handler sketch follows this list).
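A minimal idempotent-handler sketch, assuming a hypothetical processed_event table with a unique key on the event id: the id is recorded in the same local transaction as the business change, so a redelivered event is rejected and nothing is applied twice.

```java
// Idempotent consumer: deduplicate by event id inside the business transaction.
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class IdempotentEventHandler {

    public void handle(Connection conn, String eventId, String payload) throws SQLException {
        conn.setAutoCommit(false);
        try (PreparedStatement dedup = conn.prepareStatement(
                 "INSERT INTO processed_event(event_id) VALUES (?)")) {
            dedup.setString(1, eventId);
            dedup.executeUpdate();            // unique key on event_id rejects duplicates
            applyBusinessChange(conn, payload);
            conn.commit();
        } catch (SQLException duplicateOrFailure) {
            conn.rollback();                  // duplicate delivery or failure: nothing applied,
        }                                     // the broker will redeliver if needed
    }

    private void applyBusinessChange(Connection conn, String payload) throws SQLException {
        // business update in the same transaction (omitted)
    }
}
```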

With the event pattern, the key concern is the reliable delivery of events. In our products this is usually supported in one of two ways: a local event table or an external event table.

1. The local event table approach stores events and business data in the same database, uses an additional event recovery service to recover events, and relies on the local transaction to guarantee that updating the business data and recording the event are atomic. Since event recovery may be somewhat delayed, the service can also publish the event to the message broker immediately after the local transaction completes.

    1. The microservice records the business data and the event in the same local transaction;

    2. The microservice publishes the event in real time to notify the associated business services; if publishing succeeds, the recorded event is deleted immediately;

    3. The event recovery service periodically picks up unpublished events from the event table and republishes them, deleting each recorded event only after it has been republished successfully.

Step 2 mainly improves timeliness by publishing events in real time; step 3 guarantees that every event will eventually be published. In the local event table mode the business system and the event system are tightly coupled, and the extra event writes put additional load on the database and may become a bottleneck.
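A minimal sketch of the local event table approach, assuming hypothetical orders and event tables in the same database: step 1 writes the business row and the event row in one local transaction, and step 3 is a recovery job that republishes whatever is still in the event table and deletes it on success.

```java
// Local event table: business data and event share one local transaction.
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class LocalEventTableSketch {

    /** Hypothetical publisher backed by the message broker. */
    interface EventPublisher {
        void publish(long eventId, String payload);
    }

    public void createOrder(Connection conn, String orderJson) throws SQLException {
        conn.setAutoCommit(false);
        try (PreparedStatement order = conn.prepareStatement(
                 "INSERT INTO orders(payload) VALUES (?)");
             PreparedStatement event = conn.prepareStatement(
                 "INSERT INTO event(payload) VALUES (?)")) {
            order.setString(1, orderJson);
            order.executeUpdate();
            event.setString(1, orderJson);
            event.executeUpdate();           // event recorded atomically with the business data
            conn.commit();
        } catch (SQLException e) {
            conn.rollback();
            throw e;
        }
        // Optionally publish right away (step 2); the recovery job below is the guarantee.
    }

    /** Step 3: periodically republish events still in the table, deleting them on success. */
    public void recoverUnpublished(Connection conn, EventPublisher publisher) throws SQLException {
        try (Statement query = conn.createStatement();
             ResultSet rs = query.executeQuery("SELECT id, payload FROM event")) {
            while (rs.next()) {
                long id = rs.getLong("id");
                publisher.publish(id, rs.getString("payload"));
                try (PreparedStatement delete = conn.prepareStatement(
                         "DELETE FROM event WHERE id = ?")) {
                    delete.setLong(1, id);
                    delete.executeUpdate();
                }
            }
        }
    }
}
```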

2. The external event table approach persists events to an external event system. The event system must provide a real-time event service for microservices to publish events, as well as an event recovery service to confirm and recover events.

    1. Before its transaction commits, the business service sends the event to the event system via the real-time event service; the event system only records the event and does not actually deliver it yet;

    2. After the business transaction commits, the business service confirms the event to the event system through the real-time event service; once confirmed, the event is actually published to the message broker;

    3. If the business transaction rolls back, the business service cancels the event in the event system through the real-time event service;

    4. What if the business service goes down before sending the confirmation or cancellation? The event system's event recovery service periodically finds unconfirmed events and queries the business service for their state, then decides, based on the status the business service returns, whether the event should be published or cancelled.

In this way, the business system and the event system are decoupled and can be scaled independently. However, the approach requires one extra send operation, and the publisher has to provide an additional query interface.
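A minimal sketch of the external event table flow, written against a hypothetical EventSystemClient whose method names simply mirror steps 1 to 4 above.

```java
// External event table: send-pending before commit, confirm after commit, cancel on rollback.
import java.sql.Connection;
import java.sql.PreparedStatement;

public class ExternalEventTableSketch {

    /** Hypothetical client for the event system's real-time event service. */
    interface EventSystemClient {
        String sendPending(String payload);   // step 1: record the event, not delivered yet
        void confirm(String eventId);         // step 2: after local commit, actually publish
        void cancel(String eventId);          // step 3: on business rollback, discard
    }

    private final EventSystemClient events;

    public ExternalEventTableSketch(EventSystemClient events) {
        this.events = events;
    }

    public void createOrder(Connection conn, String orderJson) throws Exception {
        String eventId = events.sendPending(orderJson);  // before the local transaction commits
        conn.setAutoCommit(false);
        try (PreparedStatement stmt = conn.prepareStatement(
                 "INSERT INTO orders(payload) VALUES (?)")) {
            stmt.setString(1, orderJson);
            stmt.executeUpdate();
            conn.commit();
            events.confirm(eventId);   // the event system now delivers it to the message broker
        } catch (Exception e) {
            conn.rollback();
            events.cancel(eventId);    // the event is dropped, never delivered
            throw e;
        }
        // Step 4: if this service dies before confirm/cancel, the event system's recovery
        // service queries the business state and decides whether to publish or cancel.
    }
}
```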

Transaction assurance based on reliable events can be implemented in many variants. For example, if the reliability requirement for messages is low, the local table can simply be a cache; to improve delivery efficiency, multiple messages can be merged into a single delivery; and to provide stronger transaction guarantees, local message table persistence (ensuring the sender's messages reliably land) can even be combined with remote message table persistence (ensuring the receiver's messages reliably land).

The distributed transaction solution between business systems and processes in our workflow products adopts the combination of merged message delivery + local cache + remote message table persistence. The specific usage is introduced below.

Usage Scenarios

In real business projects, the business system and the process engine are usually deployed separately: the business system accesses the process engine through a remote interface, and business data and process data are stored in their respective databases.

In this scenario, if the business system interleaves process operations with business operations, then when a process operation succeeds but the business operation fails, the business transaction rolls back while the process has already been created on the engine side, leaving the business system and the process engine in inconsistent states.

In the business application, the process operations within a transaction use local cache + batched delivery + remote landing (if message reliability must be guaranteed on the client side, the local cache can be replaced with a local table); on the process engine side, after the messages are delivered, a message table ensures they are reliably executed. In our workflow products, the process engine provides the client with a unified distributed transaction API that is used just like a traditional local transaction, which keeps it transparent and reduces complexity for developers. The distributed transaction API supports two protocol modes (a purely hypothetical usage sketch follows the list below):

    1. HTTP + binary Serialization mode

    2. WebService mode

We will later add a RESTful-style interface.
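Purely as a hypothetical illustration of how such a client-side API might be used (none of the class or method names below come from the actual product): process operations are cached locally and only flushed to the engine when the business transaction commits.

```java
// Hypothetical client-side usage: local cache + batched delivery + remote landing.
public class ProcessClientSketch {

    interface ProcessTransaction {                         // hypothetical API
        void startProcess(String processKey, String bizId);   // cached locally
        void completeTask(String taskId);                      // cached locally
        void commit();    // bulk-delivers the cached operations to the engine (HTTP or WebService)
        void rollback();  // discards the local cache; nothing reaches the engine
    }

    public void approve(ProcessTransaction ptx, String orderId, String taskId) {
        try {
            // ... local business updates inside the business database transaction ...
            ptx.completeTask(taskId);   // queued, not yet sent
            ptx.commit();               // flushed together with the business commit
        } catch (RuntimeException e) {
            ptx.rollback();             // the process engine never sees the operations
            throw e;
        }
    }
}
```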

The reliable event pattern is applied at large scale in Internet companies. It suits a very wide range of business scenarios and can achieve eventual consistency of data. Its drawbacks are that it is hard to implement, it relies on the database for reliability and may hit performance bottlenecks under high concurrency, and it needs a standard reliable event framework at the company level to support it.

Reliable event pattern (non-transactional messages, transactional messages)

The event notification in the reliable event pattern can be implemented with messages; the principle is the same as with the local event table and external event table, so it is not detailed here. The common messaging frameworks ActiveMQ, RabbitMQ, Kafka, and RocketMQ can all be used as the delivery channel. Note: Kafka is usually not suitable, because its design allows scenarios in which messages can be lost.
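As a rough illustration of the transactional ("half") message flow, here is a sketch written against a hypothetical broker client rather than the API of any product named above: the message stays invisible to consumers until the local transaction commits and the message is confirmed.

```java
// Transactional ("half") message flow against a hypothetical broker client.
public class TransactionalMessageSketch {

    interface TxMessageBroker {                             // hypothetical client
        String sendHalfMessage(String topic, String body);  // invisible to consumers for now
        void commitMessage(String msgId);                    // make it deliverable
        void rollbackMessage(String msgId);                  // discard it
    }

    public void placeOrder(TxMessageBroker broker, String orderJson) {
        String msgId = broker.sendHalfMessage("order-created", orderJson);
        try {
            saveOrderLocally(orderJson);       // local database transaction
            broker.commitMessage(msgId);       // consumers can now see the event
        } catch (RuntimeException e) {
            broker.rollbackMessage(msgId);     // the event never becomes visible
            throw e;
        }
        // The broker also checks back on unresolved half messages, so a crash between
        // the local commit and commitMessage() can be recovered.
    }

    private void saveOrderLocally(String orderJson) {
        // business persistence omitted
    }
}
```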

At present, few products on the market support transactional messages. RocketMQ has implemented a reliable transactional message model, but that part is not open source (important things are worth repeating: not open source, not open source, not open source). As an aside, domestic open source still has plenty of room to improve (the key point is not just whether something is open-sourced, but whether there is continuous investment after it is).

TCC Mode

A complete TCC business activity consists of one primary business service and several secondary business services. The primary business service initiates and completes the whole business activity, and the TCC mode requires each secondary service to provide three operations: Try, Confirm, and Cancel.

    1. Try: perform all business checks

      Reserve the required business resources;

    2. Confirm: actually execute the business logic

      Perform no further business checks; use only the business resources reserved in the Try phase; the Confirm operation must be idempotent;

    3. Cancel:

      Release the business resources reserved in the Try phase; the Cancel operation must be idempotent.

The entire TCC business is divided into two phases:

Phase one: the primary business service invokes the Try operation of each secondary business service and enlists all the secondary services in the activity manager. When every secondary service's Try succeeds, or any secondary service's Try fails, the second phase begins.

Phase two: the activity manager performs Confirm or Cancel based on the outcome of phase one. If all Try operations succeeded, it invokes Confirm on every secondary business service; otherwise it invokes Cancel on every secondary business service.

A detailed description of the TCC mode can be found in our colleague Tian Xiangyang's article "Data consistency assurance under MicroServices Architecture (iii)": http://dwz.cn/3TVJaB.

Note that the Confirm or Cancel of the second phase is itself an eventually consistent process: the call may fail for some reason (the network, for example), so the activity manager must be able to retry, which in turn requires the Confirm and Cancel operations to be idempotent.
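A minimal TCC sketch, with illustrative names: the participant interface exposes Try, Confirm, and Cancel, and the activity manager either confirms all participants or cancels them, retrying because Confirm and Cancel are idempotent.

```java
// TCC sketch: phase one tries all participants, phase two confirms all or cancels all.
import java.util.ArrayList;
import java.util.List;

public class TccSketch {

    /** A secondary (participant) business service in a TCC activity. */
    interface TccParticipant {
        boolean tryReserve(String txId);  // phase 1: business checks + reserve resources
        void confirm(String txId);        // phase 2: use the reserved resources, idempotent
        void cancel(String txId);         // phase 2: release the reserved resources, idempotent
    }

    /** The activity manager. */
    public void execute(String txId, List<TccParticipant> participants) {
        List<TccParticipant> tried = new ArrayList<>();
        boolean allTried = true;
        for (TccParticipant p : participants) {
            tried.add(p);                       // enlist in the activity before trying
            if (!p.tryReserve(txId)) {
                allTried = false;
                break;
            }
        }
        for (TccParticipant p : tried) {
            if (allTried) {
                retry(() -> p.confirm(txId));   // may be retried until it succeeds
            } else {
                retry(() -> p.cancel(txId));    // idempotency makes retries safe
            }
        }
    }

    private void retry(Runnable op) {
        for (int attempt = 1; attempt <= 3; attempt++) {
            try { op.run(); return; } catch (RuntimeException e) { /* back off, then retry */ }
        }
        // after the retry budget is exhausted, escalate to manual intervention
    }
}
```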

Summary

Each of these six distributed transaction patterns has its own advantages and disadvantages in terms of data consistency, transaction level, throughput, and implementation complexity, which gives you a basis for choosing among them.

From the perspective of architecture design, data consistency must be weighed against business factors, which helps the team make a more technically reasonable choice. For a given business scenario, it pays to make the architectural trade-off by assessing how important transactions are to that business. The securities, finance, and payment industries we often work with have very high data consistency requirements and demand strict real-time guarantees; for social scenarios, by contrast, local real-time consistency plus global eventual consistency is usually enough. In practice, therefore, technology must be combined with the business, and the appropriate technical solution chosen for your own business.

Author Introduction

Liu Xiang, from Pu Yuan, has ten years of experience in the IT industry, focusing on enterprise software platforms. He has a good understanding of SOA, distributed computing, enterprise architecture design, and related fields, and is the author of the book "Spring Batch Batch Processing Framework".

