Traditional applications use local and distributed transactions to ensure data consistency. In a microservices architecture, however, each service's data is private and can only be accessed through the API that service exposes, so distributed transactions no longer fit. How, then, does a microservices architecture guarantee data consistency? That is the topic of this article:
1. Traditional distributed transactions are not the best choice for data consistency in microservices.
2. A microservices architecture should satisfy the principle of eventual consistency.
3. Three patterns for achieving eventual consistency in a microservices architecture.
4. Reconciliation is the last line of defense.
Traditional distributed transactions
Let's start with the first part: how local and distributed transactions have traditionally been used to ensure consistency.
Traditional single-machine applications typically use a relational database, which gives the application ACID guarantees. To ensure consistency we only need to begin a transaction, change (insert, delete, update) any number of rows, and then commit the transaction (or roll it back if an exception occurs). Better still, with the data-access technologies and frameworks of modern development platforms (such as Spring), we need to do even less and can focus on the data changes themselves.
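For example, with Spring a local transaction reduces to a single annotation. A minimal sketch (the AccountService class and account table are illustrative, not from the original article):

```java
import org.springframework.jdbc.core.JdbcTemplate;
import org.springframework.stereotype.Service;
import org.springframework.transaction.annotation.Transactional;

@Service
public class AccountService {

    private final JdbcTemplate jdbc;

    public AccountService(JdbcTemplate jdbc) {
        this.jdbc = jdbc;
    }

    // Spring begins a transaction before this method runs and commits on
    // return; any runtime exception triggers an automatic rollback.
    @Transactional
    public void transfer(long fromId, long toId, long amount) {
        jdbc.update("UPDATE account SET balance = balance - ? WHERE id = ?", amount, fromId);
        jdbc.update("UPDATE account SET balance = balance + ? WHERE id = ?", amount, toId);
    }
}
```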
As organizations grow and business volume increases, a single application and database can no longer support the traffic and data volume. Once applications and databases are split, an application may need to access two or more databases in a single operation. At that point we turned to distributed transactions to ensure consistency, namely the two-phase commit protocol (2PC).
Local and distributed transactions are mature topics with plenty of material available, so they are not covered in detail here. Instead, let's talk about why distributed transactions do not fit a microservices architecture.
First, in a microservices architecture data access becomes more complex: the data is private to each microservice, and the only way to reach it is through the service's API. Encapsulating data access this way keeps microservices loosely coupled and independent of one another, and makes it easier to scale them for performance.
Second, different microservices often use different databases. Applications produce many different types of data, and a relational database is not necessarily the best choice for all of them.
For example, an application that produces and queries text may use Elasticsearch, a full-text search engine, while an application that generates social graph data may use a graph database such as Neo4j.
Microservices-based applications therefore typically mix SQL and NoSQL databases. Most of these non-relational databases, however, do not support 2PC.
Clearly, distributed transactions are not a viable option in a microservices architecture.
Eventual consistency principle
According to the CAP theorem, a choice must be made between availability and consistency. If we choose consistency, we pay the price of blocking other concurrent accesses until consistency is satisfied. That blocking may persist for an indeterminate period, especially when the system is already showing high latency or a network failure has caused a loss of connectivity.
Based on the experience of successful systems, availability is generally the better choice. But keeping data consistent across services and databases remains a fundamental requirement, so a microservices architecture should aim for eventual consistency.
Eventual consistency means that all copies of the data in the system reach a consistent state after some period of time.
Of course, when choosing eventual consistency we must ensure that this period stays within a range acceptable to users. So how do we achieve it?
In essence, consistency means that the services participating in a piece of business logic either all succeed or all fail. So which direction do we choose: guaranteeing success, or guaranteeing failure?
The business model determines the choice. There are three patterns for achieving eventual consistency: the reliable event pattern, the business compensation pattern, and the TCC pattern.
Reliable event pattern
The reliable event pattern belongs to event-driven architecture: when something significant happens, such as updating a business entity, a microservice publishes an event to the message broker. The message broker pushes the event to the microservices that subscribe to it, and when a subscribing microservice receives the event it can complete its own business operation, possibly publishing further events in turn.
1. The order service creates an order awaiting payment and publishes an "order created" event.
2. The payment service consumes the "order created" event and, once payment completes, publishes a "payment completed" event.
3. The order service consumes the "payment completed" event and updates the order status to "awaiting shipment".
This completes the business process (sketched below). But it is not a perfect process.
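A minimal sketch of this happy path, using a toy in-memory broker in place of a real one (the Broker class, service wiring, and topic names are all illustrative):

```java
import java.util.*;
import java.util.function.Consumer;

// Toy in-memory stand-in for a real message broker.
class Broker {
    private final Map<String, List<Consumer<String>>> subscribers = new HashMap<>();

    void subscribe(String topic, Consumer<String> handler) {
        subscribers.computeIfAbsent(topic, t -> new ArrayList<>()).add(handler);
    }

    void publish(String topic, String payload) {
        subscribers.getOrDefault(topic, List.of()).forEach(h -> h.accept(payload));
    }
}

public class OrderFlow {
    public static void main(String[] args) {
        Broker broker = new Broker();

        // Payment service: consumes "order created", publishes "payment completed".
        broker.subscribe("order.created", orderId -> {
            System.out.println("payment service: charging order " + orderId);
            broker.publish("payment.completed", orderId);
        });

        // Order service: consumes "payment completed", updates the order status.
        broker.subscribe("payment.completed", orderId ->
                System.out.println("order service: order " + orderId + " awaiting shipment"));

        // Order service creates the order and publishes the first event.
        System.out.println("order service: order 42 created, awaiting payment");
        broker.publish("order.created", "42");
    }
}
```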
This process can become inconsistent in several ways: a microservice may update its business entity successfully but fail to publish the event; the microservice may publish the event successfully while the message broker fails to deliver it properly to the subscribers; or a subscribing microservice may consume the same event more than once.
The reliable event pattern therefore has to guarantee reliable event delivery and avoid duplicate consumption. Reliable event delivery is defined as:
Atomicity: each service performs its business operation and publishes the corresponding event atomically.
At-least-once delivery: the message broker guarantees that each event is delivered at least once.
Avoiding duplicate consumption requires the consuming service to be idempotent; for example, a payment service must not charge twice just because it received the same event twice.
Because popular message queues already implement event persistence and at-least-once delivery, the requirement that "the message broker guarantees that each event is delivered at least once" is already satisfied and will not be discussed further here.
The discussion below therefore focuses on reliable event delivery and on achieving idempotency. Let's look at reliable event delivery first.
First, let's look at a typical implementation, of the sort often lifted straight from a production system.
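The original article's snippet is not reproduced here, so the following is a minimal sketch of what such an implementation usually looks like, assuming Spring with JdbcTemplate; EventPublisher is a hypothetical stand-in for the broker client:

```java
import org.springframework.jdbc.core.JdbcTemplate;
import org.springframework.stereotype.Service;
import org.springframework.transaction.annotation.Transactional;

// Hypothetical stand-in for the message broker's client API.
interface EventPublisher {
    void publish(String topic, String payload);
}

@Service
public class OrderService {

    private final JdbcTemplate jdbc;
    private final EventPublisher publisher;

    public OrderService(JdbcTemplate jdbc, EventPublisher publisher) {
        this.jdbc = jdbc;
        this.publisher = publisher;
    }

    @Transactional
    public void completePayment(long orderId) {
        // Update the business entity inside the local transaction.
        jdbc.update("UPDATE orders SET status = 'PAID' WHERE id = ?", orderId);

        // Publish the event before the transaction commits. If publish()
        // throws, the update above is rolled back; if it returns normally,
        // the method exits and Spring commits the transaction.
        publisher.publish("payment.completed", String.valueOf(orderId));
    }
}
```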
Reading this code and its comments, there appear at first glance to be three possible outcomes:
1. The database operation succeeds, and publishing the event to the message broker also succeeds.
2. The database operation fails, and the event is never published to the message broker.
3. The database operation succeeds, but publishing the event to the message broker fails; the exception propagates, and the database update that was just executed is rolled back.
From this analysis there seems to be no problem. But a careful look reveals the flaw: there is a hidden time window in the process.
When microservice A publishes the event, the message broker may in fact handle it successfully, yet a network exception on the response path causes the publish call to throw. The end result: the event is published but the database is rolled back.
Similarly, if microservice A crashes after delivering the event but before the database commit, the connection closes unexpectedly and the database operation is rolled back. The end result again: the event is delivered but the database is rolled back. Such an implementation can often run for a long time without incident, but once something goes wrong the problem is very hard to track down.
Here are two ways to implement reliable event delivery.
1. Local event table
The local event table approach stores events in the same database as the business data, relying on the local transaction to guarantee the atomicity of the business operation and the event record, and uses an additional "event recovery" service to recover events that were never published. Because event recovery may lag, the service can also publish the event to the message broker immediately after the local transaction completes:
1. The microservice records the business data and the event in the same local transaction.
2. The microservice publishes the event immediately to notify the associated services; if publishing succeeds, it deletes the logged event right away.
3. The event recovery service periodically recovers unpublished events from the event table and republishes them, deleting each logged event only after republishing succeeds.
Step 2 mainly improves the timeliness of event publishing, while step 3 guarantees that every event is eventually published.
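A minimal sketch of these three steps, again assuming Spring/JdbcTemplate and reusing the hypothetical EventPublisher from the earlier sketch; table and column names are illustrative:

```java
import java.util.Map;
import java.util.UUID;
import org.springframework.jdbc.core.JdbcTemplate;
import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Service;
import org.springframework.transaction.annotation.Transactional;

@Service
public class LocalEventTable {

    private final JdbcTemplate jdbc;
    private final EventPublisher publisher; // hypothetical broker client

    public LocalEventTable(JdbcTemplate jdbc, EventPublisher publisher) {
        this.jdbc = jdbc;
        this.publisher = publisher;
    }

    // Step 1: record the business data and the event in one local transaction.
    @Transactional
    public String payOrder(long orderId) {
        String eventId = UUID.randomUUID().toString();
        jdbc.update("UPDATE orders SET status = 'PAID' WHERE id = ?", orderId);
        jdbc.update("INSERT INTO event (id, topic, payload) VALUES (?, ?, ?)",
                eventId, "payment.completed", String.valueOf(orderId));
        return eventId;
    }

    // Step 2: called right after payOrder() commits; publish immediately
    // and delete the logged event on success.
    public void publishNow(String eventId, String topic, String payload) {
        publisher.publish(topic, payload);
        jdbc.update("DELETE FROM event WHERE id = ?", eventId);
    }

    // Step 3: the recovery job republishes anything still in the table.
    @Scheduled(fixedDelay = 60_000)
    public void recover() {
        for (Map<String, Object> row :
                jdbc.queryForList("SELECT id, topic, payload FROM event")) {
            publisher.publish((String) row.get("topic"), (String) row.get("payload"));
            jdbc.update("DELETE FROM event WHERE id = ?", row.get("id"));
        }
    }
}
```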
With the local event table, the business system and the event system are tightly coupled, and the extra event-table operations put additional pressure on the database, which may become a bottleneck.
2. External event table
The external event table approach persists events to an external event system. The event system must provide a real-time event service that accepts events published by microservices, and an event recovery service for confirming and recovering events.
Before committing its local transaction, the business service sends the event to the event system through the real-time event service; the event system merely records the event and does not actually send it.
After committing, the business service confirms the event to the event system through the real-time event service; only after confirmation does the event system actually publish the event to the message broker.
When the business transaction rolls back, the business service cancels the event with the event system through the real-time event service.
What if the business service stops before sending the confirmation or cancellation? The event system's event recovery service periodically finds events that were sent but never confirmed, queries the business service for their status, and decides whether to publish or cancel each event based on the status the business service returns.
This way the business system and the event system are decoupled and can scale independently. The cost is an extra send operation per event, and the publisher must provide an additional query interface.
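The contract between the two systems might look like the following (a hypothetical sketch; the interface and method names are illustrative):

```java
// Real-time event service exposed by the event system.
interface RealtimeEventService {
    String record(String topic, String payload); // before commit: record only, returns event id
    void confirm(String eventId);                // after commit: actually publish
    void cancel(String eventId);                 // on rollback: discard
}

// Query interface the business service must expose so that the event
// recovery service can resolve events left unconfirmed by a crash.
interface EventStatusQuery {
    enum Status { COMMITTED, ROLLED_BACK, UNKNOWN }
    Status statusOf(String eventId);
}
```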
Having covered reliable event delivery, we now turn to achieving idempotency. Some events are inherently idempotent; others are not.
Inherently idempotent events still require ordered execution
An event is inherently idempotent if it describes a fixed value at a point in time (such as "the account balance is 100") rather than a transformation instruction (such as "increase the balance by 10").
We must keep in mind that the number of events and the order in which they arrive are unpredictable, so inherently idempotent events must still be executed in order, otherwise the result is often not what we want.
Suppose we receive two events: (1) "update the account balance to 100" and (2) "update the account balance to 120", delivered as follows:
1. The microservice receives event (1).
2. The microservice receives event (2).
3. The microservice receives event (1) again.
The final result is obviously wrong. We need to ensure that event (1) can no longer be processed once event (2) has been executed, otherwise the account balance ends up at 100 rather than the 120 we want.
A simple way to guarantee ordering is to add a timestamp to each event: the microservice records the timestamp of the last processed event of each type and discards any received event whose timestamp is earlier than the recorded one. If events are not all issued from the same server, time synchronization between servers becomes a challenge, and it is safer to replace the timestamp with a globally increasing sequence number.
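A minimal sketch of sequence-number ordering (in-memory state here; in practice the last sequence number per event type would live in the database):

```java
import java.util.HashMap;
import java.util.Map;

public class OrderedEventHandler {

    private final Map<String, Long> lastSeq = new HashMap<>();

    /** Applies the event only if its sequence number is newer; returns false if discarded. */
    public synchronized boolean handle(String eventType, long seq, Runnable apply) {
        Long last = lastSeq.get(eventType);
        if (last != null && seq <= last) {
            return false; // stale or duplicate event: discard
        }
        apply.run();                 // e.g. "set account balance to 120"
        lastSeq.put(eventType, seq); // remember the newest applied event
        return true;
    }
}
```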
For operations that are not idempotent, the main idea is to store the execution result of each event. When an event arrives, we look up the event by its ID to see whether it has already been executed; if it has, we return the previous result directly, otherwise we dispatch the event for execution.
Within this approach we need to weigh the cost of processing an event twice against the cost of querying stored results.
Duplicate events with low processing overhead
If the cost of processing an event twice is small, or if only very few duplicate events are expected, we can simply process the event again and let the database throw a uniqueness-constraint violation when the event record is persisted.
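A minimal sketch, assuming a unique constraint on the processed_event id column; DuplicateKeyException is what Spring's JdbcTemplate raises on a uniqueness violation (table and class names are illustrative):

```java
import org.springframework.dao.DuplicateKeyException;
import org.springframework.jdbc.core.JdbcTemplate;
import org.springframework.transaction.annotation.Transactional;

public class CheapDuplicateHandler {

    private final JdbcTemplate jdbc;

    public CheapDuplicateHandler(JdbcTemplate jdbc) {
        this.jdbc = jdbc;
    }

    @Transactional
    public void handle(String eventId, long orderId) {
        try {
            // processed_event(id) carries a unique constraint on id.
            jdbc.update("INSERT INTO processed_event (id) VALUES (?)", eventId);
        } catch (DuplicateKeyException dup) {
            return; // duplicate event: nothing to do
        }
        // Business processing runs in the same transaction as the insert,
        // so a duplicate event can never be applied twice.
        jdbc.update("UPDATE orders SET status = 'PAID' WHERE id = ?", orderId);
    }
}
```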
Duplicate events with high processing overhead: filter them with an event store
If the overhead of processing an event twice is much higher than the cost of an extra query, a dedicated filtering service is used to filter out duplicate events. The filtering service uses an event store to record the events that have already been processed together with their results.
When an event arrives, the filtering service first queries the event store to determine whether the event has already been processed. If it has, the stored result is returned directly; otherwise the business service is dispatched to process the event, and the result is stored in the event store.
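A minimal sketch of such a filtering service (the event store is an in-memory map here, but would be a database table in practice):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

public class FilteringService {

    private final Map<String, String> eventStore = new ConcurrentHashMap<>();

    /** Returns the stored result for duplicates; otherwise processes and stores. */
    public String process(String eventId, Supplier<String> businessCall) {
        String previous = eventStore.get(eventId);
        if (previous != null) {
            return previous; // duplicate: return the stored result
        }
        String result = businessCall.get(); // dispatch to the business service
        eventStore.put(eventId, result);    // record event id and result
        return result;
    }
}
```

Note the gap between businessCall.get() and eventStore.put(): a duplicate arriving in that window slips through, which is exactly the first problem discussed below.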
In general this method works very well. But if our microservice is an RPC-style service, we need to be more careful. The problems are: (1) the filtering service stores the event result only after the business processing completes, yet a duplicate event may arrive before processing completes, and an RPC service cannot rely on a database uniqueness constraint either; (2) the result of the business call may be in an unknown state, which typically happens when the request was submitted normally but no response was received.
For problem (1), the event-handling progress can be recorded step by step, for example recording states such as "received", "request sent", "response received", and "processing complete" for each event. The benefit is that the filtering service can detect duplicate events promptly and handle them differently according to the event's current state.
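A hypothetical shape for such state tracking:

```java
// Illustrative event-handling states, recorded in the event store as
// processing progresses so duplicates can be handled per state.
enum EventState {
    RECEIVED,          // event accepted by the filtering service
    REQUEST_SENT,      // RPC request dispatched to the business service
    RESPONSE_RECEIVED, // business service answered
    COMPLETE           // result stored; duplicates get the stored result
}
```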
For problem (2), an extra query request can determine the actual processing state of the event. Note that the extra query adds latency, and some RPC services may not provide a query interface at all. In that case we can only accept temporary inconsistency and rely on reconciliation and manual intervention to restore consistency.
Compensation pattern
For convenience of description, two concepts are defined first:
Business exception: an error produced by the business logic itself, such as an insufficient account balance or insufficient product inventory.
Technical exception: an exception produced outside the business logic, such as a broken network connection or a network timeout.
The compensation pattern uses an additional coordination service to coordinate the microservices that need to stay consistent. The coordination service calls each microservice in sequence, and if any call fails (with either a business exception or a technical exception), it cancels all the microservice calls that previously succeeded.
The compensation pattern is recommended only when business exceptions cannot be avoided; if possible, optimize the business model so that compensating transactions are unnecessary. For example, a business exception such as an insufficient account balance can be avoided by freezing the amount in advance, and insufficient inventory by requiring the merchant to stock extra inventory.
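A minimal sketch of such a coordinator (a simplified saga-style loop; the Step interface and its method names are illustrative):

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

// Each step wraps one microservice call and knows how to undo it.
interface Step {
    void invoke();     // e.g. book a flight through the flight service's API
    void compensate(); // e.g. cancel that booking
}

public class Coordinator {

    /** Invokes each step in order; on failure, compensates completed steps in reverse. */
    public void run(List<Step> steps) {
        Deque<Step> completed = new ArrayDeque<>();
        for (Step step : steps) {
            try {
                step.invoke();
                completed.push(step);
            } catch (RuntimeException businessOrTechnicalException) {
                while (!completed.isEmpty()) {
                    completed.pop().compensate(); // undo in reverse order
                }
                throw businessOrTechnicalException;
            }
        }
    }
}
```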
Let's illustrate the compensation pattern with an example: a travel company offers itinerary booking, allowing customers to book flights, train tickets, hotels, and so on in advance through the company's website.
Suppose a customer plans the following itinerary:
Flight XXX from Shanghai to Beijing at 9:00 on June 19.
A 3-night stay at XXX Hotel.