Analysis of distributed database and cache double-write consistency schemes

Source: Internet
Author: User
Tags: message queue, redis

Introduction: why write this article?

First, because of its high-concurrency, high-performance characteristics, the cache is widely used in projects. On the read side there is little debate: everyone follows essentially the same flow of checking the cache first and falling back to the database on a miss.

But on the update side, should you update the cache or delete it? And should you delete the cache first and then update the database, or the other way around? This is genuinely controversial, and there is currently no comprehensive post analyzing these options. So, at the risk of being flamed, I wrote this article.

Article structure

This article consists of the following three parts:
1. An explanation of the cache update strategies
2. An analysis of the shortcomings of each strategy
3. Improvements that address those shortcomings

Body

To be clear up front: in theory, setting an expiration time on the cache is itself a solution that guarantees eventual consistency. In that scheme, every cached entry carries an expiration time, all writes go to the database, and cache operations are only best effort. That is, if the database write succeeds but the cache update fails, later read requests will, once the entry expires, naturally read the new value from the database and backfill the cache. The strategies discussed below therefore do not rely on setting expiration times on the cache.
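The expiration-based approach can be sketched as follows. This is a minimal simulation with invented names: in-memory maps stand in for the database and the cache, and the caller passes the clock and a flag for whether the cache update succeeds, so the best-effort behavior can be seen directly.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch: writes always go to the database; the cache entry carries an
// expiration time, so even if a cache update is lost, a read after expiry
// falls back to the database and backfills the new value.
public class TtlCache {
    static final Map<String, String> db = new HashMap<>();
    static final Map<String, String> cache = new HashMap<>();
    static final Map<String, Long> expiresAt = new HashMap<>();

    static void write(String key, String value, long now, long ttlMillis,
                      boolean cacheUpdateSucceeds) {
        db.put(key, value);                      // the database is the source of truth
        if (cacheUpdateSucceeds) {               // cache update is best effort only
            cache.put(key, value);
            expiresAt.put(key, now + ttlMillis);
        }
    }

    static String read(String key, long now) {
        Long exp = expiresAt.get(key);
        if (exp != null && now < exp) {
            return cache.get(key);               // fresh cache hit
        }
        String value = db.get(key);              // expired or missing: reload
        cache.put(key, value);                   // backfill the cache
        expiresAt.put(key, now + 1000);
        return value;
    }
}
```

Until the entry expires, a lost cache update leaves a stale read; after expiry, the new database value comes back, which is exactly the eventual-consistency guarantee described above.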
Here, we discuss three kinds of update strategies:

    1. Update the database first and then update the cache
    2. Delete the cache first and then update the database
    3. Update the database first, and then delete the cache

And no, please don't ask why there is no "update the cache first, then update the database" strategy.

(1) Update the database first and then update the cache

This scheme is widely opposed. Why? There are two reasons.
Reason one (thread-safety angle)
If request A and request B perform update operations at the same time, the following can happen:
(1) Thread A updates the database
(2) Thread B updates the database
(3) Thread B updates the cache
(4) Thread A updates the cache
Here request A's cache update should have landed before request B's, but because of network delays and other factors, B updated the cache before A. The cache is left holding A's older value while the database holds B's newer one: dirty data. This scheme is therefore ruled out.
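The interleaving above can be replayed deterministically on in-memory maps. This is only a sketch (the names db and cache are illustrative); in a real system the ordering comes from network timing, not explicit calls:

```java
import java.util.HashMap;
import java.util.Map;

// Replays steps (1)-(4) in the problematic order to show the resulting
// database/cache divergence.
public class UpdateUpdateRace {
    static final Map<String, String> db = new HashMap<>();
    static final Map<String, String> cache = new HashMap<>();

    static void run() {
        db.put("k", "A");      // (1) thread A updates the database
        db.put("k", "B");      // (2) thread B updates the database
        cache.put("k", "B");   // (3) thread B updates the cache
        cache.put("k", "A");   // (4) thread A's delayed cache update lands last
    }
}
```

After `run()`, the database holds B's value while the cache holds A's stale one, which is the dirty-data outcome described above.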
Reason two (business scenario angle)
There are two points:
(1) If your business writes to the database frequently but reads relatively rarely, this scheme means the cache is updated constantly even though the data is hardly ever read from it, wasting resources.
(2) If the value written to the cache is not the raw database value but the result of a series of complex computations, then every database write forces that computation to be redone, which again wastes resources. In both cases, simply deleting the cache entry is more appropriate.

Next comes the most controversial part: delete the cache first and then update the database, versus update the database first and then delete the cache.

(2) Delete the cache first and then update the database

This scheme causes inconsistency when a request A performs an update while another request B performs a query at the same time. The following can happen:
(1) Request A performs a write and deletes the cache
(2) Request B queries and finds the cache entry missing
(3) Request B queries the database and gets the old value
(4) Request B writes the old value to the cache
(5) Request A writes the new value to the database
This leaves the cache and database inconsistent. Worse, if you do not set an expiration time on the cache, the entry stays dirty forever.
So how do you solve it? Use the delayed double delete strategy.
The pseudocode is as follows:

    public void write(String key, Object data) throws InterruptedException {
        redis.delKey(key);      // first delete
        db.updateData(data);    // update the database
        Thread.sleep(1000);     // wait for in-flight read requests to finish
        redis.delKey(key);      // second delete removes any stale backfill
    }

In words:
(1) Delete the cache first
(2) Then write the database (these two steps are the same as before)
(3) Sleep for 1 second, then delete the cache again
This way, any dirty cache data produced within that 1 second is deleted.
So how is this 1 second determined? How long should you actually sleep?
You should measure how long your project's read path takes. The write request should then sleep for the read path's duration plus a few hundred milliseconds. This ensures the read request has finished, so the write request can delete any dirty cache data that read left behind.
What if you use MySQL's read-write separation architecture?
In that case, data becomes inconsistent as follows. Again two requests: request A updates, request B queries.
(1) Request A performs a write and deletes the cache
(2) Request A writes the data to the master database
(3) Request B queries the cache and finds no value
(4) Request B queries a slave; master-slave replication has not completed yet, so it reads the old value
(5) Request B writes the old value to the cache
(6) Replication completes and the slave receives the new value
This is how the data becomes inconsistent. The fix is still the delayed double delete strategy, except the sleep time becomes the master-slave replication delay plus a few hundred milliseconds.
Doesn't this synchronous second delete reduce throughput?
Yes, so make the second delete asynchronous: start a separate thread to perform it. The write request then does not need to sleep before returning, which restores throughput.
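A minimal sketch of that asynchronous variant, assuming in-memory maps for the database and cache and an invented DELAY_MILLIS constant (in practice the delay would be read-path latency plus a few hundred milliseconds):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// The write thread returns immediately; a scheduled task performs the
// second delete after the delay, off the write path.
public class AsyncDoubleDelete {
    static final Map<String, String> db = new ConcurrentHashMap<>();
    static final Map<String, String> cache = new ConcurrentHashMap<>();
    static final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r);
                t.setDaemon(true);        // don't keep the JVM alive for cleanup work
                return t;
            });
    static final long DELAY_MILLIS = 100; // illustrative; tune to your read path

    static void write(String key, String value) {
        cache.remove(key);                // first delete
        db.put(key, value);               // update the database
        scheduler.schedule(() -> cache.remove(key),
                DELAY_MILLIS, TimeUnit.MILLISECONDS);   // second delete, async
    }

    static void awaitSecondDelete() {     // helper so a demo can observe the result
        try {
            Thread.sleep(DELAY_MILLIS + 300);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

Even if a racing read backfills a stale value right after `write` returns, the scheduled second delete clears it shortly afterwards.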
What if the second delete fails?
This is a very good question, because if the second delete fails, the following can happen. Again two requests, one updating and one querying; for simplicity, assume a single database:
(1) Request A performs a write and deletes the cache
(2) Request B queries and finds the cache entry missing
(3) Request B queries the database and gets the old value
(4) Request B writes the old value to the cache
(5) Request A writes the new value to the database
(6) Request A tries to delete the value request B wrote to the cache, and fails
In other words, if the second cache delete fails, the cache and database are inconsistent again.
How to solve it?
See the analysis under update strategy (3).

(3) Update the database first, then delete the cache

First, some background. There is a well-known cache update routine called the Cache-Aside pattern. It works as follows:

    • Miss: the application looks for data in the cache, does not find it, reads it from the database, and on success puts it into the cache.
    • Hit: the application reads the data from the cache and returns it.
    • Update: the application saves the data to the database first, and only after that succeeds does it invalidate the cache entry.
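The three cases above can be sketched over in-memory maps (class and method names are illustrative, not part of the pattern's definition):

```java
import java.util.HashMap;
import java.util.Map;

// Minimal Cache-Aside sketch: reads fill the cache on a miss,
// updates write the database first and then invalidate the cache.
public class CacheAside {
    static final Map<String, String> db = new HashMap<>();
    static final Map<String, String> cache = new HashMap<>();

    static String read(String key) {
        String v = cache.get(key);
        if (v != null) return v;          // hit: serve from cache
        v = db.get(key);                  // miss: load from the database
        if (v != null) cache.put(key, v); // and populate the cache
        return v;
    }

    static void update(String key, String value) {
        db.put(key, value);               // save to the database first
        cache.remove(key);                // then invalidate the cache entry
    }
}
```

Note that `update` deletes rather than refreshes the cache entry; the next `read` miss repopulates it from the database.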

In addition, Facebook's paper "Scaling Memcache at Facebook" also describes their use of the update-the-database-first, then-delete-the-cache strategy.
Is there a concurrency problem in this situation?
In theory, yes. Suppose request A performs a query and request B performs an update; the following can happen:
(1) The cache entry has just expired
(2) Request A queries the database and gets the old value
(3) Request B writes the new value to the database
(4) Request B deletes the cache
(5) Request A writes the old value it read into the cache
If this happens, dirty data does occur.
But how likely is this scenario?
It requires step (3)'s database write to take less time than step (2)'s database read, so that step (4) can precede step (5). However, database reads are generally much faster than writes (that speed difference is the whole premise of read-write separation), so step (3) finishing before step (2) is very unlikely, and this situation is hard to hit in practice.
Suppose someone insists on eliminating even this case. How do you solve the concurrency problem?
First, setting an expiration time on the cache is one option. Second, use the asynchronous delayed deletion strategy from scheme (2): delete the cache again after the read request has had time to complete.
Is there anything else that causes inconsistency?
Yes, and it is a problem shared by update strategies (2) and (3): if the cache deletion fails, inconsistency results. For example, a write request updates the database but then fails to delete the cache; the two are now inconsistent. This is also the question left open at the end of strategy (2).
How to solve it?
Provide a retry mechanism that guarantees the delete eventually happens. Here are two schemes.
Scheme one

The process is as follows:
(1) Update the database
(2) The cache delete fails for some reason
(3) Send the key that needs deleting to a message queue
(4) Consume the message and extract the key to delete
(5) Keep retrying the delete operation until it succeeds
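The steps above can be sketched as follows. All names are invented for illustration: a Deque stands in for the real message queue, and a counter simulates transient delete failures.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// A failed delete publishes the key to a queue; a consumer retries
// until the delete succeeds.
public class DeleteRetry {
    static final Map<String, String> cache = new ConcurrentHashMap<>();
    static final Deque<String> queue = new ArrayDeque<>();     // stand-in for a message queue
    static final AtomicInteger failuresLeft = new AtomicInteger(2);

    static boolean tryDelete(String key) {
        if (failuresLeft.getAndDecrement() > 0) return false;  // simulate transient failures
        cache.remove(key);
        return true;
    }

    static void onWrite(String key) {                          // step (3): publish on failure
        if (!tryDelete(key)) queue.addLast(key);
    }

    static void consume() {                                    // steps (4)-(5): retry until success
        while (!queue.isEmpty()) {
            String key = queue.peekFirst();
            if (tryDelete(key)) queue.pollFirst();             // success: ack the message
        }
    }
}
```

A real consumer would also add backoff and a retry cap instead of looping indefinitely, but the shape of the mechanism is the same.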
However, this scheme has a drawback: it intrudes heavily on business code. Hence scheme two: start a subscriber program that subscribes to the database's binlog to obtain the data that needs handling, and have a separate, non-business program consume that information and perform the cache deletes.
Scheme two

The process is as follows:
(1) Update the database
(2) The database writes the operation to its binlog
(3) The subscriber program extracts the affected data and keys from the binlog
(4) A separate, non-business program receives this information
(5) It tries to delete the cache entry, and the delete fails
(6) It sends the information to a message queue
(7) The key is re-obtained from the message queue and the delete is retried
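Steps (3) through (6) can be sketched with a hypothetical subscriber. To be clear, BinlogEvent, the key-derivation rule, and the retry queue here are all invented for illustration; this is not the actual API of canal or any real binlog tool.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// A subscriber turns binlog row-change events into cache keys to
// invalidate; failed deletes are pushed onto a retry queue.
public class BinlogInvalidator {
    static class BinlogEvent {                 // simplified row-change event
        final String table, key;
        BinlogEvent(String table, String key) { this.table = table; this.key = key; }
    }

    static final Map<String, String> cache = new HashMap<>();
    static final Deque<String> retryQueue = new ArrayDeque<>(); // stand-in for a message queue

    static boolean tryDelete(String key) {     // always succeeds in this sketch
        cache.remove(key);
        return true;
    }

    static void onBinlog(List<BinlogEvent> events) {
        for (BinlogEvent e : events) {
            String cacheKey = e.table + ":" + e.key;            // derive the cache key
            if (!tryDelete(cacheKey)) retryQueue.addLast(cacheKey); // step (6) on failure
        }
    }
}
```

The key point of the design is that none of this code lives in the business write path; it reacts to the binlog after the fact.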

Note: for subscribing to MySQL's binlog there is ready-made middleware called canal that implements the subscription. For Oracle, I am not currently aware of an off-the-shelf equivalent. As for the retry mechanism, I use a message queue; if your consistency requirements are not very strict, it is also fine to simply retry from another thread inside the application. Feel free to adapt this; it is only meant to provide a way of thinking.

Summary

This article is a summary of the consistency schemes that already exist in the industry. For the delete-cache-first, then-update-database strategy, there are also schemes that maintain an in-memory queue; I looked at them and found the implementation complex and not worth it, so they are not covered here. Finally, I hope you got something out of this.

References

1. Master-slave DB and cache consistency
2. Cache update routines
