"58 Shen Jian Architecture Series": Master-Slave DB and Cache Consistency


This article mainly discusses two questions:

(1) Why does master-slave replication delay cause inconsistency between the cache and the database?

(2) What are the optimization ideas and solutions?

I. The Origin of the Demand

The previous article, "Two or Three Details of Cache Architecture Design," included a small optimization point: when there is only a single primary database, "serialization" can resolve inconsistencies between the cache and the database. The point that sparked heated discussion was this: "under a master-slave, read/write-separated database architecture, dirty data can still get into the cache, so the serialization scheme no longer applies." That is the topic of this article.

II. Why the Data Becomes Inconsistent

Dirty data can be read into the cache in several situations:

(1) With a single database, concurrent reads and writes at the service layer interleave their cache and database operations


Although there is only one DB, dirty data can still enter the cache under the abnormal sequence described above:

1) Request A initiates a write. Its first step is to evict the cache, after which the request stalls at the service layer for some reason (e.g. a heavy business-logic computation taking 1 second): Step 1

2) Request B initiates a read, reads the cache, and misses: Step 2

3) Request B then reads the DB, gets the old (dirty) value, and writes it into the cache: Step 3

4) Request A, after its long stall, finally writes the latest data to the database: Step 4

Although this situation is rare, it is theoretically possible: the later request B completes entirely within the window of the earlier request A.
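The four-step race above can be replayed sequentially, with plain dictionaries standing in for the cache and the single database (a minimal sketch; the key "x" and the values are illustrative):

```python
# Plain dicts stand in for the cache and the single database.
cache = {"x": "old"}
db = {"x": "old"}

cache.pop("x", None)        # Step 1: request A evicts the cache, then stalls

value = cache.get("x")      # Step 2: request B reads the cache and misses
assert value is None

cache["x"] = db["x"]        # Step 3: request B reads the DB (still old) and
                            #         fills the cache with the dirty value

db["x"] = "new"             # Step 4: request A finally writes the latest data

print(cache["x"], db["x"])  # prints: old new  (the cache is now stale)
```

Running this end to end leaves `cache["x"] == "old"` while `db["x"] == "new"`, which is exactly the inconsistency described above.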

(2) With master-slave replication and read/write separation, reads from the replica return stale data

When the database architecture uses master-slave replication with read/write separation, the more common way dirty data enters the cache is this:


1) Request A initiates a write; its first step is to evict the cache: Step 1

2) Request A writes the latest data to the (master) database: Step 2

3) Request B initiates a read, reads the cache, and misses: Step 3

4) Request B then reads the DB from the replica; because master-slave replication has not yet completed, it reads stale data and writes it into the cache: Step 4

5) Master-slave replication finally completes: Step 5

In this case, the timing of requests A and B is entirely normal; the inconsistency is caused by replication delay (assume a 1-second lag), during which the read request fetches stale data from the replica.
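The five steps above can likewise be replayed sequentially, with three dictionaries standing in for the cache, the master, and a lagging replica (again a minimal sketch with illustrative names):

```python
# Dicts stand in for the cache, the master, and a lagging replica.
cache = {"x": "old"}
master = {"x": "old"}
replica = {"x": "old"}

cache.pop("x", None)            # Step 1: request A evicts the cache
master["x"] = "new"             # Step 2: request A writes the master

assert cache.get("x") is None   # Step 3: request B reads the cache, misses

cache["x"] = replica["x"]       # Step 4: request B reads the lagging replica
                                #         and fills the cache with stale data

replica["x"] = master["x"]      # Step 5: replication finally completes

print(cache["x"], master["x"])  # prints: old new
```

Note that every participant behaved correctly; only the replication lag between Steps 2 and 5 lets the stale value into the cache.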

So how can this be optimized?

III. Optimization Ideas for Inconsistency

Some readers suggest "operate the database first, then evict the cache." That does not work, as explained in the earlier article "Cache and Database: Which to Operate First."

The root cause of the inconsistency is:

(1) With a single database, during the ~1 s of service-layer computation, a read may load the old data into the cache

(2) With master-slave replication and read/write separation, during the ~1 s replication delay, a read may load the old data into the cache

Since the old data enters the cache during that 1 s window, could the write request, after completing, sleep for 1 s and then evict the cache again, so that any dirty data cached within that 1 s is eliminated?

The answer is yes.

The write request is upgraded from 2 steps to 3 steps:

(1) Evict the cache first

(2) Then write the database (these two steps are unchanged)

(3) Sleep for 1 second, then evict the cache again
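The three-step write path can be sketched as follows, with a shortened sleep standing in for the 1 s lag window (all names are illustrative):

```python
import time

LAG = 0.1  # stands in for the article's 1 s example

cache = {"x": "old"}  # pretend a stale entry is already present
db = {"x": "old"}

def write(key, value):
    cache.pop(key, None)   # (1) evict the cache first
    db[key] = value        # (2) then write the database
    time.sleep(LAG)        # (3) sleep out the dirty-data window...
    cache.pop(key, None)   #     ...and evict the cache again

write("x", "new")
print("x" in cache, db["x"])  # prints: False new
```

The sleep happens inside `write()`, which is precisely the drawback discussed next: the caller is blocked for the full lag window.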

This way, any dirty data cached within that 1 second is evicted again. The problem, however, is:

(1) Every write request is blocked for 1 second, which greatly reduces write throughput and increases latency; the business cannot accept this

On reflection, the second cache eviction is done "to ensure cache consistency," not to meet a "business requirement," so there is no need to wait synchronously. It can be done asynchronously with a timer, or via a message bus:


The write request is upgraded from 2 steps to 2.5 steps:

(1) Evict the cache first

(2) Then write the database (these two steps are unchanged)

(2.5) Instead of sleeping for 1 s, send a message to the message bus (ESB) and return immediately after sending

This way, write-request latency barely increases. Because the method evicts the cache twice, it is called the "cache double elimination" method. Its cost is one extra cache miss, which is almost negligible.

Downstream, a consumer asynchronously evicts the cache: upon receiving the message, it waits 1 s and then evicts the key. Even if dirty data entered the cache during that 1 s, it gets a chance to be eliminated again.
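A minimal sketch of the asynchronous variant, using an in-process `queue.Queue` as a stand-in for the message bus (ESB) and a background thread as the downstream consumer; delays are shortened for illustration:

```python
import queue
import threading
import time

LAG = 0.1  # stands in for the 1 s replication-lag window
cache = {}
db = {"x": "old"}
bus = queue.Queue()  # in-process stand-in for the message bus (ESB)

def consumer():
    # Downstream eliminator: wait out the lag window, then evict again.
    while True:
        key = bus.get()
        time.sleep(LAG)
        cache.pop(key, None)

threading.Thread(target=consumer, daemon=True).start()

def write(key, value):
    cache.pop(key, None)   # (1) evict the cache
    db[key] = value        # (2) write the database
    bus.put(key)           # (2.5) publish the key; return immediately

write("x", "new")
cache["x"] = "old"         # a racing read re-fills the cache with stale data
time.sleep(LAG * 3)        # give the consumer time to run
print("x" in cache)        # prints: False (the stale entry was evicted again)
```

`write()` returns as soon as the message is enqueued, so its latency barely grows; the second eviction happens off the critical path.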

The above scheme has a drawback: it adds a step to the online write path. Is there a solution with no intrusion into the business code at all? There is. As also mentioned in "On Data Consistency of Redundant Tables," the cache can be evicted asynchronously offline by analyzing the binlog:


The business code does not need to change at all; an offline "read-binlog" module is added, which reads the binlog data and asynchronously evicts the cache.
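A sketch of the idea, assuming a hypothetical stream of already-parsed row-change events (the event shape and the `key_for()` mapping are assumptions, not a real binlog format; real deployments tail the MySQL binlog with tools such as canal, Maxwell, or Debezium):

```python
# Hypothetical, already-parsed row-change events from a binlog tailer.
cache = {"user:1": {"name": "old"}, "user:2": {"name": "ok"}}

binlog_events = [
    {"type": "UPDATE", "table": "user", "pk": 1},  # assumed event shape
]

def key_for(event):
    # Application-specific mapping from a row change to its cache key.
    return f"{event['table']}:{event['pk']}"

for event in binlog_events:
    if event["type"] in ("INSERT", "UPDATE", "DELETE"):
        cache.pop(key_for(event), None)  # asynchronous offline eviction

print(sorted(cache))  # prints: ['user:2']
```

Because the eviction module is driven purely by the replication stream, the online write path stays at its original two steps.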

Question: why does the text keep saying 1 s? Where does that 1 s come from?

Answer: 1 s is only an example. Set this value based on your business's data volume and concurrency, by observing the actual master-slave replication delay. For example, if the replication delay is 200 ms, setting the asynchronous eviction delay to 258 ms is fine.

IV. Summary

When "abnormal timing" or "reading from the replica" lets dirty data into the cache, the "cache double elimination" method, i.e. a second, asynchronous eviction, can resolve the inconsistency between the cache and the database. It can be implemented in at least three ways:

(1) Timer-based asynchronous eviction (not detailed in this article; essentially a dedicated thread performs the second eviction asynchronously)

(2) Message-bus asynchronous eviction

(3) Read-binlog asynchronous eviction

This article is reproduced from the WeChat public account "The Road of the Architect."

