Cache and Database Double-Write Consistency

Source: Internet
Author: User
Tags: message queue, delete cache

Introduction

When a project introduces a cache, every update to the underlying data raises the same question: should we delete the cache first and then update the database, update the database first and then delete the cache, or update the database and then update the cache? Below we discuss the pros and cons of these three options.

Purpose
    1. To organize my own knowledge of this area;
    2. To share my views and learn together with readers;
    3. To explain the topic from a beginner's perspective.

Cache Update Policy
    1. Update the database first, and then update the cache;
    2. Update the database first, and then delete the cache;
    3. Delete the cache first, and then update the database;

Update the database first and then update the cache

In the author's view, this approach is unsuitable for most scenarios, for the following reasons:

1. Wasted resources

On large content sites (blogs, forums), the cache mainly holds hot data (data requested frequently). If many updates target cold data (data rarely or never accessed) and every update also refreshes the cache, a large amount of cache space is wasted on entries that will rarely, if ever, be hit.

2. Dirty data

Dirty data arises from concurrent operations. Suppose two requests, A and B, both update the same data; due to network delays, the following interleaving is possible:

    1. Request A updates the database;
    2. Request B updates the database;
    3. Request B updates the cache;
    4. Request A updates the cache.

The result is that A's data overwrites B's in the cache, producing dirty data. Without a cache expiry mechanism, the stale value persists until the next update of that record: users see wrong data in the meantime, and although a later update will eventually correct it, the pattern generates unnecessary requests. When write volume is high, this easily produces many unnecessary cache-update requests.
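As a minimal sketch of the interleaving above (plain Python dicts standing in for the database and the cache, with the network delay made explicit by statement order), request A's write reaches the database first but the cache last:

```python
# Hypothetical in-memory stand-ins for the database and the cache.
db, cache = {}, {}

db["x"] = "A"        # 1. Request A updates the database
db["x"] = "B"        # 2. Request B updates the database
cache["x"] = "B"     # 3. Request B updates the cache
cache["x"] = "A"     # 4. Request A's delayed cache update arrives last

# The cache now disagrees with the database: dirty data.
assert db["x"] == "B" and cache["x"] == "A"
```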

3. Request time

If the cached value is not a simple copy of the data but requires a more complex computation, then every update request must also spend time recomputing the cached value. This lengthens response times, increases the system's load, and reduces its throughput.

4. Frequent writes

In scenarios with many write requests and very few reads, the cache contributes little: it is updated frequently but rarely read, wasting resources. For example:

    1. Data A is modified, and cache A is generated;
    2. No request reads data A;
    3. Data A is modified, and cache A is updated;
    4. No request reads data A;
    5. Data A is modified;
    6. A request finally reads data A.

The cache is repeatedly updated even though nobody reads it. With many users, this produces a large number of unnecessary operations and wastes system resources.
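The waste can be made concrete with a toy write-through counter (a hypothetical sketch, not any particular cache's API): every modification refreshes the cache, but only the final read benefits from it.

```python
cache_writes = 0   # times the cache was refreshed on a write
reads_served = 0   # times the cache actually served a read

# The six-step sequence above collapses to three writes and one read.
for op in ["write", "write", "write", "read"]:
    if op == "write":
        cache_writes += 1   # update the database and refresh the cache
    else:
        reads_served += 1   # only now does a refresh pay off

# Two of the three cache refreshes were never read before being overwritten.
assert cache_writes == 3 and reads_served == 1
```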

Suitable scenarios

Of course, this is not to say the strategy should never be used; it exists for a reason. It suits scenarios where:

    1. Read requests make up 99% of the site's traffic;
    2. The data set is small (hundreds of thousands of articles);
    3. Data is rarely updated (articles, once written, seldom change).

Case
    1. Personal blog
    2. Handbook sites (w3cschool and similar tutorial sites)

Update the database first, and then delete the cache

This strategy is used by many platforms, Facebook among them. But it has some problems, such as:

1. Dirty data

Dirty data here is mainly caused by concurrency, for example:

    1. User A requests data A;
    2. The cache entry for data A has expired;
    3. User A reads the old value of data A from the database;
    4. User B updates data A in the database (new data);
    5. User B deletes the cache;
    6. User A writes the old value it read back into the cache.

Dirty data is produced at this point. Although the probability is very small, for a site that rarely updates its data the resulting stale entry is a serious problem, since no later update will correct it.
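The read-miss race above can be sketched the same way (hypothetical dicts for the database and cache; statement order models the timing):

```python
db = {"a": "old"}
cache = {}                       # the cache entry has already expired

# 1-3. User A misses the cache and reads the old value from the database.
value_read_by_A = db["a"]

# 4-5. User B updates the database and deletes the (already empty) cache entry.
db["a"] = "new"
cache.pop("a", None)

# 6. User A's delayed write puts the stale value back into the cache.
cache["a"] = value_read_by_A

# The stale entry will now persist until something else evicts it.
assert db["a"] == "new" and cache["a"] == "old"
```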

2. Cache deletion failure
    1. User A updates data A;
    2. User A fails to delete the cache entry for data A;
    3. User B reads the old cached value of data A.

At this point, the problem of inconsistent data arises.

Solutions

1. Set an expiry time on the cache (the simplest approach)

Advantages:

    • Simple
    • Easy to operate

Disadvantages:

    • Stale data may be served for a short period
    • If the data set is large and the expiry time is short, many entries can expire in the same window; database load then spikes, triggering a cache avalanche (randomizing the expiry time reduces the likelihood of an avalanche)
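The randomized-expiry idea can be sketched in a few lines (the base TTL and jitter values below are assumptions for illustration): spreading expiry times out keeps many keys from expiring at once, which is what turns a cold cache into a database stampede.

```python
import random

BASE_TTL = 600   # assumed base expiry in seconds
JITTER = 120     # assumed random spread added on top

def ttl_with_jitter():
    # Each key gets a slightly different lifetime, so expirations
    # are scattered over a two-minute window instead of one instant.
    return BASE_TTL + random.randint(0, JITTER)

ttls = [ttl_with_jitter() for _ in range(1000)]
assert all(BASE_TTL <= t <= BASE_TTL + JITTER for t in ttls)
```

With a real cache such as Redis, the returned value would be passed as the key's expiry (e.g. the `EX` option of `SET`).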

2. Message queue (more complex; requires introducing a message queue system)

Steps:

    1. Update the database;
    2. Attempt to delete the cache, and the deletion fails;
    3. Send the key that needs deleting to the message queue;
    4. A separate consumer periodically pulls keys to delete from the queue;
    5. Retry the deletion until it succeeds.
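The steps above can be sketched with Python's standard `queue.Queue` standing in for the message queue, and a deliberately flaky delete function simulating transient failures (all names here are illustrative):

```python
from queue import Queue

cache = {"user:1": "stale"}
retry_queue = Queue()

attempts = {"n": 0}

def delete_from_cache(key):
    # Simulate two transient failures before the delete succeeds.
    attempts["n"] += 1
    if attempts["n"] < 3:
        return False
    cache.pop(key, None)
    return True

# Step 1 (update the database) is not shown.
if not delete_from_cache("user:1"):   # step 2: the synchronous delete fails
    retry_queue.put("user:1")         # step 3: enqueue the key

while not retry_queue.empty():        # steps 4-5: a worker retries until done
    key = retry_queue.get()
    if not delete_from_cache(key):
        retry_queue.put(key)

assert "user:1" not in cache
```

A production version would use a real broker (Kafka, RabbitMQ, etc.) so the retry survives process crashes; the in-process queue here only illustrates the control flow.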

Advantages:

    • Does not trigger a cache avalanche
    • Delete only the cache that needs to be deleted

Disadvantages:

    • Introduction of the message system (increased complexity of the system)

Delete the cache first and then update the database

This approach is also widely used, but it too can produce dirty data:

Causes
    1. User A fails to delete the cache;
    2. User A successfully updates the database.

Or

    1. User A deletes the cache;
    2. User B reads the cache and misses;
    3. User B reads the old data from the database;
    4. User B repopulates the cache;
    5. User A updates the database.

Both of these conditions can result in dirty data.
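The second interleaving can be sketched like the earlier races (hypothetical dicts; statement order models the timing): user B slips in between A's cache delete and A's database write.

```python
db = {"a": "old"}
cache = {"a": "old"}

cache.pop("a", None)           # 1. User A deletes the cache
missed = "a" not in cache      # 2. User B reads the cache and misses
value_read_by_B = db["a"]      # 3. User B reads the old value from the database
cache["a"] = value_read_by_B   # 4. User B repopulates the cache
db["a"] = "new"                # 5. User A's update finally reaches the database

# The cache holds the old value again: dirty data.
assert missed and db["a"] == "new" and cache["a"] == "old"
```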

Solutions

1. Set an expiry time on the cache (the simplest approach)

Advantages:

    • Simple
    • Easy to operate

Disadvantages:

    • Stale data may be served for a short period
    • If the data set is large and the expiry time is short, many entries can expire in the same window; database load then spikes, triggering a cache avalanche (randomizing the expiry time reduces the likelihood of an avalanche)

2. Message queue
    1. Evict the cache first;
    2. Update the database;
    3. Send the key that needs evicting to the message queue;
    4. Another program pulls keys from the message queue;
    5. Retry deleting each key until the deletion succeeds.
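These steps amount to a second, asynchronous delete after the database write, which clears any stale entry that a concurrent reader wrote back in the meantime. A minimal sketch (illustrative names; `queue.Queue` stands in for the message queue):

```python
from queue import Queue

db, cache = {"a": "old"}, {"a": "old"}
mq = Queue()

cache.pop("a", None)     # 1. evict the cache first
db["a"] = "new"          # 2. update the database
cache["a"] = "old"       # (a concurrent reader writes stale data back)
mq.put("a")              # 3. enqueue the key for a second delete

while not mq.empty():    # 4-5. a worker deletes until the key is gone
    cache.pop(mq.get(), None)

# The stale entry written back by the reader has been cleared.
assert "a" not in cache and db["a"] == "new"
```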

Advantages:

    • Guaranteed deletion of the cache
    • Does not increase the update request's processing time
    • Does not trigger a cache avalanche

Disadvantages:

    • Adds an extra cache miss (negligible)
    • Introduction of the message system (increased complexity of the system)

Summary
    1. Choose among the above methods according to your own system's business needs;
    2. Each exists for a reason;
    3. When using cache expiry times, watch out for the cache avalanche problem;
    4. A message system can be introduced to avoid dirty data;
    5. There is also the approach of parsing the binlog to delete the cache asynchronously (the author has not studied it, so it is not covered here).

