Cache and database consistency: cache penetration, cache avalanche, key rebuild scheme

Source: Internet
Author: User
Tags: mutex, redis, redis cluster

Cache penetration means querying data that does not exist at all, so neither the cache layer nor the storage layer is hit. Usually, for fault tolerance, if the data cannot be found in the storage layer, nothing is written to the cache layer. As Figure-1 shows, the whole process has the following 3 steps:

    1. The cache layer is not hit.
    2. The storage layer is not hit, so the empty result is not written back to the cache.
    3. The empty result is returned.

Cache penetration causes nonexistent data to be queried from the storage layer on every request, so the cache no longer fulfils its purpose of protecting the back-end storage.

Figure-1: Cache penetration Model
Cache penetration increases the load on the back-end storage. Because many back-end stores cannot handle high concurrency, it can even bring the back-end storage down. You can usually count, in the program, the total number of calls, the number of cache layer hits, and the number of storage layer hits; if you find a large number of empty hits at the storage layer, there may be a cache penetration problem.
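The counting described above can be sketched in a few lines of Python. This is only an illustration under assumptions: the `CacheStats` class, the `lookup` function, and the use of plain dicts for the cache and storage layers are hypothetical names and stand-ins, not part of the original article.

```python
from dataclasses import dataclass


@dataclass
class CacheStats:
    """Counters for spotting cache penetration (all names are illustrative)."""
    total_calls: int = 0
    cache_hits: int = 0
    storage_hits: int = 0
    storage_empty_hits: int = 0

    def empty_hit_ratio(self) -> float:
        # Share of all calls that missed both the cache and the storage layer.
        if self.total_calls == 0:
            return 0.0
        return self.storage_empty_hits / self.total_calls


def lookup(key, cache: dict, storage: dict, stats: CacheStats):
    """Read through the cache and record where the request ended up."""
    stats.total_calls += 1
    if key in cache:
        stats.cache_hits += 1
        return cache[key]
    value = storage.get(key)
    if value is None:
        stats.storage_empty_hits += 1   # candidate cache penetration
        return None
    stats.storage_hits += 1
    cache[key] = value
    return value
```

A persistently high `empty_hit_ratio()` is the signal to investigate; what counts as "high" depends on the application and is not specified here.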
There are two basic causes of cache penetration. First, problems in the business's own code or data. Second, malicious attacks, crawlers, and the like that cause a large number of empty hits. Let's look at how to solve the cache penetration problem.

Second, solutions to cache penetration

1) Cache empty objects

When the storage layer is not hit in step 2, the empty object is still stored in the cache layer. Subsequent requests for that data are then served from the cache, which protects the back-end data source.

There are two problems with caching empty objects:
First, caching null values means more keys are stored in the cache layer, which requires more memory (the problem is worse under attack). A fairly effective remedy is to set a short expiration time on such data so that it is removed automatically.
Second, the data in the cache layer and the storage layer will be inconsistent for a period of time, which may affect the business. For example, suppose the expiration time is set to 5 minutes: if the storage layer adds this data during that window, the cache layer and the storage layer will be inconsistent. In that case the empty object in the cache layer can be purged through a message system or some other mechanism.
The following is an implementation pseudo-code that caches an empty object:
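A minimal Python sketch of the idea, offered as an illustration rather than the original pseudo-code. It assumes an in-memory `TTLCache` class as a stand-in for the cache layer and a dict-like storage layer; the sentinel `NULL_MARKER` and the TTL constants are hypothetical values chosen for the example (the 5-minute figure follows the text above).

```python
import time

NULL_MARKER = "__NULL__"      # hypothetical sentinel for "storage has no such key"
NULL_TTL_SECONDS = 300        # short expiration for empty objects (5 minutes)
DATA_TTL_SECONDS = 3600       # assumed expiration for real data


class TTLCache:
    """In-memory stand-in for a cache with per-key expiration."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._items = {}          # key -> (value, expires_at)

    def get(self, key):
        entry = self._items.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._items[key]  # lazily evict expired entries
            return None
        return value

    def set(self, key, value, ttl):
        self._items[key] = (value, self._clock() + ttl)

    def delete(self, key):
        self._items.pop(key, None)


def get_with_null_caching(key, cache, storage):
    """Return the value for key, caching an empty marker when storage misses."""
    cached = cache.get(key)
    if cached == NULL_MARKER:
        return None               # known-missing key: do not touch storage
    if cached is not None:
        return cached
    value = storage.get(key)
    if value is None:
        cache.set(key, NULL_MARKER, NULL_TTL_SECONDS)
        return None
    cache.set(key, value, DATA_TTL_SECONDS)
    return value
```

If the storage layer later gains the data, calling `cache.delete(key)` from whatever writes it (for example, a message consumer) removes the stale empty object before its TTL runs out.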

2) Bloom filter interception

Before the cache layer and the storage layer are accessed, the keys that exist are saved in a Bloom filter in advance, which acts as a first layer of interception.

For example, a personalized recommendation system has 400 million user IDs. Every hour, algorithm engineers compute personalized results from each user's historical behavior and write them to the storage layer. The newest users have no historical behavior, so requests for them cause cache penetration. To handle this, all users that have personalized recommendation data can be put into a Bloom filter. If the Bloom filter considers that a user ID does not exist, the storage layer is not accessed, which protects the storage layer to some extent.
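To get a feel for the cost of such a filter, the standard Bloom filter sizing formulas can be evaluated for 400 million keys. The 1% false positive rate below is an assumption made for this illustration; the article does not state a target rate, and `bloom_parameters` is a hypothetical helper name.

```python
import math


def bloom_parameters(expected_items: int, false_positive_rate: float):
    """Return (bits, hash_count) from the standard Bloom filter sizing formulas."""
    ln2 = math.log(2)
    # m = -n * ln(p) / (ln 2)^2
    bits = math.ceil(-expected_items * math.log(false_positive_rate) / (ln2 ** 2))
    # k = (m / n) * ln 2
    hash_count = max(1, round(bits / expected_items * ln2))
    return bits, hash_count


bits, hash_count = bloom_parameters(400_000_000, 0.01)
print(f"{bits / 8 / 1e6:.0f} MB, {hash_count} hash functions")
```

At about 9.6 bits per key, this comes to roughly 480 MB and 7 hash functions, far less than caching an entry for each of the 400 million IDs.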
Development tips:
For background on Bloom filters, refer to: Bloom filter concept and principle

You can implement a Bloom filter with Redis Bitmaps; GitHub has open-source implementations of this approach that readers can refer to:

Using a Bloom filter to solve the penetration problem
This approach suits scenarios where the data hit rate is low, the data is relatively fixed, and real-time requirements are low (usually large datasets). Code maintenance is more complex, but less cache space is used.
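As a rough sketch of the interception step, the following Python keeps the bit array in process memory. It is an illustration under assumptions: the `BloomFilter` class, the salted SHA-256 hashing scheme, and `get_recommendation` are hypothetical, and the bit operations only mirror what SETBIT and GETBIT would do against a Redis bitmap.

```python
import hashlib


class BloomFilter:
    """Tiny in-memory Bloom filter; the bit array plays the role of a Redis bitmap."""

    def __init__(self, size_bits: int, hash_count: int):
        self.size_bits = size_bits
        self.hash_count = hash_count
        self.bits = bytearray((size_bits + 7) // 8)

    def _positions(self, key: str):
        # Derive hash_count bit positions by salting SHA-256 with an index.
        for i in range(self.hash_count):
            digest = hashlib.sha256(f"{i}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)       # like SETBIT key pos 1

    def might_contain(self, key: str) -> bool:
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos // 8] & (1 << (pos % 8))   # like GETBIT key pos
                   for pos in self._positions(key))


def get_recommendation(user_id, bloom, cache, storage):
    """Reject unknown users before they reach the cache or storage layer."""
    if not bloom.might_contain(user_id):
        return None
    value = cache.get(user_id)
    if value is None:
        value = storage.get(user_id)
        if value is not None:
            cache[user_id] = value
    return value
```

Because a Bloom filter can report false positives but never false negatives, a small share of nonexistent IDs still reaches the storage layer, while no existing ID is ever rejected. Keys cannot be removed from this simple form of the filter, which is one reason it fits relatively fixed data.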

Comparison of the two solutions

The above are two solutions to the cache penetration problem (in fact this is an open question with many possible solutions). The following table compares the two solutions in terms of applicable scenarios and maintenance cost.
Comparison of the cache-empty-object and Bloom filter solutions

Third, cache avalanche optimization

What a cache avalanche is can be stated clearly: the cache layer carries a large number of requests and so effectively protects the storage layer, but if the cache layer as a whole cannot provide service for some reason, all requests reach the storage layer, the number of storage layer calls surges, and the storage layer may go down as well. The English term for cache avalanche is "stampeding herd": after the cache layer goes down, traffic charges at the back-end storage like a herd of stampeding bison.

Avalanche caused by unavailability of cache layer
To prevent and solve the cache avalanche problem, you can start with the following three aspects.
1) Ensure high availability of the cache layer service.
Just as an aircraft has multiple engines, a cache layer designed for high availability can keep serving even if individual nodes, individual machines, or even an entire machine room go down. The previously described Redis Sentinel and Redis Cluster both provide high availability.
2) Use a dependency isolation component to rate-limit and degrade calls to the back end.
Both the cache layer and the storage layer can fail, and both can be treated as resources. In a highly concurrent system, if one resource is unavailable, threads may hang on it and make the whole system unusable. Degradation is quite normal in highly concurrent systems: for example, in a recommendation service, if the personalized recommendation service is unavailable, you can degrade to serving hot-item data instead, so that the front page does not show a blank area.
In real projects, important resources such as Redis, MySQL, HBase, and external interfaces need to be isolated so that each resource runs in its own thread pool; even if an individual resource has problems, other services are not affected. However, managing the thread pools (how to close a resource pool, open it, and manage its thresholds) is still quite complex. A Java dependency isolation tool, Hystrix, is recommended here.
Hystrix is a powerful tool for dependency isolation, but it is beyond the scope of this book and applies only to Java applications, so it is not covered in detail here.
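The underlying idea (one thread pool per resource, a timeout, and a degraded fallback) can be sketched in Python with the standard library. This is not Hystrix and not its API; `IsolatedResource` and the sample functions are hypothetical names, and the pool size and timeout are arbitrary values for the illustration.

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout


class IsolatedResource:
    """Run calls to one dependency in its own small thread pool, with a fallback."""

    def __init__(self, name: str, max_workers: int, timeout_seconds: float):
        self.name = name
        self.timeout_seconds = timeout_seconds
        self._pool = ThreadPoolExecutor(max_workers=max_workers,
                                        thread_name_prefix=name)

    def call(self, func, fallback):
        future = self._pool.submit(func)
        try:
            return future.result(timeout=self.timeout_seconds)
        except FutureTimeout:
            future.cancel()          # only helps if the call has not started yet
            return fallback()
        except Exception:
            return fallback()        # degrade on any failure of the dependency

    def shutdown(self):
        self._pool.shutdown(wait=False)


def personalized_recommendations():
    raise ConnectionError("recommendation service unavailable")  # simulated outage


def slow_query():
    time.sleep(0.5)                  # simulated hung dependency
    return "late result"


def hot_items():
    return ["hot-1", "hot-2", "hot-3"]   # degraded, non-personalized result


recommender = IsolatedResource("recommender", max_workers=4, timeout_seconds=0.1)
print(recommender.call(personalized_recommendations, fallback=hot_items))
print(recommender.call(slow_query, fallback=hot_items))
recommender.shutdown()
```

Both calls return the hot-item list: the first because the dependency raises, the second because it exceeds the timeout. A slow dependency can exhaust only its own pool, so callers of other resources keep their threads.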

3) Rehearse in advance. Before the project goes live, simulate a cache layer outage, observe the load on the application and the back end and the problems that may arise, and prepare contingency plans on that basis.

Fourth, hotspot key cache rebuild optimization

Developers use a cache plus expiration policy both to speed up data reads and writes and to ensure the data is refreshed regularly; this pattern satisfies most requirements. But two problems, if they occur at the same time, can be fatal to the application:

    1. The key is a hotspot key (for example, a popular piece of entertainment news) with very high concurrency.
    2. Rebuilding the cache cannot be completed in a short time; it may involve complex computation, such as complex SQL, multiple IO operations, or multiple dependencies.

At the instant the cache expires, a large number of threads rebuild the cache, which increases the back-end load and may even crash the application.

A large number of threads rebuild the cache after the hotspot key fails
Solving this problem is not very complicated, but the solution must not bring more trouble to the system, so the following goals need to be set:

  1. Reduce the number of cache rebuilds
  2. Keep data as consistent as possible
  3. Keep potential risk low

1) Mutual exclusion lock (mutex key)
This method allows only one thread to rebuild the cache; the other threads wait for that thread to finish rebuilding and then read the data from the cache again. The whole process is as follows:

Rebuilding a cache with a mutex lock
The following code uses the Redis setnx command to achieve this functionality.

(1) Get the data from Redis. If the value is not empty, return it directly; otherwise execute (2.1) or (2.2).
(2.1) If set (nx and ex) returns true, no other thread is rebuilding the cache at this moment, so the current thread executes the cache rebuild logic.
(2.2) If set (nx and ex) returns false, another thread is already rebuilding the cache, so the current thread sleeps for a specified time (for example, 50 milliseconds, depending on how fast the cache can be rebuilt) and then re-executes the function until the data is obtained.
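Steps (1), (2.1), and (2.2) can be sketched in Python as follows. This is an illustration under assumptions, not the original code: `FakeRedis` is a hypothetical in-memory stand-in whose `set(..., nx=True, ex=...)` method imitates the Redis SET command with the NX and EX options, and the `mutex:key:` prefix and the TTL defaults are arbitrary. The sketch also re-reads the key after winning the lock, a small addition to the steps above that stops a late thread from rebuilding a value that was just written.

```python
import threading
import time


class FakeRedis:
    """Thread-safe in-memory stand-in offering get / set(nx, ex) / delete."""

    def __init__(self):
        self._lock = threading.Lock()
        self._data = {}                      # key -> (value, expires_at or None)

    def _live(self, key):
        entry = self._data.get(key)
        if entry and entry[1] is not None and time.monotonic() >= entry[1]:
            del self._data[key]              # expired: behave as if absent
            return None
        return entry

    def get(self, key):
        with self._lock:
            entry = self._live(key)
            return entry[0] if entry else None

    def set(self, key, value, nx=False, ex=None):
        with self._lock:
            if nx and self._live(key) is not None:
                return False                 # key exists: SET ... NX fails
            expires_at = time.monotonic() + ex if ex is not None else None
            self._data[key] = (value, expires_at)
            return True

    def delete(self, key):
        with self._lock:
            self._data.pop(key, None)


def get_with_mutex(redis, key, rebuild, lock_ttl=180, data_ttl=60, retry_sleep=0.05):
    """Only the thread that wins the mutex key rebuilds; the others sleep and retry."""
    while True:
        value = redis.get(key)                                # step (1)
        if value is not None:
            return value
        mutex_key = "mutex:key:" + key
        if redis.set(mutex_key, "1", nx=True, ex=lock_ttl):   # step (2.1)
            try:
                value = redis.get(key)   # re-check: a rebuild may have just finished
                if value is None:
                    value = rebuild()
                    redis.set(key, value, ex=data_ttl)
                return value
            finally:
                redis.delete(mutex_key)
        time.sleep(retry_sleep)                               # step (2.2)
```

The expiration on the mutex key guards against a rebuilding thread that dies without releasing the lock. If a rebuild can outlast `lock_ttl`, the unconditional `delete` could release a lock now held by another thread; a production version would need to verify ownership before deleting.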
2) Never expire
"Never expire" has two levels of meaning:
At the cache level, no expiration time is actually set, so the problems that follow a hotspot key's expiration cannot occur; that is, the key never "physically" expires.
At the functional level, a logical expiration time is stored with each value, and when a read finds that the logical expiration time has passed, a separate thread is used to rebuild the cache.
The entire process is as follows:

"Never expire" policy
In practice, this method effectively eliminates the problems caused by hotspot keys. Its only shortcoming is that data may be inconsistent while the cache is being rebuilt; whether that is acceptable depends on whether the application can tolerate the inconsistency. The following code simulates this using Redis:

For a highly concurrent application, there are three goals when using a cache: first, speed up user access and improve the user experience; second, reduce the back-end load, reduce potential risk, and keep the system stable; third, keep the data as up to date as possible. The two solutions above are analyzed below against these three goals.
Mutual exclusion lock (mutex key): the idea is relatively simple, but there are hidden dangers. If the rebuild process fails or takes a long time, there is a risk of deadlock and thread pool congestion. On the other hand, this method reduces the back-end storage load well and gives better consistency.
Never expire: because there is no real expiration time, this solution avoids the hazards that hotspot key expiration produces, but data can be inconsistent and the code becomes more complex.
The two solutions are compared in the following table.
Two solutions to the hotspot key problem

Reprinted from: Highly Available architecture
