The solutions for cache penetration and cache avalanche you find on the web generally boil down to two ideas:
1. If a query returns NULL, cache the NULL result as well
2. Use a Bloom filter
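As a sketch of the second idea: a Bloom filter records which keys actually exist, so requests for unknown keys can be rejected before they ever touch the cache or the database. Below is a minimal, self-contained version using `hashlib` (names and sizes are illustrative; in production you would use an existing implementation such as RedisBloom or Guava's `BloomFilter` rather than hand-rolling one):

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter: k hash positions over a fixed-size bit array."""
    def __init__(self, size=8192, hash_count=4):
        self.size = size
        self.hash_count = hash_count
        self.bits = [False] * size

    def _positions(self, key):
        # Derive hash_count positions by salting the key with an index.
        for i in range(self.hash_count):
            digest = hashlib.md5(f"{i}:{key}".encode()).hexdigest()
            yield int(digest, 16) % self.size

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = True

    def might_contain(self, key):
        # False means "definitely absent"; True means "probably present"
        # (false positives are possible, false negatives are not).
        return all(self.bits[pos] for pos in self._positions(key))

bf = BloomFilter()
for existing_id in ("user:1", "user:2", "user:3"):
    bf.add(existing_id)

bf.might_contain("user:2")    # True -> worth checking cache/DB
bf.might_contain("user:999")  # almost certainly False -> reject immediately
```

A request for a key the filter rejects never reaches the cache or DB at all, which is what stops penetration by keys that do not exist anywhere.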
First, the concept of cache penetration: when we use a cache in a project, the application usually checks the cache first. If the data is there, the cached content is returned directly; if not, the database is queried and the result is written to the cache before being returned. If the data being queried never exists in the cache, every request for it falls through to the DB, which makes the cache meaningless; under heavy traffic, the DB may go down.
Let's look at our business scenario:
1. The pages involved are mostly the home page and statistics pages, which get many requests: the home page loads whenever a user enters the system, and the statistics pages (quasi-real-time statistical results) involve complex SQL or complex result sets.
2. The freshness requirement is moderate, basically hour-level.
3. The data volume is large, typically tens or hundreds of millions of rows.
4. The business logic is complex and may require joining many tables.
5. Too many requests can crash the database; even after sharding, they can consume a large share of the database's IO and CPU.
6. There are many statistical dimensions, and each user may request different ones.
For this situation, the usual approach is to add a cache layer: requests go to the cache first (memcached or Redis), and only if the key is missing or expired do we load from the DB. Most of the time this works very well, but if one day the cache needs to be restarted, or a large portion of the keys expire at the same moment, you get exactly the cache penetration described earlier.
OK, let's look at how we optimized against cache penetration, starting with an architecture diagram to explain:
1. For each business statistics dimension or scenario, create a row in a template table that stores the result in JSON format.
2. Through the scheduling platform, run the statistics tasks on a schedule and save the results to both the template table and the cache cluster.
3. Keep re-running step 2 continuously so the data stays hot.
4. When a user request arrives, it hits our cache first; if the cache has failed or been restarted, the latest hot data is fetched directly from the database template table and re-cached. This effectively relieves pressure on the database.
5. This doubles as a cache warm-up scheme.
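Steps 1 through 5 above can be sketched as follows: a scheduled job precomputes each statistics template and writes the JSON result to both the template table and the cache, while the request path reads the cache first and, on a miss (cache restart or eviction), falls back to the cheap template-table row and re-caches it. All names are illustrative, with dicts standing in for the Redis cluster and the DB:

```python
import json
import time

cache = {}           # stand-in for the Redis/memcached cluster
template_table = {}  # stand-in for the DB template table (dimension -> JSON)

def compute_statistics(dimension):
    # Stand-in for the expensive multi-table statistics SQL.
    return {"dimension": dimension, "total": 42, "computed_at": time.time()}

def warm_up(dimensions):
    """Scheduled job (steps 2-3): recompute each template and push it
    to the template table and the cache, keeping the data hot."""
    for dim in dimensions:
        payload = json.dumps(compute_statistics(dim))
        template_table[dim] = payload
        cache[dim] = payload

def handle_request(dimension):
    """Request path (step 4): cache first, then template-table fallback."""
    payload = cache.get(dimension)
    if payload is None:
        # Cache restarted or key evicted: read the precomputed row
        # (a cheap primary-key lookup, not the heavy statistics SQL)
        # and repopulate the cache.
        payload = template_table.get(dimension)
        if payload is not None:
            cache[dimension] = payload
    return json.loads(payload) if payload else None

warm_up(["daily_orders", "daily_users"])
cache.clear()  # simulate a cache restart
handle_request("daily_orders")  # served from the template table, re-cached
```

The key point is that even when the cache is completely gone, requests fall back to a single-row read of precomputed JSON rather than re-running the heavy statistics queries, so the DB is shielded.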
Solutions for business-level cache penetration