First, cache penetration
The usual caching pattern in our projects is: the app checks whether the data exists in the cache; if it does, the cached content is returned directly; if not, the app queries the database and writes the result back into the cache. The problem is that when the requested data exists in neither the cache nor the database, nothing ever gets cached, so every request for it goes straight to the DB. The cache becomes meaningless, and under heavy traffic the DB may be overwhelmed.
I run into this problem often, but it never seemed to get enough attention. My thought: inside the wrapped cache get/set routines, add one extra step. When a query finds that a key does not exist in the database, set a marker key for it using a fixed prefix. On later lookups, check the marker key first; if it exists, return an agreed-upon false/null value and let the app handle it appropriately, so the request never penetrates through to the DB. Of course, this marker key should expire fairly quickly.
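The marker-key idea above can be sketched roughly like this. This is a minimal sketch, not production code: a plain dict stands in for Redis, and the names (`MISS_PREFIX`, `get_with_penetration_guard`, the `db_query` callback) are my own, not from any library.

```python
import time

CACHE = {}              # key -> (value, expires_at); stand-in for Redis
MISS_PREFIX = "miss:"   # assumed prefix for "known missing" marker keys
MISS_TTL = 60           # marker keys expire quickly (seconds)

def cache_get(key):
    entry = CACHE.get(key)
    if entry is None:
        return None
    value, expires_at = entry
    if time.time() >= expires_at:
        del CACHE[key]  # lazily expire
        return None
    return value

def cache_set(key, value, ttl):
    CACHE[key] = (value, time.time() + ttl)

def get_with_penetration_guard(key, db_query, ttl=300):
    # 1. If we recently learned the key is absent, short-circuit.
    if cache_get(MISS_PREFIX + key) is not None:
        return None
    # 2. Normal cache hit.
    value = cache_get(key)
    if value is not None:
        return value
    # 3. Cache miss: go to the database.
    value = db_query(key)
    if value is None:
        # Record the miss with a short TTL so repeated lookups
        # for a nonexistent key never reach the DB again.
        cache_set(MISS_PREFIX + key, "MISS", MISS_TTL)
        return None
    cache_set(key, value, ttl)
    return value
```

With this in place, a flood of requests for a key that exists nowhere costs one DB query per `MISS_TTL` window instead of one per request.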
Second, cache concurrency
Sometimes, when concurrency on the site is high, the moment a cache entry expires there may be many processes querying the DB and setting the cache at the same time. If concurrency is really heavy, this too can put excessive pressure on the DB and cause the same cache entry to be updated over and over.
My current idea is to add a lock around the cache-miss path in the app: if the key does not exist, acquire the lock, query the DB, write the result into the cache, then release the lock. Any other process that finds the lock held simply waits; once the lock is released it reads the freshly cached data, or, failing that, falls through to the DB itself.
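As a rough single-process sketch of that locking idea (assuming threads in one app process; a multi-process setup would need a distributed lock instead, e.g. a Redis `SET NX` lock, which is not shown here):

```python
import threading

CACHE = {}
LOCKS = {}                       # per-key locks (assumption: one process)
LOCKS_GUARD = threading.Lock()   # protects the LOCKS dict itself

def _lock_for(key):
    with LOCKS_GUARD:
        return LOCKS.setdefault(key, threading.Lock())

def get_with_single_flight(key, db_query, ttl=300):
    value = CACHE.get(key)
    if value is not None:
        return value
    with _lock_for(key):
        # Re-check after acquiring the lock: another thread may have
        # filled the cache while we were waiting, so we return its
        # result instead of hitting the DB again.
        value = CACHE.get(key)
        if value is not None:
            return value
        value = db_query(key)
        CACHE[key] = value
        return value
```

The double check inside the lock is the important part: without it, every waiting thread would still run its own DB query after the first one finished.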
Third, cache invalidation
This problem, too, is mainly caused by high concurrency. We usually give cache entries an expiration time, say 5 or 10 minutes. Under high load, a large number of entries may be created around the same moment with the same TTL, so when that expiration time arrives they all become invalid at once, and the requests get forwarded to the DB, which may then come under too much pressure.
Some time ago I read a few articles about this online. One simple scheme they mention is to spread out the cache expiration times: for example, add a random offset, say 1 to 5 minutes, to the base expiration time. That lowers the chance that many entries share the same expiry, making a collective failure much harder to trigger.
The second and third problems are much the same; the main difference is that the second concerns a single cache entry, while the third concerns many entries failing at once.