Common Problems with Caching in High-Concurrency Scenarios

Cache Consistency Issues
When data freshness requirements are high, the data in the cache must be kept consistent with the database, and the data across cache nodes and replicas must also agree, with no divergence. Relying only on cache expiration and refresh policies is not enough here; the common practice is to proactively update the cached entry, or remove it, whenever the underlying data changes.
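A minimal sketch of this "update the database, then invalidate the cache" pattern, assuming a redis-py client; `update_user_in_db` is a hypothetical placeholder for the real database write.

```python
import redis

r = redis.Redis(host="localhost", port=6379)

def update_user_in_db(user_id, fields):
    """Placeholder for the real database write (e.g. an UPDATE statement)."""
    pass

def update_user(user_id, fields):
    # Write the authoritative copy to the database first...
    update_user_in_db(user_id, fields)
    # ...then drop the stale cache entry so the next read repopulates it.
    r.delete(f"user:{user_id}")
```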
Cache Concurrency Issues
When a cache entry expires, the application falls back to the database to fetch the data, which looks like a perfectly reasonable flow. In a high-concurrency scenario, however, many requests may hit the expired key at the same time and all go to the database concurrently, putting enormous pressure on the backend and even triggering an "avalanche". Likewise, while a key is being updated it may still be read by a large number of requests, which can lead to consistency problems. How can this be avoided? The natural answer is a "lock" mechanism: when the cache needs to be updated or has expired, a request first tries to acquire a lock, rebuilds the entry from the database, and releases the lock when it is done; the other requests only sacrifice a short wait and can then read the fresh value directly from the cache.
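A minimal sketch of this mutex idea, assuming a redis-py client; `load_from_db` is a hypothetical placeholder, and production code would also need retry limits and safer lock release (e.g. a token check or a Lua script).

```python
import time
import redis

r = redis.Redis()

def load_from_db(key):
    """Placeholder for the real database query."""
    return "value-from-db"

def get_with_mutex(key, ttl=300, lock_ttl=10):
    value = r.get(key)
    if value is not None:
        return value

    lock_key = f"lock:{key}"
    # Only the request that wins the lock rebuilds the entry from the database.
    if r.set(lock_key, "1", nx=True, ex=lock_ttl):
        try:
            value = load_from_db(key)
            r.set(key, value, ex=ttl)
        finally:
            r.delete(lock_key)
        return value

    # The other requests wait briefly and retry the cache instead of hitting the DB.
    time.sleep(0.05)
    return get_with_mutex(key, ttl, lock_ttl)
```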
Cache Penetration Issues
Cache penetration is also called "breakdown" in some places. Many people understand cache penetration as a large number of requests reaching the backend database because the cache is unavailable or entries have expired, which puts huge pressure on the database.
This is actually a misunderstanding. Real cache penetration looks like this:
In a high-concurrency scenario, when a heavily requested key misses the cache, the application falls back to the database for fault tolerance, so a large number of requests reach the database. If the data corresponding to that key does not exist at all, every one of those requests results in a pointless database query, creating enormous load and pressure.
There are several common ways to avoid cache penetration:
- Cache Empty Objects
Query results that come back empty are cached as well: for collections, cache an empty collection (not NULL); for a single object, mark it with a flag field so an empty result can be distinguished from a cache miss. This stops the requests from penetrating to the backend database. The freshness of these cached entries still has to be ensured, typically with a short expiration time. This approach is cheap to implement and suits data with a low hit rate that may be updated frequently (see the sketch after this list).
- Filter keys separately
All keys whose corresponding data may be empty are stored in one place (for example a set or a Bloom filter), and requests are checked against it before they are served, so they never penetrate to the backend database. This approach is more complex to implement and suits data with a low hit rate that is updated infrequently.
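A minimal sketch of the empty-object approach, assuming a redis-py client; `query_db` is a hypothetical placeholder, and a sentinel value marks "known to be empty" so it can be told apart from a cache miss.

```python
import json
import redis

r = redis.Redis()

EMPTY = "__EMPTY__"   # sentinel for "the database has no row for this key"
EMPTY_TTL = 60        # keep empty markers short-lived so real data shows up quickly

def query_db(key):
    """Placeholder for the real database query; returns None when no row exists."""
    return None

def get(key, ttl=300):
    cached = r.get(key)
    if cached is not None:
        return None if cached == EMPTY.encode() else json.loads(cached)

    row = query_db(key)
    if row is None:
        # Cache the fact that the key is empty instead of hammering the database.
        r.set(key, EMPTY, ex=EMPTY_TTL)
        return None

    r.set(key, json.dumps(row), ex=ttl)
    return row
```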
Cache Bump Problem
Cache bumping, sometimes called "cache jitter", can be seen as a milder failure than an "avalanche", but over time it still hurts the system and degrades performance. It is typically caused by the failure of a cache node. The common industry practice is to solve it with a consistent hashing algorithm, which is not elaborated here; see the other chapters for details.
Cache Avalanche Phenomenon
A cache avalanche occurs when the cache fails and a flood of requests reaches the backend database, the database collapses, and the whole system goes down with it. Many things can lead to it: the "cache concurrency", "cache penetration", and "cache bumping" problems described above can all end in an avalanche, and these weaknesses can also be exploited by malicious attackers. Another common cause is a batch of pre-loaded cache entries that all expire at the same point in time. To avoid this kind of periodic failure, stagger the expirations by giving entries different expiration times, so the whole cache does not expire at once.
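A minimal sketch of staggering expirations with random jitter, assuming a redis-py client; the jitter fraction here is an arbitrary illustrative choice.

```python
import random
import redis

r = redis.Redis()

def set_with_jitter(key, value, base_ttl=3600, jitter=0.2):
    # Spread expirations over [base_ttl, base_ttl * (1 + jitter)] so entries
    # loaded in the same batch do not all expire at the same moment.
    ttl = int(base_ttl * (1 + random.uniform(0, jitter)))
    r.set(key, value, ex=ttl)
```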
From the application-architecture point of view, the impact can be reduced with rate limiting, service degradation, circuit breaking, and similar measures, or this kind of disaster can be avoided altogether with a multi-level cache.
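A minimal sketch of a two-level (local plus remote) cache lookup, assuming a redis-py client as the remote tier; the local tier here is just a process-local dict with a naive TTL, standing in for a real local cache library.

```python
import time
import redis

r = redis.Redis()
local = {}        # key -> (value, expires_at)
LOCAL_TTL = 5     # short local TTL keeps local copies roughly fresh

def load_from_db(key):
    """Placeholder for the real database query."""
    return "value-from-db"

def get(key, remote_ttl=300):
    # Level 1: the process-local cache absorbs most of the traffic.
    entry = local.get(key)
    if entry and entry[1] > time.time():
        return entry[0]

    # Level 2: the shared remote cache, falling back to the database.
    value = r.get(key)
    if value is None:
        value = load_from_db(key)
        r.set(key, value, ex=remote_ttl)

    local[key] = (value, time.time() + LOCAL_TTL)
    return value
```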
In addition, from the perspective of the whole R&D process, load testing should be strengthened, simulating real-world traffic as closely as possible, so that problems are exposed early and prevented.
Cache Bottomless-Pit Phenomenon
This problem was raised by Facebook engineers. Around 2010, Facebook had already reached about 3,000 memcached nodes, caching thousands of gigabytes of content.
They found a problem: the connection frequency to memcached kept rising and efficiency kept dropping, so they added more memcached nodes.
After adding them, the problems caused by the connection frequency were still there and had not improved. This is called the "bottomless pit" phenomenon.
Today, mainstream databases, caches, NoSQL stores, and search middleware all support "sharding" to meet requirements such as high performance, high concurrency, high availability, and scalability. Some map keys to instances on the client side with hashing and modulo (or consistent hashing), some map by key ranges on the client side, and some do the mapping on the server side. However, each operation may require network communication with a different node: the more instance nodes there are, the higher the overhead and the bigger the impact on performance.
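A minimal sketch of client-side hash-modulo sharding, assuming one redis-py client per shard; the node addresses and the shard count are purely illustrative.

```python
import zlib
import redis

# One client per shard instance; the addresses are illustrative.
shards = [
    redis.Redis(host="cache-0", port=6379),
    redis.Redis(host="cache-1", port=6379),
    redis.Redis(host="cache-2", port=6379),
]

def shard_for(key):
    # Hash the key and take it modulo the number of shards.
    return shards[zlib.crc32(key.encode()) % len(shards)]

def cache_get(key):
    return shard_for(key).get(key)

def cache_set(key, value, ttl=300):
    shard_for(key).set(key, value, ex=ttl)
```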
The problem can be avoided and mitigated mainly from the following aspects:
- How data is distributed
Some business data is suitable for hash distribution, while other data is suitable for range distribution; choosing the right scheme avoids part of the network IO overhead.
- IO optimization
Make full use of connection pools, NIO, and similar techniques to minimize connection overhead and increase the number of concurrent connections that can be handled.
- How data is accessed
Fetching a large data set in one batch costs less network IO than fetching small data sets many times (see the sketch after this list).
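A minimal sketch of batching reads per node with mget, so that n keys cost one round trip per shard instead of n round trips; it reuses the hypothetical `shard_for` helper from the sharding sketch above.

```python
from collections import defaultdict

def cache_mget(keys):
    # shard_for comes from the sharding sketch above.
    # Group keys by the shard they live on, then issue one mget per shard
    # instead of one round trip per key.
    by_shard = defaultdict(list)
    for key in keys:
        by_shard[shard_for(key)].append(key)

    result = {}
    for shard, shard_keys in by_shard.items():
        for key, value in zip(shard_keys, shard.mget(shard_keys)):
            result[key] = value
    return result
```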
Of course, the bottomless-pit phenomenon is not common; the vast majority of companies will probably never run into it at all.