In Internet projects, caching is used everywhere: a cache helps pages load faster and reduces the load on the server or data source.
1. Why do I need a cache?
In most projects, the biggest performance cost lies in the back-end database. Read and write traffic is usually distributed unevenly: in most cases there are far more reads than writes, and read operations (SELECT) often carry expensive clauses such as LIKE, GROUP BY, and JOIN. These easily produce slow queries, so the database tends to hit its bottleneck on reads.
Placing a cache service in front of the database can effectively absorb this uneven traffic and withstand request peaks.
In addition, if the application and the data source run on different servers, network round trips add significant latency to the application's responses. If the application does not require strict real-time freshness of its data, a cache on the application side can quickly improve efficiency.
2. What problems do you encounter with caching?
While caching improves overall performance, it also introduces new problems. After adding a cache we effectively keep two copies of the data: one in the database and one in the cache. When new data is written or old data is updated, updating only one of the two leaves them inconsistent. So we need a way to synchronize the cache and the database quickly and effectively, and to guarantee eventual consistency of the data.
In addition, the caching service introduces the complexity of the system architecture, as there are additional concerns about caching itself:
Cache expiration time issues:
Choosing cache expiration times is tricky and must be grounded in the realities of the business. If the expiration time is too short, the cache is of little use and data is frequently reloaded from the database into the cache. If it is too long, memory is wasted on stale entries.
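One common refinement worth noting here: if many keys are written at the same moment with the same expiration time, they all expire together and the database takes a burst of reloads. A minimal sketch, assuming a Redis-style TTL in seconds (the function name and jitter ratio are illustrative):

```python
import random

def ttl_with_jitter(base_ttl_seconds, jitter_ratio=0.1):
    """Return a TTL near base_ttl_seconds, randomized by +/- jitter_ratio,
    so keys written together do not all expire at the same moment."""
    jitter = base_ttl_seconds * jitter_ratio
    return base_ttl_seconds + random.uniform(-jitter, jitter)

# e.g. cache.set(key, value, ex=int(ttl_with_jitter(3600)))
```

The jitter ratio is a tuning knob: it should be large enough to spread expirations out, but small enough that the effective freshness window still matches the business requirement.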
Cache Hit-ratio issues:
It also matters which data you choose to cache; a poor choice can leave the hit ratio so low that the cache provides no benefit. As a rule of thumb, cache hot data and aim for a hit ratio of 70% or more for the best results.
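The hit ratio mentioned above is simply hits divided by total lookups. A minimal counter sketch (the class and method names are illustrative assumptions, not from any particular library):

```python
class CacheStats:
    """Track cache lookups and report the hit ratio (hits / total)."""

    def __init__(self):
        self.hits = 0
        self.misses = 0

    def record(self, hit):
        # Call with True on a cache hit, False on a miss.
        if hit:
            self.hits += 1
        else:
            self.misses += 1

    def hit_ratio(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0
```

For example, 7 hits out of 10 lookups gives a ratio of 0.7, right at the 70% threshold suggested above.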
Cache penetration/Avalanche issues:
If the cache service goes down or its data is lost, all traffic suddenly hits the back-end database directly. This can cause a ripple effect, and the sudden spike in requests is likely to overwhelm the database.
3. What are the specific update policies for the cache?
Typical cache modes are generally as follows:
Cache Aside
Read/Write Through
Write Behind
Each pattern has different characteristics and suits different project scenarios. Let's look at them one by one:
Cache Aside Mode
This is the strategy people use most often. Its main flow is as follows:
When the application queries data, it first reads from the cache; on a miss, it reads from the database and then puts the result into the cache.
When the application updates data, it first updates the database, and after the update completes it invalidates the corresponding entry in the cache.
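The two flows above can be sketched in a few lines. This is a minimal single-process illustration: `cache` is a plain dict standing in for a cache service (e.g. Redis) and `db` for the database, both assumptions for demonstration only.

```python
# Minimal Cache Aside sketch: read-through on miss, invalidate on update.
db = {"user:1": {"name": "Alice"}}   # stand-in for the database
cache = {}                           # stand-in for the cache service

def read(key):
    # 1. Try the cache first.
    value = cache.get(key)
    if value is None:
        # 2. On a miss, read from the database...
        value = db.get(key)
        if value is not None:
            # 3. ...and put the result into the cache.
            cache[key] = value
    return value

def update(key, value):
    # 1. Update the database first.
    db[key] = value
    # 2. Then invalidate (not update) the cached copy;
    #    the next read will repopulate it from the database.
    cache.pop(key, None)
```

Note that `update` deletes the cache entry rather than overwriting it; the reasoning for that choice is exactly the concurrency problem discussed next.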
Why not have the update write the database and then modify the cached data directly, instead of invalidating it?
Mainly because that would mean two write operations, which can produce dirty data under concurrency. For example:
Suppose two requests, A and B, execute concurrently: A reads the data and B updates it. The cache is initially empty. A reads the old value from the database and prepares to write it back to the cache. Just then, B updates the database and then updates the cache. A then writes its stale value into the cache, so the cache now holds dirty data.
So is Cache Aside free of dirty data? No. In extreme cases dirty data can still be produced, for example:
Again suppose two concurrent requests, A and B: A reads the data and B writes it. The cache is initially empty, so A misses, reads the old value from the database, and prepares to write it back to the cache. Meanwhile B writes the new value to the database and then invalidates the cache. After that, A writes the old value it read earlier into the cache. The result is that the cache and the database disagree: dirty data.
But the probability of this is much lower than in the previous scenario, so overall Cache Aside is a relatively simple and practical approach.
Read/Write Through Mode
In this pattern the cache service acts as the primary store: all application reads and writes go directly to the cache service, never to the back-end database, which the cache service itself maintains and updates. When data changes in the cache, the cache service synchronously updates the database; from the application's point of view there is only the cache service.
The process is fairly straightforward:
The probability of dirty data in this mode is relatively low, but it depends strongly on the cache, so the caching service must be highly stable. It also adds an empty-cache cold-start problem whenever a new cache node is brought up.
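The pattern can be sketched as a cache service that wraps the backing store, with the application only ever touching the service. Class and method names here are illustrative assumptions, and a dict stands in for the database:

```python
# Read/Write Through sketch: the application talks only to CacheService;
# the service itself loads from and writes to the backing store.
class CacheService:
    def __init__(self, backing_store):
        self._store = backing_store   # the database, hidden from the app
        self._cache = {}

    def get(self, key):
        # Read-through: on a miss, the service (not the app) loads from
        # the backing store and caches the result.
        if key not in self._cache:
            self._cache[key] = self._store.get(key)
        return self._cache[key]

    def put(self, key, value):
        # Write-through: update the cache and the backing store
        # synchronously, in one operation from the app's perspective.
        self._cache[key] = value
        self._store[key] = value
```

Because `put` returns only after both writes complete, cache and database stay consistent, at the cost of making every write as slow as a database write.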
Write Behind Mode
This pattern is a variant of Read/Write Through. The difference is that in Read/Write Through the cache writes to the database synchronously, while in Write Behind the cache updates the database asynchronously.
The process is as follows:
This mode is very fast and efficient, but data consistency is weak, data may be lost, and the implementation logic is more complex.
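The asynchronous write-back can be sketched as a queue of pending writes that is drained to the database later. In this minimal sketch a manually called `flush()` stands in for the background worker a real implementation would use; all names are illustrative assumptions:

```python
from collections import deque

# Write Behind sketch: writes hit the cache immediately and are queued;
# flush() drains the queue to the backing store asynchronously.
class WriteBehindCache:
    def __init__(self, backing_store):
        self._store = backing_store
        self._cache = {}
        self._dirty = deque()          # pending writes, in order

    def put(self, key, value):
        # Fast path: only the in-memory cache is touched here.
        self._cache[key] = value
        self._dirty.append((key, value))

    def get(self, key):
        return self._cache.get(key, self._store.get(key))

    def flush(self):
        # In production a background thread/timer would drain this queue.
        # If the process dies before flushing, these writes are lost,
        # which is exactly the consistency risk described above.
        while self._dirty:
            key, value = self._dirty.popleft()
            self._store[key] = value
```

Between a `put` and the next `flush`, the database lags behind the cache; that window is where both the speed advantage and the data-loss risk come from.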
Those are the three mainstream cache update strategies today. Others, such as the Refresh-Ahead pattern, are not described in detail here because they are less commonly used.
Caching is a very common way to improve efficiency in Internet projects; it is widely used and also critical to get right. You are welcome to discuss and exchange ideas.
This article was originally published in the WeChat public account "More Than Thinking"; you are welcome to follow it and exchange ideas on Internet technology, project management, big data, Web, and blockchain.