Cloud Computing Design Patterns translation: Cache-Aside pattern

Source: Internet
Author: User

This handbook is already somewhat dated, and until recently I was too lazy to get to it, but I hope to persist and finish translating the main content. I will add some views of my own along the way; if anything is inaccurate or wrong, suggestions and corrections are welcome.


Load data on demand from persistent storage into the cache. This pattern improves system performance and helps keep the data in the cache consistent with the underlying store. PS: Below, "data store" refers to the persistent storage behind the cache; in an actual project this might be a database, files, or any other container persisted to disk.

Context and problem

Caches are often used in traditional applications to optimize scenarios that require frequent, repeated access to data in the store. In general, however, it is difficult to achieve full consistency between the data in the cache and the data in the underlying store. The application must implement a strategy that keeps the cached data synchronized with the store as much as possible, and that can also detect and correctly handle cached data that has become stale (out of sync).

Solution

Many commercially available cache systems provide read-through and write-through/write-behind capabilities (http://www.infoq.com/cn/articles/write-behind-caching/ explains write-behind very thoroughly). In these systems, the application reads data through the cache; if the required data is not in the cache, the cache fetches it from the store and adds it to the cache in a way that is transparent to the application. Likewise, any modification to the data is made through the cache, and the cache automatically writes the change back to the store.

For caches that do not provide these features, the application itself must maintain the data in the cache:

Data read:

The application can achieve the same effect by implementing the read-through strategy itself, which is very effective when data needs to be loaded into the cache on demand. Figure 1 summarizes the steps of this process:


1. Check whether the data to be read is already in the cache.

2. If it is not in the cache, read it directly from the data store and return it to the application.

3. Save a copy of the data in the cache as well.
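The read steps above can be sketched as follows. This is a minimal illustration, assuming a plain dict as the cache and a hypothetical `load_from_store` function standing in for a database query; a real system would use a cache such as Memcache or Redis.

```python
cache = {}                                   # stands in for the cache
data_store = {"user:1": {"name": "Alice"}}   # stands in for the data store


def load_from_store(key):
    # Hypothetical helper: in a real system this would query the database.
    return data_store.get(key)


def get(key):
    # 1. Check whether the requested data is already in the cache.
    value = cache.get(key)
    if value is None:
        # 2. On a miss, read the data directly from the store.
        value = load_from_store(key)
        # 3. Save a copy in the cache as well before returning it.
        if value is not None:
            cache[key] = value
    return value
```

A second call to `get` for the same key is then served from the cache without touching the store.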

Data write:

If your application needs to write or update data, it can implement the write-through strategy by following these steps:

1. Modify the data in the data store first.

2. If the corresponding data exists in the cache, invalidate (discard) it. If that data is needed later, the cache will load it from the data store again.
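The write path can be sketched in the same style, again assuming a dict-based cache and store (hypothetical names, not a real API): update the store first, then invalidate the cached copy so the next read reloads the fresh value.

```python
cache = {}
data_store = {}


def put(key, value):
    # 1. Modify the data in the data store first.
    data_store[key] = value
    # 2. Invalidate (discard) the cached copy, if any; the next read
    #    will reload the fresh value from the store.
    cache.pop(key, None)
```

Invalidating rather than updating the cache avoids writing values into the cache that may never be read again.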

Issues and considerations

When using this pattern, the following points need to be noted:

1. The lifetime of cached data. Many caches implement an expiration policy: data that has not been used within a specified period is removed from the cache. In practice, this policy must be tuned to the application's actual data-access patterns for the cache to work well. The expiration time should not be set too short, or the application will frequently miss the cache, fetch the data from the store, and re-insert it into the cache. Nor should it be set too long, or the cache will retain a lot of non-hot data and waste memory. The cache is most effective when it holds relatively static, frequently used data.
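A toy time-to-live (TTL) cache illustrates the expiration policy described above. This is a sketch, not production code (real caches such as Redis provide per-key expiration natively); the `now` parameter is an assumption added here to make the behavior easy to demonstrate deterministically.

```python
import time


class TTLCache:
    """Toy cache whose entries expire `ttl` seconds after being stored."""

    def __init__(self, ttl):
        self.ttl = ttl
        self._data = {}  # key -> (value, stored_at)

    def set(self, key, value, now=None):
        now = time.monotonic() if now is None else now
        self._data[key] = (value, now)

    def get(self, key, now=None):
        now = time.monotonic() if now is None else now
        entry = self._data.get(key)
        if entry is None:
            return None
        value, stored_at = entry
        if now - stored_at > self.ttl:
            # Entry has outlived its TTL: evict it and report a miss,
            # forcing the caller to reload from the data store.
            del self._data[key]
            return None
        return value
```

Choosing `ttl` is exactly the trade-off described above: too small and every read becomes a store access; too large and stale, rarely used entries linger.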

2. Evicting (discarding) cached data. The vast majority of caches are much smaller than the underlying store, so a cache can hold only a relatively limited amount of data and must evict entries when necessary. Most caches implement a least-recently-used (LRU) policy to select the data to evict (that is, the data used least recently is removed first), but this is usually customizable: configuring the cache's global expiration settings, together with expiration times for individual items, can maximize the cache's efficiency. A single global eviction policy is not always appropriate, however. For example, some data is very expensive to reload from the store; it can be more worthwhile to keep such data cached than data that is used more frequently but is cheap to reload.

3. Data consistency. Implementing this pattern does not guarantee consistency between the data in the cache and the data in the store. An item in the store may be modified at any time by another process, and the cached copy will remain out of sync until something triggers a reload from the store. This problem can be particularly severe in systems that frequently replicate data between different stores.

4. Local caching. A cache can be implemented locally, directly in the application's own memory. (The source emphasizes this because a cluster may also use a distributed cache, i.e. a non-local cache, or a hierarchical tiered storage system such as memory-SSD-disk. The "local cache" here refers only to a cache inside the application itself, for example within the same JVM, not a cache in a different process on the same physical node.) In this setting the cache-aside pattern works best for frequently read data. However, local memory is private: different application instances (processes) each keep their own private cache, which can easily lead to inconsistent data between instances. It is therefore necessary to give the data a shorter expiration time, which increases the frequency of resynchronizing it from the store. For this scenario, refer to the implementation principles of a distributed cache instead.


PS: I recently discussed related topics with colleagues. With the rise of Storm and Spark, caching is playing an ever more important role; it has gone from a supporting part to a protagonist and is now a key point in overall system design. Alongside mature distributed caches such as Memcache and Redis, the emergence of Tachyon along with Spark seems to have opened up a new path. I personally appreciate Tachyon's two main features, RAMDisk and lineage. The lineage strategy has its merits, but there are many scenarios it does not suit well; by contrast, I am quite optimistic about the role RAMDisk can play in distributed caching. All in all, I look forward to the community's progress and development in caching.


