Talk about the Cache

Source: Internet
Author: User

Title: Talk about the cache

Tags

    • Cache

Categories

    • Tech

Comments: true
Date: 2018-06-18 22:00:00

Last year, while working on system performance optimization, I spent a lot of effort customizing a caching scheme for the business and felt it was complete at the time. A few days ago, however, a casual chat about caching made me realize that some details had still not been considered. Here is a summary of the issues you need to think about when building a cache.

The outline is as follows:

    • Cache mode
    • Cache eviction
    • Cache breakdown
    • Cache penetration
    • Cache avalanche

Cache mode

The more common patterns fall into two main categories: Cache-aside and Cache-as-SoR, where SoR (System of Record) is the underlying store, i.e. the DB that holds the authoritative data. Cache-as-SoR further includes Read-Through, Write-Through, and Write-Behind.

Cache-aside

Cache-aside is the more general-purpose caching pattern. The read path can be summarized as follows:

    1. Read the cache and return directly if the value is present. If it does not exist, go to step 2
    2. Read the SoR, update the cache, and return
      The code is as follows:
# read v1
def get(key):
    value = cache.get(key)
    if value is None:
        value = db.get(key)
        cache.set(key, value)
    return value

The process for writing data is:

    1. Write the SoR
    2. Write the cache
      The code is as follows:
# write v1
def set(key, value):
    db.set(key, value)
    cache.set(key, value)

The logic seems simple, but there are plenty of surprises in a highly concurrent, distributed scenario.

Cache-as-SoR

In Cache-aside mode, the cache maintenance logic is implemented and maintained by the business side. In Cache-as-SoR, the cache logic lives on the storage side: the DB + cache is a single transparent whole to the business caller, which does not need to care about the implementation details and only needs to call get/set. Common Cache-as-SoR patterns are Read Through, Write Through, and Write Behind.

    • Read Through: when a read occurs, query the cache; on a miss, the cache layer queries the SoR and updates itself, so subsequent accesses hit the cache directly (essentially Cache-aside implemented on the storage side)
    • Write Through: when a write occurs, query the cache; if it hits, update the cache, and the cache component then updates the SoR
    • Write Behind: when a write occurs, the SoR is not updated immediately; only the cache is updated and the call returns at once, and the SoR is updated asynchronously (eventual consistency)

Read/Write Through is easy to understand: the cache and SoR are updated synchronously, and reads go to the cache first, falling back to the SoR on a miss. The main purpose of this kind of pattern is to relieve the read pressure on the SoR and improve the overall response time; it does nothing to optimize writes, so it suits read-heavy, write-light workloads. Write Behind updates the cache and SoR asynchronously, and the asynchronous step can batch and merge writes, thus improving write performance.
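As a rough illustration of Write Behind, the following sketch updates the cache synchronously and defers the SoR write to a background thread that merges queued writes to the same key. The cache and db handles are the same assumed objects used in the snippets above, and the flushing policy is deliberately simplified:

# write behind sketch: update the cache synchronously, flush to the SoR asynchronously
import queue
import threading

write_queue = queue.Queue()

def set(key, value):
    cache.set(key, value)          # update the cache and return immediately
    write_queue.put((key, value))  # defer the SoR write

def _flush_worker():
    while True:
        batch = {}
        k, v = write_queue.get()   # block until at least one write is pending
        batch[k] = v
        while not write_queue.empty():
            k, v = write_queue.get()
            batch[k] = v           # merge queued writes to the same key
        for k, v in batch.items():
            db.set(k, v)           # eventually consistent with the cache

threading.Thread(target=_flush_worker, daemon=True).start()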

The following two figures are flowcharts of Write Through and Write Behind from Wikipedia:


Write Through and Write Behind

Summary

Many DBs now have memory-based caches of their own that can respond to requests more quickly: HBase, for example, caches data blocks, and part of MongoDB's high performance comes from its heavy use of system memory as cache. Even so, a local cache in the service itself is noticeably more effective, because it eliminates a large amount of network I/O; it greatly reduces the processing latency of the system and relieves the pressure on the downstream cache + DB.

Cache eviction

Cache eviction is a fairly old topic, and the commonly used strategies are only a few: FIFO, LFU, and LRU. LRU is essentially the default eviction strategy, though depending on the business scenario another strategy may fit better.

A FIFO eviction strategy is usually implemented with a queue + dict; after all, a queue is inherently first-in-first-out. A new cache object is appended to the tail of the queue, and when the queue is full, the object at the head is dequeued and evicted.
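A minimal FIFO sketch along those lines, assuming a fixed max_size and the same kind of in-process dict used elsewhere in this post:

# FIFO eviction sketch: queue + dict
from collections import deque

fifo_queue = deque()
data_dict = dict()
max_size = 1024

def add(key, value):
    if key not in data_dict:
        if len(data_dict) >= max_size:
            oldest = fifo_queue.popleft()  # the object at the head of the queue is evicted
            del data_dict[oldest]
        fifo_queue.append(key)
    data_dict[key] = value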

The core idea of LFU (Least Frequently Used) is that the data used least often is evicted first: count how many times each object is accessed and, when eviction is needed, evict the object with the smallest count. LFU is therefore usually implemented with a min-heap + dict. Because each update of the min-heap costs O(log n), LFU runs in O(log n), slightly less efficient than the O(1) of FIFO and LRU.
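A minimal LFU sketch in that spirit, using heapq as the min-heap with lazy invalidation of stale entries; the class name and max_size are illustrative assumptions:

# LFU eviction sketch: min-heap + dict, stale heap entries skipped at eviction time
import heapq
import itertools

class LFUCache:
    def __init__(self, max_size=1024):
        self.max_size = max_size
        self.data = {}            # key -> value
        self.counts = {}          # key -> access count
        self.heap = []            # (count, tiebreak, key); may hold stale entries
        self._tick = itertools.count()

    def _push(self, key):
        heapq.heappush(self.heap, (self.counts[key], next(self._tick), key))

    def get(self, key):
        if key not in self.data:
            return None
        self.counts[key] += 1
        self._push(key)           # old entries for this key become stale
        return self.data[key]

    def set(self, key, value):
        if key not in self.data and len(self.data) >= self.max_size:
            self._evict()
        self.data[key] = value
        self.counts[key] = self.counts.get(key, 0) + 1
        self._push(key)

    def _evict(self):
        while self.heap:
            count, _, key = heapq.heappop(self.heap)
            if key in self.counts and self.counts[key] == count:
                del self.data[key]
                del self.counts[key]
                return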

LRU (Least Recently Used) is based on the principle of locality: if data has been used recently, it is very likely to be used again in the future; conversely, if data has not been used for a long time, the probability of future use is lower.

LRU eviction is typically implemented with a doubly linked list + dict (in production the linked list is generally doubly linked): the most recently accessed data is moved from its original position to the head of the list, so the head of the list holds the most recently used data, the tail holds the data unused for the longest time, and the item to evict can be found in O(1).

# LRU cache eviction, outline logic (lock-free version)
data_dict = dict()
link = DoubleLink()  # doubly linked list

def get(key):
    node = data_dict.get(key)
    if node is not None:
        link.MoveToFront(node)
    return node

def add(key, value):
    node = Node(key, value)
    link.PushFront(node)
    data_dict[key] = node
    if link.size() > max_size:
        tail = link.back()
        del data_dict[tail.key]
        link.remove_back()

Ps:

    1. The lru_cache implementation in Python 3's functools
    2. A Golang implementation of an LRU cache
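For a self-contained, runnable variant of the same idea, Python's collections.OrderedDict can play both roles (hash map + recency-ordered list). This is only a sketch, not the functools or Golang implementation referenced above:

# LRU sketch using OrderedDict (max_size is an assumption)
from collections import OrderedDict

class LRUCache:
    def __init__(self, max_size=1024):
        self.max_size = max_size
        self.data = OrderedDict()

    def get(self, key):
        if key not in self.data:
            return None
        self.data.move_to_end(key)         # mark as most recently used
        return self.data[key]

    def set(self, key, value):
        self.data[key] = value
        self.data.move_to_end(key)
        if len(self.data) > self.max_size:
            self.data.popitem(last=False)  # evict the least recently used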

Cache breakdown

In high-concurrency scenarios (such as flash sales), if a hot key expires at some moment while a large number of requests are still accessing it, those requests all fall straight through to the downstream DB. This is 缓存击穿 (cache breakdown): it puts great pressure on the DB, and one such wave of traffic may well knock the DB, and the business with it, offline.

In this case, the usual way to protect the downstream is to guard DB access with a mutex: the thread/process that obtains the lock reads the DB and updates the cache, while the processes that fail to acquire the lock retry the whole get logic.

This logic can be implemented with Redis's SET (NX, EX) acting as the lock, as follows:

# read v2
import time
import redis

r = redis.StrictRedis()

def get(key, retry=3):
    def _get(k):
        value = cache.get(k)
        if value is None:
            if r.set(k, 1, ex=1, nx=True):  # acquire the lock
                value = db.get(k)
                cache.set(k, value)
                return value, True
            else:
                return None, False
        else:
            return value, True

    while retry:
        value, flag = _get(key)
        if flag:
            return value
        time.sleep(1)  # failed to acquire the lock, sleep and retry
        retry -= 1
    raise Exception("failed to fetch the value")

Cache penetration

When the requested data does not exist at all, this nonexistent data is never written to the cache, so every request for it lands directly on the downstream DB. When the volume of such requests is large, this also poses a risk to the downstream DB.

Solutions:

    1. Consider caching this nonexistent data for an appropriately short period, storing a special sentinel value for the empty result (see the sketch after this list).

    2. Another, more rigorous approach is to use a Bloom filter. A Bloom filter never gives a false negative when testing whether a key exists (if it says a key does not exist, the key definitely does not exist), but it may give false positives (if it says a key exists, the key may still be absent). HBase uses Bloom filters internally to quickly determine that a row does not exist.
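For the first approach, a minimal sketch of caching the miss as a sentinel value; the EMPTY marker, the 60-second TTL, and the ttl keyword on cache.set are assumptions, not part of the snippets above:

# negative caching sketch: store a sentinel for keys that do not exist in the DB
EMPTY = "__EMPTY__"

def get(key):
    value = cache.get(key)
    if value is not None:
        return None if value == EMPTY else value
    value = db.get(key)
    if value is None:
        cache.set(key, EMPTY, ttl=60)  # cache the miss briefly (assumed ttl parameter)
        return None
    cache.set(key, value)
    return value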

Preventing penetration with a Bloom filter:

# read v3
import time
import redis

r = redis.StrictRedis()

def get(key, retry=3):
    def _get(k):
        value = cache.get(k)
        if value is None:
            if not Bloomfilter.get(k):
                # on a cache miss, check the Bloom filter first;
                # the Bloom filter must be updated in the same transaction as DB writes
                return None, True
            if r.set(k, 1, ex=1, nx=True):  # acquire the lock
                value = db.get(k)
                cache.set(k, value)
                return value, True
            else:
                return None, False
        else:
            return value, True

    while retry:
        value, flag = _get(key)
        if flag:
            return value
        time.sleep(1)
        retry -= 1
    raise Exception("failed to fetch the value")

Cache avalanche

When a large number of cache entries become invalid at the same time for some reason, such as simultaneous expiration or a restart, a flood of requests hits the downstream service or DB directly, putting it under great pressure and possibly bringing it down: an avalanche.

The 同时过期 (simultaneous expiration) scenario typically arises from a cold start or a traffic burst: a large amount of data is written to the cache within a very short period with the same TTL, so it all expires at roughly the same time.

Solutions:

    1. A simple approach is 随机过期 (randomized expiration): set each entry's expiration time to expire + random, so entries do not all expire at the same moment (see the sketch after this list).

    2. Another good solution is a two-level cache, for example a local_cache + redis storage scheme, or the redis + redis pattern used in earlier cache designs.
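A minimal sketch of randomized expiration; the base TTL, the jitter range, and the ttl keyword on cache.set are assumptions:

# randomized expiration sketch: expire + random, so entries do not expire together
import random

def set_with_jitter(key, value, base_ttl=600, jitter=120):
    ttl = base_ttl + random.randint(0, jitter)
    cache.set(key, value, ttl=ttl)  # assumed ttl parameter on the cache client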

In addition, a reasonable degradation scheme helps. In high-concurrency scenarios, when excessive concurrency is detected or the resource is already under heavy impact, protect downstream resources through rate limiting and degradation so they are not overwhelmed; the cache is rebuilt gradually during the throttling period, and normal traffic is restored as the cache recovers.

Reference

http://www.cs.utah.edu/~stutsman/cs6963/public/papers/memcached.pdf
http://www.ehcache.org/documentation/3.5/caching-patterns.html
https://docs.microsoft.com/en-us/azure/architecture/patterns/cache-aside
https://coolshell.cn/articles/17416.html
https://en.wikipedia.org/wiki/Cache_(computing)
https://docs.oracle.com/cd/E13924_01/coh.340/e13819/readthrough.htm
https://blog.csdn.net/zeb_perfect/article/details/54135506
http://blog.didispace.com/chengchao-huancun-zuijiazhaoshi/
