Go Memcached Application Summary

Source: Internet
Author: User
Tags: failover, memcached, object serialization

Abstract: Memcached is a high-performance distributed cache system. Thanks to its simple, convenient operation and stable, reliable performance, it is widely used in Internet applications. In this article I summarize some common application scenarios and their solutions.

Catalogue

    • 1. Cached Storage Design
    • 2. Cache Update Policy
    • 3. Bulk delete (or update) issues
    • 4. Problems with failover and expansion
    • 5. Some minor details related to optimization

Memcached is a high-performance distributed cache system. Thanks to its simple operation and stable, reliable performance, it is widely used in Internet applications. There is plenty of introductory material about memcached online; the classic reference is the document "Memcached Comprehensive Analysis" (original link: http://gihyo.jp/dev/feature/01/memcached/0001, with many Chinese translations online, e.g. http://tech.idv2.com/2008/08/17/memcached-pdf/), which is very well written and easy to read. Here I summarize some common application scenarios and their solutions.

1. Cached Storage Design

Depending on the scenario, one of the following two designs is generally used:

    • Scenario One: cache database SQL query results in memcached. Reads go to memcached first, shielding the database from query requests. (A minimal sketch of this pattern follows this list.)

      Advantages: caching can be handled uniformly in the development framework, transparently to business development, which reduces intrusion into business logic code. The cache is also easier to warm up in this design: we can replay the database's update log (e.g. the MySQL binlog) to preheat it.

      Disadvantages: if a front-end request involves multiple SQL query results, multiple fetches from memcached are needed, and the network I/O overhead plus the concurrency pressure on memcached can become a bottleneck under high concurrency.

    • Scenario Two: cache the final result of the business process, so that the cached result can be returned directly when the client requests it.

      Advantages: data can be returned quickly with a single memcached access, reducing network I/O and processing cost.

      Disadvantages: caching must be handled explicitly in the business logic, the stored data structures are more complex, and regenerating the cache after a data update can be cumbersome. This design is better suited to computation-heavy, high-concurrency scenarios.
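
As a minimal sketch of Scenario One, a framework layer can key the cache on a digest of the SQL text plus its bound parameters, so business code never touches the cache directly. This uses the spymemcached client; queryDatabase and the 5-minute TTL are illustrative assumptions, not part of the original article:

    import net.spy.memcached.MemcachedClient;

    import java.nio.charset.StandardCharsets;
    import java.security.MessageDigest;

    public class SqlResultCache {
        private final MemcachedClient client;

        public SqlResultCache(MemcachedClient client) { this.client = client; }

        // The cache key is a hex digest of the SQL text and its bound parameters,
        // so identical queries map to the same cache entry.
        private static String cacheKey(String sql, Object... params) throws Exception {
            MessageDigest md = MessageDigest.getInstance("MD5");
            md.update(sql.getBytes(StandardCharsets.UTF_8));
            for (Object p : params) md.update(String.valueOf(p).getBytes(StandardCharsets.UTF_8));
            StringBuilder sb = new StringBuilder("sql_");
            for (byte b : md.digest()) sb.append(String.format("%02x", b));
            return sb.toString();
        }

        public Object query(String sql, Object... params) throws Exception {
            String key = cacheKey(sql, params);
            Object cached = client.get(key);                  // 1. try memcached first
            if (cached != null) return cached;
            Object result = queryDatabase(sql, params);       // 2. miss: hit the database
            if (result != null) client.set(key, 300, result); // 3. cache for 5 minutes
            return result;
        }

        private Object queryDatabase(String sql, Object... params) {
            return null; // placeholder for the real data access layer
        }
    }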

2. Cache Update Policy

There are two common approaches, each with its own pros, cons, and applicable scenarios:

    • Scenario One: lazy loading. The client first queries memcached; on a hit it returns the result, and on a miss (no data, or expired) it loads the latest data from the database, writes it back to memcached, and returns the result. (A sketch of this pattern follows this list.)

      Advantages: simple and easy to use;

      Disadvantages: if a cache entry expires under high concurrency, the backend database comes under instantaneous pressure. Of course, we can add a lock to control the concurrency, but that in turn affects the application.

    • Scenario Two: active updating. Data in the cache never expires; when data changes, a separate program updates the cache.

      Advantages: cached data is always reliable (barring LRU eviction), the front end can respond quickly, and the backend database sees no concurrent query pressure.

      Disadvantages: the program structure becomes more complex, since a separate updater program must be maintained and the two programs must share one cache configuration. (PS: some business scenarios actually fit this pattern; for example, a portal's content-management system and its public site need to share one set of data: one side writes it, the other displays it.)
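
A minimal sketch of the lazy-loading flow in Scenario One, again with spymemcached; the news_ key format, the 10-minute TTL, and loadNewsFromDb are illustrative assumptions:

    import net.spy.memcached.MemcachedClient;

    public class LazyCache {
        private static final int TTL_SECONDS = 600; // expire after 10 minutes
        private final MemcachedClient client;

        public LazyCache(MemcachedClient client) { this.client = client; }

        public Object getNews(String newsId) {
            String key = "news_" + newsId;
            Object value = client.get(key);          // 1. try the cache
            if (value != null) return value;         // hit: return directly
            value = loadNewsFromDb(newsId);          // 2. miss: load from the database
            if (value != null) {
                client.set(key, TTL_SECONDS, value); // 3. write back for later readers
            }
            return value;
        }

        private Object loadNewsFromDb(String newsId) {
            return null; // placeholder for the real database query
        }
    }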

3. Bulk delete (or update) issues

In memcached, most operations are single-key add/set/del/get operations, which are very convenient to use. Sometimes, however, we run into a bulk delete (or update) problem. For example, a mobile app may be ordered by the network regulator to delete all information related to some sensitive content; because of the variety of phone models and versions, that content lives in the cache under many different keys. We cannot easily obtain all of those keys, and even if we could enumerate them, memcached does not support bulk delete operations, which is troublesome. How do we solve this?

Below I use deleting sensitive news from a portal site as an example. Assume every news item has content along many dimensions: each news item is identified by newsid, each dimension is distinguished by prop, and a common prefix is added, so the complete key has the format: key_{newsid}_{prop}.

  • Scheme One:

    Use a separate collection (set) to track each class of keys. When a bulk delete (or update) is needed, simply take all the keys out of this collection and perform the operation on each. This is relatively straightforward (a Java sketch appears after Scheme Two below):

    First, whenever we add a new key/value pair to memcached, we also add the key to that collection. For example, a news item might be stored in memcached as the following pairs:

    key_{newsid}_{prop1}: value1
    key_{newsid}_{prop2}: value2
    key_{newsid}_{prop3}: value3
    ……
    key_{newsid}_{propn}: valuen

    In the collection, we store all the keys associated with this news item:

    keyset_{newsid}: key_{newsid}_{prop1}, key_{newsid}_{prop2}, ……, key_{newsid}_{propn}

    This way, when we want to clear this news item's cache, we fetch the key collection, iterate over the keys, and delete each one from memcached, achieving the bulk delete.

    How exactly should the key set we just mentioned be stored and maintained?

    One way is to join all the keys with commas into one large string and store that string in memcached as the value of keyset_{newsid}, or to serialize a set structure provided by the development language into memcached.

    Another way is to keep the keys in a storage structure better suited to this, such as Redis's set type; of course this is not generally recommended, as it adds complexity to the existing system.

  • Scheme Two:

    Implement it by dynamically versioning the keys: each real key is composed from the original key plus a version number, so a bulk delete or update only requires bumping the version number. Concretely (see the Java sketch after this list):

    First, we maintain a version number for this news item in memcached, like this:

    key_version_{newsid}: v1.0   (the version number can be a timestamp or any other meaningful value)

    // pseudocode
    $memcacheClient->setVersion(key_version_{newsid}, "v1.0");

    Then, whenever we want to save or read data related to this news item, we first fetch the version number and use it to generate the actual key, as follows:

    // pseudocode
    $version = getVersion(key_version_{newsid});
    $key = "key_{newsid}_{prop}_" + $version;

    We then use this new key to save (or read) the real content, so that everything related to this news item in memcached looks like this:

    key_{newsid}_{prop1}_v1.0: value1
    key_{newsid}_{prop2}_v1.0: value2
    key_{newsid}_{prop3}_v1.0: value3
    ……
    key_{newsid}_{propn}_v1.0: valuen

    When we need to delete (or update) all keys related to this news item, we only need to bump the version number, as follows:

    // pseudocode
    $memcacheClient->updateVersion(key_version_{newsid}, "v2.0");

    Now, the next time we access this news item's cache, the version number has changed, so every lookup under the new keys misses: fresh content is loaded from the database, or an empty result is returned. The old keys are reclaimed after their expiration time. This achieves our goal of bulk delete or update.
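
To make Scheme One concrete, here is a minimal Java sketch with spymemcached that stores the key set as a comma-joined string; the method names and the non-atomic read-modify-write on the key set are simplifying assumptions (real code would want append or CAS there):

    import net.spy.memcached.MemcachedClient;

    public class KeySetCache {
        private final MemcachedClient client;

        public KeySetCache(MemcachedClient client) { this.client = client; }

        // Store one dimension of a news item and record its key in the key set.
        public void addNewsProp(String newsId, String prop, Object value) {
            String key = "key_" + newsId + "_" + prop;
            client.set(key, 0, value);
            String setKey = "keyset_" + newsId;
            String keys = (String) client.get(setKey);      // comma-joined key list
            keys = (keys == null) ? key : keys + "," + key; // naive append; not atomic
            client.set(setKey, 0, keys);
        }

        // Bulk delete: walk the key set and delete every recorded key.
        public void deleteNews(String newsId) {
            String setKey = "keyset_" + newsId;
            String keys = (String) client.get(setKey);
            if (keys != null) {
                for (String key : keys.split(",")) {
                    client.delete(key);
                }
            }
            client.delete(setKey);
        }
    }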
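
And a matching sketch of Scheme Two that turns the pseudocode above into Java; using a millisecond timestamp as the version is an assumption — any value that changes on every bump works:

    import net.spy.memcached.MemcachedClient;

    public class VersionedCache {
        private final MemcachedClient client;

        public VersionedCache(MemcachedClient client) { this.client = client; }

        // Fetch (or initialize) the current version for a news item.
        private String getVersion(String newsId) {
            String vKey = "key_version_" + newsId;
            String version = (String) client.get(vKey);
            if (version == null) {
                version = String.valueOf(System.currentTimeMillis());
                client.set(vKey, 0, version);
            }
            return version;
        }

        // Real keys embed the current version number.
        private String realKey(String newsId, String prop) {
            return "key_" + newsId + "_" + prop + "_" + getVersion(newsId);
        }

        public void put(String newsId, String prop, Object value) {
            client.set(realKey(newsId, prop), 3600, value); // TTL so stale versions age out
        }

        public Object get(String newsId, String prop) {
            return client.get(realKey(newsId, prop));
        }

        // "Bulk delete": bump the version; all old keys become unreachable.
        public void invalidateAll(String newsId) {
            client.set("key_version_" + newsId, 0, String.valueOf(System.currentTimeMillis()));
        }
    }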

Both schemes above are fairly simple and practical, and each has its shortcomings: Scheme One's key set requires extra maintenance overhead, while Scheme Two cannot clean up old-version data promptly, leaving garbage in the cache. Choose flexibly according to the actual application scenario; in effect there is really little difference between the two.

4. Problems with failover and expansion

Memcached is not a distributed system; strictly speaking it is a single-point system, and the so-called distribution is implemented entirely by the client. So it has none of the high-availability features of open-source distributed systems. Let's look at how memcached deployments avoid single points of failure and handle online expansion. (PS: memcached really keeps things minimal; its biggest feature is simplicity, with many auxiliary functions left to the client.)

    • Consistent hashing: this should be the simplest and most common mechanism. Thanks to the properties of consistent hashing, a node failure, or an added node during expansion, has little impact on the cluster, which satisfies most application scenarios (a client configuration sketch follows this list). Note, however, that right after nodes are adjusted there will be some cache misses that penetrate to the backend database; in high-concurrency applications, apply concurrency control so the database is not overwhelmed.
    • Double-write mechanism: the client maintains two clusters, writes every update to both copies, and reads from one of them (randomly or fixed). Availability and stability are very high, and changes are painless: node failure or expansion has no impact on the cache or the backend database. There is a price, of course. One issue is consistency between the two copies, though for a cache these rare inconsistencies can be tolerated. The other is wasted memory: paying for fully redundant data just to reduce the miss rate is very expensive, and not suitable for large-scale Internet applications.
    • Twemproxy: a proxy open-sourced by Twitter that can front both Redis and memcached. It can cut a lot of maintenance cost (mainly on the client side) and also makes failover and online expansion convenient. For details see: https://github.com/twitter/twemproxy
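
As a sketch of the consistent-hashing option, the spymemcached client ships with a Ketama-style consistent locator that can be enabled as below; the node addresses are assumptions:

    import net.spy.memcached.AddrUtil;
    import net.spy.memcached.ConnectionFactory;
    import net.spy.memcached.ConnectionFactoryBuilder;
    import net.spy.memcached.DefaultHashAlgorithm;
    import net.spy.memcached.MemcachedClient;

    public class ConsistentHashClient {
        public static MemcachedClient build() throws Exception {
            ConnectionFactory cf = new ConnectionFactoryBuilder()
                .setLocatorType(ConnectionFactoryBuilder.Locator.CONSISTENT) // Ketama node locator
                .setHashAlg(DefaultHashAlgorithm.KETAMA_HASH)                // Ketama key hashing
                .build();
            // When a node fails or is added, only the keys mapped near it move.
            return new MemcachedClient(cf,
                AddrUtil.getAddresses("10.0.0.1:11211 10.0.0.2:11211 10.0.0.3:11211"));
        }
    }
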
5. Some minor details related to optimization

  • Bulk read (multiget): some complex business requests may need several memcached reads at once, where the network round trips and the concurrency pressure on the memcached nodes are still quite considerable. In such cases, consider a bulk read to cut the number of network I/O round trips, return the data in one shot, and simplify the client's business logic (see the sketch after this list).

    There is a famous multiget "bottomless pit" problem here, discovered in Facebook's deployment; see: http://highscalability.com/blog/2009/10/26/facebooks-memcached-multiget-hole-more-machines-more-capacit.html, which proposes a solution. In practice we can also route all of a multiget's keys to a single node to avoid the problem; this requires customizing the memcached client to distribute a class of keys (e.g. those sharing a prefix) to the same node by some rule. This also improves performance, since there is no need to wait for data from multiple nodes.

  • Change the serialization method: skip Java's built-in object serialization (haha, I only care about Java here) and serialize the objects to be cached yourself, into a byte array or string, before storing them. This helps noticeably with both memory savings and network transmission (see the sketch after this list).

  • Data preheating: in some scenarios we need to warm up cache data for the application (for example, node expansion requires redistributing data). For the cache designs mentioned earlier, the database's update log can be used to warm the cache; this mainly relies on the cached content being consistent with what the database stores. Otherwise, we can put a cluster of empty nodes in front of the existing cache and gradually read the old cache into the new cluster to achieve preheating; this is a bit more troublesome and needs cooperation from the application side.

  • Growth factor: tuning memcached's growth factor sensibly (the -f startup option) can effectively limit memory waste.

  • Handling empty results: in some scenarios the database lookup finds nothing and the cache is also empty. In that case, store an empty result in the cache for a short time to block the front end's frequent requests and avoid pressuring the database (see the sketch below).
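
A minimal multiget sketch using spymemcached's getBulk, fetching several dimensions of one news item in a single round trip; the key names reuse the illustrative format from section 3:

    import net.spy.memcached.MemcachedClient;

    import java.util.Arrays;
    import java.util.List;
    import java.util.Map;

    public class MultigetExample {
        public static Map<String, Object> loadNews(MemcachedClient client, String newsId) {
            List<String> keys = Arrays.asList(
                "key_" + newsId + "_title",
                "key_" + newsId + "_body",
                "key_" + newsId + "_comments");
            // One network round trip for all three keys instead of three gets.
            return client.getBulk(keys);
        }
    }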
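
A small sketch of the hand-rolled serialization idea, assuming a hypothetical User class: packing the fields into a compact delimited string sidesteps the overhead of Java's default object serialization (a delimiter-safe format such as JSON would be the robust choice in real code):

    import net.spy.memcached.MemcachedClient;

    public class UserCache {
        public static class User {
            public final long id; public final String name; public final int age;
            public User(long id, String name, int age) { this.id = id; this.name = name; this.age = age; }
        }

        private final MemcachedClient client;

        public UserCache(MemcachedClient client) { this.client = client; }

        // Pack the fields into a compact delimited string instead of letting
        // the client serialize the whole object graph.
        public void put(User u) {
            String packed = u.id + "|" + u.name + "|" + u.age;
            client.set("user_" + u.id, 600, packed);
        }

        public User get(long id) {
            String packed = (String) client.get("user_" + id);
            if (packed == null) return null;
            String[] f = packed.split("\\|");
            return new User(Long.parseLong(f[0]), f[1], Integer.parseInt(f[2]));
        }
    }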
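
Finally, a sketch of caching empty results: a sentinel value with a short TTL blocks repeated misses from hammering the database. The sentinel string and both TTLs are assumptions:

    import net.spy.memcached.MemcachedClient;

    public class NegativeCache {
        private static final String EMPTY_MARKER = "__EMPTY__"; // hypothetical sentinel
        private final MemcachedClient client;

        public NegativeCache(MemcachedClient client) { this.client = client; }

        public Object get(String key) {
            Object value = client.get(key);
            if (EMPTY_MARKER.equals(value)) return null; // known-empty: skip the DB
            if (value != null) return value;
            Object fromDb = loadFromDb(key);
            if (fromDb == null) {
                client.set(key, 60, EMPTY_MARKER);       // cache the miss briefly
            } else {
                client.set(key, 600, fromDb);
            }
            return fromDb;
        }

        private Object loadFromDb(String key) {
            return null; // placeholder for the real database lookup
        }
    }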

Memcached is very simple to use and performs very well. The scenarios above are ones we actually encounter in business development; choosing the right solution for the actual scenario will make future development and maintenance much easier.

Original: http://my.oschina.net/u/142836/blog/171196
