I have collected Frequently Asked Questions about memcached.
- How does memcached work?
- What are the biggest advantages of memcached?
- What are the advantages and disadvantages of memcached compared with MySQL query cache?
- What are the advantages and disadvantages of memcached compared with local cache (such as APC and mmap files in PHP) on the server?
- What is the cache mechanism of memcached?
- How does memcached implement redundancy?
- How does memcached handle fault tolerance?
- How to import and export items in memcached in batches?
- But I do need to dump all items in memcached and load the data to memcached. What should I do?
- How does memcached perform authentication?
- What is memcached multithreading? How do I use it?
- What is the maximum length of the key that memcached can accept? (250 bytes)
- What are the expiration time limits of memcached on items? (Why is there a 30-day limit?)
- What is the maximum size of a single item that memcached can store? (1 MB)
- Why is the size of a single item limited to 1 MB?
- To enable memcached to use the server's memory more effectively, can I configure cache spaces of varying sizes on each server?
- What is the binary protocol? Should I pay attention to it?
- How does memcached allocate memory? Why not use malloc/free!? Why use slabs?
- Does memcached guarantee the atomicity of data storage?
Cluster Architecture Questions
How does memcached work?
Memcached's magic comes from a two-stage hash. Memcached is like a huge hash table storing many <key, value> pairs. Given a key, you can store or query arbitrary data.
The client can store data on multiple memcached instances. When storing or querying data, the client first computes the key's hash value (phase 1 hash) and uses it to select a node from the node list. The client sends the request to the selected node, and the memcached node then uses an internal hash algorithm (phase 2 hash) to find the actual data (the item).
Assume there are three clients (1, 2, 3) and three memcached nodes (A, B, C):
Client 1 wants to store the data "barbaz" under the key "foo". Client 1 first computes the hash of the key "foo" against the node list (A, B, C); suppose memcached B is selected. Client 1 then connects directly to memcached B and stores the data "barbaz" under the key "foo". Client 2 uses the same client library as Client 1 (meaning the phase 1 hash algorithm is the same) and has the same memcached node list (A, B, C).
Therefore, after the same phase 1 hash calculation, Client 2 also maps the key "foo" to memcached B, and it requests memcached B directly to obtain the data "barbaz".
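The two-stage lookup described above can be sketched in a few lines. The MD5-and-modulo scheme and the node names here are illustrative assumptions, not what any particular client uses:

```python
import hashlib

# Hypothetical node list; a real deployment would configure its own.
NODES = ["memcached-A", "memcached-B", "memcached-C"]

def _hash(key):
    # Illustrative hash; real clients use various algorithms (CRC32, FNV, ...).
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

def phase1_node(key, nodes):
    """Phase 1 (client side): pick which memcached node owns the key."""
    return nodes[_hash(key) % len(nodes)]

def phase2_bucket(key, num_buckets=1024):
    """Phase 2 (server side): locate the item in the node's internal hash table."""
    return _hash(key) % num_buckets
```

Note that phase 1 here is the simple remainder scheme; as the fault-tolerance section explains, remainder hashing remaps most keys whenever the node list changes.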
Different clients store data in memcached in different formats (Perl Storable, PHP serialize, Java Hibernate, JSON, etc.). Some clients implement different hash algorithms, but the behavior of the memcached server is always the same.
Finally, from the implementation perspective, memcached is a non-blocking, event-based server program. This architecture effectively addresses the C10K problem and provides excellent scalability.
Refer to A Story of Caching. This article briefly explains how the client interacts with memcached.
What are the biggest advantages of memcached?
Read the question above carefully (that is, how memcached works). The biggest benefit of memcached is its excellent horizontal scalability, especially in huge systems. Because the clients do the phase 1 hashing themselves, we can easily add a large number of memcached nodes to the cluster. The nodes do not communicate with each other, so adding nodes does not increase memcached's load; there is no multicast protocol and no explosion of network traffic. A memcached cluster is easy to grow: out of memory? Add a few memcached nodes. Out of CPU? Add a few more servers. Have spare memory? Add a few more nodes; don't waste it.
Based on the basic principles of memcached, you can easily build different types of cache architectures. In addition to this FAQ, details can be easily found elsewhere.
Take a look at the following questions: they compare memcached with the server's local cache and with MySQL's query cache, and will give you a more comprehensive understanding.
What are the advantages and disadvantages of memcached compared with MySQL query cache?
Introducing memcached into an application takes quite a bit of work, while MySQL comes with a convenient query cache that automatically caches SQL query results, so cached SQL queries can be re-executed quickly and repeatedly. How does memcached compare? MySQL's query cache is centralized: all MySQL clients connected to the server benefit from it.
- When you modify a table, the MySQL query cache is immediately flushed. Storing a memcached item takes only a little time, but when writes are frequent, MySQL's query cache keeps invalidating all of its cached data.
- On multi-core CPUs, the MySQL query cache can run into scalability problems: the query cache is protected by a single global lock, and as more cached data has to be flushed, everything gets slower.
- In MySQL's query cache, we cannot store arbitrary data (only SQL query results). With memcached, we can build all kinds of efficient caches. For example, you can run several independent queries, build a user object from the results, and cache that user object in memcached. The query cache works at the level of SQL statements and cannot do this. The query cache can help small websites, but as the site grows, its disadvantages outweigh its benefits.
- The memory available to the query cache is limited by the idle memory of the MySQL server. Adding more memory to the database server for caching is good, but with memcached, any idle memory anywhere can be used to grow the memcached cluster and cache more data.
What are the advantages and disadvantages of memcached compared with local cache (such as APC and mmap files in PHP) on the server?
First, the local cache shares many of the problems of the query cache above. The memory the local cache can use is limited to the idle memory of a (single) server. However, the local cache does have one advantage over both memcached and the query cache: it can store arbitrary data, and there is no network access latency.
- The local cache offers faster data access. Put the most frequently used data in the local cache. If each page needs to load only a small amount of data, consider putting it in the local cache.
- The local cache lacks group invalidation. In a memcached cluster, deleting or updating a key is immediately visible to all clients. With a local cache, we can only notify all servers to refresh their caches (slow and not scalable) or rely on cache timeouts.
- Local cache faces severe memory restrictions, as mentioned above.
What is the cache mechanism of memcached?
The main cache mechanism of memcached is the LRU (least recently used) algorithm plus expiration. When you store data in memcached, you can specify how long it may stay in the cache: forever, or until some time in the future. If memcached runs out of memory, expired items are replaced first, followed by the least recently used items.
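As a rough sketch of the policy just described (expired items evicted first, then the least recently used), here is a toy cache; it is illustrative only and not memcached's actual implementation:

```python
import time
from collections import OrderedDict

class TTLLRUCache:
    """Toy cache illustrating the policy: evict expired items first,
    then fall back to least recently used. Not memcached's real code."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()  # key -> (value, expires_at or None)

    def set(self, key, value, ttl=None):
        expires = time.time() + ttl if ttl is not None else None
        self.items.pop(key, None)
        if len(self.items) >= self.capacity:
            self._evict()
        self.items[key] = (value, expires)

    def get(self, key):
        entry = self.items.get(key)
        if entry is None:
            return None
        value, expires = entry
        if expires is not None and expires < time.time():
            del self.items[key]        # lazily drop an expired item
            return None
        self.items.move_to_end(key)    # mark as recently used
        return value

    def _evict(self):
        now = time.time()
        # Prefer evicting an expired item...
        for key, (_, expires) in list(self.items.items()):
            if expires is not None and expires < now:
                del self.items[key]
                return
        # ...otherwise drop the least recently used (front of the OrderedDict).
        self.items.popitem(last=False)
```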
How does memcached implement redundancy?
Not implemented! We are surprised this question is asked so often. Memcached should be the cache layer of your application; its design includes no redundancy mechanism of any kind. If a memcached node loses all its data, you should be able to fetch it again from the data source (such as the database). Make sure your application can tolerate node failures; do not write terrible query code and then hope memcached will guarantee everything! If you worry that a node failure will greatly increase the load on the database, you can take some measures, for example adding more nodes (to reduce the impact of losing any one node), or running hot standby nodes (which take over the IP address when another node goes down), and so on.
How does memcached handle fault tolerance?
Not handled! :) When a memcached node fails, the cluster does not perform any fault-tolerance handling. How to respond to a node failure is entirely up to you. When a node fails, you can choose among the following options:
- Ignore it! Until the failed node is restored or replaced, many other nodes can absorb the impact of the failure.
- Remove the failed node from the node list. Be careful with this! By default (with the remainder hash algorithm), adding or removing nodes on the client makes nearly all cached data unusable: because the node list the hash is computed against has changed, most keys are remapped to different nodes.
- Start a hot standby node to take over the IP address of the failed node. This prevents hashing chaos.
- If you want to add and remove nodes without disturbing the original hash results, use a consistent hashing algorithm. You can look up consistent hashing online; clients that support it are mature and widely used. Try them!
- Rehashing. When a client accesses data and finds a node down, it hashes the key again (with a different hash algorithm) and picks another node. Note that the client does not remove the down node from the node list, so next time the key may still hash to it first. If a node flaps between up and down, both hash paths are at risk: both the good node and the bad node may end up holding stale ("dirty") data.
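The consistent hashing option above can be sketched as a minimal hash ring. The MD5 hash, the replica count, and the node names are illustrative assumptions, not what any particular client implements:

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Minimal consistent hash ring (a sketch, not a production client).
    Each node is placed at several points on a ring; a key maps to the
    first node clockwise from its hash, so removing one node only
    remaps the keys that node owned."""
    def __init__(self, nodes, replicas=100):
        self.replicas = replicas
        self.ring = []          # sorted list of (point, node)
        for node in nodes:
            self.add(node)

    def _hash(self, s):
        return int(hashlib.md5(s.encode()).hexdigest(), 16)

    def add(self, node):
        for i in range(self.replicas):
            bisect.insort(self.ring, (self._hash("%s:%d" % (node, i)), node))

    def remove(self, node):
        self.ring = [(p, n) for p, n in self.ring if n != node]

    def node_for(self, key):
        point = self._hash(key)
        idx = bisect.bisect(self.ring, (point,))
        if idx == len(self.ring):
            idx = 0             # wrap around the ring
        return self.ring[idx][1]
```

The payoff is visible when a node is removed: keys that were not mapped to it keep their old assignment, unlike with the remainder scheme.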
How to import and export items in memcached in batches?
You should not do this! Memcached is a non-blocking server; any operation that could hang memcached or cause a momentary denial of service should be weighed carefully. Batch importing data into memcached is often not what you really want! Imagine: if the cached data changes between export and import, you have to deal with the dirty data; if the cached data expires between export and import, how do you handle it?
So batch export and import is not as useful as you might think. It is useful in one scenario, though: if you have a large amount of data that never changes and you want the cache to warm up quickly, batch import helps a great deal. Although this scenario is not typical, it occurs often enough that we will consider implementing batch export and import in the future.
Steven Grimm, as always, provides another good example on the mailing list: http://lists.danga.com/pipermail/memcached/2007-July/004802.html
But I really do need to export the items in memcached in batches. What should I do?
Okay. If you really need to batch export and import, the most likely reason is that regenerating the cache data takes a long time, or a database outage is making you suffer.
If one memcached node going down causes you a lot of trouble, your system is too fragile and needs optimization work: for example, solving the "thundering herd" problem (all memcached nodes becoming invalid while repeated queries overwhelm your database; this has been mentioned in other FAQ entries), or fixing poorly optimized queries. Remember, memcached is not an excuse to avoid optimizing your queries.
If it takes a long time (15 seconds to more than 5 minutes) to regenerate the cache data, you can consider using the database again. Here are some tips:
- Use MogileFS (or similar software such as CouchDB) to store items. Calculate the item and dump it to the disk. MogileFS can easily overwrite items and provide quick access. You can even cache items in MogileFS in memcached to speed up reading. The combination of MogileFS and Memcached can accelerate the response speed when the cache does not hit and improve the website availability.
- Use MySQL again. InnoDB primary-key lookups are very fast. If most of the cached data fits in a VARCHAR field, primary-key lookup performance is even better. Querying memcached by key is almost equivalent to a MySQL primary-key lookup: hash the key to a 64-bit integer and store the data in MySQL under that integer. You can keep the original (unhashed) key in an ordinary column with a secondary index on it, to speed up queries for passive invalidation of keys, bulk deletion of expired keys, and so on.
All of the above methods can be combined with memcached, and they keep performance good even across a memcached restart: you no longer need to worry about a "hot" item suddenly being evicted by the LRU algorithm, and users no longer wait minutes for the cache data to be regenerated when it suddenly disappears from memory. The above methods therefore improve performance across the board.
For details about these methods, see blog: http://dormando.livejournal.com/495593.html.
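The MySQL tip above, hashing the key to a 64-bit integer primary key, can be sketched as follows; the schema in the comment is hypothetical, not one the article prescribes:

```python
import hashlib

def key_to_id(key):
    """Hash an arbitrary cache key to a 64-bit unsigned integer, suitable
    for a BIGINT UNSIGNED primary key. SHA-1 truncation is an illustrative
    choice; any well-distributed 64-bit hash would do."""
    digest = hashlib.sha1(key.encode()).digest()
    return int.from_bytes(digest[:8], "big")

# Hypothetical table this lookup scheme would run against:
# CREATE TABLE cache (
#   id BIGINT UNSIGNED PRIMARY KEY,   -- key_to_id(original_key)
#   original_key VARCHAR(250),        -- kept for bulk/secondary operations
#   value BLOB
# );
```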
How does memcached perform authentication?
No authentication mechanism! Memcached is software that runs below the application layer (authentication should be the responsibility of the layers above it). Memcached's client and server are lightweight partly because they implement no authentication at all. This lets memcached create new connections quickly with no server-side configuration.
If you want to restrict access, use a firewall, or have memcached listen on a unix domain socket.
What is memcached multithreading? How do I use it?
Threads rule! Thanks to the efforts of Steven Grimm and Facebook, memcached 1.2 and later have a multi-threaded mode. Multi-threaded mode lets memcached make full use of multiple CPUs and share all cached data between threads. Memcached uses a simple locking mechanism to keep data updates mutually exclusive. Compared with running multiple memcached instances on the same physical machine, this approach handles multi-gets more efficiently.
If your system load is not heavy, you may not need to enable multithreading. If you are running a website with large-scale hardware, you will see the benefits of multithreading.
For more information, see http://code.sixapart.com/svn/memcached/trunk/server/doc/threads.txt.
To sum up: command parsing (where memcached spends most of its time) can run multi-threaded, while memcached's internal data operations are guarded by a number of global locks (so that part does not run in parallel). Future improvements to the multi-threaded mode will remove many of these global locks and improve memcached's performance under extremely high load.
What is the maximum length of the key that memcached can accept?
The maximum length of a key is 250 characters. Note that 250 is an internal limit of the memcached server. If your client supports a "key prefix" or a similar feature, then the full key (prefix plus original key) may be at most 250 characters. We recommend shorter keys, since they save memory and bandwidth.
What are the expiration time limits of memcached on item?
The expiration time can be at most 30 days when given as a relative number of seconds; larger values are interpreted as absolute Unix timestamps. Memcached converts the relative offset into an absolute time point, and once that point is reached, the item becomes invalid. This is a simple but obscure mechanism.
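This rule can be sketched as follows, following the protocol convention that values up to 30 days are relative seconds, larger values are absolute Unix timestamps, and 0 means never expire:

```python
import time

THIRTY_DAYS = 60 * 60 * 24 * 30

def expiry_to_timestamp(exptime, now=None):
    """Convert a client-supplied expiration to an absolute time point,
    mirroring memcached's rule: 0 means never; values up to 30 days are
    relative seconds; anything larger is already a Unix timestamp."""
    if now is None:
        now = time.time()
    if exptime == 0:
        return float("inf")          # never expires
    if exptime <= THIRTY_DAYS:
        return now + exptime         # relative offset in seconds
    return float(exptime)            # absolute Unix timestamp
```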
What is the maximum size of a single item that memcached can store?
1 MB. If your data is larger than 1 MB, you can compress it on the client or split it across multiple keys.
Why is the size of a single item limited to 1 MB byte?
Ah... this is a frequently asked question!
A simple answer: because that is how the memory allocator's algorithm works.
A detailed answer: memcached's memory storage engine (the engine will be pluggable in the future...) uses slabs to manage memory. Memory is first divided into equally sized slabs, and each slab is divided into equally sized chunks; chunks in different slab classes have different sizes. Chunk sizes start at a minimum value and grow by a factor until they reach the maximum possible value.
If the minimum value is 400 B, the maximum value is 1 MB, and the factor is 1.20, the chunk sizes of the slab classes are: slab1 400 B, slab2 480 B, slab3 576 B...
The larger the chunks in a slab class, the larger the gap between it and the previous class. Therefore, the larger the maximum value, the lower the memory utilization. Memcached must also pre-allocate memory for every slab class, so configuring a small factor with a large maximum value requires even more memory.
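The chunk-size progression described above can be computed in a few lines; the default values here mirror the example (400 B minimum, 1 MB maximum, growth factor 1.20):

```python
def slab_chunk_sizes(min_size=400, max_size=1024 * 1024, factor=1.2):
    """Chunk size of each slab class, growing by `factor` from the minimum
    until the maximum item size is reached (a sketch of the scheme, not
    memcached's exact rounding rules)."""
    sizes = []
    size = min_size
    while size < max_size:
        sizes.append(size)
        size = int(size * factor)
    sizes.append(max_size)       # final class holds items up to the cap
    return sizes
```

Printing `slab_chunk_sizes()` shows why a larger cap lowers utilization: the gaps between adjacent classes widen as the sizes grow.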
Another reason: you do not want to be pulling huge amounts of data out of memcached... Do not try to put an enormous web page into memcached. Loading and unpacking such a large data structure into memory takes a long time and hurts website performance.
If you really need to store data larger than 1 MB, you can modify the value of POWER_BLOCK in slabs.c and recompile memcached, or use the inefficient malloc/free. Other suggestions include using a database or MogileFS.
Can I use cache spaces of different sizes on different memcached nodes? If I do, will memcached use memory more effectively?
The memcached client only uses a hash algorithm to decide which node a key is stored on, regardless of each node's memory size. So yes, you can use caches of different sizes on different nodes. But it is usually done like this: nodes with lots of memory run several memcached instances, each using the same amount of memory as the instances on the other nodes.
What is binary protocol? Should I pay attention to it?
The best information about binary is of course the binary protocol specification: http://code.google.com/p/memcached/wiki/MemcacheBinaryProtocol.
The binary protocol tries to provide a more efficient and reliable protocol for clients, reducing the CPU time the client and server spend handling the protocol itself.
According to Facebook's tests, parsing the ASCII protocol is the most time-consuming part of memcached. So why don't we improve the ASCII protocol?
Some older information can be found in this mailing-list thread: http://lists.danga.com/pipermail/memcached/2007-July/004636.html
How does the memcached memory allocator work? Why not just use malloc/free!? Why use slabs?
Actually, this is a compile-time option. The internal slab allocator is used by default, and it is the one you should use. In its earliest days, memcached managed memory with plain malloc/free, but that approach did not cooperate well with the OS's memory management: repeated malloc/free caused memory fragmentation, and the OS ended up spending a lot of time searching for contiguous memory blocks to satisfy malloc requests instead of running the memcached process. If you disagree, by all means use malloc! Just don't complain on the mailing list :)
The slab allocator was created to solve this problem. Memory is allocated as slabs and divided into chunks, which are reused over and over. Because memory is partitioned into slab classes of various sizes, some memory is wasted when an item's size does not match its slab class well. Steven Grimm has made effective improvements in this area.
The mailing list has some discussion of slab improvements (powers of n versus powers of 2) and the trade-offs: http://lists.danga.com/pipermail/memcached/2006-May/002163.html http://lists.danga.com/pipermail/memcached/2007-March/003753.html
If you want to use malloc/free and see how they behave, you can define USE_SYSTEM_MALLOC during the build. This feature is not well tested, and the developers do not support it.
More information: http://code.sixapart.com/svn/memcached/trunk/server/doc/memory_management.txt.
Is memcached atomic?
Of course! Well, let's clarify:
All individual commands sent to memcached are completely atomic. If you issue a set and a get against the same data concurrently, they will not affect each other; they are serialized and executed one after the other. Even in multi-threaded mode, all commands are atomic, unless the program has a bug :)
A sequence of commands is not atomic. If you fetch an item with get, modify it, and then set it back to memcached, there is no guarantee that the item was not touched by another process in the meantime (a "process" here is not necessarily an operating-system process). Under concurrency, you may overwrite an item that another process has just set.
Memcached 1.2.5 and later versions provide gets and cas commands to solve the above problems. If you use the gets command to query the item of a key, memcached returns a unique identifier for the current value of this item. If you overwrite this item and want to write it back to memcached, you can use the cas command to send the unique identifier together to memcached. If the unique identifier of the item stored in memcached is the same as that provided by you, your write operation will succeed. If another process modifies this item during this period, the unique identifier of the item stored in memcached will change, and your write operation will fail.
Generally, it is tricky to modify items based on the values of items in memcached. Do not do this unless you know exactly what you are doing.
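The gets/cas semantics described above can be illustrated with a tiny in-memory store; this simulates only the token check and is not a memcached client:

```python
import itertools

class CasStore:
    """Toy store reproducing gets/cas semantics. gets() returns the value
    plus a unique token; cas() writes only if the token still matches,
    i.e. nobody else has modified the item in between."""
    def __init__(self):
        self._data = {}                  # key -> (value, token)
        self._tokens = itertools.count(1)

    def set(self, key, value):
        self._data[key] = (value, next(self._tokens))

    def gets(self, key):
        value, token = self._data[key]
        return value, token

    def cas(self, key, value, token):
        _, current = self._data[key]
        if current != token:
            return False                 # item changed; write refused
        self._data[key] = (value, next(self._tokens))
        return True
```

A typical flow: gets a counter, have "someone else" overwrite it, and watch the stale cas fail while a fresh one succeeds.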
Source: "Memcached cluster architecture problems", Knowledge Base, Blog Garden