* How does memcached work?
* What is the biggest advantage of memcached?
* What are the pros and cons of memcached compared to MySQL's query cache?
* What are the pros and cons of memcached versus the server's local cache (such as PHP's APC, mmap files, etc.)?
* What is memcached's cache mechanism?
* How does memcached implement redundancy?
* How does memcached handle fault tolerance?
* How do I batch import and export memcached items?
* But I really do need to dump the items in memcached and load the data back in. What should I do?
* How does memcached handle authentication?
* What is memcached's multithreading? How do I use it?
* What is the maximum length of a key that memcached can accept? (250 bytes)
* Does memcached limit the expiration time of an item? (Why is there a 30-day limit?)
* How large can a single item stored in memcached be? (1 MB)
* Why is the size of a single item limited to 1 MB?
* To let memcached use server memory more efficiently, can I configure caches of different sizes on each server?
* What is the binary protocol? Is it worth paying attention to?
* How does memcached allocate memory? Why not use malloc/free? Why use slabs?
* Can memcached guarantee the atomicity of data storage?
* Cluster architecture questions
How does memcached work?
The magic of memcached comes from its two-stage hash. Memcached works like a giant hash table that stores many <key, value> pairs; given a key, you can store or retrieve arbitrary data.
Clients can spread data across multiple memcached servers. When storing or querying data, the client first computes a hash of the key (the first-stage hash) and uses it to select a node; it then sends the request directly to that node, and the memcached server locates the actual data (the item) through its own internal hash table (the second-stage hash).
As an example, suppose there are three clients (1, 2, 3) and three memcached servers (A, B, C):
Client 1 wants to store the data "barbaz" under the key "foo". Client 1 first consults its node list (A, B, C) and computes the hash of the key "foo"; suppose memcached B is selected. Client 1 then connects directly to memcached B and stores "barbaz" under the key "foo". Client 2 uses the same client library (which means it uses the same first-stage hashing algorithm) and the same memcached list (A, B, C).
So, after the same first-stage hash, Client 2 also maps the key "foo" to memcached B, requests memcached B directly, and gets back the data "barbaz".
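To make the first-stage hash concrete, here is a minimal sketch of client-side server selection, assuming the simple modulo scheme that early clients used (real client libraries vary):

```python
import hashlib

# The client's view of the cluster: the same ordered list on every client.
servers = ["A:11211", "B:11211", "C:11211"]

def pick_server(key: str) -> str:
    """First-stage hash: map a key to one memcached server."""
    h = int.from_bytes(hashlib.md5(key.encode()).digest()[:4], "big")
    return servers[h % len(servers)]

# Every client with the same list and the same hash picks the same node for "foo".
print(pick_server("foo"))
```

The second-stage hash happens inside the chosen server, which looks the key up in its own internal hash table.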
Different clients store data in memcached in different formats (Perl Storable, PHP serialize, Java Hibernate, JSON, etc.), and some client implementations use different hashing algorithms. The memcached server's behavior, however, is always the same.
Finally, from an implementation standpoint, memcached is a non-blocking, event-based server program. This architecture handles the C10K problem well and offers excellent scalability.
You can refer to "A Story of Caching", which explains simply how a client interacts with memcached.
What is the biggest advantage of memcached?
Please read the question above carefully (how memcached works). Memcached's biggest advantage is the excellent horizontal scalability it brings, especially in huge systems. Since clients do the first-stage hashing themselves, it is easy to add many memcached servers to a cluster. The servers do not communicate with each other, so adding them does not increase memcached's load; there is no multicast protocol and no explosion of network traffic. A memcached cluster is very flexible. Not enough memory? Add a few servers. Not enough CPU? Add a few more. Have spare memory? Add a few more still, and don't let it go to waste.
Based on memcached's basic principles, you can build quite different caching architectures fairly easily. Besides this FAQ, detailed information is easy to find elsewhere.
Take a look at the following questions, which compare memcached, the server's local cache, and MySQL's query cache. They will give you a more complete picture.
What are the pros and cons of memcached compared to MySQL's query cache?
Introducing memcached into an application takes quite a bit of work. MySQL, by contrast, has a convenient query cache that automatically caches SQL query results, so repeated queries run quickly. How does memcached compare? MySQL's query cache is centralized, and every client connected to that MySQL server benefits from it.
* When you modify a table, MySQL's query cache is immediately flushed. Storing a memcached item takes very little time, but when writes are frequent, MySQL's query cache keeps invalidating all of its cached data.
* On multi-core CPUs, MySQL's query cache runs into scalability issues. The query cache takes a global lock, and with more cached data to flush it can become slower.
* MySQL's query cache cannot store arbitrary data (SQL query results only). With memcached, we can build all kinds of efficient caches: for example, run several independent queries, build a user object from the results, and cache that user object in memcached (see the sketch after this list). The query cache works at the level of SQL statements and cannot do this. In a small site the query cache helps, but as the site grows, it does more harm than good.
* The memory available to the query cache is limited by the MySQL server's free memory. Adding more memory to the database server to cache data is all well and good, but with memcached, any machine with spare memory can be used to grow the memcached cluster and cache more data.
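As a sketch of the user-object pattern from the list above, assuming the pymemcache client library (the query helpers are hypothetical stand-ins for real database calls):

```python
import json
from pymemcache.client.base import Client  # assumes the pymemcache library

client = Client(("localhost", 11211))

def fetch_profile(user_id):    # hypothetical database query stubs
    return {"id": user_id, "name": "alice"}

def fetch_settings(user_id):
    return {"theme": "dark"}

def load_user(user_id):
    """Build a user object from several independent queries and cache the result."""
    key = f"user:{user_id}"
    cached = client.get(key)
    if cached is not None:
        return json.loads(cached)
    user = {"profile": fetch_profile(user_id), "settings": fetch_settings(user_id)}
    client.set(key, json.dumps(user), expire=300)  # cache the assembled object
    return user
```

MySQL's query cache could only have cached each SQL result separately; here the assembled object comes back in a single memcached lookup.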
What are the pros and cons of memcached versus the server's local cache (such as PHP's APC, mmap files, etc.)?
First, a local cache shares many of the query cache's problems above. The memory available to a local cache is limited by a single server's free memory. However, local caches do beat both memcached and the query cache in one respect: they can store arbitrary data, and there is no network latency on access.
* Local cache lookups are faster. Consider putting very frequently used data in the local cache; if every page needs to load a small amount of data, consider keeping it there too.
* Local caches lack group invalidation. In a memcached cluster, deleting or updating a key makes the change visible to all clients at once. With a local cache, we can only notify every server to flush its cache (slow and unscalable) or simply rely on timeout-based expiration.
* Local caches face severe memory limits, as mentioned above.
What is memcached's cache mechanism?
Memcached's main cache mechanism is LRU (least recently used) eviction plus timeout expiration. When you store data in memcached, you can specify how long it may stay in the cache: forever, or until some point in the future. If memcached runs out of memory, expired items are reclaimed first, followed by the least recently used items.
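A minimal sketch of setting expiration times when storing data, assuming the pymemcache client library:

```python
from pymemcache.client.base import Client  # assumes the pymemcache library

client = Client(("localhost", 11211))
client.set("session:42", "payload", expire=300)  # expires after 300 seconds
client.set("site:config", "payload", expire=0)   # 0 = no timeout; removed only by LRU
```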
How does memcached implement redundancy?
It doesn't! We are always surprised by this question. Memcached is meant to be your application's cache layer, and its design includes no redundancy whatsoever. If a memcached node loses all of its data, you should be able to fetch it again from the source of record (such as the database). Be especially careful that your application can tolerate node failures; don't write bad query code and hope memcached will guarantee everything! If you worry that a node failure will greatly increase the load on the database, you can take measures, for example adding more nodes (to reduce the impact of losing any one node) or running hot spare nodes (which take over the IP address when another node goes down).
How does memcached handle fault tolerance?
It doesn't! :) When a memcached node fails, the cluster does not perform any fault-tolerance handling; what to do is entirely up to the user. When a node fails, here are a few options to choose from:
* Ignore it! Plenty of other nodes can absorb the impact of the failed node until it is restored or replaced.
* Remove the failed node from the node list. Be careful with this! Under the default (remainder/modulo) hashing scheme, adding or removing a node makes almost all cached data unavailable: the node list the hash refers to has changed, so most keys now map to different nodes than before.
* Bring up a hot spare node that takes over the IP address the failed node occupied. This avoids hashing chaos.
* If you want to add and remove nodes without disturbing the original hash results, use a consistent hashing algorithm (see the sketch after this list). You can look up consistent hashing online; clients that support it are mature and widely used. Give it a try!
* Rehashing. When the client accesses data and finds a node down, it hashes again (with a different hash algorithm than before) and picks another node (note that the client does not remove the down node from its list, so it may hash to it again next time). If a node flaps between up and down, rehashing is risky: stale (dirty) data may end up on both the good and the bad node.
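A minimal sketch of consistent hashing as described in the list above (real clients, such as those based on libketama, add weights and finer-grained rings):

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Map keys to nodes so that removing a node only remaps that node's keys."""

    def __init__(self, nodes, vnodes=100):
        self.ring = []  # sorted list of (point, node)
        for node in nodes:
            for i in range(vnodes):  # virtual nodes smooth the distribution
                bisect.insort(self.ring, (self._hash(f"{node}#{i}"), node))

    @staticmethod
    def _hash(s: str) -> int:
        return int.from_bytes(hashlib.md5(s.encode()).digest()[:8], "big")

    def get_node(self, key: str) -> str:
        i = bisect.bisect(self.ring, (self._hash(key), ""))
        return self.ring[i % len(self.ring)][1]  # first point clockwise, wrapping

ring = ConsistentHashRing(["A:11211", "B:11211", "C:11211"])
print(ring.get_node("foo"))
```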
How do I batch import and export memcached items?
You shouldn't! Memcached is a non-blocking server. Any operation that could pause memcached or momentarily deny service deserves careful thought. Bulk-importing data into memcached is often not what you really want! Imagine: if the cached data changes between the export and the import, you have to deal with stale data; and if the cached data expires in between, what do you do with it?
So batch export/import is less useful than you might think. But there is one scenario where it is very useful. If you have a large amount of data that never changes and you want the cache to warm up quickly, bulk-importing cached data helps. Although this scenario is not typical, it does occur, so we may implement bulk export/import in the future.
Steven Grimm, as always, gave another good example in the mailing list: http://lists.danga.com/pipermail/memcached/2007-July/004802.html.
But I really do need to bulk export and import memcached items. What should I do?
All right, all right. If you really need bulk export/import, the most likely reasons are that regenerating the cached data takes a very long time, or that the database copes badly without the cache.
If losing a memcached node makes you miserable, you are about to run into plenty of other problems. Your system is too fragile and needs hardening: handle the "thundering herd" problem, for example (a memcached node fails and repeated queries overwhelm your database; this is covered elsewhere in this FAQ), and optimize your bad queries. Remember, memcached is not an excuse for avoiding query optimization.
If your problem is simply that regenerating the cached data takes a long time (15 seconds to more than 5 minutes), you might consider re-using the database. Here are a few tips:
* Use MogileFS (or similar software such as CouchDB) to store the items. Compute each item and dump it to disk. MogileFS can easily overwrite items and provides fast access. You can even cache the MogileFS items in memcached to speed up reads. The MogileFS + memcached combination speeds up response times on cache misses and improves the site's availability.
* Re-use MySQL. InnoDB primary-key lookups are fast. If most of the cached data fits in a VARCHAR field, primary-key lookups perform even better. Querying memcached by key is almost equivalent to a MySQL primary-key lookup: hash the key to a 64-bit integer and store the data in MySQL under it (see the sketch below). You can keep the original (unhashed) key in an ordinary column with a secondary index to speed up queries, expire keys passively, bulk-delete invalid keys, and so on.
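A minimal sketch of the key-hashing idea above (the table layout in the comment is a hypothetical illustration):

```python
import hashlib

def key_to_id(key: str) -> int:
    """Hash a cache key to a 64-bit integer usable as a BIGINT UNSIGNED primary key."""
    return int.from_bytes(hashlib.sha1(key.encode()).digest()[:8], "big")

# Hypothetical schema for the backing table:
#   CREATE TABLE cache (id BIGINT UNSIGNED PRIMARY KEY,
#                       orig_key VARCHAR(250), value MEDIUMBLOB, KEY (orig_key));
print(key_to_id("user:42:profile"))
```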
All of the above methods can be layered under memcached, keeping performance good even while memcached restarts. Because you no longer need to worry about "hot" items being suddenly evicted by memcached's LRU algorithm, and users no longer wait minutes for cached data to regenerate when it suddenly vanishes from memory, these methods improve overall performance.
For details on these methods, see the blog: http://dormando.livejournal.com/495593.html.
How does memcached handle authentication?
There is no authentication mechanism! Memcached runs below the application layer (authentication should be the responsibility of the application's upper layers). Both the memcached client and server are lightweight, partly because no authentication mechanism is implemented at all: memcached can create new connections quickly, and the server side needs no configuration.
If you want to restrict access, use a firewall, or have memcached listen on a UNIX domain socket.
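A minimal sketch of talking to a memcached that listens on a UNIX domain socket (started with memcached's -s option; the socket path here is illustrative):

```python
import socket

sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
sock.connect("/var/run/memcached.sock")  # e.g. memcached -s /var/run/memcached.sock
sock.sendall(b"version\r\n")             # a plain text-protocol command
print(sock.recv(1024).decode())          # e.g. "VERSION 1.x.y"
sock.close()
```

Since only local processes with filesystem access to the socket can connect, this restricts access without any authentication inside memcached itself.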
What is memcached's multithreading? How do I use it?
Threads rule! Thanks to the efforts of Steven Grimm and Facebook, memcached 1.2 and later have a multithreaded model. Multithreaded mode lets memcached make full use of multiple CPUs while sharing all cached data between them. Memcached uses a simple locking mechanism to make data updates mutually exclusive. This handles multigets more efficiently than running multiple memcached instances on the same physical machine.
If your system's load is light, you may not need to enable multithreaded mode. If you run a large website on big hardware, you will see the benefits of multithreading.
For more information, see: http://code.sixapart.com/svn/memcached/trunk/server/doc/threads.txt.
To summarize briefly: command parsing (where memcached spends most of its time) can run multithreaded, while memcached's internal data operations are protected by a number of global locks (so that part of the work is not multithreaded). Future improvements to multithreaded mode will remove many of the global locks and improve memcached's performance under very high load.
What is the maximum length of a key that memcached can accept?
The maximum key length is 250 characters. Note that 250 is an internal limit of the memcached server; if you use a client that supports "key prefixes" or similar features, the combined key (prefix + original key) may exceed 250 characters on the client side. We recommend shorter keys, since they save memory and bandwidth.
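A common client-side workaround when keys may grow too long is to hash them down to a fixed length; a minimal sketch:

```python
import hashlib

MAX_KEY_LEN = 250  # memcached's server-side key limit

def safe_key(key: str) -> str:
    """Return the key unchanged if it fits, otherwise a fixed-length digest."""
    if len(key) <= MAX_KEY_LEN:
        return key
    return "h:" + hashlib.sha256(key.encode()).hexdigest()  # 66 characters

print(safe_key("x" * 300))
```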
Does memcached limit the expiration time of an item?
A relative expiration time can be at most 30 days. Memcached interprets the expiration time you pass in (a duration) as an absolute point in time, and once that moment arrives it marks the item as invalid. This is a simple but sometimes obscure mechanism: any value larger than 30 days' worth of seconds (2,592,000) is treated as an absolute Unix timestamp rather than as an offset.
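A minimal sketch of the two ways to express an expiration, assuming the pymemcache client library:

```python
import time
from pymemcache.client.base import Client  # assumes the pymemcache library

client = Client(("localhost", 11211))
client.set("a", "1", expire=3600)  # <= 30 days' worth of seconds: a relative offset
client.set("b", "2", expire=int(time.time()) + 86400)  # larger: an absolute Unix timestamp
```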
How large can a single item stored in memcached be?
1 MB. If your data is larger than 1 MB, consider compressing it on the client or splitting it across multiple keys.
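A minimal sketch of the compress-and-split approach, assuming a pymemcache-style client object (the chunked key scheme is a hypothetical illustration):

```python
import zlib

CHUNK = 1_000_000  # stay safely under the 1 MB item limit

def store_large(client, key, data: bytes):
    """Compress the value, split it into chunks, and record the chunk count."""
    blob = zlib.compress(data)
    chunks = [blob[i:i + CHUNK] for i in range(0, len(blob), CHUNK)]
    for i, chunk in enumerate(chunks):
        client.set(f"{key}:{i}", chunk)
    client.set(f"{key}:n", str(len(chunks)))

def load_large(client, key):
    n = client.get(f"{key}:n")
    if n is None:
        return None
    parts = [client.get(f"{key}:{i}") for i in range(int(n))]
    if any(p is None for p in parts):  # a chunk was evicted: treat as a full miss
        return None
    return zlib.decompress(b"".join(parts))
```

Note that LRU can evict individual chunks independently, so a reader must treat any missing chunk as a miss for the whole value.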
Why is the size of a single item limited to 1 MB?
Ah... a frequently asked question!
The simple answer: because that is how the memory allocator's algorithm works.
The detailed answer: memcached's memory storage engine (the engine will be pluggable one day...) uses slabs to manage memory. Memory is first divided into slabs of equal size, and each slab is then divided into chunks of equal size; chunks in different slab classes have different sizes. Chunk sizes start at a minimum value and grow by a factor until they reach the largest possible value.
If the minimum value is 400 B, the maximum value is 1 MB, and the growth factor is 1.20, the chunk sizes of the slab classes are: slab 1: 400 B, slab 2: 480 B, slab 3: 576 B ...
The larger a slab class's chunks, the bigger the gap between it and the previous class. So the larger the maximum value, the lower the memory utilization. Memcached must also pre-allocate memory for every slab class, so a small factor combined with a large maximum value requires even more memory.
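A minimal sketch of how the chunk sizes grow under those example numbers (real memcached also rounds and aligns sizes):

```python
def slab_chunk_sizes(minimum=400, maximum=1024 * 1024, factor=1.20):
    """Chunk size of each slab class: start at the minimum, multiply by the factor."""
    sizes, size = [], minimum
    while size < maximum:
        sizes.append(size)
        size = int(size * factor)
    sizes.append(maximum)
    return sizes

print(slab_chunk_sizes()[:3])  # [400, 480, 576], matching the example above
```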
There are other reasons not to push large data into memcached this way: don't try to stuff huge pages into memcached. Loading and unpacking such a large data structure into memory takes a long time, and your site's performance will suffer.
If you really do need to store data larger than 1 MB, you can modify the POWER_BLOCK value in slabs.c and recompile memcached, or use the inefficient malloc/free allocator. Other suggestions include using a database, MogileFS, and so on.
Can I use caches of different sizes on different memcached nodes? Will memcached use memory more efficiently if I do?
The memcached client decides which node stores a key purely by the hashing algorithm, with no regard for the node's memory size. So yes, you can use caches of different sizes on different nodes. But the usual practice is this: run multiple memcached instances on the nodes with more memory, with each instance using the same amount of memory as the instances on the other nodes (each instance then appears as a separate entry in the client's server list).
What is the binary protocol? Is it worth paying attention to?
The best information about the binary protocol is, of course, the binary protocol specification: http://code.google.com/p/memcached/wiki/MemcacheBinaryProtocol.
The binary protocol aims to give both ends a more efficient and more reliable protocol, reducing the CPU time the client and server spend on protocol handling.
According to Facebook's tests, parsing the ASCII protocol is the single most CPU-intensive part of memcached. So why don't we improve the ASCII protocol instead?
Some older discussion can be found in this mailing-list thread: http://lists.danga.com/pipermail/memcached/2007-July/004636.html.
How does memcached's memory allocator work? Why not use malloc/free!? Why use slabs?
This is actually a compile-time option. The internal slab allocator is used by default, and you really should use it. Originally, memcached used only malloc/free to manage memory, but that approach works poorly with an OS's memory management: repeated malloc/free causes memory fragmentation, and the OS ends up spending a lot of time hunting for contiguous memory blocks to satisfy malloc requests instead of running the memcached process.
The slab allocator was born to solve this problem. Memory is allocated once and divided into chunks, which are reused from then on. Because memory is partitioned into slab classes of different sizes, some memory is wasted when an item's size does not fit the chunks of the slab chosen to store it. Steven Grimm has made effective improvements in this area.
The mailing list has some discussion of slab improvements (power of N versus power of 2) and the tradeoffs: http://lists.danga.com/pipermail/memcached/2006-May/002163.html and http://lists.danga.com/pipermail/memcached/2007-March/003753.html.
If you want to use malloc/free and see how they behave, you can define USE_SYSTEM_MALLOC during the build. This feature is not well tested, so developers are unlikely to support it.
More information: http://code.sixapart.com/svn/memcached/trunk/server/doc/memory_management.txt.
Is memcached atomic?
Of course! Well, let's be precise:
Every individual command sent to memcached is completely atomic. If you issue a set and a get against the same item at the same time, they will not affect each other; they are serialized and executed one after the other. Even in multithreaded mode, every command is atomic, unless the program has a bug :)
Command sequences, however, are not atomic. If you fetch an item with get, modify it, and then want to set it back into memcached, there is no guarantee that the item hasn't been handled by another process (process in the broad sense, not necessarily an operating system process) in the meantime. Under concurrency, you may overwrite an item that another process just set.
memcached 1.2.5 and later provide the gets and cas commands, which solve exactly this problem. If you query an item with gets, memcached returns a unique token identifying the item's current value. If you overwrite the item and want to write it back, you send that token to memcached with the cas command. If the token stored in memcached still matches the one you supply, the write succeeds. If another process modified the item in the meantime, the token stored in memcached will have changed, and your write fails.
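A minimal sketch of the gets/cas pattern, assuming the pymemcache client library:

```python
from pymemcache.client.base import Client  # assumes the pymemcache library

client = Client(("localhost", 11211))
client.set("counter", "41")

value, token = client.gets("counter")        # value plus its unique CAS token
new_value = str(int(value) + 1)
if client.cas("counter", new_value, token):  # succeeds only if nobody wrote in between
    print("updated")
else:
    print("modified by someone else; re-read and retry")
```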
Modifying an item based on its current value in memcached is generally tricky. Don't do it unless you know exactly what you are doing.
Memcached interview questions