How Memcached Works Under the Hood

Source: Internet
Author: User
1. How does memcached work?
The magic of memcached comes from its two-stage hash. Memcached behaves like a huge hash table that stores many <key, value> pairs. Given a key, you can store or look up arbitrary data.
A client can spread its data across multiple memcached nodes. To look up a key, the client first hashes the key against its node list (stage one) and selects a node. It then sends the request to that node, and the memcached node finds the actual data (the item) through its internal hash table (stage two).
For example, suppose there are three clients (1, 2, 3) and three memcached nodes (A, B, C).
Client 1 wants to store the value "barbaz" under the key "foo". Client 1 consults the node list (A, B, C) and hashes the key "foo"; assume this selects memcached B. Client 1 then connects directly to memcached B and stores "barbaz" under the key "foo". Client 2 has the same node list (A, B, C) and uses the same client library as client 1, which means its stage-one hash algorithm is the same.
So, after the same stage-one hash calculation, client 2 also maps the key "foo" to memcached B and requests the value "barbaz" directly from memcached B.
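The two stages can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the MD5-plus-remainder choice in `pick_node` is a hypothetical stage-one hash (real client libraries each pick their own), and a plain dict per node stands in for the server's internal hash table.

```python
import hashlib

def pick_node(key, nodes):
    """Stage one (client side): map a key to one node of the shared list.
    MD5 + remainder is an assumed choice, not what any specific client uses."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return nodes[int.from_bytes(digest[:4], "big") % len(nodes)]

# Stage two (server side) is modelled by one dict per node.
nodes = ["A", "B", "C"]
servers = {name: {} for name in nodes}

def client_set(key, value):
    # The client talks only to the node selected by stage one.
    servers[pick_node(key, nodes)][key] = value

def client_get(key):
    # Any client with the same node list and hash reaches the same node.
    return servers[pick_node(key, nodes)].get(key)

client_set("foo", "barbaz")   # client 1 stores the value
value = client_get("foo")     # client 2 finds it on the same node
```

Because both clients share the node list and the hash function, neither needs to ask the other (or any coordinator) where "foo" lives.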
Different clients store data in memcached in different serialized forms (Perl Storable, PHP serialize, Java Hibernate, JSON, and so on). Some clients also implement the hash algorithm differently. However, the behavior of the memcached server is always the same.
Finally, from an implementation point of view, memcached is a non-blocking, event-based server. This architecture handles the C10K problem well and scales very well.
You can refer to "A Story of Caching", which briefly explains how clients interact with memcached.
2. What is the biggest advantage of memcached?
Read the answer to the previous question (how memcached works) carefully. The biggest advantage of memcached is excellent horizontal scalability, especially in very large systems. Because the clients do the hashing themselves, it is easy to add many memcached nodes to a cluster. Memcached nodes do not communicate with each other, so adding nodes adds no inter-node load: there is no multicast protocol and no resulting flood of network traffic. A memcached cluster is easy to grow. Running out of memory? Add a few nodes. Running out of CPU? Add a few more. Have spare memory on some machines? Run memcached nodes there rather than waste it.
Building on these basic principles, many different caching architectures can be built fairly easily. Beyond this FAQ, detailed information is easy to find elsewhere.
The following questions compare memcached with a server's local cache and with the MySQL query cache. They will give you a more complete picture.
3. What are the advantages and disadvantages of memcached compared with the MySQL query cache?
Introducing memcached into an application takes a fair amount of work. MySQL has a convenient query cache that automatically caches the results of SQL queries, so repeated queries execute quickly. How does memcached compare? The MySQL query cache is centralized: every client connected to a MySQL server with the query cache enabled benefits from it.
* When a table is modified, the MySQL query cache immediately flushes the cached results for that table. Storing a memcached item takes very little time, but when writes are frequent, the MySQL query cache keeps invalidating its cached data.
* On multi-core CPUs, the MySQL query cache runs into scalability problems. The query cache takes a global lock, which can make it slower because more cached data has to be flushed.
* The MySQL query cache cannot store arbitrary data (only SQL query results). With memcached, you can build many kinds of efficient caches. For example, you can run several independent queries, build a user object from the results, and cache that user object in memcached. The query cache works at the level of SQL statements and cannot do this. On a small web site the query cache helps, but as the site grows it does more harm than good.
* The memory available to the query cache is limited by the free memory of the MySQL server. Adding memory to the database server to cache data is fine. With memcached, however, any machine with free memory can be used to grow the memcached cluster, so you can cache more data.
4. What are the pros and cons of memcached compared with a server's local cache (such as PHP APC, mmap files, and so on)?
First, a local cache shares many of the problems of the query cache described above. The memory a local cache can use is limited by the free memory of a single server. However, a local cache does have advantages over memcached and the query cache: it can store arbitrary data, and it has no network access latency.
* Lookups in a local cache are faster. Consider putting the most frequently used data in a local cache. If every page needs to load a small amount of data, consider placing that data in a local cache.
* A local cache lacks group invalidation. In a memcached cluster, deleting or updating a key is visible to every observer. With local caches, you can only notify every server to flush its cache (slow and not scalable) or simply rely on cache timeouts.
* A local cache faces a severe memory limit, as mentioned above.
5. How does memcached's cache mechanism work?
Memcached's main cache mechanism is the LRU (least recently used) algorithm plus timeout-based expiration. When you store data in memcached, you can specify how long it may stay in the cache: forever, or until some time in the future. When memcached runs out of memory, expired items are replaced first, followed by the least recently used items.
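The "expired first, then least recently used" policy can be illustrated with a toy cache. This is a simplified sketch, not memcached's implementation: `TinyCache` is a hypothetical class, capacity is counted in items rather than bytes, and the current time is passed in explicitly so the behavior is easy to follow.

```python
from collections import OrderedDict

class TinyCache:
    """Toy cache: evicts an expired item first, otherwise the LRU item."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()   # key -> (value, expires_at or None)

    def set(self, key, value, now, ttl=0):
        expires_at = now + ttl if ttl > 0 else None   # ttl 0 = no expiry
        if key in self.items:
            del self.items[key]
        elif len(self.items) >= self.capacity:
            self._evict(now)
        self.items[key] = (value, expires_at)         # newest goes last

    def get(self, key, now):
        entry = self.items.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if expires_at is not None and now >= expires_at:
            del self.items[key]                       # expired: drop it
            return None
        self.items.move_to_end(key)                   # mark as recently used
        return value

    def _evict(self, now):
        expired = [k for k, (_, exp) in self.items.items()
                   if exp is not None and now >= exp]
        if expired:
            del self.items[expired[0]]                # prefer expired items
        else:
            self.items.popitem(last=False)            # else least recently used
```

Note that an item without an expiration time can still disappear: "forever" only means it never times out, not that it is protected from LRU eviction.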
6. How does memcached implement redundancy?
It doesn't. We are surprised by this question. Memcached is meant to be a cache layer for the application, and its design includes no redundancy mechanism of its own. If a memcached node loses all of its data, you should be able to fetch the data again from a data source such as a database. Above all, your application should tolerate node failure. Do not write bad query code and then hope that memcached will take care of everything. If you worry that a node failure will significantly increase the load on your database, there are steps you can take. For example, you can add more nodes (to reduce the impact of losing any one node), set up hot standby nodes (which take over the IP address when another node goes down), and so on.
7. How does memcached handle fault tolerance?
It doesn't. :) When a memcached node fails, the cluster does no fault-tolerance processing at all. How to respond to a node failure is entirely up to the user. Here are several options to choose from when a node fails:
* Ignore it. Until the failed node is restored or replaced, there are plenty of other nodes to absorb the impact.
* Remove the failed node from the node list. Be careful with this operation. By default (the remainder hash algorithm), adding or removing a node makes nearly all cached data unreachable: because the node list used by the hash has changed, most keys map to a different node than before.
* Start a hot standby node that takes over the IP address of the failed node. This prevents hashing chaos.
* If you want to add and remove nodes without disturbing the existing hash results, use consistent hashing. You can search online for consistent hashing algorithms. Clients that support consistent hashing are mature and widely used. Go and try one.
* Rehashing. When the client finds that a node is down while accessing data, it hashes the key again (with a different hash algorithm from the first) and selects another node. Note that the client does not remove the down node from its node list, so later requests may still hash to it first. If a node flaps between up and down, rehashing is risky: stale data may end up on both the healthy nodes and the flapping node.
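The difference between the remainder hash and consistent hashing when a node is removed can be measured with a small experiment. The sketch below is illustrative only: `Ring` is a hypothetical minimal hash ring with 100 virtual points per node (an arbitrary choice), and MD5 is an assumed hash function, not what any particular client uses.

```python
import bisect
import hashlib

def h(value):
    """64-bit hash taken from MD5 (an assumed choice for this sketch)."""
    return int.from_bytes(hashlib.md5(value.encode("utf-8")).digest()[:8], "big")

def modulo_node(key, nodes):
    """Remainder hashing: the node index depends on len(nodes)."""
    return nodes[h(key) % len(nodes)]

class Ring:
    """Minimal consistent hash ring with virtual points per node."""

    def __init__(self, nodes, replicas=100):
        self.points = sorted((h("%s#%d" % (n, i)), n)
                             for n in nodes for i in range(replicas))
        self.hashes = [p for p, _ in self.points]

    def node_for(self, key):
        # First ring point clockwise from the key's hash (wrapping around).
        i = bisect.bisect(self.hashes, h(key)) % len(self.points)
        return self.points[i][1]

def moved_fraction(lookup_before, lookup_after, keys):
    moved = sum(1 for k in keys if lookup_before(k) != lookup_after(k))
    return moved / len(keys)

keys = ["key%d" % i for i in range(10000)]
before, after = ["A", "B", "C", "D"], ["A", "B", "C"]   # node D fails

modulo_moved = moved_fraction(lambda k: modulo_node(k, before),
                              lambda k: modulo_node(k, after), keys)
ring_before, ring_after = Ring(before), Ring(after)
ring_moved = moved_fraction(ring_before.node_for, ring_after.node_for, keys)

print("remainder hash: %.0f%% of keys moved" % (100 * modulo_moved))
print("consistent hash: %.0f%% of keys moved" % (100 * ring_moved))
```

With the remainder hash, most keys change node when the list shrinks from four nodes to three. With the ring, only keys that lived on the removed node move; every other key keeps its node.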
8. How do I bulk export and import the items in memcached?
You should not do this. Memcached is a non-blocking server. Any operation that might make memcached pause or briefly refuse service deserves careful thought. Bulk-loading data into memcached is often not what you really want. Consider: if the cached data changes between the export and the import, you have to deal with stale data; and if cached data expires between the export and the import, what do you do with it?
Bulk export and import is therefore less useful than you might think. It is useful in one scenario, though. If you have a large amount of data that never changes and you want the cache to be warm, bulk-importing cached data helps. Although this scenario is not typical, it comes up often, so we will consider implementing bulk export and import in the future.
Steven Grimm, as always, gives another good example on the mailing list: http://lists.danga.com/pipermail/memcached/2007-July/004802.html.
9. But I really need to bulk export and import the items in memcached. What should I do?
All right, all right. If you need bulk export and import, the most likely reason is that regenerating the cached data takes a long time, or that your database is struggling and making you suffer.
If one memcached node going down makes you miserable, you have bigger problems. Your system is too fragile and needs some optimization work: for example, handle the "thundering herd" problem (memcached nodes fail and the repeated queries overwhelm your database; this problem is covered elsewhere in the FAQ), or optimize bad queries. Remember, memcached is not an excuse to avoid optimizing queries.
If your problem is only that regenerating cached data takes a long time (from 15 seconds to more than 5 minutes), consider using a database again. Here are a few tips:
* Use MogileFS (or similar software such as CouchDB) to store items. Compute the item and dump it to disk. MogileFS makes it easy to overwrite items and provides fast access to them. You can even cache the MogileFS path of an item in memcached to speed up reads. The MogileFS + memcached combination speeds up the response to cache misses and improves the availability of the web site.
* Reuse MySQL. Primary key lookups in MySQL's InnoDB are fast. If most of the cached data fits in a VARCHAR column, primary key lookups perform even better. Looking up a key in memcached is then almost equivalent to a MySQL primary key lookup: hash the key to a 64-bit integer and store the data in MySQL under that integer. You can keep the original (unhashed) key in an ordinary column and add a secondary index to speed up queries, expire keys passively, delete invalid keys, and so on.
Both methods can be combined with memcached and still give good performance while memcached is restarting. Because you no longer need to worry that a "hot" item is suddenly evicted by memcached's LRU algorithm, users no longer wait minutes for cached data to be regenerated when it suddenly disappears from memory, so these methods improve overall performance.
These methods are described in detail in the blog post: http://dormando.livejournal.com/495593.html.
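The "hash the key to a 64-bit integer and use it as the primary key" tip might be sketched as follows. Several things here are assumptions made only so the example runs on its own: an in-memory SQLite database stands in for MySQL/InnoDB, the table and column names are made up, SHA-1 is an arbitrary hash choice, and `INSERT OR REPLACE` is SQLite syntax.

```python
import hashlib
import sqlite3

def key_hash64(key):
    """Hash a cache key to a signed 64-bit integer (fits an integer primary key)."""
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big", signed=True)

db = sqlite3.connect(":memory:")   # stand-in for MySQL/InnoDB (assumption)
db.execute("CREATE TABLE cache_items ("
           "key_hash INTEGER PRIMARY KEY, "   # hashed key, looked up by primary key
           "raw_key TEXT NOT NULL, "          # original key in an ordinary column
           "value TEXT NOT NULL)")
db.execute("CREATE INDEX idx_raw_key ON cache_items (raw_key)")  # secondary index

def store(key, value):
    db.execute("INSERT OR REPLACE INTO cache_items VALUES (?, ?, ?)",
               (key_hash64(key), key, value))

def fetch(key):
    row = db.execute("SELECT raw_key, value FROM cache_items WHERE key_hash = ?",
                     (key_hash64(key),)).fetchone()
    if row is None or row[0] != key:   # guard against a hash collision
        return None
    return row[1]
```

Keeping the raw key next to the hash lets the lookup detect the rare case where two different keys hash to the same integer.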
10. How does memcached handle authentication?
There is no authentication mechanism. Memcached is software that runs at the lower layers of an application (authentication should be the responsibility of the layers above). The memcached client and server are lightweight partly because no authentication is implemented at all. As a result, memcached can create new connections quickly, and the server needs no configuration.
If you want to restrict access, you can use a firewall or have memcached listen on a UNIX domain socket.
11. What is memcached's multithreading, and how do I use it?
Threads rule! Thanks to the efforts of Steven Grimm and Facebook, memcached 1.2 and later have a multithreaded mode. Multithreaded mode lets memcached take full advantage of multiple CPUs and share all cached data between them. Memcached uses a simple locking mechanism to guarantee mutual exclusion of data updates. This handles multi-gets more efficiently than running several memcached instances on the same physical machine.
If your system is not heavily loaded, you may not need to enable multithreaded mode. If you run a huge web site on large hardware, you will see the benefits of multithreading.
For more information, see: http://code.sixapart.com/svn/memcached/trunk/server/doc/threads.txt.
In brief: command parsing (where memcached spends most of its time) can run in multiple threads. Memcached's internal operations on data rely on a number of global locks, so that part of the work is not multithreaded. Future improvements to multithreaded mode will remove many of the global locks and improve memcached's performance under very high load.
12. What is the maximum key length memcached accepts?
The maximum length of a key is 250 characters. Note that 250 is a limit enforced by the memcached server. If your client supports a "key prefix" or a similar feature, the prefix is added on top of your key, so the combined length (prefix + original key) is what must stay within 250 characters. We recommend shorter keys, because they save memory and bandwidth.
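One way a client could stay within the key limit is to fall back to a digest when the combined key is too long. This is a sketch of one possible convention, not something memcached or any client library prescribes: `safe_key` is a hypothetical helper, SHA-256 is an arbitrary choice, the limit is checked in bytes, and the prefix is assumed to be short and free of whitespace.

```python
import hashlib

MAX_KEY_BYTES = 250   # server-side key length limit from the FAQ

def safe_key(raw_key, prefix=""):
    """Return prefix + raw_key if it is acceptable, else prefix + digest."""
    candidate = (prefix + raw_key).encode("utf-8")
    # The text protocol does not allow whitespace or control characters in keys.
    has_bad_byte = any(b <= 32 or b == 127 for b in candidate)
    if len(candidate) <= MAX_KEY_BYTES and not has_bad_byte:
        return candidate.decode("utf-8")
    # Too long or unsafe: replace the raw key with a fixed-length hex digest.
    hashed = prefix + hashlib.sha256(raw_key.encode("utf-8")).hexdigest()
    if len(hashed.encode("utf-8")) > MAX_KEY_BYTES:
        raise ValueError("prefix leaves no room for a hashed key")
    return hashed
```

The trade-off is that a hashed key is no longer human-readable, so this is best reserved for keys that would otherwise be rejected.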
13. What restrictions does memcached place on an item's expiration time?
An expiration interval can be at most 30 days. A larger value is not treated as an interval: memcached interprets it as an absolute point in time (a Unix timestamp) and marks the item as expired once that time is reached. This is a simple but obscure mechanism.
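The 30-day rule is easy to get wrong, so here is a small sketch of how an expiration value is interpreted. `absolute_expiry` is a hypothetical helper written for illustration; it mirrors the rule described above rather than reproducing memcached's source.

```python
THIRTY_DAYS = 60 * 60 * 24 * 30   # 2,592,000 seconds

def absolute_expiry(exptime, now):
    """Return the Unix time at which an item expires, or None for 'never'.

    exptime: the expiration value sent with the store command
    now:     the current Unix time on the server
    """
    if exptime == 0:
        return None               # no timeout (LRU eviction is still possible)
    if exptime <= THIRTY_DAYS:
        return now + exptime      # small values are offsets from now
    return exptime                # larger values are absolute Unix timestamps
```

A common mistake follows directly from this rule: sending an interval of, say, 31 days in seconds produces a timestamp in early 1970, so the item is already expired when it is stored.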
14. What is the largest single item memcached can store?
1 MB. If your data is larger than 1 MB, consider compressing it on the client or splitting it across multiple keys.
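Compressing and splitting on the client could be combined along these lines. The key layout ("key" holds the chunk count, "key:0", "key:1", ... hold the pieces) is a convention invented for this sketch, the dict `cache` stands in for a memcached client, and the 1 KB headroom is an assumed safety margin.

```python
import zlib

MAX_ITEM_BYTES = 1024 * 1024
CHUNK_BYTES = MAX_ITEM_BYTES - 1024   # assumed headroom for key and item overhead

cache = {}   # stand-in for a memcached client (assumption)

def set_large(key, data, chunk_bytes=CHUNK_BYTES):
    """Compress data, then store it as numbered chunks plus a manifest."""
    packed = zlib.compress(data)
    chunks = [packed[i:i + chunk_bytes] for i in range(0, len(packed), chunk_bytes)]
    for index, chunk in enumerate(chunks):
        cache["%s:%d" % (key, index)] = chunk
    cache[key] = str(len(chunks)).encode("ascii")   # manifest: number of chunks

def get_large(key):
    """Reassemble the value, or return None if any piece is missing."""
    manifest = cache.get(key)
    if manifest is None:
        return None
    parts = [cache.get("%s:%d" % (key, i)) for i in range(int(manifest))]
    if any(part is None for part in parts):
        return None   # one chunk was evicted: treat the whole value as a miss
    return zlib.decompress(b"".join(parts))
```

Because each chunk can be evicted independently, the reader has to treat a single missing chunk as a miss for the whole value.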
15. Why is the size of a single item limited to 1 MB?
Ah... this is a question we are asked often.
Short answer: because of how the memory allocator works.
Detailed answer: memcached's memory storage engine (engines may become pluggable in the future) uses slabs to manage memory. Memory is divided into slab classes with chunks of different sizes: memory is first split into slabs of equal size, then each slab is split into chunks of equal size, and the chunk size differs from one slab class to another. Chunk sizes start at a minimum value and grow by a fixed factor until the maximum possible value is reached.
If the minimum is 400 bytes, the maximum is 1 MB, and the factor is 1.20, the chunk sizes of the slab classes are: slab 1: 400 bytes, slab 2: 480 bytes, slab 3: 576 bytes, and so on.
The larger the chunks in a slab class, the larger the gap between it and the previous class. Therefore, the larger the maximum value, the lower the memory utilization. Memcached must also allocate memory for every slab class, so a smaller factor combined with a larger maximum requires more memory.
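The chunk-size progression and the resulting waste can be reproduced with a few lines of arithmetic. This sketch uses the example numbers from the answer above (400 bytes, 1 MB, factor 1.20); it ignores the byte alignment and per-item overhead that a real allocator applies, and `chunk_sizes` / `chunk_for` are hypothetical helper names.

```python
import bisect

def chunk_sizes(minimum=400, maximum=1024 * 1024, factor=1.20):
    """Chunk size of each slab class: grow by `factor` until `maximum`."""
    sizes = []
    size = float(minimum)
    while True:
        rounded = int(round(size))
        if rounded >= maximum:
            break
        sizes.append(rounded)
        size *= factor
    sizes.append(maximum)   # the last class holds the largest allowed item
    return sizes

def chunk_for(item_size, sizes):
    """Smallest chunk size that fits the item, or None if it is too large."""
    i = bisect.bisect_left(sizes, item_size)
    return sizes[i] if i < len(sizes) else None

sizes = chunk_sizes()
chunk = chunk_for(500, sizes)   # a 500-byte item lands in the 576-byte class
wasted = chunk - 500            # bytes of the chunk that stay unused
```

The gap between neighboring classes grows with the chunk size, which is why a larger maximum means more bytes can be wasted per item in the upper classes.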
There are other reasons not to store very large data in memcached. Do not try to put huge web pages into memcached. Loading and unpacking such large data structures into memory takes a long time, which hurts the performance of your site.
If you really do need to store more than 1 MB of data, you can change the POWER_BLOCK value in slabs.c and recompile memcached, or use the inefficient malloc/free. Other recommendations include databases, MogileFS, and so on.
16. Can I use different cache sizes on different memcached nodes? Will memcached then use memory more efficiently?
A memcached client decides which node stores a key purely from the hash algorithm, without regard to each node's memory size. So yes, you can use different cache sizes on different nodes. The usual approach, though, is to run several memcached instances on a node with more memory, each instance using the same amount of memory as the instances on other nodes.
17. What is the binary protocol, and should I pay attention to it?
The best source of information on the binary protocol is, of course, the binary protocol specification: http://code.google.com/p/memcached/wiki/MemcacheBinaryProtocol.
The binary protocol aims to give clients and servers a more efficient and reliable protocol, reducing the CPU time both sides spend processing the protocol.
According to Facebook's tests, parsing the ASCII protocol is the most CPU-intensive part of memcached. So why not improve on the ASCII protocol?
Some older information can be found in this mailing list thread: http://lists.danga.com/pipermail/memcached/2007-July/004636.html.
How does the memcached memory allocator work? Why not use malloc/free? Why use slabs?
In fact, this is a compile-time option. The internal slab allocator is used by default, and you really should use it. In its earliest versions, memcached managed memory with malloc/free alone. However, that approach did not work well with the operating system's memory management. Repeated malloc/free calls fragment memory, and the OS ends up spending a lot of time searching for contiguous blocks of memory to satisfy malloc requests instead of running the memcached process. If you disagree, you can of course use malloc. Just don't complain on the mailing list.
The slab allocator was created to solve this problem. Memory is allocated once, divided into chunks, and reused from then on. Because memory is divided into slabs of different sizes, some memory is wasted when an item's size does not fit the slab chosen to store it well. Steven Grimm has already made effective improvements in this area.
The mailing list has discussions of slab improvements (power of n versus power of 2) and their tradeoffs: http://lists.danga.com/pipermail/memcached/2006-May/002163.html and http://lists.danga.com/pipermail/memcached/2007-March/003753.html.
If you want to see how malloc/free behaves, define USE_SYSTEM_MALLOC during the build. This feature is not well tested, so it is unlikely to get developer support.
More information: http://code.sixapart.com/svn/memcached/trunk/server/doc/memory_management.txt.
18. Is memcached atomic?
Of course. Well, let's be precise:
Every single command sent to memcached is completely atomic. If you send a set command and a get command for the same data at the same time, they do not interfere with each other. They are serialized and executed one after the other. Even in multithreaded mode, all commands are atomic, unless the program has a bug.
A sequence of commands is not atomic. If you fetch an item with a get command, modify it, and then set it back in memcached, there is no guarantee that the item was not modified in the meantime by another process (a process in the general sense, not necessarily an operating system process). Under concurrency, you may also overwrite an item that another process has just set.
Memcached 1.2.5 and later provide the gets and cas commands, which solve this problem. When you query a key with the gets command, memcached returns a unique identifier for the item's current value. If you change the item and want to write it back to memcached, send that unique identifier along with the cas command. If the identifier stored in memcached matches the one you supplied, your write succeeds. If another process modified the item in the meantime, the identifier stored in memcached will have changed and your write fails.
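The gets/cas cycle can be illustrated with an in-memory stand-in. `FakeMemcached` below is a hypothetical class that only imitates the semantics described above (a unique identifier that changes on every write); it is not a real client, and `increment_with_retry` is just one possible way to use the pattern.

```python
class FakeMemcached:
    """In-memory imitation of set / gets / cas semantics (not a real client)."""

    def __init__(self):
        self.data = {}       # key -> (value, cas_id)
        self.next_cas = 1

    def set(self, key, value):
        self.data[key] = (value, self.next_cas)   # every write gets a new id
        self.next_cas += 1

    def gets(self, key):
        """Return (value, cas_id), or (None, None) if the key is missing."""
        return self.data.get(key, (None, None))

    def cas(self, key, value, cas_id):
        """Write only if the item still carries the id we read earlier."""
        current = self.data.get(key)
        if current is None or current[1] != cas_id:
            return False     # someone else changed (or removed) the item
        self.set(key, value)
        return True

def increment_with_retry(store, key, attempts=5):
    """Read-modify-write loop: retry when another writer got in first."""
    for _ in range(attempts):
        value, cas_id = store.gets(key)
        if cas_id is None:
            return False                      # key does not exist
        if store.cas(key, value + 1, cas_id):
            return True                       # our write was applied
    return False                              # gave up after repeated conflicts
```

The retry loop is what turns a non-atomic get-then-set sequence into a safe update: a failed cas means the value was stale, so it is read again before trying once more.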
In general, modifying an item based on its current value in memcached is tricky. Do not do it unless you know what you are doing.
