Memcache: An Ultra-Detailed Interpretation

Source: Internet
Author: User
Tags: CAS, connection pooling, memcached

What is Memcache?

Memcache is a free, open-source, high-performance, distributed memory object caching system for dynamic web applications, designed to reduce the load on the database. It speeds up website access by caching data and objects in memory, cutting down the number of times the database has to be read. Internally, Memcache is a hashmap that stores arbitrary data (strings, objects, and so on) as key-value pairs in memory; the cached data can come from database calls, API calls, or page-rendering results. Memcache's design philosophy is small and powerful: its simple design makes it easy to deploy and develop against while still solving many of the challenges of large-scale data caching, and its open API lets Memcache be used from most popular programming languages, including Java, C/C++/C#, Perl, Python, PHP, and Ruby.

Also, a note on the difference between Memcache and memcached:

1. Memcache is the name of the project.

2. memcached is the name of the executable file on the Memcache server side.

Memcache's official website is http://memcached.org/

Memcache Access model

To deepen understanding, I imitated the Memcache section of "Large-Scale Website Technology Architecture: Core Principles and Case Analysis", a book by an Alibaba technical expert, and drew a picture of my own:

In particular, although Memcache is known as a "distributed cache", Memcache itself has no distributed capability: the memcache instances in a cluster do not communicate with each other. (By contrast, in JBoss Cache, when one server's cached data is updated it notifies the other machines in the cluster to update or clear their cached copies.) The so-called "distributed" behavior relies entirely on the client-side implementation, as in the flow of the diagram above.

Based on this diagram, the process of a Memcache cache write looks like this:

1. The application passes in the data that needs to be cached.

2. The API feeds the key into the routing-algorithm module, which computes a server number from the key and the list of Memcache cluster servers.

3. The server number is resolved to the Memcache server's IP address and port.

4. The API calls the communication module to talk to the server with that number and writes the data to it, completing one distributed-cache write.

Reading from the cache works the same way as writing: as long as the same routing algorithm and server list are used, and the application queries the same key, the Memcache client will always access the same server to read the data. As long as that server still caches the data, a cache hit is guaranteed.
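The four-step write flow above can be sketched in a few lines of Java. This is a minimal illustration, not any real client's API: the class name, the server list, and the naive modulo-based route() method are all assumptions for the example (a real client would use consistent hashing, discussed later).

```java
import java.util.List;

// A minimal sketch of client-side routing. The names here are invented
// for this example; a production client would use a consistent hash
// rather than this naive modulo routing.
public class SimpleRouter {
    private final List<String> servers; // e.g. "10.0.0.1:11211"

    public SimpleRouter(List<String> servers) {
        this.servers = servers;
    }

    // Step 2: the routing algorithm turns a key into a server number.
    public int route(String key) {
        return Math.abs(key.hashCode() % servers.size());
    }

    // Step 3: the server number resolves to a concrete address;
    // step 4 would then open a connection to it and send the data.
    public String serverFor(String key) {
        return servers.get(route(key));
    }
}
```

Because route() is deterministic, a later read with the same key and the same server list lands on the same server that the write chose, which is exactly the hit guarantee described above.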

This style of Memcache cluster is also good from the standpoint of partition tolerance. If Node2 goes down, the data stored on Node2 becomes unavailable; but because Node0 and Node1 are still up in the cluster, the next request for a key that lived on Node2 simply misses, fetches the data from the database, and the routing-algorithm module then selects a node among Node0 and Node1 to store it, so the next request can hit the cache again. This clustering approach works well; the drawback is that the cost of the misses is relatively high.

Consistent hash algorithm

From the diagram above we can see a very important issue: managing the server cluster. The routing algorithm matters a great deal — just like a load-balancing algorithm, the routing algorithm decides exactly which server in the cluster is accessed. Let us first look at a simple routing algorithm.

1. Remainder Hash

For example, suppose the string str hashes to a hashcode of 46 and there are 3 servers: 46 mod 3 = 1, so str maps to Node1, and the routing algorithm routes str to the Node1 server. Because hashcodes are fairly random, a remainder-hash routing algorithm keeps the cached data reasonably balanced across the whole Memcache server cluster.

If the scalability of the server cluster is not considered (for what scalability means, see the large-website-architecture study notes), then the remainder-hash algorithm can satisfy almost all cache-routing requirements. But when the distributed cache cluster needs to be scaled out, things get difficult.

Suppose the Memcache server cluster grows from 3 servers to 4. After the server list changes, still using remainder hash, 46 mod 4 = 2, which maps to Node2 — but str was originally stored on Node1, so this is a cache miss. If that is not clear enough, here is an example with 20 pieces of data whose hashcodes are 0 through 19:

Hashcode 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
Server to which to route 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1

Now I have expanded to 4 servers (the hits, originally marked in bold red, are hashcodes 0, 1, 2 and 12, 13, 14):

Hashcode 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
Server to which to route 0 1 2 3 0 1 2 3 0 1 2 3 0 1 2 3 0 1 2 3

If I instead expand to 20 servers, only the keys with the first three hashcodes (0, 1, 2) are still hit — that is, 15%. Of course this is a simplified example; reality is certainly far more complicated, but it suffices to show that a remainder-hash routing algorithm causes a large fraction of the data to miss during expansion (in fact the data is not merely missed — a large amount of now-unreachable data sits in the original caches' memory until it is evicted). This result is clearly unacceptable. In a website's business, most data-access requests are actually served from the cache and only a small fraction of reads reach the database, so the database's load capacity is provisioned on the assumption that the cache works. When most of the cached data cannot be read correctly because of a server expansion, that access pressure falls on the database, far exceeding the database's load capacity and potentially bringing the database down.
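The hit-rate arithmetic above can be checked with a short Java sketch (hitRate is a name invented for this illustration):

```java
// Reproduces the arithmetic above: the fraction of hashcodes 0..count-1
// that still route to the same server after resizing the cluster under
// remainder hashing.
public class RemainderHashDemo {
    static double hitRate(int count, int before, int after) {
        int hits = 0;
        for (int h = 0; h < count; h++) {
            if (h % before == h % after) hits++;
        }
        return (double) hits / count;
    }

    public static void main(String[] args) {
        // Growing from 3 to 20 servers: only hashcodes 0, 1, 2 still hit.
        System.out.println(hitRate(20, 3, 20)); // 0.15
        // Growing from 3 to 4 servers with the 20 sample hashcodes:
        // hashcodes 0, 1, 2, 12, 13, 14 still hit.
        System.out.println(hitRate(20, 3, 4)); // 0.3
    }
}
```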

There are ways to mitigate this problem; the steps are:

(1) During the site's traffic trough, usually late at night, the technical team works overtime to expand capacity and restart the servers.

(2) Gradually warm the cache with simulated requests so that the data on the cache servers is redistributed.

2. Consistent hash algorithm

The consistent hash algorithm maps keys to cache servers through a data structure called a consistent hash ring. Here is another diagram I drew:

The algorithm works as follows: first, construct an integer ring of length 2^32 (called the consistency hash ring) and place the cache server nodes on this hash ring according to the hash value of each node's name (distributed over [0, 2^32 - 1]). Then, for each piece of data to be cached, compute the hash of its key (also distributed over [0, 2^32 - 1]) and search the hash ring clockwise to find the server node nearest to the key's hash value, completing the key-to-server mapping lookup.

As shown in the figure, the three nodes sit at three positions on the hash ring, and each key, according to its hashcode, also has a fixed position on the ring. From that position the key walks clockwise to find the node nearest to it, and the data is stored on that node's Memcache server. Now look at what happens when a node is added to the hash ring:

I added a Node4, and it only affects the keys on one segment of the ring: keys that previously went to Node1 now go to Node4. So using the consistent hash algorithm, adding a node still affects the cluster, but the impact is limited to the one (bold) arc — far smaller than the remainder-hash algorithm's impact rate of well over half. More importantly, the more cache server nodes there are in the cluster, the smaller the impact of adding a node, which is easy to understand. In other words, as the cluster grows, the probability of continuing to hit the originally cached data rises; although a small portion of the data still cannot be read from the servers, that ratio is small enough that even when those reads fall through to the database they do not impose a fatal load.

In concrete implementations, this 2^32-length consistent hash ring is usually realized with a binary search tree; balancing such a tree is a standard algorithmic problem, and you can look up the details yourself.
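Here is a minimal Java sketch of such a ring, using the JDK's TreeMap (a red-black binary search tree) for the clockwise lookup. Two simplifications are assumptions for illustration: real implementations place many virtual nodes per server to balance the ring, and use a hash with better distribution than String.hashCode().

```java
import java.util.TreeMap;

// A minimal consistent-hash ring built on the JDK's TreeMap, a red-black
// (binary search) tree, so the clockwise lookup costs O(log n).
public class ConsistentHashRing {
    private final TreeMap<Integer, String> ring = new TreeMap<>();

    public void addNode(String node) {
        ring.put(node.hashCode(), node); // the node's position on the ring
    }

    // Walk clockwise from the key's hash to the first node at or after
    // it; if nothing lies ahead, wrap around to the ring's first node.
    public String nodeFor(String key) {
        Integer pos = ring.ceilingKey(key.hashCode());
        if (pos == null) pos = ring.firstKey();
        return ring.get(pos);
    }
}
```

Adding a node to this ring only reroutes the keys on the arc just before the new node's position; every other key keeps its old server, which is exactly the property described above.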

Memcache Implementation principle

The first thing to make clear is that Memcache's data is stored in memory, which in my view implies:

1. Data access is faster than with a traditional relational database, because traditional relational databases like Oracle and MySQL store data on disk to keep it durable, and disk I/O is slow.

2. Because the data lives in memory, it disappears whenever memcached restarts.

3. Since the data is stored in memory, it is necessarily limited by the machine: as earlier articles have noted many times, a 32-bit machine can use at most 2GB of memory per process, while a 64-bit machine has no such upper limit.

Now let us look at how Memcache works. The most important part of Memcache is how it allocates memory: Memcache uses fixed-size allocation. Again, a picture of my own:

This picture involves four concepts — slab_class, slab, page and chunk — and the relationships between them are:

1. Memcache divides its memory space into a group of slabs.

2. Each slab contains a number of pages; each page is 1MB by default, so if a slab occupies 100MB of memory, it holds 100 pages.

3. Each page contains a set of chunks; a chunk is where data is actually stored, and all chunks within the same slab have the same fixed size.

4. Slabs whose chunks have the same size are grouped together into a slab_class.

Memcache's memory allocation uses a slab allocator. The number of slabs is limited — a few, more than ten, or several dozen — depending on the startup parameters.

Where a value is stored in Memcache is determined by the value's size: a value is always stored in the slab whose chunk size fits it most closely. For example, if slab[1]'s chunk size is 80 bytes, slab[2]'s chunk size is 100 bytes and slab[3]'s chunk size is 128 bytes (chunk sizes in neighboring slabs grow by a factor of about 1.25, which can be set with -f when memcached starts), then an 88-byte value will be placed in slab number 2. Before storing into a slab, the slab must first request memory, and memory is requested in units of pages: when the first piece of data arrives, regardless of its size, a 1MB page is assigned to the slab. After obtaining the page, the slab slices the page's memory by its chunk size, turning it into an array of chunks, and finally one chunk from the array is chosen to store the data.
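The slab-selection rule just described can be sketched as a simplified model (an assumption-laden sketch: real memcached also aligns chunk sizes, so exact sizes differ, and chunkSizes and slabFor are names invented for this example):

```java
// Simplified model of slab selection: chunk sizes grow geometrically by
// the growth factor (memcached's -f option, 1.25 by default), and a value
// goes into the first slab whose chunk is large enough to hold it.
public class SlabSizing {
    static int[] chunkSizes(int base, double factor, int count) {
        int[] sizes = new int[count];
        double size = base;
        for (int i = 0; i < count; i++) {
            sizes[i] = (int) size;
            size *= factor;
        }
        return sizes;
    }

    static int slabFor(int valueSize, int[] sizes) {
        for (int i = 0; i < sizes.length; i++) {
            if (valueSize <= sizes[i]) return i; // closest chunk that fits
        }
        return -1; // larger than every chunk size: cannot be stored
    }
}
```

With the article's example chunk sizes {80, 100, 128}, an 88-byte value lands in the slab with 100-byte chunks, as stated above.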

What if there is no free chunk in this slab to allocate? Unless memcached was started with -M (which forbids LRU eviction — in that case running out of memory reports an out-of-memory error), Memcache will evict the data in the least recently used chunk in this slab and store the new data there. To summarize Memcache's memory allocation and reclamation in three points:

1. Memcache's chunk-based allocation wastes memory: an 88-byte value allocated into a 128-byte chunk (when the next-larger slab has to be used) loses 40 bytes, but this also avoids the problem of managing memory fragmentation.

2. Memcache's LRU algorithm is not global; it is per-slab.

3. It should now be clear why the size of a value stored in Memcache is limited: when new data arrives, the slab first requests memory in units of pages, and a page is only 1MB, so a value naturally cannot be larger than 1MB.

Summarizing the characteristics and limitations of Memcache

The above has interpreted Memcache in some detail; here is a summary of Memcache's limitations and characteristics:

1. There is no limit on the number of items that can be stored in Memcache, as long as memory is sufficient.

2. A single Memcache process can use at most 2GB of memory on a 32-bit machine, as mentioned several times in earlier articles; on a 64-bit machine there is no such limit.

3. The maximum key length is 250 bytes; longer keys cannot be stored.

4. A single item can hold at most 1MB of data; larger data cannot be stored.

5. A Memcache server is not secure: knowing a memcached node's address, you can telnet to it and use flush_all to make all existing key-value pairs expire immediately.

6. It is not possible to traverse all of the items in Memcache, because the operation is relatively slow and blocks other operations.

7. Memcache's high performance comes from its two-stage hash structure. The first stage is on the client, where a hash algorithm computes a node from the key; the second stage is on the server, where an internal hash algorithm finds the actual item and returns it to the client. From an implementation standpoint, Memcache is a non-blocking, event-based server program.

8. When setting or adding a key in Memcache, passing an expiry of 0 means the key-value pair never expires, and the maximum relative expiry is 30 days; see the realtime() function in memcached's memcache.c source:

#define REALTIME_MAXDELTA 60*60*24*30

static rel_time_t realtime(const time_t exptime) {
    if (exptime == 0)
        return 0;
    if (exptime > REALTIME_MAXDELTA) {
        if (exptime <= process_started)
            return (rel_time_t)1;
        return (rel_time_t)(exptime - process_started);
    } else {
        return (rel_time_t)(exptime + current_time);
    }
}

This expiry behavior is written into the Memcache source code; there is no way for developers to change the 30-day limit on a key's relative expiry time.
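To see how this rule behaves, the same logic can be mirrored in a small Java sketch (assumptions for the example: effectiveExpiry is an invented name, and the now and processStarted parameters stand in for memcached's current_time and process_started, all in seconds):

```java
// Mirrors memcached's realtime() logic above: small expiry values are
// relative seconds, while anything beyond 30 days is interpreted as an
// absolute unix timestamp.
public class ExpiryRule {
    static final long REALTIME_MAXDELTA = 60L * 60 * 24 * 30; // 30 days

    static long effectiveExpiry(long exptime, long now, long processStarted) {
        if (exptime == 0) return 0;                  // never expires
        if (exptime > REALTIME_MAXDELTA) {
            // beyond 30 days: treated as an absolute unix timestamp
            if (exptime <= processStarted) return 1; // already in the past
            return exptime - processStarted;
        }
        return exptime + now;                        // relative seconds
    }
}
```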

Memcache Instruction Summary

As mentioned above, once a memcached node's address is known, you can telnet to it and operate on memcache with various commands. Here are the commands Memcache supports:

Command Role
get Returns the value corresponding to a key
set Stores a key-value pair unconditionally: adds it if the key does not exist, overwrites it if it does
add Stores a key-value pair only if the key does not exist; the operation fails if it already exists
replace Replaces the data for a key; the operation fails if the key does not exist
stats Returns general Memcache statistics (interpreted below)
stats items Returns the number of items in each slab and the age of the oldest item (seconds since it was last accessed)
stats slabs Returns information about each slab created during the memcached run (interpreted below)
version Returns the current Memcache version number
flush_all Expires all key-value pairs, but does not delete the items, so memcache still occupies memory afterwards
quit Closes the connection

Interpretation of the stats command

stats is one of the more important commands; it lists the current status of the Memcache server. Take a set of data as an example:

STAT pid 1023
STAT uptime 21069937
STAT time 1447235954
STAT version 1.4.5
STAT pointer_size 64
STAT rusage_user 1167.020934
STAT rusage_system 3346.933170
STAT curr_connections 29
STAT total_connections 21
STAT connection_structures 49
STAT cmd_get 49
STAT cmd_set 7458
STAT cmd_flush 0
STAT get_hits 7401
STAT get_misses ...
(delete, incr, decr and cas also have hits and misses counts; cas additionally has a badval count)
STAT auth_cmds 0
STAT auth_errors 0
STAT bytes_read 22026555
STAT bytes_written 8930466
STAT limit_maxbytes 4134304000
STAT accepting_conns 1
STAT listen_disabled_num 0
STAT threads 4
STAT bytes 151255336
STAT curr_items 57146
STAT total_items 580656
STAT evictions 0

These parameters reflect the basic information of the Memcache server, which means:

Name of parameter Role
Pid Process ID of the Memcache server
Uptime Number of seconds the server has been running
Time Server's current UNIX timestamp
Version Memcache version
Pointer_size The pointer size on the current operating system, reflecting whether it is 32- or 64-bit; 64 means this Memcache server runs in 64-bit mode
Rusage_user Cumulative user time for the process
Rusage_system Cumulative system time for processes
Curr_connections Number of connections currently open
Total_connections Number of connections that have been opened since the server started
Connection_structures Number of connection constructs allocated by the server
Cmd_get Get Command Total Request count
Cmd_set Set command Total Request count
Cmd_flush Flush_all Command Total Request count
Get_hits Total number of hits. This is important: the key metric for a cache is its hit ratio, get_hits / (get_hits + get_misses); for example, this cache's hit ratio is 99.2%
Get_misses Total number of Misses
Auth_cmds Number of times the authentication command was processed
Auth_errors Number of processing times for authentication failures
Bytes_read Total number of bytes read
Bytes_written Total number of bytes sent
Limit_maxbytes Memory size (in bytes) allocated to Memcache
Accepting_conns Whether the server is currently accepting new connections; 0 indicates the maximum number of connections has been reached
Listen_disabled_num The number of times the server has reached its maximum connection count; this should be 0 or close to 0, and if it keeps growing, watch out for the service
Threads The current total number of memcache threads; because memcache's threads work on an event-driven model, there is no one-thread-per-request correspondence
bytes Total number of items stored in the current server bytes
Current_items The total number of items stored by the current server
Total_items Total number of items stored since server startup

Interpretation of the stats slabs command

If you understood the Memcache storage mechanism above, now look at the information for each slab. Again, take a set of data as an example:

STAT 1:chunk_size 96
...
STAT 2:chunk_size 144
STAT 2:chunks_per_page 7281
STAT 2:total_pages 7
STAT 2:total_chunks 50967
STAT 2:used_chunks 45197
STAT 2:free_chunks 1
STAT 2:free_chunks_end 5769
STAT 2:mem_requested 6084638
STAT 2:get_hits 48084
STAT 2:cmd_set 59588271
STAT 2:delete_hits 0
STAT 2:incr_hits 0
STAT 2:decr_hits 0
STAT 2:cas_hits 0
STAT 2:cas_badval 0
...
STAT 3:chunk_size 216
...

First observe that the second slab's chunk_size (144) divided by the first slab's chunk_size (96) is 1.5, and the third slab's chunk_size (216) divided by the second's (144) is also 1.5, so we can determine that this memcache's growth factor is 1.5: chunk_size grows by a factor of 1.5 from slab to slab. Now the meaning of each field:

Name of parameter Role
Chunk_size The size of each chunk in the current slab, in bytes
Chunks_per_page The number of chunks each page can hold; since each page is fixed at 1MB, i.e. 1024*1024 bytes, this value is 1024*1024 / chunk_size (rounded down)
Total_pages The total number of pages assigned to the current slab
Total_chunks The maximum number of chunks the current slab can store, equal to total_pages * chunks_per_page
Used_chunks The number of chunks already allocated to stored objects
Free_chunks The number of chunks that were used but have been reclaimed because their data expired
Free_chunks_end The number of newly allocated but not yet used chunks; a non-zero value indicates that the current slab has never actually run short of capacity
Mem_requested The total number of bytes requested to store data in the current slab; (total_chunks * chunk_size) - mem_requested is the amount of idle memory in the slab, which includes both unused chunks and the space wasted inside used chunks
Get_hits The number of get requests hit in the current slab
Cmd_set The number of set command requests received by the current slab
Delete_hits The number of delete requests hit in the current slab
Incr_hits The number of incr requests hit in the current slab
Decr_hits The number of decr requests hit in the current slab
Cas_hits The number of cas requests hit in the current slab
Cas_badval The number of cas requests that hit the current slab but failed to update

The output of this command is large, and all of the information is useful. For example, if the first slab has very few used chunks while the second slab has many, you can consider increasing memcache's growth factor appropriately so that some of the data falls into the first slab, balancing the memory use of the two slabs and avoiding wasted space.

Java implementation examples of Memcache

Having said so much — as a Java programmer, how could I not write a Memcache client example? Many third-party jar packages provide Memcache client implementations; one of the better ones is XMemcached. XMemcached is efficient and non-blocking in its IO, consumes few resources, supports the complete protocol, allows node weights to be set, allows nodes to be added and removed dynamically, supports JMX and integration with the Spring framework, uses connection pooling, and is highly extensible. Here I use XMemcached to write a simple Memcache client example. It has not been verified — purely to spark ideas:

import java.io.IOException;

import net.rubyeye.xmemcached.GetsResponse;
import net.rubyeye.xmemcached.MemcachedClient;
import net.rubyeye.xmemcached.MemcachedClientBuilder;
import net.rubyeye.xmemcached.XMemcachedClientBuilder;
import net.rubyeye.xmemcached.command.BinaryCommandFactory;
import net.rubyeye.xmemcached.utils.AddrUtil;

public class MemcacheManager {

    private static MemcacheManager instance = new MemcacheManager();

    /**
     * XMemcached lets developers tune the load on each memcache node by
     * setting node weights: the higher the weight, the more data the node
     * stores and the greater its load.
     */
    private static MemcachedClientBuilder mcb = new XMemcachedClientBuilder(
            AddrUtil.getAddresses("127.0.0.1:11211 127.0.0.2:11211 127.0.0.3:11211"),
            new int[] { 1, 3, 5 });

    private static MemcachedClient mc = null;

    /** Initialize and load the memcache client */
    static {
        mcb.setCommandFactory(new BinaryCommandFactory()); // use the binary protocol
        mcb.setConnectionPoolSize(10); // connection pool size, i.e. number of clients
        try {
            mc = mcb.build();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    private MemcacheManager() {
    }

    public static MemcacheManager getInstance() {
        return instance;
    }

    /** Set data on the memcache server */
    public void set(String key, int expiry, Object obj) throws Exception {
        mc.set(key, expiry, obj);
    }

    /** Get data from the memcache server */
    public Object get(String key) throws Exception {
        return mc.get(key);
    }

    /**
     * Memcache does atomic updates through compare-and-set, i.e. the CAS
     * protocol, which is similar to optimistic locking: every request to
     * store data carries a CAS value, and Memcache compares that CAS value
     * with the CAS value of the currently stored data. If they are equal,
     * the old data is overwritten; if not, the update fails. This is
     * particularly useful in concurrent environments.
     */
    public boolean update(String key, Integer i) throws Exception {
        GetsResponse<Integer> result = mc.gets(key);
        long cas = result.getCas();
        // Attempt to update the value corresponding to key
        if (!mc.cas(key, 0, i, cas)) {
            return false;
        }
        return true;
    }
}

