Memcache Usage Explained

Tags: cas, memcached

Memcache is a free, open-source, high-performance, distributed memory object caching system for dynamic web applications, designed to reduce database load. It speeds up website access by caching data and objects in memory, reducing the number of times the database is read.

What is Memcache?

Memcache is essentially a big in-memory hashmap that stores key-value pairs of arbitrary data (strings, objects, etc.); the data can come from database calls, API calls, or the results of page rendering. Memcache's design philosophy is small and powerful: its simple design facilitates rapid deployment and easy development, it solves many of the challenges of large-scale data caching, and its open API lets Memcache be used from most popular programming languages, including Java, C/C++/C#, Perl, Python, PHP and Ruby.

Also, a quick note on the difference between Memcache and memcached:

1. Memcache is the name of the project.

2. memcached is the name of the executable file of the Memcache server.

Memcache's official website is http://memcached.org/

Memcache Access model

To deepen understanding, I drew a picture modeled on the Memcache section of "Large-scale Website Technology Architecture: Core Principles and Case Analysis", a book by Alibaba technical expert Hae:

[Figure: Memcache access model]

In particular, although Memcache is known as a "distributed cache", Memcache itself has no distributed functionality: the nodes of a memcache cluster do not communicate with each other (in contrast with, say, JBoss Cache, where a server whose cached data is updated notifies the other machines in the cluster to update or clear their caches). The so-called "distribution" is implemented entirely by the client program, following the process in the diagram above.

Based on this diagram, the process of one Memcache cache write is:

1. The application passes in the data that needs to be cached

2. The API feeds the key into the routing algorithm module; based on the key and the list of Memcache cluster servers, the routing algorithm computes a server number

3. From the server number, the Memcache server's IP address and port number are obtained

4. The API calls the communication module to talk to the server with that number, writes the data to that server, and completes one distributed cache write

Reading the cache works like writing it: as long as the same routing algorithm and the same server list are used, and the application queries the same key, the Memcache client always accesses the same server to read the data. As long as that server still caches the data, the cache hit is guaranteed.
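The four write steps above can be sketched in a few lines of Java. This is a minimal sketch under assumptions: the server list, the key and the remainder-hash routing are illustrative stand-ins, not a real Memcache client API.

```java
import java.util.List;

// Sketch of the four write steps above; remainder hash stands in for the
// routing algorithm module (illustrative only, not a real Memcache client API).
public class WritePathSketch {
    // Step 2: the routing algorithm maps a key to a server number.
    static int route(String key, int serverCount) {
        return Math.abs(key.hashCode() % serverCount); // non-negative remainder
    }

    public static void main(String[] args) {
        List<String> servers = List.of("10.0.0.1:11211", "10.0.0.2:11211", "10.0.0.3:11211");
        String key = "user:42";                              // step 1: data to cache, under this key
        int n = route(key, servers.size());                  // step 2: key -> server number
        String addr = servers.get(n);                        // step 3: server number -> IP and port
        System.out.println("write " + key + " -> " + addr);  // step 4: talk to that server and write
    }
}
```

A read of the same key repeats steps 2 and 3 with the same algorithm and server list, so it lands on the same node.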

This clustering approach also provides a degree of partition fault tolerance. If Node2 goes down, the data stored on Node2 becomes unavailable; but because Node0 and Node1 still exist in the cluster, the next request for a key stored on Node2 must miss. The data to be cached is then first fetched from the database, and the routing algorithm module selects a node from Node0 and Node1 according to the key and stores the data there, so the next request can hit the cache. This clustering approach works well, but its disadvantage is that the cost of a node failure is relatively high.

Consistent hash algorithm

The diagrams above reveal a very important issue: in managing the server cluster, the routing algorithm is crucial. Like a load-balancing algorithm, the routing algorithm decides exactly which server in the cluster to access. Let's first look at a simple routing algorithm.

1. Remainder Hash

For example, suppose the string str has a hashcode of 46 and there are 3 servers. The remainder 46 mod 3 is 1, so str corresponds to Node1, and the routing algorithm routes str to the Node1 server. Because hashcodes are random, remainder-hash routing ensures that cached data is distributed fairly evenly across the whole Memcache server cluster.

If the scalability of the server cluster is not a concern (for what scalability means, see my large-website-architecture study notes), the remainder hash algorithm can satisfy almost all cache-routing needs. But when the distributed cache cluster needs to be expanded, it runs into trouble.

Suppose the Memcache server cluster grows from 3 servers to 4. The server list changes, but we still use remainder hash: 46 mod 4 is 2, corresponding to Node2, yet str was originally stored on Node1, so the lookup misses the cache. If that is not clear enough, take 20 data items with hashcodes 0~19 as an example:

hashcode: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
node(%3): 0 1 2 0 1 2 0 1 2 0 1  2  0  1  2  0  1  2  0  1

Now expand to 4 servers; an asterisk marks a hit (the original figure used bold red):

hashcode: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
node(%4): 0 1 2 3 0 1 2 3 0 1 2  3  0  1  2  3  0  1  2  3
hit:      * * *                  *  *  *

If we expand to 20 or more servers, only the keys with the first three hashcodes are hit, i.e. 15%. Of course this is a simplified example, and reality is certainly more complicated, but it suffices to show that with remainder-hash routing, expansion causes a large amount of data to miss (in fact, it is worse than just missing: until it is evicted, the large amount of now-unreachable data also sits in the original servers' memory). This result is clearly unacceptable. In website businesses, most data-operation requests are actually served from the cache, and only a small number of reads reach the database, so the database's load capacity is provisioned on the assumption that the cache works. When most cached data cannot be read correctly because of a server expansion, the data-access pressure falls on the database, far exceeding its load capacity and possibly bringing the database down.
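The hit rates quoted above can be checked with a short Java sketch; the `hits` helper is my own illustration, not any Memcache API.

```java
// Count how many of `keys` cached items (hashcodes 0..keys-1) still route to the
// same node when a remainder-hash cluster grows from `before` to `after` servers.
public class RemainderHashDemo {
    static int hits(int keys, int before, int after) {
        int h = 0;
        for (int i = 0; i < keys; i++) {
            if (i % before == i % after) h++; // same remainder -> same node -> cache hit
        }
        return h;
    }

    public static void main(String[] args) {
        System.out.println("3 -> 4 servers:  " + hits(20, 3, 4) + "/20 hit");  // 6/20 = 30%
        System.out.println("3 -> 20 servers: " + hits(20, 3, 20) + "/20 hit"); // 3/20 = 15%
    }
}
```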

There are solutions to this problem, and the steps are:

(1) During the site's traffic trough, usually late at night, the technical team works overtime to expand the cluster and restart the servers

(2) Gradually warm the cache with simulated requests so that the data in the cache servers is redistributed

2. Consistent hash algorithm

The consistent hash algorithm maps keys to cache servers through a data structure called a consistent hash ring. Look at a diagram I drew myself:

[Figure: consistent hash ring with three nodes]

The algorithm works as follows: first construct an integer ring of length 2^32 (called the consistent hash ring) and place the cache server nodes on it according to the hash values of their node names (distributed in [0, 2^32-1]). Then compute the hash value of the key of the data to be cached (also distributed in [0, 2^32-1]), and search clockwise on the ring, starting from the key's hash value, for the nearest server node, completing the key-to-server mapping.

As shown in the figure, the three nodes sit at three positions on the hash ring, and each key, according to its hashcode, has a fixed position on the ring. From that position the key searches clockwise for the nearest node, and the data is stored on that node's Memcache server. Now look at what happens when a node is added to the ring:

[Figure: consistent hash ring after adding Node4]

As you can see, I added a Node4 node, and it only affects the keys on one small arc: keys that previously mapped to Node1 now go to Node4. With the consistent hash algorithm, adding a node still affects the cluster, but only the keys on that arc of the ring, a far smaller impact than the remainder-hash algorithm's miss rate of well over half. More importantly, the more cache server nodes in the cluster, the smaller the impact of adding one node, which is easy to understand. In other words, as the cluster grows, the probability of continuing to hit the originally cached data increases; a small portion of the cached data still cannot be read, but that ratio is small enough that even when the database is accessed, it does not suffer a fatal load.

In a concrete implementation, this 2^32-long consistent hash ring is usually built on a binary search tree; efficient lookup is then the binary search tree's problem, and you can look up the relevant material yourself.
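As a minimal sketch of the idea (not a production implementation), the ring can be built on Java's TreeMap, a red-black tree, i.e. the balanced binary search tree mentioned above. CRC32 stands in for the hash function here; real clients typically use a better hash such as KETAMA/MD5 plus virtual nodes.

```java
import java.util.SortedMap;
import java.util.TreeMap;
import java.util.zip.CRC32;

// Minimal consistent hash ring: nodes and keys are hashed onto [0, 2^32 - 1],
// and a key is served by the first node found clockwise from its hash.
public class ConsistentHashRing {
    private final TreeMap<Long, String> ring = new TreeMap<>();

    static long hash(String s) {
        CRC32 crc = new CRC32();
        crc.update(s.getBytes());
        return crc.getValue(); // CRC32 value lies in [0, 2^32 - 1]
    }

    public void addNode(String node) { ring.put(hash(node), node); }

    public void removeNode(String node) { ring.remove(hash(node)); }

    // Clockwise search: the first node at or after the key's hash; wrap around
    // to the ring's first node if nothing lies beyond it.
    public String route(String key) {
        SortedMap<Long, String> tail = ring.tailMap(hash(key));
        return tail.isEmpty() ? ring.firstEntry().getValue() : tail.get(tail.firstKey());
    }
}
```

Adding a node only remaps the keys on the arc between the new node and its counter-clockwise neighbor, matching the behavior described above.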

Memcache Implementation principle

The first thing to make clear is that Memcache's data is stored in memory. In my view, this means:

1. Data access is faster than with a traditional relational database, because Oracle, MySQL and other traditional relational databases store data on disk to maintain persistence, and disk I/O is slow

2. Storing data in memory also means that once Memcache restarts, the data disappears

3. Since the data lives in memory, it is necessarily limited by machine memory: as earlier articles have said many times, a 32-bit machine can only use 2GB of memory space per process, while a 64-bit machine can be considered to have no upper limit

Now let's look at how Memcache works. The most important part of Memcache is how it allocates memory; Memcache uses fixed-space allocation, again described with a picture of my own:

[Figure: Memcache memory structure: slab_class, slab, page, chunk]

This picture involves four concepts: slab_class, slab, page and chunk. The relationship between them is:

1. Memcache divides its memory space into a group of slabs

2. Each slab contains a number of pages; each page is 1MB by default, so if a slab occupies 100MB of memory, it should have 100 pages

3. Each page contains a set of chunks; chunks are where data is actually stored, and within the same slab every chunk is the same fixed size

4. Slabs whose chunks are the same size are grouped together to form a slab_class

Memcache's memory allocation mechanism is called the allocator. The number of slabs is limited, a few, a dozen or a few dozen, depending on the configuration of the startup parameters.

Which slab stores a value is determined by the size of the value: a value is always stored in the slab whose chunk size is closest from above. For example, if slab[1]'s chunk size is 80 bytes, slab[2]'s is 100 bytes and slab[3]'s is 128 bytes (chunk sizes in adjacent slabs grow by a factor of roughly 1.25 by default, which can be changed with -f at startup), then an 88-byte value will be placed in slab number 2. To place it, the slab first has to request memory, and memory is requested in units of pages; so when the first piece of data is stored, regardless of its size, a 1MB page is assigned to the slab. Having obtained a page, the slab slices the page's memory by chunk size into a chunk array, and finally one chunk of that array is chosen to store the data.
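The chunk-size ladder and slab selection just described can be sketched as follows; the 80-byte smallest chunk and the 1.25 growth factor are the example values from the text, and the method names are mine:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of slab-class selection: chunk sizes grow by a fixed factor, and a value
// goes into the class with the smallest chunk that still fits it.
public class SlabSketch {
    static List<Integer> chunkSizes(int smallest, double factor, int pageSize) {
        List<Integer> sizes = new ArrayList<>();
        for (double s = smallest; s <= pageSize; s *= factor) {
            sizes.add((int) s); // real memcached rounds/aligns sizes; truncation is fine for a sketch
        }
        return sizes;
    }

    // Index of the slab class whose chunk is the smallest one >= valueSize, or -1 if none fits.
    static int slabFor(int valueSize, List<Integer> sizes) {
        for (int i = 0; i < sizes.size(); i++) {
            if (sizes.get(i) >= valueSize) return i;
        }
        return -1;
    }

    public static void main(String[] args) {
        List<Integer> sizes = chunkSizes(80, 1.25, 1024 * 1024);
        int idx = slabFor(88, sizes);
        // An 88-byte value skips the 80-byte chunk and lands in the next class.
        System.out.println("88-byte value -> chunk of " + sizes.get(idx) + " bytes");
    }
}
```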

What if no chunk in the slab can be allocated? Unless Memcache was started with -M (which disables LRU; in that case an out-of-memory error is reported when memory is insufficient), Memcache evicts the data in the least recently used chunk of this slab and then puts the latest data there. Three points summarize Memcache's memory allocation and reclamation algorithm:

1. Memcache's chunk allocation wastes some memory: an 88-byte value is allocated the 100-byte chunk of the next class up, losing 12 bytes, but this also avoids the problem of managing memory fragmentation

2. Memcache's LRU algorithm is not global; it operates per slab

3. This also explains why the size of a value stored in Memcache is limited: for new data the slab requests memory one page at a time, and the requested page is only 1MB, so a value naturally cannot exceed 1MB

Summarizing the characteristics and limitations of Memcache

The above is a fairly detailed interpretation of Memcache; here I summarize its limitations and characteristics again:

1. There is no limit on the number of items Memcache can store, as long as memory is sufficient

2. A single Memcache process can use at most 2GB of memory on a 32-bit machine, as mentioned several times in earlier articles; on a 64-bit machine there is no such limit

3. A key is at most 250 bytes; longer keys cannot be stored

4. A single item holds at most 1MB of data; data over 1MB cannot be stored

5. A Memcache server is not secure: knowing a Memcache node, you can telnet to it and make all existing key-value pairs expire immediately with flush_all

6. It is not possible to traverse all items in Memcache, because the operation is relatively slow and blocks other operations

7. Memcache's high performance comes from its two-stage hash structure: the first stage is on the client, where a hash algorithm computes a node from the key; the second stage is on the server, where an internal hash algorithm finds the actual item and returns it to the client. From an implementation point of view, Memcache is a non-blocking, event-driven server program

8. When setting or adding a key, passing an expiry of 0 means the key never expires; a relative expiry can be at most 30 days. See the source of memcached.c:

[JS] View plaincopy

#define REALTIME_MAXDELTA 60*60*24*30

static rel_time_t realtime(const time_t exptime) {
    if (exptime == 0) return 0;

    if (exptime > REALTIME_MAXDELTA) {
        if (exptime <= process_started)
            return (rel_time_t)1;
        return (rel_time_t)(exptime - process_started);
    } else {
        return (rel_time_t)(exptime + current_time);
    }
}

This expiration behavior is written into the Memcache source code: a relative expiration time can be at most 30 days (REALTIME_MAXDELTA), any larger value is interpreted as an absolute Unix timestamp, and developers have no way to change this limit.
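For illustration, the realtime() logic above translates almost line for line into Java; the parameter names for the server's start time and current clock are my own:

```java
// Java rendering of memcached's realtime(): expiry 0 never expires; a value up to
// REALTIME_MAXDELTA (30 days) is relative to "now"; anything larger is treated as
// an absolute Unix timestamp.
public class ExpirySketch {
    static final long REALTIME_MAXDELTA = 60L * 60 * 24 * 30;

    static long realtime(long exptime, long processStarted, long currentTime) {
        if (exptime == 0) return 0;
        if (exptime > REALTIME_MAXDELTA) {
            if (exptime <= processStarted) return 1; // an absolute time already past: expire at once
            return exptime - processStarted;
        }
        return exptime + currentTime;
    }
}
```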

Memcache Instruction Summary

As mentioned above, knowing a Memcache node you can telnet straight to it and operate Memcache with various commands. Let's see what commands Memcache offers:

Command — Function
get — returns the value corresponding to a key
add — adds a key-value pair; if the key does not exist the add succeeds with STORED, if it already exists it fails with NOT_STORED
set — unconditionally sets a key-value pair, adding it if absent and overwriting it if present; on success the server replies STORED
replace — replaces the data for an existing key; fails if the key does not exist
stats — returns Memcache's general statistics (interpreted in detail below)
stats items — returns the number of items in each slab and the age of the oldest item (the number of seconds since it was last accessed)
stats slabs — returns information about each slab created while Memcache has been running (interpreted in detail below)
version — returns the current Memcache version number
flush_all — makes all key-value pairs expire immediately, but does not delete the items, so Memcache still occupies the memory
quit — closes the connection

Stats instruction Interpretation

stats is one of the more important commands, listing the status of the current Memcache server. Take a set of data as an example:


STAT pid 1023
STAT uptime 21069937
STAT time 1447235954
STAT version 1.4.5
STAT pointer_size 64
STAT rusage_user 1167.020934
STAT rusage_system 3346.933170
STAT curr_connections 29
STAT total_connections 21
STAT connection_structures 49
STAT cmd_get 49
STAT cmd_set 7458
STAT cmd_flush 0
STAT get_hits 7401
STAT get_misses 57
... (delete, incr, decr and cas hit/miss counters omitted; cas additionally has badval)
STAT auth_cmds 0
STAT auth_errors 0
STAT bytes_read 22026555
STAT bytes_written 8930466
STAT limit_maxbytes 4134304000
STAT accepting_conns 1
STAT listen_disabled_num 0
STAT threads 4
STAT bytes 151255336
STAT curr_items 57146
STAT total_items 580656
STAT evictions 0

These parameters reflect the basic state of the Memcache server. Their meanings are:

Parameter — Meaning
pid — process ID of the Memcache server
uptime — number of seconds the server has been running
time — the server's current Unix timestamp
version — Memcache version
pointer_size — pointer size of the current operating system, reflecting its word size; 64 means the Memcache server is 64-bit
rusage_user — cumulative user CPU time of the process
rusage_system — cumulative system CPU time of the process
curr_connections — number of currently open connections
total_connections — number of connections opened since the server started
connection_structures — number of connection structures allocated by the server
cmd_get — total number of get command requests
cmd_set — total number of set command requests
cmd_flush — total number of flush_all command requests
get_hits — total number of hits; important, because the key metric for a cache is the hit ratio get_hits / (get_hits + get_misses), here 7401 / 7458, i.e. about 99.2%
get_misses — total number of misses
auth_cmds — number of authentication commands processed
auth_errors — number of failed authentications
bytes_read — total number of bytes read
bytes_written — total number of bytes sent
limit_maxbytes — amount of memory allocated to Memcache, in bytes
accepting_conns — whether the server is accepting new connections: 1 means yes, 0 means the maximum number of connections has been reached
listen_disabled_num — number of times the server has had to stop accepting connections because the maximum was reached; this should be 0 or close to 0, and if it keeps growing, watch out for the service
threads — total number of Memcache threads; because Memcache's threads are event-driven, there is no one-thread-per-request mapping
bytes — total number of bytes of items stored on the current server
curr_items — number of items currently stored on the server
total_items — total number of items stored since the server started
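As a quick check of the hit-ratio formula above against the sample stats (get_hits 7401, get_misses 57):

```java
// Cache hit ratio = get_hits / (get_hits + get_misses).
public class HitRatio {
    static double hitRatio(long hits, long misses) {
        return (double) hits / (hits + misses);
    }

    public static void main(String[] args) {
        System.out.printf("hit ratio: %.1f%%%n", 100 * hitRatio(7401, 57)); // about 99.2%
    }
}
```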

Interpretation of the stats slabs command

Having understood Memcache's storage mechanism above, let's look at the information inside each slab, again taking a set of data as an example:


STAT 1:chunk_size 96
...
STAT 2:chunk_size 144
STAT 2:chunks_per_page 7281
STAT 2:total_pages 7
STAT 2:total_chunks 50967
STAT 2:used_chunks 45197
STAT 2:free_chunks 1
STAT 2:free_chunks_end 5769
STAT 2:mem_requested 6084638
STAT 2:get_hits 48084
STAT 2:cmd_set 59588271
STAT 2:delete_hits 0
STAT 2:incr_hits 0
STAT 2:decr_hits 0
STAT 2:cas_hits 0
STAT 2:cas_badval 0
...
STAT 3:chunk_size 216
...

First note: the second slab's chunk_size (144) divided by the first's (96) is 1.5, and the third's (216) divided by the second's (144) is also 1.5, so we can conclude that this Memcache's growth factor is 1.5: chunk_size grows by 1.5x from class to class. Now the meaning of the fields:

Parameter — Meaning
chunk_size — size of each chunk in the current slab, in bytes
chunks_per_page — number of chunks each page can hold; since a page is fixed at 1MB, i.e. 1024*1024 bytes, this value is 1024*1024 / chunk_size
total_pages — total number of pages assigned to the current slab
total_chunks — maximum number of chunks the current slab can hold; this value is total_pages * chunks_per_page
used_chunks — number of chunks already allocated to stored objects
free_chunks — number of chunks that were used but have been reclaimed because their data expired
free_chunks_end — number of newly allocated but not yet used chunks; a non-zero value indicates the current slab has never run short of capacity
mem_requested — total number of bytes requested to store data in the current slab; (total_chunks * chunk_size) - mem_requested is the amount of idle memory in the slab, which includes both unused chunks and the memory wasted inside used chunks
get_hits — number of get requests hit in the current slab
cmd_set — number of set command requests received by the current slab
delete_hits — number of delete requests hit in the current slab
incr_hits — number of incr requests hit in the current slab
decr_hits — number of decr requests hit in the current slab
cas_hits — number of cas requests hit in the current slab
cas_badval — number of cas requests that hit the current slab but failed to update
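The relationships between these fields can be verified against the slab-2 sample above (chunk_size 144, 1MB pages, total_pages 7), together with the 1.5 growth factor:

```java
// Cross-checking the sample stats slabs output.
public class SlabStatsCheck {
    public static void main(String[] args) {
        int chunkSize = 144;
        int chunksPerPage = (1024 * 1024) / chunkSize;         // a 1 MB page sliced into 144-byte chunks
        int totalChunks = 7 * chunksPerPage;                   // total_pages * chunks_per_page
        long idle = (long) totalChunks * chunkSize - 6084638L; // (total_chunks * chunk_size) - mem_requested
        System.out.println("chunks_per_page = " + chunksPerPage); // 7281, matching the sample
        System.out.println("total_chunks   = " + totalChunks);    // 50967, matching the sample
        System.out.println("idle bytes in slab 2 = " + idle);
        System.out.println("growth factor: " + 144.0 / 96 + ", " + 216.0 / 144); // 1.5 at both steps
    }
}
```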

The output of this command is large, but all of the information is useful. For example, if the first slab has very few used chunks while the second has many, you can consider increasing Memcache's growth factor appropriately so that some of that data falls into the first slab, balancing the memory of the two slabs and avoiding wasted space.

A Java implementation example of Memcache

Having said all this, as a Java programmer, how could I not write a Memcache client? Many third-party jar packages provide client implementations, and a good one is XMemcached. Its advantages include high efficiency, non-blocking IO, low resource consumption, complete protocol support, node weights, dynamic addition and removal of nodes, JMX support, integration with the Spring framework, connection pooling, wide adoption and good extensibility. Below is a simple Memcache client singleton written with XMemcached; it has not been verified and is offered purely as a starting point:


import java.io.IOException;

import net.rubyeye.xmemcached.GetsResponse;
import net.rubyeye.xmemcached.MemcachedClient;
import net.rubyeye.xmemcached.MemcachedClientBuilder;
import net.rubyeye.xmemcached.XMemcachedClientBuilder;
import net.rubyeye.xmemcached.command.BinaryCommandFactory;
import net.rubyeye.xmemcached.utils.AddrUtil;

public class MemCacheManager {
    private static MemCacheManager instance = new MemCacheManager();

    /* XMemcached lets developers tune load across Memcache nodes by setting node
       weights: the higher a node's weight, the more data it stores and the greater
       its load */
    private static MemcachedClientBuilder mcb = new XMemcachedClientBuilder(
            AddrUtil.getAddresses("127.0.0.1:11211 127.0.0.2:11211 127.0.0.3:11211"),
            new int[] { 1, 3, 5 });

    private static MemcachedClient mc = null;

    /* Initialize the client */
    static {
        mcb.setCommandFactory(new BinaryCommandFactory()); // use the binary protocol
        mcb.setConnectionPoolSize(10); // connection pool size, i.e. the number of clients
        try {
            mc = mcb.build();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    private MemCacheManager() {
    }

    public static MemCacheManager getInstance() {
        return instance;
    }

    /* Store data on the Memcache server */
    public void set(String key, int expiry, Object obj) throws Exception {
        mc.set(key, expiry, obj);
    }

    /* Get data from the Memcache server */
    public Object get(String key) throws Exception {
        return mc.get(key);
    }

    /*
     * MemCache does atomic updates via check-and-set (the CAS protocol), similar to
     * optimistic locking: each request carries a CAS value, which MemCache compares
     * with the CAS value of the currently stored data. If they are equal, the old
     * data is overwritten; if not, the update fails. This is particularly useful in
     * a concurrent environment.
     */
    public boolean update(String key, Integer i) throws Exception {
        GetsResponse<Integer> result = mc.gets(key);
        long cas = result.getCas();
        // attempt to update the value for this key
        if (!mc.cas(key, 0, i, cas)) {
            return false;
        }
        return true;
    }
}

