Deep Memcached Analysis

Source: Internet
Author: User
Tags apc
Memcached is a distributed memory object caching system developed by danga.com (the technical team that operates LiveJournal) to reduce database load and improve performance in dynamic systems. I believe many people have already used it. This article aims to give a deeper understanding of this outstanding open-source software by analyzing its implementation and code, so that we can further optimize it for our own needs. At the end, we will analyze the BSM_Memcache extension to better understand how memcached is used. Some of the content in this article may require a good mathematical foundation.

◎ What is Memcached?

Before answering this question, we must first know what it is "not". Many people use it as a storage carrier in the same way as shared memory. Although memcached organizes data in the same "Key => Value" form, it is very different from local caches such as shared memory and APC. Memcached is distributed, that is, it is not local. It provides its service over network connections (which can of course be connections to localhost), and it runs as a program independent of the application, that is, as a daemon process.

Memcached uses the libevent library to implement its network service. In theory it can handle an unlimited number of connections, but unlike Apache it is oriented towards stable, persistent connections, so its actual concurrency is limited. A conservative figure for the maximum number of simultaneous connections is 200, which is related to the thread capability of Linux. This value can be adjusted. For more information about libevent, see its documentation.

Memcached's memory usage also differs from APC's. APC is based on shared memory and mmap, whereas memcached has its own memory allocation algorithm and management method. It has nothing to do with shared memory and is not subject to shared memory limits. Generally, each memcached process can manage 2 GB of memory.
If you need more space, you can run more processes.

◎ What is Memcached suitable for?

In many cases memcached is abused, and then of course it gets blamed. I often see forum posts asking "how do I improve efficiency" answered with "use memcached", with nothing said about how or where to use it. Memcached is not omnipotent and does not suit every situation.

Memcached is a "distributed" memory object caching system. For applications that do not need distribution, do not need sharing, or are small enough to run on a single server, memcached brings no benefit. On the contrary, it slows the system down, because network connections cost resources too, even local UNIX socket connections. According to my earlier test data, local reads and writes through memcached are dozens of times slower than a plain in-memory PHP array, while APC and shared memory are close to a plain array. So memcached is not cost-effective if all you need is a local cache.

Memcached is often used as a front-end cache for a database. It has far less overhead than a database (no SQL parsing, no disk operations) and manages data in memory, so it performs better than reading the database directly. In large systems, where the same data is accessed frequently, memcached can greatly reduce the pressure on the database and improve overall efficiency. Memcached is also often used as a medium for sharing data between servers. For example, in an SSO system the single sign-on state can be stored in memcached and shared by multiple applications.

Note that memcached keeps its data in memory, so the data is easily lost. When the server is restarted or the memcached process is terminated, the data is gone, so memcached cannot be used for persistent storage.
Many people mistakenly believe that memcached's performance is as far ahead of a database as memory is ahead of a hard disk. In fact, using memory does not give memcached hundreds of thousands of reads and writes per second; its real bottleneck is the network connection. Its advantage over a disk-based database system is that it is very "light": it has little overhead and reads and writes directly, so it can easily cope with a very large volume of data exchange. It is common to see two Gigabit network links saturated while the memcached process itself uses very little CPU.

◎ How Memcached works

For the following sections, you are advised to have the memcached source code at hand. Memcached is a traditional network service program. If the -d parameter is given at startup, it runs as a daemon process. The daemon is created by daemon.c, which contains only one function, daemon(). This function is very simple (unless otherwise stated, the code is from version 1.2.1):

CODE:

#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

int
daemon(nochdir, noclose)
    int nochdir, noclose;
{
    int fd;

    switch (fork()) {
    case -1:            /* fork failed */
        return (-1);
    case 0:             /* child: carries on as the daemon */
        break;
    default:            /* parent: exits immediately */
        _exit(0);
    }

    /* start a new session, detaching from the controlling terminal */
    if (setsid() == -1)
        return (-1);

    if (!nochdir)
        (void)chdir("/");

    /* point stdin, stdout and stderr at /dev/null */
    if (!noclose && (fd = open("/dev/null", O_RDWR, 0)) != -1) {
        (void)dup2(fd, STDIN_FILENO);
        (void)dup2(fd, STDOUT_FILENO);
        (void)dup2(fd, STDERR_FILENO);
        if (fd > STDERR_FILENO)
            (void)close(fd);
    }
    return (0);
}

After this function forks the process, the parent exits; the child then redirects STDIN, STDOUT, and STDERR to the null device, and the daemon is established.

Memcached's own startup takes place in the main function of memcached.c, in the following sequence:

1. Call settings_init() to set the initial parameters
2. Read the parameters from the startup command line to set the settings values
3. Set the LIMIT parameters
4. Start listening on the network socket (if no socketpath is given) (UDP is supported from 1.2 on)
5. Check the user identity (memcached does not allow starting as root)
6. If a socketpath is given, open the local UNIX connection (socket pipe)
7. If started with -d, create the daemon process (by calling the daemon function above)
8. Initialize item, event, status information, hash, connection, and slab
9. If managed is enabled in settings, create the bucket array
10. Check whether memory pages need to be locked
11. Initialize the signal, the connection, and the delete queue
12. If running as a daemon, record the process ID
13. Start the event loop; the startup process ends and the main function enters the loop

In daemon mode, stderr has been redirected to a black hole, so no visible error messages are reported during execution.

The main loop function of memcached.c is drive_machine. Its parameter is a pointer to the structure of the current connection, and the action to take is determined by the connection's state member.

Memcached exchanges data using its own protocol, documented at http://code.sixapart.com/svn/memcached/trunk/server/doc/protocol.txt. In the protocol, the line terminator is always \r\n.

◎ Memcached Memory Management

Memcached manages memory in a special way. To improve efficiency, it pre-allocates memory and manages it in groups, instead of calling malloc every time data is written and free every time data is deleted. Memcached manages memory in two levels: slab -> chunk.
The slab partitioning algorithms in slabs.c differ somewhat between 1.1 and 1.2, as described below. A slab can be understood as a block of memory; it is the smallest unit of memory that memcached requests at a time. The default slab size is 1048576 bytes (1 MB), so memcached uses memory in whole megabytes. Each slab is divided into several chunks, and each chunk stores one item. Each item in turn contains the item struct, the key, and the value (note that a value in memcached is just a string). Slabs with the same id form a list, and these lists hang off the slabclass array by id, so the whole structure looks a bit like a two-dimensional array. The length of slabclass is 21 in 1.1 and 201 in 1.2.

Each slab class has an initial chunk size: 1 byte in 1.1 and 80 bytes in 1.2. Version 1.2 also has a factor value, which is 1.25 by default.

In 1.1, the chunk size is the initial size * 2^n, where n is the classid. That is, in a slab with id 0 each chunk is 1 byte, with id 1 each chunk is 2 bytes, with id 2 each chunk is 4 bytes, and so on up to id 20, where each chunk is 1 MB, so a slab with id 20 has only one chunk:

CODE:

void slabs_init(size_t limit) {
    int i;
    int size = 1;

    mem_limit = limit;
    for (i = 0; i <= POWER_LARGEST; i++, size *= 2) {
        slabclass[i].size = size;
        slabclass[i].perslab = POWER_BLOCK / size;
        slabclass[i].slots = 0;
        slabclass[i].sl_curr = slabclass[i].sl_total = slabclass[i].slabs = 0;
        slabclass[i].end_page_ptr = 0;
        slabclass[i].end_page_free = 0;
        slabclass[i].slab_list = 0;
        slabclass[i].list_size = 0;
        slabclass[i].killing = 0;
    }

    /* for the test suite: faking of how much we've already malloc'd */
    {
        char *t_initial_malloc = getenv("T_MEMD_INITIAL_MALLOC");
        if (t_initial_malloc) {
            mem_malloced = atol(getenv("T_MEMD_INITIAL_MALLOC"));
        }
    }

    /* pre-allocate slabs by default, unless the environment variable
       for testing is set to something non-zero */
    {
        char *pre_alloc = getenv("T_MEMD_SLABS_ALLOC");
        if (!pre_alloc || atoi(pre_alloc)) {
            slabs_preallocate(limit / POWER_BLOCK);
        }
    }
}

In 1.2, the chunk size is the initial size * f^(n-1), where f is the factor (defined in memcached.c) and n is the classid. Not all 201 heads have to be initialized. Because the factor is variable, initialization loops only while the computed size does not exceed half the slab size, and it starts from id 1: in a slab with id 1 each chunk is 80 bytes, with id 2 each chunk is 80 * f, with id 3 each chunk is 80 * f^2, and so on. Each computed size is corrected using CHUNK_ALIGN_BYTES to guarantee n-byte alignment (the result is rounded up to an integral multiple of CHUNK_ALIGN_BYTES), and the next size is derived from the corrected one. Under standard conditions, memcached 1.2 therefore initializes classes up to id 40, where each chunk is 504692 bytes and each slab holds two chunks. Finally, the slabs_init function appends an id 41 whose chunk is the whole block, that is, this slab has a single 1 MB chunk:

CODE:

void slabs_init(size_t limit, double factor) {
    int i = POWER_SMALLEST - 1;
    unsigned int size = sizeof(item) + settings.chunk_size;

    /* Factor of 2.0 means use the default memcached behavior */
    if (factor == 2.0 && size < 128)
        size = 128;

    mem_limit = limit;
    memset(slabclass, 0, sizeof(slabclass));

    while (++i < POWER_LARGEST && size <= POWER_BLOCK / 2) {
        /* Make sure items are always n-byte aligned */
        if (size % CHUNK_ALIGN_BYTES)
            size += CHUNK_ALIGN_BYTES - (size % CHUNK_ALIGN_BYTES);

        slabclass[i].size = size;
        slabclass[i].perslab = POWER_BLOCK / slabclass[i].size;
        size *= factor;
        if (settings.verbose > 1) {
            fprintf(stderr, "slab class %3d: chunk size %6d perslab %5d\n",
                    i, slabclass[i].size, slabclass[i].perslab);
        }
    }

    power_largest = i;
    slabclass[power_largest].size = POWER_BLOCK;
    slabclass[power_largest].perslab = 1;

    /* for the test suite: faking of how much we've already malloc'd */
    {
        char *t_initial_malloc = getenv("T_MEMD_INITIAL_MALLOC");
        if (t_initial_malloc) {
            mem_malloced = atol(getenv("T_MEMD_INITIAL_MALLOC"));
        }
    }

#ifndef DONT_PREALLOC_SLABS
    {
        char *pre_alloc = getenv("T_MEMD_SLABS_ALLOC");
        if (!pre_alloc || atoi(pre_alloc)) {
            slabs_preallocate(limit / POWER_BLOCK);
        }
    }
#endif
}
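To make the 1.2 sizing rule concrete, here is a minimal standalone sketch that repeats the sizing loop outside memcached and records the chunk size of every class. It assumes an initial size of 80 bytes, 4-byte alignment (a 32-bit build), and a 1 MB slab; build_classes, class_size, wasted_per_slab, and print_classes are names made up for this illustration and are not part of memcached.

```c
#include <assert.h>
#include <stdio.h>

#define POWER_BLOCK       1048576u /* slab size (1 MB), as stated in the article */
#define POWER_SMALLEST    1
#define POWER_LARGEST     200
#define CHUNK_ALIGN_BYTES 4u       /* assumption: 32-bit build */
#define INITIAL_SIZE      80u      /* assumption: sizeof(item) + settings.chunk_size */

/* Hypothetical table for this sketch: chunk size per class id. */
static unsigned int class_size[POWER_LARGEST + 1];

/* Repeat the 1.2 sizing loop; returns the id of the largest class. */
int build_classes(double factor) {
    int i = POWER_SMALLEST - 1;
    unsigned int size = INITIAL_SIZE;

    assert(factor > 1.0); /* a factor of 1.0 or less would never grow the chunk size */
    while (++i < POWER_LARGEST && size <= POWER_BLOCK / 2) {
        /* round up to a multiple of CHUNK_ALIGN_BYTES */
        if (size % CHUNK_ALIGN_BYTES)
            size += CHUNK_ALIGN_BYTES - (size % CHUNK_ALIGN_BYTES);
        class_size[i] = size;
        size = (unsigned int)(size * factor);
    }
    class_size[i] = POWER_BLOCK; /* final class: one chunk fills the whole slab */
    return i;
}

/* Bytes left unused at the end of one slab of the given class. */
unsigned int wasted_per_slab(int id) {
    unsigned int perslab = POWER_BLOCK / class_size[id];
    return POWER_BLOCK - perslab * class_size[id];
}

/* Print the whole table, one class per line. */
void print_classes(int largest) {
    for (int id = POWER_SMALLEST; id <= largest; id++)
        printf("class %2d: chunk %7u  perslab %5u  wasted %6u\n",
               id, class_size[id], POWER_BLOCK / class_size[id],
               wasted_per_slab(id));
}
```

Under these assumptions, build_classes(1.25) returns 41, class 40 gets a chunk size of 504692 bytes, and wasted_per_slab(40) is 39192, which agrees with the figures quoted in this article. With a different alignment (for example 8 bytes on a 64-bit build) the table would come out differently.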

As can be seen from the above, memcached's memory allocation involves redundancy. When the slab size is not evenly divisible by the chunk size, the remaining space at the end of the slab is discarded. For example, in id 40 two chunks occupy 1009384 bytes while the slab occupies 1 MB (1048576 bytes), so 39192 bytes are wasted.

Memcached allocates memory this way so that the slab classid can be located quickly from the item length. This is somewhat like hashing, because the classid can be calculated from the item length. For example, an item 300 bytes long should be stored in a slab of id 7 in 1.2: by the calculation above, the chunk size of id 6 is 252 bytes, that of id 7 is 316 bytes, and that of id 8 is 396 bytes, so every item larger than 252 bytes and no larger than 316 bytes is stored in id 7. Similarly, in 1.1 the item falls between 256 and 512 bytes and should be placed in id 9, whose chunk size is 512 (on a 32-bit system).

When memcached starts, the slabs are initialized (as seen earlier, slabs_init() is called in the main function). slabs_init() checks the compile-time constant DONT_PREALLOC_SLABS. If it is not defined, slabs are initialized with pre-allocated memory: for every slabclass that has been defined, one slab is created per id. This means that 1.2, in the default environment, allocates 41 MB of slab space right after the process starts. This is the second source of memory redundancy in memcached, because an id may never be used at all and yet a slab is requested for it by default, and each slab takes 1 MB of memory.

When a slab is used up and a new item has to be inserted under that id, a new slab is requested. When that happens, the slab list of the corresponding id grows, and it grows exponentially. In the grow_slab_list function the capacity of the list starts at 16 and doubles every time it fills up, from 16 to 32, from 32 to 64 ......:

CODE:

static int grow_slab_list(unsigned int id) {
    slabclass_t *p = &slabclass[id];
    if (p->slabs == p->list_size) {
        size_t new_size = p->list_size ? p->list_size * 2 : 16;
        void *new_list = realloc(p->slab_list, new_size * sizeof(void *));
        if (new_list == 0) return 0;
        p->list_size = new_size;
        p->slab_list = new_list;
    }
    return 1;
}
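The 300-byte example given earlier in this section can be checked with a short sketch that maps an item length to a class id under both sizing schemes. This is not memcached's own lookup code; clsid_v11 and clsid_v12 are hypothetical names, and the 1.2 variant assumes an initial chunk size of 80 bytes and 4-byte alignment. Returning -1 for an invalid length is a choice made for this sketch only.

```c
#include <assert.h>
#include <stddef.h>

#define POWER_BLOCK       1048576u /* slab size: 1 MB */
#define POWER_LARGEST     200
#define CHUNK_ALIGN_BYTES 4u       /* assumption: 32-bit build */
#define INITIAL_SIZE      80u      /* assumption: first chunk size in 1.2 */

/* 1.1-style sizing: class id n has chunks of 2^n bytes, so the class
 * is the smallest power of two that can hold the item. */
int clsid_v11(size_t size) {
    int id = 0;
    size_t chunk = 1;

    if (size == 0 || size > POWER_BLOCK)
        return -1;
    while (chunk < size) {
        chunk *= 2;
        id++;
    }
    return id;
}

/* 1.2-style sizing: walk the factor-based chunk sizes, starting at id 1,
 * until one is large enough. */
int clsid_v12(size_t size, double factor) {
    int id = 1;
    unsigned int chunk = INITIAL_SIZE;

    assert(factor > 1.0);
    if (size == 0 || size > POWER_BLOCK)
        return -1;
    while (id < POWER_LARGEST && chunk <= POWER_BLOCK / 2) {
        /* round up to a multiple of CHUNK_ALIGN_BYTES */
        if (chunk % CHUNK_ALIGN_BYTES)
            chunk += CHUNK_ALIGN_BYTES - (chunk % CHUNK_ALIGN_BYTES);
        if (size <= chunk)
            return id;
        chunk = (unsigned int)(chunk * factor);
        id++;
    }
    return id; /* the final class, whose single chunk is the whole slab */
}
```

With a factor of 1.25, clsid_v12(300, 1.25) yields 7 and clsid_v11(300) yields 9, matching the ids worked out in the text. The gap between the item length and the chunk size of the class it lands in (316 - 300 = 16 bytes in the 1.2 case) is space that stays unused inside the chunk.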

An item is positioned with the slabs_clsid function, which takes the item size and returns the classid. From this we can see the third source of memory redundancy in memcached, which occurs when items are stored: an item is always smaller than or equal to its chunk, and whenever it is smaller, the difference is wasted.

◎ Memcached NewHash Algorithm

Memcached items are stored through a large hash table. An item's actual address is a chunk offset within a slab, but it is located through the hash of its key, which is looked up in primary_hashtable. All hash and item operations are defined in assoc.c and items.c. Memcached uses an algorithm called NewHash, which performs well and is efficient. NewHash differs between 1.1 and 1.2; the main implementation is the same, but the hash function in 1.2 has been optimized and adapts better. For the NewHash prototype, see http://burtleburtle.net/bob/hash/evahash.html. Mathematicians are always a little strange ~

ub4 and ub1 are defined for ease of conversion: ub4 is an unsigned long integer and ub1 is an unsigned char (0-255). For the specific code, refer to the 1.1 and 1.2 source packages. Note that the hashtable length also differs between 1.1 and 1.2. In 1.1 the HASHPOWER constant is defined as 20, so the table length is hashsize(HASHPOWER), that is 1048576 entries, or 4 MB (hashsize is a macro that shifts 1 left by n bits). In 1.2 it is a variable with the value 16, so the table is 65536 entries long:

CODE:

typedef unsigned long int ub4;   /* unsigned 4-byte quantities */
typedef unsigned char ub1;       /* unsigned 1-byte quantities */

#define hashsize(n) ((ub4)1<<(n))
#define hashmask(n) (hashsize(n)-1)
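The two macros suggest how a key is turned into a bucket position: the hash value is masked down to its low n bits. The sketch below illustrates only that masking step. It does not use memcached's NewHash; toy_hash is a stand-in (32-bit FNV-1a) chosen for brevity, and bucket_index is a hypothetical helper name.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned long int ub4; /* unsigned 4-byte quantities */
typedef unsigned char ub1;     /* unsigned 1-byte quantities */

#define hashsize(n) ((ub4)1 << (n))
#define hashmask(n) (hashsize(n) - 1)

/* Stand-in hash for illustration only (32-bit FNV-1a), NOT NewHash. */
ub4 toy_hash(const ub1 *key, size_t nkey) {
    ub4 h = 2166136261UL;
    for (size_t i = 0; i < nkey; i++) {
        h ^= key[i];
        h = (h * 16777619UL) & 0xffffffffUL; /* keep 32 bits even if long is wider */
    }
    return h;
}

/* Bucket position in a table of hashsize(power) entries:
 * keep only the low `power` bits of the hash value. */
ub4 bucket_index(const char *key, size_t nkey, int power) {
    assert(power >= 0 && power < 32);
    return toy_hash((const ub1 *)key, nkey) & hashmask(power);
}
```

With power 16 the result always falls in the range 0 to 65535, and with power 20 in the range 0 to 1048575, which are the two table lengths mentioned above. Because the table length is a power of two, the mask replaces a slower modulo operation.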

primary_hashtable is initialized in assoc_init(). The corresponding hash operations are assoc_find(), assoc_expand(), assoc_move_next_bucket(), assoc_insert(), and assoc_delete(), which correspond to the read and write operations on items. Of these, assoc_find() looks up the address of an item from its key and key length (note that in C the string and its length are often passed in together, rather than calling strlen inside the function). It returns a pointer to the item structure, whose data lives in a chunk of some slab.

items.c is the program that operates on data items. Each complete item consists of several parts, defined in item_make_header() as:

key: the key
nkey: key length
flags: user-defined flag (memcached itself does not actually use this flag)
nbytes: value length (including the trailing \r\n)
suffix: suffix buffer
nsuffix: suffix length

The length of a complete item is key length + value length + suffix length + item struct size (32 bytes). Item operations calculate the slab classid from this length.

Items are also kept on doubly linked lists. In item_init(), the heads, tails, and sizes arrays are initialized to 0; the size of these three arrays is the constant LARGEST_ID (255 by default; this value needs to be adjusted along with the factor). On each item_alloc() call, memcached first tries to get a free chunk from the slab. If no chunk is available, it scans the linked list up to 50 times to find an item to evict by LRU, unlinks it, and then inserts the new item into the list.

Note the refcount member of item. After an item is unlinked, it is only removed from the linked list (the item_unlink_q() function); it is not freed immediately.

Items have corresponding read and write operations, including remove, update, and replace. Of course, the most important one is the alloc operation.
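The item length formula above (key length + value length + suffix length + item struct size) can be worked through with a small sketch. Two things here are assumptions rather than facts taken from this article: the suffix is taken to hold the text " <flags> <value length>\r\n", and item_total_len is a hypothetical helper, not a memcached function. The 32-byte struct size is the figure quoted above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define ITEM_STRUCT_SIZE 32 /* item struct size quoted in the article */

/* Hypothetical helper: total bytes one item needs inside a chunk.
 * nbytes is the value length including the trailing "\r\n". */
size_t item_total_len(const char *key, int flags, int nbytes) {
    char suffix[40];
    /* assumption: the suffix holds " <flags> <value length>\r\n" */
    int nsuffix = snprintf(suffix, sizeof(suffix), " %d %d\r\n", flags, nbytes - 2);

    assert(nsuffix > 0 && (size_t)nsuffix < sizeof(suffix));
    return ITEM_STRUCT_SIZE + strlen(key) + (size_t)nsuffix + (size_t)nbytes;
}
```

For the key "foo" with the 5-byte value "hello" (nbytes = 7 with the line break) and flags 0, the suffix is 6 characters long and the total is 32 + 3 + 6 + 7 = 48 bytes, which fits in the smallest 80-byte chunk of 1.2.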
Another feature of items is the expiration time, a very useful feature of memcached on which many applications rely, for example session storage and operation locks. The item_flush_expired() function scans the items in the table and unlinks the expired ones. Of course, this is only a cleanup pass; the time check also has to be made on every get:

CODE:

/* expires items that are more recent than the oldest_live setting. */
void item_flush_expired() {
    int i;
    item *iter, *next;
    if (!settings.oldest_live)
        return;
    for (i = 0; i < LARGEST_ID; i++) {
        /* The LRU is sorted in decreasing time order, and an item's timestamp
         * is never newer than its last access time, so we only need to walk
         * back until we hit an item older than the oldest_live time.
         * The oldest_live checking will auto-expire the remaining items.
         */
        for (iter = heads[i]; iter != NULL; iter = next) {
            if (iter->time >= settings.oldest_live) {
                next = iter->next;
                if ((iter->it_flags & ITEM_SLABBED) == 0) {
                    item_unlink(iter);
                }
            } else {
                /* We've hit the first old item. Continue to the next queue. */
                break;
            }
        }
    }
}

CODE:

/* wrapper around assoc_find which does the lazy expiration/deletion logic */
item *get_item_notedeleted(char *key, size_t nkey, int *delete_locked) {
    item *it = assoc_find(key, nkey);
    if (delete_locked) *delete_locked = 0;
    if (it && (it->it_flags & ITEM_DELETED)) {
        /* it's flagged as delete-locked. let's see if that condition
           is past due, and the 5-second delete_timer just hasn't
           gotten to it yet... */
        if (!item_delete_lock_over(it)) {
            if (delete_locked) *delete_locked = 1;
            it = 0;
        }
    }
    if (it && settings.oldest_live && settings.oldest_live <= current_time &&
        it->time <= settings.oldest_live) {
        item_unlink(it);
        it = 0;
    }
    /* an exptime of 0 means the item never expires */
    if (it && it->exptime && it->exptime <= current_time) {
        item_unlink(it);
        it = 0;
    }
    return it;
}
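The idea of checking the expiration time at read time, rather than relying only on a periodic scan, can be shown in isolation. The following is a simplified sketch, not memcached code: toy_item, toy_set, and toy_get are hypothetical names, the current time is passed in as a parameter so the behaviour is easy to follow, and an exptime of 0 is treated as "never expires".

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Simplified stand-in for an item; only the fields needed here. */
typedef struct {
    const char *key;
    time_t exptime; /* absolute expiry time; 0 means "never expires" */
    int linked;     /* 1 while the entry can still be returned by a get */
} toy_item;

/* Store an entry with the given absolute expiry time. */
void toy_set(toy_item *it, const char *key, time_t exptime) {
    assert(it != NULL && key != NULL);
    it->key = key;
    it->exptime = exptime;
    it->linked = 1;
}

/* Return the item if it is still live at `now`;
 * otherwise unlink it on the spot and return NULL. */
toy_item *toy_get(toy_item *it, time_t now) {
    if (it == NULL || !it->linked)
        return NULL;
    if (it->exptime != 0 && it->exptime <= now) {
        it->linked = 0; /* lazy expiration: cleaned up at read time */
        return NULL;
    }
    return it;
}
```

An entry stored with an expiry time of 100 is returned by toy_get at time 99 and rejected from time 100 onwards, after which it stays unlinked. An entry stored with an expiry time of 0 is always returned. The benefit of this design is that an expired entry is never handed to a client, even if no background scan has reached it yet.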

Memcached's memory management is sophisticated and efficient. It greatly reduces the number of direct system memory allocations, which lowers function call overhead and the likelihood of memory fragmentation. Although this approach wastes some memory through redundancy, the waste is negligible in large-scale system applications.
