[Repost] Improving the memcached hit ratio: an in-depth analysis of memcached server LRU

Last Update:2015-10-21 Source: Internet

Author: User

Tags memcached


Memcached is a well-known distributed remote cache. (If you have not heard of it, look it up on JavaEye or Google. You could use Baidu too, but given how commercial Baidu's rankings have become, as a recent incident showed, I recommend JavaEye.) It is very simple to use and runs on many sites; very few large sites do not use memcached.

I had read many articles analyzing memcached's internals and learned a little from each, but I soon forgot them and never formed a deep understanding. Recently, however, I ran into a problem that forced me to study memcached again. Here is that problem.

Problem: I have tens of millions of records that are used frequently, so for now they must live in memcached to keep access fast. But the data in my memcached is often lost, and the business requirement is that it must not be lost. When the loss occurs, the memcached server's memory is only 60% used, so 40% of it is simply wasted. Not every application behaves this way; others waste far less memory. Why does LRU run when memory is only 60% used? (I conclude that LRU is running because it is always the earliest-stored data that disappears, and none of that data is read while the load is in progress. For example, on the first read pass only the records around the 10 millionth can be fetched, while the 3 millionth record and everything before it is gone, even though the log shows that the 3 millionth record was definitely stored.)

With these doubts in mind, I re-examined memcached, starting with its memory model. As we know, C and C++ programs can allocate memory in two ways: pre-allocation and dynamic allocation. Pre-allocation makes the program faster but uses memory less efficiently; dynamic allocation uses memory efficiently but makes the program slower. memcached's memory allocation follows the first approach: to gain speed, we sometimes have to trade space for time.

In other words, memcached pre-allocates memory. Its allocation scheme is called the slab allocator, and it involves three concepts:
1 slab
2 page
3 chunk
To explain: in general, a memcached process divides its memory into several slabs in advance. Each slab holds a number of pages, and each page holds a number of chunks. If we think of these three as objects, they form two one-to-many relationships. The number of slabs is limited, typically a handful, a dozen or so, or a few dozen, depending on the memory configured for the process. Each page is 1 MB by default, so a slab that occupies 100 MB of memory has 100 pages by default. The chunk is where our data is finally stored.

For example, I start a memcached process with 100 MB of memory, open a telnet session with telnet localhost 11211, and, once connected, type stats slabs and press Enter. The following data appears:


STAT 1:chunk_size 80
STAT 1:chunks_per_page 13107
STAT 1:total_pages 1
STAT 1:total_chunks 13107
STAT 1:used_chunks 13107
STAT 1:free_chunks 0
STAT 1:free_chunks_end 13107
STAT 2:chunk_size 100
STAT 2:chunks_per_page 10485
STAT 2:total_pages 1
STAT 2:total_chunks 10485
STAT 2:used_chunks 10485
STAT 2:free_chunks 0
STAT 2:free_chunks_end 10485
STAT 3:chunk_size 128
STAT 3:chunks_per_page 8192
STAT 3:total_pages 1
STAT 3:total_chunks 8192
STAT 3:used_chunks 8192
STAT 3:free_chunks 0
STAT 3:free_chunks_end 8192

These are the details of the first three slabs.
chunk_size is the size in bytes of each data storage block; chunks_per_page is the number of chunks in one memory page; total_pages is the number of pages in the slab; total_chunks is the total number of chunks in the slab (= total_pages * chunks_per_page); used_chunks is the number of chunks in the slab that are already in use; free_chunks is the number of chunks in the slab that are still available.
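To check these fields without reading telnet output by eye, here is a minimal Python sketch. The function names (fetch_stats_slabs, parse_slab_stats, summarize) and the SAMPLE text are my own illustration, not part of memcached; the sample simply repeats the figures discussed in this article, and the 1 MB page size is the default mentioned above.

```python
import socket

PAGE_SIZE = 1024 * 1024  # assumed default page size of 1 MB


def fetch_stats_slabs(host="localhost", port=11211, timeout=2.0):
    """Send 'stats slabs' over the text protocol and return the raw reply."""
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall(b"stats slabs\r\n")
        data = b""
        while not data.endswith(b"END\r\n"):
            chunk = conn.recv(4096)
            if not chunk:  # server closed the connection early
                break
            data += chunk
    return data.decode("ascii", errors="replace")


def parse_slab_stats(text):
    """Turn 'STAT <class>:<name> <value>' lines into {class_id: {name: value}}."""
    slabs = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 3 or parts[0] != "STAT" or ":" not in parts[1]:
            continue  # END, blank lines, and global counters
        class_id, name = parts[1].split(":", 1)
        try:
            key, value = int(class_id), int(parts[2])
        except ValueError:
            continue  # malformed or truncated line
        slabs.setdefault(key, {})[name] = value
    return slabs


def summarize(slabs):
    """Per slab: memory held, whether it is full, and the total_chunks identity."""
    rows = []
    for class_id in sorted(slabs):
        s = slabs[class_id]
        rows.append({
            "class": class_id,
            "memory_bytes": s["total_pages"] * PAGE_SIZE,
            "full": s["free_chunks"] == 0,
            "consistent": s["total_chunks"] == s["total_pages"] * s["chunks_per_page"],
        })
    return rows


# Hypothetical sample text using the figures from this article.
SAMPLE = """\
STAT 1:chunk_size 80
STAT 1:chunks_per_page 13107
STAT 1:total_pages 1
STAT 1:total_chunks 13107
STAT 1:free_chunks 0
STAT 2:chunk_size 100
STAT 2:chunks_per_page 10485
STAT 2:total_pages 1
STAT 2:total_chunks 10485
STAT 2:free_chunks 0
END
"""

if __name__ == "__main__":
    for row in summarize(parse_slab_stats(SAMPLE)):
        print(row)
```

Against a live server you would pass the result of fetch_stats_slabs() to parse_slab_stats() instead of SAMPLE; other memcached versions may report different field names, so treat the keys used in summarize as assumptions.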

In the example above, slab 1 has 1 MB of memory in total and it is all used up; slab 2 also has 1 MB and is used up; slab 3 is the same. The chunk sizes of these three slabs are 80 bytes, 100 bytes, and 128 bytes: each is roughly 1.25 times the previous one. We can control this growth with the -f parameter when starting the process. For example, -f 1.1 sets the growth factor to 1.1, so if the chunk in the first slab is 80 bytes, the chunk in the second slab should be about 80 * 1.1 = 88 bytes.
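The growth rule can be sketched in a few lines of Python. This is only an approximation under my own assumptions: the first chunk is 80 bytes, each size is rounded up to a multiple of 4 bytes (an alignment I chose because it reproduces the 80, 100, 128 sequence above), and a page is 1 MB. The real server may align differently depending on version and platform.

```python
PAGE_SIZE = 1024 * 1024  # assumed default page size of 1 MB


def chunk_sizes(first=80, factor=1.25, count=3, align=4):
    """Return the chunk sizes of the first `count` slabs for a growth factor.

    `align` is an assumption of this sketch, not a documented constant.
    """
    sizes = []
    size = first
    for _ in range(count):
        if size % align:
            size += align - size % align  # round up to the alignment
        sizes.append(size)
        size = int(size * factor)  # grow for the next slab, truncating
    return sizes


if __name__ == "__main__":
    for factor in (1.25, 1.1):
        for size in chunk_sizes(factor=factor):
            print(factor, size, PAGE_SIZE // size)  # factor, chunk size, chunks per page
```

With the default factor this gives the chunk sizes and chunks_per_page values shown in the stats output; a smaller factor produces more slabs whose chunk sizes sit closer together, which wastes less space per value.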

After all this explanation, the cause of my problem should be visible. If not, here is the key: where memcached stores a new value is determined by the value's size. A value is always stored in the slab whose chunk size is closest to, and no smaller than, the size of the value. In the example above, if my values are 80 bytes, all of them go into slab 1. What happens once free_chunks in slab 1 is 0? Unless memcached was started with -M (which disables LRU, so the store fails with an out-of-memory error instead), memcached evicts the least recently used chunk in that slab and stores the new data in its place. This explains why LRU runs while 40% of my memory is still free: the chunk_size of my other slabs is much larger than my values, so my values never go into those slabs. They go only to the slabs whose chunks are closest in size to my values, and those slabs are already full. As a result my data gets overwritten, newer items replacing older ones.
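To make the per-slab behaviour concrete, here is a toy Python model. It is not memcached's implementation: the class names (SlabClass, ToySlabCache) are hypothetical, each slab gets a fixed number of chunks, and an item's size stands in for the whole stored item. It only shows that eviction happens inside the slab an item maps to, even when other slabs are empty.

```python
from collections import OrderedDict


class SlabClass:
    """One slab: a chunk size, a fixed number of chunks, and its own LRU order."""

    def __init__(self, chunk_size, capacity):
        self.chunk_size = chunk_size
        self.capacity = capacity
        self.items = OrderedDict()  # key -> size, least recently used first


class ToySlabCache:
    def __init__(self, classes):
        self.classes = sorted(classes, key=lambda c: c.chunk_size)
        self.where = {}  # key -> SlabClass currently holding it

    def _pick(self, size):
        # Smallest chunk that still fits the item.
        for slab in self.classes:
            if size <= slab.chunk_size:
                return slab
        raise ValueError("item larger than the largest chunk")

    def set(self, key, size):
        """Store an item; return the key evicted to make room, or None."""
        old = self.where.pop(key, None)
        if old is not None:
            del old.items[key]
        slab = self._pick(size)
        evicted = None
        if len(slab.items) >= slab.capacity:
            evicted, _ = slab.items.popitem(last=False)  # LRU of this slab only
            del self.where[evicted]
        slab.items[key] = size
        self.where[key] = slab
        return evicted

    def get(self, key):
        slab = self.where.get(key)
        if slab is None:
            return None  # miss
        slab.items.move_to_end(key)  # mark as most recently used
        return slab.items[key]


if __name__ == "__main__":
    cache = ToySlabCache([SlabClass(80, 2), SlabClass(100, 2), SlabClass(128, 2)])
    for key in ("a", "b", "c"):
        print(key, "evicted:", cache.set(key, 80))
    print("free chunks in the 128-byte slab:", 2 - len(cache.classes[2].items))
```

Storing the third 80-byte item evicts the first one although the 100-byte and 128-byte slabs are completely unused, which is the situation I ran into.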

The problem is found, but a complete solution is not. My data requires a 100% hit rate, and all I can do is tune the slab growth factor and the page size to get as close to 100% as possible; that still does not guarantee 100%. (That sentence reads awkwardly; I should reflect on my own writing.) You might object that this is no solution, because your memcached server cannot be stopped. No matter, there is another way: memcached-tool and its move command. For example, move 3 1 moves one memory page from slab 3 to slab 1. What is that good for? Suppose slab 20 has very low utilization but many pages, say 200, which is 200 MB, while slab 2 frequently runs LRU because it clearly does not have enough pages. I can run move 20 2 to move one memory page from slab 20 to slab 2 and so use memory more efficiently. (Someone said: only one page at a time, how tedious! Ahuaxuan's answer: write a script and loop it.)

Some will say: no, my memcached data really cannot be lost. In that case, try Sina's memcachedb. I have not used it myself, but I suggest you give it a try. It is built on the memcache protocol and BerkeleyDB. (Writing this, I have to admire Danga: I think its biggest contribution is not the memcached server itself but the memcache protocol.) It is said to be used in many of Sina's applications, including Sina's blog.

One more note: the stats slabs command shows the slabs in memcached, while the stats command shows the general health of your memcached, such as the hit and miss counts. For example:


STAT pid 2232
STAT uptime 1348
STAT time 1218120955
STAT version 1.2.1
STAT pointer_size
STAT curr_items 0
STAT total_items 0
STAT bytes 0
STAT curr_connections 1
STAT total_connections 3
STAT connection_structures 2
STAT cmd_get 0
STAT cmd_set 0
STAT get_hits 0
STAT get_misses 0
STAT bytes_read
STAT bytes_written 16655
STAT limit_maxbytes 104857600

From the data above, this memcached process's hit rate looks excellent: get_misses is as low as 0. Why? Because I had just started the process and had only connected to it with telnet, so curr_connections is 1; total_items is 0 because I stored no data; get_hits is 0 because I never issued a get; and so get_misses is naturally 0 as well. In other words, the hit rate is 100%. Wishful thinking, of course.
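The joke above hides a real edge case: with no get requests at all, the hit rate is undefined rather than 100%. A small Python sketch, assuming the counters have already been read into a dictionary (the hit_rate function is my own helper, not a memcached API):

```python
def hit_rate(stats):
    """Return get_hits / (get_hits + get_misses), or None if no get was issued."""
    hits = stats["get_hits"]
    misses = stats["get_misses"]
    total = hits + misses
    if total == 0:
        return None  # no lookups yet, so there is nothing to measure
    return hits / total


if __name__ == "__main__":
    fresh = {"get_hits": 0, "get_misses": 0}  # the counters shown above
    print(hit_rate(fresh))
    print(hit_rate({"get_hits": 3, "get_misses": 1}))
```

Returning None for a fresh process keeps a monitoring script from reporting a perfect hit rate before any traffic has arrived.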

Time to summarize. From this article we can draw the following conclusions:
Conclusion one: memcached's LRU is not global; it operates per slab, so it can be called regional.
Conclusion two: to raise memcached's hit rate, estimate the size of your values and adjust the memory page size and growth factor accordingly.
Conclusion three: looking for answers with a question in mind works much better than just reading.


This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
