In-depth analysis of memcached server LRU

Source: Internet
Author: User
Today I visited javaeye and came across a very useful blog post. It mainly talked about the memcached hit-rate problem.

Memcached is the well-known remote distributed cache (if you don't know it, search javaeye, Google, or Baidu for it; Baidu's ranking is a bit too commercial, as the recent XX event shows, so a javaeye search is recommended). It is also very simple to use, it is deployed on many websites, and hardly any large website gets by without memcached.

I have also read quite a few articles about the internals of memcached and picked up a little each time, but I soon forgot what I read and never formed a deep understanding. Recently, however, I ran into a problem that forced me to get to know memcached all over again, so let me start with that problem.

Problem: I have tens of millions of records that are used frequently. To guarantee access speed they must be kept in memcached, but the data in my memcached keeps getting lost, while the business requires that it must not be lost. When my data was lost, the memcached server's memory was only 60% used; in other words, 40% of the memory was being badly wasted. Yet not every application behaves like this; other applications waste far less memory. So why is LRU executed when memory usage is only 60%? (I say LRU is executed because the records I stored first are always the ones that disappear, without ever being accessed in between. For example, the records I try to fetch are among the first million that were stored, but they have already been lost, while the logs show that record 3,000,000 was definitely put in.)

With these questions in mind, I went back and dug into memcached, starting with its memory model. We know that in C++ there are two ways to allocate memory: pre-allocation and dynamic allocation. Pre-allocating memory obviously makes a program faster, but its drawback is that it cannot use memory as efficiently, while dynamic allocation uses memory efficiently but slows the program down. memcached's memory allocation follows the same principle: to get more speed, we sometimes have to trade space for time.

That is to say, memcached allocates memory in advance. By the way, memcached manages this memory with a slab allocator. First, there are three concepts:
1 Slab
2 page
3 chunk
In general, a memcached process partitions its memory into a number of slabs in advance. Each slab holds several pages, and each page holds several chunks; if we treat these three things as objects, the relationships between them are one-to-many. The number of slabs is usually limited to a few, a dozen, or a few dozen, depending on how much memory the process is configured with. The default page size is 1 MB, so if a slab occupies 100 MB of memory, it owns 100 pages by default. The chunk is where the data is finally stored.
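To make the hierarchy concrete, here is a minimal Python sketch of that arithmetic, using memcached's default 1 MB page size, the 100 MB slab from this example, and the 80-byte chunk size of the first slab shown in the stats output below:

  # Slab -> page -> chunk arithmetic described above.
  PAGE_SIZE = 1024 * 1024            # memcached's default page size: 1 MB

  slab_memory = 100 * PAGE_SIZE      # a slab that occupies 100 MB
  chunk_size = 80                    # chunk size of slab class 1 (see stats below)

  pages_per_slab = slab_memory // PAGE_SIZE   # -> 100 pages
  chunks_per_page = PAGE_SIZE // chunk_size   # -> 13107 chunks in one page

  print(pages_per_slab, chunks_per_page)      # 100 13107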

For example, I start a memcached process that is allowed 100 MB of memory and run telnet localhost 11211. After connecting to memcached, I type stats slabs and press Enter, and data like the following is displayed:
  STAT 1:chunk_size 80
  STAT 1:chunks_per_page 13107
  STAT 1:total_pages 1
  STAT 1:total_chunks 13107
  STAT 1:used_chunks 13107
  STAT 1:free_chunks 0
  STAT 1:free_chunks_end 13107
  STAT 2:chunk_size 100
  STAT 2:chunks_per_page 10485
  STAT 2:total_pages 1
  STAT 2:total_chunks 10485
  STAT 2:used_chunks 10485
  STAT 2:free_chunks 0
  STAT 2:free_chunks_end 10485
  STAT 3:chunk_size 128
  STAT 3:chunks_per_page 8192
  STAT 3:total_pages 1
  STAT 3:total_chunks 8192
  STAT 3:used_chunks 8192
  STAT 3:free_chunks 0
  STAT 3:free_chunks_end 8192


The above is the detailed information of the first three slabs.
chunk_size is the size of the data-storage block; chunks_per_page is the number of chunks in one memory page; total_pages is the number of pages in the slab; total_chunks is the total number of chunks in the slab (= total_pages * chunks_per_page); used_chunks is the number of chunks already in use in the slab; free_chunks is the number of chunks in the slab that are still free.
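The telnet session above can also be reproduced programmatically. Below is a small sketch, assuming a memcached instance listening on localhost:11211, that sends the same stats slabs command over the plain-text memcached protocol with nothing but a socket:

  import socket

  def stats_slabs(host="127.0.0.1", port=11211):
      """Send 'stats slabs' and print the STAT lines, as in the telnet session."""
      with socket.create_connection((host, port)) as sock:
          sock.sendall(b"stats slabs\r\n")
          data = b""
          while not data.endswith(b"END\r\n"):   # the reply is terminated by END
              part = sock.recv(4096)
              if not part:
                  break
              data += part
      for line in data.decode().splitlines():
          if line.startswith("STAT"):
              print(line)                        # e.g. "STAT 1:chunk_size 80"

  if __name__ == "__main__":
      stats_slabs()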

From the example above, slab 1 has 1 MB of memory in total and it is already used up; slab 2 also has 1 MB and is used up as well; slab 3 is in the same situation. Looking at the chunk sizes of the three slabs, the first is 80 B, the second 100 B, and the third 128 B; each is roughly 1.25 times the previous one. We can control this growth by passing the -f parameter when starting the process. For example, -f 1.1 sets the growth factor to 1.1, so if the chunk size of the first slab is 80 B, the chunk size of the second slab will be about 80 * 1.1 = 88 B.
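Here is a minimal sketch of this progression, using the 80-byte base size from the example (real memcached also rounds chunk sizes up for alignment, which is why the third class is 128 B rather than exactly 125 B):

  def chunk_sizes(base=80.0, factor=1.25, classes=5):
      """Approximate chunk sizes of successive slab classes (ignoring alignment)."""
      sizes, size = [], base
      for _ in range(classes):
          sizes.append(int(size))
          size *= factor
      return sizes

  print(chunk_sizes())                  # -> [80, 100, 125, 156, 195]
  print(chunk_sizes(factor=1.1)[:2])    # -> [80, 88]   (the -f 1.1 case above)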

Having explained all this, the cause of the problem may already be clear. If not, here is the key sentence: the place where memcached stores a new value is determined by the size of that value; the value is always stored in the slab whose chunk size is closest to it (and large enough to hold it). In the example above, if my values are 80 B, then all of them will always be stored in slab 1, and free_chunks in slab 1 is already 0. What does memcached do then? Unless you started memcached with -M (which disables LRU; with it, an out-of-memory error is returned when memory runs out), memcached evicts the data in the least recently used chunk of this slab and then stores the newest data there. This explains why LRU is executed while 40% of my memory is still free: the chunk_size of my other slabs is much larger than my values, so my values never go into those slabs; they only go into the slab whose chunks are closest to my value size, and that slab is already full. As a result my data keeps being overwritten, the later records replacing the earlier ones.
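The placement rule can be illustrated with a simplified Python sketch (an illustration of the rule described above, not memcached's real code), using the chunk sizes from the example:

  # Slab class -> chunk size, taken from the example above.
  CHUNK_SIZES = {1: 80, 2: 100, 3: 128}

  def pick_slab(value_size):
      """Return the slab class with the smallest chunk that still fits the value."""
      fitting = [cls for cls, size in CHUNK_SIZES.items() if size >= value_size]
      if not fitting:
          raise ValueError("value larger than the biggest chunk")
      return min(fitting, key=lambda cls: CHUNK_SIZES[cls])

  print(pick_slab(80))    # -> 1: every 80-byte value competes for slab 1's chunks
  print(pick_slab(90))    # -> 2
  print(pick_slab(120))   # -> 3
  # Once slab 1 has no free chunks, a new 80-byte value evicts the least recently
  # used item inside slab 1, even if slabs 2 and 3 still have plenty of free space.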

The problem is found, but the solution still is not, because my data requires a 100% hit rate. All I can do is adjust the slab growth factor and the page size to push the hit rate as close to 100% as possible, but that still cannot guarantee 100% (is that sentence hard to follow? I should work on my writing). If you say this is not a good solution because my memcached server cannot be stopped, it doesn't matter: there is another way, namely memcached-tool, which can execute the move command. For example, move 3 1 moves one memory page from slab 3 to slab 1. Someone asks: what is that good for? Suppose my slab 20 has a very low utilization rate but owns many pages, say 200 (that is 200 MB), while slab 2 keeps triggering LRU, obviously because it does not have enough pages. Then I can run move 20 2 to move a memory page from slab 20 to slab 2 and make more effective use of the memory. (Someone says it is troublesome to move only one page at a time; ahuaxuan says: write a script and loop it, as sketched below.)
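Here is a sketch of that "write a script and loop it" idea. It simply repeats the move command in the form the text describes; whether your memcached-tool build actually provides a move sub-command, and its exact syntax, are assumptions taken from the description above and should be verified locally:

  import subprocess

  SERVER = "localhost:11211"   # hypothetical server address
  PAGES_TO_MOVE = 50           # hypothetical number of pages to shift

  for _ in range(PAGES_TO_MOVE):
      # "move 20 2": move one page from slab 20 to slab 2, as described above.
      subprocess.run(["memcached-tool", SERVER, "move", "20", "2"], check=True)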

Some people will say: no, the data in my memcached really must not be lost. OK, then try Sina's memcachedb. I have not used it myself, but I suggest you give it a try. It combines the memcache protocol with BerkeleyDB (here I have to admire danga; I think its biggest contribution is not the memcached server itself but the memcache protocol). It is said to be used in many Sina applications, including Sina Blog.

In addition, the stats slabs command shows the slab situation inside memcached, while the stats command shows some of your memcached instance's health indicators, such as the hit rate. For example:

  STAT pid 2232
  STAT uptime 1348
  STAT time 1218120955
  STAT version 1.2.1
  STAT pointer_size 32
  STAT curr_items 0
  STAT total_items 0
  STAT bytes 0
  STAT curr_connections 1
  STAT total_connections 3
  STAT connection_structures 2
  STAT cmd_get 0
  STAT cmd_set 0
  STAT get_hits 0
  STAT get_misses 0
  STAT bytes_read 26
  STAT bytes_written 16655
  STAT limit_maxbytes 104857600
From this output we can see that this memcached process has an excellent hit rate: get_misses is all the way down to 0. So what is wrong? Nothing; I had simply telnetted into a process I had just started, so curr_connections is 1, total_items is 0 because I had not stored any data, get_hits is 0 because I had not called get, and get_misses is therefore, of course, also 0. In other words the hit rate is "100%", which is good only for daydreaming.
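The hit rate itself is not reported directly; it has to be derived as get_hits / (get_hits + get_misses). Below is a small sketch, under the same assumption of a memcached instance on localhost:11211, that parses the stats output and computes it, and that also shows why a freshly started instance with no gets has no meaningful hit rate at all:

  import socket

  def general_stats(host="127.0.0.1", port=11211):
      """Fetch the output of the plain 'stats' command as a dict."""
      with socket.create_connection((host, port)) as sock:
          sock.sendall(b"stats\r\n")
          data = b""
          while not data.endswith(b"END\r\n"):
              part = sock.recv(4096)
              if not part:
                  break
              data += part
      stats = {}
      for line in data.decode().splitlines():
          if line.startswith("STAT "):
              _, name, value = line.split(" ", 2)
              stats[name] = value
      return stats

  stats = general_stats()
  hits = int(stats.get("get_hits", 0))
  misses = int(stats.get("get_misses", 0))
  total = hits + misses
  if total:
      print(f"hit rate: {hits / total:.2%}")
  else:
      print("no gets yet, so the hit rate is meaningless")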

The conclusion is as follows:
Conclusion 1: memcached's LRU is not global but per slab; you could call it regional.
Conclusion 2: to increase the memcached hit rate, estimate the size of our values in advance and tune the page size and growth factor accordingly.
Conclusion 3: hunting for answers with a concrete question in mind is far more effective than just reading.
