Summary of the principles and hit rate of memcache (repost)


From: http://blog.csdn.net/hbzyaxiu520/article/details/19546969

1 What is Memcache?
Memcache is a project from danga.com, originally built to serve LiveJournal. Many heavily loaded sites around the world use this cache project to take pressure off their databases.
It can handle any number of connections and uses non-blocking network I/O. It works by setting aside a region of memory and building a hash table in it; memcached manages these hash tables itself.

Why are there two names, Memcache and memcached?
Memcache is the name of the project, while memcached is the file name of its server-side main program.

Memcache official website: http://www.danga.com/memcached
2 memcache Working principle
First of all, memcached runs as a daemon on one or more servers and accepts client connections at any time. Clients can be written in various languages; known client APIs currently include Perl, PHP, Python, Ruby, Java, C#, and C. After a client establishes a connection to the memcached service, the next step is to access objects. Each object has a unique identifier, its key, and all access goes through that key. Objects saved to memcached are kept in memory, not in cache files, which is why memcached can be so efficient and fast. Note that these objects are not persistent: the data is lost when the service stops.
Like many cache tools, the principle behind memcached is not complicated. It uses a client/server model: a service process is started on the server side, and at startup you can specify the listening IP, the port number, the amount of memory to use, and several other key parameters. Once started, the service is always available. The current version of memcached is implemented in C as a single-process, single-threaded, asynchronous I/O, event-based (event_based) service, using libevent for event notification. Multiple servers can work together, but they do not communicate with each other; each server simply manages its own data. The client addresses a server by IP (a domain name should work as well). Objects or data to be cached are saved on the server side as key->value pairs. The key is run through a hash function, and the value is sent to a specific server chosen according to the hash value. When the object needs to be retrieved, the same key is hashed again, the resulting value determines which server it was stored on, and the request is sent to that server. The client only needs to know which server the hash of the key maps to.
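The client-side server selection described above can be sketched as follows. This is a minimal modulo scheme for illustration only; the server addresses are made up, and real client libraries typically use consistent hashing instead, so that adding or removing a server remaps only a fraction of the keys.

```python
import zlib

def pick_server(key, servers):
    """Map a key to one server by hashing it (simple modulo scheme).

    Uses CRC32 rather than Python's built-in hash() so that the mapping
    is deterministic across processes and runs.
    """
    h = zlib.crc32(key.encode("utf-8"))
    return servers[h % len(servers)]

# Hypothetical server pool for illustration.
servers = ["10.0.0.1:11211", "10.0.0.2:11211", "10.0.0.3:11211"]
print(pick_server("user:42", servers))  # same key -> always the same server
```

Because every client computes the same hash, all clients agree on where a given key lives without the servers ever talking to each other.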

In the final analysis, memcache's job is to maintain a huge hash table in the memory of a dedicated machine to store data that is read and written frequently, which greatly improves the efficiency of a site.

3 Memcache hit rate

First, take a look at the server statistics.

On Linux or Windows, connect to memcache with telnet and run the stats command. The fields it returns are described below:

pid
Process ID of the memcache server

uptime
Number of seconds the server has been running

time
The server's current UNIX timestamp

version
Memcache version

pointer_size
Pointer size on the current operating system (usually 32 on a 32-bit system)

rusage_user
Cumulative user CPU time of the process

rusage_system
Cumulative system CPU time of the process

curr_items
Number of items currently stored by the server

total_items
Total number of items stored since the server started

bytes
Number of bytes occupied by the items currently stored

curr_connections
Number of connections currently open

total_connections
Number of connections opened since the server started

connection_structures
Number of connection structures allocated by the server

cmd_get
Total number of get requests

cmd_set
Total number of set requests

get_hits
Total number of hits

get_misses
Total number of misses

evictions
Number of items removed to free memory (once the memory allocated to memcache is full, old items must be evicted to make room for new ones)

bytes_read
Total bytes read from the network (request bytes)

bytes_written
Total bytes written to the network (response bytes)

limit_maxbytes
Amount of memory allocated to memcache, in bytes

threads
Current number of threads

First, cache hit ratio = get_hits / cmd_get * 100%.
Second, get_misses plus get_hits should equal cmd_get.
Third, total_items == cmd_set == get_misses holds when every miss is followed by a set (the typical cache-aside pattern); once the maximum available memory is exhausted and memcached starts evicting items, the equation no longer holds.
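The hit-rate arithmetic above is easy to automate. Here is a small sketch that parses `STAT name value` lines (the format the stats command returns) and computes the hit ratio; the sample numbers are made up for illustration.

```python
def parse_stats(raw):
    """Parse 'STAT name value' lines into a dict of integer stats."""
    stats = {}
    for line in raw.strip().splitlines():
        parts = line.split()
        if len(parts) == 3 and parts[0] == "STAT":
            try:
                stats[parts[1]] = int(parts[2])
            except ValueError:
                pass  # skip non-numeric stats such as version
    return stats

def hit_rate(stats):
    """Cache hit ratio = get_hits / cmd_get; 0.0 if no gets were issued."""
    if stats.get("cmd_get", 0) == 0:
        return 0.0
    return stats["get_hits"] / stats["cmd_get"]

# Hypothetical counters for illustration.
raw = """STAT cmd_get 1000
STAT get_hits 950
STAT get_misses 50"""
s = parse_stats(raw)
print(f"{hit_rate(s):.1%}")  # 95.0%
```

Note the second rule from above also holds for these numbers: get_hits + get_misses == cmd_get.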

The memcached-tool script ships with the source at memcached/scripts/memcached-tool.

Memcached, as everyone knows, is a remote distributed cache (if you don't know it, look it up on javaeye, or Google it, or Baidu it; though given how commercial Baidu's ranking has become, as a recent incident showed, I'd recommend javaeye). It is very simple to use and is deployed on a great many sites; very few large sites do without memcached.

I had previously read many articles analyzing memcached's internals, gained a little from them, but forgot most of it afterward and never formed a deep understanding. Recently, though, I ran into a problem that forced me to re-learn memcache. Below I describe the problem I encountered.

Problem: I have tens of millions of records that are used frequently, so they must go into memcached to guarantee access speed. But my memcached data kept getting lost, while the business required that it must not be lost. When data was being lost, the memcached server's memory was only 60% used; in other words, 40% of the memory was being seriously wasted. Yet not every application behaved this way; others wasted far less memory. So why was LRU eviction running when only 60% of the memory was in use? (I knew LRU was running because the records I stored first kept disappearing before they had even been accessed: on the first pass I had only read up to record 10,000,000, yet records before 3,000,000 were already gone, and the logs confirmed that record 3,000,000 had definitely been stored.)

With these doubts I began to re-examine memcached, starting with its memory model. As we know, C/C++ allocates memory in two ways: pre-allocation and dynamic allocation. Pre-allocating memory makes a program faster, but at the cost of less efficient memory use; dynamic allocation uses memory efficiently, but makes the program run slower. Memcached's memory allocation follows the first approach: to get more speed, we sometimes have to trade space for time.

That is to say, memcached pre-allocates memory. The component in memcached that allocates memory is called the allocator, and it involves three concepts:
1 slab
2 page
3 chunk
To explain: in general, a memcached process divides its memory in advance into a number of slabs; each slab holds a number of pages, and each page holds a number of chunks. If we think of these three as objects, they form two one-to-many relationships. The number of slabs is usually limited: a few, a dozen, or a few dozen, depending on how much memory the process is configured with. Each page under a slab is 1 MB by default, so if a slab occupies 100 MB of memory, it holds 100 pages by default. A chunk is where our data is ultimately stored.
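The page/chunk relationship is simple division: with a 1 MB page, the number of chunks per page is just the page size divided by the chunk size. A quick sketch:

```python
PAGE_SIZE = 1024 * 1024  # default memcached page size: 1 MB

def chunks_per_page(chunk_size):
    """How many chunks of the given size fit in one 1 MB page."""
    return PAGE_SIZE // chunk_size

# These match the chunks_per_page values in the stats slabs
# example later in the article.
print(chunks_per_page(80))   # 13107
print(chunks_per_page(100))  # 10485
print(chunks_per_page(128))  # 8192
```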

For example, I start a memcached process that uses 100 MB of memory, then open telnet (telnet localhost 11211). After connecting to memcache, I type stats slabs and press Enter, and the following data appears:

  STAT 1:chunk_size 80
  STAT 1:chunks_per_page 13107
  STAT 1:total_pages 1
  STAT 1:total_chunks 13107
  STAT 1:used_chunks 13107
  STAT 1:free_chunks 0
  STAT 1:free_chunks_end 13107
  STAT 2:chunk_size 100
  STAT 2:chunks_per_page 10485
  STAT 2:total_pages 1
  STAT 2:total_chunks 10485
  STAT 2:used_chunks 10485
  STAT 2:free_chunks 0
  STAT 2:free_chunks_end 10485
  STAT 3:chunk_size 128
  STAT 3:chunks_per_page 8192
  STAT 3:total_pages 1
  STAT 3:total_chunks 8192
  STAT 3:used_chunks 8192
  STAT 3:free_chunks 0
  STAT 3:free_chunks_end 8192

These are the details of the first three slabs.
chunk_size is the size of a data storage block; chunks_per_page is the number of chunks in one page of memory; total_pages is the number of pages in the slab; total_chunks is the total number of chunks in the slab (= total_pages * chunks_per_page); used_chunks is the number of chunks already in use in the slab; free_chunks is the number of chunks in the slab still available for use.

From the example above, slab 1 has 1 MB of memory in total, which is now used up; slab 2 also has 1 MB, also used up; and the same goes for slab 3. The chunk sizes of these three slabs show that the first chunk size is 80 B, the second 100 B, and the third 128 B: each is roughly 1.25 times the previous one. This growth is under our control: we can change it with the -f parameter when starting the process. For example, -f 1.1 means a growth factor of 1.1, so if the chunks in the first slab are 80 B, the chunks in the second slab should be about 80 * 1.1 ≈ 88 B.
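The growth-factor series can be sketched as below. This is a simplification: the real server also rounds chunk sizes up (for item headers and alignment), which is why the example above shows 128 rather than a plain 100 * 1.25 = 125.

```python
def chunk_sizes(first=80, factor=1.25, n=6):
    """Idealized slab chunk-size series: each class is the previous
    size times the growth factor, truncated to an integer.
    (The real server additionally rounds sizes up, so actual values
    such as 128 can differ slightly from this model.)"""
    sizes = [first]
    for _ in range(n - 1):
        sizes.append(int(sizes[-1] * factor))
    return sizes

print(chunk_sizes(80, 1.25, 4))  # [80, 100, 125, 156]
print(chunk_sizes(80, 1.1, 2))   # [80, 88]  <- the -f 1.1 example above
```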

With this much explained, you can probably already see the cause of my problem. If not, here is the key: when memcached stores a new value, the storage location is determined by the size of the value; the value is always stored in the slab whose chunk size is closest to it (but no smaller). In the example above, if my value is 80 B, all of my values will always be stored in slab 1. But slab 1's free_chunks is 0, so what happens? Unless you started memcached with -M (which disables LRU and returns an out-of-memory error in this situation), memcached clears the data in the least recently used chunk of that slab and stores the newest data there. This explains why LRU was running even though 40% of my memory was still free: the chunk_size in every other slab was much larger than my values, so my values could not be placed in those slabs. They would only ever go into the slab whose chunks were closest to my value size, and those slabs were already completely full. As a result my data kept being overwritten, the later records replacing the earlier ones.
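The slab-selection rule just described can be sketched in a few lines: an item always goes into the smallest slab class whose chunks can hold it, regardless of how much free space the other classes have.

```python
def pick_slab(value_size, chunk_sizes):
    """Return the smallest chunk size that can hold the value, i.e.
    the slab class the item would be stored in (None if the value
    is larger than every class)."""
    for size in sorted(chunk_sizes):
        if value_size <= size:
            return size
    return None  # too large for any slab class

classes = [80, 100, 128]  # the three slab classes from the example
print(pick_slab(80, classes))   # 80  -> slab 1, even if slab 2 is empty
print(pick_slab(90, classes))   # 100 -> slab 2
print(pick_slab(200, classes))  # None
```

This is exactly why 80 B values keep evicting each other in slab 1 while 40% of total memory sits idle in other slabs.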

The problem was found, but the solution was not obvious. Since my data requires a 100% hit rate, all I can do is adjust the slab growth factor and the page size to push the hit rate close to 100%, but that still does not guarantee 100%. If you object that this is no solution because the memcached server cannot be stopped, there is another way: memcached-tool. Execute its move command, for example move 3 1, which moves one page of memory from slab 3 to slab 1. What is that good for? Suppose slab 20 has very low utilization but many pages, say 200 (that is, 200 MB), while slab 2 is constantly running LRU because it clearly does not have enough pages. I can then run move 20 2 to move one page of memory from slab 20 to slab 2, using memory more efficiently. (Someone objects: only one page at a time, isn't that tedious? Ahuaxuan says: write a script and loop it.)

Some people say: but my memcache data must not be lost at all! OK, then try Sina's memcachedb. I have not used it myself, but I suggest you give it a try. It implements the memcache protocol on top of BerkeleyDB (writing this, I have to admire danga: I think its greatest contribution is not the memcache server itself but the memcache protocol). It is said to be used in many of Sina's applications, including Sina's blog.

To add: the stats slabs command shows the slabs in memcached, while the stats command shows your memcached's overall health, such as the hit rate. For example:

  STAT pid 2232
  STAT uptime 1348
  STAT time 1218120955
  STAT version 1.2.1
  STAT pointer_size 32
  STAT curr_items 0
  STAT total_items 0
  STAT bytes 0
  STAT curr_connections 1
  STAT total_connections 3
  STAT connection_structures 2
  STAT cmd_get 0
  STAT cmd_set 0
  STAT get_hits 0
  STAT get_misses 0
  STAT bytes_read 26
  STAT bytes_written 16655
  STAT limit_maxbytes 104857600


From the data above you can see that this memcached process has an excellent hit rate: get_misses is all the way down at 0. How come? Because I had only just started the process and had only connected to it once with telnet, which is why curr_connections is 1 and total_items is 0 (I had not stored any data). get_hits is 0 because I had not called get, so naturally the misses are 0 too; in other words, the hit rate is 100%. Time to stop daydreaming.

Time to summarize. From this article we can draw the following conclusions:
Conclusion one: memcached's LRU is not global but per-slab; you could call it regional.
Conclusion two: to increase memcached's hit rate, estimate the size of your values and adjust the memory page size and growth factor accordingly.
Conclusion three: finding answers with concrete questions in mind beats just reading.

Reposted from: http://www.javaeye.com/topic/225692

