Memcache Introduction and Features

Source: Internet
Author: User
Tags: memcached

1. Memcached Introduction

1.1 What is memcached?

Memcached is software developed by Brad Fitzpatrick of Danga Interactive for LiveJournal. It has become an important component for improving the scalability of Web applications at many services, including Mixi, Hatena, Facebook, Vox, and LiveJournal. Many Web applications save data to an RDBMS, from which the application server reads the data and renders it in the browser. However, as data volume grows and access becomes concentrated, the burden on the RDBMS increases, database responses deteriorate, and site display is delayed. This is where memcached comes in. Memcached is a high-performance distributed memory cache server. It is generally used to cache database query results in order to reduce the number of database accesses, thereby improving the speed and scalability of dynamic Web applications.

As a high-speed distributed cache server, memcached has the following characteristics:
    • Simple protocol: memcached client-server communication does not use a complex format such as XML, but rather a simple text-based protocol.
    • Event handling based on libevent: libevent is a library that wraps the event-processing facilities of Linux (epoll), BSD-like operating systems (kqueue), and others behind a unified interface. Because memcached uses libevent, it achieves high performance on Linux, BSD, Solaris, and other operating systems.
    • Built-in memory storage: to improve performance, data saved in memcached is kept in memcached's built-in in-memory storage. Since the data exists only in memory, restarting memcached or restarting the operating system causes all data to disappear. In addition, once the cache reaches its specified capacity, memcached automatically removes unused entries. Studying memcached begins with its memory model: in C/C++ there are two ways to allocate memory, pre-allocation and dynamic allocation. Pre-allocation makes the program faster, but its disadvantage is that memory cannot be used efficiently; dynamic allocation uses memory efficiently, but reduces the program's speed. Memcached's memory allocation follows this principle: to achieve higher speed, it sometimes has to trade space for time.
    • Two-stage hashing: memcached's high performance stems from a two-stage hash structure. Memcached behaves like a huge hash table storing key-value pairs: given a key, you can store or query arbitrary data. A client can store data on more than one memcached server. When storing or querying data, the client first computes a hash of the key (stage one) to select a node and sends the request to that node; the memcached node then uses its own internal hash algorithm (stage two) to find the actual data (item) and return it to the client. From an implementation standpoint, memcached is a non-blocking, event-driven server program.
    • Non-communicating distribution: although memcached is a "distributed" cache server, there is no distributed functionality on the server side. Memcached instances do not communicate with each other to share information; the distribution is implemented entirely by the client.
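The two-stage hash can be sketched in a few lines. This is an illustrative client-side model, not memcached's actual client code; the server addresses are hypothetical, and MD5 stands in for whatever hash a real client library uses. Stage one happens on the client; stage two is the hash-table lookup inside the chosen memcached process.

```python
# Sketch of stage-one hashing: the client hashes the key to pick a node.
# Stage two (locating the item) happens inside the selected memcached server.
import hashlib

servers = ["10.0.0.1:11211", "10.0.0.2:11211", "10.0.0.3:11211"]  # hypothetical

def pick_server(key: str) -> str:
    """Stage one: hash the key on the client and map it to a server."""
    h = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return servers[h % len(servers)]

# Every client using the same algorithm routes "user:42" to the same node.
node = pick_server("user:42")
```

Because the mapping depends only on the key and the server list, any client that uses the same algorithm reaches the same node for the same key.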

As you can see, memcached's distribution is implemented entirely by the client library. This distributed design is memcached's biggest feature.

Memcached Memory Management

Recent versions of memcached use, by default, a mechanism called slab allocation to allocate and manage memory. Before this mechanism was introduced, memory was allocated simply by calling malloc and free for every record. That approach, however, leads to memory fragmentation and increases the burden on the operating system's memory manager.

The basic principle of the slab allocator is to divide the allocated memory into blocks of predetermined sizes, which largely avoids the memory fragmentation problem. The principle of slab allocation is quite simple: divide the allocated memory into blocks (chunks) of various sizes, and group chunks of the same size together (a set of same-sized chunks forms a slab class).
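A rough sketch of how the ladder of chunk sizes could be derived. The base size and the growth factor of 1.25 mirror memcached's defaults (the `-f` option), but the exact sizes in a real memcached differ because of alignment and per-item header overhead, so treat this as an approximation.

```python
# Illustrative slab-class sizing: chunk sizes grow by a fixed factor
# up to the 1 MB page size; an item goes into the smallest chunk that fits.
def slab_class_sizes(base=88, factor=1.25, page=1024 * 1024):
    sizes = []
    size = base
    while size < page:
        sizes.append(int(size))
        size *= factor
    sizes.append(page)
    return sizes

def class_for(item_size, sizes):
    """Pick the smallest chunk size that can hold the item."""
    for s in sizes:
        if item_size <= s:
            return s
    raise ValueError("item larger than page size")
```

Note the built-in waste this implies: a 100-byte item stored in a 110-byte chunk leaves 10 bytes unused, which is the space-for-speed trade-off mentioned earlier.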



The slab allocator also has the goal of reusing allocated memory. In other words, allocated memory is not freed, but reused.

Main terms of Slab Allocation

    • Page: The memory space allocated to a slab, 1 MB by default. After being assigned to a slab class, the page is divided into chunks of that class's size.
    • Chunk: The memory space used to cache a record.
    • Slab class: A group of chunks of a specific size.


The principle of caching records in slab

Depending on the size of the data it receives, memcached selects the slab class whose chunk size best fits the data (Figure 2). Memcached keeps a list of free chunks for each slab class, selects a chunk from that list, and caches the data in it.



Efficient use of resources when memcached deletes data

Data does not actually disappear from memcached when it is deleted; memcached does not free the allocated memory. After a record times out, the client can no longer see it (it becomes invisible/transparent), and its storage space can be reused.

Lazy expiration: memcached does not internally monitor whether records have expired. Instead, it looks at a record's timestamp on get and checks whether the record has expired. This technique is called lazy expiration, and it means memcached spends no CPU time monitoring for expiration.
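Lazy expiration can be sketched with a plain dict standing in for memcached's store. The point is structural: there is no background sweeper; expiry is only ever checked inside the read path.

```python
# Minimal lazy-expiration sketch: expiry is checked only at get time,
# so no CPU is spent scanning for expired records in the background.
import time

store = {}  # key -> (value, expires_at)

def set_item(key, value, ttl):
    store[key] = (value, time.time() + ttl)

def get_item(key):
    entry = store.get(key)
    if entry is None:
        return None
    value, expires_at = entry
    if time.time() >= expires_at:   # checked lazily, on read
        del store[key]              # the space becomes reusable
        return None
    return value
```

An expired record thus lingers in memory until the next get (or until its space is reclaimed), exactly as the paragraph above describes.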

When the cache is full, deletion must weigh several mechanisms: on the one hand a queue-based (FIFO-like) mechanism, and on the other the priority of the cached objects themselves, deleting objects according to their priority.

LRU: The principle of effectively deleting data from the cache

Memcached prefers to reuse the space of records that have already timed out. Even so, there are cases where there is not enough space when a new record is added; memcached then uses the Least Recently Used (LRU) mechanism to allocate space. This is a mechanism that deletes the least recently used records. So when memcached runs short of memory (it cannot obtain new space from the slab class), it searches for the least recently used records and assigns their space to the new record.
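The eviction principle can be shown with an OrderedDict-based LRU cache. Memcached's real LRU is maintained per slab class and is more elaborate, but the core idea is the same: reads refresh an entry's recency, and when capacity is reached the least recently used entry is dropped.

```python
# Minimal LRU sketch: OrderedDict keeps insertion/access order, so the
# front of the dict is always the least recently used entry.
from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.data = OrderedDict()

    def get(self, key):
        if key not in self.data:
            return None
        self.data.move_to_end(key)         # mark as recently used
        return self.data[key]

    def set(self, key, value):
        if key in self.data:
            self.data.move_to_end(key)
        self.data[key] = value
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)  # evict least recently used
```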

memcached Distributed

Although Memcached is called a "distributed" cache server, there is no "distributed" functionality on the server side. The distribution of memcached is completely client-implemented. Now let's take a look at how memcached implements distributed caching.

For example, suppose there are three memcached servers, node1 to node3, and the application wants to save data under the keys "Tokyo", "Kanagawa", "Chiba", "Saitama", and "Gunma".

First, "Tokyo" is added to memcached. When "Tokyo" is passed to the client library, the algorithm implemented in the client determines, based on the key, which memcached server will hold the data. Once the server is selected, the client commands it to save "Tokyo" and its value.

Similarly, for "Kanagawa", "Chiba", "Saitama", and "Gunma", a server is first selected and then the data is saved.

Next, the saved data is retrieved. The key to retrieve, "Tokyo", is also passed to the library, which selects a server using the same algorithm as when the data was saved. Because the algorithm is the same, the selected server is the same one the data was saved to, and a get command is sent to it. As long as the data has not been deleted for some reason, the saved value is returned.

In this way, memcached is distributed by saving different keys on different servers. As the number of memcached servers grows, keys are spread across them, and even if one memcached server fails and cannot be reached, the other caches are unaffected and the system keeps running.

Memcached cache distribution policy: http://blog.csdn.net/bintime/article/details/6259133

Remainder distribution algorithm

This algorithm "scatters keys according to the remainder of dividing by the number of servers": compute an integer hash of the key, divide it by the number of servers, and select the server based on the remainder.

Disadvantages of the remainder algorithm

The remainder method is simple to compute and disperses data well, but it has a drawback: when servers are added or removed, the cost of reorganizing the cache is enormous. When a server is added, the remainders change, so most keys no longer map to the server they were saved on, which drastically lowers the cache hit rate.
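A quick experiment (illustrative, not taken from the article) makes the drawback concrete: going from 3 to 4 servers under remainder distribution remaps roughly three quarters of the keys, since a key keeps its server only when its hash gives the same result mod 3 and mod 4.

```python
# Measure how many keys change servers when a 4th server is added
# under simple remainder (modulo) distribution.
import hashlib

def server_index(key, n_servers):
    h = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return h % n_servers

keys = [f"key-{i}" for i in range(10000)]
moved = sum(1 for k in keys if server_index(k, 3) != server_index(k, 4))
moved_ratio = moved / len(keys)   # typically around 0.75 for 3 -> 4 servers
```

Every moved key is a guaranteed cache miss right after the topology change, which is exactly the hit-rate collapse described above.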

consistent hashing simple description

Consistent hashing works as follows: first, compute the hash value of each memcached server (node) and place it on a circle of 0 to 2^32 (the continuum). Then, using the same method, compute the hash of the key for the data to be stored and map it onto the circle. Starting from the point where the data maps, search clockwise and save the data to the first server found. If no server is found before passing 2^32, the data is saved to the first memcached server on the circle.



Now add a memcached server to this arrangement. With the remainder algorithm, the cache hit rate suffers because the servers holding the keys change drastically; with consistent hashing, only the keys on the first server counter-clockwise from the newly added server's position on the continuum are affected.



Therefore, consistent hashing minimizes the redistribution of keys. Moreover, some consistent hashing implementations adopt the idea of virtual nodes. With an ordinary hash function, the servers' positions on the circle can be distributed very unevenly. With virtual nodes, each physical node (server) is assigned 100 to 200 points on the continuum. This suppresses uneven distribution and minimizes cache redistribution when servers are added or removed.
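The ring with virtual nodes can be sketched compactly. This is an assumption-laden illustration: the node names are hypothetical, MD5 is used as the hash, and 100 replicas per node follows the article's 100~200 suggestion; real libraries differ in hash choice and replica placement. `bisect` finds the first point clockwise from the key's hash, and the modulo wraps past 2^32 back to the first point on the circle.

```python
# Consistent hashing sketch with virtual nodes: each server contributes
# many points on the ring; a key belongs to the first point clockwise.
import bisect
import hashlib

def _hash(s: str) -> int:
    return int(hashlib.md5(s.encode()).hexdigest(), 16)

class HashRing:
    def __init__(self, nodes, replicas=100):
        self.ring = sorted(
            (_hash(f"{node}#{i}"), node)
            for node in nodes
            for i in range(replicas)
        )
        self.points = [p for p, _ in self.ring]

    def get_node(self, key):
        # bisect finds the first ring point >= the key's hash;
        # the modulo wraps around to the start of the circle.
        idx = bisect.bisect(self.points, _hash(key)) % len(self.ring)
        return self.ring[idx][1]

ring = HashRing(["node1", "node2", "node3"])
owner = ring.get_node("Tokyo")
```

Adding a fourth node to this ring remaps only the keys that fall between the new node's points and their counter-clockwise neighbors, so most keys keep their server.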

Cache multiple Replicas

Multiple cache replicas are primarily used to store several copies of data when it is cached, to guard against cache invalidation. Cache invalidation occurs in the following situations:

    • 1. The cache times out and is removed (normal expiration)
    • 2. The cache is removed because of storage space limits (abnormal invalidation)
    • 3. The cache is invalidated because cache nodes change (abnormal invalidation)

With multiple replicas, the distributed placement policy of the cache needs to be reconsidered. A second benefit of multiple replicas is that several nodes can serve reads, enabling distributed parallel reads; that, however, is a separate issue to consider.

Consistency issues with cached data

Cached data should be as read-only as possible; the cache itself is not suited to scenarios with a large number of write and update operations. When data that is mostly read does change, there are two options: update both the cache and the database, or directly invalidate the cached data.
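The two update strategies can be sketched side by side. This is a hedged illustration using plain dicts as stand-ins for memcached and the database; real code would use a client library and handle concurrency.

```python
# Two cache-consistency strategies on data change, with dicts standing in
# for the database and the memcached cache.
db = {}
cache = {}

def update_write_through(key, value):
    """Strategy 1: update the database and the cache together."""
    db[key] = value
    cache[key] = value

def update_invalidate(key, value):
    """Strategy 2: update the database and invalidate the cached copy."""
    db[key] = value
    cache.pop(key, None)   # the next read misses and reloads from db

def read(key):
    if key in cache:
        return cache[key]
    value = db.get(key)
    if value is not None:
        cache[key] = value  # repopulate on miss (cache-aside read)
    return value
```

Invalidation is the simpler strategy: it never risks the cache and database diverging, at the cost of one extra miss per update.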

Memcache hit rate

Cache hit ratio = get_hits / cmd_get * 100% (total hits / total get requests)

To increase memcached's hit rate, estimate the typical size of your values and adjust the memory page size and growth factor appropriately.
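The ratio above can be computed from memcached's stats counters. `get_hits` and `cmd_get` are real stat names returned by the `stats` command; the stats dict below is a hypothetical snapshot rather than output from a live server.

```python
# Compute the cache hit ratio from a snapshot of memcached's stats counters.
def hit_ratio(stats):
    cmd_get = int(stats["cmd_get"])
    if cmd_get == 0:
        return 0.0          # no get requests yet
    return int(stats["get_hits"]) / cmd_get * 100

stats = {"cmd_get": "1000", "get_hits": "870"}   # hypothetical snapshot
ratio = hit_ratio(stats)                         # 87.0
```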

The hit rate can be increased in several ways.

First, increase the amount of memory available to the service.

Second, increase space utilization, which is effectively another way of increasing the available memory.

Third, apply another caching level on top of the LRU.

Fourth, for the overall hit rate, adopt an effective redundancy strategy to reduce the impact when one server in the distributed service experiences jitter.

Some notes

1. Memory that memcache has already allocated is never actively released.

2. A memory page that memcache has assigned to one slab can no longer be assigned to another slab.

3. flush_all cannot reset the layout of memcache's allocated memory pages; it merely expires all items.

4. The maximum size of a stored item (key + value) is 1 MB, which is limited by the 1 MB page size.

5. Because memcached's distribution is implemented by the client program hashing the keys, different languages may use different hash algorithms, and different client programs in the same language may also use different methods. So when multiple languages or modules share the same set of memcached servers, take care to select the same hash algorithm on every client.

6. You can disable LRU eviction with the -M parameter when starting memcached; add and set will then return failure when memory runs out.

7. The amount of data storage specified when starting memcached does not include the memory memcached itself occupies, nor the administrative space set up to manage the saved data. It therefore consumes more memory than the allocation specified at startup, which requires attention.

8. Memcache limits the length of keys; in both PHP and C the maximum length is 250 bytes.

