Memcache Introduction & Memory allocation mechanism


There have long been many opinions about what exactly should be stored in memcached. Some say to use the MD5 hash of the SQL statement as the key and the query result set as the value; others say the data should be organized according to business logic. After nearly two years of practice, I still feel that data should be stored according to business logic, and I cannot bring myself to store result sets under hashed keys. Doing so makes it practically impossible to update the data in a timely fashion; you can only rely on the memcached expiration time. Static informational data is well suited to caching, but such sites generally generate static pages anyway, so memcached cannot play a big role there. The real niche is community-oriented sites, where most of the data is dynamic and the performance requirements are high, so memcached is a better fit (apparently memcached was originally created to solve exactly this problem).

References: "Baidu Library: memcached principle and use in detail"; "Baidu Library: memcached builds a distributed cache" (by caching database query results, the number of database accesses is reduced); "Distributed cache in detail: memcached and distributed implementation methods".

Summary: a distributed cache is designed around the following considerations: first, horizontal linear scaling of the cache itself; second, the cache's own performance under heavy concurrency; and third, avoiding a single point of failure in the cache (multiple replicas and replica consistency). The core techniques of a distributed cache include the management of memory itself (allocation, management, and reclamation), distributed management and distributed algorithms, and the management and routing of cache keys.

Many people use memcached as a storage medium of the same kind as shared memory. Although memcached organizes data in the same "key => value" form, it differs greatly from local caches such as shared memory and APC. Memcached is distributed, meaning it is not local: it provides its service over a network connection (of course it can also use localhost), and it is an application-independent program, or daemon (it runs in daemon mode).

Memcached uses the libevent library to implement its network connection service, so in theory it can handle an unlimited number of connections. But unlike Apache, it usually holds stable, persistent connections, so its actual concurrency is limited. Conservatively, the maximum number of simultaneous connections for memcached is about 200; this is related to Linux's threading capability and can be adjusted. Refer to the relevant libevent documentation.

Memcached's memory usage also differs from APC's. APC is based on shared memory and mmap; memcached has its own memory allocation algorithm and management method, which has nothing to do with shared memory and is free of shared memory's limits. Typically, each memcached process can manage 2GB of memory space; if more space is required, the number of processes can be increased.

"Use": memcached is in many cases used as a front-end cache for a database. Because it costs far less than SQL parsing and disk operations, and it manages data entirely in memory, it can deliver better performance than reading the database directly. In large systems, the same data is accessed very frequently, so memcached can greatly reduce database pressure and improve the efficiency of the system.
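To make the front-end cache usage concrete, here is a minimal cache-aside sketch in Python. It assumes the pymemcache client library and a hypothetical query_database() helper; the key format and the 60-second expiry are illustrative, not from the article.

```python
import json
from pymemcache.client.base import Client

client = Client(("localhost", 11211))

def query_database(sql):
    # Placeholder for a real database call.
    raise NotImplementedError

def get_user(user_id):
    key = "user:%d" % user_id          # hypothetical key scheme
    cached = client.get(key)
    if cached is not None:
        return json.loads(cached)      # cache hit: skip the database
    row = query_database("SELECT ... WHERE id = %d" % user_id)
    client.set(key, json.dumps(row), expire=60)  # cache miss: fill the cache
    return row
```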
In addition, memcached is often used as a storage medium for sharing data between servers. For example, in a single-sign-on (SSO) system, session data can be saved in memcached and shared by multiple applications.

"Memcached Features"

(1) Simple protocol: client-server communication in memcached does not use a complex format such as XML, but a simple text-based protocol (a sample session is shown after this list).

(2) Event handling based on libevent: libevent is a library that wraps event-handling facilities such as Linux's epoll and the kqueue of BSD-like operating systems into a unified interface. Because memcached uses libevent, it can deliver its high performance on Linux, BSD, Solaris, and other operating systems.

(3) Built-in memory storage: to improve performance, the data saved in memcached is stored in memcached's built-in memory storage space. Since the data exists only in memory, restarting memcached or restarting the operating system causes all data to disappear. In addition, once the content reaches the specified capacity, memcached automatically deletes unused cache entries.

(4) Non-communicating distribution: although memcached is a "distributed" cache server, there is no distributed functionality on the server side. Memcached instances do not communicate with each other to share information; the distribution is implemented entirely by the client.
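To illustrate feature (1), this is roughly what a raw session with a memcached server over its standard text protocol looks like (for example via telnet): a "set <key> <flags> <exptime> <bytes>" command followed by the payload, then a "get":

```
set greeting 0 900 5
hello
STORED
get greeting
VALUE greeting 0 5
hello
END
```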
"Memcached Distributed"

Although memcached is called a "distributed" cache server, there is no "distributed" functionality on the server side; the distribution is implemented entirely by the client. Let's take a look at how memcached implements distributed caching.

For example, suppose there are three memcached servers, node1 through node3, and the application wants to save data under the keys "Tokyo", "Kanagawa", "Chiba", "Saitama", and "Gunma".

First add "Tokyo" to the memcached. When "Tokyo" is passed to the client library, the client-implemented algorithm determines the memcached server that holds the data based on the "key". When the server is selected, it commands it to save "Tokyo" and its values.

Similarly, "Kanagawa" "Chiba" "Saitama" "Gunma" is the first to select the server and then save.

Next, the saved data is retrieved. The key to fetch, "Tokyo", is likewise passed to the library. The library selects a server from the key using the same algorithm as when the data was saved. Since the algorithm is the same, it selects the same server as before and then sends a GET command. As long as the data has not been deleted for some reason, the saved value can be retrieved.

In this way, memcached is made distributed by scattering different keys across different servers (see the sketch below). Even if one memcached server fails and cannot be reached, the other caches are unaffected and the system can continue to run.
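As a minimal sketch of this client-side selection, the simplest distribution algorithm picks a server by the remainder of the key's hash; the server names and the use of CRC32 below are illustrative:

```python
from zlib import crc32

servers = ["node1", "node2", "node3"]

def select_server(key):
    # The same key always hashes to the same server, so a later "get"
    # is routed to the server that holds the data.
    return servers[crc32(key.encode()) % len(servers)]

for key in ["Tokyo", "Kanagawa", "Chiba", "Saitama", "Gunma"]:
    print(key, "->", select_server(key))
```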

Memcached cache distribution policy: http://blog.csdn.net/bintime/article/details/6259133

"Consistent hashing, briefly": consistent hashing works as follows. First, the hash value of each memcached server (node) is calculated, and the node is placed on a circle (the "continuum") spanning 0 to 2^32. Then the hash value of the key that stores the data is computed by the same method and mapped onto the circle. Starting from the point where the key maps, the circle is searched clockwise, and the data is saved to the first server found. If no server is found after passing 2^32, the data is saved to the first memcached server.

Now consider adding a memcached server to this arrangement. With the remainder-based distribution algorithm, the server that holds each key changes dramatically, which hurts the cache hit rate; with consistent hashing, only the keys on the first server counter-clockwise from the newly added server's position on the continuum are affected. Consistent hashing therefore minimizes the redistribution of keys.

Moreover, some consistent hashing implementations also adopt the idea of virtual nodes. With a general hash function, the servers' positions on the circle are distributed very unevenly. Using virtual nodes, each physical node (server) is therefore assigned 100 to 200 points on the continuum. This suppresses the uneven distribution and minimizes cache redistribution when servers are added or removed.
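The following is a minimal Python sketch of consistent hashing with virtual nodes. It assumes MD5 maps servers and keys onto the 0 to 2^32 circle and gives each physical node 100 points; both choices are illustrative, within the range the article mentions.

```python
import bisect
import hashlib

def ring_hash(s):
    # Map a string onto the 0..2^32 circle.
    return int(hashlib.md5(s.encode()).hexdigest(), 16) % (2 ** 32)

class ConsistentHash:
    def __init__(self, nodes, points_per_node=100):
        # Each physical node contributes many virtual points on the ring.
        self.ring = sorted(
            (ring_hash("%s#%d" % (node, i)), node)
            for node in nodes
            for i in range(points_per_node)
        )
        self.positions = [pos for pos, _ in self.ring]

    def select(self, key):
        # Search clockwise from the key's position; wrap around to the
        # first point if we pass 2^32.
        idx = bisect.bisect(self.positions, ring_hash(key)) % len(self.ring)
        return self.ring[idx][1]

ring = ConsistentHash(["node1", "node2", "node3"])
print(ring.select("Tokyo"))  # adding node4 later only remaps nearby keys
```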

"Cache Policy"

When the memcached server's (MS) hash table is full, newly inserted data replaces old data. The replacement strategy is LRU (least recently used), combined with the expiration time of each key-value pair. The expiration time of a stored key-value pair is set by the application through the memcached client (MC) and passed to MS as a parameter.

Meanwhile, MS uses lazy expiration: it does not start an extra process to monitor expired key-value pairs and delete them in real time. Instead, removal happens only when new data is inserted and there is no free space left.
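As a toy illustration (not memcached's actual implementation), the two policies just described, LRU eviction on insert and lazy removal of expired entries, can be sketched like this:

```python
import time
from collections import OrderedDict

class TinyCache:
    def __init__(self, capacity=3):
        self.capacity = capacity
        self.items = OrderedDict()  # key -> (value, expires_at)

    def set(self, key, value, ttl):
        if key not in self.items and len(self.items) >= self.capacity:
            self.items.popitem(last=False)      # evict least recently used
        self.items[key] = (value, time.time() + ttl)
        self.items.move_to_end(key)

    def get(self, key):
        entry = self.items.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.time() > expires_at:
            del self.items[key]                 # removed lazily, on access
            return None
        self.items.move_to_end(key)             # mark as recently used
        return value
```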

Multi-replica caching is mainly used to store multiple copies of data as it is cached, to guard against cache invalidation. Cache invalidation occurs in the following situations:

1. The cache entry times out and is removed (normal expiration).
2. The cache entry is removed because of storage space limits (abnormal invalidation).
3. The cache entry is invalidated because of cache node changes (abnormal invalidation).

When caching multiple replicas, the distribution strategy of the cache needs to be reconsidered. A second benefit of multiple replicas is that they provide multiple read nodes, enabling distributed parallel reads; that is another issue worth considering.

"Consistency of cached data": cached data should be kept as read-only as possible, so the cache itself is not suited to scenarios with a large number of write and update operations. In a read-mostly scenario, if the data does change, one option is to update both the cache and the database at the same time; the other is to simply invalidate the cached data (see the sketch below).
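A minimal sketch of the second option, invalidating the cached entry when the underlying row changes; update_database() is a hypothetical helper, and the key format matches the earlier cache-aside example:

```python
from pymemcache.client.base import Client

client = Client(("localhost", 11211))

def update_database(sql):
    # Placeholder for a real database write.
    raise NotImplementedError

def update_user(user_id, new_name):
    update_database("UPDATE users SET name = ... WHERE id = %d" % user_id)
    client.delete("user:%d" % user_id)  # invalidate; next read refills the cache
```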

"Memory allocation mechanism"

By default, MS allocates memory with a built-in component known as the "block allocator" (slab allocator). It abandons the standard C/C++ malloc/free in order to avoid memory fragmentation, which would force the operating system to spend more time locating blocks of memory that are logically contiguous but physically scattered. With the block allocator, MS allocates large chunks of memory in turn and reuses them continuously. Of course, because the chunks come in fixed sizes, when the size of the data does not match the chunk size, memory may be wasted.

At the same time, MS places limits on keys and data: a key cannot exceed 250 bytes, and a data item cannot exceed the chunk-size limit of 1MB.
Because the hash algorithm used by MC does not take into account each MS's memory size, in theory MC distributes key-value pairs to each MS with equal probability. If the servers have different amounts of memory, this can lower overall memory utilization. One workaround is to find the greatest common divisor of the servers' memory sizes and start, on each MS, n instances each with capacity equal to that divisor. This is equivalent to having multiple sub-servers of identical capacity, which improves overall memory utilization.
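As a quick illustration of the arithmetic, with made-up memory sizes in MB:

```python
from math import gcd
from functools import reduce

memory_per_server = {"node1": 2048, "node2": 3072, "node3": 1024}

unit = reduce(gcd, memory_per_server.values())  # 1024 MB in this example
for server, mem in memory_per_server.items():
    # Run mem // unit equal-sized instances on each server, so that MC's
    # uniform key distribution matches the actual capacity.
    print("%s: %d instance(s) of %d MB" % (server, mem // unit, unit))
```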

"Memcache storage mechanism" memcache data stored in memory is not how long, but the pre-defined small size, the data is first stored in the smallest space can be stored, as for why to do so, you can see the help of Memcache. Due to this fact, we have to consider the actual size of the data to be stored, so that the memory is wasted (although the memory is the price of cabbage, but that is money AH). Memcache has a start parameter,-F, to control the size of the minimum space, the base space is 80b, and then 80* (1+f) ^n the formula to calculate the size of the space later. The default value for F is 0.25. So if you store session data, then this value can be appropriately changed, if the data stored in the resource class, this value can be appropriately changed, not recommended to change too big, so too wasted space.
