Principles of distributed memcache deployment

Source: Internet
Author: User
Tags: hash, memcached

Today, while wrapping memcache operations in a class library, I realized that I had only ever used memcache on a single server and had never deployed it in a distributed setup. I had a rough idea of how memcache distribution works, but to understand it in more depth I searched for related material.

Below is a summary of material on distributed memcache deployment gathered from the web.

What is a distributed memcache deployment? An example makes it clear:

Suppose there are three memcached servers, node1 through node3, and the application wants to save data under the keys "Tokyo", "Kanagawa", "Chiba", "Saitama", and "Gunma".


First, add "Tokyo" to memcached. When "Tokyo" is passed to the client library, the algorithm implemented in the client chooses the memcached server that will hold the data based on the key. Once the server is chosen, the client tells it to save "Tokyo" and its value.

Likewise, for "Kanagawa", "Chiba", "Saitama", and "Gunma" the client first selects a server and then saves the data.

Next, retrieve the saved data. The key to fetch, "Tokyo", is again passed to the client library, which selects a server from the key using the same algorithm as when the data was saved. Because the algorithm is the same, it selects the same server and then sends the get command. Unless the data has been deleted for some reason, the saved value is returned.


In this way, different keys are saved to different servers, and memcached becomes distributed. As memcached servers are added, the keys spread across them; even if one memcached server cannot be reached, the other caches are unaffected and the system keeps running.
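The walk-through above can be sketched in a few lines of Python. This is only an illustration under assumptions: the text does not say which selection rule the client uses, so the sketch assumes the simplest one (CRC32 of the key modulo the number of servers), and `FakeCluster` is a hypothetical in-memory stand-in for three memcached servers, not a real client.

```python
import zlib

# Node names taken from the example in the text.
SERVERS = ["node1", "node2", "node3"]


def pick_server(key, servers=SERVERS):
    """Choose a server from the key alone (assumed rule: CRC32 mod server count)."""
    return servers[zlib.crc32(key.encode("utf-8")) % len(servers)]


class FakeCluster:
    """Hypothetical stand-in for the memcached servers: one dict per node."""

    def __init__(self, servers=SERVERS):
        self.servers = list(servers)
        self.stores = {name: {} for name in self.servers}

    def set(self, key, value):
        # The client, not the servers, decides where the key lives.
        self.stores[pick_server(key, self.servers)][key] = value

    def get(self, key):
        # Same algorithm as set(), so the same server is consulted.
        return self.stores[pick_server(key, self.servers)].get(key)


if __name__ == "__main__":
    cluster = FakeCluster()
    for key in ["Tokyo", "Kanagawa", "Chiba", "Saitama", "Gunma"]:
        cluster.set(key, "value for " + key)
        print(key, "->", pick_server(key))
    print(cluster.get("Tokyo"))
```

The point of the design is that the servers never talk to each other: any client that uses the same function and the same server list finds the same server for a given key.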

Here is a closer look at the consistent hashing algorithm:

A brief description of consistent hashing

First, compute the hash value of each memcached server (node) and place it on a circle (continuum) spanning 0 to 2^32. Then compute the hash of the key of the data to store, using the same method, and map it onto the circle as well. Starting from the key's position, search clockwise and save the data to the first server found. If no server is found before passing 2^32, the data is saved to the first memcached server on the circle.
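A minimal sketch of this lookup, under stated assumptions: the circle position is taken from the first four bytes of an MD5 digest (the text does not name a hash function), each server gets a single point, and `HashRing` is an illustrative name rather than any library's API.

```python
import bisect
import hashlib


def ring_hash(value):
    """Map a string to a point in 0 .. 2**32 - 1 (assumed: first 4 bytes of MD5)."""
    return int.from_bytes(hashlib.md5(value.encode("utf-8")).digest()[:4], "big")


class HashRing:
    def __init__(self, nodes=()):
        self._points = []  # sorted server positions on the circle
        self._owner = {}   # position -> server name
        for node in nodes:
            self.add(node)

    def add(self, node):
        point = ring_hash(node)
        if point not in self._owner:
            bisect.insort(self._points, point)
        self._owner[point] = node

    def remove(self, node):
        point = ring_hash(node)
        if self._owner.get(point) == node:
            del self._owner[point]
            self._points.remove(point)

    def lookup(self, key):
        if not self._points:
            raise LookupError("ring is empty")
        # First server position at or after the key's position (clockwise).
        i = bisect.bisect_left(self._points, ring_hash(key))
        if i == len(self._points):
            i = 0  # passed 2**32 without finding a server: wrap to the first one
        return self._owner[self._points[i]]


if __name__ == "__main__":
    ring = HashRing(["node1", "node2", "node3"])
    for key in ["Tokyo", "Kanagawa", "Chiba", "Saitama", "Gunma"]:
        print(key, "->", ring.lookup(key))
```

Keeping the positions in a sorted list lets the clockwise search be a binary search instead of a scan of every server.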

Now suppose a memcached server is added. With the remainder-based distribution algorithm, the server that holds a key changes for most keys, which badly hurts the cache hit rate. With consistent hashing, only the keys that lie between the new server's position and the first server counterclockwise from it on the continuum are affected.

Consistent hashing therefore minimizes the redistribution of keys. Some implementations of consistent hashing also use the idea of virtual nodes. With an ordinary hash function, the servers' positions on the continuum are distributed very unevenly. To counter this, each physical node (server) is assigned 100 to 200 points on the continuum. This evens out the distribution and keeps cache redistribution to a minimum when servers are added or removed.
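To make the difference concrete, the following sketch counts how many keys change servers when an eleventh server joins ten existing ones, once with the remainder rule and once with a ring that gives each server 100 points. The hash function (MD5, first four bytes), the server names, and the key names are all assumptions chosen for illustration; the exact percentages depend on them.

```python
import bisect
import hashlib


def h32(text):
    """Assumed hash: first 4 bytes of MD5, giving a point in 0 .. 2**32 - 1."""
    return int.from_bytes(hashlib.md5(text.encode("utf-8")).digest()[:4], "big")


def modulo_owner(key, servers):
    """Remainder rule: hash of the key modulo the number of servers."""
    return servers[h32(key) % len(servers)]


def build_ring(servers, replicas=100):
    """Sorted list of (point, server); each server contributes `replicas` points."""
    return sorted((h32(f"{s}#{i}"), s) for s in servers for i in range(replicas))


def ring_owner(key, ring):
    """First point clockwise from the key, wrapping past the end of the list."""
    i = bisect.bisect_left(ring, (h32(key), ""))
    return ring[i % len(ring)][1]


def compare(keys, old_servers, new_servers):
    """Return (fraction moved with remainder rule, fraction moved with ring)."""
    ring_old, ring_new = build_ring(old_servers), build_ring(new_servers)
    mod_moved = sum(
        modulo_owner(k, old_servers) != modulo_owner(k, new_servers) for k in keys
    )
    ring_moved = sum(ring_owner(k, ring_old) != ring_owner(k, ring_new) for k in keys)
    return mod_moved / len(keys), ring_moved / len(keys)


if __name__ == "__main__":
    keys = [f"key{i}" for i in range(2000)]
    old = [f"server{i}" for i in range(10)]
    mod_moved, ring_moved = compare(keys, old, old + ["server10"])
    print(f"remainder rule:     {mod_moved:.0%} of keys moved")
    print(f"consistent hashing: {ring_moved:.0%} of keys moved")
```

On the ring, every key that moves goes to the newly added server; keys owned by the other servers stay where they were, which is why the hit rate suffers far less.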

Here is a little more about virtual nodes.

When there are too few service nodes, consistent hashing easily suffers from data skew, because the nodes divide the ring unevenly. For example, suppose the system has only two servers whose positions on the ring are unevenly spaced.

Most of the data will then land on server 1, and only a very small amount on server 2. To solve this data skew, consistent hashing introduces virtual nodes: several hashes are computed for each service node, and a copy of the node, called a virtual node, is placed at each resulting position.

This can be done by appending a number to the server's IP address or host name. In the case above, for instance, we could compute three virtual nodes per server by hashing "Memcached server 1#1", "Memcached server 1#2", "Memcached server 1#3", "Memcached server 2#1", "Memcached server 2#2", and "Memcached server 2#3", which gives six virtual nodes.

The data location algorithm itself is unchanged; there is just one extra step that maps a virtual node back to its physical node. For example, data located on the three virtual nodes "Memcached server 1#1", "Memcached server 1#2", and "Memcached server 1#3" is all stored on server 1. This solves the data skew problem when there are few service nodes. In practice the number of virtual nodes is usually set to 32 or more, so that even a handful of service nodes gives a fairly uniform data distribution and avoids cache avalanches.
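The virtual node scheme can be sketched as follows. The "name#number" labels follow the example above; the hash function (MD5, first four bytes) and the generated key names are assumptions for illustration, so the shares printed for small virtual node counts will differ with other choices.

```python
import bisect
import hashlib
from collections import Counter


def h32(text):
    """Assumed hash: first 4 bytes of MD5, giving a point in 0 .. 2**32 - 1."""
    return int.from_bytes(hashlib.md5(text.encode("utf-8")).digest()[:4], "big")


def build_ring(servers, replicas):
    """Place `replicas` virtual nodes per server, labelled "<server>#<n>"."""
    ring = []
    for server in servers:
        for n in range(1, replicas + 1):
            vnode = f"{server}#{n}"  # e.g. "Memcached server 1#1"
            ring.append((h32(vnode), vnode))
    ring.sort()
    return ring


def physical(vnode):
    """The extra step: map a virtual node label back to its physical server."""
    return vnode.rsplit("#", 1)[0]


def locate(key, ring):
    """Same clockwise search as before, followed by the virtual-to-physical mapping."""
    i = bisect.bisect_left(ring, (h32(key), ""))
    return physical(ring[i % len(ring)][1])


def shares(servers, replicas, n_keys=5000):
    """Fraction of n_keys generated keys that each physical server receives."""
    ring = build_ring(servers, replicas)
    counts = Counter(locate(f"key{i}", ring) for i in range(n_keys))
    return {s: counts[s] / n_keys for s in servers}


if __name__ == "__main__":
    servers = ["Memcached server 1", "Memcached server 2"]
    for replicas in (1, 3, 32, 200):
        result = shares(servers, replicas)
        print(replicas, {s: f"{v:.1%}" for s, v in result.items()})
```

Running it with increasing virtual node counts shows how the split between the two servers tends toward an even one as each server gets more points on the ring.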

Example

Start the memcache services, for example:


/usr/local/bin/memcached -d -p 11213 -u root -m 10 -c 1024 -t 8 -P /tmp/memcached.pid
/usr/local/bin/memcached -d -p 11214 -u root -m 10 -c 1024 -t 8 -P /tmp/memcached.pid
/usr/local/bin/memcached -d -p 11215 -u root -m 10 -c 1024 -t 8 -P /tmp/memcached.pid

This starts three instances, each limited to 10 MB of memory to keep testing simple.


Distributed deployment
The PECL Memcache extension for PHP has supported multiple servers since version 2.0.0; the current version is 2.2.5. See the following code:


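$memcache = new Memcache;
// Register the three local instances started above as one server pool.
$memcache->addServer('localhost', 11213);
$memcache->addServer('localhost', 11214);
$memcache->addServer('localhost', 11215);
// Fetch statistics from every server in the pool and print them.
$memStats = $memcache->getExtendedStats();
print_r($memStats);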

As the example shows, setting up a distributed memcache deployment is straightforward.

Keeping the distributed system running smoothly
The most serious problem in real-world memcache use is that adding or removing servers causes cache misses on a large scale, which can turn the database into a performance bottleneck. To avoid this, use the consistent hashing algorithm described above, which works by changing how the server is selected.

In memcache.c, in the source of PHP's Memcache extension, change

"memcache.hash_strategy" = standard

to

"memcache.hash_strategy" = consistent

and recompile. The extension then uses the consistent hashing algorithm to find the server for each key. memcache.hash_strategy is also exposed as a php.ini directive, so the same change can be made in configuration without recompiling.

Test data show that consistent hashing greatly reduces the large-scale cache loss that occurs when memcache servers are added or removed.

nonconsistenthash: 92% of lookups changed after adding a target to the existing 10
nonconsistenthash: 90% of lookups changed after removing 1 targets
consistenthash: 6% of lookups changed after adding a target to the existing 10
consistenthash: 9% of lookups changed after removing 1 targets


Summary:

The hashing algorithm is a key point in the architecture of a dynamic distributed caching system. An algorithm that distributes keys more evenly balances the load across service nodes and avoids both wasted resources and overloaded servers. A consistent hashing algorithm minimizes the cost and risk of migrating data when the serving hardware changes. More sensible configuration policies and algorithms make a distributed caching system more efficient and stable.
