An analysis of the principles of memcache distributed deployment

Source: Internet
Author: User

This article introduces the principles behind memcache distributed deployment. I hope it helps you understand how memcache distributed deployment works.

Today, while wrapping memcache operations in a class library, I realized that my use of memcache had always been limited to a single server and had never involved a distributed deployment. I had a rough idea of how memcache distribution works, but to understand it in more depth I searched Google for the relevant material.

Below is some of the information on memcache distributed deployment that I collected from the web.

What is memcache distributed deployment? Let's take a look at the following example:

Assume there are three memcached servers, node1 to node3, and the application saves data under the keys "Tokyo", "Kanagawa", "Chiba", "Saitama", and "Gunma".

First, add "Tokyo" to memcached. When "Tokyo" is passed to the client library, the algorithm implemented in the client determines, based on the key, which memcached server will hold the data. Once the server is selected, the client sends it a command to save "Tokyo" and its value.

Similarly, for "Kanagawa", "Chiba", "Saitama", and "Gunma", the server is selected first and the data is then saved.

Next, retrieve the saved data. The key to fetch, "Tokyo", is again passed to the client library. The library selects a server from the key using the same algorithm as when the data was saved. Because the algorithm is the same, it selects the same server the data was saved to, and then sends a get command. As long as the data has not been deleted for some reason, the saved value is returned.

In this way, memcached is distributed by saving different keys to different servers. As the number of memcached servers grows, the keys are scattered more widely; even if one memcached server fails and cannot be reached, the other caches are unaffected and the system can continue to run.
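The key-to-server selection described above can be sketched in a few lines. This is a minimal illustration in Python, assuming a CRC32 hash and a remainder (modulo) rule; the server names and the function `pick_server` are hypothetical and do not reproduce the code of any particular memcached client library.

```python
import zlib

# Hypothetical server list; the names mirror node1 to node3 from the example.
SERVERS = ["node1", "node2", "node3"]

def pick_server(key, servers=SERVERS):
    """Select a server by hashing the key and taking the remainder.

    Assumption: CRC32 is used as the hash; a real client may use another one.
    """
    h = zlib.crc32(key.encode("utf-8"))  # unsigned 32-bit value in Python 3
    return servers[h % len(servers)]

if __name__ == "__main__":
    for key in ["Tokyo", "Kanagawa", "Chiba", "Saitama", "Gunma"]:
        # Saving and fetching both call pick_server, so they agree on the node.
        print(key, "->", pick_server(key))
```

Because the same function is used for both saving and fetching, a given key always resolves to the same server as long as the server list does not change.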

The consistent hashing algorithm is introduced in detail below.

A brief description of consistent hashing

First, the hash value of each memcached server (node) is calculated and placed on a circle (continuum) ranging from 0 to 2^32. Then the hash value of the key under which the data is stored is calculated in the same way and mapped onto the circle. Starting from the position the key maps to, the search proceeds clockwise, and the data is saved to the first server found. If no server is found before passing 2^32, the data is saved to the first memcached server on the circle.
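The clockwise search on the circle can be sketched as a binary search over sorted server positions. The following Python sketch makes several assumptions that are not in the original text: the hash is the first 4 bytes of MD5, and the names `hash32` and `HashRing` are hypothetical.

```python
import bisect
import hashlib

def hash32(value):
    """Map a string to an integer in [0, 2**32) (assumed hash: first 4 bytes of MD5)."""
    return int.from_bytes(hashlib.md5(value.encode("utf-8")).digest()[:4], "big")

class HashRing:
    def __init__(self, servers):
        # Sorted (position, server) pairs represent the circle.
        self._ring = sorted((hash32(s), s) for s in servers)
        self._positions = [pos for pos, _ in self._ring]

    def get_server(self, key):
        # Find the first server at or after the key's position (clockwise).
        idx = bisect.bisect_left(self._positions, hash32(key))
        if idx == len(self._ring):
            idx = 0  # passed the end of the circle: wrap to the first server
        return self._ring[idx][1]

if __name__ == "__main__":
    ring = HashRing(["node1", "node2", "node3"])
    for key in ["Tokyo", "Kanagawa", "Chiba", "Saitama", "Gunma"]:
        print(key, "->", ring.get_server(key))
```

Keeping the positions in a sorted list makes each lookup a logarithmic-time search instead of a walk around the circle.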

Now consider adding one memcached server to this state. With the remainder-based distribution algorithm, the server that holds each key changes drastically, which hurts the cache hit rate. With consistent hashing, only the keys on the arc between the position of the added server and the first server counter-clockwise from it on the continuum are affected.

Consistent hashing therefore keeps key redistribution to a minimum. In addition, some consistent hashing implementations adopt the idea of virtual nodes. With an ordinary hash function, the positions that servers map to are distributed very unevenly. Using virtual nodes, each physical node (server) is assigned 100 to 200 points on the continuum. This suppresses uneven distribution and minimizes cache redistribution when servers are added or removed.

Here is a further introduction to virtual nodes.

When there are too few service nodes, the consistent hashing algorithm is prone to data skew because the nodes divide the ring unevenly. For example, suppose our system has only two servers and they are unevenly placed on the ring.

A large amount of data then inevitably ends up on server 1, while only a very small amount lands on server 2. To solve this data skew problem, the consistent hashing algorithm introduces the virtual node mechanism: multiple hashes are computed for each service node, and a node, called a virtual node, is placed at the position of each computed result.

This can be done by appending a number to the server IP or host name. For example, if we decide to compute three virtual nodes per server, we calculate the hash values of "Memcached server 1#1", "Memcached server 1#2", "Memcached server 1#3", "Memcached server 2#1", "Memcached server 2#2", and "Memcached server 2#3", which yields six virtual nodes.

The data location algorithm itself does not change; there is just one extra step that maps a virtual node back to its actual node. For example, data located at the three virtual nodes "Memcached server 1#1", "Memcached server 1#2", and "Memcached server 1#3" is all stored on server 1. This solves the data skew problem when there are few service nodes. In practice, the number of virtual nodes is usually set to 32 or more, so that even a small number of service nodes achieves a relatively uniform data distribution and avoids avalanche effects.
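The virtual node mechanism can be sketched as follows. This is an illustrative Python sketch under stated assumptions: the "name#number" labels follow the example above, while the MD5-based hash, the class `VirtualNodeRing`, the helper `load`, and the replica count of 160 are hypothetical choices made for the demonstration.

```python
import bisect
import hashlib
from collections import Counter

def hash32(value):
    """Map a string to an integer in [0, 2**32) (assumed hash: first 4 bytes of MD5)."""
    return int.from_bytes(hashlib.md5(value.encode("utf-8")).digest()[:4], "big")

class VirtualNodeRing:
    def __init__(self, servers, replicas=3):
        # Each server contributes `replicas` points labelled "name#1", "name#2", ...
        # Every point remembers the real server it belongs to.
        self._ring = sorted(
            (hash32(f"{server}#{i}"), server)
            for server in servers
            for i in range(1, replicas + 1)
        )
        self._positions = [pos for pos, _ in self._ring]

    def get_server(self, key):
        # Same clockwise lookup as before; the modulo wraps past the last point.
        idx = bisect.bisect_left(self._positions, hash32(key)) % len(self._ring)
        return self._ring[idx][1]  # virtual node -> actual server

def load(servers, replicas, n_keys=10000):
    """Count how many of n_keys synthetic keys land on each real server."""
    ring = VirtualNodeRing(servers, replicas)
    return Counter(ring.get_server(f"key{i}") for i in range(n_keys))

if __name__ == "__main__":
    servers = ["Memcached server 1", "Memcached server 2"]
    print("3 virtual nodes each:  ", dict(load(servers, 3)))
    print("160 virtual nodes each:", dict(load(servers, 160)))
```

Running the script prints the per-server key counts for both settings, which makes it easy to see how the split changes as the number of virtual nodes grows.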

Example

Start the memcached services, for example with the following commands:

/usr/local/bin/memcached -d -p 11213 -u root -m 10 -c 1024 -t 8 -P /tmp/memcached_11213.pid
/usr/local/bin/memcached -d -p 11214 -u root -m 10 -c 1024 -t 8 -P /tmp/memcached_11214.pid
/usr/local/bin/memcached -d -p 11215 -u root -m 10 -c 1024 -t 8 -P /tmp/memcached_11215.pid

This starts three instances, each using only 10 MB of memory, which is convenient for testing.


Distributed deployment
The Memcache extension in PHP's PECL has supported multiple servers since version 2.0.0; at the time of writing the current version is 2.2.5. See the following code:

$memcache = new Memcache;
// Add the three memcached instances to the server pool
$memcache->addServer('localhost', 11213);
$memcache->addServer('localhost', 11214);
$memcache->addServer('localhost', 11215);
// Collect statistics from every server in the pool
$memStats = $memcache->getExtendedStats();
print_r($memStats);

As the example above shows, distributed deployment of memcache is quite easy to implement.

Keeping the distributed system healthy
In practice, the most serious problem with memcache is that adding or removing a server causes a large portion of the cache to be lost, which can turn the database into a performance bottleneck. To avoid this, first make sure you understand the consistent hashing algorithm; the problem is solved by changing the algorithm that selects a server on access.

In memcache.c in the source code of the PHP memcache extension, change the default value

"memcache.hash_strategy" = standard

to

"memcache.hash_strategy" = consistent

Recompile, and the extension will use the consistent hashing algorithm to locate the server when reading and writing data. Because memcache.hash_strategy is also a runtime configuration option of the extension, the same value can be set in php.ini instead of editing the source.

Test data show that consistent hashing greatly reduces the large-scale cache loss that occurs when memcache servers are added or removed.

The test results are as follows:
nonconsistenthash: 92% of lookups changed after adding a target to the existing 10
nonconsistenthash: 90% of lookups changed after removing 1 of ten targets
consistenthash: 6% of lookups changed after adding a target to the existing 10
consistenthash: 9% of lookups changed after removing 1 of ten targets
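A comparison of this kind can be approximated with a small simulation. The Python sketch below is not the test that produced the figures above; it assumes an MD5-based hash, 100 virtual nodes per server, and 10,000 synthetic keys, and all function and server names are hypothetical. The exact percentages depend on the hash function and the key set.

```python
import bisect
import hashlib

def hash32(value):
    """Map a string to an integer in [0, 2**32) (assumed hash: first 4 bytes of MD5)."""
    return int.from_bytes(hashlib.md5(value.encode("utf-8")).digest()[:4], "big")

def modulo_lookup(key, servers):
    """Remainder-based selection: hash modulo the number of servers."""
    return servers[hash32(key) % len(servers)]

def build_ring(servers, replicas=100):
    """Sorted (position, server) pairs, with `replicas` virtual nodes per server."""
    return sorted((hash32(f"{s}#{i}"), s) for s in servers for i in range(replicas))

def ring_lookup(key, ring, positions):
    """Clockwise search on the ring, wrapping around at the end."""
    idx = bisect.bisect_left(positions, hash32(key)) % len(ring)
    return ring[idx][1]

def changed_fraction(before, after, keys):
    """Fraction of keys whose server differs between two lookup functions."""
    return sum(1 for k in keys if before(k) != after(k)) / len(keys)

keys = [f"key{i}" for i in range(10000)]
old_servers = [f"server{i}" for i in range(10)]
new_servers = old_servers + ["server10"]  # add one target to the existing 10

modulo_changed = changed_fraction(
    lambda k: modulo_lookup(k, old_servers),
    lambda k: modulo_lookup(k, new_servers),
    keys,
)

old_ring, new_ring = build_ring(old_servers), build_ring(new_servers)
old_pos = [p for p, _ in old_ring]
new_pos = [p for p, _ in new_ring]
ring_changed = changed_fraction(
    lambda k: ring_lookup(k, old_ring, old_pos),
    lambda k: ring_lookup(k, new_ring, new_pos),
    keys,
)

if __name__ == "__main__":
    print(f"modulo:     {modulo_changed:.0%} of lookups changed")
    print(f"consistent: {ring_changed:.0%} of lookups changed")
```

With consistent hashing, every key that moves in this sketch moves to the newly added server, because adding points to the ring can only take keys away from existing servers.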


Summary:

In a dynamic distributed cache system, the hashing algorithm plays a key role in the system architecture. A well-chosen distribution algorithm keeps the load across service nodes relatively balanced, minimizing both wasted resources and server overload. Consistent hashing minimizes the data migration cost and risk that come with changes to the server hardware environment. More sensible configuration policies and algorithms make a distributed cache system more efficient and more stable.
