A Comprehensive Analysis of memcached, Part 4: memcached's Distributed Algorithm

Author: Masahiro Nagano
Original link: http://gihyo.jp/dev/feature/01/memcached/0004

I am Nagano from mixi. In the second and third installments, Maesaka introduced the internals of memcached. This installment leaves the internal structure behind and introduces how memcached is distributed.

The distribution of memcached

As described in the first installment, memcached is called a "distributed" cache server, but the server side has no distribution functionality at all. The server only includes the memory storage features described in the second and third installments, and its implementation is very simple. The distribution of memcached is implemented entirely by the client library. This kind of distribution is memcached's most distinctive feature.

What does memcached's distribution mean?

The word "distributed" has been used many times so far without a detailed explanation. Let us start with a brief look at how it works; every client implements it in essentially the same way.

The following assumes three memcached servers, node1 through node3, and an application that stores data under the keys "tokyo", "kanagawa", "chiba", "saitama", and "gunma".

Figure 1 Introduction to distribution: preparation

First, add "tokyo" to memcached. When "tokyo" is passed to the client library, the algorithm implemented in the client determines, from the key, which memcached server will hold the data. Once the server is selected, the client instructs it to store "tokyo" and its value.

Figure 2 Introduction to distribution: adding data

Likewise, for "kanagawa", "chiba", "saitama", and "gunma", a server is selected first and the data is then saved.

Next, retrieve the saved data. The key to fetch, "tokyo", is again passed to the library. The library selects a server from the key using the same algorithm as when the data was saved. Because the algorithm is the same, it selects the same server as at save time and then sends the get command. As long as the data has not been deleted for some reason, the saved value is returned.

Figure 3 Introduction to distribution: getting data

In this way, memcached is distributed by saving different keys to different servers. As the number of memcached servers grows, the keys are spread across them; even if one memcached server cannot be reached, the other caches are unaffected and the system can keep running.
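The set and get flow described above can be sketched in a few lines of Perl. This is only an illustration under stated assumptions, not the code of any real client library: the three servers are simulated with in-memory hashes, the helper names `pick_server`, `cache_set`, and `cache_get` are hypothetical, and the hash is simply the first four bytes of an MD5 digest.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5);

# Three simulated memcached servers: each one is just an in-memory hash.
my @nodes = ('node1', 'node2', 'node3');
my %store;
$store{$_} = {} for @nodes;

# Hypothetical helper: map a key to a server name.
# Any deterministic function of the key works; here the first 4 bytes
# of the MD5 digest are read as an unsigned 32-bit integer.
sub pick_server {
    my ($key) = @_;
    my $hash  = unpack 'N', md5($key);
    my $count = scalar @nodes;
    return $nodes[ $hash % $count ];
}

# "set": choose the server from the key, then store the value there.
sub cache_set {
    my ($key, $value) = @_;
    $store{ pick_server($key) }{$key} = $value;
    return;
}

# "get": the same algorithm selects the same server, so the value is found.
sub cache_get {
    my ($key) = @_;
    return $store{ pick_server($key) }{$key};
}

cache_set($_, "value of $_") for qw(tokyo kanagawa chiba saitama gunma);
printf "%-8s is on %s: %s\n", $_, pick_server($_), cache_get($_)
    for qw(tokyo kanagawa);
```

Because `cache_set` and `cache_get` both go through `pick_server`, no server needs to know about any other server; all of the routing knowledge lives in the client.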

Next, we describe the distribution method implemented by Cache::Memcached, the Perl client library mentioned in the first installment.

Cache::Memcached's distribution method

Cache::Memcached, the Perl memcached client library, was written by Brad Fitzpatrick, the author of memcached, and can be regarded as the original client library.

Cache::Memcached - search.cpan.org

This library implements the distribution feature, and its approach is the standard distribution method for memcached.

Distribution based on the remainder

Put simply, Cache::Memcached distributes keys "according to the remainder of division by the number of servers." It computes an integer hash value for the key, divides it by the number of servers, and selects the server from the remainder.

The following Perl script is a simplified version of what Cache::Memcached does.

use strict;
use warnings;
use String::CRC32;

my @nodes = ('node1', 'node2', 'node3');
my @keys  = ('tokyo', 'kanagawa', 'chiba', 'saitama', 'gunma');

foreach my $key (@keys) {
    my $crc    = crc32($key);            # CRC value of the key
    my $mod    = $crc % ($#nodes + 1);   # remainder by the number of servers
    my $server = $nodes[$mod];           # select the server based on the remainder
    printf "%s => %s\n", $key, $server;
}

Cache::Memcached uses CRC to compute the hash value.

String::CRC32 - search.cpan.org

The script first computes the CRC value of the string and then chooses the server from the remainder of that value divided by the number of server nodes. Running the code above prints the following results:

tokyo => node2
kanagawa => node3
chiba => node2
saitama => node1
gunma => node1

According to the results, "tokyo" goes to node2, "kanagawa" goes to node3, and so on. Incidentally, when a connection to the selected server fails, Cache::Memcached appends the number of connection attempts to the key, computes the hash value again, and tries to connect to the server that results. This behavior is called rehash. If you do not want rehashing, you can specify the "rehash => 0" option when creating the Cache::Memcached object.

Disadvantages of remainder-based distribution

Remainder-based distribution is simple and spreads data very evenly, but it has a drawback: when a server is added or removed, the cost of reorganizing the cache is very high. Adding a server changes the remainders drastically, so the server selected for a key is no longer the one it was saved to, which hurts the cache hit rate. Let us write a short Perl snippet to verify that cost.

use strict;
use warnings;
use String::CRC32;

my @nodes = @ARGV;          # server names given on the command line
my @keys  = ('a' .. 'z');
my %nodes;                  # server name => list of keys assigned to it

foreach my $key (@keys) {
    my $hash   = crc32($key);
    my $mod    = $hash % ($#nodes + 1);
    my $server = $nodes[$mod];
    push @{ $nodes{$server} }, $key;
}

foreach my $node (sort keys %nodes) {
    printf "%s: %s\n", $node, join ",", @{ $nodes{$node} };
}

This Perl script simulates saving the keys "a" through "z" to memcached and accessing them. Save it as mod.pl and run it.

First, with only three servers:

$ mod.pl node1 node2 node3
node1: a,c,d,e,h,j,n,u,w,x
node2: g,i,k,l,p,r,s,y
node3: b,f,m,o,q,t,v,z

As a result, node1 holds a, c, d, e, ..., node2 holds g, i, k, ..., and each server holds 8 to 10 keys.

Next, add a memcached server.

$ mod.pl node1 node2 node3 node4
node1: d,f,m,o,t,v
node2: b,i,k,p,r,y
node3: e,g,l,n,u,w
node4: a,c,h,j,q,s,x,z

node4 has been added. Only d, i, k, p, r, and y still hit. As this shows, the servers that keys are assigned to change drastically once a node is added. Only 6 of the 26 keys still reach their original server; all the others have moved to a different server, so the hit rate drops to 23%. When memcached is used in a web application, cache efficiency falls sharply the moment a memcached server is added, the load concentrates on the database server, and there is a risk that normal service can no longer be provided.
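The same effect can be measured over a larger key set. The sketch below is an illustration rather than part of the original mod.pl: it assumes hypothetical keys `key0` through `key9999`, and it uses a small pure-Perl CRC-32 routine (`crc32_pure`, intended as a dependency-free stand-in for the crc32 function of String::CRC32) so that it runs without CPAN modules. For a uniformly distributed hash, a key keeps its server when going from n to n+1 servers only if both remainders agree, which happens for about 1/(n+1) of the keys.

```perl
use strict;
use warnings;

# Bitwise CRC-32 (reflected polynomial 0xEDB88320), used here as a
# dependency-free stand-in for a library CRC function.
sub crc32_pure {
    my ($str) = @_;
    my $crc = 0xFFFFFFFF;
    for my $byte (unpack 'C*', $str) {
        $crc ^= $byte;
        for (1 .. 8) {
            $crc = ($crc & 1) ? (($crc >> 1) ^ 0xEDB88320) : ($crc >> 1);
        }
    }
    return $crc ^ 0xFFFFFFFF;
}

# Fraction of keys whose server index is unchanged when the number of
# servers changes from $old to $new under remainder-based distribution.
sub unchanged_ratio {
    my ($old, $new, @keys) = @_;
    my $same = grep { my $h = crc32_pure($_); $h % $old == $h % $new } @keys;
    return $same / scalar @keys;
}

my @keys = map("key$_", 0 .. 9999);   # hypothetical key names
for my $n (3, 4, 9) {
    printf "%d -> %d servers: %.1f%% of keys stay on the same server\n",
        $n, $n + 1, 100 * unchanged_ratio($n, $n + 1, @keys);
}
```

The larger the cluster, the smaller the share of keys that survive an added server, which is why remainder-based distribution becomes more painful as the number of servers grows.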

This problem also affected mixi's web applications and made it impossible to add memcached servers. With a new distribution method, however, adding memcached servers is now easy. This distribution method is called consistent hashing.

Consistent hashing

The idea behind consistent hashing has been covered in many places, including the mixi development blog, so only a brief explanation is given here.

mixi Engineers' Blog - A comfortable cache life with smart distribution
ConsistentHashing - The consistent hashing method

A brief description of consistent hashing

Consistent hashing works as follows. First, compute the hash value of each memcached server (node) and place it on a circle (continuum) running from 0 to 2^32. Next, compute the hash value of the key under which the data is stored, using the same method, and map it onto the circle. Then search clockwise from the point where the data is mapped and save the data to the first server found. If no server is found even after passing 2^32, the data is saved to the first memcached server on the circle.

Figure 4 Consistent hashing: basic principle
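The lookup just described can be sketched as follows. This is a minimal illustration with one point per server and a linear scan, not the implementation of any particular client library. The function names (`point_of`, `build_continuum`, `server_at`, `server_for`) are hypothetical, and the use of the first four bytes of an MD5 digest as the hash is an assumption made for the example.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5);

# Hash a string to a point on the continuum, an integer in 0 .. 2**32 - 1.
# Assumption: the first 4 bytes of the MD5 digest serve as the hash.
sub point_of {
    my ($str) = @_;
    return unpack 'N', md5($str);
}

# Build the continuum: one [point, server] pair per server, sorted by point.
sub build_continuum {
    my (@servers) = @_;
    my @continuum = sort { $a->[0] <=> $b->[0] }
                    map  { [ point_of($_), $_ ] } @servers;
    return \@continuum;
}

# Walk clockwise from $point: the first server whose point is greater than
# or equal to it is chosen. Past the last point, wrap around to the first
# server on the circle.
sub server_at {
    my ($continuum, $point) = @_;
    for my $entry (@$continuum) {
        return $entry->[1] if $entry->[0] >= $point;
    }
    return $continuum->[0][1];
}

# Map a key to its server by hashing it onto the continuum first.
sub server_for {
    my ($continuum, $key) = @_;
    return server_at($continuum, point_of($key));
}

my $continuum = build_continuum(qw(node1 node2 node3));
printf "%-8s => %s\n", $_, server_for($continuum, $_)
    for qw(tokyo kanagawa chiba saitama gunma);
```

Servers and keys share the same hash function and the same circle, so the only state the client needs is the sorted list of server points.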

Now add a memcached server to the state shown in the figure above. With remainder-based distribution, the servers holding the keys change drastically and the cache hit rate suffers; with consistent hashing, only the keys between the point where the server is added on the continuum and the first server counterclockwise from it are affected.

Figure 5 Consistent hashing: adding a server

Consistent hashing therefore keeps the redistribution of keys to a minimum. In addition, some implementations of consistent hashing adopt the idea of virtual nodes. With an ordinary hash function, the positions that servers map to are distributed very unevenly. The virtual-node idea therefore assigns 100 to 200 points on the continuum to each physical node (server). This suppresses the uneven distribution and minimizes cache redistribution when servers are added or removed.
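To make the virtual-node idea concrete, here is a hedged sketch that places 150 points per server on the continuum and then checks what happens when a fourth server joins. It is not libketama's actual algorithm: the virtual-node naming scheme ("server-0", "server-1", ...), the MD5-based hash, the function names, and the hypothetical keys `key0` through `key9999` are all assumptions made for the example.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5);

# Assumption: the first 4 bytes of an MD5 digest act as the 32-bit hash.
sub point_of {
    my ($str) = @_;
    return unpack 'N', md5($str);
}

# Place $points virtual nodes per server on the continuum by hashing
# "server-0", "server-1", ... (a hypothetical naming scheme).
sub build_continuum {
    my ($points, @servers) = @_;
    my @continuum;
    for my $server (@servers) {
        push @continuum, [ point_of("$server-$_"), $server ] for 0 .. $points - 1;
    }
    return [ sort { $a->[0] <=> $b->[0] } @continuum ];
}

# Binary search for the first virtual node at or after the key's point,
# wrapping around to the start of the continuum when none is found.
sub server_for {
    my ($continuum, $key) = @_;
    my $p    = point_of($key);
    my $size = scalar @$continuum;
    my ($lo, $hi) = (0, $size);
    while ($lo < $hi) {
        my $mid = int(($lo + $hi) / 2);
        if ($continuum->[$mid][0] < $p) { $lo = $mid + 1 } else { $hi = $mid }
    }
    return $continuum->[ $lo % $size ][1];
}

my @keys   = map("key$_", 0 .. 9999);   # hypothetical key names
my $before = build_continuum(150, qw(node1 node2 node3));
my $after  = build_continuum(150, qw(node1 node2 node3 node4));

# Keys per server with virtual nodes in place.
my %count;
$count{ server_for($before, $_) }++ for @keys;
printf "%s: %d keys\n", $_, $count{$_} for sort keys %count;

# How many keys keep their server once node4 joins.
my $kept = grep { server_for($before, $_) eq server_for($after, $_) } @keys;
printf "%.1f%% of keys stay on the same server after adding node4\n",
    100 * $kept / scalar @keys;
```

Every key that changes servers in this sketch moves to node4, never between the existing servers, because the points of node1 through node3 are identical in both continuums. With three servers growing to four, roughly three quarters of the keys would be expected to stay where they were.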

Testing with a memcached client library that uses the consistent hashing algorithm, described later in this article, showed that the hit rate after adding servers can be calculated, as a percentage, from the number of existing servers (n) and the number of servers added (m) as follows:

(1 - m / (n + m)) * 100

Libraries that support consistent hashing

Cache::Memcached, which this series has used so far, does not support consistent hashing, but several client libraries already support this new distribution algorithm. The first memcached client library to support consistent hashing and virtual nodes was a PHP library called libketama, developed by Last.fm.

libketama - a consistent hashing algo for memcache clients - RJ Blog - Users at Last.fm

As for Perl clients, Cache::Memcached::Fast and Cache::Memcached::libmemcached, both introduced in the first installment of this series, support consistent hashing.

Cache::Memcached::Fast - search.cpan.org
Cache::Memcached::libmemcached - search.cpan.org

Both have interfaces almost identical to Cache::Memcached, so if you are using Cache::Memcached you can switch to them easily. Cache::Memcached::Fast reimplements libketama on its own; to use consistent hashing, specify the ketama_points option when creating the object.

use Cache::Memcached::Fast;

my $memcached = Cache::Memcached::Fast->new({
    servers       => ["192.168.0.1:11211", "192.168.0.2:11211"],
    ketama_points => 150,   # virtual points per server; enables consistent hashing
});

In addition, Cache::Memcached::libmemcached is a Perl module that uses libmemcached, the C library developed by Brian Aker. libmemcached itself supports several distribution algorithms, including consistent hashing, and its Perl binding supports consistent hashing as well.

Tangent Software: libmemcached

Summary

This installment introduced memcached's distribution algorithm: memcached's distribution is implemented mainly by the client library, and the consistent hashing algorithm distributes data efficiently. The next installment will introduce some of mixi's experience with using memcached in production, along with related compatible applications.
