Series article navigation:
memcached comprehensive analysis - 1. The basics of memcached
memcached comprehensive analysis - 2. Understanding memcached's memory storage
memcached comprehensive analysis - 3. memcached's deletion mechanism and development direction
memcached comprehensive analysis - 4. memcached's distributed algorithm
memcached comprehensive analysis - 5. memcached applications and compatible programs
Release Date: 2008/7/23
Author: Masahiro Nagano
Original link: http://gihyo.jp/dev/feature/01/memcached/0004
I'm Nagano from mixi. In parts 2 and 3, Maesaka covered the internals of memcached. This part leaves the internal structure behind and turns to how memcached is distributed.
Distribution in memcached
As described in part 1, memcached is called a "distributed" cache server, yet the server side has no distribution functionality at all. The server only provides the memory storage features covered in parts 2 and 3, and its implementation is very simple. Distribution is implemented entirely by the client library. This form of distribution is memcached's most distinctive feature.
What does memcached mean by distributed?
The word "distributed" has come up many times so far without a detailed explanation. Here is a brief outline of the principle, which is essentially the same in every client implementation.
Assume there are three memcached servers, node1 to node3, and that the application wants to store data under the keys "tokyo", "kanagawa", "chiba", "saitama" and "gunma".
Figure 1 Introduction to distribution: preparation
First, add "tokyo" to memcached. When "tokyo" is passed to the client library, the algorithm implemented in the client decides, from the key alone, which memcached server will hold the data. Once the server is chosen, the client tells it to store "tokyo" and its value.
Figure 2 Introduction to distribution: adding data
Likewise, for "kanagawa", "chiba", "saitama" and "gunma" the client first selects a server and then stores the data.
Next, retrieve the stored data. The key to fetch, "tokyo", is again passed to the library. The library selects a server from the key using the same algorithm as when the data was stored. Because the algorithm is the same, it picks the same server the data was stored on and sends it a get command. As long as the data has not been deleted for some reason, the stored value is returned.
Figure 3 Introduction to distribution: retrieving data
Storing different keys on different servers in this way is what makes memcached distributed. As the number of memcached servers grows, the keys are spread more widely; even if one memcached server fails and cannot be reached, the other caches are unaffected and the system keeps running.
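The store-then-fetch flow described above can be sketched in a few lines. This is a minimal illustration in Python, not the code of any real client library: the three dictionaries stand in for memcached servers, and the helper names pick_server, cache_set and cache_get, as well as the choice of CRC32 modulo the server count, are assumptions made for the example.

```python
import zlib

# Three in-process dictionaries stand in for the memcached servers node1..node3.
servers = {"node1": {}, "node2": {}, "node3": {}}
names = sorted(servers)


def pick_server(key):
    """Choose a server name from the key alone (CRC32 modulo server count is an assumption)."""
    return names[zlib.crc32(key.encode("utf-8")) % len(names)]


def cache_set(key, value):
    # The client, not the server, decides where the value lives.
    servers[pick_server(key)][key] = value


def cache_get(key):
    # The same function picks the server, so the lookup goes where the value was stored.
    return servers[pick_server(key)].get(key)


for k in ["tokyo", "kanagawa", "chiba", "saitama", "gunma"]:
    cache_set(k, k.upper())

print(cache_get("tokyo"))  # prints TOKYO
```

The servers never talk to each other here; all routing knowledge sits in pick_server, which is the point the text makes about client-side distribution.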
Next, let's look at the distribution method implemented by Cache::Memcached, the Perl client library mentioned in part 1.
The distribution method of Cache::Memcached
Cache::Memcached, Perl's memcached client library, was written by Brad Fitzpatrick, the author of memcached, and can be regarded as the original client library.
- Cache::Memcached - search.cpan.org
This library implements distribution, and its approach is the standard distribution method for memcached.
Distribution based on the remainder
Put simply, Cache::Memcached distributes keys "according to the remainder after dividing by the number of servers". It computes an integer hash value of the key, divides it by the number of servers, and selects the server from the remainder.
The following Perl script is a simplified illustration of what Cache::Memcached does.
use strict;
use warnings;
use String::CRC32;

my @nodes = ('node1', 'node2', 'node3');
my @keys  = ('tokyo', 'kanagawa', 'chiba', 'saitama', 'gunma');

foreach my $key (@keys) {
    my $crc    = crc32($key);           # CRC value of the key
    my $mod    = $crc % ($#nodes + 1);  # remainder by the number of servers
    my $server = $nodes[$mod];          # select the server based on the remainder
    printf "%s => %s\n", $key, $server;
}
Cache::Memcached uses CRC when computing the hash value.
- String::CRC32 - search.cpan.org
The script first computes the CRC value of the string and then picks the server from the remainder of dividing that value by the number of server nodes. Running the code prints the following:
tokyo => node2
kanagawa => node3
chiba => node2
saitama => node1
gunma => node1
According to this result, "tokyo" goes to node2, "kanagawa" goes to node3, and so on. Note that when the selected server cannot be reached, Cache::Memcached adds the number of connection attempts to the key, computes the hash value again, and tries to connect to the server selected by the new value. This action is called rehashing. If you do not want rehashing, specify the "rehash => 0" option when constructing the Cache::Memcached object (documented in current Cache::Memcached releases as "no_rehash => 1").
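The rehash idea can be sketched as follows. This is a simplified Python illustration under stated assumptions, not Cache::Memcached's actual code: the function name pick_with_rehash, the is_alive callback and the max_tries limit are hypothetical, and the exact way the attempt count is combined with the key is chosen only for the example.

```python
import zlib

nodes = ["node1", "node2", "node3"]


def pick_with_rehash(key, is_alive, max_tries=20):
    """Return the first reachable node for key, or None if every attempt fails.

    is_alive is a hypothetical callback that reports whether a node can be reached.
    """
    candidate = key
    for tries in range(max_tries):
        node = nodes[zlib.crc32(candidate.encode("utf-8")) % len(nodes)]
        if is_alive(node):
            return node
        # Rehash: mix the attempt count into the key and hash again.
        candidate = f"{tries}{key}"
    return None


home = pick_with_rehash("tokyo", lambda node: True)
fallback = pick_with_rehash("tokyo", lambda node: node != home)
print(home, fallback)
```

With every node reachable the first hash decides the server; when that server is down, the modified key is hashed until some other node answers or the attempts run out.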
Drawbacks of remainder-based distribution
The remainder method is simple and spreads data very well, but it has a drawback: when servers are added or removed, the cost of reorganizing the cache is very high. After a server is added, the remainders change drastically, so the client can no longer find the server the data was stored on, which hurts the cache hit rate. A short Perl script can verify this cost.
use strict;
use warnings;
use String::CRC32;

my @nodes = @ARGV;        # server names given on the command line
my @keys  = ('a' .. 'z');
my %nodes;                # server name => list of keys assigned to it

foreach my $key (@keys) {
    my $hash   = crc32($key);
    my $mod    = $hash % ($#nodes + 1);
    my $server = $nodes[$mod];
    push @{ $nodes{$server} }, $key;
}

foreach my $node (sort keys %nodes) {
    printf "%s: %s\n", $node, join ",", @{ $nodes{$node} };
}
This Perl script simulates storing the keys "a" through "z" in memcached and accessing them. Save it as mod.pl and run it.
First, with only three servers:
$ mod.pl node1 node2 node3
node1: a,c,d,e,h,j,n,u,w,x
node2: g,i,k,l,p,r,s,y
node3: b,f,m,o,q,t,v,z
As a result, node1 stores a, c, d, e, ..., node2 stores g, i, k, ..., and each server holds 8 to 10 items.
Next, add one memcached server.
$ mod.pl node1 node2 node3 node4
node1: d,f,m,o,t,v
node2: b,i,k,p,r,y
node3: e,g,l,n,u,w
node4: a,c,h,j,q,s,x,z
node4 has been added. Only d, i, k, p, r and y still hit. As this shows, the servers that keys map to change drastically after a node is added. Only 6 of the 26 keys still reach their original server; all the others have moved to a different one, and the hit rate drops to 23%. When memcached is used in a web application, cache efficiency plunges the moment a memcached server is added, the load concentrates on the database server, and there is a risk that the service can no longer operate normally.
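The comparison done by eye above can also be automated. The following Python sketch counts how many keys keep their server when the node list grows; the helper names assign and hit_rate are made up for this example, and it assumes Python's zlib.crc32 yields the same CRC values as String::CRC32 for these keys.

```python
import string
import zlib


def assign(key, nodes):
    """Remainder-based selection: hash the key, take it modulo the server count."""
    return nodes[zlib.crc32(key.encode("utf-8")) % len(nodes)]


def hit_rate(keys, before, after):
    """Return the fraction of keys whose server is unchanged, plus those keys."""
    kept = [k for k in keys if assign(k, before) == assign(k, after)]
    return len(kept) / len(keys), kept


keys = list(string.ascii_lowercase)
three = ["node1", "node2", "node3"]
four = three + ["node4"]

rate, kept = hit_rate(keys, three, four)
print(f"{len(kept)} of {len(keys)} keys stay put ({rate:.0%}): {','.join(kept)}")
```

There is a simple reason the rate lands near a quarter: a hash value h satisfies h % 3 == h % 4 only when h % 12 is 0, 1 or 2, so on average 3 out of every 12 hash values keep their server when going from three nodes to four.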
mixi's web application had this problem too, which made it practically impossible to add memcached servers. With a new distribution method, however, adding memcached servers is now easy. This distribution method is called Consistent Hashing.
Consistent Hashing
The idea behind Consistent Hashing has already been covered on the mixi development blog and in many other places, so only a brief explanation is given here.
- mixi Engineers' Blog - スマートな分散で快適キャッシュライフ (A comfortable cache life with smart distribution)
- ConsistentHashing - コンシステントハッシュ法 (the consistent hashing method)
A brief description of consistent hashing
Consistent Hashing works as follows. First, compute the hash value of each memcached server (node) and place it on a circle (the continuum) running from 0 to 2^32. Then compute the hash value of the key under which the data is stored, using the same method, and map it onto the circle. Starting from the point the key maps to, search clockwise and store the data on the first server found. If no server is found before passing 2^32, the data is stored on the first memcached server on the circle.
Figure 4 Consistent Hashing: Fundamentals
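The clockwise search with wrap-around can be sketched with a sorted list and a binary search. This is a minimal Python illustration with one point per server, not the code of any particular client library; the class name Continuum, the function ring_hash and the use of the first four bytes of an MD5 digest as the hash are assumptions made for the example.

```python
import bisect
import hashlib


def ring_hash(text):
    """Map a string to a point in 0 .. 2^32 - 1 (first 4 bytes of MD5, an assumption)."""
    return int.from_bytes(hashlib.md5(text.encode("utf-8")).digest()[:4], "big")


class Continuum:
    def __init__(self, nodes):
        # One point per server, sorted by position on the circle.
        self.points = sorted((ring_hash(n), n) for n in nodes)
        self.positions = [position for position, _ in self.points]

    def lookup(self, key):
        # First server at or after the key's position, going clockwise.
        i = bisect.bisect_left(self.positions, ring_hash(key))
        # Past the last server: wrap around to the first one on the circle.
        return self.points[i % len(self.points)][1]


ring = Continuum(["node1", "node2", "node3"])
for key in ["tokyo", "kanagawa", "chiba", "saitama", "gunma"]:
    print(key, "=>", ring.lookup(key))
```

Keeping the positions sorted makes each lookup a binary search, and the modulo on the index is what implements "if no server is found, use the first one".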
Now add one memcached server to the state shown in Figure 4. With the remainder-based algorithm, the servers holding the keys change drastically and the cache hit rate suffers; with Consistent Hashing, only the keys that lie between the point where the server was added and the first server counter-clockwise from it on the continuum are affected.
Figure 5 Consistent Hashing: adding a server
Consistent Hashing therefore keeps key redistribution to a minimum. In addition, some Consistent Hashing implementations use the idea of virtual nodes. With an ordinary hash function, the positions the servers map to are distributed very unevenly. Using virtual nodes, each physical node (server) is assigned 100 to 200 points on the continuum. This evens out the distribution and minimizes cache redistribution when servers are added or removed.
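Tests with a memcached client library that uses the Consistent Hashing algorithm, described below, show that the hit rate after adding servers can be estimated from the number of servers before the addition (n) and the number of servers added (m). The share of keys that are remapped, and therefore miss, is
(1 - n/(n+m)) * 100
so the expected hit rate is n/(n+m) * 100. For example, growing from 3 to 4 servers keeps about 75% of the keys in place, compared with 23% for the remainder method above.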
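The relationship between n, m and the hit rate can be checked with a small simulation. With evenly spread virtual nodes, the m new servers take over roughly m/(n+m) of the circle, so about n/(n+m) of the keys keep their server. The Python sketch below measures this; the names ring_hash, build_ring, lookup and hit_rate, the "node-i" labels for virtual points, the MD5-based hash and the default of 150 points per server are all assumptions for the illustration and do not reproduce libketama's exact scheme.

```python
import bisect
import hashlib


def ring_hash(text):
    """Map a string to a point in 0 .. 2^32 - 1 (first 4 bytes of MD5, an assumption)."""
    return int.from_bytes(hashlib.md5(text.encode("utf-8")).digest()[:4], "big")


def build_ring(nodes, points_per_node=150):
    """Place points_per_node virtual points for every server on the circle."""
    return sorted((ring_hash(f"{n}-{i}"), n) for n in nodes for i in range(points_per_node))


def lookup(ring, key):
    # (position,) sorts before any (position, name), so this finds the first
    # virtual point at or after the key's position; the modulo wraps around.
    i = bisect.bisect_left(ring, (ring_hash(key),))
    return ring[i % len(ring)][1]


def hit_rate(n, m, key_count=10000):
    """Percentage of keys that stay on the same server after growing from n to n + m."""
    old = build_ring([f"node{i}" for i in range(1, n + 1)])
    new = build_ring([f"node{i}" for i in range(1, n + m + 1)])
    keys = [f"key{i}" for i in range(key_count)]
    kept = sum(lookup(old, k) == lookup(new, k) for k in keys)
    return 100 * kept / key_count


n, m = 3, 1
print(f"measured: {hit_rate(n, m):.1f}%   n/(n+m): {100 * n / (n + m):.1f}%")
```

The measured figure will not match the estimate exactly, because the virtual points only approximate an even split of the circle; raising points_per_node should bring the two closer.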
Libraries that support Consistent Hashing
Cache::Memcached does not support Consistent Hashing, but several client libraries already support this new distribution algorithm. The first memcached client library to support Consistent Hashing and virtual nodes was libketama, a library for PHP developed by last.fm.
- libketama - a consistent hashing algo for memcache clients - RJ ブログ - Users at Last.fm
As for Perl clients, Cache::Memcached::Fast and Cache::Memcached::libmemcached, both introduced in part 1 of this series, support Consistent Hashing.
- Cache::Memcached::Fast - search.cpan.org
- Cache::Memcached::libmemcached - search.cpan.org
Both have almost the same interface as Cache::Memcached, so if you are using Cache::Memcached they are easy to drop in. Cache::Memcached::Fast is a reimplementation of libketama; to use Consistent Hashing, specify the ketama_points option when creating the object.
my $memcached = Cache::Memcached::Fast->new({
    servers       => ["192.168.0.1:11211", "192.168.0.2:11211"],
    ketama_points => 150,   # virtual points per server on the continuum
});
Cache::Memcached::libmemcached is a Perl module that uses libmemcached, the C library developed by Brian Aker. libmemcached itself supports several distribution algorithms, Consistent Hashing among them, and its Perl binding supports Consistent Hashing as well.
- Tangent Software: libmemcached
Summary
This article introduced memcached's distributed algorithms: mainly how memcached distributes data, and Consistent Hashing, the algorithm implemented by client libraries to distribute data efficiently. Next time we will share some of mixi's experience running memcached, along with related compatible applications.