Cache::memcached uses CRC when it is seeking a hash value. string::crc32-search.cpan.org
The CRC value of the string is first evaluated, and the server is determined by the remainder of the number of server nodes divided by the value. After the code above executes, enter the following results:
Tokyo => node2
Kanagawa => node3
Chiba => node2 Saitama => Node1 Gunma => Node1
According to the results, "Tokyo" dispersed to Node2, "Kanagawa" dispersed to node3 and so on. To put it another way, when the selected server fails to connect, Cache::memcached adds the number of connections to the key, computes the hash again and attempts to connect. This action is called rehash. When you do not want to rehash, you can specify the Rehash => 0 option when generating the Cache::memcached object. to compute the dispersion disadvantage based on the remainder
The method of remainder calculation is simple, and the dispersion of data is excellent, but it also has its disadvantages. That is, when the server is added or removed, the cost of a cache reorganization is significant. When you add a server, the remainder changes dramatically, which makes it impossible to get the same server as when you save, thereby affecting the cache hit rate. Write snippets of code in Perl to verify the cost.
Use strict;
Use warnings;
Use STRING::CRC32;
my @nodes = @ARGV;
My @keys = (' a '.. ' Z ');
my%nodes;
The foreach my $key (@keys) {my
$hash = CRC32 ($key);
My $mod = $hash% ($ #nodes + 1);
My $server = $nodes [$mod];
Push @{$nodes {$server}}, $key;
}
foreach My $node (sort keys%nodes) {
printf '%s:%s/n ', $node, join ', ', @{$nodes {$node}}}
This Perl script shows you how to save the Keys "a" through "Z" to memcached and access. Save it as mod.pl and execute it.
First, when the server is only three:
$ mod.pl node1 node2 nod3
node1:a,c,d,e,h,j,n,u,w,x
node2:g,i,k,l,p,r,s,y
As a result, Node1 saves A, C, D, e......,node2 save G, I, k ..., and each server holds 8 to 10 data.
Next, add a memcached server.
$ mod.pl node1 node2 node3 node4 node1:d,f,m,o,t,v node2:b,i,k,p,r,y node3:e,g,l,n,u,w node4:a,c,h
, J,q,s,x,z
Added a node4. Visible, only D, I, K, p, R, y hit. Like this, the server where the key is dispersed after the node is added will change dramatically. Only six of the 26 keys are accessing the original server, and all others are moved to another server. The hit rate dropped to 23%. When you use memcached in a Web application, the instant cache efficiency of adding a memcached server is significantly reduced, the load is concentrated on the database server, and there is a risk that you will not be able to provide normal services.
This problem also applies to Mixi Web applications, which makes it impossible to add memcached servers. But with the new distributed approach, it is now easy to add memcached servers. This distributed approach is called consistent hashing. Consistent hashing
About consistent hashing thought, Mixi development blog and so on many places have introduced, here only briefly explained. Mixi engineers ' Blog-スマートな dispersed で Quick キャッシュライフconsistenthashing-コンシステントハッシュ method Consistent hashing simple description
The consistent hashing is as follows: first find the hash value of the memcached Server (node) and configure it to the 0~232 Circle (Continuum). The same method is then used to find the hash value of the key that stores the data and map it to the circle. Then start looking clockwise from where the data maps to, saving the data to the first server you find. If more than 232 still cannot find the server, it is saved to the first memcached server.
Fig. 4 Consistent hashing: fundamentals
Adds a memcached server from the state of the diagram above. Remainder distributed algorithm because the server that holds the key changes dramatically, it affects the cache hit rate, but in consistent hashing, only the keys on the first server where the server is added to the continuum are affected.
Figure 5 Consistent hashing: adding a server
Therefore, consistent hashing minimizes the redistribution of keys. Moreover, some consistent hashing methods also adopt the idea of virtual node. Using a generic hash function, the map location of the server is distributed very unevenly. Therefore, the idea of the virtual node is used to allocate 100~200 points on the continuum for each physical node (server). This can inhibit the uneven distribution, minimize the server increase or decrease when the cache redistribution.
The result of testing with the Memcached client function library using the consistent hashing algorithm, described later in this article, is that the number of server units (n) and the increased number of server units (m) calculate the hit-rate formula after the server is added as follows:
(1-n/(n+m)) a function library that supports consistent hashing
Although the cache::memcached in this series are not supported by consistent hashing, there are several client libraries that support this new distributed algorithm. The first memcached client function library that supports consistent hashing and virtual nodes is a PHP library called Libketama, developed by Last.fm. Libketama-a consistent hashing algo for memcache clients–rjブログ-Users at Last.fm
As for the Perl client, the Cache::memcached::fast and cache::memcached::libmemcached supported consistent hashing, as described in the serial 1th time. Cache::memcached::fast-search.cpan.org cache::memcached::libmemcached-search.cpan.org
Both interfaces are almost identical to cache::memcached, and if you are using cache::memcached, you can easily replace them. Cache::memcached::fast Libketama, you can specify the ketama_points option when you create an object using consistent hashing.
My $memcached = cache::memcached::fast->new ({
servers => ["192.168.0.1:11211", "192.168.0.2:11211"],
Ketama_points =>
});
In addition, Cache::memcached::libmemcached is a Perl module that uses the C function library libmemcached developed by Brain Aker. The libmemcached itself supports several distributed algorithms, as well as consistent hashing, whose Perl bindings also support consistent hashing. summary of Tangent software:libmemcached
This paper introduces the distributed algorithm of Memcached, which is mainly memcached distributed by client function library, and consistent hashing algorithm of efficiently distributing data. The next time you will introduce some of Mixi's experience in memcached applications, and related compatible applications.