Memcached Comprehensive Analysis – 4. The Distributed Algorithm of memcached

Source: Internet
Author: User
Tags: crc32, rehash, Perl script

Series Article Navigation:

Memcached Comprehensive Analysis – 1. The Foundation of memcached

Memcached Comprehensive Analysis – 2. Understanding the Memory Storage of memcached

Memcached Comprehensive Analysis – 3. The Deletion Mechanism and Development Direction of memcached

Memcached Comprehensive Analysis – 4. The Distributed Algorithm of memcached

Memcached Comprehensive Analysis – 5. memcached Application and Compatibility Programs


Release date: 2008/7/23
Author: Masahiro Nagano
Original link: http://gihyo.jp/dev/feature/01/memcached/0004

I'm Nagano from mixi. The internals of memcached were introduced in parts 2 and 3 by Maesaka. This installment leaves the internal structure behind and introduces memcached's distribution.

The distributed nature of memcached

As described in part 1, memcached is called a "distributed" cache server, yet there is no "distributed" functionality on the server side at all. The server implements only the memory storage introduced in parts 2 and 3, and its implementation is very simple. Distribution in memcached is implemented entirely by the client library, and this design is memcached's biggest feature.

What does "distributed" mean for memcached?

The word "distributed" has come up many times now without a detailed explanation. Here is a brief overview of the principle; the implementations of the various clients are essentially the same.

Assume there are three memcached servers, node1 through node3, and the application wants to store data under the keys "tokyo", "kanagawa", "chiba", "saitama", and "gunma".

Figure 1 Distributed Introduction: Preparing

First, "tokyo" is added to memcached. When "tokyo" is passed to the client library, the algorithm implemented by the client decides, based on the key, which memcached server will hold the data. Once the server is selected, the client commands it to store "tokyo" and its value.

Figure 2 Distributed Introduction: When adding

Similarly, for "kanagawa", "chiba", "saitama", and "gunma", a server is selected first and the data is then stored on it.

Next, the stored data is retrieved. The key to fetch, "tokyo", is likewise passed to the library. The library selects a server from the key using the same algorithm as when the data was stored. Because the algorithm is the same, the selected server matches the one used for storage, and the client sends it a get command. As long as the data has not been deleted for some reason, the stored value is returned.

Figure 3 Distributed Introduction: When getting

In this way, memcached is distributed by storing different keys on different servers. As memcached servers are added, keys spread out further, and even if one memcached server fails and cannot be reached, the other caches are unaffected and the system keeps running.
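The key point — saving and fetching both run the key through the same deterministic selection — can be sketched in Python (an illustrative sketch with a hypothetical `pick_server` helper, not any real client's code):

```python
import zlib

def pick_server(key, servers):
    """Deterministically map a key to one server in the list."""
    index = zlib.crc32(key.encode()) % len(servers)
    return servers[index]

servers = ["node1", "node2", "node3"]

# set("tokyo", ...) and get("tokyo") both consult pick_server,
# so the read goes to the same server that stored the value.
assert pick_server("tokyo", servers) == pick_server("tokyo", servers)
print(pick_server("tokyo", servers))
```

Any client that uses the same algorithm and the same server list will pick the same server for the same key, which is all the "distribution" there is.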

Next, we look at the distribution method implemented by Cache::Memcached, the Perl client library mentioned in part 1.

The distribution method of Cache::Memcached

Cache::Memcached, the Perl memcached client library, is the work of Brad Fitzpatrick, the author of memcached himself, and can be considered the original client library.

    • Cache::Memcached – search.cpan.org

This library implements the distribution feature, and its method has become the standard way of distributing memcached.

Distributing by remainder

Cache::Memcached's distribution method is, in short, "dispersal by the remainder of division by the number of servers." An integer hash of the key is computed, divided by the number of servers, and the server is selected according to the remainder.

The following Perl script illustrates Cache::Memcached's method in simplified form.

use strict;
use warnings;
use String::CRC32;

my @nodes = ('node1', 'node2', 'node3');
my @keys  = ('tokyo', 'kanagawa', 'chiba', 'saitama', 'gunma');

foreach my $key (@keys) {
    my $crc    = crc32($key);            # CRC32 of the key
    my $mod    = $crc % ( $#nodes + 1 ); # remainder by server count
    my $server = $nodes[$mod];           # select server by remainder
    printf "%s => %s\n", $key, $server;
}

Cache::Memcached uses CRC32 to compute the hash value.

    • String::CRC32 – search.cpan.org

The CRC32 of the key string is computed first, and the server is determined by the remainder of dividing that value by the number of server nodes. Running the code above produces the following output:

tokyo    => node2
kanagawa => node3
chiba    => node2
saitama  => node1
gunma    => node1

According to this result, "tokyo" is dispersed to node2, "kanagawa" to node3, and so on. As a side note, when the selected server cannot be connected to, Cache::Memcached appends the number of connection attempts to the key, computes the hash value again, and tries to connect. This behavior is called rehashing. If you do not want rehashing, specify the no_rehash option when constructing the Cache::Memcached object.
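The rehash behavior can be sketched roughly as follows (a simplified Python illustration of the idea; the real Cache::Memcached logic differs in its details):

```python
import zlib

def pick_server_with_rehash(key, servers, is_alive, max_retries=3):
    """Select a server by CRC32 remainder; on connection failure,
    perturb the key and hash again (the "rehash" idea)."""
    candidate_key = key
    for attempt in range(max_retries + 1):
        server = servers[zlib.crc32(candidate_key.encode()) % len(servers)]
        if is_alive(server):
            return server
        # Append the attempt count to the key and hash again.
        candidate_key = f"{attempt}{key}"
    return None  # every attempt landed on a dead server

servers = ["node1", "node2", "node3"]
# Pretend node2 is unreachable; the key is rehashed until a live server is found.
print(pick_server_with_rehash("tokyo", servers, lambda s: s != "node2"))
```

The drawback, of course, is that a rehashed key lands on a server that never stored it, so the first read after a failure is a cache miss.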

Drawbacks of remainder-based distribution

The remainder method is simple and disperses data well, but it has a weakness: when servers are added or removed, the cost of reorganizing the cache is enormous. Adding a server changes the remainders, so most keys no longer map to the server where they were stored, which hurts the cache hit rate. The following Perl snippet verifies this cost.

use strict;
use warnings;
use String::CRC32;

my @nodes = @ARGV;
my @keys  = ('a' .. 'z');
my %nodes;

foreach my $key (@keys) {
    my $hash   = crc32($key);
    my $mod    = $hash % ( $#nodes + 1 );
    my $server = $nodes[$mod];
    push @{ $nodes{$server} }, $key;
}

foreach my $node ( sort keys %nodes ) {
    printf "%s: %s\n", $node, join ",", @{ $nodes{$node} };
}

This script maps the keys "a" through "z" onto the given memcached servers and prints which keys each server holds. Save it as mod.pl and run it.

First, with three servers:

$ perl mod.pl node1 node2 node3
node1: a,c,d,e,h,j,n,u,w,x
node2: g,i,k,l,p,r,s,y
node3: b,f,m,o,q,t,v,z

As the result shows, node1 holds a, c, d, e…, node2 holds g, i, k…; each server stores 8 to 10 keys.

Next, add one memcached server:

$ perl mod.pl node1 node2 node3 node4
node1: d,f,m,o,t,v
node2: b,i,k,p,r,y
node3: e,g,l,n,u,w
node4: a,c,h,j,q,s,x,z

node4 has been added. As you can see, only d, i, k, p, r, and y still map to the server where they were originally stored. After a node is added, the servers to which keys are distributed change dramatically: only 6 of the 26 keys still hit their original server, and all the others have moved elsewhere. The hit rate drops to 23%. When memcached is used in a web application, cache efficiency plummets the moment servers are added, load concentrates on the database servers, and there is a risk that normal service cannot be provided.
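The redistribution above can be reproduced with a short simulation (a Python sketch; `zlib.crc32` computes the same standard CRC-32 that String::CRC32 does):

```python
import zlib

def assign(keys, servers):
    """Map each key to a server by CRC32 remainder."""
    return {k: servers[zlib.crc32(k.encode()) % len(servers)] for k in keys}

keys = [chr(c) for c in range(ord("a"), ord("z") + 1)]

before = assign(keys, ["node1", "node2", "node3"])
after  = assign(keys, ["node1", "node2", "node3", "node4"])

# Keys that still map to the same server after node4 joins.
survivors = [k for k in keys if before[k] == after[k]]
print(f"{len(survivors)}/{len(keys)} keys stayed put "
      f"({100 * len(survivors) // len(keys)}% hit rate)")
```

Adding one server changes the divisor from 3 to 4, so almost every remainder changes; the surviving fraction is essentially luck.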

This problem also affected mixi's web applications and made it impractical to add memcached servers. A newer distribution method solved it, and adding memcached servers is now easy. That method is called consistent hashing.

Consistent Hashing

The idea of consistent hashing has been covered on the mixi engineers' blog and in many other places, so only a brief explanation is given here.

    • Mixi Engineers' Blog – "A comfortable cache life through smart distribution" (スマートな分散で快適キャッシュライフ)
    • ConsistentHashing – the consistent hashing method (コンシステントハッシュ法)

A brief description of consistent hashing

Consistent hashing works as follows. First, compute the hash of each memcached server (node) and place it on a circle of points from 0 to 2^32 (the "continuum"). Then compute the hash of the key for the data to be stored by the same method and map it onto the circle. Starting from that point, search clockwise and store the data on the first server found. If no server is found before wrapping past 2^32, the data is stored on the first memcached server.

Figure 4 Consistent Hashing: Fundamentals

Now add a memcached server to this state. With remainder-based distribution, the hit rate collapses because the servers holding the keys change dramatically; with consistent hashing, the only keys affected are those lying on the continuum between the newly added server and the first server counter-clockwise from it.

Figure 5 Consistent Hashing: adding a server

Consistent hashing thus minimizes the redistribution of keys. Moreover, some consistent hashing implementations go further with the idea of virtual nodes. With an ordinary hash function, the servers' positions on the circle come out very uneven. To counter this, each physical node (server) is assigned 100 to 200 points on the continuum. This suppresses the uneven distribution and minimizes cache redistribution when servers are added or removed.
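A minimal sketch of a continuum with virtual nodes, assuming CRC32 as the hash and 150 points per server (illustrative only, not a production implementation):

```python
import bisect
import zlib

class ConsistentHashRing:
    """A 0..2^32 circle (continuum) of server points."""

    def __init__(self, servers, points_per_server=150):
        # Each physical server contributes many "virtual node" points,
        # which evens out the distribution on the circle.
        self._ring = sorted(
            (zlib.crc32(f"{server}-{i}".encode()), server)
            for server in servers
            for i in range(points_per_server)
        )

    def pick(self, key):
        """Walk clockwise from the key's hash to the first server point."""
        h = zlib.crc32(key.encode())
        i = bisect.bisect_left(self._ring, (h, ""))
        if i == len(self._ring):  # wrapped past 2^32: take the first point
            i = 0
        return self._ring[i][1]

ring = ConsistentHashRing(["node1", "node2", "node3"])
print(ring.pick("tokyo"))
```

Adding node4 only inserts node4's points into the sorted ring; a key moves only if one of those new points now sits between the key and its old server, and in that case it necessarily moves to node4. Every other key keeps its server.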

According to tests with a memcached client library that implements consistent hashing (introduced below), with n original servers and m added servers, the hit rate immediately after adding servers is approximately:

hit rate = n / (n + m) × 100 (%)

In other words, only (1 − n/(n+m)) × 100 percent of the keys are relocated. For example, going from 3 servers to 4 preserves about 75% of cache hits, compared with 23% under the remainder method above.

Libraries supporting consistent hashing

Although Cache::Memcached does not support consistent hashing, several other client libraries already support this newer distribution algorithm. The first memcached client library to support consistent hashing and virtual nodes was libketama, a PHP library developed by last.fm.

    • libketama – a consistent hashing algo for memcache clients – RJ's blog – Users at Last.fm

As for Perl clients, Cache::Memcached::Fast and Cache::Memcached::libmemcached, both mentioned in part 1 of this series, support consistent hashing.

    • Cache::Memcached::Fast – search.cpan.org
    • Cache::Memcached::libmemcached – search.cpan.org

Both have interfaces almost identical to Cache::Memcached, so if you are already using Cache::Memcached they are easy drop-in replacements. Cache::Memcached::Fast reimplements libketama; specifying the ketama_points option when creating the object enables consistent hashing.

my $memcached = Cache::Memcached::Fast->new({
    servers       => ["192.168.0.1:11211", "192.168.0.2:11211"],
    ketama_points => 150,
});

In addition, Cache::Memcached::libmemcached is a Perl module that uses libmemcached, a C library developed by Brian Aker. libmemcached itself supports several distribution algorithms, including consistent hashing, and its Perl binding supports consistent hashing as well.

    • Tangent Software: libmemcached
Summary

This article introduced memcached's distributed algorithm: distribution is implemented entirely by the client library, and consistent hashing is the most effective way to disperse the data. Next time, we will introduce some of mixi's practical experience with memcached applications, along with related compatible applications.

