[Learn More-memcached] memcached Distributed Algorithm

Source: Internet
Author: User
Tags crc32 rehash perl script
Document directory
  • What does memcached mean by its distributed architecture?
  • Distribution based on remainder calculation
  • Disadvantages of remainder-based distribution
  • A brief description of consistent hashing
  • Function libraries supporting consistent hashing

This article is reprinted. Address: http://blog.csdn.net/bintime/article/details/6259133

@ Bintime

As described in part 1 of this series, although memcached is called a "distributed" cache server, the server side has no distribution function. The server provides only the memory storage functions introduced in parts 2 and 3, and its implementation is very simple. Distribution is implemented entirely by the client library. This style of distribution is memcached's most distinctive feature.

What does memcached mean by its distributed architecture?

The word "distributed" has been used many times so far without a detailed explanation. Let's briefly go over how it works. The implementation is basically the same in every client.

Assume there are three memcached servers, node1 through node3, and that the application stores data under the keys "tokyo", "kanagawa", "chiba", "saitama", and "gunma".

Figure 1 Introduction to distribution: preparation

First, add "tokyo" to memcached. When "tokyo" is passed to the client library, the algorithm implemented in the client decides, based on the key, which memcached server will store the data. Once the server is selected, the client sends it the command to save "tokyo" and its value.

Figure 2 Introduction to distribution: adding data

Similarly, for "kanagawa", "chiba", "saitama", and "gunma", a server is selected first and then the data is saved.

Next, retrieve the saved data. The key to retrieve, "tokyo", is again passed to the function library. The library selects a server from the key using the same algorithm it used when storing the data. Because the algorithm is the same, it selects the same server that stored the data and sends the get command to it. As long as the data has not been deleted for some reason, the saved value is returned.

Figure 3 Introduction to distribution: retrieving data

In this way, memcached achieves distribution by storing different keys on different servers. As the number of memcached servers grows, the keys are spread across more servers. Even if one memcached server becomes unreachable, the other caches are unaffected and the system keeps running.
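The store-and-retrieve flow above can be sketched in a few lines. This is a minimal illustration in Python rather than Perl, and it makes several assumptions: plain dictionaries stand in for the memcached servers, the class name `ShardedClient` is hypothetical, and a CRC-32 remainder is used only as a placeholder for "some deterministic function of the key".

```python
import zlib


class ShardedClient:
    """Toy client: the client, not the server, decides where a key lives."""

    def __init__(self, servers):
        # servers maps a server name to a dict that stands in for that server.
        self._servers = servers
        self._names = sorted(servers)

    def _pick(self, key):
        # Placeholder selection rule: any deterministic function of the key works.
        return self._names[zlib.crc32(key.encode()) % len(self._names)]

    def set(self, key, value):
        self._servers[self._pick(key)][key] = value

    def get(self, key):
        # The same rule is applied on reads, so the same server is chosen.
        return self._servers[self._pick(key)].get(key)


servers = {"node1": {}, "node2": {}, "node3": {}}
client = ShardedClient(servers)
for key in ["tokyo", "kanagawa", "chiba", "saitama", "gunma"]:
    client.set(key, key.upper())

print(client.get("tokyo"))
for name, store in sorted(servers.items()):
    print(name, sorted(store))
```

The servers never talk to each other here; all routing knowledge lives in `_pick`, which is the point the text makes about client-side distribution.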

Next, we look at the distribution method implemented by Cache::Memcached, the Perl client function library mentioned in part 1.

Distribution method of Cache::Memcached

Cache::Memcached, the Perl client function library for memcached, was written by Brad Fitzpatrick, the creator of memcached. It can be regarded as the original client library.

  • Cache::Memcached - search.cpan.org

This library implements distribution on the client side, and its approach is the standard distribution method for memcached.

Distribution based on remainder calculation

Put simply, the distribution method of Cache::Memcached is "distribution based on the remainder of division by the number of servers". It computes an integer hash value from the key, divides it by the number of servers, and selects the server according to the remainder.

The logic of Cache::Memcached can be simplified to the following Perl script.

    use strict;
    use warnings;
    use String::CRC32;

    my @nodes = ('node1', 'node2', 'node3');
    my @keys  = ('tokyo', 'kanagawa', 'chiba', 'saitama', 'gunma');

    foreach my $key (@keys) {
        my $crc    = crc32($key);             # CRC value of the key
        my $mod    = $crc % ($#nodes + 1);    # remainder by the number of nodes
        my $server = $nodes[$mod];            # select the server based on the remainder
        printf "%s => %s\n", $key, $server;
    }

Cache::Memcached uses CRC to compute the hash value.

  • String::CRC32 - search.cpan.org

The script first computes the CRC value of the key string, then selects the server from the remainder of dividing that value by the number of server nodes. Running the code above prints the following result:

    tokyo    => node2
    kanagawa => node3
    chiba    => node2
    saitama  => node1
    gunma    => node1

According to this result, "tokyo" is assigned to node2 and "kanagawa" to node3. When the selected server cannot be reached, Cache::Memcached adds the number of connection attempts to the key, computes the hash value again, and tries to connect to the newly selected server. This behavior is called rehash. If you do not want rehashing, specify the "no_rehash => 1" option when creating the Cache::Memcached object.
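The rehash behavior can be sketched as follows. This is an illustration in Python, not the library's actual code: the function name `pick_with_rehash`, the `is_alive` callback, the retry limit, and the exact way the attempt number is mixed into the hash are all assumptions made for the example.

```python
import zlib


def pick_with_rehash(key, nodes, is_alive, rehash=True, max_tries=20):
    """Return the first reachable node for key, or None if none is found."""
    hv = zlib.crc32(key.encode())
    for attempt in range(max_tries):
        node = nodes[hv % len(nodes)]
        if is_alive(node):
            return node
        if not rehash:
            # Rehash disabled: give up after the first failed server.
            return None
        # Rehash: combine the attempt number with the key and hash again.
        hv += zlib.crc32(f"{attempt}{key}".encode())
    return None


nodes = ["node1", "node2", "node3"]
down = {"node2"}  # pretend node2 cannot be reached
for key in ["tokyo", "kanagawa", "chiba", "saitama", "gunma"]:
    chosen = pick_with_rehash(key, nodes, lambda n: n not in down)
    print(f"{key} => {chosen}")
```

With rehash enabled, a key whose home server is down is usually redirected to another server; with `rehash=False` the lookup simply fails, which some applications prefer so that a key never lives on two servers.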

Disadvantages of remainder-based distribution

Remainder-based distribution is simple and spreads data very evenly, but it has a drawback: when a server is added or removed, the cost of reorganizing the cache is huge. After a server is added, the remainders change drastically, so a key no longer maps to the server where it was stored, and the cache hit rate drops. The following Perl snippet verifies this cost.

    use strict;
    use warnings;
    use String::CRC32;

    my @nodes = @ARGV;
    my @keys  = ('a' .. 'z');
    my %nodes;

    # Assign each key to a server by the remainder of its CRC value.
    foreach my $key (@keys) {
        my $hash   = crc32($key);
        my $mod    = $hash % ($#nodes + 1);
        my $server = $nodes[$mod];
        push @{ $nodes{$server} }, $key;
    }

    # Print the keys held by each server.
    foreach my $node (sort keys %nodes) {
        printf "%s: %s\n", $node, join ",", @{ $nodes{$node} };
    }

This Perl script shows which server each of the keys "a" through "z" is assigned to when it is saved to memcached and accessed. Save it as mod.pl and run it.

First, when there are only three servers:

    $ mod.pl node1 node2 node3
    node1: a,c,d,e,h,j,n,u,w,x
    node2: g,i,k,l,p,r,s,y
    node3: b,f,m,o,q,t,v,z

As the result shows, node1 stores a, c, d, e, and so on, node2 stores g, i, k, and so on, and each server holds 8 to 10 keys.

Next we will add a memcached server.

    $ mod.pl node1 node2 node3 node4
    node1: d,f,m,o,t,v
    node2: b,i,k,p,r,y
    node3: e,g,l,n,u,w
    node4: a,c,h,j,q,s,x,z

node4 has been added. Only d, i, k, p, r, and y are still hit. After a node is added, the assignment of keys to servers changes drastically: only 6 of the 26 keys still map to their original server, and all the others have moved to another server. The hit rate drops to 23%. When memcached is used in a web application, cache efficiency falls sharply the moment a memcached server is added, the load concentrates on the database server, and the service may fail to operate normally.
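The same comparison can be automated instead of read off by eye. The sketch below is in Python and assumes that `zlib.crc32` is an acceptable stand-in for String::CRC32; the helper names `assign` and `unchanged_fraction` are made up for this example.

```python
import string
import zlib


def assign(keys, nodes):
    """Map each key to a node by the remainder of its CRC-32 value."""
    return {k: nodes[zlib.crc32(k.encode()) % len(nodes)] for k in keys}


def unchanged_fraction(keys, before, after):
    """Fraction of keys that map to the same node in both server lists."""
    old = assign(keys, before)
    new = assign(keys, after)
    same = sum(1 for k in keys if old[k] == new[k])
    return same / len(keys)


keys = list(string.ascii_lowercase)
three = ["node1", "node2", "node3"]
four = three + ["node4"]

frac = unchanged_fraction(keys, three, four)
print(f"{frac:.0%} of keys stay on the same server after adding node4")
```

Running it with larger key sets or different server counts gives a quick feel for how much of the cache is invalidated by a change in the number of servers.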

Mixi's web application ran into this problem as well, which made it impossible to add memcached servers. With a new distribution method, however, memcached servers can now be added easily. This method is called consistent hashing.

Consistent hashing

The idea of consistent hashing has been introduced in many places, including the development blog of Mixi Corporation, so only a brief description is given here.

  • Mixi Engineers' Blog - wide spread faster than ever before
  • ConsistentHashing

A brief description of consistent hashing

Consistent hashing works as follows. First, compute the hash value of each memcached server (node) and place it on a circle (the continuum) ranging from 0 to 2^32. Then compute the hash value of the key under which the data is stored, using the same method, and map it onto the circle. Starting from the position where the key is mapped, search clockwise and save the data on the first server found. If no server is found before passing 2^32, the data is saved on the first memcached server on the circle.
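The clockwise search can be sketched with a sorted list and a binary search. This is a minimal Python illustration under stated assumptions: the 32-bit position is taken from the first four bytes of an MD5 digest, which is a choice made for the example and not something the text prescribes, and the names `ring_hash` and `Ring` are hypothetical.

```python
import bisect
import hashlib


def ring_hash(value):
    """Map a string to a position in [0, 2**32) on the continuum."""
    return int.from_bytes(hashlib.md5(value.encode()).digest()[:4], "big")


class Ring:
    def __init__(self, nodes):
        # One point per server, sorted by position on the circle.
        self._points = sorted((ring_hash(n), n) for n in nodes)
        self._positions = [pos for pos, _ in self._points]

    def lookup(self, key):
        # First server at or after the key's position, searching clockwise.
        i = bisect.bisect_left(self._positions, ring_hash(key))
        if i == len(self._points):
            i = 0  # passed 2**32: wrap around to the first server
        return self._points[i][1]


ring = Ring(["node1", "node2", "node3"])
for key in ["tokyo", "kanagawa", "chiba", "saitama", "gunma"]:
    print(f"{key} => {ring.lookup(key)}")
```

With only one point per server the arcs between servers can be very unequal, so the split of keys is often lopsided in this basic form.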

Figure 4 Consistent hashing: basic principle

Now add one memcached server to this state. With the remainder-based algorithm, the cache hit rate suffers because the server holding each key changes drastically. With consistent hashing, only the keys that lie between the newly added server and the first server counterclockwise from it on the continuum are affected.

Figure 5 Consistent hashing: adding a server

Consistent hashing therefore keeps key redistribution to a minimum. In addition, some consistent hashing implementations adopt the idea of virtual nodes. With an ordinary hash function, the positions the servers map to are distributed very unevenly. Virtual nodes address this by assigning each physical node (server) 100 to 200 points on the continuum. This suppresses the uneven distribution and minimizes cache redistribution when servers are added or removed.
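The effect of virtual nodes and of adding a server can be checked with a small experiment. The Python sketch below is illustrative only: the MD5-based position, the `"node#i"` naming of virtual points, the default of 150 points per server, and the helper names `build_ring` and `lookup` are all assumptions and do not reproduce any particular library.

```python
import bisect
import hashlib


def ring_hash(value):
    """Map a string to a position in [0, 2**32) on the continuum."""
    return int.from_bytes(hashlib.md5(value.encode()).digest()[:4], "big")


def build_ring(nodes, points_per_node=150):
    """Place points_per_node virtual points for every server on the circle."""
    points = sorted(
        (ring_hash(f"{node}#{i}"), node)
        for node in nodes
        for i in range(points_per_node)
    )
    positions = [pos for pos, _ in points]
    owners = [node for _, node in points]
    return positions, owners


def lookup(ring, key):
    positions, owners = ring
    # Clockwise search; the modulo wraps past the last point to the first.
    i = bisect.bisect_left(positions, ring_hash(key)) % len(positions)
    return owners[i]


keys = [f"key{i}" for i in range(1000)]
old_ring = build_ring(["node1", "node2", "node3"])
new_ring = build_ring(["node1", "node2", "node3", "node4"])

moved = [k for k in keys if lookup(old_ring, k) != lookup(new_ring, k)]
print(f"{len(moved)} of {len(keys)} keys moved after adding node4")
print("all moved keys now live on node4:",
      all(lookup(new_ring, k) == "node4" for k in moved))
```

Every key that moves ends up on the new server, and keys that do not fall in front of one of the new server's points keep their old server, which is the property the text describes.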

Tests with a memcached client function library that uses the consistent hashing algorithm described below showed that the hit rate after adding servers can be calculated from the current number of servers (n) and the number of servers added (m) with the following formula:

(1 - n / (n + m)) * 100

Function libraries supporting consistent hashing

Although memcached itself does not support consistent hashing, several client function libraries support this new distribution algorithm. The first memcached client library to support consistent hashing and virtual nodes was libketama, a PHP library developed by Last.fm.

  • libketama - a consistent hashing algo for memcache clients - RJ has already existed - Users at Last.fm

As for Perl clients, Cache::Memcached::Fast and Cache::Memcached::libmemcached, both introduced in part 1 of this series, support consistent hashing.

  • Cache::Memcached::Fast - search.cpan.org
  • Cache::Memcached::libmemcached - search.cpan.org

Both provide almost the same interface as Cache::Memcached, so if you are already using Cache::Memcached, switching is easy. Cache::Memcached::Fast contains its own reimplementation of libketama. To use consistent hashing, specify the ketama_points option when creating the object.

    my $memcached = Cache::Memcached::Fast->new({
        servers       => ["192.168.0.1:11211", "192.168.0.2:11211"],
        ketama_points => 150,
    });

Cache::Memcached::libmemcached, in turn, is a Perl module that uses libmemcached, the C function library developed by Brian Aker. libmemcached itself supports several distribution algorithms, including consistent hashing, and its Perl binding supports consistent hashing as well.

  • Tangent Software: libmemcached

Summary

This article introduced memcached's distribution algorithms: distribution is implemented mainly by the client function library, and consistent hashing disperses data efficiently. Next time, we will introduce some of Mixi's experience with memcached applications and related compatible applications.

