Chapter 4: memcached's distributed algorithm

4.1 What memcached's distribution means
As described in Chapter 1, although memcached is called a "distributed" cache server, the server side has no "distribution" functionality at all. The server provides only the memory storage functionality described in Chapters 2 and 3, which keeps it simple to implement. memcached's distribution is implemented entirely by the client library, and this design is one of memcached's defining features.
So what does memcached's "distributed" architecture actually look like? The word has come up many times so far without a detailed explanation, so here is a brief introduction to how it works. The details vary between client libraries, but the basic idea is the same in each implementation.
Assume there are three memcached servers, node1 through node3, and that the application wants to save data under the keys "tokyo", "kanagawa", "chiba", "saitama", and "gunma".
Figure 4.1: How distribution works: preparation
First, save "tokyo" to memcached. When "tokyo" is passed to the client library, the algorithm implemented on the client side uses the key to decide which memcached server should store the data. Once a server is selected, the client sends a command to it to save "tokyo" and its value.
Figure 4.2: How distribution works: adding a key
Similarly, "kanagawa", "chiba", "saitama", and "gunma" are each saved the same way: select a server first, then store the data on it.
Next comes retrieving the saved data. The key to retrieve, "tokyo", is likewise passed to the client library, which selects a server using the same algorithm as when the data was stored. Because the algorithm is the same, it picks the same server that stored the data, and the client then sends a get command to it. As long as the data has not been deleted for some reason, the saved value can be retrieved.
Figure 4.3: How distribution works: getting a key
In this way, memcached achieves distribution by storing different keys on different servers. As more memcached servers are added, the keys are spread out further, and even if one memcached server fails and cannot be reached, the cached data on the other servers is unaffected and the system keeps running.
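The flow just described can be simulated in a few lines. The article's own examples are in Perl, but here is a minimal Python sketch (the dictionaries stand in for real memcached servers, and the node names are only illustrative): because the client derives the server from the key alone, set and get always reach the same node.

```python
import zlib

# Three stand-in "servers": plain dictionaries instead of real memcached nodes.
servers = {"node1": {}, "node2": {}, "node3": {}}
names = sorted(servers)

def pick_server(key):
    # Select a server from the key alone, as a memcached client library does.
    return names[zlib.crc32(key.encode()) % len(names)]

def set_value(key, value):
    servers[pick_server(key)][key] = value

def get_value(key):
    # The same algorithm selects the same server, so the value is found again.
    return servers[pick_server(key)].get(key)

set_value("tokyo", "data-for-tokyo")
print(get_value("tokyo"))
```

Any deterministic hash works for this; Cache::Memcached itself uses CRC32, as the next section shows.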
Next, let's look at the distribution method implemented by Cache::Memcached, the Perl client library mentioned in Chapter 1.
4.2 Cache::Memcached's distribution method
Cache::Memcached, the Perl memcached client library, is the work of Brad Fitzpatrick, the creator of memcached himself, and can be considered the original client library.
• Cache::Memcached (search.cpan.org)
This library implements distribution, and its method has long been the standard way of distributing memcached.
Distribution based on remainder calculation
Cache::Memcached's distribution method is, simply put, "distribution based on the remainder of division by the number of servers". It takes an integer hash value of the key, divides it by the number of servers, and selects a server based on the remainder.
The relevant part of Cache::Memcached can be simplified to the following Perl script.
use strict;
use warnings;
use String::CRC32;

my @nodes = ('node1', 'node2', 'node3');
my @keys  = ('tokyo', 'kanagawa', 'chiba', 'saitama', 'gunma');

foreach my $key (@keys) {
    my $crc    = crc32($key);          # CRC value of the key
    my $mod    = $crc % ($#nodes + 1);
    my $server = $nodes[$mod];         # select a server based on the remainder
    printf "%s => %s\n", $key, $server;
}
Cache::Memcached uses CRC when computing hash values.
• String::CRC32 (search.cpan.org)
It first obtains the CRC value of the string, then determines the server by taking that value modulo the number of server nodes. Running the code above prints the following:
tokyo => node2
kanagawa => node3
chiba => node2
saitama => node1
gunma => node1
As the result shows, "tokyo" is distributed to node2 and "kanagawa" to node3. When the selected server cannot be connected to, Cache::Memcached appends the number of connection attempts to the key, computes the hash value again, and tries to connect. This behavior is called rehashing. If rehashing is not desired, the "rehash => 0" option can be specified when creating the Cache::Memcached object.
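The rehash behavior can be sketched in Python as follows. This is an illustration of the idea only: the exact way Cache::Memcached perturbs the key is an implementation detail, and the "down" server here is hypothetical.

```python
import zlib

nodes = ["node1", "node2", "node3"]
down = {"node2"}  # pretend this server is unreachable (hypothetical)

def pick(key, tries=20):
    """Remainder-based selection with rehashing: if the chosen server
    cannot be reached, perturb the key with the attempt number and
    hash again (a sketch of the idea, not Cache::Memcached's exact code)."""
    for i in range(tries):
        k = key if i == 0 else f"{i}{key}"  # perturb the key and retry
        node = nodes[zlib.crc32(k.encode()) % len(nodes)]
        if node not in down:
            return node
    return None  # give up after too many attempts

print(pick("tokyo"))
```

With rehashing disabled (as with "rehash => 0"), a down server would simply mean a cache miss for its keys instead of a retry.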
Disadvantages of remainder-based distribution
The remainder method is simple and disperses data well, but it also has a disadvantage: when servers are added or removed, the cost of reorganizing the cache is enormous. Once a server is added, the remainders change drastically, so most keys no longer map to the server they were originally saved on, which hurts the cache hit rate. The following Perl fragment verifies this cost.
use strict;
use warnings;
use String::CRC32;

my @nodes = @ARGV;
my @keys  = ('a' .. 'z');
my %nodes;

foreach my $key (@keys) {
    my $hash   = crc32($key);
    my $mod    = $hash % ($#nodes + 1);
    my $server = $nodes[$mod];
    push @{$nodes{$server}}, $key;
}

foreach my $node (sort keys %nodes) {
    printf "%s: %s\n", $node, join ",", @{$nodes{$node}};
}
This Perl script simulates saving the keys "a" through "z" to memcached and records which server each key maps to. Save it as mod.pl and run it. First, with three servers:
$ mod.pl node1 node2 node3
node1: a,c,d,e,h,j,n,u,w,x
node2: g,i,k,l,p,r,s,y
node3: b,f,m,o,q,t,v,z
As the result shows, node1 stores a, c, d, e ..., node2 stores g, i, k ..., and each server stores 8 to 10 keys.
Now let's add one memcached server:
$ mod.pl node1 node2 node3 node4
node1: d,f,m,o,t,v
node2: b,i,k,p,r,y
node3: e,g,l,n,u,w
node4: a,c,h,j,q,s,x,z
Comparing before and after, only d, i, k, p, r, and y still map to the same server. Adding just one node changes the distribution of keys to servers dramatically: only 6 of the 26 keys still reach their original server, and all the rest move elsewhere. The hit rate drops to 23%. When memcached is used in a web application this way, the cache hit rate falls the instant a memcached server is added, load concentrates on the database server, and the service may stop working normally.
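The same experiment is easy to reproduce in Python. zlib.crc32 computes the same standard CRC-32 as Perl's String::CRC32, so the grouping should match; the sketch below counts how many of the keys "a" through "z" keep their server when a fourth node is added under remainder-based selection.

```python
import string
import zlib

def assign(keys, n_nodes):
    # Map each key to a node index by CRC-32 remainder.
    return {k: zlib.crc32(k.encode()) % n_nodes for k in keys}

keys = list(string.ascii_lowercase)
before = assign(keys, 3)  # three servers
after = assign(keys, 4)   # one server added

unchanged = [k for k in keys if before[k] == after[k]]
print(f"{len(unchanged)} of {len(keys)} keys kept their server")
```

Per the article's run, only 6 of 26 survive the change; the key point is that most keys move even though only one node was added.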
Mixi's web application ran into this very problem, which made it impossible to add memcached servers, until a newer distribution method made adding servers easy. That method is called Consistent Hashing.
4.3 Consistent Hashing
The idea of Consistent Hashing has been presented in many places, including the development blog of Mixi, Inc.:
• Mixi Engineers' Blog: a post introducing distribution with Consistent Hashing
• ConsistentHashing: a Japanese article explaining the algorithm
A brief description of Consistent Hashing
Consistent Hashing works as follows. First, compute the hash value of each memcached server (node) and place it on a circle, called the continuum, whose positions range from 0 to 2^32. Then compute the hash value of the key of the data to store and map it onto the same circle. Starting from the point where the key maps, search clockwise and save the data on the first server found. If no server is found before wrapping past 2^32, the data is saved on the first memcached server.
Figure 4.4: Consistent Hashing: basic principle
Now add a memcached server to this arrangement. With remainder-based distribution, the servers that keys map to change enormously, hurting the hit rate; with Consistent Hashing, only the keys on the first server counter-clockwise from the point where the new server was added on the continuum are affected.
Figure 4.5: Consistent Hashing: adding a server
Consistent Hashing therefore minimizes the redistribution of keys. Moreover, some Consistent Hashing implementations go further and adopt the idea of virtual nodes: with an ordinary hash function, the servers' positions on the continuum end up unevenly spaced, so each physical node is assigned 100 to 200 points on the continuum. This evens out the distribution and minimizes cache redistribution when servers are added or removed.
Testing with a memcached client library that uses the Consistent Hashing algorithm described below reportedly gives the following formula for the hit rate after adding servers, where n is the original number of servers and m is the number of servers added:

    (n / (n + m)) * 100

For example, growing from 3 servers to 4 keeps 3 / (3 + 1) = 75% of keys on their original server, versus 23% in the remainder-based experiment above.
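To make the idea concrete, here is a small Python sketch of a continuum with virtual nodes. This uses MD5 rather than the ketama hash, and made-up node names, so the exact placements differ from any real library; the point is the structure: each node gets 150 points on the circle, and a key belongs to the first node clockwise from its position. Adding a fourth node moves only the keys that now fall to it, roughly a quarter of them, matching n / (n + m).

```python
import bisect
import hashlib

class ConsistentHash:
    """A minimal consistent-hashing ring with virtual nodes
    (a sketch, not the ketama algorithm itself)."""

    def __init__(self, nodes, points=150):
        self.ring = []  # sorted list of (position, node)
        for node in nodes:
            for i in range(points):
                # Each physical node gets `points` positions on the circle.
                self.ring.append((self._hash(f"{node}-{i}"), node))
        self.ring.sort()

    @staticmethod
    def _hash(s):
        # Map a string to an integer position in [0, 2^32).
        return int(hashlib.md5(s.encode()).hexdigest()[:8], 16)

    def lookup(self, key):
        # Walk clockwise from the key's position to the first node,
        # wrapping around to the start of the circle if necessary.
        i = bisect.bisect(self.ring, (self._hash(key), ""))
        return self.ring[i % len(self.ring)][1]

ring3 = ConsistentHash(["node1", "node2", "node3"])
ring4 = ConsistentHash(["node1", "node2", "node3", "node4"])
keys = [f"key{i}" for i in range(1000)]
moved = sum(ring3.lookup(k) != ring4.lookup(k) for k in keys)
print(f"{moved} of {len(keys)} keys moved")
```

Note that every key that moves must move to the new node, since the existing nodes' points on the circle are untouched; that is exactly why the hit rate only falls to n / (n + m) instead of collapsing.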
Client libraries supporting Consistent Hashing
At the time this article was written, Cache::Memcached did not support Consistent Hashing, but several client libraries do support this newer distribution method. The first memcached client library to support Consistent Hashing with virtual nodes was libketama, a PHP library developed by last.fm.
• libketama: a consistent hashing algo for memcache clients (RJ's blog at last.fm)
As for Perl clients, Cache::Memcached::Fast and Cache::Memcached::libmemcached, both mentioned in Chapter 1, support Consistent Hashing.
• Cache::Memcached::Fast (search.cpan.org)
• Cache::Memcached::libmemcached (search.cpan.org)
Both interfaces are almost identical to Cache::Memcached's, so switching is easy for anyone already using Cache::Memcached. Cache::Memcached::Fast reimplements libketama; to use Consistent Hashing with it, specify the ketama_points option when creating the object.
my $memcached = Cache::Memcached::Fast->new({
    servers       => ["192.168.0.1:11211", "192.168.0.2:11211"],
    ketama_points => 150,
});
In addition, Cache::Memcached::libmemcached is a Perl module wrapping libmemcached, a C client library developed by Brian Aker. libmemcached supports several distribution algorithms, including Consistent Hashing, and its Perl binding supports Consistent Hashing as well.
• Tangent Software: libmemcached
4.4 Summary
This article introduced memcached's distributed algorithm: distribution in memcached is implemented by the client library, and Consistent Hashing is used to disperse data efficiently. Next time, we will introduce some of Mixi's experience operating memcached, along with compatible applications.
