The consistent hash algorithm in memcached

Last Update:2018-12-06 Source: Internet

Author: User



Many large web systems currently use memcached as a cache layer to reduce the load on database servers and improve response times.

Directory:

Introduction to memcached
Hash
- Modulo
- Consistent hash
- Virtual node
- Source code parsing
References

1. Introduction to memcached

Memcached is an open-source, high-performance, distributed in-memory object caching system.
The idea is simple. An implementation has two parts, a server and a client (the memcached open-source project itself covers only the server):

The server is essentially an in-memory key-value store. It maintains a large hash map in memory to store small chunks of arbitrary data, and exposes its operations through a single simple interface (the memcached protocol).
The client is a library that handles the network communication details of the memcached protocol and talks to the memcached server. Client libraries exist for many languages, each wrapping the protocol in an easy-to-use API so that memcached can be integrated into different platforms.
A web system uses a client library to cache objects in memcached.

2. Hash

Memcached's distribution is implemented mainly on the client side. On the server side, several memcached servers are simply deployed to form a cluster. Each server maintains only its own data (the servers do not communicate with each other) and waits for client requests on the port its daemon listens on.
On the client side, a hash algorithm maps each item to be stored to a specific server, and the same algorithm is used to locate that server for subsequent reads.

The client can use various hash algorithms to locate the server:

Modulo

The simplest hash algorithm:

targetServer = serverList[hash(key) % serverList.size]

The hash value of the key is used directly to locate the target server. The hash function can be chosen freely: CRC32, MD5, or even the platform's built-in hash, such as Java's hashCode. This algorithm is simple and also distributes keys evenly and randomly.

The drawback is equally obvious: the total number of servers cannot be changed easily. If memcached servers are added or removed, subsequent lookups for nearly all previously stored keys are directed to a different server, so those lookups miss and the cached data is effectively invalidated.
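The effect of changing the server count can be measured with a small sketch. The class name, the key naming scheme ("key-0", "key-1", ...) and the choice of CRC32 are assumptions made for illustration, not part of any memcached client. For a uniformly distributed hash, a key keeps its server when going from 4 to 5 servers only if hash % 4 equals hash % 5, which holds for 4 of every 20 hash values, so roughly 4 in 5 keys are expected to move.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

class ModuloRemapDemo {
    // CRC32 of the key, one of the hash choices mentioned in the text.
    static long hash(String key) {
        CRC32 crc = new CRC32();
        crc.update(key.getBytes(StandardCharsets.UTF_8));
        return crc.getValue(); // unsigned 32-bit value held in a long
    }

    // Index of the target server under modulo hashing.
    static int targetIndex(String key, int serverCount) {
        return (int) (hash(key) % serverCount);
    }

    // Fraction of keys whose target index differs between two server counts.
    static double movedFraction(int keyCount, int before, int after) {
        int moved = 0;
        for (int i = 0; i < keyCount; i++) {
            String key = "key-" + i; // hypothetical key names
            if (targetIndex(key, before) != targetIndex(key, after)) {
                moved++;
            }
        }
        return (double) moved / keyCount;
    }

    public static void main(String[] args) {
        System.out.printf("moved when going from 4 to 5 servers: %.1f%%%n",
                100 * movedFraction(10_000, 4, 5));
    }
}
```

Every key that moves is a cache miss on its next lookup, which is why a modulo-based cluster is hard to resize.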

Consistent hash

To solve this problem, we use a consistent hash algorithm.
In addition to the hash value of the key, the consistent hash algorithm also computes a hash value for each server, and maps all of these hash values onto a finite range (such as 0 to 2^32). The target server for a key is the server with the smallest hash value greater than hash(key). If no such server exists, the server with the smallest hash value overall is used.

For ease of understanding, think of this finite range as a ring on which values increase clockwise.

Suppose the cluster has five memcached servers, placed on the ring according to their hash values.

When a request writes data to the cache, compute x = hash(key), map x onto the ring, and search clockwise from x. The first server found is the target server that stores the item. If the search passes 2^32 without finding a server, it wraps around and the first server on the ring is used. For example, if x falls between A and B, the target is node B.

With this algorithm, a store and every subsequent lookup for the same key are directed to the same memcached server.
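The clockwise search maps naturally onto an ordered map. The following is a minimal sketch, not taken from any client library: the class name, the method names and the fixed ring positions are hypothetical, and positions are set by hand instead of being hashed so that the behaviour is easy to follow. It uses TreeMap.ceilingEntry, so a key that lands exactly on a server's position is assigned to that server.

```java
import java.util.Map;
import java.util.TreeMap;

class RingSketch {
    // Ring position -> server name, kept sorted by position.
    private final TreeMap<Long, String> ring = new TreeMap<>();

    // Put a server at a given position on the ring.
    void place(long position, String server) {
        ring.put(position, server);
    }

    // Search clockwise from x: the first server at or after x wins.
    // If there is none, wrap around to the first server on the ring.
    String locate(long x) {
        if (ring.isEmpty()) {
            throw new IllegalStateException("no servers on the ring");
        }
        Map.Entry<Long, String> entry = ring.ceilingEntry(x);
        return (entry != null ? entry : ring.firstEntry()).getValue();
    }

    public static void main(String[] args) {
        RingSketch ring = new RingSketch();
        String[] names = {"A", "B", "C", "D", "E"};
        for (int i = 0; i < names.length; i++) {
            ring.place(100L * (i + 1), names[i]); // A=100, B=200, ..., E=500
        }
        System.out.println(ring.locate(150)); // between A and B
        System.out.println(ring.locate(501)); // past the last server, wraps around
    }
}
```

In a real client the positions would come from hashing the server address, and x from hashing the key.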

So how does this reduce the cache misses caused by adding or removing servers?
Assume that a server F is added between B and C.

Cache misses still occur, but only for keys that fall between B and F, whose target changes from C to F. Keys at all other positions (including those between F and C) are unaffected. Removing a server behaves similarly. Some lookups still miss, but far fewer than with the modulo method.
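The claim that only the arc between B and F is affected can be checked by brute force on a small ring. In this sketch the server positions (A=1000 through E=5000), the position of F and all names are made up for illustration. It compares the target of every position before and after F is added.

```java
import java.util.Map;
import java.util.TreeMap;

class AddServerSketch {
    // Clockwise search: first server at or after x, wrapping to the start of the ring.
    static String locate(TreeMap<Long, String> ring, long x) {
        Map.Entry<Long, String> entry = ring.ceilingEntry(x);
        return (entry != null ? entry : ring.firstEntry()).getValue();
    }

    // Five servers at hypothetical, evenly spaced positions: A=1000 ... E=5000.
    static TreeMap<Long, String> fiveServers() {
        TreeMap<Long, String> ring = new TreeMap<>();
        String[] names = {"A", "B", "C", "D", "E"};
        for (int i = 0; i < names.length; i++) {
            ring.put(1000L * (i + 1), names[i]);
        }
        return ring;
    }

    // Number of positions in [0, limit) whose target changes once F is placed at fPosition.
    static long movedAfterAddingF(long fPosition, long limit) {
        TreeMap<Long, String> before = fiveServers();
        TreeMap<Long, String> after = new TreeMap<>(before);
        after.put(fPosition, "F");
        long moved = 0;
        for (long x = 0; x < limit; x++) {
            if (!locate(before, x).equals(locate(after, x))) {
                moved++;
            }
        }
        return moved;
    }

    public static void main(String[] args) {
        // F lands between B (2000) and C (3000): only positions 2001..2500 move, from C to F.
        System.out.println(movedAfterAddingF(2500, 6000)); // prints 500
    }
}
```

Out of 6000 positions, only the 500 that lie between B and F change target. Everything else, including the positions between F and C, keeps its server.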

Virtual node
Compared with the modulo method, however, this algorithm has a drawback: when there are only a few servers, their positions on the ring are likely to be uneven, so cached items are not distributed evenly across the servers.

For example, with three servers A, B, and C placed unevenly on the ring, the probability of hitting B can be much higher than that of hitting A or C.
To solve this problem, we use virtual nodes: each physical server is allocated 100 to 200 points on the ring. With more points on the ring, the unevenness is smoothed out.
When a key is located on a virtual node, the item is actually stored on the physical server that the virtual node represents.

In addition, if the servers have different capacities, each can be given a weight, and the number of virtual nodes allocated to a server is made proportional to its weight.
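A rough way to see the effect of virtual nodes and weights is to count how many keys land on each server. The sketch below is an illustration under stated assumptions: the server labels ("server0", "server1", ...), the key names, the pointsPerWeight parameter and the use of the first 4 bytes of an MD5 digest as the ring position are all choices made here, not a description of any particular client.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.TreeMap;

class VirtualNodeSketch {
    // First 4 bytes of the MD5 digest, read as an unsigned 32-bit value.
    static long hash(String s) {
        try {
            byte[] d = MessageDigest.getInstance("MD5")
                    .digest(s.getBytes(StandardCharsets.UTF_8));
            return ((long) (d[3] & 0xFF) << 24) | ((long) (d[2] & 0xFF) << 16)
                    | ((long) (d[1] & 0xFF) << 8) | (long) (d[0] & 0xFF);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    // Server i gets weights[i] * pointsPerWeight virtual nodes on the ring.
    static TreeMap<Long, Integer> buildRing(int[] weights, int pointsPerWeight) {
        TreeMap<Long, Integer> ring = new TreeMap<>();
        for (int i = 0; i < weights.length; i++) {
            int points = weights[i] * pointsPerWeight;
            for (int j = 0; j < points; j++) {
                ring.put(hash("server" + i + "-" + j), i); // virtual node -> physical server index
            }
        }
        return ring;
    }

    // Number of keys (out of keyCount) that land on each physical server.
    static int[] countKeys(int[] weights, int pointsPerWeight, int keyCount) {
        TreeMap<Long, Integer> ring = buildRing(weights, pointsPerWeight);
        int[] counts = new int[weights.length];
        for (int k = 0; k < keyCount; k++) {
            Map.Entry<Long, Integer> entry = ring.ceilingEntry(hash("key-" + k));
            int server = (entry != null ? entry : ring.firstEntry()).getValue();
            counts[server]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        int[] equal = {1, 1, 1};
        System.out.println(java.util.Arrays.toString(countKeys(equal, 1, 10_000)));   // one point per server
        System.out.println(java.util.Arrays.toString(countKeys(equal, 150, 10_000))); // 150 points per server
        System.out.println(java.util.Arrays.toString(countKeys(new int[] {1, 1, 2}, 150, 10_000)));
    }
}
```

Comparing the first two lines of output shows how much the extra points even out the distribution. The third line shows the server with weight 2 receiving a larger share of the keys.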

Source code parsing:

The following describes an implementation of consistent hashing, based on the source code of a Java memcached client (gwhalin/Memcached-Java-Client).
First, the server distribution:

    // Use an ordered map to simulate the ring
    this.consistentBuckets = new TreeMap<Long, String>();

    // MD5 is used to hash both keys and servers
    MessageDigest md5 = MD5.get();

    // Calculate the total weight
    // NOTE: the first condition is truncated in the source text and is reconstructed here
    if (this.totalWeight <= 0 && this.weights != null) {
        for (int i = 0; i < this.weights.length; i++)
            this.totalWeight += (this.weights[i] == null) ? 1 : this.weights[i];
    } else if (this.weights == null) {
        this.totalWeight = this.servers.length;
    }

    // Allocate virtual nodes for each server
    for (int i = 0; i < servers.length; i++) {
        // Weight of the current server
        int thisWeight = 1;
        if (this.weights != null && this.weights[i] != null)
            thisWeight = this.weights[i];

        // factor controls how many virtual nodes a server gets.
        // Equal weights:     factor = 40
        // Different weights: factor = 40 * number of servers * this server's share of the total weight
        // The larger the weight, the larger the factor and the more virtual nodes.
        double factor = Math.floor(((double) (40 * this.servers.length * thisWeight)) / (double) this.totalWeight);

        for (long j = 0; j < factor; j++) {
            // Each server gets `factor` MD5 digests, computed from the server's
            // host name or IP address plus a sequence number. For example, the server
            // "172.45.155.25:11111" produces digests of the strings
            // 172.45.155.25:11111-0, 172.45.155.25:11111-1, ..., 172.45.155.25:11111-(factor - 1)
            byte[] d = md5.digest((servers[i] + "-" + j).getBytes());

            // Each digest yields four virtual nodes
            for (int h = 0; h < 4; h++) {
                long k = ((long) (d[3 + h * 4] & 0xFF) << 24)
                       | ((long) (d[2 + h * 4] & 0xFF) << 16)
                       | ((long) (d[1 + h * 4] & 0xFF) << 8)
                       | ((long) (d[0 + h * 4] & 0xFF));

                // Store the virtual node on the ring
                consistentBuckets.put(k, servers[i]);
            }
        }
        // In total, 4 * factor virtual nodes are allocated to each server
    }

Each server obtains a factor, based on its weight, that controls its number of virtual nodes. The strings servers[i] + "-" + j are hashed with MD5 to produce factor digests.
An MD5 digest is 16 bytes long, so it is split into 4 segments of 4 bytes each, and each segment corresponds to one virtual node. The shift-and-OR expression assembles the 4 bytes of a segment into one contiguous 32-bit value, which is stored in the low 32 bits of a long.
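The byte assembly is easier to follow in isolation. The sketch below extracts the four 32-bit ring points from a 16-byte digest using the same shift-and-OR pattern. The class and method names are hypothetical. Within each 4-byte segment, byte 0 becomes the least significant byte (little-endian order), and masking with 0xFF keeps each byte unsigned so the result is never negative.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;

class DigestPoints {
    // Split a 16-byte digest into four unsigned 32-bit values, one per 4-byte segment.
    static long[] pointsFromDigest(byte[] d) {
        if (d.length != 16) {
            throw new IllegalArgumentException("expected a 16-byte digest");
        }
        long[] points = new long[4];
        for (int h = 0; h < 4; h++) {
            points[h] = ((long) (d[3 + h * 4] & 0xFF) << 24)
                      | ((long) (d[2 + h * 4] & 0xFF) << 16)
                      | ((long) (d[1 + h * 4] & 0xFF) << 8)
                      | ((long) (d[h * 4] & 0xFF));
        }
        return points;
    }

    public static void main(String[] args) throws Exception {
        // Digest of one "server-sequence" string, in the format the article describes.
        byte[] d = MessageDigest.getInstance("MD5")
                .digest("172.45.155.25:11111-0".getBytes(StandardCharsets.UTF_8));
        for (long p : pointsFromDigest(d)) {
            System.out.println(p); // four positions in the range 0 .. 2^32 - 1
        }
    }
}
```

For instance, a segment holding the bytes 0x01, 0x02, 0x03, 0x04 yields 0x04030201.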

Locate the cache server for the key:

    // Use MD5 to calculate the key's hash value
    MessageDigest md5 = MD5.get();
    md5.reset();
    md5.update(key.getBytes());
    byte[] bKey = md5.digest();

    // Take 32 bits of the MD5 digest (its first 4 bytes) as the key's hash value
    long hv = ((long) (bKey[3] & 0xFF) << 24)
            | ((long) (bKey[2] & 0xFF) << 16)
            | ((long) (bKey[1] & 0xFF) << 8)
            | (long) (bKey[0] & 0xFF);

    // The first virtual node in the tail map of hv corresponds to the target server;
    // if the tail map is empty, wrap around to the first node on the ring
    SortedMap<Long, String> tmap = this.consistentBuckets.tailMap(hv);
    return (tmap.isEmpty()) ? this.consistentBuckets.firstKey() : tmap.firstKey();
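Putting placement and lookup together, the following self-contained sketch loosely mirrors the two fragments above for servers of equal weight (factor = 40, four points per digest, lookup through tailMap). It is a simplification for illustration, not the client's actual class: the names ClientRingSketch, serverFor, removeServer and unexpectedMoves are hypothetical, and MD5 instances are created per call instead of being reused. The helper at the end checks the property the article relies on: when a server is removed, keys stored on the remaining servers keep their target.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.SortedMap;
import java.util.TreeMap;

class ClientRingSketch {
    private final TreeMap<Long, String> ring = new TreeMap<>();

    ClientRingSketch(String... servers) {
        for (String server : servers) {
            for (int j = 0; j < 40; j++) {          // factor = 40 when all weights are equal
                byte[] d = md5(server + "-" + j);
                for (int h = 0; h < 4; h++) {       // four virtual nodes per digest
                    ring.put(segment(d, h), server);
                }
            }
        }
    }

    static byte[] md5(String s) {
        try {
            return MessageDigest.getInstance("MD5").digest(s.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    // Segment h (0..3) of the digest as an unsigned 32-bit value.
    static long segment(byte[] d, int h) {
        return ((long) (d[3 + h * 4] & 0xFF) << 24) | ((long) (d[2 + h * 4] & 0xFF) << 16)
                | ((long) (d[1 + h * 4] & 0xFF) << 8) | (long) (d[h * 4] & 0xFF);
    }

    // First virtual node at or after the key's hash, wrapping to the start of the ring.
    String serverFor(String key) {
        long hv = segment(md5(key), 0);
        SortedMap<Long, String> tail = ring.tailMap(hv);
        Long point = tail.isEmpty() ? ring.firstKey() : tail.firstKey();
        return ring.get(point);
    }

    // Drop every virtual node that belongs to the given server.
    void removeServer(String server) {
        ring.values().removeIf(server::equals);
    }

    // Keys that change server after `removed` leaves although they were not stored on it.
    static int unexpectedMoves(int keyCount, String removed, String... servers) {
        ClientRingSketch before = new ClientRingSketch(servers);
        ClientRingSketch after = new ClientRingSketch(servers);
        after.removeServer(removed);
        int moves = 0;
        for (int i = 0; i < keyCount; i++) {
            String key = "key-" + i;
            String old = before.serverFor(key);
            if (!old.equals(removed) && !old.equals(after.serverFor(key))) {
                moves++;
            }
        }
        return moves;
    }

    public static void main(String[] args) {
        String[] servers = {"10.0.0.1:11211", "10.0.0.2:11211", "10.0.0.3:11211"}; // made-up addresses
        System.out.println(new ClientRingSketch(servers).serverFor("user:42"));
        System.out.println(unexpectedMoves(5_000, "10.0.0.2:11211", servers)); // prints 0
    }
}
```

Only the keys that were stored on the removed server need a new home; the rest of the cache stays valid.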

3. References

"Consistent Hashing and Random Trees: Distributed Caching Protocols for Relieving Hot Spots on the World Wide Web" (the paper that first proposed consistent hashing)
Consistent hashing on Wikipedia: http://en.wikipedia.org/wiki/Consistent_hashing
Ketama: consistent hashing
Memcached open-source project homepage
Memcached Google Code homepage
gwhalin/Memcached-Java-Client homepage

From: http://www.slimeden.com/2011/09/web/memcached_client_hash
