Principle and Golang implementation of the consistent hash algorithm

Last Update:2016-09-10 Source: Internet

Author: User


Overview
Consider a cache service provided by a group of servers: each key must be routed to one of them. The most common approach is hash(key) % N, where N is the number of servers. This looks fine at first glance, but when the number of servers grows or shrinks, the mapping becomes hash(key) % (N+1) or hash(key) % (N-1), and a large share of the keys is suddenly routed to a different server. If the servers hold state for those keys, this forces a large amount of data to migrate between servers and makes the service unstable. The consistent hash algorithm was designed to solve this class of problem.
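
The effect of a changing N can be measured directly. The following is a minimal sketch, not part of the original article: it assumes CRC32 as the hash function, and the names nodeFor and movedFraction are made up for illustration. It counts how many sample keys change server when a cluster grows from 4 to 5 servers.

```go
package main

import (
	"fmt"
	"hash/crc32"
)

// nodeFor routes a key to one of n servers with plain modulo hashing.
// CRC32 is an assumed hash function; any uniform hash would do.
func nodeFor(key string, n int) int {
	return int(crc32.ChecksumIEEE([]byte(key)) % uint32(n))
}

// movedFraction reports the share of the sample keys whose server changes
// when the cluster grows from n to n+1 servers.
func movedFraction(keys []string, n int) float64 {
	moved := 0
	for _, k := range keys {
		if nodeFor(k, n) != nodeFor(k, n+1) {
			moved++
		}
	}
	return float64(moved) / float64(len(keys))
}

func main() {
	keys := make([]string, 10000)
	for i := range keys {
		keys[i] = fmt.Sprintf("key-%d", i)
	}
	fmt.Printf("share of keys moved when growing 4 -> 5 servers: %.2f\n",
		movedFraction(keys, 4))
}
```

With a roughly uniform hash, a key keeps its server only when hash % 4 equals hash % 5, which happens for about one key in five, so most keys move.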

1. Features of the consistent hash algorithm
In a distributed cache, a good hash algorithm should meet the following conditions:

Balance

Balance means that the algorithm should distribute keys across the nodes in the cluster as evenly as possible.

Monotonicity

Monotonicity means that when the cluster changes, a key that was already assigned to an existing node should stay on that node whenever possible, so that large amounts of data do not have to migrate. Plain hash modulo can hardly satisfy this, whereas the consistent hash algorithm keeps the number of migrated keys low.

Spread

Spread concerns a single key handled by different clients. The clients may see different sets of cache nodes, so the same key can be mapped to different nodes, which leads to inconsistent data. A good hash algorithm should keep spread as low as possible.

Load

Load describes the same situation from the cache node's point of view: when clients see different sets of nodes, a given node may be assigned different keys by different clients, which leaves its cached state inconsistent and adds to its burden. A good hash algorithm should keep this load as low as possible.

In principle, the consistent hash algorithm offers a reasonable answer to each of these problems.

2. Consistent hashing in detail
The core idea of consistent hashing is to hash each key to an integer between 0 and 2^32-1 and to treat that range as a ring of size 2^32. The integer computed for a key is the key's position on the hash ring. Mapping a key to a node takes two steps.
The first step is to hash each server's key with the hash algorithm to get the server's position on the ring.
The second step is to hash the cache key the same way to get its position on the ring, then move clockwise to the first server position that is greater than or equal to the key's position, wrapping around to the start of the ring if there is none. That server is the one the key is assigned to.

Each key is assigned to a node according to this rule. When a node fails, for example Node 2, the keys on Node 2 are reassigned to the next node clockwise on the hash ring, and all other keys stay where they are.
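
The two steps and the failure case can be sketched in a few lines of Go. This is an illustrative sketch only, not the code from the repository referenced later in this article; the type Ring, its methods, and the use of CRC32 as the hash function are all assumptions.

```go
package main

import (
	"fmt"
	"hash/crc32"
	"sort"
)

// Ring is a minimal consistent hash ring without virtual nodes.
type Ring struct {
	points []uint32          // node positions on the ring, kept sorted
	owner  map[uint32]string // ring position -> node name
}

// hashOf maps a string to a position in [0, 2^32-1].
func hashOf(s string) uint32 { return crc32.ChecksumIEEE([]byte(s)) }

// NewRing builds a ring from the given node names.
func NewRing(nodes ...string) *Ring {
	r := &Ring{owner: make(map[uint32]string)}
	for _, n := range nodes {
		r.Add(n)
	}
	return r
}

// Add places a node at the hash of its name (step one).
func (r *Ring) Add(node string) {
	p := hashOf(node)
	if _, taken := r.owner[p]; taken {
		return // position already occupied; this sketch simply skips it
	}
	r.owner[p] = node
	r.points = append(r.points, p)
	sort.Slice(r.points, func(i, j int) bool { return r.points[i] < r.points[j] })
}

// Remove takes a node off the ring, as happens when it fails.
func (r *Ring) Remove(node string) {
	p := hashOf(node)
	if n, ok := r.owner[p]; !ok || n != node {
		return
	}
	delete(r.owner, p)
	i := sort.Search(len(r.points), func(i int) bool { return r.points[i] >= p })
	r.points = append(r.points[:i], r.points[i+1:]...)
}

// Get hashes the key and walks clockwise to the first node position that is
// greater than or equal to it, wrapping to the start of the ring (step two).
func (r *Ring) Get(key string) string {
	if len(r.points) == 0 {
		return ""
	}
	h := hashOf(key)
	i := sort.Search(len(r.points), func(i int) bool { return r.points[i] >= h })
	if i == len(r.points) {
		i = 0
	}
	return r.owner[r.points[i]]
}

func main() {
	r := NewRing("node-1", "node-2", "node-3")
	before := r.Get("user:42")
	r.Remove("node-2")
	fmt.Println("before:", before, "after:", r.Get("user:42"))
}
```

Removing a node only deletes its position from the sorted slice, so a key changes owner only if its clockwise successor was the removed node.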

Virtual nodes improve balance
With only 3 nodes, the node positions can be spread unevenly around the ring, so some nodes own a much larger arc and receive many more keys than the others, which leads to load imbalance in the cluster. Virtual nodes are introduced to solve this problem: one real node corresponds to several virtual nodes on the ring. A cache key is first mapped to a virtual node and then to the real node behind it. For example, giving each real node two virtual nodes already spreads the keys more evenly.
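
A rough way to see the effect is to count keys per real node with and without virtual nodes. The sketch below is illustrative and rests on assumptions: the type VRing, the "node#i" naming scheme for virtual nodes, and CRC32 as the hash are not taken from the original article.

```go
package main

import (
	"fmt"
	"hash/crc32"
	"sort"
	"strconv"
)

// VRing is a hash ring where every real node owns several virtual nodes.
type VRing struct {
	points []uint32          // sorted virtual node positions
	owner  map[uint32]string // virtual node position -> real node
}

// NewVRing places `replicas` virtual nodes on the ring for each real node.
func NewVRing(replicas int, nodes ...string) *VRing {
	r := &VRing{owner: make(map[uint32]string)}
	for _, n := range nodes {
		for i := 0; i < replicas; i++ {
			// Hypothetical naming scheme: "node#0", "node#1", ...
			p := crc32.ChecksumIEEE([]byte(n + "#" + strconv.Itoa(i)))
			if _, taken := r.owner[p]; taken {
				continue // skip the rare colliding position
			}
			r.owner[p] = n
			r.points = append(r.points, p)
		}
	}
	sort.Slice(r.points, func(i, j int) bool { return r.points[i] < r.points[j] })
	return r
}

// Get finds the virtual node clockwise from the key, then its real node.
func (r *VRing) Get(key string) string {
	if len(r.points) == 0 {
		return ""
	}
	h := crc32.ChecksumIEEE([]byte(key))
	i := sort.Search(len(r.points), func(i int) bool { return r.points[i] >= h })
	if i == len(r.points) {
		i = 0
	}
	return r.owner[r.points[i]]
}

// Spread counts how many of the keys land on each real node.
func Spread(r *VRing, keys []string) map[string]int {
	counts := make(map[string]int)
	for _, k := range keys {
		counts[r.Get(k)]++
	}
	return counts
}

func main() {
	keys := make([]string, 10000)
	for i := range keys {
		keys[i] = "key-" + strconv.Itoa(i)
	}
	nodes := []string{"node-1", "node-2", "node-3"}
	fmt.Println("1 virtual node each:   ", Spread(NewVRing(1, nodes...), keys))
	fmt.Println("100 virtual nodes each:", Spread(NewVRing(100, nodes...), keys))
}
```

The number of virtual nodes per real node is a tuning choice: more virtual nodes generally even out the distribution at the cost of a larger sorted slice to search.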

3. Consistent hashing vs. other algorithms
There are several ways to assign cache keys to the nodes of a cluster, such as simple hash modulo, slot mapping, and consistent hashing.

Hash modulo

Hash modulo distributes keys evenly, but when a node is added to a cluster of N nodes, roughly N/(N+1) of the keys map to a different node and are invalidated. The larger N is, the higher the invalidation rate. This is clearly unacceptable.

Slot mapping

The idea is to apply some function to the key (such as CRC16, CRC32, or another hash) to obtain an integer, and then take that integer modulo a fixed number of slots. Each node serves a fixed set of slots. To find the node for a key, first compute the key's slot, then look up the node that owns the slot. When a node is added, only the keys in the slots handed to it need to migrate; keys in the other slots are unaffected, which reduces the invalidation rate to about 1/(N+1). The disadvantage is that every node needs to know the mapping between slots and nodes, and if the client does not store that mapping, redirection logic has to be implemented.
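
The key-to-slot-to-node lookup can be sketched as follows. This is a simplified illustration, not how any particular system implements it; the slot count of 1024, the type SlotTable, and the rebalancing rule in AddNode are assumptions made for the example, and AddNode expects a node name that is not yet in the table.

```go
package main

import (
	"fmt"
	"hash/crc32"
)

const numSlots = 1024 // fixed slot count, chosen arbitrarily for this sketch

// SlotTable records which node serves each slot.
type SlotTable struct {
	owner []string // owner[s] is the node serving slot s
}

// NewSlotTable spreads the slots over the nodes round-robin.
// It assumes at least one node is given.
func NewSlotTable(nodes []string) *SlotTable {
	t := &SlotTable{owner: make([]string, numSlots)}
	for s := range t.owner {
		t.owner[s] = nodes[s%len(nodes)]
	}
	return t
}

// SlotOf maps a key to its slot; this never changes when nodes change.
func SlotOf(key string) int {
	return int(crc32.ChecksumIEEE([]byte(key)) % numSlots)
}

// NodeFor resolves key -> slot -> node.
func (t *SlotTable) NodeFor(key string) string {
	return t.owner[SlotOf(key)]
}

// AddNode hands a new node about 1/(N+1) of the slots, taken from nodes that
// hold more than that share, and returns how many slots moved.
func (t *SlotTable) AddNode(node string) int {
	counts := make(map[string]int)
	for _, o := range t.owner {
		counts[o]++
	}
	target := numSlots / (len(counts) + 1)
	moved := 0
	for s, o := range t.owner {
		if moved == target {
			break
		}
		if counts[o] > target {
			counts[o]--
			t.owner[s] = node
			moved++
		}
	}
	return moved
}

func main() {
	t := NewSlotTable([]string{"node-1", "node-2", "node-3"})
	fmt.Println("user:42 ->", t.NodeFor("user:42"))
	fmt.Println("slots moved:", t.AddNode("node-4"))
	fmt.Println("user:42 ->", t.NodeFor("user:42"))
}
```

Only keys whose slot changed owner need to migrate; every other key resolves to the same node as before.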

Consistent hash

As described above, when a node is added, consistent hashing invalidates only about 1/(N+1) of the keys, so it keeps the invalidation rate to a minimum. Compared with slot mapping, it needs no slot layer as an intermediate mapping, which keeps the implementation as simple as possible.

4. Implementation of the consistent hash algorithm in Golang
The implementation uses Golang. In real deployments the machines behind the service nodes may not have the same configuration, so the implementation distributes virtual nodes according to each node's weight: nodes with a higher weight take on more keys and nodes with a lower weight take on fewer. How the weights themselves are calculated involves further considerations. For the code, see: https://github.com/g4zhuj/hashring
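
The weight-based idea can be illustrated with a small sketch. It is not the code from the repository above and makes no claim about that library's API; the type WeightedRing, the constant pointsPerWeight, and the use of CRC32 are assumptions chosen to keep the example short.

```go
package main

import (
	"fmt"
	"hash/crc32"
	"sort"
	"strconv"
)

const pointsPerWeight = 50 // virtual nodes per unit of weight (assumed value)

// WeightedRing is a hash ring whose virtual node count follows node weight.
type WeightedRing struct {
	points []uint32          // sorted virtual node positions
	owner  map[uint32]string // virtual node position -> real node
}

// NewWeightedRing gives each node weight*pointsPerWeight virtual nodes, so a
// node with weight 3 owns about three times the ring of a node with weight 1.
func NewWeightedRing(weights map[string]int) *WeightedRing {
	names := make([]string, 0, len(weights))
	for n := range weights {
		names = append(names, n)
	}
	sort.Strings(names) // fixed order keeps the ring deterministic

	r := &WeightedRing{owner: make(map[uint32]string)}
	for _, n := range names {
		for i := 0; i < weights[n]*pointsPerWeight; i++ {
			p := crc32.ChecksumIEEE([]byte(n + "#" + strconv.Itoa(i)))
			if _, taken := r.owner[p]; taken {
				continue // skip the rare colliding position
			}
			r.owner[p] = n
			r.points = append(r.points, p)
		}
	}
	sort.Slice(r.points, func(i, j int) bool { return r.points[i] < r.points[j] })
	return r
}

// Get returns the real node behind the first virtual node clockwise of key.
func (r *WeightedRing) Get(key string) string {
	if len(r.points) == 0 {
		return ""
	}
	h := crc32.ChecksumIEEE([]byte(key))
	i := sort.Search(len(r.points), func(i int) bool { return r.points[i] >= h })
	if i == len(r.points) {
		i = 0
	}
	return r.owner[r.points[i]]
}

func main() {
	r := NewWeightedRing(map[string]int{"big-box": 3, "small-box": 1})
	counts := make(map[string]int)
	for i := 0; i < 10000; i++ {
		counts[r.Get("key-"+strconv.Itoa(i))]++
	}
	fmt.Println(counts)
}
```

Scaling the number of virtual nodes linearly with the weight is only one possible policy; a real implementation may derive weights from machine capacity or adjust them at run time.
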
5. Summary
This article analyzes the principle of consistent hashing and compares it with other key allocation algorithms for distributed clusters. In distributed caching, two well-known systems take different routes: Redis (in Redis Cluster) uses slot mapping, while memcached clients commonly use consistent hashing. Because the algorithms differ, the sequence of actions triggered by a node change in the cluster differs too, and each choice reflects its own trade-offs.

This article is an English version of an article originally written in Chinese on aliyun.com.
