Consistent hash (consistent hashing) algorithm

Source: Internet
Author: User

This article is also published on the author's website, lang degree cloud: http://www.wolfbe.com/detail/201608/341.html

1. Background
The memcached server itself provides no distributed functionality; distribution across memcached servers is implemented entirely by the client. When deploying a memcached server cluster, we want to spread cache requests across the different cache servers as evenly as possible, so that all of the cache space is used and the pressure on any single cache server is reduced.
The simplest implementation computes the hash value of the request's key and takes it modulo the number of servers, which maps the request to one of the memcached servers. This is an effective scheme as long as the set of machines in the cluster never changes, but in a distributed cluster the simple modulo hashing algorithm has many shortcomings. In a dynamically changing cache cluster, four criteria determine how good a hashing algorithm is:
    • Balance: the hash results should be spread across all nodes as evenly as possible, so that the resources of every node are used.
    • Monotonicity: if some content has already been assigned to nodes by hashing and a new node is then added to the system, each piece of already-assigned content should either stay on its original node or move to the new node. It should never move to a different node that was already in the system.
    • Dispersion (spread): in a distributed environment, a terminal may not see all of the nodes, only a subset. When terminals map content to nodes by hashing, the node ranges seen by different terminals may differ, so their hash results differ and the same content ends up stored on different nodes by different terminals. This should be avoided, because storing the same content on several nodes wastes system resources.
    • Load: load looks at the dispersion problem from the other side. Since different terminals may map the same content to different nodes, a particular node may also be assigned different content by different terminals. Like dispersion, this should be avoided, so a good hashing algorithm should keep the load on each node as low as possible.
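To make the monotonicity criterion concrete for simple modulo hashing, the following minimal Python sketch counts how many keys change node when a cluster grows from 4 to 5 nodes. The MD5-based `stable_hash` helper, the key names, and the cluster sizes are illustrative assumptions, not part of any memcached client.

```python
import hashlib


def stable_hash(key: str) -> int:
    """Map a string to an integer in [0, 2**32) using MD5 (an illustrative choice)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


def modulo_node(key: str, node_count: int) -> int:
    """Simple modulo placement: node index = hash(key) mod N."""
    return stable_hash(key) % node_count


def moved_fraction(keys, old_count: int, new_count: int) -> float:
    """Fraction of keys whose node index changes when the cluster is resized."""
    moved = sum(
        1 for k in keys if modulo_node(k, old_count) != modulo_node(k, new_count)
    )
    return moved / len(keys)


# Hypothetical workload: 10,000 keys, cluster grows from 4 to 5 nodes.
keys = [f"object{i}" for i in range(10_000)]
fraction = moved_fraction(keys, 4, 5)
print(f"{fraction:.1%} of keys change node when growing from 4 to 5 nodes")
```

For a uniform hash, a key keeps its node index only when hash mod 4 equals hash mod 5, which holds for 4 out of every 20 consecutive hash values. Roughly four in five keys are therefore expected to move, and most of them move between nodes that already existed, which is exactly what monotonicity forbids.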
In a distributed cluster, machines are frequently added, removed, or lost to failure. With the simple modulo hashing algorithm, every such change alters the modulus, so the locations of most stored objects change and client requests no longer hit the content that was cached. The misses fall through to the backend service behind the cache, which can be overloaded and crash.
Simple modulo hashing is therefore unsuitable for a dynamic cache cluster, and another algorithm is needed. Consistent hashing was proposed in 1997 by Karger and others at MIT for distributed caching, mainly to solve hot spot problems on the Internet. The idea has since been extended to other fields and developed considerably in practice. Many applications now use consistent hashing in dynamically changing cache clusters in order to meet the four criteria above.
2. Principle
Hashing transforms an input of arbitrary length into a fixed-length output by means of a hash algorithm; that output is the hash value. Consistent hashing treats the entire hash value space as a closed virtual ring, the hash ring. Assume the hash value space has size 2^32, that is, hash values range from 0 to 2^32 - 1, as shown in Figure 1:
Figure 1 Hash ring
Mapping objects to the hash ring
Suppose there are four objects: object1, object2, object3, and object4. The hash algorithm computes a key value for each of them, which is then mapped onto the hash ring, as shown in Figure 2:
hash(object1) = key1; hash(object2) = key2; hash(object3) = key3; hash(object4) = key4;
Figure 2 Object hashes
Mapping nodes to the hash ring
Suppose there are three cache nodes: cache A, cache B, and cache C. The hash algorithm computes their hash values and maps them onto the hash ring, as shown in Figure 3:
hash(cache A) = key A; hash(cache B) = key B; hash(cache C) = key C;
Figure 3 Node hashes
The hash input for a node is typically the node's IP address or an alias of the node. Note that nodes must be hashed with the same hash algorithm as objects, so that nodes and objects share one hash space. Each object is then stored on the first node reached by moving clockwise from the object's position on the ring. As Figure 3 shows, object1 is stored on cache A, object2 and object3 on cache C, and object4 on cache B.
Node changes
The biggest problem with simple modulo hashing is that when the number of nodes changes, the data cached on the nodes is invalidated. Can consistent hashing avoid this problem and meet the four criteria above?
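The clockwise storage rule can be sketched in a few lines of Python. This is a minimal illustration rather than the code of any particular memcached client: the `ring_hash`, `build_ring`, and `locate` names and the choice of MD5 truncated to 32 bits are assumptions. Because real hash values are used, the objects will not necessarily land on the same nodes as in Figure 3.

```python
import bisect
import hashlib


def ring_hash(value: str) -> int:
    """Hash a node name or object key into the ring space 0 .. 2**32 - 1."""
    digest = hashlib.md5(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


def build_ring(nodes):
    """Return (position, node) pairs sorted by position on the ring."""
    return sorted((ring_hash(node), node) for node in nodes)


def locate(ring, key: str) -> str:
    """Find the first node clockwise from the key, wrapping past the end."""
    positions = [pos for pos, _ in ring]
    # bisect_left returns the index of the first node position >= the key's hash.
    index = bisect.bisect_left(positions, ring_hash(key))
    # index == len(ring) means the key lies past the last node: wrap to the first.
    return ring[index % len(ring)][1]


ring = build_ring(["cache A", "cache B", "cache C"])
for obj in ["object1", "object2", "object3", "object4"]:
    print(obj, "->", locate(ring, obj))
```

Nodes and objects go through the same `ring_hash` function, which is what places them in one shared hash space.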
    • Adding nodes
Add a new node, cache D, to the cluster, and suppose that its hash places it on the ring between object2 and object3, as shown in Figure 4:
Figure 4 Adding a node
By the rule that objects are stored on the next node clockwise, object2 migrates to cache D, and all other objects stay where they were.
    • Delete a node
Suppose node cache B fails and goes down. By the same clockwise storage rule, object4 migrates to cache C, and all other objects stay where they were, as shown in Figure 5:
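Both node changes can be replayed with a small Python sketch. The ring positions below are hypothetical small integers, chosen only so that the initial placement matches the article's example (object1 on cache A, object2 and object3 on cache C, object4 on cache B); a real ring would use hash values in the range 0 to 2^32 - 1.

```python
import bisect


def assign(node_positions: dict, object_positions: dict) -> dict:
    """Assign each object to the first node clockwise from its ring position."""
    ring = sorted((pos, name) for name, pos in node_positions.items())
    points = [pos for pos, _ in ring]
    placement = {}
    for obj, pos in object_positions.items():
        index = bisect.bisect_left(points, pos) % len(ring)  # wrap around the ring
        placement[obj] = ring[index][1]
    return placement


def moved(old: dict, new: dict) -> dict:
    """Objects whose node differs between two placements, as (old, new) pairs."""
    return {obj: (old[obj], new[obj]) for obj in old if old[obj] != new[obj]}


# Hypothetical ring positions (not real hash values).
objects = {"object1": 5, "object2": 55, "object3": 65, "object4": 30}
nodes = {"cache A": 10, "cache B": 40, "cache C": 80}

before = assign(nodes, objects)
# Add cache D between object2 and object3.
after_add = assign({**nodes, "cache D": 60}, objects)
# Delete cache B from the original three-node ring.
after_delete = assign({n: p for n, p in nodes.items() if n != "cache B"}, objects)

print("added cache D:  ", moved(before, after_add))
print("deleted cache B:", moved(before, after_delete))
```

In each case only one object changes node: object2 moves to cache D when that node is added, and object4 moves to cache C when cache B is deleted.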
Figure 5 Deleting a node
As the analysis of adding and deleting a node shows, consistent hashing keeps data migration to a minimum while preserving monotonicity, which avoids crashing the backend servers under the heavy pressure that large-scale data migration would cause.
Balanced design
From the analysis above, consistent hashing satisfies the monotonicity, dispersion, and load criteria of a dynamic cache cluster, but it is not yet clear that it satisfies balance. The following explains how consistent hashing is designed to achieve balance.
In the node deletion example, only cache A and cache C remain in the system. Object1 is stored on cache A, while object2, object3, and object4 are all stored on cache C, which is clearly unbalanced. To solve this, consistent hashing introduces the concept of virtual nodes. A virtual node is a replica of a real node on the hash ring, and one real node can correspond to multiple virtual nodes. Assume each real node corresponds to 2 virtual nodes; after hashing they are mapped onto the hash ring, so there are 4 virtual nodes on the ring, as shown in Figure 6:
Figure 6 Virtual nodes
By the clockwise storage rule, object1 is stored on cache A2, object2 on cache A1, object3 on cache C1, and object4 on cache C2. In this ideal case, object1 and object2 are stored on real node cache A, and object3 and object4 on real node cache C, so balance is achieved. Once virtual nodes are introduced, the mapping changes from object-to-node into object-to-virtual-node and then virtual-node-to-real-node. Finding the real node that holds an object works as shown in Figure 7:
Figure 7 Finding the node for an object
The hash input for a virtual node can be the real node's IP address plus a numeric suffix. Assuming the IP address of cache A is 192.168.1.100, the hash values for cache A1 and cache A2 are:
hash("192.168.1.100#1") = key A1; hash("192.168.1.100#2") = key A2;
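The virtual node scheme can be sketched in Python as follows. The `VirtualNodeRing` class, the MD5-based `ring_hash`, the second IP address, and the replica counts are assumptions made for illustration; only the idea of hashing the real node's IP plus a numeric suffix comes from the description above.

```python
import bisect
import hashlib
from collections import Counter


def ring_hash(value: str) -> int:
    """Hash a string into the ring space 0 .. 2**32 - 1 (MD5 is an assumption)."""
    digest = hashlib.md5(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


class VirtualNodeRing:
    """Consistent-hash ring in which each real node owns several virtual nodes."""

    def __init__(self, real_nodes, replicas: int):
        # Each entry is (position of a virtual node, the real node behind it).
        # The virtual node's hash input is "<real node>#<suffix>".
        self._ring = sorted(
            (ring_hash(f"{node}#{i}"), node)
            for node in real_nodes
            for i in range(1, replicas + 1)
        )
        self._positions = [pos for pos, _ in self._ring]

    def locate(self, key: str) -> str:
        """Object -> first virtual node clockwise -> the real node it stands for."""
        index = bisect.bisect_left(self._positions, ring_hash(key))
        return self._ring[index % len(self._ring)][1]


def load_per_node(ring: VirtualNodeRing, keys) -> Counter:
    """Count how many keys each real node receives."""
    return Counter(ring.locate(k) for k in keys)


keys = [f"object{i}" for i in range(10_000)]
nodes = ["192.168.1.100", "192.168.1.101"]  # the second address is hypothetical
for replicas in (1, 2, 100):
    print(replicas, dict(load_per_node(VirtualNodeRing(nodes, replicas), keys)))
```

With only one or two virtual nodes per real node the split can be quite uneven. With many virtual nodes it tends toward an even share, which is why the two-replica case in the article is described as an ideal situation rather than a guaranteed one.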

At this point we can conclude that consistent hashing meets all four criteria for a distributed cluster, namely balance, monotonicity, dispersion, and load, and it is now widely used in many fields.
