Consistent hashing algorithm

Source: Internet
Author: User

Consistent hashing should satisfy the following four properties in an environment with multiple cache nodes:

1. Balance: The hash results should be distributed across all cache nodes as evenly as possible, so that all of the cache space is used;

2. Monotonicity: If some content has already been hashed to cache nodes and a new cache node is then added to the system, the hash must guarantee that previously allocated content either stays where it is or moves to the new cache node. It must never be remapped to a different old cache node. In other words, adding or removing a cache node leaves the existing mapping between content and nodes intact, apart from the content that moves to or from that node;

3. Spread: Different clients may see different subsets of the cache nodes, so their hash results can disagree, and the same content ends up mapped to different cache nodes by different clients. This should be avoided, because the same content is then stored in several caches, which lowers the storage efficiency of the system;

4. Load: Because different clients may map the same content to different cache nodes, a particular cache node may likewise be assigned different content by different clients. Like spread, this should be avoided, so a good hashing algorithm keeps the load on each cache node as low as possible.
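To make the monotonicity property concrete, the following is a minimal sketch of a checker that counts keys remapped from one old node to another old node when a node is added. As a counter-example it is run against naive modulo placement (`hash(key) % node_count`), which is not consistent hashing. The names `place_mod`, `monotonicity_violations`, the `cache-*` node names and the use of MD5 are all assumptions made for illustration.

```python
import hashlib


def h(text: str) -> int:
    """Hash text to a large integer (MD5 is an arbitrary choice for this sketch)."""
    return int(hashlib.md5(text.encode("utf-8")).hexdigest(), 16)


def place_mod(key: str, nodes: list) -> str:
    """Naive placement: pick a node by hash modulo the node count."""
    return nodes[h(key) % len(nodes)]


def monotonicity_violations(keys, old_nodes, new_node, place):
    """Return the keys that moved from one OLD node to another OLD node."""
    new_nodes = old_nodes + [new_node]
    bad = []
    for key in keys:
        before = place(key, old_nodes)
        after = place(key, new_nodes)
        # Monotonicity allows only two outcomes: stay put, or move to new_node.
        if after != before and after != new_node:
            bad.append(key)
    return bad


keys = [f"item-{i}" for i in range(1000)]
bad = monotonicity_violations(
    keys, ["cache-a", "cache-b", "cache-c"], "cache-d", place_mod
)
print(len(bad), "of", len(keys), "keys moved between old nodes")
```

Any placement function with the same `(key, nodes)` signature can be passed as `place`; a monotone scheme should make the returned list empty.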

Consistent hashing routing algorithm:

1. Content mapping and node mapping: Content and nodes are mapped into the same hash space. Each has a unique ID (key), so all IDs lie in the same domain, for example the integers 0000 to 9999. Content is stored on the node whose ID is closest to the content's own ID. For example, if the system has nodes with IDs 1000, 1010 and 1100, content with key 1001 is mapped to node 1000;

2. Virtual nodes: A hash algorithm cannot guarantee absolute balance; when there are few nodes, content cannot be mapped evenly onto them. To improve balance, virtual nodes are added to the hash space. A virtual node is a replica of a real node in the hash space: each real node corresponds to several virtual nodes, and that number is called the replica count. Every virtual node has its own unique key in the hash space, and content mapped to a virtual node is stored on the physical node it belongs to;

3. Upstream and downstream nodes: The upstream node of a node is the node with the smallest ID among those whose ID is greater than the current node's ID, and the downstream node is the node with the largest ID among those whose ID is less than the current node's ID. Consistent hashing requires each node to store information (such as the IP address) about its upstream and downstream nodes;

4. Content search: If the node that receives a query request holds the requested target, it returns an acknowledgement directly to the node that initiated the query. If it does not, it forwards the request to its upstream and downstream nodes;

5. Node exit and join: When a node exits or joins the system, its neighboring nodes must update their routing information promptly. This requires each node to store not only its directly connected upstream and downstream nodes, but also indirect nodes up to a certain depth of n hops, and to maintain this node list dynamically.
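The content mapping and virtual node steps can be sketched roughly as follows. This is a minimal illustration, not a reference implementation: the class `HashSpace`, the helper `key_of`, the use of MD5, the default of 100 replicas, and the `name#i` scheme for deriving virtual-node IDs are all assumptions. The "closest ID" rule from step 1 is implemented with a circular distance, so that 9999 and 0000 count as neighbors; that wrap-around is also an assumption.

```python
import bisect
import hashlib

SPACE = 10_000  # shared ID domain 0000..9999, as in the example in step 1


def key_of(text: str) -> int:
    """Hash arbitrary text into the shared ID space (MD5 is an arbitrary choice)."""
    return int(hashlib.md5(text.encode("utf-8")).hexdigest(), 16) % SPACE


class HashSpace:
    def __init__(self):
        self._ids = []    # sorted IDs of all (virtual) nodes
        self._owner = {}  # (virtual) node ID -> physical node name

    def add_at(self, node_id: int, name: str) -> None:
        """Place one (virtual) node at an explicit ID; the first owner of an ID wins."""
        if node_id in self._owner:
            return
        bisect.insort(self._ids, node_id)
        self._owner[node_id] = name

    def add_node(self, name: str, replicas: int = 100) -> None:
        """Add a physical node as `replicas` virtual nodes."""
        for i in range(replicas):
            self.add_at(key_of(f"{name}#{i}"), name)

    def remove_node(self, name: str) -> None:
        """Remove every virtual node that belongs to a physical node."""
        self._ids = [i for i in self._ids if self._owner[i] != name]
        self._owner = {i: n for i, n in self._owner.items() if n != name}

    def lookup(self, key: int) -> str:
        """Return the physical node whose (virtual) ID is closest to `key`."""
        if not self._ids:
            raise LookupError("no nodes in the hash space")
        pos = bisect.bisect_left(self._ids, key)
        below = self._ids[pos - 1]               # wraps to the largest ID when pos == 0
        above = self._ids[pos % len(self._ids)]  # wraps to the smallest ID at the end

        def dist(node_id: int) -> int:
            d = abs(node_id - key)
            return min(d, SPACE - d)             # circular distance

        best = below if dist(below) <= dist(above) else above
        return self._owner[best]


# The example from step 1: nodes at 1000, 1010 and 1100, content key 1001.
demo = HashSpace()
for node_id in (1000, 1010, 1100):
    demo.add_at(node_id, f"node-{node_id}")
print(demo.lookup(1001))  # -> node-1000
```

With virtual nodes, a caller would use `add_node("cache-a")` instead of `add_at` and look content up with `lookup(key_of("some-content-name"))`. Because a key's closest old ID does not change when new IDs are inserted, adding a node in this sketch can only move keys onto the new node.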


Consistent hashing basically solves the most critical problem in a peer-to-peer environment: how to distribute storage and routing in a dynamic network topology. Each node only needs to maintain information about a small number of neighboring nodes, and when a node joins or exits the system, only a small number of related nodes take part in maintaining the topology. However, this routing algorithm has a drawback: a query message may need O(N) steps (where N is the total number of nodes in the system) to reach the node being queried. When the system is very large, with possibly more than a million nodes, such query efficiency is clearly inadequate for practical use.
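The O(N) cost can be seen in a small simulation of neighbor-to-neighbor forwarding. This is a sketch under simplifying assumptions: the `Node` class, `build_ring` and `find` are hypothetical names, each node knows only its direct upstream and downstream neighbors, and a query is forwarded in the upstream direction only. Forwarding in both directions would roughly halve the worst case, which is still linear in N.

```python
class Node:
    """A node that knows only its direct upstream and downstream neighbors."""

    def __init__(self, node_id: int):
        self.node_id = node_id
        self.keys = set()       # content keys stored on this node
        self.upstream = None    # neighbor with the next larger ID (wraps around)
        self.downstream = None  # neighbor with the next smaller ID (wraps around)


def build_ring(ids):
    """Create nodes sorted by ID and link each one to its two neighbors."""
    nodes = [Node(i) for i in sorted(ids)]
    for idx, node in enumerate(nodes):
        node.upstream = nodes[(idx + 1) % len(nodes)]
        node.downstream = nodes[idx - 1]
    return nodes


def find(start: Node, key: int):
    """Forward a query upstream until some node holds `key`.

    Returns (holder, hops), or (None, hops) after one full trip around the ring.
    """
    node, hops = start, 0
    while key not in node.keys:
        node = node.upstream
        hops += 1
        if node is start:
            return None, hops
    return node, hops


nodes = build_ring(range(0, 8000, 1000))  # 8 nodes with IDs 0, 1000, ..., 7000
nodes[-1].keys.add(7001)                  # store one item on the last node
holder, hops = find(nodes[0], 7001)
print(holder.node_id, hops)               # -> 7000 7
```

Starting from the node furthest from the holder, the query visits every other node first, so the hop count grows in step with the number of nodes.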
