Consistent hashing algorithm

Source: Internet
Author: User

Let's start with three criteria for judging a hash algorithm in a distributed setting.

1. Balance: the hash results should be distributed as evenly as possible across the nodes, so that load balancing is handled by the algorithm itself.

2. Monotonicity: when nodes are added or removed, a key should keep mapping to the same node wherever possible; existing keys should move only when the change requires it.

3. Dispersion (spread): in a distributed environment, data should be spread across the nodes of the cluster (each node may have its own backup); no node needs to store all of the data.

Consistent hashing is a common algorithm in distributed systems. Take a distributed storage system that must place each piece of data on a specific node. With ordinary hashing, data is mapped to a node by something like key % n, where key is the data's key and n is the number of machine nodes. If a machine joins or leaves the cluster, n changes and almost all of the existing mappings become invalid. For persistent storage this means data migration; for a distributed cache it means most of the cached entries become unreachable.
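To make the cost of key % n concrete, here is a minimal sketch that counts how many keys change node when a cluster grows from 5 to 6 machines. The helper names (`stable_hash`, `node_for`), the key names, and the use of MD5 truncated to 32 bits are assumptions made for illustration, not part of any particular system.

```python
import hashlib

def stable_hash(key: str) -> int:
    """Map a string to a 32-bit integer (MD5 is an illustrative choice)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")

def node_for(key: str, n: int) -> int:
    """Ordinary modulo placement: node index = hash(key) % n."""
    return stable_hash(key) % n

keys = [f"key-{i}" for i in range(10_000)]

# Count keys whose node index differs between a 5-node and a 6-node cluster.
moved = sum(1 for k in keys if node_for(k, 5) != node_for(k, 6))
moved_fraction = moved / len(keys)
print(f"{moved_fraction:.1%} of keys map to a different node")
```

For a uniformly distributed hash, a key keeps its node only when hash % 5 equals hash % 6, which happens for 5 out of every 30 hash values, so about five sixths of the keys are expected to move.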

The consistent hashing approach discussed here is the one used by Ketama: choosing a machine node does not depend only on the hash of the cache key; the machine nodes themselves are hashed as well.

Scenario description for a consistent hash

1. Hash machine node

First compute a hash value for each machine node (how? The node's IP address can be used as the hash input, although there are other options), and then place the nodes on a ring covering the range 0 to 2^32, with values increasing clockwise. As shown below:

Figure A

The cluster has five machines: A, B, C, D, and E. Using some hash algorithm, we place them on the ring as shown.
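As a rough sketch of this step, the following code hashes five machines onto a 0 to 2^32 ring by IP address. The IP addresses are made up, and MD5 truncated to 32 bits is only one possible hash; neither comes from the original description.

```python
import hashlib

RING_SIZE = 2 ** 32

def ring_position(name: str) -> int:
    """Hash a string to a point on the ring (MD5 is an assumption here)."""
    digest = hashlib.md5(name.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % RING_SIZE

# Hypothetical IP addresses for machines A through E.
nodes = {
    "A": "10.0.0.1",
    "B": "10.0.0.2",
    "C": "10.0.0.3",
    "D": "10.0.0.4",
    "E": "10.0.0.5",
}

# The ring is just the node positions sorted in increasing (clockwise) order.
ring = sorted((ring_position(ip), name) for name, ip in nodes.items())

for position, name in ring:
    print(f"{name} sits at {position}")
```

Keeping the ring as a sorted list makes the clockwise search in the next step a simple binary search.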

2. Access method

Suppose there is a cache write request with key K. Compute its hash value hash(K), which corresponds to a point on the ring in Figure A. If no machine node is mapped to that exact point, search clockwise until the first point with a mapped machine node is found; that node is the target. If the search passes 2^32 without finding a node, the request goes to the first machine node on the ring. For example, if hash(K) falls between A and B, the request hits node B.
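The clockwise search can be sketched with a binary search over the sorted node positions. The positions below are hand-picked so the result is easy to follow; in practice they would come from hashing the nodes. The function name `lookup` is hypothetical.

```python
import bisect

RING_SIZE = 2 ** 32

# Hypothetical, hand-picked ring positions for machines A through E.
ring = [
    (500_000_000, "A"),
    (1_300_000_000, "B"),
    (2_100_000_000, "C"),
    (2_900_000_000, "D"),
    (3_700_000_000, "E"),
]
positions = [pos for pos, _ in ring]

def lookup(key_hash: int) -> str:
    """Return the first node at or clockwise after key_hash, wrapping around."""
    idx = bisect.bisect_left(positions, key_hash % RING_SIZE)
    if idx == len(positions):  # passed the last node: wrap to the first one
        idx = 0
    return ring[idx][1]

print(lookup(900_000_000))    # between A and B, so B
print(lookup(1_300_000_000))  # exactly on B, so B
print(lookup(4_000_000_000))  # past E, wraps around to A
```

Using `bisect_left` means a key that lands exactly on a node's position is served by that node rather than by the next one.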

3. Handling an added node

Taking Figure A as an example, suppose a machine F is added to the original cluster. The process is as follows:

Compute the hash value of the new machine node and map it to a point on the ring, as shown below:

Figure II

After machine node F is added, the access policy does not change; requests are still served as described in (2). Some cache misses are still unavoidable: the keys that can no longer be found are those whose hash(K) falls between C and F, because before F was added they were stored on the next node clockwise. Although adding a node still causes some misses, compared with traditional modulo hashing, consistent hashing keeps the amount of affected data to a minimum.
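The effect of adding F can be checked with a small experiment. This sketch assumes hand-picked ring positions with F placed between C and D, and uses MD5 truncated to 32 bits to hash made-up keys; all names and numbers are illustrative.

```python
import bisect
import hashlib

def h32(s: str) -> int:
    """Hash a string to a 32-bit ring position (MD5 is an assumption)."""
    return int.from_bytes(hashlib.md5(s.encode("utf-8")).digest()[:4], "big")

def lookup(ring, key_hash):
    """First node at or clockwise after key_hash, wrapping to the start."""
    positions = [pos for pos, _ in ring]
    idx = bisect.bisect_left(positions, key_hash) % len(ring)
    return ring[idx][1]

C_POS, F_POS = 2_100_000_000, 2_500_000_000  # F is assumed to sit after C
before = [(500_000_000, "A"), (1_300_000_000, "B"), (C_POS, "C"),
          (2_900_000_000, "D"), (3_700_000_000, "E")]
after = sorted(before + [(F_POS, "F")])

hashes = [h32(f"key-{i}") for i in range(5_000)]
moved = [h for h in hashes if lookup(before, h) != lookup(after, h)]

# Every key that changed node now belongs to F and hashes into (C, F].
all_moved_to_f = all(lookup(after, h) == "F" for h in moved)
all_in_arc = all(C_POS < h <= F_POS for h in moved)
print(f"{len(moved) / len(hashes):.1%} of keys moved; "
      f"all to F: {all_moved_to_f}; all between C and F: {all_in_arc}")
```

Only the arc between C and F is affected, so the share of moved keys is roughly the length of that arc divided by 2^32, rather than most of the key space as with key % n.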

That is how consistent hashing works. Based on this principle, we can see its advantages:

1. Monotonicity

Consistent hashing handles monotonicity effectively: when the nodes in a distributed system change, only a small amount of data needs to be remapped and migrated.

2. Balance

To balance the data distribution, consistent hashing uses virtual nodes: each physical server is mapped to multiple points on the ring. The fewer physical cache servers there are, the more virtual nodes each one needs to balance the load; with many physical servers, fewer virtual nodes are required.
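A minimal sketch of virtual nodes follows. Each physical node is placed on the ring several times by hashing a derived name such as "A#0", "A#1", and so on; that naming scheme, the count of 200, and the MD5-based hash are assumptions for illustration and are not taken from Ketama itself.

```python
import bisect
import hashlib
from collections import Counter

def h32(s: str) -> int:
    """Hash a string to a 32-bit ring position (MD5 is an assumption)."""
    return int.from_bytes(hashlib.md5(s.encode("utf-8")).digest()[:4], "big")

def build_ring(nodes, vnodes):
    """Place each physical node on the ring `vnodes` times."""
    return sorted((h32(f"{node}#{i}"), node)
                  for node in nodes for i in range(vnodes))

def key_shares(ring, keys):
    """Fraction of keys that each physical node ends up serving."""
    positions = [pos for pos, _ in ring]
    counts = Counter()
    for key in keys:
        idx = bisect.bisect_left(positions, h32(key)) % len(ring)
        counts[ring[idx][1]] += 1
    physical = sorted({node for _, node in ring})
    return {node: counts[node] / len(keys) for node in physical}

nodes = ["A", "B", "C", "D", "E"]
keys = [f"key-{i}" for i in range(20_000)]

shares_1 = key_shares(build_ring(nodes, 1), keys)      # one point per node
shares_200 = key_shares(build_ring(nodes, 200), keys)  # 200 points per node

print("1 point per node:   ", shares_1)
print("200 points per node:", shares_200)
```

With a single point per node, the shares depend entirely on where five hashes happen to land and are typically uneven. With many points per node, each node's share is the sum of many small arcs and tends toward the ideal 1/5.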

3. Dispersion

As for dispersion, I have not fully worked it out yet, and besides, it is the New Year holiday.
