Consistent hashing algorithm: balance and virtual nodes
Consistent hashing is a common algorithm in distributed systems. Consider a distributed storage system that must place each piece of data on a specific node. With ordinary hashing, the node is chosen as key % n, where key is the data's key (or its hash) and n is the number of machine nodes. If a machine joins or leaves the cluster, n changes and almost all of the existing mappings become invalid. For persistent storage this means migrating most of the data; for a distributed cache it means most of the cached entries are effectively invalidated.
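The cost of changing n under key % n can be measured directly. The following is a minimal sketch, not a production implementation: the key names, the cluster sizes (4 growing to 5), and the helper names stable_hash, node_for and moved_fraction are all assumptions made for illustration.

```python
import hashlib


def stable_hash(key: str) -> int:
    """Map a key to an integer with MD5, so the result is the same on every run."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


def node_for(key: str, n: int) -> int:
    """Ordinary hashing: pick a node index as hash(key) % n."""
    return stable_hash(key) % n


def moved_fraction(keys, old_n: int, new_n: int) -> float:
    """Fraction of keys whose node index changes when the cluster size changes."""
    moved = sum(1 for k in keys if node_for(k, old_n) != node_for(k, new_n))
    return moved / len(keys)


# Hypothetical workload: 10,000 synthetic keys, cluster grows from 4 to 5 nodes.
keys = [f"key-{i}" for i in range(10_000)]
fraction = moved_fraction(keys, old_n=4, new_n=5)
print(f"{fraction:.0%} of keys map to a different node after growing from 4 to 5 nodes")
```

For a uniform hash, a key keeps its node only when hash % 4 equals hash % 5, which holds for 4 out of every 20 hash values, so roughly 80% of the keys are expected to move.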
Consistent hashing was introduced to avoid this problem:
A hash function (such as MD5) maps the data into a large hash space, and this space is treated as a ring. Each machine node also occupies a position on the ring. When a piece of data is stored, its hash value gives it a position on the ring, such as the position of K1 shown in the figure. From that position, the first machine node found in the clockwise direction is node B, so K1 is stored on node B.
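The clockwise lookup can be sketched with a sorted list and a binary search. This is a minimal illustration under stated assumptions: node positions are taken from the MD5 of the node name, "clockwise" is treated as increasing hash value with wrap-around, and the names HashRing, ring_position and lookup are hypothetical.

```python
import bisect
import hashlib


def ring_position(name: str) -> int:
    """Position on the ring: the MD5 digest read as a 128-bit integer."""
    return int(hashlib.md5(name.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """A ring with one position per machine node (no virtual nodes)."""

    def __init__(self, nodes):
        # Sort nodes by their ring position so lookups can use binary search.
        points = sorted((ring_position(node), node) for node in nodes)
        self._positions = [pos for pos, _ in points]
        self._nodes = [node for _, node in points]

    def lookup(self, key: str) -> str:
        """Return the first node at or after the key's position, wrapping around."""
        idx = bisect.bisect_left(self._positions, ring_position(key))
        if idx == len(self._positions):
            idx = 0  # past the last node: wrap to the start of the ring
        return self._nodes[idx]


ring = HashRing(["A", "B", "C", "D"])
print("K1 is stored on node", ring.lookup("K1"))
```

Because the positions here come from MD5 of the node names, the node that receives K1 in this sketch depends on those hashes and need not be B as in the figure.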
If node B goes down, the data on B falls to node C, the next node clockwise, as shown in the figure.
In this way, only node C is affected, and the data on the other nodes A and D is untouched. However, this can create an "avalanche": node C now carries node B's data in addition to its own, so its load rises sharply and it becomes more likely to go down as well. Its data then falls on the next node, and so on, until the entire cluster is down.
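The effect of a node failure can be checked with a small experiment. This is a sketch under assumptions: node positions come from the MD5 of the node name, the keys are synthetic, and owner and ring_position are hypothetical helper names. Since positions depend on the hashes, the node that follows B on this ring is not necessarily the one named C.

```python
import hashlib


def ring_position(name: str) -> int:
    """Position on the ring: the MD5 digest read as a 128-bit integer."""
    return int(hashlib.md5(name.encode("utf-8")).hexdigest(), 16)


def owner(key: str, nodes) -> str:
    """Return the first node clockwise from the key's position (linear scan)."""
    h = ring_position(key)
    ordered = sorted(nodes, key=ring_position)
    for node in ordered:
        if ring_position(node) >= h:
            return node
    return ordered[0]  # wrap around the ring


keys = [f"key-{i}" for i in range(5_000)]
all_nodes = ["A", "B", "C", "D"]
survivors = ["A", "C", "D"]  # node B has gone down

before = {k: owner(k, all_nodes) for k in keys}
after = {k: owner(k, survivors) for k in keys}

moved = [k for k in keys if before[k] != after[k]]
receivers = {after[k] for k in moved}
print(f"{len(moved)} keys moved, all previously on B; they now live on {receivers}")
```

Every key that moves was stored on B, and all of them land on the single node that follows B on the ring, which is exactly the concentrated extra load described above.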
To address this, the concept of a "virtual node" is introduced. Many virtual nodes are placed on the ring, and each virtual node is associated with one real node. To store data, the system finds the first virtual node clockwise from the data's position on the ring and stores the data on the real node behind it, as in the figure:
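In the figure, A1, A2, B1, B2, C1, C2, D1 and D2 are virtual nodes. Machine A stores the data of A1 and A2, machine B stores the data of B1 and B2, machine C stores the data of C1 and C2, and likewise for machine D. Because the virtual nodes are numerous and evenly distributed, the data of a failed machine is spread across the remaining machines instead of falling on a single neighbor, which makes an "avalanche" much less likely.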
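A ring with virtual nodes can be sketched as follows. The details are assumptions made for illustration: each real node gets 100 virtual nodes, a virtual node's position is the MD5 of a label of the form "A#0", "A#1" and so on, and VirtualNodeRing is a hypothetical name.

```python
import bisect
import hashlib
from collections import Counter


def ring_position(name: str) -> int:
    """Position on the ring: the MD5 digest read as a 128-bit integer."""
    return int(hashlib.md5(name.encode("utf-8")).hexdigest(), 16)


class VirtualNodeRing:
    """A ring where every real node is represented by many virtual nodes."""

    def __init__(self, nodes, vnodes_per_node: int = 100):
        # Each virtual node "A#0", "A#1", ... points back to its real node.
        points = sorted(
            (ring_position(f"{node}#{i}"), node)
            for node in nodes
            for i in range(vnodes_per_node)
        )
        self._positions = [pos for pos, _ in points]
        self._owners = [node for _, node in points]

    def lookup(self, key: str) -> str:
        """Find the first virtual node clockwise and return its real node."""
        idx = bisect.bisect_left(self._positions, ring_position(key))
        return self._owners[idx % len(self._owners)]  # modulo handles wrap-around


keys = [f"key-{i}" for i in range(10_000)]
full = VirtualNodeRing(["A", "B", "C", "D"])
without_b = VirtualNodeRing(["A", "C", "D"])  # node B has gone down

load = Counter(full.lookup(k) for k in keys)
inherited = Counter(without_b.lookup(k) for k in keys if full.lookup(k) == "B")
print("load per node:", dict(load))
print("who inherits B's keys:", dict(inherited))
```

In this sketch, the keys that were on B are split among several surviving nodes rather than landing on one successor. More virtual nodes per machine generally give a more even split, at the cost of a larger ring to store and search.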