Each node is assigned a hash value that places it at a position on the ring (the range 0 to 2^32), so the ring is partitioned among the nodes and each node is responsible for one segment. Data is hashed with the same hash function, and the segment a key's hash falls into determines which node stores that data.
When a node is added or removed, only its adjacent node is affected, so consistent hashing solves the problem of mass data redistribution caused by node changes.
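To make the mapping concrete, here is a minimal Python sketch of such a ring with one token per node. The hash function (MD5 truncated to 32 bits), the node names, and the key format are illustrative assumptions, not Cassandra's actual partitioner:

```python
# A minimal standard consistent hash ring: one token per node.
# ring_hash (MD5 truncated to 32 bits) and the node names are
# illustrative assumptions, not Cassandra's actual partitioner.
import bisect
import hashlib

def ring_hash(key: str) -> int:
    # Map any string onto the ring [0, 2**32).
    return int(hashlib.md5(key.encode()).hexdigest(), 16) % (2 ** 32)

class ConsistentHashRing:
    def __init__(self, nodes):
        # Sorted (token, node) pairs; each node owns the arc that
        # ends at its token.
        self.tokens = sorted((ring_hash(n), n) for n in nodes)

    def node_for(self, key: str) -> str:
        # Walk clockwise to the first token >= the key's hash,
        # wrapping around to the start of the ring if necessary.
        h = ring_hash(key)
        i = bisect.bisect_left(self.tokens, (h, ""))
        return self.tokens[i % len(self.tokens)][1]

ring = ConsistentHashRing(["Node1", "Node2", "Node3"])
print(ring.node_for("user:42"))  # node whose segment the key falls into
```

Removing a node only deletes its token from the sorted list, so only the keys on that node's arc move to the next token clockwise; all other keys stay put.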
Before version 1.2, Cassandra used this standard consistent hashing algorithm: each node is assigned a single token, and the token value determines the node's position in the cluster and the range of data the node stores.
Because this approach distributes data unevenly, Cassandra 1.2 and later adopted the idea of virtual nodes: instead of assigning one token per node, the ring is divided into many more segments and each node is responsible for several of them. When a node is removed, the tokens it was responsible for are handed off to multiple nodes rather than one, which solves the uneven-distribution problem.
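Extending the sketch above, a virtual-node ring only changes how tokens are placed: each physical node contributes several tokens, so its segments are scattered around the ring. The count of 8 tokens per node below is an illustrative assumption; Cassandra exposes this knob as the num_tokens setting.

```python
# The same ring, but each physical node places several tokens
# (virtual nodes) instead of one.
import bisect
import hashlib

VNODES_PER_NODE = 8  # assumed for illustration

def ring_hash(key: str) -> int:
    return int(hashlib.md5(key.encode()).hexdigest(), 16) % (2 ** 32)

class VNodeRing:
    def __init__(self, nodes):
        # Each node contributes VNODES_PER_NODE tokens, so its
        # segments are scattered around the ring instead of forming
        # one contiguous arc.
        self.tokens = sorted(
            (ring_hash(f"{node}#{i}"), node)
            for node in nodes
            for i in range(VNODES_PER_NODE)
        )

    def node_for(self, key: str) -> str:
        h = ring_hash(key)
        i = bisect.bisect_left(self.tokens, (h, ""))
        return self.tokens[i % len(self.tokens)][1]
```

Because every node now appears in many places on the ring, the segments any one node owns are small and interleaved with everyone else's, which is what lets its load scatter when it fails.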
As shown in the figure, the upper half is standard consistent hashing: each node is responsible for one contiguous segment of the ring. If Node2 suddenly goes down, all of the data Node2 was responsible for is handed off to Node1, so Node1 now covers the four segments E, F, A, and B. If Node1 already holds a lot of hot user data and is barely keeping up, and segment B happens to contain hot user data as well, Node1 may go down right after Node2, and once Node1 is gone, Node6 may not hold up either.
The lower half is the virtual-node implementation: each node is no longer responsible for a contiguous segment, and the ring is divided into many more parts. If Node2 suddenly goes down, the data Node2 was responsible for is not handed off entirely to Node1 but is spread across multiple nodes, while the scheme still retains the properties of consistent hashing.
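Reusing the two sketches above, a small simulation shows this difference directly: remove Node2 and count which surviving nodes inherit its keys. The key names and key count are illustrative assumptions.

```python
# Simulating the failure scenario: remove Node2 and count which
# surviving nodes inherit its keys. Reuses ConsistentHashRing and
# VNodeRing from the sketches above.
from collections import Counter

def inherited_keys(ring_cls, nodes, dead="Node2", n_keys=10_000):
    before = ring_cls(nodes)
    after = ring_cls([n for n in nodes if n != dead])
    moved = Counter()
    for k in range(n_keys):
        key = f"user:{k}"
        if before.node_for(key) == dead:
            moved[after.node_for(key)] += 1  # survivor inheriting this key
    return moved

nodes = ["Node1", "Node2", "Node3", "Node4", "Node5", "Node6"]
print(inherited_keys(ConsistentHashRing, nodes))  # one successor gets everything
print(inherited_keys(VNodeRing, nodes))           # spread across several nodes
```

With one token per node the counter has a single entry, i.e. one neighbor absorbs all of Node2's load; with virtual nodes the counter shows the load divided among several survivors, so no single node becomes the next weak point.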