Consistent hashing

Source: Internet
Author: User
Original article: http://blog.csdn.net/cywosp/article/details/23397179

The consistent hash algorithm was proposed in 1997 at the Massachusetts Institute of Technology. It is a distributed hash table (DHT) algorithm designed to solve the hot spot problem on the Internet, and its original intention is very similar to that of CARP (Cache Array Routing Protocol). Consistent hashing corrects the problems caused by the simple hash algorithm that CARP uses, so that distributed hash tables can be truly applied in P2P environments. Consistent hashing proposes four definitions for judging the quality of a hash algorithm in a dynamically changing cache environment:
1. Balance: the hash results should be spread across all buffers as evenly as possible, so that all buffer space is used. Many hash algorithms meet this condition.
 
2. Monotonicity: suppose some content has already been assigned to buffers through hashing, and new buffers are then added to the system. The hash result should ensure that previously assigned content maps either to its original buffer or to one of the new buffers, and never to a different buffer from the old buffer set.
 
3. Spread: in a distributed environment, a terminal may not see all of the buffers, only some of them. When terminals map content to buffers through hashing, different terminals may see different sets of buffers and therefore compute inconsistent results, so the same content ends up mapped to different buffers by different terminals. This should be avoided, because storing the same content in several buffers reduces the storage efficiency of the system. Spread is defined as the severity of this situation. A good hash algorithm should avoid such inconsistencies as far as possible, that is, it should minimize spread.
 
4. Load: the load problem looks at spread from another perspective. Since different terminals may map the same content to different buffers, a particular buffer may also have different content mapped to it by different users. Like spread, this should be avoided, so a good hash algorithm should keep the load on each buffer as low as possible.
In a distributed cluster, adding or removing a machine, or having a machine leave the cluster automatically after it fails, is one of the most basic functions of cluster management. If the common hash(object) % N algorithm is used, then after a machine is added or removed much of the existing data can no longer be found where the new calculation points, which seriously violates the monotonicity principle. The following describes how a consistent hash algorithm is designed.

Ring hash space

A common hash algorithm maps each key into a space of 2^32 buckets, that is, the integer range 0 to 2^32 - 1. Joining the beginning and end of this range gives a closed ring.

Map data to the ring through a hash algorithm

Now use a specific hash function to calculate the key for each of the four objects object1, object2, object3, and object4, which places them on the hash ring:

Hash(object1) = key1;
Hash(object2) = key2;
Hash(object3) = key3;
Hash(object4) = key4;

Map machines to the ring through the hash algorithm

When a machine joins a distributed cluster that uses consistent hashing, it is mapped onto the ring with the same hash algorithm that is used for objects (the input to the machine's hash is generally its IP address or a unique alias). Each object is then stored on the first machine reached by moving clockwise from the object's position. Assume there are three machines, node1, node2, and node3, whose keys are obtained through the hash algorithm and mapped onto the ring (these are the nodes' own keys, separate from the object keys above):

Hash(node1) = key1;
Hash(node2) = key2;
Hash(node3) = key3;

Objects and machines share the same hash space, so moving clockwise, object1 is stored on node1, object3 is stored on node2, and object2 and object4 are stored on node3. As long as the deployment stays the same, the hash ring does not change.
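The ring and the clockwise lookup described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the article's implementation: the names `hash32` and `HashRing` are hypothetical, and MD5 truncated to 32 bits is an arbitrary choice of hash function that happens to produce values in the 0 to 2^32 - 1 range.

```python
import bisect
import hashlib


def hash32(key):
    """Map a string to an integer in [0, 2**32 - 1].

    MD5 is an arbitrary choice here; any well-mixed hash would do.
    """
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


class HashRing:
    """Hypothetical minimal ring: one position per machine, no virtual nodes."""

    def __init__(self, nodes=(), hash_fn=hash32):
        self._hash = hash_fn
        self._positions = []  # sorted machine positions on the ring
        self._owner = {}      # position -> machine name
        for node in nodes:
            self.add_node(node)

    def add_node(self, node):
        pos = self._hash(node)
        if pos in self._owner:
            raise ValueError(f"position collision for {node!r}")
        bisect.insort(self._positions, pos)
        self._owner[pos] = node

    def lookup(self, key):
        """Return the first machine at or after the key's position, clockwise."""
        if not self._positions:
            raise LookupError("ring has no nodes")
        idx = bisect.bisect_left(self._positions, self._hash(key))
        if idx == len(self._positions):
            idx = 0  # walked past the largest position: wrap around the ring
        return self._owner[self._positions[idx]]


ring = HashRing(["node1", "node2", "node3"])
for obj in ["object1", "object2", "object3", "object4"]:
    print(obj, "->", ring.lookup(obj))
```

With a real hash function the assignments will generally differ from the ones in the article's example, because they depend on where each name happens to land on the ring. Keeping the positions in a sorted list makes each lookup a binary search.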
Therefore, computing the hash value of an object is enough to quickly locate the corresponding machine, and thus the real storage location of the object.

Delete and add machines

The biggest weakness of the ordinary hash-remainder algorithm is that after a machine is added or removed, the storage locations of a large number of objects become invalid, which badly violates monotonicity. The following analyzes how the consistent hash algorithm handles these cases.

1. Delete a node (machine)

Taking the distribution above as an example, if node2 fails and is removed, object3 migrates to node3 by the clockwise rule. Only the mapping of object3 changes; the other objects are unaffected.

2. Add a node (machine)

If a new node, node4, is added to the cluster, the same hash algorithm is used to obtain key4 for it and map it onto the ring. By the clockwise rule, object2 then migrates to node4, and the other objects keep their original storage locations.

As this analysis of adding and removing nodes shows, the consistent hash algorithm keeps monotonicity while minimizing data migration. This makes it well suited to distributed clusters: it avoids large-scale data migration and reduces the pressure on the servers.

Balance

According to the analysis above, the consistent hash algorithm satisfies monotonicity and load balancing, and has the spread of ordinary hash algorithms. That alone does not explain its wide adoption, because balance is still missing. The following analyzes how the consistent hash algorithm achieves balance. Hashing by itself does not guarantee balance. For example, in the case above where only node1 and node3 are deployed (after node2 is deleted), object1 is stored on node1, while object2, object3, and object4 are all stored on node3, which is a very unbalanced state.
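To make the migration claim from "Delete and add machines" concrete, the following Python sketch compares which objects change owner under the remainder scheme and under the ring when a machine is added or removed. It is a rough illustration, not the article's code: `hash32`, `modulo_owner`, `ring_owner`, and `moved` are hypothetical names, MD5 is an assumed hash function, and the 1000 synthetic object names are made up for the experiment.

```python
import bisect
import hashlib


def hash32(key):
    """Map a string to an integer in [0, 2**32 - 1] (MD5 is an assumption)."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:4], "big")


def modulo_owner(key, nodes, hash_fn=hash32):
    """Ordinary remainder scheme: hash(object) % N."""
    return nodes[hash_fn(key) % len(nodes)]


def ring_owner(key, nodes, hash_fn=hash32):
    """Consistent hashing: first node at or after the key, clockwise."""
    ring = sorted((hash_fn(node), node) for node in nodes)
    positions = [pos for pos, _ in ring]
    # bisect_left returns len(ring) past the last node; % wraps it to index 0.
    idx = bisect.bisect_left(positions, hash_fn(key)) % len(ring)
    return ring[idx][1]


def moved(owner_fn, keys, before, after):
    """Keys whose owner differs between the two cluster layouts."""
    return [k for k in keys if owner_fn(k, before) != owner_fn(k, after)]


keys = [f"object{i}" for i in range(1, 1001)]  # synthetic object names
three = ["node1", "node2", "node3"]
four = three + ["node4"]   # node4 joins the cluster
two = ["node1", "node3"]   # node2 fails and is removed

ring_added = moved(ring_owner, keys, three, four)
ring_removed = moved(ring_owner, keys, three, two)
mod_added = moved(modulo_owner, keys, three, four)

print("ring, node4 added:    ", len(ring_added), "of", len(keys), "moved")
print("ring, node2 removed:  ", len(ring_removed), "of", len(keys), "moved")
print("modulo, node4 added:  ", len(mod_added), "of", len(keys), "moved")

# On the ring, only the arc next to the changed node is affected.
print(all(ring_owner(k, four) == "node4" for k in ring_added))
print(all(ring_owner(k, three) == "node2" for k in ring_removed))
```

On the ring, every object that moves after an addition lands on the new node, and every object that moves after a removal was owned by the removed node. The remainder scheme gives no such guarantee, since changing N changes the remainder for most hash values. How many objects move on the ring in a particular run depends on where the node names happen to hash.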
To satisfy balance as far as possible, the consistent hash algorithm introduces virtual nodes. A "virtual node" is a replica of an actual node (machine) in the hash space. One actual node corresponds to several virtual nodes; this count is also called the "number of replicas". Virtual nodes are arranged in the hash space by their hash values.

Take again the case where only node1 and node3 are deployed (after node2 is deleted), in which the objects were unevenly distributed across the machines. With the number of replicas set to 2, there are four virtual nodes on the whole hash ring, and the object mapping becomes: object1 -> NODE1-1, object2 -> NODE1-2, object3 -> NODE3-2, object4 -> NODE3-1. By introducing virtual nodes, the objects are distributed evenly: object1 and object2 end up on node1, and object3 and object4 on node3.

How does a real object lookup work in practice? An object is converted from its hash to a virtual node, and from the virtual node to the actual node.

The hash of a virtual node can be calculated from the IP address of the corresponding node plus a numeric suffix. For example, assume that the IP address of node1 is 192.168.1.100. Before virtual nodes are introduced, the hash value of node1 is calculated as:

Hash("192.168.1.100");

After virtual nodes are introduced, the hash values of the virtual nodes NODE1-1 and NODE1-2 are calculated as:

Hash("192.168.1.100#1"); // NODE1-1
Hash("192.168.1.100#2"); // NODE1-2
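The virtual-node scheme can be tried out with a short Python sketch. It is an illustration under assumptions rather than the article's implementation: `hash32`, `build_ring`, and `lookup` are hypothetical names, MD5 is an assumed hash function, the second IP address is a made-up stand-in for node3 (the article only gives node1's address), and the object names are synthetic. The `"<ip>#<i>"` naming of virtual nodes follows the article.

```python
import bisect
import hashlib
from collections import Counter


def hash32(key):
    """Map a string to an integer in [0, 2**32 - 1] (MD5 is an assumption)."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:4], "big")


def build_ring(nodes, replicas, hash_fn=hash32):
    """Place `replicas` virtual nodes per machine, named "<node>#<i>".

    Returns two parallel lists sorted by ring position:
    the positions and the actual machine behind each virtual node.
    """
    pairs = sorted(
        (hash_fn(f"{node}#{i}"), node)
        for node in nodes
        for i in range(1, replicas + 1)
    )
    positions = [pos for pos, _ in pairs]
    owners = [node for _, node in pairs]
    return positions, owners


def lookup(ring, key, hash_fn=hash32):
    """Object hash -> virtual node (clockwise) -> actual machine."""
    positions, owners = ring
    idx = bisect.bisect_left(positions, hash_fn(key)) % len(positions)
    return owners[idx]


# 192.168.1.100 is node1's address from the article;
# 192.168.1.102 is a hypothetical address standing in for node3.
nodes = ["192.168.1.100", "192.168.1.102"]
keys = [f"object{i}" for i in range(10000)]  # synthetic object names

for replicas in (1, 2, 100):
    ring = build_ring(nodes, replicas)
    counts = Counter(lookup(ring, key) for key in keys)
    print(replicas, "replica(s) per node:", dict(counts))
```

With more replicas per machine the per-machine counts usually move closer together, because each machine owns many small arcs of the ring instead of one large one. The exact numbers depend on the hash function and on the names being hashed.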
