Consistent hashing algorithm: the core problem it solves is moving as little data as possible when the number of slots changes

Source: Internet
Author: User
Tags: memcached

Consistent hashing algorithm

Excerpt from: http://blog.codinglabs.org/articles/consistent-hashing.html

Algorithm overview

The consistent hashing algorithm was first published in the paper "Consistent Hashing and Random Trees: Distributed Caching Protocols for Relieving Hot Spots on the World Wide Web". In simple terms, consistent hashing organizes the entire hash value space into a virtual ring. For example, assume that the value space of a hash function H is 0 to 2^32 - 1 (that is, the hash value is a 32-bit unsigned integer). The resulting hash ring looks as follows:

The entire space is organized in the clockwise direction, and 0 and 2^32 - 1 coincide at the twelve o'clock position.
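The ring described above can be sketched in a few lines of Java. This is only an illustration under stated assumptions: the article does not say which hash function H to use, so CRC32 is picked here simply because it produces a 32-bit unsigned value; the class name `RingPosition`, the helper `clockwiseDistance` and the sample IP address are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

final class RingPosition {
    /** Number of positions on the ring: 0 .. 2^32 - 1. */
    static final long RING_SIZE = 1L << 32;

    /** Maps a key to a ring position. CRC32 is an assumed stand-in for H. */
    static long position(String key) {
        CRC32 crc = new CRC32();
        crc.update(key.getBytes(StandardCharsets.UTF_8));
        return crc.getValue(); // always in [0, 2^32 - 1]
    }

    /** Clockwise distance from one position to another, wrapping past 2^32 - 1 back to 0. */
    static long clockwiseDistance(long from, long to) {
        return Math.floorMod(to - from, RING_SIZE);
    }

    public static void main(String[] args) {
        // Hypothetical server address, used only to show a call.
        System.out.println(position("192.168.0.1"));
        // 2^32 - 1 and 0 are neighbours on the ring.
        System.out.println(clockwiseDistance(RING_SIZE - 1, 0)); // prints 1
    }
}
```

Positions are held in a `long` because Java has no unsigned 32-bit integer type.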

The next step is to hash each server with H, using the server's IP address or hostname as the key, so that every machine gets a definite position on the hash ring. Assume three servers, Server 1, Server 2 and Server 3, whose IP addresses hash to three distinct positions on the ring.

Data is then mapped to a server as follows: hash the data key with the same function H to obtain a hash value h, use h to determine the position of the data on the ring, and "walk" clockwise along the ring from that position. The first server encountered is the server the data belongs to.

For example, suppose four data objects A, B, C and D have been hashed onto the ring.

According to the consistent hashing algorithm, data A is assigned to Server 1, D to Server 3, and B and C to Server 2.
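The clockwise walk can be sketched with a sorted map. The ring positions below are invented for illustration (the article gives none); they are chosen only so that the result matches the assignment just described. The class name `RingLookup` is hypothetical.

```java
import java.util.Map;
import java.util.TreeMap;

final class RingLookup {
    /** Ring position -> server name. TreeMap keeps the positions sorted. */
    private final TreeMap<Long, String> ring = new TreeMap<>();

    void addServer(long position, String server) {
        ring.put(position, server);
    }

    /**
     * Walks clockwise from keyPosition: returns the first server at or after
     * that position, wrapping to the first server on the ring if none is found.
     */
    String locate(long keyPosition) {
        if (ring.isEmpty()) {
            return null;
        }
        Map.Entry<Long, String> entry = ring.ceilingEntry(keyPosition);
        return (entry != null ? entry : ring.firstEntry()).getValue();
    }

    public static void main(String[] args) {
        RingLookup ring = new RingLookup();
        // Assumed positions, not taken from the article.
        ring.addServer(200, "Server 1");
        ring.addServer(400, "Server 3");
        ring.addServer(700, "Server 2");

        System.out.println("A -> " + ring.locate(100)); // Server 1
        System.out.println("D -> " + ring.locate(300)); // Server 3
        System.out.println("B -> " + ring.locate(500)); // Server 2
        System.out.println("C -> " + ring.locate(600)); // Server 2
        // A key past the last server wraps around to the first one.
        System.out.println("wrap -> " + ring.locate(800)); // Server 1
    }
}
```

`ceilingEntry` does the "first server at or after this position" step in O(log n), and the fallback to `firstEntry` is what closes the sorted map into a ring.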

Fault tolerance and scalability analysis

This section analyzes the fault tolerance and scalability of the consistent hashing algorithm. Assume first that Server 3 goes down.

A, B and C are not affected; only D is relocated, to Server 2. In general, when a server becomes unavailable under consistent hashing, the only data affected is the data that lies between that server and the previous server on the ring (that is, the first server encountered when walking counterclockwise). All other data is unaffected.
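A minimal sketch of the failure case, assuming made-up ring positions that reproduce the assignment in the example (the article does not give numeric positions). It compares the owner of each object before and after Server 3 is removed; the class and method names are hypothetical, and both rings are assumed to be non-empty.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class ServerRemovalDemo {
    /** First server clockwise from pos, wrapping to the start of the ring. */
    static String locate(TreeMap<Long, String> ring, long pos) {
        Map.Entry<Long, String> entry = ring.ceilingEntry(pos);
        return (entry != null ? entry : ring.firstEntry()).getValue();
    }

    /** Names of the objects whose owning server differs between the two rings. */
    static List<String> relocated(TreeMap<Long, String> before,
                                  TreeMap<Long, String> after,
                                  Map<String, Long> objects) {
        List<String> moved = new ArrayList<>();
        for (Map.Entry<String, Long> object : objects.entrySet()) {
            long pos = object.getValue();
            if (!locate(before, pos).equals(locate(after, pos))) {
                moved.add(object.getKey());
            }
        }
        return moved;
    }

    public static void main(String[] args) {
        TreeMap<Long, String> before = new TreeMap<>();
        before.put(200L, "Server 1"); // assumed positions
        before.put(400L, "Server 3");
        before.put(700L, "Server 2");

        Map<String, Long> objects = new LinkedHashMap<>();
        objects.put("A", 100L);
        objects.put("B", 500L);
        objects.put("C", 600L);
        objects.put("D", 300L);

        TreeMap<Long, String> after = new TreeMap<>(before);
        after.remove(400L); // Server 3 goes down

        System.out.println(relocated(before, after, objects)); // prints [D]
        System.out.println(locate(after, 300L));               // prints Server 2
    }
}
```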

Now consider the other case, in which a new server, Memcached Server 4, is added to the system.

A, C and D are unaffected; only B needs to be relocated, to the new Server 4. In general, when a server is added under consistent hashing, the only data affected is the data that lies between the new server and the previous server on the ring (that is, the first server encountered when walking counterclockwise). All other data is unaffected.
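The affected arc can also be computed directly, without re-locating every key. The sketch below assumes illustrative ring positions (not from the article) and hypothetical helper names; it finds the counterclockwise predecessor of the new server and tests whether a key falls on the arc between the two. The ring is assumed to already hold at least one server.

```java
import java.util.TreeMap;

final class ServerAdditionDemo {
    /** Position of the first server counterclockwise of p, wrapping to the last server on the ring. */
    static long predecessor(TreeMap<Long, String> ring, long p) {
        Long lower = ring.lowerKey(p);
        return lower != null ? lower : ring.lastKey();
    }

    /** True if keyPos lies on the arc (from, to] when walking clockwise from 'from'. */
    static boolean inArc(long keyPos, long from, long to) {
        if (from < to) {
            return keyPos > from && keyPos <= to;
        }
        // The arc wraps past the top of the ring (or covers the whole ring when from == to).
        return keyPos > from || keyPos <= to;
    }

    public static void main(String[] args) {
        TreeMap<Long, String> ring = new TreeMap<>();
        ring.put(200L, "Server 1"); // assumed positions
        ring.put(400L, "Server 3");
        ring.put(700L, "Server 2");

        long newPos = 550L; // assumed position of Memcached Server 4
        long prev = predecessor(ring, newPos);

        System.out.println(prev);                       // prints 400
        System.out.println(inArc(500L, prev, newPos));  // B: prints true
        System.out.println(inArc(600L, prev, newPos));  // C: prints false
    }
}
```

Only keys for which `inArc` is true need to move to the new server; in this example that is B alone.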

To sum up, when nodes are added or removed, the consistent hashing algorithm only needs to relocate a small part of the data on the ring, which gives it good fault tolerance and scalability.

Virtual nodes

When there are too few service nodes, the consistent hashing algorithm is prone to data skew, because the nodes divide the ring unevenly. For example, suppose the system has only two servers, and they land on the ring so that the arc leading to Server 1 is far longer than the arc leading to Server 2.

This inevitably concentrates a large amount of data on Server 1, while only a very small amount lands on Server 2. To solve this data skew problem, the consistent hashing algorithm introduces virtual nodes: several hashes are computed for each service node, and a copy of the service node, called a virtual node, is placed at each resulting position. This can be done by appending a number to the server IP address or hostname. For example, if we decide on three virtual nodes per server, we compute the hash values of "Memcached Server 1#1", "Memcached Server 1#2", "Memcached Server 1#3", "Memcached Server 2#1", "Memcached Server 2#2" and "Memcached Server 2#3", which yields six virtual nodes.

The data location algorithm itself does not change; there is just one extra step that maps the virtual node to the actual node. For example, data located at any of the three virtual nodes "Memcached Server 1#1", "Memcached Server 1#2" and "Memcached Server 1#3" is stored on Server 1. This solves the data skew problem that arises when there are few service nodes. In practice, the number of virtual nodes is usually set to 32 or greater, so even a handful of service nodes can achieve a relatively uniform data distribution.
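A rough sketch of the virtual node mechanism, using the "name#i" naming from the example. CRC32 is an assumed hash function (the article does not name one), and the class `VirtualNodeRing`, its methods and the "key-N" test keys are all hypothetical. Each ring entry stores the real server name as its value, so the virtual-to-actual mapping is just a map lookup.

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.TreeMap;
import java.util.zip.CRC32;

final class VirtualNodeRing {
    /** Ring position of a virtual node -> name of the real server behind it. */
    private final TreeMap<Long, String> ring = new TreeMap<>();
    private final int replicas;

    VirtualNodeRing(int replicas) {
        this.replicas = replicas;
    }

    /** Assumed hash function: CRC32 of the UTF-8 bytes, in [0, 2^32 - 1]. */
    static long hash(String s) {
        CRC32 crc = new CRC32();
        crc.update(s.getBytes(StandardCharsets.UTF_8));
        return crc.getValue();
    }

    /** Places "server#1" .. "server#replicas" on the ring, all pointing at the real server. */
    void addServer(String server) {
        for (int i = 1; i <= replicas; i++) {
            ring.put(hash(server + "#" + i), server);
        }
    }

    void removeServer(String server) {
        for (int i = 1; i <= replicas; i++) {
            ring.remove(hash(server + "#" + i));
        }
    }

    /** Same clockwise walk as before; the stored value is already the real server. */
    String locate(String key) {
        if (ring.isEmpty()) {
            return null;
        }
        Map.Entry<Long, String> entry = ring.ceilingEntry(hash(key));
        return (entry != null ? entry : ring.firstEntry()).getValue();
    }

    /** Counts how many of the keys "key-0" .. "key-(n-1)" land on each server. */
    Map<String, Integer> distribution(int keyCount) {
        Map<String, Integer> counts = new TreeMap<>();
        if (ring.isEmpty()) {
            return counts;
        }
        for (int k = 0; k < keyCount; k++) {
            counts.merge(locate("key-" + k), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        for (int replicas : new int[] {1, 3, 32}) {
            VirtualNodeRing ring = new VirtualNodeRing(replicas);
            ring.addServer("Memcached Server 1");
            ring.addServer("Memcached Server 2");
            System.out.println(replicas + " virtual node(s) per server: " + ring.distribution(10000));
        }
    }
}
```

Running `main` prints the per-server key counts for 1, 3 and 32 virtual nodes, which gives a feel for how the split changes. The exact numbers depend on the hash function, so none are quoted here.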

Summary

Consistent hashing is now essentially a standard feature of distributed system components; for example, the various Memcached clients provide built-in consistent hashing support. This article is only a brief introduction to the algorithm. For more depth, see the paper "Consistent Hashing and Random Trees: Distributed Caching Protocols for Relieving Hot Spots on the World Wide Web"; a C language implementation is also provided for reference.

Implementation: https://community.oracle.com/blogs/tomwhite/2007/11/27/consistent-hashing and http://www.cnblogs.com/xrq730/p/5186728.html. In essence, the lookup is implemented with a sorted map backed by a search tree (a TreeMap).

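import java.util.Collection;
import java.util.SortedMap;
import java.util.TreeMap;

// HashFunction is not defined in this excerpt. This minimal interface is an
// assumed declaration that matches how ConsistentHash calls it.
interface HashFunction {
    int hash(Object key);
}

public class ConsistentHash<T> {

    private final HashFunction hashFunction;
    // Number of virtual nodes placed on the ring for each real node.
    private final int numberOfReplicas;
    // The ring: hash value -> node, kept sorted by hash value.
    private final SortedMap<Integer, T> circle = new TreeMap<Integer, T>();

    public ConsistentHash(HashFunction hashFunction, int numberOfReplicas,
                          Collection<T> nodes) {
        this.hashFunction = hashFunction;
        this.numberOfReplicas = numberOfReplicas;

        for (T node : nodes) {
            add(node);
        }
    }

    // Adds one ring entry per virtual node, keyed by hash(node name + index).
    public void add(T node) {
        for (int i = 0; i < numberOfReplicas; i++) {
            circle.put(hashFunction.hash(node.toString() + i), node);
        }
    }

    // Removes every virtual node that add() created for this node.
    public void remove(T node) {
        for (int i = 0; i < numberOfReplicas; i++) {
            circle.remove(hashFunction.hash(node.toString() + i));
        }
    }

    // Returns the node responsible for the key, or null if the ring is empty.
    public T get(Object key) {
        if (circle.isEmpty()) {
            return null;
        }
        int hash = hashFunction.hash(key);
        if (!circle.containsKey(hash)) {
            // Walk clockwise: take the first entry at or after the hash,
            // wrapping to the start of the ring if there is none.
            SortedMap<Integer, T> tailMap = circle.tailMap(hash);
            hash = tailMap.isEmpty() ? circle.firstKey() : tailMap.firstKey();
        }
        return circle.get(hash);
    }
}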
