An in-depth study of the consistent hash algorithm and its Java code implementation

Last Update:2017-06-01 Source: Internet

Author: User


Consistent hash algorithm

I have mentioned the consistent hash algorithm many times in earlier blog posts. The "Consistent hash algorithm" section of my post "Memcache ultra-detailed interpretation" explains in detail why a consistent hash algorithm is used and how the algorithm works.

The principle of the algorithm is restated here:

First, construct an integer ring of length 2^32 (this ring is called the consistent hash ring). Place each server node on the ring according to the hash value of its node name (distributed over [0, 2^32-1]). Then compute the hash value of the data's key (also distributed over [0, 2^32-1]) and search clockwise on the ring for the server node nearest to that hash value. This completes the key-to-server mapping lookup.

This algorithm solves the poor scalability of the ordinary remainder hash algorithm, and ensures that when servers go online or offline, as many requests as possible still hit the server they were originally routed to.
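To make the scalability problem of the remainder hash concrete, here is a minimal sketch. The class and method names are hypothetical, and plain integers stand in for already-hashed keys. It counts how many keys land on a different server when a cluster grows from 4 to 5 servers under `hash % n` routing.

```java
// Illustrative sketch, not from the original article: counts the keys that
// change servers under plain remainder (modulo) routing.
final class ModuloRemapDemo {

    /**
     * Number of keys in [0, keyCount) whose server index changes when the
     * server count goes from oldServers to newServers.
     */
    static int countRemapped(int keyCount, int oldServers, int newServers) {
        int moved = 0;
        for (int key = 0; key < keyCount; key++) {
            if (key % oldServers != key % newServers) {
                moved++;
            }
        }
        return moved;
    }

    public static void main(String[] args) {
        int moved = countRemapped(100_000, 4, 5);
        // Only keys with key % 20 in {0, 1, 2, 3} keep their server.
        System.out.println(moved + " of 100000 keys moved"); // prints "80000 of 100000 keys moved"
    }
}
```

On a consistent hash ring, by contrast, adding one server only moves the keys that fall on the arc taken over by the new node.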

Of course, nothing is perfect. The consistent hash algorithm scales better than the ordinary remainder hash algorithm, but its implementation is also more complex. This article studies how to implement a consistent hash algorithm in Java. Before starting, I first look into a few core problems of the algorithm.

Selection of data structure

The first thing to consider in a consistent hash algorithm is how to construct the integer ring of length 2^32 and place the server nodes on it according to the hash values of their node names.

So which data structure should the integer ring use to keep the runtime time complexity as low as possible? First, regarding time complexity, the common complexity classes relate to one another by the following rule of thumb:

O(1) < O(log2 n) < O(n) < O(n * log2 n) < O(n^2) < O(n^3) < O(2^n) < O(3^n) < O(n!)

Generally speaking, the first four are efficient, the middle two are passable, and the last three are poor (once n gets reasonably large, the algorithm grinds to a halt). Back to the topic of choosing the data structure: I think there are several feasible solutions.

1. Solution one: sort + List

The first approach I thought of is to compute the hash values of the node names, put them into an array, sort the array in ascending order with some sorting algorithm, and finally put the sorted data into a List. A List is used rather than an array so that nodes can be added later.

Then, for a node to be routed, we only need to find the first server node in the List whose hash value is larger than its own. For example, if the server node hash values are [0,2,4,6,8,10] and the node to be routed hashes to 7, we only need to find the first integer larger than 7, which is 8. That is the server node we finally route to.
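The lookup just described can be sketched as follows. This is only an illustration with hypothetical names; the wrap-around to the smallest hash when nothing is larger is my own addition, following the ring definition given earlier.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Illustrative sketch of "sort + List"; class and method names are hypothetical.
final class SortedListLookup {

    /**
     * Returns the first node hash strictly greater than keyHash. If none is
     * larger, wraps around to the smallest hash, because the ring is circular.
     */
    static int route(List<Integer> sortedNodeHashes, int keyHash) {
        for (int nodeHash : sortedNodeHashes) {      // linear scan: O(n)
            if (nodeHash > keyHash) {
                return nodeHash;
            }
        }
        return sortedNodeHashes.get(0);              // wrap around the ring
    }

    public static void main(String[] args) {
        List<Integer> nodes = new ArrayList<>(Arrays.asList(8, 0, 10, 4, 2, 6));
        Collections.sort(nodes);                     // up-front sorting cost
        System.out.println(route(nodes, 7));         // prints 8
        System.out.println(route(nodes, 11));        // prints 0 (wrapped around)
    }
}
```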

Leaving the sort aside, the time complexity of this solution is:

(1) In the best case the node is found on the first comparison, so the time complexity is O(1)

(2) In the worst case it is found on the last comparison, so the time complexity is O(n)

On average the cost is O(0.5n+0.5); ignoring the leading coefficient and the constant, the time complexity is O(n).

But the sort has to be counted as well. I looked up a chart on the web that lists the time complexity of various sorting algorithms:

[Chart: time complexity of various sorting algorithms]

As the chart shows, sorting algorithms are either stable with a high time complexity, or have a low time complexity but are unstable. Even merge sort, the best of them, still costs O(n * log n), which hurts performance somewhat.

2. Solution two: traverse + List

Since sorting is relatively expensive, can we do without it? Yes, and that leads to a second solution.

This solution still uses a List, but finds the node by traversal:

(1) The server nodes are not sorted; their hash values are all put directly into a List

(2) For the node to be routed, compute its hash value. Because the lookup goes "clockwise", traverse the List: for every entry larger than the hash value of the node to be routed, compute and record the difference; ignore entries that are smaller

(3) Once all the differences are computed, the entry with the smallest difference is the node to route to.

Now look at the time complexity of this algorithm:

1. In the best case only one server node has a hash value larger than that of the node to be routed, and the time complexity is O(n)+O(1)=O(n+1); ignoring the constant term, that is O(n)

2. In the worst case every server node has a hash value larger than that of the node to be routed, and the time complexity is O(n)+O(n)=O(2n); ignoring the leading coefficient, that is O(n)

So the overall time complexity is O(n). The algorithm can in fact be improved further: keep a position variable x, and whenever a new difference is smaller than the smallest difference so far, replace x with the new position; otherwise leave x unchanged. This saves one round of traversal, but the time complexity of the improved algorithm is still O(n).
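A minimal sketch of the improved single-pass variant, under the assumption that node hashes are non-negative so that -1 can signal "nothing larger found". The names are hypothetical, and instead of a position x the sketch tracks the best hash value directly.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative sketch of "traverse + List"; class and method names are hypothetical.
final class UnsortedListLookup {

    /**
     * Single pass over an unsorted list: keeps the node hash with the smallest
     * positive difference from keyHash. Returns -1 if no node hash is larger
     * than keyHash (this sketch does not wrap around the ring).
     */
    static int route(List<Integer> nodeHashes, int keyHash) {
        int best = -1;
        long bestDiff = Long.MAX_VALUE;
        for (int nodeHash : nodeHashes) {
            long diff = (long) nodeHash - keyHash;   // long avoids int overflow
            if (diff > 0 && diff < bestDiff) {       // smaller entries are ignored
                bestDiff = diff;
                best = nodeHash;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<Integer> nodes = Arrays.asList(8, 2, 10, 0, 6, 4);
        System.out.println(route(nodes, 7));         // prints 8
        System.out.println(route(nodes, 10));        // prints -1
    }
}
```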

All in all, this solution is better overall than solution one.

3. Solution three: binary search tree

Setting the List aside, another candidate data structure is the binary search tree. Readers who are not familiar with trees can take a quick look at this article, "Tree structure".

Of course we cannot simply use a plain binary search tree, because it may become unbalanced. Balanced binary search trees include the AVL tree and the red-black tree. A red-black tree is used here, for two reasons:

1. The main purpose of a red-black tree is to store ordered data, which matches the idea of the first solution, but with much higher efficiency

2. The JDK provides red-black tree implementations: TreeMap and TreeSet

In addition, taking TreeMap as an example, TreeMap provides a tailMap(K fromKey) method, which returns the part of the map whose keys are greater than or equal to fromKey without traversing the entire data structure.
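As a small illustration of tailMap, here is a sketch with hypothetical node names ("node-0" and so on) and the same toy hash values as in solution one. `ceilingKey` is shown as an equivalent way to get the same key; the wrap-around is an assumption based on the ring definition.

```java
import java.util.SortedMap;
import java.util.TreeMap;

// Illustrative sketch of a TreeMap-based ring lookup; names are hypothetical.
final class TreeMapLookup {

    /** Builds a toy ring with node hashes 0, 2, 4, 6, 8, 10. */
    static TreeMap<Integer, String> sampleRing() {
        TreeMap<Integer, String> ring = new TreeMap<>();
        for (int hash = 0; hash <= 10; hash += 2) {
            ring.put(hash, "node-" + hash);
        }
        return ring;
    }

    static String route(TreeMap<Integer, String> ring, int keyHash) {
        // tailMap(k) is a view of all entries whose key is >= k
        SortedMap<Integer, String> tail = ring.tailMap(keyHash);
        // Wrap around to the first node when no key is that large
        Integer nodeHash = tail.isEmpty() ? ring.firstKey() : tail.firstKey();
        return ring.get(nodeHash);
    }

    public static void main(String[] args) {
        TreeMap<Integer, String> ring = sampleRing();
        System.out.println(route(ring, 7));          // prints node-8
        System.out.println(route(ring, 11));         // prints node-0
        System.out.println(ring.ceilingKey(7));      // prints 8
    }
}
```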

Using a red-black tree reduces the lookup time complexity to O(log n), which is significantly more efficient than the two solutions above.

To verify this, I ran a test: in a large data set, find the first value greater than the middle value, for example the first value greater than 5000 among 10000 values (simulating the average case). Here is how the O(n) and O(log n) lookups compare in practice:

Data size	50000	100000	500000	1000000	4000000
ArrayList	1ms	1ms	4ms	4ms	5ms
LinkedList	4ms	7ms	11ms	13ms	17ms
TreeMap	0ms	0ms	0ms	0ms	0ms

Larger data sets overflowed memory, so I only tested up to 4000000 entries. As the table shows, TreeMap wins lookups outright. In fact, however much the data grows, the result is the same: because of how a red-black tree is structured, finding the smallest value greater than a given value takes only a few to a few dozen comparisons.
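A comparison of this kind could be set up roughly as below. This is a sketch under my own assumptions, not the test that produced the table: the names are hypothetical, only ArrayList and TreeMap are compared, and a single `System.nanoTime()` measurement is a rough indication at best because of JIT warm-up.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Illustrative timing sketch; class and method names are hypothetical.
final class LookupBenchmarkSketch {

    /** O(n): scans a sorted list for the first value greater than target. */
    static int linearFirstGreater(List<Integer> sorted, int target) {
        for (int value : sorted) {
            if (value > target) {
                return value;
            }
        }
        return -1;
    }

    /** O(log n): higherKey returns the smallest key strictly greater than target. */
    static int treeFirstGreater(TreeMap<Integer, Integer> tree, int target) {
        Integer key = tree.higherKey(target);
        return key == null ? -1 : key;
    }

    public static void main(String[] args) {
        int n = 500_000;
        List<Integer> list = new ArrayList<>(n);
        TreeMap<Integer, Integer> tree = new TreeMap<>();
        for (int i = 0; i < n; i++) {
            list.add(i);
            tree.put(i, i);
        }
        long t0 = System.nanoTime();
        int fromList = linearFirstGreater(list, n / 2);
        long t1 = System.nanoTime();
        int fromTree = treeFirstGreater(tree, n / 2);
        long t2 = System.nanoTime();
        System.out.println("ArrayList: " + fromList + " in " + (t1 - t0) + " ns");
        System.out.println("TreeMap:   " + fromTree + " in " + (t2 - t1) + " ns");
    }
}
```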

Of course, there are pros and cons. According to another test of mine, because the red-black tree has to be maintained, TreeMap has the worst insertion efficiency of the three data structures: insertion is 5 to 10 times slower.

Hash value recalculation

Server nodes are naturally represented by strings such as "192.168.1.1" and "192.168.1.2", and the hash value is derived from the string. Another important issue is that the hash value has to be recalculated. I discovered this problem while testing String's hashCode() method. The following test shows why the hash value should be recalculated:

/**
 * Prints the results of String's hashCode() method
 * @author May Cangjie http://www.cnblogs.com/xrq730/
 */
public class StringHashCodeTest
{
    public static void main(String[] args)
    {
        System.out.println("Hash value of 192.168.0.0:1111: " + "192.168.0.0:1111".hashCode());
        System.out.println("Hash value of 192.168.0.1:1111: " + "192.168.0.1:1111".hashCode());
        System.out.println("Hash value of 192.168.0.2:1111: " + "192.168.0.2:1111".hashCode());
        System.out.println("Hash value of 192.168.0.3:1111: " + "192.168.0.3:1111".hashCode());
        System.out.println("Hash value of 192.168.0.4:1111: " + "192.168.0.4:1111".hashCode());
    }
}

This is a big problem. Within the interval [0, 2^32-1], the five hashCode values are spread over only a tiny range. How tiny? The five strings differ only in one character, so adjacent hash values differ by 31^5 = 28629151 and all five fall within a span of 4 * 28629151 = 114516604, while [0, 2^32-1] contains 4,294,967,296 numbers. As a result, about 97% of the nodes to be routed end up on the server node "192.168.0.0", which is simply awful!

There is another flaw: the interval is defined as non-negative, yet String's hashCode() method can produce negative numbers (try "192.168.1.0:1111" if you don't believe it). This problem is easy to solve, though: taking the absolute value is one solution.
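One edge case is worth knowing when taking the absolute value: `Math.abs(Integer.MIN_VALUE)` is still negative, because +2^31 does not fit into an int. The sketch below shows an alternative that clears the sign bit instead. The helper name is hypothetical, and note that masking maps negative inputs to different values than `Math.abs` does, so the two are not interchangeable once hashes have been stored.

```java
// Illustrative sketch; the helper name is hypothetical and this is an
// alternative to Math.abs, not the approach used in the article's code.
final class NonNegativeHash {

    /** Clears the sign bit, so the result is always in [0, 2^31 - 1]. */
    static int toNonNegative(int hash) {
        return hash & 0x7fffffff;
    }

    public static void main(String[] args) {
        System.out.println(Math.abs(Integer.MIN_VALUE));       // prints -2147483648
        System.out.println(toNonNegative(Integer.MIN_VALUE));  // prints 0
        System.out.println(toNonNegative("192.168.1.0:1111".hashCode()));
    }
}
```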

In conclusion, String's overridden hashCode() method has no practical value in a consistent hash algorithm, and the hash value has to be recalculated with another algorithm. There are many such algorithms, for example CRC32_HASH, FNV1_32_HASH and KETAMA_HASH. KETAMA_HASH is the consistent hash algorithm recommended by default for Memcache, but other hash algorithms work as well; FNV1_32_HASH, for example, is more efficient to compute.
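As one example of such an alternative, the JDK ships a CRC-32 implementation in `java.util.zip.CRC32`. The sketch below (hypothetical class and method names, and UTF-8 chosen as the encoding by assumption) returns the checksum as a long, which covers exactly the interval [0, 2^32-1].

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

// Illustrative sketch of a CRC-32 based node hash; names are hypothetical.
final class Crc32HashSketch {

    /** CRC-32 of the UTF-8 bytes of s, as a value in [0, 2^32 - 1]. */
    static long crc32(String s) {
        CRC32 crc = new CRC32();
        crc.update(s.getBytes(StandardCharsets.UTF_8));
        return crc.getValue();
    }

    public static void main(String[] args) {
        // "123456789" is the standard CRC-32 check input
        System.out.println(Long.toHexString(crc32("123456789"))); // prints cbf43926
        System.out.println(crc32("192.168.0.0:111"));
    }
}
```

Because the result is a long, a ring built on it would need long keys, for example a SortedMap<Long, String>.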

Consistent hash algorithm implementation, version 1: without virtual nodes

The consistent hash algorithm improves the scalability of the system, but it may also lead to uneven load distribution. The remedy is to use virtual nodes in place of real nodes. The first version of the code keeps things simple and does without virtual nodes.

Here is the Java implementation of the consistent hash algorithm without virtual nodes:

import java.util.SortedMap;
import java.util.TreeMap;

/**
 * Consistent hash algorithm without virtual nodes
 * @author May Cangjie http://www.cnblogs.com/xrq730/
 */
public class ConsistentHashingWithoutVirtualNode
{
    /**
     * List of servers to be added to the hash ring
     */
    private static String[] servers = {"192.168.0.0:111", "192.168.0.1:111", "192.168.0.2:111",
            "192.168.0.3:111", "192.168.0.4:111"};

    /**
     * key is the hash value of a server, value is the name of the server
     */
    private static SortedMap<Integer, String> sortedMap = new TreeMap<Integer, String>();

    /**
     * Program initialization: put all servers into sortedMap
     */
    static
    {
        for (int i = 0; i < servers.length; i++)
        {
            int hash = getHash(servers[i]);
            System.out.println("[" + servers[i] + "] joins the set, its hash value is " + hash);
            sortedMap.put(hash, servers[i]);
        }
        System.out.println();
    }

    /**
     * Computes the hash value of a server with the FNV1_32_HASH algorithm.
     * hashCode() is not overridden here; the end result is the same.
     */
    private static int getHash(String str)
    {
        final int p = 16777619;
        int hash = (int)2166136261L;
        for (int i = 0; i < str.length(); i++)
            hash = (hash ^ str.charAt(i)) * p;
        hash += hash << 13;
        hash ^= hash >> 7;
        hash += hash << 3;
        hash ^= hash >> 17;
        hash += hash << 5;

        // If the computed value is negative, take its absolute value
        if (hash < 0)
            hash = Math.abs(hash);
        return hash;
    }

    /**
     * Gets the server that a node should be routed to
     */
    private static String getServer(String node)
    {
        // Hash value of the node to be routed
        int hash = getHash(node);
        // All entries whose key is greater than or equal to that hash value
        SortedMap<Integer, String> subMap = sortedMap.tailMap(hash);
        // The first key is the nearest node clockwise; if there is none,
        // wrap around to the first node on the ring
        Integer i = subMap.isEmpty() ? sortedMap.firstKey() : subMap.firstKey();
        // Return the corresponding server name
        return sortedMap.get(i);
    }

    public static void main(String[] args)
    {
        String[] nodes = {"127.0.0.1:1111", "221.226.0.1:2222", "10.211.0.1:3333"};
        for (int i = 0; i < nodes.length; i++)
            System.out.println("[" + nodes[i] + "] has hash value " + getHash(nodes[i])
                    + ", routed to node [" + getServer(nodes[i]) + "]");
    }
}

Run it and look at the results:

[192.168.0.0:111] joins the set, its hash value is 575774686
[192.168.0.1:111] joins the set, its hash value is 8518713
[192.168.0.2:111] joins the set, its hash value is 1361847097
[192.168.0.3:111] joins the set, its hash value is 1171828661
[192.168.0.4:111] joins the set, its hash value is 1764547046

[127.0.0.1:1111] has hash value 380278925, routed to node [192.168.0.0:111]
[221.226.0.1:2222] has hash value 1493545632, routed to node [192.168.0.4:111]
[10.211.0.1:3333] has hash value 1393836017, routed to node [192.168.0.4:111]

The hash values recalculated with the FNV1_32_HASH algorithm are spread much better than those from String's hashCode() method. The results are also correct: each of the three nodes is routed to the server nearest to its hash value in the clockwise direction.

Using virtual nodes to improve the consistent hash algorithm

The consistent hash algorithm above solves the poor scalability that many distributed environments suffer from, but it brings another problem: uneven load.

For example, suppose the hash ring holds three server nodes A, B and C, and each of them receives 100 requests. Now a node D is added between A and B. Some of the requests that used to be routed to B are now routed to D, so A and C each receive more requests than B and D, and the balance among the original three server nodes is broken. To some extent this defeats the purpose of load balancing, which is to spread all requests evenly across the target servers.
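The A, B, C, D example can be played through on a toy ring. The ring size of 300 slots and the node positions (A at 100, B at 200, C at 300, D at 150) are assumptions chosen only to reproduce the 100-requests-each starting point; the class and method names are hypothetical.

```java
import java.util.Map;
import java.util.TreeMap;

// Illustrative sketch of the load shift caused by adding one node.
final class LoadImbalanceDemo {

    /** Routes every key in [1, ringSize] clockwise and counts requests per node. */
    static Map<String, Integer> countLoad(TreeMap<Integer, String> ring, int ringSize) {
        Map<String, Integer> load = new TreeMap<>();
        for (int key = 1; key <= ringSize; key++) {
            Map.Entry<Integer, String> owner = ring.ceilingEntry(key);
            if (owner == null) {
                owner = ring.firstEntry();           // wrap around the ring
            }
            load.merge(owner.getValue(), 1, Integer::sum);
        }
        return load;
    }

    public static void main(String[] args) {
        TreeMap<Integer, String> ring = new TreeMap<>();
        ring.put(100, "A");
        ring.put(200, "B");
        ring.put(300, "C");
        System.out.println(countLoad(ring, 300));    // prints {A=100, B=100, C=100}

        ring.put(150, "D");                          // D joins between A and B
        System.out.println(countLoad(ring, 300));    // prints {A=100, B=50, C=100, D=50}
    }
}
```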

The solution to this problem is to introduce virtual nodes: each physical node is split into multiple virtual nodes, and the virtual nodes of one physical node are spread as evenly as possible over the hash ring. This effectively solves the load imbalance caused by adding or removing nodes.

As for how many virtual nodes each physical node should be split into, first look at a chart:

The horizontal axis is the number of virtual nodes needed per server, and the vertical axis is the number of physical servers. As the chart shows, with few physical servers more virtual nodes are needed, and with many physical servers fewer virtual nodes suffice. For example, with 10 physical servers, roughly 100 to 200 virtual nodes per server are needed to achieve real load balance.
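To get a feel for the effect of the virtual node count, one could measure how keys spread over the servers for different settings. The following is only a rough sketch: the server names, the key names ("key-0", "key-1", ...) and the class name are made up, the hash is the FNV1_32_HASH variant used in this article's code, and no particular output is claimed.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Illustrative sketch for measuring load spread; names are hypothetical.
final class VirtualNodeSpreadSketch {

    /** FNV1_32_HASH variant as used in this article's code. */
    static int hash(String str) {
        final int p = 16777619;
        int h = (int) 2166136261L;
        for (int i = 0; i < str.length(); i++) {
            h = (h ^ str.charAt(i)) * p;
        }
        h += h << 13;
        h ^= h >> 7;
        h += h << 3;
        h ^= h >> 17;
        h += h << 5;
        return h < 0 ? Math.abs(h) : h;
    }

    /** Returns how many of the keys each server receives. */
    static Map<String, Integer> spread(int servers, int virtualNodes, int keys) {
        TreeMap<Integer, String> ring = new TreeMap<>();
        for (int s = 0; s < servers; s++) {
            String real = "192.168.0." + s + ":111";
            for (int v = 0; v < virtualNodes; v++) {
                ring.put(hash(real + "&&VN" + v), real);
            }
        }
        Map<String, Integer> load = new HashMap<>();
        for (int k = 0; k < keys; k++) {
            Map.Entry<Integer, String> owner = ring.ceilingEntry(hash("key-" + k));
            if (owner == null) {
                owner = ring.firstEntry();           // wrap around the ring
            }
            load.merge(owner.getValue(), 1, Integer::sum);
        }
        return load;
    }

    public static void main(String[] args) {
        System.out.println("1 virtual node each:    " + spread(10, 1, 100_000));
        System.out.println("150 virtual nodes each: " + spread(10, 150, 100_000));
    }
}
```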

Consistent hash algorithm implementation, version 2: with virtual nodes

With the theory behind virtual nodes understood, we can start writing code. Some issues to consider when programming are:

1. How does one real node map to multiple virtual nodes?

2. How is a virtual node mapped back to its real node?

Both problems have many solutions. I use a simple approach here: append a suffix to each real node's name to form the virtual node names, then take the hash value. For example, "192.168.0.0:111" becomes "192.168.0.0:111&&VN0" through "192.168.0.0:111&&VN4", where VN stands for Virtual Node. To restore the real node, simply cut the string off at the position of "&&".
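The naming scheme and its reversal fit in two small helpers. The class and method names below are hypothetical, and the sketch assumes that real node names never contain "&&" themselves.

```java
// Illustrative sketch of the suffix scheme; names are hypothetical.
final class VirtualNodeNames {

    static final String SEPARATOR = "&&VN";

    /** Builds the name of the index-th virtual node of a real node. */
    static String virtualName(String realNode, int index) {
        return realNode + SEPARATOR + index;
    }

    /** Restores the real node name by cutting the string at "&&". */
    static String realName(String virtualNode) {
        return virtualNode.substring(0, virtualNode.indexOf("&&"));
    }

    public static void main(String[] args) {
        String vn = virtualName("192.168.0.0:111", 3);
        System.out.println(vn);            // prints 192.168.0.0:111&&VN3
        System.out.println(realName(vn));  // prints 192.168.0.0:111
    }
}
```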

Here is the Java implementation of the consistent hash algorithm with virtual nodes:

import java.util.LinkedList;
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;

/**
 * Consistent hash algorithm with virtual nodes
 * @author May Cangjie http://www.cnblogs.com/xrq730/
 */
public class ConsistentHashingWithVirtualNode
{
    /**
     * List of servers to be added to the hash ring
     */
    private static String[] servers = {"192.168.0.0:111", "192.168.0.1:111", "192.168.0.2:111",
            "192.168.0.3:111", "192.168.0.4:111"};

    /**
     * List of real nodes. Servers go online and offline, so additions and removals
     * are fairly frequent; a LinkedList is the better fit here
     */
    private static List<String> realNodes = new LinkedList<String>();

    /**
     * Virtual nodes: key is the hash value of a virtual node, value is its name
     */
    private static SortedMap<Integer, String> virtualNodes = new TreeMap<Integer, String>();

    /**
     * Number of virtual nodes per real node, hard-coded here for demonstration:
     * one real node corresponds to 5 virtual nodes
     */
    private static final int VIRTUAL_NODES = 5;

    static
    {
        // First add the original servers to the list of real nodes
        for (int i = 0; i < servers.length; i++)
            realNodes.add(servers[i]);

        // Then add the virtual nodes; a foreach loop is the more efficient way to traverse a LinkedList
        for (String str : realNodes)
        {
            for (int i = 0; i < VIRTUAL_NODES; i++)
            {
                String virtualNodeName = str + "&&VN" + String.valueOf(i);
                int hash = getHash(virtualNodeName);
                System.out.println("Virtual node [" + virtualNodeName + "] is added, hash value is " + hash);
                virtualNodes.put(hash, virtualNodeName);
            }
        }
        System.out.println();
    }

    /**
     * Computes the hash value of a server with the FNV1_32_HASH algorithm.
     * hashCode() is not overridden here; the end result is the same.
     */
    private static int getHash(String str)
    {
        final int p = 16777619;
        int hash = (int)2166136261L;
        for (int i = 0; i < str.length(); i++)
            hash = (hash ^ str.charAt(i)) * p;
        hash += hash << 13;
        hash ^= hash >> 7;
        hash += hash << 3;
        hash ^= hash >> 17;
        hash += hash << 5;

        // If the computed value is negative, take its absolute value
        if (hash < 0)
            hash = Math.abs(hash);
        return hash;
    }

    /**
     * Gets the server that a node should be routed to
     */
    private static String getServer(String node)
    {
        // Hash value of the node to be routed
        int hash = getHash(node);
        // All entries whose key is greater than or equal to that hash value
        SortedMap<Integer, String> subMap = virtualNodes.tailMap(hash);
        // The first key is the nearest virtual node clockwise; if there is none,
        // wrap around to the first virtual node on the ring
        Integer i = subMap.isEmpty() ? virtualNodes.firstKey() : subMap.firstKey();
        // Take the virtual node name and cut off the suffix to get the real node
        String virtualNode = virtualNodes.get(i);
        return virtualNode.substring(0, virtualNode.indexOf("&&"));
    }

    public static void main(String[] args)
    {
        String[] nodes = {"127.0.0.1:1111", "221.226.0.1:2222", "10.211.0.1:3333"};
        for (int i = 0; i < nodes.length; i++)
            System.out.println("[" + nodes[i] + "] has hash value " + getHash(nodes[i])
                    + ", routed to node [" + getServer(nodes[i]) + "]");
    }
}

Look at the results:

Virtual node [192.168.0.0:111&&VN0] is added, hash value is 1686427075
Virtual node [192.168.0.0:111&&VN1] is added, hash value is 354859081
Virtual node [192.168.0.0:111&&VN2] is added, hash value is 1306497370
Virtual node [192.168.0.0:111&&VN3] is added, hash value is 817889914
Virtual node [192.168.0.0:111&&VN4] is added, hash value is 396663629
Virtual node [192.168.0.1:111&&VN0] is added, hash value is 1032739288
Virtual node [192.168.0.1:111&&VN1] is added, hash value is 707592309
Virtual node [192.168.0.1:111&&VN2] is added, hash value is 302114528
Virtual node [192.168.0.1:111&&VN3] is added, hash value is 36526861
Virtual node [192.168.0.1:111&&VN4] is added, hash value is 848442551
Virtual node [192.168.0.2:111&&VN0] is added, hash value is 1452694222
Virtual node [192.168.0.2:111&&VN1] is added, hash value is 2023612840
Virtual node [192.168.0.2:111&&VN2] is added, hash value is 697907480
Virtual node [192.168.0.2:111&&VN3] is added, hash value is 790847074
Virtual node [192.168.0.2:111&&VN4] is added, hash value is 2010506136
Virtual node [192.168.0.3:111&&VN0] is added, hash value is 891084251
Virtual node [192.168.0.3:111&&VN1] is added, hash value is 1725031739
Virtual node [192.168.0.3:111&&VN2] is added, hash value is 1127720370
Virtual node [192.168.0.3:111&&VN3] is added, hash value is 676720500
Virtual node [192.168.0.3:111&&VN4] is added, hash value is 2050578780
Virtual node [192.168.0.4:111&&VN0] is added, hash value is 586921010
Virtual node [192.168.0.4:111&&VN1] is added, hash value is 184078390
Virtual node [192.168.0.4:111&&VN2] is added, hash value is 1331645117
Virtual node [192.168.0.4:111&&VN3] is added, hash value is 918790803
Virtual node [192.168.0.4:111&&VN4] is added, hash value is 1232193678

[127.0.0.1:1111] has hash value 380278925, routed to node [192.168.0.0:111]
[221.226.0.1:2222] has hash value 1493545632, routed to node [192.168.0.0:111]
[10.211.0.1:3333] has hash value 1393836017, routed to node [192.168.0.2:111]

The results show that each node is routed to the server node nearest to its hash value in the clockwise direction, so everything works.

With virtual nodes, a real node is no longer pinned to a single point on the hash ring but is spread widely across the whole ring, so even when servers go online or offline the overall load does not become unbalanced.
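Servers going online and offline can be sketched as an instance-based ring with add and remove operations. This is my own illustration rather than part of the code above: the class and method names are hypothetical, and unlike version 2 it stores the real server name as the map value, so no string truncation is needed on lookup.

```java
import java.util.Map;
import java.util.TreeMap;

// Illustrative sketch of a ring that supports adding and removing servers.
final class DynamicRingSketch {

    private final TreeMap<Integer, String> ring = new TreeMap<>();
    private final int virtualNodes;

    DynamicRingSketch(int virtualNodes) {
        this.virtualNodes = virtualNodes;
    }

    /** FNV1_32_HASH variant as used in this article's code. */
    static int hash(String str) {
        final int p = 16777619;
        int h = (int) 2166136261L;
        for (int i = 0; i < str.length(); i++) {
            h = (h ^ str.charAt(i)) * p;
        }
        h += h << 13;
        h ^= h >> 7;
        h += h << 3;
        h ^= h >> 17;
        h += h << 5;
        return h < 0 ? Math.abs(h) : h;
    }

    /** Server goes online: place its virtual nodes on the ring. */
    void addServer(String server) {
        for (int i = 0; i < virtualNodes; i++) {
            // putIfAbsent keeps an existing entry if two hashes collide
            ring.putIfAbsent(hash(server + "&&VN" + i), server);
        }
    }

    /** Server goes offline: remove only the entries owned by this server. */
    void removeServer(String server) {
        for (int i = 0; i < virtualNodes; i++) {
            ring.remove(hash(server + "&&VN" + i), server);
        }
    }

    /** Routes a key clockwise; assumes at least one server is on the ring. */
    String route(String key) {
        Map.Entry<Integer, String> owner = ring.ceilingEntry(hash(key));
        if (owner == null) {
            owner = ring.firstEntry();               // wrap around the ring
        }
        return owner.getValue();
    }

    public static void main(String[] args) {
        DynamicRingSketch ring = new DynamicRingSketch(100);
        for (int i = 0; i < 5; i++) {
            ring.addServer("192.168.0." + i + ":111");
        }
        System.out.println(ring.route("127.0.0.1:1111"));
        ring.removeServer("192.168.0.2:111");
        System.out.println(ring.route("127.0.0.1:1111"));
    }
}
```

When a server is removed, only the keys it owned move to other servers; every other key keeps its previous route, which is the scalability property discussed at the start of the article.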

Postscript

While writing this article I was learning much of the material as I went, so there are bound to be places that are poorly written or poorly understood. The code as a whole is also fairly rough and does not consider every situation that may arise. On the one hand, I hope readers will point out any mistakes; on the other hand, I will keep studying and keep improving the code above.


This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
