Understanding the consistent hashing algorithm


The consistent hashing algorithm was proposed in 1997 by Karger et al. at MIT as a solution for distributed caching. It was designed to address hot-spot problems on the Internet, with an original intent very similar to CARP. Consistent hashing corrects the problems caused by the simple hashing algorithm used by CARP, so that DHTs can actually be applied in peer-to-peer environments.

Properties of the consistent hashing algorithm

To work well in a distributed system, a consistent hashing algorithm needs to satisfy the following conditions.

1. Balance

Balance means that the results of the hash are distributed evenly across all nodes, so that every node is fully utilized.

2. Monotonicity

Monotonicity means that if some content has already been assigned to the appropriate buffers by hashing, and new buffers are then added to the system, the hash result should ensure that previously assigned content is mapped either to its original buffer or to one of the new buffers, and is never remapped to a different buffer in the old buffer set. (In other words, when the set of buffers changes, consistent hashing tries to protect already-assigned content from being remapped unnecessarily; see the sketch after this list.)

A change in the hash results means that whenever the buffer set changes, all mappings within the system have to be updated. In a peer-to-peer system, a change in buffers corresponds to a peer joining or leaving the system, which happens frequently, so recomputing everything would create a heavy computational and transmission load. Monotonicity is the requirement that the hashing algorithm be able to cope with this situation.

3. Dispersion

In a distributed environment, a terminal may not see all of the buffers, but only a subset of them. When a terminal maps content to a buffer by hashing, different terminals may see different ranges of buffers and therefore produce different hash results, so the same content ends up mapped to different buffers. This is obviously something to avoid, because storing the same content in multiple buffers reduces the storage efficiency of the system. Dispersion is defined as the severity of this situation. A good hashing algorithm should avoid such inconsistencies as far as possible, that is, minimize dispersion.

4. Load

The load problem is really the dispersion problem viewed from another angle. Since different terminals may map the same content to different buffers, a particular buffer may also end up holding different content for different users. Like dispersion, this situation should be avoided, so a good hashing algorithm should minimize the load placed on each buffer.
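As an illustration of monotonicity (a minimal sketch of my own, not from the original article; it jumps ahead to the ring implementation described below, and the class name, node names and hash positions are invented): adding a server to the ring only remaps the keys whose hashes fall between the new server and its predecessor, while every other key keeps its original server.

import java.util.SortedMap;
import java.util.TreeMap;

// A minimal sketch (not from the original article) illustrating monotonicity:
// adding a node to the hash ring only remaps keys whose hash falls between the
// new node and its predecessor; every other key keeps its old node.
// MonotonicityDemo, nodeFor and the numeric positions are invented for this sketch.
public class MonotonicityDemo {

    // Find the first node clockwise from the given hash (wrapping around the ring).
    static String nodeFor(TreeMap<Integer, String> ring, int hash) {
        SortedMap<Integer, String> tail = ring.tailMap(hash);
        return tail.isEmpty() ? ring.firstEntry().getValue() : tail.get(tail.firstKey());
    }

    public static void main(String[] args) {
        TreeMap<Integer, String> ring = new TreeMap<>();
        ring.put(100, "node-A");
        ring.put(500, "node-B");
        ring.put(900, "node-C");

        int[] keyHashes = {50, 120, 480, 520, 880, 950};

        // Record each key's node before the change.
        for (int h : keyHashes) {
            System.out.println("before: key hash " + h + " -> " + nodeFor(ring, h));
        }

        // Add a new node at position 700: only keys with hash in (500, 700]
        // move (from node-C to node-D); all other keys are untouched.
        ring.put(700, "node-D");
        for (int h : keyHashes) {
            System.out.println("after:  key hash " + h + " -> " + nodeFor(ring, h));
        }
    }
}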

Implementation of the consistent hashing algorithm

1. Use of a ring-shaped hash space

The data is mapped onto a ring after being processed by a hash algorithm, and the servers are mapped onto the same ring by the same hash algorithm (drawing a diagram is too much trouble, so none is included).

2. Routing data to a node

A piece of data is cached on the nearest server found by moving clockwise around the ring from the data's position.

When nodes are added or removed, the consistent hashing algorithm maintains monotonicity while keeping data migration to a minimum. Such an algorithm is well suited to distributed clusters, because it avoids large-scale data migration and reduces the pressure on the servers. A sample implementation:

package com.test;

import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.SortedMap;
import java.util.TreeMap;

/**
 * Created by Lin on 2017/10/17.
 */
public class ConsistentHash {

    // Server list
    private static String[] listServers = {"192.168.0.0", "192.168.0.1", "192.168.0.2",
            "192.168.0.3", "192.168.0.4"};

    // Key is the server's hash value, value is the server address
    private static SortedMap<Integer, String> sortedMapServers = new TreeMap<>();

    public static void main(String[] args) {
        init();
        // Add or delete nodes manually to see how the result changes
        String[] keys = {"test1", "test2", "test3", "test4"};
        for (int i = 0; i < keys.length; i++) {
            System.out.println("[" + keys[i] + "] hash value: " + getHash(keys[i])
                    + ", is routed to node [" + getServer(keys[i]) + "]");
        }
    }

    // Program initialization: add all servers to sortedMapServers
    public static void init() {
        for (int i = 0; i < listServers.length; i++) {
            int hash = getHash(listServers[i]);
            System.out.println("[" + listServers[i] + "] joins the collection with hash value " + hash);
            sortedMapServers.put(hash, listServers[i]);
        }
        System.out.println("------------------------------------------------------------------");
    }

    // Compute the hash value of a key
    public static int getHash(String key) {
        MessageDigest md5;
        try {
            md5 = MessageDigest.getInstance("MD5");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("No MD5 algorithm found");
        }
        md5.reset();
        md5.update(key.getBytes());
        byte[] bKey = md5.digest();
        // Hash function detail: mask each byte with 0xFF, then shift and combine four bytes
        int result = ((bKey[3] & 0xFF) << 24) | ((bKey[2] & 0xFF) << 16)
                | ((bKey[1] & 0xFF) << 8) | (bKey[0] & 0xFF);
        return Math.abs(result);
    }

    // Get the node that a key is routed to
    public static String getServer(String str) {
        int hash = getHash(str);
        // Get all nodes whose hash is greater than or equal to this hash
        SortedMap<Integer, String> subMap = sortedMapServers.tailMap(hash);
        if (!subMap.isEmpty()) {
            // The first key of that sub-map is the nearest node clockwise
            return sortedMapServers.get(subMap.firstKey());
        } else {
            // No node has a larger hash, so wrap around to the first node on the ring
            return sortedMapServers.get(sortedMapServers.firstKey());
        }
    }
}
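The listing above mentions adding and deleting nodes manually but does not show the operations. As a rough sketch (my own, not part of the original code), changing the ring only means updating sortedMapServers, for example with hypothetical helper methods like these inside the ConsistentHash class:

    // Hypothetical helpers (not in the original listing) that could be added to
    // the ConsistentHash class to change the ring at runtime; only the sorted map
    // is touched, which is why data migration stays small.
    public static void addServer(String server) {
        int hash = getHash(server);
        sortedMapServers.put(hash, server);
        System.out.println("[" + server + "] added with hash value " + hash);
    }

    public static void removeServer(String server) {
        int hash = getHash(server);
        sortedMapServers.remove(hash);
        System.out.println("[" + server + "] removed (hash value " + hash + ")");
    }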

Note that when a node is removed, the next node clockwise has to absorb all of the removed node's traffic, so the load is no longer balanced; in that sense the code above does not satisfy the balance property of the consistent hashing algorithm.

Introducing virtual nodes to restore the balance of the consistent hashing algorithm (this part is copied from others, which is certainly better than what I could write; the code is actually similar to the above).

This introduces the concept of a "virtual node". Without virtual nodes, each memcached node appears on the ring exactly once, ordered from small to large, and if one node fails, only the next node takes over its work. Now imagine each server being given many "virtual nodes", whose hashes are likewise scattered large and small: server 1 computes a number of virtual nodes, some of which hash higher than server 2's virtual nodes and some lower. If every server computes its virtual nodes and all of them are sorted from small to large into the cluster ring, what happens? Each server's virtual nodes end up spread evenly around the ring, and the node to the right of one of server 1's virtual nodes is no longer necessarily server 2; it may belong to server 3, 4, 5 or any other server.

Suppose there are 3 memcached servers (call them A, B and C), each with 10 virtual nodes. Their arrangement on the ring could then look like this (the order is essentially arbitrary): C-5, C-0, C-8, B-3, C-4, C-1, C-6, C-2, C-9, B-4, A-5, A-3, A-6, C-3, B-8, A-2, A-1, A-9, A-4, A-7, B-0, B-7, B-2, A-8, B-9, B-5, C-7, A-0, B-1, B-6.

With virtual nodes, the operation when a key arrives is unchanged: the "clockwise / to the right" rule still picks a node, except that it is now a virtual node, so we then look up which real server that virtual node belongs to before adding or getting the key. The benefit: with virtual nodes, when one server fails, the cache miss rate is still 1/(n-1), but the failed server's work is distributed evenly over the other servers (because the node to the right of each of its virtual nodes may be a virtual node of any other server).
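Since the article does not show the virtual-node code, here is a minimal sketch of my own (the class name ConsistentHashWithVirtualNodes, the replica count of 10 and the "#VN" suffix are all invented assumptions) of how the listing above could be extended with virtual nodes: each physical server is hashed several times under a replica name, and a lookup resolves the virtual node back to its physical server.

import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.SortedMap;
import java.util.TreeMap;

// A minimal sketch (not from the original article) of consistent hashing with
// virtual nodes. Each physical server is placed on the ring VIRTUAL_NODES times
// under names like "192.168.0.1#VN3"; the map value stores the physical server,
// so a lookup on a virtual node resolves directly to its real server.
public class ConsistentHashWithVirtualNodes {

    private static final String[] SERVERS = {"192.168.0.0", "192.168.0.1", "192.168.0.2"};
    private static final int VIRTUAL_NODES = 10; // virtual nodes per physical server (arbitrary)
    private static final SortedMap<Integer, String> RING = new TreeMap<>();

    public static void main(String[] args) {
        // Place every virtual node on the ring, remembering its physical server.
        for (String server : SERVERS) {
            for (int i = 0; i < VIRTUAL_NODES; i++) {
                RING.put(getHash(server + "#VN" + i), server);
            }
        }
        String[] keys = {"test1", "test2", "test3", "test4"};
        for (String key : keys) {
            System.out.println("[" + key + "] is routed to physical node [" + getServer(key) + "]");
        }
    }

    // Same lookup rule as before: first virtual node clockwise, wrapping around.
    public static String getServer(String key) {
        int hash = getHash(key);
        SortedMap<Integer, String> tail = RING.tailMap(hash);
        int nodeHash = tail.isEmpty() ? RING.firstKey() : tail.firstKey();
        return RING.get(nodeHash);
    }

    // Same MD5-based hash as the listing above.
    public static int getHash(String key) {
        MessageDigest md5;
        try {
            md5 = MessageDigest.getInstance("MD5");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("No MD5 algorithm found");
        }
        byte[] b = md5.digest(key.getBytes());
        int result = ((b[3] & 0xFF) << 24) | ((b[2] & 0xFF) << 16)
                | ((b[1] & 0xFF) << 8) | (b[0] & 0xFF);
        return Math.abs(result);
    }
}

With many replicas per server, removing any one server spreads its keys over many small arcs of the ring instead of dumping them all on a single successor.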
