Consistent hashing algorithm

Source: Internet
Author: User

If there is consistent hashing, there must also be inconsistent hashing, so why does nobody ever talk about inconsistent hashing? Because ordinary hashing is inconsistent by default and needs no qualifier; only consistent hashing is unusual enough to earn the extra adjective.

Ordinary hashing usually takes a large number modulo the bucket count and scatters the results across the buckets. Suppose we have only two buckets and the four numbers 2, 3, 4, and 5. Taking each number modulo 2 puts 2 and 4 in bucket 0, and 3 and 5 in bucket 1.


Now suppose two buckets feel like too few, and we grow the hash table by adding a third. Every number must be taken modulo 3 again to decide its bucket, and the result becomes: 3 in bucket 0, 4 in bucket 1, and 2 and 5 in bucket 2.


As you can see, every number landed in a different bucket once the new bucket was added. In other words, every expansion or contraction of the hash table forces the placement of all items to be recomputed, which is unacceptable in some scenarios. Take a distributed storage system in which each bucket is a machine and the hashing algorithm decides which machine stores a file. To add a machine, the system would have to stop and wait for every file to be redistributed before it could serve requests again. When a machine goes offline, only part of the data is lost, yet the access routes for all data break. The service as a whole cannot scale smoothly; it becomes a stateful service.
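The reshuffling described above can be checked with a few lines of Python. This is only a minimal sketch of plain modulo hashing using the article's numbers; the function names `bucket_of` and `moved_keys` are made up for illustration.

```python
def bucket_of(key: int, num_buckets: int) -> int:
    """Plain modulo hashing: the bucket index is the key modulo the bucket count."""
    return key % num_buckets


def moved_keys(keys, old_count, new_count):
    """Return the keys whose bucket changes when the bucket count changes."""
    return [k for k in keys if bucket_of(k, old_count) != bucket_of(k, new_count)]


keys = [2, 3, 4, 5]
before = {k: bucket_of(k, 2) for k in keys}
after = {k: bucket_of(k, 3) for k in keys}
print(before)                   # {2: 0, 3: 1, 4: 0, 5: 1}
print(after)                    # {2: 2, 3: 0, 4: 1, 5: 2}
print(moved_keys(keys, 2, 3))   # [2, 3, 4, 5]
```

All four keys change buckets, which is exactly the full redistribution that a storage system cannot afford.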

To make the service stateless, we need consistent hashing. Consistent hashing pretends that we have many buckets. Start with a small target such as 7, even though at first only two buckets really exist, numbered 3 and 6. The hash is still a modulo, here modulo 7, but the resulting slot may now be a bucket that does not exist; in that case we walk forward, toward higher numbers, to the first real bucket and put the item there. So 2 and 3 go to bucket 3, and 4 and 5 go to bucket 6.
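The lookup rule in this paragraph can be sketched as follows. It assumes 7 slots and real buckets at 3 and 6, as in the text; `find_bucket` is a hypothetical helper name, and returning `None` when no bucket lies at or after the slot is a choice of this sketch, not something the article specifies.

```python
SLOTS = 7  # size of the imagined bucket space, taken from the article's example


def find_bucket(key, buckets, slots=SLOTS):
    """Hash the key to a slot, then walk forward to the first real bucket."""
    slot = key % slots
    for candidate in sorted(buckets):
        if candidate >= slot:
            return candidate
    return None  # no real bucket at or after this slot


real_buckets = [3, 6]
placement = {key: find_bucket(key, real_buckets) for key in [2, 3, 4, 5]}
print(placement)  # {2: 3, 3: 3, 4: 6, 5: 6}
```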


Now add a new bucket, numbered 4. The modulus does not change; it is still 7. Afterwards 2 and 3 are still in bucket 3, 4 is in bucket 4, and 5 is still in bucket 6.


Bucket 3 holds the numbers whose remainder is less than or equal to 3, so it is unaffected; bucket 4 only has to take over from bucket 6 the numbers that now belong to it. Only one existing bucket's data has to be redistributed. You can imagine that even with 100 million buckets, adding or removing one bucket affects the data distribution of only one other bucket.
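To see that only one bucket gives up data, the placements before and after adding bucket 4 can be compared. This is a sketch under the article's assumptions (7 slots, buckets 3 and 6, keys 2 to 5); `place` is a hypothetical helper, and it assumes a real bucket always exists at or after every slot, which holds here because bucket 6 is the last slot.

```python
def place(keys, buckets, slots=7):
    """Map each key to the first real bucket at or after its slot (key % slots)."""
    ordered = sorted(buckets)
    placement = {}
    for key in keys:
        slot = key % slots
        # Assumes some bucket >= slot exists (true while bucket 6 is present).
        placement[key] = next(b for b in ordered if b >= slot)
    return placement


keys = [2, 3, 4, 5]
before = place(keys, [3, 6])
after = place(keys, [3, 4, 6])
moved = {k: (before[k], after[k]) for k in keys if before[k] != after[k]}
print(before)  # {2: 3, 3: 3, 4: 6, 5: 6}
print(after)   # {2: 3, 3: 3, 4: 4, 5: 6}
print(moved)   # {4: (6, 4)}
```

Only key 4 moves, and it moves from bucket 6 to the new bucket 4; bucket 3 is never touched.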

A newly added machine therefore only needs to synchronize data from the machine after it before it can start working, and a machine that is about to go offline only needs to hand its data to the machine after it first. If a machine drops out suddenly, only the data on that machine is affected. An implementation can have every machine also keep a synchronized copy of the data of the machine in front of it, so that even a sudden failure does not interrupt service for that part of the data.
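One way to picture the replication idea is to give every key two holders: its primary bucket and the next bucket after it. The sketch below is an illustration only. The `owners` function and the `copies` parameter are hypothetical, and wrapping from the last bucket back to the lowest-numbered one is an assumption of this sketch.

```python
def owners(key, buckets, slots=7, copies=2):
    """Return the primary bucket for a key followed by the buckets holding its replicas."""
    ordered = sorted(buckets)
    slot = key % slots
    # Index of the first bucket at or after the slot; fall back to index 0 (assumed wrap).
    first = next((i for i, b in enumerate(ordered) if b >= slot), 0)
    count = min(copies, len(ordered))
    return [ordered[(first + i) % len(ordered)] for i in range(count)]


print(owners(4, [3, 4, 6]))  # [4, 6]  primary is 4, replica lives on 6
print(owners(4, [3, 6]))     # [6, 3]  after 4 fails, 6 is primary and already has the data
```

Because the bucket that takes over a failed machine's keys is the same bucket that held the replica, the keys stay readable while the failure is repaired.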

There is one small problem. If bucket 6 goes offline, no bucket comes after it, so where should its data go? To solve this, implementations usually bend the hash space into a ring, so that bucket 3 is the next bucket after 6 and the data simply goes to 3.
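The wrap-around can be sketched by falling back to the lowest-numbered bucket when the walk runs off the end. As before, this assumes 7 slots and the article's bucket numbers; `ring_lookup` is a hypothetical name.

```python
def ring_lookup(key, buckets, slots=7):
    """Find the first real bucket at or after the key's slot, wrapping around the ring."""
    if not buckets:
        raise ValueError("the ring has no buckets")
    slot = key % slots
    ordered = sorted(buckets)
    for bucket in ordered:
        if bucket >= slot:
            return bucket
    return ordered[0]  # walked past the last bucket: wrap to the start of the ring


print(ring_lookup(5, [3, 6]))  # 6
print(ring_lookup(5, [3]))     # 3  bucket 6 is offline, so the key wraps around to 3
```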


Consistent hashing can also make some distributed systems lock-free. Each task has its own number, and because the hashing algorithm is deterministic, the bucket that handles each task is fixed as well. Nothing is contended, so no distributed lock is needed.
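As a rough illustration of the lock-free idea, every worker can run the same deterministic function and simply skip tasks that belong to someone else. The worker numbers, task numbers, and the helper names `owner` and `tasks_for` are all hypothetical, and the sketch assumes every worker sees the same list of workers.

```python
def owner(task_id, workers, slots=7):
    """Deterministically pick the worker for a task: first worker at or after the slot."""
    slot = task_id % slots
    ordered = sorted(workers)
    return next((w for w in ordered if w >= slot), ordered[0])


def tasks_for(worker, task_ids, workers):
    """Each worker computes its own share locally, without asking anyone for a lock."""
    return [t for t in task_ids if owner(t, workers) == worker]


workers = [3, 6]
task_ids = list(range(10, 17))
print(tasks_for(3, task_ids, workers))  # [10, 14, 15, 16]
print(tasks_for(6, task_ids, workers))  # [11, 12, 13]
```

Every task appears in exactly one worker's list, so two workers never compete for the same task.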

Since consistent hashing has so many good properties, why is mainstream hashing still inconsistent? The main reason is lookup efficiency. With ordinary hashing, one hash computation finds the bucket, so a lookup takes O(1) time. With consistent hashing, the buckets have to be kept in order as a linked list and a lookup walks down that list, so with k buckets a lookup takes O(k) time. That is why everyday hash tables are still implemented inconsistently.

Of course, O(k) is intolerable for hashing: if lookups cost O(k), what is the point of hashing at all? Since the lookup runs over ordered buckets, the natural idea is binary search, which brings the time down to O(log k). But buckets are constantly added and removed, so they are kept in a linked list, and a linked list cannot be binary searched. Fortunately, a skip list (jump table) allows fast jumps and also achieves O(log k) lookups.
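The binary-search idea is easy to try with a sorted array instead of a linked list. Note that this is a simpler stand-in, not the skip list the text goes on to use: lookups are O(log k) through Python's standard `bisect` module, but inserting into or deleting from a Python list still costs O(k), which is the trade-off the paragraph points at. The class name `SortedRing` is made up for this sketch.

```python
import bisect


class SortedRing:
    """Consistent-hash ring backed by a sorted list of bucket numbers."""

    def __init__(self, slots):
        self.slots = slots
        self.buckets = []  # always kept in ascending order

    def add(self, bucket):
        index = bisect.bisect_left(self.buckets, bucket)
        if index == len(self.buckets) or self.buckets[index] != bucket:
            self.buckets.insert(index, bucket)  # O(k): later elements shift right

    def remove(self, bucket):
        self.buckets.remove(bucket)  # O(k) as well

    def lookup(self, key):
        if not self.buckets:
            raise LookupError("the ring has no buckets")
        # Binary search for the first bucket >= slot; wrap to index 0 past the end.
        index = bisect.bisect_left(self.buckets, key % self.slots)
        return self.buckets[index % len(self.buckets)]


ring = SortedRing(slots=7)
for bucket in (6, 3):
    ring.add(bucket)
print([ring.lookup(k) for k in (2, 3, 4, 5)])  # [3, 3, 6, 6]
```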


In this skip list, each bucket records the buckets at distances 1, 2, and 4 from its own number. Wherever a query lands, each hop can skip at least half of the remaining query space on the hash ring, so repeating the step quickly locates the bucket that holds the data.
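The jump rule can be sketched in the style of a Chord finger table. Several things here are assumptions rather than facts from the article: the ring has 8 slots (so that the distances 1, 2, and 4 cover it), the live buckets are 1, 3, and 6, and all helper names are hypothetical. Each bucket stores, for every distance, the live bucket responsible for the slot that far ahead; a lookup repeatedly jumps to the farthest stored bucket that does not overshoot the target.

```python
SIZE = 8  # assumed ring size, so the jump distances are 1, 2, 4


def clockwise(a, b, size=SIZE):
    """Clockwise distance from slot a to slot b on the ring."""
    return (b - a) % size


def successor(slot, nodes, size=SIZE):
    """The live bucket at or after the slot, going clockwise."""
    return min(nodes, key=lambda n: clockwise(slot, n, size))


def build_fingers(nodes, size=SIZE):
    """For each bucket, the buckets responsible for the slots 1, 2, 4, ... ahead of it."""
    jumps = []
    step = 1
    while step < size:
        jumps.append(step)
        step *= 2
    return {n: [successor((n + j) % size, nodes, size) for j in jumps] for n in nodes}


def lookup(start, key, fingers, size=SIZE):
    """Route a query from bucket `start`; return the buckets visited, owner last."""
    target = key % size
    here, path = start, [start]
    while True:
        gap = clockwise(here, target, size)
        if gap == 0:
            return path  # this bucket's own slot: it is the owner
        nxt = fingers[here][0]  # the immediate next bucket on the ring
        span = clockwise(here, nxt, size) or size  # full circle if alone on the ring
        if gap <= span:
            if nxt != here:
                path.append(nxt)
            return path
        # Jump to the farthest recorded bucket that lies strictly before the target.
        candidates = [f for f in fingers[here] if 0 < clockwise(here, f, size) < gap]
        here = max(candidates, key=lambda f: clockwise(path[-1], f, size))
        path.append(here)


nodes = [1, 3, 6]
fingers = build_fingers(nodes)
print(fingers)                # {1: [3, 3, 6], 3: [6, 6, 1], 6: [1, 1, 3]}
print(lookup(1, 5, fingers))  # [1, 3, 6]
print(lookup(6, 2, fingers))  # [6, 1, 3]
```

No bucket needs a full list of all other buckets; each one only knows a handful of neighbors at doubling distances.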

This is, of course, just one implementation of consistent hashing, and there are many variants. Take the choice of bucket: the description above picks the first bucket found by walking forward, but you could instead pick the nearest bucket, in which case the implementation and the skip list rules that go with it change as well. The skip list itself also has several different algorithms. If you are interested, look at the four DHT implementations CAN, Chord, Tapestry, and Pastry. Interestingly, all of them come from papers published in 2001, so 2001 was probably year one of P2P downloading.
