Consistent Hashing [translation]: Consistent Hashing by Tom White

Last Update:2018-12-07 Source: Internet

Author: User


http://ptsolmyr.com/2010/07/30/consistent_hash_by_tom_white

Recommended Java example: http://sandaobusi.iteye.com/blog/964368

Recommended C++ implementation: http://martinbroadhurst.com/Consistent-Hash-Ring.html

http://www.yeeach.com/2009/10/02/consistent-hashing%E7%AE%97%E6%B3%95/

The author of the article is Tom White. Neither my English nor my Chinese is very good, so please feel free to point out anything that reads badly.

Link: http://www.lexemetech.com/2007/11/consistent-hashing.html

------------------

I have recently been studying consistent hashing. The paper that introduced it (Consistent Hashing and Random Trees: Distributed Caching Protocols for Relieving Hot Spots on the World Wide Web, by David Karger et al.) appeared ten years ago, yet only recently have more and more services quietly started to use it. These services include Amazon's Dynamo and memcached (a salute to Last.fm). So what is consistent hashing, and why should you care about it?

The need for consistent hashing comes from limitations encountered when running a cluster of caches (such as web caches). If you have a cluster of n cache machines, the most common way to balance load is to place an incoming object o on the machine numbered hash(o) mod n. This scheme looks fine until, for whatever reason, you have to add or remove cache machines. Once the number of machines n changes, almost every object is hashed to a different machine. This is a disaster, because the origin servers that actually hold the content are swamped with requests from the cache cluster, and the whole system behaves as if it had no cache at all. That is why we care about consistent hashing: it keeps the origin servers from being swamped.
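
To make the scale of the problem concrete, here is a minimal Java sketch that counts how many keys change machine under the hash(o) mod n scheme when a cluster grows from 10 to 11 machines. The class and method names are illustrative, and the integer keys 0 to 9,999 stand in for object hash values; neither comes from the original article.

```java
import java.util.stream.IntStream;

class ModuloRemapDemo {
    // Machine index for a key under the naive hash(o) mod n scheme.
    // floorMod keeps the result non-negative even for negative hashes.
    static int serverFor(int keyHash, int n) {
        return Math.floorMod(keyHash, n);
    }

    // Count how many of the keys 0..keys-1 land on a different machine
    // when the cluster size changes from oldN to newN.
    static long movedKeys(int keys, int oldN, int newN) {
        return IntStream.range(0, keys)
                .filter(k -> serverFor(k, oldN) != serverFor(k, newN))
                .count();
    }

    public static void main(String[] args) {
        long moved = movedKeys(10_000, 10, 11);
        System.out.println(moved + " of 10000 keys moved");
    }
}
```

With these assumed keys, 9,090 of the 10,000 objects (about 91%) land on a different machine after a single machine is added.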

Ideally, when a cache machine is added to the cluster, it takes only its fair share of objects from the other cache machines, and when a cache machine is removed, the objects it held are shared out among the remaining machines without moving any other data. This is exactly what consistent hashing aims for and achieves: whenever possible, the same set of objects is always assigned to the same machine.

The basic idea behind the consistent hashing algorithm is to hash both objects and cache machines with the same hash function. The advantage is that each cache machine is mapped to an interval that contains the hash values of some number of objects. If a cache machine is removed, its interval is taken over by an adjacent cache machine, and all other machines are unaffected.

Description

Let's look at consistent hashing in more detail. The hash function maps objects and caches to a range of values. Java programmers should be familiar with this: the hashCode method of every object returns an int in the range [-2^31, 2^31-1]. We join the two ends of this range to form a ring. Figure 1 shows a set of objects (1, 2, 3, 4) and a set of caches (A, B, C) mapped onto the hash ring. (Image source: Web Caching with Consistent Hashing by David Karger et al.)

Figure 1

To determine which cache an object belongs to, we move clockwise from the object's position until we meet a cache point. In Figure 1, objects 1 and 4 belong to cache A, object 2 belongs to cache B, and object 3 belongs to cache C. What happens when cache C is removed? Object 3 now belongs to cache A, and no other object has to move. If, as in Figure 2, a cache D is then added to the cluster, D takes objects 3 and 4 and leaves only object 1 to A.
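
The clockwise walk can be sketched in Java with a sorted map. The ring positions below are hypothetical numbers chosen only so that the layout reproduces the assignments described for the figures; the class and method names are likewise illustrative.

```java
import java.util.Map;
import java.util.TreeMap;

class RingLookupDemo {
    // Walk clockwise from `position` to the first cache at or after it,
    // wrapping to the smallest position when the end of the ring is reached.
    static String lookup(TreeMap<Integer, String> ring, int position) {
        Map.Entry<Integer, String> e = ring.ceilingEntry(position);
        return (e != null ? e : ring.firstEntry()).getValue();
    }

    // Hypothetical positions for caches A, B and C.
    static TreeMap<Integer, String> initialRing() {
        TreeMap<Integer, String> ring = new TreeMap<>();
        ring.put(10, "A");
        ring.put(30, "B");
        ring.put(50, "C");
        return ring;
    }

    // Cache assignment for objects 1..4 at hypothetical positions 80, 20, 40, 60.
    static String assignments(TreeMap<Integer, String> ring) {
        int[] objectPositions = {80, 20, 40, 60};
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < objectPositions.length; i++) {
            if (i > 0) sb.append(' ');
            sb.append(i + 1).append("->").append(lookup(ring, objectPositions[i]));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        TreeMap<Integer, String> ring = initialRing();
        System.out.println(assignments(ring)); // 1->A 2->B 3->C 4->A
        ring.remove(50);                        // cache C leaves
        System.out.println(assignments(ring)); // 1->A 2->B 3->A 4->A
        ring.put(70, "D");                      // cache D joins
        System.out.println(assignments(ring)); // 1->A 2->B 3->D 4->D
    }
}
```

In each step only the objects in the affected interval change owner; objects 1 and 2 never move.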

Figure 2

This works well, except that the size of the interval assigned to each cache is essentially random, so the distribution of objects across caches can be very uneven. To solve this, we introduce the idea of "virtual nodes": each cache has several replicas (points) on the hash ring, so whenever we add a cache, we add several points for it on the ring.

I wrote code to run a simulation that stores 10,000 objects in 10 caches. The plot below shows the effect of virtual nodes. The x-axis is the number of replicas per cache (on a logarithmic scale), and the y-axis is the standard deviation of the number of objects per cache, as a percentage of the mean. When x is small, the distribution of objects across caches is unbalanced. As the number of replicas per cache increases, the distribution becomes more balanced. The experiment shows that with enough replicas per cache the distribution is acceptably balanced (a standard deviation of roughly 5% to 10%).
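
The author's simulation code is not included in the article. The following is a rough Java sketch of such an experiment, under stated assumptions: node and object names are made-up strings, and the hash is String.hashCode() passed through the MurmurHash3 32-bit finalizer as a stand-in mixing function. All names here are hypothetical, and the numbers it prints will not match the original plot.

```java
import java.util.Map;
import java.util.TreeMap;

class VirtualNodeSimulation {
    // Stand-in mixing hash (an assumption, not the article's function):
    // String.hashCode() followed by the MurmurHash3 32-bit finalizer.
    static int mix(String s) {
        int h = s.hashCode();
        h ^= h >>> 16;
        h *= 0x85ebca6b;
        h ^= h >>> 13;
        h *= 0xc2b2ae35;
        h ^= h >>> 16;
        return h;
    }

    // Place `objects` keys on a ring of `caches` caches, each with
    // `replicas` points, and return how many objects each cache receives.
    static int[] distribute(int caches, int replicas, int objects) {
        TreeMap<Integer, Integer> ring = new TreeMap<>();
        for (int c = 0; c < caches; c++) {
            for (int r = 0; r < replicas; r++) {
                ring.put(mix("cache-" + c + "-" + r), c);
            }
        }
        int[] counts = new int[caches];
        for (int o = 0; o < objects; o++) {
            Map.Entry<Integer, Integer> e = ring.ceilingEntry(mix("object-" + o));
            counts[(e != null ? e : ring.firstEntry()).getValue()]++;
        }
        return counts;
    }

    // Standard deviation of the counts as a percentage of their mean.
    static double stdDevPercent(int[] counts) {
        double mean = 0;
        for (int c : counts) mean += c;
        mean /= counts.length;
        double variance = 0;
        for (int c : counts) variance += (c - mean) * (c - mean);
        return 100.0 * Math.sqrt(variance / counts.length) / mean;
    }

    public static void main(String[] args) {
        for (int replicas : new int[] {1, 10, 100, 1000}) {
            System.out.printf("%4d replicas: %.1f%%%n", replicas,
                    stdDevPercent(distribute(10, replicas, 10_000)));
        }
    }
}
```

Running it for a range of replica counts should show the standard deviation shrinking as replicas are added, which is the trend the plot describes.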

Experiment result

Implementation

The following is a simple implementation in Java. For consistent hashing to be effective, it is important to use a hash function that mixes well. Most implementations of Object's hashCode in Java do not mix well, so we introduce a HashFunction interface that allows a custom hash function to be plugged in. MD5 is recommended.

import java.util.Collection;
import java.util.SortedMap;
import java.util.TreeMap;

// HashFunction is the pluggable hash interface described above;
// it is expected to expose a method: int hash(Object key).
public class ConsistentHash<T> {

    private final HashFunction hashFunction;
    private final int numberOfReplicas;
    // The hash circle: ring position -> node.
    private final SortedMap<Integer, T> circle =
        new TreeMap<Integer, T>();

    public ConsistentHash(HashFunction hashFunction,
        int numberOfReplicas, Collection<T> nodes) {

        this.hashFunction = hashFunction;
        this.numberOfReplicas = numberOfReplicas;

        for (T node : nodes) {
            add(node);
        }
    }

    // Put numberOfReplicas points for this node on the circle.
    public void add(T node) {
        for (int i = 0; i < numberOfReplicas; i++) {
            circle.put(hashFunction.hash(node.toString() + i),
                node);
        }
    }

    // Remove every point that add() created for this node.
    public void remove(T node) {
        for (int i = 0; i < numberOfReplicas; i++) {
            circle.remove(hashFunction.hash(node.toString() + i));
        }
    }

    // Return the node responsible for key, or null if the circle is empty.
    public T get(Object key) {
        if (circle.isEmpty()) {
            return null;
        }
        int hash = hashFunction.hash(key);
        if (!circle.containsKey(hash)) {
            // Next point clockwise; wrap to the first key if none follows.
            SortedMap<Integer, T> tailMap =
                circle.tailMap(hash);
            hash = tailMap.isEmpty() ?
                circle.firstKey() : tailMap.firstKey();
        }
        return circle.get(hash);
    }

}

The code above uses a sorted map of integers to represent the hash circle. When a ConsistentHash object is created, each node is added to the circle map several times (controlled by numberOfReplicas). The position of each replica is the hash of the node's name plus a numeric suffix.
To find the node for an object (the get method), we look up the object's hash value in the map. In most cases no node sits exactly at that hash value (even with several replicas per node, the hash value space is much larger than the number of nodes), so we use tailMap to find the next key in the map. If the tail map is empty, we wrap around and take the first key in the circle.
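
The listing refers to a HashFunction type without defining it. Here is a minimal sketch of what it might look like, together with an MD5-based implementation in line with the recommendation above. The interface shape is inferred from how the listing calls it, and the class name Md5HashFunction and the choice to keep only the first four bytes of the digest are assumptions made for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Assumed shape of the interface used by the listing above.
interface HashFunction {
    int hash(Object key);
}

// Hypothetical MD5-based implementation: hashes the key's string form
// and keeps the first four bytes of the digest as a big-endian int.
class Md5HashFunction implements HashFunction {
    @Override
    public int hash(Object key) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            byte[] digest = md5.digest(
                    String.valueOf(key).getBytes(StandardCharsets.UTF_8));
            return ByteBuffer.wrap(digest).getInt();
        } catch (NoSuchAlgorithmException e) {
            // MD5 is a standard algorithm on the Java platform.
            throw new IllegalStateException("MD5 not available", e);
        }
    }
}
```

An instance could then be passed as the first constructor argument of the listing's class, for example with a list of server names as the nodes. Truncating the digest to 32 bits is a simplification that matches the int-keyed map in the listing.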

Use

So how should you use consistent hashing? In most cases you can use a library rather than write the code yourself. For example, memcached, the distributed memory cache system mentioned above, already has clients that support consistent hashing. Ketama, by Richard Jones of Last.fm, was the first, and there is now a Java implementation contributed by Dustin Sallings. Interestingly, only the client needs to implement the consistent hashing algorithm; the server code does not need to change. Other systems that use consistent hashing include Chord, a distributed hash table implementation, and Amazon's Dynamo, a key-value store (not open source).

