Consistent hashing [translation]: "Consistent Hashing" by Tom White

http://ptsolmyr.com/2010/07/30/consistent_hash_by_tom_white

Recommended Java example: http://sandaobusi.iteye.com/blog/964368

Recommended C++ implementation: http://martinbroadhurst.com/Consistent-Hash-Ring.html

http://www.yeeach.com/2009/10/02/consistent-hashing%E7%AE%97%E6%B3%95/

The original article was written by Tom White. My English and my Chinese are both poor, so please feel free to point out anything I have translated badly.

Link: http://www.lexemetech.com/2007/11/consistent-hashing.html

------------------

I recently studied consistent hashing. The paper that introduced it (Consistent Hashing and Random Trees: Distributed Caching Protocols for Relieving Hot Spots on the World Wide Web, by David Karger et al.) appeared ten years ago, yet only recently have more and more services quietly started to use it. These services include Amazon's Dynamo and memcached (a salute to Last.fm). So what is consistent hashing, and why should you care about it?

The need for consistent hashing comes from limitations encountered when running a cluster of caches (web caches, for example). If you have a cluster of n cache machines, the most common way to balance load is to place an incoming object o on the machine numbered hash(o) mod n. This scheme looks fine until the day you have to add or remove cache machines for one reason or another. At that point n changes, and almost every object hashes to a different machine. This can be a disaster, because the origin servers that actually hold the content are swamped by requests from the cache cluster. The system behaves as if it suddenly had no cache at all. This is why we care about consistent hashing: we need it to avoid overwhelming the servers.
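To make the cost of hash(o) mod n concrete, here is a minimal sketch that counts how many of 10,000 keys land on a different server when a cluster grows from 10 to 11 machines. The class and method names are hypothetical, and sequential integers stand in for object hashes.

```java
// Hypothetical demo: how many keys change servers under hash(o) mod n
// when the cluster size changes. Integer keys stand in for object hashes.
class ModuloRemapDemo {

  // Counts keys in [0, keyCount) whose server index differs between
  // a cluster of oldSize machines and a cluster of newSize machines.
  static int countMoved(int keyCount, int oldSize, int newSize) {
    int moved = 0;
    for (int key = 0; key < keyCount; key++) {
      if (key % oldSize != key % newSize) {
        moved++;
      }
    }
    return moved;
  }

  public static void main(String[] args) {
    int moved = countMoved(10_000, 10, 11);
    System.out.println(moved + " of 10000 keys moved to a different server");
  }
}
```

With these inputs, 9,090 of the 10,000 keys change servers. Only keys whose value modulo 110 is below 10 keep the same index under both cluster sizes.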

Ideally, when a cache machine is added to the cluster, it takes only its fair share of objects from the other cache machines, and when a cache machine is removed, its objects are shared among the remaining machines without any other data moving. This is exactly what consistent hashing aims for and achieves: as far as possible, the same objects are always assigned to the same machine.

The basic idea behind the consistent hashing algorithm is to hash both objects and cache machines with the same hash function. The benefit is that each cache machine is mapped to an interval that contains the hash values of some number of objects. If a cache machine is removed, its interval is taken over by a cache with an adjacent interval, and all other caches are unaffected.

Description

Let's look at consistent hashing in more detail. The hash function maps objects and caches onto a range of values. Java programmers will be familiar with this: the hashCode method of every object returns an int, which lies in the range [-2^31, 2^31 - 1]. We join the two ends of this range to form a ring. Figure 1 shows a set of objects (1, 2, 3, 4) and a set of caches (A, B, C) mapped onto the hash ring. (Image source: Web Caching with Consistent Hashing, by David Karger et al.)

Figure 1

To determine which cache an object belongs to, we move clockwise from the object until we encounter a cache point. In Figure 1, objects 1 and 4 belong to cache A, object 2 belongs to cache B, and object 3 belongs to cache C. What happens when cache C is removed? Object 3 now belongs to cache A, and no other object needs to move. If, as in Figure 2, a cache D is then added to the cluster, D takes objects 3 and 4 and leaves only object 1 with A.

Figure 2
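The clockwise walk described above can be sketched with a sorted map. This is only an illustration: the class name is hypothetical, and the ring positions are made-up integers (larger means further clockwise) chosen to reproduce the assignments in the figures, not real hash values.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical demo mirroring the figures: ring positions are made-up
// integers (larger = further clockwise), not real hash values.
class RingWalkDemo {

  // Returns the first cache at or clockwise after the given position,
  // wrapping around to the first cache on the ring if none follows.
  // The ring must not be empty.
  static String lookup(TreeMap<Integer, String> ring, int position) {
    Map.Entry<Integer, String> entry = ring.ceilingEntry(position);
    return (entry != null ? entry : ring.firstEntry()).getValue();
  }

  // Ring from Figure 1: caches A, B and C at assumed positions.
  static TreeMap<Integer, String> figureOneRing() {
    TreeMap<Integer, String> ring = new TreeMap<>();
    ring.put(20, "B");
    ring.put(40, "C");
    ring.put(70, "A");
    return ring;
  }

  public static void main(String[] args) {
    // Assumed positions of objects 1..4 (index 0 is unused).
    int[] objectPosition = {0, 60, 10, 30, 50};
    TreeMap<Integer, String> ring = figureOneRing();
    for (int o = 1; o <= 4; o++) {
      System.out.println("object " + o + " -> " + lookup(ring, objectPosition[o]));
    }
    ring.remove(40);    // cache C leaves: only object 3 moves, to A
    System.out.println("after removing C, object 3 -> " + lookup(ring, 30));
    ring.put(55, "D");  // cache D joins between object 4 and object 1
    System.out.println("after adding D, object 4 -> " + lookup(ring, 50));
  }
}
```

With these positions, removing C moves only object 3, and adding D afterwards moves only objects 3 and 4, which matches the behaviour described for the two figures.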

This works well, except that the size of the interval assigned to each cache is essentially random, so objects can be distributed very unevenly. To solve this problem we introduce the idea of "virtual nodes": each cache has multiple replicas on the hash ring, so every time we add a cache we add several points for it on the ring.

The plot below shows the effect of virtual nodes in a simulation, run with the code described below, that stores 10,000 objects in 10 caches. The x-axis is the number of replicas per cache (on a logarithmic scale). When this number is small, the distribution of objects across caches is unbalanced (the y-axis shows the standard deviation of the number of objects per cache as a percentage of the mean). As the number of replicas per cache increases, the distribution of objects becomes more balanced. This experiment shows that one or two hundred replicas per cache give an acceptable balance (a standard deviation of roughly 5% to 10% of the mean).

Experiment result
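An experiment of this kind could be reproduced roughly as follows. This is a hypothetical sketch, not the author's original simulation code: the class name, the key naming scheme ("cache-3#7", "object-42") and the choice of the first four MD5 digest bytes as the hash value are all assumptions, so the exact percentages it prints will differ from the plot.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical simulation, not the author's original experiment code.
class ReplicaBalanceDemo {

  // Assumed hash: the first four bytes of the MD5 digest as an int.
  static int md5Hash(String s) {
    try {
      byte[] d = MessageDigest.getInstance("MD5")
          .digest(s.getBytes(StandardCharsets.UTF_8));
      return ((d[0] & 0xFF) << 24) | ((d[1] & 0xFF) << 16)
          | ((d[2] & 0xFF) << 8) | (d[3] & 0xFF);
    } catch (NoSuchAlgorithmException e) {
      throw new IllegalStateException(e); // MD5 is available on every JVM
    }
  }

  // Places `caches` caches on the ring with `replicas` points each,
  // assigns `objects` objects, and returns the standard deviation of the
  // per-cache object counts as a percentage of the mean.
  static double stdDevPercent(int caches, int replicas, int objects) {
    TreeMap<Integer, Integer> ring = new TreeMap<>();
    for (int c = 0; c < caches; c++) {
      for (int r = 0; r < replicas; r++) {
        ring.put(md5Hash("cache-" + c + "#" + r), c);
      }
    }
    int[] counts = new int[caches];
    for (int o = 0; o < objects; o++) {
      Map.Entry<Integer, Integer> e = ring.ceilingEntry(md5Hash("object-" + o));
      counts[(e != null ? e : ring.firstEntry()).getValue()]++;
    }
    double mean = (double) objects / caches;
    double sumSq = 0;
    for (int count : counts) {
      sumSq += (count - mean) * (count - mean);
    }
    return 100.0 * Math.sqrt(sumSq / caches) / mean;
  }

  public static void main(String[] args) {
    for (int replicas : new int[] {1, 10, 100, 1000}) {
      System.out.printf("%4d replicas: std dev = %.1f%% of mean%n",
          replicas, stdDevPercent(10, replicas, 10_000));
    }
  }
}
```

Running main prints one line per replica count, and the printed percentage should fall as the number of replicas grows.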

Implementation

The following is a simple implementation in Java. For consistent hashing to be effective, it is important to use a hash function that mixes well. Most implementations of hashCode in Java objects do not mix well, so we provide a HashFunction interface that allows a custom hash function to be plugged in. MD5 is recommended.

import java.util.Collection;
import java.util.SortedMap;
import java.util.TreeMap;

// HashFunction is not shown in this article; it is assumed to be an
// interface declaring: int hash(Object key).
public class ConsistentHash<T> {

  private final HashFunction hashFunction;
  private final int numberOfReplicas;
  // The hash ring: maps each replica's hash value to its node.
  private final SortedMap<Integer, T> circle =
    new TreeMap<Integer, T>();

  public ConsistentHash(HashFunction hashFunction,
    int numberOfReplicas, Collection<T> nodes) {

    this.hashFunction = hashFunction;
    this.numberOfReplicas = numberOfReplicas;

    for (T node : nodes) {
      add(node);
    }
  }

  // Places numberOfReplicas points for the node on the ring.
  public void add(T node) {
    for (int i = 0; i < numberOfReplicas; i++) {
      circle.put(hashFunction.hash(node.toString() + i),
        node);
    }
  }

  // Removes every point that add() placed for the node.
  public void remove(T node) {
    for (int i = 0; i < numberOfReplicas; i++) {
      circle.remove(hashFunction.hash(node.toString() + i));
    }
  }

  // Returns the node responsible for the key, or null if the ring is empty.
  public T get(Object key) {
    if (circle.isEmpty()) {
      return null;
    }
    int hash = hashFunction.hash(key);
    if (!circle.containsKey(hash)) {
      // Take the next point clockwise, wrapping around to the first.
      SortedMap<Integer, T> tailMap =
        circle.tailMap(hash);
      hash = tailMap.isEmpty() ?
             circle.firstKey() : tailMap.firstKey();
    }
    return circle.get(hash);
  }

}
 

The code above represents the hash circle as a sorted map keyed by integer hash values. When a ConsistentHash object is created, each node is added to the circle map several times (controlled by numberOfReplicas). The position of each replica is determined by hashing the node's name plus a numeric suffix.
To find the node for an object (the get method), we look up the object's hash value in the map. In most cases no node sits exactly at that hash value (even though each node has a number of replicas, the hash value space is much larger than the number of nodes), so we use tailMap to find the next key in the map. If the tail map is empty, we wrap around and take the first key in the circle.
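The article recommends MD5 but does not show the HashFunction interface or an implementation of it. The sketch below is one possible shape, inferred from how ConsistentHash calls hashFunction.hash(...). The interface signature, the Md5HashFunction class, the wrapper class name and the decision to keep only the first four digest bytes are all assumptions.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical sketch: the article does not show HashFunction, so this
// interface shape is inferred from how ConsistentHash calls it.
class Md5HashDemo {

  interface HashFunction {
    int hash(Object key);
  }

  // Hashes key.toString() with MD5 and keeps the first four digest bytes.
  static class Md5HashFunction implements HashFunction {
    @Override
    public int hash(Object key) {
      try {
        byte[] d = MessageDigest.getInstance("MD5")
            .digest(key.toString().getBytes(StandardCharsets.UTF_8));
        return ((d[0] & 0xFF) << 24) | ((d[1] & 0xFF) << 16)
            | ((d[2] & 0xFF) << 8) | (d[3] & 0xFF);
      } catch (NoSuchAlgorithmException e) {
        throw new IllegalStateException(e); // MD5 is available on every JVM
      }
    }
  }

  public static void main(String[] args) {
    HashFunction md5 = new Md5HashFunction();
    // Ring positions of three replicas of a node named "cacheA",
    // using the same "name + suffix" scheme as the add method.
    for (int i = 0; i < 3; i++) {
      System.out.println("cacheA" + i + " -> " + md5.hash("cacheA" + i));
    }
  }
}
```

If the interface were declared at top level with this signature, an instance of such a class could be passed as the first constructor argument of ConsistentHash, together with the replica count and the collection of nodes.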

Use

So how should you use consistent hashing? In most cases you can use a library rather than writing the code yourself. For example, memcached, the distributed memory cache system mentioned above, already has clients that support consistent hashing. Ketama, implemented by Richard Jones of Last.fm, was the first, and there is now a Java implementation contributed by Dustin Sallings. Interestingly, only the client needs to implement the consistent hashing algorithm; the server code does not need to change. Other systems that use consistent hashing include Chord, a distributed hash table implementation, and Amazon's Dynamo, a key-value storage system (not open source).
