http://ptsolmyr.com/2010/07/30/consistent_hash_by_tom_white
Recommended Java example: http://sandaobusi.iteye.com/blog/964368
Recommended C++ implementation: http://martinbroadhurst.com/Consistent-Hash-Ring.html
http://www.yeeach.com/2009/10/02/consistent-hashing%E7%AE%97%E6%B3%95/
The original article is by Tom White. Neither my English nor my Chinese is very good, so please feel free to point out anything I have translated badly.
Link: http://www.lexemetech.com/2007/11/consistent-hashing.html
------------------
I have been studying consistent hashing recently. The paper that introduced it (Consistent Hashing and Random Trees: Distributed Caching Protocols for Relieving Hot Spots on the World Wide Web, by David Karger et al) appeared ten years ago, but only recently has the idea quietly found its way into more and more services, including Amazon's Dynamo and memcached (courtesy of Last.fm). So what is consistent hashing, and why should you care?
The need for consistent hashing comes from limitations encountered when running a cluster of cache machines (web caches, for example). If you have a cluster of n cache machines, the most common way to balance load is to place an incoming object o on machine number hash(o) mod n. This works well until the day you have to add or remove cache machines for whatever reason. At that point n changes, and almost every object is hashed to a different machine. This can be catastrophic, because the origin servers that actually hold the content are swamped by requests from the cache cluster. The system behaves as if it had no cache at all. This is why you should care about consistent hashing: it is needed to avoid swamping your servers.
Ideally, when a cache machine is added to the cluster, it would take over only its fair share of objects from the other cache machines, and when a cache machine is removed, its objects would be shared among the remaining machines without moving any other data. This is exactly what consistent hashing does: as far as possible, it always maps the same objects to the same machine.
The basic idea behind the consistent hashing algorithm is to hash both objects and cache machines with the same hash function. The benefit is that each cache machine is mapped to an interval that contains the hash values of some number of objects. If a cache machine is removed, its interval is taken over by the cache machine with the adjacent interval, and all other cache machines are unaffected.
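The scale of the problem can be checked with a short sketch. It is illustrative only: the class and method names are hypothetical, and the integer key itself stands in for hash(o), which is an assumption made to keep the result reproducible.

```java
class ModuloRemap {
    // Counts how many of the keys 0..numKeys-1 land on a different machine
    // when the cluster size changes from oldN to newN under "hash(o) mod n".
    // Assumption: the key itself is used as its hash value.
    static int countMoved(int numKeys, int oldN, int newN) {
        int moved = 0;
        for (int key = 0; key < numKeys; key++) {
            int before = Math.floorMod(key, oldN);
            int after = Math.floorMod(key, newN);
            if (before != after) {
                moved++;
            }
        }
        return moved;
    }

    public static void main(String[] args) {
        // Growing the cluster from 10 to 11 machines.
        int moved = countMoved(10_000, 10, 11);
        System.out.println(moved + " of 10000 keys moved"); // 9090 of 10000 keys moved
    }
}
```

Under these assumptions roughly ten out of every eleven keys change machine when a single machine is added, which is why the origin servers see what looks like a cold cache.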
Description
Let's look at consistent hashing in more detail. The hash function maps objects and caches to a range of numbers. Java programmers should be familiar with this: the hashCode method of every object returns an int, which lies in the range [-2^31, 2^31 - 1]. We join the two ends of this range to form a ring. Figure 1 shows a set of objects (1, 2, 3, 4) and a set of caches (A, B, C) at the points they hash to on the ring. (Image source: Web Caching with Consistent Hashing by David Karger et al)
Figure 1
To find the cache that an object belongs in, we move clockwise from the object until we meet a cache point. In the figure above, objects 1 and 4 belong in cache A, object 2 belongs in cache B, and object 3 belongs in cache C. What happens when cache C is removed? Object 3 now belongs in cache A, and no other object needs to move. If cache D is then added at the position shown in Figure 2, D takes objects 3 and 4 and leaves only object 1 to A.
Figure 2
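The walk around the ring can be traced in a few lines of Java. This is a minimal sketch, not the article's implementation: the ring size of 100 and every position below are hypothetical values chosen so that the result matches the figures.

```java
import java.util.TreeMap;

class RingExample {
    // Walk clockwise from `position` to the first cache point, wrapping
    // around to the start of the ring if no cache lies at or after it.
    static String cacheFor(TreeMap<Integer, String> ring, int position) {
        Integer point = ring.ceilingKey(position);
        if (point == null) {
            point = ring.firstKey();
        }
        return ring.get(point);
    }

    // Caches A, B, C at hypothetical positions on a ring of size 100.
    static TreeMap<Integer, String> figureOneRing() {
        TreeMap<Integer, String> ring = new TreeMap<>();
        ring.put(10, "A");
        ring.put(40, "B");
        ring.put(60, "C");
        return ring;
    }

    public static void main(String[] args) {
        // Hypothetical positions of objects 1..4 (index 0 is unused).
        int[] objects = {0, 95, 30, 55, 75};
        TreeMap<Integer, String> ring = figureOneRing();
        for (int i = 1; i <= 4; i++) {
            System.out.println("object " + i + " -> " + cacheFor(ring, objects[i]));
        }
        ring.remove(60);   // cache C leaves: only object 3 changes owner
        ring.put(85, "D"); // cache D joins: it takes objects 3 and 4
        for (int i = 1; i <= 4; i++) {
            System.out.println("object " + i + " -> " + cacheFor(ring, objects[i]));
        }
    }
}
```

Objects 1 and 4 sit after the last cache point, so their lookups wrap around to A, which is the "ring" part of the scheme.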
This works well, except that the size of the interval assigned to each cache is essentially random, so the distribution of objects among caches can be very uneven. To solve this problem we introduce the idea of "virtual nodes": each cache has multiple replicas on the hash ring, so whenever we add a cache, we add multiple points for it on the ring.
Using the code below, I ran a simulation that stores 10,000 objects in 10 caches. The plot below shows the effect of virtual nodes. The x-axis is the number of replicas of each cache (logarithmic scale). When that number is small, the distribution of objects across caches is unbalanced (the y-axis shows the standard deviation of the number of objects per cache as a percentage of the mean). As the number of replicas per cache increases, the distribution of objects becomes more balanced. The experiment shows that one or two hundred replicas per cache gives an acceptable balance (a standard deviation of roughly 5% to 10% of the mean).
Experiment result
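An experiment of this kind can be approximated with the sketch below. It is not the code used to produce the plot: the class, the key naming scheme ("cache-3-17", "object-42") and the choice of the first four MD5 bytes as the hash value are all assumptions, so the numbers it prints will not match the original plot exactly.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.TreeMap;

class ReplicaSimulation {
    // First four bytes of the MD5 digest, read as a big-endian int.
    static int md5Hash(String s) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            byte[] digest = md5.digest(s.getBytes(StandardCharsets.UTF_8));
            return ByteBuffer.wrap(digest).getInt();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    // Returns the number of objects that land on each cache.
    static int[] distribute(int caches, int replicas, int objects) {
        TreeMap<Integer, Integer> ring = new TreeMap<>();
        for (int c = 0; c < caches; c++) {
            for (int r = 0; r < replicas; r++) {
                ring.put(md5Hash("cache-" + c + "-" + r), c);
            }
        }
        int[] counts = new int[caches];
        for (int o = 0; o < objects; o++) {
            // Next cache point clockwise, wrapping to the first point.
            Integer point = ring.ceilingKey(md5Hash("object-" + o));
            if (point == null) {
                point = ring.firstKey();
            }
            counts[ring.get(point)]++;
        }
        return counts;
    }

    // Population standard deviation as a percentage of the mean.
    static double stdDevPercent(int[] counts) {
        double mean = 0;
        for (int c : counts) {
            mean += c;
        }
        mean /= counts.length;
        double variance = 0;
        for (int c : counts) {
            variance += (c - mean) * (c - mean);
        }
        variance /= counts.length;
        return 100.0 * Math.sqrt(variance) / mean;
    }

    public static void main(String[] args) {
        for (int replicas = 1; replicas <= 1000; replicas *= 10) {
            int[] counts = distribute(10, replicas, 10_000);
            System.out.printf("%4d replicas: %.1f%%%n", replicas, stdDevPercent(counts));
        }
    }
}
```

Running main prints one line per replica count, which can be compared against the trend in the plot.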
Implementation
The following is a simple implementation in Java. For consistent hashing to be effective, it is important to use a hash function that mixes well. Most implementations of the hashCode method on Java objects do not mix well, so we provide a HashFunction interface that allows a custom hash function to be plugged in. MD5 is recommended.
import java.util.Collection;
import java.util.SortedMap;
import java.util.TreeMap;

public class ConsistentHash<T> {

  // HashFunction is a user-supplied interface with an int hash(Object) method.
  private final HashFunction hashFunction;
  private final int numberOfReplicas;
  // The ring: hash value of each replica point -> node that owns it.
  private final SortedMap<Integer, T> circle =
    new TreeMap<Integer, T>();

  public ConsistentHash(HashFunction hashFunction,
    int numberOfReplicas, Collection<T> nodes) {

    this.hashFunction = hashFunction;
    this.numberOfReplicas = numberOfReplicas;

    for (T node : nodes) {
      add(node);
    }
  }

  // Places numberOfReplicas points for the node on the ring.
  public void add(T node) {
    for (int i = 0; i < numberOfReplicas; i++) {
      circle.put(hashFunction.hash(node.toString() + i),
        node);
    }
  }

  // Removes all of the node's points from the ring.
  public void remove(T node) {
    for (int i = 0; i < numberOfReplicas; i++) {
      circle.remove(hashFunction.hash(node.toString() + i));
    }
  }

  // Returns the node responsible for key, or null if the ring is empty.
  public T get(Object key) {
    if (circle.isEmpty()) {
      return null;
    }
    int hash = hashFunction.hash(key);
    if (!circle.containsKey(hash)) {
      // Next point clockwise, wrapping around to the first point.
      SortedMap<Integer, T> tailMap =
        circle.tailMap(hash);
      hash = tailMap.isEmpty() ?
             circle.firstKey() : tailMap.firstKey();
    }
    return circle.get(hash);
  }

}
The code above represents the hash circle as a sorted map keyed by integer hash values. When a ConsistentHash object is created, each node is added to the circle map a number of times (controlled by numberOfReplicas). The position of each replica is determined by hashing the node's name with a numeric suffix appended.
To find the node for an object (the get method), we look up the object's hash value in the map. In most cases no node is stored at exactly this hash value (even with several replicas per node, the hash value space is much larger than the number of nodes), so we use tailMap to find the next key in the map. If the tail map is empty, we wrap around and take the first key in the circle.
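The HashFunction type used by the class above is not shown in the article. The following is one possible shape for it, together with an MD5-based implementation. Both are assumptions: the method signature is inferred from the calls hashFunction.hash(...) in the class, and the name Md5HashFunction and the choice of the first four digest bytes as the hash value are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Assumed shape of the interface; inferred from how ConsistentHash calls it.
interface HashFunction {
    int hash(Object key);
}

// Hypothetical MD5-backed implementation.
class Md5HashFunction implements HashFunction {
    @Override
    public int hash(Object key) {
        try {
            // MessageDigest instances are not thread-safe, so create one per call.
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            byte[] digest = md5.digest(key.toString().getBytes(StandardCharsets.UTF_8));
            // Use the first four bytes of the 16-byte digest as a big-endian int.
            return ByteBuffer.wrap(digest).getInt();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }
}
```

With these two types available, a ring could be created with something like new ConsistentHash<String>(new Md5HashFunction(), 100, nodes), where nodes is a collection of server names and 100 replicas is only an illustrative value.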
Use
So how should you use consistent hashing? In most cases you will use a library rather than write the code yourself. For example, memcached, the distributed memory caching system mentioned above, already has clients that support consistent hashing. Ketama, implemented by Richard Jones of Last.fm, was the first, and there is now a Java implementation contributed by Dustin Sallings. Interestingly, only the client needs to implement the consistent hashing algorithm, and the server code does not need to change. Other systems that use consistent hashing include Chord, a distributed hash table implementation, and Amazon's Dynamo, a key-value storage system (not open source).