http://martinbroadhurst.com/Consistent-Hash-Ring.html
Consistent hash ring
Introduction
Consistent hashing was first described in the paper "Consistent Hashing and Random Trees: Distributed Caching Protocols for Relieving Hot Spots on the World Wide Web" (1997) by David Karger et al. It is used in distributed storage systems such as Amazon Dynamo, memcached, Project Voldemort and Riak.
The problem
Consistent hashing is a very simple solution to a common problem: how can you find a server in a distributed system to store or retrieve a value identified by a key, while at the same time being able to cope with server failures and network partitions?
Simply finding a server for a value is easy; just number your set of S servers from 0 to S-1. When you want to store or retrieve a value, hash the value's key modulo S, and that gives you the server.
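The modulo scheme can be sketched in a few lines. This is only an illustration: the function name ServerForKey is hypothetical, and std::hash (C++11) stands in for whatever hash function a real system would use.

```cpp
#include <cstddef>
#include <functional>
#include <string>

// Hypothetical helper: picks one of `server_count` servers, numbered
// 0 to server_count - 1, by hashing the key and taking the remainder.
// std::hash is an assumed stand-in for the system's real hash function.
// server_count must be non-zero.
std::size_t ServerForKey(const std::string& key, std::size_t server_count)
{
    return std::hash<std::string>()(key) % server_count;
}
```

The same key always yields the same server number as long as the server count stays the same, which is exactly the assumption that breaks when a server disappears.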
The problem comes when servers fail or become unreachable through a network partition. At that point, the servers no longer fill the hash space, so the only option is to invalidate the caches on all servers, renumber them, and start again. Given that failures are commonplace in a system with hundreds or thousands of servers, this solution is not feasible.
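To get a feel for why renumbering forces a fresh start, the sketch below counts how many keys land on a different server number once the server count changes. It assumes, purely for illustration, that the key hashes are simply the integers 0 to key_count - 1; the function name CountMovedKeys is hypothetical.

```cpp
#include <cstddef>

// Hypothetical helper: treats 0 .. key_count - 1 as the key hashes and
// counts how many of them map to a different server number when the
// number of servers changes from `before` to `after` under modulo placement.
// Both server counts must be non-zero.
std::size_t CountMovedKeys(std::size_t key_count,
                           std::size_t before,
                           std::size_t after)
{
    std::size_t moved = 0;
    for (std::size_t h = 0; h < key_count; ++h) {
        if (h % before != h % after) {
            ++moved;
        }
    }
    return moved;
}
```

Under these assumptions, going from 10 servers to 9 with 90 keys moves 81 of them: losing a tenth of the servers displaces nine tenths of the data.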
The solution
In consistent hashing, the servers, as well as the keys, are hashed, and it is by this hash that they are looked up. The hash space is large, and is treated as if it wraps around to form a circle, hence "hash ring". The process of creating a hash for each server is equivalent to placing it at a point on the circumference of this circle. When a key needs to be looked up, it is hashed, which again corresponds to a point on the circle. In order to find its server, one then simply moves round the circle clockwise from this point until the next server is found. If no server is found between that point and the end of the hash space, the first server is used; this is the "wrapping round" that makes the hash space circular.
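The clockwise walk can be illustrated with an ordered map keyed by ring position. In this sketch the positions are supplied directly instead of being computed by a hash function, so the behaviour is easy to follow; the names Ring and FindServer are hypothetical.

```cpp
#include <cstddef>
#include <map>
#include <stdexcept>
#include <string>

// Ring position -> server name, kept sorted by position.
typedef std::map<std::size_t, std::string> Ring;

// Hypothetical helper: returns the first server at or after key_position,
// wrapping round to the first server on the ring if there is none.
const std::string& FindServer(const Ring& ring, std::size_t key_position)
{
    if (ring.empty()) {
        throw std::runtime_error("ring is empty");
    }
    Ring::const_iterator it = ring.lower_bound(key_position);
    if (it == ring.end()) {
        it = ring.begin();  // wrapped round the end of the hash space
    }
    return it->second;
}
```

With servers at positions 100, 200 and 300, a key at 150 belongs to the server at 200, and a key at 350 wraps round to the server at 100. If the server at 200 is removed, the key at 150 moves on to the server at 300, while keys owned by the other two servers stay where they were.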
The only remaining problem is that in practice hashing algorithms are likely to result in clusters of servers on the ring (or, to be more precise, some servers with a disproportionately large space before them), and this will result in greater load on the first server in the cluster and less on the remainder. This can be ameliorated by adding each server to the ring a number of times in different places. This is achieved by having a replica count, which applies to all servers in the ring, and, when adding a server, looping from 0 to the count - 1 and hashing a string made from both the server and the loop variable to produce the position. This has the effect of distributing the servers more evenly over the ring. Note that this has nothing to do with server replication; each of the replicas represents the same physical server, and replication of data between servers is an entirely unrelated issue.
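As a rough sketch of the replica idea, the function below places one server on the ring several times by hashing its name joined with the loop counter. The names Ring and AddServer are hypothetical, and std::hash and std::to_string (both C++11) are assumed stand-ins for the hash function and string conversion a real implementation would choose.

```cpp
#include <cstddef>
#include <functional>
#include <map>
#include <string>

// Ring position -> server name, kept sorted by position.
typedef std::map<std::size_t, std::string> Ring;

// Hypothetical helper: adds `server` to the ring `replicas` times. Each
// position comes from hashing the server name joined with the loop counter,
// so the copies end up scattered around the ring.
void AddServer(Ring& ring, const std::string& server, unsigned int replicas)
{
    std::hash<std::string> hasher;  // assumed stand-in hash function
    for (unsigned int r = 0; r < replicas; ++r) {
        ring[hasher(server + std::to_string(r))] = server;
    }
}
```

Every entry created this way refers to the same physical server. If two strings happen to hash to the same position, the later entry overwrites the earlier one, so the ring may hold slightly fewer entries than the replica count suggests.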
Implementation
I've written an example implementation of consistent hashing in C++. As you can imagine from the description above, it isn't terribly complicated. Here is the main class:
template <class Node, class Data, class Hash = HASH_NAMESPACE::hash<const char*> >
class HashRing
{
public:
    // Ring position -> node, kept sorted by position
    typedef std::map<size_t, Node> NodeMap;

    HashRing(unsigned int replicas)
        : replicas_(replicas), hash_(HASH_NAMESPACE::hash<const char*>())
    {
    }

    HashRing(unsigned int replicas, const Hash& hash)
        : replicas_(replicas), hash_(hash)
    {
    }

    size_t AddNode(const Node& node);
    void RemoveNode(const Node& node);
    const Node& GetNode(const Data& data) const;

private:
    NodeMap ring_;
    const unsigned int replicas_;
    Hash hash_;
};
template <class Node, class Data, class Hash>
size_t HashRing<Node, Data, Hash>::AddNode(const Node& node)
{
    size_t hash = 0;  // stays 0 if replicas_ is 0
    std::string nodestr = Stringify(node);
    // Place the node on the ring once per replica
    for (unsigned int r = 0; r < replicas_; r++) {
        hash = hash_((nodestr + Stringify(r)).c_str());
        ring_[hash] = node;
    }
    return hash;
}
template <class Node, class Data, class Hash>
void HashRing<Node, Data, Hash>::RemoveNode(const Node& node)
{
    std::string nodestr = Stringify(node);
    // Recompute each replica's position and erase it
    for (unsigned int r = 0; r < replicas_; r++) {
        size_t hash = hash_((nodestr + Stringify(r)).c_str());
        ring_.erase(hash);
    }
}
template <class Node, class Data, class Hash>
const Node& HashRing<Node, Data, Hash>::GetNode(const Data& data) const
{
    if (ring_.empty()) {
        throw EmptyRingException();
    }
    size_t hash = hash_(Stringify(data).c_str());
    typename NodeMap::const_iterator it;
    // Look for the first node >= hash
    it = ring_.lower_bound(hash);
    if (it == ring_.end()) {
        // Wrapped around; get the first node
        it = ring_.begin();
    }
    return it->second;
}
A few points to note:
- The default hash function is the non-standard hash from the hash_map extension. In practice you probably don't want to use this. Something like MD5 would probably be best.
- I had to define HASH_NAMESPACE because g++ puts the non-standard hash in a different namespace from the one other compilers use. Hopefully this will all be resolved with the widespread availability of std::unordered_map.
- The Node and Data types need to have operator<< defined for std::ostream. This is because I write them to an ostringstream in order to "stringify" them before getting the hash.
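The operator<< requirement in the last point can be illustrated as follows. The listing above calls a Stringify helper but does not show its body, so the version here is an assumption based on the description, and the CacheServer type is purely hypothetical.

```cpp
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical node type, used only for illustration.
struct CacheServer {
    std::string host;
    unsigned int port;
};

// The stream operator that a Node or Data type has to provide.
std::ostream& operator<<(std::ostream& out, const CacheServer& server)
{
    return out << server.host << ':' << server.port;
}

// Assumed shape of the helper: writes any streamable value to an
// ostringstream and returns the resulting string.
template <class T>
std::string Stringify(const T& value)
{
    std::ostringstream out;
    out << value;
    return out.str();
}
```

Any type that can be written to a std::ostream works with a helper like this, which is also why the loop counter can be appended to the node string when placing replicas.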
I've also written an example program that simulates using a cluster of cache servers to store and retrieve some data.
Source code
You can browse the source code and example program here:
- Consistent.h
- Hashring_example.cpp
Here is a compressed tar archive containing the source code, example program and makefile: