Consistent hashing: a C++ implementation

Source: Internet
Author: User
Article directory
    • The Problem
    • The solution
http://martinbroadhurst.com/Consistent-Hash-Ring.html

Consistent hash ring

Introduction

Consistent hashing was first described in the paper "Consistent hashing and random trees: distributed caching protocols for relieving hot spots on the World Wide Web" (1997) by David Karger et al. It is used in distributed storage systems such as Amazon Dynamo, memcached, Project Voldemort and Riak.

The Problem

Consistent hashing is a very simple solution to a common problem: how can you find a server in a distributed system to store or retrieve a value identified by a key, while at the same time being able to cope with server failures and network partitions?

Simply finding a server for a value is easy; just number your set of S servers from 0 to S-1. When you want to store or retrieve a value, hash the value's key modulo S, and that gives you the server.
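
As a minimal sketch of this modulo scheme, the following assumes string keys and uses std::hash from C++11 as a stand-in for whatever hash function you prefer. The name server_for_key is hypothetical and does not come from the implementation described later.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>

// Hypothetical helper: choose one of server_count servers, numbered
// 0 .. server_count-1, by hashing the key modulo the server count.
std::size_t server_for_key(const std::string& key, std::size_t server_count)
{
    assert(server_count > 0);  // there must be at least one server
    return std::hash<std::string>()(key) % server_count;
}
```

The same key always maps to the same server index as long as the server count stays the same.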

The problem comes when servers fail or become unreachable through a network partition. At that point, the servers no longer fill the hash space, so the only option is to invalidate the caches on all servers, renumber them, and start again. Given that failures are commonplace in a system with hundreds or thousands of servers, this solution is not feasible.
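
To get a feel for how disruptive renumbering is, the sketch below counts how many keys land on a different server index when the server count changes. It is an illustration only: the keys are generated strings, FNV-1a is used purely as a small deterministic stand-in hash, and the names fnv1a and count_moved_keys are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// FNV-1a, used here only as a small deterministic stand-in hash.
std::size_t fnv1a(const std::string& s)
{
    unsigned long long h = 14695981039346656037ULL;
    for (std::size_t i = 0; i < s.size(); i++) {
        h ^= static_cast<unsigned char>(s[i]);
        h *= 1099511628211ULL;
    }
    return static_cast<std::size_t>(h);
}

// Count how many of key_count generated keys map to a different server
// index when the number of servers changes from before to after.
std::size_t count_moved_keys(std::size_t key_count,
                             std::size_t before, std::size_t after)
{
    assert(before > 0 && after > 0);
    std::size_t moved = 0;
    for (std::size_t i = 0; i < key_count; i++) {
        std::size_t h = fnv1a("key" + std::to_string(i));
        if (h % before != h % after) {
            moved++;
        }
    }
    return moved;
}
```

With a well-mixed hash, a key keeps its index when going from 10 servers to 9 only if its hash modulo 90 is below 9, so roughly nine keys in ten would be expected to move.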

The solution

In consistent hashing, the servers, as well as the keys, are hashed, and it is by this hash that they are looked up. The hash space is large, and is treated as if it wraps around to form a circle, hence "hash ring". The process of creating a hash for each server is equivalent to placing it at a point on the circumference of this circle. When a key needs to be looked up, it is hashed, which again corresponds to a point on the circle. In order to find its server, one then simply moves round the circle clockwise from this point until the next server is found. If no server is found from that point to the end of the hash space, the first server is used; this is the "wrapping round" that makes the hash space circular.
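
The clockwise walk can be sketched with an ordered map from ring position to server. In this illustration the positions are supplied directly rather than computed by a hash, and the names Ring and find_server are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Ring positions mapped to server names (both hypothetical here).
typedef std::map<std::size_t, std::string> Ring;

// Walk clockwise from point to the first server at or after it,
// wrapping round to the first server on the ring if none is found.
const std::string& find_server(const Ring& ring, std::size_t point)
{
    assert(!ring.empty());  // an empty ring has no server to return
    Ring::const_iterator it = ring.lower_bound(point);
    if (it == ring.end()) {
        it = ring.begin();  // wrapped round the end of the hash space
    }
    return it->second;
}
```

For a ring with servers at positions 100, 200 and 300, a key at 150 goes to the server at 200, and a key at 301 wraps round to the server at 100.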

The only remaining problem is that in practice hashing algorithms are likely to result in clusters of servers on the ring (or, to be more precise, some servers with a disproportionately large space before them), and this will result in greater load on the first server in the cluster and less on the remainder. This can be ameliorated by adding each server to the ring a number of times in different places. This is achieved by having a replica count, which applies to all servers in the ring, and when adding a server, looping from 0 to the count minus 1 and hashing a string made from both the server and the loop variable to produce the position. This has the effect of distributing the servers more evenly over the ring. Note that this has nothing to do with server replication; each of the replicas represents the same physical server, and replication of data between servers is an entirely unrelated issue.
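
The replica loop might look like the following sketch. It assumes servers are identified by name, uses std::hash and std::to_string from C++11 in place of the article's own hash and stringification, and all of the names (Ring, replica_position, add_server, remove_server) are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <string>

typedef std::map<std::size_t, std::string> Ring;

// Position of one replica: hash of the server name plus the replica index.
std::size_t replica_position(const std::string& server, unsigned int r)
{
    return std::hash<std::string>()(server + std::to_string(r));
}

// Place the server on the ring once per replica.
void add_server(Ring& ring, const std::string& server, unsigned int replicas)
{
    assert(replicas > 0);  // a server with no replicas would never be found
    for (unsigned int r = 0; r < replicas; r++) {
        ring[replica_position(server, r)] = server;
    }
}

// Remove every point that add_server placed for this server.
void remove_server(Ring& ring, const std::string& server, unsigned int replicas)
{
    for (unsigned int r = 0; r < replicas; r++) {
        ring.erase(replica_position(server, r));
    }
}
```

Every point added for a server maps back to that same server, so a lookup that lands on any of its replicas returns the same physical machine.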

Implementation

I've written an example implementation of consistent hashing in C++. As you can imagine from the description above, it isn't terribly complicated. Here is the main class:

template <class Node, class Data, class Hash = HASH_NAMESPACE::hash<const char*> >
class HashRing
{
public:
    typedef std::map<size_t, Node> NodeMap;

    // Use a default-constructed hash function
    HashRing(unsigned int replicas)
        : replicas_(replicas), hash_(HASH_NAMESPACE::hash<const char*>())
    {
    }

    // Use a caller-supplied hash function
    HashRing(unsigned int replicas, const Hash& hash)
        : replicas_(replicas), hash_(hash)
    {
    }

    size_t AddNode(const Node& node);
    void RemoveNode(const Node& node);
    const Node& GetNode(const Data& data) const;

private:
    NodeMap ring_;                 // ring position -> node
    const unsigned int replicas_;  // points placed on the ring per node
    Hash hash_;
};

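template <class Node, class Data, class Hash>
size_t HashRing<Node, Data, Hash>::AddNode(const Node& node)
{
    size_t hash = 0;  // stays 0 if replicas_ is 0
    std::string nodestr = Stringify(node);
    for (unsigned int r = 0; r < replicas_; r++) {
        // One ring position per replica: hash of node string + replica index
        hash = hash_((nodestr + Stringify(r)).c_str());
        ring_[hash] = node;
    }
    // Position of the last replica added
    return hash;
}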

template <class Node, class Data, class Hash>
void HashRing<Node, Data, Hash>::RemoveNode(const Node& node)
{
    std::string nodestr = Stringify(node);
    for (unsigned int r = 0; r < replicas_; r++) {
        // Recompute each replica's position and erase it from the ring
        size_t hash = hash_((nodestr + Stringify(r)).c_str());
        ring_.erase(hash);
    }
}

template <class Node, class Data, class Hash>
const Node& HashRing<Node, Data, Hash>::GetNode(const Data& data) const
{
    if (ring_.empty()) {
        throw EmptyRingException();
    }
    size_t hash = hash_(Stringify(data).c_str());
    typename NodeMap::const_iterator it;
    // Look for the first node >= hash
    it = ring_.lower_bound(hash);
    if (it == ring_.end()) {
        // Wrapped around; get the first node
        it = ring_.begin();
    }
    return it->second;
}

A few points to note:

    • The default hash function is hash from <map>.
      In practice you probably don't want to use this. Something like MD5 would probably be best.
    • I had to define HASH_NAMESPACE because G++ puts the non-standard hash in a different namespace from the one other compilers use.
      Hopefully this will all be resolved with the widespread availability of std::unordered_map.
    • The Node and Data types need to have operator<< defined for std::ostream.
      This is because I write them to an ostringstream in order to "stringify" them before getting the hash.
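
As an illustration of that last point, here is one way a node type and a stringifying helper could be written. The CacheServer type is hypothetical, and this Stringify is only a guess at the shape of the helper, whose actual definition is not shown above.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical node type: a cache server identified by host and port.
struct CacheServer {
    std::string host;
    unsigned int port;
};

// Needed so that a CacheServer can be written to a std::ostream.
std::ostream& operator<<(std::ostream& os, const CacheServer& s)
{
    return os << s.host << ':' << s.port;
}

// Assumed shape of the helper: write the value to an ostringstream
// and return the resulting string.
template <class T>
std::string Stringify(const T& value)
{
    std::ostringstream oss;
    oss << value;
    assert(!oss.fail());  // formatting is expected to succeed
    return oss.str();
}
```

With these definitions, a CacheServer with host "cache1" and port 11211 is stringified as "cache1:11211", and that string is what gets hashed.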

I've also written an example program that simulates using a cluster of cache servers to store and retrieve some data.

Source code

You can browse the source code and example program here:

    • Consistent.h
    • Hashring_example.cpp

Here is a compressed tar archive containing the source code, example program and makefile:

    • Consistent.tar.gz

 
