Consistent hashing: a C++ implementation

Last Update: 2018-12-07 Source: Internet

Author: User



http://martinbroadhurst.com/Consistent-Hash-Ring.html

Consistent hash ring

Introduction

Consistent hashing was first described in the paper Consistent hashing and random trees: distributed caching protocols for relieving hot spots on the World Wide Web (1997) by David Karger et al. It is used in distributed storage systems such as Amazon Dynamo, memcached, Project Voldemort and Riak.

The Problem

Consistent hashing is a very simple solution to a common problem: how can you find a server in a distributed system to store or retrieve a value identified by a key, while at the same time being able to cope with server failures and network partitions?

Simply finding a server for a value is easy: just number your set of S servers from 0 to S-1. When you want to store or retrieve a value, hash the value's key modulo S, and that gives you the server.
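
As a rough sketch of this naive scheme: the function below hashes a key and reduces it modulo the server count. The FNV-1a hash and the names fnv1a and server_for_key are assumptions made for illustration; they are not part of the implementation described later.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// FNV-1a, used only as a stand-in hash for this sketch (an assumption).
inline std::uint64_t fnv1a(const std::string& s)
{
    std::uint64_t h = 14695981039346656037ULL;  // 64-bit FNV offset basis
    for (unsigned char c : s) {
        h ^= c;
        h *= 1099511628211ULL;                  // 64-bit FNV prime
    }
    return h;
}

// Naive placement: servers are numbered 0 to S-1 and the key's hash
// modulo S selects one of them.
inline std::size_t server_for_key(const std::string& key, std::size_t server_count)
{
    assert(server_count > 0);
    return static_cast<std::size_t>(fnv1a(key) % server_count);
}
```

Calling server_for_key("some-key", 4) always yields the same index in the range 0 to 3, as long as the server count stays at 4.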

The problem comes when servers fail or become unreachable through a network partition. At that point, the servers no longer fill the hash space, so the only option is to invalidate the caches on all servers, renumber them, and start again. Given that failures are commonplace in a system with hundreds or thousands of servers, this solution is not feasible.
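
To get a feel for how disruptive renumbering is, one can count how many keys change server under modulo placement when the server count changes. The sketch below does that for synthetic keys named "key0", "key1" and so on; the FNV-1a hash, the key names and the function count_moved_keys are all assumptions for illustration. For a well-mixed hash h, h % 10 and h % 9 agree only when h % 90 is below 9, so roughly nine keys in ten would be expected to move when one of ten servers is lost.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// FNV-1a, used only as a stand-in hash for this sketch (an assumption).
inline std::uint64_t fnv1a(const std::string& s)
{
    std::uint64_t h = 14695981039346656037ULL;  // 64-bit FNV offset basis
    for (unsigned char c : s) {
        h ^= c;
        h *= 1099511628211ULL;                  // 64-bit FNV prime
    }
    return h;
}

// Count how many of key_count synthetic keys land on a different server
// index when the server count changes from `before` to `after`.
inline std::size_t count_moved_keys(std::size_t key_count,
                                    std::size_t before,
                                    std::size_t after)
{
    assert(before > 0 && after > 0);
    std::size_t moved = 0;
    for (std::size_t i = 0; i < key_count; ++i) {
        const std::uint64_t h = fnv1a("key" + std::to_string(i));
        if (h % before != h % after) {
            ++moved;
        }
    }
    return moved;
}
```

For example, count_moved_keys(1000, 10, 9) reports how many of a thousand keys would have to be fetched again after a single server out of ten disappears.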

The solution

In consistent hashing, the servers, as well as the keys, are hashed, and it is by this hash that they are looked up. The hash space is large, and is treated as if it wraps around to form a circle, hence the name hash ring. The process of creating a hash for each server is equivalent to placing it at a point on the circumference of this circle. When a key needs to be looked up, it is hashed, which again corresponds to a point on the circle. In order to find its server, one then simply moves round the circle clockwise from this point until the next server is found. If no server is found from that point to the end of the hash space, the first server is used; this is the "wrapping round" that makes the hash space circular.
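
The clockwise walk can be sketched with an ordered map from ring position to server, where lower_bound finds the first server at or after the key's position. The Ring alias, the find_server and example_ring names, and the hand-picked positions 100, 200 and 300 are assumptions chosen to keep the example small; in a real ring the positions would come from hashing the servers.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Ring position -> server name, kept sorted by position.
using Ring = std::map<std::uint64_t, std::string>;

// Walk clockwise from key_hash to the next server, wrapping to the
// first server when the end of the hash space is reached.
inline const std::string& find_server(const Ring& ring, std::uint64_t key_hash)
{
    assert(!ring.empty());
    Ring::const_iterator it = ring.lower_bound(key_hash);
    if (it == ring.end()) {
        it = ring.begin();  // wrapped around the circle
    }
    return it->second;
}

// A tiny ring with hand-picked positions (illustrative values only).
inline Ring example_ring()
{
    return Ring{{100, "a"}, {200, "b"}, {300, "c"}};
}
```

With this ring, a key hashing to 150 belongs to server "b", and a key hashing to 301 wraps round to server "a".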

The only remaining problem is that in practice hashing algorithms are likely to result in clusters of servers on the ring (or, to be more precise, some servers with a disproportionately large space before them), and this will result in greater load on the first server in the cluster and less on the remainder. This can be ameliorated by adding each server to the ring a number of times in different places. This is achieved by having a replica count, which applies to all servers in the ring, and when adding a server, looping from 0 to the count - 1, and hashing a string made from both the server and the loop variable to produce the position. This has the effect of distributing the servers more evenly over the ring. Note that this has nothing to do with server replication; each of the replicas represents the same physical server, and replication of data between servers is an entirely unrelated issue.
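
A minimal sketch of the replica idea follows, assuming string server names and the FNV-1a hash as a stand-in; the names add_server, remove_server and positions_owned are hypothetical helpers, not the class presented in the next section. Each server is placed on the ring once per replica, at the position given by hashing its name concatenated with the loop index.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>

using Ring = std::map<std::uint64_t, std::string>;

// FNV-1a, used only as a stand-in hash for this sketch (an assumption).
inline std::uint64_t fnv1a(const std::string& s)
{
    std::uint64_t h = 14695981039346656037ULL;
    for (unsigned char c : s) {
        h ^= c;
        h *= 1099511628211ULL;
    }
    return h;
}

// Place one server on the ring `replicas` times, hashing name + loop index.
inline void add_server(Ring& ring, const std::string& name, unsigned replicas)
{
    assert(replicas > 0);
    for (unsigned r = 0; r < replicas; ++r) {
        ring[fnv1a(name + std::to_string(r))] = name;
    }
}

// Erase every position that add_server would have created for this server.
inline void remove_server(Ring& ring, const std::string& name, unsigned replicas)
{
    for (unsigned r = 0; r < replicas; ++r) {
        ring.erase(fnv1a(name + std::to_string(r)));
    }
}

// Count the ring positions currently owned by `name`.
inline std::size_t positions_owned(const Ring& ring, const std::string& name)
{
    std::size_t n = 0;
    for (const auto& entry : ring) {
        if (entry.second == name) {
            ++n;
        }
    }
    return n;
}
```

Removal must use the same replica count as insertion, since the positions are recomputed rather than stored per server.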

Implementation

I've written an example implementation of consistent hashing in C++. As you can imagine from the description above, it isn't terribly complicated. Here is the main class:

template <class Node, class Data, class Hash = HASH_NAMESPACE::hash<const char*> >
class HashRing
{
public:
    // Maps a position on the ring to the node placed there
    typedef std::map<size_t, Node> NodeMap;

    HashRing(unsigned int replicas)
        : replicas_(replicas), hash_(HASH_NAMESPACE::hash<const char*>())
    {
    }

    HashRing(unsigned int replicas, const Hash& hash)
        : replicas_(replicas), hash_(hash)
    {
    }

    size_t AddNode(const Node& node);
    void RemoveNode(const Node& node);
    const Node& GetNode(const Data& data) const;

private:
    NodeMap ring_;
    const unsigned int replicas_;  // positions per node
    Hash hash_;
};

template <class Node, class Data, class Hash>
size_t HashRing<Node, Data, Hash>::AddNode(const Node& node)
{
    size_t hash = 0;  // stays 0 if replicas_ is 0
    std::string nodestr = Stringify(node);
    // Place the node on the ring once per replica
    for (unsigned int r = 0; r < replicas_; r++) {
        hash = hash_((nodestr + Stringify(r)).c_str());
        ring_[hash] = node;
    }
    return hash;
}

template <class Node, class Data, class Hash>
void HashRing<Node, Data, Hash>::RemoveNode(const Node& node)
{
    std::string nodestr = Stringify(node);
    // Recompute each replica's position and erase it
    for (unsigned int r = 0; r < replicas_; r++) {
        size_t hash = hash_((nodestr + Stringify(r)).c_str());
        ring_.erase(hash);
    }
}

template <class Node, class Data, class Hash>
const Node& HashRing<Node, Data, Hash>::GetNode(const Data& data) const
{
    if (ring_.empty()) {
        throw EmptyRingException();
    }
    size_t hash = hash_(Stringify(data).c_str());
    typename NodeMap::const_iterator it;
    // Look for the first node >= hash
    it = ring_.lower_bound(hash);
    if (it == ring_.end()) {
        // Wrapped around; get the first node
        it = ring_.begin();
    }
    return it->second;
}

A few points to note:

The default hash function is hash from <map>.
In practice you probably don't want to use this. Something like MD5 would probably be best.
I had to define HASH_NAMESPACE because g++ puts the non-standard hash in a different namespace than other compilers do.
Hopefully this will all be resolved with the widespread availability of std::unordered_map.
The Node and Data types need to have operator<< defined for std::ostream.
This is because I write them to an ostringstream in order to "stringify" them before getting the hash.
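
The stringify step in the last point might look roughly like the following. This is a sketch of the general technique rather than the helper from the source code itself; the lower-case name stringify and the CacheServer type with its host and port fields are hypothetical, introduced only to show a node type that satisfies the operator<< requirement.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

// Convert any streamable value to a string by writing it to an ostringstream.
template <class T>
std::string stringify(const T& value)
{
    std::ostringstream out;
    out << value;
    assert(!out.fail());  // streaming into a string buffer should succeed
    return out.str();
}

// Hypothetical node type: one cache server identified by host and port.
struct CacheServer {
    std::string host;
    unsigned short port;
};

// The stream operator that makes CacheServer usable with stringify.
inline std::ostream& operator<<(std::ostream& os, const CacheServer& server)
{
    return os << server.host << ':' << server.port;
}
```

Because the ring position is derived from this string, two nodes that print identically would occupy the same positions, so the output of operator<< needs to be unique per node.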

I've also written an example program that simulates using a cluster of cache servers to store and retrieve some data.

Source code

You can browse the source code and example program here:

consistent.h
hashring_example.cpp

Here is a compressed tar archive containing the source code, example program and makefile:

consistent.tar.gz

