A C++ Implementation of the Consistent Hashing Algorithm

Source: Internet
Author: User

An implementation of consistent hashing has to settle two key questions: which data structure stores the nodes and supports lookups, and which hash algorithm assigns the nodes their positions.

First, the data structure used to store the nodes. From the principle of consistent hashing we know that the nodes can be pictured as sitting on a ring (see the illustration), with nodes A, B, C, and D arranged around it in order of their hash values. In other words, the nodes can be kept in a structure that is sorted by hash value. When a request p with hash value -2^20 needs a routing node, the algorithm moves clockwise from that hash value to the first node it meets (B). In terms of the sorted structure, this means using the request's hash as the search key and finding the node with the smallest key that is not less than it. If no such node exists, the search wraps around to the node with the smallest key.

The structure therefore has to support frequent insertion and deletion of nodes efficiently and also offer good lookup performance. A red-black tree meets these requirements. It is an approximately balanced binary search tree: the worst-case time of inserting, deleting, or finding a value is proportional to the height of the tree, and because that height is bounded by O(log n), the red-black tree stays efficient in the worst case, unlike an ordinary binary search tree. We therefore use a red-black tree to store the nodes. Besides the basic insert, delete, and search operations, it needs one additional lookup function that finds the node with the smallest key not less than a given key.
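The clockwise search can be tried out without writing a red-black tree by hand. The following is a minimal sketch, assuming std::map as the sorted structure (it is typically implemented as a red-black tree). The names Ring and routeClockwise and the positions used with them are made up for illustration and are not part of the article's code.

```cpp
#include <cassert>
#include <map>
#include <string>

// Ring positions mapped to node names. std::map keeps its keys sorted.
typedef std::map<long, std::string> Ring;

// Returns the name of the first node at or after `key` in clockwise order.
// When `key` lies past the last node, the search wraps around to the node
// with the smallest position. The ring must not be empty.
inline const std::string& routeClockwise(const Ring& ring, long key) {
    assert(!ring.empty());
    Ring::const_iterator it = ring.lower_bound(key);  // first position >= key
    if (it == ring.end()) {
        it = ring.begin();  // wrap around the ring
    }
    return it->second;
}
```

With nodes at positions 100, 200, 300, and 400, a request hashed to 150 is routed to the node at 200, and a request hashed to 401 wraps around to the node at 100.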

Next, the hash algorithm. Consistent hashing was originally proposed to solve a load-balancing problem. Each entity (physical) node is represented by many virtual nodes, and the virtual nodes are the key to balancing the load. We want them to be spread evenly around the ring. Requests with different hash values are then distributed across the nodes, and when a node goes down, the requests that used to be routed to it are spread fairly evenly over the remaining nodes instead of piling onto a single one. Here we use MD5. The MD5 algorithm converts a label string (which identifies a virtual node) into a 16-byte digest, and the digest is then processed into an integer hash value. Because MD5 output is highly dispersed, the resulting hash values are dispersed as well and end up scattered evenly over the ring.
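To make the role of the virtual nodes concrete, here is a small sketch that places labelled virtual nodes on a ring and counts how many positions each entity node ends up owning. It is only an illustration under stated assumptions: FNV-1a stands in for MD5 so that the example has no external dependency, and the names fnv1a, VNodeRing, addVirtualNodes, and ownedPositions, as well as the "name#i" label format, are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <utility>

// FNV-1a (32-bit). A simple stand-in for MD5 in this sketch only.
inline std::uint32_t fnv1a(const std::string& s) {
    std::uint32_t h = 2166136261u;  // FNV offset basis
    for (std::size_t i = 0; i < s.size(); ++i) {
        h ^= static_cast<unsigned char>(s[i]);
        h *= 16777619u;             // FNV prime
    }
    return h;
}

// Ring position -> name of the entity node that owns the position.
typedef std::map<std::uint32_t, std::string> VNodeRing;

// Places `count` virtual nodes for `name` on the ring, one per label
// "name#0", "name#1", ... A position that is already taken is left alone.
inline void addVirtualNodes(VNodeRing& ring, const std::string& name, int count) {
    for (int i = 0; i < count; ++i) {
        std::string label = name + "#" + std::to_string(i);
        ring.insert(std::make_pair(fnv1a(label), name));
    }
}

// Counts how many ring positions belong to `name`.
inline int ownedPositions(const VNodeRing& ring, const std::string& name) {
    int n = 0;
    for (VNodeRing::const_iterator it = ring.begin(); it != ring.end(); ++it) {
        if (it->second == name) ++n;
    }
    return n;
}
```

An entity node that is given more virtual nodes owns more positions on the ring and can therefore expect a larger share of the requests.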

The implementation is written in C++. The key details are described below.

1. First, define the entity node class and the virtual node class. One entity node corresponds to multiple virtual nodes.

Entity node CNode_s:

/* Entity node */
class CNode_s
{
public:
    /* Constructors */
    CNode_s();
    CNode_s(const char * pIden, int pVNodeCount, void * pData);

    /* Get the node's identifier string */
    const char * getIden();

    /* Get the number of virtual nodes of this entity node */
    int getVNodeCount();

    /* Set the entity node's data value */
    void setData(void * data);

    /* Get the entity node's data value */
    void * getData();
private:
    void setCNode_s(const char * pIden, int pVNodeCount, void * pData);
    char iden[100];  /* node identifier string */
    int vnodecount;  /* number of virtual nodes */
    void * data;     /* node data */
};

Virtual node CVirtualNode_s: a virtual node holds a pointer to its entity node.

/* Virtual node */
class CVirtualNode_s
{
public:
    /* Constructors */
    CVirtualNode_s();
    CVirtualNode_s(CNode_s * pNode);

    /* Set the entity node that this virtual node points to */
    void setNode_s(CNode_s * pNode);

    /* Get the entity node that this virtual node points to */
    CNode_s * getNode_s();

    /* Set the virtual node's hash value */
    void setHash(long pHash);

    /* Get the virtual node's hash value */
    long getHash();
private:
    long hash;       /* hash value */
    CNode_s * node;  /* entity node that this virtual node points to */
};


2. Because the hash algorithm is a matter of choice, define a hash function interface so that other algorithms can be plugged in later.

The CMD5HashFun class implements this interface and uses the MD5 algorithm to compute the hash value.

Class Diagram:

CHashFun interface:

/* Hash function interface used to compute the hash value of a node */
class CHashFun
{
public:
    virtual long getHashVal(const char *) = 0;
    /* Virtual destructor, because implementations are used through CHashFun pointers */
    virtual ~CHashFun() {}
};

The CMD5HashFun class inherits from the CHashFun interface and implements the getHashVal function to compute the hash value:

/* Computes the hash value of a node with the MD5 algorithm; inherits from CHashFun */
class CMD5HashFun : public CHashFun
{
public:
    virtual long getHashVal(const char *);
};

long CMD5HashFun::getHashVal(const char * instr)
{
    int i;
    long hash = 0;
    unsigned char digest[16];

    /* Call the MD5 functions to compute the MD5 digest of instr and store it in digest */
    md5_state_t md5state;
    md5_init(&md5state);
    md5_append(&md5state, (const unsigned char *)instr, strlen(instr));
    md5_finish(&md5state, digest);

    /* Every four bytes form a 32-bit integer. The four 32-bit integers are
       added together to give the hash value of instr (the sum may overflow
       when long is only 32 bits wide) */
    for (i = 0; i < 4; i++)
    {
        hash += ((long)(digest[i*4 + 3]&0xFF) << 24)
              | ((long)(digest[i*4 + 2]&0xFF) << 16)
              | ((long)(digest[i*4 + 1]&0xFF) << 8)
              | ((long)(digest[i*4 + 0]&0xFF));
    }
    return hash;
}
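The folding step can be checked on its own, without an MD5 library, by feeding it a hand-written digest. The sketch below is an illustration rather than the article's code: the name foldDigest is hypothetical, and it uses a 64-bit unsigned accumulator (an assumption that differs from the long used above) so that the sum of the four words cannot overflow.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Folds a 16-byte digest into one integer. Each group of four bytes is read
// as a little-endian 32-bit word, and the four words are added together.
// The 64-bit accumulator is wide enough to hold the sum of four 32-bit words.
inline std::uint64_t foldDigest(const unsigned char* digest) {
    assert(digest != NULL);
    std::uint64_t hash = 0;
    for (int i = 0; i < 4; ++i) {
        std::uint32_t word =
              (static_cast<std::uint32_t>(digest[i * 4 + 3]) << 24)
            | (static_cast<std::uint32_t>(digest[i * 4 + 2]) << 16)
            | (static_cast<std::uint32_t>(digest[i * 4 + 1]) << 8)
            |  static_cast<std::uint32_t>(digest[i * 4 + 0]);
        hash += word;
    }
    return hash;
}
```

For a digest whose four words are 1, 2, 3, and 4, the function returns 10. A digest whose second byte is 1 and whose other bytes are 0 gives 256, which shows the little-endian byte order.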

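3. Extend the red-black tree with a lookup function that finds the node with the smallest key that is not less than the given key. If every key in the tree is smaller, the function returns the minimum node, which corresponds to wrapping around the ring.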

util_rbtree_node_t* util_rbtree_lookup(util_rbtree_t *rbtree, long key)
{
    if ((rbtree != NULL) && !util_rbtree_isempty(rbtree))
    {
        util_rbtree_node_t *node = NULL;
        util_rbtree_node_t *temp = rbtree->root;
        util_rbtree_node_t *null = _NULL(rbtree);  /* sentinel (nil) node of the tree */
        while (temp != null)
        {
            if (key <= temp->key)
            {
                node = temp;  /* update the candidate node */
                temp = temp->left;
            }
            else if (key > temp->key)
            {
                temp = temp->right;
            }
        }
        /* If no candidate was found (node == NULL), return the minimum node */
        return (node != NULL) ? node : util_rbtree_min(rbtree);
    }
    return NULL;
}
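The same descent works on any binary search tree, because it relies only on the ordering of the keys and not on the balancing. The following sketch shows it on a minimal, unbalanced tree so that it can be run on its own. TreeNode, insertNode, and ceilingNode are hypothetical names, and the wrap-around to the minimum node is left to the caller here.

```cpp
#include <cassert>
#include <cstddef>

// Minimal binary search tree node, for illustration only (no balancing).
struct TreeNode {
    long key;
    TreeNode* left;
    TreeNode* right;
    explicit TreeNode(long k) : key(k), left(NULL), right(NULL) {}
};

// Links `node` into the tree rooted at `root`. Equal keys go to the right.
inline void insertNode(TreeNode*& root, TreeNode* node) {
    assert(node != NULL);
    TreeNode** link = &root;
    while (*link != NULL) {
        link = (node->key < (*link)->key) ? &(*link)->left : &(*link)->right;
    }
    *link = node;
}

// Returns the node with the smallest key >= `key`, or NULL if every key in
// the tree is smaller. The descent remembers the last node at which it
// turned left, because that node is the best candidate seen so far.
inline const TreeNode* ceilingNode(const TreeNode* root, long key) {
    const TreeNode* best = NULL;
    const TreeNode* cur = root;
    while (cur != NULL) {
        if (key <= cur->key) {
            best = cur;        // candidate; a smaller one may exist on the left
            cur = cur->left;
        } else {
            cur = cur->right;  // this node and its left subtree are too small
        }
    }
    return best;
}
```

In a tree holding the keys 10, 20, and 30, a search for 15 returns the node with key 20, and a search for 31 returns NULL.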

4. Create the consistent hash class, which can insert, delete, and look up entity nodes.

The algorithm and the individual steps are described in the code comments.

class CConHash
{
public:
    /* Constructor */
    CConHash(CHashFun * pFunc);

    /* Set the hash function */
    void setFunc(CHashFun * pFunc);

    /* Add an entity node; returns 0 on success, -1 on failure */
    int addNode_s(CNode_s * pNode);

    /* Delete an entity node; returns 0 on success, -1 on failure */
    int delNode_s(CNode_s * pNode);

    /* Look up the entity node that an object is routed to */
    CNode_s * lookupNode_s(const char * object);

    /* Get the number of virtual nodes in the consistent hash structure */
    int getVNodes();
private:
    /* Hash function */
    CHashFun * func;
    /* Total number of virtual nodes */
    int vnodes;
    /* Red-black tree that stores the virtual nodes */
    util_rbtree_t * vnode_tree;
};
/* Helper function: converts a virtual node into a red-black tree node */
util_rbtree_node_t * vNode2RBNode(CVirtualNode_s * vnode);


CConHash::CConHash(CHashFun * pFunc)
{
    /* Set the hash function (assert is declared in <assert.h>) */
    assert(pFunc != NULL);
    this->func = pFunc;
    this->vnodes = 0;
    /* Initialize the red-black tree */
    vnode_tree = new util_rbtree_s();
    util_rbtree_init(vnode_tree);
}

int CConHash::addNode_s(CNode_s * pNode)
{
    if (pNode == NULL) return -1;
    int vcount = pNode->getVNodeCount();
    if (vcount <= 0) return -1;
    CVirtualNode_s * virtualnode;
    util_rbtree_node_t * rbnode;
    char str[128];
    long hash = 0;
    /* Generate the virtual nodes and insert them into the red-black tree */
    for (int i = 0; i < vcount; i++)
    {
        virtualnode = new CVirtualNode_s(pNode);
        /* Build the label iden + i so that every virtual node gets a different
           string for the hash calculation. The label is rebuilt from scratch on
           each pass, so it cannot outgrow the buffer (snprintf is in <stdio.h>) */
        snprintf(str, sizeof(str), "%s%d", pNode->getIden(), i);
        hash = func->getHashVal(str);
        virtualnode->setHash(hash);
        if (!util_rbtree_search(vnode_tree, hash))
        {
            /* Create the red-black tree node */
            rbnode = vNode2RBNode(virtualnode);
            if (rbnode != NULL)
            {
                /* Insert the node into the red-black tree */
                util_rbtree_insert(vnode_tree, rbnode);
                this->vnodes++;
            }
        }
        else
        {
            /* The position is already taken, so discard this virtual node */
            delete virtualnode;
        }
    }
    return 0;
}

int CConHash::delNode_s(CNode_s * pNode)
{
    if (pNode == NULL) return -1;
    util_rbtree_node_t * rbnode;
    char str[128];
    int vcount = pNode->getVNodeCount();
    long hash = 0;
    CVirtualNode_s * node = NULL;
    /* Delete every virtual node that was generated for this entity node */
    for (int i = 0; i < vcount; i++)
    {
        /* Rebuild the same label that addNode_s used for virtual node i */
        snprintf(str, sizeof(str), "%s%d", pNode->getIden(), i);
        hash = func->getHashVal(str);
        rbnode = util_rbtree_search(vnode_tree, hash);
        if (rbnode != NULL)
        {
            node = (CVirtualNode_s *) rbnode->data;
            if (node->getNode_s() == pNode && node->getHash() == hash)
            {
                this->vnodes--;
                /* Remove the node from the red-black tree */
                util_rbtree_delete(vnode_tree, rbnode);
                delete rbnode;
                delete node;
            }
        }
    }
    return 0;
}

CNode_s * CConHash::lookupNode_s(const char * object)
{
    if (object == NULL || this->vnodes == 0) return NULL;
    util_rbtree_node_t * rbnode;
    long key = this->func->getHashVal(object);
    /* Find the node in the red-black tree with the smallest key that is not less than key */
    rbnode = util_rbtree_lookup(vnode_tree, key);
    if (rbnode != NULL)
    {
        return ((CVirtualNode_s *) rbnode->data)->getNode_s();
    }
    return NULL;
}

int CConHash::getVNodes()
{
    return this->vnodes;
}


util_rbtree_node_t * vNode2RBNode(CVirtualNode_s * vnode)
{
    if (vnode == NULL) return NULL;
    util_rbtree_node_t *rbnode = new util_rbtree_node_t();
    rbnode->key = vnode->getHash();
    rbnode->data = vnode;
    return rbnode;
}
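Insertion and deletion only work together if both derive exactly the same label for virtual node i of an entity node, because the label determines the hash value that is searched for in the tree. One way to guarantee this is to put the label construction into a single helper that both operations call. The sketch below assumes a hypothetical helper named vnodeLabel; it is not part of the code above.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Builds the label of virtual node `index` of the entity node `iden`,
// for example "machineA" and 12 give "machineA12". The label is built
// from scratch on every call, so it does not depend on earlier calls.
inline std::string vnodeLabel(const std::string& iden, int index) {
    assert(index >= 0);
    char num[16];  // large enough for any 32-bit int in decimal
    std::snprintf(num, sizeof num, "%d", index);
    return iden + num;
}
```

Calling vnodeLabel("machineA", 0) yields "machineA0", and the same call made later during deletion yields the same string again.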

5. Create a client program to test the consistent hashing implementation.

The getIP function generates a random IP address string to simulate a request.

#include <iostream>
#include "CNode_s.h"
#include "CVirtualNode_s.h"
#include "CHashFun.h"
#include "CMD5HashFun.h"
#include "CConHash.h"
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

using namespace std;

/* Writes a random dotted-quad IP string into ip (needs at least 16 bytes) */
void getIP(char * ip)
{
    int a = rand() % 256;
    int b = rand() % 256;
    int c = rand() % 256;
    int d = rand() % 256;
    sprintf(ip, "%d.%d.%d.%d", a, b, c, d);
}

int main()
{
    srand(time(0));
    /* Redirect stdin so that requests are read from a file */
    freopen("OUT.txt", "r", stdin);
    /* Define the hash function */
    CHashFun * func = new CMD5HashFun();
    /* Create the consistent hash object */
    CConHash * conhash = new CConHash(func);

    /* Define the entity nodes */
    CNode_s * node1 = new CNode_s("machineA", 50, (void *)"10.3.0.201");
    CNode_s * node2 = new CNode_s("machineB", 80, (void *)"10.3.0.202");
    CNode_s * node3 = new CNode_s("machineC", 20, (void *)"10.3.0.203");
    CNode_s * node4 = new CNode_s("machineD", 100, (void *)"10.3.0.204");

    conhash->addNode_s(node1);
    conhash->addNode_s(node2);
    conhash->addNode_s(node3);
    conhash->addNode_s(node4);

    /* Change a node's data value dynamically */
    node1->setData((void *)"99999999");

    int ans1, ans2, ans3, ans4;
    ans1 = ans2 = ans3 = ans4 = 0;

    char object[100];
    CNode_s * node;
    /* Delete a node dynamically */
    conhash->delNode_s(node2);
    for (int i = 0; i < 30; i++)
    {
        getIP(object);
        cout << object << endl;
        /* Read the next request from the redirected stdin; this replaces
           the value that getIP generated */
        cin >> object;
        node = conhash->lookupNode_s(object);
        if (node != NULL)
        {
            cout << object << "----->\t" << node->getIden() << "\t" << (char *)node->getData() << endl;
            if (strcmp(node->getIden(), "machineA") == 0) ans1++;
            if (strcmp(node->getIden(), "machineB") == 0) ans2++;
            if (strcmp(node->getIden(), "machineC") == 0) ans3++;
            if (strcmp(node->getIden(), "machineD") == 0) ans4++;
        }
    }

    cout << "Total test cases: " << ans1 + ans2 + ans3 + ans4 << endl;
    cout << "Map to machineA: " << ans1 << endl;
    cout << "Map to machineB: " << ans2 << endl;
    cout << "Map to machineC: " << ans3 << endl;
    cout << "Map to machineD: " << ans4 << endl;
    fclose(stdin);
    return 0;
}
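As an alternative to rand() and manual string building, the request generator can be written with the C++11 <random> facilities, which makes a test run reproducible when the generator is seeded with a fixed value. This is a sketch under that assumption, and the name randomIp is hypothetical.

```cpp
#include <cassert>
#include <cstdio>
#include <random>
#include <string>

// Builds a dotted-quad string from four octets drawn uniformly from 0..255.
// `rng` is any standard random number engine, for example std::mt19937.
template <typename Rng>
std::string randomIp(Rng& rng) {
    std::uniform_int_distribution<int> octet(0, 255);
    int a = octet(rng);
    int b = octet(rng);
    int c = octet(rng);
    int d = octet(rng);
    char buf[16];  // "255.255.255.255" is 15 characters plus the terminator
    int n = std::snprintf(buf, sizeof buf, "%d.%d.%d.%d", a, b, c, d);
    assert(n > 0 && n < static_cast<int>(sizeof buf));
    (void)n;  // avoids an unused-variable warning when assertions are disabled
    return std::string(buf);
}
```

Seeding an engine such as std::mt19937 with a constant and calling randomIp(rng) repeatedly produces the same sequence of requests on every run with the same standard library, which helps when comparing the routing before and after a node is removed.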

6. Test the impact of deleting a node on hash routing

Test Results screenshot:

Analysis: Of the two screenshots above, the left one shows the routing with the original four entity nodes, and the other shows the routing after node 2 (node2) was deleted. After machineB goes down, the requests that used to be routed to it are spread fairly evenly over the other machines, and requests that were routed to other nodes are unaffected. For example, the request 139.149.184.125 is still routed to machineD and is not affected by the removal. Adding an entity node, however, can change the routing of some requests, because the new virtual nodes split existing intervals on the ring, but the impact is not particularly large. The results also show that the share of virtual nodes held by each entity node strongly affects how the load is distributed: the share of requests a node receives is roughly proportional to its number of virtual nodes.
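The stability observed in the test can also be checked programmatically. The sketch below is a self-contained illustration, not the article's CConHash class: it assumes std::map as the ring and std::hash as a stand-in for MD5, and the names MiniRing and movedKeys are hypothetical. It counts the requests that change owner when one node is removed, ignoring the requests that belonged to the removed node.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

class MiniRing {
    typedef std::map<std::size_t, std::string> Points;
public:
    // Places `vnodes` virtual nodes for `name`; taken positions are skipped.
    void addNode(const std::string& name, int vnodes) {
        for (int i = 0; i < vnodes; ++i)
            points_.insert(std::make_pair(position(name + std::to_string(i)), name));
    }
    // Removes the positions that `name` owns; other owners are left alone.
    void removeNode(const std::string& name, int vnodes) {
        for (int i = 0; i < vnodes; ++i) {
            Points::iterator it = points_.find(position(name + std::to_string(i)));
            if (it != points_.end() && it->second == name) points_.erase(it);
        }
    }
    // Routes `key` clockwise to the owner of the next position on the ring.
    const std::string& lookup(const std::string& key) const {
        assert(!points_.empty());
        Points::const_iterator it = points_.lower_bound(position(key));
        if (it == points_.end()) it = points_.begin();
        return it->second;
    }
private:
    static std::size_t position(const std::string& s) {
        return std::hash<std::string>()(s);  // stand-in for MD5
    }
    Points points_;
};

// Counts the keys whose owner changes when `removed` leaves the ring,
// ignoring keys that were owned by `removed` itself (those have to move).
// The ring is taken by value, so the caller's ring is not modified.
inline int movedKeys(MiniRing ring, const std::vector<std::string>& keys,
                     const std::string& removed, int vnodes) {
    std::vector<std::string> before;
    for (std::size_t i = 0; i < keys.size(); ++i)
        before.push_back(ring.lookup(keys[i]));
    ring.removeNode(removed, vnodes);
    int moved = 0;
    for (std::size_t i = 0; i < keys.size(); ++i)
        if (before[i] != removed && ring.lookup(keys[i]) != before[i]) ++moved;
    return moved;
}
```

For any set of keys, movedKeys returns 0: removing a node only deletes positions from the ring, so the next position clockwise from a key that was owned by a surviving node stays where it was.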

Summary:

This article first analyzed the key choices behind a consistent hashing implementation, namely the data structure and the hash algorithm: a red-black tree stores the virtual nodes, and MD5 serves as the hash function that computes each node's hash value. It then implemented the algorithm in C++, covering the basic operations of adding, deleting, and looking up entity nodes, and tested and analyzed the result. Because the author's experience is limited, there is still much to improve, so this article is offered only for reference, discussion, and study.
