Implementing a consistent hash algorithm comes down to two key decisions. One is the data structure used to store and search the nodes, and the other is the hash algorithm applied to the nodes.
First, let's talk about the data structure used to store nodes. From the principle of consistent hashing, we know that nodes sit on a ring: the hash values of nodes A, B, C, and D are arranged in order around it. In other words, the nodes can be stored in a sequence ordered by hash value. When a request point P with a hash value of -2^20 looks for its route endpoint, the algorithm walks clockwise from that hash value to the first node it meets (B). On an ordered structure, this is equivalent to locating the node with the smallest hash value that is not less than the queried key; if no such node exists, the search wraps around to the node with the smallest hash value.
The data structure we select should therefore support frequent insertion and deletion of nodes efficiently and also offer good query performance. A red-black tree meets these requirements. A red-black tree is an approximately balanced binary search tree. Insertion, deletion, and search all take worst-case time proportional to the height of the tree, and the height of a red-black tree is bounded by O(log n), so these operations stay efficient even in the worst case, unlike in an ordinary binary search tree. We therefore use a red-black tree as the storage structure for the nodes. Besides the basic insert, delete, and search operations, we also add one more lookup function, which finds the node with the smallest key that is not less than a given key.
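The clockwise search described above can be sketched with std::map, which keeps its keys sorted and is commonly implemented as a red-black tree. This is only an illustration and not the implementation used in this article: the Ring alias, the locate function, and the sample positions are hypothetical names and values.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical ring: position on the ring -> node name.
// std::map keeps its keys in sorted order.
using Ring = std::map<long, std::string>;

// Return the first node at or after `key`, wrapping around to the
// smallest position when `key` lies beyond the last node.
const std::string& locate(const Ring& ring, long key) {
    assert(!ring.empty());
    Ring::const_iterator it = ring.lower_bound(key);  // first position >= key
    if (it == ring.end()) it = ring.begin();          // wrap around the ring
    return it->second;
}
```

With nodes A, B, C, and D at positions 100, 200, 300, and 400, a key of 150 resolves to B, and a key of 401 wraps around to A.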
Next, let's talk about the choice of hash algorithm. The consistent hash algorithm was initially proposed to solve load-balancing problems across servers. Each physical node owns many virtual nodes, which are the key to load balancing. We want the virtual nodes to be spread evenly over the entire ring, so that requests with different hash values are shared across the nodes and, when a node goes down, the requests that were routed to it are spread evenly over the remaining nodes instead of piling onto a single one. Here we use the MD5 algorithm. MD5 converts an identifier string (used to label a virtual node) into a 16-byte digest, which is then reduced to a single integer hash value. Because MD5 output is highly dispersed, the resulting hash values are also well dispersed and land evenly on the ring.
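To make the role of virtual nodes concrete, here is a rough sketch that places virtual nodes on a 32-bit ring and measures how much of the ring each physical node ends up owning. It makes several assumptions that are not from this article: it uses 32-bit FNV-1a as a stand-in for MD5, the "name#i" label format is hypothetical, and so are the names fnv1a, addVirtualNodes, and arcShare.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Stand-in hash (32-bit FNV-1a); the article itself uses MD5.
uint32_t fnv1a(const std::string& s) {
    uint32_t h = 2166136261u;
    for (unsigned char c : s) { h ^= c; h *= 16777619u; }
    return h;
}

using Ring = std::map<uint32_t, std::string>;  // ring position -> physical node

// Place `count` virtual nodes for `name`. Labels "name#i" are hypothetical.
// On a hash collision the existing point is kept and the new one is skipped.
void addVirtualNodes(Ring& ring, const std::string& name, int count) {
    for (int i = 0; i < count; ++i)
        ring.emplace(fnv1a(name + "#" + std::to_string(i)), name);
}

// Sum, per physical node, the length of the arcs owned by its virtual nodes.
// A virtual node owns the arc from its predecessor (exclusive) to itself.
std::map<std::string, uint64_t> arcShare(const Ring& ring) {
    assert(!ring.empty());
    std::map<std::string, uint64_t> share;
    uint32_t prev = ring.rbegin()->first;  // predecessor of the first point
    for (const auto& vn : ring) {
        uint32_t len = vn.first - prev;    // unsigned wraparound covers the seam
        // len == 0 only happens when the ring holds a single point.
        share[vn.second] += (len == 0) ? (uint64_t(1) << 32) : len;
        prev = vn.first;
    }
    return share;
}
```

The arc lengths always add up to the full ring (2^32). How evenly they are split depends on the quality of the hash function and on the number of virtual nodes per physical node.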
I have implemented the consistent hash algorithm in C++. Some key details are described below.
1. First define the entity (physical) node class and the virtual node class. One physical node corresponds to multiple virtual nodes.
Entity node CNode_s:
/* entity node */
class CNode_s
{
public:
    /* constructors */
    CNode_s();
    CNode_s(char *pIden, int pVNodeCount, void *pData);
    /* get the node identifier */
    const char *getIden();
    /* get the number of virtual nodes of this entity node */
    int getVNodeCount();
    /* set the data value of the entity node */
    void setData(void *data);
    /* get the data value of the entity node */
    void *getData();
private:
    void setCNode_s(char *pIden, int pVNodeCount, void *pData);
    char iden[100];  /* node identifier string */
    int vNodeCount;  /* number of virtual nodes */
    void *data;      /* node data */
};
Virtual node CVirtualNode_s (each virtual node holds a pointer to its entity node):
/* virtual node */
class CVirtualNode_s
{
public:
    /* constructors */
    CVirtualNode_s();
    CVirtualNode_s(CNode_s *pNode);
    /* set the entity node that this virtual node points to */
    void setNode_s(CNode_s *pNode);
    /* get the entity node that this virtual node points to */
    CNode_s *getNode_s();
    /* set the hash value of the virtual node */
    void setHash(long pHash);
    /* get the hash value of the virtual node */
    long getHash();
private:
    long hash;      /* hash value */
    CNode_s *node;  /* entity node that this virtual node points to */
};
2. The hash algorithm is pluggable: a hash function interface is defined so that other algorithms can be added easily in the future.
The MD5 hash class inherits from this interface and calculates the hash value with the MD5 algorithm.
Class diagram:
CHashFun interface:
/* hash function interface, used to calculate the hash value of a node */
class CHashFun
{
public:
    virtual long getHashVal(const char *) = 0;
};
The CMD5HashFun class inherits from the CHashFun interface and implements the getHashVal function that returns the hash value:
/* calculates the hash value of a node with the MD5 algorithm; inherits from CHashFun */
class CMD5HashFun : public CHashFun
{
public:
    virtual long getHashVal(const char *);
};
long CMD5HashFun::getHashVal(const char *instr)
{
    int i;
    long hash = 0;
    unsigned char digest[16];
    /* call the MD5 functions to compute the MD5 digest of instr and store it in digest */
    md5_state_t md5state;
    md5_init(&md5state);
    md5_append(&md5state, (const unsigned char *)instr, strlen(instr));
    md5_finish(&md5state, digest);
    /* every four bytes form one 32-bit integer;
       the four 32-bit integers are added to get the hash value of instr (the sum may overflow) */
    for (i = 0; i < 4; i++)
    {
        hash += ((long)(digest[i * 4 + 3] & 0xFF) << 24)
              | ((long)(digest[i * 4 + 2] & 0xFF) << 16)
              | ((long)(digest[i * 4 + 1] & 0xFF) << 8)
              | ((long)(digest[i * 4 + 0] & 0xFF));
    }
    return hash;
}
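The folding step can also be looked at in isolation. The sketch below is a variation rather than the code above: it uses uint32_t instead of long, so that the overflow of the sum is well-defined wraparound modulo 2^32 and the result does not depend on the platform's width of long. The name foldDigest is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Fold a 16-byte digest into one 32-bit value: read four little-endian
// 32-bit words and add them. Unsigned addition wraps modulo 2^32.
uint32_t foldDigest(const unsigned char* digest, std::size_t len) {
    assert(len == 16);
    uint32_t hash = 0;
    for (int i = 0; i < 4; ++i) {
        uint32_t word = (uint32_t(digest[i * 4 + 3]) << 24)
                      | (uint32_t(digest[i * 4 + 2]) << 16)
                      | (uint32_t(digest[i * 4 + 1]) << 8)
                      |  uint32_t(digest[i * 4 + 0]);
        hash += word;
    }
    return hash;
}
```

For example, a digest whose four words are 1, 2, 3, and 4 folds to 10, and a digest whose first two words are 0xFFFFFFFF and 1 (the rest zero) wraps around to 0.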
3. Extend the red-black tree with a lookup function that finds the node with the smallest key not less than the given key.
util_rbtree_node_t *util_rbtree_lookup(util_rbtree_t *rbtree, long key)
{
    if ((rbtree != NULL) && !util_rbtree_isempty(rbtree))
    {
        util_rbtree_node_t *node = NULL;
        util_rbtree_node_t *temp = rbtree->root;
        util_rbtree_node_t *null = _NULL(rbtree);  /* sentinel node of the tree */
        while (temp != null)
        {
            if (key <= temp->key)
            {
                node = temp;  /* candidate found: remember it and look for a smaller one */
                temp = temp->left;
            }
            else if (key > temp->key)
            {
                temp = temp->right;
            }
        }
        /* if no key is >= the given key, wrap around and return the minimum node */
        return ((node != NULL) ? node : util_rbtree_min(rbtree));
    }
    return NULL;
}
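The traversal logic does not depend on the tree being a red-black tree. As a minimal sketch, here is the same search on a plain, unbalanced binary search tree that uses NULL instead of a sentinel node. The Node struct and the names minNode and ceilingOrMin are hypothetical and not part of the rbtree library used in this article.

```cpp
#include <cassert>
#include <cstddef>

// Minimal unbalanced BST node, a stand-in for the red-black tree node.
struct Node {
    long key;
    Node* left;
    Node* right;
};

// The leftmost node holds the smallest key.
const Node* minNode(const Node* root) {
    while (root != NULL && root->left != NULL) root = root->left;
    return root;
}

// Node with the smallest key >= `key`, or the overall minimum
// (wraparound) when every key in the tree is smaller than `key`.
const Node* ceilingOrMin(const Node* root, long key) {
    const Node* best = NULL;
    for (const Node* cur = root; cur != NULL; ) {
        if (key <= cur->key) { best = cur; cur = cur->left; }  // candidate; try smaller
        else                 { cur = cur->right; }             // too small; go right
    }
    return best != NULL ? best : minNode(root);
}
```

In a tree holding the keys 10, 20, and 30, a search for 25 returns the node with key 30, while a search for 31 wraps around to the node with key 10.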
4. Create the consistent hash class, which supports inserting, deleting, and looking up entity nodes.
The specific algorithm and operation process are described in the code comments.
class CConHash
{
public:
    /* constructor */
    CConHash(CHashFun *pFunc);
    /* set the hash function */
    void setFunc(CHashFun *pFunc);
    /* add an entity node; returns 0 on success and -1 on failure */
    int addNode_s(CNode_s *pNode);
    /* delete an entity node; returns 0 on success and -1 on failure */
    int delNode_s(CNode_s *pNode);
    /* look up the entity node responsible for an object */
    CNode_s *lookupNode_s(const char *object);
    /* get the total number of virtual nodes in the consistent hash structure */
    int getVNodes();
private:
    /* hash function */
    CHashFun *func;
    /* total number of virtual nodes */
    int vNodes;
    /* red-black tree that stores the virtual nodes */
    util_rbtree_t *vnode_tree;
};
/* helper function: wraps a virtual node in a red-black tree node */
util_rbtree_node_t *vNode2RBNode(CVirtualNode_s *vnode);
CConHash::CConHash(CHashFun *pFunc)
{
    /* set the hash function */
    assert(pFunc != NULL);
    this->func = pFunc;
    this->vNodes = 0;
    /* initialize the red-black tree */
    vnode_tree = new util_rbtree_s();
    util_rbtree_init(vnode_tree);
}
int CConHash::addNode_s(CNode_s *pNode)
{
    if (pNode == NULL) return -1;
    int vCount = pNode->getVNodeCount();
    if (vCount <= 0) return -1;
    CVirtualNode_s *virtualNode;
    util_rbtree_node_t *rbNode;
    char str[128];
    long hash = 0;
    /* generate the virtual nodes and insert them into the red-black tree */
    for (int i = 0; i < vCount; i++)
    {
        /* build the label iden + i used to calculate the hash value; the buffer is
           rewritten on every pass (snprintf, from <stdio.h>), because appending with
           strcat on each pass would overflow it for a large vCount */
        snprintf(str, sizeof(str), "%s%d", pNode->getIden(), i);
        hash = func->getHashVal(str);
        if (!util_rbtree_search(vnode_tree, hash))
        {
            /* allocate the virtual node only when its hash is not taken yet,
               so that nothing leaks on a collision */
            virtualNode = new CVirtualNode_s(pNode);
            virtualNode->setHash(hash);
            /* generate a red-black tree node */
            rbNode = vNode2RBNode(virtualNode);
            if (rbNode != NULL)
            {
                /* insert the node into the red-black tree */
                util_rbtree_insert(vnode_tree, rbNode);
                this->vNodes++;
            }
        }
    }
    return 0;
}
int CConHash::delNode_s(CNode_s *pNode)
{
    if (pNode == NULL) return -1;
    util_rbtree_node_t *rbNode;
    char str[128];
    int vCount = pNode->getVNodeCount();
    long hash = 0;
    CVirtualNode_s *node = NULL;
    /* delete all virtual nodes generated by this entity node */
    for (int i = 0; i < vCount; i++)
    {
        /* rebuild the same label that addNode_s used for virtual node i */
        snprintf(str, sizeof(str), "%s%d", pNode->getIden(), i);
        hash = func->getHashVal(str);
        rbNode = util_rbtree_search(vnode_tree, hash);
        if (rbNode != NULL)
        {
            node = (CVirtualNode_s *)rbNode->data;
            if (node->getNode_s() == pNode && node->getHash() == hash)
            {
                this->vNodes--;
                /* delete the node from the red-black tree */
                util_rbtree_delete(vnode_tree, rbNode);
                delete rbNode;
                delete node;
            }
        }
    }
    return 0;
}
CNode_s *CConHash::lookupNode_s(const char *object)
{
    if (object == NULL || this->vNodes == 0) return NULL;
    util_rbtree_node_t *rbNode;
    /* keep the full hash value; storing it in an int could truncate it */
    long key = this->func->getHashVal(object);
    /* find the node with the smallest key not less than key in the red-black tree */
    rbNode = util_rbtree_lookup(vnode_tree, key);
    if (rbNode != NULL)
    {
        return ((CVirtualNode_s *)rbNode->data)->getNode_s();
    }
    return NULL;
}
int CConHash::getVNodes()
{
    return this->vNodes;
}
util_rbtree_node_t *vNode2RBNode(CVirtualNode_s *vnode)
{
    if (vnode == NULL) return NULL;
    util_rbtree_node_t *rbNode = new util_rbtree_node_t();
    rbNode->key = vnode->getHash();
    rbNode->data = vnode;
    return rbNode;
}
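Each virtual node needs its own label before it can be hashed. One way to build such a label is to format the identifier and the index into the buffer from scratch on every call, so the buffer never grows from one virtual node to the next. The helper below is a sketch under that assumption. The name makeVNodeLabel is hypothetical and is not part of the classes above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>

// Write "<iden><index>" into buf, for example "machineA7".
// Returns false if the label does not fit (the buffer then holds a
// truncated, NUL-terminated prefix).
bool makeVNodeLabel(char* buf, std::size_t size, const char* iden, int index) {
    assert(buf != NULL && iden != NULL && size > 0);
    int n = std::snprintf(buf, size, "%s%d", iden, index);
    return n >= 0 && static_cast<std::size_t>(n) < size;
}
```

Adding and deleting an entity node must produce identical labels, otherwise the hash values computed during deletion will not match the ones stored in the tree.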
5. Create a client program to test the consistent hash algorithm.
A getIP function is written to generate a random IP string.
#include <iostream>
#include "CNode_s.h"
#include "CVirtualNode_s.h"
#include "CHashFun.h"
#include "CMD5HashFun.h"
#include "CConHash.h"
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
using namespace std;
/* write a random dotted-quad IP string into IP (needs room for at least 16 chars) */
void getIP(char *IP)
{
    int a = rand() % 256;
    int b = rand() % 256;
    int c = rand() % 256;
    int d = rand() % 256;
    sprintf(IP, "%d.%d.%d.%d", a, b, c, d);
}
int main()
{
    srand(time(0));
    /* read the test keys from out.txt through stdin */
    freopen("out.txt", "r", stdin);
    /* define the hash function */
    CHashFun *func = new CMD5HashFun();
    /* create the consistent hash object */
    CConHash *conhash = new CConHash(func);
    /* define the entity nodes (the casts are needed because string literals are const) */
    CNode_s *node1 = new CNode_s((char *)"machineA", 50, (void *)"10.3.0.201");
    CNode_s *node2 = new CNode_s((char *)"machineB", 80, (void *)"10.3.0.202");
    CNode_s *node3 = new CNode_s((char *)"machineC", 20, (void *)"10.3.0.203");
    CNode_s *node4 = new CNode_s((char *)"machineD", 100, (void *)"10.3.0.204");
    conhash->addNode_s(node1);
    conhash->addNode_s(node2);
    conhash->addNode_s(node3);
    conhash->addNode_s(node4);
    /* dynamically change the data value of a node */
    // node1->setData("99999999");
    int ans1, ans2, ans3, ans4;
    ans1 = ans2 = ans3 = ans4 = 0;
    char object[100];
    CNode_s *node;
    /* dynamically delete a node */
    // conhash->delNode_s(node2);
    for (int i = 0; i < 30; i++)
    {
        // getIP(object);
        // cout << object << endl;
        if (!(cin >> object)) break;  /* stop when the input runs out */
        node = conhash->lookupNode_s(object);
        if (node != NULL)
        {
            cout << object << "----->\t" << node->getIden() << "\t" << (char *)node->getData() << endl;
            if (strcmp(node->getIden(), "machineA") == 0) ans1++;
            if (strcmp(node->getIden(), "machineB") == 0) ans2++;
            if (strcmp(node->getIden(), "machineC") == 0) ans3++;
            if (strcmp(node->getIden(), "machineD") == 0) ans4++;
        }
    }
    cout << "Total test cases:" << ans1 + ans2 + ans3 + ans4 << endl;
    cout << "Map to MachineA:" << ans1 << endl;
    cout << "Map to MachineB:" << ans2 << endl;
    cout << "Map to MachineC:" << ans3 << endl;
    cout << "Map to MachineD:" << ans4 << endl;
    fclose(stdin);
    return 0;
}
6. Test the impact of deleting a node on hash routing.
Test results:
Analysis: Of the two result images, the first shows the routing with the original four physical nodes and the second shows the routing after node 2 (node2, MachineB) is deleted. It is easy to see that after MachineB goes down, the requests it used to serve are spread fairly evenly over the other machines, and the requests that were originally routed to other nodes are not affected. For example, the request 139.149.184.125 is still routed to MachineD and is not affected by the removal. Adding a physical node, by contrast, can change where some requests are routed, because the intervals on the ring become narrower, but the overall impact is small. We can also see that the number of virtual nodes assigned to each physical node strongly affects its load: the share of requests a node receives is roughly proportional to its number of virtual nodes.
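The observation that deleting a node does not disturb requests routed to other nodes can be checked mechanically. The sketch below is not the article's implementation and makes its own assumptions: the ring is a std::map, 32-bit FNV-1a stands in for MD5, and the Ring class, the "name#i" labels, and movedUnrelatedKeys are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Stand-in hash (32-bit FNV-1a); the article itself uses MD5.
uint32_t fnv1a(const std::string& s) {
    uint32_t h = 2166136261u;
    for (unsigned char c : s) { h ^= c; h *= 16777619u; }
    return h;
}

class Ring {
public:
    // Place `vcount` virtual nodes for `node`; colliding points are skipped.
    void add(const std::string& node, int vcount) {
        for (int i = 0; i < vcount; ++i)
            points_.emplace(fnv1a(node + "#" + std::to_string(i)), node);
    }
    // Remove every virtual node that belongs to `node`.
    void remove(const std::string& node) {
        for (auto it = points_.begin(); it != points_.end(); ) {
            if (it->second == node) it = points_.erase(it);
            else ++it;
        }
    }
    // First virtual node at or after the key's hash, wrapping around.
    std::string lookup(const std::string& key) const {
        assert(!points_.empty());
        auto it = points_.lower_bound(fnv1a(key));
        if (it == points_.end()) it = points_.begin();
        return it->second;
    }
private:
    std::map<uint32_t, std::string> points_;  // ring position -> physical node
};

// Count keys that were NOT routed to `removed` before, yet change owner
// once `removed` is deleted. The ring is taken by value, so the caller's
// ring stays untouched.
int movedUnrelatedKeys(Ring ring, const std::string& removed, int keyCount) {
    std::map<std::string, std::string> before;
    for (int i = 0; i < keyCount; ++i) {
        std::string key = "key" + std::to_string(i);
        before[key] = ring.lookup(key);
    }
    ring.remove(removed);
    int moved = 0;
    for (const auto& kv : before)
        if (kv.second != removed && ring.lookup(kv.first) != kv.second) ++moved;
    return moved;
}
```

The count is zero regardless of the hash function: a key that was not routed to the deleted node still finds the same virtual node as its first clockwise neighbour, because deleting points never places a new point in front of it.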
Summary:
This article first analyzed the two key choices in implementing a consistent hash algorithm, the data structure and the hash algorithm: a red-black tree was selected as the storage structure for the virtual nodes, and MD5 was used as the hash function to calculate the hash value of each node. The algorithm was then implemented in C++, covering the basic operations of adding, deleting, and looking up entity nodes, and a test and analysis were carried out. My own level is limited and there is still much to improve, so this article is offered for reference and discussion.