Introduction to the consistent hash algorithm
The consistent hash algorithm is a distributed hashing scheme proposed at MIT in 1997. It was designed to solve the hot spot problem on the Internet.
The consistent hash algorithm proposes four criteria for judging a hash algorithm in a dynamically changing cache environment:
Balance: the results of the hash should be distributed across all nodes as evenly as possible, so that load balancing is addressed at the level of the algorithm.
Monotonicity: adding or removing nodes should not disturb the normal operation of the system. More precisely, when a node is added, existing content either stays on its original node or moves to the new node; it is not remapped from one old node to another.
Spread: data should be scattered across the nodes of the distributed cluster (nodes can have backups), and no node should have to store all the data. Because different terminals may see different sets of nodes, the same content may be mapped to different nodes by different terminals, and a good algorithm keeps this to a minimum.
Load: load looks at the spread problem from another angle. Since different terminals may map the same content to different nodes, a particular node may also be assigned different content by different terminals. As with spread, this should be avoided as far as possible, so a good hash algorithm should keep the load on each node as low as possible.
A simple hash algorithm
Hashing is a common technique for distributing data: compute a hash value, take it modulo the number of nodes, and map the data to a storage location accordingly. With a storage space composed of N storage nodes, simple hashing maps a data object to a node with the formula hash(object) % N. Because the calculation is so simple, simple hashing has several disadvantages:
Low update efficiency when nodes are added or removed. When the number of storage nodes in the system increases or decreases, the mapping formula becomes hash(object) % (N±1), which changes the mapped location of almost every object. The mapping for all of the system's data has to be recalculated, and while that happens the system cannot respond normally to external requests, which can leave it effectively down.
Poor balance, with no consideration of performance differences between nodes. As hardware improves, newly added nodes tend to have more capacity. How to adapt the algorithm so that this capacity is put to good use is also a pressing problem.
Lack of monotonicity.
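The cost of changing N can be made concrete with a small count. The following is a minimal sketch, not part of the original article: it assumes the integers 0 to 9999 stand in for hash values (the class and method names are hypothetical) and counts how many of them land on a different node when a cluster grows from 4 to 5 nodes.

```java
import java.util.stream.IntStream;

class ModuloRemapDemo {
    // Node chosen for a hash value under simple modulo placement.
    static int nodeFor(int hash, int nodeCount) {
        return Math.floorMod(hash, nodeCount);
    }

    // Counts how many of the hash values 0..keyCount-1 land on a different
    // node when the cluster changes from oldN to newN nodes.
    // Assumption: the integers themselves stand in for hash(object).
    static long movedKeys(int keyCount, int oldN, int newN) {
        return IntStream.range(0, keyCount)
                .filter(h -> nodeFor(h, oldN) != nodeFor(h, newN))
                .count();
    }

    public static void main(String[] args) {
        int keys = 10_000;
        long moved = movedKeys(keys, 4, 5);
        // prints "8000 of 10000 keys change node"
        System.out.println(moved + " of " + keys + " keys change node");
    }
}
```

Only hash values whose remainders agree modulo both 4 and 5 stay put, which is 4 out of every 20 values, so 80% of the keys are relocated by adding a single node.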
Principle of the consistent hash algorithm
Put simply, when a server is removed or added, consistent hashing changes the existing mapping between service requests and the servers that process them as little as possible, and so satisfies the monotonicity requirement as far as it can.
In a common distributed cluster, service requests and processing servers can be paired directly; that is, the mapping between a service request and its processing server is fixed, and a given request is always handled by the same server. This approach cannot load-balance the system as a whole: some servers may be too busy to handle new requests while others sit idle, overall resource utilization is low, and when a server in the cluster goes down, the requests mapped to it go unhandled.
A further improvement is to use a hash algorithm to map service requests to processing servers, so that requests are allocated dynamically. The usual hash algorithm takes a simple modulo, and the result identifies the server that handles the request. This works well as long as the set of nodes does not change, but when nodes come and go it clearly does not satisfy the monotonicity requirement (when a machine is added or removed, nearly all stored content is rehashed to a new location).
A well-designed distributed system should have good monotonicity; that is, adding or removing servers should not cause large-scale rehashing and relocation. Consistent hashing solves this problem.
The basic principle of the consistent hash algorithm is to map both machine nodes and keys onto a ring of values from 0 to 2^32 - 1, using the same hash algorithm for both. When a write request arrives, compute hash(K) for its key K. If the value coincides exactly with the hash value of a machine node, write to that machine directly. Otherwise, search clockwise for the next node and write there. If the search passes 2^32 - 1 without finding a node, it continues from 0.
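The clockwise search with wraparound can be sketched in a few lines. This is an illustrative sketch only: the ring positions 100, 200, and 300 are hand-picked stand-ins for node hash values, and the class and method names are hypothetical. It relies on TreeMap.ceilingEntry, which returns the entry with the smallest key greater than or equal to the given key, or null if there is none.

```java
import java.util.Map;
import java.util.TreeMap;

class RingLookupSketch {
    // Ring position -> node name, kept sorted by position.
    private final TreeMap<Long, String> ring = new TreeMap<>();

    void addNode(long position, String node) {
        ring.put(position, node);
    }

    // Returns the node at the key's position, or the next node clockwise.
    String locate(long keyPosition) {
        if (ring.isEmpty()) {
            throw new IllegalStateException("ring has no nodes");
        }
        Map.Entry<Long, String> entry = ring.ceilingEntry(keyPosition);
        if (entry == null) {
            // Walked past the last node: wrap around to the start of the ring.
            entry = ring.firstEntry();
        }
        return entry.getValue();
    }

    public static void main(String[] args) {
        RingLookupSketch ring = new RingLookupSketch();
        ring.addNode(100L, "A"); // hand-picked positions, not real hash values
        ring.addNode(200L, "B");
        ring.addNode(300L, "C");
        System.out.println(ring.locate(150L)); // prints "B" (next node clockwise)
        System.out.println(ring.locate(200L)); // prints "B" (exact match)
        System.out.println(ring.locate(350L)); // prints "A" (wraps around)
    }
}
```

In a real ring the positions would come from hashing the node names and the keys with the same function, as the paragraph above describes.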
When there are relatively few machines on the hash ring, they may divide the ring unevenly, so that some machines handle a lot of data while others handle very little. To counter this, each physical node can be mapped to multiple virtual nodes, with the number chosen according to the machine's processing power.
A virtual node is a replica of an actual node (machine) in the hash space. One actual node corresponds to several virtual nodes, and that number is also called the replica count. Virtual nodes are arranged in the hash space by their hash values.
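One way to choose the replica count according to processing power is to let each machine register a different number of virtual nodes. The sketch below is an assumption-laden illustration, not the article's implementation: the node names "small" and "big", the "#VN" suffix, the replica counts, and the use of plain 32-bit FNV-1a as the hash are all choices made here for the example.

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.TreeMap;

class WeightedVirtualNodes {
    // Ring position of a virtual node -> physical node that owns it.
    private final TreeMap<Long, String> ring = new TreeMap<>();

    // Plain 32-bit FNV-1a, returned as an unsigned value in [0, 2^32 - 1].
    // Assumption: any well-mixed hash could be used here instead.
    static long hash(String s) {
        int h = 0x811C9DC5;
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            h ^= (b & 0xff);
            h *= 16777619;
        }
        return Integer.toUnsignedLong(h);
    }

    // Registers `replicas` virtual nodes for one physical node.
    // A more capable machine is given a larger replica count.
    void addNode(String node, int replicas) {
        for (int i = 0; i < replicas; i++) {
            ring.put(hash(node + "#VN" + i), node);
        }
    }

    // Finds the virtual node clockwise from the key and returns its owner.
    String locate(String key) {
        Map.Entry<Long, String> e = ring.ceilingEntry(hash(key));
        return (e != null ? e : ring.firstEntry()).getValue();
    }

    public static void main(String[] args) {
        WeightedVirtualNodes cluster = new WeightedVirtualNodes();
        cluster.addNode("small", 2); // hypothetical low-capacity machine
        cluster.addNode("big", 8);   // hypothetical high-capacity machine
        for (String key : new String[] {"Sun", "Moon", "star"}) {
            System.out.println(key + " -> " + cluster.locate(key));
        }
    }
}
```

Storing the physical node as the map value means the lookup returns the real machine directly, with no need to parse it back out of the virtual node's name.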
With consistent hashing, a newly added machine affects the storage of only one existing machine. For example, if a new node H is added between A and B, some of the data previously handled by B moves to H, while everything handled by the other nodes is unchanged. This shows good monotonicity.
If a machine is deleted, for example node C, the data originally handled by C is transferred to node D, and the other nodes' handling remains unchanged. Because the same hashing algorithm is used for both the machine nodes and the cached data, spread and load are also kept low.
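The behavior on adding a node can be checked mechanically. The sketch below is illustrative and rests on assumptions made here: a ring without virtual nodes, plain 32-bit FNV-1a as the hash, generated keys named "key-0", "key-1", and so on, and hypothetical class and method names. It records each key's owner, adds node H, and verifies that every key whose owner changed now belongs to H.

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.TreeMap;

class MonotonicityCheck {
    // Plain 32-bit FNV-1a as an unsigned value (an assumption of this sketch).
    static long hash(String s) {
        int h = 0x811C9DC5;
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            h ^= (b & 0xff);
            h *= 16777619;
        }
        return Integer.toUnsignedLong(h);
    }

    // Clockwise lookup with wraparound.
    static String locate(TreeMap<Long, String> ring, String key) {
        Map.Entry<Long, String> e = ring.ceilingEntry(hash(key));
        return (e != null ? e : ring.firstEntry()).getValue();
    }

    // Adds newNode to the ring and returns how many keys changed owner.
    // Throws if any key moved to a node other than newNode.
    static int movedAfterAdding(TreeMap<Long, String> ring, String newNode, int keyCount) {
        String[] before = new String[keyCount];
        for (int i = 0; i < keyCount; i++) {
            before[i] = locate(ring, "key-" + i);
        }
        ring.put(hash(newNode), newNode);
        int moved = 0;
        for (int i = 0; i < keyCount; i++) {
            String after = locate(ring, "key-" + i);
            if (!after.equals(before[i])) {
                if (!after.equals(newNode)) {
                    throw new IllegalStateException("key-" + i + " moved between old nodes");
                }
                moved++;
            }
        }
        return moved;
    }

    public static void main(String[] args) {
        TreeMap<Long, String> ring = new TreeMap<>();
        for (String node : new String[] {"A", "B", "C", "D"}) {
            ring.put(hash(node), node);
        }
        int moved = movedAfterAdding(ring, "H", 1000);
        System.out.println(moved + " of 1000 keys moved, all of them to H");
    }
}
```

Removal is the mirror image: after deleting a node, the only keys that change owner are the ones the deleted node used to hold.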
Introducing virtual nodes also greatly improves balance.
Consistent hash Java implementation without virtual nodes
import java.util.SortedMap;
import java.util.TreeMap;

/**
 * Consistent hash implementation without virtual nodes.
 *
 * Date: October 27, 2017, 11:42:09 AM
 *
 * @author Ron
 */
public class ConsistentHashNoVirtualNode {

    // List of servers to be added to the hash ring
    private static String[] servers = {"192.168.1.0:111", "192.168.1.1:111", "192.168.1.2:111",
            "192.168.1.3:111", "192.168.1.4:111"};

    // Key is the hash value of the server, value is the server
    private static SortedMap<Integer, String> sortedMap = new TreeMap<Integer, String>();

    // Program initialization: put all servers into sortedMap
    static {
        for (int i = 0; i < servers.length; i++) {
            int hash = getHash(servers[i]);
            System.out.println("[" + servers[i] + "] added to the set, its hash value is " + hash);
            sortedMap.put(hash, servers[i]);
        }
        System.out.println();
    }

    // Computes the hash value with the FNV1_32_HASH algorithm rather than by
    // overriding hashCode(); the end result is the same.
    private static int getHash(String str) {
        final int p = 16777619;       // 32-bit prime
        int hash = (int) 2166136261L; // 32-bit offset basis
        for (int i = 0; i < str.length(); i++)
            hash = (hash ^ str.charAt(i)) * p;
        hash += hash << 13;
        hash ^= hash >> 7;
        hash += hash << 3;
        hash ^= hash >> 17;
        hash += hash << 5;

        // If the calculated value is negative, take its absolute value.
        // Math.abs(Integer.MIN_VALUE) is still negative, so map that case to 0.
        if (hash < 0)
            hash = (hash == Integer.MIN_VALUE) ? 0 : Math.abs(hash);
        return hash;
    }

    // Gets the node that the key should be routed to
    private static String getServer(String key) {
        // Get the key's hash value
        int hash = getHash(key);
        // Get the part of the map whose keys are greater than or equal to the hash value
        SortedMap<Integer, String> subMap = sortedMap.tailMap(hash);
        if (subMap.isEmpty()) {
            // No node has a hash value at or beyond the key's, so start again from the first node
            Integer i = sortedMap.firstKey();
            // Return the corresponding server
            return sortedMap.get(i);
        } else {
            // The first key is the nearest node in the clockwise direction
            Integer i = subMap.firstKey();
            // Return the corresponding server
            return subMap.get(i);
        }
    }

    public static void main(String[] args) {
        String[] keys = {"Sun", "Moon", "star", "White Clouds", "Blue Sky"};
        for (int i = 0; i < keys.length; i++)
            System.out.println("[" + keys[i] + "] has a hash value of " + getHash(keys[i])
                    + ", is routed to the node [" + getServer(keys[i]) + "]");
    }
}
Consistent hash Java implementation with virtual nodes
import java.util.LinkedList;
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;

/**
 * Consistent hash implementation with virtual nodes.
 *
 * Date: October 27, 2017, 11:42:51 AM
 *
 * @author Ron
 */
public class ConsistentHashHasVirtualNode {

    // List of servers to be added to the hash ring
    private static String[] servers = {"192.168.1.0:111", "192.168.1.1:111", "192.168.1.2:111",
            "192.168.1.3:111", "192.168.1.4:111"};

    // Real node list. Servers go online and offline, so additions and deletions
    // are frequent; a LinkedList suits that better.
    private static List<String> realNodes = new LinkedList<String>();

    // Virtual nodes: key is the hash value of the virtual node, value is its name
    private static SortedMap<Integer, String> virtualNodes = new TreeMap<Integer, String>();

    // Number of virtual nodes per real node, hard-coded to 5 for demonstration
    private static final int VIRTUAL_NODES = 5;

    static {
        // First add the original servers to the real node list
        for (int i = 0; i < servers.length; i++)
            realNodes.add(servers[i]);

        // Then add the virtual nodes; a for-each loop is the efficient way to traverse a LinkedList
        for (String str : realNodes) {
            for (int i = 0; i < VIRTUAL_NODES; i++) {
                String virtualNodeName = str + "&&VN" + String.valueOf(i);
                int hash = getHash(virtualNodeName);
                System.out.println("Virtual node [" + virtualNodeName + "] is added, hash value is " + hash);
                virtualNodes.put(hash, virtualNodeName);
            }
        }
        System.out.println();
    }

    // Computes the hash value with the FNV1_32_HASH algorithm rather than by
    // overriding hashCode(); the end result is the same.
    private static int getHash(String str) {
        final int p = 16777619;
        int hash = (int) 2166136261L;
        for (int i = 0; i < str.length(); i++)
            hash = (hash ^ str.charAt(i)) * p;
        hash += hash << 13;
        hash ^= hash >> 7;
        hash += hash << 3;
        hash ^= hash >> 17;
        hash += hash << 5;

        // If the calculated value is negative, take its absolute value.
        // Math.abs(Integer.MIN_VALUE) is still negative, so map that case to 0.
        if (hash < 0)
            hash = (hash == Integer.MIN_VALUE) ? 0 : Math.abs(hash);
        return hash;
    }

    // Gets the node that the key should be routed to
    private static String getServer(String key) {
        // Get the key's hash value
        int hash = getHash(key);
        // Get the part of the map whose keys are greater than or equal to the hash value
        SortedMap<Integer, String> subMap = virtualNodes.tailMap(hash);
        String virtualNode;
        if (subMap.isEmpty()) {
            // No node has a hash value at or beyond the key's, so start again from the first node
            Integer i = virtualNodes.firstKey();
            virtualNode = virtualNodes.get(i);
        } else {
            // The first key is the nearest node in the clockwise direction
            Integer i = subMap.firstKey();
            virtualNode = subMap.get(i);
        }
        // Cut the real server name out of the virtual node name
        // (isEmpty() replaces a reference comparison against "")
        if (virtualNode != null && !virtualNode.isEmpty()) {
            return virtualNode.substring(0, virtualNode.indexOf("&&"));
        }
        return null;
    }

    public static void main(String[] args) {
        String[] keys = {"Sun", "Moon", "star", "White Cloud", "Blue Sky"};
        for (int i = 0; i < keys.length; i++)
            System.out.println("[" + keys[i] + "] has a hash value of " + getHash(keys[i])
                    + ", is routed to the node [" + getServer(keys[i]) + "]");
    }
}