Storing the data for different QQ numbers on different machines spreads the load. Suppose we have 1 million QQ numbers and 10 machines: how should we divide them?
The simplest, crudest method is to take each QQ number modulo 10; the result, 0 to 9, selects one of the 10 machines. For example, the user with QQ number 23900 is stored on machine 0, the user with 23901 on machine 1, and so on. Then the problem arrives: QQ users surge from 1 million to 5 million, 10 machines are clearly no longer enough, so we expand to 25. Now we find that the existing data is all in the wrong place, because most QQ numbers map to a different machine than before. We're finished! Time to run away...
One measure of a hash algorithm is monotonicity, defined as follows:
Monotonicity means that if some content has already been assigned to buffers by hashing and a new buffer is then added to the system, the hash result should guarantee that previously assigned content is mapped either to its original buffer or to the new buffer, and never to a different buffer in the old set.
It is easy to see that the simple hash(object) % N above fails this requirement: once N changes, most objects are remapped, many of them onto other old buffers.
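To put numbers on the remapping problem, here is a small Java sketch. It assumes, hypothetically, that the QQ numbers are exactly 0 to 999,999 and that each number serves as its own hash, so the machine is simply `qq % N`; the old machines keep indices 0 to 9 and the 15 new ones get 10 to 24. The class and method names are illustrative only. It counts how many numbers change machine when going from 10 to 25 machines, and how many of those land on another old machine, which is exactly what monotonicity forbids.

```java
public class ModuloShardingDemo {
    // Plain modulo sharding: hash(object) % N, using the QQ number itself as the hash.
    static int machineFor(long qq, int machines) {
        return (int) (qq % machines);
    }

    // Numbers in [0, total) whose machine changes when going from oldN to newN machines.
    static long countMoved(long total, int oldN, int newN) {
        long moved = 0;
        for (long qq = 0; qq < total; qq++) {
            if (machineFor(qq, oldN) != machineFor(qq, newN)) {
                moved++;
            }
        }
        return moved;
    }

    // Moved numbers that land on another *old* machine (index < oldN): monotonicity violations.
    static long countMovedToOldMachine(long total, int oldN, int newN) {
        long violations = 0;
        for (long qq = 0; qq < total; qq++) {
            int before = machineFor(qq, oldN);
            int after = machineFor(qq, newN);
            if (before != after && after < oldN) {
                violations++;
            }
        }
        return violations;
    }

    public static void main(String[] args) {
        System.out.println(machineFor(23900, 10)); // 0
        System.out.println(machineFor(23901, 10)); // 1
        System.out.println(countMoved(1_000_000, 10, 25)); // 800000
        System.out.println(countMovedToOldMachine(1_000_000, 10, 25)); // 200000
    }
}
```

The pattern of `qq % 10` and `qq % 25` repeats every 50 numbers, and only 10 of every 50 keep their machine, so 80% of the data has to move; a quarter of the moved data even hops from one old machine to another.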
So how can we keep the data reasonably dispersed and still be able to scale out? This is what consistent hashing does. A consistent hashing algorithm maps each value to a 32-bit key, that is, into the numeric space 0 ~ 2^32-1. We can think of this space as a ring whose head (0) joins its tail (2^32-1). Both the machines (hashed from, for example, their IP) and the data (hashed from the QQ number) are placed on this ring. When a piece of data comes in, we walk clockwise from its position to the nearest node; that node is the machine we want. For example:
Hash("192.168.128.670") ----> A  // node position generated from the server IP
Hash("192.168.148.670") ----> C  // node position generated from the server IP
Hash("81288812") ----> K1  // key position generated from the QQ number; walk clockwise to find its machine
Hash("8121243812") ----> K4  // key position generated from the QQ number; walk clockwise to find its machine
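A minimal sketch of the ring in Java, without virtual nodes yet. A `TreeMap` keyed by ring position gives the clockwise walk almost for free: `ceilingEntry` finds the first node at or after a key's position, and `firstEntry` handles wrapping past 2^32-1. The class name `SimpleRing` is hypothetical, and CRC32 (`java.util.zip.CRC32`) is swapped in as the 32-bit hash only because its value already lies in 0 ~ 2^32-1; it is not the hash this article settles on.

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.TreeMap;
import java.util.zip.CRC32;

public class SimpleRing {
    // Ring position (0 .. 2^32-1) -> node name; one point per machine, no virtual nodes yet.
    private final TreeMap<Long, String> ring = new TreeMap<>();

    // Assumption: CRC32 stands in for the hash; its value already lies in 0 .. 2^32-1.
    static long hash32(String s) {
        CRC32 crc = new CRC32();
        crc.update(s.getBytes(StandardCharsets.UTF_8));
        return crc.getValue();
    }

    void addNode(String node) {
        ring.put(hash32(node), node);
    }

    void removeNode(String node) {
        ring.remove(hash32(node), node); // removes the entry only if it still belongs to this node
    }

    // Walk clockwise: first node at or after the key's position, wrapping to the start.
    // Assumes the ring holds at least one node.
    String nodeFor(String key) {
        Map.Entry<Long, String> e = ring.ceilingEntry(hash32(key));
        return (e != null ? e : ring.firstEntry()).getValue();
    }

    public static void main(String[] args) {
        SimpleRing r = new SimpleRing();
        r.addNode("192.168.128.670");
        r.addNode("192.168.148.670");
        System.out.println("81288812 -> " + r.nodeFor("81288812"));
        System.out.println("8121243812 -> " + r.nodeFor("8121243812"));
    }
}
```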
So when a machine is added or removed, only a fraction of the data is affected. This looks perfect, but suppose the data on node B surges and B goes down: all of its data falls onto C. C cannot carry the load and goes down too, so all the data falls onto D, and so on, until everything has gone down. The whole world is quiet!!!
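Both claims above can be checked with a small experiment. The sketch below (hypothetical class `RingChurnDemo`, CRC32 again standing in for the hash, the strings "0" to "99999" standing in for QQ numbers) verifies that adding node D only moves keys onto D, and that removing node B hands every one of B's keys to a single node, its clockwise successor, which is exactly how the cascade starts.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.zip.CRC32;

public class RingChurnDemo {
    // Assumption: CRC32 stands in for the 32-bit hash (values in 0 .. 2^32-1).
    static long hash32(String s) {
        CRC32 crc = new CRC32();
        crc.update(s.getBytes(StandardCharsets.UTF_8));
        return crc.getValue();
    }

    // One point per node, no virtual nodes.
    static TreeMap<Long, String> ringOf(List<String> nodes) {
        TreeMap<Long, String> ring = new TreeMap<>();
        for (String node : nodes) {
            ring.put(hash32(node), node);
        }
        return ring;
    }

    // Clockwise lookup with wrap-around.
    static String lookup(TreeMap<Long, String> ring, String key) {
        Map.Entry<Long, String> e = ring.ceilingEntry(hash32(key));
        return (e != null ? e : ring.firstEntry()).getValue();
    }

    // True if every key that changes owner after adding newNode moves onto newNode (monotonicity).
    static boolean onlyMovesToNewNode(List<String> nodes, String newNode, int keys) {
        TreeMap<Long, String> before = ringOf(nodes);
        TreeMap<Long, String> after = new TreeMap<>(before);
        after.put(hash32(newNode), newNode);
        for (int k = 0; k < keys; k++) {
            String key = Integer.toString(k);
            String owner = lookup(after, key);
            if (!owner.equals(lookup(before, key)) && !owner.equals(newNode)) {
                return false;
            }
        }
        return true;
    }

    // The set of nodes that take over the keys of a removed node.
    static Set<String> heirsOf(List<String> nodes, String removed, int keys) {
        TreeMap<Long, String> before = ringOf(nodes);
        TreeMap<Long, String> after = new TreeMap<>(before);
        after.remove(hash32(removed), removed);
        Set<String> heirs = new HashSet<>();
        for (int k = 0; k < keys; k++) {
            String key = Integer.toString(k);
            if (lookup(before, key).equals(removed)) {
                heirs.add(lookup(after, key));
            }
        }
        return heirs;
    }

    public static void main(String[] args) {
        List<String> nodes = Arrays.asList("A", "B", "C");
        System.out.println(onlyMovesToNewNode(nodes, "D", 100_000)); // true
        System.out.println(heirsOf(nodes, "B", 100_000)); // at most one node: B's clockwise successor
    }
}
```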
Clearly, the service collapses here because the data is not evenly distributed, so consistent hashing also needs balance.
Balance means that the hash results should be spread across all buffers as evenly as possible, so that all buffer space is put to use.
To achieve balance, consistent hashing introduces virtual nodes. A virtual node is a replica of an actual node in the hash space: one real node corresponds to several virtual nodes, and this number is called the replication number. Virtual nodes are placed in the hash space according to their hash values. So if we have 25 servers with 10 virtual nodes each, there are 250 virtual nodes in total. Because each real node's virtual nodes are scattered around the ring, no single node takes too much load, and when a node fails its load is shared among the others instead of piling onto one neighbor. Now the system can carry it!!!
Hash("192.168.128.670#36kr01") ----> A  // virtual node position generated from the server IP plus a suffix
Hash("192.168.128.670#36kr02") ----> B  // virtual node position generated from the server IP plus a suffix
Hash("192.168.128.670#36kr03") ----> B  // virtual node position generated from the server IP plus a suffix
......
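A sketch of how virtual nodes break up that cascade, assuming 5 hypothetical real nodes A to E and keys "0" to "19999". Each real node gets `replicas` points on the ring, named with a hypothetical `node#i` scheme in the spirit of the examples above; ring positions are taken from the first 4 bytes of MD5 (`java.security.MessageDigest`), a stand-in hash rather than the MurmurHash used in the final code. The hypothetical `heirCount` reports how many surviving real nodes take over the keys of a removed node.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

public class VirtualNodeDemo {
    // Assumption: first 4 bytes of MD5 as an unsigned 32-bit ring position (stand-in hash).
    static long hash32(String s) {
        try {
            byte[] d = MessageDigest.getInstance("MD5").digest(s.getBytes(StandardCharsets.UTF_8));
            return ((d[0] & 0xFFL) << 24) | ((d[1] & 0xFFL) << 16) | ((d[2] & 0xFFL) << 8) | (d[3] & 0xFFL);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is required on every Java platform", e);
        }
    }

    // Each real node gets `replicas` virtual nodes; hypothetical naming scheme "node#i".
    static TreeMap<Long, String> ringOf(List<String> nodes, int replicas) {
        TreeMap<Long, String> ring = new TreeMap<>();
        for (String node : nodes) {
            for (int i = 0; i < replicas; i++) {
                ring.put(hash32(node + "#" + i), node);
            }
        }
        return ring;
    }

    // Clockwise lookup with wrap-around; returns the real node behind the virtual node.
    static String lookup(TreeMap<Long, String> ring, String key) {
        Map.Entry<Long, String> e = ring.ceilingEntry(hash32(key));
        return (e != null ? e : ring.firstEntry()).getValue();
    }

    // Number of surviving real nodes that absorb the removed node's keys.
    static int heirCount(List<String> nodes, String removed, int replicas, int keys) {
        TreeMap<Long, String> before = ringOf(nodes, replicas);
        TreeMap<Long, String> after = new TreeMap<>(before);
        after.values().removeIf(removed::equals); // drop all of its virtual nodes
        Set<String> heirs = new HashSet<>();
        for (int k = 0; k < keys; k++) {
            String key = Integer.toString(k);
            if (lookup(before, key).equals(removed)) {
                heirs.add(lookup(after, key));
            }
        }
        return heirs.size();
    }

    public static void main(String[] args) {
        List<String> nodes = Arrays.asList("A", "B", "C", "D", "E");
        System.out.println("1 virtual node each:    " + heirCount(nodes, "B", 1, 20_000));
        System.out.println("100 virtual nodes each: " + heirCount(nodes, "B", 100, 20_000));
    }
}
```

With one point per node, the removed node's keys all go to a single heir, as in the B, C, D collapse; with 100 virtual nodes each, they should be spread over several survivors.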
Our final solution combines virtual nodes with MurmurHash:
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;

class Shard<S> { // S encapsulates machine node information such as name, password, IP, port, etc.
    private TreeMap<Long, S> nodes; // virtual nodes
    private final List<S> shards; // real machine nodes
    private static final int NODE_NUM = 100; // number of virtual nodes associated with each machine node

    public Shard(List<S> shards) {
        this.shards = shards;
        init();
    }

    private void init() { // initialize the consistent hash ring
        nodes = new TreeMap<Long, S>();
        for (int i = 0; i != shards.size(); ++i) { // each real machine node needs its associated virtual nodes
            final S shardInfo = shards.get(i);
            for (int n = 0; n < NODE_NUM; n++) { // one real machine node is associated with NODE_NUM virtual nodes
                nodes.put(hash("shard-" + i + "-node-" + n), shardInfo);
            }
        }
    }

    public S getShardInfo(String key) {
        SortedMap<Long, S> tail = nodes.tailMap(hash(key)); // walk the ring clockwise to find a virtual node
        if (tail.isEmpty()) { // past the last virtual node: wrap around to the first one
            return nodes.get(nodes.firstKey());
        }
        return tail.get(tail.firstKey()); // return the real machine node behind that virtual node
    }

    /**
     * MurmurHash: a non-cryptographic, high-performance hash algorithm. It is much faster than
     * traditional hashes such as CRC32, MD5 and SHA-1 (the latter two are cryptographic hashes whose
     * inherent complexity inevitably costs performance), and its collision rate is reportedly very low.
     * http://murmurhash.googlepages.com/
     */
    private long hash(String key) {
        ByteBuffer buf = ByteBuffer.wrap(key.getBytes(StandardCharsets.UTF_8)); // fixed charset, not the platform default
        int seed = 0x1234ABCD;

        ByteOrder byteOrder = buf.order();
        buf.order(ByteOrder.LITTLE_ENDIAN);

        long m = 0xc6a4a7935bd1e995L;
        int r = 47;

        long h = seed ^ (buf.remaining() * m);

        long k;
        while (buf.remaining() >= 8) { // mix 8 bytes at a time
            k = buf.getLong();
            k *= m;
            k ^= k >>> r;
            k *= m;
            h ^= k;
            h *= m;
        }

        if (buf.remaining() > 0) { // mix the remaining 1..7 bytes
            ByteBuffer finish = ByteBuffer.allocate(8).order(ByteOrder.LITTLE_ENDIAN);
            // For a big-endian version, do this first:
            // finish.position(8 - buf.remaining());
            finish.put(buf).rewind();
            h ^= finish.getLong();
            h *= m;
        }

        h ^= h >>> r; // final avalanche
        h *= m;
        h ^= h >>> r;

        buf.order(byteOrder);
        return h;
    }
}
Consistent hashing: a Java implementation based on TreeMap