I. Overview
1. Our memcache client (here I read the spymemcached source) uses the Ketama consistent hash algorithm to select the data storage node. A conventional hash algorithm only hashes the key we want to store and assigns it to a node. The consistent hash algorithm also computes a hash for each server on which data may be stored, and then uses both to determine where each key lives.
2. The conventional hash algorithm in practice, and its drawbacks
The most common approach is hash modulo: if the cluster has n machines, a request with key k is simply routed to the machine numbered hash(k) mod n. This scheme is simple and practical, but in fast-growing web systems it has drawbacks. As access pressure increases, the cache cluster must add machine nodes to raise its speed and data-carrying capacity. Under the modulo scheme, adding a node invalidates the locations of a large number of cached entries: the cache must be rebuilt, or even migrated wholesale, which momentarily puts very high load on the DB and can even bring the DB servers down.
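A small sketch makes the drawback concrete: under assumed cluster sizes of 4 and 5 machines (the key count and identity hash here are purely illustrative), we can count how many keys the modulo scheme reroutes when a single node is added.

```java
public class ModuloRemapDemo {
    public static void main(String[] args) {
        int keys = 10_000;
        int moved = 0;
        for (int k = 0; k < keys; k++) {
            int before = k % 4; // hash(k) mod n, with hash(k) = k for illustration
            int after  = k % 5; // hash(k) mod (n+1) after adding one machine
            if (before != after) moved++;
        }
        // With modulo placement, roughly n/(n+1) of all keys change machines.
        System.out.printf("%d of %d keys changed node (%.0f%%)%n",
                moved, keys, 100.0 * moved / keys);
    }
}
```

Here 8,000 of the 10,000 keys (80%) land on a different machine, which is exactly the mass invalidation described above.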
3. In distributed cache system design, what problems does the consistent hash algorithm solve for us?
The core goals of distributed cache design: when designing a distributed caching system, we need the keys to be distributed evenly, and after a cache server is added, the amount of cached data migrated should be minimal.
The Ketama consistent hash algorithm mentioned here works as follows: selecting a machine node depends not only on the hash of the key being cached, but also on a hash computed over the machine node itself.
II. Consistent hash algorithm scenario description (reproduced)
1. Hash machine node
First compute each machine node's hash value (how is a machine node hashed? Its IP can be used as the hash input; other inputs work too), then distribute the nodes onto a ring covering 0~2^32 (placed clockwise). As shown in the following:
Figure A
The cluster contains five machines: A, B, C, D, E. Through some hash algorithm we distribute them onto the ring as shown.
2. Access method
If a write cache request arrives with key k, compute hash(k); it corresponds to a point on the Figure 1 ring. If that point is not mapped to a specific machine node, search clockwise until the first mapped machine node is found; that node is the target. If we pass 2^32 without finding a node, wrap around and hit the first machine node. For example, if hash(k) falls between A and B, the machine node hit is B.
The data is mapped onto the ring after being processed by some hash algorithm.
Now we take four objects, object1 through object4, compute a key value for each with a specific hash function, and place them on the hash ring: hash(object1) = key1; hash(object2) = key2; hash(object3) = key3; hash(object4) = key4.
Mapping machines onto the ring through the hash algorithm
Suppose we now have three machines, node1, node2, node3. Through the hash algorithm we obtain their key values and map them onto the ring: hash(node1) = KEY1; hash(node2) = KEY2; hash(node3) = KEY3.
Objects and machines now share the same hash space, so rotating clockwise, object1 is stored on node1, object3 on node2, and object2 and object4 on node3. As long as the deployment does not change, the hash ring stays fixed, so computing an object's hash value quickly locates the corresponding machine and thus the object's real storage location.
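The clockwise lookup just described can be sketched with a TreeMap. This is a toy illustration, not the spymemcached code; the hash function is an arbitrary stand-in for MD5/Ketama, clamped to the 0~2^32 ring.

```java
import java.util.SortedMap;
import java.util.TreeMap;

public class HashRingSketch {
    private final TreeMap<Long, String> ring = new TreeMap<>();

    // Stand-in hash, NOT Ketama: mix the characters, then clamp to 0..2^32-1.
    static long hash(String s) {
        long h = 1125899906842597L;
        for (char c : s.toCharArray()) h = h * 6364136223846793005L + c;
        h ^= (h >>> 33);            // final mix so similar names spread out
        return h & 0xFFFFFFFFL;     // position on the 0..2^32 ring
    }

    void addNode(String node)    { ring.put(hash(node), node); }
    void removeNode(String node) { ring.remove(hash(node)); }

    // Walk clockwise: first node at or after hash(key); wrap to the
    // smallest position if we pass 2^32 without finding one.
    String nodeFor(String key) {
        SortedMap<Long, String> tail = ring.tailMap(hash(key));
        Long pos = tail.isEmpty() ? ring.firstKey() : tail.firstKey();
        return ring.get(pos);
    }

    public static void main(String[] args) {
        HashRingSketch r = new HashRingSketch();
        for (String n : new String[]{"node1", "node2", "node3"}) r.addNode(n);
        System.out.println("object1 -> " + r.nodeFor("object1"));
        r.removeNode("node2"); // only keys that mapped to node2 move
        System.out.println("object1 -> " + r.nodeFor("object1"));
    }
}
```

Note the key property: removing a node only remaps the keys that were on that node, which is exactly the monotonicity argument made in the next section.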
3. Removal and addition of machines
The worst property of the ordinary hash algorithm is that adding or deleting a machine invalidates the storage locations of a large number of objects, so it badly fails monotonicity. The following analyzes how the consistent hashing algorithm handles these cases.
1. Node (machine) deletion. Taking the distribution above as the example, if node2 fails and is deleted, then by the clockwise-migration rule object3 is migrated to node3. Only object3's mapping location changes; the other objects are untouched.
2. Node (machine) addition. If a new node node4 is added to the cluster, we obtain key4 through the hash algorithm and map it onto the ring. By the clockwise rule, object2 is migrated to node4, and the other objects keep their original storage locations.
From this analysis of node addition and deletion, the consistent hashing algorithm preserves monotonicity while migrating a minimum of data. The algorithm therefore suits distributed clusters: it avoids large-scale data migration and reduces pressure on the servers.
After a machine node is added, the access policy does not change: we still look up keys as in (2). Cache misses are still unavoidable at that moment; the data that cannot be hit is the data whose hash(k) fell between C~F before the node was added.
Although adding a node still causes some misses, compared with the traditional modulo method, consistent hashing reduces the invalidated data to a minimum.
From the diagrams above, the consistent hashing algorithm satisfies monotonicity and load balancing, as well as the dispersion of a general hash algorithm, but it is not yet widely usable as-is, because it lacks balance. The following analyzes how the consistent hashing algorithm achieves balance.
Hash algorithms do not guarantee balance. For instance, in the case where only node1 and node3 are deployed (node2 deleted), object1 is stored on node1, while object2, object3 and object4 are all stored on node3: a very unbalanced state. To satisfy balance as far as possible, the consistent hashing algorithm introduces the virtual node.
A "virtual node" is a replica of an actual node (machine) in the hash space; one real node (machine) corresponds to several virtual nodes, and the number of replicas is called the "replication number". Virtual nodes are arranged on the ring by their hash values.
Take the example above where only node1 and node3 are deployed (node2 deleted): the objects are distributed unevenly across the machines. Now take a replication number of 2, so there are 4 virtual nodes on the whole hash ring, and the final object mapping becomes: object1->node1-1, object2->node1-2, object3->node3-2, object4->node3-1. Through the introduction of virtual nodes, the distribution of objects is much more balanced.
So how does an object query work in real-world operation? The object's hash leads to a virtual node, which is then converted to the actual node. The hash of a "virtual node" can be computed from the corresponding node's IP address plus a numeric suffix. For example, assume the IP address of node1 is 192.168.1.100.
Before introducing "virtual nodes", the hash value of cache A is computed as hash("192.168.1.100"). After introducing "virtual nodes", the hash values of virtual nodes node1-1 and node1-2 are computed as hash("192.168.1.100#1") for node1-1 and hash("192.168.1.100#2") for node1-2.
Consistent hashing minimizes the redistribution of hash keys. In addition, to achieve a better load balancing effect, when there are only a few servers we often need to add virtual nodes to ensure the servers are distributed evenly on the ring, because under a plain hash the servers' positions on the ring are uneven. Using the virtual node idea, we allocate 100~200 points on the circle for each physical node (server). This suppresses the uneven distribution and minimizes cache redistribution when servers are added or removed. User data is mapped to a virtual node, which means the data is actually stored on the physical server that virtual node represents.
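The placement just described can be sketched as follows. This is a hedged illustration, not the spymemcached code: the replica count of 160 and the "ip#i" naming are assumptions in the spirit of the text, and the ring position is taken from the first 4 MD5 bytes.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.TreeMap;

public class VirtualNodeRing {
    private final TreeMap<Long, String> ring = new TreeMap<>();
    private static final int REPLICAS = 160; // assumed, within the 100~200 range

    // Position a name on the 0..2^32 ring using the first 4 MD5 bytes.
    static long md5Position(String name) {
        try {
            byte[] d = MessageDigest.getInstance("MD5")
                    .digest(name.getBytes(StandardCharsets.UTF_8));
            return ((long) (d[3] & 0xFF) << 24) | ((long) (d[2] & 0xFF) << 16)
                 | ((long) (d[1] & 0xFF) << 8)  |  (long) (d[0] & 0xFF);
        } catch (NoSuchAlgorithmException e) {
            throw new RuntimeException(e); // MD5 ships with every JDK
        }
    }

    // Each physical server occupies REPLICAS virtual points, named ip#i.
    void addServer(String ip) {
        for (int i = 0; i < REPLICAS; i++) {
            ring.put(md5Position(ip + "#" + i), ip);
        }
    }

    int size() { return ring.size(); }

    public static void main(String[] args) {
        VirtualNodeRing r = new VirtualNodeRing();
        r.addServer("192.168.1.100");
        r.addServer("192.168.1.101");
        // Two physical servers now occupy roughly 320 points on the ring.
        System.out.println("points on the ring: " + r.size());
    }
}
```

Looking up a key would walk clockwise from the key's position to the nearest virtual point, then read off the physical server stored as that entry's value.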
Here is a diagram that describes the virtual nodes that need to be added for each physical server.
(Figure: virtual node multiplier versus number of physical servers)
The x-axis is the virtual node multiplier (scale) needed per physical server, and the y-axis is the number of physical servers. When there are very few physical servers, a larger virtual node multiplier is needed; with more servers, fewer are required. The graph shows that with about 10 physical servers, each server needs roughly 100~200 virtual nodes added to achieve a true load balance.
III. Demonstrating virtual nodes in the spymemcached source
1. The consistent hash algorithm as described above has a potential problem:
(1) Node hashes may be distributed unevenly on the ring, so when large numbers of keys search for their node, the hit probabilities of the nodes differ greatly, and effective load balancing cannot be achieved.
(2) For example, with three nodes node1, node2, node3 placed close together on the ring, when keys search the ring for their node, a large share of keys walking clockwise are always assigned to node2, while the other two nodes are hit with very small probability.
2. Possible solutions to this problem:
Improve the hash algorithm so that nodes are distributed evenly on the ring; or use the virtual node idea and allocate 100~200 points on the circle for each physical node (server). This suppresses the uneven distribution and minimizes cache redistribution when servers are added or removed. User data is mapped to a virtual node, which means the data is actually stored on the physical server that virtual node represents.
Reading the spymemcached client, I found it adopts a hash algorithm called Ketama, which uses the virtual node idea to solve memcached's distribution problem.
3. Source Code Description
The client uses a TreeMap to store all the nodes and simulate the circular logical relationship. In this ring the nodes are kept in order, so the TreeMap's key type must be comparable (implement Comparable, or a Comparator must be supplied).
So how do the nodes get placed onto this ring?
protected void setKetamaNodes(List&lt;MemcachedNode&gt; nodes) {
    TreeMap&lt;Long, MemcachedNode&gt; newNodeMap = new TreeMap&lt;Long, MemcachedNode&gt;();
    int numReps = config.getNodeRepetitions();
    for (MemcachedNode node : nodes) {
        // Ketama does some special work with md5 where it reuses chunks.
        if (hashAlg == HashAlgorithm.KETAMA_HASH) {
            for (int i = 0; i &lt; numReps / 4; i++) {
                byte[] digest = HashAlgorithm.computeMd5(config.getKeyForNode(node, i));
                for (int h = 0; h &lt; 4; h++) {
                    long k = ((long) (digest[3 + h * 4] &amp; 0xFF) &lt;&lt; 24)
                           | ((long) (digest[2 + h * 4] &amp; 0xFF) &lt;&lt; 16)
                           | ((long) (digest[1 + h * 4] &amp; 0xFF) &lt;&lt; 8)
                           | (digest[h * 4] &amp; 0xFF);
                    newNodeMap.put(k, node);
                    getLogger().debug("Adding node %s in position %d", node, k);
                }
            }
        } else {
            for (int i = 0; i &lt; numReps; i++) {
                newNodeMap.put(hashAlg.hash(config.getKeyForNode(node, i)), node);
            }
        }
    }
    assert newNodeMap.size() == numReps * nodes.size();
    ketamaNodes = newNodeMap;
}
The above process can be summarized as follows: virtual nodes are created four to a group. The getKeyForNode method produces the name for that group of virtual nodes, which is then MD5-encoded; each virtual node corresponds to 4 of the digest's 16 bytes, assembled into a long value that serves as that virtual node's unique key on the ring. Why is k declared as long? Because Long implements the Comparable interface, so it can order the TreeMap.
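The 16-bytes-to-four-positions step can be illustrated in isolation. The node key "192.168.1.100#0" below is an assumption for the example; the real key format comes from config.getKeyForNode.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class DigestSplitDemo {
    // Split one 16-byte MD5 digest into four 32-bit ring positions,
    // mirroring the inner loop of setKetamaNodes.
    static long[] positions(String nodeKey) {
        byte[] digest;
        try {
            digest = MessageDigest.getInstance("MD5")
                    .digest(nodeKey.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            throw new RuntimeException(e); // MD5 ships with every JDK
        }
        long[] out = new long[4];
        for (int h = 0; h < 4; h++) {
            out[h] = ((long) (digest[3 + h * 4] & 0xFF) << 24)
                   | ((long) (digest[2 + h * 4] & 0xFF) << 16)
                   | ((long) (digest[1 + h * 4] & 0xFF) << 8)
                   |  (long) (digest[h * 4] & 0xFF);
        }
        return out;
    }

    public static void main(String[] args) {
        // One group name yields four virtual-node positions on the ring.
        for (long k : positions("192.168.1.100#0")) {
            System.out.println("ring position: " + k);
        }
    }
}
```

This is why the KETAMA_HASH branch iterates numReps/4 times: each MD5 computation places four virtual nodes at once.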
With the distribution of the real nodes on the ring in place, the game of keys looking for their node on the ring can begin.
For each key, the following steps are performed: compute its MD5 digest, then from the MD5 byte array derive the key's position on the ring via the Ketama hash algorithm.
MemcachedNode getNodeForKey(long hash) {
    final MemcachedNode rv;
    if (!ketamaNodes.containsKey(hash)) {
        // Java 1.6 adds a ceilingKey method, but I'm still stuck in 1.5
        // in a lot of places, so I'm doing this myself.
        SortedMap&lt;Long, MemcachedNode&gt; tailMap = getKetamaNodes().tailMap(hash);
        if (tailMap.isEmpty()) {
            hash = getKetamaNodes().firstKey();
        } else {
            hash = tailMap.firstKey();
        }
    }
    rv = getKetamaNodes().get(hash);
    return rv;
}
The code above searches clockwise on the ring; if no node is found before the end, it wraps around to the first one, and from the resulting position the corresponding physical node is known.
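The tailMap/firstKey wrap-around can be demonstrated with made-up ring positions (100, 500, 900 are purely illustrative, standing in for real Ketama hashes):

```java
import java.util.SortedMap;
import java.util.TreeMap;

public class CeilingLookupDemo {
    // Same logic as getNodeForKey, on a plain TreeMap of positions.
    static Long locate(TreeMap<Long, String> ring, long hash) {
        if (!ring.containsKey(hash)) {
            SortedMap<Long, String> tail = ring.tailMap(hash);
            hash = tail.isEmpty() ? ring.firstKey() : tail.firstKey();
        }
        return hash;
    }

    public static void main(String[] args) {
        TreeMap<Long, String> ring = new TreeMap<>();
        ring.put(100L, "node1");
        ring.put(500L, "node2");
        ring.put(900L, "node3");
        System.out.println(ring.get(locate(ring, 200L))); // first node after 200 -> node2
        System.out.println(ring.get(locate(ring, 950L))); // past the end wraps  -> node1
    }
}
```

On Java 1.6 and later the same lookup is simply ring.ceilingKey(hash), falling back to ring.firstKey() when it returns null, which is what the source comment alludes to.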
IV. Application scenario analysis
1. The memcache add method: the consistent hash algorithm matches the hash of the key to be stored against the hash values of the cache servers known to the client, confirms the target cacheserver, and obtains a connection to store the data.
2. The memcache get method: likewise, the consistent hash algorithm matches the hash of the key to be fetched against the cache servers' hash values, confirms which cacheserver holds the data, and obtains a connection to retrieve it.
V. Summary
1. The consistent hash algorithm only helps us minimize how much cached data must be rebuilt when the number of machines in the cache cluster increases or decreases. Whenever the server count in the cluster changes, some cache misses inevitably occur.
2. The data distribution balance problem is solved through the virtual node idea. Of course, the fewer cache server nodes we have, the more virtual nodes each one needs to balance the load.
3. Our cache client does not maintain a map recording where each key is stored; it computes which node a key belongs on purely from the hash of the key and the hashes of the cache servers (with the IP, perhaps, serving as the hash input).
4. When a cache node crashes, we are bound to lose some cached data and must redo the consistency matching calculation against the surviving cache servers and keys. Some data that was not lost may end up being rebuilt on a different node as well...
Reference:
http://blog.csdn.net/kongqz/article/details/6695417
Consistent Hash algorithm