Memcache is a distributed cache system, but it provides no clustering capability of its own, which can easily become a bottleneck in large-scale applications. The client, however, can be freely extended, and the work happens in two phases. Phase one: the key is mapped to a memcache server by some algorithm. Phase two: the cached value is fetched from that server. A problem arises when one of the servers goes down or a new server must be added; at that point the phase-one algorithm matters greatly. How do we keep as much of the existing data valid as possible and minimize the impact of adding or removing nodes? Here are some approaches worth considering:
One: Hash consistency algorithm:
Advantages:
When a node fails, the data on the other nodes is unaffected; only the failed node's data is redirected to another node. When a node is added, only part of the data of one adjacent node is affected.
Disadvantages:
It is easy to end up with an imbalance in data volume between nodes: one node may hold many hot keys while another holds few.
Here is a detailed description: (From: http://blog.csdn.net/sparkliang/archive/2010/02/02/5279393.aspx)
The consistent hashing algorithm was proposed in the 1997 paper "Consistent Hashing and Random Trees" and is now widely used in cache systems.
1 Basic Scenarios
For example, if you have N cache servers (hereafter referred to as caches), how do you map an object onto those N caches? You would most likely use a common method like the following: compute the hash value of the object and map it evenly onto the N caches:
hash(object) % N
Everything runs normally. Now consider the following two cases:
1 Cache server m goes down (this must be considered in any real application), so all objects mapped to cache m become invalid. What to do? Cache m must be removed from the pool; the cache count becomes N-1, and the mapping formula becomes hash(object) % (N-1);
2 Because access load grows, more caches must be added; the cache count becomes N+1, and the mapping formula becomes hash(object) % (N+1);
What do cases 1 and 2 mean? Suddenly almost every object's cache mapping becomes invalid. For the backend servers this is a disaster: a flood of requests rushes straight through to them;
Consider a third problem: since hardware keeps getting stronger, you may want newly added nodes to do more of the work. The simple hash algorithm above clearly cannot accommodate that either.
Is there a way out of this situation? That is what consistent hashing is for.
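To make cases 1 and 2 concrete, here is a small sketch (not from the original article; CRC32 stands in for whatever hash function the client actually uses) that counts how many of 10,000 keys change servers when a cluster grows from 4 to 5 machines under hash(object) % N:

```java
import java.util.zip.CRC32;

public class ModuloRemap {
    // CRC32 is only an illustrative stand-in for the client's hash function.
    static long hash(String key) {
        CRC32 crc = new CRC32();
        crc.update(key.getBytes());
        return crc.getValue();
    }

    // Fraction of keys whose server assignment changes when the
    // server count goes from oldN to newN under hash(key) % N.
    static double remappedFraction(int keys, int oldN, int newN) {
        int moved = 0;
        for (int i = 0; i < keys; i++) {
            long h = hash("key-" + i);
            if (h % oldN != h % newN) moved++;
        }
        return moved / (double) keys;
    }

    public static void main(String[] args) {
        double f = remappedFraction(10_000, 4, 5);
        System.out.printf("remapped: %.1f%%%n", f * 100);
    }
}
```

A key keeps its server only when hash % 4 == hash % 5, so roughly four fifths of all keys move, which is exactly the "almost all of the caches are dead" situation described above.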
2 hash Algorithm and monotonicity
One measure of a hash algorithm is monotonicity (monotonicity), defined as follows:
Monotonicity means that if some content has already been allocated to buffers by hashing and a new buffer is then added to the system, the hash result should guarantee that previously allocated content maps either to its original buffer or to the new buffer, but never to a different buffer from the old buffer set.
It is easy to see that the simple hash algorithm hash(object) % N fails to satisfy the monotonicity requirement.
3 Principle of the consistent hashing algorithm
Consistent hashing is a hash algorithm. In a nutshell, when a cache is removed or added, it changes as few existing key-to-cache mappings as possible, thereby satisfying the monotonicity requirement.
Here are the basic principles of the consistent hashing algorithm in 5 steps.
3.1 Ring Hash Space
Consider that the usual hash algorithm maps a value to a 32-bit key, i.e., a value in the space 0~2^32-1. We can think of this space as a ring whose head (0) joins its tail (2^32-1), as shown in Figure 1 below.
Figure 1 Ring Hash space
3.2 Mapping objects to the hash space
Next consider 4 objects, object1~object4. The distribution of their hash keys on the ring is shown in Figure 2:
hash(object1) = key1;
... ...
hash(object4) = key4;
Figure 2 Key value distributions for 4 objects
3.3 Mapping the cache to the hash space
The basic idea of consistent hashing is to map both the object and the cache to the same hash value space, and use the same hash algorithm.
Assume there are currently 3 caches: A, B, and C. Their mapping results, i.e., their hash values arranged in the hash space, are shown in Figure 3.
hash(cache A) = key A;
... ...
hash(cache C) = key C;
Figure 3 Key value distributions for cache and objects
Speaking of which, a note on computing a cache's hash: the usual method is to use the cache machine's IP address or hostname as the hash input.
3.4 Mapping objects to the cache
Now that both the cache and the object have been mapped to the hash value space using the same hash algorithm, the next thing to consider is how to map the object to the cache.
In this ring space, start from an object's key value and move clockwise until you meet a cache; the object is stored on that cache. Because the hash values of both the object and the caches are fixed, the chosen cache is unique and deterministic. And there you have the mapping from objects to caches!
Continuing the example (see Figure 3), by this method object1 is stored on cache A; object2 and object3 correspond to cache C; object4 corresponds to cache B.
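The clockwise-walk mapping described above can be sketched with a sorted map standing in for the ring. This is a minimal illustration, not the spymemcached implementation; the class name HashRing and the choice of CRC32 as the hash are our own assumptions:

```java
import java.util.SortedMap;
import java.util.TreeMap;
import java.util.zip.CRC32;

public class HashRing {
    // Ring position -> cache name; TreeMap keeps positions sorted.
    private final TreeMap<Long, String> ring = new TreeMap<>();

    static long hash(String s) {
        CRC32 crc = new CRC32();
        crc.update(s.getBytes());
        return crc.getValue(); // a value in 0 .. 2^32-1
    }

    void addCache(String cache)    { ring.put(hash(cache), cache); }
    void removeCache(String cache) { ring.remove(hash(cache)); }

    // Walk clockwise from the key's position to the first cache;
    // wrap around to the lowest position if nothing lies ahead.
    String lookup(String key) {
        SortedMap<Long, String> tail = ring.tailMap(hash(key));
        return tail.isEmpty() ? ring.firstEntry().getValue()
                              : tail.get(tail.firstKey());
    }

    public static void main(String[] args) {
        HashRing r = new HashRing();
        r.addCache("cache-A");
        r.addCache("cache-B");
        r.addCache("cache-C");
        String before = r.lookup("object1");
        r.addCache("cache-D"); // adding a node only affects one arc of the ring
        String after = r.lookup("object1");
        System.out.println(before + " -> " + after);
    }
}
```

Removing a cache from the TreeMap automatically shifts its keys to the next cache clockwise, which is the behavior analyzed in section 3.5 below.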
3.5 Reviewing cache changes
As said before, the biggest problem with the hash-then-modulo method is that it does not satisfy monotonicity: when the caches change, large parts of the cache become invalid and the backend servers take a huge hit. Now let us analyze how the consistent hashing algorithm behaves.
3.5.1 Removing the cache
Suppose cache B goes down. By the mapping method above, the only affected objects are those between cache B and the next cache found moving counterclockwise (cache C), i.e., exactly the objects that were mapped to cache B.
So here we only need to remap the object object4 to cache C; see Figure 4.
Figure 4 Cache Map after cache B has been removed
3.5.2 Add Cache
Consider adding a new cache D. Assume that in this ring hash space cache D is mapped between objects object2 and object3. The only affected objects are those between cache D and the next cache found moving counterclockwise (cache B), i.e., a portion of the objects previously mapped to cache C; those objects are remapped to cache D.
So here we only need to remap the object object2 to cache D; see Figure 5.
Figure 5 Mapping relationships after adding cache D
4 Virtual nodes
Another indicator for considering the Hash algorithm is the balance (Balance), which is defined as follows:
Balance means that the hash result should be spread across all buffers as evenly as possible, so that all buffer space is utilized.
A hash algorithm does not guarantee absolute balance. If there are few caches, objects cannot be mapped evenly onto them. In the example above, with only cache A and cache C deployed, of the 4 objects cache A stores only object1, while cache C stores object2, object3, and object4: an uneven distribution.
To solve this, consistent hashing introduces the concept of a "virtual node", defined as follows:
A virtual node is a replica of an actual node in the hash space; one real node corresponds to several "virtual nodes", the number of which is called the "replica count". Virtual nodes are arranged in the hash space by their hash values.
In the case of deploying only cache A and cache C, we saw in Figure 4 that the cache distribution is not uniform. Now introduce virtual nodes and set the replica count to 2. This means there will be 4 "virtual nodes": cache A1 and cache A2 represent cache A; cache C1 and cache C2 represent cache C. A fairly ideal arrangement is shown in Figure 6.
Figure 6 Mapping relationship after the introduction of "Virtual Node"
At this point, the mapping of the object to the virtual node is:
object1 -> cache A2; object2 -> cache A1; object3 -> cache C1; object4 -> cache C2;
So objects object1 and object2 are mapped to cache A, while object3 and object4 are mapped to cache C. Balance has improved considerably.
After virtual nodes are introduced, the mapping relationship is transformed from {object -> node} into {object -> virtual node}. The mapping used when looking up the cache for an object is shown in Figure 7.
Figure 7 The cache where the object is queried
The hash of a "virtual node" can be computed from the corresponding node's IP address plus a numeric suffix. For example, assume the IP address of cache A is 202.168.14.241.
Before introducing "virtual nodes", the hash value of cache A is computed as:
hash("202.168.14.241");
After introducing "virtual nodes", the hash values of the virtual nodes cache A1 and cache A2 are computed as:
hash("202.168.14.241#1"); // cache A1
hash("202.168.14.241#2"); // cache A2
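The "#1"/"#2" suffix scheme above extends naturally to any replica count. Here is a sketch (illustrative only; CRC32 again stands in for the real hash, and the class and method names are our own) that places several suffixed positions on the ring per physical cache and counts how many test keys each physical cache receives:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;
import java.util.zip.CRC32;

public class VirtualNodeRing {
    // Ring position -> physical cache name.
    private final TreeMap<Long, String> ring = new TreeMap<>();

    static long hash(String s) {
        CRC32 c = new CRC32();
        c.update(s.getBytes());
        return c.getValue();
    }

    // Each physical cache gets `replicas` positions: "ip#1", "ip#2", ...
    void addCache(String cache, int replicas) {
        for (int i = 1; i <= replicas; i++) {
            ring.put(hash(cache + "#" + i), cache);
        }
    }

    // Clockwise walk to the first virtual node, wrapping past 2^32-1.
    String lookup(String key) {
        SortedMap<Long, String> tail = ring.tailMap(hash(key));
        Long pos = tail.isEmpty() ? ring.firstKey() : tail.firstKey();
        return ring.get(pos);
    }

    // Count how many of n test keys land on each physical cache.
    Map<String, Integer> distribution(int n) {
        Map<String, Integer> counts = new HashMap<>();
        for (int i = 0; i < n; i++) {
            counts.merge(lookup("key-" + i), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        VirtualNodeRing r = new VirtualNodeRing();
        r.addCache("202.168.14.241", 150);
        r.addCache("202.168.14.95", 150);
        System.out.println(r.distribution(10_000));
    }
}
```

With around 150 virtual nodes per server, the two caches receive roughly equal shares of the keys, illustrating the balance improvement this section describes.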
=========================================
I. Overview
1. Our memcache client (here I examined the source of spymemcached) uses the Ketama consistent hashing algorithm to select the data storage node. Unlike a conventional hash algorithm, which only hashes the key to be stored and assigns it to a node, the consistent hashing algorithm also hashes the servers on which data will be stored, and then determines where each key goes.
2. The conventional hash algorithm and its drawbacks
The most common approach is hash modulo. Say the number of available machines in the cluster is N; a data request with key k is simply routed to the machine corresponding to hash(k) mod N. This scheme is indeed simple and practical. But in fast-growing web systems it has drawbacks. As access pressure rises, the cache cluster must add machine nodes to increase speed and carrying capacity. With the modulo scheme, adding a node means a large amount of cached data misses and must be rebuilt, or even migrated wholesale; at that moment the DB takes a very high load, possibly even bringing the DB server down.
3. In the design of a distributed cache system, what problems can the consistent hashing algorithm solve for us?
The core of distributed cache design: when designing a distributed caching system, we need the keys to be distributed evenly, and after a cache server is added or removed, the amount of cache migration must be minimal.
The practice of the Ketama consistent hashing algorithm mentioned here is: selecting the target machine node relies not only on hashing the key to be cached, but also on hashing the machine nodes themselves.
II. Consistent hashing algorithm scenario description (reproduced)
1. Hash machine node
First compute the machine node's hash value (how is it computed? The IP can be used as the hash input; there are other options too), then distribute the nodes onto a ring over 0~2^32 (clockwise). As shown below:
Figure A
The cluster has five machines: A, B, C, D, and E. Through some hash algorithm we distribute them onto the ring as shown.
2. Access method
If a write-cache request arrives with key k, compute its hash value hash(k), which corresponds to a point on the ring in Figure A. If that point does not map directly to a machine node, search clockwise until the first mapped machine node is found; that node is the determined target. If the search passes 2^32 without finding a node, wrap around and hit the first machine node. For example, if hash(k) falls between A and B, the machine hit is node B (as shown).
3. Handling node additions
For example, starting from Figure A, add a machine F to the original cluster. The procedure is as follows:
Compute the machine's hash value and map the machine to a node on the ring, as shown:
Figure II
After machine node F is added, the access policy does not change and still follows (2). Cache misses are still unavoidable: the data that can no longer be hit is that whose hash(k) fell between C and F before the node was added. Although adding a node still causes some misses, compared with the traditional modulo method, consistent hashing keeps the affected data to a minimum.
Consistent hashing minimizes the redistribution of hash keys. In addition, to achieve better load balancing, when there are few servers it is often necessary to add virtual nodes to ensure the servers are spread evenly around the ring, because with an ordinary hash the servers' positions on the ring are unevenly distributed. Using the virtual-node idea, allocate 100~200 points on the circle for each physical node (server). This suppresses uneven distribution and minimizes cache redistribution when servers are added or removed. User data is mapped to a virtual node, meaning it is actually stored on the physical server that the virtual node represents.
Below is a diagram describing how many virtual nodes need to be added for each physical server.
Figure III
The x-axis is the virtual node multiplier (scale) needed for each physical server, and the y-axis is the number of physical servers. It can be seen that when there are very few physical servers, a larger virtual node multiplier is needed, and fewer virtual nodes are needed as servers increase. The graph shows that with 10 physical servers, each needs roughly 100~200 virtual nodes added to achieve true load balancing.
III. The spymemcached source demonstrating the virtual node application
1. Potential problems with the consistent hashing algorithm described above:
(1) Node hashes may be unevenly distributed on the ring, so when a large number of keys look up their nodes, the hit probability differs greatly between nodes and effective load balancing cannot be achieved.
(2) For example, with three nodes node1, node2, node3 distributed close together on the ring, the clockwise search assigns the vast majority of keys to node2, while the other two nodes are found with very small probability.
2. Solutions to this problem:
Improve the hash algorithm so that each node is distributed evenly on the ring; or use the virtual-node idea: allocate 100~200 points on the circle for each physical node (server). This suppresses uneven distribution and minimizes cache redistribution when servers are added or removed. User data mapped to a virtual node is actually stored on the physical server that the virtual node represents.
Reading the spymemcached client, we find that it adopts a hashing scheme called Ketama, which uses the virtual-node idea to solve memcached's distribution problem.
3. Source Code Description
The client uses a TreeMap to store all nodes, simulating the ring as a logical relationship. Within this loop the nodes are ordered, so the TreeMap's keys must be mutually comparable (implement Comparable, or a Comparator must be supplied).
So how do the nodes get onto this ring?
protected void setKetamaNodes(List<MemcachedNode> nodes) {
    TreeMap<Long, MemcachedNode> newNodeMap = new TreeMap<Long, MemcachedNode>();
    int numReps = config.getNodeRepetitions();
    for (MemcachedNode node : nodes) {
        // Ketama does some special work with md5 where it reuses chunks.
        if (hashAlg == HashAlgorithm.KETAMA_HASH) {
            for (int i = 0; i < numReps / 4; i++) {
                byte[] digest = HashAlgorithm.computeMd5(config.getKeyForNode(node, i));
                for (int h = 0; h < 4; h++) {
                    long k = ((long) (digest[3 + h * 4] & 0xFF) << 24)
                           | ((long) (digest[2 + h * 4] & 0xFF) << 16)
                           | ((long) (digest[1 + h * 4] & 0xFF) << 8)
                           | (digest[h * 4] & 0xFF);
                    newNodeMap.put(k, node);
                    getLogger().debug("Adding node %s in position %d", node, k);
                }
            }
        } else {
            for (int i = 0; i < numReps; i++) {
                newNodeMap.put(hashAlg.hash(config.getKeyForNode(node, i)), node);
            }
        }
    }
    assert newNodeMap.size() == numReps * nodes.size();
    ketamaNodes = newNodeMap;
}
The above process can be summarized as follows: virtual nodes are created in groups of four. The getKeyForNode method produces the name for each group of virtual nodes, which is then MD5-encoded; each virtual node takes 4 of the MD5 digest's 16 bytes and assembles them into a long value, which serves as that virtual node's unique key on the ring. Why is k declared as a long? Because Long implements the Comparable interface, which TreeMap keys require.
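The MD5-chunking step can be seen in isolation with a short sketch: one 16-byte MD5 digest yields four 32-bit ring positions, using the same byte-assembly as setKetamaNodes above. The helper names and the sample node name "10.0.0.1:11211-0" are illustrative, not spymemcached's:

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class KetamaPositions {
    static byte[] md5(String s) {
        try {
            return MessageDigest.getInstance("MD5").digest(s.getBytes());
        } catch (NoSuchAlgorithmException e) {
            throw new AssertionError(e); // MD5 is guaranteed by the JDK
        }
    }

    // Assemble four little-endian 32-bit values from the 16-byte digest,
    // mirroring the shift-and-OR logic in setKetamaNodes.
    static long[] positions(String virtualNodeName) {
        byte[] d = md5(virtualNodeName);
        long[] ks = new long[4];
        for (int h = 0; h < 4; h++) {
            ks[h] = ((long) (d[3 + h * 4] & 0xFF) << 24)
                  | ((long) (d[2 + h * 4] & 0xFF) << 16)
                  | ((long) (d[1 + h * 4] & 0xFF) << 8)
                  |         (d[h * 4] & 0xFF);
        }
        return ks;
    }

    public static void main(String[] args) {
        for (long k : positions("10.0.0.1:11211-0")) {
            System.out.println(k); // four positions in 0 .. 2^32-1
        }
    }
}
```

This is why numReps is divided by 4 in the loop above: each MD5 computation contributes four ring positions at once.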
With the distribution of the formal nodes on the ring handled, we can move on to the game of keys finding their nodes on the ring.
For each key the same steps are completed: compute the MD5 of the key, then from the MD5 byte array derive, via the Ketama hash algorithm, the key's position on the ring.
MemcachedNode getNodeForKey(long hash) {
    final MemcachedNode rv;
    if (!ketamaNodes.containsKey(hash)) {
        // Java 1.6 adds a ceilingKey method, but I'm still stuck in 1.5
        // in a lot of places, so I'm doing this myself.
        SortedMap<Long, MemcachedNode> tailMap = getKetamaNodes().tailMap(hash);
        if (tailMap.isEmpty()) {
            hash = getKetamaNodes().firstKey();
        } else {
            hash = tailMap.firstKey();
        }
    }
    rv = getKetamaNodes().get(hash);
    return rv;
}
The code above searches clockwise on the ring; if nothing lies ahead, it wraps around to the first position, and from that hash the corresponding physical node is obtained.
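The comment in getNodeForKey notes that Java 1.6's ceilingKey would simplify the lookup. Here is a sketch of that shorter form (MemcachedNode is replaced by String so the example stays self-contained; the class name is ours):

```java
import java.util.Map;
import java.util.TreeMap;

public class CeilingLookup {
    static String nodeForHash(TreeMap<Long, String> ring, long hash) {
        // ceilingEntry = first ring position at or clockwise from `hash`,
        // or null if we have run past the end of the map.
        Map.Entry<Long, String> e = ring.ceilingEntry(hash);
        if (e == null) {
            e = ring.firstEntry(); // wrap around the ring
        }
        return e.getValue();
    }

    public static void main(String[] args) {
        TreeMap<Long, String> ring = new TreeMap<>();
        ring.put(100L, "node-A");
        ring.put(200L, "node-B");
        System.out.println(nodeForHash(ring, 150)); // node-B
        System.out.println(nodeForHash(ring, 250)); // wraps to node-A
    }
}
```

The containsKey/tailMap branching in the original collapses into one ceilingEntry call with the same semantics.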
IV. Application scenario analysis
1. Memcache add method: through the consistent hashing algorithm, match the hash of the key to be stored against the cache servers' hash values, determine the cacheserver, and obtain a connection to store the data.
2. Memcache get method: through the consistent hashing algorithm, match the hash of the key to be fetched against the cache servers' hash values, determine the cacheserver that stores it, and obtain a connection to fetch the data.
V. Summary
1. The consistent hashing algorithm only helps us minimize the amount of cache data that must be rebuilt when machines are added to or removed from the cache cluster. Whenever the number of servers in the cluster changes, some data misses are inevitable.
2. For the data distribution balance problem, the virtual-node idea achieves an even spread. Naturally, the fewer cache server nodes we have, the more virtual nodes are needed to balance the load.
3. Our cache client does not maintain a map recording where each key is stored; everything is computed from the hash of the key and the hashes of the cache servers (perhaps using the IP as input) to determine which node the key should be stored on.
4. When a cache node crashes, we inevitably lose some cached data, and a new consistency match must be computed from the surviving cache servers and the keys. Some data that was not lost may end up being rebuilt as well.
5. As for how, once a request reaches the correct storage node, the data for a key is found there, that is the cache server's own internal algorithm and is not described here.
This is simply a presentation of how the data is stored and how it is retrieved.