The use of the consistent hashing algorithm in memcached

Source: Internet
Author: User

I. Overview

1. Our memcached client (here I am reading the spymemcached source) chooses the storage node for each piece of data with Ketama, a consistent hashing algorithm. This differs from the conventional approach, which hashes only the key of the data to be stored and assigns the data to a node by that value. Consistent hashing also hashes the servers that will store the data, and the two hashes together determine where each key is stored.

2. The conventional hash algorithm and its drawbacks

The most common approach is hash modulo. For example, if a cluster has N available machines, a request for the key K is simply routed to machine hash(K) mod N.

This scheme is simple and practical. In a fast-growing web system, however, it has drawbacks. As access pressure rises, the cache cluster has to gain throughput and capacity by adding machine nodes. Adding a machine changes N, so under hash modulo most keys now map to a different machine than the one holding their data, and a large share of the cache misses at the same moment. The cached data has to be rebuilt, or the whole cache migrated, which can put a very high load on the database in an instant and in severe cases bring the DB server down.
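
To make the drawback concrete, here is a minimal sketch that counts how many keys change machine when a hash-modulo cluster grows from 4 to 5 machines. The class name, the key names ("key-0", "key-1", ...) and the use of CRC32 as the hash are assumptions chosen for illustration; they do not come from any memcached client.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

class ModuloRemapDemo {
    // Hash a key to a value in 0 .. 2^32 - 1 (CRC32 is an arbitrary choice for this sketch).
    static long hash(String key) {
        CRC32 crc = new CRC32();
        crc.update(key.getBytes(StandardCharsets.UTF_8));
        return crc.getValue();
    }

    // Index of the machine chosen by hash(K) mod N.
    static int serverFor(String key, int serverCount) {
        return (int) (hash(key) % serverCount);
    }

    // Count how many of the keys "key-0" .. "key-(keyCount-1)" map to a
    // different machine after the cluster size changes.
    static int countMoved(int keyCount, int oldServers, int newServers) {
        int moved = 0;
        for (int i = 0; i < keyCount; i++) {
            String key = "key-" + i;
            if (serverFor(key, oldServers) != serverFor(key, newServers)) {
                moved++;
            }
        }
        return moved;
    }

    public static void main(String[] args) {
        int keys = 100_000;
        int moved = countMoved(keys, 4, 5);
        System.out.printf("%d of %d keys changed machine (%.1f%%)%n",
                moved, keys, 100.0 * moved / keys);
    }
}
```

For a uniformly distributed hash, h mod 4 and h mod 5 agree for only 4 of every 20 consecutive values of h, so roughly four keys in five would be expected to move when one machine is added to four.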

3. When designing a distributed cache system, which problems can consistent hashing help us solve?

The core points of distributed cache design: keys must be distributed evenly, and when a cache server is added, as little cached data as possible should have to move.


With Ketama, the consistent hashing algorithm discussed here, the choice of machine node depends not only on the hash of the key of the data being cached; the machine nodes themselves are hashed as well.


II. How consistent hashing works (reprinted)

1. Hashing the machine nodes

First compute the hash of each machine node (how? The node's IP address can be used as the hash input, though there are other ways), then place the nodes clockwise on a ring covering 0~2^32. For example, as shown in Figure A:

Figure A

The cluster has five machines, A, B, C, D and E. A hash algorithm places them on the ring as shown.


2. Access method

Suppose a request arrives to write to the cache with key K. Compute its hash, hash(K), which corresponds to a point on the ring in Figure A. If that point does not map to a machine node, search clockwise until the first point with a machine node mapped to it is found; that node is the target. If no node has been found by the time 2^32 is passed, the first machine node on the ring is chosen. For example, if hash(K) falls between A and B, the node hit is B.
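
The clockwise search described above can be sketched with a sorted map. This is only an illustration: the ring positions for A~E are hand-picked, hypothetical values, and the class and method names are not taken from any client library.

```java
import java.util.Map;
import java.util.TreeMap;

class RingLookupDemo {
    // Hypothetical, hand-picked ring positions for the five machines (all below 2^32).
    static TreeMap<Long, String> sampleRing() {
        TreeMap<Long, String> ring = new TreeMap<>();
        ring.put(500_000_000L, "A");
        ring.put(1_300_000_000L, "B");
        ring.put(2_100_000_000L, "C");
        ring.put(2_900_000_000L, "D");
        ring.put(3_700_000_000L, "E");
        return ring;
    }

    // Return the first node at or clockwise after the position;
    // if the search runs past the last node, wrap to the first node on the ring.
    static String nodeFor(TreeMap<Long, String> ring, long position) {
        Map.Entry<Long, String> entry = ring.ceilingEntry(position);
        if (entry == null) {
            entry = ring.firstEntry();
        }
        return entry.getValue();
    }

    public static void main(String[] args) {
        TreeMap<Long, String> ring = sampleRing();
        System.out.println(nodeFor(ring, 900_000_000L));   // between A and B -> B
        System.out.println(nodeFor(ring, 1_300_000_000L)); // exactly on B -> B
        System.out.println(nodeFor(ring, 4_000_000_000L)); // past E -> wraps to A
    }
}
```

A TreeMap keeps its keys sorted, so ceilingEntry finds the next node clockwise in logarithmic time, and the null check handles the wrap past 2^32.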


3. Handling an added node

Starting from the cluster in Figure A, add a machine F. The steps are as follows:

Compute the hash of the new machine node and map the machine to a point on the ring, as shown in Figure II:

Figure II

After machine node F is added, the access policy does not change; lookups still follow the method in (2). Cache misses remain unavoidable: the data that misses is the data whose hash(K) fell between C and F before the node was added. Although adding a node still causes misses, compared with traditional hash modulo, consistent hashing keeps the missed data to a minimum.
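
A small sketch of this effect follows, under the assumption that F lands between C and D. The ring positions are hypothetical, hand-picked values, and the ring is sampled at evenly spaced positions rather than with real keys.

```java
import java.util.Map;
import java.util.TreeMap;

class AddNodeDemo {
    static final long RING_SIZE = 1L << 32;

    // First node at or clockwise after the position, wrapping at the end of the ring.
    static String nodeFor(TreeMap<Long, String> ring, long position) {
        Map.Entry<Long, String> e = ring.ceilingEntry(position);
        return (e != null ? e : ring.firstEntry()).getValue();
    }

    // Hypothetical positions for machines A..E.
    static TreeMap<Long, String> fiveNodeRing() {
        TreeMap<Long, String> ring = new TreeMap<>();
        ring.put(500_000_000L, "A");
        ring.put(1_300_000_000L, "B");
        ring.put(2_100_000_000L, "C");
        ring.put(2_900_000_000L, "D");
        ring.put(3_700_000_000L, "E");
        return ring;
    }

    // Same ring with F added between C and D (position is an assumption).
    static TreeMap<Long, String> ringWithF() {
        TreeMap<Long, String> ring = fiveNodeRing();
        ring.put(2_500_000_000L, "F");
        return ring;
    }

    // Fraction of evenly spaced ring positions whose owner differs between two rings.
    static double fractionMoved(TreeMap<Long, String> before,
                                TreeMap<Long, String> after, int samples) {
        int moved = 0;
        for (int i = 0; i < samples; i++) {
            long position = i * (RING_SIZE / samples);
            if (!nodeFor(before, position).equals(nodeFor(after, position))) {
                moved++;
            }
        }
        return (double) moved / samples;
    }

    public static void main(String[] args) {
        double moved = fractionMoved(fiveNodeRing(), ringWithF(), 4096);
        System.out.printf("%.1f%% of sampled positions changed owner%n", 100.0 * moved);
    }
}
```

With these positions, only the arc between C and F changes owner (from D to F). That arc covers 400,000,000 of the 2^32 positions, about 9.3% of the ring, while every other position keeps its node.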

Consistent hashing keeps the redistribution of keys to a minimum. In addition, to balance load better, virtual nodes are usually added when there are relatively few servers, so that the servers are spread evenly around the ring.

With an ordinary hash function, the servers' positions on the ring are distributed unevenly. The virtual node idea is to allocate 100~200 points on the ring for each physical node (server).

This suppresses the uneven distribution and minimizes cache redistribution when servers are added or removed.

User data that maps to a virtual node is actually stored on the physical server that the virtual node represents.
The figure below shows how many virtual nodes need to be added for each physical server.

Figure: virtual node multiplier (x-axis) against number of physical servers (y-axis)

The x-axis is the virtual node multiplier (scale) applied to each physical server; the y-axis is the actual number of physical servers. When there are very few physical servers, more virtual nodes are needed per server; with more servers, fewer are needed. The figure shows that with 10 physical servers, roughly 100~200 virtual nodes must be added per server to achieve real load balancing.
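
The balancing effect of virtual nodes can be tried out with a short sketch. Everything here is an assumption made for illustration: the three server addresses, the "server-i" naming of virtual nodes, the key names, and the use of the first four MD5 bytes as the ring position. It is not the spymemcached implementation.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class VirtualNodeDemo {
    // Hypothetical server addresses.
    static final List<String> SERVERS =
            Arrays.asList("10.0.0.1:11211", "10.0.0.2:11211", "10.0.0.3:11211");

    // First four bytes of the MD5 digest, read as an unsigned 32-bit ring position.
    static long position(String s) {
        try {
            byte[] d = MessageDigest.getInstance("MD5")
                    .digest(s.getBytes(StandardCharsets.UTF_8));
            return ((long) (d[3] & 0xFF) << 24) | ((long) (d[2] & 0xFF) << 16)
                    | ((long) (d[1] & 0xFF) << 8) | (d[0] & 0xFF);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    // Place pointsPerServer virtual nodes on the ring for every physical server.
    static TreeMap<Long, String> buildRing(List<String> servers, int pointsPerServer) {
        TreeMap<Long, String> ring = new TreeMap<>();
        for (String server : servers) {
            for (int i = 0; i < pointsPerServer; i++) {
                ring.put(position(server + "-" + i), server);
            }
        }
        return ring;
    }

    // Route the keys "key-0" .. "key-(keyCount-1)" and count how many land on each server.
    static Map<String, Integer> keysPerServer(TreeMap<Long, String> ring,
                                              List<String> servers, int keyCount) {
        Map<String, Integer> counts = new HashMap<>();
        for (String server : servers) {
            counts.put(server, 0);
        }
        for (int i = 0; i < keyCount; i++) {
            Map.Entry<Long, String> e = ring.ceilingEntry(position("key-" + i));
            String owner = (e != null ? e : ring.firstEntry()).getValue();
            counts.merge(owner, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        for (int points : new int[] {1, 160}) {
            System.out.println(points + " point(s) per server: "
                    + keysPerServer(buildRing(SERVERS, points), SERVERS, 30_000));
        }
    }
}
```

Running it with 1 and then 160 points per server lets the per-server key counts be compared directly; the counts with 160 points should sit much closer to an even split.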

III. Virtual nodes in the spymemcached source code

1. The consistent hashing algorithm described above has a potential problem:
(1) The node hashes may be distributed unevenly on the ring, so the probability of a key hitting each node differs widely and effective load balancing cannot be achieved.
(2) For example, if three nodes node1, node2 and node3 sit very close together on the ring, a large share of the keys searching clockwise is always assigned to one of them (node2, say), while the other two nodes are hit with very small probability.

2. Ways to solve this problem:
Improve the hash algorithm so that the nodes are spread evenly over the ring; or (quoting) use virtual nodes, allocating 100~200 points on the ring for each physical node (server).

This suppresses the uneven distribution and minimizes cache redistribution when servers are added or removed. User data that maps to a virtual node is actually stored on the physical server that the virtual node represents.

Reading the spymemcached client shows that it uses a hash algorithm called Ketama, which applies the virtual node idea to solve memcached's distribution problem.


3. Source Code Description

The client uses a TreeMap to store all the nodes, simulating the logical ring.

The nodes on this ring are ordered, so the TreeMap key must implement the Comparable interface.


How are the nodes placed on this ring?

    protected void setKetamaNodes(List<MemcachedNode> nodes) {
        TreeMap<Long, MemcachedNode> newNodeMap = new TreeMap<Long, MemcachedNode>();
        int numReps = config.getNodeRepetitions();
        for (MemcachedNode node : nodes) {
            // Ketama does some special work with md5 where it reuses chunks.
            if (hashAlg == HashAlgorithm.KETAMA_HASH) {
                for (int i = 0; i < numReps / 4; i++) {
                    byte[] digest = HashAlgorithm.computeMd5(config.getKeyForNode(node, i));
                    for (int h = 0; h < 4; h++) {
                        Long k = ((long) (digest[3 + h * 4] & 0xFF) << 24)
                            | ((long) (digest[2 + h * 4] & 0xFF) << 16)
                            | ((long) (digest[1 + h * 4] & 0xFF) << 8)
                            | (digest[h * 4] & 0xFF);
                        newNodeMap.put(k, node);
                        getLogger().debug("Adding node %s in position %d", node, k);
                    }
                }
            } else {
                for (int i = 0; i < numReps; i++) {
                    newNodeMap.put(hashAlg.hash(config.getKeyForNode(node, i)), node);
                }
            }
        }
        assert newNodeMap.size() == numReps * nodes.size();
        ketamaNodes = newNodeMap;
    }



The process above can be summarized as follows. Virtual nodes are handled in groups of four. The getKeyForNode method returns the name of a group, and that name is hashed with MD5. The 16-byte MD5 digest is split into four 4-byte chunks, one per virtual node, and each chunk is assembled into a Long value that serves as that virtual node's unique key on the ring. Why is k a Long? Because Long implements the Comparable interface, which the TreeMap needs to keep its keys in order, and because an unsigned 32-bit value does not fit in an int.
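
The byte arithmetic can be looked at in isolation. The sketch below repeats the shift-and-mask expression from the method above in a standalone class; the class name, the helper names and the group name "10.0.0.1:11211-0" are assumptions for illustration, not the value getKeyForNode actually returns.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class KetamaPointsDemo {
    // Combine bytes h*4 .. h*4+3 of the digest, lowest byte first, into an
    // unsigned 32-bit value. "& 0xFF" is needed because Java bytes are signed.
    static long pointFromDigest(byte[] digest, int h) {
        return ((long) (digest[3 + h * 4] & 0xFF) << 24)
                | ((long) (digest[2 + h * 4] & 0xFF) << 16)
                | ((long) (digest[1 + h * 4] & 0xFF) << 8)
                | (digest[h * 4] & 0xFF);
    }

    // Four ring positions derived from one MD5 digest of a group name.
    static long[] pointsFor(String groupName) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5")
                    .digest(groupName.getBytes(StandardCharsets.UTF_8));
            long[] points = new long[4];
            for (int h = 0; h < 4; h++) {
                points[h] = pointFromDigest(digest, h);
            }
            return points;
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical group name for one server and repetition index 0.
        for (long point : pointsFor("10.0.0.1:11211-0")) {
            System.out.println(point);
        }
    }
}
```

One MD5 computation therefore yields four ring positions, which is why the outer loop in the method above runs only numReps / 4 times.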

Once the nodes have been laid out on the ring, keys can start looking up their node on it.
Each key goes through the same steps: compute its MD5, then derive the key's position on the ring from the MD5 byte array with the Ketama hash algorithm.

    MemcachedNode getNodeForKey(long hash) {
        final MemcachedNode rv;
        if (!ketamaNodes.containsKey(hash)) {
            // Java 1.6 adds a ceilingKey method, but I'm still stuck in 1.5
            // in a lot of places, so I'm doing this myself.
            SortedMap<Long, MemcachedNode> tailMap = getKetamaNodes().tailMap(hash);
            if (tailMap.isEmpty()) {
                hash = getKetamaNodes().firstKey();
            } else {
                hash = tailMap.firstKey();
            }
        }
        rv = getKetamaNodes().get(hash);
        return rv;
    }


The code above searches clockwise on the ring for the first virtual node at or after the key's position. If none is found, it takes the first node on the ring. From that virtual node it then obtains the corresponding physical node.

IV. Application scenario analysis

1. memcached add: the consistent hashing algorithm matches the hash of the data's key against the hashes of the cache servers known to the client, selects the cache server, then obtains a connection to it and stores the data.

2. memcached get: the consistent hashing algorithm selects the cache server in the same way from the hash of the key, then obtains a connection to it and reads the data.
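
The two paths can be sketched with a toy client in which add and get both route through the same ring. This is a simplified illustration under several assumptions: each "server" is just an in-memory map, each server gets a single ring point, CRC32 stands in for the hash, and none of the names belong to the spymemcached API.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;
import java.util.zip.CRC32;

class RoutingClientSketch {
    // Ring position -> server name.
    private final TreeMap<Long, String> ring = new TreeMap<>();
    // Server name -> simulated server storage.
    private final Map<String, Map<String, String>> servers = new HashMap<>();

    // CRC32 values already lie in 0 .. 2^32 - 1 (an arbitrary hash for this sketch).
    static long position(String s) {
        CRC32 crc = new CRC32();
        crc.update(s.getBytes(StandardCharsets.UTF_8));
        return crc.getValue();
    }

    void addServer(String name) {
        ring.put(position(name), name);
        servers.put(name, new HashMap<>());
    }

    // Both add and get call this, so they always agree on the server.
    String serverFor(String key) {
        Map.Entry<Long, String> e = ring.ceilingEntry(position(key));
        return (e != null ? e : ring.firstEntry()).getValue();
    }

    // memcached's add stores a value only when the key is not already present.
    void add(String key, String value) {
        servers.get(serverFor(key)).putIfAbsent(key, value);
    }

    String get(String key) {
        return servers.get(serverFor(key)).get(key);
    }

    // Number of simulated servers that hold the key.
    int copiesOf(String key) {
        int copies = 0;
        for (Map<String, String> storage : servers.values()) {
            if (storage.containsKey(key)) {
                copies++;
            }
        }
        return copies;
    }

    public static void main(String[] args) {
        RoutingClientSketch client = new RoutingClientSketch();
        client.addServer("10.0.0.1:11211");
        client.addServer("10.0.0.2:11211");
        client.addServer("10.0.0.3:11211");
        client.add("user:42", "alice");
        System.out.println(client.serverFor("user:42") + " -> " + client.get("user:42"));
    }
}
```

The client never records where a key was stored; it recomputes the server from the key's hash on every call.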

V. Summary

1. Consistent hashing only ensures that, when machines are added to or removed from the cache cluster, as little cached data as possible has to be rebuilt. Whenever the number of servers in the cluster changes, some data will inevitably miss.

2. Balanced data distribution is achieved through virtual nodes. The fewer cache server nodes we have, the more virtual nodes each one needs for the load to balance.

3. Our cache client does not keep a map recording where each key is stored. It computes the node a key belongs on from the hash of the key and the hashes of the cache servers (the IP, for instance, can be the hash input).

4. When a cache node crashes, we inevitably lose part of the cached data, and keys are matched again against the surviving cache servers.

Some data that was not lost may also have to be rebuilt ...

5. As for how a storage node, once the request has reached it, finds the data for a key, that is the cache server's own internal algorithm and is not described here.


This article only demonstrates how the node for storing and retrieving data is chosen.

Reprint: http://blog.csdn.net/kongqz/article/details/6695417
