Application of the consistent hash algorithm in Memcached

Preface

As we all know, Memcached supports distribution only on the client side, and the consistent hash algorithm is currently the most widely used way to implement it. The conventional alternative is to take the hash value of the key modulo the total number of servers, i.e. hash % N. The disadvantage of this method is that whenever a server is added or removed, a large number of cached objects must be re-allocated, and the cache tends to be distributed unevenly (one server may hold many objects while others hold few).
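To make that cost concrete, here is a small Python sketch (not from the original article; MD5 is just a convenient stand-in hash) that counts how many keys change servers under hash % N when a sixth server joins five existing ones:

```python
import hashlib

def server_for(key, n_servers):
    # Modulo placement: hash the key, then take the remainder.
    h = int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)
    return h % n_servers

keys = [f"key-{i}" for i in range(10000)]
moved = sum(1 for k in keys if server_for(k, 5) != server_for(k, 6))
print(f"{moved / len(keys):.0%} of keys moved")  # roughly 5/6 of all keys
```

A key stays put only when its hash gives the same remainder mod 5 and mod 6, which happens for about 1 in 6 keys, so going from 5 to 6 servers invalidates most of the cache.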

Today we look at a consistent hash algorithm called Ketama. Through the concept of virtual nodes and its cache-allocation rules, it effectively suppresses uneven cache distribution and minimizes cache redistribution when servers are added or removed.

Implementation

Suppose we have N Memcached servers. If we use a unified rule for Set and Get operations, objects with different keys are evenly distributed across these servers, and a Get retrieves each object from the corresponding server by the same rule. Seen from the outside, don't the N servers then behave like one large cache?

So what is this rule?

As shown in the figure, suppose we have five Memcached servers (A, B, C, D, E). We arrange them on a ring: each server is a point on the ring, and each point has a unique hash value. The ring has 2^32 points in total.


How do we determine where each server sits on the ring? We use the "Ketama" hash algorithm to compute each server's hash value (the server's IP address can serve as the key of the hash), and that value determines the server's point on the ring.

The advantage shows when we add a server F, say between C and D: only the objects whose hash values fall between C and F, which previously belonged to D, need to be re-allocated to F. The caches on the other servers stay untouched, and the new server immediately helps relieve the pressure on the others.
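This claim can be checked with a miniature ring in Python (a sketch only: MD5 stands in for the Ketama hash, and the server names are hypothetical). Every key that changes servers after F joins is a key that moves to F; no other cache entries are disturbed:

```python
import bisect
import hashlib

def h32(s):
    # 32-bit hash; MD5 here is only a stand-in for the Ketama hash.
    return int(hashlib.md5(s.encode("utf-8")).hexdigest()[:8], 16)

def build_ring(servers):
    ring = sorted((h32(s), s) for s in servers)
    return [p for p, _ in ring], [s for _, s in ring]

def lookup(points, names, key):
    # First node clockwise from the key's hash, wrapping past the top of the ring.
    i = bisect.bisect_left(points, h32(key))
    return names[i % len(names)]

keys = [f"obj-{i}" for i in range(10000)]
before = build_ring(["A", "B", "C", "D", "E"])
after = build_ring(["A", "B", "C", "D", "E", "F"])
moved = [k for k in keys if lookup(*before, k) != lookup(*after, k)]
print(f"{len(moved) / len(keys):.0%} of keys moved, all of them to F")
```

Adding a point to the ring can only redirect the keys that fall into the new point's arc, which is exactly why the rest of the cache survives intact.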


So far we have solved the problem of massive cache re-allocation when servers are added or removed. But how do we solve the problem of uneven cache allocation? Our servers occupy only five or six points on a ring of 2^32 points, so it is easy for one server to receive a large share of the hot objects while another gets almost none.

The concept of the "virtual node" solves this load imbalance: each physical server is split into N virtual nodes on the ring (N is usually chosen based on the number of physical servers; 250 per server works well here). Each physical server then corresponds to N points instead of one, and with more points the load of each server evens out, just like subway station exits: the more exits there are, the less crowded each one is.
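A rough Python experiment (again a sketch, with MD5 as the stand-in hash and hypothetical server names) shows the effect: with one point per server the key counts vary wildly, while with 250 virtual nodes per server each of the five servers ends up with roughly a fifth of the keys:

```python
import bisect
import hashlib
from collections import Counter

def h32(s):
    return int(hashlib.md5(s.encode("utf-8")).hexdigest()[:8], 16)

def build_ring(servers, vnodes):
    # Each physical server contributes `vnodes` points on the ring.
    ring = sorted((h32(f"{s}-node{j}"), s) for s in servers for j in range(vnodes))
    return [p for p, _ in ring], [s for _, s in ring]

def lookup(points, names, key):
    i = bisect.bisect_left(points, h32(key))
    return names[i % len(names)]

servers = ["A", "B", "C", "D", "E"]
keys = [f"obj-{i}" for i in range(50000)]
for vnodes in (1, 250):
    points, names = build_ring(servers, vnodes)
    load = Counter(lookup(points, names, k) for k in keys)
    print(vnodes, dict(load))
```

With 250 virtual nodes, each server's total arc on the ring is the sum of many small random arcs, so by averaging its share concentrates near 1/5.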

Code implementation:

    private Dictionary<uint, string> hostDictionary = new Dictionary<uint, string>();
    private uint[] ketamaHashKeys;
    private string[] realHostArr;
    private int VirtualNodeNum;

    public KetamaVirtualNodeInit(string[] realHostArr)
    {
        this.realHostArr = realHostArr;
        this.VirtualNodeNum = 250;
        this.SetKetamaNodes();
    }

    // Map every real server to VirtualNodeNum virtual nodes on the ring.
    private void SetKetamaNodes()
    {
        hostDictionary = new Dictionary<uint, string>();
        List<uint> hostKeys = new List<uint>();
        if (realHostArr == null || realHostArr.Length == 0)
            throw new Exception("No real servers configured");

        for (int i = 0; i < realHostArr.Length; i++)
        {
            for (int j = 0; j < VirtualNodeNum; j++)
            {
                byte[] nameBytes = Encoding.UTF8.GetBytes(string.Format("{0}-node{1}", realHostArr[i], j));
                uint hashKey = BitConverter.ToUInt32(new KetamaHash().ComputeHash(nameBytes), 0);
                if (hostDictionary.ContainsKey(hashKey))
                    throw new Exception("Hash collision between virtual nodes");
                hostDictionary.Add(hashKey, realHostArr[i]);
                hostKeys.Add(hashKey);
            }
        }

        // Sort the virtual node hashes so lookups can binary-search the ring.
        hostKeys.Sort();
        ketamaHashKeys = hostKeys.ToArray();
    }
 

Allocation rules of the consistent hash algorithm
At this point we know the hash values of all virtual nodes. Now let's see how an object is stored on a Set, and how it is retrieved again by its key on a Get.

When setting an object, we first use the object's key as the key of the "Ketama" algorithm and compute its hash value; then we follow these steps.

1: Check whether any virtual node has a hash value equal to that of the current object. If so, store the object directly on the node with the same hash value, and skip the remaining steps.

2: Otherwise, find the first node whose hash value is larger than the object's (the node hashes are sorted in ascending order, corresponding to a clockwise walk around the ring). This is the node closest to the object, so store the object on it.

3: If no node has a larger hash value than the object, the object's hash lies between the last node and the first node, i.e. between E and A on the ring. In this case, store the object directly on the first node, namely A.

Code:

    // Find the server that should hold the object with the given key.
    public string GetHostByHashKey(string key)
    {
        byte[] bytes = Encoding.UTF8.GetBytes(key);
        uint hash = BitConverter.ToUInt32(new KetamaHash().ComputeHash(bytes), 0);

        // Binary search: an exact match means a virtual node with the same hash (rule 1).
        int i = Array.BinarySearch(ketamaHashKeys, hash);
        if (i < 0)
        {
            // No exact match: ~i is the index of the first node with a larger hash (rule 2).
            i = ~i;

            // Past the last node: wrap around to the first node on the ring (rule 3).
            if (i >= ketamaHashKeys.Length)
                i = 0;
        }
        return hostDictionary[ketamaHashKeys[i]];
    }
 

Getting an object works the same way: compute the hash value with the "Ketama" algorithm, locate the node exactly as in the Set process, and retrieve the object from it.

So what does this "Ketama" hash look like? Let's look at the code implementation.

    // FNV1-32 hash with extra mixing steps, as used by this Ketama implementation.
    public class KetamaHash : HashAlgorithm
    {
        private static readonly uint FNV_prime = 16777619;
        private static readonly uint offset_basis = 2166136261;
        private uint hash;

        public override void Initialize()
        {
            hash = offset_basis;
        }

        protected override void HashCore(byte[] array, int ibStart, int cbSize)
        {
            int length = ibStart + cbSize;
            for (int i = ibStart; i < length; i++)
                hash = (hash * FNV_prime) ^ array[i];
        }

        protected override byte[] HashFinal()
        {
            // Extra avalanche mixing on top of plain FNV1.
            hash += hash << 13;
            hash ^= hash >> 7;
            hash += hash << 3;
            hash ^= hash >> 17;
            hash += hash << 5;
            return BitConverter.GetBytes(hash);
        }
    }
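For readers who want to experiment without a .NET toolchain, the Ketama hash shown above can be ported to Python almost line for line (the constants are the standard 32-bit FNV values; the & 0xFFFFFFFF masks emulate C# uint overflow):

```python
def ketama_hash(data: bytes) -> int:
    """32-bit FNV1 hash followed by the extra avalanche mixing steps."""
    FNV_PRIME = 16777619
    OFFSET_BASIS = 2166136261
    MASK = 0xFFFFFFFF  # keep everything in 32 bits, as C# uint arithmetic does

    h = OFFSET_BASIS
    for b in data:
        h = ((h * FNV_PRIME) & MASK) ^ b
    h = (h + (h << 13)) & MASK
    h ^= h >> 7
    h = (h + (h << 3)) & MASK
    h ^= h >> 17
    h = (h + (h << 5)) & MASK
    return h

print(ketama_hash(b"192.168.0.1:11211-node0"))
```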
Test performance
Finally, I compared the algorithm I wrote with reference to BeitMemcached against Lao Dai's (Discuz!'s Dai Zhenjun) version, which follows SPYMemcached.

The source code can be downloaded at the end of this article.

Results: finding 50,000 (5W) keys is more than 100 times faster than Lao Dai's version, but the load balancing is somewhat worse.

Test Data:

1: Five real servers

2: 50,000 randomly generated string keys (using the same generation method as Lao Dai's test)

3: 250 virtual nodes per server

My version:

Lao Dai's version:


References
BeitMemcached source code

Lao Dai: a C# implementation of the consistent hash algorithm (KetamaHash)

Consistent Hashing

 

Test code download: Memcached-ketama
