Consistent hashing algorithm for distributed caches


Basic Scenario

Suppose you have n cache servers (hereafter simply "caches"). To map an object onto one of the n caches, the usual approach is to compute the object's hash value and then spread the hash values evenly across the n caches.


Conventional modulo hash algorithm

hash(key) % N

For a cache cluster of N servers numbered 0 to N-1, first hash the key to be stored, then take the hash value modulo N. The result is a number within the server-number range, and the key is stored on the server with that number.
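The modulo scheme above can be sketched in a few lines of Python. This is a minimal illustration, not a production client: the helper names `stable_hash` and `pick_server_modulo` and the sample keys are assumptions made for this example, and MD5 is used only because it gives the same result on every run (Python's built-in `hash()` is randomized per process for strings).

```python
import hashlib


def stable_hash(key: str) -> int:
    """Return a 32-bit integer hash of key that is identical on every run."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


def pick_server_modulo(key: str, n: int) -> int:
    """Map key to a server number in 0..n-1 using hash(key) % N."""
    return stable_hash(key) % n


# Hypothetical keys, assigned to a cluster of N = 5 cache servers.
N = 5
assignments = {key: pick_server_modulo(key, N) for key in ("user:1", "user:2", "cart:7")}
for key, server in assignments.items():
    print(f"{key} -> cache {server}")
```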

Disadvantages:

As access pressure on the system grows, the cache cluster has to add machine nodes to increase its response speed and data capacity. With modulo hashing, adding a machine changes N, so a large number of keys now map to a different server. At that moment a large part of the cache misses, and the cached data has to be rebuilt or even migrated wholesale. The misses fall through to the DB, which suddenly puts a very high load on it and can bring the DB server down.
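To get a feel for the size of the problem, the sketch below counts how many keys change servers when a modulo-hashed cluster grows from 4 to 5 machines. The key names and cluster sizes are made up for the example. For uniformly distributed hash values, a key keeps its server only when `h % 4 == h % 5`, which holds for 4 out of every 20 consecutive values, so roughly four keys in five are expected to move.

```python
import hashlib


def stable_hash(key: str) -> int:
    """Return a 32-bit integer hash of key that is identical on every run."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


def moved_fraction(keys, old_n: int, new_n: int) -> float:
    """Fraction of keys whose server number differs between old_n and new_n servers."""
    moved = sum(1 for k in keys if stable_hash(k) % old_n != stable_hash(k) % new_n)
    return moved / len(keys)


# Hypothetical workload: 10,000 keys, cluster grows from 4 to 5 servers.
keys = [f"key-{i}" for i in range(10_000)]
fraction = moved_fraction(keys, 4, 5)
print(f"{fraction:.1%} of keys map to a different server after adding one machine")
```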


When designing a distributed cache system, the consistent hash algorithm helps solve this problem.

The core points of distributed cache design: keys should be distributed evenly, and after a cache server is added, the amount of cached data that has to migrate should be as small as possible.

The consistent hash algorithm discussed here is Ketama. Its approach: selecting a machine node depends not only on the hash of the key being cached; the machine nodes themselves are hashed as well.


Consistent hash algorithm scenario description (reproduced)

1. Hashing the machine nodes

First, compute the hash value of each machine node. How to hash a machine node is a design choice: the IP address can be used as the hash input, and there are other options. Then place the nodes on a ring covering 0 to 2^32, laid out clockwise, as shown in the following illustration:

The cluster has five machines: A, B, C, D and E. Using some hash algorithm, we place them on the ring as shown above.
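As a rough sketch of this step, the snippet below hashes five machines onto a 0 to 2^32 ring and sorts them by position. The IP addresses are invented for the example, and taking the first four bytes of an MD5 digest is just one convenient way to get a 32-bit value; it is not claimed to match the exact hashing Ketama uses.

```python
import hashlib

RING_SIZE = 2**32


def ring_hash(value: str) -> int:
    """Hash a string to a position on the ring (0 .. 2^32 - 1)."""
    digest = hashlib.md5(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


# Hypothetical cluster: machine name -> IP address used as the hash input.
nodes = {
    "A": "10.0.0.1",
    "B": "10.0.0.2",
    "C": "10.0.0.3",
    "D": "10.0.0.4",
    "E": "10.0.0.5",
}

# The ring is a list of (position, machine) pairs sorted by position,
# i.e. in clockwise order starting from 0.
ring = sorted((ring_hash(ip), name) for name, ip in nodes.items())
for position, name in ring:
    print(f"{name} at {position}")
```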


2. Access method

Suppose there is a cache write request with key K. Compute its hash value hash(K), which corresponds to a point on the ring in Figure 1. If that point does not coincide with a machine node, search clockwise until the first machine node is found; that node is the target node. If the search passes 2^32 without finding a node, the request goes to the first machine node on the ring. For example, if hash(K) falls between A and B, the node that is hit is B (see the figure above).
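The clockwise search can be written as a binary search over the sorted node positions. The sketch below uses small hand-picked positions instead of real hash values so the result is easy to follow; the positions and the function name `find_node` are assumptions made for this example.

```python
import bisect


def find_node(ring, key_hash: int) -> str:
    """Return the first node at or clockwise after key_hash, wrapping past the end."""
    positions = [pos for pos, _ in ring]
    idx = bisect.bisect_left(positions, key_hash)  # first position >= key_hash
    if idx == len(ring):
        idx = 0  # passed the largest position: wrap around to the first node
    return ring[idx][1]


# Hand-picked positions for machines A..E, sorted clockwise.
ring = [(500, "A"), (1500, "B"), (2500, "C"), (3500, "D"), (4500, "E")]

print(find_node(ring, 1000))  # between A and B -> B
print(find_node(ring, 1500))  # exactly on B -> B
print(find_node(ring, 4800))  # past E -> wraps to A
```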


3. Handling added nodes

To add a machine F to the original cluster, as shown in Figure 1, proceed as follows:

Compute the hash value of the machine node and map the machine to a point on the ring, as shown in the following diagram:

After node F is added, the access policy does not change; access still follows the method in (2). Cache misses are still unavoidable: the data that can no longer be hit is the data whose hash(K) fell between C and F before the node was added. So adding a node still causes misses, but compared with traditional modulo hashing, consistent hashing keeps the affected data to a minimum.
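The effect of adding F can be checked with a small experiment. The sketch below uses hand-picked ring positions (assumptions for this example, with F placed between C and D) and compares which node each key hash maps to before and after the change. Only the hashes between C and F are affected; they used to belong to D and now belong to F.

```python
import bisect


def find_node(ring, key_hash: int) -> str:
    """Return the first node at or clockwise after key_hash, wrapping past the end."""
    positions = [pos for pos, _ in ring]
    idx = bisect.bisect_left(positions, key_hash)
    return ring[idx % len(ring)][1]


# Hand-picked positions; F is inserted between C (2500) and D (3500).
old_ring = [(500, "A"), (1500, "B"), (2500, "C"), (3500, "D"), (4500, "E")]
new_ring = sorted(old_ring + [(3000, "F")])

# Sample key hashes spread evenly over the occupied part of the ring.
key_hashes = list(range(0, 5000, 100))
moved = [h for h in key_hashes if find_node(old_ring, h) != find_node(new_ring, h)]

print(f"{len(moved)} of {len(key_hashes)} sample hashes changed node")
for h in moved:
    print(h, find_node(old_ring, h), "->", find_node(new_ring, h))
```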

Consistent hashing minimizes the redistribution of keys. In addition, to achieve better load balancing when there are only a few servers, virtual nodes are often added so that the servers are spread evenly around the ring. With an ordinary hash function, the positions the servers map to are unevenly distributed. The virtual-node idea is to allocate 100 to 200 points on the ring for each physical node (server). This suppresses the uneven distribution and minimizes cache redistribution when servers are added or removed. User data is mapped to a virtual node, which means it is actually stored on the physical server that the virtual node represents.


Here is a diagram that describes the number of virtual nodes that need to be added for each physical server.

The x-axis is the virtual node multiplier (scale) applied to each physical server, and the y-axis is the actual number of physical servers. The graph shows that when there are very few physical servers, more virtual nodes are needed, and when there are many, fewer are needed. With 10 physical servers, for example, roughly 100 to 200 virtual nodes per server are needed to achieve real load balancing.
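The virtual-node idea described above can be sketched as follows. Each physical server is hashed several times under derived labels such as `cache-0#17`, and every resulting point on the ring refers back to the physical server. The server names, the `#i` labeling scheme, and the imbalance measure (busiest server's load divided by the average load) are assumptions chosen for this illustration, not the scheme used by any particular client library.

```python
import bisect
import hashlib
from collections import Counter


def ring_hash(value: str) -> int:
    """Hash a string to a position on the ring (0 .. 2^32 - 1)."""
    digest = hashlib.md5(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


def build_ring(servers, vnodes: int):
    """Place vnodes points per physical server on the ring, sorted by position."""
    return sorted((ring_hash(f"{s}#{i}"), s) for s in servers for i in range(vnodes))


def imbalance(servers, vnodes: int, keys) -> float:
    """Load of the busiest server divided by the average load (1.0 is perfectly even)."""
    ring = build_ring(servers, vnodes)
    positions = [pos for pos, _ in ring]
    counts = Counter()
    for key in keys:
        idx = bisect.bisect_left(positions, ring_hash(key)) % len(ring)
        counts[ring[idx][1]] += 1  # credit the physical server behind the virtual node
    return max(counts.values()) / (len(keys) / len(servers))


# Hypothetical cluster of 5 servers and 10,000 keys.
servers = [f"cache-{i}" for i in range(5)]
keys = [f"key-{i}" for i in range(10_000)]
for vnodes in (1, 10, 150):
    print(f"{vnodes:>3} virtual nodes per server: imbalance {imbalance(servers, vnodes, keys):.2f}")
```

Running it with different `vnodes` values shows how the imbalance figure changes as more points per server are placed on the ring.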

[References]

http://blog.csdn.net/kongqz/article/details/6695417 Memcache's consistent hash algorithm uses

http://blog.csdn.net/sparkliang/article/details/5279393
