Consistent Hash Algorithm

Zhang Liang

The consistent hashing algorithm was first proposed in 1997 in the paper "Consistent Hashing and Random Trees", and it has since been widely used in cache systems.

1. Basic scenarios

Suppose you have N cache servers (hereafter simply "caches"). To map an object onto the N caches, you would most likely compute the object's hash value and then distribute objects uniformly across the N caches with something like:

hash(object) % N

This works as long as nothing changes, but consider the following two situations:

1. A cache server m goes down (something that must be taken into account in real applications), so all objects mapped to cache m become invalid. Cache m has to be removed, leaving N-1 caches, and the mapping formula becomes hash(object) % (N-1).

2. Access volume keeps growing and a cache must be added. There are now N+1 caches, and the mapping formula becomes hash(object) % (N+1).

What do situations 1 and 2 mean? When N changes, almost every object is remapped to a different cache, so nearly the whole cache becomes invalid at once. For the servers this is a disaster: a flood of requests rushes straight through to the backend servers.
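To make "almost all" concrete, here is a small Python sketch that counts how many keys change caches under hash(object) % N when N goes from 10 to 9 or to 11. The helper names (`stable_hash`, `modulo_cache`, `fraction_moved`) and the use of MD5 truncated to 32 bits are assumptions for illustration; the article does not prescribe a particular hash function.

```python
import hashlib


def stable_hash(key: str) -> int:
    """Deterministic 32-bit hash: first 4 bytes of MD5 (an assumed choice).

    Python's built-in hash() is salted per process for strings, so it would
    give a different mapping on every run.
    """
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:4], "big")


def modulo_cache(key: str, n: int) -> int:
    """Cache index for a key under hash(object) % N."""
    return stable_hash(key) % n


def fraction_moved(keys, n_old: int, n_new: int) -> float:
    """Fraction of keys whose cache index changes when N changes."""
    moved = sum(modulo_cache(k, n_old) != modulo_cache(k, n_new) for k in keys)
    return moved / len(keys)


if __name__ == "__main__":
    keys = [f"object{i}" for i in range(10_000)]
    print(f"10 -> 9 caches:  {fraction_moved(keys, 10, 9):.1%} of keys moved")
    print(f"10 -> 11 caches: {fraction_moved(keys, 10, 11):.1%} of keys moved")
```

With a well-mixed hash, a key keeps its cache only when its two remainders happen to agree, which is roughly one key in ten here, so expect around 90% of keys to move in both runs.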

A third problem: as hardware gets more powerful, you may want caches added later to take on a larger share of the load. The simple hash above clearly cannot do that.

Is there a way to change this situation? This is where consistent hashing comes in.

2. Hash Algorithm and Monotonicity

One metric for a hash algorithm is monotonicity, defined as follows:

Monotonicity means that if some content has already been assigned to buffers by hashing and new buffers are then added to the system, the hash result should guarantee that previously assigned content is mapped either to its original buffer or to one of the new buffers, never to a different buffer in the old buffer set.

It is easy to see that the simple hash algorithm hash(object) % N above has difficulty meeting the monotonicity requirement.
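The monotonicity requirement can be checked mechanically. This sketch, using hypothetical helper names and an assumed MD5-based 32-bit hash, adds one cache to hash(object) % N and counts the keys that jump from one old cache to another old cache, which monotonicity forbids.

```python
import hashlib


def stable_hash(key: str) -> int:
    """Deterministic 32-bit hash: first 4 bytes of MD5 (an assumed choice)."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:4], "big")


def monotonicity_violations(keys, n: int) -> int:
    """Count keys that move between two OLD caches when the mapping changes
    from hash % n to hash % (n + 1).

    Monotonicity allows a key to stay where it is or to move to the new
    cache (index n); any other move breaks it.
    """
    violations = 0
    for k in keys:
        h = stable_hash(k)
        before, after = h % n, h % (n + 1)
        if after != before and after != n:
            violations += 1
    return violations


if __name__ == "__main__":
    keys = [f"object{i}" for i in range(10_000)]
    print(monotonicity_violations(keys, 4), "of", len(keys), "keys violate monotonicity")
```

For N = 4, about a fifth of the keys stay put and about a fifth move to the new cache; the remaining roughly 60% are shuffled among the old caches and count as violations.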

3. Principles of the consistent hashing algorithm

Consistent hashing is a hash algorithm that, put simply, changes the existing key-to-cache mappings as little as possible when a cache is removed or added, which satisfies the monotonicity requirement.

The following describes the basic principles of the consistent hashing algorithm in five steps.

3.1 Circular hash space

Generally, a hash algorithm maps a value to a 32-bit key, that is, a value in the numeric space 0 ~ 2^32-1. We can think of this space as a ring whose start (0) is joined to its end (2^32-1), as shown in Figure 1 below.
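A minimal Python sketch of the 32-bit ring, assuming MD5 truncated to its first 4 bytes as the hash (the article only specifies a 32-bit key space; the concrete function is an assumption). The modular distance function shows how the ring closes: stepping clockwise from 2^32-1 lands on 0.

```python
import hashlib

RING_SIZE = 2 ** 32  # hash keys live in 0 .. 2^32 - 1


def ring_hash(key: str) -> int:
    """Point on the 32-bit ring: first 4 bytes of MD5 (an assumed choice)."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:4], "big")


def clockwise_distance(a: int, b: int) -> int:
    """Steps needed to go clockwise from point a to point b on the ring.

    The modulo joins the end of the space (2^32 - 1) back to its start (0).
    """
    return (b - a) % RING_SIZE


if __name__ == "__main__":
    print(clockwise_distance(RING_SIZE - 1, 0))  # 1: the ring wraps around
    print(clockwise_distance(0, RING_SIZE - 1))  # 4294967295
```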

Figure 1 circular hash space

3.2 Mapping objects to the hash space

Next, consider four objects, object1 ~ object4. Figure 2 shows the distribution on the ring of the hash keys that the hash function computes for them:

hash(object1) = key1;

... ...

hash(object4) = key4;

Figure 2 key value distribution of four objects

3.3 Mapping caches to the hash space

The basic idea of consistent hashing is to map both objects and caches into the same hash value space, using the same hash algorithm.

Assume there are currently three caches in total, A, B, and C. As Figure 3 shows, they are placed in the hash space according to their hash values:

hash(cache A) = key A;

... ...

hash(cache C) = key C;

Figure 3 Distribution of cache and object key values

 

Incidentally, as for computing a cache's hash: the IP address or machine name of the cache machine is generally used as the hash input.
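Following that suggestion, this sketch hashes each cache's IP address onto the same 32-bit ring used for objects and lists the caches in clockwise order. The addresses are hypothetical, and MD5 truncated to 32 bits is an assumed hash choice.

```python
import hashlib


def ring_hash(key: str) -> int:
    """Point on the 32-bit ring: first 4 bytes of MD5 (an assumed choice)."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:4], "big")


def place_caches(addresses):
    """Hash each cache's IP address (or machine name) onto the ring.

    Returns (position, address) pairs sorted clockwise from 0.
    """
    return sorted((ring_hash(addr), addr) for addr in addresses)


if __name__ == "__main__":
    # Hypothetical addresses standing in for caches A, B and C.
    for position, address in place_caches(["10.0.0.1", "10.0.0.2", "10.0.0.3"]):
        print(f"{address:>10} at {position}")
```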

3.4 Mapping objects to caches

Now both the caches and the objects have been mapped into the hash value space by the same hash algorithm. The next question is how to map each object to a cache.

In this circular space, start from an object's key and move clockwise until you meet a cache; the object is stored on that cache. Because the hash values of objects and caches are fixed, this cache is unique and deterministic. So we have found a way to map objects to caches!
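The clockwise rule translates into a binary search over the sorted cache positions, wrapping around to the first cache when an object's key lies past the last one. `HashRing` and its methods are hypothetical names for this sketch, which assumes an MD5-based 32-bit hash.

```python
import bisect
import hashlib


def ring_hash(key: str) -> int:
    """Point on the 32-bit ring: first 4 bytes of MD5 (an assumed choice)."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:4], "big")


class HashRing:
    """An object is stored on the first cache met moving clockwise
    from the object's own key."""

    def __init__(self, caches):
        # Cache positions sorted clockwise from 0, plus a parallel key list for bisect.
        self._points = sorted((ring_hash(c), c) for c in caches)
        self._keys = [position for position, _ in self._points]

    def lookup(self, obj: str) -> str:
        if not self._points:
            raise LookupError("no caches on the ring")
        # Index of the first cache whose position is >= the object's key;
        # an index equal to len(...) means "past the last cache" and wraps to 0.
        i = bisect.bisect_left(self._keys, ring_hash(obj))
        return self._points[i % len(self._points)][1]


if __name__ == "__main__":
    ring = HashRing(["cache A", "cache B", "cache C"])
    for name in ["object1", "object2", "object3", "object4"]:
        print(name, "->", ring.lookup(name))
```

Searching a sorted list with `bisect_left` keeps each lookup at O(log n) in the number of caches; an object whose key coincides exactly with a cache's position is assigned to that cache.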

Continuing the example above (see Figure 3), under this rule object1 is stored on cache A, object2 and object3 map to cache C, and object4 maps to cache B.

3.5 Examining cache changes

As mentioned above, the biggest problem with the hash-then-modulo method is that it does not satisfy monotonicity: when the set of caches changes, the cache is invalidated and the backend servers take a huge hit. Now let's analyze how the consistent hashing algorithm behaves.

3.5.1 Removing a cache

Suppose cache B goes down. Under the mapping rule above, the only affected objects are those found by walking counterclockwise from cache B until the previous cache is reached, that is, exactly the objects that were mapped to cache B; they now continue clockwise to the next cache, cache C.

Therefore, only object4 needs to change: it is remapped to cache C. See Figure 4.
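The claim that removing cache B disturbs only the objects that were on cache B can be checked directly. This sketch uses hypothetical helper names and an assumed MD5-based 32-bit hash, and it rebuilds the ring on every lookup for clarity rather than speed.

```python
import bisect
import hashlib


def ring_hash(key: str) -> int:
    """Point on the 32-bit ring: first 4 bytes of MD5 (an assumed choice)."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:4], "big")


def lookup(caches, obj: str) -> str:
    """First cache clockwise from obj's key (ring rebuilt each call for clarity)."""
    points = sorted((ring_hash(c), c) for c in caches)
    keys = [position for position, _ in points]
    i = bisect.bisect_left(keys, ring_hash(obj))
    return points[i % len(points)][1]


def moved_on_removal(caches, removed: str, objects) -> set:
    """Objects whose cache changes when `removed` leaves the ring."""
    remaining = [c for c in caches if c != removed]
    return {o for o in objects if lookup(caches, o) != lookup(remaining, o)}


if __name__ == "__main__":
    caches = ["cache A", "cache B", "cache C"]
    objects = [f"object{i}" for i in range(1_000)]
    moved = moved_on_removal(caches, "cache B", objects)
    on_b = {o for o in objects if lookup(caches, o) == "cache B"}
    print(len(moved), "objects moved; exactly the ones on cache B:", moved == on_b)
```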

Figure 4 Cache mapping after cache B is removed

3.5.2 Adding a cache

Now consider adding a new cache D, which lands on the ring between object2 and object3. The only affected objects are those found by walking counterclockwise from cache D until the next cache (cache B); these are a subset of the objects originally mapped to cache C, and they are remapped to cache D.

 

Therefore, only object2 needs to change: it is remapped to cache D. See Figure 5.
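The same kind of check works for adding a cache: every object that moves should land on the new cache, and all of them should come from a single old cache, namely the first old cache clockwise from the new cache's position (cache C in the example). This is a sketch with hypothetical helper names and an assumed MD5-based 32-bit hash.

```python
import bisect
import hashlib


def ring_hash(key: str) -> int:
    """Point on the 32-bit ring: first 4 bytes of MD5 (an assumed choice)."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:4], "big")


def lookup(caches, obj: str) -> str:
    """First cache clockwise from obj's key (ring rebuilt each call for clarity)."""
    points = sorted((ring_hash(c), c) for c in caches)
    keys = [position for position, _ in points]
    i = bisect.bisect_left(keys, ring_hash(obj))
    return points[i % len(points)][1]


def moved_on_addition(caches, added: str, objects) -> dict:
    """Map each object that changes cache to its (old cache, new cache) pair."""
    grown = list(caches) + [added]
    changes = {}
    for o in objects:
        old, new = lookup(caches, o), lookup(grown, o)
        if old != new:
            changes[o] = (old, new)
    return changes


if __name__ == "__main__":
    caches = ["cache A", "cache B", "cache C"]
    objects = [f"object{i}" for i in range(1_000)]
    changes = moved_on_addition(caches, "cache D", objects)
    print(len(changes), "objects moved")
    print("came from:", {old for old, _ in changes.values()})
    print("went to:", {new for _, new in changes.values()})
```

Looking up the new cache's own name on the old ring (`lookup(caches, "cache D")`) returns the first old cache clockwise from D, which is the only cache that can lose objects.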

Figure 5 Mapping after cache D is added

4. Virtual nodes

Another metric for a hash algorithm is balance, defined as follows:

Balance

Balance means that hash results should be spread across all buffers as evenly as possible, so that all of the buffer space is used.

A hash algorithm cannot guarantee absolute balance. When there are only a few caches, objects cannot be mapped evenly onto them. In the earlier example, when only cache A and cache C are deployed, cache A stores only object1 of the four objects, while cache C stores object2, object3, and object4; the distribution is unbalanced.

To solve this problem, consistent hashing introduces the concept of a "virtual node", defined as follows:

A "virtual node" is a replica of an actual node in the hash space. Each actual node corresponds to several virtual nodes; this number is called the "number of replicas". Virtual nodes are arranged in the hash space by their hash values.

We again use the deployment with only cache A and cache C as an example. As Figure 4 shows, the distribution across the caches is uneven. Now we introduce virtual nodes and set the number of replicas to 2, which gives four virtual nodes in total: cache A1 and cache A2 represent cache A, and cache C1 and cache C2 represent cache C. An ideal scenario is shown in Figure 6.

Figure 6 Mapping after virtual nodes are introduced

 

In this case, the mapping between objects and virtual nodes is:

object1 -> cache A2; object2 -> cache A1; object3 -> cache C1; object4 -> cache C2;

Therefore, object1 and object2 both map to cache A, while object3 and object4 map to cache C, so the balance is greatly improved.

After virtual nodes are introduced, the mapping changes from {object -> node} to {object -> virtual node}. Figure 7 shows the mapping used when querying which cache an object is located on.
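The two-step lookup (object -> virtual node -> real cache) can be sketched by storing, next to each virtual-node position, the real cache it stands for. `VirtualNodeRing` is a hypothetical name, the hash is an assumed MD5 truncated to 32 bits, and the `<cache>#<i>` labels follow the numeric-suffix scheme described below.

```python
import bisect
import hashlib


def ring_hash(key: str) -> int:
    """Point on the 32-bit ring: first 4 bytes of MD5 (an assumed choice)."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:4], "big")


class VirtualNodeRing:
    """Each real cache owns `replicas` virtual nodes on the ring.

    Lookup finds the clockwise-successor virtual node, then returns
    the real cache that virtual node represents.
    """

    def __init__(self, caches, replicas: int):
        points = []
        for cache in caches:
            for i in range(1, replicas + 1):
                label = f"{cache}#{i}"  # virtual node i of this cache
                points.append((ring_hash(label), label, cache))
        points.sort()
        self._points = points
        self._keys = [p[0] for p in points]

    def _index(self, obj: str) -> int:
        if not self._points:
            raise LookupError("no virtual nodes on the ring")
        # First virtual node at or after the object's key, wrapping to 0.
        return bisect.bisect_left(self._keys, ring_hash(obj)) % len(self._points)

    def virtual_node_for(self, obj: str) -> str:
        """Step 1: object -> virtual node."""
        return self._points[self._index(obj)][1]

    def cache_for(self, obj: str) -> str:
        """Step 2: virtual node -> real cache."""
        return self._points[self._index(obj)][2]


if __name__ == "__main__":
    ring = VirtualNodeRing(["cache A", "cache C"], replicas=2)
    for name in ["object1", "object2", "object3", "object4"]:
        print(name, "->", ring.virtual_node_for(name), "->", ring.cache_for(name))
```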

Figure 7 Querying the cache on which an object is located

 

To compute the hash of a virtual node, a numeric suffix can be appended to the IP address of the corresponding node. For example, assume the IP address of cache A is 2018.14.241.

Before virtual nodes are introduced, the hash value of cache A is calculated as:

hash("2018.14.241");

After virtual nodes are introduced, the hash values of the virtual nodes cache A1 and cache A2 are calculated as:

hash("2018.14.241#1"); // cache A1

hash("2018.14.241#2"); // cache A2
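The suffix scheme is easy to generate, and with it one can measure how evenly objects spread over two real nodes as the number of replicas grows. This is a sketch under stated assumptions: the addresses are hypothetical, the hash is MD5 truncated to 32 bits, and 200 replicas is an arbitrary illustrative choice.

```python
import bisect
import hashlib
from collections import Counter


def ring_hash(key: str) -> int:
    """Point on the 32-bit ring: first 4 bytes of MD5 (an assumed choice)."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:4], "big")


def vnode_labels(address: str, replicas: int) -> list[str]:
    """Hash inputs for a node's virtual nodes: "<address>#1", "<address>#2", ..."""
    return [f"{address}#{i}" for i in range(1, replicas + 1)]


def load_share(addresses, replicas: int, objects) -> dict:
    """Fraction of objects that land on each real node when every node
    has `replicas` virtual nodes on the ring."""
    points = sorted(
        (ring_hash(label), addr)
        for addr in addresses
        for label in vnode_labels(addr, replicas)
    )
    keys = [position for position, _ in points]
    counts = Counter(
        points[bisect.bisect_left(keys, ring_hash(o)) % len(points)][1]
        for o in objects
    )
    return {addr: counts[addr] / len(objects) for addr in addresses}


if __name__ == "__main__":
    nodes = ["10.0.0.1", "10.0.0.2"]  # hypothetical addresses
    objects = [f"object{i}" for i in range(10_000)]
    for replicas in (1, 200):
        print(replicas, "replica(s):", load_share(nodes, replicas, objects))
```

With one replica per node, the split between the two nodes depends on where just two hash values happen to fall and can be far from 50/50; with many replicas, each node's share tends to stay close to one half.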

5. Summary

That covers the basic principles of consistent hashing. A theoretical analysis of properties such as the distribution is more involved, but it is rarely needed in practice.

A Java example is available for reference at http://weblogs.java.net/blog/2007/11/27/consistent-hashing.

A PHP implementation is reposted at http://blog.csdn.net/mayongzhan/archive/2009/06/25/4298834.aspx.

A C implementation: http://www.codeproject.com/KB/recipes/lib-conhash.aspx

 

References:

http://portal.acm.org/citation.cfm?id=258660

http://en.wikipedia.org/wiki/Consistent_hashing

http://www.spiteful.com/2008/03/17/programmers-toolbox-part-3-consistent-hashing/

http://weblogs.java.net/blog/2007/11/27/consistent-hashing

http://tech.idv2.com/2008/07/24/memcached-004/

http://blog.csdn.net/mayongzhan/archive/2009/06/25/4298834.aspx

 
