Hash algorithm series: application (server load balancing)

Source: Internet
Author: User
Document directory
  • 3.1 ring hash space
  • 3.2 map objects to the hash space
  • 3.3 map the cache to the hash space
  • 3.4 map objects to the cache
  • 3.5 check cache changes

Websites today serve huge numbers of users, and the days when a single server could handle everything are gone for good. Using multiple servers raises a question: how should user requests be routed to different servers, and how many requests should each server receive? This is the problem that consistent hashing addresses. Consistent hashing is implemented in load balancers (nginx, haproxy, etc.) and is used with the K/V cache system memcached. Many articles cover the topic, so I have reproduced one below for study.

 

A simple consistent hashing implementation in Java follows. (The spy memcached client is implemented in a similar way; a TreeMap simulates the ring.)

import java.util.Collection;
import java.util.SortedMap;
import java.util.TreeMap;

// HashFunction is not shown in this article; it is assumed to be an
// interface that provides int hash(Object key).
public class ConsistentHash<T> {

  private final HashFunction hashFunction;
  private final int numberOfReplicas;
  // The ring: hash of each virtual node -> real node, sorted by hash.
  private final SortedMap<Integer, T> circle = new TreeMap<Integer, T>();

  public ConsistentHash(HashFunction hashFunction, int numberOfReplicas,
      Collection<T> nodes) {
    this.hashFunction = hashFunction;
    this.numberOfReplicas = numberOfReplicas;
    for (T node : nodes) {
      add(node);
    }
  }

  // Places numberOfReplicas virtual nodes for this node on the ring.
  public void add(T node) {
    for (int i = 0; i < numberOfReplicas; i++) {
      circle.put(hashFunction.hash(node.toString() + i), node);
    }
  }

  // Removes every virtual node that belongs to this node.
  public void remove(T node) {
    for (int i = 0; i < numberOfReplicas; i++) {
      circle.remove(hashFunction.hash(node.toString() + i));
    }
  }

  // Returns the node responsible for key: the first node at or after
  // the key's hash, wrapping around to the start of the ring.
  public T get(Object key) {
    if (circle.isEmpty()) {
      return null;
    }
    int hash = hashFunction.hash(key);
    if (!circle.containsKey(hash)) {
      SortedMap<Integer, T> tailMap = circle.tailMap(hash);
      hash = tailMap.isEmpty() ? circle.firstKey() : tailMap.firstKey();
    }
    return circle.get(hash);
  }
}

I have reposted many articles by now and am not sure whether the article below is the original; it is reproduced here as I found it.

Original article: http://blog.csdn.net/sparkliang/article/details/5279393

Consistent hashing

Zhang Liang

The consistent hashing algorithm was proposed in 1997 in the paper "Consistent hashing and random trees" and is now widely used in cache systems.

1. Basic scenarios

Suppose you have N cache servers (hereinafter referred to as caches). How do you map an object onto the N caches? You would probably compute the object's hash value and then map it uniformly onto the N caches with something like the following:

hash(object) % N

As long as everything runs normally this works, but consider the following two situations:

1. Cache server m goes down (real deployments must plan for this), so every object mapped to cache m becomes invalid. Cache m has to be removed, which leaves N-1 caches, and the mapping formula becomes hash(object) % (N-1);

2. More caches are needed because traffic keeps growing. There are now N+1 caches, and the mapping formula becomes hash(object) % (N+1);

What do 1 and 2 mean? Almost all cached entries suddenly become invalid, because most objects now map to a different cache. For the servers this is a disaster: a flood of requests goes straight to the backend servers.
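To get a feel for how bad this is, here is a minimal sketch that counts how many keys change cache under the hash(object) % N scheme when N changes. The class name, the method name, and the use of plain integers in place of hash(object) are assumptions made for illustration; they are not from the original article.

```java
final class ModuloRemapDemo {
    /**
     * Counts the keys in [0, keyCount) whose cache index changes when the
     * number of caches goes from oldN to newN. Each integer key stands in
     * for the value of hash(object).
     */
    static int countMoved(int keyCount, int oldN, int newN) {
        int moved = 0;
        for (int key = 0; key < keyCount; key++) {
            if (key % oldN != key % newN) {
                moved++;
            }
        }
        return moved;
    }

    public static void main(String[] args) {
        int keys = 12_000;
        System.out.printf("4 -> 5 caches: %d of %d keys move%n",
                countMoved(keys, 4, 5), keys);
        System.out.printf("4 -> 3 caches: %d of %d keys move%n",
                countMoved(keys, 4, 3), keys);
    }
}
```

With these inputs, 9,600 of the 12,000 keys (80%) move when a fifth cache is added, and 9,000 (75%) move when one of four caches is removed.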

The third problem is that, as hardware keeps getting more powerful, you may want nodes added later to take on more of the work. The hash algorithm above clearly cannot do that either.

Is there a way to change this situation? That is what consistent hashing is for.

2. Hash Algorithm and Monotonicity

One measure of a hash algorithm is monotonicity, which is defined as follows:

Monotonicity means that if some content has already been assigned to caches by the hash and new caches are then added to the system, the hash result should ensure that previously assigned content either stays where it is or moves to a new cache, and is never moved to a different cache in the old set.

It is easy to see that the simple algorithm hash(object) % N above can hardly satisfy the monotonicity requirement.

3. Principle of the consistent hashing algorithm

Consistent hashing is a hash algorithm. Put simply, when a cache is removed or added, it changes the existing key mappings as little as possible, so that the monotonicity requirement is met as far as possible.

The following describes the basic principles of the consistent hashing algorithm in five steps.

3.1 ring hash space

A hash algorithm typically maps a value to a 32-bit key, that is, into the numeric space from 0 to 2^32-1. We can picture this space as a ring whose start (0) and end (2^32-1) are joined, as shown in figure 1 below.

Figure 1 circular hash space
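The ring can be sketched in a few lines of Java. This is only an illustration: the article does not name a hash function, so CRC32 from the JDK is used here as an assumed stand-in because it yields a value in the range 0 to 2^32-1. The class and method names are also hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

final class RingPositionDemo {
    /** Number of positions on the ring: 0 through 2^32 - 1. */
    static final long RING_SIZE = 1L << 32;

    /** Maps a string to a ring position. CRC32 is an illustrative choice only. */
    static long ringPosition(String key) {
        CRC32 crc = new CRC32();
        crc.update(key.getBytes(StandardCharsets.UTF_8));
        return crc.getValue(); // already within [0, 2^32 - 1]
    }

    /** Clockwise distance between two positions, wrapping from 2^32 - 1 back to 0. */
    static long clockwiseDistance(long from, long to) {
        return Math.floorMod(to - from, RING_SIZE);
    }

    public static void main(String[] args) {
        System.out.println("object1 sits at " + ringPosition("object1"));
        // The last position and position 0 are neighbours on the ring.
        System.out.println(clockwiseDistance(RING_SIZE - 1, 0));
    }
}
```

The wrap-around in clockwiseDistance is what makes the space a ring rather than a line: one step clockwise from 2^32-1 lands on 0.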

3.2 map objects to the hash space

Next, consider four objects, object1 to object4. The key values computed for them by the hash function are distributed on the ring as shown in figure 2.

hash(object1) = key1;

... ...

hash(object4) = key4;

Figure 2 key value distribution of four objects

3.3 map the cache to the hash space

The basic idea of consistent hashing is to map both objects and caches into the same hash space, using the same hash algorithm.

Assume there are currently three caches: A, B, and C. After mapping, they are arranged in the hash space by their hash values, as shown in figure 3.

hash(cache A) = key A;

... ...

hash(cache C) = key C;

Figure 3 Distribution of cache and object key values

 

While on the subject, a note on how a cache's hash is computed: the IP address or machine name of the cache machine is typically used as the hash input.

3.4 map objects to the cache

Now both the cache and the object have been mapped to the hash value space through the same hash algorithm. The next thing to consider is how to map the object to the cache.

In this ring space, start from the object's key and move clockwise until a cache is met; the object is stored on that cache. Because the hash values of objects and caches are fixed, that cache is unique and well defined. This gives us the mapping from objects to caches.

Continuing the example above (see figure 3), this method stores object1 on cache A, object2 and object3 on cache C, and object4 on cache B.
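The clockwise search can be sketched with a TreeMap, whose ceilingEntry method returns the first entry at or after a given key. The ring positions below (20, 40, 80 for the caches and 10, 30, 50, 70 for the objects) are made-up numbers chosen only so that the result matches the assignment described above; the figure itself gives no numeric values.

```java
import java.util.Map;
import java.util.TreeMap;

final class ClockwiseLookupDemo {
    /**
     * Returns the first cache at or clockwise after the given position,
     * wrapping to the start of the ring; null if the ring is empty.
     */
    static String locate(TreeMap<Long, String> ring, long position) {
        Map.Entry<Long, String> entry = ring.ceilingEntry(position);
        if (entry == null) {
            entry = ring.firstEntry(); // walked past the largest position: wrap
        }
        return entry == null ? null : entry.getValue();
    }

    /** Hypothetical positions for caches A, B, and C, laid out like figure 3. */
    static TreeMap<Long, String> figure3Ring() {
        TreeMap<Long, String> ring = new TreeMap<>();
        ring.put(20L, "cache A");
        ring.put(40L, "cache B");
        ring.put(80L, "cache C");
        return ring;
    }

    public static void main(String[] args) {
        TreeMap<Long, String> ring = figure3Ring();
        // Hypothetical object keys: object1=10, object4=30, object2=50, object3=70.
        System.out.println("object1 -> " + locate(ring, 10L));
        System.out.println("object2 -> " + locate(ring, 50L));
        System.out.println("object3 -> " + locate(ring, 70L));
        System.out.println("object4 -> " + locate(ring, 30L));
        // A key beyond the last cache wraps around to the first one.
        System.out.println("key 90  -> " + locate(ring, 90L));
    }
}
```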

3.5 check cache changes

As noted earlier, the biggest problem with the hash-then-modulo method is that it does not satisfy monotonicity: when the set of caches changes, cached entries become invalid and the backend servers take a heavy hit. Now let's analyze how the consistent hashing algorithm behaves.

3.5.1 remove cache

Suppose cache B goes down. With the mapping method described above, the only objects affected are those found by walking counterclockwise from cache B until the preceding cache is reached, that is, the objects that were mapped to cache B. They are taken over by the next cache clockwise, cache C.

Therefore, only object4 needs to change: it is remapped to cache C. See figure 4.

Figure 4 cache mapping after cache B is removed
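The removal case can be checked with a small sketch that compares each object's cache before and after cache B leaves the ring. The numeric positions are hypothetical values picked to match the layout described in the text, and the class and method names are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class RemoveCacheDemo {
    /** First cache at or clockwise after the position, wrapping around the ring. */
    static String locate(TreeMap<Long, String> ring, long position) {
        Map.Entry<Long, String> entry = ring.ceilingEntry(position);
        if (entry == null) {
            entry = ring.firstEntry();
        }
        return entry == null ? null : entry.getValue();
    }

    /** Names of the objects whose cache differs between the two rings. */
    static List<String> remapped(Map<String, Long> objects,
                                 TreeMap<Long, String> before,
                                 TreeMap<Long, String> after) {
        List<String> changed = new ArrayList<>();
        for (Map.Entry<String, Long> object : objects.entrySet()) {
            String oldCache = locate(before, object.getValue());
            String newCache = locate(after, object.getValue());
            if (oldCache == null ? newCache != null : !oldCache.equals(newCache)) {
                changed.add(object.getKey());
            }
        }
        return changed;
    }

    /** Hypothetical object keys on the ring. */
    static Map<String, Long> objects() {
        Map<String, Long> objects = new LinkedHashMap<>();
        objects.put("object1", 10L);
        objects.put("object2", 50L);
        objects.put("object3", 70L);
        objects.put("object4", 30L);
        return objects;
    }

    /** Hypothetical cache positions: A=20, B=40, C=80. */
    static TreeMap<Long, String> ringWithABC() {
        TreeMap<Long, String> ring = new TreeMap<>();
        ring.put(20L, "cache A");
        ring.put(40L, "cache B");
        ring.put(80L, "cache C");
        return ring;
    }

    public static void main(String[] args) {
        TreeMap<Long, String> before = ringWithABC();
        TreeMap<Long, String> after = new TreeMap<>(before);
        after.remove(40L); // cache B goes down
        System.out.println("remapped: " + remapped(objects(), before, after));
        System.out.println("object4 now on " + locate(after, 30L));
    }
}
```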

3.5.2 add Cache

Now consider adding a new cache D. Suppose that in this ring hash space cache D is mapped between object2 and object3. The only objects affected are those found by walking counterclockwise from cache D until the preceding cache (cache B) is reached. These are a subset of the objects originally mapped to cache C, and they are remapped to cache D.

 

Therefore, only object2 needs to change: it is remapped to cache D. See figure 5.

Figure 5 mapping after cache D is added
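The "walk counterclockwise to the preceding cache" rule can be written down directly: the objects a new cache takes over are those whose keys fall in the arc between the preceding cache (exclusive) and the new cache's position (inclusive). The sketch below uses TreeMap.lowerKey to find the preceding cache. All positions and names are hypothetical, with cache D placed at 60 so that it lies between object2 (50) and object3 (70).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class AddCacheDemo {
    /** True if position lies on the clockwise arc (start, end], which may wrap. */
    static boolean inArc(long position, long start, long end) {
        if (start < end) {
            return position > start && position <= end;
        }
        return position > start || position <= end; // arc passes the ring's end
    }

    /** Objects that would move to a cache newly placed at newPosition. */
    static List<String> takenOver(TreeMap<Long, String> ring,
                                  Map<String, Long> objects,
                                  long newPosition) {
        List<String> moved = new ArrayList<>();
        if (ring.isEmpty()) {
            moved.addAll(objects.keySet()); // the first cache takes everything
            return moved;
        }
        Long previous = ring.lowerKey(newPosition); // preceding cache, counterclockwise
        if (previous == null) {
            previous = ring.lastKey(); // wrap backwards past position 0
        }
        for (Map.Entry<String, Long> object : objects.entrySet()) {
            if (inArc(object.getValue(), previous, newPosition)) {
                moved.add(object.getKey());
            }
        }
        return moved;
    }

    /** Hypothetical object keys on the ring. */
    static Map<String, Long> objects() {
        Map<String, Long> objects = new LinkedHashMap<>();
        objects.put("object1", 10L);
        objects.put("object2", 50L);
        objects.put("object3", 70L);
        objects.put("object4", 30L);
        return objects;
    }

    /** Hypothetical cache positions: A=20, B=40, C=80. */
    static TreeMap<Long, String> ringWithABC() {
        TreeMap<Long, String> ring = new TreeMap<>();
        ring.put(20L, "cache A");
        ring.put(40L, "cache B");
        ring.put(80L, "cache C");
        return ring;
    }

    public static void main(String[] args) {
        // Cache D lands at 60, between object2 (50) and object3 (70).
        System.out.println("moved to cache D: "
                + takenOver(ringWithABC(), objects(), 60L));
    }
}
```

Every object outside that arc keeps its cache, which is why adding a node disturbs only one neighbour's share of the keys.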

4. Virtual nodes

Another measure of a hash algorithm is balance, which is defined as follows:

Balance

Balance means that the hash results are spread across all caches as evenly as possible, so that all of the cache space is used.

A hash algorithm does not guarantee absolute balance. When there are few caches, objects may not be mapped evenly onto them. In the example above, if only cache A and cache C are deployed, cache A stores only object1 of the four objects, while cache C stores object2, object3, and object4; the distribution is very unbalanced.

To solve this problem, consistent hashing introduces the concept of the "virtual node", which can be defined as follows:

A "virtual node" is a replica of an actual node in the hash space. One actual node corresponds to several "virtual nodes", and that number is called the "number of replicas". "Virtual nodes" are arranged in the hash space by their hash values.

We again use the deployment with only cache A and cache C as the example. As figure 4 showed, the cache distribution is uneven. Now we introduce virtual nodes and set the "number of replicas" to 2, which gives four "virtual nodes" in total: cache A1 and cache A2 represent cache A, and cache C1 and cache C2 represent cache C. Figure 6 shows a fairly ideal arrangement.

Figure 6 mapping after "virtual nodes" are introduced

 

In this case, the mapping from objects to "virtual nodes" is as follows:

object1 -> cache A2; object2 -> cache A1; object3 -> cache C1; object4 -> cache C2;

Therefore, both object1 and object2 are mapped to cache A, while object3 and object4 are mapped to cache C. The balance is greatly improved.

After "virtual nodes" are introduced, the mapping changes from {object -> node} to {object -> virtual node}. Figure 7 shows the mapping used when querying which cache holds an object.

Figure 7 querying the cache that holds an object
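The two-step lookup (object to virtual node, then virtual node to actual cache) can be sketched as follows. The positions are hypothetical and were chosen only to reproduce the assignment listed above; the owners table and all class and method names are likewise assumptions made for this sketch.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

final class VirtualNodeLookupDemo {
    /** Hypothetical positions laid out like figure 6: two virtual nodes per cache. */
    static TreeMap<Long, String> figure6Ring() {
        TreeMap<Long, String> ring = new TreeMap<>();
        ring.put(20L, "cache A2");
        ring.put(40L, "cache C2");
        ring.put(60L, "cache A1");
        ring.put(80L, "cache C1");
        return ring;
    }

    /** Second-level table: virtual node -> the actual cache it stands for. */
    static Map<String, String> owners() {
        Map<String, String> owners = new HashMap<>();
        owners.put("cache A1", "cache A");
        owners.put("cache A2", "cache A");
        owners.put("cache C1", "cache C");
        owners.put("cache C2", "cache C");
        return owners;
    }

    /** Step 1: first virtual node at or clockwise after the position. */
    static String locateVirtual(TreeMap<Long, String> ring, long position) {
        Map.Entry<Long, String> entry = ring.ceilingEntry(position);
        if (entry == null) {
            entry = ring.firstEntry(); // wrap around the ring
        }
        return entry == null ? null : entry.getValue();
    }

    /** Step 2: resolve the virtual node to its actual cache. */
    static String locateCache(TreeMap<Long, String> ring,
                              Map<String, String> owners,
                              long position) {
        String virtualNode = locateVirtual(ring, position);
        return virtualNode == null ? null : owners.get(virtualNode);
    }

    public static void main(String[] args) {
        TreeMap<Long, String> ring = figure6Ring();
        Map<String, String> owners = owners();
        // Hypothetical object keys: object1=10, object4=30, object2=50, object3=70.
        long[] keys = {10L, 50L, 70L, 30L};
        String[] names = {"object1", "object2", "object3", "object4"};
        for (int i = 0; i < keys.length; i++) {
            System.out.println(names[i] + " -> " + locateVirtual(ring, keys[i])
                    + " -> " + locateCache(ring, owners, keys[i]));
        }
    }
}
```

In practice the ring can store the actual cache directly as the value for each virtual node's position, which folds the two steps into one lookup.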

 

The hash of a "virtual node" can be computed from the IP address of the corresponding node plus a numeric suffix. For example, assume that the IP address of cache A is 202.168.14.241.

Before "virtual nodes" are introduced, the hash value of cache A is computed as:

hash("202.168.14.241");

After "virtual nodes" are introduced, the hash values of the "virtual nodes" cache A1 and cache A2 are computed as:

hash("202.168.14.241#1"); // cache A1

hash("202.168.14.241#2"); // cache A2
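Putting the suffix scheme to work, the following sketch builds a ring in which every address contributes several virtual nodes keyed as address#1, address#2, and so on. The addresses 10.0.0.1 and 10.0.0.2, the choice of CRC32 as the hash, and the class and method names are all assumptions made for illustration; the article does not prescribe any of them.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.zip.CRC32;

final class VirtualNodeRingDemo {
    /** Illustrative hash: CRC32 of the UTF-8 bytes, a value in [0, 2^32 - 1]. */
    static long hash(String key) {
        CRC32 crc = new CRC32();
        crc.update(key.getBytes(StandardCharsets.UTF_8));
        return crc.getValue();
    }

    /** Places `replicas` virtual nodes per address, keyed "address#1", "address#2", ... */
    static TreeMap<Long, String> buildRing(List<String> addresses, int replicas) {
        TreeMap<Long, String> ring = new TreeMap<>();
        for (String address : addresses) {
            for (int i = 1; i <= replicas; i++) {
                // The ring stores the actual address for each virtual node position.
                ring.put(hash(address + "#" + i), address);
            }
        }
        return ring;
    }

    /** Actual address responsible for the object key, or null on an empty ring. */
    static String locate(TreeMap<Long, String> ring, String objectKey) {
        Map.Entry<Long, String> entry = ring.ceilingEntry(hash(objectKey));
        if (entry == null) {
            entry = ring.firstEntry(); // wrap around the ring
        }
        return entry == null ? null : entry.getValue();
    }

    public static void main(String[] args) {
        List<String> addresses = Arrays.asList("10.0.0.1", "10.0.0.2");
        TreeMap<Long, String> ring = buildRing(addresses, 100);

        // Count how many of 10,000 sample keys land on each address.
        Map<String, Integer> counts = new HashMap<>();
        for (int i = 0; i < 10_000; i++) {
            counts.merge(locate(ring, "object" + i), 1, Integer::sum);
        }
        System.out.println(counts);
    }
}
```

Raising the number of replicas generally evens out the counts, at the cost of a larger ring to store and search.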

5. Summary

These are the basic principles of consistent hashing. The theoretical analysis, such as the analysis of the distribution, is considerably more involved, but it is rarely needed in practice.

A Java example is available for reference at http://weblogs.java.net/blog/2007/11/27/consistent-hashing

A C implementation: http://www.codeproject.com/KB/recipes/lib-conhash.aspx

 

References:

http://portal.acm.org/citation.cfm?id=258660

http://en.wikipedia.org/wiki/Consistent_hashing

http://www.spiteful.com/2008/03/17/programmers-toolbox-part-3-consistent-hashing/

http://weblogs.java.net/blog/2007/11/27/consistent-hashing

http://tech.idv2.com/2008/07/24/memcached-004/

http://blog.csdn.net/mayongzhan/archive/2009/06/25/4298834.aspx
