Time: 2014.07.17
Location: Second floor of the base
Bytes ----------------------------------------------------------------------------------------
I. Why do we need a consistent hash algorithm?
Consider a scenario: server load balancing. There are n servers, for example n caches, and we need a policy that selects a cache number for each object. Any matching policy that guarantees this will work functionally. Suppose we take the following simple hash mapping:
Hash (object) mod n
Such a system appears to work normally, but consider the following problems:
Problem 1: One day we need to add a server. In this case, the hash mapping has to change to:
Hash (object) mod (n + 1)
Problem 2: One day we need to delete a server. In this case, the hash mapping has to change to:
Hash (object) mod (n-1)
Problem 3: How do we keep the load distribution across servers even?
Now we see the trouble. Because the hash mapping changes, almost all objects are hashed to a new location, that is, mapped to a different server. For a cache this is a disaster, so we need consistent hashing to improve the situation.
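To get a feel for how many objects move, here is a minimal sketch that counts remapped objects when going from 4 to 5 servers. The MD5-based stable_hash, the object names, and the server counts are assumptions made for illustration only; they are not part of the original scheme.

```python
import hashlib

def stable_hash(key: str) -> int:
    """Deterministic 32-bit hash (MD5 is an arbitrary choice for this sketch)."""
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:4], "big")

def moved_fraction(keys, old_n, new_n):
    """Fraction of keys whose server index changes when the server count changes."""
    moved = sum(1 for k in keys if stable_hash(k) % old_n != stable_hash(k) % new_n)
    return moved / len(keys)

# Hypothetical object names, used only to drive the experiment.
keys = [f"object{i}" for i in range(10_000)]
fraction = moved_fraction(keys, 4, 5)
print(f"{fraction:.0%} of objects move when going from 4 to 5 servers")
```

With a reasonably uniform hash, the printed share should be close to four fifths, which is what "almost all objects" means in practice.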
Bytes ----------------------------------------------------------------------------------------
II. Consistent hash
Consistent hashing ensures that when any server is added to or deleted from the system, only a limited number of associated objects need to be rematched. It preserves the existing matching between objects and servers to the greatest possible extent.
Bytes ----------------------------------------------------------------------------------------
III. Hash space
Generally, the hash function maps an object to an integer value in the range [0, 2^32 - 1]. As shown in the figure below, we join the beginning and end of this value range into a ring, which is called the ring hash space.
Hash Space
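A minimal sketch of the ring hash space, assuming the first 4 bytes of an MD5 digest as the hash function (any well-distributed 32-bit hash would do; the function names are made up for this example). The modulo in clockwise_distance is what closes the value range into a ring.

```python
import hashlib

RING_SIZE = 2 ** 32  # positions run from 0 to 2^32 - 1

def ring_position(name: str) -> int:
    """Map a name to a position on the ring (MD5 is an assumption of this sketch)."""
    digest = hashlib.md5(name.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % RING_SIZE

def clockwise_distance(start: int, end: int) -> int:
    """Steps needed to walk clockwise from start to end, wrapping past 2^32 - 1 to 0."""
    return (end - start) % RING_SIZE

print(ring_position("object1"))
print(clockwise_distance(RING_SIZE - 1, 0))  # the last position is adjacent to 0
```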
Bytes ----------------------------------------------------------------------------------------
4. Map an object to a hash space
Assume that there are four objects: object1 ~ object4. We use the hash function to obtain their respective key values and map them onto the ring hash space:
Hash (object1) = key1;
Hash (object2) = key2;
Hash (object3) = key3;
Hash (object4) = key4;
Map an object to a hash space
Bytes ----------------------------------------------------------------------------------------
5. Map the cache to the hash space
We use the same hash function to map the servers (caches) onto the ring hash space.
Suppose we have three servers, A, B, and C, which are hashed as follows:
Hash (cachea) = keya;
Hash (cacheb) = keyb;
Hash (cachec) = keyc;
Map the cache to the hash space in the same way.
Bytes ----------------------------------------------------------------------------------------
6. Match the object and cache
After the previous steps, both the objects and the caches have been mapped onto the ring hash space. Next, we determine how objects map to caches. The policy is to move clockwise from the object's position until the first cache is found. If that cache is available, the object is matched with it; otherwise, the search continues to the next cache. Based on this principle, the matching result is as follows:
Object1 --> cachea
Object2 --> cachec
Object3 --> cachec
Object4 --> cacheb
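The clockwise search can be sketched with a sorted list and a binary search. The ring positions below are hypothetical numbers chosen only so that the clockwise order is object1, cachea, object4, cacheb, object2, object3, cachec; real positions would come from the hash function. The availability check is left out of this sketch.

```python
from bisect import bisect_left

# Hypothetical positions that reproduce the example layout.
cache_positions = {"cachea": 20, "cacheb": 40, "cachec": 70}
object_positions = {"object1": 10, "object2": 50, "object3": 60, "object4": 30}

def find_cache(position, caches):
    """Return the first cache at or after `position`, wrapping around the ring."""
    ring = sorted((pos, name) for name, pos in caches.items())
    index = bisect_left(ring, (position, ""))
    return ring[index % len(ring)][1]  # past the last cache, wrap to the first one

matching = {obj: find_cache(pos, cache_positions)
            for obj, pos in object_positions.items()}
for obj, cache in sorted(matching.items()):
    print(obj, "-->", cache)
```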
Bytes ----------------------------------------------------------------------------------------
7. Add or delete a cache
(1) When a cache crashes, it is removed from the system. For example, object4 is mapped to cacheb; if cacheb is removed, object4 has to be remapped. We only need to search clockwise for the next available cache, which here is cachec. The mapping of all other objects stays unchanged.
Cacheb crash
(2) When a new cache, cached (cache D), is added between object2 and object3, only the objects between cacheb and cached need to be remapped. Here object2 is rebound to the newly added cached.
Add cached
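Both cases can be checked with a small ring structure. This is a sketch under the same assumed positions as before (hypothetical numbers, not real hash values); the Ring class and the changed helper are names invented for this example, and the two cases are treated as independent scenarios.

```python
from bisect import bisect_left

class Ring:
    """Minimal hash ring keyed by explicit positions (illustrative only)."""

    def __init__(self):
        self._ring = []  # sorted list of (position, cache name)

    def add(self, name, position):
        self._ring.append((position, name))
        self._ring.sort()

    def remove(self, name):
        self._ring = [(p, n) for p, n in self._ring if n != name]

    def find(self, position):
        index = bisect_left(self._ring, (position, ""))
        return self._ring[index % len(self._ring)][1]

def changed(old, new):
    """Objects whose cache differs between two mappings."""
    return {obj: new[obj] for obj in old if old[obj] != new[obj]}

objects = {"object1": 10, "object2": 50, "object3": 60, "object4": 30}
ring = Ring()
for name, pos in [("cachea", 20), ("cacheb", 40), ("cachec", 70)]:
    ring.add(name, pos)
before = {o: ring.find(p) for o, p in objects.items()}

# Case (1): cacheb crashes and is removed.
ring.remove("cacheb")
moved_by_crash = changed(before, {o: ring.find(p) for o, p in objects.items()})

# Case (2): starting again from the original ring, cached is added
# between object2 and object3.
ring.add("cacheb", 40)
ring.add("cached", 55)
moved_by_add = changed(before, {o: ring.find(p) for o, p in objects.items()})

print(moved_by_crash)  # {'object4': 'cachec'}
print(moved_by_add)    # {'object2': 'cached'}
```

In both cases exactly one object moves, while the modulo scheme would have reshuffled most of them.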
Bytes ----------------------------------------------------------------------------------------
8. Virtual nodes
The approach above effectively avoids a major impact on the whole system when server nodes are added or deleted. However, one problem remains: if there are only a few caches in the ring hash space, objects are not distributed evenly among them. The concept of virtual nodes was introduced to improve on this shortcoming. A virtual node is a replica of a real cache node in the ring hash space, and each cache is associated with several virtual nodes on the ring. Adding a cache actually adds several such virtual nodes to the ring; deleting a cache also removes all virtual nodes related to it.
Continue with the example above. The system now has cachea and cachec, and virtual nodes are introduced. Assuming two replicas each, there are four virtual nodes in the ring space: cachea1 and cachea2 represent cachea, and cachec1 and cachec2 represent cachec. The mapping from objects to virtual nodes is then:
Object1 --> cachea2;
Object2 --> cachea1;
Object3 --> cachec1;
Object4 --> cachec2
In this way, the distribution becomes relatively even.
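The effect of virtual nodes can be sketched as follows. The MD5-based ring_position, the "#i" naming of replicas, the replica counts, and the object names are all assumptions made for illustration; the original only fixes the idea of several virtual nodes per cache.

```python
import hashlib
from bisect import bisect_left
from collections import Counter

def ring_position(name):
    """32-bit ring position from MD5 (an assumption of this sketch)."""
    return int.from_bytes(hashlib.md5(name.encode()).digest()[:4], "big")

def build_ring(caches, replicas):
    """Place `replicas` virtual nodes per cache, each pointing back to the real cache."""
    return sorted((ring_position(f"{c}#{i}"), c)
                  for c in caches for i in range(replicas))

def find_cache(ring, key):
    index = bisect_left(ring, (ring_position(key), ""))
    return ring[index % len(ring)][1]

def load(caches, replicas, keys):
    """Number of keys that land on each real cache."""
    ring = build_ring(caches, replicas)
    return Counter(find_cache(ring, k) for k in keys)

keys = [f"object{i}" for i in range(10_000)]  # hypothetical objects
caches = ["cachea", "cachec"]
print("1 virtual node each:   ", dict(load(caches, 1, keys)))
print("100 virtual nodes each:", dict(load(caches, 100, keys)))
```

With a single node per cache the split depends entirely on where the two positions happen to fall; with many virtual nodes per cache the two counts should come out much closer to each other.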
9. Application of the consistent hash algorithm
Finally, here is an example of a practical application.
Problem description: a mobile friends service has n servers. To speed up user access, user data is cached on the servers, so it is best for each user to reach the same server on every access.
The existing method selects the server with serveripindex [qqnum % n]. This distributes users across the servers conveniently. However, if one server dies, n becomes n-1, and serveripindex [qqnum % n] and serveripindex [qqnum % (n-1)] give different results for most users. Most requests are therefore forwarded to a different server, resulting in a large number of access errors.
Q: How can the method be improved or changed so that:
(1) the death of one server does not cause access errors on a large scale;
(2) existing users mostly stay on the same server as before;
(3) server load is balanced as much as possible.
The traditional method has already been given, i.e., taking the remainder (modulo). It is very simple, but it does not meet these needs. So we consider the consistent hash algorithm described above.
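As a rough comparison of the two approaches for this problem, the sketch below kills one of five servers and counts how many users end up on a different server. The server IPs, the qqnum range, the MD5 hash, and the 100 virtual nodes per server are all hypothetical choices for illustration.

```python
import hashlib
from bisect import bisect_left

def ring_position(name):
    return int.from_bytes(hashlib.md5(name.encode()).digest()[:4], "big")

def build_ring(servers, replicas=100):
    """Ring with `replicas` virtual nodes per server."""
    return sorted((ring_position(f"{s}#{i}"), s)
                  for s in servers for i in range(replicas))

def pick_server(ring, qqnum):
    index = bisect_left(ring, (ring_position(str(qqnum)), ""))
    return ring[index % len(ring)][1]

servers = [f"10.0.0.{i}" for i in range(1, 6)]  # hypothetical server IPs
users = range(100_000, 105_000)                 # hypothetical qqnum values
dead = servers[2]
alive = [s for s in servers if s != dead]

# Existing method: serveripindex[qqnum % n] before and after the failure.
modulo_moved = sum(1 for u in users
                   if servers[u % len(servers)] != alive[u % len(alive)])

# Consistent hashing: the same users looked up on the old and the new ring.
old_ring, new_ring = build_ring(servers), build_ring(alive)
old_choice = {u: pick_server(old_ring, u) for u in users}
ring_moved = [u for u in users if pick_server(new_ring, u) != old_choice[u]]

print(f"modulo: {modulo_moved / len(users):.0%} of users change server")
print(f"ring:   {len(ring_moved) / len(users):.0%} of users change server")
```

On the ring, the only users that move are those that were on the dead server, which addresses requirements (1) and (2); the virtual nodes are what help with requirement (3).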
Other application scenarios:
There are many server load balancing algorithms to choose from, including round robin, hash, least connection, response time, and weighted. Hash algorithms are among the most commonly used.
The most typical scenario is n servers providing a cache service. Requests need to be balanced evenly across the servers so that each machine handles 1/n of the load.
The common algorithm takes the remainder of the hash result (Hash () mod n): machines are numbered from 0 to n-1, the hash () value of each request is taken modulo n to obtain the remainder i, and the request is distributed to machine i. However, this algorithm has a fatal problem. If a machine goes down, requests that fall on it cannot be processed correctly, so the machine has to be removed from the algorithm; then (n-1)/n of the cached data has to be recomputed. If a machine is added, n/(n+1) of the cached data has to be recomputed. This is usually unacceptable, because it means that a large amount of cache is invalidated or data must be transferred. So how can a load balancing policy be designed to minimize the number of affected requests?
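The two fractions can be verified exactly by counting over one full period of hash values, assuming the hash values are uniformly distributed. The function name and the choice n = 10 are only for illustration.

```python
from fractions import Fraction

def remapped_fraction(old_n: int, new_n: int) -> Fraction:
    """Exact share of hash values h with h % old_n != h % new_n.

    Counted over one full period old_n * new_n, which is enough because
    consecutive integers share no common factor.
    """
    period = old_n * new_n
    moved = sum(1 for h in range(period) if h % old_n != h % new_n)
    return Fraction(moved, period)

n = 10
print(remapped_fraction(n, n - 1))  # one machine removed: 9/10
print(remapped_fraction(n, n + 1))  # one machine added:   10/11
```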
Consistent hashing is used in memcached, key-value stores, BitTorrent DHT, and LVS. It can be said that consistent hashing is a preferred algorithm for load balancing in distributed systems.