Consistent hashing in Java: implementation and principles

Source: Internet
Author: User

Consistent Hash

In a distributed system, a service is spread across a number of nodes so that they can carry the load together. For a given client, however, which node provides the service? In other words, which node are that client's tasks assigned to?

Strong hash

When a single server can no longer carry the load, a distributed architecture is used. The initial algorithm is hash() mod n, where hash() is usually computed from the user id and n is the number of nodes. This approach is easy to implement and meets basic operational requirements. Its disadvantages are that the system cannot recover automatically when a single node fails, and that nodes cannot be added dynamically.
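The hash() mod n scheme can be sketched in a few lines of Java. This is only an illustration: the class name ModuloRouter, the node names, and the choice of Long.hashCode as hash() are assumptions, not part of the original article.

```java
import java.util.List;

final class ModuloRouter {
    // Maps a user id to a node index in [0, n) using hash(userId) mod n.
    static int nodeIndex(long userId, int n) {
        if (n <= 0) {
            throw new IllegalArgumentException("n must be positive");
        }
        // Long.hashCode folds the 64-bit id into an int (assumed hash function);
        // floorMod keeps the result non-negative even for negative hash values.
        return Math.floorMod(Long.hashCode(userId), n);
    }

    public static void main(String[] args) {
        // Hypothetical node names, for illustration only.
        List<String> nodes = List.of("node-0", "node-1", "node-2", "node-3");
        for (long userId = 1001; userId <= 1004; userId++) {
            int index = nodeIndex(userId, nodes.size());
            System.out.println("user " + userId + " -> " + nodes.get(index));
        }
    }
}
```

Each user id always lands on exactly one node, which is why the failure of that node leaves the user with nowhere to go.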

Weak hash

To remove the single point of failure, use hash() mod (n/m).

This gives every user m candidate servers, from which the client can choose at random.
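A minimal sketch of the hash() mod (n/m) idea follows. The article does not say how the n/m hash values map onto concrete servers, so this sketch assumes the n servers are split into contiguous groups of m; the class name GroupedRouter and the use of Long.hashCode are likewise assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

final class GroupedRouter {
    // Returns the m candidate server indices for a user.
    // Assumption: servers 0..n-1 are split into n/m contiguous groups of size m.
    static List<Integer> candidates(long userId, int n, int m) {
        if (n <= 0 || m <= 0 || n % m != 0) {
            throw new IllegalArgumentException("n must be a positive multiple of m");
        }
        int groups = n / m;
        int group = Math.floorMod(Long.hashCode(userId), groups);
        List<Integer> result = new ArrayList<>(m);
        for (int i = 0; i < m; i++) {
            result.add(group * m + i);
        }
        return result;
    }

    // The client picks one of its m candidates at random.
    static int pick(long userId, int n, int m, Random random) {
        List<Integer> options = candidates(userId, n, m);
        return options.get(random.nextInt(options.size()));
    }

    public static void main(String[] args) {
        Random random = new Random();
        System.out.println("candidates: " + candidates(1001L, 6, 2));
        System.out.println("picked: " + pick(1001L, 6, 2, random));
    }
}
```

If one server in a group fails, the client can retry with another candidate from the same group, which is the backup switch described in the text.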

Because users on different servers need to interact with each other, every server needs to know exactly which server a user is on.

Therefore, the user's location is saved in memcached. When a fault occurs, the client can automatically switch to the corresponding backup server. Because the backup did not hold the user's session before the switch, the client has to log in again.

    • Benefits

Its advantage over strong hashing is that it solves the single point of failure problem.

    • Disadvantages

However, it has the following problems: the load is unbalanced, and after one server fails the remaining servers in its group come under too much pressure; nodes cannot be removed dynamically; and a node failure requires the client to log in again.

Consistent hash algorithm

The consistent hashing algorithm proposes four criteria for judging how good a hash algorithm is in a dynamically changing cache environment:

Balance

Balance means that hash results are spread across all buffers as evenly as possible, so that all buffer space is used. Many hashing algorithms satisfy this condition.

Monotonicity

Monotonicity means that if some content has already been assigned to buffers by hashing and a new buffer is then added to the system, previously assigned content should either stay in its original buffer or move to the new buffer. It must not be remapped to a different buffer in the old buffer set.

Dispersion (Spread)

In a distributed environment, a terminal may not see all of the buffers, only a subset of them.

When terminals map content to buffers by hashing, different terminals may see different sets of buffers, so their hash results can be inconsistent, and the same content ends up mapped to different buffers by different terminals.

This should obviously be avoided, because it causes the same content to be stored in several buffers, reducing the system's storage efficiency. Dispersion is defined as the severity of this situation. A good hashing algorithm should avoid such inconsistency as far as possible, that is, minimize dispersion.

Load

The load problem looks at the dispersion problem from another angle. Since different terminals may map the same content to different buffers, a particular buffer may also be assigned different content by different terminals.

As with dispersion, this situation should be avoided, so a good hashing algorithm should keep the load on each buffer as low as possible.

The common hashing algorithm (also called hard hashing) hashes keys to machines with a simple modulo operation. This gives satisfactory results as long as the cache environment does not change, but when the cache environment changes dynamically, static modulo hashing clearly fails the monotonicity requirement: when one machine is added or removed, almost all stored content is rehashed to a different buffer.
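The effect of changing n on modulo hashing can be checked with a small count. The sketch below is illustrative only: it uses the integer key itself as its hash value (an assumption made so the result is easy to verify by hand), and the class name ModuloRemapDemo is hypothetical.

```java
final class ModuloRemapDemo {
    // Counts how many of the keys 0..keyCount-1 land on a different node
    // when the node count changes from oldN to newN under key mod n.
    static int movedKeys(int keyCount, int oldN, int newN) {
        int moved = 0;
        for (int key = 0; key < keyCount; key++) {
            // Assumption: the key is used directly as its hash value.
            if (Math.floorMod(key, oldN) != Math.floorMod(key, newN)) {
                moved++;
            }
        }
        return moved;
    }

    public static void main(String[] args) {
        int total = 10_000;
        int moved = movedKeys(total, 4, 5);
        // Prints "8000 of 10000 keys moved"
        System.out.println(moved + " of " + total + " keys moved");
    }
}
```

Going from 4 to 5 nodes, a key keeps its node only when key mod 4 equals key mod 5, which happens for 4 out of every 20 consecutive keys, so 80% of the keys move.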

Implementation logic

There are many concrete implementations of consistent hashing, including the Chord algorithm and the KAD algorithm, but these are fairly complicated.

This article introduces the basic principle of a consistent hashing implementation that is widely circulated on the Internet. Interested readers can find more detail through the links above or by searching online.

The basic idea is to map both machine nodes and keys onto the same ring, covering the range 0 to 2^32, using the same hash algorithm.

When a request to write to the cache arrives, compute the hash value of the key K. If this value exactly matches the hash value of a machine node, the data is written directly to that node. If there is no matching node, search clockwise for the next node and write to it. If no node has been found by the time the search passes 2^32, continue from 0 (because the structure is a ring).

This is shown in Figure 1:

The hash value of Key K in Figure 1 is between A and B, so K is handled by Node B.
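The clockwise lookup described above maps naturally onto a sorted map. The following is a minimal sketch, not the implementation the article links to: the class name HashRing is hypothetical, and CRC32 is used as the hash function only because it yields a value in the range 0 to 2^32 - 1; the article does not name a specific hash.

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.ToLongFunction;
import java.util.zip.CRC32;

final class HashRing {
    // Ring position -> node name, kept sorted so the clockwise successor is cheap to find.
    private final TreeMap<Long, String> ring = new TreeMap<>();
    private final ToLongFunction<String> hash;

    // The hash function is injectable so positions can be controlled in tests.
    HashRing(ToLongFunction<String> hash) {
        this.hash = hash;
    }

    // Assumed hash: CRC32 of the UTF-8 bytes, a value in [0, 2^32).
    static long crc32(String s) {
        CRC32 crc = new CRC32();
        crc.update(s.getBytes(StandardCharsets.UTF_8));
        return crc.getValue();
    }

    void addNode(String node) {
        ring.put(hash.applyAsLong(node), node);
    }

    // Finds the node at the key's position or the next one clockwise,
    // wrapping around to the smallest position when the end of the ring is passed.
    String nodeFor(String key) {
        if (ring.isEmpty()) {
            throw new IllegalStateException("ring has no nodes");
        }
        Map.Entry<Long, String> entry = ring.ceilingEntry(hash.applyAsLong(key));
        return (entry != null ? entry : ring.firstEntry()).getValue();
    }

    public static void main(String[] args) {
        HashRing ring = new HashRing(HashRing::crc32);
        for (String node : new String[] {"A", "B", "C", "D"}) {
            ring.addNode(node);
        }
        System.out.println("K -> " + ring.nodeFor("K"));
    }
}
```

TreeMap.ceilingEntry returns the entry with the smallest position greater than or equal to the key's position, which covers both the exact-match case and the clockwise search; falling back to firstEntry handles the wrap from 2^32 back to 0.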

Besides mapping each physical machine to a single point on the ring, you can also map one physical node to multiple virtual nodes, with the number depending on its processing power.

With consistent hashing, adding a new machine affects the data stored on only one existing machine.

For example, if the hash of a newly added node H falls between B and C, some of the data originally handled by C moves to H, while all other nodes are unaffected. This shows good monotonicity.

If a machine is removed, for example node C, the data that was handled by C moves to node D, and the other nodes are unaffected.
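The add and remove cases can be traced on a tiny ring. In this sketch the ring positions are hand-picked rather than hashed (A=100, B=200, C=300, D=400, H=250), purely so the result can be followed by eye; the class name RingChangeDemo and all positions are assumptions.

```java
import java.util.Map;
import java.util.TreeMap;

final class RingChangeDemo {
    // Hypothetical fixed positions standing in for the hash values of nodes A to D.
    static TreeMap<Long, String> baseRing() {
        TreeMap<Long, String> ring = new TreeMap<>();
        ring.put(100L, "A");
        ring.put(200L, "B");
        ring.put(300L, "C");
        ring.put(400L, "D");
        return ring;
    }

    // Clockwise lookup: first node at or after the position, wrapping to the smallest one.
    static String owner(TreeMap<Long, String> ring, long position) {
        Map.Entry<Long, String> entry = ring.ceilingEntry(position);
        return (entry != null ? entry : ring.firstEntry()).getValue();
    }

    public static void main(String[] args) {
        long[] keys = {50, 150, 220, 280, 350, 450};

        TreeMap<Long, String> base = baseRing();
        TreeMap<Long, String> withH = baseRing();
        withH.put(250L, "H");      // H lands between B and C
        TreeMap<Long, String> withoutC = baseRing();
        withoutC.remove(300L);     // C is deleted

        for (long key : keys) {
            System.out.println(key + ": base=" + owner(base, key)
                    + " afterAddH=" + owner(withH, key)
                    + " afterRemoveC=" + owner(withoutC, key));
        }
    }
}
```

With these positions, adding H moves only key 220 (from C to H), and removing C moves only keys 220 and 280 (from C to D); every other key keeps its node.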

Because machine nodes and cached content are hashed with the same algorithm, dispersion and load are also reduced.

By introducing virtual nodes, the balance is greatly improved.
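Virtual nodes can be sketched by placing each physical node on the ring several times. The code below is an illustration under assumptions: the class name VirtualNodeRing, the "node#i" naming of virtual nodes, and the CRC32 hash are all choices made for this example rather than details from the article.

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.TreeMap;
import java.util.zip.CRC32;

final class VirtualNodeRing {
    // Ring position -> physical node name; one physical node owns many positions.
    private final TreeMap<Long, String> ring = new TreeMap<>();

    // Assumed hash: CRC32 of the UTF-8 bytes, a value in [0, 2^32).
    static long hash(String s) {
        CRC32 crc = new CRC32();
        crc.update(s.getBytes(StandardCharsets.UTF_8));
        return crc.getValue();
    }

    // Places virtualCount virtual nodes for one physical node; a more powerful
    // machine can be given a larger count. "node#i" is a hypothetical naming scheme.
    void addNode(String node, int virtualCount) {
        for (int i = 0; i < virtualCount; i++) {
            // putIfAbsent keeps the earlier owner if two virtual nodes collide.
            ring.putIfAbsent(hash(node + "#" + i), node);
        }
    }

    // Removes every ring position owned by the given physical node.
    void removeNode(String node) {
        ring.values().removeIf(node::equals);
    }

    String nodeFor(String key) {
        if (ring.isEmpty()) {
            throw new IllegalStateException("ring has no nodes");
        }
        Map.Entry<Long, String> entry = ring.ceilingEntry(hash(key));
        return (entry != null ? entry : ring.firstEntry()).getValue();
    }

    // Counts how many of the keys "key-0".."key-(n-1)" each physical node receives.
    static Map<String, Integer> countKeys(VirtualNodeRing ring, int keyCount) {
        Map<String, Integer> counts = new TreeMap<>();
        for (int i = 0; i < keyCount; i++) {
            counts.merge(ring.nodeFor("key-" + i), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        for (int virtualCount : new int[] {1, 100}) {
            VirtualNodeRing ring = new VirtualNodeRing();
            for (String node : new String[] {"node-a", "node-b", "node-c"}) {
                ring.addNode(node, virtualCount);
            }
            System.out.println(virtualCount + " virtual node(s) each: " + countKeys(ring, 10_000));
        }
    }
}
```

Running main prints the per-node key counts with 1 and with 100 virtual nodes per machine, so the effect on balance can be compared directly. The exact counts depend on the hash function used.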

Implementation code

Consitent-hashing

Original address

Consitent-hashing
