Consistent hashing algorithm and its PHP implementation explained

Source: Internet
Author: User
Tags memcached modulus value store

Many algorithms are available for server load balancing, including round robin, hashing, least connections, response time, weighted methods, and so on. Hashing is the most commonly used.

A typical scenario: N servers provide a caching service and need to be load balanced, so that requests are distributed evenly and each machine handles 1/N of the load.

The common approach is to take the hash result modulo the number of machines (hash() mod N): number the machines from 0 to N-1, compute hash() of each request with a chosen hash function, take the result modulo N to get a remainder i, and send the request to machine i. But this approach has a fatal problem. If a machine goes down, the requests that should land on it can no longer be handled correctly, so the server has to be removed from the algorithm; at that point about (N-1)/N of the cached data has to be remapped. If a new machine is added, about N/(N+1) of the cached data has to be remapped. For most systems this is an unacceptable jolt, because it means a large number of cache misses or a large amount of data to transfer. So how can a load-balancing strategy be designed so that as few requests as possible are affected?
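The size of that jolt can be checked with a few lines of code. The following is a minimal sketch in Python, not the article's PHP implementation; the function names `bucket` and `moved_fraction` are made up for illustration, and the hash values are plain integers so that the result is exact.

```python
from fractions import Fraction

def bucket(hash_value: int, n: int) -> int:
    """Machine index chosen by the hash() mod N scheme."""
    return hash_value % n

def moved_fraction(hash_values, n_before: int, n_after: int) -> Fraction:
    """Fraction of hash values whose machine changes when N changes."""
    values = list(hash_values)
    moved = sum(1 for h in values if bucket(h, n_before) != bucket(h, n_after))
    return Fraction(moved, len(values))

# Removing one machine out of 3: (N-1)/N of the values move.
print(moved_fraction(range(1, 7), 3, 2))   # 2/3
# Removing one machine out of 4.
print(moved_fraction(range(12), 4, 3))     # 3/4
# Adding a machine to 4: N/(N+1) of the values move.
print(moved_fraction(range(20), 4, 5))     # 4/5
```

Each range covers one full cycle of remainders (N times the new machine count), which is why the fractions come out exactly as (N-1)/N and N/(N+1). With real hash values the ratios are approximate.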

Consistent hashing is used in memcached, key-value stores, the BitTorrent DHT, and LVS. It can fairly be called the preferred algorithm for load balancing in distributed systems.

1. Description of the consistent hashing algorithm

The following description uses the consistent hashing algorithm in memcached as an example.
Because the result of a hash function is generally an unsigned 32-bit integer, its values should be evenly distributed over [0, 2^32 - 1]. Imagine a ring divided evenly into 2^32 points. First, compute the hash value of each server (node) with the hash(key) function and place it on the ring of 0 to 2^32 - 1.

Next, use the same hash(key) function to compute the hash value of the key of the data to be stored, and map it onto the ring. Then search clockwise from the position the key maps to, and store the data on the first server (node) found.
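The placement and clockwise lookup described above can be sketched as follows. This is an illustrative Python sketch rather than memcached's or the article's PHP code: the class name `HashRing`, the helper `ring_hash`, and the use of CRC32 as the 32-bit hash are all assumptions.

```python
import bisect
import zlib

def ring_hash(key: str) -> int:
    """Map a string to a point in [0, 2**32 - 1] (CRC32 is an assumed choice)."""
    return zlib.crc32(key.encode("utf-8"))

class HashRing:
    def __init__(self, nodes=(), hash_fn=ring_hash):
        self._hash_fn = hash_fn
        self._points = []   # node positions on the ring, kept sorted
        self._owners = {}   # position -> node name
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        point = self._hash_fn(node)
        if point in self._owners:
            raise ValueError(f"position {point} is already taken")
        bisect.insort(self._points, point)
        self._owners[point] = node

    def get_node(self, key: str) -> str:
        if not self._points:
            raise LookupError("ring has no nodes")
        point = self._hash_fn(key)
        # First node at or after the key's position; wrap around past the end.
        index = bisect.bisect_left(self._points, point) % len(self._points)
        return self._owners[self._points[index]]

ring = HashRing(["cache-1", "cache-2", "cache-3"])
print(ring.get_node("user:42"))  # one of the three node names
```

Keeping the node positions in a sorted list lets the clockwise search be a binary search, and the modulo on the index handles the wrap from the end of the ring back to its start.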

Figure: Schematic diagram of consistent hashing

When a new node is added, only the keys lying between the new node and the first node counterclockwise from it are affected; they move to the new node. When a node is removed, only the keys it held are affected; they move to the first node clockwise from it. Consistent hashing therefore handles well the hash remapping jolt that adding and removing nodes causes in load balancing.
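This locality can be checked with a small experiment. The sketch below is in Python under stated assumptions: CRC32 stands in for the hash function, and the helper names `position`, `owner`, and `moved_keys` as well as the node names are hypothetical.

```python
import bisect
import zlib

def position(name: str) -> int:
    """Place a node name or data key on the ring (CRC32 is an assumed hash)."""
    return zlib.crc32(name.encode("utf-8"))

def owner(key: str, nodes) -> str:
    """Return the first node found clockwise from the key's position."""
    ring = sorted((position(node), node) for node in nodes)
    points = [point for point, _ in ring]
    index = bisect.bisect_left(points, position(key)) % len(ring)
    return ring[index][1]

def moved_keys(keys, nodes_before, nodes_after):
    """Map each key whose owner changed to its (old owner, new owner) pair."""
    moved = {}
    for key in keys:
        old, new = owner(key, nodes_before), owner(key, nodes_after)
        if old != new:
            moved[key] = (old, new)
    return moved

keys = [f"key{i}" for i in range(1000)]
nodes = ["node-a", "node-b", "node-c"]

added = moved_keys(keys, nodes, nodes + ["node-d"])
removed = moved_keys(keys, nodes, ["node-a", "node-b"])
print(len(added), "keys moved after adding node-d")
print(len(removed), "keys moved after removing node-c")
```

Every key that moves after the addition ends up on the new node, and every key that moves after the removal was previously held by the removed node; keys on the other nodes stay where they were.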

Figure: Adding a server under consistent hashing

Virtual nodes: virtual nodes are introduced because, with few servers (for example, only 3), the node hash values are sparse and not evenly distributed on the ring, so the load on each node is still uneven. A virtual node can be regarded as a replica of a physical node, essentially the same as the physical node except that its key is different. With virtual nodes, each physical server (node) is expanded by a fixed factor (for example, 200 times), and the hash(key) value of each replica is computed and placed on the ring. During load balancing, a hash value that falls to a virtual node is actually served by the corresponding physical node. Because every physical node is replicated into virtual nodes by the same factor, the hash values are spread evenly on the ring even with a small number of nodes.
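A minimal sketch of a ring with virtual nodes follows, again in Python and under assumptions. The class name `VirtualNodeRing`, the `node#i` scheme for naming replicas, and the CRC32 hash are illustrative choices, not something the article or memcached prescribes.

```python
import bisect
import zlib
from collections import Counter

def ring_hash(key: str) -> int:
    """Map a string to a point in [0, 2**32 - 1] (CRC32 is an assumed choice)."""
    return zlib.crc32(key.encode("utf-8"))

class VirtualNodeRing:
    def __init__(self, nodes, replicas: int = 200):
        self._owners = {}  # ring position -> physical node name
        for node in nodes:
            for i in range(replicas):
                # Each replica gets its own key, for example "cache-1#17".
                # If two replicas collide on a position, the later one wins.
                self._owners[ring_hash(f"{node}#{i}")] = node
        self._points = sorted(self._owners)

    def get_node(self, key: str) -> str:
        """Find the first virtual node clockwise and return its physical node."""
        index = bisect.bisect_left(self._points, ring_hash(key)) % len(self._points)
        return self._owners[self._points[index]]

def load_per_node(ring: VirtualNodeRing, keys) -> Counter:
    """Count how many keys each physical node receives."""
    return Counter(ring.get_node(key) for key in keys)

keys = [f"key{i}" for i in range(3000)]
for replicas in (1, 200):
    ring = VirtualNodeRing(["cache-1", "cache-2", "cache-3"], replicas=replicas)
    print(replicas, dict(load_per_node(ring, keys)))
```

Running the loop with `replicas` set to 1 and then 200 shows how the per-node key counts change as virtual nodes are added. How even the result is depends on the hash function and on the keys.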

Figure: Effect of virtual nodes on consistent hashing results

As the figure above shows, with 10 physical nodes and 100 to 200 virtual nodes per physical node, the distribution is very balanced.

The third paragraph above states that if a machine goes down and has to be removed from the hash() mod N algorithm, (N-1)/N of the cached data has to be remapped.

Why (N-1)/N? The explanation is as follows:

For example, with 3 machines, the hash values 1 to 6 are distributed across the 3 hosts as follows:
Host 1: 1 4
Host 2: 2 5
Host 3: 3 6
If one host goes down, only two are left and the modulus becomes 2, so the distribution becomes:
Host 1: 1 3 5
Host 2: 2 4 6

As you can see, only 2 values keep their location: 1 and 2. The other 4 change location, which is 4/6 = 2/3 of the 6 values. With this much data affected, too much of it has to be reloaded from the DB into the cache, which seriously hurts performance.

"Consistent hashing"
The hash-modulo approach described above uses a relatively small modulus, generally the number of servers being load balanced. Consistent hashing in essence enlarges the modulus to the full 32-bit range, with values running up to 2^32 - 1, the largest unsigned 32-bit integer. The data can then be laid out on the ring at leisure, and the diagram makes this quite intuitive.
The original article provided a PHP implementation of the consistent hashing algorithm as a download.
