Two Optimization Schemes for the Consistent Hashing Algorithm

Brief Introduction

In my last post I briefly introduced the basic idea of the consistent hashing algorithm. Consistent hashing brings a new problem of its own, however: when a server node dies, its entire load is assigned to the next server node on the ring, which violates the balance requirement that a distributed system needs to meet.

Problem: Avalanche Effect

Some data on a server is accessed far more frequently than the rest. This is called hotspot data, and the server that hosts it carries a heavier load than the other servers in the cluster. When the traffic to the hotspot data exceeds what the server can tolerate, the server goes down.
Under consistent hashing, the failed server's data is then served by the next server on the ring. That server cannot absorb such a large volume of requests either, so it goes down too, followed by the one after it, and so on until every server is down. This is an avalanche.

Optimization Scheme

There are two optimization schemes. The first is simple and crude: increase the number of servers that host the hotspot data. A better approach is to use virtual nodes. The idea is to split each physical node into multiple virtual nodes and distribute these virtual nodes evenly over the hash ring. When a node is removed, its load is then spread across several remaining nodes instead of falling on a single successor, which solves the problem of unbalanced redistribution.
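
The effect of virtual nodes on node removal can be sketched in a few lines of Python. This is only an illustration under assumptions that are not in the original post: the node names, the `node#i` naming of virtual nodes, the use of MD5 as the ring hash, and the helper names `Ring` and `takeover` are all hypothetical.

```python
import bisect
import hashlib
from collections import Counter


def h(key: str) -> int:
    """Map a string to a position on the ring (MD5 is an arbitrary choice)."""
    return int(hashlib.md5(key.encode()).hexdigest(), 16)


class Ring:
    def __init__(self, nodes, vnodes=1):
        # Each physical node is placed on the ring `vnodes` times.
        self.points = sorted(
            (h(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self.hashes = [p for p, _ in self.points]

    def lookup(self, key: str) -> str:
        # First (virtual) node clockwise from the key, wrapping at the end.
        i = bisect.bisect_left(self.hashes, h(key)) % len(self.points)
        return self.points[i][1]


def takeover(vnodes: int) -> Counter:
    """Count which nodes inherit the keys of 'node3' once it is removed."""
    nodes = ["node1", "node2", "node3", "node4"]  # hypothetical servers
    keys = [f"key{i}" for i in range(10000)]
    before = Ring(nodes, vnodes)
    after = Ring([n for n in nodes if n != "node3"], vnodes)
    return Counter(after.lookup(k) for k in keys if before.lookup(k) == "node3")


print(takeover(1))    # one virtual node per server: a single heir
print(takeover(100))  # many virtual nodes: the load is shared
```

With one point per server, everything that `node3` held can only move to its single clockwise neighbor. With many virtual nodes, the same keys would normally be shared among all remaining servers, which is the balancing effect described above.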

In the figure, the red Node 3 is the server that hosts the hotspot data. The right-hand image splits each physical node into two virtual nodes and distributes them evenly over the hash ring.
Virtual nodes also help when a node is added. Without them, a new node pulls resources only from its immediate neighbor, which leaves the distribution unbalanced.
For example, suppose three nodes A, B, and D share 100 resources. When node C is added between B and D, C takes its resources directly from D alone, so A and B each end up holding more resources than C and D, which does not meet the load-balancing requirement. Splitting each physical node into several virtual nodes and distributing them evenly over the hash ring solves this problem.

Time Complexity

With these changes consistent hashing may seem perfect, but we have ignored one problem: the time complexity of a lookup. A consistent hash is not like an ordinary hash table, whose lookup is O(1) because it is backed by an array. To keep nodes easy to add and remove, a simple consistent hash stores its nodes in a linked list, so a lookup takes O(N) time.

Optimization Scheme

An O(N) lookup is intolerable for a hashing algorithm. One technique for solving this problem is the jump table.
As shown in the figure, each node in the jump table records the nodes at distances 1, 2, 4, and so on (successive powers of two) from itself. Whichever node a query lands on, each hop can skip at least half of the remaining search space on the hash ring, so repeating the step quickly locates the node that holds the data. The time complexity therefore drops to O(log N). The drawback is that the jump table consumes additional storage space on each server.
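
A minimal sketch of such a jump-table lookup follows. It assumes that ring positions are distinct integers, that there are at least two nodes, and that "distance" means the number of nodes stepped over clockwise. The names `JumpRing`, `fingers`, and `between` are hypothetical and not from the original post.

```python
import bisect


def between(a: int, x: int, b: int) -> bool:
    """True if x lies in the clockwise interval (a, b] of the ring (a != b)."""
    if a < b:
        return a < x <= b
    return x > a or x <= b  # the interval wraps past zero


class JumpRing:
    def __init__(self, positions):
        self.pos = sorted(positions)  # distinct positions, at least two nodes
        n = len(self.pos)
        # fingers[i]: indices of the nodes 1, 2, 4, ... steps clockwise of node i.
        self.fingers = []
        for i in range(n):
            row, step = [], 1
            while step < n:
                row.append((i + step) % n)
                step *= 2
            self.fingers.append(row)

    def lookup(self, start: int, key: int):
        """Return (index of the node owning `key`, hops taken) from node `start`."""
        cur, hops = start, 0
        while self.pos[cur] != key:
            succ = self.fingers[cur][0]
            if between(self.pos[cur], key, self.pos[succ]):
                return succ, hops + 1  # the next node owns the key
            # Otherwise take the longest jump that does not pass the key.
            for nxt in reversed(self.fingers[cur]):
                if between(self.pos[cur], self.pos[nxt], key):
                    cur, hops = nxt, hops + 1
                    break
        return cur, hops


ring = JumpRing(range(0, 10240, 10))  # 1024 hypothetical nodes
print(ring.lookup(0, 5001))           # (owner index, number of hops)
```

Each node stores about log2(N) entries, which is the extra storage cost mentioned above. In return, every jump at least halves the remaining distance to the owner.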

The second solution is to replace the linked list with a binary search tree as the underlying data structure. A lookup on the ring amounts to finding the smallest node value in the tree that is not less than the key's hash, so a balanced tree such as an AVL tree or a red-black tree is a suitable choice. This also reduces the time complexity to O(log N).
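
The tree lookup described here is a "ceiling" search. The sketch below uses a plain, unbalanced binary search tree to keep the example short. This is a simplification: an AVL or red-black tree would add rebalancing on insert to guarantee the O(log N) bound. The node positions and names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    pos: int                         # position of a server on the ring
    name: str
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def insert(root: Optional[Node], pos: int, name: str) -> Node:
    """Plain BST insert; an AVL or red-black tree would rebalance here."""
    if root is None:
        return Node(pos, name)
    if pos < root.pos:
        root.left = insert(root.left, pos, name)
    else:
        root.right = insert(root.right, pos, name)
    return root


def ceiling(root: Optional[Node], key: int) -> Optional[Node]:
    """Smallest node with pos >= key, or None if every node is smaller."""
    best = None
    while root is not None:
        if root.pos >= key:
            best, root = root, root.left  # candidate found; try a smaller one
        else:
            root = root.right
    return best


def leftmost(root: Node) -> Node:
    while root.left is not None:
        root = root.left
    return root


def lookup(root: Node, key: int) -> str:
    """Server responsible for `key`: its ceiling, or the first node on wrap-around."""
    hit = ceiling(root, key)
    return (hit or leftmost(root)).name


root = None
for pos, name in [(50, "B"), (20, "A"), (80, "C")]:
    root = insert(root, pos, name)
print(lookup(root, 21), lookup(root, 81))  # prints "B A"
```

A key larger than every node position wraps around to the leftmost node, which is what makes the sorted tree behave like a ring.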
