Consistent hash and hash

Consistent hashing is a special kind of hashing. With consistent hashing, when the number of slots (the size) of the hash table changes, only K/n keys need to be remapped on average, where K is the number of keys and n is the number of slots. In a traditional hash table, by contrast, adding or removing a slot forces almost all keys to be remapped.
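As a back-of-the-envelope check (our own arithmetic, assuming uniformly distributed hash values and randomly placed nodes), compare growing a table from n to n + 1 slots when keys are placed by h mod n versus by consistent hashing. Because n and n + 1 are coprime, the residue pairs (h mod n, h mod (n + 1)) are spread evenly, and only n of the n(n + 1) pairs agree:

```latex
\begin{aligned}
\Pr[\text{key keeps its slot}] &= \Pr\bigl[h \bmod n = h \bmod (n+1)\bigr] = \frac{n}{n(n+1)} = \frac{1}{n+1},\\
\mathbb{E}[\text{keys moved, } h \bmod n] &= K \cdot \frac{n}{n+1},\\
\mathbb{E}[\text{keys moved, consistent hashing}] &= \frac{K}{n+1} \approx \frac{K}{n}.
\end{aligned}
```

For example, with K = 10^6 keys and n going from 10 to 11, about 909,091 keys move under mod placement, against about 90,909 on average under consistent hashing (the new node takes, on average, a 1/(n + 1) share of the ring).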

Consistent hashing was proposed by Karger and his collaborators at MIT, and the idea has since spread to other fields. Their 1997 paper showed how consistent hashing can be applied to distributed web services whose set of servers changes over time. Each slot in the hash table corresponds to a node in the distributed system, so adding or removing a node requires moving only K/n items.

The core problem in distributed cache design: keys must be spread evenly across the cache servers, and the number of cache entries that have to migrate when a cache server is added (or removed) must be kept as small as possible.

With n cache servers, a common load-balancing method is to send each resource request to cache server hash(key) mod n. When a cache server is added or removed, n changes, so the server assigned to almost every resource changes as well. In effect the whole cache is invalidated at once, and all cache servers go back to the origin content servers at the same time to repopulate it. Consistent hashing is designed to avoid this problem.
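A small experiment makes the mod-n problem concrete. This is a sketch under stated assumptions: MD5 truncated to 32 bits stands in for the hash function (Python's built-in `hash()` is salted per process, so it is unsuitable), and the key names `object-{i}` are made up. It counts how many keys change server when a 10-server pool grows to 11.

```python
import hashlib


def stable_hash(key: str) -> int:
    """Deterministic 32-bit hash; Python's built-in hash() is salted per process."""
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:4], "big")


def mod_server(key: str, n: int) -> int:
    """Traditional placement: server index = hash(key) mod n."""
    return stable_hash(key) % n


def moved_fraction(keys, n_before: int, n_after: int) -> float:
    """Fraction of keys whose server index changes when the pool is resized."""
    moved = sum(mod_server(k, n_before) != mod_server(k, n_after) for k in keys)
    return moved / len(keys)


keys = [f"object-{i}" for i in range(100_000)]
frac = moved_fraction(keys, 10, 11)
print(f"{frac:.1%} of keys change server when going from 10 to 11 servers")
```

The printed fraction should come out close to 10/11 (about 91%): adding a single server invalidates nearly the whole cache.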

The main idea of the consistent hash algorithm:
Associate each cache server with one or more ranges (intervals) of hash values. The interval boundaries are determined by computing the hash value of each cache server. (The hash function that defines the intervals does not have to be the same as the function that computes the cache servers' hash values, but the two functions must return values over the same range.) If a cache server is removed, its interval is merged into the adjacent interval; no other cache server is affected.
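The interval view can be sketched as follows. Everything here is an illustrative assumption: server boundaries come from SHA-1 of the server name, keys are hashed with MD5, and both are truncated to 32 bits so that their output ranges match; the server names are hypothetical. Each server owns the half-open range from the previous boundary up to its own boundary.

```python
import bisect
import hashlib

# Both functions map into [0, 2**32): only the output range has to match.


def server_hash(name: str) -> int:
    """Hypothetical choice: SHA-1 of the server name, truncated to 32 bits."""
    return int.from_bytes(hashlib.sha1(name.encode()).digest()[:4], "big")


def key_hash(key: str) -> int:
    """A different function (MD5, truncated to 32 bits) for the keys."""
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:4], "big")


def intervals(servers):
    """Map each server to (start, end): it owns hash values h with start < h <= end.
    The server with the smallest boundary gets the range that wraps past 2**32 - 1."""
    bounds = sorted((server_hash(s), s) for s in servers)
    # bounds[i - 1] with i == 0 is the last (largest) boundary, which closes the ring.
    return {s: (bounds[i - 1][0], b) for i, (b, s) in enumerate(bounds)}


def owner(servers, key):
    """The server whose interval contains key_hash(key)."""
    bounds = sorted((server_hash(s), s) for s in servers)
    points = [b for b, _ in bounds]
    i = bisect.bisect_left(points, key_hash(key)) % len(points)
    return bounds[i][1]


servers = ["cache-A", "cache-B", "cache-C", "cache-D"]
before = intervals(servers)
after = intervals([s for s in servers if s != "cache-B"])
for s in sorted(after):
    print(s, before[s], "->", after[s])
```

After `cache-B` is removed, only one interval changes: the one immediately following `cache-B`, whose start moves back to where `cache-B`'s interval began. That is the merge into the adjacent interval described above.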

Implementation:
Consistent hashing maps each object to a point on a ring, and the system maps the available node machines to positions on the same ring. To find the machine for an object, compute the object's position on the ring and search clockwise from there until the first node machine is reached; that machine is where the object is stored.
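A minimal ring with clockwise lookup might look like the sketch below. It assumes a ring of size 2**32 with positions taken from MD5 truncated to 32 bits; the class name `HashRing` and the node names are hypothetical. The sorted list of node positions plus `bisect` turns "search clockwise for the first node" into a binary search, with a wrap back to the start of the list past the top of the ring.

```python
import bisect
import hashlib


def ring_position(value: str) -> int:
    """Position on a ring of size 2**32 (MD5 truncated to 32 bits, an arbitrary choice)."""
    return int.from_bytes(hashlib.md5(value.encode()).digest()[:4], "big")


class HashRing:
    def __init__(self, nodes=()):
        self._points = []  # sorted ring positions of the nodes
        self._nodes = {}   # ring position -> node name
        for node in nodes:
            self.add(node)

    def add(self, node: str) -> None:
        p = ring_position(node)
        bisect.insort(self._points, p)
        self._nodes[p] = node

    def remove(self, node: str) -> None:
        p = ring_position(node)
        self._points.remove(p)
        del self._nodes[p]

    def lookup(self, key: str) -> str:
        """Walk clockwise from the key's position to the first node, wrapping at the end."""
        if not self._points:
            raise LookupError("ring is empty")
        i = bisect.bisect_left(self._points, ring_position(key))
        return self._nodes[self._points[i % len(self._points)]]


ring = HashRing(["node-1", "node-2", "node-3"])
print(ring.lookup("user:42"))
```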

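When a new machine is added at some point on the ring, only the objects lying between the new machine and the previous machine (counterclockwise) are affected: they used to belong to the next machine clockwise, and they are now moved from that machine to the new one.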
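The sketch below checks that claim by brute force. It assumes 32-bit ring positions from MD5 and uses made-up node and key names; `owner` and `successor` are hypothetical helpers. After `node-D` joins, every key that changes owner now belongs to `node-D`, and every such key previously belonged to the node that follows `node-D` clockwise.

```python
import bisect
import hashlib


def pos(value: str) -> int:
    """Ring position in [0, 2**32): MD5 truncated to 32 bits (an arbitrary choice)."""
    return int.from_bytes(hashlib.md5(value.encode()).digest()[:4], "big")


def owner(nodes, key):
    """First node clockwise from the key's position, wrapping past the top of the ring."""
    ring = sorted((pos(n), n) for n in nodes)
    points = [p for p, _ in ring]
    return ring[bisect.bisect_left(points, pos(key)) % len(ring)][1]


def successor(nodes, node):
    """The node that comes next clockwise after `node` on the ring."""
    names = [n for _, n in sorted((pos(n), n) for n in nodes)]
    return names[(names.index(node) + 1) % len(names)]


old_nodes = ["node-A", "node-B", "node-C"]
new_nodes = old_nodes + ["node-D"]
keys = [f"obj-{i}" for i in range(10_000)]

moved = [k for k in keys if owner(old_nodes, k) != owner(new_nodes, k)]
heir_of_arc = successor(new_nodes, "node-D")
# Every moved key now lives on node-D, and all of them came from node-D's successor.
assert all(owner(new_nodes, k) == "node-D" for k in moved)
assert all(owner(old_nodes, k) == heir_of_arc for k in moved)
print(f"{len(moved)} of {len(keys)} keys moved, all from {heir_of_arc} to node-D")
```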

When a node machine is removed, all objects stored on it move to the next machine clockwise; the other machines are unaffected.
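Removal can be checked the same way. Again this is a sketch with assumed MD5-based 32-bit positions and hypothetical names: when `node-B` leaves, only keys that lived on `node-B` change owner, and they all go to `node-B`'s clockwise successor.

```python
import bisect
import hashlib


def pos(value: str) -> int:
    """Ring position in [0, 2**32): MD5 truncated to 32 bits (an arbitrary choice)."""
    return int.from_bytes(hashlib.md5(value.encode()).digest()[:4], "big")


def owner(nodes, key):
    """First node clockwise from the key's position, wrapping past the top of the ring."""
    ring = sorted((pos(n), n) for n in nodes)
    points = [p for p, _ in ring]
    return ring[bisect.bisect_left(points, pos(key)) % len(ring)][1]


def successor(nodes, node):
    """The node that comes next clockwise after `node` on the ring."""
    names = [n for _, n in sorted((pos(n), n) for n in nodes)]
    return names[(names.index(node) + 1) % len(names)]


nodes = ["node-A", "node-B", "node-C", "node-D"]
removed = "node-B"
remaining = [n for n in nodes if n != removed]
heir = successor(nodes, removed)

keys = [f"obj-{i}" for i in range(10_000)]
reassigned = {}
for k in keys:
    before, after = owner(nodes, k), owner(remaining, k)
    if before != after:
        reassigned[k] = (before, after)

# Only keys that lived on node-B move, and they all move to node-B's successor.
assert all(b == removed and a == heir for b, a in reassigned.values())
print(f"{len(reassigned)} keys moved from {removed} to {heir}")
```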

Virtual node:
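When there are only a few service nodes, the consistent hash algorithm is prone to data skew, because the nodes cut the ring into segments of very different lengths. For example, if node A happens to sit at the end of a long stretch of the ring that contains no other node, every key in that stretch lands on A, and the load on node A is much heavier than on the others.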
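One way to see the skew is to measure how much of the ring each node owns. This sketch assumes 32-bit ring positions from MD5 and three hypothetical node names; `ring_shares` takes a mapping from ring position to node, so the same function also works when one physical node has several positions.

```python
import hashlib

RING = 2**32  # ring size implied by a 32-bit hash


def pos(value: str) -> int:
    """Ring position: MD5 truncated to 32 bits (an arbitrary choice)."""
    return int.from_bytes(hashlib.md5(value.encode()).digest()[:4], "big")


def ring_shares(point_to_node):
    """Fraction of the ring each node owns; a point owns the arc back to the previous point."""
    points = sorted(point_to_node)
    shares = {}
    for i, p in enumerate(points):
        # points[i - 1] wraps to the last point when i == 0; a lone point owns the whole ring.
        arc = (p - points[i - 1]) % RING or RING
        node = point_to_node[p]
        shares[node] = shares.get(node, 0.0) + arc / RING
    return shares


nodes = ["node-A", "node-B", "node-C"]
for node, share in sorted(ring_shares({pos(n): n for n in nodes}).items()):
    print(f"{node}: {share:.1%} of the ring (an even split would be {1 / len(nodes):.1%})")
```

With only three points on the ring, the shares are typically far from an even third each.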
To address this, each server is split into v virtual nodes, and all n * v virtual nodes are placed randomly on the consistent hash ring. Each user (key) is then served by the first virtual node found clockwise from its own position on the ring, that is, by the server that owns that virtual node. When a node fails, the next node clockwise takes over as its replacement.
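A sketch of virtual nodes with failover follows. Assumptions, all illustrative: virtual nodes are named `"server#i"` and placed by hashing that name (pseudo-random rather than truly random placement), positions are MD5 truncated to 32 bits, and `VNodeRing` is a hypothetical class. On failure, `lookup` keeps walking clockwise past virtual nodes whose physical server is down.

```python
import bisect
import hashlib
from collections import Counter


def pos(value: str) -> int:
    """Ring position in [0, 2**32): MD5 truncated to 32 bits (an arbitrary choice)."""
    return int.from_bytes(hashlib.md5(value.encode()).digest()[:4], "big")


class VNodeRing:
    def __init__(self, servers, v=100):
        # Each server gets v virtual nodes named "server#i" (a hypothetical naming
        # scheme); hashing the names scatters them pseudo-randomly around the ring.
        entries = sorted((pos(f"{s}#{i}"), s) for s in servers for i in range(v))
        self._points = [p for p, _ in entries]  # sorted vnode positions
        self._owners = [s for _, s in entries]  # physical server of each vnode

    def lookup(self, key, down=frozenset()):
        """Walk clockwise from the key to the first vnode whose server is not down."""
        start = bisect.bisect_left(self._points, pos(key))
        n = len(self._points)
        for step in range(n):
            server = self._owners[(start + step) % n]
            if server not in down:
                return server
        raise LookupError("no live server on the ring")


servers = ["node-A", "node-B", "node-C"]
keys = [f"user-{i}" for i in range(30_000)]
for v in (1, 200):
    ring = VNodeRing(servers, v)
    load = Counter(ring.lookup(k) for k in keys)
    print(f"v={v}:", {s: f"{load[s] / len(keys):.1%}" for s in servers})

# Failover: with node-A down, its keys go to the next live vnode clockwise.
ring = VNodeRing(servers, 200)
print(ring.lookup("user-7"), "->", ring.lookup("user-7", down={"node-A"}))
```

With v = 200 the per-server load should sit much closer to one third than with v = 1. Because a failed server's virtual nodes are scattered around the ring, its keys are spread over several surviving servers instead of all landing on a single neighbour.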

Copyright Disclaimer: This article is an original article by the blogger and cannot be reproduced without the permission of the blogger.