Consistent hashing algorithm

Source: Internet
Author: User

I was reading about memcached a few days ago. When I got to its distributed algorithm, I came across consistent hashing. I searched the Internet, got a rough idea of how the algorithm works, and am recording it here.

Balanced data distribution is very important in distributed storage systems. The more evenly the data is spread across the nodes, the better the overall performance of the system.

 

Simple hash algorithm: placing each object by taking its hash modulo the number of nodes K is simple, but it struggles to meet the monotonicity requirement (when a node is added, existing objects should move only to the new node, never between old nodes) and its balance is poor. Adding or deleting nodes is expensive: whenever the number of storage nodes in the system increases or decreases, the mapping of data objects to nodes must be recalculated for the whole system. This seriously reduces the cache hit rate and may even leave the system unable to respond to outside requests.
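To make the remapping cost concrete, here is a minimal Python sketch of remainder-based placement. The names stable_hash, node_for and moved_fraction are made up for this illustration, and MD5 is just an assumed stand-in for whatever hash function a real client uses.

```python
import hashlib

def stable_hash(key: str) -> int:
    """Deterministic hash of a key (Python's built-in hash() is salted per process)."""
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")

def node_for(key: str, node_count: int) -> int:
    """Simple remainder placement: hash(key) mod K."""
    return stable_hash(key) % node_count

def moved_fraction(keys, old_count: int, new_count: int) -> float:
    """Fraction of keys whose node changes when the node count changes."""
    moved = sum(1 for k in keys if node_for(k, old_count) != node_for(k, new_count))
    return moved / len(keys)

keys = [f"object-{i}" for i in range(10_000)]
fraction = moved_fraction(keys, 4, 5)
print(f"{fraction:.0%} of keys change node when growing from 4 to 5 nodes")
```

A key stays on the same node only if its hash has the same remainder modulo 4 and modulo 5, which holds for about one hash value in five, so roughly 80% of the keys move when a single node is added.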

 

Consistent hashing: first, the hash space is abstracted into a ring and the storage nodes are placed on it, so every node has a value (its position) on the ring. Then each data object is hashed and mapped to the nearest node in the clockwise direction. With this scheme, when a node fails, only the segment between the failed node and the previous node counterclockwise is affected. Likewise, when a storage node is added, only the segment between the new node and the previous node counterclockwise is affected. This effectively solves the inefficiency of the simple hash algorithm, where adding or deleting a node means remapping all the data.
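The ring described above can be sketched in a few lines of Python. This is only an illustration under assumptions of my own: the HashRing class, the ring_position helper, the 32-bit ring size and the node names are all hypothetical, not taken from memcached or any client library.

```python
import bisect
import hashlib

def ring_position(value: str) -> int:
    """Map a string to a position on a ring of size 2**32 (assumed size)."""
    return int.from_bytes(hashlib.md5(value.encode()).digest()[:4], "big")

class HashRing:
    def __init__(self, nodes=()):
        self._positions = []  # sorted node positions on the ring
        self._owner = {}      # position -> node name
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        pos = ring_position(node)
        if pos in self._owner:
            raise ValueError(f"position collision for {node!r}")
        bisect.insort(self._positions, pos)
        self._owner[pos] = node

    def remove_node(self, node: str) -> None:
        pos = ring_position(node)
        self._positions.remove(pos)
        del self._owner[pos]

    def lookup(self, key: str) -> str:
        """Return the first node at or after the key's position, going clockwise."""
        if not self._positions:
            raise LookupError("ring has no nodes")
        idx = bisect.bisect_left(self._positions, ring_position(key))
        if idx == len(self._positions):  # past the last node: wrap around
            idx = 0
        return self._owner[self._positions[idx]]

ring = HashRing(["node-a", "node-b", "node-c", "node-d"])
keys = [f"object-{i}" for i in range(10_000)]
before = {k: ring.lookup(k) for k in keys}
ring.remove_node("node-c")
after = {k: ring.lookup(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
# Only keys that lived on the removed node are remapped.
assert all(before[k] == "node-c" for k in moved)
```

Keys owned by the other three nodes keep their owner, because their nearest clockwise node has not changed. With only one position per node, the segments can be very uneven in size.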

The consistent hashing algorithm essentially solves a key problem in storage environments typified by P2P: how to distribute data and choose routes in a dynamic network topology. With this algorithm, each storage node only needs to maintain information about a small number of adjacent nodes, and when a node joins or leaves, only a small number of nodes take part in maintaining the topology. This makes consistent hashing a practical DHT (distributed hash table) algorithm.

 

Disadvantages of the consistent hashing algorithm: 1. The lookup path is long. A query message must pass through O(N) steps (N is the total number of nodes in the system) to reach the queried node. If the system is very large, this query efficiency may not meet practical needs.
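The O(N) figure can be illustrated with a small Python sketch. It assumes, as one way to arrive at that bound, that each node knows only its clockwise successor and forwards the query one node at a time; the functions owns and hops_to_owner and the ring of size 1000 are invented for this example.

```python
def owns(node_positions, index, key_position):
    """True if the node at `index` is the first node clockwise from the key."""
    node = node_positions[index]
    pred = node_positions[index - 1]  # index -1 wraps to the last node
    if pred < node:
        return pred < key_position <= node
    # The segment crosses position 0 (or there is only one node).
    return key_position > pred or key_position <= node

def hops_to_owner(node_positions, start_index, key_position):
    """Count how many times a query is forwarded to the next node clockwise."""
    hops = 0
    index = start_index
    while not owns(node_positions, index, key_position):
        index = (index + 1) % len(node_positions)
        hops += 1
    return hops

positions = list(range(0, 1000, 100))    # 10 nodes at 0, 100, ..., 900
worst = hops_to_owner(positions, 0, 850)  # key 850 belongs to the node at 900
print(worst)  # 9 hops for 10 nodes
```

In this setup the worst case is N - 1 forwards, so the lookup cost grows linearly with the number of nodes.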

Improvement of the consistent hashing algorithm: divide the ring into M equal parts, called virtual nodes. If the system has N physical nodes, each physical node holds V = M/N virtual nodes. When a physical node goes offline, its virtual nodes are evenly spread around the ring, so the nodes adjacent to them evenly share the load it originally carried. Similarly, when a new node is added, load is transferred to it evenly from the other nodes. In addition, the number of virtual nodes assigned to each physical node can be set according to that node's performance, which also addresses performance differences between storage nodes.
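A rough Python sketch of virtual nodes follows. Note that it places each virtual node by hashing a label such as "node-a#0" instead of cutting the ring into M exactly equal parts, which is a common variant and not necessarily the scheme the referenced paper uses. The class VirtualNodeRing, the weights and the node names are assumptions for illustration.

```python
import bisect
import hashlib
from collections import Counter

def ring_position(value: str) -> int:
    """Map a string to a position on a ring of size 2**32 (assumed size)."""
    return int.from_bytes(hashlib.md5(value.encode()).digest()[:4], "big")

class VirtualNodeRing:
    def __init__(self, weights: dict):
        """weights maps a physical node name to its number of virtual nodes."""
        ring = []
        for node, count in weights.items():
            for i in range(count):
                # Each virtual node gets its own position on the ring.
                ring.append((ring_position(f"{node}#{i}"), node))
        ring.sort()
        self._positions = [pos for pos, _ in ring]
        self._nodes = [node for _, node in ring]

    def lookup(self, key: str) -> str:
        """Return the physical node behind the nearest virtual node clockwise."""
        idx = bisect.bisect_left(self._positions, ring_position(key))
        return self._nodes[idx % len(self._nodes)]  # wrap past the last position

# node-c is assumed to be twice as powerful, so it gets twice the virtual nodes.
ring = VirtualNodeRing({"node-a": 100, "node-b": 100, "node-c": 200})
keys = [f"object-{i}" for i in range(20_000)]
counts = Counter(ring.lookup(k) for k in keys)
for node in sorted(counts):
    print(node, counts[node])
```

With these weights node-c should end up with roughly half of the keys and the other two nodes with roughly a quarter each, though the exact counts depend on the hash function.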

 

References: http://tech.idv2.com/2008/07/24/memcached-004/

Research on Consistent Hash Algorithms in Distributed Storage Systems
