Memcached (8): Advanced applications of consistent hashing

Source: Internet
Author: User
Tags: hash, memcached

Brief introduction

The consistent hashing algorithm was proposed at MIT in 1997 (see expanded reading [1]). It was designed to address hot-spot problems on the Internet, with an intent similar to that of CARP (Cache Array Routing Protocol). Consistent hashing fixes the problems caused by the simple hashing algorithm that CARP uses, making distributed hash tables (DHTs) genuinely usable in peer-to-peer environments.

English explanation

Consistent hashing is a scheme that provides hash table functionality in such a way that the addition or removal of one slot does not significantly change the mapping of keys to slots.

Hashing algorithm

Consistent hashing sets out four properties that a hashing algorithm should satisfy in a dynamically changing cache environment:

Balance

Balance means that hash results are spread across all caches as evenly as possible, so that the space of every cache is put to use. Many hashing algorithms satisfy this condition.

Monotonicity

Monotonicity means that if some content has already been assigned to caches by the hash and new caches are then added to the system, previously assigned content must either stay where it is or move to one of the new caches. It must never be remapped to a different cache in the old set. Put differently, when the set of caches changes, consistent hashing tries to keep as much of the already-assigned content in place as possible.

Simple hashing algorithms often fail to satisfy monotonicity. Take the simplest linear hash:

x → (ax + b) mod P

In the formula above, P is the total number of caches. It is easy to see that when the number of caches changes (from P1 to P2), nearly all of the original hash results change, so the monotonicity requirement is not met.
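The effect can be checked with a small experiment. The sketch below is only an illustration under stated assumptions: it uses MD5 to turn string keys into integers, sets a = 1 and b = 0, and uses made-up key names. None of these choices come from the original article.

```python
import hashlib

def linear_hash(key: str, a: int, b: int, p: int) -> int:
    """Map a key to one of p caches using x -> (a*x + b) mod p."""
    # Assumption: derive the integer x from the key with MD5.
    x = int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")
    return (a * x + b) % p

def remapped_fraction(keys, p1: int, p2: int, a: int = 1, b: int = 0) -> float:
    """Fraction of keys whose cache index changes when p1 caches become p2."""
    moved = sum(
        1 for k in keys
        if linear_hash(k, a, b, p1) != linear_hash(k, a, b, p2)
    )
    return moved / len(keys)

keys = [f"key-{i}" for i in range(10_000)]  # hypothetical keys
fraction = remapped_fraction(keys, 4, 5)
print(f"{fraction:.1%} of keys changed cache when going from 4 to 5 caches")
```

With a = 1 and b = 0, a key keeps its index only when x mod 4 equals x mod 5, which holds for 4 out of every 20 consecutive values of x. For uniformly distributed hashes, about 80% of keys should therefore move when a single cache is added.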

A change in hash results means that whenever the set of caches changes, every mapping in the system has to be updated. In a peer-to-peer system, a change in the set of caches corresponds to a peer joining or leaving, which happens frequently and would therefore cause heavy computation and transmission load. Monotonicity is the requirement that a hashing algorithm be able to cope with this situation.

Spread

In a distributed environment, a client may not see all of the caches, only a subset of them. When clients map content to caches by hashing, the set of caches seen by each client may differ, so the hash results may be inconsistent and the same content may be mapped to different caches by different clients. This should clearly be avoided, because storing the same content in several caches reduces the storage efficiency of the system. Spread is defined as the severity of this situation. A good hashing algorithm should avoid such inconsistencies as far as possible, that is, it should minimize spread.
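The difference between hashing schemes under partial views can be illustrated with a small comparison. The sketch below is an assumption-laden illustration, not a measurement of any real system: the cache names, the two views, and the use of MD5 are all hypothetical, and the disagreement rate between two views is used as a simple stand-in for spread.

```python
import hashlib

def h(value: str) -> int:
    """Hash a string to an integer position (MD5 is an assumption)."""
    return int.from_bytes(hashlib.md5(value.encode()).digest()[:8], "big")

def modulo_choice(key, view):
    """Pick a cache by index into the client's own sorted view."""
    caches = sorted(view)
    return caches[h(key) % len(caches)]

def ring_choice(key, view):
    """Pick the first cache clockwise from the key on a hash ring."""
    k = h(key)
    # Caches at or after the key's position; wrap around if there are none.
    ahead = [c for c in view if h(c) >= k]
    return min(ahead or view, key=h)

def disagreement(choose, keys, view_a, view_b):
    """Fraction of keys that the two views place on different caches."""
    return sum(choose(k, view_a) != choose(k, view_b) for k in keys) / len(keys)

all_caches = [f"cache-{i}" for i in range(10)]
view_a = all_caches[:9]   # this client cannot see cache-9
view_b = all_caches[1:]   # this client cannot see cache-0
keys = [f"key-{i}" for i in range(5_000)]

mod_rate = disagreement(modulo_choice, keys, view_a, view_b)
ring_rate = disagreement(ring_choice, keys, view_a, view_b)
print(f"modulo: {mod_rate:.1%} disagree, ring: {ring_rate:.1%} disagree")
```

With index-based modulo hashing, the two views are shifted by one position, so every key lands on a different cache. On the ring, the two clients disagree only for keys whose nearest cache is one that the other client cannot see.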

Load

Load looks at the spread problem from the opposite angle. Because different clients may map the same content to different caches, a given cache may in turn be assigned different content by different clients. As with spread, this should be avoided, so a good hashing algorithm should keep the load on each cache as low as possible.

Conclusion

Consistent hashing largely solves the most critical problem in peer-to-peer environments: how to distribute storage and routing over a dynamic network topology. Each node needs to keep track of only a small number of neighboring nodes, and only a few related nodes take part in maintaining the topology when a node joins or leaves the system. All of this makes consistent hashing the first practical DHT algorithm.

However, the routing algorithm of consistent hashing still has a shortcoming. During a query, the message must pass through O(N) steps to reach the target node, where N is the total number of nodes in the system, so the number of steps grows in proportion to N. When the system is very large, possibly with more than a million nodes, this query efficiency clearly cannot meet practical needs. Moreover, even if users could tolerate the long delay, the large number of messages generated by each query would place unnecessary load on the network.
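The linear growth can be seen in a toy model. The sketch below assumes, purely for illustration, that nodes are numbered 0 to N-1 in ring order and that each node can forward a query only to its immediate successor; the function names are hypothetical.

```python
def hops_successor_only(n_nodes: int, start: int, owner: int) -> int:
    """Count hops when each node forwards a query only to its successor."""
    hops = 0
    node = start
    while node != owner:
        node = (node + 1) % n_nodes  # forward to the next node on the ring
        hops += 1
    return hops

def average_hops(n_nodes: int) -> float:
    """Average hop count from node 0 to every possible owner node."""
    total = sum(hops_successor_only(n_nodes, 0, owner) for owner in range(n_nodes))
    return total / n_nodes

for n in (10, 100, 1000):
    print(n, average_hops(n))
```

In this model the average is (N - 1) / 2 hops, so multiplying the number of nodes by ten multiplies the average path length by roughly ten as well.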

Memcached client-side distribution

Memcached's client-side distribution uses a consistent hashing algorithm. The process is as follows:
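One common way to picture this process is a hash ring with virtual nodes: each server is hashed to many points on a ring, and a key is stored on the server that owns the first point clockwise from the key's hash. The sketch below is a minimal illustration under assumptions, not the code of any particular memcached client: the `HashRing` class, the MD5 hash, the 100 virtual points per server, and the server addresses are all hypothetical.

```python
import bisect
import hashlib

class HashRing:
    """A minimal consistent-hash ring with virtual nodes (illustrative only)."""

    def __init__(self, servers=(), replicas: int = 100):
        self.replicas = replicas   # virtual points per server (assumed value)
        self._points = []          # sorted hash positions on the ring
        self._owner = {}           # hash position -> server name
        for server in servers:
            self.add(server)

    @staticmethod
    def _hash(value: str) -> int:
        return int.from_bytes(hashlib.md5(value.encode()).digest()[:8], "big")

    def add(self, server: str) -> None:
        for i in range(self.replicas):
            point = self._hash(f"{server}#{i}")
            if point not in self._owner:   # skip the rare position collision
                bisect.insort(self._points, point)
                self._owner[point] = server

    def remove(self, server: str) -> None:
        for point in [p for p, s in self._owner.items() if s == server]:
            del self._owner[point]
            self._points.remove(point)

    def get(self, key: str) -> str:
        if not self._points:
            raise LookupError("ring is empty")
        # First point clockwise from the key, wrapping to the start of the ring.
        i = bisect.bisect_left(self._points, self._hash(key)) % len(self._points)
        return self._owner[self._points[i]]

servers = ["10.0.0.1:11211", "10.0.0.2:11211", "10.0.0.3:11211"]  # hypothetical
ring = HashRing(servers)
keys = [f"key-{i}" for i in range(10_000)]
before = {k: ring.get(k) for k in keys}

ring.add("10.0.0.4:11211")
after = {k: ring.get(k) for k in keys}

moved = [k for k in keys if before[k] != after[k]]
# Monotonicity: every key that moved went to the newly added server.
assert all(after[k] == "10.0.0.4:11211" for k in moved)
print(f"{len(moved) / len(keys):.1%} of keys moved")
```

Because adding a server only inserts new points on the ring, a key can change owner only if one of the new points lands between the key and its old owner. With four roughly equal servers, about a quarter of the keys should move, and the rest stay where they were.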
