[Miscellaneous] My view on consistent hashing (consistent hash)

Last Update:2018-07-27 Source: Internet

Author: User


I first came across consistent hashing (hereinafter CH) in a book on network principles, where it was applied to peer-to-peer networks, mainly to solve the problem of distributing stored resources. At the time I was not clear about its scope of application or its Chinese name. (P.S. The book was in English, so of course the abbreviation CH did not appear in it either.)

The second encounter was while using memcache, where, under the guidance of an expert, I used CH to meet a capacity-expansion requirement.

What can CH do, and what problem does it solve?

Answer: the problem of distributing stored data evenly.

Let's take a look at the principle of CH first:

CH first divides a ring into 2^N points, each corresponding to a value V(i). The storage nodes are then hashed with a hash function f, so each storage node lands on a point of the ring.

When a value a is stored, its hash is computed with the same hash function f. The result must fall on the ring, but not necessarily on a point V(i) occupied by a storage node. In that case, search clockwise for the next V(i) that holds a storage node; that node is the storage node for a.

Such an algorithm spreads the stored values quite evenly and guarantees consistency: each value lives on exactly one node.
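The clockwise lookup described above can be sketched in a few lines of Python. This is only a minimal sketch under assumptions that are not in the original: N is taken to be 32, f is taken to be MD5 reduced to the ring size, and the node names and the `HashRing` class are made up for illustration.

```python
import bisect
import hashlib

RING_BITS = 32  # assumed N: the ring has 2**32 points


def ring_hash(value: str) -> int:
    """The hash function f: map a string to a point on the ring."""
    digest = hashlib.md5(value.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (2 ** RING_BITS)


class HashRing:
    """Hypothetical helper: storage nodes placed on the ring by f."""

    def __init__(self, nodes):
        # Sorted (point, node name) pairs, i.e. the occupied V(i).
        self._points = sorted((ring_hash(n), n) for n in nodes)

    def lookup(self, key: str) -> str:
        """Return the first storage node clockwise from f(key)."""
        pos = ring_hash(key)
        idx = bisect.bisect_left(self._points, (pos, ""))
        if idx == len(self._points):
            idx = 0  # walked past the last node: wrap around the ring
        return self._points[idx][1]


ring = HashRing(["cache-a", "cache-b", "cache-c", "cache-d"])
owner = ring.lookup("user:42")
```

The same key always resolves to the same single node, which is the consistency property mentioned above.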

From this we can see that the hash function is very important.

Now let's look at the remainder (modulo) hash function, where the hash function f above depends on the number of nodes. With it, every time a server is added the data has to be migrated ("moving the library"). That is a result we do not want to see.
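To make the cost concrete, here is a small sketch that counts how many keys change servers under remainder hashing when a fifth server is added to four. The key names, the key count, and the use of MD5 as the underlying hash are assumptions for illustration only.

```python
import hashlib


def stable_hash(key: str) -> int:
    """Deterministic integer hash (Python's built-in hash() of str is salted per process)."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest(), "big")


def server_for(key: str, server_count: int) -> int:
    """Remainder hashing: the result depends directly on the number of servers."""
    return stable_hash(key) % server_count


keys = [f"key-{i}" for i in range(10_000)]

# A key stays put only if its remainder is the same for 4 and for 5 servers.
moved = sum(1 for k in keys if server_for(k, 4) != server_for(k, 5))
moved_fraction = moved / len(keys)
```

With a uniform hash, a key keeps its server only when its hash modulo 20 is 0, 1, 2 or 3, so roughly four out of five keys have to move.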

Does using a consistent hash solve the problem of adding nodes? The answer is obviously no, not completely, but a consistent hash does reduce the cost of migrating data. This requires that the hashed key values be evenly distributed.
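The reduced migration cost can be checked with a short experiment. The sketch below assumes one ring point per node (no virtual nodes), MD5 truncated to 32 bits as the hash, and illustrative node and key names; none of these come from the original.

```python
import bisect
import hashlib


def ring_hash(value: str) -> int:
    """Map a string to a point on an assumed 2**32 ring."""
    return int.from_bytes(hashlib.md5(value.encode("utf-8")).digest()[:4], "big")


def build_ring(nodes):
    """Sorted (point, node) pairs."""
    return sorted((ring_hash(n), n) for n in nodes)


def owner(ring, key):
    """First node clockwise from the key's point, wrapping at the end."""
    idx = bisect.bisect_left(ring, (ring_hash(key), ""))
    return ring[idx % len(ring)][1]


old_ring = build_ring(["A", "B", "C", "D"])
new_ring = build_ring(["A", "B", "C", "D", "E"])

keys = [f"key-{i}" for i in range(10_000)]
moved = [k for k in keys if owner(old_ring, k) != owner(new_ring, k)]
moved_fraction = len(moved) / len(keys)

# Every key that changed owner now belongs to the new node E.
only_to_new_node = all(owner(new_ring, k) == "E" for k in moved)
```

Only the keys on the arc that E takes over change owner; all other keys stay where they were. How large that arc is depends on where E happens to land, which is why an evenly spreading hash matters.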

Let's take a look at what exactly a consistent hash function is.

This is pure mathematics, and it still makes me dizzy, so I will not explain it (I could not explain it properly anyway). Here is a link for reference:

A general hash function algorithm: http://www.partow.net/programming/hashfunctions/

Here are some of my ideas for adding nodes to a consistent hash without migrating data (untested):

The idea is borrowed from radix sort, that is, trading space for time. The details are as follows:

Suppose ring O has 4 nodes A, B, C, D, that is, O-A-B-C-D-O, and at this point all 4 nodes already hold some data.

We insert a node E between D and O: O-A-B-C-D-E-O

Normally we would have to move the keys that fall between D and E, together with their data, from O to E, but we can avoid this by trading space for time:

We only need to record which server a key K corresponded to during a given time period t. Mathematically, Server = f(t, K).

Then, when we query the value of a key K1: the value was originally on O, but after time T1, because a node was added, the key now maps to E.

Given the time T1 and the key K1, we can quickly locate the value. But this approach wastes a lot of space on storing the metadata.

If nodes are added indefinitely, the redundant information grows without bound, and eventually the data will have to be migrated anyway.
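One possible reading of Server = f(t, K) is sketched below: keep a snapshot of the ring for every time period and remember when each key was written. The `EpochRouter` class, its method names, and the per-key timestamp table are all hypothetical; the original does not say how the metadata would be stored.

```python
import bisect
import hashlib


def ring_hash(value: str) -> int:
    """Map a string to a point on an assumed 2**32 ring."""
    return int.from_bytes(hashlib.md5(value.encode("utf-8")).digest()[:4], "big")


class EpochRouter:
    """Hypothetical router implementing Server = f(t, K)."""

    def __init__(self):
        self._starts = []      # start time of each period, ascending
        self._rings = []       # ring snapshot valid from the matching start
        self._written_at = {}  # key -> write time (the extra metadata)

    def set_nodes(self, start_time, nodes):
        # In this sketch, periods must be registered in increasing time order.
        self._starts.append(start_time)
        self._rings.append(sorted((ring_hash(n), n) for n in nodes))

    def server(self, t, key):
        """f(t, K): the node that owned `key` in the period containing t."""
        period = bisect.bisect_right(self._starts, t) - 1
        if period < 0:
            raise ValueError("no ring was defined at that time")
        ring = self._rings[period]
        idx = bisect.bisect_left(ring, (ring_hash(key), ""))
        return ring[idx % len(ring)][1]

    def write(self, t, key):
        self._written_at[key] = t
        return self.server(t, key)

    def locate(self, key):
        """Find an existing key without ever having moved it."""
        return self.server(self._written_at[key], key)


router = EpochRouter()
router.set_nodes(0, ["A", "B", "C", "D"])
first = router.write(5, "K1")                   # stored before E exists
router.set_nodes(10, ["A", "B", "C", "D", "E"])
still_there = router.locate("K1") == first      # no migration was needed
```

The `_written_at` table and the list of ring snapshots are exactly the metadata whose growth is described above: both get larger with every key and every added node.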

Whether to do this depends on the real-time requirements of our system. If those requirements are not strict, the following approach is also acceptable:

Suppose again that ring O has 4 nodes A, B, C, D, that is, O-A-B-C-D-O, and all 4 nodes already hold some data.

We insert a node E between D and O: O-A-B-C-D-E-O

Normally we would have to move the keys that fall between D and E, together with their data, from O to E, but, as before, we do not need to do this:

When we query a key K2 whose value was originally on O but which, at time T2, maps to E: if the value is not found on node E, look on the next node clockwise from E, and so on.

In this case, if we keep adding nodes indefinitely, the end result is that looking up the value for a key in some segment may have to traverse the whole loop of added nodes.

Let n be the number of added nodes.

The worst-case time complexity is O(n).

The average time complexity cannot exceed the worst case, so it is also bounded by O(n).
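The clockwise fallback lookup can be sketched as follows. `LazyRing`, its in-memory per-node dictionaries, and the key names are assumptions for illustration; the sketch also ignores overwriting a key after a node has been added, which would leave a stale copy on the old node.

```python
import bisect
import hashlib


def ring_hash(value: str) -> int:
    """Map a string to a point on an assumed 2**32 ring."""
    return int.from_bytes(hashlib.md5(value.encode("utf-8")).digest()[:4], "big")


class LazyRing:
    """Hypothetical ring that never migrates data when a node is added."""

    def __init__(self, nodes):
        self._ring = sorted((ring_hash(n), n) for n in nodes)
        self._stores = {n: {} for n in nodes}  # per-node key/value storage

    def add_node(self, node):
        bisect.insort(self._ring, (ring_hash(node), node))
        self._stores[node] = {}  # starts empty; existing data stays put

    def _start_index(self, key):
        idx = bisect.bisect_left(self._ring, (ring_hash(key), ""))
        return idx % len(self._ring)

    def put(self, key, value):
        node = self._ring[self._start_index(key)][1]
        self._stores[node][key] = value

    def get(self, key):
        """Probe clockwise from the current owner; return (value, probes)."""
        start = self._start_index(key)
        for step in range(len(self._ring)):
            node = self._ring[(start + step) % len(self._ring)][1]
            if key in self._stores[node]:
                return self._stores[node][key], step + 1
        raise KeyError(key)


lazy = LazyRing(["A", "B", "C", "D"])
for i in range(1000):
    lazy.put(f"key-{i}", i)
lazy.add_node("E")  # nothing is moved
value, probes = lazy.get("key-7")
```

After a single added node, a key needs at most two probes: the new node first, then the node that still holds the data. Each further node inserted in front of the data can add one more probe, which is where the linear worst case comes from.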

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only.
