[Miscellaneous] My view on consistent hashing

Source: Internet
Author: User

I first came across consistent hashing (hereinafter CH) in a book on network principles, where it was applied to peer-to-peer networks, mainly to solve the problem of distributing storage resources. At the time I was not clear about its scope of application or its Chinese name. (PS: the book was in English, and of course the abbreviation CH did not appear in it.)

The second encounter was while using memcache, where, under the guidance of an expert, I used CH to meet a scaling requirement.

What can CH do, and what problem does it solve?

Answer: the problem of spreading stored data evenly.

Let's take a look at the principle of CH first:

CH first divides a ring into 2^n points (2 to the n-th power), each point being a value V(i). The storage nodes are then hashed with a hash function f, so each of them falls on a point of the ring.

When a value a is stored, its hash is computed with the same hash function f. The result must fall within the ring, but not necessarily on a V(i) occupied by a storage node. In that case, look clockwise for the next V(i) that holds a storage node; that node is the storage node for a.

Such an algorithm spreads the stored values fairly evenly and guarantees consistency: a value lives on only one node.

From this we can see that the hash function is very important.
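The lookup described above can be sketched in a few lines of Python. This is only a minimal sketch under my own assumptions: the ring has 2^32 points, f is built from MD5, and the names `ring_hash`, `HashRing` and `node_for` are made up for illustration, not taken from any library.

```python
import bisect
import hashlib

RING_BITS = 32  # assumption: the ring has 2^32 points


def ring_hash(value: str) -> int:
    """Hash function f: map a string to a point in [0, 2^RING_BITS)."""
    digest = hashlib.md5(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % (1 << RING_BITS)


class HashRing:
    """Hypothetical helper: storage nodes placed on a ring by f."""

    def __init__(self, nodes=()):
        self._points = []  # sorted ring positions of the storage nodes
        self._owner = {}   # ring position -> node name
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        point = ring_hash(node)
        if point in self._owner:
            raise ValueError(f"ring position collision for {node!r}")
        bisect.insort(self._points, point)
        self._owner[point] = node

    def node_for(self, key: str) -> str:
        """Walk clockwise from f(key) to the next storage node."""
        if not self._points:
            raise LookupError("the ring has no storage nodes")
        point = ring_hash(key)
        # Index of the first node position >= point; wrap to 0 past the end.
        idx = bisect.bisect_left(self._points, point) % len(self._points)
        return self._owner[self._points[idx]]


ring = HashRing(["server-a", "server-b", "server-c"])
print(ring.node_for("user:42"))
```

Keeping the node positions in a sorted list means the clockwise search is a binary search rather than a walk over every point of the ring.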

Now let's look at the remainder (modulo) hash function, where the hash function f above depends on the number of nodes. With it, every time a server is added, the stored data has to be migrated to new locations. That is a result we don't want to see.
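To see how bad the remainder approach is, here is a small sketch that counts how many keys change server when going from 4 to 5 servers. The key names, the MD5-based `stable_hash` and the function `server_by_remainder` are my own assumptions for illustration.

```python
import hashlib


def stable_hash(key: str) -> int:
    """A deterministic integer hash (Python's built-in hash() is salted per run)."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:8], "big")


def server_by_remainder(key: str, num_servers: int) -> int:
    """Remainder hashing: the result depends directly on the server count."""
    return stable_hash(key) % num_servers


keys = [f"key-{i}" for i in range(10_000)]
moved = sum(
    1 for k in keys if server_by_remainder(k, 4) != server_by_remainder(k, 5)
)
fraction_moved = moved / len(keys)
print(f"{fraction_moved:.1%} of keys change server when going from 4 to 5")
```

For a uniform hash h, h % 4 equals h % 5 only when h % 20 is 0, 1, 2 or 3, so about four out of five keys end up on a different server.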

Does using a consistent hash solve the problem of adding nodes? The answer is obviously no, not completely, but a consistent hash can reduce the cost of migrating the data. This relies on the key values being spread very evenly after hashing.

Let's take a look at what exactly the consistent hash function is.

This is pure mathematics and it still makes me dizzy, so I won't explain it (I couldn't explain it anyway). Here is a link for everyone to refer to.
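The reduced cost can be checked with a small experiment: place four nodes on a ring, add a fifth, and see which keys change owner. This is a sketch under my own assumptions (MD5 as f, a 2^32 ring, one ring position per node, made-up node and key names).

```python
import bisect
import hashlib


def ring_hash(value: str) -> int:
    """Hash function f onto an assumed ring of 2^32 points."""
    return int.from_bytes(hashlib.md5(value.encode("utf-8")).digest()[:4], "big")


def owners(nodes, keys):
    """Map every key to the first node found clockwise from f(key)."""
    ring = sorted((ring_hash(n), n) for n in nodes)
    points = [p for p, _ in ring]
    result = {}
    for key in keys:
        idx = bisect.bisect_left(points, ring_hash(key)) % len(ring)
        result[key] = ring[idx][1]
    return result


keys = [f"key-{i}" for i in range(10_000)]
before = owners(["a", "b", "c", "d"], keys)
after = owners(["a", "b", "c", "d", "e"], keys)

moved = [k for k in keys if before[k] != after[k]]
print(f"{len(moved) / len(keys):.1%} of keys moved")
print("all moved keys now live on e:", all(after[k] == "e" for k in moved))
```

Only the keys that fall on the arc taken over by the new node move, and they all move to that node; every other key stays where it was. With a single position per node, how large that arc is depends on where the new node happens to land.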

A general hash function algorithm: http://www.partow.net/programming/hashfunctions/

Here are some of my ideas for adding nodes to a consistent hash without migrating data (untested):

The idea is borrowed from radix sort, trading space for time. The details are as follows:

Suppose the ring O holds 4 nodes a, b, c, d, that is, O-a-b-c-d-O, and at this point each of the 4 nodes holds some data.

We insert a node e between d and O: O-a-b-c-d-e-O

Normally we would have to move the data whose keys fall between d and e from O to e, but if we trade space for time we don't need to:

We only need to record which server a key K maps to during a given time period T. Mathematically: Server = f(T, K).

Then suppose we query the value of key K1. The value was originally on O, but after time T1, because a node was added, the key now maps to e.

From the time T1 and the key K1 we can quickly locate its value. But this approach wastes a lot of space storing the metadata.

If nodes keep being added indefinitely, the redundant information grows without bound, and eventually the data has to be migrated anyway.
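One way to read Server = f(T, K) is to keep every past ring layout together with the time it took effect, and to remember when each key was written. The sketch below is only my interpretation of the idea; `VersionedRing`, `server`, the `write_time` dictionary and the integer timestamps are all assumptions made for illustration.

```python
import bisect
import hashlib


def ring_hash(value: str) -> int:
    """Hash function f onto an assumed ring of 2^32 points."""
    return int.from_bytes(hashlib.md5(value.encode("utf-8")).digest()[:4], "big")


class VersionedRing:
    """Hypothetical helper: remembers each ring layout and when it started."""

    def __init__(self, initial_nodes):
        self._nodes = list(initial_nodes)
        # List of (start_time, layout); the first layout starts at time 0.
        self._epochs = [(0, self._layout(self._nodes))]

    @staticmethod
    def _layout(nodes):
        return sorted((ring_hash(n), n) for n in nodes)

    def add_node(self, start_time: int, node: str) -> None:
        if start_time <= self._epochs[-1][0]:
            raise ValueError("start_time must be later than the last epoch")
        self._nodes.append(node)
        self._epochs.append((start_time, self._layout(self._nodes)))

    def server(self, t: int, key: str) -> str:
        """Server = f(t, K): the node that owned `key` at time `t`."""
        starts = [s for s, _ in self._epochs]
        if t < starts[0]:
            raise ValueError("t is earlier than the first layout")
        # Last layout whose start time is <= t.
        layout = self._epochs[bisect.bisect_right(starts, t) - 1][1]
        points = [p for p, _ in layout]
        idx = bisect.bisect_left(points, ring_hash(key)) % len(layout)
        return layout[idx][1]


ring = VersionedRing(["a", "b", "c", "d"])
write_time = {"K1": 50}        # the per-key metadata this approach has to store
ring.add_node(100, "e")        # node e joins at T1 = 100; no data is moved

# Reading K1 later: look it up on the server that owned it when it was written.
print(ring.server(write_time["K1"], "K1"))
```

The `write_time` dictionary and the list of layouts are exactly the extra space this idea pays for, and both grow as keys and nodes are added.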

Whether to do this depends on the real-time requirements of our system. If the real-time requirements are not high, the following approach is also an option:

Take the same setup as before: the ring O-a-b-c-d-O, with node e inserted between d and O to give O-a-b-c-d-e-O.

Again, we do not move the data whose keys fall between d and e from O to e:

Suppose we query key K2, whose value was originally on O and which maps to e from time T2 onward. If the value is not found on node e, look on the next node clockwise from e, and so on.

In this case, if we keep adding nodes indefinitely, the end result is that looking up a key in some key range may have to traverse the whole chain of added nodes.
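The fallback lookup can be sketched as follows. Again this is only a sketch under my own assumptions: each node's storage is a plain dictionary, f is built from MD5 on a 2^32 ring, and `LazyRing`, `put`, `get` and `add_node` are names invented for illustration. `get` also returns how many nodes it had to probe.

```python
import bisect
import hashlib


def ring_hash(value: str) -> int:
    """Hash function f onto an assumed ring of 2^32 points."""
    return int.from_bytes(hashlib.md5(value.encode("utf-8")).digest()[:4], "big")


class LazyRing:
    """Hypothetical helper: nodes are added without migrating any data."""

    def __init__(self, nodes):
        self._ring = sorted((ring_hash(n), n) for n in nodes)
        self.stores = {n: {} for n in nodes}  # node name -> its local data

    def _owner_index(self, key: str) -> int:
        points = [p for p, _ in self._ring]
        return bisect.bisect_left(points, ring_hash(key)) % len(self._ring)

    def add_node(self, node: str) -> None:
        # Only the ring changes; existing data stays on the old nodes.
        bisect.insort(self._ring, (ring_hash(node), node))
        self.stores[node] = {}

    def put(self, key: str, value) -> None:
        node = self._ring[self._owner_index(key)][1]
        self.stores[node][key] = value

    def get(self, key: str):
        """Return (value, probes), walking clockwise until the key is found."""
        start = self._owner_index(key)
        for step in range(len(self._ring)):
            node = self._ring[(start + step) % len(self._ring)][1]
            if key in self.stores[node]:
                return self.stores[node][key], step + 1
        raise KeyError(key)


ring = LazyRing(["a", "b", "c", "d"])
ring.put("K2", "some value")
ring.add_node("e")
print(ring.get("K2"))
```

The walk always reaches the old owner, because every node added later sits between the key's position and that owner on the ring. The number of probes grows with the number of nodes inserted into that stretch.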

Let n be the number of added nodes.

The worst-case time complexity is O(n).

The average-case time complexity is also at most O(n), since it cannot exceed the worst case.
