Consistent hashing (the consistent hashing algorithm)

Source: Internet
Author: User

First, the background

Today I won't give a long-winded, exhaustive explanation of consistent hashing. Instead I'll try to explain what the consistent hashing algorithm is in the most relaxed way I can; if you need to go deeper, Google it ~.

Here's an example:

Suppose there are N cache servers and we need to map objects onto them. We can compute a hash value for each object and then spread the objects evenly across the N caches like this:

hash(object) % N

For example, if the object is "Hello", hash(object) is 100, and N is 3, then 100 % 3 = 1, so this data is stored on cache 1 (the caches are numbered 0, 1, 2). This solves the problem of spreading a bunch of data across N caches.
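The mod-N placement just described fits in a few lines of Go. The article does not say which hash function it uses, so CRC-32 from the standard library (`hash/crc32`) is an assumption here, and `modPlace` is a hypothetical helper name; treat this as a sketch rather than code from any particular cache.

```go
package main

import (
	"fmt"
	"hash/crc32"
)

// modPlace returns the index (0..n-1) of the cache that stores key,
// using the naive hash(object) % N scheme. CRC-32 stands in for "hash".
func modPlace(key string, n int) int {
	return int(crc32.ChecksumIEEE([]byte(key)) % uint32(n))
}

func main() {
	// The worked example from the text: a hash value of 100 with N = 3.
	fmt.Println(100 % 3) // 1, i.e. cache 1 of caches 0, 1, 2

	for _, key := range []string{"Hello", "World", "Zhang San"} {
		fmt.Printf("%s -> cache %d\n", key, modPlace(key, 3))
	}
}
```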

Now something unexpected happens: of caches 0, 1, and 2, cache 1 breaks down!

What now? The data on cache 1 first has to be migrated, and only 2 caches remain available. We now have to recompute where every piece of data lives, and the formula above becomes:

hash(object) % (N-1)

This recalculation means that most of the data now maps to a different cache! The cache system suffers a massive wave of cache misses, those requests hit the servers behind the cache directly, and, well, everything collapses.
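To put a number on "most of the data", this sketch counts how many cache indexes change when N drops from 3 to 2 under `hash % N`. For uniformly spread hash values, a value keeps its index only when `h % 3 == h % 2`, which holds for 2 of every 6 consecutive values, so roughly two thirds of the data moves. `countMoved`, the CRC-32 hash, and the `key-<i>` workload are assumptions made up for illustration.

```go
package main

import (
	"fmt"
	"hash/crc32"
	"strconv"
)

// countMoved reports how many hash values land on a different cache index
// when the cluster size changes from oldN to newN under hash % N.
func countMoved(hashes []uint32, oldN, newN uint32) int {
	moved := 0
	for _, h := range hashes {
		if h%oldN != h%newN {
			moved++
		}
	}
	return moved
}

func main() {
	// Hypothetical workload: 10,000 keys hashed with CRC-32.
	hashes := make([]uint32, 0, 10000)
	for i := 0; i < 10000; i++ {
		hashes = append(hashes, crc32.ChecksumIEEE([]byte("key-"+strconv.Itoa(i))))
	}
	moved := countMoved(hashes, 3, 2)
	fmt.Printf("%d of %d keys change cache (%.1f%%)\n",
		moved, len(hashes), 100*float64(moved)/float64(len(hashes)))
}
```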

What can we do? This is where consistent hashing comes in!

Second, the principle of the algorithm

Put simply, consistent hashing means that when the number of caches changes, as few existing key-to-cache mappings as possible change. In other words, adding or removing a cache server should not have a big impact on the data already cached. How do we achieve that?

First, picture a chain whose first slot is 0 and whose last slot is 2^32-1, i.e. an address space holding all 2^32 possible hash values. Now join the chain end to end to form a ring, like this (drawing it myself is too much trouble, so the ring diagrams below come from the Internet; they will be removed on request):

Next, consider our four pieces of data, object1 to object4. The hash algorithm maps each of them onto the ring, e.g. hash(object1) = key1:

This part should be easy to follow: 4 pieces of data yield 4 keys. Next, we map the cache servers onto the same ring. Suppose there are 3 cache servers, cache A, cache B, and cache C. Hashing each one, e.g. hash(cache A) = key A (a server's hash value is computed from its IP and similar information), drops the 3 cache servers onto the ring, with the following result:

Then, starting from each object's position, walk clockwise around the ring and hand the object to the first cache you encounter. This pairs every piece of data with a cache, giving the following result:

object1: cache A

object4: cache B

object2/object3: cache C
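The clockwise walk can be sketched in Go as a sorted slice of node positions plus a binary search (`sort.Search`) for the first node at or after the object's position, wrapping to the first node when the search runs off the end. The original figure's positions are not available, so the positions below are made up to reproduce the layout described in the text, and `ring`, `add`, and `locate` are hypothetical names.

```go
package main

import (
	"fmt"
	"sort"
)

// ring is a minimal hash ring. Node positions are passed in directly so the
// example is deterministic; a real ring would hash each server's IP.
type ring struct {
	positions []uint32          // node positions, kept sorted
	owner     map[uint32]string // position -> cache name
}

func newRing() *ring { return &ring{owner: make(map[uint32]string)} }

// add places a cache at pos and keeps positions sorted.
func (r *ring) add(name string, pos uint32) {
	r.positions = append(r.positions, pos)
	sort.Slice(r.positions, func(i, j int) bool { return r.positions[i] < r.positions[j] })
	r.owner[pos] = name
}

// locate walks clockwise from pos and returns the first cache it meets,
// wrapping past 2^32-1 back to the smallest position.
func (r *ring) locate(pos uint32) string {
	if len(r.positions) == 0 {
		return ""
	}
	i := sort.Search(len(r.positions), func(i int) bool { return r.positions[i] >= pos })
	if i == len(r.positions) {
		i = 0 // ran off the end: wrap around the ring
	}
	return r.owner[r.positions[i]]
}

func main() {
	r := newRing()
	r.add("cache A", 100)
	r.add("cache B", 200)
	r.add("cache C", 300)

	// Hypothetical object positions chosen to match the text's layout.
	objects := []struct {
		name string
		pos  uint32
	}{{"object1", 50}, {"object2", 250}, {"object3", 280}, {"object4", 150}}
	for _, o := range objects {
		fmt.Printf("%s: %s\n", o.name, r.locate(o.pos))
	}
}
```

Binary search keeps each lookup at O(log n) in the number of nodes; for a handful of caches a linear scan would work just as well.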

OK, that completes the data mapping. Now let's revisit the case that tripped up the ordinary hash algorithm, a change in the number of cache servers, and see whether the ring hash solves the problem.

Suppose cache B goes offline:

This time you can see that only object4, which was originally mapped to cache B, is affected; it has to move to cache C.

What if we add a cache node? Suppose we add cache D:

This time only object2 needs to move to cache D; nothing else changes. Isn't that neat?
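Both claims can be checked with a small toy ring: removing cache B moves only object4 (to cache C), and adding cache D moves only object2. All positions are hypothetical; cache D is placed at 260 so that it falls between object2 (250) and object3 (280). The helpers `remove`, `assign`, and `changed` are made up for this sketch.

```go
package main

import (
	"fmt"
	"sort"
)

type ring struct {
	positions []uint32          // node positions, kept sorted
	owner     map[uint32]string // position -> cache name
}

func (r *ring) add(name string, pos uint32) {
	r.positions = append(r.positions, pos)
	sort.Slice(r.positions, func(i, j int) bool { return r.positions[i] < r.positions[j] })
	r.owner[pos] = name
}

// remove drops every position owned by name, filtering the slice in place.
func (r *ring) remove(name string) {
	kept := r.positions[:0]
	for _, p := range r.positions {
		if r.owner[p] == name {
			delete(r.owner, p)
			continue
		}
		kept = append(kept, p)
	}
	r.positions = kept
}

// locate returns the first cache clockwise from pos, wrapping around.
func (r *ring) locate(pos uint32) string {
	if len(r.positions) == 0 {
		return ""
	}
	i := sort.Search(len(r.positions), func(i int) bool { return r.positions[i] >= pos })
	if i == len(r.positions) {
		i = 0
	}
	return r.owner[r.positions[i]]
}

// Hypothetical object positions matching the layout in the text.
var objects = map[string]uint32{"object1": 50, "object2": 250, "object3": 280, "object4": 150}

// baseRing builds the three-cache ring A, B, C.
func baseRing() *ring {
	r := &ring{owner: make(map[uint32]string)}
	r.add("cache A", 100)
	r.add("cache B", 200)
	r.add("cache C", 300)
	return r
}

// assign maps every object to its cache on ring r.
func assign(r *ring) map[string]string {
	out := make(map[string]string)
	for name, pos := range objects {
		out[name] = r.locate(pos)
	}
	return out
}

// changed lists, sorted, the objects whose cache differs between two assignments.
func changed(before, after map[string]string) []string {
	var names []string
	for name, c := range before {
		if after[name] != c {
			names = append(names, name)
		}
	}
	sort.Strings(names)
	return names
}

func main() {
	r := baseRing()
	before := assign(r)
	r.remove("cache B")
	afterRemove := assign(r)
	fmt.Println("remove B moves:", changed(before, afterRemove)) // [object4]

	r = baseRing()
	r.add("cache D", 260)
	afterAdd := assign(r)
	fmt.Println("add D moves:", changed(before, afterAdd)) // [object2]
}
```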

You may have noticed a problem here: when there are only a few cache servers, this algorithm can easily distribute data unevenly. That is why ring hashing has the concept of virtual nodes. In the example where only cache A and cache C remain, A holds 1 object while C holds 3. We append a number to each cache's identifier (say, its IP address) and hash the result to obtain more virtual cache nodes:

hash("192.168.0.1#1")

hash("192.168.0.1#2")

hash("192.168.0.2#1")

hash("192.168.0.2#2")
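A sketch of that labelling scheme: each physical server contributes `replicas` virtual nodes named `<ip>#1`, `<ip>#2`, and so on, each hashed onto the ring, and a lookup table records which physical server owns each virtual position. CRC-32 and the helper name `virtualNodes` are assumptions; a real library may use a different label format.

```go
package main

import (
	"fmt"
	"hash/crc32"
	"sort"
	"strconv"
)

// virtualNodes hashes "<ip>#1" ... "<ip>#replicas" for every server and
// returns a table from ring position to the physical server that owns it.
// A hash collision would overwrite an entry; this sketch ignores that case.
func virtualNodes(ips []string, replicas int) map[uint32]string {
	table := make(map[uint32]string)
	for _, ip := range ips {
		for i := 1; i <= replicas; i++ {
			label := ip + "#" + strconv.Itoa(i)
			table[crc32.ChecksumIEEE([]byte(label))] = ip
		}
	}
	return table
}

func main() {
	table := virtualNodes([]string{"192.168.0.1", "192.168.0.2"}, 2)

	// Print the virtual nodes in ring order.
	positions := make([]uint32, 0, len(table))
	for p := range table {
		positions = append(positions, p)
	}
	sort.Slice(positions, func(i, j int) bool { return positions[i] < positions[j] })
	for _, p := range positions {
		fmt.Printf("%10d -> %s\n", p, table[p])
	}
}
```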

As above, each cache server now corresponds to 2 hash values, so after placing them on the ring we get the following result:

Now object1 lands on cache A2 and object2 lands on cache A1. Physically, A1 and A2 are both cache A, so intuitively we have turned the original 1:3 split into a 2:2 split (from 1 object on A and 3 on C to 2 objects on A and 2 on C).

With virtual nodes introduced, the mapping changes from {object → node} to {object → virtual node}, where each virtual node belongs to a real node, so the data ends up much better balanced.
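To see the balancing effect, this sketch builds a ring with a configurable number of virtual nodes per server, hashes 10,000 made-up keys, and counts how many land on each physical server. With 1 virtual node per server the split depends heavily on where three points happen to fall; with 100 per server it should come out much closer to even. The IPs, key names, CRC-32 hash, and replica counts are all assumptions chosen for illustration, so no specific counts are claimed here.

```go
package main

import (
	"fmt"
	"hash/crc32"
	"sort"
	"strconv"
)

// distribution builds a ring with `replicas` virtual nodes per server and
// returns how many of numKeys keys each physical server receives.
func distribution(ips []string, replicas, numKeys int) map[string]int {
	owner := make(map[uint32]string)
	var positions []uint32
	for _, ip := range ips {
		for i := 1; i <= replicas; i++ {
			p := crc32.ChecksumIEEE([]byte(ip + "#" + strconv.Itoa(i)))
			if _, taken := owner[p]; taken {
				continue // rare collision: keep the first owner
			}
			owner[p] = ip
			positions = append(positions, p)
		}
	}
	sort.Slice(positions, func(i, j int) bool { return positions[i] < positions[j] })

	counts := make(map[string]int)
	for k := 0; k < numKeys; k++ {
		h := crc32.ChecksumIEEE([]byte("key-" + strconv.Itoa(k)))
		// First virtual node clockwise from h, wrapping to the start.
		i := sort.Search(len(positions), func(i int) bool { return positions[i] >= h })
		if i == len(positions) {
			i = 0
		}
		counts[owner[positions[i]]]++
	}
	return counts
}

func main() {
	ips := []string{"192.168.0.1", "192.168.0.2", "192.168.0.3"}
	for _, replicas := range []int{1, 100} {
		c := distribution(ips, replicas, 10000)
		fmt.Printf("replicas=%3d:", replicas)
		for _, ip := range ips {
			fmt.Printf(" %s=%d", ip, c[ip])
		}
		fmt.Println()
	}
}
```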

OK, that covers the principle of consistent hashing. Next, let's see how consistent hashing is implemented in groupcache.

Third, consistent hashing in groupcache

Finally, the code!!!

The source code consists mainly of the following packages; today we look at the first one, package consistenthash:

As we can see, we only need to focus on the source file consistenthash.go. It defines 2 types, Hash and Map, and 1 function, New; the Map type has 4 unexported fields and 3 methods bound to it.

Let's go through them one by one ~

1. The Hash type (remember that Hash is a function type).

2. The Map type. Its first field, hash, is a variable of the Hash type above, and the replicas field is the number of replicas. Remember the virtual nodes introduced above to solve the balance problem? Those virtual nodes are exactly the replicas described here.

3. The New() function. This function obviously creates an instance of the Map type above, initializing the number of replicas, the hash function, and the hashMap table. The keys of hashMap are the hash values of the cache servers' replicas, and each value is the actual cache server; this is what lets all the replicas of cache A (cache A1, cache A2) map back to cache A.

4. The IsEmpty() method (nothing much to say here; it checks whether the map is empty).

5. The Add() method adds cache servers to the map, with e.g. cache A and cache B as keys. If the number of replicas is 2, the map stores the hash results of one label per replica, such as cache A#1, cache A#2, cache B#1, and cache B#2.

6. The Get() method is relatively more involved. Suppose we have a piece of data, Name="Zhang San", that needs to be stored. This method picks a cache server for it: it hashes the key, finds the first replica hash at or after that value on the ring (wrapping around if necessary), and returns the server that owns it. The returned string can be a server IP such as "192.168.0.1", so the caller knows to store Name="Zhang San" on "192.168.0.1".

OK, that's it. There was a bit more content today, so it may take a little more time to digest ~

Next time we'll look at the groupcachepb package, which inevitably means introducing protocol buffers. That's all for today!
