Learning the Consistent Hashing Algorithm (a Distributed Load-Balancing Algorithm)

Source: Internet
Author: User
Tags: hash, php, class
Introduction:

Consistent hashing is an algorithm for implementing distributed hash tables (DHTs). It was proposed at MIT in 1997 to solve the hot-spot problem on the Internet, with a goal very similar to that of CARP (Cache Array Routing Protocol). Consistent hashing fixes the problems caused by the simple hashing algorithm that CARP uses, which makes DHTs practical in peer-to-peer (P2P) environments.

Scenario introduction:

Suppose you have N cache servers (referred to below simply as caches). How do you map an object onto those N caches? You would probably compute the object's hash value and then spread the objects evenly across the N caches with a common method like the following:

hash(object) % N

This modulo method is generally called simple hashing. Simple hashing does distribute (map) objects fairly evenly, but consider the following two scenarios:

1) Cache server m goes down (something any real deployment must plan for), so every object mapped to cache m becomes invalid. Cache m has to be removed from the pool, which leaves N-1 caches, and the mapping formula becomes hash(object) % (N-1);

2) Traffic grows and a cache has to be added, which gives N+1 caches, and the mapping formula becomes hash(object) % (N+1);

What do 1) and 2) mean? They mean that whenever a cache server is added or removed, almost all cached entries are suddenly invalidated, because most objects now map to a different cache. For the backend servers this is a disaster: a flood of requests goes straight through to them.
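The effect of changing N can be checked with a small experiment. The following is a minimal sketch, not part of the original article: it assumes crc32() as a stand-in for hash(), a 64-bit PHP build (so crc32() is non-negative), and made-up object names. The function name countRemapped is hypothetical.

```php
<?php
// Count how many of $total keys land on a different cache when the
// number of caches changes from $before to $after under hash % N.
function countRemapped(int $total, int $before, int $after): int
{
    $moved = 0;
    for ($i = 0; $i < $total; $i++) {
        // crc32() stands in for hash(object); the key names are made up.
        $h = crc32("object" . $i);
        if ($h % $before !== $h % $after) {
            $moved++;
        }
    }
    return $moved;
}

$total = 10000;
$moved = countRemapped($total, 4, 5);
printf("%d of %d keys moved (%.1f%%)\n", $moved, $total, 100 * $moved / $total);
```

For a uniformly distributed hash, going from 4 to 5 caches keeps a key in place only when its hash modulo 20 is 0, 1, 2, or 3, so roughly four keys in five would be expected to move.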

To solve this problem, we introduce the consistent hashing algorithm.

Hash algorithms and monotonicity

One criterion for judging a hash algorithm is monotonicity, defined as follows:

Monotonicity means that if some content has already been assigned to buffers by hashing and new buffers are then added to the system, the hash result should guarantee that previously assigned content either stays where it is or moves to a new buffer; it is never remapped to a different buffer in the old buffer set.

Simply put, monotonicity requires that removing or adding a cache (machine, IP) changes the existing key mappings as little as possible.

It is easy to see that the simple hash algorithm above, hash(object) % N, can hardly satisfy monotonicity, because any change in N changes the result of the modulo.

Principle of the consistent hashing algorithm:

Put simply, consistent hashing is a hash algorithm that changes the existing key mappings as little as possible when a cache is removed or added, satisfying monotonicity as far as it can.

The following 6 steps explain the basic principle of consistent hashing.

Step one: the ring hash space

A typical hash algorithm maps a value to a 32-bit key (which is then taken modulo), that is, into a numeric space from 0 to 2^32-1. We can picture this space as a ring whose head (0) and tail (2^32-1) are joined, as shown in the following illustration:

Step two: hash the objects to integers and map them onto the ring hash space

For example, suppose we have four objects, object1 through object4. The hash function turns the four objects into integer keys:

key1 = hash(object1);
key2 = hash(object2);
key3 = hash(object3);
key4 = hash(object4);

These objects are then mapped onto the ring hash space according to their key values:

Step three: map the caches onto the ring hash space

The basic idea of consistent hashing is to map both the objects and the caches into the same hash value space, using the same hash algorithm.

Suppose there are three cache servers, cache A, cache B, and cache C. The hash function gives their key values:

keyA = hash(cacheA);
keyB = hash(cacheB);
keyC = hash(cacheC);

The three cache servers are then mapped onto the ring hash space according to their key values:

Incidentally, the input to the cache's hash calculation is usually the cache machine's IP address or machine name.

With the steps above, both the objects and the cache servers are mapped onto the same ring hash space. The next question is how to map objects to cache servers.
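Steps one to three can be sketched in a few lines. This is only an illustration under stated assumptions: crc32() is used as the 32-bit hash function (the article does not name one), a 64-bit PHP build is assumed so that crc32() returns a value in 0 to 2^32-1, and the object names, cache addresses, and the helper ringPosition are hypothetical.

```php
<?php
// Map any string (object name, cache IP, or machine name) to a position
// on the ring 0 .. 2^32 - 1. crc32() is an assumed stand-in for hash().
function ringPosition(string $name): int
{
    return crc32($name); // non-negative on 64-bit PHP builds
}

// Hypothetical object names and cache addresses, for illustration only.
$objects = ["object1", "object2", "object3", "object4"];
$caches  = ["10.0.0.1", "10.0.0.2", "10.0.0.3"];

$ring = [];
foreach ($objects as $o) {
    $ring[] = ["pos" => ringPosition($o), "kind" => "object", "name" => $o];
}
foreach ($caches as $c) {
    $ring[] = ["pos" => ringPosition($c), "kind" => "cache", "name" => $c];
}

// Sort by position to see how objects and caches interleave on the ring.
usort($ring, function ($a, $b) {
    return $a["pos"] <=> $b["pos"];
});

foreach ($ring as $point) {
    printf("%10d  %-6s  %s\n", $point["pos"], $point["kind"], $point["name"]);
}
```

The point of the sketch is that objects and caches go through the same function and end up as points in the same numeric space.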

Step four: map the objects to the cache servers

Starting from the object's key on the ring (key1 in the figure), we move clockwise until we meet a cache server (cache B), and map the object to that server. Because the hash values of the object and the caches are fixed, the cache found this way is unique and deterministic. It follows that object1 maps to cache B, object2 and object3 map to cache C, and object4 maps to cache A, as shown in the figure.
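The clockwise walk can be sketched as a scan over cache positions sorted in ascending order. This is a minimal sketch rather than a definitive implementation: crc32() is an assumed hash function, the helper names buildRing and locate are hypothetical, and the ring is assumed to be non-empty.

```php
<?php
// Build a ring: position => cache name, sorted by ascending position.
function buildRing(array $caches): array
{
    $ring = [];
    foreach ($caches as $cache) {
        $ring[crc32($cache)] = $cache; // crc32() is an assumed hash function
    }
    ksort($ring, SORT_NUMERIC);
    return $ring;
}

// Walk clockwise from the object's position to the first cache at or after it.
function locate(array $ring, string $object): string
{
    $key = crc32($object);
    foreach ($ring as $pos => $cache) {
        if ($pos >= $key) {
            return $cache;
        }
    }
    // Passed the largest position: wrap around to the start of the ring.
    return reset($ring);
}

$ring = buildRing(["cacheA", "cacheB", "cacheC"]);
foreach (["object1", "object2", "object3", "object4"] as $object) {
    echo $object, " -> ", locate($ring, $object), "\n";
}
```

The actual assignments depend on the hash values, so they will not necessarily match the figure's example. A linear scan keeps the sketch short; with many caches a binary search over the sorted positions would be the usual choice.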

As noted earlier, the biggest problem with the ordinary hash algorithm (hashing and then taking the remainder) is that it does not satisfy monotonicity: when the number of caches changes (a cache is added or removed), almost all cached entries are invalidated, which hits the backend servers hard. Now let us see how consistent hashing behaves.

Step five: add a cache server

Now suppose traffic grows and a cache D server has to be added. Its hash (keyD = hash(cacheD)) turns out to fall between key2 and key3, so its position on the ring is also between them. The only objects affected are those between keyD and the next cache server met going counterclockwise (keyB); these were originally mapped to cache C and are now remapped to cache D.

In our example only object2 (key2) changes; it is remapped to cache D.
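The claim that adding a cache only moves objects onto the new cache can be checked experimentally. The sketch below is an illustration under assumptions: crc32() stands in for hash(), the cache and object names are made up, and the helpers makeRing and ownerOf are hypothetical.

```php
<?php
// crc32() is an assumed stand-in for the article's hash().
function makeRing(array $caches): array
{
    $ring = [];
    foreach ($caches as $cache) {
        $ring[crc32($cache)] = $cache;
    }
    ksort($ring, SORT_NUMERIC);
    return $ring;
}

// First cache at or after the object's position, wrapping at the end.
function ownerOf(array $ring, string $object): string
{
    $key = crc32($object);
    foreach ($ring as $pos => $cache) {
        if ($pos >= $key) {
            return $cache;
        }
    }
    return reset($ring);
}

$before = makeRing(["cacheA", "cacheB", "cacheC"]);
$after  = makeRing(["cacheA", "cacheB", "cacheC", "cacheD"]);

$moved = [];
for ($i = 0; $i < 1000; $i++) {
    $object = "object" . $i;
    $old = ownerOf($before, $object);
    $new = ownerOf($after, $object);
    if ($old !== $new) {
        $moved[$object] = ["from" => $old, "to" => $new];
    }
}

printf("%d of 1000 objects moved\n", count($moved));
foreach (array_unique(array_column($moved, "to")) as $target) {
    echo "moved to: ", $target, "\n";
}
```

Every object that moves goes to the new cache, and all of them come from the single cache that follows the new one clockwise; objects elsewhere on the ring keep their mapping.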

Step six: remove a cache server

Going back to the original diagram (before step five), suppose the cache B server goes down and has to be removed. The only objects affected are those between keyB and the next server met going counterclockwise (cache A), that is, the objects originally mapped to cache B.

In our example only object1 (key1) changes; it is remapped to cache C.
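Removal can be checked the same way. In this sketch (again assuming crc32() as the hash function, with hypothetical names ringOf and findCache and made-up cache and object names), cache B is deleted from the ring and each object's mapping is compared before and after.

```php
<?php
// crc32() is an assumed stand-in for the article's hash().
function ringOf(array $caches): array
{
    $ring = [];
    foreach ($caches as $cache) {
        $ring[crc32($cache)] = $cache;
    }
    ksort($ring, SORT_NUMERIC);
    return $ring;
}

// First cache at or after the object's position, wrapping at the end.
function findCache(array $ring, string $object): string
{
    $key = crc32($object);
    foreach ($ring as $pos => $cache) {
        if ($pos >= $key) {
            return $cache;
        }
    }
    return reset($ring);
}

$full = ringOf(["cacheA", "cacheB", "cacheC"]);

// Cache B goes down: delete its point; the remaining order is unchanged.
$reduced = $full;
unset($reduced[crc32("cacheB")]);

$kept = 0;
$remapped = 0;
for ($i = 0; $i < 1000; $i++) {
    $object = "object" . $i;
    if (findCache($full, $object) === findCache($reduced, $object)) {
        $kept++;
    } else {
        $remapped++;
    }
}
printf("kept: %d, remapped: %d\n", $kept, $remapped);
```

Only objects that were on the removed cache are remapped; they all fall through to the next cache clockwise.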

Balance and virtual nodes:

Another criterion for judging a hash algorithm is balance, defined as follows:

Balance

Balance means that the hash results are spread across all buffers as evenly as possible, so that all buffer space is put to use.

A hash algorithm does not guarantee absolute balance. With only a few caches, objects cannot be mapped evenly onto them. In the example above, if only cache A and cache C are deployed, then of the 4 objects cache A stores only object4 while cache C stores object1, object2, and object3; the distribution is very uneven.

To address this, consistent hashing introduces the concept of the "virtual node", which can be defined as follows:

Virtual node

A virtual node is a replica of an actual node in the hash space. One actual node corresponds to several virtual nodes; that number is called the "replica count". Virtual nodes are arranged in the hash space by their hash values.

Take again the case where only cache A and cache C are deployed. We saw in the diagram for removing cache B that the objects are distributed unevenly. Now we introduce virtual nodes and set the replica count to 2, which gives 4 virtual nodes in total: cache A1 and cache A2 represent cache A; cache C1 and cache C2 represent cache C. Suppose a fairly ideal situation, as shown in the figure:

At this point, the mapping between the objects and the virtual nodes is:

object1->cache C2; object2->cache A1; object3->cache C1; object4->cache A2;

So object4 and object2 are mapped to cache A, and object3 and object1 are mapped to cache C; the balance is greatly improved.

After virtual nodes are introduced, the mapping changes from {object -> node} to {object -> virtual node}. The figure shows the mapping used to look up the cache that holds an object.

The hash of a virtual node can be computed from the corresponding node's IP address plus a numeric suffix. For example, suppose cache A's IP address is 202.168.14.241.

Before virtual nodes are introduced, the hash value of cache A is computed as:

hash("202.168.14.241");

After virtual nodes are introduced, the hash values of the virtual nodes cache A1 and cache A2 are computed as:

hash("202.168.14.241#1"); // cache A1

hash("202.168.14.241#2"); // cache A2

A PHP example:

<?php
class FlexiHash
{
    private $serverList = array(); // server list, keyed by ring position
    private $isSorted = false;     // whether the server list has been sorted

    // Hash function: maps a string to a non-negative integer
    function mHash($key)
    {
        $md5 = substr(md5($key), 0, 8);
        $seed = 31;
        $hash = 0;
        for ($i = 0; $i < 8; $i++) {
            // $md5[$i] replaces the $md5{$i} syntax, which PHP 8 no longer accepts
            $hash = $hash * $seed + ord($md5[$i]);
        }
        return $hash & 0x7FFFFFFF;
    }

    // Add a server
    function addServer($server)
    {
        $hash = $this->mHash($server);
        if (!isset($this->serverList[$hash])) {
            $this->serverList[$hash] = $server;
        }
        $this->isSorted = false;
        return true;
    }

    // Remove a server
    function removeServer($server)
    {
        $hash = $this->mHash($server);
        if (isset($this->serverList[$hash])) {
            unset($this->serverList[$hash]);
        }
        $this->isSorted = false;
        return true;
    }

    // Look up the server a key maps to (the core of the algorithm)
    function lookup($key)
    {
        $hash = $this->mHash($key);
        if (!$this->isSorted) {
            krsort($this->serverList, SORT_NUMERIC);
            $this->isSorted = true;
        }
        // Positions are in descending order, so this picks the nearest server
        // at or below the key's hash: the mirror image of the clockwise walk
        // described above, and equally consistent.
        foreach ($this->serverList as $pos => $server) {
            if ($hash >= $pos) {
                return $server;
            }
        }
        // No server at or below the hash: wrap around to the first element
        return reset($this->serverList);
    }
}

The code above is only a simple example implementation of consistent hashing and does not take virtual nodes into account; interested readers can try adding them.
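One possible way to add virtual nodes is sketched below. It is not the author's implementation: the class name VirtualNodeRing, the default replica count of 64, the use of crc32() as the hash function, and the second IP address are all assumptions made for illustration. Virtual node names follow the "address#n" pattern described earlier.

```php
<?php
// A sketch of a ring with virtual nodes (assumed design, see the text above).
class VirtualNodeRing
{
    private $ring = [];      // ring position => physical server
    private $replicas;       // virtual nodes per server (the "replica count")
    private $sorted = false; // whether $ring is sorted by position

    public function __construct(int $replicas = 64)
    {
        $this->replicas = $replicas;
    }

    public function addServer(string $server): void
    {
        for ($i = 1; $i <= $this->replicas; $i++) {
            // Virtual node name = server name plus a numeric suffix.
            // A colliding position is simply overwritten in this sketch.
            $this->ring[crc32($server . "#" . $i)] = $server;
        }
        $this->sorted = false;
    }

    public function removeServer(string $server): void
    {
        for ($i = 1; $i <= $this->replicas; $i++) {
            $pos = crc32($server . "#" . $i);
            // Only delete positions this server actually owns.
            if (isset($this->ring[$pos]) && $this->ring[$pos] === $server) {
                unset($this->ring[$pos]);
            }
        }
    }

    // Returns the physical server for a key, or null if the ring is empty.
    public function lookup(string $key): ?string
    {
        if (empty($this->ring)) {
            return null;
        }
        if (!$this->sorted) {
            ksort($this->ring, SORT_NUMERIC);
            $this->sorted = true;
        }
        $hash = crc32($key);
        foreach ($this->ring as $pos => $server) {
            if ($pos >= $hash) {
                return $server; // first virtual node clockwise from the key
            }
        }
        return reset($this->ring); // wrap around
    }
}

$ring = new VirtualNodeRing(64);
$ring->addServer("202.168.14.241"); // cache A's address from the text
$ring->addServer("202.168.14.242"); // hypothetical second cache

$counts = [];
for ($i = 0; $i < 10000; $i++) {
    $server = $ring->lookup("object" . $i);
    $counts[$server] = ($counts[$server] ?? 0) + 1;
}
print_r($counts);
```

Because each position on the ring stores the physical server directly, the {object -> virtual node -> node} mapping collapses into a single lookup. Raising the replica count generally evens out the per-server counts, at the cost of a larger ring.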

This post draws on http://blog.csdn.net/sparkliang/article/details/5279393 and the book "PHP Core technology and best algorithm".
