A consistent hashing algorithm for distributed systems

Source: Internet
Author: User
Tags: crc32, modulus, nginx, server

In the development of large web sites, the word "distributed" comes up all the time. For example:

    • load balancing of cache servers such as memcache and Redis (distributed cache);
    • MySQL distributed clusters (distributed DB);
    • shared storage for large numbers of sessions (distributed file systems, session servers, and so on).

All of these rely on distributed thinking, and at the root of it lies an understanding of distributed algorithms. Forgive the preamble; the subject of this article is the consistent hashing algorithm, and we will discuss it in the context of load balancing a group of cache servers.

Defects of the traditional algorithm

When distributing data across servers, we have to consider three points: even distribution of the data, accurate lookup, and minimizing the impact of downtime.

The traditional algorithm usually hashes the data's key to a number, takes that number modulo the number of servers, and uses the result to select the server that stores the data. This achieves even distribution and accurate lookup, and the algorithm is simple, with a relatively small amount of computation per access (even when the data set is very large).
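
To make this concrete, here is a minimal sketch of modulus-based server selection (the server names, keys, and helper function are illustrative assumptions, not part of the original article):

<?php
// Minimal sketch of the traditional "hash, then modulus" server selection.
// Server names and keys are made up for illustration.
$servers = array('cache-a', 'cache-b', 'cache-c');

function pick_server($key, array $servers)
{
    // Map the key to an unsigned 32-bit number, then take it modulo
    // the number of servers to decide where the data lives.
    $hash = (int) sprintf('%u', crc32($key));
    return $servers[$hash % count($servers)];
}

echo pick_server('user:1001', $servers), "\n"; // the same key always picks the same server

The same key always hashes to the same index, which is what makes lookups accurate; the trouble only starts when count($servers) changes.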

However, it has a fatal drawback: a single server outage has a large impact. We can work out what happens when one server goes down:

    • Most of the original data can no longer be found: with one fewer server, the modulus changes from n to n-1 and the mapping is scrambled. If there were originally n servers, only roughly 1/n of the keys (those for which key mod n equals key mod (n-1)) can still be accurately found after the outage; see the sketch after this list.
    • Load imbalance leads to cascading failures: if the failed server is not dealt with promptly, its storage load piles up on the next server in sequence, that server is quickly overwhelmed in turn, and the whole server group can go down one after another.
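
A quick way to see the scale of the damage is to count, over a large range of keys, how many of them map to the same index under mod n and mod (n - 1). This small sketch (purely illustrative, not from the original article) prints a ratio of roughly 1/n:

<?php
// Count how many keys keep the same server index when the server count
// drops from $n to $n - 1 under plain modulus hashing.
$n = 10;
$total = 1000000;
$unchanged = 0;

for ($k = 0; $k < $total; $k++) {
    if ($k % $n === $k % ($n - 1)) {
        $unchanged++;
    }
}

// Prints roughly 0.1 for $n = 10, i.e. only about 1/n of the keys still
// resolve to the server that actually holds their data.
echo $unchanged / $total, "\n";
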
The idea behind the algorithm

The consistent hashing algorithm uses a hash function to map a large amount of data onto different storage targets while guaranteeing accurate lookup, and it also ensures that when one of the storage targets fails, the content it was responsible for is load-balanced across the remaining targets.

The implementation of the consistent hashing algorithm is not difficult to understand:

    1. Using a certain hash algorithm (a hash function, etc.), map a set of nodes for each server (the number is configurable) randomly onto the range 0 to 2^32. Because the positions are scattered randomly, the data ends up evenly distributed.
    2. Use the same algorithm to hash the key of the data to be stored, and pick its storage server from the server nodes on the ring. Because the same algorithm is used every time, the result is always the same, so the data can be found in the correct location.
    3. When looking the data up, hash the key again with the same algorithm and locate the server node responsible for it.
    4. If a server goes down, remove its nodes and let its data fall through to the next node on the ring. Because the node positions are random, that data is spread evenly over the other servers, which reduces the impact of the outage.

Note that this ring space is only a virtual space; it merely describes the range each server is responsible for and where the data lands. When storing, we still compute the landing point and put the data into the corresponding server so that it can be looked up later.

Algorithm implementation

We use PHP to implement a consistent hashing algorithm.

We mainly use the following functions:

int crc32(string $str)
Generates the 32-bit cyclic redundancy checksum of $str (a CRC32 polynomial). This is typically used to check the integrity of transmitted data.

string sprintf(string $format [, mixed $args [, mixed $... ]])
Returns a string produced according to the given format string.
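
As an aside (not from the original text): on 32-bit builds of PHP, crc32() can return a negative integer, which is why its result is passed through sprintf('%u', ...) to obtain the unsigned value, for example:

<?php
// crc32() may be negative on 32-bit builds of PHP; '%u' prints the
// unsigned 32-bit value, which is what we use as a position on the ring.
$key = 'server-1_0';                   // illustrative key
echo crc32($key), "\n";                // raw (possibly negative) integer
echo sprintf('%u', crc32($key)), "\n"; // unsigned value in [0, 2^32 - 1]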

The implementation is as follows:

class Consistance
{
    // Number of virtual nodes per server. The more nodes there are, the more
    // evenly the load is redistributed when a server goes down, at the cost
    // of a slightly more expensive lookup.
    protected $num = 24;

    // Node list of the current server group: ring position => host.
    protected $nodes = array();

    // Compute the hash of a piece of data to determine its position on the ring.
    public function make_hash($data)
    {
        return sprintf('%u', crc32($data));
    }

    // Walk the node list of the server group to find the server that should
    // store (or serve) the given data.
    public function set_loc($data)
    {
        $loc = $this->make_hash($data);
        foreach ($this->nodes as $key => $val) {
            if ($loc <= $key) {
                return $val;
            }
        }
        // Past the last node: wrap around to the first node on the ring.
        return reset($this->nodes);
    }

    // Add a server and insert its virtual nodes into the node list.
    public function add_host($host)
    {
        for ($i = 0; $i < $this->num; $i++) {
            $key = sprintf('%u', crc32($host . '_' . $i));
            $this->nodes[$key] = $host;
        }
        // Keep the nodes sorted so they are easy to search.
        ksort($this->nodes);
    }

    // Delete a server and remove its corresponding nodes from the node list.
    public function remove_host($host)
    {
        for ($i = 0; $i < $this->num; $i++) {
            $key = sprintf('%u', crc32($host . '_' . $i));
            unset($this->nodes[$key]);
        }
    }
}

We can test the class with a short script.
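
A minimal test sketch (the host names and keys here are made up; this is not the author's original test script):

<?php
// Assumes the Consistance class above has been loaded.
$ring = new Consistance();

// Register three cache hosts; each contributes 24 virtual nodes to the ring.
$ring->add_host('cache-a');
$ring->add_host('cache-b');
$ring->add_host('cache-c');

// The same key always resolves to the same host.
echo $ring->set_loc('user:1001'), "\n";
echo $ring->set_loc('user:1002'), "\n";

// Removing a host only remaps the keys that host was responsible for;
// the remaining keys still resolve to the same hosts as before.
$ring->remove_host('cache-b');
echo $ring->set_loc('user:1001'), "\n";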


Summary

That completes the implementation. The algorithm can be optimized further; for example, when there are many servers and many nodes per server, the node lookup itself can be sped up: since the node list is kept sorted, a binary search can replace the linear scan. Such refinements are left to the reader; a sketch follows below.
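
For illustration, a binary search over the sorted ring positions could replace the linear scan in set_loc() (a minimal sketch, assuming $keys holds the sorted, non-empty list of ring positions, e.g. array_keys($this->nodes) after ksort()):

<?php
// Find the first ring position that is >= $loc (the data's hash).
// $keys must be non-empty and sorted in ascending order.
function find_node($loc, array $keys)
{
    $lo = 0;
    $hi = count($keys) - 1;
    // If $loc lies beyond the last node, wrap around to the first one.
    if ($loc > $keys[$hi]) {
        return $keys[0];
    }
    while ($lo < $hi) {
        $mid = intdiv($lo + $hi, 2);
        if ($keys[$mid] < $loc) {
            $lo = $mid + 1;
        } else {
            $hi = $mid;
        }
    }
    return $keys[$lo];
}

The caller would then map the returned position back to its host via $this->nodes[$position].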

In addition, although NGINX has a consistent-hashing plug-in, memcache and Redis have corresponding client-side support, and MySQL middleware has corresponding integrations, understanding the consistent hashing algorithm itself is still very worthwhile. We can also apply it flexibly ourselves, for example to the distributed management of files.

If you find this post helpful, you can recommend or follow me; if you have any questions, leave a comment below and we can discuss them. Thank you.
