Have you ever wondered how Redis, Memcached, and similar systems achieve cluster load balancing?
In fact, they all schedule nodes using the consistent hashing algorithm.
Before we talk about consistent hashing, let's briefly review the plain modulo (remainder) hashing scheme:
hash(object) % N
- If a cache server goes down, every object mapped to that server becomes invalid, so we have to remove it from the pool. With N-1 cache servers left, the mapping formula becomes hash(object) % (N-1);
- If QPS grows and we need to add one more server, there are now N+1 servers and the mapping formula becomes hash(object) % (N+1).
In both cases almost every key now maps to a different server than before, so nearly all of the cached data has to be migrated.
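To make that cost concrete, here is a small self-contained sketch (my own illustration, not from the original post) that counts how many of 10,000 keys map to a different server when one of four servers is removed:

```go
package main

import (
	"fmt"
	"hash/crc32"
)

func main() {
	const keys = 10000
	moved := 0
	for i := 0; i < keys; i++ {
		h := crc32.ChecksumIEEE([]byte(fmt.Sprintf("object-%d", i)))
		// Compare the mapping with 4 servers against the mapping
		// after one server goes down, leaving 3.
		if h%4 != h%3 {
			moved++
		}
	}
	// A key keeps its server only when h%4 == h%3, which holds for
	// about a quarter of keys; roughly 75% get remapped.
	fmt.Printf("%d of %d keys remapped (%.1f%%)\n", moved, keys, float64(moved)*100/keys)
}
```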
Consistent hash algorithm
Consistent hashing effectively solves the problem the plain modulo algorithm above suffers from, where a node change invalidates essentially the whole cache. The ring can be represented by a structure like this:
```go
type Consistent struct {
	numOfVirtualNode int               // number of virtual nodes per real node
	hashSortedNodes  []uint32          // sorted hashes of all virtual nodes
	circle           map[uint32]string // virtual node hash -> real node name
	nodes            map[string]bool   // set of real nodes
}
```
Simply put, consistent hashing organizes the entire hash value space into a virtual ring. Suppose, for example, that the value space of the hash function h is 0 to 2^32-1 (that is, the hash value is a 32-bit unsigned integer). The whole hash space then looks like this:
[Figure: the hash ring, spanning 0 to 2^32-1]
The next step is to compute each machine's position on the ring with the same hash algorithm, using the server's IP address or hostname as the key, and place the machines in clockwise order:
```go
// Here I chose CRC32; pick whichever hash fits your situation.
func hashKey(host string) uint32 {
	scratch := []byte(host)
	return crc32.ChecksumIEEE(scratch)
}
```
Here we assume that our three Memcached nodes hash to the following positions:
[Figure: the three Memcached nodes placed on the ring]
```go
// add the nodes
c.Add("Memcache_server01")
c.Add("Memcache_server02")
c.Add("Memcache_server03")

func (c *Consistent) Add(node string) error {
	if _, ok := c.nodes[node]; ok {
		return errors.New("host already existed")
	}
	c.nodes[node] = true
	// add virtual nodes
	for i := 0; i < c.numOfVirtualNode; i++ {
		virtualKey := getVirtualKey(i, node)
		c.circle[virtualKey] = node
		c.hashSortedNodes = append(c.hashSortedNodes, virtualKey)
	}
	sort.Slice(c.hashSortedNodes, func(i, j int) bool {
		return c.hashSortedNodes[i] < c.hashSortedNodes[j]
	})
	return nil
}
```
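The getVirtualKey helper used above is not shown in this excerpt; a minimal sketch, assuming each virtual node's ring position is derived by hashing the replica index together with the node name (the naming scheme here is my assumption):

```go
// getVirtualKey derives the ring position of the i-th virtual node for a
// real node. Hypothetical implementation; requires the "strconv" import.
func getVirtualKey(index int, node string) uint32 {
	return hashKey(strconv.Itoa(index) + "#" + node)
}
```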
Next, hash the data with the same algorithm to determine its position on the hash ring.
If we have four pieces of data, A, B, C, and D, their positions after hashing are as follows:
[Figure: data A, B, C, and D hashed onto the ring]
According to the consistent hashing algorithm, data A is bound to Server01, D to Server02, and B and C to Server03: each key simply walks clockwise around the ring and binds to the first server node it meets.
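The lookup itself is not shown above, so here is a sketch of how a Get method could implement this clockwise rule with a binary search over the sorted virtual-node hashes (the method name and error text are my assumptions; see the full-code link at the end for the original):

```go
// Get returns the node responsible for key: the first virtual node
// clockwise from the key's hash, wrapping around the ring if needed.
func (c *Consistent) Get(key string) (string, error) {
	if len(c.hashSortedNodes) == 0 {
		return "", errors.New("no hosts added")
	}
	hash := hashKey(key)
	// Find the first virtual node whose hash is >= the key's hash.
	i := sort.Search(len(c.hashSortedNodes), func(i int) bool {
		return c.hashSortedNodes[i] >= hash
	})
	// Past the last virtual node: wrap around to the start of the ring.
	if i == len(c.hashSortedNodes) {
		i = 0
	}
	return c.circle[c.hashSortedNodes[i]], nil
}
```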
This hash-ring scheduling gives high fault tolerance and scalability:
Suppose Server03 goes down:
[Figure: the ring after Server03 goes down]
You can see that A and D are not affected; only B and C are relocated, to Server01. In general, in a consistent hashing algorithm, if a server becomes unavailable, the only data affected is the data between that server and the previous server on the ring (that is, up to the first server encountered moving counterclockwise); everything else is untouched.
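Removing a node is the mirror image of Add. The excerpt above does not include it, so the following is a sketch under the same struct assumptions:

```go
// Remove deletes a node and all of its virtual nodes from the ring, so
// keys previously bound to it fall through to the next node clockwise.
func (c *Consistent) Remove(node string) error {
	if _, ok := c.nodes[node]; !ok {
		return errors.New("host not found")
	}
	delete(c.nodes, node)
	for i := 0; i < c.numOfVirtualNode; i++ {
		delete(c.circle, getVirtualKey(i, node))
	}
	// Rebuild the sorted slice from the remaining virtual nodes.
	c.hashSortedNodes = c.hashSortedNodes[:0]
	for k := range c.circle {
		c.hashSortedNodes = append(c.hashSortedNodes, k)
	}
	sort.Slice(c.hashSortedNodes, func(i, j int) bool {
		return c.hashSortedNodes[i] < c.hashSortedNodes[j]
	})
	return nil
}
```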
Consider another scenario: we add a server, Memcache_server04, to the system:
[Figure: the ring after adding Memcache_server04]
At this point A, D, and C are unaffected; only B needs to be relocated to the new Server04. In general, in a consistent hashing algorithm, if you add a server, the only data affected is the data between the new server and the previous server on the ring (that is, up to the first server encountered moving counterclockwise); everything else is untouched.
To sum up, when nodes are added or removed, consistent hashing only repositions a small subset of the data on the ring, which gives it good fault tolerance and scalability.
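Putting the pieces together, a minimal usage sketch (this assumes the Get method sketched earlier and initializes the struct by hand, since the excerpt shows no constructor; the virtual-node count of 20 is an arbitrary choice):

```go
func main() {
	c := &Consistent{
		numOfVirtualNode: 20, // replicas per real node; an assumed value
		circle:           make(map[uint32]string),
		nodes:            make(map[string]bool),
	}
	c.Add("Memcache_server01")
	c.Add("Memcache_server02")
	c.Add("Memcache_server03")

	for _, key := range []string{"A", "B", "C", "D"} {
		node, _ := c.Get(key)
		fmt.Printf("data %s -> %s\n", key, node)
	}
}
```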
Full code
http://www.nextblockchain.top/posts/32