In Internet-scale distributed systems, many services are storage-related and face massive access volumes. The storage media cannot withstand direct access, so a cache layer is needed, and the load balancing algorithm for the cache cluster becomes an important topic. Here is a summary of the existing load balancing algorithms. BTW: although this is framed as a summary of cache load balancing algorithms, it is really a summary of load balancing algorithms in general, just applied to the cache scenario.

The main load balancing algorithms are: the static algorithm, the random algorithm, the round robin algorithm, the hash (modulo) algorithm, the CARP algorithm, and the consistent hash algorithm.

Static algorithm: the Stone Age of load balancing. A service is configured with multiple ip:port entries in backup mode: it always returns the first server in the group (as long as that server is available), and only falls back to the next available server when the first one goes down. In this scheme every machine holds the full data set. Queries normally all land on the first machine, so its cache hit rate is high; but when it fails and traffic lands on the second machine, the hit rate collapses, because that machine's cache is cold.
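A minimal sketch of the backup-mode selection, in Python; the server list, the health check, and the function names are illustrative assumptions, not something specified in the original post:

import socket

# Hypothetical server list; in backup mode the order is the priority order.
SERVERS = [("10.0.0.1", 11211), ("10.0.0.2", 11211), ("10.0.0.3", 11211)]

def is_available(host, port, timeout=0.2):
    # Crude health check: can we open a TCP connection at all?
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

def pick_static(servers):
    # Static/backup mode: always the first available server in the list.
    for host, port in servers:
        if is_available(host, port):
            return host, port
    raise RuntimeError("no backend server available")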
Random algorithm: the algorithm everyone on Earth knows. It works reasonably well for stateless services: just pick a machine at random.

idx = rand() % M

In practice, as with the static algorithm, every machine has to hold the full data set. The upside is that in theory every machine's cache hit rate should be about the same; the downside is that none of them is high. Why? Because the same request sometimes lands on machine A and sometimes on machine B. That duplicates cached data and wastes memory; since memory is limited, cache entries get evicted, and frequent eviction naturally keeps the hit rate low.
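A minimal sketch of random selection, again in Python with a placeholder server list:

import random

SERVERS = ["10.0.0.1:11211", "10.0.0.2:11211", "10.0.0.3:11211"]

def pick_random(servers):
    # Random algorithm: idx = rand() % M, every server with equal probability.
    idx = random.randrange(len(servers))
    return servers[idx]

Note that the same query key can hit a different server on every call, which is exactly why the cache hit rate suffers.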
Round robin algorithm: typical egalitarianism, "everyone gets a turn to be emperor" -- servers are selected in sequence.

idx = (idx + 1) % M

Again every machine keeps the full data set, and just like the random algorithm it fails for basically the same reason: the same request lands on different machines, so the cache hit rate stays low.

Hash algorithm: also called the modulo (remainder) algorithm. Hash the query key, take the remainder by the number of machines, and connect to the selected machine.

idx = hash(query_key) % M

The remainder calculation is simple and spreads the data well, but it has a weakness: when a server is added or removed, the cost of reorganizing the cache is significant. After a server is added, the remainders change, so the same key no longer maps to the server that holds its cached data, which hurts the cache hit rate badly.

CARP algorithm: strictly speaking CARP is not an algorithm but a protocol, the Cache Array Routing Protocol. For every server compute idx_key = hash(query_key + server_idx); the server_idx with the largest idx_key is the one to use. Suppose there are 3 backend servers, the request is identified by the string req = "ABCD", and the servers are identified by S1, S2, S3. Concatenating req + Sx and hashing gives one value per server:

(req = "ABCD" + S1) = K1
(req = "ABCD" + S2) = K2
(req = "ABCD" + S3) = K3
The hash can be computed with CRC or with MD5; the point is just to get a well-scattered hash value. Among K1, K2, K3 there must be a maximum, say K2, so the request req is sent to S2. Later, for the same request and the same server group, the calculation will again make K2 the maximum, so the request goes to the same server, achieving the effect of hash-based distribution.
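A minimal sketch of this highest-hash-wins selection, in Python with MD5 as the hash; the real CARP specification defines its own hash and load factors, so this only illustrates the idea:

import hashlib

SERVERS = ["S1", "S2", "S3"]  # illustrative server identifiers

def carp_like_pick(query_key, servers):
    # Hash query_key + server id for every server and take the largest value.
    def score(server):
        return int(hashlib.md5((query_key + server).encode()).hexdigest(), 16)
    return max(servers, key=score)

# The same key picks the same winner as long as the server set is unchanged.
print(carp_like_pick("ABCD", SERVERS))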
The clever part is that adding or deleting a server does not invalidate a large portion of the existing servers' caches. Suppose a new server S4 is added: the K values computed for S1, S2, S3 are exactly the same as before, while S4 yields a new value K4. If the hash used to compute K scatters well enough, then among the requests originally assigned to S1, S2, S3, in theory about 1/4 will get a K4 larger than their previous maximum. That 1/4 of the requests is transferred to S4, so the new server takes on 1/4 of the traffic while S1, S2, S3 keep 3/4 of what they originally had.

Consistent hash algorithm: first compute the hash value of each server (node) and place it on a circle (continuum) covering 0 ~ 2^32. Then hash the key of the data to be stored with the same method and map it onto the same circle. Starting from the point where the data maps, search clockwise and store the data on the first server found; if no server is found before passing 2^32, the data goes to the first server on the circle.

idx = first_max_server_idx(hash(query_key))

The basic idea of consistent hashing is to use the same hash function for both objects and cache machines, addressing resources and nodes in one unified address space. This is also the core of DHT algorithms and the theoretical cornerstone of P2P systems. Consistent hashing suits the case where each node holds only part of the data, unlike the previous algorithms where every node holds the full data set. The benefit is that each cache machine maps to an interval on the circle, and that interval contains a certain number of object hash values. If a cache machine is removed, the interval it covered is taken over by the adjacent cache machine and all other cache machines are unaffected; likewise, when a node is added, it only takes over part of one neighbor's interval.
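A minimal sketch of such a hash ring, in Python with MD5 truncated onto the 0 ~ 2^32 circle; the replicas parameter is the virtual-node refinement mentioned below, and all names are illustrative:

import bisect
import hashlib

def ring_hash(value):
    # Map a string onto the 0 ~ 2^32 circle.
    return int(hashlib.md5(value.encode()).hexdigest(), 16) % (2 ** 32)

class ConsistentHashRing:
    def __init__(self, servers, replicas=100):
        # Each server is placed on the circle `replicas` times (virtual nodes).
        self._points = sorted(
            (ring_hash("%s#%d" % (server, i)), server)
            for server in servers
            for i in range(replicas)
        )
        self._keys = [point for point, _ in self._points]

    def pick(self, query_key):
        # Walk clockwise from hash(query_key) to the first server point,
        # wrapping back to the first point if we pass 2^32.
        idx = bisect.bisect_left(self._keys, ring_hash(query_key)) % len(self._points)
        return self._points[idx][1]

ring = ConsistentHashRing(["S1", "S2", "S3"])
print(ring.pick("ABCD"))

Rebuilding the ring without one server only remaps the keys that used to land on that server's points; keys on the remaining servers keep their assignments.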
The consistent hashing algorithm minimizes the redistribution of keys across the list of service nodes when nodes are added or removed. A further improvement is the virtual service node technique: each service node gets multiple mapping points on the ring, which suppresses uneven distribution and further limits cache redistribution when service nodes come and go.

Reference:
http://icp.ircache.net/carp.txt
http://hi.baidu.com/fdwm_lx/blog/item/f670e73582c8411d90ef3950.html
http://blog.csdn.net/sparkliang/archive/2010/02/02/5279393.aspx

Reposted from: http://blogold.chinaunix.net/u/12592/showart.php?id=2537201