LVS Cluster Load Scheduling Algorithms


Source: http://tech.ccidnet.com/art/302/20050609/265435_1.html

Note: LVS (Linux Virtual Server) is a simple server load balancer for Linux.

 

1. Connection Scheduling Algorithms in the Kernel

For connection scheduling in the kernel, IPVS implements the following eight scheduling algorithms:

Round-robin scheduling

Weighted round-robin scheduling

Least-connection scheduling

Weighted least-connection scheduling

Locality-based least-connection scheduling

Locality-based least-connection with replication scheduling

Destination hashing scheduling

Source hashing scheduling

 

1.1. Round-Robin Scheduling

The round-robin scheduling algorithm distributes requests to the servers in turn: on each scheduling decision it computes
i = (i + 1) mod n and selects the i-th server. The advantage of the algorithm is its simplicity; it does not need to record the current state of any connection, so it is a stateless scheduling algorithm.

In the system implementation, an additional condition is introduced: when a server's weight is zero, the server is considered unavailable and is not scheduled. This makes it possible to take a server out of service (for example, to isolate a failed server or to perform system maintenance) and keeps the behavior consistent with the other weighted algorithms, so the algorithm must be modified accordingly. The algorithm flow is as follows:

Assume there is a group of servers S = {S0, S1, ..., Sn-1}, an indicator variable i records the server selected last time, and W(Si) denotes the weight of server Si. Variable i is initialized to n-1, where n > 0.

 

j = i;
do {
    j = (j + 1) mod n;
    if (W(Sj) > 0) {
        i = j;
        return Si;
    }
} while (j != i);
return NULL;

This scheduling algorithm assumes that all servers have the same processing capacity and ignores each server's current connection count and response speed. It is relatively simple, but it is not suitable when the servers differ in processing capacity, and when request service times vary widely, round-robin scheduling easily leads to load imbalance among the servers.

Although round-robin DNS also resolves a domain name to multiple IP addresses in a round-robin fashion, the scheduling granularity of the DNS approach is per domain name server, and the name servers' caching of resolved addresses prevents the rotation from taking effect, which can cause severe load imbalance among the real servers. The granularity of the IPVS scheduling algorithms, by contrast, is per connection: different connections from the same user can be scheduled to different servers, so this fine-grained scheduling is much better than DNS-based round-robin.
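As a concrete illustration, the selection above can be written as a small self-contained C program; the rr_select() helper, the weights, and the four-server setup are illustrative choices, not part of the IPVS source.

#include <stdio.h>

#define N 4

/* Illustrative weights; a weight of 0 marks a server as out of service. */
static int weight[N] = {1, 1, 0, 1};
static int last = N - 1;   /* index of the server selected last time */

/* Return the index of the next server, or -1 if none is available. */
static int rr_select(void)
{
    int j = last;
    do {
        j = (j + 1) % N;
        if (weight[j] > 0) {
            last = j;
            return j;
        }
    } while (j != last);
    return -1;
}

int main(void)
{
    for (int k = 0; k < 8; k++)
        printf("request %d -> server S%d\n", k, rr_select());
    return 0;
}

Running this prints the servers in rotation while silently skipping S2, whose weight is zero.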

 

1.2. Weighted Round-Robin Scheduling

The weighted round-robin scheduling algorithm addresses the case where servers have different processing capacities. Each server is assigned a weight that represents its processing capacity; the default weight is 1. If server A has weight 1 and server B has weight 2, server B's processing capacity is assumed to be twice that of server A. The weighted round-robin algorithm distributes requests to the servers in turn according to their weights: servers with higher weights receive connections first, servers with higher weights handle more connections than servers with lower weights, and servers with equal weights handle the same number of connections. The weighted round-robin scheduling flow is as follows:

Assume there is a group of servers S = {S0, S1, ..., Sn-1}, W(Si) denotes the weight of server Si, an indicator variable i records the server selected last time, the variable cw holds the current scheduling weight, max(S) is the maximum weight of all servers in S, and gcd(S) is the greatest common divisor of the weights of all servers in S. Variable i is initialized to -1 and cw is initialized to zero.

while (true) {
    i = (i + 1) mod n;
    if (i == 0) {
        cw = cw - gcd(S);
        if (cw <= 0) {
            cw = max(S);
            if (cw == 0)   // max(S) == 0, no available server
                return NULL;
        }
    }
    if (W(Si) >= cw)
        return Si;
}

(Note: when scheduling starts, the server(s) with the highest weight are returned first (cw = max(S)). After a full traversal, the current weight is lowered by one step (cw = cw - gcd(S)) and the traversal restarts from i = 0, so now the servers with the highest and the next-highest weights are returned, and so on. In the end each server is selected with a frequency proportional to its weight. For example, with W[Si] = {3, 4, 2, 1} the generated index sequence is 1, 0, 1, 0, 1, 2, 0, 1, 2, 3, which then repeats; each pass of the traversal returns the servers whose weight is at least the current cw.)

For example, if three servers A, B, and C have weights 4, 3, and 2 respectively, one scheduling cycle (mod the sum of the W(Si)) produces the sequence AABABCABC (the sum of the weights is 9, so the cycle length is 9). The weighted round-robin algorithm is relatively simple and efficient, but when request service times vary widely, weighted round-robin on its own can still cause load imbalance among the servers.
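That cycle can be verified with a short C sketch of the weighted round-robin loop above (the weights {4, 3, 2} and the gcd() helper are illustrative):

#include <stdio.h>

#define N 3

static int weight[N] = {4, 3, 2};   /* weights of servers A, B, C */

static int gcd(int a, int b) { return b ? gcd(b, a % b) : a; }

int main(void)
{
    int g = weight[0], max = weight[0];
    for (int k = 1; k < N; k++) {
        g = gcd(g, weight[k]);
        if (weight[k] > max)
            max = weight[k];
    }

    int i = -1, cw = 0;
    /* Generate one full scheduling cycle: sum of the weights = 9 picks. */
    for (int picks = 0; picks < 9; ) {
        i = (i + 1) % N;
        if (i == 0) {
            cw -= g;
            if (cw <= 0)
                cw = max;
        }
        if (weight[i] >= cw) {
            putchar('A' + i);   /* prints AABABCABC */
            picks++;
        }
    }
    putchar('\n');
    return 0;
}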

From the algorithm flow above, when a server's weight is zero, that server is not scheduled; when the weights of all servers are zero, that is, W(Si) = 0 for every i, no server is available, the algorithm returns NULL, and all new connections are dropped. Weighted round-robin scheduling does not need to record the state of the current connections either, so it is also a stateless scheduling algorithm.

 

1.3. Least-Connection Scheduling

The least-connection scheduling algorithm assigns each new connection request to the server with the smallest number of established connections. Least-connection scheduling is a dynamic scheduling algorithm: it estimates a server's load by the number of active connections on it. The scheduler records the number of established connections on each server; when a request is scheduled to a server, the server's connection count is incremented by one, and when the connection finishes or times out, the count is decremented by one.

In the system implementation, when a server's weight is zero, the server is treated as unavailable and is not scheduled. The algorithm flow is as follows:

Assume there is a group of servers S = {S0, S1, ..., Sn-1}, W(Si) denotes the weight of server Si, and C(Si) denotes the current number of connections on server Si.

for (m = 0; m < n; m++) {
    if (W(Sm) > 0) {
        for (i = m + 1; i < n; i++) {
            if (W(Si) <= 0)
                continue;
            if (C(Si) < C(Sm))
                m = i;
        }
        return Sm;
    }
}
return NULL;

(Note: on every scheduling decision the algorithm scans the servers from index 0, finds the available server with the smallest number of connections, and returns it.)

When the servers have the same processing capacity, the least-connection scheduling algorithm distributes requests with widely varying loads smoothly across the servers, so that the long-running requests do not all end up on the same server. However, when the servers differ in processing capacity, the algorithm is not ideal: a TCP connection enters the TIME_WAIT state after its request has been processed, the TIME_WAIT state typically lasts about 2 minutes, and during that time the connection still occupies server resources. As a result, a high-performance server may have finished the connections it received and be left only with connections sitting in TIME_WAIT, while a low-performance server is still busy processing its connections and nevertheless keeps receiving new connection requests.
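For illustration, here is a minimal C sketch of the least-connection pick, written as a single pass rather than the two nested loops above; the weight[] and conns[] arrays are made-up example data:

#include <stdio.h>

#define N 4

static int weight[N] = {1, 1, 0, 1};   /* weight 0 = server out of service */
static int conns[N]  = {5, 2, 0, 7};   /* current connection counts */

/* Return the index of the available server with the fewest connections,
 * or -1 if no server has a positive weight. */
static int lc_select(void)
{
    int m = -1;
    for (int i = 0; i < N; i++) {
        if (weight[i] <= 0)
            continue;
        if (m < 0 || conns[i] < conns[m])
            m = i;
    }
    return m;
}

int main(void)
{
    int m = lc_select();
    if (m >= 0) {
        printf("new connection -> server S%d\n", m);   /* S1 in this example */
        conns[m]++;   /* the scheduler increments the count on dispatch */
    }
    return 0;
}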

 

1.4. Weighted Least-Connection Scheduling

The weighted least-connection scheduling algorithm is a superset of least-connection scheduling. Each server is assigned a weight that represents its processing capacity; the default weight is 1, and the system administrator can change a server's weight dynamically. When dispatching new connections, weighted least-connection scheduling tries to keep each server's number of established connections proportional to its weight. The weighted least-connection scheduling flow is as follows:

Assume there is a group of servers S = {S0, S1, ..., Sn-1}, W(Si) denotes the weight of server Si, and C(Si) denotes the current number of connections on server Si. The total number of current connections on all servers is CSUM = Σ C(Si) (i = 0, 1, ..., n-1). The new connection request is sent to server Sm,

if and only if server Sm satisfies the following condition:

(C(Sm) / CSUM) / W(Sm) = min { (C(Si) / CSUM) / W(Si) }   (i = 0, 1, ..., n-1), where W(Si) is not zero

(Note: this amounts to choosing the server with the best ratio of load to capacity.)

Because CSUM is a constant during this round of selection, the condition can be simplified to:

C(Sm) / W(Sm) = min { C(Si) / W(Si) }   (i = 0, 1, ..., n-1), where W(Si) is not zero

Division costs more CPU cycles than multiplication, and floating-point division is not allowed inside the Linux kernel. Since server weights are greater than zero, the comparison C(Sm)/W(Sm) > C(Si)/W(Si) can be optimized to C(Sm)*W(Si) > C(Si)*W(Sm); the algorithm must also ensure that a server whose weight is zero is never scheduled. The algorithm therefore only needs to execute the following flow.

for (m = 0; m < n; m++) {
    if (W(Sm) > 0) {
        for (i = m + 1; i < n; i++) {
            if (C(Sm) * W(Si) > C(Si) * W(Sm))
                m = i;
        }
        return Sm;
    }
}
return NULL;

1.5. Locality-Based Least-Connection Scheduling

The locality-based least-connection (LBLC) scheduling algorithm balances load based on the destination IP address of the request. It is currently used mainly in cache cluster systems, where the destination IP addresses of client requests vary. Assuming that any backend server can handle any request, the goal of the algorithm is to schedule requests for the same destination IP address to the same server while the server load stays roughly balanced, so as to improve locality of access and the main-memory cache hit rate on each server, and thereby the processing capacity of the whole cluster system.

The LBLC algorithm first looks up the server most recently used for the request's destination IP address. If that server is available and not overloaded, the request is sent to it. If no such server exists, or if that server is overloaded (its connection count exceeds its weight) while some server is working at less than half of its capacity, a server is chosen by the "least connection" principle and the request is sent to that server. The detailed flow of the algorithm is as follows:

Assume there is a group of servers S = {S0, S1, ..., Sn-1}, W(Si) denotes the weight of server Si, and C(Si) denotes the current number of connections on server Si. ServerNode[dest_ip] is an association variable that maps a destination IP address to its server node; it is usually implemented with a hash table. WLC(S) denotes the weighted least-connection server in the set S, i.e., the result of the weighted least-connection scheduling described above. Now is the current system time.

if( ServerNode[dest_ip] is NULL ) then {    // no node recorded for this destination IP
    n = WLC(S);                             // weighted least-connection selection
    if( n is NULL ) then return NULL;
    ServerNode[dest_ip].server = n;
} else {
    n = ServerNode[dest_ip].server;
    if( (n is dead) OR ( C(n) > W(n) AND there is a node m with C(m) < W(m)/2 ) ) then {
        n = WLC(S);
        if( n is NULL ) then return NULL;
        ServerNode[dest_ip].server = n;
    }
}
ServerNode[dest_ip].lastuse = Now;
return n;

In addition, periodic garbage collection must be performed on the ServerNode[dest_ip] association variable to reclaim entries for expired destination IP addresses. An expired entry is one whose last-use time, subtracted from the current time (implemented with the system clock tick count, jiffies), exceeds a preset expiration time; the default expiration time is 24 hours.
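The garbage-collection code itself is not shown here, so the following C sketch only illustrates the expiration test described above; the dest_entry structure, the HZ value, and the tick arithmetic are assumptions made for the example:

#include <stdbool.h>
#include <stdio.h>

#define HZ            100UL                   /* illustrative ticks per second */
#define EXPIRE_TICKS  (24UL * 60 * 60 * HZ)   /* default expiration: 24 hours */

/* Hypothetical per-destination entry, as maintained by the scheduler. */
struct dest_entry {
    unsigned long ip;        /* destination IP address */
    int           server;    /* index of the server it maps to */
    unsigned long lastuse;   /* tick count when the entry was last used */
};

/* Return true if the entry has not been used for longer than the
 * expiration time and should be reclaimed by garbage collection. */
static bool entry_expired(const struct dest_entry *e, unsigned long now)
{
    return now - e->lastuse > EXPIRE_TICKS;
}

int main(void)
{
    struct dest_entry e = { .ip = 0x0A000001UL, .server = 2, .lastuse = 1000 };
    unsigned long now = 1000 + EXPIRE_TICKS + 1;
    printf("expired: %s\n", entry_expired(&e, now) ? "yes" : "no");
    return 0;
}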

 

1.6. Locality-Based Least-Connection Scheduling with Replication

The locality-based least-connection with replication (LBLCR) scheduling algorithm also balances load based on the destination IP address and is mainly used in cache cluster systems. It differs from the LBLC algorithm in that it maintains a mapping from a destination IP address to a group of servers, whereas LBLC maintains a mapping to a single server. For requests to a "hot" site, one cache server may be too busy to handle them all. In that situation the LBLC algorithm would pick a cache server from the whole cluster by the "least connection" principle and map the hot site to it; that server soon becomes overloaded as well, and the process repeats to pick yet another cache server, so the image of the hot site may eventually appear on many or all of the cache servers. The LBLCR algorithm instead maps the hot site to a set of cache servers: when the request load of the hot site increases, cache servers are added to the set to handle the growing load, and when the load decreases, the number of servers in the set is reduced. In this way the image of the hot site is unlikely to appear on all cache servers, which improves the utilization efficiency of the cache cluster system.

The LBLCR algorithm first looks up the server group corresponding to the request's destination IP address and selects a server from that group by the "least connection" principle. If the server is not overloaded, the request is sent to it. If it is overloaded, a server is selected from the whole cluster by the "least connection" principle, added to the server group, and the request is sent to it. In addition, if the server group has not been modified for some period of time, the busiest server is removed from the group to reduce the degree of replication. The LBLCR scheduling flow is as follows:

Assume there is a group of servers S = {S0, S1, ..., Sn-1}, W(Si) denotes the weight of server Si, and C(Si) denotes the current number of connections on server Si. ServerSet[dest_ip] is an association variable that maps a destination IP address to its server set; it is usually implemented with a hash table. WLC(S) denotes the weighted least-connection server in the set S, i.e., the weighted least-connection scheduling described above, and WGC(S) denotes the weighted greatest-connection server in the set S. Now is the current system time, lastmod is the last modification time of the set, and T is the configured interval after which the set is adjusted.

if( ServerSet[dest_ip] is NULL ) then {
    n = WLC(S);
    if( n is NULL ) then return NULL;
    add n into ServerSet[dest_ip];
} else {
    n = WLC(ServerSet[dest_ip]);
    if( (n is NULL) OR (n is dead) OR ( C(n) > W(n) AND there is a node m with C(m) < W(m)/2 ) ) then {
        n = WLC(S);
        if( n is NULL ) then return NULL;
        add n into ServerSet[dest_ip];
    } else if( |ServerSet[dest_ip]| > 1 AND Now - ServerSet[dest_ip].lastmod > T ) then {
        m = WGC(ServerSet[dest_ip]);
        remove m from ServerSet[dest_ip];
    }
}
ServerSet[dest_ip].lastuse = Now;
if( ServerSet[dest_ip] changed ) then
    ServerSet[dest_ip].lastmod = Now;
return n;

In addition, the ServerSet[dest_ip] association variable also needs periodic garbage collection to reclaim entries for expired destination IP addresses. An expired entry is one whose lastuse time, subtracted from the current time (implemented with the system clock tick count, jiffies), exceeds the configured expiration time; the default expiration time is 24 hours.

 

1.7. Destination Address Hashing Scheduling

The destination hashing scheduling algorithm also balances load based on the destination IP address, but it is a static mapping algorithm: it maps a destination address to a server with a hash function.

The destination address hashing algorithm uses the destination IP address of the request as a hash key to look up the corresponding server in a statically assigned hash table. If that server is available and not overloaded, the request is sent to it; otherwise NULL is returned. The flow of the algorithm is as follows:

Assume there is a group of servers S = {S0, S1, ..., Sn-1}, W(Si) denotes the weight of server Si, and C(Si) denotes the current number of connections on server Si. ServerNode[] is a hash table with 256 buckets; the number of servers is usually much smaller than 256, and the table size can of course be adjusted. At initialization, all servers are placed into ServerNode[] in order, cyclically. A server is considered overloaded if its connection count is greater than twice its weight.

n = ServerNode[ hashkey(dest_ip) ];
if( (n is dead) OR (W(n) == 0) OR ( C(n) > 2*W(n) ) ) then
    return NULL;
return n;

The implementation uses a multiplicative hash function based on a prime number; multiplying by the prime spreads the hash key values as evenly as possible. The prime-multiplication hash function is as follows:

static inline unsigned hashkey(unsigned int dest_ip)
{
    return (dest_ip * 2654435761UL) & HASH_TAB_MASK;
}

Here 2654435761UL is a prime between 2 and 2^32 (4294967296) that lies close to the golden-ratio point of 2^32:

(sqrt(5) - 1) / 2 = 0.618033989

2654435761 / 4294967296 = 0.618033987
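A quick C check of this ratio and of the hash function above, assuming HASH_TAB_MASK is 255 for the 256-bucket table described earlier:

#include <stdio.h>
#include <math.h>

#define HASH_TAB_MASK 255U   /* 256 buckets, as described above (assumed value) */

static inline unsigned hashkey(unsigned int dest_ip)
{
    return (dest_ip * 2654435761UL) & HASH_TAB_MASK;
}

int main(void)
{
    /* The multiplier sits roughly at the golden-ratio point of 2^32. */
    printf("(sqrt(5)-1)/2     = %.9f\n", (sqrt(5.0) - 1.0) / 2.0);
    printf("2654435761 / 2^32 = %.9f\n", 2654435761.0 / 4294967296.0);

    /* Consecutive destination IPs map to different buckets. */
    for (unsigned int ip = 0xC0A80001; ip <= 0xC0A80004; ip++)   /* 192.168.0.1..4 */
        printf("hashkey(0x%08X) = %u\n", ip, hashkey(ip));
    return 0;
}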

 

1.8. Source Address Hashing Scheduling

The source address hashing scheduling algorithm is the counterpart of destination address hashing: it uses the source IP address of the request as the hash key to look up the corresponding server in the statically assigned hash table. If that server is available and not overloaded, the request is sent to it; otherwise NULL is returned. It uses the same hash function as destination address hashing, and its flow is essentially the same except that the request's destination IP address is replaced by its source IP address, so it is not repeated here.

In practice, source address hashing and destination address hashing can be used together on a firewall cluster to ensure that the whole system has a unique entry and exit point.
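For completeness, a minimal sketch of the source hashing lookup, reusing the hash function from section 1.7; the server structure, the table setup, and the sh_select() name are illustrative assumptions, not IPVS code:

#include <stdio.h>

#define HASH_TAB_MASK 255U

/* Hypothetical server entry in the 256-bucket static table. */
struct server {
    int alive;    /* 0 = dead */
    int weight;   /* 0 = not to be scheduled */
    int conns;    /* current connection count */
};

static struct server server_table[HASH_TAB_MASK + 1];

static inline unsigned hashkey(unsigned int ip)
{
    return (ip * 2654435761UL) & HASH_TAB_MASK;
}

/* Source hashing: identical to destination hashing, but keyed on the
 * source IP of the request. Returns NULL if the server is unusable. */
static struct server *sh_select(unsigned int src_ip)
{
    struct server *n = &server_table[hashkey(src_ip)];
    if (!n->alive || n->weight == 0 || n->conns > 2 * n->weight)
        return NULL;
    return n;
}

int main(void)
{
    /* Mark every bucket as pointing at a usable server for the demo. */
    for (unsigned i = 0; i <= HASH_TAB_MASK; i++)
        server_table[i] = (struct server){ .alive = 1, .weight = 1, .conns = 0 };

    unsigned int src_ip = 0xC0A80001;   /* 192.168.0.1 */
    struct server *n = sh_select(src_ip);
    printf("source 0x%08X -> bucket %u (%s)\n",
           src_ip, hashkey(src_ip), n ? "usable" : "unusable");
    return 0;
}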

2. Dynamic Feedback Load Balancing Algorithm

The dynamic feedback load balancing algorithm takes the servers' real-time load and responsiveness into account and continually adjusts the proportion of requests each server handles, so that servers that are already overloaded do not keep receiving large numbers of requests; this improves the throughput of the whole system. In this algorithm's working environment, a monitor daemon runs on the load scheduler and monitors and collects the load information of each server. The monitor daemon computes a composite load value from several load metrics, then computes a new set of weights from each server's composite load value and its current weight. If the difference between a new weight and the current weight is greater than a configured threshold, the monitor daemon sets the server's weight in the IPVS scheduler in the kernel. For connection scheduling in the kernel, the weighted round-robin or the weighted least-connection algorithm is generally used.

 

 

 

2.1. Connection Scheduling

When a client accesses a service over a TCP connection, the time required and the computing resources consumed by the server vary widely and depend on many factors, for example the type of service, the current network bandwidth, and the current utilization of server resources. Some heavyweight requests need compute-intensive queries, database accesses, and long response data streams, while lighter requests may only need to read a small HTML page or perform a simple calculation.

Differences in request processing time can skew server utilization, that is, unbalance the server load. For example, suppose a web page consists of four files A, B, C, and D, where D is a large image file, and the browser opens four connections to fetch these files. When many users access the page simultaneously, the most extreme case is that all requests for file D are sent to the same server. Some servers are then overloaded while others are essentially idle; the busy servers build up long request queues and still keep receiving new requests, which in turn makes clients wait a long time and perceive the system's quality of service as poor.

2.1.1. Simple Connection Scheduling

Simple connection scheduling can lead to server skew. In the example above, if round-robin scheduling is used and the cluster has four servers, one server will always receive the requests for file D. Such a scheduling policy results in poor utilization of system resources: some resources are exhausted, making clients wait a long time, while other resources sit idle.

2.1.2. Characteristics of Actual TCP/IP Traffic

Reference [1] shows that network traffic arrives in waves: long periods of low traffic are followed by bursts of heavy traffic, then low traffic again, repeating like waves. References [2, 3, 4, 5] reveal the self-similar nature of network traffic on WANs and LANs, and of WWW access streams in particular. This calls for a dynamic feedback mechanism that uses the state of the server group to cope with the self-similarity of the access stream.

2.1.3. Dynamic Feedback Load Balancing

TCP/IP traffic is generally made up of many short transactions and some long transactions, and the work of the long transactions accounts for a large share of the total workload. A load balancing algorithm should therefore prevent long-transaction requests from piling up on a few machines and should instead split such bursts of work into a relatively even distribution across the servers.

We propose a dynamic-feedback load balancing mechanism that controls the dispatch of new connections and thereby the load on each server. In the kernel, the IPVS scheduler uses, for example, the weighted round-robin algorithm to schedule new connections; in user space on the load scheduler, a monitor daemon runs. The monitor daemon periodically monitors and collects the load information of each server and computes a composite load value from several metrics. From each server's composite load value and its current weight, the monitor daemon computes a new weight. If the composite load value indicates that the server is busy, the new weight is lower than the current one, so fewer requests are assigned to that server; if the composite load value indicates that the server is under-utilized, the new weight is higher than the current one, so more requests are assigned to it. If the difference between the new weight and the current weight exceeds a configured threshold, the monitor daemon sets the server's new weight in the IPVS scheduler in the kernel. After a certain interval (for example, 2 seconds), the monitor daemon queries the state of each server again and adjusts the weights accordingly. This is a negative feedback mechanism that keeps the servers at a good level of utilization.
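The text does not give the exact formula the monitor daemon uses to compute new weights, so the following C sketch only mirrors the behavior described above under explicit assumptions: a hypothetical composite load per server in the range 0 to 1, a new weight that decreases as that load increases, and a threshold check before the kernel weight is updated (a real daemon would push the weight into IPVS, e.g. through the ipvsadm/libipvs interface).

#include <stdio.h>
#include <stdlib.h>
#include <math.h>

#define N          3
#define THRESHOLD  1      /* update the kernel only if the weight moves this much */
#define BASE       10     /* nominal weight of an idle server (assumption) */

static int    weight[N] = {10, 10, 10};     /* current weights held by IPVS */
static double load[N]   = {0.2, 0.9, 0.5};  /* hypothetical composite load, 0..1 */

/* Called periodically (e.g. every 2 seconds) by the monitor daemon. */
static void adjust_weights(void)
{
    for (int i = 0; i < N; i++) {
        if (weight[i] == 0)          /* quiesced server: never touch it */
            continue;
        /* Busy server -> lower weight, idle server -> higher weight. */
        int new_w = (int)lround(BASE * (1.0 - load[i]));
        if (new_w < 1)
            new_w = 1;
        if (abs(new_w - weight[i]) > THRESHOLD) {
            /* Here the real daemon would set the new weight in the kernel. */
            printf("server %d: weight %d -> %d\n", i, weight[i], new_w);
            weight[i] = new_w;
        }
    }
}

int main(void)
{
    adjust_weights();
    return 0;
}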

In the weighted round-robin scheduling algorithm, when a server's weight is zero, connections that are already established continue to be served by that server, but no new connections are assigned to it. A system administrator can set a server's weight to zero to quiesce the server: once all existing connections have completed, the server can be taken out of the cluster for maintenance. Maintenance work such as hardware upgrades and software updates is indispensable, so the ability to quiesce a server via a zero weight is important, and the dynamic feedback mechanism must preserve it: when a server's weight is zero, we do not adjust that server's weight.

 

References:

[1] William Stallings, "Viewpoint: Self-similarity upsets data traffic assumptions," IEEE Spectrum, January 1997.
[2] Kihong Park, Gitae Kim, and Mark Crovella, "On the effect of traffic self-similarity on network performance," in Proceedings of the 1997 SPIE International Conference on Performance and Control of Network Systems, 1997.
[3] Nicolas D. Georganas, "Self-similar ('fractal') traffic in ATM networks," in Proceedings of the 2nd International Workshop on Advanced Teleservices and High-Speed Communication Architectures (IWACA '94), pages 1-7, Heidelberg, Germany, September 1994.
[4] Mark Crovella and Azer Bestavros, "Explaining World Wide Web traffic self-similarity," Technical Report TR-95-015, Boston University, October 1995.
[5] Bruce A. Mah, "An empirical model of HTTP network traffic," in Proceedings of INFOCOM '97, Kobe, Japan, April 1997.
[6] Red Hat High Availability Server Project, http://ha.redhat.com/
[7] The Linux Virtual Server Project, http://www.LinuxVirtualServer.org/

 

 

 

 

 

 

 

 

 

 
