Linux Cluster Load Balancing: Principles and Implementation Algorithms

Source: Internet
Author: User

With the decline in computer hardware prices and the development of computer network topologies, distributed computer systems provide users with a rich set of resources. In studying distributed systems, researchers have noticed the following problem: in a network-connected multi-computer environment, some computers may be heavily loaded at a given moment while the load on others is comparatively light. Balancing the load among computers is a major goal of task allocation and scheduling, and it can improve the performance of the entire system.

To improve system performance, load is distributed rationally among multiple computers; this is usually called load balancing or load sharing. Strictly speaking, the goal of "load balancing" is to roughly equalize the load across computers, whereas "load sharing" simply means redistributing load away from overloaded computers.

Load balancing comes in two forms: static and dynamic. A method that ignores the current load status of the system is called static load balancing, while a method that adjusts the task division according to the current load status of the system is called dynamic load balancing.

Load imbalance mainly arises for the following reasons:

  1. For some algorithms the iteration sizes are not fixed, but they can be determined at compile time;
  2. For some algorithms the iteration sizes are not fixed and depend on the data being processed, so they cannot be determined at compile time;
  3. Even when the iteration sizes are fixed, many uncertainties can cause the computing speeds of the nodes to differ.

Consider these three causes in turn. In the first case, the workload of each iteration can be estimated at compile time, and iterations are distributed according to the processing capability of each node; this is the static load balancing approach. In the second and third cases, dynamic load balancing must be used: tasks are migrated at run time according to the work each processing node has completed. Dynamic load balancing requires assessing the processing capability of each node, and the basic approach is to predict a node's future processing speed from its past processing speed.
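
As an illustration of that last point, here is a minimal sketch of predicting a node's processing speed with an exponential moving average; the smoothing factor and the measurement interface are assumptions, not something prescribed by the text.

    # Sketch: predict a node's future processing speed from its history
    # using an exponential moving average (a hypothetical choice of model).
    class SpeedEstimator:
        def __init__(self, alpha=0.5):
            self.alpha = alpha      # smoothing factor: higher reacts faster
            self.estimate = None    # predicted iterations per second

        def update(self, iterations_done, elapsed_seconds):
            """Feed one measurement period; return the new prediction."""
            observed = iterations_done / elapsed_seconds
            if self.estimate is None:
                self.estimate = observed
            else:
                # Blend the new observation with the running estimate.
                self.estimate = (self.alpha * observed
                                 + (1 - self.alpha) * self.estimate)
            return self.estimate

    est = SpeedEstimator()
    print(est.update(1000, 2.0))   # 500.0 iterations/s
    print(est.update(600, 2.0))    # 400.0: a slowdown pulls the estimate down

Iterations can then be distributed in proportion to each node's current estimate.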

Load Balancing Algorithm

A load balancing algorithm consists of the following three parts:

  • Information policy: specifies what load and task information to collect, and how to distribute that information to the makers of task placement decisions.
  • Transfer policy: based on the task and the computer's load, determines whether a task should be sent to another computer for processing.
  • Placement policy: for a task deemed suitable for transfer to another computer, selects the target computer to which the task will be sent.
    These three parts interact in different ways. The placement policy uses the load information provided by the information policy, and it takes action only after the transfer policy has determined that a task is suitable for transfer.

    In short, the goal of load balancing is to provide the shortest average task response time, to adapt to changing loads, and to provide a reliable load balancing mechanism.
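
    The division of labor among the three parts can be sketched in a few lines of code; every class and value below is illustrative, not taken from the text.

        # Sketch of how the three policies cooperate on a single node.
        class InfoPolicy:
            def collect(self):
                # e.g. run-queue lengths reported by this and other nodes
                return {"local_queue": 4, "queues": {"B": 1, "C": 6}}

        class TransferPolicy:
            THRESHOLD = 3  # hypothetical queue-length threshold
            def should_transfer(self, info):
                return info["local_queue"] > self.THRESHOLD

        class PlacementPolicy:
            def select_target(self, info):
                # pick the remote node with the shortest queue
                return min(info["queues"], key=info["queues"].get)

        info = InfoPolicy().collect()                 # information policy
        if TransferPolicy().should_transfer(info):    # whether to transfer
            print("send task to", PlacementPolicy().select_target(info))  # where
        else:
            print("process task locally")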

    Information Policy

    Parameters used to describe the load information include:

  • Number of tasks in the run queue;
  • System call rate;
  • CPU context switch rate;
  • Percentage of idle CPU time;
  • Amount of free memory (in KB);
  • Average load over the last minute.

    Among these single-parameter load descriptions, the first, the number of tasks in the run queue, has proved the most effective: it gives the shortest average task response time and has been widely used. However, collecting more parameters to make the system information more comprehensive often fails to bring the expected performance improvement because of the extra overhead; for example, the average response time obtained by combining two or more of the six parameters is worse than that of a single parameter.
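
    On Linux, two of these parameters can be read directly from /proc; a small sketch:

        # Sketch: read the 1-minute load average and the number of runnable
        # tasks on Linux. /proc/loadavg looks like: "0.42 0.30 0.24 1/345 12345"
        def read_load():
            with open("/proc/loadavg") as f:
                fields = f.read().split()
            load_1min = float(fields[0])           # average load, last minute
            running, total = fields[3].split("/")  # runnable / total tasks
            return load_1min, int(running)

        load_1min, run_queue = read_load()
        print(f"1-minute load: {load_1min}, runnable tasks: {run_queue}")
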
    Transfer Policy

    For the sake of simplicity, most transfer policies are threshold policies. For example, in the scheme of Eager et al., a computer decides whether to process a task locally without exchanging state information with other computers: once the length of its service queue (or waiting queue) exceeds a threshold, the newly received task is transferred elsewhere. Process migration can move tasks that are already running, which is an improvement over transferring only newly received tasks.

    In Zhou's simulation of seven load balancing algorithms, the transfer policies all adopt a threshold policy based on two thresholds: a computer load threshold, Load, and a task execution time threshold, TCPU. A task is transferred to another computer for execution only if the computer's load exceeds Load and the task's execution time exceeds TCPU.
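
    This two-threshold rule is easy to state in code; the threshold values below are placeholders, not values from Zhou's study.

        # Sketch of the two-threshold transfer policy described above.
        LOAD_THRESHOLD = 2.0   # computer load threshold ("Load")
        TCPU = 1.5             # task execution time threshold, in seconds

        def should_transfer(current_load, estimated_task_seconds):
            # Transfer only when the machine is busy AND the task is long
            # enough that the migration cost is worth paying.
            return (current_load > LOAD_THRESHOLD
                    and estimated_task_seconds > TCPU)

        print(should_transfer(3.1, 4.0))  # True: heavy load, long task
        print(should_transfer(3.1, 0.2))  # False: short tasks stay local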

    Placement policy

    In summary, there are four main placement policies.

  • Centralized policy. Every P seconds, one of the computers is designated the "load information center" (LIC); it receives the load-change values from all the other computers and aggregates them into a "load vector", which it then broadcasts to all the other computers. When a computer determines that a task is suitable for transfer to another computer, it sends a request to the LIC together with its current load value. The LIC selects the computer with the shortest run queue and notifies the computer where the task resides to send the task to it; it also increases the load value of the target host by 1.
  • Threshold policy. A computer is selected at random, and it is determined whether, after the transfer, that computer's task queue length would exceed the threshold. If not, the task is sent; otherwise, another computer is selected at random and checked in the same way. This continues until a suitable target computer is found or the number of probes exceeds a static limit LP, in which case the task is processed locally (a sketch of this probe loop appears after this list). Once a task arrives at a computer, it must be processed there regardless of the computer's state.
  • Shortest-task-queue policy. LP computers are selected at random, the task queue length of each is inspected, and the task is transferred to the computer with the shortest queue. Again, once a task arrives, the target computer must process it regardless of its state. A simple improvement to this policy stops probing as soon as a computer with a queue length of 0 is found, since such a computer is certain to be an acceptable target.
  • Reservation policy. When a task leaves a computer, the computer checks its local load; if the load is below a threshold T1, it probes other computers and registers its name on each of the R computers whose load is greater than T1, and each such computer keeps the registrations in a stack. When a task arrives at an overloaded computer, the task is sent to the computer at the top of that stack. If a computer's load drops below T1, all the computer names retained in its stack are cleared.
    In his thesis, Zhou compares the second and third policies and concludes that a simple method using a small amount of state information is inexpensive: the threshold policy often obtains better results than the shortest-task-queue policy, while the latter is more complicated and its performance improvement must compensate for the extra overhead, so its effect is slightly worse [2].
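
    As a sketch of the threshold policy (the second placement policy above), the random-probe loop looks roughly like this; LP, the threshold, and the queue lengths are all made-up values.

        import random

        # Sketch of the threshold placement policy described above.
        queue_len = {"A": 5, "B": 1, "C": 7, "D": 0}  # node -> queue length

        THRESHOLD = 3   # hypothetical queue-length threshold
        LP = 3          # static probe limit

        def place(origin):
            candidates = [n for n in queue_len if n != origin]
            for _ in range(LP):
                node = random.choice(candidates)
                # Would the transfer push the target past the threshold?
                if queue_len[node] + 1 <= THRESHOLD:
                    return node      # suitable target found
            return origin            # probe limit exceeded: run locally

        print("run task on:", place("A"))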

    Algorithm Implementation

    Currently, common algorithms include the central task scheduling policy, the gradient model policy, the sender-initiated policy, and the receiver-initiated policy.

  • Central Task Scheduling Policy

    As the name suggests, the central task scheduling policy designates one processing node as the distributor of computing tasks. We call it the scheduling node, and call the other processing nodes, which carry out the computing tasks, computing nodes. To keep track of task distribution, the scheduling node maintains a task distribution table.

    When a task starts, the scheduling node loads the task's pre-distribution table, and the computing nodes begin the computing tasks assigned to them. During execution, each computing node reports its task-completion status to the scheduling node at a certain interval. From these reports, the scheduling node judges the processing capability of each node and issues task migration instructions, directing tasks to flow from heavily loaded nodes to lightly loaded ones; the task distribution table on the scheduling node is updated accordingly.

    The advantages of central task scheduling are obvious: the scheduling node knows the processing capability and task distribution of every node, so it can search for the best migration plan taking all factors into account. However, when there are many computing nodes, all of them must communicate with the central node, which inevitably causes serious contention; this greatly reduces the efficiency of dynamic load balancing and can even cancel out all of its benefits.

    In addition, note that while the scheduling node is deciding whether to migrate tasks, the computing node must wait, which wastes its processing capability. One improvement is for the scheduling node to store a computing node's current completion report as soon as it arrives and immediately return the decision made from the previous report, reducing latency; when the scheduling node is idle, it computes the decision from the newly received report and keeps it for the next round. In this way, the scheduling node's migration decision time overlaps with the computing nodes' computation time, reducing the overhead of dynamic load balancing.
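
    A minimal sketch of the central scheduler's data structure and migration step; the table contents and the rebalancing rule are invented for illustration.

        # Sketch of a central task scheduler. The distribution table maps
        # each computing node to its remaining task count (made-up numbers).
        distribution = {"node1": 40, "node2": 10, "node3": 25}

        def rebalance(table, tolerance=5):
            """Issue one migration from the most to the least loaded node
            when the gap exceeds `tolerance` (a made-up rule)."""
            heavy = max(table, key=table.get)
            light = min(table, key=table.get)
            gap = table[heavy] - table[light]
            if gap <= tolerance:
                return "balanced"
            moved = gap // 2
            table[heavy] -= moved
            table[light] += moved
            return f"migrate {moved} tasks: {heavy} -> {light}"

        # A node's periodic report updates the table, then triggers a decision.
        distribution["node2"] = 2        # node2 has nearly drained its queue
        print(rebalance(distribution))   # migrate 19 tasks: node1 -> node2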

  • Gradient Model Policy

    In the central task scheduling policy, the more computing nodes there are, the more contention arises. To reduce contention and scale to a large number of processing nodes, the gradient model policy, the sender-initiated policy, and the receiver-initiated policy set up no dedicated scheduling node; each node interacts with only a subset of the other nodes, reducing the conflicts caused by load balancing.

    In the gradient model policy, every processing node interacts only with its directly adjacent nodes, and two thresholds, M1 and Mh, are introduced. During execution, nodes are divided into three classes. Let the remaining task volume of a node be t: if t < M1, the node is a light-load node; if M1 < t < Mh, it is a medium-load node; and if t > Mh, it is a heavy-load node. In addition, the gradient of a node is defined as the shortest distance from that node to the nearest light-load node.

    At startup, every node is a heavy-load node and its gradient is infinite. During the computation, each node periodically checks whether its remaining load has fallen below M1. When a node finds that its remaining load is less than M1, it sets its gradient to zero and sends to each adjacent node its new gradient plus the distance between them. When an adjacent node receives the new gradient value, it compares it with its current gradient: if the current gradient is smaller than or equal to the received value, it does nothing; if the current gradient is greater, it adopts the received value as its new gradient and propagates it onward in the same way. Gradient information is thus kept up to date as the computation proceeds, and a heavy-load node can send its excess load to the adjacent node with the smallest gradient. Computing tasks therefore flow from heavy-load nodes to light-load nodes under the guidance of the gradient.

    However, too many computing tasks may flow to the same light-load node this way. If a light-load node receives too many new tasks, it may become a medium-load or even a heavy-load node again, and once that happens the gradient information no longer guides computing tasks correctly. Therefore, a light-load node must examine each computing task offered to it, and reject the task if the node is no longer in the light-load state. This delays the task's acceptance, but before long the light-load node will be able to accept a task it previously rejected, and the validity of the node gradients is maintained.
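
    A node's gradient is just its hop distance to the nearest light-load node, so the quantity the policy maintains can be sketched with a breadth-first pass over the node graph. In the real policy this value spreads through neighbor-to-neighbor messages; the sketch below computes the same result centrally, over a made-up topology and loads.

        from collections import deque

        M1 = 5
        neighbors = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
        load = {"A": 12, "B": 9, "C": 2, "D": 11}   # C is the light-load node

        INF = float("inf")
        gradient = {n: (0 if load[n] < M1 else INF) for n in neighbors}
        queue = deque(n for n in neighbors if gradient[n] == 0)
        while queue:                       # standard BFS relaxation
            n = queue.popleft()
            for m in neighbors[n]:
                if gradient[n] + 1 < gradient[m]:
                    gradient[m] = gradient[n] + 1
                    queue.append(m)

        print(gradient)   # {'A': 2, 'B': 1, 'C': 0, 'D': 1}
        # A heavy node sends excess load to the neighbor with the
        # smallest gradient, so tasks drift toward C.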

  • Sender-Initiated Policy

    The sender-initiated policy also introduces a threshold M to divide the processing nodes into light-load and heavy-load nodes: a node whose current remaining load t > M is a heavy-load node, and a node with t < M is a light-load node. The policy also defines an associated domain for each node; a node interacts and exchanges tasks only with the nodes in its domain. An intuitive definition of the domain is the set of all adjacent nodes.

    At startup, all nodes begin executing their computing tasks. After a period of execution, each node checks whether it has become a heavy-load node; if so, it tries to distribute its tasks evenly across its domain. Specifically, let the load of the heavy-load node be l_p, and let the K nodes in its domain have loads l_1, ..., l_K. The average load L_avg is then:

        L_avg = (l_p + l_1 + ... + l_K) / (K + 1)

    To achieve an even distribution, we need the amount of load m_k that the heavy-load node should transfer to each node k in its domain. We first introduce a weight h_k so that no load is migrated to nodes in the domain that are already at or above the average load:

        h_k = L_avg - l_k   if L_avg > l_k;   otherwise h_k = 0

    Then m_k is:

        m_k = h_k * (l_p - L_avg) / (h_1 + ... + h_K)

    The node can then send tasks to the nodes in its domain according to the m_k values.
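
    In code, the L_avg / h_k / m_k computation is a few lines; the node names and loads below are made up.

        # Sketch: how much load a heavy node should send to each domain
        # member, following the formulas above.
        def sender_shares(l_p, domain_loads):
            """l_p: load of this heavy node; domain_loads: {node: load}."""
            K = len(domain_loads)
            l_avg = (l_p + sum(domain_loads.values())) / (K + 1)
            # h_k > 0 only for nodes below the domain average
            h = {k: max(l_avg - lk, 0.0) for k, lk in domain_loads.items()}
            total_h = sum(h.values())
            if total_h == 0:
                return {k: 0.0 for k in domain_loads}  # no under-loaded node
            excess = l_p - l_avg
            return {k: h[k] * excess / total_h for k in domain_loads}

        # l_avg = 9; only B and D are below it, so only they receive load.
        print(sender_shares(20, {"B": 4, "C": 10, "D": 2}))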

  • Receiver-Initiated Policy

    The receiver-initiated policy is basically the same as the sender-initiated policy, except that it is initiated by a light-load node, which asks other nodes to send tasks to it.

    The receiver-initiated policy likewise introduces a threshold M to distinguish light-load from heavy-load nodes, and an associated domain to determine the scope of interaction.

    At startup, all nodes begin executing their computing tasks. After a period of time, once a node finds that it has become a light-load node, it tries to even out the load across its domain. Specifically, let the load of the light-load node be l_p, and let the K nodes in its domain have loads l_1, ..., l_K. The average load L_avg is then:

        L_avg = (l_p + l_1 + ... + l_K) / (K + 1)

    To achieve an even distribution, we need the amount of load m_k that each node k in the domain should pass to the light-load node. We first introduce a weight h_k so that no load is taken from nodes in the domain that are already at or below the average load:

        h_k = l_k - L_avg   if L_avg < l_k;   otherwise h_k = 0

    Then m_k is:

        m_k = h_k * (L_avg - l_p) / (h_1 + ... + h_K)

    The node can then request tasks from the nodes in its domain according to the m_k values.
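
    The receiver-side computation mirrors the sender-side sketch above, with the inequality reversed; again the values are made up.

        # Sketch: how much load a light node should request from each
        # domain member, following the formulas above.
        def receiver_shares(l_p, domain_loads):
            """l_p: load of this light node; domain_loads: {node: load}."""
            K = len(domain_loads)
            l_avg = (l_p + sum(domain_loads.values())) / (K + 1)
            # h_k > 0 only for nodes above the domain average
            h = {k: max(lk - l_avg, 0.0) for k, lk in domain_loads.items()}
            total_h = sum(h.values())
            if total_h == 0:
                return {k: 0.0 for k in domain_loads}  # nobody is over-loaded
            deficit = l_avg - l_p
            return {k: h[k] * deficit / total_h for k in domain_loads}

        # l_avg = 7; only B and D are above it, so only they are asked.
        print(receiver_shares(1, {"B": 12, "C": 6, "D": 9}))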

