Load balancing based on L3/4 load

Source: Internet
Author: User

L3/4 Load Balancing
  
Another common form of load balancing is L3/4 load balancing. "L3/4" means that the load-balancing server makes its decisions based on data from layer three (the network layer) and layer four (the transport layer) of the OSI model. For such a load-balancing server, this data consists mainly of the packet's IP header and the protocol headers of TCP, UDP, and other transport protocols:

An L3/4 load-balancing server works very simply: when data arrives, the server determines which service instance should process it, based on its own algorithm and the layer-three and layer-four data in the packet, and forwards the data to that instance.
  
The entire load-balancing operation consists of three tasks: the load-balancing server needs to know which service instances are currently valid, it decides which service instance should process the data according to its dispatch algorithm, and it sends the data to that target service instance.
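The three tasks above can be sketched as a minimal dispatch loop. This is an illustrative model only (class and method names are hypothetical); real L3/4 balancers perform these steps on raw packets, usually in kernel space:

```python
import itertools

class L4Balancer:
    """Toy model of an L3/4 balancer's three tasks: tracking valid
    instances, choosing one via a dispatch algorithm, and forwarding."""

    def __init__(self, instances):
        self.instances = list(instances)       # all known service instances
        self.healthy = set(self.instances)     # task 1: which are valid
        self._rr = itertools.cycle(self.instances)

    def mark_down(self, instance):
        # Called when a status query detects a failed instance.
        self.healthy.discard(instance)

    def pick(self):
        # Task 2: dispatch algorithm (round robin over healthy instances).
        for _ in range(len(self.instances)):
            candidate = next(self._rr)
            if candidate in self.healthy:
                return candidate
        raise RuntimeError("no healthy instance available")

    def forward(self, packet):
        # Task 3: send the data to the chosen target (stubbed here).
        target = self.pick()
        return target, packet

balancer = L4Balancer(["10.0.0.1", "10.0.0.2", "10.0.0.3"])
balancer.mark_down("10.0.0.2")
```

Once an instance is marked down, `pick` simply skips it, which is exactly the behavior described in the following paragraphs.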
  
Let's start by looking at how the load-balancing server determines the validity of a service instance. To ensure that packets distributed by the load-balancing server can be handled properly by the server cluster behind it, the server needs to periodically send status query requests to detect which service instances are working correctly. Such a status query often has to probe deeper than people expect. If a service instance has crashed but the operating system hosting it is still running, the OS will still respond to a ping from the load-balancing server, yet attempts to open a TCP connection will fail. And if the service instance has not crashed but has merely hung, it may still accept TCP connections but be unable to handle an HTTP request.
  
Because an effective status query is specific to a service instance's implementation, many load-balancing servers let users add custom scripts that run instance-specific checks. These status queries often include multiple tests and may even try to retrieve data from the service instance.
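A custom status query along these lines might layer the checks described above: first a TCP connect (which a hung instance would still pass), then a real HTTP request whose response is inspected. The `/health` path and port are assumptions for illustration:

```python
import socket

def check_instance(host, port, timeout=2.0):
    """Layered health check. A reachable TCP port alone does not prove the
    service works (a hung process can still accept connections), so we also
    send a minimal HTTP request and inspect the status line."""
    request = (b"GET /health HTTP/1.1\r\nHost: " + host.encode()
               + b"\r\nConnection: close\r\n\r\n")
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(request)
            reply = sock.recv(1024)
        # Only a 200 status line counts as healthy.
        return reply.startswith((b"HTTP/1.1 200", b"HTTP/1.0 200"))
    except OSError:
        # Connection refused or timed out: the instance or its OS is down.
        return False
```

A ping-only check would report both of the failure modes above as healthy; this is why status queries go beyond ICMP.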
  
Once the load-balancing server discovers that a service instance it manages is no longer valid, it stops forwarding data to that instance until the instance returns to its normal state. In the meantime, the remaining service instances must share the work that the failed server had originally handled.
  
One thing to note here is that after a service instance fails, the system as a whole should still have enough capacity to handle the load. For example, suppose a load-balancing server manages three service instances with identical capabilities, each running at about 80% load. If one of them fails, the other two must absorb all of the work, so each now has to carry 120% of its capacity, far more than it can handle. The direct consequence is that the service appears very unstable: the system frequently times out and the application does not work properly.
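The arithmetic behind that example is simple enough to write down, and it is worth running for your own cluster sizes before a failure happens:

```python
def load_after_failure(num_instances, per_instance_load_pct, failures=1):
    """Redistribute the total load evenly across the surviving instances.
    Loads are percentages of one instance's capacity."""
    total = num_instances * per_instance_load_pct
    return total / (num_instances - failures)

# Three instances at 80% each carry 240% of one instance's capacity in
# total; after one failure, two instances must carry 120% each.
per_survivor = load_after_failure(3, 80)
```

Keeping `load_after_failure(n, load)` below 100 for the failure counts you want to tolerate is the capacity-planning rule this paragraph describes.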
  
OK. Now assume that our load-balancing server has a well-designed status query, and that it distributes work across the healthy service instances according to its load-balancing algorithm. The most common misconception among people new to load balancing is that the load-balancing server decides which instance a request should reach based on each instance's response speed or current load.
  
Typically, the round-robin algorithm is the most common and best-performing load-balancing algorithm. If the service instances do not all have the same capacity, the load-balancing server uses a weighted round-robin algorithm, which distributes the load in proportion to each instance's actual capability. Some commercial load-balancing servers do slightly adjust these allocations automatically based on factors such as each instance's current load and response time, but those factors are not decisive.
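A weighted round robin can be sketched in a few lines. This naive expansion-based version is for illustration; production balancers typically use a smoother interleaving so a heavy instance does not receive long bursts:

```python
import itertools

def weighted_round_robin(instances):
    """instances: list of (name, weight) pairs. Yields instance names so
    that each name appears in proportion to its weight."""
    # Expand each instance 'weight' times, then cycle forever.
    expanded = [name for name, weight in instances for _ in range(weight)]
    return itertools.cycle(expanded)

# A hypothetical cluster where "big" has three times "small"'s capacity.
rr = weighted_round_robin([("big", 3), ("small", 1)])
one_cycle = [next(rr) for _ in range(4)]
```

Over every four requests, "big" receives three and "small" receives one, matching the 3:1 capacity ratio.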
  
If you use a pure round-robin algorithm, related requests may be assigned to different service instances. For this reason, many load-balancing servers allow load to be allocated based on specific characteristics of the data, for example by hashing the user's IP address and using the result to choose the service instance.
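IP-hash selection can be sketched as follows; the hash function choice and instance names are assumptions, since the article does not prescribe them:

```python
import hashlib

def pick_by_ip(client_ip, instances):
    """Hash the client's IP so the same user consistently lands on the
    same instance, unlike plain round robin."""
    digest = hashlib.md5(client_ip.encode()).digest()
    index = int.from_bytes(digest[:4], "big") % len(instances)
    return instances[index]

servers = ["app-1", "app-2", "app-3"]
chosen = pick_by_ip("203.0.113.7", servers)
```

The key property is determinism: two requests carrying the same source IP always map to the same instance, which keeps related requests together.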
  
Here, too, we need to consider the case where a server instance fails. If a server instance in the load-balancing system fails, the hash space of the hashing algorithm changes, and the original assignments are no longer valid. In that case, requests are redistributed across the remaining server instances. Also, in some cases a user's IP may change between requests, which changes the service instance it maps to. Don't worry, though: the later discussion of L7 load balancing offers a solution.
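The scale of that redistribution is easy to underestimate. With the naive modulo hashing sketched above (a deliberate simplification; consistent hashing is the usual remedy), losing one of three instances does not just move the clients that were on the failed instance:

```python
import hashlib

def pick(client_ip, instances):
    digest = hashlib.md5(client_ip.encode()).digest()
    return instances[int.from_bytes(digest[:4], "big") % len(instances)]

# 256 synthetic client IPs from a documentation range.
ips = ["198.51.100.%d" % i for i in range(256)]
before = {ip: pick(ip, ["a", "b", "c"]) for ip in ips}
after = {ip: pick(ip, ["a", "b"]) for ip in ips}  # instance "c" failed
moved = sum(1 for ip in ips if before[ip] != after[ip])
```

Because the modulus changes from 3 to 2, roughly two-thirds of all clients change instance, not just the third that had been assigned to the failed one.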
  
After determining a packet's destination, the load-balancing server needs to forward it to the target instance. The forwarding methods used by load-balancing servers fall into three main types: direct routing, tunnelling, and IP address translation.
  
When using direct routing, the load-balancing server and the service instances must be on the same network segment and share the same service IP. When data arrives, the load-balancing server forwards the packets directly. After processing a packet, each service instance can either return the response through the load-balancing server or send the response directly to the user without passing through the load-balancing server again. The latter method is called direct server return. It works as follows:

In this process, neither the load-balancing server nor the individual service instances need to modify any IP-layer data in order to forward it, so a load-balancing server using this method achieves very high throughput. In turn, this setup requires the cluster's builders to understand the TCP/IP protocol suite well.
  
The second forwarding method, tunnelling, is actually similar to direct routing. The only difference is that tunnels are established between the load-balancing server and the individual service instances. Software developers can still choose direct server return to reduce the load on the load-balancing server.
  
IP address translation works very differently from the first two methods. The destination address the user connects to is actually a virtual IP (VIP). When the load-balancing server receives a request, it translates the destination address into the real IP (RIP) of the chosen service instance and changes the source address to the load balancer's own address. When the request has been processed, the service instance sends the response back to the load-balancing server, which changes the response's source address back to the VIP and returns the response to the user. In this forwarding mode, the flow is as follows:

An attentive reader will ask: during this exchange the user's IP is no longer in the message, so how does the load-balancing server restore the user's IP address when it returns the response? In fact, in this forwarding mode the load-balancing server maintains a set of sessions that record information about every request currently passing through it. But these sessions are tricky to size. If the session duration is set too long, a high-concurrency load-balancing server has to maintain too many sessions. Conversely, if the session duration is set too short, it can trigger an ACK storm.
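Conceptually, such a session table maps each client flow to the real instance handling it, with an expiry deadline per entry. The sketch below is a simplification (real NAT tables live in the kernel and key on the full connection five-tuple):

```python
class SessionTable:
    """Toy NAT session table: maps a (client_ip, client_port) flow to the
    real instance's IP so replies can be rewritten back to the VIP."""

    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self.sessions = {}  # (client_ip, port) -> (real_ip, deadline)

    def record(self, client, real_ip, now):
        # Each forwarded request refreshes or creates a session entry.
        self.sessions[client] = (real_ip, now + self.ttl)

    def lookup(self, client, now):
        entry = self.sessions.get(client)
        if entry is None or entry[1] < now:
            self.sessions.pop(client, None)  # expired: drop the entry
            return None
        return entry[0]

table = SessionTable(ttl_seconds=120)
table.record(("203.0.113.7", 51000), "10.0.0.2", now=0.0)
```

The trade-off the article describes lives entirely in `ttl_seconds`: a large value inflates `sessions`; a small value lets a live TCP flow's entry vanish, which is the precondition for the ACK storm discussed below.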
  
First, consider a longer session duration. Suppose the load-balancing server receives 50,000 requests per second and its session expiration time is 2 minutes; it then needs to maintain 6 million sessions. These sessions consume a significant share of the load-balancing server's resources, and at peak load the resources they consume grow rapidly, putting even more pressure on the server.
  
Setting the session duration too short is even more troublesome, because it can cause an ACK storm between the user and the load-balancing server that consumes a great deal of bandwidth on both sides. In a TCP connection, the client and the server communicate using their respective sequence numbers. If a session on the load-balancing server expires too quickly, another TCP connection may reuse it, and the sequence numbers on both ends of the reused session are regenerated. If the original user then sends another message, the load-balancing server replies with an ACK telling the client its sequence number is wrong. The client, on receiving that ACK, sends its own ACK back to the load-balancing server saying the server's sequence number is wrong. The server, on receiving that, sends yet another ACK to the client... These meaningless ACK messages bounce back and forth between client and server until one of them is lost in transit.
  
So although IP address translation looks the easiest at first glance, compared with the other two approaches it is the riskiest and the most expensive option.
