Detailed description of LVS principles
I. What is LVS?
A group of servers is connected through a high-speed LAN or a geographically distributed WAN, with a load balancer (Load Balancer) at the front end. The load scheduler can seamlessly dispatch network requests to the real servers, so that the structure of the server cluster is transparent to clients: accessing the cluster's network services is the same as accessing a single high-performance, highly available server. Client programs are not affected by the server cluster and need no modification. Scalability is achieved by transparently adding or removing nodes in the service cluster, and high availability is achieved by detecting node or service-process failures and reconfiguring the system appropriately. Because this load-scheduling technology is implemented in the Linux kernel, it is called the Linux Virtual Server (LVS).
II. Three working modes of LVS
Among scheduler implementation techniques, IP load balancing is the most efficient. In existing IP load-balancing technology, Network Address Translation is used to build a group of servers into one high-performance, highly available virtual server; this is called VS/NAT (Virtual Server via Network Address Translation). It is the method used in most commercial IP load-balancing schedulers, for example Cisco's LocalDirector, F5's Big/IP, and Alteon's ACEDirector. After analyzing the drawbacks of VS/NAT and the asymmetry of network services, two further methods were proposed: VS/TUN (Virtual Server via IP Tunneling) and VS/DR (Virtual Server via Direct Routing), both of which greatly improve system scalability. The IPVS software therefore implements all three IP load-balancing techniques. Their general principles are as follows (their working details are described in later sections):
Virtual Server via Network Address Translation (VS/NAT)
Through network address translation, the scheduler rewrites the destination address of each request packet and, according to a preset scheduling algorithm, dispatches the request to a backend real server. When the real server's response packet passes back through the scheduler, the packet's source address is rewritten before it is returned to the client, completing the load-scheduling process.
Virtual Server via IP Tunneling (VS/TUN)
With NAT, both request and response packets must have their addresses rewritten by the scheduler, so as client requests grow the scheduler's processing capacity becomes a bottleneck. To solve this, the scheduler forwards request packets to the real server through an IP tunnel, and the real server returns its response directly to the client, so the scheduler only handles request packets. Since in most network services the response is much larger than the request, a cluster using VS/TUN can increase its maximum throughput by up to 10 times.
Virtual Server via Direct Routing (VS/DR)
VS/DR rewrites the MAC address of the request frame to deliver the request to the real server, and the real server returns its response directly to the client. Like VS/TUN, VS/DR can greatly improve cluster scalability. It has neither the overhead of IP tunnels nor the requirement that the real servers support an IP tunneling protocol; however, the scheduler and the real servers must each have a network card attached to the same physical network segment.
III. LVS Scheduling Algorithms
For different network service requirements and server configurations, the IPVS scheduler implements the following eight load scheduling algorithms:
Round Robin
The scheduler uses the "round robin" algorithm to distribute external requests to the real servers in the cluster in turn. It treats every server equally, regardless of a server's actual connection count and system load.
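Round robin can be sketched in a few lines; this is a minimal simulation of the selection order, not the kernel implementation, and the server addresses are illustrative:

```python
from itertools import cycle

# Hypothetical real-server addresses, for illustration only.
servers = ["172.16.0.2", "172.16.0.3", "172.16.0.4"]

def round_robin(pool):
    """Hand every new request to the next server in strict rotation."""
    return cycle(pool)

rr = round_robin(servers)
picks = [next(rr) for _ in range(6)]
# Each server receives every third request, regardless of its load.
```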
Weighted Round Robin
The scheduler uses the "weighted round robin" algorithm to dispatch requests according to the real servers' differing processing capacities, ensuring that more capable servers handle more traffic. The scheduler can automatically query each server's load and adjust its weight dynamically.
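The classic gcd-based weighted round-robin selection loop can be sketched as follows; this mirrors the scheme IPVS's wrr scheduler is known to use, but the weights and addresses here are illustrative assumptions:

```python
from math import gcd
from functools import reduce

def weighted_round_robin(pool):
    """Generator: a server with weight w is chosen w times per full cycle,
    with heavier servers chosen first (classic gcd-based WRR)."""
    names = list(pool.keys())
    weights = list(pool.values())
    n, g, max_w = len(names), reduce(gcd, weights), max(weights)
    i, cw = -1, 0  # current index and current weight threshold
    while True:
        i = (i + 1) % n
        if i == 0:
            cw -= g
            if cw <= 0:
                cw = max_w
        if weights[i] >= cw:
            yield names[i]

# Hypothetical weights: 172.16.0.3 should take twice the traffic of 172.16.0.2.
wrr = weighted_round_robin({"172.16.0.2": 1, "172.16.0.3": 2})
picks = [next(wrr) for _ in range(3)]
# In every cycle of three picks, 172.16.0.3 appears twice and 172.16.0.2 once.
```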
Least Connections
The scheduler uses the "least connections" algorithm to dynamically direct network requests to the server with the fewest established connections. When the cluster's real servers have similar performance, this algorithm balances load well.
Weighted Least Connections
When server performance in the cluster differs significantly, the scheduler uses the "weighted least connections" algorithm to optimize load balancing: servers with higher weights bear a larger share of the active connections. The scheduler can automatically query each server's load and adjust its weight dynamically.
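The selection rule reduces to minimizing connections divided by weight. A minimal sketch, with a hypothetical snapshot of server state:

```python
def weighted_least_connections(pool):
    """Pick the server minimizing active_connections / weight
    (the kernel avoids floating point by cross-multiplying instead)."""
    return min(pool, key=lambda s: s["conns"] / s["weight"])

# Hypothetical server state, for illustration only.
pool = [
    {"ip": "172.16.0.2", "weight": 1, "conns": 10},
    {"ip": "172.16.0.3", "weight": 2, "conns": 15},
]
chosen = weighted_least_connections(pool)
# 15/2 = 7.5 < 10/1, so the heavier-weighted server still wins here.
```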
Locality-Based Least Connections
The "locality-based least connections" algorithm balances load by destination IP address and is currently used mainly in cache cluster systems. It finds the server most recently used for the request's destination IP; if that server is available and not overloaded, the request is sent to it. If that server no longer exists, or it is overloaded while another server is at only half its workload, a server is selected by the "least connections" principle and the request is sent there.
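The two-step rule above can be sketched as follows. This is a simplification under stated assumptions: `overloaded` is a caller-supplied predicate standing in for the half-workload test, and the names are hypothetical:

```python
def lblc_schedule(dest_ip, table, conns, overloaded):
    """Sketch of locality-based least connections: keep serving a given
    destination IP from the server last used for it, falling back to the
    "least connections" principle when that server is gone or overloaded.
    `table` maps destination IP -> server; `conns` maps server -> active
    connection count."""
    last = table.get(dest_ip)
    if last is not None and last in conns and not overloaded(last):
        return last
    chosen = min(conns, key=conns.get)  # least-connections fallback
    table[dest_ip] = chosen
    return chosen

# Hypothetical state, for illustration only.
conns = {"cache1": 3, "cache2": 9}
table = {}
first = lblc_schedule("198.51.100.7", table, conns, lambda s: False)
second = lblc_schedule("198.51.100.7", table, conns, lambda s: False)
# The same destination IP keeps hitting the same cache server.
```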
Locality-Based Least Connections with Replication
The "locality-based least connections with replication" algorithm also balances load by destination IP address and is mainly used in cache clusters. It differs from LBLC in that it maintains a mapping from a destination IP address to a group of servers, whereas LBLC maps a destination IP address to a single server. The algorithm finds the server group for the request's destination IP and chooses one server from that group by the "least connections" principle; if that server is not overloaded, the request is sent to it. If it is overloaded, a server is chosen from the whole cluster by the "least connections" principle, added to the group, and given the request. Meanwhile, if the group has not been modified for some time, the busiest server is removed from it to reduce the degree of replication.
Destination Hashing
The "destination hashing" algorithm uses the request's destination IP address as a hash key to look up the corresponding server in a statically assigned hash table. If the server is available and not overloaded, the request is sent to it; otherwise null is returned.
Source Hashing
The "source hashing" algorithm uses the request's source IP address as a hash key to look up the corresponding server in a statically assigned hash table. If the server is available and not overloaded, the request is sent to it; otherwise null is returned.
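Both hashing schedulers follow the same shape: hash one packet field, index a static server list. A minimal sketch of source hashing; note that IPVS uses its own multiplicative hash in the kernel, and md5 here is just an illustrative stand-in:

```python
import hashlib

def source_hash_schedule(src_ip, servers):
    """Map a client IP deterministically onto one server in a static list."""
    h = int(hashlib.md5(src_ip.encode()).hexdigest(), 16)
    return servers[h % len(servers)]

servers = ["172.16.0.2", "172.16.0.3"]
a = source_hash_schedule("202.100.1.2", servers)
b = source_hash_schedule("202.100.1.2", servers)
# The same client always maps to the same server.
```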
IV. Operating principles of LVS
1. Virtual Server via NAT (VS/NAT)
When a user accesses a network service through the Virtual IP Address, the request packet arrives at the scheduler. The scheduler selects a server from the group of real servers according to the connection-scheduling algorithm, rewrites the packet's Virtual IP Address to the selected server's address and its destination port to the server's corresponding port, and sends the modified packet to that server. At the same time, the scheduler records the connection in a connection hash table, so that when the next packet of the same connection arrives, the previously selected server's address and port can be retrieved from the table, the same rewrite performed, and the packet forwarded to the same server. When response packets from the real server pass through the scheduler, the scheduler rewrites their source address and source port back to the Virtual IP Address and corresponding port before sending them to the user. In this way, clients only see the services offered at the Virtual IP Address, and the structure of the server cluster remains transparent to them.
The following is an example of VS/NAT, as shown in Figure 3:
Figure 3: VS/NAT example
As the table below shows, all traffic destined for 202.103.106.5 port 80 is load-balanced across 172.16.0.2:80 and 172.16.0.3:8000. Packets whose destination is 202.103.106.5:21 are forwarded to 172.16.0.3:21. Packets to other ports are rejected.
Protocol | Virtual IP Address | Port | Real IP Address | Port | Weight
TCP      | 202.103.106.5      | 80   | 172.16.0.2      | 80   | 1
         |                    |      | 172.16.0.3      | 8000 | 2
TCP      | 202.103.106.5      | 21   | 172.16.0.3      | 21   | 1
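A table like this would typically be expressed with the ipvsadm command-line tool. The following is a sketch only: the flags follow the classic ipvsadm CLI (`-A` add virtual service, `-a` add real server, `-m` for NAT/masquerading, `-w` weight), and the choice of the wrr scheduler is an assumption, since the table specifies weights but not an algorithm:

```shell
# Virtual service 202.103.106.5:80, weighted round robin, two NAT real servers.
ipvsadm -A -t 202.103.106.5:80 -s wrr
ipvsadm -a -t 202.103.106.5:80 -r 172.16.0.2:80 -m -w 1
ipvsadm -a -t 202.103.106.5:80 -r 172.16.0.3:8000 -m -w 2

# FTP on port 21 goes to a single real server.
ipvsadm -A -t 202.103.106.5:21 -s wrr
ipvsadm -a -t 202.103.106.5:21 -r 172.16.0.3:21 -m -w 1
```

Ports not declared in any virtual service are simply not handled by IPVS, matching the "rejected" behavior described above.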
The following example walks through the packet-rewriting process in more detail.
A packet with the following source and destination addresses is sent to the web service:

SOURCE 202.100.1.2:3456    DEST 202.103.106.5:80
The scheduler selects a server from its scheduling list, for example 172.16.0.3:8000, and rewrites the packet's destination before sending it to the selected server:

SOURCE 202.100.1.2:3456    DEST 172.16.0.3:8000
The response packet the server returns to the scheduler looks like this:

SOURCE 172.16.0.3:8000    DEST 202.100.1.2:3456
The response packet's source address is changed back to the virtual service's address before it is sent on to the client:

SOURCE 202.103.106.5:80    DEST 202.100.1.2:3456
From the client's point of view, the response simply comes from the 202.103.106.5:80 service; the client has no way of knowing whether the request was handled by 172.16.0.2 or 172.16.0.3.
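The two rewrite steps walked through above can be captured in a tiny simulation. Packets are modeled as (source, destination) pairs of (ip, port) tuples; the addresses are the ones from the example:

```python
VIP = ("202.103.106.5", 80)   # virtual service address
REAL = ("172.16.0.3", 8000)   # selected real server

def nat_in(packet):
    """Inbound: scheduler rewrites the destination of a client->VIP request."""
    src, dst = packet
    assert dst == VIP
    return (src, REAL)

def nat_out(packet):
    """Outbound: scheduler rewrites the source of a server->client response."""
    src, dst = packet
    assert src == REAL
    return (VIP, dst)

client = ("202.100.1.2", 3456)
to_server = nat_in((client, VIP))    # destination rewritten to the real server
to_client = nat_out((REAL, client))  # source rewritten back to the VIP
```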
2. Virtual Server via IP Tunneling (VS/TUN)
In a VS/NAT cluster, both request and response packets must pass through the load scheduler, so when the number of real servers grows to between 10 and 20, the scheduler becomes the new bottleneck of the whole cluster. Most Internet services share a characteristic: request packets are short, while response packets usually carry large amounts of data. If requests and responses are handled separately, with the load scheduler responsible only for scheduling requests and responses returned directly to the client, the throughput of the whole cluster can be greatly increased.
IP tunneling (also known as IP encapsulation) is a technique for encapsulating one IP packet inside another, which allows a packet addressed to one destination to be wrapped and forwarded to a different IP address. Here, IP tunneling is used to encapsulate request packets and forward them to backend servers, while response packets are returned to clients directly from those servers. However, because there is a group of backend servers rather than a single one, tunnels cannot be established statically one-to-one; instead, a server is selected dynamically and the request packet is encapsulated and forwarded to it. The architecture of VS/TUN is shown in Figure 4; each server configures the VIP address on its own IP tunneling device.
Figure 4: VS/TUN Architecture
Figure 5 shows the workflow of VS/TUN. Its connection scheduling and management are the same as in VS/NAT; only the packet forwarding differs. The scheduler dynamically selects a server according to each server's load, encapsulates the request packet inside another IP packet, and forwards the encapsulated packet to the selected server. When the server receives it, it first decapsulates the packet and finds that the inner packet's destination address is the VIP. Because the VIP is configured on the server's local IP tunneling device, the server processes the request and then returns the response directly to the client according to its routing table.
Figure 5 VS/TUN Workflow
Note that, under default TCP/IP stack processing, since the request packet's destination address is the VIP, the response packet's source address will also be the VIP. The response therefore needs no modification and can be returned directly to the client, who sees a normal service response and does not know which server handled it.
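On a Linux real server, configuring the VIP on the tunnel device as described above is commonly done along these lines. This is a sketch under stated assumptions: the VIP is reused from the VS/NAT example for illustration, `tunl0` is the standard ipip tunnel device, and the sysctl names assume a modern Linux kernel:

```shell
# Load the ipip module and place the VIP on the tunnel device so that
# decapsulated packets destined for the VIP are accepted locally.
modprobe ipip
ip addr add 202.103.106.5/32 dev tunl0
ip link set tunl0 up

# Suppress ARP for the VIP so only the scheduler answers for that address.
sysctl -w net.ipv4.conf.tunl0.arp_ignore=1
sysctl -w net.ipv4.conf.tunl0.arp_announce=2
sysctl -w net.ipv4.conf.all.arp_ignore=1
sysctl -w net.ipv4.conf.all.arp_announce=2
```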
3. Virtual Server via Direct Routing (VS/DR)
Like VS/TUN, VS/DR exploits the asymmetry of most Internet services: the load scheduler only schedules requests, while the servers return responses directly to clients, greatly increasing the throughput of the whole cluster.
The architecture of VS/DR is shown in Figure 6. The scheduler and the server group must each have a NIC attached to the same uninterrupted LAN segment, for example connected through a high-speed switch or hub. The VIP address is shared by the scheduler and the server group: the VIP configured on the scheduler is externally visible and receives request packets for the virtual service, while every server configures the VIP on a non-ARP network device, where it is invisible to the outside and used only to process network requests whose destination address is the VIP.
Figure 6: Architecture of VS/DR
The workflow of VS/DR: its connection scheduling and management are the same as in VS/NAT and VS/TUN; only the packet forwarding differs, with packets routed directly to the target server. In VS/DR, the scheduler dynamically selects a server according to each server's load and, without modifying or encapsulating the IP packet, changes the destination MAC address of the data frame to the selected server's MAC address and sends the frame onto the server group's LAN. Since the frame's MAC address is that of the selected server, the server is guaranteed to receive the frame and extract the IP packet. When the server finds that the packet's destination VIP address is on a local network device, it processes the packet and then returns the response directly to the client according to its routing table. As in VS/TUN, under default TCP/IP stack processing the request's destination address is the VIP, so the response's source address is also the VIP; the response needs no modification and can be returned directly to the client, who sees a normal service response and does not know which server handled it.
V. Advantages and disadvantages of the three IP load-balancing techniques, summarized in the following table:
_              | VS/NAT        | VS/TUN     | VS/DR
Server         | Any           | Tunneling  | Non-ARP device
Server network | Private       | LAN/WAN    | LAN
Server number  | Low (10-20)   | High (100) | High (100)
Server gateway | Load balancer | Own router | Own router
Note: the maximum number of servers supported by each of the three methods above assumes that the scheduler uses a 100 Mbps NIC, that the scheduler's hardware configuration is the same as the backend servers', and that the workload is ordinary web service. With higher-end hardware for the scheduler (such as a gigabit NIC and a faster processor), the number of servers it can schedule increases accordingly, and the number also varies with the application. The figures above are estimates intended mainly to compare the scalability of the three methods.
1. Virtual Server via NAT
The advantage of VS/NAT is that the servers can run any operating system that supports TCP/IP; only one IP address needs to be configured on the scheduler, and the server group can use private IP addresses. The disadvantage is limited scalability: when the number of server nodes grows to around 20, the scheduler can become the system's new bottleneck, because in VS/NAT both request and response packets must pass through the load scheduler. On a Pentium 166 host, the average latency of rewriting a packet was measured at 60 us (it would be shorter on faster processors). Assuming an average TCP packet length of 536 bytes, the scheduler's maximum throughput is 8.93 MBytes/s; if each server's throughput is 800 KBytes/s, one scheduler can drive about 10 servers. (Note: these figures were obtained long ago.)
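The back-of-the-envelope numbers in the paragraph above check out as follows:

```python
# Reproducing the VS/NAT throughput estimate from the text.
rewrite_latency_s = 60e-6      # 60 us per packet rewrite on a Pentium 166
avg_packet_bytes = 536         # assumed average TCP packet length

# One rewrite per packet bounds the scheduler's throughput.
scheduler_throughput = avg_packet_bytes / rewrite_latency_s  # bytes/s
# 536 / 60e-6 ~= 8.93 MBytes/s

per_server_throughput = 800e3  # 800 KBytes/s per real server
max_servers = scheduler_throughput / per_server_throughput
# ~= 11, consistent with "one scheduler can drive about 10 servers"
```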
A VS/NAT-based cluster can satisfy the performance requirements of many services. If the load scheduler does become the system's new bottleneck, there are three ways forward: a hybrid approach, VS/TUN, and VS/DR. In the hybrid cluster, there are several VS/NAT load schedulers, each with its own server cluster, and these schedulers are grouped under a single domain name through RR-DNS. VS/TUN and VS/DR, however, are better ways to increase system throughput.
For network services that carry IP addresses or port numbers inside the packet payload, an application module must be written to translate those embedded addresses or ports. This adds implementation work, and the module's per-packet inspection overhead reduces system throughput.
2. Virtual Server via IP Tunneling
In a VS/TUN cluster, the load scheduler only schedules requests to the different backend servers, and the backend servers return response data directly to users. The scheduler can therefore handle a huge volume of requests; it can schedule over a hundred servers (of comparable size) without itself becoming a system bottleneck. Even if the load scheduler has only a 100 Mbps full-duplex NIC, the maximum throughput of the whole system can exceed 1 Gbps. VS/TUN thus dramatically increases the number of servers a load scheduler can drive, and can be used to build high-performance super servers.
VS/TUN places a requirement on the servers: all of them must support the "IP Tunneling" (IP Encapsulation) protocol. Currently the backend servers of VS/TUN mainly run Linux; other operating systems have not been tested. Since IP tunneling is becoming a standard protocol across operating systems, VS/TUN should become applicable to backend servers running other operating systems as well.
3. Virtual Server via Direct Routing
Like VS/TUN, the VS/DR scheduler only processes the client-to-server half of a connection, and response data can return to the client over a separate network route. This greatly improves the scalability of the LVS cluster.
Compared with VS/TUN, this method has no IP tunnel overhead, but it requires that the load scheduler and the real servers each have a NIC on the same physical network segment, and that the servers' network devices (or device aliases) do not answer ARP for the VIP, or that the servers can redirect packets addressed to the VIP to a local socket port.
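A common way to satisfy these ARP constraints on a Linux real server is sketched below. The VIP 202.103.106.5 is reused from the VS/NAT example purely for illustration, and the sysctl names assume a modern Linux kernel:

```shell
# Put the VIP on loopback with a /32 mask so the host accepts packets for it
# without advertising it on the shared segment.
ip addr add 202.103.106.5/32 dev lo

# Ensure the host never answers ARP queries for the VIP; only the
# scheduler's externally visible VIP should respond.
sysctl -w net.ipv4.conf.lo.arp_ignore=1
sysctl -w net.ipv4.conf.lo.arp_announce=2
sysctl -w net.ipv4.conf.all.arp_ignore=1
sysctl -w net.ipv4.conf.all.arp_announce=2
```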