Kernel Implementation of the IPVS Load Balancing Module


Transmission Modes:
[Direct routing mode]: The route table is searched directly, with the destination address of the original packet as the key, and that destination turns out to be an IP address configured locally. Why does locally destined data still need to be routed? Because local delivery is only an illusion here: the machines that actually provide the service sit behind this one, i.e. the service is load balanced. The question then becomes: since this local IP address is configured here, can other machines configure the same address too? Wouldn't that be an IP address conflict?
In direct routing mode, the load balancer and the machines behind it that actually provide the service are configured with the same IP address. On the balancer, the address sits on a real physical NIC and receives the clients' packets. These packets eventually reach the ip_local_deliver function and then traverse the NF_IP_LOCAL_IN hook point, which is exactly where ipvs hooks in; its hook function ip_vs_in is called there. Once ip_vs_in decides the packet needs balancing, it invokes the packet_xmit callback of the ip_vs_conn already established to the real machine; inside that callback, the information in the ip_vs_conn is used to look up the route. ip_vs_conn has three important fields: caddr, the client IP address; vaddr, the virtual IP address, i.e. the address configured on the balancer's physical NIC and at the same time bound to the loopback NIC of every real server behind the balancer; and daddr, the IP address of the balanced machine's physical NIC, which can simply be understood as the address the balancer uses to reach that machine directly.
The packet is sent out through the egress device of the route resolved for daddr. When it arrives at the daddr machine, route lookup again decides whether it is for local delivery or for forwarding; fib_lookup is called during this search, walking the route tables (or, when MULTIPLE_TABLES is configured, traversing the rule tables first). The lookup finds that the destination IP is a local address, namely the vaddr bound to loopback, so the data is handed up to the higher layers. Note that the packet arrived on a network port that does not have its destination address configured, and this is accepted because the default route-lookup policy never checks the relationship between a NIC's own IP address and the packet's destination IP. Such a check would make little sense: for most forwarded packets the destination IP has nothing to do with any local NIC address, and at the start of route lookup there is no way yet to tell a transit packet from a locally destined one. You cannot bind an ingress port to a route directly, but you can configure policy routing: with MULTIPLE_TABLES, add a rule table and bind the fib_rule's r_ifindex field to a NIC, in which case fib_lookup checks the field: (r->r_ifindex && r->r_ifindex != key->iif).
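To make the iif binding concrete, here is a small user-space sketch of that rule-walking logic, modeled on the 2.4-era fib_lookup loop; struct fib_rule here is a trimmed stand-in, not the kernel's definition:

#include <stdio.h>

/* trimmed stand-in for the kernel's struct fib_rule */
struct fib_rule {
	int r_ifindex;			/* 0 means "any input interface" */
	int r_table;			/* route table to consult on a match */
	struct fib_rule *r_next;
};

/* return the table of the first matching rule, mirroring the kernel check:
 * if (r->r_ifindex && r->r_ifindex != key->iif) skip the rule */
static int rule_lookup(struct fib_rule *rules, int iif)
{
	struct fib_rule *r;

	for (r = rules; r; r = r->r_next) {
		if (r->r_ifindex && r->r_ifindex != iif)
			continue;	/* rule is bound to a different ingress port */
		return r->r_table;
	}
	return -1;			/* no rule matched */
}

int main(void)
{
	struct fib_rule any  = { 0, 254, NULL };	/* main table, any iif */
	struct fib_rule eth0 = { 2, 100, &any };	/* table 100 only for ifindex 2 */

	printf("iif 2 -> table %d\n", rule_lookup(&eth0, 2));	/* 100 */
	printf("iif 3 -> table %d\n", rule_lookup(&eth0, 3));	/* 254 */
	return 0;
}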
To hide the real servers completely, their virtual IP address must be hidden. Since every machine is configured with the same vaddr, gratuitous ARP announcements would cause massive IP address conflicts, so what we need is to hide these addresses and expose them only to the local protocol stack's route lookup for incoming packets. "Hiding" means keeping anyone outside the local stack from learning of the address. On Ethernet, an IP address makes itself known through ARP: to ping a LAN machine you must first ARP for it, and only after the reply carries back the destination MAC can the packet actually be sent. Therefore, as long as the virtual address is kept from answering ARP, it stays invisible. It is configured on loopback, and to stop it from responding to any external ARP request you only need to configure one kernel parameter -- arp_ignore. This parameter controls the reply policy for requests whose target IP is not configured on the NIC that received the ARP request. Since vaddr is configured on loopback while ARP requests always enter through ethX, the host answers no ARP request for it and broadcasts no gratuitous ARP for it either; the vaddr is thus a "useless" address, useless in the sense that it cannot be addressed from outside.
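As a concrete illustration, a small user-space sketch that flips the parameter through procfs (the same thing sysctl does); value 1 makes the kernel answer an ARP request only when the target IP is configured on the interface the request arrived on, which is exactly what keeps a loopback-bound vaddr silent:

#include <stdio.h>

int main(void)
{
	/* vaddr lives on loopback while ARP requests enter via ethX,
	 * so with arp_ignore = 1 the host never answers for vaddr */
	FILE *f = fopen("/proc/sys/net/ipv4/conf/all/arp_ignore", "w");

	if (!f) {
		perror("arp_ignore");
		return 1;
	}
	fputs("1\n", f);
	fclose(f);
	return 0;
}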
Which IP address does the real server listen on? It listens on vaddr, the vaddr configured on loopback. This address is invisible to every host except the balancer, because it answers no ARP requests (arp_ignore). You can still connect to a service on a real server via vaddr, provided a static route pointing at that specific real server is set up; in fact, with such a static route configured, ping works too. arp_ignore is set for fear that the real servers would answer ARP requests and broadcast ARP frequently; in many setups it is not strictly necessary.
[NAT mode]: This mode is easy to configure; no virtual IP addresses or ARP kernel parameters are needed. However, its performance is worse than direct routing mode, since there is simply more work to do. NAT mode is straightforward: rewrite the IP address and port to those of the real balancing target, so that the real service machines are hidden behind the balancer; the idea is the same as ordinary NAT.
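A user-space sketch of the rewrite NAT mode performs on the forward path; it mimics the effect of the DNAT step rather than quoting ip_vs_nat_xmit, and the incremental checksum update follows RFC 1624. All names here are illustrative:

#include <stdint.h>
#include <netinet/ip.h>
#include <netinet/tcp.h>

/* RFC 1624 incremental checksum update: replace one 16-bit word */
static void csum_replace2(uint16_t *sum, uint16_t old, uint16_t new16)
{
	uint32_t s = (uint16_t)~*sum + (uint16_t)~old + new16;

	s = (s & 0xffff) + (s >> 16);
	s = (s & 0xffff) + (s >> 16);
	*sum = ~s;
}

/* replace one 32-bit word by updating its two 16-bit halves */
static void csum_replace4(uint16_t *sum, uint32_t old, uint32_t new32)
{
	csum_replace2(sum, old >> 16, new32 >> 16);
	csum_replace2(sum, old & 0xffff, new32 & 0xffff);
}

/* DNAT: point the packet at the chosen real server (args in network order);
 * the TCP checksum covers a pseudo-header, so a daddr change touches it too */
void nat_rewrite(struct iphdr *iph, struct tcphdr *th,
		 uint32_t rs_addr, uint16_t rs_port)
{
	csum_replace4(&iph->check, iph->daddr, rs_addr);
	csum_replace4(&th->check, iph->daddr, rs_addr);
	csum_replace2(&th->check, th->dest, rs_port);
	iph->daddr = rs_addr;
	th->dest = rs_port;
}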
[Tunnel mode]: Tunnel mode re-encapsulates the data in a new IP packet. The result can be handed to a virtual NIC (with modified code) so that a user-space application performs the balancing, or it can be sent directly into an ipip tunnel.
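A sketch of the encapsulation itself, assuming only the fixed 20-byte outer header that IPIP (IP protocol 4) prepends; this illustrates the idea and is not the kernel's ip_vs_tunnel_xmit:

#include <string.h>
#include <stdint.h>
#include <arpa/inet.h>
#include <netinet/ip.h>

/* plain 16-bit one's-complement checksum over the outer header */
static uint16_t ip_csum(const void *buf, int len)
{
	const uint16_t *p = buf;
	uint32_t s = 0;

	while (len > 1) { s += *p++; len -= 2; }
	s = (s & 0xffff) + (s >> 16);
	s = (s & 0xffff) + (s >> 16);
	return ~s;
}

/* wrap an inner IP packet in an outer header addressed to the real server;
 * saddr/daddr are in network byte order, out must fit 20 + inner_len bytes */
int ipip_encap(uint8_t *out, const uint8_t *inner, int inner_len,
	       uint32_t saddr, uint32_t daddr)
{
	struct iphdr oh;

	memset(&oh, 0, sizeof(oh));
	oh.version  = 4;
	oh.ihl      = 5;
	oh.ttl      = 64;
	oh.protocol = IPPROTO_IPIP;		/* 4: IP-in-IP */
	oh.tot_len  = htons(sizeof(oh) + inner_len);
	oh.saddr    = saddr;
	oh.daddr    = daddr;
	oh.check    = ip_csum(&oh, sizeof(oh));

	memcpy(out, &oh, sizeof(oh));
	memcpy(out + sizeof(oh), inner, inner_len);
	return sizeof(oh) + inner_len;
}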
Scheduling Algorithms:
[Round-robin algorithm]: the servers take turns providing service, one after another...
[Weighted...]: the kernel is told, via configuration, the "capability" of each server; instead of treating all servers equally, it tries to let the servers with higher processing capacity handle as many requests as possible (see the sketch after this list).
[... algorithms]: (scheduling is essentially the same problem as process scheduling; omitted)
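A minimal user-space sketch of weighted selection in the spirit of ipvs's ip_vs_wrr scheduler, which walks the pool while lowering a current-weight threshold by the gcd of all weights; the server names and weights are made up:

#include <stdio.h>

struct rs { const char *name; int weight; };

static struct rs pool[] = { { "rs1", 4 }, { "rs2", 2 }, { "rs3", 1 } };
#define POOL_N (int)(sizeof(pool) / sizeof(pool[0]))

static int gcd2(int a, int b) { while (b) { int t = a % b; a = b; b = t; } return a; }

/* pick the next server: heavier servers pass the threshold more often,
 * so each receives requests in proportion to its weight */
static struct rs *wrr_next(void)
{
	static int i = -1, cw = 0;
	int g = pool[0].weight, maxw = pool[0].weight, k;

	for (k = 1; k < POOL_N; k++) {
		g = gcd2(g, pool[k].weight);
		if (pool[k].weight > maxw)
			maxw = pool[k].weight;
	}
	for (;;) {
		i = (i + 1) % POOL_N;
		if (i == 0) {
			cw -= g;
			if (cw <= 0)
				cw = maxw;
		}
		if (pool[i].weight >= cw)
			return &pool[i];
	}
}

int main(void)
{
	int n;

	for (n = 0; n < 7; n++)		/* one full cycle of 4+2+1 picks */
		printf("%s ", wrr_next()->name);
	printf("\n");			/* prints: rs1 rs1 rs1 rs2 rs1 rs2 rs3 */
	return 0;
}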
Key data structures:
[struct ip_vs_app]: represents an application type, that is, one load-balanced service. Its port and protocol describe the application-layer information, and the struct carries a large number of callback functions tied to the specific application.
[struct ip_vs_conn]: a connection between the load balancer and a real service provider. The balancer maintains many such connections for the same ip_vs_app at the same time; that is exactly the point of load balancing. caddr is the IP address of the client being served; vaddr is the balancer's IP address, which in direct routing mode is also the virtual IP bound to loopback on the real machines; daddr is the destination IP address, i.e. an address that actually routes to the real service provider. app binds the connection to its ip_vs_app, and packet_xmit is the transmit callback, implemented differently in each mode.
[struct ip_vs_service]: a service description. The ip_vs_conn above is one connection of an ip_vs_service. Its addr is the virtual IP address, i.e. the public service address; the service is not really provided at this address, packets must be routed on to another one. protocol and port have the same meaning as addr but carry the layer-4 information.
[struct ip_vs_protocol]: a layer-4 protocol descriptor. conn_schedule is its scheduling callback. Note: load-balancer scheduling is done per layer-4 protocol, while transmission is done per connection. A simplified sketch of these four structures follows.
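The relationships can be compressed into a simplified sketch; the fields are trimmed to the ones discussed above and the types are approximations, not the kernel's definitions:

#include <stdint.h>

struct sk_buff;				/* opaque here */
struct ip_vs_app;			/* one balanced application: port, protocol, callbacks */

struct ip_vs_conn {			/* one client <-> real-server connection */
	uint32_t caddr;			/* client IP being served */
	uint32_t vaddr;			/* virtual IP (also on loopback of real servers in DR mode) */
	uint32_t daddr;			/* IP that routes to the real server */
	struct ip_vs_app *app;		/* the application this connection belongs to */
	int (*packet_xmit)(struct sk_buff *skb, struct ip_vs_conn *cp);	/* per-mode transmit */
};

struct ip_vs_service {			/* the public face of a balanced service */
	uint32_t addr;			/* virtual IP: the public service address */
	uint16_t port;			/* layer-4 information */
	uint16_t protocol;
};

struct ip_vs_protocol {			/* per layer-4 protocol hooks */
	/* scheduling is per protocol; transmission is per connection */
	int (*conn_schedule)(struct sk_buff *skb, struct ip_vs_protocol *pp,
			     int *verdict, struct ip_vs_conn **cpp);
};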
Code:
[Registering hooks with the netfilter system]:
static struct nf_hook_ops ip_vs_in_ops = {
	.hook     = ip_vs_in,
	.owner    = THIS_MODULE,
	.pf       = PF_INET,
	.hooknum  = NF_IP_LOCAL_IN,
	.priority = 100,
};
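For completeness, a minimal sketch of how such an ops structure gets registered at module-init time, assuming the classic nf_register_hook() API of that era; the function name here is made up and the error handling abbreviated:

static int __init ip_vs_hooks_init(void)	/* illustrative name, not the kernel's */
{
	int ret;

	/* attach ip_vs_in to the LOCAL_IN point so locally delivered packets pass it */
	ret = nf_register_hook(&ip_vs_in_ops);
	if (ret < 0)
		printk(KERN_ERR "ip_vs: can't register local-in hook\n");
	return ret;
}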
NF_IP_LOCAL_IN means the packet's destination is the virtual server itself; the hook point is traversed from ip_local_deliver. ip_vs_in is the handler:
static unsigned int ip_vs_in(...)
{
	struct sk_buff *skb = *pskb;
	struct iphdr *iph;
	struct ip_vs_protocol *pp;
	struct ip_vs_conn *cp;
	...
	pp = ip_vs_proto_get(iph->protocol);	/* get the layer-4 protocol registered for balancing */
	if (unlikely(!pp))			/* none registered: accept the packet conventionally */
		return NF_ACCEPT;

	ihl = iph->ihl << 2;

	cp = pp->conn_in_get(skb, pp, iph, ihl, 0);	/* does the packet belong to an established ip_vs_conn? */
	if (unlikely(!cp)) {
		int v;					/* no ip_vs_conn found: create a new one */
		if (!pp->conn_schedule(skb, pp, &v, &cp))	/* schedule, i.e. initialize cp */
			return v;
	}
	...
	restart = ip_vs_set_state(cp, IP_VS_DIR_INPUT, skb, pp);
	if (cp->packet_xmit)			/* transmit on the new or existing connection */
		ret = cp->packet_xmit(skb, cp, pp);
	else {
		IP_VS_DBG_RL("warning: packet_xmit is null");
		ret = NF_ACCEPT;
	}
	...
	return ret;
}
Since data is sent out, reply packets are bound to come back. There is an asymmetry in the ipvs implementation: forward-direction packets are delivered locally and a balancing decision is made there, but the real server's response packets are simply forwarded, never delivered locally. The reason is that no SNAT is performed when a packet is sent toward the real server (in NAT mode what happens is mostly DNAT), so a response packet's destination IP and port are still the original client's; when it comes back through the balancer, the balancer sees a destination address that is not its own and forwards it. This SNAT-less, asymmetric implementation improves efficiency, but such a framework must be designed with care, because the balancer is effectively bypassed: traffic from the real server to the client need not pass through the balancer at all, which is dangerous. Had ip_vs_in changed the port with DNAT, data returning to the client without crossing the balancer would be garbled (this cannot happen in ipvs, since direct routing mode changes neither IP nor port information), and in any case the situation can be avoided by configuration. The following handles the reverse packets:
static struct nf_hook_ops ip_vs_out_ops = {
	.hook     = ip_vs_out,
	.owner    = THIS_MODULE,
	.pf       = PF_INET,
	.hooknum  = NF_IP_FORWARD,
	.priority = 100,
};
static unsigned int ip_vs_out(...)
{
	struct sk_buff *skb = *pskb;
	struct iphdr *iph;
	struct ip_vs_protocol *pp;
	struct ip_vs_conn *cp;
	int ihl;

	if (skb->nfcache & NFC_IPVS_PROPERTY)
		return NF_ACCEPT;
	...
	pp = ip_vs_proto_get(iph->protocol);
	if (unlikely(!pp))		/* ordinary packet: a reverse packet not subject to balancing */
		return NF_ACCEPT;

	cp = pp->conn_out_get(skb, pp, iph, ihl, 0);

	if (unlikely(!cp)) {
		...			/* ordinary packet: a reverse packet not subject to balancing */
		return NF_ACCEPT;
	}
	...	/* below, snat_handler is called in every case; direct routing and NAT
		   mode are handled uniformly, a consequence of the asymmetric design */
	if (pp->snat_handler && !pp->snat_handler(pskb, pp, cp))
		goto drop;

	skb = *pskb;
	skb->nh.iph->saddr = cp->vaddr;	/* the source IP is rewritten unconditionally; this is
					   unnecessary for direct routing mode, again a result
					   of the asymmetric design */
	...
}
For TCP, tcp_conn_schedule is the conn_schedule function of ip_vs_protocol:
static int tcp_conn_schedule(struct sk_buff *skb,
			     struct ip_vs_protocol *pp,
			     int *verdict, struct ip_vs_conn **cpp)
{
	struct ip_vs_service *svc;
	struct tcphdr _tcph, *th;
	...	/* find the TCP ip_vs_service, keyed by the virtual service's IP (vaddr) and port */
	if (th->syn &&
	    (svc = ip_vs_service_get(skb->nfmark, skb->nh.iph->protocol,
				     skb->nh.iph->daddr, th->dest))) {
		...
		*cpp = ip_vs_schedule(svc, skb);	/* start scheduling */
		...
	}
	return 1;
}
struct ip_vs_conn *ip_vs_schedule(struct ip_vs_service *svc, const struct sk_buff *skb)
{
	struct ip_vs_conn *cp = NULL;
	struct iphdr *iph = skb->nh.iph;
	struct ip_vs_dest *dest;
	__u16 _ports[2], *pptr;
	...
	if (svc->flags & IP_VS_SVC_F_PERSISTENT)
		return ip_vs_sched_persist(svc, skb, pptr);

	if (!svc->fwmark && pptr[1] != svc->port) {
		...
	}

	dest = svc->scheduler->schedule(svc, skb);	/* pick an available real server per the scheduling algorithm */
	...
	cp = ip_vs_conn_new(iph->protocol,		/* set up a connection to the real server */
			    iph->saddr, pptr[0],
			    iph->daddr, pptr[1],
			    dest->addr, dest->port ? dest->port : pptr[1],
			    0,
			    dest);			/* ip_vs_bind_dest updates the chosen server's load */
	if (cp == NULL)
		return NULL;

	ip_vs_conn_stats(cp, svc);			/* update counters */
	return cp;
}
In direct routing mode, the packet_xmit of a connection is ip_vs_dr_xmit:
#define IP_VS_XMIT(skb, rt)				\
do {							\
	(skb)->ipvs_property = 1;			\
	(skb)->ip_summed = CHECKSUM_NONE;		\
	NF_HOOK(PF_INET, NF_IP_LOCAL_OUT, (skb), NULL,	\
		(rt)->u.dst.dev, dst_output);		\
} while (0)
int ip_vs_dr_xmit(struct sk_buff *skb, struct ip_vs_conn *cp,
		  struct ip_vs_protocol *pp)
{
	...
	if (!(rt = __ip_vs_get_out_rt(cp, RT_TOS(iph->tos))))
		goto tx_error_icmp;
	...
	dst_release(skb->dst);
	skb->dst = &rt->u.dst;	/* install the route to the real server as the packet's new dst */
	...
	IP_VS_XMIT(skb, rt);	/* transmit */
	...
}
[Scheduling algorithm process]: simply choose a real server according to the configured policy and hand the data over. One could modify the code so that each real server periodically reports its own load to the balancer; for that, it is best to have a user-mode process cooperate, as sketched below.
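As a toy illustration of that suggestion (not part of ipvs), a real server could push its load average to the balancer over UDP from user space; the balancer address, port, and message format are invented for the example:

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <arpa/inet.h>
#include <sys/socket.h>

int main(void)
{
	int fd = socket(AF_INET, SOCK_DGRAM, 0);
	struct sockaddr_in lb = { 0 };
	double load;
	char msg[64];

	lb.sin_family = AF_INET;
	lb.sin_port = htons(9999);				/* made-up collector port */
	inet_pton(AF_INET, "192.168.1.1", &lb.sin_addr);	/* made-up balancer IP */

	for (;;) {
		/* 1-minute load average as a crude capacity signal */
		if (getloadavg(&load, 1) == 1) {
			int n = snprintf(msg, sizeof(msg), "load %.2f\n", load);
			sendto(fd, msg, n, 0, (struct sockaddr *)&lb, sizeof(lb));
		}
		sleep(5);					/* report every five seconds */
	}
}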
[General process]:
1. A packet arrives.
2. It enters the virtual service.
3. Look up an existing connection to a real server.
4. If one is found, go to 5.
5. Transmit; done.
6. If none is found:
7. Select a real server, initialize a connection to it, and go to 5.
8. Reverse data is matched to the existing connection and processed.
Step 5 by mode:
5.1 NAT mode: rewrite the destination address to the real server's address; the packet is modified.
5.2 Direct routing mode: look up a route directly, using the IP address of the real server's physical NIC as the key; the packet is unchanged.
[Conclusion]: ipvs implements a one-to-many mapping, something that one-to-one NAT technology cannot do. Still, ipvs is "almost" connection-based load balancing; by setting the connection timeout to a small value you can flexibly approximate balancing over multiple packets, but either way it is not a fully packet-based load balancing solution.
