In-depth analysis of SO_DONTROUTE and SO_BINDTODEVICE

Source: Internet
Author: User

SO_DONTROUTE does not skip the route table lookup; it only narrows the lookup to hosts directly connected on the same layer-3 network segment. SO_BINDTODEVICE does not skip the lookup either; it only fixes the egress device, which amounts to adding one more search key. In essence SO_DONTROUTE adds a search key as well. In the Linux protocol stack the route table lookup cannot be bypassed, but restrictions can be added to it. Take the hash route table as an example. In the fn_hash_lookup function:
    if (f->fn_scope < flp->fl4_scope)    // check the route scope (**)
        continue;
    err = fib_semantic_match(f->fn_type, FIB_INFO(f), flp, res);
    if (err == 0) {
        // found
    }
In fib_semantic_match:
    if (!flp->oif || flp->oif == nh->nh_oif)    // (***)
        break;    // found
As these core lookup functions show, SO_DONTROUTE and SO_BINDTODEVICE only impose restrictions on the lookup. None of the callers at any level above these functions bypasses the route lookup logic.
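The effect of the two checks can be illustrated with a small user-space model. This is only a sketch, not kernel code: the struct names, the sample table, and `model_demo` are hypothetical, and the table is assumed to be ordered from the most specific prefix to the least specific. The scope values 0 and 253 are the ones Linux uses for the universe and link scopes.

```c
#include <assert.h>
#include <stddef.h>

/* Scope values as used by Linux for the universe and link scopes. */
enum { MODEL_SCOPE_UNIVERSE = 0, MODEL_SCOPE_LINK = 253 };

/* Hypothetical, simplified stand-ins for a FIB entry and a lookup key. */
struct model_route {
    unsigned int prefix;  /* network address, host byte order */
    unsigned int mask;    /* netmask */
    int scope;            /* scope of the route entry */
    int oif;              /* egress interface index */
};

struct model_key {
    unsigned int dst;     /* destination address */
    int scope;            /* MODEL_SCOPE_LINK when SO_DONTROUTE is set */
    int oif;              /* nonzero when SO_BINDTODEVICE is set */
};

/* Return the index of the first matching entry, or -1 if none matches. */
int model_lookup(const struct model_route *tbl, size_t n,
                 const struct model_key *key)
{
    assert(tbl != NULL && key != NULL);
    for (size_t i = 0; i < n; i++) {
        if ((key->dst & tbl[i].mask) != tbl[i].prefix)
            continue;                   /* prefix does not match */
        if (tbl[i].scope < key->scope)
            continue;                   /* the scope check, (**) */
        if (key->oif && key->oif != tbl[i].oif)
            continue;                   /* the device check, (***) */
        return (int)i;
    }
    return -1;
}

/* Sample table: an on-link subnet, a gatewayed network, a default route. */
static const struct model_route model_table[] = {
    { 0xC0A80100u, 0xFFFFFF00u, MODEL_SCOPE_LINK,     2 }, /* 192.168.1.0/24 */
    { 0x0A000000u, 0xFF000000u, MODEL_SCOPE_UNIVERSE, 2 }, /* 10.0.0.0/8 */
    { 0x00000000u, 0x00000000u, MODEL_SCOPE_UNIVERSE, 3 }, /* default */
};

/* Look up dst in the sample table with the given key scope and device. */
int model_demo(unsigned int dst, int scope, int oif)
{
    struct model_key key = { dst, scope, oif };
    return model_lookup(model_table,
                        sizeof model_table / sizeof model_table[0], &key);
}
```

In this model, a lookup for 10.1.2.3 with the link scope in the key finds nothing, because every entry covering that address has the universe scope. The same lookup with the universe scope and device 3 in the key skips the 10.0.0.0/8 entry and lands on the default route. The lookup loop always runs; the key only filters what it may return.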
During normal transmission no egress device is specified when the route lookup starts; the egress device is determined by the matching route table entry. Once SO_BINDTODEVICE has been set through setsockopt, the egress device is known before the route query. ip_queue_xmit fills in the search key with fl.oif = sk->sk_bound_dev_if (sk->sk_bound_dev_if is 0 when SO_BINDTODEVICE is not set), the route query proceeds as usual, and when it reaches fib_semantic_match the key takes effect at (***). The SO_DONTROUTE option works the same way, taking effect at (**). The Linux kernel protocol stack defines the route scope as an enumeration with five values:
    enum rt_scope_t
    {
        RT_SCOPE_UNIVERSE = 0,    // route to any address
        RT_SCOPE_SITE = 200,      // user-defined
        RT_SCOPE_LINK = 253,      // directly connected (on-link) route
        RT_SCOPE_HOST = 254,      // host route
        RT_SCOPE_NOWHERE = 255    // the route does not exist
    };
The values increase as the scope narrows. A route entry is skipped whenever its scope value is smaller than the scope in the lookup key, so the larger the key's scope, the harder it is to find a match. SO_DONTROUTE sets the key's scope to RT_SCOPE_LINK. If the target host is external, the scope of its entry in the route table is RT_SCOPE_UNIVERSE, so no match is found. In other words, once SO_DONTROUTE is set, a destination that is not on-link cannot be found even if a route to it exists.
SO_BINDTODEVICE carries a constraint of its own: a packet must come back on the device it went out on. If the reply does not arrive on the egress device, it cannot be delivered to user space. This is enforced by inet_match. After __inet_lookup is entered, the goal is to find the socket associated with the packet, and inet_match is called on each socket in the hash collision chain. inet_match contains the following condition:
    (!((__sk)->sk_bound_dev_if) || ((__sk)->sk_bound_dev_if == (__dif)))
For UDP the same logic exists in udp_v4_lookup_longway, which ensures that packets return the way they went out.
Given these characteristics, SO_BINDTODEVICE can be used for load balancing, provided that the traffic to be balanced originates on the local machine rather than being forwarded. The reason is that ip_route_input_slow sets the egress device in the route lookup key to 0, and the route cache search in ip_route_input requires rth->fl.oif == 0, which means the egress device must not be set. To balance forwarded traffic you must therefore first redirect it to user space on the local machine and create multiple sockets (as many as there are network adapters to balance across), each bound to one egress NIC. The data is then proxied through those sockets according to load. This requires a machine with enough performance that the benefit of load balancing far outweighs the overhead of the redirect.
Load balancing is always an important topic, especially in the mixed Linux world, where some hosts have superb performance and others are antique 386 machines. Because of the kernel's routing cache, Linux cannot do per-packet load balancing, so without modifying the protocol stack the solution has to be found in user space. One way is the redirect described above. Another is a virtual network card: use its character device interface to bring layer-3 or layer-2 data into user space, then use SO_BINDTODEVICE to spread it across the physical network cards. That is only half the job, however. What the virtual network card's character device delivers is not user-space payload, so sending it out through a socket amounts to tunnel encapsulation. The data should be decapsulated once it has crossed the balanced section of the path, which requires a symmetric host at the far end to do the decapsulation.
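The proxy's choice of socket "based on the load" could be as simple as picking the least-loaded egress. The sketch below assumes one socket has already been bound to each NIC with SO_BINDTODEVICE. The `egress` struct, the byte counter used as the load metric, and both function names are hypothetical, and the actual send call is left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-NIC state: one socket bound to each egress device. */
struct egress {
    int fd;                    /* socket already bound with SO_BINDTODEVICE */
    unsigned long bytes_sent;  /* running load counter */
};

/* Return the index of the least-loaded egress; ties go to the lowest index. */
size_t pick_egress(const struct egress *e, size_t n)
{
    assert(e != NULL && n > 0);
    size_t best = 0;
    for (size_t i = 1; i < n; i++)
        if (e[i].bytes_sent < e[best].bytes_sent)
            best = i;
    return best;
}

/* Charge len bytes to the chosen egress and return its index.
 * A real proxy would send the data on e[i].fd at this point. */
size_t account_send(struct egress *e, size_t n, size_t len)
{
    size_t i = pick_egress(e, n);
    e[i].bytes_sent += len;
    return i;
}
```

A byte counter is only one possible metric. Queue depth or measured link capacity would fit the same structure.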
All of this user-space balancing is slow, however. Is SO_DONTROUTE then of no further use? Not at all. The key is the egress device: as long as an egress device can be identified, data can be sent successfully from a socket with SO_DONTROUTE set. This can be seen in ip_route_output_slow:
    if (fib_lookup(&fl, &res)) {
        // Possible causes of a failed lookup: 1. no route table entry matches;
        // 2. an entry matches, but it is not an on-link route.
        res.fi = NULL;
        if (oldflp->oif) {
            // As long as there is an egress device, the packet can still be sent:
            // 1. set SO_BINDTODEVICE to bind a device;
            // 2. add a device route: ip route add IP/mask dev DEVICE
            if (fl.fl4_src == 0)
                fl.fl4_src = inet_select_addr(dev_out, 0, RT_SCOPE_LINK);
            res.type = RTN_UNICAST;
            goto make_route;
        }
        if (dev_out)
            dev_put(dev_out);
        err = -ENETUNREACH;
        goto out;
    }
Therefore SO_DONTROUTE does not skip the route lookup. It means "do not route the packet", not "do not search the route table": the packet will not pass through a router. A packet sent from a socket with SO_DONTROUTE set is never handed to the gateway, but the local route lookup cannot be avoided. SO_DONTROUTE is typically used when only the egress device is known and no route exists.
That is not the whole story, though: what about ARP? SO_DONTROUTE only cares whether there is an egress device, but what happens once the data reaches the link layer? Anyone familiar with the network stack knows that real communication requires a link-layer path, so ARP is necessary, and it proceeds as usual. If a host is directly connected to the sender, then as long as the target host has a route back, the ARP reply is received correctly even if the two configured IP addresses are not in the same CIDR block. This is the case SO_DONTROUTE is meant for.
Now suppose, as an experiment, that the destination is a non-existent IP address in a different CIDR block. Will anyone answer the ARP request? Without a reply no data can be sent; data flows only if something answers, and clearly the destination itself cannot answer. A reply can still arrive in two cases. In the first, the gateway runs proxy ARP. Although the sender does not route the data from the SO_DONTROUTE socket via the gateway, the gateway answers the ARP request (in effect a spoof), and the data still ends up at the gateway. In the second, the gateway does not run proxy ARP but enforces IP-MAC binding protection. It sees an odd ARP request: an internal host asking for the MAC address of an IP in a different network segment. Such a request is not illegal, but the gateway finds no entry for the destination IP among its bindings, considers the request suspect, and may answer with its own MAC address to draw the packet to itself. So unless you really have a direct connection to the target machine, the data will not get through, although the ARP traffic involved can be confusing.
SO_BINDTODEVICE only binds the socket to an interface, while SO_DONTROUTE simply does not send packets through a gateway. Whatever gateway is configured, it always uses the packet's destination address as the next hop.
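For completeness, here is a minimal sketch of turning SO_DONTROUTE on for a socket and reading it back. The helper names `set_dontroute` and `get_dontroute` are hypothetical. Unlike SO_BINDTODEVICE, this option normally needs no special privilege.

```c
#include <assert.h>
#include <sys/socket.h>

/* Enable (on != 0) or disable SO_DONTROUTE on fd.
 * Returns 0 on success, -1 with errno set on failure. */
int set_dontroute(int fd, int on)
{
    int val = on ? 1 : 0;
    return setsockopt(fd, SOL_SOCKET, SO_DONTROUTE, &val, sizeof val);
}

/* Read the option back: 1 if set, 0 if clear, -1 on error. */
int get_dontroute(int fd)
{
    int val = 0;
    socklen_t len = sizeof val;

    if (getsockopt(fd, SOL_SOCKET, SO_DONTROUTE, &val, &len) < 0)
        return -1;
    assert(len == sizeof val);   /* the option is an int flag */
    return val != 0;
}
```

The same behavior can be requested for a single message by passing the MSG_DONTROUTE flag to send or sendto instead of setting the option on the whole socket.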
