A bug that is not a bug: ICMP redirect routing

Source: Internet
Author: User

The first network problem I ran into at my new company was about redirect routes. This easily overlooked problem took me a whole afternoon to sort out, so this article describes how the Linux protocol stack treats redirect routes.
How route entries are generated
Any network-capable device keeps a routing table that indicates how a packet leaves the device and where its next stop is. A routing table consists of entries, each called a route entry, and a route entry can be generated in the following ways:
1. Automatically discovered route entries
When a NIC comes up and is configured with an IP address, the kernel automatically generates a link-layer route that matches the address and prefix. Because everything from the link to the nexthop can be determined immediately, this process requires no manual intervention and no routing protocol.
For example, if eth0 is configured with 192.168.41.150/24, the kernel stack computes the network segment 192.168.41.0/24 from that address. By the IP specification, the address held by eth0 belongs to that same segment, so packets destined for the segment need no nexthop and are handed straight to the link layer. All access to the segment is "direct", and the route can be generated without outside help:
192.168.41.0/24 dev eth0
This means that all traffic to 192.168.41.0/24 is sent directly through eth0. In practice, the host simply ARPs for the destination address itself.
This routing entry is automatically deleted when the IP address of the NIC is deleted or changed.
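The derivation of the connected route from the interface address can be sketched in a few lines. This is only an illustration using Python's standard ipaddress module, not the kernel's code; the function name connected_route is made up for the example.

```python
import ipaddress

def connected_route(ifname, cidr):
    """Return the directly connected route implied by an interface address."""
    iface = ipaddress.ip_interface(cidr)
    # The network segment is the address with its host bits cleared.
    return f"{iface.network} dev {ifname}"

print(connected_route("eth0", "192.168.41.150/24"))
```

Run with the address from the example, it prints the same 192.168.41.0/24 dev eth0 entry shown above.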
2. Statically configured route entries
If the destination address does not belong to any segment on which a local NIC holds an address, the stack has no way of knowing where to send the packet, and outside intervention is required. It comes in two kinds: manual intervention and protocol intervention.
Manual intervention means using the system's built-in commands to configure a route by hand, telling the protocol stack what the next hop is for a destination address or segment so that the packet can move forward. Because IP forwards hop by hop, the next-hop host must be directly connected to the local machine. The next hop's IP address, however, is not required to be in the same IP segment as any local NIC (a forced onlink route allows this). So if the only NIC, eth0, has 192.168.41.150/24 and the host needs to reach 1.2.3.0/24, you must specify a nexthop that is either in the same segment as 192.168.41.150/24, or in a different segment but directly connected at the link layer:
1.2.3.0/24 nexthop 192.168.41.254 dev eth0 - a nexthop in the same network segment
1.2.3.0/24 dev eth0 force onlink - a nexthop not in the same segment, but guaranteed to be directly attached
This route entry is deleted manually by the administrator. It is also deleted when the local IP address that makes the nexthop reachable is deleted or changed, or when the corresponding NIC goes down.
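The rule for accepting a static nexthop can be sketched as follows. This is a simplified model, not the kernel's validation code: the function name nexthop_acceptable and the address 10.0.0.1 are made up for illustration, and the onlink argument stands for the administrator's promise that the nexthop is directly attached.

```python
import ipaddress

def nexthop_acceptable(local_cidrs, nexthop, onlink=False):
    """Accept a gateway that lies in a connected segment, or one forced onlink."""
    gw = ipaddress.ip_address(nexthop)
    in_connected_segment = any(
        gw in ipaddress.ip_interface(cidr).network for cidr in local_cidrs
    )
    return in_connected_segment or onlink

local = ["192.168.41.150/24"]
print(nexthop_acceptable(local, "192.168.41.254"))        # same segment
print(nexthop_acceptable(local, "10.0.0.1"))              # foreign segment, rejected
print(nexthop_acceptable(local, "10.0.0.1", onlink=True)) # foreign segment, forced onlink
```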
3. Route entries generated by dynamic routing protocols
Besides manual configuration, route entries can also be generated by a dynamic routing protocol. A routing protocol injects intelligence into the whole network through an agreed-upon protocol so that the network can discover its own topology automatically, which is more flexible than static manual configuration. After all, an administrator's brain is not a dedicated store for network topology. He is a person who, outside of work, has social and family affairs to deal with, and even a workaholic cannot do nothing but remember and configure the network. So the job is better handed to a full-time performer, and that performer is the dynamic routing protocol. This article does not go into dynamic routing protocols in depth, to avoid getting sidetracked.
Once a dynamic routing protocol has converged, all network devices (routers and so on) agree on the network topology. Each device builds a map in memory and knows how to reach any other target, so the next hop is clear. The generated route entries are no different from manually configured ones.
The deletion mechanism for dynamic route entries is similar to that for static ones: the protocol can delete an entry, an administrator can delete it manually, and underlying events can delete it automatically. The difference is that when an entry is deleted or an underlying event occurs, such as hot-plugging a NIC, the routing protocol is triggered to recompute and reconverge.
4. Redirect route entries
There are always exceptions; that is an iron law. In daily life we often run into buck-passing: someone tells you to see A about a matter, you find A, and A tells you to go straight to B, because she would have to ask B anyway. In route forwarding the situation is simpler. Suppose the local host is S, the next-hop gateway configured on S is A, and A's next-hop gateway is B, while S, A and B are all in the same network segment and directly connected at the link layer. In the simplest case they hang off the same switch. The logic is simple: since S can reach B directly, why bother A, only for the packet to travel back over the same link on its way to B?
This is why redirect routes exist. A is the issuer of the redirect and S is the receiver that processes it. When S receives the redirect issued by A, it inserts a new route entry into the "routing table" (why the quotation marks? That is the topic of this article). The entry is typically a redirect for a specific IP address rather than a network segment. For example:
S's IP: 192.168.41.150/24
A's IP: 192.168.41.253/24, GW 192.168.41.254
B's IP: 192.168.41.254/24
An existing route in the system: 2.2.2.0/24 nexthop 192.168.41.253
S sends a packet to 2.2.2.100. The routing table lookup sends it to A first, and A then returns a redirect to S: please send directly to B.
After S receives the redirect, it generates the following redirect route:
2.2.2.100 nexthop 192.168.41.254 redirect
So is this route inserted directly into the system's routing table? If so, when is it deleted? After all, each of the first three kinds of route comes with a deletion mechanism from the moment it is inserted, so exactly when is a redirect route deleted? Below is my summary of how redirect routes get removed.
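The exchange between S, A and B can be sketched as below. This is a deliberately reduced model under stated assumptions: A is assumed to have only its default gateway B, the check "sender and nexthop are on the same segment" stands in for the router's real decision, and the function names are invented for the example.

```python
import ipaddress

SEGMENT = ipaddress.ip_network("192.168.41.0/24")
A_NEXTHOP = ipaddress.ip_address("192.168.41.254")   # B, A's own gateway

def redirect_from_a(src):
    """Return the gateway A would advertise in a redirect, or None."""
    sender = ipaddress.ip_address(src)
    # Sender and A's nexthop share the segment, so the detour via A is pointless.
    if sender in SEGMENT and A_NEXTHOP in SEGMENT:
        return A_NEXTHOP
    return None

def install_redirect(redirects, dst, gateway):
    """S records a host-specific entry: one destination address, not a prefix."""
    redirects[ipaddress.ip_address(dst)] = gateway

redirects = {}
gw = redirect_from_a("192.168.41.150")
if gw is not None:
    install_redirect(redirects, "2.2.2.100", gw)
for dst, via in redirects.items():
    print(f"{dst} nexthop {via} redirect")
```

The printed entry matches the redirect route in the example: it covers only 2.2.2.100, while the rest of 2.2.2.0/24 still goes through A.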
The Linux routing table
A standard routing table layout, route entry format, lookup algorithm and so on are all just passing clouds! As long as packet forwarding can locate the nexthop quickly, it is a success. A common industry practice is to generate a forwarding table from the routing table. The forwarding table can be injected into hardware for high-speed hardware forwarding, or it can take the form of a cache table that is much smaller (that is only the hope; in practice it is not necessarily so!) but more efficient to query. Large Cisco devices, for example, generally have their own line cards, with the routing table and the forwarding table completely separated and the forwarding table implemented entirely in hardware, as in CEF. A protocol stack implementation like Linux's is more flexible in its layout, and because of that flexibility the place where redirect routes are stored has kept changing. Such infrastructure changes inevitably introduce problems (like the TSO bug I found in the Linux NAT module in 2010 ...).
1. Taking 2.6.32 as an example
Because the kernel keeps changing, in small steps and in big strides, I cannot dig through every detail and can only sample coarsely the kernels I have used. In 2.6.32 and earlier kernels the protocol stack maintains a route cache, which you can regard as the "much smaller but more efficient" cache table mentioned above. Its contents are cached results of lookups in the kernel's standard routing table. The standard routing table has two compile-time implementations, a hash table and a trie, but whichever is used, theory and common sense both suggest that a one-dimensional hash cache is more efficient than a hash/trie table that has to perform longest prefix matching. That is why the route cache exists.
When a packet is to be sent, the route cache is searched first. On a hit there is no need to consult the standard routing table; on a miss the lookup falls through to the theoretically slower hash table or trie. As with any cache, this is a gamble: whether efficiency actually improves depends on the layout of the cache. As we will see later, unlike a CPU cache, IP routing offers temporal locality (for TCP streams, say) but no spatial locality, which opens the door to attacks on the route cache.
In this version of the kernel, redirect routes are stored in the route cache. This is a reasonable choice because the route cache has two removal mechanisms, manual flush and timeout expiration. Even if persistent traffic keeps a redirected route cache entry from ever expiring, it can still be flushed manually.
However, in this version, if the route cache is disabled, for example by setting net.ipv4.rt_cache_rebuild_count to -1, the redirect route can never take effect. You keep sending packets, the nexthop that receives them keeps declining to be the gateway and keeps sending redirects, yet because you have disabled the route cache you never accept the redirect, and there is no mechanism to stop this disruptive behavior. But this is not a real problem: administrators will notice the behavior and correct it in time (do not expect everything to be done automatically; that would be too complicated!). If you buy your manager a ¥18000 suit, send it cash on delivery and leave off the sender's name, it is the courier and the merchant who suffer, and ICMP cannot always sort that kind of thing out.
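The two removal mechanisms of the route cache can be illustrated with a toy model. This is a sketch under simplifying assumptions, not the 2.6.32 data structure: the cache is keyed by destination only, every use of an entry refreshes its timer, and the class and method names are invented for the example.

```python
class RouteCache:
    """Exact-match cache in front of a slower routing table lookup."""

    def __init__(self, slow_lookup, timeout):
        self.slow_lookup = slow_lookup   # stands in for the hash/trie table
        self.timeout = timeout
        self.entries = {}                # dst -> (nexthop, last_used)

    def lookup(self, dst, now):
        hit = self.entries.get(dst)
        if hit and now - hit[1] < self.timeout:
            nexthop = hit[0]             # cache hit, entry still fresh
        else:
            nexthop = self.slow_lookup(dst)
        self.entries[dst] = (nexthop, now)   # each use restarts the timer
        return nexthop

    def redirect(self, dst, gateway, now):
        """Store a redirect as an ordinary cache entry."""
        self.entries[dst] = (gateway, now)

    def flush(self):
        self.entries.clear()

cache = RouteCache(lambda dst: "192.168.41.253", timeout=10)
cache.redirect("2.2.2.100", "192.168.41.254", now=0)
# Steady traffic every 5 time units keeps the redirected entry alive...
busy = [cache.lookup("2.2.2.100", now=t) for t in range(5, 50, 5)]
# ...but a manual flush removes it regardless.
cache.flush()
after_flush = cache.lookup("2.2.2.100", now=50)
print(set(busy), after_flush)
```

In this model an idle entry ages out on its own, a busy entry does not, and flush works in both cases, which is the property the text relies on.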
2. Taking 3.0.1 as an example
In this version the handling of redirect routes moved house. Perhaps by then it was clear that the route cache was on its way out and an early evacuation was wise, so redirect routes were hurriedly moved from the route cache into inet_peer.
First, a brief explanation of inet_peer. inet_peer is an end-to-end concept, unlike the hop-by-hop forwarding of IP routing: it records the remote endpoint that the local machine is communicating with. For example, when you chat over QQ from your office with a friend in the United States, the peer is your friend's machine in the United States, whereas the hop-by-hop nexthop may be your company's egress router or even the egress gateway of your department's VLAN. So what can be stored in inet_peer? Its contents have changed in almost every kernel version, but the guiding principle is end-to-end semantics: anything that is meaningful end to end can be stored there. A typical example is TCP information (timestamp-related; one inet_peer stands for all TCP connections between this machine and the peer machine!). Storing the redirect route there is also reasonable, since it records "where the next stop is on the way to this peer".
With redirect routes in their new home, disabling the route cache no longer affects them. Is there a problem? Apparently not, because inet_peer has its own deletion mechanism: timeout aging, which is its only removal mechanism. No manual flush is provided because stateful protocols such as TCP also use inet_peer, and brute-force removal would affect connection state. But is timeout aging alone enough?
Trying to justify it to myself, I would say it is enough: as long as some removal mechanism exists, that will do! The question then is how to make an inet_peer expire. It expires easily as long as it stays idle for long enough. But if traffic keeps looking up the route, each lookup performs a get_peer operation, and since IP routing is stateless, the inet_peer will never expire as long as this goes on. So the problem becomes: to make the inet_peer expire you must, for example, block the traffic to the peer (only locally originated traffic, not forwarded traffic!). At first I wanted to block outbound traffic to the peer with an iptables DROP rule. However, the OUTPUT chain is invoked after the IP route lookup, so that idea had to be dropped! You can only stop, at the application level, every service that sends to the peer.

So far this may be a little messy, so let me summarize the route lookup process in version 3.0.1.


R1 - route cache
R2 - standard routing table
R3 - redirect route saved in inet_peer

R1 = look up route cache
if R1 misses
    R2 = look up routing table
    if R2 misses
        discard
    else
        R3 = look up inet_peer
        if R3 misses
            use R2
        else
            use the redirect route saved in R3!
else
    use R1



The picture is now clear at a glance! R3 has one and only one removal mechanism, expiration; there is no flush. So to delete an inet_peer you first have to block all traffic to that peer, then use sysctl to set the inet_peer timeout parameters to their minimum, manually flush the route cache, and wait for the timeout to delete the inet_peer. Only then do lookups fall back to the standard routing table, and the blocked traffic can be released. The process is so complicated that anyone unfamiliar with the kernel implementation has no idea what is going on, which is what makes it look like a bug.
A patch from the community solves this problem. The idea is that calling ip route flush cache should also mark the redirect nexthop saved in inet_peer as unavailable. The patch adds two monotonically increasing counters, one global and one a field of inet_peer. The redirect nexthop saved in inet_peer is usable only when the two are equal. The logic becomes the following:
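The 3.0.1 lookup order, and the reason flushing the route cache does not help, can be modelled roughly as follows. This is a toy model, not kernel code: the peer table is a plain dict, aging is reduced to a single idle timeout peer_ttl, and the names are invented for the example.

```python
def lookup_3_0(dst, now, cache, table_lookup, peers, peer_ttl):
    """R1 (route cache), then R2 (routing table), then R3 (inet_peer redirect)."""
    if dst in cache:                      # R1 hit
        return cache[dst]
    r2 = table_lookup(dst)                # R2
    if r2 is None:
        return None                       # no route: discard
    peer = peers.get(dst)                 # R3
    if peer and now - peer["last_used"] >= peer_ttl:
        del peers[dst]                    # idle long enough: aged out
        peer = None
    if peer:
        peer["last_used"] = now           # the lookup itself keeps the peer alive
        result = peer["gw"]
    else:
        result = r2
    cache[dst] = result
    return result

cache = {}
peers = {"2.2.2.100": {"gw": "192.168.41.254", "last_used": 0}}
table = lambda dst: "192.168.41.253"

seen = []
for t in (5, 10, 15):
    cache.clear()                         # flushing R1 does not touch R3
    seen.append(lookup_3_0("2.2.2.100", t, cache, table, peers, peer_ttl=10))

cache.clear()                             # traffic stopped, then one late lookup
late = lookup_3_0("2.2.2.100", 100, cache, table, peers, peer_ttl=10)
print(seen, late)
```

While lookups keep arriving, every flush is followed by a lookup that finds the redirect again in the peer table; only after a long enough silence does the standard route come back.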


R1 - route cache
R2 - standard routing table
R3 - redirect route saved in inet_peer

R3.genid - ID field of the inet_peer
g_genid - global ID field

FLUSH CACHE:
g_genid++

SET REDIRECT ROUTE:
R3.gw = nexthop in ICMP redirect
R3.genid = g_genid

LOOKUP:
R1 = look up route cache
if R1 misses
    R2 = look up routing table
    if R2 misses
        discard
    else
        R3 = look up inet_peer
        if R3 misses
            use R2
        else if R3.genid is not equal to g_genid
            use R2
        else
            use the redirect route R3.gw saved in R3!
else
    use R1



This solves the problem neatly: calling ip route flush cache invalidates all previously learned redirect routes.
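The effect of the generation counter can be sketched like this. It is a simplified model of the idea, not the patch itself: the class name is invented, and dropping the cached result when a redirect arrives is an assumption made so the example stays short.

```python
class PatchedStack:
    """Route cache (R1), routing table (R2) and inet_peer redirects (R3) with genid."""

    def __init__(self, table_lookup):
        self.table_lookup = table_lookup
        self.cache = {}
        self.peers = {}        # dst -> (gw, genid at the time the redirect arrived)
        self.g_genid = 0

    def flush_cache(self):
        self.cache.clear()
        self.g_genid += 1      # every stored redirect is now out of date

    def set_redirect(self, dst, gw):
        self.peers[dst] = (gw, self.g_genid)
        self.cache.pop(dst, None)   # assumption: forget the stale cached result

    def lookup(self, dst):
        if dst in self.cache:
            return self.cache[dst]
        r2 = self.table_lookup(dst)
        if r2 is None:
            return None
        r3 = self.peers.get(dst)
        # The redirect is usable only if no flush happened since it was stored.
        result = r3[0] if r3 and r3[1] == self.g_genid else r2
        self.cache[dst] = result
        return result

stack = PatchedStack(lambda dst: "192.168.41.253")
stack.set_redirect("2.2.2.100", "192.168.41.254")
before = stack.lookup("2.2.2.100")
stack.flush_cache()
after = stack.lookup("2.2.2.100")
print(before, after)
```

Note that the stale entry is never deleted by the flush; it is merely ignored until it ages out or is overwritten by a new redirect.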
3. Kernel 3.17 and later
This version has no route cache; in fact the route cache had been retired well before this version. A route cache entry is keyed strictly on the IP address pair, and IP routing behavior has no spatial locality. If an attacker forges source or destination IP addresses, the route cache hash table can become badly skewed. Even if the total table size is capped, the cache may miss frequently and be rebuilt continuously, resulting in a "not so severe" DoS. Even without attack traffic, poorly planned IP addressing can skew the route cache hash table, so in the end the route cache was dropped. The cost of a miss in such a software cache is in fact high, because the software cache and the normal hash/trie lookup use the same CPU and therefore compete for the same resource. By contrast, the cost of a miss in a hardware cache stays within a constant bound, because the hardware cache and the slow path use different resources: a hardware cache generally has its own dedicated logic, which is several orders of magnitude faster than a lookup on the CPU.
Now to the point: where is the redirect route stored? Does it stay in inet_peer? That would have been perfectly possible, but it moved again, to a structure called fib_nh_exception. The way redirect route entries keep moving reminds me of myself, constantly changing cities: Harbin, Changchun, Zhengzhou, Shanghai, Shenzhen ... Every move has a reason that seems compelling but is really just an excuse.
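The lack of spatial locality is easy to see with a small count. The sketch below assumes an exact-match cache keyed on the (source, destination) pair, as described above; the function name and the fixed nexthop value are only for illustration.

```python
import ipaddress

def exact_match_entries(src, prefix):
    """Build the cache entries created by touching every host in one prefix."""
    cache = {}
    for dst in ipaddress.ip_network(prefix).hosts():
        # One routing table entry covers the prefix, but the cache needs one
        # entry per distinct address pair.
        cache[(src, str(dst))] = "192.168.41.253"
    return cache

entries = exact_match_entries("192.168.41.150", "2.2.2.0/24")
print(len(entries))   # 254 cache entries for a single /24 route
```

A sender that varies addresses, whether maliciously or not, therefore inflates the cache without ever producing a repeat hit.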

After the redirect route moved from inet_peer to fib_nh_exception, the execution logic changed very little:


R2 - standard routing table
R3 - redirect route saved in fib_nh_exception

R3.genid - ID field of the fib_nh_exception
nsg_genid - global ID field of the net namespace

FLUSH CACHE:
nsg_genid++

SET REDIRECT ROUTE:
R3.gw = nexthop in ICMP redirect
R3.genid = nsg_genid

LOOKUP:
R2 = look up routing table
if R2 misses
    discard
else
    R3 = look up fib_nh_exception
    if R3 misses
        use R2
    else if R3.genid is not equal to nsg_genid
        use R2
    else
        use the redirect route R3.gw saved in R3!



As you can see, inet_peer has been replaced by fib_nh_exception, and net namespace support has been added.
After this move, operation is almost unchanged. The redirect route still behaves as it did on a 3.0 kernel with the redirect patch applied: when a cache flush is executed, the genid of the namespace is incremented, so even if R3 hits, it is unusable because its genid no longer equals nsg_genid.
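What the per-namespace counter adds is isolation: a flush in one namespace leaves redirects in another untouched. A minimal sketch of that property, with an invented NetNamespace class standing in for the real structures:

```python
class NetNamespace:
    """Routing table plus redirect exceptions, validated by a per-namespace genid."""

    def __init__(self, table_lookup):
        self.table_lookup = table_lookup
        self.exceptions = {}   # dst -> (gw, genid when stored)
        self.nsg_genid = 0

    def flush_cache(self):
        self.nsg_genid += 1    # affects only this namespace

    def set_redirect(self, dst, gw):
        self.exceptions[dst] = (gw, self.nsg_genid)

    def lookup(self, dst):
        r2 = self.table_lookup(dst)
        if r2 is None:
            return None
        r3 = self.exceptions.get(dst)
        return r3[0] if r3 and r3[1] == self.nsg_genid else r2

ns1 = NetNamespace(lambda dst: "192.168.41.253")
ns2 = NetNamespace(lambda dst: "192.168.41.253")
for ns in (ns1, ns2):
    ns.set_redirect("2.2.2.100", "192.168.41.254")

ns1.flush_cache()
print(ns1.lookup("2.2.2.100"), ns2.lookup("2.2.2.100"))
```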

The question of rationality
The 3.0 kernel did not consider flushing, so a redirect route could not be removed on demand and could only be deleted by the inet_peer timeout. After the patch, the genid made it possible to invalidate redirect route entries when the cache is flushed. Later, after the route cache was retired, redirect routes moved to fib_nh_exception. As the name suggests, a redirect route is an "exception", an orphan of sorts, and in later versions a manual flush is almost the only way to get rid of it. There is a metaphysical question of who should delete a redirect route. The best answer would seem to be that whoever added it should delete it. But although the system adds the route, the root cause is a configuration error by the administrator or by a routing protocol. The probability that a dynamic routing protocol causes a redirect is almost zero, because a redirect is aimed at a host, that is, at locally originated packets. So the responsibility falls entirely on whoever configured the static route, be it the network administrator or the system administrator.
For now I lean toward keeping manual removal as the only way to delete redirect routes, with no removal by timeout. Keeping a single deletion entry point makes operations and troubleshooting easier. As for whether the inability to flush redirect routes manually on the Linux 3.0 kernel is a bug: from where I stood, it is a bug, but since the redirect route can still be deleted by blocking traffic and letting it expire, it is not a bug. What matters is that when a problem comes up, solving it counts as success. Everything else is passing clouds, to be left for the scholars in the philosophy department to debate ...
