How the Linux kernel protocol stack caches routes in the socket lookup

Source: Internet
Author: User

Is a routing table lookup faster, or a socket hash table lookup? That is not the real question. The real question is how to use the two together effectively, so that they cooperate rather than compete. How does that work?

A packet destined for the local host goes through two lookups (leaving conntrack aside for now): the IP layer looks up the route, and the transport layer looks up the socket. How can the two be merged?

The Linux kernel protocol stack takes this approach: a dst field is added to the socket to cache the route. An incoming skb looks up its socket first, before the route lookup, and the dst cached on the socket is attached to the skb. When the routing code then runs, it finds that the skb already carries a dst and skips the routing table lookup.

So when is the socket's dst field set? When the first skb belonging to that socket arrives. That first skb does find the socket, but the socket's dst is still NULL at that point, so the skb goes through the ordinary routing table lookup, and the route entry it finds is then stored in the socket's dst field.
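The receive path described above can be modeled in a few lines of Python. This is only a sketch of the idea, not kernel code: the class and field names (Stack, Socket, Packet, rx_dst) are hypothetical, and the "route" is just a string standing in for a route entry.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

# (source IP, source port, destination IP, destination port)
FlowKey = Tuple[str, int, str, int]


@dataclass
class Socket:
    # Hypothetical stand-in for the dst field cached on the socket.
    rx_dst: Optional[str] = None


@dataclass
class Packet:
    key: FlowKey
    # Route attached to the packet (the skb's dst); None until resolved.
    dst: Optional[str] = None


class Stack:
    def __init__(self, sockets: Dict[FlowKey, Socket]):
        self.sockets = sockets
        self.route_lookups = 0  # counts full routing table lookups

    def route_lookup(self, pkt: Packet) -> str:
        self.route_lookups += 1
        return f"route to {pkt.key[2]}"  # placeholder for a route entry

    def receive(self, pkt: Packet) -> Optional[Socket]:
        # Early demux: look up the socket before routing.
        sk = self.sockets.get(pkt.key)
        if sk is not None and sk.rx_dst is not None:
            pkt.dst = sk.rx_dst  # reuse the route cached on the socket

        # IP layer: the routing lookup is skipped if dst is already set.
        if pkt.dst is None:
            pkt.dst = self.route_lookup(pkt)

        # First packet of the flow: remember the route on the socket.
        if sk is not None and sk.rx_dst is None:
            sk.rx_dst = pkt.dst
        return sk
```

With one established socket, three packets of the same flow trigger a single routing lookup in this model, while packets that match no socket pay for the routing lookup every time in addition to the failed socket lookup.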

This feature is called ip_early_demux in the Linux implementation, and the kernel documentation describes it as follows:

ip_early_demux - BOOLEAN
    Optimize input packet processing down to one demux for
    certain kinds of local sockets. Currently we only do this
    for established TCP sockets.

    It may add an additional cost for pure routing workloads that
    reduces overall throughput, in such case you should disable it.
    Default: 1
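On a running system this setting is exposed as the sysctl net.ipv4.ip_early_demux, that is, the file /proc/sys/net/ipv4/ip_early_demux. A minimal sketch for reading and changing it follows; the helper names are made up for illustration, and writing the real file requires root.

```python
from pathlib import Path

# procfs location of the net.ipv4.ip_early_demux sysctl.
SYSCTL_PATH = Path("/proc/sys/net/ipv4/ip_early_demux")


def early_demux_enabled(path: Path = SYSCTL_PATH) -> bool:
    """Return True unless the sysctl file contains 0."""
    return path.read_text().strip() != "0"


def set_early_demux(enabled: bool, path: Path = SYSCTL_PATH) -> None:
    """Write 1 or 0 to the sysctl file (needs root for the real path)."""
    path.write_text("1\n" if enabled else "0\n")
```

The path parameter is there so the helpers can be tried against an ordinary file before touching the live setting.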

For forwarded traffic this feature can only cost performance, because the extra socket lookup finds nothing for a packet that is not addressed to a local socket. That much is obvious, so I will not dwell on it. Instead I want to make two points:

1. Hierarchical cache logic

A route lookup is a many-to-one, best-match process: skbs and route entries do not correspond one to one, so a socket cannot be cached in a route entry. A route entry can be cached in a socket, however, because a socket and the skbs of its flow do correspond one to one (TCP listening sockets aside). For the same reason, a socket could be cached in a route cache entry, because route cache entries and skb flows also correspond one to one.

The Linux kernel has since removed the route cache, but that does not matter as long as the principle is clear: an exact, one-to-one match can cache the result of a looser, many-to-one match. Conntrack is another exact match, so the same reasoning applies. I once cached the route entry in the conntrack entry, which is reasonable by this logic. Likewise, the socket can be cached in the conntrack entry, and iptables already has a match and a target related to this.
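The asymmetry between the two kinds of match can be shown with a toy example. The routing table, the addresses and the socket labels below are invented for illustration, and the longest-prefix match is a naive linear scan, not how the kernel does it.

```python
import ipaddress

# Toy routing table: prefix -> route entry (labels are made up).
ROUTES = {
    ipaddress.ip_network("0.0.0.0/0"): "default via gateway",
    ipaddress.ip_network("10.0.0.0/8"): "dev eth1",
    ipaddress.ip_network("10.1.2.3/32"): "local",
}

# Toy established-socket table: exact 4-tuple -> socket.
SOCKETS = {
    ("192.0.2.1", 40000, "10.1.2.3", 80): "socket A",
    ("192.0.2.2", 40001, "10.1.2.3", 80): "socket B",
}


def longest_prefix_match(dst: str) -> str:
    """Many-to-one match: many destinations share one route entry."""
    addr = ipaddress.ip_address(dst)
    best = max((net for net in ROUTES if addr in net),
               key=lambda net: net.prefixlen)
    return ROUTES[best]


# Each exact-match key resolves to exactly one route, so the route can be
# stored alongside the socket.
route_by_socket = {
    sock: longest_prefix_match(key[2]) for key, sock in SOCKETS.items()
}

# The reverse does not work: one route entry serves several sockets, so
# there is no single socket to store in the route entry.
sockets_by_route = {}
for sock, route in route_by_socket.items():
    sockets_by_route.setdefault(route, []).append(sock)
```

Here route_by_socket has one route per socket, while sockets_by_route maps the single "local" entry to both sockets, which is why the cache only makes sense in one direction.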

2. Automatic or manual

Since Linux has the ip_early_demux setting, the question is when to turn it on and when to turn it off. It is harder to answer when you do not know how many packets are delivered locally and how many are forwarded. Do you trust the administrator to set it to 0 or 1, or do you let the system adapt dynamically?

Counting packets therefore matters. A typical rule: if more than 60% of packets are delivered locally, turn it on; otherwise turn it off. Having ip_early_demux as a global setting is not a bad thing, because otherwise a new problem appears: how to decide, per packet, whether it should go through early demux. On a border device that does not work at layer 7, traffic is generally split into management-plane and data-plane traffic. Management traffic terminates locally, while data-plane traffic is only forwarded. If packets can be classified into their plane cheaply and early, two separate ip_early_demux settings would be better, and for out-of-band management, Linux network namespaces do this well.
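One way to apply the 60% rule is to sample the IP counters in /proc/net/snmp twice and compare how many packets were delivered locally (InDelivers) with how many were forwarded (ForwDatagrams) in between. The sketch below only parses the text and makes the decision. The function names are hypothetical, and treating an idle interval as "keep the current setting" is an assumption, not something the article specifies.

```python
from typing import Dict, Optional


def parse_ip_counters(snmp_text: str) -> Dict[str, int]:
    """Parse the two 'Ip:' lines (names, then values) of /proc/net/snmp."""
    rows = [line.split() for line in snmp_text.splitlines()
            if line.startswith("Ip:")]
    names, values = rows[0][1:], rows[1][1:]
    return dict(zip(names, map(int, values)))


def local_share(before: Dict[str, int],
                after: Dict[str, int]) -> Optional[float]:
    """Fraction of packets delivered locally between two samples."""
    local = after["InDelivers"] - before["InDelivers"]
    forwarded = after["ForwDatagrams"] - before["ForwDatagrams"]
    total = local + forwarded
    return local / total if total else None


def want_early_demux(share: Optional[float], current: bool,
                     threshold: float = 0.6) -> bool:
    """Enable early demux when the local share exceeds the threshold."""
    if share is None:  # no traffic in the interval: leave it alone
        return current
    return share > threshold
```

In practice the two samples would come from reading /proc/net/snmp a few seconds apart, and the result would be written to the ip_early_demux sysctl. A real controller would probably also want some hysteresis so the setting does not flap when the share hovers around the threshold.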

    • This article is from: Linux Tutorial Network
