The Linux network protocol stack is implemented accurately and with real finesse; leaving netfilter aside, TC alone is proof enough. Still, it has several shortcomings. This article is an incomplete record of them. It is an informal essay, so do not take it too seriously.
0. Kinds of lookup
The Linux protocol stack is a pure software implementation. It keeps interfaces to hardware, but this article does not cover hardware.
Because nothing in the Linux protocol stack is fixed in hardware circuits, lookup algorithms are unavoidable: route lookup, neighbor lookup, conntrack lookup, socket lookup, and so on. The protocol stack is a shared facility for all packets; when a packet arrives, the processing logic has to find the data structures associated with it, so lookups are inevitable, even in hardware. There are, however, two types of lookup, and they affect performance differently.
0.1. Lookups that do not create an entry on a miss
Route lookup is an example: if no route entry is found, failure is returned directly and the packet is dropped. For such lookups, table entries are created and deleted by specific events (manual configuration, a NIC going up or down, and so on), not automatically. A hit and a miss cost about the same in performance; what differs is only how the protocol stack handles each outcome afterwards, so this article does not concern itself with such lookups.
0.2. Lookups that create an entry on a miss
Conntrack lookup and neighbor lookup belong to this class: if the lookup fails, a new table entry is created, so hits and misses have completely asymmetric effects on performance. A miss is very expensive. Even with an efficient hash algorithm, you must at least traverse the whole collision list of the bucket selected by the hash value before you know the lookup has failed, which on average is a significant cost. And discovering the failure is only the beginning: next comes allocating memory and creating the table entry, which is expensive in both time and space. The space cost is unavoidable, but I would like to detect the miss as quickly as possible, before the memory has to be allocated and the entry created.
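The asymmetry can be made concrete with a toy model. The following is only a sketch under my own assumptions, not kernel code: `ChainedTable`, its `comparisons` counter, and the dictionary used as an "entry" are all hypothetical names invented for illustration.

```python
class ChainedTable:
    """Toy hash table with per-bucket collision lists (hypothetical model)."""

    def __init__(self, nbuckets):
        self.buckets = [[] for _ in range(nbuckets)]
        self.comparisons = 0  # key comparisons performed so far

    def lookup_or_create(self, key):
        """Return (entry, created). A miss walks the whole chain first."""
        chain = self.buckets[hash(key) % len(self.buckets)]
        for entry in chain:
            self.comparisons += 1
            if entry["key"] == key:
                return entry, False  # hit: stop at the matching entry
        # Miss: every entry in the chain was compared before we got here.
        entry = {"key": key}  # stands in for allocation and initialization
        chain.append(entry)
        return entry, True


# Worst case for illustration: a single bucket holding 100 entries.
table = ChainedTable(nbuckets=1)
for k in range(100):
    table.lookup_or_create(k)

table.comparisons = 0
table.lookup_or_create(0)           # hit on the first entry in the chain
hit_cost = table.comparisons        # 1 comparison

table.comparisons = 0
_, created = table.lookup_or_create(1000)  # miss
miss_cost = table.comparisons       # 100 comparisons, then the entry is created
```

In this model the miss pays for the full chain walk and only then starts the creation work, which is exactly the cost I would like to avoid.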
0.3. Lookups between 0.1 and 0.2
TCP socket lookup falls between 0.1 and 0.2. For a lookup that ends at a socket in the LISTEN state, the goal is to create a client socket, but the stack must first make sure that no socket in the ESTABLISHED or TIME_WAIT (TW) state matches the specific TCP four-tuple. If there are a large number of TW sockets, it takes a lot of time to prove that "not a single one of all these TW sockets matches." It would be nice if that could be established quickly.
For tuples that match no listening socket, the lookup simply reports failure.
Next I go through several lookups in the Linux kernel protocol stack, without analyzing them in detail.
2. nf_conntrack lookup
Linux nf_conntrack has a lot of room for optimization. Tests show that once conntrack is added to the kernel stack, pps (packets per second) drops by half, and the maximum number of long-lived connections is also halved. For short-lived connections, the degradation is noticeable even when the individual timeouts are set very short.
The rate at which new conntrack entries can be created limits the rate of new connections, while the amount of memory conntrack may occupy and the lifetime of a conntrack entry limit the maximum number of connections. When a large number of conntrack entries is maintained and hashsize is not large enough, the hash collision lists become very long. Creating a new connection, that is, creating a new conntrack entry, then becomes extremely expensive, because a great deal of work is spent just discovering that the lookup failed, and only after that does the real work begin. It would be good if the miss could be detected quickly, before the entry is created.
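How long the collision lists get can be estimated with simple arithmetic. The sketch below assumes uniform hashing, which real traffic may not satisfy; `chain_cost` is a hypothetical helper and the numbers are made up for illustration, not measurements.

```python
def chain_cost(entries, hashsize):
    """Expected key comparisons per lookup in a chained hash table,
    assuming keys are spread uniformly over the buckets."""
    load = entries / hashsize                 # average collision-list length
    miss = load                               # a miss walks the whole list
    hit = 1 + (entries - 1) / (2 * hashsize)  # a hit stops about halfway
    return miss, hit


# Hypothetical numbers: 1024 tracked connections in 256 buckets.
miss, hit = chain_cost(entries=1024, hashsize=256)
# miss == 4.0, hit == 2.998046875
```

Under this assumption a miss costs roughly twice as many comparisons as a hit, and the gap grows linearly as the number of entries outgrows hashsize.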
3. Route cache lookup
Cache lookups are similar. Take the route cache: we know that route cache entries have an expiration time. If a router carries too much traffic, a large number of route entries get cached, so looking up the cache is itself a big expense and hash collisions are very likely. Spending all that effort only to find nothing, and then having to enter the slow path anyway, is infuriating!
In fact, under heavy traffic, the lookup cost of the route cache can be much larger than the slow-path cost of a regular routing table lookup. Perhaps for this reason, Linux eventually removed the route cache.
4. ipset lookup
Lookups of ipset entries are similar. Today, in a spare moment during a hospital visit, I noticed that ipset 6.23 has a timeout parameter. Timeout support by itself enables many things and automates a lot of logic, but the protocol stack has no way of knowing whether an entry has already been deleted because it expired. Personally, I feel that for lookups like ipset's, even without the timeout parameter, it would be very good to be able to determine "not in set" quickly. Of course, when "not in set" cannot be determined outright, the lookup goes on to the specific data structure, such as a hash or tree lookup.
5. Bloom filter
In every section above I ended by expressing the same wish: to discover that a lookup will fail as early as possible, so that the real work can start directly instead of time being spent on an inevitable failure. That cost may be tolerable on something low-end such as an OpenWrt box, but for Linux on respectable, high-end platforms it is absolutely unaffordable. Of course, almost all operating systems implement their protocol stacks the same way Linux does.
How to discover quickly that a lookup will fail is the basic problem; stated more abstractly, it is how to establish that "an element is definitely not in a set". There is a dedicated theory for this: the Bloom filter. It is very efficient in both time and space complexity, but there is no free lunch, so what is the price? The price is the possibility of misjudgment (a false positive)! Although it can misjudge, the algorithm can still establish some facts with certainty. If its answer to every query were "possibly", it would be useless; we always want some facts determined, and determined 100%! To illustrate, I write it as a function r = B(x): if it returns 0, then x is definitely not in the set; if it returns 1, that is only a "possible" fact, and there is a y% chance that x is in fact not in the set. Computing y is not complicated, but it is not the focus of this article.
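For readers who do want a number, the commonly quoted estimate is the false-positive rate: the probability that the filter returns 1 for an element that was never added. This is related to, but not the same as, the y above, which also depends on how often absent elements are queried. The sketch below uses the standard approximation (1 - e^(-k*n/m))^k for a filter with m bits, k hash functions and n stored elements, assuming independent, uniform hash functions; the function and parameter names are my own.

```python
import math


def false_positive_rate(m_bits, k_hashes, n_items):
    """Approximate probability that a Bloom filter answers 'possibly present'
    for an element that was never added (assumes ideal hash functions)."""
    return (1.0 - math.exp(-k_hashes * n_items / m_bits)) ** k_hashes


# Hypothetical sizing: 1,000,000 bits, 3 hash functions, 100,000 elements.
rate = false_positive_rate(m_bits=1_000_000, k_hashes=3, n_items=100_000)
empty = false_positive_rate(m_bits=1_000_000, k_hashes=3, n_items=0)  # 0.0
```

An empty filter never misjudges, and the rate rises toward 1 as more elements are packed into the same bitmap.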
An ordinary hash table uses one hash algorithm to narrow the search range and then performs an exact match within the collision list, so its result is always definite. A Bloom filter, by contrast, maintains no collision lists; it only narrows the search range step by step with several different hash functions. However small the range becomes, collisions are still possible, and that possibility is the misjudgment. The data structure of a Bloom filter is very cleverly designed. With N hash functions it only needs to maintain one bitmap of M bits (M is chosen independently of N). To add an element to the set, apply each hash function to the element to obtain N hash values S_i, each in the range 0 to M-1, and set bit S_i of the bitmap to 1. To test whether an element x is in the set, compute its N hash values X_i and AND together the bits at positions X_i. If the result is 0, x is definitely not in the set, because if it were, all of those bits would have been set to 1 when it was added.
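The add and test operations just described can be sketched in a few lines. This is a minimal illustration, not an implementation from any kernel or library: the class name, the use of salted BLAKE2b from Python's `hashlib` to derive the different hash functions, and the sizes in the usage lines are all my own assumptions.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter: one bitmap, several hash functions, no chains."""

    def __init__(self, m_bits, n_hashes):
        self.m_bits = m_bits
        self.n_hashes = n_hashes
        self.bitmap = bytearray((m_bits + 7) // 8)

    def _positions(self, item):
        # Derive n_hashes different hash functions by salting BLAKE2b
        # with the function index (an illustrative choice).
        for i in range(self.n_hashes):
            digest = hashlib.blake2b(
                item, digest_size=8, salt=i.to_bytes(8, "little")
            ).digest()
            yield int.from_bytes(digest, "little") % self.m_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bitmap[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item):
        # AND of all probed bits: False means "definitely not in the set",
        # True means only "possibly in the set".
        return all(
            self.bitmap[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


bf = BloomFilter(m_bits=1024, n_hashes=3)
bf.add(b"192.168.1.10")
present = bf.might_contain(b"192.168.1.10")  # True: added elements always pass
fresh = BloomFilter(m_bits=1024, n_hashes=3).might_contain(b"10.0.0.1")  # False
```

Note that nothing is ever compared against a stored key: an added element always passes the test, while an element that was never added usually, but not always, fails it.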
6. Applying the Bloom filter
Would things improve if I put a Bloom filter in front of the lookup algorithms described in the sections above? If the Bloom filter is designed well enough, then in most cases when it returns 0 I can jump straight to the creation logic and skip a lot of traversal time. If it returns 1, an exact match is still needed, which amounts to stacking a Bloom filter on top of the algorithm I already considered bad, so things get worse. But that is the price! That is the gamble! I can always shift the blame to "this Bloom filter is not well designed"! On the other hand, one must weigh which is larger: the cost of computing N hashes, or the cost of computing one hash plus traversing the collision list. A plain time-complexity analysis is not appropriate here: for a Bloom filter with N fixed, the time complexity is certainly O(1), but does that make it more efficient than the hash table? In practice we should use a weighted statistic, and its value depends on rigorous performance stress testing.
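A rough sketch of this arrangement follows, with the filter consulted before the collision list is touched. Everything here is hypothetical: `FilteredTable`, the `chain_walks` counter, and deriving the filter's hash functions from Python's built-in `hash` with a seed are illustrative choices, not how conntrack or any kernel table works.

```python
class FilteredTable:
    """Chained hash table with a Bloom filter in front of it (toy model)."""

    def __init__(self, nbuckets, m_bits, n_hashes):
        self.buckets = [[] for _ in range(nbuckets)]
        self.bits = [False] * m_bits
        self.n_hashes = n_hashes
        self.chain_walks = 0  # lookups that had to traverse a collision list

    def _positions(self, key):
        # Seeded use of the built-in hash stands in for N hash functions.
        return [hash((seed, key)) % len(self.bits)
                for seed in range(self.n_hashes)]

    def lookup_or_create(self, key):
        chain = self.buckets[hash(key) % len(self.buckets)]
        positions = self._positions(key)
        if all(self.bits[p] for p in positions):
            # Filter says "possibly present": the exact match is still needed.
            self.chain_walks += 1
            for entry in chain:
                if entry["key"] == key:
                    return entry, False
        # Filter said "definitely absent", or it was a false positive:
        # go to the creation logic.
        entry = {"key": key}
        chain.append(entry)
        for p in positions:
            self.bits[p] = True
        return entry, True


table = FilteredTable(nbuckets=4, m_bits=4096, n_hashes=3)
flow = ("10.0.0.1", "10.0.0.2")
first, created_first = table.lookup_or_create(flow)    # miss, no chain walk
walks_after_miss = table.chain_walks                   # 0
second, created_second = table.lookup_or_create(flow)  # hit, one chain walk
```

One caveat with this sketch: a plain Bloom filter cannot remove elements, so a table whose entries expire, as conntrack entries do, would need something like a counting variant or a periodic rebuild of the bitmap. That is a further cost to put on the scale.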
It is the same old saying: there is no free lunch. Either use hardware acceleration, in which case you pay money, or design a good algorithm, in which case you pay with the risk, and with the cost of making up for it when the algorithm fails!
7. Hierarchical hash lookup
Like the page table lookup in the MMU, and like the routing lookup algorithm in BSD, this idea uses a multi-level hash lookup instead of a single hash plus collision-list traversal.
Take conntrack lookup as an example. I can simplify a conntrack entry to an {IP1, IP2} pair, where the first element is the key and the second is the value, so that a conntrack lookup can be made to look like a BSD-style routing lookup, with IP2 playing the role of the next hop, or something like that.
I do not mean that a multi-level hash is more efficient than a single hash, but that a multi-level hash table can be computed on multiple CPU cores. Each level's hash computation can be treated as one dimension, with each CPU core computing the hash value of one dimension and locating the coordinate in that dimension. A single hash table cannot take advantage of multiple cores: you must first compute the hash value to locate the bucket and then traverse the collision list. In the multi-core era, the traditional way of analyzing performance purely by time complexity may be outdated.
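As a sketch of the {IP1, IP2} simplification, the table below indexes IP1 one byte per level, page-table style, and stores IP2 at the last level. It is an illustration under my own assumptions: the class name and the one-byte-per-level split are invented for the example, and it runs on a single core. It only shows that each level's index comes from a separate part of the key, so the indices could in principle be computed independently.

```python
import ipaddress


class MultiLevelTable:
    """Four-level table indexed by successive bytes of an IPv4 address."""

    def __init__(self):
        self.root = {}

    def insert(self, ip1, ip2):
        octets = ipaddress.IPv4Address(ip1).packed  # 4 bytes, one per level
        node = self.root
        for index in octets[:-1]:
            node = node.setdefault(index, {})  # create missing levels
        node[octets[-1]] = ip2                 # IP2 plays the "next hop" role

    def lookup(self, ip1):
        node = self.root
        for index in ipaddress.IPv4Address(ip1).packed:
            node = node.get(index)
            if node is None:
                return None  # miss detected at the first absent level
        return node


table = MultiLevelTable()
table.insert("192.168.1.10", "10.0.0.1")
found = table.lookup("192.168.1.10")      # "10.0.0.1"
near_miss = table.lookup("192.168.2.10")  # None, fails at the third level
far_miss = table.lookup("8.8.8.8")        # None, fails at the first level
```

A side effect worth noting is that a miss can be reported as soon as one level has no entry, without walking any collision list.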
Optimization of Linux protocol stack lookup algorithm