Optimization of Linux stack search algorithm

Source: Internet
Author: User

The Linux network protocol stack is precise, but it still deserves explanation — to say nothing of Netfilter, or of TC for that matter. There are a few rough edges, however. This article is not a complete record; treat it as an essay rather than something authoritative.
0. Kinds of lookups. The Linux stack is a pure software implementation. It retains interfaces for hardware offload, but this article does not cover hardware.


In the Linux protocol stack implementation, lookups are unavoidable, because nothing is baked into hardware circuits: route lookup, neighbor lookup, conntrack lookup, socket lookup, and so on. The protocol stack is a shared facility serving all packets; when a packet arrives, the processing logic must find the data structures associated with it, so a lookup is inevitable — this is true even in hardware. However, there are two kinds of lookups, and they affect performance differently.


0.1. Lookups that do not create on failure. Route lookup belongs to this class: if no route entry is found, failure is returned directly and the packet is dropped. For this kind of lookup, table entries are created and deleted by specific events (such as manual configuration or a NIC going up/down), never by the lookup itself. A successful lookup costs roughly the same as a failed one; the stack merely treats the two results differently. This article does not concern itself with such lookups.


0.2. Lookups that create on failure. Conntrack and neighbor lookups belong to this class: if the lookup fails, a new table entry is created, so success and failure have very different performance impacts.

When the lookup fails, the performance loss is large. Even with an efficient hash algorithm, you must traverse the entire conflict list for the computed hash value before you can conclude that the lookup failed, which is already substantial overhead on average. And discovering the failure is only the beginning: the next step is to allocate memory and create the table entry, which costs both time and space. The space cost is unavoidable, but I would like to detect the lookup failure as fast as possible before I am forced to allocate memory and create the entry.
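The asymmetry described above can be seen in a minimal sketch of a bucket-plus-conflict-list lookup (the structure and names here are illustrative, not the kernel's actual conntrack types): a hit may stop early in the chain, but a miss always pays for the full chain walk.

```c
#include <stdint.h>
#include <stddef.h>
#include <assert.h>

/* Hypothetical entry keyed by a single 32-bit value standing in for
 * a hashed 4-tuple. Illustrative only. */
struct entry {
    uint32_t key;
    struct entry *next;   /* conflict (collision) chain */
};

#define HASH_SIZE 256

/* A failed lookup must walk the ENTIRE conflict chain for the bucket
 * before it can conclude the key is absent; a hit may return early. */
struct entry *lookup(struct entry *buckets[HASH_SIZE], uint32_t key)
{
    struct entry *e;
    for (e = buckets[key % HASH_SIZE]; e != NULL; e = e->next)
        if (e->key == key)
            return e;     /* hit: stopped as soon as we matched */
    return NULL;          /* miss: paid for the whole chain */
}
```

On a miss, the caller would then proceed to allocate and link a new entry — the expensive path this article wants to reach as quickly as possible.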

0.3. TCP socket lookup sits between 0.1 and 0.2. For a socket in the listen state, the goal is to create a client socket, but first the stack must make sure the TCP four-tuple matches no socket in the established or TIME_WAIT state. If there are many TIME_WAIT sockets, it can take a long time just to prove that "none of these TW sockets matches." It would be nice to establish that fact quickly.


        (A tuple that matches no listen socket either simply reports a lookup failure.)

Next, I'll briefly go through a few of the Linux kernel stack's lookups.

2. nf_conntrack lookup. Linux nf_conntrack leaves a lot of room for optimization. Testing shows that once conntrack is enabled in the kernel stack, the fully loaded PPS (packets per second) drops by half, and the maximum number of long-lived connections also drops by half. For short connections, the degradation is noticeable even when the various timeouts are set very short.
       The speed of creating new conntrack entries limits the rate of new connections, and the maximum number of connections is limited by the memory conntrack may occupy and by the lifetime of a conntrack entry. When a large number of conntrack entries are retained and the hashsize is not large enough, the hash conflict lists grow very long. Creating a new connection — and therefore a new conntrack entry — becomes extremely costly, because significant effort must be spent merely to discover that the lookup failed before the real work can begin. It would be a very good thing to detect that failure quickly before creating the entry.
3. Route cache lookup. The same applies to cache lookups such as the route cache. The route cache has an expiration time; on a router carrying heavy traffic there will be a huge number of cached route entries, so searching the cache itself becomes a large overhead and hash conflicts become very likely. After spending all that effort and still finding nothing, you must fall back to the slow path anyway — infuriating!
       In fact, under heavy traffic the lookup cost of the route cache can exceed the slow-path cost of an ordinary routing table lookup. Perhaps this is why Linux eventually removed the route cache.


4. ipset lookup. Something similar applies to ipset table entries. The other day, while waiting around at the hospital, I happened to notice that ipset version 6.23 supports a timeout. Supporting timeouts lets ipset handle a lot of logic by itself, but the protocol stack then has no way to know whether an entry has already been deleted due to expiration. Personally, I think that for lookups of the ipset kind, even without the timeout parameter, it is very valuable to be able to determine "not in the set" at high speed — and, of course, equally to determine membership — whatever the underlying data structure: hash lookup, tree lookup, and so on.


5. Bloom filter. Above, I repeatedly expressed the desire to detect a failed lookup as early as possible, so the code can proceed directly to its real work instead of spending time confirming a foregone failure. That price might be affordable on an OpenWrt-class toy, but for a respectable Linux deployment it is absolutely not.

Of course, this applies to almost every operating system's protocol stack implementation, not just Linux's.
How to detect failure quickly is a fundamental problem. Stated one level more abstractly: how do you establish that "an element is definitely not in a set"? There is a dedicated theory for exactly this — the Bloom filter — which is highly efficient in both time and space. But there is no free lunch; what is the price? The price is the possibility of false positives. Even so, the algorithm can still determine some facts with certainty; if it answered "possibly" to every query it would be useless, and we always want some facts decided with 100% certainty. To illustrate, write it as a function r = b(x): if it returns 0, then x is definitely not in the set; if it returns 1, it only says "possibly" — there remains some probability y% that x is actually not in the set. What y is exactly involves mathematics that is not complex, but it is not the focus of this article.
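For completeness, the probability y the author alludes to has a standard closed form. With an m-bit bitmap, k hash functions, and n elements inserted, the usual approximation (assuming independent, uniform hashes) is:

```latex
% Probability a given bit is still 0 after n insertions, k bits set each:
P(\text{bit} = 0) = \left(1 - \frac{1}{m}\right)^{kn} \approx e^{-kn/m}
% Probability all k probed bits are 1 for an element NOT in the set
% (i.e., the false-positive rate):
P(\text{false positive}) \approx \left(1 - e^{-kn/m}\right)^{k}
```

Minimizing this expression over k gives the well-known optimum k ≈ (m/n) ln 2.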


An ordinary hash table uses the hash to narrow the search scope and then does exact matching along the conflict list, so its answer is always definite. The Bloom filter, by contrast, maintains no conflict list at all; it simply narrows the scope further and further using several different hash functions. However small the final scope, collisions remain possible — and a collision is precisely what produces a false positive.

The Bloom filter's data structure is very compact: with k hash functions, only a single m-bit bitmap needs to be maintained. To add an element to the set, apply each of the k hash functions to it, obtaining k hash values s_i in the range 0 to m-1, and set bit s_i of the bitmap to 1. To test whether an element x is in the set, compute the k hash values x_i for x; if any of the bits at positions x_i is 0, then x is definitely not in the set — because if x had been added, all of those bits would have been set to 1.
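The description above can be sketched in a few dozen lines of C. The sizes and hash-mixing constants here are illustrative choices, not anything the kernel prescribes; the k hashes are derived from one base hash ("double hashing"), a common simplification.

```c
#include <stdint.h>
#include <string.h>
#include <assert.h>

/* Minimal Bloom filter sketch: BLOOM_HASHES hash functions over a
 * BLOOM_BITS-bit bitmap. Parameters are illustrative. */
#define BLOOM_BITS   1024
#define BLOOM_HASHES 3

struct bloom {
    uint8_t bits[BLOOM_BITS / 8];
};

/* Derive the i-th hash from one mixed base value (double hashing). */
static uint32_t bloom_hash(uint32_t x, int i)
{
    uint32_t h = x * 2654435761u;   /* Knuth multiplicative hash */
    h ^= h >> 16;
    return (h + (uint32_t)i * 0x9e3779b9u) % BLOOM_BITS;
}

void bloom_add(struct bloom *b, uint32_t x)
{
    for (int i = 0; i < BLOOM_HASHES; i++) {
        uint32_t s = bloom_hash(x, i);
        b->bits[s / 8] |= (uint8_t)(1u << (s % 8));
    }
}

/* Returns 0 => x is DEFINITELY not in the set (the fast-fail answer);
 * returns 1 => x is possibly in the set, exact matching still needed. */
int bloom_maybe(const struct bloom *b, uint32_t x)
{
    for (int i = 0; i < BLOOM_HASHES; i++) {
        uint32_t s = bloom_hash(x, i);
        if (!(b->bits[s / 8] & (1u << (s % 8))))
            return 0;
    }
    return 1;
}
```

Note the asymmetry: a 0 answer is trustworthy, while a 1 answer only licenses proceeding to the exact lookup.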


6. Applying the Bloom filter. There is nothing better than deploying a Bloom filter in front of the lookup algorithms described in sections 2-5 above. If the Bloom parameters are well chosen, then in most scenarios a return value of 0 lets me jump straight to the create logic, saving a great deal of traversal time. If it returns 1, an exact match is still required. With a badly designed filter, this merely stacks a Bloom computation on top of the original lookup, and things get worse.

But that is the price — that is the gamble! At worst I can shift the blame to "this Bloom filter was badly designed." There is one more aspect: there is a tradeoff between the cost of computing k hashes and the cost of computing one hash plus a list traversal. Time complexity alone is not the right basis for analysis here. Once k is fixed, the Bloom lookup is undoubtedly O(1) — but does that make it more efficient than a hash table? In practice, one should use a weighted statistical value, and that value can only come from rigorous load testing.
As ever, there is no free lunch. Either use hardware acceleration, in which case you pay in money, or design a good algorithm, in which case you pay in risk and in the cost of compensating when the algorithm fails.
7. Hierarchical hash lookup. This resembles a page table in the MMU, and the idea behind the routing lookup algorithm in BSD: use a multi-level hash lookup instead of a single hash plus conflict-list traversal.
Taking conntrack as an example, I can simplify a conntrack to an {IP1, IP2} pair, where the first element is the key and the second is the value. The conntrack lookup then looks just like a BSD-style routing lookup, with IP2 playing the role of the next hop — or something along those lines.
I am not claiming that a multi-level hash is more efficient than a single hash. But a multi-level hash can be computed across CPU cores: each level's table is hashed as one dimension, and each core can compute one dimension's hash value to locate that dimension's coordinate. With a single hash we cannot exploit multiple cores — the hash value must be computed before the conflict list in the bucket can even be walked. In the many-core era, the traditional habit of analyzing performance purely in terms of time complexity may be outdated.
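A two-level version of the {IP1, IP2} idea can be sketched as follows — level 1 indexed by a hash of IP1 (the key), level 2 by a hash of IP2, in the spirit of a page-table walk. The two hashes are independent of each other, which is what would allow them to be computed in parallel. Sizes, hash constants, and the overwrite-on-collision policy are all toy simplifications.

```c
#include <stdint.h>
#include <stdlib.h>
#include <assert.h>

#define L1_SIZE 64
#define L2_SIZE 64

struct slot { uint32_t ip1, ip2; };          /* stored pair, 0 = empty */
struct l2_table { struct slot s[L2_SIZE]; };

static struct l2_table *l1[L1_SIZE];         /* level-1 directory */

static unsigned h1(uint32_t ip1) { return (ip1 * 2654435761u) % L1_SIZE; }
static unsigned h2(uint32_t ip2) { return (ip2 * 0x9e3779b9u)  % L2_SIZE; }

/* Insert {ip1, ip2}; a colliding slot is simply overwritten here. */
void pair_add(uint32_t ip1, uint32_t ip2)
{
    unsigned i = h1(ip1);
    if (!l1[i])
        l1[i] = calloc(1, sizeof(struct l2_table));
    struct slot *s = &l1[i]->s[h2(ip2)];
    s->ip1 = ip1;
    s->ip2 = ip2;
}

/* Returns 1 if the pair is present. A missing level-1 table is an
 * immediate (fast-fail) miss; otherwise one exact compare decides. */
int pair_lookup(uint32_t ip1, uint32_t ip2)
{
    unsigned i = h1(ip1);
    if (!l1[i])
        return 0;                            /* level-1 miss */
    struct slot *s = &l1[i]->s[h2(ip2)];
    return s->ip1 == ip1 && s->ip2 == ip2;
}
```

Note that a miss can be decided after a bounded walk — one directory probe and at most one exact comparison — with no conflict list to traverse.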
