Linux TC (Traffic Control) Framework: An Analysis of Its Principles


My recent work has had more or less to do with Linux traffic control. I learned about TC a few years ago and came to understand its principles reasonably well, but I never touched it afterwards, because I dislike the tc command line: it is too cumbersome. The iptables command line is cumbersome too, but it is more intuitive than tc, which is too technical. Perhaps I simply do not understand the TC framework as deeply as I understand the Netfilter framework. iptables/Netfilter corresponds to tc/TC.
The Linux kernel has a built-in traffic control framework that can rate-limit traffic, shape traffic, and apply policies (drop, NAT, and so on). Can you think of anything else this framework could do? Perhaps not yet, so I will start with this: the Netfilter framework and the TC framework look similar, yet the two are very different.
Once you have mastered Netfilter, TC is much easier to appreciate, especially if you have already run into Netfilter's limitations. Approach the design of TC with those problems in mind and you may find that TC makes up for some of Netfilter's shortcomings. Before going into detail, let me describe what the two have in common and how their designs differ because of their original purposes.
Netfilter first. It is designed to filter packets along the kernel's protocol stack path, like checkpoints on a road: Netfilter places five hook points on the path along which the stack processes packets. A packet is inspected as it travels this path, and the inspection yields a verdict: accept, drop, queue, divert to another path, and so on. The framework needs only one verdict per packet and lays down no rules about what services are provided inside it.
Now TC. It is designed to provide a service to a packet or a flow, such as rate limiting or shaping. That cannot be expressed as a single Netfilter-style verdict; providing such a service requires a series of actions, so how to plan and organize the execution of those actions is the key to the design of the TC framework. In other words, TC focuses on how to execute, not merely on deciding which action to take: Netfilter is concerned with what to do, TC with how to do it. (I have written plenty of code and articles about Netfilter, so I will not repeat them here.)
There is a great deal of theory on rate limiting and traffic shaping, the token bucket being a common example, but this article is about how Linux implements the TC framework, not about the token bucket algorithm. A short article cannot cover everything from flow control theory to the history of implementations across operating systems. What we do know is that most real implementations use queues, so the question becomes how the Linux TC framework organizes its queues. Before going into queue organization, I will compare Netfilter and TC one last time.
If you know the difference between UNIX character devices and block devices, the difference between the Netfilter framework and the TC framework is easier to grasp. A Netfilter hook point resembles a pipe-like character device: skbs are a one-way stream through it, typically entering at one end and leaving at the other in the order they entered, each carrying a verdict such as accept or drop. The TC framework resembles a block device: contents are stored and accessed randomly, so the order in which skbs leave is not necessarily the order in which they entered, which is exactly what traffic shaping requires. In other words, the TC framework must implement a randomly accessible packet buffer in which flow control takes place, and, as we already know, that buffer is implemented with queues.
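
The contrast can be sketched in a few lines of Python (not kernel C). This is only an illustration under my own assumptions: `hook_verdict`, `ReorderingQueue`, and the packet fields `dport` and `prio` are hypothetical names, not kernel interfaces. The hook returns one verdict per packet and keeps arrival order; the queue stores packets and may release them in a different order.

```python
import heapq
from itertools import count

ACCEPT, DROP = "ACCEPT", "DROP"

def hook_verdict(packet):
    """Netfilter-style hook (hypothetical): inspect one packet, return one verdict."""
    return DROP if packet["dport"] == 23 else ACCEPT

class ReorderingQueue:
    """TC-style buffer (hypothetical): packets are stored and may leave reordered."""
    def __init__(self):
        self._heap = []
        self._seq = count()  # tie-breaker: FIFO order within one priority

    def enqueue(self, packet):
        heapq.heappush(self._heap, (packet["prio"], next(self._seq), packet))

    def dequeue(self):
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]

packets = [
    {"id": "a", "dport": 80, "prio": 2},
    {"id": "b", "dport": 23, "prio": 0},
    {"id": "c", "dport": 22, "prio": 0},
    {"id": "d", "dport": 80, "prio": 1},
]

# One verdict per packet, in arrival order.
verdicts = [hook_verdict(p) for p in packets]

# Store everything first, then drain: departure order differs from arrival order.
q = ReorderingQueue()
for p in packets:
    q.enqueue(p)
departure_order = [q.dequeue()["id"] for _ in packets]
```

Here the lower `prio` value leaves first, so the departure order is decided by the buffer, not by the order of arrival.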
Nothing is absolute, of course. A Netfilter hook point can also hold a buffer or perform a series of actions; the typical cases are fragment handling and NAT in conntrack. For fragment reassembly at the PREROUTING hook, a fragment that enters the hook is held there temporarily until all fragments have arrived and reassembly succeeds, and only then does the whole packet leave the hook in one go. For NAT, the result of Netfilter's processing is clearly "a series of actions performed" rather than a mere accept. I have also written several modules that implement flow control on top of Netfilter, and conversely the TC framework can implement Netfilter's functions. In short, once you understand the design principles and nature of these frameworks, you can use and extend them with ease.
In my view, for a single Netfilter hook point the TC framework is a superset: more flexible, and of course more complex. What Netfilter has and TC lacks is the appeal of how its hook point locations are defined.
Well, now it's time to formally introduce the design of the TC framework.
Much of the material you find online introduces TC, without exception, as being composed of three things: "queuing disciplines, classes, and filters". Most of it is vague, and I dare say it all comes from the same document or book. Very few people try to understand the design of the TC framework from another angle, which is more challenging and which I personally prefer. Before introducing TC's queue organization, let me explain recursive control. Recursive control means control is applied level by level, and at every level the control method is the same. Anyone familiar with CFS scheduling knows that group scheduling and task scheduling use the same scheduling method, even though groups and tasks obviously sit at different levels. I drew the following diagram to illustrate this:


[Figure: recursive control, with the same scheduling applied to groups and to tasks at different levels]


Recursive control is not limited to the organization of control logic. Linux's implementation of the UNIX process model uses the same tree-shaped recursive logic, in which each level is a two-layer tree, as this model shows:


[Figure: Tc2.jpg, the tree-shaped recursive model in which each level is a two-layer tree]


As you can see, recursive control is fractal; a three-dimensional drawing would show it better. The point is that every node other than a leaf is itself an independent small tree, and whether you look at the big tree or a small one, the control logic and organizational logic are exactly the same in nature.
Recursive control makes it easy to stack control logic arbitrarily, something we have already seen in protocol stack design as "X over Y", abbreviated XoY: PPPoE, IP over UDP (OpenVPN in tun mode), TCP over IP (the native TCP/IP stack), and so on. For TC, consider the following requirements:
1. Divide the total bandwidth between TCP and UDP in a 2:3 ratio;
2. Within TCP traffic, assign different priorities according to source IP address range;
3. Within a single priority queue, divide the bandwidth between HTTP and other applications in a 2:8 ratio;
4. ...

These requirements clearly call for recursive control: items 1 and 3 both allocate bandwidth by ratio, but obviously at different levels. The whole architecture should look something like this:


[Figure: Tc3.jpg, the recursive control hierarchy for the requirements above]
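
The three requirements can be written down as a small tree to make the "same policy at different levels" point concrete. This is a rough Python sketch under my own assumptions: the `Node` type, the policy names, and the node names such as `prio-high` are hypothetical, and the source-address ranges of requirement 2 are left out. The function lists every node that divides bandwidth by ratio, together with its depth in the tree.

```python
from dataclasses import dataclass, field
from fractions import Fraction

@dataclass
class Node:
    name: str
    policy: str = "leaf"   # "ratio", "priority", or "leaf" (hypothetical labels)
    weight: int = 1        # meaningful only when the parent's policy is "ratio"
    children: list = field(default_factory=list)

def ratio_levels(node, depth=0):
    """Yield (depth, node name, {child name: share of this node's bandwidth})
    for every node that divides its bandwidth by ratio."""
    if node.policy == "ratio":
        total = sum(c.weight for c in node.children)
        shares = {c.name: Fraction(c.weight, total) for c in node.children}
        yield depth, node.name, shares
    for child in node.children:
        yield from ratio_levels(child, depth + 1)

tree = Node("link", "ratio", children=[
    Node("tcp", "priority", weight=2, children=[   # requirement 2: priorities
        Node("prio-high", "ratio", children=[      # requirement 3: 2:8 split
            Node("http", weight=2),
            Node("other", weight=8),
        ]),
        Node("prio-low"),
    ]),
    Node("udp", weight=3),                         # requirement 1: 2:3 split
])

levels = list(ratio_levels(tree))
```

The ratio policy shows up twice, at depth 0 and at depth 2, handled by exactly the same code, which is the recursive property described above.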


Things are far from that simple, though. The diagram above outlines the TC framework but does little to help implement it. Several typical questions remain. How are packets assigned to the different queues? What data structure represents the non-leaf nodes in the diagram? If they are not real queues yet behave like queues, how should they be expressed?
In its TC implementation, Linux abstracts the "queue". Essentially, an abstract queue maintains two callback function pointers: enqueue, the operation that puts a packet in, and dequeue, the operation that takes a packet out. Neither enqueue nor dequeue necessarily puts a packet into, or removes it from, a real queue; each merely "performs a series of operations". That series of operations can be:
1. For a leaf node, actually placing a packet in a real queue or pulling a packet out of one;
2. Recursively invoking the enqueue/dequeue of other abstract queues.
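
The two cases can be sketched as follows. This is a Python illustration of the idea, not the kernel's data structures: `Fifo` and `Limited` are hypothetical names of my own. `Fifo` is a leaf with a real queue; `Limited` holds no packets at all and only does some bookkeeping before calling the enqueue/dequeue of the abstract queue beneath it.

```python
from collections import deque

class Fifo:
    """Leaf node: enqueue/dequeue operate on a real queue."""
    def __init__(self):
        self._q = deque()

    def enqueue(self, pkt):
        self._q.append(pkt)
        return True

    def dequeue(self):
        return self._q.popleft() if self._q else None

class Limited:
    """Non-leaf node: stores nothing itself, just counts and recurses into its child."""
    def __init__(self, child, limit):
        self.child = child
        self.limit = limit
        self.backlog = 0

    def enqueue(self, pkt):
        if self.backlog >= self.limit:
            return False                 # refused: the packet is dropped
        if self.child.enqueue(pkt):      # recursive call into another abstract queue
            self.backlog += 1
            return True
        return False

    def dequeue(self):
        pkt = self.child.dequeue()       # recursive call again
        if pkt is not None:
            self.backlog -= 1
        return pkt

# Abstract queues stack freely because they all expose the same two operations.
root = Limited(Limited(Fifo(), limit=3), limit=2)
results = [root.enqueue(p) for p in ["p1", "p2", "p3"]]
drained = [root.dequeue() for _ in range(3)]
```

The caller only ever sees `enqueue` and `dequeue` on `root`; whether a real queue sits directly behind them or several levels down makes no difference to it.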

Note item 2 above, which mentions "other abstract queues". How is such an abstract queue located? That requires a choice, made by a selector, which assigns a packet to an abstract queue according to the packet's characteristics. With that, the TC design can be drawn as follows:


[Figure: Tc4.jpg, the TC design drawn as abstract queues with selectors]
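
A selector can be added to the abstract queue with very little code. Again this is a Python sketch under my own assumptions: `Classful`, the packet fields `proto` and `dport`, and the choice to drain children in a fixed order are all hypothetical simplifications (a real scheduler would apply ratios or priorities on dequeue).

```python
from collections import deque

class Fifo:
    """Leaf node with a real queue."""
    def __init__(self):
        self._q = deque()

    def enqueue(self, pkt):
        self._q.append(pkt)
        return True

    def dequeue(self):
        return self._q.popleft() if self._q else None

class Classful:
    """Abstract queue with a selector that picks the child for each packet."""
    def __init__(self, selector, children):
        self.selector = selector       # function: packet -> key in children
        self.children = children       # dict: key -> abstract queue
        self.order = list(children)    # assumption: drain children in this fixed order

    def enqueue(self, pkt):
        return self.children[self.selector(pkt)].enqueue(pkt)

    def dequeue(self):
        for key in self.order:
            pkt = self.children[key].dequeue()
            if pkt is not None:
                return pkt
        return None

# Two levels, each with its own selector.
by_port = Classful(lambda p: "http" if p["dport"] == 80 else "other",
                   {"http": Fifo(), "other": Fifo()})
root = Classful(lambda p: p["proto"], {"tcp": by_port, "udp": Fifo()})

for pkt in [
    {"id": 1, "proto": "udp", "dport": 53},
    {"id": 2, "proto": "tcp", "dport": 22},
    {"id": 3, "proto": "tcp", "dport": 80},
]:
    root.enqueue(pkt)

order = [root.dequeue()["id"] for _ in range(3)]
```

The child of `root` under "tcp" is itself a `Classful` node, so the selector mechanism recurses in the same way as enqueue and dequeue do.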


As you can see, I did not use the classic "queuing discipline, class, filter" triple to define the TC framework; I explained it in terms of recursive control instead. If the classic triple is mapped onto this picture, it looks like the following. Note that I removed non-essential text to keep the picture uncluttered; refer to the previous diagram for the text:


[Figure: Tc5.jpg, the same diagram labelled with the classic triple of queuing discipline, class, and filter]


As you can see, the two views come to the same thing: either every variation leads back to the same essence, or great minds think alike.
Now for an aside, still related to Netfilter; it is not another comparison with TC, just some personal thoughts. For a time I thought very highly of Cisco ACLs, which are applied to a network interface, whereas Netfilter intercepts on the processing path rather than on the device. For Netfilter the device is just one more match: relevant or not, every packet must pass through the Netfilter hook point, and at the very least you have to test whether it matches -i ethX. I wanted to hang a filter_list on the net_device, wrote some code for it, found that it worked quite well, and was ready to put it to use. I am someone who often reinvents the wheel, and when I later looked at the TC implementation I found that the TC framework was exactly what I had been looking for. So I will say it outright: whatever can be done with Netfilter can also be done with TC. Moreover, TC is built on the queuing discipline (that is how the data structure is named, qdisc for queuing discipline; I am not falling back on the classic triple here). The abstract enqueue/dequeue does not dictate how it is implemented, and the queuing discipline is bound to the network card (more precisely, to a queue of the NIC, if the card supports multiple queues) instead of intercepting on the processing path. So I have two options:
1. Implement a new qdisc with a simple built-in FIFO queue, whose enqueue operation runs matches/targets ported from Netfilter, so that only accepted packets enter the FIFO;
2. Work on the classifier: whether a packet is assigned to a class depends not only on the packet's characteristics but also on an additional action callback, and only a return value of 0 from that callback counts as success. Since it is a callback, it can perform any action (drop, NAT, and so on) behind closed doors.

Of the two options above, option 2 has already been implemented, and option 1 is easy to implement: you only need to implement one queuing discipline, or add an action to each queuing discipline. It looks like this:


[Figure: Tc6.jpg, a queuing discipline with an added action]
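
Option 1 can be sketched like this. It is a Python illustration under my own assumptions, not a kernel qdisc: `FilteringFifo`, the rule format (a match function paired with a target), and the packet fields `src` and `dport` are hypothetical. The enqueue operation walks a Netfilter-style rule list, and only packets whose verdict is accept enter the built-in FIFO.

```python
from collections import deque

ACCEPT, DROP = "ACCEPT", "DROP"

class FilteringFifo:
    """Hypothetical qdisc: a FIFO whose enqueue runs match/target rules first."""
    def __init__(self, rules, default=ACCEPT):
        self.rules = rules        # list of (match function, target), first match wins
        self.default = default    # verdict when no rule matches
        self._q = deque()

    def verdict(self, pkt):
        for match, target in self.rules:
            if match(pkt):
                return target
        return self.default

    def enqueue(self, pkt):
        if self.verdict(pkt) != ACCEPT:
            return False          # filtered out before it ever reaches the queue
        self._q.append(pkt)
        return True

    def dequeue(self):
        return self._q.popleft() if self._q else None

rules = [
    (lambda p: p["dport"] == 23, DROP),
    (lambda p: p["src"].startswith("10."), ACCEPT),
]
qdisc = FilteringFifo(rules, default=DROP)

admitted = [qdisc.enqueue(p) for p in [
    {"src": "10.0.0.1", "dport": 80},
    {"src": "10.0.0.2", "dport": 23},
    {"src": "192.168.1.5", "dport": 80},
]]
first_out = qdisc.dequeue()
```

Because the filtering happens inside enqueue, it is bound to whatever device the queuing discipline is attached to, rather than to a point on the processing path.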


Option 2 is relatively simple. In essence it works on the diamond in the diagram; enlarged, the diamond looks like this:


[Figure: Tc7.jpg, the classifier diamond enlarged]
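
Option 2 can be sketched in the same spirit. This Python illustration rests on my own assumptions: `Filter`, `snat`, `reject`, the class identifiers, and the addresses are hypothetical, and it models only the rule stated above, namely that a packet is assigned to a class when the match succeeds and the action callback returns 0.

```python
def snat(new_src):
    """Build an action that rewrites the source address and reports success (0)."""
    def action(pkt):
        pkt["src"] = new_src
        return 0
    return action

def reject(pkt):
    """Action that always fails, so the packet is not assigned to the class."""
    return 1

class Filter:
    """Hypothetical classifier entry: match, then run the action callback."""
    def __init__(self, match, action, classid):
        self.match = match
        self.action = action
        self.classid = classid

    def classify(self, pkt):
        # Assigned to the class only if the match holds AND the action returns 0.
        if self.match(pkt) and self.action(pkt) == 0:
            return self.classid
        return None

def classify(filters, pkt):
    """Try filters in order; return the first class id, or None if none applies."""
    for f in filters:
        classid = f.classify(pkt)
        if classid is not None:
            return classid
    return None

filters = [
    Filter(lambda p: p["dport"] == 23, reject, "1:10"),
    Filter(lambda p: p["src"].startswith("192.168."), snat("203.0.113.1"), "1:20"),
]

web = {"src": "192.168.1.5", "dport": 80}
telnet = {"src": "10.0.0.9", "dport": 23}
web_class = classify(filters, web)
telnet_class = classify(filters, telnet)
```

The NAT happens as a side effect of classification: after `classify` returns, the `web` packet already carries its rewritten source address.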


This is how the TC framework implements firewall and NAT functions, which is what I had always wanted. In fact I have known about this for a long time, but I do not like the tc command: it is too technical, and its configuration is extremely hard to maintain, even harder than iptables rules. Maintenance matters enormously, more than working out how to write a rule in the first place. Writing a rule is a matter of a moment: with enough experience you handle it on the spot, and when you hit a problem, inspiration also arrives in an instant, perhaps over a drink. Maintenance, however, lasts a long time, and the maintainer will not necessarily be you. You have to think of others, because a technical community is an altruistic one.
So far everything I have said is about the framework, with no details. Although I am not fond of the tc command line, I would like to finish with a picture showing how each tc command relates to the kernel data structures. There are still no details and the commands are incomplete; the match part is omitted, because I know it is not important here:


[Figure: Tc8.jpg, the relationship between tc commands and kernel data structures]


In my articles you will rarely find something you can copy, paste, and use directly: the code is omitted and so are the commands. Even I, looking back at what I wrote years ago, badly want something I can run right away, and there is nothing. But I believe ideas matter more than implementation: if you understand the essence behind the implementation, you will handle it with ease.

