[Blog recommendations] Analysis of the principles of the Linux TC (Traffic Control) Framework
This post is from the Bkjia blog, by the blogger dog250. If you have any questions, visit the blog homepage to discuss them with the author. Blog: http://dog250.blog.51cto.com/2466061/1568267
Recently my work has been more or less related to traffic throttling on Linux. I learned TC a few years ago and more or less understood its principles, but I have not touched it since, because I dislike the tc command line: it is too cumbersome. The iptables command line is cumbersome too, but it is more intuitive, whereas the tc command line is too technical. Perhaps I simply never understood the TC framework as deeply as I understand the Netfilter framework. In any case, iptables is to Netfilter what tc is to TC.
The Linux kernel has a built-in Traffic Control framework that can rate-limit traffic, shape traffic, and apply policies (drop, NAT, and so on). Does this framework remind you of anything else? Perhaps not, so let me spell it out: the Netfilter framework. Netfilter resembles TC in some ways, but the two are quite different.
Once you have mastered the Netfilter framework, the TC framework is much easier to understand. In particular, whenever you run into one of Netfilter's limitations, bring that question with you as you study the design of TC; you may find that TC makes up for Netfilter's shortcomings in some respects. Before going into the details, let me describe what the two have in common and how their different purposes lead to different designs.
Let's talk about Netfilter first. This framework is designed to filter packets along the kernel's path through the network protocol stack. Like checkpoints on a road, Netfilter places a checkpoint at each of five locations on the path the stack uses to process packets. A packet is inspected at every checkpoint it passes, and the outcome is one of a few verdicts: accept, drop, queue, or divert to another path. The framework only needs to produce a verdict for each packet; it says nothing about what services a checkpoint should provide internally.
TC, by contrast, is designed to provide a service to packets or flows, such as rate limiting or shaping. That is not a verdict in the Netfilter sense: providing such a service requires carrying out a series of actions. Therefore "planning and organizing how these actions are carried out" is the key to the design of the TC framework! In other words, the Netfilter framework focuses on what to do with a packet, while the TC framework focuses on how to do it. (I have written a lot of code and articles about Netfilter, so I will not repeat them here ...)
There is already plenty of theory about rate limiting and traffic shaping, such as the token bucket. This article, however, is about how the TC framework is implemented in Linux, not about the token bucket algorithm, and a short article cannot trace the whole history from traffic control theory to its implementation in various operating systems. What we do know is that queues are the practical choice in most implementations, so the question becomes: how does the Linux TC framework organize its queues? Before going into that in detail, let me compare Netfilter and TC one last time.
If you know the difference between UNIX character devices and block devices, the difference between the Netfilter framework and the TC framework is easier to grasp. A Netfilter hook point resembles a pipe-like character device, and the skbs are a one-way stream through it: they generally flow in at one end and out at the other in arrival order, each with a verdict such as ACCEPT or DROP. The TC framework resembles a block device, whose contents are stored and accessed randomly. That is, the order in which skbs enter is not necessarily the order in which they leave. This is exactly what traffic shaping requires. In other words, the TC framework must implement a randomly accessible packet buffer inside which traffic is controlled. As we already know, that buffer is implemented with queues.
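To make the pipe versus block device contrast concrete, here is a minimal Python sketch. It is only an illustration, not kernel code: `hook_chain`, `PriorityBuffer`, and the packet tuples are names I made up for this example. The hook-style function returns packets in arrival order with a verdict applied, while the TC-style buffer stores packets and releases them in a different order.

```python
import heapq

def hook_chain(packets, verdict):
    """Netfilter-style hook (hypothetical): one verdict per packet, order kept."""
    return [p for p in packets if verdict(p) == "ACCEPT"]

class PriorityBuffer:
    """TC-style buffer (hypothetical): packets are stored, then reordered."""
    def __init__(self):
        self._heap = []
        self._seq = 0  # tie-breaker: keeps arrival order within one priority

    def enqueue(self, packet, priority):
        heapq.heappush(self._heap, (priority, self._seq, packet))
        self._seq += 1

    def dequeue(self):
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]

# (name, priority) pairs; a lower number means higher priority
packets = [("bulk-1", 2), ("voice-1", 0), ("bulk-2", 2), ("voice-2", 0)]

# The hook passes everything through in the order it arrived.
passed = hook_chain(packets, lambda p: "ACCEPT")

# The buffer lets the voice packets overtake the bulk packets.
buf = PriorityBuffer()
for name, prio in packets:
    buf.enqueue(name, prio)
shaped = [buf.dequeue() for _ in packets]
print(shaped)
```

The point of the sketch is only that the second structure needs storage, while the first needs nothing more than a verdict.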
Of course, nothing is absolute. A Netfilter hook point can also hold a buffer or carry out a series of actions; typical examples are IP fragment reassembly and NAT in conntrack. With fragment reassembly at the PREROUTING hook point, fragments enter the HOOK and are held there until all of them have arrived and been reassembled, and only then does the packet flow out of the HOOK point in one piece. With NAT, the result of Netfilter processing is clearly "a series of actions carried out", not just ACCEPT. I have also written some modules that use Netfilter to implement traffic control. Conversely, the TC framework can implement Netfilter's functions. In short, once you understand the design principles and the nature of these frameworks, you can use and extend them freely to solve problems.
Personally, I think that compared with a single Netfilter HOOK point, the TC framework is a superset: more flexible, and more complex to implement. What Netfilter has over TC is the charm of how its HOOK point locations are defined.
Now let me formally introduce the design of the TC framework.
Much of the material about TC found on the Internet says, without exception, that it consists of "queueing disciplines, classes, and filters". Most of it is vague, and I dare say it all comes from the same document or book. Few people try to understand the design of the TC framework from another angle. That is a challenging task, and I personally enjoy this kind of thing. Before introducing how TC organizes its queues, let me first introduce what I call recursive control. Recursive control is hierarchical control in which every level is controlled in the same way. Anyone familiar with CFS scheduling knows that group scheduling and task scheduling use the same scheduling method, even though groups and tasks are clearly at different levels. I drew the following figure to illustrate this:
Recursive control is not limited to organizing control logic: Linux also uses this tree-like recursive logic in implementing the UNIX process model. Each level is a two-level tree, as this model shows:
As you can see, recursive control is a fractal, and it would be better displayed with a three-dimensional figure. Every node except the leaves is itself an independent small tree, and whether a tree is big or small, its control logic or organizational logic is of the same nature.
Recursive control makes it easy to stack control logic arbitrarily. We have already seen this in protocol stack design as X over Y, or XoY for short: PPPoE, IP over UDP (OpenVPN in tun mode), TCP over IP (the native TCP/IP stack), and so on. For TC, consider the following requirements:
1. Divide the bandwidth between TCP and UDP in a given ratio;
2. Within TCP traffic, assign different priorities by source IP address range;
3. Within a single priority queue, divide the bandwidth between the HTTP application and the others in a given ratio;
4 ....
These requirements clearly call for recursive control: 1 and 3 both allocate bandwidth by ratio, but obviously at different levels. The overall architecture looks like this:
However, things are far from that simple. Although the figure above hints at the shape of the TC framework, it does not help much in implementing it. Several typical problems arise: How are packets classified into different queues? What data structure should the non-leaf nodes in the figure be? They are not real queues, yet they must behave like queues, so how should they be expressed? ...
In the Linux implementation of TC, the "queue" is an abstraction. At its core it maintains two callback function pointers: enqueue and dequeue. Neither enqueue nor dequeue necessarily puts a packet into an actual queue; each simply "carries out a series of operations". That series of operations can be:
1. for a leaf node, actually putting a packet into a real queue or pulling one out of it;
2. recursively calling the enqueue/dequeue of other abstract queues.
Note that point 2 mentions "other abstract queues". How is such an abstract queue located? This requires a choice, that is, a selector, which assigns a packet to an abstract queue based on the packet's characteristics. With that, the design of TC can be drawn as follows:
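The abstraction just described can be sketched in a few lines of Python. This is only a model of the idea, not the kernel's Qdisc code: the class names `Fifo` and `Node`, the dictionary-based packets, and the strict-priority dequeue policy are all assumptions made for the example (a real setup would divide bandwidth by ratio rather than always draining the first child).

```python
from collections import deque

class Fifo:
    """Leaf node: enqueue/dequeue really touch a queue."""
    def __init__(self):
        self.q = deque()

    def enqueue(self, pkt):
        self.q.append(pkt)
        return True

    def dequeue(self):
        return self.q.popleft() if self.q else None

class Node:
    """Inner node: holds no packets, only forwards to other abstract queues."""
    def __init__(self, selector, children):
        self.selector = selector  # packet -> key of the child to use
        self.children = children  # dict; insertion order = dequeue priority

    def enqueue(self, pkt):
        child = self.children.get(self.selector(pkt))
        if child is None:
            return False          # unclassified packets are dropped here
        return child.enqueue(pkt) # recursion: the child may be a Node too

    def dequeue(self):
        for child in self.children.values():
            pkt = child.dequeue()
            if pkt is not None:
                return pkt
        return None

# Two levels with the same control logic: protocol first, then port.
tcp = Node(lambda p: "http" if p["dport"] == 80 else "other",
           {"http": Fifo(), "other": Fifo()})
root = Node(lambda p: p["proto"], {"tcp": tcp, "udp": Fifo()})

for pkt in [{"proto": "udp", "dport": 53},
            {"proto": "tcp", "dport": 22},
            {"proto": "tcp", "dport": 80},
            {"proto": "icmp", "dport": 0}]:
    root.enqueue(pkt)

out = []
while (pkt := root.dequeue()) is not None:
    out.append((pkt["proto"], pkt["dport"]))
print(out)
```

Notice that `root` and `tcp` are instances of the same class: the caller of `root.enqueue` cannot tell, and does not need to know, whether a real queue sits one level or three levels below.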
As you can see, I did not define the TC framework with the classic triple of "queueing disciplines, classes, and filters"; I explained it in terms of recursive control instead. If you map the classic triple onto this figure, it looks like the following. Note that I have deleted unnecessary text so that the figure is not too cluttered. For more information about the text, see
As you can see, the two figures are essentially the same.
Now let me digress a little. This is still related to Netfilter, though it is not a comparison with TC, just my personal thoughts. I used to be very fond of Cisco ACLs because they are applied to NIC interfaces, whereas Netfilter intercepts processing paths rather than devices. To Netfilter, the device is just another match with nothing special about it: whether relevant or not, every packet must pass through the Netfilter HOOK points, if only to check whether it matches -i ethX ... So I wanted to hang a filter_list on net_device, and I wrote some code for it. It worked well and I was ready to put it to use. But I am someone who often reinvents the wheel: after seeing how TC is implemented, I found that the TC framework was exactly what I had been looking for. To put it bluntly, whatever can be implemented with Netfilter can also be implemented with TC. Moreover, TC is built on queueing disciplines (that is how the data structure is named: Qdisc, for queueing discipline). Setting aside the classic triple, the abstract enqueue/dequeue operations do not prescribe how they are implemented, and a queueing discipline is bound to a NIC (more precisely, to a NIC queue, if the NIC supports multiple queues) rather than intercepting a processing path. So I have two options:
1. implement a new Qdisc with a simple built-in FIFO queue, whose enqueue operation runs matches/targets ported from Netfilter, and put every ACCEPTed packet into the FIFO;
2. in the classifier, let the classification of a packet depend not only on the packet's characteristics but also on an additional action callback function. The packet counts as successfully classified only when this function returns 0, and inside it you can perform any action you like (drop, NAT, and so on).
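Both options can be modeled in a short Python sketch. None of this is kernel code or a real TC API: `FilterFifo`, `ActionClassifier`, `rewrite_source`, the class id string, and the addresses are hypothetical names chosen for illustration. The first class filters in enqueue and stores accepted packets in a FIFO; the second treats a packet as classified only when its action callback returns 0.

```python
from collections import deque

ACCEPT, DROP = "ACCEPT", "DROP"

class FilterFifo:
    """Option 1 (sketch): enqueue runs match/target rules, then a plain FIFO."""
    def __init__(self, rules, default=ACCEPT):
        self.rules = rules      # list of (match function, target) pairs
        self.default = default  # verdict when no rule matches
        self.fifo = deque()

    def enqueue(self, pkt):
        verdict = self.default
        for match, target in self.rules:
            if match(pkt):
                verdict = target  # the first matching rule wins
                break
        if verdict == ACCEPT:
            self.fifo.append(pkt)
            return True
        return False              # anything else is dropped in this sketch

    def dequeue(self):
        return self.fifo.popleft() if self.fifo else None

class ActionClassifier:
    """Option 2 (sketch): classified only if the action callback returns 0."""
    def __init__(self, match, action, class_id):
        self.match, self.action, self.class_id = match, action, class_id

    def classify(self, pkt):
        if self.match(pkt) and self.action(pkt) == 0:
            return self.class_id
        return None

def rewrite_source(pkt):
    """Hypothetical NAT-like action: rewrite the source address, report success."""
    pkt["src"] = "203.0.113.1"
    return 0

# Option 1: drop telnet, accept everything else.
qdisc = FilterFifo([(lambda p: p["dport"] == 23, DROP)])
results = [qdisc.enqueue({"src": "10.0.0.2", "dport": d}) for d in (80, 23, 443)]

# Option 2: packets from 10.x are rewritten and placed in a class.
clf = ActionClassifier(lambda p: p["src"].startswith("10."), rewrite_source, "1:10")
pkt = {"src": "10.0.0.2", "dport": 80}
cls = clf.classify(pkt)
print(results, cls, pkt["src"])
```

In both cases the filtering logic lives where the packet is queued for a device, not on a processing path shared by every device.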
Of options 1 and 2 above, the second has already been implemented, and the first is easy to implement: you only need to implement a queueing discipline, or add an action to each queueing discipline. It looks like this:
This is relatively simple; in essence, all the work is done inside the diamond. The enlarged diamond is shown in:
In this way, the TC framework can implement the functions of a firewall and NAT, which is what I have always wanted. In fact I have known this for a long time, but I never liked tc commands: they are too technical to configure and extremely hard to maintain, even harder than iptables rules. Maintenance is crucial, even more important than how you come up with a rule in the first place, because writing a rule is the work of an instant. If you have accumulated enough experience, you solve it in an instant; if you hit a problem, inspiration also arrives in an instant, say, after a drink. Maintenance, however, is a long-term task, and the maintainer is not necessarily you. You must think of others, because the technical community is an altruistic one.
Okay, I believe I have now said everything I meant to say. It was all about the framework, with no details. Although I do not much like the tc command line, I still want to use one last figure to show how each tc command relates to the kernel data structures. Again there are no details, and the commands are incomplete: the match part is omitted, because I know it is not important here:
Reading my articles, you will find it hard to get anything you can copy and paste directly. The code is omitted and the commands are omitted. Even I, looking at what I wrote years ago, have wished for something I could quickly run, and found nothing. Still, I believe ideas matter more than implementation: once you understand the essence behind an implementation, everything else comes easily.