Principle Analysis of the Linux TC (Traffic Control) Framework


My recent work has been more or less related to traffic throttling on Linux. I learned about TC a few years ago and understood its principles reasonably well, but I have barely touched it since, because I dislike the tc command line: it is too cumbersome. The iptables command line is cumbersome too, but at least it is more intuitive; tc is far more arcane. Perhaps that only means I never understood the TC framework as deeply as I understood Netfilter. The relationship is parallel: iptables is to Netfilter what tc is to TC.
The Linux kernel ships a built-in Traffic Control framework for rate limiting, traffic shaping, and policy actions (dropping, NAT, and so on). Can you recall anything beyond that about it? Probably not, so I will give it a brief description. The TC framework is comparable to the Netfilter framework, but the two differ considerably.
Once you have mastered Netfilter, the TC framework becomes much easier to understand. In particular, wherever you feel Netfilter has a limitation, carry that question into the design of TC; you may find that TC compensates for Netfilter's shortcomings in several ways. Before going into details, let me introduce what the two have in common and how their different goals led to different designs.
Let's start with Netfilter. That framework exists, without question, to filter packets on the kernel paths of the network protocol stack, like checkpoints on a road: Netfilter places such a checkpoint at five locations along the stack's packet-processing paths. A packet passes through these checkpoints as it is processed, and each check yields one of a handful of verdicts: accept, drop, queue to user space, or steal the packet into another path. The framework only needs to obtain a verdict for each packet; Netfilter itself imposes no rules about what services may be provided inside a checkpoint.
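For reference, a checkpoint is installed roughly like this. This is a hedged sketch of a do-nothing hook module assuming a recent kernel; the registration API has changed across kernel versions, and the name "demo" is mine:

#include <linux/module.h>
#include <linux/netfilter.h>
#include <linux/netfilter_ipv4.h>
#include <net/net_namespace.h>

/* Runs for every IPv4 packet crossing PRE_ROUTING; returns a verdict. */
static unsigned int demo_hook(void *priv, struct sk_buff *skb,
                              const struct nf_hook_state *state)
{
    return NF_ACCEPT;   /* other verdicts: NF_DROP, NF_QUEUE, NF_STOLEN */
}

static struct nf_hook_ops demo_ops = {
    .hook     = demo_hook,
    .pf       = NFPROTO_IPV4,
    .hooknum  = NF_INET_PRE_ROUTING,   /* one of the five checkpoints */
    .priority = NF_IP_PRI_FIRST,
};

static int __init demo_init(void)
{
    return nf_register_net_hook(&init_net, &demo_ops);
}

static void __exit demo_exit(void)
{
    nf_unregister_net_hook(&init_net, &demo_ops);
}

module_init(demo_init);
module_exit(demo_exit);
MODULE_LICENSE("GPL");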
TC, by contrast, is designed to provide a service to packets or flows, such as rate limiting and shaping. A service is not a verdict in the Netfilter sense; providing it requires carrying out a whole series of actions. "Planning and organizing the execution of those actions" is therefore the crux of the TC framework's design! That is to say, TC focuses on how to execute, not merely on obtaining a single action to execute. In other words, Netfilter cares about what to do, while TC cares about how to do it. (I have written plenty of code and articles about Netfilter, so I will not repeat that here.)
There is already a large body of theory on rate limiting and traffic shaping; however, this article is about the Linux implementation of the TC framework rather than about those algorithms, and a short article cannot trace the whole history from traffic-control theory to the implementations in various operating systems. What we do know is that a queue is the practical choice in most implementations, so the question becomes: how does the Linux TC framework organize its queues? Before examining that in detail, I will compare Netfilter and TC one last time. The classic shaping tool worth keeping in mind is the token bucket.
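Since the token bucket keeps coming up, here is the idea in a few lines, as a toy user-space sketch of my own (not the kernel's TBF qdisc; names and units are invented):

#include <stdbool.h>
#include <stdint.h>

/* Tokens accumulate at `rate` bytes/sec up to `burst`; a packet of
 * `len` bytes may pass only if enough tokens are available. */
struct tbucket {
    double   rate;      /* refill rate, bytes per second */
    double   burst;     /* bucket depth, bytes           */
    double   tokens;    /* current fill level            */
    uint64_t last_ns;   /* timestamp of the last refill  */
};

static bool tb_admit(struct tbucket *tb, uint64_t now_ns, unsigned int len)
{
    double elapsed = (now_ns - tb->last_ns) / 1e9;

    tb->tokens += elapsed * tb->rate;       /* refill */
    if (tb->tokens > tb->burst)
        tb->tokens = tb->burst;             /* clamp to bucket depth */
    tb->last_ns = now_ns;

    if (tb->tokens < len)
        return false;                       /* not enough tokens: delay or drop */
    tb->tokens -= len;
    return true;                            /* packet may be sent */
}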
If you know the difference between UNIX character devices and block devices, the difference between the Netfilter framework and the TC framework is easy to grasp. A Netfilter hook point resembles a character device such as a pipe, with skbs as the one-way stream flowing through it: in general they flow in one end and out the other in arrival order, each emerging with a verdict such as ACCEPT or DROP. The TC framework resembles a block device, whose contents are stored and accessed randomly: the order in which skbs leave is not necessarily the order in which they entered. That is exactly what traffic shaping requires. In other words, the TC framework must implement a randomly accessible packet buffer, and it is inside this buffer that traffic is controlled. As we already know, that buffer is realized with queues.
Of course, nothing is absolute: a Netfilter hook point can also hold a buffer or execute a series of actions. Typical examples are the defragmentation and NAT functions in conntrack. IP fragments entering the PREROUTING hook are held there temporarily until all fragments have arrived and been reassembled; they do not flow out of the hook point one at a time as they came in. For NAT, the Netfilter result is plainly "a series of actions was executed", not merely ACCEPT. I have also written modules that use Netfilter to implement traffic control, and conversely the TC framework can implement Netfilter-style functions. In short, once you understand the design principles and the essence of these frameworks, you can use them, and extend them, to solve your problems.
My personal view: for any single Netfilter hook point, the TC framework is a superset of it, more flexible but more complex to implement. Netfilter's charm over TC lies in where its hook points are placed.
Now let us formally introduce the design of the TC framework.
Nearly every introduction to TC found on the Internet describes it, without exception, as being composed of "queue disciplines, classes, and filters", and most do so vaguely; I dare say they all descend from the same document or book. Few people try to understand the design of the TC framework from another angle. That is the challenging part, and the part I personally enjoy. Before introducing how TC organizes its queues, let me first introduce what I call recursive control. Recursive control is layered control in which every layer is controlled in the same way. Anyone familiar with CFS scheduling knows that group scheduling and task scheduling use the same scheduling algorithm, even though groups and tasks clearly live at different levels. I drew the following figure to describe the situation briefly:
[Figure: CFS schedules groups and tasks with the same algorithm at different levels of a hierarchy]
This tree-like recursive control organizes more than control logic: Linux uses the same shape when implementing the UNIX process model. Each layer is a two-level tree, as this figure shows:
[Figure: the UNIX process model as layers of two-level trees]
As you can see, each layer shows only a fragment of the recursion; a three-dimensional drawing would display it better. Every node except the leaves is itself the root of an independent small tree, and whether for the big tree or any small tree, the control logic and the organizational logic are of the same nature.
Recursive control makes it easy to stack control logic arbitrarily. We have seen this in protocol-stack design as "X over Y", XoY for short: PPPoE, IP over UDP (OpenVPN in tun mode), TCP over IP (the native TCP/IP stack), and so on. For TC, consider the following requirements:
1. Divide the bandwidth between TCP and UDP in a fixed ratio;
2. Within the TCP traffic, assign different priorities by source IP address range;
3. Within one priority queue, divide the bandwidth between HTTP and everything else in a fixed ratio;
4. ...

These requirements clearly call for recursive control: items 1 and 3 both allocate bandwidth by ratio, but obviously at different levels. The entire architecture looks like the following figure (a concrete tc sketch follows it):
[Figure: a hierarchy of queues implementing the requirements above, level by level]
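Expressed with the tc command line, the first levels of such a hierarchy might look like this. This is a sketch only; the device name, rates, and class IDs are invented, and HTB is just one qdisc capable of ratio allocation:

# Level 1: split 100mbit between TCP (70%) and UDP (30%)
tc qdisc add dev eth0 root handle 1: htb default 20
tc class add dev eth0 parent 1:  classid 1:1  htb rate 100mbit
tc class add dev eth0 parent 1:1 classid 1:10 htb rate 70mbit ceil 100mbit
tc class add dev eth0 parent 1:1 classid 1:20 htb rate 30mbit ceil 100mbit
tc filter add dev eth0 parent 1: protocol ip prio 1 u32 match ip protocol 6  0xff flowid 1:10
tc filter add dev eth0 parent 1: protocol ip prio 1 u32 match ip protocol 17 0xff flowid 1:20

# Level 2: inside the TCP class, prioritize by source address range
tc qdisc add dev eth0 parent 1:10 handle 10: prio
tc filter add dev eth0 parent 10: protocol ip prio 1 u32 match ip src 10.0.1.0/24 flowid 10:1

Each deeper level nests in exactly the same way; that is the recursion.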
However, reality is farther away than the imagination. Although the figure above hints at the shape of the TC framework, it does not tell us how to implement it. Several problems are typical: how do you steer packets into different queues? What data structure should the non-leaf nodes in the figure be? They are not real queues, yet they must exhibit queue behavior, so how are they expressed?
When Linux implements TC, it abstracts the "queue". Essentially, an abstract queue maintains two callback function pointers: enqueue and dequeue. Neither callback necessarily puts a packet into, or takes it out of, a real queue; each simply "executes a series of operations". That series of operations can be:
1. for a leaf node, actually placing the packet on a real queue, or pulling it off that queue;
2. recursively calling the enqueue/dequeue of other abstract queues.

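In the kernel source this abstract queue is the queue discipline, struct Qdisc, and its operations table carries exactly the two callbacks just described. A simplified excerpt (the exact field set and signatures vary across kernel versions):

struct Qdisc_ops {
    struct Qdisc_ops             *next;
    const struct Qdisc_class_ops *cl_ops;      /* non-NULL only for classful qdiscs */
    char                         id[IFNAMSIZ]; /* "pfifo", "tbf", "htb", ...        */
    int                          priv_size;

    int             (*enqueue)(struct sk_buff *skb, struct Qdisc *sch);
    struct sk_buff *(*dequeue)(struct Qdisc *sch);
    /* init, reset, destroy, dump, ... omitted */
};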
Note that point 2 above mentions "other abstract queues". How is that other abstract queue located? A choice must be made, which calls for a selector: a component that classifies a packet into one abstract queue based on the packet's characteristics. At this point the design of TC can be drawn as follows:
[Figure: recursive TC design — every node exposes enqueue/dequeue, and selectors steer packets to child queues]
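The recursion itself fits in a few lines of user-space C. This is a toy model of my own, not kernel code: the "selector" here merely splits TCP from everything else, and the inner dequeue applies strict priority:

#include <stddef.h>

struct pkt { int proto; struct pkt *next; };

/* Every node, leaf or inner, exposes the same two operations. */
struct qnode {
    int         (*enqueue)(struct qnode *q, struct pkt *p);
    struct pkt *(*dequeue)(struct qnode *q);
    void        *priv;
};

/* Leaf node: a real FIFO queue. */
struct fifo { struct pkt *head, *tail; };

static int fifo_enqueue(struct qnode *q, struct pkt *p)
{
    struct fifo *f = q->priv;
    p->next = NULL;
    if (f->tail) f->tail->next = p; else f->head = p;
    f->tail = p;
    return 0;
}

static struct pkt *fifo_dequeue(struct qnode *q)
{
    struct fifo *f = q->priv;
    struct pkt *p = f->head;
    if (p) { f->head = p->next; if (!f->head) f->tail = NULL; }
    return p;
}

/* Inner node: a selector picks a child, then the call recurses. */
struct inner { struct qnode *child[2]; };

static int inner_enqueue(struct qnode *q, struct pkt *p)
{
    struct inner *in = q->priv;
    int cls = (p->proto == 6) ? 0 : 1;                    /* toy selector: TCP vs. rest */
    return in->child[cls]->enqueue(in->child[cls], p);    /* recurse */
}

static struct pkt *inner_dequeue(struct qnode *q)
{
    struct inner *in = q->priv;
    struct pkt *p = in->child[0]->dequeue(in->child[0]);  /* toy policy: strict priority */
    return p ? p : in->child[1]->dequeue(in->child[1]);
}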
As you can see, I have not defined the TC framework by the classic "queue discipline, class, filter" triple, but explained it as recursive control. If you project the classic triple onto this graph, it looks like the figure below. Note that I deleted the unnecessary text so the graph does not get too messy; for the full text, see the previous figure:
[Figure: the same recursive design annotated with the classic terms — qdisc, class, filter]
As you can see, the two pictures are essentially the same; they differ only in minor details.
Now a short digression, again involving Netfilter; this time not a comparison with TC, just a personal thought. Once upon a time I was very fond of Cisco's ACLs, because they attach to an interface, whereas Netfilter hooks into the processing path rather than the processing device. To Netfilter, the device is just another match with nothing special about it: related or not, every packet must pass the hook point and at least be tested against "-i ethX". So I wanted to hang a filter_list off net_device, and I wrote some code for it; it worked well and I was ready to use it. I am someone who often reinvents wheels. Then I looked at the implementation of TC and found that the TC framework was exactly what I had been looking for. Bluntly: what can be done with Netfilter can equally be done with TC. Moreover, TC is built on queue disciplines (that is literally how the data-structure field is named: Qdisc, queue discipline), it is not constrained by the classic triple, the abstract enqueue/dequeue does not dictate an implementation, and a qdisc is bound to the NIC (more precisely, to a NIC queue, if the NIC supports multiple queues) rather than hooked into the processing path. So I had two options:
1. implement a new Qdisc with a built-in simple FIFO queue, port matches/targets from Netfilter into its enqueue operation, and let all ACCEPTed packets drain into the FIFO;
2. in the classifier, make classification depend not only on the packet's characteristics but also on an extra action callback, where only a return of 0 means the packet passed classification; inside that callback you can perform any action (drop, NAT, and so on).

Of the two options above, option 2 already exists in TC, and option 1 is easy to build: you only need to implement one queue discipline, or add an action to each existing one. It looks like this:
[Figure: a Qdisc whose enqueue runs Netfilter-style match/target logic in front of a simple FIFO]
Option 2 is simpler still; its essence is to do the work inside the diamond, that is, the classifier. Enlarged, the diamond looks like this:
[Figure: the classifier diamond enlarged, with an action callback deciding the packet's fate]
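Today's tc in fact exposes exactly this idea: a filter can carry actions. A sketch (the device, handles, and addresses are invented):

# Drop packets from one source inside the queuing path (a firewall-like rule)
tc filter add dev eth0 parent 1: protocol ip prio 1 u32 \
    match ip src 192.0.2.1/32 action drop

# Stateless NAT performed by the nat action
tc filter add dev eth0 parent 1: protocol ip prio 2 u32 \
    match ip src 10.0.0.1/32 action nat egress 10.0.0.1/32 198.51.100.1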
With that, the TC framework implements the firewall and NAT functions I had always wanted. Truthfully, I had known this for a long time, but I never liked the tc command line: it is too arcane to configure and extremely hard to maintain, even harder than iptables rules. And maintainability matters enormously, more than how cleverly you write the rule in the first place, because writing is a momentary act: with enough accumulated experience the solution comes in an instant, and even inspiration, dare I say, strikes in an instant, perhaps after a few drinks. Maintenance, by contrast, is a long-term task, and the maintainer will not necessarily be you. You must think of others, because a technical society is an altruistic society.
Okay. By now I believe I have said everything I should say, all of it framework-level, with no details. Although I dislike the tc command line, I still want to use one last figure to show how each tc command maps onto the kernel data structures. Again there are no details, the commands are incomplete, and the matches are omitted, because those parts are not the point:
[Figure: mapping of tc qdisc/class/filter commands onto the kernel data structures]
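For orientation, the rough correspondence (simplified; the struct names are from the kernel source) is:

tc qdisc add ...   ->  an instance of struct Qdisc, driven by its struct Qdisc_ops
tc class add ...   ->  a class managed through the qdisc's struct Qdisc_class_ops
tc filter add ...  ->  a struct tcf_proto attached to the qdisc/class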
Reading my articles, you will rarely find something to copy and paste directly: the code is omitted and the commands are omitted. Even I, rereading what I wrote years ago and wanting to get something running quickly, could not. Yet I still believe the idea matters more than the implementation. Once you understand the essence behind an implementation, or behind reality, you will be comfortable and fluent in using it.

What protocol is TC?

In Linux kernel 2.1.x and later versions, TC is the abbreviation of Traffic Control. The TC code implements bandwidth allocation and management for IP traffic.

The framework and working principle of Linux device drivers

I. Concepts of Linux device drivers

A system call is the interface between the operating-system kernel and applications; a device driver is the interface between the kernel and the machine's hardware. The driver shields the application from hardware details, so that to the application a hardware device is just a device file, which can be operated on like an ordinary file. A device driver is part of the kernel and implements the following functions:

1. Initializing and releasing the device;

2. Transferring data from the kernel to the hardware, and reading data back from the hardware;

3. Reading the data that an application passes to the device file, and sending back the data an application requests;

4. Detecting and handling device errors.

In Linux, device files fall into three main classes: character devices, block devices, and network devices. The principal difference between character and block devices is this: when a read/write request is issued to a character device, the actual hardware I/O normally happens immediately. A block device instead uses a piece of system memory as a buffer; if the buffered data can satisfy the request, the request returns the data at once, and only otherwise does it invoke the request function to perform actual I/O. Block devices are designed for disks and other slow devices, to avoid wasting excessive CPU time.

As mentioned, user processes deal with the actual hardware through device files. Every device file has a file attribute (c/b) indicating whether it is a character device or a block device. Every device file also has two device numbers: the major number, which identifies the driver, and the minor number, which distinguishes different pieces of hardware served by the same driver; if you have two floppy drives, for example, the minor number tells them apart. The major number of a device file must match the major number the driver claimed at registration time; otherwise the user process cannot reach the driver.

One more thing must be said: when a user process calls into the driver, the system enters kernel mode and is no longer preemptively scheduled. That is, the system can do other work only after the subroutines of your driver return; if your driver spins in an endless loop, you unfortunately have no choice but to reboot the machine, followed by a long fsck.

II. Instance analysis

Let's write a simple character device driver. It will not do anything useful, but it helps in understanding how Linux device drivers work. Type the following C code into your machine and you will obtain a real device driver.

Since a user process deals with the hardware through the device file, operations on a device file map onto certain system calls, such as open, read, write, close and so on. Note: not fopen or fread. But how are these system calls connected to the driver? That requires understanding a key data structure:

struct file_operations {
    int (*seek) (struct inode *, struct file *, off_t, int);
    int (*read) (struct inode *, struct file *, char *, int);
    int (*write) (struct inode *, struct file *, const char *, int);
    int (*readdir) (struct inode *, struct file *, struct dirent *, int);
    int (*select) (struct inode *, struct file *, int, select_table *);
    int (*ioctl) (struct inode *, struct file *, unsigned int, unsigned long);
    int (*mmap) (struct inode *, struct file *, ...
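On modern kernels the same file_operations pattern survives, though with different signatures. For reference, a minimal do-nothing character driver might look like the following; this is a hedged sketch, and the device name "demo" and the dynamic-major choice are my own:

#include <linux/module.h>
#include <linux/fs.h>

/* Always reports end-of-file: the driver "does nothing". */
static ssize_t demo_read(struct file *filp, char __user *buf,
                         size_t len, loff_t *off)
{
    return 0;
}

static const struct file_operations demo_fops = {
    .owner = THIS_MODULE,
    .read  = demo_read,
};

static int demo_major;

static int __init demo_init(void)
{
    demo_major = register_chrdev(0, "demo", &demo_fops); /* 0 = allocate a major */
    return demo_major < 0 ? demo_major : 0;
}

static void __exit demo_exit(void)
{
    unregister_chrdev(demo_major, "demo");
}

module_init(demo_init);
module_exit(demo_exit);
MODULE_LICENSE("GPL");

After loading the module, mknod /dev/demo c <major> 0 creates a device file whose major number matches the one the driver registered, exactly as described above.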
