Packet classification and scheduling: another explanation of the Linux TC framework

Source: Internet
Author: User

If you approach the Linux TC framework from the angle of hierarchical recursion, it is tempting to divide queuing disciplines into classful and classless kinds and to treat the two as equals. In reality there is a hierarchical relationship between them. The classful/classless split is entirely an artifact of the implementation: the TC framework is very compact, built on the recursive triple of "queuing discipline, class, filter". But if we set the implementation aside, we need a more principled way to understand packet scheduling thoroughly.
1. Packet scheduling
Packet scheduling is a layer that isolates the NIC driver's transmit/receive modules from the protocol stack. In other words, a packet leaving the protocol stack does not go straight into the NIC driver's transmit module; it first enters the packet scheduling layer, and the driver pulls packets out of that layer to send when it is ready. The same holds for reception. You can think of this scheduling layer as a buffer between the protocol stack and the NIC. If you are familiar with the I/O path of UNIX block device files, you can read the scheduling layer as a block device's buffer: a block device uses the buffer's metadata to decouple random I/O from the order in which data actually reaches the medium, and a NIC relies on the packet scheduling layer to solve the traffic control problem.
It is this middle layer that makes a NIC look more like a block device than a simple FIFO character (streaming) device. At the scheduling layer, packets can be reordered, dropped, or held back in order to rate-limit or shape packets and flows. This isolating layer, whether for a NIC or a block device, can be called a strategy module: it embodies policy. For a network packet, the upper protocol stack does not care how the packet eventually leaves the machine; it only completes its own level of encapsulation and hands the packet to the strategy layer, and its task is done. The NIC, in turn, does not care about the protocol stack: whether a packet came down through the stack or was injected by some other means, it only knows how to pull the "most worth sending" packet from the strategy layer. How that packet is chosen is the embodiment of the strategy, and it is the key to implementing the packet scheduling layer.
Anyone familiar with process scheduling understands what scheduling means: the English word "schedule" carries the idea of a timetable, of doing something now or at some future time. For process scheduling, it means finding, in the process table, the process most worth running and putting it on the CPU. How that process is found is where the scheduling strategy shows itself. In modern Linux this is embodied in scheduling classes such as rt and fair: the former scheduled with FIFO/RR-style algorithms, the latter with CFS. A process is placed into a scheduling class at creation time or later via an API, and once it belongs to a class, it is scheduled by that class's algorithm.
Understanding process scheduling helps greatly in understanding packet scheduling; in fact, the only difference between the two is the scheduling entity. Packet scheduling means finding the packet most worth sending (with IMQ or IFB, ingress scheduling can be implemented as well). The basic scheduling methods are as follows.
1.1. FIFO scheduling
The simplest scheduling method: maintain a single queue, first in, first out, as shown below:
[diagram: a single FIFO queue]
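As a minimal sketch of this idea (plain Python, not kernel code; the class name and `limit` parameter are illustrative, not taken from the kernel), a FIFO queuing discipline with tail-drop can be modeled like this:

```python
from collections import deque

class FifoQdisc:
    """Minimal FIFO queuing discipline: packets leave in arrival order."""
    def __init__(self, limit=128):
        self.q = deque()
        self.limit = limit

    def enqueue(self, pkt):
        if len(self.q) >= self.limit:
            return False              # tail-drop when the queue is full
        self.q.append(pkt)
        return True

    def dequeue(self):
        return self.q.popleft() if self.q else None

q = FifoQdisc(limit=2)
q.enqueue("a"); q.enqueue("b")
print(q.enqueue("c"))                 # False: queue full, packet dropped
print(q.dequeue(), q.dequeue())       # a b
```

The `limit` plays the role of a qdisc's queue length: once it is reached, newly arriving packets are simply dropped rather than queued.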
1.2. Priority scheduling
Slightly more complicated: maintain multiple queues, one per priority. The scheduling algorithm first selects the highest-priority non-empty queue, then runs the FIFO algorithm within that queue. This is a two-level scheduling algorithm, diagrammed as follows:
[diagram: one FIFO queue per priority band]
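The two-level structure above can be sketched as follows (again an illustrative model, not kernel code; the band numbering mirrors the common convention where band 0 is the highest priority):

```python
from collections import deque

class PrioQdisc:
    """Two-level scheduler: pick the highest-priority non-empty band,
    then run FIFO inside that band (band 0 = highest priority)."""
    def __init__(self, nbands=3):
        self.bands = [deque() for _ in range(nbands)]

    def enqueue(self, pkt, band):
        self.bands[band].append(pkt)

    def dequeue(self):
        for band in self.bands:       # scan from highest priority down
            if band:
                return band.popleft()
        return None

q = PrioQdisc()
q.enqueue("bulk", 2)
q.enqueue("interactive", 0)
print(q.dequeue())                    # interactive: band 0 drains before band 2
```

Note the two levels explicitly: the outer loop is the first-level scheduler (choose a band), and `popleft` is the second-level FIFO scheduler inside the chosen band.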
1.3. Stochastic fair scheduling
In contrast with priority scheduling, this guarantees all packets an equal chance of being sent. It also maintains multiple queues, but each queue is a hash bucket: each packet is hashed into one of the buckets according to a chosen key, and the scheduling algorithm serves the buckets in order, left to right, each bucket running FIFO internally. The key point is that the hash algorithm changes at regular intervals, as diagrammed below:
[diagram: hash buckets served round-robin]
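A sketch of this scheme (illustrative Python; the `perturb` method models the periodic hash change mentioned above, which a real implementation would trigger from a timer):

```python
import random
from collections import deque

class SfqQdisc:
    """Stochastic fairness sketch: hash each flow into a bucket, serve
    buckets round-robin, and re-key the hash at intervals so flows that
    collide do not stay unlucky forever."""
    def __init__(self, nbuckets=8):
        self.nbuckets = nbuckets
        self.buckets = [deque() for _ in range(nbuckets)]
        self.salt = 0
        self.next_bucket = 0

    def perturb(self):
        self.salt = random.randrange(1 << 32)   # periodic hash re-keying

    def enqueue(self, flow_key, pkt):
        idx = hash((flow_key, self.salt)) % self.nbuckets
        self.buckets[idx].append(pkt)

    def dequeue(self):
        for _ in range(self.nbuckets):          # round-robin scan, wrapping
            b = self.buckets[self.next_bucket]
            self.next_bucket = (self.next_bucket + 1) % self.nbuckets
            if b:
                return b.popleft()
        return None

q = SfqQdisc()
q.enqueue(("10.0.0.1", 80), "p1")
q.enqueue(("10.0.0.2", 443), "p2")
print(sorted([q.dequeue(), q.dequeue()]))       # ['p1', 'p2']
```

Because dequeue resumes scanning after the last-served bucket, a flow hogging one bucket cannot starve flows that hashed elsewhere; the periodic re-keying breaks up long-lived hash collisions.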
1.4. Anything else you can imagine: scheduling methods the Linux kernel has not yet implemented, and improvements to the algorithms it already has.
Besides packet scheduling there is another basic concept: traffic shaping, whose basic tool is the token bucket. Note that the purpose of shaping is not to find the packet most worth sending, but to decide whether a packet that has just arrived may be sent, so strictly speaking it should not be filed under packet scheduling. The difference between scheduling and shaping: scheduling assumes many packets are already queued and picks the one most worth sending; shaping assumes there is no queue at all and the packet could go out at once, and the shaping logic decides whether it may be sent immediately. What they share is that both are invoked at the moment a packet is being selected for the NIC to send (or receive).
1.5. The token abstraction
We all know that the token bucket is a powerful traffic shaping tool used by almost all network devices, so it must have something special. To appreciate it, first ask how you would rate-limit a data stream at all, where a "stream" can be defined arbitrarily: the set of packets in one FIFO queue, the set of packets matching a classic five-tuple, or the set of packets hashing to the same value over a few header fields. The most obvious approach is to record statistics for the stream and limit based on them: record the time of the last send and the amount of data sent, weight the historical byte counts by age (the older the sample, the smaller its weight), and when a packet arrives and asks to be sent, take the difference between the current time and the time of the last calculation, divide the total data by that interval, and compare the resulting rate against the limit. This is in fact how the Linux TC CBQ qdisc works, but even its author was not satisfied with it, hence HTB; more on that later.
However, this kind of calculation is extremely inaccurate: it is strongly affected by the operating system's protocol stack implementation, the NIC driver, and timer behavior. So we need another abstraction to measure against, and that is the token. Tokens flow into a token bucket at a constant rate, measured in bytes or bits; when a packet arrives, we only need to check whether the bucket holds enough tokens, and if it does, the packet may be sent. Because the bucket is a container, tokens can accumulate, which conveniently accommodates bursts of data.
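The token mechanism can be sketched as follows (illustrative Python; time is passed in explicitly rather than read from a clock, so the refill arithmetic is easy to follow):

```python
class TokenBucket:
    """Shaper sketch: tokens accumulate at `rate` bytes/sec up to `burst`
    bytes; a packet may be sent only if enough tokens are available."""
    def __init__(self, rate, burst):
        self.rate = rate
        self.burst = burst
        self.tokens = burst           # start with a full bucket
        self.last = 0.0

    def allow(self, size, now):
        # refill tokens for the elapsed time, capped at the bucket size
        self.tokens = min(self.burst,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= size:
            self.tokens -= size
            return True
        return False

tb = TokenBucket(rate=1000, burst=1500)   # 1000 B/s, 1500 B bucket
print(tb.allow(1500, now=0.0))  # True: a full bucket absorbs the burst
print(tb.allow(1500, now=0.5))  # False: only 500 tokens have refilled
print(tb.allow(1500, now=2.0))  # True: 1.5 s more refill tops the bucket up
```

The first call shows the burst-absorbing property the text describes: a stream that was idle can immediately send up to `burst` bytes, after which it is held to the long-term `rate`.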

Note that, so far, we have only described how packets are scheduled out, that is, how the packet most worth sending is found, on the basic premise that the packets are already there; in the examples I used the queue as the basic data structure, which is indeed what is used in practice. One problem remains unsolved: how do packets get into those queues in the first place? For process scheduling, a process at creation time or while running can call the sched_setscheduler system call to be placed into some scheduling class's list, queue, or red-black tree, where the scheduling module can find it. But what about packets? There must be an analogous mechanism, which we can collectively call queuing rules. So far, the overall picture of packet scheduling looks like this:
[diagram: overall picture of packet scheduling]
As you can see, separating packet classification from packet scheduling pays off: the scheduling system can concentrate on the details of its own algorithms without having to identify data flows, leaving flow identification and classification to the layer above the scheduling system.
2. The layer above packet scheduling
In section 1 we discussed what "scheduling" means and drew a fairly detailed analogy with process scheduling, but so far we have only covered the details of the scheduling algorithms, not the counterpart of the process scheduler's scheduling classes. In process scheduling, the kernel implements scheduling classes: all processes belonging to one scheduling class are scheduled to run by the same algorithm. Packet scheduling has the corresponding concept of a class: all packets belonging to one packet class participate in scheduling under the same algorithm. Just as scheduling classes sit above process scheduling, packet classification sits above packet scheduling.
Within process scheduling, the various scheduling classes are not equals: before scheduling an actual process, the kernel first schedules among the classes, that is, selects a scheduling class, and only then schedules among the processes belonging to that class. Similarly, the packet classes defined by packet characteristics are not equals either: scheduling first happens among the classes, with algorithms that need not be the packet-level ones listed above. The parallel holds: just as scheduling classes sit above process scheduling, this classification layer sits above packet scheduling.
2.1. Packet classification: the enqueue process
Accurate scheduling and shaping are built on top of "packet scheduling"; the question is how a packet gets queued into one of the scheduling queues described in section 1. For this layer above packet scheduling, I use the collective term queuing rules. Note that this use of "queuing rules" is quite different from the queuing disciplines (qdiscs) of the TC documentation: here it means all the rules a packet passes through from entering the scheduling system until it is finally queued somewhere. I believe this view is easier to grasp than the recursive "queuing discipline, class, filter" triple, since, apart from the final queue, the intermediate steps merely decide which branch the packet takes next; no real queuing happens there.
The upper layer of packet scheduling is therefore a series of decisions that carve out a unique path ending at a queue. By graph theory and by implementation efficiency, the best graph for determining a unique path is a tree: from the root to any leaf node there is exactly one path. This series of decisions is precisely the process of moving the packet down to a leaf, and the only question at each decision point is which branch the packet takes once it arrives there. Obviously the tree can be N-ary, and the branches need not all have the same height, as long as every path ends at a leaf node representing a scheduling queue.
The action that chooses a branch for the packet is built into each intermediate node; the decision algorithms of the intermediate nodes may be the same or different, which yields a hierarchically recursive tree, as shown here:
[diagram: hierarchical classification tree]
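The root-to-leaf decision walk can be sketched like this (illustrative Python; the two-level split on destination port and packet size is a hypothetical example, not a recommended configuration):

```python
# Each inner node holds a filter (a predicate choosing a child); the walk
# from root to leaf decides which scheduling queue the packet joins.

class Leaf:
    def __init__(self):
        self.queue = []               # the only place real queuing happens
    def classify(self, pkt):
        self.queue.append(pkt)
        return self

class Inner:
    def __init__(self, filter_fn, children):
        self.filter_fn = filter_fn    # packet -> child index
        self.children = children
    def classify(self, pkt):
        # no queuing here: just pick a branch and recurse
        return self.children[self.filter_fn(pkt)].classify(pkt)

# Hypothetical tree: split on destination port first, then on packet size.
voip, bulk, rest = Leaf(), Leaf(), Leaf()
root = Inner(lambda p: 0 if p["dport"] == 5060 else 1,
             [voip,
              Inner(lambda p: 0 if p["len"] > 1000 else 1, [bulk, rest])])

leaf = root.classify({"dport": 5060, "len": 200})
print(leaf is voip)                   # True
```

This makes the point in the text concrete: the intermediate nodes only branch, and the packet is physically queued exactly once, at the leaf.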
Understood this way, the enqueue logic of the Linux TC framework becomes much simpler, and it remains consistent with the classic triple interpretation. We can attach a filter to each intermediate node; if a node's selection algorithm differs from its parent's, a new qdisc is defined there, although the name "queuing discipline" invites misunderstanding on the enqueue side (it is accurate on the dequeue side, because dequeue, unlike enqueue, is recursive). Mapped onto the classic "queuing discipline, class, filter" triple, a class represents a child node under a subtree, while a filter selects, based on the packet's characteristics, which child node at the next level is chosen.
This makes Linux's HTB qdisc the classic example: it strives to minimize such misunderstanding by keeping the branch-selection algorithm consistent at every level (which is why it can be layered into arbitrarily many tiers). It effectively places a token bucket at every node, which lets you control the rate of packets entering any branch. Note that these tokens are not consumed during enqueue; they are used at dequeue time.
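To illustrate "a token bucket at every node, charged at dequeue time", here is a deliberately simplified sketch (illustrative Python; real HTB lets a class borrow tokens from its parent when it exceeds its own rate, whereas this model simply requires every node on the root-to-leaf path to have enough tokens):

```python
# Simplified sketch, NOT real HTB borrowing: a packet may leave a leaf
# only if every node on its root-to-leaf path has tokens for it, and
# tokens are charged at dequeue time, as noted in the text.

class Node:
    def __init__(self, tokens, children=None):
        self.tokens = tokens
        self.children = children or []
        self.queue = []               # used only by leaves; holds sizes

def dequeue(path):
    """path: list of nodes from the root down to a single leaf."""
    leaf = path[-1]
    if not leaf.queue:
        return None
    size = leaf.queue[0]
    if all(n.tokens >= size for n in path):
        for n in path:
            n.tokens -= size          # charge the whole path on dequeue
        return leaf.queue.pop(0)
    return None                       # rate limit hit somewhere on the path

leaf = Node(tokens=2000)
root = Node(tokens=1000, children=[leaf])
leaf.queue = [800, 800]
print(dequeue([root, leaf]))          # 800: both buckets have enough tokens
print(dequeue([root, leaf]))          # None: root has only 200 tokens left
```

Even this crude model shows why per-branch rate control falls out of the tree shape: the root's bucket caps the aggregate, while each inner node's bucket caps its own subtree.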
2.2. Scheduling: the dequeue process
If enqueuing a packet means opening a unique path from root to leaf node, then dequeuing is a recursive scheduling process at every level of the tree. This is why the Linux kernel's TC framework describes itself with the "queuing discipline, class, filter" triple. But do not take the words too literally: the final scheduling of a packet has nothing to do with classification, which is merely the filters acting on packet characteristics; likewise, the final scheduling has nothing to do with the queuing rules, since queuing is purely enqueue-side behavior while scheduling is dequeue-side behavior.
If the enqueue operation picks a branch for the packet at each node according to the filter configuration and finally inserts it into a real queue at a leaf node, then the dequeue operation starts at the root, picks a branch at each node according to the scheduling algorithm, and walks down to a leaf to take a packet out. Whether enqueuing or dequeuing, we move from the root toward the leaves, with the nodes closer to the root participating in classification and scheduling first.
The dequeue process shows that packets are scheduled at every level, within every subtree, by the scheduling algorithm defined at that subtree's root; like the enqueue process, it forms a hierarchically recursive tree, as shown here:
[diagram: hierarchical recursive dequeue]
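The recursive dequeue walk can be sketched as follows (illustrative Python; strict priority is used as the per-node scheduling algorithm here purely for concreteness, since each node may use a different one):

```python
# Sketch of the dequeue walk: every inner node runs its own scheduling
# algorithm to pick one child, recursing until a leaf yields a packet.

class Leaf:
    def __init__(self, pkts):
        self.pkts = list(pkts)
    def dequeue(self):
        return self.pkts.pop(0) if self.pkts else None

class Inner:
    def __init__(self, children):
        self.children = children      # ordered: strict-priority scheduling
    def dequeue(self):
        for child in self.children:   # each node may use a different algorithm
            pkt = child.dequeue()
            if pkt is not None:
                return pkt
        return None

tree = Inner([Inner([Leaf([]), Leaf(["hi-prio"])]), Leaf(["lo-prio"])])
print(tree.dequeue())                 # hi-prio: left subtree is scanned first
print(tree.dequeue())                 # lo-prio
print(tree.dequeue())                 # None
```

Note the symmetry with the enqueue walk: filters steer a packet down the tree on the way in, and per-node scheduling algorithms steer the dequeue request down the tree on the way out.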
Understood this way, the dequeue logic of the Linux TC framework can be described by a new triple: "scheduling rules, scheduling entities, scheduling algorithm". The scheduling entities are the nodes of the tree, including the leaf nodes; for a leaf, the scheduling entity can be any data structure rather than a tree node (since it has no subtrees), and the scheduling algorithm chooses which scheduling entity at the next level down is selected.
3. The TC scheduling system
For process scheduling, the kernel's scheduling algorithms and scheduling classes are collectively called the process scheduling system; in the same way, packet classification and packet scheduling can collectively be called packet scheduling. We have divided the whole packet scheduling module into a scheduling module and an upper classification module, and divided its operation into an enqueue process and a dequeue process: for dequeue we introduced the new "scheduling rules, scheduling entities, scheduling algorithm" triple, alongside the classic "queuing discipline, class, filter" triple for enqueue. This is helpful for understanding the Linux TC framework, whose overall structure is as follows:
[diagram: structure of the Linux TC framework]
I have said nothing about ingress flow control, because in the Linux TC implementation the ingress hook cannot maintain a queue; the framework is oriented toward controlling egress. This does not mean, however, that Linux cannot implement ingress flow control: as noted earlier, IMQ or IFB can redirect ingress traffic to a device where queuing applies.


