This article describes the design and implementation of QoS support in the Linux 2.4.x kernel and analyzes in detail the default packet scheduling mechanism, pfifo_fast.
In traditional TCP/IP routers, all IP packets are transmitted on a FIFO (first in, first out) basis. In the early days, when overall traffic and the volume of business-critical data were small, routers dealt with congestion simply by discarding packets. As computer networks developed, traffic grew rapidly and multimedia and VoIP applications imposed stricter latency requirements, so simply discarding packets is no longer adequate for today's networks. Adding bandwidth alone cannot fundamentally solve the problem either. Network developers therefore proposed the concept of quality of service (QoS): in short, providing network service of different quality for different needs. QoS is a basic requirement for future IP networks.
1. Linux Kernel support for QoS
The Linux kernel network protocol stack has included a quality-of-service module since 2.2.x. The code is located in the net/sched/ directory. In Linux, this module is called traffic control (TC).
First, let's look at the general process of sending a packet when the Linux network protocol stack has no TC module (Figure 1).
Note: The layering is based on Linux implementation and does not strictly abide by the OSI hierarchy.
As the figure shows, without TC every packet is sent through dev_queue_xmit, which first determines whether the packet also needs to be delivered to the af_packet protocol handler and then directly calls the send function registered by the NIC driver. This is the FIFO mechanism described above. When congestion occurs, the protocol stack simply keeps trying, as best it can, to call the NIC send function. This traditional approach therefore has many drawbacks.
To support QoS, the Linux designers added the TC module to the packet send path so that packets can be classified, managed, inspected, and processed. To avoid conflicts with existing code and to let users choose whether to use TC, the kernel developers inserted the TC module between the two red circles in the figure. (In fact, packets sent through the TC module are also passed to the af_packet protocol. This article shows af_packet handling in two separate places only for ease of description.)
The following code shows how the send path supports the TC module.
The dev_queue_xmit function in net/core/dev.c contains the following code:

    int dev_queue_xmit(struct sk_buff *skb)
    {
        ...
        q = dev->qdisc;
        if (q->enqueue) {
            /* If TC is enabled on the device, push the packet onto the queue */
            int ret = q->enqueue(skb, q);

            /* Kick the device to start sending */
            qdisc_run(dev);
            return ret;
        }
        if (dev->flags & IFF_UP) {
            ...
            if (netdev_nit)
                dev_queue_xmit_nit(skb, dev);   /* support for the af_packet protocol */
            if (dev->hard_start_xmit(skb, dev) == 0) {
                /* call the send function of the NIC driver to send the packet */
                return 0;
            }
        }
        ...
    }
From the code above, we can see that when q->enqueue is NULL, TC is not used and the packet is sent directly. Otherwise, the packet goes through QoS processing.
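The branch just described can be modeled in user space. The following is a minimal sketch, not kernel code: the types toy_qdisc and toy_dev and the functions toy_count_enqueue and toy_xmit are hypothetical stand-ins, and packets are reduced to plain integers.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical user-space stand-ins for the kernel structures. */
struct toy_qdisc;
typedef int (*toy_enqueue_fn)(int pkt, struct toy_qdisc *q);

struct toy_qdisc {
    toy_enqueue_fn enqueue;   /* NULL means "no queueing on this device" */
    int queued;               /* packets accepted by the qdisc */
};

struct toy_dev {
    struct toy_qdisc *qdisc;
    int sent_directly;        /* packets handed straight to the driver */
};

/* A trivial enqueue hook that only counts packets. */
static int toy_count_enqueue(int pkt, struct toy_qdisc *q)
{
    (void)pkt;
    q->queued++;
    return 0;
}

/* Mirrors the branch in dev_queue_xmit: queue the packet if an enqueue
 * hook exists, otherwise fall through to the direct driver path. */
static int toy_xmit(struct toy_dev *dev, int pkt)
{
    struct toy_qdisc *q = dev->qdisc;

    assert(q != NULL);
    if (q->enqueue)
        return q->enqueue(pkt, q);
    dev->sent_directly++;
    return 0;
}
```

A device whose toy_qdisc has a NULL enqueue pointer takes the direct path, while one with toy_count_enqueue installed has every packet counted by the queue instead.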
2. Specific Design and Implementation of TC
Section 1 described how the Linux kernel supports QoS and where the TC module was added to the existing code. This section describes the design and implementation of TC in detail.
QoS has many congestion management mechanisms, such as FIFO queueing, PQ, CQ, and WFQ. QoS also requires that each interface be able to use a different congestion-handling mechanism. To implement this, Linux uses an object-based design.
The figure is a model of the packet send-queue management mechanism. A QoS policy can be any of several congestion-handling mechanisms. We can regard a policy as a class, the policy class. In an implementation, this class has many instances, the policy objects, and different objects can be used to manage packets. The policy class has many methods, such as enqueue, dequeue, requeue, init, and destroy. In Linux, the Qdisc_ops struct represents the policy class described above.
As mentioned above, each device can use a different policy object. There must therefore be a bridge between the device and the object that associates the device with the policy object it uses. In Linux, the Qdisc struct serves as this bridge.
From the description above, the overall TC architecture emerges, as shown in the figure:
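The relationship between the policy class and the bridge can be illustrated with a small user-space model. This is only a sketch under stated assumptions: toy_qdisc_ops, toy_qdisc, toy_dev, and the two policies "drop" and "one" are hypothetical and far simpler than the kernel's Qdisc_ops and Qdisc. Packets are non-negative integers, and -1 signals "refused" or "empty".

```c
#include <assert.h>
#include <stddef.h>

struct toy_qdisc;

/* The "policy class": one table of methods per queueing policy. */
struct toy_qdisc_ops {
    const char *id;
    int (*enqueue)(struct toy_qdisc *q, int pkt);   /* 0 on success, -1 if refused */
    int (*dequeue)(struct toy_qdisc *q);            /* packet, or -1 when empty */
};

/* The "bridge": one instance per device, pointing at its policy. */
struct toy_qdisc {
    const struct toy_qdisc_ops *ops;
    int slot;        /* single-packet storage keeps the sketch small */
    int has_pkt;
};

/* Policy 1: refuse everything. */
static int toy_drop_enqueue(struct toy_qdisc *q, int pkt) { (void)q; (void)pkt; return -1; }
static int toy_drop_dequeue(struct toy_qdisc *q) { (void)q; return -1; }

/* Policy 2: hold at most one packet. */
static int toy_one_enqueue(struct toy_qdisc *q, int pkt)
{
    assert(pkt >= 0);
    if (q->has_pkt)
        return -1;
    q->slot = pkt;
    q->has_pkt = 1;
    return 0;
}

static int toy_one_dequeue(struct toy_qdisc *q)
{
    if (!q->has_pkt)
        return -1;
    q->has_pkt = 0;
    return q->slot;
}

static const struct toy_qdisc_ops toy_drop_ops = { "drop", toy_drop_enqueue, toy_drop_dequeue };
static const struct toy_qdisc_ops toy_one_ops  = { "one",  toy_one_enqueue,  toy_one_dequeue };

/* Each device reaches its policy only through the bridge object. */
struct toy_dev {
    struct toy_qdisc *qdisc;
};
```

Two devices can then carry different policies simply by pointing their toy_qdisc at different ops tables, and switching policy means swapping one pointer.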
After TC is added, the process of sending a packet is as follows:
(1) The upper-layer protocol starts to send a packet.
(2) Obtain the policy object used by the current device.
(3) Call the enqueue method of this object to push the packet into the queue.
(4) Call the dequeue method of this object to retrieve a packet from the queue.
(5) Call the send function registered by the NIC driver.
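The five steps can be traced in a small user-space simulation. This is a hedged sketch rather than the kernel's code path: toy_fifo, toy_dev, toy_driver_xmit, and toy_send are hypothetical names, the queue is a fixed-size ring of integers, and the "driver" just records what it was asked to transmit.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_QLEN 8

/* A fixed-size FIFO standing in for the device's policy object. */
struct toy_fifo {
    int pkts[TOY_QLEN];
    size_t head, count;
};

static int toy_enqueue(struct toy_fifo *q, int pkt)
{
    if (q->count == TOY_QLEN)
        return -1;                       /* queue full: packet dropped */
    q->pkts[(q->head + q->count) % TOY_QLEN] = pkt;
    q->count++;
    return 0;
}

static int toy_dequeue(struct toy_fifo *q, int *pkt)
{
    if (q->count == 0)
        return -1;                       /* nothing queued */
    *pkt = q->pkts[q->head];
    q->head = (q->head + 1) % TOY_QLEN;
    q->count--;
    return 0;
}

struct toy_dev {
    struct toy_fifo *qdisc;              /* policy object of the device */
    int wire[TOY_QLEN];                  /* what the "driver" transmitted */
    size_t wire_len;
};

/* Hypothetical driver send function: records the packet. */
static void toy_driver_xmit(struct toy_dev *dev, int pkt)
{
    assert(dev->wire_len < TOY_QLEN);
    dev->wire[dev->wire_len++] = pkt;
}

/* Steps 1 to 5 for one packet handed down by the upper layer. */
static int toy_send(struct toy_dev *dev, int pkt)
{
    struct toy_fifo *q = dev->qdisc;     /* step 2: get the policy object */
    int out;

    if (toy_enqueue(q, pkt) != 0)        /* step 3: enqueue */
        return -1;
    while (toy_dequeue(q, &out) == 0)    /* step 4: dequeue */
        toy_driver_xmit(dev, out);       /* step 5: driver send function */
    return 0;
}
```

Because the toy driver never refuses a packet, the queue is always drained immediately; the interesting behavior in the real stack appears when the driver is busy and packets stay queued.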
Next, we analyze from the code how TC installs a policy object on each device.
During NIC registration, register_netdevice is called to install a Qdisc and Qdisc_ops on the device.

    int register_netdevice(struct net_device *dev)
    {
        ...
        dev_init_scheduler(dev);
        ...
    }

    void dev_init_scheduler(struct net_device *dev)
    {
        ...
        /* Install noop_qdisc as the device's qdisc */
        dev->qdisc = &noop_qdisc;
        ...
        dev->qdisc_sleeping = &noop_qdisc;
        dev_watchdog_init(dev);
    }

At this point the NIC has just been registered and is not yet up, so it uses noop_qdisc:

    struct Qdisc noop_qdisc = {
        noop_enqueue,
        noop_dequeue,
        TCQ_F_BUILTIN,
        &noop_qdisc_ops,
    };

noop_qdisc handles packets with the following ops:

    struct Qdisc_ops noop_qdisc_ops = {
        NULL,
        NULL,
        "noop",
        0,
        noop_enqueue,
        noop_dequeue,
        noop_requeue,
    };
From the definitions of noop_enqueue, noop_dequeue, and noop_requeue, we can see that they neither classify nor queue packets; they simply free the skb. The NIC therefore cannot send any packets at this stage. Packets can be sent only after the device has been brought up with ifconfig up.
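The effect of such a no-op policy can be sketched in user space as follows. The names toy_skb, toy_noop_enqueue, toy_noop_dequeue, and the return code TOY_DROP are hypothetical; the kernel functions operate on struct sk_buff and use their own return codes.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a packet buffer. */
struct toy_skb {
    int len;
};

/* Hypothetical return code meaning "packet was dropped". */
#define TOY_DROP 1

static int toy_freed;   /* number of buffers released by the no-op policy */

/* Enqueue never stores the packet: it releases the buffer and reports a drop. */
static int toy_noop_enqueue(struct toy_skb *skb)
{
    assert(skb != NULL);
    free(skb);
    toy_freed++;
    return TOY_DROP;
}

/* Dequeue never has anything to hand to the driver. */
static struct toy_skb *toy_noop_dequeue(void)
{
    return NULL;
}
```

With this policy installed, every packet given to the device disappears before it can reach the driver, which matches the behavior of a registered but not yet activated NIC.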
When ifconfig up is called to start the NIC device, the dev_open function is called.
    int dev_open(struct net_device *dev)
    {
        ...
        dev_activate(dev);
        ...
    }

    void dev_activate(struct net_device *dev)
    {
        ...
        if (dev->qdisc_sleeping == &noop_qdisc) {
            /* install the default qdisc */
            qdisc = qdisc_create_dflt(dev, &pfifo_fast_ops);
        }
        ...
        if ((dev->qdisc = dev->qdisc_sleeping) != &noqueue_qdisc) {
            ...   /* install the specific qdisc */
        }
        ...
    }
After the device is started, the default qdisc->ops of the device is pfifo_fast_ops. To use different ops, another qdisc must be installed on the device, which in essence replaces the dev->qdisc pointer. See the dev_graft_qdisc function in net/sched/sch_api.c.

    static struct Qdisc *dev_graft_qdisc(struct net_device *dev, struct Qdisc *qdisc)
    {
        ...
        oqdisc = dev->qdisc_sleeping;

        /* First remove the old qdisc */
        if (oqdisc && atomic_read(&oqdisc->refcnt) <= 1)
            qdisc_reset(oqdisc);

        /* Install the new qdisc */
        if (qdisc == NULL)
            qdisc = &noop_qdisc;
        dev->qdisc_sleeping = qdisc;
        dev->qdisc = &noop_qdisc;

        /* Start the newly installed qdisc */
        if (dev->flags & IFF_UP)
            dev_activate(dev);
        ...
    }
From dev_graft_qdisc, we can see that to use a new qdisc, the old one is first removed and the new one installed, so that dev->qdisc_sleeping points to the new qdisc; dev_activate is then called to start it. Combined with this statement in dev_activate:
if ((dev->qdisc = dev->qdisc_sleeping) != &noqueue_qdisc)
we can see that dev->qdisc now refers to the new qdisc. (Note: the left operand of the comparison in the statement above is an assignment expression.)
When the NIC is brought down, dev_close -> dev_deactivate is called, which changes the device's qdisc to noop_qdisc and stops packet transmission.
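The way the two pointers, qdisc and qdisc_sleeping, move through registration, activation, grafting, and shutdown can be followed in a small state model. This is a sketch under simplifying assumptions: all toy_ names are hypothetical, locking and reference counting are left out, and the watchdog_armed flag merely marks the branch guarded by the comparison with the noqueue object.

```c
#include <assert.h>
#include <stddef.h>

struct toy_qdisc {
    const char *name;
};

static struct toy_qdisc toy_noop    = { "noop" };
static struct toy_qdisc toy_noqueue = { "noqueue" };
static struct toy_qdisc toy_default = { "pfifo_fast" };

struct toy_dev {
    int up;
    struct toy_qdisc *qdisc;           /* policy used on the send path */
    struct toy_qdisc *qdisc_sleeping;  /* policy to use once the device is up */
    int watchdog_armed;
};

/* Registration: both pointers start at the no-op policy. */
static void toy_register(struct toy_dev *dev)
{
    dev->up = 0;
    dev->qdisc = &toy_noop;
    dev->qdisc_sleeping = &toy_noop;
    dev->watchdog_armed = 0;
}

/* Activation: pick the default if nothing was configured, then switch. */
static void toy_activate(struct toy_dev *dev)
{
    assert(dev != NULL);
    if (dev->qdisc_sleeping == &toy_noop)
        dev->qdisc_sleeping = &toy_default;
    /* The assignment happens inside the comparison, as in the kernel statement. */
    if ((dev->qdisc = dev->qdisc_sleeping) != &toy_noqueue)
        dev->watchdog_armed = 1;
}

static void toy_open(struct toy_dev *dev)
{
    dev->up = 1;
    toy_activate(dev);
}

/* Shutdown: the send path falls back to the no-op policy. */
static void toy_close(struct toy_dev *dev)
{
    dev->qdisc = &toy_noop;
    dev->up = 0;
}

/* Grafting: park the new policy, then activate it if the device is up.
 * Returns the previously parked policy. */
static struct toy_qdisc *toy_graft(struct toy_dev *dev, struct toy_qdisc *new_q)
{
    struct toy_qdisc *old = dev->qdisc_sleeping;

    if (new_q == NULL)
        new_q = &toy_noop;
    dev->qdisc_sleeping = new_q;
    dev->qdisc = &toy_noop;
    if (dev->up)
        toy_activate(dev);
    return old;
}
```

Note that closing the device in this model leaves qdisc_sleeping untouched, so the configured policy is picked up again the next time the device is opened.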
All QoS policies in Linux are ultimately installed through the dev_graft_qdisc path described above. On top of dev_graft_qdisc, sch_api.c provides another layer, register_qdisc, which modules use to install a new qdisc. For example, the RED (Random Early Detection) queue module calls register_qdisc to install the RED object (net/sched/sch_red.c -> init_module()).
If no specific QoS policy is configured after a device is started, the kernel uses the default policy, pfifo_fast_ops, for that device. The rest of this section analyzes pfifo_fast_ops in detail.
The information in the figure corresponds to the parts of the pfifo_fast_ops struct:

    static struct Qdisc_ops pfifo_fast_ops = {
        NULL,
        NULL,
        "pfifo_fast",                       /* ops name */
        3 * sizeof(struct sk_buff_head),    /* packet skb queues */
        pfifo_fast_enqueue,                 /* enqueue function */
        pfifo_fast_dequeue,                 /* dequeue function */
        pfifo_fast_requeue,                 /* requeue function */
        NULL,
        pfifo_fast_init,                    /* queue management initialization function */
        pfifo_fast_reset,                   /* queue management reset function */
    };
When pfifo_fast_ops is installed, pfifo_fast_init is called first to initialize queue management. See the qdisc_create_dflt function.

    static int pfifo_fast_init(struct Qdisc *qdisc, struct rtattr *opt)
    {
        ...
        for (i = 0; i < 3; i++)
            skb_queue_head_init(list + i);   /* initialize the three priority queues */
        ...
    }
The init function initializes three queues.
When a qdisc is deregistered, the reset function of qdisc ops is called. See the dev_graft_qdisc function.
    static void pfifo_fast_reset(struct Qdisc *qdisc)
    {
        ...
        for (prio = 0; prio < 3; prio++)
            skb_queue_purge(list + prio);   /* free all packets in the three priority queues */
        ...
    }
qdisc->enqueue is called when a packet is sent. (In qdisc_create_dflt, the enqueue, dequeue, and requeue functions of the Qdisc_ops are assigned to the corresponding function pointers of the Qdisc.)

    int dev_queue_xmit(struct sk_buff *skb)
    {
        ...
        q = dev->qdisc;
        if (q->enqueue) {
            /* Corresponds to the pfifo_fast_enqueue function */
            int ret = q->enqueue(skb, q);

            /* Kick the device to start sending. Two functions are involved:
               pfifo_fast_dequeue and pfifo_fast_requeue. */
            qdisc_run(dev);
            return ret;
        }
        ...
    }
The enqueue function, pfifo_fast_enqueue:
    static int pfifo_fast_enqueue(struct sk_buff *skb, struct Qdisc *qdisc)
    {
        ...
        /* Determine the packet's priority first, and from it the queue to use */
        list = ((struct sk_buff_head *)qdisc->data) +
               prio2band[skb->priority & TC_PRIO_MAX];
        if (list->qlen <= skb->dev->tx_queue_len) {
            __skb_queue_tail(list, skb);   /* append the packet to the tail of the queue */
            qdisc->q.qlen++;
            return 0;
        }
        ...
    }
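The band selection and the length check in the enqueue function can be tried out with a user-space sketch. All toy_ names are hypothetical, and packets are only counted rather than stored. The table values are intended to match prio2band in net/sched/sch_generic.c, but they are not shown in this article and should be verified against the source tree in use.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_PRIO_MAX 15
#define TOY_BANDS 3

/* Priority-to-band table (assumed values, see the note above).
 * Band 0 is the highest-priority queue. */
static const unsigned char toy_prio2band[TOY_PRIO_MAX + 1] = {
    1, 2, 2, 2, 1, 2, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1
};

struct toy_pfifo {
    size_t qlen[TOY_BANDS];   /* packets currently held per band */
    size_t tx_queue_len;      /* per-band limit taken from the device */
    size_t drops;
};

/* Map a packet priority to one of the three bands. */
static size_t toy_band(unsigned int priority)
{
    size_t band = toy_prio2band[priority & TOY_PRIO_MAX];

    assert(band < TOY_BANDS);
    return band;
}

/* Returns 0 when the packet is queued and 1 when it is dropped. */
static int toy_pfifo_enqueue(struct toy_pfifo *q, unsigned int priority)
{
    size_t band = toy_band(priority);

    if (q->qlen[band] <= q->tx_queue_len) {   /* same "<=" test as the listing */
        q->qlen[band]++;
        return 0;
    }
    q->drops++;
    return 1;
}
```

One consequence of the "<=" comparison, visible in the sketch, is that a band accepts tx_queue_len + 1 packets before the next one is dropped. The limit applies to each band separately, not to the three queues together.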
After the packet has been queued, qdisc_run is called to send it.

    static inline void qdisc_run(struct net_device *dev)
    {
        while (!netif_queue_stopped(dev) && qdisc_restart(dev) < 0)
            /* NOTHING */;
    }
In qdisc_restart, a packet is first taken from the queue (by calling pfifo_fast_dequeue). The NIC driver's send function (dev->hard_start_xmit) is then called to send it. If sending fails, the packet must be pushed back onto the queue (pfifo_fast_requeue), and the protocol stack's transmit softirq is raised so that sending is retried.

    static struct sk_buff *pfifo_fast_dequeue(struct Qdisc *qdisc)
    {
        ...
        for (prio = 0; prio < 3; prio++, list++) {
            skb = __skb_dequeue(list);
            if (skb) {
                qdisc->q.qlen--;
                return skb;
            }
        }
        ...
    }
From the dequeue function, we can see that the pfifo_fast policy takes packets from the highest-priority queue first. The next queue is served only when every higher-priority queue is empty.
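Strict-priority dequeueing of this kind can be demonstrated with three small ring buffers. The following is a user-space sketch with hypothetical names (toy_band, toy_prio_q, toy_init, toy_put, toy_get); packets are non-negative integers and -1 means that every band is empty.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_BANDS 3
#define TOY_CAP 4

struct toy_band {
    int pkts[TOY_CAP];
    size_t head, count;
};

struct toy_prio_q {
    struct toy_band band[TOY_BANDS];   /* band 0 is served first */
    size_t qlen;                       /* total packets over all bands */
};

static void toy_init(struct toy_prio_q *q)
{
    memset(q, 0, sizeof *q);
}

/* Append a packet to the tail of band b; returns -1 if the band is full. */
static int toy_put(struct toy_prio_q *q, size_t b, int pkt)
{
    struct toy_band *l;

    assert(b < TOY_BANDS && pkt >= 0);
    l = &q->band[b];
    if (l->count == TOY_CAP)
        return -1;
    l->pkts[(l->head + l->count) % TOY_CAP] = pkt;
    l->count++;
    q->qlen++;
    return 0;
}

/* Scan the bands from 0 upward and take the first packet found. */
static int toy_get(struct toy_prio_q *q)
{
    size_t b;

    for (b = 0; b < TOY_BANDS; b++) {
        struct toy_band *l = &q->band[b];

        if (l->count > 0) {
            int pkt = l->pkts[l->head];

            l->head = (l->head + 1) % TOY_CAP;
            l->count--;
            q->qlen--;
            return pkt;
        }
    }
    return -1;
}
```

Packets queued in band 2 are returned only after bands 0 and 1 have been emptied, regardless of arrival order, so sustained high-priority traffic can starve the lower bands.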
The requeue function pushes the packet back onto the head of the queue for its priority.

    static int pfifo_fast_requeue(struct sk_buff *skb, struct Qdisc *qdisc)
    {
        struct sk_buff_head *list;

        /* Find the queue for the packet's priority */
        list = ((struct sk_buff_head *)qdisc->data) +
               prio2band[skb->priority & TC_PRIO_MAX];
        __skb_queue_head(list, skb);   /* push the packet onto the head of the queue */
        qdisc->q.qlen++;
        return 0;
    }
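The interplay of dequeue, driver send, and requeue can be sketched as one send round in user space. Everything here is a simplifying assumption: the toy_ names are hypothetical, a single queue replaces the three bands, the driver is modeled as a counter of free transmit slots, and the return convention of toy_restart is this sketch's own rather than that of qdisc_restart.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_CAP 8

struct toy_q {
    int pkts[TOY_CAP];
    size_t head, count;
};

static int toy_push_tail(struct toy_q *q, int pkt)   /* enqueue */
{
    if (q->count == TOY_CAP)
        return -1;
    q->pkts[(q->head + q->count) % TOY_CAP] = pkt;
    q->count++;
    return 0;
}

static int toy_push_head(struct toy_q *q, int pkt)   /* requeue */
{
    if (q->count == TOY_CAP)
        return -1;
    q->head = (q->head + TOY_CAP - 1) % TOY_CAP;
    q->pkts[q->head] = pkt;
    q->count++;
    return 0;
}

static int toy_pop_head(struct toy_q *q, int *pkt)   /* dequeue */
{
    if (q->count == 0)
        return -1;
    *pkt = q->pkts[q->head];
    q->head = (q->head + 1) % TOY_CAP;
    q->count--;
    return 0;
}

/* Hypothetical driver: accepts a packet only while it has free slots. */
struct toy_nic {
    int free_slots;
    int sent;
};

static int toy_hard_xmit(struct toy_nic *nic, int pkt)
{
    (void)pkt;
    if (nic->free_slots == 0)
        return -1;
    nic->free_slots--;
    nic->sent++;
    return 0;
}

/* One send round: returns 1 if a packet was sent, 0 if the queue was
 * empty, and -1 if the driver refused and the packet was requeued. */
static int toy_restart(struct toy_q *q, struct toy_nic *nic)
{
    int pkt;

    if (toy_pop_head(q, &pkt) != 0)
        return 0;
    if (toy_hard_xmit(nic, pkt) == 0)
        return 1;
    /* Cannot fail: the slot freed by the pop above is still available. */
    (void)toy_push_head(q, pkt);
    return -1;
}
```

Requeueing at the head rather than the tail keeps packet order intact: the refused packet is the first one offered to the driver on the next round.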
Summary: QoS is currently a hot topic. Almost all high-end network devices support QoS, and it is a key technology in the competition between network devices. Linux has supported QoS since the 2.2.x kernel in order to earn a place on high-end servers. Based on the Linux 2.4.0 code, this article analyzed how Linux supports QoS and how the kernel implements its default queueing policy, pfifo_fast.