Linux Network QoS Control via TC (2)


Let's first look at how traffic control is implemented in the kernel. When the kernel sends data, it eventually reaches dev_queue_xmit:

struct Qdisc *q;

if (q->enqueue) {
        rc = __dev_xmit_skb(skb, q, dev, txq);
        goto out;
}

If q->enqueue is not NULL, the traffic-control path is taken and __dev_xmit_skb is called:

static inline int __dev_xmit_skb(struct sk_buff *skb, struct Qdisc *q,
                                 struct net_device *dev,
                                 struct netdev_queue *txq)

The function first checks qdisc->state: if __QDISC_STATE_DEACTIVATED is set, it frees the skb and returns NET_XMIT_DROP. If the qdisc has the TCQ_F_CAN_BYPASS flag, its queue is empty, and the __QDISC_STATE_RUNNING bit can be acquired, the packet is transmitted directly, bypassing the queue. Otherwise the skb is enqueued into the root qdisc via qdisc_enqueue_root and qdisc_run is called.
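Putting that together, the decision logic looks roughly like this (a condensed sketch based on a 2.6.32-era kernel; statistics updates and some details are omitted):

static inline int __dev_xmit_skb(struct sk_buff *skb, struct Qdisc *q,
                                 struct net_device *dev,
                                 struct netdev_queue *txq)
{
        spinlock_t *root_lock = qdisc_lock(q);
        int rc;

        spin_lock(root_lock);
        if (unlikely(test_bit(__QDISC_STATE_DEACTIVATED, &q->state))) {
                /* qdisc has been deactivated: drop */
                kfree_skb(skb);
                rc = NET_XMIT_DROP;
        } else if ((q->flags & TCQ_F_CAN_BYPASS) && !qdisc_qlen(q) &&
                   !test_and_set_bit(__QDISC_STATE_RUNNING, &q->state)) {
                /* empty work-conserving qdisc that is not running:
                 * transmit the skb directly, bypassing the queue */
                if (sch_direct_xmit(skb, q, dev, txq, root_lock))
                        __qdisc_run(q);
                else
                        clear_bit(__QDISC_STATE_RUNNING, &q->state);
                rc = NET_XMIT_SUCCESS;
        } else {
                /* normal path: enqueue into the root qdisc, then run it */
                rc = qdisc_enqueue_root(skb, q);
                qdisc_run(q);
        }
        spin_unlock(root_lock);

        return rc;
}

The TCQ_F_CAN_BYPASS shortcut saves an enqueue/dequeue round trip when a work-conserving qdisc is empty and idle.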

qdisc_run tests and sets the __QDISC_STATE_RUNNING bit on qdisc->state and, only if the bit was not already set, calls __qdisc_run. Roughly (a sketch from a 2.6.32-era kernel):
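static inline void qdisc_run(struct Qdisc *q)
{
        /* only one CPU may run a given qdisc at a time */
        if (!test_and_set_bit(__QDISC_STATE_RUNNING, &q->state))
                __qdisc_run(q);
}

__qdisc_run itself is: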

void __qdisc_run(struct Qdisc *q)
{
        unsigned long start_time = jiffies;

        while (qdisc_restart(q)) {
                /*
                 * Postpone processing if
                 * 1. another process needs the CPU;
                 * 2. we've been doing it for too long.
                 */
                if (need_resched() || jiffies != start_time) {
                        __netif_schedule(q);
                        break;
                }
        }

        clear_bit(__QDISC_STATE_RUNNING, &q->state);
}

__qdisc_run invokes qdisc_restart in a loop until either a jiffy has been consumed or the CPU needs to be yielded to another process (need_resched). At that point __netif_schedule hangs the qdisc onto the current CPU's softnet_data->output_queue and raises a NET_TX_SOFTIRQ softirq, so transmission continues later in softirq context.
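The rescheduling step looks roughly like this (a sketch of __netif_reschedule from a 2.6.32-era kernel, reached via __netif_schedule when the qdisc is not already scheduled):

static inline void __netif_reschedule(struct Qdisc *q)
{
        struct softnet_data *sd;
        unsigned long flags;

        local_irq_save(flags);
        sd = &__get_cpu_var(softnet_data);
        /* chain the qdisc onto this CPU's output_queue ... */
        q->next_sched = sd->output_queue;
        sd->output_queue = q;
        /* ... and let net_tx_action pick it up in softirq context */
        raise_softirq_irqoff(NET_TX_SOFTIRQ);
        local_irq_restore(flags);
}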

The qdisc_restart function is as follows:

static inline int qdisc_restart(struct Qdisc *q)
{
        struct netdev_queue *txq;
        struct net_device *dev;
        spinlock_t *root_lock;
        struct sk_buff *skb;

        /* Dequeue packet */
        skb = dequeue_skb(q);
        if (unlikely(!skb))
                return 0;

        root_lock = qdisc_lock(q);
        dev = qdisc_dev(q);
        txq = netdev_get_tx_queue(dev, skb_get_queue_mapping(skb));

        return sch_direct_xmit(skb, q, dev, txq, root_lock);
}

qdisc_restart dequeues one skb from the head of the qdisc, obtains the qdisc root lock via qdisc_lock, looks up the skb's transmit queue with netdev_get_tx_queue (using the skb's queue mapping), and finally calls sch_direct_xmit to transmit the skb directly.

sch_direct_xmit first releases the qdisc root lock, calls dev_hard_start_xmit to hand the skb to the driver, re-takes the root lock, and then acts on the return value: NETDEV_TX_OK means the driver accepted the packet, and the remaining queue length qdisc_qlen(q) is returned; NETDEV_TX_LOCKED indicates a driver lock collision, which is not discussed here; NETDEV_TX_BUSY means the driver could not take the packet, so dev_requeue_skb is called to put the skb back onto the queue.
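In outline, sch_direct_xmit looks like this (a sketch from a 2.6.32-era kernel; the post-transmit stopped/frozen-queue handling is omitted):

int sch_direct_xmit(struct sk_buff *skb, struct Qdisc *q,
                    struct net_device *dev, struct netdev_queue *txq,
                    spinlock_t *root_lock)
{
        int ret = NETDEV_TX_BUSY;

        /* release the qdisc root lock while the driver transmits */
        spin_unlock(root_lock);

        HARD_TX_LOCK(dev, txq, smp_processor_id());
        if (!netif_tx_queue_stopped(txq) && !netif_tx_queue_frozen(txq))
                ret = dev_hard_start_xmit(skb, dev, txq);
        HARD_TX_UNLOCK(dev, txq);

        spin_lock(root_lock);

        switch (ret) {
        case NETDEV_TX_OK:
                /* driver sent the skb: report remaining queue length */
                ret = qdisc_qlen(q);
                break;
        case NETDEV_TX_LOCKED:
                /* driver tx lock was already taken on another CPU */
                ret = handle_dev_cpu_collision(skb, txq, q);
                break;
        default:
                /* NETDEV_TX_BUSY: put the skb back on the queue */
                ret = dev_requeue_skb(skb, q);
                break;
        }

        return ret;
}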


Traffic control can also be applied to incoming packets. On the receive path, netif_receive_skb calls handle_ing. handle_ing first checks whether skb->dev->rx_queue.qdisc is noop_qdisc; if it is, no ingress QoS is applied, otherwise ing_filter is called:

static int ing_filter(struct sk_buff *skb)
{
        struct net_device *dev = skb->dev;
        u32 ttl = G_TC_RTTL(skb->tc_verd);
        struct netdev_queue *rxq;
        int result = TC_ACT_OK;
        struct Qdisc *q;

        if (MAX_RED_LOOP < ttl++) {
                printk(KERN_WARNING
                       "Redir loop detected Dropping packet (%d->%d)\n",
                       skb->iif, dev->ifindex);
                return TC_ACT_SHOT;
        }

        skb->tc_verd = SET_TC_RTTL(skb->tc_verd, ttl);
        skb->tc_verd = SET_TC_AT(skb->tc_verd, AT_INGRESS);

        rxq = &dev->rx_queue;

        q = rxq->qdisc;
        if (q != &noop_qdisc) {
                spin_lock(qdisc_lock(q));
                if (likely(!test_bit(__QDISC_STATE_DEACTIVATED, &q->state)))
                        result = qdisc_enqueue_root(skb, q);
                spin_unlock(qdisc_lock(q));
        }

        return result;
}

ing_filter looks up skb->dev->rx_queue.qdisc; if it is not noop_qdisc and __QDISC_STATE_DEACTIVATED is not set, it calls qdisc_enqueue_root to put the skb into the ingress queue.


As you can see, for a network device to be subject to traffic control, its transmit path must go through the qdisc enqueue/dequeue logic. Judging from both test results and the code, Xen netback supports this, while the Linux bridge does not.

For Xen netback there are two ways to limit the QoS of a virtual machine's outbound packets. First, recall the path: when netfront hands packets to netback, the netback net_tx_action tasklet fires, which calls net_tx_submit and finally netif_rx; this is the same netif_rx that the kernel protocol stack uses to receive packets on the non-NAPI path. netif_rx leads to netif_receive_skb, which invokes handle_ing to apply QoS in the ingress direction. So the first method is to configure tc qdisc ingress rules on the netback device.
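For illustration, policing a VM's outbound traffic on its netback interface could look like the following; the interface name vif1.0, the rate, and the burst are assumptions, not values from the original:

tc qdisc add dev vif1.0 ingress

tc filter add dev vif1.0 parent ffff: protocol ip u32 match u32 0 0 police rate 10mbit burst 100k drop flowid :1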

The second way to limit outbound packets, also mentioned earlier, is to mark the packets on the bridge and then apply traffic control on the physical port, matching on the mark. Because this is in the egress direction, the full range of classful tc qdiscs can be used.
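A sketch of the mark-based approach; the mark value, VM address, device names, and rates are illustrative assumptions:

iptables -t mangle -A POSTROUTING -o peth0 -s 192.168.1.10 -j MARK --set-mark 10   # mark the VM's traffic

tc qdisc add dev peth0 root handle 1: htb default 100

tc class add dev peth0 parent 1: classid 1:10 htb rate 50mbit ceil 50mbit

tc filter add dev peth0 parent 1: protocol ip prio 1 handle 10 fw flowid 1:10   # classify by firewall mark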

Now consider packets flowing into the virtual machine through netback. An inbound packet first enters the bridge from the physical port and then goes from the bridge into netback. When the bridge switches the packet toward a port, br_forward_finish is called, followed by br_dev_queue_push_xmit, which "sends" the packet out via dev_queue_xmit. Therefore, to limit the QoS of packets going into the virtual machine, it is enough to set tc egress rules on the netback device: netback is a full net_device, so it supports the dev_queue_xmit path and hence qdisc rules in the egress direction.
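For example, a minimal egress shaper on the netback device (interface name and rates are assumed):

tc qdisc add dev vif1.0 root handle 1: htb default 10

tc class add dev vif1.0 parent 1: classid 1:10 htb rate 20mbit ceil 20mbit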


Finally, look at the settings for ingress QoS. The kernel's built-in ingress qdisc is very simple: it supports only basic rate limiting, and excess traffic is dropped. A better practice is to redirect the ingress traffic to a virtual network device (IFB) and then configure traffic-control rules on that virtual device:

modprobe ifb

ip link set ifb0 up

ip link set ifb1 up

The IFB device is provided by the kernel precisely for this kind of traffic-control redirection. Once the driver is loaded, the same kinds of rules as in the egress direction can be set on it:

tc qdisc add dev ifb0 root handle 1: htb default 100

tc class add dev ifb0 parent 1: classid 1:10 htb rate 100mbit ceil 100mbit

tc filter add dev ifb0 parent 1: protocol ip prio 1 u32 match ip src 0.0.0.0/0 flowid 1:10   # all traffic matches classid 1:10

Finally, we need a rule on peth0 to redirect its ingress traffic to the ifb0 device:

tc qdisc add dev peth0 ingress

tc filter add dev peth0 parent ffff: protocol ip u32 match u32 0 0 action mirred egress redirect dev ifb0


