Linux Kernel Packet Forwarding Process (III): Network Card Frame Reception Analysis


"Copyright notice: Reprint please keep Source: Blog.csdn.net/gentleliu. E-mail: shallnew*163.com "

Each CPU has its own queue for handling received frames, and each has its own data structure for ingress and egress traffic, so no locking is needed between different CPUs. This per-CPU data structure is softnet_data (defined in include/linux/netdevice.h):

/*
 * Incoming packets are placed on per-CPU queues so that
 * no locking is needed.
 */
struct softnet_data
{
	struct Qdisc		*output_queue;
	struct sk_buff_head	input_pkt_queue;	/* queue of ingress frames waiting to be processed */
	struct list_head	poll_list;		/* doubly linked list of devices that have input frames to process */
	struct sk_buff		*completion_queue;	/* buffers that have been transmitted successfully and can be released */
	struct napi_struct	backlog;
};

The fields of this structure are used for both transmission and reception. In other words, both the NET_RX_SOFTIRQ and NET_TX_SOFTIRQ softirqs refer to this structure. Ingress frames are queued on input_pkt_queue (NAPI devices differ; they keep their own queues).
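For orientation, code later in this article accesses the per-CPU queue roughly as follows. This is only a condensed sketch of the pattern used in netif_rx() below, not a standalone function:

	struct softnet_data *queue;
	unsigned long flags;

	local_irq_save(flags);			/* only this CPU's queue is touched, so masking local IRQs is enough */
	queue = &__get_cpu_var(softnet_data);	/* this CPU's private softnet_data: no spinlock needed */
	__skb_queue_tail(&queue->input_pkt_queue, skb);
	local_irq_restore(flags);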


softnet_data is initialized in the net_dev_init() function:
/*
 *	This is called single threaded during boot, so no need
 *	to take the rtnl semaphore.
 */
static int __init net_dev_init(void)
{
	int i, rc = -ENOMEM;
	......
	/*
	 *	Initialise the packet receive queues.
	 */
	for_each_possible_cpu(i) {
		struct softnet_data *queue;

		queue = &per_cpu(softnet_data, i);
		skb_queue_head_init(&queue->input_pkt_queue);
		queue->completion_queue = NULL;
		INIT_LIST_HEAD(&queue->poll_list);

		queue->backlog.poll = process_backlog;
		queue->backlog.weight = weight_p;
		queue->backlog.gro_list = NULL;
		queue->backlog.gro_count = 0;
	}
	......
	open_softirq(NET_TX_SOFTIRQ, net_tx_action);
	open_softirq(NET_RX_SOFTIRQ, net_rx_action);
	......
}
A non-NAPI device driver generates one interrupt for every frame it receives, so under high traffic load a great deal of time is spent handling interrupts and resources are wasted. A NAPI driver instead mixes interrupts with polling, which performs better than the older approach under high load.
NAPI's main idea is to combine interrupts and polling rather than relying on a purely interrupt-driven model: when a new frame arrives, the driver disables further receive interrupts and the kernel drains the ingress queue by polling. From the kernel's point of view, NAPI reduces CPU load because far fewer interrupt events are handled.
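Schematically, the pattern every NAPI driver follows is the one the e100 handler shown further below implements: the hard interrupt handler only claims the NAPI instance, masks the NIC's receive interrupts, and schedules the poll. A minimal sketch (the xx_* names and the embedding of napi_struct in a driver-private struct are illustrative assumptions, not kernel API):

static irqreturn_t xx_napi_isr(int irq, void *dev_id)
{
	struct xx_priv *priv = dev_id;			/* hypothetical driver-private data */

	/* acknowledge the interrupt in hardware here ... */
	if (likely(napi_schedule_prep(&priv->napi))) {	/* claim NAPI_STATE_SCHED */
		xx_disable_rx_irq(priv);		/* hypothetical: mask further RX interrupts */
		__napi_schedule(&priv->napi);		/* join poll_list and raise NET_RX_SOFTIRQ */
	}
	return IRQ_HANDLED;
}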
The xx_rx() receive function of a non-NAPI driver generally looks like this:
void xx_rx()
{
	struct sk_buff *skb;

	skb = dev_alloc_skb(pkt_len + 5);
	if (skb != NULL) {
		skb_reserve(skb, 2);	/* align IP header on a 16-byte boundary */
		/* memcpy(skb_put(skb, pkt_len), pkt, pkt_len); */	/* copy the frame data into the skb */
		skb->protocol = eth_type_trans(skb, dev);
		netif_rx(skb);
	}
}
The first step is to allocate a buffer to hold the frame. Note that the buffer allocation function (dev_alloc_skb) needs to know the length of the data.

The second step is to copy the frame data into the buffer; the skb_put() function updates the end-of-data pointer in the buffer and returns a pointer to the newly created space.
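To make the pointer bookkeeping concrete, here is a short sketch of how the buffer layout evolves; pkt and pkt_len are the same placeholders used in xx_rx() above:

	struct sk_buff *skb;
	unsigned char *p;

	skb = dev_alloc_skb(pkt_len + 2);	/* on return head == data == tail */
	skb_reserve(skb, 2);			/* data and tail advance by 2: this headroom makes the
						   14-byte Ethernet header leave the IP header aligned
						   on a 16-byte boundary */
	p = skb_put(skb, pkt_len);		/* tail advances by pkt_len and skb->len becomes pkt_len;
						   p points at the old tail, where the frame is copied */
	memcpy(p, pkt, pkt_len);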

The third step is to extract the protocol identifier (via eth_type_trans()) and obtain some additional information.

Finally, netif_rx(skb) is called for further processing; it is defined in net/core/dev.c.

int netif_rx(struct sk_buff *skb)
{
	struct softnet_data *queue;
	unsigned long flags;

	/* if netpoll wants it, pretend we never saw it */
	if (netpoll_rx(skb))
		return NET_RX_DROP;

	if (!skb->tstamp.tv64)
		net_timestamp(skb);

	/*
	 * The code is rearranged so that the path is most short when
	 * CPU is congested, but is still operating.
	 */
	local_irq_save(flags);

	queue = &__get_cpu_var(softnet_data);

	__get_cpu_var(netdev_rx_stat).total++;
	if (queue->input_pkt_queue.qlen <= netdev_max_backlog) {	/* is there still room? netdev_max_backlog is typically 300 */
		/* The softirq (via napi_schedule()) is raised only when the queue was empty;
		 * if the queue is not empty, the softirq has already been raised and there
		 * is no need to raise it again. */
		if (queue->input_pkt_queue.qlen) {
enqueue:
			__skb_queue_tail(&queue->input_pkt_queue, skb);	/* the key point: append the skb to input_pkt_queue */
			local_irq_restore(flags);
			return NET_RX_SUCCESS;
		}

		napi_schedule(&queue->backlog);	/* raise the softirq */
		goto enqueue;
	}

	__get_cpu_var(netdev_rx_stat).dropped++;
	local_irq_restore(flags);

	kfree_skb(skb);
	return NET_RX_DROP;
}
EXPORT_SYMBOL(netif_rx);

static inline void napi_schedule(struct napi_struct *n)
{
	if (napi_schedule_prep(n))
		__napi_schedule(n);
}

void __napi_schedule(struct napi_struct *n)
{
	unsigned long flags;

	local_irq_save(flags);
	list_add_tail(&n->poll_list, &__get_cpu_var(softnet_data).poll_list);	/* add the device to the poll list; its frames will be processed later */
	__raise_softirq_irqoff(NET_RX_SOFTIRQ);	/* finally raise the softirq */
	local_irq_restore(flags);
}
EXPORT_SYMBOL(__napi_schedule);

At this point the top half of the interrupt handling is done; the rest is left to the bottom half. The napi_schedule(&queue->backlog) call puts the backlog entry (standing in for the NIC with packets waiting to be received) on the softnet_data poll_list and then raises the softirq, letting the bottom half finish processing the data.
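For context, a non-NAPI driver calls something like xx_rx() from its hard interrupt handler. A rough sketch with the same hypothetical xx driver (not taken from the kernel):

static irqreturn_t xx_interrupt(int irq, void *dev_id)
{
	/* acknowledge the interrupt in the NIC's registers (hardware specific) ... */
	xx_rx();		/* builds the skb and calls netif_rx(skb), which queues it and
				   raises NET_RX_SOFTIRQ via napi_schedule(&queue->backlog) */
	return IRQ_HANDLED;
}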
A NAPI device, by contrast, raises the softirq directly when it receives data; it does not need to go through netif_rx() to queue the frame on the receive queue and trigger the softirq. For example, the e100 hard interrupt handler is:

static irqreturn_t e100_intr(int irq, void *dev_id)
{
	struct net_device *netdev = dev_id;
	struct nic *nic = netdev_priv(netdev);
	u8 stat_ack = ioread8(&nic->csr->scb.stat_ack);

	DPRINTK(INTR, DEBUG, "stat_ack = 0x%02X\n", stat_ack);

	if (stat_ack == stat_ack_not_ours ||	/* not our interrupt */
	   stat_ack == stat_ack_not_present)	/* hardware is ejected */
		return IRQ_NONE;

	/* Ack interrupt(s) */
	iowrite8(stat_ack, &nic->csr->scb.stat_ack);

	/* We hit Receive No Resource (RNR); restart RU after cleaning */
	if (stat_ack & stat_ack_rnr)
		nic->ru_running = RU_SUSPENDED;

	if (likely(napi_schedule_prep(&nic->napi))) {
		e100_disable_irq(nic);
		__napi_schedule(&nic->napi);	/* raise the softirq */
	}

	return IRQ_HANDLED;
}
We already saw in net_dev_init() that net_rx_action() is registered as the handler for the receive softirq; it is called when NET_RX_SOFTIRQ is raised.
The net_rx_action() function is:

static void net_rx_action(struct softirq_action *h)
{
	struct list_head *list = &__get_cpu_var(softnet_data).poll_list;
	unsigned long time_limit = jiffies + 2;
	int budget = netdev_budget;
	void *have;

	local_irq_disable();

	while (!list_empty(list)) {
		struct napi_struct *n;
		int work, weight;

		/* If softirq window is exhausted then punt.
		 * Allow this to run for 2 jiffies since which will allow
		 * an average latency of 1.5/HZ.
		 */
		if (unlikely(budget <= 0 || time_after(jiffies, time_limit)))	/* the ingress queues still hold buffers; reschedule the softirq */
			goto softnet_break;

		local_irq_enable();

		/* Even though interrupts have been re-enabled, this
		 * access is safe because interrupts can only add new
		 * entries to the tail of this list, and only ->poll()
		 * calls can remove this head entry from the list.
		 */
		n = list_entry(list->next, struct napi_struct, poll_list);

		have = netpoll_poll_lock(n);

		weight = n->weight;

		/* This NAPI_STATE_SCHED test is for avoiding a race
		 * with netpoll's poll_napi().  Only the entity which
		 * obtains the lock and sees NAPI_STATE_SCHED set will
		 * actually make the ->poll() call.  Therefore we avoid
		 * accidently calling ->poll() when NAPI is not scheduled.
		 */
		work = 0;
		if (test_bit(NAPI_STATE_SCHED, &n->state)) {
			work = n->poll(n, weight);	/* run the poll function; returns the number of frames processed */
			trace_napi_poll(n);
		}

		WARN_ON_ONCE(work > weight);

		budget -= work;

		local_irq_disable();

		/* Drivers must not modify the NAPI state if they
		 * consume the entire weight.  In such cases this code
		 * still "owns" the NAPI instance and therefore can
		 * move the instance around on the list at-will.
		 */
		if (unlikely(work == weight)) {	/* the whole weight was consumed, so the queue may not be drained yet;
						   had it been drained, the driver's poll would have called napi_complete() itself */
			if (unlikely(napi_disable_pending(n))) {
				local_irq_enable();
				napi_complete(n);
				local_irq_disable();
			} else
				list_move_tail(&n->poll_list, list);
		}

		netpoll_poll_unlock(have);
	}
out:
	local_irq_enable();

#ifdef CONFIG_NET_DMA
	/*
	 * There may not be any more sk_buffs coming right now, so push
	 * any pending DMA copies to hardware
	 */
	dma_issue_pending_all();
#endif

	return;

softnet_break:
	__get_cpu_var(netdev_rx_stat).time_squeeze++;
	__raise_softirq_irqoff(NET_RX_SOFTIRQ);
	goto out;
}
As can be seen, the main job of the bottom half is to traverse the list of devices that have frames waiting to be received and, for each device, run its poll function.
For non-NAPI devices, the poll function is initialized to process_backlog() in net_dev_init().
The process_backlog() function is defined as:

static int process_backlog(struct napi_struct *napi, int quota)
{
	int work = 0;
	struct softnet_data *queue = &__get_cpu_var(softnet_data);
	unsigned long start_time = jiffies;

	napi->weight = weight_p;
	do {
		struct sk_buff *skb;

		local_irq_disable();
		skb = __skb_dequeue(&queue->input_pkt_queue);
		if (!skb) {
			__napi_complete(napi);
			local_irq_enable();
			break;
		}
		local_irq_enable();

		netif_receive_skb(skb);
	} while (++work < quota && jiffies == start_time);

	return work;
}

For NAPI devices, the driver must provide a poll method with the following prototype:
int (*poll)(struct napi_struct *napi, int budget);
This method is registered at initialization time with netif_napi_add():
netif_napi_add(netdev, &nic->napi, xx_poll, xx_napi_weight);

A NAPI driver's poll method is typically implemented as follows (this example is borrowed from "Linux Device Drivers"; it uses an older prototype that does not quite match the kernel quoted above):
static int xx_poll(struct net_device *dev, int *budget)
{
	int npackets = 0, quota = min(dev->quota, *budget);
	struct sk_buff *skb;
	struct xx_priv *priv = netdev_priv(dev);
	struct xx_packet *pkt;

	while (npackets < quota && priv->rx_queue) {
		pkt = xx_dequeue_buf(dev);
		skb = dev_alloc_skb(pkt->datalen + 2);
		if (!skb) {
			if (printk_ratelimit())
				printk(KERN_NOTICE "xx: packet dropped\n");
			priv->stats.rx_dropped++;
			xx_release_buffer(pkt);
			continue;
		}
		memcpy(skb_put(skb, pkt->datalen), pkt->data, pkt->datalen);
		skb->dev = dev;
		skb->protocol = eth_type_trans(skb, dev);
		skb->ip_summed = CHECKSUM_UNNECESSARY;	/* don't check it */
		netif_receive_skb(skb);

		/* Maintain stats */
		npackets++;
		priv->stats.rx_packets++;
		priv->stats.rx_bytes += pkt->datalen;
		xx_release_buffer(pkt);
	}
	/* If we processed all packets, we're done; tell the kernel and re-enable interrupts */
	*budget -= npackets;
	dev->quota -= npackets;
	if (!priv->rx_queue) {
		netif_rx_complete(dev);
		xx_rx_ints(dev, 1);
		return 0;
	}
	/* We couldn't process everything. */
	return 1;
}
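Against the napi_struct-based prototype that netif_napi_add() registers (the one used by the kernel listings above, where budget is passed by value), the same logic would look roughly like the sketch below. The xx_* helpers and the placement of the napi member inside struct nic are illustrative assumptions, not kernel API:

static int xx_napi_poll(struct napi_struct *napi, int budget)
{
	struct nic *nic = container_of(napi, struct nic, napi);	/* assumes napi is embedded in the driver's private struct */
	int work_done = 0;

	while (work_done < budget) {
		struct sk_buff *skb = xx_dequeue_rx_skb(nic);	/* hypothetical: pull one frame off the RX ring */
		if (!skb)
			break;
		skb->protocol = eth_type_trans(skb, nic->netdev);
		netif_receive_skb(skb);
		work_done++;
	}

	if (work_done < budget) {	/* ring drained before the budget ran out: leave polled mode */
		napi_complete(napi);
		xx_enable_rx_irq(nic);	/* hypothetical: re-enable RX interrupts on the NIC */
	}

	return work_done;	/* net_rx_action() subtracts this from its budget */
}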

The NAPI driver provides its own poll function and private queue.
Whether a device is NAPI or non-NAPI, its poll function eventually calls netif_receive_skb(skb) to process the received frame. That function hands the skb to every registered protocol handler, after which the data enters the Linux kernel network stack for further processing.
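For reference, the protocol handlers that netif_receive_skb() dispatches to register themselves with dev_add_pack() using a struct packet_type (IPv4's ip_rcv() is registered this way, for example). A minimal sketch with a hypothetical handler and EtherType:

static int xx_pkt_rcv(struct sk_buff *skb, struct net_device *dev,
		      struct packet_type *pt, struct net_device *orig_dev)
{
	/* ... protocol-specific processing of the frame ... */
	kfree_skb(skb);
	return 0;
}

static struct packet_type xx_packet_type = {
	.type = cpu_to_be16(ETH_P_XX),	/* hypothetical EtherType */
	.func = xx_pkt_rcv,
};

	/* somewhere in the protocol's init code: */
	dev_add_pack(&xx_packet_type);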


