Interface Layer message input


When a packet arrives at a network device, a hardware interrupt is usually triggered. If the system does not support soft interrupts, the whole packet input path has to run inside the hardware interrupt handler. Packets can still be received this way, but hardware interrupt processing consumes too much CPU time, so the system responds poorly to other hardware.

In some cases (for example, on some embedded devices), an arriving packet does not trigger a hardware interrupt. The only option is then to poll the status of the network device from a timer: when a packet has arrived, it is read from the device and passed to the protocol stack. How promptly packets are received then depends entirely on the timer frequency. If the frequency is too high, too much CPU time is consumed; if it is too low, packet throughput suffers.

A third method can greatly improve network processing speed, provided that both hardware interrupts and soft interrupts are supported. When a packet arrives at a network device, a hardware interrupt is triggered. The interrupt handler adds the device to the polling queue, disables the device's interrupts, and raises the soft interrupt. The soft interrupt handler then reads packets from the network devices on the polling queue.

Interface layer ioctl

An application configures an interface by calling ioctl() on a socket. Requests that concern the interface layer are handled by dev_ioctl() and inet_ioctl().

For the SIOCxIFxxx get and set commands, the value is passed in an ifreq structure, either on its own or as part of a larger argument.

/*
 * Interface request structure used for socket
 * ioctl's.  All interface ioctl's must have parameter
 * definitions which begin with ifr_name.  The
 * remainder may be interface specific.
 */
struct ifreq {
#define IFHWADDRLEN 6
    union {
        char    ifrn_name[IFNAMSIZ];    /* if name, e.g. "en0" */
    } ifr_ifrn;

    union {
        struct  sockaddr ifru_addr;
        struct  sockaddr ifru_dstaddr;
        struct  sockaddr ifru_broadaddr;
        struct  sockaddr ifru_netmask;
        struct  sockaddr ifru_hwaddr;
        short   ifru_flags;
        int     ifru_ivalue;
        int     ifru_mtu;
        struct  ifmap ifru_map;
        char    ifru_slave[IFNAMSIZ];   /* Just fits the size */
        char    ifru_newname[IFNAMSIZ];
        void __user *ifru_data;
        struct  if_settings ifru_settings;
    } ifr_ifru;
};


static int __init net_dev_init(void)
{
    int i, rc = -ENOMEM;

    BUG_ON(!dev_boot_phase);

    if (dev_proc_init())
        goto out;

    if (netdev_kobject_init())
        goto out;

    INIT_LIST_HEAD(&ptype_all);
    for (i = 0; i < PTYPE_HASH_SIZE; i++)
        INIT_LIST_HEAD(&ptype_base[i]);

    if (register_pernet_subsys(&netdev_net_ops))
        goto out;

    /*
     * Initialise the packet receive queues.
     */
    for_each_possible_cpu(i) {
        struct softnet_data *queue;

        queue = &per_cpu(softnet_data, i);
        skb_queue_head_init(&queue->input_pkt_queue);
        queue->completion_queue = NULL;
        INIT_LIST_HEAD(&queue->poll_list);

        queue->backlog.poll = process_backlog;
        queue->backlog.weight = weight_p;
        queue->backlog.gro_list = NULL;
        queue->backlog.gro_count = 0;
    }

    dev_boot_phase = 0;

    /* The loopback device is special if any other network devices
     * is present in a network namespace the loopback device must
     * be present. Since we now dynamically allocate and free the
     * loopback device ensure this invariant is maintained by
     * keeping the loopback device as the first device on the
     * list of network devices.  Ensuring the loopback devices
     * is the first device that appears and the last network device
     * that disappears.
     */
    if (register_pernet_device(&loopback_net_ops))
        goto out;

    if (register_pernet_device(&default_device_ops))
        goto out;

    open_softirq(NET_TX_SOFTIRQ, net_tx_action);
    open_softirq(NET_RX_SOFTIRQ, net_rx_action);

    hotcpu_notifier(dev_cpu_callback, 0);
    dst_init();
    dev_mcast_init();
    rc = 0;
out:
    return rc;
}

net_dev_init() runs at system startup at the subsys_initcall level and initializes the interface layer: it registers the proc files that record statistics, initializes the softnet_data instance of each CPU, registers the soft interrupts and handler functions for network packet input and output, and registers the callback function that responds to CPU status changes.

softnet_data structure

The softnet_data structure describes the packet input and output queues used by the network soft interrupts. Each CPU has its own softnet_data instance, so its members can be accessed without locking. The structure serves as the link between the interface layer and the network layer.

/*
 * Incoming packets are placed on per-cpu queues so that
 * no locking is needed.
 */
struct softnet_data
{
    struct Qdisc            *output_queue;
    struct sk_buff_head     input_pkt_queue;
    struct list_head        poll_list;
    struct sk_buff          *completion_queue;

    struct napi_struct      backlog;
};

output_queue: the queue of network devices that have packets to output in the soft interrupt. A network device that is in the packet output state is added to this queue. The packet output soft interrupt traverses the queue and, for each device, takes packets from the device's queuing discipline and outputs them.

input_pkt_queue: the interface layer cache queue for non-NAPI drivers. After a non-NAPI driver reads a packet in a hardware interrupt or by polling, it calls netif_rx() to pass the packet to the upper layer. The packet is first cached in input_pkt_queue, and then the packet input soft interrupt is raised. The soft interrupt routine passes the packet to the upper layer.

poll_list: the network device polling queue. A network device that is in the packet receiving state is linked to this queue. The packet input soft interrupt traverses the queue and receives packets from each device by polling.

NAPI method

NAPI combines the interrupt mechanism with the polling mechanism and can effectively improve network processing speed. Under heavy network load, NAPI significantly reduces the number of hardware interrupts caused by packet reception, which is very effective for high-rate streams of short packets.

NAPI works as follows. When the first packet in a batch reaches the network device, the system is notified by a hardware interrupt. In the hardware interrupt routine, the system adds the device to the device polling queue of the current CPU, disables the device's interrupts, and raises the packet input soft interrupt. The soft interrupt routine traverses the network devices on the polling queue and reads packets from them. In this way, while a device is on the polling queue the kernel does not run an interrupt routine for each new packet; it only needs to maintain the polling queue and read new packets from the devices on it.

Network device interrupt routine

Take e100_intr(), the interrupt handler of the e100 network device driver, as an example. When a packet arrives at the network device, the device raises an interrupt, and e100_intr() handles it.

static irqreturn_t e100_intr(int irq, void *dev_id)
{
    struct net_device *netdev = dev_id;
    struct nic *nic = netdev_priv(netdev);
    u8 stat_ack = ioread8(&nic->csr->scb.stat_ack);

    DPRINTK(INTR, DEBUG, "stat_ack = 0x%02X\n", stat_ack);

    if (stat_ack == stat_ack_not_ours ||    /* Not our interrupt */
        stat_ack == stat_ack_not_present)   /* Hardware is ejected */
        return IRQ_NONE;

    /* Ack interrupt(s) */
    iowrite8(stat_ack, &nic->csr->scb.stat_ack);

    /* We hit Receive No Resource (RNR); restart RU after cleaning */
    if (stat_ack & stat_ack_rnr)
        nic->ru_running = RU_SUSPENDED;

    if (likely(napi_schedule_prep(&nic->napi))) {
        e100_disable_irq(nic);
        __napi_schedule(&nic->napi);
    }

    return IRQ_HANDLED;
}

Network input soft interrupt

net_rx_action() is the soft interrupt handler for network input. When a network device has received packets, both NAPI and non-NAPI drivers normally raise the network input soft interrupt (NET_RX_SOFTIRQ) so that most of the work is done outside the hardware interrupt routine, which improves system performance.

Polling

e100_poll() is the polling function of the e100 network device driver. The network input soft interrupt calls it through a function pointer.

static int e100_poll(struct napi_struct *napi, int budget)
{
    struct nic *nic = container_of(napi, struct nic, napi);
    unsigned int work_done = 0;

    e100_rx_clean(nic, &work_done, budget);
    e100_tx_clean(nic);

    /* If budget not fully consumed, exit the polling mode */
    if (work_done < budget) {
        napi_complete(napi);
        e100_enable_irq(nic);
    }

    return work_done;
}

e100_rx_clean() reads the received packets from the network device and passes them to the upper-layer protocol through netif_receive_skb(). work_done is the number of packets read. If the budget is not fully consumed, all packets on the device have been read: the device is removed from the network device polling queue, polling mode ends, and the device's interrupts are re-enabled.

Non-NAPI netif_rx()

netif_rx() adds packets received from a network device to the interface layer cache queue for processing by the upper-layer protocols. Queue insertion normally succeeds; a packet is dropped only when the queue length exceeds netdev_max_backlog. The queue effectively prevents received packets from being discarded when the upper layers are temporarily congested. Network device drivers implemented with NAPI do not call this interface to receive packets.

DEFINE_PER_CPU(struct netif_rx_stats, netdev_rx_stat) = { 0, };

/**
 *  netif_rx    -   post buffer to the network code
 *  @skb: buffer to post
 *
 *  This function receives a packet from a device driver and queues it for
 *  the upper (protocol) levels to process.  It always succeeds. The buffer
 *  may be dropped during processing for congestion control or by the
 *  protocol layers.
 *
 *  return values:
 *  NET_RX_SUCCESS  (no congestion)
 *  NET_RX_DROP     (packet was dropped)
 *
 */
int netif_rx(struct sk_buff *skb)
{
    struct softnet_data *queue;
    unsigned long flags;

    /* if netpoll wants it, pretend we never saw it */
    if (netpoll_rx(skb))
        return NET_RX_DROP;

    if (!skb->tstamp.tv64)
        net_timestamp(skb);

    /*
     * The code is rearranged so that the path is the most
     * short when CPU is congested, but is still operating.
     */
    local_irq_save(flags);
    queue = &__get_cpu_var(softnet_data);

    __get_cpu_var(netdev_rx_stat).total++;
    if (queue->input_pkt_queue.qlen <= netdev_max_backlog) {
        if (queue->input_pkt_queue.qlen) {
enqueue:
            __skb_queue_tail(&queue->input_pkt_queue, skb);
            local_irq_restore(flags);
            return NET_RX_SUCCESS;
        }

        napi_schedule(&queue->backlog);
        goto enqueue;
    }

    __get_cpu_var(netdev_rx_stat).dropped++;
    local_irq_restore(flags);

    kfree_skb(skb);
    return NET_RX_DROP;
}

process_backlog()

process_backlog() is the polling function of the per-CPU backlog virtual network device used in non-NAPI mode. Once netif_rx() has added the backlog virtual device to the network device polling queue, the packet input soft interrupt calls process_backlog() to take packets from input_pkt_queue and pass them to the upper layer.

static int process_backlog(struct napi_struct *napi, int quota)
{
    int work = 0;
    struct softnet_data *queue = &__get_cpu_var(softnet_data);
    unsigned long start_time = jiffies;

    napi->weight = weight_p;
    do {
        struct sk_buff *skb;

        local_irq_disable();
        skb = __skb_dequeue(&queue->input_pkt_queue);
        if (!skb) {
            __napi_complete(napi);
            local_irq_enable();
            break;
        }
        local_irq_enable();

        netif_receive_skb(skb);
    } while (++work < quota && jiffies == start_time);

    return work;
}

Processing of input packets at the Interface Layer

Packet receive routines

The packet_type structure is the input interface of the network layer. The system supports multiple protocol families, so each protocol family implements its own packet receive routine. The structure serves as the bridge between the link layer and the network layer. On Ethernet, when a frame arrives at the host, the network-layer receive function is selected according to the packet type registered by the protocol family.

struct packet_type {
    __be16              type;   /* This is really htons(ether_type). */
    struct net_device   *dev;   /* NULL is wildcarded here */
    int                 (*func) (struct sk_buff *,
                                 struct net_device *,
                                 struct packet_type *,
                                 struct net_device *);
    struct sk_buff      *(*gso_segment)(struct sk_buff *skb,
                                        int features);
    int                 (*gso_send_check)(struct sk_buff *skb);
    struct sk_buff      **(*gro_receive)(struct sk_buff **head,
                                         struct sk_buff *skb);
    int                 (*gro_complete)(struct sk_buff *skb);
    void                *af_packet_priv;
    struct list_head    list;
};

The packet_type instance for IPv4 is ip_packet_type, and ip_rcv() is the function that receives and processes IP datagrams.

static struct packet_type ip_packet_type __read_mostly = {
    .type = cpu_to_be16(ETH_P_IP),
    .func = ip_rcv,
    .gso_send_check = inet_gso_send_check,
    .gso_segment = inet_gso_segment,
    .gro_receive = inet_gro_receive,
    .gro_complete = inet_gro_complete,
};

The packet_type instances are stored in the ptype_base hash table. An instance is added to ptype_base with dev_add_pack() and removed from it with dev_remove_pack(). For the PF_PACKET protocol family, however, a packet_type instance whose type is ETH_P_ALL is not registered in the ptype_base hash table but in the ptype_all linked list.

netif_receive_skb()

netif_receive_skb() delivers a packet to the upper-layer protocols. It first traverses the ptype_all linked list and passes the packet to every input interface registered there, then hands the packet to the bridge for forwarding. If the bridge forwards the packet, no local input is needed. Otherwise it traverses the ptype_base hash table and calls the packet receive routine that matches the protocol type of the received packet.

dev_queue_xmit_nit()

A raw socket created with socket(AF_PACKET, SOCK_RAW, htons(ETH_P_ALL)) receives not only packets that arrive from outside but also, if the conditions are met, packets that are output locally.

dev_queue_xmit_nit() is what makes locally output packets visible to such sockets. It is called during output at the link layer and passes the packets that meet the conditions to the raw sockets.

Responding to changes in CPU status

Each CPU has its own softnet_data and normally processes the output and input queues in it by itself. Among the CPU status changes, one state needs special handling: CPU_DEAD. A CPU in this state no longer works, so the packets in its softnet_data input and output queues must be transferred to another CPU for processing. To handle this, the initialization function of the interface layer uses hotcpu_notifier() to register the callback function dev_cpu_callback(), which responds to CPU status changes.

static int dev_cpu_callback(struct notifier_block *nfb,
                            unsigned long action,
                            void *ocpu)
{
    struct sk_buff **list_skb;
    struct Qdisc **list_net;
    struct sk_buff *skb;
    unsigned int cpu, oldcpu = (unsigned long)ocpu;
    struct softnet_data *sd, *oldsd;

    if (action != CPU_DEAD && action != CPU_DEAD_FROZEN)
        return NOTIFY_OK;

    local_irq_disable();
    cpu = smp_processor_id();
    sd = &per_cpu(softnet_data, cpu);
    oldsd = &per_cpu(softnet_data, oldcpu);

    /* Find end of our completion_queue. */
    list_skb = &sd->completion_queue;
    while (*list_skb)
        list_skb = &(*list_skb)->next;
    /* Append completion queue from offline CPU. */
    *list_skb = oldsd->completion_queue;
    oldsd->completion_queue = NULL;

    /* Find end of our output_queue. */
    list_net = &sd->output_queue;
    while (*list_net)
        list_net = &(*list_net)->next_sched;
    /* Append output queue from offline CPU. */
    *list_net = oldsd->output_queue;
    oldsd->output_queue = NULL;

    raise_softirq_irqoff(NET_TX_SOFTIRQ);
    local_irq_enable();

    /* Process offline CPU's input_pkt_queue */
    while ((skb = __skb_dequeue(&oldsd->input_pkt_queue)))
        netif_rx(skb);

    return NOTIFY_OK;
}
