Linux network performance optimization method analysis (1)

Source: Internet
Author: User

Network behavior can be divided into three paths: 1) the sending path, 2) the forwarding path, and 3) the receiving path, and network performance can be optimized along each of them. Because packet forwarding mainly concerns devices with routing functions, this article does not cover it (interested readers can study it on their own; the Linux kernel has used both hash-based route lookup and a dynamic trie-based route lookup algorithm). This article focuses on the optimization methods on the sending and receiving paths. NAPI is essentially an optimization on the receiving path, but because it appeared early in the Linux kernel and is the basis for the later optimization methods, it is analyzed separately.

The most basic NAPI


The core idea of NAPI is that, on a busy network, the NIC does not need to raise an interrupt every time a packet arrives, because a very high interrupt rate can hurt the overall efficiency of the system. Consider a hypothetical scenario: with a standard 100 Mbit/s NIC, the actual receive rate may be 80 Mbit/s. If the average packet length is 1500 bytes, the number of interrupts per second is:

80 Mbit/s / (8 bits/byte * 1500 bytes) = 6667 interrupts/s (rounded)

6667 interrupts per second puts considerable pressure on the system. In this case the driver can switch to polling instead of relying on interrupts. Polling, however, is inefficient when traffic is light, so under low traffic the interrupt-driven method is more appropriate. This is why NAPI exists: in low-traffic scenarios packets are received through interrupts, while in high-traffic scenarios they are received through polling.
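The estimate above can be reproduced with a small helper. This is only a sketch of the arithmetic: the function name is made up for illustration, and it assumes one interrupt per received packet, as the text does.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not a kernel API): number of per-packet interrupts
 * per second for a given receive rate and average packet size, rounded to
 * the nearest integer. Assumes exactly one interrupt per packet.
 */
uint64_t interrupts_per_second(uint64_t rx_bits_per_sec, uint32_t packet_bytes)
{
    uint64_t bits_per_packet = 8ULL * packet_bytes;

    if (bits_per_packet == 0)
        return 0; /* avoid division by zero for a nonsensical packet size */

    return (rx_bits_per_sec + bits_per_packet / 2) / bits_per_packet;
}

/* Self-check against the figure quoted in the text: 80 Mbit/s, 1500 bytes. */
void check_article_example(void)
{
    assert(interrupts_per_second(80000000ULL, 1500) == 6667);
}
```

With the same receive rate, smaller packets push the interrupt rate much higher, which is where polling pays off most.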

Currently, essentially all NIC drivers in the kernel support NAPI. As described above, NAPI is suited to handling packets arriving at high rates. Its benefits are as follows:

  • Interrupt mitigation. As the preceding example shows, under high traffic the network card may raise several thousand interrupts per second. Handling every one of them puts heavy pressure on the system. With NAPI, the driver disables the NIC's receive interrupt while it is polling, which reduces the interrupt-handling load.
  • Packet throttling. Before NAPI, a Linux NIC driver always raised an IRQ after receiving a packet, queued the skb on the local CPU's softnet queue in the interrupt service routine, and then raised the local NET_RX_SOFTIRQ softirq for further processing. If packets arrive too fast, then because IRQs have higher priority than softirqs, most system resources go to servicing interrupts. The softnet queue has a limited size, however, so the excess packets are simply dropped after resources have already been spent on them, which wastes valuable system resources. With NAPI, such packets are dropped directly and never handed up to the kernel: the NIC discards the packets that would have to be dropped as early as possible, so the kernel never sees them, which also reduces the pressure on the kernel.

The usage of NAPI generally includes the following steps:

  1. In the interrupt handler, notify the network subsystem that packets should now be received in polling mode. Disabling the receive interrupt is hardware-specific; the kernel is told to process packets by polling through the netif_rx_schedule() function. Alternatively, use the second form below, in which netif_rx_schedule_prep() checks whether the device can be put into polling mode:

    Listing 1. Set the NIC to polling mode

        void netif_rx_schedule(struct net_device *dev);
        /* ...or... */
        if (netif_rx_schedule_prep(dev))
            __netif_rx_schedule(dev);

  2. Create a polling function in the driver. It obtains data packets from the NIC and sends them to the network subsystem. Its prototype is:

    Listing 2. NAPI polling Method

        int (*poll)(struct net_device *dev, int *budget); 

    After the NIC has been switched to polling mode, the polling function processes the packets in the receive queue through the poll() method. When the queue is empty, the driver switches back to interrupt mode. To do so, it must first leave polling mode by calling netif_rx_complete() and then re-enable the NIC's receive interrupt.

    Listing 3. Exit polling mode

        void netif_rx_complete(struct net_device *dev);

  3. The polling function created in the driver must be associated with the actual network device, struct net_device. This is usually done during NIC initialization. Sample code:

    Listing 4. Set up NIC support for polling mode

        dev->poll = my_poll;
        dev->weight = 64;

    The other field set here is weight. There is no strict rule for this value; it is an empirical figure. Typically it is set to 16 for a 10 Mbit/s NIC and to 64 for faster NICs.
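The three steps can be traced end to end in a user-space simulation. The sketch below is not kernel code: struct mock_dev and every mock_* function are hypothetical stand-ins, and the comments name the kernel call each one imitates. It assumes the old-style convention that poll() decrements *budget by the work done and returns 0 once the queue is drained, 1 otherwise.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space stand-in for struct net_device. */
struct mock_dev {
    int  rx_pending;      /* packets waiting in the simulated RX ring */
    bool rx_irq_enabled;  /* simulated NIC receive-interrupt enable bit */
    bool polling;         /* true while the device is on the poll list */
    int  weight;          /* per-poll packet limit, like dev->weight */
    int  delivered;       /* packets handed to the simulated stack */
    int  (*poll)(struct mock_dev *dev, int *budget);
};

/* Step 1: the interrupt handler masks RX interrupts and asks to be polled
 * (the role netif_rx_schedule() plays in the text). */
void mock_rx_interrupt(struct mock_dev *dev)
{
    if (!dev->polling) {
        dev->rx_irq_enabled = false;
        dev->polling = true;
    }
}

/* Step 2: the poll function. Processes at most min(*budget, weight)
 * packets; returns 1 if work remains, 0 after leaving polling mode. */
int mock_poll(struct mock_dev *dev, int *budget)
{
    int limit = (*budget < dev->weight) ? *budget : dev->weight;
    int done = 0;

    while (done < limit && dev->rx_pending > 0) {
        dev->rx_pending--;
        dev->delivered++;
        done++;
    }
    *budget -= done;

    if (dev->rx_pending > 0)
        return 1;               /* more work: stay in polling mode */

    dev->polling = false;       /* role of netif_rx_complete() */
    dev->rx_irq_enabled = true; /* then re-enable the receive interrupt */
    return 0;
}

/* Step 3: associate the poll function and weight with the device. */
void mock_dev_init(struct mock_dev *dev, int weight)
{
    dev->rx_pending = 0;
    dev->rx_irq_enabled = true;
    dev->polling = false;
    dev->delivered = 0;
    dev->poll = mock_poll;
    dev->weight = weight;
}

/* Simplified softirq loop: call dev->poll() until it reports completion.
 * Returns the number of poll rounds. The budget is reset every round,
 * which is a simplification of how the kernel shares a budget. */
int mock_run_softirq(struct mock_dev *dev, int budget_per_round)
{
    int rounds = 0;

    assert(budget_per_round > 0 && dev->weight > 0);
    while (dev->polling) {
        int budget = budget_per_round;

        dev->poll(dev, &budget);
        rounds++;
    }
    return rounds;
}
```

With a weight of 64 and 150 pending packets, a single simulated interrupt leads to three poll rounds (64, 64, and 22 packets), after which the receive interrupt is enabled again.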

NAPI Interfaces

The following NAPI interface functions were mostly covered in the previous sections. Here is a brief summary:

netif_rx_schedule(dev)

Called in the NIC interrupt handler to switch the NIC's reception mode to polling.

netif_rx_schedule_prep(dev)

If the network adapter is up and running, marks it as ready to be added to the polling list. This function can be viewed as the first half of netif_rx_schedule(dev).

__netif_rx_schedule(dev)

Adds the device to the polling list, provided that netif_rx_schedule_prep(dev) has returned 1.

__netif_rx_schedule_prep(dev)

Similar to netif_rx_schedule_prep(dev), but it does not check whether the NIC device is up and running.

netif_rx_complete(dev)

Removes the NIC interface from the polling list. It is generally called once the polling function has finished its work.

__netif_rx_complete(dev)

Similar to netif_rx_complete(dev), but the caller must ensure that local interrupts are disabled.

Newer newer NAPI

The name NAPI ("New API") is already a little whimsical. Linux kernel hackers are evidently far less careful about names than about code: after two successive reworks, the result came to be called Newer newer NAPI.

In the original NAPI implementation, struct net_device carries two fields, poll() and weight, holding the polling function and its weight respectively. The so-called Newer newer NAPI is the result of several reworks and is found in kernels from 2.6.24 onward. Its core change is to separate the NAPI-related state from net_device, which reduces coupling and makes the code more flexible. Because the NAPI information is detached from any particular network device, the relationship is no longer one-to-one. For example, some network adapters provide multiple ports that all share a single receive interrupt. In that case only one copy of the NAPI information needs to be kept, shared by all ports, so the code framework adapts better to real hardware capabilities. The central structure of Newer newer NAPI is napi_struct:

Listing 5. NAPI struct

    /*
     * Structure for NAPI scheduling similar to tasklet but with weighting
     */
    struct napi_struct {
        /* The poll_list must only be managed by the entity which
         * changes the state of the NAPI_STATE_SCHED bit.  This means
         * whoever atomically sets that bit can add this napi_struct
         * to the per-cpu poll_list, and whoever clears that bit
         * can remove from the list right before clearing the bit.
         */
        struct list_head    poll_list;

        unsigned long       state;
        int                 weight;
        int                 (*poll)(struct napi_struct *, int);
    #ifdef CONFIG_NETPOLL
        spinlock_t          poll_lock;
        int                 poll_owner;
    #endif
        unsigned int        gro_count;
        struct net_device   *dev;
        struct list_head    dev_list;
        struct sk_buff      *gro_list;
        struct sk_buff      *skb;
    };

If you are familiar with the old NAPI implementation, the fields poll_list, state, weight, poll, and dev need no explanation; gro_count and gro_list are described later, in the discussion of GRO. The biggest difference from the previous implementation is that this structure is no longer part of net_device. Instead, the NIC driver is expected to allocate and manage napi instances itself, usually in the driver's private data. The main advantage is that a driver can create multiple napi_struct instances if it wishes. As more and more hardware supports multiple receive queues, having multiple napi_struct instances makes the use of those queues more effective.

Compared with the original NAPI, the registration of the polling function has some changes. The new interface is:

    void netif_napi_add(struct net_device *dev, struct napi_struct *napi,
                        int (*poll)(struct napi_struct *, int), int weight);

If you are familiar with the old NAPI interface, this function needs little explanation: the polling function and weight are now registered on a napi_struct rather than set directly on net_device.

Note that the poll() method prototype changes slightly as well:

    int (*poll)(struct napi_struct *napi, int budget); 
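To make the registration call and the new poll() prototype concrete, here is a user-space sketch of a multi-queue driver that keeps one napi instance per receive queue in its private data. All mock_* names and the queue layout are hypothetical, invented for illustration. The sketch also assumes that poll() returns the number of packets it processed, and it recovers the enclosing queue from the napi pointer with an offsetof-based macro in the style of the kernel's container_of().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct napi_struct (only what is needed). */
struct mock_napi {
    int weight;
    int (*poll)(struct mock_napi *napi, int budget);
};

/* One receive queue of an imaginary multi-queue NIC. The napi instance
 * is embedded in driver-private data, not in the net device. */
struct mock_rx_queue {
    int pending;            /* packets waiting in this queue */
    int delivered;          /* packets handed to the simulated stack */
    struct mock_napi napi;
};

#define MOCK_NUM_QUEUES 4

struct mock_priv {
    struct mock_rx_queue queues[MOCK_NUM_QUEUES];
};

/* Recover the structure that embeds 'member' from a pointer to it. */
#define mock_container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* New-style poll: receives the napi instance and a plain budget, and
 * returns how many packets were processed (at most 'budget'). */
int mock_queue_poll(struct mock_napi *napi, int budget)
{
    struct mock_rx_queue *q =
        mock_container_of(napi, struct mock_rx_queue, napi);
    int done = 0;

    assert(budget > 0);
    while (done < budget && q->pending > 0) {
        q->pending--;
        q->delivered++;
        done++;
    }
    return done;
}

/* Imitates the role of netif_napi_add(): bind poll and weight to one
 * napi instance. */
void mock_napi_add(struct mock_napi *napi,
                   int (*poll)(struct mock_napi *, int), int weight)
{
    napi->poll = poll;
    napi->weight = weight;
}

/* Driver initialization: one napi instance per receive queue. */
void mock_priv_init(struct mock_priv *priv)
{
    for (int i = 0; i < MOCK_NUM_QUEUES; i++) {
        priv->queues[i].pending = 0;
        priv->queues[i].delivered = 0;
        mock_napi_add(&priv->queues[i].napi, mock_queue_poll, 64);
    }
}
```

Because each queue owns its napi instance, polling one queue leaves the others untouched, which is the flexibility the separation from net_device is meant to provide.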

Most NAPI-related functions also change their prototypes. Below is the API for entering polling mode:

    void netif_rx_schedule(struct net_device *dev,
                           struct napi_struct *napi);
    /* ...or... */
    int netif_rx_schedule_prep(struct net_device *dev,
                               struct napi_struct *napi);
    void __netif_rx_schedule(struct net_device *dev,
                             struct napi_struct *napi);

To leave polling mode, call:

    void netif_rx_complete(struct net_device *dev,
                           struct napi_struct *napi);

Because multiple napi_struct instances may exist and each must be able to be enabled or disabled independently, the driver must ensure that all of its napi_struct instances are disabled when the NIC interface is brought down.

The functions netif_poll_enable() and netif_poll_disable() are no longer needed, because polling is no longer managed through net_device. They are replaced by the following two functions:

    void napi_enable(struct napi_struct *napi);
    void napi_disable(struct napi_struct *napi);
