Application of NAPI technology in Linux network drivers (3)

Last Update:2017-06-25 Source: Internet

Author: User

　　 Can it make receiving faster?
Now let's look at how to improve NAPI efficiency. Before discussing efficiency, consider the model that NAPI_HOWTO.txt in the Linux documentation provides for constructing your own NIC poll method. It differs from 8139CP in some respects: the dirty_rx field in its NIC device description is not used in 8139CP.
　　
dirty_rx marks the oldest rx_ring entry that still needs a fresh sk_buff. The entries between dirty_rx and cur_rx have already received data and handed their buffers on, but have not yet been given new ones. Similarly, cur_rx points to the next buffer to be processed. The example in NAPI_HOWTO.txt shows how these fields are used:
　　
/* cur_rx is the next buffer to be processed. If cur_rx is more than half
 * a ring ahead of dirty_rx, or the entry at dirty_rx has no buffer, the
 * buffers allocated in the rx-ring are running out. Call refill_rx_ring
 * to allocate new buffers for the rx-ring entries whose data has already
 * been passed to the network layer. This advances dirty_rx and prepares
 * the ring for the next reception. */
if (tp->cur_rx - tp->dirty_rx > RX_RING_SIZE/2 ||
    tp->rx_buffers[tp->dirty_rx % RX_RING_SIZE].skb == NULL)
        refill_rx_ring(dev);

/* The entry at dirty_rx still has no buffer: restart the timer and
 * leave NIC interrupt processing disabled. */
if (tp->rx_buffers[tp->dirty_rx % RX_RING_SIZE].skb == NULL)
        restart_timer();
/* Reaching this branch covers several cases. The gap between cur_rx and
 * dirty_rx may not have exceeded half of the rx-ring, or refill_rx_ring
 * may not have advanced dirty_rx (perhaps because many rx-ring entries
 * hold received data that the network layer has not processed yet), so
 * there are no idle entries for new data. In that case call
 * netif_rx_schedule again to raise the soft interrupt, which calls the
 * device's POLL method and collects the data in the rx-ring. */
else netif_rx_schedule(dev);  /* we are back on the poll list */
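The refill test in the snippet above can be tried out in user space. The following is only a minimal sketch: struct ring_model and the ring_* helpers are hypothetical stand-ins invented for illustration, malloc replaces sk_buff allocation, and nothing here is kernel API. It assumes that cur_rx counts entries processed and dirty_rx counts entries refilled, so that their difference is the number of entries waiting for a new buffer.

```c
#include <assert.h>
#include <stdlib.h>

#define RX_RING_SIZE 8

/* Hypothetical stand-in for the driver's private data; not a kernel struct. */
struct ring_model {
    unsigned int cur_rx;      /* next entry the poll routine will process */
    unsigned int dirty_rx;    /* oldest entry still waiting for a new buffer */
    void *buf[RX_RING_SIZE];  /* NULL means the entry has no buffer */
};

/* Give every entry a buffer, as a driver would when the device is opened. */
static int ring_init(struct ring_model *r)
{
    r->cur_rx = r->dirty_rx = 0;
    for (int i = 0; i < RX_RING_SIZE; i++) {
        r->buf[i] = malloc(64);
        if (r->buf[i] == NULL)
            return -1;
    }
    return 0;
}

/* Process one received entry: its buffer is handed up and leaves the ring. */
static void *ring_receive_one(struct ring_model *r)
{
    unsigned int entry = r->cur_rx % RX_RING_SIZE;
    void *pkt = r->buf[entry];

    assert(r->cur_rx - r->dirty_rx < RX_RING_SIZE); /* ring not exhausted */
    r->buf[entry] = NULL;
    r->cur_rx++;
    return pkt;
}

/* Same condition as the snippet: more than half the ring is waiting for
 * buffers, or the entry at dirty_rx has none. */
static int ring_needs_refill(const struct ring_model *r)
{
    return r->cur_rx - r->dirty_rx > RX_RING_SIZE / 2 ||
           r->buf[r->dirty_rx % RX_RING_SIZE] == NULL;
}

/* Allocate buffers for consumed entries, advancing dirty_rx toward cur_rx.
 * Returns the number of entries refilled; stops early if allocation fails. */
static unsigned int ring_refill(struct ring_model *r)
{
    unsigned int n = 0;

    while (r->cur_rx - r->dirty_rx > 0) {
        unsigned int entry = r->dirty_rx % RX_RING_SIZE;

        if (r->buf[entry] == NULL) {
            r->buf[entry] = malloc(64);
            if (r->buf[entry] == NULL)
                break;
        }
        r->dirty_rx++;
        n++;
    }
    return n;
}
```

If ring_refill stops early because allocation failed, the entry at dirty_rx stays empty. That is the situation in which the snippet restarts the timer instead of rescheduling the poll.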
　　
The dirty_rx field is used in the RTL-8169 driver but not in 8139CP. This is not a sign that the 8139CP driver is immature. As NAPI_HOWTO.txt makes clear, the current 8139CP driver does not strictly follow the NAPI requirements. If you are interested, compare the 8139CP and RTL-8169 drivers. Neither forwards data from the driver layer to the network layer inside the NIC interrupt handler; both do so in the soft interrupt. However, 8139CP takes advantage of some hardware-specific features: while receiving with interrupts disabled, the NIC still uses RxOK to signal arrival events, and the POLL method then forwards data directly from the NIC to the upper layer. RTL8169 also needs input_pkt_queue (the socket buffer (sk_buff) input queue in the softnet_data structure), which passes sk_buff data from the NIC interrupt to the soft interrupt. For 8139CP, the biggest benefit is that the POLL method and the NIC interrupt do not need the dirty_rx and cur_rx fields to learn the current state of the transfer units, and there is no need to call refill_rx_ring periodically to refresh the rx-ring and obtain idle transfer units. The disadvantage, I think, is that a POLL method has to be written from scratch: you cannot borrow process_backlog from /net/core/dev.c as your own POLL method. This cost is worth paying.
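The division of labor described above, where the interrupt only signals arrival and a driver-specific POLL method moves packets upward, can be modeled in user space. This is a sketch under stated assumptions, not the 8139CP code: struct nic_model, model_rx_interrupt and model_poll are hypothetical names, and the budget parameter and the 0/1 return value only imitate the convention of the old-style poll interface (0 when the ring is drained, 1 when work remains).

```c
#include <assert.h>

/* Hypothetical userspace model of a NIC; none of these are kernel types. */
struct nic_model {
    int pending;         /* packets waiting in the rx-ring */
    int delivered;       /* packets handed to the "network layer" */
    int rx_irq_enabled;  /* 1 while the receive interrupt is unmasked */
};

/* Interrupt side: mask further receive interrupts, leave the work to poll. */
static void model_rx_interrupt(struct nic_model *nic)
{
    nic->rx_irq_enabled = 0;
}

/* Poll side: deliver at most *budget packets straight to the upper layer.
 * Returns 0 when the ring is drained (interrupt re-enabled), 1 otherwise. */
static int model_poll(struct nic_model *nic, int *budget)
{
    int done = 0;

    assert(*budget >= 0);
    while (nic->pending > 0 && done < *budget) {
        nic->pending--;
        nic->delivered++;
        done++;
    }
    *budget -= done;

    if (nic->pending > 0)
        return 1;             /* stay on the poll list */
    nic->rx_irq_enabled = 1;  /* ring empty: back to interrupt mode */
    return 0;
}
```

With 10 packets pending and a budget of 4, the first call delivers 4 packets and returns 1. A later call with a larger budget drains the rest and re-enables the interrupt.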
　　
After all this, it may seem that none of it has anything to do with improving efficiency. In fact, the opposite is true: understanding what some of the fields in softnet_data mean makes the following clearer. The method described below improves efficiency by borrowing some techniques from NAPI_HOWTO.txt on top of 8139CP. In practice, it does improve the Linux 8139CP driver in some application scenarios. First, here is how 8139CP receives and processes packets under the Linux 2.6.6 kernel on an x86 (PIII-900MHz) platform:
　　
Psize    Ipps      Tput      Rxint     Done
----------------------------------------------------
   60    490000    254560        21        10
  128    358750    259946        27        11
  256    334454    450034        34        18
  512    234550    556670    201239    193455
 1024    119061    995645    884526    882300
 1440     74568    995645    995645    987154
　　
The columns in the preceding table are:
"Psize": the packet size.
"Ipps": the number of packets the system can receive per second.
"Tput": the number of packets, out of a total of 1M, that made it through.
"Rxint": the number of receive interrupts.
"Done": the number of times POLL pulled all the data out of the rx-ring, that is, the number of times the rx-ring had to be emptied.
　　
The table shows that with 8139CP, at a receiving rate of 490K packets/s, only 21 interrupts are generated and only 10 POLL calls are needed to collect the data from rx_ring. With large packets at a low rate, however, receive interrupts increase sharply until every packet needs its own POLL call. In the end every interrupt triggers a POLL, and system efficiency drops sharply. NAPI is therefore well suited to large numbers of packets that are as small as possible. For large packets at a low rate, it slows the system down.
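The trend can be made concrete by dividing Rxint by Tput for each row of the table above. The sketch below does only that arithmetic; struct rx_sample and interrupts_per_packet are hypothetical names, and it assumes that Tput counts the packets that got through, so the quotient reads as receive interrupts per packet.

```c
#include <assert.h>

/* One row of the measurement table above. */
struct rx_sample {
    int psize;   /* packet size */
    long ipps;   /* input packets per second */
    long tput;   /* packets that got through */
    long rxint;  /* receive interrupts */
    long done;   /* POLL calls that emptied the rx-ring */
};

/* Figures copied from the table for the original 8139CP driver. */
static const struct rx_sample original_8139cp[] = {
    {   60, 490000, 254560,     21,     10 },
    {  128, 358750, 259946,     27,     11 },
    {  256, 334454, 450034,     34,     18 },
    {  512, 234550, 556670, 201239, 193455 },
    { 1024, 119061, 995645, 884526, 882300 },
    { 1440,  74568, 995645, 995645, 987154 },
};

/* Receive interrupts per packet that got through. */
static double interrupts_per_packet(const struct rx_sample *s)
{
    assert(s->tput > 0);
    return (double)s->rxint / (double)s->tput;
}
```

For 60-byte packets the ratio is well below one interrupt per ten thousand packets. For 1440-byte packets it reaches one interrupt per packet, which is the per-packet interrupt cost that NAPI is meant to avoid.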
　　
To improve this situation, we can consider the following methods. They gave good results in a series of tests on MIPS, Xscale, and SA1100 platforms:
　　
1. Disable NIC interrupts completely and use RxOK to control the receive interrupt.
2. Use a timer, through the timer_list control handle, with an interval chosen for the hardware platform (the right interval differs between platforms), and POLL the rx-ring directly. On MIPS and Xscale we used interrupt vector 0 (irq0) directly as the top half for polling the rx-ring. Note that we set HZ to 1000 on both platforms, where it is usually 100, and recompiled the wall-time accounting code so that wall time still advances in 10 ms steps. Of course, you can choose a timer interval that suits your own platform and application.
3. Use the input_pkt_queue queue in softnet_data. After the POLL method completes in the bottom half of the clock interrupt, the data is not passed directly to the network layer. Instead, the sk_buff is placed on input_pkt_queue and the soft interrupt is woken up to do the post-processing. This naturally costs some memory, and real-time behavior gets worse.
4. Use the dirty_rx field and the refill_rx_ring function to allocate new buffers for some rx-ring entries when the network layer is relatively idle after the POLL method has been called. This saves time when new packets arrive, because the operating system does not have to allocate new space for them at that point.
5. Note: our upper-layer applications mainly distribute network data, and there are no complex applications with many background processes at the application layer. What we have done above is to improve network data processing at the expense of overall system efficiency.
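Steps 2 and 3 in the list above can be sketched as a small user-space model: a timer tick drains the ring into a backlog queue, and a separate soft-interrupt step hands the queued packets on. Everything here is hypothetical and only echoes the kernel names. struct softnet_model is not the kernel's softnet_data, the queue holds integers instead of sk_buffs, and BACKLOG_MAX is an arbitrary limit chosen for illustration.

```c
#include <assert.h>

#define BACKLOG_MAX 64

/* Hypothetical userspace model; the names only echo the kernel ones. */
struct softnet_model {
    int input_pkt_queue[BACKLOG_MAX]; /* packet ids waiting for the softirq */
    unsigned int head, tail;          /* consume at head, append at tail */
    int softirq_raised;               /* 1 when post-processing is pending */
    int dropped;                      /* packets lost because queue was full */
};

static unsigned int backlog_len(const struct softnet_model *sd)
{
    return sd->tail - sd->head;
}

/* Timer tick: move everything from the ring to the backlog without
 * processing it, then ask for the soft interrupt to run. */
static void model_timer_tick(struct softnet_model *sd, int *ring_pending)
{
    while (*ring_pending > 0) {
        if (backlog_len(sd) == BACKLOG_MAX) {
            sd->dropped++;            /* the memory cost has a limit */
        } else {
            sd->input_pkt_queue[sd->tail % BACKLOG_MAX] = *ring_pending;
            sd->tail++;
        }
        (*ring_pending)--;
    }
    if (backlog_len(sd) > 0)
        sd->softirq_raised = 1;
}

/* Soft interrupt: hand queued packets to the network layer.
 * Returns the number of packets processed. */
static int model_softirq(struct softnet_model *sd)
{
    int n = 0;

    assert(backlog_len(sd) <= BACKLOG_MAX);
    while (backlog_len(sd) > 0) {
        sd->head++;                   /* packet leaves the backlog */
        n++;
    }
    sd->softirq_raised = 0;
    return n;
}
```

The model shows both costs named in step 3. Packets sit in the queue until the soft interrupt runs, which delays them, and a burst larger than the queue is dropped.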
　　
Now let's look at the results for the improved 8139CP driver on the same x86 (PIII-900MHz) platform:
　　
Psize    Ipps      Tput      Rxint     Done
----------------------------------------------------
   60    553500    354560        17         7
  128    453000    350400        19        10
  256    390050    324500        28        13
  512    305600    456670       203       455
 1024    123440    340020    772951    123005
 1440     64568    344567    822394    130000
　　
These figures show that the efficiency and stability of data reception have improved noticeably, and the gap between the number of POLL calls needed for high-rate and low-rate data is no longer as large as with the previous 8139CP driver. The maximum packet reception capability rises to 553K packets/s. On MIPS and Xscale series platforms, we were able to raise the maximum packet reception capability by about 15%-25%.
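The gain on x86 can be read directly from the Ipps columns of the two tables. The helper below is a minimal sketch of that arithmetic; percent_gain is a hypothetical name introduced here.

```c
#include <assert.h>

/* Relative change from before to after, in percent. */
static double percent_gain(double before, double after)
{
    assert(before > 0.0);
    return (after - before) / before * 100.0;
}
```

For 60-byte packets, percent_gain(490000, 553500) comes to roughly 13%. For 128-byte packets, percent_gain(358750, 453000) comes to roughly 26%.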
　　
Finally, NAPI is not the only way to improve network efficiency, and it can only be regarded as a stopgap. The fundamental solution is for upper-layer applications to have exclusive use of the network device, or to provide a large amount of buffer resources. In that case, our experimental data shows that receiving efficiency can be improved by 100%-150% or more.

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
