Network device drivers in Linux kernel development

Source: Internet
Author: User

Reposted from http://www.ibm.com/developerworks/cn/linux/l-cn-networkdriver/

Network device introduction

Network devices are an essential part of computer architecture: when a processor needs to communicate with the outside world, it usually does so through a network device. In the OSI (Open Systems Interconnection) model, the network is divided into seven layers, from bottom to top: the physical layer, data link layer, network layer, transport layer, session layer, presentation layer, and application layer. The network devices discussed here span two of these layers: the MAC (Media Access Control) layer, which corresponds to the OSI data link layer, and the PHY (physical) layer, which corresponds to the OSI physical layer.

There are many common network devices, such as the PPC85xx TSEC, the AMCC 440GX EMAC, and the Intel 82559; they all work on essentially the same principle.

DMA Introduction

The core processing module of a network device is a DMA (Direct Memory Access) controller, which assists the processor with sending and receiving data. For transmission, it sends data that software has prepared without further processor intervention. For reception, it stores the received data in a defined format, notifies the processor, and waits for the processor to fetch it.

The data structure the DMA module works with is called a BD (buffer descriptor). Each packet is divided into several frames, and each frame is stored in the buffer of one BD. A BD structure typically contains the following fields:

typedef struct {
    void *bufptr;    /* start address of the buffer belonging to this BD */
    int length;      /* length of the packet data stored in the buffer */
    int sc;          /* status information of this BD */
} bd_struct;

All BDs together form a BD table, as shown in Figure 1. The BD tables for the transmit direction and the receive direction are typically independent.

Figure 1. BD table Structure
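As a rough illustration of the BD table described above, the following user-space sketch builds a small table and a software pointer into it. The table size, the buffer size, the BD_IDLE/BD_READY values, and the wrap-around of the pointer are assumptions made for the example; a real controller defines its own layout and status bits.

```c
#include <assert.h>
#include <stddef.h>

#define BD_COUNT   4      /* hypothetical table size */
#define BD_BUF_LEN 1536   /* hypothetical buffer size per BD */

/* Hypothetical status values; real hardware defines its own bits. */
enum bd_status { BD_IDLE = 0, BD_READY = 1 };

typedef struct {
    void *bufptr;   /* start address of the buffer belonging to this BD */
    int   length;   /* length of the data stored in the buffer */
    int   sc;       /* status information of this BD */
} bd_struct;

typedef struct {
    bd_struct     bd[BD_COUNT];
    unsigned char buf[BD_COUNT][BD_BUF_LEN];
    int           cur;   /* index of the BD that software uses next */
} bd_table;

/* Point every BD at its own buffer and mark it idle. */
void bd_table_init(bd_table *t)
{
    assert(t != NULL);
    for (int i = 0; i < BD_COUNT; i++) {
        t->bd[i].bufptr = t->buf[i];
        t->bd[i].length = 0;
        t->bd[i].sc     = BD_IDLE;
    }
    t->cur = 0;
}

/* Move the software pointer to the next BD, wrapping at the end of the table. */
int bd_table_advance(bd_table *t)
{
    assert(t != NULL);
    t->cur = (t->cur + 1) % BD_COUNT;
    return t->cur;
}
```

A driver modelled this way would keep two such tables, one for transmit and one for receive, matching the independent tables in Figure 1.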

Data sending process

The flow of data sent by the network device through DMA is shown in Figure 2.

Figure 2. Data sending process

The specific meanings of the steps in the diagram are described below:

(1) The protocol layer notifies the processor to start sending data;

(2) The processor takes a BD from the BD table, copies the data to be sent into that BD's buffer, and sets the BD's status;

(3) The processor notifies the network device to start sending data;

(4) The MAC module notifies the DMA unit to start sending data;

(5) The DMA module reads the BD table and fetches the currently active BD;

(6) The DMA module passes the data in the current BD's buffer to the MAC module;

(7) The MAC module sends this data to the network;

(8) The network device notifies the processor that the data has been sent;

(9) The processor notifies the protocol layer that it can send the next frame of data.

Steps (4) to (8) are completed automatically by hardware without software intervention, which reduces the processor's workload.
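The hand-over between processor and DMA in the steps above can be sketched as a small user-space simulation. Here cpu_queue_frame stands in for steps (2) and (3), and dma_send_ready plays the part of the hardware in steps (4) to (8). All names, sizes, and status values are hypothetical, and the "wire" is just a byte array.

```c
#include <assert.h>
#include <string.h>

#define TX_BD_COUNT 4    /* hypothetical table size */
#define TX_BUF_LEN  64   /* hypothetical buffer size per BD */
enum { BD_IDLE = 0, BD_READY = 1 };

typedef struct {
    unsigned char *bufptr;
    int length;
    int sc;
} tx_bd;

typedef struct {
    tx_bd bd[TX_BD_COUNT];
    unsigned char buf[TX_BD_COUNT][TX_BUF_LEN];
    int cpu_idx;   /* next BD the processor fills */
    int dma_idx;   /* next BD the simulated DMA module sends */
} tx_ring;

void tx_ring_init(tx_ring *r)
{
    assert(r != NULL);
    memset(r, 0, sizeof(*r));
    for (int i = 0; i < TX_BD_COUNT; i++)
        r->bd[i].bufptr = r->buf[i];
}

/* Steps (2)-(3): copy one frame into the next idle BD and mark it ready.
 * Returns 0 on success, -1 if the frame is too long or no BD is idle. */
int cpu_queue_frame(tx_ring *r, const void *data, int len)
{
    tx_bd *bd = &r->bd[r->cpu_idx];
    if (len < 0 || len > TX_BUF_LEN || bd->sc != BD_IDLE)
        return -1;
    memcpy(bd->bufptr, data, (size_t)len);
    bd->length = len;
    bd->sc = BD_READY;
    r->cpu_idx = (r->cpu_idx + 1) % TX_BD_COUNT;
    return 0;
}

/* Steps (4)-(8), done by hardware on a real device: send each ready BD in
 * order, append its bytes to `wire`, and return the BD to the idle state.
 * Returns the number of frames sent. */
int dma_send_ready(tx_ring *r, unsigned char *wire, int wire_cap, int *wire_len)
{
    int sent = 0;
    *wire_len = 0;
    while (r->bd[r->dma_idx].sc == BD_READY) {
        tx_bd *bd = &r->bd[r->dma_idx];
        if (*wire_len + bd->length > wire_cap)
            break;
        memcpy(wire + *wire_len, bd->bufptr, (size_t)bd->length);
        *wire_len += bd->length;
        bd->sc = BD_IDLE;
        r->dma_idx = (r->dma_idx + 1) % TX_BD_COUNT;
        sent++;
    }
    return sent;
}
```

The status field is the only thing the two sides share: the processor writes only idle BDs and the DMA side reads only ready ones, which is why no further software involvement is needed once a BD has been marked ready.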

Data receiving process

The flow of data received by the network device through DMA is shown in Figure 3.

Figure 3. Data receiving process

The specific meanings of the steps in the diagram are described below:

(1) The processor initializes the BD table;

(2) The processor initializes the network device;

(3) The MAC module receives data from the network;

(4) The MAC module notifies the DMA module to fetch data;

(5) The DMA module fetches a suitable BD from the BD table;

(6) The MAC module transfers the data into the buffer of the current BD;

(7) The network device notifies the processor that data has arrived (by interrupt, or the processor detects it by polling);

(8) The protocol layer takes the data out of the current BD's buffer.

Steps (3) to (6) are completed automatically by hardware without software intervention, which reduces the processor's workload.

Linux network device driver model

Data structures

The core structure that describes a network device in the Linux kernel is net_device, defined in include/linux/netdevice.h. Its fields fall into the following categories.

Global information

This category contains the device name (name field), the device state (state field), the device initialization function (init field), and so on.

Hardware information

This category contains the device memory range (mem_start and mem_end fields), the interrupt number (irq field), the I/O base address (base_addr field), and so on.

Interface information

This category contains the MAC address (dev_addr field), the device properties (flags field), the maximum transmission unit (mtu field), and so on.

Device interface functions

This category contains the interface functions provided by the device, such as: the device open function (open field), which brings up the device interface and is called when the user configures the network with the ifconfig command; the device stop function (stop field), which shuts down the device interface; and the data transmit function (hard_start_xmit field), which is called when the user writes data through a socket and is responsible for handing the data to the network device.

Function interface

Device initialization function

Network device drivers exist in the Linux kernel as kernel modules. Like any module, a driver needs an initialization function, which initializes the hardware registers of the network device, configures DMA, and initializes the related kernel variables. The device initialization function is called when the kernel module is loaded, and takes the following form:

static int __init xx_init(void)
{
    ...
}
module_init(xx_init);   /* xx_init is called automatically when the module is loaded */

The device initialization function mainly performs the following tasks:

1. Hardware initialization

Because a network device consists mainly of three hardware modules (PHY, MAC, and DMA), the developer needs to initialize each of them separately.

    1. Initialize the PHY module, including setting full-duplex or half-duplex operation, the operating speed, and the auto-negotiation mode.
    2. Initialize the MAC module, including setting the device interface mode, and so on.
    3. Initialize the DMA module, including building the BD tables, setting the BD properties, and assigning a buffer to each BD.

2. Kernel variable initialization

Initialize and register the kernel device. The kernel device is a variable of type net_device. The developer needs to allocate space for that variable (via the alloc_netdev function), set its parameters, hook up the interface functions, and register the device (via the register_netdev function).

The common interface function hooks are shown below. This is the older style, with the hooks set directly on net_device; since kernel 2.6.29 the same hooks are grouped in struct net_device_ops under names such as ndo_open, ndo_stop, and ndo_start_xmit.

struct net_device *dev_p;
dev_p->open            = xx_open;   /* device open function */
dev_p->stop            = xx_stop;   /* device stop function */
dev_p->hard_start_xmit = xx_tx;     /* data transmit function */
dev_p->do_ioctl        = xx_ioctl;  /* other control functions */
...
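The idea behind these hooks is that the upper layers call the driver only through function pointers stored in the device structure. The following user-space sketch mimics that pattern. struct fake_net_device and stack_send are hypothetical stand-ins, not kernel types or kernel functions, and the signatures are simplified for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, much reduced stand-in for the kernel's net_device. */
struct fake_net_device {
    const char *name;
    int  up;         /* 1 after open, 0 after stop */
    long tx_bytes;   /* transmit statistics */
    int (*open)(struct fake_net_device *dev);
    int (*stop)(struct fake_net_device *dev);
    int (*hard_start_xmit)(const void *data, int len,
                           struct fake_net_device *dev);
};

static int xx_open(struct fake_net_device *dev) { dev->up = 1; return 0; }
static int xx_stop(struct fake_net_device *dev) { dev->up = 0; return 0; }

/* Pretend to transmit: refuse when the interface is down, else count bytes. */
static int xx_tx(const void *data, int len, struct fake_net_device *dev)
{
    (void)data;
    if (!dev->up || len < 0)
        return -1;
    dev->tx_bytes += len;
    return 0;
}

/* The driver hooks its own functions into the device structure. */
void xx_setup(struct fake_net_device *dev)
{
    assert(dev != NULL);
    dev->name            = "xx0";
    dev->up              = 0;
    dev->tx_bytes        = 0;
    dev->open            = xx_open;
    dev->stop            = xx_stop;
    dev->hard_start_xmit = xx_tx;
}

/* The upper layer only sees the function pointer, not the concrete driver. */
int stack_send(struct fake_net_device *dev, const void *data, int len)
{
    assert(dev != NULL && dev->hard_start_xmit != NULL);
    return dev->hard_start_xmit(data, len, dev);
}
```

Swapping in a different driver then only means hooking different functions in the setup step; the caller of stack_send stays unchanged.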

Data transmit and receive functions

Receiving and sending data is the most important part of a network device driver. Users do not need to know which network device the system uses or how that device sends and receives data; all of these details are hidden from them. Linux uses sockets as the bridge between users and network devices: the user operates on a socket through functions such as read and write, and the socket in turn interacts with the specific network device to carry out the actual sending and receiving.

Linux provides a data structure called sk_buff for this purpose. Data that the user passes to a socket is first stored in the buffer of an sk_buff. The sk_buff structure is defined in include/linux/skbuff.h, and the way it holds a packet is shown below.

Figure 4. sk_buff data structure diagram

1. Data transmission Process

When the user starts sending data through a socket, the data is stored in an sk_buff buffer, and the transmit function of the network device (the hard_start_xmit hook registered in the device initialization function) is called. The flow is shown in Figure 5.

Figure 5. Data transmission flowchart

    1. The user first creates a socket and then calls a write function such as write to access the network device through the socket; the data is saved in a buffer of type sk_buff.
    2. The socket interface calls the network device transmit function (hard_start_xmit). During initialization hard_start_xmit was hooked to a device-specific transmit function such as xx_tx, which mainly performs the following steps.
      1. Take an idle BD from the transmit BD table.
      2. Set the BD fields from the data held in the sk_buff: the data length and the packet buffer pointer. Note that the buffer pointer must be a physical address, because the DMA module can only use physical addresses when it fetches the data that a BD refers to.
        bd_p->length = skb_p->len;
        bd_p->bufptr = virt_to_phys(skb_p->data);
      3. Set the status of the BD to ready; the DMA module automatically sends the data of every BD in the ready state.
      4. Move the transmit BD table pointer to the next BD.
    3. The DMA module sends the data in the buffers of the ready BDs to the network and automatically returns each BD to the idle state once its data has been sent.

2. Data reception Process

When the network device receives data, the DMA module automatically stores it and notifies the processor. Once the processor detects, by interrupt or by polling, that data has arrived, it copies the data into an sk_buff buffer, from which it is read through the socket interface. The flow is shown in Figure 6.

Figure 6. Data reception flowchart

    1. After the network device receives data, the DMA module searches the receive BD table, takes an idle BD, automatically stores the data in that BD's buffer, sets the BD to the ready state, and triggers an interrupt (the interrupt is optional).
    2. The processor checks the state of the receive BD table either in an interrupt handler or by polling. In both cases it performs the following steps.
      1. Take the BD at the current pointer from the receive BD table.
      2. If the current BD is ready, check its data status and update the receive statistics.
      3. Copy the data from the BD into an sk_buff buffer.
      4. Set the status of the BD back to idle.
      5. Move the receive BD table pointer to the next BD.
    3. The user calls a read function such as read, reads the data from the sk_buff buffer, and releases the buffer.
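The receive steps above can be sketched in user space as follows. dma_receive imitates the hardware in step 1 and xx_rx imitates the driver in step 2. fake_skb is a hypothetical stand-in for sk_buff, and the sizes, names, and status values are assumptions for the example only.

```c
#include <assert.h>
#include <string.h>

#define RX_BD_COUNT 4    /* hypothetical table size */
#define RX_BUF_LEN  64   /* hypothetical buffer size per BD */
enum { BD_IDLE = 0, BD_READY = 1 };

typedef struct {
    unsigned char buf[RX_BUF_LEN];
    int length;
    int sc;
} rx_bd;

typedef struct {
    rx_bd bd[RX_BD_COUNT];
    int  dma_idx;      /* next BD the simulated DMA module fills */
    int  cpu_idx;      /* next BD the driver examines */
    long rx_packets;   /* receive statistics */
    long rx_bytes;
} rx_ring;

/* Hypothetical stand-in for the kernel's sk_buff. */
typedef struct {
    unsigned char data[RX_BUF_LEN];
    int len;
} fake_skb;

void rx_ring_init(rx_ring *r)
{
    assert(r != NULL);
    memset(r, 0, sizeof(*r));   /* all BDs start idle */
}

/* Step 1 (hardware): store a frame in the next idle BD and mark it ready.
 * Returns -1 (frame dropped) if it does not fit or no BD is idle. */
int dma_receive(rx_ring *r, const void *frame, int len)
{
    rx_bd *bd = &r->bd[r->dma_idx];
    if (len < 0 || len > RX_BUF_LEN || bd->sc != BD_IDLE)
        return -1;
    memcpy(bd->buf, frame, (size_t)len);
    bd->length = len;
    bd->sc = BD_READY;
    r->dma_idx = (r->dma_idx + 1) % RX_BD_COUNT;
    return 0;
}

/* Step 2 (driver): copy every ready BD into a fake_skb, update statistics,
 * return the BD to idle, and advance. Returns the number of packets taken. */
int xx_rx(rx_ring *r, fake_skb *out, int max)
{
    int n = 0;
    while (n < max && r->bd[r->cpu_idx].sc == BD_READY) {
        rx_bd *bd = &r->bd[r->cpu_idx];
        memcpy(out[n].data, bd->buf, (size_t)bd->length);
        out[n].len = bd->length;
        r->rx_packets++;
        r->rx_bytes += bd->length;
        bd->sc = BD_IDLE;
        r->cpu_idx = (r->cpu_idx + 1) % RX_BD_COUNT;
        n++;
    }
    return n;
}
```

The same xx_rx routine could be called from an interrupt handler or from a polling task; only the trigger differs.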

Interrupts and polling

The Linux kernel offers two ways to receive data: interrupt mode and polling mode.

Interrupt mode

If you choose interrupt mode, you must register each interrupt and its handler before the driver is used. At initialization the driver hooks its xx_open function to the device's open interface, and xx_open registers the interrupts as follows.

request_irq(RX_IRQ, xx_isr_rx, ...);
request_irq(TX_IRQ, xx_isr_tx, ...);

Network device interrupts generally fall into two kinds: transmit interrupts and receive interrupts. The driver needs to register each of them separately.

    1. The transmit interrupt handler (xx_isr_tx) monitors the transmit status, updates the transmit statistics, and so on.
    2. The receive interrupt handler (xx_isr_rx) mainly receives data and passes it to the protocol layer, monitors the receive status, updates the receive statistics, and so on.

In interrupt mode every received packet raises an interrupt, and the processor jumps to the interrupt service routine to process the packet right away, so the response is fast. When packet traffic is heavy, however, the large number of interrupts increases the load on the system.

Polling method

If you use polling, you do not need to enable the interrupts of the network device or register an interrupt handler. Instead, the operating system runs a dedicated task that checks the BD table periodically: if the BD at the current pointer is not idle, the task takes out the data in that BD and returns the BD to the idle state.

Because the task only checks at fixed intervals, polling responds less promptly than interrupts, but it avoids the context-switch overhead that interrupts cause, so polling is more efficient when handling heavy packet traffic.
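The trade-off can be made concrete with a small model. The sketch below is an idealized assumption, not a measurement: in interrupt mode every packet enters the receive code once and is handled on arrival, while in polling mode the receive code runs once per period and a packet waits for the next tick. The type and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    long wakeups;          /* how many times the receive code is entered */
    long max_latency_us;   /* longest wait between arrival and handling */
} rx_cost;

/* Interrupt mode (idealized): one handler entry per packet, no waiting. */
rx_cost cost_interrupt(const long *arrival_us, int n)
{
    rx_cost c = { n, 0 };
    (void)arrival_us;
    assert(n >= 0);
    return c;
}

/* Polling mode: the task runs at period_us, 2 * period_us, ... up to
 * duration_us. A packet waits for the first tick at or after its arrival. */
rx_cost cost_polling(const long *arrival_us, int n,
                     long period_us, long duration_us)
{
    rx_cost c = { 0, 0 };
    assert(arrival_us != NULL && n >= 0 && period_us > 0 && duration_us >= 0);
    c.wakeups = duration_us / period_us;
    for (int i = 0; i < n; i++) {
        long tick = (arrival_us[i] + period_us - 1) / period_us * period_us;
        long wait = tick - arrival_us[i];
        if (wait > c.max_latency_us)
            c.max_latency_us = wait;
    }
    return c;
}
```

For four packets arriving at 10, 20, 30, and 990 microseconds and a 1000 microsecond polling period over 2000 microseconds, the model gives four handler entries with no waiting in interrupt mode, against two entries and a worst-case wait of 990 microseconds when polling. As the packet count grows, the interrupt count grows with it while the polling count stays fixed.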

Linux network device driver optimization

As technology advances, the rates that network devices can carry keep rising; popular devices today support 10 Mbps, 100 Mbps, and 1 Gbps. But although the hardware keeps improving, can the actual throughput of a Linux system really reach 1 Gbps? That depends on the performance of the processor. In general, a running system achieves less than 1 Gbps, because not all of the processor's resources can be devoted to sending and receiving packets. Within those limits, however, some optimizations can improve the performance of the network device as far as possible.
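To get a feeling for how much work a full-rate link demands from the processor, the following sketch computes the maximum frame rate and the time available per frame. It assumes Ethernet framing, where each frame occupies an extra 8 bytes of preamble and start delimiter plus 12 bytes of inter-frame gap on the wire; the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define ETH_PREAMBLE_SFD_BYTES 8u    /* preamble plus start-of-frame delimiter */
#define ETH_IFG_BYTES          12u   /* minimum inter-frame gap */

/* Bits one frame occupies on the wire, including per-frame overhead. */
static uint64_t bits_on_wire(uint32_t frame_bytes)
{
    return 8ULL * (frame_bytes + ETH_PREAMBLE_SFD_BYTES + ETH_IFG_BYTES);
}

/* Maximum number of whole frames per second at the given link rate. */
uint64_t max_frames_per_second(uint64_t link_bps, uint32_t frame_bytes)
{
    assert(link_bps > 0 && frame_bytes > 0);
    return link_bps / bits_on_wire(frame_bytes);
}

/* Time budget per frame in nanoseconds at the given link rate. */
uint64_t ns_per_frame(uint64_t link_bps, uint32_t frame_bytes)
{
    assert(link_bps > 0 && frame_bytes > 0);
    return bits_on_wire(frame_bytes) * 1000000000ULL / link_bps;
}
```

At 1 Gbps with minimum-size 64-byte frames this gives 1,488,095 frames per second, or 672 ns per frame; with 1518-byte frames it gives 81,274 frames per second. With small frames the driver therefore has well under a microsecond per packet, which is why the optimizations below matter.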

Using the cache

The cache sits at the top of the storage-system pyramid (the next level down is main memory). It is small (a first-level cache is typically tens of KB, and a second-level cache is typically a few MB), but it can be accessed tens of times faster than main memory. If the processor accesses memory through the cache, the access rate therefore increases greatly. When a network device sends and receives data, proper use of the cache can improve driver performance. The following are some cache optimization measures.

Set memory attributes appropriately

Memory page table entries carry several attributes, one of which controls whether the memory is accessed through the cache. The memory allocated for the BD tables should be configured to support cached access.

The cache can be written in two ways. With write-back, when the processor updates memory data, the data is first stored in the cache, and the cache does not write it to memory immediately; it waits until the cache line has to be replaced and writes the data back then. With write-through, when the processor updates memory data, the data is stored in the cache and the cache immediately writes it through to memory. Write-back clearly performs better than write-through, so the memory page table attribute is usually set to write-back mode.
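The difference between the two policies can be illustrated with a toy model of a cache that holds a single word. This is only a sketch under simplifying assumptions (one cache line, a 16-word memory, hypothetical names); it counts how many writes actually reach memory.

```c
#include <assert.h>
#include <string.h>

#define MEM_WORDS 16   /* hypothetical size of the modelled memory */

typedef enum { WRITE_THROUGH, WRITE_BACK } cache_policy;

/* Toy cache with a single line that holds one word of memory. */
typedef struct {
    cache_policy policy;
    int      memory[MEM_WORDS];
    int      valid;        /* line holds data */
    int      dirty;        /* line is newer than memory */
    unsigned tag;          /* address the line belongs to */
    int      value;
    long     mem_writes;   /* writes that reached memory */
} tiny_cache;

void cache_init(tiny_cache *c, cache_policy policy)
{
    assert(c != NULL);
    memset(c, 0, sizeof(*c));
    c->policy = policy;
}

/* Write a dirty line back to memory. */
void cache_flush(tiny_cache *c)
{
    if (c->valid && c->dirty) {
        c->memory[c->tag] = c->value;
        c->mem_writes++;
        c->dirty = 0;
    }
}

/* Processor store: always lands in the cache; whether it also reaches
 * memory right away depends on the policy. */
void cache_store(tiny_cache *c, unsigned addr, int value)
{
    assert(addr < MEM_WORDS);
    if (c->valid && c->tag != addr)
        cache_flush(c);              /* the line is being replaced */
    c->valid = 1;
    c->tag   = addr;
    c->value = value;
    if (c->policy == WRITE_THROUGH) {
        c->memory[addr] = value;     /* update memory immediately */
        c->mem_writes++;
    } else {
        c->dirty = 1;                /* defer the memory update */
    }
}
```

Storing to the same address 100 times costs 100 memory writes with write-through, but only one with write-back, performed when the line is flushed or replaced. Until then memory still holds the old value, which is the source of the consistency issue for DMA.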

Cache operations when sending and receiving data

When memory is cached in write-back mode, the processor first writes outgoing data into the cache. If the DMA module then sent directly from memory, it would send stale data, because memory and cache are not yet consistent. The driver therefore needs to write the cached data back to memory before notifying the DMA module to send.

On reception, the DMA module writes incoming data into memory. When the processor then reads that memory, it may be served from the cache, which does not know that memory has been updated, so the data it reads differs from what was actually received. The driver therefore needs to invalidate the cache for the buffer before reading the received data, so that the cache is consistent with memory.

Not every processor requires these operations. On some processors the DMA controller is cache-aware (I/O cache coherence) and the hardware performs these cache operations automatically, so on such processors the driver does not need to manage the cache.
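The two hazards described above can be reproduced with a minimal model in which the processor sees a word through a cache line and the DMA module sees memory directly. The model assumes a non-coherent system; the type and function names are hypothetical and do not correspond to any kernel API.

```c
#include <assert.h>
#include <stddef.h>

/* One word of memory together with the cache line that may shadow it. */
typedef struct {
    int mem;     /* what the DMA module sees */
    int line;    /* what the processor sees while the line is valid */
    int valid;
    int dirty;
} cached_word;

/* Processor write in write-back mode: only the cache line is updated. */
void cpu_write(cached_word *w, int v)
{
    assert(w != NULL);
    w->line = v;
    w->valid = 1;
    w->dirty = 1;
}

/* Processor read: served from the cache when the line is valid. */
int cpu_read(cached_word *w)
{
    assert(w != NULL);
    if (!w->valid) {
        w->line = w->mem;
        w->valid = 1;
        w->dirty = 0;
    }
    return w->line;
}

/* The DMA module bypasses the cache in both directions. */
int  dma_read(const cached_word *w) { return w->mem; }
void dma_write(cached_word *w, int v) { w->mem = v; }

/* Before transmit: push dirty data out to memory. */
void cache_flush(cached_word *w)
{
    if (w->valid && w->dirty) {
        w->mem = w->line;
        w->dirty = 0;
    }
}

/* Before reading received data: drop the stale line. */
void cache_invalidate(cached_word *w)
{
    w->valid = 0;
    w->dirty = 0;
}
```

In this model, a value written by the processor is invisible to dma_read until cache_flush runs, and a value written by dma_write is invisible to cpu_read until cache_invalidate runs, which mirrors the transmit and receive cases respectively.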

Interrupt or poll?

As mentioned earlier, a network device driver can receive data in two ways, by interrupt or by polling. When data traffic is heavy, polling is worth considering for its higher efficiency.

With polling there is one more question to consider: the priority of the polling task. A high-priority task is not preempted by lower-priority tasks, so the processor can concentrate on receiving data. If the task priority is low, however, the processor suspends data reception whenever a higher-priority task becomes runnable, which reduces the efficiency of the network device driver. Driver designers therefore need to choose the task priority to suit the actual situation.

Device interface mode

Sometimes a network device that claims a 100 Mbps rate turns out to send and receive data very slowly. In that case, the first thing to check is whether the interface mode of the network device is set correctly.

PHY module interface mode

The PHY module has two interface modes: forced mode (fixed at 10M, 100M, 1G, and so on) and auto-negotiation mode. Which one to choose depends on the state of the peer PHY that the local PHY is connected to. If the peer is set to auto-negotiation, the local PHY must also be set to auto-negotiation, so that the negotiated result is the maximum rate the link can support. Conversely, if the peer is set to forced mode, the local PHY must also be forced, at the same rate as the peer.
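The outcome of the two modes can be sketched as follows. The capability bits and function names are hypothetical, and duplex is left out to keep the example short; real PHYs exchange their abilities through standard registers rather than through a function call.

```c
#include <assert.h>

/* Hypothetical capability bits, one per supported speed. */
#define PHY_ADV_10    (1u << 0)
#define PHY_ADV_100   (1u << 1)
#define PHY_ADV_1000  (1u << 2)

/* Auto-negotiation: both sides advertise what they support and the link
 * runs at the highest common speed. Returns Mbps, or 0 if none is shared. */
int autoneg_speed(unsigned local_adv, unsigned peer_adv)
{
    unsigned common = local_adv & peer_adv;
    if (common & PHY_ADV_1000)
        return 1000;
    if (common & PHY_ADV_100)
        return 100;
    if (common & PHY_ADV_10)
        return 10;
    return 0;
}

/* Forced mode: the link is only usable when both sides force the same rate. */
int forced_link_matches(int local_mbps, int peer_mbps)
{
    return local_mbps > 0 && local_mbps == peer_mbps;
}
```

For example, a local PHY that advertises all three speeds and a peer that advertises only 10M and 100M would settle on 100M.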

MAC module interface mode

The MAC module also has different interface modes for different rates (10M, 100M, 1G, and so on). If the configured mode does not match the rate at which the PHY module is running, the send and receive speed of the network device suffers badly. When initializing the MAC module, the driver therefore needs to check the operating rate of the PHY module and select the matching interface mode.

The available interface modes differ from one PHY/MAC device to another, so when developing a network device driver you need to identify the device you are using and configure its interface mode correctly at initialization.
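The check described above amounts to deriving the MAC mode from the speed the PHY reports. A minimal sketch, with hypothetical mode names (real controllers name and encode their modes differently):

```c
#include <assert.h>

/* Hypothetical MAC interface modes, one per supported rate. */
typedef enum {
    MAC_MODE_INVALID = -1,
    MAC_MODE_10M,
    MAC_MODE_100M,
    MAC_MODE_1G
} mac_mode;

/* Choose the MAC mode that matches the rate the PHY is running at. */
mac_mode mac_mode_for_speed(int phy_mbps)
{
    switch (phy_mbps) {
    case 10:   return MAC_MODE_10M;
    case 100:  return MAC_MODE_100M;
    case 1000: return MAC_MODE_1G;
    default:   return MAC_MODE_INVALID;
    }
}

/* Returns 1 when the configured MAC mode matches the PHY rate. */
int mac_matches_phy(mac_mode mode, int phy_mbps)
{
    mac_mode want = mac_mode_for_speed(phy_mbps);
    return want != MAC_MODE_INVALID && want == mode;
}
```

A driver could run such a check whenever the PHY reports a link change, not only at initialization, since the negotiated rate may differ after a cable is replugged.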

Conclusion

A Linux network device driver is closely tied to the specific device it serves, so the driver code has to be written with that device in mind. Driver optimization deserves particular attention during development, because the network device driver directly affects the performance of the entire system.
