Network Device Driver for Linux kernel development

Source: Internet
Author: User

From: http://www.ibm.com/developerworks/cn/linux/l-cn-networkdriver/

Network Device Introduction

A network device is an essential part of the computer architecture. If a processor wants to communicate with the outside world, it usually chooses a network device as the communication interface. As is well known, in the OSI (Open Systems Interconnection) model the network is divided into seven layers; from bottom to top these are the physical layer, data link layer, network layer, transport layer, session layer, presentation layer, and application layer. The network device discussed here also comprises two layers: the MAC (Media Access Control) layer, which corresponds to the OSI data link layer, and the PHY (Physical Layer), which corresponds to the OSI physical layer.

There are many common network devices, such as the ppc85xx tsec, AMCC javasgx EMAC, and Intel 82559. They all work in essentially the same way.

DMA Introduction

The core processing module of a network device is a controller called DMA (Direct Memory Access). The DMA module assists the processor in sending and receiving data. For sending, it can transmit data that has already been organized, without processor intervention. For receiving, it can organize the received data in a certain format, notify the processor, and wait for the processor to fetch it.

The unit the DMA module uses to send and receive data is called a BD (Buffer Descriptor). Each packet is divided into several frames, and each frame is stored in one BD. The BD structure usually contains the following fields:

typedef struct {
    void *bufptr;   /* start address of the buffer that belongs to this BD */
    int length;     /* length of the packet data stored in the buffer */
    int sc;         /* status information of this BD */
} bd_struct;

 

All BDs together form a BD table, as shown in Figure 1. Generally, the BD tables for sending and for receiving are independent of each other.


Figure 1. BD table structure
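A BD table can be sketched in plain C as an array of descriptors, each pointing at its own buffer. This is only a user-space illustration: the table length, the buffer size, the status values BD_IDLE and BD_READY, the bd_table type, and the wrap-around behaviour of bd_next are assumptions made for the example, not details given by any particular controller.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BD_RING_SIZE 4      /* hypothetical number of BDs in the table */
#define BD_BUF_SIZE  1536   /* hypothetical size of each BD buffer */

/* Hypothetical status values for the sc field. */
enum { BD_IDLE = 0, BD_READY = 1 };

typedef struct {
    void *bufptr;   /* start address of the buffer owned by this BD */
    int   length;   /* number of valid bytes in the buffer */
    int   sc;       /* status: BD_IDLE or BD_READY */
} bd_struct;

typedef struct {
    bd_struct bd[BD_RING_SIZE];
    uint8_t   buf[BD_RING_SIZE][BD_BUF_SIZE];
    size_t    cur;  /* index of the BD the software uses next */
} bd_table;

/* Point every BD at its buffer and mark it idle. */
static void bd_table_init(bd_table *t)
{
    assert(t != NULL);
    for (size_t i = 0; i < BD_RING_SIZE; i++) {
        t->bd[i].bufptr = t->buf[i];
        t->bd[i].length = 0;
        t->bd[i].sc     = BD_IDLE;
    }
    t->cur = 0;
}

/* Advance an index by one BD, wrapping to the start of the table. */
static size_t bd_next(size_t idx)
{
    return (idx + 1) % BD_RING_SIZE;
}
```

A driver following this layout would create two such tables, one for sending and one for receiving, and initialize each separately.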


Data sending Process

Figure 2 shows the process for network devices to send data through DMA.


Figure 2. Data transmission process


The meanings of each step are described as follows:

(1) The protocol layer notifies the processor to start sending data;

(2) The processor takes a BD from the BD table, copies the data to be sent into the buffer of that BD, and sets the BD status;

(3) The processor notifies the network device to start sending data;

(4) The MAC module notifies the DMA module to start sending data;

(5) The DMA module reads the BD table and fetches the currently valid BD;

(6) The DMA module passes the data in the buffer of the current BD to the MAC module;

(7) The MAC module sends the data onto the network;

(8) The network device notifies the processor that the data has been sent;

(9) The processor notifies the protocol layer to send the next frame.

Steps (4) through (8) are completed automatically by the hardware without software intervention, which reduces the processor's workload.
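The division of labour between processor and hardware in the steps above can be modelled in user space. In this sketch the hardware side is an ordinary function, and the names tx_ring, cpu_queue_frame, and dma_transmit_ready, as well as the ring size and status values, are hypothetical. It illustrates the hand-over through the BD status field and is not code for any real controller.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TX_RING_SIZE 4    /* hypothetical number of send BDs */
#define TX_BUF_SIZE  64   /* hypothetical buffer size per BD */

enum { BD_IDLE = 0, BD_READY = 1 };

typedef struct {
    unsigned char buf[TX_BUF_SIZE];
    int length;
    int sc;
} tx_bd;

typedef struct {
    tx_bd  bd[TX_RING_SIZE];
    size_t cpu_idx;   /* next BD the processor fills, step (2) */
    size_t dma_idx;   /* next BD the DMA module examines, step (5) */
    size_t sent;      /* frames handed to the MAC, steps (6) and (7) */
} tx_ring;

/* Step (2): copy a frame into the current BD and mark it ready.
 * Returns 0 on success, -1 if the frame does not fit or no BD is idle. */
static int cpu_queue_frame(tx_ring *r, const void *data, int len)
{
    assert(r != NULL && data != NULL);
    tx_bd *bd = &r->bd[r->cpu_idx];

    if (len < 0 || len > TX_BUF_SIZE || bd->sc != BD_IDLE)
        return -1;
    memcpy(bd->buf, data, (size_t)len);
    bd->length = len;
    bd->sc = BD_READY;
    r->cpu_idx = (r->cpu_idx + 1) % TX_RING_SIZE;
    return 0;
}

/* Steps (5) to (8), modelled in software: drain every ready BD in order
 * and return it to the idle state. Returns the number of frames sent. */
static size_t dma_transmit_ready(tx_ring *r)
{
    assert(r != NULL);
    size_t n = 0;

    while (r->bd[r->dma_idx].sc == BD_READY) {
        r->bd[r->dma_idx].sc = BD_IDLE;   /* frame has left the device */
        r->dma_idx = (r->dma_idx + 1) % TX_RING_SIZE;
        r->sent++;
        n++;
    }
    return n;
}
```

The processor only touches BDs that are idle and the simulated DMA module only touches BDs that are ready, so the status field alone decides who owns a buffer at any moment.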

Data receiving process

Figure 3 shows the process for network devices to receive data through DMA.


Figure 3. Data receiving process


The meanings of each step are described as follows:

(1) The processor initializes the BD table;

(2) The processor initializes the network device;

(3) The MAC module receives data from the network;

(4) The MAC module notifies the DMA module to fetch the data;

(5) The DMA module takes a suitable BD from the BD table;

(6) The MAC module passes the data to the buffer of the current BD;

(7) The network device notifies the processor to start receiving data (by interrupt or by polling);

(8) The protocol layer takes the data out of the buffer of the current BD.

Steps (3) through (6) are completed automatically by the hardware without software intervention, which reduces the processor's workload.

Linux Network Device Driver Model

Data Structure

The core structure that describes a network device in the Linux kernel is net_device, defined in include/linux/netdevice.h. Its fields can be divided into the following categories.

Global Information

This class contains the device name (name field), device status (state field), and device initialization function (init field).

Hardware information

This class contains the memory range used by the device (mem_start and mem_end fields), the interrupt number (irq field), and the I/O base address (base_addr field).

Interface Information

This class contains the MAC address (dev_addr field), device attributes (flags field), and maximum transmission unit (mtu field).

Device Interface Function

This class contains all the interface functions provided by the device. For example, the device open function (open field) opens the device interface; it is called by default when the ifconfig command is used to configure the network. The device stop function (stop field) disables the device interface. The data sending function (hard_start_xmit field) is called when a user writes data through a socket and is responsible for sending the data to the network device.

Function Interface

Device initialization Function

Network device drivers exist in the Linux kernel as kernel modules. Corresponding to module initialization, an initialization function is required to initialize the hardware registers of the network device, configure DMA, and initialize the related kernel variables. This device initialization function is called when the kernel module is loaded. Its form is as follows:

static int __init xx_init(void)
{
    ......
}
module_init(xx_init); /* xx_init is called automatically when the module is loaded */

 

The device initialization function provides the following functions:

1. Hardware initialization

Because a network device consists mainly of three hardware modules (PHY, MAC, and DMA), developers need to initialize each of the three.

  1. Initialize the PHY module, including setting the full-duplex or half-duplex mode, the link speed, and the auto-negotiation mode.
  2. Initialize the MAC module, including setting the device interface mode.
  3. Initialize the DMA module, including creating the BD tables, setting the BD attributes, and allocating buffers for the BDs.

2. Kernel variable initialization

Initialize and register the kernel device. The kernel device is a variable of type net_device. The developer needs to allocate the space for this variable (with the alloc_netdev function), set its parameters, hook up the interface functions, and register the device (with the register_netdev function).

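Common interface function hooks are as follows (kernels from 2.6.29 onward collect these pointers in a separate net_device_ops structure, with members such as ndo_open, ndo_stop, and ndo_start_xmit, referenced from the netdev_ops field):

struct net_device *dev_p;
dev_p->open = xx_open;             /* device open function */
dev_p->stop = xx_stop;             /* device stop function */
dev_p->hard_start_xmit = xx_tx;    /* data sending function */
dev_p->do_ioctl = xx_ioctl;        /* other control functions */
......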
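The hooking mechanism itself is ordinary C function-pointer dispatch. The sketch below mimics it in user space with a hypothetical mock_dev structure in place of net_device; the simplified function signatures, the device name "xx0", and the helper stack_send are assumptions made for illustration and do not match the kernel's real prototypes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the interface-function fields of net_device. */
struct mock_dev {
    const char *name;
    int up;         /* 1 after open, 0 after stop */
    int tx_count;   /* number of frames accepted for sending */
    int (*open)(struct mock_dev *dev);
    int (*stop)(struct mock_dev *dev);
    int (*hard_start_xmit)(const void *data, int len, struct mock_dev *dev);
};

/* Device-specific functions, named after the xx_ examples in the text. */
static int xx_open(struct mock_dev *dev) { dev->up = 1; return 0; }
static int xx_stop(struct mock_dev *dev) { dev->up = 0; return 0; }

static int xx_tx(const void *data, int len, struct mock_dev *dev)
{
    (void)data;                 /* a real driver would place this in a BD */
    if (!dev->up || len <= 0)
        return -1;
    dev->tx_count++;
    return 0;
}

/* Hook the device-specific functions, as the initialization function does. */
static void xx_setup(struct mock_dev *dev)
{
    assert(dev != NULL);
    dev->name = "xx0";
    dev->up = 0;
    dev->tx_count = 0;
    dev->open = xx_open;
    dev->stop = xx_stop;
    dev->hard_start_xmit = xx_tx;
}

/* Generic layer: knows only the function pointers, not the driver. */
static int stack_send(struct mock_dev *dev, const void *data, int len)
{
    return dev->hard_start_xmit(data, len, dev);
}
```

Because the generic layer calls only through the pointers, a different driver can be substituted by hooking different functions without changing stack_send.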

 

Data Sending and Receiving Functions

Sending and receiving data is the most important part of a network device driver. Users do not need to know which network device the current system uses or how that device sends and receives data; all of these details are hidden from them. Linux uses the socket as the bridge between users and network devices: users operate on a socket with functions such as read and write, and the socket in turn interacts with the specific network device to send and receive data.

Linux provides a data structure called sk_buff. Data passed to a socket is first stored in the buffer of an sk_buff. The sk_buff structure is defined in include/linux/skbuff.h. It stores packets in the following structure.


Figure 4. sk_buff Data Structure


1. Data transmission process

When a user sends data through a socket, the data is stored in a buffer of type sk_buff, and the sending function of the network device (the hard_start_xmit hooked up in the device initialization function) is called. The flowchart is as follows.


Figure 5. Data Transmission Flowchart


  1. The user first creates a socket and then calls a write function such as write to access the network device through the socket; the data is saved in a buffer of type sk_buff.
  2. The socket interface calls the sending function of the network device (hard_start_xmit). During initialization, hard_start_xmit was hooked to a device-specific sending function such as xx_tx, which mainly performs the following steps.
    1. Take an idle BD from the send BD table.
    2. Set the attributes of the BD from the data stored in the sk_buff: the data length and the packet buffer pointer. Note that the buffer pointer must be a physical address, because the DMA module can only use physical addresses when it fetches the data belonging to a BD. (Current kernels obtain this address through the DMA mapping API, for example dma_map_single, rather than virt_to_phys.)

       bd_p->length = skb_p->len;
       bd_p->bufptr = virt_to_phys(skb_p->data);
    3. Set the status of the BD to ready. The DMA module automatically sends the data of BDs in the ready state.
    4. Move the pointer of the send BD table to the next BD.
  3. The DMA module sends the data in the buffers of ready BDs onto the network. After sending, the hardware automatically restores each BD to the idle state.

2. Data receiving process

When the network device receives data, the DMA module automatically stores it and notifies the processor to fetch it. After the processor learns through an interrupt or by polling that data has arrived, it saves the data in an sk_buff buffer, from which it is read through the socket interface. The flowchart is as follows.


Figure 6. Data receiving Flowchart


  1. After the network device receives data, the DMA module searches the receive BD table, takes an idle BD, automatically stores the data in the buffer of that BD, changes the BD to the ready state, and raises an interrupt (raising the interrupt is optional).
  2. The processor can check the status of the receive BD table either by interrupt or by polling. Either way, it performs the following steps.
    1. Take the BD at the current position of the receive BD table.
    2. If the status of that BD is ready, check the data status of the BD and update the receive statistics.
    3. Take the data out of the BD and save it in an sk_buff buffer.
    4. Set the status of the BD back to idle.
    5. Move the pointer of the BD table to the next BD.
  3. The user calls a read function such as read to read the data from the sk_buff buffer, after which the buffer is released.
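The receive steps above can be sketched as a user-space loop. Here mock_skb is a hypothetical, heavily simplified stand-in for sk_buff, dma_receive plays the role of the hardware, and xx_rx_poll is an assumed name for the driver routine; ring size, buffer size, and status values are likewise illustrative.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define RX_RING_SIZE 4    /* hypothetical number of receive BDs */
#define RX_BUF_SIZE  64   /* hypothetical buffer size per BD */

enum { BD_IDLE = 0, BD_READY = 1 };

typedef struct { unsigned char buf[RX_BUF_SIZE]; int length; int sc; } rx_bd;

/* Hypothetical, heavily simplified stand-in for sk_buff. */
typedef struct { unsigned char data[RX_BUF_SIZE]; int len; } mock_skb;

typedef struct {
    rx_bd  bd[RX_RING_SIZE];
    size_t dma_idx;                      /* next BD the hardware fills */
    size_t cpu_idx;                      /* next BD the driver examines */
    unsigned long rx_packets, rx_bytes;  /* receive statistics */
} rx_ring;

/* Hardware side: store a frame in the next idle BD and mark it ready.
 * Returns -1 (frame dropped) if the frame does not fit or no BD is idle. */
static int dma_receive(rx_ring *r, const void *frame, int len)
{
    rx_bd *bd = &r->bd[r->dma_idx];

    if (len < 0 || len > RX_BUF_SIZE || bd->sc != BD_IDLE)
        return -1;
    memcpy(bd->buf, frame, (size_t)len);
    bd->length = len;
    bd->sc = BD_READY;
    r->dma_idx = (r->dma_idx + 1) % RX_RING_SIZE;
    return 0;
}

/* Driver side: copy up to max ready frames into out[], update the
 * statistics, return each BD to idle, and advance the table pointer. */
static int xx_rx_poll(rx_ring *r, mock_skb *out, int max)
{
    assert(r != NULL && out != NULL);
    int n = 0;

    while (n < max && r->bd[r->cpu_idx].sc == BD_READY) {
        rx_bd *bd = &r->bd[r->cpu_idx];

        memcpy(out[n].data, bd->buf, (size_t)bd->length);
        out[n].len = bd->length;
        r->rx_packets++;
        r->rx_bytes += (unsigned long)bd->length;
        bd->sc = BD_IDLE;
        r->cpu_idx = (r->cpu_idx + 1) % RX_RING_SIZE;
        n++;
    }
    return n;
}
```

The same routine could be called from an interrupt handler or from a periodic task, which is why the text says both methods share these steps.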

Interrupts and polling

The Linux kernel offers two ways to receive data: interrupt mode and polling mode.

Interrupt mode

If you choose interrupt mode, you must register the interrupt number and the corresponding interrupt handler before using the driver. When the network device driver is initialized, the device-specific xx_open function is hooked to the open interface of the driver. xx_open requests the interrupts as follows.

 request_irq(rx_irq, xx_isr_rx, …… ); 
request_irq(tx_irq, xx_isr_tx, …… );

 

A network device has two kinds of interrupts: the send interrupt and the receive interrupt. The kernel needs to register a handler for each of them.

  1. The send interrupt handler (xx_isr_tx) monitors the sending status and updates the send statistics.
  2. The receive interrupt handler (xx_isr_rx) receives data and passes it to the protocol layer, monitors the receiving status, and updates the receive statistics.

In interrupt mode every received packet raises an interrupt, so the processor jumps to the interrupt service routine and handles the packet quickly. Interrupt-driven receiving therefore has good real-time behavior, but when packet traffic is heavy the large number of interrupts increases the system load.

Polling mode

If polling is used, there is no need to enable the interrupts of the network device or to register interrupt handlers. Instead, the operating system starts a task that checks the BD table periodically. If the BD at the current pointer is not idle, the task takes the data out of that BD and restores the BD to the idle state.

Because it relies on periodic checks, polling has poorer real-time behavior than interrupts, but it avoids the context-switch overhead of interrupt handling, so polling is more efficient when handling a large volume of packets.
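The trade-off can be made concrete with a little arithmetic. The helper functions below are hypothetical and deliberately crude: they count how often the handler or polling task is entered and ignore what each entry costs, so they only illustrate the shape of the trade-off under an assumed polling period.

```c
#include <assert.h>

/* Interrupt mode: the handler is entered once per received packet. */
static unsigned long interrupt_entries(unsigned long packets)
{
    return packets;
}

/* Polling mode: the task runs once per period, whether or not packets arrived. */
static unsigned long poll_entries(unsigned long duration_us, unsigned long period_us)
{
    assert(period_us > 0);
    return duration_us / period_us;
}

/* Polling mode: a packet arriving just after a check waits one full period. */
static unsigned long poll_worst_latency_us(unsigned long period_us)
{
    return period_us;
}
```

Over one second with an assumed 1 ms polling period, 100,000 packets cause 100,000 handler entries in interrupt mode but only 1,000 runs of the polling task; with 10 packets the comparison reverses (10 entries against 1,000), and a polled packet may wait up to 1 ms before it is noticed.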

Linux network device driver optimization

As technology advances, network devices support ever higher rates. Popular network devices today generally support three rates: 10 Mbps, 100 Mbps, and 1 Gbps. Although the hardware performance of network devices keeps improving, can the actual running performance (packet sending/receiving rate) under Linux reach 1 Gbps? That depends on the performance of the processor. In the systems we run, the packet rate is generally below 1 Gbps (because we cannot devote all processor resources to sending and receiving packets), but we can take some optimization measures to improve the running performance of network devices under these constraints.
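To see why the processor becomes the limit, it helps to work out how many frames per second a saturated link delivers. The sketch below assumes standard Ethernet framing, where each frame occupies an extra 20 bytes on the wire (7-byte preamble, 1-byte start-of-frame delimiter, and 12-byte inter-frame gap); the function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes each Ethernet frame occupies on the wire beyond the frame itself:
 * 7-byte preamble + 1-byte start-of-frame delimiter + 12-byte inter-frame gap. */
#define ETH_WIRE_OVERHEAD_BYTES 20u

/* Upper bound on frames per second for a link rate and frame size
 * (frame_bytes includes the Ethernet header and the FCS). */
static uint64_t max_frames_per_second(uint64_t link_bps, uint32_t frame_bytes)
{
    assert(frame_bytes > 0);
    return link_bps / ((uint64_t)(frame_bytes + ETH_WIRE_OVERHEAD_BYTES) * 8u);
}
```

At 1 Gbps with minimum-size 64-byte frames this gives 1,488,095 frames per second, which leaves the processor 672 ns per frame; with 1518-byte frames the rate drops to 81,274 frames per second.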

Cache application

The cache sits near the top of the memory hierarchy pyramid (main memory is the layer below it). Its capacity is small (a level-1 cache usually holds tens of KB and a level-2 cache a few MB), but it can be accessed tens of times faster than main memory. If the processor accesses memory through the cache, the access rate therefore increases greatly. In the sending and receiving paths of a network device, appropriate use of the cache can improve driver performance. The following are some cache optimization measures.

Set memory attributes properly

Page table entries carry several attributes, one of which is whether the memory is accessed through the cache. When memory is configured for the BD table, the allocated memory must be marked as cacheable.

The cache has two write policies. With write-back, when the processor updates memory data, the data is first stored in the cache and is not written to memory immediately; it is written back to memory only when the cache line has to be replaced. With write-through, when the processor updates memory data, the data is stored in the cache and written to memory at the same time. Write-back clearly performs better than write-through, so the page table attribute is generally set to write-back.
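The difference between the two policies can be illustrated with a toy model that counts how many writes actually reach memory. The cache_model type below reduces the cache to a single word in front of a single memory word; it is an assumption made for illustration and says nothing about real cache organization.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { WRITE_THROUGH, WRITE_BACK } cache_policy;

/* Toy model: a one-word cache line in front of a one-word memory. */
typedef struct {
    cache_policy policy;
    int cached;       /* value held in the cache */
    int memory;       /* value held in memory */
    int dirty;        /* 1 if the cache holds data not yet in memory */
    int mem_writes;   /* number of writes that reached memory */
} cache_model;

static void cache_init(cache_model *c, cache_policy policy)
{
    assert(c != NULL);
    c->policy = policy;
    c->cached = 0;
    c->memory = 0;
    c->dirty = 0;
    c->mem_writes = 0;
}

/* Processor write: always updates the cache; updates memory immediately
 * only under the write-through policy. */
static void cpu_write(cache_model *c, int value)
{
    c->cached = value;
    if (c->policy == WRITE_THROUGH) {
        c->memory = value;
        c->mem_writes++;
    } else {
        c->dirty = 1;
    }
}

/* Write the line back to memory if it is dirty (line replacement or an
 * explicit flush). */
static void cache_flush(cache_model *c)
{
    if (c->dirty) {
        c->memory = c->cached;
        c->mem_writes++;
        c->dirty = 0;
    }
}
```

Three consecutive writes to the same line cost three memory writes under write-through but only one under write-back, at the price that memory is out of date until the flush.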

Cache operations during data sending and receiving

When the memory is cacheable and the write-back policy is used, the processor first writes outgoing data into the cache. If the DMA module then fetches the data directly from memory and sends it, what it sends may differ from what is in the cache. The driver therefore has to write the cached data back to memory (flush the cache) before notifying the DMA module to send.

When receiving, the DMA module stores the incoming data directly in memory. If the processor then reads the data through the cache, the cache does not know that memory has been updated, so the processor may read stale data instead of the received data. The driver therefore has to invalidate the cache for the receive buffer before reading the data, so that the cache and memory are consistent.

Note that not all processors require these operations. The DMA controllers in some processors are cache-aware (I/O cache coherence) and perform the cache operations above automatically, so on such processors the driver does not need to manage the cache.
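Both consistency problems can be reproduced with a toy model in which the processor sees a cached copy and the DMA module sees only memory. The line_model type and every function name below are hypothetical; a real driver would rely on the cache maintenance or DMA mapping facilities of its platform rather than on anything like this.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model of one cache line shared by the processor and a DMA module. */
typedef struct {
    int memory;   /* what the DMA module sees */
    int cached;   /* what the processor sees while the line is valid */
    int valid;    /* 1 if the cache holds a copy of the line */
    int dirty;    /* 1 if the cached copy is newer than memory */
} line_model;

/* Processor write under the write-back policy: only the cache changes. */
static void cpu_write(line_model *l, int value)
{
    l->cached = value;
    l->valid = 1;
    l->dirty = 1;
}

/* Processor read: served from the cache if valid, otherwise from memory. */
static int cpu_read(line_model *l)
{
    if (!l->valid) {
        l->cached = l->memory;
        l->valid = 1;
        l->dirty = 0;
    }
    return l->cached;
}

/* Flush: write a dirty line back to memory (needed before DMA sends). */
static void cache_flush(line_model *l)
{
    if (l->valid && l->dirty) {
        l->memory = l->cached;
        l->dirty = 0;
    }
}

/* Invalidate: discard the cached copy (needed before reading DMA data). */
static void cache_invalidate(line_model *l)
{
    l->valid = 0;
    l->dirty = 0;
}

/* The DMA module bypasses the cache in both directions. */
static int  dma_read(const line_model *l)        { return l->memory; }
static void dma_write(line_model *l, int value)  { l->memory = value; }
```

Without the flush the simulated DMA module sends the old memory contents, and without the invalidate the processor keeps reading its old cached copy; both are the inconsistencies described above.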

Interrupt or polling?

As mentioned earlier, network device drivers support two receiving methods: interrupt and polling. When data traffic is heavy, consider polling to achieve higher efficiency.

With polling, another issue to consider is the priority of the polling task. If the task priority is high, the task is not preempted by lower-priority tasks, so the processor can concentrate on receiving data. If the priority is low, the processor may suspend receiving to run other tasks, which reduces the efficiency of the network device driver. The driver must therefore choose the task priority according to the actual situation.

Device Interface Mode

Sometimes a network device claims a rate of 100 Mbps, yet actual sending and receiving is very slow. In that case, first check whether the interface mode of the network device is set correctly.

PHY Module Interface Mode

The PHY module has two interface modes: forced mode (10M/100M/1G) and auto-negotiation mode. Which one to select depends on the peer that the PHY module is connected to. If the peer is set to auto-negotiation, the local PHY module must also be set to auto-negotiation, so that the negotiated result is the highest rate supported by the link. If the peer is set to forced mode, the local end must also be set to forced mode, and its speed must equal the forced speed configured on the peer.
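The matching rule above can be written down as a small decision function. This is a simplified model of the rule as stated in the text, not of real auto-negotiation signalling: the phy_cfg type and the speed masks are hypothetical, and a link with one forced end and one auto-negotiating end is simply reported as misconfigured.

```c
#include <assert.h>

#define SPEED_10    (1u << 0)
#define SPEED_100   (1u << 1)
#define SPEED_1000  (1u << 2)

typedef struct {
    int      autoneg;  /* 1 = auto-negotiation, 0 = forced mode */
    unsigned speeds;   /* auto-negotiation: mask of supported speeds;
                          forced mode: exactly one speed bit */
} phy_cfg;

/* Returns the speed bit both ends settle on, or 0 if the configuration
 * violates the matching rule. */
static unsigned link_speed(phy_cfg local, phy_cfg peer)
{
    if (local.autoneg != peer.autoneg)
        return 0;   /* mixed modes: treated as a misconfiguration here */

    if (!local.autoneg)
        return local.speeds == peer.speeds ? local.speeds : 0;

    /* Both ends negotiate: pick the highest speed they have in common. */
    unsigned common = local.speeds & peer.speeds;
    if (common & SPEED_1000) return SPEED_1000;
    if (common & SPEED_100)  return SPEED_100;
    if (common & SPEED_10)   return SPEED_10;
    return 0;
}
```

For example, a gigabit-capable PHY negotiating with a 10/100 peer settles on 100 Mbps, while two forced ends set to different speeds yield no usable link.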

MAC Module Interface Mode

The MAC module also has different interface modes for the different rates (10M/100M/1G). If the mode does not match the actual speed, the sending and receiving rate of the network device suffers greatly. When initializing the MAC module, you must therefore check the link speed and select the matching interface mode.

The interface modes differ from one PHY/MAC module to another. When developing a network device driver, you must therefore identify the device in use and configure its interface mode correctly during device initialization.

Conclusion

Linux network device drivers are closely tied to specific devices, so in practice driver code must be written for the device at hand. Driver optimization deserves special attention during development, because the quality of the network device driver directly affects the performance of the entire system.

 
