Linux NIC Driver Analysis

Source: Internet
Author: User

 

Original article address: http://www.linuxforum.net/forum/showflat.php?Cat=&board=driver&number=635688&page=0&view=collapsed&S
Learning should move from the simple to the complex. Anyone who starts with the hardest problems will inevitably feel overwhelmed, and reading Linux NIC drivers is no exception: the source is long and full of unfamiliar variables and symbols, so confusion is to be expected. Don't worry, there is always a way through. First set aside whatever we do not need to understand yet, cut that code out, keep only the essential parts, and master the framework. In the author's experience this is the best approach.
A Linux NIC driver usually runs to around 3000 lines, and both the amount of code and the knowledge it embodies are considerable. Could we shrink the code to make it easier to study? After a good deal of effort, the author reduced the driver to a little over 600 lines that still work properly, by cutting out the functions not needed for now. What remains is much simpler and really is just a framework (to ask for it, contact me at xhbbs@tom.com). The rest of this article analyzes this working framework.
I will not list the code of every kernel function used in the analysis, but I will name the file each one lives in so that you can look it up.
First, device initialization. After compiling our program, we need to load the resulting object file into the kernel. We first run ifconfig eth0 down and rmmod 8139too to unload the NIC driver currently in use, then run insmod 8139too.o to load our own (8139too.o is the object file produced by our build). Just as a C program starts at main(), a module has a first function to execute, named by module_init(rtl8139_init_module). In our program, rtl8139_init_module() runs as soon as insmod loads the module:
    static int __init rtl8139_init_module(void)
    {
        return pci_module_init(&rtl8139_pci_driver);
    }
It simply calls pci_module_init(). In 2.4 kernels this helper is an inline function in include/linux/pci.h. Its argument, rtl8139_pci_driver, is defined in our driver code and is the link between the driver and the PCI device. rtl8139_pci_driver is defined as follows:
    static struct pci_driver rtl8139_pci_driver = {
        name:     MODNAME,
        id_table: rtl8139_pci_tbl,
        probe:    rtl8139_init_one,
        remove:   rtl8139_remove_one,
    };
pci_module_init() is not defined in the driver code, so you have probably guessed that it is a standard interface the Linux kernel provides to modules. What does it do? I traced it: it calls pci_register_driver(), whose code is in drivers/pci/pci.c. pci_register_driver() does three things.
① It registers the rtl8139_pci_driver argument with the kernel. The kernel keeps one large linked list of PCI drivers, and our driver is hooked onto it.
② It examines the configuration space of every PCI device on the bus (a NIC is one kind of PCI device) and compares each device's identification information with the id_table of rtl8139_pci_driver, that is, rtl8139_pci_tbl, which is defined as follows:
    static struct pci_device_id rtl8139_pci_tbl[] __devinitdata = {
        {0x10ec, 0x8129, PCI_ANY_ID, PCI_ANY_ID, 0, 0, 1},
        {PCI_ANY_ID, 0x8139, 0x10ec, 0x8139, 0},
        {0,}
    };
A driver exists to drive a device, so when an entry matches, the probe function in rtl8139_pci_driver, rtl8139_init_one, is called. It is defined in our driver and initializes the whole device and makes some preparations. Note that pci_device_id is a structure the kernel defines to identify different PCI devices. Here, for example, 0x10ec stands for Realtek: as we scan the configuration space of each PCI device, a device made by Realtek will match on this field. After the vendor ID, the device ID and the other fields are checked as well, and only if all of them match can this driver serve the device.
③ It attaches the rtl8139_pci_driver structure to the device's data structure (pci_dev), recording that the device now has a driver, and the driver has found the device it serves.
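The matching in step ② can be sketched in user space as follows. This is only an illustration of the idea, not the kernel's code: the structures, ANY_ID and match_id_table are hypothetical names for this sketch, and the class fields of the real pci_device_id are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ANY_ID 0xffffffffu  /* stand-in for the kernel's PCI_ANY_ID wildcard */

/* Simplified stand-in for struct pci_device_id (class fields omitted). */
struct id_entry {
    uint32_t vendor, device;        /* chip maker and model */
    uint32_t subvendor, subdevice;  /* board maker and model */
};

/* Identification values as read from one device's configuration space. */
struct dev_ids {
    uint32_t vendor, device, subvendor, subdevice;
};

static int field_matches(uint32_t want, uint32_t have)
{
    return want == ANY_ID || want == have;
}

static int is_terminator(const struct id_entry *e)
{
    return !e->vendor && !e->device && !e->subvendor && !e->subdevice;
}

/* Return the index of the first matching entry, or -1 if none matches.
 * The table must end with an all-zero entry, like {0,} in the driver. */
int match_id_table(const struct id_entry *tbl, const struct dev_ids *d)
{
    assert(tbl != NULL && d != NULL);
    for (int i = 0; !is_terminator(&tbl[i]); i++) {
        if (field_matches(tbl[i].vendor, d->vendor) &&
            field_matches(tbl[i].device, d->device) &&
            field_matches(tbl[i].subvendor, d->subvendor) &&
            field_matches(tbl[i].subdevice, d->subdevice))
            return i;
    }
    return -1;
}

/* The two entries from the article's table, without the trailing fields. */
const struct id_entry demo_tbl[] = {
    { 0x10ec, 0x8129, ANY_ID, ANY_ID },
    { ANY_ID, 0x8139, 0x10ec, 0x8139 },
    { 0, 0, 0, 0 },
};
```

A device whose vendor is 0x10ec and whose device ID is 0x8129 matches the first entry whatever its subsystem IDs are, because those two fields are wildcards.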
PCI is a bus standard, and the devices on a PCI bus are PCI devices. They come in many types, NICs among them. The kernel abstracts each PCI device as a pci_dev structure that describes all of the device's PCI features; see the relevant documentation for details. A few points matter a great deal to the driver, though, and need explaining.
Every PCI device complies with the PCI standard, and in that respect all PCI devices are alike. Each one has a set of registers holding its configuration space, always in the same format: the first register, for example, is always the vendor ID. Realtek's is 0x10ec and Intel's is a different number. Vendors apply to the standards body for these IDs, so no two vendors share one. By reading the configuration space we can identify the vendor and the device, and the format is the same on every platform, whether x86 or PPC.
A uniform configuration space is not enough by itself, however. All humans have a nose and eyes, but no two people's look the same. A NIC is a PCI device and must follow the PCI rules, so it carries a configuration space; but it is also a NIC, so it must carry the registers that control the NIC as well. How to access those registers then becomes the question. In Linux we map them into the virtual address space of main memory, so that ordinary CPU memory-access instructions can reach the control registers inside the peripheral.
To sum up, a PCI device has two kinds of register space. One is the configuration space, the uniform format through which the operating system or BIOS controls the peripheral. It cannot be reached with ordinary memory-access instructions. On x86, the Linux functions that access it either call the PCI BIOS or perform the configuration cycle themselves through I/O ports 0xCF8 and 0xCFC. The other is the general control register space. Once this part is mapped, the CPU can access it to control the operation of the device.
Now back to the second step of pci_register_driver() above. If a device's IDs match an entry in our pci_device_id array, we have found the device to serve, and rtl8139_init_one is called. It mainly does seven things:
① Create a net_device structure to represent the network device in the kernel. The reader may ask: pci_dev also represents this device, so what is the difference? As discussed above, a NIC must comply with the PCI specification and also do its job as a network card, so it is described in two parts: pci_dev covers the PCI side of the card, and net_device covers its role as a network device.
    dev = init_etherdev(NULL, sizeof(*tp));
    if (dev == NULL) {
        printk("unable to alloc new ethernet\n");
        return -ENOMEM;
    }
    tp = dev->priv;
init_etherdev(), in drivers/net/net_init.c, allocates the net_device memory and performs basic initialization. Note the priv member of net_device, which holds data private to each kind of NIC. An Intel NIC and a Realtek NIC are both represented by a net_device in the kernel, yet they differ: the two vendors implement the same function in different ways, and those differences live in priv. That is why priv is treated separately when memory is allocated: every member of net_device other than priv has a fixed size, while priv can be any size, so its size must be passed in at allocation time.
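The idea of a fixed-size generic structure followed by a driver-sized private area can be sketched in user space like this. It is an assumption-laden illustration, not the code of init_etherdev(): struct mini_netdev and mini_alloc_dev are hypothetical, and the 32-byte alignment is a choice made for this sketch.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical cut-down device structure; the real net_device has many more members. */
struct mini_netdev {
    char name[16];
    void *priv;   /* points at the driver-private area, or NULL */
};

/* One allocation holds the generic structure followed by priv_size bytes of
 * zeroed private data, aligned to 32 bytes. Release it with free(). */
struct mini_netdev *mini_alloc_dev(size_t priv_size)
{
    size_t total = sizeof(struct mini_netdev) + priv_size + 31;
    struct mini_netdev *dev = calloc(1, total);

    if (dev == NULL)
        return NULL;
    if (priv_size != 0) {
        uintptr_t p = (uintptr_t)(dev + 1);          /* first byte after the struct */
        dev->priv = (void *)((p + 31) & ~(uintptr_t)31);
        assert(((uintptr_t)dev->priv & 31) == 0);    /* sanity check on the rounding */
    }
    return dev;
}
```

A driver would call mini_alloc_dev(sizeof(struct its_private_data)) and then cast dev->priv to its own type, which mirrors the tp = dev->priv line above.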
② Enable the device (in effect, enable its register-mapping capability)
    rc = pci_enable_device(pdev);
    if (rc)
        goto err_out;
pci_enable_device() is also an interface the kernel provides; its code is in drivers/pci/pci.c. Tracing it, the author found that it mainly sets bits 0 and 1 of the command field in the PCI configuration space to 1, which enables the device: according to the official RTL8139 datasheet, these two bits enable I/O mapping and memory mapping respectively. If they are not set, the mapping of the control register space into memory space discussed above is blocked, which would leave us stuck. In addition, pci_enable_device() also performs some interrupt setup.
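What setting those two command bits amounts to can be shown with plain bit arithmetic. The bit positions come from the PCI specification; the macro and function names below are local to this sketch and are not kernel identifiers.

```c
#include <stdint.h>

#define CMD_IO_ENABLE   0x0001u  /* bit 0: device responds to I/O port accesses */
#define CMD_MEM_ENABLE  0x0002u  /* bit 1: device responds to memory-mapped accesses */

/* Return the command word with both decoders switched on, other bits untouched. */
uint16_t enable_decoding(uint16_t command)
{
    return (uint16_t)(command | CMD_IO_ENABLE | CMD_MEM_ENABLE);
}

/* Non-zero if memory-mapped register access is enabled in this command word. */
int mmio_enabled(uint16_t command)
{
    return (command & CMD_MEM_ENABLE) != 0;
}
```

Starting from a command word of 0x0000, enable_decoding() yields 0x0003, and any bits that were already set stay set.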
③ Obtain the device's resources
    mmio_start = pci_resource_start(pdev, 1);
    mmio_end = pci_resource_end(pdev, 1);
    mmio_flags = pci_resource_flags(pdev, 1);
    mmio_len = pci_resource_len(pdev, 1);
Readers may wonder when our registers were mapped into memory. It happens at power-on: during hardware initialization the BIOS firmware examines every PCI device and assigns each one a unique address range to which its registers can be mapped. Because this assignment is a standard PCI activity, the BIOS writes the addresses into each device's configuration space rather than into the vendor-specific control register space. When the operating system initializes, it allocates a pci_dev structure for each PCI device, reads the addresses the BIOS wrote into the configuration space, and stores them in the resource field of pci_dev. From then on we no longer need to touch the configuration space to learn these addresses; we simply read them from pci_dev. The four calls above read the relevant data straight from pci_dev. They are macros in include/linux/pci.h, for example:
    #define pci_resource_start(dev,bar) ((dev)->resource[(bar)].start)
    #define pci_resource_end(dev,bar)   ((dev)->resource[(bar)].end)
Each PCI device has six address regions in total (0 to 5), of which we usually use only the first two. Here we pass 1 as the bar argument to select the memory-mapped region.
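A small user-space model of the resource table may make the four accessors more concrete. The structures and the addresses used in the usage note are hypothetical; the length rule (end minus start plus one, because the range is inclusive, and zero for an unused region) follows the way the kernel macro is understood to work.

```c
#include <assert.h>

#define NUM_BARS 6  /* address regions 0 to 5 */

/* Inclusive address range, in the spirit of the kernel's struct resource. */
struct res {
    unsigned long start, end;
};

struct mini_pci_dev {
    struct res resource[NUM_BARS];
};

unsigned long res_start(const struct mini_pci_dev *d, int bar)
{
    assert(bar >= 0 && bar < NUM_BARS);
    return d->resource[bar].start;
}

unsigned long res_end(const struct mini_pci_dev *d, int bar)
{
    assert(bar >= 0 && bar < NUM_BARS);
    return d->resource[bar].end;
}

/* Length of a region; an all-zero entry means the region is unused. */
unsigned long res_len(const struct mini_pci_dev *d, int bar)
{
    unsigned long s = res_start(d, bar), e = res_end(d, bar);

    if (s == 0 && e == 0)
        return 0;
    return e - s + 1;
}
```

With region 1 spanning 0xe8000000 to 0xe80000ff (made-up values), res_len() reports 256 bytes, which is the kind of value that ends up in mmio_len.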
④ Map the obtained address
    ioaddr = ioremap(mmio_start, mmio_len);
    if (ioaddr == NULL) {
        printk("cannot remap MMIO, aborting\n");
        rc = -EIO;
        goto err_out_free_res;
    }
ioremap() is the function the kernel provides for mapping peripheral registers into main memory. The address to map was read from pci_dev in the previous step, so the mapping succeeds without conflicting with any other address. What is the effect of the mapping? Suppose a NIC has 100 registers, laid out contiguously at fixed positions, each occupying 4 bytes. Once this 400-byte space is mapped, it starts at ioaddr: ioaddr + 0 is the address of the first register, ioaddr + 4 the address of the second, and so on, so we can reach every register through memory and manipulate it. Note that ioaddr is a virtual address while mmio_start is a physical address. mmio_start was assigned by the BIOS and so must be physical, but in protected mode with paging enabled the CPU's instructions work with virtual addresses, not physical ones.
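The 100-register example can be imitated in user space with an ordinary byte array standing in for the mapped window. Everything here is hypothetical (fake_regs, sim_ioaddr, sim_readl, sim_writel, reg_addr); a real driver would use the pointer returned by ioremap() with readl() and writel() instead.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_REGS       100
#define REG_WINDOW_LEN (NUM_REGS * 4)  /* 100 hypothetical 4-byte registers */

static uint8_t fake_regs[REG_WINDOW_LEN];   /* pretend device register space */
uint8_t *const sim_ioaddr = fake_regs;      /* plays the role of ioaddr */

/* Address of register n: the base plus a fixed offset of 4 bytes per register. */
uint8_t *reg_addr(unsigned n)
{
    assert(n < NUM_REGS);
    return sim_ioaddr + 4 * n;
}

/* Store a 32-bit value in little-endian byte order, as PCI devices expect. */
void sim_writel(uint32_t val, uint8_t *addr)
{
    for (int i = 0; i < 4; i++)
        addr[i] = (uint8_t)(val >> (8 * i));
}

/* Load a 32-bit little-endian value. */
uint32_t sim_readl(const uint8_t *addr)
{
    uint32_t val = 0;

    for (int i = 0; i < 4; i++)
        val |= (uint32_t)addr[i] << (8 * i);
    return val;
}
```

Writing to reg_addr(1) touches bytes 4 to 7 of the window, which is exactly the "ioaddr + 4 is the second register" arithmetic described above.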
⑤ Reset the NIC
Resetting the NIC is an important part of its initialization. It works by writing a command to a register (note: a control register, not the configuration space, because this has nothing to do with PCI). The code is as follows:
    writeb((readb(ioaddr + ChipCmd) & ChipCmdClear) | CmdReset, ioaddr + ChipCmd);
Look at the second argument, ioaddr + ChipCmd. ChipCmd is an offset, so this address corresponds exactly to the ChipCmd register. You can look the offset up in the official datasheet; the value defined in the program, ChipCmd = 0x37, agrees with it. Setting the reset bit in this command register to 1 performs the reset.
⑥ Obtain the MAC address and store it in net_device.
    for (i = 0; i < 6; i++) {    /* hardware address */
        dev->dev_addr[i] = readb(ioaddr + i);
        dev->broadcast[i] = 0xff;
    }
As you can see, the bytes are read from ioaddr + 0 through ioaddr + 5. The official datasheet shows that the first 6 bytes of the register address space hold the NIC's MAC address, the physical address that identifies the NIC on the network. It will be used later when receiving and transmitting packets.
⑦ Register the main functions with net_device.
    dev->open = rtl8139_open;
    dev->hard_start_xmit = rtl8139_start_xmit;
    dev->stop = rtl8139_close;
dev (a net_device) represents the device. Once these functions are registered, rtl8139_open is used to open the device, and rtl8139_start_xmit is called whenever an application sends data out through it. Strictly speaking, that call is made by the network protocol layer, which belongs to the Linux network protocol stack and is not discussed here; our job is only to implement the function. rtl8139_close is used to shut the device down.
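The registration in step ⑦ is the ordinary C pattern of a structure carrying function pointers, which the upper layer calls without knowing which driver is behind them. A minimal user-space sketch, with hypothetical names throughout (mini_dev, the demo_ functions, stack_send), might look like this:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily reduced device structure. */
struct mini_dev {
    int is_up;
    size_t bytes_sent;
    int (*open)(struct mini_dev *dev);
    int (*stop)(struct mini_dev *dev);
    int (*hard_start_xmit)(const void *pkt, size_t len, struct mini_dev *dev);
};

static int demo_open(struct mini_dev *dev)
{
    dev->is_up = 1;
    return 0;
}

static int demo_close(struct mini_dev *dev)
{
    dev->is_up = 0;
    return 0;
}

static int demo_xmit(const void *pkt, size_t len, struct mini_dev *dev)
{
    (void)pkt;                 /* a real driver would copy the data out */
    if (!dev->is_up)
        return -1;             /* refuse to send while the device is down */
    dev->bytes_sent += len;
    return 0;
}

/* The driver side: fill in the pointers once, at initialization. */
void demo_register(struct mini_dev *dev)
{
    dev->open = demo_open;
    dev->stop = demo_close;
    dev->hard_start_xmit = demo_xmit;
}

/* The upper-layer side: it only ever goes through the pointer. */
int stack_send(struct mini_dev *dev, const void *pkt, size_t len)
{
    assert(dev != NULL && dev->hard_start_xmit != NULL);
    return dev->hard_start_xmit(pkt, len, dev);
}
```

Swapping in a different driver means registering different functions; stack_send() does not change, which is the point of the indirection.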
That completes the tour of rtl8139_init_one. Once device initialization is done, we can activate the device with the command ifconfig eth0 up. This command leads directly to a call of the rtl8139_open function we just registered, which activates the device. It mainly does three things.
① Register the interrupt handler for the device. The NIC notifies us of sent and received data by raising interrupts; when data arrives over the network cable, for instance, an interrupt tells us so, and there must be a function that handles the interrupt and receives the data. The Linux interrupt mechanism is not covered in detail here; interested readers can consult "Linux Kernel Source Code Scenario Analysis". One very important resource must be mentioned, though: the interrupt number. Like the memory-mapped address, it is assigned by the BIOS during initialization and written into the device's configuration space. Linux reads it from there when it sets up pci_dev and stores it in the irq member of pci_dev, so the driver obtains the interrupt number from pci_dev rather than from the hardware. The call below uses dev->irq, the copy of that number kept in net_device.
    retval = request_irq(dev->irq, rtl8139_interrupt, SA_SHIRQ, dev->name, dev);
    if (retval) {
        return retval;
    }
The handler we register is rtl8139_interrupt. When the NIC raises an interrupt (for example, when data arrives), the 8259A interrupt controller passes the interrupt number to the CPU, and the CPU uses it to find and execute the handler, here rtl8139_interrupt. rtl8139_interrupt is also defined in our program; handling interrupts is one of a driver's essential duties. The request_irq() code is in arch/i386/kernel/irq.c.
② Allocate the transmit and receive buffers
According to the official documentation, a packet is sent as follows. The packet is first copied from the application into a contiguous block of memory (the buffer we are about to allocate). The address of that memory is written into the NIC's transmit start address register (TSAD), whose offset is TxAddr0 = 0x20. The packet length is then written into another register (TSD), whose offset is TxStatus0 = 0x10. At that point the NIC moves the data from memory into its internal transmit buffer (FIFO), and the FIFO puts the data on the wire.
The purpose of allocating transmit and receive buffers in memory should now be obvious.
    tp->tx_bufs = pci_alloc_consistent(tp->pci_dev, TX_BUF_TOT_LEN,
                                       &tp->tx_bufs_dma);
    tp->rx_ring = pci_alloc_consistent(tp->pci_dev, RX_BUF_TOT_LEN,
                                       &tp->rx_ring_dma);
tp is the priv pointer of net_device, tx_bufs is the start address of the transmit buffer, and rx_ring is the start address of the receive buffer. Both are virtual addresses; the last arguments, tx_bufs_dma and rx_ring_dma, receive the physical addresses of the same memory. Why describe one block of memory with both a virtual and a physical address? Because code running on the CPU uses the virtual address, whereas the NIC reads and writes memory by physical address (the NIC is a far simpler device than the CPU). The pci_alloc_consistent() code is in arch/i386/kernel/pci-dma.c.
③ Initialize the transmit and receive buffers and start the NIC
The RTL8139 has four transmit descriptors, each made up of a transmit start address register (TSAD0 to TSAD3) and a transmit status register (TSD0 to TSD3). So we must divide the allocated buffer into four equal parts and write the address of each part into the corresponding register. The following code does this.
    for (i = 0; i < NUM_TX_DESC; i++)
        ((struct rtl8139_private *)dev->priv)->tx_buf[i] =
            &((struct rtl8139_private *)dev->priv)->tx_bufs[i * TX_BUF_SIZE];
The code above divides up the virtual address range of the transmit buffer.
    for (i = 0; i < NUM_TX_DESC; i++)
    {
        writel(tp->tx_bufs_dma + (tp->tx_buf[i] - tp->tx_bufs), ioaddr + TxAddr0 + (i * 4));
        readl(ioaddr + TxAddr0 + (i * 4));    /* read back so the write reaches the chip */
    }
The code above divides up the physical address range of the transmit buffer in the same way and writes each address into the corresponding register, so that once the NIC starts working it can locate the memory quickly and access the data in it.
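The two loops above perform the same split twice, once on the CPU's view of the buffer and once on the device's view. The following user-space sketch computes both in one pass. The structure and function names are hypothetical, and the slot size of 1536 bytes is an assumption made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_TX_DESC    4
#define TX_BUF_SIZE    1536                       /* assumed per-slot size */
#define TX_BUF_TOT_LEN (TX_BUF_SIZE * NUM_TX_DESC)
#define TXADDR0        0x20                       /* offset of the first TSAD register */

struct tx_layout {
    uint8_t *tx_buf[NUM_TX_DESC];      /* CPU-side (virtual) start of each slot */
    uint32_t tx_dma[NUM_TX_DESC];      /* device-side address of each slot */
    uint32_t reg_offset[NUM_TX_DESC];  /* register that receives tx_dma[i] */
};

/* Split one contiguous buffer into NUM_TX_DESC equal slots.
 * tx_bufs is the CPU pointer, tx_bufs_dma the address the device sees. */
void split_tx_buffer(uint8_t *tx_bufs, uint32_t tx_bufs_dma, struct tx_layout *out)
{
    assert(tx_bufs != NULL && out != NULL);
    for (int i = 0; i < NUM_TX_DESC; i++) {
        out->tx_buf[i] = &tx_bufs[i * TX_BUF_SIZE];
        /* Same byte offset into the buffer, applied to the device-side base. */
        out->tx_dma[i] = tx_bufs_dma + (uint32_t)(out->tx_buf[i] - tx_bufs);
        out->reg_offset[i] = TXADDR0 + (uint32_t)i * 4;
    }
}
```

The subtraction tx_buf[i] - tx_bufs is what turns a CPU pointer into a plain byte offset that is valid in either address space.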
    writel(tp->rx_ring_dma, ioaddr + RxBuf);
The code above writes the physical address of the receive buffer into the corresponding register, so that when data arrives the NIC can transfer it to exactly the right place in memory, where it waits for the CPU to collect it.
    writeb((readb(ioaddr + ChipCmd) & ChipCmdClear) |
           CmdRxEnb | CmdTxEnb, ioaddr + ChipCmd);
After resetting the device we must turn on its transmit and receive functions. The code above writes the appropriate value to the command register to enable them.
    writel((TX_DMA_BURST << TxDMAShift), ioaddr + TxConfig);
The code above writes the value TX_DMA_BURST << TxDMAShift to the NIC's TxConfig register (offset 0x40). With the constants substituted this is 6 << 8, which sets bits 8 to 10 to 110. According to the datasheet, the value 6 means 1024, so each DMA burst transfers 1024 bytes.
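The arithmetic of the burst field can be checked in isolation. The sketch below assumes the encoding implied by the text, in which a field value n selects a burst of 16 << n bytes (so 6 gives 1024); the macro and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define TX_DMA_SHIFT 8                       /* the field occupies bits 8 to 10 */
#define TX_DMA_MASK  (0x7u << TX_DMA_SHIFT)

/* Return cfg with the 3-bit DMA burst field replaced by code (0 to 7). */
uint32_t txcfg_set_burst(uint32_t cfg, unsigned code)
{
    assert(code <= 7);
    return (cfg & ~TX_DMA_MASK) | ((uint32_t)code << TX_DMA_SHIFT);
}

/* Decode the burst size in bytes, assuming the 16 << code encoding. */
unsigned txcfg_burst_bytes(uint32_t cfg)
{
    unsigned code = (cfg & TX_DMA_MASK) >> TX_DMA_SHIFT;

    return 16u << code;
}
```

txcfg_set_burst(0, 6) gives 0x600, which is the value 6 << 8 written by the driver, and decoding it gives 1024 bytes.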
Also in this phase we set the receive mode and enable interrupts; these are left for readers to study on their own.
Next comes the data sending and receiving phase.
When a network application sends data, the Linux network protocol stack takes care of a whole series of problems, finds the net_device that represents the NIC, and uses that structure to locate and drive the NIC to send the packet. Concretely, it calls the hard_start_xmit member of net_device, a function pointer that in our driver points to rtl8139_start_xmit, which does the actual sending. Let us analyze this function. It does four things in all.
① Check the length of the packet to be sent. If it is shorter than the minimum Ethernet frame length, it must be padded.
    if (skb->len < ETH_ZLEN) {    /* if data_len < 60 */
        if ((skb->data + ETH_ZLEN) <= skb->end) {
            memset(skb->data + skb->len, 0x20, (ETH_ZLEN - skb->len));
            skb->len = (skb->len >= ETH_ZLEN) ? skb->len : ETH_ZLEN;
        } else {
            printk("%s: (skb->data + ETH_ZLEN) > skb->end\n", __FUNCTION__);
        }
    }
skb->data and skb->end bound the space available to the packet. If the packet is too short and that space (skb->end - skb->data) is smaller than the minimum frame length, there is no room to pad and an error is reported; otherwise the packet is padded.
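The padding rule can be tried out in user space with a reduced packet structure. struct mini_skb and pad_short_frame are hypothetical; the fill byte 0x20 and the 60-byte minimum follow the driver code above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define ETH_ZLEN 60  /* minimum Ethernet frame length without the CRC */

/* Hypothetical stand-in for the few sk_buff fields the padding code uses. */
struct mini_skb {
    unsigned char *data;  /* start of the packet data */
    unsigned char *end;   /* one past the last usable byte of the buffer */
    size_t len;           /* bytes of packet data currently present */
};

/* Pad a short frame up to ETH_ZLEN with 0x20 bytes.
 * Returns 0 if the frame is now long enough, -1 if the buffer has no room. */
int pad_short_frame(struct mini_skb *skb)
{
    assert(skb != NULL && skb->data != NULL && skb->end >= skb->data);
    if (skb->len >= ETH_ZLEN)
        return 0;                                   /* nothing to do */
    if ((size_t)(skb->end - skb->data) < ETH_ZLEN)
        return -1;                                  /* nowhere to put the padding */
    memset(skb->data + skb->len, 0x20, ETH_ZLEN - skb->len);
    skb->len = ETH_ZLEN;
    return 0;
}
```

A 42-byte packet in a 64-byte buffer comes back with len set to 60 and bytes 42 to 59 filled with 0x20; the same packet in a 50-byte buffer is rejected and left unchanged.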
② Copy the packet data into the transmit buffer we set up earlier.
    memcpy(tp->tx_buf[entry], skb->data, skb->len);
Here skb->data is the address of the packet's data and tp->tx_buf[entry] is the address of our transmit buffer, so this line performs the copy. If you have forgotten how the buffer was set up, go back to the earlier discussion.
③ With the address and the data in place, we must tell the NIC how long the packet is, so that it fetches exactly the right amount of data from the buffer. This is done by writing the transmit status register (TSD).
    writel(tp->tx_flag | (skb->len >= ETH_ZLEN ? skb->len : ETH_ZLEN), ioaddr + TxStatus0 + (entry * 4));
We write the packet length together with some control information into the status register, which gives the NIC what it needs to do its work.
④ Check whether the transmit buffers are all in use. If they are, sending more would overwrite data that has not gone out yet, so the queue is stopped.
    if ((tp->cur_tx - NUM_TX_DESC) == tp->dirty_tx)
        netif_stop_queue(dev);
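The full-ring test in step ④ relies on two free-running counters: cur_tx counts packets handed to the NIC and dirty_tx counts packets the NIC has finished with. A user-space sketch under that assumption, with hypothetical function names, is:

```c
#include <assert.h>

#define NUM_TX_DESC 4

struct tx_ring {
    unsigned long cur_tx;    /* next slot to fill (free-running counter) */
    unsigned long dirty_tx;  /* oldest slot not yet completed */
};

/* Map a free-running counter onto one of the four hardware slots. */
unsigned tx_slot(unsigned long counter)
{
    return (unsigned)(counter % NUM_TX_DESC);
}

/* Full when all four slots are in flight; same test as in the driver,
 * written as cur_tx - dirty_tx == NUM_TX_DESC. */
int tx_ring_full(const struct tx_ring *r)
{
    return r->cur_tx - r->dirty_tx == NUM_TX_DESC;
}

/* Claim the next slot; returns its index, or -1 if the ring is full. */
int tx_queue(struct tx_ring *r)
{
    if (tx_ring_full(r))
        return -1;
    return (int)tx_slot(r->cur_tx++);
}

/* Called when the NIC reports that the oldest packet has gone out. */
void tx_complete(struct tx_ring *r)
{
    assert(r->cur_tx != r->dirty_tx);  /* there must be something in flight */
    r->dirty_tx++;
}
```

After four calls to tx_queue() the ring is full and a fifth call fails; one tx_complete() frees slot 0 again, which corresponds to the driver waking the queue after a transmit interrupt.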
Having covered sending, we turn to receiving. When data arrives over the network cable, the NIC raises an interrupt and the interrupt service routine rtl8139_interrupt is called. It mainly does three things.
① Read the status value from the NIC's interrupt status register and analyze it.
    status = readw(ioaddr + IntrStatus);
    if ((status & (PCIErr | PCSTimeout | RxUnderrun | RxOverflow |
                   RxFIFOOver | TxErr | TxOK | RxErr | RxOK)) == 0)
        goto out;
The code above means: if none of these nine events has occurred, there is nothing to handle, so exit.
② Handle receive events.
    if (status & (RxOK | RxUnderrun | RxOverflow | RxFIFOOver))    /* Rx interrupt */
        rtl8139_rx_interrupt(dev, tp, ioaddr);
Any of these four events indicates a receive interrupt, so rtl8139_rx_interrupt is called to do the receiving.
③ Handle transmit events.
    if (status & (TxOK | TxErr)) {
        spin_lock(&tp->lock);
        rtl8139_tx_interrupt(dev, tp, ioaddr);
        spin_unlock(&tp->lock);
    }
If it is a transmit event, rtl8139_tx_interrupt is called to handle it.
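The three-way decision made by the interrupt handler can be isolated as a pure function of the status word. In this sketch the bit values are assumed to match the stock driver's interrupt status bits, and classify_irq and the DO_ constants are hypothetical.

```c
#include <stdint.h>

/* Interrupt status bits (values assumed to match the stock driver). */
enum {
    RX_OK        = 0x0001,
    RX_ERR       = 0x0002,
    TX_OK        = 0x0004,
    TX_ERR       = 0x0008,
    RX_OVERFLOW  = 0x0010,
    RX_UNDERRUN  = 0x0020,
    RX_FIFO_OVER = 0x0040,
    PCS_TIMEOUT  = 0x4000,
    PCI_ERR      = 0x8000
};

#define ALL_EVENTS (PCI_ERR | PCS_TIMEOUT | RX_UNDERRUN | RX_OVERFLOW | \
                    RX_FIFO_OVER | TX_ERR | TX_OK | RX_ERR | RX_OK)
#define RX_EVENTS  (RX_OK | RX_UNDERRUN | RX_OVERFLOW | RX_FIFO_OVER)
#define TX_EVENTS  (TX_OK | TX_ERR)

/* What the handler should do, as a bit set. */
enum { DO_NOTHING = 0, DO_RX = 1, DO_TX = 2 };

int classify_irq(uint16_t status)
{
    int work = DO_NOTHING;

    if ((status & ALL_EVENTS) == 0)
        return work;            /* not one of the nine events: nothing to do */
    if (status & RX_EVENTS)
        work |= DO_RX;          /* would call the receive routine */
    if (status & TX_EVENTS)
        work |= DO_TX;          /* would call the transmit routine under the lock */
    return work;
}
```

A single interrupt can report both kinds of event at once, which is why the result is a bit set and the two tests are independent rather than an if/else chain.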
Next, let us look at the receive interrupt routine, rtl8139_rx_interrupt. It mainly does the following four things:
① The function is one big loop. It keeps running as long as the receive buffer is not empty, that is, as long as there is more data to read, and exits once the buffer has been emptied.
    int ring_offset = cur_rx % RX_BUF_LEN;
    rx_status = le32_to_cpu(*(u32 *)(rx_ring + ring_offset));
    rx_size = rx_status >> 16;
These three lines work out the length of the packet to be received.
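The three lines read a 4-byte little-endian header that the NIC places in front of each packet in the ring: the low 16 bits carry status flags and the high 16 bits carry the length. A portable user-space version of that decoding, with a hypothetical structure and function and an assumed ring size, is:

```c
#include <assert.h>
#include <stdint.h>

#define RX_BUF_LEN 8192u  /* assumed ring size for this sketch */

struct rx_header {
    uint16_t status;  /* low half of the header word */
    uint16_t size;    /* high half: packet length as reported by the NIC */
};

/* Decode the header found at position cur_rx (a free-running byte counter). */
struct rx_header parse_rx_header(const uint8_t *rx_ring, unsigned long cur_rx)
{
    unsigned ring_offset = (unsigned)(cur_rx % RX_BUF_LEN);
    const uint8_t *p = rx_ring + ring_offset;
    uint32_t word;
    struct rx_header h;

    assert(ring_offset + 4 <= RX_BUF_LEN);  /* header must not straddle the wrap */
    /* Assemble the word byte by byte so the result is the same on any host. */
    word = (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
    h.status = (uint16_t)(word & 0xffffu);
    h.size = (uint16_t)(word >> 16);
    return h;
}
```

Building the word from individual bytes plays the role of le32_to_cpu() in the driver: the header is little-endian in memory whatever the byte order of the CPU.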
② Allocate the packet's data structure according to that length (in the full driver, pkt_size is rx_size minus the 4-byte CRC).
    skb = dev_alloc_skb(pkt_size + 2);
③ If the allocation succeeds, copy the data from the receive buffer into the packet.
    eth_copy_and_sum(skb, &rx_ring[ring_offset + 4], pkt_size, 0);
This function is in include/linux/etherdevice.h, and it simply calls memcpy():
    static inline void eth_copy_and_sum(struct sk_buff *dest, unsigned char *src, int len, int base)
    {
        memcpy(dest->data, src, len);
    }
Now it is clear at a glance: &rx_ring[ring_offset + 4] is the receive buffer and the source address, while skb->data is the packet's data area and the destination address.
④ Hand the packet to the Linux protocol stack for further processing.
    skb->protocol = eth_type_trans(skb, dev);
    netif_rx(skb);
Once netif_rx() has run, the packet has left the NIC driver's territory and entered the Linux network protocol stack, where the Ethernet frame header, IP header and TCP header are stripped in turn and the data is finally delivered to the application. The protocol stack is beyond the scope of this article. netif_rx() is in net/core/dev.c.
rtl8139_remove_one is essentially the reverse of rtl8139_init_one.
This article has outlined the framework of a Linux NIC driver. If you have any questions, contact me. QQ: 591970467.
