I/O Architecture and Device Drivers (5)

4.5 Direct Memory Access (DMA)

In the original PC architecture, the CPU was the only bus master in the system: it was the only hardware device allowed to drive the address/data bus in order to fetch or store values in RAM. With modern bus architectures such as PCI, each peripheral can act as bus master if provided with the proper circuits. Therefore, all PCs now include auxiliary DMA circuits that can transfer data between RAM and I/O devices.

 

Once activated by the CPU, the DMA circuit transfers the data on its own; when the transfer completes, it raises an interrupt request. Conflicts that arise when the CPU and the DMA circuit access the same memory location at the same time are resolved by a hardware circuit called the memory arbiter.

 

DMA is used most often by disk drivers and by other devices that transfer a large number of bytes at a time. Because setting up a DMA transfer takes a relatively long time, it is more efficient to use the CPU directly when transferring small amounts of data.

4.5.1 Synchronous and Asynchronous DMA

A device driver can use DMA in two different ways: synchronous DMA, in which data transfers are triggered by processes, and asynchronous DMA, in which data transfers are triggered by hardware devices.

 

An example of synchronous DMA is a sound card playing sound. A user application writes sound data to the device file corresponding to the sound card's digital signal processor (DSP). The driver collects the written samples in a kernel buffer; at the same time, it instructs the sound card to copy the samples from the kernel buffer to the DSP at a predefined rate. When the sound card finishes a transfer, it raises an interrupt, and the driver checks whether the kernel buffer still contains samples to be played; if so, the driver starts another DMA transfer.
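The playback loop just described can be condensed into the following kernel-space sketch. This is illustrative only, not a real sound driver: apart from the standard interrupt-handling types, every identifier here (the per-card structure and the helper functions) is a hypothetical name invented for the example.

```c
/* Hypothetical sound-card interrupt handler: each time the card
 * signals that the previous block of samples has reached the DSP,
 * restart DMA if more samples are waiting in the kernel buffer. */
static irqreturn_t sound_intr(int irq, void *dev_id)
{
        struct sound_dev *s = dev_id;        /* hypothetical per-card state */

        if (samples_in_kernel_buffer(s))     /* anything left to play? */
                start_dma_to_dsp(s);         /* trigger the next transfer */
        else
                s->playing = 0;              /* buffer drained; stop */

        return IRQ_HANDLED;
}
```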

 

An example of asynchronous DMA is a network card (NIC) receiving frames from a LAN. The NIC stores each received frame in its own on-board I/O shared memory and then raises an interrupt. The driver acknowledges the interrupt and commands the NIC to copy the frame from the I/O shared memory into a kernel buffer. When that transfer completes, the NIC raises another interrupt, and the driver notifies the upper kernel layers about the new frame.
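The two-interrupt flow can be sketched as follows; as with the previous fragment, all names other than the interrupt-handling types are hypothetical placeholders, not a real NIC driver API.

```c
/* Hypothetical NIC interrupt handler showing the two-phase flow:
 * first interrupt  - a frame arrived in on-board shared memory;
 * second interrupt - the DMA copy into the kernel buffer finished. */
static irqreturn_t nic_intr(int irq, void *dev_id)
{
        struct nic_dev *nic = dev_id;              /* hypothetical state */

        if (frame_arrived(nic))                    /* phase 1 */
                start_dma_to_kernel_buffer(nic);   /* command the copy */
        else if (dma_copy_done(nic))               /* phase 2 */
                hand_frame_to_upper_layers(nic);   /* notify kernel layers */

        return IRQ_HANDLED;
}
```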

4.5.2 Helper Functions for DMA Transfers

When designing a driver for a device that uses DMA transfers, the developer should write code that is independent of the architecture and, as far as possible, of the bus. This goal is now achievable because the kernel provides a rich set of DMA helper functions that hide the differences among the DMA mechanisms of the various hardware architectures.

 

These helper functions come in two subsets: an older subset that provides architecture-independent functions for PCI devices, and a newer subset that is independent of both the bus and the architecture. We introduce them below.

1. Bus address

Every DMA transfer involves (at least) one memory buffer, which contains the data to be read or written by the hardware device. In general, before starting a transfer, the device driver must make sure that the DMA circuit can directly access those RAM locations.

 

Up to now we have met three kinds of memory addresses: logical addresses, linear addresses, and physical addresses. The first two are used internally by the CPU; the last is the address the CPU uses to physically drive the data bus. However, there is a fourth kind of memory address, the bus address: it is the address used by hardware devices other than the CPU to drive the data bus.

 

Fundamentally, why should the kernel care about bus addresses at all? Because in a DMA operation the data transfer takes place without CPU intervention: the I/O device and the DMA circuit drive the data bus directly. Therefore, when the kernel sets up a DMA operation, it must write the bus address of the memory buffer involved into the appropriate I/O ports of the DMA circuit or of the I/O device.

 

In the 80x86 architecture, bus addresses coincide with physical addresses. However, other architectures, such as Sun's SPARC and HP's Alpha, include a hardware circuit called the I/O Memory Management Unit (IO-MMU), analogous to the microprocessor's paging unit, which maps physical addresses to bus addresses. All I/O drivers that use DMA must set up the IO-MMU properly before starting a data transfer.

Different buses have different bus-address sizes. ISA bus addresses are 24 bits long; therefore, in the 80x86 architecture, DMA transfers over ISA can address only the first 16 MB of physical memory. That is why the memory buffers used for such DMA transfers are allocated in the ZONE_DMA memory zone (with the GFP_DMA flag set). The original PCI standard defines 32-bit bus addresses; however, some PCI hardware devices were originally designed for the ISA bus, so they still cannot access RAM locations above physical address 0x00ffffff. The newer PCI-X standard uses 64-bit bus addresses and allows DMA circuits to address memory directly at higher addresses.

 

In Linux, the data type dma_addr_t represents a generic bus address. In the 80x86 architecture, dma_addr_t corresponds to a 32-bit integer, unless the kernel supports PAE, in which case dma_addr_t represents a 64-bit integer.

 

The pci_set_dma_mask() and dma_set_mask() helper functions check whether the bus can accept bus addresses of a given size (the mask); if so, they notify the bus layer that the given peripheral will use bus addresses of that size.

2. Cache coherency

The system architecture does not necessarily offer a coherency protocol between the hardware caches and the DMA circuits at the hardware level, so the DMA helper functions must take the hardware caches into account when performing DMA mappings. To see why, suppose the device driver fills a memory buffer with some data and then immediately commands the hardware device to read that data with a DMA transfer. If the DMA accesses those physical RAM locations while the contents of the corresponding hardware cache lines (between the CPU and RAM) have not yet been written back to RAM, the hardware device reads the old values of the memory buffer.

 

Device driver developers can handle DMA buffers in two different ways, by choosing between two DMA mapping types:

• Coherent DMA mapping

Every write operation performed by the CPU on a RAM location is immediately visible to the hardware device, and vice versa.

• Streaming DMA mapping

With this mapping type, the device driver must take care of cache-coherency problems itself, by using the appropriate synchronization helper functions; this type is also called "asynchronous".

There is no cache-coherency problem when using DMA in the 80x86 architecture, because the hardware devices themselves "snoop" the accesses to the hardware caches. Therefore, a driver designed specifically for a hardware device in the 80x86 architecture may choose either of the two DMA mapping types: they are essentially equivalent.

On other architectures, however, such as MIPS, SPARC, and some models of PowerPC, hardware devices usually do not snoop the hardware caches, so cache-coherency problems arise.

In general, choosing the proper DMA mapping type is important for architecture-independent drivers.

 

As a general rule, a coherent DMA mapping must be chosen when the CPU and the DMA processor access a buffer in unpredictable ways (for instance, the buffer holding the command data structures of a SCSI adapter). In other cases, a streaming DMA mapping is preferable, because coherent DMA mappings are difficult to handle on some architectures and may lead to lower system performance.

  • Helper functions for coherent DMA mappings

Typically, a device driver allocates a memory buffer and establishes a coherent DMA mapping in its initialization phase, then releases the mapping and the buffer when the driver is unloaded. To allocate a buffer and establish a coherent DMA mapping, the kernel provides the architecture-dependent pci_alloc_consistent() and dma_alloc_coherent() functions. Both return the linear address and the bus address of the new buffer; in the 80x86 architecture, they return the linear address and the physical address of the new buffer. To release the mapping and the buffer, the kernel provides the pci_free_consistent() and dma_free_coherent() functions.
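A typical allocate/release pair looks like the following kernel-space sketch. It is not runnable outside a driver; `dev` (the device structure) and `BUF_SIZE` are placeholders supplied by the surrounding driver code.

```c
#include <linux/dma-mapping.h>

dma_addr_t bus_addr;    /* bus address, to be handed to the device */
void *vaddr;            /* kernel linear address of the buffer */

/* Allocate a buffer and establish a coherent DMA mapping. */
vaddr = dma_alloc_coherent(dev, BUF_SIZE, &bus_addr, GFP_KERNEL);
if (!vaddr)
        return -ENOMEM;

/* ... program the device with bus_addr, perform the transfers ... */

/* Release the mapping and the buffer. */
dma_free_coherent(dev, BUF_SIZE, vaddr, bus_addr);
```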

  • Helper functions for streaming DMA mappings

Memory buffers for streaming DMA mappings are usually mapped just before a transfer and unmapped right afterward. It is also possible to keep the same mapping across several DMA transfers, but in that case the device driver developer must be aware of the hardware caches sitting between the memory and the peripheral.

To start a streaming DMA transfer, the driver must first dynamically allocate the memory buffer by means of the zoned page frame allocator or the generic memory allocator. Then the driver invokes pci_map_single() or dma_map_single() to establish the streaming DMA mapping; these functions receive the linear address of the buffer as a parameter and return the corresponding bus address. To release the mapping, the driver invokes the corresponding pci_unmap_single() or dma_unmap_single() function.
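The allocate/map/unmap sequence can be sketched as follows (kernel-space code, not runnable standalone; `dev`, `BUF_SIZE`, and the `fail` label are placeholders from the surrounding driver):

```c
#include <linux/dma-mapping.h>
#include <linux/slab.h>

void *buf = kmalloc(BUF_SIZE, GFP_KERNEL);   /* generic memory allocator */
dma_addr_t bus_addr;

/* Map the buffer for a streaming DMA transfer toward the device. */
bus_addr = dma_map_single(dev, buf, BUF_SIZE, DMA_TO_DEVICE);
if (dma_mapping_error(dev, bus_addr))
        goto fail;

/* ... start the transfer and wait for its completion ... */

/* Unmap the buffer once the transfer is over. */
dma_unmap_single(dev, bus_addr, BUF_SIZE, DMA_TO_DEVICE);
```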

 

To avoid cache-coherency problems, the driver should, if necessary, invoke pci_dma_sync_single_for_device() or dma_sync_single_for_device() before starting a DMA transfer from RAM to the device; these functions flush the cache lines corresponding to the DMA buffer. Similarly, a device driver must not access a memory buffer before a DMA transfer from the device to RAM has completed; rather, if necessary, before reading the buffer the driver should invoke pci_dma_sync_single_for_cpu() or dma_sync_single_for_cpu(), which invalidate the corresponding hardware cache lines. In the 80x86 architecture, these functions do almost nothing, because coherency between the hardware caches and DMA is maintained by the hardware itself.
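The two synchronization points can be shown as a fragment inside a driver that already holds a streaming mapping (`dev`, `bus_addr`, and `BUF_SIZE` are assumed from the surrounding code):

```c
/* Before a RAM-to-device transfer: flush the cache lines so the
 * device sees the data just written by the CPU. */
dma_sync_single_for_device(dev, bus_addr, BUF_SIZE, DMA_TO_DEVICE);
/* ... the device reads the buffer via DMA ... */

/* Before the CPU reads data a device has just DMA-ed into RAM:
 * invalidate the stale cache lines. */
dma_sync_single_for_cpu(dev, bus_addr, BUF_SIZE, DMA_FROM_DEVICE);
/* ... the CPU may now safely read the buffer ... */
```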

Even buffers in high memory can be used for DMA transfers: the developer uses the pci_map_page() or dma_map_page() functions, passing as parameters the descriptor address of the page containing the buffer and the offset of the buffer inside the page. Correspondingly, to release a mapping of a high-memory buffer, the developer uses the pci_unmap_page() or dma_unmap_page() functions.

4.6 Levels of Kernel Support

The Linux kernel does not fully support all possible I/O devices. In general, there are three possible levels of support for a hardware device:

• No support at all

The application interacts directly with the device's I/O ports by issuing suitable in and out assembly language instructions.

• Minimal support

The kernel does not recognize the hardware device, but it does recognize its I/O interface. User programs treat the I/O interface as a sequential device capable of reading and/or writing streams of characters.

• Extended support

The kernel recognizes the hardware device and handles the I/O interface itself. In fact, such a device might not even have a corresponding device file.

The first approach does not involve any kernel device driver at all. The most common example is the way the X Window System traditionally handles the graphics display; this is very efficient, although it prevents the X server from using the hardware interrupts issued by the I/O device. Some extra effort is also needed to allow the X server to access the required I/O ports: the iopl() and ioperm() system calls grant a process permission to access I/O ports. Only processes with root privileges may invoke these two system calls, but ordinary users can run such programs if the setuid flag is set on the executable file.

 

Recent Linux versions support several widely used graphics cards. The /dev/fb device file provides an abstraction of the graphics card's frame buffer and allows application software to access it without needing to know anything about the I/O ports of the graphics interface. Furthermore, the kernel provides the Direct Rendering Infrastructure (DRI), which allows application software to fully exploit the hardware of 3D-accelerated graphics cards.

 

 

The minimal support approach is used to handle external hardware devices connected to a general-purpose I/O interface. The kernel takes care of the I/O interface by offering a device file; the application program handles the external hardware device by reading and writing the device file.

 

Minimal support is preferable to extended support because it keeps the kernel as small as possible. However, on the PC this approach is used only for the serial and parallel ports. Its applicability is limited, because it cannot be used when the peripheral must interact heavily with internal kernel data structures; in such cases, extended support is required.

 

In general, any hardware device directly connected to an I/O bus, such as a built-in hard disk, is handled according to the extended support approach: the kernel must provide a device driver for each such device. External devices attached through USB, PCMCIA, or SCSI interfaces, in short all external devices connected through a general-purpose I/O interface other than the serial and parallel ports, also require extended support.

It is worth noting that the standard file-related system calls, such as open(), read(), and write(), do not always give the application full control of the underlying hardware device. In fact, the "lowest common denominator" approach of the VFS does not include the special commands that some devices need, nor does it let an application check whether a device is in a specific internal state.

 

The ioctl() system call was introduced to satisfy such needs. Besides the file descriptor of the device file, it receives a second 32-bit parameter specifying the request, and possibly any number of additional parameters. For example, specific ioctl() requests exist to get the volume of a CD-ROM drive or to eject the CD-ROM medium. An application can implement the user interface of a CD player by using these kinds of ioctl() requests.
