DMA is a hardware capability that allows a device to read and write data directly in main memory, i.e. to perform I/O against main memory without processor intervention. This saves processor resources and improves I/O throughput across the system: I/O operations are relatively slow, and if the processor had to drive every I/O, it would undoubtedly burn a large amount of CPU time on each one, degrading overall system I/O performance.

One, the DMA work method

For I/O, there are two modes of operation on the input side:
1. The software initiates a read request and the hardware responds (storage devices mostly work this way).
2. The hardware generates an input event and the software then processes it (network cards mostly work this way).

On the output side, the software always initiates the write request (this is obvious, since the output data must be prepared by the software).
Since DMA is a form of I/O, these are the scenarios in which it operates.
DMA works as follows:
For input, the software prepares a memory area (the DMA buffer) and then tells the hardware, which writes the incoming data into that area by DMA. For output, the software prepares a memory area (the DMA buffer) containing the output data and then tells the hardware, which fetches the data by DMA and sends it out. In essence this is all DMA does; on the input side, however, the timing of preparing the DMA buffer differs between the two I/O working modes described above.
Two, allocating DMA buffers

When using DMA, note that a DMA buffer larger than one page must occupy contiguous physical memory pages, because the device accesses the buffer through the bus it is attached to (a typical example is the PCI bus), and on some architectures the bus works with physical addresses. For portability, therefore, it is best to always assume physical addresses are used.
Buffers can be allocated either when the system starts or while it is running; the driver must choose according to its own circumstances. The driver must also ensure that it allocates the right kind of buffer (the allocation flag GFP_DMA helps by allocating memory from the DMA zone).
Code built into the kernel proper can reserve large chunks of memory for itself at system startup, but that method does not apply to a kernel module that needs a large chunk of memory. A module can take a different route: suppose the system has 4 GB of memory in total and a kernel module wants to reserve a 100 MB region for itself. The boot parameter mem=3.9G can be passed at startup so that the kernel never uses the last 100 MB, and the module can then use ioremap to gain access to that memory.
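A minimal sketch of that last step, assuming the reserved region starts at the hypothetical physical address RESERVED_BASE (both the base and the size are illustrative and must match the memory actually hidden from the kernel on a given system):

#include <linux/io.h>
#include <linux/errno.h>

/* Hypothetical values for illustration only. */
#define RESERVED_BASE 0xF9C00000UL   /* physical start of the hidden region */
#define RESERVED_SIZE (100UL << 20)  /* 100 MB */

static void __iomem *reserved_va;

static int reserved_region_init(void)
{
	/* Map the reserved physical region into kernel virtual space. */
	reserved_va = ioremap(RESERVED_BASE, RESERVED_SIZE);
	if (!reserved_va)
		return -ENOMEM;
	return 0;
}

static void reserved_region_exit(void)
{
	iounmap(reserved_va);
}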
Since the bus uses physical addresses while programs use virtual addresses, a conversion between the two is needed. The kernel provides the following two functions for this:
unsigned long virt_to_bus (volatile void *address);
void *bus_to_virt (unsigned long address);
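As a hedged illustration only, a driver might use them like this on a kmalloc'ed buffer (note that modern kernels deprecate these helpers in favor of the generic DMA layer described below):

#include <linux/slab.h>
#include <asm/io.h>

/* Minimal sketch: obtain a bus address for a kmalloc'ed buffer. */
static unsigned long example_get_bus_addr(void)
{
	/* GFP_DMA requests memory from the DMA zone. */
	void *buf = kmalloc(4096, GFP_KERNEL | GFP_DMA);

	if (!buf)
		return 0;
	/* The device would be programmed with this bus address. */
	return virt_to_bus(buf);
}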
That covers how DMA buffers are allocated and how bus addresses and virtual addresses are converted. These are low-level interfaces: the user must know the hardware and the architecture well, and the driver has to take care of everything itself. There is a better option: to make DMA easier to use, the kernel provides a generic DMA layer, and its interfaces should be preferred because they simplify driver writing.
Three, the generic DMA layer

How the bus works, how memory is allocated, and how caches must be handled all differ across architectures. The kernel provides a DMA layer that is independent of bus and architecture and hides most of these problems, so it should be the first choice when using DMA.
3.1 Setting the hardware's DMA capability

The generic DMA layer assumes that a device can perform DMA on 32-bit addresses. If a device cannot, it should call
int dma_set_mask (struct device *dev, u64 mask);
to declare the addresses it can actually handle. For example, if the device can perform DMA only on 16-bit addresses, mask should be set to 0xFFFF.
The return value indicates whether the kernel supports DMA with the specified mask: if 0 is returned, DMA with that mask is supported; if a nonzero value is returned, it is not, and the device will not be able to perform DMA.
If the device supports DMA on 32-bit addresses, calling this function is unnecessary.
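A minimal sketch of how a driver might call it, for a hypothetical device that can only address 24 bits (the mask and the helper name are assumptions for illustration):

#include <linux/dma-mapping.h>

static int example_setup_dma(struct device *dev)
{
	/* Tell the kernel this device can only do DMA below 16 MB. */
	if (dma_set_mask(dev, DMA_BIT_MASK(24))) {
		dev_warn(dev, "no suitable DMA available\n");
		return -EIO;
	}
	return 0;
}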
3.2 DMA mappings

A DMA mapping associates the virtual address of the DMA buffer to be allocated with the address generated for the device (that is, the bus address).
It was mentioned above that virt_to_bus can turn a virtual address into a bus address, but this does not always hold, because some architectures have an IOMMU: hardware that provides a set of mapping registers for the bus. The IOMMU manages which physical memory appears in the address ranges the device can access, which allows physically scattered buffers to look contiguous to the device. virt_to_bus cannot work in this scheme, but the generic DMA layer includes support for the IOMMU; using it is therefore simpler and less error-prone.
DMA mappings must also address cache coherency, a concern for all operations involving low-level memory access: the processor caches recently used memory, and if the cache and the corresponding main memory become inconsistent, problems follow. The generic DMA layer takes care of this work.
The DMA mapping code uses the type dma_addr_t to represent bus addresses. A dma_addr_t is meant for the bus; the driver should treat it as opaque and never interpret it directly.
Depending on the lifetime of the DMA buffer, there are two types of DMA mappings:
3.2.1 Coherent DMA mappings

Mappings of this type have the same lifetime as the driver. The mapped buffer must be simultaneously accessible to both the CPU and the peripheral, so a coherent mapping must be established in cache-coherent memory. Mappings of this type are expensive to establish and use.
A coherent mapping is established with dma_alloc_coherent, whose prototype is:
void *dma_alloc_coherent(struct device *dev, size_t size, dma_addr_t *dma_handle, gfp_t gfp);
It performs both the allocation and the mapping of the buffer. The parameters mean:
dev: the device structure
size: buffer size in bytes
dma_handle: the bus address associated with the buffer
gfp: allocation flags

The function returns the virtual address of the buffer.
When finished with it, the DMA buffer must be freed with dma_free_coherent, whose prototype is:
void dma_free_coherent(struct device *dev, size_t size, void *vaddr, dma_addr_t dma_handle);
The parameters have the same meaning as at allocation time.
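A minimal sketch of the allocate/use/free cycle (the buffer size and the surrounding driver logic are assumptions):

#include <linux/dma-mapping.h>

#define BUF_SIZE 4096  /* hypothetical buffer size */

static int example_coherent(struct device *dev)
{
	dma_addr_t bus_addr;
	void *vaddr;

	vaddr = dma_alloc_coherent(dev, BUF_SIZE, &bus_addr, GFP_KERNEL);
	if (!vaddr)
		return -ENOMEM;

	/* Program the device with bus_addr; the CPU accesses vaddr. */

	dma_free_coherent(dev, BUF_SIZE, vaddr, bus_addr);
	return 0;
}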
In addition to the two APIs above, the kernel provides a mechanism for small coherent DMA mappings: the DMA pool, which produces smaller coherent DMA buffers.
To use buffers from a DMA pool, the pool must first be created with dma_pool_create; it is destroyed with dma_pool_destroy. The prototype is:
struct dma_pool *dma_pool_create(const char *name, struct device *dev, size_t size, size_t align, size_t boundary);
The parameters have the following meanings:
name: the name of the DMA pool
dev: pointer to the device structure
size: the size of buffers allocated from this pool
align: the alignment that allocations from this pool must follow
boundary: if nonzero, memory returned from the pool may not cross a power-of-two boundary of this size

void dma_pool_destroy(struct dma_pool *pool);
DMA buffers are then allocated from the pool with dma_pool_alloc, whose prototype is:
void *dma_pool_alloc(struct dma_pool *pool, gfp_t mem_flags, dma_addr_t *handle);
The parameters have the following meanings:
pool: the DMA pool to allocate from
mem_flags: allocation flags
handle: the bus address corresponding to the buffer

The return value is the virtual address of the buffer.
A buffer allocated from the pool is released with dma_pool_free. The prototype is:
void dma_pool_free(struct dma_pool *pool, void *vaddr, dma_addr_t dma);

Its parameters mean the same as in dma_pool_alloc.
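A minimal sketch of the whole pool lifecycle (the 64-byte block size, 16-byte alignment, and pool name are assumptions for illustration):

#include <linux/dmapool.h>

static int example_pool(struct device *dev)
{
	struct dma_pool *pool;
	dma_addr_t bus_addr;
	void *vaddr;

	/* 64-byte blocks, 16-byte aligned, no boundary restriction. */
	pool = dma_pool_create("example_pool", dev, 64, 16, 0);
	if (!pool)
		return -ENOMEM;

	vaddr = dma_pool_alloc(pool, GFP_KERNEL, &bus_addr);
	if (vaddr) {
		/* ... hand bus_addr to the device, use vaddr from the CPU ... */
		dma_pool_free(pool, vaddr, bus_addr);
	}

	dma_pool_destroy(pool);
	return vaddr ? 0 : -ENOMEM;
}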
3.2.2 Streaming DMA mappings

A streaming DMA mapping is typically established for a single DMA operation. On some architectures streaming mappings can be optimized, at the price of strict access rules. When choosing a mapping type, streaming DMA should be preferred: on systems that support mapping registers, every DMA mapping ties up one or more mapping registers on the bus, and a coherent mapping, with its long lifetime, can occupy these scarce resources for a long time, which is often wasteful. Moreover, on some hardware, streaming mappings can be optimized in ways unavailable to coherent mappings.

3.2.2.1 Establishing a streaming DMA mapping

The interface for streaming mappings is more complex than for coherent mappings, because:
A streaming mapping must be able to work with a buffer the driver has already allocated, and therefore has to cope with addresses it did not choose.
On some architectures, a streaming mapping can cover multiple discontiguous pages and multiple "scatter/gather" buffers.
When creating a streaming mapping, the direction of the data flow must be specified.

For the direction, the kernel defines the following enumerated values:
DMA_TO_DEVICE
DMA_FROM_DEVICE
DMA_BIDIRECTIONAL
DMA_NONE

Except for DMA_NONE, the meanings are self-evident; DMA_NONE exists only for debugging.
Drivers should not simply always use DMA_BIDIRECTIONAL: on some architectures it causes a sharp drop in performance.
When there is only one buffer to transfer, it is mapped with dma_map_single, whose prototype is:
dma_addr_t dma_map_single (struct device *dev, void *ptr, size_t size, enum dma_data_direction direction);
It establishes a streaming mapping, associating the kernel virtual address with a bus address. Once this step completes, the kernel guarantees that all data in the buffer is actually in main memory rather than only in the CPU cache. The parameters mean:
dev: pointer to the device structure
ptr: pointer to the DMA buffer
size: size of the buffer
direction: direction of the data flow

After the transfer, the mapping is removed with dma_unmap_single, whose prototype is:
void dma_unmap_single(struct device *dev, dma_addr_t dma_addr, size_t size, enum dma_data_direction direction);
It removes the specified mapping; the parameters mean the same as when mapping.
Streaming DMA Rules:
A buffer may only be used for transfers in the specified direction.
Once mapped, the buffer belongs to the device, not the processor; the driver may not access it in any way until the mapping is removed.
The mapping may not be removed while a DMA transfer is active, that is, while the device is still using the buffer.

The kernel also provides a way for the driver to access the contents of a streaming DMA buffer without removing the mapping: first call dma_sync_single_for_cpu, after which the CPU owns the buffer and may access it; afterwards, call dma_sync_single_for_device to return ownership of the buffer to the device.
void dma_sync_single_for_cpu (struct device *dev, dma_addr_t dma_handle, size_t size, enum dma_data_direction direction);
void dma_sync_single_for_device(struct device *dev, dma_addr_t addr, size_t size, enum dma_data_direction dir);
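A minimal sketch of one transfer to the device under these rules (the buffer, its size, and the device-start step are assumptions):

#include <linux/dma-mapping.h>

static int example_streaming(struct device *dev, void *buf, size_t size)
{
	dma_addr_t bus_addr;

	/* buf was already filled by the driver; map it for device reads. */
	bus_addr = dma_map_single(dev, buf, size, DMA_TO_DEVICE);
	if (dma_mapping_error(dev, bus_addr))
		return -EIO;

	/* ... program the device with bus_addr and start the transfer;
	 * the driver must not touch buf until the transfer completes ... */

	dma_unmap_single(dev, bus_addr, size, DMA_TO_DEVICE);
	return 0;
}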
3.2.2.2 Single-page streaming mappings
The generic DMA layer also provides APIs for mapping and unmapping a single page:
dma_addr_t dma_map_page (struct device *dev, struct page *page, size_t offset, size_t size, enum dma_data_direction dir);
The parameter meaning is as follows:
dev: pointer to the device structure
page: pointer to the page to be used as the DMA buffer
offset: where within the page the mapping starts
size: size of the mapped area
dir: direction of the data flow

The function returns the bus address of the buffer. As the parameters show, it is possible to map only part of a page, but this is not recommended: the page is the unit by which the kernel manages physical memory and provides coherency control, and mapping a fraction of a page can cause coherency problems.
void dma_unmap_page(struct device *dev, dma_addr_t addr, size_t size, enum dma_data_direction dir);
This function removes the mapping.
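A minimal sketch mapping one whole page for an inbound transfer (the allocation and the transfer step are assumptions):

#include <linux/dma-mapping.h>
#include <linux/gfp.h>

static int example_map_page(struct device *dev)
{
	struct page *page = alloc_page(GFP_KERNEL);
	dma_addr_t bus_addr;

	if (!page)
		return -ENOMEM;

	/* Map the full page (offset 0) for reading from the device. */
	bus_addr = dma_map_page(dev, page, 0, PAGE_SIZE, DMA_FROM_DEVICE);
	if (dma_mapping_error(dev, bus_addr)) {
		__free_page(page);
		return -EIO;
	}

	/* ... start the transfer and wait for it to complete ... */

	dma_unmap_page(dev, bus_addr, PAGE_SIZE, DMA_FROM_DEVICE);
	__free_page(page);
	return 0;
}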
3.2.2.3 Scatter/gather mappings

The generic DMA layer also provides a special type of streaming DMA mapping: the scatter/gather mapping. It creates DMA mappings for multiple buffers in one call. The prototype is:
int dma_map_sg(struct device *dev, struct scatterlist *sg, int nents, enum dma_data_direction direction);
The parameters have the following meanings:
dev: pointer to the device structure
sg: pointer to the first buffer entry in the scatterlist
nents: how many buffers sg contains
direction: direction of the data flow

The return value is the number of buffers successfully mapped. If some buffers in the scatter/gather list are physically or virtually adjacent, or the IOMMU can map them into a single block of memory, the return value may be smaller than the nents passed in.
The data structure scatterlist contains information about each buffer, which is defined as follows:
struct scatterlist {
#ifdef CONFIG_DEBUG_SG
	unsigned long sg_magic;
#endif
	unsigned long page_link;
	unsigned int offset;
	unsigned int length;
	dma_addr_t dma_address;
#ifdef CONFIG_NEED_SG_DMA_LENGTH
	unsigned int dma_length;
#endif
};
Note that once an sg list has been mapped, it must not be mapped again; remapping would destroy the information stored in it. For each buffer in the list, dma_map_sg generates the correct bus address for the device. The driver should use those bus addresses, and the kernel provides two related macros:
dma_addr_t sg_dma_address (struct scatterlist *sg);
Returns the bus (DMA) address of this scatterlist entry.
unsigned int sg_dma_len (struct scatterlist *sg);
Used to return the length of this buffer.
void dma_unmap_sg(struct device *dev, struct scatterlist *list, int nents, enum dma_data_direction direction);
This function removes a scatter/gather mapping. nents must equal the value originally passed to dma_map_sg, not the value dma_map_sg returned.
As with single mappings, if the CPU must access buffers that are already mapped, ownership of those buffers must first be handed back to the CPU. The corresponding APIs are:
void dma_sync_sg_for_cpu (struct device *dev, struct scatterlist *sg, int nents, enum dma_data_direction direction);
void dma_sync_sg_for_device(struct device *dev, struct scatterlist *sg, int nents, enum dma_data_direction direction);
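A minimal sketch of mapping a small page array through a scatterlist (the page array, its length, and the per-entry device programming are assumptions):

#include <linux/dma-mapping.h>
#include <linux/scatterlist.h>

#define NPAGES 4  /* hypothetical number of buffers */

static int example_sg(struct device *dev, struct page **pages)
{
	struct scatterlist sgl[NPAGES];
	struct scatterlist *sg;
	int i, nmapped;

	sg_init_table(sgl, NPAGES);
	for (i = 0; i < NPAGES; i++)
		sg_set_page(&sgl[i], pages[i], PAGE_SIZE, 0);

	nmapped = dma_map_sg(dev, sgl, NPAGES, DMA_TO_DEVICE);
	if (!nmapped)
		return -EIO;

	/* Program the device with one bus address/length per mapped entry. */
	for_each_sg(sgl, sg, nmapped, i) {
		/* ... hand (sg_dma_address(sg), sg_dma_len(sg)) to the device ... */
	}

	/* Unmap with the original NPAGES, not the value dma_map_sg returned. */
	dma_unmap_sg(dev, sgl, NPAGES, DMA_TO_DEVICE);
	return 0;
}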
Four, the DMA controller (DMAC)

The DMA controller holds the information needed for a DMA transfer, such as the direction of the transfer, the memory address, and the size of the data to transfer. It also contains a counter that tracks the state of the transfer in progress. When the controller receives a DMA request signal, it takes control of the bus and drives the signal lines so that the device can read or write its data.
When a peripheral wants to transfer data, it first asserts the DMA request line; the actual transfer is managed by the DMAC. When the DMA controller grants the device the bus, the device reads or writes on the bus, and completion is usually signalled by an interrupt. The peripheral's driver is responsible for supplying the DMAC with the transfer direction, the bus address, and the size of the data, and also for preparing the data to transfer and responding to the interrupt at the end of the DMA.
A DMA controller contains multiple DMA channels (four, for example), each associated with a set of DMA registers holding the information needed for a DMA operation; the number of channels therefore determines how many DMA transfers the controller can manage simultaneously. The size of each DMA transfer is kept in the DMA controller and indicates how many bus cycles the transfer needs; bus cycles multiplied by bus width gives the amount of data moved per transfer. The DMA controller is a system-wide resource, and DMA resources exist as channels. The kernel provides a set of APIs to manage this resource.
4.1 Registering DMA

As with interrupts, the kernel provides an API for requesting the use of a DMA channel:
int request_dma(unsigned int chan, const char *dev_id);
The meaning of each parameter:
chan: the requested channel number; a value less than MAX_DMA_CHANNELS.
dev_id: identifies who is requesting the DMA channel.
The function returns 0 on success.
void free_dma(unsigned int channel);
This function is used to free DMA channel resources.
In general, if DMA is used together with interrupts, it is recommended to request the interrupt first and the DMA channel second, and to release them in reverse order: the DMA channel first, then the interrupt, as in the sketch below.
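A minimal sketch of that ordering (the IRQ number, channel number, and handler are assumptions for illustration):

#include <linux/interrupt.h>
#include <asm/dma.h>

#define EXAMPLE_IRQ 5  /* hypothetical interrupt line */
#define EXAMPLE_DMA 3  /* hypothetical DMA channel */

static irqreturn_t example_isr(int irq, void *dev_id)
{
	/* ... acknowledge the device, wrap up the DMA transfer ... */
	return IRQ_HANDLED;
}

static int example_acquire(void)
{
	int ret = request_irq(EXAMPLE_IRQ, example_isr, 0, "example", NULL);

	if (ret)
		return ret;

	ret = request_dma(EXAMPLE_DMA, "example");
	if (ret)
		free_irq(EXAMPLE_IRQ, NULL);
	return ret;
}

static void example_release(void)
{
	free_dma(EXAMPLE_DMA);
	free_irq(EXAMPLE_IRQ, NULL);
}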
4.2 Setting up the DMA controller

Once the DMA resources have been acquired, the device driver must set up the DMA controller properly before DMA (a DMA read or write) can actually work.
The DMA controller is a shared resource and cannot be configured concurrently, so it is protected by a spinlock, dma_spin_lock. Device drivers use the lock through the following two functions:
unsigned long claim_dma_lock(void);
It acquires the DMA spinlock; its return value must later be passed to the function that releases the lock.
void release_dma_lock(unsigned long flags);
It releases the DMA spinlock.
The spinlock protects the DMA controller, so a driver must hold it while configuring the controller. The configuration APIs include:
void set_dma_mode(unsigned int channel, char mode);

Sets the transfer mode of the given DMA channel.

void set_dma_addr(unsigned int channel, unsigned int addr);

Sets the bus address for the given DMA channel.

void set_dma_count(unsigned int channel, unsigned int count);

Sets the number of bytes the given DMA channel will transfer.

void enable_dma(unsigned int channel);

Enables the specified DMA channel.

void disable_dma(unsigned int channel);

Disables the specified DMA channel. For more APIs, see the dma.h of the relevant BSP.
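A minimal sketch of configuring a channel for one transfer while holding the lock (the mode constant DMA_MODE_READ and the helper clear_dma_ff come from asm/dma.h on ISA-style platforms; the channel, address, and count values are assumptions):

#include <asm/dma.h>

static void example_setup_dmac(unsigned int channel,
                               unsigned int bus_addr, unsigned int count)
{
	unsigned long flags = claim_dma_lock();

	disable_dma(channel);                 /* quiesce the channel first */
	clear_dma_ff(channel);                /* reset the controller's flip-flop */
	set_dma_mode(channel, DMA_MODE_READ); /* device -> memory */
	set_dma_addr(channel, bus_addr);
	set_dma_count(channel, count);
	enable_dma(channel);

	release_dma_lock(flags);
}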