STM32 GPIO Fast data transfer with DMA



AN2548 - Using the STM32F101xx and STM32F103xx DMA controller


The DMA controller is a device on the AMBA Advanced High-performance Bus (AHB) with two AHB ports: a slave port, through which the DMA is configured, and a master port, through which the DMA transfers data between the different slave devices.



The role of the DMA is to complete data transfers in the background, without intervention from the Cortex-M3 core.



While the transfer is in progress, the main processor can perform other tasks; its operation is interrupted only after the entire data block has been transferred, when the data needs to be processed.



Large amounts of data can thus be transferred with little impact on system performance.



DMA is primarily used to implement a centralized data buffer store for the different peripheral modules (usually located in the system SRAM).



Compared with a distributed solution, where each peripheral implements its own data storage, this approach is better in terms of both chip area and power consumption.



The STM32F10xxx DMA controller leverages the Cortex-M3 Harvard architecture and the multi-layer bus system to achieve very low DMA data transfer latency and very low CPU interrupt response latency.


Key Features of DMA


DMA has the following characteristics:


    • 7 DMA channels (channels 1 to 7), each supporting unidirectional source-to-destination transfers
    • Hardware DMA channel priorities and programmable software DMA channel priorities
    • Memory-to-memory, memory-to-peripheral, peripheral-to-memory and peripheral-to-peripheral transfers (the memory can be SRAM or Flash)
    • Control of hardware/software-triggered transfers
    • Automatic increment of memory and peripheral pointers during the transfer
    • Programmable transfer data word size
    • Automatic bus error management
    • Circular/non-circular mode
    • Transfer of up to 65,536 data words


The DMA is designed to provide a relatively large data buffer for all peripherals; this buffer is usually located in the system SRAM.



Each channel is assigned to a single peripheral at a given time; peripherals connected to the same DMA channel (channels 1 to 7 in Table 1) cannot use the DMA at the same time.





Performance analysis


The STM32F10xxx has two main bus masters: the Cortex-M3 processor and the DMA.



Through the bus matrix, they connect to the slave buses: the Flash bus, the SRAM bus and the AHB system bus. The AHB system bus in turn connects, through bridges, to the two APB buses that serve all of the embedded peripherals (see Figure 1).



The bus matrix has two main features that maximize system performance and reduce latency:
Round-robin priority scheme
Multi-layer structure and bus stealing


Round-robin priority scheme


The NVIC and the Cortex-M3 processor implement a high-performance, low-latency interrupt scheme.



All Cortex-M3 instructions either execute in a single cycle or can be interrupted at the bus-cycle level.



To maintain this advantage at the system level, the DMA and the bus matrix must ensure that the DMA cannot occupy the bus for a long time.



The round-robin priority scheme ensures that, when necessary, the CPU can access any slave bus on every second bus cycle.



Therefore, the maximum bus-system latency seen by the CPU for its first piece of data is one bus cycle (at most two APB clock cycles).


Multi-layer structure and bus stealing


The multi-layer architecture allows the two bus masters to perform data transfers simultaneously, as long as they address different slave modules.



Building on the Cortex-M3 Harvard architecture, this multi-layer structure increases data parallelism, which reduces code execution time and optimizes DMA efficiency.



Since the flash memory is accessed over a completely independent bus, the DMA and the CPU compete only when they need to access data on the same slave bus.



In addition, whereas other DMA controllers operate in burst mode, each DMA data transfer on the STM32F10xxx uses only a single bus cycle (bus stealing).



With the bus-stealing access mechanism, the maximum time the CPU must wait for a data access is very short (one bus cycle).



Typically, CPU accesses to the SRAM alternate with the DMA operation: the CPU accesses the SRAM while the DMA accesses the peripherals over the APB bus.



Although DMA burst mode can improve the data transfer speed (DMA access to peripherals), it inevitably slows down CPU execution. The difference between the bus-stealing and burst mechanisms is shown in Figure 2.
Figure 2: Bus-stealing and burst mechanisms for DMA transfers






The extreme case occurs when the CPU copies a block of data from one place in memory to another.



In this case, the execution of the software must wait until the entire DMA transfer is complete.



In fact, most of the time the CPU is doing data processing (register reads/writes) rather than data accesses, so CPU and DMA accesses to data naturally alternate.



The inherent parallelism of the STM32F10xxx bus structure, combined with the DMA bus-stealing mechanism, ensures that the CPU does not wait long for data to be read from the SRAM.



A DMA based on the bus-stealing mechanism therefore uses the bus more efficiently, significantly reducing software execution time.


DMA latency


The DMA completes a data transfer from a peripheral to the SRAM in three steps:


    • 1. DMA request arbitration
    • 2. Data read from the peripheral (DMA source)
    • 3. Data write to the SRAM (DMA destination)



When the DMA transfers data from memory to a peripheral (e.g. an SPI transfer), the following steps are performed:


    • 1. DMA request arbitration
    • 2. Data read from the SRAM (DMA source)
    • 3. Data write to the peripheral over the APB bus (DMA destination)


The total time TS to service each DMA channel is:



TS = TA + TACC + TSRAM, where:



TA is the arbitration time: TA = 1 AHB clock cycle



TACC is the peripheral access time: TACC = 1 AHB clock cycle (bus matrix arbitration) + 2 APB clock cycles (actual data transfer) + 1 AHB clock cycle (bus synchronization)



TSRAM is the SRAM read/write time: TSRAM = 1 AHB clock cycle (bus matrix arbitration) + 1 AHB clock cycle (single read or write operation), or + 2 AHB clock cycles in the case of an SRAM read followed by an SRAM write



When a DMA channel is idle, or when step 3 of the previous service has completed, the DMA controller compares the priorities of all pending DMA requests (the software priorities are compared first; if they are equal, the hardware priorities decide). The highest-priority channel is serviced next, and the DMA begins step 2.



While a channel is being serviced (steps 2 and 3 in progress), no other channel can be serviced, regardless of its priority.



When at least two DMA channels are enabled at the same time, the DMA latency of the highest-priority channel is its own transfer time (excluding the arbitration phase) plus the transfer time of the DMA channel currently being serviced (the one that keeps the highest-priority channel pending).


Data bus bandwidth limitation


The data bus bandwidth limitation stems mainly from the fact that the APB buses are slower than the system SRAM and the AHB bus.



For the highest-priority DMA channel, the following two conditions must be considered (see Figure 3):
1. When more than one DMA channel is enabled, the highest-priority channel must occupy a data bandwidth on the APB bus lower than 25% of the maximum APB transfer rate.
All of the APB bus transfer time must be taken into account, i.e. 2 APB clock cycles plus the 2 AHB clock cycles used for arbitration/synchronization.
2. Although high-speed/high-priority DMA transfers usually take place on APB2 (the faster APB bus), the CPU and other DMA channels may still access peripherals on APB1.
Approximately 3/4 of the APB transfer time is spent on APB1, and the minimum APB2 frequency depends on the data bandwidth of the fastest DMA channel.



The largest APB clock divider factors are given by the following equations:
fAHB > (2 × N2 + 6 × N1 + 6) × Bmax, if N2 < N1
N1 < (fAHB / Bmax) / 8
where fAHB is the AHB clock frequency, N1 and N2 are the APB1 and APB2 clock divider factors, and Bmax is the maximum data bandwidth on APB2, in transfers per second.
Figure 3 Occupancy of the APB bus during DMA transfer





Channel Priority Selection


To achieve continuous transfer of peripheral data, the associated DMA channel must be able to sustain the peripheral data rate and ensure that the DMA service latency is shorter than the interval between two consecutive peripheral data items.



High-speed/high-bandwidth peripherals must have the highest DMA priorities. This ensures that the maximum data latency remains tolerable for these peripherals, and that overrun and underrun conditions are avoided.



In the case of equal bandwidth requirements, it is recommended to assign a higher priority to peripherals working in slave mode (which cannot control the data transfer speed) and a relatively low priority to peripherals working in master mode (which can control the data flow).



By default, the channel and hardware priorities (from 1 to 7) are allocated so that the highest priorities are assigned to the fastest peripherals.



Of course, this default allocation may not suit every application;



in that case, the user can configure a software priority for each channel (4 levels, from very high to low); the software priority takes precedence over the hardware priority.



When several peripherals are used at the same time (with or without DMA), the user must ensure that the internal system can sustain the total data bandwidth required by the application.



A compromise must be found by weighing the following two factors:


    • Application requirements for each peripheral
    • Internal data bandwidth
Application Requirements


In the case of the SPI interface, the data bandwidth is obtained by dividing the baud rate by the SPI data word length (because one data word is immediately followed by the next).

Assuming an SPI baud rate of 18 Mbit/s with data transferred in 8-bit mode, the internal data bandwidth requirement is 2.25 M transfers/s; if the SPI is configured in 16-bit mode, the data bandwidth is 1.125 M transfers/s.



Note: In SPI 16-bit mode, at the same baud rate, the data bandwidth is halved and only 1.125 M transfers/s are required.

It is therefore highly recommended to use the 16-bit mode whenever possible, to reduce bus occupancy and power consumption.


Internal data bandwidth


The internal data bandwidth depends on the following two conditions:


    • Bus frequency - the available data bandwidth is proportional to the bus clock frequency
    • Bus type - an AHB data transfer takes 2 clock cycles (an SRAM read followed by an SRAM write takes 3 cycles).
      A data transfer to a peripheral over an APB bus takes 2 APB clock cycles plus 2 AHB clock cycles for bus arbitration and data synchronization.


To maintain a reasonable level of system and CPU performance, it is recommended that the DMA occupy at most 2/3 of each bus.


GPIO fast data transfer using DMA


This example demonstrates how different peripherals can be used for the DMA request and for the data transfer.



This mechanism allows a simple, fast parallel synchronous interface to be implemented without using the CPU (for example, a camera interface).



Timer 3 and DMA1 channel 6 (connected to TIM3_TRIG) are used to implement the data acquisition interface.



16-bit parallel data is available on the GPIO port.



An external clock signal is applied to the external trigger input of Timer 3; on each rising edge of the external trigger, the timer generates a DMA request.



Since the GPIO port data register address is set as the peripheral address of DMA1 channel 6, the DMA controller reads the data from the GPIO port on each DMA request and stores it in an SRAM buffer.









This example shows how the DMA can be used to acquire data from a (parallel) GPIO port, synchronized with an (external) clock signal. (For the sake of this demo, the clock is generated by software toggling GPIOA pin 6.)



The DMA channel is controlled through TIM3 Channel 1 (input capture mode), which uses DMA1 channel 6.



This is a non-standard use of the DMA, as the peripheral controlling the DMA request (TIM3) is neither the source nor the destination of the DMA data transfer.



Instead, the data source is a GPIO port (PD0-PD15), programmed in GPIO input mode, and the data destination is a RAM buffer (accessed by DMA1 channel 6 in circular mode).



The TIM3 clock frequency is set to 72 MHz, and the timer is used in input capture/DMA mode.



The system clock is set to 72MHz.






Hardware and Software Environment



- This example runs on STM32F10x high-density, medium-density and low-density devices.

- This example has been tested with the STMicroelectronics STM3210B-EVAL evaluation board and can easily be tailored to any other supported device and development board.

- STM3210B-EVAL set-up:
- Connect a signal generator to PD.00 to PD.15.
- In the example, PA.06 (the capture clock signal) is driven internally (by software).
By removing the software control of this pin (leaving the GPIO in input floating mode), an external clock signal can be used.
Alternatively, PA.06 may be driven externally (leaving PA.06 in input floating mode - alternate function).


 
#include "stm32f10x.h"

TIM_TimeBaseInitTypeDef TIM_TimeBaseStructure;
TIM_ICInitTypeDef TIM_ICInitStructure;
DMA_InitTypeDef DMA_InitStructure;
__IO uint16_t Parallel_Data_Buffer[ 512 ];
ErrorStatus HSEStartUpStatus;


void RCC_Configuration( void );
void GPIO_Configuration( void );

int main( void )
{  
  /* System Clocks Configuration ---------------------------------------------*/
  RCC_Configuration( );  
  /* GPIO Configuration ------------------------------------------------------*/
  GPIO_Configuration( );  
  /* DMA Channel6 Configuration ----------------------------------------------*/
  DMA_InitStructure.DMA_PeripheralBaseAddr = (uint32_t) &GPIOD->IDR;
  DMA_InitStructure.DMA_MemoryBaseAddr = (uint32_t) Parallel_Data_Buffer;
  DMA_InitStructure.DMA_DIR = DMA_DIR_PeripheralSRC;
  DMA_InitStructure.DMA_BufferSize = 512;
  DMA_InitStructure.DMA_PeripheralInc = DMA_PeripheralInc_Disable;
  DMA_InitStructure.DMA_MemoryInc = DMA_MemoryInc_Enable;
  DMA_InitStructure.DMA_PeripheralDataSize = DMA_PeripheralDataSize_HalfWord;
  DMA_InitStructure.DMA_MemoryDataSize = DMA_MemoryDataSize_HalfWord;
  DMA_InitStructure.DMA_Mode = DMA_Mode_Circular;
  DMA_InitStructure.DMA_Priority = DMA_Priority_VeryHigh;
  DMA_InitStructure.DMA_M2M = DMA_M2M_Disable;
  DMA_Init( DMA1_Channel6, &DMA_InitStructure );  
  /* Enable DMA Channel6 */
  DMA_Cmd( DMA1_Channel6, ENABLE );
  
  /* TIM3 Configuration ------------------------------------------------------*/
  /* TIM3CLK = 72 MHz, Prescaler = 0, TIM3 counter clock = 72 MHz */
  /* Time base configuration */
  TIM_TimeBaseStructure.TIM_Period = 256;
  TIM_TimeBaseStructure.TIM_Prescaler = 0;
  TIM_TimeBaseStructure.TIM_ClockDivision = 0;
  TIM_TimeBaseStructure.TIM_CounterMode = TIM_CounterMode_Up;
  TIM_TimeBaseInit( TIM3, &TIM_TimeBaseStructure );
  
  /* Input Capture Mode configuration: Channel1 */
  TIM_ICInitStructure.TIM_Channel = TIM_Channel_1;
  TIM_ICInitStructure.TIM_ICPolarity = TIM_ICPolarity_Rising;
  TIM_ICInitStructure.TIM_ICSelection = TIM_ICSelection_DirectTI;
  TIM_ICInitStructure.TIM_ICPrescaler = TIM_ICPSC_DIV1;
  TIM_ICInitStructure.TIM_ICFilter = 0;
  TIM_ICInit( TIM3, &TIM_ICInitStructure );  
  /* Enable TIM3 DMA */
  TIM_DMACmd( TIM3, TIM_DMA_CC1, ENABLE );  
  /* Enable TIM3 counter */
  TIM_Cmd( TIM3, ENABLE );
  
  while ( 1 )
  {
    /* Trigger TIM3 IC event => DMA request by toggling PA.06 */
    GPIO_ResetBits( GPIOA, GPIO_Pin_6 );
    GPIO_SetBits( GPIOA, GPIO_Pin_6 );
  }
}


void RCC_Configuration( void )
{
  /* RCC system reset(for debug purpose) */
  RCC_DeInit( );
  
  /* Enable HSE */
  RCC_HSEConfig( RCC_HSE_ON );
  
  /* Wait till HSE is ready */
  HSEStartUpStatus = RCC_WaitForHSEStartUp( );
  
  if ( HSEStartUpStatus == SUCCESS )
  {
    /* Enable Prefetch Buffer */
    FLASH_PrefetchBufferCmd( FLASH_PrefetchBuffer_Enable );
    
    /* Flash 2 wait state */
    FLASH_SetLatency( FLASH_Latency_2 );
    
    /* HCLK = SYSCLK */
    RCC_HCLKConfig( RCC_SYSCLK_Div1 );
    
    /* PCLK2 = HCLK */
    RCC_PCLK2Config( RCC_HCLK_Div1 );
    
    /* PCLK1 = HCLK/2 */
    RCC_PCLK1Config( RCC_HCLK_Div2 );
    
    /* ADCCLK = PCLK2/4 */
    RCC_ADCCLKConfig( RCC_PCLK2_Div4 );
    
    /* PLLCLK = 8MHz * 9 = 72 MHz */
    RCC_PLLConfig( RCC_PLLSource_HSE_Div1, RCC_PLLMul_9 );
    
    /* Enable PLL */
    RCC_PLLCmd( ENABLE );
    
    /* Wait till PLL is ready */
    while ( RCC_GetFlagStatus( RCC_FLAG_PLLRDY ) == RESET )
    {
    }
    
    /* Select PLL as system clock source */
    RCC_SYSCLKConfig( RCC_SYSCLKSource_PLLCLK );
    
    /* Wait till PLL is used as system clock source */
    while ( RCC_GetSYSCLKSource( ) != 0x08 )
    {
    }
  }
  
  /* Enable TIM3 clock */
  RCC_APB1PeriphClockCmd( RCC_APB1Periph_TIM3, ENABLE );  
  /* Enable DMA clock */
  RCC_AHBPeriphClockCmd( RCC_AHBPeriph_DMA1, ENABLE );  
  /* GPIOA and GPIOD clock enable */
  RCC_APB2PeriphClockCmd( RCC_APB2Periph_GPIOA | RCC_APB2Periph_GPIOD, ENABLE );
  
}


void GPIO_Configuration( void )
{
  GPIO_InitTypeDef GPIO_InitStructure;
  
  /* GPIOA Configuration: PA6 as GPIO output, toggled by software to clock
     TIM3 Channel1 (input capture). PD0-PD15 are left in their reset state
     (input floating), which is the input mode required by this example. */
  GPIO_InitStructure.GPIO_Pin = GPIO_Pin_6;
  GPIO_InitStructure.GPIO_Mode = GPIO_Mode_Out_PP;
  GPIO_InitStructure.GPIO_Speed = GPIO_Speed_50MHz;
  
  GPIO_Init( GPIOA, &GPIO_InitStructure );
}









