FPGA-based FFT Processor Design
[Date: 2008-10-23]
Source: Foreign Electronic Components, by Yang Xing, Xie Zhiyuan, and Rong Li
1 Introduction
With the rapid development of digital technology, digital signal processing has penetrated into many disciplines. In digital signal processing, many algorithms, such as correlation, filtering, spectral estimation, and convolution, can be implemented by conversion to the discrete Fourier transform (DFT), which provides a theoretical tool for analyzing discrete signals. However, the DFT is difficult to use directly because of its large amount of computation. The fast Fourier transform (FFT) greatly reduced this computational workload and fundamentally changed the standing of the Fourier transform, making it one of the core techniques in digital signal processing. It is widely used in fields such as radar, observation, tracking, high-speed image processing, secure wireless communication, and digital communication.
At present, FFT algorithms are implemented in hardware mainly on general-purpose digital signal processors (DSPs), dedicated FFT devices, and field programmable gate arrays (FPGAs). A DSP implementation is pure software, is flexible, and suits complicated algorithms such as channel coding and decoding and QAM algorithms in communication systems. However, completing the FFT operation on the DSP takes a great deal of time, which reduces the data throughput of the whole system and fails to make full use of the DSP's software flexibility. Dedicated FFT devices are fast enough, but their peripheral circuits are complex, they scale poorly, and they are expensive. FPGAs now offer rich resources and are well suited to pipelined and parallel structures. Combining the real-time requirements of the FFT with the design flexibility of the FPGA allows the parallel algorithm and the hardware structure to be matched closely, which improves processing speed while retaining high flexibility, low development cost, a short development cycle, and simple upgrades. To meet the actual need for FFT operations in an OFDM system, an FPGA-based design implementing the FFT algorithm is proposed. A 64-point FFT with a 16-bit data word length is taken as the example, and it is synthesized and simulated in the Quartus II software.
2 FFT Principle and Algorithm Structure
The FFT is a fast algorithm for the discrete Fourier transform (DFT). For a finite-length discrete sequence x(n) of N points, the DFT is:

$$X(k) = \sum_{n=0}^{N-1} x(n)\, W_N^{nk}, \quad k = 0, 1, \ldots, N-1, \quad W_N = e^{-j 2\pi / N}$$
Completing an N-point DFT requires N^2 complex multiplications and N(N-1) complex additions. When the number of points is large, the amount of computation is large, which makes real-time signal processing difficult. The basic idea of the FFT is to exploit the periodicity, symmetry, and special values of the rotation factor (twiddle factor) W_N, as well as its reducibility with respect to the period N, to decompose a long N-point DFT step by step into shorter DFTs and merge like terms, which greatly reduces the amount of computation.
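The operation counts quoted above can be checked with a minimal Python sketch of the direct DFT. The function names `dft_direct` and `direct_dft_cost` are illustrative and do not come from the original design; the code only mirrors the DFT definition and counts one complex multiplication per term of each sum.

```python
import cmath

def dft_direct(x):
    """Direct N-point DFT: X(k) = sum over n of x(n) * W_N^(n*k)."""
    n_points = len(x)
    result = []
    for k in range(n_points):
        acc = 0j
        for n in range(n_points):
            # W_N^(n*k) with W_N = exp(-j*2*pi/N)
            acc += x[n] * cmath.exp(-2j * cmath.pi * n * k / n_points)
        result.append(acc)
    return result

def direct_dft_cost(n_points):
    """(complex multiplications, complex additions) for the direct DFT."""
    return n_points * n_points, n_points * (n_points - 1)

print(direct_dft_cost(64))   # (4096, 4032)
```

For the 64-point case treated in this article, the direct form already needs 4096 complex multiplications per transform.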
FFT algorithms fall into two classes. The first applies when N is an integer power of 2 and includes the radix-2, radix-4, real-factor, and split-radix algorithms. The second applies when N is not an integer power of 2 and is represented by the Winograd algorithm. In a hardware implementation, not only the algorithm's computational workload but also its complexity and modularity must be considered: an algorithm with simple control and a regular structure is preferable in hardware to one that merely reduces the operation count. Existing FPGA designs of FFT algorithms mostly target the first class. The second class has important theoretical value but is hard to implement in hardware. Because the number of points in this design is not large, and considering the area and cost of the FFT processor together, the radix-2 decimation-in-time FFT (radix-2 DIT-FFT) is used.
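For a sequence x(n) of length N = 2^M, where M is an integer, x(n) is split by the parity of n into two groups, n = 2r and n = 2r + 1, r = 0, 1, ..., N/2 - 1, so:

$$X(k) = \sum_{r=0}^{N/2-1} x(2r)\, W_N^{2rk} + \sum_{r=0}^{N/2-1} x(2r+1)\, W_N^{(2r+1)k} = A(k) + W_N^{k} B(k)$$

$$X(k + N/2) = A(k) - W_N^{k} B(k), \quad k = 0, 1, \ldots, N/2 - 1$$

where A(k) and B(k) are the N/2-point DFTs of the even-indexed and odd-indexed samples, respectively.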
Therefore, A(k) and B(k) fully determine X(k). Continuing the decomposition in the same way down to 2-point DFTs, the whole N-point FFT is decomposed into log2 N stages, each consisting of N/2 radix-2 butterfly operations. Figure 1 shows the flow graph of the DIT-FFT for N = 8.
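The even/odd decomposition can be sketched as a short recursive Python function. This is a software illustration of the radix-2 DIT idea only, not the hardware structure of the article; the names `fft_dit` and `butterfly_count` are illustrative.

```python
import cmath

def fft_dit(x):
    """Recursive radix-2 DIT FFT; len(x) must be a power of two."""
    n_points = len(x)
    if n_points == 1:
        return list(x)
    a = fft_dit(x[0::2])   # A(k): DFT of the even-indexed samples
    b = fft_dit(x[1::2])   # B(k): DFT of the odd-indexed samples
    half = n_points // 2
    out = [0j] * n_points
    for k in range(half):
        w = cmath.exp(-2j * cmath.pi * k / n_points)   # W_N^k
        out[k] = a[k] + w * b[k]           # X(k)
        out[k + half] = a[k] - w * b[k]    # X(k + N/2)
    return out

def butterfly_count(n_points):
    """(number of stages, total radix-2 butterflies) for an N-point FFT."""
    stages = n_points.bit_length() - 1     # log2(N) for a power of two
    return stages, stages * (n_points // 2)

print(butterfly_count(64))   # (6, 192)
```

Each butterfly contains one complex multiplication, so the 64-point transform needs 192 of them across its 6 stages.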
3 Structure Design of the FFT Processor
FFT implementations can use sequential, cascade, parallel, or array processing. Sequential processing uses only one butterfly unit for every operation. It is simple but slow. Cascade, parallel, and array processing are faster but occupy far more resources. Because the number of points in this design is small, an improved sequential scheme is adopted: on top of the basic sequential structure, the data transfers during FFT processing are controlled. This structure keeps the simple circuit and low resource usage of sequential processing while gaining some of the speed of cascade processing. The processor is partitioned into modules using a top-down method, as shown in Figure 2.
4 Module Design and Integrated Simulation
The FFT processor consists of memories, a butterfly operation unit, a rotation factor unit, a control unit, and data controllers. Each unit operates under the control and enable signals generated by the control unit.
4.1 Butterfly Operation Unit
The butterfly operation unit is a key part of the FFT processor and directly affects its overall performance. Figure 3 shows the signal flow graph of the radix-2 decimation-in-time butterfly, where p and q are the data indices, x_m(p) and x_m(q) are the inputs of the stage-m butterfly, x_{m+1}(p) and x_{m+1}(q) are its outputs, and W_N^r is the corresponding rotation factor:

$$x_{m+1}(p) = x_m(p) + W_N^{r}\, x_m(q)$$

$$x_{m+1}(q) = x_m(p) - W_N^{r}\, x_m(q)$$
As the formula above shows, a radix-2 butterfly requires one complex multiplication and two complex additions. To increase the computing speed, these are carried out in parallel using four real multipliers, three real adders, and three real subtractors. Let the input data be x1 = x1_r + j x1_im and x2 = x2_r + j x2_im, and let the rotation factor be W_N^r = c - jd. The outputs are then y1 = y1_r + j y1_im and y2 = y2_r + j y2_im. The implementation of the butterfly unit is shown in Figure 4.
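The count of four multipliers, three adders, and three subtractors follows from expanding (c - jd)(x2_r + j x2_im) into real arithmetic. The Python sketch below spells this out. The function name `butterfly` and the intermediate names `m1` to `m4`, `t_r`, and `t_im` are illustrative, and the grouping of operations is one arrangement consistent with the stated counts, not necessarily the circuit of Figure 4.

```python
def butterfly(x1_r, x1_im, x2_r, x2_im, c, d):
    """Radix-2 DIT butterfly with rotation factor W = c - j*d."""
    # Four real multiplications
    m1 = x2_r * c
    m2 = x2_im * d
    m3 = x2_im * c
    m4 = x2_r * d
    # W * x2: one addition (real part) and one subtraction (imaginary part)
    t_r = m1 + m2
    t_im = m3 - m4
    # y1 = x1 + W*x2: two additions; y2 = x1 - W*x2: two subtractions
    y1 = (x1_r + t_r, x1_im + t_im)
    y2 = (x1_r - t_r, x1_im - t_im)
    return y1, y2

# x1 = 1 + 2j, x2 = 3 + 4j, W = -j (c = 0, d = 1)
print(butterfly(1, 2, 3, 4, 0, 1))   # ((5, -1), (-3, 5))
```

The four products are independent of each other, which is what allows them to run in parallel in hardware.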
The data format is 16-bit fixed-point two's complement. Multiplier speed must be considered during design because it directly affects the operating speed of the whole FFT processor. The multiplier is generated with the megafunction provided by the Quartus II development software. Its two inputs are 16 bits wide and its output is 32 bits wide. Because one operand of the multiplier is a rotation factor, whose magnitude does not exceed 1, the multiplication should not change the amplitude of the input data, so the multiplier output should still be 16 bits. The 32-bit product is therefore truncated, and 16 bits of it are taken as the input to the adders and subtractors.
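The text does not say which 16 bits of the 32-bit product are kept. The Python sketch below assumes that the rotation factor is stored scaled by 2^14, as described in Section 4.3, so the product is shifted right by 14 bits and the low 16 bits of the result are kept. The function names `to_signed16` and `mul_q14` are illustrative.

```python
def to_signed16(value):
    """Wrap an integer into the 16-bit two's complement range."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value

def mul_q14(data, factor_q14):
    """Multiply 16-bit data by a rotation factor stored as w * 2**14.

    Assumption: the product is shifted right by 14 bits, which drops the
    fractional bits (rounding toward minus infinity), and 16 bits are kept.
    """
    product = data * factor_q14        # up to 32 bits wide in hardware
    return to_signed16(product >> 14)

print(mul_q14(192, 16384))    # 192  (factor 1.0 leaves the data unchanged)
print(mul_q14(192, 11585))    # 135  (factor about 0.7071)
print(mul_q14(-192, 11585))   # -136
```

The last two lines show the truncation error mentioned in Section 4.5: the exact products are about 135.76 and -135.76.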
4.2 Storage Unit
Memory is an essential unit of the FFT processor. The butterfly input and output data and the intermediate results all pass through memory, so the frequent read and write operations strongly affect overall FFT processing speed. In Figure 2, memory A and memory B each consist of a RAM and a state machine, with a data bus, an address bus, and a clock. Memory A receives the external input data, and memory B stores intermediate results. Except for the first butterfly stage, the input and output data of every stage pass through the memories. Two data controllers are placed between the two memories and the butterfly unit, so that the next set of butterfly operands can be read while the previous set of intermediate results is being written, which raises FFT processing speed.
4.3 Rotation Factor Unit
The rotation factor unit stores the rotation factors W_N^r = exp(-j2πr/N) required by the FFT operation. The rotation factors are computed in MATLAB and split into real and imaginary parts. Because these are fractional values with magnitude not exceeding 1, they must be converted to fixed point for the design. Each value is multiplied by 2^14, its integer part is taken and converted to a 16-bit fixed-point number, and the results are saved in .hex file format. The ROM is designed with the MegaWizard tool of Quartus II and initialized from the .hex file. Because of the symmetry and periodicity of the rotation factor, only part of the rotation factor table needs to be stored in ROM, and the factor required for each butterfly operation is obtained by changing the address.
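The quantization and the table reduction can be sketched in Python as follows. The source does not state which part of the table is stored, so the quarter-wave cosine table used here is an assumption, as are the names `QUARTER`, `rotation_factor`, and `hex_word`. The sketch produces 16-bit hex words only, not a complete .hex file.

```python
import math

N = 64              # FFT length used in the article
SCALE = 1 << 14     # rotation factors are multiplied by 2**14

# Assumed layout: cos(2*pi*r/N) for r = 0 .. N/4, integer part kept
QUARTER = [int(math.cos(2 * math.pi * r / N) * SCALE)
           for r in range(N // 4 + 1)]

def rotation_factor(r):
    """Return (c, d) with W_N^r = c - j*d for 0 <= r < N/2."""
    if r <= N // 4:
        # First quadrant: sin(theta) = cos(pi/2 - theta)
        return QUARTER[r], QUARTER[N // 4 - r]
    # Second quadrant: cos(theta) = -cos(pi - theta), sin(theta) = cos(theta - pi/2)
    return -QUARTER[N // 2 - r], QUARTER[r - N // 4]

def hex_word(value):
    """16-bit two's complement word as four hex digits."""
    return format(value & 0xFFFF, "04X")

print(len(QUARTER))                         # 17
print(rotation_factor(8))                   # (11585, 11585)
print(rotation_factor(24))                  # (-11585, 11585)
print(hex_word(rotation_factor(0)[0]))      # 4000
```

With this layout, 17 stored words cover all 32 rotation factors needed by a 64-point radix-2 FFT, at the cost of a little address and sign logic.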
4.4 Control Unit
The control unit coordinates and drives the other modules and plays a key role in the FFT operation. The read and write signals of memory A, memory B, the rotation factor unit, and the data controllers are all generated by the control unit. It is implemented as a finite state machine (FSM) and uses two internal counters to control the state transitions. The control unit has its own input clock, from which it generates the corresponding control signals.
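The source does not describe the states or the roles of the two counters. One plausible reading, sketched below in Python, is that one counter tracks the stage and the other the butterfly within the stage, and that the data addresses p and q and the rotation factor address r are derived from them. The sketch assumes an in-place radix-2 DIT FFT with bit-reversed input order, and the name `butterfly_schedule` is illustrative.

```python
def butterfly_schedule(n_points):
    """Yield (stage, p, q, r) for an in-place radix-2 DIT FFT.

    Assumes bit-reversed input order. p and q are the data addresses and
    r is the exponent of the rotation factor W_N^r.
    """
    stages = n_points.bit_length() - 1
    for stage in range(stages):               # outer counter: stage index
        span = 1 << stage                     # distance between p and q
        for count in range(n_points // 2):    # inner counter: butterfly index
            group, pos = divmod(count, span)
            p = group * 2 * span + pos
            q = p + span
            r = pos * (n_points // (2 * span))
            yield stage, p, q, r

for entry in butterfly_schedule(8):
    print(entry)
```

For N = 8 the last stage yields (2, 0, 4, 0), (2, 1, 5, 1), (2, 2, 6, 2), and (2, 3, 7, 3), which matches the final column of a standard 8-point DIT flow graph.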
4.5 Integrated Simulation
The design uses the Altera Quartus II software as the development platform and the EP1S25 FPGA of the Stratix series as the target device. Following a top-down approach, each module is designed, synthesized, and simulated in VHDL. To simplify the design, only one set of 64 complex numbers is applied at the data input, and the remaining input is set to 0. The real and imaginary parts are limited to ±1, ±2, ±3, ±4, and ±5. To prevent overflow, the input data are multiplied by a scale factor of 2^-9 and then by 2^15 before being converted to hexadecimal numbers. The output result is shown in Figure 5. Note that the simulation result is multiplied by 2^-6. Comparing the simulation results with the MATLAB calculation shows that the data are essentially the same, indicating that the design is correct. The error comes mainly from data truncation and from the approximation of the rotation factors.
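The two scale factors combine to a net factor of 2^-9 × 2^15 = 2^6 on the input, which accounts for the 2^-6 applied to the simulation result. The Python sketch below shows the conversion in both directions. The function names are illustrative, and the overflow bound assumes that the largest input component is 5 and that no further scaling takes place inside the processor, neither of which the source states explicitly.

```python
import math

def encode_input(value):
    """Scale by 2**-9, then by 2**15, and return the 16-bit word as hex."""
    word = int(value * 2**-9 * 2**15)      # net factor 2**6 = 64
    return format(word & 0xFFFF, "04X")

def decode_output(hex_word):
    """Undo the net 2**6 scaling on a 16-bit two's complement word."""
    word = int(hex_word, 16)
    if word & 0x8000:
        word -= 0x10000
    return word * 2**-6

def worst_case_output(max_component, n_points=64):
    """Upper bound on |X(k)| after scaling: N * max|x(n)| * 2**6."""
    return n_points * math.hypot(max_component, max_component) * 2**6

print(encode_input(3))                  # 00C0
print(encode_input(-5))                 # FEC0
print(decode_output("FEC0"))            # -5.0
print(worst_case_output(5) < 2**15)     # True
```

Under these assumptions the largest possible output magnitude is about 28963, which still fits in a signed 16-bit word.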
5 Conclusion
The FFT is an important operation in digital signal processing. It is widely used in radar, observation, tracking, high-speed image processing, secure wireless communication, and digital communication. This article has discussed the design of an FPGA-based 64-point FFT processor. The real and imaginary parts of the input data are represented as 16-bit binary numbers, the radix-2 DIT-FFT algorithm is used, and each module of the processor is designed in the Altera Quartus II software. The design is synthesized and simulated for the EP1S25 FPGA of the Stratix series, and the calculation results are correct. An FPGA-based FFT implementation offers advantages in size, speed, and flexibility.