Design and Implementation of a Real-Time Image Processing Platform Based on FPGA + DSP

Source: Electronic Technology Application. Authors: Lu Changhua, Shi Hongyuan, Liang Yinhai, Yin Jun

 

Before drugs are filled and packaged, the medicinal glass bottles must be inspected and nonconforming bottles removed. In China, the specifications of medicinal glass control bottles vary widely, for example in bottle height, bottom and wall thickness, and verticality, so imported inspection equipment performs poorly on them. There is therefore an urgent need for an automatic empty-bottle inspection system for medicinal control bottles that suits conditions in China.
The system must reach an online inspection speed of 25 bottles/s. It must also detect, and process in real time, the measured indicators of each control bottle, such as the bottom, the mouth, the body, and the body dimensions. This places high demands on the speed of data acquisition, storage, and transfer, and on processing speed and accuracy.
Conventional data acquisition schemes typically use a single-chip microcomputer (MCU) or a DSP as the controller for the analog-to-digital converter (ADC), memory, and other peripheral circuits [1]. However, the instruction cycle and processing speed of an MCU make it difficult to meet the demands of a multi-channel, high-speed acquisition system. A DSP can achieve high-speed acquisition, but frequent interrupts degrade its performance and raise system cost. Moreover, in a real-time image processing system, low-level processing involves large volumes of data and demands high speed, yet its computational structure is relatively simple, which makes it well suited to hardware implementation in a field-programmable gate array (FPGA). High-level processing algorithms, by contrast, handle relatively little data but involve far more complex algorithms, formulas, and control structures, which a DSP can handle.
This paper therefore presents the design and implementation of a multi-channel, high-speed data acquisition and real-time image processing system based on FPGA + DSP.
1. System Hardware Design Scheme
Figure 1 shows the structure of the multi-channel, synchronous, high-speed acquisition and processing system described in this article. The system consists of four modules: acquisition, processing, display, and system control. The multi-channel analog video signals are digitized by the A/D array and fed to the processing module for image processing. The processing results are converted back to analog by a D/A converter and displayed on the terminal monitor. The system control module coordinates the whole process, including the acquisition, processing, and display modules.


2. Acquisition Module
Common multi-channel data acquisition schemes [2] are: (1) multiple ADC devices, one for each analog input; (2) a single high-speed ADC, with the inputs selected in turn by a multi-channel analog switch and fed to it. High-speed acquisition is usually achieved by using a CPLD or FPGA to control the ADCs or the analog switches. Both schemes have drawbacks: the peripheral circuitry is large and the interfaces are complex; the data buffer is usually external, which lowers the transfer speed of the system; and for high-precision, multi-channel A/D systems with parallel conversion, the number of pins connected to the FPGA grows, wasting FPGA and other system resources and raising cost.
This system adopts a shared-bus, synchronous-acquisition, time-sharing-read method [3]. It raises the acquisition and transfer speed of the system, effectively controls multi-channel, high-resolution, parallel synchronous A/D acquisition, makes efficient use of FPGA resources, and reduces hardware cost. The method borrows the idea of a time-sharing operating system: the A/D conversion results are read in round-robin fashion, one time slice at a time. As shown in Figure 1, all the A/D converters share the sampling clock adc_clk, the read/write control signal ad_wr, and the chip-select signal adc_cs. A/D1, A/D3, and A/D5 share data bus adcb14~27, while A/D0, A/D2, and A/D4 share adcb0~13. A/D0 and A/D1 share output enable adc_oe0, A/D2 and A/D3 share adc_oe1, and A/D4 and A/D5 share adc_oe2.
Because all the converters share adc_clk and adc_cs, they sample synchronously. Sharing the data buses saves FPGA pins and uses FPGA resources efficiently. Asserting each adc_oe signal separately lets the FPGA read each conversion result promptly, as soon as it is valid after conversion completes, which achieves parallel acquisition. Because two converters on different data buses share each enable signal, their two results are read in parallel within the same time slice.
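The wiring rules above can be checked mechanically. The following C sketch is a hypothetical software model, not the authors' HDL: index k stands for A/D k, bus 0 for adcb0~13 and bus 1 for adcb14~27, and all function names are illustrative. It confirms that in each output-enable slice exactly one converter drives each bus, so two results are read per slice without bus contention.

```c
#include <stdio.h>

/* Hypothetical model of the shared-bus wiring: index k stands for A/D k. */
#define NUM_ADC 6
#define NUM_OE  3   /* adc_oe0, adc_oe1, adc_oe2 */
#define NUM_BUS 2   /* bus 0 = adcb0~13, bus 1 = adcb14~27 */

/* A/D0, A/D2, A/D4 share bus 0; A/D1, A/D3, A/D5 share bus 1. */
int adc_bus(int adc) { return adc % 2; }

/* A/D0 and A/D1 share adc_oe0, A/D2 and A/D3 share adc_oe1,
 * A/D4 and A/D5 share adc_oe2. */
int adc_oe_group(int adc) { return adc / 2; }

/* Returns 1 if, in every output-enable slice, each bus is driven by
 * exactly one converter, i.e. the wiring never causes bus contention. */
int wiring_is_contention_free(void) {
    for (int oe = 0; oe < NUM_OE; oe++) {
        for (int bus = 0; bus < NUM_BUS; bus++) {
            int drivers = 0;
            for (int adc = 0; adc < NUM_ADC; adc++)
                if (adc_oe_group(adc) == oe && adc_bus(adc) == bus)
                    drivers++;
            if (drivers != 1)
                return 0;
        }
    }
    return 1;
}

/* Prints which converter is read on each bus in each time slice. */
void print_read_schedule(void) {
    for (int oe = 0; oe < NUM_OE; oe++)
        for (int adc = 0; adc < NUM_ADC; adc++)
            if (adc_oe_group(adc) == oe)
                printf("slice %d (adc_oe%d): A/D%d on bus %d\n",
                       oe, oe, adc, adc_bus(adc));
}
```

Calling print_read_schedule() lists, for example, that slice 1 (adc_oe1) places A/D2 on bus 0 and A/D3 on bus 1.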
The ADC was selected on the following basis.
The system must inspect 25 bottles per second online, that is, 40 ms per bottle. Detection accuracy is also critical for medicinal control bottles. Both requirements place high demands on the ADC's conversion accuracy and conversion time.
The A/D converter used in this design is TI's ADS8364, a 16-bit, high-speed, low-power, six-channel simultaneous-sampling ADC in a 64-pin package, suitable for relatively noisy environments. Its maximum sampling rate is 250 kS/s. Three hold signals (/HOLDA, /HOLDB, and /HOLDC) start conversion on their assigned channels, so that several channels are sampled and converted simultaneously; when all three are asserted together, the results are stored in six output registers. Unipolar or bipolar input voltages can be converted. With a 5 MHz external clock, the sampling rate is 250 kHz and each sample-and-convert completes within 20 clock cycles. Each read returns 16 bits of data, and the address/mode pins (A0, A1, A2) select how data are read out: single-channel, cycle, or FIFO mode. A conversion starts when /HOLDx is held low for at least 20 ns; this puts the sample-and-hold amplifiers of the selected channels into hold at the same moment, so those channels begin converting simultaneously. When the result has been stored in the output register, the /EOC pin goes low for half a clock cycle. The ADS8364 operates from a +5 V supply and provides fully differential input channels with 80 dB common-mode rejection, six 4 µs successive-approximation ADCs, six differential sample-and-hold amplifiers, a +2.5 V reference available at the REFIN and REFOUT pins, and a high-speed parallel interface. Each differential input can range from -VREF to +VREF.
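The timing figures above fit together as a short calculation. This C sketch only restates numbers given in the text (5 MHz conversion clock, 20 cycles per conversion, 25 bottles/s, six 16-bit channels); storing each result in two bytes is an assumption, and the per-bottle sample count is an upper bound that ignores transfer and processing overhead.

```c
/* Figures taken from the text; only BYTES_PER_SAMPLE is an assumption
 * (each 16-bit result stored in two bytes). */
#define ADC_CLOCK_HZ        5000000LL  /* external conversion clock */
#define CYCLES_PER_SAMPLE   20LL       /* clock cycles per sample-and-convert */
#define BOTTLES_PER_SECOND  25LL       /* required inspection rate */
#define NUM_CHANNELS        6LL        /* ADS8364 channels */
#define BYTES_PER_SAMPLE    2LL        /* one 16-bit result */

/* 20 cycles at 5 MHz = 4000 ns per conversion. */
long long conversion_time_ns(void) {
    return CYCLES_PER_SAMPLE * 1000000000LL / ADC_CLOCK_HZ;
}

/* 5 MHz / 20 = 250000 samples per second per channel. */
long long sampling_rate_hz(void) {
    return ADC_CLOCK_HZ / CYCLES_PER_SAMPLE;
}

/* 1000 ms / 25 bottles = 40 ms per bottle. */
long long time_per_bottle_ms(void) {
    return 1000LL / BOTTLES_PER_SECOND;
}

/* Upper bound on samples per channel within one bottle's 40 ms window
 * (ignores dead time for transfer and processing): 10000. */
long long samples_per_bottle(void) {
    return sampling_rate_hz() * time_per_bottle_ms() / 1000LL;
}

/* Raw A/D output rate with all six channels running: 3000000 bytes/s. */
long long raw_bytes_per_second(void) {
    return NUM_CHANNELS * sampling_rate_hz() * BYTES_PER_SAMPLE;
}
```

Integer arithmetic keeps every intermediate value exact, so the 4 µs, 250 kHz, 40 ms, and 10 000-sample figures come out without rounding.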
The analog input signal is fed into the ADS8364 through a differential op-amp at the input, which effectively reduces common-mode noise and yields high effective acquisition accuracy. Data are read onto the parallel output bus by driving /RD and /CS low.
The ADS8364 conversion sequence is as follows. Conversion starts when /HOLDx is held low for at least 20 ns. After the result has been stored in the output register, /EOC goes low for half a clock cycle, signalling the processor to collect the result; the processor then reads the data over the parallel output bus by driving /RD and /CS low. The timing of each ADS8364 pin during this read sequence is critical.
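To make the handshake concrete, here is a hypothetical cycle-level C model of the read controller the FPGA would implement. It assumes the 10 ns Main_clk of Section 3.1 and reads a single 16-bit word per conversion (a real cycle- or FIFO-mode read would fetch all six results). State and field names are illustrative, and the one-tick gap between asserting /RD and latching the data stands in for the ADS8364's actual access-time requirement, which should be taken from the datasheet.

```c
#include <stdint.h>

/* Hypothetical cycle-level model of the ADS8364 read handshake, stepped
 * once per 10 ns Main_clk tick. Active-low pins: 0 = asserted. */
#define MAIN_CLK_NS  10
#define HOLD_MIN_NS  20
#define HOLD_TICKS   ((HOLD_MIN_NS + MAIN_CLK_NS - 1) / MAIN_CLK_NS) /* = 2 */

typedef enum { ST_IDLE, ST_HOLD, ST_WAIT_EOC, ST_READ } adc_state;

typedef struct {
    adc_state state;
    int ticks;        /* ticks spent with /HOLDx low */
    int hold_n;       /* /HOLDx output */
    int cs_n, rd_n;   /* /CS and /RD outputs */
    uint16_t result;  /* last word latched from the data bus */
    int done;         /* 1 for the tick in which a result was latched */
} adc_ctrl;

void adc_ctrl_init(adc_ctrl *c) {
    c->state = ST_IDLE;
    c->ticks = 0;
    c->hold_n = c->cs_n = c->rd_n = 1;   /* all control pins inactive */
    c->result = 0;
    c->done = 0;
}

/* Advances the controller by one tick. start requests a conversion,
 * eoc_n is the /EOC input and bus is the value on the data bus. */
void adc_ctrl_step(adc_ctrl *c, int start, int eoc_n, uint16_t bus) {
    c->done = 0;
    switch (c->state) {
    case ST_IDLE:
        if (start) {                     /* pull /HOLDx low to start */
            c->hold_n = 0;
            c->ticks = 0;
            c->state = ST_HOLD;
        }
        break;
    case ST_HOLD:                        /* keep /HOLDx low >= 20 ns */
        if (++c->ticks >= HOLD_TICKS) {
            c->hold_n = 1;
            c->state = ST_WAIT_EOC;
        }
        break;
    case ST_WAIT_EOC:                    /* /EOC low: result is ready */
        if (!eoc_n) {
            c->cs_n = 0;                 /* drive /CS and /RD low */
            c->rd_n = 0;
            c->state = ST_READ;
        }
        break;
    case ST_READ:                        /* latch data, release bus */
        c->result = bus;
        c->done = 1;
        c->cs_n = 1;
        c->rd_n = 1;
        c->state = ST_IDLE;
        break;
    }
}
```

Stepping the model with start = 1 holds /HOLDx low for two ticks (20 ns), waits for /EOC, then asserts /CS and /RD for one tick before latching the bus value.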
3. Implementation of FPGA Logic Control
The FPGA provides the core logic control for the entire acquisition, processing, and display system. Its main functions are A/D array acquisition control, data storage and transfer control, image preprocessing, generation and control of synchronization timing, image display control, and the EMIF bus interface logic.
To meet these control requirements, the system uses Altera's EP1K50 from the ACEX 1K family, an FPGA suited to complex logic as well as storage and buffering functions, with a maximum operating frequency of 250 MHz. The family is efficient and cost-effective and combines look-up tables (LUTs) with embedded array blocks (EABs). LUT-based logic gives optimized performance and efficiency for data-path management, register-intensive logic, arithmetic, and digital signal processing, while the EABs can implement RAM, ROM, dual-port RAM, or FIFO (first-in, first-out) memory.
3.1 A/D Control [3]
Analysis of the A/D control timing shows that conversion results can be read reliably and stably during the half cycle in which the sampling clock CLK is high. Given the chip-select, address setup, and output-enable times, three A/D converters on a shared bus can be read within half a cycle of the 5 MHz clock, so six channels of parallel data can be acquired over two buses. Figure 2 shows the control timing for three A/D converters sharing one data bus, simulated with Quartus II. ADC_OE1, ADC_OE2, and ADC_OE3 are the three A/D output-enable signals; they are asserted in turn to read the A/D conversion results, and each time slice lasts 30 ns. ADC_clk is the A/D sampling clock, In_clk is the external clock, and Main_clk, output by the PLL, is the main system clock with a 10 ns period. ADC_cs is the A/D chip-select signal, which needs some setup time; to achieve parallel multi-channel A/D sampling, the chip-select signals of all six A/D converters are tied together and held active. Reset is the FPGA reset signal.
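The slice arithmetic can be checked with a small model. Taking the 10 ns Main_clk, the 5 MHz (200 ns) ADC_clk, and the 30 ns slices given above, one ADC_clk period spans 20 Main_clk ticks and its high half 10 ticks, so three 3-tick slices (90 ns) fit with one tick to spare. The sketch below is a hypothetical C model rather than the authors' Quartus II design. It numbers the groups 0 to 2 (adc_oe0 to adc_oe2, as in Section 2), which correspond to ADC_OE1 to ADC_OE3 in Figure 2, and it assumes the slices start at the rising edge of ADC_clk.

```c
/* Figures from Section 3.1: Main_clk 10 ns, ADC_clk 5 MHz, 30 ns slices. */
#define MAIN_CLK_NS  10
#define ADC_CLK_NS   200
#define SLICE_NS     30
#define NUM_SLICES   3    /* one slice per output-enable group */

#define TICKS_PER_ADC_CLK (ADC_CLK_NS / MAIN_CLK_NS)  /* 20 ticks */
#define TICKS_HIGH_HALF   (TICKS_PER_ADC_CLK / 2)     /* 10 ticks */
#define TICKS_PER_SLICE   (SLICE_NS / MAIN_CLK_NS)    /* 3 ticks  */

/* 1 if all three slices (90 ns) fit in the 100 ns high half of ADC_clk. */
int slices_fit_in_high_half(void) {
    return NUM_SLICES * TICKS_PER_SLICE <= TICKS_HIGH_HALF;
}

/* Maps a non-negative Main_clk tick count (assumed to start at a rising
 * edge of ADC_clk) to the output-enable group being read, 0..2, or -1. */
int active_oe(int tick) {
    int phase = tick % TICKS_PER_ADC_CLK;
    if (phase >= TICKS_HIGH_HALF)
        return -1;                           /* ADC_clk low: no reads */
    int slice = phase / TICKS_PER_SLICE;
    return slice < NUM_SLICES ? slice : -1;  /* tick 9 is a spare tick */
}
```

In hardware the same mapping would typically come from a small counter clocked by Main_clk and reset by the rising edge of ADC_clk; the spare tick gives margin for output-disable time before the clock falls.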


The image signal from the CCD sensor is digitized by the ADS8364, and the result is sent to the FPGA together with the separated line sync, field sync, and odd/even field signals. The infrared photoelectric sensor signal is also fed to the FPGA and, together with the sync signals, serves as the basis for acquisition and logic control.
3.2 Data Storage and Transmission Control
The medicinal control bottle inspection system has strict precision and speed requirements. For high-speed acquisition and real-time processing to run in parallel, a buffer must be placed between the A/D converters and the DSP. Dual-port RAM or dual-addressing memory is commonly used for this purpose [4]. Dual-port RAM is easy to design with but expensive, while the dual-addressing approach places high demands on hardware design. This system therefore uses buffer memory embedded in the FPGA. FIFOs offer fast reads and writes, and because the rate at which samples are written into the FIFO differs from the rate at which the DSP reads it, an asynchronous FIFO is used as the buffer.
An asynchronous FIFO has the following properties: it has separate read and write ports, which may run at different speeds, and reads and writes can occur at the same time without being synchronized to each other; data are read out in the order in which they were written (first in, first out); and the read and write addresses are generated entirely by the FIFO's internal address pointers, so no external address is needed. The DSP's EMIF can interface seamlessly to a FIFO, which simplifies the circuitry needed to support DMA transfers.
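The read and write addresses come entirely from the FIFO's internal pointers. A common way for asynchronous FIFOs to do this safely is to keep binary pointers one bit wider than the RAM address and to compare Gray-coded copies across the two clock domains. The article does not say how the EP1K50 FIFO is built internally, so the following C model is a generic illustration of that technique with an assumed depth of 8 entries. In hardware each side compares against a synchronized copy of the other side's Gray pointer; this single-threaded sketch compares the pointers directly.

```c
#include <stdint.h>

#define ADDR_BITS 3
#define DEPTH     (1u << ADDR_BITS)         /* 8 entries (assumed) */
#define PTR_MASK  ((DEPTH << 1) - 1u)       /* pointers carry one wrap bit */

/* Adjacent Gray codes differ in one bit, so a pointer sampled by the
 * other clock domain mid-update is off by at most one position. */
uint32_t bin_to_gray(uint32_t b) { return b ^ (b >> 1); }

uint32_t gray_to_bin(uint32_t g) {
    uint32_t b = g;
    while (g >>= 1)
        b ^= g;
    return b;
}

typedef struct {
    uint16_t mem[DEPTH];
    uint32_t wr_bin;   /* advanced only by the write (A/D) side */
    uint32_t rd_bin;   /* advanced only by the read (DSP) side */
} async_fifo;

/* Empty: the Gray-coded pointers are equal. */
int fifo_empty(const async_fifo *f) {
    return bin_to_gray(f->wr_bin) == bin_to_gray(f->rd_bin);
}

/* Full: the Gray-coded pointers differ only in their two top bits,
 * i.e. the writer is exactly one lap ahead of the reader. */
int fifo_full(const async_fifo *f) {
    uint32_t top2 = 3u << (ADDR_BITS - 1);
    return bin_to_gray(f->wr_bin) == (bin_to_gray(f->rd_bin) ^ top2);
}

int fifo_push(async_fifo *f, uint16_t v) {
    if (fifo_full(f))
        return 0;
    f->mem[f->wr_bin & (DEPTH - 1)] = v;   /* low bits address the RAM */
    f->wr_bin = (f->wr_bin + 1) & PTR_MASK;
    return 1;
}

int fifo_pop(async_fifo *f, uint16_t *v) {
    if (fifo_empty(f))
        return 0;
    *v = f->mem[f->rd_bin & (DEPTH - 1)];
    f->rd_bin = (f->rd_bin + 1) & PTR_MASK;
    return 1;
}
```

gray_to_bin is what a receiving domain would use to turn a synchronized Gray pointer back into a count, for example to derive a fill-level flag.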
Two buffers are embedded in the FPGA, an acquisition FIFO and a display FIFO, which make full use of the EMIF transfer bandwidth to buffer the acquired and displayed image streams respectively. The frame grabbers of most vision processing systems use a large external FIFO, or large amounts of SRAM and external SDRAM, as frame storage, which lowers the transfer speed of the system and also increases hardware cost. The single-FIFO acquisition and display scheme avoids these drawbacks. Tests show that, using these FIFOs, the system can continuously acquire and display real-time images.
The basic data flow is as follows. The multi-channel analog image signals enter the A/D array, where the FPGA-controlled ADCs convert them into a 16-bit digital image stream conforming to ITU-R BT.601. The FPGA hardware performs smoothing and denoising preprocessing, and the data are buffered in the acquisition line FIFO inside the FPGA. A FIFO flag such as HF (half full) then triggers a DMA interrupt in the DSP, requesting the DSP to collect the data, which are written through the EMIF into frame memory (SDRAM). The DSP processes the data and writes the results back to SDRAM. In the other direction, the display logic in the FPGA's main control module generates a line interrupt. When the DSP responds, its DMA controller writes the data, 32 bits wide, into the display line FIFO. Under control of the display synchronization timing, the display line FIFO is output to the display interface as an 8-bit digital image signal conforming to the ITU-R BT standard, and is finally sent to the decoder for decoding and display.
4. DSP-Based Image Processing Module
The DSP-based image processing module is the core of the real-time image processing system. It consists mainly of the DSP device, the SDRAM image frame memory, and the Flash program memory, together with the necessary power control, JTAG port, reset control, and clock system.
The TMS320C6201 DSP is widely used in real-time image processing because of its high processing speed and rich on-chip resources. It is a high-speed fixed-point DSP in the TMS320C6x series, with a 200 MHz clock and a peak performance of 2400 MOPS. Its architecture makes it well suited to real-time image processing. Its main features are [4]: (1) the CPU core contains 32 general-purpose 32-bit registers and eight functional units, and data move between the functional units through these 32 registers; (2) a modified Harvard bus structure, with a 256-bit program bus, two 32-bit data buses, and a dedicated 32-bit DMA bus, which relieves the data-transfer bottleneck that would otherwise limit system performance; (3) dedicated address-generation units, so that address calculation no longer occupies the CPU; (4) 64 KB of on-chip program memory and 64 KB of on-chip data memory, so that when images are held in on-chip memory the CPU can read and process them faster.
The system performs image display as well as image acquisition, so it places high demands on data processing and transfer speed. DMA transfers proceed independently of the CPU, leaving the DSP time for data processing and other tasks, which improves system performance. The C6201's DMA controller has four independent, programmable transfer channels, allowing DMA operations on four different contexts, plus an auxiliary DMA channel that services the host. Each DMA channel can transfer data anywhere in the memory map without CPU involvement, between on-chip memory, on-chip peripherals, or external devices.
To ensure continuous acquisition and continuous display, three frame buffers are set up in the DSP's external SDRAM, and DMA channels carry data between the line FIFOs and the SDRAM. Figure 3 shows the frame buffer scheduling and the DMA event link mechanism. For transfers from the acquisition line FIFO to SDRAM, carried out by channel DMA0, the source address stays fixed and the destination address index increments. When a frame of image data has been filled, the event link mechanism reloads DMA0 with the link parameters of event B1, so it begins receiving a new frame from the FIFO and stores it in frame 2 of the SDRAM. When frame 2 is full, DMA0 reloads the parameters of event C1 and receives the third frame into frame 3, then reloads the parameters of event A1 again. This loop provides continuous image acquisition. Similarly, for transfers from SDRAM to the display line FIFO in the FPGA, carried out by channel DMA1, the source address index increments and the destination address stays fixed. DMA1 reads frames 1, 2, and 3 of the SDRAM in turn, transferring one line to the display line FIFO each time an interrupt event is triggered. After each frame, the link mechanism automatically reloads DMA1's parameter registers with those of events A2, B2, and C2 in turn, so display data flow continuously and the display is continuous.
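The event-linking scheme behaves like a circular list of parameter sets. The following C sketch is an illustrative model only: it does not reproduce TI's register layout or any DMA driver API, and all addresses and sizes are placeholders. Each parameter set describes one SDRAM frame and links to the next (A to B to C and back to A). The capture chain holds the source fixed at the FIFO and steps the destination, and the display chain does the reverse.

```c
#include <stdint.h>

#define NUM_FRAMES 3   /* frames 1..3 in SDRAM */

/* Illustrative parameter set; field names are hypothetical. */
typedef struct dma_params {
    uint32_t src, dst;              /* start addresses for this frame */
    int src_step, dst_step;         /* bytes added after each element */
    const struct dma_params *link;  /* set reloaded at end of frame */
} dma_params;

/* Builds a circular chain A -> B -> C -> A. capture != 0 gives the DMA0
 * style (FIFO -> frame i: source fixed, destination steps); capture == 0
 * gives the DMA1 style (frame i -> display FIFO: source steps). */
void build_chain(dma_params p[NUM_FRAMES], int capture, uint32_t fifo_addr,
                 uint32_t frame_base, uint32_t frame_bytes, int elem_bytes) {
    for (int i = 0; i < NUM_FRAMES; i++) {
        uint32_t frame = frame_base + (uint32_t)i * frame_bytes;
        if (capture) {
            p[i].src = fifo_addr; p[i].src_step = 0;
            p[i].dst = frame;     p[i].dst_step = elem_bytes;
        } else {
            p[i].src = frame;     p[i].src_step = elem_bytes;
            p[i].dst = fifo_addr; p[i].dst_step = 0;
        }
        p[i].link = &p[(i + 1) % NUM_FRAMES];
    }
}

/* Address of the k-th element given a start address and step. */
uint32_t elem_addr(uint32_t base, int step, int k) {
    return base + (uint32_t)(step * k);
}

/* Follows the link chain n times, as happens at each end of frame. */
const dma_params *advance(const dma_params *p, int n) {
    while (n-- > 0)
        p = p->link;
    return p;
}
```

After three end-of-frame reloads the chain is back at the first parameter set, which is what lets acquisition and display cycle through the three frames without CPU intervention.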


Practical testing shows that the system essentially meets the precision and speed requirements of the medicinal control bottle inspection system and performs well. In use, some areas were found to need further study: DSP programming must take the system's software and hardware resources into account and should provide some functions of a real-time operating system, so algorithm programming requires considerable skill to further improve system performance.
The hardware design also has the following problems and possible solutions. (1) Aperture jitter introduced by the reference clock of the A/D sampling circuit affects the system, so we are considering ECL or PECL gate circuits, which have lower aperture jitter, to reduce it. (2) Because FPGA interconnect is distributed, hardware propagation delays depend on the layout and can produce glitches, that is, harmful spikes, so these must be taken into account; a simple remedy is to resample the affected signals with D flip-flops. (3) Noise degrades image quality: it makes originally uniform, continuous gray levels rise or fall abruptly, creating false object edges or contours that blur or submerge image features and make image analysis difficult. This can be addressed by image preprocessing; provided the noise is not too severe, smoothing and denoising can improve image quality.
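For item (3), one common smoothing and denoising choice is a 3x3 median filter, which removes isolated noise spikes while preserving edges better than simple averaging. The article does not state which filter its FPGA preprocessing uses, so this C sketch on an 8-bit, row-major grayscale image is illustrative only; border pixels are copied unchanged.

```c
#include <stdint.h>

/* Sorts a 9-element window in place (insertion sort) and returns the
 * median, i.e. the 5th smallest value. */
static uint8_t median9(uint8_t w[9]) {
    for (int i = 1; i < 9; i++) {
        uint8_t v = w[i];
        int j = i - 1;
        while (j >= 0 && w[j] > v) {
            w[j + 1] = w[j];
            j--;
        }
        w[j + 1] = v;
    }
    return w[4];
}

/* 3x3 median filter over an 8-bit grayscale image stored row-major.
 * Pixels on the image border are copied unchanged. */
void median3x3(const uint8_t *src, uint8_t *dst, int width, int height) {
    for (int y = 0; y < height; y++) {
        for (int x = 0; x < width; x++) {
            if (x == 0 || y == 0 || x == width - 1 || y == height - 1) {
                dst[y * width + x] = src[y * width + x];
                continue;
            }
            uint8_t w[9];
            int n = 0;
            for (int dy = -1; dy <= 1; dy++)        /* gather neighbours */
                for (int dx = -1; dx <= 1; dx++)
                    w[n++] = src[(y + dy) * width + (x + dx)];
            dst[y * width + x] = median9(w);
        }
    }
}
```

A single bright noise pixel in a flat region is replaced by the surrounding gray level, while a sharp step between two regions stays sharp. In an FPGA, the same window would typically be fed from two line buffers so that the filter runs on the pixel stream.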
References
[1] Shen Lanyun. Principles and Applications of High-Speed Data Acquisition Systems [M]. Beijing: People's Posts and Telecommunications Press, 1995.
[2] Zhang Guiqing, Zhu Lei. Design and implementation of an FPGA-based real-time multi-channel synchronous data acquisition scheme [J]. Measurement and Control Technology, (12).
[3] Zhang Dongsheng, Zhang Donglai. Design and implementation of an FPGA-based high-speed acquisition system [J]. Electronic Technology Application, (5).
[4] Li Fanghui, Wang Fei. Principles and Applications of TMS320C6000 Series DSPs [M]. Beijing: Electronics Industry Press, 2003.
[5] Texas Instruments. TMS320C6000 Imaging Developer's Kit (IDK) User's Guide [R/OL]. 2004.
