A DSP-Based Data Scheduling Strategy for Image Rotation Algorithms
Source: Electronic Technology Application. Authors: Li Linlin, Feng Yan, He Yizheng
Image rotation is a widely used digital image processing technique. As applications grow more demanding, so does the need to rotate large, high-resolution images on embedded systems. For example, when high-resolution digital map images are displayed and processed in aviation, existing display chips do not support image rotation, so large map images must be rotated in real time on a resource-constrained embedded platform. A DSP platform is one way to do this. Two issues need careful attention in such an implementation: first, choosing a rotation algorithm with a low computational cost; second, making full use of the parallel computing capability of the DSP platform.
Many image rotation algorithms already reduce the computational workload effectively; the rotation method based on the linear storage structure of the image [1] is one of them. On a DSP platform, however, the limited amount of high-speed memory restricts the efficiency of these algorithms, so efficient data scheduling must be designed around the characteristics of both the algorithm and the DSP. For image rotation, data scheduling must also deal with the large number of accesses to non-contiguous pixel addresses, which severely degrade DSP data access and CPU efficiency. This difficulty is specific to image rotation and does not arise in other image processing operations. This article discusses how to use the resources of a TI DSP chip to schedule large volumes of data efficiently for image rotation, so that large images can be rotated on the DSP in real time.
1. Introduction to the rotation method based on the linear image storage structure
Currently, most image rotation is computed from the side of the view area. The view area is the display region on the screen. For each pixel of the view, the method calculates the coordinate address of the corresponding pixel in the source image under the rotation, reads the source pixel values at that address, and interpolates them to obtain the final rotated image. In fact, by symmetry, a rotation by any angle can be decomposed into a rotation by 90°, 180°, or 270° plus a rotation of no more than 45°.
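The decomposition mentioned above can be sketched in a few lines of Python. The function name `decompose_angle` and the choice of a signed residual in [-45°, 45°) are assumptions made for illustration; the source only states that the remaining rotation is small.

```python
def decompose_angle(angle_deg):
    """Split an angle into quarter turns plus a signed residual in [-45, 45)."""
    angle = angle_deg % 360.0
    # Round to the nearest multiple of 90 degrees (ties go to the higher one).
    nearest = int((angle + 45.0) // 90.0)
    quarter_turns = nearest % 4          # 0, 1, 2, or 3 quarter turns
    residual = angle - 90.0 * nearest    # what is left for the general rotation
    return quarter_turns, residual
```

The quarter turns can be handled by pure index remapping, so only the small residual needs the general rotation and interpolation path.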
Traditional image rotation is generally achieved through matrix multiplication:
where α is the rotation angle.
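For reference, the standard two-dimensional rotation by α can be written as below. The primed symbols and the counter-clockwise sign convention are assumptions here; the source's formula (1) may use the opposite sign or different notation.

```latex
\begin{pmatrix} x' \\ y' \end{pmatrix}
=
\begin{pmatrix} \cos\alpha & -\sin\alpha \\ \sin\alpha & \cos\alpha \end{pmatrix}
\begin{pmatrix} x \\ y \end{pmatrix}
```

Applied per pixel, this costs four multiplications and two additions for every pixel of the image.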
The rotation method based on the linear image storage structure, proposed by Zhang et al. [1], is efficient in theory. It is described in detail below.
Because the image is stored linearly, the relative positions of its pixels are fixed. As shown in Figure 1(a), before rotation the pixels P(x, y), P1(x1, y1), P2(x2, y2), and A(xA, yA) are the four vertices of a rectangle. Since rotation is a linear transformation, the relative positions of these pixels do not change after the image is rotated, as shown in Figure 1(b); formula (2) expresses this relationship.
Therefore, to rotate the image, the matrix multiplication of formula (1) is needed only for the pixels in the first row and the first column. For all other pixels, formula (2) requires only simple additions and subtractions. This avoids a matrix multiplication for every pixel of the image and reduces the CPU cycles by a factor of 5 to 6.
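A minimal Python sketch of this idea follows. It assumes that formula (2) has the form A' = P1' + P2' - P', which follows from the linearity of rotation for the four vertices of a rectangle, and it uses the counter-clockwise sign convention; the function names are hypothetical.

```python
import math

def rotate_point(x, y, alpha):
    """Full matrix rotation of one point by alpha radians."""
    c, s = math.cos(alpha), math.sin(alpha)
    return (x * c - y * s, x * s + y * c)

def rotated_grid_incremental(width, height, alpha):
    """Rotated coordinates of every pixel, multiplying only along row 0 and column 0."""
    first_row = [rotate_point(x, 0, alpha) for x in range(width)]
    first_col = [rotate_point(0, y, alpha) for y in range(height)]
    ox, oy = first_row[0]  # rotated corner pixel P'
    grid = []
    for y in range(height):
        cx, cy = first_col[y]  # P2' for this row
        # A' = P1' + P2' - P': additions and subtractions only.
        grid.append([(first_row[x][0] + cx - ox, first_row[x][1] + cy - oy)
                     for x in range(width)])
    return grid
```

Only `width + height` points go through `rotate_point`; the remaining pixels cost two additions and two subtractions each.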
In addition, bilinear interpolation is used for the non-integer pixel addresses produced by the rotation calculation, which is generally sufficient for the required image quality.
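As an illustration, bilinear interpolation at a non-integer address can be sketched as follows. The image is assumed to be a list of rows of grey values, and the address is assumed to lie inside the image; neither detail comes from the source.

```python
import math

def bilinear(image, x, y):
    """Sample image[row][col] at the real-valued address (x, y)."""
    height, width = len(image), len(image[0])
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    # Clamp the neighbours so addresses on the last row or column stay in range.
    x1, y1 = min(x0 + 1, width - 1), min(y0 + 1, height - 1)
    fx, fy = x - x0, y - y0
    top = (1 - fx) * image[y0][x0] + fx * image[y0][x1]
    bottom = (1 - fx) * image[y1][x0] + fx * image[y1][x1]
    return (1 - fy) * top + fy * bottom
```

Each output pixel therefore reads four neighbouring source pixels, which is why the fetched source data must include every pixel around the computed addresses.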
2. DSP Structure Optimization of image rotation
2.1 Structure Features
As shown in Figure 2, the DM642 chip is built on the C64x core and uses TI's second-generation advanced very long instruction word (VLIW) architecture. It runs at a clock frequency of 600 MHz and can execute eight 32-bit instructions in parallel per cycle, for a peak rate of 4800 MIPS (8 x 600 MHz). Its 64 enhanced DMA (EDMA) channels transfer one- and two-dimensional data efficiently; a two-dimensional transfer can move a rectangular block of image data at high speed.
The memory of the DM642 has two levels: on-chip and off-chip. On-chip memory is divided into two layers, L1 and L2. The first layer, L1, is the CPU cache; its access speed matches the CPU speed, and it consists of separate L1P (16 KB) and L1D (16 KB) caches. The second layer, L2 (256 KB), can be flexibly partitioned between RAM and cache. Off-chip memory uses 32-bit addresses and is accessed through the EDMA controller and the external memory interface (EMIF). Access speeds for on-chip and off-chip memory differ greatly.
In addition, as a chip dedicated to multimedia processing, the DSP has dedicated video data I/O interfaces, which makes it easy to display and output video signals.
2.2 Structure Optimization of rotation Algorithms
The algorithm structure is optimized around the performance and architecture of the DSP so that the large-image rotation algorithm described above runs at full efficiency on the DSP platform. The core idea is to optimize memory allocation and the flow of data transfers so that the CPU can process image data continuously, without waiting.
Given its architecture, the DSP reaches maximum efficiency only when both data and program reside in on-chip memory. In large-image rotation, the image data far exceeds the on-chip memory capacity of the DSP, so the source image and the final view image must be stored in off-chip memory. To keep the DSP CPU processing at full speed, the data flow must be optimized: the source image is split into blocks that are moved on-chip and processed in sequence, and the block the CPU is about to process should already be in on-chip memory before it is needed. For this reason, the overall structure of the algorithm uses the Ping-Pong dual-buffer technique, with the EDMA running in parallel with the CPU to hide the time spent moving image blocks on and off the chip, so that the CPU can process data continuously without idling.
Ping-Pong dual-buffering is a data transfer technique that uses two data buffers at the same time. The SRAM is divided into two regions, one holding source image blocks and the other holding rotated image blocks. Each region is split into two areas (a Ping area and a Pong area), which are used alternately for block transfer and block processing. The parallel workflow is shown in Figure 3.
Figure 3 Ping-Pong dual-buffer Processing Technology
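The alternation can be illustrated with a small scheduling sketch in Python. It only models which block each unit handles at each step and which half of a buffer it uses; the three-stage layout (EDMA in, CPU, EDMA out) and all names are assumptions for illustration, not TI's EDMA API.

```python
def _half(block):
    """Even-numbered blocks use the Ping area, odd-numbered ones the Pong area."""
    return "ping" if block % 2 == 0 else "pong"

def ping_pong_schedule(num_blocks):
    """Per time step, the (block, buffer half) handled by each unit, or None if idle."""
    schedule = []
    for step in range(num_blocks + 2):
        stages = {}
        # While the CPU rotates block k-1, the EDMA loads block k and writes back block k-2.
        for name, block in (("edma_in", step), ("cpu", step - 1), ("edma_out", step - 2)):
            stages[name] = (block, _half(block)) if 0 <= block < num_blocks else None
        schedule.append(stages)
    return schedule
```

In every step the CPU and the EDMA touch opposite halves of each buffer, so after the first load the CPU never has to wait for a transfer to finish.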
How the transferred image blocks are designed and arranged within the Ping-Pong dual-buffer mechanism must take the characteristics of image rotation into account, so a DSP data scheduling policy specific to the rotation algorithm is needed.
3. DSP data scheduling policy of the Rotation Algorithm
The purpose of data scheduling for the rotation algorithm is to partition the source image into blocks according to a fixed rule and transfer them to the DSP in sequence. Once the calculation for a block is complete, a view image block is produced, and the view image blocks are then arranged in the same order to form the rotated view image. The whole process requires that the image data moved in and out be partitioned in a regular way, and that each imported source image block contain all the pixel data needed to compute the corresponding view image block. In particular, the problem of accessing a large number of non-contiguous pixel addresses must be solved so that the DSP's EDMA and the Ping-Pong dual-buffer technique can deliver their full performance.
3.1 Non-contiguous pixel address access
In the Ping-Pong dual-buffer mechanism, image blocks move between on-chip and off-chip memory mainly through EDMA two-dimensional transfers configured to run in the background. For this to work, the image blocks to be transferred must follow a uniform rule: the way image data is transferred must not change with the rotation angle.
Unlike the source image, however, the rotated image does not map to contiguously changing pixel addresses. The source addresses of the view pixels also vary with the rotation angle and follow no fixed pattern. This makes EDMA transfers difficult to set up within the Ping-Pong dual-buffer mechanism and leads to a large number of accesses to non-contiguous pixel addresses in the source image. This problem is specific to image rotation. If it is not solved well, the Ping-Pong dual-buffer mechanism cannot do its job, and the actual execution efficiency of the rotation algorithm on the DSP will not improve. The key to efficient image rotation is therefore a data scheduling scheme that respects the relationship between source and view image blocks.
3.2 DSP data scheduling policy of Rotation Algorithm
The source image blocks and scheduling strategy proposed in this paper are based on source image block coverage. The idea is to process both the source image and the view image block by block, with each source image block covering the corresponding view image block. Pixel data inside a source image block is then easy to access and address, because pixel addresses within the block change contiguously; this exploits the efficiency of the DSP's EDMA and also satisfies the regularity required by the Ping-Pong data flow. The data scheduling policy is shown in Figure 4 and Figure 5. Its main points are as follows (clockwise rotation of the view is used as the example):
(1) Partitioning the view image into blocks
As shown in Figure 4(a), the view image is divided into rectangular blocks, each serving as the basic unit of one rotation operation, and the blocks are arranged in sequence.
(2) Fetching source image blocks
As shown in Figure 4(b), each source image block corresponds to one view image block, and the source image block has four times the area of the view image block. For example, if the view image block is 20x20 pixels, the source image block is 40x40 pixels. The midpoint of the top border of the source image block corresponds to the upper-left vertex of the rotated view image block. This ensures that for any clockwise rotation angle between 0° and 90°, the source image block always covers the corresponding rotated view image block.
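The coverage claim can be checked numerically with a short Python sketch. It assumes square view blocks, image coordinates with y pointing down, and a rotation about the upper-left vertex of the view block; these conventions and the function name are assumptions for illustration.

```python
import math

def source_block_covers(view_size, angle_deg):
    """True if a source block of twice the side length covers the rotated view block.

    Local coordinates: origin at the midpoint of the source block's top border,
    x to the right, y downward, so the source block spans [-s, s] x [0, 2s].
    """
    s = view_size
    a = math.radians(angle_deg)
    c, sn = math.cos(a), math.sin(a)
    eps = 1e-9  # tolerance for floating-point rounding at 0 and 90 degrees
    for x, y in [(0, 0), (s, 0), (0, s), (s, s)]:
        rx, ry = x * c - y * sn, x * sn + y * c  # rotated corner of the view block
        if not (-s - eps <= rx <= s + eps and -eps <= ry <= 2 * s + eps):
            return False
    return True
```

For a 20x20 view block the check holds at every angle from 0° to 90° and fails just outside that range, which is why the opposite rotation direction needs a different anchor point.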
(3) Relationship between the vertex addresses of the two image blocks
Let the N-th source image block be fN(x, y) and the corresponding rotated image block be f'N(x', y'). The local coordinate address of the source image block and the local coordinate address of the rotated image block are related as follows:
where Width is the width of the source image block.
Counter-clockwise rotation of the view is handled similarly (Figure 5), with two differences:
① The midpoint of the left border of the source image block corresponds to the upper-left vertex of the rotated view image block;
② The local coordinate address of the source image block and the local coordinate address of the vertex of the view image block are related as follows:
where Height is the height of the source image block.
(4) Image block scheduling
Formula (3) or formula (4) gives the upper-left vertex of the regular image block to be fetched from the source image (that is, the starting address of the source image block). An EDMA two-dimensional transfer then moves the block into the on-chip L2 SRAM. The source image block no longer tilts with the rotation angle: its internal pixel arrangement is fixed and its pixel addresses change contiguously, so EDMA two-dimensional transfers fit smoothly into the Ping-Pong dual-buffer mechanism.
This DSP data scheduling strategy, based on coverage of the view image blocks, effectively solves the problem of large numbers of non-contiguous pixel address accesses in image rotation. It trades space for time, and the efficient EDMA transfers keep the whole rotation process running at high speed.
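The effect of such a two-dimensional transfer can be emulated in plain Python as below. This is a software stand-in, not TI's EDMA interface; the function name and parameters are hypothetical, and the starting address (x0, y0) is taken as given rather than computed from formula (3) or (4).

```python
def fetch_block_2d(image, image_width, x0, y0, block_width, block_height):
    """Copy a rectangular block out of a linearly stored (row-major) image."""
    block = []
    for row in range(block_height):
        # Each row of the block is one contiguous run in the source image.
        start = (y0 + row) * image_width + x0
        block.extend(image[start:start + block_width])
    return block
```

The transfer is fully described by a start address, a row length, a row count, and a fixed stride between rows, and none of these depend on the rotation angle.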
4. Experiment and Results
The experiment uses a self-developed high-resolution image processing platform whose main processor is a TMS320DM642 clocked at 600 MHz, with 64 MB of off-chip SDRAM. The source image is loaded through the JTAG debug port. The rotated view image is sent out through the VPORT video port, passed through D/A conversion, and output as a VGA signal. View images of two sizes (400x400 pixels and 1024x768 pixels) are rotated; the corresponding source images are digital map images in BMP format of 1024x768 pixels and 1920x1920 pixels, respectively. The rotation angle is increased in steps of 0.005 radians. Three methods are compared: the traditional pixel-by-pixel matrix multiplication method, the method based on the linear image storage structure, and the linear storage structure method combined with the data scheduling policy of this article. For each, the average running time per frame is measured and converted to a frame rate, as shown in Table 1.
The experimental results show that the rotation algorithm based on the linear image storage structure does greatly reduce the computational workload compared with the traditional pixel-by-pixel method and thus speeds up rotation, but it still cannot meet the real-time requirement for large-image rotation. Once the algorithm structure and data scheduling are optimized with the policy proposed in this paper, the execution efficiency of the algorithm on the DSP improves significantly and meets the real-time requirement for large-image rotation on the DSP.
This paper addresses the large number of non-contiguous pixel address accesses in image rotation, which severely reduce DSP CPU efficiency. Based on the performance and architecture of the DSP, it proposes an effective DSP data scheduling strategy based on coverage of the view image blocks. The structure and data scheduling of the algorithm are optimized, and a real-time, high-quality large-image rotation scheme is implemented on a TI DSP. Experiments show that the proposed strategy suits image rotation algorithms, achieves real-time rotation of large images on the DSP, and meets practical requirements.
References
[1] Zhang Kedai, Li Zhi. Study on rapid implementation of image rotation [J]. Journal of the College of Command Technology, 1999, (10)-32.
[2] Hu Huizhi, Ji Taicheng. Data transmission optimization design of DSP video processing system [J]. Journal of Taizhou Vocational and Technical College, 2006, (6)-30.
[3] Danielsson P E. High-accuracy rotation of images [J]. Graphical Models and Image Processing, 1992, 54(4): 340-344.
[4] Zeng Qingru, Bi Yan, Wang Hongxun. Image data transmission optimization of TMS320C64x EDMA [J]. TV Technology, 2005, (278): 66-72.
[5] Li Fanghui, Wang Fei, He Peyun. Principles and applications of TMS320C6000 series DSPs (2nd edition) [M]. Beijing: Electronics Industry Press, 2003.