Real-time H.264 Video Encoder with Low Bit Rate Based on NiosII

Source: MCU and Embedded System. Authors: Yang Chao, Zhang Ling, He Wei (Chongqing University)

Introduction

H.264, a new-generation video coding standard also known as the JVT/AVC standard, supports a wide range of bit rates: it can be used in high-bit-rate HDTV and digital storage systems as well as in low-bit-rate real-time communication systems. At the same image quality, H.264 saves roughly 20%~50% of the bit rate compared with H.263 and MPEG-4. At its baseline level, however, the complexity of the encoder is about 10 times that of H.263. H.264's excellent network friendliness and compression performance make it the first choice for video applications, but its huge computing workload has become a bottleneck for many of them. This design builds a coding system for real-time, low-bit-rate applications around the NiosII processor. The system makes full use of the parallel structure of the FPGA and applies the high-compression-ratio H.264 standard to the video data, meeting the requirements of low-bit-rate real-time encoding.


1 H.264 Coding System Structure Design

Based on the principle and structure of the H.264/AVC encoder, and considering the limits of the available hardware resources and the application requirements of the design, the H.264/AVC coding system shown in Figure 1 was designed.


The video image captured by the camera is first processed by the video acquisition module, and the image data of the current frame is stored in SRAM. The encoder then reads the original image from SRAM one macroblock (MB) at a time. Based on the position of the MB in the image frame, reference pixels are read from the reconstructed frame for intra-frame prediction; the predicted macroblock is subtracted, pixel by pixel, from the current macroblock to obtain the prediction residual. Next, the residual undergoes an integer DCT or Hadamard transform, and the transform output is quantized. On the one hand, the quantized residual passes through inverse quantization and inverse transform to generate the reconstructed image used as the intra-prediction reference; on the other hand, after reordering and entropy coding, it yields the final compressed bit stream.

According to the H.264/AVC standard, the whole coding system is divided into image acquisition, intra-frame prediction, transform and quantization, entropy coding, and other modules. The modules are processed as a pipeline, which effectively improves hardware execution efficiency.
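As a toy illustration of this data flow (the function names below are placeholders, not the design's actual module interfaces), each macroblock passes through the stages in turn; in the FPGA the stages overlap as a pipeline rather than running sequentially:

```python
def encode_frame(macroblocks, predict, transform_quantize, entropy_code):
    """Toy software model of the per-macroblock coding flow.

    `predict`, `transform_quantize` and `entropy_code` stand in for the
    intra prediction, transform/quantization and entropy coding modules.
    """
    stream = []
    for mb in macroblocks:
        residual, mode = predict(mb)           # intra prediction
        coeffs = transform_quantize(residual)  # integer DCT + quantization
        stream.append(entropy_code(mode, coeffs))
    return stream
```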

2 Implementation of H.264 Encoder Based on Cyclone II FPGA

The system adopts an SOPC-based embedded design consisting of a video capture module and a NiosII processor system, using Altera's DE2 development board as the development platform, with the NiosII processor integrated into the system. The NiosII processor system is responsible for image acquisition control and for H.264 compression and coding of the image. To guarantee real-time performance, the runtime of the H.264 software algorithm was profiled, and the key algorithms of the H.264 encoder were then accelerated in hardware using custom modules.
2.1 Video Capture Module
Video acquisition is the prerequisite for video image processing and transmission; the quality of the collected digital video directly affects the processing result. Figure 2 shows the video acquisition structure of the image processing system.


The ADV7181B, a multi-standard video decoder chip from ADI, performs analog-to-digital conversion on the captured video. It can automatically detect standards such as NTSC, PAL, and SECAM, and outputs a digital video signal in a 16-bit/8-bit CCIR601/CCIR656-compatible format. It has six analog video input ports and uses a single 27 MHz crystal clock input. Its operating mode can be configured through the two-wire I2C interface.

When the system powers on, the I2C module configures the internal registers of the ADV7181B. Because the camera outputs a PAL analog video signal, the ADV7181B is configured for PAL analog input and converts it to a digital video signal in CCIR656 format. The ADV7181B feeds the resulting real-time digital video, i.e. the luminance and chrominance signals (TD_DAT) together with the line and field synchronization signals (TD_HS/VS), into the FPGA. The image acquisition module extracts the required digital image information and transfers it to the 512 KB SRAM on the Altera DE2 development board to buffer the frame to be processed.
The following describes the design and implementation of the image acquisition module. Based on analysis of the hardware structure of the video capture module, the architecture shown in Figure 3 was designed. As can be seen, the image acquisition module mainly includes image extraction, color sampling rate conversion, Y/Cb/Cr component separation, and SRAM read/write control for the image cache.


Under the control of the video acquisition control signals from the H.264/AVC coding module, the image extraction sub-module extracts the required image data from the PAL digital video output by the ADV7181B. The actual image captured by the camera is 768x576 pixels; the video input is interlaced, with the odd field and the even field arriving successively in time. The image size processed by the system is 320x240, which meets the processing requirements of the system.

Considering that there is little difference between the top-field and bottom-field data of one image, when capturing an image only the bottom field is used: the 240 consecutive lines in the middle of the field are extracted, together with the adjacent pixels in the middle of each line, to output 320x240 pixels of video image data. The specific extraction process is shown in Figure 4.
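A software sketch of this cropping step (sizes from the text; the helper name is made up): one field of a 768x576 interlaced PAL frame has 288 lines of 768 pixels, from which a central 240x320 window is taken.

```python
import numpy as np

def extract_320x240(field):
    """Crop the central 240 lines x 320 pixels from one 288x768 field."""
    h, w = field.shape                 # expected (288, 768)
    top = (h - 240) // 2               # 24 lines skipped above the window
    left = (w - 320) // 2              # 224 pixels skipped on each side
    return field[top:top + 240, left:left + 320]
```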

H.264/AVC supports progressive or interlaced digital images in the 4:2:0 format. Therefore, the color sampling rate of the extracted digital image must be converted. By simply averaging the chroma components of adjacent odd and even lines, the color sampling rate is changed from 4:2:2 to 4:2:0, as shown in Figure 5.
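The line-averaging step can be sketched as follows (a simplification; the rounding behavior of the actual hardware is an assumption):

```python
import numpy as np

def chroma_to_420(plane):
    """Halve the vertical chroma resolution by averaging each pair of
    vertically adjacent (odd/even) lines, with rounding."""
    p = plane.astype(np.uint16)                  # avoid uint8 overflow
    return ((p[0::2, :] + p[1::2, :] + 1) // 2).astype(np.uint8)
```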

The image data after sampling-rate conversion is cached in different regions of the SRAM according to the Y/Cb/Cr component type, to facilitate subsequent H.264 coding. Figure 6 shows the color components of an actual image before and after the sampling rate conversion.

2.2 Core Modules of the H.264 Encoder
Considering the available hardware resources, real-time performance, implementation difficulty, and other factors, the design adopts only intra-frame prediction. The encoder includes the intra-frame prediction module, the transform and quantization module, and the CAVLC entropy coding module. Processing is performed per macroblock (16 × 16): the luma and chroma blocks go through intra-frame prediction, transform and quantization, and inverse transform and inverse quantization respectively, followed by CAVLC entropy coding. The sampling ratio of the image is Y:U:V = 4:2:0.

In the early design stage, the H.264 encoder was first implemented on a PC with VC++ and then ported to the FPGA with custom hardware modules. The time required by the two implementations is shown in Table 1. As can be seen, the hardware implementation needs only about 16 ms to compress one frame with H.264, a great improvement over the PC implementation, while the hardware modules occupy only a small share of the FPGA resources, giving a high performance-to-cost ratio.

Since the custom hardware intra-frame prediction module improves system performance greatly compared with the software implementation, its hardware structure design is analyzed here.

According to the H.264 intra-frame prediction algorithm, the intra-frame prediction module is designed in non-rate-distortion-optimized mode. Through the interface module it reads the luma and chroma image data of one MB (16 × 16) from the SDRAM; the luma and chroma prediction modules predict the current MB and select the prediction mode, outputting the prediction residual and the best prediction mode. At the same time, the residual after inverse DCT and inverse quantization is compensated by the reconstruction module together with the prediction result, and the reconstructed data is written back to the SDRAM. As shown in Figure 7, the whole module is divided into four sub-modules: the interface module, luma prediction, chroma prediction, and image reconstruction.


In the interface module, four RAMs are designed to store the original image and reference image data read for prediction:

  • RAM0 stores the luma prediction reference pixels, depth 32: addresses 0~15 hold the upper prediction reference pixels, addresses 16~31 the left prediction reference pixels.

  • RAM1 stores the original luma values of the macroblock, depth 256.

  • RAM2 stores the chroma prediction reference pixels, depth 32: addresses 0~7 hold the upper Cb reference pixels, 8~15 the left Cb reference pixels, 16~23 the upper Cr reference pixels, and 24~31 the left Cr reference pixels.

  • RAM3 stores the original chroma values of the macroblock, depth 128.
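The address map above can be summarized descriptively as follows (a table only, using the depths and address ranges stated in the text; the region names are made up):

```python
# Address map of the four prediction RAMs described above.
RAM_MAP = {
    'RAM0': {'depth': 32,  'regions': {'luma_top_ref':  range(0, 16),
                                       'luma_left_ref': range(16, 32)}},
    'RAM1': {'depth': 256, 'regions': {'luma_orig':     range(0, 256)}},
    'RAM2': {'depth': 32,  'regions': {'cb_top_ref':    range(0, 8),
                                       'cb_left_ref':   range(8, 16),
                                       'cr_top_ref':    range(16, 24),
                                       'cr_left_ref':   range(24, 32)}},
    'RAM3': {'depth': 128, 'regions': {'chroma_orig':   range(0, 128)}},
}

# Sanity check: the regions of each RAM tile its address space exactly.
for ram in RAM_MAP.values():
    addrs = sorted(a for r in ram['regions'].values() for a in r)
    assert addrs == list(range(ram['depth']))
```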
The internal structure of the brightness prediction module is shown in figure 8.


① The mode selection module uses the availability information (avail) of the current macroblock's prediction reference pixels to schedule the predictions of the current macroblock in a fixed order; for example, avail = "11" indicates that both the upper and the left prediction reference pixels are available, so the current macroblock is predicted in the order DC, HOR, VERT, PLANE. In the residual processing module, two RAMs are used alternately to save the prediction residuals of the different prediction modes. The mode selection module compares the cost function of the current prediction mode with that of the previous one; if the cost of the current mode is smaller, the current mode is better, and the residual of the next prediction is directed into the residual RAM holding the worse mode's data. When all available prediction modes of the current macroblock have been tried, the mode selection module determines the optimal mode from the prediction costs, indicates which residual RAM in the residual processing module corresponds to that mode, and the corresponding residual is passed to the integer transform module.
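The ping-pong use of the two residual RAMs can be modeled in software like this (function and variable names are illustrative, not the RTL signal names):

```python
def select_best_mode(predict, cost, modes, cur_mb):
    """Search the prediction modes keeping only two residual buffers:
    the buffer of the better mode so far is kept; the other buffer is
    overwritten by the residual of the next mode tried."""
    best_mode, best_cost = None, float('inf')
    buffers = [None, None]
    keep = 0                  # index of the buffer holding the best residual
    for mode in modes:
        scratch = 1 - keep    # overwrite the worse mode's buffer
        buffers[scratch] = predict(mode, cur_mb)
        c = cost(buffers[scratch])
        if c < best_cost:
            best_cost, best_mode = c, mode
            keep = scratch    # the new residual becomes the one to keep
    return best_mode, buffers[keep]
```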
② The prediction module contains the implementations of the DC, HOR, VERT, and PLANE prediction modes. According to the mode scheduled by the mode selection module, it reads the prediction reference pixels and original pixel values from the interface module; after prediction, the residual is output to the residual processing module and the predicted values are output to the compensation/reconstruction module for storage.
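For reference, three of the 16x16 luma modes can be expressed compactly (PLANE omitted; this follows the standard H.264 16x16 intra prediction definitions, not the exact RTL):

```python
import numpy as np

def predict_16x16(mode, top, left):
    """16x16 luma intra prediction from the 16 reference pixels above
    (`top`) and the 16 to the left (`left`) of the macroblock."""
    if mode == 'VERT':                 # copy the top row downwards
        return np.tile(top, (16, 1))
    if mode == 'HOR':                  # copy the left column rightwards
        return np.tile(left.reshape(16, 1), (1, 16))
    if mode == 'DC':                   # rounded mean of all 32 references
        dc = (int(top.sum()) + int(left.sum()) + 16) >> 5
        return np.full((16, 16), dc)
    raise ValueError(mode)
```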
③ The residual processing module uses two RAM blocks to store the residual data, so the residuals of two prediction modes of one macroblock can be held at the same time: the residuals are saved to the two RAMs alternately, the better prediction mode is kept, and the next mode is compared against it, until the optimal prediction mode is selected.
④ The prediction cost module calculates the cost of each prediction mode: it applies a Hadamard transform in units of 4 × 4 blocks, then applies a second Hadamard transform to the DC coefficients of the 4 × 4 blocks; the sum of the absolute values of all transform results is the prediction cost.
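A software sketch of this cost follows (the exact handling of the DC terms in the hardware is an assumption; this follows the common SATD-with-DC-Hadamard scheme, where each block's DC is excluded from the first stage and contributes through the second transform instead):

```python
import numpy as np

# 4x4 Hadamard matrix; note H4 @ H4.T == 4 * I
H4 = np.array([[1, 1, 1, 1],
               [1, 1, -1, -1],
               [1, -1, -1, 1],
               [1, -1, 1, -1]])

def mb_cost(residual16):
    """Cost of a 16x16 residual: Hadamard each 4x4 block and sum absolute
    values, with the 16 DC terms replaced by a second 4x4 Hadamard."""
    cost = 0
    dcs = np.zeros((4, 4))
    for i in range(4):
        for j in range(4):
            t = H4 @ residual16[4*i:4*i+4, 4*j:4*j+4] @ H4.T
            dcs[i, j] = t[0, 0]
            cost += int(np.abs(t).sum()) - abs(int(t[0, 0]))
    cost += int(np.abs(H4 @ dcs @ H4.T).sum())  # DC coefficients re-transformed
    return cost
```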

The structure of the chroma prediction module is basically the same as that of the luma prediction module, but because chroma is divided into Cb and Cr components, the storage layout of the residual data in RAM is slightly different; chroma prediction and luma prediction of the same macroblock are executed in parallel. Because the amount of chroma data to be processed is only half that of luma, in the subsequent integer transform the chroma is processed first and the luma afterwards, making the pipeline more compact, reducing waiting time, and improving the speed of the whole module.

3 Conclusion

The real-time H.264 video encoding system based on NiosII compresses a 320x240 color image in about 16 ms at a clock frequency of 100 MHz; with the quantization parameter set to 30, the compression ratio reaches about 2%, and real-time monitoring at an image frame rate of 25 frames/s is achieved. The system features low resource usage, low cost, low bit rate, and good video quality, and has broad development prospects.
Figure 9 shows the resource usage of the system after synthesis in the integrated development environment.
