Most video signals must be preprocessed before they are encoded with a video compression codec. Encoders typically require the data in planar format for better processing performance. Broadcast standards such as NTSC and PAL may also require conversion from interlaced to progressive scan, and the color and brightness information often needs to be reformatted. In particular, video from a CCD camera is usually captured in a 4:2:2 interlaced format, while the video compression standards' specifications accept only progressive-scan input. In that case the interlacing artifacts must be removed, because interlaced video content can be quite difficult for a progressive-scan encoder to process.

A large number of sophisticated de-interlacing algorithms are available to engineers, but not every application requires the highest level of video quality. Complex algorithms also demand a great deal of computation, and developers are always constrained by the MIPS budget of the digital signal processor (DSP). When an application does not require the highest video quality, the resizer (scaling) hardware can be used to perform the de-interlacing. This technique converts the format from 4:2:2 to 4:2:0 and offloads the de-interlacing to hardware, which is particularly helpful for saving valuable DSP MIPS. Surprisingly, once video compression is taken into account, the resizer hardware sometimes delivers quality comparable to that of high-complexity de-interlacing algorithms. The simple method described in this article can be used to de-interlace video in many applications. It is most effective when there is a large amount of motion in the video frames, because static images tend to make its defects more visible.

Luminance and chrominance encoding

NTSC defines standard definition (NTSC SD) as 720 pixels per line, 480 lines per frame, at 30 frames per second. Each pixel carries three components: Y is the luma (brightness) information, Cb (U) is the blue-difference chroma, and Cr (V) is the red-difference chroma. When NTSC was adopted, engineers encoding video streams were limited by transmission bandwidth and computing capability. Because the human eye is more sensitive to brightness information, NTSC requires only 2:1 horizontal subsampling of the chroma, which reduces this burden. Each frame captured by the CCD camera therefore has 720×480 Y values, 360×480 U values, and 360×480 V values. Each value is 8 bits (1 byte) in the range [0, 255], so each NTSC SD frame is (720 + 360 + 360) × 480 = 691,200 bytes. The Y/U/V components of a captured frame are normally interleaved in scan-line order in a YUV 4:2:2 format. There are two common byte orderings, but for simplicity we assume the data is in UYVY 4:2:2 format (Figure 1).

As mentioned above, most encoders require the input video in YUV 4:2:0 planar format. There are two major differences between the interleaved 4:2:2 data and the planar 4:2:0 data. In 4:2:0 the chroma is additionally subsampled vertically by 2:1; that is, for each NTSC SD frame, each U or V component contains 360 × 240 bytes instead of 360 × 480 bytes. Each NTSC SD frame in 4:2:0 format is therefore 518,400 bytes [(720 × 480) + (360 × 240 × 2)]. To balance real-time performance against acceptable image quality, this additional chroma downsampling is required.
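To make the byte counts above concrete, here is a plain-C sketch of the same UYVY 4:2:2-to-4:2:0 planar conversion; in the approach described in this article the resizer hardware performs this work instead. The function name and the choice to keep only the even chroma lines are illustrative assumptions, not taken from the article.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative software reference for the UYVY 4:2:2 -> YUV 4:2:0 planar
 * conversion described above. Chroma is downsampled vertically by keeping
 * only the even lines (one simple choice; averaging line pairs is another).
 * For NTSC SD, width = 720 and height = 480.                               */
static void uyvy422_to_yuv420p(const uint8_t *uyvy, int width, int height,
                               uint8_t *y_plane, uint8_t *u_plane,
                               uint8_t *v_plane)
{
    for (int row = 0; row < height; row++) {
        const uint8_t *src = uyvy + (size_t)row * width * 2; /* 2 bytes/pixel */
        uint8_t *y_dst = y_plane + (size_t)row * width;
        int keep_chroma = (row % 2 == 0);                    /* 2:1 vertical  */
        uint8_t *u_dst = u_plane + (size_t)(row / 2) * (width / 2);
        uint8_t *v_dst = v_plane + (size_t)(row / 2) * (width / 2);

        for (int col = 0; col < width; col += 2) {
            /* UYVY byte order: U0 Y0 V0 Y1 covers two horizontal pixels. */
            uint8_t u  = src[0];
            uint8_t y0 = src[1];
            uint8_t v  = src[2];
            uint8_t y1 = src[3];
            src += 4;

            y_dst[col]     = y0;
            y_dst[col + 1] = y1;
            if (keep_chroma) {
                u_dst[col / 2] = u;
                v_dst[col / 2] = v;
            }
        }
    }
}
```

For an NTSC SD frame this turns the 691,200-byte interleaved buffer into the 518,400-byte planar layout mentioned above.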
An efficient implementation of a video compression standard also usually requires the luminance and chrominance components to be stored separately, because the encoding algorithm may process them differently. Figure 2 shows the NTSC SD video frame in 4:2:0 planar format.

Interlacing artifacts

Interlaced scanning captures an image in two passes: one pass captures the even lines and the other captures the odd lines. The two captures are separated in time and then merged to form a complete frame. When the two fields are merged, interlacing artifacts can appear. For example, the vertical edges of a rectangular box can show a sawtooth effect (see the last frame in Figure 3). This artifact, produced by capturing a moving object at two different instants, is called a combing artifact. For the NTSC standard, with video captured at 30 frames per second, the time between the start of two consecutive field captures (the top field and its complementary bottom field) is 16.67 ms. If fast motion occurs in the scene within that interval, interlacing artifacts appear in the frame.

Because these artifacts appear as high-frequency noise, they can cause serious problems for a progressive-scan video encoder, mainly because of how human vision and the compression standards work. Virtually all video compression standards rest on two important assumptions: 1. The human eye is more sensitive to low-frequency information, so acceptable visual quality can be maintained even if some high-frequency information is removed from the original frame. 2. Encoding is block based. Each 16x16 or 8x8 pixel block in a video frame is likely to have a very similar block in an adjacent frame, so the encoder typically searches for a similar block in a previously encoded frame and encodes only the delta between them. This yields a high compression ratio, and most compression standards define a motion estimation (ME) module specifically for this purpose. Unfortunately, when pixel blocks contain interlacing artifacts, the ME module has difficulty finding similar blocks in previously encoded frames. The deltas become larger, and the encoder needs more bits to encode them. The best approach is therefore to reduce or remove the interlacing artifacts before the captured frames are fed to a progressive-scan encoder.

Processing the interlaced video

As mentioned above, many sophisticated algorithms exist for high-quality de-interlacing. There is also a more direct and simple method: using the resizer hardware, such as that in Texas Instruments' TMS320DM6446 digital media processor. The resizer simply discards one field entirely and uses the information in the remaining field to fill in the missing data. Discarding one field of each frame of a 480i60 video leaves 240-line fields at 30 fields per second; the data is then scaled vertically by 2x to produce a 480p30 (480 lines, 30 frames/second) de-interlaced result. The advantage of this method is that it eliminates the interlacing artifacts completely, but vertical fidelity suffers a noticeable loss. The method is intended as a preprocessing step before progressive-scan compression.
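The sketch below is a software model of the field-discard approach just described: keep the top field of an interlaced frame and scale it vertically by 2x to rebuild a progressive frame. The actual resizer uses programmable polyphase filters; the simple averaging of neighboring kept lines used here is only a stand-in for that filtering, and the function name is ours.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative field-discard de-interlacing: keep only the top field (the
 * even lines) of an interlaced frame and scale it vertically by 2x to
 * rebuild a full progressive frame of the same height.                     */
static void deinterlace_discard_field(const uint8_t *interlaced,  /* width*height */
                                      uint8_t *progressive,       /* width*height */
                                      int width, int height)
{
    for (int out = 0; out < height; out++) {
        uint8_t *dst = progressive + (size_t)out * width;

        if ((out & 1) == 0) {
            /* Even output line: copy the corresponding top-field line as-is. */
            const uint8_t *src = interlaced + (size_t)out * width;
            for (int x = 0; x < width; x++)
                dst[x] = src[x];
        } else {
            /* Odd output line (originally the discarded bottom field):
             * interpolate from the kept lines directly above and below.    */
            int above = out - 1;
            int below = (out + 1 < height) ? out + 1 : above;
            const uint8_t *a = interlaced + (size_t)above * width;
            const uint8_t *b = interlaced + (size_t)below * width;
            for (int x = 0; x < width; x++)
                dst[x] = (uint8_t)((a[x] + b[x] + 1) / 2);
        }
    }
}
```

Applied to the 720x480 luma plane of a 480i frame, this reproduces the 480p30 result described above: the combing is removed entirely, at the cost of some vertical detail.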
This works because lossy video compression algorithms usually remove high-frequency detail anyway (especially at low bit rates). Depending on the source content, this approach can therefore achieve much the same result as a complex de-interlacing algorithm once compression is taken into account. For example, a low-complexity de-interlacer of this kind can be used to convert interlaced broadcast content into low-bit-rate progressive video for display on a mobile phone screen.

Design implementation

The resizer in the TMS320DM6446 processor performs the same basic functions as any scaler, with a few differences worth noting. It supports horizontal and vertical scaling from 1/4x to 4x, the scaling factors in the two directions are independent, and all filter coefficients are programmable.

As an example, take an input frame in UYVY 4:2:2 format (Figure 1) with a resolution of 720×480 pixels per frame (NTSC SD). For the de-interlacing, the resizer is first told that the input frame width is 724 pixels rather than the actual 720, because the horizontal input size of the DM6446 resizer must be adjusted to 720 + delta to achieve exact scaling; delta is computed from the resizer's formula. The resizer is also told that the line pitch is twice the actual pitch, so that it treats two consecutive scan lines as one. This lets the resizer scale the even lines horizontally at 1x (upper left of Figure 4) while the odd lines are discarded (upper right). In the vertical direction the input and output sizes are set to 244 and 480 respectively, so the resizer scales vertically at 2x to fill in the discarded odd lines. Finally, the resizer is told that the output frame width is 720 pixels and the output pitch is 1440 [720 + (360 × 2)] bytes, forming the output frame shown in Figure 4.

To perform the 4:2:2-to-4:2:0 conversion so that a progressive-scan encoder can use the data, the resizer is called three times for each input frame to produce one de-interlaced frame, configured once each for the Y, U, and V components. The starting point is the UYVY 4:2:2 input frame (NTSC SD resolution). The output frame is defined at 4CIF resolution (704×480) rather than NTSC SD resolution (720×480): the 16 rightmost pixel columns of the input frame must be discarded because of the resizer's 32-byte output alignment requirement. An alternative is to drop eight columns on the right and eight on the left.

The first call extracts the Y component from the input frame and de-interlaces it. The resizer is instructed to treat the input frame as a planar image (Figure 5); the de-interlacing is applied only to the Y component. The resizer is also instructed to scale horizontally so that the Y samples are extracted from every other byte of the interleaved input, and to scale vertically at 2x to fill in the discarded odd Y lines. The second call handles the U component, which needs a further 2:1 vertical downsampling. Since the downsampling discards the odd lines anyway, it automatically produces a progressive U buffer, and no separate de-interlacing step is required. For this vertical downsampling the vertical input size is set to 484 and the output size to 240. The V component is processed in the same way as the U component. The parameter sketch below summarizes the three passes.
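To keep the three passes straight, the sketch below collects the parameters discussed above into one structure. The struct, its field names, and every value marked "assumed" are ours for illustration only; they are not the DM6446 resizer driver API, and only the values flagged "per article" come from the text.

```c
#include <stddef.h>

/* Hypothetical summary of the three resizer passes described above. */
struct rsz_pass {
    const char *plane;   /* plane produced by this pass                       */
    int in_vsize;        /* vertical input size handed to the resizer         */
    int out_vsize;       /* vertical output size                              */
    int in_pitch;        /* input line pitch in bytes as told to the resizer  */
    int out_width;       /* output width in pixels                            */
};

static const struct rsz_pass passes[3] = {
    /* Y: pitch doubled so odd (bottom-field) lines are skipped, then 2x
     * vertical upscale (244 -> 480, per article) fills them back in.         */
    { "Y", 244, 480, 2 * 1440 /* per article: twice the real UYVY pitch */, 704 },

    /* U: no de-interlacing needed; the 2:1 vertical downsample
     * (484 -> 240, per article) discards the odd lines anyway.               */
    { "U", 484, 240, 1440 /* assumed: real UYVY pitch, no doubling */, 352 },

    /* V: configured the same way as U (per article).                         */
    { "V", 484, 240, 1440 /* assumed */, 352 },
};

/* Total size of the resulting 4CIF 4:2:0 planar frame:
 * 704*480 + 352*240 + 352*240 = 506,880 bytes.                               */
static size_t deinterlaced_frame_bytes(void)
{
    size_t total = 0;
    for (int i = 0; i < 3; i++)
        total += (size_t)passes[i].out_width * passes[i].out_vsize;
    return total;
}
```

The horizontal input sizes (704 or 352 plus the resizer's delta) and the planar output pitches are left implicit here because the article gives them only in general terms.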
In summary, a resizer engine can be used to de-interlace video and convert it from YUV 4:2:2 to 4:2:0 planar format as a preprocessing step before a video compression codec encodes it. Because of factors such as the codec's own removal of high-frequency components, the resulting quality should be judged on the compressed video. This technique is not suitable for all applications, however, and care must be taken to ensure that the output quality is acceptable for the application at hand.
Authors: Zhengting He and Senthil Natarajan, Software Application Engineers, Texas Instruments
Source: http://cn.fpdisplay.com/technology/Tech_Shtml/2_2007124102915701.shtml