YUV is the pixel format represented separately by the brightness parameter and the color parameter. The advantage of this separation is that it can not only avoid mutual interference, but also reduce the color sampling rate without significantly affecting the image quality. YUV is a general saying that its specific arrangement methods can be divided into many specific formats.
YUV format resolution 1 (player -- project2) enables video players in yuv420 format to open *. MP4; *. 264 files based on the card API design to play back the files. The video format is yuv420. The YUV format generally has two categories: packed format and planar format. The former stores the YUV component in the same array, usually several adjacent pixels constitute a macro pixel (macro-pixel), while the latter uses three arrays to separate the three components of YUV, it is like a three-dimensional plane. In table 2.3, yuy2 to y211 are both in the packaging format, while if09 to yvu9 are in the flat format. (Note: When introducing various formats, each YUV component carries a subscript. For example, y0, U0, and V0 indicate the yuv component of the first pixel, y1, u1, and V1 indicate the YUV component of the second pixel, and so on .) Mediasubtype_yuy2
Yuy2Format: mediasubtype_yuyv
YuyvFormat (the actual format is the same as yuy2) mediasubtype_yvyu
YvyuFormat: Package mediasubtype_uyvy
UyvyFormat: Package mediasubtype_ayuv with alpha channel.
YUVMediasubtype_y41p
Y41pFormat: Package mediasubtype_y411
Y411Format (the actual format is the same as y41p) mediasubtype_y211
Y211Mediasubtype_if09 if09 mediasubtype_iyuv iyuv mediasubtype_yv12 yv12 format mediasubtype_yvu9 yvu9 format table 2.3yuv sampling
One of the advantages of YUV is that the sampling rate of the color channel is lower than that of Channel Y without significantly reducing the visual quality. There is a notation used to describe the sampling frequency ratio of U and V to Y. This notation is called A: B: C Notation:
? |
Indicates that the color channel does not have subsampling. |
? |
: 2 horizontal subsample with no vertical subsample. For every two U samples or V samples, each scan row contains four y samples. |
? |
: 0 indicates a horizontal subsample of and a vertical subsample. |
? |
Indicates a horizontal subsample with no vertical subsample. For each U sample or V sample, each scan row contains four y samples. Compared with other formats, sampling is not commonly used. This article will not discuss it in detail. |
Figure 1 shows the sample grid used in the figure. The lighting sample is represented by a cross, and the color sample is represented by a circle.
The main form of sampling is defined in the ITU-R recommendation bt.601. Figure 2 shows the sample grid defined in this standard.
Sampling has two common variations. One form is used for MPEG-2 videos, the other for MPEG-1 as well as ITU-T recommendations H.261 and H.263. Figure 3 shows the sample Mesh Used in the MPEG-1 scheme, and figure 4 shows the sample Mesh Used in the MPEG-2 scheme.
Figure 3. YUV sample position (MPEG-1 solution)
Compared to the MPEG-1 scheme, it is easier to convert between the MPEG-2 scheme and the sample mesh defined for the and formats. Therefore, the preferred MPEG-2 scheme in windows should be considered as the default conversion scheme in the format.
Surface Definition
This section describes the 8-bit YUV format recommended for video rendering. These formats can be divided into several categories:
? |
Format, 32 bits per pixel |
? |
, 16 bits per pixel |
? |
, 16 bits per pixel |
? |
, 12 bits per pixel |
First, you should understand the following concepts to understand the following content:
? |
Surface Origin. For the YUV format described in this article, the origin (0, 0) is always in the upper left corner of the surface. |
? |
Span. The cross-distance of the surface, also known as spacing, refers to the width of the surface, expressed in bytes. For a surface with the origin in the upper left corner, the cross distance is always positive. |
? |
Alignment. The surface alignment depends on the driver of the graphic display. The surface should always be aligned with DWORD, that is to say, all rows on the surface must start from the 32-bit (DWORD) boundary. Alignment can be greater than 32 bits, but it depends on the hardware requirements. |
? |
Packaging format and plane format. The YUV format can be divided into the packaging format and the plane format. In the packaging format, the Y, U, and V components are stored in an array. Pixels are organized into several giant pixel groups. The layout of the giant pixel group depends on the format. In the plane format, Y, U, and V components are stored as three separate planes. |
Format, 32 bits per pixel
We recommend a 4: 4 format with the fourcc code ayuv. This is a packaging format. Each pixel is encoded into four consecutive bytes, and its organizational order is as follows.
The Byte marked with a contains the Alpha value.
, 16 bits per pixel
Two formats are supported. The fourcc code is as follows:
Both are packaged, and each giant pixel is encoded into two pixels of four consecutive bytes. This will multiply the color horizontal sampling by the coefficient 2.
Yuy2
In yuy2 format, data can be considered as a data without positive or negative signs.CharValue array. The first byte contains the first sample of Y, the second byte contains the first sample of U (CB), and the third byte contains the second sample of Y, the fourth byte contains the first V (CR) sample, as shown in 6.
If the image is viewed as composed of two little-EndianWordValue array, the firstWordInclude y0 in the lowest valid bits (LSB) and U in the highest valid bits (MSB. SecondWordInclude Y1 in LSB and V in MSB.
Is yuy2 used for Microsoft DirectX? The preferred format of video acceleration (DirectX VA) is. It is expected to become an interim requirement for DirectX va accelerators that support videos.
Uyvy
The format is the same as yuy2, but the byte order is the opposite-that is, the color bytes and the light byte are flipped (figure 7 ). If the image is viewed as composed of two little-EndianWordValue array, the firstWordContains U in LSB, y0 in MSB, and second in MSBWordInclude V in LSB and Y1 in MSB.
, 16 bits per pixel
We recommend two 16-bit formats at each pixel. The fourcc code is as follows:
Both fourcc codes are in flat format. In both the horizontal and vertical directions, the color channel uses the coefficient 2 for re-sampling.
Imc1
All y samples are usedCharThe array composed of values is first displayed in the memory. Followed by all V (CR) samples, and then all U (CB) samples. The V and U planes have the same distance as the y Planes, thus generating unused areas of memory as shown in figure 8.
Imc3
This format is the same as imc1, but the u and v planes are exchanged:
, 12 bits per pixel
We recommend four bits per pixel 12 bits. The fourcc code is as follows:
? |
Imc2 |
? |
Imc4 |
? |
Yv12 |
? |
Nv12 |
In all these formats, the color channels must be re-sampled with a coefficient of 2 in both the horizontal and vertical directions.
Imc2
The format is the same as that of imc1, but the V (CR) and U (CB) lines are staggered at the semi-cross border. In other words, each complete cross-distance row in the color area starts with a line V, and then a line U starts at the next half cross-distance boundary (Figure 10 ). Compared with imc1. Its color address space is reduced by half, so the overall address space is reduced by 25%. In each format, imc2 is the second preferred format, after nv12.
Imc4
This format is the same as imc2, but u (CB) and V (CR) rows are exchanged:
Yv12
All y samples are usedCharThe array composed of values is first displayed in the memory. This array is followed by all V (CR) samples. The span of the V plane is half of the span of the Y plane, and the behavior of the V plane contains half of the row. The V plane is followed by all the U (CB) examples, and its cross distance and number of rows are the same as the V plane (figure 12 ).
Nv12
All y examples are composedCharThe array composed of values is first displayed in the memory, and the number of rows is an even number. The Y plane is followed byCharValue array, which contains the packaged U (CB) and V (CR) samples, as shown in 13. When the combined U-V array is consideredWordIn an array composed of values, LSB contains the U value, and MSB contains the V value. Nv12 is the preferred 4: 2: 0 pixel format for DirectX va. It is expected to become an interim requirement for DirectX va accelerators that support videos.
YUV format Resolution 2 confirm the h264 video format-h264 supports the encoding and decoding of the YUV (also known as ycrcb) video) it is a color encoding method (PAL) used by European television systems ). YUV is mainly used to optimize the transmission of color video signals, so that it is backward compatible with vintage black and white TVs. Compared with RGB video signal transmission, RGB requires three independent video signals to be transmitted at the same time ). "Y" indicates the brightness (luminance or Luma), that is, the gray scale value, while "u" and "V" indicate the color (chrominance or chroma ), it describes the image color and saturation, and is used to specify the color of a pixel. The "brightness" is created through the RGB input signal by overlapping specific parts of the RGB signal. "Color" defines two aspects of color-tone and saturation, expressed by Cr and CB respectively. Cr reflects the difference between the red part of the GB input signal and the brightness value of the RGB signal. CB reflects the differences between the blue part of the RGB input signal and the brightness value of the RGB signal. Add the concept of field-the concept of field does not start with DV, and the TV system already exists (of course, we all know the relationship between DV and TV). In the end, it is still a scanning problem, specific to the PAL system:
25 frames per second, two scans per frame, one Scan Line (including the television's electron beam) first from top to bottom, and then the second scan from top to bottom
I understand the concept of the field mainly to make the picture more smooth and smooth at a limited bandwidth and cost.
The scanning lines of the two fields are separated from each other. For example, for a frame, the top line number is 0, followed by 1, followed by 2, 3, 4, and 5, 6 .... So the first field may be, or 6; maybe, -- this is the line scan.
In the row-by-row scan mode, scanning lines are scanned sequentially in the order of 0, 1, 2, 3, and 4. Obviously, there is no concept of field at this time. The YUV and ycbcryuv color models are derived from the RGB model. This model separates brightness and color, which is suitable for image processing. Application: simulation field y' = 0.299 * R' + 0.587 * G' + 0.114 * B 'U' =-0.147 * R'-0.289 * G' + 0.436 * B' = 0.492 * (B '-y ') V '= 0.615 * R'-0.515 * G'-0.100 * B' = 0.877 * (R'-y ') r' = y' + 1.140 * V 'G' = y'-0.394 * U'-0.581 * V 'B' = y' + 2.032 * U' YCbCr model derived from YUV Model. YCbCr is the offset version of the YUV color space. application: digital video, ITU-R bt.601 recommended y '= 0.257 * R' + 0.504 * G' + 0.098 * B' + 16cb '=-0.148 * R'-0.291 * G' + 0.439 * B' + 128cr '= 0.439 * R'-0.368 * G'-0.071 * B' + 128r' = 1.164 * (y'-16) + 1.596 * (CR '-128) G' = 1.164*(y'-16)-0.813 * (CR '-128)-0.392 * (CB'-128) B '= 1.164 * (y'-16) + 2.017 * (CB'-128) PS: Each of the above symbols carries a slash, gamma Correction is performed on the basis of the original value. Gamma Correction helps to compensate for the detailed loss caused by linear allocation of gamma values during the anti-aliasing process and enrich the image details. Without Gamma Correction, the details of the dark area are not easily displayed. After this image enhancement technique is used, the layers of the images are clearer. Therefore, YUV in h264 should belong to YCbCr.
Next, let's take a closer look at the YUV format. The YUV format generally has two categories: packed format and planar format. The former stores the YUV component in the same array, usually several adjacent pixels constitute a macro pixel (macro-pixel), while the latter uses three arrays to separate the three components of YUV, it is like a three-dimensional plane. We often say that yuv420 belongs to the planar format YUV. The color ratio is as follows: y0u0v0 Y1 y2u2v2 y3y4 Y5 y6 y7y8u8v8 y9 y10u10v10 y11y12 Y13 Y14 Y15, in the YUV file, how does yuv420 store it? In the YUV sequence of common h264 tests, for example, the YUV sequence of the CIF image size (352*288), there is no file header at the beginning of the file, directly YUV data, first, store the y information of the first frame. The length is 352*288 bytes, then the U information of the first frame is 352*288/4 bytes, and finally the V information of the first frame, the length is 352*288/4 bytes, so we can calculate the total length of the first frame of data is 352*288*1.5, that is, 152064 bytes. If this sequence is 300 frames, the total sequence length is 152064*300 = 44550kb, which is why the common 300-frame CIF sequence is always 44m.
Sampling means that the three elements y, CB, and CR have the same resolution. In this way, the three elements are sampled on each pixel. number 4 refers to the sampling rate of various elements in the horizontal direction. For example, each four brightness sampling points has four CB Cr sampling values.. All information values are fully retained during sampling. during sampling (sometimes recorded as yuy2), the color element has the same resolution as the brightness value in the vertical direction, in the horizontal mode, the unit is half the brightness resolution (indicates that two CB and Cr samples are generated for each four brightness values .) video is used to construct high-quality video color signals.
In the popular sampling formatYv12) Cb and CR have half of the Y Resolution in the horizontal and vertical directions. is somewhat different, because it does not mean to use in actual sampling, but to define this encoding method in the coding history to distinguish it from and ). sampling is widely used in consumer applications, such as video conferencing, digital TV, and DVD storage. Because each color difference element contains 1/4 y sample elements, the 4: 2: 0ycbcr video needs to be exactly 4: 4: 4: 4 or half of the RGB video sample quantity.
Sampling is sometimes described as a "12-bit per pixel" method. The reason can be seen from the sampling of Four pixels. sampling at is performed for 12 times. For each y, CB, and Cr, 12*8 = 96 bits are required, with an average of 96/4 = 24 bits. 6*8 = 48 bits are required for, with an average of 48/4 = 12 bits per pixel.
In a sequence of videos scanned at, the samples corresponding to the Y, CB, AND Cr of a complete video frame are allocated to two fields. We can see that the total number of samples for the interlace scan is the same as the number of samples used in the progressive scan.
Comparison: The y41p (and y411) (packed format) format retains the Y component for each pixel, while the UV component samples every four pixels in the horizontal direction. A macro pixel is 12 bytes, which actually represents 8 pixels. The YUV components in the image data are arranged in the following order: U0 y0 V0 Y1 U4 Y2 V4 Y3 Y4 Y5 y6 Y8... Iyuv format (planar) extracts Y components for each pixel. When UV components are extracted, the image is first divided into several 2x2 macro blocks, then, each Macro Block extracts one u component and one V component. The yv12 format is similar to that of iyuv, but it is still flat mode. The yuv411 and yuv420 formats are mostly used in DV data. The former is used in NTSC and the latter is used in palth. Yuv411 extracts the Y component for each pixel, while the UV component samples every four pixels horizontally. Yuv420 does not mean that the V component sampling is 0. Compared with yuv411, it increases the color difference sampling frequency by one time in the horizontal direction, and reduces the color difference sampling by half at the U/V interval in the vertical direction, as shown in. (It seems that the image cannot be displayed)
Specific requirements for the use of BITs in various formats (sampling with is used, and 8 bits are used for each element ):
Format: Sub-qcif brightness resolution: 128*96 bits per frame: 147456
Format: qcif brightness resolution: 176*144 bits per frame: 304128
Format: CIF brightness resolution: 352*288 bits per frame: 1216512
Format: 4cif brightness resolution: 704*576 bits per frame: 4866048