About RGB, YUY2, YUYV, YVYU, UYVY, and AYUV
Common RGB/YUV formats in DirectShow
Source: http://hqtech.nease.net
Author: Lu Qiming
Knowledge: RGB and YUV ---- From DirectShow practices by Lu Qiming
A computer color display works on the same principle as a color television: additive mixing of R (red), G (green), and B (blue) light. Three electron beams of different intensities strike the red, green, and blue phosphors on the inside of the screen, making them emit light and produce color. This way of representing color is called the RGB color space, and it is also the most common color space representation in multimedia computing.
According to the trichromatic principle, any colored light F can be mixed from different amounts of R, G, and B:

F = r [R] + g [G] + b [B]

where r, g, and b are the coefficients of the three primaries in the mix. When all three coefficients are 0 (weakest), the mix is black; when all three are at their maximum (strongest), the mix is white. By adjusting the values of r, g, and b, any color between black and white can be mixed.
Where does YUV come from? Modern color television systems usually record video with a three-tube color camera or a color CCD camera. The captured color image signal is color-separated, amplified, and corrected to obtain RGB, which a matrix conversion circuit then turns into a luminance signal Y and two color-difference signals, R-Y (i.e., V) and B-Y (i.e., U). Finally, the transmitter encodes the luminance and color-difference signals separately and sends them over the same channel. This representation of color is the so-called YUV color space.
The importance of the YUV color space is that its luminance signal Y is separated from its chrominance signals U and V. If only the Y component is used, without U and V, the image is grayscale. Color television uses the YUV space precisely to solve compatibility with black-and-white sets: thanks to the luminance signal Y, a black-and-white TV can also receive a color broadcast.
The conversion formulas between YUV and RGB are as follows (RGB values range from 0 to 255):

Y = 0.299R + 0.587G + 0.114B
U = -0.147R - 0.289G + 0.436B
V = 0.615R - 0.515G - 0.100B

R = Y + 1.14V
G = Y - 0.39U - 0.58V
B = Y + 2.03U
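As a quick sanity check on these formulas, here is a minimal C sketch of both conversion directions (the function names are ours, not from DirectShow). Note that the rounded inverse coefficients make a round trip only approximate:

```c
#include <math.h>

/* RGB (0-255) -> YUV, using the coefficients from the text above */
static void rgb_to_yuv(double r, double g, double b,
                       double *y, double *u, double *v) {
    *y =  0.299 * r + 0.587 * g + 0.114 * b;
    *u = -0.147 * r - 0.289 * g + 0.436 * b;
    *v =  0.615 * r - 0.515 * g - 0.100 * b;
}

/* YUV -> RGB, using the (rounded) inverse coefficients from the text */
static void yuv_to_rgb(double y, double u, double v,
                       double *r, double *g, double *b) {
    *r = y + 1.14 * v;
    *g = y - 0.39 * u - 0.58 * v;
    *b = y + 2.03 * u;
}
```

For pure white (R = G = B = 255) the Y coefficients sum to 1.0, so Y comes out as 255 with U and V essentially 0, as expected for a colorless pixel.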
In DirectShow, common RGB formats include RGB1, RGB4, RGB8, RGB565, RGB555, RGB24, RGB32, and ARGB32; common YUV formats include YUY2, YUYV, YVYU, UYVY, AYUV, Y41P, Y411, Y211, IF09, IYUV, YV12, YVU9, YUV411, and YUV420. Used as the subtype of a video media type, their corresponding GUIDs are listed in Table 2.3.
Table 2.3 Common RGB and YUV formats

GUID                  Format description
MEDIASUBTYPE_RGB1     2 colors; each pixel is represented by 1 bit; a palette is required
MEDIASUBTYPE_RGB4     16 colors; each pixel is represented by 4 bits; a palette is required
MEDIASUBTYPE_RGB8     256 colors; each pixel is represented by 8 bits; a palette is required
MEDIASUBTYPE_RGB565   each pixel is represented by 16 bits; the R, G, and B components use 5, 6, and 5 bits respectively
MEDIASUBTYPE_RGB555   each pixel is represented by 16 bits; each RGB component uses 5 bits (the remaining 1 bit is unused)
MEDIASUBTYPE_RGB24    each pixel is represented by 24 bits; each RGB component uses 8 bits
MEDIASUBTYPE_RGB32    each pixel is represented by 32 bits; each RGB component uses 8 bits (the remaining 8 bits are unused)
MEDIASUBTYPE_ARGB32   each pixel is represented by 32 bits; each RGB component uses 8 bits (the remaining 8 bits carry the alpha channel)
MEDIASUBTYPE_YUY2     YUY2 format, packed
MEDIASUBTYPE_YUYV     YUYV format (actually identical to YUY2)
MEDIASUBTYPE_YVYU     YVYU format, packed
MEDIASUBTYPE_UYVY     UYVY format, packed
MEDIASUBTYPE_AYUV     4:4:4 YUV format with alpha channel
MEDIASUBTYPE_Y41P     Y41P format, packed
MEDIASUBTYPE_Y411     Y411 format (actually identical to Y41P)
MEDIASUBTYPE_Y211     Y211 format
MEDIASUBTYPE_IF09     IF09 format
MEDIASUBTYPE_IYUV     IYUV format
MEDIASUBTYPE_YV12     YV12 format
MEDIASUBTYPE_YVU9     YVU9 format
The following describes various RGB formats.
RGB1, RGB4, and RGB8 are palettized RGB formats. When the format details of these media types are described, a palette (defining a set of colors) follows the BITMAPINFOHEADER structure. Their image data are not actual color values but indexes of each pixel's color in the palette. Take RGB1 (a 2-color bitmap) as an example: if the two colors defined in the palette are 0x000000 (black) and 0xFFFFFF (white), then the image data 001101010111... (each pixel represented by one bit) describes pixels that are, in order, black, black, white, white, black, white, black, white, black, white, white, white...
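As an illustration, a minimal C helper (the function name is ours) that fetches the palette index of pixel i from a 1-bpp (RGB1) scan line, with the most significant bit of each byte being the leftmost pixel as in Windows DIBs:

```c
/* Return the palette index (0 or 1) of pixel i on a 1-bpp scan line.
   Bit 7 of each byte is the leftmost of its eight pixels. */
static unsigned get_rgb1_index(const unsigned char *line, int i) {
    return (line[i / 8] >> (7 - (i % 8))) & 1;
}
```

For the byte 0x35 (binary 00110101) this yields the pixel sequence 0, 0, 1, 1, 0, 1, 0, 1, matching the example above.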
RGB565 uses 16 bits to represent a pixel: 5 bits for R, 6 for G, and 5 for B. A program usually operates on one pixel with a WORD (one word equals two bytes). After a pixel is read into a word, its bits have the following meaning:

High byte: R R R R R G G G    Low byte: G G G B B B B B
The value of each RGB component can be obtained by masking the word and shifting:

#define RGB565_MASK_RED   0xF800
#define RGB565_MASK_GREEN 0x07E0
#define RGB565_MASK_BLUE  0x001F

R = (wPixel & RGB565_MASK_RED) >> 11;  // value range: 0-31
G = (wPixel & RGB565_MASK_GREEN) >> 5; // value range: 0-63
B = wPixel & RGB565_MASK_BLUE;         // value range: 0-31
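As a side note, the extracted 5- and 6-bit components are often expanded back to 8 bits by replicating their high bits into the low bits. A minimal sketch of this common approximation (the helper names are ours):

```c
/* Expand a 5-bit component (0-31) to 8 bits (0-255) by bit replication */
static unsigned char expand5(unsigned v) {
    return (unsigned char)((v << 3) | (v >> 2));
}

/* Expand a 6-bit component (0-63) to 8 bits (0-255) by bit replication */
static unsigned char expand6(unsigned v) {
    return (unsigned char)((v << 2) | (v >> 4));
}
```

Bit replication maps the minimum to 0 and the maximum to exactly 255, which a plain left shift would not.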
RGB555 is another 16-bit RGB format; each RGB component is represented by 5 bits (the remaining 1 bit is unused). After reading a pixel into a word, its bits have the following meaning:

High byte: X R R R R R G G    Low byte: G G G B B B B B    (X is unused and can be ignored)

The value of each RGB component can be obtained by masking the word and shifting:

#define RGB555_MASK_RED   0x7C00
#define RGB555_MASK_GREEN 0x03E0
#define RGB555_MASK_BLUE  0x001F

R = (wPixel & RGB555_MASK_RED) >> 10;  // value range: 0-31
G = (wPixel & RGB555_MASK_GREEN) >> 5; // value range: 0-31
B = wPixel & RGB555_MASK_BLUE;         // value range: 0-31
RGB24 uses 24 bits per pixel, with 8 bits for each RGB component; each value ranges from 0 to 255. Note that the order of the components in memory is BGR BGR... A pixel is usually manipulated with the RGBTRIPLE structure, defined as:

typedef struct tagRGBTRIPLE {
    BYTE rgbtBlue;   // blue component
    BYTE rgbtGreen;  // green component
    BYTE rgbtRed;    // red component
} RGBTRIPLE;
RGB32 uses 32 bits per pixel, with 8 bits for each RGB component; the remaining 8 bits serve as the alpha channel or are unused. (ARGB32 is RGB32 with an alpha channel.) Note that the order of the components in memory is BGRA BGRA... A pixel is usually manipulated with the RGBQUAD structure, defined as:

typedef struct tagRGBQUAD {
    BYTE rgbBlue;      // blue component
    BYTE rgbGreen;     // green component
    BYTE rgbRed;       // red component
    BYTE rgbReserved;  // reserved byte (alpha channel, or ignored)
} RGBQUAD;
The following describes the various YUV formats. YUV formats fall into two categories: packed and planar. A packed format stores the YUV components in the same array, with several adjacent pixels usually forming a macro-pixel; a planar format stores the three YUV components in three separate arrays, like three superimposed planes. In Table 2.3, YUY2 through Y211 are packed formats, while IF09 through YVU9 are planar formats. (Note: in the format descriptions below, each YUV component carries a subscript. Y0, U0, and V0 are the components of the first pixel, Y1, U1, and V1 those of the second pixel, and so on.)
The YUY2 (and YUYV) format keeps a Y component for every pixel, while U and V are sampled every two pixels horizontally. A macro-pixel is 4 bytes and actually represents 2 pixels. (4:2:2 means that every 4 pixels carry 4 Y components, 2 U components, and 2 V components.) The component order in the image data is:

Y0 U0 Y1 V0  Y2 U2 Y3 V2 ...
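A minimal C sketch of reading one pixel from a YUY2 scan line, assuming the Y0 U0 Y1 V0 layout just shown (the function name is ours); note how the two pixels of a macro-pixel share one U and one V:

```c
/* Return the Y, U, V of pixel x on a YUY2 scan line.
   Each 4-byte macro-pixel holds Y0 U0 Y1 V0 for a pixel pair. */
static void yuy2_pixel(const unsigned char *line, int x,
                       unsigned char *y, unsigned char *u, unsigned char *v) {
    const unsigned char *mp = line + (x / 2) * 4;  /* start of the macro-pixel */
    *y = mp[(x % 2) * 2];  /* Y0 at offset 0, Y1 at offset 2 */
    *u = mp[1];            /* shared U */
    *v = mp[3];            /* shared V */
}
```

Pixels 0 and 1 return the same U and V but different Y, which is exactly the 4:2:2 sharing described above.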
The YVYU format is like YUY2 except for the component order in the image data:

Y0 V0 Y1 U0  Y2 V2 Y3 U2 ...
The UYVY format is also like YUY2 except for the component order in the image data:

U0 Y0 V0 Y1  U2 Y2 V2 Y3 ...
The AYUV format has an alpha channel and keeps all three YUV components for every pixel. The image data is laid out as:

A0 Y0 U0 V0  A1 Y1 U1 V1 ...
The Y41P (and Y411) format keeps a Y component for every pixel, while U and V are sampled every 4 pixels horizontally. A macro-pixel is 12 bytes and actually represents 8 pixels. The component order in the image data is:

U0 Y0 V0 Y1  U4 Y2 V4 Y3  Y4 Y5 Y6 Y7 ...
In the Y211 format, Y is sampled every two pixels horizontally, and U and V every four pixels. A macro-pixel is 4 bytes and actually represents 4 pixels. The component order in the image data is:

Y0 U0 Y2 V0  Y4 U4 Y6 V4 ...
The YVU9 format keeps a Y component for every pixel. For the chrominance, the image is divided into 4 x 4 macro-blocks, and each macro-block gets one U component and one V component. The image data is stored as the Y array for the whole image first, followed by the V component array and then the U component array. The IF09 format is similar to YVU9.
The IYUV format keeps a Y component for every pixel. For the chrominance, the image is divided into 2 x 2 macro-blocks, and each macro-block gets one U component and one V component. The YV12 format is similar to IYUV (differing in that its V plane precedes its U plane).
The YUV411 and YUV420 formats are mostly found in DV data; the former is used by NTSC, the latter by PAL. YUV411 keeps a Y component for every pixel, while U and V are sampled every 4 pixels horizontally. YUV420 does not mean the V component is sampled zero times: compared with YUV411, it doubles the chrominance sampling rate horizontally but halves it vertically, sampling U and V on alternate lines, as shown in Figure 2.12.
Color problems:
When working with DVDRips or video encoding, we often run into color-related terms such as YUV, RGB, YV12, and so on. Many people feel dizzy and confused when they first meet them.
For another example, many articles stress that Fast recompress should be selected when processing in VirtualDubMod, but what exactly distinguishes the Fast recompress, Normal recompress, and Full processing modes?
This article answers these questions one by one.
This is a summary article, so many paragraphs are taken directly from other articles; my thanks to the original authors. In particular, it draws on "The Chroma Upsampling Error" by Don Munsil & Stacey Spears, among other articles.
1. What is RGB?
RGB stands for the three primary colors: R = red, G = green, B = blue.
2. What are YUV / YCbCr / YPbPr?
The luminance signal is often called Y, and the chrominance signal consists of two independent signals. Depending on the color system and format, the two chrominance signals are called U and V, Pb and Pr, or Cb and Cr. These names come from different encoding schemes, but they are essentially the same concept. On a DVD, the chrominance signals are stored as Cb and Cr (C stands for color, b for blue, r for red).
3. What are 4:4:4, 4:2:2, and 4:2:0?
Over the past decades, video engineers have found that the human eye is less sensitive to color than to luminance. There is a physiological basis for this: the human retina has more rod cells than cone cells, and roughly speaking, the rods detect brightness while the cones detect color. Your eyes therefore resolve light and dark much more precisely than color. Because of this, there is no need to store all of the chrominance information in our video: if the eye cannot see it anyway, why waste storage space (and money) on it?
Consumer videotape formats like Beta and VHS already took advantage of this, giving most of the tape's bandwidth to the black-and-white signal (the "luminance") and only a small amount to the color signal (the "chrominance").
In MPEG-2 (that is, the compression format used on DVDs), the Y, Cb, and Cr signals are stored separately (which is why component video transmission needs three cables). The Y signal is black-and-white and is stored at full resolution. Because the human eye is less sensitive to color information, however, the chrominance signals are not stored at full resolution.
The format with the highest chrominance resolution is 4:4:4: for every 4 Y samples there are 4 corresponding Cb samples and 4 Cr samples. In other words, the chrominance resolution equals the luminance resolution. This format is used mainly inside video processing equipment to avoid quality loss during processing. When the picture is recorded to a master tape such as D1 or D5, the chrominance is usually reduced.
Figure 1 shows the distribution of luminance and chrominance samples in the 4:4:4 format. As the figure shows, every pixel in the picture has its own chrominance and luminance sample.
Next comes 4:2:2: for every 4 Y samples there are 2 Cb samples and 2 Cr samples. In this format the chrominance signal has the same number of scan lines as the luminance signal, but each scan line carries only half as many chrominance samples. When the signal is decoded, the "missing" chrominance samples are usually reconstructed by an interpolation algorithm from the chrominance information on either side.
Figure 2 shows the distribution of luminance and chrominance samples in the 4:2:2 format. Here every pixel has its own luminance sample, but half of the chrominance samples have been discarded, so a chrominance sample is present only at every other sampling point. When the picture is displayed, the missing chrominance is interpolated from the colors on either side. As mentioned above, the human eye is less sensitive to color than to luminance, and most people cannot tell a 4:4:4 picture from a 4:2:2 one.
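The interpolation just described can be sketched in its simplest form, a rounded average of the two stored neighbors; real decoders use far more elaborate filters, as the text notes later (the function name is ours):

```c
/* Rebuild a missing 4:2:2 chroma sample as the rounded mean of its
   two stored horizontal neighbors - a minimal sketch, not a real
   decoder's upsampling filter. */
static unsigned char interp_chroma(unsigned char left, unsigned char right) {
    return (unsigned char)(((int)left + (int)right + 1) / 2);
}
```

The +1 makes the integer division round to nearest instead of always truncating.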
The format with the lowest chrominance resolution, and the one used on DVDs, is 4:2:0. The name is actually confusing: read literally it would mean 2 Cb samples and 0 Cr samples per 4 Y samples, which is not at all the case. What it really means is that on each horizontal scan line there are only half as many chrominance samples as luminance samples, and that the chrominance has only half as many scan lines as the luminance. In other words, the chrominance resolution is half the luminance resolution both horizontally and vertically. For example, if the full picture is 720 x 480, the luminance signal is 720 x 480 but the chrominance signal is only 360 x 240.
In 4:2:0, the "missing" chrominance samples are reconstructed not only by interpolating between neighboring samples to the left and right; entire rows of chrominance must also be interpolated from the rows above and below. The point of all this is to make the most economical use of the DVD's storage space. True, 4:4:4 looks great, but if a whole movie were stored in 4:4:4, the DVD disc would have to be at least 2 feet (more than 60 centimeters) in diameter!
Figure 3 shows the arrangement of luminance and chrominance samples in a non-interlaced picture in the 4:2:0 format. As in 4:2:2, each scan line carries only half as many chrominance samples as luminance samples. Unlike 4:2:2, the chrominance is also halved vertically, so across the whole picture there is only 1/4 as much chrominance sampling as luminance sampling. Note that in 4:2:0 the chrominance samples sit midway between two scan lines. Why? Simple: the chrominance samples on a DVD are based on the "average" chrominance of the scan lines above and below. In Figure 3, the first row of chrominance samples (centered between line 1 and line 2) is the "average" of lines 1 and 2, and the second row (centered between line 3 and line 4) is likewise the average of lines 3 and 4.
Although "average" appears many times here, it is not the simple (A + B) / 2 average in the usual sense; chrominance processing uses an extremely elaborate algorithm to minimize distortion and stay close to the original quality.
4. What are YV12 and YUY2?
On a personal computer, YUV data is wrapped in one of several layouts before being handed to software or hardware for processing. There are two kinds of layout: packed formats, which pack each Y together with its corresponding U and V; and planar formats, which store Y, U, and V separately, split into three planes.
YV12 and YUY2 are both YUV layouts. (Strictly speaking, only YUY2 is a packed format; YV12 is a planar format.)
The difference between YV12 and YUY2 is that YV12 is YUV 4:2:0, i.e., the native storage format on DVD/VCD, while YUY2 is YUV 4:2:2.
5. Why should Fast recompress be selected when processing in VirtualDubMod?
The reason lies with AviSynth 2.5.
AviSynth 2.5 supports processing YV12 directly. We know that raw MPEG data is YUV 4:2:0, i.e., the YV12 format. Previously, when compressing to DivX/XviD, the pipeline looked like this:

DVD/VCD (YUV) -> DVD2AVI (YUV -> YUV 4:2:2 -> YUV 4:4:4 -> RGB24) -> VFAPI (RGB24) -> TMPGEnc/AviUtl/VirtualDub (RGB24) -> DivX/XviD codec (RGB24 -> YUV 4:2:0) -> MPEG-4 (YUV)

P.S. VFAPI can only pass RGB24 internally, so everything is converted to RGB24 output.
Or:

DVD/VCD (YUV) -> mpg2dec.dll (YUV -> YUV 4:2:2) -> AviSynth 2.0.x (only filters that support YUV 4:2:2 can be used; RGB24/32 filters cannot) -> VirtualDub (YUV; VirtualDub's own filters cannot be used, because they operate on RGB32, and Fast recompress must be selected so that the YUV 4:2:2, i.e., YUY2, data is passed untouched to the codec for compression) -> DivX/XviD codec (YUV -> YUV 4:2:0) -> MPEG-4 (YUV)
So the old pipeline involved several YUV <-> RGB conversions, and these conversions are lossy: the more of them there are, the more of the original color information is lost. The conversions are also computationally expensive (which explains why playback stutters noticeably when YV12 must be converted to RGB for output, even though RGB output quality is indeed higher). Then someone (Marc FD) thought: the final MPEG output has to be stored as YUV 4:2:0 anyway, so why not do the entire process in YV12? That is, rewrite all the filters in YV12 versions and do the color adjustment, noise filtering, and IVTC directly on YV12. Then:
1. There is less data to process (in YV12 the U/V data is half that of YUY2, and much less than RGB24/32).
2. No conversion computation is needed.
So it is faster, and it also avoids the losses of YUV <-> RGB conversion. Two birds with one stone, no?
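The data-volume point above can be checked with a quick calculation of bytes per frame for each representation (the function names are ours; width and height assumed even):

```c
#include <stddef.h>

/* Bytes per frame: RGB24 stores 3 bytes per pixel */
static size_t frame_bytes_rgb24(size_t w, size_t h) { return w * h * 3; }

/* YUY2 (4:2:2 packed) averages 2 bytes per pixel */
static size_t frame_bytes_yuy2(size_t w, size_t h) { return w * h * 2; }

/* YV12 (4:2:0 planar) averages 1.5 bytes per pixel */
static size_t frame_bytes_yv12(size_t w, size_t h) { return w * h * 3 / 2; }
```

For a 720 x 480 frame this works out to roughly 1.0 MB for RGB24, 0.66 MB for YUY2, and 0.49 MB for YV12, which is exactly why an all-YV12 pipeline has so much less data to move and filter.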
Thus AviSynth 2.5, with its YV12 support, was born.
However, VirtualDub currently does not support YV12: even with Fast recompress selected, it converts YV12 input to YUY2. To get the benefit of YV12 processing, you must therefore use VirtualDubMod, a modified build that does support YV12. Only when Fast recompress is selected does VDM perform no processing at all and hand the data straight to the encoder for compression; this keeps the data in YV12 and completes the all-YV12 pipeline.