By Hua Wei, product manager at Qupai Cloud
Deep Decomposition: Video Basics
"Video technology has grown to a more than 100-year history, although it is shorter than the history of photography, but it has been the most important medium for a long time in the past."
Because of the rise of Internet in the new century, the traditional media technology has a better development platform, and new multimedia technology has emerged. Multimedia technology Not only covers the expression of traditional media, but also increases the interactive function and becomes the most important information tool at present.
In the multimedia technology, the first development is the picture information technology, because the information source is more extensive, the generation speed high production efficiency, and the application threshold is low, so once is the Internet most attractive content.
However, with the continuous progress of technology, the production and processing of video technologies gradually reduced the threshold, information resources continue to grow, at the same time, because the content of video information more rich and complete congenital advantages, in recent years has gradually become the mainstream.
So then I'll make a detailed introduction to video information technology. The first thing we talk about today is video technology in the analog and digital eras. ”
Video technology in the analog era
The earliest video technology came from film, and film technique came from photography. Because the principles of modern Internet video technology derive from television, this article covers only television technology.
The world's first television was born in 1925, invented by the Briton John Logie Baird; it was also the world's first complete system for shooting, transmitting, and receiving a TV signal. The principle of television can be understood as three stages: signal acquisition, signal transmission, and image restoration.
For signal acquisition, the camera uses a photosensitive device to measure the intensity of incoming light (early television was black and white, so only a luminance signal was captured). Every 30-40 milliseconds the collected intensity information is sent to the receiving end, where, at the same 30-40 millisecond interval, the signal is scanned onto the screen for display.
As for restoring the signal: a CRT television fires an electron gun at a phosphor layer, which glows where it is struck, and drawing a complete image takes some time. The gun starts at the top of the screen and emits line after line until it reaches the bottom, then returns to the top to draw the next image. Early electron guns could not scan fast enough, so each pass drew either only the odd lines or only the even lines; the two passes were then superimposed to form one complete frame. That is why early television used interlaced scanning.
So how is the signal itself produced?
As in a camera, the photosensitive device responds to light and produces a different voltage depending on the intensity of the incoming light. These voltages are converted into currents of varying strength and transmitted to the receiving end. When the TV's electron gun fires at the screen, the phosphor glows brighter the stronger the current is and dimmer the weaker it is. This is how a black-and-white signal is produced.
So what are frames and fields?
As mentioned earlier, video capture is a series of images taken continuously; for example, if an image is captured every 40 milliseconds, 25 images are produced per second. Each image is a frame, so 25 images per second can be described as a frame rate of 25 FPS (frames per second). Because early television scanned the screen interlaced, every two scans produced one image, and each scan is called a field: two fields make one frame. So at a frame rate of 25 FPS, interlaced scanning runs at 50 fields per second.
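To make the numbers above concrete, here is a minimal sketch of the frame/field arithmetic; the 40 ms / 25 FPS figures are the PAL-style values used in the text.

```python
# Frame/field arithmetic for an interlaced PAL-style signal (numbers from the text).
frame_interval_ms = 40                          # one complete image every 40 ms
frame_rate = 1000 / frame_interval_ms           # 25 frames per second (25 FPS)

fields_per_frame = 2                            # interlaced: odd lines, then even lines
field_rate = frame_rate * fields_per_frame      # 50 fields per second

print(f"{frame_rate:.0f} FPS -> {field_rate:.0f} fields per second")
```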
In the analog era there was no single worldwide television standard; there were many, known as TV signal formats. In the black-and-white era there were a great many: A, B, C, D, E, G, H, I, K, K1, L, M, N and so on, thirteen in total (China used the D and K systems). In the color era the formats were simplified to three: NTSC, PAL, and SECAM, with NTSC further divided into NTSC 4.43 and NTSC 3.58. China's color television used the PAL system with D modulation, so it is also called PAL-D. If you are interested, the Baidu Encyclopedia entry on "TV format" has more detail.
In addition, you may notice that the field frequency matches the frequency of the AC mains. For example, China's grid frequency is 50 Hz, and the PAL-D system runs at 50 fields per second, which is also 50 Hz. Is there a connection? There is, but I recommend you look into it yourself.
How is the color signal produced?
In fact, once black-and-white camera technology was established, people had long wanted color cameras. As early as 1861 the British physicist Maxwell demonstrated that any color can be produced by superimposing the three primary colors red, green, and blue. But a photosensitive device responds only to light intensity; it cannot recognize color. To recognize color, the incoming light is split into the three primary colors using filters and a beam-splitting prism, the brightness of each primary is captured separately, and the three signals are then combined to achieve color acquisition.
How is the color signal expressed?
In the black-and-white era, a single signal was essentially enough to restore the image (the synchronization signal is discussed later). With color, can one signal still express a complete color picture, and how?
After color TV appeared, in order to stay compatible with the earlier black-and-white signal (so that a black-and-white set could still receive a color broadcast and display it in black and white), scientists introduced the YUV color representation.
YUV signals are also called color-difference signals (Y, R-Y, B-Y) or component signals (YCbCr or YPbPr). They consist of a luminance signal Y (luminance or luma) and two chroma signals, U and V (chrominance or chroma). A black-and-white set uses only the luminance signal Y, while a color set uses the two additional chroma signals to reproduce color. But where does the YUV signal come from?
First, for black-and-white compatibility, the underlying signal is still the luminance signal. Color itself is produced by superimposing the three RGB primaries, so in order for the YUV signal to be convertible back into the three RGB color values, mathematicians used a color-difference scheme: a Cr signal and a Cb signal are chosen, where Cr is the difference between the RGB red component and the luminance value, and Cb is the difference between the RGB blue component and the luminance value. That is why the YUV signal is sometimes written as Y, R-Y, B-Y and is also called a color-difference signal.
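To make the color-difference idea concrete, here is a minimal sketch of an RGB-to-YCbCr conversion. The coefficients are the ITU-R BT.601 values (the digital successor to the CCIR 601 standard discussed later); the exact weights and scaling differ between standards, so treat this as an illustration rather than a reference implementation.

```python
def rgb_to_ycbcr_bt601(r: float, g: float, b: float):
    """Convert full-range 8-bit RGB to YCbCr using BT.601 luma weights.

    Y  is a weighted sum of R, G and B (the luminance signal);
    Cb is a scaled (B - Y) and Cr a scaled (R - Y) -- the two color-difference signals.
    An offset of 128 centres each chroma channel for 8-bit storage.
    """
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 + 0.564 * (b - y)   # scaled B - Y
    cr = 128 + 0.713 * (r - y)   # scaled R - Y
    return y, cb, cr

# Pure red: R - Y is large, so Cr is near its maximum while Cb drops below centre.
print(rgb_to_ycbcr_bt601(255, 0, 0))
```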
Why is YUV color still in use today?
If you record a video on your phone, transfer the file to a computer, and open it with the MediaInfo software, you will see a long list of the video's parameters. Among them you are sure to find that the color of the phone-shot video is also stored as a YUV signal. Why not use RGB? Black-and-white sets are long gone, aren't they?
In fact, compatibility is no longer the reason: whatever signal mode a video is shot in, once it is stored as a digital file it can be matched to the signal mode of the playback device, because the device has to decode and render the file anyway. At that point any signal mode or color space can be converted into one the device supports.
The main reason the YUV representation has survived is not compatibility but a huge advantage of its own: it saves bandwidth. This matters enormously in digital media.
The human eye is most sensitive to the luminance signal and much less sensitive to the chroma signals, so the amount of chroma information can be reduced without the eye noticing the difference. It is the same idea as MP3 audio compression, which reduces or removes the frequency components the ear is insensitive to, shrinking the file dramatically while remaining essentially inaudible.
As for how the YUV signal reduces the amount of information, see the following citation:
The main YUV sampling formats are YCbCr 4:2:0, YCbCr 4:2:2, YCbCr 4:1:1 and YCbCr 4:4:4. Among them YCbCr 4:1:1 is commonly used; it means that each pixel stores an 8-bit luminance value (the Y value) while each 2x2 block of pixels stores one Cr and one Cb value, and to the naked eye the image hardly changes. In the original RGB model (R, G, B each 8-bit unsigned), one pixel requires 8x3 = 24 bits (with full sampling, YUV likewise uses 8 bits per sample). After 4:1:1 sampling, a group of 4 pixels takes 8x4 (Y) + 8 (U) + 8 (V) = 48 bits, i.e. an average of 8 + 8/4 + 8/4 = 12 bits per pixel. This compresses the image data by half.
The above is quoted from the Baidu Encyclopedia article on "YUV". For reasons of space I will not describe each YUV sampling mode here; the encyclopedia entry explains them in detail.
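The arithmetic in the quoted passage can be reproduced directly. The sketch below computes the average bits per pixel for the common subsampling schemes, assuming 8 bits per sample; both 4:2:0 and 4:1:1 keep one Cb and one Cr sample per four luma samples (arranged differently), which is where the 12-bits-per-pixel figure comes from.

```python
# Average bits per pixel for 8-bit YCbCr under different chroma subsampling schemes.
# Each entry gives (Y, Cb, Cr) samples stored per group of 4 pixels.
BITS_PER_SAMPLE = 8
schemes = {
    "4:4:4": (4, 4, 4),   # full chroma resolution
    "4:2:2": (4, 2, 2),   # chroma halved horizontally
    "4:2:0": (4, 1, 1),   # chroma halved horizontally and vertically
    "4:1:1": (4, 1, 1),   # chroma quartered horizontally
}
for name, (y, cb, cr) in schemes.items():
    bpp = BITS_PER_SAMPLE * (y + cb + cr) / 4
    print(f"{name}: {bpp:g} bits per pixel")   # 24, 16, 12, 12
```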
Video technology in the digital age
As video technology moved into the digital age, the underlying principles did not change much, which is why the analog-era knowledge above was covered first.
Although the basic principles are unchanged, digital video has improved greatly in performance and functionality in every respect. The following are the highlights of video technology's breakthroughs after digitization:
The evolution of color image capture
As mentioned earlier, color capture was originally achieved by splitting the light into the three primary colors and measuring the brightness of each, but this structure is complex and expensive: a color camera needs a prism to split the light and then three photosensitive devices (CCD or CMOS), one per color. The second drawback is that the structure is relatively bulky, which works against miniaturization.
Then Bryce Bayer of Eastman Kodak invented a mosaic color filter, the Bayer filter. Covering the photosensitive device with a three-color mosaic filter allows a single sensor to capture all three colors and eliminates the prism. The result is lower cost and a simpler structure.
With this technology, cameras could become smaller and smaller; the camera modules integrated in today's phones are only a few millimetres thick and a few millimetres across. Of course, in the professional field high-end cameras still use a prism with three CCDs (3CCD), not because manufacturers are unwilling to change but because 3CCD reproduces color better. Professional camera CCDs have also evolved from IT (interline transfer) to FIT (frame interline transfer) designs; interested readers can look at Sony's introductions to FIT-CCD professional cameras. In short, the consumer and professional fields have different needs, so they have taken different routes.
The field concept disappears
In the analog TV era, CRT technology was limited, so interlaced scanning was used to restore the displayed image. Today's televisions are all flat panels (LCD, plasma, laser), and they no longer draw the image line by line; the whole screen is rendered at once. So video shot today generally has no concept of fields. For backward compatibility, however, video file metadata still includes a scan-mode parameter; for a video shot on a phone the scan mode reads "progressive", meaning progressive scan.
Sampling rate and sampling precision
We all know that the biggest difference between analog and digital is how information is stored and transmitted: digital quantizes the analog quantity. Digitizing a continuous process therefore requires sampling, which you can also think of as slicing it into fragments. In audio, for example, the signal is measured at very small time intervals, each measurement is quantized, and the sequence of quantized samples makes up the final information. Video is the same: at a fixed time interval the captured image is digitized and quantized, and the continuous sequence of quantized images forms the video file.
But the sampling rate of video is not what you might assume, i.e. 25 images per second giving a sampling rate of 25 Hz. The ITU (International Telecommunication Union), in its CCIR 601 standard, defines the video sampling standard explicitly:
First, sampling frequency: to keep the signal synchronized, the sampling frequency must be a multiple of the TV signal's line frequency. The CCIR sampling standard common to NTSC, PAL, and SECAM images is:
fs = 13.5 MHz

This sampling frequency is exactly 864 times the line frequency of PAL and SECAM and 858 times that of NTSC, which guarantees that the sampling clock stays synchronized with the line sync signal. For the 4:2:2 sampling format, the luminance signal is sampled at fs and each of the two color-difference signals is sampled at fs/2 = 6.75 MHz; the minimum sampling rate of a chroma component is thus 3.375 MHz.
Second, resolution: from the sampling frequency it follows that each scan line contains 864 sample points for PAL and SECAM and 858 for NTSC. Since every line of the TV signal also carries sync and blanking (retrace) signals, the number of effective image samples is smaller; CCIR 601 specifies 720 effective samples per line for all formats. Different formats have different numbers of active lines per frame (576 for PAL and SECAM, 484 for NTSC), and on this basis CCIR takes 720x484 as the basic sampling standard for digital television.
Third, data volume: CCIR 601 specifies that each sample is quantized with 8 bits, i.e. 256 levels. In practice the luminance signal uses 220 levels and the chroma signals 225 levels; the remaining codes are reserved for synchronization, coding, and other control purposes. Sampling at fs in the 4:2:2 format, the data rate of digital video is:
13.5 (MHz) x 8 (bit) + 2 x 6.75 (MHz) x 8 (bit) = 216 Mbit/s = 27 MByte/s. It can likewise be calculated that with 4:4:4 sampling the data rate of digital video is about 40 MByte per second! At 27 MByte per second, a 10-second digital video already consumes 270 MByte of storage. At this rate a 680 MByte CD-ROM holds only about 25 seconds of digital video, and even a modern high-speed optical drive falls far short of the 27 MByte-per-second transfer rate needed for real-time playback. Uncompressed digital video is simply impractical for today's computers and networks, whether for storage or for transmission, which is why compression is the key issue in multimedia applications of digital video.
As the above citation shows, the sampling rate and sampling precision of YUV were the solution to keeping digital video compatible during the transition from analog to digital: they carry over analog video's line-based scanning mechanism (analog video has no concept of resolution, only of lines). Because the standard was written to unify digital television broadcasting systems, it is generally only seen in the broadcast field and barely appears in other digital video systems; for example, you will not find a sampling-rate parameter in an ordinary video file's metadata.
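To tie the numbers in the citation together, here is a small sketch that checks the 13.5 MHz sampling frequency against the PAL and NTSC line frequencies and recomputes the 4:2:2 data rate; the line counts and frame rates (625/25 and 525/29.97) are the standard analog values and are assumed here rather than taken from the citation.

```python
# Check the CCIR 601 figures quoted above.

# Line frequency = total lines per frame x frames per second.
pal_line_freq = 625 * 25                 # 15 625 Hz
ntsc_line_freq = 525 * 30 / 1.001        # about 15 734 Hz

print(864 * pal_line_freq)               # 13 500 000 Hz -> 13.5 MHz (864 samples per PAL line)
print(858 * ntsc_line_freq)              # 13 500 000 Hz -> 13.5 MHz (858 samples per NTSC line)

# 4:2:2 data rate: luma at 13.5 MHz, each chroma component at 6.75 MHz, 8 bits per sample.
bits_per_second = (13.5e6 + 2 * 6.75e6) * 8
print(bits_per_second / 1e6)             # 216 Mbit/s
print(bits_per_second / 8 / 1e6)         # 27 MByte/s, i.e. 270 MByte for 10 seconds
```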
Video resolution
Video resolution is another hallmark of the digital video age. Analog video uses a line-scanning mechanism, i.e. the image is displayed line by line, and the content within each line is never digitally quantized, so analog video is specified by its number of lines: PAL uses 576 lines, NTSC 480.
In the digital age, to quantize the image precisely, each line must also be sampled and quantized, which gives rise to the concept of resolution. If a PAL video is quantized at 768 points per line, the resolution is 768x576; in other words, a PAL image can be decomposed into 768x576 pixels.
The concept of video resolution looks simple, but in practice it is not. Digital video has many application fields, from the earliest broadcast television to surveillance and security, Internet applications, later high-definition digital television, and now the mobile Internet. Because so many industries are involved, each has developed its own standards, so video resolution is defined by many different standards. Let us take the most common ones, broadcast television and surveillance, as examples:
There is a similar resolution concept in the computer field, such as VGA (640x480), SVGA (800x600), XGA (1024x768), SXGA (1280x1024), SXGA+ (1400x1050), UXGA (1600x1200), WXGA (1280x800), WXGA+ (1280x854 / 1440x900), WSXGA (1600x1024), WSXGA+ (1680x1050), WUXGA (1920x1200), and more, the highest at present being WQUXGA (3840x2400). These standards were originally defined by IBM for analog computer displays, were adopted and extended by other manufacturers, and were later unified under the VESA standards organization.
But why can't a resolution just be a pair of numbers instead of a string of letters in front of it? That pile of letters certainly confuses a lot of people.
The reason is that a resolution standard does not merely fix a number of pixels; it also specifies how that pixel grid is imaged, including color depth, bandwidth, and scanning method, and, at a deeper level, circuit design, gain control, timing, and addressing. Without specifying how the image is actually generated, products from different manufacturers could hardly be compatible, and we would not have today's thriving computer market.
By the same token, standardizing both the resolution and the way it is implemented helps the industry stay unified and compatible.
So what are the resolution standards in the surveillance and security field? See below:
CIF is short for Common Intermediate Format, a common image transfer format used in video conferencing; it is part of the ITU H.261 protocol. You can see that at each resolution the number of chroma samples per line and the number of chroma lines are half of the corresponding luma values. This is because the standard takes both camera performance and transmission cost into account: it samples every other pixel and scans interlaced, and the skipped pixels are filled back in by interpolation.
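For a concrete sense of scale, this sketch lists the commonly published CIF-family luma resolutions for PAL-based (625-line) systems; these are general reference values rather than a quotation from any particular vendor table, so treat them as an assumption.

```python
# Commonly published CIF-family luma resolutions (PAL-based, 625-line systems).
cif_family = {
    "QCIF":         (176, 144),   # quarter CIF
    "CIF":          (352, 288),   # Common Intermediate Format (H.261)
    "2CIF/Half-D1": (704, 288),
    "4CIF":         (704, 576),
    "D1":           (720, 576),   # full CCIR 601 active picture
}
for name, (width, height) in cif_family.items():
    print(f"{name:>12}: {width} x {height} = {width * height:,} luma samples")
```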
But you rarely see these parameters today. Why? Simply because surveillance has gone high-definition: today's systems are at the D2, D3 level, corresponding to 720p and 1080p.
So how is resolution defined in the broadcast television field?
The resolution standards for PAL and NTSC were mentioned above; there is also the SECAM format, whose resolution is 720x576. You will notice that SECAM has the same number of lines as PAL and differs only in the samples per line; this is because SECAM uses a different modulation carrier.
In the standard-definition era, resolution was also understood somewhat differently than it is now. A SECAM frame, for example, has 625 lines, yet the resolution is 720x576, i.e. only 576 lines. That is because the transmitted video signal includes both the vertical trace and the vertical blanking (retrace) interval, and the retrace lines must be kept off the screen so they do not interfere with the picture, which leaves 576 visible lines.
In the high-definition era, digital television introduced the HDTV standard, which defines display resolutions of 1280x720 progressive (commonly known as 720p), 1920x1080 interlaced (1080i), and 1920x1080 progressive (1080p).
Now that HD digital television is widespread, the industry is moving to 4K, the so-called UHDTV (Ultra High Definition Television). The UHDTV draft defines two resolution standards, 4K (3840x2160) and 8K (7680x4320), supporting the three frame rates 50 Hz, 59.94 Hz, and 60 Hz, with progressive scanning only. UHDTV uses orthogonal sampling, a pixel aspect ratio (PAR) of 1:1, and a display aspect ratio (DAR) of 16:9.
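As a quick arithmetic check, both UHDTV resolutions reduce to the stated 16:9 display aspect ratio once the pixel aspect ratio is 1:1:

```python
from math import gcd

# With square pixels (PAR 1:1), the display aspect ratio is simply width:height in lowest terms.
for width, height in [(3840, 2160), (7680, 4320)]:
    divisor = gcd(width, height)
    print(f"{width}x{height} -> {width // divisor}:{height // divisor}")   # both print 16:9
```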
The concepts of pixel aspect ratio and display aspect ratio are relatively simple and are not explained further here.
About signal synchronization
Signal synchronization is a very important technology in broadcasting, because if synchronization fails the picture on your TV becomes unreadable, for example like this:
A picture like that is caused by loss of synchronization: each line scan fails to start at its specified position.
To display the image content in the correct place, a synchronization signal must constrain it. Whether in the analog or the digital TV era, and whether the display is a television or a monitor, the signal must be synchronized.
There are generally two synchronization signals: field (vertical) sync, VSYNC, and line (horizontal) sync, HSYNC. Whatever the type of signal interface, it carries one or both of these sync signals.
Pin definition of the VGA signal cable
Another form of the VGA interface, also called the RGBHV interface
DVI interface pin definition
Dedicated video synchronization interfaces on professional equipment
Many interfaces, such as a TV's composite input, HDMI, a monitor's DisplayPort, and the SDI and HD-SDI inputs on professional equipment, have no dedicated pins for field sync and line sync, but that does not mean these signals need no synchronization; rather, the sync signals are modulated into the signal itself.
In other words, the video connectors we commonly see carry not just the pure video information but a lot of other data as well, such as sync signals, clock/timecode (TC) signals, CEC control signals, HDCP copy-protection information, and the serial clock and device/resolution identification information.
To be continued
"Deep decomposition" listening to the fun Pat cloud product Manager Anatomy Video Basics