http://blog.sina.com.cn/s/blog_48c5b1f10100warj.html
MPEG-1: the basis of the VCD output standard.
MPEG-2: the basis of the DVD output standard; used, for example, for HD video transmission and HD satellite TV reception.
MPEG-4: used, for example, for today's popular network streaming, MP4/MP5 players, satellite HD TV, cable digital TV, and computer graphics.
H.264: the newest of these technologies, suited to HD video transmission and HD satellite TV reception.
MPEG-2 Technology
MPEG, short for the Moving Picture Experts Group, was established in 1988. MPEG has so far issued three formal international standards for moving-picture and audio coding, known as MPEG-1, MPEG-2, and MPEG-4, while MPEG-7 and MPEG-21 are still under study.
Technical Introduction
MPEG-2, finalized in 1994, was designed as an advanced industrial standard aimed at higher image quality and higher transfer rates. MPEG-2 provides transfer rates between 3 and 10 Mbit/s, and at NTSC resolution (720x486) it delivers broadcast-quality video together with CD-quality audio. MPEG-2 audio coding provides left and right surround channels plus a reinforced bass channel, and up to 7 audio channels (which is why a DVD can carry dubbing in 8 languages). Thanks to clever design decisions, most MPEG-2 decoders can also play data in MPEG-1 formats, such as VCDs.
[Figure: analysis of an MPEG-2 video encoder]
Because of its excellent performance, MPEG-2 came to be applied to HDTV as well, so MPEG-3, originally intended for HDTV, was abandoned before it was ever born. (MPEG-3 would have required transfer rates of 20-40 Mbit/s, at the cost of slight picture distortion.) Besides being the designated standard for DVD, MPEG-2 also provides broadcast-grade digital video for broadcasting, cable television networks, and direct-broadcast satellite.
Features
Another feature of MPEG-2 is that it can provide a wide range of compression ratios to meet different requirements for image quality, storage capacity, and bandwidth.
[Figure: MPEG-2 encoder]
For end users, because of the resolution limits of existing TV sets, the higher picture quality that MPEG-2 brings (such as DVD picture quality) is not very apparent on a television; its audio features (a reinforced bass channel, more audio channels) are more noticeable.
Layers
The MPEG-2 bitstream is organized hierarchically. To represent the coded data cleanly, the MPEG-2 syntax defines six layers, from top to bottom: the video sequence layer, the group-of-pictures (GOP) layer, the picture layer, the slice layer, the macroblock layer, and the block layer.
The Standard
Basic Introduction
The MPEG-2 standard is currently divided into nine parts, collectively known as the ISO/IEC 13818 international standard. The parts are as follows:
Part 1, ISO/IEC 13818-1 (Systems): describes how multiple video, audio, and data elementary streams are combined into transport streams and program streams.
Part 2, ISO/IEC 13818-2 (Video): describes the video coding method.
Part 3, ISO/IEC 13818-3 (Audio): describes an audio coding method that is backward compatible with the MPEG-1 audio standard.
Part 4, ISO/IEC 13818-4 (Conformance testing): describes methods for testing whether an encoded bitstream conforms to the MPEG-2 standard.
Part 5, ISO/IEC 13818-5 (Software): describes software implementations of the first three parts of the MPEG-2 standard.
Part 6, ISO/IEC 13818-6 (DSM-CC, digital storage media command and control): describes the session and signaling protocols between servers and users in interactive multimedia networks.
These first six parts have been accepted as formal international standards and are widely used in digital TV and other fields. The MPEG-2 standard has three further parts: Part 7 specifies multichannel audio coding that is not backward compatible with MPEG-1 audio; Part 8 has been discontinued; and Part 9 specifies a real-time interface for the transport stream.
The ATM video coding experts group, established in 1990, cooperated with MPEG on the first and second parts of ISO/IEC 13818, so these two parts have also become ITU-T standards: ITU-T Rec. H.222.0 (Systems) and ITU-T Rec. H.262 (Video).
Explanation
In what follows we focus mainly on the MPEG-2 video coding system, i.e., Part 2 (ISO/IEC 13818-2).
MPEG-2 Video Coding
The MPEG-2 video coding standard is a layered series: according to the resolution of the coded image it is divided into four "levels", and according to the set of coding tools used it is divided into five "profiles". The combination of a level with a profile defines a subset of the MPEG-2 video coding standard for a specific application: a set of compression tools applied to images of a given input format, producing a bitstream within a specified rate range. Of the 20 possible combinations, 11 are currently accepted as MPEG-2 conformance points.
As is well known, analog TV today has three coexisting standards: PAL, NTSC, and SECAM. The input format standard for digital TV therefore tries to unify these three into a single digital studio standard, CCIR 601, now known as ITU-R Rec. BT.601. The four input image formats of MPEG-2 are all based on this standard. The low-level input format has 1/4 the pixels of the BT.601 format, i.e., 352x240x30 (meaning a frame rate of 30 frames per second, 240 active lines per frame, and 352 active pixels per line) or 352x288x25. The main-level input image format fully complies with BT.601, i.e., 720x480x30 or 720x576x25. The levels above main lie in the HDTV range, at roughly four times the BT.601 format: the High-1440 level has a 4:3 image aspect ratio and a 1440x1080x30 format, while the High level has a 16:9 aspect ratio and a 1920x1080x30 format.
Among the five profiles of MPEG-2, a higher profile means a larger set of coding tools and finer processing of the coded image, so better image quality is achieved at the same bit rate, though at higher cost. A higher profile uses all the coding tools of the lower profiles plus some additional tools the lower profiles do not use. Consequently, a decoder for a higher profile can decode not only images coded with that profile but also images coded with any lower profile; that is, the MPEG-2 profiles are backward compatible. Simple profile uses the fewest coding tools. Main profile adds bidirectional prediction to the Simple-profile tools. SNR-scalable profile and spatially scalable profile provide a method for layered broadcasting: the coded image information is divided into a base layer and one or more enhancement layers. The base layer contains the information vital for decoding; from it alone the decoder can reconstruct an image, though of lower quality. The enhancement layers contain the image detail. During broadcast, the base layer is strongly protected to give it high resistance to interference. In this way, when the receiver is close and reception is good, both the base and enhancement information are received and a high-quality image is restored; when the receiver is far away and reception is poor, the base information can still be received and an image restored without interruption of decoding. High profile is in effect Main profile operating at a higher bit rate with higher image quality; in addition, while the first four profiles process the Y, U, and V signals with 4:2:0 chrominance handling, High profile also offers the possibility of 4:2:2 chrominance processing.
Currently, standard-definition digital TV uses MP@ML (Main Profile at Main Level), while HDTV uses MP@HL (Main Profile at High Level). Below we take MP@ML as the example to explain the principles and key technologies of the MPEG-2 video coding system.
Principles of MPEG-2 Compression Coding
Principles and Technologies
Principles
In brief, MPEG-2 image compression exploits two properties of images: spatial correlation and temporal correlation. Any scene in an image is composed of many pixels, so a given pixel is usually related in brightness and color to some of the pixels around it; this relationship is called spatial correlation. A scene in a program often consists of many consecutive frames, and successive frames in an image sequence are also related; this is called temporal correlation. These two correlations mean an image carries a large amount of redundant information. If this redundancy can be removed and only a small amount of non-redundant information retained for transmission, the transmission bandwidth can be greatly reduced. The receiver then uses this non-redundant information, together with a decoding algorithm, to restore the original image. A good compression scheme is one that removes the redundancy in the image to the greatest possible extent.
Image Classification
In MPEG-2, coded images are divided into three types: I frames, P frames, and B frames.
An I frame is intra-coded: only the spatial correlation within the single frame is used, not the temporal correlation. I frames serve mainly for receiver initialization, channel acquisition, and program switching and insertion, so the compression ratio of an I frame is relatively low. I frames appear periodically in the image sequence, at a frequency selectable by the encoder.
P frames and B frames are inter-coded, using both spatial and temporal correlation. A P frame uses only forward temporal prediction, which improves compression efficiency and image quality; a P frame may also contain intra-coded parts, i.e., each macroblock in a P frame may be either forward-predicted or intra-coded. A B frame uses bidirectional temporal prediction, which greatly increases the compression ratio. Notably, because a B frame uses a future frame as a reference, the transmission order and the display order of the frames in an MPEG-2 bitstream differ.
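The reordering described above can be sketched in a few lines. This is an illustrative model, not decoder code: frame labels and the GOP below are invented, and it simply moves each I/P anchor ahead of the B frames that precede it in display order.

```python
# Sketch: why MPEG-2 transmission order differs from display order.
# A B frame needs both its past and future reference frames decoded
# first, so each anchor (I or P) is sent before the B frames that
# precede it in display order. Frame labels are illustrative.

def display_to_transmission(display_order):
    """Reorder a GOP: each I/P anchor moves ahead of the B frames
    that depend on it."""
    transmission = []
    pending_b = []
    for frame in display_order:
        if frame[0] in ("I", "P"):          # anchor frame
            transmission.append(frame)       # send the anchor first...
            transmission.extend(pending_b)   # ...then the held-back B frames
            pending_b = []
        else:                                # B frame: hold until next anchor
            pending_b.append(frame)
    transmission.extend(pending_b)
    return transmission

gop = ["I0", "B1", "B2", "P3", "B4", "B5", "P6"]
print(display_to_transmission(gop))
# -> ['I0', 'P3', 'B1', 'B2', 'P6', 'B4', 'B5']
```

The decoder performs the inverse reordering before display, which is why B frames add one frame period of delay.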
The Six Layers of the Encoded Bitstream
Summary
From top to bottom the layers are: the video sequence layer, the group-of-pictures (GOP) layer, the picture layer, the slice layer, the macroblock layer, and the block layer.
As shown in the figure, every layer except the macroblock and block layers begins with a corresponding start code (SC). When the receiver loses synchronization because of transmission errors or other causes, the start codes let it reacquire synchronization, so a single loss of synchronization costs at most one slice of data.
[Figure: MPEG-2 encoder]
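A resynchronizing decoder works by scanning for these start codes. The sketch below locates the 0x00 0x00 0x01 prefix in a byte stream; the code-byte values in the comment follow ISO/IEC 13818-2, but the byte string itself is a made-up example, not a real stream.

```python
# Sketch: locating MPEG-2 start codes in a byte stream. The upper
# syntax layers each begin with the prefix 0x00 0x00 0x01 plus one
# code byte, e.g. 0xB3 = sequence header, 0xB8 = GOP header,
# 0x00 = picture header, 0x01-0xAF = slice start codes.

def find_start_codes(stream: bytes):
    """Return (offset, code_byte) for every complete start code."""
    hits = []
    i = stream.find(b"\x00\x00\x01")
    while i != -1 and i + 3 < len(stream):
        hits.append((i, stream[i + 3]))
        i = stream.find(b"\x00\x00\x01", i + 3)
    return hits

data = (b"\x00\x00\x01\xb3" + b"\x11\x22"    # sequence header + payload
        + b"\x00\x00\x01\xb8"                # GOP header
        + b"\x00\x00\x01\x00")               # picture header
print(find_start_codes(data))
# -> [(0, 179), (6, 184), (10, 0)]
```

After an error, a decoder can discard bytes until the next slice or picture start code and resume decoding there, which is exactly the "at most one slice lost" behavior described above.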
Details
A sequence is the image sequence of one program. The sequence header that follows the sequence start code contains the image size, aspect ratio, frame rate, and other information; the sequence extension contains additional data. To allow a receiver to join the stream at any time, the sequence header is sent repeatedly.
Below the sequence layer is the GOP layer. A group of pictures consists of a set of I, P, and B pictures with mutual prediction relationships, but the first frame is always an I frame. The GOP header carries time information.
Below the GOP layer is the picture layer; pictures are of the three types I, P, and B. The picture header contains the picture coding type and a temporal reference.
Below the picture layer is the slice layer. A slice contains a certain number of macroblocks, ordered consistently with the scanning order. In MP@ML, a slice must lie within a single macroblock row.
Below the slice layer is the macroblock layer. MPEG-2 defines three macroblock structures, 4:2:0, 4:2:2, and 4:4:4, which differ in the ratio of luminance blocks to chrominance blocks making up a macroblock. A 4:2:0 macroblock contains four luminance blocks, one Cb block, and one Cr block; a 4:2:2 macroblock contains four luminance blocks, two Cb blocks, and two Cr blocks; a 4:4:4 macroblock contains four luminance blocks, four Cb blocks, and four Cr blocks. These three macroblock structures correspond to three luminance/chrominance sampling schemes.
Encoding Method
Before video coding, the component signals R, G, and B are transformed into a luminance signal Y and two chrominance (color-difference) signals Cb and Cr.
In the 4:2:2 format, the luminance signal is sampled at 13.5 MHz and each of the two chrominance signals at 6.75 MHz. The resulting sampling structure has 720x576 luminance samples per frame and 360x576 samples each for Cb and Cr; that is, the chrominance signals are sampled at every other pixel along each line. [Figure: ○ marks Y sampling points; × marks Cb/Cr sampling points]
In the 4:4:4 format, the luminance and both chrominance signals are all sampled at 13.5 MHz, so luminance and chrominance alike have 720x576 samples per frame.
In the 4:2:0 format, the luminance signal is sampled at 13.5 MHz, giving 720x576 luminance samples per frame, while Cb and Cr have 360x288 samples each; that is, the two chrominance signals are sampled on every other line, and within each sampled line at every other pixel.
From the above it is easy to see that in the 4:2:0 format, the Cb and Cr samples covering the area of every four Y blocks form one Cb block and one Cr block; in the 4:2:2 format they form two Cb blocks and two Cr blocks; and in the 4:4:4 format they form four Cb blocks and four Cr blocks. The three macroblock structures follow directly from this.
Below the macroblock layer, the block is the bottom layer of the MPEG-2 bitstream and the basic unit of the DCT transform. In MP@ML a block consists of 8x8 sample values, which must all be Y samples, all Cb samples, or all Cr samples. The term "block" is also used for the 8x8 array of DCT coefficients produced when the 8x8 samples are DCT-transformed.
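The 4:2:0 reduction described above can be sketched directly. This toy filters a chroma plane by averaging each 2x2 neighborhood (real encoders may use other filters); the plane sizes are deliberately tiny, not real frame dimensions.

```python
# Sketch: 4:2:0 chroma subsampling. Each chroma plane keeps one sample
# per 2x2 block of pixel positions (here a simple average), so a 16x16
# macroblock yields four 8x8 Y blocks but only one 8x8 Cb block and
# one 8x8 Cr block.

def subsample_420(plane):
    """Average each 2x2 neighbourhood of a chroma plane (even dims)."""
    h, w = len(plane), len(plane[0])
    return [
        [
            (plane[y][x] + plane[y][x + 1]
             + plane[y + 1][x] + plane[y + 1][x + 1]) // 4
            for x in range(0, w, 2)
        ]
        for y in range(0, h, 2)
    ]

cb = [[100, 104, 200, 200],
      [96, 100, 200, 200]]
print(subsample_420(cb))
# -> [[100, 200]]
```

The output plane has half the width and half the height of the input, matching the 360x288 chroma dimensions quoted above for a 720x576 luminance frame.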
In intra-frame coding, the image does not pass through the prediction loop: the DCT is applied directly to the original image data, and the quantizer and bitstream encoder then produce the coded bitstream.
In inter-frame coding, the original image is first compared with the prediction image held in the frame store to compute motion vectors, and a prediction of the original image is generated from the motion vectors and the reference frame. The difference image formed by subtracting the prediction from the original is then DCT-transformed, and the quantizer and bitstream encoder produce the coded bitstream.
In short, the difference between intra-frame and inter-frame coding is whether the signal passes through the prediction loop.
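The distinction can be shown in miniature: intra coding transforms raw pixels, while inter coding transforms the residual against a prediction. The pixel values below are invented for illustration.

```python
# Sketch: what enters the DCT in intra vs inter coding. Intra blocks
# transform the raw pixels; inter blocks transform the (usually small)
# residual between the current block and its motion-compensated
# prediction.

def residual(current, prediction):
    """Element-wise difference fed to the DCT in inter coding."""
    return [[c - p for c, p in zip(crow, prow)]
            for crow, prow in zip(current, prediction)]

current    = [[120, 121], [119, 120]]
prediction = [[118, 120], [119, 121]]   # from the reference frame

print(residual(current, prediction))
# -> [[2, 1], [0, -1]]
```

Because the residual values are small and cluster around zero, they quantize to far fewer bits than the raw pixels would, which is where inter coding's compression gain comes from.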
Key Technical Elements
The Discrete Cosine Transform (DCT)
The DCT is a spatial transform. In MPEG-2 the DCT operates on 8x8 image blocks and produces 8x8 blocks of DCT coefficients. The most important property of the DCT is that for typical images it concentrates the energy of the block into a few low-frequency coefficients: in the resulting 8x8 coefficient block, only a few low-frequency coefficients in the upper-left corner are large while the others are small, so coding and transmitting only those few coefficients need not seriously degrade image quality.
The DCT does not itself compress the image, but its energy-concentrating effect lays the foundation for compression.
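The energy concentration is easy to demonstrate with a direct (unoptimized) implementation of the orthonormal 2-D DCT-II; the smooth gradient block below is a made-up test input.

```python
import math

# Sketch: the 8x8 2-D DCT used in MPEG-2 (orthonormal DCT-II). On a
# smooth block the energy collects in the few low-frequency
# coefficients in the top-left corner, which is what makes the later
# quantisation and run-length steps effective.

N = 8

def dct_2d(block):
    def c(k):
        return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)
    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            s = 0.0
            for x in range(N):
                for y in range(N):
                    s += (block[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[u][v] = c(u) * c(v) * s
    return out

# A smooth gradient block: almost all energy lands in the DC term.
block = [[x + y for y in range(N)] for x in range(N)]
coeffs = dct_2d(block)
print(round(coeffs[0][0], 1))          # large DC coefficient -> 56.0
print(round(abs(coeffs[4][4]), 4))     # high-frequency term  -> 0.0
```

Real encoders use fast factored DCTs rather than this quadruple loop, but the coefficients are the same.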
Quantizer
Quantization is applied to the DCT coefficients: each coefficient is divided by a specific quantization step. The step size determines the quantization precision: the smaller the step, the finer the precision, the more information is retained, and the wider the transmission band required. Different DCT coefficients matter differently to human visual perception, so the encoder quantizes the 64 coefficients of an 8x8 DCT block with different precisions chosen according to visual perception, preserving as much of the perceptually important spatial-frequency information as possible without exceeding the precision actually needed. Low-frequency coefficients matter most to perception and are assigned fine quantization steps; high-frequency coefficients matter less and are assigned coarse steps. As a result, most of the high-frequency coefficients in a DCT block become zero after quantization.
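A minimal sketch of this step, using a tiny 2x2 "block" and an invented step matrix (not the MPEG-2 default intra matrix): dividing by a frequency-dependent step zeroes the high-frequency coefficient while keeping the low-frequency ones.

```python
# Sketch: quantising a block of DCT coefficients. Coarser steps at
# higher frequencies zero most high-frequency coefficients while
# preserving the visually important low frequencies. The step matrix
# is illustrative only.

def quantize(coeffs, steps):
    return [[round(c / s) for c, s in zip(crow, srow)]
            for crow, srow in zip(coeffs, steps)]

def dequantize(levels, steps):
    return [[l * s for l, s in zip(lrow, srow)]
            for lrow, srow in zip(levels, steps)]

coeffs = [[312.0, -41.0], [9.0, 3.0]]   # toy 2x2 coefficient "block"
steps  = [[8, 16], [16, 32]]            # coarser at high frequency
levels = quantize(coeffs, steps)
print(levels)                     # -> [[39, -3], [1, 0]]
print(dequantize(levels, steps))  # -> [[312, -48], [16, 0]]
```

The gap between the dequantized and original values (e.g. -48 vs -41) is the quantization error; it is the lossy part of the whole coding chain.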
Scanning and Run-Length Coding
The DCT produces an 8x8 two-dimensional array which, for transmission, must be converted into a one-dimensional sequence. There are two conversion (scanning) methods: zig-zag scanning and alternate scanning, of which zig-zag scanning is the more common. After quantization, most of the non-zero DCT coefficients are concentrated in the upper-left, low-frequency corner of the 8x8 matrix, so after zig-zag scanning they are concentrated at the front of the one-dimensional sequence, followed by long runs of coefficients quantized to zero. This creates the conditions for run-length coding.
In run-length coding, only the non-zero coefficients are coded. The code for a non-zero coefficient consists of two parts: the first gives the number of consecutive zero coefficients preceding it (the run), and the second is the non-zero coefficient itself. This is where zig-zag scanning pays off: because long runs of zeros are common, run-length coding is quite efficient. When all remaining DCT coefficients in the one-dimensional sequence are zero, an end-of-block (EOB) marker completes the coding of the 8x8 block, and the compression gain is very apparent.
Entropy Encoding
Quantization only produces an efficient discrete representation of the DCT coefficients; before actual transmission they must still be encoded into a digital bitstream. A simple method is a fixed-length code, in which every quantized value is represented with the same number of bits, but this is inefficient. Entropy coding improves coding efficiency by exploiting the statistical properties of the coded signal to reduce the average bit rate. The runs and the non-zero coefficients can be entropy-coded independently or jointly. The most commonly used entropy coding method is Huffman coding, and it is the one the MPEG-2 video compression system uses. In Huffman coding, a code table is built from the probabilities of the coded symbols: frequently occurring, high-probability symbols are assigned fewer bits and rare, low-probability symbols more bits, making the average length of the whole bitstream tend toward the minimum.
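The principle can be shown by building a Huffman code from symbol counts. Note the simplification: real MPEG-2 uses fixed, standardized VLC tables rather than building a tree per stream, and the symbol string below is invented.

```python
import heapq
from collections import Counter

# Sketch: Huffman coding. Frequent symbols get short codes, rare
# symbols long ones, so the average code length approaches the source
# entropy. Heap entries carry a unique tiebreak so dicts are never
# compared.

def huffman_code(symbols):
    counts = Counter(symbols)
    if len(counts) == 1:                       # degenerate one-symbol case
        return {next(iter(counts)): "0"}
    heap = [(n, i, {s: ""}) for i, (s, n) in enumerate(sorted(counts.items()))]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        n0, _, c0 = heapq.heappop(heap)        # two least-frequent subtrees
        n1, _, c1 = heapq.heappop(heap)
        merged = {s: "0" + code for s, code in c0.items()}
        merged.update({s: "1" + code for s, code in c1.items()})
        heapq.heappush(heap, (n0 + n1, tie, merged))
        tie += 1
    return heap[0][2]

data = "aaaaaabbbc"            # skewed symbol distribution
code = huffman_code(data)
print(sorted(len(code[s]) for s in "abc"))  # -> [1, 2, 2]
```

With this skewed input the most frequent symbol gets a 1-bit code, so the 10 symbols cost 6*1 + 3*2 + 1*2 = 14 bits instead of 20 bits at fixed length.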
Channel Buffer
Because entropy coding is used, the rate of the generated bitstream varies with the statistics of the video image, whereas in most cases the band allocated by the transmission system is constant. A channel buffer is therefore placed before the channel: the entropy coder writes into it at a variable bit rate, and the channel reads from it at the transmission system's nominal constant bit rate. The buffer's capacity is fixed, but the encoder's instantaneous output rate is usually above or below the channel rate, which would cause the buffer to overflow or underflow. The buffer therefore needs a control mechanism that, through feedback, adjusts the compression algorithm and hence the encoder's bit rate so that the buffer's write and read rates balance. The buffer controls the compression algorithm via the quantization step of the quantizer: when the encoder's instantaneous output rate is too high and the buffer is about to overflow, the quantization step is increased to reduce the coded data rate, at the cost of correspondingly greater image loss; when the instantaneous output rate is too low and the buffer is about to underflow, the quantization step is reduced to increase the coded data rate.
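This feedback loop can be sketched with a toy simulation. Every constant here (thresholds, step increments, the bits-per-frame model) is invented for illustration; real rate control is considerably more sophisticated.

```python
# Sketch: feedback between buffer fullness and quantiser step. Each
# "frame" deposits a bit count that shrinks as the quantisation step q
# grows; the channel drains the buffer at a constant rate. When the
# buffer nears full, q is coarsened; near empty, q is refined.

def simulate_rate_control(frame_complexities, channel_rate, buf_size):
    q, fullness, trace = 8, buf_size // 2, []
    for complexity in frame_complexities:
        produced = complexity // q                   # coarser q -> fewer bits
        fullness = min(buf_size, fullness + produced)  # clamp (toy overflow)
        fullness = max(0, fullness - channel_rate)     # constant drain
        if fullness > 0.75 * buf_size:               # nearly full: coarsen
            q = min(31, q + 2)
        elif fullness < 0.25 * buf_size:             # nearly empty: refine
            q = max(1, q - 2)
        trace.append((produced, fullness, q))
    return trace

trace = simulate_rate_control([60000] * 6, channel_rate=4000, buf_size=20000)
for produced, fullness, q in trace:
    print(produced, fullness, q)
```

Running this, the quantization step climbs as the over-complex frames keep the buffer near full, and the bits produced per frame fall toward the channel rate, which is exactly the balancing behavior described above.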
Motion Estimation
Motion estimation serves inter-frame coding by generating, from a reference frame, an estimate (prediction) of the image being compressed. The accuracy of motion estimation is crucial to the compression performance of inter-frame coding: if the estimate is good, only small values remain to be transmitted after subtracting the prediction from the image. Motion estimation is carried out in units of macroblocks, computing the positional offset between a macroblock of the image being compressed and the corresponding area of the reference image. This offset is described by a motion vector, which has both a horizontal and a vertical displacement component.
[Figure: MPEG-2 codec model]
P frames and B frames use different reference frames for motion estimation. A P frame uses the previously decoded I or P frame as its reference, which is called forward prediction. A B frame uses two frames as prediction references, which is called bidirectional prediction: one reference precedes the coded frame in display order (forward prediction) and the other follows it (backward prediction), and in either case the reference frames of a B frame are I or P frames.
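A minimal full-search block matcher illustrates how a motion vector is found. Everything is scaled down (2x2 blocks, a 5x5 frame, a made-up moving patch); real encoders use 16x16 macroblocks and faster search patterns than exhaustive search.

```python
# Sketch: full-search block matching for motion estimation. For each
# candidate displacement within a small window, the sum of absolute
# differences (SAD) between the current block and the displaced
# reference block is computed; the offset with the lowest SAD becomes
# the motion vector.

def sad(ref, cur, top, left, dy, dx, n):
    total = 0
    for r in range(n):
        for c in range(n):
            total += abs(cur[top + r][left + c]
                         - ref[top + r + dy][left + c + dx])
    return total

def motion_vector(ref, cur, top, left, n, search):
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            if not (0 <= top + dy and top + dy + n <= len(ref)
                    and 0 <= left + dx and left + dx + n <= len(ref[0])):
                continue                      # candidate falls off the frame
            cost = sad(ref, cur, top, left, dy, dx, n)
            if best is None or cost < best[0]:
                best = (cost, (dy, dx))
    return best[1]

# Reference frame with a bright 2x2 patch at (1, 1)...
ref = [[0, 0, 0, 0, 0],
       [0, 9, 9, 0, 0],
       [0, 9, 9, 0, 0],
       [0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0]]
# ...that has moved down-right by one pixel in the current frame.
cur = [[0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0],
       [0, 0, 9, 9, 0],
       [0, 0, 9, 9, 0],
       [0, 0, 0, 0, 0]]
print(motion_vector(ref, cur, top=2, left=2, n=2, search=2))
# -> (-1, -1)
```

The vector (-1, -1) says the best prediction for the current block lies one pixel up and one pixel left in the reference frame, i.e. the object moved down-right, and at that offset the SAD (and hence the residual to be coded) is zero.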
Motion Compensation
Using the motion vectors computed by motion estimation, the macroblocks of the reference frame are shifted horizontally and vertically to the corresponding positions to generate the prediction of the image being compressed. Motion in most natural scenes is orderly, so the difference between the motion-compensated prediction and the image being compressed is very small.
Subjective Evaluation of Digital Image Quality
The conditions of subjective evaluation include the composition of the evaluation panel, the viewing distance, the test images, the ambient illumination, and the background color. The panel consists of a certain number of observers, with fixed proportions of professionals and non-professionals. The viewing distance is 3 to 6 times the diagonal of the display. The test material consists of several image sequences containing a certain amount of detail and motion. The subjective evaluation result is the statistical mean of many observers' assessments of the image quality.