Knowledge about Video Compression Algorithms

MPEG-1
MPEG-1 video compression produces three kinds of frames: I-frames, P-frames, and B-frames. During MPEG encoding, some frames of the video sequence are compressed into I-frames, some into P-frames, and some into B-frames. The I-frame method is an intra-frame compression method, also known as "key frame" compression. It is based on the discrete cosine transform (DCT), and the algorithm is similar to JPEG compression. I-frame compression alone can reduce a frame to about 1/6 of its original size without visible compression artifacts.
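
To make the energy-compaction idea behind DCT-based intra coding concrete, here is a minimal numpy sketch (the 8 × 8 block is made-up data, not from any real codec): a smooth block is transformed, and nearly all of its energy lands in a few low-frequency coefficients, which is exactly what the encoder can then quantize coarsely.

```python
import numpy as np

def dct_matrix(n: int = 8) -> np.ndarray:
    """Orthonormal DCT-II basis matrix, as used in JPEG/MPEG intra coding."""
    k = np.arange(n)
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n))
    c[0, :] = np.sqrt(1.0 / n)
    return c

# A smooth 8x8 luminance block (a horizontal gradient): typical natural-image content.
block = np.tile(np.linspace(16, 235, 8), (8, 1))

D = dct_matrix(8)
coeffs = D @ block @ D.T          # 2-D DCT: transform columns, then rows

# Energy compaction: almost all signal energy lands in a few low-frequency
# coefficients, so the rest can be quantized away without visible artifacts.
energy = coeffs ** 2
top_left = energy[:2, :2].sum() / energy.sum()
print(f"share of energy in the 4 lowest-frequency coefficients: {top_left:.4f}")
```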

Intra-frame compression alone, however, cannot deliver high compression ratios while preserving image quality, so MPEG combines intra-frame and inter-frame compression. The P-frame method is a forward-prediction algorithm: it exploits the fact that adjacent frames share much of the same information, that is, it compresses frames based on the characteristics of motion. A P-frame stores only the differences between the current frame and the nearest preceding I-frame or P-frame. Joint compression with P-frames and I-frames achieves higher compression without visible compression artifacts.

The highest compression ratios, however, can only be reached with B-frame compression. The B-frame method is a bidirectional-prediction inter-frame compression algorithm: when a frame is compressed as a B-frame, it is coded from the data differences between the adjacent previous frame, the current frame, and the next frame; that is, only the differences between the current frame and its neighbors are recorded. A B-frame typically contains only about 15% of the data of an I-frame and about 50% of the data of a P-frame.
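
The following toy sketch (synthetic frames, and no motion compensation, so it illustrates only the frame-differencing idea rather than a real MPEG encoder) shows why P- and B-style prediction leaves so little residual data to code:

```python
import numpy as np

rng = np.random.default_rng(0)

# Three consecutive frames of a synthetic scene: a static background with
# a small "object" that moves one pixel per frame (hypothetical toy data).
background = rng.integers(0, 256, size=(64, 64)).astype(np.int16)
frames = []
for t in range(3):
    f = background.copy()
    f[20:28, 10 + t:18 + t] = 255        # moving bright square
    frames.append(f)

prev_f, cur, next_f = frames

# P-frame idea: code only the difference from the previous reference frame.
p_residual = cur - prev_f

# B-frame idea: predict from both neighbours (here a plain average) and
# code only the remaining difference.
b_residual = cur - (prev_f.astype(np.int32) + next_f.astype(np.int32)) // 2

for name, r in [("P residual", p_residual), ("B residual", b_residual)]:
    nonzero = np.count_nonzero(r)
    print(f"{name}: {nonzero} of {r.size} samples differ from the prediction")
```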

The MPEG-1 standard uses a format similar to SIF: after compression the luminance signal has a resolution of 352 × 240 and the two chrominance signals 176 × 120, both at a frame rate of 30 frames per second. The basic encoding method is to compress the first frame of each unit of time into an I-frame, and then, for each subsequent frame, store only the portion of the image that changes relative to the previous frame. During inter-frame compression, intra-frame compression is still applied at regular intervals: because intra-frame (key frame) compression does not depend on the previous frame, a key frame is usually inserted every 15 frames, which limits the accumulation of errors carried over from frame to frame. The MPEG encoder first decides whether to compress the current frame as an I-frame, a P-frame, or a B-frame, and then applies the corresponding algorithm. A typical frame pattern of an MPEG-compressed video sequence is: IBBPBBPBBPBBPBBIBBPBBPBBPBBPBBI ...
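
A small sketch of how such a GOP pattern can be generated, using the common parameterization N = 15 (GOP length, the key-frame interval quoted above) and M = 3 (anchor spacing); the size ratios at the end reuse the rough figures quoted above and are illustrative only:

```python
def gop_pattern(n: int = 15, m: int = 3) -> str:
    """Frame-type sequence for one group of pictures.

    n: GOP length (distance between I frames); m: anchor spacing, so
    there are m - 1 B frames between anchors. n=15, m=3 yields the
    IBBPBB... pattern described above.
    """
    types = []
    for i in range(n):
        if i == 0:
            types.append("I")
        elif i % m == 0:
            types.append("P")
        else:
            types.append("B")
    return "".join(types)

pattern = gop_pattern()
print(pattern)                    # IBBPBBPBBPBBPBB

# Using the rough size ratios quoted above (B ~ 15% of I, P ~ 50% of I),
# the average frame in this GOP costs only a fraction of an I frame:
sizes = {"I": 1.0, "P": 0.5, "B": 0.15}
avg = sum(sizes[t] for t in pattern) / len(pattern)
print(f"average frame size ~ {avg:.2f} x I-frame size")
```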

Compressing B-frames or P-frames requires far more processing time than compressing I-frames. Some encoders do not implement B-frame or even P-frame compression at all, and their compression performance is correspondingly poorer.


MPEG-2
MPEG introduced the MPEG-2 compression standard in 1994 to enable interoperability between video/audio services and applications. The MPEG-2 standard specifies the compression scheme and system layer for standard-definition and high-definition digital TV across a range of applications, with coding bit rates from 3 Mbit/s to 100 Mbit/s; the standard is formally specified in ISO/IEC 13818. MPEG-2 is not a simple upgrade of MPEG-1: it makes more specific and more complete provisions for the system and transport layers. MPEG-2 is particularly suitable for broadcast-grade digital TV encoding and transmission, and has been adopted as the coding standard for SDTV and HDTV. MPEG-2 also specifies how multiple programs are multiplexed into a single stream. The MPEG-2 standard is currently divided into nine parts, collectively referred to as the ISO/IEC 13818 international standard.

The principle of MPEG-2 image compression is to exploit two properties of images: spatial correlation and temporal correlation. Each scene is composed of many pixels, and a pixel is usually related in brightness and color to some of the pixels around it; this relationship is called spatial correlation. A shot in a program is composed of many consecutive frames, and there is likewise a relationship between successive frames of an image sequence; this is called temporal correlation. These two correlations mean that an image contains a large amount of redundant information. If we remove the redundant information and transmit only the small amount of non-redundant information, we can greatly reduce the required transmission bandwidth, and the receiver can use this non-redundant information to restore the original image with acceptable quality. A good compression coding scheme is one that removes redundant information from the image to the greatest possible extent.
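
A quick way to see both correlations is to measure them on a synthetic pair of frames; the data below is made up, but natural video typically shows neighbor-pixel and frame-to-frame correlations close to 1, which is exactly the redundancy the encoder removes:

```python
import numpy as np

rng = np.random.default_rng(1)

# A toy "natural" frame: smooth low-frequency content plus a little noise,
# and a second frame that is almost identical (hypothetical data).
x, y = np.meshgrid(np.linspace(0, 3, 128), np.linspace(0, 3, 128))
frame0 = 128 + 100 * np.sin(x) * np.cos(y) + rng.normal(0, 2, (128, 128))
frame1 = frame0 + rng.normal(0, 2, (128, 128))   # "next" frame, tiny change

# Spatial correlation: a pixel vs. its right-hand neighbour within one frame.
spatial = np.corrcoef(frame0[:, :-1].ravel(), frame0[:, 1:].ravel())[0, 1]

# Temporal correlation: the same pixel position in two consecutive frames.
temporal = np.corrcoef(frame0.ravel(), frame1.ravel())[0, 1]

print(f"spatial correlation:  {spatial:.3f}")   # close to 1 -> spatial redundancy
print(f"temporal correlation: {temporal:.3f}")  # close to 1 -> temporal redundancy
```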

MPEG-2 encoded images are divided into three types: I-frames, P-frames, and B-frames.

I-frame images use intra-frame coding: they exploit only the spatial correlation within a single frame, not temporal correlation. An I-frame uses intra-frame compression without motion compensation. Because an I-frame does not depend on other frames, it serves both as a random access point and as the reference frame for decoding. I-frames are mainly used for receiver initialization and channel acquisition, as well as for program switching and insertion. The compression ratio of I-frame images is relatively low. I-frames occur periodically in the image sequence, at a frequency selectable by the encoder.

P-frame and B-frame images use inter-frame coding, that is, they exploit spatial and temporal correlation at the same time. P-frames use only forward temporal prediction, which improves compression efficiency and image quality. A P-frame may contain intra-coded parts: each macroblock in a P-frame can be either forward-predicted or intra-coded. B-frames use bidirectional temporal prediction, which increases the compression ratio considerably. Note that because B-frames use future frames as references, the transmission order and the display order of the frames in an MPEG-2 stream differ.

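The reordering mentioned above can be sketched in a few lines; this is a simplified model (a single reference structure and hypothetical frame labels), not the actual MPEG-2 syntax:

```python
def coded_order(display: list[str]) -> list[str]:
    """Reorder a display-order frame sequence into transmission (coded) order.

    Each B frame needs its *future* anchor (I or P) decoded first, so the
    anchor is sent ahead of the B frames that precede it in display order.
    """
    out, pending_b = [], []
    for frame in display:
        if frame[0] == "B":
            pending_b.append(frame)      # hold B frames back ...
        else:
            out.append(frame)            # ... until their future anchor is sent
            out.extend(pending_b)
            pending_b.clear()
    return out + pending_b

display = ["I0", "B1", "B2", "P3", "B4", "B5", "P6"]
print(coded_order(display))   # ['I0', 'P3', 'B1', 'B2', 'P6', 'B4', 'B5']
```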

The MPEG-2 stream is organized into six levels. To represent the coded data cleanly, MPEG-2 defines a hierarchical structure through its syntax, consisting of six layers: the video sequence layer, the group of pictures (GOP), the picture, the slice, the macroblock, and the block.
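
As a rough illustration of this hierarchy, here is a minimal sketch in Python; the class and field names are illustrative only, not the actual syntax elements of ISO/IEC 13818-2:

```python
from dataclasses import dataclass, field

@dataclass
class Block:                 # 8x8 block of DCT coefficients
    coeffs: list

@dataclass
class Macroblock:            # 16x16 luma area: several luma + chroma blocks
    blocks: list = field(default_factory=list)

@dataclass
class Slice:                 # a run of macroblocks; resynchronization unit
    macroblocks: list = field(default_factory=list)

@dataclass
class Picture:               # one coded frame: I, P, or B
    frame_type: str = "I"
    slices: list = field(default_factory=list)

@dataclass
class GroupOfPictures:       # starts with an I frame; random-access unit
    pictures: list = field(default_factory=list)

@dataclass
class Sequence:              # top level: resolution, frame rate, GOPs
    width: int = 720
    height: int = 576
    gops: list = field(default_factory=list)
```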


MPEG-4
MPEG-4 was released in November 1998. MPEG-4 targets video and audio coding at a given bit rate and puts more emphasis on the interactivity and flexibility of multimedia systems. The MPEG-4 standard strives to achieve two goals: low-bit-rate multimedia communication, and multimedia communication across a wide range of industries. To this end, MPEG-4 introduces AV objects (audio/visual objects), which make many more interactive operations possible:
"AV object" can be an isolated person, Voice of the person, or background music. It features efficient coding, efficient storage and transmission, and interactive operations.

The operations MPEG-4 performs on AV objects mainly include: using AV objects to represent auditory, visual, or audiovisual content; combining existing AV objects into composite AV objects and from these generating AV scenes; flexibly multiplexing and synchronizing the data of AV objects and selecting an appropriate network for its transmission; and allowing the user at the receiving end to interact with the AV objects within an AV scene.
The MPEG-4 standard consists of six main parts:
① DMIF (Delivery Multimedia Integration Framework)
DMIF is the overall multimedia transport framework. It mainly addresses the operation of multimedia applications over interactive networks, in broadcast environments, and from disk. It establishes interaction and transport between client and server by passing multiplexed bitstream information. Through DMIF, MPEG-4 can establish a channel with a specific quality of service (QoS) and bandwidth for each elementary stream.
② Data plane
The data plane in MPEG-4 can be divided into two parts: the transport relationship and the media relationship.
So that elementary streams and AV objects can appear in the same scene, MPEG-4 introduces the concepts of the object descriptor (OD) and the stream map table (SMT). The OD carries the descriptive information for the elementary streams related to a particular AV object; in the SMT, each stream is bound to a channel association tag (CAT), through which the stream can be transmitted smoothly.
③ Buffer management and real-time identification
MPEG-4 defines a system decoder model (SDM), which describes an idealized decoding device for processing the syntax and semantics of the bitstream. It requires a specific buffer and timing model; with effective management, limited buffer space can be used more efficiently.
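
The core of such a buffer model can be sketched as a simple fill/drain simulation; all rates and frame sizes below are hypothetical, and a real SDM also models timing and multiple buffers:

```python
# Coded bits arrive at a constant channel rate and are removed frame by
# frame at decode time; the occupancy must never underflow (decoder stall)
# or overflow (data loss). All numbers here are hypothetical.
channel_rate = 1_000_000          # bits per second
fps = 25
frame_sizes = [120_000, 20_000, 20_000, 60_000] * 5   # I, B, B, P ... (bits)

occupancy = 400_000               # pre-decoding buffering (startup delay)
per_frame_fill = channel_rate // fps

for i, size in enumerate(frame_sizes):
    occupancy += per_frame_fill   # bits delivered during one frame interval
    occupancy -= size             # decoder removes one coded frame
    if occupancy < 0:
        print(f"frame {i}: buffer underflow -> decoder would stall")
        break
else:
    print(f"final occupancy: {occupancy} bits (no underflow)")
```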
④ Audio Encoding
A strength of MPEG-4 is that it supports not only natural sound but also synthesized sound. The audio part of MPEG-4 combines audio synthesis coding with natural sound coding, and supports audio object features.
⑤ Video Encoding
As with audio coding, MPEG-4 also supports the coding of both natural and synthetic visual objects. Synthetic visual objects include 2D and 3D objects and facial animation.
⑥ Scene description
MPEG-4 provides a set of tools for composing a group of objects into a scene. The necessary composition information forms the scene description, which is encoded in the binary format BIFS (Binary Format for Scenes) and transmitted and coded together with the AV objects. The scene description specifies how AV objects are organized and synchronized within the coordinate system of a particular AV scene. It also addresses issues such as intellectual-property protection for AV objects and AV scenes. MPEG-4 thus provides support for a rich range of AV scenes.
Compared with MPEG-1 and MPEG-2, MPEG-4 is better suited to interactive AV services and remote monitoring, and its design goals give it greater adaptability and scalability: the low-bit-rate modes of MPEG-4 operate between 4,800 and 64,000 bit/s at a resolution of 176 × 144. It can compress and transmit data over very narrow bandwidths and, using frame-reconstruction techniques, obtain the best possible image quality from the least data. It is therefore expected to play a major role in digital TV, animated graphics, the Internet, real-time multimedia monitoring, mobile multimedia communications, video streaming and video games over the Internet/intranets, and interactive multimedia applications on DVD.
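
A rough calculation shows how demanding this operating point is; the 4:2:0 sampling and 15 fps assumed below are common low-bit-rate choices, not figures from the text:

```python
# How much compression does 64 kbit/s at 176 x 144 imply? Assume 4:2:0
# sampling (12 bits per pixel) and 15 frames per second (both illustrative).
width, height, fps = 176, 144, 15
raw_bps = width * height * 12 * fps          # uncompressed bits per second
coded_bps = 64_000

print(f"raw:   {raw_bps / 1e6:.2f} Mbit/s")
print(f"coded: {coded_bps / 1e3:.0f} kbit/s")
print(f"required compression ratio ~ {raw_bps / coded_bps:.0f}:1")
```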

H.264
H.264 is a new digital video coding standard developed by the Joint Video Team (JVT) formed by ITU-T VCEG (Video Coding Experts Group) and ISO/IEC MPEG (Moving Picture Experts Group); it is simultaneously ITU-T Recommendation H.264 and Part 10 of ISO/IEC MPEG-4. The call for proposals was issued in January 1998; the first draft was completed in September 1999; the test model TML-8 was produced in May 2001; and the FCD of H.264 was adopted at the 5th JVT meeting in June 2002. At the time of writing the standard was still being developed and was expected to be formally approved in the first half of the following year.

Like the older standards, H.264 is a hybrid coding scheme of DPCM plus transform coding. However, it adopts a simple, "back to basics" design without a large number of options, and achieves compression performance much better than that of H.263++. It strengthens adaptability to various channels, adopting a "network-friendly" structure and syntax that make error and packet-loss handling easier. Its application targets are broad, covering a range of bit rates, resolutions, and transmission/storage scenarios; and its basic system is open, requiring no license fees to use.

Conceptually, the H.264 algorithm is divided into two layers: the video coding layer (VCL), responsible for efficiently representing the video content, and the network abstraction layer (NAL), which packages and transports the data in the manner required by the network. H.264 supports motion vectors with 1/4- or 1/8-pixel accuracy. At 1/4-pixel accuracy a 6-tap filter can be used to reduce high-frequency noise; for motion vectors with 1/8-pixel accuracy a more complex 8-tap filter can be used. During motion estimation the encoder can also select an "enhanced" interpolation filter to improve the prediction. H.264 offers two methods of entropy coding: one applies a universal VLC (UVLC) to all symbols to be coded; the other is context-based adaptive binary arithmetic coding (CABAC). The H.264 draft also includes error-resilience tools that facilitate the transmission of compressed video over error- and loss-prone channels, such as mobile channels or IP networks.
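
For the half-pixel luma positions, the standard's 6-tap filter has the coefficients (1, -5, 20, 20, -5, 1)/32; the sketch below applies it to a made-up row of samples:

```python
import numpy as np

# H.264's 6-tap filter for half-pixel luma interpolation; the coefficient
# set is from the standard, the sample data below is made up.
taps = np.array([1, -5, 20, 20, -5, 1])

def half_pel(row: np.ndarray, i: int) -> int:
    """Interpolate the half-pixel position between row[i] and row[i + 1]."""
    window = row[i - 2:i + 4].astype(np.int32)       # 6 integer-pel neighbours
    value = (window * taps).sum()
    return int(np.clip((value + 16) >> 5, 0, 255))   # round, divide by 32, clamp

row = np.array([50, 60, 80, 120, 160, 180, 190, 195], dtype=np.int32)
print(half_pel(row, 3))   # interpolated value between samples 120 and 160
```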

Technically, the H.264 standard has several highlights, such as unified VLC symbol coding; accurate, multi-mode motion prediction; an integer transform based on 4 × 4 blocks; and a hierarchical coding syntax. These measures give the H.264 algorithm very high coding efficiency: at the same reconstructed image quality it can save roughly half the bit rate compared with H.263. H.264's stream structure adapts well to networks, and its error-resilience support lets it work well over IP and wireless networks.
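
The 4 × 4 integer transform can be demonstrated directly; the transform matrix below is the one defined by the standard (with normalization folded into the quantization step), while the sample block is made up:

```python
import numpy as np

# H.264's 4x4 integer core transform, a DCT approximation computable with
# adds and shifts only.
Cf = np.array([[1,  1,  1,  1],
               [2,  1, -1, -2],
               [1, -1, -1,  1],
               [1, -2,  2, -1]])

X = np.array([[58, 64, 51, 58],
              [52, 64, 56, 66],
              [62, 63, 61, 64],
              [59, 51, 63, 69]])   # made-up 4x4 sample/residual block

Y = Cf @ X @ Cf.T                  # forward transform (scaling left to quantizer)
print(Y)

# The inverse uses a matching integer matrix, so encoder and decoder
# reconstruct bit-identically: no IDCT drift, unlike earlier standards.
```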

H.264 has broad application prospects, such as real-time video communication, video transmission over the Internet, video streaming services, multipoint communication over heterogeneous networks, compressed video storage, and video databases. H.264's superior performance does not come for free: the price is a large increase in computational complexity. Encoding is estimated to be about three times as complex as H.263, and decoding about twice as complex.

The technical features of the H.264 proposal can be summarized in three points. First, it emphasizes practicality: it adopts mature techniques and pursues higher coding efficiency together with a concise form of expression. Second, it focuses on adaptation to mobile and IP networks: it uses a layered design that formally separates the coding from the channel, while in practice the source-coder algorithm itself takes the characteristics of many channels into account. Third, within the basic framework of the hybrid coder, it substantially improves the key components, for example multi-mode motion prediction, intra-frame prediction, multi-frame prediction, unified VLC, and the 4 × 4 two-dimensional integer transform.

At the time of writing, H.264 had not yet been finalized. However, thanks to its higher compression ratio and better channel adaptability, it will certainly be applied more and more widely in digital video communication and storage, and its development potential is enormous.
