Video Compression Algorithms: Related Knowledge

MPEG-1
MPEG video compression coding uses three frame types: I-frames, P-frames, and B-frames. During MPEG encoding, some frames of the video sequence are compressed as I-frames, some as P-frames, and some as B-frames. The I-frame method is intra-frame compression, also called "keyframe" compression. It is based on the DCT (Discrete Cosine Transform) and is similar to the JPEG compression algorithm. I-frame compression can reach a compression ratio of about 6:1 without visible compression artifacts.
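As a rough illustration of this intra-frame method, the following Python sketch applies a 2-D DCT and coarse quantization to a single 8x8 block, in the spirit of JPEG/MPEG-1 I-frame coding; the uniform quantizer step here is illustrative, not the standard's quantization matrix.

    # Minimal sketch of DCT-based intra coding on one 8x8 block.
    # The uniform step size q is illustrative; real coders use
    # perceptually weighted quantization matrices.
    import numpy as np
    from scipy.fftpack import dct, idct

    def dct2(block):
        # 2-D type-II DCT: transform rows, then columns
        return dct(dct(block, axis=0, norm='ortho'), axis=1, norm='ortho')

    def idct2(coeffs):
        return idct(idct(coeffs, axis=0, norm='ortho'), axis=1, norm='ortho')

    block = np.random.randint(0, 256, (8, 8)).astype(float)
    q = 16.0
    coeffs = np.round(dct2(block - 128) / q)   # transform + quantize
    recon = idct2(coeffs * q) + 128            # dequantize + inverse transform
    print(np.count_nonzero(coeffs), "of 64 coefficients survive quantization")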

A high compression ratio cannot be achieved by intra-frame compression alone while preserving image quality, so MPEG combines inter-frame and intra-frame compression. The P-frame method is a forward-prediction algorithm: it exploits the information shared between adjacent frames and takes motion characteristics into account. A P-frame encodes only the difference between the current frame and the nearest preceding I-frame or P-frame. Combining P-frames with I-frames achieves higher compression without obvious artifacts.
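To make forward prediction concrete, here is a minimal sketch of block-matching motion estimation, the core of P-frame coding: for one block of the current frame, it searches the previous frame for the best match under the sum of absolute differences (SAD). The 16x16 block size and search range are typical choices, not mandated values.

    # Minimal sketch of P-frame style motion estimation (full search).
    import numpy as np

    def best_motion_vector(ref, cur, bx, by, size=16, search=8):
        h, w = ref.shape
        cur_blk = cur[by:by+size, bx:bx+size]
        best, best_sad = (0, 0), np.inf
        for dy in range(-search, search + 1):
            for dx in range(-search, search + 1):
                y, x = by + dy, bx + dx
                if 0 <= y and y + size <= h and 0 <= x and x + size <= w:
                    sad = np.abs(ref[y:y+size, x:x+size] - cur_blk).sum()
                    if sad < best_sad:
                        best_sad, best = sad, (dx, dy)
        return best, best_sad   # the residual cur_blk - match is what gets coded

    ref = np.random.rand(64, 64)
    cur = np.roll(ref, (2, 3), axis=(0, 1))           # simulate uniform motion
    print(best_motion_vector(ref, cur, bx=32, by=32)) # -> ((-3, -2), 0.0)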

Only with B-frames, however, can compression ratios as high as 200:1 be reached. The B-frame method is a bidirectional predictive inter-frame compression algorithm: a frame compressed as a B-frame is encoded from its differences with the adjacent preceding frame, the following frame, or an interpolation of both. B-frame data is typically only about 15% the size of I-frame data and under 50% the size of P-frame data.
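The mode decision described above can be sketched in a few lines: given motion-compensated blocks from the past and future references (assumed already found), a B-frame block picks whichever of forward, backward, or interpolated prediction leaves the smallest residual.

    # Minimal sketch of B-frame bidirectional prediction mode selection.
    import numpy as np

    def b_frame_residual(past_blk, future_blk, cur_blk):
        candidates = {
            "forward": past_blk,
            "backward": future_blk,
            "interpolated": (past_blk + future_blk) / 2.0,
        }
        mode, pred = min(candidates.items(),
                         key=lambda kv: np.abs(cur_blk - kv[1]).sum())
        return mode, cur_blk - pred   # chosen mode + residual to encode

    past = np.zeros((16, 16))
    future = np.full((16, 16), 10.0)
    cur = np.full((16, 16), 5.0)      # halfway between both references
    print(b_frame_residual(past, future, cur)[0])   # -> interpolated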

The MPEG-1 standard uses a 4:2:0-style format: the compressed luminance signal has a resolution of 352x240 and the two chrominance signals 176x120, both at 30 frames per second. The basic encoding method is to compress the first frame of each unit of time as an I-frame, then, on top of efficient single-frame compression, encode for subsequent frames only the parts that change relative to the neighboring frames. Intra-frame compression is still used periodically during inter-frame compression: because a keyframe does not depend on the previous frame, an I-frame is typically inserted every 15 frames, which keeps prediction errors from accumulating across frames. The MPEG encoder first decides whether to compress the current frame as an I-frame, P-frame, or B-frame, then applies the corresponding algorithm. A fully encoded MPEG video sequence may look like: IBBPBBPBBPBBPBB IBBPBBPBBPBBPBB I ...
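The frame-type assignment just described is easy to sketch, assuming a 15-frame GOP with an anchor (I or P) frame every third frame:

    # Minimal sketch of the IBBP... pattern with an I-frame every 15 frames.
    def frame_type(n, gop=15, anchor_every=3):
        if n % gop == 0:
            return "I"
        return "P" if n % anchor_every == 0 else "B"

    print("".join(frame_type(n) for n in range(31)))
    # -> IBBPBBPBBPBBPBBIBBPBBPBBPBBPBBI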

Compressing a frame as a B-frame or P-frame requires far more computation time than compressing it as an I-frame. Some encoders lack B-frame or even P-frame compression entirely, and their compression performance is correspondingly poor.


MPEG-2
The MPEG organization introduced the MPEG-2 compression standard in 1994 to enable interoperability between audio/video services and applications. MPEG-2 specifies the compression scheme and system layer for standard-definition digital TV and HDTV across a range of applications, with coding rates from 3 Mbit/s to 100 Mbit/s; the standard is formally specified in ISO/IEC 13818. MPEG-2 is not a simple upgrade of MPEG-1: it makes more specific and more complete provisions for the system and transport layers. MPEG-2 is particularly suited to encoding and transmitting broadcast-grade digital TV and is the recognized coding standard for SDTV and HDTV. It also explicitly defines the multiplexing of multiple program channels. The MPEG-2 standard is now divided into nine parts, collectively referred to as the ISO/IEC 13818 international standard.

MPEG-2 image compression exploits two properties of images: spatial correlation and temporal correlation. Any scene in a frame is made up of pixels, and a pixel's luminance and chrominance are usually related to those of nearby pixels; this relationship is called spatial correlation. An episode in a program is typically a sequence of many consecutive frames, and successive frames in the sequence are also related; this is called temporal correlation. These two correlations mean the image contains a large amount of redundant information. If this redundancy is removed and only the small amount of non-redundant information is transmitted, a great deal of transmission bandwidth is saved. The receiver can use this non-redundant information, with an appropriate decoding algorithm, to reconstruct the original image while preserving acceptable quality. A good compression coding scheme removes as much of the image's redundant information as possible.
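A toy demonstration of temporal correlation, with made-up frame contents: when only a small object changes between two frames, almost the whole difference frame is zero, and only the changed pixels need coding.

    # Toy illustration: temporal redundancy between consecutive frames.
    import numpy as np

    prev = np.zeros((120, 160))
    cur = prev.copy()
    cur[40:60, 50:70] = 200          # a small object appears in the new frame
    diff = cur - prev
    changed = np.count_nonzero(diff)
    print(f"{changed} of {diff.size} pixels changed "
          f"({100.0 * changed / diff.size:.1f}%)")   # ~2% of the frame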

MPEG-2 encoded pictures fall into three categories: I-frames, P-frames, and B-frames.

I-frame pictures use intra-frame coding: they exploit only the spatial correlation within a single frame, not temporal correlation. Because an I-frame does not depend on any other frame and uses no motion compensation, it serves as a random access point and at the same time as the reference frame for decoding. I-frames are mainly used for receiver initialization and channel acquisition, as well as program switching and insertion; their compression ratio is relatively low. I-frame pictures occur periodically in the image sequence, at a frequency the encoder can choose.

P-frame and B-frame pictures use inter-frame coding: they exploit spatial and temporal correlation at the same time. P-frame pictures use only forward temporal prediction, which improves compression efficiency and image quality. A P-frame picture may also include intra-coded parts: each macroblock in a P-frame can be either forward-predicted or intra-coded. B-frame pictures use bidirectional temporal prediction, which greatly increases the compression ratio. Note that because a B-frame references a future frame, the transmission order of pictures in an MPEG-2 coded stream differs from their display order.
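The reordering can be sketched directly: a B-frame cannot be decoded until the future anchor (I or P) frame it references has arrived, so the coded stream sends each anchor ahead of the B-frames that depend on it.

    # Minimal sketch: display order -> transmission (decoding) order.
    def transmission_order(display):
        out, pending_b = [], []
        for f in display:
            if f[0] in "IP":          # anchor frame: send it, then queued Bs
                out.append(f)
                out.extend(pending_b)
                pending_b = []
            else:
                pending_b.append(f)   # B-frames wait for their future anchor
        return out + pending_b

    disp = ["I0", "B1", "B2", "P3", "B4", "B5", "P6"]
    print(transmission_order(disp))
    # -> ['I0', 'P3', 'B1', 'B2', 'P6', 'B4', 'B5']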

The MPEG-2 bitstream is organized hierarchically. To represent the coded data clearly, MPEG-2 uses a layered syntax with six layers, from top to bottom: video sequence, group of pictures (GOP), picture, slice, macroblock, and block.
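The six layers can be pictured as nested containers, as in this illustrative sketch; the field names are ours, not the standard's syntax-element names.

    # Illustrative nesting of the six MPEG-2 syntactic layers.
    from dataclasses import dataclass, field
    from typing import List

    @dataclass
    class Block:                      # 8x8 block of DCT coefficients
        coeffs: List[int] = field(default_factory=list)

    @dataclass
    class Macroblock:                 # 16x16 area: luma/chroma blocks + motion data
        blocks: List[Block] = field(default_factory=list)

    @dataclass
    class Slice:                      # run of macroblocks; resynchronization unit
        macroblocks: List[Macroblock] = field(default_factory=list)

    @dataclass
    class Picture:                    # one I-, P-, or B-frame
        frame_type: str = "I"
        slices: List[Slice] = field(default_factory=list)

    @dataclass
    class GroupOfPictures:            # GOP, starting with an I-frame
        pictures: List[Picture] = field(default_factory=list)

    @dataclass
    class VideoSequence:              # top level
        gops: List[GroupOfPictures] = field(default_factory=list)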


MPEG-4
MPEG-4 was released in November 1998. It is a video and audio coding standard aimed at a given low bit rate, and it pays more attention to the interactivity and flexibility of multimedia systems. The MPEG-4 standard pursues two goals: multimedia communication at low bit rates, and the integration of multimedia communication across industries. To this end, MPEG-4 introduces AV objects (Audio/Visual Objects), making many new forms of interaction possible.
An "AV object" can be an isolated person in a scene, that person's voice, a piece of background music, and so on. AV objects are characterized by efficient coding, efficient storage and transmission, and the possibility of interactive manipulation.

MPEG-4's operations on AV objects mainly include: using AV objects to represent auditory, visual, or combined audio-visual content; combining existing AV objects into composite AV objects, and from them generating AV scenes; multiplexing and synchronizing AV object data flexibly, so that a suitable network can be selected to transmit them; and allowing the user at the receiving end to interact with AV objects within an AV scene.
The MPEG-4 standard consists of six main components:
① DMIF (Delivery Multimedia Integration Framework)
DMIF is the overall framework for multimedia transmission. It mainly addresses the operational problems of multimedia applications in interactive networks, broadcast environments, and disk-based applications. Interaction and transport between client and server are established by conveying multiplexed bitstream information. Through DMIF, MPEG-4 can establish a channel with a specific quality of service (QoS) and bandwidth for each elementary stream.
② Data plane
The data plane in MPEG-4 can be divided into two parts: the transport relationship part and the media relationship part.
To make elementary streams and AV objects appear in the same scene, MPEG-4 introduces the concepts of the Object Descriptor (OD) and the Stream Map Table (SMT). The OD conveys the stream mapping information of the elementary streams associated with a particular AV object, and the SMT links each stream to a CAT (Channel Association Tag), which enables the stream to be transported smoothly.
③ Buffer management and real-time identification
MPEG-4 defines a System Decoder Model (SDM), an idealized decoder that describes how the bitstream syntax is processed; it requires dedicated buffers and a real-time operating mode. With effective management, limited buffer space can be better utilized.
④ Audio encoding
A strength of MPEG-4 is that it supports not only natural sound but also synthetic sound. The audio part of MPEG-4 combines synthetic audio coding with natural sound coding, and supports audio object features.
⑤ Video encoding
As with audio coding, MPEG-4 also supports the coding of both natural and synthetic visual content. Synthetic visuals include 2D and 3D animation and animated human facial expressions.
⑥ Scene description
MPEG-4 provides a series of tools for composing a group of objects into a scene. The necessary composition information forms the scene description, represented in the binary format BIFS (Binary Format for Scenes); BIFS is encoded and transmitted together with the AV objects. The scene description mainly addresses how AV objects are organized and synchronized within the coordinate system of a particular AV scene, along with issues such as intellectual-property protection for AV objects and AV scenes. MPEG-4 thereby provides a rich AV scene capability.
Compared with MPEG-1 and MPEG-2, MPEG-4 is better suited to interactive AV services and remote monitoring, and its design goals make it more adaptable and scalable: MPEG-4 targets transfer rates between 4800 and 64000 bit/s and a resolution of 176x144, and it can compress and transmit data over very narrow bandwidths through frame-reconstruction techniques, obtaining the best possible image quality from the least data. It will therefore find use in digital TV, animated graphics, the Internet, real-time multimedia monitoring, mobile multimedia communication, video streaming over the Internet/intranet, video games, and interactive multimedia applications on DVD.

H.264
H.264 is a new digital video coding standard developed by the JVT (Joint Video Team), formed by ITU-T's VCEG (Video Coding Experts Group) and ISO/IEC's MPEG (Moving Picture Experts Group); it is both an ITU-T recommendation and Part 10 of ISO/IEC MPEG-4. Drafting began in January 1998, the first draft was completed in September 1999, its test model TML-8 was produced in May 2001, and the FCD version of H.264 was adopted at the 5th JVT meeting in June 2002. The standard is still under development and is expected to be formally approved in the first half of the following year.

Like the earlier standards, H.264 uses the hybrid coding scheme of DPCM plus transform coding. But it adopts a simple, "back to basics" design with few options; achieves compression performance much better than H.263++; strengthens adaptability to all kinds of channels, using a "network-friendly" structure and syntax that ease the handling of errors and packet loss; and targets a wide range of applications, satisfying different rates, resolutions, and transmission (storage) scenarios. Its basic system is open, and its use requires no license fees.

Conceptually the algorithm is divided into two layers: the Video Coding Layer (VCL) is responsible for efficiently representing the video content, and the Network Abstraction Layer (NAL) is responsible for packaging and delivering the data in the form the network requires. Motion vectors with 1/4- or 1/8-pixel accuracy are supported: a 6-tap filter can be used at 1/4-pixel accuracy to reduce high-frequency noise, and a more complex 8-tap filter can be used for 1/8-pixel motion vectors. The encoder can also select an "enhanced" interpolation filter to improve prediction where it helps. H.264 offers two entropy coding methods: one applies a universal VLC (UVLC) to all symbols to be coded, and the other uses context-adaptive binary arithmetic coding (CABAC). The standard also contains error-resilience tools that ease the transmission of compressed video in error-prone, packet-lossy environments, enabling robust transmission over mobile or IP channels.
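To give the UVLC idea some substance: H.264's universal VLC is built on Exp-Golomb codes, where a single code table covers every symbol. Each non-negative value k is sent as a run of leading zeros followed by the binary form of k + 1, so the shortest codes go to the smallest (most frequent) values.

    # Minimal sketch of the Exp-Golomb code underlying H.264's UVLC.
    def exp_golomb(k):
        codeword = bin(k + 1)[2:]                 # binary of k+1, no '0b'
        return "0" * (len(codeword) - 1) + codeword

    for k in range(6):
        print(k, exp_golomb(k))
    # 0 -> 1, 1 -> 010, 2 -> 011, 3 -> 00100, 4 -> 00101, 5 -> 00110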

Technically, the H.264 standard has several highlights, such as unified VLC symbol coding; high-accuracy, multi-mode motion prediction; an integer transform based on 4x4 blocks; and a layered coding syntax. These measures give the H.264 algorithm very high coding efficiency: at the same reconstructed image quality it can save about 50% of the bit rate compared with H.263. The code-stream structure adapts well to networks and has strong error-recovery capability, suiting IP and wireless network applications very well.
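The 4x4 integer transform mentioned above fits in a few lines. H.264 replaces the floating-point DCT with an integer approximation so that encoder and decoder compute bit-exact results; the scaling that the standard folds into the quantization step is omitted here.

    # Sketch of the H.264 4x4 forward integer core transform: Y = Cf X Cf^T.
    import numpy as np

    Cf = np.array([[1,  1,  1,  1],
                   [2,  1, -1, -2],
                   [1, -1, -1,  1],
                   [1, -2,  2, -1]])

    def forward_4x4(x):
        return Cf @ x @ Cf.T          # integer arithmetic only

    x = np.arange(16).reshape(4, 4)
    print(forward_4x4(x))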

H.264 has broad application prospects, such as real-time video communication, Internet video transmission, video streaming services, multipoint communication over heterogeneous networks, compressed video storage, and video databases. This superior performance does not come free: the cost is a large increase in computational complexity. It is estimated that encoding complexity is roughly three times that of H.263, and decoding complexity roughly twice that of H.263.

The technical features of H.264 can be summarized in three points. First, it emphasizes proven, mature techniques, pursuing higher coding efficiency and a concise form of expression. Second, it emphasizes adaptation to mobile and IP networks: a layered design formally isolates the coding from the channel, while in essence the source-coder algorithm gives much more consideration to channel characteristics. Third, within the basic framework of the hybrid encoder, its key components have all been significantly improved, such as multi-mode motion prediction, intra-frame prediction, multi-frame prediction, unified VLC, and the 4x4 two-dimensional integer transform.

As of this writing, H.264 has not been finalized, but its higher compression ratio and better channel adaptability mean it will be used more and more widely in digital video communication and storage; its development potential is enormous.
