[Reprint] Video Encoding (H.264 Overview)

Last Update:2014-08-20 Source: Internet

Author: User

I. Video Encoding

1.1 Goals of video compression and encoding

1) Achieve a high compression ratio

2) Preserve the quality of the reconstructed video

3) Be easy to implement, low in cost, and reliable

1.2 Basis of compression (feasibility)

1) Temporal correlation

In a video sequence, two adjacent frames differ very little. This is temporal correlation.

2) Spatial correlation

Within a single frame, adjacent pixels are strongly correlated. The closer two pixels are, the stronger the correlation.

Coding methods can be classified by the source model they use:

1) Waveform-based coding

If the source model is "an image consists of many pixels", the parameters of the model are the luminance and chrominance amplitudes of each pixel. Encoding these parameters is called waveform-based coding.

2) Content-based coding

If the source model is a scene composed of several objects, the parameters of the model are the shape, texture, and motion of each object. Encoding these parameters is called content-based coding.

H.264 applications are divided into the following profiles:

1) Baseline profile: the simple version with wide application. It supports intra- and inter-frame coding and uses variable-length entropy coding.

Applications: real-time communication, such as video calls, video conferencing, and wireless communication.

2) Main profile: adopts a number of technical measures to improve image quality and increase the compression ratio. It supports interlaced video and context-based adaptive arithmetic coding.

Applications: digital broadcasting and digital video storage.

3) Extended profile: Applications: video streaming over various networks and video on demand.

4) High profile:

II. Video Encoding Principles

2.1 Compressing an image or video sequence to generate a bitstream

Processing of video sequences: inter-frame predictive coding

The prediction P is obtained by motion compensation from a previously encoded reference picture. P is subtracted from the current frame Fn to give the residual Dn. Dn is transformed (T) and quantized (Q), which removes spatial redundancy and yields the coefficients X. X is reordered (to make the data more compact) and entropy coded (together with the motion vectors...) to obtain the NAL data.

Image processing: intra-frame predictive coding

The prediction P is formed from macroblocks already encoded in the current picture (luma 4 × 4 or 16 × 16, chroma 8 × 8). The predicted value P is subtracted from the block being processed to give the residual Dn. Dn is transformed (T) and quantized (Q) to yield the coefficients X. X is reordered (to make the data more compact) and entropy coded to obtain the NAL data.
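The chain described above (subtract the prediction, transform, quantize, reorder) can be sketched for a single 4 × 4 block. This is a minimal illustration, not an encoder: it uses the 4 × 4 integer core transform matrix and zigzag scan order of H.264, but replaces the standard's quantizer with a plain division by a hypothetical step size `qstep`, and the helper names are made up for this example.

```python
# Forward 4x4 integer core transform matrix of H.264 (post-scaling omitted).
CF = [[1, 1, 1, 1],
      [2, 1, -1, -2],
      [1, -1, -1, 1],
      [1, -2, 2, -1]]

# Zigzag scan order for a 4x4 block, as raster indices.
ZIGZAG_4X4 = [0, 1, 4, 8, 5, 2, 3, 6, 9, 12, 13, 10, 7, 11, 14, 15]


def matmul(a, b):
    """Multiply two 4x4 matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def transpose(m):
    return [list(row) for row in zip(*m)]


def encode_block(current, predicted, qstep):
    """Return the reordered, quantized coefficients X for one 4x4 block."""
    # Residual Dn = current block minus prediction P.
    residual = [[c - p for c, p in zip(cur_row, pred_row)]
                for cur_row, pred_row in zip(current, predicted)]
    # Transform T: CF * Dn * CF^T.
    coeffs = matmul(matmul(CF, residual), transpose(CF))
    # Quantization Q (simplified assumption: divide and truncate toward zero).
    quantized = [[int(v / qstep) for v in row] for row in coeffs]
    # Reorder so that low-frequency coefficients come first.
    flat = [v for row in quantized for v in row]
    return [flat[i] for i in ZIGZAG_4X4]


if __name__ == "__main__":
    cur = [[10] * 4 for _ in range(4)]
    pred = [[8] * 4 for _ in range(4)]
    print(encode_block(cur, pred, qstep=4))
```

For a flat residual, all the energy lands in the first (DC) coefficient and the remaining fifteen are zero, which is what makes the reordered data compact for the entropy coder.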

2.2 Fields, frames, and pictures

Field: an image produced by interlaced scanning. The even-numbered rows form the top field, and the odd-numbered rows form the bottom field.

Frame: an image produced by progressive (line-by-line) scanning.

Picture: both fields and frames are regarded as pictures.

2.3 Macroblocks and slices

Macroblock (MB): a macroblock consists of one 16x16 luma block, one 8x8 Cb block, and one 8x8 Cr block.

Slice: a picture can be divided into one or more slices. A slice consists of one or more macroblocks.
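As a rough illustration of the macroblock structure above, the following sketch counts the macroblocks needed to cover a picture. It assumes the 16x16 luma plus two 8x8 chroma layout given in the text and rounds the picture size up to whole macroblocks; the function name is hypothetical.

```python
def macroblock_layout(width, height):
    """Return (MBs per row, MB rows, total MBs, samples per MB)."""
    mbs_x = (width + 15) // 16   # round up to a whole macroblock
    mbs_y = (height + 15) // 16
    # One 16x16 luma block plus one 8x8 Cb and one 8x8 Cr block.
    samples_per_mb = 16 * 16 + 2 * (8 * 8)
    return mbs_x, mbs_y, mbs_x * mbs_y, samples_per_mb


if __name__ == "__main__":
    print(macroblock_layout(352, 288))    # CIF
    print(macroblock_layout(1920, 1080))  # 1080 is not a multiple of 16
```

A CIF picture (352 × 288) comes out as 22 × 18 = 396 macroblocks. A 1920 × 1080 picture needs 68 macroblock rows, because 1080 is not a multiple of 16.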

III. H.264 Structure and Application

H.264 separates the NAL from the VCL in its framework structure for two main purposes:

First, it defines the interface between VCL video compression and the NAL network transmission mechanism. This allows the design of the VCL to be ported across different processor platforms, independently of the data encapsulation format used by the NAL.
Second, both the VCL and the NAL are designed to work in different transmission environments. In heterogeneous network environments, the VCL bitstream does not need to be reconstructed or re-encoded.

3.1 H.264 encoding format

H.264 is divided functionally into two layers: the Video Coding Layer (VCL) and the Network Abstraction Layer (NAL).

The VCL performs video encoding and decoding, including motion-compensated prediction, transform coding, and entropy coding.

The NAL encapsulates and packages VCL video data in an appropriate format.

1) VCL data is the video data sequence after compression and encoding.

VCL data can be transmitted or stored only after it is encapsulated in NAL units.

2) NAL unit format

A NAL unit consists of a 1-byte header, made up of three fixed-length fields, followed by a variable number of payload bytes.

Header syntax: NALU type (5 bits), importance indication (2 bits), and forbidden bit (1 bit).

NALU type: values 1 ~ 12 are used by H.264; values 24 ~ 31 are used by applications outside H.264.

Importance indication: indicates how important the NAL unit is for reconstruction. The greater the value, the more important the unit.

Forbidden bit: when the network detects a bit error in the NAL unit, it can set this bit to 1 so that the receiver can discard the unit.
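The three header fields above can be unpacked from the first byte of a NAL unit with a few bit operations. The sketch below assumes the usual bit layout, in which the forbidden bit is the most significant bit, followed by the 2-bit importance indication and the 5-bit NALU type; the function name is chosen for this example only.

```python
def parse_nal_header(first_byte):
    """Split the 1-byte NAL header into (forbidden bit, importance, type)."""
    forbidden = (first_byte >> 7) & 0x01   # 1 bit, most significant
    importance = (first_byte >> 5) & 0x03  # 2 bits
    nal_type = first_byte & 0x1F           # 5 bits, least significant
    return forbidden, importance, nal_type


if __name__ == "__main__":
    for byte in (0x67, 0x68, 0x65, 0x41):
        print(hex(byte), parse_nal_header(byte))
```

For example, the byte 0x67 decodes to forbidden bit 0, importance 3, and type 7, while 0x41 decodes to forbidden bit 0, importance 2, and type 1.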

[Figure: a NAL unit stream, shown as a sequence of NAL header / RBSP pairs]

(1) NAL units: Video data is encapsulated in NALUs, each an integer number of bytes long. The first byte indicates the type of data in the unit. H.264 defines two encapsulation formats. Packet-switched networks (such as H.323 systems) can encapsulate NALUs in RTP. Other systems may require NALUs to be delivered as an ordered bitstream; for these, H.264 defines a byte stream format that places a start_code_prefix before each NALU so that NAL unit boundaries can be identified.

(2) Parameter sets: In earlier video coding standards, the GOB/GOP/picture header information was critical, and the loss of a packet containing it often made the pictures that depended on it undecodable. For this reason, H.264 moves information that rarely changes and applies to a large number of VCL NALUs into parameter sets. There are two types: sequence parameter sets and picture parameter sets. To suit different network environments, parameter sets can be transmitted in-band or out-of-band.

Sequence parameter set (SPS): contains the information that applies to an entire picture sequence.

Picture parameter set (PPS): contains the information that applies to all slices of a picture.
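The byte stream format mentioned under (1) above can be illustrated by splitting a buffer at each start code prefix. This is a simplified sketch: it assumes the 3-byte prefix 0x000001 (optionally preceded by extra zero bytes), relies on the rule that a NAL unit does not end in a zero byte, and does not remove emulation prevention bytes. The function name and the sample bytes are made up for the example.

```python
START_CODE = b"\x00\x00\x01"


def split_byte_stream(stream):
    """Return the NAL units found between start code prefixes."""
    units = []
    pos = stream.find(START_CODE)
    while pos != -1:
        start = pos + len(START_CODE)
        nxt = stream.find(START_CODE, start)
        end = len(stream) if nxt == -1 else nxt
        # Trailing zero bytes belong to the next (4-byte) start code or to
        # padding, not to the NAL unit itself.
        unit = stream[start:end].rstrip(b"\x00")
        if unit:
            units.append(unit)
        pos = nxt
    return units


if __name__ == "__main__":
    data = (b"\x00\x00\x00\x01\x67\x42"
            b"\x00\x00\x01\x68\xce"
            b"\x00\x00\x00\x01\x65\x88\x84")
    for nalu in split_byte_stream(data):
        print(nalu.hex(), "type", nalu[0] & 0x1F)
```

With the sample buffer, the loop yields three units whose first bytes are 0x67, 0x68, and 0x65, that is, types 7, 8, and 5.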

3.2 H.264 Network Transmission

H.264 can be used in networks based on RTP/UDP/IP, H.324/M, MPEG-2 transport, and H.320.

For RTP encapsulation of H.264, refer to RFC 3984 (since replaced by RFC 6184). The value of the payload type (PT) field is not fixed by the specification.

3.3 Data Partitioning

Normally, the data of the macroblocks in a slice is stored together. Data partitioning regroups the macroblock data of a slice so that semantically related data is collected into separate partitions. H.264 defines three partition types.

(1) Header partition: contains the macroblock types, quantization parameters, and motion vectors. It is the most important partition of the slice.

(2) Intra partition: contains the intra CBPs (coded block patterns) and intra coefficients. Intra information can stop the propagation of errors.

(3) Inter partition: contains the inter CBPs and inter coefficients. It is usually much larger than the other two partitions.

The intra partition is used together with the header information to decode intra macroblocks, and the inter partition is used together with the header information to decode inter macroblocks. The inter partition is the least important and contributes nothing to resynchronization. When data partitioning is used, the data in a slice is stored in different buffers according to its type, and the slice size must be adjusted so that the largest partition is smaller than the MTU.

If the decoder receives all partitions, it can reconstruct the slice completely. If the decoder finds that the intra or inter partition is lost, the header information that is still available provides good error recovery, because the macroblock types and motion vectors capture the basic characteristics of the macroblocks.

3.4 Flexible macroblock ordering (FMO)

FMO uses a macroblock allocation map (MBAmap) to assign macroblocks freely to different slice groups. This breaks the original macroblock order, which reduces coding efficiency and increases latency but improves error resilience. FMO supports several picture partitioning patterns, including the checkerboard and rectangle patterns. FMO can also split the macroblocks of a frame in sequence so that each part is smaller than the MTU of the wireless network. With FMO, the picture data of each slice group is transmitted separately. Taking the checkerboard pattern as an example, when the data of one slice group is lost, it can be concealed using the data of the other group, which contains the macroblocks adjacent to the lost ones. Experimental data shows that at a loss rate of 10% (for video conferencing applications), the picture after error concealment still has high quality.
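The checkerboard pattern described above can be sketched as a small allocation map with two slice groups. The code is an illustration under simple assumptions (two groups, group chosen by the parity of the macroblock coordinates); the function names are hypothetical and do not correspond to any codec API.

```python
def checkerboard_map(mbs_x, mbs_y):
    """Assign each macroblock to slice group 0 or 1 in a checkerboard."""
    return [[(x + y) % 2 for x in range(mbs_x)] for y in range(mbs_y)]


def neighbour_groups(mb_map, x, y):
    """Slice groups of the macroblocks directly above, below, left, right."""
    groups = []
    for dx, dy in ((0, -1), (0, 1), (-1, 0), (1, 0)):
        nx, ny = x + dx, y + dy
        if 0 <= ny < len(mb_map) and 0 <= nx < len(mb_map[0]):
            groups.append(mb_map[ny][nx])
    return groups


if __name__ == "__main__":
    mb_map = checkerboard_map(4, 3)
    for row in mb_map:
        print(row)
    # The macroblock at (1, 1) is in group 0; its neighbours are in group 1.
    print(neighbour_groups(mb_map, 1, 1))
```

Because every direct neighbour of a macroblock lies in the other slice group, losing one group still leaves the surrounding macroblocks available for concealment.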

IV. H.264 Network Transmission

The NAL supports many packet-based wired and wireless networks, such as H.320, MPEG-2, and RTP/IP. However, most video applications currently use the RTP/UDP/IP protocol stack, so the description below focuses on that transmission framework. We next look at the basic processing unit of the NAL, the NALU, and at how it is encapsulated, fragmented, and aggregated for the network.

4.1. NAL unit

Each NAL unit is a variable-length byte string with a defined syntax, consisting of one byte of header information (indicating the data type) followed by an integer number of payload bytes. A NAL unit can carry a coded slice, a type A/B/C data partition, or a sequence or picture parameter set.

NAL units are transmitted in order of RTP sequence number. In the header, T is the payload data type (5 bits), R is the importance indicator (2 bits), and F is the forbidden bit (1 bit). The details are as follows:

(1) NALU type

The field can represent 32 different NALU types. Types 1 ~ 12 are defined by H.264. Types 24 ~ 31 are used outside H.264; the RTP payload specification uses some of these values to define packet aggregation and fragmentation. The other values are reserved by H.264.

(2) Importance indicator

It marks the importance of a NAL unit for reconstruction. The larger the value, the more important the unit. If the value is 0, the NAL unit is not used for prediction, so the decoder can discard it without error propagation. If the value is greater than 0, the NAL unit is needed for drift-free reconstruction, and the higher the value, the greater the impact of losing it.

(3) Forbidden bit

The default value set by the encoder is 0. When the network identifies a bit error in the unit, it can set the bit to 1 so that the receiver can discard the unit. This is mainly useful for adapting between different kinds of network environments (such as wired and wireless). For example, consider a gateway from wireless to wired, with a wireless non-IP environment on one side and a wired network free of bit errors on the other. If a NAL unit fails its check when it arrives at the wireless side, the gateway can either remove the NAL unit from the NAL stream or forward it to the receiver with the forbidden bit set. In the latter case, a smart decoder will try to reconstruct the NAL unit (knowing that it may contain bit errors), while a simple decoder will just discard it.

The NAL unit structure specifies a common format for both packet-oriented and stream-oriented transport systems. In H.320 and MPEG-2 systems, the NAL unit stream must mark NAL unit boundaries, so a 3-byte start code prefix is placed before each NAL unit. In a packet transport system, NAL unit boundaries are determined by the transport protocol, so the start code prefix is not needed.

A set of NAL units is called an access unit. With a delimiter and timing information (SEI) added, it forms a primary coded picture. The primary coded picture (PCP) consists of a group of coded NAL units. It may be followed by a redundant coded picture (RCP), which is a redundant representation of the same video picture and is used to recover information when the PCP is lost during decoding. If the coded picture is the last picture of a coded video sequence, an end-of-sequence NAL unit should appear to indicate that the sequence ends. A picture sequence has only one sequence parameter set and can be decoded independently. If the coded picture is the last picture of the entire NAL unit stream, an end-of-stream NAL unit should appear.

H.264 adopts the strict access unit structure described above, which not only lets H.264 adapt to many kinds of networks but also further improves its error resilience. Sequence numbers make it possible to find out which VCL unit was lost. With redundant coded pictures, a rough picture can still be obtained even if the primary coded picture is lost.

4.2. RTP in H.264

The above describes the structure and use of the NAL unit. Here we discuss the RTP payload specification and its error resilience in more detail. RTP can reduce the packet loss rate seen at the receiver by sending redundant information, at the cost of increased latency. Unlike redundant slices, the redundant information added by RTP is a backup of selected key information, which suits unequal protection schemes. The relevant multimedia transmission mechanisms are as follows:

(1) Packet duplication: the sender duplicates and resends the most important packets so that the receiver gets each of them correctly at least once. The receiver must discard redundant copies of packets it has already received correctly.

(2) Packet-based forward error correction: the protected packets are combined with an exclusive-or (XOR) operation, and the result is sent to the receiver as redundant information. Because of the latency it adds, this is not used for conversational applications, but it can be used for streaming media.

(3) Audio redundancy coding: this can protect any data stream, including video. Each packet consists of a header, its own payload, and the payload of the previous packet. With H.264 it can be used together with data partitioning.
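The XOR idea behind mechanism (2) above can be shown with a toy example: one parity packet protects a group of packets, and any single lost packet in the group can be rebuilt from the parity and the packets that did arrive. This is only a sketch of the principle. It assumes the length of the missing packet is known, whereas real RTP forward error correction formats also protect lengths and header fields; the function names are hypothetical.

```python
def xor_parity(packets):
    """XOR a group of packets together, padding shorter ones with zeros."""
    parity = bytearray(max(len(p) for p in packets))
    for packet in packets:
        for i, byte in enumerate(packet):
            parity[i] ^= byte
    return bytes(parity)


def recover_missing(received, parity, missing_len):
    """Rebuild the single missing packet of a protected group."""
    # XORing the parity with every received packet cancels them out and
    # leaves the missing packet (zero-padded to the parity length).
    return xor_parity(list(received) + [parity])[:missing_len]


if __name__ == "__main__":
    group = [b"slice-A", b"slice-B2", b"sps"]
    parity = xor_parity(group)
    # Pretend the second packet was lost in transit.
    rebuilt = recover_missing([group[0], group[2]], parity, len(group[1]))
    print(rebuilt)
```

The receiver has to wait for the whole protected group before it can repair a loss, which is the latency cost the text refers to.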

The requirements for the RTP encapsulation specification are summarized as follows:

(1) Low additional overhead, so that MTU sizes from 100 bytes to 64 KB are practical;

(2) The importance of a packet is easy to determine without decoding the data in the packet;

(3) The payload specification should make it possible to detect, without decoding the bitstream, packets that can no longer be decoded because other packets were lost;

(4) Support for splitting a NALU across multiple RTP packets;

(5) Support for aggregating multiple NALUs into one RTP packet.

H.264 adopts a simple packetization scheme: one NALU is placed in one RTP packet. The NALU (including the NALU header, which also serves as the payload header) is placed in the RTP payload, and the RTP header fields are set. Ideally, the VCL does not generate NAL units larger than the MTU, which avoids fragmentation at the IP layer. At the receiver, the RTP sequence number is used to identify and discard duplicate packets, and the NAL units are extracted from the valid RTP packets. The Baseline and Extended profiles allow slices to be decoded out of order, so packets do not have to be reordered in the jitter buffer. With the Main profile (which does not allow out-of-order slices), the RTP sequence number must be used to reorder packets. The concept of a decoding order number (DON) is being discussed in the IETF.

In some situations, for example when content is pre-encoded, the encoder does not know the MTU of the underlying network and generates many NALUs larger than the MTU. This requires NALU fragmentation and aggregation.

(1) NALU fragmentation

Although IP-layer fragmentation can carry data blocks smaller than 64 KB, the fragments cannot be protected at the application layer, which reduces the effectiveness of unequal protection schemes. Because UDP datagrams are smaller than 64 KB, and that limit on the length of a slice is too small for some applications, application-layer fragmentation is part of the RTP packetization scheme. The fragmentation scheme currently being discussed by the IETF has the following features: ① the fragments of a NALU are transmitted in ascending order of RTP sequence number; ② the first and last fragments of a NALU can be marked; ③ lost fragments can be detected.
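The features listed above can be illustrated with a sketch modelled on the FU-A fragmentation unit of the RTP payload format for H.264 (RFC 6184), as far as the layout is recalled here: each fragment starts with an indicator byte (type 28, same importance bits as the original NALU) and a fragment header whose top two bits mark the first and last fragment. The text only says the scheme was still under discussion, so treat the exact layout as an assumption; the function names are made up, and RTP headers and sequence numbers are left out.

```python
FU_A = 28  # payload type value assumed for fragmentation units


def fragment_nalu(nalu, max_payload):
    """Split one NALU into fragments of at most max_payload bytes each."""
    if max_payload < 3 or len(nalu) <= max_payload:
        raise ValueError("fragment only NALUs larger than max_payload >= 3")
    header, body = nalu[0], nalu[1:]
    indicator = (header & 0xE0) | FU_A  # keep F and importance bits
    nal_type = header & 0x1F
    chunk = max_payload - 2             # two bytes of fragment overhead
    pieces = [body[i:i + chunk] for i in range(0, len(body), chunk)]
    fragments = []
    for index, piece in enumerate(pieces):
        start = 0x80 if index == 0 else 0             # marks first fragment
        end = 0x40 if index == len(pieces) - 1 else 0  # marks last fragment
        fragments.append(bytes([indicator, start | end | nal_type]) + piece)
    return fragments


def reassemble_nalu(fragments):
    """Rebuild the original NALU from its fragments, given in order."""
    first = fragments[0]
    header = (first[0] & 0xE0) | (first[1] & 0x1F)
    return bytes([header]) + b"".join(f[2:] for f in fragments)


if __name__ == "__main__":
    nalu = bytes([0x65]) + bytes(range(10))
    parts = fragment_nalu(nalu, max_payload=6)
    print([p.hex() for p in parts])
    print(reassemble_nalu(parts) == nalu)
```

A receiver that sees a gap in the RTP sequence numbers between a start-marked and an end-marked fragment knows that part of the NALU was lost.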

(2) NALU aggregation

Some NALUs, such as SEI and parameter sets, are very small. Aggregating them reduces the header overhead. There are two types of aggregation packet: ① the single-time aggregation packet (STAP), which combines NALUs with the same timestamp and is generally used in low-latency environments; ② the multi-time aggregation packet (MTAP), which can combine NALUs with different timestamps and is generally used in high-latency environments, such as streaming applications.
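A single-time aggregation packet can be sketched as one header byte followed by length-prefixed NALUs. The layout below is modelled on STAP-A from the RTP payload format for H.264 (RFC 6184) as recalled here: type value 24, a 16-bit big-endian size before each NALU, and the highest importance value of the aggregated units in the header. These details are assumptions relative to the text above, and the function names are hypothetical.

```python
import struct

STAP_A = 24  # payload type value assumed for single-time aggregation


def aggregate_nalus(nalus):
    """Pack several small NALUs with the same timestamp into one payload."""
    forbidden = 0x80 if any(n[0] & 0x80 for n in nalus) else 0
    importance = max(n[0] & 0x60 for n in nalus)
    payload = bytearray([forbidden | importance | STAP_A])
    for nalu in nalus:
        payload += struct.pack("!H", len(nalu))  # 16-bit size, network order
        payload += nalu
    return bytes(payload)


def split_aggregate(payload):
    """Recover the individual NALUs from an aggregation payload."""
    nalus, pos = [], 1  # skip the aggregation header byte
    while pos + 2 <= len(payload):
        (size,) = struct.unpack_from("!H", payload, pos)
        pos += 2
        nalus.append(payload[pos:pos + size])
        pos += size
    return nalus


if __name__ == "__main__":
    sps, pps, sei = b"\x67\x42\x00", b"\x68\xce", b"\x06\x05"
    packet = aggregate_nalus([sps, pps, sei])
    print(packet.hex())
    print(split_aggregate(packet) == [sps, pps, sei])
```

Three small units that would otherwise need three RTP headers travel here with one aggregation header plus two size bytes each.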

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
