H.264 Technical Advantages and Its Application in H.323 Systems


I. Introduction

In recent years, with the rapid build-out of communication network infrastructure in China, video services have developed quickly. Because they deliver audio and video to participants at multiple sites, they save money and improve work efficiency, and they are expected to become a main service of the NGN. Since video conferencing first emerged, systems have been developed for a variety of communication networks; current multimedia communication systems include H.323, H.324, and H.320. The openness of IP technology makes it well suited to carrying many kinds of services. As IP security and QoS problems are gradually solved, the advantages of IP as a bearer network will become more obvious, and the next generation network will also adopt IP as its bearer technology. This article therefore focuses on H.323, the system that applies to multimedia services over IP networks. H.264 is a new video codec standard proposed by the JVT to achieve a higher compression ratio, better image quality, and good network adaptability. Practice has shown that H.264 needs a lower bit rate, and its built-in resilience to packet loss and bit errors, together with its good network adaptability, makes it well suited to IP transmission. H.264 is expected to become the preferred video standard in H.323 systems.

The H.323 system puts forward the following three main requirements for video encoding and decoding standards:

(1) A high compression ratio. Some IP network access methods, such as xDSL, provide limited bandwidth. After the bandwidth occupied by audio and data is subtracted, even less is available for video. The video codec therefore needs a high compression ratio, so that image quality can be improved at a given bit rate.

(2) Good resilience to packet loss and bit errors, so that the codec adapts to various network environments, including wireless networks with severe packet loss and bit errors.

(3) Good network adaptability, so that video streams are easy to transport over the network.

II. Three Technical Advantages of H.264 for H.323 Systems

H.264 fully considers the requirements of multimedia communication for video coding and decoding, and draws on the previous research results of video standards. Therefore, it has obvious advantages. The following describes the three advantages of H.264 in combination with the video codec technology requirements of the H.323 system.

1. Compression Ratio and Image Quality

Improvements to traditional algorithms such as intra-frame prediction, inter-frame prediction, transform coding, and entropy coding give H.264 better coding efficiency and image quality than previous standards.

(1) Variable block size: the block size can be selected flexibly during inter-frame prediction. At the macroblock level, H.264 offers four partitioning modes: 16x16, 16x8, 8x16, and 8x8. When the 8x8 mode is used, each 8x8 sub-macroblock can be split further using three sub-macroblock partitioning modes: 8x4, 4x8, and 4x4. This makes the division of moving objects more accurate, reduces the prediction error, and improves coding efficiency. Intra-frame prediction generally uses two luma prediction modes: Intra_4x4 and Intra_16x16. Intra_4x4 suits detailed areas of an image, while Intra_16x16 suits smooth, low-detail areas.

(2) High-precision motion estimation: in H.264, motion-compensated prediction of the luma signal has 1/4-pixel accuracy. If the motion vector points to an integer pixel position in the reference image, the predicted value is the reference pixel at that position. Otherwise, the value at a 1/2-pixel position is interpolated with a 6-tap FIR filter, and the value at a 1/4-pixel position is obtained by averaging the neighboring integer-pixel and 1/2-pixel values. High-precision motion estimation further reduces the inter-frame prediction error.

(3) Multiple-reference-frame motion estimation: motion-compensated prediction of each M x N luma block requires a motion vector and a reference image index. Each partition within a sub-macroblock can have its own motion vector, but the reference image is selected at the sub-macroblock level. All partitions within one sub-macroblock are therefore predicted from the same reference image, while different sub-macroblocks in the same slice can select different reference images. This is multiple-reference-frame motion estimation.

(4) More flexible selection of reference images: even bidirectionally predicted images can serve as reference images. The encoder can therefore choose the images that best match the current image as references, which reduces the prediction error.

(5) Weighted prediction: the encoder can scale motion-compensated prediction values by a weighting coefficient, which improves image quality in certain scenes, for example fades.

(6) In-loop deblocking filter: to remove the blocking artifacts introduced by prediction and transform coding, H.264 uses a deblocking filter, as earlier standards do. The difference is that the H.264 deblocking filter sits inside the motion compensation loop. The deblocked image can therefore be used as a reference for predicting other images, which further improves prediction accuracy.
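
The sub-pixel interpolation described in the high-precision motion estimation item above can be illustrated with a small one-dimensional sketch. It assumes the 6-tap filter coefficients (1, -5, 20, 20, -5, 1) with a divide by 32, works on a single row of luma samples, and clamps indices at the row borders; the function names are hypothetical, and a real decoder also handles the vertical and diagonal positions.

```python
# Minimal 1-D sketch of H.264-style luma sub-pixel interpolation.
# Assumptions: a single row of 8-bit samples, border samples repeated.

TAPS = (1, -5, 20, 20, -5, 1)  # 6-tap FIR filter, coefficients sum to 32


def clip255(value):
    """Clamp a value to the 8-bit sample range."""
    return max(0, min(255, value))


def half_pel(row, i):
    """Half-sample value between row[i] and row[i + 1]."""
    last = len(row) - 1
    acc = 0
    for k, tap in enumerate(TAPS):
        j = min(max(i - 2 + k, 0), last)  # repeat the edge sample at borders
        acc += tap * row[j]
    return clip255((acc + 16) >> 5)  # round, then divide by 32


def quarter_pel(sample_a, sample_b):
    """Quarter-sample value: rounded mean of two neighboring samples."""
    return (sample_a + sample_b + 1) >> 1


row = [10, 10, 10, 10, 200, 200, 200, 200]
h = half_pel(row, 3)          # halfway between row[3] and row[4]
q = quarter_pel(row[3], h)    # a quarter of the way from row[3] to row[4]
print(h, q)
```

The half-sample value comes from six integer samples, and the quarter-sample value costs only one extra average, which keeps the finer accuracy cheap.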

2. Packet Loss and Error Resilience

Key techniques such as parameter sets, slices, FMO, and redundant slices greatly improve the system's resilience to packet loss and bit errors.

(1) Parameter sets: parameter sets and their flexible transmission greatly reduce the errors caused by losing key header information. To make sure a parameter set reaches the decoder reliably, the sender can transmit the same parameter set several times or transmit multiple parameter sets.

(2) Slices: an image can be divided into one or several slices. When an image is divided into multiple slices, a decoding failure in one slice affects a much smaller area of the picture, and each slice also provides a resynchronization point.

(3) PAFF and MBAFF: when an interlaced image is encoded, the two fields are sampled at different times. For a moving image, the spatial correlation between adjacent lines of the frame is therefore lower than in progressive scan, and encoding the two fields separately saves bits. There are three ways to encode a frame: merge the two fields into one frame and encode it; encode the two fields separately; or merge the two fields into one frame but group each two vertically adjacent macroblocks into a macroblock pair for encoding. Choosing between the first two methods is called PAFF coding. Field mode is more effective for moving areas; in non-moving areas, adjacent lines are highly correlated, so frame mode is more effective. When an image contains both moving and non-moving areas, it is more effective to choose at the macroblock level, using field mode for the moving areas and frame mode for the non-moving areas. This mode is called MBAFF.

(4) FMO: FMO further improves the error resilience of slices. Using slice groups, FMO changes how an image is divided into slices and macroblocks. The mapping between macroblocks and slice groups defines which slice group each macroblock belongs to. With FMO, H.264 defines seven macroblock mapping patterns.

(5) Intra-frame prediction: H.264 draws on the intra-frame prediction experience of earlier video codec standards. Notably, an IDR image in H.264 invalidates the reference image buffer, and the images that follow it are decoded without reference to any image before the IDR image, so an IDR image is an effective resynchronization point. On channels with severe packet loss and bit errors, IDR images can be transmitted from time to time to further improve the packet-loss and error resilience of H.264.

(6) Redundant images: to make H.264 decoders more robust to data loss, redundant images can be transmitted. When the primary image is lost, the image can be reconstructed from the redundant image.

(7) Data partitioning: because information such as motion vectors and macroblock types is more important than other information, H.264 introduces data partitioning, which places semantically related syntax elements of a slice in the same partition. H.264 defines three partition types, which are transmitted separately. If the second or third partition is lost, error recovery tools can still use the information in the first partition to recover reasonably from the loss.

(8) Multiple-reference-frame motion estimation: multiple-reference-frame motion estimation improves not only coding efficiency but also error recovery. In an H.323 system, when the encoder learns through RTCP that a reference image has been lost, it can select an image that the decoder has received correctly as the reference image.

(9) Constrained intra-frame prediction: to stop errors from propagating spatially, the encoder can specify that intra-predicted macroblocks in P or B slices do not use adjacent non-intra-coded macroblocks as references.
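
To make the FMO item above more concrete, the sketch below builds one possible macroblock-to-slice-group map. It is modeled on the dispersed pattern, which reduces to a checkerboard for two slice groups; the function name and the picture size are assumptions for illustration, and the exact formula should be checked against the standard before use.

```python
# Minimal sketch of a dispersed macroblock-to-slice-group map (FMO).
# Assumption: macroblocks are listed in raster (row-major) order.

def dispersed_slice_group_map(width_mbs, height_mbs, num_groups):
    """Return a list giving the slice group of each macroblock."""
    mapping = []
    for i in range(width_mbs * height_mbs):
        x = i % width_mbs    # macroblock column
        y = i // width_mbs   # macroblock row
        mapping.append((x + (y * num_groups) // 2) % num_groups)
    return mapping


width, height = 4, 2
fmo_map = dispersed_slice_group_map(width, height, num_groups=2)
for row in range(height):
    print(fmo_map[row * width:(row + 1) * width])
```

With two groups, every macroblock's left, right, upper, and lower neighbors belong to the other slice group. If the packet carrying one group is lost, the decoder still has the surrounding macroblocks available for concealment.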

3. Network Adaptability

To adapt to various network environments and applications, H.264 defines a video coding layer (VCL) and a network abstraction layer (NAL). The VCL performs video encoding and decoding, including motion-compensated prediction, transform coding, and entropy coding. The NAL encapsulates and packages VCL video data in a format appropriate to the network.

(1) NAL units: video data is encapsulated in NAL units (NALUs), each an integer number of bytes long. The first byte indicates the type of data in the unit. H.264 defines two encapsulation formats. Packet-switched networks (such as H.323 systems) can encapsulate NALUs in RTP. Other systems may require NALUs to be transmitted as an ordered stream; for these, H.264 defines a byte stream format that prefixes each NALU with start_code_prefix so that NAL unit boundaries can be found.

(2) Parameter sets: in earlier video codec standards, the GOB, GOP, and picture header information is critical, and losing a packet that carries it often makes the related images undecodable. H.264 therefore moves information that rarely changes and that applies to a large number of VCL NALUs into parameter sets. There are two types: sequence parameter sets and picture parameter sets. To suit different network environments, parameter sets can be transmitted in-band or out-of-band.
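
As a rough illustration of the NAL unit item above, the sketch below splits a stream on the three-byte start code prefix 0x000001 and decodes the first byte of each NAL unit into its three header fields. The function names and the sample bytes are made up for the example, and the sketch does not remove emulation prevention bytes from the payload.

```python
# Minimal sketch: find NAL units in a start-code-delimited stream and
# decode the one-byte NAL unit header.

START_CODE = b"\x00\x00\x01"


def split_nal_units(stream):
    """Return the NAL units found between start code prefixes."""
    units = []
    pos = stream.find(START_CODE)
    while pos != -1:
        begin = pos + len(START_CODE)
        nxt = stream.find(START_CODE, begin)
        end = len(stream) if nxt == -1 else nxt
        # A zero byte just before the next prefix belongs to a four-byte
        # start code, not to this NAL unit.
        nalu = stream[begin:end].rstrip(b"\x00")
        if nalu:
            units.append(nalu)
        pos = nxt
    return units


def parse_nal_header(nalu):
    """Decode the first byte of a NAL unit."""
    first = nalu[0]
    return {
        "forbidden_zero_bit": first >> 7,
        "nal_ref_idc": (first >> 5) & 0x03,
        "nal_unit_type": first & 0x1F,
    }


# Hypothetical stream: a sequence parameter set (type 7), a picture
# parameter set (type 8), and an IDR slice (type 5), payloads truncated.
stream = (b"\x00\x00\x00\x01\x67\x42\x00\x1e"
          b"\x00\x00\x01\x68\xce"
          b"\x00\x00\x01\x65\x88\x84")
nal_types = [parse_nal_header(n)["nal_unit_type"] for n in split_nal_units(stream)]
print(nal_types)
```

Because the type sits in the first byte, a gateway or packetizer can tell parameter sets from slice data without parsing the video payload.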

III. Implementing H.264 in H.323 Systems

Because H.264 is a new video codec standard, applying it in H.323 systems raises some problems, such as how to define an entity's H.264 capability, so the H.323 standard needs to be supplemented and modified where necessary. To this end, the ITU-T developed the H.241 standard. This article describes only the modifications related to H.323.

First, the H.264 capability must be defined for H.245 capability negotiation. An H.264 capability set is a list of one or more H.264 capabilities. Each H.264 capability includes two mandatory parameters, Profile and Level, and several optional parameters such as CustomMaxMBPS and CustomMaxFS. In H.264, the profile defines the coding tools and algorithms used to generate the bit stream, and the level sets limits on key parameters. The H.264 capability is carried in the GenericCapability structure. The capabilityIdentifier is of type standard with the value 0.0.8.241.0.0.1, which identifies the H.264 capability. maxBitRate defines the maximum bit rate. The collapsing field contains the H.264 capability parameters. Its first entry is Profile: the parameterIdentifier is of type standard with the value 41, and the parameterValue is of type booleanArray. The value can be 64, 32, or 16, indicating the Baseline, Main, and Extended profiles respectively. The second entry is Level: the parameterIdentifier is of type standard with the value 42, and the parameterValue is of type unsignedMin. Its value identifies one of the 15 levels defined in H.264. The remaining parameters are optional.
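
The capability layout described above can be sketched as a plain data structure. This is not an H.245 encoder: the nested dictionaries only mirror the field names in the text, the helper names are hypothetical, the level and bit rate values are placeholders, and treating the profile value as a bit mask that can combine several profiles is an assumption.

```python
# Minimal sketch of an H.264 capability as nested dictionaries.
# Identifier values are the ones given in the text above.

H264_CAPABILITY_OID = "0.0.8.241.0.0.1"
PROFILE_PARAMETER_ID = 41
LEVEL_PARAMETER_ID = 42
PROFILE_BITS = {64: "baseline", 32: "main", 16: "extended"}


def build_h264_capability(profile_mask, level_value, max_bit_rate):
    """Assemble the fields of a generic H.264 capability."""
    return {
        "capabilityIdentifier": {"standard": H264_CAPABILITY_OID},
        "maxBitRate": max_bit_rate,
        "collapsing": [
            {"parameterIdentifier": {"standard": PROFILE_PARAMETER_ID},
             "parameterValue": {"booleanArray": profile_mask}},
            {"parameterIdentifier": {"standard": LEVEL_PARAMETER_ID},
             "parameterValue": {"unsignedMin": level_value}},
        ],
    }


def advertised_profiles(capability):
    """List the profile names whose bits are set in the Profile entry."""
    for entry in capability["collapsing"]:
        if entry["parameterIdentifier"]["standard"] == PROFILE_PARAMETER_ID:
            mask = entry["parameterValue"]["booleanArray"]
            return [name for bit, name in PROFILE_BITS.items() if mask & bit]
    return []


PLACEHOLDER_LEVEL = 0       # placeholder: take the real level value from H.241
PLACEHOLDER_BIT_RATE = 3840  # placeholder value, units as defined by H.245

cap = build_h264_capability(64, PLACEHOLDER_LEVEL, PLACEHOLDER_BIT_RATE)
print(advertised_profiles(cap))
```

A terminal that receives such a capability can compare the profile and level entries with its own decoder limits before opening a video channel.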

Second, because the image structure in H.264 differs from that of earlier standards, some existing H.245 signaling is not applicable to H.264, such as videoFastUpdateGOB in MiscellaneousCommand. H.241 therefore redefines several signaling procedures to provide the corresponding functions.

Finally, for RTP encapsulation of H.264, see RFC 3984. No fixed payload type (PT) value is specified, so the value is assigned dynamically.
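
A minimal sketch of the simplest RTP encapsulation, in which one NAL unit becomes the payload of one RTP packet, is shown below. The function name is hypothetical, the payload type 96 is only an assumed value from the dynamic range (in practice it is agreed during signaling), and fragmentation and aggregation of NAL units are not covered.

```python
# Minimal sketch: wrap one NAL unit in a 12-byte RTP header.
import struct


def rtp_packet_single_nalu(nalu, seq, timestamp, ssrc,
                           payload_type=96, marker=False):
    """Return an RTP packet whose payload is exactly one NAL unit."""
    if not 0 <= payload_type <= 127:
        raise ValueError("payload type must fit in 7 bits")
    byte0 = 2 << 6  # version 2, no padding, no extension, no CSRC entries
    byte1 = (int(marker) << 7) | payload_type
    header = struct.pack("!BBHII", byte0, byte1,
                         seq & 0xFFFF,
                         timestamp & 0xFFFFFFFF,
                         ssrc & 0xFFFFFFFF)
    return header + nalu


packet = rtp_packet_single_nalu(b"\x65\x88", seq=1, timestamp=90000,
                                ssrc=0x1234, marker=True)
print(len(packet), packet[:2].hex())
```

The NAL unit header byte is carried unchanged as the first payload byte, so the receiver can read the NAL unit type directly from the packet.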

IV. Conclusion

As a new international standard, H.264 has succeeded in coding efficiency, image quality, network adaptability, and error resilience. However, with the rapid development of terminals and networks, the requirements placed on video codecs keep rising, so H.264 is still being improved and developed to meet new requirements. Current research on H.264 focuses on reducing codec latency, optimizing algorithms, and further improving image quality. More and more video conferencing systems now use H.264, and most of them interoperate using the Baseline profile. With the continued improvement of H.264 and the spread of video communication, we believe H.264 will be used more and more widely.

■ References
[1] ITU-T H.241. Extended video procedures and control signals for H.300 series terminals. July 2003
[2] Wiegand T, Sullivan G J, Bjøntegaard G. Overview of the H.264/AVC Video Coding Standard. IEEE Trans. Circuits Syst. Video Technol., 2003, 13(7)