H.264 is the latest ITU-T video coding standard, also known as ISO/IEC 14496-10 or MPEG-4 AVC, developed jointly by the ISO/IEC Moving Picture Experts Group (MPEG) and the ITU-T Video Coding Experts Group (VCEG).
H.264 is divided into two layers: the video coding layer and the network abstraction layer. The video coding layer processes block, macroblock, and slice data and is designed to be as independent of the network as possible; it is the core of video coding and includes many error-resilience tools. The network abstraction layer handles data at and above the slice level, enabling H.264 to be carried over RTP/UDP/IP, H.323/M, MPEG-2 transport, and H.320-based networks.
Constraints on video transmission over IP networks
1. H.264 application scenarios
Before discussing H.264 over IP, it is worth describing the IP-related application scenarios of H.264 and their requirements on transmission and the decoder. Three scenarios are covered below: conversational applications, download services, and streaming media applications.
Conversational applications, such as video telephony and video conferencing, have strict latency constraints: end-to-end latency must stay below 1 s, preferably below 100 ms. Decoder parameters can be adjusted in real time, and the error-resilience mechanism must adapt to actual network conditions. Encoding and decoding complexity must stay low; for example, bidirectional prediction generally cannot be used.
Download services can use reliable transport protocols such as FTP or HTTP to deliver all the data. Because such applications are not real-time, the encoder can optimize for maximum coding efficiency and has no particular latency or error-resilience requirements.
Streaming media applications fall between the two: the initial buffering latency is under about 10 s. With the relaxed latency requirement, the encoder can optimize for efficient coding (for example, using bidirectional prediction). However, streaming services generally run over unreliable transport protocols, so error control and channel error-correction coding are needed at encoding time.
This article mainly discusses conversational and streaming applications, both over IP networks. Three types of IP network are considered: uncontrolled IP networks (such as the Internet), controlled IP networks (such as managed WANs), and wireless IP networks (such as 3G networks). The three differ in MTU size, bit-error probability, and whether TCP is used. The maximum transmission unit (MTU) is the largest packet length at the network layer; during H.264 encoding, the slice length should be kept smaller than the MTU size to avoid fragmentation at the network layer. The MTU between two IP nodes changes dynamically; it is commonly assumed to be about 1500 bytes on wired IP networks and about 100 bytes on wireless networks. Slice segmentation must therefore be used in H.264, particularly on wireless networks, to keep each slice below the MTU size. TCP can handle the packet loss caused by network congestion, but in wireless networks loss is mostly caused by link-layer errors, for which TCP is a poor fit; error-control mechanisms should be used instead.
2. Protocol environment used by H.264
Conversational and streaming media applications use the same protocol stack.
Network-layer protocol: IP (Internet Protocol). Each IP packet travels from the sender through a series of routers to the receiver. Packets larger than the MTU are fragmented and reassembled, and transit time varies from packet to packet. The 20-byte IP header is protected by a checksum, but the payload is not. The maximum IP packet size is 64 KB, although the MTU is generally far smaller.
Transport-layer protocols: the two main options are TCP and UDP. TCP provides a reliable, byte-oriented transport service, using retransmission and timeouts for error control; its unpredictable latency makes it unsuitable for real-time communication. UDP provides an unreliable datagram service. The checksum carried in the 8-byte UDP header can detect and discard packets with bit errors. UDP allows loss, duplication, and reordering of packets, so a higher layer must provide error recovery when UDP is used.
Application-layer transport protocol: RTP (Real-time Transport Protocol). RTP runs on top of UDP/IP and is session-oriented. Each RTP packet contains the RTP header, an optional payload header, and the payload itself. The RTP header fields are shown in Figure 1: the fixed part occupies 12 bytes, and the marker bit flags the last packet of a group sharing the same timestamp. RTP lets the sender split data into reasonably sized packets, and feedback from the receiver about observed network conditions lets the sender dynamically adjust the bit rate and error-resilience mechanisms. RTP packetization and the RTP payload specification are discussed in Part 4.
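The 12-byte fixed RTP header described above can be unpacked in a few lines. This is a minimal sketch following the RFC 3550 field layout; the dictionary key names are my own:

```python
import struct

def parse_rtp_header(packet: bytes) -> dict:
    """Parse the 12-byte fixed RTP header (RFC 3550 layout)."""
    if len(packet) < 12:
        raise ValueError("packet shorter than the fixed RTP header")
    b0, b1, seq, ts, ssrc = struct.unpack("!BBHII", packet[:12])
    return {
        "version": b0 >> 6,          # always 2 for RTP
        "padding": (b0 >> 5) & 1,
        "extension": (b0 >> 4) & 1,
        "csrc_count": b0 & 0x0F,
        "marker": b1 >> 7,           # last packet of a timestamp group
        "payload_type": b1 & 0x7F,
        "sequence": seq,             # used to reorder and detect loss
        "timestamp": ts,
        "ssrc": ssrc,
    }
```

The sequence number is what lets the decoder re-order packets that took different routes, as noted in the packetization discussion below.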
Application-layer control protocols: H.245, SIP, SDP, or RTSP. These protocols control the streaming session, letting the sending and receiving parties negotiate and manage session parameters dynamically.
H.264 error-resilience tools
Error-resilience tools have evolved alongside video compression technology. The older standards (H.261, H.263, MPEG-2 Part 2) limit the spread of errors using slice and macroblock-group partitioning, intra-coded macroblocks, intra-coded blocks, and intra-coded pictures. Later refinements (H.263+, MPEG-4) recover from errors using multiple reference frames and data partitioning.
Building on those standards, H.264 introduces three key techniques for error recovery: (1) parameter sets, (2) flexible macroblock ordering (FMO), and (3) redundant slices (RS).
1. Intra-frame encoding
In H.264, intra coding is largely the same as in previous standards. Two points are worth noting:
(1) In H.264, the reference macroblocks used for intra prediction may themselves be inter-coded macroblocks. An intra-predicted macroblock is therefore not the same as an intra-coded macroblock in H.263: predictive intra coding is more efficient than non-predictive intra coding, but it weakens the resynchronization property of intra coding. Setting the constrained intra prediction flag restores that property.
(2) There are two kinds of slices containing only intra macroblocks: the intra slice (I slice) and the instantaneous decoder refresh slice (IDR slice); an IDR slice must be contained in an IDR picture. Compared with short-term reference pictures, IDR pictures provide stronger resynchronization.
In a wireless IP environment, improving the resynchronization capability of intra pictures calls for rate-distortion-optimized encoding and setting the constrained intra prediction flag.
2. Picture segmentation
H.264 supports dividing a picture into slices, each containing an arbitrary number of macroblocks. In non-FMO mode, the macroblock order within a slice follows the raster-scan order; FMO mode, discussed below, is the special case. Slicing adapts the bitstream to different MTU sizes and can also be used for packetization.
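The idea of adapting slice size to the MTU can be sketched as a greedy grouping of consecutive macroblocks by their encoded size. This is an illustration only (real encoders size slices while encoding); the byte counts are hypothetical:

```python
def pack_macroblocks_into_slices(mb_sizes, mtu):
    """Greedily group consecutive macroblocks into slices whose total
    encoded size stays within the MTU (raster-scan, non-FMO order).

    mb_sizes: encoded size in bytes of each macroblock, in scan order.
    Returns a list of slices, each a list of macroblock indices."""
    slices, current, current_size = [], [], 0
    for i, size in enumerate(mb_sizes):
        if size > mtu:
            raise ValueError(f"macroblock {i} alone exceeds the MTU")
        if current_size + size > mtu:   # slice full: start a new one
            slices.append(current)
            current, current_size = [], 0
        current.append(i)
        current_size += size
    if current:
        slices.append(current)
    return slices
```

With a 100-byte wireless MTU, even a handful of macroblocks per slice can exhaust the budget, which is why slice segmentation matters most there.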
3. Reference picture selection
Selecting reference picture data at the macroblock, slice, or frame level is an effective error-recovery tool. In systems with a feedback channel, the encoder learns which picture regions were lost in transmission and can select, as references, regions of pictures known to have been received correctly. In systems without feedback, redundant coding is used instead to improve error resilience.
4. Data partitioning
Normally, the data of all macroblocks in a slice is stored together. Data partitioning regroups that data so that semantically related macroblock data ends up in the same partition.
H.264 defines three data partitions.
Header partition: contains macroblock types, quantization parameters, and motion vectors; this is the most important information.
Intra partition: contains intra CBPs (coded block patterns) and intra coefficients. Intra information limits the spread of errors.
Inter partition: contains inter CBPs and inter coefficients, and is usually much larger than the other two partitions.
The intra partition is decoded together with the header partition to reconstruct intra macroblocks, and the inter partition likewise for inter macroblocks. The inter partition is the least important and contributes nothing to resynchronization. When data partitioning is used, the data of a slice is written to different buffers according to its type, and the slice size must be adjusted so that the largest partition stays below the MTU size.
If the decoder receives all partitions, it can reconstruct the slice completely. If the intra or inter partition is lost, the surviving header information still provides good error recovery, because the macroblock types and motion vectors capture the basic features of each macroblock.
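The three-way split above can be modeled schematically. This is purely illustrative: real partitioning operates on entropy-coded syntax elements inside the bitstream, and the element names here are hypothetical:

```python
# Hypothetical syntax-element records: (name, is_intra, payload).
HEADER_FIELDS = {"mb_type", "qp", "motion_vector"}

def partition_slice(elements):
    """Split a slice's syntax elements into the three H.264 data
    partitions: header (types/QP/motion vectors), intra
    (intra CBPs and coefficients), inter (inter CBPs and
    coefficients)."""
    header, intra, inter = [], [], []
    for name, is_intra, payload in elements:
        if name in HEADER_FIELDS:
            header.append((name, payload))
        elif is_intra:
            intra.append((name, payload))
        else:
            inter.append((name, payload))
    return header, intra, inter
```

Losing `inter` leaves `header` intact, which is exactly why motion vectors survive for concealment even when coefficients are gone.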
5. Use of parameter sets
A sequence parameter set (SPS) carries information that applies to an entire picture sequence, and a picture parameter set (PPS) carries information that applies to all the slices of a picture. Multiple sequence and picture parameter sets are stored, indexed, at the decoder. The encoder associates each picture parameter set with a sequence parameter set, and each coded slice header carries an identifier selecting the appropriate picture parameter set. Correct handling of sequence and picture parameter sets is essential to H.264's error-recovery performance.
The key to using parameter sets over an error-prone channel is ensuring that they reach the decoder reliably and in time. For example, over a real-time channel the encoder can send them as early as possible, out of band, via a reliable control protocol, so that they arrive before the first slices that reference them. Another approach is application-layer protection: sending multiple copies so that at least one reaches the decoder. A third is to build fixed parameter sets into the decoder hardware.
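The decoder-side bookkeeping described above can be modeled roughly as a pair of lookup tables, with each slice resolving its parameter sets by ID. This is a simplified sketch; the IDs and payloads are placeholders:

```python
class ParameterSetStore:
    """Decoder-side table of SPS/PPS entries, looked up via the
    identifier each slice header carries (a simplified model)."""

    def __init__(self):
        self.sps = {}   # seq_parameter_set_id -> payload
        self.pps = {}   # pic_parameter_set_id -> (sps_id, payload)

    def add_sps(self, sps_id, payload):
        self.sps[sps_id] = payload

    def add_pps(self, pps_id, sps_id, payload):
        # A PPS is only usable once the SPS it references has arrived,
        # hence the emphasis on reliable, early delivery.
        if sps_id not in self.sps:
            raise KeyError("PPS references an unknown SPS")
        self.pps[pps_id] = (sps_id, payload)

    def resolve(self, pps_id):
        """Return (sps_payload, pps_payload) for a slice's pps_id."""
        sps_id, pps_payload = self.pps[pps_id]
        return self.sps[sps_id], pps_payload
```

If a slice arrives before its parameter sets, `resolve` fails: that failure mode is exactly what out-of-band delivery, repetition, or hard-wired defaults are meant to prevent.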
6. Flexible macroblock ordering (FMO)
Flexible macroblock ordering is one of the major features of H.264: macroblock allocation maps (MBAmaps) can assign macroblocks, in an arbitrary pattern, to different slice groups. FMO breaks the original macroblock order, which reduces coding efficiency and increases latency, but strengthens error resilience. FMO supports various patterns, including the checkerboard and rectangle modes; it can also split the macroblocks of a frame sequentially so that each slice stays below the wireless-network MTU size. With FMO, the slice groups of a picture are transmitted separately. Taking the checkerboard pattern as an example: when the data of one group is lost, the other group, which contains the neighbors of every lost macroblock, can be used to conceal the loss. Experimental data show that at a 10% loss rate (in a video-conferencing application), the error-concealed picture still has high quality.
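The checkerboard pattern just described can be sketched as an MBAmap generator, assigning each macroblock to one of two slice groups by position parity. This is a minimal illustration; in the real codec the slice group map is signalled in the picture parameter set:

```python
def checkerboard_mba_map(width_mbs, height_mbs):
    """MBAmap for two-group checkerboard FMO: macroblock (x, y) goes
    to slice group 0 or 1 according to the parity of x + y, so every
    macroblock's four neighbors sit in the other group."""
    return [(x + y) % 2
            for y in range(height_mbs)
            for x in range(width_mbs)]
```

Because all four neighbors of a lost macroblock belong to the surviving group, spatial concealment always has adjacent data to interpolate from.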
7. Redundant slices
As noted earlier, a system without feedback cannot use reference picture selection for error recovery; instead, redundant slices can be added during encoding to improve error resilience. Note that these redundant slices are coded with different parameters from the primary slices: a coarsely coded redundant slice is appended after each finely coded primary slice. The decoder decodes the primary slice first and, if it is available, discards the redundant one; otherwise, it reconstructs the picture from the coarse redundant slice.
H.264 and the Real-time Transport Protocol (RTP)
1. RTP payload specification
The protocol environment of H.264 was described in Part 2; here the RTP payload specification and its error resilience are discussed in detail. RTP can lower the effective loss rate at the receiver by sending redundant information, at the cost of extra latency. Unlike redundant slices, this redundancy is a backup of selected key information, which makes it suitable for unequal error protection at the application layer. Three mechanisms relevant to multimedia transport are described below.
(1) Packet duplication: the sender copies and retransmits the most important packets so that the receiver gets each one correctly at least once; the receiver discards redundant copies of packets it has already received.
(2) Packet-based forward error correction: the protected packets are XOR-ed together and the result is sent to the receiver as redundant information. Because of the added latency, this is unsuitable for conversational applications but usable for streaming media.
(3) Audio redundancy coding, which can in fact protect any data stream, including video. Each packet consists of a header, its own payload, and the payload of the previous packet. In H.264 it can be combined with data partitioning.
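Mechanism (2) can be sketched in a few lines: the parity packet is the byte-wise XOR of the protected packets, and any single lost packet can be rebuilt from the parity plus the survivors. This is a minimal illustration, not a full FEC payload format (a real one must also signal packet lengths and sequence numbers):

```python
def xor_fec(packets):
    """Build one parity packet as the byte-wise XOR of the protected
    packets, all padded to the length of the longest."""
    n = max(len(p) for p in packets)
    parity = bytearray(n)
    for p in packets:
        for i, b in enumerate(p):
            parity[i] ^= b
    return bytes(parity)

def recover(packets_with_one_none, parity):
    """Recover the single missing packet (marked None) by XOR-ing the
    parity with every surviving packet. Note the result carries the
    padded length; the true length must be signalled separately."""
    out = bytearray(parity)
    for p in packets_with_one_none:
        if p is not None:
            for i, b in enumerate(p):
                out[i] ^= b
    return bytes(out)
```

One parity packet repairs at most one loss per protected group, which is why the group size is tuned to the expected loss rate.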
2. The H.264 NAL unit
H.264 packages coded data into NAL units (NALUs). A NAL unit consists of a 1-byte header, made up of three fixed-length fields, followed by a variable-length coded payload.
The header fields are: the forbidden bit (1 bit), the importance indication (2 bits), and the NALU type (5 bits).
NALU type: values 1-12 are used by H.264; values 24-31 may be used by applications outside H.264.
Importance indication: signals how important the NAL unit is for reconstruction; the larger the value, the more important the unit.
Forbidden bit: if the network detects a bit error in a NAL unit, this bit can be set to 1 so that the receiver discards the unit.
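The three header fields occupy fixed bit positions within the single header byte, so decoding them is a matter of shifts and masks. A minimal sketch:

```python
def parse_nal_header(first_byte: int) -> dict:
    """Decode the 1-byte NAL unit header: forbidden bit (1),
    importance indication nal_ref_idc (2), nal_unit_type (5)."""
    return {
        "forbidden_zero_bit": (first_byte >> 7) & 0x01,  # 1 => bit error, discard
        "nal_ref_idc": (first_byte >> 5) & 0x03,         # higher = more important
        "nal_unit_type": first_byte & 0x1F,              # 1-12 H.264, 24-31 external
    }
```

For example, the common byte 0x65 decodes to type 5 (an IDR slice) with the highest importance value, 3.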
3. Packetization rules
(1) Overhead should be low, so that the scheme works across MTU sizes from about 100 bytes up to 64 KB;
(2) the importance of a packet can be identified without decoding the data inside it;
(3) the payload specification must ensure that a packet remains decodable even when other packets are lost;
(4) a NALU may be split across multiple RTP packets;
(5) multiple NALUs may be combined into one RTP packet.
These rules can all be met by letting the NALU header double as the RTP payload header.
4. Simple packetization
Place one NALU in each RTP packet: the NALU (whose own header also serves as the payload header) becomes the RTP payload, and the RTP header fields are set accordingly. To avoid further fragmentation at the IP layer, the packet is generally kept smaller than the MTU size. Because packets may take different routes, the decoder must reorder them on arrival, which the RTP sequence number makes possible.
5. NALU fragmentation
For pre-encoded content, a NALU may exceed the MTU size. Although IP-layer fragmentation can carry data blocks up to 64 KB, those fragments cannot be protected at the application layer, which weakens unequal-protection schemes; moreover, UDP datagrams are limited to 64 KB, while very small slices are inefficient for some applications. Fragmentation is therefore made part of the application-layer RTP packetization scheme.
The scheme under discussion in the IETF should have the following properties:
(1) NALU fragments are transmitted in ascending RTP sequence-number order;
(2) the first and last fragments of a NALU are marked;
(3) lost fragments can be detected.
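These three properties can be sketched as follows. The layout here is loosely modeled on the FU-A mechanism the IETF work later standardized in RFC 3984 (type value 28, start/end bits in the FU header); treat it as an illustration rather than a spec-exact implementation:

```python
def fragment_nalu(nalu: bytes, max_payload: int):
    """Split one NALU into fragmentation units. Each fragment carries
    an FU indicator (importance bits kept, type set to 28) and an FU
    header whose start/end bits mark the first and last fragment, so
    the receiver can detect a lost piece."""
    header, body = nalu[0], nalu[1:]
    fu_indicator = (header & 0xE0) | 28          # keep F + NRI bits
    chunk = max_payload - 2                      # room after the two FU bytes
    frags = []
    for off in range(0, len(body), chunk):
        part = body[off:off + chunk]
        start = 0x80 if off == 0 else 0
        end = 0x40 if off + chunk >= len(body) else 0
        fu_header = start | end | (header & 0x1F)  # original NALU type
        frags.append(bytes([fu_indicator, fu_header]) + part)
    return frags
```

Fragments are sent in consecutive RTP sequence numbers; a gap between the start-marked and end-marked fragments tells the receiver the NALU is incomplete.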
6. NALU aggregation
Some NALUs, such as SEI messages and parameter sets, are very small; aggregating them reduces header overhead. Two aggregation packet types exist:
(1) the single-time aggregation packet (STAP), which combines NALUs sharing one timestamp;
(2) the multi-time aggregation packet (MTAP), which can also combine NALUs with different timestamps.
This article has focused on the main error-resilience tools of H.264 under the constraints of IP networks. Different tools must be combined on different IP networks to achieve efficient coding and transmission. Because current wireless networks constrain both MTU size and latency, error resilience there should combine picture segmentation, data partitioning, and RTP packetization, avoiding redundant information and feedback; in addition, the efficient FMO coding mode can greatly improve resilience to packet loss.
Link: http://www.ltesting.net/ceshi/ruanjianceshikaifajishu/rjcshjdj/wlfwq/2007/0713/132391.html