H.264 RTP Packetization Principles (reprint)

Source: Internet
Author: User
Tags: coding standards

1. Introduction
With the development of the information industry, people's demand for information resources has gradually shifted from text and pictures to audio and video, with growing emphasis on real-time, interactive access. Yet users face an unavoidable frustration: to see vivid, clear media on the web, they must first spend a long time waiting for files to transfer. Streaming media technology emerged to resolve this contradiction. Thanks to its short start-up delay and its savings in client storage space, streaming has become the preferred approach, and streaming applications continue to grow worldwide. The Real-time Transport Protocol (RTP) specifies a standard packet format for carrying audio and video over the Internet. It is used together with the RTP Control Protocol (RTCP) and is one of the most widely used protocols in streaming media.
H.264/AVC is a new-generation video coding standard developed by the Joint Video Team (JVT), which was formed by the ITU-T Video Coding Experts Group (VCEG) and the ISO/IEC Moving Picture Experts Group (MPEG). Its greatest advantage is a high compression ratio: at the same image quality, the compression ratio of H.264 is more than twice that of MPEG-2 and 1.5 to 2 times that of MPEG-4. In addition, its layered design, with a Video Coding Layer (VCL) and a Network Abstraction Layer (NAL), makes it well suited to real-time streaming. Based on RTP, this article packetizes H.264 video for streaming and implements a basic streaming media server, with the open-source player VLC as the receiver, forming a complete H.264 video transmission system.

2. Settings for key parameters of the RTP protocol
RTP is a protocol for real-time data transmission proposed by the IETF in 1996. It actually consists of two parts: the Real-time Transport Protocol (RTP) itself and the RTP Control Protocol (RTCP). RTP provides continuous, real-time delivery of media data over multicast or unicast networks. RTCP is the control part of RTP; it monitors transmission quality in real time and supplies the feedback on which congestion control and flow control rely. RTP is described in detail in RFC 3550. Each RTP packet consists of two parts, a header and a payload: the first 12 bytes of the header are fixed, and the payload can be audio or video data. The RTP fixed header format is shown in Figure 1. The more critical fields are explained below:
(1) Marker bit (M): 1 bit. Its meaning is generally defined by the specific media application profile; it is intended to mark significant events in the RTP stream.
(2) Payload type (PT): 7 bits, identifying the format of the RTP payload. RFC 3551 assigns default payload type values to commonly used audio and video formats. For example, type 0 indicates that the packet carries PCMU (ITU-T G.711 mu-law) voice data sampled at 8000 Hz, mono. (Type 2, which the earlier RFC 1890 assigned to G.721, is marked reserved in RFC 3551.)
(3) Sequence number: 16 bits, incremented by 1 for each RTP packet sent. The receiver can use it to detect packet loss and restore packet order.
(4) Timestamp: 32 bits, giving the sampling instant of the first byte of data in the RTP packet, expressed as an offset from the initial timestamp value. On the sender, the sampling instant must be derived from a clock that increases monotonically and linearly in time.
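To make the field layout concrete, here is a minimal Python sketch that decodes the 12-byte fixed header of a received packet. The bit positions follow the RFC 3550 layout; the names `RtpHeader` and `parse_rtp_header` are illustrative and do not come from the original article.

```python
import struct
from typing import NamedTuple


class RtpHeader(NamedTuple):
    version: int
    padding: int
    extension: int
    csrc_count: int
    marker: int
    payload_type: int
    sequence_number: int
    timestamp: int
    ssrc: int


def parse_rtp_header(packet: bytes) -> RtpHeader:
    """Decode the 12-byte fixed RTP header at the start of a packet."""
    if len(packet) < 12:
        raise ValueError("an RTP packet needs at least 12 header bytes")
    # Network byte order: two single bytes, one 16-bit and two 32-bit fields.
    b0, b1, seq, ts, ssrc = struct.unpack("!BBHII", packet[:12])
    return RtpHeader(
        version=b0 >> 6,              # top 2 bits of byte 0
        padding=(b0 >> 5) & 0x1,
        extension=(b0 >> 4) & 0x1,
        csrc_count=b0 & 0x0F,
        marker=b1 >> 7,               # top bit of byte 1
        payload_type=b1 & 0x7F,       # remaining 7 bits of byte 1
        sequence_number=seq,
        timestamp=ts,
        ssrc=ssrc,
    )
```

A receiver could compare `sequence_number` across successive packets to spot gaps, and use `timestamp` to schedule playback.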
As the RTP packet format shows, the header carries the payload type, sequence number, timestamp, and flags such as whether additional data is present. All of these provide the basis for real-time streaming. RTCP supports congestion control and flow control for RTP transmission; its packet structure and the meaning of each field are given in RFC 3550 and are not covered here.

3. The H.264 elementary stream structure and its transmission mechanism
3.1 Structure of the H.264 elementary stream
The H.264 elementary stream (ES) is structured in two layers: the Video Coding Layer (VCL) and the Network Abstraction Layer (NAL). The VCL is responsible for an efficient representation of the video content, while the NAL packages and delivers that data in the manner the network requires. Introducing the NAL and separating it from the VCL brings two benefits. First, signal processing is decoupled from network transport, so the VCL and NAL can be implemented on different processing platforms. Second, gateways between different network environments do not need to reconstruct and re-encode the VCL bitstream.
The H.264 elementary stream consists of a series of NALUs (Network Abstraction Layer Units), which vary in size. The H.264 draft specifies that when the stream is stored on media, a start code, 0x000001, is added before each NALU to mark where it begins and, implicitly, where the previous one ends. Under this mechanism, the decoder treats each start code it detects as the beginning of a NALU, and the current NALU ends when the next start code is detected. Each NALU consists of a one-byte NALU header followed by a number of bytes of payload data (RBSP). The NALU header format is shown in Figure 2:
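The start-code mechanism can be sketched in a few lines of Python. This is a minimal illustration rather than a production parser: `split_annexb` is a hypothetical helper name, and the whole stream is assumed to fit in memory. Encoders often emit a 4-byte start code (0x00000001) as well, so the sketch strips trailing zero bytes from each unit.

```python
from typing import List


def split_annexb(stream: bytes) -> List[bytes]:
    """Return the NALUs found between 0x000001 start codes in a byte stream."""
    start_code = b"\x00\x00\x01"
    nalus = []
    pos = stream.find(start_code)
    while pos != -1:
        begin = pos + len(start_code)
        nxt = stream.find(start_code, begin)
        end = len(stream) if nxt == -1 else nxt
        # With a 4-byte start code, the extra leading zero shows up at the end
        # of the previous unit. The last byte of a valid NALU is never 0x00,
        # so trailing zeros can be stripped.
        nalu = stream[begin:end].rstrip(b"\x00")
        if nalu:
            nalus.append(nalu)
        pos = nxt
    return nalus
```

Each returned element begins with the one-byte NALU header, followed by the payload.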

F: forbidden_zero_bit, 1 bit. It is 0 in a valid stream; a value of 1 indicates a syntax violation. When the network detects a bit error in the unit, it can set this bit to 1 so that the receiver discards the unit.
NRI: nal_ref_idc, 2 bits, indicating the importance of the NALU. A larger value means the current NALU is more important. No specific meaning is prescribed for the individual values greater than 0.
Type: 5 bits, indicating the type of the NALU. The type values are listed in Table 1.
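The three header fields can be extracted with simple shifts and masks. The following Python sketch assumes the first byte of a NALU is passed in as an integer; `parse_nalu_header` is an illustrative name, not one used by the original article.

```python
from typing import Tuple


def parse_nalu_header(first_byte: int) -> Tuple[int, int, int]:
    """Split the one-byte NALU header into (F, NRI, Type)."""
    forbidden_zero_bit = (first_byte >> 7) & 0x01  # 1 bit
    nal_ref_idc = (first_byte >> 5) & 0x03         # 2 bits
    nal_unit_type = first_byte & 0x1F              # 5 bits
    return forbidden_zero_bit, nal_ref_idc, nal_unit_type
```

For instance, a NALU whose first byte is 0x67 decodes to F = 0, NRI = 3, Type = 7.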

In particular, NALUs with Type values of 7 and 8 are the sequence parameter set (SPS) and the picture parameter set (PPS), respectively. A parameter set is a group of rarely changing data that supplies decoding information for a large number of VCL NALUs. The sequence parameter set applies to a series of consecutive coded pictures, and the picture parameter set applies to one or more individual pictures within the coded video sequence. If the decoder does not receive these two parameter sets correctly, the other NALUs cannot be decoded. They are therefore typically sent before any other NALUs, and may be carried on a separate channel, over a more reliable transport protocol such as TCP, or simply retransmitted repeatedly.

3.2 Transmission mechanism for H.264 video
The RTP protocol and the H.264 elementary stream structure have been discussed separately, so how can RTP be used to carry H.264 video? One effective way is to extract each NALU from the H.264 video, prepend the corresponding RTP header, and send the resulting packet of RTP header plus NALU data. The RTP header and the NALU are each elaborated below.
The full RTP fixed header format is shown in Figure 1 above. Following RFC 3984 [3], the individual fields are set as follows:
V: Version number, 2 bits. The RTP version currently in use is 2, so this field is set to binary 10.
P: Padding bit, 1 bit. Padding is only needed in special cases such as certain encryption algorithms, which are not used here, so this bit is set to 0.
X: Extension bit, 1 bit. The fixed header is not followed by a header extension here, so this bit is also 0.
CC: CSRC count, 4 bits. It gives the number of CSRC identifiers that follow the fixed header. The basic streaming media server implemented in this article does not use them, so this field is also set to 0x0.
M: Marker bit, 1 bit. M is set to 1 on the last RTP packet of an access unit: that is, when the current NALU is the last NALU of the access unit or, if that NALU is fragmented (NALU fragmentation is described later), on its last fragment. In all other cases M remains 0.
PT: Payload type, 7 bits. No default PT value is specified for the H.264 video format, so a value greater than 95 (the dynamic range, 96 to 127) can be chosen. Here it is set to 0x60 (decimal 96).
SQ: Sequence number, 16 bits. The initial value should be random; here it is simply set to 0. It increases by 1 for each RTP packet sent.
TS: Timestamp, 32 bits. As with the sequence number, the initial value should be random; here it is set to 0. According to RFC 3984, the clock rate for the timestamp must be 90000 Hz.
SSRC: Synchronization source identifier, 32 bits. The SSRC should be generated randomly so that no two synchronization sources in the same RTP session share an identifier. There is only one synchronization source here, so it is set to 0x12345678.
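The field values listed above can be packed into the 12-byte fixed header as in the following Python sketch. The constants mirror the values chosen in this article (version 2, payload type 96, SSRC 0x12345678, a 90000 Hz clock). The function names and the `fps` parameter are assumptions made for illustration; the article does not state a frame rate.

```python
import struct

RTP_VERSION = 2        # binary 10
PAYLOAD_TYPE = 96      # dynamic payload type chosen in this article
SSRC = 0x12345678      # fixed identifier used in this article
CLOCK_RATE = 90000     # Hz, the timestamp clock for H.264 over RTP


def build_rtp_header(seq: int, timestamp: int, marker: bool) -> bytes:
    """Pack the 12-byte fixed header with P=0, X=0 and CC=0."""
    byte0 = RTP_VERSION << 6                    # V=2, P=0, X=0, CC=0 gives 0x80
    byte1 = (int(marker) << 7) | PAYLOAD_TYPE   # M bit followed by the 7-bit PT
    # Sequence number and timestamp wrap around at 16 and 32 bits.
    return struct.pack("!BBHII", byte0, byte1,
                       seq & 0xFFFF, timestamp & 0xFFFFFFFF, SSRC)


def timestamp_step(fps: int) -> int:
    """Timestamp increment between consecutive frames at a fixed frame rate."""
    return CLOCK_RATE // fps
```

At an assumed 25 frames per second, `timestamp_step(25)` gives 3600 ticks per frame. All RTP packets that carry parts of the same frame share one timestamp, while the sequence number advances with every packet.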
NALUs vary in size depending on the amount of data they contain. In an IP network, fragmentation occurs when an IP packet to be transmitted exceeds the Maximum Transmission Unit (MTU). In an Ethernet environment the MTU is 1500 bytes. If IP packets larger than the MTU are sent, they are split up for transmission, which produces many fragments, raises the packet loss rate, and reduces network throughput. For video, if an RTP packet is larger than the MTU and is split arbitrarily by the underlying protocol, playback at the receiver may be delayed or fail altogether. NALUs larger than the MTU must therefore be fragmented before sending. RFC 3984 defines three RTP packetization schemes:
(1) Single NALU Packet: exactly one NALU is encapsulated in one RTP packet. This article uses this scheme for NALUs smaller than 1400 bytes.
(2) Aggregation Packet: multiple NALUs are encapsulated in one RTP packet. This scheme can be used for small NALUs to improve transmission efficiency.
(3) Fragmentation Unit: one NALU is split across multiple RTP packets. This article uses this scheme to fragment NALUs larger than 1400 bytes.

4. Implementation of the H.264 streaming media transmission system
A complete streaming media system consists of two parts, the server and the client [5][6]. The server's main task is to read the H.264 video, separate each NALU from the stream, determine its type, set the corresponding RTP header, and encapsulate and send the RTP packet. The client's main task is to receive RTP packets, parse the NALU out of each one, and pass it to the decoder for decoding and playback. The framework of the streaming media transmission system is shown in Figure 3.
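The server-side packetization step can be sketched as follows in Python. This is a sketch under stated assumptions, not the article's actual implementation. The 1400-byte threshold comes from the text above. The FU-A layout is taken from RFC 3984 rather than from this article: a FU indicator byte with Type = 28, then a FU header byte carrying the start and end flags and the original NALU type. The marker is set only on the final packet of an access unit. The names `packetize_nalu` and `last_in_access_unit` are hypothetical, aggregation packets are omitted, and the 12-byte RTP header would be prepended to each payload separately.

```python
from typing import List, Tuple

MAX_NALU_SIZE = 1400   # threshold used in this article
FU_A_TYPE = 28         # FU-A packet type defined in RFC 3984


def packetize_nalu(nalu: bytes, last_in_access_unit: bool) -> List[Tuple[bytes, bool]]:
    """Return (rtp_payload, marker) pairs for one NALU."""
    if not nalu:
        raise ValueError("empty NALU")
    if len(nalu) <= MAX_NALU_SIZE:
        # Single NALU packet: the NALU itself is the RTP payload.
        return [(nalu, last_in_access_unit)]

    header = nalu[0]
    fu_indicator = (header & 0xE0) | FU_A_TYPE   # keep F and NRI, Type = 28
    nal_type = header & 0x1F
    body = nalu[1:]                      # the original header byte is not repeated
    chunk_size = MAX_NALU_SIZE - 2       # room for FU indicator and FU header
    chunks = [body[i:i + chunk_size] for i in range(0, len(body), chunk_size)]

    packets = []
    for index, chunk in enumerate(chunks):
        start = index == 0
        end = index == len(chunks) - 1
        fu_header = (start << 7) | (end << 6) | nal_type   # S, E, R=0, Type
        payload = bytes([fu_indicator, fu_header]) + chunk
        packets.append((payload, end and last_in_access_unit))
    return packets
```

On the client side, a receiver would rebuild a fragmented NALU by restoring the header byte from the F and NRI bits of the FU indicator and the Type bits of the FU header, then appending the fragment bodies in sequence-number order.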
