Principles of H.264 RTP packetization

Source: Internet
Author: User
Tags: coding standards

1. Introduction

With the development of the information industry, demand for information resources has gradually shifted from text and images to audio and video, with growing emphasis on real-time, interactive access. Users, however, face an unavoidable frustration: to enjoy vivid, clear media on the network, they must first spend a long time waiting for the file to download. Streaming media technology emerged to resolve this conflict. Because of advantages such as low startup latency and savings in client storage space, streaming has gradually become the preferred approach, and streaming network applications are spreading worldwide. The Real-time Transport Protocol (RTP) specifies the standard packet format for carrying audio and video over the Internet. Used together with its companion control protocol, RTCP, it has become one of the most widely used protocols in streaming media technology.

H.264/AVC is a new-generation video coding standard jointly developed by the Joint Video Team (JVT), which was formed by the ITU-T Video Coding Experts Group (VCEG) and the ISO/IEC Moving Picture Experts Group (MPEG). Its biggest advantage is its high compression ratio: more than 2 times that of MPEG-2 and 1.5 to 2 times that of MPEG-4. In addition, its layered design, which separates the Video Coding Layer (VCL) from the Network Abstraction Layer (NAL), is well suited to real-time streaming. This article uses RTP to packetize and stream H.264 video. It implements a basic streaming media server and uses the open-source player VLC as the receiver, forming a complete H.264 video transmission system.

2. Key RTP protocol parameter settings

RTP is a protocol for real-time data transmission proposed by the IETF in 1996. It consists of two parts: the Real-time Transport Protocol (RTP) itself and the Real-time Transport Control Protocol (RTCP). RTP provides real-time delivery of continuous media data over multicast or unicast networks. RTCP is the control part of the protocol; it monitors the quality of data transmission in real time and provides congestion control and flow control for the system. RTP is described in detail in RFC 3550. Every RTP packet consists of two parts: a fixed header and a payload. The first 12 bytes of the header have a fixed meaning, while the payload can be audio or video data. The fixed RTP header format is shown in Figure 1. The key parameters are described as follows:

(1) Marker bit (M): 1 bit. Its meaning is generally defined by the specific media application profile, and it is used to mark important events in the RTP stream.

(2) Payload type (PT): 7 bits, used to indicate the specific RTP payload format. RFC 3551 defines default payload type values for common audio and video formats. For example, type 0 indicates that the RTP data packet carries PCMU (ITU-T G.711 mu-law) audio sampled at 8000 Hz with a single channel. (The earlier RFC 1890 assigned type 2 to ITU G.721 audio at 8000 Hz, single channel; RFC 3551 lists that value as reserved.)

(3) Sequence number: 16 bits. The sequence number increases by 1 for each RTP data packet sent. The receiver can use it to detect packet loss and restore packet order.

(4) Timestamp: 32 bits. The timestamp records the sampling instant of the first byte in the RTP data packet, expressed as an offset from the initial timestamp value. On the sending side, the sampling instant must be derived from a clock that increases monotonically and linearly in time.

As the RTP packet format shows, each packet carries the data type and format, the sequence number, the timestamp, and an indication of whether additional data follows. These fields provide the foundation for real-time streaming. The companion control protocol RTCP provides congestion control and flow control for RTP transmission. For the specific packet structure and the meaning of each field, see RFC 3550.
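
The field layout described above can be illustrated with a short parsing sketch. This is only a minimal example based on the 12-byte fixed header of RFC 3550; the names `RtpHeader` and `parse_rtp_header` are hypothetical and do not come from the original article.

```python
import struct
from typing import NamedTuple


class RtpHeader(NamedTuple):
    """Fields of the 12-byte fixed RTP header (illustrative container)."""
    version: int
    padding: int
    extension: int
    csrc_count: int
    marker: int
    payload_type: int
    sequence_number: int
    timestamp: int
    ssrc: int


def parse_rtp_header(packet: bytes) -> RtpHeader:
    """Decode the fixed RTP header found at the start of `packet`."""
    if len(packet) < 12:
        raise ValueError("an RTP packet needs at least 12 header bytes")
    # "!BBHII": network byte order, two single bytes,
    # one 16-bit field and two 32-bit fields.
    b0, b1, seq, ts, ssrc = struct.unpack("!BBHII", packet[:12])
    return RtpHeader(
        version=b0 >> 6,            # top 2 bits of byte 0
        padding=(b0 >> 5) & 0x1,
        extension=(b0 >> 4) & 0x1,
        csrc_count=b0 & 0x0F,       # low 4 bits of byte 0
        marker=b1 >> 7,             # top bit of byte 1
        payload_type=b1 & 0x7F,     # low 7 bits of byte 1
        sequence_number=seq,
        timestamp=ts,
        ssrc=ssrc,
    )
```

A receiver could call `parse_rtp_header(datagram)` on each UDP datagram and then treat `datagram[12:]` as the payload, provided that the CSRC count and extension bit are both 0.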

3. H.264 elementary stream structure and transmission mechanism

3.1 Structure of the H.264 elementary stream

The H.264 elementary stream (ES) is organized in two layers: the Video Coding Layer (VCL) and the Network Abstraction Layer (NAL). The VCL is responsible for representing the video content efficiently, while the NAL is responsible for packaging and transmitting the data in the manner the network requires. Introducing the NAL and separating it from the VCL has two benefits. First, signal processing is separated from network transmission, so the VCL and the NAL can be implemented on different processing platforms. Second, the separation removes the need for a gateway to reconstruct and re-encode the VCL bit stream when moving between different network environments.

An H.264 elementary stream consists of a series of NALUs (Network Abstraction Layer Units), and different NALUs carry different amounts of data. The H.264 draft [2] states that when the stream is stored on media, a start code 0x000001 is added before each NALU to mark the boundaries between NALUs. Under this mechanism, the decoder searches the bit stream for the start code, which marks the beginning of a NALU; the current NALU ends when the next start code is detected. Each NALU consists of a one-byte NALU header and a number of bytes of payload data (the RBSP). The format of the NALU header is shown in Figure 2:

F: forbidden_zero_bit, 1 bit. It is set to 1 if there is a syntax violation. When the network detects a bit error in the unit, it can set this bit to 1 so that the receiver can discard the unit.

NRI: nal_ref_idc, 2 bits. Indicates the importance of the NALU; the greater the value, the more important the NALU. No further meaning is specified for the individual values greater than 0.

Type: 5 bits. Indicates the NALU type; see Table 1.
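
As a minimal sketch of the header layout just described, the three fields can be extracted from the first byte of a NALU with shifts and masks. The function name `parse_nalu_header` is hypothetical, and `NAL_TYPE_NAMES` is only a partial list of common type values from the H.264 specification, not a reproduction of Table 1.

```python
from typing import Tuple

# Partial, illustrative list of nal_unit_type values (not the full table).
NAL_TYPE_NAMES = {
    1: "coded slice of a non-IDR picture",
    5: "coded slice of an IDR picture",
    6: "supplemental enhancement information (SEI)",
    7: "sequence parameter set (SPS)",
    8: "picture parameter set (PPS)",
}


def parse_nalu_header(first_byte: int) -> Tuple[int, int, int]:
    """Split the one-byte NALU header into (F, NRI, Type)."""
    f = first_byte >> 7              # forbidden_zero_bit, 1 bit
    nri = (first_byte >> 5) & 0x3    # nal_ref_idc, 2 bits
    nal_type = first_byte & 0x1F     # nal_unit_type, 5 bits
    return f, nri, nal_type
```

For instance, a NALU whose first byte is 0x67 yields F = 0, NRI = 3 and Type = 7.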

Note that NALUs of type 7 and 8 are the sequence parameter set (SPS) and the picture parameter set (PPS), respectively. A parameter set is a group of data that seldom changes and supplies decoding information for a large number of VCL NALUs. The sequence parameter set applies to a series of consecutive coded pictures, while the picture parameter set applies to one or more individual pictures in the coded video sequence. If the decoder fails to receive these two parameter sets correctly, the other NALUs cannot be decoded. They are therefore generally sent before the other NALUs, and can also be carried over a separate channel or a more reliable transport protocol (such as TCP).

3.2 Transmission mechanism for H.264 video

The RTP protocol and the H.264 elementary stream structure were discussed above. How, then, can RTP be used to transmit H.264 video? An effective method is to extract each NALU from the H.264 stream, prepend the corresponding RTP header to it, and send the resulting packet of RTP header plus NALU. The following describes the RTP header and the NALU in turn.

The complete fixed RTP header format was shown in Figure 1 above. Following RFC 3984 [3], the specific setting of each field is given here.

V: version number, 2 bits. The current RTP version is 2, so this field is set to 2 (binary 10).

P: padding bit, 1 bit. No special encryption algorithm is currently used, so this bit is set to 0.

X: extension bit, 1 bit. No header extension follows the fixed header, so this bit is also 0.

CC: CSRC count, 4 bits. Indicates the number of CSRC identifiers that follow the fixed header. The basic streaming media server implemented in this article does not use a mixer, so this field is set to 0x0.

M: marker bit, 1 bit. If the current NALU is the last NALU of an access unit, the M bit is set to 1; it is also set to 1 when the current RTP packet carries the last fragment of a NALU (NALU fragmentation is described later). In all other cases, the M bit remains 0.

PT: payload type, 7 bits. No default PT value is currently assigned to the H.264 video format, so a value greater than 95 can be chosen. Here it is set to 0x60 (96 in decimal).

SQ: sequence number, 16 bits. The initial value should be random; here it is set to 0. The sequence number is incremented by 1 for each RTP packet sent.

TS: timestamp, 32 bits. Like the sequence number, the initial timestamp should be a random value; here it is set to 0. According to RFC 3984, the clock frequency for the timestamp must be 90000 Hz.

SSRC: synchronization source identifier, 32 bits. SSRC values should be randomly generated so that no two synchronization sources in the same RTP session share an identifier. Since there is only one synchronization source here, it is set to 0x12345678.
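
The settings listed above can be combined into a small header-building sketch. It assumes the values chosen in this article (V = 2, P = 0, X = 0, CC = 0, PT = 96, SSRC = 0x12345678); the function names `build_rtp_header` and `timestamp_for_frame`, and the idea of deriving the timestamp from a frame index and a constant frame rate, are illustrative assumptions rather than the author's implementation.

```python
import struct

RTP_VERSION = 2            # V
H264_PAYLOAD_TYPE = 96     # PT, a dynamic payload type (0x60)
H264_CLOCK_RATE = 90000    # timestamp clock in Hz, per RFC 3984
SSRC = 0x12345678          # fixed here because there is only one source


def build_rtp_header(seq: int, timestamp: int, marker: bool) -> bytes:
    """Pack the 12-byte fixed RTP header with P = 0, X = 0 and CC = 0."""
    byte0 = RTP_VERSION << 6                       # V=2, P=0, X=0, CC=0
    byte1 = (int(marker) << 7) | H264_PAYLOAD_TYPE
    return struct.pack(
        "!BBHII",
        byte0,
        byte1,
        seq & 0xFFFF,             # 16-bit sequence number wraps around
        timestamp & 0xFFFFFFFF,   # 32-bit timestamp wraps around
        SSRC,
    )


def timestamp_for_frame(frame_index: int, fps: int) -> int:
    """Timestamp of a frame, assuming a constant frame rate (hypothetical helper)."""
    return (frame_index * H264_CLOCK_RATE // fps) & 0xFFFFFFFF
```

With a 90000 Hz clock and 25 frames per second, consecutive frames are 3600 timestamp units apart, and all packets that belong to the same frame share one timestamp.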

The size of each NALU varies with the amount of data it contains. In an IP network, when an IP packet exceeds the maximum transmission unit (MTU), IP fragmentation occurs. The MTU on Ethernet is 1500 bytes. If an IP packet is larger than the MTU, it is split for transmission, which produces many packet fragments, raises the packet loss rate, and reduces network throughput. For video transmission, if an RTP packet is larger than the MTU and is split arbitrarily by the underlying protocol, the receiving player may experience delayed or even abnormal playback. Large NALUs must therefore be split before transmission. RFC 3984 [3] provides three different RTP packetization schemes:

(1) Single NALU packet: only one NALU is encapsulated in an RTP packet. This article uses this scheme for NALUs smaller than 1400 bytes.

(2) Aggregation packet: multiple NALUs are encapsulated in one RTP packet. This suits small NALUs and improves transmission efficiency.

(3) Fragmentation unit: one NALU is split across multiple RTP packets. This article uses this scheme for NALUs larger than 1400 bytes.
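
The choice between schemes (1) and (3) can be sketched as follows. The article does not say which fragmentation variant it uses, so this example assumes the FU-A format of RFC 3984 (type 28, with a two-byte FU indicator and FU header replacing the original NALU header). The name `packetize_nalu`, and the decision to send a NALU of exactly 1400 bytes as a single packet, are assumptions made for illustration; aggregation packets are not covered.

```python
from typing import List, Tuple

MAX_PAYLOAD = 1400   # size threshold used in this article
FU_A_TYPE = 28       # NAL unit type value for FU-A in RFC 3984


def packetize_nalu(nalu: bytes, max_payload: int = MAX_PAYLOAD) -> List[Tuple[bytes, bool]]:
    """Return (rtp_payload, is_last_fragment) pairs for one NALU without start code."""
    if not nalu:
        raise ValueError("empty NALU")
    if max_payload <= 2:
        raise ValueError("max_payload must leave room for the two FU bytes")
    if len(nalu) <= max_payload:
        # Single NALU packet: the NALU itself is the RTP payload.
        return [(nalu, True)]

    header = nalu[0]
    fu_indicator = (header & 0xE0) | FU_A_TYPE   # keep F and NRI, type = 28
    nal_type = header & 0x1F                     # original type goes in the FU header
    body = nalu[1:]                              # the NALU header byte is not repeated
    chunk = max_payload - 2                      # room left after the two FU bytes

    payloads = []
    for offset in range(0, len(body), chunk):
        piece = body[offset:offset + chunk]
        start = 0x80 if offset == 0 else 0x00                   # S bit
        end = 0x40 if offset + chunk >= len(body) else 0x00     # E bit
        fu_header = start | end | nal_type
        payloads.append((bytes([fu_indicator, fu_header]) + piece, bool(end)))
    return payloads
```

Each returned payload would be appended to a fixed RTP header whose sequence number increases by 1 per packet; the boolean flag marks the final fragment, which is where the M bit rule described earlier applies.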

4. Implementation of the H.264 streaming media transmission system

A complete streaming media transmission system consists of two parts: a server and a client [5] [6]. The server's main tasks are to read the H.264 video, separate each NALU from the bit stream, analyze the NALU type, set the corresponding RTP header, and encapsulate and send the RTP data packets. The client's main tasks are to receive the RTP data packets, extract the NALUs from them, and pass them to the decoder for decoding and playback. The framework of the streaming media transmission system is shown in Figure 3.
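
The server's first step, separating NALUs from the stored bit stream, can be sketched as below. This assumes the video file uses the byte-stream format with 0x000001 start codes (optionally preceded by an extra zero byte); the function name `split_annexb` is hypothetical and is not taken from the original article.

```python
from typing import List

START_CODE = b"\x00\x00\x01"


def split_annexb(stream: bytes) -> List[bytes]:
    """Split a start-code-delimited H.264 byte stream into NALUs (start codes removed)."""
    nalus = []
    pos = stream.find(START_CODE)
    while pos != -1:
        begin = pos + len(START_CODE)
        nxt = stream.find(START_CODE, begin)
        end = nxt if nxt != -1 else len(stream)
        # A four-byte start code (00 00 00 01) leaves one zero byte behind the
        # previous NALU. A NALU never ends in 0x00, so trailing zeros are dropped.
        nalu = stream[begin:end].rstrip(b"\x00")
        if nalu:
            nalus.append(nalu)
        pos = nxt
    return nalus
```

The server could then inspect `nalu[0] & 0x1F` to obtain each NALU type before choosing a packetization scheme and building the RTP header.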



Address: http://blog.sina.com.cn/s/blog_6fe1bc2e0100s5x2.html
