H.264 Video Transmission System Based on the RTP Protocol


1. Introduction
With the development of the information industry, demand for information resources has gradually shifted from text and images to audio and video, with growing emphasis on real-time, interactive access. Users, however, face an unavoidable frustration: to see vivid, clear media presentations over the network, they have to spend a long time waiting for files to transfer. Streaming media technology emerged to resolve this conflict. Because of advantages such as low startup latency and savings in client storage space, streaming has gradually become the preferred approach, and streaming media network applications are growing worldwide. The Real-time Transport Protocol (RTP) specifies the standard packet format for transmitting audio and video over the Internet. It is used together with the RTP Control Protocol (RTCP) and has become one of the most widely used protocols in streaming media technology.

H.264/AVC is a new-generation video coding standard jointly developed by the Joint Video Team (JVT), formed by the ITU-T Video Coding Experts Group (VCEG) and the ISO/IEC Moving Picture Experts Group (MPEG). Its biggest advantage is its high compression ratio: the compression ratio of H.264 is more than 2 times that of MPEG-2 and 1.5 to 2 times that of MPEG-4. In addition, its layered design, which separates the Video Coding Layer (VCL) from the Network Abstraction Layer (NAL), is well suited to real-time streaming transmission. This article packetizes and transmits H.264 video streams over RTP, implements the basic functions of a streaming media server, and uses the open-source player VLC as the receiver, forming a complete H.264 video transmission system.

2. Settings of Key RTP Parameters
RTP is a protocol proposed by the IETF in 1996 for real-time data transmission. It consists of two parts: the Real-time Transport Protocol (RTP) itself and the RTP Control Protocol (RTCP). RTP provides real-time transmission of continuous media data over multicast or unicast networks. RTCP is the control part of RTP; it monitors the quality of data transmission in real time and provides congestion control and flow control for the system. RTP is described in detail in RFC 3550. Each RTP packet consists of a fixed header and a payload. The meaning of the first 12 bytes of the header is fixed, and the payload can be audio or video data. The fixed RTP header format is shown in Figure 1:
  

The key parameter settings are described as follows:
(1) Marker bit (M): 1 bit. The meaning of this bit is generally defined by the specific media application profile and is used to mark important events in the RTP stream.
(2) Payload type (PT): 7 bits, identifying the specific RTP payload format. RFC 3551 defines default payload type values for common audio and video formats. For example, type 0 indicates that the RTP packet carries voice data encoded with ITU-T G.711 mu-law (PCMU), sampled at 8000 Hz, single channel.
(3) Sequence number: 16 bits. The sequence number is incremented by 1 for each RTP packet sent. The receiver can use it to detect packet loss and to restore packet order.
(4) Timestamp: 32 bits. The timestamp records the sampling instant of the first byte in the RTP packet, and thus reflects the offset of each RTP packet from the initial timestamp value. At the RTP sender, the sampling instant must be derived from a clock that increases monotonically and linearly.
As the RTP packet format shows, the header carries the media payload type and format, the sequence number, the timestamp, and whether additional data is present. These fields provide the foundation for real-time streaming media transmission. RTCP provides congestion control and flow control for RTP transmission. For the specific packet structure and the meaning of each field, see RFC 3550.
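The field layout described above can be made concrete with a short decoding sketch. It is a minimal illustration that assumes the RFC 3550 layout of the 12-byte fixed header; the names `RtpHeader`, `parse_rtp_header`, and `seq_gap` are hypothetical and not part of the original system.

```python
import struct
from typing import NamedTuple


class RtpHeader(NamedTuple):
    """Fields of the 12-byte fixed RTP header (illustrative container)."""
    version: int
    padding: int
    extension: int
    csrc_count: int
    marker: int
    payload_type: int
    sequence_number: int
    timestamp: int
    ssrc: int


def parse_rtp_header(packet: bytes) -> RtpHeader:
    """Decode the fixed RTP header at the start of a packet."""
    if len(packet) < 12:
        raise ValueError("an RTP packet needs at least 12 header bytes")
    # Network byte order: two single bytes, one 16-bit and two 32-bit integers.
    b0, b1, seq, ts, ssrc = struct.unpack("!BBHII", packet[:12])
    return RtpHeader(
        version=b0 >> 6,
        padding=(b0 >> 5) & 0x1,
        extension=(b0 >> 4) & 0x1,
        csrc_count=b0 & 0x0F,
        marker=b1 >> 7,
        payload_type=b1 & 0x7F,
        sequence_number=seq,
        timestamp=ts,
        ssrc=ssrc,
    )


def seq_gap(prev_seq: int, seq: int) -> int:
    """Number of packets missing between two received sequence numbers.

    The subtraction is done modulo 2**16 so that the wrap from 65535 to 0
    is not reported as a loss.
    """
    return (seq - prev_seq - 1) & 0xFFFF
```

A receiver could call `seq_gap` on consecutive packets to detect loss; reordering would need additional buffering that this sketch leaves out.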

3. H.264 Elementary Stream Structure and Transmission Mechanism
3.1 H.264 Elementary Stream Structure
The elementary stream (ES) structure of H.264 is divided into two layers: the Video Coding Layer (VCL) and the Network Abstraction Layer (NAL). The VCL is responsible for representing video content efficiently, while the NAL is responsible for packaging and transmitting the data in the manner the network requires.
The benefits of introducing the NAL and separating it from the VCL include:
1. Signal processing is decoupled from network transmission, so the VCL and NAL can be implemented on different processing platforms.
2. With the VCL and NAL separated, gateways do not need to reconstruct and re-encode the VCL bit stream for different network environments.
The H.264 elementary stream is composed of a series of NALUs (Network Abstraction Layer units), which differ in size. The H.264 draft [2] specifies that when a data stream is stored on a medium, the start code 0x000001 is added before each NALU to mark where the NALU begins and ends. Under this mechanism, the decoder scans the bit stream for start codes: a detected start code marks the beginning of a NALU, and the next detected start code marks the end of the current NALU. Each NALU consists of a one-byte NALU header and several bytes of payload data (RBSP). The NALU header format is shown in Figure 2:

F: forbidden_zero_bit, 1 bit. A value of 1 indicates a syntax violation. When the network detects a bit error in the unit, it can set this bit to 1 so that the receiver discards the unit.
NRI: nal_ref_idc, 2 bits, indicating the importance of the NALU. The greater the value, the more important the current NALU is. No more specific rule is given for values greater than 0.
Type: 5 bits, indicating the NALU type. See Table 1:
Note that NALUs with Type values of 7 and 8 are the sequence parameter set (SPS) and the picture parameter set (PPS), respectively. A parameter set is a group of data that seldom changes and provides decoding information for a large number of VCL NALUs. The sequence parameter set applies to a series of consecutive coded pictures, while the picture parameter set applies to one or more individual pictures within a coded video sequence. If the decoder does not receive these two parameter sets correctly, the other NALUs cannot be decoded. Therefore, they are generally sent before the other NALUs. They can also be transmitted over a separate channel or a more reliable transport protocol (such as TCP), or transmitted repeatedly.
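The three header fields can be extracted from the first byte of a NALU with simple shifts and masks. The following is a minimal sketch; the function names `parse_nalu_header` and `is_parameter_set` are hypothetical helpers introduced only for illustration.

```python
from typing import Tuple

NAL_TYPE_SPS = 7  # sequence parameter set
NAL_TYPE_PPS = 8  # picture parameter set


def parse_nalu_header(first_byte: int) -> Tuple[int, int, int]:
    """Split the one-byte NALU header into (F, NRI, Type)."""
    forbidden_zero_bit = (first_byte >> 7) & 0x01  # top bit
    nal_ref_idc = (first_byte >> 5) & 0x03         # next two bits
    nal_unit_type = first_byte & 0x1F              # low five bits
    return forbidden_zero_bit, nal_ref_idc, nal_unit_type


def is_parameter_set(first_byte: int) -> bool:
    """True for SPS and PPS, which should be sent before other NALUs."""
    return parse_nalu_header(first_byte)[2] in (NAL_TYPE_SPS, NAL_TYPE_PPS)
```

For instance, a header byte of 0x67 decodes to F = 0, NRI = 3, Type = 7, which identifies a sequence parameter set.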
3.2 Transmission Mechanism Suitable for H.264 Video
The previous sections covered the structure of the RTP protocol and of the H.264 elementary stream. How, then, can RTP be used to transmit H.264 video? An effective method is to extract each NALU from the H.264 video, prepend the corresponding RTP header to it, and then send the packet containing the RTP header and the NALU. The RTP header and the NALU are described in turn below.
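Extracting each NALU from a stored H.264 stream amounts to scanning for start codes. The sketch below is one possible way to do this, assuming the stream uses 0x000001 start codes (optionally preceded by an extra zero byte); `split_nalus` is a hypothetical name, and the whole stream is held in memory for simplicity.

```python
from typing import List

START_CODE = b"\x00\x00\x01"


def split_nalus(stream: bytes) -> List[bytes]:
    """Return the NALUs found between start codes, header byte first."""
    nalus = []
    start = stream.find(START_CODE)
    while start != -1:
        payload_begin = start + len(START_CODE)
        next_start = stream.find(START_CODE, payload_begin)
        end = len(stream) if next_start == -1 else next_start
        # A four-byte start code (0x00000001) leaves a zero byte in front of
        # the next start code, so trailing zero bytes are stripped here.
        nalu = stream[payload_begin:end].rstrip(b"\x00")
        if nalu:
            nalus.append(nalu)
        start = next_start
    return nalus
```

A server handling large files would more likely read and scan the file in chunks, but the start-code logic stays the same.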
The complete format of the RTP fixed header was shown in Figure 1 above. Following RFC 3984 [3], the specific setting of each field is given here.
V: version number, 2 bits. The current RTP version number is 2, so this field is set to binary 10.
P: padding bit, 1 bit. No special encryption algorithm is used here, so this bit is set to 0.
X: extension bit, 1 bit. The fixed header is not followed by a header extension, so this bit is also 0.
CC: CSRC count, 4 bits. It indicates the number of CSRC identifiers that follow the RTP fixed header. The basic streaming media server implemented in this article does not use a mixer, so this field is set to 0x0.
M: marker bit, 1 bit. If the current NALU is the last NALU of an access unit, the M bit is set to 1; it is also set to 1 when the current RTP packet carries the last fragment of a NALU (NALU fragmentation is described later). In all other cases the M bit remains 0.
PT: payload type, 7 bits. No default PT value is currently assigned to the H.264 video format, so a value greater than 95 can be chosen. Here it is set to 0x60 (96 in decimal).
SN: sequence number, 16 bits. The initial value of the sequence number should be random; here it is set to 0. The sequence number is incremented by 1 for each RTP packet sent.
TS: timestamp, 32 bits. As with the sequence number, the initial value of the timestamp should be random; here it is set to 0. According to RFC 3984, the clock frequency for the timestamp must be 90000 Hz.
SSRC: synchronization source identifier, 32 bits. The SSRC should be generated randomly so that no two synchronization sources in the same RTP session share the same SSRC identifier. Since there is only one synchronization source here, it is set to 0x12345678.
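With the settings listed above, the 12-byte header can be assembled as follows. This is a minimal sketch rather than the article's actual implementation: the function names are hypothetical, and the timestamp step assumes that every packet of a frame shares one timestamp and that the frame rate divides the 90000 Hz clock evenly.

```python
import struct

RTP_VERSION = 2           # binary 10
PAYLOAD_TYPE_H264 = 96    # 0x60, dynamic payload type chosen in the text
SSRC = 0x12345678         # fixed identifier used in the text
CLOCK_RATE = 90000        # Hz, timestamp clock for H.264 video


def build_rtp_header(seq: int, timestamp: int, marker: bool) -> bytes:
    """Pack the fixed RTP header with P=0, X=0 and CC=0."""
    byte0 = RTP_VERSION << 6                       # V in the top two bits
    byte1 = (int(marker) << 7) | PAYLOAD_TYPE_H264
    # Mask the counters so they wrap around at 16 and 32 bits.
    return struct.pack("!BBHII", byte0, byte1,
                       seq & 0xFFFF, timestamp & 0xFFFFFFFF, SSRC)


def timestamp_step(frame_rate: int) -> int:
    """Timestamp increment between consecutive frames."""
    return CLOCK_RATE // frame_rate
```

At 30 frames per second the timestamp advances by 3000 ticks per frame, while the sequence number advances by 1 for every packet regardless of which frame it belongs to.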
The size of each NALU varies with the amount of data it contains. In an IP network, IP fragmentation occurs when the IP packet to be transmitted exceeds the maximum transmission unit (MTU). The MTU on Ethernet is 1500 bytes. If an IP packet is larger than the MTU, it is split for transmission, which produces many packet fragments, increases the packet loss rate, and reduces network throughput. For video transmission, if an RTP packet is larger than the MTU and is split arbitrarily by the underlying protocol, the receiving player may experience delayed or even abnormal playback. Therefore, large NALUs must be split before they are sent.
RFC 3984 provides three different RTP packetization schemes:
(1) Single NALU packet: only one NALU is encapsulated in an RTP packet. In this article, this scheme is used for NALUs smaller than 1400 bytes.
(2) Aggregation packet: multiple NALUs are encapsulated in one RTP packet. This suits small NALUs and improves transmission efficiency.
(3) Fragmentation unit: one NALU is split across multiple RTP packets. In this article, this scheme is used for NALUs larger than 1400 bytes.
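The choice between schemes (1) and (3) can be sketched as below. The text does not say which fragmentation unit variant is used, so this sketch assumes FU-A (NAL type 28) from RFC 3984; it also assumes that a NALU of exactly 1400 bytes is sent as a single packet and that the 1400-byte limit applies to the RTP payload including the two FU bytes. `packetize_nalu` is a hypothetical name.

```python
from typing import List

MAX_PAYLOAD = 1400  # threshold used in the text
FU_A = 28           # NAL type of an FU-A fragmentation unit


def packetize_nalu(nalu: bytes, max_payload: int = MAX_PAYLOAD) -> List[bytes]:
    """Return the RTP payloads (without RTP header) for one NALU."""
    if max_payload < 3:
        raise ValueError("max_payload must leave room for the two FU bytes")
    if len(nalu) <= max_payload:
        return [nalu]                       # single NALU packet
    header = nalu[0]
    fu_indicator = (header & 0xE0) | FU_A   # keep F and NRI, type becomes 28
    nal_type = header & 0x1F                # original type goes in the FU header
    body = nalu[1:]                         # the NALU header byte is not repeated
    chunk = max_payload - 2                 # room left after the two FU bytes
    payloads = []
    for offset in range(0, len(body), chunk):
        start_bit = 0x80 if offset == 0 else 0x00
        end_bit = 0x40 if offset + chunk >= len(body) else 0x00
        fu_header = start_bit | end_bit | nal_type
        payloads.append(bytes([fu_indicator, fu_header]) + body[offset:offset + chunk])
    return payloads
```

Each returned payload would then be prefixed with its own RTP header, with the sequence number incremented per packet and the timestamp kept constant across the fragments of one NALU.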
4. Implementation of the H.264 Streaming Media Transmission System
A complete streaming media transmission system consists of two parts: a server and a client [5] [6]. The main tasks of the server are to read the H.264 video, separate each NALU from the bit stream, analyze the NALU type, set the corresponding RTP header, and encapsulate and send the RTP packets. The main tasks of the client are to receive the RTP packets, extract the NALUs from them, and pass them to the decoder for decoding and playback. The framework of the streaming media transmission system is shown in Figure 3.
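In the system described here the client role is filled by VLC, but the client-side step of recovering NALUs from RTP payloads can still be illustrated. The sketch below is a simplified illustration, not VLC's implementation: it assumes payloads arrive in order without loss, with the 12-byte RTP header already removed, and that only single NALU packets and FU-A fragments (type 28) occur. `depacketize` is a hypothetical name.

```python
from typing import Iterable, Iterator

FU_A = 28  # NAL type of an FU-A fragmentation unit


def depacketize(payloads: Iterable[bytes]) -> Iterator[bytes]:
    """Yield complete NALUs from in-order RTP payloads."""
    fragments = bytearray()
    for payload in payloads:
        if payload[0] & 0x1F != FU_A:
            yield bytes(payload)            # single NALU packet, pass through
            continue
        fu_indicator, fu_header = payload[0], payload[1]
        if fu_header & 0x80:                # S bit: first fragment of a NALU
            # Rebuild the original NALU header from F/NRI and the saved type.
            fragments = bytearray([(fu_indicator & 0xE0) | (fu_header & 0x1F)])
        fragments += payload[2:]
        if fu_header & 0x40:                # E bit: last fragment of the NALU
            yield bytes(fragments)
            fragments = bytearray()
```

Before handing the NALUs to a decoder that expects a byte stream, a start code would typically be written in front of each one.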

5. Conclusion
The streaming media transmission system designed in this article runs on Windows XP and uses the VLC player as the client to receive the RTP packets carrying H.264 video. In testing, with the transmission rate set to 30 frames per second, the client plays smoothly after two seconds of buffering, with no packet loss or ghosting. The subjective quality of the video is good, with no obvious difference from playing the H.264 video directly.
AnyChat uses the internationally leading video coding standard H.264 (MPEG-4 Part 10, AVC). H.264/AVC performs particularly well in compression efficiency, generally reaching about 2 times the compression efficiency of MPEG-2 and MPEG-4 Simple Profile. H.264 has many new features that distinguish it from earlier standards, and these work together to improve coding efficiency. Intra-frame prediction and coding, inter-frame prediction and coding, variable block sizes, quarter-pixel motion estimation, multiple-reference-frame prediction, the adaptive in-loop deblocking filter, the integer transform, quantization and transform coefficient scanning, entropy coding, and weighted prediction each have their own distinctive design.

