RTP protocol explained in full (H.264 bitstream and PS stream)

Source: Internet
Author: User

RTP h264 PS MPEG-2

Preface: When I set out to analyze RTP, I found a lot of information on the Internet, but none of it was complete, so I tried to write a comprehensive analysis.

I referred to many articles, which are listed at the end of this article, and I would like to thank their authors.

The development of the Internet depends on people sharing selflessly. I decided to start with myself, and I hope everyone will support this.


Original work takes effort. If you repost this article, please include the link. Thank you. http://blog.csdn.net/chen495810242/article/details/39207305


1. RTP Header Parsing

Figure 1

1) V: the version number of the RTP protocol, 2 bits. The current version number is 2.

2) P: padding flag, 1 bit. If P = 1, the packet ends with one or more additional padding octets that are not part of the payload.

3) X: extension flag, 1 bit. If X = 1, an extension header follows the fixed RTP header.

4) CC: CSRC count, 4 bits. It gives the number of CSRC identifiers that follow the fixed header.

5) M: marker, 1 bit. Its meaning depends on the payload. For video, it marks the end of a frame; for audio, it marks the start of a session (talkspurt).

6) PT: payload type, 7 bits. It identifies the type of payload carried in the RTP packet, such as GSM audio or JPEG images. In streaming media it is mostly used to distinguish audio streams from video streams, which makes parsing easier for the client.

7) Sequence number: 16 bits. It identifies the RTP packets sent by the sender and increases by 1 with each packet. When the underlying transport is UDP and network conditions are poor, the receiver can use this field to detect packet loss, and when the network jitters it can use it to reorder the data. The initial value is random, and audio packets and video packets are counted separately.

8) Timestamp: 32 bits. Video payloads such as the ones discussed here use a 90 kHz clock; audio payloads use their own clock rate. The timestamp reflects the sampling instant of the first octet of the RTP packet payload. The receiver uses timestamps to calculate delay and delay jitter and to perform synchronization control.

9) Synchronization source (SSRC) identifier: 32 bits, identifies the synchronization source. The identifier is chosen randomly. Two synchronization sources participating in the same video conference must not have the same SSRC.

10) Contributing source (CSRC) identifiers: each CSRC identifier occupies 32 bits, and there can be 0 to 15 of them. Each CSRC identifies one contributing source of the payload contained in the RTP packet.

Note: The basic RTP specification does not define any header extension itself. If X = 1 is encountered, special processing is required.
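The field layout above can be turned into a small parser. The following is a minimal sketch, not a complete RTP implementation: the struct and function names (rtp_header, rtp_parse_header) are made up for illustration, and a header extension is simply skipped by its length rather than interpreted.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical container for the fixed RTP header fields (names are ours). */
typedef struct {
    uint8_t  version, padding, extension, csrc_count;
    uint8_t  marker, payload_type;
    uint16_t sequence;
    uint32_t timestamp, ssrc;
    size_t   payload_offset;   /* first payload byte, after CSRCs and extension */
} rtp_header;

/* Returns 0 on success, -1 if the buffer is too short or the version is not 2. */
int rtp_parse_header(const uint8_t *buf, size_t len, rtp_header *h)
{
    if (len < 12) return -1;
    h->version      = buf[0] >> 6;
    h->padding      = (buf[0] >> 5) & 1;
    h->extension    = (buf[0] >> 4) & 1;
    h->csrc_count   = buf[0] & 0x0F;
    h->marker       = buf[1] >> 7;
    h->payload_type = buf[1] & 0x7F;
    h->sequence     = (uint16_t)((buf[2] << 8) | buf[3]);
    h->timestamp    = ((uint32_t)buf[4] << 24) | ((uint32_t)buf[5] << 16) |
                      ((uint32_t)buf[6] << 8)  |  (uint32_t)buf[7];
    h->ssrc         = ((uint32_t)buf[8] << 24) | ((uint32_t)buf[9] << 16) |
                      ((uint32_t)buf[10] << 8) |  (uint32_t)buf[11];
    if (h->version != 2) return -1;

    /* Each CSRC identifier is 4 bytes. */
    size_t off = 12 + 4u * h->csrc_count;
    if (off > len) return -1;
    if (h->extension) {
        /* Extension header: 16-bit profile id, then 16-bit length in 32-bit words. */
        if (off + 4 > len) return -1;
        size_t words = ((size_t)buf[off + 2] << 8) | buf[off + 3];
        off += 4 + 4 * words;
        if (off > len) return -1;
    }
    h->payload_offset = off;
    return 0;
}
```

For a packet whose first two bytes are 80 E0, this yields V = 2, P = 0, X = 0, CC = 0, M = 1 and PT = 96, and the payload starts at offset 12.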


The bitstream is as follows:

80 E0 00 1E 00 00 D2 F0 00 00 00 00 41 9B 6B 49
E1 0F 26 53 02 1A FF 06 59 97 1D D2 2E 8C 50 01
CC 13 EC 52 77 4E E5 0E 7B FD 16 11 66 27 7C B4
F6 E1 29 D5 D6 A4 EF 3E 12 D8 FD 6C 97 51 E7 E9
CF C7 5E C8 A9 51 F6 82 65 D6 48 5A 86 B0 E0 8C

Where:
80 is V_P_X_CC
E0 is M_PT
00 1E is the sequence number
00 00 D2 F0 is the timestamp
00 00 00 00 is the SSRC

Written in binary, the first two bytes are: 1000 0000 1110 0000
In order: 10 is V; 0 is P; 0 is X; 0000 is CC; 1 is M; 110 0000 is PT (96).

The layout here is not as clear as in the original Word document, so please bear with it.

2. RTP payload: H.264 bitstream
Figure 2

The payload format defines three different basic payload structures. The receiver identifies the payload structure from the low five bits of the first byte of the RTP payload (Figure 2).

1) Single NAL unit packet: the payload contains only one NAL unit. The type field of the NAL header equals the original NAL unit type, that is, it is in the range 1 to 23.

2) Aggregation packet: this type aggregates multiple NAL units into a single RTP payload. It has four versions: single-time aggregation packet type A (STAP-A), single-time aggregation packet type B (STAP-B), multi-time aggregation packet (MTAP) with 16-bit offset (MTAP16), and MTAP with 24-bit offset (MTAP24). The NAL unit type numbers assigned to STAP-A, STAP-B, MTAP16 and MTAP24 are 24, 25, 26 and 27, respectively.

3) Fragmentation unit: used to split a single NAL unit across multiple RTP packets. There are two versions, FU-A and FU-B, identified by NAL unit types 28 and 29.
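The three structures can be told apart with a single mask on the first payload byte. The sketch below is illustrative only: the enum and the function name h264_classify_payload are made up here, and type values outside the ranges listed above are simply reported as invalid.

```c
#include <stdint.h>

typedef enum {
    H264_PKT_INVALID,     /* type outside the ranges 1..29 listed above */
    H264_PKT_SINGLE_NAL,  /* 1..23 */
    H264_PKT_STAP_A,      /* 24 */
    H264_PKT_STAP_B,      /* 25 */
    H264_PKT_MTAP16,      /* 26 */
    H264_PKT_MTAP24,      /* 27 */
    H264_PKT_FU_A,        /* 28 */
    H264_PKT_FU_B         /* 29 */
} h264_pkt_kind;

/* first_byte is the first byte of the RTP payload. */
h264_pkt_kind h264_classify_payload(uint8_t first_byte)
{
    unsigned type = first_byte & 0x1F;   /* low five bits */
    if (type >= 1 && type <= 23) return H264_PKT_SINGLE_NAL;
    switch (type) {
    case 24: return H264_PKT_STAP_A;
    case 25: return H264_PKT_STAP_B;
    case 26: return H264_PKT_MTAP16;
    case 27: return H264_PKT_MTAP24;
    case 28: return H264_PKT_FU_A;
    case 29: return H264_PKT_FU_B;
    default: return H264_PKT_INVALID;
    }
}
```

A receiver would typically call this on the byte right after the RTP header and then branch to the matching depacketizer.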

 

The common packetization rule is as follows: if a NAL unit is smaller than the MTU, it is sent as a single NAL unit packet; if it is larger than the MTU, it is split into FUs.
Because single NAL unit packets and FU-A are the common packetization modes, only these two are parsed here.
2.1 Single NAL unit packet
Figure 3

A single NAL unit packet must contain exactly one NAL unit. This means that neither an aggregation packet nor a fragmentation unit may be carried inside a single NAL unit packet. The RTP sequence numbers must follow the decoding order of the NAL units. The first byte of the NAL unit also serves as the RTP payload header (Figure 3).

When packetizing an H.264 bitstream this way, you only need to put a 12-byte RTP header in front of the NAL unit (without its start code).
2.2 Fragmentation unit (FU-A)
Figure 4

Fragmentation is defined only for a single NAL unit and is not used for aggregation packets. A fragment of a NAL unit consists of an integer number of consecutive bytes of that NAL unit. Every byte of the NAL unit must be part of exactly one fragment. Fragments of the same NAL unit must be sent consecutively with ascending RTP sequence numbers (no other RTP packet may appear between the first and last fragments). Likewise, the receiver must reassemble the NAL unit in RTP sequence number order.

When a NAL unit is fragmented and carried in FUs, it is referred to as a fragmented NAL unit. STAPs and MTAPs must not be fragmented. FUs must not be nested, that is, an FU must not contain another FU. The RTP timestamp of a packet carrying an FU is set to the NALU time of the fragmented NAL unit.

Figure 4 shows the RTP payload format of FU-A. An FU-A consists of a 1-byte FU indicator (Figure 5), a 1-byte FU header (Figure 6), and the fragment payload.

S: 1 bit. When set to 1, the start bit indicates the start of a fragmented NAL unit. When the FU payload that follows is not the start of the fragmented NAL unit payload, the start bit is set to 0.

E: 1 bit. When set to 1, the end bit indicates the end of the fragmented NAL unit, that is, the last byte of the payload is also the last byte of the fragmented NAL unit. When the FU payload that follows is not the last fragment of the NAL unit, the end bit is set to 0.

R: 1 bit. The reserved bit must be set to 0, and the receiver must ignore it.

During packetization, the top three bits of the original NAL header become the top three bits of the FU indicator, and the low five bits of the original NAL header become the low five bits of the FU header.
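The bit shuffling described above fits in a few lines. This is a sketch under the assumption that the caller already knows whether a fragment is the first or last one of the NAL unit; the function name fua_make_headers is made up for illustration.

```c
#include <stdint.h>

/* Builds the two FU-A bytes for one fragment of a NAL unit.
 * nal_header: first byte of the original NAL unit.
 * is_first / is_last: position of this fragment within the NAL unit. */
void fua_make_headers(uint8_t nal_header, int is_first, int is_last,
                      uint8_t *fu_indicator, uint8_t *fu_header)
{
    /* FU indicator: F and NRI copied from the NAL header, type set to 28. */
    *fu_indicator = (uint8_t)((nal_header & 0xE0) | 28);
    /* FU header: S, E, R = 0, then the original NAL unit type. */
    *fu_header    = (uint8_t)((is_first ? 0x80 : 0x00) |
                              (is_last  ? 0x40 : 0x00) |
                              (nal_header & 0x1F));
}
```

For an IDR slice with NAL header 0x65, the first fragment gets the bytes 7C 85, middle fragments get 7C 05, and the last fragment gets 7C 45. The original NAL header byte itself is not repeated in the fragment payload, since the receiver can rebuild it from these two bytes.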

Analyze the following bitstream:

80 60 01 0F 00 0E 10 00 00 00 00 00 7C 85 88 82
00 0A 7F CA 94 05 3B 7F 3E 7F FE 14 2B 27 26 F8
89 88 DD 85 62 E1 6D FC 33 01 38 1A 10 35 F2 14
84 6E 21 24 8F 72 62 F0 51 7E 10 5F 0D 42 71 12
17 65 62 A1 F1 44 DC DF 4B 4A 38 AA 96 B7 DD 24

The first 12 bytes are the RTP header.
7C is the FU indicator.
85 is the FU header.

Written in binary, the FU indicator (0x7C) and the FU header (0x85) are: 0111 1100 1000 0101

Parsed in order:
0 is F;
11 is NRI;
11100 is the FU type, here 28, that is, FU-A;
1 is S (start), indicating that this is the first fragment;
0 is E (end); it is set to 1 on the last fragment, which is not the case here;
0 is R, the reserved bit, always 0;
00101 is the NAL type, here 5, a coded slice of an IDR picture, that is, a key frame.

During packetization, F and NRI of the FU indicator are the F and NRI of the NAL header, and the type is 28. S, E and R of the FU header are set according to the position of the fragment, and the type is the type from the NAL header.

When depacketizing, take the top three bits of the FU indicator and the low five bits of the FU header: 0110 0101 (0x65) is the original NAL header.
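The depacketizing step can be sketched as follows. The names fua_info and fua_parse are hypothetical, and the sketch only decodes the two header bytes; buffering and reassembling the fragments is left out.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    int     start, end;    /* S and E bits of the FU header */
    uint8_t nal_header;    /* reconstructed original NAL header */
} fua_info;

/* payload points at the RTP payload (FU indicator first).
 * Returns 0 on success, -1 if this is not an FU-A payload. */
int fua_parse(const uint8_t *payload, size_t len, fua_info *out)
{
    if (len < 2 || (payload[0] & 0x1F) != 28) return -1;
    out->start = (payload[1] >> 7) & 1;
    out->end   = (payload[1] >> 6) & 1;
    /* Top three bits from the FU indicator, low five bits from the FU header. */
    out->nal_header = (uint8_t)((payload[0] & 0xE0) | (payload[1] & 0x1F));
    return 0;
}
```

With the bytes 7C 85 from the dump above, this gives S = 1, E = 0 and a NAL header of 0x65. A receiver would write 0x65 once when S = 1, append the bytes after the FU header for every fragment, and hand the NAL unit to the decoder when E = 1.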
3. RTP payload: PS stream

H.264 is encapsulated into PS as follows. Each IDR NALU is usually accompanied by SPS and PPS NALUs, so the SPS, PPS and IDR NALUs are encapsulated into one PS pack consisting of the PS header, the PS system header, the PS system map, and a PES header followed by the H.264 raw data. From the outside in, the order of an IDR NALU PS pack is therefore: PS header | PS system header | PS system map | PES header | H.264 raw data.

For the PS packs of non-key frames it is much simpler: just add the PS header and the PES header. The order is: PS header | PES header | H.264 raw data.

The above covers video only. Audio can be encapsulated into PS as well. When audio data exists, append the audio PES after the video PES. The order is: PS pack = PS header | PES (video) | PES (audio). The pack is then sent in RTP.

GB28181 specifies the payload types carried over RTP (see GB28181 Appendix B), using values 96-127 of the payload type.

The recommended values are 96 for PS encapsulation (PS over RTP follows RFC 2250), 97 for MPEG-4, and 98 for H.264. These values lie in the dynamic payload type range, so they are a convention rather than a fixed assignment.

That is, for each RTP packet received, first check the payload type. If the payload type is 96, demultiplex the payload as PS and decode audio and video separately. If the payload type is 98, decode it directly as H.264.

Note: This check is not necessarily reliable; it depends on whether the sender packetizes according to the standard.

The values of stream type in a PS pack are as follows:

1) MPEG-4 video stream: 0x10;

2) H.264 video stream: 0x1B;

3) SVAC video stream: 0x80;

4) G.711 audio stream: 0x90;

5) G.722.1 audio stream: 0x92;

6) G.723.1 audio stream: 0x93;

7) G.729 audio stream: 0x99;

8) SVAC audio stream: 0x9B.
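In code, the list above is just a lookup table. The sketch below uses a made-up function name, ps_stream_type_name, and returns a printable label for logging; any value not in the list is reported as unknown.

```c
#include <stdint.h>

/* Maps a stream type value from the list above to a printable name. */
const char *ps_stream_type_name(uint8_t stream_type)
{
    switch (stream_type) {
    case 0x10: return "MPEG-4 video";
    case 0x1B: return "H.264 video";
    case 0x80: return "SVAC video";
    case 0x90: return "G.711 audio";
    case 0x92: return "G.722.1 audio";
    case 0x93: return "G.723.1 audio";
    case 0x99: return "G.729 audio";
    case 0x9B: return "SVAC audio";
    default:   return "unknown";
    }
}
```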
3.1 PS pack header

Figure 7

1) Pack start code: a bit string with the value 0x000001BA. It marks the start of a pack.

2) system clock reference base, system clock reference extension: the system clock reference (SCR) fields.

3) Pack stuffing length: a 3-bit integer that specifies the number of stuffing bytes following this field.

80 60 53 1F 00 94 89 00 00 00 00 00 00 00 01 BA
7E FF 3E FB 44 01 00 5F 6B F8 00 00 01 E0 14 53
80 80 05 2F BF CF BE D1 1C 42 56 7B 13 58 0A 1E
08 B1 4F 33 69 35 04 53 6D 33 A8 04 15 58 D9 21
97 41 B9 F1 75 3D 94 2B 1F BC 0B B2 B4 97 BF 93

The first 12 bytes are the RTP header, which is not described again here;

00 00 01 BA is the pack start code;

The next nine bytes hold the SCR, SCRE and mux rate. For details, see Figure 7.

The last byte (0xF8) holds the reserved bits and the stuffing length, which tells whether stuffing bytes follow. In binary:

1111 1000

The first five bits are reserved and skipped; the last three bits give the stuffing length, which is 0 here.
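Skipping the pack header therefore only requires the start code check and the stuffing length. The following sketch assumes the MPEG-2 layout described above (the byte after the start code begins with the bits 01); the function name ps_pack_header_length is made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns the total length of the PS pack header at buf (14 + stuffing),
 * or 0 if buf does not start with a complete MPEG-2 pack header. */
size_t ps_pack_header_length(const uint8_t *buf, size_t len)
{
    if (len < 14) return 0;
    if (buf[0] != 0x00 || buf[1] != 0x00 || buf[2] != 0x01 || buf[3] != 0xBA)
        return 0;
    /* The MPEG-2 pack header starts with the bits '01' after the start code. */
    if ((buf[4] & 0xC0) != 0x40) return 0;
    /* Low 3 bits of byte 13 are the pack stuffing length. */
    size_t total = 14 + (size_t)(buf[13] & 0x07);
    return total <= len ? total : 0;
}
```

For the pack header in the dump above, which ends in F8, the function returns 14, so the next start code begins right after it.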

3.2 System header


Figure 8
The system header exists only when the pack is the first one, that is, the PS pack header is followed by the system header. Its start code is a bit string with the value 0x000001BB, which marks the start of the system header. It does not need to be processed for now; it can be skipped using the header length.
3.3 Program stream map
The program stream map likewise exists only when the pack is the first one, that is, the system header is followed by the program stream map. Its start code is 0x000001BC, which marks the start of the program stream map. It does not need to be processed for now; it can be skipped using the header length. The first six bytes (start code and 2-byte length) have the same structure as those of the system header, as shown in Figure 8.


Take a piece of bitstream to analyze the system header and the program stream map:

00 00 01 BA 45 A9 D4 5C 34 01 00 5F 6B F8 00 00
01 BB 00 0C 80 CC F5 04 E1 7F E0 E0 E8 C0 C0 20
00 00 01 BC 00 1E E1 FF 00 00 00 18 1B E0 00 0C
2A 0A 7F FF 00 00 07 08 1F FE A0 5A 90 C0 00 00
00 00 00 00 00 00 01 E0 7F E0 80 80 05 21 6A 75

The first 14 bytes are the PS pack header (note that there are no stuffing bytes);

The next 00 00 01 BB is the start code of the system header;

The following 00 0C is the length of the system header (excluding the start code and the length bytes themselves);

The next 12 bytes are the content of the system header, which is not parsed here;

Next comes 00 00 01 BC, the start code of the program stream map;

The following 00 1E is again the length;

E1 FF can be skipped;

The next 00 00 is the program stream info length, which is 0;

Next, 00 18 is the elementary stream map length, meaning that 24 more bytes follow;

The following 1B indicates the H.264 encoding format;

The next byte, E0, is the stream ID of the video stream;

The next 00 0C is again a length, meaning that 12 more bytes follow;

Skip those 12 bytes and you see 90, which is the G.711 audio format;

The next byte, C0, is the stream ID of the audio stream;

The next 00 00 is again a length, which is 0;

The last four bytes are the CRC (cyclic redundancy check).

That completes the parsing of the program stream map. (Tired.)
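The walk above can be sketched in C. Both function names (ps_section_end, psm_stream_type) are made up for illustration, and the sketch assumes the layout described in the walkthrough: start code, 2-byte length, two flag bytes, program stream info, then the elementary stream map with 4-byte entries followed by their descriptors. The CRC is not verified.

```c
#include <stddef.h>
#include <stdint.h>

/* For a section that starts at buf[pos] with 00 00 01 <id> followed by a
 * 16-bit length, returns the offset just past the section, or 0 on error. */
size_t ps_section_end(const uint8_t *buf, size_t len, size_t pos, uint8_t id)
{
    if (pos + 6 > len) return 0;
    if (buf[pos] != 0 || buf[pos + 1] != 0 || buf[pos + 2] != 1 || buf[pos + 3] != id)
        return 0;
    size_t end = pos + 6 + (((size_t)buf[pos + 4] << 8) | buf[pos + 5]);
    return end <= len ? end : 0;
}

/* Looks up the stream type that the program stream map at buf[pos] assigns
 * to the elementary stream stream_id. Returns -1 if not found or malformed. */
int psm_stream_type(const uint8_t *buf, size_t len, size_t pos, uint8_t stream_id)
{
    size_t end = ps_section_end(buf, len, pos, 0xBC);
    if (end == 0 || pos + 10 > end) return -1;
    size_t p = pos + 8;                       /* skip start code, length, 2 flag bytes */
    size_t info_len = ((size_t)buf[p] << 8) | buf[p + 1];
    p += 2 + info_len;                        /* skip program stream info */
    if (p + 2 > end) return -1;
    size_t map_end = p + 2 + (((size_t)buf[p] << 8) | buf[p + 1]);
    p += 2;
    if (map_end > end) map_end = end;
    while (p + 4 <= map_end) {
        if (buf[p + 1] == stream_id) return buf[p];
        p += 4 + (((size_t)buf[p + 2] << 8) | buf[p + 3]);   /* skip descriptors */
    }
    return -1;
}
```

On the dump above, ps_section_end steps from offset 14 (system header) to 32 and from 32 (program stream map) to 68, where the PES start code begins; psm_stream_type reports 0x1B for stream ID E0 and 0x90 for stream ID C0.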




The main part is still to come.

3.4 PES packet header


Figure 9

Don't be scared by such a long figure. The principle is the same as before; you just have to handle the fields one by one.

1) Packet start code prefix: a bit string with the value 0x000001. Together with the stream ID that follows, it forms the packet start code that marks the start of a PES packet.

2) Stream ID: specifies the number and type of the elementary stream in the program stream. 0xC0 to 0xDF is audio; 0xE0 to 0xEF is video.

3) PES packet length: 16-bit field giving the number of bytes that follow this field in the PES packet. A value of 0 means that the PES packet length is neither specified nor bounded. This is allowed only in PES packets whose payload consists of bytes of a video elementary stream carried in transport stream packets.

4) PTS_DTS flags: 2-bit field. When the value is '10', the PTS field is present in the PES packet header; when the value is '11', both the PTS field and the DTS field are present; when the value is '00', neither the PTS field nor the DTS field is present. The value '01' is not allowed.

5) ESCR flag: 1 bit. If set to '1', the ESCR base and extension fields are present in the PES packet header. If set to '0', the ESCR fields are absent.

6) ES rate flag: 1 bit. If set to '1', the ES rate field is present in the PES packet header. If set to '0', the ES rate field is absent.

7) DSM trick mode flag: 1 bit. If set to '1', an 8-bit trick mode field is present. If set to '0', this field is absent.

8) Additional copy info flag: 1 bit. If set to '1', the additional copy info field is present. If set to '0', this field is absent.

9) PES CRC flag: 1 bit. If set to '1', the CRC field is present in the PES packet header. If set to '0', this field is absent.

10) PES extension flag: 1 bit. If set to '1', the PES packet header contains an extension field. If set to '0', this field is absent.

11) PES header data length: 8 bits. It specifies the total number of bytes occupied by the optional fields and any stuffing bytes in the PES packet header. The flag bits in the byte before this field indicate which optional fields are present.
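As a rough illustration of how these fields fit together, the sketch below parses the fixed part of a PES packet header for an audio or video stream ID and extracts the PTS when it is present. The names pes_header, read_ts33 and pes_parse_header are made up for this example; DTS, ESCR and the other optional fields are not decoded, only skipped via the header data length.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint8_t  stream_id;
    uint16_t packet_length;   /* bytes following the length field; 0 = unbounded */
    int      has_pts, has_dts;
    uint64_t pts;             /* valid only when has_pts is set */
    size_t   payload_offset;  /* first byte of elementary stream data */
} pes_header;

/* Decodes a 33-bit timestamp spread over 5 bytes with marker bits. */
static uint64_t read_ts33(const uint8_t *p)
{
    return ((uint64_t)((p[0] >> 1) & 0x07) << 30) |
           ((uint64_t)((((unsigned)p[1] << 8) | p[2]) >> 1) << 15) |
            (uint64_t)((((unsigned)p[3] << 8) | p[4]) >> 1);
}

/* Returns 0 on success, -1 on a malformed or truncated header. */
int pes_parse_header(const uint8_t *buf, size_t len, pes_header *h)
{
    if (len < 9 || buf[0] != 0 || buf[1] != 0 || buf[2] != 1) return -1;
    if ((buf[6] & 0xC0) != 0x80) return -1;       /* marker bits must be '10' */
    h->stream_id     = buf[3];
    h->packet_length = (uint16_t)((buf[4] << 8) | buf[5]);
    unsigned flags = buf[7] >> 6;                 /* PTS_DTS flags */
    if (flags == 1) return -1;                    /* '01' is not allowed */
    h->has_pts = (flags & 2) != 0;
    h->has_dts = (flags == 3);
    h->payload_offset = 9 + (size_t)buf[8];       /* 9 fixed bytes + optional fields */
    if (h->payload_offset > len) return -1;
    h->pts = 0;
    if (h->has_pts) {
        if (buf[8] < 5) return -1;                /* PTS needs 5 bytes */
        h->pts = read_ts33(buf + 9);
    }
    return 0;
}
```

The elementary stream data (for video, the H.264 bytes) starts at payload_offset, so a demultiplexer can copy everything from there to the end of the PES packet.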


As usual, here is a bitstream:

00 00 01 E0 21 33 80 80 05 2B 5F DF 5C 95 71 84
AA E4 E9 E9 EC 40 CC 17 E0 68 7B 23 F6 89 DF 90
A9 D4 BE 74 B9 67 AD 34 6D F0 92 0D 5A 48 DD 13
00 00 01 is the start code prefix;

E0 is a video stream;

21 33 is the PES packet length;

For the next two bytes, 80 80, see the binary parsing below;

The next byte, 05, is the length of the optional fields; the byte before it indicates which optional fields are present;

The next 5 bytes are the PTS;

The binary values of the 7th and 8th bytes are as follows:

1000 0000 1000 0000

Parsed in order:

7th byte:

10 is the marker and must be 10;

00 is the scrambling control field. '00' means not scrambled; 01, 10 and 11 are user-defined;

0 is the priority: 1 is high, 0 is low;

0 is the data alignment indicator;

0 is the copyright field;

0 is the original or copy field. '1' indicates that the content of the PES packet payload is an original; '0' indicates that it is a copy;

8th byte:

10 is the PTS_DTS flags field. Here it is 10, which means that there is a PTS and no DTS;

0 is the ESCR flag. Here it is 0, indicating that the field is absent;

0 is the ES rate flag. Here it is 0, indicating that the field is absent;

0 is the DSM trick mode flag. Here it is 0, indicating that the field is absent;

0 is the additional copy info flag. Here it is 0, indicating that the field is absent;

0 is the PES CRC flag. Here it is 0, indicating that the field is absent;

0 is the PES extension flag. Here it is 0, indicating that the field is absent;

This bitstream carries only a PTS. Here is the parsing function.

unsigned long long parse_time_stamp(const unsigned char *p)
{
    unsigned long long b;
    /* 33 bits in total, wrapping to 0 after overflow; a 64-bit type holds them all */
    unsigned long long val;

    /* bits 5, 6 and 7 of the 1st byte (mask 0x0e) */
    b = *p++;
    val = (b & 0x0e) << 29;

    /* all 8 bits of the 2nd byte and the first 7 bits of the 3rd byte */
    b = (*(p++)) << 8;
    b += *(p++);
    val += ((b & 0xfffe) << 14);

    /* all 8 bits of the 4th byte and the first 7 bits of the 5th byte */
    b = (*(p++)) << 8;
    b += *(p++);
    val += ((b & 0xfffe) >> 1);

    return val;
}

For the other fields, refer to the specification.


Closing remarks:

First of all, I would like to thank @cmengwei for his selfless and generous help. Thank you very much.


I have put all the documents in my resources; each costs one download point. Don't be stingy. It's definitely worth it!


RTP payload format for H.264 video

Http://download.csdn.net/detail/chen495810242/7904367

MPEG2-2 (Chinese Version 13818)

Http://download.csdn.net/detail/chen495810242/7904401


For code that handles H.264 over RTP, see:

Http://blog.csdn.net/dengzikun/article/details/5807694

For code that handles PS stream over RTP, see:

Http://www.pudn.com/downloads33/sourcecode/windows/multimedia/detail105823.html

Http://www.oschina.net/code/snippet_99626_23737


Please do not ask me for source code. With the code referenced above, you can write a program that runs correctly.

It is better to teach people to fish than to give them fish.


Other references:

Http://blog.csdn.net/duanbeibei/article/details/1698183

Http://blog.csdn.net/wwyyxx26/article/details/15224879

