RTP Timestamps and Payload Types in Multimedia Development


(1) Timestamp

(2) Payload type

(3) RTP header

(1) Timestamp

There are three timestamp-related quantities:

One is the per-frame timestamp increment, derived from the 90 kHz video clock and the frame rate:

timestamp_increse = (unsigned int)(90000.0/framerate); //+0.5);

One is the timestamp of the sample currently being sent, ts_current.

The increment is added to ts_current for each frame, and the result is written into the header in network byte order:

rtp_hdr->timestamp = htonl(ts_current);
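The fragments above can be combined into a small self-contained sketch. The struct layout and function name here are illustrative stand-ins, not the original author's code; only the `ts_current`/`timestamp_increse` logic comes from the text:

```c
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>

/* Illustrative 12-byte RTP fixed header (not the original author's struct). */
typedef struct {
    uint8_t  vpxcc;       /* V(2) P(1) X(1) CC(4) */
    uint8_t  mpt;         /* M(1) PT(7) */
    uint16_t seq_number;  /* network byte order */
    uint32_t timestamp;   /* network byte order */
    uint32_t ssrc;        /* network byte order */
} rtp_header;

/* Advance the timestamp by one frame and write it into the header,
   mirroring the text: increment = 90000 / framerate, then htonl(). */
uint32_t rtp_stamp_frame(rtp_header *hdr, uint32_t ts_current, double framerate) {
    uint32_t timestamp_increse = (uint32_t)(90000.0 / framerate); /* 90 kHz video clock */
    ts_current += timestamp_increse;
    hdr->timestamp = htonl(ts_current);
    return ts_current;
}
```

At 25 frames per second the increment works out to 3600 timestamp units per frame.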

(2) Payload type

(3) RTP Header

The fixed RTP header is 12 bytes long, beginning with the V field and ending with the SSRC synchronization source identifier:

V: version number

P: padding flag

X: extension flag

CC: CSRC count (number of contributing sources)

M: marker bit

PT: payload type

Sequence number: the RTP sequence number, incremented by one for each RTP packet sent

Timestamp: the sampling time of the packet's data

rtp_hdr->ssrc = htonl(10); // arbitrarily set to 10 here; must be globally unique within this RTP session

// ssrc: 32 bits
// The SSRC field identifies the synchronization source. To prevent two
// synchronization sources in the same session from having the same SSRC
// identifier, this identifier must be chosen randomly (an algorithm for
// generating random identifiers is given in Appendix A.6). Although the
// probability of two sources choosing the same identifier is very small,
// all RTP implementations must still detect and resolve such conflicts.
// Section 8 describes the conflict probability, the resolution mechanism,
// and RTP-level loop detection based on the unique SSRC identifier. If a
// source changes its source transport address, it must choose a new SSRC
// identifier to avoid being identified as a looped source (see Section 8.2).

unsigned long ssrc; /* the stream number is used here */

First, look at the format of the RTP protocol Header:

The first 12 bytes are present in every RTP packet, while the list of CSRC identifiers is present only when inserted by a mixer.

Version (V): 2 bits
Indicates the RTP version number. The initial version of the protocol was version 0; RFC 3550 specifies version 2.

Padding (P): 1 bit
If this bit is set, the packet contains one or more padding octets at the end, and the last padding octet gives the number of padding octets (including itself). This field exists because some encryption mechanisms require fixed-length data blocks, or because multiple RTP packets may be carried in a single lower-layer protocol data unit.

Extension (X): 1 bit
If this bit is set, an extension header follows the fixed header, as defined in Section 5.3.1 of RFC 3550.

Marker (M): 1 bit
The meaning of this bit is defined by the profile. A profile may change the number of marker bits, but must keep the total length of the marker and payload type fields constant (8 bits in all).

Payload Type (PT): 7 bits
Identifies the format of the RTP payload; the standard types are listed in RFC 3551. If a receiver does not recognize the type, it must ignore the packet.

Sequence Number:16 bits
The sequence number is incremented by one for each RTP packet sent; the receiver can use it to restore the packet order.

Timestamp:32 bits
Timestamp. Reflects the sampling instant of the first byte of the data carried in the RTP packet.

Ssrc:32 bits
Identifies the synchronization source. Each data stream in an RTP session should have a distinct SSRC.

CSRC list: 32 bits each
Identifies the contributing sources. Present only when a mixer is in use; for example, when multiple voice streams are mixed into a single stream, the SSRC of each original stream is listed here.

Bits 10 through 16 of the header form the PT field, the payload type. The payload type defines the format of the RTP payload; how that format is interpreted is determined by the profile in use.
Currently, the payload type mainly tells the receiver (or player) which kind of media is being carried (such as g.729, H.264, MPEG-4, and so on), so that the receiver knows the format of the data stream and can invoke the appropriate codec to decode or play it. This is the primary function of the payload type.
The ORTP library, for example, ships with its own set of payload type definitions.
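ORTP's actual listing is not reproduced here. As a rough illustrative stand-in, here is a small table of well-known static payload types from RFC 3551; the struct is a simplification of my own, not ORTP's real `PayloadType` definition:

```c
#include <stddef.h>
#include <string.h>

/* Simplified stand-in for a payload-type table: static payload type
   numbers and timestamp clock rates taken from RFC 3551. */
typedef struct {
    int         number;      /* value carried in the RTP PT field */
    const char *name;        /* encoding name */
    int         clock_rate;  /* timestamp clock in Hz */
} payload_type;

static const payload_type static_payload_types[] = {
    {  0, "PCMU",  8000 },
    {  3, "GSM",   8000 },
    {  8, "PCMA",  8000 },
    { 18, "G729",  8000 },
    { 26, "JPEG", 90000 },
    { 34, "H263", 90000 },
};

/* Look up a static payload type by number; returns NULL if unknown
   (numbers 96-127 are dynamically assigned, e.g. often used for H.264). */
const payload_type *find_payload_type(int number) {
    for (size_t i = 0; i < sizeof static_payload_types / sizeof *static_payload_types; i++)
        if (static_payload_types[i].number == number)
            return &static_payload_types[i];
    return NULL;
}
```

Note that audio types use an 8 kHz timestamp clock while video types use 90 kHz, which matters for the timestamp discussion below.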

Each payload type has its own parameters, and the definitions basically cover today's mainstream media types, such as PCMU, g.729, H.263 (strangely, there is no definition for H.264), MPEG-4, and so on. The Jrtplib library should have similar definitions; you can check its source code, which I will not repeat here.

Both the ORTP and Jrtplib libraries provide functions that set the RTP payload type, and it is important to set it according to your actual application. I was not aware of this at first: I used ORTP's default PCMU audio payload type while transmitting encoded video data, which kept causing problems and troubled me for a long time.

OK, let's talk about RTP timestamps.

First, learn a few basic concepts:

Timestamp unit: timestamps are not counted in seconds but in units derived from the sampling frequency, which makes them more precise. For example, for audio sampled at 8000 Hz, the timestamp unit is 1/8000 of a second.
Timestamp increment: the difference (in timestamp units) between the timestamps of two adjacent RTP packets.
Sampling frequency: the number of samples taken per second; audio is typically sampled at 8000 Hz.
Frame rate: the number of frames transmitted or displayed per second, e.g. 25 f/s.

Then look at the textbook definition of the RTP timestamp:

The second 32-bit word of the RTP header is the packet's timestamp, which occupies 32 bits.
The timestamp reflects the sampling instant of the first byte of data in the RTP packet. Its initial value at the beginning of a session is chosen randomly. Even when no signal is being sent, the timestamp keeps increasing with time. The receiver uses timestamps to know exactly when each data block should be restored, thereby eliminating transmission jitter. Timestamps can also be used to synchronize sound and picture in video applications.
The granularity of the timestamp is not specified by the RTP protocol; it depends on the payload type. For this reason RTP timestamps are also called media timestamps, to emphasize that their granularity depends on the signal type. For example, for a voice signal sampled at 8 kHz and packetized every 20 ms, each data block contains 160 samples (0.02 × 8000 = 160), so the timestamp increases by 160 with each RTP packet sent.
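The arithmetic in that last example can be captured in a tiny helper (the function name is mine):

```c
#include <stdint.h>

/* Timestamp increment for an audio stream packetized at fixed intervals:
   increment = samples per packet = clock_rate * packet_ms / 1000. */
uint32_t audio_ts_increment(uint32_t clock_rate_hz, uint32_t packet_ms) {
    return clock_rate_hz * packet_ms / 1000;
}
```

For 8000 Hz audio packetized every 20 ms this yields the 160 units per packet computed above.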

Did the official explanation make sense? No? That's fine: I didn't understand it at first either, so here is my own explanation.

     First, a timestamp is a value that reflects the instant at which a data block was generated (captured); a block captured later always has a larger timestamp than one captured earlier. Such timestamps mark the order of the data blocks.
     Second, in real-time streaming, each data block is handed to the RTP module and sent immediately after capture, so in practice the block's capture timestamp is used directly as the RTP packet's timestamp.
     Third, if RTP is used to transfer a fixed file, the timestamp is the point in time at which the file data was read, incremented in turn. That case is outside the scope of this discussion and is not considered here.
     Fourth, the timestamp unit is the reciprocal of the sampling frequency; for example, at a sampling frequency of 8000 Hz the unit is 1/8000 s. The Jrtplib library has a function interface for setting the timestamp unit, while the ORTP library derives the unit directly from the payload type (1/8000 for audio payloads, 1/90000 for video payloads).
     Fifth, the timestamp increment is the time interval between two RTP packets; more precisely, it is the interval (in timestamp units) between sending the first RTP packet and sending the second.
    If the sampling frequency is 90000 Hz, then by the discussion above the timestamp unit is 1/90000 s: imagine one second divided into 90,000 time slots. At 25 frames per second, how many slots does each frame span? 90000 / 25 = 3600, of course. So, by the definition "the timestamp increment is the interval between sending the first RTP packet and the second," the increment should be 3600.
     In Jrtplib you do not seem to need to manage the timestamp increment yourself; the library handles it internally. In ORTP, however, you pass the timestamp value with every send, which means you must accumulate the increment yourself for every RTP packet sent. That is less convenient and requires a fairly deep understanding of RTP timestamps. At first I did not understand this and set the increment arbitrarily, which caused transmission problems and troubled me for a long time.
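A minimal sketch of the manual timestamp management that ORTP requires, assuming a 90 kHz video clock and a fixed frame rate (the struct and function names are my own, not ORTP's API):

```c
#include <stdint.h>

/* Running timestamp state: bump user_ts by clock_rate/framerate
   for every video frame sent. */
typedef struct {
    uint32_t user_ts;    /* timestamp passed with each send */
    uint32_t increment;  /* e.g. 90000 / 25 = 3600 */
} ts_state;

void ts_init(ts_state *s, uint32_t clock_rate, uint32_t framerate) {
    s->user_ts = 0;      /* a real sender should start from a random value */
    s->increment = clock_rate / framerate;
}

/* Call once per frame; returns the timestamp to send with that frame. */
uint32_t ts_next_frame(ts_state *s) {
    uint32_t ts = s->user_ts;
    s->user_ts += s->increment;
    return ts;
}
```

With a 90000 Hz clock and 25 f/s, successive frames get timestamps 0, 3600, 7200, and so on, matching the calculation above.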

RTCP SR packet bit layout:

        0                   1                   2                   3
        0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
byte=0 |V=2|P|    RC   |   PT=SR=200   |             length            |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     4 |                         SSRC of sender                        |
       +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
     8 |              NTP timestamp, most significant word             |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    12 |              NTP timestamp, least significant word            |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    16 |                         RTP timestamp                         |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    20 |                     sender's packet count                     |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    24 |                      sender's octet count                     |
       +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
    28 |V=2|P|    SC   |  PT=SDES=202  |             length            |
       +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
    32 |                          SSRC/CSRC_1                          |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    36 |    CNAME=1    |     length    |  user and domain name       ...
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
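As a sketch of how the timing fields in this layout are read, here is a minimal parser for the NTP and RTP timestamp fields of an SR packet (the function and struct names are my own):

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>

/* The <NTP time, RTP timestamp> pair carried in an RTCP SR packet. */
typedef struct {
    uint32_t ntp_msw;  /* NTP seconds (most significant word) */
    uint32_t ntp_lsw;  /* NTP fraction of a second */
    uint32_t rtp_ts;   /* RTP timestamp at the same instant */
} sr_timing;

/* Extract the timing fields from the first 20 bytes of an SR packet,
   at the byte offsets shown in the diagram above. Returns 0 on success. */
int parse_sr_timing(const uint8_t *pkt, size_t len, sr_timing *out) {
    if (len < 20) return -1;
    if ((pkt[0] >> 6) != 2) return -1;  /* version must be 2 */
    if (pkt[1] != 200) return -1;       /* PT must be SR = 200 */
    uint32_t w;
    memcpy(&w, pkt + 8,  4); out->ntp_msw = ntohl(w);
    memcpy(&w, pkt + 12, 4); out->ntp_lsw = ntohl(w);
    memcpy(&w, pkt + 16, 4); out->rtp_ts  = ntohl(w);
    return 0;
}
```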

There are three methods for synchronizing multimedia communication: timestamp synchronization, synchronization marking, and multiplexing synchronization. The following focuses on timestamp synchronization, especially synchronization based on RTP timestamps. The topics include how inter-media synchronization is achieved with RTP, why RTCP's NTP time is needed for inter-media synchronization, whether RTP media can be synchronized without RTCP, the difference between DirectShow timestamps and RTP timestamps, and the timestamps of MPEG2-TS streams. This article only discusses the principle of timestamp synchronization; it does not cover concrete implementation details such as how audio and video frame timestamps are computed or how rendering is driven by those timestamps.

According to the RTP specification, different RTP media streams are transmitted separately, each synchronized by its own independent timestamps. Suppose a video-on-demand session carries two RTP streams, one for video and one for audio. Using the video frame timestamps, the video stream can be synchronized internally; this is easy to understand: the timestamps determine the interval between adjacent video frames, i.e. the relative timing of the video frames, and presenting the frames at those intervals gives good results. In the same way, the audio stream can synchronize itself.

So how are the two media, audio and video, synchronized with each other? Let us see whether synchronization between the media can be achieved using only the RTP timestamps of the audio and video. The RTP timestamps of audio and video generally grow at different rates, but that does not matter: once the units are known, the two can be mapped onto a common timeline by unit conversion.

For the moment this approach seems to achieve synchronization, because audio and video are mapped onto the same timeline and the relative timing of audio frames and video frames is clear. But wait: the RTP specification requires the initial timestamp value to be random. Suppose, then, that the audio stream's initial timestamp is the random value 1234 and the video stream's initial timestamp is the random value 5678. Is it appropriate to map the audio timestamp 1234 and the video timestamp 5678 both to point 0 of the absolute time axis? What justification do we have for doing so? You might say: because those are the first audio frame and the first video frame, they can be mapped to the same point, just as before we mapped audio timestamp 0 and video timestamp 0 both to absolute time 0. However, the RTP specification nowhere states that the timestamp of the first video frame and the timestamp of the first audio frame must, or even should, correspond to the same point on the absolute time axis; this cannot be derived from the RTP specification, nor inferred from it.

So the mapping we just made is not correct. Why? Because it rests on an implicit assumption, one we would like to be true but which does not always hold: that the timestamps of the first video frame and the first audio frame correspond to the same instant, i.e. that regardless of their timestamp values, they should be played at the same time.

Inter-media synchronization cannot be achieved using RTP timestamps alone. The fundamental reason is that the audio timeline and the video timeline are completely independent: from the audio and video frame timestamps alone, the relative timing of a given video frame and a given audio frame cannot be determined. In other words, the two streams cannot both be placed accurately on the absolute time axis; each stream's frames can only be positioned relative to one another.

To achieve inter-media synchronization with RTP, the SR packets of RTCP are needed. An SR packet carries a <NTP time, RTP timestamp> pair. Using these pairs, the RTP timestamps of audio frames and video frames can each be mapped accurately onto the absolute NTP timeline, and the relative timing of audio frames and video frames can then be determined.
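A minimal sketch of this mapping, assuming a single <NTP time, RTP timestamp> pair per stream and ignoring timestamp wraparound (the function name is illustrative):

```c
#include <stdint.h>

/* Map an RTP timestamp of a stream onto the shared NTP timeline (in ms),
   given one <NTP time, RTP timestamp> pair taken from that stream's SR
   packet and the stream's timestamp clock rate. */
double rtp_to_ntp_ms(uint32_t rtp_ts,
                     uint32_t sr_rtp_ts, double sr_ntp_ms,
                     uint32_t clock_rate) {
    /* signed difference handles timestamps slightly before the SR instant */
    double delta = (double)(int32_t)(rtp_ts - sr_rtp_ts);
    return sr_ntp_ms + delta * 1000.0 / clock_rate;
}
```

Once both an audio frame and a video frame are mapped through their own SR pairs onto the same NTP axis, their relative timing is fully determined, even though their RTP timestamps started from unrelated random values.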

As noted above, the implicit assumption does not always hold, which also means that sometimes it does hold. Does that mean that when it holds, inter-media synchronization can be achieved without RTCP? The answer is: basically, yes.

For example, with RTP real-time streaming where the sender keeps the media well synchronized, only a little processing is needed at the receiver, and inter-media synchronization can be achieved without RTCP. Of course, these are only special cases: since the RTP specification does not include this assumption, we should follow the RTP specification.

Finally, a word about the timestamps of DirectShow and MPEG2-TS. DirectShow timestamps differ from RTP timestamps in their units and in how they are computed, but the essential difference is that in DirectShow the audio frames and video frames share the same time axis, so media synchronization can be achieved from the audio and video frame timestamps alone, with nothing else needed. MPEG2-TS streams also carry timestamps, which differ from both RTP and DirectShow timestamps; the audio and video timestamps in a TS stream likewise share the same time axis. Moreover, audio and video in a TS stream are multiplexed together, which itself provides a degree of synchronization, so not every frame carries a timestamp: for example, a PTS timestamp may appear only every 0.1 seconds, with the missing timestamps interpolated from the surrounding ones.


