RTP provides end-to-end network transport functions suitable for applications that transmit real-time data, such as audio and video, over multicast or unicast network services. RTP does not address resource reservation and does not guarantee quality of service for real-time services. The data transport is augmented by a control protocol, RTCP, which allows data delivery to be monitored in a manner that scales to large multicast networks and which provides minimal control and identification functions. RTP and RTCP are designed to be independent of the underlying transport and network layers. The protocol supports the use of RTP-level translators and mixers.
RTP payload: The data transported in an RTP packet, such as audio samples or compressed video data.
RTP packet: A data packet consisting of the fixed RTP header, a possibly empty list of contributing sources, and the payload data.
RTCP packet: A control packet consisting of a fixed header followed by structured elements that vary with the RTCP packet type. Multiple RTCP packets can be sent together as a compound RTCP packet in a single packet of the underlying protocol; this is made possible by the length field in the fixed header of each RTCP packet.
Port: The abstraction that transport protocols use to distinguish among multiple destinations within a given host.
Transport address: The combination of a network address and a port that identifies a transport-level endpoint, for example an IP address and a UDP port. Packets are transmitted from a source transport address to a destination transport address.
RTP session: The association among a set of participants communicating with RTP. For each participant, the session is defined by a particular destination transport address (one network address plus a port pair for RTP and RTCP). The destination transport address may be common to all participants, as in IP multicast, or may be different for each. In a multimedia session, each medium is carried in a separate RTP session with its own RTCP packets. The RTP sessions are distinguished by different port pairs and/or different multicast addresses.
Synchronization source (SSRC): The source of a stream of RTP packets, identified by a 32-bit SSRC identifier carried in the RTP header so that it does not depend on the network address. All packets from a synchronization source form part of the same timing and sequence number space, so a receiver groups packets by synchronization source for playback. A synchronization source may change its data format, for example its audio encoding, over time. The SSRC identifier is a randomly chosen value that must be unique within a particular RTP session. A participant need not use the same SSRC identifier for all the RTP sessions in a multimedia session; the binding of the SSRC identifiers is provided through RTCP. If a participant generates multiple streams in one RTP session, for example from separate video cameras, each must be identified by a different SSRC.
Contributing source (CSRC): A source of a stream of RTP packets that has contributed to the combined stream produced by an RTP mixer. The mixer inserts into the header of each RTP packet it generates a list of the SSRC identifiers of the sources that contributed to that packet; this list is called the CSRC list. For example, in an audio conference the mixer combines the speech of all active talkers into one packet and lists them in the CSRC list, so the receiver can indicate the current talker even though all the audio packets carry the same SSRC identifier (that of the mixer).
End system: An application that generates the content sent in RTP packets or consumes the content of received RTP packets. An end system can act as one or more synchronization sources in a particular RTP session.
Mixer: An intermediate system that receives RTP packets from one or more sources, possibly changes the data format, combines the packets in some manner, and then forwards a new RTP packet. Because the timing among multiple input sources is generally not synchronized, the mixer makes timing adjustments among the streams and generates its own timing for the combined stream. All data packets originating from a mixer are therefore identified as having the mixer as their synchronization source.
Translator: An intermediate system that forwards RTP packets with their synchronization source identifier intact. Examples of translators include devices that convert encodings without mixing, replicators from multicast to unicast, and application-level filters in firewalls.
Monitor: An application that receives RTCP packets sent by participants in an RTP session, in particular the reception reports, and uses them to estimate the current quality of service, diagnose faults, and collect long-term statistics. The monitor function may be built into the applications participating in the session, or it may be a separate application that does not otherwise participate and neither sends nor receives RTP data packets. The latter is called a third-party monitor.
Wallclock time (absolute time): Represented using the timestamp format of the Network Time Protocol (NTP), which counts seconds relative to 0h UTC on 1 January 1900. The full-resolution NTP timestamp is a 64-bit unsigned fixed-point number with the integer part in the first 32 bits and the fractional part in the last 32 bits. Some fields use a more compact representation that carries only the middle 32 bits, that is, the low 16 bits of the integer part and the high 16 bits of the fractional part.
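The two NTP representations described above can be illustrated with a minimal Python sketch. The function names are hypothetical, and the constant 2208988800 (the number of seconds between 1 January 1900 and the Unix epoch of 1 January 1970) is an assumption brought in so that an ordinary Unix time can be used as input.

```python
# Seconds between the NTP epoch (1900-01-01) and the Unix epoch (1970-01-01).
NTP_UNIX_OFFSET = 2208988800


def unix_to_ntp64(unix_seconds: float) -> int:
    """Return a 64-bit NTP timestamp (32.32 fixed point) for a Unix time."""
    ntp = unix_seconds + NTP_UNIX_OFFSET
    seconds = int(ntp)                           # integer part, first 32 bits
    fraction = int((ntp - seconds) * (1 << 32))  # fractional part, last 32 bits
    return ((seconds & 0xFFFFFFFF) << 32) | (fraction & 0xFFFFFFFF)


def ntp64_middle32(ntp64: int) -> int:
    """Return the compact form: low 16 bits of the seconds, high 16 of the fraction."""
    return (ntp64 >> 16) & 0xFFFFFFFF
```

The compact form keeps a resolution of 1/65536 second and wraps roughly every 18 hours, which is enough for the short delay measurements it is used for.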
Sequence number: 16 bits
The sequence number increments by one for each RTP data packet sent, and the receiver may use it to detect packet loss and to restore packet order. The initial value of the sequence number is random (unpredictable) to make known-plaintext attacks on encryption more difficult.
Timestamp: 32 bits
The timestamp reflects the sampling instant of the first octet in the RTP data packet. The sampling instant must be derived from a clock that increments monotonically and linearly in time to allow synchronization and jitter calculations. The resolution of the clock must be sufficient for the desired synchronization accuracy and for measuring packet arrival jitter (one tick per video frame is typically not sufficient). As with the sequence number, the initial value of the timestamp is random. Several consecutive RTP packets have equal timestamps if they are (logically) generated at once, for example because they belong to the same video frame. Consecutive RTP packets may carry timestamps that are not monotonic if the data is not transmitted in the order it was sampled, as with MPEG interpolated video frames; the sequence numbers of the packets as transmitted are still monotonic.
SSRC: 32 bits
The SSRC field identifies the synchronization source. The identifier is chosen randomly so that no two synchronization sources within the same RTP session have the same SSRC identifier. Although the probability of multiple sources choosing the same identifier is low, all RTP implementations must be prepared to detect and resolve collisions. If a source changes its source transport address, it must also choose a new SSRC identifier to avoid being interpreted as a looped source.
CSRC list: 0 to 15 items, 32 bits each
The CSRC list identifies the contributing sources for the payload contained in this packet. The number of identifiers is given by the CC field. If there are more than 15 contributing sources, only 15 can be identified. CSRC identifiers are inserted by mixers, using the SSRC identifiers of the contributing sources.
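To make the header fields above concrete, here is a minimal Python sketch that reads them from a raw packet. It assumes the standard RTP fixed-header layout: a first byte holding the version, padding, extension, and CC bits, a second byte holding the marker bit and payload type, and then the sequence number, timestamp, and SSRC, followed by CC entries of the CSRC list. The names `RtpHeader`, `parse_rtp_header`, and `sequence_gap` are hypothetical, and header extensions and padding are ignored.

```python
import struct
from typing import NamedTuple, Tuple


class RtpHeader(NamedTuple):
    version: int
    payload_type: int
    sequence_number: int
    timestamp: int
    ssrc: int
    csrc: Tuple[int, ...]


def parse_rtp_header(packet: bytes) -> RtpHeader:
    """Parse the 12-byte fixed header and the CSRC list that follows it."""
    if len(packet) < 12:
        raise ValueError("RTP packet shorter than the 12-byte fixed header")
    first, second, seq, ts, ssrc = struct.unpack("!BBHII", packet[:12])
    cc = first & 0x0F                 # CC field: number of CSRC entries (0 to 15)
    end = 12 + 4 * cc
    if len(packet) < end:
        raise ValueError("truncated CSRC list")
    csrc = struct.unpack(f"!{cc}I", packet[12:end])
    return RtpHeader(first >> 6, second & 0x7F, seq, ts, ssrc, csrc)


def sequence_gap(previous: int, current: int) -> int:
    """Packets missing between two sequence numbers, modulo 2**16.

    Assumes in-order arrival; a reordered packet shows up as a very large gap.
    """
    return (current - previous - 1) & 0xFFFF
```

The modulo arithmetic in `sequence_gap` matters because the 16-bit sequence number wraps around quickly: going from 65535 to 0 is a gap of zero, not a loss.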
RTCP is based on the periodic transmission of control packets to all participants in the session, using the same distribution mechanism as the data packets. The underlying protocol must provide multiplexing of the data and control packets, for example by using separate port numbers with UDP. RTCP performs four functions:
Provide feedback on the quality of the data distribution;
Carry a persistent transport-level identifier for an RTP source, called the canonical name (CNAME);
Because every participant sends its control packets to all the others, each participant can independently observe the number of participants, which is used to calculate the rate at which the packets are sent;
Convey minimal session control information, for example participant identification to be displayed in the user interface. This function is optional.
SR: Sender report, carrying transmission and reception statistics from participants that are active senders.
RR: Receiver report, carrying reception statistics from participants that are not active senders.
SDES: Source description items, including CNAME.
BYE: Indicates the end of participation.
APP: Application-specific functions.
Each RTCP packet begins with a fixed header followed by structured elements whose length varies with the packet type but always ends on a 32-bit boundary. Because of this alignment and the length field in the fixed header, multiple RTCP packets can be concatenated without any intervening separators to form a compound RTCP packet, which is sent in a single packet of the underlying protocol. The compound packet carries no explicit count of the individual RTCP packets it contains, because the underlying protocol is expected to provide the overall length, which determines where the compound packet ends.
Each individual RTCP packet in the compound packet can be processed independently, with no requirement on the order or combination of packets. However, for the protocol to function, the following constraints are imposed:
Reception statistics (in SR or RR) should be sent as often as bandwidth constraints allow in order to maximize the resolution of the statistics, so each periodically transmitted compound RTCP packet must include a report packet.
New receivers need to receive the CNAME of a source as soon as possible in order to identify it, so each compound RTCP packet must also include the SDES CNAME.
The number of packet types that may appear first in a compound packet needs to be limited, to increase the number of constant bits in the first word and with it the probability of successfully distinguishing RTCP packets from misaddressed RTP data packets or other unrelated packets. A compound packet must therefore start with an SR or RR packet.
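The compound-packet rules above can be sketched in Python as follows. This is an illustration under stated assumptions, not a complete validator: it assumes the standard RTCP fixed header (one byte of version, padding, and count bits, one byte of packet type, and a 16-bit length counted in 32-bit words minus one), the packet type values 200 (SR), 201 (RR), and 202 (SDES), and SDES item type 1 for CNAME. The function names are hypothetical.

```python
import struct
from typing import List, Tuple

RTCP_SR, RTCP_RR, RTCP_SDES = 200, 201, 202
SDES_CNAME = 1


def split_compound(datagram: bytes) -> List[Tuple[int, int, bytes]]:
    """Return (packet_type, count, body) for each RTCP packet in a datagram."""
    packets = []
    offset = 0
    while offset < len(datagram):
        if len(datagram) - offset < 4:
            raise ValueError("truncated RTCP header")
        first, pt, length_words = struct.unpack_from("!BBH", datagram, offset)
        size = 4 * (length_words + 1)   # length field is in 32-bit words minus one
        if offset + size > len(datagram):
            raise ValueError("RTCP length field exceeds the datagram")
        packets.append((pt, first & 0x1F, datagram[offset + 4:offset + size]))
        offset += size                  # no separator: the next header follows directly
    return packets


def sdes_has_cname(body: bytes, chunk_count: int) -> bool:
    """Scan the chunks of an SDES packet body for a CNAME item."""
    pos = 0
    for _ in range(chunk_count):
        pos += 4                        # skip the chunk's SSRC/CSRC identifier
        while pos < len(body) and body[pos] != 0:
            if body[pos] == SDES_CNAME:
                return True
            if pos + 1 >= len(body):
                return False
            pos += 2 + body[pos + 1]    # item type, item length, item text
        pos = (pos // 4 + 1) * 4        # skip terminator and padding to a 32-bit boundary
    return False


def is_valid_compound(datagram: bytes) -> bool:
    """Check the two structural rules: SR or RR first, and an SDES CNAME present."""
    packets = split_compound(datagram)
    if not packets or packets[0][0] not in (RTCP_SR, RTCP_RR):
        return False
    return any(pt == RTCP_SDES and sdes_has_cname(body, count)
               for pt, count, body in packets)
```

Walking the datagram by length field alone is what makes separators unnecessary; the loop stops when the offset reaches the datagram length supplied by the lower layer.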
RTP is designed to allow an application to scale automatically over session sizes ranging from a few participants to thousands. Control traffic, however, is not self-limiting: if every participant sent reports at a constant rate, the control traffic would grow linearly with the number of participants, so the rate must be scaled down as the session grows. For each session, the data traffic is assumed to be subject to an aggregate limit called the "session bandwidth", which is divided among the participants and may be reserved and enforced by the network or simply be a reasonable share. Control traffic should be limited to a small fraction of the session bandwidth so that the primary function of the transport protocol, carrying data, is not impaired. It is recommended that the fraction of the session bandwidth allocated to RTCP be fixed at 5%, and that at least 1/4 of the RTCP bandwidth be dedicated to participants that are sending data.
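A simplified Python sketch of how a participant might derive its reporting interval from these recommendations follows. It uses the 5% and 1/4 figures from the text. The 3/4 share for receivers, the rule that the split applies only while senders make up at most a quarter of the members, and the 5-second minimum interval are assumptions taken from the interval computation in the RTP specification. The randomization that the full algorithm applies to the interval is deliberately omitted, and the function name is hypothetical.

```python
def rtcp_interval(session_bw_bps: float, members: int, senders: int,
                  we_sent: bool, avg_rtcp_size_bytes: float,
                  min_interval: float = 5.0) -> float:
    """Deterministic RTCP reporting interval in seconds (no randomization)."""
    rtcp_bw = 0.05 * session_bw_bps / 8.0   # 5% of the session bandwidth, in bytes/s
    n = members
    if senders <= 0.25 * members:
        if we_sent:
            rtcp_bw *= 0.25                 # senders share 1/4 of the RTCP bandwidth
            n = senders
        else:
            rtcp_bw *= 0.75                 # receivers share the remaining 3/4
            n = members - senders
    # Time for every participant in our group to send one average-sized packet.
    interval = avg_rtcp_size_bytes * n / rtcp_bw
    return max(interval, min_interval)
```

With a 64 kbit/s session and 100-byte RTCP packets, a receiver in a four-member session is held at the 5-second floor, while the same receiver in a 1000-member session reports only about every 333 seconds, so the total control traffic stays roughly constant instead of growing with the session.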
It is recommended that no more than 20% of the RTCP bandwidth allocated to a single participant be used to carry additional information beyond CNAME. It is therefore not expected that every application will include all SDES items. For example, an application may be designed to send only CNAME, NAME, and EMAIL. NAME might be given a higher priority than EMAIL because NAME would be displayed continuously in the application's user interface, whereas EMAIL would be displayed only on request. At every RTCP interval, an RR packet and an SDES packet with the CNAME item are sent. For a small session operating at the minimum interval, that is every 5 seconds on average. Every third interval, one extra item is included in the SDES packet: seven times out of eight this is the NAME item, and every eighth time it is the EMAIL item.
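The schedule in this example can be written down as a small Python sketch. The function name is hypothetical, and it assumes intervals are counted from 1 and that the EMAIL item takes the eighth of every eight extra slots; the text fixes only the ratio, not which slot is used.

```python
from typing import List


def sdes_items_for_interval(interval_index: int) -> List[str]:
    """SDES items to send in the given RTCP interval (counted from 1)."""
    items = ["CNAME"]                        # CNAME goes out in every interval
    if interval_index % 3 == 0:              # every third interval carries one extra item
        extra_slot = interval_index // 3     # 1, 2, 3, ... over successive extra slots
        items.append("EMAIL" if extra_slot % 8 == 0 else "NAME")
    return items
```

At the 5-second minimum interval this sends NAME roughly every 15 seconds and EMAIL once every 24 intervals, that is, about every 2 minutes.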
RTCP header:
Reception report count (RC): 5 bits
The number of reception report blocks contained in this packet.
Length: 16 bits
The length of this RTCP packet in 32-bit words minus one, including the header and any padding.
Packet type (PT): 8 bits
Contains the constant 200 to identify the packet as an RTCP SR packet.
SSRC: 32 bits
The synchronization source identifier of the originator of this SR packet.
RTP timestamp: 32 bits
Corresponds to the same instant as the NTP timestamp, but is expressed in the same units and with the same random offset as the RTP timestamps in data packets. This correspondence can be used for intra-media and inter-media synchronization of sources whose NTP timestamps are synchronized, and it allows media-independent receivers to estimate the nominal RTP clock frequency.
Sender's octet count: 32 bits
The total number of payload octets (not including headers or padding) transmitted in RTP data packets by the sender from the start of transmission until the time this SR packet was generated. The count is reset if the sender changes its SSRC identifier. This field can be used to estimate the average payload data rate.
SSRC_n (source identifier): 32 bits
The SSRC identifier of the source to which the information in this reception report block pertains.
Fraction lost: 8 bits
The fraction of RTP data packets from source SSRC_n lost since the previous SR or RR packet was sent, expressed as a fixed-point number with the binary point at the left edge of the field. It is defined as the number of packets lost divided by the number of packets expected.
Cumulative number of packets lost: 24 bits
The total number of RTP data packets from source SSRC_n that have been lost since the beginning of reception.
Extended highest sequence number received: 32 bits
The low 16 bits contain the highest sequence number received in an RTP data packet from source SSRC_n. The high 16 bits extend that sequence number with the corresponding count of sequence number cycles, that is, the number of times the 16-bit sequence number has wrapped around.
Interarrival jitter: 32 bits
An estimate of the statistical variance of the RTP data packet interarrival time, measured in timestamp units.
Last SR timestamp (LSR): 32 bits
The middle 32 bits of the NTP timestamp of the most recent RTCP SR packet received from source SSRC_n. If no SR has been received yet, the field is set to zero.
Delay since last SR (DLSR): 32 bits
The delay, expressed in units of 1/65536 seconds, between receiving the last SR packet from source SSRC_n and sending this reception report block.
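The reception report block fields listed above can be decoded with a short Python sketch. It assumes the block is exactly 24 bytes, with the fields in the order given, and that fraction lost and the cumulative loss count share one 32-bit word. The cumulative count is sign-extended on the assumption, taken from the RTP specification, that it may go negative when duplicates are received. The names `ReportBlock` and `parse_report_block` are hypothetical.

```python
import struct
from typing import NamedTuple


class ReportBlock(NamedTuple):
    ssrc: int
    fraction_lost: float     # 0.0 to just under 1.0
    cumulative_lost: int
    seq_cycles: int          # high 16 bits of the extended sequence number
    highest_seq: int         # low 16 bits of the extended sequence number
    jitter: int              # in RTP timestamp units
    lsr: int                 # middle 32 bits of the last SR's NTP timestamp
    dlsr_seconds: float


def parse_report_block(block: bytes) -> ReportBlock:
    """Decode one 24-byte reception report block."""
    if len(block) != 24:
        raise ValueError("a reception report block is exactly 24 bytes")
    ssrc, lost_word, ext_seq, jitter, lsr, dlsr = struct.unpack("!IIIIII", block)
    fraction = (lost_word >> 24) / 256.0   # 8-bit fixed point, binary point at the left edge
    cumulative = lost_word & 0xFFFFFF
    if cumulative & 0x800000:              # sign-extend the 24-bit value
        cumulative -= 1 << 24
    return ReportBlock(ssrc, fraction, cumulative, ext_seq >> 16,
                       ext_seq & 0xFFFF, jitter, lsr, dlsr / 65536.0)
```

For instance, a fraction lost byte of 64 decodes to 0.25, meaning one packet in four was lost over the last reporting interval.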
So far we have assumed that all participants want to receive media data in the same format. This is not always appropriate. Consider a conference in which most participants are connected through high-speed networks while one participant joins over a low-speed link. Instead of forcing everyone to use a lower-bandwidth, reduced-quality audio encoding, an RTP-level relay called a mixer can be placed near the low-bandwidth area. The mixer resynchronizes the incoming audio packets, mixes the reconstructed audio streams into a single stream, translates the audio encoding to a lower-bandwidth one, and forwards the resulting packet stream across the low-speed link. The RTP header includes a means for the mixer to identify the sources that contributed to a mixed packet, so that the receiver can correctly indicate who is speaking.
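The way a mixer labels its output can be sketched in Python as follows. This is only an illustration of the header construction, not of the audio mixing itself. It assumes the standard RTP fixed-header layout with version 2 and no padding, extension, or marker bit; the function name and parameters are hypothetical.

```python
import struct
from typing import Iterable


def build_mixer_header(mixer_ssrc: int, seq: int, timestamp: int,
                       payload_type: int,
                       contributing_ssrcs: Iterable[int]) -> bytes:
    """Build an RTP header whose SSRC is the mixer and whose CSRC list names the talkers."""
    csrc = list(contributing_ssrcs)[:15]   # CC is a 4-bit field, so at most 15 entries
    first = (2 << 6) | len(csrc)           # version 2, no padding or extension, CC
    second = payload_type & 0x7F           # marker bit clear
    return struct.pack(f"!BBHII{len(csrc)}I", first, second,
                       seq & 0xFFFF, timestamp & 0xFFFFFFFF, mixer_ssrc, *csrc)
```

A receiver that parses this header sees one synchronization source (the mixer) and can still display the individual speakers from the CSRC list.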
Some of the intended participants in the audio conference may be connected by high-bandwidth links but not be directly reachable via IP multicast. For example, they may be behind an application-level firewall that does not let any IP packets pass. For these sites mixing may not be necessary, and another type of RTP-level relay, called a translator, can be used instead. Two translators are installed, one on each side of the firewall. The outside translator funnels all the multicast packets it receives through a secure connection to the translator inside the firewall, which sends them out again as multicast packets to a multicast group restricted to the site's internal network.