WebRTC Audio and Video Synchronization Method

Source: Wind Net series, 2016-11-25

Author: Weizhenwei, Wind Net columnist

Audio-video synchronization is the most direct aspect of the user experience in multimedia products, and the most basic quality guarantee for transmitting, rendering, and playing back audio and video media data. If audio and video fall out of sync, the resulting delay and stutter severely degrade the user experience, so synchronization is very important. In general, audio-video synchronization means preserving the timeline of the media data: audio and video data captured by the sender at the same instant should be played and rendered by the receiver at the same instant.


Based on a close study of the WebRTC source code, this article analyzes the implementation details of audio-video synchronization, including the generation of the RTP timestamp, the construction, sending, and receiving of the RTCP SR message, and the initialization configuration and execution of the synchronization process. The RTP timestamp is the cornerstone of the RTP packet, and the RTCP SR message is the baseline for converting between timestamps and NTP time. Both are described in detail below.

1. Generation of the RTP Timestamp

In my opinion, the RTP timestamp and sequence number are the essence of the RTP protocol: the former defines the sampling instant of the media payload and describes the inter-frame order of the payload data, while the latter defines the order of RTP packets and describes the intra-frame order of the media data. RFC 3550 says the following about the RTP timestamp:


"The timestamp reflects the sampling instant of the first octet in the RTP data packet. The sampling instant must is derived from a clock, increments monotonically and linearly in time to allow Synchronizat Ion and jitter calculations. The resolution of the clock must is sufficient for the desired synchronization accuracy and for measuring packet arrival J Itter (one tick per video frame is typically not sufficient). ”


The definition above shows that the RTP timestamp reflects the sampling instant of the RTP payload data and is derived from a monotonically and linearly increasing clock. The resolution of the clock is determined by the sampling frequency of the RTP payload; for video the clock rate is typically 90 kHz, so each increment of 1 in the timestamp corresponds to 1/90000 of a second of real time (at 30 fps, for example, the timestamp advances by 3000 ticks per frame).


Returning to the WebRTC source code, we take video capture as an example to analyze how the RTP timestamp is produced, as shown in Figure 1.



Figure 1 RTP Timestamp construction process


The video capture thread collects video data with the frame as the basic unit. After a video frame is obtained from the system API and undergoes preliminary processing, it arrives at the VideoCaptureImpl::IncomingFrame() function, which sets render_time_ms_ to the current time (which is effectively the sampling time).


Execution then reaches the VideoCaptureInput::IncomingCapturedFrame() function, which sets the frame's timestamp, ntp_time_ms, and render_time_ms. Here render_time_ms is the current time in milliseconds; ntp_time_ms is the absolute time of the sampling instant, in milliseconds; and timestamp is the RTP timestamp of the sampling instant, the product of ntp_time_ms and the sampling frequency, in units of 1/frequency seconds. Note that timestamp and ntp_time_ms are different representations of the same sampling instant.


The video frame is then encoded and handed to the RTP module for packetization and sending. When the RTP header is constructed, the RTPSender::BuildRTPheader() function determines the final value of the timestamp as rtp_header->timestamp = start_timestamp + timestamp, where start_timestamp is the initial timestamp set when the RTPSender is initialized. Once the RTP packet is constructed, it is sent to the peer over the network.
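The arithmetic is easy to state compactly. The sketch below is illustrative only (the helper name and plumbing are not the actual WebRTC code); it shows how the sampling time in milliseconds becomes the on-wire timestamp:

#include <cstdint>

// Hypothetical helper (not the actual WebRTC code): convert an NTP
// sampling time in milliseconds into an RTP timestamp. One tick equals
// 1/frequency_hz seconds, so there are frequency_hz / 1000 ticks per
// millisecond.
uint32_t RtpTimestampFromNtpMs(int64_t ntp_time_ms, int frequency_hz) {
  return static_cast<uint32_t>(ntp_time_ms * (frequency_hz / 1000));
}

// The value that goes on the wire additionally includes the sender's
// initial offset, e.g. for a 90 kHz video clock:
//   rtp_header->timestamp = start_timestamp + RtpTimestampFromNtpMs(ntp_time_ms, 90000);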

2. SR Message Construction, Sending, and Receiving

As discussed in the previous section, NTP time and the RTP timestamp are different representations of the same instant; the difference lies in their units. NTP time is absolute time in milliseconds, while the unit of the RTP timestamp depends on the sampling frequency of the media. Therefore, we need to maintain a correspondence between NTP time and RTP timestamp so that we can convert between the two. The SR (Sender Report) message defined by the RTCP protocol maintains exactly this correspondence, as described below.

2.1 Timestamp Initialization

During the initialization phase, the ModuleRtpRtcpImpl::SetSendingStatus() function gets the timestamp representation of the current NTP time (ntp_time * frequency) and sets it as the start_timestamp parameter of both RTPSender and RTCPSender, i.e. the initial value used in the previous section when determining the RTP header timestamp.
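In fragment form (member and function names are approximate, not the exact WebRTC API), the point is that one shared offset puts RTP packets and RTCP SR reports on the same timeline:

// On SetSendingStatus(): derive one offset, shared by both paths.
start_timestamp = ntp_time_ms * (frequency_hz / 1000);
rtp_sender_->SetStartTimestamp(start_timestamp);   // later used in BuildRTPheader()
rtcp_sender_->SetStartTimestamp(start_timestamp);  // later used in BuildSR()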

After encoding, when the video data reaches the RTP module to be packetized, the video frame's timestamp and local capture time capture_time_ms are recorded via the RTCPSender::SetLastRtpTime() function into the last_rtp_timestamp_ and last_frame_capture_time_ms_ members of the RTCPSender object, for later use when constructing RTCP SR messages.

2.2 SR Message Construction and Dispatch

WebRTC sends RTCP messages periodically from the ModuleProcessThread thread; SR messages are constructed via RTCPSender::BuildSR(ctx), where ctx carries the NTP time of the current moment, which becomes the NTP time in the SR message [1]. Next we need to compute the RTP timestamp corresponding to that same moment; that is, if a frame of data were sampled exactly now, its timestamp would be:


rtp_timestamp = start_timestamp_ + last_rtp_timestamp_ +
    (clock_->TimeInMilliseconds() - last_frame_capture_time_ms_) *
    (ctx.feedback_state_.frequency_hz / 1000);


In other words, the SR's RTP timestamp extrapolates from the last sent frame: the wall-clock time elapsed since that frame was captured is converted to ticks (at frequency_hz / 1000 ticks per millisecond) and added to that frame's timestamp. With the NTP time and the RTP timestamp both in hand, the SR message can be constructed and sent.

2.3 SR Reception

After receiving an SR message, the receiver records the NTP time and RTP timestamp it contains in an RTCPSenderInfo object for use by other modules, for example via the RTCPReceiver::NTP() or SenderInfoReceived() functions.
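This stored pair is what makes conversion possible on the receiving side. Below is a hedged sketch, not the actual WebRTC API, of how a receiver can map any later RTP timestamp back onto the sender's NTP timeline using one SR anchor (the struct and helper names are illustrative):

#include <cstdint>

// One SR report anchors the sender's RTP timeline to NTP time.
struct SrAnchor {
  int64_t ntp_ms;          // NTP time carried in the SR, in milliseconds
  uint32_t rtp_timestamp;  // RTP timestamp carried in the SR
};

// Map an RTP timestamp to sender NTP time, given the anchor and the
// stream clock rate in kHz (e.g. 90 for typical video).
int64_t RtpToNtpMs(uint32_t rtp_timestamp, const SrAnchor& sr,
                   int frequency_khz) {
  // A signed 32-bit difference tolerates timestamp wrap-around.
  int32_t ticks = static_cast<int32_t>(rtp_timestamp - sr.rtp_timestamp);
  return sr.ntp_ms + ticks / frequency_khz;
}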

3. Audio and Video Synchronization

The first two sections laid the necessary groundwork; this section analyzes the internal WebRTC audio-video synchronization process in detail.

3.1 Initialization Configuration

The core of audio-video synchronization is synchronizing on the RTP timestamps carried by the media payload. Within WebRTC, the basic objects of synchronization are AudioReceiveStream and VideoReceiveStream, paired with each other according to their sync_group. The initialization setup for synchronization is shown in Figure 2.



Figure 2 Audio and video synchronization initialization configuration


When the Audio/VideoReceiveStream is created, the Call object invokes ConfigureSync() to configure audio-video synchronization. The configuration parameter is sync_group, which is specified when PeerConnectionFactory creates the MediaStream. Inside ConfigureSync(), the sync_group is used to look up the AudioReceiveStream, and the matching VideoReceiveStream is then found in video_receive_streams. Once both media streams are obtained, VideoReceiveStream::SetSyncChannel is called to set up synchronization, and the audio and video parameters are saved in the ViESyncModule::ConfigureSync() function, including the audio side's voe_channel_id and voe_sync_interface, and the video side's video_rtp_receiver and video_rtp_rtcp.

3.2 Synchronization Process

The audio-video synchronization process runs in the ModuleProcessThread thread. ViESyncModule is registered as a module with the ModuleProcessThread thread, and its Process() function is called periodically by the thread to perform the synchronization operation.


The core idea of audio-video synchronization is to use the NTP time and RTP timestamp carried in RTCP SR messages as the time baseline. The AudioReceiveStream and VideoReceiveStream each take the RTP timestamp of their most recently received packet and the corresponding local receive time receive_time_ms as parameters to compute the relative delay between the audio and video streams. This is then combined with the current delays of audio and video to compute the final target delays, which are handed to the audio and video modules. The target delay serves as a lower bound on the rendering delay of audio and video. The entire process is shown in Figure 3.



Figure 3 Audio and video synchronization process


First, the current video delay current_video_delay is obtained from the VideoReceiver; it is the sum of video_jitter_delay, decode_delay, and render_delay. The current audio delay current_audio_delay, the sum of audio_jitter_delay and playout_delay, is then obtained from VoEVideoSyncImpl.
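In fragment form, matching the document's notation (illustrative, not the exact WebRTC code):

// Components of the two current delays, in milliseconds.
current_video_delay = video_jitter_delay + decode_delay + render_delay;
current_audio_delay = audio_jitter_delay + playout_delay;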


The audio and video sides then each update their synchronization measurements from their respective rtp_rtcp and rtp_receiver modules. The basic operations are: obtain from rtp_receiver the RTP timestamp latest_timestamp of the most recently received RTP packet and its local receive time latest_receive_time_ms; obtain from rtp_rtcp the NTP time and RTP timestamp carried in the most recently received RTCP SR message; then store this data in the measurement. Note that the last two (NTP time, RTP timestamp) pairs from RTCP SR messages are kept in the measurement, so that the sampling frequency of the media stream can be estimated in the next step.
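Why keep two SR pairs? Two anchor points on the (NTP time, RTP timestamp) line are enough to estimate the stream's clock rate. A hedged sketch (illustrative helper, not the actual WebRTC estimator):

#include <cstdint>

// Estimate the media clock rate, in ticks per millisecond, from the two
// most recent SR reports. For a 90 kHz video clock the result is ~90.
double EstimateFrequencyKhz(int64_t ntp_ms_old, uint32_t rtp_old,
                            int64_t ntp_ms_new, uint32_t rtp_new) {
  int32_t ticks = static_cast<int32_t>(rtp_new - rtp_old);  // wrap-safe
  return static_cast<double>(ticks) /
         static_cast<double>(ntp_ms_new - ntp_ms_old);
}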


Next, the relative delay of the most recently received audio and video data is computed. The basic flow is as follows: first obtain the NTP time latest_capture_time corresponding to the most recently received RTP timestamp latest_timestamp. This uses latest_timestamp together with the (NTP time, RTP timestamp) pairs of the RTCP SRs stored in the measurement: the two pairs are used to estimate the sampling frequency, and then latest_capture_time = latest_timestamp / frequency yields the sampling time in milliseconds (with frequency in ticks per millisecond). Finally the relative delay between audio and video is:


relative_delay = video_measure.latest_receive_time_ms -
    audio_measure.latest_receive_time_ms -
    (video_last_capture_time - audio_last_capture_time);


At this point we have three important parameters: the current video delay current_video_delay, the current audio delay current_audio_delay, and the relative delay relative_delay. These three are used to compute the target delays for audio and video. First compute the total relative delay current_diff = current_video_delay - current_audio_delay + relative_delay, smoothed with a weighted average over its historical values. If current_diff > 0, the current video delay is longer than the audio delay, so the video delay must be reduced or the audio delay increased; conversely, if current_diff < 0, the video delay must be increased or the audio delay reduced. After this adjustment we obtain the audio and video target delays audio_target_delay and video_target_delay.
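A simplified sketch of the adjustment step follows. This is not the exact WebRTC StreamSynchronization logic; the per-round cap and the function name are assumptions for illustration:

#include <algorithm>

// Move at most this many milliseconds per synchronization round, so the
// delays converge smoothly instead of jumping (illustrative value).
const int kMaxChangeMs = 80;

void AdjustTargetDelays(int current_video_delay, int current_audio_delay,
                        int relative_delay,
                        int* audio_target_delay, int* video_target_delay) {
  int current_diff = current_video_delay - current_audio_delay + relative_delay;
  if (current_diff > 0) {
    // Video lags audio: hold audio back a little (or let video catch up).
    *audio_target_delay += std::min(current_diff, kMaxChangeMs);
  } else if (current_diff < 0) {
    // Audio lags video: hold video back a little instead.
    *video_target_delay += std::min(-current_diff, kMaxChangeMs);
  }
}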


Finally, the resulting target delays audio_target_delay and video_target_delay are set on the audio and video modules respectively, as lower bounds on the future rendering delay. At this point one round of audio-video synchronization is complete; the operation is performed periodically in the ModuleProcessThread thread.

4. Summary

This article has analyzed in detail the implementation of audio-video synchronization inside WebRTC, including the generation of the RTP timestamp, the construction, sending, and receiving of the RTCP SR message, and the initialization configuration and execution of the synchronization process. Through this analysis we gain a deeper understanding of the RTP protocol, streaming media communication, and audio-video synchronization.
