Reposted from: http://blog.csdn.net/lixiaowei16/article/details/53407010
Audio and video synchronization bears directly on the most intuitive part of the user experience of a multimedia product, and it is a basic quality guarantee for the transmission, rendering and playback of audio and video media data. If audio and video are out of sync, the result may be lag, stutter and similar phenomena that seriously hurt the user experience, so synchronization is very important. Generally speaking, audio and video synchronization maintains the ordering of the media data on the timeline: audio and video data captured by the sender at a given instant should be played back and rendered together by the receiver at a later instant.
Based on an in-depth study of the WebRTC source code, this article analyzes the implementation details of its audio and video synchronization, including the generation of the RTP timestamp, the construction, sending and receiving of the RTCP SR message, and the initialization and execution of the synchronization process. The RTP timestamp is the cornerstone of the RTP packet, while the RTCP SR message is the reference for converting between RTP timestamps and NTP time. The details are described below.
I. Generation of the RTP Timestamp
Personally, I consider the RTP timestamp and the sequence number to be the essence of the RTP protocol: the former defines the sampling instant of the media payload and describes the inter-frame order of the data, while the latter defines the order of RTP packets and describes the intra-frame order of the media data. The RTP specification (RFC 3550) says the following about the timestamp:
"The timestamp reflects the sampling instant of the ' the ' the ' the ' the ' the ' the ' the ' the ' in ' RTP The sampling instant must is derived from a clock that increments monotonically and linearly in time to allow Synchronizat Ion and jitter calculations. The resolution of the clock must is sufficient for the desired synchronization accuracy and for measuring packet arrival J Itter (one tick per video frame was typically not sufficient). ”
From the definition above we know that the RTP timestamp reflects the sampling instant of the RTP payload data and is derived from a monotonically and linearly increasing clock. The resolution of that clock is determined by the sampling frequency of the RTP payload; for example, video typically uses a 90 kHz clock, so an increase of 1 in the timestamp corresponds to an increase of 1/90000 seconds in real time.
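As a minimal standalone illustration (plain C++, not WebRTC code), the following shows the tick/time relationship for a 90 kHz video clock:

// Minimal standalone illustration (not WebRTC code) of the 90 kHz tick/time relationship.
#include <cstdint>
#include <cstdio>

int main() {
  const int kVideoFrequencyHz = 90000;   // typical RTP clock rate for video
  const int64_t elapsed_ms = 33;         // roughly one frame interval at 30 fps
  // Ticks advanced during elapsed_ms: 33 ms * 90 ticks/ms = 2970 ticks.
  const int64_t ticks = elapsed_ms * (kVideoFrequencyHz / 1000);
  // One tick corresponds to 1/90000 s, i.e. about 0.0111 ms.
  const double ms_per_tick = 1000.0 / kVideoFrequencyHz;
  std::printf("ticks=%lld, ms per tick=%.4f\n", static_cast<long long>(ticks), ms_per_tick);
  return 0;
}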
Returning to the WebRTC source code, we take video capture as an example to analyze how the RTP timestamp is produced, as shown in Figure 1.
Figure 1 The RTP timestamp construction process
The video capture thread collects video data in units of frames. A video frame is captured through the system API and, after initial processing, arrives at the VideoCaptureImpl::IncomingFrame() function, which sets render_time_ms_ to the current time (which is effectively the sampling time).
The execution flow then reaches the VideoCaptureInput::IncomingCapturedFrame() function, which sets the timestamp, ntp_time_ms and render_time_ms of the video frame. Here render_time_ms is the current time in milliseconds; ntp_time_ms is the absolute NTP time of the sampling instant, also in milliseconds; and timestamp is the sampling instant expressed in units of 1/frequency seconds, i.e. the product of ntp_time_ms and the sampling frequency in ticks per millisecond. Therefore timestamp and ntp_time_ms are two representations of the same sampling instant.
The video frame is then encoded by the encoder and handed to the RTP module for RTP packetization and sending. When the RTP packet header is constructed, the RTPSender::BuildRTPheader() function determines the final timestamp value as rtp_header->timestamp = start_timestamp + timestamp, where start_timestamp is the initial timestamp set on the RTPSender at initialization time. Once the RTP packet has been constructed, it is sent to the peer over the network.
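The two steps can be summarized with a small sketch (illustrative only, not the actual WebRTC source; the helper names are made up):

// Illustrative sketch only (not the actual WebRTC source); helper names are made up.
#include <cstdint>

// Sampling instant expressed in units of 1/frequency seconds:
// ntp_time_ms (milliseconds) multiplied by the clock rate in ticks per millisecond.
uint32_t FrameTimestamp(int64_t ntp_time_ms, int frequency_hz) {
  // The result wraps modulo 2^32, exactly like the 32-bit RTP timestamp field.
  return static_cast<uint32_t>(ntp_time_ms * (frequency_hz / 1000));
}

// Final value written into the RTP header, as described above:
// rtp_header->timestamp = start_timestamp + frame timestamp (modulo 2^32).
uint32_t RtpHeaderTimestamp(uint32_t start_timestamp, uint32_t frame_timestamp) {
  return start_timestamp + frame_timestamp;
}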
II. Construction, Sending and Receiving of the RTCP SR Message
From the previous section we know that the NTP time and the RTP timestamp are different representations of the same instant, differing only in their units: NTP time is an absolute time in milliseconds, while the RTP timestamp depends on the sampling frequency of the media. We therefore need to maintain a correspondence between NTP time and RTP timestamps so that we can convert between the two. The SR (Sender Report) message defined by the RTCP protocol carries exactly this correspondence, as described in detail below.
2.1 Timestamp Initialization
During the initialization phase, the ModuleRtpRtcpImpl::SetSendingStatus() function obtains the timestamp representation of the current NTP time (ntp_time * frequency) and sets it as the start_timestamp parameter of both the RTPSender and the RTCPSender, i.e. the initial value used in the previous section when determining the final timestamp of the RTP packet header.
When the video data reaches the RTP module and an RTP packet is constructed, the frame's timestamp and the local capture time capture_time_ms are recorded, through the RTCPSender::SetLastRtpTime() function, into the last_rtp_timestamp_ and last_frame_capture_time_ms_ members of the RTCPSender object, to be used later when constructing the RTCP SR message.
2.2 Construction and Sending of the SR Message
WebRTC sends RTCP messages periodically from the ModuleProcessThread thread; SR packets are constructed by RTCPSender::BuildSR(ctx). The ctx argument carries the NTP time of the current instant, which is used as the NTP timestamp of the SR message. Next, the RTP timestamp for that same instant must be determined: assuming a frame of data were sampled at exactly this moment, its timestamp would be:
rtp_timestamp = start_timestamp_ + last_rtp_timestamp_ +
    (clock_->TimeInMilliseconds() - last_frame_capture_time_ms_) *
    (ctx.feedback_state_.frequency_hz / 1000);
With both the NTP time and the RTP timestamp in hand, the SR message can be constructed and sent.
2.3 SR Reception
After receiving an SR message, the receiving end records the NTP time and the RTP timestamp it contains in an RTCPSenderInfo object for use by other modules, which retrieve them through functions such as RTCPReceiver::NTP() or SenderInfoReceived().
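To make the role of this SR pair concrete, here is an illustrative sketch (plain C++, not the WebRTC API; the struct and function names are made up) of how a later RTP timestamp can be mapped back to an NTP time in milliseconds using the NTP time and RTP timestamp from the last received SR, assuming a known media clock rate:

// Illustrative sketch (not the WebRTC API); struct and function names are made up.
#include <cstdint>

struct SenderReportInfo {
  int64_t ntp_time_ms;     // NTP time carried in the SR, in milliseconds
  uint32_t rtp_timestamp;  // RTP timestamp carried in the same SR
};

// Maps an RTP timestamp (assumed not older than the SR's) to an estimated
// NTP time in milliseconds, given the media clock rate in Hz.
int64_t RtpToNtpMs(uint32_t rtp_timestamp, const SenderReportInfo& sr,
                   int frequency_hz) {
  const uint32_t tick_diff = rtp_timestamp - sr.rtp_timestamp;  // ticks since the SR
  return sr.ntp_time_ms + tick_diff / (frequency_hz / 1000);    // ticks -> milliseconds
}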
III. Audio and Video Synchronization
With the groundwork of the first two sections in place, this section analyzes the internal audio and video synchronization process of WebRTC in detail.
3.1 Initialization Configuration
The core of audio and video synchronization is to align the streams using the RTP timestamps carried in the media payload. Inside WebRTC, the basic objects of synchronization are AudioReceiveStream and VideoReceiveStream, which are matched to each other according to their sync_group. The initialization of the synchronization setup is shown in Figure 2.
Figure 2 Audio and video synchronization initialization configuration
When creating an AudioReceiveStream or VideoReceiveStream, the Call object invokes ConfigureSync() to configure audio and video synchronization. The configuration parameter is sync_group, which is specified when the PeerConnectionFactory creates the MediaStream. Inside ConfigureSync(), the AudioReceiveStream is looked up by sync_group, and the matching VideoReceiveStream is then found in video_receive_streams_. Once both media streams have been obtained, VideoReceiveStream::SetSyncChannel is called to set up synchronization, and the audio and video parameters are saved in the ViESyncModule::ConfigureSync() function, including voe_channel_id, voe_sync_interface, video_rtp_receiver and video_rtp_rtcp.
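The matching step can be illustrated with the following hypothetical sketch (the stream types and the helper function are placeholders, not the WebRTC classes), which pairs an audio receive stream with the video receive stream that shares its sync_group:

// Hypothetical sketch; the stream types and helper are placeholders, not WebRTC classes.
#include <string>
#include <vector>

struct AudioStream { std::string sync_group; };
struct VideoStream { std::string sync_group; };

// Returns the video receive stream that belongs to the same sync_group as
// |audio|, or nullptr if no matching video stream has been created yet.
VideoStream* FindSyncedVideo(const AudioStream& audio,
                             const std::vector<VideoStream*>& video_streams) {
  for (VideoStream* video : video_streams) {
    if (video->sync_group == audio.sync_group)
      return video;
  }
  return nullptr;
}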
3.2 Synchronization Process
The audio and video synchronization process runs in the ModuleProcessThread thread. ViESyncModule is registered as a module with the ModuleProcessThread, and its Process() function is invoked periodically by that thread to carry out the synchronization operation.
The core idea of audio and video synchronization is to use the NTP time and RTP timestamp carried in the RTCP SR messages as the time reference, and to compute the relative delay between the audio and video streams from the latest RTP timestamp received by the AudioReceiveStream and VideoReceiveStream and the corresponding local receive time receive_time_ms. Combined with the current delay of each stream, the final target delay is then computed and handed to the audio and video modules. The target delay serves as the minimum delay for audio and video rendering. The whole process is shown in Figure 3.
Figure 3 Audio and video synchronization process
First, the current video delay current_video_delay, i.e. the sum of video_jitter_delay, decode_delay and render_delay, is obtained from the VideoReceiver. Then the current audio delay current_audio_delay, i.e. the sum of audio_jitter_delay and playout_delay, is obtained from VoEVideoSyncImpl.
Next, audio and video each update their measurements using their respective rtp_rtcp and rtp_receiver objects. The basic operations are: obtain from rtp_receiver the RTP timestamp latest_timestamp of the newest received RTP packet and the corresponding local receive time latest_receive_time_ms, and obtain from rtp_rtcp the NTP time and RTP timestamp carried in the newest received RTCP SR message. This data is then stored in the measure. Note that the NTP time and RTP timestamp of the latest two RTCP SR messages are kept in the measure so that the sampling frequency of the media stream can be estimated in the next step.
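That frequency estimation can be sketched as follows (illustrative code, not the WebRTC implementation; the struct and function names are made up). Given the two (NTP time, RTP timestamp) pairs, the clock rate in ticks per millisecond is simply the ratio of their differences:

// Illustrative sketch (not the WebRTC implementation); names are made up.
#include <cstdint>

struct SrPair {
  int64_t ntp_time_ms;     // NTP time from one SR, in milliseconds
  uint32_t rtp_timestamp;  // RTP timestamp from the same SR
};

// Estimates the media clock rate in ticks per millisecond (about 90 for
// 90 kHz video) from the last two SR pairs; returns 0 if they are unusable.
double EstimateFrequencyKhz(const SrPair& older, const SrPair& newer) {
  const int64_t ntp_diff_ms = newer.ntp_time_ms - older.ntp_time_ms;
  if (ntp_diff_ms <= 0)
    return 0.0;
  const uint32_t ts_diff = newer.rtp_timestamp - older.rtp_timestamp;  // wrap-safe
  return static_cast<double>(ts_diff) / ntp_diff_ms;
}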
Next, the relative delay between the latest received audio and video data is computed. The basic procedure is: first obtain the NTP time latest_capture_time corresponding to the latest RTP timestamp latest_timestamp. The sampling frequency is estimated from the two (NTP time, RTP timestamp) pairs of the RTCP SRs stored in the measure, and then latest_capture_time = latest_timestamp / frequency (with frequency in ticks per millisecond) yields the sampling time in milliseconds. Finally the relative delay between audio and video is obtained as:
relative_delay = video_measure.latest_receive_time_ms -
    audio_measure.latest_receive_time_ms -
    (video_last_capture_time - audio_last_capture_time);
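The same computation can be expressed as a small self-contained sketch (again illustrative, not the WebRTC source; the struct names are made up), where the capture times have already been converted to NTP-based milliseconds:

// Illustrative sketch (not the WebRTC source); the struct names are made up.
#include <cstdint>

struct StreamMeasure {
  int64_t latest_receive_time_ms;  // local arrival time of the newest packet
  int64_t latest_capture_time_ms;  // estimated NTP capture time of that packet
};

// Relative delay of video with respect to audio, in milliseconds: a positive
// value means the newest video data spent more time "in flight" than the
// newest audio data captured around the same instant.
int64_t RelativeDelayMs(const StreamMeasure& video, const StreamMeasure& audio) {
  return (video.latest_receive_time_ms - audio.latest_receive_time_ms) -
         (video.latest_capture_time_ms - audio.latest_capture_time_ms);
}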
So far we have obtained three important parameters: the current video delay current_video_delay, the current audio delay current_audio_delay, and the relative delay relative_delay. These three parameters are then used to compute the target delays for audio and video: first compute the total relative delay current_diff = current_video_delay - current_audio_delay + relative_delay, which is smoothed with a weighted average over its historical values. If current_diff > 0, the current video delay is larger than the audio delay, so the video delay needs to be reduced or the audio delay increased; if current_diff < 0, the video delay needs to be increased or the audio delay reduced. After this adjustment we obtain the target delays audio_target_delay and video_target_delay.
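A simplified sketch of this adjustment is shown below (illustrative only: as described above, the real code also smooths current_diff with a weighted average and limits how much the delays may change per iteration; the way the correction is split between the two streams here is an assumption):

// Simplified sketch; the split of the correction between the streams is an
// assumption, and the smoothing/step limiting of the real code is omitted.
#include <algorithm>

struct TargetDelays {
  int audio_target_delay_ms;
  int video_target_delay_ms;
};

TargetDelays ComputeTargetDelays(int current_video_delay_ms,
                                 int current_audio_delay_ms,
                                 int relative_delay_ms) {
  // Positive current_diff: the newest video plays later than the matching
  // audio, so extra delay is added on the audio side (and vice versa).
  const int current_diff_ms =
      current_video_delay_ms - current_audio_delay_ms + relative_delay_ms;

  const int extra_audio_ms = std::max(0, current_diff_ms);
  const int extra_video_ms = std::max(0, -current_diff_ms);

  // The results are used as lower bounds on future playout delay; neither
  // stream drops below the delay it already needs for jitter buffering,
  // decoding and rendering.
  return {current_audio_delay_ms + extra_audio_ms,
          current_video_delay_ms + extra_video_ms};
}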
Finally, the target delays audio_target_delay and video_target_delay are set on the audio and video modules respectively, as lower bounds on the future rendering delay. This completes one audio and video synchronization operation, which is performed periodically in the ModuleProcessThread thread.
To sum up, this article has analyzed in detail the implementation of audio and video synchronization inside WebRTC, including the generation of the RTP timestamp, the construction, sending and receiving of the RTCP SR message, and the initialization and execution of the synchronization process. This should give the reader a deeper understanding of the RTP protocol, streaming media communication, and audio and video synchronization.