Instant Messaging--A Detailed Look at Audio-Video Synchronization Technology

Source: Internet
Author: User

Transferred from: http://tieba.baidu.com/p/2138076570



Abstract: To solve the audio-video synchronization problems caused by delay, jitter, and varying network transmission conditions, a new audio-video synchronization scheme that adapts to different network conditions is designed and implemented. The scheme exploits the rate-selectable property of the AMR-WB and H.264 codecs in complex network environments, combines RTP timestamps with RTCP feedback to monitor QoS, and achieves audio-video synchronization in dynamic network environments by controlling the audio and video encoders. The design of the synchronization algorithm in both reliable and dynamic network environments is described in detail, and the feasibility of the scheme is verified by actual tests. The results show that the scheme can guarantee audio-video synchronization under different network conditions.


Introduction
Synchronization between audio and video media is an important topic in research on the quality of service (QoS) of multimedia systems. When multimedia data is transmitted over a network, the way terminals process the data, together with network delay and jitter, can cause the audio and video streams to fall out of step. Given the shortcomings of traditional solutions, such as poor real-time performance, high cost, and inability to adapt to dynamic network environments, this paper analyzes the definition of inter-media synchronization and the factors that influence it, and proposes a synchronization solution based on a circular buffer queue and RTCP feedback control.
1 Definition of Inter-Media Synchronization
Synchronization is a defining feature of multimedia communication and one of its important research topics; whether the media are synchronized directly affects the quality of multimedia communication. Inter-media synchronization means maintaining the temporal relationship between the audio stream and the video stream [1]. To describe synchronization and implement the related control mechanisms, corresponding quality-of-service (QoS) parameters are defined. For audio and video, the deviation is expressed as the time difference (skew) between the two streams: if the skew stays within a certain range, the media are considered synchronized. Experimental results show that when the skew is between -90 ms (audio lags video) and +20 ms (audio leads video), viewers perceive no change in quality; this range can be regarded as the synchronous region. When the skew goes beyond -185 ms or +90 ms, audio and video are seriously out of step; this range is regarded as the unsynchronized region. In this design, audio and video with a skew between -120 ms and +40 ms are considered synchronized.
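As an illustration, these regions can be expressed as a simple classification of the measured skew. The following is a minimal sketch (not from the original paper; the function name, the millisecond interface, and the "tolerable" label for the middle region are assumptions for illustration):

#include <cstdint>

enum class SyncRegion { InSync, Tolerable, OutOfSync };

// skewMs = audio presentation time - video presentation time,
// negative when audio lags video.
SyncRegion ClassifySkew(int32_t skewMs)
{
    if (skewMs >= -90 && skewMs <= 20)
        return SyncRegion::InSync;     // viewers notice nothing
    if (skewMs >= -185 && skewMs <= 90)
        return SyncRegion::Tolerable;  // noticeable but not severe
    return SyncRegion::OutOfSync;      // seriously out of step
}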
1.1 Factors Affecting Audio and Video Synchronization
In a networked environment, multimedia information is affected by a variety of factors during transmission, which can prevent it from being played back correctly at the receiving side; that is, audio and video fall out of sync. There are two main causes. The first is terminal processing: the capture, encoding, and packetization modules at the sender and the depacketization, decoding, and playback modules at the receiver introduce time differences because of the different data volumes and coding algorithms of audio and video, and the two sides share no unified synchronization clock. The second is network transmission delay: transmission is affected by the available bandwidth, the transmission distance, the processing speed of network nodes, and other factors, so under network congestion the media cannot be guaranteed continuous, stream-like delivery. In particular, the continuous delivery of high-volume video data cannot be guaranteed, causing loss of synchronization both between media streams and within a stream [2-3].
2 Audio and Video Synchronization System Design
In the audio-video synchronization system, the sender attaches a relative timestamp to every frame of the audio and video streams. One of the two streams serves as the master stream and the other as the slave stream: the master stream plays continuously, and the slave stream adjusts its playback to the master stream's playback state, thereby achieving synchronization. Since people are more sensitive to sound, this design selects the audio stream as the master and the video stream as the slave. The sender encodes the audio and video data captured through DirectShow with the AMR-WB and H.264 encoding modules, applies synchronization processing, and finally transmits and controls the media streams using RTP/RTCP and related protocols. When the receiver gets the audio and video packets from RTP, it decodes the data, applies synchronization processing, and plays the audio and video through DirectShow.
3 Audio and Video Synchronization Scheme Design
The traditional synchronization scheme synchronizes only at the receiving end through RTP timestamps: audio and video data bearing the same timestamp are presented simultaneously. This approach provides no effective control, does not adapt to different network environments, incurs a large overhead for reading and writing timestamps, and requires a network-wide synchronized clock, so it is not well suited to inter-media audio-video synchronization [4]. To solve this problem, this paper proposes a synchronization scheme applicable to different network conditions that combines RTP/RTCP with controllable audio-video coding at the sender. It manifests in two aspects: (1) the sender controls data capture, encoding, and transmission; (2) using RTCP feedback metrics, the audio-video coding dynamically adapts to different network environments.

3.1 RTP Timestamp Synchronization
When the network is unobstructed, the transmission delay is basically constant and jitter is very small; the audio and video frames remain essentially consistent between sender and receiver, and media data is rarely lost. Because there is no direct correlation control between the audio RTP session and the video RTP session, synchronization cannot be controlled through such correlation; in this case it is achieved mainly through the timestamps in the RTP packet headers.
On the sender side, timestamps are controlled within the same medium by dynamically adjusting the timestamp increment according to the audio sampling rate and the video frame rate, and across different media by assigning the same timestamp to data captured at the same moment and sending the audio and video packets alternately in the same thread.
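As an illustration, the per-frame timestamp increments follow directly from the RTP media clock rates: AMR-WB uses a 16 kHz RTP clock with 20 ms frames (RFC 4867), and video RTP streams conventionally use a 90 kHz clock (RFC 3551). A minimal sketch, with function names assumed for illustration rather than taken from the paper:

#include <cstdint>

// RTP clock rates: 16 kHz for AMR-WB, 90 kHz for video.
constexpr uint32_t kAudioClockHz = 16000;
constexpr uint32_t kVideoClockHz = 90000;

// One AMR-WB frame covers 20 ms of audio: 16000 * 0.020 = 320 ticks.
constexpr uint32_t AudioTimestampIncrement()
{
    return kAudioClockHz * 20 / 1000; // 320
}

// The video increment depends on the current frame rate,
// e.g. 25 fps -> 90000 / 25 = 3600 ticks per frame.
constexpr uint32_t VideoTimestampIncrement(uint32_t fps)
{
    return kVideoClockHz / fps;
}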
At the receiving end, when audio and video data arrive, the data frames are decoded and the decoded data is stored in their respective dynamic circular buffers. Because the decoding time of each audio or video frame cannot be obtained accurately, the scheme decodes first and synchronizes afterwards in order to replay audio and video in precise synchrony. When the network is unobstructed, the difference between the two streams' decoding times can be treated as part of the jitter delay; when network conditions are poor, however, this treatment is not used.
(1) The receiving end processes audio frames as follows:
Figure 1: Receiving audio frames (audio data arrival, decoding and storage, circular buffer of n nodes, timed extraction, audio playback)
As the figure shows, in order to eliminate jitter, the receiver uses a circular-buffer-based approach to ensure the continuity of audio. This method has two advantages: the cache space can be established dynamically according to the arrival of RTP data, and the cache always holds enough audio data for playback. When the receiver gets an audio frame, it first decodes it and stores it in a dynamic circular buffer whose node-count threshold n is chosen to cover more than the expected maximum jitter time. Before playback starts, the buffer is filled; audio frames are then extracted periodically for playback, and the timestamp of the frame currently being played is recorded.
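A minimal sketch of such a decoded-audio circular buffer, assuming a fixed node count n, one producer (the decoder) and one consumer (the playback timer); the names, sizes, and locking strategy are illustrative assumptions, not the paper's code:

#include <cstdint>
#include <cstring>
#include <mutex>

constexpr size_t kNodes = 16;        // threshold n, > expected max jitter
constexpr size_t kFrameBytes = 640;  // one 20 ms frame: 16 kHz, 16-bit mono

struct AudioRing {
    uint8_t  frames[kNodes][kFrameBytes];
    uint32_t timestamps[kNodes];
    size_t   head = 0, tail = 0, count = 0;
    std::mutex m;

    // Called when a decoded frame arrives; drops the oldest frame when full.
    void Push(const uint8_t *pcm, uint32_t ts) {
        std::lock_guard<std::mutex> lock(m);
        memcpy(frames[tail], pcm, kFrameBytes);
        timestamps[tail] = ts;
        tail = (tail + 1) % kNodes;
        if (count == kNodes) head = (head + 1) % kNodes; // overwrite oldest
        else ++count;
    }

    // Called by the playback timer; returns false when the buffer is empty.
    bool Pop(uint8_t *pcm, uint32_t *ts) {
        std::lock_guard<std::mutex> lock(m);
        if (count == 0) return false;
        memcpy(pcm, frames[head], kFrameBytes);
        *ts = timestamps[head];
        head = (head + 1) % kNodes;
        --count;
        return true;
    }
};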


(2) The receiving end processes video frames as follows:


Figure 2: Receiving video frames (video data arrival, decoding and storage, frame extraction, timestamp comparison, then play, discard, or delay)
As Figure 2 shows, when a video frame arrives, the receiver decodes it and stores the decoded data in the circular buffer. To avoid blocking effects in high-rate video, the system plays the video stream in an event-driven manner. When the buffer receives a video packet, the frame's timestamp Tvideo is compared with the timestamp Taudio of the audio data currently being played. In this design the tolerance for audio-video frame synchronization is tmax = 120 ms, so the handling of a video frame falls into one of three cases:
If Taudio - tmax < Tvideo < Taudio + tmax, the video frame is played.
If Tvideo < Taudio - tmax, the video frame lags and is discarded.
If Tvideo > Taudio + tmax, the video frame is ahead; it waits and is reconsidered the next time an audio frame is read.
The receiving end's implementation of video-frame synchronization is as follows:
void OnRTPPacket(RTPPacket *pack,
                 const RTPTime &receivetime,
                 const RTPAddress *senderaddress)
{
    // Copy the payload of the received video-stream RTP packet
    size_t buffsize = pack->GetPayloadLength();
    memset(m_buf, 0, MAX_PACKET_SIZE);
    memcpy(m_buf, (void *)pack->GetPayloadData(), buffsize);

    // Synchronization processing: compare the frame's timestamp
    // with the timestamp of the audio currently being played
    int issyn = m_pSynVideo->IsSynVideo(Taudio, m_buf);
    switch (issyn)
    {
    case 1:
        // Within tolerance: play the video frame
        m_pVideoOut->ReceiveVideo(m_buf, buffsize);
        break;
    case 2:
        // Video frame lags: discard the frame (do not render it)
        break;
    case 3:
        // Video frame is ahead: wait and reprocess on the next audio frame
        Wait(m_buf);
        break;
    }
}
3.2 RTCP Feedback Control
When the network environment is poor and RSVP cannot be provided to the system, the audio and video streams cannot be transmitted at their original rates without serious packet loss, so RTCP feedback control is needed: QoS is monitored using RTCP sender reports (SR) and receiver reports (RR) [5].

The receiver sends RR packets to the source, which contain the information needed to estimate packet loss and interarrival jitter. Based on this information, the source controls the amount of media data it sends and resolves synchronization problems promptly and effectively.
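For reference, RFC 3550 defines the interarrival jitter reported in RR packets as a running estimate J = J + (|D(i-1, i)| - J) / 16, where D is the difference between the relative transit times of two consecutive packets. A minimal sketch (struct and member names are assumptions for illustration):

#include <cstdint>

// Interarrival jitter estimate per RFC 3550, in RTP timestamp units.
// arrivalTs and rtpTs must be expressed in the same clock rate.
struct JitterEstimator {
    uint32_t prevTransit = 0;
    bool     first = true;
    double   jitter = 0.0;

    void OnPacket(uint32_t arrivalTs, uint32_t rtpTs) {
        uint32_t transit = arrivalTs - rtpTs;
        if (!first) {
            int32_t d = (int32_t)(transit - prevTransit);
            if (d < 0) d = -d;
            jitter += ((double)d - jitter) / 16.0; // J += (|D| - J) / 16
        }
        prevTransit = transit;
        first = false;
    }
};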
From the parameters in the RR packets, a long-term indicator (packet loss rate) and a short-term indicator (interarrival jitter) are derived. For audio, when the loss rate and jitter reach a certain region, a different AMR-WB bit rate is selected: lowering the audio rate improves transmission efficiency and system capacity and relieves the bandwidth burden on video transmission.
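AMR-WB defines nine bit rates from 6.60 to 23.85 kbit/s (3GPP TS 26.171). A hedged sketch of mapping the reported loss rate to a rate; the thresholds are illustrative assumptions, not the paper's values:

// The nine AMR-WB modes (kbit/s), per 3GPP TS 26.171.
static const double kAmrWbRates[] = {
    6.60, 8.85, 12.65, 14.25, 15.85, 18.25, 19.85, 23.05, 23.85
};

// Pick an AMR-WB rate from the RR loss fraction (0.0 - 1.0).
// The thresholds below are illustrative only.
double SelectAmrWbRate(double lossRate)
{
    if (lossRate > 0.10) return kAmrWbRates[0]; // 6.60 under heavy loss
    if (lossRate > 0.05) return kAmrWbRates[2]; // 12.65 under moderate loss
    if (lossRate > 0.01) return kAmrWbRates[5]; // 18.25 under light loss
    return kAmrWbRates[8];                      // 23.85 on a clean channel
}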
For video, the amount of data sent is adjusted according to the measured values; that is, the sender balances the video's spatial and temporal quality and selects frames to drop (see the sketch after this list):
(1) When the packet loss rate and jitter are very high, i.e. the channel rate is very low, the video frame rate is reduced so that each frame gets better spatial quality; even at a low rate, users still obtain good image quality.
(2) When the packet loss rate and jitter stay at a medium level, i.e. the channel rate is higher, temporal quality is given priority while a certain spatial quality is maintained, improving video continuity.
(3) When the packet loss rate and jitter return to a good level, i.e. the channel rate is high, spatial quality has already reached a certain level and further improving it is not very efficient; improving image continuity instead gives a more noticeable gain in video quality.
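A minimal sketch of this three-regime frame-rate adaptation; the regime boundaries and frame rates are assumptions for illustration, since the paper gives no concrete numbers:

// Choose a target frame rate from the reported loss rate and jitter (ms).
// Regime boundaries are illustrative assumptions.
int SelectVideoFps(double lossRate, double jitterMs)
{
    if (lossRate > 0.10 || jitterMs > 100.0)
        return 5;   // poor channel: few frames, better per-frame (spatial) quality
    if (lossRate > 0.02 || jitterMs > 40.0)
        return 15;  // medium channel: favor temporal continuity
    return 25;      // good channel: full frame rate for smooth motion
}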
4 Example: AnyChat
AnyChat uses dynamic buffering technology, adjusting the buffer size in real time according to network conditions to maintain a balance between low latency and fluency.
When network conditions are good, AnyChat reduces the buffer capacity to improve the real-time performance of audio and video.
When network conditions are poor, AnyChat increases the buffer capacity; this introduces some delay but guarantees smooth audio and video, effectively eliminating the impact of network jitter on playback quality.
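A generic sketch of such jitter-driven buffer sizing follows; this is an illustrative assumption, not AnyChat's actual implementation, whose internals are not public:

#include <algorithm>
#include <cstddef>

// Size the playout buffer from the current jitter estimate (ms).
// Frame duration and bounds are illustrative assumptions.
size_t TargetBufferFrames(double jitterMs)
{
    const double frameMs = 20.0;            // one audio frame = 20 ms
    const size_t minFrames = 2, maxFrames = 50;
    // Hold roughly twice the measured jitter's worth of audio.
    size_t frames = (size_t)(2.0 * jitterMs / frameMs + 1.0);
    return std::min(std::max(frames, minFrames), maxFrames);
}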
According to actual network tests, AnyChat's audio-video delay indicators are as follows:
Good network (no packet loss, network delay <= 10 ms): < 1 s
Average network (no packet loss, network delay <= 50 ms): 0.5 s to 1 s
Poor network (packet loss rate <= 5%, network delay <= 100 ms): <= 1.5 s
Good network (no packet loss, network delay < 10 ms): < 100 ms
Average network (no packet loss, network delay < 50 ms): <= 100 ms
Poor network (packet loss rate <= 5%, network delay < 100 ms): <= 250 ms
Poor network (packet loss rate <= 20%, network delay < 500 ms): <= 1100 ms
Note: The above figures were measured in speech mode. In singing mode, the kernel appropriately increases the buffer size to guarantee playback smoothness, which increases latency.
The AnyChat Platform Core SDK V4.6 is optimized for latency; in LAN environments, real-time HD video (720p, 25 fps) calls have a delay of less than 100 ms.
5 Conclusion
This paper designs and implements an audio-video synchronization scheme that adapts to different network environments. The design uses RTP timestamps and circular buffers to synchronize audio and video in a reliable network environment, and uses RTCP feedback control to dynamically adjust the audio-video coding under dynamic network conditions. The scheme has been successfully applied to a network multimedia terminal developed by the author, keeping the packet loss rate low and guaranteeing the transmission quality of multimedia information between terminals.
