Android/iOS WebRTC Audio and Video Development Summary (76): A Discussion of Low-Latency, Low-Traffic Fan Co-Streaming for Live Broadcasts


This article discusses WebRTC-based co-streaming for live broadcasts (speaker: Shi, CTO of Pro Gajun; editor: Dora). It was first published "here".

Please support original work: reprints must credit the source. You are welcome to follow the public account blacker (ID: blackerteam or WEBRTCORGCN).

The live-streaming industry continues to grow at full speed, as expected. After competing on latency, HD, beauty filters, instant start, and similar features, the major live platforms are now competing on a new hot feature: co-streaming (lianmai, which machine translation often renders as "even wheat", "MAI", or "Mac"). What is co-streaming? Simply put, the host interacts with one of the fans during the live broadcast, and the other fans can watch the interaction.

Co-streaming upgrades the interaction between host and fans from text chat to audio and video. The feature immediately increases fan participation, and because the other fans can see the interaction, the co-streaming fan finds it especially rewarding. The co-streaming flow is as follows:

1 During the live broadcast, the host opens an interactive session that users can join;
2 A user requests to join the interaction, and the host accepts one user's request;
3 The user joins the live broadcast, and the interaction between the user and the host is broadcast live to all other fans;
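
The three steps above can be sketched as a small state machine on the signaling side. This is only an illustration: the class, method, and state names are hypothetical and not from any real signaling library, and it assumes the host accepts one fan at a time.

```python
from enum import Enum, auto


class State(Enum):
    REQUESTED = auto()  # the fan asked to join and is waiting for the host
    ON_AIR = auto()     # the host accepted; the interaction is broadcast


class CoStreamSession:
    """Tracks which fan, if any, is currently interacting with the host."""

    def __init__(self):
        self.is_open = False  # step 1: has the host opened the session?
        self.states = {}      # user id -> State

    def open_session(self):
        self.is_open = True

    def request(self, user):
        # Step 2 (first half): a fan may only ask once the session is open.
        if not self.is_open or self.states.get(user) is State.ON_AIR:
            return False
        self.states[user] = State.REQUESTED
        return True

    def accept(self, user):
        # Step 2 (second half): the host accepts one pending request.
        if self.states.get(user) is not State.REQUESTED:
            return False
        if State.ON_AIR in self.states.values():
            return False  # assumption: only one fan on air at a time
        self.states[user] = State.ON_AIR  # step 3: the fan joins the broadcast
        return True

    def on_air(self):
        return [u for u, s in self.states.items() if s is State.ON_AIR]
```

A real system would also need timeouts, rejection, and hang-up handling, which are left out here.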

So how can a feature like this be implemented? Today we introduce several approaches.

The first approach: two RTMP streams

The protocol most commonly used for live streaming today is RTMP, a proprietary protocol from Adobe for transferring audio, video, and data between a Flash player and a server. It runs over TCP and multiplexes signaling and media over a single channel.

At present, domestic live-streaming CDNs mostly use this protocol, and its latency is about 3 seconds. Because an RTMP stream is unidirectional, implementing co-streaming with it requires both publishing and subscribing, i.e. two video streams. The schematic is as follows:


1 The host first publishes video to the streaming media server, and users pull the video from the streaming media server;
2 One of the users wants to co-stream with the host and sends a co-streaming request to the host through the signaling server; the host accepts the request;
3 The co-streaming fan publishes video to the streaming media server;
4 The host and the other users pull the co-streaming fan's video and display it on the phone as picture-in-picture;

In this scheme, the host and the co-streaming fan each publish one video stream, and the watching fans pull both streams at the same time. This is technically very simple to implement, but the experience has many problems:

First, the delay between the host and the co-streaming fan is too large. As noted, RTMP latency is about 3 seconds. If the host and the co-streaming fan hold a conversation, then in principle about 6 seconds pass between the host asking a question and hearing the answer, which is completely unacceptable for real-time interaction;

Second, the sound quality is poor and echo occurs. The audio processing module of an ordinary live-streaming client has no echo cancellation, so the host cannot play the co-streaming fan's audio through the speaker while watching the fan's video. Otherwise the audio capture device picks it up again and an echo forms;

Finally, the client receives two video streams, so traffic consumption is high. An ordinary viewer has to receive two streams to see both the host and the co-streaming fan, which raises traffic consumption, and decoding two streams also costs more CPU.
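
The 6-second figure in the first problem is simply two one-way trips: the question travels to the fan, then the answer travels back. A minimal worked example, using the 3-second one-way figure quoted above (the function name is hypothetical):

```python
def question_to_answer_delay(one_way_delay_s: float) -> float:
    """Time from asking a question to hearing the reply: one trip each way."""
    return 2 * one_way_delay_s


RTMP_ONE_WAY_S = 3.0  # approximate RTMP latency quoted in the text

print(question_to_answer_delay(RTMP_ONE_WAY_S))  # 6.0
```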

From the analysis above, this scheme is not an acceptable co-streaming solution. Co-streaming is highly demanding on latency, and RTMP clearly does not meet the requirement. A better solution keeps the interaction between the two or more co-streaming participants at video-conference quality, i.e. latency within 600 ms, mixes the video of the whole interaction, and outputs the result over RTMP. In other words, this solution involves two systems: a low-latency multi-party audio and video interaction system, and a standard CDN live-streaming system. You already know the live-streaming system, so the following highlights the characteristics of the low-latency interaction system:

1 A live-streaming system is a one-way data channel, whereas a low-latency video conferencing system is a two-way channel. This makes such a system harder to scale to high concurrency, and its network topology is more complex.

2 The transport layer of a low-latency system generally uses UDP, and the application layer uses RTP/RTCP so that packets are delivered promptly. For security, most systems use SRTP, which adds a layer of encryption and authentication on top of RTP. Client connections are established with ICE, which takes into account hosts that sit inside private networks: the two parties first gather as many candidate addresses as possible, including ones obtained via STUN and TURN, then prioritize the addresses and choose the best path. This also works well in scenarios that do not need NAT traversal, and it helps clients on different networks connect successfully. For example, when some overseas clients get poor quality connecting directly to a domestic server, they can be relayed through a TURN service to maintain quality of service;

3 Using UDP means dealing with network delay and packet loss, so QoS must be considered. The main strategies include:
a A jitter buffer is used to remove the jitter of network packets and deliver them at a steady rate to the downstream modules. Audio and video each need their own jitter buffer, and the two are then synchronized;
b For audio, a packet loss concealment algorithm is needed. The NetEQ algorithm from GIPS is arguably the industry's best VoIP anti-jitter algorithm, and it is now open source as part of the WebRTC project;
c For video, an adaptive feedback model is needed that adjusts the packet loss protection strategy according to network congestion. When the RTT is large, FEC can be used to protect the data; when the RTT is small, the NACK mechanism is chosen.
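
Strategies a and c can be sketched in a few lines. The code below is a simplified illustration, not how NetEQ or WebRTC implement them: the buffer depth and the RTT threshold are assumed values, sequence-number wraparound is ignored, and all names are hypothetical.

```python
import heapq


class JitterBuffer:
    """Reorders packets by sequence number and releases one per playout tick."""

    def __init__(self, depth=3):
        self.depth = depth    # packets to collect before playout starts (assumed)
        self.heap = []        # min-heap of (seq, payload)
        self.next_seq = None  # next sequence number owed to the decoder

    def push(self, seq, payload):
        if self.next_seq is not None and seq < self.next_seq:
            return  # too late: that slot was already played or concealed
        heapq.heappush(self.heap, (seq, payload))

    def pop(self):
        """Return the next payload, or None if the decoder should conceal."""
        if self.next_seq is None:
            if len(self.heap) < self.depth:
                return None  # still pre-buffering
            self.next_seq = self.heap[0][0]
        # Drop duplicates or stale entries left behind the playout point.
        while self.heap and self.heap[0][0] < self.next_seq:
            heapq.heappop(self.heap)
        payload = None
        if self.heap and self.heap[0][0] == self.next_seq:
            payload = heapq.heappop(self.heap)[1]
        self.next_seq += 1
        return payload


def choose_protection(rtt_ms, rtt_threshold_ms=100):
    """NACK while a retransmission can still arrive in time, FEC otherwise.

    The 100 ms threshold is an assumption for illustration only.
    """
    return "NACK" if rtt_ms < rtt_threshold_ms else "FEC"
```

The design trade-off is visible in `depth`: a deeper buffer absorbs more jitter but adds delay, which is why interactive systems keep it small and rely on concealment for the gaps.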

Next, based on the model discussed above, we introduce two ways to implement co-streaming, both of which deliver a good co-streaming experience. Their main difference is that one connects the host and the co-streaming fan with P2P technology, while the other supports co-streaming with a multi-party video conferencing system, as follows.

The second approach: P2P + live streaming. The schematic is as follows:

1 The host first publishes video to the streaming media server, and users pull the video from the streaming media server;
2 A fan requests to co-stream; a request pops up on the host side, the host chooses the fan to co-stream with, and the co-streaming fan and the host establish a P2P connection;
3 Once the P2P channel between the host and the co-streaming fan is established, they exchange audio and video data over it;
4 The host side captures the host's video from the camera, obtains the co-streaming fan's video from the P2P connection, mixes the two pictures, and passes the result to the live-streaming module, which publishes it;
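
The mixing in step 4 is essentially picture-in-picture compositing. The sketch below shows the idea on tiny frames represented as lists of pixel rows. It is an illustration only: a real client would composite YUV or texture data on the GPU, and the function names and the bottom-right placement are assumptions.

```python
def downscale(frame, factor):
    """Nearest-neighbour downscale: keep every `factor`-th pixel and row."""
    return [row[::factor] for row in frame[::factor]]


def picture_in_picture(main, inset, margin=1):
    """Return a copy of `main` with `inset` pasted near the bottom-right corner."""
    top = len(main) - len(inset) - margin
    left = len(main[0]) - len(inset[0]) - margin
    if top < 0 or left < 0:
        raise ValueError("inset does not fit inside the main frame")
    out = [row[:] for row in main]  # leave the camera frame untouched
    for y, row in enumerate(inset):
        out[top + y][left:left + len(row)] = row
    return out


# Usage: host camera frame (6 rows x 8 columns) plus the fan's P2P frame (4 x 4).
host_frame = [["H"] * 8 for _ in range(6)]
fan_frame = [["F"] * 4 for _ in range(4)]
mixed = picture_in_picture(host_frame, downscale(fan_frame, 2))
```

The mixed frame is what gets encoded and pushed to the CDN, so viewers only ever decode one stream.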

The advantages of this approach are:
1 The interaction delay between the host and the co-streaming fan is small. Because the two are connected P2P, the network latency is very low, generally on the order of a few hundred milliseconds, so the interaction between them is very smooth;
2 The sound quality is good. The host side uses an echo cancellation module, so the co-streaming fan's echo is removed, and the conversation between the host and the co-streaming fan is broadcast as a whole.

The problems with this approach are:
1 The host side effectively has two video uploads (the live stream plus the co-streaming interaction video) and one video download (the co-streaming fan's video), so the network requirements are higher. In our team's tests on ordinary China Telecom and China Unicom WiFi and 4G networks, the host-side bandwidth fully met the requirements;
2 It does not support several fans co-streaming at the same time;

The third approach: video conferencing + live streaming

To let multiple fans co-stream at the same time, consider using a video conferencing system between the host and the co-streaming fans, with an MCU (Multipoint Control Unit) forwarding the media data. The MCU then mixes the multiple streams and sends the mixed stream to the CDN. The schematic is as follows:

1 The host joins the video conferencing system. Note that the host no longer pushes video directly to the CDN;
2 The video conferencing system pushes the host's video stream to the CDN, and the audience watches the host through the CDN;
3 Audience members who take part in co-streaming log in to the same video conferencing channel as the host. From then on the host and the co-streaming fans interact through real-time video conferencing, and the server mixes their video and outputs it to the CDN;
4 Other users watch the interaction between the host and the co-streaming fans through the CDN;
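
For the audio half of the server-side mixing in step 3, a minimal sketch is to add the participants' 16-bit PCM samples and clamp the sum. The "mix-minus" helper, which gives each participant everyone's audio except their own, is common conferencing practice rather than something the article states. All names are hypothetical, and the tracks are assumed to share one sample rate.

```python
INT16_MIN, INT16_MAX = -32768, 32767


def mix_pcm16(*tracks):
    """Mix 16-bit PCM tracks sample by sample, clamping instead of wrapping."""
    if not tracks:
        return []
    length = max(len(t) for t in tracks)
    mixed = []
    for i in range(length):
        total = sum(t[i] for t in tracks if i < len(t))
        mixed.append(max(INT16_MIN, min(INT16_MAX, total)))
    return mixed


def mix_minus(tracks_by_user):
    """Per-participant mixes that leave out the listener's own audio."""
    return {
        user: mix_pcm16(*(t for u, t in tracks_by_user.items() if u != user))
        for user in tracks_by_user
    }


host = [1000, 30000, -30000]
fan = [500, 10000, -10000]
cdn_audio = mix_pcm16(host, fan)                 # full mix, sent on to the CDN
monitor = mix_minus({"host": host, "fan": fan})  # what each participant hears
```

Video mixing follows the same pattern but is far more expensive, because every input has to be decoded, composited, and re-encoded on the server.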

The advantages of this approach are:
1 The interaction delay between the host and the co-streaming fans is very small. With a video conferencing system, media is forwarded once through the server, and the delay is basically under one second;
2 The host side carries only the video conference traffic and does not have to upload the live stream as well, so its network requirements are lower than in the P2P approach;
3 Multi-party interaction is supported;

The downsides are:
1 Compared with an ordinary live-streaming system, the server side adds a video conferencing system, so development complexity is high;
2 Audio and video mixing is done on the server, so the performance requirements are high;
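
The traffic trade-offs of the three approaches can be compared by counting streams per role, as described in each section above. The table below restates those counts. The 800 kbps per-stream bitrate is a made-up figure for illustration, and treating every stream as equal in bitrate is a simplification.

```python
# (uploads, downloads) per role, counted from the three descriptions above.
STREAMS = {
    "two RTMP streams":  {"host": (1, 1), "fan": (1, 1), "viewer": (0, 2)},
    "P2P + live":        {"host": (2, 1), "fan": (1, 1), "viewer": (0, 1)},
    "conference + live": {"host": (1, 1), "fan": (1, 1), "viewer": (0, 1)},
}


def bandwidth_kbps(scheme, role, bitrate_kbps=800):
    """Return (uplink, downlink) in kbps, assuming equal-bitrate streams."""
    up, down = STREAMS[scheme][role]
    return up * bitrate_kbps, down * bitrate_kbps


for scheme in STREAMS:
    print(scheme, {role: bandwidth_kbps(scheme, role) for role in STREAMS[scheme]})
```

Under these assumptions the host's uplink doubles only in the P2P approach, and the viewer's downlink doubles only in the two-RTMP approach.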

That is a brief introduction to the ways of implementing co-streaming. All three have been used in real projects. In principle the latter two give a better experience, especially the third, which supports small-scale multi-party real-time interaction. However, the third approach takes a lot of development effort, teams familiar with both video conferencing and live streaming are scarce, and it places high demands on the R&D team. The second approach can be built on WebRTC and live-streaming technology, and teams that know both can try integrating them.


Question 1: Is co-streaming implemented on the client side or the server side? What are the pros and cons of each?
Answer 1: The second approach described above is implemented mainly on the client side, although the server also has some work to do. The third approach is implemented mainly on the server side. The advantages and disadvantages were covered above, so please refer to that discussion;

Question 2: Is there an open-source base for co-streaming?
Answer 2: The P2P approach can be built on WebRTC. For the video conferencing + live streaming approach we have not yet seen an open-source project; you could consider adapting a video conferencing system so that it outputs an RTMP live stream;

Question 3: How much bandwidth do the host and the user need, at minimum, for smooth co-streaming?
Answer 3: With the P2P approach, the host-side bandwidth requirement is higher. With the third approach (conference mode), the requirement is not high: basically one stream up and one stream down. Also, our tests on 4G and on 10 Mbps China Unicom and China Telecom networks were all fine;

Question 4: Did you develop your P2P implementation yourselves, or is it based on someone else's work?
Answer 4: We modified WebRTC. The video image from WebRTC has to be composited with the camera image, and when headphones are used, the audio also has to be mixed in software;

Question 5: Do you use STUN or ICE to deal with firewalls or NAT?
Answer 5: ICE is definitely used. In P2P mode many networks cannot connect directly, so a TURN service must act as a relay. In conference mode, TURN can also be used as a relay to fix unstable connections from distant networks;
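
The candidate prioritization that ICE performs, mentioned in the transport-layer discussion and in Answer 5, can be sketched with the priority formula and recommended type preferences from RFC 8445. The sketch only ranks individual candidates. Real ICE also forms candidate pairs and runs connectivity checks, which are omitted here, and the addresses are documentation placeholders.

```python
from dataclasses import dataclass

# Type preferences recommended by RFC 8445 (higher means preferred).
TYPE_PREFERENCE = {"host": 126, "prflx": 110, "srflx": 100, "relay": 0}


@dataclass
class Candidate:
    kind: str                # "host", "srflx" (found via STUN), "relay" (via TURN)
    address: str             # placeholder addresses for illustration
    local_pref: int = 65535  # preference among local interfaces
    component: int = 1       # 1 = RTP, 2 = RTCP

    @property
    def priority(self) -> int:
        # priority = 2^24 * type_pref + 2^8 * local_pref + (256 - component)
        return (
            (TYPE_PREFERENCE[self.kind] << 24)
            + (self.local_pref << 8)
            + (256 - self.component)
        )


def prioritize(candidates):
    """Order candidates from most to least preferred."""
    return sorted(candidates, key=lambda c: c.priority, reverse=True)


gathered = [
    Candidate("relay", "203.0.113.9"),
    Candidate("host", "192.0.2.1"),
    Candidate("srflx", "198.51.100.7"),
]
for c in prioritize(gathered):
    print(c.kind, c.address, c.priority)
```

Direct host candidates rank first and TURN relays last, so a relay is used only when the cheaper paths fail their checks or, as the answer notes, when it is deliberately chosen for a more stable route.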


Question 6: In each approach, if the client is disconnected, does the user have to go through the co-streaming request process again, or can the video session stay on hold and reconnect automatically?

Answer 6: It can reconnect without going through the co-streaming request process again;

Question 7: Why does the second approach not support several fans co-streaming at the same time?
Answer 7: P2P can also support multi-party interaction, but when many people communicate at once, the CPU load and network load on the host side become very high;

Question 8: Which codecs do you use for video and for audio?
Answer 8: The common scheme is H264 for video and AAC for audio. If both ends are under your control, H265 is recommended for its higher compression ratio;

Question 9: Which video conferencing system do you recommend for the third approach?
Answer 9: If you are interested, take a look at Licode.

Question 10: How many people does a development team for the third approach need, and how long is the typical development cycle?
Answer 10: It does not take many people; what matters most is understanding the video conferencing system well. If you adapt Licode, you need to implement RTMP push on the server side. If you are familiar with ffmpeg and similar tools, a basic version can be ready in about a month, but there is still a lot of work to do before it is truly stable;
