Analysis of QoS Key Technologies in Real-Time Video Applications

Source: http://www.aiweibang.com/m/detail/104476372.html?from=p
As the WebRTC standard gains adoption, real-time audio and video communication technology has drawn increasing attention from companies and engineers. For interactive audio/video applications, stability, low latency, and clear, reliable call quality are the baseline requirements. Over the Internet, the quality of voice and video communication depends on three groups of factors: first, encoding parameters such as bitrate, frame rate, and resolution; second, the network access type and the capabilities of the access device; third, adaptive handling of packet loss, jitter, out-of-order delivery, and network congestion, i.e. QoS (Quality of Service). As one of the earliest and most capable communications PaaS providers, Jong Communication pays particular attention to guaranteeing QoS and improving the user experience when delivering audio/video calling as a core capability. This article introduces the key technologies used during audio/video transmission and processing to guarantee QoS.
Interactive real-time video applications typically use the RTP protocol for audio and video transmission. The RTP header carries the payload type, timestamp, sequence number, and synchronization source (SSRC), which satisfy the basic requirements of audio/video transport. Unlike TCP, however, RTP runs over the unreliable UDP transport layer, so when the network is overloaded or congested it provides no adaptive handling of packet loss, jitter, reordering, or congestion on its own. Compared with audio, video requires much more bandwidth and is therefore more susceptible to changes in network conditions, so the analysis below uses video as the example for improving QoS.
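The RTP header fields mentioned above are defined in RFC 3550. As a minimal sketch, the fixed 12-byte header can be parsed like this (extension headers and CSRC entries are ignored for brevity):

```python
import struct

def parse_rtp_header(packet: bytes) -> dict:
    """Parse the fixed 12-byte RTP header (RFC 3550)."""
    if len(packet) < 12:
        raise ValueError("packet too short for an RTP header")
    b0, b1, seq, ts, ssrc = struct.unpack("!BBHII", packet[:12])
    return {
        "version": b0 >> 6,           # always 2 for RTP
        "padding": bool(b0 & 0x20),
        "extension": bool(b0 & 0x10),
        "csrc_count": b0 & 0x0F,
        "marker": bool(b1 & 0x80),
        "payload_type": b1 & 0x7F,    # identifies the codec payload
        "sequence_number": seq,       # detects loss and reordering
        "timestamp": ts,              # drives jitter estimation and sync
        "ssrc": ssrc,                 # synchronization source
    }
```

The sequence number and timestamp are exactly the fields the QoS mechanisms below rely on: gaps in sequence numbers reveal loss, and timestamp spacing feeds jitter and delay estimation.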
1. Packet Loss Handling
For real-time video, network packet loss directly causes mosaic artifacts and corrupted frames on the receiving side. Several strategies address this, including packet retransmission based on NACK feedback, forward error correction (FEC), and reference picture selection (RPS). These are often combined with fault-tolerance techniques at the codec end, such as intra-frame refresh and error concealment.
Packet retransmission based on NACK feedback: the receiver loop checks the receive buffer, and when a lost packet is detected it reports the loss to the sender in an RTCP NACK feedback message; the sender then retrieves the corresponding RTP packet from its send cache and retransmits it to the receiver. The drawback of this method is that it increases end-to-end delay, noticeably so when packet loss is heavy.
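The core of the receiver-side check is finding gaps in the RTP sequence numbers seen so far. A minimal sketch (ignoring 16-bit sequence wrap-around for brevity):

```python
def missing_sequence_numbers(received):
    """Return the sequence numbers to request via RTCP NACK,
    i.e. the gaps between the RTP sequence numbers seen so far.
    Wrap-around of the 16-bit sequence space is ignored for brevity."""
    if not received:
        return []
    seen = sorted(set(received))
    missing = []
    for lo, hi in zip(seen, seen[1:]):
        missing.extend(range(lo + 1, hi))
    return missing
```

A real implementation would also cap how long a packet stays on the NACK list (beyond some age, retransmission arrives too late to be useful) and honor the 16-bit wrap.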
Forward error correction (FEC): the sender transmits redundant RTP packets according to the importance of each video frame (reference frame or non-reference frame). If the receiver detects packet loss, it uses the redundant packets to recover the lost data; otherwise the redundant packets are discarded. The advantage of this method is that it adds no video delay, but the redundant packets consume additional bandwidth.
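The simplest form of this idea is a single XOR parity packet over a group of media packets, which can recover any one lost packet in the group. A minimal sketch of the principle (real schemes such as ULPFEC add headers and uneven protection levels):

```python
def xor_parity(packets):
    """Compute one XOR parity packet over a group of media packets.
    Shorter packets are treated as zero-padded to the longest length."""
    parity = bytearray(max(len(p) for p in packets))
    for p in packets:
        for i, byte in enumerate(p):
            parity[i] ^= byte
    return bytes(parity)

def recover_missing(received, parity):
    """Recover the single missing packet (the None entry in `received`)
    by XOR-ing the parity packet with all packets that did arrive."""
    out = bytearray(parity)
    for p in received:
        if p is None:
            continue
        for i, byte in enumerate(p):
            out[i] ^= byte
    return bytes(out)
```

If no packet in the group is lost, the parity packet is simply discarded, which is exactly the overhead trade-off described above.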
A more practical scheme is the hybrid NACK/FEC mode: the receiver estimates the available bandwidth from frame sizes and receive delays, and the sender uses the feedback on available bandwidth, packet loss, and RTT to compute the split between protection overhead (the FEC bitrate plus the NACK retransmission bitrate) and the video encoding bitrate. Specifically, the FEC protection level depends on the round-trip time (RTT): when the RTT is small, the delay introduced by packet retransmission does not cause noticeable video stalls, so the number of FEC packets can be reduced accordingly; when the RTT is large, retransmission delay clearly affects video smoothness, so the number of FEC packets should be increased. In addition, multi-frame FEC and FEC that incorporates temporal-layer information can reduce protection overhead while providing lower rendering jitter, lower end-to-end latency, and higher video quality.
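The RTT-dependent split described above can be sketched as a simple heuristic. The thresholds and scaling factors below are illustrative assumptions, not the tuned tables used by any particular implementation:

```python
def fec_protection_factor(loss_rate, rtt_ms, low_rtt=80, high_rtt=200):
    """Illustrative heuristic: fraction of the video bitrate to spend on FEC.

    With a small RTT, NACK retransmission is cheap, so FEC overhead is
    reduced; with a large RTT, FEC carries more of the protection burden.
    All constants here are assumptions for the sketch."""
    base = min(loss_rate * 2.0, 0.5)   # cap overhead at 50% of the video bitrate
    if rtt_ms <= low_rtt:
        return base * 0.5              # lean on NACK retransmission
    if rtt_ms >= high_rtt:
        return base                    # lean on FEC
    # linear blend between the two regimes
    t = (rtt_ms - low_rtt) / (high_rtt - low_rtt)
    return base * (0.5 + 0.5 * t)
```

The key property is monotonicity: for the same loss rate, a larger RTT yields a larger FEC share of the protection budget.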
2. Congestion Control and Adaptive Bandwidth Adjustment
Congestion control has a long history; the TCP protocol stack implements network congestion control by default to guarantee reliable transmission. But TCP is unsuitable in some scenarios, such as wireless channels, high-speed long-distance networks, and real-time communication applications. To that end, the IETF RMCAT (RTP Media Congestion Avoidance Techniques) working group has proposed a series of congestion control algorithms for real-time communication applications, with requirements including: effective control of end-to-end latency, effective control of packet loss, fair sharing of link bandwidth with other application flows, the ability to compete fairly with long-lived TCP flows, and full use of the available link bandwidth. Companies such as Google, Cisco, and Ericsson have proposed their own congestion control algorithms for real-time interactive applications; the open-source WebRTC project internally implements Google's algorithm, Google Congestion Control (GCC).
The GCC algorithm combines a loss-based method with a delay-based method. It works as follows:
The sender adjusts the target bitrate according to the packet loss rate: at low loss (below 2%) the target bitrate is increased; at high loss (above 10%) it is decreased; and between the two thresholds the target bitrate is held unchanged.
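The loss-based rule above maps directly to the update described in the GCC draft: a 5% multiplicative increase under low loss, a hold region, and a multiplicative decrease proportional to the loss rate under high loss:

```python
def loss_based_target(prev_bps, loss_rate):
    """Sender-side loss-based rate update as described in the GCC draft:
    +5% when loss < 2%, hold between 2% and 10%,
    multiplicative decrease proportional to loss when loss > 10%."""
    if loss_rate < 0.02:
        return prev_bps * 1.05
    if loss_rate > 0.10:
        return prev_bps * (1.0 - 0.5 * loss_rate)
    return prev_bps
```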
The receiver estimates the maximum bandwidth from delay measurements using three modules: queuing-delay estimation, link over-use detection, and maximum bandwidth estimation. They interact as follows: when the queuing delay is below a threshold (which adapts to network conditions), the detector reports under-use; when the queuing delay exceeds the threshold, the detector reports over-use; in between, the detector reports normal. The maximum bandwidth estimation module is a finite state machine whose states represent the current link handling (increase, hold, decrease), with hold as the initial state; state transitions are driven by the detector's output, and the maximum bandwidth estimate REMB is computed from the resulting state and the current receive bitrate.
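The detector-plus-state-machine structure can be sketched as follows. The transition rules and the multiplicative factors (1.08 increase, back-off to 0.85 of the measured receive rate) follow the values given in the GCC draft; the rest is a simplified illustration:

```python
def detect(queuing_delay_ms, threshold_ms):
    """Over-use detector: classify the link from the queuing-delay estimate."""
    if queuing_delay_ms > threshold_ms:
        return "overuse"
    if queuing_delay_ms < -threshold_ms:
        return "underuse"
    return "normal"

def next_state(state, signal):
    """State machine of the max-bandwidth estimator (increase/hold/decrease)."""
    if signal == "overuse":
        return "decrease"
    if signal == "underuse":
        return "hold"
    # signal == "normal"
    return {"hold": "increase", "increase": "increase", "decrease": "hold"}[state]

def update_estimate(state, estimate_bps, incoming_bps):
    """Update the REMB estimate from the current state and receive bitrate."""
    if state == "increase":
        return estimate_bps * 1.08       # multiplicative increase
    if state == "decrease":
        return 0.85 * incoming_bps       # back off below the measured rate
    return estimate_bps                  # hold
```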
Combining the two: the receiver computes the REMB value and feeds it back to the sender in an RTCP REMB message, and the sender's final target bitrate must not exceed the REMB value.
3. Key Frame Request
Key frames are also called instantaneous decoder refresh frames, or IDR frames. Decoding an IDR frame does not require reference to any earlier frame, so when packet loss is severe, the picture can be restored by requesting a key frame. There are three ways to request one: RTCP FIR feedback (Full Intra Request), RTCP PLI feedback (Picture Loss Indication), or a SIP INFO message; which one to use can be determined during session negotiation.
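As a minimal sketch, an RTCP PLI message is a 12-byte payload-specific feedback packet (RFC 4585): packet type 206 with feedback message type (FMT) 1, followed by the sender and media SSRCs:

```python
import struct

def build_rtcp_pli(sender_ssrc: int, media_ssrc: int) -> bytes:
    """Build an RTCP PLI (Picture Loss Indication) packet per RFC 4585:
    payload-specific feedback, PT=206, FMT=1, no FCI payload."""
    v_p_fmt = (2 << 6) | 1   # version 2, no padding, FMT=1 (PLI)
    length = 2               # packet length in 32-bit words, minus one
    return struct.pack("!BBHII", v_p_fmt, 206, length, sender_ssrc, media_ssrc)
```

On receiving a PLI, the sender's encoder is asked to produce a new IDR frame, refreshing the decoder state at the far end.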
4. Other
Besides the methods above, a video pre-processing module can analyze content properties such as motion complexity and texture complexity, and cooperate with the congestion control module to adaptively adjust the frame rate and resolution.
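A common way to realize such adaptive adjustment is a bitrate ladder mapping the congestion controller's target bitrate to a capture mode. The thresholds below are illustrative assumptions, not values from any particular implementation:

```python
# Illustrative bitrate ladder: (minimum target bps, resolution, frame rate).
# Entries are ordered from highest to lowest bitrate requirement.
LADDER = [
    (1_500_000, (1280, 720), 30),
    (700_000, (640, 480), 30),
    (300_000, (640, 360), 15),
    (0, (320, 240), 15),
]

def pick_mode(target_bps):
    """Pick the highest resolution/frame rate the target bitrate can sustain."""
    for min_bps, resolution, fps in LADDER:
        if target_bps >= min_bps:
            return resolution, fps
```

A content analyzer would additionally bias this choice, e.g. preferring a higher frame rate for high-motion content and a higher resolution for static, texture-rich content.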
In summary, providing QoS guarantees for real-time interactive audio/video applications over the Internet remains a challenge. It requires coordination among multiple modules, including the audio/video codecs, transport, and pre-processing, together with support from existing network protocols and equipment, in order to offer customers more choices and stronger service guarantees.