Using FEC to Improve UDP (RTP) Audio and Video Transmission

Last Update: 2018-07-26 Source: Internet

Author: User



In real-time audio and video, UDP is king

Real-time interactive audio and video on the Internet uses one of two kinds of transport-layer scheme: TCP-based (e.g. RTMP) or UDP-based (e.g. RTP). TCP offers a relatively reliable guarantee of data delivery between two endpoints. It achieves this through acknowledgment and retransmission: the receiver checks the data it receives and acknowledges it, the sender moves on to new data only as acknowledgments arrive, and any data that is not acknowledged in time is retransmitted. This mechanism is very reasonable for general data transfer, but it causes many problems when used to carry real-time audio and video over the Internet. The first is delay. When the channel's packet loss rate is high, TCP's transmission quality degrades severely, and the resulting retransmissions and congestion make the audio and video delay so large that real-time interaction loses its meaning. Wireless channels (WiFi, 4G, 3G) are especially problematic: two-way communication over TCP is unstable there, and audio and video are prone to stalling for long periods and then playing back fast to catch up.

Most products therefore use UDP (usually with RTP as the upper-layer protocol, which provides sequence numbers and the timing information needed for audio/video synchronization). Compared with TCP, UDP offers higher throughput and lower latency, which makes it well suited to low-latency audio and video interaction.
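RTP is what supplies the sequence numbers and timestamps mentioned above. As an illustration, the Python sketch below parses the 12-byte fixed RTP header defined in RFC 3550. It ignores CSRC lists and header extensions, and it is not taken from the authors' C++ code.

```python
import struct
from dataclasses import dataclass


@dataclass
class RtpHeader:
    version: int
    padding: bool
    extension: bool
    csrc_count: int
    marker: bool
    payload_type: int
    sequence_number: int  # 16-bit, increments by one per packet
    timestamp: int        # 32-bit media clock, used for A/V synchronization
    ssrc: int             # 32-bit synchronization source identifier


def parse_rtp_header(packet: bytes) -> RtpHeader:
    """Parse the 12-byte fixed RTP header (RFC 3550, section 5.1)."""
    if len(packet) < 12:
        raise ValueError("packet shorter than the 12-byte fixed RTP header")
    # Network byte order: 2 single bytes, a 16-bit and two 32-bit fields.
    b0, b1, seq, ts, ssrc = struct.unpack("!BBHII", packet[:12])
    return RtpHeader(
        version=b0 >> 6,
        padding=bool(b0 & 0x20),
        extension=bool(b0 & 0x10),
        csrc_count=b0 & 0x0F,
        marker=bool(b1 & 0x80),
        payload_type=b1 & 0x7F,
        sequence_number=seq,
        timestamp=ts,
        ssrc=ssrc,
    )
```

The receiver-side logic later in the article (loss detection, reordering, FEC grouping) works almost entirely from `sequence_number`.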

Problems with UDP transport:

UDP buys its performance by giving up any guarantee of data integrity: it cannot guarantee delivery, and common problems include packet reordering, packet loss, and packet duplication. On wireless channels (WiFi, 4G, 3G), UDP packet reordering and loss are practically the norm.
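A receiver can detect all three problems from RTP sequence numbers, which are 16-bit and wrap around. The following Python sketch (an illustration, not the article's implementation) compares sequence numbers modulo 2^16 and labels each arrival. For brevity it remembers every sequence number it has seen; a real receiver would keep only a bounded window.

```python
SEQ_MOD = 1 << 16  # RTP sequence numbers are 16-bit and wrap around


def seq_diff(a: int, b: int) -> int:
    """Signed distance from b to a modulo 2**16 (positive if a is newer)."""
    d = (a - b) % SEQ_MOD
    return d - SEQ_MOD if d >= SEQ_MOD // 2 else d


def classify_arrivals(seqs):
    """Label each arrival 'in_order', 'gap' (packets were skipped, i.e. lost
    or still in flight), 'late' (reordered) or 'duplicate'."""
    seen = set()      # unbounded here for simplicity; bound it in practice
    highest = None    # newest sequence number received so far
    labels = []
    for s in seqs:
        if s in seen:
            labels.append("duplicate")
            continue
        seen.add(s)
        if highest is None or seq_diff(s, highest) == 1:
            labels.append("in_order")
            highest = s
        elif seq_diff(s, highest) > 1:
            labels.append("gap")
            highest = s
        else:
            labels.append("late")
    return labels
```

For example, the arrival order 65534, 65535, 0, 2, 1, 1 is classified as three in-order packets (across the wrap), a gap, a late packet, and a duplicate.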

The causes of packet reordering and packet loss, as summarized from a number of references, are as follows:

Causes of reordering:

(A) Reordering caused by the storage (buffer) queues in routers.

(B) UDP packets taking different routes, so that they arrive in a different order from the one in which they were sent.

Causes of packet loss:
(A) When routers and gateways are congested, they may discard some packets. This typically happens when the traffic sent into the network exceeds the capacity of the network channel.
(B) Packets carry a time-to-live (TTL) limit to prevent endless routing loops, so a packet may be dropped when network conditions are poor.
(C) The receiving terminal may be overloaded and unable to process the data arriving at its network port in time.

Losing even a small part of a video stream causes picture corruption after decoding. High-compression standards such as H.264 and HEVC leave very little redundancy in the bitstream, so lost data not only breaks the decoding of the current frame but also affects every frame that uses it as a reference. The corruption accumulates and spreads until the next key frame arrives and the picture recovers. Decoders do apply some error concealment, but the results are not ideal, especially with open-source decoders such as FFmpeg, whose concealment algorithms are relatively simple.

For this reason, many products adopt a small GOP (a short I-frame interval), so that after a loss the picture can be refreshed by an I-frame as soon as possible. This approach has side effects and can even backfire. I-frames compress less efficiently than P-frames and B-frames and are often much larger, so frequent I-frames put continuously fluctuating pressure on the transmission channel, leading to more severe packet loss and reordering. In addition, because of the encoder's rate control, each I-frame consumes a large share of the bit budget, so the P-frames and B-frames immediately after it must use a large quantization parameter (QP), i.e. poorer image quality, to keep the local bitrate under control. Visually, the image periodically turns blurry and blocky in step with the I-frame interval.

Finally, handing UDP packets to the decoder without first restoring their order also causes picture corruption, because the decoder discards packets that arrive late.
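To make the GOP trade-off above concrete, here is a deliberately simplified Python model. It assumes an I P P ... P structure in which every P-frame references the previous frame, no B-frames, no error concealment, and no intra refresh. A loss then corrupts every frame up to the next I-frame, and a loss that lands uniformly at random within the GOP corrupts (GOP + 1) / 2 frames on average.

```python
def corrupted_frames(lost_frame: int, gop: int) -> range:
    """Frames corrupted by a loss in `lost_frame`, assuming an I-frame every
    `gop` frames and P-frames that each reference the previous frame."""
    next_i_frame = (lost_frame // gop + 1) * gop
    return range(lost_frame, next_i_frame)


def mean_corrupted(gop: int) -> float:
    """Average number of corrupted frames if the loss position is uniform."""
    return sum(len(corrupted_frames(j, gop)) for j in range(gop)) / gop
```

Halving the GOP from 50 to 25 frames lowers the average from 25.5 to 13 corrupted frames. This is why small GOPs are tempting, but the model ignores the cost side described above: larger, more frequent I-frames that themselves add loss and lower the quality of the frames that follow.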

In summary, the project urgently needed an enhanced UDP scheme resistant to packet loss and reordering to improve real-time audio and video transmission. After years of accumulation and refinement, we have developed a complete solution based on RTP, FEC (forward error correction), and receiver-side QoS processing, and the improvement is very noticeable.

Arming RTP with FEC/QoS

To combat packet loss, we use an improved Vandermonde-matrix FEC (Forward Error/Erasure Correction) technique to recover lost packets: the sender adds redundant packets through FEC encoding, and the receiver performs FEC decoding to recover the lost packets.
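The article does not publish its improved Vandermonde coder, so the Python sketch below swaps in a different but closely related construction: a systematic erasure code over GF(2^8) whose parity rows come from a Cauchy matrix. Like a properly constructed Vandermonde code, it has the property this scheme relies on: from any k of the k + m packets in a group, all k media packets can be rebuilt. Everything here (names, field polynomial, packet layout) is our assumption for illustration.

```python
# Systematic k-of-(k+m) erasure code over GF(2^8) with Cauchy parity rows.
# Illustrative sketch only; the article's "improved Vandermonde" coder differs.

# GF(2^8) tables for the primitive polynomial x^8 + x^4 + x^3 + x^2 + 1 (0x11D).
EXP = [0] * 512
LOG = [0] * 256
_x = 1
for _i in range(255):
    EXP[_i] = _x
    LOG[_x] = _i
    _x <<= 1
    if _x & 0x100:
        _x ^= 0x11D
for _i in range(255, 512):
    EXP[_i] = EXP[_i - 255]


def gf_mul(a, b):
    if a == 0 or b == 0:
        return 0
    return EXP[LOG[a] + LOG[b]]


def gf_inv(a):
    if a == 0:
        raise ZeroDivisionError("0 has no inverse in GF(256)")
    return EXP[255 - LOG[a]]


def generator_rows(k, m):
    """Identity rows for the k media packets, then m Cauchy rows
    1 / (x_i + y_j) with x_i = k + i and y_j = j (addition is XOR)."""
    if k + m > 256:
        raise ValueError("k + m must not exceed 256")
    rows = [[1 if c == r else 0 for c in range(k)] for r in range(k)]
    for i in range(m):
        rows.append([gf_inv((k + i) ^ j) for j in range(k)])
    return rows


def combine(coefs, packets, size):
    """Byte-wise linear combination sum(coef * packet) over GF(256)."""
    out = bytearray(size)
    for coef, pkt in zip(coefs, packets):
        for pos in range(size):
            out[pos] ^= gf_mul(coef, pkt[pos])
    return bytes(out)


def encode(media, m):
    """Return m parity packets for a group of equal-length media packets."""
    k, size = len(media), len(media[0])
    return [combine(row, media, size) for row in generator_rows(k, m)[k:]]


def invert(matrix):
    """Gauss-Jordan inversion of a square matrix over GF(256)."""
    n = len(matrix)
    aug = [list(row) + [1 if c == r else 0 for c in range(n)]
           for r, row in enumerate(matrix)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        inv_p = gf_inv(aug[col][col])
        aug[col] = [gf_mul(inv_p, v) for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                f = aug[r][col]
                aug[r] = [v ^ gf_mul(f, p) for v, p in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def decode(received, k, m):
    """Rebuild the k media packets from any k of the k + m packets.
    `received` maps packet index (media 0..k-1, parity k..k+m-1) to bytes."""
    if len(received) < k:
        raise ValueError("fewer than k packets received: group unrecoverable")
    idx = sorted(received)[:k]
    rows = generator_rows(k, m)
    inv = invert([rows[i] for i in idx])
    size = len(received[idx[0]])
    return [combine(inv_row, [received[i] for i in idx], size) for inv_row in inv]
```

Media packets must be padded to equal length within a group. A production coder would also use table-driven or SIMD multiplication instead of byte-at-a-time Python loops.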

To handle reordering and duplication, we apply QoS reordering recovery. A key property of this QoS scheme is that it adds no delay when no packets are lost, and the time it waits for a missing packet is configurable, so it can adapt to channels with different degrees of reordering. QoS must run before FEC decoding at the receiver. This guarantees that the packet sequence numbers reaching the FEC decoding module are in order: there is no reordering, only loss.
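The reordering stage described above can be sketched as a small buffer. The Python below is our own minimal illustration, not the authors' QoS module. It assumes sequence numbers have already been unwrapped into monotonically increasing integers, and `max_wait` is a hypothetical name for the configurable wait. In-order packets pass straight through, duplicates are dropped, and a gap is held for at most `max_wait` seconds before the missing packets are reported as lost (payload `None`), which is exactly the "only losses, no reordering" input the FEC decoder expects.

```python
class ReorderBuffer:
    """Receiver-side reordering stage placed before FEC decoding (a sketch).

    In-order packets are released immediately, so there is no added delay
    when nothing is missing. Duplicates and packets older than the release
    point are dropped. A gap is waited on for at most `max_wait` seconds,
    after which the missing sequence numbers are reported as lost
    (payload None). Sequence numbers are assumed to be already unwrapped.
    """

    def __init__(self, first_seq, max_wait):
        self.expected = first_seq   # next sequence number to release
        self.max_wait = max_wait    # hypothetical tuning knob, in seconds
        self.pending = {}           # seq -> payload, held behind a gap
        self.gap_since = None       # when the current gap started waiting

    def push(self, seq, payload, now):
        """Accept one packet; return the (seq, payload) pairs now released."""
        if seq < self.expected or seq in self.pending:
            return []               # duplicate, or arrived after being declared lost
        self.pending[seq] = payload
        return self._release(now)

    def poll(self, now):
        """Call periodically: give up on a gap that has waited too long."""
        lost = []
        if self.pending and now - self.gap_since >= self.max_wait:
            first = min(self.pending)
            lost = [(s, None) for s in range(self.expected, first)]
            self.expected = first
        return lost + self._release(now)

    def _release(self, now):
        start = self.expected
        out = []
        while self.expected in self.pending:
            out.append((self.expected, self.pending.pop(self.expected)))
            self.expected += 1
        if not self.pending:
            self.gap_since = None   # nothing held back, no gap
        elif self.gap_since is None or self.expected != start:
            self.gap_since = now    # a new gap starts its waiting period
        return out
```

A larger `max_wait` tolerates more severe reordering at the cost of extra latency whenever a packet is truly lost. That is the trade-off the text describes as adapting the wait to the channel.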

Experience with a number of products shows that combining FEC + QoS + RTP significantly improves UDP's resistance to packet loss and reordering, providing a strong guarantee for upper-layer audio and video services. Figure 1 below shows where each module sits in the system.

Figure 1 The position of FEC and QoS in the RTP system

Here are a few points to note:

(A) From the standpoint of error control, transmission channels can be classified as random, burst, and mixed channels. In a random channel, packet losses occur at random and independently of one another with a fixed probability, so the number of losses in a window of n packets follows a binomial distribution. In a burst channel, losses are concentrated: many packets are lost within certain short intervals, and there are long loss-free stretches between them. A mixed channel combines the two. This scheme focuses on improving and optimizing transmission links with random-channel characteristics.

Studies of Internet packet loss show that in most cases the channel behaves like a random channel and each loss affects a single packet. Two or more consecutive losses are more likely than a purely random process would predict, but they are still less likely than single losses, and runs of more than 10 consecutive losses are rarer still. Because single-packet loss is by far the most frequent, our loss protection focuses on repairing single losses while also handling small numbers of consecutive losses. Repairing long runs of consecutive losses matters less: they are rare, and repairing them is expensive.
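The difference between a random and a burst channel is easy to see in simulation. The Python sketch below compares independent (Bernoulli) losses with the simple two-state Gilbert model, a common textbook model of bursty loss that the article itself does not name. Both are tuned to about 5% average loss. The parameters are illustrative, not measurements of any real network.

```python
import random


def bernoulli_losses(n, loss_rate, rng):
    """Random channel: each packet is lost independently with probability loss_rate."""
    return [rng.random() < loss_rate for _ in range(n)]


def gilbert_losses(n, p_good_to_bad, p_bad_to_good, rng):
    """Burst channel (simple Gilbert model): every packet sent in the Bad
    state is lost, none in the Good state. Mean burst length = 1/p_bad_to_good,
    long-run loss rate = p_good_to_bad / (p_good_to_bad + p_bad_to_good)."""
    bad = False
    out = []
    for _ in range(n):
        if bad:
            bad = rng.random() >= p_bad_to_good
        else:
            bad = rng.random() < p_good_to_bad
        out.append(bad)
    return out


def loss_stats(losses):
    """Return (loss rate, mean length of runs of consecutive losses)."""
    runs, current = [], 0
    for lost in losses:
        if lost:
            current += 1
        elif current:
            runs.append(current)
            current = 0
    if current:
        runs.append(current)
    rate = sum(losses) / len(losses)
    return rate, (sum(runs) / len(runs) if runs else 0.0)
```

With `p_good_to_bad=0.0263` and `p_bad_to_good=0.5`, the Gilbert channel loses about the same 5% of packets as `bernoulli_losses(n, 0.05, rng)`, but in bursts averaging two packets instead of just over one. Such bursts are what overwhelm a fixed number of redundant packets per group.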

(B) Of course, every error-control scheme has a maximum correction capability. When the packet loss rate exceeds what the system can correct, the lost packets cannot be recovered; for video applications, this means picture corruption.

To improve the user experience at high loss rates and keep the picture from going unrefreshed for long periods, we recommend combining ARQ (Automatic Repeat reQuest) with FEC. Here, however, ARQ does not ask the far end to resend the lost packets: that would be equivalent to using a protocol with built-in ARQ such as TCP, and it would inevitably introduce uncontrolled delay. Instead, the ARQ request only asks the far end to encode a video key frame immediately, which keeps the picture from staying frozen for long. The request is generally sent over a separate TCP channel (in most systems, both parties already have a TCP signaling channel for business-layer signaling). Whether to issue an ARQ request is decided by whether the video stream output by FEC decoding still contains losses. Both sender and receiver should rate-limit ARQ so that requests and responses are not issued too often, which would produce too many I-frames (the side effects of excessive I-frames are listed above).
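The rate limiting recommended above can be as simple as a minimum interval between key-frame requests. The Python sketch below is a hypothetical illustration: the class name and the `min_interval` parameter are ours, and the article does not specify its actual protection policy. The receiver asks for a key frame only when FEC has left the stream damaged and the last request is old enough. A sender could use the same logic, with `frame_damaged=True`, to decide whether to honor an incoming request.

```python
class KeyframeRequestLimiter:
    """Throttles key-frame (I-frame) requests. `min_interval` is a
    hypothetical tuning parameter in seconds, not a value from the article."""

    def __init__(self, min_interval):
        self.min_interval = min_interval
        self.last = None  # time of the last request that was allowed

    def should_request(self, frame_damaged, now):
        """Allow a request only if the FEC output is still damaged and no
        request was allowed within the last `min_interval` seconds."""
        if not frame_damaged:
            return False
        if self.last is not None and now - self.last < self.min_interval:
            return False
        self.last = now
        return True
```

Keeping the policy on both sides means a misbehaving or overly eager receiver still cannot force the encoder into a stream of back-to-back I-frames.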

Test results

The solution is developed in C++ and provides cross-platform support for PC, Android (JNI), and iOS. To make testing easy, we developed a few simple verification demos on the PC.

(A) Data validation Demo

The figure below shows the data validation demo interface. It uses specified data as the test source to help users understand the processing flow.

Figure 2 Data Flow validation Demo

The test tool works point to point: run it on two PCs to establish RTP (FEC+QoS) communication between them. It also supports a standalone mode, in which both the send and receive IP addresses are set to the local IP.

The software sends and receives custom test packet data and simulates packet loss, either at a fixed interval or at a random rate. It also lets you set the FEC redundancy or choose adaptive redundancy, and set parameters such as the QoS wait delay for late packets.

By default, the test tool forms a group from 10 media packets plus a number of redundant packets determined by the selected redundancy. With 20% redundancy, a group consists of 10 media packets followed by 2 redundant packets. The Wireshark capture below shows the 2 redundant packets following every 10 media packets.
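The arithmetic behind a group is worth spelling out, because it explains the 16.67% figure that appears later. A code that can rebuild a group from any k of its k + m packets survives any m losses out of k + m, so the worst-case tolerable loss rate is m / (k + m), not the redundancy ratio m / k. The Python helper below is illustrative. Rounding the parity count up for non-integer results is our assumption, since the article does not say how the tool rounds.

```python
def group_layout(media_per_group, redundancy_percent):
    """Parity count, bandwidth overhead and worst-case survivable loss rate
    for one FEC group. Rounding the parity count up is an assumption."""
    parity = -(-media_per_group * redundancy_percent // 100)  # integer ceiling
    overhead = parity / media_per_group                       # m / k
    survivable = parity / (media_per_group + parity)          # m / (k + m)
    return parity, overhead, survivable
```

For the default group, `group_layout(10, 20)` gives 2 parity packets, a 20% bandwidth overhead, and a survivable loss of 2/12, about 16.67%.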

Figure 3 Wireshark observation of redundant packets

Note that the program drops packets deliberately at the UDP send layer, so redundant packets may be dropped as well as media packets.

Below, we use 20% redundancy as an example to show how the system copes with various packet loss rates.

When dropping 1 packet in every 10 (a 10% loss rate), any 12-packet group loses at most 2 packets, which is exactly what its 2 redundant packets can repair, so 20% redundancy is enough for this loss rate. The test results confirm this: the sequence numbers of all received media packets are continuous, and the packet loss rate falls from 10% to 0%, as shown in Figure 2 above.

When dropping 1 packet in every 5 (a 20% loss rate), the losses fall as shown in Figure 4 below:

Figure 4 Losses when 1 packet in every 5 is dropped

In the first group, three packets were dropped: media packet 0, media packet 5, and redundant packet 0. The receiver got 8 media packets plus 1 redundant packet, 9 in total, which is fewer than the number of media packets in the group (10), so FEC at the receiver cannot recover the group. In the second group, only two media packets were lost, so the group is recovered normally. The experimental results in Figure 5 below confirm this reasoning: media packets 0 and 5 were lost, media packets 13 and 18 were successfully restored, and the system's packet loss rate fell from 20% to about 10%.
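The group-by-group reasoning above can be reproduced mechanically. The Python sketch below assumes that the tool drops send indices 0, period, 2 × period, and so on across media and parity packets alike, which matches the Figure 4 description (media 0, media 5, and redundant packet 0 are indices 0, 5, and 10). It also assumes a code that recovers a group whenever at most m of its k + m packets are missing. The function name is ours.

```python
def simulate_periodic_drops(k, m, period, groups):
    """Drop every `period`-th packet on the wire (send indices 0, period,
    2*period, ...), media and parity alike. Returns per-group
    (packets lost, recoverable) and the residual media loss rate."""
    n = k + m
    report, media_lost = [], 0
    for g in range(groups):
        # Positions within this group whose global send index is dropped.
        lost = [i for i in range(n) if (g * n + i) % period == 0]
        recoverable = len(lost) <= m      # any k of the k + m packets suffice
        report.append((len(lost), recoverable))
        if not recoverable:
            media_lost += sum(1 for i in lost if i < k)  # positions < k are media
    return report, media_lost / (k * groups)
```

With `period=5`, the pattern repeats every five groups as (3 lost, unrecoverable), (2, recoverable), (3, unrecoverable), (2, recoverable), (2, recoverable). Under this exact pattern, the residual media loss works out to 4 of every 50 media packets (8%), in the same range as the roughly 10% observed. With `period=10` no group loses more than 2 packets, and with `period=6` every group loses exactly 2, so both are fully recovered.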

Figure 5 Recovery with 20% redundancy when 1 packet in every 5 is dropped

(B) Audio and video test demo

Figure 6 Audio and video test demo

This demo supports the following features:

(1) Camera and microphone capture, and audio/video output, using Direct

(2) Efficient image scaling and other pre-filtering with FFmpeg

(3) H.264 High Profile video encoding and decoding

(4) AAC-LC, AAC-LD, and AAC-ELD audio encoding and decoding (any of the three standards can be selected; 44.1 kHz, 16-bit, 2-channel stereo)

(5) Audio and video RTP transmission (with FEC/QoS)

(6) Artificial packet loss for testing

(7) Real-time statistics on link packet loss and delay

The internal framework of the demo is shown in Figure 7 below:

Figure 7 Demo Internal Threading framework

Between the video capture/scaling thread and the video encoding thread, we use an efficient double-queue mechanism: queue elements are pointers, so no data is copied as items enter or leave a queue. If the encoder has plenty of performance headroom, the two threads can also be merged into one. The encoding thread is separated from the network send thread so that network congestion does not affect encoding. This is not strictly necessary for UDP, but for TCP-based systems such as RTMP, separating the network send/receive threads from the audio and video codec threads is essential, because network jitter would otherwise disrupt the audio and video processing chain. From a system-design point of view, we use the same architecture for both UDP and TCP.
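A double queue of this kind can be sketched with two thread-safe queues: one carries empty buffers from the consumer back to the producer, the other carries filled buffers forward. In this Python illustration (our own, with hypothetical buffer counts and sizes), object references play the role of the pointers in the C++ design. A fixed pool of preallocated buffers circulates between the threads, and no frame data is ever copied between queues.

```python
import queue
import threading


class DoubleQueue:
    """Free/filled queue pair passing references to preallocated buffers
    between a producer (capture/scale) and a consumer (encoder)."""

    def __init__(self, count, size):
        self.free = queue.Queue()    # empty buffers waiting to be filled
        self.filled = queue.Queue()  # filled buffers waiting to be encoded
        for _ in range(count):
            self.free.put(bytearray(size))

    def acquire_free(self, timeout=None):   # producer side
        return self.free.get(timeout=timeout)

    def submit(self, buf):                  # producer side
        self.filled.put(buf)

    def take_filled(self, timeout=None):    # consumer side
        return self.filled.get(timeout=timeout)

    def recycle(self, buf):                 # consumer side
        self.free.put(buf)


def run_demo(frames=10):
    """Producer thread fills buffers; the consumer drains them in order."""
    dq = DoubleQueue(count=3, size=4)

    def producer():
        for i in range(frames):
            buf = dq.acquire_free(timeout=1)  # blocks while all buffers are in use
            buf[0] = i % 256                  # stand-in for writing a frame
            dq.submit(buf)

    t = threading.Thread(target=producer)
    t.start()
    seen, buffers = [], set()
    for _ in range(frames):
        buf = dq.take_filled(timeout=1)
        seen.append(buf[0])
        buffers.add(id(buf))
        dq.recycle(buf)
    t.join()
    return seen, len(buffers)
```

`run_demo()` delivers frames 0 through 9 in order while reusing at most 3 buffer objects. The free queue also provides natural back-pressure: when the encoder falls behind, the capture thread blocks instead of allocating more memory.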

In the audio and video transport module, a dedicated thread sends handshake packets at regular intervals. It uses the same send channel (port) as the audio and video, and the packets are distinguished only by their headers. This thread serves two important purposes.

First, it keeps NAT traversal working. When our client (with a private IP) sends data to the server (with a public IP), the NAT router on the path maps a "port" for the connection. The server can then send data back to the client simply by using the source IP address and port of the packets it received as the destination. When the router receives a packet from the server, it checks whether a matching "port" mapping exists locally and forwards the packet only if it does; otherwise it discards the packet (for security reasons, unsolicited data from the external network into the intranet must be blocked). Note that these "port" mappings expire after a period of inactivity. To ensure that the server can keep sending data to the client, the client must send heartbeat packets to the server to keep the mapping valid; depending on the business scenario, the client may otherwise send nothing to the server and act only as a receiver. This is only a brief description of NAT in client/server mode; for other cases, such as peer-to-peer mode, please refer to the relevant literature.

Second, the handshake packets carry custom channel statistics, much like the RTCP protocol. The receiver reports these statistics to the sender through the handshake packets, so that the sender can adjust its sending and even its encoding strategy.
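The article does not define its handshake packet layout, so the Python sketch below invents a hypothetical one to show the idea. Because every RTP packet starts with version 2 in its top two bits, a heartbeat whose first byte is 0x00 can share the media port and still be told apart on arrival. The statistics fields are illustrative: the loss fraction uses the same 8-bit fixed-point encoding as RTCP reception reports, and the other fields are plain counters.

```python
import struct

HEARTBEAT_TYPE = 0x00   # hypothetical marker; RTP always starts with bits 0b10
HB_FORMAT = "!BIIBII"   # type, hb_seq, send_time_ms, fraction_lost, cum_lost, highest_seq


def build_heartbeat(hb_seq, send_time_ms, loss_ratio, cum_lost, highest_seq):
    """Heartbeat that refreshes the NAT mapping and carries receiver stats.
    loss_ratio (0.0-1.0) is stored as an 8-bit fraction of 256."""
    fraction = min(255, int(loss_ratio * 256))
    return struct.pack(HB_FORMAT, HEARTBEAT_TYPE, hb_seq, send_time_ms,
                       fraction, cum_lost, highest_seq)


def classify_packet(data):
    """Demultiplex on the first byte: RTP media has version 2 in the top bits."""
    if data and data[0] >> 6 == 2:
        return "rtp"
    if data and data[0] == HEARTBEAT_TYPE and len(data) == struct.calcsize(HB_FORMAT):
        return "heartbeat"
    return "unknown"


def parse_heartbeat(data):
    _, hb_seq, t_ms, fraction, cum_lost, highest = struct.unpack(HB_FORMAT, data)
    return {"hb_seq": hb_seq, "send_time_ms": t_ms, "loss_ratio": fraction / 256,
            "cum_lost": cum_lost, "highest_seq": highest}
```

Any traffic from client to server refreshes the mapping. The heartbeat just guarantees that such traffic exists even for a client that only receives media, and it gets a feedback channel for free.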

Audio processing is similar to video. Because audio encoding takes little time, we typically put audio capture and encoding in one thread. Audio output differs from video output because it must run at a fixed rate: the driver starts a timed output thread, and within that thread we only need to copy the specified amount of PCM data into the specified memory. (The output period and the amount of data are configured from the number of output channels, the sample rate, and the number of bytes per sample.)
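The amount of PCM the output thread must supply each period follows directly from those three parameters and the period length. The Python helper below is a worked example. The 10 ms period is a hypothetical value, since the article does not state the driver's period.

```python
def pcm_bytes_per_period(sample_rate, channels, bytes_per_sample, period_ms):
    """Bytes of interleaved PCM the output callback must supply per period.
    Integer division truncates if the period is not a whole number of frames."""
    frames = sample_rate * period_ms // 1000  # sample frames in one period
    return frames * channels * bytes_per_sample
```

For the demo's 44.1 kHz, 16-bit (2-byte), 2-channel format, a 10 ms period is 441 frames, or 1764 bytes. A 20 ms period doubles that to 3528 bytes.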

If the local IP is set to the same value as the remote IP, the demo enters local loopback mode. There are no network losses in this mode, so losses are simulated with the artificial packet drop setting. If the local and remote IPs differ and are not on the same network segment, the open-source WANem can be used to simulate packet loss, delay, jitter, duplicate packets, and so on; we will cover this in detail later.

Note: For the actual side-by-side effect, please watch the comparison video.

Figure 8 With 4% random packet loss and FEC turned off, frequent picture corruption and intermittent sound

With 4% random packet loss and FEC turned off at the sender, the received video shows regular picture corruption and the sound drops out intermittently. With 20% redundancy, the probability of picture corruption falls sharply. It is not eliminated entirely: because losses are random, many packets may be lost within a short period, and whenever the loss within a group exceeds what 20% redundancy can repair without distortion (a loss rate of 16.67%), the picture is corrupted.
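How rare those residual failures are can be estimated with the binomial distribution, assuming a purely random channel in which each packet is lost independently. The Python sketch below computes the probability that a group of k media packets plus m parity packets loses more than m packets. Real channels are somewhat burstier, so treat this as an optimistic lower bound rather than a prediction.

```python
from math import comb


def group_failure_probability(k, m, loss_rate):
    """Probability that a group of k media + m parity packets loses more
    than m packets, assuming independent (random-channel) losses."""
    n = k + m
    survivable = sum(comb(n, i) * loss_rate**i * (1 - loss_rate) ** (n - i)
                     for i in range(m + 1))
    return 1 - survivable
```

At 4% random loss, a 10-packet group without FEC (`m=0`) suffers at least one loss about 33.5% of the time. With 2 parity packets, about 1.07% of groups still lose 3 or more packets and cannot be repaired. That matches the observation that corruption becomes much rarer but does not disappear.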

With interval-based dropping of 1 packet in every 6 (a constant loss rate of 16.67%), 20% redundancy achieves distortion-free recovery: the video plays smoothly without corruption and the sound is uninterrupted.

Figure 9 With a constant 16.67% packet loss and 20% FEC redundancy, audio and video quality is good

More

For more FEC-related articles, please refer to www.mediapro.cc

This article is an English version of an article originally written in Chinese on aliyun.com and is provided for information purposes only.
