UDP Is King in Real-Time Audio and Video
On the Internet, real-time audio and video interaction uses one of two transport-layer schemes: TCP (for example, RTMP) or UDP (for example, RTP). TCP provides relatively reliable data transmission between two endpoints through an acknowledgement mechanism. When data reaches the receiver, the receiver checks that it is correct and acknowledges it; the sender can move on to the next block of data only after receiving a correct acknowledgement, and if no acknowledgement arrives, the block has to be retransmitted. This mechanism is very reasonable for ordinary data transfer, but it causes many problems when carrying real-time audio and video over the Internet. The first is delay: when the packet loss rate of the channel is high, TCP transmission quality degrades severely, and retransmission and congestion make the audio and video delay so large that real-time interaction loses its meaning. On wireless channels in particular (WiFi, 4G, 3G), two-way communication over TCP is unstable, and audio and video tend to freeze for a long time and then play back in a rapid burst.
Most products therefore choose UDP, usually with RTP as the upper application-layer protocol to provide sequence numbers and audio/video synchronization. Compared with TCP, UDP offers higher throughput and lower latency, which makes it well suited to low-latency interactive audio and video.
Problems with UDP transport:
UDP gains its performance at the expense of data integrity: it does not guarantee that data is delivered. The common problems are packet reordering, packet loss, and packet duplication. On wireless channels (WiFi, 4G, 3G), reordering and loss of UDP packets are the norm.
The causes of reordering and packet loss, summarized from the literature, are as follows:
Causes of reordering:
(A) Reordering caused by the store-and-forward queues of routers.
(B) UDP packets travel over different routes, so they arrive in a different order from the one in which they were sent.
Causes of packet loss:
(A) When routers and gateways are congested, some packets may be discarded. This typically happens when the traffic sent into the network exceeds the carrying capacity of the channel.
(B) Packets have a limited lifetime in transit, which prevents them from circulating forever in routing loops. When network conditions are bad, packets may be discarded for this reason.
(C) When the receiver is overloaded, scheduling difficulties may prevent it from processing network data in time.
Losing even a small part of the video bitstream produces a garbled picture after decoding. High-compression standards such as H.264 and HEVC leave very little redundancy in the stream, so a lost piece of bitstream affects not only the decoding of the current frame but also every frame that uses it as a reference. The corruption accumulates and spreads, and the picture recovers only when the next key frame arrives. Decoders do perform some error concealment, but the result is not ideal, especially with open-source decoders such as ffmpeg, whose error concealment is relatively simple. For this reason many products adopt a small GOP (a short I-frame interval) so that after packet loss corrupts the picture, an I frame refreshes it as soon as possible. This approach has significant side effects and can even backfire. Because I frames compress far less efficiently than P and B frames, they are usually much larger, and frequent I frames put periodic bursts of pressure on the transmission channel, causing even more loss and reordering. In addition, under the encoder's rate control, I frames consume a large share of the bitrate, so the P and B frames that follow must use a larger quantization parameter (QP), that is, poorer image quality, to keep the local bitrate under control. The visible result is a picture that periodically turns blurry and blocky at the I-frame interval. Passing UDP packets to the decoder directly, without restoring their order, also corrupts the decoded picture, because the decoder internally discards packets that arrive late.
To sum up, real projects urgently need an enhanced UDP scheme that resists packet loss and reordering in order to improve real-time audio and video transmission. After years of accumulation and refinement, we have released a complete solution based on RTP that combines FEC (forward error correction) with receiver-side QoS processing, and the improvement is very noticeable.
Arming RTP with FEC and QoS
For packet loss, we use an improved Vandermonde-matrix FEC (forward error/erasure correction) to recover lost packets: the sender's FEC encoder introduces redundant packets, and the receiver's FEC decoder uses them to recover the missing packets.
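To make the idea of erasure recovery concrete, here is a minimal sketch in Python. It is not the improved Vandermonde code used by the solution: it is a textbook polynomial erasure code over the prime field of size 257, chosen only because it is short. The names `fec_encode`, `fec_decode` and the modulus `P` are assumptions made for this illustration, and each "packet" is reduced to a single symbol in the range 0..255. A real codec would apply the same operation to every byte position of the packets and would normally work in GF(2^8).

```python
P = 257  # prime modulus (assumption for this sketch; real codecs typically use GF(2^8))

def _interpolate(points, x):
    """Evaluate at x the unique polynomial (mod P) through the given (xi, yi) points."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        # pow(den, -1, P) is the modular inverse (Python 3.8+)
        total = (total + yi * num * pow(den, -1, P)) % P
    return total

def fec_encode(media, r):
    """Return r redundant symbols for the k media symbols (positions 0..k-1)."""
    k = len(media)
    points = list(enumerate(media))
    # Redundant symbols are the same polynomial evaluated at positions k..k+r-1.
    return [_interpolate(points, k + i) for i in range(r)]

def fec_decode(received, k):
    """received maps position -> symbol for whatever arrived.
    Returns the k media symbols, or None if fewer than k symbols arrived."""
    if len(received) < k:
        return None
    points = list(received.items())[:k]  # any k received symbols are enough
    return [received[x] if x in received else _interpolate(points, x)
            for x in range(k)]
```

With 10 media symbols and 2 redundant symbols, any 10 of the 12 symbols are sufficient to rebuild the media; once 3 or more are missing, `fec_decode` returns `None`.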
For packet reordering and duplication, we use QoS reorder recovery. It adds no fixed system delay, and a configurable loss-wait delay lets it adapt to channels with different degrees of reordering. At the receiving end, QoS must run before FEC decoding so that the packet sequence numbers fed to the FEC decoder are in order (no reordering, only loss).
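The behaviour described above can be sketched as a small reorder buffer. This is an illustration under stated assumptions, not the solution's actual QoS module: the class name `ReorderBuffer`, its parameter `max_wait`, and the explicit `now` timestamp are all hypothetical. For brevity the sketch ignores 16-bit sequence-number wraparound and only checks the wait timeout when a new packet arrives.

```python
class ReorderBuffer:
    """Releases packets in sequence order and drops duplicates.
    In-order packets pass straight through; a gap is waited on for at most
    max_wait seconds before it is declared lost."""

    def __init__(self, max_wait=0.05, first_seq=0):
        self.max_wait = max_wait    # configurable loss-wait delay
        self.next_seq = first_seq   # next sequence number owed to the FEC decoder
        self.pending = {}           # seq -> payload for packets that arrived early
        self.gap_since = None       # time at which the current gap was first seen

    def push(self, seq, payload, now):
        """Feed one received packet; return the (seq, payload) pairs ready for output."""
        # Packets older than next_seq (late or duplicate) and repeats are dropped.
        if seq >= self.next_seq and seq not in self.pending:
            self.pending[seq] = payload
        return self._drain(now)

    def _drain(self, now):
        out = []
        while self.pending:
            if self.next_seq in self.pending:
                out.append((self.next_seq, self.pending.pop(self.next_seq)))
                self.next_seq += 1
                self.gap_since = None
            elif self.gap_since is None:
                self.gap_since = now                 # start timing this gap
                break
            elif now - self.gap_since >= self.max_wait:
                self.next_seq = min(self.pending)    # give up: treat the gap as loss
                self.gap_since = None
            else:
                break
        return out
```

The output of `push` contains gaps only where packets were really lost (or arrived too late), which is the input the FEC decoder expects.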
A number of product deployments show that the FEC + QoS + RTP combination significantly improves the resistance of UDP transmission to packet loss and reordering, and gives the upper-layer voice and video services a solid foundation. Figure 1 below shows where each module sits in the system.
Figure 1 Position of FEC and QoS in the RTP system
There are several points to note:
(A) From the point of view of error control, transmission channels can be divided into random channels, burst channels, and mixed channels. In a random channel, packet losses occur randomly and independently of one another, following a normal distribution. In a burst channel, packet losses are concentrated: a large number of packets are lost within certain short intervals, and these intervals are separated by longer periods without loss. A mixed channel combines the two. This solution focuses on improving and optimizing links with random-channel characteristics.
Research on the packet loss characteristics of Internet channels shows that in most cases they behave like a random channel and the loss is a single packet. The probability of losing two or more packets at once is higher than a purely random process would predict, but still lower than the probability of a single-packet loss, and the probability of losing more than 10 consecutive packets is lower still. Because single-packet loss is the most frequent, our loss protection concentrates on repairing single lost packets while also handling a small number of consecutive losses. Repairing a large number of consecutive losses matters relatively less (it happens rarely and is expensive to repair).
(B) Any error control scheme has a maximum error-correcting capability. When the packet loss rate exceeds what the current system can correct, the lost packets cannot be recovered, which for video applications means a garbled picture.
To improve the user experience at high packet loss rates and avoid a corrupted picture that stays on screen for a long time without being refreshed, we recommend combining ARQ (Automatic Repeat reQuest) with FEC. The ARQ request here does not ask the remote end to resend the lost packets, because that would go back to the old way of protocols with built-in ARQ such as TCP and would inevitably introduce uncontrollable delay. Instead, the ARQ request only asks the remote end to encode a video key frame immediately, so that the picture can be refreshed quickly. The request is generally sent over a separate TCP channel (in most systems the two sides already have a TCP signaling channel for business-layer signaling). An ARQ request is triggered when the video stream output by FEC decoding still has missing packets. Both the sender and the receiver need to limit the frequency of ARQ to avoid issuing and answering requests too often, which would produce too many I frames (the side effects of too many I frames were described earlier).
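The frequency protection mentioned above can be as simple as a minimum interval between key-frame requests. The following Python sketch assumes hypothetical names (`KeyframeRequestLimiter`, `on_fec_output`, the message string `"REQUEST_KEYFRAME"`) and a one-second interval. None of these come from the original solution, and the same limiter could be used on the sending side before forcing an I frame.

```python
class KeyframeRequestLimiter:
    """Allows at most one key-frame request per min_interval seconds."""

    def __init__(self, min_interval=1.0):
        self.min_interval = min_interval   # assumed value; tune per product
        self.last_time = None              # time of the last allowed request

    def allow(self, now):
        if self.last_time is not None and now - self.last_time < self.min_interval:
            return False                   # too soon after the previous request
        self.last_time = now
        return True


def on_fec_output(unrecovered_loss, limiter, now, send_request):
    """Call after FEC decoding. Requests a key frame only if loss survived FEC
    and the limiter permits it. Returns True when a request was sent."""
    if unrecovered_loss and limiter.allow(now):
        # Hypothetical message, sent over the existing TCP signaling channel.
        send_request("REQUEST_KEYFRAME")
        return True
    return False
```

Here `send_request` stands for whatever function writes to the signaling channel; passing it in keeps the sketch independent of any particular transport.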
Test results
The solution is developed in C++ and supports PC, Android (JNI), and iOS. To make testing easier, we built a few simple test demos on the PC to validate it.
(A) Data validation demo
The following figure shows the interface of the data validation demo, which uses specified data as the test source to help users understand the processing flow.
Figure 2 Data Flow Verification Demo
The test tool works in point-to-point mode and can be run on two PCs to set up RTP (FEC + QoS) communication between them. A single-machine mode is also supported: simply set both the send and receive IP addresses to the local IP.
The software sends and receives custom test packets and provides simulated packet loss. It supports dropping packets at a fixed interval or at a random rate, setting a fixed FEC redundancy or choosing adaptive redundancy, and setting parameters such as the QoS loss-wait delay.
By default the test tool treats 10 media packets plus their redundant packets (the number depends on the selected redundancy) as one group. With 20% redundancy, a group consists of 10 media packets followed by 2 redundant packets. The following figure is a Wireshark capture showing 10 media packets followed by 2 redundant packets.
Figure 3 Wireshark observation of redundant packets
Note that the program drops packets deliberately at the UDP send layer, so both media packets and redundant packets may be dropped.
Below we use 20% redundancy as an example to show how the system withstands various packet loss rates.
When 1 packet in every 10 is dropped (10% packet loss), a group of 12 packets loses at most 2 packets (and usually only 1). 20% redundancy is enough to withstand this loss rate, and the test results confirm it: the sequence numbers of all received media packets remain continuous, and the loss rate falls from 10% to 0%, as shown in Figure 2.
When 1 packet in every 5 is dropped (20% packet loss), the loss pattern is as shown in Figure 4 below:
Figure 4 Loss pattern when 1 packet in every 5 is dropped
In the first group, three packets are dropped: media packet 0, media packet 5, and redundant packet 0. The receiver's FEC cannot recover the group, because 8 received media packets plus 1 received redundant packet make 9, which is fewer than the number of media packets in the group (10). In the second group, only two media packets are lost, so the group is recovered normally. The experimental results in Figure 5 confirm this reasoning: media packets 0 and 5 are lost, media packets 13 and 18 are successfully recovered, and the system loss rate falls from 20% to about 10%.
Figure 5 Recovery with 20% redundancy when 1 packet in every 5 is dropped
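The reasoning above can be reproduced with a few lines of Python. The sketch below does no real FEC; it only applies the recovery condition stated in the text (a group is recoverable when at least as many packets arrive as it has media packets). It assumes that the fixed-interval drop starts with the first packet on the wire, as the description of Figure 4 suggests. The function name `simulate` and its parameters are hypothetical.

```python
def simulate(groups, k=10, r=2, drop_every=5):
    """Drop every drop_every-th packet on the wire (starting with the first)
    and report which media packets stay lost and which are recovered by FEC.
    Each group is k media packets followed by r redundant packets."""
    lost_media, recovered_media = [], []
    wire_index = 0                          # position in the overall send order
    for g in range(groups):
        missing_media, received = [], 0
        for pos in range(k + r):
            dropped = wire_index % drop_every == 0
            wire_index += 1
            if not dropped:
                received += 1
            elif pos < k:                   # a media packet was dropped
                missing_media.append(g * k + pos)
        if received >= k:                   # enough packets to rebuild the group
            recovered_media += missing_media
        else:
            lost_media += missing_media
    return lost_media, recovered_media
```

For two groups with 1 packet in every 5 dropped, `simulate(2)` reports media packets 0 and 5 as lost and 13 and 18 as recovered, matching the experiment.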
(B) Audio and video test demo
Figure 6 Audio and video test demo
This demo supports the following features:
(1) Camera and microphone capture and output using Direct
(2) Efficient image scaling and other pre-filtering using ffmpeg
(3) H.264 High Profile video encoding and decoding
(4) AAC-LC, AAC-LD, and AAC-ELD audio encoding and decoding (any of the three can be selected; 44.1 kHz, 16-bit, 2-channel stereo)
(5) Audio and video transmission over RTP (with FEC and QoS)
(6) Artificial packet loss for testing
(7) Real-time output of link packet loss and delay statistics
The internal framework of the demo is shown in Figure 7 below:
Figure 7 Demo Internal Threading framework
Between the video capture/scaling thread and the video encoding thread we use an efficient two-queue mechanism (the queue elements are pointers, so accessing the queues involves no data copy). If encoding performance is more than sufficient, the two can also be merged into one thread. The encoding thread is separated from the network sending thread so that network congestion does not affect encoding. (This matters little for UDP, but for TCP-based systems such as RTMP, separating the network send/receive thread from the audio and video codec thread is necessary, because network jitter would otherwise affect the audio and video processing chain.) From a system design point of view, we use this architecture uniformly for UDP and TCP.
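One common way to build such a two-queue mechanism is a pair of queues that pass buffer references back and forth: one queue holds free buffers, the other holds filled ones. The Python sketch below illustrates that pattern and is only an assumption about how the original (C++) implementation might be organized. The function `run_pipeline`, the pool size, and the one-byte "frame" are placeholders.

```python
import queue
import threading

def run_pipeline(frame_count, pool_size=4):
    """Capture thread and encode thread exchanging buffers through two queues."""
    free_q = queue.Queue()      # empty buffers waiting to be filled
    filled_q = queue.Queue()    # captured frames waiting to be encoded
    for _ in range(pool_size):
        free_q.put(bytearray(16))          # buffers are allocated once, up front
    encoded = []

    def capture():
        for n in range(frame_count):
            buf = free_q.get()             # blocks if the encoder falls behind
            buf[0] = n % 256               # stand-in for writing a scaled frame
            filled_q.put(buf)              # only the reference moves, no data copy
        filled_q.put(None)                 # end-of-stream marker

    def encode():
        while True:
            buf = filled_q.get()
            if buf is None:
                break
            encoded.append(buf[0])         # stand-in for encoding the frame
            free_q.put(buf)                # recycle the buffer for the capturer

    threads = [threading.Thread(target=capture), threading.Thread(target=encode)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return encoded
```

Because the buffers are recycled, memory use stays bounded by the pool size no matter how long the pipeline runs.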
In the audio and video transmission module we run a thread that sends handshake packets at regular intervals. This thread uses the same send channel (port) as the audio and video data, and the packets are distinguished only by their header. It plays two important roles. The first is to support NAT traversal. When our client (intranet IP) sends data to the server (public IP), the router on the link creates a port mapping for this communication link. When the server sends data to this client, it simply uses the source IP address and port of the received packet as the destination IP address and port. When the router receives the server's packet, it checks whether a corresponding port mapping exists and lets the packet through if so; otherwise the packet is discarded (for security reasons, unsolicited data from the external network to the intranet has to be blocked). Note that the router's port mapping is time-limited and becomes invalid after a certain period. To ensure that the server can keep sending data to the client, the client must periodically send heartbeat packets to the server to keep the mapping alive (the client does not necessarily send media packets to the server; it may act only as a receiver). This is a brief description of NAT in client/server mode; for peer-to-peer mode and other cases, please refer to dedicated material. The second role of the periodic handshake packet is to carry custom channel statistics, similar to the RTCP protocol. Through the handshake packet the receiver can tell the sender the packet loss rate it observes, so that the sender can adjust its sending strategy or even its encoding strategy.
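As an illustration of how a handshake packet can share a port with media and still carry statistics, the sketch below defines a hypothetical header layout: one type byte followed by two loss-rate fields. The field layout, the type values, and the function names are assumptions made for this example and are not the format used by the original solution. Sending the result of `build_heartbeat` to the server at a fixed interval is what keeps the NAT mapping alive.

```python
import struct

TYPE_MEDIA = 0x01        # hypothetical header type values
TYPE_HEARTBEAT = 0x02

# Network byte order: type (1 byte), loss rate before FEC and after FEC,
# each stored as an unsigned 16-bit integer in units of 0.01 %.
_HEARTBEAT = struct.Struct("!BHH")

def build_heartbeat(loss_before_fec, loss_after_fec):
    """Pack loss rates (fractions between 0 and 1) into a heartbeat packet."""
    return _HEARTBEAT.pack(TYPE_HEARTBEAT,
                           round(loss_before_fec * 10000),
                           round(loss_after_fec * 10000))

def parse_packet(data):
    """Distinguish heartbeat from media by the first header byte."""
    if data[0] == TYPE_HEARTBEAT:
        _, before, after = _HEARTBEAT.unpack(data[:_HEARTBEAT.size])
        return "heartbeat", before / 10000, after / 10000
    return "media", data[1:]
```

On the server side, the address returned by `socket.recvfrom` for such a packet is the mapped public address of the client, and replies should be sent to that address.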
The audio processing flow is similar to the video flow. Because audio encoding takes very little time, we generally put audio capture and encoding in one thread. Audio output differs from video output in that it must run at a fixed output frequency: the driver starts a timed output thread, and we only need to deposit the specified amount of PCM data within that thread. (The output frequency and the amount of data to deposit are determined by the configured number of output channels, the sampling rate, and the number of bytes per sample.)
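The relationship between these parameters can be written out as a short calculation. In the sketch below, the function name and the choice of 1024 samples per output callback are assumptions for illustration; the actual period depends on the audio driver configuration.

```python
def pcm_output_requirements(sample_rate, channels, bytes_per_sample,
                            samples_per_callback):
    """Return (callbacks per second, bytes of PCM to deposit per callback)."""
    callbacks_per_second = sample_rate / samples_per_callback
    bytes_per_callback = samples_per_callback * channels * bytes_per_sample
    return callbacks_per_second, bytes_per_callback
```

For the 44.1 kHz, 16-bit, 2-channel configuration used by the demo and an assumed 1024 samples per callback, each callback needs 1024 × 2 × 2 = 4096 bytes, and the callback fires about 43 times per second.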
If the local IP and the remote IP are set to the same address, the demo enters local loopback mode. There is no network packet loss in this mode, so we can enable manual packet dropping to simulate it. If the local IP differs from the remote IP and the two are not on the same network segment, we can use the open-source WANem to simulate packet loss, delay, jitter, duplicate packets, and so on; we will cover this approach separately.
Note: for the real video comparison, please go to the video comparison page.
Figure 8 With 4% random packet loss and FEC disabled: regular picture corruption and intermittent sound
With 4% random packet loss and FEC turned off, the received video shows regular picture corruption and the sound is intermittent. With 20% redundancy, the probability of picture corruption drops significantly. It is not eliminated completely, because the loss is random: a large number of consecutive packets may be lost within a short period, and the picture is corrupted whenever the loss exceeds 16.67%, the highest loss rate that 20% redundancy can recover without distortion.
With fixed-interval loss of 1 packet in every 6 (a constant packet loss rate of 16.67%), 20% redundancy achieves distortion-free recovery: the video is smooth with no corruption, and the sound is uninterrupted.
Figure 9 With 16.67% constant packet loss and 20% FEC redundancy: good audio and video quality
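The 16.67% figure follows directly from the group structure: with 10 media packets and 20% redundancy, a group is 12 packets, of which any 2 may be lost. A short Python calculation (the function names are chosen for this illustration) makes the relationship explicit:

```python
def redundant_packets(media_per_group, redundancy):
    """Number of redundant packets added to each group."""
    return round(media_per_group * redundancy)

def max_recoverable_loss(media_per_group, redundant_per_group):
    """Highest per-group loss rate that can still be fully recovered."""
    return redundant_per_group / (media_per_group + redundant_per_group)

r = redundant_packets(10, 0.20)       # 2 redundant packets per group
limit = max_recoverable_loss(10, r)   # 2 / 12, about 16.67 %
```

Note that this limit applies per group. An average loss rate below the limit can still exceed it within a single group when losses are bunched together, which is why the 4% random-loss case above is improved but not perfect.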
More
For more FEC-related articles, see www.mediapro.cc