Video Broadcast Technology details Series 3: Coding and Encapsulation,

Last Update:2016-11-09 Source: Internet

Author: User

Tags install brew docker run

Developer on Alibaba Coud: Build your first app with APIs, SDKs, and tutorials on the Alibaba Cloud. Read more ＞

Video Broadcast Technology details Series 3: Coding and Encapsulation,

There are a lot of technical articles on live broadcasting, and there are not many systems. We will use seven articles to give a more systematic introduction to the key technologies of live video in all aspects of the current hot season, and help live video entrepreneurs to gain a more comprehensive and in-depth understanding of live video technology, better technology selection.

Video Encoding is the third article in the live video technology series. It is a very important part of this series and a required basic course for mobile development. This article covers the mainstream encoder from theory to practice.

If the streaming media is compared to a logistics system, codec is the process of goods distribution and loading. This process is very important. Its speed and compression ratio are of great significance to the logistics system, affects the overall speed and cost of the logistics system. Similarly, for streaming media transmission, encoding is also very important. Its Encoding performance, encoding speed, and encoding compression ratio will directly affect the user experience and transmission costs of streaming media transmission.

The outline of this series of articles is as follows. If you want to review previous articles, click the direct link:

(1) Collection

(2) handling

(3) coding and Encapsulation

(4) streaming and transmission

(5) latency Optimization

(6) Principles of Modern players

(7) SDK Performance Test Model

Significance of Video Encoding

The storage space of raw video data is large. A 1080 P 7 s video requires 817 MB
The original video data transmission occupies a large bandwidth. It takes 11 minutes to transmit the above 7 s video at 10 Mbps.

After H. after 264 encoding compression, the video size is only 708 k, 10 Mbps bandwidth only needs 500 MS, can meet the needs of real-time transmission, therefore, the original video collected from the video capture sensor must undergo video encoding.

Basic Principles

Why can a huge original video be encoded into a small video? What is the technology here?
The core idea is to remove redundant information:

Spatial redundancy: there is a strong correlation between adjacent pixels of the image.
Time redundancy: The content of adjacent images in the video sequence is similar.
Encoding redundancy: Different pixel values have different probabilities.
Visual redundancy: human visual systems are not sensitive to certain details.
Knowledge redundancy: regular structure can be obtained from prior knowledge and background knowledge.

Video is essentially a series of continuous and Fast Video Playback. The simplest compression method is to compress each frame of the image. For example, the old MJPEG encoding method is used, this encoding method only supports intra-frame encoding and uses spatial sampling prediction for encoding. The image metaphor is to use each frame as an image and compress the image in JPEG encoding format. This encoding only takes into account the redundant information compression in one image, the green area is the area to be encoded, and the gray area is the area not yet encoded. The green area can be predicted based on the encoded area (left, bottom, bottom left, bottom left, etc ).

Figure 1

However, due to the time correlation between frames, some advanced encoders can adopt inter-frame encoding. Simply put, some regions on the frames are selected through the search algorithm, then, it is encoded by calculating the vector difference between the current frame and the reference frame. We can see from Two Consecutive Frames below that skiing students are moving forward, but it is actually the back-shift of the snow scene. P frames can be encoded by reference frames (I or other P frames). The size after encoding is very small and the compression ratio is very high.

Figure 2

Some may be interested in how these two images come from. Here we use two lines of FFmpeg commands. For more information about FFmpeg, see the following chapters:

The first line generates a video with a mobile vector.
The second line outputs each frame as an image.

ffmpeg  -flags2 +export_mvs -i tutu.mp4 -vf codecview=mv=pf+bf+bb tutudebug2.mp4

ffmpeg -i tutudebug2.mp4 'tutunormal-%03d.bmp'

In addition to space redundancy and time redundancy compression, there are mainly encoding compression and visual compression. Below is a main flow chart of the encoder:

Figure 3

Figure 4

Figure 3 and figure 4 are two flows. Figure 3 is intra-frame encoding, and figure 4 is inter-frame encoding. The main difference shown in the figure is that the first step is different, in fact, these two flows are also combined. We generally say that I and P frames adopt both intra-frame encoding and inter-frame encoding.

Encoder Selection

The principles and basic procedures of the encoder have been reviewed. The encoder has undergone decades of development, the next generation encoder, represented by H.265 and VP9, has supported the evolution of intra-frame encoding at the beginning. It analyzes some common encoders and explores the world of encoder.

1) Introduction to H.264

H.264/AVC project intends to create a video standard. Compared to the old standard, it is able to provide quality video with lower bandwidth (in other words, only half of the bandwidth of MPEG-2, H.263 or MPEG-4 2nd ), it does not add much design complexity to make it impossible or the implementation cost is too high. Another goal is to provide sufficient flexibility for use in a variety of applications, networks and systems, including high, low bandwidth, high and low video resolution, broadcast, and DVD storage, RTP/IP networks, and ITU-T multimedia telephone systems.

H.264/AVC contains a series of new features, so that it not only can be more effective in encoding than the previous codecs, but also can be used in various network environments. H. 264 became an online video company, including YouTube, using it as its main decoder, but using it is not very easy. In theory, H. 264 of patent fees are high.

Patent License

And the first and second parts of the MPEG-2, the second part of the MPEG-4, using H. 264/AVC product manufacturers and service providers need to pay patent license fees to the holders of patents used by their products. The main source of these patent licenses is a private organization called MPEG-LA LLC, which has nothing to do with the MPEG Standardization Organization, but the Organization also manages patent licenses for MPEG-2's first sub-system, second video, MPEG-4 second video, and other technologies.

Other patent licenses are required to apply to another private organization called VIA Licensing, which also manages patent licenses that favor Audio compression standards such as MPEG-2 AAC and MPEG-4 Audio.

H.264 open-source implementation

Openh264
X264

Openh264 is an open source implemented by Cisco. 264 encoding, although H. 264 of patent fees are expensive, but there is an annual ceiling for the patent fee. After Cisco has paid the annual Patent Fee for OpenH264, OpenH264 is actually free to use.

X264 x264 is a GPL-authorized video coding software. The main function of x264 is to encode the video of H.264/MPEG-4 AVC, rather than as a decoder (decoder.

Compare the charges:

Openh264 CPU usage is much lower than x264
Openh264 only supports baseline profile and x264 supports more profiles.

2) Introduction to HEVC/H.265

High Efficiency Video Coding (HEVC) is a Video compression standard and is regarded as the successor of ITU-T H.264/MPEG-4 AVC standard. Since 2004, it was developed by ISO/IEC Moving Picture Experts Group (MPEG) and ITU-T Video Coding Experts Group (VCEG) as ISO/IEC 23008-2 MPEG-H Part 2 or known as ITU-T H.265. The first version of the HEVC/H.265 video compression standard was accepted as the formal standard of the International Telecommunications Union (ITU-T) in April 13, 2013. HEVC is considered to not only improve the video quality, but also achieve H. 264/MPEG-4 AVC twice the compression rate (equivalent to reduced bit rate by 50% in the same image quality), supports 4 K resolution or even to Ultra High Definition TV (UHDTV ), the maximum resolution is 8192x4320 (8 K resolution ).

Open-source implementation of H.265

Libde265
X265

Libde265 HEVC is provided by struktur with the open-source License GNU LesserGeneral Public License (LGPL), allowing viewers to enjoy high-quality images at a slower speed. Compared with the H.264 standard decoder, The libde265 HEVC decoder can bring your full HD content to up to two times the audience, or reduce the bandwidth required for streaming media playback by 264. High-definition or 4 K/8 K Ultra-high-definition streaming media playback, low-latency/low-bandwidth video conferencing, and complete mobile device coverage. With the stability of "congestion awareness" video encoding, It is very suitable for applications in 3/4 Gbit/s and LTE Networks.

Patent License

HEVC Advance requires all content manufacturers that use H.265 technologies, including Apple, YouTube, Netflix, Facebook, and Amazon, to pay 0.5% of their content revenues as technical usage fees, the whole streaming media market is about $100 billion a year, and as it continues to grow, a 0.5% tax is definitely a huge expense. In addition, they have not yet released equipment manufacturers, of which TV manufacturers need to pay a patent fee of $1.5 per device and $0.8 per mobile device manufacturer. They have not even spared vendors such as Blu-ray players, game consoles, and video recorders. These manufacturers must pay $1.1 per unit. What is most unacceptable is that the patent right of HEVC Advance is traced back to the manufacturer's "", meaning that the previously sold products still have to be paid back.

X265 is developed and open-source by MulticoreWare. The GPL protocol was adopted, but several companies that funded the project formed a coalition to use the software without the GPL protocol.

3) Introduction to VP8

VP8 is an open video compression format, which was first developed by On2 Technologies and subsequently published by Google. At the same time, Google also released the real-time library for VP8 encoding: libvpx, which was issued in the form of BSD authorization terms, and subsequently added the patent right. After some arguments, the VP8 authorization was finally confirmed as an open source code authorization.

Currently, the Web browsers that support VP8 include Opera, Firefox, and Chrome.

Patent License

In March 2013, Google reached an agreement with mpeg la and 11 patent holders to authorize Google to obtain patents that may be infringed by VP8 and its earlier VPx codes, at the same time, Google can also re-authorize the relevant patents to VP8 users free of charge, this protocol is also applicable to the next generation of VPx encoding. So far, mpeg la has abandoned the establishment of the VP8 patent concentration authorization alliance, and VP8 users will be able to determine the free use of this code without worrying about possible patent infringement authorization issues.

Open-source VP8 implementation

Libvpx

Libvpx is the only open-source implementation of VP8. It was developed by On2 Technologies. After Google acquired it, it opened its source code. The License is very loose and can be used freely.

4) Introduction to VP9

The development of VP9 began in the third quarter of 2011. The goal was to reduce the file size by 50% compared with VP8 encoding in the same image quality. The other goal was to surpass HEVC encoding in coding efficiency.

In December 13, 2012, the Chromium browser added support for VP9 encoding. The Chrome browser started to support VP9 encoding video playback in February 21, 2013.

Google announced that it will complete the preparation of the VP9 encoding in June 17, 2013, when Chrome will guide the VP9 encoding by default. In March 18, 2014, Mozilla added VP9 support to the Firefox browser.

In April 3, 2015, Google released libvpx1.4.0, added support for 10-bit and 12-bit bits, sample the color at and, and compile/decode multiple VP9 cores.

Patent License

VP9 is an open and non-Premium video encoding format.

Open-source VP9 implementation

Libvpx

Libvpx is the only open-source implementation of VP9, which is developed and maintained by Google. Some of the Code is shared by VP8 and VP9, and the rest are the coding and decoding implementations of VP8 and VP9, respectively.

Comparison of VP9, H.264, and HEVC

CodecHEVCx264vp9HEVC-42.2 % 32.6% x26475.8 % 18.5% vp948.3 %-14.6% CodecHEVC vs. VP9 (in %) VP9 vs. x264 (in %) Total Average61239399

Reference Comparative Assessment of H.265/MPEG-HEVC, VP9, and
H.264/MPEG-AVC Encoders for Low-Delay Video Applications this relatively new paper deals with the results of coding Low-latency videos.

Comparison of HEVC and H.264 in different resolutions

Compared with H.264/MPEG-4, the average bit rate of HEVC is reduced:

Resolution: 480P720P1080P4K UHDHEVC52 % 56% 62%

The visible bit rate is reduced by more than 60%.

HEVC (H.265) has a significant advantage for VP9 and H.264 in bit rate reduction, saving 264 and 48.3% respectively with the same SNR.
H.264 has great advantages in coding time. Compared with VP9 and HEVC (H.265), HEVC is six times that of VP9, and VP9 is nearly 40 times that of H.264.

5) FFmpeg

When talking about video encoding, We have to propose a great software package, FFmpeg.

FFmpeg is a free software that can run video, conversion, and stream functions in multiple formats of audio and video, including libavcodec-this is a decoder library for audio and video in multiple projects, and libavformat-a library for converting audio and video formats.

In the word FFmpeg, FF refers to Fast Forward. Some new users wrote to the FFmpeg project owner and asked if FF meant Fast Free or Fast Fourier. the FFmpeg project owner replied, "Just for the record, the original meaning of FF in FFmpeg is Fast Forward... 」

This project was initially initiated by Fabrice Bellard and is now maintained by Michael Niedermayer. Many FFmpeg developers are also members of the MPlayer project. FFmpeg is designed as a server version for development in the MPlayer project.

FFmpeg: FFmpeg Download

You can input and download data in the browser. Currently, Linux, Mac OS, and Windows platforms are supported. You can also compile data on your own to Android or iOS platforms.
For Mac OS, run brew to install brew install ffmpeg -- with-libvpx -- with-libvorbis -- with-ffplay.

What useful and interesting things can we use FFmpeg to do? Through a series of small experiments, you can see the magic and power of FFmpeg.

FFmpeg screen recording

A small example shows how to use FFmpeg for screen recording in Mac OS:

Input:

ffmpeg -f avfoundation -list_devices true -i ""

Output:

[AVFoundation input device @ 0x7fbec0c10940] AVFoundation video devices:
[AVFoundation input device @ 0x7fbec0c10940] [0] FaceTime HD Camera
[AVFoundation input device @ 0x7fbec0c10940] [1] Capture screen 0
[AVFoundation input device @ 0x7fbec0c10940] [2] Capture screen 1
[AVFoundation input device @ 0x7fbec0c10940] AVFoundation audio devices:
[AVFoundation input device @ 0x7fbec0c10940] [0] Built-in Microphone
Gives a list and number of all input devices supported by the current device. I have two monitors locally, so 1 and 2 are both my screens. You can choose one to record the screen.

View the current H.264 codec:

Enter:

ffmpeg -codecs | grep 264
Output:

DEV.LS h264 H.264 / AVC / MPEG-4 AVC / MPEG-4 part 10 (decoders: h264 h264_vda) (encoders: libx264 libx264rgb)
View the current VP8 codec:

Enter:

ffmpeg -codecs | grep vp8
Output:

DEV.L. vp8 On2 VP8 (decoders: vp8 libvpx) (encoders: libvpx)
You can choose to use vp8 or h264 as the encoder

ffmpeg -r 30 -f avfoundation -i 1 -vcodec vp8 -quality realtime screen2.webm
# -quality realtime is used to optimize the encoder. If it is not added to my Air, the frame rate can only reach 2
or

ffmpeg -r 30 -f avfoundation -i 1 -vcodec h264 screen.mp4
Then use ffplay to play

ffplay screen.mp4
or

ffplay screen2.webp
Convert FFmpeg video to gif
There is a particularly useful requirement. I found a particularly interesting video on the Internet and wanted to convert it into a dynamic emoticon. As an IT practitioner, my first thought was not to download a transcoder or to find an online Convert the website, directly use the tool FFmpeg at hand, and the transcoding is completed in an instant:

ffmpeg -ss 10 -t 10 -i tutu.mp4 -s 80x60 tutu.gif
## -ss means to start transcoding from 10s, -t means to convert 10s video -s
FFmpeg record screen and live broadcast
You can continue to expand Example 1, live broadcast the content of the current screen, to tell you how to build a test live broadcast service with a few lines of commands:

Step 1: First install docker:
Visit Docker Download and download and install by operating system.

Step 2: Download the nginx-rtmp image:

docker pull chakkritte / docker-nginx-rtmp
Step 3: Create nginx html path and start docker-nginx-rtmp

mkdir ~ / rtmp

docker run -d -p 80:80 -p 1935: 1935 -v ~ / rtmp: / usr / local / nginx / html chakkritte / docker-nginx-rtmp
Step 4: Push the screen recording to nignx-rtmp

ffmpeg -y -loglevel warning -f avfoundation -i 2 -r 30 -s 480x320 -threads 2 -vcodec libx264 -f flv rtmp: //127.0.0.1/live/test
Step 5: Play with ffplay

ffplay rtmp: //127.0.0.1/live/test
To sum up, FFmpeg is an excellent tool that can be used to complete a lot of daily work and experiments, but there is still a lot of work to be done from providing truly available streaming media services and live broadcast services. Seven Niu Live Cloud Service.

Package
After introducing video coding, let's introduce some packages. Following the previous analogy, packaging can be understood as the type of truck used to transport, that is, the container of the media.

The so-called container is a standard that mixes and encapsulates multimedia content (video, audio, subtitles, chapter information, etc.) generated by the encoder. Containers make it easy to play different multimedia content simultaneously, and another function of containers is to provide an index for multimedia content, which means that if there is no container, you can only see the end of a movie from the beginning, and you ca n’t drag the progress (Of course, some player sessions in this case take a long time to temporarily create an index), and if you do n’t manually load the audio separately, there will be no sound. The following introduces several common packaging formats and advantages and disadvantages:

At present, we mainly use FLV and MPEG2-TS formats in streaming media transmission, especially live broadcasting, which are used for RTMP / HTTP-FLV and HLS protocols, respectively.

In the next issue, we will systematically explain the streaming and transmission of live video, so stay tuned ~

Author: Bo He @ seven cattle cloud preacher, more cloud technology industry insight visit seven cattle cloud blog.

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

Get Started for Free

Sales Support

1 on 1 presale consultation

Chat Contact Sales
After-Sales Support

24/7 Technical Support 6 Free Tickets per Quarter Faster Response

Open a Ticket
Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.

Learn More