"Video Broadcast Technology details" Series 6: Principles of Modern players,
There are many technical articles on live video streaming, but few that cover it systematically. In this series of seven articles, we give a systematic introduction to the key technologies behind today's popular live video, helping live-streaming entrepreneurs gain a more comprehensive, in-depth understanding of live video technology and make better technology choices.
This series of articles outlines as follows:
(1) Capture
(2) Processing
(3) Encoding and Encapsulation
(4) Streaming and Transmission
(5) Latency Optimization
(6) Principles of Modern Players
(7) SDK Performance Test Model
In the previous article on latency optimization, we shared many simple and practical tuning tips. This article is Part 6 of the live video technology series: Principles of Modern Players.
In recent years, the growing demand for multi-platform playback has driven the rise of adaptive bitrate streaming, forcing web and mobile developers to rethink the logic of video technology. At first, the industry giants released the HLS, HDS, and Smooth Streaming protocols and hid all the details inside their proprietary SDKs. Developers could not freely modify the multimedia engine logic in the player: you could not change the adaptive bitrate rules or the cache size, or even the length of your segments. These players may be easy to use, but they give you few options to customize them; even inadequate features simply had to be tolerated.
However, as application scenarios multiplied, the demand for customizable features grew stronger and stronger. Live streaming and video on demand alone differ in buffer management, ABR policy, and caching policy. These demands gave rise to a series of lower-level multimedia APIs: NetStream on Flash, Media Source Extensions (MSE) on HTML5, and MediaCodec on Android. At the same time, an HTTP-based standard streaming format, MPEG-DASH, emerged in the industry. These more advanced capabilities give developers far better flexibility, allowing them to build players and multimedia engines that suit their business needs.
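To make the MSE capability concrete, here is a minimal TypeScript sketch that feeds one fragmented-MP4 segment to a video element; the segment URL and codec string are illustrative assumptions, not part of any particular engine.

```typescript
// Minimal MSE sketch: append one fMP4 segment to a <video> element.
// The segment URL and codec string below are illustrative assumptions.
const video = document.querySelector('video') as HTMLVideoElement;
const mediaSource = new MediaSource();
video.src = URL.createObjectURL(mediaSource);

mediaSource.addEventListener('sourceopen', async () => {
  // The codec string must match the actual stream.
  const sourceBuffer = mediaSource.addSourceBuffer(
    'video/mp4; codecs="avc1.42E01E, mp4a.40.2"'
  );
  const response = await fetch('/segments/init-and-first.mp4'); // hypothetical URL
  const data = await response.arrayBuffer();
  sourceBuffer.appendBuffer(data); // a real engine schedules further appends from here
});
```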
Today we will share how to build a modern player and which key components such a player needs. In general, a typical player can be divided into three parts: the UI, the multimedia engine, and the decoder, as shown in Figure 1:
Figure 1. Modern player architecture
User Interface (UI): This is the topmost layer of the player. It defines the end user's viewing experience through three distinct kinds of features: the skin (the player's visual design), the UI (all customizable features such as playlists and social sharing), and the business logic (business-specific features such as advertising, device compatibility logic, and authentication management).
Multimedia Engine: This layer handles all playback-control logic, such as parsing the manifest, downloading video segments, and setting and switching adaptive bitrate rules; we explain it in detail below. Because these engines are usually tightly bound to a platform, you may need several engines to cover all your target platforms.
Decoder and DRM Manager: The bottom layer of the player consists of the decoder and the DRM manager, which directly call the APIs exposed by the operating system. The decoder's main job is to decode and render the video content, while the DRM manager controls, through the decryption process, whether protected content may be played.
Next, we use examples to introduce the distinct role each layer plays.
I. User Interface (UI)
The UI layer is the top layer of the player. It controls what your users can see and interact with, and you can brand it to give your users a unique experience. This layer is the closest to front-end development. Within the UI we also include the business logic components, which shape the uniqueness of your playback experience even though end users cannot interact with them directly.
The UI consists of three main components:
1. Skin
The skin is the general term for the player's visual components: the progress bar, buttons, animated icons, and so on, as shown in Figure 2. Like most design components, they are implemented in CSS, so designers or developers can integrate them easily (even if you are using a packaged player such as JW Player or Bitdash).
Figure 2. Player skin
2. UI Logic
The UI logic defines everything visible during playback and user interaction: playlists, thumbnails, channel selection, social media sharing, and so on. Depending on the playback experience you are after, you can add many other features to this layer, many of which exist as plug-ins; the plugins page of the videojs/video.js wiki on GitHub may offer some inspiration. The logic layer contains many functions that we will not detail one by one; instead, we use the UI of the Eurosport player as an example to get an intuitive sense of them.
Figure 3. Eurosport player user interface
As shown in Figure 3, apart from the traditional UI elements there is a very interesting feature: when a user watches DVR streaming media, the live stream is shown in a small window through which the viewer can return to the live broadcast at any time. Because the layout, the UI, and the multimedia engine are completely independent, these features can be implemented with dash.js in HTML5. The best practice for implementing a UI is to add features to a core UI module in the form of plug-ins/modules.
3. Business Logic
Besides the "visible" features above, there is also an invisible part that forms the uniqueness of your business: authentication and payment, channel and playlist retrieval, and advertising. It also covers technical concerns such as the A/B testing module and device-specific configuration, which is used to choose among multiple media engines on different kinds of devices.
To reveal the complexity hidden underneath, we explain these modules in more detail:
Device detection and configuration logic: This is one of the most important features, because it decouples playback from rendering. For example, depending on the version of your browser, the player may automatically select the HTML5 MSE-based multimedia engine hls.js, or fall back to the Flash-based playback engine FlasHLS to play the HLS video stream. The great advantage of this layer is that no matter which underlying engine is used, you can customize your UI and business logic on top with the same JavaScript and CSS.
Detecting the user's device also lets you tailor the experience to it: if the video is playing on a mobile device rather than a 4K-screen device, you may want to start at a lower bitrate, as in the sketch below.
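As an illustration, a minimal engine-selection routine might look like the following sketch; the engine labels and the bitrate figures are assumptions for illustration, not fixed rules.

```typescript
// Pick a playback engine based on what the device supports.
// Engine labels and the bitrate figures are illustrative assumptions.
function selectEngine(): 'mse-hls.js' | 'native-hls' | 'flash-flashls' {
  const mse = (window as any).MediaSource;
  if (mse && mse.isTypeSupported('video/mp4; codecs="avc1.42E01E"')) {
    return 'mse-hls.js';    // modern browsers: MSE-based engine
  }
  const probe = document.createElement('video');
  if (probe.canPlayType('application/vnd.apple.mpegurl')) {
    return 'native-hls';    // e.g. Safari plays HLS natively
  }
  return 'flash-flashls';   // legacy fallback
}

// Start mobile sessions at a lower bitrate than large-screen devices.
const isMobile = /Mobi|Android/i.test(navigator.userAgent);
const startBitrateBps = isMobile ? 1_500_000 : 6_000_000;
```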
A/B testing logic: A/B testing makes it possible to gray-release changes to a subset of users in production. For example, you might give some Chrome users a new button or a new multimedia engine and verify that everything works as planned before rolling it out to everyone.
Advertising (optional): Handling ads on the client is among the most complex pieces of business logic. As the flowchart of the videojs-contrib-ads plug-in shows (Figure 4), ad insertion involves many steps. For HTTP video streams, you will likely rely on existing standards such as VAST, VPAID, or Google IMA, which help you pull video ads (often in legacy, non-adaptive formats) from an ad server and play them before, during, or after the video, usually without the option to skip.
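As a rough, hedged illustration of the first step, the sketch below fetches a VAST document and collects the media-file URLs it advertises; the ad-server URL is hypothetical, and a real integration would use an SDK such as Google IMA rather than hand-rolled parsing.

```typescript
// Rough sketch: fetch a VAST document and collect its <MediaFile> URLs.
// The ad-server URL is hypothetical; production code would use an SDK (e.g. IMA).
async function fetchAdMediaUrls(vastUrl: string): Promise<string[]> {
  const xmlText = await (await fetch(vastUrl)).text();
  const doc = new DOMParser().parseFromString(xmlText, 'application/xml');
  return Array.from(doc.querySelectorAll('MediaFile'))
    .map((node) => node.textContent?.trim() ?? '')
    .filter((url) => url.length > 0);
}

// Usage: treat the first returned file as a pre-roll candidate.
fetchAdMediaUrls('https://ads.example.com/vast.xml').then((urls) => {
  if (urls.length > 0) console.log('pre-roll candidate:', urls[0]);
});
```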
Summary:
To meet your customization needs, you can choose JW Player, which ships all the classic playback features (and also lets you customize some of them), or build your own features on top of an open-source player such as Video.js. And to unify the user experience between the browser and native players, you can consider using React Native for UI or skin development and Haxe for the business logic; both of these excellent libraries can share a single codebase across many different kinds of devices.
Figure 4. Business logic flowchart
II. Multimedia Engine
In recent years, the multimedia engine has emerged as a new, independent component in the player architecture. In the MP4 era, the platform handled all playback-related logic and exposed only a limited set of multimedia features to developers (play, pause, seek, full-screen mode, and so on).
However, the new HTTP-based streaming formats require a new component to handle and control new complexity: parsing manifest files, downloading video segments, adaptive bitrate monitoring and decision-making, and more. At first, ABR complexity was handled by the platform or the device vendor. But as the demand for broadcaster control and custom players grew, lower-level APIs appeared (such as Media Source Extensions on the Web, NetStream on Flash, and MediaCodec on Android), and powerful, robust multimedia engines built on top of these APIs quickly followed.
Figure 5. Data flow of Shaka Player, the multimedia engine provided by Google
Next we examine each component of a modern multimedia engine in detail:
1. Manifest Interpreter and Parser
In HTTP-based video streaming, everything starts with a manifest file. The manifest contains the meta information the player needs: how many quality levels, languages, and subtitle tracks there are, and what each of them is. The parser extracts this description from the manifest (an XML file, or in the case of HLS a special M3U8 playlist) and derives the correct video information from it. Of course, there are many kinds of media servers, and not all of them implement the specification correctly, so the parser may also need to work around implementation bugs.
Once the video information has been extracted, the parser builds from it an internal representation of the stream and works out how to fetch the individual video segments. In some multimedia engines, this representation first takes the form of an abstract media object, which is then mapped onto the specific characteristics of each HTTP streaming format.
In live streaming scenarios, the parser must also periodically re-fetch the manifest to learn about the newest video segments; a toy parser is sketched below.
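For intuition, here is a toy TypeScript parser that extracts the variant streams (bitrate and URI) from an HLS master playlist; real parsers such as the one in hls.js handle far more attributes and server quirks.

```typescript
// Toy HLS master-playlist parser: pull out each variant's bandwidth and URI.
// Real parsers handle many more attributes, comments, and server quirks.
interface Variant { bandwidthBps: number; uri: string; }

function parseMasterPlaylist(m3u8: string): Variant[] {
  const lines = m3u8.split('\n').map((l) => l.trim());
  const variants: Variant[] = [];
  for (let i = 0; i < lines.length; i++) {
    if (lines[i].startsWith('#EXT-X-STREAM-INF:')) {
      const match = lines[i].match(/BANDWIDTH=(\d+)/);
      const uri = lines[i + 1]; // per the HLS spec, the URI follows on the next line
      if (match && uri && !uri.startsWith('#')) {
        variants.push({ bandwidthBps: Number(match[1]), uri });
      }
    }
  }
  return variants.sort((a, b) => a.bandwidthBps - b.bandwidthBps);
}
```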
2. Downloader (downloads manifests, media segments, and keys)
The downloader is a module that wraps the native APIs for issuing HTTP requests. It is used not only to download media segments, but also to download manifests and, when necessary, DRM keys. The downloader plays an important role in handling network errors and retries, and it can also collect measurements of the currently available bandwidth (see the sketch below).
Note: media files may be downloaded over HTTP or over other protocols, such as WebRTC in peer-to-peer real-time communication.
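A minimal downloader with retries and bandwidth sampling might be sketched as follows; the retry count, backoff delay, and the estimator hook are illustrative assumptions.

```typescript
// Minimal downloader: retries failed requests and records a bandwidth sample.
// The retry count and backoff delay are arbitrary illustrative choices.
async function download(url: string, retries = 3): Promise<ArrayBuffer> {
  for (let attempt = 0; ; attempt++) {
    try {
      const start = performance.now();
      const response = await fetch(url);
      if (!response.ok) throw new Error(`HTTP ${response.status}`);
      const data = await response.arrayBuffer();
      const seconds = (performance.now() - start) / 1000;
      recordBandwidthSample((data.byteLength * 8) / seconds); // bits per second
      return data;
    } catch (err) {
      if (attempt >= retries) throw err;
      await new Promise((r) => setTimeout(r, 1000 * (attempt + 1))); // backoff
    }
  }
}

// Hypothetical hook into the estimator described in section 4 below.
function recordBandwidthSample(bps: number): void { /* feed the estimator */ }
```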
3. Stream Playback Engine
The stream playback engine is the central module that interacts with the decoder's API. It feeds the individual media segments into the decoder while handling the quirks of multi-bitrate switching during playback (such as discrepancies between the manifest and the actual segments, and automatic frame skipping when playback stalls).
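On the Web, this module typically drives an MSE SourceBuffer, which accepts only one append at a time; a common pattern is a small queue drained on each updateend event, sketched below.

```typescript
// Sketch: a SourceBuffer accepts one append at a time, so the playback
// engine queues segments and drains the queue on each 'updateend' event.
class AppendQueue {
  private queue: ArrayBuffer[] = [];
  constructor(private sourceBuffer: SourceBuffer) {
    sourceBuffer.addEventListener('updateend', () => this.drain());
  }
  push(segment: ArrayBuffer): void {
    this.queue.push(segment);
    this.drain();
  }
  private drain(): void {
    if (!this.sourceBuffer.updating && this.queue.length > 0) {
      this.sourceBuffer.appendBuffer(this.queue.shift()!);
    }
  }
}
```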
4. Resource Quality Estimator (bandwidth, CPU, and frame rate)
The estimator collects data along several dimensions (segment size, per-segment download time, number of dropped frames) and aggregates it to estimate the bandwidth and CPU capacity available to the user. Its output drives the decisions of the adaptive bitrate (ABR) switching controller.
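A common aggregation choice is an exponentially weighted moving average (EWMA) over the bandwidth samples, as in the sketch below; the smoothing factor is an illustrative assumption.

```typescript
// Sketch: exponentially weighted moving average of bandwidth samples.
// The smoothing factor alpha (0..1) is an illustrative assumption.
class EwmaBandwidthEstimator {
  private estimateBps: number | null = null;
  constructor(private alpha = 0.2) {}

  addSample(bps: number): void {
    this.estimateBps = this.estimateBps === null
      ? bps
      : this.alpha * bps + (1 - this.alpha) * this.estimateBps;
  }

  // Estimate handed to the ABR switching controller (section 5 below).
  estimate(): number {
    return this.estimateBps ?? 0;
  }
}
```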
5. ABR Switching Controller
The ABR switching controller may be the most critical part of the multimedia engine, and it is often the most overlooked. The controller reads the estimator's output (bandwidth and dropped frames) and uses a custom algorithm to decide whether the stream playback engine should switch the video or audio quality. A great deal of research has gone into this area; the biggest difficulty is striking a balance between the risk of rebuffering and the frequency of switching (switching too often degrades the user experience).
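A minimal rule-based controller, reusing the Variant list from the parser sketch above and applying a safety margin against rebuffering, might look like this; the 0.8 factor is an illustrative assumption.

```typescript
// Sketch: pick the highest variant whose bitrate fits under the estimated
// bandwidth, scaled by a safety factor to reduce the risk of rebuffering.
// Assumes the variants array is non-empty and sorted ascending by bitrate.
function chooseVariant(variants: Variant[], estimatedBps: number): Variant {
  const safeBps = estimatedBps * 0.8; // illustrative safety factor
  const affordable = variants.filter((v) => v.bandwidthBps <= safeBps);
  // Fall back to the lowest quality if nothing fits.
  return affordable.length > 0 ? affordable[affordable.length - 1] : variants[0];
}
```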
6. DRM Manager (optional component)
Today, all paid video services are built on DRM, and DRM in turn depends heavily on the platform or device, as we will see later in the section on decoders. The DRM manager in the multimedia engine is a wrapper around the content decryption API of the lower-level decoder. Wherever possible, it abstracts away the differences between browser and operating-system implementations. This component is usually tightly coupled to the stream playback engine, because it interacts frequently with the decoder layer.
7. Transmuxer (optional component)
As we will see later, every platform has its limitations in terms of containers and codecs (Flash reads H.264/AAC content encapsulated in the FLV container, while MSE reads H.264/AAC content encapsulated in ISOBMFF). As a result, some video segments must be converted (transmuxed) before they can be decoded. For example, with an MPEG2-TS-to-ISOBMFF transmuxer, hls.js can play HLS video streams through MSE. Transmuxing at the multimedia engine layer was once questioned, but with the performance of modern JavaScript and Flash runtimes the overhead it introduces is almost negligible and has no noticeable impact on the user experience.
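For example, playing an HLS stream through hls.js, which transmuxes MPEG2-TS into ISOBMFF internally, takes only a few lines; the stream URL below is a placeholder.

```typescript
// Play an HLS stream via hls.js, which transmuxes MPEG2-TS to ISOBMFF for MSE.
// The stream URL is a placeholder.
import Hls from 'hls.js';

const video = document.querySelector('video') as HTMLVideoElement;
if (Hls.isSupported()) {
  const hls = new Hls();
  hls.loadSource('https://example.com/live/stream.m3u8');
  hls.attachMedia(video);
} else if (video.canPlayType('application/vnd.apple.mpegurl')) {
  video.src = 'https://example.com/live/stream.m3u8'; // Safari: native HLS
}
```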
Summary
A multimedia engine contains many other components and features as well, from subtitles to ad insertion. We will later devote a separate article to comparing the different engines, offering concrete guidance on engine selection backed by tests and market data. Note that to build a player compatible with many platforms, it is important to provide multiple, freely interchangeable multimedia engines, because the underlying decoder depends on the user's platform; that is what we turn to next.
III. Decoder and DRM Manager
For reasons of decoding performance (the decoder) and security (the DRM manager), both components are tightly bound to the operating system platform.
Figure 6. Decoder, renderer, and DRM workflow
1. Decoder
The decoder handles the lowest-level playback logic. It unpacks (demuxes) video in its various container formats, decodes the content, and hands the decoded video frames to the operating system for rendering, so that the end user can see them.
As video compression algorithms grow ever more complex, decoding is a computationally intensive process. To guarantee decoding performance and a smooth playback experience, the decoding step must rely heavily on the operating system and hardware. Most decoding today depends on GPU acceleration (this is one reason why the royalty-free and more capable VP9 codec has not taken over H.264's market position). Without GPU acceleration, decoding a 1080p video consumes around 70% of the CPU, and frame dropping can be severe.
Beyond decoding and rendering video frames, the decoder also exposes a native buffer with which the multimedia engine can interact directly, learning its size in real time and flushing it when necessary; a small browser-side helper for inspecting that buffer is sketched below.
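In the browser, the engine reads this buffer through the buffered TimeRanges of the media element. The helper below (an illustrative sketch) computes how many seconds of playable content lie ahead of the playhead.

```typescript
// How many seconds of buffered media lie ahead of the current playhead?
// A multimedia engine can use this to decide when to fetch the next segment.
function bufferedAhead(video: HTMLVideoElement): number {
  const { buffered, currentTime } = video;
  for (let i = 0; i < buffered.length; i++) {
    if (buffered.start(i) <= currentTime && currentTime < buffered.end(i)) {
      return buffered.end(i) - currentTime;
    }
  }
  return 0; // playhead is outside every buffered range
}
```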
As mentioned above, each platform has its own rendering engine and corresponding APIs: Flash has NetStream, Android has the MediaCodec API, and the Web has the standard Media Source Extensions. MSE is attracting more and more attention and, beyond the browser, may become the de facto standard on other platforms in the future.
2. DRM Manager
Figure 7. DRM Manager
Today, DRM is a necessity for distributing studio-produced premium content. The content must be protected from theft, so the DRM code and its workflow are hidden from both end users and developers. Decrypted content never leaves the decoding layer, and therefore cannot be intercepted.
To standardize DRM and provide some interoperability across platform implementations, several web giants created the Common Encryption standard (CENC) and the Encrypted Media Extensions (EME), which make it possible to build a common set of APIs on top of multiple DRM providers (for example, EME can drive PlayReady on the Edge platform and Widevine on the Chrome platform). These APIs read the content encryption keys from the DRM license module and use them for decryption.
CENC specifies a standard set of encryption and key-mapping methods that can be used to decrypt the same content across multiple DRM systems; only the keys need to be supplied.
In the browser, based on the metadata of the video content, EME identifies which DRM system was used for encryption and invokes the corresponding Content Decryption Module (CDM) to decrypt the CENC-encrypted content. The CDM handles content licensing, obtains the keys, and decrypts the video.
CENC does not specify details such as license issuance, license format, license storage, or the mapping between usage rules and permissions; all of these are handled by the DRM provider. At the API level, the browser-side handshake looks roughly like the sketch below.
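Here is a rough EME sketch; the key system string, codec string, and license-server exchange are assumptions that vary by DRM provider.

```typescript
// Rough EME sketch: request a key system, create a session, and forward
// license messages. Key system, codecs, and license handling vary by provider.
async function setupDrm(video: HTMLVideoElement): Promise<void> {
  const access = await navigator.requestMediaKeySystemAccess('com.widevine.alpha', [{
    initDataTypes: ['cenc'],
    videoCapabilities: [{ contentType: 'video/mp4; codecs="avc1.42E01E"' }],
  }]);
  const mediaKeys = await access.createMediaKeys();
  await video.setMediaKeys(mediaKeys);

  video.addEventListener('encrypted', async (event: MediaEncryptedEvent) => {
    const session = mediaKeys.createSession();
    session.addEventListener('message', (msg: MediaKeyMessageEvent) => {
      // Send msg.message to the license server, then pass the response back
      // via session.update(...); the server exchange is provider-specific.
    });
    await session.generateRequest(event.initDataType, event.initData!);
  });
}
```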
IV. Summary
Today we took a deep dive into the three layers of the video player. The most remarkable trait of this modern architecture is that the interaction layer is completely separated from the logic of the multimedia engine, which allows broadcasters to customize the end-user experience seamlessly and flexibly, while different multimedia engines can be deployed across a wide variety of devices to guarantee smooth playback of content in different formats.
On the Web platform, thanks to increasingly mature JavaScript multimedia engines such as dash.js, Shaka Player, and hls.js, MSE and EME are becoming the new playback standards, and more and more influential vendors are adopting these engines. In recent years, attention has also expanded to set-top boxes and connected TVs, and we are seeing more and more such new devices adopt MSE as their underlying multimedia processing engine. We will keep investing in support for these standards.
This article was translated from How Modern Video Players Work by He Lishi, evangelist at Qiniu Cloud, and first appeared on the Qiniu Cloud official blog.