WebSocket + MSE: HTML5 Live Video Technology Analysis
Author | Liu Bo (Upyun Multimedia Development Engineer)
HTML5 live-streaming technologies have been developing rapidly to meet the demand for live video on the mobile Web.
Common HTML5 live video technologies include HLS, WebSocket, and WebRTC. This article introduces the technical points of WebSocket and MSE, and demonstrates their use with a concrete demo.
Outline
- WebSocket Protocol Introduction
- WebSocket Client/Server API Introduction
- MSE Introduction
- fMP4 Introduction
- Demo
WebSocket
Web applications have generally been built around the HTTP request/response model: all HTTP communication is controlled by the client, which sends a request to the server; the server receives and processes the request and returns the result, which the client then displays. Because this model cannot satisfy real-time applications, persistent-connection techniques such as SSE and Comet were introduced.
WebSocket is a communication protocol based on TCP that provides full-duplex communication over a single TCP connection. In 2011 the IETF standardized WebSocket as RFC 6455, later supplemented by RFC 7936; the WebSocket API was standardized by the W3C.
WebSocket is an independent protocol built on TCP, and HTTP concepts do not carry over to it. Its only tie to HTTP is the handshake, which uses HTTP status code 101 to switch protocols; because the connection runs over TCP port 80, it can pass through most firewalls.
WebSocket handshake
To facilitate the deployment of new protocols, HTTP/1.1 introduced the Upgrade mechanism, which lets the client and server switch to another protocol using existing HTTP syntax. The mechanism is described in detail in Section 6.7 (Upgrade) of RFC 7230.
To initiate an HTTP/1.1 protocol upgrade, the client must include these two header fields in its request:
> Connection: Upgrade
> Upgrade: protocol-name[/protocol-version]
If the server agrees to the upgrade, it responds as follows:
> HTTP/1.1 101 Switching Protocols
> Connection: Upgrade
> Upgrade: protocol-name[/protocol-version]
>
> [... data defined by new protocol ...]
As you can see, the status code of the HTTP Upgrade response is 101, and the response body can use the data format defined by the new protocol.
The WebSocket handshake uses exactly this HTTP Upgrade mechanism. Once the handshake completes, subsequent data transmission happens directly over TCP.
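To make this concrete, here is the sample handshake from RFC 6455 (Sec-WebSocket-Key is a random nonce chosen by the client, and the server derives Sec-WebSocket-Accept from it, so these exact values are simply the RFC's example):
> GET /chat HTTP/1.1
> Host: server.example.com
> Connection: Upgrade
> Upgrade: websocket
> Sec-WebSocket-Version: 13
> Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ==
>
> HTTP/1.1 101 Switching Protocols
> Connection: Upgrade
> Upgrade: websocket
> Sec-WebSocket-Accept: s3pPLMBiTxaQ9kYGzzhZRbK+xOo=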
WebSocket JavaScript API
Currently, mainstream browsers provide a WebSocket API that can send messages (text or binary) to a server and receive event-driven response data.
Step 1. Check whether the browser supports WebSocket
> if (window.WebSocket) {
>     // WebSocket code
> }
Step 2. Establish a connection
> var ws = new WebSocket('ws://localhost:8327');
Step 3. Register callbacks, then send and receive data
Register the onopen, onclose, onerror, and onmessage callbacks on the WebSocket object.
Use ws.send() to send data; besides strings, Blob or ArrayBuffer data can also be sent.
To receive binary data, set the binaryType of the connection object to 'blob' or 'arraybuffer':
ws.binaryType = 'arraybuffer';
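Putting the three steps together, a minimal client might look like the following sketch (the ws://localhost:8327 endpoint comes from the example above; the message handling is purely illustrative):
> if (window.WebSocket) {
>     var ws = new WebSocket('ws://localhost:8327');
>     ws.binaryType = 'arraybuffer'; // deliver binary frames as ArrayBuffer
>
>     ws.onopen = function () {
>         ws.send('hello'); // strings, Blob, and ArrayBuffer are all accepted
>     };
>     ws.onmessage = function (e) {
>         // e.data is a string for text frames, an ArrayBuffer for binary frames
>         console.log('received:', e.data);
>     };
>     ws.onerror = function (e) {
>         console.error('WebSocket error:', e);
>     };
>     ws.onclose = function () {
>         console.log('connection closed');
>     };
> }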
WebSocket Golang API
For a server-side WebSocket library in Go, we recommend golang.org/x/net/websocket, which works smoothly with net/http. A WebSocket handler function can be converted into an http.Handler via websocket.Handler, so it plugs directly into the net/http library.
Data is then received with websocket.Message.Receive and sent with websocket.Message.Send.
For specific code, see the Demo section below.
MSE
Before introducing MSE, let's look at the limitations of the HTML5 <audio> and <video> tags.
HTML5 <audio> and <video> tags
- Stream not supported
- DRM and encryption are not supported
- Controls are difficult to customize consistently across browsers
- Codec and container support varies between browsers
MSE was designed to solve these HTML5 streaming problems.
Media Source Extensions (MSE) is a W3C-standard Web API supported by mainstream browsers such as Chrome, Safari, and Edge. It allows JavaScript to dynamically construct media streams for <video> and <audio>, defining objects through which JavaScript can feed media-stream segments to an HTMLMediaElement.
With MSE you can modify media streams dynamically without any plug-ins, which lets front-end JavaScript do much more: remux, repackage, process, and even transcode media directly in JavaScript.
Although MSE does not let you pipe arbitrary streams straight into a media tag, it provides the core technology for building cross-browser players, allowing browsers to push audio and video into media tags through JavaScript APIs.
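The basic API flow, as a minimal sketch (the MIME string matches the mp4info output shown later in this article; fetchInitSegment is a hypothetical helper that resolves to an ArrayBuffer holding an fMP4 initialization segment):
> var video = document.querySelector('video');
> var mediaSource = new MediaSource();
> // Attach the MediaSource to the <video> element through an object URL.
> video.src = URL.createObjectURL(mediaSource);
>
> mediaSource.addEventListener('sourceopen', function () {
>     // Check support first, then create a SourceBuffer for fMP4 data.
>     var mime = 'video/mp4; codecs="avc1.42E01E, mp4a.40.2"';
>     if (!MediaSource.isTypeSupported(mime)) return;
>     var sourceBuffer = mediaSource.addSourceBuffer(mime);
>     fetchInitSegment().then(function (buf) {
>         sourceBuffer.appendBuffer(buf); // media segments follow the same path
>     });
> });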
Browser Support
You can use caniuse to check browser support.
MediaSource.isTypeSupported() can further check whether a specific codec MIME type is supported.
fMP4
Commonly used streaming container formats include WebM and fMP4.
WebM and WebP are sister projects sponsored by Google. Because WebM is based on the Matroska container format, it is inherently streamable and well suited to streaming media.
The following describes the fMP4 format.
As we all know, an MP4 file is composed of a series of boxes in a nested structure, and with an ordinary MP4 the client must load the file from the beginning before it can play the whole video.
fMP4 (fragmented MP4) is composed of a series of fragments; if the server supports byte-range requests, these fragments can be requested and played independently on the client, without loading the entire file.
To better illustrate this point, here are several commonly used tools for analyzing MP4 files:
- GPAC, the media development framework behind MP4Box; its source tree includes a large number of media analysis tools (see its test apps);
- mp4box.js, the JavaScript version of MP4Box;
- Bento4, an analysis toolkit dedicated to MP4;
- mp4parser, an online MP4 file analysis tool.
Fragmented MP4 vs. non-fragmented MP4
The following figure shows the details of a fragmented MP4 file analyzed with mp4parser (Online MPEG4 Parser).
The next figure shows a non-fragmented MP4 file parsed with the same tool.
We can see that a non-fragmented MP4 has very few top-level box types, while a fragmented MP4 is made up of repeated moof + mdat sections, each of which already contains enough metadata and data to start playback directly from that position. In other words, fMP4 is a streaming container format, better suited to network streaming because it does not depend on the metadata in the file header.
Apple announced at WWDC 2016 that HLS would support fMP4 on iOS 10, tvOS, and macOS, which suggests a bright future for fMP4.
It is worth mentioning that fMP4, CMAF, and ISOBMFF are closely related.
MSE JavaScript API
At a high level, MSE provides:
- A set of JavaScript APIs for building media streams
- A splicing and caching model
- Identification of byte stream types:
  - WebM
  - ISO Base Media File Format
  - MPEG-2 Transport Streams
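Each byte stream type corresponds to a MIME string that can be passed to isTypeSupported or addSourceBuffer; a quick illustrative check (the codec parameters are examples, and actual support, especially for MPEG-2 TS, varies widely by browser):
> MediaSource.isTypeSupported('video/webm; codecs="vp8, vorbis"');          // WebM
> MediaSource.isTypeSupported('video/mp4; codecs="avc1.42E01E, mp4a.40.2"'); // ISO BMFF (fMP4)
> MediaSource.isTypeSupported('video/mp2t; codecs="avc1.42E01E, mp4a.40.2"'); // MPEG-2 TS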
Internal Structure of MSE
The design of MSE does not depend on any specific codec or container format, but different browsers support them to different degrees.
You can check by passing a MIME-type string to the static method MediaSource.isTypeSupported. For example:
> MediaSource.isTypeSupported('audio/mp3'); // false
> MediaSource.isTypeSupported('video/mp4'); // true
> MediaSource.isTypeSupported('video/mp4; codecs="avc1.4D4028, mp4a.40.2"'); // true
To obtain the codec MIME string, you can use the online tool [mp4info](http://nickdesaulniers.github.io/mp4info), or run mp4info test.mp4 | grep Codecs on the command line, which produces output like this:
> mp4info fmp4.mp4 | grep Codec
>  Codecs String: mp4a.40.2
>  Codecs String: avc1.42E01E
Currently, H.264 + AAC in an MP4 container is supported in all major browsers.
Ordinary MP4 files cannot be used with MSE; they must first be fragmented.
To check whether an MP4 file is already fragmented:
> mp4dump test.mp4 | grep "\[m"
For a non-fragmented MP4, the output looks like this:
> mp4dump nfmp4.mp4 | grep "\[m"
> [mdat] size=8+50873
> [moov] size=8+7804
>   [mvhd] size=12+96
>   [mdia] size=8+3335
>     [mdhd] size=12+20
>     [minf] size=8+3250
>   [mdia] size=8+3975
>     [mdhd] size=12+20
>     [minf] size=8+3890
>   [mp4a] size=8+82
>     [meta] size=12+78
For a fragmented MP4, the output looks similar to this:
> mp4dump fmp4.mp4 | grep "\[m" | head -n 30
> [moov] size=8+1871
>   [mvhd] size=12+96
>   [mdia] size=8+312
>     [mdhd] size=12+20
>     [minf] size=8+219
>       [mp4a] size=8+67
>   [mdia] size=8+371
>     [mdhd] size=12+20
>     [minf] size=8+278
>   [mdia] size=8+248
>     [mdhd] size=12+20
>     [minf] size=8+156
>   [mdia] size=8+248
>     [mdhd] size=12+20
>     [minf] size=8+156
>   [mvex] size=8+144
>     [mehd] size=12+4
> [moof] size=8+600
>   [mfhd] size=12+4
> [mdat] size=8+138679
> [moof] size=8+536
>   [mfhd] size=12+4
> [mdat] size=8+24490
> [moof] size=8+592
>   [mfhd] size=12+4
> [mdat] size=8+14444
> [moof] size=8+312
>   [mfhd] size=12+4
> [mdat] size=8+1840
> [moof] size=8+600
Converting a non-fragmented MP4 into a fragmented MP4
You can use FFmpeg's -movflags option for the conversion.
If the source file is not MP4:
> ffmpeg -i trailer_1080p.mov -c:v copy -c:a copy -movflags frag_keyframe+empty_moov bunny_fragmented.mp4
If the source file is already MP4:
> ffmpeg -i non_fragmented.mp4 -movflags frag_keyframe+empty_moov fragmented.mp4
Alternatively, use Bento4's mp4fragment:
> mp4fragment input.mp4 output.mp4
DEMO TIME
To finish, here are two demos: the MSE VOD Demo and the MSE Live Demo.
MSE VOD Demo
This demo shows how to implement a video-on-demand service with MSE and WebSocket:
the backend reads an fMP4 file and sends it over WebSocket to MSE for playback.
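A minimal sketch of the browser side of such a demo (the endpoint, MIME string, and chunking are illustrative assumptions; note that appendBuffer is asynchronous, so incoming chunks must be queued while the SourceBuffer is updating):
> var video = document.querySelector('video');
> var mediaSource = new MediaSource();
> video.src = URL.createObjectURL(mediaSource);
>
> mediaSource.addEventListener('sourceopen', function () {
>     var sb = mediaSource.addSourceBuffer('video/mp4; codecs="avc1.42E01E, mp4a.40.2"');
>     var queue = [];
>
>     // appendBuffer is asynchronous: feed queued chunks one at a time.
>     sb.addEventListener('updateend', function () {
>         if (queue.length > 0) sb.appendBuffer(queue.shift());
>     });
>
>     var ws = new WebSocket('ws://localhost:8327'); // illustrative endpoint
>     ws.binaryType = 'arraybuffer';
>     ws.onmessage = function (e) {
>         // Each message carries a chunk of the fMP4 stream from the backend.
>         if (sb.updating || queue.length > 0) {
>             queue.push(e.data);
>         } else {
>             sb.appendBuffer(e.data);
>         }
>     };
> });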
MSE Live Demo
This demo shows how to implement a live video service with MSE and WebSocket:
the backend proxies an HTTP-FLV live stream and sends it over WebSocket to MSE for playback.
The front-end MSE side does a lot of work here, including remuxing FLV into fMP4 in real time; the implementation references videojs-flow.
Refs
WebSocket
- RFC 6455
- HTTP Upgrade
- WebSocket API
- MDN WebSocket
- videojs-flow
MSE
- W3C
- MDN MSE
- HTML5 Codec MIME
Upyun Live is a one-stop live-streaming solution built on Upyun's content delivery network, providing live applications with ultra-low latency, high bitrate, and high concurrency. It includes real-time transcoding, real-time recording, distribution acceleration, watermarking, instant stream banning, and delayed broadcast. The live origin can be either your own origin server or an Upyun origin, and to support playback on different terminals it offers RTMP, HLS, and HTTP-FLV output.
Learn more: https://www.upyun.com/products/live