"Video Broadcast Technology Details" series: Collection
There are many technical articles on live broadcasting, but few of them are systematic. In seven articles, we give a more systematic introduction to the key technologies behind every stage of live video, which is currently a hot field, to help live video entrepreneurs gain a more comprehensive and in-depth understanding of the technology and make better technology choices.
The series is outlined as follows:
(1) Collection
(2) Processing
(3) Encoding and encapsulation
(4) Streaming and transmission
(5) Latency optimization
(6) Principles of modern players
(7) SDK performance test model
This article focuses on collection.
Collection is the first step in the video streaming process. It obtains raw video data from the system's capture devices and passes it on to the next step. Video collection covers two kinds of data, audio and images, which correspond to two completely different input sources and data formats.
Collected content
1. Audio collection
Audio data can be combined with images to form video, or it can be collected and played on its own. Pure audio plays an important role in many mature application scenarios, such as online radio and voice radio stations. During audio collection, a device captures the analog signal in the environment and converts it into raw PCM-encoded data, which is then compressed into MP3 or another format for distribution. Common audio compression formats include MP3, AAC, HE-AAC, Opus, FLAC, Vorbis (Ogg), Speex, and AMR.
Audio collection and encoding mainly face the following challenges: sensitivity to latency, sensitivity to stuttering, noise suppression (denoise), acoustic echo cancellation (AEC), voice activity detection (VAD), and various audio mixing algorithms.
In the audio collection phase, the following technical parameters are referenced:
Sampling rate (samplerate): sampling is the process of digitizing an analog signal. The higher the sampling rate, the more data is used to record the audio signal and the higher the audio quality.
Bit width: each sample needs a numeric value to represent its magnitude. This value can be 4-bit, 8-bit, 16-bit, 32-bit, and so on. The more bits, the finer the representation, the better the sound quality, and the larger the data volume. Audio is usually sampled at 8 or 16 bits.
Number of channels (channels): because audio capture and playback can be layered, sound can be captured from several audio sources at once and output to different speakers. The number of channels therefore generally indicates the number of audio sources during recording or the number of speakers during playback. One and two channels, called mono and stereo, are the common settings.
Audio frame: audio is very different from video. Each frame of video is an image, whereas audio data is a continuous stream with no inherent concept of a frame. In practice, to make audio processing and transmission easier, the data for a short span, from a few milliseconds up to 60 ms, is conventionally treated as one audio frame. This span is called the "sampling time". There is no particular standard for its length; it is determined by the requirements of the codec and the specific application.
Based on the definitions above, we can calculate the size of an audio frame. Assume an audio signal with a sampling rate of 8 kHz, two channels, a bit width of 16 bits, and a frame duration of 20 ms. The size of one frame of audio data is then:
size = 8000 x 2 x 16bit x 0.02s = 5120 bit = 640 byte
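The same calculation can be sketched in code. This is only an illustration of the formula above for uncompressed PCM; the function name and parameter names are made up for this example and are not part of any SDK.

```python
def audio_frame_size_bytes(sample_rate_hz: int, channels: int,
                           bit_width: int, frame_ms: float) -> int:
    """Size in bytes of one frame of uncompressed PCM audio.

    Hypothetical helper: multiplies sampling rate, channel count,
    bit width, and frame duration, as in the formula in the text.
    """
    total_bits = sample_rate_hz * channels * bit_width * frame_ms / 1000
    return round(total_bits / 8)


# The example from the text: 8 kHz, two channels, 16 bits, 20 ms.
print(audio_frame_size_bytes(8000, 2, 16, 20))  # 640
```

Raising the sampling rate to 44.1 kHz with the other parameters unchanged gives 3528 bytes per frame, which shows how quickly the raw data volume grows with the sampling rate.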
2. Image acquisition
The captured images are played back as a continuous sequence, which makes up the content of the video that is visible to the naked eye. During image acquisition, a camera or similar device captures raw YUV-encoded data, which is then compressed into H.264 or another format for distribution. Common video container formats include MP4, 3GP, AVI, MKV, WMV, MPG, VOB, FLV, SWF, MOV, RMVB, and WebM.
Because images make the strongest visual impression and account for a relatively large share of the data, they form the major part of video content. The main challenges in image acquisition and encoding are poor device compatibility, sensitivity to latency, sensitivity to stuttering, and various image processing operations such as beautification and watermarking.
In the image acquisition phase, the following technical parameters are referenced:
Image transmission format: the Common Intermediate Format (CIF) is an image transmission format commonly used in video conferencing.
Image format: raw data is usually stored in YUV format, which holds an 8-bit gray-scale (luma) value for the black-and-white image together with color information that can be combined into an RGB color image.
Transmission channel: normally, shooting video requires only one channel. As VR and AR technologies mature, capturing a complete 360° video may require shooting from several angles, transmitting over multiple channels, and then merging the results.
Resolution: as device screens grow larger, the resolution of the original video plays an increasingly important role in collection, because every video resolution used in later processing is defined relative to the original resolution. The maximum dot matrix (pixel dimensions) that a video capture card supports reflects its resolution performance.
Sampling frequency: the sampling frequency reflects how quickly and how capably the capture card processes images. When capturing high-resolution images, check whether the capture card's sampling frequency meets the requirements. The higher the sampling frequency, the higher the image quality and the larger the amount of data needed to store the image information.
The above covers the main technical parameters of video collection and the common formats for audio and image encoding in video. Although understanding these details is helpful for live video app developers, in actual development they rarely need to control the technical parameters of the collection process; instead, the collected data is passed directly to the next "processing" and "encoding" stages in the SDK.
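To give a feel for how resolution and sampling frequency drive the raw data volume, here is a rough sketch. It rests on assumptions the text does not state: 8-bit YUV with 4:2:0 chroma subsampling (a common raw layout), and the sampling frequency treated as frames per second. The function names are hypothetical.

```python
def yuv420_frame_bytes(width: int, height: int) -> int:
    """Bytes in one 8-bit YUV 4:2:0 frame (assumed layout).

    The Y plane is full resolution; the U and V planes are each
    subsampled by 2 horizontally and vertically.
    """
    luma = width * height
    chroma = 2 * ((width + 1) // 2) * ((height + 1) // 2)
    return luma + chroma


def raw_bitrate_mbps(width: int, height: int, fps: float) -> float:
    """Uncompressed bitrate in megabits per second."""
    return yuv420_frame_bytes(width, height) * 8 * fps / 1_000_000


print(yuv420_frame_bytes(1280, 720))    # 1382400 bytes per frame
print(raw_bitrate_mbps(1280, 720, 30))  # about 331.8 Mbit/s
```

Under these assumptions, uncompressed 720p at 30 frames per second already exceeds 300 Mbit/s, which is why the raw data is compressed into a format such as H.264 before distribution.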
Collection Source
1. Camera acquisition
Capturing video with a camera is currently the most common collection method in social live streaming; for example, the broadcaster uses the front and rear cameras of a mobile phone to shoot. In on-site live broadcasts, professional photography and video equipment is also used, and in security monitoring scenarios professional cameras are used for surveillance capture.
Currently, the SDKs provided by qiniu support both kinds of camera. On mobile phones, both iOS and Android support capture from the front and rear cameras. Because iOS has only a small number of device types and system versions, its collection module has good compatibility, whereas Android has to be made compatible with many hardware devices and system versions; camera capture is currently supported on Android 4.0.3 and later. For professional video cameras and other dedicated cameras, qiniu cloud provides a C-language acquisition module compatible with embedded systems. For details, refer to: GitHub-pili-engineering/ipcam_sdk.
2. Screen recording
Screen recording is very common in game live streaming. We have already implemented screen recording in the Android SDK. On iOS, however, it cannot be done directly because the system does not give apps permission to record the screen. For iOS 9 and later there is a clever workaround: the app can simulate an AirPlay mirroring connection to itself, so that it can capture everything shown on the screen in software and achieve the effect of screen recording.
In live education or conference presentations, we often need to record a PPT presentation on the computer desktop. The most convenient solution currently on the market is to use the open-source desktop streaming tool OBS (Open Broadcaster Software) for screen recording and streaming.
3. Streaming from video files
Besides capturing video from hardware devices for streaming, we may also need to deliver a video or audio file to the audience in real time as a live stream, as with online radio stations or TV programs, whose input may come directly from content that has already been recorded and edited.
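As a rough illustration of turning a file into a stream, the sketch below cuts already-decoded PCM audio into fixed-duration frames and attaches the time at which each frame should be sent. The function `iter_pcm_frames` and its parameters are hypothetical; decoding the file and the actual network transmission are left out.

```python
from typing import Iterator, Tuple


def iter_pcm_frames(pcm: bytes, sample_rate_hz: int, channels: int,
                    bit_width: int, frame_ms: int) -> Iterator[Tuple[int, bytes]]:
    """Yield (timestamp_ms, chunk) pairs from a raw PCM buffer.

    Hypothetical helper: each chunk holds frame_ms of audio, except
    possibly the last one, which may be shorter.
    """
    frame_bytes = sample_rate_hz * channels * (bit_width // 8) * frame_ms // 1000
    if frame_bytes <= 0:
        raise ValueError("frame parameters give an empty frame")
    for index, offset in enumerate(range(0, len(pcm), frame_bytes)):
        yield index * frame_ms, pcm[offset:offset + frame_bytes]


# 1600 bytes of silence at 8 kHz, two channels, 16 bits, 20 ms frames.
for timestamp_ms, chunk in iter_pcm_frames(bytes(1600), 8000, 2, 16, 20):
    print(timestamp_ms, len(chunk))
```

A real sender would wait until each timestamp before transmitting the corresponding chunk, so that the file is delivered at playback speed rather than all at once.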
Open Design
The above covers video collection from two angles: the collected content and the collection source. The sources on the market, however, go far beyond the three types described here, and even cameras fall into many categories. For a complete live video cloud service covering streaming, transmission, and playback, supporting as many collection sources and playback terminals as possible is an unavoidable and difficult task.
To support access to all the collection sources on the market, we adopt an open design in the SDK. As long as the collection source implementation follows the corresponding interface, we can support any collection source.
In the figure, we divide the collected content into images and audio. The image collection source can be a camera, screen recording, or local video file, or any other source that is defined and implemented separately. The audio collection source can be a microphone, system sound, or local audio file, and other input sources can of course be defined for it as well.
The biggest benefit of this design is that a lightweight design can support a wide range of collection sources, and the implementation of a specific collection source can be left to users.
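The idea of an open interface can be sketched as follows. This is not the SDK's actual API: `CollectionSource`, `InMemoryAudioSource`, and `Pipeline` are hypothetical names invented for illustration, and the callback-based shape is only one possible design.

```python
from abc import ABC, abstractmethod
from typing import Callable, List


class CollectionSource(ABC):
    """Hypothetical interface that every collection source implements."""

    @abstractmethod
    def start(self, on_data: Callable[[bytes], None]) -> None:
        """Begin collecting and hand each chunk of data to on_data."""

    @abstractmethod
    def stop(self) -> None:
        """Stop collecting and release the underlying device or file."""


class InMemoryAudioSource(CollectionSource):
    """Toy source that replays a byte buffer in fixed-size chunks."""

    def __init__(self, data: bytes, chunk_size: int) -> None:
        self._data = data
        self._chunk_size = chunk_size

    def start(self, on_data: Callable[[bytes], None]) -> None:
        for offset in range(0, len(self._data), self._chunk_size):
            on_data(self._data[offset:offset + self._chunk_size])

    def stop(self) -> None:
        pass  # nothing to release for an in-memory buffer


class Pipeline:
    """Stand-in for the next stage; it depends only on the interface."""

    def __init__(self) -> None:
        self.received: List[bytes] = []

    def attach(self, source: CollectionSource) -> None:
        source.start(self.received.append)


pipeline = Pipeline()
pipeline.attach(InMemoryAudioSource(b"abcdef", 4))
print(pipeline.received)  # [b'abcd', b'ef']
```

Because the pipeline only knows about the interface, a camera, screen recorder, or file reader written by a user could be attached in the same way without changes to the later stages.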
In the next installment, we will look in detail at processing in live video and at how to meet broadcasters' various needs, such as face beautification, watermarks, and co-streaming (linked-mic) interaction.
Coming soon!
Author: Tao zeyu @ qiniu livestream cloud engineer. For more technical insights in the cloud industry, visit qiniu cloud blog.