"Transcoding" for each family may not be the same, but the logic is the same, I am sharing today is another Pat cloud on the narrowband high-definition transcoding, close to the "transcoding", more basic, theoretical knowledge will be relatively more. Share content mainly divided into two pieces, the first is to introduce what is narrowband high-definition transcoding, the second is to introduce and pat the cloud narrowband high-definition transcoding implementation mode.
First, what is narrowband HD transcoding
Let's start with the definition. UPYUN defines "Narrowband HD" as: with the encoding standard unchanged and the file's container format unchanged, saving roughly 30% of the bitrate through intelligent analysis of scene complexity and intelligent bitrate allocation.
Transcoding architecture diagram
The figure above shows a traditional transcoding architecture. Whether the input is a local file or a streaming-media file, it is first demuxed to separate the audio and video, and each is then decoded independently. The step in the middle is optional: here you can apply audio and video processing, such as echo and noise suppression, picture-in-picture, or other image processing. After this step (or skipping it), the streams are re-encoded. At this point the video can be compressed with any of several different video coding standards, and likewise the audio with different audio coding standards. Finally everything is muxed into an output file, which can be a local file or a network stream.
Based on this architecture, if we want the transcoded file to be smaller, we can work in three directions (audio contributes little to the total size, so the discussion here focuses on video and sets audio aside). First, adopt a newer video coding standard: H.266, currently under study, is expected to reduce the bitrate by roughly 30% to 50%. Second, choose the container format wisely: FLV and MP4, for example, produce different file sizes even when the encapsulated audio and video streams are identical, because each container adds its own overhead. Third, apply narrowband HD transcoding.
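As a rough, back-of-the-envelope illustration of the first and third directions, the sketch below applies the claimed savings percentages from the text to a hypothetical 2 Mbps H.264 baseline. The duration and baseline bitrate are invented example values, not measurements:

```python
# Back-of-the-envelope file-size estimate for a hypothetical 10-minute video.
# The savings percentages are the claimed figures from the text, applied to
# an assumed 2 Mbps H.264 baseline; none of these are measured values.

def stream_size_mb(bitrate_mbps: float, seconds: float) -> float:
    """Size of an elementary stream in megabytes (8 megabits per MB)."""
    return bitrate_mbps * seconds / 8

duration = 10 * 60                       # 10 minutes
h264_mb = stream_size_mb(2.0, duration)  # baseline
h266_mb = h264_mb * (1 - 0.40)           # midpoint of the claimed 30-50% savings
narrowband_mb = h264_mb * (1 - 0.30)     # the claimed ~30% narrowband HD savings

print(f"H.264 baseline : {h264_mb:.0f} MB")
print(f"H.266 (approx) : {h266_mb:.0f} MB")
print(f"Narrowband HD  : {narrowband_mb:.0f} MB")
```

Even at these rough numbers, the difference between a 150 MB and a roughly 100 MB file per ten minutes adds up quickly at scale.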
1. Video Coding Standard
Among video coding standards, the current mainstream is H.264, whose standard was completed in 2003. H.265 was finalized in 2013, but because its patent costs are too high, most applications remain on H.264. H.266 is scheduled to be finalized by the end of 2020, when a new generation of coding standard will arrive. In addition there is the AOM camp, represented by Google: from 2008 to 2013 Google pushed VP8 and VP9, but because adoption at the chip level was poor, they were mainly used within Google's own products. AV1 was finalized at the beginning of this year, and we have high hopes that it can compete with H.265. China also has its own AVS standard series, which is currently used only domestically.
The main reference standards we consider are H.264, H.265, and AV1. Each generation is claimed to save roughly 30% to 50% of the bitrate relative to the previous one, which is very impressive.
Evolution of coding standards
The comparison above shows that at the same bitrate, AV1 delivers better quality than both H.264 and H.265, and at the same quality, its bitrate is more economical.
2. File Container Format Selection
File Encapsulation Format
The figure above shows statistics for a YUV file, including its resolution and frame count. The raw YUV file is 834 MB; after encoding, the elementary stream is about 4 MB. Encapsulating that stream into MP4 mainly adds a moov header plus the mdat box (the audio and video data). The moov header contains information about each audio and video frame, including sizes and timestamps, yet it adds only about 7 KB. Packaging the same stream as FLV adds around a kilobyte here because there are few frames, but FLV prepends a tag header to every single frame, audio and video alike, so as the frame count grows, the overhead becomes quite considerable. TS is heavier still: it splits data into 188-byte packets and encapsulates more information, including media metadata.
In general, the MP4 format is the one most often used in scenarios where real-time requirements are not strict.
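The container overheads described above can be approximated numerically. The sketch below uses the fixed header sizes of the formats (a 4-byte header in each 188-byte TS packet; an 11-byte tag header plus a 4-byte PreviousTagSize per FLV tag); the stream size and frame count are illustrative example values, not the exact ones from the talk:

```python
# Approximate per-container overhead for the same elementary stream.
# Header sizes: TS packets are 188 bytes with a 4-byte header; each FLV tag
# costs an 11-byte header plus a 4-byte PreviousTagSize field.
# The stream size and frame count below are illustrative, not measured.

TS_PACKET, TS_HEADER = 188, 4
FLV_TAG_OVERHEAD = 11 + 4

def ts_overhead(stream_bytes: int) -> int:
    """Minimum TS overhead: 4 header bytes per 188-byte packet."""
    packets = -(-stream_bytes // (TS_PACKET - TS_HEADER))  # ceiling division
    return packets * TS_HEADER

def flv_overhead(num_frames: int) -> int:
    """FLV overhead grows linearly with the number of frames (tags)."""
    return num_frames * FLV_TAG_OVERHEAD

stream = 4 * 1024 * 1024   # ~4 MB elementary stream, as in the example above
frames = 250               # hypothetical 10 seconds at 25 fps
print("TS  overhead:", ts_overhead(stream), "bytes")
print("FLV overhead:", flv_overhead(frames), "bytes")
```

The point matches the text: MP4's moov cost is paid once, FLV's cost scales with the frame count, and TS pays a fixed tax on every 188-byte packet.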
3. Narrowband HD
Narrowband HD can be explained with the graph above, which shows two video sequences, blue and red. Blue is a slow-motion video: for example, as I give this talk, only I am moving while the background is static, so the motion in the content is relatively slow. Red is a video with intense motion, such as a "Transformers" movie.
The curves of these two sequences can be read as follows. PSNR represents video quality. At the same quality level, the bitrate the blue sequence needs differs from what the red sequence needs: one is under 1 Mbps, the other over 2 Mbps. In other words, to reach the same quality, videos with different degrees of motion require different bitrates. Conversely, at the same bitrate, say 2 Mbps, the high-motion sequence reaches a lower PSNR (in dB) than the static one, so at equal bitrate, the quality of high-motion and low-motion video is not the same.
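PSNR, the quality axis in the graph above, is computed from the mean squared error between the original and reconstructed frames. A minimal sketch for 8-bit samples, using tiny made-up "frames" rather than real video data:

```python
import math

def psnr(original, reconstructed, max_value=255):
    """PSNR in dB between two equal-length sequences of 8-bit samples."""
    mse = sum((a - b) ** 2 for a, b in zip(original, reconstructed)) / len(original)
    if mse == 0:
        return float("inf")   # identical frames: no distortion
    return 10 * math.log10(max_value ** 2 / mse)

# Toy 8-sample "frames": a small distortion yields a high PSNR.
ref = [16, 32, 64, 96, 128, 160, 192, 224]
dec = [17, 31, 64, 97, 128, 159, 192, 225]
print(f"PSNR = {psnr(ref, dec):.2f} dB")
```

The larger the distortion (mean squared error), the lower the PSNR, which is why a high-motion sequence encoded at the same bitrate lands lower on the quality axis.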
This gives us room to operate: keeping the coding standard and container format unchanged, and building on these two observations, we reduce the video bitrate in ways the human eye does not perceive as quality loss. Test results show the bitrate savings are very considerable, generally reaching 30%, though of course the savings differ with scene complexity.
Second, how UPYUN implements narrowband HD transcoding
The following describes how "Narrowband HD" is implemented. The input video is first split into shards for transcoding; next comes complexity analysis; then scene-specific transcoding parameters are chosen, for example for slow versus intense motion; a bitrate control algorithm adjusts the encoder's output along the way; and finally we obtain the encoded video.
1. Analysis of complexity
For complexity analysis, we use the spatial perceptual information (SI) and temporal perceptual information (TI) described in the BT.1788 standard. Spatial information applies a Sobel filter to each frame and measures how much texture it contains, which serves as the reference; temporal information is the standard deviation of the pixel differences between consecutive frames, measuring change over time.
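A minimal pure-Python sketch of those two measurements on grayscale frames (a frame is a list of rows of 8-bit values; real implementations run this over whole sequences and typically take the maxima across frames):

```python
import math

def _std(values):
    """Population standard deviation of a list of numbers."""
    mean = sum(values) / len(values)
    return math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))

def spatial_info(frame):
    """SI: std deviation of the Sobel gradient magnitude of one frame."""
    h, w = len(frame), len(frame[0])
    grads = []
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = (frame[y-1][x+1] + 2*frame[y][x+1] + frame[y+1][x+1]
                  - frame[y-1][x-1] - 2*frame[y][x-1] - frame[y+1][x-1])
            gy = (frame[y+1][x-1] + 2*frame[y+1][x] + frame[y+1][x+1]
                  - frame[y-1][x-1] - 2*frame[y-1][x] - frame[y-1][x+1])
            grads.append(math.hypot(gx, gy))
    return _std(grads)

def temporal_info(prev, curr):
    """TI: std deviation of the pixel-wise difference of two frames."""
    diffs = [c - p for rc, rp in zip(curr, prev) for c, p in zip(rc, rp)]
    return _std(diffs)

flat = [[128] * 8 for _ in range(8)]            # textureless frame
edge = [[0] * 4 + [255] * 4 for _ in range(8)]  # strong vertical edge
print("SI flat :", spatial_info(flat))
print("SI edge :", spatial_info(edge))
print("TI still:", temporal_info(flat, flat))
```

A flat frame scores zero texture and an unchanged pair of frames scores zero temporal change, which is exactly the low-complexity end of the scale where narrowband HD can cut the most bitrate.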
Based on users' application scenarios, UPYUN classifies them into four scene types: phone selfie, animation, slow motion, and vigorous motion. There is also an Auto mode: if the user does not choose, we automatically pick the most suitable of the four based on the complexity analysis. Why classify scenes at all? Because some customers have a single kind of video source: a customer whose videos are all phone selfies, for example, can simply choose the selfie preset. As the business becomes more complex, we will classify scenes at a finer granularity.
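Conceptually, the Auto mode could map the spatial and temporal information values from the complexity analysis to one of the four presets. The thresholds and preset names below are invented purely for illustration; they are not UPYUN's actual values or labels:

```python
# Hypothetical scene classifier for an "Auto" mode.
# All threshold values here are made up for illustration only.

def pick_scene(si: float, ti: float) -> str:
    """Map spatial (SI) and temporal (TI) information to a scene preset."""
    if ti > 40:
        return "vigorous-motion"   # intense change between frames
    if si < 15 and ti < 10:
        return "animation"         # flat areas, little texture or motion
    if ti < 10:
        return "selfie"            # textured but nearly static background
    return "slow-motion"

print(pick_scene(si=60, ti=5))     # textured, nearly static content
```

In practice such a mapping would be tuned against real subjective-quality tests per scene type rather than fixed cutoffs like these.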
A. H.264 Codec
Optimized encoder parameters: H.264 codec
The figure shows the encoder framework. In fact, the coding frameworks of H.264 and H.265 are about the same: both compress redundancy in the spatial and temporal domains. The pipeline includes inter-frame and intra-frame prediction, transform, quantization, inverse transform and inverse quantization, entropy coding, and deblocking filtering. Because these coding standards operate on "blocks", blocking artifacts appear, so a deblocking filter is needed to improve the subjective quality perceived by the human eye.
B. H.265 Codec
Optimized encoder parameters: H.265 codec
This is the H.265 coding framework, which uses the same hybrid coding framework as H.264: inter-frame and intra-frame prediction, entropy coding, and deblocking to remove blocking artifacts, plus a new SAO (sample adaptive offset) filter to suppress the ringing effect.
C. Comparison of H.264 and H.265 Parameters
Although the H.264 and H.265 processes are essentially the same and the overall framework has not changed, they are "alike in outline, different everywhere in detail": each individual technique has been optimized.
First, the maximum block size is extended from 16x16 in H.264 to 64x64 in H.265, which raises the complexity of block partitioning exponentially;
Second, regarding prediction, the number of intra-frame prediction directions in H.265 has been raised to 35. Because H.265 targets high definition, including 1080p, 2K, 4K, and up to 8K, frames are larger and can be partitioned into larger blocks; for large areas of the image with little variation, a larger block size can be used, reducing the prediction computation spent on those blocks. The motion vectors are also optimized, and the luma and chroma interpolation algorithms are more complex.
Third, parallel computing has been added: the complexity has risen a great deal, and parallel technology in the computing industry has matured, so the video coding standard adds parallelism-friendly optimizations to save encoding time.
D. H.265 Parameter Optimization
h.265 Coding parameter Optimization
Whether it is H.264 or H.265, the coding parameters number in the dozens or even hundreds, so how do you set them? We analyze theoretically where the encoding time goes and then optimize the parameters for those stages.
The first is inter-frame prediction. Since coding is block-based, if the block prediction is accurate, a block in the next frame may be exactly identical to its reference, and the residual costs 0 bits; if the prediction is inaccurate, the differences can be large and the bitrate grows. So the accuracy of inter-frame prediction determines how much bitrate can be saved. However, an exhaustive (full) search is computationally enormous, so the key factors here are the motion-search algorithm and the motion-search range.
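The full search mentioned above can be sketched as follows: for every candidate motion vector within the search range, compute the SAD (sum of absolute differences) against the reference frame and keep the best. Real encoders use faster patterns (diamond, hexagon) precisely because this brute force is so expensive; the toy frames here are synthetic:

```python
def sad(ref, cur, rx, ry, cx, cy, bs):
    """Sum of absolute differences between two bs x bs blocks."""
    return sum(abs(ref[ry + j][rx + i] - cur[cy + j][cx + i])
               for j in range(bs) for i in range(bs))

def full_search(ref, cur, cx, cy, bs, search_range):
    """Brute-force motion search: try every vector within the range."""
    h, w = len(ref), len(ref[0])
    best = (None, float("inf"))
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            rx, ry = cx + dx, cy + dy
            if 0 <= rx <= w - bs and 0 <= ry <= h - bs:
                cost = sad(ref, cur, rx, ry, cx, cy, bs)
                if cost < best[1]:
                    best = ((dx, dy), cost)
    return best

# Toy 16x16 frames: the current frame is the reference shifted left by 2,
# so the block at (4, 4) should be found at motion vector (+2, 0).
ref = [[x + 10 * y for x in range(16)] for y in range(16)]
cur = [row[2:] + row[:2] for row in ref]
print(full_search(ref, cur, 4, 4, 2, 3))   # → ((2, 0), 0)
```

A perfect match (SAD of 0) is the case the text describes: the residual costs essentially no bits, which is why accurate motion search saves so much bitrate.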
The encoded frame types are currently I-frames, P-frames, and B-frames, where B-frames are further divided into referenced and non-referenced B-frames. In theory an I-frame occupies the most bits, so it has the greatest impact on video quality; a P-frame is forward-referenced and comes second in size; then referenced B-frames; non-referenced B-frames are the smallest and are dispensable. This becomes obvious in real streaming: if you lose an I-frame, the entire GOP cannot be decoded and must all be discarded, and the error propagates.
Video post-processing, including deblocking filtering and sample adaptive offset (SAO), further improves image quality. Parallel computing also has a large impact, in addition to bitrate control and assembly-level optimization.
3. Bitrate Control Algorithm
Bitrate control sits outside the video coding standard itself; it belongs to the encoder and adjusts the encoder's output stream based on a feedback mechanism. For example, if the bandwidth suddenly shrinks during output but you keep streaming at the current quality, data will be lost; this is fed back to the encoder so it can lower its output bitrate. If the bandwidth improves, that is also fed back, and the encoder can output a higher-quality stream.
Bitrate control falls into two broad categories: CBR, constant bitrate, and VBR, variable bitrate, though real deployments use variations of both. The control strategy also varies with the application scenario, such as local files versus streaming. Streaming generally requires CBR, but CBR can waste bandwidth, and the quality will not be optimal.
As for the specific rate-control modes:
CQP encodes with constant quantization parameters. It is mainly used in academic research to verify encoder quality; it is relatively fast and makes the encoder's behavior easy to observe.
ABR, average bitrate, is one form of VBR.
CBR sets a fixed bitrate. In streaming media, bandwidth often has a hard cap, such as no more than 2 Mbps, so this mode is used there.
n-pass, such as 2-pass rate control, requires encoding twice, so it is only applicable where real-time requirements are loose; it controls the rate more accurately, but at double the time cost.
CRF is similar in principle to CQP, but it allocates bitrate differently for slow motion versus vigorous motion, optimizing the distribution of bits.
VBV is a buffer pool: since bitrate control is based on a feedback mechanism, a buffer-pool mechanism is needed to guarantee it.
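The VBV buffer mentioned last can be pictured as a leaky bucket: bits drain out at the channel rate while each encoded frame pours bits in, and an overflow or underflow is the feedback signal that tells the rate controller to adjust. A simplified simulation, with all numbers invented for illustration:

```python
# Simplified VBV (leaky bucket) simulation. The frame sizes, bitrate, and
# buffer size below are illustrative only, not real encoder values.

def simulate_vbv(frame_bits, bitrate, fps, buffer_size, fullness=0):
    """Track buffer fullness after each frame, flagging any overflow."""
    drain_per_frame = bitrate / fps        # bits the channel removes per frame slot
    history = []
    for bits in frame_bits:
        fullness += bits                   # the encoder adds one frame's bits
        if fullness > buffer_size:
            history.append(("overflow", fullness))
            fullness = buffer_size         # clamp: the controller must cut bits
        fullness = max(0.0, fullness - drain_per_frame)
        history.append(fullness)
    return history

# 1 Mbps channel at 25 fps -> 40,000 bits drain per frame slot.
frames = [60_000, 30_000, 30_000, 200_000]   # oversized last frame bursts
print(simulate_vbv(frames, bitrate=1_000_000, fps=25, buffer_size=150_000))
```

The oversized final frame triggers the overflow flag: in a real encoder, that feedback would raise the quantizer (lowering quality) until the buffer recovers, which is exactly the feedback loop described above.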
Narrowband HD bitrate limits
Narrowband HD imposes bitrate limits that vary with resolution, and UPYUN guarantees the best video quality attainable at those rates. For details see: docs.upyun.com/cloud/av/#_18