Video bitrate, frame rate and resolution, and an introduction to H.264

Source: Internet
Author: User
Tags: arithmetic coding, standards comparison

Which of video bitrate, frame rate and resolution actually affects the sharpness of a video?

Bitrate: affects file size and is proportional to it: the higher the bitrate, the larger the file; the lower the bitrate, the smaller the file.

Bitrate is the number of data bits transmitted per unit of time, typically measured in kbps (kilobits per second). It behaves like a sampling rate: the more bits spent per unit of time, the higher the fidelity and the closer the encoded file is to the original, but file size grows in proportion. Almost every coding format therefore revolves around one goal: achieving the least distortion at the lowest possible bitrate. Around this core, CBR (constant bitrate) and VBR (variable bitrate) encoding were developed. Bitrate determines distortion: the higher the bitrate, the clearer the picture; the lower the bitrate, the coarser the picture and the more visible the blocking (mosaic) artifacts.

Frame rate: affects the smoothness of the picture and is proportional to it: the higher the frame rate, the smoother the motion; the lower the frame rate, the more the picture stutters. If the bitrate is allowed to vary, the frame rate also affects file size: more frames per second require a higher bitrate and produce a larger file.

Frame rate is the number of frames transmitted per second; it can also be understood as how many times per second the graphics processor refreshes the image.

Resolution: affects image dimensions and is proportional to them: the higher the resolution, the larger the image; the lower the resolution, the smaller the image.

At a fixed bitrate, resolution is inversely proportional to sharpness: the higher the resolution, the less sharp the image; the lower the resolution, the sharper the image.
At a fixed resolution, bitrate is proportional to sharpness: the higher the bitrate, the sharper the image; the lower the bitrate, the less sharp the image.
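These proportionalities can be summarized in a single "bits per pixel" figure. A minimal Python sketch (the function name and the 25 f/s example figures are illustrative, not from the source):

```python
def bits_per_pixel(bitrate_bps, width, height, fps):
    """Average number of bits available to encode each pixel of each frame."""
    return bitrate_bps / (width * height * fps)

# The same 512 kbit/s stream at two resolutions:
cif  = bits_per_pixel(512_000, 352, 288, 25)   # CIF
cif4 = bits_per_pixel(512_000, 704, 576, 25)   # 4CIF: 4x the pixels
print(f"CIF:  {cif:.3f} bpp")
print(f"4CIF: {cif4:.3f} bpp")  # a quarter of the bits per pixel -> coarser picture
```

At a fixed bitrate, quadrupling the pixel count leaves each pixel with a quarter of the bits, which is exactly the "higher resolution, less sharp" relation stated above.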

Bandwidth and frame rate

For example, consider transmitting video over an ADSL line with only 512 kbps of upstream bandwidth, while four CIF-resolution streams must be carried. The usual recommended bitrate for CIF is 512 kbps, so by that rule only one stream could be transmitted, and lowering the bitrate would inevitably hurt image quality. To preserve image quality, the frame rate must be reduced instead: at a lower frame rate, a lower bitrate no longer degrades per-frame quality, though it does affect motion continuity.
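The trade-off in this example can be sketched numerically. Assuming, as the text does, that a CIF stream at its recommended quality spends 512 kbit/s (split here across an illustrative 25 f/s, so each frame keeps its usual bit budget), the affordable per-stream frame rate on the shared uplink works out as:

```python
def framerate_for_quality(total_bw_bps, n_streams, bits_per_frame):
    """Max frame rate per stream if every frame must keep a target size."""
    per_stream = total_bw_bps / n_streams
    return per_stream / bits_per_frame

# 512 kbit/s uplink, 4 CIF streams; one full-quality CIF frame assumed to
# cost 512000/25 bits (i.e. the recommended 512 kbit/s at 25 f/s).
bits_per_frame = 512_000 / 25
fps = framerate_for_quality(512_000, 4, bits_per_frame)
print(f"{fps:.2f} f/s per stream")  # quality preserved, smoothness sacrificed
```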


Like earlier MPEG-4-family technology, the H.264 codec pipeline consists of five parts: inter-frame and intra-frame prediction (estimation), transform and inverse transform, quantization and inverse quantization, the loop filter, and entropy coding.

H.264's PSNR is significantly better than that of MPEG-4 (ASP) and H.263++ (HLP): in comparison tests at six bitrates, H.264's PSNR averaged 2 dB higher than MPEG-4 (ASP) and 3 dB higher than H.263++ (HLP). The six test rates and their conditions were: … kbit/s at 10 f/s in QCIF format; … kbit/s at 15 f/s in QCIF format; 128 kbit/s at 15 f/s in CIF format; 256 kbit/s at 15 f/s in QCIF format; … kbit/s at 30 f/s in CIF format; and 1024 kbit/s at 30 f/s in CIF format.


H.264's biggest advantage is its high data compression ratio: under the same image quality, it compresses more than twice as well as MPEG-2 and 1.5 to 2 times as well as MPEG-4. As an example, if an original file is 88 GB, compressing it with MPEG-2 yields 3.5 GB, a compression ratio of 25:1, while compressing it with H.264 yields 879 MB; from 88 GB to 879 MB, the compression ratio reaches 102:1. H.264's low bitrate plays an important role here: compared with compression technologies such as MPEG-2 and MPEG-4 ASP, H.264 greatly reduces users' download times and data traffic charges.
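The figures quoted above can be checked directly (using binary GB/MB units, which is an assumption; the ratios then come out as stated):

```python
def compression_ratio(original_bytes, compressed_bytes):
    """How many times smaller the compressed file is than the original."""
    return original_bytes / compressed_bytes

GB, MB = 1024**3, 1024**2
original = 88 * GB
print(int(compression_ratio(original, 3.5 * GB)))   # MPEG-2: ~25 (25:1)
print(int(compression_ratio(original, 879 * MB)))   # H.264:  ~102 (102:1)
```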


I/P/B frames

I frame: intra-coded frame, i.e. a keyframe or stand-alone frame.

Predicted frames take the I frame as their base: P frames are predicted from the I frame, and B frames are then predicted from the I frame and P frames.

B frame: bidirectionally predicted (interpolated) coded frame.

P frame: forward-predicted coded frame.

In addition to P frames and B frames, H.264 also supports a new inter-stream transition frame, the SP frame.

Statistics show that between one frame and the next, fewer than 10% of pixels change in luminance by more than 2%, and the chroma difference changes by only about 1%.

During video compression, an I frame is compressed purely as image data and is an independent frame. A P frame is compressed as inter-frame data with reference to an I frame and is not independent. Most of a compressed video consists of B/P frames, so video quality is mainly carried by them. Since B/P frames are not independent frames but store the differences from neighbouring I frames, they have no real notion of resolution of their own and are better thought of as binary difference sequences.
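The I/P/B structure described above is typically arranged into a repeating group of pictures (GOP). A small sketch (the IBBP pattern and the `gop_size`/`b_between` parameters are illustrative defaults, not prescribed by the text):

```python
def gop_pattern(gop_size=12, b_between=2):
    """Frame types for one GOP in display order: an I frame, then repeating
    groups of b_between B frames followed by one P frame."""
    frames = ["I"]
    while len(frames) < gop_size:
        frames += ["B"] * b_between + ["P"]
    return frames[:gop_size]

print("".join(gop_pattern()))  # IBBPBBPBBPBB: mostly B/P frames, as the text notes
```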

These binary sequences are compressed with entropy coding, using a quantization parameter for lossy compression. Video quality is therefore directly determined by the quantization parameter, which in turn directly affects the compression ratio and the bitrate.

Video quality can be expressed subjectively or objectively: the subjective measure is the "sharpness" usually spoken of, while the objective parameters are the quantization parameter, the compression ratio, or the bitrate. For the same source and the same compression algorithm, quantization parameter, compression ratio and bitrate are directly related to one another.

Changing resolution is also called resampling. Going from high to low resolution is downsampling; because the source data is more than sufficient, it only needs to be reduced, and the result is generally quite good. Going from low to high resolution is upsampling; because the missing pixels must be filled in (guessed) by interpolation, some loss of video quality (sharpness) and distortion is unavoidable.
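Down- and upsampling can be illustrated with the crudest possible resamplers, plain decimation and pixel replication (real scalers use better filters, so this is only a sketch of the information loss):

```python
def downsample2(img):
    """Keep every second sample in each direction (simple 2:1 decimation)."""
    return [row[::2] for row in img[::2]]

def upsample2(img):
    """Pixel replication: each missing sample is 'guessed' from a neighbour."""
    rows = [[p for p in row for _ in (0, 1)] for row in img]
    return [r for r in rows for _ in (0, 1)]

img = [[1, 2, 3, 4],
       [5, 6, 7, 8],
       [9, 10, 11, 12],
       [13, 14, 15, 16]]
small = downsample2(img)   # [[1, 3], [9, 11]]: three quarters of the data gone
back = upsample2(small)    # upsampling cannot recover the discarded detail
print(small, back[0])
```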

Amount of data transferred

Packets received over the network vary in length, and these differing packet sizes inevitably affect the effective interface rate. Besides the data payload, the protocol layers add the necessary headers and trailers to everything transmitted on the wire, adding network overhead; as a rule of thumb, the amount of data actually transmitted is at most about payload data volume × 1.3.
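The rule of thumb at the end of the paragraph is trivial to encode (the 1.3 factor is the text's own estimate, not a protocol constant):

```python
def transmitted_bytes(payload_bytes, overhead_factor=1.3):
    """Upper-bound estimate of on-the-wire data: payload plus ~30% of
    protocol headers/trailers, per the text's rule of thumb."""
    return payload_bytes * overhead_factor

print(transmitted_bytes(1000))  # 1300.0 bytes on the wire for 1000 payload bytes
```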

Key technologies of the H.264 standard
1. Intra-frame predictive coding
Intra-frame coding is used to reduce the spatial redundancy of an image. To improve the efficiency of intra coding, H.264 takes full advantage of the spatial correlation between adjacent macroblocks within a frame, since adjacent macroblocks usually have similar properties. When encoding a given macroblock, the encoder can therefore first predict it from the surrounding macroblocks (typically from those above and to the left, which have already been encoded), and then encode only the difference between the predicted and actual values; compared with encoding the frame directly, this greatly reduces the bitrate.
H.264 provides 4x4-pixel macroblock prediction with one DC prediction mode and five directional modes, as shown in Figure 2. In the figure, the nine pixels A to I of the adjacent blocks have already been encoded and can be used for prediction. If, for example, mode 4 is chosen, then pixels a, b, c and d are predicted to equal the value of E, and pixels e, f, g and h are predicted to equal the value of F. For flat areas with very little spatial detail, H.264 also supports 16x16 intra coding. (Figure 2: intra-frame prediction modes.)
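As an illustration of predicting a block from already-encoded neighbours, here is a sketch of a DC-style intra predictor for a 4x4 block (the exact rounding convention is an assumption; the directional modes mentioned above work analogously but copy along a direction instead of averaging):

```python
def intra_dc_predict(top, left):
    """DC intra prediction for a 4x4 block: every pixel is predicted as the
    rounded mean of the reconstructed neighbours above and to the left."""
    neighbours = top + left
    dc = (sum(neighbours) + len(neighbours) // 2) // len(neighbours)
    return [[dc] * 4 for _ in range(4)]

pred = intra_dc_predict(top=[100, 102, 104, 106], left=[98, 100, 102, 104])
residual = 101 - pred[0][0]   # the encoder only codes this small difference
print(pred[0], residual)
```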
2. Inter-frame predictive coding
Inter-frame predictive coding exploits the temporal redundancy between consecutive frames for motion estimation and compensation. H.264 retains the key motion-compensation features of earlier video coding standards and flexibly adds more: in addition to P frames and B frames, H.264 supports a new inter-stream transition frame, the SP frame, as shown in Figure 3. With SP frames included in a stream, the decoder can switch quickly between streams of similar content but different bitrates, and random access and fast-playback modes are also supported. (Figure 3: SP-frame diagram.) H.264's motion estimation has the following features.
(1) Macroblock partitions of different sizes and shapes
Motion compensation for each 16x16-pixel macroblock can use blocks of different sizes and shapes; H.264 supports seven such modes, as shown in Figure 4. Motion compensation with smaller block partitions improves the handling of motion detail, reduces blocking artifacts, and improves image quality. (Figure 4: macroblock partitioning.)
(2) High-precision sub-pixel motion compensation
H.263 uses motion estimation with half-pixel accuracy, while H.264 can use 1/4- or even 1/8-pixel accuracy. Under otherwise equal conditions, the residual after H.264's 1/4- or 1/8-pixel motion estimation is smaller than after H.263's half-pixel estimation, so for the same quality the inter-frame coding needs a lower bitrate.
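H.264's sub-pixel pipeline derives half-sample positions with a 6-tap filter whose taps are (1, -5, 20, 20, -5, 1)/32; quarter-sample positions are then averaged from these. A sketch of the half-sample step (the clipping range and the example row are illustrative):

```python
def half_pel(samples, i):
    """6-tap half-sample interpolation between samples[i] and samples[i+1],
    taps (1, -5, 20, 20, -5, 1) / 32 with rounding, clipped to [0, 255]."""
    a, b, c, d, e, f = samples[i - 2:i + 4]
    val = (a - 5 * b + 20 * c + 20 * d - 5 * e + f + 16) >> 5
    return max(0, min(255, val))

row = [0, 0, 100, 0, 0, 0]    # an isolated bright sample
print(half_pel(row, 2))       # 63: the sharp filter keeps more of the peak
                              # than the naive 2-tap average (50) would
```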
(3) Multi-frame prediction
With the optional multi-frame prediction feature, H.264 can choose among up to five different reference frames for inter-frame coding, which provides better error resilience and improves video image quality. This feature is mainly useful in the following situations: periodic motion, translational motion, and a camera cutting back and forth between two different scenes.
(4) Deblocking filter
H.264 defines an adaptive deblocking filter that operates on the horizontal and vertical block edges inside the prediction loop, greatly reducing blocking artifacts.
3. Integer transform
For its transform, H.264 uses a DCT-like transform on 4x4-pixel blocks, but one based on integer arithmetic, so there is no mismatch error between the forward and inverse transforms caused by rounding; the transform matrix is shown in Figure 5. Compared with floating-point arithmetic, the integer DCT-like transform introduces some additional error, but this is insignificant relative to the quantization error. The integer transform also has the advantages of reduced computation and complexity, which makes it well suited to porting to fixed-point DSPs.
4. Quantization
H.264 offers 32 different quantization step sizes, similar to the 31 steps of H.263; in H.264, however, the step sizes grow compound-wise at a ratio of about 12.5% per step rather than by a fixed constant increment.
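The "compound rate of about 12.5%" corresponds to the step size doubling every 6 quantization-parameter (QP) increments, i.e. a per-step factor of 2^(1/6) ≈ 1.1225. A one-liner to check:

```python
def quant_step_ratio(delta_qp=1):
    """Multiplicative growth of the quantization step size: it doubles
    every 6 QP increments, so one step is a factor of 2**(1/6), i.e. the
    'about 12.5%' compound growth described in the text."""
    return 2 ** (delta_qp / 6)

print(f"+1 QP: x{quant_step_ratio(1):.4f}")   # ~1.1225
print(f"+6 QP: x{quant_step_ratio(6):.4f}")   # exactly 2.0
```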
H.264 reads out the transform coefficients in two ways: zigzag scan and double scan, as shown in Figure 6. Simple zigzag scanning is used in most cases; double scan is used only within blocks with smaller quantization step sizes, which helps improve coding efficiency. (Figure 6: coefficient read-out methods.)
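The zigzag scan can be generated by walking the anti-diagonals of the block and reversing every other one, so that low-frequency coefficients come first and trailing zeros cluster at the end of the scan:

```python
def zigzag_order(n=4):
    """Flat indices of an n x n block in zigzag scan order."""
    order = []
    for s in range(2 * n - 1):                       # one anti-diagonal at a time
        diag = [(i, s - i) for i in range(n) if 0 <= s - i < n]
        if s % 2 == 0:                               # alternate the direction
            diag.reverse()
        order.extend(i * n + j for i, j in diag)
    return order

print(zigzag_order())
# [0, 1, 4, 8, 5, 2, 3, 6, 9, 12, 13, 10, 7, 11, 14, 15]
```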
5. Entropy coding
The last step of the video encoding process is entropy coding. H.264 uses two different entropy coding methods: universal variable-length coding (UVLC) and context-based adaptive binary arithmetic coding (CABAC).

Technical highlights of H.264

1. Layered design
The H.264 algorithm is conceptually divided into two layers: the video coding layer (VCL) is responsible for efficiently representing the video content, while the network abstraction layer (NAL) is responsible for packaging and delivering the data in whatever form the network requires. A packet-based interface is defined between the VCL and the NAL; packetization and the corresponding signaling are part of the NAL. In this way, the tasks of high coding efficiency and network friendliness are handled separately by the VCL and the NAL.

The VCL includes block-based motion-compensated hybrid coding together with some new features. As in earlier video coding standards, H.264 does not include pre-processing and post-processing in the draft standard, which increases the standard's flexibility.

The NAL encapsulates the data in the segmentation format of the underlying network, including framing, signaling of logical channels, use of timing information, and end-of-sequence signals. For example, the NAL supports video transmission formats over circuit-switched channels as well as RTP/UDP/IP transport for video on the Internet. A NAL unit includes its own header information, segment structure information and the actual payload, namely the VCL data from the layer above. (If data partitioning is used, the data may consist of several parts.)
2. High-precision, multi-mode motion estimation
H.264 supports motion vectors with 1/4- or 1/8-pixel accuracy. At 1/4-pixel accuracy, a 6-tap filter can be used to reduce high-frequency noise; for motion vectors with 1/8-pixel accuracy, a more complex 8-tap filter can be used. During motion estimation, the encoder can also choose an "enhanced" interpolation filter to improve the prediction.

In H.264's motion prediction, a macroblock (MB) can be partitioned into sub-blocks in different ways, forming seven different block-size modes. This flexible, fine-grained partitioning matches the shapes of the actual moving objects in the picture more closely and greatly improves the accuracy of motion estimation. A macroblock can thus contain 1, 2, 4, 8 or 16 motion vectors.

H.264 also allows the encoder to use more than one previous frame for motion estimation, which is known as multiple reference frames. With, say, the 2 or 3 most recently encoded frames as references, the encoder chooses the frame that gives the better prediction for each target macroblock and signals, for each macroblock, which frame was used.
3. 4x4-block integer transform
Like earlier standards, H.264 uses block-based transform coding for the residual, but the transform is an integer operation rather than a real-valued one; the process is essentially similar to the DCT. The advantage of this approach is that transform and inverse transform run at exactly the same precision in both the encoder and the decoder, making simple fixed-point arithmetic easy to use; in other words, there is no "inverse transform mismatch". The transform unit is a 4x4 block rather than the previously common 8x8 block. Because the transform block is smaller, moving objects can be partitioned more precisely, so not only is the transform cheaper to compute, but coding errors at the edges of moving objects are also greatly reduced. To prevent the small transform size from producing grey-level differences between blocks in larger smooth regions of the image, the DC coefficients of the 16 4x4 luma blocks of an intra macroblock (one per small block, 16 in all) can be given a second 4x4 transform, and the DC coefficients of the 4 4x4 chroma blocks (one per small block, 4 in all) a 2x2 transform.

To improve rate-control capability, the quantization step size is controlled in increments of about 12.5% rather than changing by a constant amount. Normalization of the transform coefficient magnitudes is folded into the inverse quantization process to reduce computational complexity. To emphasize colour fidelity, a smaller quantization step is used for the chroma coefficients.
4. Unified VLC
H.264 has two entropy coding methods: one applies the unified VLC (UVLC: universal VLC) to all symbols to be coded, and the other uses context-adaptive binary arithmetic coding (CABAC). CABAC is optional; its coding performance is slightly better than UVLC's, but its computational complexity is also higher. UVLC uses an unbounded set of code words with a very regular structure, so objects of different kinds can be coded with the same table. This method makes it easy to generate a code word, the decoder can easily recognize a code word's prefix, and UVLC can resynchronize quickly when bit errors occur.
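The regular structure described above is that of an Exp-Golomb code, which is what H.264's UVLC uses: every code word is the value plus one written in binary, preceded by one fewer leading zeros than that binary string has bits. A sketch of the encoder:

```python
def exp_golomb(k):
    """Unsigned Exp-Golomb code word for k >= 0 (H.264's UVLC):
    write k+1 in binary, prefix it with (bit length - 1) zeros."""
    bits = bin(k + 1)[2:]
    return "0" * (len(bits) - 1) + bits

for k in range(5):
    print(k, exp_golomb(k))
# 0 -> 1, 1 -> 010, 2 -> 011, 3 -> 00100, 4 -> 00101
```

The prefix of zeros tells the decoder exactly how many bits follow, which is why a decoder can find code-word boundaries again quickly after a bit error.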
5. Intra-frame prediction
In the earlier H.26x and MPEG-x series standards, prediction was used only between frames. In H.264, intra prediction is also available when coding intra pictures. For each 4x4 block (edge blocks are handled specially), each pixel can be predicted as a differently weighted sum of the 17 nearest previously coded pixels (some weights may be zero), namely the pixels above and to the left of the block containing the pixel. This intra prediction is clearly not temporal but a spatial-domain predictive coding algorithm; it removes the redundancy between adjacent blocks and yields more efficient compression.

As shown in Figure 4, the 16 pixels a, b, ..., p of the 4x4 block are to be predicted, and A, B, ..., L are already-encoded pixels. The value of point m, for example, can be predicted as (J + 2K + L + 2)/4, or as (A + B + C + D + I + J + K + L)/8, and so on. In terms of which reference points are selected, luma has 9 different prediction modes, while chroma intra prediction has only 1 mode.
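The two example predictors quoted for pixel m can be written down directly (integer rounding as given in the text; the sample pixel values are illustrative):

```python
def predict_m(J, K, L):
    """First predictor quoted for pixel m: (J + 2K + L + 2) / 4,
    computed in integer arithmetic."""
    return (J + 2 * K + L + 2) // 4

def predict_m_avg(A, B, C, D, I, J, K, L):
    """Alternative predictor: the mean of eight encoded neighbours,
    (A + B + C + D + I + J + K + L) / 8."""
    return (A + B + C + D + I + J + K + L) // 8

print(predict_m(100, 102, 104))                                # 102
print(predict_m_avg(96, 98, 100, 102, 96, 98, 100, 102))       # 99
```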
6. Designed for IP and wireless environments
The H.264 standard contains tools for error resilience, which facilitate transmitting compressed video over error-prone, packet-lossy environments, such as the robustness needed for transmission over mobile channels or IP channels.

To resist transmission errors, temporal synchronization in the video stream can be achieved through intra-picture refresh, and spatial synchronization is supported by slice-structured coding. At the same time, to ease resynchronization after an error, a certain number of resynchronization points are provided within a picture's video data. In addition, intra macroblock refresh and multiple reference macroblocks let the encoder choose macroblock modes based not only on coding efficiency but also on the characteristics of the transmission channel.

Besides adapting to the channel rate by changing the quantization step size, H.264 often uses data partitioning to cope with changes in channel rate. In general, data partitioning means generating video data of different priorities in the encoder so as to support quality of service (QoS) in the network. For example, syntax-based data partitioning divides each frame's data into parts according to importance, so that less important information can be discarded when buffers overflow. A similar temporal data partitioning approach can also be taken, by using multiple reference frames in P frames and B frames.

In wireless applications, the large bitrate variations of the wireless channel can be accommodated by changing the quantization precision or the spatial/temporal resolution of each frame. In the multicast case, however, requiring the encoder to respond to varying bitrates is not possible. Therefore, unlike the fine granularity scalability (FGS) coding method used in MPEG-4, H.264 uses stream-switching SP frames in place of hierarchical coding.
