I. Video Information and Signal Features
1.1 Intuitive: video appeals directly to the human visual system, so video information is taken in without extra interpretation.
1.2 Deterministic: video information is specific and hard to confuse with other content.
1.3 Efficient: the human visual system perceives all pixels of an image in parallel, so taking in video information is highly efficient.
1.4 Extensive: visual information accounts for about 70% of all the information people receive from the outside world.
1.5 High bandwidth: a video signal carries a large amount of changing information, so the amount of data is huge and the bandwidth required of transmission networks is correspondingly large.
II. Requirements and Possibility of Video Compression
2.1 Requirements: because the amount of video information is large and the transmission bandwidth it demands is high, a video source must be compressed before transmission to save bandwidth and storage space. (1) The video must be compressed to fit within the given bandwidth, so a sufficient compression ratio must be ensured. (2) After compression, the reconstructed video must retain a certain level of quality. (3) The video encoder should be simple and easy to implement, low cost, and highly reliable.
2.2 Possibility of video compression: (1) Temporal correlation: in a video sequence, two adjacent frames differ only slightly. (2) Spatial correlation: within the same frame, adjacent pixels are strongly correlated, and the closer two pixels are, the stronger the correlation.
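The two correlations can be seen even in a tiny synthetic example. This is my own toy illustration, not part of the original notes: a "video" of a small bright square drifting across a dark background.

```python
# Toy demonstration of the two redundancies exploited by video compression:
# a synthetic video of a 2x2 bright square drifting over a dark background.

def make_frame(x0, size=8):
    """Frame with a 2x2 bright square whose left edge is at column x0."""
    f = [[0] * size for _ in range(size)]
    for r in (3, 4):
        for c in (x0, x0 + 1):
            f[r][c] = 200
    return f

def mean_abs_diff(a, b):
    n = len(a) * len(a[0])
    return sum(abs(a[r][c] - b[r][c])
               for r in range(len(a)) for c in range(len(a[0]))) / n

frame0 = make_frame(2)
frame1 = make_frame(3)            # the square moved one pixel to the right

# Temporal correlation: adjacent frames differ in only a few pixels, so the
# frame difference carries far less energy than a full frame.
print(mean_abs_diff(frame0, frame1))

# Spatial correlation: within one frame, horizontally adjacent pixels are
# usually equal, so most neighbour differences are zero.
row = frame0[3]
diffs = [row[c + 1] - row[c] for c in range(len(row) - 1)]
print(diffs.count(0), "of", len(diffs), "neighbour differences are zero")
```

Only the pixels at the leading and trailing edge of the moving square contribute to the frame difference, which is exactly the redundancy that inter-frame prediction removes.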
III. Video Coding Technology
3.1 Basic structure: the video coding method is tied to the source model it uses. By source model, video coding falls into two categories: waveform-based coding and content-based coding.
3.2 Waveform-based coding uses a source model in which an image is made up of many pixels; the parameters of this model are the luminance and chrominance amplitudes of the pixels, and it is these parameters that are coded. Waveform-based coding exploits the spatial correlation between pixels and the temporal correlation between frames, using predictive coding and transform coding to reduce the correlation in the video signal, which significantly lowers the bit rate of the video sequence and so achieves compression.
3.3 Content-based coding uses a source model in which the scene is composed of several objects; the parameters of this model are the shape, texture, and motion of each object, and content-based coding encodes these parameters.

IV. H.264 and Its Applications
4.1 The technical features of H.264 can be summarized as follows: (1) a focus on practicality; (2) a focus on adaptation to mobile and IP networks; (3) within the basic framework of the hybrid coder, major improvements to its key components, such as multi-mode motion estimation, intra prediction, multi-frame reference prediction, context-based variable-length coding, and a 4x4 two-dimensional integer transform. At the same time, the implementation difficulty must be weighed against the superior performance of H.264: in general, the performance gains of H.264 come at the cost of increased complexity. The computational complexity of H.264 encoding is estimated to be roughly 3 times that of H.263, and that of decoding roughly 2 times that of H.263.
4.2 H.264 applications are divided into three profiles:
(1) Baseline profile (a simple version with wide applicability; supports intra- and inter-frame coding and variable-length entropy coding). Applications: real-time communication, such as video calls, conference TV, and wireless communication.
(2) Main profile (adopts a number of techniques to improve image quality and raise the compression ratio; supports interlaced video and context-based adaptive arithmetic coding). Applications: digital broadcast and digital video storage.
(3) Extended profile. Applications: video streaming over various networks, and video on demand.

V. Video Coding Principles
5.1 Basic concepts
(1) A video encoder compresses an image or a video sequence to generate a code stream. The frame or field Fn input to the encoder is processed in units of macroblocks.
If inter-frame prediction is used: the prediction P is obtained by motion-compensated prediction from one or more previously encoded reference images. The prediction P is subtracted from the current frame Fn to give the residual Dn. Dn is transformed (T) and quantized (Q), removing spatial redundancy, to give the coefficients X; X is reordered (to make the data more compact) and entropy coded (together with the motion vectors and other side information) to produce the NAL data. The encoder also contains a reconstruction path (a decoding process): the quantized coefficients X are inverse quantized and inverse transformed to give Dn'; Dn' is added to the prediction P to give uF'n, which is filtered to give F'n, the image obtained after Fn has been encoded and decoded.
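The loop above (residual, quantize, de-quantize, reconstruct, predict the next frame from the reconstruction) can be sketched with a deliberately simplified 1-D "codec". This is my own toy model, not JM code: motion compensation is omitted, and transform plus quantization are collapsed into a single scalar quantizer with a hypothetical step QSTEP.

```python
# Minimal sketch of the encoder reconstruction loop: the encoder quantizes
# the residual Dn, then immediately de-quantizes it to rebuild the same
# lossy frame the decoder will see, and uses THAT as the reference.

QSTEP = 10  # hypothetical quantizer step (T and Q collapsed into one step)

def quantize(residual):
    return [round(v / QSTEP) for v in residual]

def dequantize(levels):
    return [v * QSTEP for v in levels]

def encode_sequence(frames):
    """Return (levels per frame, reconstructed frames) for 1-D frames."""
    recon_prev = [0] * len(frames[0])          # reference starts all-zero
    streams, recons = [], []
    for fn in frames:
        p = recon_prev                          # prediction P (no motion)
        dn = [a - b for a, b in zip(fn, p)]     # residual Dn = Fn - P
        x = quantize(dn)                        # coefficients X
        dn_rec = dequantize(x)                  # Dn'
        fn_rec = [a + b for a, b in zip(p, dn_rec)]  # uF'n = P + Dn'
        streams.append(x)
        recons.append(fn_rec)
        recon_prev = fn_rec                     # predict from the RECON
    return streams, recons

frames = [[100, 102, 98, 101], [104, 107, 103, 96]]
streams, recons = encode_sequence(frames)
print(recons)
```

Note that the second frame's residual is coded against the lossy reconstruction of the first, not against the original; this is exactly why the encoder needs the reconstruction path described below.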
If intra-frame prediction is used: the prediction P is formed from already-encoded macroblocks in the current frame (luma 4x4 or 16x16, chroma 8x8). The prediction P is subtracted from the block being processed to give the residual Dn. Dn is transformed (T) and quantized (Q) to give the coefficients X; X is reordered (to make the data more compact) and entropy coded to produce the NAL data. In the reconstruction path, the quantized coefficients X are inverse quantized and inverse transformed to give Dn'; the sum of Dn' and the prediction P gives the decoded value of the current macroblock, which can then serve as a reference macroblock for further intra prediction.
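The transform T referred to above is, in H.264, a 4x4 integer approximation of the DCT, W = Cf · X · Cf^T, using the standard forward core-transform matrix Cf. The sketch below applies only the core transform; in the real codec the scaling part is folded into the quantizer, and that step is omitted here.

```python
# H.264 4x4 forward core transform: W = Cf . X . Cf^T (scaling omitted;
# it is folded into the quantisation step in the real codec).

CF = [[1,  1,  1,  1],
      [2,  1, -1, -2],
      [1, -1, -1,  1],
      [1, -2,  2, -1]]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def transpose(m):
    return [list(row) for row in zip(*m)]

def forward_transform(x):
    return matmul(matmul(CF, x), transpose(CF))

flat = [[5] * 4 for _ in range(4)]   # a perfectly flat residual block
w = forward_transform(flat)
print(w[0][0])  # all energy lands in the DC coefficient: 16 * 5 = 80
```

A flat block compacts into a single DC coefficient with every AC coefficient zero, which is what makes the subsequent reordering and entropy coding effective.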
Why the encoder needs a reconstruction path: reconstruction is in effect a decoding process. The decoded image inevitably differs from the source image, so the encoder uses the decoded image as its reference; this keeps it consistent with the values the decoder will hold and improves the accuracy of image prediction. (The decoder likewise uses decoded images as references to predict the next image.)
(2) A video decoder decodes a code stream and produces images or a video sequence corresponding to the source images or source video sequence. If a decoded image is identical to the source image, the encoding/decoding process is lossless; otherwise it is lossy. The decoder is implemented in the same way as the encoder's reconstruction path.
(3) Field, frame, and image. Field: when an image is scanned in interlaced fashion, the even lines form one field and the odd lines the other; the set of top lines is called the top field and the set of bottom lines the bottom field. Frame: an image scanned progressively, line by line. Image: both fields and frames count as images.
(4) Macroblock and slice. Macroblock: a macroblock consists of one 16x16 luma block, one 8x8 Cb block, and one 8x8 Cr block. Slice: an image can be divided into one or more slices, and a slice consists of one or more macroblocks.
5.2 Encoded data format
5.2.1 H.264 supports video encoding and decoding.
5.2.2 The H.264 coding format has two main goals: (1) a high video compression ratio; (2) good network friendliness, i.e. the ability to adapt to various transmission networks. H.264 is therefore functionally split into two layers: the video coding layer (VCL) and the network abstraction layer (NAL). VCL data is the compressed and encoded video data sequence; it can be transmitted or stored only after being encapsulated in NAL units. A NAL unit stream has the following format:
| NAL Header | RBSP | NAL Header | RBSP | NAL Header | RBSP | ... |
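The NAL header shown above is, per the H.264 specification, a single byte: a 1-bit forbidden_zero_bit, a 2-bit nal_ref_idc, and a 5-bit nal_unit_type, followed by the RBSP payload. A minimal parser for that byte:

```python
# Parse the one-byte H.264 NAL unit header:
#   bit 7      forbidden_zero_bit (must be 0 in a valid stream)
#   bits 6-5   nal_ref_idc        (importance as a reference)
#   bits 4-0   nal_unit_type      (what the RBSP carries)

NAL_TYPES = {1: "non-IDR slice", 5: "IDR slice", 7: "SPS", 8: "PPS"}

def parse_nal_header(byte):
    forbidden = (byte >> 7) & 0x1
    ref_idc   = (byte >> 5) & 0x3
    unit_type = byte & 0x1F
    return forbidden, ref_idc, unit_type

# 0x67 = 0110_0111: ref_idc 3, type 7 -> a sequence parameter set (SPS)
f, ref, t = parse_nal_header(0x67)
print(f, ref, t, NAL_TYPES.get(t, "other"))
```

For example, the byte 0x65 that commonly starts an IDR picture parses as ref_idc 3, type 5.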
5.2.3 H.264 code stream structure
5.3 Reference images. To improve prediction accuracy, H.264 can choose the best-matching image from up to 15 reference images. Advantage: greatly improved prediction accuracy. Disadvantage: greatly increased complexity. Reference images are managed through reference lists (list0, list1): a P frame has one reference list, list0; a B frame has two reference lists, list0 and list1.
5.4 Intra prediction. The prediction block P is formed from previously encoded and reconstructed blocks neighbouring the current block. Luma prediction: 4x4 luma prediction or 16x16 luma prediction; chroma prediction: 8x8 chroma prediction.
5.4.1 4x4 luma prediction. There are 9 prediction modes: (a) a 4x4 block is predicted from the previously decoded pixels above it and to its left; (b) eight of the modes predict along one of 8 directions, as listed below.
Mode | Description
Mode 0 (vertical) | Pixel values are extrapolated vertically from the pixels above.
Mode 1 (horizontal) | Pixel values are extrapolated horizontally from the pixels to the left.
Mode 2 (DC) | All pixel values are set to the mean of the pixels above and to the left.
Mode 3 (diagonal down-left) | Pixel values are interpolated from the neighbouring pixels at a 45-degree angle, down and to the left.
Mode 4 (diagonal down-right) | Pixel values are interpolated from the neighbouring pixels at a 45-degree angle, down and to the right.
Mode 5 (vertical-right) | Pixel values are interpolated from the neighbouring pixels at an angle of about 26.6 degrees to the vertical.
Mode 6 (horizontal-down) | Pixel values are interpolated from the neighbouring pixels at an angle of about 26.6 degrees to the horizontal.
Mode 7 (vertical-left) | Pixel values are interpolated from the neighbouring pixels at an angle of about 26.6 degrees to the vertical.
Mode 8 (horizontal-up) | Pixel values are interpolated from the neighbouring pixels at an angle of about 26.6 degrees to the horizontal.
Each of the 9 modes produces a candidate prediction block, and the prediction error of each is measured by its SAE (sum of absolute errors); the mode whose prediction block has the smallest SAE is the one that best matches the current block.
5.4.2 16x16 luma prediction. There are 4 prediction modes in total:
Mode | Description
Mode 0 (vertical) | Pixel values are extrapolated from the pixels above.
Mode 1 (horizontal) | Pixel values are extrapolated from the pixels to the left.
Mode 2 (DC) | Pixel values are set to the mean of the pixels above and to the left.
Mode 3 (plane) | A linear "plane" function of the neighbouring pixels produces the prediction; suited to areas where the luminance changes gently.
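The SAE-based mode decision described in 5.4.1 can be sketched concretely. This is a toy illustration with assumptions of my own: only modes 0 to 2 of the 4x4 case, with the neighbouring pixels supplied as simple lists rather than taken from a reconstructed frame.

```python
# Sketch of 4x4 intra mode selection: build each candidate prediction from
# the already-decoded neighbours, keep the one with the smallest SAE.

def predict(mode, top, left):
    """top: the 4 pixels above the block; left: the 4 pixels to its left."""
    if mode == 0:                                # vertical
        return [top[:] for _ in range(4)]
    if mode == 1:                                # horizontal
        return [[left[r]] * 4 for r in range(4)]
    if mode == 2:                                # DC
        dc = (sum(top) + sum(left) + 4) // 8     # rounded mean
        return [[dc] * 4 for _ in range(4)]
    raise ValueError("only modes 0-2 in this sketch")

def sae(block, pred):
    """Sum of absolute errors between the block and a prediction."""
    return sum(abs(block[r][c] - pred[r][c])
               for r in range(4) for c in range(4))

top, left = [10, 20, 30, 40], [10, 10, 10, 10]
block = [[10, 20, 30, 40]] * 4        # content is uniform down each column

best_mode = min((0, 1, 2), key=lambda m: sae(block, predict(m, top, left)))
print(best_mode)  # vertical prediction reproduces this block exactly
```

Because every column of the block repeats the pixel above it, mode 0 achieves SAE 0 and is chosen; real encoders run the same competition over all 9 modes.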
5.4.3 8x8 chroma prediction: four prediction modes, similar to intra 16x16 prediction, except that the mode numbering differs: DC is mode 0, horizontal is mode 1, vertical is mode 2, and plane is mode 3.
5.5 Inter prediction. H.264 inter prediction uses previously encoded frames or fields together with block-based motion compensation. In H.264 the block sizes are more flexible (from 16x16 down to 4x4).
5.5.1 Basic concepts. There is correlation between the scenes in adjacent frames of a moving image. The image is therefore divided into blocks or macroblocks, and for each one the best-matching position in an adjacent frame is searched for; the relative offset between the two positions is the motion vector (MV), and the process of finding it is motion estimation (ME).
5.5.2 Tree-structured motion compensation. The luma of each macroblock (16x16) can be partitioned as one 16x16, two 16x8, two 8x16, or four 8x8 blocks. Each 8x8 sub-block can be further partitioned as one 8x8, two 4x8, two 8x4, or four 4x4 blocks. This style of partitioned motion compensation is called tree-structured motion compensation. Its flexible, fine-grained partitioning greatly improves the accuracy of motion estimation: the block size is variable and can be chosen freely during motion estimation. At the macroblock (MB) level, H.264 uses four partition modes: 16x16, 16x8, 8x16, and 8x8; when the 8x8 mode is used, each sub-block can be further split into the 8x4, 4x8, or 4x4 sub-macroblock partitions. This makes the partitioning of moving objects more precise, reduces the prediction error at the edges of moving objects, and reduces the amount of computation in the transform stage. When Intra_16x16 prediction is used for large smooth areas, H.264 applies a second 4x4 transform to the 16 DC coefficients of the luma 4x4 blocks, and a 2x2 transform to the DC coefficients of the four 4x4 chroma blocks.
5.5.3 Motion vectors. Each partition of an inter-coded macroblock is predicted from a same-sized region of a reference image, and the offset between the two is the motion vector (MV). For the luma component the MV has 1/4-pixel accuracy; for chroma, 1/8-pixel accuracy.
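The motion estimation described in 5.5 can be sketched as an integer-pel full search over a small window (toy frames and a 4x4 block; real H.264 refines the result to the 1/4-pel accuracy noted above, and the search strategy here is the naive exhaustive one, not JM's):

```python
# Full-search motion estimation: find the MV minimising the SAD between the
# current block and candidate same-sized regions of the reference frame.

def sad(cur, ref, bx, by, dx, dy, n=4):
    """Sum of absolute differences for candidate displacement (dx, dy)."""
    return sum(abs(cur[by + r][bx + c] - ref[by + dy + r][bx + dx + c])
               for r in range(n) for c in range(n))

def full_search(cur, ref, bx, by, rng=2, n=4):
    """Exhaustive search in a (2*rng+1)^2 window; returns (mv, cost)."""
    best = (None, float("inf"))
    for dy in range(-rng, rng + 1):
        for dx in range(-rng, rng + 1):
            if (0 <= bx + dx and bx + dx + n <= len(ref[0])
                    and 0 <= by + dy and by + dy + n <= len(ref)):
                cost = sad(cur, ref, bx, by, dx, dy, n)
                if cost < best[1]:
                    best = ((dx, dy), cost)
    return best

# Reference frame: a gradient; current frame: the same scene shifted right
# by one pixel, so the matching region in the reference sits one pixel to
# the LEFT of the current block: MV = (-1, 0).
ref = [[r * 10 + c for c in range(8)] for r in range(8)]
cur = [[row[0]] + row[:-1] for row in ref]   # shift content right by 1

mv, cost = full_search(cur, ref, bx=2, by=2)
print(mv, cost)
```

The exact match gives SAD 0 at displacement (-1, 0); the tree-structured partitioning of 5.5.2 simply runs this kind of search per partition, at sizes from 16x16 down to 4x4.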
Appendix:
H.264 Learning Guide
Learning is divided into three stages:
1. Stage 1: to learn H.264, first gather the most basic and essential materials (//172.22.113.200/share/h264/h.264 related papers/other/classic articles). These materials fall into three kinds: standard documents, test models, and classic articles. First read the H.264_MPEG-4 Part 10 white paper; then read Video coding using the H.264/MPEG-4 AVC compression standard and Halsted.press.h.264.and.MPEG-4.Video.Compression.Video.Coding.For.Next.Generation.Multimedia.eBook-LiB; after that, take the time to read Overview of the H.264_AVC Video Coding standard.pdf. Once you have read these, you should have a solid understanding of the overall framework of H.264. The first three items may cost you two to three weeks.
2. Stage 2: read the code. The tools used most at this stage are the standard documents and a test model (jm86 is recommended). Start from the overall framework: work out how the overall structure of H.264 is distributed across the code, and what the upstream and downstream modules of each functional module are; that is, get the whole code flow clear. The standard documents may be needed only rarely in this stage.
3. Stage 3: find a starting point that interests you and begin studying that topic, working alongside the test model. Now you need to examine carefully how the topic is implemented in the code; at this stage it is absolutely worth tracing it line by line and parameter by parameter. Consult the standard for whatever the code leaves unclear; by now you can read the standard document with a concrete purpose. And because the standard can be mapped onto the code, reading it no longer feels too difficult: you can understand both what the standard document is saying and how the test model implements it.
In this stage much related H.264 knowledge will come into play, letting you move from points to lines and from lines to planes; you will learn more and more about H.264 and find your own direction. (If you try to shortcut this process, your difficulties will be great.) A further caution: if you are new to H.264, do not go straight to the code and the standard; even reading the standard and the code together, you will not get far. In other words, it is best to do nothing else before you understand the overall framework of H.264.
Related document: H.264 code stream structure