H.264+JM Study Notes

The characteristics of H.264

H.264 supports encoding and decoding of 4:2:0 progressive or interlaced video.

Basic concepts of H.264:

1. Macroblock
In video coding, an encoded image is usually divided into macroblocks. A macroblock consists of one luma pixel block and two additional chroma pixel blocks. In general, the luma block is a 16x16 block of pixels, and the size of the two chroma blocks depends on the sampling format of the image: for a YUV 4:2:0 sampled image, each chroma block is an 8x8 block of pixels. Within each image, macroblocks are arranged in slices; the video coding algorithm codes the image macroblock by macroblock and organizes the result into a continuous video stream.
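As a quick, self-contained illustration (my own sketch, not JM code; the frame size is just an example and dimensions are assumed to be multiples of 16), the macroblock grid of a 4:2:0 frame can be computed like this:

    #include <stdio.h>

    /* Sketch: compute the macroblock grid of a 4:2:0 frame.
       Assumes width and height are multiples of 16 for simplicity. */
    int main(void)
    {
        int width = 1280, height = 720;       /* example frame size   */
        int mb_cols = width / 16;             /* macroblocks per row  */
        int mb_rows = height / 16;            /* macroblock rows      */
        int mb_count = mb_cols * mb_rows;     /* total macroblocks    */

        /* Each macroblock: one 16x16 luma block + two 8x8 chroma blocks (4:2:0). */
        int luma_samples   = 16 * 16;
        int chroma_samples = 2 * 8 * 8;

        printf("%dx%d -> %d x %d = %d macroblocks, %d samples each\n",
               width, height, mb_cols, mb_rows, mb_count,
               luma_samples + chroma_samples);
        return 0;
    }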

2. Slice
To meet MTU size requirements, splitting the video into slices is particularly important for 3G video transmission. A fragmented compressed video stream typically carries one slice per RTP packet (a slice can also be split across or merged into RTP packets), and each slice typically contains one or several macroblocks, whatever makes the RTP packet size satisfy the MTU requirement.
Besides meeting communication requirements, slicing the video stream also improves its error resilience. The H.264/AVC standard stipulates that intra-frame coded blocks may only be predicted from within the same slice. This way, if the data of one slice is lost due to a transmission error, only the decoding of the macroblocks in that slice is affected; the macroblocks in the other slices of the same frame can still be decoded. Dividing the image into several slices greatly reduces the visual impact when one slice cannot be decoded normally, and the slice header also provides a resynchronization point.
Ref:http://www.cnblogs.com/jiangjh/archive/2011/06/30/2094756.html

A slice is made up of a series of macroblocks arranged in raster scan order. In general, each macroblock contains one 16x16 luma sample array and, when the video format is not monochrome, two corresponding chroma sample arrays. If macroblock-adaptive frame/field decoding is not used, each macroblock corresponds to a rectangular spatial region of the image. For example, as shown in Figure 3.22, an image is divided into two slices.

Each slice is an independent coding unit; neither intra- nor inter-frame prediction may cross slice boundaries within a frame.
ref:http://blog.csdn.net/yangzhongxuan/article/details/8011596

3. Frame
In video compression, each frame represents a still image.
Frame data can be divided into multiple slices. The data in each slice is predicted using only data from its own slice within the frame, with no dependency on other slices. The NAL layer is used to package the encoded data for transport; for example, the data of each slice can be placed in its own NAL unit.
An I-frame is coded independently and does not depend on data from other frames. The I-frame is usually the first frame of each GOP; it is compressed moderately, serves as a random access point, and can be used as a reference image.
A P-frame depends on I-frame data. A P-frame is predicted from the P-frame or I-frame in front of it; during decoding, the full P-frame image is reconstructed by adding the prediction error to the predicted value taken from the reference frame.
A B-frame depends on I-frame, P-frame, or other B-frame data.

An I-frame is a keyframe: the frame is kept in full, and only this frame's data is needed to decode it (because it contains the complete picture).

A P-frame encodes the difference between this frame and a previous keyframe (or P-frame). To decode it, the differences defined by this frame are overlaid onto the previously cached picture to produce the final picture. (In other words, it is a difference frame: a P-frame carries no complete picture data, only the differences from the previous frame.)
Ref (P-frame): http://baike.baidu.com/link?url=HTNh1PTyKgaWi5NKBQPdEtlerNxvyt_we2s826sYshyDjK_NeIcv5HlNRqpg-xlOpMqgufKBfe8qNq1l2OtlBq

A B-frame is a bidirectional difference frame: it records the differences between this frame and both the preceding and the following frame (the details are more complex; there are four cases). In other words, to decode a B-frame you need not only the previously cached picture but also the decoded picture that follows it; the final picture is obtained by combining both with this frame's data. B-frames achieve a high compression ratio, but the CPU works harder when decoding them.
ref:http://blog.csdn.net/abcjennifer/article/details/6577934

I, B, and P frames are defined according to the needs of the compression algorithm; they are all real physical frames. Which frames in a sequence become I-frames is up to the encoder, but once an I-frame is determined, the subsequent frames follow strictly in the specified order.
When the video decoder decodes a bit stream frame by frame to reconstruct the video, it must always start decoding from an I-frame. If P-frames and B-frames are used, they must be decoded together with their reference frames. Only I and P frames are used in the H.264 Baseline profile; because the Baseline profile does not use B-frames, it achieves low latency, making it well suited to network cameras and video encoders.
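As a toy illustration of this constraint (my own sketch, not decoder code): with a simple I/P frame pattern and no B-frame reordering, displaying an arbitrary frame means walking back to the nearest preceding I-frame and decoding forward from there.

    #include <stdio.h>

    /* Sketch: to display frame `target` in a simple I/P GOP (no B-frame
       reordering), decoding must start at the nearest preceding I-frame. */
    int first_frame_to_decode(const char *types, int target)
    {
        int start = target;
        while (start > 0 && types[start] != 'I')   /* walk back to an I-frame */
            start--;
        return start;
    }

    int main(void)
    {
        const char *types = "IPPPPIPPPP";          /* hypothetical GOP pattern */
        int target = 7;                            /* we want to show frame 7  */
        int start  = first_frame_to_decode(types, target);
        printf("decode frames %d..%d to display frame %d\n", start, target, target);
        return 0;
    }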

The difference between I-frames and IDR frames
For example, suppose a video contains the following frames:
I P B P B P B B P (green) I (red) P (blue) B ...
If multiple reference frames are enabled for this video, the blue P-frame may refer not only to the I-frame (red) in front of it but also to the P-frame (green) before that I-frame. Because the scene before and after the I-frame may be very different, or even completely unrelated, letting the P-frame reference frames from before the I-frame causes many problems.
So a new type of frame is introduced: the IDR frame. If the video uses multiple reference frames together with IDR frames, the frame order becomes: I P B P B P B B P IDR P B ...
Since an IDR frame prohibits the frames that follow it from referencing any frame before it, the blue P-frame can no longer refer to the green P-frame.

H.264 structure

Traditional coding structure:

Sequence encoding structure:

Bit stream structure:
NALU: coded H.264 data is stored or transmitted as a series of packets known as Network Abstraction Layer Units (NAL units).
RBSP: a NALU contains a Raw Byte Sequence Payload, a sequence of bytes containing syntax elements (the raw data byte stream).
SODB: String Of Data Bits (the raw bit string; its length is not necessarily a multiple of 8, so it needs to be padded).

Logical Relationship:
SODB data bit string --> the most primitive encoded data, i.e. the VCL data;
SODB + RBSP trailing bits = RBSP
NAL header (1 byte) + RBSP = NALU
Start code prefix (3 bytes) + NALU + start code prefix (3 bytes) + NALU + ... = H.264 bit stream
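A minimal sketch of walking the byte-stream layering above: scan for the 0x000001 start-code prefix and read nal_unit_type from the low 5 bits of the 1-byte NAL header. The demo buffer is a made-up example (0x67, 0x68 and 0x65 are typical SPS, PPS and IDR-slice headers).

    #include <stdio.h>

    /* Sketch: scan an Annex B byte stream for 3-byte start codes (0x000001)
       and print nal_ref_idc / nal_unit_type from the 1-byte NAL header. */
    static void scan_nalus(const unsigned char *buf, size_t len)
    {
        for (size_t i = 0; i + 3 < len; i++) {
            if (buf[i] == 0 && buf[i + 1] == 0 && buf[i + 2] == 1) {
                unsigned char hdr = buf[i + 3];
                printf("NALU at offset %zu: nal_ref_idc=%u nal_unit_type=%u\n",
                       i, (hdr >> 5) & 0x3, hdr & 0x1F);
                i += 3;                       /* skip past the start code */
            }
        }
    }

    int main(void)
    {
        /* Hypothetical tiny stream: SPS (type 7), PPS (type 8), IDR slice (type 5). */
        unsigned char demo[] = { 0,0,1, 0x67, 0,0,1, 0x68, 0,0,1, 0x65 };
        scan_nalus(demo, sizeof demo);
        return 0;
    }

Note that streams using 4-byte start codes (0x00000001) are still found by this scan, because their last three bytes match the 3-byte prefix.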
Ref: http://blog.csdn.net/stpeace/article/details/8221945

I macroblock

In H.264, a macroblock consists of 16x16 pixels, and I macroblocks come in two main forms (plus a special one):
1. A macroblock made up of sixteen 4x4 blocks, each coded with the i4x4 method (for convenience, called an i4x4 macroblock);
The finer the partitioning, the more accurate the prediction, so i4x4 coding suits macroblocks with complex texture.
2. I16x16 macroblock;
I16x16 coding suits relatively smooth areas.
3. IPCM macroblock (special)
No prediction, no residual, no transform, no quantization, and so on: the pixel values themselves, i.e. the original YUV data, are written directly into the stream, so IPCM information clearly suffers no loss at all.

P-Skip macroblock

For an ordinary P macroblock, the pixel residuals and motion vector residuals are written into the code stream and sent from the encoder to the decoder. What is special about a P-Skip macroblock is that neither the pixel residuals nor the motion vector residuals are transmitted (in this case both must be zero, so there is no need to transmit them at all). Apart from a few bits identifying the macroblock as a P-Skip macroblock, no other information about it needs to be transferred. So how does the decoder recover the pixels?
We know that MVD = MV - MVP. As just stated, the motion vector residual MVD is zero, and the decoder can derive the MVP, so the decoder also knows the MV. The decoder holds the reconstructed macroblock pixels of the corresponding reference frame; from those reconstructed pixels and the MV it can restore the pixel values of this macroblock in the current frame (if the MV is fractional, interpolation is needed). This is the principle of the P-Skip macroblock: taken literally, the macroblock is skipped, as if it were not encoded at all, and the decoder recovers it with an approximate substitute.
Why introduce the P-Skip macroblock? If a macroblock of the current frame is almost identical to some macroblock of the reference frame (the two need not be at the same position), there is clearly no need to encode the current macroblock at all; the decoder can directly restore its pixel values by substitution. For example, if there is a table-tennis ball in the first frame and the same ball in the second frame, the macroblock containing the ball in the second frame is very likely to be coded as a P-Skip macroblock.
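A sketch of the recovery step described here (my own simplification: integer-pel MV, no interpolation, and no neighbour-availability handling when deriving the MVP; names and signature are invented for illustration):

    /* Sketch: reconstructing a P-Skip macroblock. MVD = 0 and the residual
       is zero, so MV = MVP and the 16x16 block is copied straight from the
       reference picture. Real decoders interpolate for fractional MVs. */
    typedef struct { int x, y; } MV;

    void reconstruct_pskip(unsigned char *cur, const unsigned char *ref,
                           int stride, int mb_x, int mb_y, MV mvp)
    {
        MV mv = mvp;                           /* MVD is zero for P-Skip */
        for (int y = 0; y < 16; y++)
            for (int x = 0; x < 16; x++)
                cur[(mb_y + y) * stride + (mb_x + x)] =
                    ref[(mb_y + y + mv.y) * stride + (mb_x + x + mv.x)];
    }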
Ref: http://blog.csdn.net/stpeace/article/details/8202880

Inter-frame prediction in H.264

1. Inter-frame Prediction concept
A prediction mode that uses already-encoded video frames/fields as references, together with block-based motion compensation.
Motion estimation:
The active image is divided into blocks or macroblocks, and the encoder searches for the position of each block or macroblock within an adjacent frame, obtaining the relative spatial offset between the two. This relative offset is what is commonly called the motion vector, and the process of obtaining it is called motion estimation.
Motion compensation:
The process of producing an estimate (predicted value) of the current frame from the motion vectors and the inter-frame prediction method. It describes how each pixel block of the current image is obtained from a pixel block of the reference image.

2. Inter-frame prediction process
1. The current frame searches for a matching block within a window of a past frame and obtains the motion vector;
2. Using the motion vector, the past frame is displaced to obtain an estimate (prediction) of the current frame;
3. This estimate is subtracted from the current frame to obtain the estimation error (prediction error);
4. The motion vector and the estimation error are sent to the receiving end;
5. The decoder displaces the past frame (the reference frame) according to the received motion vector (giving the estimate of the current frame) and adds the received error value, which yields the current frame;
6. Using the prediction information from the bit stream, the decoder obtains the prediction macroblock P from the reference frame (reference picture list), identical to the prediction macroblock P formed by the encoder. The prediction P plus the decoded residual D is then filtered to produce the decoded macroblock of the current image; when every macroblock has been decoded, the reconstructed image is displayed, and this image can serve as a reference frame for decoding future frames.
Ref:http://blog.sina.cn/dpool/blog/s/blog_687611bf0101gtnn.html


ref:http://blog.csdn.net/a514223963/article/details/7894779
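The motion-estimation step in the process above can be sketched as a brute-force block-matching search that minimizes SAD over a small window. This is a toy full search under assumptions of my own (function names are mine, and the caller must keep the search window inside the reference frame); it is not one of the fast search strategies the JM encoder actually offers.

    #include <limits.h>
    #include <stdlib.h>

    typedef struct { int x, y; } MV;

    /* Sum of absolute differences between a 16x16 block in `cur` and a
       candidate block in `ref`, both stored with the same stride. */
    static int sad16x16(const unsigned char *cur, const unsigned char *ref, int stride)
    {
        int sad = 0;
        for (int y = 0; y < 16; y++)
            for (int x = 0; x < 16; x++)
                sad += abs(cur[y * stride + x] - ref[y * stride + x]);
        return sad;
    }

    /* Toy full search over a +/- range window around (mb_x, mb_y). */
    MV full_search(const unsigned char *cur, const unsigned char *ref,
                   int stride, int mb_x, int mb_y, int range)
    {
        MV best = {0, 0};
        int best_sad = INT_MAX;
        for (int dy = -range; dy <= range; dy++)
            for (int dx = -range; dx <= range; dx++) {
                int sad = sad16x16(cur + mb_y * stride + mb_x,
                                   ref + (mb_y + dy) * stride + (mb_x + dx),
                                   stride);
                if (sad < best_sad) { best_sad = sad; best = (MV){dx, dy}; }
            }
        return best;
    }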

3. Tree-structured motion compensation:
The motion vectors of the blocks currently being encoded are predicted from the motion vectors of already-coded blocks.
Coded and transmitted: the actual motion vector minus its predicted value (the MVD of each block), and the partitioning method.
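A sketch of the median prediction rule commonly used for this (component-wise median of the left, top and top-right neighbours; the special cases for unavailable neighbours and for 16x8/8x16 partitions are omitted, and the function names are mine):

    /* Sketch of H.264-style motion vector prediction: the predictor MVP is
       the component-wise median of the MVs of neighbours A (left), B (top)
       and C (top-right); the encoder then transmits MVD = MV - MVP. */
    typedef struct { int x, y; } MV;

    static int median3(int a, int b, int c)
    {
        if ((a >= b && a <= c) || (a >= c && a <= b)) return a;
        if ((b >= a && b <= c) || (b >= c && b <= a)) return b;
        return c;
    }

    MV predict_mv(MV a, MV b, MV c)
    {
        MV mvp = { median3(a.x, b.x, c.x), median3(a.y, b.y, c.y) };
        return mvp;
    }

    MV mvd(MV mv, MV mvp)                  /* what actually gets coded */
    {
        MV d = { mv.x - mvp.x, mv.y - mvp.y };
        return d;
    }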

4. Inter-frame chroma block prediction
The MV of a chroma block is obtained by halving the horizontal and vertical components of the MV of the corresponding luma block.
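A trivial sketch of this rule (integer division only, to keep the illustration simple; the halving is what gives the 1/8-pixel chroma precision mentioned later in these notes, and the function name is mine):

    /* Sketch: for 4:2:0, a luma displacement of (x, y) luma samples
       corresponds to (x/2, y/2) chroma samples. */
    typedef struct { int x, y; } MV;

    MV chroma_mv_from_luma(MV luma_mv)
    {
        MV c = { luma_mv.x / 2, luma_mv.y / 2 };
        return c;
    }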

5. Features of inter-frame prediction
Macroblock partitions of different sizes and shapes
Motion compensation for each 16x16 macroblock can use partitions of different sizes and shapes; H.264 supports seven partition sizes (16x16, 16x8, 8x16, 8x8, 8x4, 4x8 and 4x4). Motion compensation with the small block modes improves the handling of motion detail, reduces blocking artifacts, and improves image quality.
High-precision sub-pixel motion compensation
H.263 uses half-pixel precision motion estimation, while H.264 can use 1/4-pixel (luma) or 1/8-pixel (chroma) precision. For the same required accuracy, the residual after motion estimation with 1/4- or 1/8-pixel precision is smaller than with H.263's half-pixel precision, so at the same precision the bit rate needed for inter-frame coding is lower.
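To make the sub-pixel feature concrete, here is a sketch of the standard 6-tap half-sample luma interpolation filter (coefficients 1, -5, 20, 20, -5, 1); border handling and the quarter-sample averaging step are left out, and the function names are mine:

    /* Sketch: H.264 luma half-sample interpolation. Quarter-sample positions
       are then obtained by averaging neighbouring integer/half samples. */
    static unsigned char clip255(int v)
    {
        return (unsigned char)(v < 0 ? 0 : v > 255 ? 255 : v);
    }

    /* Half-pel sample midway between p[x] and p[x+1] along one row. */
    unsigned char halfpel_h(const unsigned char *p, int x)
    {
        int v = p[x - 2] - 5 * p[x - 1] + 20 * p[x] +
                20 * p[x + 1] - 5 * p[x + 2] + p[x + 3];
        return clip255((v + 16) >> 5);         /* divide by 32 with rounding */
    }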
Multi-frame prediction
Multi-frame prediction requires both the decoder and the encoder to store multiple frames in a buffer as reference frames. The decoder uses memory management control operations signalled in the bit stream to replicate the same multi-frame buffer as the encoder. At the same time, each motion-compensated 16x16, 16x8, 8x16, or 8x8 block (and its sub-blocks) needs a reference index parameter to identify the position of its reference frame in the buffer.
Deblocking filter

6. Reference picture list
Each time the decoder finishes decoding a picture, it determines whether that picture is used for reference and marks it accordingly, then initializes the reference picture list before decoding the next picture. When decoding the next picture, it first checks the picture's header information to determine whether the reference list needs to be reordered; if so, it reorders the list according to the additional header information, then decodes the picture; after decoding completes, reference pictures are marked again, and so the loop continues. (An IDR frame needs no reference frame; once it is decoded and marked it becomes the first reference picture, to be used when decoding subsequent pictures.)
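A rough sketch of the sliding-window part of this behaviour (simplified: adaptive memory management commands and list reordering are ignored, and the structure and constant names are invented for illustration):

    #include <string.h>

    /* Sketch of sliding-window reference management: a decoded picture marked
       "used for reference" is appended to a small buffer; when the buffer is
       full the oldest reference is dropped, and an IDR picture flushes the
       whole buffer so nothing before it can be referenced. */
    #define MAX_REF 4

    typedef struct { int frame_num; } Pic;

    typedef struct {
        Pic pics[MAX_REF];
        int count;
    } RefList;

    void mark_reference(RefList *l, Pic p, int is_idr)
    {
        if (is_idr)                        /* IDR: flush all earlier references */
            l->count = 0;
        if (l->count == MAX_REF) {         /* sliding window: drop the oldest  */
            memmove(&l->pics[0], &l->pics[1], (MAX_REF - 1) * sizeof(Pic));
            l->count--;
        }
        l->pics[l->count++] = p;
    }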
Ref: http://blog.csdn.net/newthinker_wei/article/details/8784742

H.264 data stream format

http://www.cnblogs.com/general001/archive/2013/04/26/3044833.html

Questions

1. A frame can be divided into multiple slices for encoding, and each coded slice is packed into one NAL unit.
2. The NAL nal_unit_type values relevant to slices are 1 (coded slice of a non-IDR picture), 2 (coded slice data partition A), 3 (coded slice data partition B), 4 (coded slice data partition C), and 5 (coded slice of an IDR picture), five types in all; slices themselves come in three coding modes: I_slice, P_slice, and B_slice.
The NAL nal_unit_type indicates what kind of information follows and how it is partitioned.
I_slice, P_slice, and B_slice denote I-type, P-type, and B-type slices. An I_slice uses intra-frame prediction coding, a P_slice uses unidirectional prediction coding or intra-frame modes, and a B_slice uses bidirectional prediction or intra-frame modes.
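A small helper that encodes the mappings listed above might look like this (the function names are mine; the value tables follow the H.264 specification, where slice_type values 5-9 repeat the meanings of 0-4):

    /* Sketch: map nal_unit_type and slice_type codes to readable names. */
    const char *nal_unit_type_name(int t)
    {
        switch (t) {
        case 1:  return "coded slice, non-IDR picture";
        case 2:  return "coded slice data partition A";
        case 3:  return "coded slice data partition B";
        case 4:  return "coded slice data partition C";
        case 5:  return "coded slice, IDR picture";
        default: return "other NAL unit";
        }
    }

    const char *slice_type_name(int slice_type)
    {
        static const char *names[] = { "P", "B", "I", "SP", "SI" };
        return (slice_type >= 0 && slice_type <= 9) ? names[slice_type % 5] : "?";
    }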
Other

Six generally recommended advanced documents about H.264:
1. "H.264 / MPEG-4 Part 10 White Paper"
2. "Video Coding Using the H.264/MPEG-4 AVC Compression Standard"
3. "H.264 and MPEG-4 Video Compression"
4. "Overview of the H.264/AVC Video Coding Standard"
5. "Overview and Introduction to the Fidelity Range Extensions"
6. "H.264/MPEG-4 AVC Reference Software Manual"
