Additional Notes:
VCL: the VCL is the video coding layer. VCL NAL units are the NAL units whose nal_unit_type is 1 to 5 (inclusive); these units carry video data. All other NAL units are called non-VCL NAL units, and both PPS and SPS are non-VCL NAL units.
Format of a byte stream NAL unit (the number of zero bytes in the start code):
Apart from the byte stream NAL unit at the very beginning of the stream, most byte stream NAL units do not begin with leading_zero_8bits (one zero byte);
zero_byte (one zero byte) appears in the stream when nal_unit_type is equal to 7 (SPS) or 8 (PPS), or when the byte stream NAL unit contains the first NAL unit of an access unit in decoding order;
start_code_prefix_one_3bytes (the three bytes 0x000001) is present at the beginning of every byte stream NAL unit.
Therefore, leaving aside the beginning of the stream and the special byte stream NAL units listed above, the start code is only three bytes long (that is, it contains two zero bytes).
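The start code rules above can be sketched as a small splitter. This is a minimal illustration, not a full demuxer: the function name split_annexb is hypothetical, the input is assumed to be a complete byte stream held in memory, and a start code is reported as 4 bytes long whenever at least one extra zero byte precedes the 0x000001 prefix.

```python
def split_annexb(stream: bytes):
    """Split an Annex B byte stream into (start_code_length, nal_unit) pairs."""
    prefix = b"\x00\x00\x01"
    positions = []
    pos = stream.find(prefix)
    while pos != -1:
        positions.append(pos)
        pos = stream.find(prefix, pos + 3)

    units = []
    for k, pos in enumerate(positions):
        end = positions[k + 1] if k + 1 < len(positions) else len(stream)
        # Zero bytes in front of the next prefix belong to that start code
        # (zero_byte) or are padding; a NAL unit itself never ends in 0x00.
        nal = stream[pos + 3:end].rstrip(b"\x00")
        has_zero_byte = pos > 0 and stream[pos - 1] == 0
        units.append((4 if has_zero_byte else 3, nal))
    return units
```

For a stream holding an SPS and a PPS with 4-byte start codes followed by a slice with a 3-byte start code, the function returns three pairs whose first elements are 4, 4 and 3.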
The first NAL unit of an access unit is defined as follows:
The first of any of the following NAL units after the last VCL NAL unit of a primary coded picture marks the beginning of a new access unit:
-Access unit delimiter NAL unit (if present)
-Sequence parameter set NAL unit (if present)
-Picture parameter set NAL unit (if present)
-SEI NAL unit (if present)
-NAL units whose nal_unit_type is in the range 14 to 18 (inclusive)
-The first VCL NAL unit of a primary coded picture (always present)
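The boundary rule above can be sketched as a grouping pass over a list of nal_unit_type values. This is a simplified illustration under stated assumptions: the function name group_access_units is hypothetical, and the last case in the list (a new picture that starts directly with a VCL NAL unit) is not detected, because that requires parsing slice headers.

```python
# Non-VCL types that open a new access unit once a VCL NAL unit has been seen:
# SEI (6), SPS (7), PPS (8), access unit delimiter (9), and types 14..18.
AU_START_TYPES = {6, 7, 8, 9} | set(range(14, 19))

def group_access_units(nal_types):
    """Group nal_unit_type values into access units (simplified)."""
    units, current, seen_vcl = [], [], False
    for t in nal_types:
        if seen_vcl and t in AU_START_TYPES:
            units.append(current)
            current, seen_vcl = [], False
        current.append(t)
        if 1 <= t <= 5:  # VCL NAL unit
            seen_vcl = True
    if current:
        units.append(current)
    return units
```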
From: http://blog.csdn.net/yangzhongxuan/article/details/8003494
Glossary
Field and frame: one field or one frame of video can be used to produce a coded picture. In television, to reduce large-area flicker, one frame is split into two interlaced fields.
Slices: the macroblocks of each picture are arranged into slices. Slices are classified as I, B, P, and other slice types.
An I slice contains only I macroblocks. A P slice may contain P and I macroblocks, and a B slice may contain B and I macroblocks.
An I macroblock uses already decoded pixels in the current slice as the reference for intra prediction.
A P macroblock uses previously coded pictures as reference pictures for inter prediction.
A B macroblock uses reference pictures in both directions (the previous frame and the next frame) for inter prediction.
The purpose of slices is to limit the spread and propagation of bit errors by keeping coded slices independent of each other.
Prediction within one slice cannot use macroblocks of other slices as a reference, so a prediction error in one slice does not propagate to other slices.
Macroblock: a coded picture is usually divided into macroblocks. A macroblock consists of one 16x16 block of luma samples plus one 8x8 Cb block and one 8x8 Cr block of chroma samples.
Relationship between data:
In the h264 structure, the coded data of one video picture is called a frame. A frame consists of one or more slices, a slice consists of one or more macroblocks (MB), and a macroblock consists of 16x16 YUV data. The macroblock is the basic unit of h264 encoding.
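As a small worked example of the frame / slice / macroblock relationship, the sketch below counts the macroblocks in a picture. It assumes 4:2:0 sampling (one 16x16 luma block plus one 8x8 Cb and one 8x8 Cr block per macroblock, as in the glossary) and that picture dimensions are padded up to a multiple of 16; the function name is hypothetical.

```python
MB_SIZE = 16
# Samples per macroblock with 4:2:0 sampling: 16x16 luma + 8x8 Cb + 8x8 Cr.
SAMPLES_PER_MB = 16 * 16 + 8 * 8 + 8 * 8

def macroblock_grid(width: int, height: int):
    """Return (macroblocks per row, macroblock rows), rounding up to whole macroblocks."""
    mb_cols = (width + MB_SIZE - 1) // MB_SIZE
    mb_rows = (height + MB_SIZE - 1) // MB_SIZE
    return mb_cols, mb_rows

cols, rows = macroblock_grid(1920, 1080)
print(cols, rows, cols * rows)  # 120 68 8160
```

A 1080-line picture is not a multiple of 16 lines high, so the coded picture is 1088 lines (68 macroblock rows) and the extra lines are cropped after decoding.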
There are three different data forms in the h264 encoding process:
SODB (string of data bits) ----> the most primitive coded data, that is, the VCL data;
RBSP (raw byte sequence payload) ----> the SODB followed by the trailing bits (the RBSP trailing bits are one "1" bit followed by as many "0" bits as are needed for byte alignment);
EBSP (encapsulated byte sequence payload) ----> the RBSP with emulation prevention bytes (0x03) added. The reason: when a NALU is placed into an Annex B byte stream, a start code (startcodeprefix) must be added before each NALU. If the slice carried by the NALU is the start of a frame, the start code is the 4 bytes 0x00000001; otherwise (the NALU is a later part of a frame) it is the 3 bytes 0x000001. To keep the NALU body from colliding with the start code, the encoder inserts a 0x03 byte whenever two consecutive zero bytes occur. The decoder removes the 0x03 again. This is also known as shell removal.
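The SODB to RBSP step can be sketched at bit level. This is an illustration only: the SODB is represented as a string of "0"/"1" characters for readability, and both function names are hypothetical.

```python
def sodb_to_rbsp(sodb_bits: str) -> bytes:
    """Append the stop bit '1' and pad with '0' bits up to a byte boundary."""
    bits = sodb_bits + "1"
    bits += "0" * (-len(bits) % 8)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))

def rbsp_to_sodb(rbsp: bytes) -> str:
    """Drop the trailing bits: everything from the last '1' bit onward."""
    bits = "".join(f"{b:08b}" for b in rbsp)
    return bits[:bits.rindex("1")]

print(sodb_to_rbsp("10110").hex())  # b4  (10110 + 1 + 00)
```

Because the stop bit is always present, an SODB that already ends on a byte boundary still grows by one byte (0x80).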
Layered Structure of h264/AVC
The main goals of H.264 are:
1. A high video compression ratio;
2. Good network friendliness;
To achieve these goals, h264 is split into two layers:
1. VCL (video coding layer);
2. NAL (network abstraction layer);
The VCL defines the core algorithm engine and the block, macroblock, and slice syntax. Its final output is the coded data, the SODB;
The NAL defines the syntax levels above the slice level (such as the sequence parameter set and picture parameter set, intended for network transmission),
and it also supports the following features: independent slice decoding, uniqueness of the start code, SEI, and transport of coded data in stream format. The NAL packs the SODB into an RBSP and then adds the NAL header to form a NALU (NAL unit);
H264 Network Transmission Structure
H264 is transmitted over the network as NALUs. The structure of a NALU is NAL header + RBSP.
The NALU header identifies the type of data in the RBSP that follows, whether that data is referenced by other frames, and whether an error occurred during network transmission.
NALU header structure
Length: 1 byte
forbidden_bit (1 bit) + nal_reference_bit (2 bits) + nal_unit_type (5 bits)
1. forbidden_bit: the forbidden bit (forbidden_zero_bit in the standard). Its initial value is 0. When the network detects a bit error in the NAL unit, it can set this bit to 1 so that the receiver can correct the error or discard the unit.
2. nal_reference_bit: the importance of the NAL unit (nal_ref_idc in the standard). The larger the value, the more important the unit. When the decoder cannot keep up with decoding, it can discard NALUs whose importance is 0.
The importance of the different NALU types is shown in the following table.
nal_unit_type | NAL type | nal_reference_bit
0 | Unspecified | 0
1 | Slice of a non-IDR picture | Non-0 if the slice belongs to a reference frame, otherwise 0
2 | Slice data partition A | Same as above
3 | Slice data partition B | Same as above
4 | Slice data partition C | Same as above
5 | Slice of an IDR picture | Non-0
6 | Supplemental enhancement information (SEI) | 0
7 | Sequence parameter set | Non-0
8 | Picture parameter set | Non-0
9 | Access unit delimiter | 0
10 | End of sequence | 0
11 | End of stream | 0
12 | Filler data | 0
13 .. 23 | Reserved | 0
24 .. 31 | Unspecified | 0
A reference frame is a frame that is needed to decode other frames. For example, an I frame may be referenced by one or more B frames, and a B frame may be referenced by a P frame.
The table also shows that the IDR I frame is very important. If it is lost, none of the frames in its sequence can be decoded;
The sequence parameter set and the picture parameter set are also very important. Without the sequence parameter set, the frames of that sequence cannot be parsed;
without a picture parameter set, the frames that use that picture parameter set cannot be parsed.
3. nal_unit_type: the following table lists the NALU type values (column C is the syntax element category).
nal_unit_type | NAL type | C
0 | Unspecified |
1 | Slice of a non-IDR picture without data partitioning | 2, 3, 4
2 | Slice data partition A of a non-IDR picture | 2
3 | Slice data partition B of a non-IDR picture | 3
4 | Slice data partition C of a non-IDR picture | 4
5 | Slice of an IDR picture | 2, 3
6 | Supplemental enhancement information (SEI) | 5
7 | Sequence parameter set | 0
8 | Picture parameter set | 1
9 | Access unit delimiter | 6
10 | End of sequence | 7
11 | End of stream | 8
12 | Filler data | 9
13 .. 23 | Reserved |
24 .. 31 | Unspecified (used for RTP packetization) |
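The one-byte header layout (1 + 2 + 5 bits) and the type values in the table can be sketched as a small parser. The function name and the dictionary are hypothetical; the field names follow the standard (forbidden_zero_bit and nal_ref_idc correspond to forbidden_bit and nal_reference_bit in the text above).

```python
NAL_TYPE_NAMES = {
    1: "non-IDR slice", 2: "data partition A", 3: "data partition B",
    4: "data partition C", 5: "IDR slice", 6: "SEI", 7: "SPS", 8: "PPS",
    9: "access unit delimiter", 10: "end of sequence",
    11: "end of stream", 12: "filler data",
}

def parse_nal_header(first_byte: int):
    """Split the first byte of a NALU into its three header fields."""
    forbidden_zero_bit = (first_byte >> 7) & 0x01
    nal_ref_idc = (first_byte >> 5) & 0x03
    nal_unit_type = first_byte & 0x1F
    return forbidden_zero_bit, nal_ref_idc, nal_unit_type

for byte in (0x67, 0x68, 0x65, 0x41, 0x06):
    _, ref_idc, nal_type = parse_nal_header(byte)
    print(hex(byte), ref_idc, NAL_TYPE_NAMES.get(nal_type, "reserved/unspecified"))
```

The bytes 0x67, 0x68 and 0x65 are the header values commonly seen for SPS, PPS and IDR slices when nal_ref_idc is 3.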
Extended types used for RTP packetization
Type | Packet | Name
24 | STAP-A | Single-time aggregation packet
25 | STAP-B | Single-time aggregation packet
26 | MTAP16 | Multi-time aggregation packet
27 | MTAP24 | Multi-time aggregation packet
28 | FU-A | Fragmentation unit
29 | FU-B | Fragmentation unit
30-31 | Undefined |
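As an illustration of how type 28 (FU-A) is used, the sketch below reads the two leading bytes of an FU-A payload and rebuilds the header of the original NALU. The byte layout is taken from RFC 6184 rather than from this article: the FU indicator carries F, NRI and type 28, and the FU header carries the start bit, the end bit, a reserved bit and the original nal_unit_type. The function name is hypothetical.

```python
def parse_fu_a(payload: bytes):
    """Return (is_start, is_end, original_nal_header, fragment_data)."""
    fu_indicator, fu_header = payload[0], payload[1]
    if fu_indicator & 0x1F != 28:
        raise ValueError("not an FU-A payload")
    is_start = bool(fu_header & 0x80)  # S bit: first fragment of the NALU
    is_end = bool(fu_header & 0x40)    # E bit: last fragment of the NALU
    # F and NRI come from the indicator, the type from the FU header.
    original_header = (fu_indicator & 0xE0) | (fu_header & 0x1F)
    return is_start, is_end, original_header, payload[2:]
```

A receiver would emit the rebuilt header byte once, when the start bit is set, and then append the fragment data of every packet until the end bit is seen.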
RBSP
RBSP data is one of the types in the following table:
RBSP type | Abbreviation | Description
Parameter set | PS | Global information about the sequence, such as picture size and video format
Supplemental enhancement information | SEI | Enhancement information for decoding the video sequence
Picture delimiter | PD | Boundary of a video picture
Coded slice | Slice | Header information and data of a coded slice
Data partition | | DP slice layer data, used for error-recovery decoding
End of sequence | | Indicates the end of a sequence; the next picture is an IDR picture
End of stream | | Indicates that no further pictures exist in the stream
Filler data | | Dummy data, used as padding bytes
From the previous analysis, we know that the VCL produces the coded video frame data.
These frames may be I, B, or P frames, and they may belong to different sequences. In addition, each sequence has its own sequence parameter set and picture parameter sets.
To complete video decoding, you must therefore transmit not only the video frame data coded by the VCL, but also the sequence parameter set, the picture parameter set, and other data.
Parameter set: includes the sequence parameter set (SPS) and the picture parameter set (PPS).
The SPS contains parameters that apply to a continuous coded video sequence, such as the identifier seq_parameter_set_id, constraints on frame numbers and POC, the number of reference frames, the size of the decoded picture, and the flag that selects frame or field coding mode.
The PPS applies to one or more pictures in a sequence.
Its parameters include the identifier pic_parameter_set_id, the seq_parameter_set_id it refers to, the entropy coding mode flag, the number of slice groups, the initial quantization parameter, and the flag for adjusting the deblocking filter coefficients.
Data partitioning: the coded data of a slice is stored in three independent data partitions (DP A, B, and C), each containing a subset of the coded slice.
Partition A contains the slice header and the header data of each macroblock in the slice.
Partition B contains the coded residual data of intra and SI macroblocks.
Partition C contains the coded residual data of inter macroblocks.
Each partition can be placed in its own NAL unit and transmitted independently.
NAL start and end
The encoder places each NAL unit into its own complete packet. Because the packet has a header, the decoder can easily detect the NAL boundaries and extract the NAL units for decoding one after another.
Each NAL unit is preceded by a start code, 0x00 00 01 (or 0x00 00 00 01). The decoder treats each start code as the beginning of a NAL unit; when the next start code is detected, the current NAL unit ends.
H.264 also stipulates that detecting 0x000000 can mark the end of the current NAL unit. What happens if the data inside a NAL unit contains 0x000001 or 0x000000? H.264 introduces an emulation prevention mechanism: whenever two consecutive zero bytes in the NAL data are followed by a byte in the range 0x00 to 0x03, the encoder inserts a new byte 0x03 before that last byte:
0x000000 -> 0x00000300
0x000001 -> 0x00000301
0x000002 -> 0x00000302
0x000003 -> 0x00000303
When the decoder detects 0x000003, it discards the 03 and restores the original data (shell removal). When decoding, the decoder first reads the NAL data byte by byte, determines the length of the NAL unit, and then starts decoding.
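The four substitutions listed above, and their removal on the decoder side, can be sketched as a pair of byte-level functions. This is a minimal illustration with hypothetical function names; it operates on the payload only and does not handle the NAL header or start codes.

```python
def add_emulation_prevention(rbsp: bytes) -> bytes:
    """Insert 0x03 after two zero bytes when the next byte is 0x00..0x03."""
    out = bytearray()
    zeros = 0
    for b in rbsp:
        if zeros >= 2 and b <= 0x03:
            out.append(0x03)
            zeros = 0
        out.append(b)
        zeros = zeros + 1 if b == 0 else 0
    return bytes(out)

def remove_emulation_prevention(ebsp: bytes) -> bytes:
    """Drop every 0x03 that directly follows two zero bytes."""
    out = bytearray()
    zeros = 0
    for b in ebsp:
        if zeros >= 2 and b == 0x03:
            zeros = 0
            continue
        out.append(b)
        zeros = zeros + 1 if b == 0 else 0
    return bytes(out)

print(add_emulation_prevention(b"\x00\x00\x01").hex())  # 00000301
```

Applying remove_emulation_prevention to the output of add_emulation_prevention returns the original bytes, which is what makes the mechanism transparent to the decoder.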
Order requirements for NALUs
The H.264/AVC standard has strict requirements on the order of the NAL units sent to the decoder. If the NAL units are out of order, they must be rearranged according to the specification before being sent to the decoder; otherwise the decoder cannot decode them correctly.
1. Sequence parameter set NAL unit
It must be transmitted before all other NAL units that refer to this parameter set, but repeated sequence parameter set NAL units are allowed among those NAL units.
Repetition here means the following: each sequence parameter set NAL unit carries its own identifier. If two sequence parameter set NAL units have the same identifier, the later one can be regarded as a copy of the earlier one rather than as a new sequence parameter set.
2. Picture parameter set NAL unit
It must be transmitted before all other NAL units that refer to this parameter set, but repeated picture parameter set NAL units are allowed among those NAL units, in the same way as for the sequence parameter set NAL unit above.
3. Slice units and data partition units of different primary coded pictures must not interleave. That is, slice units and data partition units of another primary coded picture must not appear within the series of slice units and data partition units of one primary coded picture.
4. Influence of reference pictures: if one picture uses another picture as a reference, all slice units and data partition units of the former must come after the slice units and data partition units of the latter. This rule applies to both primary coded pictures and redundant coded pictures.
5. All slice units and data partition units of a primary coded picture must come before the slice units and data partition units of the corresponding redundant coded picture.
6. If consecutive non-reference primary coded pictures appear in the data stream, the picture with the smaller picture number comes first.
7. If arbitrary_slice_order_allowed_flag is 1, the slice units and data partition units within a primary coded picture may be in any order. If arbitrary_slice_order_allowed_flag is 0, the order of the slices is determined by the position of the first macroblock in each slice. If data partitioning is used, partition A comes before partition B, partition B comes before partition C, and the data partitions belonging to different slices must not interleave with each other or with slices that do not use data partitioning.
8. If an SEI (supplemental enhancement information) unit is present, it must come before the slice units and data partition units of the primary coded picture it belongs to, and after all slice units and data partition units of the previous primary coded picture. If the SEI applies to several primary coded pictures, this ordering is relative to the first of them.
9. If an access unit delimiter is present, it must come before all SEI units, slice units, and data partition units of the primary coded picture, and after the NAL units of the previous primary coded picture.
10. If an end of sequence NAL unit is present and a picture follows it, that picture must be an IDR (instantaneous decoder refresh) picture. The end of sequence NAL unit must come before the delimiter, SEI units, and other data of that IDR picture, and after the NAL units of the previous picture. If no picture follows the end of sequence NAL unit, it comes after all picture data in the bitstream.
11. The end of stream NAL unit comes last in the bitstream.
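A few of the rules above can be checked from the nal_unit_type values alone. The sketch below is a deliberately simplified illustration with a hypothetical function name: it only checks that some SPS and PPS precede the first slice (rules 1 and 2, without matching parameter set identifiers), that the first picture after an end of sequence unit is an IDR picture (rule 10), and that nothing follows the end of stream unit (rule 11).

```python
def check_nal_order(nal_types):
    """Return a list of human-readable problems found in the NAL unit order."""
    problems = []
    seen_sps = seen_pps = False
    after_end_of_seq = False
    stream_ended = False
    for i, t in enumerate(nal_types):
        if stream_ended:
            problems.append(f"index {i}: NAL unit after end of stream")
        if t == 7:
            seen_sps = True
        elif t == 8:
            seen_pps = True
        elif 1 <= t <= 5:  # VCL NAL unit
            if not (seen_sps and seen_pps):
                problems.append(f"index {i}: slice before SPS/PPS")
            if after_end_of_seq and t != 5:
                problems.append(f"index {i}: non-IDR picture after end of sequence")
            after_end_of_seq = False
        elif t == 10:
            after_end_of_seq = True
        elif t == 11:
            stream_ended = True
    return problems
```

An empty result does not prove that a stream is conformant; the remaining rules need slice header fields that this sketch does not parse.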
http://blog.csdn.net/newthinker_wei/article/details/8748442
http://blog.csdn.net/china_video_expert/article/details/5943271 (some useful concepts in h264)
http://blog.csdn.net/funkri/article/details/8994858
http://blog.csdn.net/yangzhongxuan/article/details/8003535 (with source code testing)
Multimedia Development --- h264 NALU syntax structure