Reference link: http://blog.csdn.net/dxpqxb/article/details/7631304
The H.264 NALU (NAL unit) exists to support the transmission of encoded data over packet-switched networks.
The NALU defines a basic format that can be used by both packet-based and bitstream-based systems, and it carries header information, thus providing the interface between video coding and the outside world.
H.264 data takes three different forms during the encoding process:
SODB (String Of Data Bits) --> the most primitive encoded data, namely the VCL data;
RBSP (Raw Byte Sequence Payload) --> the SODB followed by trailing bits (the rbsp trailing bits: one "1" bit, then "0" bits as needed) for byte alignment;
EBSP (Extended Byte Sequence Payload) --> the RBSP with emulation prevention bytes (0x03) inserted.
When writing NALUs in Annex B format, a start code (start code prefix) must be added before each NALU. If the slice carried by the NALU is the beginning of a frame, the 4-byte form 0x00000001 is used; otherwise (the NALU is part of a frame) the 3-byte form 0x000001 is used. In addition, so that the NALU body never contains a byte pattern that conflicts with the start code, the encoder inserts a 0x03 byte whenever it meets two consecutive zero bytes followed by a byte in the range 0x00 to 0x03. The 0x03 bytes are removed when decoding, an operation also known as unshelling.
Encoding process:
1. The SODB output by the VCL is packaged into a nal_unit. The NALU is a general encapsulation format that suits both ordered byte streams and IP packet switching.
2. Depending on the transport network (circuit-switched or packet-switched), the nal_unit is then wrapped in the encapsulation format of that network (for example, NALUs are encapsulated into RTP packets).
---------------------------------------------------
Process one: encapsulating VCL data into NALUs
---------------------------------------------------
The bitstream SODB (String Of Data Bits) output by the VCL layer goes through the following three steps before it becomes a nal_unit:
1. The SODB is byte-aligned and encapsulated into an RBSP (Raw Byte Sequence Payload).
2. To prevent the RBSP byte stream from colliding with the SCP (start_code_prefix_one_3bytes, 0x000001) in an ordered byte stream, the encoder loops over the RBSP checking each group of three bytes; where a collision would occur, it inserts emulation_prevention_three_byte (0x03) before the third byte. The resulting syntax is as follows:
nal_unit(NumBytesInNALunit) {
    forbidden_zero_bit
    nal_ref_idc
    nal_unit_type
    NumBytesInRBSP = 0
    for (i = 1; i < NumBytesInNALunit; i++) {
        if (i + 2 < NumBytesInNALunit && next_bits(24) == 0x000003) {
            rbsp_byte[NumBytesInRBSP++]
            rbsp_byte[NumBytesInRBSP++]
            i += 2
            emulation_prevention_three_byte  /* equal to 0x03 */
        } else
            rbsp_byte[NumBytesInRBSP++]
    }
}
3. The RBSP, after emulation prevention, is prefixed with a one-byte header (forbidden_zero_bit + nal_ref_idc + nal_unit_type) and thereby encapsulated into a nal_unit.
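The emulation prevention step can be sketched in a few lines of Python. This is only a minimal illustration, not code from any encoder: the function names add_emulation_prevention and remove_emulation_prevention are made up for this example, and the special case of an RBSP that ends in a zero byte is not handled.

```python
def add_emulation_prevention(rbsp: bytes) -> bytes:
    """Insert 0x03 after every two zero bytes that are followed by 0x00..0x03."""
    out = bytearray()
    zeros = 0  # number of consecutive zero bytes just written
    for b in rbsp:
        if zeros >= 2 and b <= 0x03:
            out.append(0x03)  # emulation_prevention_three_byte
            zeros = 0
        out.append(b)
        zeros = zeros + 1 if b == 0 else 0
    return bytes(out)


def remove_emulation_prevention(ebsp: bytes) -> bytes:
    """Drop every 0x03 that directly follows two zero bytes (decoder side)."""
    out = bytearray()
    zeros = 0
    for b in ebsp:
        if zeros >= 2 and b == 0x03:
            zeros = 0  # skip the inserted byte
            continue
        out.append(b)
        zeros = zeros + 1 if b == 0 else 0
    return bytes(out)
```

After add_emulation_prevention the payload can no longer contain the byte pattern 00 00 01, so a start code search over the byte stream only ever hits real NALU boundaries.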
------------------------------------------------
Process two: RTP packetization of NALUs
------------------------------------------------
One, NALUs are packetized into RTP in three ways:
1. Single NAL unit mode
That is, an RTP packet contains exactly one complete NALU. In this case, the Type field of the RTP NAL header is the same as the Type field of the original H.264 NALU header.
2. Aggregation packet mode
Here one RTP packet may consist of multiple NAL units. There are four aggregation types: STAP-A, STAP-B, MTAP16, and MTAP24.
Their type values are 24, 25, 26, and 27, respectively.
3. Fragmentation packet mode
Used to split one NALU across multiple RTP packets. There are two types, FU-A and FU-B, with type values 28 and 29, respectively.
Recall the earlier definition of nal_unit_type: values 0~23 belong to H.264 itself, while 24~31 are left unspecified by H.264. In RTP packetization, if one NALU is placed in one RTP packet, the NALU's own nal_unit_type can be used. However, when multiple NALUs are packed into a single RTP packet, or one NALU is split across multiple RTP packets, a new type must be defined to identify the packet.
Type   Packet     Type name
---------------------------------------------------------
0      undefined  -
1-23   NAL unit   Single NAL unit packet per H.264
24     STAP-A     Single-time aggregation packet
25     STAP-B     Single-time aggregation packet
26     MTAP16     Multi-time aggregation packet
27     MTAP24     Multi-time aggregation packet
28     FU-A       Fragmentation unit
29     FU-B       Fragmentation unit
30-31  undefined  -
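As a rough sketch of how a receiver might use this table, the first payload byte can be split into F, NRI and Type, and the Type value mapped to a packet kind. The function names parse_nal_header and rtp_packet_kind are invented for this illustration.

```python
def parse_nal_header(byte: int):
    """Split the one-byte NAL header into (F, NRI, Type)."""
    f = (byte >> 7) & 0x01    # forbidden_zero_bit, 1 bit
    nri = (byte >> 5) & 0x03  # nal_ref_idc, 2 bits
    typ = byte & 0x1F         # nal_unit_type, 5 bits
    return f, nri, typ


def rtp_packet_kind(typ: int) -> str:
    """Map the Type value to the packet names used in the table above."""
    if 1 <= typ <= 23:
        return "single NAL unit"
    names = {24: "STAP-A", 25: "STAP-B", 26: "MTAP16",
             27: "MTAP24", 28: "FU-A", 29: "FU-B"}
    return names.get(typ, "undefined")
```

For example, the header byte 0x67 splits into F=0, NRI=3, Type=7, which falls in the single NAL unit range, while 0x7C carries Type=28 and is therefore an FU-A packet.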
Two, the specific formats of the three packetization modes
1. Single NAL unit mode
For NALUs shorter than the MTU size, single NAL unit mode is generally used.
A raw H.264 NALU usually consists of three parts, [Start Code] [NALU Header] [NALU Payload], where the start code marks the beginning of a NALU and must be "00 00 00 01" or "00 00 01". The NALU header is only one byte, and it is followed by the NALU content.
When packetizing, strip the start code "00 00 01" or "00 00 00 01" and place the remaining data into the RTP packet.
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|F|NRI|  Type   |                                               |
+-+-+-+-+-+-+-+-+                                               |
|                                                               |
|               Bytes 2..n of a single NAL unit                 |
|                                                               |
|                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                               :...OPTIONAL RTP padding        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Suppose there is an H.264 NALU as follows:
[00 00 00 01 67 42 A0 1E 0E 2F ...]
This is a sequence parameter set NAL unit. [00 00 00 01] is the four-byte start code, 67 is the NALU header, and the NALU content begins at 42.
Packetized into RTP it becomes:
[RTP Header] [67 42 A0 1E 0E 2F]
That is, only the 4-byte start code is removed.
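Stripping the start codes can be sketched as below. This is a simplified illustration that assumes the whole Annex B byte stream is already in memory; split_annexb is a name made up for this example. Each returned element starts with the NALU header byte and could be used directly as the payload of a single NAL unit packet.

```python
START_CODE = b"\x00\x00\x01"


def split_annexb(stream: bytes) -> list:
    """Split an Annex B byte stream into NALUs with start codes removed."""
    nalus = []
    pos = stream.find(START_CODE)
    while pos != -1:
        start = pos + len(START_CODE)
        nxt = stream.find(START_CODE, start)
        end = len(stream) if nxt == -1 else nxt
        # A 4-byte start code leaves one extra 0x00 at the end of the
        # previous chunk; trailing zero bytes are not part of the NALU.
        nalu = stream[start:end].rstrip(b"\x00")
        if nalu:
            nalus.append(nalu)
        pos = nxt
    return nalus
```

Searching only for the 3-byte pattern covers both start code forms, because the 4-byte form is the 3-byte form with one leading zero byte.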
2. Aggregation packet mode
Second, when NALUs are particularly small, several NALUs can be packed into one RTP packet.
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                          RTP Header                           |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|STAP-A NAL HDR |         NALU 1 Size           | NALU 1 HDR    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                         NALU 1 Data                           |
:                                                               :
+               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|               | NALU 2 Size                   | NALU 2 HDR    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                         NALU 2 Data                           |
:                                                               :
|                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                               :...OPTIONAL RTP padding        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
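The STAP-A layout in the figure can be sketched as a pair of helper functions. This is a hedged illustration only: pack_stap_a and unpack_stap_a are hypothetical names, the RTP header itself is left out, and the sketch sets the STAP-A header byte to Type 24 with the F bits OR-ed together and the largest NRI of the aggregated NALUs.

```python
import struct


def pack_stap_a(nalus) -> bytes:
    """Build a STAP-A payload: 1-byte header, then (16-bit size, NALU) pairs."""
    f = 0
    nri = 0
    for n in nalus:
        f |= n[0] & 0x80            # F is set if any aggregated NALU has it set
        nri = max(nri, n[0] & 0x60) # NRI is the maximum over all NALUs
    payload = bytearray([f | nri | 24])
    for n in nalus:
        payload += struct.pack("!H", len(n))  # size in network byte order
        payload += n
    return bytes(payload)


def unpack_stap_a(payload: bytes) -> list:
    """Recover the NALUs from a STAP-A payload."""
    if payload[0] & 0x1F != 24:
        raise ValueError("not a STAP-A payload")
    nalus = []
    pos = 1
    while pos + 2 <= len(payload):
        (size,) = struct.unpack_from("!H", payload, pos)
        pos += 2
        if size == 0 or pos + size > len(payload):
            raise ValueError("bad NALU size in STAP-A payload")
        nalus.append(payload[pos:pos + size])
        pos += size
    return nalus
```

A typical use would be to carry a small SPS and PPS together in one packet; the receiver then feeds each recovered NALU to the decoder as if it had arrived in single NAL unit mode.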
3. Fragmentation Units (FUs)
When the length of a NALU exceeds the MTU, the NALU must be split into several packets. These are called Fragmentation Units (FUs).
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| FU indicator  |   FU header   |                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               |
|                                                               |
|                         FU payload                            |
|                                                               |
|                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                               :...OPTIONAL RTP padding        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Figure 14. RTP Payload format for FU-A
The FU indicator has the following format:
+---------------+
|0|1|2|3|4|5|6|7|
+-+-+-+-+-+-+-+-+
|F|NRI|  Type   |
+---------------+
In the FU indicator byte, Type=28 denotes FU-A. The value of the NRI field must be set according to the NRI field of the fragmented NAL unit.
The format of the FU header is as follows:
+---------------+
|0|1|2|3|4|5|6|7|
+-+-+-+-+-+-+-+-+
|S|E|R|  Type   |
+---------------+
S: 1 bit
When set to 1, the start bit indicates the start of a fragmented NAL unit. When the following FU payload is not the start of the fragmented NAL unit payload, the start bit is set to 0.
E: 1 bit
When set to 1, the end bit indicates the end of a fragmented NAL unit, that is, the last byte of the payload is also the last byte of the fragmented NAL unit. When the following FU payload is not the last fragment of the fragmented NAL unit, the end bit is set to 0.
R: 1 bit
The reserved bit must be set to 0, and the receiver must ignore it.
Type: 5 bits
The NAL unit type of the fragmented NAL unit, i.e. the last five bits of the original NAL header.
Three, fragmenting and reassembling
Fragmenting: when the sender splits an original NAL unit into FU-A fragments, the original NAL unit header and the FU-A headers of the fragments are related as follows:
The first three bits of the original NAL header become the first three bits of the FU indicator, and the last five bits of the original NAL header become the last five bits of the FU header; the remaining bits of the FU indicator and FU header are set according to the actual situation.
Reassembling: when the receiver gets FU-A fragments, it must combine all the fragments back into the original NAL unit. The FU-A headers and the restored NAL header are related as follows:
The eight bits of the restored NAL header are made up of the first three bits of the FU indicator plus the last five bits of the FU header, namely:
nal_unit_header = (fu_indicator & 0xe0) | (fu_header & 0x1f)
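The header relationship in both directions can be sketched like this. It is a minimal illustration under stated assumptions: fragment_fu_a and reassemble_fu_a are hypothetical names, max_payload stands for the RTP payload budget per packet, RTP headers and sequence numbers are left out, and the fragments are assumed to arrive complete and in order.

```python
def fragment_fu_a(nalu: bytes, max_payload: int) -> list:
    """Split one NALU into FU-A payloads of at most max_payload bytes each."""
    if len(nalu) <= max_payload:
        raise ValueError("NALU fits in a single NAL unit packet")
    chunk = max_payload - 2               # room left after the two FU bytes
    if chunk <= 0:
        raise ValueError("max_payload too small")
    header = nalu[0]
    fu_indicator = (header & 0xE0) | 28   # F and NRI from the NALU, Type = 28
    nal_type = header & 0x1F
    body = nalu[1:]                       # the NALU header byte is not repeated
    packets = []
    for off in range(0, len(body), chunk):
        s = 0x80 if off == 0 else 0                  # start bit
        e = 0x40 if off + chunk >= len(body) else 0  # end bit
        fu_header = s | e | nal_type                 # R bit stays 0
        packets.append(bytes([fu_indicator, fu_header]) + body[off:off + chunk])
    return packets


def reassemble_fu_a(packets) -> bytes:
    """Rebuild the original NALU from in-order FU-A payloads."""
    first, last = packets[0], packets[-1]
    if not first[1] & 0x80 or not last[1] & 0x40:
        raise ValueError("missing start or end fragment")
    header = (first[0] & 0xE0) | (first[1] & 0x1F)
    return bytes([header]) + b"".join(p[2:] for p in packets)
```

A real receiver would also check RTP sequence numbers for gaps before reassembling; that bookkeeping is outside this sketch.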
Hierarchical structure of H.264 syntax elements
In the bitstream output by the encoder, every bit belongs to some syntax element. Syntax elements are organized into a hierarchical structure, with each level describing its own layer of information.
In H.264, the syntax elements are organized into five levels: sequence, picture, slice, macroblock, and sub-macroblock. In a strictly layered structure, the header of each layer and its data part form a strong dependency, that of manager and managed: the syntax elements in the header are the core of that layer's data, and once the header is lost, the data part can almost never be decoded correctly, especially at the sequence layer and picture layer.
The biggest difference in H.264's hierarchical structure is that the sequence layer and the picture layer are eliminated: most of the syntax elements that originally belonged to the sequence and picture headers are separated out to form two levels of parameter sets, the sequence parameter set and the picture parameter set, and the rest are placed in the slice layer.
A parameter set is an independent data unit that does not depend on any syntax element outside the parameter set. A parameter set does not correspond to one particular picture or sequence: the same sequence parameter set can be referenced by multiple picture parameter sets, and likewise the same picture parameter set can be referenced by multiple pictures. A new parameter set is emitted only when the encoder considers it necessary to update the contents of a parameter set.
Data units that may appear in the bitstream in a complex communication scenario:
IDR: In H.264, pictures are organized into sequences, and the first picture of a sequence is called an IDR picture (Instantaneous Decoding Refresh picture). IDR pictures are all I-frame pictures. The IDR picture was introduced to let decoding resynchronize: when the decoder reaches an IDR picture, it immediately empties the reference frame queue, outputs or discards all decoded data, looks up the parameter sets again, and starts a new sequence. Thus, if a serious error occurred in the previous sequence, there is a chance to resynchronize here. Pictures after an IDR picture are never decoded using data from pictures before the IDR picture. An IDR picture must be an I picture, but an I picture is not necessarily an IDR picture. A picture after an ordinary I-frame may still use pictures before that I-frame as motion references.
H.264 bitstream structure
1. H.264 layered structure
The bitstream structure defined by H.263 is hierarchical, with four layers in total. From top to bottom they are: the picture layer, the group-of-blocks layer (GOB layer), the macroblock layer, and the block layer. Compared with H.263, the bitstream structure of H.264 is very different: it is no longer a strict hierarchy. Functionally, H.264 is divided into two layers, the Video Coding Layer (VCL) and the Network Abstraction Layer (NAL). VCL data is the sequence of compressed, encoded video data. VCL data can be transmitted or stored only after it has been encapsulated in NAL units. The NAL unit format [2] is shown in Table 1:
Table 1. NAL unit format
| NAL header | RBSP | NAL header | RBSP |
RBSP: The data encapsulated in a NAL unit is called the Raw Byte Sequence Payload (RBSP); it is the basic transmission unit of the NAL. RBSPs are divided into video coding data and control data. The basic structure is the original encoded data followed by trailing bits: one "1" bit and then zero or more "0" bits, for byte alignment.
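As a small illustration of the trailing bits, the sketch below takes the SODB as a string of "0"/"1" characters (a simplification chosen for readability, not how an encoder stores bits) and appends the stop bit and alignment zeros. The name sodb_to_rbsp is made up for this example.

```python
def sodb_to_rbsp(bits: str) -> bytes:
    """Append the stop bit '1' and pad with '0' bits to a byte boundary."""
    bits = bits + "1"                # rbsp_stop_one_bit
    bits += "0" * (-len(bits) % 8)   # alignment zero bits, 0 to 7 of them
    return int(bits, 2).to_bytes(len(bits) // 8, "big")
```

For instance, the 5-bit SODB "10110" becomes the single byte 0xB4 (10110 + 1 + 00), and an SODB that already fills whole bytes still gains one extra byte, 0x80, because the stop bit is always present.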
Types of RBSP:
One RBSP type is the parameter set (PS), which includes the sequence parameter set (SPS) and the picture parameter set (PPS).
The SPS contains the parameters for a continuous coded video sequence, such as the identifier seq_parameter_set_id, constraints on frame number and POC, the number of reference frames, the decoded picture size, and the frame/field coding mode selection flag.
The PPS applies to one picture or several pictures in a sequence; its parameters include the identifier pic_parameter_set_id, the selected seq_parameter_set_id, the entropy coding mode selection flag, the number of slice groups, the initial quantization parameter, and the deblocking filter coefficient adjustment flag.
NALU type
Identifies the type of RBSP data in the NAL unit. NAL units whose nal_unit_type is 1, 2, 3, 4, or 5 are called VCL NAL units; all other types of NAL units are non-VCL NAL units.
0: Unspecified
1: Coded slice of a non-IDR picture, without data partitioning
2: Coded slice data partition A of a non-IDR picture
3: Coded slice data partition B of a non-IDR picture
4: Coded slice data partition C of a non-IDR picture
5: Coded slice of an IDR picture
6: Supplemental enhancement information (SEI)
7: Sequence parameter set
8: Picture parameter set
9: Access unit delimiter
10: End of sequence
11: End of stream
12: Filler data
13–23: Reserved
24–31: Unspecified
2. Bitstream structure diagram
Based on a review of the relevant material, the bitstream structure of H.264 [2] is summarized in Figure 1:
Figure 1. The bitstream structure of H.264
3. Application of H.264 bitstream analysis
In some cases, information such as the picture width and height can be obtained directly from an H.264 stream. The following describes how. Picture information is stored in the RBSP structure of the Network Abstraction Layer (NAL), so to obtain it we need to extract the relevant bits. According to the RBSP structure, read the two values pic_width_in_mbs_minus1 and pic_height_in_map_units_minus1; the width is then (pic_width_in_mbs_minus1 + 1) * 16 and the height is (pic_height_in_map_units_minus1 + 1) * 16. In some cases the value of num_ref_frames, typically 1, also has to be considered.
3.1 Obtaining test data
Device: Sunnic (IP Cam)
Name: St100
Factory firmware version: P8B8
Video format: H.264
(1) With the device resolution set to 176*144, a packet was captured with a capture tool such as Ethereal; after removing the RTP header, the data is 0x00,0x00,0x00,0x01,0x67,0x42,0x00,0x1e,0x99,0xa0,0xb1,0x31.
(2) With the device resolution set to 720*240, a packet was captured the same way; after removing the RTP header, the data is 0x00,0x00,0x00,0x01,0x67,0x42,0xe0,0x1e,0xda,0x82,0xd1,0xf1.
(3) With the device resolution set to 720*480, a packet was captured the same way; after removing the RTP header, the data is 0x00,0x00,0x00,0x01,0x67,0x42,0xe0,0x1e,0xdb,0x82,0xd1,0xf1.
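To make the bit extraction concrete, here is a minimal sketch that reads the fields leading up to the two size values from an SPS. It rests on several assumptions: BitReader and sps_dimensions are names invented for this example, the input is the SPS NALU with the start code already stripped and no emulation prevention bytes present, and only profile_idc 66, 77 and 88 with pic_order_cnt_type 0 or 2 are handled (other cases carry extra fields that this sketch does not parse).

```python
class BitReader:
    """Reads bits MSB-first from a bytes object."""

    def __init__(self, data: bytes):
        self.data = data
        self.pos = 0  # current bit position

    def u(self, n: int) -> int:
        """Unsigned integer of n bits."""
        value = 0
        for _ in range(n):
            byte = self.data[self.pos >> 3]
            value = (value << 1) | ((byte >> (7 - (self.pos & 7))) & 1)
            self.pos += 1
        return value

    def ue(self) -> int:
        """Unsigned Exp-Golomb code, ue(v)."""
        zeros = 0
        while self.u(1) == 0:
            zeros += 1
        return (1 << zeros) - 1 + self.u(zeros)


def sps_dimensions(nalu: bytes):
    """Return (width, height, num_ref_frames) using the formulas in the text."""
    if nalu[0] & 0x1F != 7:
        raise ValueError("not an SPS NAL unit")
    if nalu[1] not in (66, 77, 88):
        raise ValueError("profile not handled by this sketch")
    # Skip NAL header, profile_idc, constraint flags and level_idc (4 bytes).
    r = BitReader(nalu[4:])
    r.ue()                      # seq_parameter_set_id
    r.ue()                      # log2_max_frame_num_minus4
    poc_type = r.ue()           # pic_order_cnt_type
    if poc_type == 0:
        r.ue()                  # log2_max_pic_order_cnt_lsb_minus4
    elif poc_type == 1:
        raise ValueError("pic_order_cnt_type 1 not handled by this sketch")
    num_ref_frames = r.ue()
    r.u(1)                      # gaps_in_frame_num_value_allowed_flag
    width = (r.ue() + 1) * 16   # pic_width_in_mbs_minus1
    height = (r.ue() + 1) * 16  # pic_height_in_map_units_minus1
    return width, height, num_ref_frames
```

Fed the three captures listed above (without their 00 00 00 01 start codes), this sketch returns (176, 144, 1), (720, 240, 1) and (720, 240, 2). The third capture differs from the second only in num_ref_frames; how that relates to the 720*480 setting is not something this sketch resolves.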
The bitstream and frame structure of H.264