Multimedia Development --- h264 format slice_header

Last Update:2014-08-29 Source: Internet

Author: User

Learn H.264 from slice_header

Preface:

$ I studied H.264 by reading the standard alongside Bi Houjie's book. The syntax and semantics are a headache: there are many elements and they demand patience. When the standard describes the semantics of one element, it often brings in another related variable that was defined somewhere earlier. Nobody can remember that many variables at once, so you end up paging back to find the definition and reading it again. There is no way around this: a structure as large as H.264 is tightly interlocked, so grasping its main details and how they relate is hard at the beginning, and a page full of unfamiliar variables is bound to be overwhelming. For that reason, whenever I introduce a syntax element in these notes I also explain every other variable it involves and record which structure that variable appears in, which makes the notes easier to reread later.

$ When you first look at the sequence parameter set and the picture parameter set, many of their elements will not make sense right away. That is because you have not yet met the details they depend on; those elements are mentioned and explained later, in the discussion of the slice header syntax elements, where we go into the details step by step. Before that, though, you should skim both parameter sets: at the very least understand what they are for, and try to understand the elements that do not depend on details.

$ For understanding both the overall structure of H.264 and some of its important details, I think slice_header is a good starting point. This series of notes is organized around the slice header structure and will gradually cover a lot of ground. When the standard describes a detailed operation, it usually gives a block of pseudocode for the calculation without a plain-language description, which makes it hard to follow. The illustrations in Bi Houjie's book help a lot, since a picture is much easier to read, but the book omits some details (presumably leaving readers to consult the standard), and some of its harder passages are copied straight from the standard. In these notes I restate the parts that confused me at first according to my current understanding. Where necessary I also copy the standard's pseudocode, but before doing so I explain in plain language what the pseudocode does. I try to describe every detail involved.

$ My own feeling is that, when starting out, you can postpone redundant pictures, deblocking filtering, and similar topics; a rough idea of them is enough. Otherwise, the less you understand, the more frustrated you get and the less you learn. If data partitioning is unclear, set it aside as well. It will probably be much easier to come back to once you are familiar with H.264. Multiple slice groups are covered at the end of this series, so when slice groups come up along the way you can put your questions aside for now.

I. Main Elements of the Slice Header

First, a general rule for slice_header: the syntax elements pic_parameter_set_id, frame_num, field_pic_flag, bottom_field_flag, idr_pic_id, pic_order_cnt_lsb, delta_pic_order_cnt_bottom, delta_pic_order_cnt[0], delta_pic_order_cnt[1], sp_for_switch_flag, and slice_group_change_cycle must have the same value in all slice headers of a picture. Each element is described below.

$ slice_type: straightforward, skipped.

$ pic_parameter_set_id: straightforward, skipped.

$ field_pic_flag, in the slice header, specifies whether the current picture is frame coded (0) or field coded (1). It must have the same value in all slices of the same picture. It is present in the bitstream only when frame_mbs_only_flag in the sequence parameter set is 0.

The syntax elements frame_mbs_only_flag and mb_adaptive_frame_field_flag in the sequence parameter set are used together with field_pic_flag to determine the coding mode of the picture.

frame_mbs_only_flag	mb_adaptive_frame_field_flag	field_pic_flag	Mode
1	Not present in the bitstream	Not present in the bitstream	Frame coding
0	0	0	Frame coding
0	0	1	Field coding
0	1	0	Frame/field adaptive coding (only in this case is MbaffFrameFlag = 1; in all other cases MbaffFrameFlag is 0)
0	1	1	Field coding
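
The table can also be read as a small decision function. The following is only an illustrative sketch: the function name picture_coding_mode, the mode strings, and the use of None for an element that is not present in the bitstream are assumptions made here, not names from the standard.

```python
def picture_coding_mode(frame_mbs_only_flag,
                        mb_adaptive_frame_field_flag=None,
                        field_pic_flag=None):
    """Return (mode, MbaffFrameFlag) according to the table above.

    None stands for "not present in the bitstream" (an assumption of
    this sketch); an absent flag is treated like 0.
    """
    if frame_mbs_only_flag == 1:
        # The other two flags are not transmitted at all.
        return "frame", 0
    if field_pic_flag == 1:
        # Field picture, whatever mb_adaptive_frame_field_flag says.
        return "field", 0
    if mb_adaptive_frame_field_flag == 1:
        # Frame picture with frame/field adaptive macroblock pairs.
        return "frame/field adaptive", 1
    return "frame", 0
```

Each of the five rows of the table corresponds to one call, for example picture_coding_mode(0, 1, 0) for the frame/field adaptive row.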

$ first_mb_in_slice specifies the address of the first macroblock in the slice.

(For the value of MbaffFrameFlag, see the table above.)

If MbaffFrameFlag is 0, first_mb_in_slice is the address of the first macroblock in the slice, and its value must be in the range 0 to PicSizeInMbs - 1, inclusive.

Otherwise, first_mb_in_slice * 2 is the address of the first macroblock in the slice, which is the top macroblock of the first macroblock pair in the slice, and the value of first_mb_in_slice must be in the range 0 to PicSizeInMbs / 2 - 1, inclusive.

Here MbaffFrameFlag depends on mb_adaptive_frame_field_flag in the sequence parameter set (1 means frame/field adaptive coding is used, 0 means it is not), as shown in the table above. PicSizeInMbs (the picture size in macroblocks) is derived from pic_width_in_mbs_minus1, pic_height_in_map_units_minus1, and other elements of the sequence parameter set. It is not described in detail here because it involves the mapping between map units and macroblocks, which is covered later.
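
The two cases above can be sketched as a single helper. The function name first_mb_address and the choice to raise ValueError on an out-of-range value are assumptions of this sketch; a real decoder would report the error in its own way.

```python
def first_mb_address(first_mb_in_slice, mbaff_frame_flag, pic_size_in_mbs):
    """Address of the first macroblock of the slice, with the range check."""
    if mbaff_frame_flag == 0:
        upper = pic_size_in_mbs - 1
        scale = 1
    else:
        # Macroblocks are addressed in pairs; the result is the address
        # of the top macroblock of the first pair in the slice.
        upper = pic_size_in_mbs // 2 - 1
        scale = 2
    if not 0 <= first_mb_in_slice <= upper:
        raise ValueError("first_mb_in_slice out of range")
    return first_mb_in_slice * scale
```

For a picture of 100 macroblocks, first_mb_in_slice = 5 gives address 5 without frame/field adaptivity and address 10 with it.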

$ bottom_field_flag specifies whether the current field is the top field or the bottom field. A value of 1 means the current picture is a bottom field; 0 means it is a top field. This element is present in the bitstream only when field_pic_flag is present and equal to 1 (the current slice belongs to a field picture).

$ frame_num and PicNum (PicNum is not a slice header element)

For a non-reference picture, the value of frame_num has no meaning of its own during decoding, because frame_num belongs to reference pictures: its main purpose is to identify a picture when other pictures refer to it for motion compensation. H.264 nevertheless keeps this syntax element in non-reference pictures, because the second and third POC decoding methods compute the POC value of a non-reference picture from its frame_num.

frame_num is a frame number. In field mode, therefore, the top field and the bottom field of the same field pair have the same frame_num.

frame_num identifies a reference frame, but the decoder does not refer to pictures by frame_num directly. It uses the variable PicNum, which is computed from frame_num. MaxPicNum denotes the maximum value of PicNum: in field mode MaxPicNum = 2 * MaxFrameNum, otherwise MaxPicNum = MaxFrameNum. MaxFrameNum is determined by log2_max_frame_num_minus4 in the sequence parameter set. Like frame_num, PicNum wraps around: after reaching the maximum it starts again from 0.

CurrPicNum is the PicNum of the current picture. In the PicNum calculation, it is computed directly from frame_num:

- If field_pic_flag = 0, CurrPicNum = frame_num.

- Otherwise, CurrPicNum = 2 * frame_num + 1.
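
As a rough illustration of these relations, the sketch below computes MaxFrameNum, MaxPicNum, and CurrPicNum. The function names are made up for this example. The text above only says that MaxFrameNum is determined by log2_max_frame_num_minus4; the exact relation used here, MaxFrameNum = 2^(log2_max_frame_num_minus4 + 4), is taken from the standard's semantics of that element.

```python
def max_frame_num(log2_max_frame_num_minus4):
    # MaxFrameNum = 2 ** (log2_max_frame_num_minus4 + 4)
    return 1 << (log2_max_frame_num_minus4 + 4)


def max_pic_num(field_pic_flag, log2_max_frame_num_minus4):
    # Twice as many picture numbers are needed when coding fields.
    m = max_frame_num(log2_max_frame_num_minus4)
    return 2 * m if field_pic_flag else m


def curr_pic_num(field_pic_flag, frame_num):
    # PicNum of the current picture, directly from frame_num.
    return 2 * frame_num + 1 if field_pic_flag else frame_num
```

With log2_max_frame_num_minus4 = 0, frame_num runs over 16 values, and a field picture with frame_num = 7 has CurrPicNum = 15.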

$ When gaps_in_frame_num_value_allowed_flag in the sequence parameter set is 0, the frame_num values of reference frames are consecutive. When it is 1, the encoder may discard some coded pictures, for example when the network is congested, without notifying the decoder separately. In that case the decoder needs a mechanism to fill in the missing frame_num values and the corresponding pictures; otherwise, a later picture whose motion vectors point to a missing picture will cause a decoding error.

$ idr_pic_id identifies an IDR picture. Different IDR pictures have different idr_pic_id values. In field mode, the two fields of an IDR picture have the same idr_pic_id. The value range of idr_pic_id is [0,]; when the value goes beyond this range, counting wraps around and starts again.

$ POC:

POC refers to pic_order_cnt (picture order count) and indicates the display order of pictures. There are three ways to compute POC; which one is used is specified by pic_order_cnt_type in the sequence parameter set. In the first method the POC value is transmitted explicitly, while the other two derive it from frame_num. How POC is computed under each method is described in subclause 8.2.1, the decoding process for picture order count, of the 2005/03 edition of the standard.

pic_order_cnt_lsb: present in the bitstream when pic_order_cnt_type in the sequence parameter set is 0. In the first POC method, this is the element that "explicitly transmits the POC value"; more precisely, it carries the LSBs of the POC value (see subclause 8.2.1.1 of the standard). The element log2_max_pic_order_cnt_lsb_minus4 in the sequence parameter set determines the number of bits used to code pic_order_cnt_lsb.

delta_pic_order_cnt_bottom: used in the first POC method. When frame_mbs_only_flag in the sequence parameter set is not 1 (the sequence may contain both field pictures and frame pictures), the two fields of a frame picture or a frame/field adaptive picture also need POC values of their own (so that later field pictures can use them as reference pictures). This element derives a POC value from the decoded POC of the frame or frame/field adaptive picture and assigns it to the bottom field. It is present when all of the following hold: pic_order_cnt_type in the sequence parameter set is 0 (POC is computed with the first method), pic_order_present_flag in the picture parameter set is 1 (indicating that picture order count related syntax elements appear in the slice header), and field_pic_flag in the slice header is present and equal to 0.

delta_pic_order_cnt[0] and delta_pic_order_cnt[1]: used in the second POC method. (They cannot belong to the third method: in the standard's syntax table the two elements appear only when pic_order_cnt_type = 1, that is, when the second method is used. Under the third method they are not present, so there is nothing for that method to use.) Both the second and the third POC methods derive POC by a mapping from frame_num. delta_pic_order_cnt[0] is present when delta_pic_order_always_zero_flag in the sequence parameter set is 0 (a value of 1 means that delta_pic_order_cnt[0] and delta_pic_order_cnt[1] do not appear in the slice headers of the sequence and both default to 0) and pic_order_cnt_type = 1 (the second POC method). delta_pic_order_cnt[1] is present when, in addition to the conditions for delta_pic_order_cnt[0], pic_order_present_flag in the picture parameter set is 1 (indicating that picture order count related syntax elements appear in the slice header) and field_pic_flag in the slice header is present and equal to 0.

Note: as mentioned above, pic_order_present_flag equal to 1 in the picture parameter set means that "picture order count related syntax elements appear in the slice header". This flag does not govern all POC-related syntax elements, however. It controls only delta_pic_order_cnt_bottom and delta_pic_order_cnt[1] in the slice header; pic_order_cnt_lsb and delta_pic_order_cnt[0] are not affected by it.
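
The presence conditions of the four POC-related slice header elements can be summarized in one function. This is a sketch under assumptions: the function name poc_elements_present is invented for illustration, and the parameter delta_pic_order_always_zero_flag stands for the sequence parameter set flag whose value 1 means that delta_pic_order_cnt[0] and delta_pic_order_cnt[1] are absent and default to 0.

```python
def poc_elements_present(pic_order_cnt_type, pic_order_present_flag,
                         field_pic_flag, delta_pic_order_always_zero_flag=0):
    """List the POC-related elements that appear in a slice header."""
    present = []
    frame_picture = field_pic_flag == 0
    if pic_order_cnt_type == 0:
        # First POC method: the LSBs are always sent explicitly.
        present.append("pic_order_cnt_lsb")
        if pic_order_present_flag == 1 and frame_picture:
            present.append("delta_pic_order_cnt_bottom")
    elif pic_order_cnt_type == 1 and delta_pic_order_always_zero_flag == 0:
        # Second POC method: deltas on top of the frame_num mapping.
        present.append("delta_pic_order_cnt[0]")
        if pic_order_present_flag == 1 and frame_picture:
            present.append("delta_pic_order_cnt[1]")
    # Third POC method (type 2): nothing is sent in the slice header.
    return present
```

Note how pic_order_present_flag only ever adds the second element of each pair, which is the point made in the note above.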

$ redundant_pic_cnt: must be 0 for slices and slice data partitions that belong to a primary coded picture, and greater than 0 for coded slices and coded slice data partitions of a redundant coded picture. When redundant_pic_cnt is not present in the bitstream, its value is inferred to be 0. Its value must be in the range 0 to 127. Every redundant coded picture has a corresponding primary coded picture. For a coded slice (or data partition) of a redundant coded picture, the picture parameter set selected by pic_parameter_set_id must have the same pic_order_present_flag value as the picture parameter set referred to by the coded slices of the corresponding primary coded picture. The standard spends nearly a page on this element and there is a lot to it. My advice is not to read the redundant-picture material at first: the more you read, the more confusing it gets. For now it is enough to know that such a thing exists; it will probably be much easier to come back to once you are familiar with H.264.

Presence condition: redundant_pic_cnt_present_flag in the picture parameter set is 1, which specifies that redundant_pic_cnt is present in all slice headers and in the data partitions B and C that refer to the picture parameter set (directly, or through the corresponding data partition A). With data partitioning, when this condition holds, redundant_pic_cnt appears not only in partition A (not directly, but inside the slice header structure carried by partition A) but also, directly, in the corresponding partitions B and C.

$ direct_spatial_mv_pred_flag specifies whether temporal or spatial prediction is used in the direct prediction mode of B pictures. 1: spatial prediction (the luma motion vectors for B_Skip, B_Direct_16x16, and B_Direct_8x8 are derived with spatial direct mode prediction); 0: temporal prediction (the luma motion vectors for B_Skip, B_Direct_16x16, and B_Direct_8x8 are derived with temporal direct mode prediction).

Presence condition: slice_type = B in the slice header, that is, the current slice is a B slice.

$ The number of reference frames in list0 and list1.

The number of reference frames is given by num_ref_idx_l0_active_minus1 and num_ref_idx_l1_active_minus1. Default values for both are carried in the picture parameter set (where the standard names them num_ref_idx_l0_default_active_minus1 and num_ref_idx_l1_default_active_minus1). The list 0 default specifies the maximum reference index of reference picture list 0 used to decode every slice of the picture that uses list 0 prediction when num_ref_idx_active_override_flag is 0. When MbaffFrameFlag is 1 (frame/field adaptive), num_ref_idx_l0_active_minus1 is the maximum index for decoding frame macroblocks, and 2 * num_ref_idx_l0_active_minus1 + 1 is the maximum index for decoding field macroblocks. (In frame/field adaptive mode, whether a macroblock is a frame macroblock or a field macroblock is specified by mb_field_decoding_flag in the slice_data syntax, so each macroblock pair can choose its own frame/field mode.) The value of num_ref_idx_l0_active_minus1 must be in the range 0 to 31. num_ref_idx_l1_active_minus1 has the same meaning and rules; it gives the maximum reference index in list1.

To give individual pictures more flexibility, these two values can be overridden in the slice header.

num_ref_idx_active_override_flag: this slice header element indicates whether the two values are overridden. If they are, num_ref_idx_l0_active_minus1 and num_ref_idx_l1_active_minus1 appear in the slice header and replace the values from the picture parameter set.
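
The override rule and the doubling for field macroblocks can be sketched as follows. The function name max_ref_idx and its parameter names are hypothetical; pps_default_minus1 stands for the value carried in the picture parameter set and slice_minus1 for the value carried in the slice header when the override flag is set.

```python
def max_ref_idx(pps_default_minus1, override_flag, slice_minus1=None,
                field_macroblock_in_mbaff=False):
    """Largest usable reference index for one list (list0 or list1)."""
    # The slice header value wins only when the override flag is set.
    active_minus1 = slice_minus1 if override_flag else pps_default_minus1
    if field_macroblock_in_mbaff:
        # Field macroblocks in a frame/field adaptive picture see each
        # reference frame as two fields, so the index range doubles.
        return 2 * active_minus1 + 1
    return active_minus1
```

For example, with a picture parameter set default of 3, a slice that overrides it with 1 may use indices 0 and 1 only, while a field macroblock under the default may use indices 0 to 7.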

$ ref_pic_list_reordering(): reference picture list reordering. This syntax structure is nested in the slice header as a sub-structure.

$ pred_weight_table(): the prediction weight table. This syntax structure is nested in the slice header as a sub-structure. It is present under the following conditions:

1. If the current slice is a P or SP slice, that is, slice_type == P || slice_type == SP:

If weighted_pred_flag in the picture parameter set is 1 (weighted prediction is used in P and SP slices), pred_weight_table() is present.

2. If the current slice is a B slice, that is, slice_type == B:

If weighted_bipred_idc in the picture parameter set is 1, pred_weight_table() is present. weighted_bipred_idc equal to 0 means B slices use the default weighted prediction; equal to 1 means B slices use explicit weighted prediction, and this is the only case in which pred_weight_table() is present; equal to 2 means B slices use implicit weighted prediction. (Open question: "default" and "implicit" sound as if they mean the same thing; what is the difference?) The value of weighted_bipred_idc must be between 0 and 2, inclusive.
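
The two cases reduce to a short predicate. In this sketch the function name pred_weight_table_present and the representation of slice_type as a string ("P", "SP", "B", "I", "SI") are assumptions chosen for readability.

```python
def pred_weight_table_present(slice_type, weighted_pred_flag,
                              weighted_bipred_idc):
    """True if pred_weight_table() appears in this slice header."""
    if slice_type in ("P", "SP"):
        return weighted_pred_flag == 1
    if slice_type == "B":
        # Only idc == 1 sends a table; idc 0 and 2 send nothing.
        return weighted_bipred_idc == 1
    # I and SI slices use no inter prediction, hence no weight table.
    return False
```

A B slice with weighted_bipred_idc = 2 therefore carries no table even though weighted prediction is in use.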

$ dec_ref_pic_marking(): decoded reference picture marking. This syntax structure is nested in the slice header as a sub-structure. It is present when nal_ref_idc of the NAL unit is not 0. A nonzero nal_ref_idc indicates that the NAL unit contains a sequence parameter set, a picture parameter set, a slice of a reference picture, or a slice data partition of a reference picture. Because this syntax structure sits inside the slice header, the current NAL unit must be a slice or a slice data partition. In other words, the structure is present when the current NAL unit carries a slice or slice data partition of a reference picture.

$ The three sub-structures above are described in detail later.

$ cabac_init_idc specifies the index of the initialization table used when initializing the context variables in CABAC; its value ranges from 0 to 2. (It does not matter if this is unclear for now; it will make sense when you study CABAC.)

Presence condition: entropy_coding_mode_flag in the picture parameter set is 1 (CABAC coding is used), and slice_type != I && slice_type != SI (the current slice is not an I or SI slice).

$ slice_qp_delta specifies the initial value of the quantization parameter QPY used for all macroblocks of the current slice. In ordinary slices (non-SI, non-SP), this parameter quantizes the transform coefficients of the prediction residual.

SliceQPY = 26 + pic_init_qp_minus26 + slice_qp_delta

QPY ranges from 0 to 51 and indicates the quantization step size.

Quantization parameters in H.264 are specified at three levels: the picture parameter set, the slice header, and the macroblock. The first two levels each supply a value, and slice_qp_delta is the slice-level offset.

pic_init_qp_minus26 is in the picture parameter set.
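
A worked version of the formula, as a minimal sketch: the function name slice_qp_y is made up for this example, and raising ValueError for a result outside 0 to 51 is just one possible way to signal a non-conforming stream.

```python
def slice_qp_y(pic_init_qp_minus26, slice_qp_delta):
    """Initial luma quantization parameter of the slice."""
    qp = 26 + pic_init_qp_minus26 + slice_qp_delta
    if not 0 <= qp <= 51:
        raise ValueError("initial QP outside 0..51")
    return qp
```

For instance, pic_init_qp_minus26 = -3 in the picture parameter set and slice_qp_delta = 5 in the slice header give an initial quantization parameter of 26 - 3 + 5 = 28.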

$ slice_qs_delta is semantically similar to slice_qp_delta and is used in SI and SP slices. (In both slice types the predicted values and the actual values are transformed directly and the resulting coefficients are quantized, rather than quantizing the transform coefficients of the residual.)

QSY = 26 + pic_init_qs_minus26 + slice_qs_delta

QSY ranges from 0 to 51.

pic_init_qs_minus26 is in the picture parameter set.

Unlike ordinary slices, SI and SP slices need two quantization parameters for encoding and decoding (the two may of course be equal): SPQP, for the coefficients of the predicted (reconstructed) block, and PQP, for the coefficients of the prediction residual. When the current slice is an SI or SP slice, slice_qs_delta corresponds to SPQP and slice_qp_delta corresponds to PQP.

Presence condition: slice_type == SP || slice_type == SI (the current slice is an SP or SI slice).

$ sp_for_switch_flag indicates whether the P macroblocks in an SP slice are decoded in switching mode. What is switching mode? I do not understand it yet either. I am noting the question here and will return to it when studying SP slices; there is no need to worry about it now.

Presence condition: slice_type == SP (the current slice is an SP slice).

$ slice_group_change_cycle: together with slice_group_change_rate_minus1, it gives the number of map units in slice group 0.

The exact meaning of a map unit is covered later, under FMO; for now you can skip ahead to the definition of map units.

The number of map units in slice group 0 is given by the following formula.

MapUnitsInSliceGroup0 = Min(slice_group_change_cycle * SliceGroupChangeRate, PicSizeInMapUnits)

slice_group_change_cycle is coded with Ceil(Log2(PicSizeInMapUnits ÷ SliceGroupChangeRate + 1)) bits. Its value ranges from 0 to Ceil(PicSizeInMapUnits ÷ SliceGroupChangeRate).

Here SliceGroupChangeRate is obtained from slice_group_change_rate_minus1 (SliceGroupChangeRate = slice_group_change_rate_minus1 + 1).

PicSizeInMapUnits = PicWidthInMbs * PicHeightInMapUnits. The two factors on the right are derived from pic_width_in_mbs_minus1 and pic_height_in_map_units_minus1 in the sequence parameter set; see the description of the sequence parameter set for details. The function Ceil(x) returns the smallest integer greater than or equal to x.

Presence condition: num_slice_groups_minus1 > 0 in the picture parameter set, and slice_group_map_type in the picture parameter set is 3, 4, or 5.
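
The formulas for slice_group_change_cycle can be tried out with a few lines of code. The function names below are invented for this sketch. It reads the division in the bit-width and range expressions as exact (non-integer) division, and it assumes SliceGroupChangeRate = slice_group_change_rate_minus1 + 1; the bit width is computed with integers only so that no floating-point rounding is involved.

```python
def slice_group_change_rate(slice_group_change_rate_minus1):
    return slice_group_change_rate_minus1 + 1


def slice_group_change_cycle_bits(pic_size_in_map_units, rate):
    # Smallest n with 2**n >= pic_size_in_map_units / rate + 1,
    # i.e. Ceil(Log2(PicSizeInMapUnits / SliceGroupChangeRate + 1)).
    bits = 0
    while (1 << bits) * rate < pic_size_in_map_units + rate:
        bits += 1
    return bits


def max_slice_group_change_cycle(pic_size_in_map_units, rate):
    # Ceil(PicSizeInMapUnits / SliceGroupChangeRate) using integers.
    return -(-pic_size_in_map_units // rate)


def map_units_in_slice_group0(slice_group_change_cycle, rate,
                              pic_size_in_map_units):
    return min(slice_group_change_cycle * rate, pic_size_in_map_units)
```

Taking a picture of 99 map units and slice_group_change_rate_minus1 = 9 as an example, the rate is 10, slice_group_change_cycle is coded with 4 bits and may range from 0 to 10, and a value of 3 puts 30 map units into slice group 0.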

$ Deblocking filter:

H.264 specifies an algorithm with which the decoder can, on its own, compute the filtering strength for every edge in the picture and filter accordingly. Besides this independent computation in the decoder, the encoder can also transmit syntax elements to influence the filtering strength.

disable_deblocking_filter_idc: specifies whether the deblocking filter is disabled across some block edges of the slice, and for which edges it is disabled. When disable_deblocking_filter_idc is not present in the slice header, its value is inferred to be 0. Its value must be in the range 0 to 2, inclusive. Presence condition: deblocking_filter_control_present_flag in the picture parameter set is 1 (deblocking_filter_control_present_flag equal to 1 specifies that a set of syntax elements controlling the characteristics of the deblocking filter is present in the slice header).

slice_alpha_c0_offset_div2: specifies the offset used when accessing the α and tC0 deblocking filter tables.

FilterOffsetA = slice_alpha_c0_offset_div2 << 1

The range of slice_alpha_c0_offset_div2 is -6 to +6.

slice_beta_offset_div2: specifies the offset used when accessing the β deblocking filter table.

FilterOffsetB = slice_beta_offset_div2 << 1

The range of slice_beta_offset_div2 is -6 to +6.

Presence condition for these two elements: disable_deblocking_filter_idc != 1. For now it is enough to know that these elements relate to the deblocking filter; once you have studied the deblocking filter, their meaning will be clear.
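
The two offset formulas can be checked with a short sketch. It reads the operator in both formulas as a left shift by one bit (multiplication by 2, which is what the _div2 suffix suggests); the function name filter_offsets and the ValueError on out-of-range input are assumptions of this example.

```python
def filter_offsets(slice_alpha_c0_offset_div2, slice_beta_offset_div2):
    """Return (FilterOffsetA, FilterOffsetB) for the slice."""
    for value in (slice_alpha_c0_offset_div2, slice_beta_offset_div2):
        if not -6 <= value <= 6:
            raise ValueError("offset_div2 outside -6..+6")
    # Shifting left by one bit doubles the value, also for negatives.
    return slice_alpha_c0_offset_div2 << 1, slice_beta_offset_div2 << 1
```

The transmitted range of -6 to +6 therefore corresponds to offsets between -12 and +12 in steps of 2.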

Http://dl.dbank.com/c0fry4rukk Bi Houjie, New Generation Video Compression Coding Standard: H.264

Http://dl.dbank.com/c0ng2niprl# H.264 standard, official Chinese version

Http://dl.dbank.com/c0dnp0ne0j# elecard streameye tools 2.9.1.70328.zip, video analysis tools

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
