http://blog.csdn.net/jefry_xdz/article/details/8461343
1. NAL stands for Network Abstraction Layer.
In the H.264/AVC video coding standard, the overall framework is divided into two layers: the Video Coding Layer (VCL) and the Network Abstraction Layer (NAL). The former is responsible for efficiently representing the video content, while the latter formats that data and adds header information to make it suitable for transmission over various channels and storage media. In practice, each frame of data is one NAL unit (SPS and PPS travel in NAL units of their own). In an actual H.264 stream, each NAL unit is preceded by a 00 00 00 01 or 00 00 01 start code; the first data an encoder emits is generally the SPS and PPS, followed by an I-frame...
2. How do you determine the frame type (reference picture, I-frame, P-frame, etc.)?
The NALU type is the key to judging the frame type; the mapping of type values comes from the official specification (the full table appears in the "Frame format" section below).
Going back to the stream data and analyzing it layer by layer: the byte immediately following 00 00 00 01 is the NALU type byte. Convert it to binary and interpret the bits from left (most significant) to right, as follows:
(1) Bit 1 is the forbidden bit; a value of 1 indicates a syntax error.
(2) Bits 2-3 are the reference level (importance).
(3) Bits 4-8 are the NAL unit type.
For example, suppose the bytes following 00 00 00 01 are 0x67, 0x68, and 0x65.
The binary for 0x67 is:
0110 0111
Bits 4-8 are 00111, decimal 7; per the table, type 7 is the sequence parameter set (SPS).
The binary for 0x68 is:
0110 1000
Bits 4-8 are 01000, decimal 8; per the table, type 8 is the picture parameter set (PPS).
The binary for 0x65 is:
0110 0101
Bits 4-8 are 00101, decimal 5; per the table, type 5 is a slice of an IDR picture (an I-frame).
So the test for an I-frame is: (NALU type byte & 0001 1111) == 5, that is, (byte & 31) == 5.
For example, 0x65 & 31 = 5.
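The bit arithmetic above can be sketched in a few lines of Python (a minimal illustration, using the 0x67/0x68/0x65 example bytes from the text):

```python
# Parse the one-byte NALU header: F (1 bit), NRI (2 bits), Type (5 bits).
def parse_nal_header(byte0: int):
    forbidden = (byte0 >> 7) & 0x01   # must be 0 in a valid stream
    nri       = (byte0 >> 5) & 0x03   # reference importance, 0-3
    nal_type  =  byte0       & 0x1F   # NAL unit type, 0-31
    return forbidden, nri, nal_type

def is_idr(byte0: int) -> bool:
    # The I-frame test from the text: (byte & 0001 1111) == 5
    return (byte0 & 0x1F) == 5

print(parse_nal_header(0x67))  # (0, 3, 7)  -> SPS
print(parse_nal_header(0x68))  # (0, 3, 8)  -> PPS
print(is_idr(0x65))            # True       -> IDR slice (I-frame)
```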
http://blog.csdn.net/evsqiezi/article/details/8492593
Frame format
An H.264 NAL unit consists of a NALU header and a NALU payload.
The NALU header is a single byte with the following syntax:
+---------------+
|0|1|2|3|4|5|6|7|
+-+-+-+-+-+-+-+-+
|F|NRI|  Type   |
+---------------+
F: 1 bit.
forbidden_zero_bit. The H.264 specification requires this bit to be 0.
NRI: 2 bits.
nal_ref_idc. Takes values 00-11 (0-3) and indicates the importance of this NALU: a NALU with NRI 00 may be discarded by the decoder without affecting picture playback; the larger the value, the more important the NALU and the higher the priority with which it should be protected. This field must be greater than 0 if the current NAL is a slice of a reference frame, a sequence parameter set, or a picture parameter set.
Type: 5 bits.
nal_unit_type. The type of this NALU: values 1-12 are used by H.264 itself (13-23 reserved), while 24-31 are used by applications outside H.264, such as RTP packetization.
0      Undefined
1-23   Single NAL unit packets:
1      Slice of a non-IDR picture, not partitioned
2      Slice data partition A
3      Slice data partition B
4      Slice data partition C
5      Slice of an IDR picture (I-frame)
6      Supplemental enhancement information (SEI)
7      Sequence parameter set (SPS)
8      Picture parameter set (PPS)
9      Access unit delimiter (AUD)
10     End of sequence
11     End of stream
12     Filler data
13-23  Reserved
24     STAP-A  Single-time aggregation packet
25     STAP-B  Single-time aggregation packet
26     MTAP16  Multi-time aggregation packet
27     MTAP24  Multi-time aggregation packet
28     FU-A    Fragmentation unit
29     FU-B    Fragmentation unit
30-31  Undefined
AUD
Most documents do not describe the AUD, but it is in fact a marker for the start of a frame; its raw bytes, after the start code, are 09 f0.
Structurally it has a start code, so it really is a NALU, and type 09 in the H.264 definition is the AUD (access unit delimiter). Most players can play the stream normally even without AUDs.
Immediately after the AUD usually comes a combination of SPS/PPS/SEI/IDR, or simply a slice; either way, it marks the beginning of a frame. For players like Flash, which require one complete frame of data at a time, the data between two AUDs is packaged into the player's format.
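The frame-splitting idea above can be sketched as follows. This is a toy illustration, not a robust parser: it assumes the common six-byte AUD pattern and that every AUD in the stream uses exactly those bytes.

```python
# Sketch: split an Annex B byte string into per-frame chunks at AUD NALUs
# (type 9). The AUD byte pattern below is the common case described in the
# text; real streams may use a 3-byte start code or other AUD payloads.
AUD = b"\x00\x00\x00\x01\x09\xf0"

def split_frames_at_aud(stream: bytes):
    parts = stream.split(AUD)
    # drop the (usually empty) chunk before the first AUD
    return [p for p in parts if p]

data = AUD + b"\x00\x00\x00\x01\x67\xAA" + AUD + b"\x00\x00\x00\x01\x65\xBB"
print(len(split_frames_at_aud(data)))  # 2 frames
```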
On the encoding side, a start code is added before each NAL; when the decoder detects 0x000001 in the stream, it knows the current NAL has ended. To prevent the pattern 0x000001 from occurring inside a NAL's payload, H.264 also defines an "emulation prevention" mechanism: after encoding, whenever consecutive 0x00 bytes are detected inside a NAL (followed by a byte that could complete a start code), a 0x03 is inserted after them. When the decoder detects the sequence 0x000003 inside a NAL, it discards the 0x03 and restores the original data:
0x000000 >>>>>> 0x00000300
0x000001 >>>>>> 0x00000301
0x000002 >>>>>> 0x00000302
0x000003 >>>>>> 0x00000303
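The table above can be sketched as a pair of small Python functions, one for each direction (a minimal sketch of the escaping rule, not a full bitstream writer):

```python
# Emulation prevention: the encoder inserts 0x03 after any two consecutive
# zero bytes when the next byte is <= 0x03; the decoder strips it back out.
def insert_emulation_prevention(rbsp: bytes) -> bytes:
    out, zero_run = bytearray(), 0
    for b in rbsp:
        if zero_run >= 2 and b <= 0x03:
            out.append(0x03)          # escape byte
            zero_run = 0
        out.append(b)
        zero_run = zero_run + 1 if b == 0 else 0
    return bytes(out)

def remove_emulation_prevention(ebsp: bytes) -> bytes:
    out, zero_run = bytearray(), 0
    for b in ebsp:
        if zero_run >= 2 and b == 0x03:
            zero_run = 0              # drop the escape byte
            continue
        out.append(b)
        zero_run = zero_run + 1 if b == 0 else 0
    return bytes(out)

print(insert_emulation_prevention(b"\x00\x00\x01").hex())  # 00000301
```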
In general, there are two ways to package an H.264 stream. One is the Annex B byte-stream format, the default output of most encoders: the first 3 or 4 bytes of each NAL unit are the H.264 start code, 0x000001 or 0x00000001.
The other is the raw NAL packaging format (the length-prefixed format used in MP4/AVCC): the first few bytes (1, 2, or 4) give the length of the NAL rather than a start code, and decoding then depends on out-of-band global data carrying the encoder's profile, level, SPS, PPS, and so on.
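The Annex B framing described above can be sketched with a toy scanner that collects NAL units delimited by start codes (illustrative only; it also trims the leading zero of a 4-byte start code from the previous NAL's tail):

```python
# Toy Annex B parser: collect NAL units delimited by 00 00 01 / 00 00 00 01
# start codes.
def split_annexb(data: bytes):
    starts, i, n = [], 0, len(data)
    while i + 3 <= n:
        if data[i:i + 3] == b"\x00\x00\x01":
            starts.append(i + 3)      # payload begins after the start code
            i += 3
        else:
            i += 1
    nals = []
    for k, s in enumerate(starts):
        e = starts[k + 1] - 3 if k + 1 < len(starts) else n
        while e > s and data[e - 1] == 0:
            e -= 1                    # trim the extra 0 of a 4-byte start code
        nals.append(data[s:e])
    return nals

stream = b"\x00\x00\x00\x01\x67\xAA\x00\x00\x01\x65\xBB"
print([nal[0] & 0x1F for nal in split_annexb(stream)])  # [7, 5] -> SPS, IDR
```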
Analysis of SPS and PPS
Sps
profile_idc and level_idc indicate the profile and level to which the bitstream conforms.
constraint_set0_flag equal to 1 indicates that the bitstream obeys all the constraints of subclause A.2.1 (the Baseline profile); equal to 0 means the bitstream may or may not obey them. When profile_idc equals 100, 110, 122, or 144, constraint_set0_flag, constraint_set1_flag, and constraint_set2_flag shall all be equal to 0.
The value of log2_max_frame_num_minus4 shall be in the range 0 to 12, inclusive. This syntax element mainly serves the reading of another syntax element, frame_num, one of the most important elements: it identifies the decoding order of the picture it belongs to. This element determines the maximum value frame_num can reach: MaxFrameNum = 2^(log2_max_frame_num_minus4 + 4).
pic_order_cnt_type specifies the method of counting picture order for decoding. Its value shall be in the range 0 to 2, inclusive.
log2_max_pic_order_cnt_lsb_minus4 determines the value of the variable MaxPicOrderCntLsb used in the picture order count decoding process: MaxPicOrderCntLsb = 2^(log2_max_pic_order_cnt_lsb_minus4 + 4).
num_ref_frames specifies the maximum number of short-term and long-term reference frames, complementary reference field pairs, and unpaired reference fields that may be used in the inter prediction of any picture in the video sequence. Its value shall be in the range 0 to MaxDpbSize.
gaps_in_frame_num_value_allowed_flag indicates whether gaps are allowed between the frame_num values, and governs the decoding process invoked when an inferred gap between frame_num values is detected.
pic_width_in_mbs_minus1 plus 1 specifies the width of each decoded picture in units of macroblocks.
The semantics of pic_height_in_map_units_minus1 depend on the variable frame_mbs_only_flag, as follows: if frame_mbs_only_flag equals 0, pic_height_in_map_units_minus1 plus 1 represents the height of a field in macroblock units; otherwise (frame_mbs_only_flag equals 1), it represents the height of a frame in macroblock units. The variable FrameHeightInMbs is derived as: FrameHeightInMbs = (2 - frame_mbs_only_flag) * PicHeightInMapUnits.
mb_adaptive_frame_field_flag equal to 0 means there is no switching between frame and field macroblocks within a picture; equal to 1 means frame and field macroblocks may alternate within a frame (MBAFF). When not present, it is inferred to be 0.
direct_8x8_inference_flag specifies the method used to derive the luma motion vectors for B_Skip, B_Direct_16x16, and B_Direct_8x8, as specified in the standard. When frame_mbs_only_flag equals 0, direct_8x8_inference_flag shall be equal to 1.
frame_cropping_flag equal to 1 indicates that frame cropping offset parameters follow next in the sequence parameter set; equal to 0 means there are no frame cropping offset parameters.
vui_parameters_present_flag equal to 1 indicates that the vui_parameters() syntax structure described in Annex E is present; equal to 0 means it is absent.
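Most of the SPS fields above are coded as unsigned Exp-Golomb values, ue(v). Below is a minimal sketch of a ue(v) reader plus the dimension arithmetic implied by pic_width_in_mbs_minus1 and FrameHeightInMbs; the helper names are mine, and the cropping offsets signalled by frame_cropping_flag are ignored here.

```python
# Minimal ue(v) Exp-Golomb reader: count leading zero bits, skip the
# marker 1, read that many suffix bits; value = 2^zeros - 1 + suffix.
def read_ue(data: bytes, bitpos: int):
    def bit(p):
        return (data[p // 8] >> (7 - p % 8)) & 1
    zeros = 0
    while bit(bitpos) == 0:
        zeros += 1
        bitpos += 1
    bitpos += 1                       # skip the terminating 1 bit
    suffix = 0
    for _ in range(zeros):
        suffix = (suffix << 1) | bit(bitpos)
        bitpos += 1
    return (1 << zeros) - 1 + suffix, bitpos

# Picture dimensions from already-parsed SPS fields (cropping ignored):
def frame_dimensions(pic_width_in_mbs_minus1,
                     pic_height_in_map_units_minus1,
                     frame_mbs_only_flag):
    width  = (pic_width_in_mbs_minus1 + 1) * 16
    height = (2 - frame_mbs_only_flag) * (pic_height_in_map_units_minus1 + 1) * 16
    return width, height

print(read_ue(bytes([0b00100000]), 0)[0])  # 3
print(frame_dimensions(119, 67, 1))        # (1920, 1088); cropping trims to 1080
```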
PPS
seq_parameter_set_id refers to the sequence parameter set that is active. Its value shall be in the range 0 to 31, inclusive.
entropy_coding_mode_flag selects the entropy coding method for syntax elements that have two descriptors in the syntax tables: if it equals 0, the method specified by the left descriptor is used (Exp-Golomb / CAVLC); if it equals 1, the right descriptor is used (CABAC).
pic_order_present_flag equal to 1 indicates that syntax elements related to the picture order count appear in the slice header; equal to 0 means that no such elements appear there.
num_slice_groups_minus1 plus 1 specifies the number of slice groups in a picture. When it equals 0, all slices of the picture belong to the same slice group.
num_ref_idx_l0_active_minus1 specifies the maximum reference index for reference picture list 0, used when decoding slices of the picture that use list 0 prediction and have num_ref_idx_active_override_flag equal to 0. When MbaffFrameFlag equals 1, num_ref_idx_l0_active_minus1 is the maximum index for frame-macroblock decoding, while 2 * num_ref_idx_l0_active_minus1 + 1 is the maximum index for field-macroblock decoding. Its value shall be in the range 0 to 31, inclusive.
weighted_pred_flag equal to 0 indicates that weighted prediction is not applied to P and SP slices; equal to 1 indicates that it is.
weighted_bipred_idc equal to 0 means that B slices use the default weighted prediction; equal to 1, explicit weighted prediction; equal to 2, implicit weighted prediction. Its value shall be in the range 0 to 2, inclusive.
pic_init_qp_minus26 specifies the initial value of SliceQPY for each slice, minus 26. The initial value is modified at the slice layer when a non-zero slice_qp_delta is decoded, and modified further at the macroblock layer when a non-zero mb_qp_delta is decoded. The value of pic_init_qp_minus26 shall be in the range -(26 + QpBdOffsetY) to +25, inclusive.
pic_init_qs_minus26 specifies the initial value of SliceQSY for all macroblocks in SP or SI slices, minus 26. The initial value is modified at the slice layer when a non-zero slice_qs_delta is decoded. Its value shall be in the range -26 to +25, inclusive.
chroma_qp_index_offset specifies the offset added to QPY and QSY when indexing the table of QPC values for the Cb chroma component. Its value shall be in the range -12 to +12, inclusive.
deblocking_filter_control_present_flag equal to 1 indicates that a set of syntax elements controlling the characteristics of the deblocking filter is present in the slice header; equal to 0 means they are absent and their inferred values take effect.
constrained_intra_pred_flag equal to 0 means intra prediction may use residual data and decoded samples of neighbouring macroblocks coded in inter prediction modes; equal to 1 means constrained intra prediction, in which case macroblocks coded in intra prediction modes may only be predicted from residual data and decoded samples of I or SI macroblock types.
redundant_pic_cnt_present_flag equal to 0 indicates that the redundant_pic_cnt syntax element is not present in slice headers, data partition B, or data partition C NAL units that refer (directly or via an associated data partition A) to the picture parameter set; equal to 1 means redundant_pic_cnt is present in all of those places.
Fragmentation
When an H.264 packet is too large for transport, it is split into multiple fragments, and the NALU header is replaced by the following two bytes.
The FU indicator octet has the following format:
+---------------+
|0|1|2|3|4|5|6|7|
+-+-+-+-+-+-+-+-+
|F|NRI|  Type   |
+---------------+
Don't be intimidated by the name: this is the H.264 RTP payload format mentioned above, packet type FU-A (28).
The FU header has the following format:
+---------------+
|0|1|2|3|4|5|6|7|
+-+-+-+-+-+-+-+-+
|S|E|R|  Type   |
+---------------+
An S bit of 1 indicates the start of the fragmented NAL; when S is 1, E cannot be 1.
An E bit of 1 indicates the end of the fragmented NAL; when E is 1, S cannot be 1.
The R bit is reserved.
Type is the type field from the original NALU header, taking values 1-23.
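The relationship between the two FU bytes and the original NALU header can be sketched as follows (a minimal illustration of the FU-A rules from RFC 6184; the example byte values are mine):

```python
# Rebuild the original NALU header from an FU-A fragment: F and NRI come
# from the FU indicator, Type from the FU header.
def fu_a_original_header(fu_indicator: int, fu_header: int) -> int:
    return (fu_indicator & 0xE0) | (fu_header & 0x1F)

def fu_a_flags(fu_header: int):
    s = (fu_header >> 7) & 1          # start of the fragmented NAL
    e = (fu_header >> 6) & 1          # end of the fragmented NAL
    return s, e

# FU indicator 0x7C: F=0, NRI=3, type 28 (FU-A).
# FU header 0x85: S=1, E=0, original type 5 (IDR slice).
print(hex(fu_a_original_header(0x7C, 0x85)))  # 0x65
print(fu_a_flags(0x85))                       # (1, 0)
```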
The H.264/AVC Advanced Video Coding Standard
http://blog.csdn.net/gl1987807/article/details/11945357
Guo Qichang / Industrial Technology Research Institute
1. Preface
In December 2001, ITU-T VCEG and ISO MPEG formed the Joint Video Team (JVT) to develop a new video compression format, which the ITU-T calls H.264 and which ISO incorporated as MPEG-4 Part 10 (ISO/IEC 14496-10) under the name Advanced Video Coding (AVC); the two are commonly referred to together as H.264/AVC [1]. The first edition of the international standard was published in 2003, and the second, revised edition was finalized in March 2005. Studies have shown that H.264/AVC improves significantly on MPEG-2 and MPEG-4 in both compression ratio and video quality [2]. H.264/AVC is also the first standard to introduce the concepts of a Video Coding Layer (VCL) and a Network Abstraction Layer (NAL). Earlier video standards focused on compression performance; H.264/AVC includes a built-in NAL network adaptation layer that informs the VCL of network conditions, giving the VCL better coding flexibility and error resilience and making H.264/AVC well suited to multimedia streaming and mobile TV applications. In the first edition of the standard, H.264/AVC defines three profiles according to the coding tools used, as shown in Table 1: Baseline, Main, and Extended. The corresponding picture sizes and bit rates span levels 1 to 5.1, covering applications from small images up to high-resolution video.
The Baseline profile targets low bit-rate applications (for example, video telephony); its low computational complexity also makes it suitable for personal portable multimedia players. The Main profile supports interlaced content coding, making it suitable for HDTV digital broadcasting, and its bitstreams integrate easily into traditional MPEG-2 transport/program streams. For IPTV or MOD (Multimedia on Demand) applications, the Extended profile, with its error-resilience coding tools, meets the requirements. In 2003, however, Microsoft submitted its video compression technology to the Society of Motion Picture and Television Engineers (SMPTE) for open standardization, and the new standard was named VC-1 (Video Codec 1) [3]. Because of VC-1's excellent performance on high-resolution content, H.264/AVC was at a disadvantage in the high-definition DVD trials of the DVD Forum and the Blu-ray Disc Association; the main reason was that H.264/AVC used a smaller transform and could not adjust the quantization matrix, and therefore could not preserve the high-frequency detail of the image. So in 2004 the standard was amended to incorporate a set of new coding tools called the Fidelity Range Extensions (FRExt) [4], extending the earlier Main profile with four new profiles (Table 1) in the hope of regaining ground in high-resolution applications; the second edition of the H.264/AVC standard, containing these revisions, was published in March 2005. The remainder of this article discusses the characteristics of the network abstraction layer, then explains the principles of the video coding layer, and finally surveys the application status of H.264/AVC.
2. Network Abstraction Layer (NAL)
A distinguishing feature of the H.264/AVC standard is the concept of the network abstraction layer: the VCL's coded output is packaged into NAL packets as its basic unit, so the transport layer does not need to re-segment them; it only needs to attach its own protocol headers before handing them down for transmission. As shown in Figure 1, the NAL can be viewed as a packaging module that encapsulates the VCL's compressed bitstream into appropriately sized packet units (NAL units); the nal_unit_type field in the NAL unit header records the packet's type, and each type corresponds to a different coding tool in the VCL. Another important feature is the reference flag set with the help of the transport-layer protocol: when the network is congested and packets are corrupted or arrive out of order, the receiving VCL knows from this information that it must perform error concealment and attempt to repair errors during decoding. As shown in Figure 2, a complete H.264/AVC bitstream is composed of multiple NAL units, so the bitstream is also called a NAL unit stream. A NAL unit stream can contain multiple coded video sequences; a single coded video sequence represents one video clip and is composed of multiple access units. When the receiving side has received one access unit, it can decode it completely into a single picture. The first access unit of each coded video sequence must be an Instantaneous Decoding Refresh (IDR) access unit, whose contents are entirely intra-predicted, so it can be fully decoded by itself without reference to any other access unit.
An access unit is itself composed of several NAL units. The standard defines 12 types of NAL unit in total, which can be grouped into VCL NAL units and non-VCL NAL units. VCL NAL units carry purely compressed image content; non-VCL NAL units come in two kinds: parameter sets and supplemental enhancement information (SEI). SEI can store information such as film descriptions, copyright notices, and user-defined data. Parameter sets mainly describe the parameters of the entire coded video sequence, such as the aspect ratio, the time at which each picture appears (timestamp), and the parameters required for decoding. This information is critical: if it is corrupted in transit, the whole video may become undecodable. Standards like MPEG-2/-4 placed this information in ordinary packet headers, where it could easily be lost along with the packets; H.264/AVC instead separates it into dedicated parameter sets that can be transmitted out-of-band, with the out-of-band channel protected by the strongest channel coding, to guarantee correct delivery.
3. Video Coding Layer (VCL)
Video compression exploits the similarity of images in time and space: after processing by the compression algorithm, the parts the human eye does not perceive, known as visual redundancy, can be removed, achieving compression. As in Figure 1, the H.264/AVC video coding mechanism is block-based: the whole image is divided into small rectangular areas called macroblocks (MBs), and the macroblocks are then encoded. First, intra prediction and inter prediction are used to remove similarity between images and obtain the so-called residual; the residual is then transformed and quantized to remove visual redundancy. Finally, the video coding layer outputs the coded bitstream, which is packaged into NAL units and transmitted over the network or stored on media. H.264/AVC allows a video to be coded as frames or fields, and the two may coexist; frames may be progressive or interlaced, and both may be mixed within the same video, as in MPEG-2. As for color formats, the first edition of the standard supports only YCbCr 4:2:0 sampling; the second edition adds the 4:2:2 and 4:4:4 sampling formats, usually used for digital cinema or HDTV content.
3.1 The H.264/AVC picture format hierarchy
The H.264/AVC hierarchy, from small to large, is: sub-block, block, macroblock, slice, slice group, frame/field picture, sequence. For 4:2:0 sampling, an MB consists of 16x16 luma samples and the two corresponding 8x8 chroma blocks; within an MB, H.264/AVC allows partitions into 16x8, 8x16, 8x8, 8x4, 4x8, and 4x4 sub-blocks. A slice is a collection of MBs, and an image is made up of many slices (Figure 3). The slice is the smallest self-decodable unit in H.264/AVC: a slice can be decoded from its own compressed data alone, without relying on other slices. The advantage is that when transmitting to a remote receiver, each slice of compressed data can be decoded as soon as it arrives, without waiting for the whole picture; and when data is lost or corrupted in transit, only that slice is affected, not the others. Unlike MPEG-2's slices, a slice's extent may exceed one row of macroblocks, which means H.264/AVC allows an entire picture to consist of a single slice. The slice architecture also offers a feature called Flexible Macroblock Ordering (FMO), meaning the MBs composing a slice need not follow raster-scan order. For example, the rightmost picture of Figure 3 is well suited to multiple foreground slice groups plus a single background slice group. The advantage is that different slice groups can use compression parameters of different quality: regions containing foreground objects, which usually interest the human eye more, can use a smaller compression ratio to maintain good quality.
3.2 Slice coding modes
H.264/AVC slices are classified by coding type as follows: (1) I-slice: all MBs in the slice are coded using intra prediction; (2) P-slice: MBs are coded using intra or inter prediction, with at most one motion vector per inter-predicted block; (3) B-slice: like a P-slice, but each inter-predicted block may use two motion vectors. Notably, the "B" in B-slice stands for bi-predictive, which differs considerably from the bi-directional concept of MPEG-2/-4 B-frames: an MPEG-2/-4 B-frame can only be inter-predicted from the preceding and following I- (or P-) frames, whereas an H.264/AVC B-slice may be predicted not only from I- (or P-, B-) slices of the previous and next pictures but also from I- (or P-, B-) slices of two earlier pictures. H.264/AVC also adds two special slice types: (1) SP-slice, the "switching P" slice, a special kind of P-slice used to splice two bitstreams of different bit rates; (2) SI-slice, the "switching I" slice, a special kind of I-slice which, besides connecting two bitstreams of different content, can be used to perform random access, enabling network-VCR functionality.
These two special slice types are aimed mainly at video-on-demand streaming. For the same video content, the server pre-stores compressed streams at several bit rates; when the bandwidth changes, it sends the stream whose bit rate suits the bandwidth at that moment. The traditional approach is to wait for a suitable switching point and then transmit a new I-slice, but since an I-slice is much larger than a P-slice and the bandwidth has shrunk, transmitting it takes more time, so the client's picture is delayed. SP-slices make it easy to splice streams of the same content but different bit rates smoothly (Figure 4): not only can a new bitstream be started, but because what is transmitted is a P-slice of smaller size, there is no time lag. When a client wants to switch to a new channel, the streams differ in content and not merely in bit rate; the traditional approach requires the client to re-buffer the new channel's content (Figure 5) so as to receive the new bitstream's I-slice before its subsequent P-slices, again delaying reception. And when the client performs fast-forward, rewind, or random access, the traditional approach cannot respond in real time; with SI-slices, H.264/AVC achieves this easily.
3.3 Intra-frame prediction (intra prediction)
Previous compression standards performed intra coding by simply coding the transform coefficients, whereas H.264/AVC makes predictions between pixels in the spatial domain rather than on transformed coefficients. It offers two types of intra prediction: Intra_4x4 and Intra_16x16. Intra_4x4 operates on 4x4 luma sub-blocks: after finding the reference (predictor) and subtracting it from the block, the residual is fed into the transform. There are 9 prediction modes for the reference (Figure 6); taking mode 0 (vertical) as an example, the predictors of {a,e,i,m}, {b,f,j,n}, {c,g,k,o}, and {d,h,l,p} are A, B, C, and D respectively. Luma Intra_16x16 and chroma prediction are similar to luma Intra_4x4; the detailed formulas can be found in [1].
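Mode 0 (vertical) can be illustrated with a tiny sketch. This is my own toy code operating on plain lists; the real standard works on clipped, reconstructed neighbouring samples.

```python
# Intra_4x4 mode 0 (vertical): every row of the 4x4 prediction simply
# copies the four reconstructed pixels A, B, C, D above the block.
def intra4x4_vertical(above):
    return [list(above) for _ in range(4)]

def residual(block, pred):
    return [[block[r][c] - pred[r][c] for c in range(4)] for r in range(4)]

above = [100, 102, 104, 106]          # A, B, C, D
block = [[101, 102, 103, 106]] * 4    # the 4x4 block to be coded
pred  = intra4x4_vertical(above)
print(residual(block, pred)[0])       # [1, 0, -1, 0]
```

Only the small residual values are transformed and quantized, which is where the compression gain comes from.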
3.4 Inter-frame prediction (inter prediction)
For prediction across pictures, H.264/AVC provides a richer set of coding modes, with the following block partitions: 16x16, 16x8, 8x16, 8x8, 8x4, 4x8, and 4x4. The variety of partitions makes motion-vector prediction more accurate; as in Figure 7, moving regions of a picture are often not square, and using rectangular or smaller 4x4 partitions to predict such regions can significantly reduce the residual and increase the compression ratio. An MB in a P-slice may thus carry up to 16 motion vectors, and an MB in a B-slice up to 32; although this increases the motion-vector header overhead, the net effect on compression is positive. Furthermore, the motion estimation in previous standards used only the immediately preceding picture as the prediction reference, whereas H.264/AVC introduces the concept of multiple reference frames, so motion vectors are no longer restricted to adjacent pictures but may span several. As in Figure 8, a block at time t may use blocks from the pictures at t-1 and t-2 as predictors. When a video contains periodically repeating content, for example a background that periodically appears and is occluded, objects that bounce back and forth or grow and shrink, or a camera that cuts back and forth between several views (as often happens in sports broadcasts), better motion prediction results can be obtained, improving compression efficiency.
3.5 Transform, quantization, and entropy coding
H.264/AVC's transform uses so-called 4x4 and 8x8 integer transforms, quite different from the 8x8 DCT (Discrete Cosine Transform) of MPEG-2/-4. Because it is an integer operation, it avoids the coefficient-reconstruction mismatch of the fractional DCT, and the smaller 4x4 block size also reduces the extent of blocking artifacts. In quantization, H.264/AVC uses only addition and multiplication, no division, which favors integrated-circuit implementation. Unlike the entropy coding of MPEG-2/-4, H.264/AVC applies two different coding rules to quantized transform coefficients and to non-coefficient data (header data, motion vectors, etc.). Non-coefficient data use a single code table, which saves the memory the code tables would occupy. For quantized transform coefficients, instead of MPEG-2/-4's fixed code table per picture, H.264/AVC uses so-called context-adaptive coding: the probability of each codeword is estimated from the already-coded content, producing the code table best suited to the current image; the advantage is a higher compression ratio, at the cost of additional complexity. H.264/AVC has two context-adaptive coding techniques: Context-Adaptive Variable Length Coding (CAVLC) and Context-Adaptive Binary Arithmetic Coding (CABAC). CAVLC's basic principle is similar to MPEG-2/-4 VLC; CABAC is more complex than CAVLC but provides a higher compression ratio, especially for compressing interlaced digital TV content.
3.6 In-loop deblocking filter
As mentioned, H.264/AVC is also a block-based compression method, so blocking artifacts occur. Although its 4x4 transform slightly reduces them, smoother regions of the image still rely on a deblocking filter for quality repair. Deblocking filters come in two kinds: post filters and in-loop filters. A post filter operates after decoding and is not specified by the standard, which gives vendors the flexibility to choose an implementation matching the application's complexity. An in-loop filter is specified directly within the codec loop; although it increases complexity, the filtered image quality is better, and when the filtered picture is used as the reference for inter prediction, the prediction accuracy improves greatly, which in turn increases the compression ratio.
4. Conclusion
Thanks to its improved coding algorithms, H.264/AVC greatly improves compression ratio and video quality compared with MPEG-2/-4, and the NAL concept helps deliver high-quality video over bandwidth-limited channels. For high-quality digital TV or high-definition DVD, H.264/AVC coding easily meets the application's needs. On the market side, however, the VC-1 standard, backed by Microsoft's PC-platform advantages and low-cost licensing strategy, will be H.264/AVC's strongest challenger in the future.
# Project Information
This article is one of the results of the multimedia digital video technology development program sponsored by the Ministry of Economic Affairs.
# Reference Documents
[1] Joint Video Team (JVT) of ISO/IEC MPEG and ITU-T VCEG, "Draft ITU-T Recommendation and Final Draft International Standard of Joint Video Specification (ITU-T Rec. H.264 | ISO/IEC 14496-10 AVC)," JVT-G050, 2003.
[2] Special issue on the H.264/AVC video coding standard, IEEE Trans. Circuits Syst. Video Technol., vol. 13, no. 7, July 2003.
[3] "Proposed SMPTE Standard for Television: VC-9 Compressed Video Bitstream Format and Decoding Process," SMPTE, 2004-03-31.
[4] G. J. Sullivan, P. Topiwala, and A. Luthra, "The H.264/AVC Advanced Video Coding Standard: Overview and Introduction to the Fidelity Range Extensions," SPIE Conference on Applications of Digital Image Processing, 2004.