This article draws on earlier programs and related articles, summarized and combined with my own understanding. Many thanks to the original authors for their hard work.
http://www.cnblogs.com/chef/archive/2012/07/18/2597279.html
http://blog.csdn.net/yeyumin89/article/details/7932368
http://blog.csdn.net/yeyumin89/article/details/7932431
http://blog.sina.com.cn/s/blog_48f93b530100eyoe.html
http://blog.csdn.net/b4362928/article/details/4970169
http://www.cnblogs.com/haibindev/archive/2011/12/29/2305712.html
Viewed with H264VideoESViewer, the data encoded by x264 has the structure shown in the following figure: SPS -> PPS -> SEI -> IDR -> non-IDR ... non-IDR -> SPS -> PPS -> IDR -> non-IDR ... non-IDR -> SPS -> PPS -> IDR -> non-IDR ... non-IDR ...
1. The following figure shows the detailed parameters of the SPS (sequence parameter set)
The first 3 fields are set by the program:
// Copy a full NALU, excluding the start code prefix 0x000001 or 0x00000001
memcpy(nalu->buf, &h264buffer[nalu->startcodeprefix_len], nalu->len);
nalu->forbidden_bit = nalu->buf[0] & 0x80; // 1 bit, should always be FALSE
nalu->nal_reference_idc = nalu->buf[0] & 0x60; // 2 bits, NALU_PRIORITY_xxxx
nalu->nal_unit_type = (nalu->buf[0]) & 0x1f; // 5 bits, NALU_TYPE_xxxx
The contents of seq_parameter_set_rbsp that follow remain to be studied.
2. The following figure shows the detailed parameters of the PPS (picture parameter set)
The first 3 fields are the same as for the SPS. The pic_parameter_set_rbsp parameters that follow remain to be studied.
3. The following figure shows the SEI (supplemental enhancement information)
Read as text, the ASCII column of the SEI hex dump (offsets 0x35 to 0x285) contains the following string; characters that could not be recovered from the dump are marked with ?:
x264 - core 125 - H.264/MPEG-4 AVC codec - Copyleft 2003-2012 - http://www.videolan.org/x264.html - options:
cabac=1 ref=4 deblock=1:0:0 analyse=0x3:0x1?? me=dia subme=? psy=1 psy_rd=1.00:0.00 mixed_ref=1 me_range=16
chroma_me=1 trellis=1 8x8dct=1 cqm=0 deadzone=21,11 fast_pskip=1 chroma_qp_offset=-2 threads=6
lookahead_threads=1 sliced_threads=0 nr=0 decimate=1 interlaced=0 bluray_compat=0 constrained_intra=0
bframes=0 weightp=2 keyint=50 keyint_min=5 scenecut=40 intra_refresh=0 rc=abr mbtree=0 bitrate=90
ratetol=1.0 qcomp=0.60 qpmin=10 qpmax=30 qpstep=4 ip_ratio=1.40 aq=1:1.00
This SEI mainly records the settings of the x264 encoder.
4. The following figure shows the IDR information (coded slice of an IDR picture)
This contains all the data of the IDR keyframe.
5. The following figure shows the non-IDR information (coded slice of a non-IDR picture)
This contains the non-keyframe data. To describe the difference between keyframes and non-keyframes, here is a quotation from another article:
"IDR (instantaneous decoding refresh).
Both I frames and IDR frames use intra prediction, and both are intra-coded pictures. To make encoding and decoding easier to control, the first I frame of a sequence is distinguished from the other I frames and is called the IDR frame. The purpose of an IDR frame is to refresh decoding immediately so that errors do not propagate: a new sequence starts at the IDR frame. An ordinary I frame does not provide random access; that role belongs to the IDR frame. An IDR frame causes the DPB (decoded picture buffer, which holds the reference frame list; this is the key point) to be emptied, whereas an ordinary I frame does not. An IDR picture is always an I picture, but an I picture is not necessarily an IDR picture. A sequence can contain many I pictures, and pictures after an I picture may still use pictures before it as motion references.
For an IDR frame, no frame after it can reference anything before the IDR frame. By contrast, for an ordinary I frame, the B and P frames after it can reference frames that precede that I frame. In a randomly accessed video stream, a player can always start playback from an IDR frame (when the user drags the seek bar, playback resumes from an IDR frame position), because no frame after it references an earlier frame. In a video without IDR frames, however, playback cannot start from an arbitrary point, because later frames always reference earlier ones."
In the quotation above, an I frame means an ordinary frame (a non-keyframe, that is, a non-IDR frame), while IDR clearly refers to the keyframe.
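The random-access rule in the quotation can be illustrated with a small sketch. It assumes the stream has already been split into NALUs and that one nal_unit_type per NALU is stored in an array; the function name `find_seek_point` and this array layout are hypothetical, not part of x264 or any player API.

```c
#include <stddef.h>

/* nal_unit_type of a coded slice of an IDR picture. */
#define NAL_TYPE_IDR 5

/*
 * Return the index of the last IDR NALU at or before `target`,
 * or -1 if there is none. `types` holds one nal_unit_type per NALU.
 */
long find_seek_point(const unsigned char *types, size_t count, size_t target)
{
    if (count == 0)
        return -1;
    if (target >= count)
        target = count - 1;
    /* Walk backwards from the requested position to the nearest IDR. */
    for (size_t i = target + 1; i-- > 0; ) {
        if (types[i] == NAL_TYPE_IDR)
            return (long)i;
    }
    return -1;
}
```

A real player would normally also resend the SPS and PPS that precede the chosen IDR, since the decoder needs them before it can decode the slice.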
6. The x264 real-time encoding and sending process
1) Initialize x264, specifying the width, height, fps, bitrate, and quality of the encoded image.
2) Feed the YUV420 data to the encoder.
3) If encoded data is produced, scan it for the start code 0x000001 or 0x00000001, pack the data that follows the start code into a NALU, set the NALU's parameters, and send it out. The format of the NALU is as follows:
typedef struct
{
    int startcodeprefix_len;      //! 4 for parameter sets and first slice in picture, 3 for everything else (suggested)
    unsigned len;                 //! Length of the NAL unit (excluding the start code, which does not belong to the NALU)
    unsigned max_size;            //! NAL unit buffer size
    int forbidden_bit;            //! Should always be FALSE
    int nal_reference_idc;        //! NALU_PRIORITY_xxxx
    int nal_unit_type;            //! NALU_TYPE_xxxx
    char *buf;                    //! Contains the first byte followed by the EBSP
    unsigned short lost_packets;  //! True if packet loss is detected
} NALU_t;
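Step 3 and the header fields of the struct above can be sketched as follows. This is a minimal illustration, not the article's actual sender: the names `find_start_code`, `next_nalu`, `parse_nal_header`, and `nal_header` are hypothetical. Unlike the snippet in section 1, which keeps the masked values (0x80, 0x60) as they are, this version shifts them down to 0/1 and 0..3. Trailing zero bytes before the next start code are not stripped.

```c
#include <stddef.h>
#include <stdint.h>

/* Header fields decoded from the first byte of a NALU (illustrative type). */
typedef struct {
    int forbidden_zero_bit; /* 1 bit, 0 in a valid stream */
    int nal_ref_idc;        /* 2 bits, shifted down to the range 0..3 */
    int nal_unit_type;      /* 5 bits, e.g. 5 = IDR, 7 = SPS, 8 = PPS */
} nal_header;

/*
 * Find the next start code (0x000001 or 0x00000001) at or after `from`.
 * Returns its offset and stores its length (3 or 4) in *prefix_len,
 * or returns `size` if there is none.
 */
size_t find_start_code(const uint8_t *buf, size_t size, size_t from, int *prefix_len)
{
    for (size_t i = from; i + 3 <= size; i++) {
        if (buf[i] != 0 || buf[i + 1] != 0)
            continue;
        if (buf[i + 2] == 1) {
            *prefix_len = 3;
            return i;
        }
        if (i + 4 <= size && buf[i + 2] == 0 && buf[i + 3] == 1) {
            *prefix_len = 4;
            return i;
        }
    }
    return size;
}

/*
 * Locate the NALU that begins at or after `from`.
 * Returns 1 and sets *nalu_off (first byte after the start code) and
 * *nalu_len (bytes up to the next start code or the end of the buffer);
 * returns 0 when no further start code exists.
 */
int next_nalu(const uint8_t *buf, size_t size, size_t from,
              size_t *nalu_off, size_t *nalu_len)
{
    int prefix_len = 0;
    int next_prefix_len = 0;
    size_t sc = find_start_code(buf, size, from, &prefix_len);
    if (sc == size)
        return 0;
    size_t begin = sc + (size_t)prefix_len;
    size_t end = find_start_code(buf, size, begin, &next_prefix_len);
    *nalu_off = begin;
    *nalu_len = end - begin;
    return 1;
}

/* Split the first NALU byte into its three header fields. */
nal_header parse_nal_header(uint8_t first_byte)
{
    nal_header h;
    h.forbidden_zero_bit = (first_byte >> 7) & 0x01;
    h.nal_ref_idc        = (first_byte >> 5) & 0x03;
    h.nal_unit_type      = first_byte & 0x1f;
    return h;
}
```

A caller would loop on `next_nalu`, passing `off + len` as the next `from`, and hand each NALU to the sender together with its parsed header.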
NALU types (nal_unit_type):
#define NALU_TYPE_SLICE 1
#define NALU_TYPE_DPA 2
#define NALU_TYPE_DPB 3
#define NALU_TYPE_DPC 4
#define NALU_TYPE_IDR 5
#define NALU_TYPE_SEI 6
#define NALU_TYPE_SPS 7
#define NALU_TYPE_PPS 8
#define NALU_TYPE_AUD 9
#define NALU_TYPE_EOSEQ 10
#define NALU_TYPE_EOSTREAM 11
#define NALU_TYPE_FILL 12
The type of each NALU is its first byte & 0x1f. For example, the first NALU gives 0x67 & 0x1f = 7, so it is an SPS; a NALU whose first byte & 0x1f equals 8 is a PPS. This is the third of the 3 fields shown at the start of each unit above, and the program derives it as follows:
nalu->nal_unit_type = (nalu->buf[0]) & 0x1f; // 5 bits, NALU_TYPE_xxxx
7. FLV file structure analysis
FLV is a binary file made up of an FLV header and a file body. The file body consists of a series of PreviousTagSize fields and tags. Tags fall into three categories: audio, video, and script, representing the audio stream, the video stream, and script data. Tags and PreviousTagSize fields come in pairs: a PreviousTagSize immediately follows each tag, occupies 4 bytes as a UI32 value, and gives the size of the tag before it.
The entire FLV file is therefore laid out as:
FLV header (9 bytes)
+ PreviousTagSize0 (4 bytes, all 0)
+ metadata tag (information about the FLV video and audio parameters such as duration, width, and height; its size is whatever is actually written)
+ PreviousTagSize (4 bytes, the size of the previous tag, here the metadata tag)
+ video tag 1 (video configuration information; its size is whatever is actually written)
+ PreviousTagSize1 (4 bytes, the size of the video configuration tag)
+ audio tag 2 (audio configuration information; its size is whatever is actually written)
+ PreviousTagSize2 (4 bytes, the size of the audio configuration tag)
+ ...
+ tag N + PreviousTagSizeN.
From tag 3 onward, the audio and video data and their sizes are recorded in turn. The parts highlighted in red above are analyzed below.
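The header, PreviousTagSize, tag layout described above can be checked with a short sketch. It assumes the whole file is already in memory, and it relies on two details covered in section 7.2.1: a tag header is 11 bytes long, and bytes 2-4 of it hold the data length as a big-endian UI24. The function name `flv_count_tags` is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Read big-endian 24-bit and 32-bit unsigned values. */
static uint32_t be24(const uint8_t *p)
{
    return ((uint32_t)p[0] << 16) | ((uint32_t)p[1] << 8) | (uint32_t)p[2];
}

static uint32_t be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | be24(p + 1);
}

/*
 * Walk an in-memory FLV file and count its tags.
 * Returns the number of tags, or -1 if the layout is inconsistent.
 */
long flv_count_tags(const uint8_t *buf, size_t size)
{
    /* 9-byte header plus the 4-byte PreviousTagSize0. */
    if (size < 13 || memcmp(buf, "FLV", 3) != 0)
        return -1;
    if (be32(buf + 5) != 9)          /* header length field */
        return -1;
    if (be32(buf + 9) != 0)          /* PreviousTagSize0 is always 0 */
        return -1;

    size_t pos = 13;
    long tags = 0;
    while (pos < size) {
        if (size - pos < 11)
            return -1;               /* truncated tag header */
        size_t tag_size = 11 + (size_t)be24(buf + pos + 1);
        if (size - pos < tag_size + 4)
            return -1;               /* truncated tag data or size field */
        /* The PreviousTagSize after a tag equals 11 + its data length. */
        if (be32(buf + pos + tag_size) != tag_size)
            return -1;
        pos += tag_size + 4;
        tags++;
    }
    return tags;
}
```

Checking every PreviousTagSize against the tag that precedes it is a cheap way to detect a damaged or mis-muxed file before trying to decode anything.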
7.1 FLV header analysis
The FLV header consists of 9 bytes:
The first 3 bytes are the file signature, always "FLV" (0x46 0x4C 0x56).
The 4th byte is the version number, currently fixed at 0x01.
The 5th byte holds the stream flags: the lowest bit set to 1 means video is present (0x01, 00000001), the third-lowest bit set to 1 means audio is present (0x04, 00000100), and a file with both video and audio has 0x01 | 0x04 (0x05, 00000101). All other bits should be 0.
The last 4 bytes give the length of the FLV header: 3 + 1 + 1 + 4 = 9.
7.2 FLV body analysis
The FLV header is followed by the FLV body, which consists of a series of tags. Each tag begins with a tag header of 11 bytes, and each tag header is preceded by 4 bytes that record the size of the previous tag. Because there is no tag before the first one, that size is 0, so the first thing after the FLV header is four 00 bytes (the initial PreviousTagSize), as shown in the selected area of the following image.
7.2.1 Tag header analysis
Byte 1 records the tag type: audio (0x08), video (0x09), or script (0x12).
Bytes 2-4 are the length of the data area (the tag data), a UI24 value. Note: this length equals the PreviousTagSize that follows the tag, minus 11.
Bytes 5-7 are the timestamp, a UI24 value in milliseconds. For script data (type 0x12) the timestamp is 0. The timestamp controls the playback speed of the file and can be set according to the audio and video frame rates.
Byte 8 is the extended timestamp. When 24 bits are not enough, it supplies the most significant 8 bits and extends the timestamp to a 32-bit value.
Bytes 9-11 are the StreamID, a UI24 value that is always 0.
The tag header length is 1 + 3 + 3 + 1 + 3 = 11. It is followed by the data area (tag data), which holds the raw H.264 stream.
The figure above shows the header of the metadata tag, in which only the tag type and the data area length are set.
7.2.2 Tag data analysis
Tag data falls into three types: audio data, video data, and script data.
7.2.2.1 Tag data is audio
If the tag data is audio, the first byte records the audio information:
Bits 1-4 (the high 4 bits) give the audio format (see the official documentation for all formats):
· 0 -- Uncompressed
· 1 -- ADPCM
· 2 -- MP3
· 4 -- Nellymoser 16-kHz mono
· 5 -- Nellymoser 8-kHz mono
· 10 -- AAC
Bits 5-6 give the sample rate:
· 0 -- 5.5 kHz
· 1 -- 11 kHz
· 2 -- 22 kHz
· 3 -- 44 kHz
Bit 7 gives the sample size:
· 0 -- snd8Bit
· 1 -- snd16Bit
Bit 8 (the lowest bit) gives the channel type:
· 0 -- sndMono
· 1 -- sndStereo
The actual audio data follows this byte.
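The bit layout of this first audio byte can be decoded as in the sketch below. The struct and function names (`flv_audio_info`, `parse_flv_audio_byte`) are illustrative assumptions; only the bit positions come from the description above.

```c
#include <stdint.h>

/* Fields packed into the first byte of FLV audio tag data (illustrative type). */
typedef struct {
    int format;     /* high 4 bits: 0 uncompressed, 1 ADPCM, 2 MP3, 10 AAC, ... */
    int rate_index; /* next 2 bits: 0 = 5.5 kHz, 1 = 11 kHz, 2 = 22 kHz, 3 = 44 kHz */
    int is_16bit;   /* next bit: 0 = 8-bit samples, 1 = 16-bit samples */
    int is_stereo;  /* lowest bit: 0 = mono, 1 = stereo */
} flv_audio_info;

/* Split the first audio data byte into its four fields. */
flv_audio_info parse_flv_audio_byte(uint8_t b)
{
    flv_audio_info info;
    info.format     = (b >> 4) & 0x0f;
    info.rate_index = (b >> 2) & 0x03;
    info.is_16bit   = (b >> 1) & 0x01;
    info.is_stereo  = b & 0x01;
    return info;
}
```

For instance, the byte 0xAF (binary 1010 1111) decodes to format 10 (AAC), rate index 3 (44 kHz), 16-bit samples, stereo.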
7.2.2.2 Tag data is video
If the tag data is video, the first byte records the video information:
Bits 1-4 (the high 4 bits) give the frame type:
· 1 -- keyframe
· 2 -- inter frame
· 3 -- disposable inter frame (H.263 only)
· 4 -- generated keyframe
Bits 5-8 (the low 4 bits) give the codec ID:
· 2 -- Sorenson H.263
· 3 -- Screen video
· 4 -- On2 VP6
· 5 -- On2 VP6 with alpha channel
· 6 -- Screen video version 2