If you want to learn something new, the official specification is often the quickest route. Summaries written by others on the web may get you started faster, but if you want accurate, thorough, and complete information, read the official specification. What follows is a summary of my analysis of the official document, the Video File Format Specification Version 10. Along the way, an actual FLV file converted with ffmpeg was examined as a case study.

In an FLV file, each tag type belongs to a single stream. That is, an FLV file contains at most one audio stream and one video stream; a file cannot hold multiple independent audio or video streams (MP4, by contrast, seems to allow this). In addition, the FLV file format uses big-endian byte order.

Note on the data types below: UI denotes an unsigned integer, and the number after it gives its width in bits. For example, UI8 is an unsigned integer one byte long, and UI24 is three bytes long. UB denotes a bit field; UB5 means 5 bits of a byte. Bit-field structures in C work the same way.
FLV Header
Field | Type | Comment
--- | --- | ---
Signature | UI8 | 'F' (0x46)
Signature | UI8 | 'L' (0x4C)
Signature | UI8 | 'V' (0x56)
Version | UI8 | FLV version; 0x01 means version 1
Reserved | UB5 | These first five bits must be 0
Audio flag | UB1 | Indicates whether an audio stream is present
Reserved | UB1 | Must be 0
Video flag | UB1 | Indicates whether a video stream is present
Header size | UI32 | For FLV version 1 this is 9: the size of the FLV header in bytes, including these four bytes. It allows the header to be extended in later FLV versions. The body starts at this offset from the beginning of the file.
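To make the bit layout concrete, here is a minimal Python sketch that decodes the 9-byte header. It assumes the header is already in a bytes object. The names `FlvHeader` and `parse_flv_header` are hypothetical and do not come from the specification.

```python
import struct
from typing import NamedTuple


class FlvHeader(NamedTuple):
    version: int
    has_audio: bool
    has_video: bool
    data_offset: int  # offset of the body (PreviousTagSize0) from file start


def parse_flv_header(buf: bytes) -> FlvHeader:
    """Decode the FLV header fields listed in the table above."""
    if len(buf) < 9:
        raise ValueError("FLV header needs at least 9 bytes")
    if buf[:3] != b"FLV":
        raise ValueError("missing 'FLV' signature")
    version = buf[3]
    flags = buf[4]  # bits 7..3 reserved, bit 2 audio, bit 1 reserved, bit 0 video
    has_audio = bool(flags & 0x04)
    has_video = bool(flags & 0x01)
    # ">I" is a big-endian unsigned 32-bit integer, matching FLV byte order.
    (data_offset,) = struct.unpack(">I", buf[5:9])
    return FlvHeader(version, has_audio, has_video, data_offset)


# A version-1 header with both the audio and video flags set (flags byte 0x05).
hdr = parse_flv_header(b"FLV\x01\x05\x00\x00\x00\x09")
print(hdr)
```

Using `data_offset` rather than a hard-coded 9 to locate the body keeps the reader working if a later version extends the header, which is the purpose the table gives for that field.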
FLV File Body
The body consists of a sequence of tags. Each tag is followed by a 4-byte field, PreviousTagSize, that records the size of that tag including its 11-byte header. This field allows the file to be read backwards. The layout is: PreviousTagSize0, Tag1, PreviousTagSize1, Tag2, PreviousTagSize2, and so on. Note: the four bytes immediately after the header are PreviousTagSize0. Because there is no preceding tag, their value is 0.
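Each PreviousTagSize counts the tag's 11-byte header plus its data, so tags can be located starting from the end of the file. The sketch below assumes a complete file held in memory. `tag_offsets_backwards` and `make_tag` are hypothetical helpers, and the two tags are synthetic.

```python
import struct
from typing import Iterator


def tag_offsets_backwards(flv: bytes, data_offset: int = 9) -> Iterator[int]:
    """Yield tag start offsets from the last tag to the first.

    Each tag is followed by a big-endian UI32 PreviousTagSize equal to
    11 (tag header) + DataSize, so stepping back 4 + that size from the
    end of the field lands on the start of the tag.
    """
    pos = len(flv)  # just past the final PreviousTagSize
    while pos - 4 > data_offset:  # stop once only PreviousTagSize0 remains
        (prev_size,) = struct.unpack(">I", flv[pos - 4:pos])
        if prev_size == 0:
            break
        pos = pos - 4 - prev_size
        yield pos


def make_tag(tag_type: int, payload: bytes) -> bytes:
    """Build a tag: type, UI24 data size, then 7 zero bytes (timestamp, extension, stream ID)."""
    return bytes([tag_type]) + len(payload).to_bytes(3, "big") + bytes(7) + payload


# Synthetic file: header, PreviousTagSize0 = 0, then two tags, each followed by its size.
flv = b"FLV\x01\x05\x00\x00\x00\x09" + bytes(4)
offsets = []
for tag in (make_tag(9, b"\x17\x01"), make_tag(8, b"\xaf\x01\x21")):
    offsets.append(len(flv))
    flv += tag + struct.pack(">I", len(tag))

print(list(tag_offsets_backwards(flv)))
```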
FLV Tag Structure
Field | Type | Comment
--- | --- | ---
Tag type | UI8 | 8: audio; 9: video; 18: script data (descriptive information). All other values are reserved.
Data size | UI24 | Size of the data area, not including the tag header. The tag header is 11 bytes in total.
Timestamp | UI24 | Timestamp of this tag in milliseconds, relative to the first tag in the FLV file, whose timestamp is always 0. This is an absolute timestamp, not an increment (unlike RTMP, where the timestamp can be a delta).
Timestamp extended | UI8 | Used when the timestamp exceeds 0xFFFFFF. This byte holds the upper 8 bits of the timestamp, and the preceding three bytes hold the lower 24 bits.
Stream ID | UI24 | Always 0
Data | UI8[n] | Audio, video, or script data, depending on the tag type
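A minimal sketch of decoding the 11-byte tag header, including how the extension byte is combined with the lower 24 bits of the timestamp. `TagHeader` and `parse_tag_header` are hypothetical names. The sample bytes are the video tag header from the hex dump analysed at the end of this article.

```python
from typing import NamedTuple


class TagHeader(NamedTuple):
    tag_type: int      # 8 = audio, 9 = video, 18 = script data
    data_size: int     # bytes of data after the 11-byte header
    timestamp_ms: int  # full 32-bit timestamp (extension byte applied)
    stream_id: int     # always 0


def parse_tag_header(buf: bytes, pos: int = 0) -> TagHeader:
    """Decode the 11-byte tag header starting at buf[pos]."""
    h = buf[pos:pos + 11]
    if len(h) < 11:
        raise ValueError("truncated tag header")
    data_size = int.from_bytes(h[1:4], "big")   # UI24
    low24 = int.from_bytes(h[4:7], "big")       # Timestamp, lower 24 bits
    high8 = h[7]                                # Timestamp extended, upper 8 bits
    stream_id = int.from_bytes(h[8:11], "big")  # UI24
    return TagHeader(h[0], data_size, (high8 << 24) | low24, stream_id)


# Video tag, data size 0x33 = 51, timestamp 0, stream ID 0.
print(parse_tag_header(bytes.fromhex("0900003300000000000000")))
```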
Audio Data
Field | Type | Comment
--- | --- | ---
Sound format | UB4 | 0 = Linear PCM, platform endian; 1 = ADPCM; 2 = MP3; 3 = Linear PCM, little endian; 4 = Nellymoser 16 kHz mono; 5 = Nellymoser 8 kHz mono; 6 = Nellymoser; 7 = G.711 A-law logarithmic PCM; 8 = G.711 mu-law logarithmic PCM; 9 = reserved; 10 = AAC; 11 = Speex; 14 = MP3 8 kHz; 15 = Device-specific sound. Formats 7, 8, 14, and 15 are reserved for internal use, so FLV does not support G.711 A-law. If it is needed, linear PCM can be used instead.
Sample rate | UB2 | 0 = 5.5 kHz; 1 = 11 kHz; 2 = 22 kHz; 3 = 44 kHz. For AAC, always 3.
Sample size | UB1 | 0 = snd8Bit; 1 = snd16Bit
Channels | UB1 | 0 = mono; 1 = stereo. For AAC, always 1.
Sound data | UI8[n] | For little-endian linear PCM (format 3), 16-bit samples are stored as signed little-endian values. If the sound format is AAC, this field is AAC audio data (AACAUDIODATA); otherwise it is the raw sound data array.
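The four audio fields share a single byte, with the highest bits first. The sketch below unpacks that byte with shifts and masks. `AudioInfo` and `parse_audio_byte` are hypothetical names. The byte 0xAF decodes as AAC, 44 kHz, 16-bit, stereo, which agrees with the AAC rules in the table.

```python
from typing import NamedTuple

SOUND_FORMATS = {
    0: "Linear PCM, platform endian", 1: "ADPCM", 2: "MP3",
    3: "Linear PCM, little endian", 4: "Nellymoser 16 kHz mono",
    5: "Nellymoser 8 kHz mono", 6: "Nellymoser", 7: "G.711 A-law",
    8: "G.711 mu-law", 10: "AAC", 11: "Speex", 14: "MP3 8 kHz",
    15: "Device-specific sound",
}
SAMPLE_RATES = ("5.5 kHz", "11 kHz", "22 kHz", "44 kHz")


class AudioInfo(NamedTuple):
    sound_format: str
    sample_rate: str
    sample_bits: int
    stereo: bool


def parse_audio_byte(b: int) -> AudioInfo:
    """Split the first byte of an audio tag's data into its bit fields."""
    fmt = (b >> 4) & 0x0F     # UB4 sound format
    rate = (b >> 2) & 0x03    # UB2 sample rate index
    size16 = (b >> 1) & 0x01  # UB1: 0 = 8-bit, 1 = 16-bit
    stereo = b & 0x01         # UB1: 0 = mono, 1 = stereo
    return AudioInfo(SOUND_FORMATS.get(fmt, "reserved"), SAMPLE_RATES[rate],
                     16 if size16 else 8, bool(stereo))


print(parse_audio_byte(0xAF))
```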
AAC AUDIO DATA
Video Data
Field | Type | Comment
--- | --- | ---
Frame type | UB4 | 1: keyframe (for AVC, a seekable frame), i.e. an H.264 IDR frame at which decoding can start; 2: inter frame (for AVC, a non-seekable frame), i.e. an ordinary H.264 frame; 3: disposable inter frame (H.263 only); 4: generated keyframe (reserved for server use only); 5: video info/command frame
Codec ID | UB4 | The codec used: 1: JPEG (currently unused); 2: Sorenson H.263; 3: Screen video; 4: On2 VP6; 5: On2 VP6 with alpha channel; 6: Screen video version 2; 7: AVC
Video data | UI8[n] | If the codec is AVC, this is an AVCVIDEOPACKET, described below
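Frame type and codec ID are the two nibbles of the first data byte. A small sketch follows; `parse_video_byte` is a hypothetical name. It shows why AVC keyframe tags start with 0x17 and AVC inter-frame tags start with 0x27.

```python
FRAME_TYPES = {
    1: "keyframe", 2: "inter frame", 3: "disposable inter frame",
    4: "generated keyframe", 5: "video info/command frame",
}
CODEC_IDS = {
    1: "JPEG", 2: "Sorenson H.263", 3: "Screen video", 4: "On2 VP6",
    5: "On2 VP6 with alpha channel", 6: "Screen video version 2", 7: "AVC",
}


def parse_video_byte(b: int) -> tuple[str, str]:
    """Return (frame type, codec) from the first byte of a video tag's data."""
    frame_type = (b >> 4) & 0x0F  # UB4, high nibble
    codec_id = b & 0x0F           # UB4, low nibble
    return FRAME_TYPES.get(frame_type, "reserved"), CODEC_IDS.get(codec_id, "unknown")


print(parse_video_byte(0x17))  # first byte of an AVC keyframe tag
print(parse_video_byte(0x27))  # first byte of an AVC inter-frame tag
```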
AVCVIDEOPACKET
Field | Type | Comment
--- | --- | ---
AVC packet type | UI8 | 0: AVC sequence header; 1: AVC NALU; 2: AVC end of sequence (the lower-level NALU sequence end marker is not required or supported)
CTS | SI24 | If the AVC packet type is 1, the composition time offset (see the explanation below); otherwise 0
Data | UI8[n] | If the AVC packet type is 0, the decoder configuration (AVCDecoderConfigurationRecord), which contains the SPS and PPS. If it is 1, one or more NALUs, in the format described below.
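CTS is a signed 24-bit field, so it must be read as signed. Otherwise a negative offset turns into a huge positive number. The sketch below reads the packet header and applies the relationship explained in the CTS discussion that follows: it takes the tag timestamp as the DTS, so PTS = DTS + CTS, and CTS = (PTS - DTS) / 90 for 90 kHz timestamps. All function names are hypothetical.

```python
from typing import NamedTuple


class AvcPacket(NamedTuple):
    packet_type: int  # 0 = sequence header, 1 = NALU, 2 = end of sequence
    cts_ms: int       # composition time offset in milliseconds (signed)
    data: bytes


def parse_avc_packet(video_data: bytes) -> AvcPacket:
    """Parse an AVCVIDEOPACKET; video_data starts after the FrameType/CodecID byte."""
    if len(video_data) < 4:
        raise ValueError("AVCVIDEOPACKET needs at least 4 bytes")
    packet_type = video_data[0]
    # SI24: read as a signed big-endian integer so negative offsets survive.
    cts = int.from_bytes(video_data[1:4], "big", signed=True)
    return AvcPacket(packet_type, cts, video_data[4:])


def presentation_time_ms(tag_timestamp_ms: int, cts_ms: int) -> int:
    """Assumes the tag timestamp is the DTS, so PTS = DTS + CTS."""
    return tag_timestamp_ms + cts_ms


def cts_from_90khz(pts_ticks: int, dts_ticks: int) -> int:
    """CTS = (PTS - DTS) / 90: 90 kHz ticks to whole milliseconds (floor division)."""
    return (pts_ticks - dts_ticks) // 90


pkt = parse_avc_packet(bytes.fromhex("01000050") + b"\x65")
print(pkt.cts_ms, presentation_time_ms(1000, pkt.cts_ms))
```

For a Baseline stream, CTS is always 0, so `presentation_time_ms` simply returns the tag timestamp.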
About CTS: this concept is harder to grasp and needs to be understood together with PTS and DTS. First, the definitions of PTS (presentation timestamp), DTS (decoding timestamp), and CTS (composition time):

PTS: the time at which the receiver displays the frame. The unit is 1/90000 s (a 90 kHz clock).
DTS: the decoding time, which indicates the order in which frames are decoded. The unit is also 1/90000 s.

In this understanding, the composition time in the specification is the PTS, and the composition time offset is CTS = (PTS - DTS) / 90. Dividing by 90 converts 90 kHz ticks to milliseconds, so CTS is in milliseconds.

PTS and DTS differ only when the stream contains B-frames, that is, in Main profile and above. Baseline profile has no B-frames, so its PTS and DTS are always equal and CTS is always 0. The timestamp in an FLV tag is the DTS.

ISO/IEC 14496-12:2005(E), section 8.15 "Time to Sample Boxes", suggests that composition time is simply another term for the presentation timestamp (this needs further confirmation). In the example figure in that section, CT is the PTS (display time) and DT is the decoding timestamp. I1 is the first frame and B2 the second; the numbers give the camera output order, which determines display order. DT reflects the encoding order. When B-frames are present, P4 must be decoded second, because B2 and B3 depend on it, yet P4 is displayed after B3 because it comes later in display order. This is why the display time CT (PTS) differs from the decoding time DT and a CT offset is needed: P4 is decoded at time 10 but displayed at time 40.

Data format of an AVCVIDEOPACKET carrying NALUs:
Field | Type | Comment
--- | --- | ---
Length | UI32 | Length of the NALU, not including this length field
NALU data | UI8[n] | NALU bytes without the 4-byte start code (00 00 00 01), beginning directly with the H.264 NAL header, for example: 65 xx xx ...
Length | UI32 | Length of the NALU, not including this length field
NALU data | UI8[n] | NALU bytes without the 4-byte start code (00 00 00 01), beginning directly with the H.264 NAL header, for example: 65 xx xx ...
... | ... | ...
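Because each NALU is prefixed by its length rather than a start code, a reader walks the data field length by length. The sketch below assumes 4-byte lengths, as in the table. It also shows the common conversion to start-code (Annex B) form and how the NAL header byte 0x65 yields type 5 (IDR slice). `split_nalus`, `nal_unit_type`, and `to_annex_b` are hypothetical names, and the payload bytes are made up.

```python
def split_nalus(data: bytes, length_size: int = 4) -> list[bytes]:
    """Split length-prefixed NALU data into individual NAL units.

    length_size defaults to 4, matching the UI32 Length field; the actual size
    is lengthSizeMinusOne + 1 from the AVCDecoderConfigurationRecord.
    """
    nalus = []
    pos = 0
    while pos + length_size <= len(data):
        n = int.from_bytes(data[pos:pos + length_size], "big")
        pos += length_size
        if pos + n > len(data):
            raise ValueError("NALU length runs past the end of the data")
        nalus.append(data[pos:pos + n])
        pos += n
    return nalus


def nal_unit_type(nalu: bytes) -> int:
    """Low 5 bits of the NAL header: 1 = non-IDR slice, 5 = IDR slice, 7 = SPS, 8 = PPS."""
    return nalu[0] & 0x1F


def to_annex_b(nalus: list[bytes]) -> bytes:
    """Prefix each NALU with the start code 00 00 00 01 instead of a length."""
    return b"".join(b"\x00\x00\x00\x01" + n for n in nalus)


packet_data = bytes.fromhex("00000003 65 88 84 00000002 41 9a")
units = split_nalus(packet_data)
print([nal_unit_type(u) for u in units])
```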
Script data tags: the main thing to pay attention to is the onMetaData information.
AVCDecoderConfigurationRecord
This is the data of an AVCVIDEOPACKET whose packet type is 0. It carries the decoder control information, namely the SPS and PPS. It usually appears in the second tag, immediately after onMetaData. A typical sequence:

    0000190: 0900 0033 0000 0000 0000 0017 0000 0000  ...3............
    00001a0: 0164 002a ffe1 001e 6764 002a acd9 4078  .d.*....gd.*..@x
    00001b0: 0227 e5ff c389 4388 0400 0003 0028 0000  .'....C......(..
    00001c0: 0978 3c60 c658 0100 0568 ebec b22c 0000  .x<`.X...h...,..

09 00 00 33 00 00 00 00 00 00 00: the tag header (video tag, data size 0x33 = 51, timestamp 0, stream ID 0).
17: frame type 1 (keyframe) and codec ID 7 (AVC), i.e. H.264 IDR data.
00: AVC packet type 0, indicating the AVC sequence header.
00 00 00: CTS is 0.

AVCDecoderConfigurationRecord:
01: version number.
64 00 2a: profile, compatibility, and level, the same three bytes that follow the NAL header in the SPS. 0x64 means H.264 High profile, and 0x2a is the level (42, i.e. level 4.2).
ff: the low two bits are lengthSizeMinusOne = 3, so every NALU length field in the later AVC NALU packets is 4 bytes (the UI32 Length field in the NALU table).
e1: the low five bits give the number of SPS entries; here one SPS follows.
00 1e: the two-byte SPS length; the SPS that follows is 0x1e = 30 bytes.
6764 002a acd9 4078 0227 e5ff c389 4388 0400 0003 0028 0000 0978 3c60 c658: the SPS data. Because there is only one SPS, skipping this length leads to the PPS information.
01: number of PPS entries, 1.
00 05: the PPS is 5 bytes long.
68 eb ec b2 2c: the PPS data.
00 00 ...: the start of the 4-byte PreviousTagSize that follows this tag (it should be 11 + 51 = 62, i.e. 0x0000003e), after which the next tag begins.
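The walk-through above can be checked mechanically. Below is a minimal sketch of parsing the record, whose layout is defined in ISO/IEC 14496-15. It covers only the fields discussed here and ignores the extra High-profile fields that some encoders append. `AvcConfig`, `parse_avc_config`, and `_read_param_sets` are hypothetical names. The input is the 46 bytes from 0x1a0 through 0x1cd of the hex dump.

```python
from typing import NamedTuple


class AvcConfig(NamedTuple):
    version: int
    profile: int           # 100 (0x64) = High profile
    compatibility: int
    level: int             # 42 (0x2a) = level 4.2
    nalu_length_size: int  # lengthSizeMinusOne + 1
    sps: list
    pps: list


def _read_param_sets(rec: bytes, pos: int, count: int) -> tuple[list, int]:
    """Read `count` entries of (UI16 length, bytes); return them and the new position."""
    sets = []
    for _ in range(count):
        n = int.from_bytes(rec[pos:pos + 2], "big")
        sets.append(rec[pos + 2:pos + 2 + n])
        pos += 2 + n
    return sets, pos


def parse_avc_config(rec: bytes) -> AvcConfig:
    nalu_length_size = (rec[4] & 0x03) + 1  # 0xff -> 3 + 1 = 4
    num_sps = rec[5] & 0x1F                 # 0xe1 -> 1
    sps, pos = _read_param_sets(rec, 6, num_sps)
    num_pps = rec[pos]
    pps, _ = _read_param_sets(rec, pos + 1, num_pps)
    return AvcConfig(rec[0], rec[1], rec[2], rec[3], nalu_length_size, sps, pps)


# Bytes 0x1a0..0x1cd of the dump: the record without the 5-byte packet prefix.
RECORD = bytes.fromhex(
    "0164002affe1001e6764002aacd94078"
    "0227e5ffc38943880400000300280000"
    "09783c60c65801000568ebecb22c"
)
cfg = parse_avc_config(RECORD)
print(cfg.profile, cfg.level, cfg.nalu_length_size, len(cfg.sps[0]), cfg.pps[0].hex())
```

The 5 prefix bytes plus these 46 bytes add up to the tag's data size of 51, which is a useful sanity check when reading a dump by hand.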