Overview
Flash Video (FLV) is a popular network video format. At present, most video sharing sites at home and abroad use this format.
File Structure
Viewed as a whole, an FLV file consists of the FLV header and the FLV file body.
1. The FLV Header
Field | Type | Comment
Signature | UI8 | Signature byte, always 'F' (0x46)
Signature | UI8 | Signature byte, always 'L' (0x4C)
Signature | UI8 | Signature byte, always 'V' (0x56)
Version | UI8 | File version (for example, 0x01 for FLV version 1)
TypeFlagsReserved | UB[5] | Shall be 0
TypeFlagsAudio | UB[1] | 1 = audio tags are present
TypeFlagsReserved | UB[1] | Shall be 0
TypeFlagsVideo | UB[1] | 1 = video tags are present
DataOffset | UI32 | The length of this header in bytes
Signature: The first 3 bytes of an FLV file are always 'F', 'L', 'V', which identify the file as FLV. When probing the format, a file whose first 3 bytes are "FLV" is treated as an FLV file.
Version: The 4th byte is the FLV version number.
Flags: Bit 0 and bit 2 of the 5th byte indicate the presence of video and audio, respectively (1 means present, 0 means not present).
DataOffset: The last 4 bytes give the length of the FLV header in bytes, which is usually 9 for FLV version 1.
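As a rough illustration of the header fields just described, here is a minimal Python sketch of a format probe. The function name parse_flv_header and the dictionary keys are hypothetical names chosen for this example, not part of any library.

```python
import struct

def parse_flv_header(data: bytes) -> dict:
    """Decode the 9-byte FLV header: signature, version, flags, DataOffset."""
    if len(data) < 9:
        raise ValueError("need at least 9 bytes for an FLV header")
    # ">3sBBI": 3 signature bytes, version, flags byte, big-endian UI32 offset
    signature, version, flags, data_offset = struct.unpack(">3sBBI", data[:9])
    if signature != b"FLV":
        raise ValueError("not an FLV file")
    return {
        "version": version,
        "has_video": bool(flags & 0x01),  # bit 0 of the 5th byte
        "has_audio": bool(flags & 0x04),  # bit 2 of the 5th byte
        "data_offset": data_offset,
    }

# A hand-built header: version 1, audio and video present, 9-byte header.
print(parse_flv_header(b"FLV\x01\x05\x00\x00\x00\x09"))
```

The file body starts at data_offset, so a demuxer would seek there rather than assume a fixed header size.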
2. The FLV File Body
Field | Type | Comment
PreviousTagSize0 | UI32 | Always 0
Tag1 | FLVTAG | First tag
PreviousTagSize1 | UI32 | Size of previous tag, including its header, in bytes. For FLV version 1, this value is 11 plus the DataSize of the previous tag.
Tag2 | FLVTAG | Second tag
... | ... | ...
PreviousTagSizeN-1 | UI32 | Size of second-to-last tag, including its header, in bytes
TagN | FLVTAG | Last tag
PreviousTagSizeN | UI32 | Size of last tag, including its header, in bytes
The FLV file body follows the FLV header.
The FLV file body is a series of back-pointers and tags. Each back-pointer is a 4-byte value giving the size of the previous tag.
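The alternation of back-pointers and tags can be walked with a short loop. The following Python sketch assumes FLV version 1, where every tag starts with an 11-byte header whose bytes 1 to 3 hold DataSize, so each back-pointer should equal 11 plus the DataSize of the tag before it. The name iter_tags and the sample buffer are made up for illustration.

```python
import struct

def iter_tags(buf: bytes, data_offset: int = 9):
    """Yield (tag_type, payload) for each tag, checking every back-pointer."""
    pos = data_offset
    expected_prev = 0  # PreviousTagSize0 is always 0
    while pos + 4 <= len(buf):
        (prev_size,) = struct.unpack_from(">I", buf, pos)
        if prev_size != expected_prev:
            raise ValueError(f"bad back-pointer at offset {pos}")
        pos += 4
        if pos + 11 > len(buf):
            break  # only the final back-pointer was left
        tag_type = buf[pos] & 0x1F
        data_size = int.from_bytes(buf[pos + 1:pos + 4], "big")
        yield tag_type, buf[pos + 11:pos + 11 + data_size]
        expected_prev = 11 + data_size  # tag header plus payload
        pos += expected_prev

# Hand-built file: header, PreviousTagSize0, one audio tag, PreviousTagSize1.
tag = bytes([8]) + (3).to_bytes(3, "big") + bytes(7) + b"abc"
sample = (b"FLV\x01\x05\x00\x00\x00\x09" + (0).to_bytes(4, "big")
          + tag + len(tag).to_bytes(4, "big"))
print(list(iter_tags(sample)))
```

Because each back-pointer gives the size of the preceding tag, the same fields also allow walking the file backwards from the end.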
FLV Tag Definition
The data in an FLV file is made up of a sequence of tags. The data in a tag may be video, audio, or script data.
The following table is the structure of the tag:
1. FLVTAG
Field | Type | Comment
Reserved | UB[2] | Reserved for FMS, should be 0
Filter | UB[1] | Indicates if packets are filtered. 0 = no pre-processing required. 1 = pre-processing (such as decryption) of the packet is required before it can be rendered. Shall be 0 in unencrypted files, and 1 for encrypted tags. See Annex F. FLV Encryption for the use of filters.
TagType | UB[5] | Type of contents in this tag. The following types are defined: 8 = audio; 9 = video; 18 = script data
DataSize | UI24 | Length of the message. Number of bytes after StreamID to end of tag (equal to length of the tag minus 11)
Timestamp | UI24 | Time in milliseconds at which the data in this tag applies. This value is relative to the first tag in the FLV file, which always has a timestamp of 0.
TimestampExtended | UI8 | Extension of the Timestamp field to form an SI32 value. This field represents the upper 8 bits, while the previous Timestamp field represents the lower 24 bits of the time in milliseconds.
StreamID | UI24 | Always 0
AudioTagHeader | IF TagType == 8 AudioTagHeader |
VideoTagHeader | IF TagType == 9 VideoTagHeader |
EncryptionHeader | IF Filter == 1 EncryptionTagHeader |
FilterParams | IF Filter == 1 FilterParams |
Data | IF TagType == 8 AUDIODATA; IF TagType == 9 VIDEODATA; IF TagType == 18 SCRIPTDATA | Data specific for each media type
TagType: The low 5 bits of the 1st byte of the tag indicate the type of data in this tag: 8 = audio, 9 = video, 18 = script data.
DataSize: The length of the data after StreamID.
Timestamp and TimestampExtended together make up the PTS of the tag. The first time I wrote an FLV demuxer, I ignored TimestampExtended and used Timestamp directly as the PTS, and the result was that the picture skipped frames. Only after rereading the document carefully did I find that the real PTS is PTS = Timestamp | (TimestampExtended << 24).
What follows StreamID differs for each tag type. Each format is described in detail below.
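To make the PTS formula concrete, here is a small Python sketch that decodes the 11 fixed bytes at the start of a tag. The function name parse_tag_header and the key names are hypothetical, and for simplicity the sketch treats the combined timestamp as an unsigned value rather than a true SI32.

```python
def parse_tag_header(header: bytes) -> dict:
    """Decode the 11 bytes that start every FLV tag."""
    if len(header) < 11:
        raise ValueError("an FLV tag header is 11 bytes")
    first = header[0]
    timestamp_low = int.from_bytes(header[4:7], "big")  # Timestamp (UI24)
    timestamp_ext = header[7]                           # TimestampExtended (UI8)
    return {
        "filter": (first >> 5) & 0x01,
        "tag_type": first & 0x1F,  # low 5 bits: 8, 9 or 18
        "data_size": int.from_bytes(header[1:4], "big"),
        # The extension byte supplies bits 24..31 of the timestamp.
        "pts_ms": timestamp_low | (timestamp_ext << 24),
        "stream_id": int.from_bytes(header[8:11], "big"),
    }

# Video tag, 5 payload bytes, Timestamp = 1, TimestampExtended = 1.
info = parse_tag_header(bytes([0x09, 0, 0, 5, 0, 0, 1, 1, 0, 0, 0]))
print(info["pts_ms"])  # (1 << 24) | 1
```

Ignoring the extension byte only goes wrong once the timestamp exceeds 24 bits, roughly 4.6 hours of milliseconds, which is why the bug can stay hidden on short clips.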
Audio Tags
If TagType == 8 in the tag, the tag carries audio.
The data after StreamID is the AudioTagHeader, whose structure is as follows:
Field | Type | Comment
SoundFormat | UB[4] | Format of SoundData. The following values are defined: 0 = Linear PCM, platform endian; 1 = ADPCM; 2 = MP3; 3 = Linear PCM, little endian; 4 = Nellymoser 16 kHz mono; 5 = Nellymoser 8 kHz mono; 6 = Nellymoser; 7 = G.711 A-law logarithmic PCM; 8 = G.711 mu-law logarithmic PCM; 9 = reserved; 10 = AAC; 11 = Speex; 14 = MP3 8 kHz; 15 = Device-specific sound. Formats 7, 8, 14, and 15 are reserved. AAC is supported in Flash Player 9,0,115,0 and higher. Speex is supported in Flash Player 10 and higher.
SoundRate | UB[2] | Sampling rate. The following values are defined: 0 = 5.5 kHz; 1 = 11 kHz; 2 = 22 kHz; 3 = 44 kHz
SoundSize | UB[1] | Size of each audio sample. This parameter only pertains to uncompressed formats. Compressed formats always decode to 16 bits internally. 0 = 8-bit samples; 1 = 16-bit samples
SoundType | UB[1] | Mono or stereo sound. 0 = mono sound; 1 = stereo sound
AACPacketType | IF SoundFormat == 10 UI8 | The following values are defined: 0 = AAC sequence header; 1 = AAC raw
The first byte of the AudioTagHeader, that is, the byte immediately after StreamID, contains basic information such as the audio format and sample rate. The table makes this clear.
The AudioTagHeader is followed by AUDIODATA, that is, the audio payload. There is one special case: if the audio format (SoundFormat) is 10 = AAC, the AudioTagHeader has 1 more byte, AACPacketType, which gives the type of the AACAUDIODATA that follows: 0 = AAC sequence header, 1 = AAC raw.
Field | Type | Comment
Data | IF AACPacketType == 0 AudioSpecificConfig | The AudioSpecificConfig is defined in ISO 14496-3. Note that this is not the same as the contents of the esds box from an MP4/F4V file.
 | ELSE IF AACPacketType == 1 Raw AAC frame data in UI8[] | Audio payload
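The bit layout of the AudioTagHeader can be unpacked with a few shifts and masks. The Python sketch below takes the bytes that follow StreamID. The name parse_audio_tag and the dictionary keys are hypothetical, and the sample rates are the nominal values 5.5, 11, 22 and 44 kHz from the FLV specification.

```python
SOUND_RATES_KHZ = (5.5, 11, 22, 44)  # indexed by the 2-bit SoundRate field

def parse_audio_tag(body: bytes) -> dict:
    """Split an audio tag body into AudioTagHeader fields and payload."""
    first = body[0]
    info = {
        "sound_format": first >> 4,                      # UB[4]
        "sound_rate_khz": SOUND_RATES_KHZ[(first >> 2) & 0x03],
        "bits_per_sample": 16 if first & 0x02 else 8,    # SoundSize
        "stereo": bool(first & 0x01),                    # SoundType
    }
    offset = 1
    if info["sound_format"] == 10:  # AAC carries one extra header byte
        info["aac_packet_type"] = body[1]  # 0 = sequence header, 1 = raw
        offset = 2
    info["payload"] = body[offset:]
    return info

# 0xAF = AAC, 44 kHz, 16-bit, stereo; packet type 0 = AAC sequence header.
audio = parse_audio_tag(bytes([0xAF, 0x00, 0x12, 0x10]))
print(audio)
```

For AAC the payload is either an AudioSpecificConfig or a raw frame, depending on aac_packet_type.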
The AAC sequence header contains the AudioSpecificConfig, which carries more detailed audio information. AudioSpecificConfig is defined in ISO 14496-3, section 1.6.2.1, so it is not reproduced here. ffmpeg parses it in ff_mpeg4audio_get_config(); reading that function alongside the standard gives a deeper understanding.
AAC raw is the audio elementary stream (ES), that is, the payload.
In an FLV file, the AAC sequence header generally appears only once, as the first audio tag. This tag matters when writing an FLV demuxer: for AAC audio, a 7-byte ADTS header must be added in front of each AAC ES frame. ADTS is an audio framing format that decoders commonly accept, so the raw AAC ES has to be packaged as ADTS for the decoder to play it properly. Building the ADTS header requires samplingFrequencyIndex, and the most accurate source for it is the AudioSpecificConfig, so the AudioSpecificConfig is parsed to obtain samplingFrequencyIndex.
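The ADTS packaging step can be sketched as follows. This is a minimal Python illustration, not a complete implementation: it assumes the common short form of AudioSpecificConfig (audioObjectType below 31 and samplingFrequencyIndex other than 15), an ADTS header without CRC, and buffer fullness set to 0x7FF. The function names are hypothetical.

```python
def parse_audio_specific_config(asc: bytes) -> tuple:
    """Return (audio_object_type, sampling_frequency_index, channel_config)."""
    aot = asc[0] >> 3                              # 5 bits
    sfi = ((asc[0] & 0x07) << 1) | (asc[1] >> 7)   # 4 bits
    channels = (asc[1] >> 3) & 0x0F                # 4 bits
    if aot == 31 or sfi == 15:
        raise ValueError("extended AudioSpecificConfig not handled in this sketch")
    return aot, sfi, channels

def adts_header(aot: int, sfi: int, channels: int, payload_len: int) -> bytes:
    """Build a 7-byte ADTS header (no CRC) for one raw AAC frame."""
    if not 1 <= aot <= 4 or channels > 7:
        raise ValueError("value does not fit the ADTS header fields")
    frame_len = payload_len + 7  # the length field includes the header
    profile = aot - 1            # ADTS stores object type minus one
    return bytes([
        0xFF,                                          # syncword, high 8 bits
        0xF1,                                          # syncword low 4, MPEG-4, layer 0, no CRC
        (profile << 6) | (sfi << 2) | (channels >> 2),
        ((channels & 0x03) << 6) | (frame_len >> 11),
        (frame_len >> 3) & 0xFF,
        ((frame_len & 0x07) << 5) | 0x1F,              # buffer fullness, high 5 bits
        0xFC,                                          # buffer fullness low 6 bits, 1 frame
    ])

# 0x12 0x10 describes AAC LC, index 4 (44.1 kHz), 2 channels.
config = parse_audio_specific_config(b"\x12\x10")
print(adts_header(*config, payload_len=100).hex())
```

A demuxer would call adts_header once per AAC raw tag and write the header followed by that tag's payload.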
At this point, the audio information and data can be fully extracted from the FLV file and sent to the audio decoder for normal playback.
Video Tags
If TagType == 9 in the tag, the tag carries video.
The data after StreamID is the VideoTagHeader, whose structure is as follows:
Field | Type | Comment
FrameType | UB[4] | Type of video frame. The following values are defined: 1 = key frame (for AVC, a seekable frame); 2 = inter frame (for AVC, a non-seekable frame); 3 = disposable inter frame (H.263 only); 4 = generated key frame (reserved for server use only); 5 = video info/command frame
CodecID | UB[4] | Codec identifier. The following values are defined: 2 = Sorenson H.263; 3 = Screen video; 4 = On2 VP6; 5 = On2 VP6 with alpha channel; 6 = Screen video version 2; 7 = AVC
AVCPacketType | IF CodecID == 7 UI8 | The following values are defined: 0 = AVC sequence header; 1 = AVC NALU; 2 = AVC end of sequence (lower level NALU sequence ender is not required or supported)
CompositionTime | IF CodecID == 7 SI24 | IF AVCPacketType == 1 composition time offset, ELSE 0. See ISO 14496-12, 8.15.3 for an explanation of composition times. The offset in an FLV file is always in milliseconds.
The first byte of the VideoTagHeader, that is, the byte immediately after StreamID, contains the video frame type and the codec ID. The table makes this clear.
The VideoTagHeader is followed by VIDEODATA, that is, the video payload. As with AAC audio, there is a special case: when the video format is AVC (H.264), the VideoTagHeader has 4 more bytes, AVCPacketType and CompositionTime. AVCPacketType indicates what the following VIDEODATA (AVCVIDEOPACKET) contains:
IF AVCPacketType == 0 AVCDecoderConfigurationRecord (AVC sequence header)
IF AVCPacketType == 1 One or more NALUs (full frames are required)
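The VideoTagHeader can be decoded in the same way as the audio one. The Python sketch below takes the bytes that follow StreamID. The name parse_video_tag and the key names are hypothetical.

```python
def parse_video_tag(body: bytes) -> dict:
    """Split a video tag body into VideoTagHeader fields and payload."""
    first = body[0]
    info = {
        "frame_type": first >> 4,   # 1 = key frame, 2 = inter frame, ...
        "codec_id": first & 0x0F,   # 7 = AVC
    }
    offset = 1
    if info["codec_id"] == 7:
        info["avc_packet_type"] = body[1]  # 0 = sequence header, 1 = NALU
        # CompositionTime is a signed 24-bit offset in milliseconds.
        info["composition_time_ms"] = int.from_bytes(body[2:5], "big", signed=True)
        offset = 5
    info["payload"] = body[offset:]
    return info

# 0x17 = key frame + AVC; packet type 1 (NALU); composition time 40 ms.
video = parse_video_tag(bytes([0x17, 0x01, 0x00, 0x00, 0x28]) + b"nal")
print(video)
```

When avc_packet_type is 0 the payload is an AVCDecoderConfigurationRecord, and when it is 1 the payload is one or more NALUs.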
AVCDecoderConfigurationRecord contains the all-important SPS and PPS. The SPS and PPS must be sent to the AVC decoder before any stream data; otherwise the decoder cannot decode properly. They must also be re-sent whenever the decoder is restarted after a stop, for example after a seek or a switch between fast-forward and rewind. As with the AAC sequence header, AVCDecoderConfigurationRecord generally appears only once in an FLV file, as the first video tag.
AVCDecoderConfigurationRecord is defined in ISO 14496-15, section 5.2.4.1, and is not reproduced in detail here.
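For readers who want to pull the SPS and PPS out of the record, here is a minimal Python sketch. It reflects my reading of the basic record layout in ISO 14496-15 (version, profile, compatibility, level, NAL length size, then length-prefixed SPS and PPS lists) and ignores any profile-specific extension bytes, so it should be checked against the standard before real use. The name parse_avc_config is hypothetical.

```python
def parse_avc_config(record: bytes) -> dict:
    """Extract SPS/PPS NAL units from an AVCDecoderConfigurationRecord."""
    def read_sets(pos: int, count: int):
        sets = []
        for _ in range(count):
            length = int.from_bytes(record[pos:pos + 2], "big")  # 16-bit length
            pos += 2
            sets.append(record[pos:pos + length])
            pos += length
        return sets, pos

    nal_length_size = (record[4] & 0x03) + 1         # lengthSizeMinusOne + 1
    sps, pos = read_sets(6, record[5] & 0x1F)        # low 5 bits: SPS count
    pps, pos = read_sets(pos + 1, record[pos])       # full byte: PPS count
    return {
        "profile": record[1],
        "level": record[3],
        "nal_length_size": nal_length_size,
        "sps": sps,
        "pps": pps,
    }

# Hand-built record with one 4-byte SPS and one 2-byte PPS (dummy contents).
record = (bytes([1, 0x64, 0x00, 0x1F, 0xFF, 0xE1])
          + (4).to_bytes(2, "big") + b"\x67\x64\x00\x1f"
          + bytes([1]) + (2).to_bytes(2, "big") + b"\x68\xee")
print(parse_avc_config(record))
```

The nal_length_size value also tells the demuxer how many bytes prefix each NALU in the AVC NALU tags that follow.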
ScriptData
If TagType == 18 in the tag, the tag carries script data.
The ScriptData structure is complex. It defines a number of value types, each with its own structure.
Field | Type | Comment
Type | UI8 | Type of the ScriptDataValue. The following types are defined: 0 = Number; 1 = Boolean; 2 = String; 3 = Object; 4 = MovieClip (reserved, not supported); 5 = Null; 6 = Undefined; 7 = Reference; 8 = ECMA array; 9 = Object end marker; 10 = Strict array; 11 = Date; 12 = Long string
ScriptDataValue | IF Type == 0 DOUBLE; IF Type == 1 UI8; IF Type == 2 SCRIPTDATASTRING; IF Type == 3 SCRIPTDATAOBJECT; IF Type == 7 UI16; IF Type == 8 SCRIPTDATAECMAARRAY; IF Type == 10 SCRIPTDATASTRICTARRAY; IF Type == 11 SCRIPTDATADATE; IF Type == 12 SCRIPTDATALONGSTRING | Script data value. The Boolean value is (ScriptDataValue ≠ 0).
The types are described in detail in the official FLV documentation.
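As a rough guide to how these types nest, here is a compact recursive decoder in Python. It is a sketch based on my reading of the FLV specification: strings carry a 16-bit length, long strings a 32-bit length, objects and ECMA arrays are name/value pairs ended by an empty name plus type 9, and dates are a double followed by a 16-bit time zone offset. The function names are hypothetical, and dates are returned as plain milliseconds.

```python
import struct

def read_string(buf: bytes, pos: int):
    (length,) = struct.unpack_from(">H", buf, pos)
    pos += 2
    return buf[pos:pos + length].decode("utf-8", "replace"), pos + length

def read_properties(buf: bytes, pos: int):
    props = {}
    while pos + 3 <= len(buf):
        name, pos = read_string(buf, pos)
        if name == "" and buf[pos] == 9:  # object end marker
            return props, pos + 1
        value, pos = read_value(buf, pos)
        props[name] = value
    return props, pos

def read_value(buf: bytes, pos: int):
    """Decode one ScriptDataValue at buf[pos]; return (value, next_pos)."""
    kind = buf[pos]
    pos += 1
    if kind == 0:                                    # Number (DOUBLE)
        return struct.unpack_from(">d", buf, pos)[0], pos + 8
    if kind == 1:                                    # Boolean
        return buf[pos] != 0, pos + 1
    if kind == 2:                                    # String
        return read_string(buf, pos)
    if kind == 3:                                    # Object
        return read_properties(buf, pos)
    if kind in (5, 6):                               # Null / Undefined
        return None, pos
    if kind == 7:                                    # Reference
        return struct.unpack_from(">H", buf, pos)[0], pos + 2
    if kind == 8:                                    # ECMA array: skip the count
        return read_properties(buf, pos + 4)
    if kind == 10:                                   # Strict array
        (count,) = struct.unpack_from(">I", buf, pos)
        pos += 4
        items = []
        for _ in range(count):
            item, pos = read_value(buf, pos)
            items.append(item)
        return items, pos
    if kind == 11:                                   # Date: ms + time zone
        return struct.unpack_from(">d", buf, pos)[0], pos + 10
    if kind == 12:                                   # Long string
        (length,) = struct.unpack_from(">I", buf, pos)
        pos += 4
        return buf[pos:pos + length].decode("utf-8", "replace"), pos + length
    raise ValueError(f"unsupported ScriptData type {kind}")

# A hand-built script tag body: the name "onMetaData", then an ECMA array.
body = (b"\x02\x00\x0aonMetaData" + b"\x08\x00\x00\x00\x01"
        + b"\x00\x08duration" + b"\x00" + struct.pack(">d", 12.5)
        + b"\x00\x00\x09")
name, pos = read_value(body, 0)
meta, pos = read_value(body, pos)
print(name, meta)
```

A script tag body is read as two consecutive values: the name string first, then the value that holds the properties.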
onMetaData
onMetaData is the most important piece of information in ScriptData for our purposes. Its structure is as follows:
Property Name | Type | Comment
audiocodecid | Number | Audio codec ID used in the file (see E.4.2.1 for available SoundFormat values)
audiodatarate | Number | Audio bit rate in kilobits per second
audiodelay | Number | Delay introduced by the audio codec in seconds
audiosamplerate | Number | Frequency at which the audio stream is replayed
audiosamplesize | Number | Resolution of a single audio sample
canSeekToEnd | Boolean | Indicating the last video frame is a key frame
creationdate | String | Creation date and time
duration | Number | Total duration of the file in seconds
filesize | Number | Total size of the file in bytes
framerate | Number | Number of frames per second
height | Number | Height of the video in pixels
stereo | Boolean | Indicating stereo audio
videocodecid | Number | Video codec ID used in the file (see E.4.3.1 for available CodecID values)
videodatarate | Number | Video bit rate in kilobits per second
width | Number | Width of the video in pixels
The duration, filesize, width and height of the video are very useful to us.
keyframes
When writing the FLV demuxer, I found that the official document does not describe a keyframe index. Unlike TS packets, FLV tags have no sync header, so without a keyframe index, seeking, fast-forward, and rewind perform very poorly because tags have to be read sequentially. After looking up some information online, I found that keyframe information is hidden in ScriptData.
keyframes is practically an unofficial, de facto standard. It is hard to find an FLV file online whose metadata does not contain a keyframes item. The two common tools for manipulating metadata, flvtool2 and FLVMDI, both treat keyframes as a default metadata item. The FLVMDI homepage (http://www.buraks.com/flvmdi/) gives this description:
keyframes: (object) This object is added only if you specify the /k switch. 'keyframes' is known to FLVMDI and if the /k switch is not specified, the 'keyframes' object will be deleted.
The 'keyframes' object has 2 arrays: 'filepositions' and 'times'. Both arrays have the same number of elements, which is equal to the number of key frames in the FLV. Values in the times array are in 'seconds'. Each corresponds to the timestamp of the n'th key frame. Values in the filepositions array are in 'bytes'. Each corresponds to the file position of the n'th key frame video tag (which starts with byte tag type 9).
That is, keyframes contains two arrays, 'filepositions' and 'times', which give the file position and the PTS of each keyframe. From keyframes you can build your own index, and then seek, fast-forward, and rewind by jumping quickly and efficiently to the keyframe you want.
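One way to use such an index is a binary search over the times array. The Python sketch below assumes the keyframes object has already been decoded into a dictionary with 'times' (seconds, ascending) and 'filepositions' (bytes). The function name find_keyframe and the sample values are made up for illustration.

```python
import bisect

def find_keyframe(keyframes: dict, target_seconds: float) -> tuple:
    """Return (time, file_position) of the last keyframe at or before target."""
    times = keyframes["times"]
    positions = keyframes["filepositions"]
    # bisect_right gives the first index whose time is greater than the target.
    index = max(bisect.bisect_right(times, target_seconds) - 1, 0)
    return times[index], int(positions[index])

# Values are stored as script data Numbers, hence the floats.
keyframes = {
    "times": [0.0, 2.0, 4.0, 6.0],
    "filepositions": [500.0, 20480.0, 40960.0, 61440.0],
}
print(find_keyframe(keyframes, 4.7))  # (4.0, 40960)
```

Seeking then amounts to moving the file offset to the returned position and resuming tag parsing from that video tag.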