FLV Video Encapsulation Format

Source: Internet
Author: User

Overview

Flash Video (FLV) is a popular container format for network video. At present, most video-sharing sites at home and abroad use this format.

File Structure

Viewed as a whole, an FLV file consists of the FLV header followed by the FLV file body.

1. The FLV Header
Field              Type    Comment
Signature          UI8     Signature byte, always 'F' (0x46)
Signature          UI8     Signature byte, always 'L' (0x4C)
Signature          UI8     Signature byte, always 'V' (0x56)
Version            UI8     File version (for example, 0x01 for FLV version 1)
TypeFlagsReserved  UB[5]   Shall be 0
TypeFlagsAudio     UB[1]   1 = Audio tags are present
TypeFlagsReserved  UB[1]   Shall be 0
TypeFlagsVideo     UB[1]   1 = Video tags are present
DataOffset         UI32    The length of this header in bytes

Signature: The first 3 bytes of an FLV file are always 'F', 'L', 'V', which identify the file as FLV. When probing the format, a file whose first 3 bytes are "FLV" is treated as an FLV file.

Version: The 4th byte holds the FLV version number.

Flags: Bit 0 and bit 2 of the 5th byte indicate the presence of video and audio, respectively (1 = present, 0 = not present).

DataOffset: The last 4 bytes give the length of the FLV header in bytes.
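The header layout above can be sketched in a few lines of Python. This is only a minimal illustration, not a complete probe: the function name parse_flv_header and the dictionary keys are made up for this example, and the sample bytes are hand-built rather than taken from a real file.

```python
import struct

def parse_flv_header(data: bytes) -> dict:
    """Decode the 9-byte FLV header: signature, version, flags, DataOffset."""
    if len(data) < 9:
        raise ValueError("need at least 9 bytes for an FLV header")
    # 3 signature bytes, 1 version byte, 1 flags byte, 4-byte big-endian length
    signature, version, flags, data_offset = struct.unpack(">3sBBI", data[:9])
    if signature != b"FLV":
        raise ValueError("signature is not 'FLV'")
    return {
        "version": version,
        "has_audio": bool(flags & 0x04),  # bit 2 of the 5th byte
        "has_video": bool(flags & 0x01),  # bit 0 of the 5th byte
        "data_offset": data_offset,
    }

# Hand-built sample: version 1, audio and video present, header length 9
header = parse_flv_header(b"FLV\x01\x05\x00\x00\x00\x09")
```

The file body starts at the offset reported in data_offset, so a demuxer would seek there rather than assume a fixed header size.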

2. The FLV File Body
Field               Type     Comment
PreviousTagSize0    UI32     Always 0
Tag1                FLVTAG   First tag
PreviousTagSize1    UI32     Size of previous tag, including its header, in bytes. For FLV version 1,
                             this value is 11 plus the DataSize of the previous tag.
Tag2                FLVTAG   Second tag
...                 ...      ...
PreviousTagSizeN-1  UI32     Size of second-to-last tag, including its header, in bytes
TagN                FLVTAG   Last tag
PreviousTagSizeN    UI32     Size of last tag, including its header, in bytes

The FLV file body follows the FLV header.

The FLV file body is a series of back-pointers and tags, interleaved. Each back-pointer is a 4-byte value giving the size of the previous tag.
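As a rough sketch of how a demuxer might walk this structure, the loop below steps from tag to tag and checks each back-pointer against 11 plus the tag's DataSize. It relies on the TagType and DataSize fields of the tag header described in the next section. The names iter_tags and make_tag are hypothetical, and make_tag exists only to build sample data for the example.

```python
TAG_HEADER_SIZE = 11  # bytes in a tag before its data (FLV version 1)

def iter_tags(body: bytes):
    """Yield (offset, tag_type, data) for each complete tag in an FLV file body."""
    if int.from_bytes(body[:4], "big") != 0:
        raise ValueError("PreviousTagSize0 must be 0")
    pos = 4
    while pos + TAG_HEADER_SIZE <= len(body):
        tag_type = body[pos] & 0x1F
        data_size = int.from_bytes(body[pos + 1:pos + 4], "big")
        end = pos + TAG_HEADER_SIZE + data_size
        if end + 4 > len(body):
            break  # truncated input: stop at the last complete tag
        back_pointer = int.from_bytes(body[end:end + 4], "big")
        if back_pointer != TAG_HEADER_SIZE + data_size:
            raise ValueError(f"bad PreviousTagSize at offset {end}")
        yield pos, tag_type, body[pos + TAG_HEADER_SIZE:end]
        pos = end + 4

def make_tag(tag_type: int, data: bytes) -> bytes:
    """Build one tag plus its trailing back-pointer (sample data, timestamp 0)."""
    tag = bytes([tag_type]) + len(data).to_bytes(3, "big") + bytes(7) + data
    return tag + len(tag).to_bytes(4, "big")

body = bytes(4) + make_tag(8, b"audio") + make_tag(9, b"video!!")
tags = list(iter_tags(body))
```

Because every tag is followed by its own size, the same structure can also be walked backwards from the end of the file, which is what the back-pointers are for.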

FLV Tag Definition

The data in an FLV file is carried in tags; a tag may contain video, audio, or script data.

The following table is the structure of the tag:

1. FLVTAG
Field                Type                         Comment
Reserved             UB[2]                        Reserved for FMS, should be 0
Filter               UB[1]                        Indicates if packets are filtered.
                                                  0 = No pre-processing required.
                                                  1 = Pre-processing (such as decryption) of the packet is
                                                  required before it can be rendered.
                                                  Shall be 0 in unencrypted files, and 1 for encrypted tags.
                                                  See Annex F. FLV Encryption for the use of filters.
TagType              UB[5]                        Type of contents in this tag. The following types are
                                                  defined:
                                                  8 = Audio
                                                  9 = Video
                                                  18 = Script data
DataSize             UI24                         Length of the message. Number of bytes after StreamID to
                                                  end of tag (equal to length of the tag minus 11)
Timestamp            UI24                         Time in milliseconds at which the data in this tag applies.
                                                  This value is relative to the first tag in the FLV file, which
                                                  always has a timestamp of 0.
TimestampExtended    UI8                          Extension of the Timestamp field to form an SI32 value. This
                                                  field represents the upper 8 bits, while the previous
                                                  Timestamp field represents the lower 24 bits of the time in
                                                  milliseconds.
StreamID             UI24                         Always 0.
AudioTagHeader       IF TagType == 8
                     AudioTagHeader
VideoTagHeader       IF TagType == 9
                     VideoTagHeader
EncryptionHeader     IF Filter == 1
                     EncryptionTagHeader
FilterParams         IF Filter == 1
                     FilterParams
Data                 IF TagType == 8              Data specific for each media type.
                     AUDIODATA
                     IF TagType == 9
                     VIDEODATA
                     IF TagType == 18
                     SCRIPTDATA

TagType: The low 5 bits of the 1st byte of the tag indicate the type of data the tag contains: 8 = audio, 9 = video, 18 = script data.

DataSize: The length of the data after StreamID.

Timestamp and TimestampExtended together form the PTS of the data in the tag. When I first wrote an FLV demuxer, I ignored TimestampExtended and used Timestamp directly as the PTS; the result was that the picture skipped frames. Only after rereading the document carefully did I find that the real PTS is PTS = Timestamp | TimestampExtended << 24.
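The PTS rule can be made concrete with a small sketch that decodes the 11 fixed bytes of a tag. The function name parse_tag_header and the dictionary keys are invented for this example, and the sample bytes are hand-built so that TimestampExtended is non-zero.

```python
def parse_tag_header(buf: bytes) -> dict:
    """Decode the 11 fixed bytes at the start of an FLV tag."""
    if len(buf) < 11:
        raise ValueError("an FLV tag header is 11 bytes")
    # TimestampExtended (byte 7) supplies the upper 8 bits of the PTS
    timestamp = (buf[7] << 24) | int.from_bytes(buf[4:7], "big")
    if timestamp & 0x80000000:
        timestamp -= 1 << 32  # the combined value is a signed 32-bit integer
    return {
        "filter": (buf[0] >> 5) & 0x01,
        "tag_type": buf[0] & 0x1F,
        "data_size": int.from_bytes(buf[1:4], "big"),
        "timestamp_ms": timestamp,
        "stream_id": int.from_bytes(buf[8:11], "big"),
    }

# Video tag, DataSize 32, Timestamp 0x000001, TimestampExtended 0x01
raw = bytes.fromhex("09" "000020" "000001" "01" "000000")
tag = parse_tag_header(raw)
```

Using only the 24-bit Timestamp here would give 1 ms instead of 16777217 ms, which is the kind of jump that shows up as skipped frames once a stream runs past about 4.6 hours.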

The data after StreamID differs for each tag type; each format is described in detail below.

Audio Tags

If TagType == 8 in the tag, the tag carries audio.

The data after StreamID begins with the AudioTagHeader, whose structure is as follows:

Field            Type                     Comment
SoundFormat      UB[4]                    Format of SoundData. The following values are defined:
                                          0 = Linear PCM, platform endian
                                          1 = ADPCM
                                          2 = MP3
                                          3 = Linear PCM, little endian
                                          4 = Nellymoser 16 kHz mono
                                          5 = Nellymoser 8 kHz mono
                                          6 = Nellymoser
                                          7 = G.711 A-law logarithmic PCM
                                          8 = G.711 mu-law logarithmic PCM
                                          9 = Reserved
                                          10 = AAC
                                          11 = Speex
                                          14 = MP3 8 kHz
                                          15 = Device-specific sound
                                          Formats 7, 8, 14, and 15 are reserved.
                                          AAC is supported in Flash Player 9,0,115,0 and higher.
                                          Speex is supported in Flash Player 10 and higher.
SoundRate        UB[2]                    Sampling rate. The following values are defined:
                                          0 = 5.5 kHz
                                          1 = 11 kHz
                                          2 = 22 kHz
                                          3 = 44 kHz
SoundSize        UB[1]                    Size of each audio sample. This parameter only pertains to
                                          uncompressed formats. Compressed formats always decode
                                          to 16 bits internally.
                                          0 = 8-bit samples
                                          1 = 16-bit samples
SoundType        UB[1]                    Mono or stereo sound
                                          0 = Mono sound
                                          1 = Stereo sound
AACPacketType    IF SoundFormat == 10     The following values are defined:
                 UI8                      0 = AAC sequence header
                                          1 = AAC raw

The first byte of the AudioTagHeader, that is, the byte immediately after StreamID, carries basic information such as the audio format and sample rate, as listed in the table above.

The AudioTagHeader is followed by the AUDIODATA, that is, the audio payload. There is one special case: if the audio format (SoundFormat) is 10 = AAC, the AudioTagHeader carries one additional byte, AACPacketType, which indicates the type of the AACAUDIODATA: 0 = AAC sequence header, 1 = AAC raw.

Field    Type                                  Comment
Data     IF AACPacketType == 0                 The AudioSpecificConfig is defined in ISO 14496-3. Note that
         AudioSpecificConfig                   this is not the same as the contents of the esds box from an
                                               MP4/F4V file.
         ELSE IF AACPacketType == 1            Audio payload
         Raw AAC frame data in UI8[]
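The bit layout of the AudioTagHeader can be illustrated with a short sketch. It takes the bytes that follow StreamID in an audio tag and splits off the header, including the extra AACPacketType byte when SoundFormat is 10. The function name, the dictionary keys, and the sample bytes are all assumptions made for this example.

```python
SOUND_RATES = {0: "5.5 kHz", 1: "11 kHz", 2: "22 kHz", 3: "44 kHz"}

def parse_audio_tag_header(payload: bytes) -> dict:
    """Split the data after StreamID into AudioTagHeader fields and audio data."""
    if not payload:
        raise ValueError("empty audio tag")
    first = payload[0]
    info = {
        "sound_format": first >> 4,                     # UB[4]
        "sound_rate": SOUND_RATES[(first >> 2) & 0x03],  # UB[2]
        "sample_bits": 16 if (first >> 1) & 0x01 else 8,  # UB[1]
        "channels": 2 if first & 0x01 else 1,           # UB[1]
    }
    header_len = 1
    if info["sound_format"] == 10:  # AAC adds one AACPacketType byte
        if len(payload) < 2:
            raise ValueError("AAC tag is missing AACPacketType")
        info["aac_packet_type"] = payload[1]
        header_len = 2
    info["data"] = payload[header_len:]
    return info

# 0xAF = AAC, 44 kHz, 16-bit, stereo; 0x00 = AAC sequence header
audio = parse_audio_tag_header(b"\xaf\x00\x12\x10")
```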

The AAC sequence header contains the AudioSpecificConfig, which holds more detailed audio information. It is defined in ISO 14496-3, 1.6.2.1 AudioSpecificConfig, and is not reproduced here. FFmpeg has a parsing function for the AudioSpecificConfig, ff_mpeg4audio_get_config(), which is worth comparing against for a deeper understanding.

AAC raw is the audio elementary stream (ES), that is, the payload.

In an FLV file, the AAC sequence header packet generally appears only once, as the first audio tag. This tag matters for FLV demuxing: if the audio is AAC, a 7-byte ADTS header must be added in front of each raw AAC frame. ADTS is a framing format commonly accepted by decoders; wrapping the raw AAC elementary stream in ADTS lets the decoder play it properly. Building the ADTS header requires samplingFrequencyIndex, and the most accurate source for it is the AudioSpecificConfig, so the AudioSpecificConfig is parsed to obtain samplingFrequencyIndex.

At this point, the audio information and data can be fully extracted from the FLV file and sent to the audio decoder for normal playback.
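The ADTS step can be sketched as follows. This is a simplified illustration under stated assumptions, not a full implementation: it reads only the first two bytes of the AudioSpecificConfig (5-bit audio object type, 4-bit samplingFrequencyIndex, 4-bit channel configuration), rejects the escape-coded variants, and writes a 7-byte ADTS header without CRC. The function names are made up for this example.

```python
def parse_audio_specific_config(asc: bytes):
    """Return (audio_object_type, sampling_frequency_index, channel_configuration)."""
    if len(asc) < 2:
        raise ValueError("AudioSpecificConfig needs at least 2 bytes")
    bits = int.from_bytes(asc[:2], "big")
    audio_object_type = bits >> 11                # first 5 bits
    sampling_frequency_index = (bits >> 7) & 0x0F  # next 4 bits
    channel_configuration = (bits >> 3) & 0x0F    # next 4 bits
    if audio_object_type == 31 or sampling_frequency_index == 15:
        raise ValueError("escape-coded fields are not handled in this sketch")
    return audio_object_type, sampling_frequency_index, channel_configuration

def adts_header(audio_object_type, sampling_frequency_index,
                channel_configuration, payload_len) -> bytes:
    """Build a 7-byte ADTS header (no CRC) for one raw AAC frame."""
    if not 1 <= audio_object_type <= 4:
        raise ValueError("the ADTS profile field only covers object types 1 to 4")
    if channel_configuration > 7:
        raise ValueError("the ADTS channel field is 3 bits")
    frame_len = 7 + payload_len  # the length field counts the header too
    if frame_len > 0x1FFF:
        raise ValueError("frame too large for the 13-bit length field")
    profile = audio_object_type - 1
    return bytes([
        0xFF,  # syncword, high 8 bits
        0xF1,  # syncword low 4 bits, MPEG-4, layer 0, no CRC
        (profile << 6) | (sampling_frequency_index << 2) | (channel_configuration >> 2),
        ((channel_configuration & 0x03) << 6) | (frame_len >> 11),
        (frame_len >> 3) & 0xFF,
        ((frame_len & 0x07) << 5) | 0x1F,  # buffer fullness 0x7FF, high 5 bits
        0xFC,  # buffer fullness low 6 bits, one raw data block
    ])

config = parse_audio_specific_config(b"\x12\x10")  # taken from the sequence header tag
header = adts_header(*config, payload_len=100)     # prepend to each AAC raw frame
```

The configuration is parsed once from the sequence header tag and reused for every subsequent AAC raw tag; only the frame length changes from frame to frame.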

Video Tags

If TagType == 9 in the tag, the tag carries video.

The data after StreamID begins with the VideoTagHeader, whose structure is as follows:

Field              Type                   Comment
FrameType          UB[4]                  Type of video frame. The following values are defined:
                                          1 = Key frame (for AVC, a seekable frame)
                                          2 = Inter frame (for AVC, a non-seekable frame)
                                          3 = Disposable inter frame (H.263 only)
                                          4 = Generated key frame (reserved for server use only)
                                          5 = Video info/command frame
CodecID            UB[4]                  Codec identifier. The following values are defined:
                                          2 = Sorenson H.263
                                          3 = Screen video
                                          4 = On2 VP6
                                          5 = On2 VP6 with alpha channel
                                          6 = Screen video version 2
                                          7 = AVC
AVCPacketType      IF CodecID == 7        The following values are defined:
                   UI8                    0 = AVC sequence header
                                          1 = AVC NALU
                                          2 = AVC end of sequence (lower level NALU sequence ender is
                                          not required or supported)
CompositionTime    IF CodecID == 7        IF AVCPacketType == 1
                   SI24                   Composition time offset
                                          ELSE
                                          0
                                          See ISO 14496-12, 8.15.3 for an explanation of composition
                                          times. The offset in an FLV file is always in milliseconds.

The first byte of the VideoTagHeader, that is, the byte immediately after StreamID, carries the video frame type and the codec ID, as listed in the table above.

The VideoTagHeader is followed by the VIDEODATA, that is, the video payload. As with AAC audio, there is a special case: when the video format is AVC (H.264), the VideoTagHeader carries 4 additional bytes, AVCPacketType and CompositionTime. AVCPacketType indicates what the following VIDEODATA (AVCVIDEOPACKET) contains:

IF AVCPacketType == 0    AVCDecoderConfigurationRecord (AVC sequence header)
IF AVCPacketType == 1    One or more NALUs (full frames are required)
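A minimal sketch of the VideoTagHeader layout follows. It splits the bytes after StreamID into the header fields and the video data, reading the signed 24-bit CompositionTime when CodecID is 7. The function name, dictionary keys, and sample bytes are assumptions for this example only.

```python
def parse_video_tag_header(payload: bytes) -> dict:
    """Split the data after StreamID into VideoTagHeader fields and video data."""
    if not payload:
        raise ValueError("empty video tag")
    info = {
        "frame_type": payload[0] >> 4,   # UB[4], 1 = key frame
        "codec_id": payload[0] & 0x0F,   # UB[4], 7 = AVC
    }
    header_len = 1
    if info["codec_id"] == 7:
        if len(payload) < 5:
            raise ValueError("AVC tag is missing AVCPacketType/CompositionTime")
        info["avc_packet_type"] = payload[1]
        # SI24: signed 24-bit big-endian offset in milliseconds
        info["composition_time_ms"] = int.from_bytes(payload[2:5], "big", signed=True)
        header_len = 5
    info["data"] = payload[header_len:]
    return info

# 0x17 = key frame + AVC, packet type 1 (NALU), composition time 40 ms
video = parse_video_tag_header(b"\x17\x01\x00\x00\x28NALU")
```

A demuxer that needs a presentation timestamp for AVC would typically add composition_time_ms to the tag timestamp, treating the tag timestamp as the decode time.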

The AVCDecoderConfigurationRecord contains the all-important SPS and PPS. The SPS and PPS must be sent to the AVC decoder before any stream data; otherwise the decoder cannot decode properly. They must also be re-sent whenever the decoder is restarted after a stop, for example on seek or when switching between fast-forward and rewind. In an FLV file the AVCDecoderConfigurationRecord generally appears only once, in the first video tag.

The AVCDecoderConfigurationRecord is defined in ISO 14496-15, 5.2.4.1, and is not described in detail here.
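For orientation, here is a hedged sketch of pulling the SPS and PPS out of an AVCDecoderConfigurationRecord. It assumes the basic layout from ISO 14496-15 (four profile/level bytes, a NAL length-size byte, then length-prefixed SPS and PPS lists) and ignores any extension fields that may follow. The function name and the sample record are invented for illustration.

```python
import struct

def parse_avc_config(record: bytes) -> dict:
    """Extract profile, level, NAL length size, SPS and PPS from the record."""
    version, profile, _compat, level, length_byte, sps_byte = struct.unpack_from(
        ">6B", record, 0
    )

    def read_sets(count: int, pos: int):
        """Read `count` parameter sets, each prefixed by a 16-bit length."""
        sets = []
        for _ in range(count):
            (size,) = struct.unpack_from(">H", record, pos)
            pos += 2
            sets.append(record[pos:pos + size])
            pos += size
        return sets, pos

    sps, pos = read_sets(sps_byte & 0x1F, 6)  # low 5 bits: number of SPS
    pps_count = record[pos]                    # full byte: number of PPS
    pps, pos = read_sets(pps_count, pos + 1)
    return {
        "version": version,
        "profile": profile,
        "level": level,
        "nal_length_size": (length_byte & 0x03) + 1,  # bytes per NALU length prefix
        "sps": sps,
        "pps": pps,
    }

# Hand-built record: High profile (100), level 3.1, one 4-byte SPS, one 2-byte PPS
record = bytes.fromhex("0164001fffe1" "0004" "6764001f" "01" "0002" "68ee")
avc = parse_avc_config(record)
```

The nal_length_size value also tells the demuxer how many bytes prefix each NALU in the AVCPacketType == 1 tags that follow.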

Scriptdata

If TagType == 18 in the tag, the tag carries script data.

The SCRIPTDATA structure is quite complex: it defines a number of value types, each of which corresponds to its own structure.

Field              Type                       Comment
Type               UI8                        Type of the ScriptDataValue.
                                              The following types are defined:
                                              0 = Number
                                              1 = Boolean
                                              2 = String
                                              3 = Object
                                              4 = MovieClip (reserved, not supported)
                                              5 = Null
                                              6 = Undefined
                                              7 = Reference
                                              8 = ECMA array
                                              9 = Object end marker
                                              10 = Strict array
                                              11 = Date
                                              12 = Long string
ScriptDataValue    IF Type == 0               Script data value.
                   DOUBLE                     The Boolean value is (ScriptDataValue ≠ 0).
                   IF Type == 1
                   UI8
                   IF Type == 2
                   SCRIPTDATASTRING
                   IF Type == 3
                   SCRIPTDATAOBJECT
                   IF Type == 7
                   UI16
                   IF Type == 8
                   SCRIPTDATAECMAARRAY
                   IF Type == 10
                   SCRIPTDATASTRICTARRAY
                   IF Type == 11
                   SCRIPTDATADATE
                   IF Type == 12
                   SCRIPTDATALONGSTRING

The types are described in detail in the official FLV documentation.

onMetaData

onMetaData is a very important piece of information in the script data. Its structure is as follows:

Property Name      Type       Comment
audiocodecid       Number     Audio codec ID used in the file (see E.4.2.1 for available SoundFormat values)
audiodatarate      Number     Audio bit rate in kilobits per second
audiodelay         Number     Delay introduced by the audio codec in seconds
audiosamplerate    Number     Frequency at which the audio stream is replayed
audiosamplesize    Number     Resolution of a single audio sample
canSeekToEnd       Boolean    Indicating the last video frame is a key frame
creationdate       String     Creation date and time
duration           Number     Total duration of the file in seconds
filesize           Number     Total size of the file in bytes
framerate          Number     Number of frames per second
height             Number     Height of the video in pixels
stereo             Boolean    Indicating stereo audio
videocodecid       Number     Video codec ID used in the file (see E.4.3.1 for available CodecID values)
videodatarate      Number     Video bit rate in kilobits per second
width              Number     Width of the video in pixels

The duration, filesize, width, and height values are the most useful to a demuxer.
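To show how those values might be read, here is a small decoder for a subset of the ScriptDataValue types (0 = Number, 1 = Boolean, 2 = String, 8 = ECMA array). It assumes a string is a 16-bit length followed by that many bytes, and that an ECMA array is a 32-bit approximate count followed by name/value pairs ending in the 3-byte object end marker 00 00 09. The function names and the sample bytes are made up for this example; the other types are deliberately left out.

```python
import struct

def read_string(buf: bytes, pos: int):
    """Read a SCRIPTDATASTRING: UI16 length followed by the string bytes."""
    (size,) = struct.unpack_from(">H", buf, pos)
    start = pos + 2
    return buf[start:start + size].decode("utf-8"), start + size

def read_value(buf: bytes, pos: int):
    """Read one ScriptDataValue at pos; return (value, next position)."""
    value_type = buf[pos]
    pos += 1
    if value_type == 0:  # Number: 8-byte big-endian double
        (number,) = struct.unpack_from(">d", buf, pos)
        return number, pos + 8
    if value_type == 1:  # Boolean: non-zero byte means true
        return buf[pos] != 0, pos + 1
    if value_type == 2:  # String
        return read_string(buf, pos)
    if value_type == 8:  # ECMA array: the count is only approximate
        pos += 4
        result = {}
        while buf[pos:pos + 3] != b"\x00\x00\x09":  # object end marker
            key, pos = read_string(buf, pos)
            value, pos = read_value(buf, pos)
            result[key] = value
        return result, pos + 3
    raise NotImplementedError(f"type {value_type} is not handled in this sketch")

# Hand-built script tag body: the name "onMetaData", then an ECMA array
sample = (
    b"\x02\x00\x0aonMetaData"
    + b"\x08\x00\x00\x00\x02"
    + b"\x00\x08duration\x00" + struct.pack(">d", 12.5)
    + b"\x00\x05width\x00" + struct.pack(">d", 640.0)
    + b"\x00\x00\x09"
)
name, pos = read_value(sample, 0)
metadata, pos = read_value(sample, pos)
```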

keyframes

While writing the FLV demuxer, I found that the official document does not describe a keyframes index. Unlike TS, FLV tags have no sync header, so without a keyframes index, seek, fast-forward, and rewind perform very poorly because tags must be read one by one in sequence. After searching online, I found that keyframes information is hidden in the script data.

keyframes is almost an unofficial, de facto standard. Online it is hard to find an FLV video whose metadata does not contain a keyframes item. The two common tools for manipulating metadata, flvtool2 and FLVMDI, both write keyframes as a default metadata item. The FLVMDI homepage (http://www.buraks.com/flvmdi/) has this description:

keyframes: (object) This object is added only if you specify the /k switch. 'keyframes' is known to FLVMDI and if /k switch is not specified, 'keyframes' object will be deleted.
'keyframes' object has 2 arrays: 'filepositions' and 'times'. Both arrays have the same number of elements, which is equal to the number of key frames in the FLV. Values in times array are in 'seconds'. Each correspond to the timestamp of the n'th key frame. Values in filepositions array are in 'bytes'. Each correspond to the fileposition of the nth key frame video tag (which starts with byte tag type 9).

That is, keyframes contains two arrays, 'filepositions' and 'times', which give the file position and the PTS of each key frame. From keyframes you can build your own index, so that seek, fast-forward, and rewind can jump quickly and efficiently to the key frame you want.
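One way such an index could be used is sketched below: a binary search over 'times' finds the last key frame at or before the requested time, and the matching entry in 'filepositions' is the byte offset to seek to. The sketch assumes the keyframes object has already been decoded into a dictionary of two lists with 'times' sorted in ascending order; the function name and the sample values are invented.

```python
from bisect import bisect_right

def seek_offset(keyframes: dict, target_seconds: float):
    """Return (time, file position) of the last key frame at or before the target."""
    times = keyframes["times"]
    positions = keyframes["filepositions"]
    if not times or len(times) != len(positions):
        raise ValueError("keyframes arrays must be non-empty and of equal length")
    # bisect_right gives the first index whose time is > target; step back one
    index = max(bisect_right(times, target_seconds) - 1, 0)
    return times[index], int(positions[index])

# Sample index: metadata numbers are doubles, so positions arrive as floats
keyframes = {
    "times": [0.0, 2.0, 4.0, 6.0],
    "filepositions": [1024.0, 50000.0, 98000.0, 150000.0],
}
hit = seek_offset(keyframes, 4.5)
```

Landing on the preceding key frame rather than the nearest one keeps decoding valid, since the frames after the target depend on it.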
