For example, a 10-second MP4 file might have the following structure:
Type:ftyp, size:24
Type:mdat, size:8884701
Type:mdat, size:136125
Type:moov, size:4656
1. Ftyp
The ftyp box describes the type of the file, that is, which format it conforms to; usually this is the MP4 format. Many media formats are built on the same standard and differ in the boxes they contain, so major_brand and compatible_brands are needed to describe which kinds of boxes this file uses. The standard defines brands such as isom, avc1, iso2, mp71, and iso3, and once a decoder has read the brand, it knows the format of the file.
aligned(8) class FileTypeBox extends Box('ftyp') {
unsigned int(32) major_brand;
unsigned int(32) minor_version;
unsigned int(32) compatible_brands[]; // to end of the box
}
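As a minimal sketch of how the definition above maps onto bytes, the following Python decodes an ftyp payload. The function name parse_ftyp and the sample brand values are hypothetical, chosen only so that the box comes out at 24 bytes like the one in the listing; they are not taken from a real file.

```python
import struct

def parse_ftyp(payload: bytes) -> dict:
    """Decode an ftyp payload (the bytes after the 8-byte box header)."""
    # major_brand is a 4-character code, minor_version a big-endian uint32.
    major_brand, minor_version = struct.unpack(">4sI", payload[:8])
    # Everything after that is a run of 4-byte compatible brands.
    brands = [payload[i:i + 4].decode("ascii")
              for i in range(8, len(payload) - 3, 4)]
    return {
        "major_brand": major_brand.decode("ascii"),
        "minor_version": minor_version,
        "compatible_brands": brands,
    }

# Hypothetical payload: 16 bytes, so the whole box (header included) is 24 bytes.
payload = b"isom" + struct.pack(">I", 512) + b"isom" + b"avc1"
print(parse_ftyp(payload))
```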
2. Mdat
In the example above there are two mdat boxes: one holds the video content and the other the audio content. For media encoded with H.264 and AAC, the video mdat contains NAL units and the audio mdat contains AAC frames. The frames in an mdat are stored sequentially, and the position, time, and length of each frame are given by the information in moov. As you can see, mdat has a very simple structure: the box contains nothing but data.
aligned(8) class MediaDataBox extends Box('mdat') {
bit(8) data[];
}
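Because mdat carries only payload, a reader can jump over it using the size in the box header. The sketch below walks the top-level boxes of a file this way and prints a listing in the same form as the one at the start. It is an illustration only: iter_boxes and make_box are hypothetical helpers, and the in-memory file is made up for the demonstration.

```python
import io
import struct

def iter_boxes(f, end):
    """Yield (type, size) for each box from the current position up to `end`."""
    while f.tell() + 8 <= end:
        start = f.tell()
        size, box_type = struct.unpack(">I4s", f.read(8))
        if size == 1:        # a 64-bit size follows the type field
            size = struct.unpack(">Q", f.read(8))[0]
        elif size == 0:      # the box extends to the end of the file
            size = end - start
        if size < 8:         # malformed header, stop rather than loop forever
            break
        yield box_type.decode("ascii", "replace"), size
        f.seek(start + size)  # skip the payload, e.g. the bulk data in mdat

def make_box(box_type: bytes, payload: bytes) -> bytes:
    """Build a box with a 32-bit size field (demo helper)."""
    return struct.pack(">I4s", len(payload) + 8, box_type) + payload

# Made-up file: ftyp, a 100-byte mdat, and an empty moov.
data = (make_box(b"ftyp", b"isom" + bytes(4) + b"isomavc1")
        + make_box(b"mdat", bytes(100))
        + make_box(b"moov", b""))

for box_type, size in iter_boxes(io.BytesIO(data), len(data)):
    print(f"Type:{box_type}, size:{size}")
```

For a file on disk, the same generator could be given an object opened with open(path, "rb") and the file size as end.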
3. Moov
The moov box stores all the information about the movie, and a moov contains one or more trak boxes. A typical movie has one video trak and one audio trak. This box is the core of an MP4 file.
(1) trak/tkhd
For a video trak it holds the width and height; for an audio trak it holds the volume. It is not very important: what actually initializes the decoder is the information in stsd.
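For illustration, here is a rough sketch of reading those three fields from a tkhd payload. It assumes the field layout of the ISO base media file format, where volume is an 8.8 fixed-point value and width and height are 16.16 fixed-point values; parse_tkhd and the sample numbers are hypothetical.

```python
import struct

def parse_tkhd(payload: bytes) -> dict:
    """Read volume, width and height from a tkhd payload (after the box header)."""
    version = payload[0]
    # Version 1 stores the two timestamps and the duration as 64-bit values,
    # which shifts every later field by 12 bytes.
    base = 36 if version == 0 else 48
    (volume_fixed,) = struct.unpack_from(">h", payload, base)
    # reserved (2 bytes) and the 36-byte matrix sit between volume and width.
    width_fixed, height_fixed = struct.unpack_from(">II", payload, base + 40)
    return {
        "volume": volume_fixed / 256,      # 8.8 fixed point
        "width": width_fixed / 65536,      # 16.16 fixed point
        "height": height_fixed / 65536,
    }

# Hypothetical version-0 video track: volume 0, 1280x720.
payload = bytearray(84)
struct.pack_into(">II", payload, 76, 1280 << 16, 720 << 16)
print(parse_tkhd(bytes(payload)))
```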
(2) trak/mdia/hdlr
Indicates whether the trak is video or audio.
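A small sketch of that check: the handler_type field is a 4-character code that follows the version/flags word and a pre_defined word, with 'vide' meaning video and 'soun' meaning audio. The function name handler_kind and the sample payload are hypothetical.

```python
def handler_kind(payload: bytes) -> str:
    """Classify a track from its hdlr payload (the bytes after the box header)."""
    # Layout: version/flags (4), pre_defined (4), handler_type (4),
    # reserved (12), then a null-terminated name string.
    handler_type = payload[8:12]
    return {b"vide": "video", b"soun": "audio"}.get(handler_type, "other")

# Hypothetical hdlr payload for a video track.
payload = bytes(8) + b"vide" + bytes(12) + b"VideoHandler\x00"
print(handler_kind(payload))
```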
(3) trak/mdia/minf/stbl
All the important tables are here:
-stsd: codec information used to initialize the decoder
-stsz: the size of each sample; a sample usually corresponds to one frame
-stsc: how samples are grouped into chunks; several samples can form one chunk, although in practice a chunk may hold just a single sample
-stco: the offset of each chunk in the file, used to locate the data
-stts/ctts: specify the DTS (stts) and the PTS (ctts, as an offset from the DTS) of each sample
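To make the interplay of these tables concrete, the sketch below resolves the file offset and timestamps of every sample. It assumes the tables have already been decoded into plain Python lists: stsz as a list of sizes, stsc as (first_chunk, samples_per_chunk) pairs with 1-based chunk numbers, stco as a list of chunk offsets, and stts and ctts as (sample_count, value) pairs. The function names and all sample numbers are hypothetical.

```python
def sample_offsets(stsz, stsc, stco):
    """Return the file offset of every sample."""
    offsets = []
    sample = 0
    for chunk_index, chunk_offset in enumerate(stco, start=1):
        # The stsc entry in effect is the last one whose first_chunk <= chunk_index.
        per_chunk = [n for first, n in stsc if first <= chunk_index][-1]
        pos = chunk_offset
        for _ in range(per_chunk):
            if sample >= len(stsz):
                break
            offsets.append(pos)
            pos += stsz[sample]   # samples lie back to back inside a chunk
            sample += 1
    return offsets

def sample_times(stts, ctts=None):
    """Return (dts, pts) for every sample, in units of the track timescale."""
    dts_list = []
    dts = 0
    for count, delta in stts:
        for _ in range(count):
            dts_list.append(dts)
            dts += delta
    # Expand the ctts run-length entries; without ctts, PTS equals DTS.
    shifts = [off for count, off in (ctts or []) for _ in range(count)]
    shifts += [0] * (len(dts_list) - len(shifts))
    return [(d, d + s) for d, s in zip(dts_list, shifts)]

# Made-up tables for five samples spread over three chunks.
stsz = [100, 50, 60, 40, 30]
stsc = [(1, 2), (3, 1)]          # chunks 1-2 hold two samples, chunk 3 holds one
stco = [1000, 2000, 3000]
stts = [(5, 1000)]
ctts = [(1, 1000), (1, 2000), (1, 0), (1, 2000), (1, 0)]

print(sample_offsets(stsz, stsc, stco))
print(sample_times(stts, ctts))
```

The sample_description_index field of stsc is ignored here, since it selects an stsd entry rather than a position.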
(4) trak/edts/elst
Divides the track into multiple segments, giving the start time and length of each.
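As a rough sketch, an elst payload can be decoded into its list of segments as follows. It assumes the entry layout of the ISO base media file format: segment_duration, media_time (where -1 marks an empty segment), and a 16.16 media rate. The name parse_elst and the sample entries are hypothetical.

```python
import struct

def parse_elst(payload: bytes) -> list:
    """Decode the entries of an elst payload (the bytes after the box header)."""
    version = payload[0]
    (entry_count,) = struct.unpack_from(">I", payload, 4)
    # Version 1 widens segment_duration and media_time to 64 bits.
    fmt = ">Iihh" if version == 0 else ">Qqhh"
    entry_size = struct.calcsize(fmt)
    edits = []
    for i in range(entry_count):
        duration, media_time, rate_int, rate_frac = struct.unpack_from(
            fmt, payload, 8 + i * entry_size)
        edits.append({
            "segment_duration": duration,   # length of the segment
            "media_time": media_time,       # start time within the media
            "media_rate": rate_int + rate_frac / 65536,
        })
    return edits

# Hypothetical list: an empty segment of 1000 units, then 9000 units of media.
payload = (bytes(4) + struct.pack(">I", 2)
           + struct.pack(">Iihh", 1000, -1, 1, 0)
           + struct.pack(">Iihh", 9000, 0, 1, 0))
for edit in parse_elst(payload):
    print(edit)
```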