FFmpeg Audio and Video Synchronization


AVStream

This struct describes a media stream.

The main fields are interpreted as follows. Most of them can be filled in by av_open_input_file from the information in the container header; the missing values must be obtained by calling av_find_stream_info, which reads frames and soft-decodes them:

index/id: index is the index of the stream, assigned automatically; it can be used to look the stream up in the AVFormatContext::streams array. id is the identifier of the stream and depends on the container format; for MPEG-TS, for example, it is the PID.
time_base: the time base of the stream, a rational number; the PTS and DTS of the media data in the stream are expressed at the granularity of this time base. In general, av_rescale/av_rescale_q can be used to convert between different time bases.
start_time: the start time of the stream, in units of the stream's time base; usually the PTS of the first frame in the stream.
duration: the total duration of the stream, in units of the stream's time base.
need_parsing: controls the parsing process for the stream.
nb_frames: the number of frames in the stream.
r_frame_rate/framerate/avg_frame_rate: frame-rate related fields.
codec: the AVCodecContext corresponding to the stream, created when av_open_input_file is called.
parser: the AVCodecParserContext corresponding to the stream, created when av_find_stream_info is called.

AVFormatContext

This structure describes the composition and basic information of a media file or media stream.

This is the most fundamental structure in FFmpeg: it is the root of all the other structures and the basic abstraction of a multimedia file or stream. Here, nb_streams and the streams array of AVStream pointers describe all the media streams embedded in the file; iformat and oformat point to the corresponding demuxer and muxer; pb points to a ByteIOContext structure that controls the reading and writing of the underlying data. start_time and duration are the start time and length of the multimedia file, inferred from the AVStream entries of the streams array, in microsecond units.

Typically, this structure is created internally by av_open_input_file, which initializes some of its members to default values. If the caller wants to create the structure itself, however, it must explicitly set defaults for certain members; if those members are left unset, subsequent operations will fail. The following members need attention: probesize, mux_rate, packet_size, flags, max_analyze_duration, key, max_index_size, max_picture_buffer, max_delay.
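As a sketch of the usual flow, the legacy (pre-0.11) API names used throughout this article can be combined as follows. This is an illustration only: building it requires linking against libavformat, and these exact function names have since been replaced (by avformat_open_input and friends) in modern FFmpeg.

```c
/* Sketch using the legacy API names this article describes;
 * requires linking against libavformat/libavcodec to build. */
#include <stdio.h>
#include <libavformat/avformat.h>

int main(int argc, char **argv)
{
    AVFormatContext *ic = NULL;

    av_register_all();

    /* Fills most AVStream fields from the container header... */
    if (av_open_input_file(&ic, argv[1], NULL, 0, NULL) != 0)
        return 1;

    /* ...and completes the missing ones by reading and
     * soft-decoding a few frames. */
    if (av_find_stream_info(ic) < 0)
        return 1;

    for (unsigned i = 0; i < ic->nb_streams; i++)
        printf("stream %u: time_base=%d/%d duration=%lld\n",
               i,
               ic->streams[i]->time_base.num,
               ic->streams[i]->time_base.den,
               (long long)ic->streams[i]->duration);

    av_close_input_file(ic);
    return 0;
}
```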

AVPacket

AVPacket is defined in avcodec.h.

FFmpeg uses AVPacket to hold media data (a video/audio frame, a subtitle packet, etc.) and its associated information (decoding timestamp, presentation timestamp, length, etc.) after demuxing and before decoding. Here, dts is the decoding timestamp and pts is the presentation timestamp, both in units of the owning stream's time base. stream_index gives the index of the owning media stream; data is the buffer pointer; size is its length; duration is the duration of the data, again in the owning stream's time base; pos is the byte offset of the data in the media stream; destruct is a function pointer used to release the data buffer; flags is a flag field, whose lowest bit set to 1 indicates that the data is a keyframe.

The AVPacket structure itself is only a container; its data member points to the actual buffer. That buffer can be created through av_new_packet, through an av_dup_packet copy, or by FFmpeg APIs such as av_read_frame, and must be released with av_free_packet after use. av_free_packet calls the packet's own destruct function, which takes one of two values: 1) av_destruct_packet_nofree (or 0); 2) av_destruct_packet. The former simply clears data and size to 0, while the latter actually frees the buffer. When FFmpeg builds a packet internally, it supplies the destruct function itself: if FFmpeg intends to keep ownership of the buffer, destruct is set to av_destruct_packet_nofree, so a user call to av_free_packet does not release the buffer; if FFmpeg no longer needs the buffer, destruct is set to av_destruct_packet, meaning it may be freed. For a packet whose buffer cannot be freed, it is best to call av_dup_packet before using it, converting it into a packet whose buffer can be freed, to avoid exceptions caused by improper use of the buffer. For a packet whose destruct pointer is av_destruct_packet_nofree, av_dup_packet creates a new buffer, copies the data from the original buffer into it, sets data to the address of the new buffer, and sets the destruct pointer to av_destruct_packet.

Time Information

Time information is used to achieve multimedia synchronization.

The purpose of synchronization is to preserve the inherent temporal relationships between media objects when presenting multimedia information. There are two types of synchronization. One is intra-stream synchronization, whose main task is to keep the timing within a single media stream acceptable to perception, such as playing video at the specified frame rate. The other is inter-stream synchronization, whose main task is to maintain the timing relationship between different media streams, such as between audio and video (lip sync).

For fixed-rate media, such as fixed-frame-rate video or fixed-bitrate audio, the time information (frame rate or bit rate) can be placed in the file header, such as AVI's hdrl LIST or MP4's moov box. A more complex scheme is to embed the time information inside the media stream itself, as MPEG-TS and RealVideo do; this scheme can handle variable-rate media and also effectively avoids time drift during synchronization.

To better support the synchronization mechanisms of upper-layer applications, FFmpeg labels each packet with time stamps. There are two kinds: DTS, the decoding time stamp, and PTS, the presentation time stamp. For audio the two are identical, but for video encoding formats that use bidirectional prediction (B-frames), DTS and PTS differ.

Acquiring time information:

By calling av_find_stream_info, a multimedia application can obtain the time information of a media file from the AVFormatContext object: mainly the total duration and start time, plus the bit rate and file size associated with them. The unit of this time information is AV_TIME_BASE: microseconds.
