Document directory
- 1. Register all container formats and codecs: av_register_all()
- 2. Open the file: av_open_input_file()
- 3. Extract stream information from the file: av_find_stream_info()
- 4. Search all streams for one whose type is CODEC_TYPE_VIDEO.
- 5. Find the corresponding decoder: avcodec_find_decoder()
- 6. Open the codec: avcodec_open()
- 7. Allocate memory for decoded frames: avcodec_alloc_frame()
- 8. Continuously read frame data from the stream: av_read_frame()
- 9. Determine the frame type; for video frames, call avcodec_decode_video()
- 10. After decoding, release the decoder: avcodec_close()
- 11. Close the input file: av_close_input_file()
FFmpeg decoding process:
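As an overview, here is a compact sketch of those eleven steps using the legacy API named in this article (newer FFmpeg releases renamed these calls to avformat_open_input, avcodec_open2, and so on); error handling is largely omitted:

/* Sketch only: decode every video frame of a file with the legacy FFmpeg API. */
#include <libavcodec/avcodec.h>
#include <libavformat/avformat.h>

int decode_file(const char *filename) {
    av_register_all();                                    /* 1. register formats and codecs */

    AVFormatContext *pFormatCtx;
    if (av_open_input_file(&pFormatCtx, filename, NULL, 0, NULL) != 0)  /* 2. open the file */
        return -1;
    if (av_find_stream_info(pFormatCtx) < 0)              /* 3. read stream information */
        return -1;

    int videoStream = -1;
    for (unsigned int i = 0; i < pFormatCtx->nb_streams; i++)  /* 4. find the video stream */
        if (pFormatCtx->streams[i]->codec->codec_type == CODEC_TYPE_VIDEO) {
            videoStream = i;
            break;
        }
    if (videoStream == -1)
        return -1;

    AVCodecContext *pCodecCtx = pFormatCtx->streams[videoStream]->codec;
    AVCodec *pCodec = avcodec_find_decoder(pCodecCtx->codec_id);  /* 5. find the decoder */
    avcodec_open(pCodecCtx, pCodec);                      /* 6. open the decoder */

    AVFrame *pFrame = avcodec_alloc_frame();              /* 7. allocate the frame */

    AVPacket packet;
    int frameFinished;
    while (av_read_frame(pFormatCtx, &packet) >= 0) {     /* 8. read packets */
        if (packet.stream_index == videoStream) {         /* 9. decode video packets */
            avcodec_decode_video(pCodecCtx, pFrame, &frameFinished,
                                 packet.data, packet.size);
            if (frameFinished) {
                /* use pFrame here */
            }
        }
        av_free_packet(&packet);
    }

    avcodec_close(pCodecCtx);                             /* 10. release the decoder */
    av_close_input_file(pFormatCtx);                      /* 11. close the input file */
    return 0;
}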
The first thing is to open a video file and get the streams out of it. Before anything else, we call av_register_all() to initialize libavformat/libavcodec:
This step registers all available file formats and codecs in the library, so that when a file is opened the matching format and codec can be picked automatically. av_register_all() only needs to be called once, so it belongs in your initialization code. You can also register just individual file formats and codecs.
Next, open the file:
AVFormatContext *pFormatCtx;
const char *filename = "myvideo.mpg";
av_open_input_file(&pFormatCtx, filename, NULL, 0, NULL); // open the video file
The last three parameters describe the file format, the buffer size, and format parameters. By passing NULL or 0 we tell libavformat to auto-detect the file format and use a default buffer size. The format parameters here mean video parameters such as the width and height.
Next, we need to retrieve the stream information contained in the file:
av_find_stream_info(pFormatCtx); // retrieve the stream information
This fills in the streams field of the AVFormatContext struct.
dump_format(pFormatCtx, 0, filename, false); // this function prints all of the parameters we obtained
for (i = 0; i < pFormatCtx->nb_streams; i++)  // walk the streams to tell video from audio
    if (pFormatCtx->streams[i]->codec->codec_type == CODEC_TYPE_VIDEO)  // find the video stream (change to AUDIO for an audio stream)
    {
        videoStream = i;
        break;
    }
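The codec context pCodecCtx used in the next step is not declared in the excerpt above; with the legacy API it is simply taken from the stream we just found (an assumed but typical line):

AVCodecContext *pCodecCtx = pFormatCtx->streams[videoStream]->codec;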
Next we need to find the decoder:
AVCodec *pCodec;
pCodec = avcodec_find_decoder(pCodecCtx->codec_id);
avcodec_open(pCodecCtx, pCodec); // open the decoder
Allocate space for a video frame to hold the decoded image:
AVFrame *pFrame;
pFrame = avcodec_alloc_frame();
//////////////////////////////////////// Start decoding ////////////////////////////////////////
The first step is to read data:
What we will do is read the whole video stream packet by packet, decode each packet into frames, convert the format, and save the result.
while (av_read_frame(pFormatCtx, &packet) >= 0) {   // read a packet
    if (packet.stream_index == videoStream) {       // is this a packet from the video stream?
        avcodec_decode_video(pCodecCtx, pFrame, &frameFinished,
                             packet.data, packet.size); // decode
        if (frameFinished) {
            img_convert((AVPicture *)pFrameRGB, PIX_FMT_RGB24,
                        (AVPicture *)pFrame, pCodecCtx->pix_fmt,
                        pCodecCtx->width, pCodecCtx->height); // convert to RGB
            SaveFrame(pFrameRGB, pCodecCtx->width, pCodecCtx->height, i); // save the frame
        }
    }
    av_free_packet(&packet); // release the packet
}
av_read_frame() reads a packet and stores it in the AVPacket struct; the data can later be released with av_free_packet(). avcodec_decode_video() turns the packet into a frame. However, one packet does not necessarily give us a complete frame, so avcodec_decode_video() sets the frameFinished flag for us once the frame is complete. Finally we use img_convert() to convert the frame from its native format (pCodecCtx->pix_fmt) to RGB. Remember that you can cast an AVFrame struct pointer to an AVPicture struct pointer. At the end we pass the frame and its width and height to our SaveFrame function.
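SaveFrame itself is not shown here; a minimal sketch that writes the RGB24 buffer out as a PPM image, as this tutorial series usually does (the name and signature are assumptions), could look like this:

void SaveFrame(AVFrame *pFrame, int width, int height, int iFrame) {
    char szFilename[32];
    sprintf(szFilename, "frame%d.ppm", iFrame);         // one file per frame
    FILE *pFile = fopen(szFilename, "wb");
    if (!pFile)
        return;
    fprintf(pFile, "P6\n%d %d\n255\n", width, height);  // PPM header
    for (int y = 0; y < height; y++)                    // write the pixel data row by row
        fwrite(pFrame->data[0] + y * pFrame->linesize[0], 1, width * 3, pFile);
    fclose(pFile);
}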
After decoding, display is normally done with SDL. Since we will later display through firmware, SDL is skipped here.
Audio/Video Synchronization
DTS (decoding timestamp) and PTS (presentation timestamp)
When we call av_read_frame() to get a packet, the PTS and DTS information is stored in that packet. But what we really want is the PTS of the raw frame we just decoded, so that we know when to display it. However, the frame we get from avcodec_decode_video() is just an AVFrame and does not carry a useful PTS value (note: AVFrame itself does not contain usable timestamp information here). So we save the PTS of the first packet of a frame: that becomes the PTS of the whole frame. How do we know which packet is the first packet of a frame? Whenever a packet starts a new frame,
avcodec_decode_video() calls a function to allocate a buffer for that frame, and FFmpeg allows us to replace that allocation function with our own (see the sketch below). From the timestamps of the previous frame and the current frame we can also predict the timestamp of the next one. At the same time we need to synchronize the video to the audio: we keep an audio clock (audioClock), an internal value that records the position of the audio currently being played, just like the time readout on any MP3 player. Since we synchronize the video to the audio, the video thread uses this value to decide whether it is running too fast or too slow.
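A minimal sketch of that buffer-allocation override, in the spirit of the old tutorial code (avcodec_default_get_buffer/avcodec_default_release_buffer belong to the legacy API, and the global_video_pkt_pts variable is an assumption of this sketch):

uint64_t global_video_pkt_pts = AV_NOPTS_VALUE; // PTS of the packet currently being fed to the decoder

/* Called by the decoder whenever it needs a buffer for a new frame:
 * stash the PTS of the packet that started this frame in frame->opaque. */
int our_get_buffer(struct AVCodecContext *c, AVFrame *pic) {
    int ret = avcodec_default_get_buffer(c, pic);
    uint64_t *pts = av_malloc(sizeof(uint64_t));
    *pts = global_video_pkt_pts;
    pic->opaque = pts;
    return ret;
}

void our_release_buffer(struct AVCodecContext *c, AVFrame *pic) {
    if (pic)
        av_freep(&pic->opaque);
    avcodec_default_release_buffer(c, pic);
}

/* Installed once after avcodec_open():
 *     pCodecCtx->get_buffer = our_get_buffer;
 *     pCodecCtx->release_buffer = our_release_buffer;
 * and before each avcodec_decode_video() call:
 *     global_video_pkt_pts = packet.pts;
 */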
How to solve audio/video synchronization problems when using the FFmpeg SDK for video transcoding and compression:
When using the FFmpeg SDK to transcode and compress video, you may look at the output after a successful transcode and find that the audio and video are out of sync. This is indeed annoying; I ran into it while using the FFmpeg SDK to re-encode FLV files to H.264.
After some investigation it turns out that the FFmpeg SDK controls the written timestamps in two places: AVPacket and AVFrame. When avcodec_encode_video is called you pass in an AVFrame pointer, that is, one uncompressed video frame to be compressed. AVFrame contains a pts field, which is the timestamp at which this frame will be presented during later playback. AVPacket also carries pts and dts. To explain them we have to describe the three kinds of compressed video frames: I, P and B. An I-frame is a key frame and does not depend on any other frame. A P-frame is a forward-predicted frame that depends only on the previous frame. A B-frame is a bidirectionally predicted frame that depends on both the previous and the following frame; because it is bidirectional, the surrounding frames must be known before the final picture of a B-frame can be produced. The pts and dts fields are what control the display order and the decoding order of the frames.
PTS is the order in which frames are displayed.
DTS is the order in which frames are read and decoded.
If there are no B-frames, DTS and PTS are identical; otherwise they differ. For the details, refer to how MPEG works.
So what values should pts and dts in the AVPacket take?
PTS and DTS only have to establish the decoding and display order of the frames, incremented by one for each frame, rather than being the playback timestamps of the video.
However, it turns out that video decoded from RMVB does not have a fixed frame rate; the frame rate varies. So if every frame is compressed and pts/dts are simply incremented by one, the audio and video end up out of sync.
How can we solve the problem of audio/video synchronization?
See the following code snippet.
lTimeStamp is the timestamp of the current video frame obtained from DirectShow.
m_llframe_index is the number of frames compressed so far.
First, av_rescale computes the output timestamp that the frame currently being compressed would get. If that timestamp is already ahead of the timestamp of the frame DirectShow just delivered, the frame is dropped.
Otherwise the frame is compressed and pts and dts are set on the AVPacket, assuming there are no B-frames.
Because the video will later be played back at the fixed frame rate we configured, we compare the frame timestamp computed from that target frame rate with the timestamp of the current frame provided by DirectShow and decide whether a playback delay is needed. If a delay is needed, pkt.duration is set to more than one step so the frame is held longer; otherwise it is set to 1. dts is set the same as for normal-speed playback.
__int64 x = av_rescale(m_llframe_index, AV_TIME_BASE * (int64_t)c->time_base.num, c->time_base.den);
if (x > lTimeStamp)
{
    return true; // drop this frame
}
m_pvideoframe2->pts = lTimeStamp;
m_pvideoframe2->pict_type = 0;
int out_size = avcodec_encode_video(c, m_pvideo_outbuf, video_outbuf_size, m_pvideoframe2);
/* if zero size, it means the image was buffered */
if (out_size > 0)
{
    AVPacket pkt;
    av_init_packet(&pkt);
    if (x > lTimeStamp)
    {
        pkt.pts = pkt.dts = m_llframe_index;
        pkt.duration = 0;
    }
    else
    {
        pkt.duration = (lTimeStamp - x) * c->time_base.den / 1000000 + 1;
        pkt.pts = m_llframe_index;
        pkt.dts = pkt.pts;
        m_llframe_index += pkt.duration;
    }
    // pkt.pts = lTimeStamp * (__int64)frame_rate.den / 1000;
    if (c->coded_frame && c->coded_frame->key_frame)
    {
        pkt.flags |= PKT_FLAG_KEY;
    }
    pkt.stream_index = m_pvideostream->index;
    pkt.data = m_pvideo_outbuf;
    pkt.size = out_size;
    /* write the compressed frame into the media file */
    ret = av_interleaved_write_frame(m_pavformatcontext, &pkt);
}
else
{
    ret = 0;
}
Why do frames decoded by avcodec_decode_video sometimes get a smaller PTS than earlier frames?
The following code is provided:
while (av_read_frame(pFormatCtxSource, &packet) >= 0)
{
    if (packet.stream_index == videoStream)
    {
        int out_size = avcodec_decode_video(pCodecCtxSource, pFrameSource, &bFrameFinished, packet.data, packet.size); // decode from source
        if (bFrameFinished)
        {
            pFrameSource->pts = av_rescale_q(packet.pts, pCodecCtxSource->time_base, pStCodec->time_base);
            int out_size = avcodec_encode_video(pStCodec, video_buffer, 200000, pFrameSource); // encode to output
            if (out_size > 0)
            {
                // ...
            }
        }
    }
    av_free_packet(&packet);
}
When decoding, the pFrameSource->pts computed for the first frame is 96. When the second frame is decoded, pFrameSource->pts comes out as 80, and the next few frames are also smaller than 96. After a while a frame above 100 arrives, and the frames after it are again smaller than it. Why is that? On the encode side, the frame with pts = 96 is encoded first, and encoding the following frames with smaller pts returns -1 until a frame with pts greater than 96 shows up.
Also, is this way of computing PTS correct?
Reply:
Because you have B-frames.
For example, the input sequence for the video encoder is:
1 2 3 4 5 6 7
I B B P B B I
Let's take 1, 2, 3, ... as the PTS for simplification.
The output sequence of the video encoder (which equals the decoder input sequence) is:
1 4 2 3 7 5 6
I P B B I B B
So you will get a PTS sequence like this:
1 4 2 3 7 5 6
The 7 5 6 part is the same situation as in your question.
Q:
Oh, so PTS should not be computed the way I did it, simply adding 1 each time? Then where should the pts and dts in the packet be used? If I decode the data in storage order, do I need to buffer the frames myself? Thanks!
One more question: since the decoded pictures are not necessarily obtained in ascending PTS order, when I re-encode, should I encode in the order the frames come out of the decoder, or should I first buffer the frames and then encode strictly in the display order of the pictures? Expressed in code:
Method 1:
while (av_read_frame)
{
    decode;
    pts += 1;
    encode;
    output;
}
Method 2:
while (av_read_frame)
{
    decode;
    if (pts < previous)
    {
        cache the frame;
    }
    else
    {
        encode the cached frames and write them to the file;
    }
}
Which of the two methods is correct? The code I have seen online uses method 1, but I think method 2 is the right one?
A:
The output of the decoder is already in the right order for display, because I/P frames are cached until the next I/P frame arrives.
Understanding:
After the decoder, frames come out in normal PTS order, that is, display order; if there are B-frames, the decoder buffers frames internally.
After the encoder, however, packets come out in DTS order.
PTS and DTS here are not timestamps; they should be understood as frame sequence numbers. The duration of each frame is not necessarily the same and may change, so converting a PTS into a real timestamp means multiplying by the frame duration (the stream time base), not by the frame rate.
To deepen the understanding: think of the PTS as "this is the n-th frame to be displayed". If the frame duration never changes, the display timestamp is simply PTS × frame duration; if the frame duration changes, the per-frame durations have to be accumulated instead.
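In FFmpeg this conversion looks like the following small illustration (video_st stands for the AVStream in question; the variable names are assumptions):

/* Convert a PTS counted in stream time-base ticks into seconds.
 * av_q2d() turns the AVRational time base into a double (seconds per tick). */
double pts_seconds = pts * av_q2d(video_st->time_base);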
Add a trace under the decode call in tutorial05 and print the values:
len1 = avcodec_decode_video(is->video_st->codec, pFrame, &frameFinished,
                            packet->data, packet->size);
printf("-------------------------------------------------------------------------------\n");
printf("avcodec_decode_video packet->pts: %x, packet->dts: %x\n", packet->pts, packet->dts);
printf("avcodec_decode_video pFrame->pkt_pts: %x, pFrame->pkt_dts: %x, pFrame->pts: %x\n", pFrame->pkt_pts, pFrame->pkt_dts, pFrame->pts);
if (pFrame->opaque)
    printf("avcodec_decode_video *(uint64_t *)pFrame->opaque: %x\n", *(uint64_t *)pFrame->opaque);
Trace output for an MP4 file:
-----------------------------------------------------------------------------
avcodec_decode_video packet->pts: 1ae, packet->dts: 0
avcodec_decode_video pFrame->pkt_pts: 0, pFrame->pkt_dts: 80000000, pFrame->pts: 0
avcodec_decode_video *(uint64_t *)pFrame->opaque: 1ae
-----------------------------------------------------------------------------
avcodec_decode_video packet->pts: 1af, packet->dts: 0
avcodec_decode_video pFrame->pkt_pts: 0, pFrame->pkt_dts: 80000000, pFrame->pts: 0
avcodec_decode_video *(uint64_t *)pFrame->opaque: 1af
-----------------------------------------------------------------------------
avcodec_decode_video packet->pts: 24c, packet->dts: 0
avcodec_decode_video pFrame->pkt_pts: 0, pFrame->pkt_dts: 80000000, pFrame->pts: 0
avcodec_decode_video *(uint64_t *)pFrame->opaque: 1ac
-----------------------------------------------------------------------------
avcodec_decode_video packet->pts: 24d, packet->dts: 0
avcodec_decode_video pFrame->pkt_pts: 0, pFrame->pkt_dts: 80000000, pFrame->pts: 0
avcodec_decode_video *(uint64_t *)pFrame->opaque: 24d
-----------------------------------------------------------------------------
avcodec_decode_video packet->pts: 24e, packet->dts: 0
avcodec_decode_video pFrame->pkt_pts: 0, pFrame->pkt_dts: 80000000, pFrame->pts: 0
avcodec_decode_video *(uint64_t *)pFrame->opaque: 24e
Trace output for an RM file:
-----------------------------------------------------------------------------
avcodec_decode_video packet->pts: 1831b, packet->dts: 0
avcodec_decode_video pFrame->pkt_pts: 0, pFrame->pkt_dts: 80000000, pFrame->pts: 0
avcodec_decode_video *(uint64_t *)pFrame->opaque: 1831b
-----------------------------------------------------------------------------
avcodec_decode_video packet->pts: 18704, packet->dts: 0
avcodec_decode_video pFrame->pkt_pts: 0, pFrame->pkt_dts: 80000000, pFrame->pts: 0
avcodec_decode_video *(uint64_t *)pFrame->opaque: 18704
-----------------------------------------------------------------------------
avcodec_decode_video packet->pts: 18aed, packet->dts: 0
avcodec_decode_video pFrame->pkt_pts: 0, pFrame->pkt_dts: 80000000, pFrame->pts: 0
avcodec_decode_video *(uint64_t *)pFrame->opaque: 18aed
-----------------------------------------------------------------------------
avcodec_decode_video packet->pts: 18ed6, packet->dts: 0
avcodec_decode_video pFrame->pkt_pts: 0, pFrame->pkt_dts: 80000000, pFrame->pts: 0
avcodec_decode_video *(uint64_t *)pFrame->opaque: 18ed6
-----------------------------------------------------------------------------
avcodec_decode_video packet->pts: 192bf, packet->dts: 0
avcodec_decode_video pFrame->pkt_pts: 0, pFrame->pkt_dts: 80000000, pFrame->pts: 0
avcodec_decode_video *(uint64_t *)pFrame->opaque: 192bf
-----------------------------------------------------------------------------
avcodec_decode_video packet->pts: 196a8, packet->dts: 0
avcodec_decode_video pFrame->pkt_pts: 0, pFrame->pkt_dts: 80000000, pFrame->pts: 0
avcodec_decode_video *(uint64_t *)pFrame->opaque: 196a8
We can see that in the MP4 file the PTS increases by 1 each time, while in the RM file it increases in larger, irregular steps. When the packet fed to the decoder carries a PTS, the frame produced by the decoder is given that packet's PTS. When the packet contains only part of a frame, or is a B-frame, the decoder may not produce a frame at all, or may buffer it, because the frames it outputs must come out in proper PTS order; in those cases the frame's PTS can be empty. If the frame PTS (that is, the value in opaque) is empty, we look at the dts; if the dts is also missing, the frame's PTS is taken as 0.
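A sketch of that fallback logic as it sits around the decode call (tutorial-era API; the variable names are assumptions):

/* Pick a PTS for the frame we just decoded. */
double pts;
if (pFrame->opaque && *(uint64_t *)pFrame->opaque != AV_NOPTS_VALUE) {
    pts = *(uint64_t *)pFrame->opaque;   // PTS stashed by the get_buffer override
} else if (packet->dts != AV_NOPTS_VALUE) {
    pts = packet->dts;                   // fall back to the packet DTS
} else {
    pts = 0;                             // nothing usable: treat it as 0
}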
As for:
pts *= av_q2d(is->video_st->time_base); // i.e. convert the PTS into seconds using the stream time base
// Did we get a video frame?
if (frameFinished) {
    pts = synchronize_video(is, pFrame, pts);
    // synchronize_video does the following:
    // 1. If a PTS was obtained, use it.
    // 2. If no PTS was obtained, use the PTS (video clock) carried over from the previous frame.
    // 3. If the frame is to be displayed repeatedly, add (number of repeats * frame duration) on top.
    if (queue_picture(is, pFrame, pts) < 0) { // queue the decoded frame so it can be shown later
static double synchronize_video(VideoState *is, AVFrame *src_frame, double pts) {
    double frame_delay;
    if (pts != 0) {
        /* if we have pts, set the video clock to it */
        is->video_clock = pts;
    } else {
        /* if we aren't given a pts, set it to the clock */
        pts = is->video_clock;
    }
    /* update the video clock */
    // The key point: the pts passed in is the timestamp at which the current frame starts playing,
    // and frame_delay below is how long the frame will be shown, so (pts + frame_delay) is the
    // timestamp at which the next frame should be played.
    frame_delay = av_q2d(is->video_st->codec->time_base);
    /* if we are repeating a frame, adjust the clock accordingly */
    // If the frame is repeated, the extra display time has to be added as well.
    frame_delay += src_frame->repeat_pict * (frame_delay * 0.5);
    is->video_clock += frame_delay;
    return pts; // return this frame's display timestamp; is->video_clock now predicts the next frame's timestamp
}
A timer is then used to display the decoded frames from the frame queue. From the analysis above we know the frames were inserted into the queue in PTS order. The timer's job is to cope with the fact that the timestamps are not linear (frame durations differ and frames may repeat), so tutorial05 can only play by constantly adjusting the timer: it keeps catching up.
A netizen gave an intuitive and simple analogy:
ccq (183892517) 17:05:21
if (packet->dts == AV_NOPTS_VALUE — does that mean no DTS was obtained?
David CEN (3727567) 17:06:44
Think of the audio as a ruler and the video as an ant walking along it.
David CEN (3727567) 17:06:58
The ruler advances at constant speed, while the ant is sometimes fast and sometimes slow.
David CEN (3727567) 17:07:18
When the ant falls behind you whip it so it runs faster; when it gets ahead you hold it back.
David CEN (3727567) 17:07:38
That way the audio (the ruler) and the video (the ant) stay synchronized.
David CEN (3727567) 17:08:00
The biggest problem here is that the audio runs at constant speed while the video is non-linear.
In addition, the PTS stored in vp->pts has already been converted into a timestamp in seconds; together with the frame duration it determines when the next frame should be shown.
static void video_refresh_timer(void *userdata) {
    VideoState *is = (VideoState *)userdata;
    VideoPicture *vp;
    double actual_delay, delay, sync_threshold, ref_clock, diff;
    if (is->video_st) {
        if (is->pictq_size == 0) {
            schedule_refresh(is, 1);
        } else {
            vp = &is->pictq[is->pictq_rindex];
            delay = vp->pts - is->frame_last_pts; /* the pts from last time */
            // This is the interval between the frame about to be displayed and the previous one.
            if (delay <= 0 || delay >= 1.0) {
                /* if incorrect delay, use previous one */
                delay = is->frame_last_delay;
            }
            /* save for next time */
            is->frame_last_delay = delay;
            is->frame_last_pts = vp->pts;
            /* update delay to sync to audio */
            ref_clock = get_audio_clock(is); // get the timestamp of the audio currently being played
            diff = vp->pts - ref_clock; // how far this frame's display time is ahead of (or behind) the audio clock;
            // during that interval the audio advances at constant speed, while the video may run fast or slow.
            /* Skip or repeat the frame. Take delay into account.
               FFplay still doesn't "know if this is the best guess." */
            sync_threshold = (delay > AV_SYNC_THRESHOLD) ? delay : AV_SYNC_THRESHOLD;
            if (fabs(diff) < AV_NOSYNC_THRESHOLD) {
                if (diff <= -sync_threshold) {
                    // The video is behind the audio: show the next frame as soon as possible
                    // (after video_display shows the current frame, the timer fires almost immediately).
                    delay = 0;
                } else if (diff >= sync_threshold) {
                    // The video is ahead of the audio: stretch the interval between the two frames.
                    // For example, if the display interval between frame 1 and frame 2 is 40 ms but the
                    // audio that accompanies them lasts 55 ms, we cannot distort the audio, so we enlarge
                    // the display interval instead: where we would have waited 30 ms before showing the
                    // next frame, we now wait 60 ms.
                    delay = 2 * delay;
                }
            }
            // If diff exceeds AV_NOSYNC_THRESHOLD, the jump is too large (for example after a seek or fast
            // forward) and no audio/video synchronization is attempted.
            is->frame_timer += delay;
            /* compute the real delay */
            actual_delay = is->frame_timer - (av_gettime() / 1000000.0);
            if (actual_delay < 0.010) {
                /* really, it should skip the picture instead */
                actual_delay = 0.010;
            }
            schedule_refresh(is, (int)(actual_delay * 1000 + 0.5)); // arm the timer for the next frame
            /* show the picture! */
            video_display(is); // display the current frame immediately
            /* update queue for next picture! */
            if (++is->pictq_rindex == VIDEO_PICTURE_QUEUE_SIZE) {
                is->pictq_rindex = 0;
            }
            SDL_LockMutex(is->pictq_mutex);
            is->pictq_size--;
            SDL_CondSignal(is->pictq_cond);
            SDL_UnlockMutex(is->pictq_mutex);
        }
    } else {
        schedule_refresh(is, 100);
    }
}