FFmpeg Integrated Application Example (4): Audio-Video Synchronization for Camera Live Streaming


In the camera live-streaming article of this FFmpeg tutorial series, an example was given of reading video data from a PC camera and sending it as a live stream over the RTMP protocol, but audio support was not implemented. This article completes that example, analyzes the audio-video synchronization problem of a live stream in detail, and gives a code example.

For a live stream, only synchronization on the sending side is considered here. The principle is very simple and can be summed up in the following steps:

1. Parse the A/V streams and express the video and audio timestamps in the same time base.

2. Compare the two converted timestamps; the smaller value identifies the stream that is lagging and should be sent next.

3. Read, transcode, and send data from that stream; at the same time, if transcoding runs ahead of the wall clock, apply a corresponding delay.

4. Repeat the above process in a loop.

The code in this article is modified from the previous article in two main parts: the audio transcoding and the audio-video synchronization.

Basic flow of audio transcoding

First come some basic settings for the audio input and output, which are simple and common, as follows:

	//Set your own audio device's name
	if (avformat_open_input(&ifmt_ctx_a, device_name_a, ifmt, &device_param) != 0) {
		printf("Couldn't open input audio stream.\n");
		return -1;
	}
	//...
	//input audio initialize
	if (avformat_find_stream_info(ifmt_ctx_a, NULL) < 0) {
		printf("Couldn't find audio stream information.\n");
		return -1;
	}
	audioindex = -1;
	for (i = 0; i < ifmt_ctx_a->nb_streams; i++) {
		if (ifmt_ctx_a->streams[i]->codec->codec_type == AVMEDIA_TYPE_AUDIO) {
			audioindex = i;
			break;
		}
	}
	if (audioindex == -1) {
		printf("Couldn't find an audio stream.\n");
		return -1;
	}
	if (avcodec_open2(ifmt_ctx_a->streams[audioindex]->codec,
			avcodec_find_decoder(ifmt_ctx_a->streams[audioindex]->codec->codec_id), NULL) < 0) {
		printf("Could not open audio codec.\n");
		return -1;
	}
	//...
	//output audio encoder initialize
	pCodec_a = avcodec_find_encoder(AV_CODEC_ID_AAC);
	if (!pCodec_a) {
		printf("Can not find output audio encoder!\n");
		return -1;
	}
	pCodecCtx_a = avcodec_alloc_context3(pCodec_a);
	pCodecCtx_a->channels = 2;
	pCodecCtx_a->channel_layout = av_get_default_channel_layout(2);
	pCodecCtx_a->sample_rate = ifmt_ctx_a->streams[audioindex]->codec->sample_rate;
	pCodecCtx_a->sample_fmt = pCodec_a->sample_fmts[0];
	pCodecCtx_a->bit_rate = 32000;
	pCodecCtx_a->time_base.num = 1;
	pCodecCtx_a->time_base.den = pCodecCtx_a->sample_rate;
	/** Allow the use of the experimental AAC encoder. */
	pCodecCtx_a->strict_std_compliance = FF_COMPLIANCE_EXPERIMENTAL;
	/* Some formats want stream headers to be separate. */
	if (ofmt_ctx->oformat->flags & AVFMT_GLOBALHEADER)
		pCodecCtx_a->flags |= CODEC_FLAG_GLOBAL_HEADER;
	if (avcodec_open2(pCodecCtx_a, pCodec_a, NULL) < 0) {
		printf("Failed to open output audio encoder!\n");
		return -1;
	}

	//Add a new stream to output; should be called by the user before avformat_write_header() for muxing
	audio_st = avformat_new_stream(ofmt_ctx, pCodec_a);
	if (audio_st == NULL) {
		return -1;
	}
	audio_st->time_base.num = 1;
	audio_st->time_base.den = pCodecCtx_a->sample_rate;
	audio_st->codec = pCodecCtx_a;
Next, since the input audio sample format may need to be converted, the functions of the swresample library are needed.

First, do the corresponding initialization:

	//Initialize the resampler to be able to convert audio sample formats
	aud_convert_ctx = swr_alloc_set_opts(NULL,
		av_get_default_channel_layout(pCodecCtx_a->channels),
		pCodecCtx_a->sample_fmt,
		pCodecCtx_a->sample_rate,
		av_get_default_channel_layout(ifmt_ctx_a->streams[audioindex]->codec->channels),
		ifmt_ctx_a->streams[audioindex]->codec->sample_fmt,
		ifmt_ctx_a->streams[audioindex]->codec->sample_rate,
		0, NULL);
	swr_init(aud_convert_ctx);
In addition, following the approach of FFmpeg's transcode_aac.c example, a FIFO buffer is used to store the audio samples decoded from the input; this data is then converted to the target sample format and encoded, which completes the audio transcoding. It is similar to the video transcoding in the previous article.

Another buffer is also required to store the audio data after format conversion.

	//Initialize the FIFO buffer to store audio samples to be encoded.
	AVAudioFifo *fifo = NULL;
	fifo = av_audio_fifo_alloc(pCodecCtx_a->sample_fmt, pCodecCtx_a->channels, 1);

	//Initialize the buffer to store converted samples to be encoded.
	uint8_t **converted_input_samples = NULL;
	/**
	 * Allocate as many pointers as there are audio channels.
	 * Each pointer will later point to the audio samples of the corresponding
	 * channel (although it may be NULL for interleaved formats).
	 */
	//Note: allocating sizeof(*converted_input_samples) (a pointer) per channel;
	//the original used sizeof(**converted_input_samples), which is only one byte.
	if (!(converted_input_samples = (uint8_t **)calloc(pCodecCtx_a->channels,
			sizeof(*converted_input_samples)))) {
		printf("Could not allocate converted input sample pointers\n");
		return AVERROR(ENOMEM);
	}
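
To see why the FIFO is needed, consider that the capture side and the encoder consume samples in different chunk sizes. The following toy model (plain C with no FFmpeg calls; the 441-sample input chunk and 1024-sample output frame are illustrative assumptions, not values from the article) shows the accumulate-then-drain pattern that the real code implements with av_audio_fifo_write() and av_audio_fifo_read():

	#include <stdio.h>

	int main(void) {
		const int input_chunk = 441;    /* samples per decoded input packet (example) */
		const int output_frame = 1024;  /* samples consumed per encoded AAC frame     */
		int fifo = 0, frames_out = 0;

		for (int pkt = 0; pkt < 10; pkt++) {
			fifo += input_chunk;             /* like av_audio_fifo_write()          */
			while (fifo >= output_frame) {   /* enough samples for the encoder?     */
				fifo -= output_frame;        /* like av_audio_fifo_read() + encode  */
				frames_out++;
			}
			printf("after packet %d: fifo=%d samples, frames encoded=%d\n",
			       pkt + 1, fifo, frames_out);
		}
		return 0;
	}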
At this point, the basic initialization work is complete. For now, set aside the audio-video synchronization content and look only at the audio transcoding part. Three variables that appear in the code can be ignored for the moment: aud_next_pts, vid_next_pts, and encode_audio.

Readers who have seen my video live-streaming tutorial article will find the method of calculating the PTS here similar. That is, the time interval between every two audio samples is derived from the sample rate, and the timestamp of the current encoded audio frame is obtained by counting the total number of audio samples encoded so far (this is the role of the nb_samples variable).

By analogy with the video stream, the relationship is roughly: framerate corresponds to sample_rate, and framecnt corresponds to nb_samples.
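
To make the arithmetic concrete, here is a minimal standalone sketch using the article's variable names; the 44100 Hz sample rate and the 1024-sample AAC frame size are assumptions for illustration, not values fixed by the article:

	#include <stdint.h>
	#include <stdio.h>

	int main(void) {
		const int64_t TIME_BASE_US = 1000000; /* FFmpeg's internal time base is 1/1000000 s */
		const int sample_rate = 44100;        /* plays the role framerate plays for video   */
		int64_t nb_samples = 0;               /* plays the role framecnt plays for video    */
		/* calc_duration: interval covered by one sample, in internal time-base units */
		const int64_t calc_duration = (int64_t)((double)TIME_BASE_US / sample_rate);

		for (int frame = 0; frame < 3; frame++) {
			int64_t pts = nb_samples * calc_duration; /* timestamp of this encoded frame */
			printf("audio frame %d: pts = %lld us\n", frame, (long long)pts);
			nb_samples += 1024;  /* one encoded AAC frame consumes 1024 samples */
		}
		return 0;
	}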

Also note that the delay method here differs somewhat from the earlier one; we will not worry about that for now and will first concentrate on the basic flow of audio transcoding.

	//audio transcoding here
	const int output_frame_size = pCodecCtx_a->frame_size;

	/**
	 * Make sure that there is one frame worth of samples in the FIFO
	 * buffer so that the encoder can do its work.
	 * Since the decoder's and the encoder's frame size may differ, we
	 * need the FIFO buffer to store as many frames worth of input samples
	 * that they make up at least one frame worth of output samples.
	 */
	while (av_audio_fifo_size(fifo) < output_frame_size) {
		/**
		 * Decode one frame worth of audio samples, convert it to the
		 * output sample format and put it into the FIFO buffer.
		 */
		AVFrame *input_frame = av_frame_alloc();
		if (!input_frame) {
			ret = AVERROR(ENOMEM);
			return ret;
		}

		/** Decode one frame worth of audio samples. */
		/** Packet used for temporary storage. */
		AVPacket input_packet;
		av_init_packet(&input_packet);
		input_packet.data = NULL;
		input_packet.size = 0;

		/** Read one audio frame from the input file into a temporary packet. */
		if ((ret = av_read_frame(ifmt_ctx_a, &input_packet)) < 0) {
			/** If we are at the end of the file, flush the decoder below. */
			if (ret == AVERROR_EOF) {
				encode_audio = 0;
			} else {
				printf("Could not read audio frame\n");
				return ret;
			}
		}

		/**
		 * Decode the audio frame stored in the temporary packet.
		 * The input audio stream decoder is used to do this.
		 * If we are at the end of the file, pass an empty packet to the decoder
		 * to flush it.
		 */
		if ((ret = avcodec_decode_audio4(ifmt_ctx_a->streams[audioindex]->codec, input_frame,
				&dec_got_frame_a, &input_packet)) < 0) {
			printf("Could not decode audio frame\n");
			return ret;
		}
		av_packet_unref(&input_packet);

		/** If there is decoded data, convert and store it. */
		if (dec_got_frame_a) {
			/**
			 * Allocate memory for the samples of all channels in one consecutive
			 * block for convenience.
			 */
			if ((ret = av_samples_alloc(converted_input_samples, NULL, pCodecCtx_a->channels,
					input_frame->nb_samples, pCodecCtx_a->sample_fmt, 0)) < 0) {
				printf("Could not allocate converted input samples\n");
				av_freep(&converted_input_samples[0]);
				free(converted_input_samples);
				return ret;
			}

			/**
			 * Convert the input samples to the desired output sample format.
			 * This requires a temporary storage provided by converted_input_samples.
			 */
			if ((ret = swr_convert(aud_convert_ctx, converted_input_samples, input_frame->nb_samples,
					(const uint8_t **)input_frame->extended_data, input_frame->nb_samples)) < 0) {
				printf("Could not convert input samples\n");
				return ret;
			}

			/** Add the converted input samples to the FIFO buffer for later processing. */
			/**
			 * Make the FIFO as large as it needs to be to hold both
			 * the old and the new samples.
			 */
			if ((ret = av_audio_fifo_realloc(fifo, av_audio_fifo_size(fifo) + input_frame->nb_samples)) < 0) {
				printf("Could not reallocate FIFO\n");
				return ret;
			}

			/** Store the new samples in the FIFO buffer. */
			if (av_audio_fifo_write(fifo, (void **)converted_input_samples,
					input_frame->nb_samples) < input_frame->nb_samples) {
				printf("Could not write data to FIFO\n");
				return AVERROR_EXIT;
			}
		}
	}

	/**
	 * If we have enough samples for the encoder, we encode them.
	 * At the end of the file, we pass the remaining samples to the encoder.
	 */
	if (av_audio_fifo_size(fifo) >= output_frame_size) {
		/**
		 * Take one frame worth of audio samples from the FIFO buffer,
		 * encode it and write it to the output file.
		 */
		/** Temporary storage of the output samples of the frame written to the file. */
		AVFrame *output_frame = av_frame_alloc();
		if (!output_frame) {
			ret = AVERROR(ENOMEM);
			return ret;
		}

		/**
		 * Use the maximum number of possible samples per frame.
		 * If there is less than the maximum possible frame size in the FIFO
		 * buffer, use this number. Otherwise, use the maximum possible frame size.
		 */
		const int frame_size = FFMIN(av_audio_fifo_size(fifo), pCodecCtx_a->frame_size);

		/** Initialize temporary storage for one output frame. */
		/**
		 * Set the frame's parameters, especially its size and format.
		 * av_frame_get_buffer needs this to allocate memory for the
		 * audio samples of the frame.
		 * Default channel layouts based on the number of channels
		 * are assumed for simplicity.
		 */
		output_frame->nb_samples     = frame_size;
		output_frame->channel_layout = pCodecCtx_a->channel_layout;
		output_frame->format         = pCodecCtx_a->sample_fmt;
		output_frame->sample_rate    = pCodecCtx_a->sample_rate;

		/**
		 * Allocate the samples of the created frame. This call will make
		 * sure that the audio frame can hold as many samples as specified.
		 */
		if ((ret = av_frame_get_buffer(output_frame, 0)) < 0) {
			printf("Could not allocate output frame samples\n");
			av_frame_free(&output_frame);
			return ret;
		}

		/**
		 * Read as many samples from the FIFO buffer as required to fill the frame.
		 * The samples are stored in the frame temporarily.
		 */
		if (av_audio_fifo_read(fifo, (void **)output_frame->data, frame_size) < frame_size) {
			printf("Could not read data from FIFO\n");
			return AVERROR_EXIT;
		}

		/** Encode one frame worth of audio samples. */
		/** Packet used for temporary storage. */
		AVPacket output_packet;
		av_init_packet(&output_packet);
		output_packet.data = NULL;
		output_packet.size = 0;

		/** Set a timestamp based on the sample rate for the container. */
		if (output_frame) {
			nb_samples += output_frame->nb_samples;
		}

		/**
		 * Encode the audio frame and store it in the temporary packet.
		 * The output audio stream encoder is used to do this.
		 */
		if ((ret = avcodec_encode_audio2(pCodecCtx_a, &output_packet, output_frame, &enc_got_frame_a)) < 0) {
			printf("Could not encode frame\n");
			av_packet_unref(&output_packet);
			return ret;
		}

		/** Write one audio frame from the temporary packet to the output file. */
		if (enc_got_frame_a) {
			output_packet.stream_index = 1;

			AVRational time_base = ofmt_ctx->streams[1]->time_base;
			AVRational r_framerate1 = { ifmt_ctx_a->streams[audioindex]->codec->sample_rate, 1 }; //e.g. {44100, 1}
			int64_t calc_duration = (double)(AV_TIME_BASE) * (1 / av_q2d(r_framerate1)); //internal timestamp

			output_packet.pts = av_rescale_q(nb_samples * calc_duration, time_base_q, time_base);
			output_packet.dts = output_packet.pts;
			output_packet.duration = output_frame->nb_samples;

			printf("audio pts: %lld\n", (long long)output_packet.pts);

			aud_next_pts = nb_samples * calc_duration;

			//Delay if we run ahead of the wall clock, but never past the video's progress
			int64_t pts_time = av_rescale_q(output_packet.pts, time_base, time_base_q);
			int64_t now_time = av_gettime() - start_time;
			if ((pts_time > now_time) && ((aud_next_pts + pts_time - now_time) < vid_next_pts))
				av_usleep(pts_time - now_time);

			if ((ret = av_interleaved_write_frame(ofmt_ctx, &output_packet)) < 0) {
				printf("Could not write frame\n");
				av_packet_unref(&output_packet);
				return ret;
			}
			av_packet_unref(&output_packet);
		}
		av_frame_free(&output_frame);
	}

AV Synchronization

Now let's look formally at how to do video-audio synchronization. First, define several variables:

	//int64_t rather than int, to avoid overflow in AV_TIME_BASE (microsecond) units
	int64_t aud_next_pts = 0;  //audio stream's current pts, which can be understood as its current progress
	int64_t vid_next_pts = 0;  //video stream's current pts
	int encode_video = 1, encode_audio = 1;  //whether video / audio still needs encoding
The corresponding audio-video synchronization method is as follows:

1. First determine whether video and audio need transcoding; at least one of them must still be active.

2. Compare the progress of the two streams using the av_compare_ts function (a toy sketch of this comparison follows this list). Note that at this point vid_next_pts and aud_next_pts are expressed in FFmpeg's internal time base, that is:

AVRational time_base_q = { 1, AV_TIME_BASE };

3. Transcode the stream whose progress lags behind, and update its progress accordingly. For video, vid_next_pts = framecnt * calc_duration; for audio, aud_next_pts = nb_samples * calc_duration. Here framecnt and nb_samples are effectively counters, while calc_duration is the time interval between two frames or samples of the corresponding stream, also expressed in FFmpeg's internal time base.

4. If transcoding finishes ahead of the wall clock, do not write to the output stream immediately; delay first, while also checking that the delay will not push this stream's progress past that of the other stream.
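
As promised in step 2, here is a toy sketch (not from the article) of the comparison that drives the scheduling decision; the two progress values are made up for illustration:

	/* build: gcc sync_toy.c -lavutil */
	#include <libavutil/avutil.h>
	#include <libavutil/mathematics.h>
	#include <stdio.h>

	int main(void) {
		AVRational time_base_q = { 1, AV_TIME_BASE };
		int64_t vid_next_pts = 400000;  /* video progress: 0.4 s (illustrative)   */
		int64_t aud_next_pts = 371519;  /* audio progress: ~0.37 s (illustrative) */
		/* av_compare_ts returns -1 if the first timestamp is earlier, 1 if later, 0 if equal */
		if (av_compare_ts(vid_next_pts, time_base_q, aud_next_pts, time_base_q) <= 0)
			printf("video lags -> transcode and send a video frame\n");
		else
			printf("audio lags -> transcode and send an audio frame\n");
		return 0;
	}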

In summary, the process is as follows:

	//start decode and encode
	int64_t start_time = av_gettime();
	while (encode_video || encode_audio) {
		if (encode_video &&
			(!encode_audio || av_compare_ts(vid_next_pts, time_base_q,
											aud_next_pts, time_base_q) <= 0)) {
			//video transcoding ...
			//after transcoding is completed:
			vid_next_pts = framecnt * calc_duration;  //general (internal) time base

			//delay
			int64_t pts_time = av_rescale_q(enc_pkt.pts, time_base, time_base_q);
			int64_t now_time = av_gettime() - start_time;
			if ((pts_time > now_time) && ((vid_next_pts + pts_time - now_time) < aud_next_pts))
				av_usleep(pts_time - now_time);

			//write stream ...
		} else {
			//audio transcoding ...
			//after transcoding is completed:
			aud_next_pts = nb_samples * calc_duration;

			//delay
			int64_t pts_time = av_rescale_q(output_packet.pts, time_base, time_base_q);
			int64_t now_time = av_gettime() - start_time;
			if ((pts_time > now_time) && ((aud_next_pts + pts_time - now_time) < vid_next_pts))
				av_usleep(pts_time - now_time);

			//write stream ...
		}
	}

At this point, the video-audio synchronization is complete. Finally, finish the work of flushing the encoders.
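
The article does not show the flush code itself. A minimal sketch, modeled on the flush pattern of the old encode API used throughout this series (the names pCodecCtx_a and the hardcoded audio stream index follow the code above), might look like this:

	/* Drain any samples still buffered inside the AAC encoder at end of stream. */
	if (pCodecCtx_a->codec->capabilities & CODEC_CAP_DELAY) {
		for (;;) {
			AVPacket enc_pkt;
			av_init_packet(&enc_pkt);
			enc_pkt.data = NULL;
			enc_pkt.size = 0;
			int got_frame = 0;
			/* Passing NULL as the frame asks the encoder to emit buffered packets. */
			int ret = avcodec_encode_audio2(pCodecCtx_a, &enc_pkt, NULL, &got_frame);
			if (ret < 0 || !got_frame)
				break;
			enc_pkt.stream_index = 1;  /* audio stream index in the output, as above */
			av_interleaved_write_frame(ofmt_ctx, &enc_pkt);
			av_packet_unref(&enc_pkt);
		}
	}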

In addition, there is a pitfall: when pushing a stream from a dshow device, the error message "real time buffer too full! dropping frames" is often reported. The cause is discussed in a separate article; it can be addressed by adding the rtbufsize parameter. The higher the bitrate, the larger rtbufsize needs to be, but a large rtbufsize introduces video delay, so to maintain synchronization it may be necessary to add a corresponding delay to the audio. In my tests, even without the rtbufsize parameter, the error message is printed but it does not affect viewing or recording the live stream, and the streams stay in sync. It is a matter of trade-offs.
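
For reference, a minimal sketch of passing rtbufsize through the options dictionary when opening the dshow device; ifmt_ctx_v and device_name_v are assumed names for the video side, and the 100 MB value is only an illustration to be tuned against the capture bitrate:

	AVInputFormat *ifmt = av_find_input_format("dshow");
	AVDictionary *device_param = NULL;
	/* ~100 MB real-time frame buffer; scale it with the capture bitrate */
	av_dict_set(&device_param, "rtbufsize", "100000000", 0);
	if (avformat_open_input(&ifmt_ctx_v, device_name_v, ifmt, &device_param) != 0) {
		printf("Couldn't open input video stream.\n");
		return -1;
	}
	av_dict_free(&device_param);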

Finally, the full source code of this project is available on GitHub. You are welcome to point out errors and to share and discuss.




