An Analysis of Some Basic GStreamer Concepts and A/V Synchronization


I. Streams
A media stream contains the following events and buffers:
- Events
  - NEW_SEGMENT (NS)
  - EOS (end of stream)*
  - TAG (T)
- Buffers (B)*
The items marked with * must be synchronized with the clock.

A typical stream is shown in Figure 1:

Figure 1: Media stream composition

(1) NEW_SEGMENT: rate, start/stop, time
Contains the valid timestamp range (start/stop), the stream_time, the requested playback rate, and the applied playback rate.
(2) Buffers
Only buffers whose timestamps fall between the start and stop of the NEW_SEGMENT can be displayed; otherwise they are discarded or clipped.

Calculation of running_time:
if (NS.rate > 0.0)
    running_time = (B.timestamp - NS.start) / NS.abs_rate + NS.accum
else
    running_time = (NS.stop - B.timestamp) / NS.abs_rate + NS.accum

Calculation of stream_time:
stream_time = (B.timestamp - NS.start) * NS.abs_applied_rate + NS.time
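To make the arithmetic concrete, here is a minimal C sketch that applies the two formulas above to example values. The NewSegment struct and its field names simply mirror the NS.*/B.* notation used here; they are illustrative stand-ins, not the actual GStreamer segment API.

#include <stdio.h>

/* Illustrative stand-in for the NEW_SEGMENT fields used in the formulas. */
typedef struct {
    double    rate;             /* requested playback rate                */
    double    abs_rate;         /* absolute value of the requested rate   */
    double    abs_applied_rate; /* absolute value of the applied rate     */
    long long start, stop;      /* valid timestamp range, in nanoseconds  */
    long long time;             /* stream_time of the segment start       */
    long long accum;            /* running_time accumulated so far        */
} NewSegment;

/* running_time of a buffer timestamp, following the formula above */
static long long running_time (const NewSegment *ns, long long ts)
{
    if (ns->rate > 0.0)
        return (long long) ((ts - ns->start) / ns->abs_rate) + ns->accum;
    else
        return (long long) ((ns->stop - ts) / ns->abs_rate) + ns->accum;
}

/* stream_time of a buffer timestamp, following the formula above */
static long long stream_time (const NewSegment *ns, long long ts)
{
    return (long long) ((ts - ns->start) * ns->abs_applied_rate) + ns->time;
}

int main (void)
{
    /* segment: rate 1.0, range 0 .. 5 s, starting at stream_time 0, accum 0 */
    NewSegment ns = { 1.0, 1.0, 1.0, 0, 5000000000LL, 0, 0 };
    long long buffer_ts = 1000000000LL;   /* buffer timestamp: 1 s */

    printf ("running_time = %lld ns\n", running_time (&ns, buffer_ts));
    printf ("stream_time  = %lld ns\n", stream_time  (&ns, buffer_ts));
    return 0;
}

For this example buffer both values come out as 1000000000 ns (1 second), since the segment starts at 0 and plays at normal rate.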

(3) EOS
Marks the end of the data.

II. Several clock concepts
1. Clock time (absolute_time): a global clock maintained by the pipeline. It is a monotonically increasing clock time, in nanoseconds, and can be obtained with gst_clock_get_time(). If no element in the pipeline provides a clock, the system clock is used.
2. Base time: the value of the global clock at the moment the media started playing from position 0. It can be obtained with gst_element_get_base_time().
3. Running time: the time elapsed while the media has been in the PLAYING state.
4. Stream time: the current playback position within the whole media stream.
These quantities are illustrated in Figure 2.

Figure 2: GStreamer clocks and time variables (from the GStreamer documentation)

From this we can conclude:
running_time = clock_time - base_time;
If a media stream is played from the beginning at normal rate, then running_time = stream_time.
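As a rough illustration of the first relation, the running time of a pipeline that is assumed to already be in the PLAYING state can be derived from the clock API; error handling (for example a missing clock) is omitted in this sketch.

#include <gst/gst.h>

/* Sketch: running_time = clock_time - base_time for a playing pipeline. */
static GstClockTime
get_running_time (GstElement *pipeline)
{
    GstClock     *clock = gst_element_get_clock (pipeline);       /* selected global clock */
    GstClockTime  now   = gst_clock_get_time (clock);             /* clock_time (absolute) */
    GstClockTime  base  = gst_element_get_base_time (pipeline);   /* base_time             */

    gst_object_unref (clock);
    return now - base;                                            /* running_time          */
}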

Detailed calculation of running time:
running_time is based on the clock selected by the pipeline and represents the total time the media has been playing, as shown in Table 1.

Table 1: Calculation of running_time

  Pipeline state    running_time
  --------------    -------------------------------------
  NULL / READY      undefined
  PAUSED            the time at which the pipeline paused
  PLAYING           absolute_time - base_time
  Flushing seek     0

III. Clock provision and synchronization principles
Clock providers:
Because the rate at which media plays is not necessarily the same as the rate of the global clock, an element that plays at a specific rate needs to provide a clock. That element is responsible for keeping the clock consistent with the current media time; it must also manage playback latency, buffering, and so on, all of which affect A/V synchronization.

Clock slaves:
Clock slaves obtain a clock from the pipeline that contains them. They often need to call gst_clock_id_wait() to wait until the current sample may be played, or to discard it.
When a clock is marked with GST_CLOCK_FLAG_CAN_SET_MASTER, it can be slaved to another clock and is then kept in sync with its master by continuous correction. This is mainly used when an element provides an internal clock but the pipeline has selected a different clock; the two are then kept in sync by continuous correction. The master-slave mechanism introduces the notions of internal and external time:
Internal time: the time reported by the clock itself, without adjustment;
External time: the time derived from the internal time after correction.
The clock stores the calibration values internal_calibration, external_calibration, rate_numerator, and rate_denominator, and the external time is corrected using them. The correction formula is:
external = (internal - cinternal) * cnum / cdenom + cexternal
where external, internal, cinternal, cnum, cdenom, and cexternal are the external time, the internal time, the stored internal calibration value, the correction-rate numerator, the correction-rate denominator, and the stored external calibration value, respectively.
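As a minimal sketch, the correction can be applied as below; the function itself is illustrative, but gst_util_uint64_scale() is the usual helper for the scaled division, and in GStreamer the calibration values themselves are read and written with gst_clock_get_calibration()/gst_clock_set_calibration().

#include <gst/gst.h>

/* Sketch: convert an internal clock time to the corrected external time
 * using the calibration values described above. */
static guint64
to_external_time (guint64 internal,
                  guint64 cinternal, guint64 cexternal,
                  guint64 cnum, guint64 cdenom)
{
    /* external = (internal - cinternal) * cnum / cdenom + cexternal */
    return gst_util_uint64_scale (internal - cinternal, cnum, cdenom) + cexternal;
}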

Master-slave synchronization can be tuned with the following three GstClock properties (a usage sketch follows the list):
1. timeout: the interval at which the slave samples the master clock;
2. window-size: the number of samples used for the correction;
3. window-threshold: the minimum number of samples required before a correction is made.
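For illustration, slaving an element-provided clock to the pipeline clock and tuning the three properties might look like the sketch below. The property names are real GstClock properties; the surrounding setup (where the two clocks come from) and the chosen values are assumptions.

#include <gst/gst.h>

/* Sketch: slave an internal clock to the clock selected by the pipeline
 * and tune the sampling/correction parameters. */
static void
slave_clock (GstClock *internal_clock, GstClock *pipeline_clock)
{
    /* The internal clock must have GST_CLOCK_FLAG_CAN_SET_MASTER set. */
    gst_clock_set_master (internal_clock, pipeline_clock);

    g_object_set (internal_clock,
                  "timeout",          (guint64) (100 * GST_MSECOND),  /* sample the master every 100 ms */
                  "window-size",      32,   /* samples used for the correction     */
                  "window-threshold", 4,    /* minimum samples before correcting   */
                  NULL);
}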

To synchronize different elements, the pipeline is responsible for selecting a global clock and distributing it to all elements in the pipeline.
A clock is selected and distributed when:
1. the pipeline goes to the PLAYING state;
2. an element that provides a clock is added:
it posts a CLOCK_PROVIDE message --> bus --> the parent bin is notified --> a clock is selected --> a NEW_CLOCK message is posted --> bus;
3. an element that provides the clock is removed:
a CLOCK_LOST message is posted --> the pipeline goes to PAUSED and then back to PLAYING, so a new clock is selected (see the sketch below).
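A typical application-level reaction to the CLOCK_LOST case is sketched below, assuming a bus watch has already been installed on the pipeline: setting the pipeline to PAUSED and back to PLAYING forces it to select and distribute a new clock.

#include <gst/gst.h>

/* Sketch: bus callback that handles a lost clock by forcing clock reselection. */
static gboolean
bus_cb (GstBus *bus, GstMessage *msg, gpointer user_data)
{
    GstElement *pipeline = GST_ELEMENT (user_data);

    if (GST_MESSAGE_TYPE (msg) == GST_MESSAGE_CLOCK_LOST) {
        /* PAUSED -> PLAYING makes the pipeline pick a new clock. */
        gst_element_set_state (pipeline, GST_STATE_PAUSED);
        gst_element_set_state (pipeline, GST_STATE_PLAYING);
    }
    return TRUE;   /* keep the bus watch installed */
}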

Clock selection algorithm:
1. An element that provides a clock is chosen starting from the most upstream end of the media stream;
2. If no element in the pipeline provides a clock, the system clock is used.

Pipeline synchronization relies on three things:
1. GstClock
The pipeline selects a clock from all clock-providing elements and then distributes it to every element in the pipeline.
2. The timestamps of GstBuffer
3. The NEW_SEGMENT event preceding the buffers

As mentioned above, running_time can be computed in two ways:
1. From the global clock and the element's base_time:
running_time = absolute_time - base_time
2. From the buffer timestamp and the NEW_SEGMENT event (assuming a positive rate):
running_time = (B.timestamp - NS.start) / NS.abs_rate + NS.accum

Synchronization essentially means making these two values equal, that is
absolute_time - base_time = (B.timestamp - NS.start) / NS.abs_rate + NS.accum
The absolute_time at which this holds is the buffer's synchronization time (B.sync_time = absolute_time), so
B.sync_time = (B.timestamp - NS.start) / NS.abs_rate + NS.accum + base_time

Before rendering, the sink waits until the clock reaches sync_time. With multiple streams, buffers that have the same running_time are played simultaneously: the demuxer makes sure that buffers meant to be played at the same time carry the same running_time, i.e. they are stamped with the same timestamps, which is what keeps them synchronized.
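The wait itself can be expressed roughly as in the sketch below: the running time computed from the buffer is turned into a clock time by adding base_time, and the sink blocks on the clock until that time is reached. This mirrors what the base sink does internally; the helper and its arguments are illustrative.

#include <gst/gst.h>

/* Sketch: block until the pipeline clock reaches the buffer's sync time.
 * 'running_time' is the value computed from the buffer timestamp and the
 * NEW_SEGMENT as shown above. */
static GstClockReturn
wait_for_sync (GstElement *sink, GstClockTime running_time)
{
    GstClock        *clock = gst_element_get_clock (sink);
    GstClockTime     base  = gst_element_get_base_time (sink);
    GstClockTime     sync_time = running_time + base;            /* B.sync_time */
    GstClockID       id = gst_clock_new_single_shot_id (clock, sync_time);
    GstClockTimeDiff jitter;
    GstClockReturn   ret;

    ret = gst_clock_id_wait (id, &jitter);                       /* blocks until sync_time */

    gst_clock_id_unref (id);
    gst_object_unref (clock);
    return ret;
}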

IV. Latency calculation and implementation
1. Why latency is introduced
Synchronization between elements and the clock only takes place in each sink. If none of the other elements delays the buffer, the latency is 0. Latency is introduced because pushing a buffer from the source to the sink takes some time, which may cause the buffer to arrive too late and be discarded. This problem typically occurs in a live pipeline, where the sink is set to PLAYING and buffers are not prerolled to the sink first.

2. Implementation of latency
The general approach is that no sink may be set to PLAYING before it has been prerolled. To achieve this, the pipeline tracks all elements that need to preroll (that is, the elements that return ASYNC from the state change). These elements post an ASYNC_START message. When such an element has prerolled, it sets its state to PAUSED and posts an ASYNC_DONE message matching the earlier ASYNC_START. Once the pipeline has collected an ASYNC_DONE for every ASYNC_START, it can start calculating the global latency.

3. Latency calculation

The latency calculation is illustrated in Figure 3.

Figure 3: Latency calculation

The pipeline configures the latency by sending a LATENCY event to all sinks in the pipeline. This event sets the same total latency on every sink, so that the sinks stay in relative sync when they render data.
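As a sketch of the mechanism, once the total latency has been computed it can be distributed with a LATENCY event; pipelines normally do this themselves, so this only illustrates the event involved.

#include <gst/gst.h>

/* Sketch: configure the total latency on the sinks of a pipeline. */
static void
set_pipeline_latency (GstElement *pipeline, GstClockTime latency)
{
    GstEvent *event = gst_event_new_latency (latency);

    /* Sending the (upstream) LATENCY event to the pipeline forwards it to the sinks. */
    gst_element_send_event (pipeline, event);
}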

V. Quality of Service (QoS)
Quality of service is about measuring and adjusting the real-time performance of a pipeline. Real-time performance is measured against the pipeline clock, and the measurement usually happens when buffers are synchronized in the sink.
QoS adjustments are generally applied to video buffers, for two reasons: first, dropping audio is far more noticeable than dropping video, which follows from human perception; second, video requires more complex processing than audio and therefore takes more time.

Sources of quality-of-service problems:
1. CPU load;
2. network problems;
3. disk load and memory bottlenecks.

The purpose of the measurements is to adjust the rate at which elements send data. There are two kinds of adjustment:
1. short-term (urgent) corrections, detected in the sink;
2. long-term corrections (rate adjustments), based on the overall trend detected in the sink.

QoS events:
QoS events are generated by elements and carry the following values:
1. Timestamp
2. Jitter
The difference between the buffer timestamp and the current clock time. A negative value means the buffer arrived in time (its absolute value is how early it arrived); a positive value is the amount by which it arrived late.
3. Proportion
The predicted ideal processing rate, relative to the normal data processing rate, needed to achieve optimal quality.

Quality of service is mainly implemented in GstBaseSink. Each time a buffer arriving at the sink has been rendered, a QoS event is generated. This event carries the calculated values to the upstream elements, so that they can adjust accordingly to preserve quality of service (mainly A/V synchronization). One key piece of information is the processing rate, which is calculated as follows:
First, the running-average calculation:
next_avg = (current_value + (size - 1) * current_avg) / size
Here size is generally 8, except for the average processing rate, where 4 or 16 is used.
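As a small sketch, this running average can be written as a helper function (the name is illustrative):

/* Sketch: running average as described above; with size = 8 each new
 * sample contributes 1/8 of its value to the average. */
static double
update_avg (double current_value, double current_avg, int size)
{
    return (current_value + (size - 1) * current_avg) / size;
}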

Jitter is computed from the buffer timestamp and the current time:
jitter = current_time - timestamp
jitter < 0 means the buffer arrived at the sink early;
jitter > 0 means the buffer arrived at the sink 'jitter' too late;
jitter = 0 means it arrived exactly on time.

The processing rate is then calculated as follows (the whole procedure is also put together in the sketch after the rate interpretation below):
start = sink->priv->current_rstart;
stop = sink->priv->current_rstop;
duration = stop - start;
if (jitter < 0): entered = start + jitter; left = start;
if (jitter >= 0): entered = left = start + jitter;
entered is the time at which the buffer reached the sink, and left is the time at which it leaves after being rendered.
pt = entered - sink->priv->last_left;
avg_pt and avg_duration are computed with the running-average formula above;
rate = avg_pt / avg_duration

If 0 < rate < 1, the upstream elements produce data faster than the sink can process it, which can lead to flooding;
if rate = 1, the rates match exactly;
if rate > 1, the upstream elements cannot supply buffers to the sink fast enough, which can lead to starvation.
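Put together, the per-buffer bookkeeping described above looks roughly like the sketch below. The QosState struct and the function are illustrative; the real fields live in the base sink's private structure, as the sink->priv references above indicate.

#include <gst/gst.h>

/* Illustrative per-sink QoS state mirroring the fields used above. */
typedef struct {
    GstClockTime last_left;     /* when the previous buffer left the render */
    gdouble      avg_pt;        /* average processing time                  */
    gdouble      avg_duration;  /* average buffer duration                  */
} QosState;

/* Sketch: update the averages for one rendered buffer and return the rate. */
static gdouble
update_rate (QosState *st, GstClockTime start, GstClockTime stop,
             GstClockTimeDiff jitter, int size)
{
    GstClockTime entered, left, duration, pt;

    duration = stop - start;
    if (jitter < 0) {                 /* buffer arrived early               */
        entered = start + jitter;
        left    = start;
    } else {                          /* buffer arrived 'jitter' too late   */
        entered = left = start + jitter;
    }

    pt = entered - st->last_left;     /* processing time for this buffer    */
    st->last_left = left;

    /* running averages, using the formula introduced earlier */
    st->avg_pt       = (pt + (size - 1) * st->avg_pt) / size;
    st->avg_duration = (duration + (size - 1) * st->avg_duration) / size;

    /* rate < 1: flooding, rate = 1: ideal, rate > 1: possible starvation */
    return st->avg_pt / st->avg_duration;
}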

The timestamp, jitter, and rate of the current buffer are then sent to the upstream elements in a QoS event, so that they can react accordingly, for example by dropping some buffers. They can also use this information to estimate whether the next buffer can be rendered in time.

VI. Implementation of synchronization
GStreamer synchronization is implemented mainly in the sinks, before rendering, so it is generally found in GstBaseSink's render path. Synchronization means synchronizing each buffer with the clock before it is rendered by a sink. After a media stream has been demuxed, timestamps are attached to the buffers of the resulting streams (such as the audio and video streams); by synchronizing these timestamps with the clock before the sinks output them, synchronized A/V output is achieved.
Prior to GStreamer 0.10.3, synchronization was implemented in GstBaseSink's gst_base_sink_render_object() (see Figure 4) and very few subclasses overrode it. In later versions, some specific sink subclasses override it to achieve better synchronization. For example, audio synchronization is overridden in gst_base_audio_sink_render() of GstBaseAudioSink, while video synchronization is not overridden and is still implemented in the base class. A/V synchronization therefore mainly involves two functions: gst_base_sink_render_object() and gst_base_audio_sink_render().
The processing flow of gst_base_sink_render_object() is shown in Figure 4, and the flow for synchronizing an object is shown in Figure 5.


Figure 4: Render-object processing flow


Figure 5: Synchronization-object flowchart

Audio processing introduces the concept of a ring buffer. It is used only for audio, and the class is GstRingBuffer. A brief description of its design follows:
The ring buffer consists of a number of consecutive segments. It has a playback position, always expressed in whole segments; this is the position at which the device is currently reading samples from the buffer, as shown in Figure 6.
During playback, samples are written to the device. After each segment has been written, the ring buffer calls the configured callback function and advances the playback position.


Figure 6: Ring buffer
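For illustration only, a heavily simplified ring buffer with the properties just described might look like the sketch below; the real GstRingBuffer has a much richer API (acquire/release, states, device mapping) that is not shown, and the type and function names here are made up.

#include <glib.h>

/* Minimal sketch of a segment-based ring buffer as described above. */
typedef void (*SegmentCallback) (gpointer user_data);

typedef struct {
    guint8          *data;           /* num_segments * segment_size bytes        */
    gint             segment_size;   /* size of one segment, in bytes            */
    gint             num_segments;   /* number of consecutive segments           */
    gint             playpos;        /* playback position, in whole segments     */
    SegmentCallback  callback;       /* invoked after each segment is consumed   */
    gpointer         user_data;
} RingBuffer;

/* Called by the playback thread each time the device has consumed one segment. */
static void
ring_buffer_advance (RingBuffer *rb)
{
    rb->playpos = (rb->playpos + 1) % rb->num_segments;
    if (rb->callback)
        rb->callback (rb->user_data);   /* lets the writer refill the freed segment */
}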

A normal buffer is represented by a GstBuffer; for audio, the ring buffer is essentially an extra wrapper around the buffer that gives it additional features, such as a state (STOPPED, PAUSED, STARTED).
The audio render is mainly implemented in gst_base_audio_sink_render(). Before writing data to the device, the output data may also have to be synchronized and clipped according to the settings, as shown in Figure 7.

Figure 7: Audio render flowchart

VII. Debugging analysis
(1) From the debugging results and analysis it can be seen that, under normal circumstances, writes to the device happen audio first and then video; typically 8-9 audio segments and 6-7 video segments are written per round. After one stream has finished writing, the next round starts only once the other stream has been written as well.
(2) Example: with pulsesink for audio and ximagesink for video, the processing path from synchronization to output is as follows; the audio and video functions/steps are listed from top to bottom.
Audio (pulsesink)                     Video (ximagesink)
gst_base_sink_render_object           gst_base_sink_render_object
gst_base_sink_do_sync                 gst_base_sink_do_sync
gst_base_sink_get_sync_times          gst_base_sink_get_sync_times
* get_times == -1                     * get_times != -1
@ gst_base_audio_sink_render          a series of synchronization steps
gst_ring_buffer_commit_full           @ gst_ximagesink_show_frame
                                      gst_ximagesink_ximage_put

From the flow above we can see that audio synchronization is implemented in gst_base_audio_sink_render(), while video synchronization is implemented in gst_base_sink_render_object(); the two paths diverge after the get_times call. "*" marks the get_times implementation in the audio and video subclasses, and "@" marks the render implementation in the audio and video subclasses.
