Digital Audio Mixer algorithm

Source: Internet
Author: User

1.1 The problem

Mixing, the blending of several sounds into one, is very common both in nature and in audio processing. In nature you can hear birdsong and running water at the same time because their sound waves are superimposed in the air, and the ear can still tell the two waveforms apart.

The same holds in digital audio. For example, you can play CS while listening to music because the computer superimposes the two sound waveforms. The difference is that superimposing samples in a computer can easily cause an overflow.

For example:

int plus1(int num0, int num1) {
    return num0 + num1;
}

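If you pass int num0 = 0x70000000 and int num1 = 0x70000000, the result is 0xe0000000, which is -536870912 in decimal (strictly speaking, signed overflow is undefined behavior in C and C++; this is what typical two's-complement hardware produces). Adding two positive numbers has produced a negative number, so the result is clearly wrong.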

We know that a two's-complement char can represent values in the range [-128, 127], written in hexadecimal as [0x80, 0x7F], and that the range of a two's-complement int is [0x80000000, 0x7FFFFFFF]. A result outside this range is an overflow.
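As a minimal sketch of the range rule above, the following C++ helper reports whether a 32-bit addition would leave the representable range. The name add_would_overflow is hypothetical, and int32_t is assumed to be the int type under discussion.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical helper: true if a + b would fall outside the int32_t range.
// The check never overflows itself, because it compares against the
// limits instead of computing the sum.
constexpr bool add_would_overflow(std::int32_t a, std::int32_t b) {
    return (b > 0 && a > std::numeric_limits<std::int32_t>::max() - b) ||
           (b < 0 && a < std::numeric_limits<std::int32_t>::min() - b);
}
```

For the two inputs used above, add_would_overflow(0x70000000, 0x70000000) evaluates to true.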

How can overflow be prevented? The simplest approach is to widen the container that stores the data, for example:

long long plus1(int num0, int num1) {
    return (long long) num0 + (long long) num1;
}

With int num0 = 0x70000000 and int num1 = 0x70000000, the result is 0xe0000000, which is 3758096384 in decimal. This time there is no overflow.
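The widening approach can be sketched with fixed-width types so that the container sizes do not depend on the platform. The names add_widened and fits_in_int32 are illustrative, not taken from any library.

```cpp
#include <cstdint>
#include <limits>

// Add two 32-bit values in a 64-bit container, so the true sum always fits.
constexpr std::int64_t add_widened(std::int32_t a, std::int32_t b) {
    return static_cast<std::int64_t>(a) + static_cast<std::int64_t>(b);
}

// Report whether a widened result could be stored back into 32 bits.
constexpr bool fits_in_int32(std::int64_t v) {
    return v >= std::numeric_limits<std::int32_t>::min() &&
           v <= std::numeric_limits<std::int32_t>::max();
}
```

Widening only postpones the problem: the sum is exact in 64 bits, but fits_in_int32 shows that it still cannot be written back to a 32-bit output unchanged.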

1.2 Formula

How can mixing be done without overflow? Consider this formula:

z = a + b - ab

If both a and b are within the range [0, 1], then

0 <= (1 - a)(1 - b) = 1 - a - b + ab <= 1

and since z = 1 - (1 - a)(1 - b), it follows that

0 <= z <= 1

Thus, if we treat a and b as two input waveforms and z as the output waveform, the upper and lower bounds of z are the same as those of a and b. In other words, z cannot overflow.
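A minimal sketch of this formula in C++, assuming both inputs have already been normalized to [0, 1] (the function name mix_unit is hypothetical):

```cpp
// z = a + b - a*b for inputs assumed to lie in [0, 1].
// The result stays in [0, 1] because z = 1 - (1 - a)(1 - b).
constexpr double mix_unit(double a, double b) {
    return a + b - a * b;
}
```

Two full-scale inputs give mix_unit(1.0, 1.0) == 1.0, and two half-scale inputs give 0.75 rather than the plain sum 1.0.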

For three input signals, expanding (1 - a)(1 - b)(1 - c) in the same way gives

z = a + b + c - ab - ac - bc + abc
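The three-input case can be checked with a small sketch that computes z both from the product form and from the expanded form. Both function names are hypothetical, and the inputs are assumed to lie in [0, 1].

```cpp
// Product form: z = 1 - (1 - a)(1 - b)(1 - c).
constexpr double mix3_product(double a, double b, double c) {
    return 1.0 - (1.0 - a) * (1.0 - b) * (1.0 - c);
}

// Expanded form: z = a + b + c - ab - ac - bc + abc.
constexpr double mix3_expanded(double a, double b, double c) {
    return a + b + c - a * b - a * c - b * c + a * b * c;
}
```

The product form needs fewer multiplications and extends to more inputs by adding one factor per signal.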

Signals whose range is not [0, 1] can first be mapped to [0, 1].

For example, if a and b are within the range [0, 255], then a/255 and b/255 are within [0, 1], so

z/255 = a/255 + b/255 - (a/255) * (b/255)

and therefore

z = a + b - ab/255
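For unsigned 8-bit samples the result can be computed directly in integer arithmetic. This is a sketch that assumes truncating integer division is acceptable for the ab/255 term; the name mix_u8 is hypothetical.

```cpp
#include <cstdint>

// z = a + b - a*b/255 for unsigned 8-bit samples.
// The operands are promoted to int, so the intermediate values
// (up to 255 * 255) cannot overflow.
constexpr std::uint8_t mix_u8(std::uint8_t a, std::uint8_t b) {
    return static_cast<std::uint8_t>(a + b - (a * b) / 255);
}
```

For example, mix_u8(255, 255) is 255 and mix_u8(100, 100) is 161.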

For signed numbers with the range [-128, 127], a' = (a + 128)/255 is within [0, 1], so

z' = a' + b' - a' * b'

Substituting gives

(z + 128)/255 = (a + 128)/255 + (b + 128)/255 - ((a + 128)/255) * ((b + 128)/255)

and therefore

z = a + b - (a + 128)(b + 128)/255 + 128
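The signed formula can be sketched the same way. The name mix_s8 is hypothetical and truncating integer division is again an assumption.

```cpp
#include <cstdint>

// z = a + b + 128 - (a + 128)(b + 128)/255 for signed 8-bit samples.
// All arithmetic is done in int after integer promotion.
constexpr std::int8_t mix_s8(std::int8_t a, std::int8_t b) {
    return static_cast<std::int8_t>(a + b + 128 - ((a + 128) * (b + 128)) / 255);
}
```

Under this mapping the value -128, not 0, leaves the other input unchanged: mix_s8(-128, 50) is 50, while mix_s8(0, 0) evaluates to 64.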

This algorithm can be seen as simply adding the input signals and, to avoid overflow, compressing the summed waveform. However, it has a fatal disadvantage: even when adding the two signals would not overflow, the algorithm still compresses the waveform, which degrades the sound quality. Moreover, the extra arithmetic operations increase the power consumption and complexity of the whole system, and rounding reduces the accuracy of the data.
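The disadvantage can be made concrete with a short sketch that compares the plain sum with the compressed result for unsigned 8-bit inputs in [0, 255]. All three function names are hypothetical.

```cpp
#include <cstdint>

// Plain sum, computed in int so it can exceed 255 without wrapping.
constexpr int plain_sum(std::uint8_t a, std::uint8_t b) {
    return a + b;
}

// Compressed mix: z = a + b - a*b/255.
constexpr int compressed_mix(std::uint8_t a, std::uint8_t b) {
    return a + b - (a * b) / 255;
}

// Amount removed by the compression, even when no overflow would occur.
constexpr int compression_loss(std::uint8_t a, std::uint8_t b) {
    return plain_sum(a, b) - compressed_mix(a, b);
}
```

With a = b = 100 the plain sum is 200, which fits in 8 bits, yet the compressed result is 161, a loss of 39.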

To avoid losing accuracy during these operations, high-end audio processing systems in the industry now compute with 32-bit float samples and convert the output to 16 bit.
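A rough sketch of that idea: mix in float and convert to 16 bit only once, at the output. The clamp to [-1, 1], the scale factor 32767, and truncation toward zero are assumptions made for illustration; real systems may dither or round differently.

```cpp
#include <cstdint>

// Limit a float sample to the nominal range [-1, 1].
constexpr float clamp_unit(float x) {
    return x < -1.0f ? -1.0f : (x > 1.0f ? 1.0f : x);
}

// Convert a float sample to 16 bit (assumed scale 32767, truncating).
constexpr std::int16_t float_to_i16(float x) {
    return static_cast<std::int16_t>(clamp_unit(x) * 32767.0f);
}

// Mix two float samples and convert the sum once.
constexpr std::int16_t mix_float_to_i16(float a, float b) {
    return float_to_i16(a + b);
}
```

The intermediate sum a + b may exceed 1.0 without harm, because the float format has headroom; the range is only limited when the final 16-bit value is produced.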

1.3 The Android approach

Let's see how mature software does it. Android's mixer lives in the file AudioMixer.cpp, which contains several functions that perform mixing for different situations. The following function handles stereo audio without resampling:

void AudioMixer::process__genericNoResampling(state_t* state, int64_t pts)

Let's look at how it works: it adds up the sound data from each track. The sound data can be thought of as sample points. Android's default sample precision is 16 bit and the format is signed PCM, so each sample is represented by a signed 16-bit number, int16_t. Adding 16-bit data directly would easily overflow 16 bits, so Android casts each sample to int32_t, adds, and stores the result in a 32-bit number. Note that each sample is multiplied by the volume before it is added, and the volume is also an int32_t. This leaves enough headroom to avoid overflow during mixing.
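The accumulate-in-32-bit idea can be sketched independently of the Android sources. The names scaled and accumulate_track are hypothetical, and the volume is assumed here to be a fixed-point gain in which 0x1000 means unity.

```cpp
#include <cstddef>
#include <cstdint>

// One 16-bit sample multiplied by a volume, computed in 32 bits.
// Assumption: the volume is fixed point with 0x1000 representing 1.0.
constexpr std::int32_t scaled(std::int16_t sample, std::int32_t volume) {
    return volume * static_cast<std::int32_t>(sample);
}

// Add one track's samples, scaled by its volume, into the 32-bit mix buffer.
inline void accumulate_track(std::int32_t* out, const std::int16_t* in,
                             std::size_t count, std::int32_t volume) {
    for (std::size_t i = 0; i < count; ++i) {
        out[i] += scaled(in[i], volume);
    }
}
```

At unity gain a full-scale sample occupies about 27 bits, so a 32-bit accumulator can hold several full-scale tracks; the headroom is large but not unlimited.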

void AudioMixer::track__16BitsStereo(track_t* t, int32_t* out, size_t frameCount,
        int32_t* temp __unused, int32_t* aux) {
    int32_t vl = t->prevVolume[0];
    int32_t vr = t->prevVolume[1];
    const int16_t* in = static_cast<const int16_t*>(t->in);
    // Accumulate the volume-scaled left and right samples into the 32-bit mix buffer.
    *out++ += (vl >> 16) * (int32_t) *in++;
    *out++ += (vr >> 16) * (int32_t) *in++;
}

At this point the mixed data is in the buffer that out points to. The code then calls

convertMixerFormat(out, t1.mMixerFormat, outTemp, t1.mMixerInFormat, BLOCKSIZE * t1.mMixerChannelCount);

There is a function, ditherAndClamp, which reduces the int32_t source data sums to int16_t and packs the left and right channels together into the int32_t buffer out.

void ditherAndClamp(int32_t* out, const int32_t* sums, size_t c)
{
    size_t i;
    for (i = 0; i < c; i++) {
        int32_t l = *sums++;
        int32_t r = *sums++;
        int32_t nl = l >> 12;
        int32_t nr = r >> 12;
        l = clamp16(nl);
        r = clamp16(nr);
        *out++ = (r << 16) | (l & 0xFFFF);
    }
}

Look at what it does: each channel's 32-bit input is first shifted right by 12 bits, keeping the upper 20 bits, and then clamp16 (clamp means "clip") reduces it to 16 bits. At this point the left and right channels are each 16 bits. The right channel is then placed in the high half and the left channel in the low half to form one 32-bit number.
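The packing step can be illustrated with a small sketch. The names pack_stereo, unpack_left, and unpack_right are hypothetical. Unlike the Android code, this version shifts unsigned values, which avoids left-shifting a negative number, and it relies on the usual two's-complement conversion back to int16_t.

```cpp
#include <cstdint>

// Right channel in the high 16 bits, left channel in the low 16 bits.
constexpr std::uint32_t pack_stereo(std::int16_t left, std::int16_t right) {
    return (static_cast<std::uint32_t>(static_cast<std::uint16_t>(right)) << 16) |
           static_cast<std::uint16_t>(left);
}

// Recover the left channel from the low half.
constexpr std::int16_t unpack_left(std::uint32_t packed) {
    return static_cast<std::int16_t>(static_cast<std::uint16_t>(packed & 0xFFFFu));
}

// Recover the right channel from the high half.
constexpr std::int16_t unpack_right(std::uint32_t packed) {
    return static_cast<std::int16_t>(static_cast<std::uint16_t>(packed >> 16));
}
```

The mask on the left channel matters: without it, the sign bits of a negative left sample would overwrite the right channel in the high half.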

Let's see what clamp16 does:

static inline int16_t clamp16(int32_t sample)
{
    // True only when the value does not fit in a signed 16-bit number.
    if ((sample >> 15) ^ (sample >> 31))
        sample = 0x7FFF ^ (sample >> 31);
    return sample;
}

This function simply chops off the overflowing part. The following test program shows this directly:

int test()
{
    for (int i = 32766; i <= 32776; i++) {
        int temp = clamp16(i);
        cout << "clamp16 tempint =" << temp << endl;
    }
    return 0;
}

The output is 32766 for the first value and 32767 for every value after it.

We know that the upper bound of a 16-bit signed number is 0x7FFF, which is 32767. The test results show that a number below the bound, such as 32766, is kept unchanged, while any number above it is clamped to 32767.
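The test above only exercises the positive side. The following sketch restates the bit trick as a single expression and puts a plain comparison-based version next to it so that the negative side can be checked too. Both function names are hypothetical, and the bit trick assumes that right-shifting a negative value is an arithmetic shift, as it is on common compilers.

```cpp
#include <cstdint>

// Bit-trick form: (s >> 15) and (s >> 31) differ only when s does not
// fit in 16 bits; 0x7FFF ^ (s >> 31) is 32767 for positive s and
// -32768 for negative s.
constexpr std::int16_t clamp16_bits(std::int32_t s) {
    return static_cast<std::int16_t>(
        ((s >> 15) ^ (s >> 31)) ? (0x7FFF ^ (s >> 31)) : s);
}

// Comparison form with the same intended behavior.
constexpr std::int16_t clamp16_simple(std::int32_t s) {
    return static_cast<std::int16_t>(
        s > 32767 ? 32767 : (s < -32768 ? -32768 : s));
}
```

Values below -32768 are clamped to -32768 in the same way that values above 32767 are clamped to 32767.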

So why does Android do this? Why not preserve the signal's waveform gracefully instead of cutting it off (which inevitably introduces some distortion)?

Perhaps because:

1. Mixing is relatively rare.

2. Overflow after mixing is also relatively rare.

3. Any effort to preserve the waveform is bound to cause the problem described in the previous section.
