The audio voice changer algorithm is a topic that many people are particularly interested in.
There are plenty of open-source implementations to learn from, some working in the time domain and others in the frequency domain.
In the end they all aim for the same result.
Recently several readers have e-mailed me with questions about the details of voice changer algorithms.
Giving a reasonable, easy-to-understand explanation looks simple but is actually quite difficult.
I will build the idea up informally, one small step at a time, which is why this post carries the "big talk" label.
It is not meant to be particularly advanced, not least because I could not make it so anyway.
In image processing, the most important algorithm is arguably Gaussian blur.
It can also be thought of as convolution, since Gaussian blur is a special case of convolution, but I will not expand on that here.
For audio, as you may have guessed, the time-domain counterpart is without doubt the resampling algorithm.
The audio sampling rate is the number of times per second that a recording device samples the sound signal.
The higher the sampling rate, the more faithfully the sound can be reproduced.
Today's mainstream capture cards generally offer three sampling rates: 22.05 kHz, 44.1 kHz, and 48 kHz.
22.05 kHz only reaches roughly FM broadcast quality,
44.1 kHz is the rate used for CD quality, and 48 kHz is slightly more accurate.
At this point, many readers may still not have a feel for what the sampling rate really means.
Put another way, suppose a person takes 20 milliseconds to say "Hello".
The number of samples the machine collects within those 20 milliseconds is determined by the sampling rate: at 44.1 kHz, for example, that is 882 samples.
The more samples collected in the same interval, the higher the sampling rate and the higher the accuracy.
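To make the relationship between sampling rate and amount of data concrete, here is a minimal C sketch. The function name `samples_in_interval` is hypothetical and only used for illustration; it simply counts how many samples fall inside a given number of milliseconds.

```c
#include <stdint.h>

/* Hypothetical helper: number of samples captured in duration_ms
 * milliseconds at sample_rate Hz (integer division rounds down). */
uint32_t samples_in_interval(uint32_t sample_rate, uint32_t duration_ms) {
    /* Widen to 64 bits so the multiplication cannot overflow. */
    return (uint32_t)(((uint64_t)sample_rate * duration_ms) / 1000u);
}
```

For the 20 millisecond "Hello" above, this gives 441 samples at 22.05 kHz, 882 at 44.1 kHz, and 960 at 48 kHz.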
Now let's look at the problem from another angle.
At the same sampling rate,
one person speaks quickly and another speaks slowly, so the same words end up spread over a different number of samples.
This leads to another audio algorithm: variable speed.
Yes, variable speed.
In principle, variable speed means stretching or compressing the sampled data while the sampling rate stays the same.
From the algorithm's point of view, it is interpolation (inserting samples) or decimation (removing samples).
How do you make a person's speech faster?
Clearly, by removing some samples while keeping the sampling rate the same.
Conversely, slowing down means inserting some samples.
What ultimately determines the quality of the speed change is how the weights of the inserted and removed samples are calculated.
For example, suppose the originally sampled data is
1234
When speeding up, remove samples 1 and 4
23
When slowing down, add samples
11223344
Of course, this is only an illustration to make the concept easy to understand.
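The toy example above can be sketched in C with linear interpolation, where the weight `w` decides how much each of the two neighbouring input samples contributes. This is only a sketch under my own assumptions: the function name `change_speed_linear` and its parameters are hypothetical, and it is not the resampler from the earlier article.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: change the speed of `in` by factor `speed`
 * (speed > 1 removes samples, speed < 1 inserts samples).
 * Writes at most out_cap samples to `out` and returns how many were written. */
size_t change_speed_linear(const int16_t *in, size_t in_len, float speed,
                           int16_t *out, size_t out_cap) {
    if (in_len == 0 || speed <= 0.0f) return 0;
    size_t out_len = (size_t)((float)in_len / speed);
    if (out_len > out_cap) out_len = out_cap;
    for (size_t i = 0; i < out_len; i++) {
        float pos = (float)i * speed;      /* fractional read position in the input */
        size_t i0 = (size_t)pos;           /* left-hand neighbour */
        if (i0 + 1 >= in_len) {            /* no right-hand neighbour: reuse last sample */
            out[i] = in[in_len - 1];
            continue;
        }
        float w = pos - (float)i0;         /* weight of the right-hand neighbour */
        out[i] = (int16_t)((1.0f - w) * (float)in[i0] + w * (float)in[i0 + 1]);
    }
    return out_len;
}
```

With the input 1 2 3 4, a speed of 0.5 yields 1 1 2 2 3 3 4 4, matching the slow-down example. A speed of 2.0 yields 1 3 rather than 2 3, because this sketch keeps every second sample instead of dropping the first and last.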
At this point, some readers will ask:
what about the loudness of the sound, that is, the strength of the signal?
That is simply raising or lowering the volume, which I think needs little explanation.
Variable speed changes the time axis, while the amplitude (the "space") stays unchanged.
Volume is the opposite: the time axis is unchanged, while the amplitude changes.
Put simply and crudely, it is a linear stretch.
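For example, suppose the originally sampled data is
1234
Add 4 to each sample and it is stretched directly to
5678
Multiplication can also be used to stretch it,
for example, multiplying by 2 gives
2468
The above raises the volume; lowering it works the other way round, with subtraction and division. Strictly speaking, adding a constant shifts the whole waveform rather than scaling it, so multiplication is what is normally used for volume.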
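As a rough illustration of the multiplicative version, here is a minimal C sketch. The function name `apply_gain` is hypothetical. The clamping step is my own addition, on the assumption that the samples are 16-bit as in the code later in this post: without it, a loud sample multiplied by the gain would overflow the `int16_t` range.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: scale every sample by `gain` in place,
 * clamping the result to the int16_t range to avoid wrap-around. */
void apply_gain(int16_t *samples, size_t count, float gain) {
    for (size_t i = 0; i < count; i++) {
        float v = (float)samples[i] * gain;
        if (v > 32767.0f) v = 32767.0f;
        if (v < -32768.0f) v = -32768.0f;
        samples[i] = (int16_t)v;
    }
}
```

A gain of 2 turns 1 2 3 4 into 2 4 6 8, and a gain of 0.5 lowers the volume in the same way.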
In the end, whether it is speed or volume adjustment,
what the algorithm has to do is determine the weight for each sample position.
How you fit those weights depends on the effect you want to achieve.
After going around in such a big circle, I still have not touched the voice changer itself.
In fact, a voice changer is variable speed plus volume adjustment.
The variable speed and volume adjustments above are relatively simple linear stretches,
which can be implemented by directly adding or removing samples and interpolating the values.
The concept of a voice changer is actually similar:
it adjusts the time axis and the amplitude weights simultaneously, within the same stretch of time.
In other words, at the same sampling rate, it controls both speed and volume through specific weights.
It is effectively a two-dimensional stretch over time and amplitude.
Understanding this logic takes a bit of a detour.
Let's use the resampling algorithm for a simple example.
See the previous article, "An example of a concise interpolation audio resampling algorithm (with full C code)".
The resampling function in that example is:
#include <stdint.h>
#include <stdlib.h>

// wavRead_int16, wavWrite_int16 and resampleData come from the previous article's code.
void resampler(char *in_file, char *out_file) {
    // Audio sample rate
    uint32_t in_sampleRate = 0;
    // Total number of audio samples
    uint64_t totalSampleCount = 0;
    int16_t *data_in = wavRead_int16(in_file, &in_sampleRate, &totalSampleCount);
    uint32_t out_sampleRate = in_sampleRate * 2;
    uint32_t out_size = (uint32_t) (totalSampleCount * ((float) out_sampleRate / in_sampleRate));
    int16_t *data_out = (int16_t *) malloc(out_size * sizeof(int16_t));
    // If the load and the allocation both succeeded
    if (data_in != NULL && data_out != NULL) {
        resampleData(data_in, in_sampleRate, (uint32_t) totalSampleCount, data_out, out_sampleRate);
        wavWrite_int16(out_file, data_out, out_sampleRate, (uint32_t) out_size);
        free(data_in);
        free(data_out);
    } else {
        if (data_in) free(data_in);
        if (data_out) free(data_out);
    }
}
Let's tweak it a little: add a speed factor that changes the speed of the sound while the output file keeps the original sample rate.
#include <stdint.h>
#include <stdlib.h>

// wavRead_int16, wavWrite_int16 and resampleData come from the previous article's code.
void resampler(char *in_file, char *out_file) {
    // Audio sample rate
    uint32_t in_sampleRate = 0;
    // Total number of audio samples
    uint64_t totalSampleCount = 0;
    int16_t *data_in = wavRead_int16(in_file, &in_sampleRate, &totalSampleCount);
    // Add a speed weight; a value below 1 produces fewer output samples
    float speed = 0.88f;
    uint32_t out_sampleRate = (uint32_t) (in_sampleRate * speed);
    uint32_t out_size = (uint32_t) (totalSampleCount * ((float) out_sampleRate / in_sampleRate));
    int16_t *data_out = (int16_t *) malloc(out_size * sizeof(int16_t));
    // If the load and the allocation both succeeded
    if (data_in != NULL && data_out != NULL) {
        resampleData(data_in, in_sampleRate, (uint32_t) totalSampleCount, data_out, out_sampleRate);
        // Write with in_sampleRate instead of out_sampleRate so the file keeps the original sample rate
        wavWrite_int16(out_file, data_out, in_sampleRate, (uint32_t) out_size);
        free(data_in);
        free(data_out);
    } else {
        if (data_in) free(data_in);
        if (data_out) free(data_out);
    }
}
That is what it looks like after the change.
Attentive readers will have noticed that out_size can now grow or shrink.
The example code above is a simple variable speed algorithm.
That is the whole principle behind variable speed; raising and lowering the volume is simple enough that I will not give an example.
So what kind of algorithm is the voice changer?
Plainly speaking, it is variable speed that also guarantees out_size stays equal to the original totalSampleCount.
How do you guarantee that?
The answer is interpolation; a crude version simply pads with zeros or drops samples.
This, of course, may result in inconsistent volume and audible glitches in the final output.
That is certainly not rigorous: the weights and content chosen during the final interpolation determine the quality of the result, and that depends on each implementer's skill.
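The crude variant described above can be sketched in C as follows. This is my own reading of the idea, not the author's implementation: the function name `crude_pitch_shift` is hypothetical, the output always has exactly as many samples as the input, a speed above 1 leaves a zero-padded tail, and a speed below 1 drops whatever does not fit.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: speed-change `in` by linear interpolation, then force
 * the result back to in_len samples. `out` must hold in_len samples. */
void crude_pitch_shift(const int16_t *in, size_t in_len, float speed, int16_t *out) {
    if (in_len == 0 || speed <= 0.0f) return;
    memset(out, 0, in_len * sizeof(int16_t));          /* zero padding by default */
    size_t stretched = (size_t)((float)in_len / speed); /* length after the speed change */
    size_t n = stretched < in_len ? stretched : in_len; /* truncate if it grew */
    for (size_t i = 0; i < n; i++) {
        float pos = (float)i * speed;      /* fractional read position in the input */
        size_t i0 = (size_t)pos;
        if (i0 + 1 >= in_len) {            /* past the last pair: reuse last sample */
            out[i] = in[in_len - 1];
            continue;
        }
        float w = pos - (float)i0;         /* interpolation weight */
        out[i] = (int16_t)((1.0f - w) * (float)in[i0] + w * (float)in[i0 + 1]);
    }
}
```

With the input 1 2 3 4, a speed of 2.0 gives 1 3 0 0 and a speed of 0.5 gives 1 1 2 2. The silent tail and the truncated ending are exactly the kind of artifacts that a better choice of weights, or processing in short overlapping blocks, would be needed to avoid.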
That covers most of the principle. As for how to implement it concretely,
you can read the relevant open-source code.
One more note on my earlier post, "Sound modulation algorithm Pitchshift (simulated tom Cat) with complete C + + algorithm implementation Code":
the sin and cos arguments in that article fall outside the valid input interval, so the results computed by fastsin and fastcos are wrong.
For details, refer to the original author's algorithm.
When I have time, I will release
complete C code and corresponding sample code for a simple and clear voice changer algorithm.
As for the Fourier-transform-based resampling algorithm from "the Fourier transform-based audio resampling algorithm (complete C code)",
I have also corrected the algorithm logic in the corresponding GitHub project, Fftresample.
Published articles are rarely edited a second time,
so for later revisions and changes, please follow the GitHub project updates directly.
That is the principle behind the voice changer, as described above.
I hope that through this article
we can all gain a more intuitive understanding of the audio voice changer algorithm.
That is all for now.
It's better to have fun together than to play alone.
If you have other related questions or needs, feel free to e-mail me to discuss them.
My e-mail address is:
Gaozhihan@vip.qq.com