I recently built a voice-changer project that involved a lot of audio-related knowledge. Afraid I'll forget it all after a while, I'm writing it down as a memo.
1. Voice Encoding
When recording voice you have to pick an encoding format. Because this is a mobile app, the format needs a good compression ratio, acceptable sound quality (at the very least the speech must stay intelligible), and a codec that isn't too hard to integrate.
We shortlisted several formats: AMR, Speex, AAC, and WAV. Their pros and cons:
First, AMR, the most common speech codec, especially on mobile. Its advantage is a very high compression ratio: 60 s of voice at an 8 kHz sample rate and 16-bit sample size comes out at roughly 35 KB-90 KB. The disadvantage is that iOS does not support it natively, so you need a third-party library for the codec, such as the well-known opencore-amr. (In actual development we also found that the codecs you can find online only fully handle the 8 kHz sample rate; at 16 kHz there is encoding but no decoding, and for 32 kHz and above we found nothing.)
Next, Speex, a format tied to the open-source Speex library. One benefit is that it is open source, and it implements voice features such as noise suppression and echo cancellation internally, which makes it especially well suited to VoIP development. The downside is that the framework comes as one piece, so it is not easy to pull out a single module, and since we had our own processing in the recording and playback pipeline, we did not want to adopt its whole stack. Also, Speex no longer seems to be actively maintained; its successor is Opus. We did not study it in depth.
AAC is natively supported on iOS and is said to sound good, but the files are on the large side, and (at least at the time) the stock Android SDK did not support it. For an iOS-only app with local voice, I think it is a pretty good choice.
WAV needs no introduction: anyone who has used Windows knows it, it is Microsoft's own format, and Apple's support for it is also good. Its flaw, like AAC's, is size: it is only suitable for local recording and playback, not for transmission over a mobile network.
To stay compatible with Android, and with mobile bandwidth in mind, we finally settled on AMR-NB as the transmission format, i.e. AMR encoded at an 8 kHz sample rate. (There is actually a pitfall here; more on that below.) For local playback we use WAV, because WAV is lossless, we can apply the voice-changing effect while recording, and it converts to AMR easily.
2. Recording and Playback
Once the encoding is chosen, the next step is recording. There is plenty of open-source code around; we started from Apple's official SpeakHere sample, which uses Audio Queue Services for recording and playback. The SpeakHere code had not been updated in a long time and was partly incompatible with iOS 7, so after downloading it we had to clear a pile of warnings before we could start using it. You can look up Audio Queue Services yourself; in short, it feeds you microphone data (recording) or asks you for the next data to output (playback) through an asynchronous buffer queue, and it is part of the Core Audio framework. There are other options, such as AudioUnit and AudioFileStream. AudioUnit is comparatively complex: you have to configure the input and output buses yourself, and if you get that wrong you hear nothing at all. AudioFileStream can play an audio stream straight from the server, or download it locally first and then play it back (recording? It does not seem usable for recording).
From the project's point of view, Audio Queue Services was the best fit, so we went with it.
3. Transcoding
For the WAV/AMR conversion mentioned above, we decode the WAV by stripping off its header and reading out the raw PCM data, then open a new file, write the AMR header, convert the PCM frame by frame into AMR data, and write it into the AMR file. The reverse direction works the same way.
The principle is simple, but in practice we still hit plenty of problems. First, the sample rate used when recording, when transcoding, and when applying the voice-change effect must all match; otherwise you get aliasing artifacts, or audio that will not play at all. Second, WAV encoding differed slightly between 32-bit and 64-bit devices: on 32-bit, when writing the sample rate and channel count in hex we had to drop the low byte. I am not entirely sure why; it may be a side effect of mixing Objective-C and C++, which is worth avoiding where possible, because the weird problems that mixing produces can be genuinely unsolvable. Finally, Android does support encoding and decoding AMR directly, but since we need to apply the voice-change effect we cannot simply save the recording as-is, and Android's own MediaPlayer and MediaRecorder are sealed tight, offering no hook for converting encodings. The only way out was to extract the OpenCore (opencore-amr) library from the Android source and recompile the .so. Don't the Android folks feel this is a huge pain?
4. Voice Changer
For the voice changer we use the SoundTouch open-source library; these days basically everyone building a voice changer uses it. The library lets you combine its pitch-shift and tempo-change controls however you like, and it is easy to use: first initialize a SoundTouch instance, then feed the recorded data in with putSamples, then pull the processed data back out with receiveSamples and write it to your own file. Note that SoundTouch's internal SAMPLETYPE depends on your sample size: with 16-bit samples it corresponds to short, and in the 32-bit build it corresponds to float.
5. Download and Playback Logic
Downloading actually has little to do with audio, but we hit enough problems with it in this project that it is worth mentioning.
Our audio files are fetched from the server by the ID carried in the message, downloaded locally, and then played in order, so a queue is needed to control this logic. At first the download module managed the download queue and the playback module managed only the playback queue. That turned out not to work: downloading is asynchronous while playback is synchronous, so we also needed an intermediate controller to coordinate the two.
For queue control we use NSOperationQueue. One problem here: from iOS 7 on, isConcurrent is deprecated in favor of a property called isAsynchronous. According to the official documentation, setting isAsynchronous to NO should make operations run synchronously, but in practice that turned out to be completely unreliable: downloads still ran concurrently, with as many threads as there were operations! Was I wrong to assume that "synchronous" and "non-concurrent" mean the same thing? In the end we could only add a maxConcurrentOperationCount = 1 limit, which does make the downloads serial.
Since Audio Queue's callback executes asynchronously, putting the play action into an operation causes a problem: playback stops after about two seconds, because the operation finishes and takes Audio Queue's callback down with it. To solve this, we run a run loop to keep the thread alive and watch Audio Queue's running state; when the last buffer finishes playing, Audio Queue sets its isRunning property to false, and at that point we can let the playback thread end.
In the normal case that would be about it for voice, but there is still a lot left to do. Because of the iPhone's microphone, a lot of noise gets recorded, so we also need noise suppression and automatic gain control; then there is the playback-route problem (switching between the receiver and the speaker), and so on. I will write those up when I find the time.
iOS Voice-Changer Project Summary