iOS voice features

Source: Internet
Author: User

Technically speaking, recording mostly means producing a file, sending it to a server, and relaying it to another device for playback. The process itself is simple, but the data has to be suitable for network transmission: the network is unreliable, and both sending and receiving consume the user's mobile data, which is expensive and something everyone cares about. To keep transfers small, we compress the audio before sending and decompress it on the receiving side. With that picture in mind, here is what we will cover on the iOS platform: speech recording, speech codecs, audio playback, the audio session, and a practical demo.

This project also involves encryption and decryption, but we will not cover that here. Let's start with audio recording. For recording we want to know three things: which voice formats iOS can record, how the APIs are used, and how to capture the recording volume so a waveform can move with the loudness of the voice; iOS supports all of this.

First, the voice recording formats iOS supports by default. There are not many, and formats you might expect may be missing, but the basics are covered. AAC offers high compression with good quality. There is ALAC (Apple Lossless), and iLBC, a speech format designed for network transmission. IMA4 (IMA ADPCM) compresses efficiently with a relatively simple algorithm, trading away some quality. There is also linear PCM, which is uncompressed, plus µ-law and a-law.

Now that we know the default formats, let's see how recording works. iOS provides a very simple class, AVAudioRecorder. You initialize it with the target file for the recording and a settings dictionary describing the recording: the audio format (PCM or any of the supported formats mentioned earlier), the sample rate, the number of channels (mono or stereo), and for linear formats the bit depth. All of these can be set as needed.

Once the settings are chosen, we create an AVAudioRecorder for the target file, call prepareToRecord, and then start recording. Preparing first creates the file and primes the system, so the actual recording starts with less delay.

That is the whole recording flow. For the local file, we recommend writing into the temp directory. In the settings you specify the recording format, here linear PCM, along with the sample rate, number of channels, and bit depth, and pass those parameters when creating the recorder. Recording really is that simple; Apple's APIs are quite standard and make it easy to get the result you want.

Next, capturing the microphone volume during recording, which is what drives a level meter. This is also very simple: enable meteringEnabled on the recorder, and AVAudioRecorder then offers two methods that return the average power and the peak power for the current period. Once those two values come out, you have the volume. That is how the level is captured while recording.

There is also a stream-oriented way to record. Stream-oriented recording does not write a file directly, so it is the right choice if you want to transmit the audio while it is still being recorded. We will not go into detail because there is a lot to it; have a look at Apple's SpeakHere sample, which shows how to use it and how it works. Used flexibly, it enables some very powerful features.

After recording we have a file, and before sending it we want to encode it. Picture the path from recording to the final file on disk: the microphone first produces PCM, a codec then compresses the PCM into AAC, and the AAC is written to a file on disk. Three steps, with the codec in the middle acting as the encoder/decoder that compresses and decompresses the audio signal; that is how the recorded data ends up in its final format.

The focus here is the codec. iOS ships with the default codecs we just listed, but in real applications we often want more: MP3, WMA, MIDI, Ogg, or Speex, a speech format you may not have heard of before today. If we want to record in these formats, iOS has no built-in codec for them. What can we do? Commercial codecs from service providers may charge a license fee, but many vendors publish open-source codecs on the Internet for free, so let's look at those, starting with Speex.

Speex compresses speech well, transmits easily over the network, and includes noise reduction, which makes it well suited to voice chat. There is LAME for encoding MP3. There is Apple Lossless (ALAC), Apple's lossless compression codec, open-sourced only at the end of last year. FLAC is a free lossless codec, and iLBC is another codec suited to network transmission.

After downloading a codec we hit a technical hurdle: how do we build it into a link library our iOS project can use? The source has to be compiled, and first we need to understand what Xcode expects. To support the full range of iOS devices, including older ones such as the iPhone 2G or 3G, we need several processor architectures: i386 for the simulator, armv6 for older devices, and armv7 for newer ones. So we compile the codec once for each of i386, armv6, and armv7, combine the results, and then the library is ready to use. Let's go through the compilation method.

With the link library compiled, we can use the codec's encoder and decoder. The encoding process goes like this. After recording we have a PCM file, which iOS writes as a WAV file: not just raw data but a playable file with a defined structure, and the WAV format is well documented online. The structure carries a lot of header information, but what we actually need is the block of raw sample data inside it. Once we understand the structure we can extract that sample data (if the extraction is unclear, we can discuss it offline). We then compress the extracted samples into the target format. Finally, we cannot send the bare compressed data, because the file travels between platforms, for example from an iPhone to an Android device that otherwise has no way to know what the compression is. So we add a file header and reassemble the compressed data into a file structure that tells the receiver what format it is in. That is the encoding process.

Once encoding is complete, the data can be delivered to another device, and whoever receives it will certainly want to play it. Before dealing with that data, let's look at how playback works in iOS. iOS can play every format it can record, with MP3 and iLBC added on top; MP3 decoding is supported. Playback is much easier than recording: initialize the player with the file, or with data read from the file, prepare it to play, and play. Very simple.

How is an AAC file played? The process is simply the reverse of recording: read the AAC from disk, decode it to PCM, and play the PCM.

Knowing the playback principle, we can decode on the other client the data our own codec produced. How does decoding work? The received data starts with the header we added, so the first step is to strip that information off: the decoder only cares about the compressed payload in the middle, not the specific file structure around it. This is also very simple: pass the payload to the codec's decode method, and the resulting PCM is wrapped as a WAV file for playback. That is the decoding process.

That completes the whole pipeline for recording with our own codec. Next, the audio session. In iOS the audio session governs how applications handle audio. For example, how should audio output from multiple applications be handled: if I am listening to music and another application starts playing sound, what happens? When the screen locks, should the app's sound stop or keep playing? What should the volume keys do: some apps keep playing even when you turn the volume all the way down in a game. And should the application support recording, playback, or both? The audio session manages all of this. A more intuitive picture: think of the audio session as an air-traffic control tower. One plane, an app, is already flying, that is, playing sound. Another app, say SpeakHere, tells the tower it also wants to play. The tower notes the request and, when SpeakHere is about to play, tells the first app to silence its sound.

Several audio session modes can be set in an application: whether sound still plays when the device is muted or the screen is locked, and whether other applications' audio keeps playing while my app is producing sound; in other words, whether my app records audio, plays audio, or mixes with others. Different combinations cover cases such as playing or not playing while the screen is locked. How is it used? It is a single line of code: set the session to the category you want, which is very convenient.

That covers the audio session. Now, how was the demo built? The overall flow is exactly the process we just described.

The first step is to download the Speex codec, which is available from its website as a C project. We then compile a Speex library (libspeex.a) suitable for Xcode development. As an example, to build for the simulator we can configure the host and build settings and run the build directly. Then we add libspeex.a to the Xcode project (this setting deserves care), point Xcode at the header files, and import speex.h.

With the library in place, we extract the audio data from the recorded PCM file, that is, pull the sample data out of the WAV structure. The process may look cumbersome, but you only need to locate the data block to get the samples. Then we encode the samples with Speex's encode function. The method is frame-based: it does not process a whole file at once, but works on frames of a fixed length. Frame-by-frame encoding yields the Speex payload; we then prepend the Speex header information and send the resulting file to the server over a socket. The server relays it to the other device, which receives the Speex file, decodes it back to PCM audio, restores the PCM to WAV file format, and plays it. Speex also has many options, including settings applied before encoding such as noise reduction. That is everything for today.
