iOS instant voice chat technology practices


The 15th session of the CMDN Club was successfully held on April 9. The event focused on the development of speech technology under the theme "Application and Practice of Mobile Platform Speech Technology". This article explores hot topics in speech product innovation and technical practice from the perspectives of basic voice services, voice product development, and voice technology implementation. We invited Zhang Tianhong, senior iOS development engineer at Beijing Aitotem Technology, to give a talk on iOS instant voice chat technology.

Figure: Zhang Tianhong, senior iOS development engineer at Beijing Aitotem Technology, giving the talk

The following is a transcript of the talk:

Zhang Tianhong: Hello, everyone. I am Zhang Tianhong from Aitotem. Some of you may be familiar with Aitotem, and some may not. Aitotem is a company mainly engaged in mobile outsourcing. We are quite open: our developers and designers work across various platforms, and many of our implementations are posted on our official blog for discussion. In addition, our company holds a technical exchange meeting every Friday afternoon. As we have said before, everyone is welcome to come study and discuss these technologies together, so feel free to drop by on a Friday afternoon. (Host: I would like to add that Aitotem is also a company with an engineering culture, which is why it can send out technical speakers like this. We strongly encourage your companies to share technology in the same way, whether on a blog or through offline activities.)

Demo introduction: The demo has one main function, and I am not sure how well the effect will come across because it depends on the network. Let's take a look. This is an iPhone 4S running the demo. We press and hold to record, and the waveform follows the loudness of my voice; when I release, the voice is sent out. Since this goes over the network, I cannot guarantee the quality. After the data is sent, what happens? It plays back on the other device, the same effect as the MiTalk-style voice chat we usually use. From this demo we can see its basic functions: first, recording; then the recording is sent to the server over the network; the server forwards the voice to another device, which plays it back. With that in mind, let's see how it works.

Technically, the process is mainly recording: send the recording file to the server, then transmit it to another device for playback. The process is very simple, but one requirement is that the data must be suitable for network transmission, because the network is fragile and both sending and receiving consume the phone's mobile data, which everyone cares about since data is expensive. To make network transmission easier, we add a compression and decompression step. With that process in mind, here is what we will cover on the iOS platform: speech recording, speech encoding/decoding, audio playback, the audio session, and speech practice.

Encryption and decryption also come up in developing this kind of project, but we will not cover them here. Let's start with audio recording. For recording, we want to know which voice formats iOS can record, how the APIs are used, and how to read the recording volume so that the waveform can follow the loudness of the voice while recording; iOS supports all of this.

Let's look at the voice recording formats iOS supports by default. There are not many, and many formats you might expect are not supported, but the basics are there: AAC, which compresses well and sounds good; ALAC (Apple Lossless); iLBC, a voice format designed for network transmission; IMA4, which compresses efficiently with a relatively simple algorithm; linear PCM, which is uncompressed; and µ-law and a-law.

Now that we know the default voice formats, let's see how recording is done. There is a very simple class called AVAudioRecorder: you set the target file for the recording and the recording settings. You can set the basic information of a recording at initialization, including the recording format; the API supports many formats, such as PCM and the other formats mentioned above. You can also set the recording sample rate, the number of channels (mono or stereo), and the linear PCM bit depth, as needed.

Once the settings are decided, we use AVAudioRecorder to create the recording file, prepare to record, and start recording. Creating and preparing the file first makes starting the recording more efficient.

That is the whole recording process. You can specify any local file; we recommend putting it in the temp directory. Here you set the recording options, that is, the PCM recording format, sample rate, number of channels, and bit depth, and pass these parameters in to create the recorder. Recording is that simple; Apple's APIs are quite standard and make it easy to get the desired result.
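Below is a minimal sketch of the recording setup just described, using AVAudioRecorder with linear PCM written to the temp directory. The file name, 8 kHz sample rate, mono channel, and 16-bit depth are illustrative values chosen for voice chat, not settings taken from the talk's demo.

```swift
import AVFoundation

// Record uncompressed PCM (a WAV file) into the temp directory.
let url = URL(fileURLWithPath: NSTemporaryDirectory()).appendingPathComponent("record.wav")

let settings: [String: Any] = [
    AVFormatIDKey: kAudioFormatLinearPCM,  // uncompressed PCM, as in the talk
    AVSampleRateKey: 8000.0,               // 8 kHz is typical for voice chat
    AVNumberOfChannelsKey: 1,              // mono
    AVLinearPCMBitDepthKey: 16             // 16-bit samples
]

do {
    let recorder = try AVAudioRecorder(url: url, settings: settings)
    recorder.isMeteringEnabled = true      // needed for the volume metering below
    if recorder.prepareToRecord() {        // create the file up front, so recording starts faster
        recorder.record()
    }
    // ... later: recorder.stop()
} catch {
    print("Failed to create recorder: \(error)")
}
```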

Recording volume: we also capture the microphone volume during recording. You may not have seen it clearly in the demo, but the microphone volume drives the waveform display. This is also very simple: enable metering on the recorder (meteringEnabled), and then it is easy to read the average power and the peak power for the current interval. Once those two values are available, the volume display follows naturally. That is how we capture the volume while recording.
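A small sketch of reading those two values, assuming recorder is the AVAudioRecorder created above with metering enabled; the 0.1 s polling interval mentioned in the comment is an illustrative choice.

```swift
import AVFoundation

// Refresh and read the microphone level of an active AVAudioRecorder.
func currentLevel(of recorder: AVAudioRecorder) -> (average: Float, peak: Float) {
    recorder.updateMeters()                              // refresh the meter values
    let average = recorder.averagePower(forChannel: 0)   // in dBFS, 0 dB = full scale
    let peak = recorder.peakPower(forChannel: 0)
    return (average, peak)
}
// Poll this from a timer (for example every 0.1 s) to drive a waveform or volume UI.
```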

There is also an audio-queue-based recording method for audio streams. Stream-oriented recording does not directly produce a file, so if you want to transmit the audio while it is still being recorded, you can record against the audio stream instead. I will not go into detail here, because there is a lot to it; you can look at Apple's SpeakHere demo to see how it is used and how it works. Used flexibly, it lets you implement many powerful features.

After the sound is recorded, we have a file on disk. Now we want to add encoding and decoding before sending it. Look at this diagram; it shows the process from recording to the file finally stored on the hard disk. The recording is first captured in PCM format and then converted to AAC, with a codec (encoder/decoder) in the middle compressing and decompressing the audio signal. The basic flow is: record PCM from the microphone, let the codec convert the PCM into AAC, and write the AAC to a file on disk. Three steps, with a codec transcoding step in the middle that determines the data format being recorded.

I just mentioned the codec, and the codec is the focus. iOS may not have built-in support for the codec we want to record with. Look at the codecs iOS does support: they are the default formats we just mentioned. In real applications, however, we may want broader support, for example MP3, WMA, MIDI, Ogg, or Speex. Some of you may not have heard of Speex before, but today we will look at this speech format. We want to record in these formats, but iOS has no default codec for them. What can we do? We can use open-source codecs. Codecs from commercial providers may be charged for, but many open-source projects are freely available on the Internet. Let's see which open-source codecs we can use, starting with Speex.

Speex compresses well, is easy to transmit over the network, and has some noise-reduction features, which makes it suitable for voice chat. There is also LAME (for MP3), and Apple Lossless (ALAC), which Apple open-sourced only at the end of last year; it is Apple's lossless compression codec. In addition, FLAC is a free lossless codec, and iLBC is another codec suited to network transmission.

After we download a codec, we hit a technical hurdle: how do we compile it into a link library our iOS project can use? To get the link library, we have to compile it ourselves. First we need to understand Xcode. For Xcode to support more iOS devices, including older, lower-end ones, we need to cover more processor architectures. Which architectures does Xcode need? i386 (for the simulator), armv6, and armv7. Knowing that, we compile the codec for each of these architectures: i386, armv6, and armv7. After compilation we can use it; I will come back to the compilation method shortly.

With the link library compiled, we can use the codec's encoder and decoder. The encoding process goes like this. After recording we have a PCM file by default, because we want to re-encode it with our own codec. iOS records PCM into a WAV container, which is not just raw data but a file that players can actually play. Information about the WAV format is easy to find on the Internet; the structure contains a lot of fields, but what we really need is the block of sample data, which we locate through the data format. Once we understand the structure, we can extract the sample data (if you are not sure how, we can also discuss it offline). We then take the sample data and compress it into the desired target format. Finally, we cannot send the compressed data as-is, because the recording is transmitted not only between iOS devices but also to other platforms, for example from an iPhone to an Android phone, which would not know what the compression is. So we add a file header and re-package the compressed data into a file structure, so that the receiver knows what format the compressed file is in. That is the encoding process.
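As an illustration of the "find the block of sample data" step, here is a hedged sketch that walks a simple little-endian RIFF/WAVE file and returns the payload of its "data" chunk; the helper name is hypothetical, and real files may carry extra chunks or layouts this sketch does not handle.

```swift
import Foundation

// Pull the raw PCM sample data out of a simple WAV (RIFF) file.
func pcmSamples(fromWav data: Data) -> Data? {
    guard data.count > 12, data.prefix(4) == Data("RIFF".utf8) else { return nil }
    var offset = 12                                   // skip "RIFF", overall size, "WAVE"
    while offset + 8 <= data.count {
        let chunkID = data.subdata(in: offset..<offset + 4)
        let sizeBytes = [UInt8](data.subdata(in: offset + 4..<offset + 8))
        let chunkSize = Int(sizeBytes[0]) | (Int(sizeBytes[1]) << 8)
                      | (Int(sizeBytes[2]) << 16) | (Int(sizeBytes[3]) << 24)
        let body = offset + 8
        if chunkID == Data("data".utf8) {             // the block of samples we want
            return data.subdata(in: body..<min(body + chunkSize, data.count))
        }
        offset = body + chunkSize + (chunkSize % 2)   // chunks are padded to even sizes
    }
    return nil
}
```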

After encoding, we can deliver the data to another device. Once someone receives the data, they will certainly want to play it. Before discussing how to handle that data, let's talk about how playback works on iOS. iOS can play the same formats it records, with MP3 and iLBC added; MP3 decoding is supported. Let's look at how playback is used. Playing is much easier than recording: first pass in either a file or the data extracted from a file, then prepare to play, and then play it. It is very simple.
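A minimal playback sketch matching that description, using AVAudioPlayer; the file name is illustrative, and the commented-out data-based initializer shows the alternative of playing bytes received over the network.

```swift
import AVFoundation

// Play a received audio file (or in-memory data) with AVAudioPlayer.
let wavURL = URL(fileURLWithPath: NSTemporaryDirectory()).appendingPathComponent("received.wav")

do {
    let player = try AVAudioPlayer(contentsOf: wavURL)
    // or: let player = try AVAudioPlayer(data: receivedData)
    player.prepareToPlay()
    player.play()
    // In a real app, keep a strong reference to the player or playback stops
    // as soon as it is deallocated.
} catch {
    print("Playback failed: \(error)")
}
```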

How is an AAC file played? The process is the opposite of recording: first decode, then play; internally it reads the AAC and plays the decoded PCM.

Now that we understand the playback principle, we can decode, on the other client, the data produced by the codec we added. How do we decode? We receive a piece of data, and the encoded data looks like this. During decoding we strip off the header information we added, because we only need to decode the intermediate encoded data; we do not parse the file's specific structure, we only care about the encoded payload. This is also very simple: call the codec's decode method, and the resulting PCM becomes a WAV file for playback. That is the decoding process.
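Using Speex (introduced earlier) as the example codec, here is a rough decoding sketch. It assumes libspeex has been compiled for iOS and exposed through a bridging header; all C names are from the libspeex API, while the Swift function and its frame-per-Data packaging are hypothetical.

```swift
import Foundation

// Decode Speex frames (container header already stripped) back to 16-bit PCM.
func speexDecode(frames: [Data]) -> [Int16] {
    var bits = SpeexBits()
    speex_bits_init(&bits)
    let state = speex_decoder_init(speex_lib_get_mode(SPEEX_MODEID_NB))  // narrowband voice

    var frameSize: Int32 = 0
    speex_decoder_ctl(state, SPEEX_GET_FRAME_SIZE, &frameSize)           // samples per frame

    var pcm: [Int16] = []
    var frameOut = [Int16](repeating: 0, count: Int(frameSize))
    for frame in frames {
        var bytes = frame.map { Int8(bitPattern: $0) }
        speex_bits_read_from(&bits, &bytes, Int32(bytes.count))
        if speex_decode_int(state, &bits, &frameOut) == 0 {              // 0 means success
            pcm.append(contentsOf: frameOut)
        }
    }
    speex_decoder_destroy(state)
    speex_bits_destroy(&bits)
    return pcm   // wrap in a WAV header (or feed an audio queue) to play it back
}
```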

That covers the whole process of recording with our own codec. Next, let's look at the audio session. The audio session is how iOS manages audio across applications. For example, how should audio output from multiple applications be handled: if I am listening to music in one app and another app starts playing sound, how is that resolved? Also, when we lock the screen, should our application keep playing sound or go silent? When I turn the volume key down to the minimum, some applications go silent and others, such as games, do not. There is also the question of whether the application supports recording and whether it supports audio playback. The audio session manages all of this. Here is a more intuitive picture: think of the system as an air-traffic control console. The first plane is in the air, that is, one application is playing sound. Another application, SpeakHere, also wants to play audio; through its audio session it tells the console that it is about to play sound, and the console coordinates, telling SpeakHere when its sound has to be disabled.

Several audio modes (categories) can be set in an application: whether sound keeps playing when the phone is set to silent or the screen is locked; whether the application mixes with other applications' audio while it uses sound; and whether the app records, plays audio, or both. There are also different combinations, for example not playing sound when the screen is locked. How is it used? Look at the bottom line of code: setting the category is a single statement and very convenient to use.
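A brief sketch of that one-line category setup with AVAudioSession; the .playAndRecord category and .voiceChat mode are plausible choices for a voice-chat app, not necessarily the exact values used in the talk's demo.

```swift
import AVFoundation

// Configure the audio session: the category decides whether the app records,
// keeps playing when the ringer is silenced or the screen locks, and whether
// it mixes with other apps' audio.
let session = AVAudioSession.sharedInstance()
do {
    try session.setCategory(.playAndRecord, mode: .voiceChat, options: [])
    try session.setActive(true)
} catch {
    print("Audio session setup failed: \(error)")
}
```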

That covers the audio session. Now, how did we build the demo you saw at the beginning? The process is the same as what we have just discussed.

The first step is to download the Speex codec; it is available as a C project from its website. We then compile a Speex lib suitable for our Xcode development environment, that is, the development library. Here is an example of compiling for the simulator: set the host and build directly. Then we add libspeex.a to the Xcode project. This setup needs to be done carefully; after specifying the header search path, we can include the speex.h file.

Once we have the library, we take the recorded PCM file and extract the audio data, that is, the sample data. The process may be a bit tedious, but you only need to find the data block to get the samples. Then we encode the sample data. Speex's encoding function is frame-based: it does not process a whole file at once; instead it works on frames of a fixed length, and frame-by-frame encoding produces the Speex format we want. Finally, we add the Speex file header to make a Speex file and send the data to the server over a socket. The server transmits the data to another device, which receives the Speex file and decodes it back to PCM audio. Speex also has many options, including settings made before recording and noise-reduction settings. To play the file back, decode it and restore the PCM data to the WAV format. That is all for today.
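To make the frame-based step concrete, here is a rough sketch of the Speex encoding side, the counterpart of the decoding sketch earlier. The same bridging-header assumption applies; the quality value of 8 and the per-frame Data packaging are illustrative choices, and the container header and socket transmission are left out.

```swift
import Foundation

// Encode 16-bit PCM samples into Speex frames, one Data per frame.
func speexEncode(pcm: [Int16]) -> [Data] {
    var bits = SpeexBits()
    speex_bits_init(&bits)
    let state = speex_encoder_init(speex_lib_get_mode(SPEEX_MODEID_NB))  // narrowband voice

    var quality: Int32 = 8
    speex_encoder_ctl(state, SPEEX_SET_QUALITY, &quality)                // illustrative quality
    var frameSize: Int32 = 0
    speex_encoder_ctl(state, SPEEX_GET_FRAME_SIZE, &frameSize)           // samples per frame

    var frames: [Data] = []
    var samples = pcm
    var output = [CChar](repeating: 0, count: 1024)
    var index = 0
    while index + Int(frameSize) <= samples.count {
        speex_bits_reset(&bits)
        samples.withUnsafeMutableBufferPointer { buffer in
            _ = speex_encode_int(state, buffer.baseAddress! + index, &bits)
        }
        let written = speex_bits_write(&bits, &output, Int32(output.count))
        frames.append(Data(bytes: output, count: Int(written)))
        index += Int(frameSize)
    }
    speex_encoder_destroy(state)
    speex_bits_destroy(&bits)
    return frames   // prepend your own header, then send over the socket
}
```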

I hope we can discuss technical questions offline, because I think speech technology is developing very fast now, and companies such as Xunfei (iFLYTEK) and UC are doing very well with very user-friendly products. We can keep communicating about how technologies in this field are implemented and how problems are solved, whether on Weibo or other platforms. Thank you.
