I have been studying the implementation of voice calls recently, and I have recorded my implementation ideas here. Since this is my first contact with voice calls, this is just a simple idea and a summary of what I learned from Google and Baidu.
I think a voice call system needs at least four modules: PCM (Pulse Code Modulation) speech collection, encoding/decoding, network transmission, and audio playback. If UI interaction is counted, five modules are involved.
The overall process is roughly as follows: A calls B. A's voice is captured as raw PCM data through the microphone, then encoded and compressed, and the encoded data is transmitted over the network (a P2P connection is established). B decodes the data received from the network and calls the playback module to play it.
I. Speech Acquisition Module
The Android platform uses the AudioRecord interface to collect PCM data. This step is fairly easy, but you must pay attention to how the AudioRecord interface is used. The parameters required to construct an AudioRecord instance are:
public AudioRecord(int audioSource, int sampleRateInHz, int channelConfig, int audioFormat, int bufferSizeInBytes)
| Parameter | Description |
| --- | --- |
| audioSource | The recording source. See MediaRecorder.AudioSource for recording source definitions. |
| sampleRateInHz | The sample rate expressed in Hertz. 44100 Hz is currently the only rate that is guaranteed to work on all devices, but other rates such as 22050, 16000, and 11025 may work on some devices. |
| channelConfig | Describes the configuration of the audio channels. See CHANNEL_IN_MONO and CHANNEL_IN_STEREO. CHANNEL_IN_MONO is guaranteed to work on all devices. |
| audioFormat | The format in which the audio data is represented. See ENCODING_PCM_16BIT and ENCODING_PCM_8BIT. |
| bufferSizeInBytes | The total size (in bytes) of the buffer to which audio data is written during recording. New audio data can be read from this buffer in chunks smaller than this size. See getMinBufferSize(int, int, int) to determine the minimum buffer size required to successfully create an AudioRecord instance. Using values smaller than getMinBufferSize() will result in an initialization failure. |
- audioSource: MediaRecorder.AudioSource.MIC.
- sampleRateInHz: the recording sample rate, e.g. 8000 Hz or 11025 Hz; the supported values vary across hardware devices.
- channelConfig: the recording channel configuration, either AudioFormat.CHANNEL_IN_MONO or AudioFormat.CHANNEL_IN_STEREO; AudioFormat.CHANNEL_IN_MONO is the safe choice.
- audioFormat: the recording encoding format, either AudioFormat.ENCODING_PCM_16BIT or ENCODING_PCM_8BIT. 16-bit reproduces the signal better than 8-bit, but consumes more power and storage space.
- bufferSizeInBytes: the recording buffer size; use the getMinBufferSize() method to obtain the minimum usable size.
Then call AudioRecord's read(byte[], int, int), read(short[], int, int), or read(ByteBuffer, int) method to collect the PCM speech data.
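To make this concrete, here is a minimal capture sketch of my own (an illustration, not the project code). It assumes 8000 Hz mono 16-bit PCM and reads 160-sample frames, i.e. 20 ms of audio, which conveniently matches one Speex narrowband frame:

```java
import android.media.AudioFormat;
import android.media.AudioRecord;
import android.media.MediaRecorder;

public class PcmCapture {
    private static final int SAMPLE_RATE = 8000; // 8 kHz is enough for voice
    private volatile boolean running = true;

    public void capture() {
        int minBufSize = AudioRecord.getMinBufferSize(SAMPLE_RATE,
                AudioFormat.CHANNEL_IN_MONO, AudioFormat.ENCODING_PCM_16BIT);
        AudioRecord record = new AudioRecord(MediaRecorder.AudioSource.MIC,
                SAMPLE_RATE, AudioFormat.CHANNEL_IN_MONO,
                AudioFormat.ENCODING_PCM_16BIT, minBufSize * 2);
        short[] frame = new short[160]; // 20 ms at 8 kHz
        record.startRecording();
        while (running) {
            int read = record.read(frame, 0, frame.length);
            if (read > 0) {
                // hand the PCM frame to the encoder / network module here
            }
        }
        record.stop();
        record.release();
    }
}
```

(Don't forget the android.permission.RECORD_AUDIO permission in the manifest.)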
II. Audio Playback
After the voice data is collected, the audio playback module can be implemented. Playing PCM data on Android is also very easy: just use the AudioTrack interface, again paying attention to how it is used. The constructor of AudioTrack mirrors that of AudioRecord:
public AudioTrack(int streamType, int sampleRateInHz, int channelConfig, int audioFormat, int bufferSizeInBytes, int mode)
| Parameter | Description |
| --- | --- |
| streamType | The type of the audio stream. See STREAM_VOICE_CALL, STREAM_SYSTEM, STREAM_RING, STREAM_MUSIC, STREAM_ALARM, and STREAM_NOTIFICATION. |
| sampleRateInHz | The sample rate expressed in Hertz. |
| channelConfig | Describes the configuration of the audio channels. See CHANNEL_OUT_MONO and CHANNEL_OUT_STEREO. |
| audioFormat | The format in which the audio data is represented. See ENCODING_PCM_16BIT and ENCODING_PCM_8BIT. |
| bufferSizeInBytes | The total size (in bytes) of the buffer from which audio data is read for playback. In streaming mode, you can write data into this buffer in chunks smaller than this size. In static mode, this is the maximum size of the sound that will be played for this instance. See getMinBufferSize(int, int, int) to determine the minimum buffer size required to successfully create an AudioTrack instance in streaming mode. Using values smaller than getMinBufferSize() will result in an initialization failure. |
| mode | Streaming or static buffer. See MODE_STATIC and MODE_STREAM. |
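Again a minimal sketch of my own, assuming the same 8000 Hz mono 16-bit format as the capture side. STREAM_VOICE_CALL and MODE_STREAM fit the calling scenario:

```java
import android.media.AudioFormat;
import android.media.AudioManager;
import android.media.AudioTrack;

public class PcmPlayer {
    private static final int SAMPLE_RATE = 8000; // must match the capture rate
    private final AudioTrack track;

    public PcmPlayer() {
        int minBufSize = AudioTrack.getMinBufferSize(SAMPLE_RATE,
                AudioFormat.CHANNEL_OUT_MONO, AudioFormat.ENCODING_PCM_16BIT);
        track = new AudioTrack(AudioManager.STREAM_VOICE_CALL, SAMPLE_RATE,
                AudioFormat.CHANNEL_OUT_MONO, AudioFormat.ENCODING_PCM_16BIT,
                minBufSize * 2, AudioTrack.MODE_STREAM);
        track.play();
    }

    /** Queue one decoded PCM frame for playback. */
    public void write(short[] frame, int length) {
        track.write(frame, 0, length); // blocks until the buffer accepts the data
    }

    public void release() {
        track.stop();
        track.release();
    }
}
```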
The above two modules are fairly easy to implement. Once they are done, we can reproduce the many examples on the net that demonstrate the "record-and-play" effect by feeding AudioRecord straight into AudioTrack; a minimal version is sketched below. Of course this is not my goal, but it lets us test whether the collected data is correct.
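Wiring the two hypothetical classes above together gives that record-and-play test, for example:

```java
import android.media.AudioFormat;
import android.media.AudioRecord;
import android.media.MediaRecorder;

public class LoopbackTest {
    // Pipes microphone frames straight to the speaker.
    // Reuses the hypothetical PcmPlayer from the sketch above.
    public static void run() {
        int minBufSize = AudioRecord.getMinBufferSize(8000,
                AudioFormat.CHANNEL_IN_MONO, AudioFormat.ENCODING_PCM_16BIT);
        AudioRecord record = new AudioRecord(MediaRecorder.AudioSource.MIC, 8000,
                AudioFormat.CHANNEL_IN_MONO, AudioFormat.ENCODING_PCM_16BIT,
                minBufSize * 2);
        PcmPlayer player = new PcmPlayer();
        short[] frame = new short[160];
        record.startRecording();
        while (true) { // stop condition omitted for brevity
            int read = record.read(frame, 0, frame.length);
            if (read > 0) {
                player.write(frame, read); // with the loudspeaker on, expect howling
            }
        }
    }
}
```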
In fact, in this record-and-play test the echo is very serious when the loudspeaker is used, and the noise is also serious. That is a real problem! So on to the next step: the codec.
III. Speech Coding/Decoding
The collected PCM data is the raw voice data. Transmitting it directly over the network is not practical, so we need to encode and packetize it first.
For encoding we need a third-party library; the one I am currently using is Speex (http://www.speex.org). I have seen many SIP voice call applications use this library for encoding and decoding. There are also some negative comments about the library, but I think it is worth studying because Speex is very convenient to use.
Speex is a C library (there are also Java versions), so we need to use JNI. If you have never used JNI, this is also a learning opportunity. You can refer to sipdroid (http://code.google.com/p/sipdroid/), and if that still feels troublesome, refer to this open-source project: http://code.google.com/p/android-recorder/
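For orientation, the JNI surface can be as small as the wrapper below. This is a sketch modeled on sipdroid's Speex codec class, not code taken from either project: the native method names and the library name are assumptions, and each native method must be implemented in C against libspeex and built with the NDK.

```java
// Hypothetical Speex JNI wrapper, modeled on sipdroid's codec class.
// The Java side alone does nothing: each native method needs a C
// implementation against libspeex, compiled with the NDK.
public class Speex {
    static {
        System.loadLibrary("speex_jni"); // adjust to the name of your .so
    }

    /** Initializes the codec; quality is roughly 1..10 (bitrate vs. fidelity). */
    public native int open(int quality);
    /** Encodes one 160-sample PCM frame; returns the number of encoded bytes. */
    public native int encode(short[] lin, int offset, byte[] encoded, int size);
    /** Decodes a packet into 160 PCM samples; returns the number of samples. */
    public native int decode(byte[] encoded, short[] lin, int size);
    public native void close();
}
```

On the sending side each captured frame goes through encode() before it is handed to RTP; on the receiving side decode() runs before AudioTrack.write().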
When I use Speex, the noise is indeed reduced, but with the loudspeaker there is still a lot of echo, although the result is much better. The official website says that recent Speex versions add echo cancellation and noise suppression, yet when I enabled the echo and noise-suppression modules there was no obvious effect. This is a problem I ran into with the Speex library and am still studying; one commonly cited cause is that the echo canceller needs the played-back (far-end) signal and the captured signal to be closely aligned in time, which is hard to guarantee. If you know the reason, please leave a message so we can learn from each other.
IV. Network Transmission
After encoding and packetizing, the data is transmitted over the network, mainly using RTP (Real-time Transport Protocol). The library I am currently using is jlibrtp (http://sourceforge.net/projects/jlibrtp/), a Java implementation. However, this library can lose packets, and some exceptions are thrown inside it. Because I have not found a better library for RTP transmission, I had to use this one. Those who like to dig deeper can also study sipdroid's RTP implementation; I am reading it as well but have not studied it thoroughly yet. If you have studied it, please leave a message and let's discuss.
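As a sketch of how the pieces fit, here is roughly how a jlibrtp session can be driven. I am writing this from memory of the library's demo code, so treat the class and method names (RTPSession, Participant, RTPSessionRegister, sendData, the RTPAppIntf callbacks) and the port numbers as assumptions to check against the version you download:

```java
import java.net.DatagramSocket;
import jlibrtp.DataFrame;
import jlibrtp.Participant;
import jlibrtp.RTPAppIntf;
import jlibrtp.RTPSession;

public class VoiceRtp implements RTPAppIntf {
    private RTPSession session;

    public void start(String peerIp) throws Exception {
        // Separate sockets for RTP and RTCP; the ports are arbitrary examples.
        DatagramSocket rtpSocket = new DatagramSocket(16386);
        DatagramSocket rtcpSocket = new DatagramSocket(16387);
        session = new RTPSession(rtpSocket, rtcpSocket);
        session.addParticipant(new Participant(peerIp, 16386, 16387));
        session.RTPSessionRegister(this, null, null); // starts the receive thread
    }

    /** Sends one Speex-encoded frame to the peer. */
    public void send(byte[] encodedFrame) {
        session.sendData(encodedFrame);
    }

    // Callback from jlibrtp when a packet arrives: decode, then play.
    public void receiveData(DataFrame frame, Participant participant) {
        byte[] payload = frame.getConcatenatedData();
        // speex.decode(payload, pcmFrame, payload.length); player.write(...);
    }

    public void userEvent(int type, Participant[] participant) { }

    public int frameSize(int payloadType) {
        return 1; // one application frame per RTP packet
    }
}
```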
The above is a simple implementation of P2P voice calls. However, this idea does not yet include a server side, so it can only work within a LAN. Implementing P2P calls across NAT is also difficult; I think that is the hard part to crack.
Code: since it belongs to a company project, it is not convenient to paste it here. I will post it later.
This article is an original blog post. If you reproduce it, please include a link to the original article. Thank you!