Android Advanced Audio Application
When it comes to audio applications, the first thing that comes to mind is the music player. Some players play streaming media and some play local music files. As the Android platform evolved, more advanced audio APIs were needed, and Google added them to support low-latency audio streaming and recording.
Android's audio APIs provide advanced functions that developers can integrate into their own applications. With these APIs, you can implement VoIP applications, build custom streaming music clients, and achieve low-latency game sound effects. In addition, APIs for text-to-speech and speech recognition are provided, so users can interact with applications by voice alone, without a user interface or touch input.
Low-latency audio
Android has four APIs for playing audio (five if you count MIDI) and three for recording. Next, we briefly introduce these APIs along with some advanced usage examples.
① Audio playback API
MediaPlayer is the default choice for music playback. This class is suitable for playing music or video, from both streaming sources and local files. Each MediaPlayer has an associated state machine that must be tracked in the application. Developers can use the MediaPlayer API to embed music or video playback in their own applications without additional processing or latency considerations.
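To make the state-machine requirement concrete, the sketch below models a simplified subset of MediaPlayer's documented states and legal transitions in plain Java. The state names follow the official state diagram, but the transition table here is an illustrative simplification, not the framework's actual implementation.

```java
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

// Illustrative sketch of a subset of MediaPlayer's state machine.
// Consult the official MediaPlayer state diagram for the full picture.
class MediaPlayerStates {
    enum State { IDLE, INITIALIZED, PREPARED, STARTED, PAUSED, STOPPED, RELEASED }

    private static final Map<State, Set<State>> TRANSITIONS = new EnumMap<>(State.class);
    static {
        TRANSITIONS.put(State.IDLE, EnumSet.of(State.INITIALIZED, State.RELEASED));
        TRANSITIONS.put(State.INITIALIZED, EnumSet.of(State.PREPARED, State.RELEASED));
        TRANSITIONS.put(State.PREPARED, EnumSet.of(State.STARTED, State.STOPPED, State.RELEASED));
        TRANSITIONS.put(State.STARTED, EnumSet.of(State.PAUSED, State.STOPPED, State.RELEASED));
        TRANSITIONS.put(State.PAUSED, EnumSet.of(State.STARTED, State.STOPPED, State.RELEASED));
        TRANSITIONS.put(State.STOPPED, EnumSet.of(State.PREPARED, State.RELEASED));
        TRANSITIONS.put(State.RELEASED, EnumSet.noneOf(State.class));
    }

    private State mState = State.IDLE;

    State getState() { return mState; }

    // Applies the transition and returns true only if it is legal from the current state.
    boolean moveTo(State next) {
        if (TRANSITIONS.get(mState).contains(next)) {
            mState = next;
            return true;
        }
        return false;
    }
}
```

An application would guard every call (prepare, start, pause, stop) behind a check like `moveTo`, which is exactly the bookkeeping MediaPlayer forces on you.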
The second option is the SoundPool class, which provides low-latency support and is suitable for playing sound effects and other relatively short audio clips. For example, you can use SoundPool to play game sounds. However, it does not support audio streams, so it is not suitable for applications that require real-time audio stream processing, such as VoIP.
The third option is the AudioTrack class, which lets you buffer an audio stream into the hardware. It supports low-latency playback and is well suited even to streaming scenarios; its latency is generally low enough for VoIP and similar applications.
The following code demonstrates how to use AudioTrack in a VoIP application:
import android.media.AudioFormat;
import android.media.AudioManager;
import android.media.AudioTrack;

public class AudioTrackDemo {
    private final int mMinBufferSize;
    private final AudioTrack mAudioTrack;

    public AudioTrackDemo() {
        this.mMinBufferSize = AudioTrack.getMinBufferSize(16000,
                AudioFormat.CHANNEL_OUT_MONO,
                AudioFormat.ENCODING_PCM_16BIT);
        this.mAudioTrack = new AudioTrack(AudioManager.STREAM_MUSIC, 16000,
                AudioFormat.CHANNEL_OUT_MONO,
                AudioFormat.ENCODING_PCM_16BIT,
                this.mMinBufferSize * 2,
                AudioTrack.MODE_STREAM);
    }

    public void playPcmPacket(byte[] pcmData) {
        if (this.mAudioTrack != null
                && this.mAudioTrack.getState() == AudioTrack.STATE_INITIALIZED) {
            // Check the play state (not the playback rate) before starting.
            if (this.mAudioTrack.getPlayState() != AudioTrack.PLAYSTATE_PLAYING) {
                this.mAudioTrack.play();
            }
            this.mAudioTrack.write(pcmData, 0, pcmData.length);
        }
    }

    public void stopPlayback() {
        if (this.mAudioTrack != null) {
            this.mAudioTrack.stop();
            this.mAudioTrack.release();
        }
    }
}
First, determine the minimum buffer size for the audio stream. To do this, you need to know the sampling rate, whether the data is mono or stereo, and whether 8- or 16-bit PCM encoding is used. Then call AudioTrack.getMinBufferSize() with the sampling rate, channel configuration, and encoding; it returns the minimum buffer size for the AudioTrack instance in bytes.
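The arithmetic underlying these buffer sizes is simple, and sketching it in plain Java makes the parameters easier to reason about. Note this is only a lower-bound estimate for illustration: the real minimum returned by getMinBufferSize() also depends on the device's hardware, so it must always be queried at runtime.

```java
// Plain-Java sketch of the PCM buffer arithmetic behind buffer sizing.
// This is a lower-bound estimate, not a replacement for
// AudioTrack.getMinBufferSize(), which accounts for device hardware.
class PcmBufferMath {
    // Bytes needed to hold `millis` milliseconds of raw PCM audio.
    static int bytesFor(int sampleRate, int channels, int bytesPerSample, int millis) {
        return sampleRate * channels * bytesPerSample * millis / 1000;
    }
}
```

For example, a 20 ms packet of 16 kHz mono 16-bit PCM (a typical VoIP frame) occupies 16000 × 1 × 2 × 0.02 = 640 bytes.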
Next, create the AudioTrack instance with the correct parameters. The first parameter is the stream type; different kinds of application use different values: STREAM_VOICE_CALL for VoIP applications, STREAM_MUSIC for streaming music applications.
Specific options include:
STREAM_ALARM: alarm sounds
STREAM_MUSIC: music playback
STREAM_DTMF: DTMF tones
STREAM_RING: phone ringtone
STREAM_NOTIFICATION: notification sounds
STREAM_SYSTEM: system sounds
STREAM_VOICE_CALL: voice calls
The second, third, and fourth parameters vary with the scenario; they specify the sampling rate, mono or stereo, and the sample size. A VoIP application typically uses 16-bit mono at 16 kHz, while CD-quality music uses 16-bit stereo at 44.1 kHz. A 16-bit stereo stream at a high sampling rate needs a larger buffer and more data transmission, but the sound quality is better. All Android devices support 8- or 16-bit stereo playback at 8 kHz, 16 kHz, and 44.1 kHz sampling rates.
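The trade-off between these two configurations can be quantified by comparing their raw PCM data rates. The following sketch just multiplies out the parameters discussed above:

```java
// Raw (uncompressed) PCM data rate for a given configuration.
class PcmDataRate {
    static int bytesPerSecond(int sampleRate, int channels, int bytesPerSample) {
        return sampleRate * channels * bytesPerSample;
    }
}
```

16-bit mono at 16 kHz produces 32,000 bytes/s, while 16-bit stereo at 44.1 kHz produces 176,400 bytes/s — roughly 5.5 times as much data to buffer and move.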
The buffer size parameter should be a multiple of the minimum buffer size; the right multiple depends on your actual needs, and factors such as network latency can also influence it. Avoid letting the buffer run empty at any time, because playback may fail.
The last parameter determines whether the audio data is sent once (MODE_STATIC) or continuously (MODE_STREAM). With MODE_STATIC, the entire audio clip must be written in one go. With MODE_STREAM, you can write arbitrarily small blocks of PCM data, which suits streaming music or VoIP calls.
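In MODE_STREAM, a pipeline typically slices decoded PCM into fixed-size packets and feeds them to AudioTrack.write() one at a time. The sketch below shows only the slicing step in plain Java; the packet size of 640 bytes (20 ms of 16 kHz mono 16-bit PCM) is an illustrative assumption, not a requirement of the API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch: split a PCM byte buffer into fixed-size packets, the way a VoIP
// pipeline might feed AudioTrack.write() in MODE_STREAM one packet at a time.
class PcmChunker {
    static List<byte[]> chunk(byte[] pcm, int packetSize) {
        List<byte[]> packets = new ArrayList<>();
        for (int off = 0; off < pcm.length; off += packetSize) {
            int end = Math.min(off + packetSize, pcm.length);
            packets.add(Arrays.copyOfRange(pcm, off, end));  // final packet may be short
        }
        return packets;
    }
}
```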
② Recording API
When it comes to recording audio, the first API to consider is MediaRecorder. As with MediaPlayer, you need to track the internal state of the MediaRecorder instance in your application code. Because MediaRecorder can only save recordings to files, it is not suitable for recording audio streams.
If you need to record a stream, use AudioRecord instead; its usage closely mirrors the AudioTrack code shown above.
The following example shows how to create an AudioRecord instance to record 16-bit mono audio at 16 kHz:
import android.media.AudioFormat;
import android.media.AudioRecord;
import android.media.MediaRecorder;
import java.io.IOException;
import java.io.OutputStream;

public class AudioRecordDemo {
    private final AudioRecord mAudioRecord;
    private final int mMinBufferSize;
    private volatile boolean mDoRecord = false;

    public AudioRecordDemo() {
        // Query the recording minimum with AudioRecord and an input channel mask
        // (not AudioTrack with CHANNEL_OUT_MONO, which is for playback).
        this.mMinBufferSize = AudioRecord.getMinBufferSize(16000,
                AudioFormat.CHANNEL_IN_MONO,
                AudioFormat.ENCODING_PCM_16BIT);
        this.mAudioRecord = new AudioRecord(
                MediaRecorder.AudioSource.VOICE_COMMUNICATION, 16000,
                AudioFormat.CHANNEL_IN_MONO,
                AudioFormat.ENCODING_PCM_16BIT,
                this.mMinBufferSize * 2);
    }

    public void writeAudioToStream(OutputStream stream) {
        this.mDoRecord = true;
        this.mAudioRecord.startRecording();
        byte[] buffer = new byte[this.mMinBufferSize * 2];
        while (this.mDoRecord) {
            int bytesRead = this.mAudioRecord.read(buffer, 0, buffer.length);
            try {
                stream.write(buffer, 0, bytesRead);
            } catch (IOException e) {
                this.mDoRecord = false;
            }
        }
        this.mAudioRecord.stop();
        this.mAudioRecord.release();
    }

    public void stopRecording() {
        this.mDoRecord = false;
    }
}
Because the process of creating an AudioTrack is so similar, the two classes combine easily in a VoIP or similar application.
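The heart of such a combination is a simple pump loop: read a packet from the recorder, write it to the player (usually via the network in between). The sketch below shows that loop with the Android classes replaced by plain streams so it runs anywhere; the InputStream stands in for AudioRecord.read() and the OutputStream for AudioTrack.write(), and the 640-byte packet size (20 ms of 16 kHz mono 16-bit PCM) is an assumption for illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Sketch of the record-then-play loop at the heart of a VoIP audio path.
// InputStream simulates AudioRecord.read(); OutputStream simulates
// AudioTrack.write(). A real app would run this on a dedicated thread.
class PcmLoopback {
    static void pump(InputStream source, OutputStream sink) throws IOException {
        byte[] packet = new byte[640];  // 20 ms of 16 kHz mono 16-bit PCM (assumed)
        int n;
        while ((n = source.read(packet)) != -1) {
            sink.write(packet, 0, n);  // in a real app: mAudioTrack.write(...)
        }
    }
}
```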
Readers who have studied multimedia will be familiar with sampling rates and related concepts. If you lack this background, it is worth reviewing it before working through this code.
If you read carefully, you will notice that this article covers only three playback APIs and two recording APIs. What about the fourth playback API? It cannot be explained in a sentence or two and deserves an article of its own: OpenSL ES, which will be covered later in this blog.