Background Introduction
OpenSL ES is a hardware-accelerated audio API optimized for embedded systems; it is royalty-free and cross-platform. It offers a high-performance, standardized, low-latency feature set that provides a standard for embedded media development, and it makes it easier for embedded developers to write native audio applications that deploy across platforms and tap software/hardware audio performance directly, reducing development difficulty and helping the embedded audio market grow.
OpenSL ES framework diagram:
Hardware implementation:
Software implementation:
Audio input in an Android app has latency, and some time passes before the sound is output to the speaker. On most ARM- and x86-based devices the audio round-trip latency (RTL) can reach 300 milliseconds. For apps built around audio, users cannot accept this range: the expected latency must be under 100 milliseconds, and in most cases under 20 milliseconds is the ideal RTL. You also need to account for the total audio processing latency and the buffer queues.
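As a rough illustration of how buffering feeds into that total (a sketch of my own, not from the original post; the helper name is hypothetical), the latency contributed by queued buffers is simply frames per buffer times buffer count divided by the sample rate:

#include <cstdint>
// Estimated latency contributed by queued audio buffers, in milliseconds.
static double queuedLatencyMs(uint32_t framesPerBuffer, uint32_t bufferCount,
                              uint32_t sampleRateHz) {
    return 1000.0 * framesPerBuffer * bufferCount / sampleRateHz;
}
// e.g. 240 frames * 2 buffers at 48000 Hz -> 10 ms of buffering latency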
Like other audio APIs, OpenSL ES works through a callback mechanism. In OpenSL ES, however, the callback is used only to notify the app that a new buffer can be queued (for playback or recording). In other APIs, the callback may also carry pointers to the audio buffers that need to be filled or consumed. In OpenSL ES it is preferable to implement the API so that the callback acts purely as a signaling mechanism, keeping all processing on the audio processing thread; that includes enqueuing the next buffer after the signal is received.
OpenSL ES usage flow:
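To make the "callback as a signal" idea concrete, here is a minimal sketch (the identifiers and thread loop are my own, not from the original): the OpenSL ES callback only posts a semaphore, and all buffer handling stays on a dedicated audio thread.

#include <semaphore.h>
#include <SLES/OpenSLES.h>
#include <SLES/OpenSLES_Android.h>

static sem_t bufferReadySem;  // initialized elsewhere with sem_init

// Runs on the OpenSL ES callback thread: signal only, no processing.
static void bqCallbackAsSignal(SLAndroidSimpleBufferQueueItf bq, void *ctx) {
    sem_post(&bufferReadySem);
}

// Runs on the app's own audio processing thread.
static void audioThreadLoop() {
    for (;;) {
        sem_wait(&bufferReadySem);  // woken by the callback
        // dequeue the finished buffer, process it, and Enqueue the next one here
    }
}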
While researching TV karaoke, one of the plans involved capturing audio data from the microphone. Capturing with the system AudioRecord introduces a noticeable delay; although Google optimized audio after Android 5.0 and the latency improved slightly, the result was still unsatisfactory. So, for a better live monitoring (listen-back) experience, OpenSL ES is the best fit, mainly for the following three reasons.
OpenSL ES uses a buffer queue mechanism, which makes it more efficient within the Android media framework.
If the phone supports the low-latency feature, you have to use OpenSL ES (in Google's words: "low-latency audio is only supported when using Android's implementation of the OpenSL ES API and the Android NDK").
Because it is implemented in native code, it can deliver higher performance: native code is not subject to Java or Dalvik VM overhead.
This approach therefore suits Android audio development well. The following is the initialization flowchart for OpenSL ES:
All operations in OpenSL ES are done through interfaces, similar to Java interfaces, which expose the underlying method calls. The commonly used interfaces are the following (a minimal usage sketch follows the list):
SLObjectItf: object interface
SLEngineItf: engine interface
SLPlayItf: playback interface
SLBufferQueueItf: buffer queue interface
SLVolumeItf: volume interface
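All of these follow the same object/interface pattern: create an object, Realize it, then GetInterface for the capabilities you need. A minimal sketch using the engine as an example:

SLObjectItf engineObj = NULL;
SLEngineItf engineItf = NULL;
slCreateEngine(&engineObj, 0, NULL, 0, NULL, NULL);               // create the object
(*engineObj)->Realize(engineObj, SL_BOOLEAN_FALSE);               // realize it (synchronously)
(*engineObj)->GetInterface(engineObj, SL_IID_ENGINE, &engineItf); // query an interface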
The following sections are divided into four parts: initialization, audio data acquisition, audio data transmission, and audio playback.
Initialization
Initialization mainly consists of two steps: OpenSL ES engine initialization and recorder/player initialization.
OpenSL ES Engine Initialization
The main work of OpenSL ES engine initialization is creating the engine object that connects the JNI layer with the underlying implementation, setting the engine's sampling parameters, including sample rate, frames per buffer, channel count, and sample depth, and initializing the buffer queues for the audio data. Note that the sampling parameters used in this experiment must be set the same as on the server side.
SLresult result;
memset(&engine, 0, sizeof(engine));
// Set the sampling parameters (they must match the server side)
engine.fastPathSampleRate_   = static_cast<SLmilliHertz>(sampleRate) * 1000;
engine.fastPathFramesPerBuf_ = static_cast<uint32_t>(FRAMES_PER_BUF);
engine.sampleChannels_       = AUDIO_SAMPLE_CHANNELS;
engine.bitsPerSample_        = SL_PCMSAMPLEFORMAT_FIXED_16;
// Create the engine object
result = slCreateEngine(&engine.slEngineObj_, 0, NULL, 0, NULL, NULL);
SLASSERT(result);
// Realize it
result = (*engine.slEngineObj_)->Realize(engine.slEngineObj_, SL_BOOLEAN_FALSE);
SLASSERT(result);
// Get the engine interface, which is needed to create the other objects
result = (*engine.slEngineObj_)->GetInterface(engine.slEngineObj_, SL_IID_ENGINE,
                                              &engine.slEngineItf_);
SLASSERT(result);
// Compute the recommended fast audio buffer size. Low latency requires:
//   * buffers as small as possible (adjust this to suit your requirements)
//   * minimal data buffering between the recorder and playback
uint32_t bufSize = engine.fastPathFramesPerBuf_ * engine.sampleChannels_
                   * engine.bitsPerSample_;
bufSize = (bufSize + 7) >> 3;  // bits to bytes
engine.bufCount_ = BUF_COUNT;
engine.bufs_ = allocateSampleBufs(engine.bufCount_, bufSize);
assert(engine.bufs_);
// Create the free buffer queue and the receive buffer queue
engine.freeBufQueue_ = new AudioQueue(engine.bufCount_);
engine.recBufQueue_  = new AudioQueue(engine.bufCount_);
assert(engine.freeBufQueue_ && engine.recBufQueue_);
for (uint32_t i = 0; i < engine.bufCount_; i++) {
    engine.freeBufQueue_->push(&engine.bufs_[i]);
}
The newly created engine object slEngineObj_ cannot be used until it is realized; after that the engine interface can be obtained, and with the engine interface the subsequent playback and recording interfaces can be created. fastPathFramesPerBuf_ is the number of sample frames per buffer, and bufSize is the byte size of all channel samples; since the sample depth is 16 bits, each sample takes 2 bytes. freeBufQueue_ is the free buffer queue, which supplies empty sample arrays; recBufQueue_ is the receive buffer queue, which stores the captured audio data and also serves as the source of the playback data. At the end of engine initialization, freeBufQueue_ is filled with 16 empty arrays of 480 bytes each. This completes the initialization of the audio engine.
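As a worked check of those numbers (assuming a mono 16-bit stream with 240 frames per buffer, which is consistent with the 480-byte figure above):

uint32_t bufSize = 240 /*frames*/ * 1 /*channel*/ * 16 /*bits*/;  // = 3840 bits
bufSize = (bufSize + 7) >> 3;                                     // = 480 bytes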
OpenSL ES Recorder Initialization
Recorder initialization mainly sets the audio source, sets the capture data format, and obtains the sample buffer queue and configuration interfaces, etc. The code is as follows:
sampleInfo_ = *sampleFormat;
SLAndroidDataFormat_PCM_EX format_pcm;
ConvertToSLSampleFormat(&format_pcm, &sampleInfo_);
// Set the audio source: the device's audio input (microphone)
SLDataLocator_IODevice loc_dev = {SL_DATALOCATOR_IODEVICE, SL_IODEVICE_AUDIOINPUT,
                                  SL_DEFAULTDEVICEID_AUDIOINPUT, NULL};
SLDataSource audioSrc = {&loc_dev, NULL};
// Set the audio sink (data pool): an Android simple buffer queue
SLDataLocator_AndroidSimpleBufferQueue loc_bq = {
    SL_DATALOCATOR_ANDROIDSIMPLEBUFFERQUEUE, DEVICE_SHADOW_BUFFER_QUEUE_LEN};
SLDataSink audioSnk = {&loc_bq, &format_pcm};
// Creating the recorder requires the RECORD_AUDIO permission
const SLInterfaceID id[2] = {SL_IID_ANDROIDSIMPLEBUFFERQUEUE, SL_IID_ANDROIDCONFIGURATION};
const SLboolean req[2] = {SL_BOOLEAN_TRUE, SL_BOOLEAN_TRUE};
result = (*slEngine)->CreateAudioRecorder(slEngine, &recObjectItf_, &audioSrc, &audioSnk,
                                          sizeof(id) / sizeof(id[0]), id, req);
// Configure the voice-recognition preset
SLAndroidConfigurationItf inputConfig;
result = (*recObjectItf_)->GetInterface(recObjectItf_, SL_IID_ANDROIDCONFIGURATION,
                                        &inputConfig);
if (SL_RESULT_SUCCESS == result) {
    SLuint32 presetValue = SL_ANDROID_RECORDING_PRESET_VOICE_RECOGNITION;
    (*inputConfig)->SetConfiguration(inputConfig, SL_ANDROID_KEY_RECORDING_PRESET,
                                     &presetValue, sizeof(SLuint32));
}
// Realize the recorder object
result = (*recObjectItf_)->Realize(recObjectItf_, SL_BOOLEAN_FALSE);
// Get the record interface
result = (*recObjectItf_)->GetInterface(recObjectItf_, SL_IID_RECORD, &recItf_);
// Get the recording buffer queue interface
result = (*recObjectItf_)->GetInterface(recObjectItf_, SL_IID_ANDROIDSIMPLEBUFFERQUEUE,
                                        &recBufQueueItf_);
// Register the recording buffer queue callback
result = (*recBufQueueItf_)->RegisterCallback(recBufQueueItf_, bqRecorderCallback, this);
// Initialize the capture shadow (transit) queue
devShadowQueue_ = new AudioQueue(DEVICE_SHADOW_BUFFER_QUEUE_LEN);
The first step is defining the audio source SLDataSource, which contains two members: a data locator and a data format. The data format is usually the common PCM; the data locator specifies where the captured sound is stored, and comes in four kinds: MIDI buffer queue, buffer queue, input/output device, and memory address. As the code shows, we use PCM data and, in order to capture data more efficiently, a buffer queue as the storage location.
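For reference, here is what the PCM format half of an SLDataSource looks like when spelled out with the plain (non-extended) struct; the values are assumptions for a mono 48 kHz 16-bit stream, not taken from this project:

SLDataFormat_PCM fmt = {
    SL_DATAFORMAT_PCM,            // formatType
    1,                            // numChannels (mono, assumed)
    SL_SAMPLINGRATE_48,           // sample rate in milliHertz (48 kHz)
    SL_PCMSAMPLEFORMAT_FIXED_16,  // bitsPerSample
    SL_PCMSAMPLEFORMAT_FIXED_16,  // containerSize
    SL_SPEAKER_FRONT_CENTER,      // channelMask
    SL_BYTEORDER_LITTLEENDIAN     // endianness
};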
Next comes the initialization of the data sink (the audio "data pool"), i.e. the data output: it mainly sets where the recorder should output the audio data and in what format.
After the recorder object recObjectItf_ has been realized, the record interface recItf_ is obtained; starting the recording later requires this interface. recBufQueueItf_ is the recording queue interface, through which the buffer queue callback is registered.
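For completeness, a minimal sketch of how recItf_ is used to start and stop capture later (not shown in the original excerpt; error handling omitted):

(*recItf_)->SetRecordState(recItf_, SL_RECORDSTATE_RECORDING);  // start capturing
// ...
(*recItf_)->SetRecordState(recItf_, SL_RECORDSTATE_STOPPED);    // stop capturing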
OpenSL ES Player Initialization
Player initialization is similar to the recorder: it mainly sets the audio source, sets the data format, and obtains the sample buffer queue and configuration interfaces, etc. The code is as follows:
sampleInfo_ = *sampleFormat;
// Create the output mix, which is used to output the sound data
result = (*slEngine)->CreateOutputMix(slEngine, &outputMixObjectItf_, 0, NULL, NULL);
// Realize the output mix
result = (*outputMixObjectItf_)->Realize(outputMixObjectItf_, SL_BOOLEAN_FALSE);
// Configure the audio source: an Android simple buffer queue with PCM data
SLDataLocator_AndroidSimpleBufferQueue loc_bufq = {
    SL_DATALOCATOR_ANDROIDSIMPLEBUFFERQUEUE, DEVICE_SHADOW_BUFFER_QUEUE_LEN};
SLAndroidDataFormat_PCM_EX format_pcm;
ConvertToSLSampleFormat(&format_pcm, &sampleInfo_);
SLDataSource audioSrc = {&loc_bufq, &format_pcm};
// Configure the audio sink (output pool): the output mix
SLDataLocator_OutputMix loc_outmix = {SL_DATALOCATOR_OUTPUTMIX, outputMixObjectItf_};
SLDataSink audioSnk = {&loc_outmix, NULL};
/* Create the player */
SLInterfaceID ids[2] = {SL_IID_BUFFERQUEUE, SL_IID_VOLUME};
SLboolean req[2] = {SL_BOOLEAN_TRUE, SL_BOOLEAN_TRUE};
result = (*slEngine)->CreateAudioPlayer(slEngine, &playerObjectItf_, &audioSrc, &audioSnk,
                                        sizeof(ids) / sizeof(ids[0]), ids, req);
// Realize the player object
result = (*playerObjectItf_)->Realize(playerObjectItf_, SL_BOOLEAN_FALSE);
SLASSERT(result);
// Get the play interface
result = (*playerObjectItf_)->GetInterface(playerObjectItf_, SL_IID_PLAY, &playItf_);
// Get the volume interface
result = (*playerObjectItf_)->GetInterface(playerObjectItf_, SL_IID_VOLUME, &volumeItf_);
// Get the buffer queue interface
result = (*playerObjectItf_)->GetInterface(playerObjectItf_, SL_IID_BUFFERQUEUE,
                                           &playBufferQueueItf_);
// Register the buffer queue callback
result = (*playBufferQueueItf_)->RegisterCallback(playBufferQueueItf_, bqPlayerCallback, this);
Compared with recorder initialization, the player has one extra output mix initialization. The output mix is mainly used to output data to the loudspeaker, so it can be regarded as initializing the output mixing object interface.
The playBufferQueueItf_ obtained at the end is the playback buffer queue interface; it can be seen as the counterpart of recBufQueueItf_ in the recorder as a data source. In fact, the data in the recording buffer queue is delivered over the socket to playBufferQueueItf_ for the player to play.
Audio Data Acquisition
The main flow of audio data acquisition is: initialize the buffer queue, configure the recording startup, and finally start recording. The flowchart is as follows:
The startup count is set to 2: before recording starts, two recording arrays are enqueued into the recorder's memory space, and once recording starts data is captured into these two arrays. When a recording array is filled, the callback registered in the recorder above is triggered; the callback takes out the recorded sound data and sends it out through the socket:
sample_buf *dataBuf = NULL;                // the filled capture buffer
devShadowQueue_->front(&dataBuf);          // take out the captured array
devShadowQueue_->pop();                    // remove it from the queue head
dataBuf->size_ = dataBuf->cap_;            // the callback fires only when the buffer is full,
                                           // so size can be set to the maximum length
sendUdpMessage(dataBuf);                   // send the captured data over UDP
sample_buf *freeBuf;
while (freeQueue_->front(&freeBuf) && devShadowQueue_->push(freeBuf)) {
    freeQueue_->pop();                     // remove the free array now in use
    SLresult result = (*bq)->Enqueue(bq, freeBuf->buf_, freeBuf->cap_);  // queue the next capture
    sample_buf *vienBuf = allocateOneSampleBufs(getBufSize());
    freeQueue_->push(vienBuf);             // add a new free array
}
The above is the callback code: first the captured audio data is taken out of devShadowQueue_ and sent, then the next capture is queued. The while loop is used to push as many arrays as possible into the capture buffer, guaranteeing the continuity of the free arrays (used to store the data collected by the microphone).
Audio Data Transfer
The transfer is divided into sending and receiving. Sending is relatively simple: the network connection is already established at this point, so directly calling send is enough.
void sendUdpMessage(sample_buf *dataBuf) {
    sendto(client_socket_fd, dataBuf->buf_, dataBuf->size_, 0,
           (struct sockaddr *)&server_addr, sizeof(server_addr));
}
The receiving side mainly puts the received data into the playback buffer. It is best to pre-fill a certain amount of sound data into the playback buffer before starting playback, to avoid the situation where playback cannot get any data.
sample_buf *vien_buf = sampleBufs(buf_size);
if (recvfrom(server_socket_fd, vien_buf->buf_, buf_size, 0,
             (struct sockaddr *)&client_addr, &client_addr_length) == -1) {
    exit(1);
}
if (getAudioPlayer() != NULL) {  // note: the original's "= NULL" was a typo
    getRecBufQueue()->push(vien_buf);
    if (count_buf++ == 3) {
        getAudioPlayer()->playAudioBuffers(PLAY_KICKSTART_BUFFER_COUNT);
    }
}
getRecBufQueue() here returns the playback buffer queue; after three arrays have been deposited, the player is notified to start playing.
Audio Data Playback
Playback starts after the needed buffered data has been received; the playAudioBuffers method called above is what turns playback on:
sample_buf *buf = NULL;
if (!playQueue_->front(&buf)) {
    // play queue empty: report the situation through the engine callback
    uint32_t totalBufCount;
    callback_(ctx_, ENGINE_SERVICE_MSG_RETRIEVE_DUMP_BUFS, &totalBufCount);
    break;
}
if (!devShadowQueue_->push(buf)) {
    break;  // the player buffer queue is full
}
(*playBufferQueueItf_)->Enqueue(playBufferQueueItf_, buf->buf_, buf->size_);
playQueue_->pop();  // remove the array that has been handed to the player
playQueue_ is the play queue; if it is empty there is no buffered data, and the callback is invoked so the caller can do error handling there. If a buffer is taken out successfully, it is pushed into the relay queue, enqueued into the player's buffer queue to be played, and finally removed from the play queue. After a buffer finishes playing, execution enters the callback registered on the player's buffer queue:
sample_buf *buf;
if (!devShadowQueue_->front(&buf)) {
    if (callback_) {
        uint32_t count;
        callback_(ctx_, ENGINE_SERVICE_MSG_RETRIEVE_DUMP_BUFS, &count);
    }
    return;
}
devShadowQueue_->pop();
buf->size_ = 0;
if (playQueue_->front(&buf) && devShadowQueue_->push(buf)) {
    (*bq)->Enqueue(bq, buf->buf_, buf->size_);
    playQueue_->pop();
} else {
    // no data arrived in time: enqueue an empty buffer so playback keeps running
    sample_buf *buf_temp = new sample_buf;
    buf_temp->buf_ = new uint8_t[buf_size];
    memset(buf_temp->buf_, 0, buf_size);  // zero it so the gap plays as silence
    buf_temp->size_ = buf_size;
    buf_temp->cap_ = buf_size;
    (*bq)->Enqueue(bq, buf_temp->buf_, buf_size);
    devShadowQueue_->push(buf_temp);
}
In the callback, the playback data is first taken out of the relay queue devShadowQueue_: if there is one, it is popped normally, then the next play array is taken from the play queue playQueue_ and pushed into the relay queue. devShadowQueue_ serves two purposes: it guarantees the continuity of playback, and it acts as a staging point for the playback data. If, because of network latency, no playback data has been received and the play queue yields nothing, an empty array is enqueued instead; in practice you will notice the sound lags for a while. This logic will be optimized later; effectively controlling the audio lag would greatly improve the user experience.
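One possible direction for that optimization (my illustration, not the author's code, and it assumes the AudioQueue exposes a size() accessor): bound the audible lag by discarding the oldest queued buffers whenever the play queue backs up past a threshold, trading an occasional small glitch for consistently lower latency.

const uint32_t kMaxQueuedBufs = 4;      // assumed threshold
while (playQueue_->size() > kMaxQueuedBufs) {
    sample_buf *stale = NULL;
    if (!playQueue_->front(&stale)) break;
    playQueue_->pop();                  // drop stale audio to catch up
    freeQueue_->push(stale);            // recycle the buffer for reuse
}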
This post has mainly introduced the JNI-layer flow of sound capture, transmission, and playback. If you have better suggestions for optimization, advice is welcome.