This project needed to use the Baidu voice API on the web to implement speech recognition. While implementing this solution I ran into many problems, and found that most online articles only walk through the official sample code in detail without offering any advice of value for real development; furthermore, recorder.js is not directly compatible with the Baidu AI voice API. This article therefore records the development details, and discussion is welcome.
I. Technology stack selection
Requirement: implement speech recognition on the web using the Baidu voice API.

Technology stack: React + recorder-tool.js + recorder.js + Express + Baidu speech recognition API

recorder.js project address: https://github.com/mattdiamond/Recorderjs
Demo effect:
II. Front-end development details

Provide a proxy object for recorder.js

The front end is built with React, and the basic structure and syntax pose no real problems. To use recorder.js, we encapsulate a recorder-tool.js module to act as a proxy. Its implementation is simple: take the script portion of the html file in the official example and package it into a singleton object that serves as a proxy for recorder.js, then expose a set of APIs for the upper layers to call. The approximate structure is as follows:
```javascript
import Recorder from './recorder-src';

//Singleton
var recorder;

//start record
function startRecord() {
    recorder && recorder.record();
}

//stop record
function stopRecord(button) {
    recorder && recorder.stop();
}

//...other methods

export default {
    init: init,
    start: startRecord,
    stop: stopRecord,
    exportData: exportData,
    sendRequest: sendRequest,
    clear: clearRecord,
    createDownloadLink: createDownloadLink
}
```
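The `recorder && ...` guard is what makes the singleton safe to call before initialization. Below is a minimal, self-contained sketch of the same pattern; `FakeRecorder` is a hypothetical stand-in for the real `Recorder` class from recorder.js, used only to make the sketch runnable without a browser.

```javascript
// FakeRecorder is a hypothetical stand-in for recorder.js's Recorder class.
function FakeRecorder() {
    this.recording = false;
}
FakeRecorder.prototype.record = function () { this.recording = true; };
FakeRecorder.prototype.stop = function () { this.recording = false; };

var recorder; // module-level singleton shared by every exported method

function init() {
    // Create the instance once; repeated init() calls return the same object.
    recorder = recorder || new FakeRecorder();
    return recorder;
}

// Each API guards on `recorder`, so calls made before init() are silently ignored.
function startRecord() { recorder && recorder.record(); }
function stopRecord() { recorder && recorder.stop(); }

var RecorderTools = { init: init, start: startRecord, stop: stopRecord };
```

Calling `RecorderTools.start()` before `init()` is a no-op, which is exactly the behavior the guards in the real module provide.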
Relieve the exportWAV method of callback hell

The official example outputs data encoded in wav format. This work is done inside a web worker, and the start and end of the binary processing are driven by events. recorder.exportWAV() takes a callback function as a parameter and invokes it once the wav-format data is ready. A direct translation into React would look like this:
```javascript
//record-page.js
...
//handle recording - event listener
processRecord() {
    RecorderTools.exportData(function (blob) {
        var wav = preProcessData(blob);
        //send request
        axios.post({...})
            .then(function (response) {
                handle(response);
            });
    });
}
...
```
You may have noticed the "callback hell" pattern here: the deep nesting complicates the logic and couples the code tightly, and it becomes very difficult to keep React-related methods out of the recording module. We would rather transfer control of the code in some other way than pile all of the subsequent logic into the exportData() method.
- Method One: Use HTML custom events
We add a listener for a custom event on an existing DOM element, and in the callback passed to recorder.exportWAV() we manually create a custom event (ignoring compatibility issues for now), attach the data exported by recorder.js to the event object, and then dispatch the event on the chosen element:
```javascript
//export data
function exportData() {
    recorder && recorder.exportWAV(function (blob) {
        //init event
        var exportDone = document.createEvent('HTMLEvents');
        exportDone.initEvent('recorder.export', true, true);
        //add payload
        exportDone.data = blob;
        //dispatch
        document.getElementById('panel').dispatchEvent(exportDone);
    });
}
```
This allows the subsequent processing logic to be written in the React component in the usual way, achieving a basic separation of duties and separation of code.
- Method Two: Listen to the web worker

recorder.js uses the DOM0-level event model (worker.onmessage = ...) to communicate with its web worker. To avoid overwriting the original handler, we can use the DOM2 event model to bind an extra listener from outside the Recorder instance:
```javascript
recorder.worker.addEventListener('message', function (event) {
    //event.data contains the converted WAV data
    processData(event.data);
    ...
});
```
This lets us monitor the transcoding action from our own logic code or from secondary wrapper code.
- Method Three: Use a Promise

Using a Promise to implement the asynchronous call, the audio-processing code is stripped out, and the final calling style is:
```javascript
RecorderTools.exportData().then(function (data) {
    //continue writing other logic or calling methods in the React component file
});
```
The reference code is as follows:
```javascript
//method definition in RecorderTools.js
function exportData() {
    return new Promise(function (resolve, reject) {
        recorder && recorder.exportWAV(function (blob) {
            resolve(blob);
        });
    });
}
```
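To exercise the wrapper end to end without a browser, here is the same Promise pattern run against a hypothetical stand-in recorder whose exportWAV simply invokes its callback asynchronously, mimicking the worker round-trip:

```javascript
// Hypothetical stand-in for the recorder.js instance: exportWAV calls back
// asynchronously with a fake blob, mimicking the web-worker round-trip.
var recorder = {
    exportWAV: function (callback) {
        setTimeout(function () { callback('fake-wav-blob'); }, 0);
    }
};

// Same Promise wrapper as in RecorderTools.js.
function exportData() {
    return new Promise(function (resolve, reject) {
        recorder && recorder.exportWAV(function (blob) {
            resolve(blob);
        });
    });
}

exportData().then(function (data) {
    // continue with component logic here; data is the exported blob
    console.log(data);
});
```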
Callbacks, event listening, and Promises are all important asynchronous patterns in javascript; choose among them according to personal preference and the actual scenario.
How to submit a Blob object

From the official recorder.js example you can see that if you do not export the recording to a local wav-format file, what you get is a Blob object. The Blob object needs to be submitted as form data, for example as follows (using axios to send the http request):
```javascript
var formData = new FormData();
formData.set('recorder.wav', blob); //blob is the data to send
axios({
    url: 'http://localhost:8927/transmit',
    method: 'POST',
    headers: {
        'Content-Type': 'multipart/form-data' //can also be set to false here
    },
    data: formData
});
```
III. Extending recorder.js
The voice file received by the Baidu AI speech recognition interface must meet the following requirements:

- binary data of a pcm- or wav-format file, base64-encoded
- 16000Hz sample rate
- 16bit bit depth
- single channel (mono)
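A quick way to verify an exported file against this checklist: a canonical PCM WAV file (such as the one recorder.js's encodeWAV writes) stores the channel count, sample rate, and bit depth at fixed offsets in its 44-byte header. The following is a hypothetical checking helper, not part of recorder.js:

```javascript
// Hypothetical helper: validate a WAV file's header against Baidu's requirements.
// Assumes a canonical 44-byte PCM WAV header: NumChannels at byte 22 (uint16),
// SampleRate at byte 24 (uint32), BitsPerSample at byte 34 (uint16), all little-endian.
function checkWavHeader(arrayBuffer) {
    var view = new DataView(arrayBuffer);
    return {
        mono:       view.getUint16(22, true) === 1,
        rate16k:    view.getUint32(24, true) === 16000,
        depth16bit: view.getUint16(34, true) === 16
    };
}
```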
To make recorder.js meet these requirements, its source code needs some functional extension. The base64 encoding conversion can be done on the server side, and the floatTo16BitPCM() method in recorder.js, as its name suggests, already satisfies the 16bit bit-depth condition; so we only need to deal with the two remaining conditions, mono output and the 16000Hz sample rate.
The Recorder constructor in the source code accepts a parameter object, which is merged into the instance's config property. Its numChannels field is the number of channels, so for mono we just pass a custom channel count when instantiating:
```javascript
new Recorder({
    numChannels: 1 //mono
});
```
Now for the 16000Hz sample-rate condition. Reading the source, every use of sampleRate takes it from the audio context, i.e. from the sample rate of the computer's sound card (48000Hz or 44100Hz). So how do we get data at a 16000Hz sample rate? Take a sound card sampling at 48000Hz: it captures 48,000 signal points per second. To turn those 48,000 points into 16,000, the simplest approach is to keep one point out of every 3 and assemble the kept points into the new data. In other words, the sample rate delivered by the capture device is fixed; whatever sample rate we need, we just compute a scale factor to convert by and discard the intermediate data points (averaging them is also an option). After the extension, the calling method is:
```javascript
new Recorder({
    numChannels: 1,
    sampleRate: 16000
});
```
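The decimation described above can be tried in isolation. The sketch below mirrors the idea that the extended source code implements; the function name and the integer-division step are assumptions made for illustration:

```javascript
// Standalone sketch of the decimation: keep one sample out of every sampleStep,
// where sampleStep = floor(deviceRate / targetRate). E.g. 48000 -> 16000 gives step 3.
function decimate(input, deviceRate, targetRate) {
    var sampleStep = Math.floor(deviceRate / targetRate);
    // Shorten the output proportionally, otherwise the file would be padded with silence.
    var length = Math.ceil(input.length / sampleStep);
    var result = new Float32Array(length);
    var index = 0, inputIndex = 0;
    while (index < length) {
        result[index++] = input[inputIndex];
        inputIndex += sampleStep;
    }
    return result;
}
```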
The source code then needs some functional extension; the key part is as follows:
```javascript
//part of the recorder.js source
function exportWAV(type) {
    var buffers = [];
    for (var channel = 0; channel < numChannels; channel++) {
        buffers.push(mergeBuffers(recBuffers[channel], recLength));
    }
    var interleaved = undefined;
    if (numChannels === 2) {
        interleaved = interleave(buffers[0], buffers[1]);
    } else {
        interleaved = buffers[0];
        //This is the key point: the mono case receives no processing at all. Following
        //the stereo branch's pattern, add a resampling function here, i.e. change this
        //line to: interleaved = extractSingleChannel(buffers[0]);
    }
    var dataview = encodeWAV(interleaved);
    var audioBlob = new Blob([dataview], { type: type });
    self.postMessage({ command: 'exportWAV', data: audioBlob });
}
```
The concrete implementation of extractSingleChannel() references the interleave() method:
```javascript
/**
 * sampleStep is the integer result of the system's context.sampleRate divided by
 * the custom sampleRate; this method resamples the mono channel data.
 */
function extractSingleChannel(input) {
    //If the length is not shortened proportionally here, the output file will
    //contain sampleStep times as much silent recording.
    var length = Math.ceil(input.length / sampleStep);
    var result = new Float32Array(length);
    var index = 0,
        inputIndex = 0;
    while (index < length) {
        //This is the key step: take one point from the input every sampleStep
        //samples and put it into result.
        result[index++] = input[inputIndex];
        inputIndex += sampleStep;
    }
    return result;
}
```
With this change, the data stored in the Blob object output by exportWAV() satisfies the requirements of Baidu speech recognition.
IV. Server-side development details

On the server side we use the Express framework to deploy a relay service. Relatively little specialist knowledge is involved: you can use Baidu AI's nodejs-sdk, or write your own wrapper; the authorization and verification logic is fairly generic and can be implemented by following the official documentation.
Forms submitted as multipart/form-data cannot be parsed directly from req.body or req.params; they are handled with the officially recommended middleware, Multer, which is relatively straightforward to use.
One thing to note: when Multer is instantiated, the object it produces differs depending on whether or not parameters are passed; in such scenarios you can print the object to the console to make sure you are reading the correct properties.
"recorder.js + Baidu speech recognition" full-stack solution: technical details