"Recorder.js + Baidu Speech Recognition": Technical Details of a Full-Stack Solution

Source: Internet
Author: User

The project required speech recognition on the web using Baidu's speech API. While implementing this solution I ran into many problems, and I found that most articles online only walk through the official sample code, offering little advice that is useful in real development; moreover, recorder.js output is not directly compatible with the Baidu AI speech API. This article records the details of the development process; feedback and discussion are welcome.

I. Technology Stack Selection

Requirement: implement speech recognition on the web using the Baidu speech API.

Technology stack: React + recorder-tool.js + recorder.js + Express + Baidu speech recognition API

recorder.js project address: https://github.com/mattdiamond/Recorderjs

Demo effect: (screenshot not reproduced in this text version)

II. Front-End Development Details

Provide a proxy object for recorder.js

The front end uses React as its main framework; the basic structure and syntax pose no particular problems. To use recorder.js, we wrap it with a recorder-tool.js module that acts as its proxy. The implementation is simple: take the script portion of the HTML page in the official example, package it into a singleton object that serves as recorder.js's agent, and expose a set of APIs for the upper layers to call. The approximate structure is as follows:

import Recorder from './recorder-src';

//Singleton
var recorder;

//start record
function startRecord() {
    recorder && recorder.record();
}

//stop record
function stopRecord(button) {
    recorder && recorder.stop();
}

//...other methods

export default {
    init: init,
    start: startRecord,
    stop: stopRecord,
    exportData: exportData,
    sendRequest: sendRequest,
    clear: clearRecord,
    createDownloadLink: createDownloadLink
}
Relieving the callback hell of the exportWAV method

In the official example, encoding the recording into wav format is done in a web worker, and the start and end of the binary processing are signalled by events. recorder.exportWAV() takes a callback function as its parameter and invokes it once the wav-format data is ready. To implement this in React, you would need to write:

//record-page.js
...
//handle recording - event listener
processRecord() {
    RecorderTools.exportData(function(blob) {
        var wav = preProcessData(blob);
        //send the request
        axios.post({...})
             .then(function(response) {
                 handle(response);
             });
    });
}
...

You have probably recognized the "callback hell" phenomenon here: deep nesting makes the logic convoluted and the code tightly coupled, and it becomes very difficult to pull methods out into React. We would rather transfer control of the code in some other way than stuff all the subsequent logic into the exportData() method.

    • Method One: Use HTML custom events

We add a listener for a custom event on an existing DOM element. In the callback passed to recorder.exportWAV(), we manually create a custom event (setting compatibility concerns aside), attach the data exported by recorder.js to the event object, and then dispatch the event on the chosen element:

//export data
function exportData() {
    recorder && recorder.exportWAV(function (blob) {
        //init event
        var exportDone = document.createEvent('HTMLEvents');
        exportDone.initEvent('recorder.export', true, true);
        //add payload
        exportDone.data = blob;
        //dispatch
        document.getElementById('panel').dispatchEvent(exportDone);
    });
}
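For completeness, the consuming side of this pattern can be sketched as follows. This snippet is not from the original article: in the browser the target would be document.getElementById('panel'), and a bare EventTarget stands in here so the pattern also runs outside a browser; the 'recorder.export' name and the .data payload match the dispatch code above.

```javascript
// Listening side of Method One (sketch). In a browser, `panel` would be
// the #panel element that the dispatch code targets; a plain EventTarget
// is used as a stand-in elsewhere.
var panel = (typeof document !== 'undefined' && document.getElementById('panel'))
    || new EventTarget();

var received = [];
panel.addEventListener('recorder.export', function (event) {
    // event.data carries the exported WAV Blob; hand it to React logic here
    received.push(event.data);
});
```

A React component can register this listener in a lifecycle method and keep all subsequent business logic in its own file.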

This lets the subsequent processing logic live in the React component, written in the usual style, achieving a basic separation of duties and decoupling of the code.

    • Method Two: Listen to the web worker

recorder.js uses the DOM0-level event model (an onmessage assignment) to communicate with its web worker. To avoid overwriting the original handler, we can use the DOM2 event model to bind an additional listener from outside the Recorder instance:

recorder.worker.addEventListener('message', function(event) {
    //event.data contains the converted WAV data
    processData(event.data);
    ...
});

This lets us observe the transcoding step from our own logic code or from a secondary wrapper.

    • Method Three: Promise

Using a Promise to implement the asynchronous call strips the audio-processing code out; the final calling style is:

RecorderTools.exportData().then(function(data) {
    //continue writing other logic or calling methods in the React component file
});

The reference code is as follows:

//method definition in RecorderTools.js
function exportData() {
    return new Promise(function(resolve, reject) {
        //note: if recorder has not been initialized, this promise never settles
        recorder && recorder.exportWAV(function(blob) {
            resolve(blob);
        });
    });
}

Callbacks, event listening, and Promises are all important asynchronous patterns in JavaScript; choose among them according to personal preference and the actual scenario.

How to submit a Blob object

As the official recorder.js example shows, if you do not export the recording to a local wav file, what you get is a Blob object, and that Blob needs to be submitted via a form. For example (sending the HTTP request with axios):

var formData = new FormData();
formData.set('recorder.wav', blob); //blob is the data to send

axios({
    url: 'http://localhost:8927/transmit',
    method: 'POST',
    headers: {
        //this can also be set to false so the browser fills in the multipart boundary itself
        'Content-Type': 'multipart/form-data'
    },
    data: formData
});
III. Extending recorder.js

The audio file accepted by the Baidu AI speech recognition API must meet the following requirements:

    • Binary data of a pcm- or wav-format file, base64-encoded
    • 16000 Hz sample rate
    • 16-bit depth
    • Single channel (mono)

To make recorder.js meet the requirements above, its source needs some functional extension. The base64 encoding can be done on the server side, and judging by its name, the floatTo16BitPCM() method in recorder.js already satisfies the 16-bit depth condition, so we only need to handle the mono and 16000 Hz sample-rate conditions.
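For reference, that conversion in the recorder.js source is essentially the following (reproduced from memory; check the repository for the authoritative version). It clamps each float sample to [-1, 1] and scales it into a little-endian signed 16-bit integer:

```javascript
// Clamp each float sample to [-1, 1] and write it as a little-endian
// signed 16-bit integer into the DataView `output`, starting at `offset`.
function floatTo16BitPCM(output, offset, input) {
    for (var i = 0; i < input.length; i++, offset += 2) {
        var s = Math.max(-1, Math.min(1, input[i]));
        output.setInt16(offset, s < 0 ? s * 0x8000 : s * 0x7FFF, true);
    }
}
```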

The Recorder constructor in the source accepts a parameter object that is merged into the instance's config property; config.numChannels is the number of channels, so we only need to pass a custom channel count at instantiation:

new Recorder({
    numChannels: 1 //mono (single channel)
})

Now consider the 16000 Hz sample-rate condition. Reading the source, you can see that sampleRate is always taken from the audio stream's context, i.e. it corresponds to the computer sound card's sample rate (typically 48000 Hz or 44100 Hz). So how do we get data at a 16000 Hz sample rate? Take a 48000 Hz sound card as an example: it captures 48,000 signal points per second. To turn those 48,000 points into 16,000, the simplest approach is to keep one point out of every three and assemble the kept points into the new data. In other words, since the capture device's sample rate is fixed, whatever sample rate we need can be obtained by computing a scale factor and discarding the intermediate data points (averaging them is also possible). After wrapping, the calling style is:

new Recorder({    numChannels:1,    sampleRate:16000});
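The scale factor described above can be computed as a simple integer ratio. This is a sketch (the function name is mine); contextRate would come from the audio context's sampleRate:

```javascript
// How many raw samples to advance for each sample kept.
// contextRate: the sound card rate reported by the AudioContext (e.g. 48000);
// targetRate: the rate the Baidu API expects (16000).
function computeSampleStep(contextRate, targetRate) {
    return Math.floor(contextRate / targetRate);
}
```

For a 48000 Hz card this gives 3 (keep every third sample). Note that 44100 Hz does not divide evenly by 16000, so simple decimation only approximates the target rate in that case.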

Then some extension work is needed in the source itself; the key part is the following:

//part of the recorder.js source
function exportWAV(type) {
    var buffers = [];
    for (var channel = 0; channel < numChannels; channel++) {
        buffers.push(mergeBuffers(recBuffers[channel], recLength));
    }
    var interleaved = undefined;
    if (numChannels === 2) {
        interleaved = interleave(buffers[0], buffers[1]);
    } else {
        interleaved = buffers[0];
        //This is the key point: the mono case gets no processing at all.
        //Following the stereo branch's pattern, add a resampling function here,
        //i.e. change this line to interleaved = extractSingleChannel(buffers[0]);
    }
    var dataview = encodeWAV(interleaved);
    var audioBlob = new Blob([dataview], { type: type });
    self.postMessage({ command: 'exportWAV', data: audioBlob });
}

The implementation of extractSingleChannel() follows the pattern of the interleave() method:

/**
 * sampleStep is the integer result of the system's context.sampleRate
 * divided by the custom sampleRate; this method resamples the mono
 * channel data.
 */
function extractSingleChannel(input) {
    //Without shortening by this ratio, the output file would contain
    //sampleStep times its length in silent padding.
    var length = Math.ceil(input.length / sampleStep);
    var result = new Float32Array(length);
    var index = 0,
        inputIndex = 0;
    while (index < length) {
        //The key step: take one point from the input every sampleStep
        //samples and put it into result.
        result[index++] = input[inputIndex];
        inputIndex += sampleStep;
    }
    return result;
}

With these changes, the data in the Blob output by exportWAV() satisfies the requirements of Baidu speech recognition.

IV. Server-Side Development Details

On the server side we use the Express framework to deploy a relay service. The knowledge involved here is fairly limited: you can use the Baidu AI Node.js SDK, or roll your own encapsulation; the authorization flow is largely generic and can be followed from the official documentation.
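If you choose self-encapsulation rather than the SDK, the request body for Baidu's REST speech endpoint (https://vop.baidu.com/server_api) can be assembled as below. This is a sketch: the field names follow Baidu's ASR REST documentation as I recall it, the token comes from a separate OAuth call, and the cuid value is an arbitrary placeholder device identifier.

```javascript
// Build the JSON body for Baidu's REST speech-recognition endpoint.
// wavBuffer: the raw wav file bytes received from the front end;
// token: fetched beforehand from Baidu's OAuth token endpoint.
function buildBaiduAsrBody(wavBuffer, token) {
    return {
        format: 'wav',
        rate: 16000,        // must match the resampled rate from section III
        channel: 1,         // mono, as required
        cuid: 'demo-device-id',
        token: token,
        speech: wavBuffer.toString('base64'),
        len: wavBuffer.length  // length in bytes BEFORE base64 encoding
    };
}
```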

Forms submitted as multipart/form-data cannot be read directly from req.body or req.params; they are handled with the officially recommended Multer middleware, which is relatively straightforward.

One thing to note: the objects Multer produces differ depending on whether options are passed at instantiation; in the relevant scenarios you can print them to the console to make sure you are reading the correct properties.
