Baidu Speech Recognition Service: Speech Recognition REST API Development Notes

Source: Internet
Author: User

I used the Baidu speech recognition service in a previous project, so I am writing some notes here. I still want to stress that the best learning material is the official website. These are only my notes: partly to organize my thoughts, and partly so that I can recall things quickly when I need them again.

What is the Baidu speech recognition service?

The Baidu speech recognition service converts an audio file (in one of the supported formats, not any format) into text. We have all come across speech recognition before: the input methods on our phones, for example, offer voice input.

What is the REST API for Baidu speech recognition?

According to the official website:

The industry's first speech recognition REST API. It works over HTTP requests and can be used for speech recognition on any platform, giving you the greatest degree of freedom!

Simply put, developers do not need to write recognition code in their own project or bring in a jar package. With the REST API, you convert the audio file into a specific format and send it to the Baidu speech recognition server in an HTTP request; the server performs the recognition and returns the recognized text.

In my opinion it is very convenient to call: we do not have to maintain the speech recognition code ourselves, integration is simple, and best of all, it is free!

Using it is simple:
1. Use the API Key and Secret Key provided by the Baidu speech recognition website to get an access token.
2. Send the access token from the previous step, together with the other request parameters, to the Baidu speech recognition gateway to obtain the recognized text.

Does this feel a bit like public platform development? It is. Public platform development also comes down to getting a token and then requesting other data with that token.

Integration steps

These integration steps follow the "baidu_voice_rest_api_manual" document. I recommend downloading it and reading it first.

Step 1: Register as a Baidu developer, create an application, and get the API Key and Secret Key

This step is very simple. The official website gives instructions, so there is not much to add here.

Step 2: Enable the speech recognition service

Enabling the speech recognition service is also very simple; you can work it out yourself or refer to the official documentation. Note that once the service has been enabled successfully for the first time, it comes with a quota of 50,000 online calls per day.

If you need more than 50,000 calls a day, you can apply to Baidu for a higher quota, which is said to be free. Well done, Baidu.

The following steps are the critical ones, because now we start writing code.

Step 3: Get the access token

In short, send a request to the gateway of the Baidu OAuth 2.0 authorization service and parse the access token we want out of the returned data (usually a string). The official website describes this step in great detail.

Description: using the API Key, the Secret Key, and one fixed-value parameter, send a POST request to the Baidu OAuth 2.0 authorization service gateway. If the request succeeds, parse the returned string and extract the access token for later use.

To keep the explanation simple, the code below is not well structured. It is only a test method and should not be applied directly to a production environment.

This example uses the HttpClient framework to send the POST request. The Gradle dependency for HttpClient is:

compile "org.apache.httpcomponents:httpclient:4.5.2"

Example code:

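import java.io.IOException;
import java.io.UnsupportedEncodingException;
import java.util.ArrayList;
import java.util.List;
import org.apache.http.HttpEntity;
import org.apache.http.HttpResponse;
import org.apache.http.NameValuePair;
import org.apache.http.ParseException;
import org.apache.http.client.ClientProtocolException;
import org.apache.http.client.ResponseHandler;
import org.apache.http.client.entity.UrlEncodedFormEntity;
import org.apache.http.client.methods.HttpPost;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.message.BasicNameValuePair;
import org.apache.http.util.EntityUtils;
import org.junit.Test;

/**
 * Get the token; the POST method is recommended.
 */
@Test
public void test01() {
    try {
        CloseableHttpClient httpClient = HttpClients.createDefault();
        // OAuth 2.0 token endpoint; see the official documentation for the full URL
        HttpPost httpPost = new HttpPost("https://");
        // Form parameters: the fixed grant type plus the API Key and Secret Key
        List<NameValuePair> nvps = new ArrayList<>();
        nvps.add(new BasicNameValuePair("grant_type", "client_credentials"));
        nvps.add(new BasicNameValuePair("client_id", apiKey));
        nvps.add(new BasicNameValuePair("client_secret", secretKey));
        httpPost.setEntity(new UrlEncodedFormEntity(nvps));
        // Return the body as a string for 2xx responses, fail otherwise
        ResponseHandler<String> responseHandler = new ResponseHandler<String>() {
            @Override
            public String handleResponse(HttpResponse response) throws ClientProtocolException, IOException {
                int status = response.getStatusLine().getStatusCode();
                if (status >= 200 && status < 300) {
                    HttpEntity entity = response.getEntity();
                    try {
                        return entity != null ? EntityUtils.toString(entity) : null;
                    } catch (ParseException ex) {
                        throw new ClientProtocolException(ex);
                    }
                } else {
                    throw new ClientProtocolException("Unexpected response status: " + status);
                }
            }
        };
        String responseBody = httpClient.execute(httpPost, responseHandler);
        System.out.println(responseBody);
    } catch (UnsupportedEncodingException e) {
        e.printStackTrace();
    } catch (ClientProtocolException e) {
        e.printStackTrace();
    } catch (IOException e) {
        e.printStackTrace();
    }
}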

Return data (formatted):

{
  "access_token": "24.463f2a9f7ce6721fe4d15568f812c086.2592000.1469627568.282335-7038695",
  "session_key": "9mzddxlm148ma1qmcnnrxgflybu9voknbuy\/8wsj1r4rusev1bjp9gtkp6l6svdnjx4bzxe5zpjoqzta2k7o0mm9l0z4",
  "scope": "public audio_voice_assistant_get wise_adapt lebo_resource_base lightservice_public hetu_basic lightcms_map_poi kaidian_kaidian ",
  "refresh_token": "25.f77abdb8f638404747dd969615c7b557.315360000.1782395568.282335-7038695",
  "session_secret": "3efb3872a362beacab28879eed85497b",
  "expires_in": 2592000
}

We need to parse out the access_token. There are many JSON parsing frameworks, such as Fastjson, Jackson, Json-lib, and Gson, so I will not go into them here.
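As a dependency-free illustration of the parsing step, the sketch below pulls access_token and expires_in out of the response text with regular expressions. The class name TokenResponseParser and its methods are made up for these notes. In a real project one of the JSON libraries listed above is the better choice, since a regular expression does not cope with escaped quotes or nested objects.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of any Baidu SDK.
final class TokenResponseParser {
    // Matches "access_token":"<value>" and captures the value.
    private static final Pattern ACCESS_TOKEN =
            Pattern.compile("\"access_token\"\\s*:\\s*\"([^\"]*)\"");
    // Matches "expires_in":<digits> and captures the number of seconds.
    private static final Pattern EXPIRES_IN =
            Pattern.compile("\"expires_in\"\\s*:\\s*(\\d+)");

    /** Returns the access token, or null if the field is absent. */
    static String accessToken(String json) {
        Matcher m = ACCESS_TOKEN.matcher(json);
        return m.find() ? m.group(1) : null;
    }

    /** Returns the lifetime in seconds, or -1 if the field is absent. */
    static long expiresInSeconds(String json) {
        Matcher m = EXPIRES_IN.matcher(json);
        return m.find() ? Long.parseLong(m.group(1)) : -1L;
    }
}
```

Calling TokenResponseParser.accessToken(responseBody) on the string printed by the test method would give the token to pass on to the recognition request.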

One more point: the access_token is valid for 2592000 seconds, that is 30 x 24 x 60 x 60 seconds, or 30 days, so there is no need to fetch a new access_token for every request. I suggest keeping the access_token in the application's cache and fetching a new one only when it has expired, which makes the application more efficient. This is the same as in public platform development.
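The caching idea can be sketched roughly as follows. AccessTokenCache is a hypothetical class written for these notes: the fetcher stands in for whatever code performs the token request, and the clock is injected only so that expiry can be exercised without waiting 30 days. It is an in-memory sketch under those assumptions, not a production cache.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

// Hypothetical in-memory cache for the access token.
final class AccessTokenCache {
    private final Supplier<String> fetcher;  // performs the real token request (assumed)
    private final long lifetimeMillis;       // e.g. 2592000 seconds = 30 days
    private final LongSupplier clock;        // current time in milliseconds
    private String token;
    private long expiresAtMillis;

    AccessTokenCache(Supplier<String> fetcher, long lifetimeSeconds, LongSupplier clock) {
        this.fetcher = fetcher;
        this.lifetimeMillis = lifetimeSeconds * 1000L;
        this.clock = clock;
    }

    /** Returns the cached token, fetching a new one only if none is cached or it has expired. */
    synchronized String get() {
        long now = clock.getAsLong();
        if (token == null || now >= expiresAtMillis) {
            token = fetcher.get();
            expiresAtMillis = now + lifetimeMillis;
        }
        return token;
    }

    /** Drops the cached token, e.g. after the server rejects it. */
    synchronized void invalidate() {
        token = null;
    }
}
```

In an application the clock would normally be System::currentTimeMillis, and invalidate() would be called when a recognition request fails because the token is no longer accepted.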

Step 4: Call the speech recognition interface (implicit sending) with the access token

Description: we use implicit sending, which means we do not upload the audio file itself but send the audio file converted into a byte array. Follow the official website's description strictly here, and the recognition results will be good. The difficult part is converting the audio format.

I read the official documentation and sample code, and got the call working after repeated debugging. Given the limited space here, please read the official documentation first; I will not copy it here.

For convenience, the sample code below only serves to illustrate the point and is not recommended for a production environment:

/**
 * Recognize English.
 */
@Test
public void test02() {
    recognize("voice_en.wav", "en");
}

The test method above calls the speech recognition method with two parameters: the full path of the audio file and the language (Chinese or English).


/**
 * Used when requesting speech recognition.
 */
private static final String SPEECH_RECOGNITION_URL = "";

private void recognize(String wavName, String language) {
    File wavFile = new File(wavName);
    HttpPost httpPost = null;
    CloseableHttpResponse response = null;
    CloseableHttpClient httpClient = HttpClients.createDefault();
    httpPost = new HttpPost(SPEECH_RECOGNITION_URL);
    SpeechRecognitionRequestEntity requestEntity = new SpeechRecognitionRequestEntity();
    // Voice compression format: use one of the formats listed in the official documentation,
    // pcm (uncompressed), wav, opus, speex, amr, x-flac; not case-sensitive
    requestEntity.setFormat("wav");
    // Number of channels; only mono is supported, so use 1
    requestEntity.setChannel("1");
    // Sample rate, 8000 or 16000 (the type is int and must not be a String;
    // see below for how to convert the sample rate)
    requestEntity.setRate(16000);
    // TODO: check here whether the access token has expired, handle the exception,
    // and fetch a new access token if it has
    requestEntity.setToken("24.463f2a9f7ce6721fe4d15568f812c086.2592000.1469627568.282335-7038695");
    // cuid seems to accept any value
    requestEntity.setCuid("Goodluck");
    requestEntity.setLen(wavFile.length());
    // The official website says: speech carries the real voice data and must be Base64-encoded.
    // Important: see the helper method, which converts the file into a byte array in the required format
    requestEntity.setSpeech(handlerWavFile(wavFile));
    // Language: Chinese = zh, Cantonese = ct, English = en; not case-sensitive, defaults to Chinese
    requestEntity.setLan(language);
    // Key point 1: convert the request parameters to JSON
    String requestEntityJson = JSON.toJSONString(requestEntity);
    // Key point 2: wrap the JSON in a StringEntity and set the encoding to avoid garbled Chinese
    StringEntity entity = new StringEntity(requestEntityJson, "UTF-8");
    entity.setContentEncoding("UTF-8");
    // Key point 3: set the ContentType of the StringEntity
    entity.setContentType("application/json");
    httpPost.setEntity(entity);
    ResponseHandler<String> responseHandler = new ResponseHandler<String>() {
        @Override
        public String handleResponse(HttpResponse response) throws ClientProtocolException, IOException {
            String resData = null;
            int statusCode = response.getStatusLine().getStatusCode();
            if (statusCode >= 200 && statusCode < 300) {
                HttpEntity httpEntity = response.getEntity();
                resData = EntityUtils.toString(httpEntity, "utf-8");
                EntityUtils.consume(httpEntity);
            }
            return resData;
        }
    };
    try {
        String responseStr = httpClient.execute(httpPost, responseHandler);
        System.out.println(responseStr);
    } catch (IOException e) {
        e.printStackTrace();
    }
}

SpeechRecognitionRequestEntity class (getters and setters omitted):

public class SpeechRecognitionRequestEntity {
    // Speech compression format
    private String format;
    /**
     * Note: the data type of the sample rate must be int, not String.
     */
    // Sample rate, 8000 or 16000; our project uses 16000
    private int rate;
    // Number of channels; only mono is supported, so use 1
    private String channel;
    // Developer authentication token
    private String token;
    // User ID; a device-unique value such as the MAC address or the phone IMEI is recommended
    // TODO: seems to accept any value, as long as it is unique
    private String cuid;
    /**
     * Note: this is the length of the original voice data,
     * not the length of the Base64-encoded voice data.
     */
    // Original voice length, in bytes
    private long len;
    // Real voice data; must be Base64-encoded
    private String speech;
    // Language: Chinese = zh, Cantonese = ct, English = en; not case-sensitive, defaults to Chinese
    private String lan;
}

This part of the code is excerpted from the official sample code:

private byte[] loadFile(File file) throws IOException {
    InputStream is = new FileInputStream(file);
    long length = file.length();
    byte[] bytes = new byte[(int) length];
    int offset = 0;
    int numRead = 0;
    // Keep reading until the buffer is full or the stream ends
    while (offset < bytes.length && (numRead = is.read(bytes, offset, bytes.length - offset)) >= 0) {
        offset += numRead;
    }
    if (offset < bytes.length) {
        is.close();
        throw new IOException("Could not completely read file " + file.getName());
    }
    is.close();
    return bytes;
}
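The handlerWavFile helper called in the recognize method is not shown in these notes. A minimal sketch of what such a step might look like, assuming it only needs to read the raw bytes and Base64-encode them, is given below. SpeechPayload is a hypothetical name; it uses the JDK's java.util.Base64 and Files.readAllBytes rather than the official sample's loadFile.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Base64;

// Hypothetical holder for the two request fields derived from the audio file.
final class SpeechPayload {
    final long len;       // length of the raw audio in bytes, before Base64 encoding
    final String speech;  // Base64 text of the raw audio bytes

    private SpeechPayload(long len, String speech) {
        this.len = len;
        this.speech = speech;
    }

    /** Builds the payload from raw audio bytes. */
    static SpeechPayload fromBytes(byte[] raw) {
        return new SpeechPayload(raw.length, Base64.getEncoder().encodeToString(raw));
    }

    /** Reads the whole audio file and builds the payload from its bytes. */
    static SpeechPayload fromFile(Path audioFile) throws IOException {
        return fromBytes(Files.readAllBytes(audioFile));
    }
}
```

Keeping len and speech together makes it harder to send the Base64 length by mistake, which is the pitfall the comment on the len field warns about.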

Returned result (formatted):

{
  "corpus_no": "6300874524819907792",
  "err_msg": "success.",
  "err_no": 0,
  "result": [
    "one day on the cage club got bad news,",
    "one day on the case club got bad news,",
    "one day in this case Club got bad news,",
    "one day in the cage club got the bad news,",
    "one day on the case club got the bad news,"
  ],
  "sn": "843115237281467036671"
}

The above covers how the code is written. During development, however, I ran into a problem: the audio file has to be converted into a format that Baidu speech recognition can recognize. Please see the official documentation for details.

So, for testing, I used the Format Factory software to convert the format.

Happily, audio files converted with Format Factory were recognized by the Baidu speech recognition REST service, and the recognition quality was good, which was exciting.

But then I had another problem: on the server, I cannot run every audio file sent by a client through Format Factory by hand. So I found a very useful tool on the Linux platform, SoX. The command format for converting with the sox command is:

sox <full path of the source file> -r 16000 -c 1 <full path of the generated file>

Next, I kept looking and found that a service running on the Linux platform can use the Runtime and Process classes in Java to run external programs.
Reference code:

// soundFileName is the source audio file; soundFileName_16000 is the converted output file
String[] cmdStrings = new String[] { "/usr/bin/sox", soundFileName, "-r", "16000", "-c", "1", soundFileName_16000 };
Process psProcess = Runtime.getRuntime().exec(cmdStrings);
psProcess.waitFor();
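A slightly more defensive variant of this call could use ProcessBuilder and check the exit code, so that a failed conversion does not go unnoticed. SoxConverter and its method names are hypothetical, and the argument order (input file first, then the output options, then the output file) is an assumption based on the usual SoX usage of sox input [options] output.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

// Hypothetical wrapper around the external sox program.
final class SoxConverter {
    /** Builds the argument list for resampling to 16000 Hz, mono. */
    static List<String> buildCommand(String soxPath, String input, String output) {
        return Arrays.asList(soxPath, input, "-r", "16000", "-c", "1", output);
    }

    /** Runs sox and fails if it exits with a non-zero status. */
    static void convert(String soxPath, String input, String output)
            throws IOException, InterruptedException {
        Process process = new ProcessBuilder(buildCommand(soxPath, input, output))
                .inheritIO()   // let sox print its messages to this process's console
                .start();
        int exitCode = process.waitFor();
        if (exitCode != 0) {
            throw new IOException("sox exited with code " + exitCode);
        }
    }
}
```

A call such as SoxConverter.convert("/usr/bin/sox", "in.wav", "in_16000.wav") would then either produce the converted file or throw, instead of silently continuing with a missing file.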

With that, the difficulties of developing with speech recognition were overcome. Looking back, I learned a lot. This is a first record; I have not fully mastered some of these points and will need to refine these notes later.
