Speech Recognition Technology

Source: Internet
Author: User

Speech recognition through the Google Voice interface

 

I recently needed to add speech recognition to a project, and it took a few days to get working. At first I had no idea where to start: the information I found online was disorganized, described very old implementation methods, or consisted of only a few simple code snippets. So I decided to share my experience.

 

 

The process for implementing speech recognition on iOS is as follows:

Record -> PCM format -> convert to WAV -> convert to FLAC -> send a request to Google -> wait for the returned JSON data -> parse the data.

 

First, to use Google's interface for speech recognition, you need to know the following:

1. How to send a POST request. (You can use the open-source libraries ASIHTTPRequest or AFNetworking, which wrap network requests and are easy to use.)

2. The audio formats PCM, WAV, and FLAC. (The Google interface accepts only FLAC audio and cannot recognize other formats. iOS cannot record FLAC; it can only record WAV. Therefore, one more conversion step is required.)

3. How to use and configure the AVAudioRecorder class.

Recording on iOS requires the AVAudioRecorder class. Its initializer is:

- (id)initWithURL:(NSURL *)url settings:(NSDictionary *)settings error:(NSError **)outError;

url: the location where the sound is stored after the recording is complete.

settings: the recording parameters. The only key discussed here is AVFormatIDKey, which determines the format of the recorded sound. We need to record in LPCM format, that is, uncompressed raw audio data, so that we can convert it afterwards. Therefore, the value kAudioFormatLinearPCM is used. The other keys are described in the documentation.

 

NSMutableDictionary *recordSetting = [[NSMutableDictionary alloc] init];
// Uncompressed linear PCM, 16 kHz, mono, 16-bit, little-endian
[recordSetting setValue:[NSNumber numberWithInt:kAudioFormatLinearPCM] forKey:AVFormatIDKey];
[recordSetting setValue:[NSNumber numberWithFloat:16000.0] forKey:AVSampleRateKey];
[recordSetting setValue:[NSNumber numberWithInt:1] forKey:AVNumberOfChannelsKey];
[recordSetting setValue:[NSNumber numberWithInt:16] forKey:AVLinearPCMBitDepthKey];
[recordSetting setValue:[NSNumber numberWithInt:AVAudioQualityHigh] forKey:AVEncoderAudioQualityKey];
[recordSetting setValue:@(NO) forKey:AVLinearPCMIsBigEndianKey];

After configuring these settings, you can start recording. Once we have the audio data in LPCM format, the first conversion is to WAV. WAV is a container format that stores the PCM data behind a header describing it. The transcoding is implemented in C, and some of the code is included in the project package linked at the end of this article.
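The author's C transcoding code is in the linked package and is not shown here. As a rough sketch of what wrapping raw PCM in a WAV file typically involves, the function below fills the common 44-byte header for uncompressed PCM. The name build_wav_header and its parameters are hypothetical and are not taken from the original project.

```c
#include <stdint.h>
#include <string.h>

/* Store 16-bit and 32-bit values in little-endian byte order. */
static void put_le16(uint8_t *p, uint16_t v)
{
    p[0] = (uint8_t)(v & 0xFF);
    p[1] = (uint8_t)((v >> 8) & 0xFF);
}

static void put_le32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v & 0xFF);
    p[1] = (uint8_t)((v >> 8) & 0xFF);
    p[2] = (uint8_t)((v >> 16) & 0xFF);
    p[3] = (uint8_t)((v >> 24) & 0xFF);
}

/* Hypothetical helper: fill a 44-byte WAV header for uncompressed PCM.
 * data_bytes is the length of the raw PCM data that follows the header. */
void build_wav_header(uint8_t header[44], uint32_t sample_rate,
                      uint16_t channels, uint16_t bits_per_sample,
                      uint32_t data_bytes)
{
    uint16_t block_align = (uint16_t)(channels * (bits_per_sample / 8));
    uint32_t byte_rate = sample_rate * block_align;

    memcpy(header, "RIFF", 4);
    put_le32(header + 4, 36 + data_bytes);   /* file size minus 8 bytes */
    memcpy(header + 8, "WAVE", 4);
    memcpy(header + 12, "fmt ", 4);
    put_le32(header + 16, 16);               /* size of the fmt chunk */
    put_le16(header + 20, 1);                /* audio format 1 = PCM */
    put_le16(header + 22, channels);
    put_le32(header + 24, sample_rate);
    put_le32(header + 28, byte_rate);
    put_le16(header + 32, block_align);
    put_le16(header + 34, bits_per_sample);
    memcpy(header + 36, "data", 4);
    put_le32(header + 40, data_bytes);
}
```

With the recording settings used in this article (16000 Hz, 1 channel, 16 bits), the byte rate works out to 16000 x 1 x 2 = 32000 bytes per second, so one second of audio would be described by build_wav_header(header, 16000, 1, 16, 32000).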

 

After the file is converted to WAV, you still need to convert the WAV to FLAC before uploading it to the Google interface for speech recognition. Fortunately, someone has packaged FLAC as an open-source library on GitHub: https://github.com/jhurt/FLACiOS

After downloading the source code, you must remove the Ogg support; otherwise, compilation will fail. Then build the project directly and open the Products directory to get the .a and framework files, and add both files to your project.
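One detail that is easy to miss when feeding 16-bit audio to a FLAC encoder: assuming the FLACiOS build exposes the standard libFLAC stream encoder, FLAC__stream_encoder_process_interleaved takes its samples as 32-bit signed integers, one per sample, rather than as packed 16-bit bytes. The sketch below covers only that widening step. The function name pcm16le_to_int32 is hypothetical, and the encoder calls themselves are left out.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: widen little-endian signed 16-bit PCM bytes into
 * 32-bit samples. Writes at most out_capacity samples and returns the
 * number of samples written. */
size_t pcm16le_to_int32(const uint8_t *pcm, size_t pcm_bytes,
                        int32_t *out, size_t out_capacity)
{
    size_t n = pcm_bytes / 2;            /* two bytes per 16-bit sample */
    if (n > out_capacity) {
        n = out_capacity;
    }
    for (size_t i = 0; i < n; i++) {
        uint32_t raw = (uint32_t)pcm[2 * i] | ((uint32_t)pcm[2 * i + 1] << 8);
        /* Sign-extend: values with the top bit set are negative. */
        out[i] = (raw & 0x8000u) ? (int32_t)raw - 65536 : (int32_t)raw;
    }
    return n;
}
```

The PCM data to convert is whatever follows the WAV header. The little-endian layout matches the AVLinearPCMIsBigEndianKey setting of NO used when recording.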

Once the audio has been processed, send a request to the Google Voice interface. I send the request with ASI (ASIHTTPRequest) because I am used to it, but it is rather old, so you can use another library instead. Here, filePath is the path of the converted FLAC file.

#define GOOGLE_AUDIO_URL @"http://www.google.com/speech-api/v1/recognize?xjerr=1&client=chromium&lang=zh-CN"

NSURL *url = [NSURL URLWithString:GOOGLE_AUDIO_URL];
ASIFormDataRequest *request = [ASIFormDataRequest requestWithURL:url];
// The rate in the Content-Type header matches the 16000 Hz recording setting
[request addRequestHeader:@"Content-Type" value:@"audio/x-flac; rate=16000"];
[request appendPostDataFromFile:filePath];
[request setRequestMethod:@"POST"];
request.completionBlock = ^{
    NSLog(@"JSON: %@", request.responseString);
    NSData *data = request.responseData;
    id ret = nil;
    ret = [NSJSONSerialization JSONObjectWithData:data options:NSJSONReadingMutableContainers error:nil];
    NSLog(@"ret %@", ret);
    results(ret);
};
request.failedBlock = ^{
    UIAlertView *alert = [[UIAlertView alloc] initWithTitle:@"error" message:@"Network request error" delegate:nil cancelButtonTitle:@"OK" otherButtonTitles:nil];
    [alert show];
    NSLog(@"Network request error: %@", request.error);
};
[request startSynchronous];
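The request above hard-codes two values: the lang query parameter in the URL and the rate in the Content-Type header. A small sketch of building both strings from parameters might look like the following. The helper names are hypothetical, and the sketch assumes lang is a simple tag such as zh-CN that needs no percent-encoding.

```c
#include <stdio.h>

/* Hypothetical helper: build the recognize URL for a given language tag.
 * Returns 0 on success, -1 if the buffer is too small. */
int build_recognize_url(char *out, size_t out_size, const char *lang)
{
    int n = snprintf(out, out_size,
                     "http://www.google.com/speech-api/v1/recognize"
                     "?xjerr=1&client=chromium&lang=%s", lang);
    return (n >= 0 && (size_t)n < out_size) ? 0 : -1;
}

/* Hypothetical helper: build the Content-Type value for FLAC audio.
 * sample_rate is the rate the audio was recorded at.
 * Returns 0 on success, -1 if the buffer is too small. */
int build_flac_content_type(char *out, size_t out_size, unsigned sample_rate)
{
    int n = snprintf(out, out_size, "audio/x-flac; rate=%u", sample_rate);
    return (n >= 0 && (size_t)n < out_size) ? 0 : -1;
}
```

Deriving the header from the same sample rate used for AVSampleRateKey keeps the two values from drifting apart if the recording settings change later.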

The following code parses the JSON response returned by Google:

if (dic == nil || [dic count] == 0) {
    return;
}
// Take the first hypothesis and show its recognized text
NSArray *array = [dic objectForKey:@"hypotheses"];
if ([array count]) {
    NSDictionary *dic_hypotheses = [array objectAtIndex:0];
    NSString *sContent = [NSString stringWithFormat:@"%@", [dic_hypotheses objectForKey:@"utterance"]];
    self.textField.text = sContent;
}
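For readers who handle the response outside Objective-C, the same lookup (the "utterance" value of the first entry in "hypotheses") can be approximated in plain C. This is a deliberately naive sketch, not a JSON parser and not what the code above does: it searches for the first "utterance" key and copies its string value, leaving any escape sequences undecoded. The function name first_utterance is hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Skip JSON whitespace. */
static const char *skip_ws(const char *p)
{
    while (*p == ' ' || *p == '\t' || *p == '\n' || *p == '\r') {
        p++;
    }
    return p;
}

/* Hypothetical helper: copy the string value of the first "utterance"
 * key found in json into out. Escape sequences are copied as-is.
 * Returns 0 on success, -1 if the key is missing, the value is not a
 * string, or out is too small. */
int first_utterance(const char *json, char *out, size_t out_size)
{
    static const char key[] = "\"utterance\"";
    const char *p = strstr(json, key);
    size_t n = 0;

    if (p == NULL || out_size == 0) {
        return -1;
    }
    p = skip_ws(p + strlen(key));
    if (*p != ':') {
        return -1;
    }
    p = skip_ws(p + 1);
    if (*p != '"') {
        return -1;
    }
    p++;
    while (*p != '\0' && *p != '"') {
        if (*p == '\\' && p[1] != '\0') {
            /* Keep the backslash and the escaped character together. */
            if (n + 2 >= out_size) {
                return -1;
            }
            out[n++] = *p++;
            out[n++] = *p++;
            continue;
        }
        if (n + 1 >= out_size) {
            return -1;
        }
        out[n++] = *p++;
    }
    if (*p != '"') {
        return -1;            /* unterminated string */
    }
    out[n] = '\0';
    return 0;
}
```

In real code a proper JSON library is the safer choice; the string search only works because the response has a single, predictable shape.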

 

Here is a test project containing all of the code, which can be used directly: http://pan.baidu.com/s/1kTMBBk7. Please leave a message if you have any questions. I look forward to the discussion.
