[iOS10 SpeechRecognition] Best practice for real-time speech-to-text with speech recognition
First of all, let me emphasize the requirement for "speech recognition": the user speaks, and what the user says is immediately converted into text and displayed. This is what developers really need.
Before tackling the requirement, I searched Google and Baidu to see whether there was already a wheel to reuse. The results were disappointing: libraries whose titles boast about deep learning turn out to call an API that recognizes a local audio file loaded from a URL. Even the most basic requirement cannot be met.
Today, we have sorted out two ways to implement this function:
First, let's look at the two request classes: SFSpeechAudioBufferRecognitionRequest and SFSpeechURLRecognitionRequest. There are also two ways to receive results: a block callback and a delegate. I will combine them so that both approaches are covered.
Before development, you need to register the user privacy permissions in Info.plist. You probably already know this, but I'll mention it for the completeness of this article.
Privacy - Microphone Usage Description
Privacy - Speech Recognition Usage Description
Use requestAuthorization to request permission
[SFSpeechRecognizer requestAuthorization:^(SFSpeechRecognizerAuthorizationStatus status) {
    // switch on the result enumeration here
}];
The microphone permission prompt will also pop up the first time you start recording.
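A minimal sketch of what handling that status enumeration might look like (the log messages are my own, not from the original code):

[SFSpeechRecognizer requestAuthorization:^(SFSpeechRecognizerAuthorizationStatus status) {
    // note: this callback may arrive on a background queue; dispatch to main before touching UI
    switch (status) {
        case SFSpeechRecognizerAuthorizationStatusAuthorized:
            NSLog(@"Speech recognition authorized");
            break;
        case SFSpeechRecognizerAuthorizationStatusDenied:
            NSLog(@"User denied speech recognition");
            break;
        case SFSpeechRecognizerAuthorizationStatusRestricted:
            NSLog(@"Speech recognition restricted on this device");
            break;
        case SFSpeechRecognizerAuthorizationStatusNotDetermined:
            NSLog(@"Speech recognition not yet determined");
            break;
    }
}];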
I. SFSpeechAudioBufferRecognitionRequest with a block callback
The following steps are used to achieve this:
① Set up the audio engine
Add the following properties as member variables so they can be started and released conveniently.
@property(nonatomic,strong)SFSpeechRecognizer *bufferRec;
@property(nonatomic,strong)SFSpeechAudioBufferRecognitionRequest *bufferRequest;
@property(nonatomic,strong)SFSpeechRecognitionTask *bufferTask;
@property(nonatomic,strong)AVAudioEngine *bufferEngine;
@property(nonatomic,strong)AVAudioInputNode *buffeInputNode;
I recommend writing this initialization inside the start method, so the objects are created when listening starts and released when it stops. If you prefer a global initialization instead, just make sure it runs only once.
self.bufferRec = [[SFSpeechRecognizer alloc] initWithLocale:[NSLocale localeWithLocaleIdentifier:@"zh_CN"]];
self.bufferEngine = [[AVAudioEngine alloc] init];
self.buffeInputNode = [self.bufferEngine inputNode];
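One thing worth adding before starting the engine, as an assumption on my part rather than something in the code above: check that the recognizer is actually available and configure the shared AVAudioSession for recording. A rough sketch (the category choice is an assumption, adjust it to your app):

// assumption: configure the audio session for recording before starting the engine
if (!self.bufferRec.isAvailable) {
    NSLog(@"Speech recognizer is not available right now");
    return;
}
AVAudioSession *session = [AVAudioSession sharedInstance];
NSError *sessionError = nil;
[session setCategory:AVAudioSessionCategoryRecord error:&sessionError];
[session setActive:YES withOptions:AVAudioSessionSetActiveOptionNotifyOthersOnDeactivation error:&sessionError];
if (sessionError != nil) {
    NSLog(@"%@", sessionError.localizedDescription);
}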
② Create a Speech Recognition request
self.bufferRequest = [[SFSpeechAudioBufferRecognitionRequest alloc] init];
self.bufferRequest.shouldReportPartialResults = true;
The shouldReportPartialResults attribute controls whether the result callback fires only once a whole utterance is finished, or for every partial fragment of speech as it is recognized.
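When partial results are enabled, you can tell the intermediate results apart from the final one inside the result handler (set up in the next step) by checking isFinal. A small sketch, with my own logging:

// inside the recognitionTask result handler
if (result != nil) {
    if (result.isFinal) {
        NSLog(@"Final: %@", result.bestTranscription.formattedString);
    } else {
        NSLog(@"Partial: %@", result.bestTranscription.formattedString);
    }
}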
③ Create a task and execute the task
// The code outside the block is also part of the preparation, such as
self.bufferRequest = [[SFSpeechAudioBufferRecognitionRequest alloc] init];
self.bufferRequest.shouldReportPartialResults = true;
__weak ViewController *weakSelf = self;
self.bufferTask = [self.bufferRec recognitionTaskWithRequest:self.bufferRequest resultHandler:^(SFSpeechRecognitionResult * _Nullable result, NSError * _Nullable error) {
    // callback after the result is received
}];
// tap the input node and append the audio stream to the request
AVAudioFormat *format = [self.buffeInputNode outputFormatForBus:0];
[self.buffeInputNode installTapOnBus:0 bufferSize:1024 format:format block:^(AVAudioPCMBuffer * _Nonnull buffer, AVAudioTime * _Nonnull when) {
    [weakSelf.bufferRequest appendAudioPCMBuffer:buffer];
}];
// prepare and start the engine
[self.bufferEngine prepare];
NSError *error = nil;
if (![self.bufferEngine startAndReturnError:&error]) {
    NSLog(@"%@", error.userInfo);
}
self.showBufferText.text = @"waiting for command.....";
Anyone who knows a bit about the run loop knows that the code outside the block is executed first, in the current pass of the run loop. The normal start-up flow is: initialize the parameters, then start the engine; after that the tap block that appends buffers is called repeatedly, and once enough buffered audio has accumulated, the speech recognition result callback above is invoked. Sometimes the buffer block is called even when there is no sound, yet the resultHandler above is not; the framework appears to have some fault tolerance built in (very quiet audio is automatically ignored even if you do not filter on volume yourself).
④ Receive the result callback
The result callback happens in the resultHandler block above. When it is invoked, it receives the result and error parameters, and you can then work with the result.
if (result != nil) {
    self.showBufferText.text = result.bestTranscription.formattedString;
}
if (error != nil) {
    NSLog(@"%@", error.userInfo);
}
The result is of type SFSpeechRecognitionResult. Its properties include the best transcription as well as an array of alternative transcriptions. If you want to do exact matching, you should also check the alternatives.
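For example, here is a rough sketch of matching a fixed command against both the best transcription and the alternatives (the command string is my own illustration):

// assumption: we are looking for one fixed command
NSString *command = @"打开灯";
BOOL matched = NO;
for (SFTranscription *transcription in result.transcriptions) {
    if ([transcription.formattedString isEqualToString:command]) {
        matched = YES;
        break;
    }
}
if (matched) {
    NSLog(@"Command recognized: %@", command);
}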
⑤ End listening
[self.bufferEngine stop];
[self.buffeInputNode removeTapOnBus:0];
self.showBufferText.text = @"";
self.bufferRequest = nil;
self.bufferTask = nil;
The bus here is just a temporary identifier on the node, similar in concept to a port.
II. SFSpeechURLRecognitionRequest with delegate methods
The main difference between the block and delegate approaches is that the block version is more concise, while the delegate gives you more room for custom behavior, because it exposes more callbacks across the result lifecycle.
There is not much to say about these delegate methods; they do what their names suggest. Note that the second one is called many times, while the third is called once when an utterance is complete.
// Called when the task first detects speech in the source audio
- (void)speechRecognitionDidDetectSpeech:(SFSpeechRecognitionTask *)task;

// Called for all recognitions, including non-final hypothesis
- (void)speechRecognitionTask:(SFSpeechRecognitionTask *)task didHypothesizeTranscription:(SFTranscription *)transcription;

// Called only for final recognitions of utterances. No more about the utterance will be reported
- (void)speechRecognitionTask:(SFSpeechRecognitionTask *)task didFinishRecognition:(SFSpeechRecognitionResult *)recognitionResult;

// Called when the task is no longer accepting new audio but may be finishing final processing
- (void)speechRecognitionTaskFinishedReadingAudio:(SFSpeechRecognitionTask *)task;

// Called when the task has been cancelled, either by client app, the user, or the system
- (void)speechRecognitionTaskWasCancelled:(SFSpeechRecognitionTask *)task;

// Called when recognition of all requested utterances is finished.
// If successfully is false, the error property of the task will contain error information
- (void)speechRecognitionTask:(SFSpeechRecognitionTask *)task didFinishSuccessfully:(BOOL)successfully;
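To receive these callbacks, the controller only needs to declare conformance to SFSpeechRecognitionTaskDelegate. A minimal sketch (the final-result callback is shown later in step ④, so only the partial-result one is illustrated here):

@interface ViewController () <SFSpeechRecognitionTaskDelegate>
@end

@implementation ViewController

// called repeatedly while the user is still speaking
- (void)speechRecognitionTask:(SFSpeechRecognitionTask *)task didHypothesizeTranscription:(SFTranscription *)transcription {
    NSLog(@"Partial: %@", transcription.formattedString);
}

@end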
The idea of this implementation: first build a recorder (it can be started and stopped manually, or started and stopped automatically based on the ambient volume, similar to Talking Tom Cat), save the recording to a local file, and then read that file with a URL request and have it translated. The steps are as follows:
① Create a synchronous Recorder
The following attributes are required:
/** Recording device */
@property (nonatomic, strong) AVAudioRecorder *recorder;
/** Monitor device */
@property (nonatomic, strong) AVAudioRecorder *monitor;
/** Recording file URL */
@property (nonatomic, strong) NSURL *recordURL;
/** Monitor URL */
@property (nonatomic, strong) NSURL *monitorURL;
/** Timer */
@property (nonatomic, strong) NSTimer *timer;
Property initialization
// recorder settings
NSDictionary *recordSettings = [[NSDictionary alloc] initWithObjectsAndKeys:
                                [NSNumber numberWithFloat:14400.0], AVSampleRateKey,
                                [NSNumber numberWithInt:kAudioFormatAppleIMA4], AVFormatIDKey, // the format constant was garbled in the source; kAudioFormatAppleIMA4 is a common choice for .caf
                                [NSNumber numberWithInt:2], AVNumberOfChannelsKey,
                                [NSNumber numberWithInt:AVAudioQualityMax], AVEncoderAudioQualityKey,
                                nil];
NSString *recordPath = [NSTemporaryDirectory() stringByAppendingPathComponent:@"record.caf"];
_recordURL = [NSURL fileURLWithPath:recordPath];
_recorder = [[AVAudioRecorder alloc] initWithURL:_recordURL settings:recordSettings error:NULL];
// monitor
NSString *monitorPath = [NSTemporaryDirectory() stringByAppendingPathComponent:@"monitor.caf"];
_monitorURL = [NSURL fileURLWithPath:monitorPath];
_monitor = [[AVAudioRecorder alloc] initWithURL:_monitorURL settings:recordSettings error:NULL];
_monitor.meteringEnabled = YES;
You do not need to dwell on the constants in this settings dictionary; I simply reused code I had written before, and the encoder quality above is set to the maximum.
② Start and end
To start and stop recording based on the ambient sound level, you need an extra monitor recorder, besides the main recorder, purely to measure the volume. peakPowerForChannel: gives you the current input level of the microphone environment, and a timer drives the periodic volume check. The code is as follows:
- (void)setupTimer {
    [self.monitor record];
    self.timer = [NSTimer scheduledTimerWithTimeInterval:0.1
                                                  target:self
                                                selector:@selector(updateTimer)
                                                userInfo:nil
                                                 repeats:YES]; // Dong baoran blog Park
}

// method that periodically checks the volume to start and stop recording
- (void)updateTimer {
    // refresh the meters before reading them
    [self.monitor updateMeters];
    // get the volume of channel 0: total silence is -160.0, 0 is the maximum volume
    float power = [self.monitor peakPowerForChannel:0];
    // NSLog(@"%f", power);
    if (power > -20) {
        if (!self.recorder.isRecording) {
            NSLog(@"Start recording");
            [self.recorder record];
        }
    } else {
        if (self.recorder.isRecording) {
            NSLog(@"Stop recording");
            [self.recorder stop];
            [self recognition];
        }
    }
}
③ Voice recognition task request
- (void)recognition {
    // stop the timer
    [self.timer invalidate];
    // the monitor stops too
    [self.monitor stop];
    // delete the monitor's recording file
    [self.monitor deleteRecording];
    // create the speech recognizer
    SFSpeechRecognizer *rec = [[SFSpeechRecognizer alloc] initWithLocale:[NSLocale localeWithLocaleIdentifier:@"zh_CN"]];
    // SFSpeechRecognizer *rec = [[SFSpeechRecognizer alloc] initWithLocale:[NSLocale localeWithLocaleIdentifier:@"en_ww"]]; // Dong baoran blog Park
    // recognize the local audio file through a URL request
    SFSpeechURLRecognitionRequest *request = [[SFSpeechURLRecognitionRequest alloc] initWithURL:_recordURL];
    [rec recognitionTaskWithRequest:request delegate:self];
}
The code that recognizes Chinese from a local audio file via a URL is probably the version most often posted online, because it can be written without much thought. But this piece of code is basically useless in practice. (Apart from a long-press speech-to-text feature, which real app requirement can be satisfied by directly recognizing a local audio file? Automatically generating lyrics from an mp3? Parsing a Jay Chou song would be hard, and a recognition request cannot exceed one minute of audio anyway.)
④ Result callback proxy method
- (void)speechRecognitionTask:(SFSpeechRecognitionTask *)task didFinishRecognition:(SFSpeechRecognitionResult *)recognitionResult {
    NSLog(@"%s", __FUNCTION__);
    NSLog(@"%@", recognitionResult.bestTranscription.formattedString);
    [self setupTimer];
}
This is the most commonly used method. Beyond it, you can add the callbacks for other points in the lifecycle as needed. This is just a simple demonstration; you can see more in my demo project.
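For instance, a rough sketch of one more lifecycle callback you might add, to surface errors when a task ends (the logging is just my illustration):

// called when recognition of all requested utterances is finished
- (void)speechRecognitionTask:(SFSpeechRecognitionTask *)task didFinishSuccessfully:(BOOL)successfully {
    if (!successfully) {
        // when successfully is NO, the task's error property describes what went wrong
        NSLog(@"Recognition failed: %@", task.error);
    }
}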
https://github.com/dsxNiubility/SXSpeechRecognitionTwoWays
iOS10 has made a great leap in speech-related features, mainly in speech recognition; in addition, SiriKit can pass external information through to the app for handling. For the time being, however, it has obvious limitations: it only supports the intent domains listed on the official site, such as ride hailing and messaging. Something like "open Meituan and search for a grilled fish restaurant" cannot be recognized, so there is not much more to dig into for now; we will have to wait for Apple's updates.