Summary: iOS 10 speech recognition with the Speech framework
I. Introduction
iOS 10 is a breakthrough release that opens up many practical development interfaces in areas such as Messages and notifications. This post focuses on the new Speech framework introduced in iOS 10. With this framework, developers can easily add speech recognition to their apps without relying on third-party voice recognition services, and the strength of Apple's Siri shows that Apple's own speech service is more than good enough to do without third parties. Using it also greatly improves the user's security.
II. Important classes in the Speech framework
The Speech framework is lightweight, and its classes are neither numerous nor tangled. Before learning the framework, we need a general familiarity with its classes and the relationships between them.
SFSpeechRecognizer: the central operation class for speech recognition. It handles requesting user authorization, configuring the locale and recognition mode, and sending recognition requests to Apple's service.
SFSpeechRecognitionTask: the recognition task class. Each recognition request can be abstracted as an SFSpeechRecognitionTask instance, and the SFSpeechRecognitionTaskDelegate protocol declares a number of methods for monitoring the task as it progresses.
SFSpeechRecognitionRequest: the recognition request class; it must be instantiated through one of its subclasses.
SFSpeechURLRecognitionRequest: creates a recognition request from an audio file URL.
SFSpeechAudioBufferRecognitionRequest: creates a recognition request from an audio stream.
SFSpeechRecognitionResult: the recognition result class.
SFTranscription: the transcription class holding the converted text.
SFTranscriptionSegment: the class for individual audio segments within a transcription.
Once you understand the relationships between these classes, using the Speech framework becomes very easy.
III. Requesting user speech recognition permission and making recognition requests
Developers must obtain the user's consent before using speech recognition in an app. First, add the Privacy - Speech Recognition Usage Description key (its raw name is NSSpeechRecognitionUsageDescription) to the project's Info.plist file. Its value is a string that will be displayed in the system permission alert.
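For reference, a minimal sketch of the corresponding raw entry in Info.plist; the description string here is only a placeholder that you should replace with your own wording:
<key>NSSpeechRecognitionUsageDescription</key>
<string>This app uses speech recognition to transcribe your audio files.</string>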
Use the requestAuthorization: method of the SFSpeechRecognizer class to request user permission; the user's response is passed into the method's callback block, as follows:
// Request user speech recognition permission
[SFSpeechRecognizer requestAuthorization:^(SFSpeechRecognizerAuthorizationStatus status) {

}];
The user's response is defined by the SFSpeechRecognizerAuthorizationStatus enumeration, as follows:
typedef NS_ENUM(NSInteger, SFSpeechRecognizerAuthorizationStatus) {
    // Result unknown; the user has not yet chosen
    SFSpeechRecognizerAuthorizationStatusNotDetermined,
    // The user denied speech recognition authorization
    SFSpeechRecognizerAuthorizationStatusDenied,
    // The device does not support speech recognition
    SFSpeechRecognizerAuthorizationStatusRestricted,
    // The user authorized speech recognition
    SFSpeechRecognizerAuthorizationStatusAuthorized,
};
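As a minimal sketch of how the request and the enumeration fit together, the callback can branch on the returned status; note that the handler may be invoked on a background thread, so any UI work should be dispatched back to the main queue:
[SFSpeechRecognizer requestAuthorization:^(SFSpeechRecognizerAuthorizationStatus status) {
    dispatch_async(dispatch_get_main_queue(), ^{
        switch (status) {
            case SFSpeechRecognizerAuthorizationStatusAuthorized:
                NSLog(@"Speech recognition authorized");
                break;
            case SFSpeechRecognizerAuthorizationStatusDenied:
                NSLog(@"User denied speech recognition");
                break;
            case SFSpeechRecognizerAuthorizationStatusRestricted:
                NSLog(@"Speech recognition is restricted on this device");
                break;
            case SFSpeechRecognizerAuthorizationStatusNotDetermined:
                NSLog(@"Speech recognition not yet authorized");
                break;
        }
    });
}];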
If the permission request succeeds, the developer can issue a speech recognition request through the SFSpeechRecognizer operation class, as shown in the following example:
// Create the speech recognizer object
SFSpeechRecognizer *rec = [[SFSpeechRecognizer alloc] init];
// Create an audio recognition request from an audio file URL
SFSpeechRecognitionRequest *request = [[SFSpeechURLRecognitionRequest alloc] initWithURL:[[NSBundle mainBundle] URLForResource:@"7011" withExtension:@"m4a"]];
// Send the request
[rec recognitionTaskWithRequest:request resultHandler:^(SFSpeechRecognitionResult * _Nullable result, NSError * _Nullable error) {
    // Print the recognized result string
    NSLog(@"%@", result.bestTranscription.formattedString);
}];
IV. A closer look at the SFSpeechRecognizer class
The main responsibilities of the SFSpeechRecognizer class are requesting permission, configuring parameters, and issuing speech recognition requests. Its most important properties and methods are listed below:
// Get the current user authorization status
+ (SFSpeechRecognizerAuthorizationStatus)authorizationStatus;
// Request speech recognition permission from the user
+ (void)requestAuthorization:(void (^)(SFSpeechRecognizerAuthorizationStatus status))handler;
// Get all supported locales
+ (NSSet<NSLocale *> *)supportedLocales;
// Initializer; note that it defaults to the device's current locale as the recognition language
- (nullable instancetype)init;
// Initializer that sets a specific locale
- (nullable instancetype)initWithLocale:(NSLocale *)locale NS_DESIGNATED_INITIALIZER;
// Whether speech recognition is currently available
@property (nonatomic, readonly, getter=isAvailable) BOOL available;
// Delegate conforming to the SFSpeechRecognizerDelegate protocol
@property (nonatomic, weak) id<SFSpeechRecognizerDelegate> delegate;
// Default task hint for recognition. Note that each recognition request also has this
// property; the value set here is the default, and if it is also set on the
// SFSpeechRecognitionRequest object, the request's value overrides it.
/*
typedef NS_ENUM(NSInteger, SFSpeechRecognitionTaskHint) {
    SFSpeechRecognitionTaskHintUnspecified = 0,  // no hint
    SFSpeechRecognitionTaskHintDictation = 1,    // normal dictation style
    SFSpeechRecognitionTaskHintSearch = 2,       // search style
    SFSpeechRecognitionTaskHintConfirmation = 3, // phrase style
};
*/
@property (nonatomic) SFSpeechRecognitionTaskHint defaultTaskHint;
// Recognition request with a block callback; results are passed into the block
- (SFSpeechRecognitionTask *)recognitionTaskWithRequest:(SFSpeechRecognitionRequest *)request
                                          resultHandler:(void (^)(SFSpeechRecognitionResult * __nullable result, NSError * __nullable error))resultHandler;
// Recognition request with a delegate callback
- (SFSpeechRecognitionTask *)recognitionTaskWithRequest:(SFSpeechRecognitionRequest *)request
                                               delegate:(id<SFSpeechRecognitionTaskDelegate>)delegate;
// The operation queue on which recognition handlers are called
@property (nonatomic, strong) NSOperationQueue *queue;
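For example, a minimal sketch of configuring a recognizer with a specific locale and a default task hint before issuing requests; the en-US locale here is just an illustration:
// Create a recognizer for US English instead of the device's current locale
SFSpeechRecognizer *recognizer = [[SFSpeechRecognizer alloc] initWithLocale:[NSLocale localeWithLocaleIdentifier:@"en-US"]];
// Treat requests as dictation unless an individual request overrides the hint
recognizer.defaultTaskHint = SFSpeechRecognitionTaskHintDictation;
// Check availability before issuing a request
if (recognizer.isAvailable) {
    NSLog(@"Recognizer is ready");
}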
The SFSpeechRecognizerDelegate protocol declares only one method, as follows:
// Called when the availability of the speech recognition service changes
- (void)speechRecognizer:(SFSpeechRecognizer *)speechRecognizer availabilityDidChange:(BOOL)available;
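A minimal sketch of implementing it, assuming a hypothetical recordButton property that should only be enabled while the service is available:
// Enable or disable recording UI as the recognition service comes and goes
- (void)speechRecognizer:(SFSpeechRecognizer *)speechRecognizer availabilityDidChange:(BOOL)available {
    self.recordButton.enabled = available;
}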
Making recognition requests with a block callback is simple; to use delegate callbacks instead, the developer must implement the relevant methods of the SFSpeechRecognitionTaskDelegate protocol, listed below:
// Called first, when speech is detected in the audio source
- (void)speechRecognitionDidDetectSpeech:(SFSpeechRecognitionTask *)task;
// Called when a usable transcription is recognized.
// Note that Apple's speech recognition service may produce several possible
// results from the supplied audio, and this method can be called for each of them.
- (void)speechRecognitionTask:(SFSpeechRecognitionTask *)task didHypothesizeTranscription:(SFTranscription *)transcription;
// Called when recognition has completed and all available results have been returned
- (void)speechRecognitionTask:(SFSpeechRecognitionTask *)task didFinishRecognition:(SFSpeechRecognitionResult *)recognitionResult;
// Called when audio input is no longer accepted and the task begins processing the recognition
- (void)speechRecognitionTaskFinishedReadingAudio:(SFSpeechRecognitionTask *)task;
// Called when the speech recognition task is cancelled
- (void)speechRecognitionTaskWasCancelled:(SFSpeechRecognitionTask *)task;
// Called when the speech recognition task completes
- (void)speechRecognitionTask:(SFSpeechRecognitionTask *)task didFinishSuccessfully:(BOOL)successfully;
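Putting the protocol to work, a minimal sketch of a delegate-based request, assuming the same bundled audio file as before and a class that conforms to SFSpeechRecognitionTaskDelegate; in real code, keep a strong reference to the recognizer so it is not deallocated mid-task:
// Issue the request with a delegate instead of a block
- (void)startDelegateBasedRecognition {
    SFSpeechRecognizer *recognizer = [[SFSpeechRecognizer alloc] init];
    SFSpeechURLRecognitionRequest *request = [[SFSpeechURLRecognitionRequest alloc] initWithURL:[[NSBundle mainBundle] URLForResource:@"7011" withExtension:@"m4a"]];
    [recognizer recognitionTaskWithRequest:request delegate:self];
}

// Log each intermediate hypothesis as it arrives
- (void)speechRecognitionTask:(SFSpeechRecognitionTask *)task didHypothesizeTranscription:(SFTranscription *)transcription {
    NSLog(@"Hypothesis: %@", transcription.formattedString);
}

// Log the final result once all available results have been returned
- (void)speechRecognitionTask:(SFSpeechRecognitionTask *)task didFinishRecognition:(SFSpeechRecognitionResult *)recognitionResult {
    NSLog(@"Final result: %@", recognitionResult.bestTranscription.formattedString);
}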
The SFSpeechRecognitionTask class encapsulates the following properties and methods:
// Current state of the task
/*
typedef NS_ENUM(NSInteger, SFSpeechRecognitionTaskState) {
    SFSpeechRecognitionTaskStateStarting = 0,   // task has started
    SFSpeechRecognitionTaskStateRunning = 1,    // task is running
    SFSpeechRecognitionTaskStateFinishing = 2,  // no more audio is being read; results are about to be returned
    SFSpeechRecognitionTaskStateCanceling = 3,  // task is being cancelled
    SFSpeechRecognitionTaskStateCompleted = 4,  // all results have been returned
};
*/
@property (nonatomic, readonly) SFSpeechRecognitionTaskState state;
// Whether audio input has finished
@property (nonatomic, readonly, getter=isFinishing) BOOL finishing;
// Finish audio input manually; no more audio will be accepted
- (void)finish;
// Whether the task has been cancelled
@property (nonatomic, readonly, getter=isCancelled) BOOL cancelled;
// Cancel the task manually
- (void)cancel;
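A minimal sketch of using these members, assuming the recognizer, request, and delegate from the example above:
SFSpeechRecognitionTask *task = [recognizer recognitionTaskWithRequest:request delegate:self];
// Stop accepting audio but let recognition of what was heard complete
if (task.state == SFSpeechRecognitionTaskStateRunning) {
    [task finish];
}
// Or abandon the task entirely, discarding any pending results
if (!task.isCancelled) {
    [task cancel];
}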
Besides being created with the SFSpeechURLRecognitionRequest class, an audio recognition request can also be created with the SFSpeechAudioBufferRecognitionRequest class:
@interface SFSpeechAudioBufferRecognitionRequest : SFSpeechRecognitionRequest
@property (nonatomic, readonly) AVAudioFormat *nativeAudioFormat;
// Append audio to the stream
- (void)appendAudioPCMBuffer:(AVAudioPCMBuffer *)audioPCMBuffer;
- (void)appendAudioSampleBuffer:(CMSampleBufferRef)sampleBuffer;
// Finish input
- (void)endAudio;
@end
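A minimal sketch of streaming microphone audio into such a request with AVAudioEngine; note that live microphone input also requires the NSMicrophoneUsageDescription key in Info.plist, and error handling is omitted here for brevity:
// In real code, keep the engine and request in strong properties so they outlive this scope
AVAudioEngine *engine = [[AVAudioEngine alloc] init];
SFSpeechAudioBufferRecognitionRequest *request = [[SFSpeechAudioBufferRecognitionRequest alloc] init];
AVAudioInputNode *inputNode = engine.inputNode;

// Feed every captured microphone buffer into the recognition request
[inputNode installTapOnBus:0
                bufferSize:1024
                    format:[inputNode outputFormatForBus:0]
                     block:^(AVAudioPCMBuffer *buffer, AVAudioTime *when) {
    [request appendAudioPCMBuffer:buffer];
}];

[engine prepare];
[engine startAndReturnError:nil];

// Recognize the streamed audio as it arrives
SFSpeechRecognizer *recognizer = [[SFSpeechRecognizer alloc] init];
[recognizer recognitionTaskWithRequest:request resultHandler:^(SFSpeechRecognitionResult * _Nullable result, NSError * _Nullable error) {
    NSLog(@"%@", result.bestTranscription.formattedString);
}];

// When capture should stop, tear down the tap and close the audio stream:
// [engine stop];
// [inputNode removeTapOnBus:0];
// [request endAudio];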
V. The speech recognition result class SFSpeechRecognitionResult
The SFSpeechRecognitionResult class encapsulates the results of speech recognition. A result contains several parallel transcriptions, each with a confidence property describing its accuracy. The properties of SFSpeechRecognitionResult are as follows:
// The candidate transcriptions, sorted by accuracy
@property (nonatomic, readonly, copy) NSArray<SFTranscription *> *transcriptions;
// The most accurate transcription
@property (nonatomic, readonly, copy) SFTranscription *bestTranscription;
// Whether recognition has finished; if YES, all transcriptions have been acquired
@property (nonatomic, readonly, getter=isFinal) BOOL final;
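A minimal sketch of inspecting these properties inside the resultHandler block from the earlier example:
if (result.isFinal) {
    // All candidates are in; list them from most to least accurate
    for (SFTranscription *transcription in result.transcriptions) {
        NSLog(@"Candidate: %@", transcription.formattedString);
    }
    NSLog(@"Best: %@", result.bestTranscription.formattedString);
}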
SFSpeechRecognitionResult is only a wrapper around the results; the actual recognition information is defined in the SFTranscription class, whose properties are as follows:
// The full recognized text as a string
@property (nonatomic, readonly, copy) NSString *formattedString;
// The array of recognition segments
@property (nonatomic, readonly, copy) NSArray<SFTranscriptionSegment *> *segments;
When recognizing a whole sentence, Apple's speech recognition service actually splits the audio into several segments, each of which may be a single word; the segments property of SFTranscription holds these nodes. The SFTranscriptionSegment class defines the following properties:
// The recognized text of the current segment
@property (nonatomic, readonly, copy) NSString *substring;
// The range of this segment's text within the whole recognized sentence
@property (nonatomic, readonly) NSRange substringRange;
// The timestamp of this segment in the audio
@property (nonatomic, readonly) NSTimeInterval timestamp;
// The duration of this segment's audio
@property (nonatomic, readonly) NSTimeInterval duration;
// Confidence (accuracy), between 0 and 1
@property (nonatomic, readonly) float confidence;
// Other possible recognition results for this segment
@property (nonatomic, readonly) NSArray<NSString *> *alternativeSubstrings;
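A minimal sketch of walking the segments of the best transcription, again inside a resultHandler block, to inspect timing and confidence:
for (SFTranscriptionSegment *segment in result.bestTranscription.segments) {
    // Where the word starts, how long it lasts, and how confident the service is
    NSLog(@"'%@' at %.2fs for %.2fs (confidence %.2f)",
          segment.substring, segment.timestamp, segment.duration, segment.confidence);
    // Other candidate words for this segment
    NSLog(@"Alternatives: %@", segment.alternativeSubstrings);
}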
A friendly tip: the Speech framework throws an exception when run on the simulator and cannot perform speech recognition requests, reporting a kAFAssistantErrorDomain error. If any reader knows a solution, please share it.
That is the entire content of this article. I hope it helps with your study, and I also hope you will support the Cloud Habitat Community.