Android Development and Learning: Speech Recognition with the Baidu Speech Recognition SDK (Part 2)

Source: Internet
Author: User

Today we continue with the Baidu Speech Recognition SDK. This time we use the API provided by the SDK to build our own speech recognition interface. Before getting started, let's look at several important classes in the SDK.

1. VoiceRecognitionClient

VoiceRecognitionClient is the entry point of the whole speech recognition API; overall control of recognition is concentrated in this class. It provides three main methods: startVoiceRecognition(), speakFinish(), and stopVoiceRecognition(). In the code below they are used, respectively, to start recognition, to signal that the user has finished speaking, and to cancel recognition. Through VoiceRecognitionClient we control the recognition process at the macro level (please forgive the expression).

2. VoiceRecognitionConfig

VoiceRecognitionConfig is the configuration class of speech recognition. In this class, we can configure the current speech recognition environment, such as the speech recognition mode, speech recognition sound effect, and speech recognition sampling rate.

3. VoiceClientStatusChangeListener

VoiceClientStatusChangeListener is the callback interface for speech recognition. To use the Baidu Speech Recognition API we must implement this interface, which makes it the most important type in the SDK. If VoiceRecognitionClient controls recognition at the macro level, VoiceClientStatusChangeListener controls it at the micro level. A recognition session goes through several stages: recording starts, speech is detected, speech ends and recognition runs, and the result is returned. Through VoiceClientStatusChangeListener we can react to each of these stages. The interface is relatively complex and is discussed in detail later.
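To make the stages of a session concrete, here is a minimal plain-Java sketch of a status-driven flow. It does not use the SDK: the Status enum and the label strings are hypothetical stand-ins for the SDK's integer status constants and for the text shown on screen, chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Plain-Java sketch of a recognition session driven by status callbacks. */
public class RecognitionFlowSketch {

    /** Hypothetical stand-ins for the SDK's integer status constants. */
    public enum Status {
        START_RECORDING, SPEECH_START, SPEECH_END, UPDATE_RESULTS, FINISH, USER_CANCELED
    }

    private boolean recognizing = false;
    private final List<String> labels = new ArrayList<>();

    /** Each incoming status updates the "recognizing" flag and the status label. */
    public void onStatus(Status status) {
        switch (status) {
            case START_RECORDING:
                recognizing = true;
                labels.add("Please speak");
                break;
            case SPEECH_START:
                labels.add("Speaking");
                break;
            case SPEECH_END:
                labels.add("Recognizing");
                break;
            case UPDATE_RESULTS:
                // Partial results arrive here; the session is still running.
                break;
            case FINISH:
                recognizing = false;
                labels.add("Done");
                break;
            case USER_CANCELED:
                recognizing = false;
                labels.add("Canceled");
                break;
        }
    }

    public boolean isRecognizing() {
        return recognizing;
    }

    public List<String> labels() {
        return labels;
    }

    public static void main(String[] args) {
        RecognitionFlowSketch flow = new RecognitionFlowSketch();
        flow.onStatus(Status.START_RECORDING);
        flow.onStatus(Status.SPEECH_START);
        flow.onStatus(Status.SPEECH_END);
        flow.onStatus(Status.FINISH);
        System.out.println(flow.labels() + " recognizing=" + flow.isRecognizing());
    }
}
```

The point of the sketch is that the flag is set only when recording actually starts and is cleared on every terminal status, so the rest of the interface can rely on it.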

With the main classes introduced, let's move on to today's content. First, what we want to implement: two buttons that control the start and end of speech recognition, feedback on the interface for the current recognition status and the final result, and a progress bar that shows the user's current voice volume (included for demonstration only; it is not required). We begin by initializing the entry class for speech recognition:

    @Override
    protected void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);
        setContentView(R.layout.layout_voice);
        InitView();
        // Obtain the recognition client (mClient)
        mClient = VoiceRecognitionClient.getInstance(this);
        // Set the application authorization information
        mClient.setTokenApis(API_KEY, SECRET_KEY);
        // Create the Handler on the main thread
        mHandler = new Handler();
    }

To keep the focus on the speech recognition SDK, the initialization of the interface elements lives in the InitView() method; see the complete code at the end. mHandler is only used to refresh the progress bar, so it is not strictly necessary. Next comes the most important piece of the whole program, the VoiceClientStatusChangeListener implementation. Here is the code:

    /** Speech recognition callback interface */
    private VoiceClientStatusChangeListener mListener = new VoiceClientStatusChangeListener() {

        @Override
        public void onClientStatusChange(int status, Object obj) {
            switch (status) {
            // Recording has actually started. This is the real start time,
            // so prompt the user to speak now.
            case VoiceRecognitionClient.CLIENT_STATUS_START_RECORDING:
                IsRecognition = true;
                mVolumeBar.setVisibility(View.VISIBLE);
                BtnCancel.setEnabled(true);
                BtnStart.setText("finished");
                Status.setText("Current status: please speak");
                mHandler.removeCallbacks(mUpdateVolume);
                mHandler.postDelayed(mUpdateVolume, UPDATE_INTERVAL);
                break;
            // The start of speech has been detected
            case VoiceRecognitionClient.CLIENT_STATUS_SPEECH_START:
                Status.setText("Current status: speaking");
                break;
            // Audio data callback. Nothing needs to be done here;
            // the incoming data could simply be recorded.
            case VoiceRecognitionClient.CLIENT_STATUS_AUDIO_DATA:
                break;
            // The end of speech has been detected; waiting for the network result
            case VoiceRecognitionClient.CLIENT_STATUS_SPEECH_END:
                Status.setText("Current status: recognizing...");
                BtnCancel.setEnabled(false);
                mVolumeBar.setVisibility(View.INVISIBLE);
                break;
            // Recognition has finished; display the result carried in obj
            case VoiceRecognitionClient.CLIENT_STATUS_FINISH:
                Status.setText(null);
                UpdateRecognitionResult(obj);
                IsRecognition = false;
                ReSetUI();
                break;
            // Partial results, for continuous on-screen display
            case VoiceRecognitionClient.CLIENT_STATUS_UPDATE_RESULTS:
                UpdateRecognitionResult(obj);
                break;
            // The user canceled recognition
            case VoiceRecognitionClient.CLIENT_STATUS_USER_CANCELED:
                Status.setText("Current status: canceled");
                IsRecognition = false;
                ReSetUI();
                break;
            default:
                break;
            }
        }

        @Override
        public void onError(int errorType, int errorCode) {
            IsRecognition = false;
            Result.setText("error: 0x" + Integer.toHexString(errorCode));
            ReSetUI();
        }

        @Override
        public void onNetworkStatusChange(int status, Object obj) {
            // Nothing is done here; simple recognition is not affected
        }
    };

In the code above, the key point is to understand what should be done in each state of the recognition process; that is what really deserves thought. Here are a few supporting details:

1. Analysis of recognition results

    /** Display the recognition result on the interface */
    private void UpdateRecognitionResult(Object result) {
        if (result != null && result instanceof List) {
            @SuppressWarnings("rawtypes")
            List results = (List) result;
            if (results.size() > 0) {
                if (mType == VOICE_TYPE_SEARCH) {
                    // Search: show the first result
                    Result.setText(results.get(0).toString());
                } else if (mType == VOICE_TYPE_INPUT) {
                    // Input: one candidate list per sentence; join the first candidates
                    @SuppressWarnings("unchecked")
                    List<List<Candidate>> sentences = (List<List<Candidate>>) result;
                    StringBuffer sb = new StringBuffer();
                    for (List<Candidate> candidates : sentences) {
                        if (candidates != null && candidates.size() > 0) {
                            sb.append(candidates.get(0).getWord());
                        }
                    }
                    Result.setText(sb.toString());
                }
            }
        }
    }

2. Recognition Type

There are two recognition types: Search and Input. Search suits short utterances, that is, phrase recognition; Input suits long utterances made up of several sentences. Overall, Baidu speech recognition works quite well.
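The two types also differ in the shape of the result handled by UpdateRecognitionResult(): a flat list for Search, and one candidate list per sentence for Input. The plain-Java sketch below illustrates that difference. It is only an illustration under stated assumptions: String is used in place of the SDK's Candidate type, and the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

/** Plain-Java sketch of picking the displayed text from the two result shapes. */
public class ResultShapesSketch {

    /** Search type: a flat list of results; the first entry is displayed. */
    public static String bestSearchResult(List<String> results) {
        if (results == null || results.isEmpty()) {
            return "";
        }
        return results.get(0);
    }

    /** Input type: one candidate list per sentence; join the first candidate of each. */
    public static String bestInputResult(List<List<String>> sentences) {
        StringBuilder sb = new StringBuilder();
        if (sentences == null) {
            return "";
        }
        for (List<String> candidates : sentences) {
            // Skip sentences that came back without any candidate.
            if (candidates != null && !candidates.isEmpty()) {
                sb.append(candidates.get(0));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String> search = Arrays.asList("weather today", "whether today");
        List<List<String>> input = Arrays.asList(
                Arrays.asList("Hello. ", "Hallo. "),
                Arrays.asList("How are you?", "How old are you?"));
        System.out.println(bestSearchResult(search));
        System.out.println(bestInputResult(input));
    }
}
```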

Next, let's take a look at the code to control speech recognition:

    /** Handle click events */
    @Override
    public void onClick(View v) {
        switch (v.getId()) {
        case R.id.start:
            if (IsRecognition) {
                // The user has finished speaking
                mClient.speakFinish();
            } else {
                // The user retries: start a new recognition
                Result.setText(null);
                // A new recognition needs its parameters set first
                config = new VoiceRecognitionConfig();
                if (mType == VOICE_TYPE_INPUT) {
                    config.setSpeechMode(VoiceRecognitionConfig.SPEECHMODE_MULTIPLE_SENTENCE);
                } else {
                    config.setSpeechMode(VoiceRecognitionConfig.SPEECHMODE_SINGLE_SENTENCE);
                }
                // Enable semantic parsing
                config.enableNLU();
                // Enable volume feedback
                config.enableVoicePower(true);
                // Set the start prompt sound
                config.enableBeginSoundEffect(R.raw.bdspeech_recognition_start);
                // Set the end prompt sound
                config.enableEndSoundEffect(R.raw.bdspeech_speech_end);
                // Set the sampling rate
                config.setSampleRate(VoiceRecognitionConfig.SAMPLE_RATE_8K);
                // Use the default microphone as the audio source
                config.setUseDefaultAudioSource(true);
                // Start recognition
                int code = VoiceRecognitionClient.getInstance(this).startVoiceRecognition(mListener, config);
                if (code == VoiceRecognitionClient.START_WORK_RESULT_WORKING) {
                    // Recognition could be started; update the interface
                    BtnStart.setEnabled(false);
                    BtnStart.setText("finished");
                    BtnCancel.setEnabled(true);
                } else {
                    Result.setText("failed to start: 0x" + Integer.toHexString(code));
                }
            }
            break;
        case R.id.cancel:
            mClient.stopVoiceRecognition();
            break;
        }
    }
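When recognition cannot be started, the numeric return code is shown to the user in hexadecimal. The sketch below shows one way to build such a message in plain Java with String.format. The class and method names are hypothetical, and the code value in main is made up for the demonstration; it is not a real SDK error code.

```java
/** Plain-Java sketch of formatting a numeric status code as a hexadecimal message. */
public class ErrorCodeFormatSketch {

    /** Builds the message shown when recognition fails to start. */
    public static String describeStartFailure(int code) {
        // %1$s refers to the first argument, here the code rendered as hex digits.
        return String.format("failed to start: 0x%1$s", Integer.toHexString(code));
    }

    public static void main(String[] args) {
        // 0x1001 is an arbitrary example value, not a documented SDK code.
        System.out.println(describeStartFailure(0x1001));
    }
}
```

Keeping the template in one place, rather than concatenating strings at each call site, makes it easier to move the text into a string resource later.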

Note that the code above enables volume feedback, so we need a periodic task, posted to the Handler, that refreshes the volume progress bar:

    /** Volume update interval, in milliseconds */
    private static final int UPDATE_INTERVAL = 200;

    /** Volume update task */
    private Runnable mUpdateVolume = new Runnable() {
        @Override
        public void run() {
            if (IsRecognition) {
                long vol = VoiceRecognitionClient.getInstance(BaiduVoiceActivity.this).getCurrentDBLevelMeter();
                mVolumeBar.setProgress((int) vol);
                // Schedule the next refresh
                mHandler.removeCallbacks(mUpdateVolume);
                mHandler.postDelayed(mUpdateVolume, UPDATE_INTERVAL);
            }
        }
    };

Of course, this code is unnecessary if volume feedback is disabled. In real speech recognition applications, the interface usually draws a waveform that follows the loudness of the user's voice. That is beyond the scope of this article, but it is worth studying for practical applications; alternatively, a simple animation can stand in for it. Finally, we need a few methods that release the speech recognition resources:
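The volume task casts the measured level straight to an int before handing it to the progress bar. A small guard that keeps the value inside the bar's range can be sketched in plain Java as follows. This is an assumption-laden illustration: the article does not state the range of the measured level, the maximum of 100 used in main is assumed, and the class and method names are hypothetical.

```java
/** Plain-Java sketch of mapping a measured volume level onto a progress bar range. */
public class VolumeProgressSketch {

    /** Clamps a raw level into [0, max] so it is always a valid progress value. */
    public static int toProgress(long level, int max) {
        if (level < 0) {
            return 0;
        }
        if (level > max) {
            return max;
        }
        return (int) level;
    }

    public static void main(String[] args) {
        // 100 is an assumed progress bar maximum.
        System.out.println(toProgress(42L, 100));
        System.out.println(toProgress(250L, 100));
    }
}
```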

    @Override
    protected void onDestroy() {
        // Release the recognition library
        VoiceRecognitionClient.releaseInstance();
        super.onDestroy();
    }

    @Override
    protected void onPause() {
        if (IsRecognition) {
            // Cancel the recognition in progress
            mClient.stopVoiceRecognition();
        }
        super.onPause();
    }

That covers the Baidu Speech Recognition SDK APIs used in this article. To help you sort out the ideas, here is the complete code:

    package com.android.baiduVoice;

    import java.util.List;

    import com.baidu.voicerecognition.android.Candidate;
    import com.baidu.voicerecognition.android.VoiceRecognitionClient;
    import com.baidu.voicerecognition.android.VoiceRecognitionConfig;
    import com.baidu.voicerecognition.android.VoiceRecognitionClient.VoiceClientStatusChangeListener;

    import android.app.Activity;
    import android.os.Bundle;
    import android.os.Handler;
    import android.view.View;
    import android.view.View.OnClickListener;
    import android.widget.Button;
    import android.widget.ProgressBar;
    import android.widget.TextView;

    public class BaiduVoiceActivity extends Activity implements OnClickListener {

        /** Application authorization information */
        private String API_KEY = "8MAxI5o7VjKSZOKeBzS4XtxO";
        private String SECRET_KEY = "secret";

        /** Interface layout elements */
        private TextView Status, Result;
        private ProgressBar mVolumeBar;
        private Button BtnStart, BtnCancel;

        /** Speech recognition client */
        private VoiceRecognitionClient mClient;

        /** Speech recognition configuration */
        private VoiceRecognitionConfig config;

        /** Speech recognition callback interface */
        private VoiceClientStatusChangeListener mListener = new VoiceClientStatusChangeListener() {

            @Override
            public void onClientStatusChange(int status, Object obj) {
                switch (status) {
                // Recording has actually started. This is the real start time,
                // so prompt the user to speak now.
                case VoiceRecognitionClient.CLIENT_STATUS_START_RECORDING:
                    IsRecognition = true;
                    mVolumeBar.setVisibility(View.VISIBLE);
                    BtnCancel.setEnabled(true);
                    BtnStart.setText("finished");
                    Status.setText("Current status: please speak");
                    mHandler.removeCallbacks(mUpdateVolume);
                    mHandler.postDelayed(mUpdateVolume, UPDATE_INTERVAL);
                    break;
                // The start of speech has been detected
                case VoiceRecognitionClient.CLIENT_STATUS_SPEECH_START:
                    Status.setText("Current status: speaking");
                    break;
                // Audio data callback. Nothing needs to be done here;
                // the incoming data could simply be recorded.
                case VoiceRecognitionClient.CLIENT_STATUS_AUDIO_DATA:
                    break;
                // The end of speech has been detected; waiting for the network result
                case VoiceRecognitionClient.CLIENT_STATUS_SPEECH_END:
                    Status.setText("Current status: recognizing...");
                    BtnCancel.setEnabled(false);
                    mVolumeBar.setVisibility(View.INVISIBLE);
                    break;
                // Recognition has finished; display the result carried in obj
                case VoiceRecognitionClient.CLIENT_STATUS_FINISH:
                    Status.setText(null);
                    UpdateRecognitionResult(obj);
                    IsRecognition = false;
                    ReSetUI();
                    break;
                // Partial results, for continuous on-screen display
                case VoiceRecognitionClient.CLIENT_STATUS_UPDATE_RESULTS:
                    UpdateRecognitionResult(obj);
                    break;
                // The user canceled recognition
                case VoiceRecognitionClient.CLIENT_STATUS_USER_CANCELED:
                    Status.setText("Current status: canceled");
                    IsRecognition = false;
                    ReSetUI();
                    break;
                default:
                    break;
                }
            }

            @Override
            public void onError(int errorType, int errorCode) {
                IsRecognition = false;
                Result.setText("error: 0x" + Integer.toHexString(errorCode));
                ReSetUI();
            }

            @Override
            public void onNetworkStatusChange(int status, Object obj) {
                // Nothing is done here; simple recognition is not affected
            }
        };

        /** Speech recognition type definitions */
        public static final int VOICE_TYPE_INPUT = 0;
        public static final int VOICE_TYPE_SEARCH = 1;

        /** Volume update interval, in milliseconds */
        private static final int UPDATE_INTERVAL = 200;

        /** Volume update task */
        private Runnable mUpdateVolume = new Runnable() {
            @Override
            public void run() {
                if (IsRecognition) {
                    long vol = VoiceRecognitionClient.getInstance(BaiduVoiceActivity.this).getCurrentDBLevelMeter();
                    mVolumeBar.setProgress((int) vol);
                    // Schedule the next refresh
                    mHandler.removeCallbacks(mUpdateVolume);
                    mHandler.postDelayed(mUpdateVolume, UPDATE_INTERVAL);
                }
            }
        };

        /** Main-thread Handler */
        private Handler mHandler;

        /** Whether recognition is in progress */
        private boolean IsRecognition = false;

        /** Current speech recognition type */
        private int mType = VOICE_TYPE_INPUT;

        @Override
        protected void onCreate(Bundle savedInstanceState) {
            super.onCreate(savedInstanceState);
            setContentView(R.layout.layout_voice);
            InitView();
            // Obtain the recognition client (mClient)
            mClient = VoiceRecognitionClient.getInstance(this);
            // Set the application authorization information
            mClient.setTokenApis(API_KEY, SECRET_KEY);
            // Create the Handler on the main thread
            mHandler = new Handler();
        }

        /** Interface initialization */
        private void InitView() {
            Status = (TextView) findViewById(R.id.status);
            Result = (TextView) findViewById(R.id.result);
            mVolumeBar = (ProgressBar) findViewById(R.id.volumeProgressBar);
            BtnStart = (Button) findViewById(R.id.start);
            BtnStart.setOnClickListener(this);
            BtnCancel = (Button) findViewById(R.id.cancel);
            BtnCancel.setOnClickListener(this);
        }

        @Override
        protected void onDestroy() {
            // Release the recognition library
            VoiceRecognitionClient.releaseInstance();
            super.onDestroy();
        }

        @Override
        protected void onPause() {
            if (IsRecognition) {
                // Cancel the recognition in progress
                mClient.stopVoiceRecognition();
            }
            super.onPause();
        }

        /** Handle click events */
        @Override
        public void onClick(View v) {
            switch (v.getId()) {
            case R.id.start:
                if (IsRecognition) {
                    // The user has finished speaking
                    mClient.speakFinish();
                } else {
                    // The user retries: start a new recognition
                    Result.setText(null);
                    // A new recognition needs its parameters set first
                    config = new VoiceRecognitionConfig();
                    if (mType == VOICE_TYPE_INPUT) {
                        config.setSpeechMode(VoiceRecognitionConfig.SPEECHMODE_MULTIPLE_SENTENCE);
                    } else {
                        config.setSpeechMode(VoiceRecognitionConfig.SPEECHMODE_SINGLE_SENTENCE);
                    }
                    // Enable semantic parsing
                    config.enableNLU();
                    // Enable volume feedback
                    config.enableVoicePower(true);
                    // Set the start prompt sound
                    config.enableBeginSoundEffect(R.raw.bdspeech_recognition_start);
                    // Set the end prompt sound
                    config.enableEndSoundEffect(R.raw.bdspeech_speech_end);
                    // Set the sampling rate
                    config.setSampleRate(VoiceRecognitionConfig.SAMPLE_RATE_8K);
                    // Use the default microphone as the audio source
                    config.setUseDefaultAudioSource(true);
                    // Start recognition
                    int code = VoiceRecognitionClient.getInstance(this).startVoiceRecognition(mListener, config);
                    if (code == VoiceRecognitionClient.START_WORK_RESULT_WORKING) {
                        // Recognition could be started; update the interface
                        BtnStart.setEnabled(false);
                        BtnStart.setText("finished");
                        BtnCancel.setEnabled(true);
                    } else {
                        Result.setText("failed to start: 0x" + Integer.toHexString(code));
                    }
                }
                break;
            case R.id.cancel:
                mClient.stopVoiceRecognition();
                break;
            }
        }

        /** Reset the interface */
        private void ReSetUI() {
            BtnStart.setEnabled(true); // A retry can be started
            BtnStart.setText("retry");
            BtnCancel.setEnabled(false); // Nothing to cancel yet
        }

        /** Display the recognition result on the interface */
        private void UpdateRecognitionResult(Object result) {
            if (result != null && result instanceof List) {
                @SuppressWarnings("rawtypes")
                List results = (List) result;
                if (results.size() > 0) {
                    if (mType == VOICE_TYPE_SEARCH) {
                        // Search: show the first result
                        Result.setText(results.get(0).toString());
                    } else if (mType == VOICE_TYPE_INPUT) {
                        // Input: one candidate list per sentence; join the first candidates
                        @SuppressWarnings("unchecked")
                        List<List<Candidate>> sentences = (List<List<Candidate>>) result;
                        StringBuffer sb = new StringBuffer();
                        for (List<Candidate> candidates : sentences) {
                            if (candidates != null && candidates.size() > 0) {
                                sb.append(candidates.get(0).getWord());
                            }
                        }
                        Result.setText(sb.toString());
                    }
                }
            }
        }
    }

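Of course, as in the previous article, the necessary permissions must be declared in AndroidManifest.xml; otherwise the program reports an error. At a minimum, recording from the microphone requires android.permission.RECORD_AUDIO and sending the audio for recognition requires android.permission.INTERNET; use the same permission list as in the previous article.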

That's all for today. In the next article, we will build on the techniques introduced in these two articles to implement a more practical application and compare the mainstream speech recognition software. Thanks again for your attention!
