Microsoft SAPI: Make Your Software Articulate (reposted)

Source: Internet
Author: User
Tags: sapi

[IT168 special feature] "Without sound, even the best play falls flat." This is an advertising slogan, but it makes a point: when we develop software, especially multimedia software, the ability to produce sound and speak adds a great deal of polish. We are also facing an aging society, in which more and more elderly people with poor eyesight will become our users. If our software can prompt users by voice, its usability improves greatly and users will like it more.

So how can we make our software articulate? Don't worry: Microsoft has a solution. With Microsoft SAPI we can make our software talk.

  What is SAPI?

Speech technology in software covers two main areas: speech recognition and speech synthesis, the latter also known as text-to-speech (TTS). A TTS system converts text strings and files into audio streams of synthetic speech, while a speech recognition system converts human speech into readable text strings or files. Both jobs are done by speech engines.

Microsoft's SAPI (Microsoft Speech API) provides a high-level interface between applications and speech engines and implements the low-level details needed to control and manage the engines in real time. Speech engines interact with SAPI through the DDI layer (device driver interface), and applications communicate with SAPI through the API layer. With these APIs we can quickly develop speech recognition or speech synthesis applications. SAPI significantly reduces the code needed to build such applications, which makes speech technology easier to use and widens its range of applications. Although SAPI is not an industry standard, it is widely used.

SAPI includes the following groups of component objects (interfaces):

(1) Voice Commands API. Used to control applications, typically in speech recognition systems. When a command is recognized, the associated interface is called so that the application performs the corresponding function. A program that wants to implement voice control must use this group of objects.

(2) Voice Dictation API. Dictation input, that is, the speech recognition interface.

(3) Voice Text API. Converts text to speech, that is, speech synthesis.

(4) Voice Telephony API. Integrates speech recognition and speech synthesis into a telephone system. It can be used to build a telephone answering system or even to control the computer by telephone.

(5) Audio Objects API. Encapsulates the computer's audio system.

Of these, the Voice Text API is the interface to the Microsoft TTS engine. With it we can easily build powerful text-to-speech programs. The word pronunciation feature of PowerWord uses these APIs, and almost all current text reading tools are developed with SAPI. In this article we mainly use the Voice Text API.

  Installing the SAPI SDK

To use SAPI to make our software articulate, we first need to download and install the SAPI SDK. Download the development package from Microsoft's website: http://www.microsoft.com/speech/download/sdk51

After the download is complete, install SpeechSDK51.exe first, then install the Chinese language pack SpeechSDK51LangPack. If we want to redistribute SAPI as part of our own software, we also need to install SpeechSDK51MSM.exe.

Once the SAPI SDK is installed, we can start using SAPI in VS2010.

  Create a project, add a reference to the SAPI

Here we create an ordinary WinForm program that can use TTS to read text aloud or convert text files into sound files. First, we create a WinForm program in VS2010 and design its form. The text box on the form displays the text we want to read, and the combo box lists all the voices installed on the system, from which the user can select the voice that TTS uses.

To use SAPI in our project, we also need to add a reference to it. Open the Add Reference dialog box in VS2010, find Microsoft Speech Object Library on the COM tab, and add it to the project.

The classes provided by SAPI live in the SpeechLib namespace, so we also add a using SpeechLib; directive to our code. We can then use the SAPI classes for speech synthesis or speech recognition.

  Create SpVoice object, initialize SAPI

SAPI TTS is done through the SpVoice object. SpVoice is the core class for speech synthesis (TTS): the TTS engine is invoked through it to read text aloud. The SpVoice class has the following main properties:

• Voice: the voice used for reading, the equivalent of the person who reads aloud. Additional voices usually become available when the corresponding speech engine is installed.

• Rate: the reading speed, ranging from -10 to +10. The higher the value, the faster the speech.

• Volume: the volume, ranging from 0 to 100. The larger the number, the louder the sound.

SpVoice has the following main methods:

• Speak(): converts text to speech. It takes two parameters, Text and Flags, which specify the text to read aloud and how to read it (synchronously, asynchronously, and so on).

• GetVoices(): returns the voices installed on the system. Any of them can be assigned to the Voice property of SpVoice.

• Pause(): pauses all reading done through the object. The method has no parameters.

• Resume(): resumes reading that was paused on the object. The method has no parameters.

When the form is constructed, we first create the SpVoice object so that we can use it to read text. Because more than one voice may be installed on the system, we also enumerate all voices into a combo box at this point and select the first one by default. Once the form is shown, users can pick the voice they prefer from the combo box.


    // SpVoice object used to read the text
    private SpVoice m_spVoice;

    private void Init()
    {
        // Create the SpVoice object
        m_spVoice = new SpVoice();

        // Enumerate the voices installed on the system and add them to the combo box
        foreach (ISpeechObjectToken token in m_spVoice.GetVoices(string.Empty, string.Empty))
        {
            this.cmbVoices.Items.Add(token.GetDescription(49));
        }

        // Select the first voice by default (only if at least one voice exists)
        if (this.cmbVoices.Items.Count > 0)
        {
            this.cmbVoices.SelectedIndex = 0;
        }
    }


  Read text aloud

After the form has been initialized and the SpVoice object created, we can use the object's Speak() method to read the text in the text box.


    private void btnSpeak_Click(object sender, EventArgs e)
    {
        // Get the index of the voice the user selected in the combo box
        int nVoiceIndex = this.cmbVoices.SelectedIndex;

        // Set the Voice property of SpVoice from that index to choose which voice to use
        m_spVoice.Voice = m_spVoice.GetVoices(string.Empty, string.Empty).Item(nVoiceIndex);

        // Use SpVoice's Speak() method to read the contents of the text box
        m_spVoice.Speak(this.textPreview.Text, SpeechVoiceSpeakFlags.SVSFlagsAsync);
    }

Here we use Speak(), one of the most important methods of the SpVoice object. Its first parameter is the text to read, and its second parameter is a set of flags that controls how the text is read: synchronously or asynchronously, as XML markup or plain text, and so on. With this single method call we can read the contents of the TextBox control aloud.

  Read aloud a text file

More often we want to read the contents of a text file rather than text typed into the text box. To do that, we read the text file and load its contents into the text box.


    // Requires: using System.IO;
    private void btnFileSelect_Click(object sender, EventArgs e)
    {
        // Use the Open File dialog box to select a text file
        OpenFileDialog openFileDialog1 = new OpenFileDialog();
        openFileDialog1.InitialDirectory = "e:\\";
        openFileDialog1.Filter = "txt files (*.txt)|*.txt|All files (*.*)|*.*";
        openFileDialog1.FilterIndex = 2;
        openFileDialog1.RestoreDirectory = true;

        if (openFileDialog1.ShowDialog() == DialogResult.OK)
        {
            // Read the text file and collect its contents for the text box
            StreamReader objReader = new StreamReader(openFileDialog1.FileName);
            string sLine = "";
            string sPreview = "";
            while (sLine != null)
            {
                sLine = objReader.ReadLine();
                if (sLine != null)
                {
                    // ReadLine() strips the line break, so add Environment.NewLine back
                    sPreview += sLine + Environment.NewLine;
                }
            }

            // Display the contents of the text file in the text box
            this.textPreview.Text = sPreview;

            // Close the file reader
            objReader.Close();
        }
    }



In this way, we read the contents of a text file, display them in the text box, and then use SpVoice to read the contents of the text box. In other words, we read the text file aloud indirectly.
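
The file-reading step can also be written more compactly. The following is a minimal sketch, not the original program's code: the class TextFileLoader and the method LoadPreview are hypothetical names. It assumes the file is UTF-8 text, which is what StreamReader expects by default; a file in another encoding would need an explicit Encoding argument.

```csharp
using System;
using System.IO;
using System.Text;

public static class TextFileLoader
{
    // Hypothetical helper (not part of the original program): reads a text
    // file and returns its lines, each followed by Environment.NewLine,
    // which is the form a multi-line text box expects.
    public static string LoadPreview(string path)
    {
        StringBuilder preview = new StringBuilder();

        // The using block closes the reader even if ReadLine() throws
        using (StreamReader reader = new StreamReader(path))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                preview.Append(line).Append(Environment.NewLine);
            }
        }
        return preview.ToString();
    }
}
```

In the click handler, the result could be assigned directly to the Text property of the text box. StringBuilder avoids creating a new string for every line, which matters for long files.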

  Convert text into a sound file

Besides reading text aloud directly, we often need to convert the text into a sound file that we can carry around and listen to later. To do this we use another important member of SpVoice, the AudioOutputStream property (SetOutput() in the native C++ interface). Pointing it at a WAV file stream makes SpVoice write its speech to that file, which converts the text into a sound file.

Because converting a long piece of text into a sound file usually takes a long time, we create a dedicated worker thread for the conversion, while the UI thread displays the progress.


    // Worker thread class
    // Requires: using System; using System.Collections; using SpeechLib;
    public class WorkerThread
    {
        // Index of the voice selected by the user
        private int nVoiceIndex;
        // Name of the file to save
        private string strFileName;
        // Text to convert
        private ArrayList arrText;

        // Constructor, used to pass parameters to the thread
        public WorkerThread(int nIndex, ArrayList aText, string sFileName)
        {
            nVoiceIndex = nIndex;
            arrText = aText;
            strFileName = sFileName;
        }

        // Thread start event
        public event EventHandler ThreadStartEvent;
        // Event raised while the thread is running
        public event EventHandler ThreadEvent;
        // Thread end event
        public event EventHandler ThreadEndEvent;

        // Thread function
        public void RunMethod()
        {
            // Create a SpVoice object and select the voice chosen by the user
            SpVoice voice = new SpVoice();
            voice.Voice = voice.GetVoices(string.Empty, string.Empty).Item(nVoiceIndex);

            // Create the audio file stream
            SpeechStreamFileMode spFileMode = SpeechStreamFileMode.SSFMCreateForWrite;
            SpFileStream spFileStream = new SpFileStream();

            // Set the output format, which determines the size of the output file
            spFileStream.Format.Type = SpeechAudioFormatType.SAFTCCITT_ALaw_8kHzMono;
            // A higher-quality format is also possible, but the file is larger:
            // spFileStream.Format.Type = SpeechAudioFormatType.SAFT11kHz16BitMono;

            // Create the file and make it the output stream of SpVoice
            spFileStream.Open(strFileName, spFileMode, false);
            voice.AudioOutputStream = spFileStream;

            // Raise the start event so the main form sets the progress bar maximum to Count
            ThreadStartEvent.Invoke(arrText.Count, new EventArgs());

            // Start writing the text to the audio file
            int nCount = 0;
            foreach (string sOutput in arrText)
            {
                voice.Speak(sOutput, SpeechVoiceSpeakFlags.SVSFlagsAsync);
                // Wait until this line has been written
                voice.WaitUntilDone(-1);
                // Raise the progress event so the main form moves the progress bar
                nCount++;
                ThreadEvent.Invoke(nCount, new EventArgs());
            }

            // Close the audio file
            spFileStream.Close();

            // Raise the end event so the main form closes the progress bar
            ThreadEndEvent.Invoke(new object(), new EventArgs());
        }
    }



As when reading text directly, we still use SpVoice's Speak() method. The difference is that we set the AudioOutputStream property of SpVoice first, so the speech is written to an audio file instead of the speakers. This completes the conversion from text to audio file.

With the worker class in place, we can use it to do the actual conversion. In the click handler of the form's Save button, we create a worker thread to convert the text.
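
The two audio formats mentioned in the worker thread's comments trade quality for file size, and the difference can be estimated with simple arithmetic. The sketch below is an illustration rather than part of the original program: WavSizeEstimator and EstimateBytes are hypothetical names, the small WAV header is ignored, and the "11kHz" format is assumed to mean 11,025 samples per second. A-law stores 8 bits per sample.

```csharp
using System;

public static class WavSizeEstimator
{
    // Hypothetical helper: approximate size of the audio data in bytes.
    // bytes per second = sample rate * bytes per sample * channels
    public static long EstimateBytes(int sampleRateHz, int bitsPerSample, int channels, double seconds)
    {
        long bytesPerSecond = (long)sampleRateHz * bitsPerSample / 8 * channels;
        return (long)(bytesPerSecond * seconds);
    }
}
```

Under these assumptions, one minute of 8 kHz mono A-law audio takes about 480,000 bytes, while one minute of 11,025 Hz 16-bit mono audio takes about 1,323,000 bytes, almost three times as much.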


    // Requires: using System.Collections; using System.Threading;
    // progressForm is a Form2 field of the main form (its declaration is not shown)
    private void btnSaveToWav_Click(object sender, EventArgs e)
    {
        string strWavFile = "";

        // Use the Save File dialog box to choose the file to save
        SaveFileDialog sfd = new SaveFileDialog();
        sfd.Filter = "All files (*.*)|*.*|wav files (*.wav)|*.wav";
        sfd.Title = "Save to a wave file";
        sfd.FilterIndex = 2;
        sfd.RestoreDirectory = true;

        if (sfd.ShowDialog() == DialogResult.OK)
        {
            // Get the file name entered by the user
            strWavFile = sfd.FileName;

            // Get the text to convert from the text box
            ArrayList arrText = new ArrayList();
            foreach (string sLine in this.textPreview.Lines)
            {
                arrText.Add(sLine);
            }

            // Show the progress bar
            progressForm = new Form2();
            progressForm.Show();

            // Create the worker and pass it the text to convert
            WorkerThread myThreadFun = new WorkerThread(
                this.cmbVoices.SelectedIndex, arrText, strWavFile);

            // Register the thread events
            myThreadFun.ThreadStartEvent += new EventHandler(method_ThreadStartEvent);
            myThreadFun.ThreadEvent += new EventHandler(method_ThreadEvent);
            myThreadFun.ThreadEndEvent += new EventHandler(method_ThreadEndEvent);

            // Create the thread that runs the worker
            Thread thread = new Thread(new ThreadStart(myThreadFun.RunMethod));

            // Start the thread
            thread.Start();
        }
    }







Creating a thread for the conversion is not enough. To make the software more user-friendly, we also respond to the thread events and move the progress bar to reflect the progress of the conversion, so that the user does not think the software has hung during a long conversion.


    // Delegate called when the thread starts
    private delegate void MaxValueDelegate(int maxValue);
    // Delegate called while the thread is running
    private delegate void NowValueDelegate(int nowValue);
    // Delegate called when the thread ends
    private delegate void HideProgressDelegate(int n);

    // Thread end event: hide the progress bar window.
    // The worker thread cannot touch the progress bar directly,
    // so a delegate is invoked on the UI thread to do it.
    void method_ThreadEndEvent(object sender, EventArgs e)
    {
        HideProgressDelegate hide = new HideProgressDelegate(HideProgress);
        this.Invoke(hide, 0);
    }

    // Progress event: set the current value of the progress bar.
    // The sender here is the current value passed by WorkerThread.
    void method_ThreadEvent(object sender, EventArgs e)
    {
        int nowValue = Convert.ToInt32(sender);
        NowValueDelegate now = new NowValueDelegate(SetNow);
        this.Invoke(now, nowValue);
    }

    // Thread start event: set the maximum value of the progress bar.
    // The sender here is the maximum value passed by WorkerThread.
    void method_ThreadStartEvent(object sender, EventArgs e)
    {
        int maxValue = Convert.ToInt32(sender);
        MaxValueDelegate max = new MaxValueDelegate(SetMax);
        this.Invoke(max, maxValue);
    }

    // Functions called through the delegates; they operate the progress bar
    private void SetMax(int maxValue)
    {
        progressForm.progressBar1.Maximum = maxValue;
    }

    private void SetNow(int nowValue)
    {
        progressForm.progressBar1.Value = nowValue;
    }

    private void HideProgress(int n)
    {
        progressForm.Hide();
    }
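
The handlers above receive their values through the sender argument, which works but hides the intent. A common alternative, sketched here as an assumption rather than as the author's approach, is to carry the value in a custom EventArgs class. ProgressEventArgs and ProgressReporter are hypothetical names that do not appear in the original program.

```csharp
using System;

// Hypothetical event payload: carries the progress value explicitly
// instead of passing it through 'sender'.
public class ProgressEventArgs : EventArgs
{
    private readonly int value;

    public ProgressEventArgs(int value)
    {
        this.value = value;
    }

    public int Value
    {
        get { return value; }
    }
}

// Hypothetical stand-in for the worker class, reduced to its progress reporting
public class ProgressReporter
{
    public event EventHandler<ProgressEventArgs> Progress;

    public void Report(int current)
    {
        // Copy the delegate first so a concurrent unsubscribe cannot null it
        EventHandler<ProgressEventArgs> handler = Progress;
        if (handler != null)
        {
            handler(this, new ProgressEventArgs(current));
        }
    }
}
```

A handler on the form would then read e.Value instead of converting sender, and would still need Invoke to update the progress bar on the UI thread. The null check also means that raising the event with no subscribers does nothing instead of throwing.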


  Control the reading of SpVoice

At this point our talking software is basically finished. To make it more user-friendly, we can also use the methods provided by SpVoice to control its behavior so that it does what we want. For example, we can pause and resume reading.


    private void btnPause_Click(object sender, EventArgs e)
    {
        if (this.btnPause.Text == "pause")
        {
            // Pause reading
            m_spVoice.Pause();
            this.btnPause.Text = "continue";
        }
        else
        {
            // Resume reading
            m_spVoice.Resume();
            this.btnPause.Text = "pause";
        }
    }



With the members provided by SpVoice, controlling its behavior is that simple. Besides pausing and resuming, we can set the reading speed through the Rate property and the volume through the Volume property (SetRate() and SetVolume() in the native C++ interface). They are not covered in detail here and are left for readers to try.
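
As a starting point for that experiment, the sketch below keeps user input inside the ranges given earlier for Rate (-10 to +10) and Volume (0 to 100). It is an illustration only: VoiceSettings and its members are hypothetical names, not part of SAPI or of the original program.

```csharp
using System;

public static class VoiceSettings
{
    // Ranges stated earlier for the Rate and Volume properties of SpVoice
    public const int MinRate = -10;
    public const int MaxRate = 10;
    public const int MinVolume = 0;
    public const int MaxVolume = 100;

    // Returns the rate unchanged if it is in range, otherwise the nearest limit
    public static int ClampRate(int rate)
    {
        return Math.Max(MinRate, Math.Min(MaxRate, rate));
    }

    // Returns the volume unchanged if it is in range, otherwise the nearest limit
    public static int ClampVolume(int volume)
    {
        return Math.Max(MinVolume, Math.Min(MaxVolume, volume));
    }
}
```

The form could then assign VoiceSettings.ClampRate(value) to the Rate property of its SpVoice field and VoiceSettings.ClampVolume(value) to the Volume property, where value comes from whatever control the user adjusts.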

Now, use SAPI to instantly make your software articulate.
