Let Java speak: implementing a speech engine in Java

Source: Internet
Author: User

2005-11-07 10:04:09

Category: Java technology

What are the benefits of adding voice capabilities to your application? Roughly speaking, it is fun, and it suits all kinds of entertaining applications, such as games. From a more serious point of view, it is also a matter of usability. I am thinking not only of the inherent shortcomings of visual interfaces, but also of situations where it is inconvenient, or even illegal, to take your eyes off what you are doing. For example, with a voice-enabled browser you could go for a walk or drive to work while listening to your favorite websites.

Right now, a mail reader may be a more practical application of voice technology, and the JavaMail API makes it entirely feasible. The mail reader could check your inbox at regular intervals and then get your attention by announcing "You have new mail ...". In a similar way, you could connect a voice-enabled reminder to a calendar application, so that it prompts you with "Don't forget your meeting with the boss in ... minutes!".

Perhaps these ideas appeal to you, or you have better ideas of your own, so let's move on. First I will show you how to use the speech engine provided with this article, so that if you find its implementation details too complex, you can simply use it and ignore how it works.

I. Trying out the speech engine

To use this speech engine, include the Javatalk.jar file provided with this article in your classpath, then either run the com.lotontech.speech.Talker class from the command line or call it from a Java program. From the command line, the command is:

java com.lotontech.speech.Talker "h|e|l|oo"

From a Java program, the code is:

com.lotontech.speech.Talker talker = new com.lotontech.speech.Talker();
talker.sayPhoneWord("h|e|l|oo");

You may be puzzled by the "h|e|l|oo" string supplied on the command line (or passed to the sayPhoneWord() method). Let me explain.

The speech engine works by joining together short sound samples, each of which is one of the smallest units of spoken (English) pronunciation. These sound samples are called allophones. Each allophone corresponds to one, two, or three letters. As the string for "hello" above shows, the pronunciation of some letter combinations is obvious, while that of others is not:

h -- pronounced as you would expect

e -- pronounced as you would expect

l -- pronounced as you would expect, but note that the two "l"s have been reduced to a single "l"

oo -- pronounced as in "hello", not as in "bot" or "too"
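To make the structure of such a string concrete, here is a minimal sketch of how it breaks down into allophone names and sound-file paths. The class and method names are hypothetical; the "|" separator and the "/allophones/<name>.au" path pattern are taken from the engine code shown later in this article.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

// Hypothetical helper, not part of the article's Talker class.
final class AllophoneSplitSketch {

    // Splits a string such as "h|e|l|oo" into its allophone names.
    static List<String> split(String phoneWord) {
        List<String> names = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(phoneWord, "|", false);
        while (st.hasMoreTokens()) {
            names.add(st.nextToken());
        }
        return names;
    }

    // Builds the resource path of the sound sample for one allophone.
    static String resourcePath(String name) {
        return "/allophones/" + name + ".au";
    }
}
```

Under these assumptions, "h|e|l|oo" yields four names, and each name maps to exactly one .au file.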

The following is a list of the valid allophones:

a: as in cat
b: as in cab
c: as in cat
d: as in dot
e: as in bet
f: as in frog
g: as in frog
h: as in hog
i: as in pig
j: as in jig
k: as in keg
l: as in leg
m: as in met
n: as in begin
o: as in not
p: as in pot
r: as in rot
s: as in sat
t: as in sat
u: as in put
v: as in ...
w: as in wet
y: as in yet
z: as in zoo
aa: as in fake
ay: as in hay
ee: as in bee
ii: as in high
oo: as in go
bb: a variation of b with different emphasis
dd: a variation of d with different emphasis
ggg: a variation of g with different emphasis
hh: a variation of h with different emphasis
ll: a variation of l with different emphasis
nn: a variation of n with different emphasis
rr: a variation of r with different emphasis
tt: a variation of t with different emphasis
yy: a variation of y with different emphasis
ar: as in car
aer: as in care
ch: as in which
ck: as in check
ear: as in beer
er: as in later
err: as in later (longer sound)
ng: as in feeding
or: as in law
ou: as in zoo
ouu: as in zoo (longer sound)
ow: as in cow
oy: as in boy
sh: as in shut
th: as in thing
dth: as in this
uh: a variation of u
wh: as in where
en: as in Asian
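Because an unknown name simply has no sound sample, it can help to check a string before passing it to the engine. The following is a sketch only: the class name is hypothetical, the set is copied from the list above, and names are compared case-insensitively on the assumption that upper case only marks intonation, as described in the next paragraphs.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical validator, not part of the article's engine.
final class AllophoneValidatorSketch {

    // The allophone names from the list above.
    private static final Set<String> VALID = new HashSet<>(Arrays.asList(
        "a", "b", "c", "d", "e", "f", "g", "h", "i", "j", "k", "l", "m",
        "n", "o", "p", "r", "s", "t", "u", "v", "w", "y", "z",
        "aa", "ay", "ee", "ii", "oo",
        "bb", "dd", "ggg", "hh", "ll", "nn", "rr", "tt", "yy",
        "ar", "aer", "ch", "ck", "ear", "er", "err", "ng", "or",
        "ou", "ouu", "ow", "oy", "sh", "th", "dth", "uh", "wh", "en"));

    // Returns true if every "|"-separated token is a known allophone.
    static boolean isValid(String phoneWord) {
        if (phoneWord == null || phoneWord.isEmpty()) {
            return false;
        }
        for (String token : phoneWord.split("\\|")) {
            if (!VALID.contains(token.toLowerCase(Locale.ROOT))) {
                return false;
            }
        }
        return true;
    }
}
```

Note that "q" and "x" are not in the list, so a string such as "q|u" would be rejected and has to be spelled with other allophones.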

When people speak, the pitch of the voice rises and falls over the course of a sentence. These changes in intonation make speech sound more natural and more expressive, and they allow questions to be distinguished from statements. Consider the following two sentences:

It is fake -- f|aa|k

Is it fake? -- f|AA|k

As you may have guessed, the way to raise the intonation is to use uppercase letters.
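As a small illustration of that convention, the sketch below raises the intonation of one allophone in a string by converting it to upper case. The class and method names are hypothetical and only manipulate the string; the engine itself is unchanged.

```java
import java.util.Locale;

// Hypothetical helper for marking raised intonation.
final class IntonationSketch {

    // Returns phoneWord with the allophone at the given position in upper case.
    static String raise(String phoneWord, int index) {
        String[] tokens = phoneWord.split("\\|");
        if (index < 0 || index >= tokens.length) {
            throw new IllegalArgumentException("no allophone at position " + index);
        }
        tokens[index] = tokens[index].toUpperCase(Locale.ROOT);
        return String.join("|", tokens);
    }
}
```

For example, raise("f|aa|k", 1) turns the statement form of "fake" into the question form.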

That is all you need to know to use the software. If you are interested in how it is implemented behind the scenes, read on.

II. Implementing the speech engine

The implementation of the speech engine consists of just one class with four methods. It uses the Java Sound API included in J2SE 1.3. I will not give a comprehensive introduction to that API here, but you can learn how to use it by example. The Java Sound API is not especially complicated, and the comments in the code tell you what you need to know.

The following is the basic definition of the Talker class:

package com.lotontech.speech;

import javax.sound.sampled.*;
import java.io.*;
import java.util.*;
import java.net.*;

public class Talker
{
    // The output line that all sound samples are written to
    private SourceDataLine line = null;
}

If you run Talker from the command line, the following main() method serves as the entry point. It takes the first command-line argument and passes it to the sayPhoneWord() method:

/*
 * Speaks the phoneme string given on the command line.
 */
public static void main(String args[])
{
    Talker player = new Talker();
    if (args.length > 0) player.sayPhoneWord(args[0]);
    System.exit(0);
}

The sayPhoneWord() method can be called either from the main() method above or directly from a Java program. It looks more complicated than it really is. In fact, it simply steps through the allophones of the word (separated by "|" in the input string) and plays them one after another through a sound output channel. To make the result sound more natural, I merge the end of each sound sample with the beginning of the next one:

/*
 * Speaks the given phoneme string.
 */
public void sayPhoneWord(String word)
{
    // Byte array holding the previous sound, kept back so it can be merged
    byte[] previousSound = null;

    // Split the input string into separate allophones
    StringTokenizer st = new StringTokenizer(word, "|", false);

    while (st.hasMoreTokens())
    {
        // Construct the file name for this allophone
        String thisPhoneFile = st.nextToken();
        thisPhoneFile = "/allophones/" + thisPhoneFile + ".au";

        // Read the data from the sound file
        byte[] thisSound = getSound(thisPhoneFile);

        if (previousSound != null)
        {
            // Merge the previous allophone with this one, if possible
            int mergeCount = 0;
            if (previousSound.length >= 500 && thisSound.length >= 500)
                mergeCount = 500;
            for (int i = 0; i < mergeCount; i++)
            {
                previousSound[previousSound.length - mergeCount + i]
                    = (byte) ((previousSound[previousSound.length
                        - mergeCount + i] + thisSound[i]) / 2);
            }

            // Play the previous allophone
            playSound(previousSound);

            // Keep the truncated current allophone as the new previous one
            byte[] newSound = new byte[thisSound.length - mergeCount];
            for (int ii = 0; ii < newSound.length; ii++)
                newSound[ii] = thisSound[ii + mergeCount];
            previousSound = newSound;
        }
        else
            previousSound = thisSound;
    }

    // Play the final allophone and drain the sound channel
    playSound(previousSound);
    drain();
}
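The merging step is easier to follow in isolation. The sketch below pulls it into a standalone method with a configurable overlap instead of the fixed 500 bytes; the class and method names are hypothetical, and the byte-by-byte averaging mirrors what sayPhoneWord() does. For 16-bit samples this is only an approximation of averaging whole samples, which a more careful version might do instead.

```java
import java.util.Arrays;

// Hypothetical standalone version of the merge step in sayPhoneWord().
final class OverlapMergeSketch {

    // Averages the last `overlap` bytes of prev with the first `overlap`
    // bytes of next (modifying prev in place) and returns the part of next
    // that was not merged. If either array is shorter than `overlap`,
    // nothing is merged and a copy of next is returned.
    static byte[] mergeTail(byte[] prev, byte[] next, int overlap) {
        int n = (prev.length >= overlap && next.length >= overlap) ? overlap : 0;
        for (int i = 0; i < n; i++) {
            int at = prev.length - n + i;
            prev[at] = (byte) ((prev[at] + next[i]) / 2);
        }
        return Arrays.copyOfRange(next, n, next.length);
    }
}
```

With prev = {10, 20, 30, 40}, next = {20, 0, 5, 6} and an overlap of 2, prev becomes {10, 20, 25, 20} and the method returns {5, 6}, so the total length played shrinks by the size of the overlap.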

As you can see, sayPhoneWord() calls playSound() to output a single sound sample (that is, one allophone) and finally calls drain() to flush the sound channel. Here is the code for playSound():

/*
 * Plays one sound sample.
 */
private void playSound(byte[] data)
{
    if (data.length > 0) line.write(data, 0, data.length);
}

Here is the code for drain():

/*
 * Flushes the sound channel.
 */
private void drain()
{
    if (line != null) line.drain();
    // The sleep duration is missing from the source text; 100 ms is an assumed value.
    try { Thread.sleep(100); } catch (Exception e) {}
}

Looking back at sayPhoneWord(), there is one more method that we have not yet examined: getSound().

The getSound() method reads a pre-recorded sound sample from an .au file as byte data. For the details of reading the data, converting the audio format, initializing the sound output line (SourceDataLine), and building the byte array, see the comments in the following code:

/*
 * Reads an allophone from a file and converts it to a byte array.
 */
private byte[] getSound(String fileName)
{
    try
    {
        URL url = Talker.class.getResource(fileName);
        AudioInputStream stream = AudioSystem.getAudioInputStream(url);
        AudioFormat format = stream.getFormat();

        // Convert an ALAW/ULAW sound to PCM for playback
        if ((format.getEncoding() == AudioFormat.Encoding.ULAW) ||
            (format.getEncoding() == AudioFormat.Encoding.ALAW))
        {
            AudioFormat tmpFormat = new AudioFormat(
                AudioFormat.Encoding.PCM_SIGNED,
                format.getSampleRate(), format.getSampleSizeInBits() * 2,
                format.getChannels(), format.getFrameSize() * 2,
                format.getFrameRate(), true);
            stream = AudioSystem.getAudioInputStream(tmpFormat, stream);
            format = tmpFormat;
        }

        DataLine.Info info = new DataLine.Info(
            Clip.class, format,
            ((int) stream.getFrameLength() * format.getFrameSize()));

        if (line == null)
        {
            // The output line has not been created yet.
            // Is a suitable kind of output line available?
            DataLine.Info outInfo = new DataLine.Info(SourceDataLine.class,
                format);
            if (!AudioSystem.isLineSupported(outInfo))
            {
                System.out.println("Line matching " + outInfo + " not supported");
                throw new Exception("Line matching " + outInfo + " not supported");
            }

            // Open the output line
            line = (SourceDataLine) AudioSystem.getLine(outInfo);
            line.open(format, 50000);
            line.start();
        }

        int frameSizeInBytes = format.getFrameSize();
        int bufferLengthInFrames = line.getBufferSize() / 8;
        int bufferLengthInBytes = bufferLengthInFrames * frameSizeInBytes;
        byte[] data = new byte[bufferLengthInBytes];

        // Read the data bytes and count them
        int numBytesRead = 0;
        if ((numBytesRead = stream.read(data)) != -1)
        {
            int numBytesRemaining = numBytesRead;
        }

        // Trim the byte array to the number of bytes actually read
        byte[] newData = new byte[numBytesRead];
        for (int i = 0; i < numBytesRead; i++)
            newData[i] = data[i];

        return newData;
    }
    catch (Exception e)
    {
        // Any failure results in an empty (silent) sample
        return new byte[0];
    }
}
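Two steps of getSound() can be tried out without any audio hardware. The sketch below, with hypothetical class and method names, shows the format arithmetic of the ULAW/ALAW-to-PCM conversion (8-bit samples become 16-bit signed big-endian samples, so the frame size doubles) and a read loop that continues until the end of the stream. The loop is a suggested alternative, not what the article's code does: getSound() issues a single read() into a buffer derived from the line's buffer size, so a sample larger than that buffer would be cut short.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import javax.sound.sampled.AudioFormat;

// Hypothetical helpers illustrating two steps of getSound().
final class SoundLoadingSketch {

    // Derives the PCM playback format the same way getSound() does:
    // same rate and channels, doubled sample size and frame size.
    static AudioFormat toPcm(AudioFormat source) {
        return new AudioFormat(
            AudioFormat.Encoding.PCM_SIGNED,
            source.getSampleRate(),
            source.getSampleSizeInBits() * 2,
            source.getChannels(),
            source.getFrameSize() * 2,
            source.getFrameRate(),
            true); // big-endian
    }

    // Reads a stream to its end, however many read() calls that takes.
    static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
        }
        return out.toByteArray();
    }
}
```

Since AudioInputStream is an InputStream, readAll() could be applied to the converted stream in place of the single read() call, at the cost of holding the whole sample in memory.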

That is all the code: a speech synthesizer in about 150 lines, comments included.

III. Text-to-speech conversion

Specifying the words to be spoken as allophones seems overly complicated. If you want to build an application that can read text aloud (such as a web page or an email), you want to be able to supply the original text directly.

After looking into this problem, I have included an experimental text-to-speech conversion class in the zip file provided later in this article. When you run this class, it prints the result of its analysis. It can be run from the command line as follows:

java com.lotontech.speech.Converter "hello there"

The output looks like this:

hello -> h|e|l|oo

there -> dth|aer

If you run the following command:

java com.lotontech.speech.Converter "I like to read JavaWorld"

The output is:

i -> ii

like -> l|ii|k

to -> t|ouu

read -> r|ee|a|d

java -> j|a|v|a

world -> w|err|l|d

How does this conversion class work? My approach is fairly simple: the conversion applies a set of text substitution rules in a fixed order. For example, for the words "ant", "want", "wanted", "unwanted", and "unique", the substitution rules we want to apply might be:

1. Replace "*unique*" with "|y|ou|n|ee|k|"

2. Replace "*want*" with "|w|o|n|t|"

3. Replace "*a*" with "|a|"

4. Replace "*e*" with "|e|"

5. Replace "*d*" with "|d|"

6. Replace "*n*" with "|n|"

7. Replace "*u*" with "|u|"

8. Replace "*t*" with "|t|"

For "unwanted", the sequence of outputs is:

unwanted

un[|w|o|n|t|]ed (rule 2)

[|u|][|n|][|w|o|n|t|][|e|][|d|] (rules 4, 5, 6, 7)

u|n|w|o|n|t|e|d (after removing the surplus characters)

You can see that words containing the letters "want" are pronounced differently from words containing the letters "ant". You can also see that the special-case rule for the complete word "unique" takes precedence over the other rules, so that word is spoken as "y|ou..." rather than "u|n...".
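The walk-through above can be sketched in code. This is not the article's Converter class, whose source is not shown here, but a minimal illustration of ordered substitution under stated assumptions: the class name and the segment bookkeeping are hypothetical, the rule table contains only the eight rules listed above, and letters that no rule covers are simply dropped. Converted pieces are marked as done, which plays the role of the square brackets in the walk-through and keeps later rules from touching them.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch of ordered rule substitution.
final class RuleConverterSketch {

    // A piece of the word: raw letters, or allophones that are finished.
    private static final class Segment {
        final String text;
        final boolean done;
        Segment(String text, boolean done) { this.text = text; this.done = done; }
    }

    // Rules in priority order (LinkedHashMap keeps insertion order).
    private static final Map<String, String> RULES = new LinkedHashMap<>();
    static {
        RULES.put("unique", "y|ou|n|ee|k");
        RULES.put("want", "w|o|n|t");
        RULES.put("a", "a");
        RULES.put("e", "e");
        RULES.put("d", "d");
        RULES.put("n", "n");
        RULES.put("u", "u");
        RULES.put("t", "t");
    }

    static String convert(String word) {
        List<Segment> segments = new ArrayList<>();
        segments.add(new Segment(word.toLowerCase(Locale.ROOT), false));
        for (Map.Entry<String, String> rule : RULES.entrySet()) {
            List<Segment> next = new ArrayList<>();
            for (Segment s : segments) {
                if (s.done) { next.add(s); continue; } // already converted
                String rest = s.text;
                int at;
                while ((at = rest.indexOf(rule.getKey())) >= 0) {
                    if (at > 0) next.add(new Segment(rest.substring(0, at), false));
                    next.add(new Segment(rule.getValue(), true));
                    rest = rest.substring(at + rule.getKey().length());
                }
                if (!rest.isEmpty()) next.add(new Segment(rest, false));
            }
            segments = next;
        }
        StringBuilder out = new StringBuilder();
        for (Segment s : segments) {
            if (!s.done) continue; // assumption: uncovered letters are dropped
            if (out.length() > 0) out.append('|');
            out.append(s.text);
        }
        return out.toString();
    }
}
```

Because "unique" and "want" are tried before the single-letter rules, convert("unwanted") produces u|n|w|o|n|t|e|d while convert("ant") produces a|n|t; swapping the order of the rules would change both results.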

Conclusion: This article provides a ready-to-run speech engine that you can use in your own Java 1.3 applications. If you study the code carefully, it also serves as a practical tutorial on playing audio clips with the Java Sound API. To make the engine truly useful, you should look into text-to-speech conversion, because that is the real foundation for the text-reading applications I mentioned earlier. To improve on the approach shown here, you will need to build a large base of substitution rules and carefully tune the order in which they are applied. I hope you have more perseverance than I do!
