What are the benefits of adding speech capabilities to an application? Put simply, it is fun, and it suits any entertaining application, such as a game. More seriously, it is also a matter of accessibility. What I have in mind here is not only the inherent limitations of a visual interface, but also situations in which taking your eyes off the task at hand is inconvenient or even illegal. For example, with a speech-enabled browser you could go for a walk or drive to work while listening to your favorite website.
At present, an email reader may be a more practical application of speech technology, and the JavaMail API makes it possible. The reader can check your inbox periodically and get your attention by saying "You have new mail, would you like me to read it to you?" Similarly, consider a talking reminder connected to a calendar application that warns you in good time: "Don't forget your meeting with the boss in 10 minutes!"
Perhaps these ideas appeal to you, or you have a better one of your own. Let's move on. First, I will show how to use the speech engine provided with this article. If you find the engine's implementation details too complicated, you can simply use it and ignore them.
I. Trying Out the Speech Engine
To use the speech engine, add the javatalk.jar file provided with this article to your CLASSPATH, and then run the com.lotontech.speech.Talker class from the command line (or call it from a Java program). From the command line, the command is:
java com.lotontech.speech.Talker "h|e|l|oo"
From a Java program, the code is:
com.lotontech.speech.Talker talker = new com.lotontech.speech.Talker();
talker.sayPhoneWord("h|e|l|oo");
You may be puzzled by the "h|e|l|oo" string passed on the command line (or to the sayPhoneWord() method). I will explain it now.
The speech engine works by concatenating short sound samples, each of which is the smallest unit of human (English) speech. These samples are called allophones. Each allophone corresponds to one, two, or three letters. As the phonetic representation of "hello" shows, the pronunciation of some letter combinations is obvious, while that of others is less so:
h -- the pronunciation is obvious
e -- the pronunciation is obvious
l -- the pronunciation is obvious, but note that the double "l" is reduced to a single "l"
oo -- pronounced as in "hello", not as in "bot" or "too"
The following is a list of the valid allophones:
a: as in cat
b: as in cab
c: as in cat
d: as in dot
e: as in bet
f: as in frog
g: as in frog
h: as in hog
i: as in pig
j: as in jig
k: as in keg
l: as in leg
m: as in met
n: as in begin
o: as in not
p: as in pot
r: as in rot
s: as in sat
t: as in sat
u: as in put
v: as in have
w: as in wet
y: as in yet
z: as in zoo
aa: as in fake
ay: as in hay
ee: as in bee
ii: as in high
oo: as in go
bb: variation of b with a different emphasis
dd: variation of d with a different emphasis
ggg: variation of g with a different emphasis
hh: variation of h with a different emphasis
ll: variation of l with a different emphasis
nn: variation of n with a different emphasis
rr: variation of r with a different emphasis
tt: variation of t with a different emphasis
yy: variation of y with a different emphasis
ar: as in car
aer: as in care
ch: as in which
ck: as in check
ear: as in beer
er: as in later
err: as in later (long sound)
ng: as in feeding
or: as in law
ou: as in zoo
ouu: as in zoo (long sound)
ow: as in cow
oy: as in boy
sh: as in shut
th: as in thing
dth: as in this
uh: variation of u
wh: as in where
zh: as in Asian
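Because a phone word is just a "|"-separated string, it is easy to mistype an allophone name. The following is a minimal sketch of a checker that reports tokens not found in the list above. The class PhoneWordChecker and its method unknownAllophones() are hypothetical helpers and are not part of the Talker class; ignoring case during the comparison is also an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.StringTokenizer;

/** Hypothetical helper, not part of the article's Talker class. */
final class PhoneWordChecker {
    // Allophone names copied from the article's list.
    private static final Set<String> VALID = new HashSet<>(Arrays.asList(
        "a", "b", "c", "d", "e", "f", "g", "h", "i", "j", "k", "l", "m",
        "n", "o", "p", "r", "s", "t", "u", "v", "w", "y", "z",
        "aa", "ay", "ee", "ii", "oo",
        "bb", "dd", "ggg", "hh", "ll", "nn", "rr", "tt", "yy",
        "ar", "aer", "ch", "ck", "ear", "er", "err", "ng", "or", "ou",
        "ouu", "ow", "oy", "sh", "th", "dth", "uh", "wh", "zh"));

    /** Returns the tokens of a "|"-separated phone word that are not in the list. */
    static List<String> unknownAllophones(String phoneWord) {
        List<String> unknown = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(phoneWord, "|", false);
        while (st.hasMoreTokens()) {
            String token = st.nextToken().trim();
            // Assumption of this sketch: names are compared case-insensitively.
            if (!VALID.contains(token.toLowerCase(Locale.ROOT))) {
                unknown.add(token);
            }
        }
        return unknown;
    }

    public static void main(String[] args) {
        System.out.println(unknownAllophones("h|e|l|oo"));     // prints []
        System.out.println(unknownAllophones("h|e|ll|ow|q"));  // prints [q]
    }
}
```

Running such a check before calling sayPhoneWord() would catch a bad name early, before the engine tries to load a sound file that does not exist.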
When a person speaks, the pitch of the voice varies over the course of a sentence. This intonation makes speech sound more natural and expressive, and it distinguishes questions from statements. Consider the following two sentences:
It is fake -- f|aa|k
Is it fake? -- f|AA|k
As you may have guessed, you raise the intonation by using uppercase letters.
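If you build phone words in code, the uppercase convention can be applied programmatically. The sketch below uppercases one chosen allophone in a "|"-separated phone word. The class Intonation and its method raise() are hypothetical names introduced only for illustration; how the engine itself treats uppercase names is not shown here.

```java
import java.util.Locale;

/** Hypothetical helper, not part of the article's Talker class. */
final class Intonation {
    /**
     * Returns the phone word with the allophone at the given
     * zero-based position written in uppercase.
     */
    static String raise(String phoneWord, int index) {
        String[] parts = phoneWord.split("\\|");
        if (index < 0 || index >= parts.length) {
            throw new IllegalArgumentException("No allophone at position " + index);
        }
        parts[index] = parts[index].toUpperCase(Locale.ROOT);
        return String.join("|", parts);
    }

    public static void main(String[] args) {
        // Turn the statement form of "fake" into the question form.
        System.out.println(raise("f|aa|k", 1)); // prints f|AA|k
    }
}
```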
That is all you need to know to use the software. If you are interested in how it is implemented behind the scenes, read on.
II. Speech Engine Implementation
The speech engine consists of a single class with four methods. It uses the Java Sound API included in J2SE 1.3. I will not cover that API in full here, but you can learn how it is used by example. The Java Sound API is not complicated, and the comments in the code tell you what you need to know.
The basic definition of the Talker class is as follows:
package com.lotontech.speech;
import javax.sound.sampled.*;
import java.io.*;
import java.util.*;
import java.net.*;
public class Talker
{
  private SourceDataLine line = null;
}
If you execute Talker from the command line, the following main () method runs as the entry point. The main () method gets the first command line parameter, and then passes it to the sayPhoneWord () method:
/*
 * Speaks the phone word (pronunciation string) given on the command line.
 */
public static void main(String args[])
{
  Talker player = new Talker();
  if (args.length > 0) player.sayPhoneWord(args[0]);
  System.exit(0);
}
The sayPhoneWord() method can be called either from the main() method above or directly from a Java program. It looks complicated, but it is not. It simply steps through the allophones of the word (separated by "|" in the input string) and plays them one after another through a sound output channel. To make the speech sound more natural, I merge the end of each sound sample with the beginning of the next:
/*
 * Speaks the specified phone word.
 */
public void sayPhoneWord(String word)
{
  // Placeholder byte array for the previous sound
  byte[] previousSound = null;
  // Split the input string into separate allophones
  StringTokenizer st = new StringTokenizer(word, "|", false);
  while (st.hasMoreTokens())
  {
    // Construct the file name for this allophone
    String thisPhoneFile = st.nextToken();
    thisPhoneFile = "/allophones/" + thisPhoneFile + ".au";
    // Read the data from the audio file
    byte[] thisSound = getSound(thisPhoneFile);
    if (previousSound != null)
    {
      // If possible, merge the previous allophone with the current one
      int mergeCount = 0;
      if (previousSound.length >= 500 && thisSound.length >= 500)
        mergeCount = 500;
      for (int i = 0; i < mergeCount; i++)
      {
        previousSound[previousSound.length - mergeCount + i]
          = (byte) ((previousSound[previousSound.length
          - mergeCount + i] + thisSound[i]) / 2);
      }
      // Play the previous allophone
      playSound(previousSound);
      // Use the truncated current allophone as the previous one
      byte[] newSound = new byte[thisSound.length - mergeCount];
      for (int ii = 0; ii < newSound.length; ii++)
        newSound[ii] = thisSound[ii + mergeCount];
      previousSound = newSound;
    }
    else
      previousSound = thisSound;
  }
  // Play the final allophone and flush the sound channel
  playSound(previousSound);
  drain();
}
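The merging step in sayPhoneWord() can be isolated to make the arithmetic easier to follow. The sketch below averages the tail of one sample with the head of the next and returns the remainder of the second sample. The class SampleMerge and its method mergeTail() are hypothetical names, and treating each byte as a signed 8-bit sample is an assumption about the audio data, not something the engine guarantees.

```java
import java.util.Arrays;

/** Hypothetical helper that isolates the overlap-averaging step. */
final class SampleMerge {
    /**
     * Averages the last count bytes of prev with the first count bytes
     * of next (writing the result into prev) and returns next without
     * those leading bytes. If either array is shorter than count,
     * nothing is merged and next is returned as a full copy.
     */
    static byte[] mergeTail(byte[] prev, byte[] next, int count) {
        if (count < 0 || count > prev.length || count > next.length) {
            count = 0; // too short to overlap, so skip merging
        }
        for (int i = 0; i < count; i++) {
            int p = prev.length - count + i;
            // Assumption: bytes are signed 8-bit samples, so a plain
            // average blends the two waveforms.
            prev[p] = (byte) ((prev[p] + next[i]) / 2);
        }
        return Arrays.copyOfRange(next, count, next.length);
    }

    public static void main(String[] args) {
        byte[] prev = {10, 20, 30, 40};
        byte[] next = {50, 60, 70, 80};
        byte[] rest = mergeTail(prev, next, 2);
        System.out.println(Arrays.toString(prev)); // prints [10, 20, 40, 50]
        System.out.println(Arrays.toString(rest)); // prints [70, 80]
    }
}
```

With a count of 500, as in sayPhoneWord(), the last 500 bytes of each allophone are blended with the first 500 bytes of the next, which softens the join between the two sounds.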
Looking at sayPhoneWord(), you can see that it calls playSound() to output an individual sound sample (that is, an allophone) and finally calls drain() to flush the sound channel. The following is the playSound() code:
/*
 * Plays a sound sample.
 */
private void playSound(byte[] data)
{
  if (data.length > 0) line.write(data, 0, data.length);
}
The following is the code for drain():
/*
 * Flushes the sound channel.
 */
private void drain()
{
  if (line != null) line.drain();
  try { Thread.sleep(100); } catch (Exception e) {}
}
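Both playSound() and drain() rely on a SourceDataLine that has already been opened. As a rough, standalone illustration of that write-then-drain pattern, the sketch below opens a line, writes a short generated tone, and drains it. The class LineDemo, its methods, and the 8 kHz, 8-bit, mono, signed PCM format are all assumptions made for this example; they are not taken from the Talker class, whose line setup is not shown here.

```java
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.DataLine;
import javax.sound.sampled.LineUnavailableException;
import javax.sound.sampled.SourceDataLine;

/** Hypothetical demo of writing to and draining a SourceDataLine. */
final class LineDemo {
    // Assumed format for this sketch: 8 kHz, 8-bit, mono, signed PCM.
    static final AudioFormat FORMAT = new AudioFormat(8000f, 8, 1, true, false);

    /** Builds a simple square wave with the given period in samples. */
    static byte[] tone(int samples, int period) {
        byte[] data = new byte[samples];
        for (int i = 0; i < samples; i++) {
            data[i] = (byte) ((i % period) < period / 2 ? 40 : -40);
        }
        return data;
    }

    /** Plays the data; returns false if no suitable audio line is available. */
    static boolean play(byte[] data) {
        DataLine.Info info = new DataLine.Info(SourceDataLine.class, FORMAT);
        SourceDataLine line = null;
        try {
            line = (SourceDataLine) AudioSystem.getLine(info);
            line.open(FORMAT);
            line.start();
            line.write(data, 0, data.length); // queue the samples
            line.drain();                     // block until they have played
            return true;
        } catch (LineUnavailableException | IllegalArgumentException e) {
            return false; // for example, a machine without audio output
        } finally {
            if (line != null) line.close();
        }
    }

    public static void main(String[] args) {
        // About half a second of a 400 Hz square wave at 8 kHz.
        System.out.println(play(tone(4000, 20)) ? "played" : "no audio line");
    }
}
```

The call to drain() matters because write() only queues data. Without it, the program could exit before the last samples reach the speaker.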