Application of XML in speech synthesis

Source: Internet
Author: User
The Internet and everything related to it now seem to be everywhere. You may have received an automated voice call from a telemarketer in the evening, or a prescription notification from a local drugstore. A newer technology now combines speech synthesis with XML to deliver spoken information.


Transmitting information by voice is nothing new; people have communicated this way for thousands of years. Receiving a call from a computer is not a new invention either. Many voice technologies are in common use today, from fax machines and automatic dialers to interactive voice response (IVR) systems. The telephone is, of course, the most common application.

Traditional voice systems build the speech we hear from pre-recorded samples, dictionaries, and sounds. This pre-recording approach has several problems, the most common being a lack of continuity and variation. If there is only one recording of each word or sound, it is difficult for a computer to deliver a question in a different intonation from an ordinary statement. It is equally difficult to tell the computer when to use a particular intonation, or which one to use.

To help solve these speech synthesis problems, the W3C has published a working draft of the Speech Synthesis Markup Language (SSML). This XML vocabulary lets voice browser developers control how a speech synthesizer renders text. For example, a developer can include an instruction that sets the volume at which a passage is spoken.
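As a rough illustration of the volume example, the sketch below embeds a small SSML document in Python and parses it with the standard library. The element and attribute names (`speak`, `prosody`, `volume`) and the namespace URI follow the later published SSML 1.0 Recommendation, which is an assumption here; the working draft discussed in this article may differ in detail, and the sentence text is invented.

```python
import xml.etree.ElementTree as ET

# Namespace used by the published SSML 1.0 Recommendation (assumed here).
SSML_NS = "http://www.w3.org/2001/10/synthesis"

# The root <speak> element wraps the text to be spoken;
# <prosody volume="loud"> asks the synthesizer to raise the volume.
ssml_volume = f"""<speak version="1.0" xmlns="{SSML_NS}" xml:lang="en-US">
  Your prescription is ready.
  <prosody volume="loud">Please pick it up today.</prosody>
</speak>"""

# Parsing confirms the markup is well-formed XML.
root = ET.fromstring(ssml_volume)
loud_part = root.find(f"{{{SSML_NS}}}prosody")
print(loud_part.get("volume"), "->", loud_part.text)
```

The parser here only checks well-formedness; an actual TTS engine would decide how loud "loud" is.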

SSML is based on JSpeech Markup Language (JSML), which grew out of an early research project at Sun. JSML in turn derives from the Java Speech API Markup Language. At the time of writing, SSML is a working draft from the W3C Voice Browser Working Group.

SSML is aimed primarily at the text-to-speech (TTS) processor. A TTS engine takes a piece of text and converts it to speech. Several kinds of TTS applications already exist, such as telephone-based speech synthesis and response systems, and more advanced systems designed for blind users. One of the main challenges for existing TTS systems is the inherent ambiguity in how a given piece of text should be pronounced. Other common problems involve particular categories of words, such as abbreviations (for example, HTML) and words whose spelling differs from their pronunciation (for example, subpoena).

The basic elements of SSML describe the structure of the text. Like HTML, SSML provides a paragraph element, but it goes a step further and also provides a sentence element. When the start and end of each sentence are marked explicitly, just as they are for a paragraph, the TTS engine can generate speech more accurately.
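The paragraph and sentence markup might look like the following sketch. The element names `p` and `s` are taken from the published SSML 1.0 Recommendation (an assumption relative to the draft described here), and the sentences are invented for illustration.

```python
import xml.etree.ElementTree as ET

# One paragraph (<p>) containing two explicitly delimited sentences (<s>).
ssml_structure = """
<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">
  <p>
    <s>Your order has shipped.</s>
    <s>It should arrive on Friday.</s>
  </p>
</speak>
"""

NS = {"ssml": "http://www.w3.org/2001/10/synthesis"}
root = ET.fromstring(ssml_structure)

# Because sentence boundaries are explicit, a processor does not have to
# guess them from punctuation.
sentences = [s.text for s in root.findall("ssml:p/ssml:s", NS)]
print(sentences)
```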

Beyond basic structure, SSML also lets you specify how a given word or phrase should be spoken. This is done with the "say-as" element, one of the most useful parts of SSML. It lets you name a pattern that describes how a word or phrase should be read. With "say-as", we can control the reading of abbreviations, or of words whose spelling differs from their pronunciation. We can also distinguish plain numbers from dates. The "say-as" element includes support for email addresses, currencies, and phone numbers.
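A minimal sketch of generating "say-as" markup follows. The helper `say_as` is hypothetical (it is not part of any library), and the attribute names `interpret-as` and `format` come from the published SSML 1.0 Recommendation; earlier drafts used different attribute names, and the set of accepted values such as `characters`, `date`, and `telephone` depends on the synthesizer.

```python
from xml.sax.saxutils import escape, quoteattr


def say_as(text, interpret_as, fmt=None):
    """Wrap text in a say-as element (hypothetical helper for illustration)."""
    attrs = f"interpret-as={quoteattr(interpret_as)}"
    if fmt is not None:
        attrs += f" format={quoteattr(fmt)}"
    # escape() protects characters such as & and < inside the spoken text.
    return f"<say-as {attrs}>{escape(text)}</say-as>"


# Spell an abbreviation letter by letter.
print(say_as("HTML", "characters"))
# Read a string as a date in year-month-day order rather than as numbers.
print(say_as("2024-03-05", "date", "ymd"))
# Read digits as a phone number.
print(say_as("555-0100", "telephone"))
```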

We can also supply a phonetic transcription for a piece of text. For example, we can use it to capture the difference between the American English and British English pronunciations of the word "potato".
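As a sketch of how such a transcription might be attached, the code below builds two `phoneme` elements. The element name and its `alphabet` and `ph` attributes follow the published SSML 1.0 Recommendation (assumed here), the helper `phoneme` is hypothetical, and the IPA strings are illustrative transcriptions rather than authoritative ones.

```python
import xml.etree.ElementTree as ET


def phoneme(word, ipa):
    """Build a phoneme element carrying an IPA transcription (hypothetical helper)."""
    element = ET.Element("phoneme", {"alphabet": "ipa", "ph": ipa})
    element.text = word  # fallback text for processors that ignore the transcription
    return element


# Same written word, two different transcriptions (illustrative values).
us = phoneme("potato", "pəˈteɪtoʊ")
uk = phoneme("potato", "pəˈteɪtəʊ")

for element in (us, uk):
    print(ET.tostring(element, encoding="unicode"))
```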

Several more advanced features of SSML help a TTS system produce a more natural, listener-friendly voice. The "voice" element lets you request a male, female, or neutral voice, as well as the age of the speaker. With it, we can ask for anything from the voice of a 4-year-old boy to that of a 75-year-old man.
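The voice selection described above could be sketched as follows. The `gender` and `age` attributes and the three gender values follow the published SSML 1.0 Recommendation (assumed here); the helper `voice` and the sample sentences are hypothetical, and the `speak` root omits its namespace and version attributes for brevity.

```python
import xml.etree.ElementTree as ET

GENDERS = {"male", "female", "neutral"}


def voice(text, gender, age):
    """Build a voice element requesting a speaker of the given gender and age."""
    if gender not in GENDERS:
        raise ValueError(f"gender must be one of {sorted(GENDERS)}")
    if age < 0:
        raise ValueError("age must be non-negative")
    element = ET.Element("voice", {"gender": gender, "age": str(age)})
    element.text = text
    return element


speak = ET.Element("speak")
speak.append(voice("I am four years old.", "male", 4))
speak.append(voice("And I am seventy-five.", "male", 75))

ssml_voices = ET.tostring(speak, encoding="unicode")
print(ssml_voices)
```

Whether a synthesizer can actually honor a given gender and age depends on the voices it has installed; the markup only expresses a request.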

We can wrap text that needs to be stressed in the "emphasis" element, and we can use the "break" element to tell the system where the voice should pause.
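Here is a small sketch combining both elements. The attribute names (`level` on `emphasis`, `time` on `break`) follow the published SSML 1.0 Recommendation, which is an assumption; earlier drafts named some of these attributes differently, and the sentence text is invented.

```python
import xml.etree.ElementTree as ET

SSML_NS = "http://www.w3.org/2001/10/synthesis"

# "twice" is stressed, and a half-second pause separates the two sentences.
ssml_emphasis = (
    f'<speak version="1.0" xmlns="{SSML_NS}" xml:lang="en-US">'
    'Take one tablet <emphasis level="strong">twice</emphasis> a day. '
    '<break time="500ms"/>'
    'Do not exceed the stated dose.'
    '</speak>'
)

root = ET.fromstring(ssml_emphasis)

# The markup adds delivery hints without changing the words themselves.
spoken_text = "".join(root.itertext())
print(spoken_text)
```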

The most advanced feature of SSML is its "prosody" element, which lets us control how a given piece of text is spoken. We can specify the pitch, the pitch range, and the speaking rate (words per minute). We can go into even more detail with "contour", which describes how the pitch changes at specific points across the text. By giving a contour for a piece of text, we can define more precisely how the speech should be generated.
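The prosody controls can be sketched as below. In the published SSML 1.0 Recommendation (assumed here), `pitch`, `range`, `rate`, and `contour` are all attributes of the `prosody` element, and a contour is a list of (position, pitch target) pairs. The helpers `contour` and `prosody` are hypothetical, and the values are examples only.

```python
from xml.sax.saxutils import escape, quoteattr


def contour(points):
    """Format (position_percent, pitch_target) pairs as a contour value."""
    return " ".join(f"({position}%,{target})" for position, target in points)


def prosody(text, **attrs):
    """Wrap text in a prosody element with the given attributes (hypothetical helper)."""
    rendered = " ".join(f"{name}={quoteattr(value)}" for name, value in attrs.items())
    return f"<prosody {rendered}>{escape(text)}</prosody>"


# A question delivered at a higher pitch and a slower rate than plain text.
question = prosody("Is your order correct?", pitch="high", rate="slow")

# A pitch contour: pitch targets at 0%, 10%, and 40% of the way through the text.
rising = prosody(
    "good morning",
    contour=contour([(0, "+20Hz"), (10, "+30%"), (40, "+10Hz")]),
)

print(question)
print(rising)
```

Generating the contour string from pairs keeps the positions and targets readable; a synthesizer that does not support contours is expected to fall back to its default intonation.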

