HTML5 Speech Synthesis API Introduction


When I needed to implement text-to-speech on the front end, my first thought was to use Baidu's TTS service, but I then discovered that HTML5 itself supports speech synthesis. Using HTML5 directly avoids Baidu's call-count limits and its fiddly configuration. So this article is about HTML5's Web Speech API.

HTML5 actually has two Web Speech-related APIs: Speech Recognition and Speech Synthesis. The names sound grand, but they simply mean "voice to text" and "text to voice" respectively.

This article introduces speech synthesis, that is, text to speech. Why is it called "synthesis"? When you have Siri pronounce "你好，世界" ("Hello, world!"), the pronunciations of the four characters 你, 好, 世 and 界 are merged together, hence the term "speech synthesis".

"Speech recognition" and "speech synthesis" look like both positive and negative aspects, should have a mirror temperament, in fact, at least from the compatibility point of view, the two cannot be directly equivalent.

"Speech recognition (Speech recognition)" is currently supported by default on Chrome browser and dead-brother Opera browser, and requires a webkit private prefix:

The compatibility of Speech Synthesis, however, is much better.

Therefore, this article mainly introduces the HTML5 Speech Synthesis API, which is the more practical of the two. But before that, let's briefly touch on the Speech Recognition API.

The Speech Recognition API provides speech recognition and needs a microphone or other audio input device. In Chrome, you could once give certain controls speech recognition just by adding a single attribute, without writing a line of JS. I covered this before in "Progressive use of HTML5 speech recognition, so easy!"

That is, you add the x-webkit-speech attribute to an input box, for example:

<input x-webkit-speech />

However, when I opened the demo page just now to test, the microphone icon (which used to be there) had disappeared... It seems to have been ruthlessly dropped by Chrome!

Well, never mind. One thing is certain: the input box's built-in voice recognition used the same Speech Recognition API, so they share some traits. For example, recognizing the text requires a round trip to Google's servers, so the feature depends heavily on the network environment. If Google is blocked by the firewall, or the network is slow, recognition may well fail.

The basic usage pattern is as follows:

    1. Create a new SpeechRecognition instance. Since browsers do not yet support it widely, the webkit prefix is required:
       var newRecognition = new webkitSpeechRecognition();
    2. Set whether reception closes once a phrase has been heard (the default) or keeps listening, via the continuous property. Chat-style input generally uses false; if you are dictating an article or a public-account post, set it to true, like so:
       newRecognition.continuous = true;
    3. Control the starting and stopping of recognition with the start() and stop() methods:
       // start
       newRecognition.start();
       // stop
       newRecognition.stop();
    4. Process the recognized results with event handlers such as onresult:
       newRecognition.onresult = function (event) { console.log(event); };

      event is an object. On my home computer, for some unknown reason, recognition never returns a result and a network error is shown, possibly because of the firewall.

      So I looked up the approximate data structure online:

       {
           ...
           results: {
               0: {
                   0: {
                       confidence: 0.695017397403717,
                       transcript: "Hello, World"
                   },
                   isFinal: true,
                   length: 1
               },
               length: 1
           },
           ...
       }

In addition to the result event, there are other events such as soundstart, speechstart and error.
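Putting the steps above together, here is a minimal sketch. It assumes Chrome with the webkit prefix and an unblocked network path to Google's recognition service; the 'zh-CN' language value is just an illustration:

    var newRecognition = new webkitSpeechRecognition();

    // Keep listening after the first phrase (e.g. for dictation).
    newRecognition.continuous = true;

    // Illustrative language setting.
    newRecognition.lang = 'zh-CN';

    newRecognition.onresult = function (event) {
        // event.results holds the results; each result contains one or
        // more alternatives, each with a transcript and a confidence.
        var result = event.results[event.results.length - 1];
        console.log(result[0].transcript, result[0].confidence);
    };

    newRecognition.onerror = function (event) {
        // e.g. 'network' when the recognition service is unreachable.
        console.log('recognition error: ' + event.error);
    };

    newRecognition.start();
    // Later, when done: newRecognition.stop();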

II. About the Speech Synthesis API

Let's start with the simplest example. If you want the browser to read out "Hello, world!", the following JS will do:

var utterThis = new window.SpeechSynthesisUtterance('Hello, world!');
window.speechSynthesis.speak(utterThis);

Yes, that little bit of code is enough. You can run the two lines above in your browser's console and see whether you hear anything.

The code above uses two long-named objects, SpeechSynthesisUtterance and speechSynthesis, which are the core of the Speech Synthesis API.

First is the SpeechSynthesisUtterance object, which is used to construct a speech synthesis instance, such as the utterThis object in the code above. We can pass the text to be read directly to the constructor:

var utterThis = new window.SpeechSynthesisUtterance('Hello, world!');

Or set it through the instance object's properties, which include:

    • text – the text to be synthesized, a string.
    • lang – the language to use, a string, e.g. "zh-CN".
    • voiceURI – the voice and service to use, a string.
    • volume – the volume, in the range 0 to 1, default 1.
    • rate – the speaking rate, a number; default 1, range 0.1 to 10, expressed as a multiple of normal speed. For example, 2 means twice normal speed.
    • pitch – the speaking pitch, a number from 0 (lowest) to 2 (highest), default 1.

So the code above can also be written as:

var utterThis = new window.SpeechSynthesisUtterance();
utterThis.text = 'Hello, world!';
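The other properties are set the same way. A small sketch, assuming the machine has a voice for the chosen language (the property values here are illustrative):

    var utterThis = new window.SpeechSynthesisUtterance();
    utterThis.text = '你好，世界！';
    utterThis.lang = 'zh-CN';  // language of the text
    utterThis.volume = 0.8;    // 0 to 1, default 1
    utterThis.rate = 2;        // twice normal speed
    utterThis.pitch = 1.5;     // 0 (lowest) to 2 (highest)
    window.speechSynthesis.speak(utterThis);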

Not only that, the instance object also exposes a number of event handlers:

    • onstart – callback when speech synthesis starts.
    • onpause – callback when speech synthesis is paused.
    • onresume – callback when speech synthesis resumes.
    • onend – callback when speech synthesis ends.
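As a minimal sketch, the following logs an utterance's lifecycle. The pause() and resume() methods it uses are described just below, and the timings are purely illustrative:

    var utterThis = new window.SpeechSynthesisUtterance('Hello, world!');

    utterThis.onstart = function () { console.log('started'); };
    utterThis.onpause = function () { console.log('paused'); };
    utterThis.onresume = function () { console.log('resumed'); };
    utterThis.onend = function () { console.log('finished'); };

    window.speechSynthesis.speak(utterThis);

    // Pause after one second, resume one second later.
    setTimeout(function () { window.speechSynthesis.pause(); }, 1000);
    setTimeout(function () { window.speechSynthesis.resume(); }, 2000);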

Next is the speechSynthesis object, whose main role is to trigger behavior such as speaking, stopping and resuming:

    • speak() – accepts only a SpeechSynthesisUtterance as its single parameter; it reads out the synthesized utterance.
    • cancel() – immediately terminates the synthesis process and clears the queue.
    • pause() – pauses the synthesis process.
    • resume() – resumes a paused synthesis process.
    • getVoices() – takes no parameters; returns an array of the voices the browser supports. On my computer, for example, Firefox returns just two voice packs.

      In Chrome, meanwhile, the number is astonishing.

      Plenty of voices, then, but most of them feel useless. In my Chrome browser, for reasons unknown, nothing was read aloud at all, even though the same demo worked on my company computer. After looking into it, a likely cause (maybe a 20% chance) is that my home machine's Win7 is a stripped-down build with no TTS engine installed or configured.

      Mobile Safari won't read aloud either.

      Of the Chrome voices, 17 are mainland Mandarin (Putonghua).

      In addition, getVoices fetches the list asynchronously, so if you call speechSynthesis.getVoices() directly in the console it may return an empty array. That's fine; try a few more times, or use a timer or the like.
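A more robust sketch therefore waits for the list before picking a voice. This assumes the standard voiceschanged event, which fires when the list becomes available; filtering on "zh-CN" is just an example:

    function speakWithMandarinVoice(text) {
        var voices = window.speechSynthesis.getVoices();
        // Pick the first Mandarin voice, if any; otherwise use the default.
        var voice = voices.filter(function (v) {
            return v.lang === 'zh-CN';
        })[0];

        var utterThis = new window.SpeechSynthesisUtterance(text);
        if (voice) {
            utterThis.voice = voice;
        }
        window.speechSynthesis.speak(utterThis);
    }

    if (window.speechSynthesis.getVoices().length) {
        speakWithMandarinVoice('你好，世界！');
    } else {
        // The voice list loads asynchronously, at least in Chrome.
        window.speechSynthesis.onvoiceschanged = function () {
            speakWithMandarinVoice('你好，世界！');
        };
    }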

III. What is the Speech Synthesis API good for?

Blind or visually impaired users often rely on assistive devices or software to access our web pages; the principle is to let the user perceive content by touching or focusing certain elements, which are then read aloud.

The Speech Synthesis API can bring convenience both to these users and to developers themselves. For visually impaired users, no additional software or devices need to be installed or purchased before they can access our products without hindrance.

For developers, accessibility work becomes more flexible: we do not necessarily have to conform fully to the WAI-ARIA accessibility specification (see my earlier article "WAI-ARIA Web application properties in full"), because we can have the browser directly synthesize whatever speech we want. For example, when VoiceOver reads certain tags it always tacks on the phrase "iconic content", which even for a professional like me is a bit stiff and hard to understand. If we do the speech synthesis ourselves, we can give feedback in plainer, more easily understood wording, and the experience should be better.

At least I will try to push this kind of work forward in some of our products in 2017.

Another use is QR code recognition: sometimes reading the decoded result with the naked eye is painful, so add a button that lets the user listen to it instead, as in the sketch below.
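A minimal sketch of that idea, assuming the QR code has already been decoded elsewhere; the #listen-button id and the decodedText variable are hypothetical:

    // Hypothetical: decodedText holds the output of a QR code decoder.
    var decodedText = 'https://example.com';

    document.querySelector('#listen-button').addEventListener('click', function () {
        var utterThis = new window.SpeechSynthesisUtterance(decodedText);
        window.speechSynthesis.speak(utterThis);
    });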

And so on ~
