Tips for AI Interaction: Integrating Online Speech Synthesis

Source: Internet
Author: User

Online speech synthesis converts text into speech over a network connection, so that a machine can talk to people in a human voice. The concept itself is easy to grasp, so this article uses the Android online demo to walk through the synthesis flow and some problems you are likely to run into.


Go to the SDK Download Center on the official website and download the online command word recognition SDK (the Android version is used here). You will find that the directory structure inside the package is very similar to that of the dictation SDK. Here is a small secret:


[Image: directory structure of the downloaded SDK package]


The Android SDK packages you download from the official website are in fact identical. Why? Because dictation, online synthesis, and online command word recognition are shipped in a single SDK package on our platform and are versioned and maintained together. The demo in the package likewise combines the demos for these three features (along with some others, such as semantic understanding and evaluation):


[Image: Java source files of the demo, outlined in red]

The Java files outlined in red are, from top to bottom: the command word recognition demo, the dictation demo, the evaluation demo, the demo's main entry screen, SpeechUtility initialization, the synthesis demo, and the semantic understanding demo. Note that the command word recognition, dictation, and synthesis demos all have corresponding local features. These are provided by the Yuji (语记) app, not by the offline SDK, so if Yuji is not installed on your phone, the local features shown in the demo are unavailable (error 21001: voice component not installed). For more on this, see the following post:


Explanation of error 21001: http://bbs.xfyun.cn/forum.php?mod=viewthread&tid=11724&fromuid=33982


Now look at the call flow in TtsDemo.java. Before any of this, do not forget the SpeechUtility initialization in SpeechApp.java.


[Image: call flow in TtsDemo.java]


**First, initialize the synthesizer object**


mTts = SpeechSynthesizer.createSynthesizer(TtsDemo.this, mTtsInitListener);


mTtsInitListener is the initialization listener. It is safer to perform the following steps only after the initialization callback reports success.
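The idea of waiting for the initialization callback can be sketched in plain Java. This is only an illustration under assumptions: `InitCallback` and `InitGate` are hypothetical stand-ins written for this example, not SDK classes, and treating code 0 as success is an assumption as well.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for an init listener; NOT the real SDK interface. */
interface InitCallback {
    void onInit(int code);
}

/** Hypothetical gate: queued work runs only after a successful init callback. */
final class InitGate implements InitCallback {
    /** Assumption for this sketch: 0 means initialization succeeded. */
    static final int SUCCESS = 0;

    private final List<Runnable> pending = new ArrayList<>();
    private boolean ready;
    private boolean failed;

    /** Runs the task now if init already succeeded, otherwise queues it. */
    void runWhenReady(Runnable task) {
        if (ready) {
            task.run();
        } else if (!failed) {
            pending.add(task);
        }
        // After a failed init the task is dropped on purpose.
    }

    @Override
    public void onInit(int code) {
        if (code != SUCCESS) {
            // Init failed: discard queued work instead of running it.
            failed = true;
            pending.clear();
            return;
        }
        ready = true;
        for (Runnable task : pending) {
            task.run();
        }
        pending.clear();
    }
}
```

In an app, the queued tasks would be the parameter setup and the call that starts synthesis. A failed initialization leaves them unexecuted, which is the safer behavior described above.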


**Then set the synthesis parameters**


// Clear all parameters
mTts.setParameter(SpeechConstant.PARAMS, null);
// Set parameters according to the synthesis engine
if (mEngineType.equals(SpeechConstant.TYPE_CLOUD)) {
    mTts.setParameter(SpeechConstant.ENGINE_TYPE, SpeechConstant.TYPE_CLOUD);
    // Set the online synthesis speaker
    mTts.setParameter(SpeechConstant.VOICE_NAME, voicer);
    if (!"neutral".equals(emot)) {
        // Currently only the speaker "Xiao Ai" supports setting an emotion
        // The "Xiao Ai" speaker is a paid feature; for details, contact: [email protected]
        mTts.setParameter(SpeechConstant.EMOT, emot);
    }
    // Set the synthesis speed
    mTts.setParameter(SpeechConstant.SPEED, mSharedPreferences.getString("speed_preference", "50"));
    // Set the synthesis pitch
    mTts.setParameter(SpeechConstant.PITCH, mSharedPreferences.getString("pitch_preference", "50"));
    // Set the synthesis volume
    mTts.setParameter(SpeechConstant.VOLUME, mSharedPreferences.getString("volume_preference", "50"));
}

// Set the audio stream type of the player
mTts.setParameter(SpeechConstant.STREAM_TYPE, mSharedPreferences.getString("stream_preference", "3"));
// Let synthesized audio interrupt music playback; defaults to true
mTts.setParameter(SpeechConstant.KEY_REQUEST_FOCUS, "true");
// Set the audio save path; PCM and WAV formats are supported.
// Saving to the SD card requires the WRITE_EXTERNAL_STORAGE permission.
// Note: the AUDIO_FORMAT parameter only takes effect in newer SDK versions
mTts.setParameter(SpeechConstant.AUDIO_FORMAT, "wav");
mTts.setParameter(SpeechConstant.TTS_AUDIO_PATH, Environment.getExternalStorageDirectory() + "/msc/tts.wav");


The comments describe the role of each parameter in more detail.

**Finally, start synthesis**

mTts.startSpeaking(text, mTtsListener);


Here text is the content you want to synthesize, and mTtsListener is the synthesis listener, whose callbacks report synthesis progress, completion, and so on.


**Frequently asked questions:**


1. Which speakers can the speaker parameter be set to?


The currently supported online speakers are listed in the following post:

"[Official] Synthesis speakers: online and offline support"

http://bbs.xfyun.cn/forum.php?mod=viewthread&tid=12012&fromuid=33982

(Source: Voice Cloud Community)


2. How long can the text to be synthesized be? What should I do if I want to synthesize a very long text?


The text to be synthesized is limited in length: a single request accepts at most 8,192 bytes of text. The maximum number of characters follows from the encoding of your text; for example, if every character takes 2 bytes, the limit is 4,096 characters. If your text is too long, split it into pieces that fit within the limit, synthesize them in separate requests, and then merge or play the resulting audio in order.
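Splitting by byte length can be sketched as follows. This is a minimal sketch, not part of the SDK: `TextChunker` is a hypothetical helper, and it assumes a charset that adds no byte-order mark per call (UTF-8 or GBK, for instance). It cuts on code point boundaries so that no multi-byte character is split in half.

```java
import java.nio.charset.Charset;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: splits text into chunks that fit a byte limit. */
final class TextChunker {
    private TextChunker() {}

    /**
     * Splits text into consecutive chunks, each at most maxBytes long when
     * encoded with the given charset. Assumes the charset emits no
     * byte-order mark per call (e.g. UTF-8 or GBK).
     */
    static List<String> split(String text, int maxBytes, Charset charset) {
        List<String> chunks = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int currentBytes = 0;
        for (int i = 0; i < text.length(); ) {
            int codePoint = text.codePointAt(i);
            String ch = new String(Character.toChars(codePoint));
            int size = ch.getBytes(charset).length;
            if (size > maxBytes) {
                throw new IllegalArgumentException("maxBytes is smaller than one character");
            }
            // Close the current chunk if this character would overflow it.
            if (currentBytes + size > maxBytes) {
                chunks.add(current.toString());
                current.setLength(0);
                currentBytes = 0;
            }
            current.append(ch);
            currentBytes += size;
            i += Character.charCount(codePoint);
        }
        if (current.length() > 0) {
            chunks.add(current.toString());
        }
        return chunks;
    }
}
```

Calling `TextChunker.split(text, 8192, StandardCharsets.UTF_8)` yields pieces that can be synthesized one after another. In practice you would probably prefer to cut at sentence boundaries within the limit, so that the pauses between pieces sound natural.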


3. Is the audio data returned by online synthesis uncompressed? Will it use a lot of data traffic?


As with audio upload during dictation, the audio data returned by the server during synthesis is compressed with Speex encoding, at a ratio of about 10:1. The server compresses the synthesized audio, and the client SDK decodes what it receives back into uncompressed raw audio. For mono audio at a 16 kHz sampling rate and 16-bit sample depth, raw audio takes about 32 KB per second, so one second of compressed audio is about 3 KB, which is not much. Note also that the Speex encoding used here is not standard Speex, so audio compressed one way cannot be decoded the other way.
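The arithmetic behind the 3 KB figure can be checked with a short calculation. The class and method names below are hypothetical, and the 10:1 ratio is the approximate figure quoted above, not a measured value.

```java
/** Hypothetical helper that reproduces the traffic estimate from the text. */
final class TtsTrafficEstimate {
    private TtsTrafficEstimate() {}

    /** Size of one second of uncompressed PCM audio, in bytes. */
    static long rawBytesPerSecond(int sampleRateHz, int bitsPerSample, int channels) {
        return (long) sampleRateHz * (bitsPerSample / 8) * channels;
    }

    /** Approximate size of one second of audio after compression, in bytes. */
    static long compressedBytesPerSecond(long rawBytesPerSecond, int compressionRatio) {
        return rawBytesPerSecond / compressionRatio;
    }

    public static void main(String[] args) {
        long raw = rawBytesPerSecond(16_000, 16, 1);         // 16 kHz, 16-bit, mono
        long compressed = compressedBytesPerSecond(raw, 10); // about 10:1
        System.out.println("raw: " + raw + " B/s, compressed: about " + compressed + " B/s");
        // prints "raw: 32000 B/s, compressed: about 3200 B/s"
    }
}
```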

