3.1. Shared Decision Tree Context Clustering (STC)
- STC [1] was originally proposed to avoid generating speaker-biased leaf nodes in the tree construction of an average voice model.
- Sure enough, the author first states where the STC technique comes from,
- and then briefly introduces what problem STC is meant to solve:
- avoiding speaker-biased leaf nodes during the construction of the average voice model's decision tree.
- Regarding the "speaker-biased leaf nodes" mentioned above, we should read reference [11] in detail, together with the doctoral thesis on adaptation I saw earlier, the one from the previous group that never explained this point clearly.
- In the conventional decision-tree-based context clustering for the average voice model, each leaf node does not always have the training data of all speakers, and some leaf nodes have only a few speakers' training data.
- That is, in the conventional decision-tree-based context clustering for the average voice model, not every leaf node has training data from all speakers; some leaf nodes have data from only a few speakers.
- The experimental results have shown that such speaker-biased leaf nodes degrade the naturalness of the speech synthesized from the adapted model.
- speaker-biased leaf nodes
- On the other hand, in STC, we use only the questions which can be applied to all speakers.
- For STC, we only use questions that can be applied to all speakers.
- A question here: aren't IBM and Helen both Cantonese corpora when put together? How could a question be usable on IBM's data but not on Helen's?
- Or perhaps I have misunderstood: the author may mean speakers of both English and Cantonese, i.e., he is applying STC across different languages.
- As a result, every node of the decision tree has the training data of all speakers, which leads to a speaker-unbiased average voice model.
- This is what is called the speaker-unbiased average voice model; a minimal sketch of the shared-question constraint follows below.
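To make the constraint concrete, here is a minimal Python sketch of how the STC question filter might look during clustering. Everything here (names, data layout) is my own hypothetical illustration, not the paper's code:

```python
# Minimal sketch of the STC question filter during context clustering.
# All names and data layouts are hypothetical, not the paper's code.
from collections import namedtuple

State = namedtuple("State", ["speaker", "context"])  # context: dict of features

def split(states, question):
    """Partition states by a yes/no context question."""
    yes = [s for s in states if question(s.context)]
    no = [s for s in states if not question(s.context)]
    return yes, no

def covers_all_speakers(states, speakers):
    """True if the node still holds data from every training speaker."""
    return {s.speaker for s in states} == speakers

def stc_usable_questions(states, questions, speakers):
    """Keep only questions whose split leaves BOTH children with data
    from every speaker -- the STC constraint against speaker bias."""
    usable = []
    for q in questions:
        yes, no = split(states, q)
        if covers_all_speakers(yes, speakers) and covers_all_speakers(no, speakers):
            usable.append(q)
    return usable
```

The actual clustering would then pick the best question from `usable` by likelihood gain, exactly as in ordinary decision-tree building; the only change STC makes is this filter on the candidate questions.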
================
3.2. Transform mapping based on language-independent decision tree using STC
- To use contextual information in the transform mapping between different languages, we must consider the language dependency of decision trees.
- This is also a question I have been considering: how to take contextual information into account when building the state mapping.
- What exactly is "contextual information"? Yu Quanjie, can you come up with an example yourself?
- The author gives a hint here about how to think about context when building the state mapping:
- the language dependency of the decision trees must be considered.
- In general, near the root node of the decision trees, there are language-independent properties between the two languages in terms of basic articulation manners such as vowel, consonant, and voiced/unvoiced sound.
- Near the root node of the decision tree, the two languages share language-independent properties,
- such as the basic manners of articulation:
- Vowels
- Consonants
- Voiceless/voiced
- Is that the case?
- I seem to recall that when I looked at the HTS training output, e.g., the model files under /trees/..., I did not find this pattern;
- or maybe I misread, so I can check this again later.
- On the other hand, near the leaf nodes, language-dependent properties frequently appear because some nodes are split using language-specific questions, e.g., "is the current phoneme a diphthong?"
- Near the leaf nodes, language-dependent attributes generally appear, since some nodes are split using language-specific questions.
- For example, "is the current phoneme a diphthong?" is a question peculiar to English; Cantonese would certainly not use it. (A toy illustration of shared vs. language-specific question sets follows below.)
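To make the shared vs. language-specific distinction concrete, a toy question set might look like the sketch below. The phone labels and the Cantonese tone question are my own illustrative guesses; only the diphthong example comes from the paper:

```python
# Toy context questions; each question maps a context dict to a bool.
# Phone labels and class sets are illustrative only.
VOWELS = {"aa", "iy", "ey"}
VOICED = {"aa", "iy", "ey", "b", "m"}
EN_DIPHTHONGS = {"ey", "ay", "oy"}

shared_questions = {
    "C-Vowel":  lambda c: c["phone"] in VOWELS,   # meaningful in both languages
    "C-Voiced": lambda c: c["phone"] in VOICED,   # meaningful in both languages
}
english_only = {
    "C-Diphthong": lambda c: c["phone"] in EN_DIPHTHONGS,  # the paper's example
}
cantonese_only = {
    "C-Tone3": lambda c: c.get("tone") == 3,      # lexical tone, Cantonese-specific
}
```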
- To alleviate the language mismatch in the transform mapping between the average voice models, we generate a transform mapping based on a language-independent decision tree constructed by STC.
- That is, we use STC to build a language-independent decision tree, and then use this tree to build the state mapping.
- Specifically, we use both average voice models of the input and output languages in the context clustering, and the transformation matrices for the average voice models are explicitly mapped to each other in the leaf nodes of the language-independent decision tree.
- The average voice models of English and Cantonese are put together for the clustering;
- in the language-independent decision tree, if states of the two languages fall into the same leaf node, those states are considered a mapped pair (see the sketch below).
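My reading of this pairing, as a hedged sketch: route every state of both languages down the same language-independent tree and pair up whatever lands in the same leaf. The `Node` and state layout here are hypothetical:

```python
# Sketch of pairing states through the shared, language-independent tree.
# The Node layout and state objects are hypothetical illustrations.
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class Node:
    question: Optional[Callable] = None   # None marks a leaf
    yes: Optional["Node"] = None
    no: Optional["Node"] = None
    leaf_id: Optional[int] = None

def leaf_of(node, context):
    """Descend the language-independent tree using a state's context."""
    while node.question is not None:
        node = node.yes if node.question(context) else node.no
    return node.leaf_id

def build_transform_mapping(root, input_states, output_states):
    """Group HMM states of both languages by the leaf they reach; states
    (and hence their adaptation transforms) sharing a leaf are mapped."""
    mapping = defaultdict(lambda: {"input": [], "output": []})
    for s in input_states:
        mapping[leaf_of(root, s.context)]["input"].append(s)
    for s in output_states:
        mapping[leaf_of(root, s.context)]["output"].append(s)
    return mapping
```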
- Constructing the tree, we split nodes from the root using only the questions that can be applied to all speakers of both languages.
- Build which tree? The language-independent decision tree.
- Building a tree requires a question set, so what is the question set here?
- The questions in the set must be applicable to both languages,
- i.e., the questions shared by the two languages.
- In this study, we control the tree size by introducing a weight to the stopping criterion based on the minimum description length (MDL) [13].
- We control the size of the tree by introducing a weight into the MDL-based stopping criterion.
- To avoid the effect of the language dependency, a smaller tree was constructed compared with that based on MDL.
- To avoid the effects of language dependency, a smaller tree is constructed than the one the plain MDL criterion would give; a sketch of the weighted stopping rule follows below.
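The paper does not spell out the weighted criterion, but assuming the usual MDL split test (likelihood gain vs. description-length penalty), a weight `alpha` would act roughly like this sketch; the exact penalty form is my reconstruction:

```python
import math

def accept_split(delta_log_lik, n_new_params, n_frames, alpha=1.0):
    """Weighted MDL stopping rule (my reconstruction): accept a split only
    when the likelihood gain beats alpha times the description-length
    penalty. alpha > 1 inflates the penalty, stops splitting earlier, and
    yields the smaller, more language-independent tree the paper wants."""
    penalty = 0.5 * n_new_params * math.log(n_frames)
    return delta_log_lik > alpha * penalty
```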
- Since the node splitting is based on the acoustic parameters of each node, the transform mapping is conducted using both the acoustic and contextual information, which is more desirable than the conventional state mapping based on KLD.
- Since node splitting is based on the acoustic parameters of each node,
- the state mapping is built using both acoustic features and contextual factors,
- which is better than the traditional KLD-based state mapping (sketched below for contrast).
- Well, the author slips here and is inconsistent: here it is called state mapping, while earlier it was transform mapping.
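For contrast, the conventional KLD-based state mapping the paper argues against might look roughly like the following: each output-language state is mapped to the acoustically nearest input-language state, with no contextual information at all. Single diagonal-covariance Gaussians are my simplification:

```python
import numpy as np

def kld_diag_gauss(mu1, var1, mu2, var2):
    """KL(p1 || p2) for diagonal-covariance Gaussians."""
    return 0.5 * np.sum(np.log(var2 / var1)
                        + (var1 + (mu1 - mu2) ** 2) / var2 - 1.0)

def kld_state_mapping(out_states, in_states):
    """Map each output-language state (mu, var) to the acoustically
    nearest input-language state by symmetric KLD; no contextual
    information is used, which is the weakness the paper addresses."""
    def sym(a, b):
        return kld_diag_gauss(*a, *b) + kld_diag_gauss(*b, *a)
    return {
        j: min(range(len(in_states)), key=lambda i: sym(out, in_states[i]))
        for j, out in enumerate(out_states)
    }
```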
- An appropriate size of the tree was experimentally examined in Sect. 4.3.
- A tree of appropriate size is examined experimentally in Section 4.3.
Reading paper "Transform Mapping Using Shared Decision Tree Context Clustering for HMM-based Cross-lingual Speech Synthesis" (3)