3.1. Shared Decision Tree Context Clustering (STC)
- STC [1] was originally proposed to avoid generating speaker-biased leaf nodes in the tree construction of an average voice model.
- Sure enough, the author first states where the STC technique comes from,
- and then briefly introduces what problem STC is meant to solve:
- avoiding speaker-biased leaf nodes during the construction of the average voice model's decision tree.
- Regarding the "speaker-biased leaf nodes" mentioned above, we should read reference [11] in detail, together with the doctoral thesis on adaptation I saw earlier, the one from the previous group that never explained this point clearly.
- In the conventional decision-tree-based context clustering for the average voice model, each leaf node does not always have the training data of all speakers, and some leaf nodes have only a few speakers' training data.
- That is, in the conventional decision-tree-based context clustering for the average voice model, not every leaf node has training data from all speakers; some leaf nodes have data from only a few speakers.
- The experimental results have shown that such speaker-biased leaf nodes degrade the naturalness of the speech synthesized from the adapted model.
- speaker-biased leaf nodes
- On the other hand, in STC, we use only the questions which can be applied to all speakers.
- For STC, we only use questions that can be applied to all speakers.
- A question here: aren't IBM and Helen both Cantonese corpora when put together? How could a question be usable on IBM's data but not on Helen's?
- Or perhaps I have misunderstood: the author may mean speakers of both English and Cantonese, i.e., he is applying STC across different languages.
- As a result, every node of the decision tree has the training data of all speakers, which leads to a speaker-unbiased average voice model.
- This is what is called the speaker-unbiased average voice model; a minimal sketch of the shared-question constraint follows below.
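To make the constraint concrete, here is a minimal Python sketch of how the STC question filter might look during clustering. Everything here (names, data layout) is my own hypothetical illustration, not the paper's code:

```python
# Minimal sketch of the STC question filter during context clustering.
# All names and data layouts are hypothetical, not the paper's code.
from collections import namedtuple

State = namedtuple("State", ["speaker", "context"])  # context: dict of features

def split(states, question):
    """Partition states by a yes/no context question."""
    yes = [s for s in states if question(s.context)]
    no = [s for s in states if not question(s.context)]
    return yes, no

def covers_all_speakers(states, speakers):
    """True if the node still holds data from every training speaker."""
    return {s.speaker for s in states} == speakers

def stc_usable_questions(states, questions, speakers):
    """Keep only questions whose split leaves BOTH children with data
    from every speaker -- the STC constraint against speaker bias."""
    usable = []
    for q in questions:
        yes, no = split(states, q)
        if covers_all_speakers(yes, speakers) and covers_all_speakers(no, speakers):
            usable.append(q)
    return usable
```

The actual clustering would then pick the best question from `usable` by likelihood gain, exactly as in ordinary decision-tree building; the only change STC makes is this filter on the candidate questions.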
================
3.2. Transform mapping based on language-independent decision tree using STC
- To use contextual information in the transform mapping between different languages, we must consider the language dependency of decision trees.
- This is also a question I have been considering: how to take contextual information into account when building the state mapping.
- What exactly is "contextual information"? Yu Quanjie, can you come up with an example yourself?
- The author gives a hint here about how to think about context when building the state mapping:
- the language dependency of the decision trees must be considered.
- In general, near the root node of the decision trees, there are language-independent properties between the two languages in terms of basic articulation manners such as vowel, consonant, and voiced/unvoiced sound.
- Near the root node of the decision tree, the two languages share language-independent properties,
- such as the basic manners of articulation:
- Vowels
- Consonants
- Voiceless/voiced
- Is that the case?
- I seem to recall that when I looked at the HTS training output, e.g., the model files under /trees/..., I did not find this pattern;
- or maybe I misread, so I can check this again later.
- On the other hand, near the leaf nodes, language-dependent properties frequently appear because some nodes are split using language-specific questions, e.g., "is the current phoneme a diphthong?"
- Near the leaf nodes, language-dependent attributes generally appear, since some nodes are split using language-specific questions.
- For example, "is the current phoneme a diphthong?" is a question peculiar to English; Cantonese would certainly not use it. (A toy illustration of shared vs. language-specific question sets follows below.)
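To make the shared vs. language-specific distinction concrete, a toy question set might look like the sketch below. The phone labels and the Cantonese tone question are my own illustrative guesses; only the diphthong example comes from the paper:

```python
# Toy context questions; each question maps a context dict to a bool.
# Phone labels and class sets are illustrative only.
VOWELS = {"aa", "iy", "ey"}
VOICED = {"aa", "iy", "ey", "b", "m"}
EN_DIPHTHONGS = {"ey", "ay", "oy"}

shared_questions = {
    "C-Vowel":  lambda c: c["phone"] in VOWELS,   # meaningful in both languages
    "C-Voiced": lambda c: c["phone"] in VOICED,   # meaningful in both languages
}
english_only = {
    "C-Diphthong": lambda c: c["phone"] in EN_DIPHTHONGS,  # the paper's example
}
cantonese_only = {
    "C-Tone3": lambda c: c.get("tone") == 3,      # lexical tone, Cantonese-specific
}
```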
- To alleviate the language mismatch in the transform mapping between the average voice models, we generate a transform mapping based on a language-independent decision tree constructed by STC.
- That is, we use STC to build a language-independent decision tree, and then use this tree to build the state mapping.
- Specifically, we use both average voice models of the input and output languages in the context clustering, and the transformation matrices for the average voice models are explicitly mapped to each other in the leaf nodes of the language-independent decision tree.
- The average voice models of English and Cantonese are put together for the clustering;
- in the language-independent decision tree, if states of the two languages fall into the same leaf node, those states are considered a mapped pair (see the sketch below).
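My reading of this pairing, as a hedged sketch: route every state of both languages down the same language-independent tree and pair up whatever lands in the same leaf. The `Node` and state layout here are hypothetical:

```python
# Sketch of pairing states through the shared, language-independent tree.
# The Node layout and state objects are hypothetical illustrations.
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class Node:
    question: Optional[Callable] = None   # None marks a leaf
    yes: Optional["Node"] = None
    no: Optional["Node"] = None
    leaf_id: Optional[int] = None

def leaf_of(node, context):
    """Descend the language-independent tree using a state's context."""
    while node.question is not None:
        node = node.yes if node.question(context) else node.no
    return node.leaf_id

def build_transform_mapping(root, input_states, output_states):
    """Group HMM states of both languages by the leaf they reach; states
    (and hence their adaptation transforms) sharing a leaf are mapped."""
    mapping = defaultdict(lambda: {"input": [], "output": []})
    for s in input_states:
        mapping[leaf_of(root, s.context)]["input"].append(s)
    for s in output_states:
        mapping[leaf_of(root, s.context)]["output"].append(s)
    return mapping
```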
- Constructing the tree, we split nodes from the root using only the questions that can be applied to all speakers of both languages.
- Build which tree? The language-independent decision tree.
- Building a tree requires a question set, so what is the question set here?
- The questions in the set must be applicable to both languages,
- i.e., the questions shared by the two languages.
- In this study, we control the tree size by introducing a weight to the stopping criterion based on the minimum description length (MDL) [13].
- We control the size of the tree by introducing a weight into the MDL-based stopping criterion.
- To avoid the effect of the language dependency, a smaller tree was constructed compared with that based on MDL.
- To avoid the effects of language dependency, a smaller tree is constructed than the one the plain MDL criterion would give; a sketch of the weighted stopping rule follows below.
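The paper does not spell out the weighted criterion, but assuming the usual MDL split test (likelihood gain vs. description-length penalty), a weight `alpha` would act roughly like this sketch; the exact penalty form is my reconstruction:

```python
import math

def accept_split(delta_log_lik, n_new_params, n_frames, alpha=1.0):
    """Weighted MDL stopping rule (my reconstruction): accept a split only
    when the likelihood gain beats alpha times the description-length
    penalty. alpha > 1 inflates the penalty, stops splitting earlier, and
    yields the smaller, more language-independent tree the paper wants."""
    penalty = 0.5 * n_new_params * math.log(n_frames)
    return delta_log_lik > alpha * penalty
```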
- Since the node splitting is based on the acoustic parameters of each node, the transform mapping is conducted using both the acoustic and contextual information, which is more desirable than the conventional state mapping based on KLD.
- Since node splitting is based on the acoustic parameters of each node,
- the state mapping is built using both acoustic features and contextual factors,
- which is better than the traditional KLD-based state mapping (sketched below for contrast).
- Well, the author slips here and is inconsistent: here it is called state mapping, while earlier it was transform mapping.
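For contrast, the conventional KLD-based state mapping the paper argues against might look roughly like the following: each output-language state is mapped to the acoustically nearest input-language state, with no contextual information at all. Single diagonal-covariance Gaussians are my simplification:

```python
import numpy as np

def kld_diag_gauss(mu1, var1, mu2, var2):
    """KL(p1 || p2) for diagonal-covariance Gaussians."""
    return 0.5 * np.sum(np.log(var2 / var1)
                        + (var1 + (mu1 - mu2) ** 2) / var2 - 1.0)

def kld_state_mapping(out_states, in_states):
    """Map each output-language state (mu, var) to the acoustically
    nearest input-language state by symmetric KLD; no contextual
    information is used, which is the weakness the paper addresses."""
    def sym(a, b):
        return kld_diag_gauss(*a, *b) + kld_diag_gauss(*b, *a)
    return {
        j: min(range(len(in_states)), key=lambda i: sym(out, in_states[i]))
        for j, out in enumerate(out_states)
    }
```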
- An appropriate size of the tree was experimentally examined in Sect. 4.3.
- A tree of appropriate size is examined experimentally in Section 4.3.
Reading paper "Transform Mapping Using Shared Decision Tree Context Clustering for HMM-based Cross-lingual Speech Synthesis" (3)