Read the paper "Transform Mapping Using Shared Decision Tree Context Clustering for HMM-based Cross-lingual Speech Synthesis" (3)


3.1. Shared decision tree context clustering (STC)

  1. STC [1] was originally proposed to avoid generating speaker-biased leaf nodes in the tree construction of an average voice model.
    1. Sure enough, the author states here where the STC technique comes from.
    2. Then a brief statement of what problem STC solves:
      1. during construction of the average voice model's decision tree, avoid producing speaker-biased leaf nodes.
      2. Regarding the "speaker bias of leaf nodes" mentioned above, we should read reference [11] in detail, as well as the doctoral thesis on speaker adaptation seen earlier (the one from the previous group that never explained this clearly).
  2. In the conventional decision-tree-based context clustering for the average voice model, each leaf node does not always have the training data of all speakers, and some leaf nodes have only a few speakers' training data.
    1. In conventional decision-tree-based context clustering for the average voice model, each leaf node does not necessarily contain training data from every speaker; some leaf nodes have data from only a few speakers.
  3. The experimental results have shown that such speaker-biased leaf nodes degrade the naturalness of the speech synthesized from the adapted model.
      1. speaker-biased leaf nodes
  4. On the other hand, in STC, we use only the questions which can be applied to all speakers.
    1. For STC, we only use questions that can be applied to all speakers.
    2. A question here: aren't the IBM and Helen corpora both Cantonese? How could a question apply to IBM's data but not to Helen's?
    3. Or maybe I am misunderstanding: the author here means speakers of both English and Cantonese, i.e., he is applying STC across the two languages.
  5. As a result, every node of the decision tree has the training data of all speakers, which leads to a speaker-unbiased average voice model.
    1. This is called the speaker-unbiased average voice model.
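To make the "speaker-biased leaf node" idea concrete, here is a toy sketch (my own illustration, not code from the paper; the leaf and speaker names are made up): a leaf is speaker-biased if it lacks training data from some training speakers, and under STC every leaf keeps data from all of them.

```python
def biased_leaves(leaf_to_speakers, all_speakers):
    """Return the leaves that do NOT contain data from every training speaker.

    leaf_to_speakers: dict mapping leaf id -> set of speakers whose training
                      frames reached that leaf during context clustering.
    """
    all_speakers = set(all_speakers)
    return {leaf for leaf, spk in leaf_to_speakers.items()
            if set(spk) != all_speakers}

# Conventional clustering: some leaves see only a subset of speakers.
conv = {"leaf0": {"A", "B", "C"}, "leaf1": {"A"}, "leaf2": {"B", "C"}}
# STC: every question used is answerable for all speakers, so every
# leaf keeps data from all of them.
stc = {"leaf0": {"A", "B", "C"}, "leaf1": {"A", "B", "C"}}

print(sorted(biased_leaves(conv, ["A", "B", "C"])))  # ['leaf1', 'leaf2']
print(sorted(biased_leaves(stc, ["A", "B", "C"])))   # []
```

The point of the sketch is only the invariant: with STC, `biased_leaves` is empty by construction.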

================


3.2. Transform mapping based on language-independent decision tree using STC

  1. To use contextual information in the transform mapping between different languages, we must consider the language dependency of decision trees.
    1. This is also a question I have been considering: how to take contextual information into account when building the state mapping.
    2. What exactly is "contextual information" here? Yu Quanjie, can you give an example yourself?
    3. The author gives a hint here on how to think about context when building the state mapping:
      1. the language dependency of the decision trees must be considered.
  2. In general, near the root node of the decision trees, there are language-independent properties between the two languages in terms of basic articulation manners such as vowel, consonant, and voiced/unvoiced sound.
    1. Near the root node of the decision tree, the properties are language-independent between the two languages,
    2. i.e., the basic articulation manners:
      1. Vowels
      2. Consonants
      3. Voiceless/voiced
    3. Is that the case?
    4. But when I looked at HTS-trained model files, e.g. the tree files under /trees/..., I did not find this pattern.
    5. Or maybe I read them wrong; this can be checked later.
  3. On the other hand, near the leaf nodes, there frequently appear language-dependent properties because some nodes are split using language-specific questions, e.g., "Is the current phoneme a diphthong?"
    1. Near the leaf nodes, language-dependent attributes generally appear, since some nodes are split using language-specific questions.
    2. For example, "Is the current phoneme a diphthong?" This question is specific to English; Cantonese does not have it.
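The restriction STC places on the question set can be sketched as a simple filter (my own illustration; the question names and the per-language applicability flags below are made up, not taken from the paper's actual English/Cantonese question files):

```python
# Hypothetical question set: for each question, whether it can be
# answered for data in each language.
QUESTIONS = {
    "C-Vowel":     {"en": True,  "yue": True},   # applicable in both languages
    "C-Voiced":    {"en": True,  "yue": True},
    "C-Diphthong": {"en": True,  "yue": False},  # treated as English-specific here
}

def language_independent_questions(questions):
    """Keep only the questions that can be applied in every language,
    which is what STC requires when building the shared tree."""
    return [q for q, langs in questions.items() if all(langs.values())]

print(language_independent_questions(QUESTIONS))  # ['C-Vowel', 'C-Voiced']
```

Splitting only with the surviving questions is what keeps every node of the shared tree populated by both languages.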
  4. To alleviate the language mismatch in the transform mapping between the average voice models, we generate a transform mapping based on a language-independent decision tree constructed by STC.
    1. We use STC to build a language-independent decision tree, and then use this tree to build the state mapping.
  5. Specifically, we use both average voice models of input and output languages in the context clustering, and the transformation matrices for the average voice models are explicitly mapped to each other in the leaf nodes of the language-independent decision tree.
    1. The average voice models of English and Cantonese are put together during the clustering,
    2. and in the language-independent decision tree, if the states of the two languages fall into the same leaf node, those two states are considered a mapped pair.
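The pairing described above can be sketched as follows (my own illustration; state and leaf names are invented): once the shared tree assigns every state of both languages to a leaf, the states sharing a leaf form the mapped pairs.

```python
def map_states_by_leaf(states_lang1, states_lang2):
    """Pair states of two languages that fall into the same leaf of the
    language-independent decision tree.

    states_langX: dict mapping state name -> leaf id assigned by the tree.
    Returns {leaf: (list of lang1 states, list of lang2 states)}.
    """
    mapping = {}
    for state, leaf in states_lang1.items():
        mapping.setdefault(leaf, ([], []))[0].append(state)
    for state, leaf in states_lang2.items():
        mapping.setdefault(leaf, ([], []))[1].append(state)
    return mapping

en  = {"en_s1": "leaf0", "en_s2": "leaf1"}
yue = {"yue_s1": "leaf0", "yue_s2": "leaf1"}
print(map_states_by_leaf(en, yue))
# {'leaf0': (['en_s1'], ['yue_s1']), 'leaf1': (['en_s2'], ['yue_s2'])}
```

In the paper it is the adaptation transformation matrices attached to these leaves, not the raw states, that get mapped to each other; the sketch only shows the leaf-sharing mechanism.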
  6. Constructing the tree, we split nodes from the root using only the questions that can be applied to all speakers of both languages.
    1. Build what tree? The language-independent decision tree.
    2. Building a tree requires a question set, so what is the question set here?
      1. Questions in the set must be applicable to both languages,
      2. i.e., the questions shared by the two languages.
  7. In this study, we control the tree size by introducing a weight to the stopping criterion based on the minimum description length (MDL) [13].
    1. We control the size of the tree by introducing a weight into the MDL-based stopping criterion.
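A minimal sketch of the weighted stopping rule, assuming the usual MDL split test (the penalty form and numbers here are my own illustration, not taken verbatim from the paper): a split is kept only if its log-likelihood gain exceeds the weighted description-length penalty, so a weight above 1 rejects more splits and yields a smaller tree.

```python
import math

def accept_split(delta_loglik, n_added_params, total_frames, weight=1.0):
    """Weighted MDL stopping rule (sketch): accept a split only if the
    log-likelihood gain outweighs the weighted description-length penalty.
    The standard MDL penalty is (1/2) * (#added parameters) * log(#frames)."""
    penalty = 0.5 * n_added_params * math.log(total_frames)
    return delta_loglik > weight * penalty

# With the plain MDL criterion (weight 1) this split is accepted...
print(accept_split(120.0, 10, 10000, weight=1.0))   # True
# ...but with a larger weight the same split is rejected: smaller tree.
print(accept_split(120.0, 10, 10000, weight=3.0))   # False
```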
  8. To avoid the effect of the language dependency, a smaller tree is constructed compared with that based on the MDL.
    1. To avoid the effects of language dependency, a smaller tree is constructed than the one the plain MDL criterion would give.
  9. Since the node splitting is based on the acoustic parameters of each node, the transform mapping is conducted using both the acoustic and contextual information, which is more desirable than the conventional state mapping based on KLD.
    1. Since node splitting is based on the acoustic parameters of each node,
    2. the state mapping is built using both acoustic features and contextual factors,
    3. which is better than the traditional KLD-based state mapping.
    4. Well, the author slipped here and is inconsistent: this sentence says state mapping, while the earlier text says transform mapping.
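For contrast with the conventional baseline the notes mention, here is a sketch of KLD-based state mapping (my own illustration; the closed-form KL divergence for diagonal Gaussians is standard, but the state names and numbers are made up): each output-language state is mapped to the acoustically nearest input-language state, using no contextual information at all.

```python
import math

def kld_diag_gauss(mu1, var1, mu2, var2):
    """KL divergence KL(p||q) between two diagonal Gaussians, closed form."""
    return 0.5 * sum(
        math.log(v2 / v1) + (v1 + (m1 - m2) ** 2) / v2 - 1.0
        for m1, v1, m2, v2 in zip(mu1, var1, mu2, var2)
    )

def kld_state_mapping(states_out, states_in):
    """Conventional baseline (sketch): map each output-language state to the
    acoustically closest input-language state by KLD, ignoring context."""
    mapping = {}
    for name_o, (mu_o, var_o) in states_out.items():
        mapping[name_o] = min(
            states_in,
            key=lambda n: kld_diag_gauss(mu_o, var_o, *states_in[n]),
        )
    return mapping

yue = {"yue_s1": ([0.0, 0.0], [1.0, 1.0])}
en  = {"en_s1": ([0.1, 0.0], [1.0, 1.0]), "en_s2": ([5.0, 5.0], [1.0, 1.0])}
print(kld_state_mapping(yue, en))  # {'yue_s1': 'en_s1'}
```

The paper's point is that this purely acoustic matching can pair states with incompatible contexts, which the STC tree avoids.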
  10. An appropriate size of the tree was experimentally examined in Sect. 4.3.
    1. A tree of appropriate size; an experiment on this is given in Sect. 4.3.
