2.1 Decision Tree Marginalization
- I now understand the basic process of decision tree marginalization.
- A brief description:
- The decision tree here is an HMM speech synthesis decision tree.
- The given triphone label is: r-ih+z
- Then, based on the given triphone label, use the existing speech synthesis model to infer the speech recognition model.
- For the given triphone, walk down the existing speech synthesis decision tree, starting from the root node.
- The root node's question is: is the right context voiceless? The sound on the right is z, which is clearly voiced,
- so go to the left node, whose question is: is the syllable heavy? Hmm, the triphone's context information does not cover this question. What now?
- Since it cannot be answered, put the left child of this intermediate node (leaf G1) into the final recognition model,
- and then go to the right node, whose question is: is the right context a fricative? Yes, so go to the right leaf node (G3).
- Finally, the parameters of the recognition model for r-ih+z are computed by combining G1 and G3.
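The walk above can be sketched in code. This is only a minimal illustration of my reading of the procedure, not the implementation from the paper: the class names, the phone sets, the `syllable_heavy` key, and the extra leaves G2 and G4 are all assumptions I made up to complete the tree. The one rule that matters is that a question the context cannot answer sends the walk down both branches.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Union

# Assumed phone classes, just enough for this example.
VOICELESS = {"p", "t", "k", "f", "s", "sh", "th", "ch", "hh"}
FRICATIVES = {"f", "v", "s", "z", "sh", "zh", "th", "dh", "hh"}

# A question returns True/False, or None when the context lacks the feature.
Question = Callable[[Dict[str, object]], Optional[bool]]


@dataclass
class Leaf:
    name: str


@dataclass
class Node:
    question: Question
    no: "Tree"   # left branch in the notes
    yes: "Tree"  # right branch in the notes


Tree = Union[Leaf, Node]


def right_voiceless(ctx):
    return ctx["right"] in VOICELESS


def syllable_heavy(ctx):
    # A plain triphone carries no syllable information, so this is None.
    return ctx.get("syllable_heavy")


def right_fricative(ctx):
    return ctx["right"] in FRICATIVES


def parse_triphone(label: str) -> Dict[str, object]:
    """Split a label such as 'r-ih+z' into left, centre and right phones."""
    left, rest = label.split("-")
    centre, right = rest.split("+")
    return {"left": left, "centre": centre, "right": right}


def reachable_leaves(tree: Tree, ctx: Dict[str, object]) -> List[str]:
    """Collect every leaf the context can reach, following both branches
    whenever a question cannot be answered (the marginalization step)."""
    if isinstance(tree, Leaf):
        return [tree.name]
    answer = tree.question(ctx)
    if answer is None:
        return reachable_leaves(tree.no, ctx) + reachable_leaves(tree.yes, ctx)
    return reachable_leaves(tree.yes if answer else tree.no, ctx)


# Tree from the notes; G2 and G4 are hypothetical leaves added for completeness.
TREE = Node(
    right_voiceless,
    no=Node(
        syllable_heavy,
        no=Leaf("G1"),
        yes=Node(right_fricative, no=Leaf("G2"), yes=Leaf("G3")),
    ),
    yes=Leaf("G4"),
)

print(reachable_leaves(TREE, parse_triphone("r-ih+z")))  # ['G1', 'G3']
```

If the context did carry the syllable feature (for example `{"right": "z", "syllable_heavy": True}`), the walk would end in a single leaf, which is the ordinary synthesis case. How G1 and G3 are then weighted when they are combined is not stated in these notes.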
- I think I roughly understand how decision tree marginalization is used for cross-lingual adaptation.
- Is it that you first take one language, for example an English corpus, train an average voice model on it, and obtain the decision tree shown?
- Then, to get the model for another language, you walk through the English decision tree from its root node and obtain the Cantonese model.
- For example, given a Cantonese context label:
-jyu+6#sil+x$kei+4&0+0!0+0|0+ ... 0#0^0#0_0#0-0$0&0$0!0$0|
- Then, after traversing the English decision tree, the final model for the Cantonese syllable is a linear combination of the parameters of several English leaf nodes.
- The above is just my speculation and is not necessarily right.
- But what I do not understand is the principle behind it.
- Why is it that, when traversing the decision tree for a triphone, the child nodes of an intermediate node whose question is unrelated to the triphone's context information are included in the final parameter calculation for the recognition triphone?
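One possible reading, offered as a sketch rather than as the paper's exact formulation: the recognition triphone says nothing about the syllable feature, so that feature can be treated as an unknown variable and summed out with the law of total probability. The symbols below (c for the unseen context value, w_1 and w_3 for the weights) are my own notation, and how the weights are chosen is not covered in these notes.

```latex
p(o \mid \text{r-ih+z})
  = \sum_{c} P(c \mid \text{r-ih+z}) \, p(o \mid \text{r-ih+z}, c)
  = w_1 \, p(o \mid G_1) + w_3 \, p(o \mid G_3),
  \qquad w_1 + w_3 = 1
```

Here c ranges over the two possible answers to "is the syllable heavy?". For c = no the tree ends in G1, and for c = yes it ends in G3, which would explain why both children of the unanswerable node contribute to the recognition model.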
- Now that decision tree marginalization is clear, how can it be used for unsupervised intra-lingual speaker adaptation? What is the process?
5, "Speech recognition with speech synthesis models by marginalising over decision tree leaves" _1