- First of all, why read this paper?
- If I'm not mistaken, this paper does cross-lingual adaptation based on DNNs. DNNs are still very hot right now, so if DNNs can be used for cross-lingual adaptation, there is certainly a future in it.
- The paper mentions that training uses the Theano library, which I have touched a little before, training on GTX 690 GPUs; that is, they did not write the GPU code themselves.
- This paper does cross-lingual adaptation for ASR; let's see whether something can be borrowed from ASR for the synthesis side.
- Introduction
First paragraph:
In cross-lingual automatic speech recognition (ASR), models applied to a target language are enhanced using data from a different source language.
- I strongly feel that my previous reading was too narrow; I should read a wider range of papers. At least I now know that in cross-lingual adaptation for synthesis nobody has used DNNs yet, while in cross-lingual recognition many people have. If I can learn something from ASR, I can certainly produce a good paper.
- Suppose you now have 1000 sentences of Cantonese and train an ASR model to recognise Cantonese.
- If there is also a corpus of 1000 English sentences, English is called the source language and Cantonese the target language.
- Then this 1000-sentence English corpus is used to retrain the previously trained ASR model, which strengthens the model.
- In this scenario, the target language is typically low-resourced: transcribed acoustic training data for the target language may be difficult or expensive to acquire.
- There is very little target-language data,
- and it is very difficult to record training data in the target language.
- That is, target-language data is hard to obtain and only available in small amounts, while source-language data is very easy to obtain.
- The cross-lingual approach is motivated by the fact that the source language data, despite being mismatched to the target, may capture common properties of the acoustics of speech which are shared across languages, improving the generalisation of the final models to unseen speakers and conditions.
- What motivates the cross-lingual approach?
- The source-language data can capture common acoustic properties that are shared across languages.
- To put it another way, since that sentence reads awkwardly: although the source language (English) and the target language (Cantonese) are different languages, they still share some acoustic characteristics, while other acoustic characteristics are unique to each language.
- It is these acoustic characteristics shared between different languages that improve the generalisation of the final model.
Second paragraph:
- Cross-lingual ASR may be viewed as a form of adaptation.
- Cross-lingual ASR can be thought of as a form of adaptation.
- What does that mean?
- Adaptation is a broad concept, which includes:
- cross-lingual ASR
- cross-lingual synthesis
- .....
- In contrast to domain or speaker adaptation, the major problem with cross-lingual adaptation arises from the differences in phone sets between the source and target languages.
- Compared with domain adaptation or speaker adaptation,
- what causes the main problem in cross-lingual adaptation?
- It is caused by the differences between the phone sets of the source and target languages.
- Even when a universal phone set is used, it has been found that realisations of what are ostensibly the same phone still differ across languages [1].
- Even when a common phone set is used,
- realisations of what is ostensibly the same phone have still been found to differ across languages.
- In this paper, we focus on approaches where source and target languages are assumed not to share a phone set, which is probably a valid assumption when a small number of source languages is used, which is unlikely to provide complete phone coverage for an arbitrary target language.
- The authors' approach assumes that the source and target languages do not share a phone set.
- This is probably a valid assumption when only a small number of source languages is used, because in that case the source languages are unlikely to provide complete phone coverage for an arbitrary target language.
Third paragraph:
- Arguably the simplest approach to the problem of cross-lingual phone set mismatch is to define a deterministic mapping between source and target phone sets [2], which may be estimated in a data-driven fashion [3].
- There are several ways to solve the cross-lingual phone-set mismatch.
- One simple approach is to define a deterministic mapping between the source and target phone sets.
- This seems to be a commonly used method in synthesis too; isn't the state mapping I am doing now exactly this kind of thing?
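As a concrete illustration of what a data-driven deterministic mapping could look like (this is my own toy sketch, not the actual method of [2] or [3]; the phone labels and the counting scheme are invented): decode target-language speech with the source-language recogniser, then map each target phone to the source phone it is most often confused with.

```python
from collections import Counter, defaultdict

def estimate_phone_mapping(aligned_pairs):
    """Estimate a deterministic target->source phone mapping from
    frame-level (target_phone, source_phone) alignment pairs by picking,
    for each target phone, the most frequently co-occurring source phone.
    A toy data-driven scheme, only in the spirit of [3]."""
    counts = defaultdict(Counter)
    for tgt, src in aligned_pairs:
        counts[tgt][src] += 1
    return {tgt: c.most_common(1)[0][0] for tgt, c in counts.items()}

# Toy alignment: Cantonese-like target phones decoded with an English
# source-language recogniser (all labels here are made up).
pairs = [("oe", "er"), ("oe", "er"), ("oe", "ah"),
         ("ts", "ch"), ("ts", "ch"), ("ts", "s")]
mapping = estimate_phone_mapping(pairs)
print(mapping)  # {'oe': 'er', 'ts': 'ch'}
```

Once estimated, every target phone is simply relabelled as its mapped source phone, which is what makes the mapping "hard".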
However, this hard mapping leads to a loss of information from the target language acoustics that cannot be represented by a single source language phone.
- However, this forced mapping leads to a loss of information: some target-language acoustics cannot be represented by any single source-language phone.
An alternative is to learn a probabilistic mapping, in which the distribution of target phonemes is expressed over a feature space comprising source language phone posterior probability estimates, which may be formulated as a product-of-experts model [4] or as a KL-HMM [5].
- Another method is a probabilistic mapping:
- the distribution of each target phoneme is expressed over a feature space made up of source-language phone posterior probability estimates.
- The two concrete formulations are:
- the product-of-experts model
- the KL-HMM model
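A minimal sketch of the KL-HMM idea as I understand it (the function names and the toy numbers are my own, not from [5]): each target-language HMM state stores a categorical distribution over source-language phones, estimated here as the arithmetic mean of the source-phone posterior vectors aligned to that state, and incoming posterior frames are scored against it by KL divergence.

```python
import math

def estimate_state_distribution(posterior_frames):
    """KL-HMM style: the parameter of a target HMM state is a categorical
    distribution over source-language phones, estimated here as the
    arithmetic mean of the posterior vectors aligned to the state."""
    dim = len(posterior_frames[0])
    return [sum(f[d] for f in posterior_frames) / len(posterior_frames)
            for d in range(dim)]

def kl_divergence(p, q, eps=1e-10):
    """KL(p || q): the local score used to match an incoming posterior
    frame against a state's categorical distribution."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

# Toy setup: 3 source phones; two posterior frames aligned to one target state.
frames = [[0.7, 0.2, 0.1], [0.5, 0.3, 0.2]]
state_dist = estimate_state_distribution(frames)   # approx. [0.6, 0.25, 0.15]
close = kl_divergence([0.6, 0.25, 0.15], state_dist)
far = kl_divergence([0.1, 0.1, 0.8], state_dist)
assert close < far  # a matching frame scores a lower divergence
```

The appeal for the low-resource setting is that the source-language DNN supplies the posterior features, so only the small categorical distributions per target state need target-language data.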
Here, the source languages are viewed as defining a low-dimensional subspace in which to estimate target language models.
- That is, the source languages are viewed as defining a low-dimensional subspace, in which the models of the target language are estimated.
- This was the motivation behind the work of [6], where a subspace GMM (SGMM) was used, in which the source languages define a subspace of full covariance Gaussians.
- This was inspired by the work of [6], which used a subspace GMM (SGMM),
- where the source languages define a subspace of full-covariance Gaussians.
- There is a lot of mathematical detail involved here.
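To make "a subspace of full-covariance Gaussians" concrete, here is the standard SGMM parameterisation as I recall it from Povey et al.'s work (the notation is my assumption, not taken from this paper): each state $j$ has a low-dimensional state vector $\mathbf{v}_j$, and globally shared parameters $\mathbf{M}_i$, $\mathbf{w}_i$, $\boldsymbol{\Sigma}_i$, which can be trained on the source languages, map it to state-specific Gaussians; only the $\mathbf{v}_j$ then need target-language data.

```latex
% Standard SGMM parameterisation (symbols assumed, not this paper's notation)
\begin{align}
  p(\mathbf{x} \mid j) &= \sum_{i=1}^{I} w_{ji}\,
      \mathcal{N}\bigl(\mathbf{x};\ \boldsymbol{\mu}_{ji},\ \boldsymbol{\Sigma}_i\bigr), \\
  \boldsymbol{\mu}_{ji} &= \mathbf{M}_i \mathbf{v}_j, \qquad
  w_{ji} = \frac{\exp\bigl(\mathbf{w}_i^{\top} \mathbf{v}_j\bigr)}
                {\sum_{i'=1}^{I} \exp\bigl(\mathbf{w}_{i'}^{\top} \mathbf{v}_j\bigr)}.
\end{align}
```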
"Cross-lingual adaptation with multi-task adaptive networks" (1)