"Cross-lingual adaptation with multi-task adaptive networks" (1)

Source: Internet
Author: User
Tags: DNN, domain adaptation

    1. First of all, why read this paper?
      1. This paper, if I am not mistaken, uses DNNs to do cross-lingual adaptation. DNNs are still very hot right now, so if they can be applied to cross-lingual adaptation there is certainly a future in it.
      2. The paper mentions that training was done with the Theano library, which I have touched before, training on GTX 690 GPUs; in other words, the low-level GPU code does not have to be written by hand.
      3. This paper does cross-lingual adaptation for ASR; I want to see whether anything can be carried over from ASR to the synthesis side.
    2. Introduction

First paragraph:

    1. In cross-lingual automatic speech recognition (ASR), models applied to a target language are enhanced using data from a different source language.

      1. This strongly shows that my previous paper reading was a bit narrow; one should read papers more broadly. At least I now know that in cross-lingual adaptation for synthesis nobody uses DNNs yet, while in cross-lingual recognition many people already do. If something can be learned from ASR, a good paper could surely come out of it.
      2. Suppose you have 1000 sentences of Cantonese and train an ASR model to recognize Cantonese.
      3. If there is also a corpus of 1000 English sentences, then English is called the source language and Cantonese the target language.
      4. The 1000 English sentences are then used to retrain the previously trained ASR model, which strengthens the model.
    2. In this scenario, the target language is typically low-resourced: transcribed acoustic training data for the target language may be difficult or expensive to acquire.
      1. Data for the target language is scarce,
      2. and transcribed training data in the target language is difficult or expensive to collect.
      3. In other words, the target language is hard to obtain and only available in small amounts, while the source language is easy to obtain.
    3. The cross-lingual approach is motivated by the fact that the source language data, despite being mismatched to the target, may capture common properties of the acoustics of speech which are shared across languages, improving the generalisation of the final models to unseen speakers and conditions.
      1. What motivates the cross-lingual approach?
      2. The source language data can capture common acoustic properties that are shared across languages.
      3. In other words, although the source language (English) and the target language (Cantonese) are different languages, they still share some acoustic characteristics, even though each language certainly also has characteristics of its own.
      4. It is these acoustic characteristics shared across languages that can improve the generalisation of the final model.
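The retraining idea in this paragraph can be sketched with a toy softmax "acoustic model": train on plentiful data from one language, then continue training the same weights on a small, slightly mismatched set from the other. Everything below (the data, shapes, and hyper-parameters) is made up for illustration; the paper itself trains DNNs with Theano.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_data(n, n_classes=3, shift=2.0):
    """Toy 'acoustic frames': one Gaussian cluster per phone class, plus a bias feature."""
    y = rng.integers(0, n_classes, n)
    X = rng.normal(size=(n, 2)) + shift * y[:, None]
    X = np.hstack([X, np.ones((n, 1))])          # bias column
    return X, np.eye(n_classes)[y]

def softmax(Z):
    E = np.exp(Z - Z.max(axis=1, keepdims=True))
    return E / E.sum(axis=1, keepdims=True)

def train(W, X, Y, lr=0.1, epochs=300):
    """Plain batch gradient descent on softmax cross-entropy."""
    for _ in range(epochs):
        W = W - lr * X.T @ (softmax(X @ W) - Y) / len(X)
    return W

def accuracy(W, X, Y):
    return (softmax(X @ W).argmax(1) == Y.argmax(1)).mean()

# Plenty of data in one language, only a little (slightly mismatched) data in the other.
Xs, Ys = make_data(500)
Xt, Yt = make_data(40, shift=2.2)

W = train(np.zeros((3, 3)), Xs, Ys)   # first train on the plentiful language
W = train(W, Xt, Yt, epochs=100)      # then retrain ("adapt") on the other one
```

The point of the sketch is only the two-stage training: the second `train` call starts from the already-trained weights instead of from zero, which is the "retrain to strengthen the model" step the notes describe.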

Second paragraph:

    1. Cross-lingual ASR may be viewed as a form of adaptation.
      1. Cross-lingual ASR can be seen as one form of adaptation.
      2. What does that mean?
      3. Adaptation is the broader concept, which includes:
        1. cross-lingual ASR
        2. cross-lingual synthesis
        3. .....
    2. In contrast to domain or speaker adaptation, the major problem with cross-lingual adaptation arises from the differences in phone sets between the source and target languages.
      1. Compared with domain adaptation or speaker adaptation,
      2. what causes the main problem in cross-lingual adaptation?
        1. It arises from the differences between the phone sets of the source and target languages.
    3. Even when a universal phone set is used, it has been found that the realisation of what is ostensibly the same phone still differs across languages [1].
      1. Even when a shared, universal phone set is used,
      2. the realisation of what is nominally the same phone still differs from language to language.
    4. In this paper, we focus on approaches where source and target languages are assumed not to share a phone set, which is probably a valid assumption when a small number of source languages is used, which is unlikely to provide complete phone coverage for an arbitrary target language.
      1. The authors' approach assumes that the source and target languages do not share a phone set.
      2. This is probably a valid assumption when only a small number of source languages is used, since in that case they are unlikely to provide complete phone coverage for an arbitrary target language.
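The coverage argument above can be illustrated with a few set operations: pool the phone inventories of the source languages and check how much of the target inventory they cover. The inventories below are invented toy sets, not real phone inventories.

```python
# Hypothetical toy phone inventories (made up for illustration only).
source_phones = {
    "english": {"p", "t", "k", "s", "iy", "uw", "aa"},
    "spanish": {"p", "t", "k", "s", "r", "a", "e", "o"},
}
target_phones = {"p", "t", "k", "ts", "oe", "yu", "aa"}  # stand-in "Cantonese" set

pooled = set().union(*source_phones.values())   # pool all source languages
covered = target_phones & pooled
missing = target_phones - pooled                # phones no source language provides
coverage = len(covered) / len(target_phones)
print(f"coverage {coverage:.0%}, missing phones: {sorted(missing)}")
```

With only two source languages the pooled inventory leaves several target phones uncovered, which is exactly why the authors prefer not to assume a shared phone set.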

Third paragraph:

    1. Arguably the simplest approach to the problem of cross-lingual phone-set mismatch is to define a deterministic mapping between source and target phone sets [2], which may be estimated in a data-driven fashion [3].
      1. There are several ways to address the mismatch between cross-lingual phone sets.
      2. One simple approach is to define a deterministic mapping between the source and target phone sets.
      3. This seems to be a commonly used method in synthesis as well; isn't what I am doing now state mapping?
    2. However, this hard mapping leads to a loss of information from the target language acoustics that cannot be represented by a single source language phone.
        1. However, this hard mapping loses information: some target-language acoustics cannot be represented by any single source-language phone.
      An alternative is to learn a probabilistic mapping, in which the distribution of target phonemes is expressed over a feature space comprising source language phone posterior probability estimates, which may be formulated as a product-of-experts model [4] or as a KL-HMM [5].
        1. Another method is a probabilistic mapping.
        2. The distribution of each target phoneme is expressed over a feature space made up of source-language phone posterior probability estimates.
        3. Two practical instantiations are:
          1. the product-of-experts model
          2. the KL-HMM model
      Here, the source languages are viewed as defining a low-dimensional subspace in which to estimate target language models.
        1. The source languages can be seen as defining a low-dimensional subspace in which the target-language models are estimated.
    3. This was the motivation behind the work of [6], where a subspace GMM (SGMM) was used, in which the source languages define a subspace of full covariance Gaussians.
      1. This is the motivation behind the work of [6], which uses a subspace GMM.
      2. The source languages define a subspace of full-covariance Gaussians.
      3. There is a lot of mathematical machinery involved here.
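The contrast between the hard and soft mappings in this paragraph can be sketched in a few lines of numpy. This is a very loose illustration: a real KL-HMM matches each state's categorical distribution to posterior features via a KL divergence, not the simple dot product used here, and all phones and numbers below are invented.

```python
import numpy as np

tgt = ["s_t", "f_t"]                 # invented target-language phones

# Deterministic (hard) mapping: each target phone -> exactly one source phone.
# The source phone "sh" appears in no entry, so frames that look like "sh"
# lose their information under this mapping.
hard_map = {"s_t": "s", "f_t": "f"}

# Probabilistic (soft) mapping: each target phone is a distribution over the
# source phones ["s", "sh", "f"] (each row sums to 1).
M = np.array([
    [0.7, 0.3, 0.0],   # "s_t" is realised mostly like "s", sometimes like "sh"
    [0.0, 0.1, 0.9],   # "f_t" is realised mostly like "f"
])

# Source-phone posterior vector for one frame
# (e.g. the output of a DNN trained on the source language):
post = np.array([0.2, 0.5, 0.3])     # P(s), P(sh), P(f)

scores = M @ post                    # soft score of each target phone
best = tgt[int(scores.argmax())]
print(best, scores)
```

The soft mapping lets the probability mass on "sh" still contribute to the target-phone scores, which is precisely the information that the hard map throws away.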
