Transfer Models in Deep Learning

Shorthand notes and commentary on the keynote report "Transfer Models in Deep Learning" (part 4), by Bai Chu, 2017-11-04.

Author's note: machine learning is moving toward a new era of interpretable models based on "semantics". Transfer learning is likely to take over the mantle of today's supervised learning and significantly improve the interpretability and adaptability of models: not only in domains with abundant data, but also in new domains with little data.

CCAI, the conference organizer, provides a download link for the slides. The report is divided into four parts; the first three are summarized and reviewed in this article:
1. The advantages of transfer learning
2. The "deep learning + transfer learning" solution
3. Three transfer learning paradigms
4. Transfer learning application cases

1. The advantages of transfer learning

The report distills three keywords: small data, reliability, and personalization. I would sum up the advantages in a single word: "adaptability".
1. Adapting to small data: transfer learning can migrate learners trained on big data into domains with little data, putting "big data helps small data" into practice. In fact, Dr. Fei Sha's report also addressed "learning from small data", only with different means: multi-task learning, domain adaptation, and zero-shot learning.
2. Improving reliability: models trained with transfer learning are adaptable and can be migrated across multiple domains without significant performance degradation.
3. Meeting personalization needs: this, too, is adaptability at work. Each individual's personalized samples may amount to only small data, but built on top of a transfer learning model trained on big data, they can serve individual needs very well.

Since transfer learning has so much going for it, it is the most likely candidate to take the baton from supervised learning and drive the next wave of machine learning. Supervised learning will of course keep opening up new territory, and unsupervised learning and reinforcement learning will not be left behind either.
[Figure DTL-1]

Before presenting a solution, let us first be clear about the problem: what makes transfer learning hard?
The report's answer: finding invariants.
What is an invariant?
1. The first figure below gives the example of driving a car under different conditions: whether human driver or autopilot, one must extract the knowledge about "driving a car" that is common across conditions (that is, the invariant) in order to adapt quickly to new conditions.
2. Generalizing, the second figure below shows the role of invariants, or common knowledge (knowledge), in transfer learning: they are the bridge between the source-domain and target-domain models (model).
[Figure DTL-3]
[Figure DTL-2]

Next, Dr. Qiang Yang presented his understanding and modeling of invariants, which leads to the "deep learning + transfer learning" solution proposed by his research team.

2. The "deep learning + transfer learning" solution

Dr. Qiang Yang holds that the essence of transfer learning is knowledge transfer, which can be recast as parameter transfer; the parameters can be the weights of a (deep) neural network.

2.1 Pioneering work

Taking DAN (Deep Adaptation Network) [ICML 2015] as the reference point, this early work on deep transfer learning drew three main conclusions, shown in the figure below; the quantitative analysis uses Maximum Mean Discrepancy (MMD). A code sketch of the resulting layer-wise strategy follows the figure.
1. Freeze the shallow layers (frozen shallow-layer): shallow-layer weights are shared across many domains; they carry common knowledge with strong adaptability.
2. Distinguish the deep layers (distinguish deep-layer for different domains): deep-layer weights represent domain-specific knowledge, and hence characterize the distance between domains.
3. Fine-tune the middle layers (fine-tune intermediate-layer): sitting between shallow common knowledge and deep domain knowledge, the middle layers are fine-tuned to fit the needs of the target domain.
[Figure DTL-4]
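The layer-wise strategy maps naturally onto per-layer learning rates. Below is a minimal PyTorch sketch of that strategy; the choice of ResNet-18, the layer split, and the learning rates are my own illustration, not the DAN authors' code (torchvision >= 0.13 assumed for the `weights` argument):

```python
import torch
import torchvision.models as models

# Pretrained source-domain model (torchvision >= 0.13 API assumed).
net = models.resnet18(weights="IMAGENET1K_V1")

# 1. Frozen shallow layers: general-purpose features shared across domains.
for p in [*net.conv1.parameters(), *net.bn1.parameters(), *net.layer1.parameters()]:
    p.requires_grad = False

# 2. Fine-tuned middle layers (small learning rate) and
# 3. domain-specific deep layers re-learned at full rate.
optimizer = torch.optim.SGD(
    [
        {"params": net.layer2.parameters(), "lr": 1e-4},  # fine-tune
        {"params": net.layer3.parameters(), "lr": 1e-4},  # fine-tune
        {"params": net.layer4.parameters(), "lr": 1e-2},  # domain-specific
        {"params": net.fc.parameters(), "lr": 1e-2},      # domain-specific
    ],
    momentum=0.9,
)
```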

2.2 Rethinking the transfer loss (Loss)

After reviewing this series of pioneering work on the quantitative analysis of deep transfer models, the report offers a deeper reflection on how to quantify the loss of transfer learning.
[Figure DTL-5]

If a universally applicable loss (Loss) is used for quantification, the transfer learning loss in the target domain decomposes into two parts: the source-domain classification loss (source domain classification loss) and the domain distance loss (domain distance loss); the decomposition is written out symbolically after the figure below. Quantifying the source-domain classification loss is a traditional problem; the domain distance loss is the new one. How one understands it determines not only one's view of the essence of transfer learning, but also one's approach to its quantitative analysis, and it yields the different transfer learning paradigms. The report distills from existing work three main understandings of the inter-domain distance loss, with their corresponding paradigms, as illustrated below:
1. Discrepancy loss (discrepancy loss) and the discrepancy-based paradigm: directly measure and minimize the difference between the two domains.
2. Adversarial loss (adversarial loss) and the adversarial paradigm: build a common feature space by designing a domain discriminator against an adversarial objective (adversarial objective).
3. Reconstruction loss (reconstruction loss) and the sharing-based paradigm: combine unsupervised and supervised learning to build an intermediate domain (intermediate domain) shared between source and target, which carries the knowledge transfer as an intermediary.
[Figure DTL-6]
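Written out symbolically (the notation and the trade-off weight $\lambda$ are my own shorthand; the report states the decomposition in words):

$$\mathcal{L}_{\text{target}} = \mathcal{L}_{\text{cls}}(\mathcal{D}_s) + \lambda\,\mathcal{L}_{\text{dist}}(\mathcal{D}_s, \mathcal{D}_t)$$

where $\mathcal{D}_s$ and $\mathcal{D}_t$ denote the source and target domains. The three paradigms differ in how they define and minimize $\mathcal{L}_{\text{dist}}$.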

The following sections describe these three transfer learning paradigms in turn. Note: the author does not follow the order of the report.

3. Three transfer learning paradigms

3.1 The discrepancy-based paradigm

The discrepancy-based paradigm must answer three questions:
1. What object is measured? The marginal distribution, the joint distribution, and so on.
2. What is the metric? MMD, multi-kernel MMD (MK-MMD), joint distribution discrepancy (JDD), and so on; a minimal MMD sketch follows this list.
3. Where in the deep neural network are the parameters adapted? A single layer, or several layers.
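To make question 2 concrete, here is a minimal single-kernel Gaussian MMD in PyTorch. It is the textbook biased estimator, not the MK-MMD used in DAN, and the bandwidth `sigma` is an assumption to be tuned:

```python
import torch

def gaussian_mmd(x: torch.Tensor, y: torch.Tensor, sigma: float = 1.0) -> torch.Tensor:
    """Biased estimate of squared MMD between feature batches x (n, d) and y (m, d)."""
    def rbf(a, b):
        d2 = torch.cdist(a, b) ** 2          # pairwise squared distances
        return torch.exp(-d2 / (2 * sigma ** 2))
    return rbf(x, x).mean() + rbf(y, y).mean() - 2 * rbf(x, y).mean()

# usage: total_loss = cls_loss + lam * gaussian_mmd(source_feats, target_feats)
```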

The report then gives examples of transfer learning in computer vision and natural language processing, as well as multi-modal transfer learning, and stresses the importance of regularization; see pages 22-25 of the [Slides].

3.2 The adversarial paradigm

This paradigm designs transfer learning on top of Generative Adversarial Networks (GANs). The basic idea is to design and train two learners, a generator G and a discriminator D: G generates new samples, D judges whether generated samples are genuine, and the game between G and D improves the performance of both. The report presents two lines of thought, described below: forward and inverse mappings between domains, and common feature extraction.

3.2.1 Forward and inverse mappings between domains

Taking CycleGAN [ICCV 2017] as the representative, the report describes the first line of thought:
1. Cross-domain feature spaces and mappings: the mapping from the source domain to the target domain is modeled as a generator G, with a corresponding inverse mapping F.
2. Building the adversarial objective: a discriminator is designed and trained in the target domain to detect the loss incurred in transferring from source to target. An interesting feature of CycleGAN is that it also sets up the reverse transfer and discriminator, from the target domain back to the source domain: if samples mapped back from the target domain still score well in the source domain, the transfer is genuinely good.
3. Transfer loss metrics: these include the adversarial losses (adversarial loss) produced by the forward and inverse mappings, plus the cycle-consistency loss (cycle-consistency loss). The overall process is shown in the figure below, followed by a sketch of the losses.
[Figure DTL-7]
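A minimal sketch of how the losses above fit together, assuming user-defined generator and discriminator modules `G`, `F_`, `D_s`, `D_t` whose discriminator outputs are probabilities. The real CycleGAN uses a least-squares GAN loss and further tricks; plain BCE is used here for brevity:

```python
import torch
import torch.nn.functional as F

def cyclegan_generator_loss(G, F_, D_s, D_t, x_s, x_t, lam=10.0):
    """G: source->target generator, F_: target->source; D_s/D_t: per-domain discriminators."""
    fake_t, fake_s = G(x_s), F_(x_t)
    # adversarial losses: each generator tries to make its fakes look real
    adv = (
        F.binary_cross_entropy(D_t(fake_t), torch.ones_like(D_t(fake_t)))
        + F.binary_cross_entropy(D_s(fake_s), torch.ones_like(D_s(fake_s)))
    )
    # cycle-consistency: mapping forward then back should recover the input
    cyc = F.l1_loss(F_(fake_t), x_s) + F.l1_loss(G(fake_s), x_t)
    return adv + lam * cyc
```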

3.2.2 Common feature extraction

Taking adversarial domain adaptation (adversarial domain adaptation) [JMLR 2016] as the representative, the report describes the second line of thought:
1. Dual-branch adversarial architecture: the first branch consists of a feature extractor (feature extractor) and a label predictor (label predictor), forming a standard feed-forward network. The second branch shares the first branch's feature extractor and attaches a domain classifier (domain classifier) through a gradient reversal layer (gradient reversal layer). The key is that during backpropagation this layer multiplies the gradient by a negative constant, which is what introduces the adversarial behavior; a minimal implementation follows the figure below.
2. Effect of adversarial training: without the gradient reversal layer, the dual-branch architecture is a standard multi-task network that jointly reduces the label prediction loss and the domain classification loss. With the gradient reversal layer added, training drives the network toward "even if the two domains differ, the feature extractor's outputs are almost indistinguishable between them". As a result, the feature extractor (feature extractor) outputs a domain-independent invariant.

The specific architecture is shown in the figure below.
![DTL-8](https://gitlab.com/zb14zb14/blog_attachment/raw/cc6a42171e5f477db16a1e132cef287aefcb317b/Pictures/Deep%20transfer%20learning/dtl-8.JPG)
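The gradient reversal layer itself is only a few lines. Here is a common PyTorch implementation, a community-standard sketch rather than the [JMLR 2016] authors' original code:

```python
import torch

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; multiplies the gradient by -alpha in the
    backward pass, turning the domain classifier's training signal into an
    adversarial one for the shared feature extractor."""
    @staticmethod
    def forward(ctx, x, alpha):
        ctx.alpha = alpha
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.alpha * grad_output, None

def grad_reverse(x, alpha=1.0):
    return GradReverse.apply(x, alpha)

# usage: domain_logits = domain_classifier(grad_reverse(shared_features, alpha))
```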

3.3 The sharing-based paradigm

For the sharing-based paradigm, Dr. Qiang Yang described his own lab's work [KDD 2015] in more detail; the author additionally brings in [ECCV 2016].

3.3.1 Transfer via an intermediate domain

The basic architecture is shown in the figure below. It comprises the input and output definitions plus two steps, intermediate-domain selection and knowledge transfer; see [KDD 2015]. A small factorization example follows the figure.
1. Input: the source domain, the target domain, and candidate intermediate domains.
2. Output: prediction results in the target domain.
3. Intermediate-domain selection: the candidates and the selection strategy are problem-dependent. For example, if the source domain is image data and the target domain is text data, the candidate intermediate domains may be text, images, or a mixture of the two. One feasible approach is to crawl partially annotated images on Flickr as the intermediary that helps the source domain migrate to the target domain.
4. Knowledge transfer: the selected intermediate domain still exhibits a distribution shift (distribution shift) relative to the source and target domains, so [KDD 2015] proposes a practical non-negative matrix factorization method that performs feature clustering (feature clustering) and label propagation (label propagation). The method removes the distribution shift within the intermediate domain and establishes the transfer relation between source-domain features and target-domain outputs.
[Figure DTL-11]
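As a small illustration of the non-negative matrix factorization building block only (the coupled cross-domain factorization and the label propagation of [KDD 2015] are not reproduced here; the data and dimensions are placeholders):

```python
import numpy as np
from sklearn.decomposition import NMF

rng = np.random.default_rng(0)
X = rng.random((100, 50))                 # nonnegative features, rows = samples

nmf = NMF(n_components=10, init="nndsvd", max_iter=500)
W = nmf.fit_transform(X)                  # sample-to-concept weights (feature clustering)
H = nmf.components_                       # concept-to-feature weights
# In transitive transfer, factors like H play the role of shared "concepts"
# through which labels can propagate from source to target via the intermediate domain.
```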

3.3.2 Transfer via shared features

The sharing in [KDD 2015] can be read as mining the transfer relation between the source and target domains from the data shared through the intermediate domain, finally obtaining parameters of suitable quality. By contrast, the sharing in [ECCV 2016] mines the transfer relation directly through shared parameters, improving the quality of those shared parameters by reconstructing the data, as shown in Figure DTL-10 below.
The architecture has two branches: one performs label prediction via supervised learning, the other performs data reconstruction via unsupervised learning, and the two branches share parameters; a minimal sketch follows. Although this dual-branch architecture resembles the dual-branch adversarial architecture above, they differ fundamentally: the former is built on sharing, the latter on adversarial training.
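A minimal sketch of such a shared-encoder, dual-branch network in PyTorch; the MLP form and layer sizes are my assumptions (the [ECCV 2016] model is convolutional):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ReconstructionClassificationNet(nn.Module):
    """One shared encoder, two heads: supervised label prediction (trained on
    labeled source data) and unsupervised reconstruction (trained on target data)."""
    def __init__(self, in_dim=784, hid=128, n_classes=10):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, hid), nn.ReLU())
        self.classifier = nn.Linear(hid, n_classes)  # supervised branch
        self.decoder = nn.Linear(hid, in_dim)        # reconstruction branch

    def forward(self, x):
        z = self.encoder(x)
        return self.classifier(z), self.decoder(z)

# joint objective, e.g.:
#   logits_s, _ = net(x_source);  _, recon_t = net(x_target)
#   loss = F.cross_entropy(logits_s, y_source) + lam * F.mse_loss(recon_t, x_target)
```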
[Figure DTL-10]

3.4 Summary

The report is very rich; here is a brief recap of the three ways of measuring the loss (Loss) and the corresponding transfer learning paradigms: the discrepancy-based paradigm, the adversarial paradigm, and the sharing-based paradigm. In plainer language, the way each paradigm interprets the relation between domains might be called "seeking the sameness within the differences", "seeking the differences within the sameness", and... "finding an intermediary".

4. Transfer learning application cases

The fourth part introduces several application cases built by Dr. Qiang Yang's students:
1. Large-scale consumer finance and cross-domain public opinion analysis
2. Internet car classification

The public CCAI article has already covered these in detail; beyond pages 36-45 of the slides, please refer to Dr. Qiang Yang's papers for specifics.

Main references

[ICML 2015] Learning Transferable Features with Deep Adaptation Networks
[KDD 2015] Transitive Transfer Learning
[JMLR 2016] Domain-Adversarial Training of Neural Networks
[ECCV 2016] Deep Reconstruction-Classification Networks for Unsupervised Domain Adaptation
[ICCV 2017] Unpaired Image-to-Image Translation Using Cycle-Consistent Adversarial Networks
