NOTES: Deep Transfer Network: Unsupervised Domain Adaptation


This article presents a transfer network that uses MMD (maximum mean discrepancy) to constrain the marginal distribution and the conditional distribution on two domains at the same time.
Specifically, MMD is used to constrain the distributions of the features extracted from the two domains (source and target), so that the feature distributions on the two domains are as similar as possible; this distribution is called the marginal distribution. MMD is also used to constrain the softmax classification outputs on the two domains, so that the distributions of the classification results are as similar as possible; this distribution is called the conditional distribution. In these two respects the idea is basically the same as Long M, Wang J, Ding G, et al. Transfer Feature Learning with Joint Distribution Adaptation // Proceedings of the IEEE International Conference on Computer Vision. 2013: 2200-2207, except that the JDA paper uses a traditional (shallow) approach, whereas this paper uses deep learning.
Because this paper does not use a standard convolutional network structure, its final results are not directly comparable with earlier deep learning methods; the experiments in the last part therefore compare against traditional shallow methods rather than the latest deep-learning-based methods.
The network structure of the paper's method is as follows:

The first l-1 layers in the diagram are the feature extraction layers; the last layer is the classification layer, whose output is the probability of each class. The paper measures MMD-based distribution losses between the source and target domains on both the output of layer l-1 and the classifier output. For the feature distribution, the difference between the two domains is measured by a marginal MMD term in the objective function, as follows:
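A plausible form of this marginal MMD term, written as the standard empirical (linear-kernel) MMD, with $n_s$ and $n_t$ denoting the number of source and target samples (notation assumed, not taken from the paper):

$$
\mathrm{MMD}^2\!\left(H^s, H^t\right) = \left\| \frac{1}{n_s}\sum_{i=1}^{n_s} h^{(l-1)}(x_i^s) - \frac{1}{n_t}\sum_{j=1}^{n_t} h^{(l-1)}(x_j^t) \right\|^2
$$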

Here $h^{(l-1)}$ denotes the output of layer $l-1$ of the network on the source and target domains, respectively.
Adding an MMD loss to the output layer of the classifier makes the two domains as consistent as possible in the conditional distribution. The conditional MMD is defined as follows:
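A plausible per-class form, summing the empirical MMD of the softmax outputs over the $C$ classes, with $n_s^c$ and $n_t^c$ the number of source and target samples assigned to class $c$, and $\hat{y}^t$ the predicted target labels (notation assumed):

$$
\mathrm{MMD}_c^2 = \sum_{c=1}^{C} \left\| \frac{1}{n_s^c}\sum_{i:\,y_i^s=c} q(x_i^s) - \frac{1}{n_t^c}\sum_{j:\,\hat{y}_j^t=c} q(x_j^t) \right\|^2
$$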

where $q$ is the vector of softmax outputs corresponding to the samples of a given class.
Finally, together with the standard classification loss of the network, the objective function of the whole network is given as follows:
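A plausible overall form combining the three terms (the symbols $\lambda$, $\mu$ and $\mathcal{L}_{cls}$ are assumed, not taken from the paper):

$$
J = \mathcal{L}_{cls} + \lambda\, \mathrm{MMD}^2\!\left(H^s, H^t\right) + \mu\, \mathrm{MMD}_c^2
$$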

where $\mathcal{L}_{cls}$ is the classification loss on the labeled source samples, and $\lambda$ and $\mu$ are trade-off weights for the marginal and conditional MMD terms.

The paper then gives a gradient-based optimization method for this objective function:

The following three points need to be noted (a minimal training-loop sketch follows the list):
1. If the full sample set were used in every gradient step, efficiency would be very low when the number of samples is large, so the paper adopts a mini-batch gradient method.
2. Each mini-batch is constructed by randomly drawing half of its samples from the source domain and half from the target domain; because the two domains contain different numbers of samples, the smaller set is brought up to size by duplicating (copying) samples.
3. Constructing the conditional MMD requires label information on both the source and target domains. Since there is no ground truth on the target domain, the target labels are predicted by a base classifier, which is the currently trained network itself; after each round of network updates the pseudo-labels are refreshed, until convergence or until a maximum number of iterations is reached.
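A minimal PyTorch sketch of this training scheme, assuming a generic feature extractor, a linear-kernel MMD, and illustrative hyperparameters (none of these choices are taken from the paper):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def linear_mmd2(a, b):
    # Squared MMD with a linear kernel: ||mean(a) - mean(b)||^2
    return (a.mean(dim=0) - b.mean(dim=0)).pow(2).sum()

def conditional_mmd2(p_s, y_s, p_t, y_t_pseudo, num_classes):
    # Per-class MMD between source and target softmax outputs;
    # target samples are grouped by their current pseudo-labels.
    loss = p_s.new_zeros(())
    for c in range(num_classes):
        p_s_c, p_t_c = p_s[y_s == c], p_t[y_t_pseudo == c]
        if len(p_s_c) and len(p_t_c):
            loss = loss + linear_mmd2(p_s_c, p_t_c)
    return loss

# Toy data standing in for the two domains (dimensions are arbitrary).
d, C, half = 20, 4, 32
x_s, y_s_all = torch.randn(500, d), torch.randint(C, (500,))
x_t = torch.randn(300, d) + 0.5                         # unlabeled target domain

features = nn.Sequential(nn.Linear(d, 64), nn.ReLU())   # "first l-1 layers"
classifier = nn.Linear(64, C)                           # softmax output layer
opt = torch.optim.SGD(list(features.parameters()) + list(classifier.parameters()), lr=0.05)
lam, mu = 1.0, 1.0                                      # trade-off weights (assumed)

for epoch in range(10):
    # Point 3: refresh target pseudo-labels with the current network.
    with torch.no_grad():
        y_t_pseudo = classifier(features(x_t)).argmax(dim=1)
    for step in range(20):
        # Points 1-2: mini-batch with half source / half target samples.
        i_s = torch.randint(len(x_s), (half,))
        i_t = torch.randint(len(x_t), (half,))
        h_s, h_t = features(x_s[i_s]), features(x_t[i_t])
        logit_s, logit_t = classifier(h_s), classifier(h_t)
        loss = (F.cross_entropy(logit_s, y_s_all[i_s])           # classification loss
                + lam * linear_mmd2(h_s, h_t)                    # marginal MMD
                + mu * conditional_mmd2(F.softmax(logit_s, 1), y_s_all[i_s],
                                        F.softmax(logit_t, 1), y_t_pseudo[i_t], C))
        opt.zero_grad()
        loss.backward()
        opt.step()
```

Sampling with replacement stands in here for the paper's sample-copying step; both produce balanced half-source, half-target mini-batches.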
Analysis: The objective function in this paper has three parts: the standard classification loss, the marginal MMD loss on the feature distribution, and the conditional MMD loss on the classification results. The first two parts are common in networks that construct domain-invariant features. The third loss measures the difference between the distributions of the output vectors for each class; the smaller the difference, the more similar the conditional distributions on the two domains. For this third loss, it is worth briefly comparing it with the soft label loss mentioned in the Simultaneous Deep Transfer Across Domains and Tasks article.
Comparing the conditional MMD loss with the earlier soft label construction: the soft label scheme relies on supervision on the target domain and, where possible, transfers the relationship between classes learned on the source domain to the target domain; its final loss is a cross-entropy (the more similar the two distributions, the smaller the cross-entropy between them). This paper's conditional MMD loss is instead built on the softmax output vectors, and requires that, within each class, the samples of the source and target domains have the same distribution of softmax outputs. This idea requires supervision on the target domain; the paper's method is to construct pseudo-labels on the target domain with a base classifier and then build the conditional MMD loss for each class according to these labels.
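For reference, the soft label loss in that paper can be sketched as a cross-entropy between the per-class average source softmax output (the "soft label" $\ell^{(c)}$ for class $c$) and the prediction $p(x^t)$ on a target sample of that class (a rough sketch, notation assumed):

$$
\mathcal{L}_{soft} = -\sum_{i=1}^{C} \ell_i^{(c)} \log p_i(x^t)
$$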
