NOTES: Unsupervised domain adaptation by backpropagation

Last Update:2016-03-31 Source: Internet

Author: User

Tags generative adversarial networks


This paper addresses domain adaptation by combining deep learning with an adversarial network framework. Three components are trained in this framework. The first is the feature extractor, which extracts features and is usually composed of convolutional and pooling layers. The second is the label classifier, built from fully connected layers followed by a logistic classifier. The third, which does not appear in an ordinary classification network, is the domain classifier; together with the feature extractor it forms the adversarial part of the framework. It is also a classifier, built from fully connected layers followed by a cross-entropy classifier. The fully connected layers use the ReLU activation function.

The adversarial aspect shows up as two opposite requirements on the domain classifier loss during training. For domain adaptation, we want the features learned by the network to be domain invariant, which means the domain classifier should not be able to classify domains correctly, i.e. the domain classifier loss should be as large as possible. On the other hand, when training the domain classifier itself, we ask it to classify as correctly as possible, i.e. its classification loss should be as small as possible.

This adversarial framework first appeared in Goodfellow et al.'s paper Generative Adversarial Networks, which targets image generation: to train a generative model that learns the distribution of the samples, a discriminative model is introduced to distinguish whether a sample was produced by the generative model or drawn from the real distribution. That paper's framework is worth a close look. The framework of the present paper is given below.

In the figure, the green part is the feature extractor, the blue part is the label classifier, and the red part is the domain classifier.
The paper is introduced below from two aspects: the model and the optimization algorithm.
First, the model
First, we introduce the structure of the model and the relationships between its parts.
Domain adaptation involves two domains. One contains a large amount of label information and is called the source domain. The other has few or no labels but contains the samples we want to predict, and is called the target domain. Naturally, we can train a discriminative model in the source domain with ordinary machine learning methods. However, because of dataset bias between the source and target domains, this model cannot be applied directly to the target domain. How to transfer the discriminative model from the source domain to the target domain while losing as little of its performance as possible is the problem that domain adaptation sets out to solve; it falls under transfer learning. A common assumption for this problem is the shared-classifier assumption: if a common feature representation space can be learned for the source and target domains, then a discriminative model learned on source-domain features in this space can also be used on target-domain features. The domain adaptation problem is therefore often transformed into the problem of finding a common feature space, that is, of learning domain-invariant features. This paper uses the adversarial network framework to learn such domain-invariant features.
Specifically, suppose we have learned a domain classifier that can tell the domains apart. The assumption behind learning invariant features is this: given a well-trained domain classifier, if features from different domains cannot be distinguished by that classifier, that is, its classification loss is very large, then the features can be regarded as domain invariant. As an extreme example, if the source and target domains coincide completely in the feature space, then any domain classifier will fail and perform no better than a random classifier.
On the other hand, for label classification we want the learned features to carry as much label information as possible, that is, to minimize the classification loss of the label classifier.
Training the domain classifier requires minimizing its classification loss, while obtaining invariant features requires maximizing that same loss. These two opposing requirements can be expressed as follows:
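Written out in the notation of the original paper, the objective and its saddle point would read as below. This is a reconstruction rather than a copy: G_f, G_y, and G_d (the mappings computed by the feature extractor, label classifier, and domain classifier), x_i and y_i (a sample and its label), and the numbering (1) to (3) are assumptions taken from the paper.

```latex
\begin{align}
E(\theta_f, \theta_y, \theta_d)
  &= \sum_{\substack{i = 1..N \\ d_i = 0}}
       L_y\bigl(G_y(G_f(x_i; \theta_f); \theta_y),\, y_i\bigr)
   \;-\; \lambda \sum_{i = 1..N}
       L_d\bigl(G_d(G_f(x_i; \theta_f); \theta_d),\, d_i\bigr) \tag{1} \\
(\hat\theta_f, \hat\theta_y)
  &= \arg\min_{\theta_f,\, \theta_y} E(\theta_f, \theta_y, \hat\theta_d) \tag{2} \\
\hat\theta_d
  &= \arg\max_{\theta_d} E(\hat\theta_f, \hat\theta_y, \theta_d) \tag{3}
\end{align}
```

Because the domain loss enters E with a minus sign, maximizing E over theta_d minimizes the domain loss, while minimizing E over theta_f maximizes it.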


Here theta_f denotes the parameters of the feature extractor, theta_y the parameters of the label classifier, and theta_d the parameters of the domain classifier; L_y denotes the loss of the label classifier and L_d the loss of the domain classifier. N is the total number of samples, d_i is the domain label of sample i, and d_i = 0 indicates the source domain.
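As a rough illustration of how these quantities combine, the sketch below evaluates the objective from classifier outputs that are already computed. It assumes binary labels and a binary cross-entropy loss for both classifiers; the function names and the dictionary keys are hypothetical and chosen only for this example.

```python
import math


def log_loss(p, target):
    """Binary cross-entropy for a predicted probability p of class 1."""
    eps = 1e-12
    p = min(max(p, eps), 1.0 - eps)  # keep log() away from 0
    return -(target * math.log(p) + (1 - target) * math.log(1.0 - p))


def objective(samples, lam):
    """Label loss over source samples minus lam times domain loss over all samples.

    Each sample is a dict with the (hypothetical) keys:
      'p_label'  : predicted probability of label 1 (label classifier output)
      'p_domain' : predicted probability of domain 1 (domain classifier output)
      'y'        : true label, or None for an unlabeled target sample
      'd'        : domain label, 0 = source, 1 = target
    """
    # Only source samples (d == 0) carry labels, so only they enter the label term.
    label_term = sum(log_loss(s["p_label"], s["y"]) for s in samples if s["d"] == 0)
    # Every sample has a domain label, so all of them enter the domain term.
    domain_term = sum(log_loss(s["p_domain"], s["d"]) for s in samples)
    return label_term - lam * domain_term


confused = [
    {"p_label": 0.9, "p_domain": 0.5, "y": 1, "d": 0},
    {"p_label": 0.5, "p_domain": 0.5, "y": None, "d": 1},
]
print(objective(confused, lam=0.1))
```

With the label predictions held fixed, the objective is lower when the domain classifier is confused (outputs near 0.5) than when it separates the domains confidently, which is the direction the feature extractor is pushed in.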
The following describes how to optimize this objective with standard gradient descent.
Second, the optimization
For the problem in (2) and (3) above, the network parameters can be updated as follows:
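In the notation of the original paper, the stochastic updates would take the following form. This is a reconstruction: L_y^i and L_d^i, the two losses evaluated on the i-th sample, and the numbering (4) to (6) are assumptions taken from the paper.

```latex
\begin{align}
\theta_f &\leftarrow \theta_f - \mu \left(
    \frac{\partial L_y^i}{\partial \theta_f}
    - \lambda \frac{\partial L_d^i}{\partial \theta_f} \right) \tag{4} \\
\theta_y &\leftarrow \theta_y - \mu \frac{\partial L_y^i}{\partial \theta_y} \tag{5} \\
\theta_d &\leftarrow \theta_d - \mu \frac{\partial L_d^i}{\partial \theta_d} \tag{6}
\end{align}
```

Updates (5) and (6) are ordinary gradient descent steps; only (4) differs, because the domain-loss gradient enters it with the factor -lambda.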

Unlike a scheme that fixes one part of the network while updating the other, all network parameters are updated together within the same loop. Here mu is the learning rate and lambda is a hyperparameter. The author points out that without the lambda factor, the features would be trained to minimize the domain classifier loss, so domain-invariant features could not be learned.
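The simultaneous update can be sketched in a few lines, assuming the per-sample gradients have already been computed by some other means. The function name, the dictionary keys, and the use of plain lists of floats for the parameters are all hypothetical choices for illustration.

```python
def sgd_step(theta_f, theta_y, theta_d, grads, mu, lam):
    """Update all three parameter groups in the same step.

    grads holds lists of partial derivatives under the (hypothetical) keys:
      'dLy_df' : label loss  w.r.t. feature extractor parameters
      'dLd_df' : domain loss w.r.t. feature extractor parameters
      'dLy_dy' : label loss  w.r.t. label classifier parameters
      'dLd_dd' : domain loss w.r.t. domain classifier parameters
    """
    # Feature extractor: descend on the label loss, ascend (scaled by lam) on the domain loss.
    new_f = [
        t - mu * (gy - lam * gd)
        for t, gy, gd in zip(theta_f, grads["dLy_df"], grads["dLd_df"])
    ]
    # Label classifier: plain gradient descent on the label loss.
    new_y = [t - mu * g for t, g in zip(theta_y, grads["dLy_dy"])]
    # Domain classifier: plain gradient descent on the domain loss.
    new_d = [t - mu * g for t, g in zip(theta_d, grads["dLd_dd"])]
    return new_f, new_y, new_d


grads = {"dLy_df": [0.5], "dLd_df": [2.0], "dLy_dy": [1.0], "dLd_dd": [-1.0]}
print(sgd_step([1.0], [0.0], [2.0], grads, mu=0.1, lam=1.0))
```

Setting lam to 0 in this sketch reduces the feature update to ordinary training on the label loss alone, with the domain classifier trained on features it has no influence over.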
To make the above updates fit standard backpropagation, the author defines an intermediate function, the gradient reversal layer, which is described by two non-equivalent expressions for the forward and backward passes:
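In the paper's notation this function would be written as follows (a reconstruction; the symbol R_lambda and the identity matrix I are assumed from the paper):

```latex
\begin{align}
R_\lambda(x) &= x \\
\frac{d R_\lambda}{d x} &= -\lambda I
\end{align}
```

The first line is the forward pass, an identity map; the second is the rule used in the backward pass, which multiplies the incoming gradient by -lambda.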

The corresponding loss function is expressed as:
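With the intermediate function inserted between the feature extractor and the domain classifier, the loss that SGD is run on would take the form below. This is a reconstruction in the paper's notation; the symbol with the tilde and the mappings G_f, G_y, G_d and R_lambda are assumptions taken from there.

```latex
\tilde{E}(\theta_f, \theta_y, \theta_d)
  = \sum_{\substack{i = 1..N \\ d_i = 0}}
      L_y\bigl(G_y(G_f(x_i; \theta_f); \theta_y),\, y_i\bigr)
  + \sum_{i = 1..N}
      L_d\bigl(G_d(R_\lambda(G_f(x_i; \theta_f)); \theta_d),\, d_i\bigr)
```

Both terms now carry a plus sign: the sign flip and the factor lambda are supplied by the backward rule of R_lambda instead of appearing in the loss itself.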

With this, standard SGD with backpropagation can be used for training.
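To see the effect concretely, here is a minimal hand-written sketch of the reversal step on a scalar toy model. The model is an assumption made for brevity: a one-parameter feature extractor f = theta_f * x, a one-parameter domain classifier z = theta_d * f, and a squared loss in place of cross-entropy. The class and function names are hypothetical, and a real implementation would hook into an autodiff framework instead.

```python
class GradientReversal:
    """Identity in the forward pass; multiplies the gradient by -lam in the backward pass."""

    def __init__(self, lam):
        self.lam = lam

    def forward(self, x):
        return x

    def backward(self, grad_output):
        return -self.lam * grad_output


def domain_branch_grads(theta_f, theta_d, x, d, lam):
    """Forward and backward pass through the domain branch of the toy model."""
    grl = GradientReversal(lam)

    # Forward pass: feature -> reversal layer (identity) -> domain score -> loss.
    f = theta_f * x
    r = grl.forward(f)
    z = theta_d * r
    loss = 0.5 * (z - d) ** 2  # squared loss, used only to keep the toy simple

    # Backward pass, written out by hand with the chain rule.
    dz = z - d                 # dLoss/dz
    grad_theta_d = dz * r      # ordinary gradient for the domain classifier
    dr = dz * theta_d          # gradient arriving at the reversal layer
    df = grl.backward(dr)      # sign flipped and scaled by lam
    grad_theta_f = df * x      # what the feature extractor receives
    return loss, grad_theta_f, grad_theta_d


loss, g_f, g_d = domain_branch_grads(theta_f=2.0, theta_d=3.0, x=1.0, d=1.0, lam=0.5)
print(loss, g_f, g_d)
```

A plain descent step on g_d lowers the domain loss for the domain classifier, while the same descent step on g_f moves theta_f in the direction that raises it, so one ordinary SGD loop serves both opposing goals.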


This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
