Deep Convolutional Generative Adversarial Networks (DCGAN)

Tags: SVM, generative adversarial networks

This article is a set of paper notes on reference [1].

Convolutional neural networks (CNNs) perform well across supervised learning tasks, but have seen much less use in unsupervised learning. The algorithm presented in this paper combines the CNN of supervised learning with the GAN of unsupervised learning.

Even without CNNs, LAPGAN had already achieved good results in generating higher-resolution images.

Rather than seeing this work as an extension of CNNs, it is better to see it as an extension of GANs to the CNN setting. For the basic GAN algorithm, refer to the original adversarial network paper.

GANs have the advantage of not requiring a task-specific cost function, and their training process can learn good feature representations. However, GAN training is very unstable and often causes the generator to produce meaningless output. The contributions of the paper are:

    • Propose a set of constraints on CNN network topology that make GAN training stable.
    • Use the learned feature representations for image classification; the good results verify the expressiveness of the learned features.
    • Qualitatively analyze the filters learned by the GAN.
    • Show that the generator's latent representations support vector arithmetic.
Model structure

The following changes need to be made to the model structure:

    • Replace all pooling layers: use fractionally-strided convolutions in the generator and strided convolutions in the discriminator.
    • Use batch normalization (BatchNorm) in both the generator and the discriminator.
      • Corrects for poor initialization.
      • Helps gradients propagate to every layer.
      • Prevents the generator from collapsing all samples to the same point.
      • However, applying BatchNorm directly to all layers causes sample oscillation and model instability; this is avoided by not using BatchNorm on the generator output layer and the discriminator input layer.
    • Remove the fully connected layers.
      • Global average pooling increases model stability but hurts convergence speed.
    • In the generator, all layers use ReLU except the output layer, which uses Tanh.
    • Use LeakyReLU in all layers of the discriminator.
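The activation choices above can be sketched in plain Python. This is a toy scalar sketch; real implementations apply these functions element-wise to tensors:

```python
import math

def relu(x):
    # Generator hidden layers: ReLU
    return max(0.0, x)

def leaky_relu(x, slope=0.2):
    # Discriminator, all layers: LeakyReLU (slope 0.2, per the training details)
    return x if x > 0 else slope * x

def generator_output(x):
    # Generator output layer: tanh keeps pixel values in (-1, 1),
    # matching the [-1, 1] preprocessing of the training images
    return math.tanh(x)
```

Note how the output-layer tanh and the input preprocessing range are deliberately matched, so real and generated images live in the same value range.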

The DCGAN generator network structure:

The conv layers here are four fractionally-strided convolutions, which other papers sometimes call deconvolutions.

Training details
    • Preprocessing: scale the images to [-1, 1], the range of tanh.
    • Train with mini-batches of size 128.
    • Initialize all weights from a normal distribution with mean 0 and standard deviation 0.02.
    • Use a LeakyReLU slope of 0.2.
    • Although earlier GAN work used momentum to accelerate training, DCGAN tunes its parameters with the Adam optimizer.
    • Learning rate = 0.0002.
    • Reduce the momentum parameter β1 from 0.9 to 0.5 to prevent oscillation and instability.
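These training details can be collected into a small sketch. The β2 value of 0.999 is Adam's default and is my assumption, since the list above only changes β1:

```python
import random

def scale_to_tanh_range(pixel):
    # Map an 8-bit pixel value in [0, 255] to [-1, 1], the range of tanh
    return pixel / 127.5 - 1.0

def init_weight():
    # Every weight is drawn from a normal distribution: mean 0, stddev 0.02
    return random.gauss(0.0, 0.02)

# Hyperparameters from the paper; beta2 is Adam's default (an assumption here)
TRAIN_CONFIG = {"batch_size": 128, "lr": 0.0002, "beta1": 0.5,
                "beta2": 0.999, "leaky_relu_slope": 0.2}
```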
LSUN

The results after one epoch of training (effectively online learning) and after the model converged are as follows:

This shows that DCGAN does not produce high-quality images simply by memorizing (overfitting) the training data.

DCGAN capability verification

To verify the quality of DCGAN's feature representations, the features are fed into an L2-SVM, and the classification results are compared with those of other unsupervised learning algorithms.

To do this, a model trained on Imagenet-1k is used: the discriminator's CNN features from all layers are each max-pooled down to a 4×4 spatial grid, then flattened and concatenated to form a 28672-dimensional vector, which is fed into the L2-SVM.
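The dimensionality works out as follows. The per-layer channel counts below are an illustrative assumption; only the 4×4 grid and the 28672 total come from the paper:

```python
GRID = 4  # each layer's feature maps are max-pooled to a 4x4 spatial grid

# Hypothetical per-layer channel counts; only their total (1792) matters here
layer_channels = [128, 256, 512, 896]

# Each layer contributes 4 * 4 * channels values after flattening
feature_dim = sum(GRID * GRID * c for c in layer_channels)
print(feature_dim)  # 28672 = 4 * 4 * (128 + 256 + 512 + 896)
```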

Comparison of results on the MNIST dataset:

Comparison on the SVHN dataset:

Walking in the latent space

By slowly interpolating the initial latent vectors, we can explore how the latent space affects the final generated images. In this way, you can see how image features are embedded in the latent space, and tell whether the model has actually learned semantic features or has merely memorized images (memorization would show up as sharp transitions).
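The interpolation step can be sketched as follows. Toy 4-dimensional vectors are used here for readability; the paper's latent vectors are much higher-dimensional:

```python
def lerp(z0, z1, t):
    # Linear interpolation between latent vectors z0 and z1, with 0 <= t <= 1
    return [(1 - t) * a + t * b for a, b in zip(z0, z1)]

# Nine points along the path; feeding each into the generator would
# produce the sequence of smoothly changing images
z_start = [0.0, 1.0, -1.0, 0.5]
z_end = [1.0, 0.0, 1.0, -0.5]
path = [lerp(z_start, z_end, i / 8) for i in range(9)]
```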


From the figure, you can see gradual changes: in the sixth row, a window gradually appears, and in the fourth row, the TV fades away.

Discriminator Filter

By analyzing the filters, we can see that when learning the features of bedrooms, the GAN really did learn features such as beds and windows.

On the left are random filters; on the right are the learned filters. Visibly, the learned filters on the right are meaningful.

Semantic Mask

In the latent space, suppose we know which units control a certain object. Can suppressing those units make the object disappear from the generated image?

The experiment in the paper works like this: first, generate 150 images, some with windows and some without; then fit a logistic regression classifier to predict window presence, and treat every unit with a non-zero weight as window-related. Dropping those units and regenerating yields new images without windows.
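A toy sketch of the unit-dropping step, applied here to latent units for simplicity (all names and numbers are illustrative, not from the paper):

```python
def drop_object_units(z, weights):
    # Zero every latent unit whose logistic-regression weight is non-zero,
    # i.e. every unit the classifier linked to the target object
    return [zi if w == 0 else 0.0 for zi, w in zip(z, weights)]

z = [0.3, -1.2, 0.7, 0.9]
window_weights = [0.0, 2.1, 0.0, -0.8]  # non-zero => a "window" unit
z_no_window = drop_object_units(z, window_weights)
print(z_no_window)  # [0.3, 0.0, 0.7, 0.0]
```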

Vector arithmetic

Similar to word2vec, do images have an analogous property: can vectors be added and subtracted in the latent space to produce new images?

Experiments show that using a single image's latent vector is unstable; averaging the vectors of three images is more stable.
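The averaging and the word2vec-style arithmetic can be sketched as follows. Toy 2-dimensional vectors and illustrative concept names are used here:

```python
def vec_avg(vectors):
    # Average several exemplar latent vectors into one stable concept vector
    n = len(vectors)
    return [sum(vals) / n for vals in zip(*vectors)]

def vec_arith(a, b, c):
    # a - b + c in the latent space, the analogue of word2vec arithmetic:
    # smiling_woman - neutral_woman + neutral_man ~ smiling_man
    return [x - y + z for x, y, z in zip(a, b, c)]

smiling_woman = vec_avg([[1.0, 0.2], [0.8, 0.4], [1.2, 0.0]])
neutral_woman = vec_avg([[0.0, 0.2], [0.2, 0.4], [-0.2, 0.0]])
neutral_man = vec_avg([[0.0, -0.8], [0.2, -1.0], [-0.2, -1.2]])
smiling_man = vec_arith(smiling_woman, neutral_woman, neutral_man)
```

Decoding `smiling_man` with the generator would, in the real experiment, produce an image of a smiling man.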

You can see that a single image is unstable, while averaging three images captures features such as smiling and sunglasses.

What's more, a stable vector can be learned that performs a particular transformation, for example a change of pose (azimuth).

Summary

The main contribution of this paper looks simple, but the amount of work behind it is very large, and it fully demonstrates the authors' engineering skill.

But I think the greater contribution lies in how the authors evaluate the results. Generative models are hard to judge as good or bad, and this paper, through a series of means — exploring the latent space, analyzing the network, and comparing feature performance — proves that DCGAN really is a powerful algorithm.

References

[1] Radford, A., Metz, L., Chintala, S. Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks.

