Generative Adversarial Networks (GANs): the latest genealogy, uncovering the past and present of GANs


Author: Guim Perarnau

Compiled by: Katherinehou, Katrineren, Shanliu, da jie June, Chantianbei

Generative adversarial networks (GANs) have enjoyed enormous popularity ever since they were proposed; Yann LeCun described them as "the most interesting idea in the last ten years in machine learning."

The basic idea behind GANs is almost common knowledge by now, but just as with convolutional neural networks (CNNs), GANs have evolved into many different forms.

Today, we take stock of the distinguishing features of each member of the GAN family.

The list is as follows:

1. DCGANs

2. Improved DCGANs

3. Conditional GANs

4. InfoGANs

5. Wasserstein GANs

6. Improved WGANs

7. BEGANs

8. ProGANs

9. CycleGANs

Note that this article will not contain the following:

• Complex technical analysis

• Code (though links to code are provided)

• An exhaustive list of research papers

(For that, you can follow this link: https://github.com/zhangqianhui/AdversarialNetsPapers)

If you would like to see more GAN-related content, feel free to leave us a comment.

Introduction to GANs

If you are already familiar with GANs, you can skip this section.

GANs were first proposed by Ian Goodfellow. A GAN consists of two networks, a generator and a discriminator. They are trained simultaneously and compete in a minimax game: the generator is trained to produce lifelike images that deceive the discriminator, while the discriminator learns not to be fooled by the generator.

An overview of the GAN training process

First, the generator samples a noise vector z from a simple distribution (such as a normal distribution) and upsamples it to generate an image. In the first iterations, these images look very noisy. Then, the discriminator is given the fake images and learns to identify them. The generator in turn receives feedback from the discriminator via backpropagation and gradually gets better at generating images. In the end, we want the distribution of fake images to be as close as possible to the distribution of real images. Or, simply put, we want the fake images to look as real as possible.
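To make this concrete, here is a minimal PyTorch sketch of such a training loop. The flattened 28x28 images, network sizes, and learning rates are illustrative assumptions for this post, not details from any particular paper:

```python
import torch
import torch.nn as nn

# Toy generator: maps a 100-dim noise vector to a flattened 28x28 image.
G = nn.Sequential(nn.Linear(100, 256), nn.ReLU(),
                  nn.Linear(256, 784), nn.Tanh())

# Toy discriminator: maps a flattened image to a real/fake probability.
D = nn.Sequential(nn.Linear(784, 256), nn.LeakyReLU(0.2),
                  nn.Linear(256, 1), nn.Sigmoid())

opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCELoss()

def train_step(real_images):  # real_images: (batch, 784), scaled to [-1, 1]
    batch = real_images.size(0)
    ones, zeros = torch.ones(batch, 1), torch.zeros(batch, 1)

    # Discriminator step: learn to label real images 1 and fakes 0.
    fake_images = G(torch.randn(batch, 100)).detach()  # detach: don't update G here
    loss_d = bce(D(real_images), ones) + bce(D(fake_images), zeros)
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()

    # Generator step: backpropagated feedback from D pushes G to fool it.
    loss_g = bce(D(G(torch.randn(batch, 100))), ones)
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()
```

Training alternates between the two steps; the "tips" mentioned below, and the DCGAN guidelines later in this article, exist mainly to keep this adversarial loop stable.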

It is worth mentioning that, because GANs are optimized via a minimax objective, the training process can be very unstable. But there are some "tips" you can use to make training more robust.

In the following video, you can see how the images generated by a GAN evolve during training.

Code

If you are interested in a basic implementation of GANs, here are links to the code:

TensorFlow (https://github.com/ericjang/genadv_tutorial/blob/master/genadv1.ipynb)

Torch and Python (PyTorch) (https://github.com/devnag/pytorch-generative-adversarial-networks; https://medium.com/@devnag/generative-adversarial-networks-gans-in-50-lines-of-code-pytorch-e81b79659e3f)

Torch and Lua (https://github.com/lopezpaz/metal)

Although these are not state-of-the-art implementations, they are very helpful for grasping the idea.

Next, in roughly chronological order, I will describe some of the GAN advances and types that have appeared in recent years.

Deep Convolutional GANs (DCGANs)

DCGANs were the first major improvement on the GAN architecture. They are more stable in training and produce higher-quality samples.

Paper link: https://arxiv.org/abs/1511.06434

The DCGAN authors focused on improving the architecture of the original GAN. They found that:

• Batch normalization must be used in both networks.

• Fully connected hidden layers are not a good idea.

• Avoid pooling; use strided convolutions instead.

• Rectified linear unit (ReLU) activations are useful.
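As a rough illustration of these guidelines, here is a sketch of a DCGAN-style generator in PyTorch. The channel counts and 32x32 output size are illustrative assumptions; see the paper and the implementations mentioned below for the reference architecture:

```python
import torch.nn as nn

# A DCGAN-style generator: no fully connected hidden layers, strided
# transposed convolutions instead of pooling, batch normalization in
# every block, and ReLU activations (Tanh at the output).
generator = nn.Sequential(
    # z viewed as (batch, 100, 1, 1) -> (batch, 256, 4, 4)
    nn.ConvTranspose2d(100, 256, kernel_size=4, stride=1, padding=0, bias=False),
    nn.BatchNorm2d(256), nn.ReLU(True),
    # -> (batch, 128, 8, 8)
    nn.ConvTranspose2d(256, 128, 4, 2, 1, bias=False),
    nn.BatchNorm2d(128), nn.ReLU(True),
    # -> (batch, 64, 16, 16)
    nn.ConvTranspose2d(128, 64, 4, 2, 1, bias=False),
    nn.BatchNorm2d(64), nn.ReLU(True),
    # -> (batch, 3, 32, 32); Tanh keeps pixel values in [-1, 1]
    nn.ConvTranspose2d(64, 3, 4, 2, 1, bias=False),
    nn.Tanh(),
)
```

The discriminator mirrors this structure, with strided Conv2d layers and LeakyReLU activations in place of the transposed convolutions and ReLUs.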

DCGANs are still relevant today, as they are one of the main baselines for implementing and using GANs.

Shortly after this paper was published, implementations became available in Theano, Torch, TensorFlow, and Chainer, ready to be tested on any dataset you are interested in. So if you ever come across strangely generated datasets, you can totally blame these guys.

Use DCGANs in the following scenarios:

• You want something that performs better than basic GANs (which is essential). Basic GANs work on simple datasets, but DCGANs are far stronger.

• You are looking for a solid baseline against which to compare your new GAN algorithm.

From now on, unless stated otherwise, all the GAN types I describe are assumed to have a DCGAN architecture.

Improved DCGANs

A series of techniques that improve upon the original DCGAN; for example, this improved baseline can generate better high-resolution images.

Paper link: https://arxiv.org/abs/1606.03498

One of the main problems with GANs is convergence: it is not guaranteed, and despite the DCGAN architectural guidelines, training can still be very unstable.

In this paper, the authors propose several improvements to the GAN training process. Here are a few of them:

Feature matching: rather than having the generator fool the discriminator as thoroughly as possible, they propose a new objective function that requires the generated data to match the statistics of the real data. In this case, the discriminator is used only to specify which statistics are worth matching.

Historical averaging: when updating parameter values, also take their past values into account.

One-sided label smoothing: this one is very simple: change your discriminator's target outputs from [0 = fake image, 1 = real image] to [0 = fake image, 0.9 = real image]. Yes, really, this improves training.

Virtual batch normalization: avoid dependence on the other samples in the same minibatch by using statistics computed on a separate reference batch. This is computationally expensive, so it is used only in the generator.

All of these techniques help the model perform better when generating high-resolution images, which is one of GANs' weak points. Two of the tricks, one-sided label smoothing and feature matching, are sketched below.
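A minimal PyTorch sketch of those two tricks, assuming the basic BCE setup from the introduction (the function names here are mine, not the paper's):

```python
import torch
import torch.nn as nn

bce = nn.BCELoss()

def discriminator_loss(d_real, d_fake):
    # One-sided label smoothing: real targets become 0.9 instead of 1.0,
    # while fake targets stay at 0.0 (hence "one-sided").
    real_targets = torch.full_like(d_real, 0.9)
    return bce(d_real, real_targets) + bce(d_fake, torch.zeros_like(d_fake))

def feature_matching_loss(real_features, fake_features):
    # Feature matching: rather than directly fooling the discriminator,
    # the generator matches the mean activations of an intermediate
    # discriminator layer on real vs. generated data.
    return (real_features.mean(0) - fake_features.mean(0)).pow(2).sum()
```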

For comparison, look at the difference in performance between the original DCGAN and the Improved DCGAN on 128x128 images:

These are all supposed to be pictures of dogs. As you can see, the DCGAN does poorly, while with the Improved DCGAN you can at least make out some dog-like features. This also illustrates another limitation of GANs: generating structured content.

Use Improved DCGANs in the following scenario:

• You want to generate higher-resolution images.

Conditional GANs (cGANs)

Conditional GANs use extra label information to generate higher-quality images and to control how the images look.

Paper link: https://arxiv.org/abs/1411.1784

cGANs are an extension of the GAN framework. We use conditional information y that describes some characteristic of the data. Say we are working with face images; then y could describe attributes such as hair color or gender. This attribute information is fed into both the generator and the discriminator.

Above: a conditional GAN that uses facial attribute information.

There are two interesting things about conditional GANs:

1. As you feed more information into the model, the GAN learns to exploit it and therefore generates better samples.

2. They give us two handles for controlling how the images look. Without a cGAN, all the image information is encoded in z. With a cGAN, as we add the conditional information y, z and y come to encode different kinds of information.

For example, suppose y encodes the handwritten digit 0-9. Then z would encode the remaining stylistic variables, such as the digit's size, stroke thickness, rotation angle, and so on. (A code sketch of this conditioning follows the figure below.)

Above: MNIST (Mixed National Institute of Standards and Technology database, a simple machine vision dataset) samples showing the difference between z and y. z varies across rows and y across columns: z encodes the digit's style, while y encodes the digit itself.
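As a minimal sketch of how such conditioning can be wired up in PyTorch: a common approach (an illustrative assumption here, not necessarily the paper's exact architecture) is to embed the label y and concatenate it with z at the generator input; the discriminator is conditioned in the same way:

```python
import torch
import torch.nn as nn

class ConditionalGenerator(nn.Module):
    def __init__(self, z_dim=100, num_classes=10, img_dim=784):
        super().__init__()
        # The label y is embedded and concatenated with z, so y and z
        # end up encoding different information (y: which digit, z: its style).
        self.embed = nn.Embedding(num_classes, num_classes)
        self.net = nn.Sequential(
            nn.Linear(z_dim + num_classes, 256), nn.ReLU(),
            nn.Linear(256, img_dim), nn.Tanh(),
        )

    def forward(self, z, y):
        return self.net(torch.cat([z, self.embed(y)], dim=1))

# Usage: ask for a "7" in a random style.
g = ConditionalGenerator()
image = g(torch.randn(1, 100), torch.tensor([7]))  # shape: (1, 784)
```

Feeding the same z with different y values changes the digit while keeping its style; varying z with y fixed changes the style while keeping the digit.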

Recent research results

There are many interesting papers in this area; I will introduce two:

1. Learning what and where to draw with GANs

(Paper: https://arxiv.org/abs/1610.02454; code: https://github.com/reedscot/nips2016). In this paper, the authors design a way to tell the GAN what to draw, using bounding boxes and keypoints to specify where the subject should be drawn. As illustrated below:

2. StackGANs

(Paper: https://arxiv.org/abs/1612.03242; code: https://github.com/hanzhanggit/StackGAN)

This paper is similar to the previous one. Here, the authors use two GANs in sequence (Stage I and Stage II) to improve image quality. Stage I produces a low-resolution image capturing the "basic" concept of the picture. Stage II refines the Stage I image, adding detail at a higher resolution.

As far as I know, this is one of the best models for generating high-quality images; see for yourself in the image below.

Use cGANs in the following scenarios:

1. You have a labeled training set and want to improve the quality of the generated images.

2. You want fine-grained control over certain image features, for example generating a red bird of a certain size at a specific location.

Information-Maximizing GANs (InfoGANs)

InfoGANs can encode meaningful image features in parts of the noise vector z in an unsupervised way, for example encoding the rotation of a digit.

Paper link: https://arxiv.org/abs/1606.03657

Have you ever wondered what information the input noise z encodes in a GAN? It usually encodes different types of image features in a very "noisy" way. For example, take one position of the z vector and interpolate its value between -1 and 1. Below is what this looks like for a model trained on the MNIST dataset.

In the image, the top left corresponds to an interpolated value of -1 and the bottom right to a value of 1.

In the image above, the generated digit appears to slowly morph from a 4 into a "Y"-like shape (much like a blend of 4 and 9).

This is what I meant earlier by a "noisy" encoding: a single position in z is a parameter for multiple image features at once.

In this case, that position changes both the digit itself (loosely, from 4 to 9) and its style (from bold to slanted).

And you cannot pin down any exact meaning for that position of z.

But what if you could use some positions of z to represent unique, constrained pieces of information, just like the conditional information y in a cGAN?

For example, what if the first position held a value between 0 and 9 that controls the digit, while the second position controls its rotation? This is exactly what the authors propose.

Interestingly, unlike cGANs, they achieve this without any label information, making the method unsupervised.

They do this by splitting the latent vector into two parts: c and z.

• c encodes the semantic features of the data distribution.

• z encodes all the unstructured noise in that distribution.

How do they force these features to be encoded in c?

By changing the loss function so that the GAN cannot simply ignore c: they add an information-theoretic regularization term that ensures high mutual information between c and the generator's output distribution.

In other words, if c changes, the generated image must change too. As a result, you cannot explicitly control what type of information ends up in c, but each position of c acquires its own unique meaning.

As shown in the following illustration:

Above: the first position of c encodes the digit's category, and the second position encodes its rotation.
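Here is a minimal sketch of the latent split and of the auxiliary prediction that enforces the mutual information. In the paper this is done with a recognition network Q that predicts the code c from the generated image; the dimensions and the tiny Q architecture below are illustrative assumptions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

z_dim, c_cat = 62, 10   # unstructured noise size and a 10-way categorical code

def sample_latent(batch):
    # The generator input is the concatenation of noise z and code c.
    z = torch.randn(batch, z_dim)
    idx = torch.randint(0, c_cat, (batch,))
    c = F.one_hot(idx, c_cat).float()
    return torch.cat([z, c], dim=1), idx

# Recognition network Q: predicts the code c from the generated image.
# Training G and Q so that this prediction succeeds is the variational
# proxy for maximizing the mutual information between c and G's output.
Q = nn.Sequential(nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, c_cat))

def info_loss(fake_images, idx):
    # Added to both the generator's and Q's training objective.
    return F.cross_entropy(Q(fake_images), idx)
```

For continuous codes (such as rotation), the paper models c with a Gaussian rather than a categorical distribution.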

However, the price of not using label information is that these encodings only work on fairly simple datasets, such as the digits in MNIST.

You also still need to hand-specify the form of each position of c. For example, the authors define the first position of c as an integer from 0 to 9, matching the dataset's ten digit classes. You could argue this is not fully unsupervised, since you do have to hand the model a few details.

Use InfoGANs in the following scenarios:

1. Your dataset is not very complex.

2. You would like to train a cGAN but lack label information.

3. You want to discover the main meaningful image features of your dataset and control them.

Wasserstein GANs (WGANs)

WGANs modify the loss function to use the Wasserstein distance, so the loss value correlates with image quality. Training stability also improves, and becomes less dependent on the architecture.

Paper link: https://arxiv.org/abs/1701.07875

GANs have chronic convergence problems: you never know when to stop training, and the loss function says nothing about image quality. This is a big problem because:

• You need to keep looking at generated samples to verify that training is on track.

• You don't know when to stop training, because there is no convergence signal.

• You have no numerical value to guide hyperparameter tuning.
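WGANs address exactly this: they replace the discriminator with a "critic" that outputs an unbounded score instead of a probability, and the difference in mean scores on real and fake data approximates the Wasserstein distance, which tracks image quality. Here is a minimal PyTorch sketch of the resulting losses and of the weight clipping the original paper uses to enforce the Lipschitz constraint (the function names are mine):

```python
import torch

def critic_loss(critic, real_images, fake_images):
    # The critic has no Sigmoid output; the gap between its mean scores on
    # real and fake data approximates the Wasserstein distance. The critic
    # maximizes the gap, so we minimize its negation.
    return -(critic(real_images).mean() - critic(fake_images).mean())

def generator_loss(critic, fake_images):
    # The generator tries to raise the critic's score on its samples.
    return -critic(fake_images).mean()

def clip_weights(critic, c=0.01):
    # Weight clipping after every critic update keeps the critic roughly
    # Lipschitz, which the Wasserstein formulation requires.
    with torch.no_grad():
        for p in critic.parameters():
            p.clamp_(-c, c)
```

The paper also recommends RMSprop with a small learning rate rather than momentum-based optimizers such as Adam.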
