In Depth | Existing Work on Generative Adversarial Networks (GAN)



What I want to share with you today is some of the work on image generation. These works are all based on one large family of models, Generative Adversarial Networks (GAN). You can almost read the development trajectory off the model names: GAN -> CGAN -> LAPGAN -> DCGAN -> GRAN -> VAEGAN, and so on. So in today's post I will try to sort out these papers, the relationships between them, and their main differences. The papers to be covered are:

1. "Generative adversarial Nets"

2. "Conditional generative adversarial Nets"

3. Deep generative Image Models using a Laplacian pyramid of adversarial Networks

4. "Unsupervised representation Learning with Deep convolutional generative adversarial"

5. "Autoencoding beyond pixels using a learned similarity metric"

6. "Generating Images with recurrent adversarial Networks"


GAN

Since the topic is image generation, we necessarily talk about generative models. The generative models that have emerged in the past two or three years fall mainly into two categories. The first is based on the Variational Autoencoder (VAE); its merit is that it replaces the reconstruction-error-based learning of plain autoencoders, which overfits easily, with a different objective: make the learned latent distribution satisfy a predefined prior as closely as possible. It is evident, however, that this need to "assume a prior distribution" is itself a limitation.

The second category, Generative Adversarial Networks (GAN), avoids this problem more gracefully. GAN [1] is inspired by the Nash equilibrium of game theory and contains a pair of models: a generative model (G) and a discriminative model (D). In a classic analogy, G is like a thief who keeps improving his tricks to deceive the police, D, while D keeps training his eye to avoid being deceived. The actual learning process thus becomes a competition between G and D: D is shown either a randomly drawn real sample or a "fake sample" produced by G, and must judge whether it is real. Written as a formula, this becomes the following minimax form:
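In the notation of [1], the objective is:

$$\min_G \max_D V(D,G) \;=\; \mathbb{E}_{x \sim p_{\mathrm{data}}(x)}\big[\log D(x)\big] \;+\; \mathbb{E}_{z \sim p_z(z)}\big[\log\big(1 - D(G(z))\big)\big]$$

where z is noise drawn from the prior p_z(z), and D(x) is D's estimated probability that x came from the real data rather than from G.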


As the description above shows, GAN's adversarial setup no longer requires an assumed data distribution: we do not have to write down a formula for p(x), we just sample from it directly, so in theory the model can fully approximate the real data. This is GAN's biggest advantage.

However, everything has a price. The downside of requiring no prior modeling is that the model is too free: for larger images with more pixels, the plain GAN approach becomes hard to control. For similar reasons, in GAN [1] each training iteration updates D k times and updates G only once.
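As a minimal sketch of that alternating schedule (PyTorch-flavored; G, D, their optimizers, and the `real_batches` iterator are assumptions for illustration, not code from the paper):

```python
import torch

# Sketch of the alternating update in [1]: the discriminator D is
# updated k times for every single update of the generator G.
def train_step(G, D, opt_D, opt_G, real_batches, z_dim=100, k=1):
    bce = torch.nn.BCELoss()
    for _ in range(k):                                   # k discriminator updates
        x_real = next(real_batches)
        n = x_real.size(0)
        z = torch.randn(n, z_dim)                        # sample from the noise prior
        x_fake = G(z).detach()                           # block gradients into G
        loss_D = bce(D(x_real), torch.ones(n, 1)) \
               + bce(D(x_fake), torch.zeros(n, 1))       # real -> 1, fake -> 0
        opt_D.zero_grad(); loss_D.backward(); opt_D.step()
    z = torch.randn(n, z_dim)                            # one generator update:
    loss_G = bce(D(G(z)), torch.ones(n, 1))              # G tries to make D say "real"
    opt_G.zero_grad(); loss_G.backward(); opt_G.step()
```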


CGAN


To tame GAN's excessive freedom, a very natural idea is to add some constraints, which is exactly what Conditional Generative Adversarial Nets (CGAN) [2] does. The improvement is very direct: add a conditioning variable y to the modeling of both D and G. This approach later proved to be very effective.
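A minimal sketch of the idea, assuming y is a one-hot label and following the common practice of concatenating it onto both networks' inputs (the layer sizes here are illustrative assumptions, not the paper's exact architecture):

```python
import torch

# Sketch of CGAN conditioning [2]: the condition y enters both G and D.
class CondGenerator(torch.nn.Module):
    def __init__(self, z_dim=100, y_dim=10, out_dim=784):
        super().__init__()
        self.net = torch.nn.Sequential(
            torch.nn.Linear(z_dim + y_dim, 256), torch.nn.ReLU(),
            torch.nn.Linear(256, out_dim), torch.nn.Tanh())
    def forward(self, z, y):
        return self.net(torch.cat([z, y], dim=1))   # condition joined to the noise

class CondDiscriminator(torch.nn.Module):
    def __init__(self, x_dim=784, y_dim=10):
        super().__init__()
        self.net = torch.nn.Sequential(
            torch.nn.Linear(x_dim + y_dim, 256), torch.nn.LeakyReLU(0.2),
            torch.nn.Linear(256, 1), torch.nn.Sigmoid())
    def forward(self, x, y):
        return self.net(torch.cat([x, y], dim=1))   # condition joined to the sample
```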


LAPGAN

From another angle, one way to tame GAN's freedom is not to ask GAN to complete the whole task at once, but to generate one part at a time and build the complete picture over multiple passes. Sounds familiar? Yes, this was the idea behind DRAW, the DeepMind work that caught fire last year. DRAW pointed out that we humans do not necessarily finish a picture in one go, so why demand that a machine do so; it therefore used a sequential VAE to let the machine "write out" a digit bit by bit. LAPGAN [3], proposed by Facebook et al., uses the same idea to improve on the GAN foundation. LAPGAN [3] comes with both a project page and open-source code, and is a great piece of work to study.


To implement this sequential version, LAPGAN [3] adopts the decades-old Laplacian pyramid, and is named after it.

The pyramid's main operations are downsampling and upsampling, and the benefit is that at each step the model only needs to learn the residual between the real sample and the generated image; to some extent this shares the spirit of residual networks, since residuals are comparatively easier to approximate and learn. This idea leads to the following LAPGAN training process:


In this figure, while the image still has many pixels, the Laplacian pyramid process is applied, and at each step (each pyramid level) only the residual is passed to D for comparison. Once the image is small enough, i.e., the rightmost step, upsampling and downsampling are no longer needed, and what is passed to D is the raw sample and a generated image. Facebook points out that this sequential approach reduces how much each GAN needs to learn, and therefore raises what GAN as a whole can learn. It is worth noting that LAPGAN is in fact a LAPCGAN: every level is conditional. In addition, the GAN at each level is trained independently. The paper closes with plenty of engineering experience, all available on the project page.
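A sketch of the pyramid bookkeeping itself (the 2x2-average downsample and nearest-neighbor upsample here are crude stand-ins for the paper's proper blur kernels):

```python
import numpy as np

# Sketch of the Laplacian-pyramid decomposition that LAPGAN [3] builds on.
def downsample(img):
    h, w = img.shape
    return img.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def upsample(img):
    return img.repeat(2, axis=0).repeat(2, axis=1)

def laplacian_pyramid(img, levels=3):
    residuals = []
    for _ in range(levels):
        small = downsample(img)
        residuals.append(img - upsample(small))  # the residual a GAN at this level models
        img = small
    return residuals, img                        # per-level residuals + coarsest image
```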


DCGAN

DCGAN [4] may not look very innovative, but its open-source code is now the most used and most borrowed, which must be credited to engineering experience even more robust than LAPGAN's. In other words, DCGAN (Deep Convolutional Generative Adversarial Networks) [4] pins down many architectural designs that matter for GAN's unstable training, along with concrete CNN-specific experience. For example:


They suggest that since strided convolutional networks can in theory realize the same functions and effects as pooling-based CNNs, a fully differentiable generator G built from strided convolutions is more controllable and more stable inside GAN. Likewise, while Facebook's LAPGAN reported that using Batch Normalization (BN) in GAN's D caused the whole training to collapse, DCGAN successfully applies BN in both G and D. These engineering breakthroughs are no doubt the main reason more people choose DCGAN as the base for their work.
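A sketch of such a generator (fractionally-strided, i.e., transposed, convolutions instead of pooling, BN after every layer except the output, ReLU inside and Tanh at the end; the channel counts are illustrative assumptions, not the paper's exact configuration):

```python
import torch

# Sketch of a DCGAN-style generator [4] mapping a z vector to a 32x32 RGB image.
def make_dcgan_generator(z_dim=100):
    return torch.nn.Sequential(
        torch.nn.ConvTranspose2d(z_dim, 256, 4, 1, 0),  # 1x1 -> 4x4
        torch.nn.BatchNorm2d(256), torch.nn.ReLU(),
        torch.nn.ConvTranspose2d(256, 128, 4, 2, 1),    # 4x4 -> 8x8
        torch.nn.BatchNorm2d(128), torch.nn.ReLU(),
        torch.nn.ConvTranspose2d(128, 64, 4, 2, 1),     # 8x8 -> 16x16
        torch.nn.BatchNorm2d(64), torch.nn.ReLU(),
        torch.nn.ConvTranspose2d(64, 3, 4, 2, 1),       # 16x16 -> 32x32
        torch.nn.Tanh())

G = make_dcgan_generator()
img = G(torch.randn(1, 100, 1, 1))  # -> tensor of shape (1, 3, 32, 32)
```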


On the other hand, they also contribute much to visualizing generative models. For example, borrowing the latent-space interpolation from the ICLR 2016 paper "Generating Sentences from a Continuous Space", they interpolate between hidden states and show the generated pictures, where we can watch the image evolve gradually.
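The interpolation itself is simple; a sketch, assuming `G` is any trained generator mapping a latent vector (a NumPy array here) to an image:

```python
import numpy as np

# Decode images along the straight line between two latent vectors z0 and z1.
def interpolate(G, z0, z1, steps=9):
    return [G((1.0 - t) * z0 + t * z1) for t in np.linspace(0.0, 1.0, steps)]
```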


They also show that vector arithmetic works on images (in latent space), with results like the following:
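A sketch of that arithmetic (e.g., the well-known "smiling woman - neutral woman + neutral man" example from [4]; each concept vector is the average z over a few samples sharing the attribute, which the paper notes makes the result stable; `G` is an assumed trained generator):

```python
import numpy as np

# Latent vector arithmetic: combine averaged concept vectors, then decode.
def vector_arithmetic(G, z_pos, z_neg, z_base):
    """z_pos/z_neg/z_base: arrays of latent vectors, one row per sample."""
    z = z_pos.mean(axis=0) - z_neg.mean(axis=0) + z_base.mean(axis=0)
    return G(z[None, :])  # decode the result, e.g. a smiling man
```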


GRAN

The penultimate paper recommended today [5] also has a lot in common with DRAW. As said above, a sequential version is one way to improve GAN. The advantage of sequential models is that each step can use the previous step's result and revise it, much like a conditional setup. To give GAN this sequential ability, the paper [5] combines GAN with an LSTM-style recurrence, names the result GRAN, and divides generation into step-by-step stages. Each step, like an LSTM cell, has a content c_t that decides what to draw, i.e., the content of the drawing; and just as the LSTM has a hidden state, each step also has an h_{c,t}. Unlike a plain LSTM, what each cell draws is decided not only by the hidden state but also by a "hidden of prior" h_z, a prior belonging to the generative model G of the GAN. Concatenating h_z and h_{c,t} together determines the current step's update, that is, what to draw.
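A much-simplified sketch of one such step (the layer shapes are illustrative assumptions, and z is passed directly as h_z, which glosses over the paper's exact parameterization):

```python
import torch

# Sketch of a GRAN-style step [5]: the prior h_z is concatenated with the
# recurrent hidden state h_{c,t} and decoded into this step's canvas delta.
class GRANStep(torch.nn.Module):
    def __init__(self, z_dim=100, h_dim=256, canvas_dim=784):
        super().__init__()
        self.encode = torch.nn.Linear(canvas_dim, h_dim)           # canvas -> h_{c,t}
        self.decode = torch.nn.Linear(z_dim + h_dim, canvas_dim)   # [h_z, h_{c,t}] -> delta

    def forward(self, h_z, canvas):
        h_c = torch.tanh(self.encode(canvas))                  # what was drawn so far
        delta = self.decode(torch.cat([h_z, h_c], dim=1))      # what to draw now
        return canvas + delta                                  # accumulate onto the canvas

# Unroll for T steps; the accumulated canvas is squashed into an image.
def gran_generate(step, z, T=5, canvas_dim=784):
    canvas = torch.zeros(z.size(0), canvas_dim)
    for _ in range(T):
        canvas = step(z, canvas)
    return torch.tanh(canvas)
```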


Moreover, by cleverly exploiting the properties of convolution and its gradient, the restructured GRAN can treat each gradient (transposed-convolution) pass as a decoding process and each convolution pass as an encoding process, so the two correspond to the decoder and encoder parts of DRAW. The biggest difference between GAN and DRAW is that GAN computes its loss in a hidden space, while DRAW computes it in the original input space.

In the experiments, the paper does show that the sequential (multi-step) model produces better images than a single step. However, the evaluation of generative models remains murky, so the experimental comparison with previous GANs and related methods is not very conclusive. That leads to the next point: one of the paper's contributions is a method for evaluating GANs as a special kind of generative model. The existing methods for evaluating generated images, Parzen windows or manual inspection, each have their drawbacks. The paper suggests that, since these are adversarial models, two trained GANs can be set to "compete" in the evaluation, each serving as the other's judge and contestant. The accompanying figure is also quite adorable...
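Roughly, the battle can be scored with two error ratios, as I recall the paper defining them (a sketch; `err` is an assumed helper computing a discriminator's classification error rate on a batch):

```python
# Sketch of the GAN "battle" in [5]: pairs (G1, D1) and (G2, D2) cross-examine
# each other via discriminator error rates.
def gan_battle(err, D1, D2, G1_samples, G2_samples, test_set):
    r_test = err(D1, test_set) / err(D2, test_set)        # ~1 if both Ds generalize equally
    r_sample = err(D1, G2_samples) / err(D2, G1_samples)  # who fools the other's judge more
    # With r_test close to 1, r_sample < 1 favors model 1; > 1 favors model 2.
    return r_test, r_sample
```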

Finally, a shortcoming of the model at this stage: its scalability is not great. Although generation proceeds step by step, the final experiments only try a few discrete step counts, namely 1, 3, and 5.

VAEGAN

All of the above reduces GAN's freedom by turning it into a sequential version. But there is also a work [6] that goes "the other way around": it takes the features learned by GAN and uses them in the VAE's reconstruction objective, thereby combining the advantages of GAN and VAE. Hence this work is called VAEGAN.


Specifically, previous reconstruction objectives all used element-wise distance metrics, which are in fact poor for learning many kinds of hidden features/spaces. The fundamental idea of this paper is to use the discriminator D of GAN as a learned similarity measure that replaces, or makes up for, the element-wise similarity component in the reconstruction objective. The similarity measure learned by D can be regarded as a metric on a high-level representation. As one can see, the idea is quite extensible.
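A minimal sketch of such a learned-similarity reconstruction loss (which intermediate layer of D to use, here `d_features`, is an assumption for illustration):

```python
import torch

# Sketch of VAEGAN's idea [6]: compare the input and its VAE reconstruction
# in the feature space of the discriminator, not pixel by pixel.
def learned_similarity_loss(d_features, x, x_reconstructed):
    f_real = d_features(x)                     # D's hidden representation of the input
    f_rec = d_features(x_reconstructed)        # ... and of the reconstruction
    return torch.mean((f_real - f_rec) ** 2)   # distance in D's feature space
```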



That covers all of today's papers. Building on the same game-theoretic competition, other work such as the adversarial autoencoder has also appeared; the ideas are all alike, and interested readers can explore them on their own.


