Deep Learning Review Week 1: Generative Adversarial Nets


Adit Deshpande

CS undergrad at UCLA ('19)


Starting this week, I'll be doing a new series called Deep Learning Review. Every couple weeks or so, I'll be summarizing and explaining research papers in specific subfields of deep learning. This week I'll begin with Generative Adversarial Networks.


According to Yann LeCun, "adversarial training is the coolest thing since sliced bread". I'm inclined to believe so because I don't think sliced bread ever created this much buzz and excitement within the deep learning community. In this post, we'll be looking at 3 papers that built on the pioneering work of Ian Goodfellow in 2014.

Quick Summary of GANs

I briefly mentioned Ian Goodfellow's Generative Adversarial Network paper in one of my prior blog posts, on 9 deep learning papers you should know about. The basic idea of these networks is that you have 2 models, a generative model and a discriminative model. The discriminative model has the task of determining whether a given image looks natural (an image from the dataset) or looks like it has been artificially created. The task of the generator is to create natural looking images that are similar to the original data distribution. This can be thought of as a zero-sum or minimax two player game. The analogy used in the paper is that the generative model is like "a team of counterfeiters, trying to produce and use fake currency" while the discriminative model is like "the police, trying to detect the counterfeit currency". The generator is trying to fool the discriminator while the discriminator is trying to not get fooled by the generator. As the models train through alternating optimization, both methods are improved until a point where the "counterfeits are indistinguishable from the genuine articles".
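To make the minimax game concrete, here is a minimal Python sketch of the value function from Goodfellow's paper, V(D, G) = E_x[log D(x)] + E_z[log(1 - D(G(z)))], estimated from a batch of discriminator outputs. The function names are my own, and I'm assuming the discriminator's outputs are already probabilities in (0, 1).

```python
import math
from typing import Sequence

def gan_value(d_real: Sequence[float], d_fake: Sequence[float]) -> float:
    """Monte Carlo estimate of V(D, G) = E_x[log D(x)] + E_z[log(1 - D(G(z)))].

    d_real: discriminator outputs D(x) on a batch of real images, each in (0, 1)
    d_fake: discriminator outputs D(G(z)) on a batch of generated images, each in (0, 1)
    (Assumption: D already ends in a sigmoid, so these are probabilities.)
    """
    real_term = sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return real_term + fake_term

def discriminator_loss(d_real: Sequence[float], d_fake: Sequence[float]) -> float:
    # The discriminator (the "police") maximizes V, so its loss to minimize is -V.
    return -gan_value(d_real, d_fake)

def generator_loss_minimax(d_fake: Sequence[float]) -> float:
    # The generator (the "counterfeiters") minimizes the E_z[log(1 - D(G(z)))] term of V.
    return sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)

def generator_loss_nonsaturating(d_fake: Sequence[float]) -> float:
    # Alternative suggested in the paper: maximize log D(G(z)) instead, which gives
    # stronger gradients early in training when D easily rejects the fakes.
    return -sum(math.log(p) for p in d_fake) / len(d_fake)
```

If the discriminator outputs 0.5 for everything, `gan_value([0.5], [0.5])` equals -log 4 (about -1.386), which is the value of the game at its global optimum in Goodfellow's analysis, where the generator's distribution matches the data distribution.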

Laplacian Pyramid of Adversarial Networks


So, one of the most important uses of adversarial networks is the ability to create natural looking images after training the generator for a sufficient amount of time. These are some samples of what the generator outputted in Goodfellow's paper.

As you can see, the generator worked well with digits and faces, but it created very fuzzy and vague images when using the CIFAR-10 dataset.

In order to fix this problem, Emily Denton, Soumith Chintala, Arthur Szlam, and Rob Fergus published the paper titled "Deep Generative Image Models using a Laplacian Pyramid of Adversarial Networks". The main contribution of the paper is a type of network architecture that produces high-quality generated images that are mistaken for real images almost 40% of the time when assessed by human evaluators.


Before getting into the paper, let's think about the job of the generator in a GAN. It has to generate a large, complex, and natural image that is good enough to fool a trained discriminator. Not such an easy task in one shot. The authors combat this by using multiple CNN models to sequentially generate images at increasing scales. As Emily Denton said in her talk on LAPGANs:

The approach of this paper is to build a Laplacian pyramid of generative models. For those that aren't familiar, a Laplacian pyramid is basically an image representation that consists of a series of filtered images at successively sparser densities (i.e., lower resolutions). The idea is that each level of the pyramid representation contains information about the image at a particular scale. It is a sort of decomposition of the original image.
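A Laplacian pyramid is easy to build by hand. The sketch below is a simplified version under a couple of assumptions: images are small grayscale lists of lists whose sides are divisible by 2^levels, and 2x2 averaging plus pixel repetition stand in for the Gaussian blurring and interpolation a real implementation would use. Each level stores the detail lost when the image is shrunk by one scale, so adding the levels back together recovers the original.

```python
from typing import List

Image = List[List[float]]  # grayscale image as rows of pixel values

def downsample(img: Image) -> Image:
    """Halve the resolution by averaging each 2x2 block.
    Assumption: a simple stand-in for Gaussian blur + subsample; sides must be even."""
    return [[(img[y][x] + img[y][x + 1] + img[y + 1][x] + img[y + 1][x + 1]) / 4.0
             for x in range(0, len(img[0]), 2)]
            for y in range(0, len(img), 2)]

def upsample(img: Image) -> Image:
    """Double the resolution by repeating each pixel in a 2x2 block."""
    out: Image = []
    for row in img:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out

def subtract(a: Image, b: Image) -> Image:
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def add(a: Image, b: Image) -> Image:
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def build_laplacian_pyramid(img: Image, levels: int) -> List[Image]:
    """Return [h_0, ..., h_{levels-1}, low]: detail images from finest to
    coarsest, followed by the low-resolution residual image."""
    pyramid: List[Image] = []
    current = img
    for _ in range(levels):
        smaller = downsample(current)
        # The detail that is lost by going down one scale.
        pyramid.append(subtract(current, upsample(smaller)))
        current = smaller
    pyramid.append(current)
    return pyramid

def reconstruct(pyramid: List[Image]) -> Image:
    """Invert the decomposition: start from the coarsest image and add detail back."""
    current = pyramid[-1]
    for detail in reversed(pyramid[:-1]):
        current = add(upsample(current), detail)
    return current
```

For a 4×4 image and `levels=2`, the pyramid holds a 4×4 detail image, a 2×2 detail image, and a 1×1 low-resolution image, and `reconstruct` gives back the original.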

Let's review what the inputs and outputs of a simple GAN are. The generator takes in a noise vector sampled from a distribution and outputs an image. The discriminator takes in this image (or a real image from the training data) and outputs a scalar describing how "real" the image is. Now, let's look at a conditional GAN (CGAN). Everything remains the same, except that both the discriminator and the generator receive another piece of information as an input. This information is likely in the form of some sort of class label or another image.
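Here is a minimal sketch of that difference in inputs, assuming the condition is a class label fed in as a one-hot vector and simply concatenated to the other input. That is one simple option, not the only one; the helper names are hypothetical, and real models usually do this inside the network.

```python
import random
from typing import List

def one_hot(label: int, num_classes: int) -> List[float]:
    vec = [0.0] * num_classes
    vec[label] = 1.0
    return vec

def sample_noise(dim: int, rng: random.Random) -> List[float]:
    # z ~ N(0, I); a uniform distribution is another common choice.
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]

def cgan_generator_input(z: List[float], label: int, num_classes: int) -> List[float]:
    # Plain GAN: the generator sees only z. CGAN (assumed conditioning by
    # concatenation): it also sees the condition y.
    return z + one_hot(label, num_classes)

def cgan_discriminator_input(image: List[float], label: int, num_classes: int) -> List[float]:
    # The discriminator judges the (image, condition) pair, not the image alone.
    return image + one_hot(label, num_classes)
```

With a 100-dimensional z and 10 classes, the generator's input grows to 110 values; a flattened 28×28 image paired with a label becomes 794 values for the discriminator.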

Network Architecture

The authors propose a set of ConvNet models, where each level of the pyramid has a ConvNet associated with it. The change from the traditional GAN structure is that instead of having just one generator CNN that creates the whole image, we have a series of CNNs that create the image sequentially by slowly increasing the resolution (aka going along the pyramid) and refining images in a coarse to fine fashion. Each level has its own CNN and is trained on two components. One is a low resolution image and the other is a noise vector (which was the only input in traditional GANs). This is where the idea of CGANs comes into play, as there are multiple inputs. The output is a generated image that is then upsampled and used as input to the next level of the pyramid. This method is effective because the generators in each level are able to utilize information from different resolutions in order to create more finely grained outputs in the successive layers.
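Putting the pieces together, sampling from a LAPGAN-style model goes coarse to fine. The sketch below is my reading of the setup, not the authors' code: the coarsest level is a plain GAN generator, and each further level is a conditional generator that takes the upsampled image plus noise and predicts a detail image that gets added back, mirroring how a Laplacian pyramid is reconstructed. The generators are hypothetical stand-in callables, and pixel repetition stands in for proper upsampling.

```python
import random
from typing import Callable, List

Image = List[List[float]]
# Hypothetical per-level generator: (upsampled coarse image, noise image) -> detail image.
LevelGenerator = Callable[[Image, Image], Image]

def upsample(img: Image) -> Image:
    """Double the resolution by repeating each pixel in a 2x2 block."""
    out: Image = []
    for row in img:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out

def noise_like(img: Image, rng: random.Random) -> Image:
    return [[rng.gauss(0.0, 1.0) for _ in row] for row in img]

def lapgan_sample(coarsest: Callable[[random.Random], Image],
                  level_generators: List[LevelGenerator],
                  rng: random.Random) -> Image:
    # Coarsest level: an ordinary GAN generator that maps noise to a tiny image.
    image = coarsest(rng)
    # Each further level (ordered coarse to fine) is a conditional generator:
    # it sees the upsampled image (the condition) plus fresh noise and
    # predicts the missing detail, which is added to the upsampled image.
    for generate_detail in level_generators:
        up = upsample(image)
        detail = generate_detail(up, noise_like(up, rng))
        image = [[u + d for u, d in zip(ru, rd)] for ru, rd in zip(up, detail)]
    return image
```

With a 1×1 starting image and two levels, the output is 4×4; real generators would be trained CNNs, one per level.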

Generative Adversarial Text to Image Synthesis


This paper was released just this past June and looks into the task of converting text descriptions into images. For example, the input to the network could be "a flower with pink petals" and the output is a generated image that contains those characteristics. So this task involves two main components. One is utilizing forms of natural language processing to understand the input description and the other is a generative network that is able to output an accurate and natural image representation. One note the authors make is that the task of going from text to image is actually a lot harder than going from image to text (remember Karpathy's paper). This is because of the incredible number of pixel configurations and because we can't really decompose the task into just predicting the next word (the way that image to text works).


The approach the authors take is training a GAN that is conditioned on text features created by a recurrent text encoder (I won't go too much into this). Both the generator and the discriminator use these features at different points in their respective network architectures. This enables the correlation between the input description and the generated image.

Network Architecture

Let's look at the generator first. We have our noise vector z along with a text encoding as the inputs to the network. Basically, this text encoding is a vector that encapsulates information about the input description in a form that can then be concatenated to the noise vector (see image below for a visualization). Deconv layers are then used to transform the input vector into the synthetic image.
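A minimal sketch of how the generator's input could be assembled, in plain Python. As I understand the paper, the text encoding is first compressed by a fully connected layer and a leaky ReLU before it is concatenated with z; treat that step, the 0.2 slope, and the tiny weight matrix in the test as illustrative assumptions rather than the authors' exact settings.

```python
from typing import List

def leaky_relu(x: float, slope: float = 0.2) -> float:
    # Assumption: slope 0.2 is a common choice, not taken from this post.
    return x if x > 0 else slope * x

def linear(weights: List[List[float]], bias: List[float], x: List[float]) -> List[float]:
    """Fully connected layer: one row of `weights` per output unit."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]

def text_conditioned_input(z: List[float], text_embedding: List[float],
                           proj_w: List[List[float]], proj_b: List[float]) -> List[float]:
    # Assumed compression step: squeeze the text encoding to a few dimensions...
    compressed = [leaky_relu(v) for v in linear(proj_w, proj_b, text_embedding)]
    # ...then append it to the noise vector; deconv layers would take it from here.
    return z + compressed
```

The result is just a longer vector, so everything downstream of the input layer looks like an ordinary GAN generator.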

The discriminator takes in an image and passes it through a series of conv layers (with BatchNorm and leaky ReLUs). When the spatial dimensions finally get to 4×4, the network performs a depth concatenation with the text encoding we were talking about earlier. After this, there are 2 more conv layers and the output is (as always) a score for the realness of the image.
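The "depth concatenation" step just means stacking the text encoding onto the feature map as extra channels. A sketch, assuming feature maps are stored as [channel][row][col] lists: each embedding value is copied to every one of the 4×4 positions so the shapes line up. The channel counts in the test are made up for illustration.

```python
from typing import List

FeatureMap = List[List[List[float]]]  # indexed [channel][row][col]

def depth_concat_text(features: FeatureMap, text_embedding: List[float]) -> FeatureMap:
    """Tile each text-embedding value across the spatial grid and stack the
    result behind the image feature channels (channel-wise concatenation).
    Assumption: the embedding has already been compressed to its final size."""
    height, width = len(features[0]), len(features[0][0])
    tiled = [[[value] * width for _ in range(height)] for value in text_embedding]
    return features + tiled
```

With C image channels and a d-dimensional embedding, the output has C + d channels at the same 4×4 resolution, ready for the remaining conv layers.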


One of the interesting things about this model is the way it has to be trained. If you think closely about the task at hand, the generator has to get two jobs right. One is that it has to generate natural and plausible looking images. The other is that the images must correlate to the given text description. The discriminator, thus, must also take these two things into account, making sure that "fake" or unnatural images are rejected as well as images that mismatch the text. In order to create these versatile models, the authors train with three types of data: {real image, right text}, {fake image, right text}, and {real image, wrong text}. With that last training data type, the discriminator must learn to reject mismatched images (even if they look very natural).
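The three kinds of training pairs translate directly into a discriminator loss: one pair should be scored as real and two should be scored as fake. A minimal sketch using binary cross-entropy on the discriminator's probabilities; weighting the two negative cases by 1/2 each is my reading of the paper's training algorithm, so treat it as an assumption and check the paper before relying on it.

```python
import math

def bce(p: float, target: float) -> float:
    """Binary cross-entropy of a predicted probability p against a 0/1 target."""
    return -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))

def matching_aware_d_loss(s_real_right: float, s_fake_right: float, s_real_wrong: float) -> float:
    """Discriminator loss over the three kinds of training pairs.

    s_real_right: D(real image, matching text)      -> target "real" (1)
    s_fake_right: D(generated image, matching text) -> target "fake" (0)
    s_real_wrong: D(real image, mismatched text)    -> target "fake" (0)
    """
    # Assumption: the two negative cases share the weight of the positive one.
    return bce(s_real_right, 1.0) + 0.5 * (bce(s_fake_right, 0.0) + bce(s_real_wrong, 0.0))
```

A discriminator that happily accepts a real image paired with the wrong caption (a high `s_real_wrong`) pays a much larger loss than one that rejects it, which is exactly the pressure the {real image, wrong text} pairs are meant to add.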

Super Resolution using GANs


As a testament to the type of rapid innovation that takes place in this field, the team at Twitter Cortex released this paper only a couple weeks ago. The model being proposed in this paper is a super-resolution generative adversarial network, or SRGAN (will we ever run out of these acronyms?). The main contribution is a brand new loss function (better than plain old MSE) that enables the network to recover realistic textures and fine grained details from images that have been heavily downsampled.


Let's first take a look at the new perceptual loss function they introduce. This loss function can be divided into two parts, the adversarial loss and the content loss. From a high level, the adversarial loss encourages images to look natural (look like they're from the data distribution) and the content loss makes sure that the new resolution image has similar features to the original low res image.

Network Architecture

Okay, now let's get into the specifics. Let's start off with a high resolution version of a given image and then a lower resolution version. We want to train our generator so that, given the low resolution image, it can produce an output that's as close to the high res version as possible. This output is called a super-resolved image. The discriminator is then trained to distinguish between these images. Same old same old, right? The generator network architecture uses a set of B residual blocks that contain ReLUs, BatchNorm, and conv layers. Once the low res image passes through those blocks, there are two deconv layers that enable the increase of the resolution. Then, looking at the discriminator, we have eight convolutional layers that lead into a sigmoid activation function, which outputs the probability of whether the image is real (high res) or artificial (super res).
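To see how the shapes work out, here is a toy, single-channel sketch of the generator in pure Python. Assumptions: every residual block is conv, BatchNorm, ReLU, conv, BatchNorm plus a skip connection; BatchNorm is simplified to normalizing one feature map with no learned scale or shift; and nearest-neighbour upsampling stands in for the learned deconv layers. Two 2× stages give a 4× larger output.

```python
import math
from typing import List, Sequence, Tuple

Image = List[List[float]]
Kernel = List[List[float]]  # 3x3 weights

def conv3x3(img: Image, kernel: Kernel) -> Image:
    """Single-channel 3x3 convolution (cross-correlation) with zero padding."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < h and 0 <= xx < w:
                        acc += kernel[dy + 1][dx + 1] * img[yy][xx]
            out[y][x] = acc
    return out

def batch_norm(img: Image, eps: float = 1e-5) -> Image:
    """Simplified BatchNorm (assumption): zero mean, unit variance over one map."""
    vals = [v for row in img for v in row]
    mean = sum(vals) / len(vals)
    var = sum((v - mean) ** 2 for v in vals) / len(vals)
    return [[(v - mean) / math.sqrt(var + eps) for v in row] for row in img]

def relu(img: Image) -> Image:
    return [[max(0.0, v) for v in row] for row in img]

def residual_block(img: Image, k1: Kernel, k2: Kernel) -> Image:
    # conv -> BN -> ReLU -> conv -> BN, then add the skip connection.
    h = relu(batch_norm(conv3x3(img, k1)))
    h = batch_norm(conv3x3(h, k2))
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(img, h)]

def upsample2x(img: Image) -> Image:
    """Nearest-neighbour 2x upsampling (stand-in for a learned deconv layer)."""
    out: Image = []
    for row in img:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out

def toy_srgan_generator(lr_img: Image, kernels: Sequence[Tuple[Kernel, Kernel]]) -> Image:
    """kernels holds one (k1, k2) pair per residual block, so B = len(kernels)."""
    h = lr_img
    for k1, k2 in kernels:
        h = residual_block(h, k1, k2)  # resolution is unchanged inside the blocks
    return upsample2x(upsample2x(h))    # two 2x stages -> 4x the input size
```

With all-zero kernels every residual block reduces to the identity (only the skip path survives), so a 2×2 input simply comes out as its 8×8 enlargement. That identity-by-default behaviour is one reason residual blocks are convenient to stack B times.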

Loss Function

Now let's look at that new loss function. It is actually a weighted sum of individual loss functions. The first is called a content loss. Basically, it's a Euclidean distance loss between the feature maps (in a pretrained VGG network) of the new reconstructed image (output of the network) and the actual high res training image. From what I understand, the main goal is to ensure that the content of the two images is similar by looking at their respective feature activations after feeding them into a trained ConvNet (comment below if anyone has other ideas!). The other major loss function the authors defined is the adversarial loss. This one is similar to what we normally expect from GANs. It encourages outputs that are similar to the original data distribution through negative log likelihood. A regularization loss caps off the trio of functions. With this novel loss function, the generator makes sure to output larger res images that look natural and still retain a similar pixel space when compared to the low res version.
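Here is a minimal sketch of the weighted sum in Python, assuming the VGG feature maps and the discriminator outputs have already been computed and flattened into lists. I've used a total variation penalty for the regularization term, and the default weights (1e-3 for the adversarial term, 2e-8 for the regularizer) are assumptions to tune, not numbers to take from this post.

```python
import math
from typing import List, Sequence

def content_loss(feat_sr: Sequence[float], feat_hr: Sequence[float]) -> float:
    """Mean squared Euclidean distance between (precomputed, flattened) feature
    maps of the super-resolved image and the real high-res image."""
    return sum((a - b) ** 2 for a, b in zip(feat_sr, feat_hr)) / len(feat_sr)

def adversarial_loss(d_sr: Sequence[float]) -> float:
    # Negative log-likelihood that the discriminator labels the SR images as real.
    return -sum(math.log(p) for p in d_sr) / len(d_sr)

def tv_loss(img: List[List[float]]) -> float:
    """Assumed regularizer: total variation, penalizing differences between
    horizontally and vertically neighbouring pixels."""
    h, w = len(img), len(img[0])
    total = 0.0
    for y in range(h):
        for x in range(w):
            if x + 1 < w:
                total += (img[y][x + 1] - img[y][x]) ** 2
            if y + 1 < h:
                total += (img[y + 1][x] - img[y][x]) ** 2
    return total

def perceptual_loss(feat_sr: Sequence[float], feat_hr: Sequence[float],
                    d_sr: Sequence[float], sr_img: List[List[float]],
                    w_adv: float = 1e-3, w_reg: float = 2e-8) -> float:
    # Weighted sum of the trio; the default weights are assumptions.
    return (content_loss(feat_sr, feat_hr)
            + w_adv * adversarial_loss(d_sr)
            + w_reg * tv_loss(sr_img))
```

Because the adversarial term is scaled down so much, the content loss dominates early on, while the adversarial term nudges the generator toward textures the discriminator finds convincing.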

Quick General Side Note: GANs use a largely unsupervised training process (all you need is a dataset of real images, no labels or anything). This means that we can take advantage of a lot of the unstructured image data that is available today. After training, we can use the output or intermediate layers as feature extractors that can be used for other classifiers, which now won't need as much training data to achieve good accuracy.

Paper that I couldn't get to, but still insanely cool: DCGANs. The authors didn't do anything crazy. They just trained a really really large ConvNet, but the trick was that they had the right hyperparameters to really make the training work (aka BatchNorm, Adam, leaky ReLUs).
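For reference, here are the kinds of settings the DCGAN paper is known for, written as a small Python config plus a single Adam update. The values (batch size 128, N(0, 0.02) weight initialization, leaky ReLU slope 0.2, Adam learning rate 0.0002 with beta1 = 0.5) are as I remember them from the paper; beta2 = 0.999 is simply Adam's usual default and an assumption here.

```python
import math
import random
from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class DCGANConfig:
    batch_size: int = 128
    init_std: float = 0.02      # weights drawn from N(0, 0.02^2)
    leaky_slope: float = 0.2    # leaky ReLU slope in the discriminator
    adam_lr: float = 2e-4
    adam_beta1: float = 0.5     # lowered from Adam's usual 0.9
    adam_beta2: float = 0.999   # assumption: Adam's common default

def init_weight(rng: random.Random, cfg: DCGANConfig) -> float:
    return rng.gauss(0.0, cfg.init_std)

def leaky_relu(x: float, slope: float) -> float:
    return x if x > 0 else slope * x

def adam_step(theta: float, grad: float, m: float, v: float, t: int,
              cfg: DCGANConfig, eps: float = 1e-8) -> Tuple[float, float, float]:
    """One Adam update for a single scalar parameter at step t (t starts at 1)."""
    m = cfg.adam_beta1 * m + (1 - cfg.adam_beta1) * grad         # 1st moment
    v = cfg.adam_beta2 * v + (1 - cfg.adam_beta2) * grad * grad  # 2nd moment
    m_hat = m / (1 - cfg.adam_beta1 ** t)                        # bias correction
    v_hat = v / (1 - cfg.adam_beta2 ** t)
    theta = theta - cfg.adam_lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta, m, v
```

On the very first step Adam's bias correction makes the update roughly the learning rate in size, whatever the gradient's scale, which is part of why these small, fixed settings transfer reasonably well between GAN setups.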

GANs that could change the fashion industry:



Written on September 30, 2016
