Conditional Generative Adversarial Nets

Source: Internet
Author: User
Tags: nets, processing, text

This article covers an extension of GANs in which both the generator and the discriminator take an additional condition y into account, making the adversarial game more directed and yielding better results.

As we all know, a GAN is trained as a minimax game:

    min_G max_D V(D, G) = E_{x ~ p_data(x)}[log D(x)] + E_{z ~ p_z(z)}[log(1 - D(G(z)))]

The original GAN has several shortcomings that can be improved, the first being that the model is not controllable. As the introduction above shows, the model takes random noise as its only input, so it is difficult to control the structure of the output. For example, with a plain GAN we can train a generator that takes random noise and produces a picture of a digit from 0-9; in real applications, however, we often want to generate a *specified* image.

In this paper, by introducing the condition y, the objective function becomes:

    min_G max_D V(D, G) = E_{x ~ p_data(x)}[log D(x|y)] + E_{z ~ p_z(z)}[log(1 - D(G(z|y)))]

The structure of the conditional generative adversarial network is as follows:

The noise z and the condition y are fed into the generator together, combined into a joint hidden representation, and mapped by nonlinear functions to the data space.

The data x and the condition y are fed into the discriminator together, combined into a joint hidden representation, from which the discriminator judges the probability that x is real training data.
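The two paragraphs above can be sketched as a pair of forward passes. This is a minimal toy illustration with made-up dimensions (10-d noise, 3 classes, 8-d "images") and random, untrained weights; only the wiring, i.e. concatenating the condition y into both networks, follows the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(a):
    return np.maximum(a, 0.0)

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

# Toy dimensions (assumptions, not from the paper).
z_dim, y_dim, x_dim, h_dim = 10, 3, 8, 16

# Generator weights: the joint [z; y] vector is mapped nonlinearly to data space.
Wg1 = rng.standard_normal((z_dim + y_dim, h_dim)) * 0.1
Wg2 = rng.standard_normal((h_dim, x_dim)) * 0.1

def generator(z, y):
    joint = np.concatenate([z, y])           # joint hidden input [z; y]
    return sigmoid(relu(joint @ Wg1) @ Wg2)  # map to data space

# Discriminator weights: the joint [x; y] vector is mapped to a probability.
Wd1 = rng.standard_normal((x_dim + y_dim, h_dim)) * 0.1
Wd2 = rng.standard_normal((h_dim, 1)) * 0.1

def discriminator(x, y):
    joint = np.concatenate([x, y])           # joint hidden input [x; y]
    return sigmoid(relu(joint @ Wd1) @ Wd2)[0]

z = rng.standard_normal(z_dim)
y = np.eye(y_dim)[1]             # one-hot condition, class 1
fake = generator(z, y)
p_real = discriminator(fake, y)  # probability that the sample is real, given y
```

Training would then optimize these weights with the conditional minimax objective above; the sketch only shows how y enters both networks.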

Experiment

1. MNIST Dataset Experiment

On MNIST, the numeric class label serves as the condition, so that a digit corresponding to the given class label is generated.

The input to the generative model is a 100-dimensional noise vector drawn from a uniform distribution, and the condition y is the one-hot code of the class label. The noise z and the label y are first mapped to hidden layers of 200 and 1000 units respectively; the combined 1200 units are then mapped a second time. Finally, a single-channel 784-dimensional (28*28) image is output through a sigmoid layer.
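The generator just described can be sketched as a NumPy forward pass. The layer sizes (100 → 200, 10 → 1000, 1200 → 784) come from the description above; the weight initialization is an arbitrary placeholder, since this is an untrained sketch, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
relu = lambda a: np.maximum(a, 0.0)
sigmoid = lambda a: 1.0 / (1.0 + np.exp(-a))

# Layer sizes as in the text: z -> 200 units, y -> 1000 units,
# concatenated 1200 units -> 784-d sigmoid output.
Wz = rng.standard_normal((100, 200)) * 0.05   # noise branch
Wy = rng.standard_normal((10, 1000)) * 0.05   # one-hot label branch
Wo = rng.standard_normal((1200, 784)) * 0.05  # joint layer -> image

def generator(z, y):
    h = np.concatenate([relu(z @ Wz), relu(y @ Wy)])  # 200 + 1000 = 1200 units
    return sigmoid(h @ Wo)                            # 784 = 28*28 pixels in [0, 1]

z = rng.uniform(-1.0, 1.0, size=100)  # 100-d uniformly distributed noise
y = np.eye(10)[7]                     # one-hot code for digit 7
img = generator(z, y).reshape(28, 28)
```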

The discriminative model maps the input image x to a maxout layer with 240 units and 5 pieces, and maps y to a maxout layer with 50 units and 5 pieces. The two hidden representations are then jointly mapped to a maxout layer with 240 units and 4 pieces, followed by a sigmoid layer. The final output is the probability that the sample x comes from the training set.
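A matching sketch of this discriminator, with a small maxout helper (a maxout unit computes k linear outputs and keeps the maximum). The unit/piece counts follow the description above; weight scales are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda a: 1.0 / (1.0 + np.exp(-a))

def maxout(v, W):
    # W has shape (in_dim, units, pieces): k linear maps per unit, keep the max.
    out = (v @ W.reshape(W.shape[0], -1)).reshape(W.shape[1], W.shape[2])
    return out.max(axis=1)

# Shapes as in the text; weight values are placeholders for an untrained model.
Wx = rng.standard_normal((784, 240, 5)) * 0.05  # image branch: 240 units, 5 pieces
Wy = rng.standard_normal((10, 50, 5)) * 0.05    # label branch: 50 units, 5 pieces
Wj = rng.standard_normal((290, 240, 4)) * 0.05  # joint maxout: 240 units, 4 pieces
Wo = rng.standard_normal((240, 1)) * 0.05       # final sigmoid unit

def discriminator(x, y):
    h = np.concatenate([maxout(x, Wx), maxout(y, Wy)])  # 240 + 50 = 290 units
    return sigmoid(maxout(h, Wj) @ Wo)[0]

p = discriminator(rng.uniform(0, 1, 784), np.eye(10)[3])  # P(x is real | y)
```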

2. Automatic Image Labeling Experiment on the Flickr Dataset (MIR Flickr-25k)

First, a convolutional model is trained as a feature extractor on the full ImageNet dataset (21,000 labels). For the word representation (the original says "world representation", which appears to be a typo for "word representation"), the author uses the user tags, titles, and descriptions from the YFCC100M dataset and trains 200-dimensional word vectors with skip-gram. Words with a frequency below 200 are ignored during training, leaving a final dictionary of 247,465 entries.

The experiment is based on the MIR Flickr dataset, using the models above to extract image and text features. For evaluation, 100 tag samples were generated for each image; for each generated sample the 20 closest words were found by cosine similarity, and finally the 10 most common words were selected.
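The evaluation procedure just described can be sketched directly. The vocabulary and vectors below are toy stand-ins (in the experiment the vectors come from the generator and the skip-gram dictionary), and the nearest-word count is cut from 20 to 3 because the toy vocabulary is tiny; the selection logic is the same.

```python
import numpy as np
from collections import Counter

rng = np.random.default_rng(0)

# Toy stand-ins for the 247,465-word dictionary of 200-d skip-gram vectors.
vocab = ["sky", "water", "dog", "tree", "city", "beach", "sun", "cloud"]
word_vecs = rng.standard_normal((len(vocab), 200))
word_vecs /= np.linalg.norm(word_vecs, axis=1, keepdims=True)

# Stand-in for 100 generated tag vectors for one image.
generated = rng.standard_normal((100, 200))

counts = Counter()
for g in generated:
    sims = word_vecs @ (g / np.linalg.norm(g))  # cosine similarity to every word
    nearest = np.argsort(-sims)[:3]             # closest words (20 in the paper)
    counts.update(vocab[i] for i in nearest)

# Keep the most common words across all samples (10 in the paper).
top_words = [w for w, _ in counts.most_common(10)]
```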

In the experiment, the best-performing generator receives 100-dimensional Gaussian noise and maps it to a 500-dimensional ReLU layer, while mapping the 4096-dimensional image feature vector to a 2000-dimensional ReLU hidden layer. These two representations are then joined and mapped to a 200-dimensional linear layer, which outputs the 200-dimensional generated tag vector. (Noise + image)
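As a sketch, this generator is the same two-branch pattern as the MNIST one, but conditioned on an image feature vector instead of a class label, and with a linear (unsquashed) output so the result lives in the word-embedding space. Layer sizes follow the description above; weights are arbitrary placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)
relu = lambda a: np.maximum(a, 0.0)

Wz = rng.standard_normal((100, 500)) * 0.02    # noise -> 500-d ReLU layer
Wf = rng.standard_normal((4096, 2000)) * 0.02  # image feature -> 2000-d ReLU layer
Wo = rng.standard_normal((2500, 200)) * 0.02   # joint (500 + 2000) -> 200-d linear

def tag_generator(z, img_feat):
    h = np.concatenate([relu(z @ Wz), relu(img_feat @ Wf)])  # 500 + 2000 = 2500
    return h @ Wo  # linear output: a 200-d vector in word-vector space

vec = tag_generator(rng.standard_normal(100),    # 100-d Gaussian noise
                    rng.standard_normal(4096))   # extracted image feature
```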

The discriminator consists of ReLU hidden layers of 500 and 1200 units for processing the text and image inputs respectively, followed by a joint maxout layer with 1000 units and 3 pieces that feeds the final sigmoid layer. (Text + image)

Note:

1. One-hot encoding

Consider three categorical features:

    • Gender: ["Male", "female"]
    • Area: ["Europe", "US", "Asia"]
    • Browser: ["Firefox", "Chrome", "Safari", "Internet Explorer"]
For a sample such as ["Male", "US", "Internet Explorer"], we need to digitize these categorical values. The most direct way is serialization: [0, 1, 3]. However, features processed this way cannot be fed directly into a machine learning algorithm. One-hot encoding handles this: the gender attribute becomes two-dimensional, the area three-dimensional, and the browser four-dimensional. Encoding the sample ["Male", "US", "Internet Explorer"] this way, "Male" corresponds to [1, 0], "US" corresponds to [0, 1, 0], and "Internet Explorer" corresponds to [0, 0, 0, 1]. The fully digitized feature vector is [1, 0, 0, 1, 0, 0, 0, 0, 1]. The result is that the data becomes very sparse.
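The encoding above can be reproduced in a few lines; this is a generic sketch, not tied to any particular library:

```python
import numpy as np

# The three categorical features from the example above.
features = {
    "gender":  ["Male", "Female"],
    "area":    ["Europe", "US", "Asia"],
    "browser": ["Firefox", "Chrome", "Safari", "Internet Explorer"],
}

def one_hot(value, categories):
    vec = np.zeros(len(categories), dtype=int)
    vec[categories.index(value)] = 1  # set only the matching category to 1
    return vec

sample = {"gender": "Male", "area": "US", "browser": "Internet Explorer"}
encoded = np.concatenate([one_hot(sample[k], cats) for k, cats in features.items()])
# encoded -> [1 0 0 1 0 0 0 0 1], a sparse 9-dimensional vector
```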

2. Maxout (with parameter k = 5)

This is why using maxout multiplies the number of parameters by k: where an ordinary layer needs only one set of parameters, a maxout layer keeps k sets and takes the maximum over the k linear outputs.
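A minimal sketch of a maxout layer, with toy dimensions chosen here (8 inputs, 4 units) and k = 5 pieces as in the paper's discriminator, making the k-fold parameter count explicit:

```python
import numpy as np

rng = np.random.default_rng(0)

in_dim, units, k = 8, 4, 5  # k = 5 pieces, as in the paper's discriminator

# An ordinary linear layer needs in_dim * units weights; maxout keeps k such
# sets of weights and takes the elementwise max over the k linear outputs.
W = rng.standard_normal((k, in_dim, units))

def maxout(v, W):
    return np.einsum('i,kij->kj', v, W).max(axis=0)  # max over the k pieces

out = maxout(rng.standard_normal(in_dim), W)
assert W.size == k * in_dim * units  # k times the parameters of one linear map
```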

