"This article are a continuation from a wizard ' s Guide to Autoencoders:part 1, if you haven ' t read it but are familiar wit H The basics of autoencoders then continue on. You'll need to know a little bit about probability theory which can is found here. "
Part 1: Autoencoder?
We left off Part 1 by passing a value of (0, 0) to our trained decoder (which has 2 neurons at its input) and looking at its output. It looked blurry and didn't represent a clear digit, leaving us with the conclusion that the output of the encoder (also known as the latent code) is not distributed evenly over any particular space.
So, our main aim in this part is to force the encoder output to match a given prior distribution. This required distribution can be a normal (Gaussian) distribution, a uniform distribution, a gamma distribution and so on. This should then cause the latent code (encoder output) to be evenly distributed over the given prior distribution, which would allow our decoder to learn a mapping from the prior to the data distribution (the distribution of MNIST images). If that paragraph made complete sense to you, great; if not, here's an analogy that should help.
Let's say you're in college and have opted to take up Machine Learning (I couldn't, I had to take another course :P) as one of your courses. Now, if the course instructor doesn't provide a syllabus guide or a reference book, what will you study for your finals? (Assume your classes weren't helpful.)
You could be asked questions from any subfield of ML, so what would you do? Make up stuff with what you know??
This is what happens if we don't constrain our encoder output to follow some distribution: the decoder cannot learn a meaningful mapping from just any number to an image.
But, if you're given a proper syllabus guide, you can just go through the material before the exam and walk in with an idea of what to expect.
Similarly, if we force the encoder output to follow a known distribution like a Gaussian, then it can learn to spread the latent code so that it covers the entire distribution, and the decoder can learn a mapping without any gaps. Sounds good?
We now know that an autoencoder has two parts, each performing a completely opposite task.
"Two people of similar nature can never get along, it takes two opposites to harmonize." - Ram Mohan
The encoder, which is used to get a latent code (encoder output) from the input, with the constraint that the dimension of the latent code should be less than the input dimension; and secondly, the decoder, which takes in this latent code and tries to reconstruct the original image.
Autoencoder block diagram
Let's see how the encoder output was distributed when we previously implemented our autoencoder (check out Part 1):
Encoder output histogram and distribution
From the distribution graph (towards the right) we can clearly see that our encoder's output distribution is all over the place. Initially, it appears as though the distribution is centred at 0, with most of the values being negative. At later stages during training, the negative samples are distributed farther away from 0 than the positive ones (also, we might not even get the same distribution if we run the experiment again). This leads to large gaps in the encoder distribution, which isn't a good thing if we want to use our decoder as a generative model. But why are these gaps in our encoder distribution a bad thing?
If we give an input that falls in one of these gaps to a trained decoder, then it'll give weird-looking images that don't represent digits at its output (I know, that's the 3rd time I've said this).
Another important observation is that training an autoencoder gives us latent codes where similar images (for example, all the 2s or 3s...) can end up far apart in Euclidean space. This can, for example, cause all the 2s in our dataset to be mapped to different regions of the space. We want the latent code to have a meaningful representation, keeping images of similar digits close together. Something like this:
A good 2D distribution
Different colored regions represent one class of images; notice how regions of the same color are close to one another.
We can solve some of the above-mentioned problems with Adversarial Autoencoders.
An adversarial autoencoder is quite similar to an autoencoder, but the encoder is trained in an adversarial manner to force it to output a required distribution.
Understanding Adversarial Autoencoders (AAEs) requires knowledge of Generative Adversarial Networks (GANs). I have written an article on GANs which can be found here: GANs N' Roses
"This article assumes the reader is familiar with neural Networks and using TensorFlow. If not, we would request you to...medium.com
If you already know about GANs, here's a quick recap (feel free to skip this section if you remember the next two points):
GANs have two neural nets, a discriminator and a generator. The generator generates fake images. We train the discriminator to tell apart real images from our dataset and the fake ones produced by the generator. The generator initially generates random noise (because its weights are random). After training the discriminator to discriminate between this random noise and real images, we connect the generator to the discriminator and backprop only through the generator, with the constraint that the discriminator output should be 1 (i.e., the discriminator should classify the output of the generator as a real image). We'll again train the discriminator to tell apart the new fake images from our generator and the real ones from our database. This is followed by training the generator to produce better-looking fakes. We continue this process until the generator becomes so good at generating fake images that the discriminator is no longer able to tell real images from fake ones. At the end, we're left with a generator which can produce real-looking fake images given a random set of numbers as its input.
Here's a block diagram of an adversarial autoencoder:
AAE block diagram
x → input image
q(z/x) → encoder output given input x
z → latent code (fake input), z is drawn from q(z/x)
z' → real input with the required distribution
p(x/z) → decoder output given z
D() → discriminator
x_ → reconstructed image
Again, our main aim is to force the encoder to output values which have a given prior distribution (this can be a normal, gamma, ... distribution). We'll use the encoder (q(z/x)) as our generator, the discriminator to tell whether the samples come from the prior distribution (p(z)) or from the output of the encoder (z), and the decoder (p(x/z)) to get back the original input image.
This architecture can be used to impose a prior distribution on the encoder output. Let's have a look at how we go about training an AAE.
Training an AAE has 2 phases:
Reconstruction phase:
We'll train both the encoder and the decoder to minimize the reconstruction loss (the mean squared error between the input and the decoder output images; check out Part 1 for more details). Forget that the discriminator even exists in this phase (I've greyed out the parts that aren't required in this phase).
Reconstruction phase
As usual, we'll pass inputs to the encoder, which will give us our latent code; later, we'll pass this latent code to the decoder to get back the input image. We'll backprop through both the encoder and the decoder weights so that the reconstruction loss is reduced.
Regularization phase:
In this phase we'll have to train the discriminator and the generator (which is nothing but our encoder). Just forget that the decoder exists.
Training the discriminator:
We train the discriminator to classify the encoder output (z) and some random input (z', which will have our required distribution). For example, the random input can be normally distributed with a mean of 0 and a standard deviation of 5.
So, the discriminator should give us an output of 1 if we pass in random inputs with the desired distribution (real values) and an output of 0 (fake values) when we pass in the encoder output. Intuitively, both the encoder output and the random inputs to the discriminator should have the same size.
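As a rough sketch, those two targets translate into a cross-entropy loss like this (discriminator() and encoder_output refer to the pieces built in the implementation section further down; all names and values here are assumptions, not necessarily the repository's):

```python
import tensorflow as tf

# Placeholder fed with samples drawn from the desired prior, e.g. N(0, 5)
real_distribution = tf.placeholder(tf.float32, shape=[None, z_dim], name='Real_distribution')

# Logits for the real (prior) samples and the fake ones (encoder output)
with tf.variable_scope(tf.get_variable_scope()):
    d_real = discriminator(real_distribution)
with tf.variable_scope(tf.get_variable_scope()):
    d_fake = discriminator(encoder_output, reuse=True)

# Push d_real towards 1 and d_fake towards 0
dc_loss_real = tf.reduce_mean(
    tf.nn.sigmoid_cross_entropy_with_logits(labels=tf.ones_like(d_real), logits=d_real))
dc_loss_fake = tf.reduce_mean(
    tf.nn.sigmoid_cross_entropy_with_logits(labels=tf.zeros_like(d_fake), logits=d_fake))
dc_loss = dc_loss_real + dc_loss_fake
```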
The next step will be to force the encoder to output latent codes with the desired distribution. To accomplish this, we'll connect the encoder output as the input to the discriminator:
We'll fix the discriminator weights to whatever they currently are (make them untrainable) and fix the target to 1 at the discriminator output. Later, we pass in images to the encoder and find the discriminator output, which is then used to compute the loss (cross-entropy cost function). We'll backprop only through the encoder weights, which causes the encoder to learn the required distribution and produce output that follows it (fixing the discriminator target to 1 should cause the encoder to learn the required distribution by looking at the discriminator weights).
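A minimal sketch of that step, assuming the same names as above (restricting var_list is one way to keep the discriminator weights fixed):

```python
# d_fake is the discriminator's logit for the encoder output (see the sketch above);
# the target is now fixed to 1, i.e. "make the encoder output look real".
generator_loss = tf.reduce_mean(
    tf.nn.sigmoid_cross_entropy_with_logits(labels=tf.ones_like(d_fake), logits=d_fake))

# Backprop only through the encoder: restrict the optimizer to variables whose
# layer names carry the 'e_' prefix, leaving the discriminator untouched.
en_var = [var for var in tf.trainable_variables() if var.name.startswith('e_')]
generator_optimizer = tf.train.AdamOptimizer(learning_rate=0.001,
                                             beta1=0.9).minimize(generator_loss, var_list=en_var)
```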
Now that the theoretical part is out of the way, let's have a look at how we can implement this using TensorFlow.
Here's the entire code for Part 2 (it's very similar to what we've discussed in Part 1): Naresh1318/Adversarial_Autoencoder
Adversarial_Autoencoder: A Wizard's Guide to Adversarial Autoencoders (github.com)
As usual, we have our helper function dense():
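The gist itself isn't reproduced here, but a dense() helper along these lines would do the job (the initializer values are my assumption, not necessarily what the repository uses):

```python
import tensorflow as tf

def dense(x, n1, n2, name):
    """Fully connected layer: n1 units coming in, n2 units going out."""
    with tf.variable_scope(name, reuse=None):
        weights = tf.get_variable("weights", shape=[n1, n2],
                                  initializer=tf.random_normal_initializer(mean=0., stddev=0.01))
        bias = tf.get_variable("bias", shape=[n2],
                               initializer=tf.constant_initializer(0.0))
        out = tf.add(tf.matmul(x, weights), bias, name='matmul')
        return out
```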
I haven't changed the encoder and the decoder architectures:
Encoder architecture
Decoder architecture
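For reference, here's a sketch of what those two networks could look like using the dense() helper above (input_dim, the hidden sizes n_l1, n_l2 and the latent size z_dim are assumed values):

```python
# Assumed hyperparameters, not necessarily the ones in the repository
input_dim = 784      # 28 x 28 MNIST images
n_l1, n_l2 = 1000, 1000
z_dim = 2

def encoder(x, reuse=False):
    """Maps a 784-dim image to a z_dim-dim latent code."""
    if reuse:
        tf.get_variable_scope().reuse_variables()
    e_dense_1 = tf.nn.relu(dense(x, input_dim, n_l1, 'e_dense_1'))
    e_dense_2 = tf.nn.relu(dense(e_dense_1, n_l1, n_l2, 'e_dense_2'))
    latent_variable = dense(e_dense_2, n_l2, z_dim, 'e_latent_variable')
    return latent_variable

def decoder(x, reuse=False):
    """Maps a z_dim-dim latent code back to a 784-dim image."""
    if reuse:
        tf.get_variable_scope().reuse_variables()
    d_dense_1 = tf.nn.relu(dense(x, z_dim, n_l2, 'd_dense_1'))
    d_dense_2 = tf.nn.relu(dense(d_dense_1, n_l2, n_l1, 'd_dense_2'))
    output = tf.nn.sigmoid(dense(d_dense_2, n_l1, input_dim, 'd_output'))
    return output
```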
Here's the discriminator architecture:
Discriminator architecture
It's similar to the encoder architecture; the input has a shape of z_dim (batch_size, z_dim actually) and the output has a shape of 1 (batch_size, 1).
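A sketch of such a discriminator, again built with the dense() helper (the hidden layer sizes are assumptions):

```python
def discriminator(x, reuse=False):
    """Takes a latent code of shape (batch_size, z_dim) and outputs a single logit."""
    if reuse:
        tf.get_variable_scope().reuse_variables()
    dc_den1 = tf.nn.relu(dense(x, z_dim, n_l1, name='dc_den1'))
    dc_den2 = tf.nn.relu(dense(dc_den1, n_l1, n_l2, name='dc_den2'))
    output = dense(dc_den2, n_l2, 1, name='dc_output')
    return output
```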
Note that I've used the prefixes e_, d_ and dc_ while defining the dense layers for the encoder, decoder and discriminator respectively. Using these notations helps us collect the weights to be trained easily:
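For instance, the trainable variables can be split by those prefixes (a small sketch of the idea):

```python
all_variables = tf.trainable_variables()
dc_var = [var for var in all_variables if var.name.startswith('dc_')]  # discriminator weights
en_var = [var for var in all_variables if var.name.startswith('e_')]   # encoder weights
```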
We now know that training an AAE has two parts, the first being the reconstruction phase (we'll train our autoencoder to reconstruct the input) and the second being the regularization phase (first the discriminator is trained, followed by the encoder).
We'll begin the reconstruction phase by connecting our encoder output to the decoder input:
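In code, that connection could look something like this (x_input and x_target are assumed placeholder names for the MNIST batch; the variable scope usage is explained just below):

```python
x_input = tf.placeholder(dtype=tf.float32, shape=[None, input_dim], name='Input')
x_target = tf.placeholder(dtype=tf.float32, shape=[None, input_dim], name='Target')

with tf.variable_scope(tf.get_variable_scope()):
    encoder_output = encoder(x_input)          # latent code
    decoder_output = decoder(encoder_output)   # reconstructed image
```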
I've used tf.variable_scope(tf.get_variable_scope()) each time I call any of our defined architectures, as it allows us to share the weights among all function calls (this happens only if reuse=True).
The loss function as usual is the Mean Squared Error (MSE), which we've come across in Part 1.
Similar to what "we did in part 1, the optimizer (which ' ll update the weights to reduce the loss[hopefully]") is implemented As Follows:i couldn ' t help it:P
That's it for the reconstruction phase; next we move on to the regularization phase:
We'll first train the discriminator to tell apart samples drawn from the prior and the fake ones coming from the encoder, and then train the encoder (our generator) to fool the frozen discriminator.
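To tie the sketches above together, one training iteration could run the three updates in order like this (mnist, batch_size, n_iterations and the optimizer names are assumptions based on the ops sketched earlier, not necessarily what the repository does):

```python
import numpy as np
from tensorflow.examples.tutorials.mnist import input_data

mnist = input_data.read_data_sets('./Data', one_hot=True)
batch_size = 100
n_iterations = 1000

# The discriminator is updated only through its own variables (prefix 'dc_')
discriminator_optimizer = tf.train.AdamOptimizer(learning_rate=0.001,
                                                 beta1=0.9).minimize(dc_loss, var_list=dc_var)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for i in range(n_iterations):
        batch_x, _ = mnist.train.next_batch(batch_size)
        # Samples from the desired prior: Gaussian with mean 0, std 5
        z_real_dist = np.random.randn(batch_size, z_dim) * 5.

        # 1. Reconstruction phase: update encoder + decoder
        sess.run(autoencoder_optimizer, feed_dict={x_input: batch_x, x_target: batch_x})
        # 2. Regularization phase, step 1: update the discriminator
        sess.run(discriminator_optimizer,
                 feed_dict={x_input: batch_x, real_distribution: z_real_dist})
        # 3. Regularization phase, step 2: update the encoder (generator)
        sess.run(generator_optimizer, feed_dict={x_input: batch_x})
```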