VAE: an autoencoder whose encoded output follows a normal distribution

Last Update:2018-08-13 Source: Internet

Author: User



An intuitive explanation of autoencoders and VAEs

What is an autoencoder

The autoencoder was initially used as a data compression method, and it has the following characteristics:

1) It is highly data-dependent: an autoencoder can only compress data similar to its training data. This is fairly obvious, because features extracted by a neural network are generally closely tied to the original training set. An autoencoder trained on human faces will perform relatively poorly when compressing pictures of animals in nature, because it has only learned the features of faces, not the features of natural scenes;

2) The compression is lossy, because information is inevitably lost during dimensionality reduction.

Around 2012, it was found that layer-by-layer pre-training with autoencoders made it possible to train deeper convolutional networks. But people soon found that a good initialization strategy is much more effective than laborious layer-by-layer pre-training. Batch normalization, published in 2015, also made deeper networks effectively trainable, and by the end of 2015 residual learning (ResNet) made it possible to train networks of almost any depth.

So today autoencoders have two main applications: data denoising and dimensionality reduction for visualization. However, an autoencoder has one more use: generating data.

We have discussed GANs before. Compared with a GAN, an autoencoder has some advantages and some drawbacks. Let's first look at the advantages.

First, a serious drawback of generating images with a GAN is that the images are produced from random Gaussian noise, so we cannot generate an image of a type we specify. In other words, we cannot tell which noise vector will produce the image we want unless we try the entire initial distribution. With an autoencoder, however, we can pass an image through the encoder to obtain the code for that type of image, which amounts to knowing the noise that corresponds to each image, and we can then generate the image we want by choosing a specific code.

Second, the adversarial setup is both a strength and a limitation of generative networks. The generator is trained by having a discriminator tell "real" pictures from "fake" ones, so the generated picture only has to look as real as possible. That does not guarantee the picture contains what we want. In other words, the generator may produce background patterns that look as real as possible but contain no actual objects.

Structure of the autoencoder

First, here is the general structure of an autoencoder.

An autoencoder has two parts: the first is the encoder and the second is the decoder. The encoder and decoder can be any model, but usually both are neural networks. One neural network reduces the input data to a code, and another neural network decodes that code into generated data that should be as close as possible to the input. The parameters of the encoder and decoder are then trained by comparing the two and minimizing the difference between them. Once training is finished, we can take the decoder on its own and pass in a random code, hoping that it generates data similar to the original data; in the example here, we hope it generates a similar image.

Can this actually be done? Yes, we can implement an autoencoder in PyTorch.

First, we build a simple multilayer perceptron:

from torch import nn


class autoencoder(nn.Module):
    def __init__(self):
        super(autoencoder, self).__init__()
        # Encoder: 784 -> 128 -> 64 -> 12 -> 3
        self.encoder = nn.Sequential(
            nn.Linear(28 * 28, 128),
            nn.ReLU(True),
            nn.Linear(128, 64),
            nn.ReLU(True),
            nn.Linear(64, 12),
            nn.ReLU(True),
            nn.Linear(12, 3))
        # Decoder: 3 -> 12 -> 64 -> 128 -> 784
        self.decoder = nn.Sequential(
            nn.Linear(3, 12),
            nn.ReLU(True),
            nn.Linear(12, 64),
            nn.ReLU(True),
            nn.Linear(64, 128),
            nn.ReLU(True),
            nn.Linear(128, 28 * 28),
            nn.Tanh())  # output in (-1, 1), matching the normalized input

    def forward(self, x):
        x = self.encoder(x)
        x = self.decoder(x)
        return x

Here we define a simple 4-layer network as the encoder, with ReLU activations in between and a final output of 3 dimensions. The decoder takes the 3-dimensional code as input and outputs 28x28 image data. Note in particular that the last activation function is Tanh, which maps the final output to between -1 and 1, because the input images have been normalized to the range -1 to 1 and the output must match.
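The range matching described above can be sketched in plain Python. This is only an illustration: it assumes the raw pixels lie in [0, 1] and are shifted with a mean and standard deviation of 0.5, which is a common choice but is not stated in the article, and the helper names `normalize` and `denormalize` are hypothetical.

```python
import math

def normalize(pixel, mean=0.5, std=0.5):
    """Map a pixel from [0, 1] to [-1, 1] (assumed preprocessing)."""
    return (pixel - mean) / std

def denormalize(value, mean=0.5, std=0.5):
    """Map a value from [-1, 1] back to [0, 1], e.g. for display."""
    return value * std + mean

pixels = [0.0, 0.25, 0.5, 1.0]
normalized = [normalize(p) for p in pixels]
print(normalized)  # [-1.0, -0.5, 0.0, 1.0]

# tanh squashes any real number into the same range as the normalized inputs
outputs = [math.tanh(v) for v in (-10.0, -1.0, 0.0, 1.0, 10.0)]
print(all(-1.0 <= o <= 1.0 for o in outputs))  # True
```

If the decoder ended in a sigmoid instead, its output would lie in (0, 1) and the inputs would need to stay in that range as well.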

The training process is also fairly simple: we use mean squared error as the loss function to compare the generated image with the original image at each pixel.
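As a small worked example of this loss, the sketch below computes the mean squared error between two tiny "images" stored as flat lists. It is written in plain Python rather than PyTorch to stay dependency-free, and the function name `mse_loss` and the sample values are made up for illustration.

```python
def mse_loss(generated, original):
    """Mean squared error over two equally sized flat pixel lists."""
    if len(generated) != len(original):
        raise ValueError("images must have the same number of pixels")
    squared = [(g - o) ** 2 for g, o in zip(generated, original)]
    return sum(squared) / len(squared)

original = [1.0, -1.0, 0.0, 0.5]
perfect = list(original)          # a perfect reconstruction
noisy = [0.5, -1.0, 0.5, 0.5]     # two pixels are off by 0.5

print(mse_loss(perfect, original))  # 0.0
print(mse_loss(noisy, original))    # 0.125
```

The loss is zero only for a perfect reconstruction and grows with the squared per-pixel error, which is what training pushes down.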

We can also replace the multilayer perceptron with a convolutional neural network, which is better at extracting image features.

from torch import nn


class autoencoder(nn.Module):
    def __init__(self):
        super(autoencoder, self).__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=3, padding=1),  # b, 16, 10, 10
            nn.ReLU(True),
            nn.MaxPool2d(2, stride=2),  # b, 16, 5, 5
            nn.Conv2d(16, 8, 3, stride=2, padding=1),  # b, 8, 3, 3
            nn.ReLU(True),
            nn.MaxPool2d(2, stride=1)  # b, 8, 2, 2
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(8, 16, 3, stride=2),  # b, 16, 5, 5
            nn.ReLU(True),
            nn.ConvTranspose2d(16, 8, 5, stride=3, padding=1),  # b, 8, 15, 15
            nn.ReLU(True),
            nn.ConvTranspose2d(8, 1, 2, stride=2, padding=1),  # b, 1, 28, 28
            nn.Tanh()
        )

    def forward(self, x):
        x = self.encoder(x)
        x = self.decoder(x)
        return x

Here we use nn.ConvTranspose2d(), which can be seen as the inverse operation of convolution, and in a sense as a deconvolution.
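The shape comments in the convolutional autoencoder can be checked with a little arithmetic. The sketch below uses the standard output-size formulas for convolution, pooling, and transposed convolution, assuming dilation=1 and output_padding=0; the helper names `conv_out` and `conv_transpose_out` are hypothetical and not part of PyTorch.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output size of Conv2d / MaxPool2d along one spatial dimension."""
    return (size + 2 * padding - kernel) // stride + 1

def conv_transpose_out(size, kernel, stride=1, padding=0):
    """Output size of ConvTranspose2d along one spatial dimension
    (assuming dilation=1 and output_padding=0)."""
    return (size - 1) * stride - 2 * padding + kernel

size = 28  # input image is 28x28
encoder_sizes = []
# (kernel, stride, padding) for Conv2d, MaxPool2d, Conv2d, MaxPool2d
for kernel, stride, padding in [(3, 3, 1), (2, 2, 0), (3, 2, 1), (2, 1, 0)]:
    size = conv_out(size, kernel, stride, padding)
    encoder_sizes.append(size)

decoder_sizes = []
# (kernel, stride, padding) for the three ConvTranspose2d layers
for kernel, stride, padding in [(3, 2, 0), (5, 3, 1), (2, 2, 1)]:
    size = conv_transpose_out(size, kernel, stride, padding)
    decoder_sizes.append(size)

print(encoder_sizes)  # [10, 5, 3, 2]
print(decoder_sizes)  # [5, 15, 28]
```

The transposed convolutions grow the 2x2 code back to 28x28, which is why the layer is described as undoing a convolution in terms of shape, even though it does not recover the original values exactly.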

With a convolutional network, the generated images look better. The images are not shown here; you can see them on our GitHub.

Variational autoencoder (VAE)

The variational autoencoder is an upgraded version of the autoencoder. Its structure is similar, and it is likewise composed of an encoder and a decoder.

Recall what we did with the autoencoder: we feed in a picture and encode it to get a latent vector. This is better than picking random noise, because the latent vector contains information about the original image. We then decode the latent vector to get an image corresponding to the original picture.

But this way we cannot generate arbitrary images, because we have no way to construct latent vectors ourselves: we must encode an image to find out what its latent vector is. The variational autoencoder solves this problem.

The principle is quite simple: we add a constraint to the encoding process that forces the latent vectors it produces to roughly follow a standard normal distribution. This is the biggest difference from an ordinary autoencoder.

Generating a new picture then becomes very simple: we just give the decoder a random latent vector drawn from a standard normal distribution, and it can generate an image without first encoding an original image.

In practice, we need to trade off the accuracy of the model against how closely the latent vectors follow a standard normal distribution, where accuracy means the similarity between the image generated by the decoder and the original image. We can let the network make this decision itself: we define a loss for each and add them to form the total loss, so the network can choose how to reduce that total. We also need a way to measure how similar two distributions are. If you have seen the mathematical derivation of GANs, you know that KL divergence measures the similarity of two distributions. Here we use KL divergence as the loss for the difference between the latent vector's distribution and the standard normal distribution, while the other loss is still the mean squared error between the generated picture and the original picture.

We can give a formula for KL divergence.
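As a hedged sketch of the formula in question, the standard definition of KL divergence between two densities p and q is given first. The second line is the well-known closed form for a diagonal Gaussian with mean \mu and variance \sigma^2 measured against a standard normal, which matches the expression in the comment of the loss code further on. The index j over latent dimensions is notation introduced here for illustration.

```latex
D_{KL}(P \,\|\, Q) = \int p(x) \, \log \frac{p(x)}{q(x)} \, dx

D_{KL}\big(\mathcal{N}(\mu, \sigma^2) \,\|\, \mathcal{N}(0, I)\big)
  = -\frac{1}{2} \sum_{j} \left( 1 + \log \sigma_j^2 - \mu_j^2 - \sigma_j^2 \right)
```

The second expression is zero exactly when every \mu_j = 0 and every \sigma_j = 1, and positive otherwise.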

Here the variational autoencoder uses a technique called "reparameterization", which keeps the random sampling step trainable and lets the KL divergence be computed directly from the mean and standard deviation.

Instead of producing a single latent vector each time, the encoder produces two vectors, one for the mean and one for the standard deviation, and the latent vector is synthesized from these two statistics. This is also very simple: sample from a standard normal distribution, multiply by the standard deviation, and add the mean. Here we assume by default that the encoded latent vector follows a normal distribution. We then want the mean to be as close as possible to 0 and the standard deviation as close as possible to 1. The paper gives a detailed derivation of this loss formula; interested readers can consult it.
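The sampling step just described can be sketched in plain Python. This is a minimal illustration and not the article's PyTorch code: it assumes the encoder outputs a mean and a log-variance per latent dimension (as the implementation further on does), and the function name `reparameterize` and the sample values are hypothetical.

```python
import math
import random

def reparameterize(mu, logvar, rng):
    """Return z = mu + sigma * eps with eps ~ N(0, 1), element-wise."""
    z = []
    for m, lv in zip(mu, logvar):
        sigma = math.exp(0.5 * lv)   # sigma = exp(log(sigma^2) / 2)
        eps = rng.gauss(0.0, 1.0)    # noise that does not depend on mu or sigma
        z.append(m + sigma * eps)
    return z

rng = random.Random(0)  # fixed seed so the run is repeatable
mu = [2.0, -1.0]
logvar = [math.log(0.25), math.log(4.0)]  # sigma = 0.5 and 2.0

samples = [reparameterize(mu, logvar, rng) for _ in range(20000)]
mean0 = sum(s[0] for s in samples) / len(samples)
mean1 = sum(s[1] for s in samples) / len(samples)
print(mean0, mean1)  # sample means, close to mu
```

Because the randomness lives entirely in eps, z is a simple function of the mean and standard deviation, so gradients can flow back to the encoder through both.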

Here is the PyTorch implementation:

import torch
from torch import nn

# Note: the text above describes mean squared error; this code uses
# binary cross-entropy, summed over all pixels, as the reconstruction loss.
reconstruction_function = nn.BCELoss(reduction='sum')


def loss_function(recon_x, x, mu, logvar):
    """
    recon_x: generated images
    x: original images
    mu: latent mean
    logvar: latent log variance
    """
    BCE = reconstruction_function(recon_x, x)
    # KLD = -0.5 * sum(1 + log(sigma^2) - mu^2 - sigma^2)
    KLD = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    # total loss = reconstruction loss + KL divergence
    return BCE + KLD
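To see how the KL term behaves, here is a small plain-Python sketch of the same closed-form expression. It assumes a diagonal Gaussian compared against a standard normal, and the function name `kl_to_standard_normal` is hypothetical. The sum is written as 0.5 * sum(mu^2 + sigma^2 - log(sigma^2) - 1), which is algebraically the same as the -0.5 form in the comment of the loss code.

```python
import math

def kl_to_standard_normal(mu, logvar):
    """KL( N(mu, sigma^2) || N(0, 1) ), summed over latent dimensions."""
    return 0.5 * sum(m ** 2 + math.exp(lv) - lv - 1
                     for m, lv in zip(mu, logvar))

# mean 0 and variance 1 (logvar 0) already match the target: no penalty
print(kl_to_standard_normal([0.0, 0.0], [0.0, 0.0]))  # 0.0

# shifting one mean to 1 costs 0.5 * 1^2
print(kl_to_standard_normal([1.0, 0.0], [0.0, 0.0]))  # 0.5

# a variance different from 1 is also penalized
print(kl_to_standard_normal([0.0], [math.log(4.0)]) > 0)  # True
```

This is why minimizing the KL term pulls the mean toward 0 and the standard deviation toward 1, while the reconstruction term pulls in the other direction to keep the codes informative.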

In addition, because the variational autoencoder lets us sample latent variables at random, it can also improve the generalization ability of the network.

Finally, here is the code for the VAE:

import torch
from torch import nn
import torch.nn.functional as F


class VAE(nn.Module):
    def __init__(self):
        super(VAE, self).__init__()
        self.fc1 = nn.Linear(784, 400)
        self.fc21 = nn.Linear(400, 20)  # latent mean
        self.fc22 = nn.Linear(400, 20)  # latent log variance
        self.fc3 = nn.Linear(20, 400)
        self.fc4 = nn.Linear(400, 784)

    def encode(self, x):
        h1 = F.relu(self.fc1(x))
        return self.fc21(h1), self.fc22(h1)

    def reparametrize(self, mu, logvar):
        std = torch.exp(0.5 * logvar)
        # randn_like samples on the same device as std, so no explicit
        # CUDA check or Variable wrapper is needed
        eps = torch.randn_like(std)
        return mu + eps * std

    def decode(self, z):
        h3 = F.relu(self.fc3(z))
        return torch.sigmoid(self.fc4(h3))

    def forward(self, x):
        mu, logvar = self.encode(x)
        z = self.reparametrize(mu, logvar)
        return self.decode(z), mu, logvar

The VAE produces much better results than an ordinary autoencoder.

The disadvantage of the VAE is also obvious: it directly computes the mean squared error between the generated image and the original picture instead of learning adversarially as a GAN does, which makes the generated pictures slightly blurry. There is now some work that combines the VAE and the GAN, using the VAE structure but training it with an adversarial network; for details, refer to the corresponding paper.

Reference: Kvfrans blog

This code has been uploaded to GitHub.


This article is an English version of an article originally published in Chinese on aliyun.com and is provided for information purposes only.
