By Bai Yue and Hudy. Variational auto-encoders (VAEs) and generative adversarial networks (GANs) are two approaches to unsupervised learning of complex distributions. Recently, Ilya Tolstikhin and colleagues at Google Brain proposed another idea: the Wasserstein auto-encoder, which retains some of the advantages of the VAE while incorporating structural features of the GAN, and can achieve better performance. The paper, "Wasserstein auto-encoders," has been accepted to ICLR 2018, which will be held in Vancouver starting April 30.
The field of representation learning initially relied on supervised methods, achieving impressive results with large annotated datasets. Generative models trained by unsupervised methods, by contrast, have often used probabilistic approaches that only work well on low-dimensional data. In recent years these two lines of work have gradually converged. In this new intersecting field, the variational auto-encoder (VAE) [1] has emerged as a theoretically mature method, yet it produces blurry samples when applied to natural images. In contrast, generative adversarial networks (GANs) [3] achieve much better visual quality in samples drawn from the model, but they have no encoder, are harder to train, and suffer from "mode collapse," so the final model fails to capture all of the variation in the true data distribution. Previous studies have analyzed many GAN architectures and combinations of VAEs and GANs, but a unified framework that properly combines the advantages of GANs and VAEs has yet to be found.
The Google Brain work builds on the theoretical analysis of [11] (the "VEGAN cookbook"). Starting from Wasserstein GAN and the VEGAN cookbook, the authors approach generative modeling from the perspective of optimal transport (OT). The optimal transport cost [5] is a way of measuring the distance between probability distributions, and it is a weaker notion than many alternatives, including the f-divergences associated with the original GAN algorithm. This matters in practice, because in the input space X the data are typically supported on a low-dimensional manifold. Stronger notions of distance, such as f-divergences, which capture the density ratio between distributions, therefore often max out and fail to provide a useful gradient for training. In contrast, OT has been argued to behave better [4, 7], although GAN-style implementations of it require adding constraints or regularization terms to the objective.
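For reference, the optimal transport cost in question is the standard Kantorovich formulation; with notation matched to this article it can be written as:

```latex
W_c(P_X, P_G) \;=\; \inf_{\Gamma \in \mathcal{P}(X \sim P_X,\, Y \sim P_G)} \; \mathbb{E}_{(X, Y) \sim \Gamma}\big[\, c(X, Y) \,\big]
```

where c(x, y) is the chosen cost function and the infimum runs over all couplings Γ, i.e. joint distributions whose marginals are PX and PG.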
In this paper, the goal is to minimize the optimal transport cost Wc(PX, PG) between the true (but unknown) data distribution PX and a latent variable model PG, which is specified by a prior distribution PZ over latent codes Z ∈ Z and a generative model PG(X|Z) over data points X ∈ X. The main contributions are as follows (see Figure 1):
The Wasserstein auto-encoder (WAE), a new family of regularized auto-encoders (Algorithms 1 and 2 and Equation 4), minimizes the optimal transport cost Wc(PX, PG) for any cost function c. Like the VAE, the WAE objective consists of two terms: a c-reconstruction cost and a regularizer DZ(PZ, QZ) that penalizes the discrepancy between two distributions over Z: the prior PZ and the distribution of encoded data points QZ := EPX[Q(Z|X)]. When c is the squared cost and DZ is the GAN objective, WAE coincides with the adversarial auto-encoder of [2].
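Reconstructed from that description, the penalized objective referred to as Equation 4 takes roughly the following form, where G denotes the decoder and λ > 0 weights the regularizer:

```latex
D_{\mathrm{WAE}}(P_X, P_G) \;:=\; \inf_{Q(Z \mid X)} \; \mathbb{E}_{P_X}\,\mathbb{E}_{Q(Z \mid X)}\big[\, c\big(X, G(Z)\big) \,\big] \;+\; \lambda \, D_Z(Q_Z, P_Z)
```

The first term is the c-reconstruction cost and the second is the regularizer that penalizes the mismatch between QZ and PZ described above.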
WAE was evaluated on the MNIST and CelebA datasets using the squared cost c(x, y) = ||x − y||^2. The experiments show that WAE retains the good properties of the VAE (stable training, an encoder-decoder architecture, and a nice latent manifold structure) while generating samples of better quality, close to those produced by GANs.
Two different regularizers DZ(PZ, QZ) are proposed and tested: one based on GAN-style adversarial training in the latent space Z, the other using the maximum mean discrepancy (MMD), which is known to work well for matching high-dimensional standard normal distributions PZ [8].
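As a concrete illustration of the MMD-based regularizer, here is a minimal NumPy sketch of an unbiased MMD estimate between a batch of encoded latent codes and a batch of prior samples. The inverse multiquadratic kernel and the scale constant c are illustrative choices for this sketch, not necessarily the exact settings used in the paper.

```python
import numpy as np

def imq_kernel(a, b, c=2.0):
    """Inverse multiquadratic kernel k(x, y) = c / (c + ||x - y||^2), row-wise."""
    # Pairwise squared Euclidean distances between rows of a and rows of b.
    sq_dists = np.sum(a**2, axis=1)[:, None] + np.sum(b**2, axis=1)[None, :] - 2.0 * a @ b.T
    return c / (c + sq_dists)

def mmd_penalty(z_encoded, z_prior, c=2.0):
    """Unbiased estimate of MMD^2 between encoded codes (Q_Z) and prior samples (P_Z)."""
    n = z_encoded.shape[0]
    k_qq = imq_kernel(z_encoded, z_encoded, c)
    k_pp = imq_kernel(z_prior, z_prior, c)
    k_qp = imq_kernel(z_encoded, z_prior, c)
    # Off-diagonal averages give the unbiased U-statistic terms.
    term_qq = (k_qq.sum() - np.trace(k_qq)) / (n * (n - 1))
    term_pp = (k_pp.sum() - np.trace(k_pp)) / (n * (n - 1))
    term_qp = 2.0 * k_qp.mean()
    return term_qq + term_pp - term_qp

# Toy usage: compare a batch of "encoded" codes against samples from a standard normal prior.
rng = np.random.default_rng(0)
z_q = rng.normal(loc=0.5, scale=1.0, size=(128, 8))  # stand-in for encoder outputs
z_p = rng.standard_normal(size=(128, 8))             # samples from P_Z = N(0, I)
print(mmd_penalty(z_q, z_p))
```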
Finally, the theoretical considerations used to derive the WAE objective from "From optimal transport to generative modeling: the VEGAN cookbook" [11] may be of interest in their own right. In particular, Theorem 1 shows that for the models considered, the primal form of Wc(PX, PG) is equivalent to an optimization problem over probabilistic encoders Q(Z|X), as written out below.
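As best it can be reconstructed from that statement, Theorem 1 says that when the decoder is deterministic, i.e. PG(X|Z = z) puts all its mass on G(z), the primal OT cost can be rewritten as an optimization over encoders constrained to match the prior:

```latex
W_c(P_X, P_G) \;=\; \inf_{Q(Z \mid X)\,:\,Q_Z = P_Z} \; \mathbb{E}_{P_X}\,\mathbb{E}_{Q(Z \mid X)}\big[\, c\big(X, G(Z)\big) \,\big]
```

Relaxing the hard constraint QZ = PZ into the penalty λ·DZ(QZ, PZ) is what yields the WAE objective sketched earlier.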
The article is structured as follows. Section 2 derives a new auto-encoder formulation for computing the OT cost between PX and the latent variable model PG, following [11]. The resulting constrained optimization problem (the Wasserstein auto-encoder objective) is then relaxed, and two different regularizers lead to the WAE-GAN and WAE-MMD algorithms. Section 3 discusses related work. Section 4 presents the experimental results and concludes with directions for future work.
Figure 1: Both VAE and WAE minimize two terms: a reconstruction cost and a regularizer penalizing the discrepancy between PZ and the distribution induced by the encoder Q. The VAE requires Q(Z|X = x) to match PZ for every different input example x drawn from PX. As shown in panel (a), every red ball is forced to match PZ (the white shape); the red balls start to intersect, which is where the reconstruction problems begin. In contrast, as shown in panel (b), WAE requires the continuous mixture QZ := ∫ Q(Z|X) dPX to match PZ (the green ball). As a result, the latent codes of different examples can stay far away from each other, which promotes better reconstruction.
Algorithm 1: Wasserstein auto-encoder with a GAN-based penalty (WAE-GAN). Algorithm 2: Wasserstein auto-encoder with an MMD-based penalty (WAE-MMD).
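Since the algorithm boxes themselves are not reproduced here, the following PyTorch sketch illustrates one training step of the GAN-based variant (WAE-GAN) as described above: the encoder and decoder are trained on the squared reconstruction cost plus an adversarial penalty from a discriminator that operates in the latent space Z. The network sizes, λ value, and optimizer settings are illustrative assumptions, not the paper's hyperparameters.

```python
import torch
import torch.nn as nn

latent_dim, data_dim, lam = 8, 784, 10.0

encoder = nn.Sequential(nn.Linear(data_dim, 256), nn.ReLU(), nn.Linear(256, latent_dim))
decoder = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(), nn.Linear(256, data_dim))
# The discriminator acts on latent codes z, not on images: it tries to tell
# prior samples z ~ P_Z from encoded samples z = Q(x).
discriminator = nn.Sequential(nn.Linear(latent_dim, 64), nn.ReLU(), nn.Linear(64, 1))

opt_ae = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()), lr=1e-3)
opt_d = torch.optim.Adam(discriminator.parameters(), lr=5e-4)
bce = nn.BCEWithLogitsLoss()

def wae_gan_step(x):
    """One WAE-GAN update on a batch x of shape (batch, data_dim)."""
    # --- update the latent discriminator ---
    z_prior = torch.randn(x.size(0), latent_dim)  # samples from P_Z = N(0, I)
    z_enc = encoder(x).detach()
    d_loss = bce(discriminator(z_prior), torch.ones(x.size(0), 1)) + \
             bce(discriminator(z_enc), torch.zeros(x.size(0), 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # --- update encoder + decoder ---
    z_enc = encoder(x)
    x_rec = decoder(z_enc)
    recon = ((x - x_rec) ** 2).sum(dim=1).mean()  # squared cost c(x, y) = ||x - y||^2
    # The encoder is pushed to make its codes look like prior samples to the discriminator.
    adv = bce(discriminator(z_enc), torch.ones(x.size(0), 1))
    ae_loss = recon + lam * adv
    opt_ae.zero_grad(); ae_loss.backward(); opt_ae.step()
    return recon.item(), d_loss.item()

# Toy usage on random data standing in for flattened MNIST images.
x_batch = torch.rand(64, data_dim)
print(wae_gan_step(x_batch))
```

The MMD variant (Algorithm 2) replaces the discriminator and the adversarial term with a kernel-based penalty such as the one sketched earlier, applied between the batch of encoded codes and a batch of prior samples.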
Figure 2: VAE (left column), WAE-MMD (middle column), and WAE-GAN (right column) trained on the MNIST dataset. In the test reconstructions, the odd rows correspond to the real test points.
Figure 3: VAE (left column), WAE-MMD (middle column), and WAE-GAN (right column) trained on the CelebA dataset. In the test reconstructions, the odd rows correspond to the real test points.
Table 1: FID scores of samples on CelebA (smaller is better).
Paper: Wasserstein auto-encoders
Paper link: https://arxiv.org/abs/1711.01558
Abstract: We propose the Wasserstein auto-encoder (WAE), a new algorithm for building a generative model of the data distribution. WAE minimizes a penalized form of the Wasserstein distance between the model distribution and the target distribution, which leads to a regularizer different from the one used by the variational auto-encoder (VAE) [1]. This regularizer encourages the encoded training distribution to match the prior. We compare our algorithm with several other techniques and show that it is a generalization of the adversarial auto-encoder (AAE) [2]. Our experiments show that WAE shares many of the properties of the VAE (stable training, an encoder-decoder architecture, and a nice latent manifold structure) while generating samples of better quality, as measured by the FID score.
https://www.jiqizhixin.com/articles/google-brain-Wasserstein-Auto-Encoders