This post is reproduced from another blog. It introduces the principle of GAN (Generative Adversarial Networks), analyzes the advantages and disadvantages of GAN, and reviews the development of GAN research. The content follows.
1. Generative models
1.1 Overview
Machine learning methods can be divided into generative approaches and discriminative approaches, and the models they learn are called generative models and discriminative models [1 Hangyuan Li]. A generative approach uses the observed data to learn the joint probability distribution P(X, Y) of samples and labels. The trained model can generate new data that follows the sample distribution, and it can be used for both supervised and unsupervised learning. In a supervised task, the conditional probability distribution P(Y|X) is derived from the joint distribution P(X, Y) by Bayes' formula and then used for prediction; typical models are naive Bayes, the Gaussian mixture model, and the hidden Markov model. An unsupervised generative model characterizes the distribution of the sample data by learning the essential features of real data, and generates new data similar to the training samples. The number of parameters of a generative model is much smaller than the amount of training data, so the model is forced to discover and efficiently internalize the essence of the data in order to generate it. Generative models play the leading role in unsupervised deep learning and can capture high-order correlations of observed or visible data without target class labels. Deep generative models can generate samples efficiently by sampling from the network; examples are the restricted Boltzmann machine (RBM), the deep belief network (DBN), the deep Boltzmann machine (DBM), and generalized denoising autoencoders. Over the past two years, the popular generative models have fallen into three kinds of methods [OpenAI first research]:
Generative adversarial networks (GAN: Generative Adversarial Networks)
GAN is inspired by the two-player zero-sum game in game theory and was pioneered by [Goodfellow et al, NIPS 2014]. It contains a generative model (generative model G) and a discriminative model (discriminative model D). The generative model captures the distribution of the sample data, and the discriminative model is a binary classifier that decides whether its input is a real sample or a generated one. Optimizing this model is a "two-player minimax game" problem: during training one side is fixed while the parameters of the other are updated, in alternating iterations, so that each side maximizes the other's error. In the end, G can estimate the distribution of the sample data.
Variational autoencoders (VAE: Variational Autoencoders)
This approach formalizes the problem in the framework of probabilistic graphical models, where we maximize a lower bound on the log-likelihood of the data.
Autoregressive models
Autoregressive models such as PixelRNN train a network that models the conditional distribution of each individual pixel given the preceding pixels (to the left and above). This is similar to feeding the pixels of an image into a char-rnn, except that the RNN runs both horizontally and vertically over the image rather than over a 1D sequence of characters.
1.2 Classification of generative models [Yoshua Bengio Deep Learning Summer School]
Fully observed models
These models observe the data directly without introducing any new unobserved local variables, and can directly encode the relationships between observed points. Directed graphical models of this kind scale easily to large models, and because the log-probability can be computed directly (no approximation is required), parameter learning is also easy. For undirected models, parameter learning is difficult because the normalizing constant has to be computed. Generation in fully observed models is slow. The following figure shows different fully observed generative models [figure from Shakir Mohamed's presentation]:
Transformation models
These models transform an unobserved noise source with a parameterized function. It is easy to (1) sample from these models and (2) compute expectations without knowing the final distribution. They can make use of large classifiers and convolutional neural networks. However, it is difficult to maintain invertibility with these models and to extend them to general data types. The following figure shows different transformation generative models [figure from Shakir Mohamed's presentation]:
Latent variable models
These models introduce an unobserved local random variable that represents hidden factors. It is easy to sample from them and to add hierarchy and depth. The marginal likelihood can also be used for scoring and model selection. However, it is difficult to determine which latent variables are associated with a given input. The following figure shows different latent variable generative models [figure from Shakir Mohamed's presentation]:
1.3 Applications of generative models
We need generative models so that we can go beyond associating inputs with outputs and perform semi-supervised classification, data manipulation, filling in the blanks, image inpainting, denoising, one-shot generation [Rezende et al, ICML 2016], and many other tasks. The following figure shows the progress of generative models (note that the vertical axis should be the negative log-probability) [figure from Shakir Mohamed's presentation]:
According to Ilya Sutskever's talk "Recent progress in generative models" at the 2016 ScaledML conference, generative models are mainly useful for:
Structured prediction (for example, outputting text)
Much more robust prediction
Anomaly detection
Model-based RL (model-based reinforcement learning)
Speculative future applications of generative models:
Really good feature learning
Exploration in RL
Inverse RL
Good dialog that actually works
"Understanding the world"
Transfer learning
2. Generative adversarial networks
2.1 The idea and training method of GAN
GAN [Goodfellow Ian, GAN] is inspired by the two-player zero-sum game in game theory and was pioneered by [Goodfellow et al, NIPS 2014]. In a two-player zero-sum game, the payoffs of the two players sum to zero or to a constant: whatever one side gains, the other side loses. The two players in the GAN model are the generative model G and the discriminative model D. G captures the distribution of the sample data, and D is a binary classifier that estimates the probability that a sample comes from the training data rather than from the generator. G and D are generally nonlinear mapping functions, such as multilayer perceptrons or convolutional neural networks. As shown in Figure 2-1, the left part is the discriminative model: when the input is training data x, it is expected to output a high probability (close to 1). The lower half of the right part is the generative model: its input is random noise z drawn from a simple distribution (for example, a Gaussian), and its output is a generated image of the same size as the training images. When a generated sample is fed to D, D aims to output a low probability (judging it to be generated), while G tries to deceive D into outputting a high probability (misjudging it as a real sample). This creates the competition between the two.
The GAN model has no single loss function to minimize; its optimization is a "two-player minimax game" problem:
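The formula itself did not survive in this copy. The value function given in Goodfellow et al. (NIPS 2014), written with the data distribution p_data and the noise prior p_z used elsewhere in this post, is:

```latex
\min_G \max_D V(D, G)
  = \mathbb{E}_{x \sim p_{\text{data}}(x)}\big[\log D(x)\big]
  + \mathbb{E}_{z \sim p_z(z)}\big[\log\big(1 - D(G(z))\big)\big]
```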
This is the value function of the discriminator network D and the generator network G. D is trained to maximize the probability of assigning the correct label to training samples (maximizing log D(x)), and G is trained to minimize log(1 - D(G(z))), i.e. to maximize D's loss. During training one network is fixed while the parameters of the other are updated, in alternating iterations, so that each maximizes the other's error; in the end, G can estimate the distribution of the sample data. The generative model G implicitly defines a probability distribution p_g, and we want p_g to converge to the real data distribution p_data. The paper proves that this minimax game attains its optimum when p_g = p_data, i.e. when the Nash equilibrium is reached. At that point G recovers the distribution of the training data and the accuracy of D is 50%.
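The equilibrium claim can be checked numerically on a small example. The sketch below is illustrative only: it assumes discrete distributions over four points (the arrays p_data, p_g_bad and p_g_good are made up), and it relies on the result from the paper that, for a fixed G, the best discriminator is D*(x) = p_data(x) / (p_data(x) + p_g(x)). The helper names value_function and optimal_discriminator are hypothetical.

```python
import numpy as np

def value_function(d, p_data, p_g):
    """V(D, G) = E_{x~p_data}[log D(x)] + E_{x~p_g}[log(1 - D(x))] on a finite set."""
    return float(np.sum(p_data * np.log(d)) + np.sum(p_g * np.log(1.0 - d)))

def optimal_discriminator(p_data, p_g):
    """Pointwise maximizer of V for a fixed generator: p_data / (p_data + p_g)."""
    return p_data / (p_data + p_g)

# Made-up distributions over four points, for illustration only.
p_data = np.array([0.1, 0.2, 0.3, 0.4])
p_g_bad = np.array([0.4, 0.3, 0.2, 0.1])   # generator far from the data
p_g_good = p_data.copy()                    # generator matches the data

d_bad = optimal_discriminator(p_data, p_g_bad)
d_good = optimal_discriminator(p_data, p_g_good)

print(d_good)                                    # every entry is 0.5
print(value_function(d_good, p_data, p_g_good))  # equals -log(4)
print(value_function(d_bad, p_data, p_g_bad))    # larger than -log(4)
```

When p_g equals p_data the best discriminator outputs 0.5 everywhere, which is the "50% accuracy" situation described above, and the value drops to its minimum of -log 4.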
Fig. 2-2 The GAN training algorithm
2.2 Advantages and disadvantages of GAN
Compared with other generative models, GAN has the following four advantages [OpenAI Ian Goodfellow's Quora Q&A]:
Judging by actual results, it appears to produce better samples than other models (the images are sharper and clearer).
The GAN framework can train any type of generator network (in theory; in practice it is difficult to use REINFORCE to train generator networks with discrete outputs). Most other frameworks require the generator network to have some specific functional form, such as a Gaussian output layer. Importantly, essentially all other frameworks require the generator network to put non-zero mass everywhere, whereas a GAN can learn to generate points only on a thin manifold that lies close to the data.
There is no need to design a model that obeys any kind of factorization; any generator network and any discriminator will work.
There is no need to sample repeatedly with a Markov chain and no inference is needed during learning, which avoids the difficult approximation of intractable probabilities.
Compared with PixelRNN, generating a sample takes less run time: a GAN produces a sample in one shot, while PixelRNN has to produce a sample one pixel at a time.
Compared with VAE, there is no variational lower bound. If the discriminator network fits perfectly, the generator network recovers the training distribution perfectly. In other words, GANs are asymptotically consistent, while VAE has a certain bias.
Compared with deep Boltzmann machines, there is neither a variational lower bound nor an intractable partition function. Samples are generated in one shot rather than by repeatedly applying a Markov chain operator.
Compared with GSN, samples are generated in one shot rather than by repeatedly applying a Markov chain operator.
Compared with NICE and Real NVP, there is no restriction on the size of the latent code.
The main problems with GAN are:
Non-convergence
The basic problem at present is that all the theory says GAN should perform very well at the Nash equilibrium, but gradient descent is only guaranteed to reach a Nash equilibrium when the functions are convex. When both players are represented by neural networks, they may keep adjusting their own strategies without ever actually reaching the equilibrium [OpenAI Ian Goodfellow's Quora Q&A].
Difficult to train: the collapse problem
The GAN model is defined as a minimax problem without a single loss function, so it is hard to tell during training whether progress is being made. The learning process may run into the collapse problem: the generator starts to degenerate, always produces the same sample points, and cannot continue to learn. Once the generator collapses, the discriminator also pushes similar sample points in similar directions, and training cannot continue [Improved Techniques for Training GANs].
No prior modeling: the model is too free and hard to control
Compared with other generative models, the adversarial approach of GAN no longer requires an assumed data distribution: there is no need to formulate p(x). Instead it samples directly from a distribution, so that in theory the real data can be approximated arbitrarily well. This is the biggest advantage of GAN. The drawback of needing no prior model is that the approach is too free: for larger images with more pixels, the simple GAN approach becomes hard to control. In GAN [Goodfellow Ian, Pouget-Abadie J], each round of parameter updates is set to k updates of D followed by 1 update of G, out of similar considerations.
3. Conditional Generative Adversarial Networks
3.1 The idea of CGAN
As analyzed above, compared with other generative models, the adversarial approach of GAN no longer requires an assumed data distribution: there is no need to formulate p(x). Instead it samples directly from a distribution, so that in theory the real data can be approximated arbitrarily well, which is the biggest advantage of GAN. The drawback of needing no prior model is that the approach is too free: for larger images with more pixels, the simple GAN approach becomes hard to control. To address this, a natural idea is to add constraints to GAN, which leads to Conditional Generative Adversarial Nets (CGAN) [Mirza M, Osindero S. Conditional]. This work proposes a GAN with conditional constraints: a condition variable y is introduced into both the generative model (G) and the discriminative model (D), and the additional information y conditions the model and can guide the data generation process. The condition variable y can be based on many kinds of information, such as class labels, part of the data for image inpainting [2], or data from a different modality. If y is a class label, CGAN can be seen as an improvement that turns the purely unsupervised GAN into a supervised model. This simple and direct improvement has proved very effective and is widely used in later related work [3,4]. Mehdi Mirza et al. generate images of a specified class on the MNIST dataset, with class labels as the condition variable. The authors also explore the use of CGAN in multimodal learning for automatic image tagging: on the MIR Flickr 25,000 dataset they generate tag word vectors for an image, with image features as the condition variable.
3.2 Conditional Generative Adversarial Nets
3.2.1 Generative Adversarial Nets
Generative adversarial nets are a new method for training generative models proposed by Goodfellow [5]. They consist of two "adversarial" models: the generative model (G) captures the data distribution, and the discriminative model (D) estimates the probability that a sample comes from the real data rather than from G. To learn the generator distribution p_g over the real data x, G builds a mapping function G(z; θg) from a prior noise distribution p_z(z) to the data space. The input of D is either a real image or a generated image, and D(x; θd) outputs a scalar: the probability that the input comes from the training data rather than from the generator.
G and D are trained at the same time: with D fixed, the parameters of G are adjusted to minimize the expectation of log(1 - D(G(z))); with G fixed, the parameters of D are adjusted to maximize the expectation of log D(x) + log(1 - D(G(z))). This optimization can be summarized as a "two-player minimax game" problem:
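The formula that followed here is missing from this copy; from the description above it is the two-player objective min_G max_D E_x[log D(x)] + E_z[log(1 - D(G(z)))]. The alternating updates can be sketched on a toy problem as follows. Everything concrete in this sketch is an assumption made for illustration: a 1-D Gaussian as the "real" data, a linear generator G(z) = a*z + b, a logistic discriminator D(x) = sigmoid(w*x + c), plain gradient steps with hand-derived gradients, and the hypothetical function name train_toy_gan. It is not the setup used in the paper, and convergence is not guaranteed.

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def train_toy_gan(steps=200, k=1, batch=64, lr=0.05, seed=0):
    """Alternating updates for a 1-D toy GAN (all choices here are illustrative)."""
    rng = np.random.default_rng(seed)
    a, b = 1.0, 0.0      # generator G(z) = a*z + b
    w, c = 0.0, 0.0      # discriminator D(x) = sigmoid(w*x + c)
    for _ in range(steps):
        # k ascent steps on the value function for D, with G held fixed
        for _ in range(k):
            x = rng.normal(3.0, 0.5, batch)          # "real" samples
            g = a * rng.normal(0.0, 1.0, batch) + b  # generated samples
            d_real, d_fake = sigmoid(w * x + c), sigmoid(w * g + c)
            # gradient of mean[log D(x)] + mean[log(1 - D(g))] w.r.t. w and c
            grad_w = np.mean((1.0 - d_real) * x) - np.mean(d_fake * g)
            grad_c = np.mean(1.0 - d_real) - np.mean(d_fake)
            w += lr * grad_w
            c += lr * grad_c
        # one descent step on mean[log(1 - D(G(z)))] for G, with D held fixed
        z = rng.normal(0.0, 1.0, batch)
        d_fake = sigmoid(w * (a * z + b) + c)
        # d/dg log(1 - sigmoid(w*g + c)) = -sigmoid(w*g + c) * w
        grad_a = np.mean(-d_fake * w * z)
        grad_b = np.mean(-d_fake * w)
        a -= lr * grad_a
        b -= lr * grad_b
    return a, b, w, c

a, b, w, c = train_toy_gan(steps=200, k=1)
print("generator scale and shift:", a, b)
print("discriminator weight and bias:", w, c)
```

The parameter k is the number of discriminator updates per generator update. A logistic discriminator can only separate the two distributions by their location, so this toy setup mainly illustrates the update order rather than sample quality.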
3.2.2 Conditional Adversarial Nets
The conditional generative adversarial network (CGAN) is an extension of the original GAN in which both the generator and the discriminator are conditioned on additional information y. y can be any kind of information, such as class labels or data from other modalities. As shown in Figure 1, the conditional GAN is realized by feeding y to both the discriminative model and the generative model as part of the input layer. In the generative model, the prior input noise p_z(z) and the condition information y are combined into a joint hidden representation, and the adversarial training framework allows considerable flexibility in how this hidden representation is composed. Similarly, the objective function of the conditional GAN is a two-player minimax game with conditional probabilities:
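The formula is missing from this copy. The conditional objective given in Mirza and Osindero's paper, in the same notation as the unconditional value function, is:

```latex
\min_G \max_D V(D, G)
  = \mathbb{E}_{x \sim p_{\text{data}}(x)}\big[\log D(x \mid y)\big]
  + \mathbb{E}_{z \sim p_z(z)}\big[\log\big(1 - D(G(z \mid y))\big)\big]
```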
The network structure of CGAN
3.3 Experiments
3.3.1 MNIST dataset experiment
On MNIST, the conditional GAN is trained with the class label (one-hot code) as the condition, so that the digit corresponding to the label can be generated. The input of the generative model is a 100-D noise vector drawn from a uniform distribution, and the condition variable y is the one-hot code of the class label. The noise z and the label y are mapped to hidden layers (200 and 1000 units, respectively), and all units are combined before being mapped to the second layer. Finally, a sigmoid layer produces the output of the generative model (784-D), i.e. a 28*28 single-channel image.
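The layer sizes of the generator described above can be traced with a short forward-pass sketch. This is only a shape check with random, untrained weights. The ReLU activations, the width of the second layer (1200, the sum of the two branches), the omission of biases, the [0, 1) noise range and the function names are assumptions for illustration and are not stated in the text.

```python
import numpy as np

def relu(t):
    return np.maximum(t, 0.0)

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def init_generator(rng, scale=0.05):
    """Random, untrained weights; the second-layer width of 1200 is assumed."""
    shapes = {"Wz": (100, 200), "Wy": (10, 1000), "Wh": (1200, 1200), "Wo": (1200, 784)}
    return {name: rng.normal(0.0, scale, shape) for name, shape in shapes.items()}

def generator_forward(params, z, y):
    hz = relu(z @ params["Wz"])            # 100-D noise   -> 200 hidden units
    hy = relu(y @ params["Wy"])            # one-hot label -> 1000 hidden units
    h = np.concatenate([hz, hy], axis=1)   # combined: 1200 units
    h2 = relu(h @ params["Wh"])            # second, joint hidden layer
    return sigmoid(h2 @ params["Wo"])      # 784 sigmoid outputs

rng = np.random.default_rng(0)
params = init_generator(rng)
z = rng.uniform(size=(5, 100))             # uniform noise; the range is assumed
y = np.eye(10)[[0, 1, 2, 3, 4]]            # one-hot codes for digits 0 to 4
images = generator_forward(params, z, y).reshape(5, 28, 28)
print(images.shape)
```

Changing a row of y while keeping z fixed is what lets a trained CGAN produce a chosen digit class.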
The input of the discriminative model is the 784-D image data and the condition variable y (the one-hot code of the class label), and the output is the probability that the sample comes from the training set.
3.3.2 Multimodal learning for automatic image tagging
Automatic image tagging uses multi-label prediction. With a conditional GAN, the distribution of tag vectors conditioned on image features is generated. Dataset: MIR Flickr 25,000 dataset. Language model: a skip-gram model trained with 200-D word vectors.
"Generative model input/output"
Noise data: 100-D => 500-D
Image features: 4096-D => 2000-D
These units are all mapped jointly to a 200-D linear layer,
which outputs the generated word vector (200-D).
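The generator in the list above can be sketched the same way, as a shape check with random, untrained weights. The ReLU activations on the two branches, the Gaussian noise, the omission of biases, the random stand-in for real image features and the function names are assumptions for illustration.

```python
import numpy as np

def init_tag_generator(rng, scale=0.01):
    """Random, untrained weights with the layer sizes from the list above."""
    return {
        "Wz": rng.normal(0.0, scale, (100, 500)),    # noise -> 500 units
        "Wi": rng.normal(0.0, scale, (4096, 2000)),  # image features -> 2000 units
        "Wo": rng.normal(0.0, scale, (2500, 200)),   # joint -> 200-D linear output
    }

def tag_generator_forward(params, z, img_feat):
    hz = np.maximum(z @ params["Wz"], 0.0)         # ReLU is an assumption
    hi = np.maximum(img_feat @ params["Wi"], 0.0)
    joint = np.concatenate([hz, hi], axis=1)       # 500 + 2000 = 2500 units
    return joint @ params["Wo"]                    # linear layer, no activation

rng = np.random.default_rng(0)
params = init_tag_generator(rng)
z = rng.normal(size=(3, 100))            # noise batch (distribution assumed)
img_feat = rng.normal(size=(3, 4096))    # stand-in for real image features
word_vecs = tag_generator_forward(params, z, img_feat)
print(word_vecs.shape)
```

Each output row lives in the same 200-D space as the skip-gram word vectors, so a generated vector can be mapped back to tags by looking up its nearest words.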
"Discriminant model input/output"
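Input:
Word vector => 500-D hidden layer;
Image features => 1200-D hidden layer
Open question: are the dimensions of the condition input y different in the generator and the discriminator (4096-D image features in one, a vector of unstated dimension in the other)? Reading 500 and 1200 as hidden-layer sizes, like the 500 and 2000 in the generator, would mean both networks receive the same 4096-D image features.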
As shown in Figure 2, the first column is the original image, the second column shows the user-annotated tags, and the third column shows the tags generated by the model G.
3.4 Future work
1. Propose more sophisticated models, explore the details of CGAN, and analyze their performance and characteristics in depth.
2. Each tag is currently generated independently of the others, which does not reflect richer information.
3. Another direction left for future work is to build a joint training scheme to learn the language model.