GANs in Deep Learning: Developments in Models and Theory

Over the past year or two, an emerging class of generative models, Generative Adversarial Networks (GAN), has made great progress on generation tasks. Although GANs suffer from problems such as training instability, researchers have improved them from the perspectives of model design, training techniques, and theory. This article aims to survey that related work.

Most of the time, supervised learning is more effective than unsupervised learning, but in the real world the annotations (labels) that supervised learning requires are relatively scarce. Researchers have therefore never given up on exploring better unsupervised learning strategies, hoping to learn representations (or even knowledge) of the real world from the vast amount of unlabeled data, and thereby understand the real world better.

There are many ways to evaluate unsupervised learning, among which generation tasks are the most direct: only when we can generate or recreate our real world can we claim to understand it fully. However, the generative models that generation tasks rely on often run into two major difficulties. First, modeling the real world requires a great deal of prior knowledge, including the choice of prior and the form of the distribution, and the quality of this modeling directly affects the performance of the generative model. Second, real-world data is often complex, so the amount of computation needed to fit such a model can be very large, even prohibitive.

Over the past year or two, however, a new and exciting model has emerged that neatly sidesteps both of these difficulties: Generative Adversarial Networks (GAN), proposed in [1]. In the original GAN paper [1], the authors use game theory to explain the idea behind the GAN framework. Every GAN framework contains a pair of models: a generative model (G) and a discriminative model (D). Because D exists, G no longer needs prior knowledge about, or an explicit model of, the real data, yet it can still learn to approximate that data, eventually producing samples so realistic that D cannot tell them apart from the real thing, at which point G and D reach a kind of Nash equilibrium.

The authors of [1] offer a metaphor in their slides: in a GAN, the generative model (G) and the discriminative model (D) are like a counterfeiter and the police. G generates data with the goal of fooling the police, i.e. the discriminative model (D). In other words, G, as the counterfeiter, tries to make its forgeries as convincing as possible, while D, as the police, tries to sharpen its skills so as not to be fooled. Learning under the GAN framework thus becomes a competition between the generative model (G) and the discriminative model (D): a sample is drawn at random either from the real data or from the "fake" samples produced by the generative model (G), and the discriminative model (D) must judge whether it is real. Expressed as a formula, this is the following minimax objective:

min_G max_D V(D, G) = E_{x ~ p_data(x)} [log D(x)] + E_{z ~ p_z(z)} [log(1 - D(G(z)))]

However, although GAN no longer requires explicit modeling, this advantage also brings some trouble. Although G takes a noise vector z as its prior, how the generative model uses this z is uncontrollable; in other words, GAN learning is too "free", so the training process and its results are often hard to control. To stabilize GAN, later researchers proposed many training techniques and improvements from heuristic, architectural, and theoretical angles.

For example, in the original GAN paper [1], the parameter update schedule updates D for k steps and then updates G once, partly as a way of reining in G's "freedom".
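
To make this alternating schedule concrete, here is a minimal training-step sketch in Python, assuming PyTorch; the names G, D, real_loader, latent_dim and k are illustrative placeholders rather than code from [1], and the generator update uses the common non-saturating loss instead of the literal log(1 - D(G(z))) term.

```python
# Minimal sketch of the alternating GAN update (assumes PyTorch).
# G and D are assumed to be nn.Module generator/discriminator; D outputs a probability.
import torch
import torch.nn as nn

bce = nn.BCELoss()

def gan_train_step(G, D, opt_G, opt_D, real_loader, latent_dim=100, k=1, device="cpu"):
    data_iter = iter(real_loader)            # assumes the loader yields (image, label) batches
    # Update the discriminator k times per generator update, as described in [1].
    for _ in range(k):
        real, _ = next(data_iter)
        real = real.to(device)
        ones = torch.ones(real.size(0), 1, device=device)
        zeros = torch.zeros(real.size(0), 1, device=device)
        z = torch.randn(real.size(0), latent_dim, device=device)
        fake = G(z).detach()                 # no gradient flows into G during the D step
        d_loss = bce(D(real), ones) + bce(D(fake), zeros)
        opt_D.zero_grad()
        d_loss.backward()
        opt_D.step()
    # One generator update: push D to label generated samples as real.
    z = torch.randn(real.size(0), latent_dim, device=device)
    g_loss = bce(D(G(z)), torch.ones(real.size(0), 1, device=device))
    opt_G.zero_grad()
    g_loss.backward()
    opt_G.step()
    return d_loss.item(), g_loss.item()
```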

Another heavyweight study of GAN training techniques is Deep Convolutional Generative Adversarial Networks (DCGAN) [6]. That paper distills a set of architectural guidelines and training experience for using CNNs within GANs. For example, it replaces the pooling layers of a traditional CNN with strided (and fractionally strided) convolutions, which keeps the generative model (G) in the GAN fully differentiable end to end and makes GAN training more stable and controllable.
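
As a rough illustration of that design, below is a DCGAN-style generator sketch in Python, assuming PyTorch; the layer widths and the 32x32 output size are illustrative choices, not the exact architecture reported in [6].

```python
# DCGAN-style generator sketch (assumes PyTorch): fractionally strided convolutions
# upsample the noise vector, with no pooling or unpooling layers anywhere.
import torch.nn as nn

class DCGANGenerator(nn.Module):
    def __init__(self, latent_dim=100, feat=64, channels=3):
        super().__init__()
        self.net = nn.Sequential(
            nn.ConvTranspose2d(latent_dim, feat * 4, 4, 1, 0, bias=False),  # 1x1 -> 4x4
            nn.BatchNorm2d(feat * 4), nn.ReLU(True),
            nn.ConvTranspose2d(feat * 4, feat * 2, 4, 2, 1, bias=False),    # 4x4 -> 8x8
            nn.BatchNorm2d(feat * 2), nn.ReLU(True),
            nn.ConvTranspose2d(feat * 2, feat, 4, 2, 1, bias=False),        # 8x8 -> 16x16
            nn.BatchNorm2d(feat), nn.ReLU(True),
            nn.ConvTranspose2d(feat, channels, 4, 2, 1, bias=False),        # 16x16 -> 32x32
            nn.Tanh(),
        )

    def forward(self, z):
        # z: (batch, latent_dim) noise vector, viewed as a 1x1 feature map
        return self.net(z.view(z.size(0), -1, 1, 1))
```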

Another natural angle for improving training stability is to change the learning setting: turn the purely unsupervised GAN into a semi-supervised or supervised one, adding a bit of constraint, or a bit of objective, to GAN training. The Conditional Generative Adversarial Nets (CGAN) proposed in [2] are a very direct model change along this line: a conditioning variable y, namely the data label, is introduced into the modeling of both the generative model (G) and the discriminative model (D). CGAN can therefore be seen as an improvement that turns the unsupervised GAN into a supervised model. This simple and direct improvement has proven very effective and is widely used in subsequent related work.
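
To show how the conditioning variable y enters both models, here is a minimal sketch in Python, assuming PyTorch; the MLP sizes and the one-hot label encoding are illustrative assumptions rather than the exact setup of [2].

```python
# Conditional GAN sketch (assumes PyTorch): both G and D receive the label y
# as a one-hot vector concatenated to their usual input.
import torch
import torch.nn as nn

class CondGenerator(nn.Module):
    def __init__(self, latent_dim=100, n_classes=10, out_dim=784):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent_dim + n_classes, 256), nn.ReLU(),
            nn.Linear(256, out_dim), nn.Tanh(),
        )

    def forward(self, z, y_onehot):
        # Generate a sample conditioned on the label y
        return self.net(torch.cat([z, y_onehot], dim=1))

class CondDiscriminator(nn.Module):
    def __init__(self, in_dim=784, n_classes=10):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim + n_classes, 256), nn.LeakyReLU(0.2),
            nn.Linear(256, 1), nn.Sigmoid(),
        )

    def forward(self, x, y_onehot):
        # Judge real vs. fake *given* the same label y
        return self.net(torch.cat([x, y_onehot], dim=1))
```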

A third way to rein in the freedom of GANs is closer in spirit to the first: since GAN learning is so hard to control, break the task apart and let the GAN complete the learning process step by step, rather than learning all of the data at once. In image generation, for instance, instead of having the generative model (G) produce an entire picture in one shot, let it generate one part of the picture at a time. This idea can be regarded as a variant of DeepMind's well-known DRAW. The DRAW paper [3] opens by observing that we humans rarely finish a drawing in a single pass, so why should we expect a machine to? LAPGAN, from paper [4], builds on this idea and turns the GAN learning process into a sequential one. Specifically, LAPGAN implements this "serialization" with a Laplacian pyramid, hence the name. It is worth noting that LAPGAN also embodies the idea of "residual" learning (somewhat related to the later, hugely popular ResNet). Along the generation sequence, LAPGAN repeatedly downsamples and upsamples the image, and at each pyramid level only the residual is passed to the discriminative model (D) for judgment. This combination of sequential generation and residuals effectively reduces what the GAN has to learn at each step, thereby "assisting" GAN learning.
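
To make the pyramid-plus-residual idea concrete, here is a small sketch in Python of forming the residual at one pyramid level, assuming PyTorch; the pooling/interpolation operators and the conditional generator call G_k are illustrative simplifications of [4].

```python
# Laplacian-pyramid residual sketch (assumes PyTorch): at each level the GAN only has to
# model the high-frequency residual between an image and its blurred, upsampled copy.
import torch.nn.functional as F

def laplacian_residual(img):
    """Split an image into a coarse version and the residual a level-k GAN would generate."""
    coarse = F.avg_pool2d(img, kernel_size=2)                        # downsample
    upsampled = F.interpolate(coarse, scale_factor=2.0,
                              mode="bilinear", align_corners=False)  # back to original size
    residual = img - upsampled                                       # high-frequency detail only
    return coarse, upsampled, residual

# During training at level k (hypothetical generator G_k and discriminator D_k):
#   fake_residual = G_k(z, upsampled)     # generate detail conditioned on the coarse image
#   D_k(residual, upsampled)              # judged as the "real" residual
#   D_k(fake_residual, upsampled)         # judged as the "fake" residual
```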

Another sequential-style improvement to GANs is GRAN [5]. Unlike LAPGAN [4], in which each sequential step (pyramid level) is trained independently, GRAN combines the GAN with an LSTM-like recurrent structure, so that each step in the sequence learns from and builds on the result of the previous step. Specifically, like an LSTM, each step of GRAN maintains a canvas C_t, which determines the content produced at that step, and a hidden state h_{c,t} that, as in an LSTM, encodes the canvas from the previous step. On the GAN side, the prior noise z of the generative model (G) is mapped to a hidden representation h_z; at every step, h_z and h_{c,t} are concatenated and decoded into the update applied to C_t.
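
The following Python sketch, assuming PyTorch, shows this recurrence in stripped-down form; the encoder, decoder, step count and additive canvas are illustrative placeholders rather than the actual GRAN architecture of [5].

```python
# Recurrent generator sketch in the spirit of GRAN (assumes PyTorch): each step encodes the
# previous canvas into h_c, concatenates it with the prior encoding h_z, and decodes an update.
import torch
import torch.nn as nn

class RecurrentGenerator(nn.Module):
    def __init__(self, latent_dim=100, hidden_dim=256, img_dim=784, steps=5):
        super().__init__()
        self.steps = steps
        self.prior = nn.Linear(latent_dim, hidden_dim)     # z -> h_z
        self.encode = nn.Linear(img_dim, hidden_dim)       # previous canvas -> h_{c,t}
        self.decode = nn.Linear(hidden_dim * 2, img_dim)   # [h_z, h_{c,t}] -> canvas update

    def forward(self, z):
        h_z = torch.tanh(self.prior(z))
        canvas = torch.zeros(z.size(0), self.decode.out_features, device=z.device)
        for _ in range(self.steps):
            h_c = torch.tanh(self.encode(canvas))          # reuse the previous step's result
            canvas = canvas + self.decode(torch.cat([h_z, h_c], dim=1))
        return torch.sigmoid(canvas)                       # accumulated canvas is the final image
```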
Finally, the approach to improving GAN training stability that comes closest to the essence of the problem, and to the latest research, is InfoGAN [7], recently touted as one of OpenAI's five major breakthroughs. Its starting point is that the "freedom" of a GAN stems from having only a single noise vector z, with no way to control how the GAN uses it, so the natural move is to work on "how z is used". [7] therefore decomposes the input of the generative model (G) into two parts: (1) incompressible noise z, and (2) a group of latent variables c_1, c_2, ..., c_L, written collectively as c. The intuition is that when we learn to generate images, there are many controllable, meaningful dimensions of variation, such as stroke thickness or the direction of illumination; these are c, while whatever we cannot describe remains in z. By decomposing the prior in this way, [7] hopes the GAN can learn a more disentangled data representation, so that the learning process becomes more controllable and the results more interpretable. To tie c to the output, [7] uses mutual information: c should be highly correlated with the image that the generative model (G) produces from z and c, i.e. G(z, c), in the sense that their mutual information should be large. With this finer-grained latent-variable control, InfoGAN can be said to push GAN development a step forward. First, the authors show that c genuinely helps GAN training, allowing the generative model (G) to produce results that better match the real data. Second, by exploiting the nature of c, they can vary individual dimensions of c and thereby control changes along a given semantic dimension of the generated image.
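
A minimal sketch of that mutual-information term, in Python and assuming PyTorch with a single categorical code: the auxiliary head Q and the cross-entropy term follow the variational lower-bound idea in [7], but the names (Q, G, D, adversarial_loss, n_codes) and sizes are illustrative assumptions.

```python
# InfoGAN-style mutual-information term (assumes PyTorch): an auxiliary head Q tries to recover
# the latent code c from the generated image G(z, c), giving a variational lower bound on
# I(c; G(z, c)); its loss is added to the generator objective with weight lam.
import torch
import torch.nn.functional as F

def info_loss(Q, fake_images, c_indices, lam=1.0):
    """Q maps an image to logits over the categorical code; c_indices holds the sampled codes."""
    logits = Q(fake_images)
    return lam * F.cross_entropy(logits, c_indices)   # = -E[log Q(c | G(z, c))] up to a constant

# Usage sketch inside the generator update (hypothetical G, D, Q, adversarial_loss):
#   c_indices = torch.randint(0, n_codes, (batch_size,))
#   c_onehot  = F.one_hot(c_indices, n_codes).float()
#   fake      = G(torch.cat([z, c_onehot], dim=1))
#   g_loss    = adversarial_loss(D(fake)) + info_loss(Q, fake, c_indices)
```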

In fact, InfoGAN was not the first to bring an information-theoretic perspective into the GAN framework: before InfoGAN there was already f-GAN [8]. Moreover, the GAN itself can be explained from the point of view of information theory. As mentioned at the beginning of this article, the original GAN paper [1] explains the idea behind GANs through game theory, but one can also view the data generated by the GAN's generative model (G) and the real data as two sides of a coin: when the coin lands heads, we show the discriminative model (D) a real data sample; otherwise, we show it a "fake" sample produced by the generative model (G). The ideal state of a GAN is that the discriminative model's judgment is essentially no better than the coin toss itself, i.e. the data produced by the generative model (G) matches the real data exactly. What GAN training effectively does, then, is minimize the mutual information between the coin and the sample shown to D: the smaller this mutual information, the less information the discriminative model (D) can extract from its observation, and the more its answers degenerate into random guesses. Given this information-theoretic view of GANs, can we modify GANs further? Indeed we can: for example, this view can be generalized to optimization objectives based on other divergences. This discussion and improvement can be found in the f-GAN paper [8].
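
As a brief formal note on that generalization, the following LaTeX block states the variational lower bound on an f-divergence that f-GAN [8] optimizes; the symbols follow the standard formulation rather than anything quoted from this article.

```latex
% f is convex with f(1) = 0, f^* is its convex conjugate, and T ranges over a class of
% critic functions (in practice a neural network); choosing a particular f recovers
% (up to constants) the original GAN's Jensen-Shannon-style objective.
D_f(P \,\|\, Q) \;=\; \int q(x)\, f\!\left(\frac{p(x)}{q(x)}\right) dx
\;\ge\; \sup_{T}\; \mathbb{E}_{x \sim P}\big[T(x)\big] \;-\; \mathbb{E}_{x \sim Q}\big[f^{*}(T(x))\big]
```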

These improvements to GANs have all appeared within roughly a year and a half, most of them in the last six months. The biggest reason is that, compared with earlier generative models, the GAN cleverly turns a sample's "real or fake" status into an implicit label, thereby realizing an "unsupervised" training framework for generative models. This idea can also be regarded as a variation of the skip-gram idea in Word2vec. Going forward, not only are further improvements to GANs worth looking forward to, but the development of unsupervised learning and generative models as a whole deserves equal attention.

References:

1. "Generative adversarial Nets"

2. "Conditional generative adversarial Nets"

3. "Draw:a Recurrent neural Network for Image Generation"

4. Deep generative Image Models using a Laplacian pyramid of adversarial Networks

5. "Generating Images with recurrent adversarial Networks"

6. "Unsupervised representation Learning with Deep convolutional generative adversarial"

7. "Infogan:interpretable representation Learning by information maximizing generative adversarial Nets"

8. "F-gan:training generative neural samplers using variational divergence"
