I haven't updated my blog in a long time, though it seems I say that every time I post. Recently there have been some new experiences in life and work: I have finally adapted to the new environment, figured out how to balance work and life, and learned how to work efficiently while still making time for my own interests. At heart I am still that simple technical guy, idealistic in attitude but able to face reality...
Then it occurred to me that this is a technical blog, so enough about me. Let's talk about adversarial networks.

The Rise of Discriminative Models
The breakthroughs deep learning has recently achieved in many fields need no introduction from me. But everyone seems to have noticed the same reality: almost all of these breakthroughs have come from discriminative models.
A discriminative task can be thought of simply as a classification problem: for example, given a picture, decide what animal is in it; or, given a speech recording, determine the corresponding text.
Many effective techniques have been developed for discriminative models, such as backpropagation, dropout, and piecewise linear units.

Generative Models
In fact, I read this paper quite a while ago, but for a long time I had no particularly intuitive feeling for the place of generative models in AI. Only recently have I slowly come to understand it.
Concretely, generative models can create things out of nothing: for example, super-resolving an image, repairing a picture in which part has been covered up, or taking a portrait sketch of a face and rendering it into a lifelike photo.
Going one level higher, the ultimate goal of a generative model is to create something new by discovering the regularities in data, and that is what real AI corresponds to. Think of a person: he can see, hear, and smell the world, and that is discrimination; he can also speak, paint, and think of new things, and that is creation. So I see generative models as the next stage of AI development, the one that follows once recognition tasks have matured.

The Predicament
So far, however, generative models have not enjoyed the benefits of deep learning nearly as much: on the discriminative side, results keep springing up, but on the generative side they do not. The paper attributes this to two difficulties: approximating the many intractable probabilistic computations that arise in maximum likelihood estimation and related strategies, and leveraging the benefits of piecewise linear units in the generative context.
Does that mean generative models cannot ride the east wind of deep learning's progress? All I can say is that sometimes you have to save the nation by a roundabout route.

Basic Idea of Adversarial Networks
Suppose there is a probability distribution M which, to us, is a black box. To figure out what is inside this black box, we build two things, G and D. G is another probability distribution that we know perfectly well, and D is used to judge whether a sample was produced by the unknown distribution in the black box or by the G we constructed ourselves.
We then keep adjusting G and D until D can no longer tell the samples apart. During tuning we need to: optimize G to make it as confusing as possible, and optimize D to make it as capable as possible of recognizing the counterfeits.
When D cannot distinguish where a sample came from, we can conclude that G and M are the same distribution, and so we have learned what is inside the black box.
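As a toy illustration of this loop, here is a minimal sketch in PyTorch (my own code, not the paper's): the "black box" M is a 1-D Gaussian we pretend not to know, G maps noise to samples, and D estimates the probability that a sample came from M. All names and hyperparameters here are my own choices.

import torch
import torch.nn as nn

torch.manual_seed(0)

def sample_M(n):                     # the "black box": a Gaussian we pretend not to know
    return torch.randn(n, 1) * 1.5 + 4.0

G = nn.Sequential(nn.Linear(1, 16), nn.ReLU(), nn.Linear(16, 1))
D = nn.Sequential(nn.Linear(1, 16), nn.ReLU(), nn.Linear(16, 1), nn.Sigmoid())
opt_G = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_D = torch.optim.Adam(D.parameters(), lr=1e-3)
bce = nn.BCELoss()
ones, zeros = torch.ones(64, 1), torch.zeros(64, 1)

for step in range(5000):
    real, z = sample_M(64), torch.randn(64, 1)
    # optimize D: distinguish M's samples (label 1) from G's counterfeits (label 0)
    loss_D = bce(D(real), ones) + bce(D(G(z).detach()), zeros)
    opt_D.zero_grad(); loss_D.backward(); opt_D.step()
    # optimize G: make D believe the counterfeits came from M
    loss_G = bce(D(G(z)), ones)
    opt_G.zero_grad(); loss_G.backward(); opt_G.step()

# if training succeeds, D outputs about 0.5 and G's samples match M's statistics
print(G(torch.randn(1000, 1)).mean().item(), G(torch.randn(1000, 1)).std().item())

This sketch takes one D step per G step; the paper's Algorithm 1, described below, allows k discriminator steps per generator step.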
A Simple Example
Look at the four panels (a), (b), (c), and (d) in the figure above. The black dotted line represents data produced by M, the red line represents our own simulated distribution G, and the blue line represents the classification model D.
Panel (a) shows the initial state. Panel (b) shows holding G fixed and optimizing D until its classification accuracy is highest.
Panel (c) shows holding D fixed and optimizing G until D is maximally confused. Panel (d) shows that after many iterations, G finally produces exactly the same data as M, at which point we conclude that G is M.

Formalization
Formalizing the process described in the example above gives the following objective:

$$\min_G \max_D V(D,G) = \mathbb{E}_{x\sim p_{data}(x)}[\log D(x)] + \mathbb{E}_{z\sim p_z(z)}[\log(1 - D(G(z)))]$$

Here D(x) denotes the probability that x came from the distribution M, so optimizing D means making V(D,G) as large as possible, and optimizing G means making V(D,G) as small as possible.
where $x \sim p_{data}(x)$ indicates that x is drawn from the true distribution,
and $z \sim p_z(z)$ indicates that z is drawn from the distribution we simulate. G denotes the generative model and D the classification model.
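Concretely, the paper's Algorithm 1 alternates two minibatch stochastic-gradient updates, where $\theta_d$ and $\theta_g$ are the parameters of D and G. First, ascend

$$\nabla_{\theta_d}\,\frac{1}{m}\sum_{i=1}^{m}\Big[\log D\big(x^{(i)}\big)+\log\big(1-D\big(G(z^{(i)})\big)\big)\Big]$$

for k steps to update D; then descend

$$\nabla_{\theta_g}\,\frac{1}{m}\sum_{i=1}^{m}\log\big(1-D\big(G(z^{(i)})\big)\big)$$

once to update G.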
The above is the training process of G and D. In each iteration, D is trained with k gradient steps and G with a single gradient step. The reason is that training D to convergence in an inner loop would be prohibitively expensive, and on a finite dataset it would also lead to overfitting.

Proofs
The idea of the paper is as described above, but what is interesting is that it also gives two proofs to justify the adversarial network theoretically.

Proposition 1
The first result is that when G is fixed, D has a unique optimal solution. The precise statement is: for G fixed, the optimal discriminator is

$$D^*_G(x) = \frac{p_{data}(x)}{p_{data}(x) + p_g(x)}$$
The proof goes as follows. First, transform V(G,D):

$$V(G,D) = \int_x p_{data}(x)\log D(x)\,dx + \int_z p_z(z)\log\big(1-D(G(z))\big)\,dz = \int_x \Big[p_{data}(x)\log D(x) + p_g(x)\log\big(1-D(x)\big)\Big]\,dx$$
Then note that for any $(a,b) \in \mathbb{R}^2 \setminus \{(0,0)\}$ with $a,b \ge 0$, the function $y \mapsto a\log y + b\log(1-y)$ attains its maximum on $[0,1]$ at $y = a/(a+b)$.
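To see why, here is the short calculus check (a step the post leaves implicit): setting the derivative of $f(y) = a\log y + b\log(1-y)$ to zero gives

$$f'(y) = \frac{a}{y} - \frac{b}{1-y} = 0 \;\Rightarrow\; a(1-y) = by \;\Rightarrow\; y = \frac{a}{a+b},$$

and since a and b are densities here (both non-negative), $f''(y) = -a/y^2 - b/(1-y)^2 \le 0$ confirms this is a maximum.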
This completes the proof.

Theorem 1
Using Proposition 1, the inner maximization over D in V(G,D) can be rewritten by plugging in the optimal discriminator:

$$C(G) = \max_D V(G,D) = \mathbb{E}_{x\sim p_{data}}[\log D^*_G(x)] + \mathbb{E}_{x\sim p_g}[\log(1 - D^*_G(x))]$$
This yields the theorem: the global minimum of C(G) is achieved if and only if $p_g = p_{data}$, and at that point $C(G) = -\log 4$.
Substituting $p_g = p_{data}$ directly gives $-\log 4$; for $p_g \neq p_{data}$, one gets a strictly larger value.
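Written out, the decomposition behind this (as in the paper) is:

$$C(G) = \mathbb{E}_{x\sim p_{data}}\!\left[\log \frac{p_{data}(x)}{p_{data}(x)+p_g(x)}\right] + \mathbb{E}_{x\sim p_g}\!\left[\log \frac{p_g(x)}{p_{data}(x)+p_g(x)}\right]$$

$$= -\log 4 + KL\!\left(p_{data}\,\Big\|\,\frac{p_{data}+p_g}{2}\right) + KL\!\left(p_g\,\Big\|\,\frac{p_{data}+p_g}{2}\right) = -\log 4 + 2\,JSD(p_{data}\,\|\,p_g)$$

Since the Jensen–Shannon divergence is non-negative and zero only when the two distributions coincide, $p_g = p_{data}$ is the unique global minimum.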
Proposition 2
Proposition 2 says: if G and D have enough capacity, and at each step of the algorithm the discriminator is allowed to reach its optimum given G while $p_g$ is updated so as to improve the criterion above, then $p_g$ converges to $p_{data}$.
The proof of this proposition uses a seemingly obvious property of convex functions: the subderivatives of a supremum of convex functions include the derivative of the function at the point where the maximum is attained. Applied here, with G's distribution as the variable, $\sup_D V(G,D)$ is convex in $p_g$ and has a unique global optimum (by Theorem 1), so sufficiently small gradient updates of $p_g$ converge to $p_{data}$. But because I am not familiar with convex optimization theory, I do not understand this part thoroughly.

Experiments
Early in training, D can easily tell G's samples apart from M's, so $\log(1 - D(G(z)))$ saturates. The paper therefore trains G to maximize $\log D(G(z))$ instead of minimizing $\log(1 - D(G(z)))$, which provides much stronger gradients early in learning.
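Here is a toy comparison of the two generator losses in PyTorch (my own sketch, with the same hypothetical network shapes as the earlier code, not the paper's implementation):

import torch
import torch.nn as nn

G = nn.Sequential(nn.Linear(1, 16), nn.ReLU(), nn.Linear(16, 1))
D = nn.Sequential(nn.Linear(1, 16), nn.ReLU(), nn.Linear(16, 1), nn.Sigmoid())

z = torch.randn(64, 1)
d_fake = D(G(z))  # D's estimate that G's samples are real

# original minimax loss for G: minimize log(1 - D(G(z))).
# early on, D confidently rejects fakes, and the gradient through
# D's sigmoid vanishes, so G learns very slowly.
loss_saturating = torch.log(1.0 - d_fake).mean()

# the paper's practical alternative: maximize log D(G(z)),
# i.e. minimize -log D(G(z)); its gradient stays large when d_fake is near 0.
loss_nonsaturating = -torch.log(d_fake).mean()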
The experiment evaluates the model by fitting a Gaussian Parzen window to samples drawn from G and reporting the log-likelihood of the test set under it, with the kernel width σ chosen by cross-validation on a validation set; I will skip the specific details. The paper reports these estimates on MNIST and TFD.
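For intuition, here is a hedged sketch of a Gaussian Parzen-window log-likelihood estimator in NumPy (my own illustration of the evaluation procedure, not the paper's code; the function name and interface are my own):

import numpy as np

def parzen_log_likelihood(samples, test_points, sigma):
    """Mean log-likelihood of `test_points` under a Gaussian Parzen window
    centred on each row of `samples`; both arrays have shape (n, d)."""
    n, d = samples.shape
    lls = []
    for x in test_points:
        # log of each Gaussian kernel N(x; sample_i, sigma^2 I)
        sq = np.sum((x - samples) ** 2, axis=1) / (2.0 * sigma ** 2)
        log_kernels = -sq - 0.5 * d * np.log(2.0 * np.pi * sigma ** 2)
        # log of the average of the n kernels, computed stably (log-sum-exp)
        m = log_kernels.max()
        lls.append(m + np.log(np.exp(log_kernels - m).mean()))
    return float(np.mean(lls))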
Strengths and Weaknesses
Advantages: no Markov chains are needed, only backpropagation. The generator network is never updated directly from data samples, only through gradients flowing from the discriminator, which the authors note is a potential advantage. Adversarial networks can also represent sharper, even degenerate distributions, while methods based on Markov chains require the distribution to be somewhat blurry so that the chains can mix between modes.
Disadvantages: the generative model has no explicit expression; it is only defined implicitly through its parameters. Also, D must be kept well synchronized with G during training.
The paper also gives a table comparing adversarial networks with other generative modeling frameworks (deep directed and undirected graphical models, generative autoencoders) in terms of training, inference, sampling, and evaluation; see the paper for the details.
References
Ian J. Goodfellow et al. "Generative Adversarial Nets." NIPS 2014.
"Ian Goodfellow's Quora Q&A: the rapid advance of machine learning" (OpenAI; Chinese tech-media article).
"Generative Adversarial Network GAN (II): The Original GAN" (Chinese blog post).