Paper link
1. Summary
The goal of this work is to use conditional adversarial networks (conditional GANs) as a general-purpose solution to image-to-image "translation" problems. These networks learn not only the mapping from input images to output images, but also a loss function for training that mapping. This makes it possible to apply one generic approach to problems that would normally require completely different loss formulations. The authors show that the method works well for synthesizing photos from label maps, reconstructing objects from edge maps, and colorizing images. As a community, we no longer need to hand-engineer the mapping function, and this work shows that reasonable results can be obtained without hand-engineering the loss function either.
Many problems in image processing, computer graphics, and computer vision can be viewed as "translating" an input image into a corresponding output image. Just as a concept can be expressed in English or in French, a scene can be rendered as an RGB image, a gradient field, an edge map, a semantic label map, and so on. By analogy with automatic language translation, image-to-image translation is defined as converting one possible representation of a scene into another, given sufficient training data.
Contents:
a) Objective
: Traditional image-to-image "translation" usually requires hand-designing a complex, problem-specific loss function; each problem needs its own mechanism, even though the underlying setting is always the same pixel-to-pixel mapping (pix2pix). A GAN, however, is a structure that does not need a hand-constructed loss: it automatically learns the mapping from input to output images. Applying this to the image "translation" problem therefore yields a single, generalized model.
b) Results and contribution
: The paper demonstrates that conditional GANs can produce reasonable results on a wide range of problems, presents a simple framework that achieves good results, and analyzes the effect of the important architectural choices.
c) Objective function
: This is similar to an ordinary cGAN: the generator G tries to minimize the objective while the discriminator D tries to maximize it.
The authors also consider adding a traditional loss to the objective, since earlier work showed this to be effective; but instead of the L2 norm they use the L1 distance between the output and the ground truth, which the paper notes produces less blurring, while the GAN term handles modeling the high-frequency structure. Combining the two gives the final objective function:
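Written out, my understanding of the objective from the paper (x is the input image, y the target, z the noise, and λ weights the L1 term; the paper uses λ = 100):

```latex
\mathcal{L}_{cGAN}(G,D) = \mathbb{E}_{x,y}[\log D(x,y)]
  + \mathbb{E}_{x,z}[\log(1 - D(x, G(x,z)))]

\mathcal{L}_{L1}(G) = \mathbb{E}_{x,y,z}\big[\lVert y - G(x,z) \rVert_1\big]

G^{*} = \arg\min_{G}\max_{D} \; \mathcal{L}_{cGAN}(G,D) + \lambda\,\mathcal{L}_{L1}(G)
```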
d) Network structure
: Unlike a traditional encoder-decoder, the generator in this paper uses a U-Net structure. The difference from the traditional structure is that not all information has to flow through every layer: skip connections link each encoder layer to its mirrored decoder layer. Since in these tasks the input and output share a lot of low-level structure (edges, layout), letting that structure be passed across directly, instead of squeezing it through the bottleneck, gives good results.
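To make the skip-connection idea concrete, here is a minimal sketch in PyTorch. This is not the paper's exact generator; `TinyUNet`, the channel sizes, and the two-level depth are my own illustration of the structure.

```python
import torch
import torch.nn as nn

class TinyUNet(nn.Module):
    """Minimal U-Net-style generator: an encoder-decoder where encoder
    feature maps are concatenated into the decoder (skip connections),
    so low-level structure shared by input and output can bypass the
    bottleneck. Sizes are illustrative, not the paper's."""
    def __init__(self, in_ch=3, out_ch=3):
        super().__init__()
        self.down1 = nn.Sequential(nn.Conv2d(in_ch, 32, 4, 2, 1), nn.LeakyReLU(0.2))
        self.down2 = nn.Sequential(nn.Conv2d(32, 64, 4, 2, 1), nn.LeakyReLU(0.2))
        self.up1 = nn.Sequential(nn.ConvTranspose2d(64, 32, 4, 2, 1), nn.ReLU())
        # the decoder's last stage sees its own features PLUS down1's (32+32=64)
        self.up2 = nn.ConvTranspose2d(64, out_ch, 4, 2, 1)
        self.tanh = nn.Tanh()

    def forward(self, x):
        d1 = self.down1(x)                # H/2
        d2 = self.down2(d1)               # H/4 (bottleneck)
        u1 = self.up1(d2)                 # back to H/2
        u1 = torch.cat([u1, d1], dim=1)   # skip connection
        return self.tanh(self.up2(u1))    # back to H

g = TinyUNet()
x = torch.randn(1, 3, 64, 64)
y = g(x)
print(y.shape)  # torch.Size([1, 3, 64, 64]) -- output matches input resolution
```

Without the `torch.cat` line (a plain encoder-decoder), every pixel of the output would have to be reconstructed from the bottleneck alone.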
For the discriminator, the authors propose a PatchGAN structure, which penalizes structure only at the scale of local patches: the discriminator classifies each N×N patch of the image as real or fake. Even when N is much smaller than the full image this still gives good results, with fewer parameters, faster training, and applicability to images of any size. (It's a bit like the CNN-based D network I had set up myself: it operates on high-level local features of the input.)
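A sketch of the PatchGAN idea, again with illustrative layer sizes rather than the paper's exact 70×70 model (`TinyPatchD` is my own name):

```python
import torch
import torch.nn as nn

class TinyPatchD(nn.Module):
    """PatchGAN-style discriminator: a small fully convolutional net
    whose output is a GRID of real/fake logits, one per patch location,
    rather than a single scalar for the whole image. Because it is
    fully convolutional, it runs on any image size."""
    def __init__(self, in_ch=6):  # input and target images concatenated
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, 32, 4, 2, 1), nn.LeakyReLU(0.2),
            nn.Conv2d(32, 64, 4, 2, 1), nn.LeakyReLU(0.2),
            nn.Conv2d(64, 1, 4, 1, 1),   # one logit per patch location
        )

    def forward(self, x, y):
        # condition on the input image by channel-wise concatenation
        return self.net(torch.cat([x, y], dim=1))

d = TinyPatchD()
x = torch.randn(1, 3, 64, 64)
y = torch.randn(1, 3, 64, 64)
scores = d(x, y)
print(scores.shape)  # a spatial grid of patch scores, not one scalar
```

Each entry of `scores` has a receptive field covering only a small region of the input, which is what restricts the adversarial penalty to patch-scale structure.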
e) Training process
: The D network and G network are trained alternately, with the same number of steps for each. The number of steps per alternation can vary with the specific task; every paper uses different training hyperparameters.
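One alternation of this training loop might look like the following sketch. The stand-in `g` and `d` are trivial placeholder modules of my own (any generator/discriminator pair would slot in); the BCE-plus-weighted-L1 losses and λ = 100 follow the pix2pix setup.

```python
import torch
import torch.nn as nn

# Stand-in generator and discriminator (placeholders, not the paper's nets).
g = nn.Sequential(nn.Conv2d(3, 3, 3, 1, 1), nn.Tanh())
d = nn.Sequential(nn.Conv2d(6, 1, 3, 1, 1))   # sees input+output concatenated
opt_g = torch.optim.Adam(g.parameters(), lr=2e-4, betas=(0.5, 0.999))
opt_d = torch.optim.Adam(d.parameters(), lr=2e-4, betas=(0.5, 0.999))
bce = nn.BCEWithLogitsLoss()
l1 = nn.L1Loss()
lam = 100.0                                   # L1 weight from the paper

x = torch.randn(2, 3, 16, 16)                 # batch of input images
y = torch.randn(2, 3, 16, 16)                 # matching target images

# --- D step: push D(x, y) toward "real" and D(x, G(x)) toward "fake" ---
fake = g(x).detach()                          # detach: no G gradients here
d_real = d(torch.cat([x, y], 1))
d_fake = d(torch.cat([x, fake], 1))
loss_d = bce(d_real, torch.ones_like(d_real)) + bce(d_fake, torch.zeros_like(d_fake))
opt_d.zero_grad(); loss_d.backward(); opt_d.step()

# --- G step: fool D, plus the L1 reconstruction term ---
fake = g(x)
d_fake = d(torch.cat([x, fake], 1))
loss_g = bce(d_fake, torch.ones_like(d_fake)) + lam * l1(fake, y)
opt_g.zero_grad(); loss_g.backward(); opt_g.step()
```

In a real run these two steps simply repeat over the dataset, which is the "alternate, same number of steps" schedule described above.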
f) Experimental results
: The authors trained and tested on many datasets, and the results are better than prior methods ~
g) Finally, the authors used controlled ablations to compare GAN vs. cGAN and L1+GAN vs. GAN, and found that L1+cGAN gives the best results. On analysis, the L1 term alone makes the output distribution narrower than the true one, producing averaged (blurry) results, while the cGAN term makes results sharper rather than blurred and brings the output closer to the real distribution.
h) I have not yet read the appendix, and I have not yet implemented the code ~
2. Experience
A) Why can a GAN be applied to so many scenarios? In earlier work, generating a specific kind of output required expert knowledge to construct a task-specific loss function, and the results were often unsatisfactory: the outputs were blurry rather than sharp and realistic. A GAN instead specifies only a high-level goal, namely whether the output can be distinguished from real data, and from that it automatically obtains a loss function that satisfies the goal.
B) In the objective-function construction, the authors include a traditional loss term alongside the GAN objective. This is similar in spirit to pre-training with MLE before adversarial training. The PatchGAN D network is similar to an ordinary CNN. I am starting to understand the role these components play in the model. What I understand so far: pre-training (or an L1-style term) pulls the result toward the real distribution and can speed up training, because GAN training alone is too slow and too difficult; and a CNN learns local features, which keeps the parameter count down and helps avoid overfitting.