This post explains my understanding of generative adversarial networks (GANs). It first discusses what adversarial examples are and how they relate to adversarial networks, then explains each component of a GAN, and then walks through the algorithm flow together with a code implementation to show how the algorithm is implemented and executed. Finally, it shows the results of a denoising network rewritten from an adversarial network; the results are poor, but some aspects are interesting.

Please credit the source when reprinting: chenrudan.github.io

Contents:
1. Adversarial examples
2. Generative adversarial network (GAN)
3. Code explanation
4. Run instance
5. Summary
6. References

1. Adversarial examples
In 2014, while studying the properties of neural networks, Szegedy et al. found that, for an already trained classification model, a few small changes to a training-set sample can make the model output a wrong classification, even though a human cannot perceive the perturbation. Samples that cause such misclassification are called adversarial examples. The authors used them to devise adversarial training, in which the model is trained on both normal samples and adversarial examples it generates itself, which improves its generalization [1]. As shown in the figure below, before the perturbation the model assigns the input picture a 57.7% probability of being a panda; after the perturbation the picture looks unchanged to a human, but the model assigns a 99.3% probability to gibbon.
Fig. 1 Generating an adversarial example (source: [2])
At first glance the problem looks like overfitting, but in 2015 Goodfellow [3] pointed out that underfitting can also produce adversarial examples, because the phenomenon is simply that a certain change in the input makes the output wrong. In Fig. 2 below, the top and bottom panels show adversarial examples caused by overfitting and by underfitting respectively; the green O and X are training samples and the red O and X are adversarial examples. Clearly, a change in the input can also cause misclassification when the model underfits. (Personally I find this a little odd: the adversarial examples drawn in the figure are not necessarily close to the original samples, so they feel like an artificial construction rather than a reflection of real data.)

In [1] the authors suggest that the phenomenon may be due to the nonlinearity of neural networks, but Goodfellow gives a more accurate explanation: adversarial examples are caused by the linear behaviour of the model. The model computes the dot product $w^T x$, so when every dimension of $x$ is changed slightly, $\tilde{x} = x + \eta$, the dot product picks up an extra term, $w^T \tilde{x} = w^T x + w^T \eta$, which can be large enough to change the prediction.

An example is given in [4]. Suppose logistic regression is used for binary classification, with input vector $x = [2, -1, 3, -2, 2, 2, 1, -4, 5, 1]$ and weight vector $w = [-1, -1, 1, -1, 1, -1, 1, 1, -1, 1]$. The dot product is -3 and the predicted probability of class 1 is 0.0474. If the input is changed to $x_{ad} = x + 0.5w = [1.5, -1.5, 3.5, -2.5, 2.5, 1.5, 1.5, -3.5, 4.5, 1.5]$, the predicted probability of class 1 becomes 0.88. Every dimension of the input changed only a little, yet the prediction flipped.
Fig. 2 Adversarial examples caused by overfitting and underfitting (source: [3])
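The arithmetic of the logistic regression example above can be checked with a few lines of NumPy. This is only a minimal sketch: the vectors are the ones quoted from [4], while the variable names and the `sigmoid` helper are my own choices for illustration.

```python
import numpy as np

def sigmoid(s):
    """Logistic function that maps a score to a probability."""
    return 1.0 / (1.0 + np.exp(-s))

# Input and weight vectors from the example in [4].
x = np.array([2, -1, 3, -2, 2, 2, 1, -4, 5, 1], dtype=float)
w = np.array([-1, -1, 1, -1, 1, -1, 1, 1, -1, 1], dtype=float)

score = w @ x                      # dot product w^T x
p_clean = sigmoid(score)           # probability of class 1 before the change

x_ad = x + 0.5 * w                 # move every dimension by 0.5 along w
shift = w @ (x_ad - x)             # extra term w^T eta = 0.5 * ||w||^2
p_adv = sigmoid(w @ x_ad)          # probability of class 1 after the change

print(score, shift, round(p_clean, 4), round(p_adv, 4))
```

Each coordinate moves by only 0.5, but the ten contributions add up, so the score moves from -3 to 2 and the prediction changes class.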
If adversarial examples are caused by the linear behaviour of the model, we can construct a method for generating them, that is, for choosing the perturbation to add to the input. Goodfellow gives such a construction, the fast gradient sign method [2]: take the derivative of the loss function $J$ with respect to the input $x$, where $\theta$ denotes the model parameters and $\epsilon$ is a very small real number. Figure 1 uses $\epsilon = 0.007$.
$$\eta = \epsilon \, \mathrm{sign}(\nabla_x J(\theta, x, y)) \tag{1}$$
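As a rough illustration of formula (1), the sketch below applies the fast gradient sign method to a logistic regression model. It assumes a binary cross-entropy loss, whose gradient with respect to the input is $(\sigma(w^T x) - y)\,w$. The function names, the label `y = 0` and the step `eps = 0.5` are assumptions for this example and are not taken from [2].

```python
import numpy as np

def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))

def input_gradient(w, x, y):
    """Gradient of the binary cross-entropy loss J with respect to the input x."""
    return (sigmoid(w @ x) - y) * w

def fgsm(w, x, y, eps):
    """Formula (1): add eta = eps * sign(grad_x J) to the input."""
    eta = eps * np.sign(input_gradient(w, x, y))
    return x + eta

w = np.array([-1, -1, 1, -1, 1, -1, 1, 1, -1, 1], dtype=float)
x = np.array([2, -1, 3, -2, 2, 2, 1, -4, 5, 1], dtype=float)
y = 0.0                            # assumed true label: the clean prediction is correct

x_adv = fgsm(w, x, y, eps=0.5)
p_before = sigmoid(w @ x)
p_after = sigmoid(w @ x_adv)
print(p_before, p_after)
```

With this particular weight vector every entry of `w` is +1 or -1, so the sign of the gradient equals `w` and the result coincides with the hand-built perturbation `x + 0.5w`.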
More examples of this construction are given in [4]; two of them are reproduced here. ImageNet pictures are scaled to 64*64 and used to train a single-layer perceptron whose input is 64*64*3, whose output is 1000 classes, and whose weight matrix is therefore 64*64*3*1000. After training, the row of the weight matrix that corresponds to one output class (64*64*3 values) is reshaped into a 64*64 picture, shown in the second column of the figure below. Formula (1) is then used to compute the third column from the original pictures in the first column. In the first row the prediction changes from fox to goldfish, and in the second row the prediction becomes school bus.
Fig. 3 Constructing adversarial examples (source: [4])
In fact, this does not happen only with purely linear models. Convolution is a linear operation, so convolutional networks also give unstable predictions, and ReLU, maxout and even the middle part of the sigmoid behave linearly. Since we can construct adversarial examples ourselves, we can use them to train the model so that it generalizes better. [2] therefore gives a new objective function, shown below, which is equivalent to adding some perturbation to the input. The experimental results show that a model trained this way is more resistant to adversarial examples.
$$\tilde{J}(\theta, x, y) = \alpha J(\theta, x, y) + (1 - \alpha) J(\theta, x + \epsilon \, \mathrm{sign}(\nabla_x J(\theta, x, y))) \tag{2}$$
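To make formula (2) concrete, here is a small sketch that evaluates the adversarial training objective for a logistic regression model with a binary cross-entropy loss. The choice of model, the label `y = 0`, `eps = 0.5` and `alpha = 0.5` are assumptions made for illustration; [2] applies the objective to neural networks.

```python
import numpy as np

def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))

def bce_loss(w, x, y):
    """Binary cross-entropy J(theta, x, y) for logistic regression."""
    p = sigmoid(w @ x)
    return -(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))

def input_gradient(w, x, y):
    """Gradient of the loss with respect to the input x."""
    return (sigmoid(w @ x) - y) * w

def adversarial_objective(w, x, y, eps, alpha=0.5):
    """Formula (2): weighted sum of the clean loss and the loss on the perturbed input."""
    x_adv = x + eps * np.sign(input_gradient(w, x, y))
    return alpha * bce_loss(w, x, y) + (1.0 - alpha) * bce_loss(w, x_adv, y)

w = np.array([-1, -1, 1, -1, 1, -1, 1, 1, -1, 1], dtype=float)
x = np.array([2, -1, 3, -2, 2, 2, 1, -4, 5, 1], dtype=float)
y = 0.0                            # assumed true label

j_clean = bce_loss(w, x, y)
j_tilde = adversarial_objective(w, x, y, eps=0.5, alpha=0.5)
print(j_clean, j_tilde)
```

The objective is much larger than the clean loss here because the perturbed input is misclassified, so minimizing it pushes the model to classify both versions of the input correctly.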
There is no direct relationship between adversarial examples and generative adversarial networks. An adversarial network tries to learn the intrinsic representation of the samples so that it can generate new ones, whereas the existence of adversarial examples suggests that a model has not learned the internal representation or distribution of the data. It may only have learned some specific patterns that are enough to meet the classification or regression objective. The construction in formula (1) makes only a very small change along the direction of the gradient, yet the model can no longer classify correctly. Another observed phenomenon is that classifiers with different structures trained on the same data often assign the same adversarial example to the same wrong class, as if all of them were disturbed by the same change.

2. Generative adversarial network (GAN)
In 2014 Goodfellow proposed generative adversarial nets [5]. The problem they address is how to learn to produce new samples from training samples: if the training samples are pictures, generate new pictures; if they are articles, output new articles; and so on. If we knew the distribution $p(x)$ of the training samples, we could sample new ones from it at random, and most generative models follow this idea. A GAN instead learns a mapping from a random variable $z$ to the training samples $x$, where $z$ can be chosen to follow a normal distribution. This gives a generator network $G(z; \theta_g)$ built from a multilayer perceptron, whose input is a random vector and whose output is a picture.

How do we make the forged output look like a training sample? Goodfellow's approach is to follow the generator with a second multilayer perceptron, the discriminator $D(x; \theta_d)$. Its input is chosen at random to be either a real sample or an output of the generator, and its output is the probability that the input comes from the real data $p_{data}$ rather than from the generator $p_g$. Once the discriminator can tell well whether an input is real, its gradient also indicates what kind of input looks more like a real sample, and this information is used to adjust the generator. So $G$ tries to make its output look as real as possible, while $D$ tries as hard as it can to tell that output apart from real samples. In the figure below, the left side is the probabilistic interpretation of the GAN algorithm and the right side is the composition of the model.
Fig. 4 Block diagram of the GAN algorithm (source: [6])
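The two-network structure described above can be sketched as a pair of tiny multilayer perceptrons. This is an untrained forward pass only, written in plain NumPy. The layer sizes, the `tanh` output of the generator and all names are assumptions for illustration and do not come from [5] or [7].

```python
import numpy as np

rng = np.random.default_rng(0)

def init_mlp(n_in, n_hidden, n_out):
    """Random weights for a one-hidden-layer perceptron (illustrative sizes)."""
    return {
        "w1": rng.normal(0.0, 0.05, (n_in, n_hidden)), "b1": np.zeros(n_hidden),
        "w2": rng.normal(0.0, 0.05, (n_hidden, n_out)), "b2": np.zeros(n_out),
    }

def forward(params, inputs, out_act):
    """Hidden ReLU layer followed by an output layer with activation out_act."""
    hidden = np.maximum(inputs @ params["w1"] + params["b1"], 0.0)
    return out_act(hidden @ params["w2"] + params["b2"])

def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))

z_dim, hidden_dim, x_dim, batch = 100, 128, 28 * 28, 8    # assumed sizes
g_params = init_mlp(z_dim, hidden_dim, x_dim)             # generator G(z; theta_g)
d_params = init_mlp(x_dim, hidden_dim, 1)                 # discriminator D(x; theta_d)

z = rng.normal(size=(batch, z_dim))                       # random input vectors
fake = forward(g_params, z, np.tanh)                      # forged "pictures", flattened
p_real = forward(d_params, fake, sigmoid)                 # D's probability that each is real
print(fake.shape, p_real.shape)
```

Training would then alternate between updating `d_params` to separate real from forged inputs and updating `g_params` to raise `p_real` on forged inputs.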
Optimizing a GAN is a minimax game. The final goal is for the generator's output to be hard for the discriminator to judge as real or forged. That is, $D$ maximizes its ability to judge correctly, while $G$ minimizes the probability that its output is judged a forgery. The formula is given below. In [5] this formula is rewritten in terms of the Jensen-Shannon divergence to prove that the global minimum is reached only when $p_g = p_{data}$, that is, when the generator fully recovers the real sample distribution, and that the formula below converges. (The algorithm flow is written very clearly in the paper, so it is not repeated here; it is explained later together with the code.)
$$\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{data}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))] \tag{3}$$
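Formula (3) can be estimated from a batch of discriminator outputs by replacing the expectations with sample means. The sketch below does this for two hand-picked cases; the function name and the example values are illustrative assumptions.

```python
import numpy as np

def value_fn(d_real, d_fake):
    """Monte Carlo estimate of V(D, G) from discriminator outputs in (0, 1).

    d_real holds D(x) for real samples, d_fake holds D(G(z)) for forged ones.
    """
    return np.mean(np.log(d_real)) + np.mean(np.log(1.0 - d_fake))

# A discriminator that cannot tell real from forged outputs 0.5 everywhere.
v_undecided = value_fn(np.full(64, 0.5), np.full(64, 0.5))

# A confident and correct discriminator pushes V up towards its maximum of 0.
v_confident = value_fn(np.full(64, 0.99), np.full(64, 0.01))

print(v_undecided, v_confident)
```

The undecided case gives $-\log 4$, which is the value [5] derives for the global optimum $p_g = p_{data}$.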
That is the most basic introduction to GANs. When I first read the paper I had a few questions: (1) Why not learn $G$ directly, that is, directly learn a mapping from $z$ to $x$? (2) How exactly is $G$ trained? (3) During training, are $z$ and $x$ in one-to-one correspondence? These can be roughly answered once the code is understood.

3. Code explanation
This part uses a TensorFlow implementation [7], the algorithm flow, and the figure below from [5] to explain how to use DCGAN to generate pictures of handwritten digits.

In the figure below, the black dotted line is the Gaussian distribution of the real data, the green line is the forged distribution learned by the generator, and the blue line is the probability, output by the discriminator, that an input is a real picture. The horizontal line labelled x is the sampling space of the Gaussian-distributed $x$, and the horizontal line labelled z is the sampling space of the uniformly distributed $z$. We can see that $G$ learns a mapping from the space of $z$ to the space of $x$.
Fig. 5 The probability distributions of a GAN during training (source: [5])
A. Initial situation
$D$ is a convolutional neural network whose variable name is D. One of its layers is constructed as follows.
    # 4x4 convolution with stride 2, followed by a leaky ReLU
    w = tf.get_variable('w', [4, 4, c_dim, num_filter],
                        initializer=tf.truncated_normal_initializer(stddev=stddev))
    dconv = tf.nn.conv2d(ddata, w, strides=[1, 2, 2, 1], padding='SAME')
    biases = tf.get_variable('biases', [num_filter],
                             initializer=tf.constant_initializer(0.0))
    bias = tf.nn.bias_add(dconv, biases)
    dconv1 = tf.maximum(bias, leak * bias)
    ...
$G$ is a transposed-convolution (deconvolution) neural network whose variable name is G. One of its layers is constructed as follows.
    # 4x4 transposed convolution with stride 2, followed by a ReLU
    w = tf.get_variable('w', [4, 4, num_filter, num_filter * 2],
                        initializer=tf.random_normal_initializer(stddev=stddev))
    deconv = tf.nn.conv2d_transpose(gconv2, w,
                                    output_shape=[batch_size, s2, s2, num_filter],
                                    strides=[1, 2, 2, 1])
    biases = tf.get_variable('biases', [num_filter],
                             initializer=tf.constant_initializer(0.0))
    bias = tf.nn.bias_add(deconv, biases)
    deconv1 = tf.nn.relu(bias, name=scope.name)
    ...
The input of $G$ is a random vector of dimension z_dim whose entries are uniformly distributed on [-1, 1]; z_dim is 100 here.
    batch_z = np.random.uniform(-1, 1, [config.batch_size, self.z_dim]) \
        .astype(np.float32)
The input of $D$ is a batch of 64*64 pictures, which can be either handwritten data or a batch of outputs from $G$.
This stage corresponds to state (a) in the figure above: the discriminator curve is not yet stable, and neither network has been trained well.
B. Training the discriminator network
The loss function of the discriminator has two parts: the loss for judging real data as 1, and the loss for judging the output of $G$, self.G, as 0. The loss to be optimized is defined as follows.
    self.G = self.generator(self.z)
    self.D, self.D_logits = self.discriminator(self.images)
    self.D_, self.D_logits_ = self.discriminator(self.G, reuse=True)

    # Positional (logits, targets) arguments as in the original code;
    # TensorFlow >= 1.0 requires the keywords labels=... and logits=... instead.
    self.d_loss_real = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(
        self.D_logits, tf.ones_like(self.D)))
    self.d_loss_fake = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(
        self.D_logits_, tf.zeros_like(self.D_)))
    self.d_loss = self.d_loss_real + self.d_loss_fake
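To show what the two loss terms compute, here is a NumPy sketch of sigmoid cross-entropy on logits, using the numerically stable form max(x, 0) - x*z + log(1 + exp(-|x|)) that the TensorFlow documentation describes for this operation. The logit values are made up for illustration.

```python
import numpy as np

def sigmoid_cross_entropy_with_logits(logits, labels):
    """Element-wise -[z*log(sigmoid(x)) + (1-z)*log(1-sigmoid(x))], stable form."""
    return (np.maximum(logits, 0.0) - logits * labels
            + np.log1p(np.exp(-np.abs(logits))))

# Made-up discriminator logits for a batch of real and forged pictures.
d_logits_real = np.array([2.0, 1.5, 3.0])
d_logits_fake = np.array([-1.0, -2.5, 0.5])

# Real pictures are labelled 1, forged pictures are labelled 0.
d_loss_real = np.mean(sigmoid_cross_entropy_with_logits(
    d_logits_real, np.ones_like(d_logits_real)))
d_loss_fake = np.mean(sigmoid_cross_entropy_with_logits(
    d_logits_fake, np.zeros_like(d_logits_fake)))
d_loss = d_loss_real + d_loss_fake
print(d_loss_real, d_loss_fake, d_loss)
```

Minimizing `d_loss` is the same as maximizing the sample estimate of formula (3) with respect to the discriminator, since `d_loss_real` is the mean of -log D(x) and `d_loss_fake` is the mean of -log(1 - D(G(z))).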
Then a batch of real data, batch_images, and a batch of random vectors, batch_z, are fed as input, and the session is run to update the parameters of $D$.
    # Update discriminator on real
    d_optim = tf.train.AdamOptimizer(flags.learning_rate,
                                     beta1=flags.beta1).minimize(d_loss, var_list=d_vars)
    ...
    out1 = sess.run([d_optim], feed_dict={real_images: batch_images,
                                          noise_images: batch_z})
This step corresponds to state (b) in the figure: the discriminator curve gradually stabilizes.
C. Training the generator network
The generator does not have a separate objective function. The gradient used to update it comes from the discriminator's gradient on the forged pictures. The forged pictures are given the label 1 and the discriminator is kept fixed, so the discriminator's gradient with respect to the forged pictures points in the direction that makes them look more like real pictures.
    self.g_loss = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(
        self.D_logits_, tf.ones_like(self.D_)))
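The following NumPy sketch illustrates why labelling forged pictures as 1 pushes them towards "real". With all-ones labels the loss is the mean of -log D(G(z)), and its derivative with respect to each discriminator logit is (sigmoid(logit) - 1) / N, which is always negative. The logit values and function names are assumptions for illustration.

```python
import numpy as np

def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))

def g_loss_from_logits(fake_logits):
    """Cross-entropy against all-ones labels: mean of -log(sigmoid(logit))."""
    return np.mean(np.logaddexp(0.0, -fake_logits))

def g_loss_grad(fake_logits):
    """Analytic derivative of g_loss with respect to each logit."""
    return (sigmoid(fake_logits) - 1.0) / fake_logits.size

# Made-up discriminator logits for three forged pictures.
fake_logits = np.array([-3.0, -1.0, 0.5])
g_loss = g_loss_from_logits(fake_logits)
g_grad = g_loss_grad(fake_logits)
print(g_loss, g_grad)
```

Every gradient entry is negative, so a gradient descent step raises each logit. In the real network that signal flows back through the fixed discriminator into the generator's weights.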
Then the same random vectors batch_z are used as input to update $G$.
    g_optim = tf.train.AdamOptimizer(config.learning_rate, beta1=config.beta1) \
        .minimize(self.g_loss, var_list=self.g_vars)
    ...
    out2 = sess.run([g_optim], feed_dict={noise_images: batch_z})
This step corresponds to state (c) in the figure: the $p_g$ curve gradually moves closer to the real distribution. After training is complete, the $p_g$ and $p_{data}$ curves overlap, and the discriminator can no longer distinguish real from forged, so its output is fixed at $\frac{1}{2}$.
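The value of 1/2 can be checked numerically. For a fixed generator, [5] shows that the best possible discriminator is $D^*(x) = p_{data}(x) / (p_{data}(x) + p_g(x))$. The sketch below evaluates it for one-dimensional Gaussians; the means and standard deviations are assumptions chosen only to mimic an early and a final training state.

```python
import numpy as np

def gaussian_pdf(x, mean, std):
    """Density of a normal distribution."""
    return np.exp(-0.5 * ((x - mean) / std) ** 2) / (std * np.sqrt(2.0 * np.pi))

def optimal_discriminator(p_data, p_g):
    """D*(x) = p_data(x) / (p_data(x) + p_g(x)) for a fixed generator, as in [5]."""
    return p_data / (p_data + p_g)

xs = np.linspace(-3.0, 3.0, 7)
p_data = gaussian_pdf(xs, mean=0.0, std=1.0)

# Early in training: the forged distribution is shifted away from the data.
d_early = optimal_discriminator(p_data, gaussian_pdf(xs, mean=1.0, std=1.0))

# After training: the forged distribution matches the data distribution.
d_final = optimal_discriminator(p_data, gaussian_pdf(xs, mean=0.0, std=1.0))

print(d_early)
print(d_final)
```

While the two distributions differ, the discriminator output varies with x; once they coincide, it is 0.5 everywhere.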
So, of my earlier questions, (2) is already answered. For (1), why not learn $G$ directly? Because the one-to-one correspondence between $z$ and $x$ cannot be determined. In the figure below there are two possible correspondences; to decide which one is wrong you would have to add prior information, or even estimate it directly from real samples, and then the method would be no different from other approaches. As for (3), are $z$ and $x$ in one-to-one correspondence during training? I started wondering about this because it was not clear to me whether a given 100-dimensional noise vector corresponds to one particular handwritten picture. My understanding now is that at the training level there is no one-to-one correspondence; the two are even fed separately when training $D$. A correspondence may exist at the level of distributions.
Fig. 6 Mapping between z and x (source: [8])

4. Run instance
I wanted to use a GAN to build a denoising network. Starting from the code in [7], I changed the input from a 100-dimensional noise vector to an input picture, and put a convolutional network at the front of the generator, followed by the original transposed convolutions, which turns it into a denoising network. I did not have much time to tune the layers and parameters carefully, so this was just a quick try, and the results are not particularly good. The code is in [9]. First, I used read_stl10.py to add Gaussian noise with mean 0 and variance 50 to the STL-10 dataset. The comparison is shown below.
Fig. 7 Pictures before and after adding Gaussian noise
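The noise step can be sketched as follows. This is not the code from read_stl10.py, which is not shown in this post. It is a minimal illustration that assumes 8-bit pixel values in [0, 255], rounds and clips the result back into that range, and uses a constant grey batch as a stand-in for STL-10 pictures (96*96 RGB). The function name and the fixed seed are also assumptions.

```python
import numpy as np

def add_gaussian_noise(images, variance=50.0, seed=0):
    """Add zero-mean Gaussian noise with the given variance to uint8 images."""
    rng = np.random.default_rng(seed)
    noise = rng.normal(loc=0.0, scale=np.sqrt(variance), size=images.shape)
    noisy = images.astype(np.float64) + noise
    # Assumption: keep pixels in the valid 8-bit range after adding noise.
    return np.clip(np.rint(noisy), 0, 255).astype(np.uint8)

# Stand-in batch: two constant grey pictures with the STL-10 shape.
clean = np.full((2, 96, 96, 3), 128, dtype=np.uint8)
noisy = add_gaussian_noise(clean)

residual = noisy.astype(np.float64) - clean.astype(np.float64)
print(noisy.shape, noisy.dtype, residual.mean(), residual.std())
```

The residual has a standard deviation close to the square root of 50, about 7.07, so on an 8-bit scale the noise is visible but does not destroy the contours.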
Running the network then gives the denoising results below. From left to right: the noisy input picture, the corresponding output of the generator, and the corresponding clean picture. The results are not particularly good: the network learns the contours to some extent, but it has not learned the colors.
Fig. 8 Denoising comparison

5. Summary
When I first started searching for material I came across adversarial examples, assumed they were related to adversarial networks, and read up on them. After reading Goodfellow's paper I found there is no direct relationship, but I still wrote about them because they are worth understanding. The idea behind adversarial networks is really impressive: it turns an unsupervised problem into a supervised one, and it feels more like learning how data should be generated than searching for it. Training is still a problem, though. In my experience it overfits very easily, and you can sense the adversarial tug-of-war inside, because the generator's results are sometimes good and sometimes bad. Overall it is a great algorithm, and I am very much looking forward to studying it further.

6. References
[1] Intriguing Properties of Neural Networks
[2] Explaining and Harnessing Adversarial Examples
[3] Adversarial Examples
[4] Breaking Linear Classifiers on ImageNet
[5] Generative Adversarial Nets
[6] Quick Introduction to GANs
[7] carpedm20/DCGAN-tensorflow
[8] Generative Adversarial Nets in TensorFlow (Part I)
[9] chenrudan/deep-learning/denoise_dcgan/