Study Notes: GAN 002: DCGAN
Ian J. Goodfellow's GAN paper: https://arxiv.org/abs/1406.2661
Two networks. G (Generator): the generative network; it receives random noise z and generates a sample G(z) from that noise. D (Discriminator): the discriminative network; it judges whether a sample is real. Given an input sample x, its output D(x) is the probability that x is real: 1 means certainly real, 0 means it cannot be a real sample.
During training, the generative network G tries to produce samples as realistic as possible in order to fool the discriminative network D, while D tries to separate the samples generated by G from the real ones. Ideally, G produces samples G(z) that D cannot tell apart from real data, so that D(G(z)) = 0.5; at that point the generative model G can be used to generate samples.
Mathematical formulation: min_G max_D V(D, G) = E_{x ~ p_data(x)}[log D(x)] + E_{z ~ p_z(z)}[log(1 - D(G(z)))]
A two-player game. x is a real sample, z is the noise fed into the G network, and G(z) is the sample G generates from it. D(x) is D's estimate of the probability that a real sample is real; the closer to 1 the better. D(G(z)) is D's estimate of the probability that a sample generated by G is real. The G network wants D(G(z)) as large as possible, which makes V(D, G) smaller: min_G. The D network wants D(x) as large as possible and D(G(z)) as small as possible, which makes V(D, G) larger: max_D.
1. x sampled from data -> differentiable function D -> D(x) tries to be near 1
2. Input noise z -> differentiable function G -> x sampled from the model -> D tries to make D(G(z)) near 0, while G tries to make D(G(z)) near 1
Train D and G with stochastic gradient descent.
Algorithm 1: Minibatch stochastic gradient descent training of generative adversarial nets. The number of steps to apply to the discriminator, k, is a hyperparameter; we used k = 1, the least expensive option, in our experiments.
For number of training iterations do
For k steps do
Sample minibatch of m noise samples {z^(1), ..., z^(m)} from noise prior p_g(z)
Sample minibatch of m examples {x^(1), ..., x^(m)} from data generating distribution p_data(x)
Update the discriminator by ascending its stochastic gradient: ∇_{θ_d} (1/m) Σ_{i=1}^{m} [log D(x^(i)) + log(1 - D(G(z^(i))))]
End
Sample minibatch of m noise samples {z^(1), ..., z^(m)} from noise prior p_g(z)
Update the generator by descending its stochastic gradient: ∇_{θ_g} (1/m) Σ_{i=1}^{m} log(1 - D(G(z^(i))))
End
The gradient-based updates can use any standard gradient-based learning rule. We used momentum in our experiments.
Step 1: train D; the larger V(D, G) the better, so move up the gradient (ascending). Step 2: train G; the smaller V(D, G) the better, so move down the gradient (descending). Alternate between the two.
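Below is a minimal sketch of this alternating training loop, assuming PyTorch modules G and D (with D ending in a sigmoid and returning shape (m, 1)); all names and hyperparameters are illustrative and not taken from the paper or the repository referenced later.

import torch
import torch.nn.functional as F

def train_step(G, D, real_batch, opt_G, opt_D, z_dim=100, k=1):
    # One iteration of Algorithm 1: k discriminator steps, then one generator step.
    m = real_batch.size(0)
    for _ in range(k):
        z = torch.randn(m, z_dim)
        fake = G(z).detach()                       # do not backprop into G here
        loss_D = F.binary_cross_entropy(D(real_batch), torch.ones(m, 1)) + \
                 F.binary_cross_entropy(D(fake), torch.zeros(m, 1))
        opt_D.zero_grad(); loss_D.backward(); opt_D.step()
    z = torch.randn(m, z_dim)
    # Non-saturating generator loss (maximize log D(G(z))), the variant discussed later in these notes.
    loss_G = F.binary_cross_entropy(D(G(z)), torch.ones(m, 1))
    opt_G.zero_grad(); loss_G.backward(); opt_G.step()
    return loss_D.item(), loss_G.item()

The original paper uses momentum SGD (e.g. opt_D = torch.optim.SGD(D.parameters(), lr=0.01, momentum=0.9)); DCGAN implementations commonly use Adam instead.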
DCGAN principle. https://arxiv.org/abs/1511.06434. Alec Radford, Luke Metz, Soumith Chintala, Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks. Replace G and D with convolutional neural networks (CNNs). Remove all pooling layers: the G network upsamples with transposed convolutions, and the D network replaces pooling with strided convolutions. Use batch normalization in both D and G. Remove the fully connected layers to make the networks fully convolutional. The G network uses ReLU activations, with tanh in the last layer; the D network uses LeakyReLU activations.
G network: project and reshape a 100-dimensional z -> 4x4x1024 -> 8x8x512 -> 16x16x256 -> 32x32x128 -> 64x64x3
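A sketch of that generator stack, assuming PyTorch; it follows the DCGAN guidelines above (transposed convolutions for upsampling, batch norm, ReLU, tanh on the output, no pooling or fully connected layers). The layer sizes mirror the 100 -> 4x4x1024 -> ... -> 64x64x3 progression, but this is illustrative, not the code of the TensorFlow repository used later.

import torch.nn as nn

def deconv_block(c_in, c_out):
    # Transposed convolution doubles the spatial size; batch norm + ReLU per the DCGAN guidelines.
    return nn.Sequential(
        nn.ConvTranspose2d(c_in, c_out, kernel_size=4, stride=2, padding=1),
        nn.BatchNorm2d(c_out),
        nn.ReLU(inplace=True))

generator = nn.Sequential(
    # "project and reshape": the 100-d z, treated as a 100x1x1 tensor, becomes 4x4x1024
    nn.ConvTranspose2d(100, 1024, kernel_size=4, stride=1, padding=0),
    nn.BatchNorm2d(1024),
    nn.ReLU(inplace=True),
    deconv_block(1024, 512),   # 4x4   -> 8x8
    deconv_block(512, 256),    # 8x8   -> 16x16
    deconv_block(256, 128),    # 16x16 -> 32x32
    nn.ConvTranspose2d(128, 3, kernel_size=4, stride=2, padding=1),  # 32x32 -> 64x64
    nn.Tanh())                 # last layer uses tanh, no batch norm

A batch of noise shaped (N, 100, 1, 1), e.g. torch.randn(16, 100, 1, 1), comes out as (N, 3, 64, 64) images with values in [-1, 1].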
Using DCGAN to generate anime avatars:
http://qiita.com/mattya/items/e5bfe5e04b9d2f0bbd47
Raw data collection: http://safebooru.donmai.us and http://konachan.net
Crawler code:
import requests
from bs4 import BeautifulSoup
import os
import traceback

def download(url, filename):
    # Download one image to filename; remove partial files on failure.
    if os.path.exists(filename):
        print('file exists!')
        return
    try:
        r = requests.get(url, stream=True, timeout=60)
        r.raise_for_status()
        with open(filename, 'wb') as f:
            for chunk in r.iter_content(chunk_size=1024):
                if chunk:  # filter out keep-alive new chunks
                    f.write(chunk)
                    f.flush()
        return filename
    except KeyboardInterrupt:
        if os.path.exists(filename):
            os.remove(filename)
        raise KeyboardInterrupt
    except Exception:
        traceback.print_exc()
        if os.path.exists(filename):
            os.remove(filename)

if os.path.exists('imgs') is False:
    os.makedirs('imgs')

start = 1
end = 8000
for i in range(start, end + 1):
    url = 'http://konachan.net/post?page=%d&tags=' % i
    html = requests.get(url).text
    soup = BeautifulSoup(html, 'html.parser')
    for img in soup.find_all('img', class_="preview"):
        target_url = 'http:' + img['src']
        filename = os.path.join('imgs', target_url.split('/')[-1])
        download(target_url, filename)
    print('%d / %d' % (i, end))
Face (avatar) cropping: https://github.com/nagadomi/lbpcascade_animeface
Wrapper script:
import cv2
import sys
import os.path
from glob import glob

def detect(filename, cascade_file="lbpcascade_animeface.xml"):
    # Detect anime faces in one image and save 96x96 crops into faces/.
    if not os.path.isfile(cascade_file):
        raise RuntimeError("%s: not found" % cascade_file)
    cascade = cv2.CascadeClassifier(cascade_file)
    image = cv2.imread(filename)
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    gray = cv2.equalizeHist(gray)
    faces = cascade.detectMultiScale(gray,
                                     # detector options
                                     scaleFactor=1.1,
                                     minNeighbors=5,
                                     minSize=(48, 48))
    for i, (x, y, w, h) in enumerate(faces):
        face = image[y: y + h, x: x + w, :]
        face = cv2.resize(face, (96, 96))
        save_filename = '%s-%d.jpg' % (os.path.basename(filename).split('.')[0], i)
        cv2.imwrite("faces/" + save_filename, face)

if __name__ == '__main__':
    if os.path.exists('faces') is False:
        os.makedirs('faces')
    file_list = glob('imgs/*.jpg')
    for filename in file_list:
        detect(filename)
Training:
https://github.com/carpedm20/DCGAN-tensorflow
model.py:
if config.dataset == 'mnist':
    data_X, data_y = self.load_mnist()
else:
    data = glob(os.path.join("./data", config.dataset, "*.jpg"))
Create an anime folder inside the data folder and put the images in it; then specify --dataset anime when running.
python main.py --image_size 96 --output_size 48 --dataset anime --is_crop True --is_train True --epoch 300 --input_fname_pattern "*.jpg"
Collection of GAN papers: https://github.com/zhangqianhui/AdversarialNetsPapers
SGD optimization. The objective functions define and monitor what is learned. J(D) is the discriminator's objective, a cross-entropy function: its left term scores D on real data, and its right term scores D on generated (noise) data. J(G) is the generator's objective.
A minimax game. Its equilibrium is a Nash equilibrium, a saddle point of J(D).
Real data versus the fake data produced by the model (the model distribution mapped from z). D learns to tell the data distribution from the model distribution. The optimal discriminator has the real data distribution in the numerator and the sum of the data and model distributions in the denominator: D*(x) = p_data(x) / (p_data(x) + p_model(x)). The goal is for D to approach the constant 1/2 everywhere, i.e. for p_model and p_data to become indistinguishable. Once the generative model fits the source data, nothing further can be learned: the derivative of the constant 1/2 is always 0.
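A tiny numerical illustration of that optimal discriminator (assuming NumPy; the density values are made-up numbers):

import numpy as np

def d_star(p_data, p_model):
    # Optimal discriminator value at a point, given the two densities there.
    return p_data / (p_data + p_model)

print(d_star(0.8, 0.2))   # model far from data: D* = 0.8
print(d_star(0.3, 0.3))   # p_model == p_data: D* = 0.5, so the gradient signal vanishes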
Non-saturating game. The generator's objective is rewritten in terms of how often G successfully fools D, so the balance is no longer tied to a single shared loss; even after D becomes near-perfect, G can continue to be optimized.
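A small sketch of the difference between the two generator losses (assuming PyTorch; d_fake is a stand-in for D(G(z)) when D is confident the sample is fake):

import torch

d_fake = torch.tensor(0.01, requires_grad=True)   # D(G(z)): D is almost sure the sample is fake

saturating = torch.log(1 - d_fake)       # minimax game: G descends log(1 - D(G(z)))
non_saturating = -torch.log(d_fake)      # non-saturating game: G descends -log D(G(z))

print(torch.autograd.grad(saturating, d_fake))      # about -1.01: weak signal when D wins
print(torch.autograd.grad(non_saturating, d_fake))  # about -100: G still gets a strong gradient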
DCGAN (Deep Convolutional Generative Adversarial Network): the generator is essentially a CNN run in reverse.
A convolutional filter is used to filter an image and transform it into a particular style; different filters turn the image into different styles, and each style expresses different features of the original image. This is feature learning.
DCGAN creates images by going the other way: it maps a set of feature values (a latent vector) back to an image.
At each filter layer, a CNN extracts the important features of a large image, shrinking the image step by step. DCGAN instead scales the features of a small image (a small array) up step by step to form a new image; its initial small input is noise. Image RGB matrices can be added and subtracted like vectors: man with sunglasses - man without sunglasses + woman without sunglasses = woman with sunglasses. In NLP, word2vec gives king - man + woman = queen. After the vector/matrix arithmetic, the result decodes back to an image carrying the combined "meaning". In NLP/word2vec, vectors correspond to meaningful words; in DCGAN, latent vectors correspond to meaningful pictures.
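A toy sketch of that latent-space arithmetic, assuming NumPy and some trained generator G; the z vectors here are random stand-ins for the averaged latent codes of each group.

import numpy as np

# Hypothetical averaged latent vectors for three groups of generated samples.
z_man_glasses = np.random.randn(100)   # stand-in: mean z of "man with sunglasses" samples
z_man_plain   = np.random.randn(100)   # stand-in: mean z of "man without sunglasses"
z_woman_plain = np.random.randn(100)   # stand-in: mean z of "woman without sunglasses"

z_new = z_man_glasses - z_man_plain + z_woman_plain
# image = G(z_new)  # with a trained generator, this tends to decode as "woman with sunglasses"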
Statistics: the JS distance (the minimax game) and the KL distance, i.e. divergence formulas, are used to build the objective function. D_KL(P || Q) = ∫_{-∞}^{∞} p(x) log(p(x)/q(x)) dx.
The GAN network is built so the model can be optimized with SGD-like methods, which makes the objective function important. Q is the (noise-driven) model distribution, P the target distribution. Maximizing the likelihood is equivalent to minimizing the KL distance.
KL(P || Q) = ∫ P log(P/Q) dx = ∫ P log P dx - ∫ P log Q dx
Both P and Q are functions of x, and P is the real data distribution, so ∫ P log P dx is a constant; in ∫ P log Q dx only log Q depends on the model, so the KL distance is proportional to -∫ P log Q dx up to that constant:
KL(P || Q) = constant - ∫ P log Q dx
Writing Q as the model P(x | θ) with parameters θ, -∫ P log Q dx is the negative expected log-likelihood, so minimizing KL(P || Q) is the same as maximizing the likelihood.
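A small numerical check of that equivalence (assuming NumPy and discrete distributions): KL(P || Q) equals the negative expected log-likelihood plus a constant that does not depend on Q.

import numpy as np

P = np.array([0.5, 0.3, 0.2])          # "real data" distribution
Q = np.array([0.4, 0.4, 0.2])          # model distribution

kl = np.sum(P * np.log(P / Q))          # KL(P || Q)
neg_loglik = -np.sum(P * np.log(Q))     # -E_P[log Q], the negative expected log-likelihood
entropy_term = np.sum(P * np.log(P))    # constant in Q

print(kl, neg_loglik + entropy_term)    # identical: KL = -E_P[log Q] + E_P[log P]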
GAN-like algorithms can be set up to minimize any f-divergence.
With infinite data one could learn the true data distribution P; in reality, data is limited. In the KL picture, minimizing KL(P || Q), i.e. fitting Q to the real data P, tries to cover everything P contains, which over-generalizes (overgeneralization); with multimodal data and insufficient samples, KL(P || Q) coverage stays incomplete. Minimizing KL(Q || P) instead under-generalizes (undergeneralization): the first covers too broadly, the second too narrowly.
The G objective can also be transformed into a maximum-likelihood form: taking the derivative of J(G) yields the maximum-likelihood expression. The maximum-likelihood variant runs the fastest.
GAN can generate (replay) samples and be turned into a reinforcement learning model; see SeqGAN from Shanghai Jiao Tong University [Yu et al., 2016].
Giving labeled data to GAN: learning the conditional probability p(y | x) is much easier than learning p(x) alone, and even a few labels can greatly improve GAN training, enabling semi-supervised learning. Semi-supervised learning uses three kinds of data: real unlabeled data, labeled data, and generated (noise) data, with objective functions combining supervised and unsupervised terms. Label smoothing turns the discrete 0 and 1 labels into softer values such as 0.1 (beta) and 0.9 (alpha); the beta coefficient, however, mixes the fake data distribution into the numerator, so it is recommended to keep the label for fake data at exactly 0 and smooth only the real-data label: one-sided label smoothing. With smoothing, the GAN discriminator does not hand out overly large gradient signals, which keeps the algorithm from being pulled toward extreme-sample traps.
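A sketch of one-sided label smoothing in the discriminator loss, assuming PyTorch and a sigmoid discriminator; alpha = 0.9 is just the example value from above.

import torch
import torch.nn.functional as F

def d_loss_one_sided(d_real, d_fake, alpha=0.9):
    # Real targets smoothed to alpha; fake targets stay exactly 0 (one-sided smoothing).
    real_targets = torch.full_like(d_real, alpha)
    fake_targets = torch.zeros_like(d_fake)
    return F.binary_cross_entropy(d_real, real_targets) + \
           F.binary_cross_entropy(d_fake, fake_targets)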
Batch norm: take a batch of data and normalize it (subtract the mean, divide by the standard deviation). The data become more concentrated, neither too large nor too small, and learning is more efficient. But if the samples within one batch are too similar, an unsupervised GAN is easily biased by it, treating them as the same thing, and the final generative model ends up blending in many unrelated features.
Reference batch norm: take one fixed batch of data as a reference set R and normalize each new batch with R's mean and standard deviation. If R is unrepresentative the result is poor, and the model can also overfit to R.
Virtual batch norm: keep R, and for new data x, add x to R to form a virtual batch V; standardize x with V's mean and standard deviation. This greatly reduces the risk of overfitting to R.
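A NumPy sketch of reference batch norm and virtual batch norm as described above (per-feature statistics only; the learnable scale and shift of a full batch-norm layer are omitted):

import numpy as np

def reference_batch_norm(x, R, eps=1e-5):
    # Normalize new batch x with the mean/std of a fixed reference batch R.
    mu, sigma = R.mean(axis=0), R.std(axis=0)
    return (x - mu) / (sigma + eps)

def virtual_batch_norm(x, R, eps=1e-5):
    # Form the virtual batch V = R plus x and normalize x with V's statistics,
    # reducing the risk of overfitting to the reference batch alone.
    V = np.concatenate([R, x], axis=0)
    mu, sigma = V.mean(axis=0), V.std(axis=0)
    return (x - mu) / (sigma + eps)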
Balancing G and D. In adversarial training the discriminator D usually wins, and D is typically deeper than G. Writing the objective as a non-saturating game ensures that G can keep learning even after D is essentially done.
GAN problems. Non-convergence: it is easy to find only a local optimum rather than the global one, or to fail to converge at all. Mode collapse: min_G max_D V(G, D) is not equal to max_D min_G V(G, D). If the maximization over D sits in the inner loop, the algorithm can converge to where it should; if the minimization over G sits in the inner loop, the generator collapses onto a few clusters and never covers the global distribution. Reverse KL gives a conservative loss.
Minibatch GAN: split the original data into small batches while ensuring that overly similar samples are not put into the same minibatch, so each batch stays diverse enough to avoid mode collapse.
Errors in spatial understanding: an image is a 2D projection of a 3D scene, so generated samples can show poor spatial structure.
Unrolled GAN: rather than taking the discriminator D from only the current step, keep K steps of D and choose among them based on the loss.
There is still no scientific way to evaluate or quantify GAN results.
Discrete outputs are not differentiable. Workarounds: REINFORCE (Williams, 1992); Gumbel-softmax (Jang et al., 2016); or training with continuous values in a set range and emitting discrete values at the output.
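A sketch of the Gumbel-softmax idea, assuming NumPy: add Gumbel noise to the logits and take a temperature-scaled softmax, so the output is continuous and differentiable but approaches a one-hot (discrete) sample as the temperature drops.

import numpy as np

def gumbel_softmax_sample(logits, tau):
    # Add Gumbel(0, 1) noise, then take a temperature-scaled softmax.
    g = -np.log(-np.log(np.random.uniform(size=logits.shape)))
    y = (logits + g) / tau
    y = np.exp(y - y.max())
    return y / y.sum()

logits = np.array([1.0, 2.0, 0.5])
print(gumbel_softmax_sample(logits, tau=5.0))   # soft, far from one-hot
print(gumbel_softmax_sample(logits, tau=0.1))   # nearly one-hot, close to a discrete sample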
Connection to reinforcement learning: training may fail to converge, and within a limited number of steps, simpler behavior can be more effective.
PPGN (Plug and Play Generative Networks), Nguyen et al., 2016: the new state of the art in generative models (the best at the time).
GAN uses supervised learning to estimate a complicated objective function and thereby trains a generative model, comparing its own real and fake samples. Finding the Nash equilibrium of a high-dimensional, continuous, non-convex game remains an open research problem.
References:
https://zhuanlan.zhihu.com/p/24767059
http://www.sohu.com/a/121189842_465975
Paid consultation is welcome (150 RMB per hour); my contact: qingxingfengzi.
Study groups help us push each other forward. I have always believed that good interaction helps everyone grow faster and helps more people enter the fields they are interested in. I am setting up a group to learn GAN together, where we share our learning progress every day. Add me and I will pull you into the group; please note when adding: join the GAN daily report group.