How to Train a GAN? Tips and tricks to make GANs work
Transferred from: https://github.com/soumith/ganhacks
While research into generative adversarial networks (GANs) continues to improve the fundamental stability of these models, we use a bunch of tricks to train them and make them stable day to day.
Here is a summary of some of the tricks.
The authors of this document are listed at the end.
If you find a trick that's particularly useful in practice, please open a pull request to add it to the document. If we find it to be reasonable and verified, we'll merge it in.
1. Normalize the Inputs
- Normalize the images between -1 and 1
- Tanh as the last layer of the generator output
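A minimal sketch of both points in PyTorch, assuming single-channel images loaded with torchvision (use three values per Normalize tuple for RGB):

```python
import torch.nn as nn
from torchvision import transforms

# Scale images from [0, 1] to [-1, 1] so they match the range of tanh
transform = transforms.Compose([
    transforms.ToTensor(),                 # PIL image -> float tensor in [0, 1]
    transforms.Normalize((0.5,), (0.5,)),  # [0, 1] -> [-1, 1]
])

# Last layer of the generator squashes output into the same [-1, 1] range
generator_head = nn.Sequential(
    nn.ConvTranspose2d(64, 1, kernel_size=4, stride=2, padding=1),
    nn.Tanh(),
)
```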
2. A modified loss function
In GAN papers, the loss function to optimize G is min (log 1-D), but in practice folks practically use max log D
- because the first formulation has vanishing gradients early on
- Goodfellow et al. (2014)
In practice, this works well:
- Flip labels when training the generator: real = fake, fake = real (see the sketch below)
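A sketch of the practical formulation in PyTorch: maximizing log D(G(z)) is the same as training G with the "real" label on fake samples (the name d_logits_on_fake is a placeholder for D's raw outputs on generated images):

```python
import torch
import torch.nn.functional as F

def generator_loss(d_logits_on_fake):
    # Non-saturating loss: maximize log D(G(z)) instead of minimizing log(1 - D(G(z))).
    # Equivalently, flip the labels: treat fake samples as "real" (label = 1) when training G.
    real_labels = torch.ones_like(d_logits_on_fake)
    return F.binary_cross_entropy_with_logits(d_logits_on_fake, real_labels)
```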
3. Use a spherical Z
- Don't sample from a uniform distribution
- Sample from a Gaussian distribution
- When doing interpolations, do the interpolation via a great circle, rather than a straight line from point A to point B
- Tom White's Sampling Generative Networks has more details
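A sketch of great-circle (slerp) interpolation between two Gaussian latent vectors; the latent dimensionality and number of steps are arbitrary choices:

```python
import numpy as np

def slerp(val, low, high):
    """Spherical interpolation between latent vectors low and high, val in [0, 1]."""
    low_n = low / np.linalg.norm(low)
    high_n = high / np.linalg.norm(high)
    omega = np.arccos(np.clip(np.dot(low_n, high_n), -1.0, 1.0))
    if np.isclose(np.sin(omega), 0.0):
        return (1.0 - val) * low + val * high  # fall back to lerp for (anti)parallel vectors
    return (np.sin((1.0 - val) * omega) * low + np.sin(val * omega) * high) / np.sin(omega)

# Sample the endpoints from a Gaussian, not a uniform distribution
z0, z1 = np.random.randn(100), np.random.randn(100)
path = [slerp(t, z0, z1) for t in np.linspace(0.0, 1.0, 10)]
```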
4. BatchNorm
- Construct different mini-batches for real and fake, i.e. each mini-batch needs to contain only all real images or all generated images.
- When batchnorm is not an option, use instance normalization (for each sample, subtract the mean and divide by the standard deviation) (sketch below).
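A sketch of both points, assuming a hypothetical discriminator D that outputs one logit per image: real and fake samples go through D in separate forward passes, so batchnorm statistics are never computed over a mixed batch.

```python
import torch
import torch.nn as nn

criterion = nn.BCEWithLogitsLoss()

def discriminator_loss(D, real_images, fake_images):
    # Separate mini-batches: batchnorm inside D only ever sees all-real or all-fake batches
    real_labels = torch.ones(real_images.size(0), 1)
    fake_labels = torch.zeros(fake_images.size(0), 1)
    loss_real = criterion(D(real_images), real_labels)
    loss_fake = criterion(D(fake_images.detach()), fake_labels)
    return loss_real + loss_fake

# When batchnorm is not an option: instance normalization normalizes each sample
# on its own (subtract its mean, divide by its standard deviation, per channel)
instance_norm = nn.InstanceNorm2d(num_features=64)
```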
5. Avoid sparse gradients: ReLU, MaxPool
- The stability of the GAN game suffers if you have sparse gradients
- LeakyReLU = good (in both G and D)
- For downsampling, use: average pooling, Conv2d + stride
- For upsampling, use: PixelShuffle, ConvTranspose2d + stride
- PixelShuffle: https://arxiv.org/abs/1609.05158
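A sketch of downsampling and upsampling blocks that follow these choices; the channel counts and the LeakyReLU slope are placeholders:

```python
import torch.nn as nn

# Downsampling: strided conv (or nn.AvgPool2d) + LeakyReLU instead of MaxPool + ReLU
down = nn.Sequential(
    nn.Conv2d(64, 128, kernel_size=4, stride=2, padding=1),
    nn.LeakyReLU(0.2, inplace=True),
)

# Upsampling: PixelShuffle (sub-pixel convolution) or a strided ConvTranspose2d
up = nn.Sequential(
    nn.Conv2d(128, 64 * 4, kernel_size=3, padding=1),
    nn.PixelShuffle(upscale_factor=2),  # (B, 64*4, H, W) -> (B, 64, 2H, 2W)
    nn.LeakyReLU(0.2, inplace=True),
)
```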
6. Use soft and noisy labels
- Label smoothing, i.e. if you have two target labels, real=1 and fake=0, then for each incoming sample: if it is real, replace the label with a random number between 0.7 and 1.2, and if it's a fake sample, replace it with a random number between 0.0 and 0.3 (for example).
- Make the labels noisy for the discriminator: occasionally flip the labels when training the discriminator (sketch below)
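A sketch of soft, occasionally-flipped labels for the discriminator; the 5% flip probability is an arbitrary choice, not from the text:

```python
import torch

def soft_noisy_labels(batch_size, is_real, flip_prob=0.05):
    # Soft labels: real targets in [0.7, 1.2], fake targets in [0.0, 0.3]
    real = 0.7 + 0.5 * torch.rand(batch_size)
    fake = 0.3 * torch.rand(batch_size)
    labels = real if is_real else fake
    # Noisy labels: occasionally hand the discriminator the flipped target
    flip = torch.rand(batch_size) < flip_prob
    labels[flip] = (fake if is_real else real)[flip]
    return labels
```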
7. DCGAN / Hybrid Models
- Use DCGAN when you can. It works!
- If you can't use DCGANs and no model is stable, use a hybrid model: KL + GAN or VAE + GAN
8. Use stability tricks from RL
- Experience Replay
- Keep a replay buffer of past generations and occasionally show them (see the sketch below)
- Keep checkpoints from the past of G and D and occasionally swap them out for a few iterations
- All stability tricks that work for deep deterministic policy gradients
- See Pfau & Vinyals (2016)
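A sketch of experience replay for GANs: keep a pool of past generator outputs and occasionally show an old sample to D instead of a fresh one. The pool size and replay probability are arbitrary choices.

```python
import random
import torch

class FakeImagePool:
    """Replay buffer of past generations, occasionally replayed to the discriminator."""

    def __init__(self, max_size=1000):
        self.max_size = max_size
        self.pool = []

    def query(self, fake_images, replay_prob=0.3):
        out = []
        for img in fake_images.detach():
            if len(self.pool) < self.max_size:
                self.pool.append(img)
                out.append(img)
            elif random.random() < replay_prob:
                idx = random.randrange(self.max_size)
                out.append(self.pool[idx])  # show an old generation to D
                self.pool[idx] = img        # and keep the new one for later
            else:
                out.append(img)
        return torch.stack(out)
```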
9. Use the ADAM Optimizer
- optim.Adam rules!
- Use SGD for the discriminator and ADAM for the generator
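A sketch of that split with tiny placeholder networks; the learning rates and betas are illustrative, not prescriptions:

```python
import torch.nn as nn
import torch.optim as optim

# Placeholder networks just to show the optimizer split
D = nn.Sequential(nn.Linear(784, 1))
G = nn.Sequential(nn.Linear(100, 784), nn.Tanh())

d_optimizer = optim.SGD(D.parameters(), lr=0.0002)                       # SGD for the discriminator
g_optimizer = optim.Adam(G.parameters(), lr=0.0002, betas=(0.5, 0.999))  # ADAM for the generator
```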
10. Track failures early
- D loss goes to 0: failure mode
- Check norms of gradients: if they are over 100, things are screwing up (see the sketch after this list)
- When things are working, D loss has low variance and goes down over time, vs. having huge variance and spiking
- If the loss of the generator steadily decreases, then it's fooling D with garbage (says Martin)
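A sketch of a gradient-norm health check to run right after loss.backward(); the total_grad_norm helper here is a hypothetical name that just aggregates the L2 norm over all parameter gradients:

```python
import torch

def total_grad_norm(model):
    # L2 norm over all parameter gradients; call after loss.backward()
    grads = [p.grad.detach().flatten() for p in model.parameters() if p.grad is not None]
    if not grads:
        return 0.0
    return torch.cat(grads).norm().item()

# e.g. if total_grad_norm(D) > 100: things are probably screwing up
```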
11. Don't balance loss via statistics (unless you have a good reason to)
- Don't try to find a (number of G / number of D) schedule to uncollapse training
- It's hard and we've all tried it.
- If you do try it, have a principled approach to it, rather than intuition
For example:
```
while lossD > A:
  train D
while lossG > B:
  train G
```
12. If you have labels, use them
- If you have labels available, train the discriminator to also classify the samples: auxiliary GANs (sketch below)
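A sketch of an auxiliary-classifier discriminator head (AC-GAN style): shared features feed both a real/fake output and a class-label output. The feature size and class count are placeholders:

```python
import torch.nn as nn

class AuxDiscriminatorHead(nn.Module):
    """Discriminator head that both discriminates and classifies."""

    def __init__(self, feature_dim=256, num_classes=10):
        super().__init__()
        self.adv_head = nn.Linear(feature_dim, 1)            # real vs. fake logit
        self.cls_head = nn.Linear(feature_dim, num_classes)  # auxiliary class logits

    def forward(self, features):
        return self.adv_head(features), self.cls_head(features)
```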
13. Add noise to inputs, decay over time
- Add some artificial noise to inputs to D (Arjovsky et al., Huszar, 2016)
  - http://www.inference.vc/instance-noise-a-trick-for-stabilising-gan-training/
  - https://openreview.net/forum?id=Hk4_qw5xe
- Adding Gaussian noise to every layer of the generator (Zhao et al., EBGAN)
  - Improved GANs: OpenAI code also has it (commented out)
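A sketch of instance noise on D's inputs with a linear decay to zero over training; the starting sigma and the schedule are arbitrary choices:

```python
import torch

def add_instance_noise(images, step, total_steps, start_sigma=0.1):
    # Gaussian noise added to D's inputs, annealed to zero as training progresses
    sigma = start_sigma * max(0.0, 1.0 - step / total_steps)
    return images + sigma * torch.randn_like(images)
```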
14. [notsure] Train discriminator more (sometimes)
- Especially when you have noise
- Hard to find a schedule of number of D iterations vs G iterations
15. [notsure] Batch discrimination
16. Discrete variables in Conditional GANs
- Use an embedding layer
- Add as additional channels to images
- Keep embedding dimensionality low and upsample to match image channel size (see the sketch below)
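A sketch of conditioning on a discrete variable: embed the label into a low-dimensional vector, tile it to the image's spatial size, and concatenate it as extra channels. The class count and embedding size are placeholders:

```python
import torch
import torch.nn as nn

class LabelToChannels(nn.Module):
    """Embed a discrete label and append it to the image as extra channels."""

    def __init__(self, num_classes=10, embed_dim=8):
        super().__init__()
        self.embed = nn.Embedding(num_classes, embed_dim)  # keep dimensionality low

    def forward(self, images, labels):
        b, _, h, w = images.shape
        e = self.embed(labels).view(b, -1, 1, 1)  # (B, embed_dim, 1, 1)
        e = e.expand(-1, -1, h, w)                # upsample/tile to the image size
        return torch.cat([images, e], dim=1)      # concatenate as additional channels
```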
Authors
- Soumith Chintala
- Emily Denton
- Martin Arjovsky
- Michael Mathieu