Experimental data: cat-vs-dog binary classification; training set: 19,871 images, validation set: 3,975 images.
Experimental model: ResNet-18
Batch size: 128 × 2 (each K80 GPU takes 128 images)
The problem: training-set accuracy reaches 0.99 with loss around 1e-2 to 1e-3, but validation-set accuracy stays at 0.5 and validation loss is very high. Trying a range of initial learning rates (0.1 down to 0.0001) did not help.
Solution: adopting the warm-up method helped somewhat with the problem above.
Training ResNet from scratch (no fine-tuning) overfits very easily. In the CIFAR-10 experiments of the paper "Deep Residual Learning for Image Recognition", one trick is warm-up: first train with a small learning rate (0.01), and after 400 iterations raise the learning rate to 0.1 to begin formal training.
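The warm-up rule above is simple enough to write down directly. The following is a minimal sketch (names and constants are illustrative, taken from the 0.01 → 0.1 / 400-iteration setup described above, not from any framework's API):

```python
WARMUP_ITERS = 400   # length of the warm-up phase (CIFAR-10 setup from the paper)
WARMUP_LR = 0.01     # small learning rate used during warm-up
BASE_LR = 0.1        # learning rate for formal training after warm-up

def learning_rate(iteration):
    """Return the learning rate to use at a given training iteration."""
    if iteration < WARMUP_ITERS:
        return WARMUP_LR
    return BASE_LR
```

In a training loop, `learning_rate(it)` would be queried (or set on the solver/optimizer) before each update step.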
At first I used four initial learning rates, lr = 0.1, 0.01, 0.001, 0.0001, decaying the learning rate every 1,000 iterations. After trying all four, the validation-set accuracy was still only 0.5-0.6 while the training set reached 0.99. ResNet also uses batch normalization, and in Caffe batch normalization has a "pit": the use_global_stats setting. It must be off (false) during training, and on (true) at test time and in the deploy prototxt.
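What the use_global_stats switch does can be illustrated with a simplified batch-normalization function (a sketch for intuition only; scale/shift parameters and the running-statistics update are omitted):

```python
import numpy as np

def batch_norm(x, running_mean, running_var, use_global_stats, eps=1e-5):
    """Simplified batch normalization over axis 0, illustrating Caffe's
    use_global_stats behavior.

    use_global_stats=False (training): normalize with the current
    mini-batch's own mean and variance.
    use_global_stats=True (test / deploy): normalize with the accumulated
    running statistics instead, so single samples are handled consistently.
    """
    if use_global_stats:
        mean, var = running_mean, running_var
    else:
        mean, var = x.mean(axis=0), x.var(axis=0)
    return (x - mean) / np.sqrt(var + eps)
```

If use_global_stats is left on during training, the layer normalizes with stale (initially zero/unit) statistics instead of the batch's own, which is one classic cause of a network that fits the training set yet fails on validation.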
Given a training-set accuracy of 0.99 but a validation-set accuracy of only 0.5+, I first suspected a batch normalization problem and tried all kinds of BN fixes; finally I tried warm-up, and the network's loss on the validation set did come down.
To compare the warm-up losses, I used initial learning rates of 0.01, 0.001, 0.001, and 0.0001 respectively, with gamma = 10.
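One way to read "gamma = 10" here (an assumption, since the exact solver configuration is not given) is as the factor by which the learning rate is multiplied when warm-up ends, which reproduces the 0.01 → 0.1 jump, combined with the every-1,000-iteration decay mentioned earlier. A hypothetical combined schedule:

```python
def warmup_then_decay(iteration, base_lr, gamma=10, warmup_iters=400,
                      decay_every=1000, decay_factor=0.1):
    """Hypothetical schedule: hold base_lr during warm-up, multiply by
    gamma once warm-up ends, then decay by decay_factor every
    decay_every iterations. All parameter names are illustrative."""
    if iteration < warmup_iters:
        return base_lr
    lr = base_lr * gamma
    steps = (iteration - warmup_iters) // decay_every
    return lr * (decay_factor ** steps)
```

With base_lr = 0.01 this gives 0.01 for the first 400 iterations, 0.1 for the next 1,000, then 0.01, and so on; the other initial rates scale the whole curve down accordingly.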