How to train a well-performing deep neural network
Deep learning is hot: the state of the art on every dataset is constantly being refreshed, and with open-source code being released, it feels like anyone can climb the leaderboards.
But don't assume beating the benchmarks is that easy — otherwise why would anyone bother publishing papers, and how would we make a living? Suppose you don't want to publish, you just want to stake out a result. You see CIFAR-10 accuracies at 95%, yet the small demo bundled with Caffe gives you 78%. Caffe, are you kidding me?
Caffe did not lie to you. Today I'd like to show you how to train a neural network with performance close to what papers report.
Taking CNNs as an example, there are basically three steps:
Step one: use leaky ReLU and dropout (see blog.kaggle.com/2015/01/02/cifar-10-competition-winners-interviews-with-dr-ben-graham-phil-culliton-zygmunt-zajac/).
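As a minimal NumPy sketch of these two operations (the slope `alpha` and drop `rate` values here are illustrative defaults, not prescribed by the text):

```python
import numpy as np

def leaky_relu(x, alpha=0.01):
    """Leaky ReLU: pass positive values through, scale negatives by a small slope."""
    return np.where(x > 0, x, alpha * x)

def dropout(x, rate=0.5, training=True, rng=None):
    """Inverted dropout: randomly zero activations during training, scaling
    the survivors by 1/(1-rate) so the expected activation is unchanged.
    At test time the input passes through untouched."""
    if not training:
        return x
    rng = rng or np.random.default_rng()
    mask = rng.random(x.shape) >= rate
    return x * mask / (1.0 - rate)
```

Unlike a plain ReLU, the leaky variant keeps a small gradient for negative inputs, which is part of what the Kaggle interview linked above credits for the CIFAR-10 result.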
Step two: data augmentation — shift the images up, down, left, and right, zoom in and out, shift the colors toward green or red, invert the colors, and so on. Apply plenty of reasonable perturbations.
Step three: train with a fixed learning rate until training stops improving, take a high-accuracy solverstate as the starting point, then reduce the learning rate and keep training; supposedly going down to 1e-4 is about enough.
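The schedule just described can be sketched as a plain-Python loop. The callbacks `train_one_epoch` and `restore_best` are hypothetical stand-ins for your framework's training step and snapshot restore (in Caffe, reloading a saved solverstate); the `patience` value is an assumption:

```python
def train_with_lr_drops(train_one_epoch, restore_best, base_lr=0.01,
                        factor=0.1, min_lr=1e-4, patience=5):
    """Train at a fixed learning rate until validation accuracy stops
    improving, restore the best snapshot, cut the rate by `factor`,
    and repeat until the rate falls below `min_lr` (1e-4 in the text)."""
    lr, best_acc, stale = base_lr, 0.0, 0
    while lr >= min_lr:
        acc = train_one_epoch(lr)
        if acc > best_acc:
            best_acc, stale = acc, 0   # a new best snapshot would be saved here
        else:
            stale += 1
        if stale >= patience:          # training has "stopped moving"
            restore_best()             # restart from the best solverstate
            lr *= factor               # reduce the learning rate
            stale = 0
    return best_acc
```

Restarting from the best snapshot rather than the latest one matters: by the time the plateau is detected, the most recent weights may already be slightly worse than the peak.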
In fact, once you dig deeper you'll find that the real performance gains come from step two; the others are just icing on the cake. Data augmentation is fundamental — which, of course, also exposes a defect in the classifier itself.
Of course, someone will ask: what about the network architecture? Well, read more papers and run more experiments and you'll naturally learn to design one yourself. I don't think the architecture is the main thing, because CNNs' fatal flaw is shared by other classifiers too, and solving it will have to mean solving it for all of them together.
With data augmentation I pushed MNIST to 99.58% using a very simple, crude, no-brainer architecture. On CIFAR-10 I augmented too little and only got 88%, but reaching 90% should be easy. As for ImageNet — heh, go look at Prof. Jin Lianwen's Weibo comments on Baidu's ImageNet result and you'll know what I want to say.
(Baidu pushed the error down to 4.58%; the main ingredients were (1) more money (a 144-GPU cluster), (2) bigger networks (an ensemble of six 16-layer, 212M-parameter models), and (3) more data (tens of thousands of variations generated from each image). — Jin Lianwen)