Transferred from: http://blog.csdn.net/u010402786/article/details/70141261 (original title: DL Open Source Framework Caffe | Model Fine-tuning (finetune): Scenarios, Issues, Tips, and Solutions)
Preface
What is model fine-tuning?
You train with a network model that someone else has already trained, and you have to use the same network architecture they used, because the parameters are tied to that architecture. The last layer can of course be modified, since our data may not have 1000 classes but only a few: change the number of output classes and the name of the last layer. Training with someone else's parameters, a modified network, and your own data, so that the parameters adapt to your own data, is the process commonly referred to as fine-tuning.
Are the network parameters updated during fine-tuning?
Yes. The finetune process is equivalent to continuing training; the only difference from training from scratch is the initialization:
A. Direct training is initialized in the manner specified by the network definition (e.g. Gaussian random initialization).
B. Finetuning is initialized from a parameter file you already have (that is, a previously trained caffemodel).
Part One: Caffe command-line parsing
I. Commands for training a model
Script:
./build/tools/caffe train -solver models/finetune/solver.prototxt -weights models/vgg_face_caffe/VGG_FACE.caffemodel -gpu 0
BAT Command:
..\..\bin\caffe.exe train --solver=.\solver.prototxt -weights .\test.caffemodel
pause
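For reference, the solver.prototxt passed to the commands above might look roughly like the following; the paths and all numeric values are illustrative assumptions, not settings from the original post.

net: "models/finetune/train_val.prototxt"   # hypothetical path to the finetune net definition
test_iter: 100                # how many test batches to run per test pass
test_interval: 500            # test every 500 training iterations
base_lr: 0.001                # small base learning rate, typical for finetuning
lr_policy: "step"             # drop the learning rate in steps
gamma: 0.1                    # multiply the learning rate by 0.1 at each step
stepsize: 20000               # iterations between learning-rate drops
momentum: 0.9
weight_decay: 0.0005
display: 20
max_iter: 50000
snapshot: 5000
snapshot_prefix: "models/finetune/finetune"
solver_mode: GPU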
II. Full analysis of the Caffe command line
http://www.cnblogs.com/denny402/p/5076285.html
Part Two: Finetune examples and parameter adjustment
I. Finetune model examples
Caffe Finetune Resnet-50
http://blog.csdn.net/tangwenbo124/article/details/56070322
Caffe Finetune googlenet
http://blog.csdn.net/sinat_30071459/article/details/51679995
Caffe Finetune FCN
http://blog.csdn.net/zy3381/article/details/50458331
Caffe Finetune Alexnet
II. Notes on parameter adjustment
- First change the name of the last layer, so that when the pre-trained model is loaded, the mismatched name makes that layer train from scratch, which achieves our goal of adapting to the new task (a prototxt sketch follows this list);
- Adjust the learning rate: because the last layer is learned from scratch, it needs a larger learning rate than the other layers, so we raise the weight and bias learning rates of that layer by a factor of 10 to make the newly initialized layer learn faster than the finetuned layers;
- Since the name of the last fully connected layer is changed for finetuning, you must also reset the num_output of that fc8 layer to the number of classes in your own dataset;
- The class labels of the dataset must start at 0 and be contiguous, otherwise unexpected errors can occur;
- Remember to shuffle the dataset, otherwise it is very likely not to converge;
- If training does not converge, lower the base_lr in the solver; generally start from 0.01 and keep reducing it if Loss=nan appears;
- Plot the accuracy and loss curves to make it easier to set the stepsize; generally the learning rate can be reduced when accuracy and loss level off;
- The mean file used for finetuning should be generated from your own dataset (is that correct?);
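A minimal sketch of the renaming and learning-rate points above; the layer name fc8_new, the 20-class num_output and the exact filler settings are illustrative assumptions:

layer {
  name: "fc8_new"                        # new name, so the weights are NOT copied from the pre-trained caffemodel
  type: "InnerProduct"
  bottom: "fc7"
  top: "fc8_new"
  param { lr_mult: 10 decay_mult: 1 }    # weights learn 10x faster than the finetuned layers
  param { lr_mult: 20 decay_mult: 0 }    # bias is conventionally given twice the weight learning rate
  inner_product_param {
    num_output: 20                       # set to the number of classes in your own dataset
    weight_filler { type: "gaussian" std: 0.01 }
    bias_filler { type: "constant" value: 0 }
  }
}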
Part Three: How to choose a fine-tuning strategy
When fine-tuning, which flavor of transfer learning should you choose? There are many factors to consider; the two most important are the size of the new dataset and how similar it is to the dataset the model was pre-trained on. Based on these two factors there are four scenarios:
1. The new dataset is small and similar to the pre-training dataset. Because the dataset is small, fine-tuning may overfit; it is better to use the pre-trained network as a feature extractor and train a linear classifier for the new task.
2. The new dataset is large and similar to the pre-training dataset. In this case you can safely fine-tune the whole network without worrying about overfitting.
3. The new dataset is small and not similar to the pre-training dataset. Here fine-tuning is not advisable, and using the pre-trained network with only its last layer removed as a feature extractor is also inappropriate; a workable scheme is to take the activations of the earlier layers of the pre-trained network as features and train a linear classifier on them.
4. The new dataset is large and not similar to the pre-training dataset. You can train from scratch, or you can still fine-tune starting from the pre-trained weights.
Summary: when doing freeze operations, people usually finetune selectively according to the dataset at hand. For example, with a small dataset you can freeze everything from the front conv layers up through the fc4096 layers, reuse the general multi-class features the CNN learned on ImageNet as the classification features, and only rewrite and train the final fc-20 -> Softmax part. Likewise, with a medium-sized dataset you might freeze roughly half of the conv layers. The main reason, as I understand it, is that the lower layers capture basic features that generalize better; but remember to consider your own data when choosing (a prototxt sketch of a frozen conv layer follows).
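A minimal sketch of what "freezing" one of the front conv layers looks like in the net prototxt; the layer name and shape parameters are illustrative (AlexNet-style), and the new classification layer itself would be rewritten as in the sketch in Part Two:

layer {
  name: "conv1"
  type: "Convolution"
  bottom: "data"
  top: "conv1"
  param { lr_mult: 0 decay_mult: 0 }   # weights frozen: no update during finetuning
  param { lr_mult: 0 decay_mult: 0 }   # bias frozen as well
  convolution_param {
    num_output: 96
    kernel_size: 11
    stride: 4
  }
}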
Part Four: How to fix network parameters for the cases above
Take four fully connected layers A->B->C->D as an example:
A. You want the parameters of layer C to stay fixed, and the parameters of layers A and B in front of C to stay fixed as well. In this case the gradient from layer D must not propagate back to D's input blob (that is, C's output blob gets no gradient); you can achieve this by setting propagate_down: false on layer D, so the gradient stops there and the parameters of all layers in front of it will not change.
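A minimal sketch of such a layer using the propagate_down field of LayerParameter; the layer names and num_output are illustrative:

layer {
  name: "D"
  type: "InnerProduct"
  bottom: "C"
  top: "D"
  propagate_down: false          # no gradient is sent to the bottom blob "C", so A, B and C all stay fixed
  inner_product_param { num_output: 1000 }
}

Note that layer D itself still updates its own parameters; only the layers before it are kept fixed.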
B. You want the parameters of layer C to stay fixed, but the parameters of layers A and B in front of C to keep changing. In this case only the parameters of layer C are frozen; the gradient arriving at C is still propagated back to the earlier layer B. You only need to set the learning rate of the corresponding parameter blobs to 0:
Add param { lr_mult: 0 } to the layer, for example for a fully connected layer:

layer {
  type: "InnerProduct"
  param {        # configuration of the first parameter blob, i.e. the weight matrix of the fully connected layer
    lr_mult: 0   # learning rate 0; the other available fields are in the ParamSpec message in caffe.proto
  }
  param {        # configuration of the second parameter blob, i.e. the bias of the fully connected layer
    lr_mult: 0   # learning rate 0
  }
}
Part Five: Caffe fine-tune FAQs
I. Following online tutorials to fine-tune AlexNet, why does the loss stay at 87.3365?
Workaround: check that the dataset labels start at 0, lower base_lr by an order of magnitude, and double the batch_size.
Cause: 87.3365 is a very special number; it is what SoftmaxWithLoss produces when its input is NaN (the clamped probability is FLT_MIN and -log(1.17549e-38) ≈ 87.3365), which means your fc8 outputs are all NaN.
Specific analysis:
http://blog.csdn.net/jkfdqjjy/article/details/52268565?locationNum=14
II. The loss decreases, but the accuracy does not change noticeably?
Solution: first, shuffle the data before training; second, check whether the learning rate is appropriate.
III. Summary of data augmentation tricks:
Adapted from a Zhihu answer: https://www.zhihu.com/question/35339639 (a Caffe data-layer sketch follows the list)
- Changes in image brightness, saturation, and contrast;
- PCA jittering;
- Random resize;
- Random crop;
- Horizontal/vertical flip;
- Rotation and affine transformations;
- Adding noise and blur;
- Label shuffle: data amplification for imbalanced classes, see Hikvision's ILSVRC2016 report.
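Of the augmentations listed above, only random cropping and horizontal flipping (plus mean subtraction and scaling) are built into Caffe's data layer via transform_param; the others are usually done offline or with custom layers. A minimal sketch, with hypothetical paths and an illustrative crop size:

layer {
  name: "data"
  type: "Data"
  top: "data"
  top: "label"
  include { phase: TRAIN }
  transform_param {
    mirror: true                          # random horizontal flip at training time
    crop_size: 227                        # random 227x227 crop taken from each input image
    mean_file: "data/mean.binaryproto"    # hypothetical mean file generated from your own dataset
  }
  data_param {
    source: "data/train_lmdb"             # hypothetical LMDB path
    batch_size: 64
    backend: LMDB
  }
}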
IV. How to judge the state of training from the loss curve:
The loss curve by itself provides very little information; it is usually combined with the accuracy curve on the test set to judge whether the model is overfitting;
The key is how the accuracy behaves on the test set;
If your lr_policy is step or another decaying schedule, the loss curve can also help you choose a more appropriate stepsize.
V. finetune_net.bin can no longer be used, and finetuning with the new method runs into problems. How to solve this?
Give the last InnerProduct layer a new name.
Part Six: References
1. http://caffe.berkeleyvision.org/gathered/examples/finetune_flickr_style.html
2. https://www.zhihu.com/question/54775243
3. http://blog.csdn.net/u012526120/article/details/49496617
4. https://zhidao.baidu.com/question/363059557656952932.html