DL Open Source Framework Caffe | Model Fine-tuning (finetune) scenarios, issues, tips, and solutions


Transferred from: http://blog.csdn.net/u010402786/article/details/70141261

Preface
What is model fine-tuning?


Fine-tuning means training with a network model that someone else has already trained. You have to use the same network architecture, because the parameters are tied to that architecture. The last layer can of course be modified, since our data probably does not have 1000 classes but only a few; we change the number of outputs and the name of the last layer. Training with someone else's parameters, the modified network, and our own data, so that the parameters adapt to our data, is the process usually called fine-tuning.
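As a minimal sketch of what that last-layer change looks like in a prototxt (the layer names and the class count of 20 are made-up placeholders; fc8/fc7 follow the usual AlexNet/VGG naming):

# original last layer of the pre-trained model: 1000 ImageNet classes
layer {
  name: "fc8"
  type: "InnerProduct"
  bottom: "fc7"
  top: "fc8"
  inner_product_param { num_output: 1000 }
}

# fine-tuned version: a new name, so the weights are NOT copied from the
# caffemodel and the layer is re-initialized, with num_output set to our own class count
layer {
  name: "fc8_mytask"
  type: "InnerProduct"
  bottom: "fc7"
  top: "fc8_mytask"
  inner_product_param { num_output: 20 }
}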

Are the network parameters updated during fine-tuning?


Yes, they are updated. The finetune process is equivalent to continuing training; the difference from training from scratch lies in the initialization:
A. Direct training is initialized in the way specified by the network definition (e.g. Gaussian random initialization).
B. Finetune is initialized from a parameter file you already have (that is, a previously trained caffemodel).

Part One: Caffe command-line parsing

First, the training commands

Script:

./build/tools/caffe train -solver models/finetune/solver.prototxt -weights models/vgg_face_caffe/VGG_FACE.caffemodel -gpu 0
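Here -solver points to the solver definition, -weights initializes the network from the pre-trained caffemodel (only layers whose names match are copied; renamed layers are re-initialized), and -gpu 0 selects the first GPU.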

BAT Command:

..\..\bin\caffe.exe train --solver=.\solver.prototxt -weights .\test.caffemodel
pause
Second, a full explanation of Caffe's command-line options:

http://www.cnblogs.com/denny402/p/5076285.html

Part Two: Finetune examples and parameter adjustment
First, examples of model finetuning

Caffe Finetune Resnet-50

http://blog.csdn.net/tangwenbo124/article/details/56070322

Caffe Finetune googlenet

http://blog.csdn.net/sinat_30071459/article/details/51679995

Caffe Finetune FCN

http://blog.csdn.net/zy3381/article/details/50458331

Caffe Finetune Alexnet

                 

Second, notes on parameter adjustment
    • First change the name of the last layer, so that when the pre-trained model is loaded this layer fails to match by name and is retrained from scratch, which is exactly how we adapt the model to the new task;
    • Adjust the learning rate: because the last layer is re-learned, it needs a higher learning rate than the other layers, so we raise the lr_mult of its weight and bias by a factor of 10 to make the newly initialized layer learn faster (see the prototxt sketch after this list);
    • When the name of the last fully connected layer is changed for finetuning, remember to reset the num_output of that layer (e.g. fc8) according to the number of classes in your own dataset;
    • The class labels of the dataset must start at 0 and be contiguous, otherwise unexpected errors can occur;
    • Remember to shuffle the dataset, otherwise it is very likely not to converge;
    • If the network does not converge, set base_lr in the solver smaller, generally starting from 0.01; if loss = nan appears, keep turning it down;
    • Plot the accuracy and loss curves to make it easier to choose stepsize; generally, when accuracy and loss flatten out, the learning rate can be reduced;
    • The mean file used for finetuning should be generated from your own dataset (is that correct?);
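A minimal prototxt sketch of such a re-learned last layer (the layer name, class count and filler settings are illustrative placeholders, not values from the original post):

layer {
  name: "fc8_mytask"                      # new name, so the weights are not copied and are re-initialized
  type: "InnerProduct"
  bottom: "fc7"
  top: "fc8_mytask"
  param { lr_mult: 10  decay_mult: 1 }    # weights: learn 10x faster than the copied layers (which keep lr_mult: 1)
  param { lr_mult: 20  decay_mult: 0 }    # bias: conventionally twice the weight learning rate
  inner_product_param {
    num_output: 20                        # number of classes in your own dataset
    weight_filler { type: "gaussian" std: 0.01 }
    bias_filler { type: "constant" value: 0 }
  }
}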
Part III: Choosing a fine-tuning strategy

When fine-tuning, which kind of transfer learning should you choose? There are many factors to consider; the two most important are the size of the new dataset and how similar it is to the dataset the model was pre-trained on. Depending on the combination of these two factors, there are four scenarios:

The new dataset is small and similar to the pre-training dataset. Because the dataset is small, fine-tuning may cause overfitting; it is better to use the pre-trained network as a feature extractor and train a linear classifier for the new task.
The new dataset is large and similar to the pre-training dataset. In this case you can safely fine-tune the entire network without worrying about overfitting.
The new dataset is small and not similar to the pre-training dataset. Fine-tuning is not appropriate here, and using the pre-trained network with only the last layer removed as a feature extractor is also not ideal; a feasible scheme is to take the activations of the earlier layers of the pre-trained network as features and train a linear classifier on them.
The new dataset is large and not similar to the pre-training dataset. You can train from scratch, or you can fine-tune on the pre-trained model.


Summary: when freezing layers, which layers to finetune is usually chosen according to the dataset. For example, with a small dataset you can freeze everything from the first conv layer up to the fc4096 layers, use the general-purpose features the CNN learned on ImageNet as the classification features, and only retrain the modified fc-20 -> softmax on top. Likewise, with a medium-sized dataset you might freeze roughly the first half of the conv layers. My understanding of the main reason is that the lower layers learn more general basic features; but remember to take your own data into account when choosing.

Part IV: How to fix (freeze) network parameters in the different cases above

For example, suppose there are 4 fully connected layers A -> B -> C -> D:

A. You want the parameters of layer C not to change, and the parameters of layers A and B in front of C not to change either. This means the gradient from layer D must not propagate back into D's input blob (that is, C's output blob must receive no gradient). One way to achieve this in Caffe is to stop back-propagation at layer D by setting propagate_down: false on that layer (see the sketch after the example below); then no gradient flows backwards from D, and the parameters of all layers in front of it stay unchanged.
B. You want the parameters of layer C not to change, but the parameters of layers A and B in front of C to keep changing. In this case only the parameters of layer C are fixed; the gradient at C is still propagated back to layer B in front of it. You only need to set the learning rate of the corresponding parameter blobs to 0:
Add param { lr_mult: 0 } entries to the layer; for a fully connected (InnerProduct) layer, for example:

layer {
  type: "InnerProduct"
  param {           # configuration of the 1st parameter blob, i.e. the weight matrix of the fully connected layer
    lr_mult: 0      # learning rate 0; the other available fields are listed under ParamSpec in caffe.proto
  }
  param {           # configuration of the 2nd parameter blob, i.e. the bias of the fully connected layer
    lr_mult: 0      # learning rate 0
  }
}
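For case A, a corresponding sketch (the layer and blob names are hypothetical): stopping the gradient at layer D with propagate_down, so that nothing in front of it is updated.

layer {
  name: "D"
  type: "InnerProduct"
  bottom: "C_out"            # output blob of layer C
  top: "D_out"
  propagate_down: false      # do not back-propagate the gradient into the bottom blob (C's output)
  inner_product_param { num_output: 128 }
}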
Part V: Caffe Fine-tune FAQs
First, when fine-tuning AlexNet following the online tutorials, why does the loss stay at 87.3365?

Workaround: check that the dataset labels start at 0, lower base_lr by an order of magnitude, and double the batch_size.

Cause: 87.3365 is a very special number; it is what SoftmaxWithLoss produces when its input is NaN, so your fc8 output is all NaN.
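As a rough sanity check on why the value is constant (assuming the usual explanation that Caffe's SoftmaxWithLoss clamps the predicted probability to FLT_MIN before taking the log): -log(FLT_MIN) = -log(1.17549e-38) ≈ 87.3365, which is exactly the value the loss gets stuck at.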

Detailed analysis:
http://blog.csdn.net/jkfdqjjy/article/details/52268565?locationNum=14

Second, the loss decreases, but the accuracy does not change noticeably?

Solution: first, shuffle the data before training; second, check whether the learning rate is appropriate.

Third, a summary of data augmentation techniques:

Adapted from the Zhihu discussion at https://www.zhihu.com/question/35339639

Changes in image brightness, saturation and contrast;
PCA jittering;
Random resize;
Random crop (see the data-layer sketch after this list);
Horizontal/vertical flip;
Rotation and affine transformations;
Adding noise and blurring;
Label shuffle: data amplification for class-imbalanced categories, see Hikvision's ILSVRC2016 report
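Of these, only random cropping, horizontal mirroring and mean subtraction are built into Caffe's standard data-layer transform_param; the rest are usually done offline or with custom layers. A minimal sketch (the paths, crop size and batch size are placeholders):

layer {
  name: "data"
  type: "Data"
  top: "data"
  top: "label"
  include { phase: TRAIN }
  transform_param {
    mirror: true                      # random horizontal flip at training time
    crop_size: 227                    # random 227x227 crop at training time
    mean_file: "mean.binaryproto"     # mean generated from your own dataset
  }
  data_param {
    source: "train_lmdb"
    batch_size: 64
    backend: LMDB
  }
}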

               

Fourth, how to judge the state of network training from the loss curve:

The loss curve by itself provides very little information; it is usually combined with the accuracy curve on the test set to judge whether the model is overfitting;
The key is how your accuracy behaves on the test set;
If your lr_policy is step or another decaying schedule, the loss curve can help you choose a more appropriate stepsize (see the solver sketch below);
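A minimal solver.prototxt sketch for such a step schedule (the net path and all numeric values are placeholders to be tuned from your own curves, not values from the original post):

net: "models/finetune/train_val.prototxt"
base_lr: 0.001            # start small when finetuning; lower it further if loss = nan appears
lr_policy: "step"
gamma: 0.1                # multiply the learning rate by 0.1 ...
stepsize: 20000           # ... every 20000 iterations; pick this where accuracy/loss flatten out
momentum: 0.9
weight_decay: 0.0005
max_iter: 100000
snapshot: 10000
snapshot_prefix: "models/finetune/snapshot"
solver_mode: GPU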

Fifth, finetune_net.bin can no longer be used, and doing finetune with the new method (caffe train -weights) runs into problems. How to solve this?


Rename the last InnerProduct layer.

Part VI: References

1. http://caffe.berkeleyvision.org/gathered/examples/finetune_flickr_style.html
2. https://www.zhihu.com/question/54775243
3. http://blog.csdn.net/u012526120/article/details/49496617
4. https://zhidao.baidu.com/question/363059557656952932.html
