Source: https://www.zhihu.com/question/40850491

For example, first design a CNN structure.
Then train the CNN on a large dataset A to obtain network A.
However, network A's predictions on dataset B are not ideal (a possible reason is that datasets A and B differ somewhat, for example representational differences caused by different data sources). If you train directly on (part of) dataset B, the amount of data is too small for a CNN to work well.
Workaround:
Split dataset B into a training set and a test set, take network A's parameters as the initial parameters, use a smaller learning rate, and continue training on B's training set to obtain network B.
In this way, network B can generally achieve good prediction accuracy on B's test set.
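A minimal PyTorch sketch of this workaround (the answer itself names no framework; the small CNN, the checkpoint file name network_a.pth, and the random tensors standing in for dataset B are all illustrative placeholders):

import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

class SmallCNN(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Linear(32, num_classes)

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))

model = SmallCNN()
model.load_state_dict(torch.load("network_a.pth"))  # initialize with network A's parameters (assumed checkpoint)

# Stand-in for dataset B's (small) training split.
train_b = TensorDataset(torch.randn(256, 3, 32, 32), torch.randint(0, 10, (256,)))
loader = DataLoader(train_b, batch_size=32, shuffle=True)

criterion = nn.CrossEntropyLoss()
# Smaller learning rate than the one used to train A, so the pretrained
# parameters are only gently adjusted toward dataset B.
optimizer = torch.optim.SGD(model.parameters(), lr=1e-4, momentum=0.9)

model.train()
for epoch in range(5):
    for images, labels in loader:
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
# The result is "network B"; evaluate it on B's held-out test split.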
Wang Cheng
Links: https://www.zhihu.com/question/40850491/answer/88651844
Source: Zhihu
Copyright belongs to the author; please contact the author for authorization.
——————————————————————————————
Fine-tuning means modifying a ready-made model slightly and then doing a small amount of additional training; it is mainly used when the number of samples is insufficient.
——————————————————————————————
Fine-tuning applies an already-trained model to a new dataset. Its main advantage is that, compared with training from scratch, the same effect can be achieved in a shorter time.
Example:
1. Fine-tuning: take a CNN trained on CIFAR-100, modify only the number of output nodes in the last softmax layer (from 100 to 10), and then train it on CIFAR-10.
2. Training from scratch: train a CNN with the same structure directly on CIFAR-10.
Results:
In the first case, 60% accuracy can be reached in as few as 1,000 iterations, while the second case needs about 4,000 iterations to reach 60% accuracy.
The Caffe official website has a fine-tuning example with a fairly detailed explanation.
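The official example mentioned above is Caffe-based; below is a hedged PyTorch sketch of case 1, replacing only the final 100-way classifier of a CNN assumed to be pretrained on CIFAR-100 with a 10-way one before training on CIFAR-10 (the CifarCNN class and the checkpoint name are illustrative):

import torch
import torch.nn as nn
import torchvision
import torchvision.transforms as T
from torch.utils.data import DataLoader

class CifarCNN(nn.Module):
    def __init__(self, num_classes):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.fc = nn.Linear(64 * 8 * 8, num_classes)   # last layer = softmax classifier

    def forward(self, x):
        return self.fc(self.features(x).flatten(1))

model = CifarCNN(num_classes=100)                      # structure as trained on CIFAR-100
# model.load_state_dict(torch.load("cifar100_pretrained.pth"))  # assumed checkpoint
model.fc = nn.Linear(64 * 8 * 8, 10)                   # 100 output nodes -> 10 output nodes

train_set = torchvision.datasets.CIFAR10(
    root="./data", train=True, download=True, transform=T.ToTensor())
loader = DataLoader(train_set, batch_size=128, shuffle=True)

criterion = nn.CrossEntropyLoss()                      # softmax + cross-entropy
optimizer = torch.optim.SGD(model.parameters(), lr=0.001, momentum=0.9)

model.train()
for images, labels in loader:                          # one pass; iterate longer in practice
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()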
Qian Fei Hung
Links: https://www.zhihu.com/question/40850491/answer/88763800
Source: Zhihu
Copyright belongs to the author; please contact the author for authorization.
——————————————————————————————
This is transfer learning. The idea is to take the parameters trained on one task and use them directly as the initial parameter values of the neural network for another task, then continue training; this gives higher accuracy than random initialization of the parameters. You can also keep the parameters of certain layers unchanged according to your needs.
——————————————————————————————
Reference: Angie K. Reyes, Juan C. Caicedo, and Jorge E. Camargo, "Fine-tuning Deep Convolutional Networks for Plant Recognition". Laboratory for Advanced Computational Science and Engineering, Universidad Antonio Nariño, Colombia ({angreyes, jorgecamargo}@uan.edu.co); Fundación Universitaria Konrad Lorenz, Colombia ([email protected]).

The main content is: fine-tune on a custom plant-identification dataset starting from a model trained on ImageNet's very large dataset (more than 1,000 classes). Section 3.2, "Fine-tuning the CNN", reads:

We initialized the CNN to recognize 1,000 categories of generic objects that are part of the ImageNet hierarchy, following the procedure described in the previous section. Then, we proceed to fine-tune the network for the plant identification task.

Fine-tuning a network is a procedure based on the concept of transfer learning [1, 3]. We start training a CNN to learn features for a broad domain with a classification function targeted at minimizing error in that domain. Then, we replace the classification function and optimize the network again to minimize error in another, more specific domain. Under this setting, we are transferring the features and the parameters of the network from the broad domain to the specific one.

The classification function in the original CNN was a softmax classifier that computes the probability of 1,000 classes of the ImageNet dataset. To start the fine-tuning procedure, we remove this softmax classifier and initialize a new one with random values. The new softmax classifier is trained from scratch using the back-propagation algorithm with data from the plant identification task, which also has different categories.

In order to start the back-propagation algorithm for fine-tuning, it is key to set the learning rates of each layer appropriately. The classification layer, i.e., the new softmax classifier, needs a large learning rate because it has just been initialized with random values. The rest of the layers need a relatively small learning rate because we want to preserve the parameters of the previous network to transfer this knowledge into the new network. However, notice that the learning rate is not set to zero in the rest of the layers: they will be optimized again, at a slower pace.

In our experiments we set the learning rate of the top classification layer to 10 while leaving the learning rate of all the other seven layers at 0.1. We run the back-propagation algorithm for 50,000 iterations, which optimizes the network parameters using stochastic gradient descent (SGD). Figure 3 shows how the precision of classifying images improves with more iterations. Our implementation is based on the open-source deep learning library Caffe [7], and we run the experiments using an NVIDIA Titan Z GPU (5,760 cores and 12 GB of RAM).

Fig. 3. Evolution of image classification accuracy on a validation set during the fine-tuning process. Accuracy improves quickly during the first iterations and stabilizes after 40,000 iterations.

Main content: the plant classification task in this paper also has more than 1,000 classes, but its categories differ from ImageNet's (ImageNet is not limited to plants). It is therefore necessary to redefine the parameters of the softmax classification function; the goal is to minimize the back-propagated error of the new softmax classifier, which is trained from scratch. The key to starting the back-propagation algorithm is setting an appropriate learning rate for each layer. The top layer, i.e., the new softmax function, needs a somewhat larger learning rate because it is randomly initialized. The remaining layers should use a relatively small learning rate because we want to keep the information of the previously pre-trained network. Note, however, that the learning rate of the remaining layers is not set to zero; they are still optimized, just at a slower pace. Accordingly, in the experiments the learning rate of the classification layer is set to 10 and the learning rate of the remaining layers to 0.1. The network is trained for 50,000 iterations using stochastic gradient descent (SGD) to minimize the back-propagated error. As can be seen, accuracy rises particularly fast during the first iterations and levels off at around 40,000 iterations.
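A minimal PyTorch sketch of the per-layer learning-rate scheme described in the excerpt, using two optimizer parameter groups in place of Caffe's per-layer rates; the concrete values, the resnet18 stand-in for the ImageNet-pretrained model, and the class count are illustrative assumptions:

import torch
import torch.nn as nn
from torchvision.models import resnet18

model = resnet18(weights=None)        # stands in for a CNN pretrained on ImageNet;
                                      # load real pretrained weights in practice
num_plant_classes = 1000              # illustrative class count for the plant task

# Remove the old 1,000-way ImageNet classifier and add a randomly
# initialized one for the plant-identification task.
model.fc = nn.Linear(model.fc.in_features, num_plant_classes)

# New classification layer: larger learning rate (it starts from random values).
# All other layers: much smaller learning rate, so the transferred parameters
# change only slowly. They are not frozen: the rate is small but not zero.
pretrained_params = [p for n, p in model.named_parameters() if not n.startswith("fc.")]
optimizer = torch.optim.SGD(
    [
        {"params": model.fc.parameters(), "lr": 1e-2},  # new softmax classifier
        {"params": pretrained_params},                  # falls back to the default lr below
    ],
    lr=1e-4, momentum=0.9,
)

# To keep some layers completely unchanged instead (as the answer above allows),
# set requires_grad=False on their parameters and leave them out of the optimizer.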
Understanding fine-tuning in Caffe