A few months ago I needed to write a short paper on face retrieval with deep learning, so I had to pick a suitable deep learning framework. After trying Caffe I found it not very convenient to use, and someone then recommended Keras to me; its simple style won me over, and I have now been using Keras for about four months. When I started there were not many TensorFlow tutorials around, so I use Theano as the backend. I will split this write-up into two parts: this first one covers the pitfalls you are likely to hit when starting out with Keras; the second will come after my paper is done, together with the code and more details. At the end I also attach some CNN and TensorFlow tutorials I have collected.
First, a link to the official Chinese Keras documentation: http://keras-cn.readthedocs.io/en/latest/
and the Keras QQ group number: 119427073
In this post I go through some of the problems I ran into along the way.

1. How to save the loss and val values that Keras outputs during training to a text file:
The fit function in Keras returns a History object; its history.history attribute stores the values of the loss and metrics from every epoch, and if there is a validation set it also contains the same metrics for the validation set. Specifically:
hist = model.fit(train_set_x, train_set_y, batch_size=256, shuffle=True,
                 nb_epoch=nb_epoch, validation_split=0.1)
with open('log_sgd_big_32.txt', 'w') as f:
    f.write(str(hist.history))
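If you want something easier to read than a dumped dict, here is a minimal sketch (reusing the hist object from the call above, assuming a validation split was used so that 'val_loss' exists; the file name is just an example) that writes one line per epoch:

# Sketch: write one row per epoch instead of dumping the whole history dict.
with open('log_per_epoch.txt', 'w') as f:
    f.write('epoch\tloss\tval_loss\n')
    for i, (loss, val_loss) in enumerate(zip(hist.history['loss'], hist.history['val_loss'])):
        f.write('%d\t%.6f\t%.6f\n' % (i, loss, val_loss))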
I think it is important to save these loss and val values early on; during later parameter tuning you will often want the loss curves as a reference, especially if you have added custom loss terms. However, saving files this way quickly makes the naming of all those text files chaotic, so you may want to consider the Aetros plugin (see the Aetros website). It is a management tool built on Keras that can visualize your network structure and intermediate convolution outputs, and it keeps the results of all your runs, which is very convenient; it is just a bit unstable and sometimes crashes...

2. About the training set, validation set and test set:
At first I did not get this right myself and used the test set as the validation set. In fact, the validation set is split off from the training set and is used for tuning parameters, while the test set has no intersection with the training set and is used to evaluate how well the model with the chosen parameters generalizes; do not mix these up... In Keras, splitting off a validation set is as simple as setting validation_split in the fit function, which is the fraction of the training set to use for validation. However, the shuffle inside fit happens after validation_split is applied, so if the training set itself is not shuffled to begin with (say, sorted by class), the validation set may end up being all negative samples (a small sketch of shuffling the data beforehand follows the evaluate example below). Using the test set is as simple as passing it to the evaluate function:
print(model.evaluate(test_set_x, test_set_y, batch_size=256))
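On the shuffle point above, a minimal sketch (assuming train_set_x and train_set_y are NumPy arrays) of shuffling the data yourself before calling fit, so the slice taken by validation_split is not all one class:

import numpy as np

# Shuffle samples and labels with the same permutation before fit().
indices = np.random.permutation(len(train_set_x))
train_set_x = train_set_x[indices]
train_set_y = train_set_y[indices]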
Note that the default batch_size of both the evaluate and fit functions is 32; remember to change it if needed.

3. On the choice of optimization method:
There are endless arguments about which optimizer works best, but the best approach is simply to try them. After enough experiments it is not hard to notice that with SGD, whose learning rate is not adaptive, the result depends heavily on the learning rate and the initialization method, and since it converges really slowly it is not very convenient. I suspect SGD was used so much in the past partly because there were not many other optimizers, and partly because getting a good result with plain SGD shows how good the network itself is. Adam, Adadelta and RMSprop give similar results, and Nadam converges even better since it is essentially Adam with Nesterov momentum. So if you are not satisfied with your results, it is worth switching between these methods.
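A minimal sketch of switching optimizers (the model and loss here are just placeholders; swap in whatever fits your task):

from keras.optimizers import SGD, Adam, Nadam

# Changing the optimizer is just a matter of what you pass to compile().
model.compile(loss='categorical_crossentropy', optimizer=Adam(lr=0.001), metrics=['accuracy'])
# or: optimizer=Nadam(), or optimizer=SGD(lr=0.01, momentum=0.9, nesterov=True)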
Many beginners wonder how to make the SGD learning rate change over time. Keras actually has a callback for this called LearningRateScheduler; it is used as follows:
import math
from keras.callbacks import LearningRateScheduler
from keras.optimizers import SGD

def step_decay(epoch):
    initial_lrate = 0.01
    drop = 0.5
    epochs_drop = 10.0
    lrate = initial_lrate * math.pow(drop, math.floor((1 + epoch) / epochs_drop))
    return lrate

lrate = LearningRateScheduler(step_decay)
sgd = SGD(lr=0.0, momentum=0.9, decay=0.0, nesterov=False)
model.fit(train_set_x, train_set_y, validation_split=0.1, nb_epoch=200,
          batch_size=256, callbacks=[lrate])
The code above makes the learning rate drop in steps, halving every 10 epochs.
Of course, you can also change the learning rate directly by modifying the parameters in the SGD declaration, making it decay as follows:
sgd = SGD(lr=learning_rate, decay=learning_rate/nb_epoch, momentum=0.9, nesterov=True)
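For intuition, a small sketch of how this decay argument shrinks the effective learning rate over parameter updates (to my understanding Keras applies roughly lr / (1 + decay * iterations) per update; the optimizer does this internally, the loop below is only for illustration):

# Illustration only: approximate effective learning rate after N updates.
learning_rate, nb_epoch = 0.01, 200
decay = learning_rate / nb_epoch
for n in (0, 1000, 10000, 50000):
    print(n, learning_rate / (1.0 + decay * n))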
For more on this, see the article Using Learning Rate Schedules for Deep Learning Models in Python with Keras.

4. On the overfitting problem:
The two solutions I know of are roughly as follows. The first is to add Dropout layers. I will not go into the theory of dropout here, just its usage: a Dropout layer can be placed after many kinds of layers to suppress overfitting, most commonly right after a Dense layer. For convolution and pooling, some put Dropout between the Convolution2D and MaxPooling2D layers, others after MaxPooling2D; my advice is to try both. I have seen both placements and cannot say which is better, but most of what I have seen puts it between the convolution and the pooling layer (a small sketch follows below). As for choosing the dropout rate, again you just have to keep trying, but I did notice one thing: when the dropout rate is set above 0.5, the validation accuracy tends to end up higher than the training accuracy, while the validation accuracy itself is not much affected; with lower rates the opposite happens. My explanation is that dropout acts like an ensemble: it is equivalent to combining many sub-models, and the poorer ones pull down the training accuracy. Of course, this is only my guess; if you have a better explanation, feel free to leave a comment and discuss.
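A minimal sketch of both placements mentioned above (Keras 1.x API, Theano 'th' dimension ordering; the layer sizes are made up for illustration):

from keras.models import Sequential
from keras.layers import Dense, Dropout, Flatten
from keras.layers import Convolution2D, MaxPooling2D

model = Sequential()
model.add(Convolution2D(32, 3, 3, activation='relu', input_shape=(1, 28, 28)))
model.add(Dropout(0.25))                  # dropout between convolution and pooling
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Flatten())
model.add(Dense(128, activation='relu'))
model.add(Dropout(0.5))                   # dropout after a dense layer
model.add(Dense(10, activation='softmax'))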
The second option is parameter regularization, i.e. adding an L1 or L2 regularization coefficient when declaring certain layers. I will not elaborate on the theory of regularization either; see the code:
from keras.regularizers import l2

C1 = Convolution2D(4, 4, border_mode='valid', init='he_uniform',
                   activation='relu', W_regularizer=l2(regularizer_params))
Here W_regularizer=l2(regularizer_params) sets the regularization coefficient. This works quite well against overfitting and improves the generalization ability of the model to some extent.

5. Where to place the BatchNormalization layer:
The BN layer is really impressive, frankly a godsend, at the cost of making the network build time and each epoch a little longer. On the placement question, however, I have seen countless claims: for convolution and pooling layers, some say put it between them, others say after the pooling layer; relative to Dropout, some say put BN after it, others before it. My answer to this is, once again, just try it. Annoying, yes... but then, isn't DL a rather engineering-oriented discipline anyway... One more thing to watch out for is the BN layer's parameters, which I did not notice at first; look at them carefully:
keras.layers.normalization.BatchNormalization(epsilon=1e-06, mode=0, axis=-1, momentum=0.9, weights=None, beta_init='zero', gamma_init='one')
mode: integer, 0 or 1, specifying the normalization mode.
0: feature-wise normalization; each feature map of the input is normalized independently. The axis to normalize along is specified by the axis parameter. Note that if the input is a 4D image tensor of shape (samples, channels, rows, cols), you should set axis=1 so that normalization is done along the channel axis. The same reasoning applies for the 'tf' dimension ordering.
1: sample-wise normalization; this mode assumes 2D input by default.
Most of the time we use mode=0, i.e. feature-wise normalization. For BN layers placed before or after convolution and pooling layers (i.e. on a 4D tensor) you need to set axis=1, while for a BN layer after a Dense layer the default values are fine.
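A minimal sketch of both cases (again Keras 1.x, Theano 'th' dimension ordering, made-up layer sizes):

from keras.models import Sequential
from keras.layers import Dense, Flatten
from keras.layers import Convolution2D, MaxPooling2D
from keras.layers.normalization import BatchNormalization

model = Sequential()
model.add(Convolution2D(32, 3, 3, input_shape=(1, 28, 28), activation='relu'))
model.add(BatchNormalization(mode=0, axis=1))   # 4D tensor: normalize along the channel axis
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Flatten())
model.add(Dense(128, activation='relu'))
model.add(BatchNormalization())                 # after a Dense layer the defaults are fine
model.add(Dense(10, activation='softmax'))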
Summary
That is all for now; this write-up is fairly shallow and there is a lot more to expand on next time. Below I attach some good resources for you ~
CNN Basics:
"1" Do you really know the CNN network?
"2" CNN recent developments and practical tips (ON)
"3" depth | From getting started to mastering: a beginner's Guide to convolution neural Network (attached thesis)
"4" convolutional neural Networks (CNNs): An illustrated explanation
"5" convolutional neural Networks backpropagation:from intuition to derivation
"6" Congratulations to the end. CS231N Official notes Authorized translation of the anthology published
DL and Keras Related:
[1] A guide to activation functions in deep learning
[2] A discussion of the overfitting problem in deep networks
[3] How to Improve Deep Learning Performance
[4] The most complete summary and comparison of deep learning optimization methods (SGD, Adagrad, Adadelta, Adam, Adamax, Nadam)
[5] Grid search hyperparameter tuning for deep learning in Keras/Python (with source code)
[6] Yoshua Bengio and other experts share 26 pieces of deep learning experience
[7] leriomaggio/deep-learning-keras-euroscipy2016
TensorFlow Related:
"1" tensorflow depth study, an article is enough
"2" alrojo/tensorflow-tutorial
"3" Mo TensorFlow Neural Network tutorial