Deep Learning (V): Training a CNN on CIFAR-10 with Keras
- Database Introduction
- Development tools
- Network framework
- Training results
- Training Essentials
- Activation function
- The role of dropout
- Training Code
"Original" Liu_longpo
Reprint Please specify the source "CSDN" http://blog.csdn.net/llp1992
Database Introduction
CIFAR-10 is a dataset for general object recognition, collected by Hinton's students Alex Krizhevsky and Ilya Sutskever.
CIFAR-10 consists of 60,000 32*32 RGB color images in 10 classes: 50,000 for training and 10,000 for testing (cross-validation). The main appeal of this dataset is that it moves recognition toward everyday objects and toward multi-class classification (its sister dataset CIFAR-100 has 100 classes, and the ILSVRC competition has 1,000 classes).
The dataset can be downloaded from the CIFAR website.
Compared with the relatively mature field of face recognition, general object recognition is a big challenge: the data contain a large variety of features, plenty of noise, and objects that appear at very different scales.
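For reference, here is a minimal sketch of reading one training batch of the Python version of the dataset. The file path is an assumption based on the archive layout described on the CIFAR website; this is only an illustration, not the preprocessing pipeline used later in this post.

import numpy as np
import pickle

# Each batch file of the Python version is a pickled dict holding
# 10000 images as rows of a 10000 x 3072 uint8 array plus their labels.
with open('cifar-10-batches-py/data_batch_1', 'rb') as f:
    batch = pickle.load(f, encoding='bytes')   # on Python 2, use cPickle.load(f)

images = batch[b'data'].reshape(10000, 3, 32, 32)   # N x channels x height x width
labels = np.array(batch[b'labels'])

print(images.shape, labels.shape)   # (10000, 3, 32, 32) (10000,)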
Development tools
There are quite a few popular deep learning libraries now, and the hottest one on GitHub is probably Caffe. Personally, though, I find Caffe too rigid: too many things are packaged away inside the library, so if you want to learn the principles it is better to read the Theano version.
The library I use is Keras, recommended by a friend. It is built on Theano, and its advantage is that it is easy to use and allows very rapid development.
Network framework
The network framework is based on Caffe's CIFAR-10 example, with my own modifications.
The framework is as follows:
Layer 1, Conv1: kernels: 32, kernel size: 5, activation: ReLU, dropout: 0.25
Layer 2, Conv2: kernels: 32, kernel size: 5, activation: ReLU, dropout: 0.25
Layer 3, MaxPooling1: pool size: 2
Layer 4, Conv3: kernels: 64, kernel size: 3, activation: ReLU, dropout: 0.25
Layer 5, MaxPooling2: pool size: 2
Layer 6, fully connected: activation: tanh
Layer 7: softmax
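To see where the 64*5*5 input size of the fully connected layer in the training code at the end of the post comes from, here is a quick check of the feature-map sizes (a 'valid' convolution with kernel size k shrinks each spatial dimension by k-1, and 2*2 max pooling halves it):

# feature-map side length through the network, starting from a 32x32 input
size = 32
size = size - (5 - 1)      # Conv1, 5x5 valid  -> 28
size = size - (5 - 1)      # Conv2, 5x5 valid  -> 24
size = size // 2           # MaxPooling1, 2x2  -> 12
size = size - (3 - 1)      # Conv3, 3x3 valid  -> 10
size = size // 2           # MaxPooling2, 2x2  -> 5

print(size)                # 5, so Flatten() yields 64 * 5 * 5 = 1600 values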
Training results
Comparing with the training-error curves in Alex's paper: the paper says that using ReLU instead of tanh as the activation function makes the algorithm converge much faster, but in practice I did not feel that it was that much faster.
Also, the trend of the loss is not as good as in the paper, although of course my framework is not exactly the same as the author's.
After 35 iterations, the final training accuracy was 0.86 and the cross-validation accuracy was 0.78.
It felt like the model was starting to overfit, so I did not run more iterations.
Training Essentials
- The first thing that must be done is to preprocess the CIFAR-10 data: mean subtraction, normalization, and whitening (a minimal sketch of the first two steps follows this list).
- After replacing tanh with ReLU as the activation function, the learning rate must be lowered by an order of magnitude, otherwise overfitting occurs.
- Which layers should use ReLU? The answer: apart from the layer just before the final softmax, which uses tanh, all the other layers can use it.
- Regarding the dropout ratio, the paper says 0.5, but I found that 0.25 works better; this may need to be tuned to the data.
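A minimal sketch of the mean-subtraction and normalization mentioned in the first point (whitening is omitted here; the function name, the float32 choice, and the global-std normalization are my own illustration, not taken from the original preprocessing code):

import numpy as np

def simple_preprocess(train, test):
    """train, test: uint8 arrays of shape (N, 3, 32, 32)."""
    train = train.astype('float32') / 255.0
    test = test.astype('float32') / 255.0

    # subtract the per-pixel mean computed on the training set only
    mean = train.mean(axis=0)
    train -= mean
    test -= mean

    # normalize by the global standard deviation of the training set
    std = train.std()
    train /= std
    test /= std
    return train, test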
Activation function
Commonly used activation functions in deep learning are: sigmoid, tanh, ReLU, softplus.
Currently the best performing one is the rectified linear unit (ReLU).
If a multi-layer neural network uses sigmoid or tanh activations and is not pre-trained, it cannot converge because of the vanishing gradient problem; ReLU does not have this issue.
The uses of pre-training: regularization, preventing overfitting, compressing the data and removing redundancy, strengthening features, reducing error, and speeding up convergence.
The output of the standard sigmoid is not sparse; a penalty factor such as L1, L1/L2 or Student-t is needed to train away a large pile of near-zero redundant values and obtain sparse data, which is why unsupervised pre-training is needed.
ReLU is a linear rectifier with the formula g(x) = max(0, x): if the computed value is less than 0 it is set to 0, otherwise it is kept unchanged. This simple and brute-force way of forcing part of the data to 0 has nevertheless been shown to leave the trained network moderately sparse, and the visualizations after training look similar to those obtained with traditional pre-training, which shows that ReLU by itself can induce moderate sparsity.
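A tiny numpy illustration of g(x) = max(0, x) and the sparsity it produces (the example values are arbitrary):

import numpy as np

x = np.array([-2.0, -0.5, 0.0, 0.3, 1.7])
relu = np.maximum(0, x)          # g(x) = max(0, x)

print(relu)                      # [0.  0.  0.  0.3 1.7]
print(np.mean(relu == 0))        # 0.6 -> more than half the activations are exactly zero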
The role of dropout
Dropout prevents overfitting. In practice: during training, the outputs of randomly chosen hidden-layer nodes are set to 0 in the forward pass, and in back-propagation the error of those zeroed nodes is also 0, which makes the representation sparse. This forces the neurons to learn more robust and more abstract features.
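A minimal sketch of the forward-pass part of this idea: a random Bernoulli mask applied to the hidden activations at training time. The rate 0.25 matches the one used in this post; the function itself, and the inverted-dropout scaling, are only an illustration, not the library's implementation.

import numpy as np

def dropout_forward(h, rate=0.25, training=True):
    """Randomly zero a fraction `rate` of the hidden activations h."""
    if not training:
        return h
    mask = (np.random.rand(*h.shape) >= rate).astype(h.dtype)
    # the same mask is reused in back-propagation, so the error of a
    # zeroed node is also zero
    return h * mask / (1.0 - rate)   # inverted dropout keeps the expected value unchanged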
Training Code
# -*- coding: utf-8 -*-
"""
Created on Thu 11:27:34 2015

@author: lab-liu.longpo
"""
from __future__ import absolute_import
from __future__ import print_function

from keras.models import Sequential
from keras.layers.core import Dense, Dropout, Activation, Flatten
from keras.layers.convolutional import Convolution2D, MaxPooling2D
from keras.optimizers import SGD, Adadelta, Adagrad
from keras.utils import np_utils, generic_utils

import matplotlib.pyplot as plt
import numpy as np
import scipy.io as sio

# load the preprocessed data (stored as a .mat file)
d = sio.loadmat('data.mat')
data = d['d']
label = d['l']
data = np.reshape(data, (50000, 3, 32, 32))
label = np_utils.to_categorical(label, 10)
print('finish loading data')

model = Sequential()

# Conv1: 32 kernels of size 5x5, ReLU, dropout 0.25
model.add(Convolution2D(32, 3, 5, 5, border_mode='valid'))
model.add(Activation('relu'))
# model.add(MaxPooling2D(poolsize=(2, 2)))
model.add(Dropout(0.25))

# Conv2: 32 kernels of size 5x5, ReLU, then 2x2 max pooling and dropout 0.25
model.add(Convolution2D(32, 32, 5, 5, border_mode='valid'))
model.add(Activation('relu'))
model.add(MaxPooling2D(poolsize=(2, 2)))
model.add(Dropout(0.25))

# Conv3: 64 kernels of size 3x3, ReLU, then 2x2 max pooling and dropout 0.25
model.add(Convolution2D(64, 32, 3, 3, border_mode='valid'))
model.add(Activation('relu'))
model.add(MaxPooling2D(poolsize=(2, 2)))
model.add(Dropout(0.25))

# fully connected layer with tanh, then softmax over the 10 classes
model.add(Flatten())
model.add(Dense(64 * 5 * 5, 512, init='normal'))   # hidden width 512 assumed; garbled in the original
model.add(Activation('tanh'))
model.add(Dense(512, 10, init='normal'))
model.add(Activation('softmax'))

sgd = SGD(l2=0.001, lr=0.0065, decay=1e-6, momentum=0.9, nesterov=True)
model.compile(loss='categorical_crossentropy', optimizer=sgd, class_mode='categorical')

# checkpointer = ModelCheckpoint(filepath='weight.hdf5', verbose=1, save_best_only=True)
# model.fit(data, label, batch_size=100, nb_epoch=10, shuffle=True, verbose=1,
#           show_accuracy=True, validation_split=0.2, callbacks=[checkpointer])
result = model.fit(data, label, batch_size=100, nb_epoch=35, shuffle=True,
                   verbose=1, show_accuracy=True, validation_split=0.2)
# model.save_weights(weights, accuracy=False)

# plot the result
plt.figure()
plt.plot(result.epoch, result.history['acc'], label='acc')
plt.plot(result.epoch, result.history['val_acc'], label='val_acc')
plt.scatter(result.epoch, result.history['acc'], marker='*')
plt.scatter(result.epoch, result.history['val_acc'])
plt.legend(loc='lower right')
plt.show()

plt.figure()
plt.plot(result.epoch, result.history['loss'], label='loss')
plt.plot(result.epoch, result.history['val_loss'], label='val_loss')
plt.scatter(result.epoch, result.history['loss'], marker='*')
plt.scatter(result.epoch, result.history['val_loss'], marker='*')
plt.legend(loc='upper right')
plt.show()
Today I put the machine learning and deep learning algorithms I have implemented on GitHub; the datasets, data preprocessing code, training code and so on will also be put on GitHub. If you find it useful, please give it a star.