Deep Learning (V): Training a CNN on the CIFAR-10 Dataset with Keras

      • Database Introduction
      • Development tools
      • Network framework
      • Training results
      • Training Essentials
      • Activation function
      • The role of dropout
      • Training Code

"Original" Liu_longpo
Reprint Please specify the source "CSDN" http://blog.csdn.net/llp1992

Database Introduction

CIFAR-10 is a dataset for general object recognition, collected by Hinton's students Alex Krizhevsky and Ilya Sutskever.
CIFAR-10 consists of 60,000 32*32 RGB color images in 10 classes: 50,000 for training and 10,000 for testing (cross-validation). The defining feature of this dataset is that it moves recognition toward everyday objects and multi-class classification (its sister dataset CIFAR-100 has 100 classes, and the ILSVRC competition has 1,000 classes).


The dataset can be downloaded from the CIFAR website.
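
If you do not want to prepare the .mat file used in the training code at the end of this post, Keras also ships a CIFAR-10 loader. A minimal sketch, assuming a working Keras/Theano setup like the one used here (the first call downloads and caches the data):

from keras.datasets import cifar10
from keras.utils import np_utils

# returns the standard 50,000 / 10,000 train/test split
(X_train, y_train), (X_test, y_test) = cifar10.load_data()

X_train = X_train.astype('float32') / 255.0   # scale pixels to [0, 1]
X_test = X_test.astype('float32') / 255.0
Y_train = np_utils.to_categorical(y_train, 10)  # one-hot labels for the softmax output
Y_test = np_utils.to_categorical(y_test, 10)

print(X_train.shape)  # (50000, 3, 32, 32) or (50000, 32, 32, 3), depending on the image ordering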

Compared with the relatively mature field of face recognition, general object recognition is a huge challenge: the data contain a large number of features, plenty of noise, and objects that appear at very different scales.

Development tools

There are now quite a few popular deep learning libraries; the hottest one on GitHub at the moment is probably Caffe. Personally, though, I find Caffe too rigid: too much is wrapped up inside the library, so if you want to learn the underlying principles it is better to read the Theano versions of the algorithms.

The library I use, recommended by a friend, is Keras. It is built on top of Theano, and its advantage is that it is easy to use and allows very fast development.

Network framework

The network framework is based on Caffe's CIFAR-10 example, with some modifications of my own.
The framework is as follows:
Layer 1 Conv1: kernels: 32, kernel size: 5, activation: ReLU, dropout: 0.25
Layer 2 Conv2: kernels: 32, kernel size: 5, activation: ReLU, dropout: 0.25
Layer 3 MaxPooling1: pool size: 2
Layer 4 Conv3: kernels: 64, kernel size: 3, activation: ReLU, dropout: 0.25
Layer 5 MaxPooling2: pool size: 2
Layer 6: fully connected, activation: tanh
Layer 7: softmax
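
For readers on a current Keras version, here is a rough sketch of the same stack written against the modern API. This is my paraphrase, not the original Keras 0.x code listed at the end of the post; the width of the fully connected layer follows that code.

from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Dropout, Flatten, Dense

model = Sequential([
    Conv2D(32, 5, activation='relu', input_shape=(32, 32, 3)),  # Layer 1: Conv1
    Dropout(0.25),
    Conv2D(32, 5, activation='relu'),                           # Layer 2: Conv2
    MaxPooling2D(pool_size=2),                                  # Layer 3: MaxPooling1
    Dropout(0.25),
    Conv2D(64, 3, activation='relu'),                           # Layer 4: Conv3
    MaxPooling2D(pool_size=2),                                  # Layer 5: MaxPooling2
    Dropout(0.25),
    Flatten(),
    Dense(32, activation='tanh'),                               # Layer 6: fully connected
    Dense(10, activation='softmax'),                            # Layer 7: softmax over 10 classes
])
model.summary()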

Training results



Comparing the training-error curves with those in Alex's paper: the paper says that using ReLU instead of tanh as the activation function makes the algorithm converge much faster, but in my experiments I did not notice a dramatic speed-up.
Also, the trend of the loss is not as clean as in the paper, although of course my framework is not exactly the same as the author's.

After 35 iterations the final training accuracy is 0.86 and the cross-validation accuracy is 0.78.
The model seemed to be starting to overfit, so I did not run more iterations.



Training Essentials
    • The first thing that must be done is to preprocess the CIFAR-10 data: mean subtraction, normalization, and whitening (a minimal preprocessing sketch follows this list).
    • After replacing tanh with ReLU as the activation function, the learning rate has to be lowered by an order of magnitude, otherwise overfitting occurs.
    • Which layers should use ReLU? In my setup, every layer can use ReLU except the fully connected layer before the final softmax, which uses tanh.
    • As for the dropout ratio, the paper suggests 0.5, but I found 0.25 to work better; this probably needs to be tuned to the data.
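
A minimal sketch of the mean subtraction and normalization step (whitening, e.g. ZCA, is left out here; the array shapes assume the (N, 3, 32, 32) layout used in the training code below):

import numpy as np

def preprocess(X_train, X_test):
    # X_*: arrays of shape (N, 3, 32, 32)
    X_train = X_train.astype('float32')
    X_test = X_test.astype('float32')
    # per-channel mean and std, computed on the training set only
    mean = X_train.mean(axis=(0, 2, 3), keepdims=True)
    std = X_train.std(axis=(0, 2, 3), keepdims=True) + 1e-7
    X_train = (X_train - mean) / std
    X_test = (X_test - mean) / std   # apply the same statistics to the test set
    return X_train, X_test
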
Activation function

The commonly used activation functions in deep learning are sigmoid, tanh, ReLU, and softplus.
Currently the best-performing one is the rectified linear unit (ReLU).

With sigmoid or tanh activations and no pre-training, a deep multi-layer neural network fails to converge because of the vanishing gradient problem. ReLU does not have this problem.

Pre-training serves several purposes: regularization and preventing overfitting; compressing the data and removing redundancy; strengthening the features and reducing error; and speeding up convergence.

The standard sigmoid's outputs are not sparse; to push the large amount of near-zero redundant activity out of the network, a penalty term such as L1, L1/L2, or a Student-t penalty has to be added during training. That is why unsupervised pre-training is needed in that setting.
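
For illustration, a sparsity penalty of this kind can be attached directly to a layer's activations. A sketch using the activity regularizer available in current Keras versions (not something the original post does; the layer width 256 and the penalty weight are just illustrative values):

from keras import regularizers
from keras.layers import Dense

# an L1 penalty on the layer's outputs pushes many activations toward zero
sparse_dense = Dense(256, activation='sigmoid',
                     activity_regularizer=regularizers.l1(1e-5))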

ReLU is a rectified linear function: g(x) = max(0, x). In other words, if the computed value is less than 0 it is set to 0, otherwise the value is kept unchanged. It is a simple, even crude, way of forcing part of the data to be 0, yet it has been shown that the trained network ends up moderately sparse. The visualizations after training also look similar to those obtained with traditional pre-training, which suggests that ReLU can itself induce moderate sparsity.
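
The behaviour in one line of NumPy (a toy illustration):

import numpy as np

def relu(x):
    # element-wise max(0, x): negative values become 0, the rest pass through unchanged
    return np.maximum(0.0, x)

print(relu(np.array([-2.0, -0.5, 0.0, 1.5, 3.0])))   # [0.  0.  0.  1.5 3. ]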


The role of dropout

Dropout prevents overfitting. In practice: during training, the outputs of randomly chosen hidden-layer nodes are set to 0 in the forward pass, and in the backward pass the errors of those zeroed nodes are also 0, so the hidden representation becomes sparse. This forces the neurons to learn more robust and more abstract features.
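
A minimal sketch of the training-time behaviour in NumPy (the inverted-dropout rescaling is added here for clarity; frameworks such as Keras handle this internally):

import numpy as np

def dropout_forward(h, p=0.25, training=True):
    # h: hidden-layer activations; p: fraction of units to drop
    if not training:
        return h
    mask = (np.random.rand(*h.shape) >= p).astype(h.dtype)
    # zero out a random subset of units and rescale the survivors;
    # in the backward pass the same mask zeroes the gradients of the dropped units
    return h * mask / (1.0 - p)

h = np.random.randn(4, 8)
print(dropout_forward(h, p=0.25))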

Training Code
# -*- coding: utf-8 -*-
"""
Created on Thu 11:27:34 2015
@author: lab-liu.longpo
"""
from __future__ import absolute_import
from __future__ import print_function

from keras.models import Sequential
from keras.layers.core import Dense, Dropout, Activation, Flatten
from keras.layers.convolutional import Convolution2D, MaxPooling2D
from keras.optimizers import SGD, Adadelta, Adagrad
from keras.utils import np_utils, generic_utils
import matplotlib.pyplot as plt
import numpy as np
import scipy.io as sio

# load the preprocessed CIFAR-10 data and one-hot encode the labels
d = sio.loadmat('data.mat')
data = d['d']
label = d['l']
data = np.reshape(data, (50000, 3, 32, 32))
label = np_utils.to_categorical(label, 10)
print('finish loading data')

model = Sequential()

# Conv1: 32 kernels of size 5x5, ReLU, dropout 0.25
model.add(Convolution2D(32, 3, 5, 5, border_mode='valid'))
model.add(Activation('relu'))
# model.add(MaxPooling2D(poolsize=(2, 2)))
model.add(Dropout(0.25))

# Conv2: 32 kernels of size 5x5, ReLU, then 2x2 max pooling and dropout 0.25
model.add(Convolution2D(32, 32, 5, 5, border_mode='valid'))
model.add(Activation('relu'))
model.add(MaxPooling2D(poolsize=(2, 2)))
model.add(Dropout(0.25))

# Conv3: 64 kernels of size 3x3, ReLU, then 2x2 max pooling and dropout 0.25
model.add(Convolution2D(64, 32, 3, 3, border_mode='valid'))
model.add(Activation('relu'))
model.add(MaxPooling2D(poolsize=(2, 2)))
model.add(Dropout(0.25))

# fully connected layer with tanh, then softmax over the 10 classes
model.add(Flatten())
model.add(Dense(64*5*5, 32, init='normal'))
model.add(Activation('tanh'))
model.add(Dense(32, 10, init='normal'))
model.add(Activation('softmax'))

sgd = SGD(l2=0.001, lr=0.0065, decay=1e-6, momentum=0.9, nesterov=True)
model.compile(loss='categorical_crossentropy', optimizer=sgd, class_mode="categorical")

# checkpointer = ModelCheckpoint(filepath="weight.hdf5", verbose=1, save_best_only=True)
# model.fit(data, label, batch_size=100, nb_epoch=10, shuffle=True, verbose=1, show_accuracy=True, validation_split=0.2, callbacks=[checkpointer])
result = model.fit(data, label, batch_size=64, nb_epoch=35, shuffle=True, verbose=1,
                   show_accuracy=True, validation_split=0.2)
# model.save_weights(weights, accuracy=False)

# plot the result
plt.figure()
plt.plot(result.epoch, result.history['acc'], label="acc")
plt.plot(result.epoch, result.history['val_acc'], label="val_acc")
plt.scatter(result.epoch, result.history['acc'], marker='*')
plt.scatter(result.epoch, result.history['val_acc'])
plt.legend(loc='lower right')
plt.show()

plt.figure()
plt.plot(result.epoch, result.history['loss'], label="loss")
plt.plot(result.epoch, result.history['val_loss'], label="val_loss")
plt.scatter(result.epoch, result.history['loss'], marker='*')
plt.scatter(result.epoch, result.history['val_loss'], marker='*')
plt.legend(loc='upper right')
plt.show()

Today I put the machine learning and deep learning algorithms I have implemented on GitHub; the datasets, the data preprocessing, the training code, and so on will also be put there. If you find them useful, please give the repository a star.


Copyright notice: this is an original article by the blogger and may not be reproduced without the blogger's permission.
