In this article I summarize the CNN models commonly used for image classification on three datasets: CIFAR10 (object recognition), MNIST (character recognition) and ImageNet (object recognition).
This article does not cover coding (for code, see the article on convolutional neural network (CNN) principles and implementation). It contains no company-internal information; it is purely a summary of public material.
OK, we start from the datasets. Readers unfamiliar with them should first get to know the three datasets; below, the common model for each dataset is drawn in turn.
===================================
1. CIFAR10
60000 color images of size 32*32 in 10 classes; each class has 5000 images for training and 1000 for testing. Usually used for object recognition/classification.
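As a quick sanity check on the figures just quoted, the short sketch below multiplies them out. The only assumption not stated above is that "color" means 3 channels (RGB).

```python
# Split arithmetic for CIFAR10, using the per-class figures quoted above.
NUM_CLASSES = 10
TRAIN_PER_CLASS = 5000
TEST_PER_CLASS = 1000
HEIGHT, WIDTH = 32, 32
CHANNELS = 3  # assumption: "color" is taken to mean RGB


def cifar10_counts():
    """Return total image counts and the number of values per image."""
    train = NUM_CLASSES * TRAIN_PER_CLASS
    test = NUM_CLASSES * TEST_PER_CLASS
    return {
        "train": train,
        "test": test,
        "total": train + test,
        "values_per_image": HEIGHT * WIDTH * CHANNELS,
    }


print(cifar10_counts())
```

The totals come out to 50000 training and 10000 test images, which matches the 60000 stated for the whole dataset.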
Model: (the number written above each layer is the number of nodes in that layer)
2. MNIST
Black-and-white images of handwritten digits: 60000 for training and 10000 for testing, already cropped to 28*28, used for classification.
LeNet model:
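To make the layer sizes concrete, here is a rough sketch that traces a 28*28 input through a LeNet-style network. The layer settings (20 and 50 filters of 5*5, 2*2 pooling with stride 2, then 500 and 10 fully connected nodes) are an assumption taken from the LeNet variant in Caffe's MNIST example and may not match the figure in every detail.

```python
def conv_out(size, kernel, stride=1, pad=0):
    """Spatial output size of a convolution along one dimension."""
    return (size + 2 * pad - kernel) // stride + 1


def pool_out(size, kernel, stride):
    """Spatial output size of pooling (the sizes used here divide evenly)."""
    return (size - kernel) // stride + 1


def lenet_shapes(input_size=28):
    """Trace (channels, height, width) through an assumed LeNet layout."""
    shapes = [("input", (1, input_size, input_size))]
    s = conv_out(input_size, kernel=5)      # conv1: 20 filters, 5*5 (assumed)
    shapes.append(("conv1", (20, s, s)))
    s = pool_out(s, kernel=2, stride=2)     # pool1: 2*2, stride 2 (assumed)
    shapes.append(("pool1", (20, s, s)))
    s = conv_out(s, kernel=5)               # conv2: 50 filters, 5*5 (assumed)
    shapes.append(("conv2", (50, s, s)))
    s = pool_out(s, kernel=2, stride=2)     # pool2: 2*2, stride 2 (assumed)
    shapes.append(("pool2", (50, s, s)))
    shapes.append(("ip1", (500,)))          # fully connected, 500 nodes (assumed)
    shapes.append(("ip2", (10,)))           # one output per digit class
    return shapes


for name, shape in lenet_shapes():
    print(name, shape)
```

Under these assumptions the spatial size shrinks 28, 24, 12, 8, 4, so the first fully connected layer sees 50*4*4 = 800 inputs.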
3. ImageNet
A large dataset with on the order of 100,000 classes (10w) and about 1000 color images per class; registration is required to download it. An ImageNet competition has been held every year since 2010, divided into detection, classification & localization. For the 2014 competition results and methods, see here.
3.1 AlexNet
Model:
But that figure does not show the details, so today I indulged myself and drew the output size of each layer together with its corresponding operation. I think it does not look as clear as the figure above, but it gives a deeper understanding of each step of the operation ...
Read it from the bottom up: the bottom is the input data (note that the 224 in the figure above is wrong; the cropped image is actually 227*227).
PS: the crops taken from each picture are the four corner crops plus the center crop.
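The four-corner-plus-center cropping can be sketched as follows. The function name `five_crops` and the 256*256 source image size are illustrative assumptions; only the 227*227 crop size comes from the text.

```python
def five_crops(image_h, image_w, crop):
    """Return (top, left) offsets for four corner crops and one center crop."""
    if crop > image_h or crop > image_w:
        raise ValueError("crop size must not exceed the image size")
    max_top = image_h - crop
    max_left = image_w - crop
    return [
        (0, 0),                          # top-left corner
        (0, max_left),                   # top-right corner
        (max_top, 0),                    # bottom-left corner
        (max_top, max_left),             # bottom-right corner
        (max_top // 2, max_left // 2),   # center
    ]


# Assumed 256*256 source image, cropped to 227*227 as stated in the text.
for top, left in five_crops(256, 256, 227):
    print("rows", top, "to", top + 227, "cols", left, "to", left + 227)
```

Each offset marks the top-left pixel of one 227*227 window, so a single image yields five inputs for the network.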
Data format of each layer: (batch size, # feature maps, feature height, feature width)
Format of each convolution (conv): (# output feature maps, # feature maps being convolved, kernel height, kernel width)
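The two formats fit together as in the sketch below, which computes the output data shape from an input data shape and a convolution shape. The first-layer numbers in the example (96 kernels of 11*11 with stride 4) are an assumption taken from the commonly published AlexNet configuration, the batch size of 10 is arbitrary, and grouped convolutions are ignored for simplicity.

```python
def conv_output_shape(data_shape, weight_shape, stride=1, pad=0):
    """Combine a (batch, maps, h, w) input with an (out, in, kh, kw) convolution."""
    n, c, h, w = data_shape
    out_c, in_c, kh, kw = weight_shape
    if c != in_c:
        raise ValueError("input feature maps must match the convolution's input maps")
    out_h = (h + 2 * pad - kh) // stride + 1
    out_w = (w + 2 * pad - kw) // stride + 1
    return (n, out_c, out_h, out_w)


# Assumed first layer: 227*227 RGB crops, 96 kernels of 11*11, stride 4.
print(conv_output_shape((10, 3, 227, 227), (96, 3, 11, 11), stride=4))
# prints (10, 96, 55, 55)
```

This also shows why the 227 correction matters: (227 - 11) / 4 + 1 gives exactly 55, whereas 224 does not divide evenly with these settings.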
Here we see that the final fc8 (layer 8, fully connected) is compared against the label in a loss layer. It has one output per category and uses softmax loss as the loss function. That is the setup for optimizing the parameters during training; so how is testing done?
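For one sample, softmax loss is the negative log of the softmax probability assigned to the true class. A minimal sketch in plain Python follows; the function name `softmax_loss` and the example scores are illustrative, not taken from any framework.

```python
import math


def softmax_loss(scores, label):
    """Negative log softmax probability of the true class for one sample."""
    m = max(scores)  # subtract the max before exponentiating, for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(s - m) for s in scores))
    return log_sum_exp - scores[label]


# Hypothetical fc8 outputs for a 3-class problem, true class 0.
print(softmax_loss([2.0, 1.0, 0.1], label=0))
```

With equal scores for all K categories the loss is log(K), which is a handy check on the value seen at the very start of training.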
——
At test time, the final fc8 features pass through a probability layer whose type is Softmax, and the class with the highest probability is taken as the result.
To evaluate the system as a whole, you can additionally add an accuracy layer, whose type is Accuracy.
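The test-time path can be sketched in the same spirit: turn scores into probabilities, pick the most probable class, then count the fraction of correct picks. The helper names and the sample scores are illustrative assumptions, and accuracy here means top-1 accuracy.

```python
import math


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict(scores):
    """Index of the class with the highest softmax probability."""
    probs = softmax(scores)
    return max(range(len(probs)), key=probs.__getitem__)


def accuracy(batch_scores, labels):
    """Fraction of samples whose predicted class equals the label."""
    correct = sum(1 for s, y in zip(batch_scores, labels) if predict(s) == y)
    return correct / len(labels)


# Hypothetical scores for three samples and their true labels.
batch = [[2.0, 1.0, 0.1], [0.1, 0.2, 3.0], [1.0, 5.0, 0.0]]
print(accuracy(batch, [0, 2, 0]))
```

Because softmax preserves the ordering of the scores, the predicted class is the same as taking the largest raw score; the probabilities matter only when a confidence value is needed.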
3.2 GoogLeNet (2014)
Winner of the 2014 ImageNet classification & detection tasks, a 22-layer network ... I bow down. Interested readers can see the structure in the paper; I could not fit it into a screenshot here ...
In addition, here are a few references:
1. For beginners to play with: you can try an online ConvNet
2. DIY Deep Learning Architecture
3. In fact, the best reference is still paper + code; for the architectures above, see the imagenet example prototxt in Caffe.
from:http://blog.csdn.net/abcjennifer/article/details/42493493