ImageNet Classification with Deep Convolutional Neural Networks (reading notes)
(From now on, each time I read a paper, I will record the notes on this blog.)
This paper, published at NIPS 2012 by Hinton and his students, was a response to doubts about deep learning: they applied a deep network to ImageNet, the largest image-recognition database, and achieved very surprising results, far better than the previous state of the art (the top-5 error rate dropped from about 25% to about 17%).
ImageNet currently contains about 22,000 categories and roughly 15 million labeled images. The most commonly used subset, from the ILSVRC-2010 contest, covers 1,000 classes and about 1.2 million training images; the 17% top-5 error rate reported in this paper is measured on this set. The paper gives the structure of the whole deep network:
The network has eight layers in total: the first five are convolutional and the last three are fully connected, with the final layer feeding a 1000-way softmax (the number of output nodes equals the number of categories, 1000). A single-model sketch of the architecture is given after the list below. In the concrete implementation, the paper makes several structural improvements:

1. ReLU replaces the traditional tanh nonlinearity; because it does not saturate, training converges much faster.
2. Two GPUs compute in parallel and read each other's memory directly, avoiding extra transfers through host memory; structurally, some adjacent layers placed on different GPUs have no connections between them, which further improves training speed.
3. Local response normalization across adjacent kernel maps improves the recognition rate (top-5 error rate reduced by 1.2%); see the NumPy sketch below.
4. Overlapping pooling (pooling window larger than the stride) reduces the top-5 error rate by 0.3%.

In addition, to reduce overfitting, the paper uses two techniques:

1. Data augmentation: horizontal reflections and random translations (crops) increase the training data to 2048 times its original size, and a PCA-based perturbation of the RGB pixel values constructs further new samples (this reduces the top-1 error rate by over 1%); an augmentation sketch is given below.
2. Dropout in the fully connected layers (sketched below).

Optimization: mini-batch SGD with 128 samples per batch, momentum = 0.9, and weight decay = 0.0005; the exact update rule is sketched below.
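For concreteness, here is a minimal single-model sketch of the eight-layer architecture in PyTorch (my own rendering, not the paper's cuda-convnet implementation; the paper splits conv2 through conv5 and the fully connected layers across two GPUs, which this merged version omits):

```python
import torch
import torch.nn as nn

# Merged single-GPU version of the eight-layer network (similar to
# torchvision's AlexNet); LRN parameters are the paper's values.
alexnet = nn.Sequential(
    nn.Conv2d(3, 96, kernel_size=11, stride=4), nn.ReLU(),
    nn.LocalResponseNorm(size=5, alpha=1e-4, beta=0.75, k=2.0),
    nn.MaxPool2d(kernel_size=3, stride=2),           # overlapping: window 3, stride 2
    nn.Conv2d(96, 256, kernel_size=5, padding=2), nn.ReLU(),
    nn.LocalResponseNorm(size=5, alpha=1e-4, beta=0.75, k=2.0),
    nn.MaxPool2d(kernel_size=3, stride=2),
    nn.Conv2d(256, 384, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(384, 384, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(384, 256, kernel_size=3, padding=1), nn.ReLU(),
    nn.MaxPool2d(kernel_size=3, stride=2),
    nn.Flatten(),
    nn.Dropout(0.5), nn.Linear(256 * 6 * 6, 4096), nn.ReLU(),
    nn.Dropout(0.5), nn.Linear(4096, 4096), nn.ReLU(),
    nn.Linear(4096, 1000),                           # 1000-way output (softmax in the loss)
)

x = torch.randn(1, 3, 227, 227)   # the paper says 224x224 crops; 227 makes the sizes work out
print(alexnet(x).shape)           # torch.Size([1, 1000])
```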
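The local response normalization scheme divides each activation by a term summed over adjacent kernel maps at the same spatial position: b^i = a^i / (k + alpha * sum_j (a^j)^2)^beta, with k = 2, n = 5, alpha = 1e-4, beta = 0.75 as in the paper. A plain NumPy rendering of that formula (my sketch, not the paper's code):

```python
import numpy as np

def local_response_norm(a, k=2.0, n=5, alpha=1e-4, beta=0.75):
    """Cross-map normalization from the paper:
    b[i] = a[i] / (k + alpha * sum_j a[j]**2) ** beta,
    where j runs over the n kernel maps adjacent to map i.
    `a` has shape (channels, height, width)."""
    C = a.shape[0]
    b = np.empty_like(a, dtype=np.float64)
    for i in range(C):
        lo = max(0, i - n // 2)
        hi = min(C, i + n // 2 + 1)
        denom = (k + alpha * np.sum(a[lo:hi] ** 2, axis=0)) ** beta
        b[i] = a[i] / denom
    return b

a = np.random.rand(96, 55, 55)    # e.g. conv1 output: 96 maps of 55x55
print(local_response_norm(a).shape)
```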
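The factor of 2048 is simple arithmetic: a 224x224 crop of a 256x256 image can sit at (256 - 224)^2 = 1024 positions, doubled by horizontal reflection. A sketch of crop, flip, and PCA colour jitter follows; note the paper computes the PCA over the whole training set, whereas this self-contained version does it per image, and the 0.1 standard deviation for the jitter coefficients is the paper's value:

```python
import numpy as np

def augment(img, crop=224, sigma=0.1, rng=np.random):
    """Training-time augmentation in the spirit of the paper: a random
    224x224 crop of a 256x256 image, a random horizontal flip, and
    PCA colour jitter on the RGB values. `img` is (256, 256, 3)."""
    h, w, _ = img.shape
    top = rng.randint(0, h - crop + 1)
    left = rng.randint(0, w - crop + 1)
    patch = img[top:top + crop, left:left + crop].astype(np.float64)
    if rng.rand() < 0.5:
        patch = patch[:, ::-1]                      # horizontal reflection
    # PCA jitter: eigendecompose the 3x3 covariance of the RGB values
    # (per image here, over the whole training set in the paper), then
    # add eigvecs @ (alpha * eigvals) with alpha ~ N(0, sigma).
    cov = np.cov(patch.reshape(-1, 3), rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)
    patch = patch + eigvecs @ (rng.normal(0.0, sigma, 3) * eigvals)
    return patch
```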
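Dropout here is the paper's original variant: each hidden activation is zeroed with probability 0.5 during training, and at test time all units are kept and their outputs multiplied by 0.5 (unlike the now-common inverted dropout, which rescales at training time). A NumPy sketch:

```python
import numpy as np

def dropout(x, p=0.5, train=True, rng=np.random):
    """Paper's dropout: during training, zero each activation with
    probability p; at test time, keep all units and scale by (1 - p)."""
    if train:
        return x * (rng.rand(*x.shape) >= p)
    return x * (1.0 - p)
```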
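The paper states the momentum/weight-decay update rule explicitly: v <- 0.9*v - 0.0005*lr*w - lr*grad, then w <- w + v. In NumPy:

```python
import numpy as np

def sgd_step(w, grad, v, lr, momentum=0.9, weight_decay=0.0005):
    """The paper's mini-batch SGD update (batch size 128):
        v <- 0.9 * v - 0.0005 * lr * w - lr * grad
        w <- w + v
    `grad` is the mini-batch-averaged gradient of the loss w.r.t. w."""
    v[:] = momentum * v - weight_decay * lr * w - lr * grad
    w += v
    return w, v
```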
Weights and biases are randomly initialized (weights from a zero-mean Gaussian with standard deviation 0.01; some biases are set to 1; see the paper and the sketch below).
Paper link: http://books.nips.cc/papers/files/nips25/NIPS2012_0534.pdf
Source code: http://code.google.com/p/cuda-convnet/
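As a footnote on those random parameters: the paper draws every weight from a zero-mean Gaussian with standard deviation 0.01 and initializes the biases of the second, fourth, and fifth convolutional layers and of the fully connected hidden layers to 1 (the rest to 0). A sketch with an illustrative helper name:

```python
import numpy as np

def init_layer(shape, bias_one=False, rng=np.random):
    """Initialization reported in the paper: weights ~ N(0, 0.01^2);
    biases are 1 in conv2/conv4/conv5 and the fully connected layers
    (giving the ReLUs positive inputs early on), 0 elsewhere."""
    w = rng.normal(0.0, 0.01, size=shape)
    b = np.ones(shape[0]) if bias_one else np.zeros(shape[0])
    return w, b
```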