ImageNet Classification with Deep Convolutional Neural Networks: reading notes (2013-07-06 22:16:36)
Reprint
Tags: deep_learning imagenet Hinton
Category: machine learning
(Having decided to take notes on every paper I read, I am recording them on this blog.)
This paper, published at NIPS 2012, is Hinton and his students (Alex Krizhevsky and Ilya Sutskever) applying deep learning to ImageNet (currently the largest image recognition database) in answer to the doubts surrounding deep learning. The results are very impressive, far better than the previous state of the art: the top-5 error rate drops from about 25% to 17%.
ImageNet currently contains about 22,000 categories of labeled images, roughly 15 million images in total. The most commonly used subset, from the ILSVRC-2010 contest, covers 1,000 classes with about 1.2 million training images. The 17% top-5 error rate reported in this paper was obtained on that test set.
The overall structure of the deep net (the paper's architecture figure) is as follows:
There are 8 weight layers in total: the first 5 are convolutional and the last 3 are fully connected; the output of the final fully connected layer feeds a softmax that produces the decision (the number of output nodes equals the number of categories, 1000).
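To make this concrete, here is a minimal single-GPU sketch of the 8-layer stack. It is written in PyTorch rather than the paper's own cuda-convnet code, so treat it as an illustrative reconstruction: the layer sizes follow the paper, and I assume 224×224 RGB inputs (conv1 gets padding=2 so the arithmetic gives the paper's 55×55 maps).

```python
# Minimal AlexNet-style sketch in PyTorch (illustrative, not the authors' code).
import torch
import torch.nn as nn

class AlexNetSketch(nn.Module):
    def __init__(self, num_classes: int = 1000):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 96, kernel_size=11, stride=4, padding=2),   # conv1
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),   # overlapping pooling
            nn.Conv2d(96, 256, kernel_size=5, padding=2),            # conv2
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
            nn.Conv2d(256, 384, kernel_size=3, padding=1),           # conv3
            nn.ReLU(inplace=True),
            nn.Conv2d(384, 384, kernel_size=3, padding=1),           # conv4
            nn.ReLU(inplace=True),
            nn.Conv2d(384, 256, kernel_size=3, padding=1),           # conv5
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
        )
        self.classifier = nn.Sequential(
            nn.Dropout(p=0.5),
            nn.Linear(256 * 6 * 6, 4096),            # fc6
            nn.ReLU(inplace=True),
            nn.Dropout(p=0.5),
            nn.Linear(4096, 4096),                   # fc7
            nn.ReLU(inplace=True),
            nn.Linear(4096, num_classes),            # fc8 -> 1000-way softmax logits
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.features(x)           # (N, 256, 6, 6) for 224x224 inputs
        x = torch.flatten(x, 1)
        return self.classifier(x)      # apply softmax/cross-entropy on these logits
```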
In terms of concrete implementation, the paper makes several improvements to this structure (a small sketch of some of them follows this list):
1. ReLU replaces the traditional tanh as the nonlinearity, which greatly speeds up training.
2. Two GPUs are used for parallel computation (modern GPUs can read and write each other's memory directly, avoiding transfers through host memory); moreover, nodes in adjacent layers that live on different GPUs are connected only in certain layers, which further improves training speed.
3. Local normalization of the responses of adjacent kernels improves the recognition rate (top-5 error rate reduced by 1.2%).
4. Overlapping pooling (top-5 error rate reduced by 0.3%).
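Two of these tricks are easy to demonstrate in isolation. In the PyTorch sketch below (again an assumption-laden illustration, not the paper's code), the restricted two-GPU connectivity is expressed as a grouped convolution with groups=2, local response normalization uses nn.LocalResponseNorm, and overlapping pooling means a stride smaller than the window. Note that PyTorch scales alpha by 1/size internally, so the constants here are nominal rather than an exact match to the paper's formula.

```python
import torch
import torch.nn as nn

x = torch.randn(1, 96, 27, 27)           # stand-in for conv1 output after pooling

# Restricted connectivity: each half of the 256 output maps sees only one half
# of the 96 input maps, mimicking the two-GPU split in the paper.
conv2_split = nn.Conv2d(96, 256, kernel_size=5, padding=2, groups=2)

# Local response normalization; n=5, alpha=1e-4, beta=0.75, k=2 per the paper
# (nominal values: PyTorch divides alpha by `size` inside the formula).
lrn = nn.LocalResponseNorm(size=5, alpha=1e-4, beta=0.75, k=2.0)

# Overlapping pooling: 3x3 windows with stride 2 (stride < window size).
pool = nn.MaxPool2d(kernel_size=3, stride=2)

y = pool(lrn(torch.relu(conv2_split(x))))
print(y.shape)                            # torch.Size([1, 256, 13, 13])
```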
In addition, to reduce over-fitting, the paper uses two techniques (a sketch of the augmentation follows this list):
1. Data augmentation: the training images are horizontally reflected and translated, enlarging the training set by a factor of 2048; on top of that, a PCA transformation of the RGB pixel values is used to construct new samples (this mechanism lowers the top-1 error rate by over 1%).
2. Dropout: during training, the output of each hidden neuron in the first two fully connected layers is set to zero with probability 0.5; this roughly doubles the number of iterations needed to converge, but substantially reduces over-fitting.
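The factor of 2048 comes from taking 224×224 crops of 256×256 images (32 × 32 translation offsets) times 2 horizontal reflections: 32 × 32 × 2 = 2048. The PCA colour trick can be sketched in a few lines of numpy; the sigma = 0.1 value follows the paper, while the array shapes and helper names here are my own assumptions.

```python
import numpy as np

def pca_color_augment(image: np.ndarray, eigvals: np.ndarray,
                      eigvecs: np.ndarray, sigma: float = 0.1) -> np.ndarray:
    """Add alpha_i * lambda_i * p_i to every pixel, alpha_i ~ N(0, sigma^2).

    The paper draws the alphas once per image per pass over the data.
    """
    alphas = np.random.normal(0.0, sigma, size=3)
    shift = eigvecs @ (alphas * eigvals)   # one RGB offset for the whole image
    return image + shift                   # broadcasts over height and width

# Estimating the RGB eigen-decomposition from the training set
# (random data here as a stand-in for the real images):
images = np.random.rand(100, 224, 224, 3)       # (N, H, W, 3) float RGB
pixels = images.reshape(-1, 3)
cov = np.cov(pixels, rowvar=False)              # 3x3 covariance of RGB values
eigvals, eigvecs = np.linalg.eigh(cov)
augmented = pca_color_augment(images[0], eigvals, eigvecs)
```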
Optimization algorithm: mini-batch SGD with a batch size of 128, momentum = 0.9, and weight decay = 0.0005 (a one-step sketch follows below).
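One training step with these settings might look like the sketch below, reusing the AlexNetSketch class from above. Per the paper, the learning rate started at 0.01 and was divided by 10 whenever validation error stopped improving; note that PyTorch's SGD uses a slightly different momentum formulation than the paper's update rule v ← 0.9 v − 0.0005 ε w − ε ∂L/∂w, w ← w + v, so this is an approximation in spirit.

```python
import torch

model = AlexNetSketch()                           # sketch class defined above
optimizer = torch.optim.SGD(model.parameters(), lr=0.01,
                            momentum=0.9, weight_decay=5e-4)
criterion = torch.nn.CrossEntropyLoss()

# One mini-batch of 128 images (random stand-ins for real training data).
batch = torch.randn(128, 3, 224, 224)
labels = torch.randint(0, 1000, (128,))

optimizer.zero_grad()
loss = criterion(model(batch), labels)
loss.backward()
optimizer.step()
```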
The weights and biases are randomly initialized (see the paper for the specific distributions; a sketch follows below).
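For reference, the paper draws every weight from a zero-mean Gaussian with standard deviation 0.01, sets the biases of the 2nd, 4th and 5th convolutional layers and of the fully connected hidden layers to 1 (to give the ReLUs positive inputs early on), and sets the remaining biases to 0. A sketch against the AlexNetSketch class above; the Sequential indices are specific to that sketch, not to the paper.

```python
import torch.nn as nn

def init_weights(module: nn.Module) -> None:
    # All weights ~ N(0, 0.01^2), all biases 0 to start with.
    if isinstance(module, (nn.Conv2d, nn.Linear)):
        nn.init.normal_(module.weight, mean=0.0, std=0.01)
        nn.init.constant_(module.bias, 0.0)

model = AlexNetSketch()
model.apply(init_weights)

# Then set the listed biases to 1 (indices valid for the sketch above only).
for idx in (3, 8, 10):                 # conv2, conv4, conv5 in model.features
    nn.init.constant_(model.features[idx].bias, 1.0)
for idx in (1, 4):                     # fc6, fc7 in model.classifier
    nn.init.constant_(model.classifier[idx].bias, 1.0)
```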
Paper Link: http://books.nips.cc/papers/files/nips25/NIPS2012_0534.pdf
Source Address: http://code.google.com/p/cuda-convnet/