The history of CNN
In a review of Hinton's 2006 Science paper, it was mentioned that although the concept of deep learning was proposed in 2006, the academic community was still not convinced. There is a story from that time: when one of Hinton's students presented the paper on stage, a machine learning heavyweight in the audience asked dismissively, does your method have a theoretical derivation? Does it have a mathematical foundation? Have you run comparisons against things like SVM? Looking back, the skepticism was not unreasonable: whether it is a mule or a horse, take it out for a walk and see; don't just put forward a concept.
Time finally came to 2012. Hinton's student Alex Krizhevsky, grinding away in his dorm room with GPUs, hammered out a deep learning model and took the ILSVRC 2012 visual recognition crown in one stroke. On the million-scale ImageNet dataset, its accuracy clearly surpassed traditional methods, jumping from around 70% to more than 80%. Personally, I think the song that best matched Hinton's mood at the time was "I Haven't Been Big Brother for Many Years".
This deep learning model is the later famous AlexNet. Why did AlexNet make such a splash? There are three very important reasons:
- Big data: the deep learning field should thank Fei-Fei Li's team for building such a large labeled dataset, ImageNet;
- GPUs: this highly parallel computing artifact provided the raw power; without it in hand, Alex probably could not have trained such a complex model;
- Algorithm improvements, including network depth, data augmentation, ReLU, dropout, and so on, which are described in detail later.
From then on, deep learning took off, and the ILSVRC leaderboard has been swept by deep learning models every year since. As shown in Figure 1, as the models became deeper and deeper, the top-5 error rate kept dropping, and it is currently down to around 3.5%. On the same ImageNet dataset, the human top-5 error rate is about 5.1%, which means the recognition ability of deep learning models has already surpassed the human eye. The models in Figure 1 are also landmark representatives of the development of deep learning for vision.
Figure 1. ILSVRC Top-5 Error rate over the years
Before we look at the model structures in Figure 1, we need to look at LeNet, the network created by LeCun, one of the three giants of deep learning. Why mention LeCun and LeNet? Because the vision models above are all based on the convolutional neural network (CNN), LeCun is the founding father of CNN, and LeNet is the CNN classic he created.
LeNet is named after its author, LeCun. This naming convention is similar to AlexNet; later there appeared networks named after the organization (GoogLeNet, VGG) and after the core algorithm (ResNet). LeNet is sometimes written as LeNet5 or LeNet-5, where the 5 indicates a five-layer model. But don't rush: there is an even older CNN model before LeNet.
The oldest CNN Model
In 1985, Rumelhart, Hinton and others put forward the back-propagation (BP) algorithm [1] (some say 1986, referring to their other paper, "Learning representations by back-propagating errors"), which made the training of neural networks simple and feasible. On Google Scholar this paper has been cited more than 19,000 times, still a little behind Cortes and Vapnik's "Support-vector networks", but given the recent momentum of deep learning, overtaking it is only a matter of time.
A few years later, LeCun used the BP algorithm to train a multilayer neural network to recognize handwritten zip codes [2]. This is the pioneering work of CNN; as shown in Figure 2, it used several 5*5 convolution kernels, but in that paper LeCun only described the 5*5 adjacent regions as receptive fields and did not mention the terms convolution or convolutional neural network. Readers interested in the earliest prototype of CNN can also look at reference [10].
Figure 2. The oldest CNN network structure diagram

LeNet
LeNet5 [4], published in 1998, marked CNN's true debut, but the model did not catch on at the time, mainly because machines were too slow (there were no GPUs yet) and because other algorithms (SVM, honestly, have you tried it?) could achieve similar or even better results.

Figure 3. LeNet Network Structure
Beginners can also refer to the configuration file in Caffe:
https://github.com/BVLC/caffe/blob/master/examples/mnist/lenet.prototxt
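For readers who would rather read Python than prototxt, below is a minimal PyTorch sketch that mirrors the layer layout of that Caffe example (20 and 50 convolution filters of size 5*5, then a 500-unit fully connected layer). Treat it as an illustration under those assumptions, not an official implementation; the original 1998 LeNet-5 used slightly different layer sizes and activations.

```python
# Minimal PyTorch sketch of the LeNet variant from Caffe's lenet.prototxt example.
# Layer sizes follow that example; this is an illustration, not a reference model.
import torch
import torch.nn as nn

class LeNet(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 20, kernel_size=5),        # 1x28x28 -> 20x24x24
            nn.MaxPool2d(kernel_size=2, stride=2),  # -> 20x12x12
            nn.Conv2d(20, 50, kernel_size=5),       # -> 50x8x8
            nn.MaxPool2d(kernel_size=2, stride=2),  # -> 50x4x4
        )
        self.classifier = nn.Sequential(
            nn.Linear(50 * 4 * 4, 500),
            nn.ReLU(inplace=True),
            nn.Linear(500, num_classes),
        )

    def forward(self, x):
        x = self.features(x)
        x = torch.flatten(x, 1)
        return self.classifier(x)

# Sanity check on an MNIST-sized input.
print(LeNet()(torch.randn(1, 1, 28, 28)).shape)  # torch.Size([1, 10])
```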
Comparison of AlexNet, VGG, GoogLeNet and ResNet
LeNet was mainly used to recognize the 10 handwritten digits. Of course, with minor modifications it can also be run on the ImageNet dataset, but the results are poor. The models introduced in the rest of this article are the winners of the ILSVRC competitions over the years; here we specifically compare four of them: AlexNet, VGG, GoogLeNet and ResNet, as shown in Table 1.
| Model name | AlexNet | VGG | GoogLeNet | ResNet |
|---|---|---|---|---|
| Year of debut | 2012 | 2014 | 2014 | 2015 |
| Number of layers | 8 | 19 | 22 | 152 |
| Top-5 error | 16.4% | 7.3% | 6.7% | 3.57% |
| Data augmentation | + | + | + | + |
| Inception (NIN) | – | – | + | – |
| Number of convolutional layers | 5 | 16 | 21 | 151 |
| Convolution kernel sizes | 11, 5, 3 | 3 | 7, 1, 3, 5 | 7, 1, 3, 5 |
| Number of fully connected layers | 3 | 3 | 1 | 1 |
| Fully connected layer sizes | 4096, 4096, 1000 | 4096, 4096, 1000 | 1000 | 1000 |
| Dropout | + | + | + | + |
| Local Response Normalization | + | – | + | – |
| Batch Normalization | – | – | – | + |

Table 1. Comparison of AlexNet, VGG, GoogLeNet and ResNet

AlexNet
Next, straight to the point, the AlexNet structure diagram is as follows:
Figure 4. AlexNet Network Structure

A different perspective:

Figure 5. AlexNet Network Architecture (Simplified)
Compared with traditional CNNs (such as LeNet), what are the major changes in AlexNet?
(1) Data Augmentation
Data augmentation: for this topic, Fei-Fei Li's cs231n course is the best reference. Common data augmentation methods include (a minimal sketch follows the list):
- Random cropping and translation
- Color and lighting transformations
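As a concrete example, here is a small torchvision-based sketch combining the two families above (random crop/flip plus color jitter). The parameter values are placeholders chosen for illustration, not the settings from the AlexNet paper.

```python
# Illustrative training-time augmentation pipeline; parameter values are placeholders.
from torchvision import transforms

train_transform = transforms.Compose([
    transforms.RandomResizedCrop(224),      # random cropping and rescaling
    transforms.RandomHorizontalFlip(),      # mirror images with probability 0.5
    transforms.ColorJitter(brightness=0.4,  # lighting and color perturbation
                           contrast=0.4,
                           saturation=0.4),
    transforms.ToTensor(),
])
```

Each training image then yields a slightly different tensor every epoch, which is where the regularizing effect comes from.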
(2) Dropout

Like data augmentation, dropout is used to prevent overfitting. Dropout should be regarded as a major innovation of AlexNet, so much so that Hinton kept bringing it up in talks for a long time afterwards; a number of variants appeared later, such as DropConnect.
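As a minimal sketch, dropout is usually inserted around the large fully connected layers; the 0.5 rate is the value commonly quoted for AlexNet, and the layer sizes here are just for illustration.

```python
import torch.nn as nn

# During training, nn.Dropout zeroes each activation with probability p and rescales
# the rest; at eval time it is a no-op. This discourages co-adaptation of units.
classifier = nn.Sequential(
    nn.Dropout(p=0.5),
    nn.Linear(4096, 4096),
    nn.ReLU(inplace=True),
    nn.Dropout(p=0.5),
    nn.Linear(4096, 1000),
)
```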
(3) ReLU activation function

Replace the traditional tanh or logistic activation with ReLU. The benefits include:
- ReLU is essentially a piecewise linear model, so the forward computation is very simple and requires no exponential operations;
- ReLU's gradient in back-propagation is also very simple: no exponentials or divisions;
- ReLU is less prone to vanishing gradients: the derivatives of tanh and logistic approach zero at both ends, so after multiplication across many layers the gradient tends to 0, whereas ReLU's gradient does not saturate on the positive side;
- ReLU clamps the negative side to zero, which makes many hidden-layer outputs exactly 0, i.e. the network becomes sparse; this plays a role similar to L1 regularization and alleviates overfitting to some extent.
Of course, ReLU also has drawbacks. For example, hard-clamping the negative side can cause some hidden units to die and never activate again, so improvements such as PReLU and randomized ReLU appeared later; and since ReLU changes the distribution of the data, adding Batch Normalization after ReLU is also a commonly used improvement.
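To make the "cheap forward pass, cheap gradient, no saturation on the positive side" points concrete, here is a tiny NumPy sketch comparing ReLU with the logistic function; it is purely illustrative.

```python
import numpy as np

# ReLU: forward pass is a max, gradient is a 0/1 mask -- no exponentials, no divisions.
def relu(x):
    return np.maximum(0.0, x)

def relu_grad(x):
    return (x > 0).astype(x.dtype)

# Logistic (sigmoid) for comparison: needs an exponential, and its gradient s*(1-s)
# goes to 0 at both ends, which is what makes gradients vanish over many layers.
def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)

x = np.array([-5.0, -1.0, 0.5, 5.0])
print(relu_grad(x))     # [0. 0. 1. 1.]
print(sigmoid_grad(x))  # values shrink toward 0 at both ends of the input range
```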
(4) Local Response Normalization
Local Response Normalization, translated literally as "local response normalization" (LRN), actually normalizes each activation using neighboring data (adjacent feature maps at the same spatial position). This strategy contributed a 1.2% reduction in the top-5 error rate.
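PyTorch exposes this operation as `nn.LocalResponseNorm`. The sketch below plugs in the hyperparameters usually quoted for AlexNet (size 5, alpha 1e-4, beta 0.75, k 2); treat those values as commonly cited ones rather than something verified here.

```python
import torch
import torch.nn as nn

# LRN divides each activation by a term summed over neighboring channels at the
# same spatial position, creating competition between adjacent feature maps.
lrn = nn.LocalResponseNorm(size=5, alpha=1e-4, beta=0.75, k=2.0)

x = torch.randn(1, 96, 55, 55)   # e.g. an AlexNet-like conv1 output
print(lrn(x).shape)              # torch.Size([1, 96, 55, 55]); shape is unchanged
```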
(5) Overlapping Pooling

Overlapping means the pooling windows overlap, i.e. the pooling stride is smaller than the side of the pooling kernel. This strategy contributed a 0.3% reduction in the top-5 error rate.
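In code, overlapping pooling just means choosing a pooling kernel larger than the stride; AlexNet is usually described as using 3*3 windows with stride 2, which the sketch below assumes.

```python
import torch
import torch.nn as nn

# Overlapping pooling: kernel (3) larger than stride (2), so adjacent windows overlap.
overlapping_pool = nn.MaxPool2d(kernel_size=3, stride=2)
# Non-overlapping pooling for comparison: kernel equals stride.
plain_pool = nn.MaxPool2d(kernel_size=2, stride=2)

x = torch.randn(1, 96, 55, 55)
# The output sizes happen to coincide here; the difference is that each 3x3 window
# shares a row/column of inputs with its neighbors.
print(overlapping_pool(x).shape)  # torch.Size([1, 96, 27, 27])
print(plain_pool(x).shape)        # torch.Size([1, 96, 27, 27])
```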
(6) Multi-GPU parallelism

There is not much to say about this: it is simply more raw computing power.
VGG

The VGG structure diagram:

Figure 6. VGG Series Network Structures

Looking at VGG-19 from a different perspective:

Figure 7. VGG-19 Network Architecture (Simplified)
VGG inherited the mantle of AlexNet well, and its approach can be summed up in one word: deep. Or two words: even deeper.
GoogLeNet

Figure 8. GoogLeNet Network Structure

GoogLeNet sticks to the motto: there is no deepest, only deeper.
Its main innovation is the Inception module, a network-in-network structure: each node of the original network is itself a small network. Inception has kept evolving, with v2, v3 and v4 versions; interested readers can consult the related literature. The Inception structure is shown in Figure 9, where the 1*1 convolutions are mainly used to reduce dimensionality. With Inception, both the width and the depth of the whole network can be scaled up, bringing a 2-3x performance improvement.
Figure 9. Inception Structure
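Below is a minimal PyTorch sketch of an Inception-style module with four parallel branches, where 1*1 convolutions shrink the channel count before the more expensive 3*3 and 5*5 convolutions. The branch widths are illustrative choices, not taken from the GoogLeNet paper.

```python
import torch
import torch.nn as nn

class InceptionBlock(nn.Module):
    """Four parallel branches whose outputs are concatenated along the channel axis."""
    def __init__(self, in_ch):
        super().__init__()
        self.b1 = nn.Conv2d(in_ch, 64, kernel_size=1)          # plain 1x1 branch
        self.b3 = nn.Sequential(                                # 1x1 reduce, then 3x3
            nn.Conv2d(in_ch, 96, kernel_size=1),
            nn.Conv2d(96, 128, kernel_size=3, padding=1),
        )
        self.b5 = nn.Sequential(                                # 1x1 reduce, then 5x5
            nn.Conv2d(in_ch, 16, kernel_size=1),
            nn.Conv2d(16, 32, kernel_size=5, padding=2),
        )
        self.pool = nn.Sequential(                              # 3x3 max pool, then 1x1
            nn.MaxPool2d(kernel_size=3, stride=1, padding=1),
            nn.Conv2d(in_ch, 32, kernel_size=1),
        )

    def forward(self, x):
        return torch.cat([self.b1(x), self.b3(x), self.b5(x), self.pool(x)], dim=1)

print(InceptionBlock(192)(torch.randn(1, 192, 28, 28)).shape)  # torch.Size([1, 256, 28, 28])
```

Note how the 1*1 convolutions keep the 3*3 and 5*5 branches cheap while the concatenation widens the layer.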
ResNet

The ResNet network structure is shown in Figure 10.
Figure 10. ResNet Network Structure
ResNet sticks to the same motto: there is no deepest, only deeper (152 layers). It is said that versions with more than 1000 layers have already been tried.
The main innovation is the residual unit, shown in Figure 11. The proposed network essentially addresses the problem that very deep networks are hard to train. Borrowing the idea of Highway Networks, it is equivalent to opening a shortcut beside the layers so that the input flows directly to the output, and the optimization target is converted from the original expected mapping H(x) to the residual H(x) - x, where H(x) is the originally expected mapping of a group of layers and x is its input.
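A minimal sketch of such a residual block is shown below: the stacked layers only have to fit the residual F(x) = H(x) - x, and the shortcut adds x back to form the output. The block keeps the channel count fixed and omits the 1*1 projection shortcuts and bottleneck design of the actual ResNet.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ResidualBlock(nn.Module):
    """y = F(x) + x: the convolutions learn the residual, the shortcut carries x."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return F.relu(out + x)   # the shortcut lets the input flow straight to the output

print(ResidualBlock(64)(torch.randn(1, 64, 56, 56)).shape)  # torch.Size([1, 64, 56, 56])
```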
Figure 11. ResNet Residual Structure

Summary
Along the way with deep learning, we have gradually realized that the model itself is at the heart of deep learning research, and the LeNet, AlexNet, GoogLeNet, VGG and ResNet reviewed here are classics among the classics.
With AlexNet's rise to fame in 2012, CNN became the method of choice for computer vision applications. CNNs now have many other tricks, such as the R-CNN series; please look forward to the next issue of the #Deep Learning Review# series on my "I Love Machine Learning" website (52ml.net).
This article is only a brief review; please forgive any omissions. If you are interested, you can join the QQ group to learn together: 252085834
References

[1] D. E. Rumelhart, G. E. Hinton, R. J. Williams. Learning internal representations by error propagation. 1985. DTIC Document.
[2] Y. LeCun, B. Boser, J. S. Denker, D. Henderson, R. Howard, W. Hubbard, and L. D. Jackel. Backpropagation applied to handwritten zip code recognition. Neural Computation, vol. 1, no. 4, pp. 541-551, 1989.
[3] Kaiming He. Deep residual learning. http://image-net.org/challenges/talks/ilsvrc2015_deep_residual_learning_kaiminghe.pdf
[4] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278-2324, 1998.
[5] A. Krizhevsky, I. Sutskever, and G. Hinton. ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems, pages 1106-1114, 2012.
[6] Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott E. Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, Andrew Rabinovich. Going deeper with convolutions. CVPR 2015: 1-9.
[7] Karen Simonyan, Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. CoRR abs/1409.1556 (2014).
[8] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
[9] Some corresponding Caffe implementations and pre-trained models: https://github.com/BVLC/caffe and https://github.com/BVLC/caffe/wiki/Model-Zoo
[10] K. Fukushima. Neocognitron: A self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position. Biological Cybernetics, 36(4):193-202, 1980.
#Deep Learning Review# LeNet, AlexNet, GoogLeNet, VGG, ResNet