Reference.
Krizhevsky A, Sutskever I, Hinton G E. ImageNet classification with deep convolutional neural networks [J]. Advances in Neural Information Processing Systems, 2012, 25.
https://code.google.com/p/cuda-convnet/
I have to admit, a little ashamedly, that after nearly five months of studying deep learning it was only during the past few weeks of paper reading that I really noticed AlexNet. Of course I had to use it: LeNet is fine, but that structure is almost 20 years old, while AlexNet is from 2012, so no matter how you look at it, it has to be better, right? Here is a little history of network structures:
This AlexNet is what I'm going to talk about today.
AlexNet was designed by Alex Krizhevsky, the winner of the 2012 ImageNet competition. So what is the difference between this network structure and LeNet?
1 Components of a convolutional neural network
The routine is the usual one: let's first introduce the building blocks of a deep-learning convolutional neural network (ConvNet).
1.1 Convolutional layer
Not much to say here. If you have studied signal processing you roughly know what a convolution is; if you don't, you can refer to... er, actually, don't refer to anything, go learn the basics first ~ The point worth making is that the convolution process is a pretty good simulation of the human visual system. My teacher said that what people do when they look at things is a convolution process; that one I can't guarantee ~
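For concreteness, here is a minimal NumPy sketch of what a single-channel convolution computes; the 8*8 image, 3*3 kernel and stride of 1 are made-up illustration values, not anything taken from AlexNet.

```python
import numpy as np

def conv2d(image, kernel, stride=1):
    """Slide a kernel over a 2-D image and sum the element-wise products.
    A minimal, single-channel illustration of what a conv layer computes."""
    kh, kw = kernel.shape
    oh = (image.shape[0] - kh) // stride + 1
    ow = (image.shape[1] - kw) // stride + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            patch = image[i * stride:i * stride + kh, j * stride:j * stride + kw]
            out[i, j] = np.sum(patch * kernel)
    return out

image = np.random.rand(8, 8)          # a toy 8*8 "image"
kernel = np.random.rand(3, 3)         # a toy 3*3 filter
print(conv2d(image, kernel).shape)    # -> (6, 6)
```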
The dynamic process of convolution
1.2 Down-sampling layer (pooling layer)
Down-sampling replaces a whole region with a single value. That value can be the region's average, its maximum, its minimum, and so on; anything representative will do. The purpose of this layer is to reduce the amount of data.
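A quick NumPy sketch of max pooling, assuming a 2*2 window with stride 2 just for illustration:

```python
import numpy as np

def max_pool(feature_map, size=2, stride=2):
    """Replace each size*size region with its maximum value,
    shrinking the feature map and reducing the amount of data."""
    oh = (feature_map.shape[0] - size) // stride + 1
    ow = (feature_map.shape[1] - size) // stride + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            region = feature_map[i * stride:i * stride + size,
                                 j * stride:j * stride + size]
            out[i, j] = region.max()   # swap in .mean() or .min() for other pooling flavours
    return out

fm = np.arange(16, dtype=float).reshape(4, 4)
print(max_pool(fm))                    # a 4*4 map becomes 2*2
```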
The down-sampling process
1.3 Activation function layer (activation layer)
The purpose of the activation function is to squash the result of the convolution into a fixed range, so that the range of values passed from one layer to the next stays controllable. Some common activation functions (a small sketch follows the list):
- Sigmoid: keeps values in [0, 1]
- Tanh: keeps values in [-1, 1]
- ReLU: keeps values in [0, +∞)
- There are plenty of newer activation functions; I won't list them here. Knowing what their role is, is enough.
The one used here, in AlexNet, is the ReLU activation function.
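Here is a tiny NumPy sketch of the three functions listed above, just to show how each squashes its input:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))    # squashes values into [0, 1]

def tanh(x):
    return np.tanh(x)                  # squashes values into [-1, 1]

def relu(x):
    return np.maximum(0, x)            # clips negatives to 0, keeps positives as-is

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(sigmoid(x))
print(tanh(x))
print(relu(x))
```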
1.4 Normalization layer
Nothing much to it: just a formula that normalizes the responses.
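For what it's worth, in AlexNet that formula is local response normalization (LRN), which normalizes each channel by its neighbouring channels. A rough NumPy sketch is below; the constants (n=5, k=2, alpha=1e-4, beta=0.75) are the ones I recall from the paper, so treat them as an assumption.

```python
import numpy as np

def local_response_norm(a, n=5, k=2.0, alpha=1e-4, beta=0.75):
    """Rough sketch of AlexNet-style LRN:
    b_i = a_i / (k + alpha * sum over n neighbouring channels of a_j^2) ** beta"""
    channels = a.shape[0]
    b = np.zeros_like(a)
    for i in range(channels):
        lo = max(0, i - n // 2)
        hi = min(channels, i + n // 2 + 1)
        denom = (k + alpha * np.sum(a[lo:hi] ** 2, axis=0)) ** beta
        b[i] = a[i] / denom
    return b

a = np.random.rand(8, 6, 6)            # 8 channels of a toy 6*6 feature map
print(local_response_norm(a).shape)    # shape is unchanged, only the values are rescaled
```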
1.5 Fully connected layer
The fully connected layer feels just like a traditional artificial neural network: every node is connected to every other node through some weight. This layer usually appears near the end of the CNN; it is very long, and its output can be used as a feature vector for the image. Some papers feed this fully connected layer into a traditional classifier such as an SVM, RF, AdaBoost or ANN, replacing the CNN's final softmax layer. I have tried this experiment myself and the results were not good; I have no idea how those big names pulled it off. A question mark stays here for now.
As the name implies, all the nodes are connected, so there are a great many weights here, precisely because everything connects to everything.
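The computation itself is just a matrix multiply plus a bias; a minimal NumPy sketch with toy sizes (in AlexNet the first fully connected layer maps 256*6*6 = 9216 inputs to 4096 outputs, but these small numbers are only for illustration):

```python
import numpy as np

def fully_connected(x, W, b):
    """Every input node connects to every output node through one weight in W."""
    return W @ x + b

x = np.random.rand(6)                  # a toy flattened feature vector
W = np.random.rand(4, 6) * 0.01        # 4 output nodes, each connected to all 6 inputs
b = np.zeros(4)
print(fully_connected(x, W, b))        # a length-4 output, usable as a feature vector
```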
1.6 Dropout layer
I'm not quite sure how to translate this one; in any case, its role is to throw away some useless nodes.
- The idea refers to how the human brain actually works. Studies show that when the brain analyses an image, only a small part of it is activated, namely the parts shaped by earlier learning and the memories left behind. Without a dropout layer, our CNN judges every image with all nodes activated, which does not match reality, so we imitate the brain and throw a few useless nodes away.
- The effect of this layer is to speed up computation, prevent overfitting, and make the network generalize better, with more of the so-called "robustness". In short, it just works better, haha :)
- The way I originally described the implementation was: set a threshold; if the weight between two nodes is above it, the relationship is strong and we keep it, and if it is below, the relationship does not matter and we throw it away. That description is wrong. Special thanks to classmate @hzzzol; the correct explanation is:
- Dropout, during training, clears the output of a hidden-layer node to 0 with some probability 1-p, and when backpropagation updates the weights, the weights connected to that node are not updated. In other words, it is purely a matter of probability: which nodes get dropped has nothing to do with how large their weights are or how strongly they are activated; the drawn nodes are discarded unconditionally. (See: a simple understanding of Dropout, a simple understanding of DropConnect.) A small sketch follows this list.
Because there are too many weights, we throw some of the useless ones away.
- This layer usually appears around the fully connected layers, because the nodes in a fully connected layer have far too many connections and consume the vast majority of a CNN's memory, most of which is unnecessary.
You can see that the last few layers are where most of it is wasted!
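Here is the sketch promised above: the corrected, purely random version of dropout in NumPy. The rescaling by p_keep (the usual "inverted dropout" trick) is my own implementation convenience, not something from the post.

```python
import numpy as np

def dropout(activations, p_keep=0.5, training=True):
    """During training, zero each node's output with probability 1 - p_keep,
    purely at random: the node's weights play no part in the choice.
    At test time the layer does nothing. Kept values are rescaled so the
    expected output matches (the common 'inverted dropout' convenience)."""
    if not training:
        return activations
    mask = (np.random.rand(*activations.shape) < p_keep)
    return activations * mask / p_keep

h = np.random.rand(10)
print(dropout(h, p_keep=0.5))          # roughly half the entries are zeroed
```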
2 LeNet and AlexNet
The above is a quick sketch of CNNs, rather rough of course. A complete CNN structure is an orderly arrangement of 1) convolutional layers, 2) down-sampling layers, 3) activation function layers, 4) normalization layers, 5) fully connected layers and 6) dropout layers. So here is the question: how do LeNet and AlexNet differ in the way they put these pieces together?
2.1 LeNet
Once again, the classic LeNet figure:
Too classic!
LeNet's most successful application was handwritten character recognition: give it a pile of handwritten Arabic numerals and let the network judge which digit each one is. It was used in post offices and similar places for reading numbers. Traditional classifiers could already do this rather well (accuracy around 96% or so), and LeNet, the newcomer, pushed the accuracy to 98%, which made it very famous at the time, earned a lot of money, and set research on convolutional neural networks on fire. Then in 2012 AlexNet appeared, and deep learning with convolutional neural networks at its core really took off, haha.
Looking at the figure, LeNet contains 1) convolutional layers, 2) down-sampling layers (the "subsampling" in the figure), and 3) fully connected layers. Of course there should also be activation function layers; they are just not drawn, and the activation used back then was presumably the sigmoid, not what we use now. You can also see that the dropout layer and the normalization layer just mentioned are absent. Why? Because
there was simply no such thing back then. ==b
A few things to note about LeNet:
- Input size: 32*32 pixels
- Convolutional layers: 3
- Down-sampling layers: 2
- Fully connected layer: 1
- Output: 10 categories (the probability of each digit 0-9)
The softmax then takes the network output, that is, the probability that the image is each of 0-9, and decides what the input is. For example, if the node for 4 outputs 0.9 and the others output something like 0.001, the input image is judged to be a 4. We can also sort the output probabilities from large to small; taking the top 3 as an example, the ImageNet rule is that your prediction counts as correct if the true class appears among those top predictions, and otherwise it is a prediction error.
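A small NumPy sketch of that last step; the logit values and the top-3 check are made up for illustration:

```python
import numpy as np

def softmax(logits):
    """Turn raw network outputs into probabilities that sum to 1."""
    e = np.exp(logits - logits.max())
    return e / e.sum()

def top_k_correct(probs, true_label, k=3):
    """Count the prediction as right if the true class is among the k largest probabilities."""
    return true_label in np.argsort(probs)[::-1][:k]

logits = np.array([0.1, 0.2, 0.1, 0.3, 5.0, 0.1, 0.2, 0.1, 0.3, 0.2])
probs = softmax(logits)
print(probs.argmax())                       # -> 4: the node for digit 4 dominates
print(top_k_correct(probs, true_label=4))   # True
```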
2.2 AlexNet
My personal feeling is that AlexNet puts more emphasis on the role of the fully connected layers: it uses two of them, and then, to cut down the number of weights, introduces the idea of dropout. The other differences are not really differences at all:
- Input size: 227*227 pixels (set by the competition)
- Convolutional layers: many (it follows from the input size)
- Down-sampling layers: many (it follows from the input size)
- Normalization layer: again, just a formula
- Output: 1000 categories (as required by the competition)
One thing to point out: do not assume that the number of convolutional layers, the number of down-sampling layers, the size of the convolution kernels, or the number of kernels has some decisive effect on the final training result. These mostly just follow the size of your input image. There is nothing profound here; you can also design by referring to existing network structures, any of which will do. Most of these parameters are hand-tuned while watching how the training goes.
Now let's lay out the structure of AlexNet.
Overall AlexNet structure diagram
The structure of a convolutional neural network is not a simple pile-up of layers: it is built out of "modules", and within a module the ordering of the layers matters. Take the AlexNet structure diagram above: it is composed of eight modules.
Module One
Module two
Module one and module two form the front section of the CNN, where
convolution - activation function - down-sampling - normalization
makes up one computational module. This can be called the standard convolution procedure, and the structure of a CNN is largely this: from a macro point of view it is a loop of a convolution layer followed by a down-sampling layer, with suitable functions inserted in between to keep the value range under control for the rounds of computation that follow.
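A rough sketch of module one, assuming PyTorch is available. The sizes (96 filters of 11*11 with stride 4, a 3*3 max-pool with stride 2, LRN over 5 channels) are AlexNet's usual first-module numbers, and the layer order follows the pattern described above; treat this as a sketch, not the canonical definition.

```python
import torch
import torch.nn as nn

# Module one: convolution -> activation -> down-sampling -> normalization.
module_one = nn.Sequential(
    nn.Conv2d(in_channels=3, out_channels=96, kernel_size=11, stride=4),  # convolution
    nn.ReLU(),                                                            # activation
    nn.MaxPool2d(kernel_size=3, stride=2),                                # down-sampling
    nn.LocalResponseNorm(size=5, alpha=1e-4, beta=0.75, k=2.0),           # normalization
)

x = torch.randn(1, 3, 227, 227)      # one 227*227 RGB input
print(module_one(x).shape)           # -> torch.Size([1, 96, 27, 27])
```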
Modules three and four
Modules three and four are also convolution steps; the difference is that they have no down-sampling. The reason relates to the input size: by this point the feature maps carry relatively little data, so no down-sampling is needed, and that is fine.
Module Five
Module five is one more convolution step, the same kind of thing as modules one and two, just repeated. So to summarize: modules one through five are really all doing convolution, with the input image size deciding which of them should include down-sampling, plus a few necessary functions to keep the values under control, and that is it. The output of module five is already a small 6*6 block (in my own designs I usually go down to a 1*1 block; ImageNet images are large, so 6*6 is normal).
Why does the original 227*227-pixel input end up as small as 6*6? Mainly because of the down-sampling; the convolutional layers also shrink the image, so layer by layer the image keeps getting smaller.
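You can trace the shrinkage with the standard output-size formula. The kernel, stride and padding values below are the ones commonly quoted for AlexNet with a 227*227 input; the little helper function is just for illustration.

```python
def out_size(in_size, kernel, stride, pad=0):
    """How much one conv or pooling layer shrinks the spatial size."""
    return (in_size + 2 * pad - kernel) // stride + 1

size = 227
size = out_size(size, 11, 4)         # conv1     -> 55
size = out_size(size, 3, 2)          # max-pool  -> 27
size = out_size(size, 5, 1, pad=2)   # conv2     -> 27
size = out_size(size, 3, 2)          # max-pool  -> 13
size = out_size(size, 3, 1, pad=1)   # conv3     -> 13
size = out_size(size, 3, 1, pad=1)   # conv4     -> 13
size = out_size(size, 3, 1, pad=1)   # conv5     -> 13
size = out_size(size, 3, 2)          # max-pool  -> 6
print(size)                          # the small 6*6 block mentioned above
```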
CNN Process
Module Six
Modules seven and eight
Modules six and seven are the so-called fully connected layers. A fully connected layer has the same structure as an artificial neural network: a huge number of nodes and far too many connecting lines, which is why a dropout layer is brought in here to throw away a portion of the nodes. In fact, if I remember correctly, this idea had already appeared in ANNs.
Module eight is the output, combined with the softmax to do the classification. There are as many output nodes as there are categories, and each node holds the probability of belonging to that category. :)
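A sketch of modules six to eight, again assuming PyTorch. The 256*6*6 -> 4096 -> 4096 -> 1000 sizes are AlexNet's usual numbers and the dropout probability of 0.5 is the commonly quoted one; this is only an illustration of the idea.

```python
import torch
import torch.nn as nn

# Modules six and seven: fully connected layers with dropout; module eight: the 1000-way output.
head = nn.Sequential(
    nn.Flatten(),
    nn.Dropout(p=0.5),                # module six: randomly drop half the nodes during training
    nn.Linear(256 * 6 * 6, 4096),
    nn.ReLU(),
    nn.Dropout(p=0.5),                # module seven
    nn.Linear(4096, 4096),
    nn.ReLU(),
    nn.Linear(4096, 1000),            # module eight: one node per category
)

features = torch.randn(1, 256, 6, 6)             # the 6*6 block coming out of module five
probs = torch.softmax(head(features), dim=1)     # each entry is the probability of one class
print(probs.shape, probs.sum().item())           # torch.Size([1, 1000]), roughly 1.0
```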
All right, that's it. Just like that :)
I have not looked at GoogLeNet and VGG yet; I expect they follow the same routine as LeNet and AlexNet, perhaps with a deeper network, giant GPUs, or a leaner structure, but the overall idea should be unchanged. I'll come back and look at them later :)