In 2012, Geoffrey Hinton and his student Alex Krizhevsky answered the doubters by entering the ImageNet competition and shattering the image-classification record, establishing deep learning's status in computer vision. The rest of the story is well known: deep learning rose to prominence and has dominated ever since.
The network Alex used in that competition is known as AlexNet. In this part we first introduce the basic architecture of AlexNet; the principles of the algorithm and the details of its parameters are analyzed in later parts.
For the 2012 dataset, Caffe also defines its own variant of the network, known as CaffeNet, which reportedly gains roughly 0.2% accuracy after more than 300,000 training iterations. The two network definitions differ mainly in the ordering of norm1/pool1 and norm2/pool2.
Let's look at the structure of AlexNet. In Alex's paper, the basic structure is as follows.
1. Basic structure
A. There are 8 layers in total: the first 5 are convolutional and the last 3 are fully connected. The output of the final fully connected layer feeds a 1000-way softmax, and the training objective is to maximize the average multinomial logistic regression objective. (A layer-by-layer code sketch follows this list.)
B. The first and second convolutional layers (CONV1 and CONV2) are each immediately followed by a response-normalization layer, i.e. the NORM1 and NORM2 layers.
C. Every convolutional layer and fully connected layer is immediately followed by a ReLU operation.
D. Max-pooling operations follow NORM1, NORM2, and the fifth convolutional layer (CONV5).
E. Dropout is applied after the first two fully connected layers (FC6 and FC7).
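To make the ordering in items A through E concrete, here is a minimal single-GPU sketch in PyTorch. It is only an illustration, not Alex's original two-GPU implementation: the kernel sizes, strides, and padding follow the common Caffe/torchvision reproductions, and the input is assumed to be 227*227 so that the feature-map sizes discussed below (55, 27, 13, 6) work out exactly.

```python
import torch
import torch.nn as nn

class AlexNetSketch(nn.Module):
    """Single-GPU sketch of the 8-layer AlexNet ordering: 5 conv layers + 3 fully connected layers."""

    def __init__(self, num_classes=1000):
        super().__init__()
        self.features = nn.Sequential(
            # CONV1 -> RELU1 -> NORM1 -> POOL1
            nn.Conv2d(3, 96, kernel_size=11, stride=4),
            nn.ReLU(inplace=True),
            nn.LocalResponseNorm(size=5, alpha=1e-4, beta=0.75, k=2.0),
            nn.MaxPool2d(kernel_size=3, stride=2),
            # CONV2 -> RELU2 -> NORM2 -> POOL2
            nn.Conv2d(96, 256, kernel_size=5, padding=2),
            nn.ReLU(inplace=True),
            nn.LocalResponseNorm(size=5, alpha=1e-4, beta=0.75, k=2.0),
            nn.MaxPool2d(kernel_size=3, stride=2),
            # CONV3 -> RELU3
            nn.Conv2d(256, 384, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            # CONV4 -> RELU4
            nn.Conv2d(384, 384, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            # CONV5 -> RELU5 -> POOL5
            nn.Conv2d(384, 256, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
        )
        self.classifier = nn.Sequential(
            # FC6 -> RELU6 -> DROP6
            nn.Linear(256 * 6 * 6, 4096),
            nn.ReLU(inplace=True),
            nn.Dropout(p=0.5),
            # FC7 -> RELU7 -> DROP7
            nn.Linear(4096, 4096),
            nn.ReLU(inplace=True),
            nn.Dropout(p=0.5),
            # FC8: the 1000-way output fed to the softmax
            nn.Linear(4096, num_classes),
        )

    def forward(self, x):
        x = self.features(x)
        x = torch.flatten(x, 1)  # 256*6*6 = 9216 features per image
        return self.classifier(x)


if __name__ == "__main__":
    x = torch.randn(1, 3, 227, 227)  # 227*227 input makes the 55/27/13/6 map sizes work out exactly
    print(AlexNetSketch()(x).shape)  # torch.Size([1, 1000])
```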
2. Operation Process
A. In the first convolutional layer (CONV1), AlexNet filters the 224*224*3 input image with 96 kernels of size 11*11*3 at a stride of 4 pixels. Put plainly, an 11*11 convolution template is applied across the three channels, sampling the image every 4 pixels. The 4-pixel stride is the distance between the receptive-field centers of neighboring neurons in the kernel map, and it is an empirically chosen value.
The number of input neurons is 224*224*3 = 150,528. For each map the stride is 4, so 224/4 = 56; subtracting one for the border gives 55, i.e. the map size in this layer is 55*55, and the neuron count is 55*55*96 = 290,400. (The paper states 253,440, a discrepancy I have not been able to explain; if you understand it, please let me know.)
After this convolution, a ReLU (RELU1) and a normalization (NORM1) are applied, followed by pooling (POOL1), and the result is passed to the next layer as output.
The number of maps in this layer is 96.
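To sanity-check the 55*55 figure, the standard output-size formula for a convolution is out = floor((in + 2*pad - kernel) / stride) + 1. The snippet below is plain Python; note that the 224 stated in the paper does not divide evenly, which is why many implementations assume an effective input size of 227*227.

```python
def conv_output_size(in_size, kernel, stride, pad=0):
    """Spatial output size of a convolution: floor((in + 2*pad - kernel) / stride) + 1."""
    return (in_size + 2 * pad - kernel) // stride + 1

# CONV1: 11*11 kernel, stride 4, no padding
print(conv_output_size(224, 11, 4))  # 54 -- the 224 stated in the paper does not work out exactly
print(conv_output_size(227, 11, 4))  # 55 -- the commonly assumed corrected input size
print(55 * 55 * 96)                  # 290400 -- the CONV1 neuron count computed above
```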
B. The second convolutional layer (CONV2) takes the output of the first convolutional layer (CONV1) after normalization (NORM1) and pooling (POOL1), and applies 256 convolution kernels (5*5 in Alex's paper) to it.
After POOL1 the map size is roughly halved, int(55/2) = 27, and the number of neurons in this layer is 27*27*256 = 186,624.
The number of maps in this layer is 256.
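The 55 to 27 reduction is produced by AlexNet's overlapping max pooling (a 3*3 window with stride 2) rather than by a literal halving; int(55/2) just happens to give the same answer. A quick check:

```python
def pool_output_size(in_size, kernel=3, stride=2):
    """Output size of max pooling with a 3*3 window and stride 2 (AlexNet's overlapping pooling)."""
    return (in_size - kernel) // stride + 1

print(pool_output_size(55))   # 27 -- the map size entering CONV2
print(27 * 27 * 256)          # 186624 -- the CONV2 neuron count
```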
C. The third convolutional layer (CONV3) is produced in the same way as the second, except that this layer applies 384 3*3 convolution kernels.
After POOL2 the map size is roughly halved, int(27/2) = 13, and the number of neurons in this layer is 13*13*384 = 64,896.
The number of maps in this layer is 384.
D. The fourth convolutional layer (CONV4) takes the third convolutional layer (CONV3) after a ReLU (RELU3) and directly applies 384 3*3 convolution kernels.
The number of neurons in this layer is 13*13*384 = 64,896.
The number of maps in this layer is 384, and the size remains 13*13.
E. The fifth convolutional layer (CONV5) is generated like the fourth: a ReLU (RELU4) is applied to the previous layer's output first, except that here 256 3*3 convolution kernels are applied.
The number of neurons in this layer is 13*13*256 = 43,264.
The number of maps in this layer is 256, and the size remains 13*13.
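For CONV3 through CONV5 to keep the 13*13 map size, each 3*3 convolution must pad its input by 1 pixel at stride 1, as in the Caffe layer definitions; the sizes above only follow under that assumption:

```python
# 3*3 kernel, stride 1, padding 1: floor((13 + 2*1 - 3) / 1) + 1 = 13, so the map size is preserved
print((13 + 2 * 1 - 3) // 1 + 1)   # 13
print(13 * 13 * 384)               # 64896 -- neuron count of CONV3 and CONV4
print(13 * 13 * 256)               # 43264 -- neuron count of CONV5
```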
F. The first fully connected layer (FC6) is a full connection applied to the previous convolutional layer (CONV5) after pooling (POOL5).
After POOL5 the map size is roughly halved, int(13/2) = 6, so the previous layer's output amounts to 6*6*256 values, which are fully connected to 4,096 nodes; the node count of this layer is therefore 4,096.
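The FC6 input width follows the same pooling arithmetic: pooling the 13*13 CONV5 maps with a 3*3 window at stride 2 gives 6*6, and flattening the 256 maps gives 9,216 inputs feeding the 4,096 output nodes.

```python
pool5 = (13 - 3) // 2 + 1        # 6 -- 3*3 max pooling with stride 2 over the 13*13 CONV5 maps
fc6_inputs = pool5 * pool5 * 256 # 9216 values after flattening
print(pool5, fc6_inputs)         # 6 9216 -- FC6 is then a 9216 -> 4096 fully connected layer
```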
G. The second fully connected layer (FC7) takes the output of the previous fully connected layer (FC6) after a ReLU (RELU6) and dropout (DROP6) and applies another full connection.
The number of nodes in this layer is 4,096.
H. The last fully connected layer (FC8) takes the output of the previous fully connected layer (FC7) after a ReLU (RELU7) and dropout (DROP7) and applies one more full connection. The final output is the softmax loss combined with the labels.
The number of nodes in this layer is 1000, corresponding to 1000 classes of objects.
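In training code, the "softmax loss combined with the labels" is usually implemented as a cross-entropy loss over the 1000-way FC8 scores; a minimal sketch in PyTorch, with made-up tensors standing in for a real batch:

```python
import torch
import torch.nn as nn

fc8_scores = torch.randn(8, 1000)                 # raw FC8 outputs (logits) for a batch of 8 images
labels = torch.randint(0, 1000, (8,))             # ground-truth class indices for the same batch
loss = nn.CrossEntropyLoss()(fc8_scores, labels)  # softmax + multinomial logistic loss in one step
print(loss.item())
```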
What follows next is an analysis of the principles behind AlexNet and the implementation details of the algorithm.