AlexNet Detailed 2


This walkthrough uses the AlexNet model officially provided with Caffe as its example.

Directory:

1. Background

2. Introduction to the framework

3. Detailed description of each layer

4. References

Background:

AlexNet was published in 2012 and quickly became a gold-standard piece of code; it achieved the best results in that year's ImageNet competition. After that, deeper networks such as the excellent VGG and GoogLeNet were proposed.

Its official pretrained model reaches a top-1 accuracy of 57.1% and a top-5 accuracy of 80.2%, which is quite good compared with traditional machine-learning classification algorithms.

Framework Description:

The structural model of AlexNet is shown in the architecture diagram (omitted here).

Because the network was trained on two GPUs, the diagram shows two parallel paths; here we describe it as a single pipeline. The model is divided into eight layers: 5 convolutional layers and 3 fully connected layers. Each convolutional layer is followed by the ReLU activation, and some are also followed by local response normalization (LRN) and down-sampling (pooling). Let's analyze each layer individually.

Detailed Description:

Conv1 layer, defined as follows:

layer {
  name: "conv1"
  type: "Convolution"
  bottom: "data"
  top: "conv1"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  convolution_param {
    num_output: 96
    kernel_size: 11
    stride: 4
  }
}

The processing steps are as follows:

1. The input image is specified as 224*224*3 (an RGB image); in practice it is preprocessed (cropped) to 227*227*3.

2. 96 filters (convolution kernels) of size 11*11 are used for feature extraction. (PS: the figure appears to show 48 because the work was split across 2 GPUs, each handling 48 filters.)

It is worth emphasizing that the original image is an RGB image with three channels, so each of the 96 filters also has three channels; its actual size is 11*11*3. In other words, the color information of the original image takes part in the feature extraction. During convolution, the output feature-map size follows the formula (img_size - filter_size)/stride + 1 = new_feature_size, so here we get:

(227-11)/4 + 1 = 55 (the division rounds down). Each filter spans all three input channels and produces a single feature map of size 55*55, so conv1 outputs 96 feature maps of 55*55.

It is important to note what "convolution" means here: the filter is multiplied element-wise with the data under it and the products are summed (for example, [1,2,3]*[1,1,1] = 1*1 + 2*1 + 3*1 = 6). Because the kernel size is 11*11, each output value is connected only to an 11*11 local region of the input rather than to every input pixel, and the same filter weights are reused at every position (weight sharing). Stacking such layers gradually enlarges each neuron's field of view (shaped like a pyramid), so the deeper layers eventually see the whole image, achieving the effect of a full connection without its cost. Compared with the fully connected shallow networks traditionally used, this saves a great deal of memory; the extra computation it requires has largely been solved in recent years by faster hardware such as GPUs. A small sketch below illustrates the output-size formula and this multiply-and-sum operation.
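As a quick check of the two points above, here is a small illustrative Python/NumPy sketch (not part of the Caffe model) that computes the conv1 output size with the formula and reproduces the toy multiply-and-sum convolution:

import numpy as np

def conv_output_size(img_size, filter_size, stride=1, pad=0):
    # (img_size + 2*pad - filter_size) / stride + 1, rounded down
    return (img_size + 2 * pad - filter_size) // stride + 1

# conv1: 227*227 input, 11*11 kernel, stride 4, no padding
print(conv_output_size(227, 11, stride=4))   # 55

# The toy convolution from the text: element-wise multiply, then sum
x = np.array([1, 2, 3])
w = np.array([1, 1, 1])
print(int(np.sum(x * w)))                    # 1*1 + 2*1 + 3*1 = 6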

3. Apply the ReLU activation, f(x) = max(0, x), which zeroes out negative responses and keeps the feature-map values in a reasonable (non-negative) range.
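A one-line illustration of that activation (NumPy, illustrative only):

import numpy as np

feature_map = np.array([[-3.0, 2.0], [0.5, -1.0]])
print(np.maximum(feature_map, 0))   # f(x) = max(0, x): negatives become 0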

There is also an LRN step; to be honest I have hardly used it in practice, so I will not go into much depth on it (see step 5 below).

4. Down-sampling (the pooling layer), defined as follows:

layer {
  name: "pool1"
  type: "Pooling"
  bottom: "norm1"
  top: "pool1"
  pooling_param {
    pool: MAX
    kernel_size: 3
    stride: 2
  }
}

5. LRN, which translates as local response normalization. LRN has two modes (the other is WITHIN_CHANNEL, normalization within a single channel's spatial neighborhood); note that in the official definition LRN actually sits before pooling, since pool1 above takes norm1 as its input.

5.1 The default in the source is ACROSS_CHANNELS, i.e. cross-channel normalization (I think of it as a "weakening" of strong responses), with local_size: 5 (the default): each value is divided by a normalization term computed from the (squared) activations of the 5 adjacent feature maps at the same spatial position.

Back to the pooling layer: the official kernel is 3*3, and pooling processes each 3*3 region of the feature map (max pooling takes the maximum of the region; other variants take the mean or the minimum). After this down-sampling we get (see the sketch below):

(55-3)/2 + 1 = 27, i.e. 96 feature maps of size 27*27, which then serve as the input to the second convolution.
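To make the pooling arithmetic concrete, here is an illustrative NumPy sketch (not Caffe code) of 3*3 max pooling with stride 2 applied to one 55*55 conv1 feature map:

import numpy as np

def max_pool2d(x, kernel=3, stride=2):
    # Naive 2-D max pooling over a single feature map.
    h, w = x.shape
    out_h = (h - kernel) // stride + 1
    out_w = (w - kernel) // stride + 1
    out = np.empty((out_h, out_w), dtype=x.dtype)
    for i in range(out_h):
        for j in range(out_w):
            window = x[i * stride:i * stride + kernel,
                       j * stride:j * stride + kernel]
            out[i, j] = window.max()   # take the maximum of each 3*3 region
    return out

fmap = np.random.rand(55, 55)          # one of the 96 conv1 feature maps
print(max_pool2d(fmap).shape)          # (27, 27), matching (55-3)/2 + 1 = 27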

Conv2 layer, with the corresponding Caffe definition:

layer {
  name: "conv2"
  type: "Convolution"
  bottom: "pool1"
  top: "conv2"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  convolution_param {
    num_output: 256
    pad: 2
    kernel_size: 5
    group: 2
  }
}

Unlike conv1, conv2 uses 256 filters of size 5*5 to extract further features from the 96 feature maps of 27*27, and each filter now spans several input maps: an output value is obtained by multiplying the values in the filter window of each contributing input map by the corresponding weights, summing everything, and adding a bias (roughly, x11 of map 1 * w11 of map 1 + x11 of map 2 * w11 of map 2 + ... + bias). With group: 2, each filter only sees half of the 96 input maps (see the sketch below). In addition, the width and height are each padded with 2 pixels on both sides, so the 256 new feature maps have size:

(27 + 2*2 - 5)/1 + 1 = 27, i.e. 256 feature maps of size 27*27.

Then the ReLU activation is applied, followed by down-sampling (pooling):

layer {
  name: "pool2"
  type: "Pooling"
  bottom: "norm2"
  top: "pool2"
  pooling_param {
    pool: MAX
    kernel_size: 3
    stride: 2
  }
}

Get: "27-3"/2 +1 = 13 that is to get 256 13*13 size of the feature map.

Conv3 layer, defined as follows:

layer {
  name: "conv3"
  type: "Convolution"
  bottom: "pool2"
  top: "conv3"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  convolution_param {
    num_output: 384
    pad: 1
    kernel_size: 3
  }
}

The new feature-map size is (13 + 2*1 - 3)/1 + 1 = 13, giving 384 feature maps of 13*13.

Conv3 is not followed by a pooling layer.

Conv4 layer:

layer {
  name: "conv4"
  type: "Convolution"
  bottom: "conv3"
  top: "conv4"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  convolution_param {
    num_output: 384
    pad: 1
    kernel_size: 3
    group: 2
  }
}

Again the new feature-map size is (13 + 2*1 - 3)/1 + 1 = 13, giving 384 feature maps of 13*13.

Conv4 is not followed by a pooling layer either.

Conv5 layer: its definition follows the same pattern as conv4 (3*3 kernels with pad: 1), but with 256 output channels.


This gives 256 feature maps of 13*13 ((13 + 2*1 - 3)/1 + 1 = 13).

Then a down-sampling (pooling) layer, pool5, which reduces the spatial size and helps prevent overfitting:

layer {
  name: "pool5"
  type: "Pooling"
  bottom: "conv5"
  top: "pool5"
  pooling_param {
    pool: MAX
    kernel_size: 3
    stride: 2
  }
}

This gives 256 feature maps of size (13-3)/2 + 1 = 6, i.e. 256 maps of 6*6.
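Putting the layer arithmetic together, here is a small illustrative Python sketch (layer parameters taken from the definitions above) tracing the spatial size from the 227*227 input down to the 6*6 pool5 output:

def conv_output_size(img_size, filter_size, stride=1, pad=0):
    # (img_size + 2*pad - filter_size) / stride + 1, rounded down
    return (img_size + 2 * pad - filter_size) // stride + 1

size = 227
size = conv_output_size(size, 11, stride=4)   # conv1 -> 55
size = conv_output_size(size, 3, stride=2)    # pool1 -> 27
size = conv_output_size(size, 5, pad=2)       # conv2 -> 27
size = conv_output_size(size, 3, stride=2)    # pool2 -> 13
size = conv_output_size(size, 3, pad=1)       # conv3 -> 13
size = conv_output_size(size, 3, pad=1)       # conv4 -> 13
size = conv_output_size(size, 3, pad=1)       # conv5 -> 13
size = conv_output_size(size, 3, stride=2)    # pool5 -> 6
print(size)                                   # 6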

Fc6, the first fully connected layer:

Description: 4096 neurons are used here, fully connected to the 256 feature maps of size 6*6. Each 6*6 feature map is reduced to feature points, and each of the 4096 neurons outputs a single value obtained by multiplying feature points drawn from the 256 feature maps by the corresponding weights, summing them, and adding a bias.
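An illustrative NumPy sketch of that step (random data, not the Caffe implementation): the 256 maps of 6*6 are flattened into a 256*6*6 = 9216-dimensional vector, and each of the 4096 neurons computes a weighted sum of that vector plus its bias:

import numpy as np

pool5_out = np.random.rand(256, 6, 6)   # the 256 feature maps of 6*6
x = pool5_out.reshape(-1)               # flatten to 9216 values

W = np.random.rand(4096, 9216)          # one row of weights per output neuron
b = np.random.rand(4096)                # one bias per output neuron

fc6 = W @ x + b                         # each output = weighted sum + bias
print(fc6.shape)                        # (4096,)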

A dropout layer with ratio 0.5 is then applied to the fc6 output:

layer {
  name: "drop6"
  type: "Dropout"
  bottom: "fc6"
  top: "fc6"
  dropout_param {
    dropout_ratio: 0.5
  }
}
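For reference, a minimal sketch of the dropout idea with ratio 0.5 (illustrative only; this uses the common "inverted dropout" formulation, where surviving activations are rescaled during training and the layer is a no-op at test time):

import numpy as np

def dropout(x, ratio=0.5, train=True):
    # Randomly zero each unit with probability `ratio`, rescale the survivors.
    if not train:
        return x
    mask = (np.random.rand(*x.shape) >= ratio).astype(x.dtype)
    return x * mask / (1.0 - ratio)

fc6 = np.random.rand(4096)
print(dropout(fc6).shape)   # (4096,) with roughly half the values zeroed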
