Translated Summary of "ImageNet Classification with Deep Convolutional Neural Networks"


AlexNet Summary Notes

Paper: "ImageNet Classification with Deep Convolutional Neural Networks"

1 Network Structure

The network is trained by optimizing a multinomial logistic regression objective. The overall structure, shown in Figure 1, has 8 learned layers in total: 5 convolutional layers followed by 3 fully connected layers, preceded by the image input layer.

1) Convolutional layers

There are 5 convolutional layers in total. As the structure diagram shows, part of the network is split across 2 GPUs for parallel computation. The kernels of the 2nd, 4th, and 5th convolutional layers take input only from the feature maps of the previous layer that reside on the same GPU, while the 3rd convolutional layer takes input from all feature maps of the previous layer on both GPUs. The 1st, 2nd, and 5th convolutional layers are each followed by a max-pooling layer; the remaining convolutional layers have no pooling layer.

2) Fully connected layers

There are 3 fully connected layers in total. The last one feeds a 1000-way softmax classifier, and each of the first two fully connected layers is connected to all outputs of the preceding layer.

Figure 1 Network structure
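For orientation only, here is a minimal single-GPU sketch of this 8-layer stack in PyTorch, assuming the layer sizes reported in the paper. The original two-GPU kernel split is ignored, and the LRN and dropout layers described later are omitted, so treat it as a sketch rather than the authors' implementation.

```python
# A minimal single-GPU sketch of the 8-layer structure (5 conv + 3 FC) in PyTorch.
# The paper splits kernels across 2 GPUs; that split, LRN, and dropout are omitted here.
import torch
import torch.nn as nn

alexnet_like = nn.Sequential(
    nn.Conv2d(3, 96, kernel_size=11, stride=4, padding=2),  # conv1 (padding so a 224x224 crop gives 55x55 maps)
    nn.ReLU(inplace=True),
    nn.MaxPool2d(kernel_size=3, stride=2),                   # overlapping max pooling
    nn.Conv2d(96, 256, kernel_size=5, padding=2),            # conv2
    nn.ReLU(inplace=True),
    nn.MaxPool2d(kernel_size=3, stride=2),
    nn.Conv2d(256, 384, kernel_size=3, padding=1),           # conv3 (sees both GPUs' maps in the paper)
    nn.ReLU(inplace=True),
    nn.Conv2d(384, 384, kernel_size=3, padding=1),           # conv4
    nn.ReLU(inplace=True),
    nn.Conv2d(384, 256, kernel_size=3, padding=1),           # conv5
    nn.ReLU(inplace=True),
    nn.MaxPool2d(kernel_size=3, stride=2),
    nn.Flatten(),
    nn.Linear(256 * 6 * 6, 4096),                            # fc6
    nn.ReLU(inplace=True),
    nn.Linear(4096, 4096),                                   # fc7
    nn.ReLU(inplace=True),
    nn.Linear(4096, 1000),                                   # fc8: 1000-way classifier (softmax applied in the loss)
)

# Sanity check on a dummy 224x224 crop.
x = torch.randn(1, 3, 224, 224)
print(alexnet_like(x).shape)  # torch.Size([1, 1000])
```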

2 Technical Details

1) Image dataset

The network requires all training and test images to be fixed-size 256x256 RGB images.
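As a sketch of this preprocessing (assuming Pillow is available), the paper rescales each image so its shorter side is 256 pixels and then takes the central 256x256 crop:

```python
# Rescale the shorter side to 256, then crop the central 256x256 patch.
from PIL import Image

def to_256_rgb(path):
    img = Image.open(path).convert("RGB")
    w, h = img.size
    scale = 256 / min(w, h)                      # rescale so the shorter side is 256
    img = img.resize((round(w * scale), round(h * scale)))
    w, h = img.size
    left, top = (w - 256) // 2, (h - 256) // 2   # central 256x256 crop
    return img.crop((left, top, left + 256, top + 256))
```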

2) Activation function: ReLU

The activation function used in this network is not one of the traditional saturating activations (sigmoid, tanh) but the non-saturating, non-linear ReLU (Rectified Linear Unit). Non-saturating units train several times faster than saturating ones. ReLU retains non-linear expressive power, and because it is linear on its positive part, error back-propagation through it does not suffer from the vanishing gradients that tanh and sigmoid cause (the error is large at the top layers but shrinks as it propagates downward, so the gradients reaching the lower layers become tiny, the weight updates in the deep layers become negligible, and the deep network gets stuck in a poor local optimum). This property of ReLU makes it possible to train deeper networks.
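A small numpy illustration of the point about saturation (not from the paper, just for intuition): for large positive inputs the tanh gradient vanishes while the ReLU gradient stays at 1.

```python
# ReLU and its gradient versus the tanh gradient.
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def relu_grad(x):
    return (x > 0).astype(float)

x = np.array([-3.0, -0.5, 0.5, 3.0, 10.0])
print(relu(x))               # [ 0.   0.   0.5  3.  10. ]
print(relu_grad(x))          # [0. 0. 1. 1. 1.]
print(1 - np.tanh(x) ** 2)   # tanh gradient: ~0 for large |x|, which slows learning
```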

3) Local response normalization (LRN)

The ReLU function does not require input normalization to prevent saturation: as long as some training examples produce a positive input to a neuron, learning will happen in that neuron. Nevertheless, the authors find that local response normalization helps generalization. The normalized response b^i_{x,y} of kernel map i at position (x, y) is computed from the raw activations a^j_{x,y} as

b^i_{x,y} = a^i_{x,y} / ( k + alpha * sum_{j = max(0, i-n/2)}^{min(N-1, i+n/2)} (a^j_{x,y})^2 )^beta

The constants used are k = 2, n = 5, alpha = 10^-4, and beta = 0.75; here N is the total number of kernel maps in the layer and n is the number of adjacent kernel maps the sum runs over.
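A numpy sketch of this normalization, assuming activations of shape (channels, height, width) and the constants quoted above:

```python
# Local response normalization across adjacent kernel maps (channels).
import numpy as np

def lrn(a, k=2.0, n=5, alpha=1e-4, beta=0.75):
    N = a.shape[0]                                   # total number of kernel maps in the layer
    b = np.empty_like(a)
    for i in range(N):
        lo, hi = max(0, i - n // 2), min(N - 1, i + n // 2)
        denom = (k + alpha * np.sum(a[lo:hi + 1] ** 2, axis=0)) ** beta
        b[i] = a[i] / denom
    return b

a = np.random.rand(96, 55, 55).astype(np.float32)    # e.g. conv1 output maps
print(lrn(a).shape)                                   # (96, 55, 55)
```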

4) Overlapping pooling

The pooling window here is z x z = 3 x 3 with stride s = 2, so adjacent windows overlap. Compared with the non-overlapping scheme z = 2, s = 2, this reduces the error rate by 0.4% (a modest gain), and the authors also observe that models with overlapping pooling are slightly harder to overfit.
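A quick PyTorch comparison of the two schemes (for illustration only): with a 55x55 input both produce 27x27 output maps, but the 3x3/stride-2 windows share a row and column with their neighbours.

```python
# Overlapping (3x3, stride 2) versus non-overlapping (2x2, stride 2) max pooling.
import torch
import torch.nn as nn

x = torch.randn(1, 96, 55, 55)
overlap = nn.MaxPool2d(kernel_size=3, stride=2)
no_overlap = nn.MaxPool2d(kernel_size=2, stride=2)
print(overlap(x).shape)      # torch.Size([1, 96, 27, 27])
print(no_overlap(x).shape)   # torch.Size([1, 96, 27, 27]) -- same size, but windows do not overlap
```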

5) Reducing overfitting

5.1 Data Augmentation

1. Cropping image patches from the 256x256 images (together with their horizontal reflections);

2. Altering the intensities of the RGB channels using PCA on the RGB pixel values (a sketch of both is given after this list).
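A rough numpy sketch of the two augmentations, assuming images are float arrays of shape (256, 256, 3). The crop size of 224 follows the paper; the PCA-style color jitter is reduced here to its core idea, with eigvecs/eigvals standing in for the precomputed RGB principal components.

```python
import numpy as np

def random_crop_and_flip(img, size=224):
    # Extract a random size x size patch and reflect it horizontally half the time.
    y = np.random.randint(0, img.shape[0] - size + 1)
    x = np.random.randint(0, img.shape[1] - size + 1)
    patch = img[y:y + size, x:x + size]
    if np.random.rand() < 0.5:
        patch = patch[:, ::-1]
    return patch

def pca_color_jitter(img, eigvecs, eigvals, sigma=0.1):
    # Add multiples of the RGB principal components, scaled by the eigenvalues
    # times a Gaussian draw, to every pixel.
    alphas = np.random.normal(0.0, sigma, size=3)
    shift = eigvecs @ (alphas * eigvals)            # a 3-vector added to each RGB pixel
    return img + shift
```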

5.2 Dropout

Dropout means that during training, randomly chosen hidden-layer neurons are temporarily switched off. The dropped neurons can be regarded as temporarily removed from the network structure, but their weights are kept (just not updated on that pass), because they may be active again when the next sample is presented (a bit abstract; see the experimental section for the concrete implementation).

In general, each neuron is dropped with probability p = 0.5 during training. In this network, dropout is applied mainly in the first two fully connected layers.
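A numpy sketch of dropout on a fully connected layer's activations with p = 0.5. The scaling by 1/(1-p) at training time ("inverted dropout") is one common way to keep the expected activation unchanged at test time; the paper instead halves the outputs at test time, but the effect is equivalent.

```python
import numpy as np

def dropout(h, p=0.5, training=True):
    if not training:
        return h                                   # at test time all neurons are used
    mask = (np.random.rand(*h.shape) >= p)         # zero each neuron with probability p
    return h * mask / (1.0 - p)                    # rescale so the expectation is unchanged

h = np.random.rand(128, 4096)                      # e.g. a mini-batch of fc6 activations
print(dropout(h).mean(), h.mean())                 # roughly equal in expectation
```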

6) Training details

The model is trained with stochastic gradient descent on mini-batches of size 128, with momentum 0.9 and weight decay 0.0005. The update rule for the weights w is:

v_{i+1} = 0.9 * v_i - 0.0005 * epsilon * w_i - epsilon * <dL/dw | w_i>_{D_i}
w_{i+1} = w_i + v_{i+1}

Here D_i is the i-th mini-batch, epsilon is the learning rate, v is the momentum variable, and <dL/dw | w_i>_{D_i} is the gradient of the objective averaged over the mini-batch, evaluated at w_i. The weights are initialized from a zero-mean Gaussian with standard deviation 0.01. The biases of convolutional layers 2, 4, and 5 and of the fully connected layers are initialized to 1; the biases of the remaining layers are initialized to 0. The learning rate is the same for all layers and is initialized to 0.01; each time the validation error rate stops improving, it is divided by 10 manually, and it was reduced three times from start to finish.
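A numpy sketch of this update rule for a single weight tensor, using the momentum, weight decay, learning rate, and Gaussian initialization quoted above (the gradient here is a random stand-in for the mini-batch gradient):

```python
import numpy as np

def sgd_momentum_step(w, v, grad, lr=0.01, momentum=0.9, weight_decay=0.0005):
    # v_{i+1} = 0.9 * v_i - 0.0005 * lr * w_i - lr * grad ; w_{i+1} = w_i + v_{i+1}
    v = momentum * v - weight_decay * lr * w - lr * grad
    w = w + v
    return w, v

w = np.random.normal(0.0, 0.01, size=(4096, 4096))    # zero-mean Gaussian init, std 0.01
v = np.zeros_like(w)
grad = np.random.randn(4096, 4096) * 1e-3             # stand-in for the averaged mini-batch gradient
w, v = sgd_momentum_step(w, v, grad)
```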
