"CV paper reading" Network in Network

Source: Internet
Author: User

Objective:

Replacing the traditional convolution layer with an mlpconv layer lets the network learn more abstract features. A traditional convolution layer computes a linear combination of the previous layer's activations followed by a nonlinear activation — a generalized linear model (GLM) — which, the authors argue, implicitly assumes the latent features are linearly separable. The mlpconv layer instead uses a multilayer perceptron, a deep structure that can approximate any nonlinear function. Higher-level abstract features in a network should be invariant across different representations of the same concept (by abstraction we mean the feature is invariant to variants of the same concept). The tiny neural network slides over the input map with shared weights, so the mlpconv layer's parameters can still be learned with the BP algorithm. The paper's figure compares the traditional convolution layer (left) with the mlpconv layer (right).
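As a concrete sketch of this idea (not the paper's code), the mlpconv layer — a tiny shared-weight MLP evaluated at every spatial position — can be written in plain NumPy. The patch size, channel counts, and variable names below are illustrative assumptions:

```python
import numpy as np

def mlpconv(x, w1, b1, w2, b2):
    """Slide a two-layer MLP (shared weights) over every 3x3 patch of x.

    x:  input map, shape (H, W, C_in)
    w1: first-layer weights, shape (3*3*C_in, hidden)  -- acts like a 3x3 conv
    w2: second-layer weights, shape (hidden, C_out)    -- acts like a 1x1 conv
    Returns an output map of shape (H-2, W-2, C_out).
    """
    H, W, _ = x.shape
    out = np.empty((H - 2, W - 2, w2.shape[1]))
    for i in range(H - 2):
        for j in range(W - 2):
            patch = x[i:i + 3, j:j + 3, :].ravel()   # local receptive field
            h = np.maximum(patch @ w1 + b1, 0)       # hidden layer + ReLU
            out[i, j] = np.maximum(h @ w2 + b2, 0)   # output layer + ReLU
    return out

# Tiny example: 5x5 single-channel input, 4 hidden units, 2 output channels
rng = np.random.default_rng(0)
x = rng.standard_normal((5, 5, 1))
w1, b1 = rng.standard_normal((9, 4)), np.zeros(4)
w2, b2 = rng.standard_normal((4, 2)), np.zeros(2)
print(mlpconv(x, w1, b1, w2, b2).shape)  # (3, 3, 2)
```

Because the same w1, w2 are reused at every position, the micro-network behaves exactly like a convolution and can be trained end to end with backpropagation.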

Realize:

Use a nonlinear activation function such as ReLU; let k index the channel and x_{i,j} denote the input patch centered at pixel (i, j). In the mlpconv layer each neuron is computed as

f(1)_{i,j,k1} = max(w(1)_{k1} · x_{i,j} + b_{k1}, 0)
f(n)_{i,j,kn} = max(w(n)_{kn} · f(n-1)_{i,j} + b_{kn}, 0)

Here n indexes the layer within the micro-network. In figure (b), each neuron produces only a single output, while its input at that position is multidimensional (one 1*k vector per pixel, i.e. k channels), so the whole step can be viewed as a 1*1 convolution over the k channels. Later papers commonly use this trick to reduce dimensionality — not the spatial dimensions of the image, but the channel dimension — compressing the multi-channel information very cheaply.
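The per-pixel MLP step is therefore just a 1*1 convolution: every pixel's k-channel vector is multiplied by the same weight matrix. A minimal NumPy sketch of using it to reduce the channel dimension (the shapes here are illustrative assumptions):

```python
import numpy as np

def conv1x1(x, w, b):
    """1x1 convolution: mix channels at each pixel with shared weights.

    x: feature maps, shape (H, W, K)
    w: weight matrix, shape (K, K_out); K_out < K gives dimension reduction
    Returns shape (H, W, K_out).
    """
    return np.maximum(x @ w + b, 0)  # per-pixel matmul over channels + ReLU

rng = np.random.default_rng(1)
x = rng.standard_normal((8, 8, 64))      # 64-channel input
w = rng.standard_normal((64, 16)) * 0.1  # compress 64 -> 16 channels
y = conv1x1(x, w, np.zeros(16))
print(y.shape)  # (8, 8, 16)
```

Note that the spatial size (8, 8) is untouched; only the channel dimension shrinks from 64 to 16.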

Use a global average pooling layer instead of the FC layer:

With such a micro-network structure, the network abstracts better local features, so each feature map can correspond directly to a category. Replacing the FC layer before the softmax with global average pooling leaves no parameters to optimize in that layer, which reduces computation and the overfitting a large FC layer invites.

The process is this: for each feature map, compute its spatial average; the averages form a feature vector, which is fed into the following softmax layer.
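That process fits in a few lines of NumPy; the map size and class count below are illustrative assumptions:

```python
import numpy as np

def gap_softmax(feature_maps):
    """Global average pooling followed by softmax.

    feature_maps: shape (H, W, C), one map per class (C = number of classes).
    No learned parameters: each map is averaged to a single confidence score.
    """
    scores = feature_maps.mean(axis=(0, 1))  # (C,): one value per feature map
    e = np.exp(scores - scores.max())        # numerically stable softmax
    return e / e.sum()

rng = np.random.default_rng(2)
maps = rng.standard_normal((7, 7, 10))       # 10 class feature maps
probs = gap_softmax(maps)
print(probs.shape)  # (10,)
```

Unlike an FC layer, nothing here is trained: the mapping from feature maps to class confidences is fixed, which forces the preceding mlpconv layers to produce category-like maps.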


Summarize the advantages of NIN:

(1) Better local abstraction

(2) Removing the fully connected layer means fewer parameters

(3) Less overfitting

"CV paper reading" Network in Network

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.