Objective:
By replacing the traditional convolution layer with an mlpconv layer, the network can learn more abstract features. A traditional convolution layer computes a linear combination of the previous layer's outputs followed by a nonlinear activation (a generalized linear model, GLM); the authors argue that this implicitly assumes the latent features are linearly separable. The mlpconv layer instead uses a multilayer perceptron, a deep structure that can approximate any nonlinear function. High-level abstract features in the network should be invariant to the different variants of the same concept (by abstraction we mean that the feature is invariant to variants of the same concept). The micro neural network slides over the input map with shared weights, so the mlpconv layer can also be trained with the backpropagation algorithm. The paper compares the traditional convolution layer (left) with the mlpconv layer (right).
Implementation:
Using a nonlinear activation function such as ReLU, let k index the channel and let x_{i,j} denote the input patch centered at pixel (i, j). In the mlpconv layer each neuron is computed as

f^1_{i,j,k_1} = max((w^1_{k_1})^T x_{i,j} + b_{k_1}, 0)
...
f^n_{i,j,k_n} = max((w^n_{k_n})^T f^{n-1}_{i,j} + b_{k_n}, 0)
Here n is the number of layers in the micro-network. In figure (b) above, each neuron produces only a single output while its input is multidimensional (it can be understood as multi-channel: at each layer the per-pixel activation is a 1×k vector), so the whole process is equivalent to a 1×1 convolution over the k channels. Many subsequent papers use this trick to reduce dimensionality, not in the spatial dimensions of the image but in the channel dimension, so that multi-channel information can be compressed effectively without losing abstraction.
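The 1×1-convolution view can be sketched in a few lines of numpy. This is an illustrative toy (shapes, channel counts, and function names are my assumptions, not from the paper): a two-layer micro-MLP applied identically at every pixel, which is exactly a stack of two 1×1 convolutions.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def conv1x1(x, w, b):
    """x: (H, W, K_in); w: (K_in, K_out); b: (K_out,).
    The same linear map across channels is applied at every
    pixel, which is what a 1x1 convolution does."""
    return x @ w + b

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 8, 64))          # 64-channel input feature map

# two-layer micro-MLP, weights shared across all spatial positions
w1, b1 = rng.standard_normal((64, 32)), np.zeros(32)
w2, b2 = rng.standard_normal((32, 16)), np.zeros(16)

h = relu(conv1x1(x, w1, b1))                 # f^1: 64 -> 32 channels
y = relu(conv1x1(h, w2, b2))                 # f^2: 32 -> 16 channels

print(y.shape)                               # spatial size preserved, channels reduced
```

Note that the spatial extent (8×8) never changes; only the channel dimension is transformed, which is why 1×1 convolutions are a cheap way to compress the channel dimension.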
Use the global average pooling layer instead of the FC layer:
With such a micro-network structure, better local features can be abstracted, so that each feature map corresponds directly to a category. When the FC layer before the softmax is removed, there are no parameters to optimize in this stage, which reduces computation and reduces overfitting.
The process is as follows: for each feature map, compute its spatial average; the averages form a feature vector, which is fed into the subsequent softmax layer.
Summary of NIN's advantages:
(1) Better local abstraction
(2) No fully connected layer, so fewer parameters
(3) Less overfitting
"CV paper reading" Network in Network