Network in Network learning notes
- The convolution layers of LeNet and other traditional CNNs apply a linear filter to each local image patch (an inner product), followed by a nonlinear activation function; the resulting outputs are called feature maps. The convolution filter is a generalized linear model (GLM), so using it for feature extraction implicitly assumes that the latent features are linearly separable, whereas real problems are usually far from linear.
GLM: generalized linear model.
What kind of model has a higher level of abstraction? A nonlinear function approximator that is more expressive than a linear model, for example an MLP (multilayer perceptron).
Advantages of MLP:
1. It is a very effective universal function approximator.
2. It can be trained with backpropagation (BP), so it integrates seamlessly into a CNN.
3. It is itself a deep model, which enables feature reuse.
- Innovation of Network in Network: it improves the traditional CNN architecture by replacing each convolution layer with a small multilayer fully connected network (a multilayer perceptron, MLP) that slides over the input, so each local patch is transformed by a function that can approximate any mapping rather than by a simple linear convolution. It also removes the traditional fully connected layers at the end of the CNN: the last layer outputs one feature map per class, and these maps are reduced by global average pooling and fed into a Softmax layer that outputs the class probabilities.
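A minimal numpy sketch of this output stage (the shapes and names here are my own illustration, not code from the paper):

import numpy as np

num_classes, h, w = 10, 6, 6
feature_maps = np.random.randn(num_classes, h, w)  # last layer: one feature map per class
gap = feature_maps.mean(axis=(1, 2))               # global average pooling: one score per class
probs = np.exp(gap - gap.max())
probs = probs / probs.sum()                        # softmax gives the class probabilities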
- The difference between the local receptive field of an ordinary convolution and the MLP structure:
In an ordinary convolution layer, the operation on a local receptive field can be understood as a single-layer network, as shown in the following illustration:
The network structure of the MLPCONV layer is as follows:
To sum up in one sentence:
1. Regular convolution layer: conv → relu
2. Maxout: several conv (full) → max
3. NIN: several conv (full) → relu → conv (1x1) → relu
In a little more detail:
Regular convolution layer: conv → relu
conv: conv_out = ∑(x · W)
relu: y = max(0, conv_out)
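A minimal numpy sketch of this path for a single flattened patch (the variable names are illustrative, not taken from any particular framework):

import numpy as np

x = np.random.randn(9)       # a 3x3 patch flattened into a 9-D vector
w = np.random.randn(9)       # a 3x3 kernel flattened into a 9-D vector
conv_out = np.dot(x, w)      # conv: inner product of patch and kernel
y = np.maximum(0, conv_out)  # relu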
Maxout: several conv (full) → max
several conv (full): conv_out1 = x · W_1, conv_out2 = x · W_2, ...
max: y = max(conv_out1, conv_out2, ...)
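A matching numpy sketch of a single maxout unit on the same kind of patch (k is chosen freely; names are illustrative):

import numpy as np

x = np.random.randn(9)       # a 3x3 patch flattened into a 9-D vector
k = 4                        # number of parallel kernels
W = np.random.randn(k, 9)    # W_1 ... W_k, one 3x3 kernel per row
conv_outs = W @ x            # conv_out1 ... conv_outk
y = conv_outs.max()          # max over the k responses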
NIN: several conv (full) → relu → conv (1x1) → relu
several conv (full): conv_out1 = x · W_1, conv_out2 = x · W_2, ...
relu: relu_out1 = max(0, conv_out1), relu_out2 = max(0, conv_out2), ...
conv (1x1): conv_1x1_out = [relu_out1, relu_out2, ...] · W_1x1
relu: y = max(0, conv_1x1_out)
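And a numpy sketch of the NIN (mlpconv) path on the same kind of patch, again with illustrative names:

import numpy as np

x = np.random.randn(9)            # a 3x3 patch flattened into a 9-D vector
k = 4                             # number of parallel kernels
W = np.random.randn(k, 9)         # W_1 ... W_k, one 3x3 kernel per row
w_1x1 = np.random.randn(k)        # 1x1 conv weights across the k channels

conv_outs = W @ x                        # several conv (full)
relu_outs = np.maximum(0, conv_outs)     # relu
conv_1x1_out = np.dot(relu_outs, w_1x1)  # conv (1x1): mixes the k channels
y = np.maximum(0, conv_1x1_out)          # relu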
An example to explain:
Suppose there is a 3x3 input, represented by a 9-D vector x, and the convolution kernel is also 3x3, represented by a 9-D vector W.
For the conventional convolution layer, x is convolved directly with W and the result is passed through a ReLU. For Maxout, there are k 3x3 kernels W (where k can be set freely); convolving with each gives k 1x1 outputs, and the maximum of these k values is taken. For NIN, there are likewise k 3x3 kernels W (k set freely); convolving gives k 1x1 outputs, each is passed through a ReLU, the k values are then combined by a 1x1 convolution, and the result is passed through a ReLU again. (This process is equivalent to a small fully connected network.)
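A hedged numeric version of this example with k = 2 and made-up weights; note that the two matrix products in the NIN branch are exactly a tiny two-layer fully connected network applied to the patch:

import numpy as np

x = np.arange(9, dtype=float)                 # the 3x3 input, flattened to 9-D
w = np.ones(9) / 9.0                          # single 3x3 kernel for the regular conv
W = np.stack([np.ones(9) / 9.0,               # k = 2 kernels for Maxout / NIN
              -np.ones(9) / 9.0])
w_1x1 = np.array([1.0, 0.5])                  # 1x1 conv weights for NIN

y_conv = np.maximum(0, x @ w)                        # regular: conv -> relu
y_maxout = (W @ x).max()                             # maxout: k convs -> max
y_nin = np.maximum(0, np.maximum(0, W @ x) @ w_1x1)  # NIN: convs -> relu -> 1x1 -> relu
print(y_conv, y_maxout, y_nin)                       # 4.0 4.0 4.0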
Illustration
Continuing with my rough hand-drawn sketches, from top to bottom they correspond to the conventional convolution layer, Maxout, and NIN:
The following is a code example for the MLPCONV layer in Caffe:
layers {
  bottom: "data"  top: "conv1"  name: "conv1"  type: CONVOLUTION
  blobs_lr: 1  blobs_lr: 2  weight_decay: 1  weight_decay: 0
  convolution_param {
    num_output: 96  kernel_size: 11  stride: 4
    weight_filler { type: "gaussian"  mean: 0  std: 0.01 }
    bias_filler { type: "constant"  value: 0 }
  }
}
layers { bottom: "conv1"  top: "conv1"  name: "relu0"  type: RELU }
layers {
  bottom: "conv1"  top: "cccp1"  name: "cccp1"  type: CONVOLUTION
  blobs_lr: 1  blobs_lr: 2  weight_decay: 1  weight_decay: 0
  convolution_param {
    num_output: 96  kernel_size: 1  stride: 1
    weight_filler { type: "gaussian"  mean: 0  std: 0.05 }
    bias_filler { type: "constant"  value: 0 }
  }
}
layers { bottom: "cccp1"  top: "cccp1"  name: "relu1"  type: RELU }
layers {
  bottom: "cccp1"  top: "cccp2"  name: "cccp2"  type: CONVOLUTION
  blobs_lr: 1  blobs_lr: 2  weight_decay: 1  weight_decay: 0
  convolution_param {
    num_output: 96  kernel_size: 1  stride: 1
    weight_filler { type: "gaussian"  mean: 0  std: 0.05 }
    bias_filler { type: "constant"  value: 0 }
  }
}
layers { bottom: "cccp2"  top: "cccp2"  name: "relu2"  type: RELU }
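In this snippet, conv1 is the ordinary 11x11 convolution, while cccp1 and cccp2 (cascaded cross channel parametric pooling) are the two 1x1 convolutions that, together with their ReLUs, form the per-position MLP of the mlpconv layer.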
The composite network structure of the MLPCONV layer is shown as follows:
- Summary: both Maxout and NIN are improvements on the traditional conv+ReLU. Maxout shows that it can fit any convex function, and therefore any convex activation function (the commonly used activation functions are convex). NIN shows that it can fit not only convex functions but any function, because it is essentially a small fully connected neural network.