Network in Network notes

- LeNet and other traditional CNNs use linear filters in their convolution layers: each filter takes an inner product over a local patch of the image, each local output is passed through a non-linear activation function, and the result is called a feature map. The convolution filter is therefore a generalized linear model (GLM). Using such a GLM for feature extraction implicitly assumes that the latent features are linearly separable, but real problems are often not linearly separable.

GLM: generalized linear model.

What kind of model has a higher level of abstraction? A nonlinear function approximator, which is more expressive than a linear model (e.g., an MLP, a multilayer perceptron).

Advantages of MLP:
1. It is a very effective universal function approximator.
2. It can be trained with backpropagation and therefore integrates seamlessly into a CNN.
3. It is itself a deep model, which allows feature reuse.

- Innovation of Network in Network: it improves the traditional CNN architecture. Each convolution layer is replaced by a small multilayer fully connected network (i.e., a multilayer perceptron, MLP), which can approximate any function rather than performing only a linear convolution. The traditional CNN's fully connected layers are also eliminated: the penultimate layer consists of feature maps, one per class; each feature map is globally average-pooled and the pooled values are fed to a Softmax layer that outputs the class probabilities. - The difference between the local receptive field of an ordinary convolution and the mlpconv structure:

In an ordinary convolution layer, the operation over a local receptive-field window can be understood as a single-layer network, as shown in the following illustration:

The network structure of the mlpconv layer is as follows:

To sum up in one sentence:
1. Regular convolution layer: conv → ReLU
2. Maxout: several conv (full) → max
3. NIN: several conv (full) → ReLU → conv (1x1) → ReLU

In a little more detail:

Regular convolution layer: conv → ReLU
conv: conv_out = ∑(x · W)
ReLU: y = max(0, conv_out)

Maxout: several conv (full) → max
several conv (full): conv_out1 = x · W_1, conv_out2 = x · W_2, ...
max: y = max(conv_out1, conv_out2, ...)

NIN: several conv (full) → ReLU → conv (1x1) → ReLU
several conv (full): conv_out1 = x · W_1, conv_out2 = x · W_2, ...
ReLU: relu_out1 = max(0, conv_out1), relu_out2 = max(0, conv_out2), ...
conv (1x1): conv_1x1_out = [relu_out1, relu_out2, ...] · w_1x1
ReLU: y = max(0, conv_1x1_out)

An example to illustrate:

Suppose there is a 3x3 input, represented as a 9-dimensional vector x; the convolution kernel is also 3x3, represented as a 9-dimensional vector W.
For the regular convolution layer, x and W are convolved directly (a single inner product) and the result is passed through ReLU. For Maxout, there are K 3x3 kernels W (K can be set freely); convolving with each gives a 1x1 output, and the maximum over the K outputs is taken. For NIN, there are likewise K 3x3 kernels W (K can again be set freely); convolving with each gives a 1x1 output, these are passed through ReLU, combined by a 1x1 convolution, and passed through ReLU once more. (This process is equivalent to a small fully connected network.)
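A minimal NumPy sketch of this 3x3-patch example (an illustration only, not code from the papers; flattening to 9-D vectors follows the text above, and K = 3 is an arbitrary choice):

import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(9)        # the 3x3 input patch, flattened to a 9-D vector
W = rng.standard_normal((3, 9))   # K = 3 kernels of size 3x3, each flattened to 9-D
w_1x1 = rng.standard_normal(3)    # weights of the 1x1 convolution across the K channels

def relu(v):
    return np.maximum(0, v)

# 1. Regular convolution layer: a single inner product, then ReLU
y_conv = relu(x @ W[0])

# 2. Maxout: K inner products, then take the maximum
y_maxout = np.max(x @ W.T)

# 3. NIN (mlpconv): K inner products -> ReLU -> 1x1 convolution across channels -> ReLU
y_nin = relu(relu(x @ W.T) @ w_1x1)

print(y_conv, y_maxout, y_nin)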

Illustration
Continuing with my rough hand-drawn sketches, from top to bottom these correspond to the regular convolution layer, Maxout, and NIN:

The following is a code example of the mlpconv layer in Caffe:

layers {
  bottom: "data"
  top: "conv1"
  name: "conv1"
  type: CONVOLUTION
  blobs_lr: 1
  blobs_lr: 2
  weight_decay: 1
  weight_decay: 0
  convolution_param {
    num_output: 96
    kernel_size: 11
    stride: 4
    weight_filler {
      type: "gaussian"
      mean: 0
      std: 0.01
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
layers {
  bottom: "conv1"
  top: "conv1"
  name: "relu0"
  type: RELU
}
layers {
  bottom: "conv1"
  top: "cccp1"
  name: "cccp1"
  type: CONVOLUTION
  blobs_lr: 1
  blobs_lr: 2
  weight_decay: 1
  weight_decay: 0
  convolution_param {
    num_output: 96
    kernel_size: 1
    stride: 1
    weight_filler {
      type: "gaussian"
      mean: 0
      std: 0.05
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
layers {
  bottom: "cccp1"
  top: "cccp1"
  name: "relu1"
  type: RELU
}
layers {
  bottom: "cccp1"
  top: "cccp2"
  name: "cccp2"
  type: CONVOLUTION
  blobs_lr: 1
  blobs_lr: 2
  weight_decay: 1
  weight_decay: 0
  convolution_param {
    num_output: 96
    kernel_size: 1
    stride: 1
    weight_filler {
      type: "gaussian"
      mean: 0
      std: 0.05
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
layers {
  bottom: "cccp2"
  top: "cccp2"
  name: "relu2"
  type: RELU
}
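In this snippet, conv1 is the ordinary 11x11 convolution, while cccp1 and cccp2 are 1x1 convolutions with a ReLU after each (cccp presumably stands for cross channel parametric pooling, the paper's term for the 1x1 convolution); together these layers form one mlpconv block.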

The overall network structure composed of mlpconv layers is as follows:
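As a rough NumPy sketch of how this composite structure ends (global average pooling over the per-class feature maps followed by softmax; the 10-class and 6x6 shapes here are arbitrary illustration choices, not the paper's configuration):

import numpy as np

def global_average_pooling(feature_maps):
    # feature_maps: array of shape (num_classes, H, W) -> vector of length num_classes
    return feature_maps.mean(axis=(1, 2))

def softmax(logits):
    # numerically stable softmax over the class scores
    shifted = logits - logits.max()
    exp = np.exp(shifted)
    return exp / exp.sum()

# Example: 10 classes, 6x6 feature maps from the last mlpconv layer
feature_maps = np.random.randn(10, 6, 6)
class_probs = softmax(global_average_pooling(feature_maps))
print(class_probs.shape, class_probs.sum())  # (10,) 1.0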

- Summing up: both Maxout and NIN are improvements on the traditional conv + ReLU layer. Maxout shows that it can fit any convex function, and hence any convex activation function (the common activation functions are convex). NIN shows that it can fit not only any convex function but any function, because it is essentially a small fully connected neural network.
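To make the convexity claim concrete, here is a small NumPy sketch (an illustration of the general fact, not code from either paper): the pointwise maximum of several affine functions is convex, and with enough pieces it can approximate a convex function such as f(x) = x^2 arbitrarily well. The choice of five tangent lines on [-2, 2] is arbitrary.

import numpy as np

xs = np.linspace(-2, 2, 401)
anchors = np.linspace(-2, 2, 5)                 # points where the tangent lines touch x^2
# Tangent line at anchor a: y = 2*a*x - a^2
pieces = np.stack([2 * a * xs - a**2 for a in anchors])
maxout_approx = pieces.max(axis=0)              # pointwise max over the K affine pieces

max_error = np.abs(maxout_approx - xs**2).max()
print(f"max |x^2 - maxout(x)| on [-2, 2]: {max_error:.3f}")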
