Paper notes "Maxout Networks" && "Network in Network"Posted in 2014-09-22 | 1 ReviewsSource
Maxout: http://arxiv.org/pdf/1302.4389v4.pdf
NIN: http://arxiv.org/abs/1312.4400
Reference
This post does not explain Maxout and NIN from scratch; for background, see:
Deep Learning: 45 (Maxout Simple Understanding)
Network in Network
One-sentence summary of each
- Conventional convolutional layer: conv → relu
- Maxout: several conv (full) → max
- NIN: several conv (full) → relu → conv (1x1) → relu
In detail (a code sketch of all three follows this list)
- Conventional convolutional layer: conv → relu
  - conv: conv_out = x · W (dot product of the flattened patch and kernel)
  - relu: y = max(0, conv_out)
- Maxout: several conv (full) → max
  - several conv (full): conv_out1 = x · W1, conv_out2 = x · W2, ...
  - max: y = max(conv_out1, conv_out2, ...)
- NIN: several conv (full) → relu → conv (1x1) → relu
  - several conv (full): conv_out1 = x · W1, conv_out2 = x · W2, ...
  - relu: relu_out1 = max(0, conv_out1), relu_out2 = max(0, conv_out2), ...
  - conv (1x1): conv_1x1_out = [relu_out1, relu_out2, ...] · w_1x1
  - relu: y = max(0, conv_1x1_out)
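To make the three pipelines concrete, here is a minimal NumPy sketch (my own, not from either paper) that runs all three on a single flattened 3x3 patch; x, W, k, and w_1x1 follow the notation above, and the random values are purely illustrative:

```python
import numpy as np

# Minimal sketch: all three blocks on one flattened 3x3 patch.
rng = np.random.default_rng(0)
x = rng.standard_normal(9)        # 3x3 input patch, flattened to 9-D
k = 4                             # number of parallel kernels (free to set)
W = rng.standard_normal((k, 9))   # k full 3x3 kernels, each flattened

# Conventional layer: conv -> relu (one kernel, here W[0])
conv_out = W[0] @ x
y_conv = max(0.0, conv_out)

# Maxout: several conv (full) -> max (the max itself is the nonlinearity)
conv_outs = W @ x                 # conv_out1, ..., conv_outk
y_maxout = conv_outs.max()

# NIN: several conv (full) -> relu -> conv (1x1) -> relu
relu_outs = np.maximum(0.0, conv_outs)   # relu_out1, ..., relu_outk
w_1x1 = rng.standard_normal(k)           # 1x1 conv mixes the k feature maps
y_nin = max(0.0, relu_outs @ w_1x1)

print(y_conv, y_maxout, y_nin)
```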
Worked example
Suppose we have a 3x3 input, flattened into a 9-dimensional vector x, and a 3x3 convolution kernel, likewise flattened into a 9-dimensional vector w.
- Conventional convolutional layer: convolve x with w directly (here just a dot product), then apply relu.
- Maxout: take k 3x3 kernels w_1, ..., w_k (k is free to set), convolve each with x to get k 1x1 outputs, then take the maximum of the k values.
- NIN: take k 3x3 kernels w_1, ..., w_k (k is again free to set), convolve each with x to get k 1x1 outputs, apply relu to all of them, convolve the results again with a 1x1 kernel, and apply relu once more. (This whole process is equivalent to a small fully connected network; see the sketch after this list.)
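The equivalence claimed in the last bullet can be checked directly: on a single patch, NIN's pipeline is exactly a two-layer fully connected network. A small sketch (variable names are mine):

```python
import numpy as np

# On one patch, NIN = a tiny MLP: Dense(9 -> k) + relu, Dense(k -> 1) + relu.
rng = np.random.default_rng(1)
x = rng.standard_normal(9)         # the 9-D patch vector from the example
k = 4
W = rng.standard_normal((k, 9))    # the k full 3x3 kernels, flattened
w_1x1 = rng.standard_normal(k)     # the 1x1 conv weights across the k maps

# NIN view: k full convolutions -> relu -> 1x1 convolution -> relu
nin_out = max(0.0, np.maximum(0.0, W @ x) @ w_1x1)

# MLP view: the same arithmetic read as two fully connected layers
hidden = np.maximum(0.0, W @ x)      # first dense layer + relu
mlp_out = max(0.0, hidden @ w_1x1)   # second dense layer + relu

assert np.isclose(nin_out, mlp_out)  # identical by construction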
Illustration
Another of my rough hand-drawn sketches; from top to bottom: the conventional convolution layer, Maxout, and NIN:
Summary
Overall, Maxout and NIN are both improvements on the traditional conv → relu block.
Maxout's point is that a max over several linear functions can fit any convex function, and can therefore fit any convex activation function (common activation functions such as relu are convex).
NIN's point is that it can fit not only convex functions but arbitrary functions, because it is essentially a small fully connected neural network slid over the input.
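The convex-fitting claim is easy to see numerically: a maxout unit is a max over several affine functions of the input, and the max of tangent lines to a convex function approximates it from below. A quick sketch (my own construction) approximating f(x) = x^2:

```python
import numpy as np

# A maxout unit = max over several affine functions of the input.
# The max of tangent lines to a convex function approximates it from below;
# here we approximate f(x) = x^2 with 9 linear pieces.
xs = np.linspace(-2.0, 2.0, 201)
anchors = np.linspace(-2.0, 2.0, 9)

# Tangent to x^2 at a: t_a(x) = 2*a*x - a^2 (slope 2a, bias -a^2)
pieces = np.stack([2 * a * xs - a * a for a in anchors])  # shape (9, 201)
maxout = pieces.max(axis=0)       # the maxout unit's output at each x

err = np.abs(maxout - xs ** 2).max()
print(f"max error with 9 pieces: {err:.4f}")  # ~0.06; shrinks with more pieces
```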
Paper notes "Maxout Networks" && "Network in Network"