VGGNet evolved from AlexNet, and its main goal was to explore the relationship between the depth of a convolutional neural network and its performance. By repeatedly stacking 3x3 convolution kernels (1x1 kernels appear only in configuration C, which is 16 layers deep) and 2x2 max-pooling layers, VGGNet constructs deep convolutional networks of 16 to 19 layers.
3x3 convolution kernel: the smallest size that can capture the notions of left/right, up/down, and center.
1x1 convolution kernel: can be seen as a linear transformation of the input channels (followed by a non-linearity).
The convolution stride of the entire network is fixed at 1, and all hidden-layer activation functions are ReLU.
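To make the stride-1 design concrete, here is a minimal sketch (pure Python, using the standard convolution output-size formula) of how a 3x3 convolution with padding 1 preserves spatial size while a 2x2 max pool with stride 2 halves it; the 224x224 input size is the one used by VGGNet on ImageNet:

```python
def conv_out(size, kernel, stride=1, pad=0):
    """Standard output-size formula: floor((size + 2*pad - kernel) / stride) + 1."""
    return (size + 2 * pad - kernel) // stride + 1

size = 224                                          # VGGNet's ImageNet input side
size = conv_out(size, kernel=3, stride=1, pad=1)    # 3x3 conv, padding 1 -> 224 (unchanged)
after_conv = size
size = conv_out(size, kernel=2, stride=2)           # 2x2 max pool, stride 2 -> 112 (halved)
print(after_conv, size)  # 224 112
```

So within a VGG block the 3x3 convolutions never shrink the feature map; only the pooling layers reduce resolution.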
VGGNet stacks stride-1 3x3 convolution kernels. The receptive field of two stacked 3x3 kernels is equivalent to one 5x5 kernel, and the receptive field of three stacked 3x3 kernels is equivalent to one 7x7 kernel. Stacking 3x3 kernels has two advantages over directly using a large kernel:
1. More nonlinearity: each convolutional layer is followed by a ReLU, so three stacked 3x3 layers apply three nonlinear transformations where a single 7x7 layer applies only one, which makes the decision function more discriminative.
2. Fewer parameters: assuming the number of input and output channels is C, three 3x3 layers have 3*(3*3*C*C) = 27*C*C parameters, while a single 7x7 layer has 7*7*C*C = 49*C*C.
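Both advantages above can be checked with a short sketch (pure Python; biases ignored, C = 256 is just an illustrative channel count): the receptive field of n stacked stride-1 layers grows by (kernel - 1) per extra layer, and the parameter counts follow the formulas from the list.

```python
def params(kernel, channels, layers=1):
    """Weight count for `layers` stacked convs, `channels` in and out, no biases."""
    return layers * kernel * kernel * channels * channels

def receptive_field(kernel, layers):
    """Receptive field of `layers` stacked stride-1 convs of size `kernel`."""
    return kernel + (layers - 1) * (kernel - 1)

C = 256  # illustrative channel count
print(params(3, C, layers=3))   # 27*C*C = 1769472
print(params(7, C, layers=1))   # 49*C*C = 3211264
print(receptive_field(3, 2))    # 5 -> same field as one 5x5 kernel
print(receptive_field(3, 3))    # 7 -> same field as one 7x7 kernel
```

For the same 7x7 receptive field, the stacked version uses roughly 45% fewer weights (27 vs 49 per channel pair) while adding two extra ReLUs.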
1x1 convolution kernels in VGGNet: GoogLeNet also uses 1x1 kernels, but for a different purpose. In VGGNet the goal is to add nonlinearity without dimensionality reduction, so the input and output channel dimensions of a 1x1 kernel are required to be equal.
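A minimal sketch (pure Python, toy sizes chosen for illustration) of why a 1x1 convolution is just a per-pixel linear map over channels: with C_out = C_in, as VGGNet requires, neither the spatial dimensions nor the channel count changes.

```python
def conv1x1(x, w):
    """1x1 convolution. x: [C_in][H][W] feature map, w: [C_out][C_in] weights."""
    c_in, h, wid = len(x), len(x[0]), len(x[0][0])
    c_out = len(w)
    y = [[[0.0] * wid for _ in range(h)] for _ in range(c_out)]
    for co in range(c_out):
        for ci in range(c_in):
            for i in range(h):
                for j in range(wid):
                    # Same linear combination of channels at every pixel.
                    y[co][i][j] += w[co][ci] * x[ci][i][j]
    return y

x = [[[1.0, 2.0], [3.0, 4.0]],        # toy input: 2 channels of 2x2
     [[5.0, 6.0], [7.0, 8.0]]]
w = [[1.0, 0.0],                      # C_out = C_in = 2, as in VGGNet config C
     [1.0, 1.0]]
y = conv1x1(x, w)
print(y[0])  # [[1.0, 2.0], [3.0, 4.0]]   (copy of input channel 0)
print(y[1])  # [[6.0, 8.0], [10.0, 12.0]] (sum of both input channels)
```

In the network this linear map is followed by a ReLU, which is exactly the extra nonlinearity the text describes.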
VGG explained: http://blog.csdn.net/wcy12341189/article/details/56281618
Calculating multi-channel parameter counts: http://blog.csdn.net/u014114990/article/details/51125776
Very Deep Convolutional Networks for Large-Scale Image Recognition (VGGNet)