Preface
This article writes down how to calculate the number of parameters in a convolutional neural network, and then works through the parameters of several common networks. One goal is to strengthen understanding of the network structures; the other is to get a general sense of the magnitude of each network's parameter count. It can also serve as a memo, so the numbers do not have to be recomputed every time they are needed.

Parameter calculation method

The parameter calculation for fully connected layers is not covered here; it is relatively simple.
First, a brief look at how convolution parameters are calculated. The figure below shows a 32x32x3 input convolved with a 5x5x3 filter at one position; this is a dot product, so the output at that position is a single scalar value.
Because the convolution is implemented as a sliding window, the convolution operation produces a 28x28x1 output.
If there were 6 of the filters above, the output would be 28x28x6.
This is the most basic convolution operation, so how many parameters does it use? We just need to add up the parameters of each filter, and of course not forget the bias: 5x5x3x6 + 6 = 456.
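As a minimal sketch (the function name is just for illustration), the count above can be computed as:

```python
def conv_params(f, in_channels, out_channels):
    """Parameters of a conv layer: one f x f x in_channels kernel
    per output channel, plus one bias per output channel."""
    return f * f * in_channels * out_channels + out_channels

# The 5x5x3 filter example above, with 6 filters:
print(conv_params(5, 3, 6))  # 456
```

The same function reproduces the per-layer formulas in the tables below.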
The next thing to calculate is the size of the output after the convolution. It is easy to understand from the figure below and can be computed directly with the formula (n - f) / stride + 1, where n is the size of the input image, f is the size of the filter, and stride is the sliding step.
From the last example in the figure above, you can see that when the stride is greater than 1, the division does not necessarily come out even. In that case, a layer of padding is added around the original image so that its size changes, and then the same formula applies to the padded size: (n - f + 2·padding) / stride + 1.
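The output-size formula above, as a small sketch:

```python
def conv_output_size(n, f, stride=1, padding=0):
    """Output size of a convolution: (n - f + 2*padding) // stride + 1."""
    return (n - f + 2 * padding) // stride + 1

print(conv_output_size(32, 5))              # 28, as in the example above
print(conv_output_size(227, 11, stride=4))  # 55, AlexNet's first conv layer
```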
Then there is the max-pooling operation, which changes the size of the output but has no parameters, so the same output-size formula as for convolution applies.
LeNet
First, the simplest network, LeNet. The network structure is as follows:
| Network layer (operation) | Input | Filter | Stride | Padding | Output | Calculation formula | Number of parameters |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Input | 32x32x1 | | | | 32x32x1 | | 0 |
| Conv1 | 32x32x1 | 5x5x6 | 1 | 0 | 28x28x6 | 5x5x1x6+6 | 156 |
| MaxPool1 | 28x28x6 | 2x2 | 2 | 0 | 14x14x6 | | 0 |
| Conv2 | 14x14x6 | 5x5x16 | 1 | 0 | 10x10x16 | 5x5x6x16+16 | 2416 |
| MaxPool2 | 10x10x16 | 2x2 | 2 | 0 | 5x5x16 | | 0 |
| FC1 | 5x5x16 | | | | 120 | 5x5x16x120+120 | 48120 |
| FC2 | 120 | | | | 84 | 120x84+84 | 10164 |
| FC3 | 84 | | | | 10 | 84x10+10 | 850 |
Total Parameters: 61706
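The memory figures quoted in this article follow from the parameter counts, assuming each parameter is a 32-bit float (4 bytes):

```python
def param_memory_kb(n_params, bytes_per_param=4):
    """Memory for n_params parameters stored as float32, in KB."""
    return n_params * bytes_per_param / 1024

print(param_memory_kb(61706))            # ~241.04 KB (LeNet)
print(param_memory_kb(62378344) / 1024)  # ~237.95 MB (AlexNet)
```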
Parameter memory consumption: 241.039 KB

AlexNet
AlexNet's structure diagram looks a bit strange, but only because the network was split across two GPUs, so it is drawn as two parallel halves with the same structure. The calculation below treats the two halves as one merged network.
| Network layer (operation) | Input | Filter | Stride | Padding | Output | Calculation formula | Number of parameters |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Input | 227x227x3 | | | | 227x227x3 | | 0 |
| Conv1 | 227x227x3 | 11x11x96 | 4 | 0 | 55x55x96 | 11x11x3x96+96 | 34944 |
| MaxPool1 | 55x55x96 | 3x3 | 2 | 0 | 27x27x96 | | 0 |
| Norm1 | 27x27x96 | | | | 27x27x96 | | 0 |
| Conv2 | 27x27x96 | 5x5x256 | 1 | 2 | 27x27x256 | 5x5x96x256+256 | 614656 |
| MaxPool2 | 27x27x256 | 3x3 | 2 | 0 | 13x13x256 | | 0 |
| Norm2 | 13x13x256 | | | | 13x13x256 | | 0 |
| Conv3 | 13x13x256 | 3x3x384 | 1 | 1 | 13x13x384 | 3x3x256x384+384 | 885120 |
| Conv4 | 13x13x384 | 3x3x384 | 1 | 1 | 13x13x384 | 3x3x384x384+384 | 1327488 |
| Conv5 | 13x13x384 | 3x3x256 | 1 | 1 | 13x13x256 | 3x3x384x256+256 | 884992 |
| MaxPool3 | 13x13x256 | 3x3 | 2 | 0 | 6x6x256 | | 0 |
| FC6 | 6x6x256 | | | | 4096 | 6x6x256x4096+4096 | 37752832 |
| FC7 | 4096 | | | | 4096 | 4096x4096+4096 | 16781312 |
| FC8 | 4096 | | | | 1000 | 4096x1000+1000 | 4097000 |
Total Parameters: 62378344
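The total above can be reproduced by summing the per-layer formulas from the table (layer specs transcribed from the table):

```python
def conv_params(f, in_c, out_c):
    """f x f conv: weights per output channel plus one bias each."""
    return f * f * in_c * out_c + out_c

def fc_params(in_n, out_n):
    """Fully connected layer: weight matrix plus biases."""
    return in_n * out_n + out_n

# (filter size, input channels, output channels) for each conv layer
convs = [(11, 3, 96), (5, 96, 256), (3, 256, 384), (3, 384, 384), (3, 384, 256)]
# (input units, output units) for each fully connected layer
fcs = [(6 * 6 * 256, 4096), (4096, 4096), (4096, 1000)]

total = sum(conv_params(*c) for c in convs) + sum(fc_params(*f) for f in fcs)
print(total)  # 62378344
```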
Parameter memory consumption: 237.9545 MB

VGG
VGG commonly comes in 16- and 19-layer variants. Taking the 16-layer version as an example, the model structure diagram is below.
| Network layer (operation) | Input | Filter | Stride | Padding | Output | Calculation formula | Number of parameters |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Input | 224x224x3 | | | | 224x224x3 | | 0 |
| Conv3-64 | 224x224x3 | 3x3x64 | 1 | 1 | 224x224x64 | 3x3x3x64+64 | 1792 |
| Conv3-64 | 224x224x64 | 3x3x64 | 1 | 1 | 224x224x64 | 3x3x64x64+64 | 36928 |
| MaxPool2 | 224x224x64 | 2x2 | 2 | 0 | 112x112x64 | | 0 |
| Conv3-128 | 112x112x64 | 3x3x128 | 1 | 1 | 112x112x128 | 3x3x64x128+128 | 73856 |
| Conv3-128 | 112x112x128 | 3x3x128 | 1 | 1 | 112x112x128 | 3x3x128x128+128 | 147584 |
| MaxPool2 | 112x112x128 | 2x2 | 2 | 0 | 56x56x128 | | 0 |
| Conv3-256 | 56x56x128 | 3x3x256 | 1 | 1 | 56x56x256 | 3x3x128x256+256 | 295168 |
| Conv3-256 | 56x56x256 | 3x3x256 | 1 | 1 | 56x56x256 | 3x3x256x256+256 | 590080 |
| Conv3-256 | 56x56x256 | 3x3x256 | 1 | 1 | 56x56x256 | 3x3x256x256+256 | 590080 |
| MaxPool2 | 56x56x256 | 2x2 | 2 | 0 | 28x28x256 | | 0 |
| Conv3-512 | 28x28x256 | 3x3x512 | 1 | 1 | 28x28x512 | 3x3x256x512+512 | 1180160 |
| Conv3-512 | 28x28x512 | 3x3x512 | 1 | 1 | 28x28x512 | 3x3x512x512+512 | 2359808 |
| Conv3-512 | 28x28x512 | 3x3x512 | 1 | 1 | 28x28x512 | 3x3x512x512+512 | 2359808 |
| MaxPool2 | 28x28x512 | 2x2 | 2 | 0 | 14x14x512 | | 0 |
| Conv3-512 | 14x14x512 | 3x3x512 | 1 | 1 | 14x14x512 | 3x3x512x512+512 | 2359808 |
| Conv3-512 | 14x14x512 | 3x3x512 | 1 | 1 | 14x14x512 | 3x3x512x512+512 | 2359808 |
| Conv3-512 | 14x14x512 | 3x3x512 | 1 | 1 | 14x14x512 | 3x3x512x512+512 | 2359808 |
| MaxPool2 | 14x14x512 | 2x2 | 2 | 0 | 7x7x512 | | 0 |
| FC1 | 7x7x512 | | | | 4096 | 7x7x512x4096+4096 | 102764544 |
| FC2 | 4096 | | | | 4096 | 4096x4096+4096 | 16781312 |
| FC3 | 4096 | | | | 1000 | 4096x1000+1000 | 4097000 |
Total Parameters: 138357544
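Because VGG16 is so regular, the total can be generated from the block structure rather than typed out layer by layer (a sketch; block specs transcribed from the table):

```python
def conv_params(f, in_c, out_c):
    """f x f conv: weights per output channel plus one bias each."""
    return f * f * in_c * out_c + out_c

# VGG16 conv stacks: (number of 3x3 convs, output channels) per block
blocks = [(2, 64), (2, 128), (3, 256), (3, 512), (3, 512)]

total, in_c = 0, 3
for n_convs, out_c in blocks:
    for _ in range(n_convs):
        total += conv_params(3, in_c, out_c)
        in_c = out_c

# Fully connected layers after the final 7x7x512 feature map
for in_n, out_n in [(7 * 7 * 512, 4096), (4096, 4096), (4096, 1000)]:
    total += in_n * out_n + out_n

print(total)  # 138357544
```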
Parameter memory consumption: 527.7921 MB

GoogLeNet
GoogLeNet proposed the Inception concept to increase both the depth and the width of the network and improve the performance of deep neural networks. The GoogLeNet network structure is shown below:
The structure of the Inception module is as follows:
As you can see, the Inception structure is a combination of several convolutions stacked together.
Also, from the network structure above, you can see that there are three classification output layers in total:
This is to address the vanishing-gradient problem when training a deep network: several auxiliary fully connected classifier layers are added in the middle and used during training.
Finally, here is the structure diagram of the model given in the paper:
This diagram gives the number of parameters and the memory used, but I will still note how the Inception modules are calculated, along with a few points to watch out for:
1. The input size should be 224x224x3.
2. Pay attention to the first convolution layer: the padding is not marked, and computing directly without it gives the wrong result. The padding here works the same way as setting the padding parameter to 'SAME' in TensorFlow's convolution, i.e. the output size is ceil(size / stride), achieved by filling in the appropriate amount of zeros. The same applies to the calculations below, so that the outputs match the figure.
3. Columns 5 through 10 in the figure correspond to the convolution operations inside each Inception module; each value is the number of output features. The max-pool operation inside a module uses stride 1 with 'SAME' padding, so its output size is unchanged.
4. When an Inception module finishes, the outputs of its parallel convolution branches are concatenated along the channel dimension: if the branch outputs are 28x28x64, 28x28x128, 28x28x32, and 28x28x32, the final output is 28x28x(64+128+32+32).
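Two of the points above can be sketched in code: the 'SAME' output size from point 2, and the channel concatenation from point 4 (the shapes are illustrative; the batch dimension is omitted and the values are placeholders):

```python
import math
import numpy as np

def same_output_size(n, stride):
    """Output size with 'SAME' padding, as in TensorFlow: ceil(n / stride)."""
    return math.ceil(n / stride)

print(same_output_size(224, 2))  # 112, GoogLeNet's first 7x7/2 conv layer

# Concatenating the branch outputs of one Inception module along channels
branches = [np.zeros((28, 28, c)) for c in (64, 128, 32, 32)]
out = np.concatenate(branches, axis=-1)
print(out.shape)  # (28, 28, 256)
```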
The figure below shows how the output of an Inception module is computed internally.
As you can see, GoogLeNet has far fewer parameters than VGG, yet performs even better.

ResNet
As for ResNet, I do not intend to calculate its parameters, because there are a great many layers; in fact the basic building block of ResNet is fairly simple, and the calculation method is no different from the networks above. Here is a simple picture of the structure.
As you can see, without the skip connections in the middle this would just be a very deep, ordinary convolutional network; the skip connections ensure that gradients can be passed to the lower layers, preventing the vanishing-gradient problem.