This post collects the convolution-layer parameters of several classic CNN architectures: AlexNet, OverFeat, and VGG.
AlexNet
Layer | Input | Kernel | Output | Stride | Pad |
1 | 256 * 3 * 227 * 227 | 48 * 3 * 11 * 11 | 256 * 48 * 55 * 55 | 4 | 0 |
2 | 256 * 48 * 27 * 27 | 128 * 48 * 5 * 5 | 256 * 128 * 27 * 27 | 1 | 2 |
3 | 256 * 128 * 13 * 13 | 192 * 128 * 3 * 3 | 256 * 192 * 13 * 13 | 1 | 1 |
4 | 256 * 192 * 13 * 13 | 192 * 192 * 3 * 3 | 256 * 192 * 13 * 13 | 1 | 1 |
5 | 256 * 192 * 13 * 13 | 192 * 192 * 3 * 3 | 256 * 192 * 13 * 13 | 1 | 1 |
Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton. "ImageNet classification with deep convolutional neural networks." Advances in Neural Information Processing Systems. 2012.
OverFeat
Layer | Input | Kernel | Output | Stride | Pad |
1 | 128 * 3 * 221 * 221 | 96 * 3 * 11 * 11 | 128 * 96 * 106 * 106 | 2 | 0 |
2 | 128 * 96 * 58 * 58 | 256 * 96 * 5 * 5 | 128 * 96 * 54 * 54 | 1 | 0 |
3 | 128 * 96 * 27 * 27 | 512 * 96 * 3 * 3 | 128 * 512 * 27 * 27 | 1 | 1 |
4 | 128 * 512 * 27 * 27 | 1024 * 512 * 3 * 3 | 128 * 1024 * 27 * 27 | 1 | 1 |
5 | 128 * 1024 * 27 * 27 | 1024 * 1024 * 3 * 3 | 128 * 1024 * 27 * 27 | 1 | 1 |
Sermanet, Pierre, et al. "OverFeat: Integrated recognition, localization and detection using convolutional networks." arXiv preprint arXiv:1312.6229 (2013).
VGG
Layer | Input | Kernel | Output | Stride | Pad |
1 | 256 * 3 * 224 * 224 | 64 * 3 * 3 * 3 | 256 * 64 * 222 * 222 | 1 | 0 |
2 | 256 * 64 * 222 * 222 | 64 * 64 * 3 * 3 | 256 * 64 * 220 * 220 | 1 | 0 |
3 | 256 * 64 * 110 * 110 | 128 * 64 * 3 * 3 | 256 * 128 * 108 * 108 | 1 | 0 |
4 | 256 * 128 * 108 * 108 | 128 * 128 * 3 * 3 | 256 * 128 * 106 * 106 | 1 | 0 |
5 | 256 * 128 * 58 * 58 | 256 * 128 * 3 * 3 | 256 * 256 * 56 * 56 | 1 | 0 |
6 | 256 * 256 * 56 * 56 | 256 * 256 * 3 * 3 | 256 * 256 * 54 * 54 | 1 | 0 |
7 | 256 * 256 * 54 * 54 | 256 * 256 * 3 * 3 | 256 * 256 * 52 * 52 | 1 | 0 |
8 | 256 * 256 * 52 * 52 | 256 * 256 * 3 * 3 | 256 * 256 * 52 * 52 | 1 | 1 |
9 | 256 * 256 * 26 * 26 | 512 * 256 * 3 * 3 | 256 * 512 * 24 * 24 | 1 | 0 |
10 | 256 * 512 * 24 * 24 | 512 * 512 * 3 * 3 | 256 * 512 * 22 * 22 | 1 | 0 |
11 | 256 * 512 * 22 * 22 | 512 * 512 * 3 * 3 | 256 * 512 * 20 * 20 | 1 | 0 |
12 | 256 * 512 * 20 * 20 | 512 * 512 * 3 * 3 | 256 * 512 * 18 * 18 | 1 | 0 |
Simonyan, Karen, and Andrew Zisserman. "Very deep convolutional networks for large-scale image recognition." arXiv preprint arXiv:1409.1556 (2014).
Relationship between Out_size and In_size, Kernel_size, Pad_size, and stride
$$\text{Out\_size} = \frac{\text{In\_size} - \text{Kernel\_size} + 2 \times \text{Pad\_size}}{\text{stride}} + 1$$
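The relationship above can be checked against rows of the tables with a short Python sketch. The function name `conv_output_size` and the use of floor division (for cases where the stride does not divide evenly) are assumptions added for illustration; every row sampled here divides exactly, so the rounding choice does not affect the result.

```python
def conv_output_size(in_size: int, kernel_size: int, pad_size: int, stride: int) -> int:
    """Spatial output size of a convolution along one dimension.

    Implements Out_size = (In_size - Kernel_size + 2 * Pad_size) / stride + 1.
    Floor division is an assumption for inputs that do not divide evenly.
    """
    return (in_size - kernel_size + 2 * pad_size) // stride + 1


# (label, In_size, Kernel_size, Pad_size, stride, Out_size listed in the table)
rows = [
    ("AlexNet layer 1", 227, 11, 0, 4, 55),
    ("AlexNet layer 2", 27, 5, 2, 1, 27),
    ("OverFeat layer 1", 221, 11, 0, 2, 106),
    ("VGG layer 1", 224, 3, 0, 1, 222),
    ("VGG layer 8", 52, 3, 1, 1, 52),
]

for label, in_size, kernel_size, pad_size, stride, listed in rows:
    computed = conv_output_size(in_size, kernel_size, pad_size, stride)
    print(f"{label}: computed {computed}, table {listed}")
```

Only the last two dimensions of each Input and Output entry (height and width) enter the formula. A layer's Input is often smaller than the previous layer's Output, which suggests a downsampling step between convolutions that the tables do not list.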
Copyright notice: This is an original article by the blogger and may not be reproduced without the blogger's permission.
Convolution parameters in AlexNet/OverFeat/VGG