Revisiting LeNet-5; here is a brief set of notes.
Problem definition:
Input: 32x32 grayscale image
Output: the recognized digit (0-9)
The network has 7 layers in total (not counting the input, but counting the output layer):
convolution - pooling - convolution - pooling - convolution - fully connected - fully connected (output)
Key points:
Convolution kernels: 5x5
Padding: 0
Stride: 1
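With these settings, each convolution's output size follows the usual formula: output = (input - kernel + 2 x padding) / stride + 1. A quick check in Python (the helper name conv_out_size is my own):

```python
def conv_out_size(n, k, p=0, s=1):
    # Side length after convolution: (input - kernel + 2*padding) // stride + 1
    return (n - k + 2 * p) // s + 1

print(conv_out_size(32, 5))  # 28 -> first convolution: 32x32 to 28x28
print(conv_out_size(14, 5))  # 10 -> second convolution: 14x14 to 10x10
print(conv_out_size(5, 5))   # 1  -> third convolution: 5x5 to 1x1
```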
Notes:
1. Since the input is a grayscale image, the number of channels is 1.
2. The second convolution combines only subsets of the previous layer's feature maps, while the third convolution combines all of them (in later CNNs this full combination is the norm: the combined maps simply become the input channels). The reasons for the partial scheme are discussed at the end of these notes.
(i) Input: 32x32
(ii) First convolution layer: 6 kernels of 5x5, padding 0, stride 1 → 6 feature maps of 28x28 (parameters: 6x5x5+6 = 156)
(iii) First pooling layer: 2x2 pooling, non-overlapping → 6 feature maps of 14x14
(iv) Second convolution layer: 16 kernels of 5x5, padding 0, stride 1 → 16 feature maps of 10x10 (each output map combines 3, 4, or 6 of the input maps, 16 cases in total: 6 maps combine 3 inputs, 9 maps combine 4, and 1 map combines all 6; parameters: (3x6+4x9+6x1)x5x5+16 = 1516)
(v) Second pooling layer: 2x2 pooling, non-overlapping → 16 feature maps of 5x5
(vi) Third convolution layer: 120 kernels of 5x5, padding 0, stride 1 → 120 feature maps of 1x1 (each 1x1 value comes from convolving all 16 input maps plus one bias term, so parameters: (16x5x5+1)x120 = 48120)
(vii) Fully connected layer 1: 84 units (parameters: 120x84+84 = 10164)
(viii) Fully connected layer 2 (output): the original paper uses RBF units, y_i = sum_j (x_j - w_ij)^2; 10 outputs (parameters: 84x10)
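The whole pipeline above can be checked with a minimal PyTorch sketch. This is not the paper's exact implementation: it assumes tanh activations and average pooling in the spirit of the original, C3 connects to all 6 input maps (the modern convention), so its parameter count is 16x6x5x5+16 = 2416 rather than 1516, and the RBF output layer is replaced by a plain linear layer:

```python
import torch
import torch.nn as nn

class LeNet5(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 6, kernel_size=5),    # 32x32 -> 6 x 28x28, 6*5*5+6 = 156 params
            nn.Tanh(),
            nn.AvgPool2d(2),                   # 6 x 28x28 -> 6 x 14x14
            nn.Conv2d(6, 16, kernel_size=5),   # 6 x 14x14 -> 16 x 10x10 (full combination here: 2416 params)
            nn.Tanh(),
            nn.AvgPool2d(2),                   # 16 x 10x10 -> 16 x 5x5
            nn.Conv2d(16, 120, kernel_size=5), # 16 x 5x5 -> 120 x 1x1, (16*5*5+1)*120 = 48120 params
            nn.Tanh(),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(120, 84),                # 120*84+84 = 10164 params
            nn.Tanh(),
            nn.Linear(84, 10),                 # stand-in for the paper's RBF output units
        )

    def forward(self, x):
        return self.classifier(self.features(x))

model = LeNet5()
x = torch.randn(1, 1, 32, 32)                  # one 32x32 grayscale image
print(model(x).shape)                          # torch.Size([1, 10])
print(sum(p.numel() for p in model.parameters()))
```

Running it confirms the output shape of [1, 10], and the layer comments reproduce the per-layer parameter counts from the list above (except C3, for the reason just noted).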
A few more words:
The second convolution layer combines only some of its input feature maps rather than all of them, unlike the third convolution layer, which uses the complete combination. The original paper explains this choice with two points:
A. To keep the number of connections within reasonable bounds (my understanding: essentially to reduce the number of parameters)
B. To break symmetry, so that different feature maps learn different features (since their input data differ)
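As a quick sanity check on point (iv), the paper's C3 parameter count can be reproduced from the connection table (6 output maps see 3 input maps, 9 see 4, and 1 sees all 6):

```python
# Fan-in of each of C3's 16 output maps, per the paper's connection table.
fan_ins = [3] * 6 + [4] * 9 + [6] * 1
kernel = 5 * 5
# One 5x5 kernel per connected input map, plus one bias per output map.
params = sum(f * kernel for f in fan_ins) + len(fan_ins)
print(params)  # 1516 = (3*6 + 4*9 + 6*1) * 25 + 16
```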
The above is a rough summary of LeNet-5; diagrams and detailed walkthroughs of the architecture are easy to find online.