The traditional CNN filter is a generalized linear model (GLM), which we think is a low level of the ability to abstract the GLM. Replacing GLM with a more efficient approximation of a nonlinear function can improve the ability to abstract. When an example is a linear tick, GLM can achieve a good abstraction. However, many cases are linearly irreducible, and the input linear function can be well expressed. This network does not adopt the traditional full-connection layer, but adopts the global average pooling layer. On the one hand, the global average pooling layer can well explain the relationship between feature mappings and categories. On the other hand, the global average pool layer itself is a structured regularizer that is able to naturally block overfitting because of the fact that the parameters are too many to be merged and highly dependent on the dropout regularization.
Network in Network