AlexNet
Proposed in 2012 by Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton (NIPS 2012)
60 million parameters + 650,000 neurons
Five convolutional layers (some followed by max-pooling layers) + three fully connected layers + a final 1000-way softmax layer
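A minimal sketch of this layout in PyTorch (filter counts and sizes follow the paper's description; the paper actually splits the kernels across two GPUs, so treat this single-stream version as illustrative):

```python
import torch
import torch.nn as nn

# Sketch of the AlexNet layout: 5 conv layers (some followed by LRN +
# overlapping max pooling) + 3 fully connected layers + 1000-way output.
alexnet = nn.Sequential(
    nn.Conv2d(3, 96, kernel_size=11, stride=4),               # conv1
    nn.ReLU(),
    nn.LocalResponseNorm(size=5, alpha=1e-4, beta=0.75, k=2.0),
    nn.MaxPool2d(kernel_size=3, stride=2),                     # overlapping pooling
    nn.Conv2d(96, 256, kernel_size=5, padding=2),              # conv2
    nn.ReLU(),
    nn.LocalResponseNorm(size=5, alpha=1e-4, beta=0.75, k=2.0),
    nn.MaxPool2d(kernel_size=3, stride=2),
    nn.Conv2d(256, 384, kernel_size=3, padding=1),             # conv3
    nn.ReLU(),
    nn.Conv2d(384, 384, kernel_size=3, padding=1),             # conv4
    nn.ReLU(),
    nn.Conv2d(384, 256, kernel_size=3, padding=1),             # conv5
    nn.ReLU(),
    nn.MaxPool2d(kernel_size=3, stride=2),
    nn.Flatten(),
    nn.Dropout(p=0.5),
    nn.Linear(256 * 6 * 6, 4096),                              # fc6
    nn.ReLU(),
    nn.Dropout(p=0.5),
    nn.Linear(4096, 4096),                                     # fc7
    nn.ReLU(),
    nn.Linear(4096, 1000),                                     # fc8 -> 1000-way softmax
)

x = torch.randn(1, 3, 227, 227)   # 227 makes (227 - 11)/4 + 1 = 55 work out
print(alexnet(x).shape)           # torch.Size([1, 1000])
```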
Put forward the point that ReLU trains faster than tanh (it is a non-saturating nonlinearity).
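The speed claim is about training convergence (a ReLU network reaches a given training error several times faster than an equivalent tanh network in the paper's experiments); a quick NumPy sketch of the two nonlinearities:

```python
import numpy as np

def tanh_act(x):
    return np.tanh(x)            # saturates for large |x|; gradients shrink there

def relu(x):
    return np.maximum(0.0, x)    # non-saturating for x > 0, and cheap to compute

x = np.linspace(-3, 3, 7)        # [-3, -2, -1, 0, 1, 2, 3]
print(tanh_act(x))               # values squashed into (-1, 1)
print(relu(x))                   # [0. 0. 0. 0. 1. 2. 3.]
```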
ReLU does not strictly require input normalization, but Local Response Normalization (LRN) still helps generalization.
LRN reduced the model's top-1 and top-5 error rates by 1.4% and 1.2% respectively.
The formula for the local response normalization layer: b^i_{x,y} = a^i_{x,y} / ( k + α · Σ_{j = max(0, i−n/2)}^{min(N−1, i+n/2)} (a^j_{x,y})² )^β, where a^i_{x,y} is the ReLU output of kernel i at position (x, y), the sum runs over n adjacent kernel maps, N is the total number of kernels, and the paper uses k=2, n=5, α=10⁻⁴, β=0.75.
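A minimal NumPy sketch of this normalization applied to an activation volume of shape (channels, height, width); the default parameter values are the paper's:

```python
import numpy as np

def local_response_norm(a, k=2.0, n=5, alpha=1e-4, beta=0.75):
    """Normalize each a[i, x, y] by the sum of squared activations at the
    same (x, y) position in the n neighboring channels (kernel maps)."""
    N = a.shape[0]                       # number of channels
    b = np.empty_like(a)
    for i in range(N):
        lo = max(0, i - n // 2)          # clamp the window at the channel edges
        hi = min(N - 1, i + n // 2)
        denom = (k + alpha * np.sum(a[lo:hi + 1] ** 2, axis=0)) ** beta
        b[i] = a[i] / denom
    return b

a = np.random.randn(96, 55, 55)          # e.g. the conv1 output volume
print(local_response_norm(a).shape)      # (96, 55, 55)
```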
Overlapping pooling: adjacent pooling windows overlap, i.e. the stride s is smaller than the window size z (the paper uses s=2, z=3; see the sketch after the reference link below).
Overlapping pooling reduced the model's top-1 and top-5 error rates by 0.4% and 0.3% respectively.
Reference on pooling: https://my.oschina.net/findbill/blog/550565
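A quick PyTorch sketch of the difference: traditional pooling uses a stride equal to the window size, while overlapping pooling uses a smaller stride:

```python
import torch
import torch.nn as nn

x = torch.randn(1, 96, 55, 55)                         # e.g. conv1 output

non_overlap = nn.MaxPool2d(kernel_size=2, stride=2)    # traditional: s = z
overlap     = nn.MaxPool2d(kernel_size=3, stride=2)    # AlexNet: s < z, windows overlap

# Both happen to give 27x27 here, but each overlapping-pool output is computed
# from a 3x3 window that shares rows/columns with its neighbors.
print(non_overlap(x).shape)   # torch.Size([1, 96, 27, 27])
print(overlap(x).shape)       # torch.Size([1, 96, 27, 27])
```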
Use of GPUs: the network was trained on two GTX 580 3GB GPUs, with the kernels split between them and the GPUs communicating only at certain layers.
Overall structure
Input image (224×224×3) → 96 convolution kernels (11×11×3, stride 4)
Output size formula: (W − F + 2P)/S + 1 (here it should be (227 − 11)/4 + 1 = 55; the paper states a 224×224 input, which does not divide evenly, so 227 is the consistent reading)
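A worked check of the formula (W = input width, F = filter size, P = zero padding, S = stride):

```python
def conv_output_size(W, F, P, S):
    # (W - F + 2P) / S + 1
    return (W - F + 2 * P) // S + 1

print(conv_output_size(227, 11, 0, 4))   # 55 -- consistent with the 55x55x96 volume
print(conv_output_size(224, 11, 0, 4))   # 54 -- (224-11)/4+1 = 54.25 is not an
                                         # integer, hence the 227 reading
```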
The spatial size of a convolution kernel is smaller than the input, but its depth matches the depth of the input volume.
The next layer has 55×55×96 neurons, each connected to an [11×11×3] region of the input volume. The 96 neurons along a single depth column are connected to the same [11×11×3] input region, but with different weights.
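Weight sharing is what keeps the parameter count small relative to the neuron count; a quick arithmetic check for this first layer:

```python
F, depth, num_kernels = 11, 3, 96

weights_per_kernel = F * F * depth                 # 11*11*3 = 363 weights per kernel
params = num_kernels * (weights_per_kernel + 1)    # +1 bias per kernel
print(params)                                      # 34944 parameters

# The same 96 kernels slide over all 55x55 spatial positions, so the layer
# has 55*55*96 neurons but only 34944 parameters.
print(55 * 55 * 96)                                # 290400 neurons
```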
How to reduce overfitting (a sketch of both tricks follows below):
Data augmentation: artificially enlarging the dataset using label-preserving transformations.
Dropout: setting the output of each hidden neuron to zero with probability 0.5.
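A minimal sketch of both tricks in PyTorch/torchvision; the exact transforms here are illustrative stand-ins for the paper's random 224×224 crops of 256×256 images plus horizontal reflections:

```python
import torch
import torch.nn as nn
from torchvision import transforms

# Data augmentation: label-preserving transformations applied on the fly
augment = transforms.Compose([
    transforms.RandomCrop(224),           # random patch from a larger image
    transforms.RandomHorizontalFlip(),    # horizontal reflection
    transforms.ToTensor(),
])

# Dropout: each hidden activation is zeroed with probability 0.5 at training
# time (PyTorch scales the survivors by 1/(1-p), so no test-time rescaling
# is needed; the paper instead halves the outputs at test time)
drop = nn.Dropout(p=0.5)
drop.train()
h = torch.ones(1, 8)
print(drop(h))    # roughly half the entries are 0, the rest are 2.0
```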