First, Introduction
In AlexNet, dropout is applied to the first two fully connected layers, because fully connected layers contain most of the parameters and overfit easily, while the convolutional layers are much less prone to overfitting.
1. Randomly delete some hidden neurons in the network, keeping the input and output neurons unchanged;
2. Forward-propagate the input through the modified network, then back-propagate the error through the same modified network;
3. Repeat the above steps for the next batch of training samples.
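The three steps above can be sketched for a single layer as follows. This is a minimal NumPy illustration, not AlexNet's actual code; it uses "inverted" dropout (scaling kept units by `1/(1-p)` at training time), a common modern variant, whereas the original paper instead scales weights at test time. All function names and shapes are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout_forward(h, p_drop, train=True):
    """Step 1: randomly zero hidden units with probability p_drop.
    Kept units are scaled by 1/(1-p_drop) so the expected activation
    is unchanged (inverted dropout, an assumption of this sketch)."""
    if not train:
        return h, None          # at test time the full network is used
    mask = (rng.random(h.shape) >= p_drop) / (1.0 - p_drop)
    return h * mask, mask

def dropout_backward(dout, mask):
    """Step 2 (backward half): gradients flow only through kept units."""
    return dout * mask

# Steps 2-3: forward and backward on one toy batch of hidden activations
h = np.ones((4, 8))                       # batch of 4, 8 hidden units
h_drop, mask = dropout_forward(h, p_drop=0.5)
grad = dropout_backward(np.ones_like(h_drop), mask)
```

At test time `dropout_forward(h, p_drop, train=False)` simply passes the activations through, so the same code path serves both phases.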
Second, Function and Principle
1. Training a large number of separate models would take a great deal of time; in AlexNet, dropout only roughly doubles the number of iterations needed to converge.
2. Reduced co-adaptation. Because some neurons are randomly deleted, a feature cannot come to depend on the joint activity of a fixed set of hidden units; dropout thus prevents features that are useful only in the presence of certain other specific features.
3. Approximate model ensembling. During dropout training, the random deletion of nodes means that each batch effectively trains a different thinned network structure, yet all of these networks share a single set of parameters.
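The ensemble view can be checked numerically on a toy linear layer: averaging the outputs of many randomly thinned sub-networks (all sharing one weight matrix) closely matches running the full network once with activations scaled by the keep probability, which is the test-time approximation AlexNet uses. The weights, input, and sample count here are arbitrary assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

W = rng.normal(size=(8, 4))   # one shared weight matrix (toy layer)
x = rng.normal(size=(8,))     # one toy input
p_keep = 0.5

# Each random mask defines a different "thinned" sub-network,
# but every sub-network shares the same W.
samples = []
for _ in range(20000):
    mask = rng.random(8) < p_keep
    samples.append((x * mask) @ W)
avg = np.mean(samples, axis=0)        # empirical ensemble average

# Test-time approximation: run the full network once and scale
# the activations by the keep probability.
approx = (x * p_keep) @ W
```

For a linear layer this equivalence is exact in expectation, since E[x * mask] = p_keep * x; for deep nonlinear networks it is only an approximation, but one that works well in practice.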