Theoretical background: Deep Learning: 41 (Dropout simple understanding), Deep Learning (22): dropout shallow understanding and implementation, and the paper "Improving neural networks by preventing co-adaptation of feature detectors".
There is not much to add here; the two blog posts cited above already explain dropout very clearly, so I go straight to the experiment.
Notes:
1. During the testing phase, the model computes the hidden-layer output with the "mean network": in effect, each hidden node's output is halved (when the dropout ratio is p = 50%) before being propagated forward to the output layer, for the following reason:
At test time, we use the "mean network" that contains all of the hidden units but with their outgoing weights halved to compensate for the fact that twice as many of them are active.
That is, at test time twice as many neurons are active as during training (during training only about half of them are active when p = 50%), so each neuron's outgoing weights are halved to compensate.
Of course, this compensation can also be done at training time, by scaling the activations x up (dividing them by 1 - p), instead of at test time, by scaling the weights down (multiplying them by the probability that a unit is retained), as the sketch below illustrates.
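A minimal MATLAB sketch of the two schemes (this is not the toolbox code; p, x, mask and the other variable names are made up for illustration, with p denoting the dropout fraction as above):

    % Minimal illustration of the two equivalent compensation schemes.
    p = 0.5;                          % dropout fraction (probability of dropping a unit)
    x = rand(1, 100);                 % hypothetical hidden-layer activations
    mask = rand(size(x)) > p;         % 1 = keep the unit, 0 = drop it

    % Scheme A ("inverted dropout"): rescale at training time,
    % do nothing special at test time.
    x_train_A = (x .* mask) / (1 - p);
    x_test_A  = x;

    % Scheme B (as in the quoted paper): no rescaling at training time,
    % scale the kept activations (equivalently, the outgoing weights)
    % by the retention probability 1 - p at test time.
    x_train_B = x .* mask;
    x_test_B  = x * (1 - p);

    % In expectation over the random mask, x_train_A matches x_test_A and
    % x_train_B matches x_test_B, which is all the compensation is meant to do.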
2. One point in Deep Learning: 41 (Dropout simple understanding) is easy to misunderstand:
nn.dropoutFraction in Deep Learning: 41 (Dropout simple understanding) and level in Deep Learning (22): dropout shallow understanding and implementation both denote the probability that a neuron is dropped, whereas the probability p in the paper "Dropout: A Simple Way to Prevent Neural Networks from Overfitting" denotes the probability that a neuron is retained (i.e., not dropped). That is: p = 1 - dropoutFraction = retain_prob = 1 - level. Without keeping this in mind, it is easy to read the code in Deep Learning: 41 (Dropout simple understanding) as disagreeing with the paper "Dropout: A Simple Way to Prevent Neural Networks from Overfitting", when in fact the two are the same.
Accordingly, the paper "Dropout: A Simple Way to Prevent Neural Networks from Overfitting" already notes:
After zeroing out the activations of some neurons as above, the vector x1, ..., x1000 also has to be rescaled, i.e., multiplied by 1/(1-p). If x1, ..., x1000 are not rescaled after zeroing during training, then the weights must be rescaled at test time instead:
3. The paper clearly says to rescale the weights at test time, so why does the code instead do the following:
%dropout
if(nn.dropoutFraction > 0)
    if(nn.testing)
        % test time: scale the activations by the retention probability
        nn.a{i} = nn.a{i} .* (1 - nn.dropoutFraction);
    else
        % training time: drop each unit with probability dropoutFraction
        nn.dropOutMask{i} = (rand(size(nn.a{i})) > nn.dropoutFraction);
        nn.a{i} = nn.a{i} .* nn.dropOutMask{i};
    end
end
That is, why is it equally valid to multiply the activation values by the retention probability p = 1 - dropoutFraction?
Answer: because the pre-activation of the next layer is z = w1*x1 + w2*x2 + ... + wn*xn + b, and
(p*w1)*x1 + ... + (p*wn)*xn = w1*(p*x1) + ... + wn*(p*xn),
so multiplying the weights W by p gives exactly the same z as multiplying the activations x by p.
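A quick numerical check of this equivalence in MATLAB (the sizes and values below are arbitrary; p here is the retention probability as in the paper):

    % Scaling the weights by p gives the same pre-activation z as scaling the inputs by p.
    p = 0.5;                     % retention probability
    W = randn(10, 100);          % arbitrary weight matrix
    x = rand(100, 1);            % arbitrary activation vector
    b = randn(10, 1);            % bias

    z_scaled_weights     = (p * W) * x + b;
    z_scaled_activations = W * (p * x) + b;

    max(abs(z_scaled_weights - z_scaled_activations))   % ~0 up to round-off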
4. In the experiment in Deep Learning: 41 (Dropout simple understanding), what does d in the following code mean?
Code in the backpropagation function nnbp.m:
if(nn.dropoutFraction > 0)
    d{i} = d{i} .* [ones(size(d{i},1),1) nn.dropOutMask{i}];
end
Answer: d is the error term (delta) from backpropagation. It is multiplied by the same dropout mask used in the forward pass, so dropped-out units receive no gradient; the prepended column of ones leaves the bias column of d{i} untouched. A small sketch follows below.
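A made-up example of what that masking does to the error term (the mask and delta values are arbitrary and stand in for one row of d{i} without its bias column):

    % Dropped-out units (mask == 0) produced no output in the forward pass,
    % so their error term is zeroed as well and they receive no gradient.
    mask  = [1 0 1 1 0];             % hypothetical dropout mask for one example
    delta = [0.2 -0.5 0.1 0.4 0.3];  % hypothetical backpropagated error

    delta_masked = delta .* mask;    % = [0.2 0 0.1 0.4 0]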
5. Advantages and disadvantages of dropout:
Advantage: it helps prevent overfitting, especially when training data is scarce.
Disadvantage: training takes longer, although test time is unaffected.
Some MATLAB functions
1. Using rng in MATLAB to replace rand('seed',sd), randn('seed',sd) and rand('state',sd): a plain-language explanation (see the sketch below).
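For reference, a minimal example of seeding with rng (sd is an arbitrary seed value; the legacy calls appear only in comments):

    sd = 42;          % arbitrary seed

    % Legacy style (deprecated):
    %   rand('seed', sd);  randn('seed', sd);  rand('state', sd);

    % Modern replacement: one call seeds the shared generator used by
    % rand, randn and randi.
    rng(sd);
    a = rand(1, 3);

    rng(sd);          % re-seeding with the same value reproduces the draws
    b = rand(1, 3);
    isequal(a, b)     % true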
Experiment
My experiment simply repeats the one in Deep Learning: 41 (Dropout simple understanding); the results are the same, see that blog post for details.
References:
Dropout: A Simple Way to Prevent Neural Networks from Overfitting [Paper] [BibTeX] [Code]
ImageNet Classification with Deep Convolutional Neural Networks
Improving Neural Networks with Dropout
Deep Learning 23: dropout understanding: reading the paper "Improving neural networks by preventing co-adaptation of feature detectors"