These are notes on a 2014 NIPS workshop paper, "Making Dropout Invariant to Transformations of Activation Functions and Inputs". The paper presents an invariant dropout that is invariant to additive shift transformations of the input and activation units (which, as I understand it, amounts to adding additive noise to them).
Normally, if an additive shift is applied to each input or activation unit, the variance of the next layer's pre-activation (i.e., the linear combination, before the non-linearity) under the dropout mask can increase or decrease. Ordinary dropout therefore behaves differently once the additive shift is applied, which is what the authors mean by dropout being variant to additive shift. This is shown in the following formulas:
Writing the dropout mask as $m_i \sim \mathrm{Bernoulli}(p)$ and the weights into unit $j$ of the next layer as $w_{ji}$:

$a_i$ is the original node, giving the pre-activation $s_j = \sum_i w_{ji}\, m_i\, a_i$.

$a_i + \phi_i$ is the node after the additive shift transform, giving $\tilde{s}_j = \sum_i w_{ji}\, m_i\, (a_i + \phi_i)$.

The variance (over the dropout mask) in the above two cases: $\mathrm{Var}(s_j) = p(1-p)\sum_i w_{ji}^2\, a_i^2$ and $\mathrm{Var}(\tilde{s}_j) = p(1-p)\sum_i w_{ji}^2\, (a_i+\phi_i)^2$.

The difference between the variances of the two cases: $\mathrm{Var}(\tilde{s}_j) - \mathrm{Var}(s_j) = p(1-p)\sum_i w_{ji}^2\, \big(2 a_i \phi_i + \phi_i^2\big)$.
From this difference it can be seen that the term $2 a_i \phi_i + \phi_i^2$ can be positive or negative, so after the additive shift transform the variance may either increase or decrease.
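To make this concrete, here is a small numerical check (not from the paper; the layer size, the shift value `phi`, the dropout rate, and the use of NumPy are all my own arbitrary choices for illustration) that estimates the dropout-noise variance of a linear pre-activation before and after adding a constant shift to the activations:

```python
import numpy as np

rng = np.random.default_rng(0)

n_in, p_keep, phi = 100, 0.5, 2.0      # arbitrary size, keep-probability, and shift
a = rng.normal(size=n_in)              # activations of the current layer
w = rng.normal(size=n_in)              # weights into one unit of the next layer

def dropout_preactivation_variance(a, w, p_keep, n_samples=100_000):
    """Monte-Carlo estimate of Var_m( sum_i w_i * m_i * a_i ) over the dropout mask m."""
    masks = rng.random((n_samples, a.size)) < p_keep
    s = (masks * a) @ w
    return s.var()

var_original = dropout_preactivation_variance(a, w, p_keep)
var_shifted = dropout_preactivation_variance(a + phi, w, p_keep)

# Closed-form values: p(1-p) * sum_i w_i^2 * a_i^2
print("original:", var_original, "vs", p_keep * (1 - p_keep) * np.sum(w**2 * a**2))
print("shifted :", var_shifted,  "vs", p_keep * (1 - p_keep) * np.sum(w**2 * (a + phi)**2))
```

With these particular numbers the two estimates differ, in line with the difference formula above; depending on the signs of the $2 a_i \phi_i + \phi_i^2$ terms, the shifted variance can come out larger or smaller.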
The authors' solution is to introduce a new variable for each input or activation node, called the invariance parameter $\beta_j$.
This new parameter $\beta_j$ is learned. In effect it adds an offset to each node, much like the additive noise, except that the additive noise is sampled from a distribution with known parameters while $\beta$ is learned from data. So when each node undergoes a different additive shift transform, the learned adaptive shift $\beta$ can absorb it, which keeps the final result stable; this is why the authors claim their method is invariant to additive shift transforms.
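The paper's exact formulation is not reproduced here; the following is a minimal sketch of one plausible reading of the idea, in which a dropped unit falls back to its learned shift $\beta_i$ instead of to zero (equivalently, dropout is applied to $a_i - \beta_i$ and $\beta_i$ is added back). The class name `ShiftInvariantDropout`, the use of NumPy, and the test-time behaviour are my own choices for illustration, not the authors' layer.

```python
import numpy as np

class ShiftInvariantDropout:
    """Dropout with a learned per-unit shift beta (a sketch, not the paper's exact layer).

    A retained unit passes a_i through unchanged; a dropped unit falls back to beta_i
    instead of 0. Equivalently: output = m * (a - beta) + beta, so the dropout noise
    has variance p(1-p) * (a_i - beta_i)^2, which no longer grows with a constant
    shift of the activations once beta has absorbed that shift.
    """

    def __init__(self, n_units, p_keep=0.5, rng=None):
        # beta would be trained by backpropagation together with the other
        # weights; the update rule is omitted in this sketch.
        self.beta = np.zeros(n_units)
        self.p_keep = p_keep
        self.rng = rng or np.random.default_rng()

    def forward(self, a, train=True):
        if not train:
            # Expected value of the training-time output over the mask.
            return self.p_keep * (a - self.beta) + self.beta
        m = self.rng.random(a.shape) < self.p_keep
        return m * (a - self.beta) + self.beta
```

Under this reading, if every activation is shifted by a constant $\phi$ and $\beta$ shifts by the same amount, the term $a - \beta$ is unchanged, so the dropout noise is the same before and after the shift, which is the invariance described above.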
The main idea of this paper is to introduce a learned adaptive shift ($\beta$) at each node to counteract the pre-defined additive shift transform that is deliberately applied to each node. The so-called invariant dropout is then just LAS + dropout versus no-LAS + dropout (LAS meaning learned adaptive shift), and I think that if you removed dropout and simply compared LAS against no-LAS you would reach a similar conclusion: the former would also be invariant to additive shift transforms.
The final experiments mainly run on MNIST, CIFAR-10, and Street View House Numbers (SVHN); the invariant dropout results are better than the regular dropout results.