Deep Learning Paper Notes (1): Making dropout invariant to transformations of activation functions and inputs


This is a paper from a 2014 NIPS workshop. The authors present "invariant dropout," which they claim is invariant to additive shift transformations of the input and activation units (as I understand it, this essentially means adding additive noise).

Normally, if an additive shift is applied to each input or activation unit, the variance of the next layer's pre-activation (i.e., the linear combination before the non-linearity) under dropout increases or decreases. Ordinary dropout therefore produces different results once the additive shift is applied, which is what the authors mean by ordinary dropout being variant to additive shifts. The paper illustrates this with five formulas:

a_i is the original node, and a_i + phi is the node after the additive shift transform. The paper writes out the dropout variance in each of the two cases and then the difference between them. That difference can be positive or negative, which shows that after the additive shift transform the variance may either increase or decrease.
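Assuming a standard Bernoulli dropout mask m_i with keep probability p (an assumption on my part; the paper's exact notation may differ), the five quantities can be sketched as:

```latex
\begin{align*}
  &a_i                                                              && \text{original node}\\
  &a_i + \phi                                                       && \text{node after the additive shift}\\
  &\operatorname{Var}[m_i\, a_i] = a_i^2\, p(1-p)                   && \text{dropout variance, original node}\\
  &\operatorname{Var}[m_i\, (a_i + \phi)] = (a_i + \phi)^2\, p(1-p) && \text{dropout variance, shifted node}\\
  &\Delta = p(1-p)\,\bigl(2 a_i \phi + \phi^2\bigr)                 && \text{difference; its sign depends on } a_i \text{ and } \phi
\end{align*}
```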

The authors' solution is to introduce a new variable for each input or activation node, called the invariance parameter, beta_j.

This new parameter beta_j is learned. In effect it adds an offset to each node, much like additive noise, except that additive noise is sampled from a distribution with known parameters while beta_j is learned. So when each node undergoes a different additive shift transform, the learned adaptive shift beta can adapt to it, which keeps the final result stable. The authors therefore claim their method is invariant to additive shift transforms.
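As a minimal sketch of this idea (the centering form, function name, and keep probability below are my own illustration, not the paper's exact formulation), dropout applied around a learned per-unit shift produces the same noise when the activations and the shift move together by the same constant:

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout_with_learned_shift(a, beta, keep_prob=0.5):
    """Dropout applied around a learned per-unit shift beta (illustrative only).

    The Bernoulli mask only multiplies (a - beta), so if the activations are
    shifted by a constant phi and beta learns to follow that shift, the
    dropout noise -- and hence its variance -- stays the same.
    """
    mask = rng.binomial(1, keep_prob, size=a.shape)
    return mask * (a - beta) + beta

# Toy check of the invariance claim.
a = rng.standard_normal((1000, 3))
phi = 5.0
beta = a.mean(axis=0)                      # stand-in for a learned beta
out = dropout_with_learned_shift(a, beta)
out_shifted = dropout_with_learned_shift(a + phi, beta + phi)
# The spread of the dropout noise is (statistically) the same with and
# without the additive shift; only the mean moves by phi.
print(out.var(axis=0))
print(out_shifted.var(axis=0))
```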

The main idea of this paper is to introduce a learned adaptive shift (beta) at each node to counteract the pre-defined additive shift that is deliberately applied to each node (the so-called additive shift transform). The so-called invariant dropout is really just LAS + dropout versus no-LAS + dropout (LAS standing for learned adaptive shift), and I think that if you removed dropout and simply compared LAS against no-LAS, you would reach a similar conclusion: the former is also invariant to additive shift transforms.

The experiments are mainly on MNIST, CIFAR-10, and Street View House Numbers (SVHN); invariant dropout achieves better results than regular dropout.
