Dropout & Maxout


[ML] My Journal from the Neural Network to Deep Learning: A Brief Introduction to Deep Learning, Part Eight: Dropout & Maxout

This is the 8th post of a series I planned as a journal of myself studying deep learning in Professor Bhiksha Raj's course, Deep Learning Lab. I decided to write these posts as notes of my learning process, and I hope they can help others with a similar background.
Back to Content Page
--------------------------------------------------------------------
PDF Version Available here
--------------------------------------------------------------------
In the last post we looked at techniques for convolutional neural networks, and we mentioned dropout as a technique to control sparsity. Here let's look at the details of it, and at another similar technique called maxout. Again, these techniques are not constrained to convolutional neural networks; they can be applied to almost any deep network, or at least any feedforward deep network.
Dropout

Dropout is famous, powerful, and simple. Despite the fact that dropout is widely used and very powerful, the idea is actually simple: randomly drop out some of the units while training. One case is shown in the following figure.
Figure 1. An illustration of dropout
To state this a little more formally: on each training case, each hidden unit is randomly omitted from the network with a probability of p. One thing to notice though: the selected dropout units are different for each training instance, and that is why this is more of a training problem than an architecture problem.
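To make the sampling concrete, here is a minimal NumPy sketch of one training-time forward step with dropout; the function and array names are mine, just for illustration.

```python
import numpy as np

def dropout(h, p=0.5, rng=np.random.default_rng()):
    """Omit each hidden unit independently with probability p for this training case."""
    mask = rng.random(h.shape) >= p   # True = keep the unit (probability 1 - p)
    return h * mask                   # dropped units output zero for this case

# a fresh mask is sampled for every training instance
h = np.array([0.7, -1.2, 0.3, 2.0, -0.5])   # activations of one hidden layer
h1 = dropout(h)   # some units are zeroed out
h2 = dropout(h)   # a different random subset is kept
```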
As stated in the original paper by Hinton et al., another view of dropout makes this solution interesting: dropout can be seen as an efficient way to perform model averaging across a large number of different neural networks, where overfitting can be avoided at a much smaller computational cost.
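The averaging is never computed explicitly. As described in the Hinton et al. paper for p = 0.5, at test time the whole network is used with the outgoing weights of the dropped-out units halved, which approximates averaging over the exponentially many "thinned" networks sampled during training. A rough sketch of this train/test asymmetry, with names and the ReLU choice being my own illustration:

```python
import numpy as np

def hidden_layer(x, W, b, train, p=0.5, rng=np.random.default_rng(0)):
    """One hidden layer: dropout at train time, scaling at test time."""
    h = np.maximum(0.0, x @ W + b)              # any nonlinearity; ReLU here for illustration
    if train:
        return h * (rng.random(h.shape) >= p)   # sample one randomly thinned network
    return h * (1.0 - p)                        # scale by the keep probability (0.5 when p = 0.5)

rng = np.random.default_rng(0)
x, W, b = rng.standard_normal(4), rng.standard_normal((4, 3)), np.zeros(3)
y_train = hidden_layer(x, W, b, train=True)
y_test = hidden_layer(x, W, b, train=False)     # deterministic approximation of the ensemble
```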
Initially in the paper, dropout is discussed with p = 0.5, but of course it can basically be set to any probability.

Maxout

Maxout is an idea derived from dropout. It is simply an activation function that takes the max of its inputs; when it works together with dropout, it can reinforce the properties dropout has: improving the accuracy of this fast model averaging technique and facilitating optimization (see the short code sketch below). Different from max-pooling, maxout is applied to a whole hidden layer built on top of the layer we are interested in, so it is more like a layer-wise activation function. As stated in the original paper by Ian Goodfellow, with these hidden layers that only consider the max of their inputs, the network retains the same power of universal approximation. The reasoning is not very different from what we did in the 3rd post of this series on universal approximation power.

Despite the fact that maxout is an idea derived from dropout and works better with it, maxout can only be implemented in feedforward neural networks like multi-layer perceptrons or convolutional neural networks. In contrast, dropout is a fundamental idea that, though simple, can work with basically any network. Dropout is more like the idea of bagging, both in the sense of bagging's ability to increase accuracy by model averaging, and in the sense of bagging's wide adoption: it can be integrated with almost any machine learning algorithm.

In this post we have talked about two simple and powerful ideas that help to increase accuracy through model averaging. In the next post, let's move back to the track of network architectures and start to talk about the network architecture of generative models.
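For concreteness, here is a minimal sketch of a maxout hidden layer, where each output unit takes the max over k affine projections of the input. The shapes and names are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def maxout_layer(x, W, b):
    """Maxout: each output unit is the max over k affine pieces.

    x: (d,) input, W: (d, m, k) weights, b: (m, k) biases -> output of shape (m,)
    """
    z = np.einsum('d,dmk->mk', x, W) + b   # k candidate activations per output unit
    return z.max(axis=-1)                  # take the max; no extra nonlinearity is needed

rng = np.random.default_rng(0)
d, m, k = 4, 3, 2
x = rng.standard_normal(d)
h = maxout_layer(x, rng.standard_normal((d, m, k)), rng.standard_normal((m, k)))
```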
----------------------------------------------
If you find this helpful, please cite:
Wang, Haohan, and Bhiksha Raj. "A Survey: Time Travel in Deep Learning Space: An Introduction to Deep Learning Models and How Deep Learning Models Evolved from the Initial Ideas." arXiv preprint arXiv:1510.04781 (2015).
----------------------------------------------
by Haohan Wang
Note: I am still a student learning everything, so there may be mistakes due to my limited knowledge. Please feel free to tell me wherever you find something incorrect or are uncomfortable with. Thanks.

Main Reference:

    1. Hinton, Geoffrey E., et al. "Improving neural networks by preventing co-adaptation of feature detectors." arXiv preprint arXiv:1207.0580 (2012).
    2. Goodfellow, Ian J., et al. "Maxout networks." arXiv preprint arXiv:1302.4389 (2013).
