Dropout & Maxout


[ML] My Journal from the Neural Network to Deep Learning: A Brief Introduction to Deep Learning, Part Eight: Dropout & Maxout

This is the 8th post of a series I planned as a journal of myself studying deep learning in Professor Bhiksha Raj's course, Deep Learning Lab. I decided to write these posts as notes of my learning process, and I hope they can help others with a similar background.
Back to Content Page
--------------------------------------------------------------------
PDF Version Available here
--------------------------------------------------------------------
In the last post we looked at techniques for convolutional neural networks, and we mentioned dropout as a technique to control sparsity. Here let's look at the details of it, and at another similar technique called maxout. Again, these techniques are not constrained to convolutional neural networks; they can be applied to almost any deep network, or at least any feedforward deep network.
Dropout

Dropout is famous, powerful, and simple. Despite the fact that dropout is widely used and very powerful, the idea is actually simple: randomly drop out some of the units while training. One case is shown in the following figure.
Figure 1. An illustration of dropout
To state this a little more formally: on each training case, each hidden unit is randomly omitted from the network with a probability of p. One thing to notice though: the selected dropout units are different for each training instance, and that is why this is more of a training problem than an architecture problem.
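To make the sampling concrete, here is a minimal NumPy sketch of one training-time forward step with dropout; the function and array names are mine, just for illustration.

```python
import numpy as np

def dropout(h, p=0.5, rng=np.random.default_rng()):
    """Omit each hidden unit independently with probability p for this training case."""
    mask = rng.random(h.shape) >= p   # True = keep the unit (probability 1 - p)
    return h * mask                   # dropped units output zero for this case

# a fresh mask is sampled for every training instance
h = np.array([0.7, -1.2, 0.3, 2.0, -0.5])   # activations of one hidden layer
h1 = dropout(h)   # some units are zeroed out
h2 = dropout(h)   # a different random subset is kept
```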
As stated in the original paper by Hinton et al., another view of dropout makes this solution interesting: dropout can be seen as an efficient way to perform model averaging across a large number of different neural networks, where overfitting can be avoided at a much smaller computational cost.
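The averaging is never computed explicitly. As described in the Hinton et al. paper for p = 0.5, at test time the whole network is used with the outgoing weights of the dropped-out units halved, which approximates averaging over the exponentially many "thinned" networks sampled during training. A rough sketch of this train/test asymmetry, with names and the ReLU choice being my own illustration:

```python
import numpy as np

def hidden_layer(x, W, b, train, p=0.5, rng=np.random.default_rng(0)):
    """One hidden layer: dropout at train time, scaling at test time."""
    h = np.maximum(0.0, x @ W + b)              # any nonlinearity; ReLU here for illustration
    if train:
        return h * (rng.random(h.shape) >= p)   # sample one randomly thinned network
    return h * (1.0 - p)                        # scale by the keep probability (0.5 when p = 0.5)

rng = np.random.default_rng(0)
x, W, b = rng.standard_normal(4), rng.standard_normal((4, 3)), np.zeros(3)
y_train = hidden_layer(x, W, b, train=True)
y_test = hidden_layer(x, W, b, train=False)     # deterministic approximation of the ensemble
```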
Initially in the paper, dropout is discussed with p = 0.5, but of course it can basically be set to any probability.

Maxout

Maxout is an idea derived from dropout. It is simply an activation function that takes the max of its inputs; when it works together with dropout, it can reinforce the properties dropout has: improving the accuracy of this fast model averaging technique and facilitating optimization (see the short code sketch below). Different from max-pooling, maxout is applied to a whole hidden layer built on top of the layer we are interested in, so it is more like a layer-wise activation function. As stated in the original paper by Ian Goodfellow, with these hidden layers that only consider the max of their inputs, the network retains the same power of universal approximation. The reasoning is not very different from what we did in the 3rd post of this series on universal approximation power.

Despite the fact that maxout is an idea derived from dropout and works better with it, maxout can only be implemented in feedforward neural networks like multi-layer perceptrons or convolutional neural networks. In contrast, dropout is a fundamental idea that, though simple, can work with basically any network. Dropout is more like the idea of bagging, both in the sense of bagging's ability to increase accuracy by model averaging, and in the sense of bagging's wide adoption: it can be integrated with almost any machine learning algorithm.

In this post we have talked about two simple and powerful ideas that help to increase accuracy through model averaging. In the next post, let's move back to the track of network architectures and start to talk about the network architecture of generative models.
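For concreteness, here is a minimal sketch of a maxout hidden layer, where each output unit takes the max over k affine projections of the input. The shapes and names are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def maxout_layer(x, W, b):
    """Maxout: each output unit is the max over k affine pieces.

    x: (d,) input, W: (d, m, k) weights, b: (m, k) biases -> output of shape (m,)
    """
    z = np.einsum('d,dmk->mk', x, W) + b   # k candidate activations per output unit
    return z.max(axis=-1)                  # take the max; no extra nonlinearity is needed

rng = np.random.default_rng(0)
d, m, k = 4, 3, 2
x = rng.standard_normal(d)
h = maxout_layer(x, rng.standard_normal((d, m, k)), rng.standard_normal((m, k)))
```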
----------------------------------------------
If you find this helpful, please cite:
Wang, Haohan, and Bhiksha Raj. "A Survey: Time Travel in Deep Learning Space: An Introduction to Deep Learning Models and How Deep Learning Models Evolved from the Initial Ideas." arXiv preprint arXiv:1510.04781 (2015).
----------------------------------------------
by Haohan Wang
Note: I am still a student learning everything, so there may be mistakes due to my limited knowledge. Please feel free to tell me wherever you find something incorrect or are uncomfortable with. Thanks.

Main Reference:

    1. Hinton, Geoffrey E., et al. "Improving neural networks by preventing co-adaptation of feature detectors." arXiv preprint arXiv:1207.0580 (2012).
    2. Goodfellow, Ian J., et al. "Maxout networks." arXiv preprint arXiv:1302.4389 (2013).
