Training Deep Neural Networks

Source: Internet
Author: User

Reprinted from: http://handong1587.github.io/deep_learning/2015/10/09/training-dnn.html
Published: Oct 9, 2015 | Category: deep_learning | Tutorials

Popular Training Approaches of DNNs - A Quick Overview

https://medium.com/@asjad/popular-training-approaches-of-dnns-a-quick-overview-26ee37ad7e96#.pqyo039bb

Activation Functions

Rectified Linear Units Improve Restricted Boltzmann Machines (ReLU)

    • paper: http://machinelearning.wustl.edu/mlpapers/paper_files/icml2010_nairh10.pdf

Rectifier Nonlinearities Improve Neural Network Acoustic Models (Leaky ReLU, aka LReLU)

    • paper: http://ai.stanford.edu/~amaas/papers/relu_hybrid_icml2013_final.pdf

Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification (PReLU)

    • keywords: PReLU, Caffe "MSRA" weight initialization
    • arxiv: http://arxiv.org/abs/1502.01852

Empirical Evaluation of Rectified Activations in Convolutional Network (ReLU/LReLU/PReLU/RReLU)

    • arxiv: http://arxiv.org/abs/1505.00853

Deep Learning with S-shaped Rectified Linear Activation Units (SReLU)

    • arxiv: http://arxiv.org/abs/1512.07030

Parametric Activation Pools Greatly Increase Performance and Consistency in ConvNets

    • blog: http://blog.claymcleod.io/2016/02/06/parametric-activation-pools-greatly-increase-performance-and-consistency-in-convnets/

Noisy Activation Functions

    • arxiv: http://arxiv.org/abs/1603.00391
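
For reference, a minimal NumPy sketch of the rectifier variants surveyed above: ReLU, Leaky ReLU, PReLU (learnable slope), and RReLU (slope sampled during training). The slope values and sampling range are common defaults, not values prescribed by any single paper here.

```python
import numpy as np

def relu(x):
    # max(0, x)
    return np.maximum(0.0, x)

def leaky_relu(x, alpha=0.01):
    # small fixed negative slope (alpha is an illustrative default)
    return np.where(x > 0, x, alpha * x)

def prelu(x, alpha):
    # PReLU: alpha is a learnable parameter, updated by backprop in practice
    return np.where(x > 0, x, alpha * x)

def rrelu(x, lower=1/8, upper=1/3, training=True, rng=np.random.default_rng(0)):
    # RReLU: negative slope sampled uniformly during training,
    # fixed to the midpoint of the range at test time
    if training:
        alpha = rng.uniform(lower, upper, size=x.shape)
    else:
        alpha = (lower + upper) / 2
    return np.where(x > 0, x, alpha * x)
```
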
Weights Initialization

An Explanation of Xavier Initialization

    • blog: http://andyljones.tumblr.com/post/110998971763/an-explanation-of-xavier-initialization

Deep Neural Networks with Random Gaussian Weights: A Universal Classification Strategy?

    • arxiv: http://arxiv.org/abs/1504.08291

All You Need Is a Good Init

    • arxiv: http://arxiv.org/abs/1511.06422
    • github: https://github.com/ducha-aiki/lsuvinit

Data-dependent Initializations of Convolutional Neural Networks

    • arxiv: http://arxiv.org/abs/1511.06856
    • github: https://github.com/philkr/magic_init

What Are Good Initial Weights in a Neural Network?

    • stackexchange: http://stats.stackexchange.com/questions/47590/what-are-good-initial-weights-in-a-neural-network

RandomOut: Using a Convolutional Gradient Norm to Win the Filter Lottery

    • arxiv: http://arxiv.org/abs/1602.05931
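
As a reference point for the entries above, a minimal NumPy sketch of Xavier/Glorot initialization and of the "MSRA" (He) initialization mentioned in the PReLU entry; the layer shapes are placeholders chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def xavier_init(fan_in, fan_out):
    # Glorot/Xavier: keep activation variance roughly constant across layers
    # (uniform variant with limit sqrt(6 / (fan_in + fan_out)))
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=(fan_in, fan_out))

def msra_init(fan_in, fan_out):
    # He/MSRA: scaled for ReLU-family nonlinearities, std = sqrt(2 / fan_in)
    return rng.normal(0.0, np.sqrt(2.0 / fan_in), size=(fan_in, fan_out))

W1 = xavier_init(784, 256)   # example layer shapes, for illustration only
W2 = msra_init(256, 10)
```
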
Batch Normalization

Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift (ImageNet top-5 error: 4.82%)

    • arxiv: http://arxiv.org/abs/1502.03167
    • blog: https://standardfrancis.wordpress.com/2015/04/16/batch-normalization/
    • notes: http://blog.csdn.net/happynear/article/details/44238541

Weight Normalization: A Simple Reparameterization to Accelerate Training of Deep Neural Networks

    • arxiv: http://arxiv.org/abs/1602.07868
    • github (Lasagne): https://github.com/TimSalimans/weight_norm
    • notes: http://www.erogol.com/my-notes-weight-normalization/

Normalization Propagation: A Parametric Technique for Removing Internal Covariate Shift in Deep Networks

    • arxiv: http://arxiv.org/abs/1603.01431
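
A minimal sketch of the batch-normalization forward pass at training time, as described in the Ioffe & Szegedy paper above: standardize each feature over the mini-batch, then apply a learned scale and shift. The running statistics used at inference and the backward pass are omitted.

```python
import numpy as np

def batch_norm_forward(x, gamma, beta, eps=1e-5):
    # x: (batch, features); gamma/beta: learnable per-feature scale and shift
    mu = x.mean(axis=0)                    # mini-batch mean
    var = x.var(axis=0)                    # mini-batch variance
    x_hat = (x - mu) / np.sqrt(var + eps)  # normalize each feature
    return gamma * x_hat + beta            # scale and shift
```
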
Loss Function

The Loss Surfaces of Multilayer Networks

    • arxiv: http://arxiv.org/abs/1412.0233

Optimization Methods

On Optimization Methods for Deep Learning

    • paper: http://www.icml-2011.org/papers/210_icmlpaper.pdf

On the Importance of Initialization and Momentum in Deep Learning

    • paper: http://jmlr.org/proceedings/papers/v28/sutskever13.pdf

Invariant Backpropagation: How to Train a Transformation-Invariant Neural Network

    • arxiv: http://arxiv.org/abs/1502.04434
    • github: https://github.com/sdemyanov/convnet

A Practical Theory for Designing Very Deep Convolutional Neural Networks

    • kaggle: https://www.kaggle.com/c/datasciencebowl/forums/t/13166/happy-lantern-festival-report-and-code/69284
    • paper: https://kaggle2.blob.core.windows.net/forum-message-attachments/69182/2287/a%20practical%20theory%20for%20designing%20very%20deep%20convolutional%20neural%20networks.pdf?sv=2012-02-12&se=2015-12-05t15%3a40%3a02z&sr=b&sp=r&sig=kfbqkdua1pdtu837y9iqyrp2vyittv0hcgoeook9e3e%3d
    • slides: http://vdisk.weibo.com/s/3nfsznjlkn

Stochastic Optimization Techniques

    • intro: SGD / Momentum / NAG / AdaGrad / RMSProp / AdaDelta / Adam / ESGD / Adasecant / vSGD / Rprop (a sketch of the basic update rules follows at the end of this section)
    • blog: http://colinraffel.com/wiki/stochastic_optimization_techniques

Alec Radford's animations for optimization algorithms

http://www.denizyuret.com/2015/03/alec-radfords-animations-for.html

Faster Asynchronous SGD (FASGD)

    • arxiv: http://arxiv.org/abs/1601.04033
    • github: https://github.com/doctorteeth/fred

An Overview of Gradient Descent Optimization Algorithms (★★★★★)

    • blog: http://sebastianruder.com/optimizing-gradient-descent/

Exploiting the Structure: Stochastic Gradient Methods Using Raw Clusters

    • arxiv: http://arxiv.org/abs/1602.02151

Writing Fast Asynchronous SGD/AdaGrad with RcppParallel

    • blog: http://gallery.rcpp.org/articles/rcpp-sgd/
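
To make the update rules behind the optimizers listed in this section concrete, here is a minimal NumPy sketch of plain SGD, classical momentum, and Adam; the hyperparameter values are common defaults, not recommendations taken from these references.

```python
import numpy as np

def sgd_step(w, grad, lr=0.01):
    # vanilla SGD: step against the gradient
    return w - lr * grad

def momentum_step(w, grad, v, lr=0.01, mu=0.9):
    # classical momentum: accumulate a velocity, then step along it
    v = mu * v - lr * grad
    return w + v, v

def adam_step(w, grad, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    # Adam: bias-corrected running averages of the gradient and its square
    m = b1 * m + (1 - b1) * grad
    v = b2 * v + (1 - b2) * grad**2
    m_hat = m / (1 - b1**t)   # t is the 1-based step count
    v_hat = v / (1 - b2**t)
    return w - lr * m_hat / (np.sqrt(v_hat) + eps), m, v
```
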
Regularization

DisturbLabel: Regularizing CNN on the Loss Layer [University of California & MSR] (2016)

    • intro: "an extremely simple algorithm which randomly replaces a part of the labels as incorrect values in each iteration"
    • paper: http://research.microsoft.com/en-us/um/people/jingdw/pubs/cvpr16-disturblabel.pdf
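
A rough sketch of the label-disturbing idea described in the intro above, assuming integer class labels; the disturb rate of 0.1 is a hypothetical value, not the setting tuned in the paper.

```python
import numpy as np

def disturb_labels(labels, num_classes, rate=0.1, rng=np.random.default_rng(0)):
    # Randomly replace a fraction of the labels with labels drawn uniformly
    # over all classes, once per training iteration.
    labels = labels.copy()
    mask = rng.random(labels.shape) < rate
    labels[mask] = rng.integers(0, num_classes, size=mask.sum())
    return labels
```
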
Dropout

Improving Neural Networks by Preventing Co-adaptation of Feature Detectors (Dropout)

    • arxiv: http://arxiv.org/abs/1207.0580

Regularization of Neural Networks using DropConnect

    • homepage: http://cs.nyu.edu/~wanli/dropc/
    • gitxiv: http://gitxiv.com/posts/rjucpiqidhq7hkzox/regularization-of-neural-networks-using-dropconnect
    • github: https://github.com/iassael/torch-dropconnect

Regularizing Neural Networks with Dropout and with DropConnect

    • blog: http://fastml.com/regularizing-neural-networks-with-dropout-and-with-dropconnect/

Fast Dropout Training

    • paper: http://jmlr.org/proceedings/papers/v28/wang13a.pdf
    • github: https://github.com/sidaw/fastdropout

Dropout as Data Augmentation

    • paper: http://arxiv.org/abs/1506.08700
    • notes: https://www.evernote.com/shard/s189/sh/ef0c3302-21a4-40d7-b8b4-1c65b8ebb1c9/24ff553fcfb70a27d61ff003df75b5a9

A Theoretically Grounded Application of Dropout in Recurrent Neural Networks

    • arxiv: http://arxiv.org/abs/1512.05287
    • github: https://github.com/yaringal/bayesianrnn

Improved Dropout for Shallow and Deep Learning

    • arxiv: http://arxiv.org/abs/1602.02220
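
A minimal sketch of (inverted) dropout at training time: each unit is zeroed with probability p and the survivors are rescaled so that no rescaling is needed at test time. DropConnect, listed above, drops individual weights instead of activations.

```python
import numpy as np

def dropout_forward(x, p=0.5, training=True, rng=np.random.default_rng(0)):
    # Inverted dropout: zero each unit with probability p, rescale the rest.
    if not training or p == 0.0:
        return x
    mask = (rng.random(x.shape) >= p) / (1.0 - p)
    return x * mask
```
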
Gradient Descent

Fitting a Model via Closed-form Equations vs. Gradient Descent vs. Stochastic Gradient Descent vs. Mini-batch Learning. What Is the Difference? (Normal equations vs. GD vs. SGD vs. MB-GD; a small comparison sketch follows at the end of this section)

http://sebastianraschka.com/faq/docs/closed-form-vs-gd.html

An Introduction to Gradient Descent in Python

    • blog: http://tillbergmann.com/blog/articles/python-gradient-descent.html

Train Faster, Generalize Better: Stability of Stochastic Gradient Descent

    • arxiv: http://arxiv.org/abs/1509.01240

A Variational Analysis of Stochastic Gradient Algorithms

    • arxiv: http://arxiv.org/abs/1602.02666

The Vanishing Gradient Problem: Oh No, an Obstacle to Deep Learning!

    • blog: https://medium.com/a-year-of-artificial-intelligence/rohan-4-the-vanishing-gradient-problem-ec68f76ffb9b#.50hu5vwa8

Gradient Descent for Machine Learning

http://machinelearningmastery.com/gradient-descent-for-machine-learning/

Revisiting Distributed Synchronous SGD

    • arxiv: http://arxiv.org/abs/1604.00981
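
To illustrate the distinctions drawn in the first entry of this section, a small NumPy example that fits the same linear model three ways: the closed-form normal equations, full-batch gradient descent, and mini-batch stochastic gradient descent. The data and hyperparameters are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
w_true = rng.normal(size=5)
y = X @ w_true + 0.1 * rng.normal(size=1000)

# Closed form (normal equations): solve (X^T X) w = X^T y directly
w_closed = np.linalg.solve(X.T @ X, X.T @ y)

# Full-batch gradient descent: one update per pass over all the data
w_gd = np.zeros(5)
for _ in range(500):
    grad = 2 * X.T @ (X @ w_gd - y) / len(y)
    w_gd -= 0.1 * grad

# Mini-batch SGD: one update per small random batch
w_sgd = np.zeros(5)
for _ in range(500):
    idx = rng.integers(0, len(y), size=32)
    Xb, yb = X[idx], y[idx]
    w_sgd -= 0.1 * 2 * Xb.T @ (Xb @ w_sgd - yb) / len(yb)
```
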
Accelerate Training

Acceleration of Deep Neural Network Training with Resistive Cross-Point Devices

    • arxiv: http://arxiv.org/abs/1603.07341

Image Data Augmentation

DataAugmentation ver1.0: Image data augmentation tool for training of image recognition algorithms

    • github: https://github.com/takmin/dataaugmentation

Caffe-Data-Augmentation: a branch of Caffe with the feature of data augmentation, using a configurable stochastic combination of 7 data augmentation techniques

    • github: https://github.com/shaharkatz/caffe-data-augmentation
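
A minimal sketch of two common image augmentations (random horizontal flip and random crop with padding) in plain NumPy; this is a generic illustration, not the specific pipeline of either tool above.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_flip(img):
    # img: (H, W, C); flip left-right with probability 0.5
    return img[:, ::-1, :] if rng.random() < 0.5 else img

def random_crop(img, pad=4):
    # zero-pad the borders, then cut out a random window of the original size
    h, w, c = img.shape
    padded = np.pad(img, ((pad, pad), (pad, pad), (0, 0)))
    top = rng.integers(0, 2 * pad + 1)
    left = rng.integers(0, 2 * pad + 1)
    return padded[top:top + h, left:left + w, :]

augmented = random_crop(random_flip(np.zeros((32, 32, 3))))
```
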
Papers

Scalable and Sustainable Deep Learning via Randomized Hashing

    • arxiv: http://arxiv.org/abs/1602.08194

Tools

pastalog: Simple, realtime visualization of neural network training performance

    • github: https://github.com/rewonc/pastalog

torch-pastalog: A Torch interface for pastalog - simple, realtime visualization of neural network training performance

    • github: https://github.com/kaixhin/torch-pastalog
