Training Deep Neural Networks
Reprinted from http://handong1587.github.io/deep_learning/2015/10/09/training-dnn.html
Published: 09 Oct 2015. Category: deep_learning
Tutorials
Popular Training Approaches of DNNs - A Quick Overview
https://medium.com/@asjad/popular-training-approaches-of-dnns-a-quick-overview-26ee37ad7e96#.pqyo039bb
Activation Functions
Rectified Linear Units Improve Restricted Boltzmann Machines (ReLU)
- paper: http://machinelearning.wustl.edu/mlpapers/paper_files/icml2010_nairh10.pdf
Rectifier Nonlinearities Improve Neural Network Acoustic Models (Leaky ReLU, a.k.a. LReLU)
- paper: http://ai.stanford.edu/~amaas/papers/relu_hybrid_icml2013_final.pdf
Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification (PReLU)
- keywords: PReLU, Caffe "MSRA" weights initialization
- arxiv: http://arxiv.org/abs/1502.01852
Empirical Evaluation of Rectified Activations in Convolutional Network (ReLU/LReLU/PReLU/RReLU)
- arxiv: http://arxiv.org/abs/1505.00853
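The four variants compared in this paper differ only in how they handle negative inputs. A minimal NumPy sketch for intuition (the slope values and sampling range below are illustrative defaults, not necessarily the papers' exact settings):

```python
import numpy as np

def relu(x):
    # ReLU: zero out negative pre-activations
    return np.maximum(0.0, x)

def leaky_relu(x, alpha=0.01):
    # LReLU: small fixed slope on the negative side
    return np.where(x > 0, x, alpha * x)

def prelu(x, alpha):
    # PReLU: the negative slope alpha is a learned parameter
    return np.where(x > 0, x, alpha * x)

def rrelu(x, lower=1.0 / 8, upper=1.0 / 3, training=True):
    # RReLU: negative slope sampled uniformly at train time,
    # fixed to the mean of the range at test time
    alpha = np.random.uniform(lower, upper, size=x.shape) if training else (lower + upper) / 2.0
    return np.where(x > 0, x, alpha * x)

x = np.array([-2.0, -0.5, 0.0, 1.5])
print(relu(x), leaky_relu(x), prelu(x, alpha=0.25), rrelu(x, training=False))
```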
Deep Learning with S-shaped Rectified Linear Activation Units (SReLU)
- arxiv: http://arxiv.org/abs/1512.07030
Parametric Activation Pools Greatly Increase Performance and Consistency in ConvNets
- blog: http://blog.claymcleod.io/2016/02/06/parametric-activation-pools-greatly-increase-performance-and-consistency-in-convnets/
Noisy Activation Functions
- arxiv: http://arxiv.org/abs/1603.00391
Weights Initialization
An Explanation of Xavier Initialization
- blog: http://andyljones.tumblr.com/post/110998971763/an-explanation-of-xavier-initialization
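For reference, a minimal NumPy sketch of Xavier/Glorot initialization, together with the He ("MSRA") variant referenced under the PReLU paper above; both follow the usual fan-in/fan-out scaling (layer sizes below are arbitrary examples):

```python
import numpy as np

def xavier_uniform(fan_in, fan_out):
    # Glorot/Xavier: keep activation variance roughly constant across layers
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return np.random.uniform(-limit, limit, size=(fan_in, fan_out))

def he_normal(fan_in, fan_out):
    # He/"MSRA" variant: accounts for ReLU zeroing half the activations
    std = np.sqrt(2.0 / fan_in)
    return np.random.normal(0.0, std, size=(fan_in, fan_out))

W1 = xavier_uniform(784, 256)   # e.g. the input layer of a small MLP
W2 = he_normal(256, 10)         # e.g. the outgoing weights of a ReLU layer
print(W1.std(), W2.std())
```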
Deep Neural Networks with Random Gaussian Weights: A Universal Classification Strategy?
- arxiv: http://arxiv.org/abs/1504.08291
All You Need Is a Good Init
- arxiv: http://arxiv.org/abs/1511.06422
- github: https://github.com/ducha-aiki/lsuvinit
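The LSUV (layer-sequential unit-variance) procedure proposed here pre-initializes each layer with orthonormal weights and then rescales them until the layer's output variance on a data batch is close to one. A rough sketch for a single fully connected layer, under simplifying assumptions (no nonlinearity, NumPy only):

```python
import numpy as np

def orthonormal(fan_in, fan_out):
    # Orthonormal starting point via QR (assumes fan_in >= fan_out)
    a = np.random.normal(0.0, 1.0, (fan_in, fan_out))
    q, _ = np.linalg.qr(a)
    return q

def lsuv_layer(W, x, tol=0.05, max_iter=10):
    # Rescale W until the layer's output variance on batch x is ~1
    for _ in range(max_iter):
        var = (x @ W).var()
        if abs(var - 1.0) < tol:
            break
        W = W / np.sqrt(var)
    return W

batch = np.random.randn(128, 64)        # a batch of layer inputs
W = lsuv_layer(orthonormal(64, 64), batch)
print((batch @ W).var())                # close to 1
```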
Data-dependent Initializations of Convolutional Neural Networks
- arxiv: http://arxiv.org/abs/1511.06856
- github: https://github.com/philkr/magic_init
What are good initial weights in a neural network?
- stackexchange: http://stats.stackexchange.com/questions/47590/what-are-good-initial-weights-in-a-neural-network
RandomOut: Using a Convolutional Gradient Norm to Win the Filter Lottery
- arxiv: http://arxiv.org/abs/1602.05931
Batch Normalization
Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift (ImageNet top-5 error: 4.82%)
- arxiv: http://arxiv.org/abs/1502.03167
- blog: https://standardfrancis.wordpress.com/2015/04/16/batch-normalization/
- notes: http://blog.csdn.net/happynear/article/details/44238541
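At training time the transform normalizes each feature over the mini-batch and then rescales it with the learned parameters gamma and beta. A minimal NumPy forward pass (the running statistics used at inference are omitted):

```python
import numpy as np

def batchnorm_forward(x, gamma, beta, eps=1e-5):
    # x: (batch, features). Normalize each feature over the batch,
    # then apply the learned scale (gamma) and shift (beta).
    mu = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mu) / np.sqrt(var + eps)
    return gamma * x_hat + beta

x = np.random.randn(32, 8) * 5.0 + 3.0
out = batchnorm_forward(x, gamma=np.ones(8), beta=np.zeros(8))
print(out.mean(axis=0).round(3), out.std(axis=0).round(3))   # ~0 and ~1
```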
Weight Normalization: A Simple Reparameterization to Accelerate Training of Deep Neural Networks
- arxiv: http://arxiv.org/abs/1602.07868
- github (Lasagne): https://github.com/TimSalimans/weight_norm
- notes: http://www.erogol.com/my-notes-weight-normalization/
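The reparameterization writes each weight vector as a direction times a learned scalar gain, w = g * v / ||v||, so length and direction are optimized separately. A minimal sketch for one linear layer (shapes are arbitrary examples):

```python
import numpy as np

def weight_norm(v, g):
    # w = g * v / ||v||: the direction (v) and the length (g) of each
    # output unit's weight vector are separate parameters
    return g * v / np.linalg.norm(v, axis=0, keepdims=True)

v = np.random.randn(64, 10)          # unconstrained direction parameters
g = np.ones(10)                      # learned per-output gains
W = weight_norm(v, g)
print(np.linalg.norm(W, axis=0))     # each column has norm equal to its gain
```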
Normalization Propagation: A Parametric Technique for Removing Internal Covariate Shift in Deep Networks
- arxiv: http://arxiv.org/abs/1603.01431
Loss Function
The Loss Surfaces of Multilayer Networks
- arxiv: http://arxiv.org/abs/1412.0233
Optimization Methods
On Optimization Methods for Deep Learning
- paper: http://www.icml-2011.org/papers/210_icmlpaper.pdf
On the Importance of Initialization and Momentum in Deep Learning
- paper: http://jmlr.org/proceedings/papers/v28/sutskever13.pdf
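For reference, the classical and Nesterov momentum updates this paper analyzes, written as plain NumPy update rules (the learning rate, momentum coefficient, and toy objective are illustrative, not the paper's settings):

```python
import numpy as np

def sgd_momentum(w, grad, v, lr=0.01, mu=0.9):
    # Classical momentum: accumulate a velocity, then step along it
    v = mu * v - lr * grad(w)
    return w + v, v

def sgd_nesterov(w, grad, v, lr=0.01, mu=0.9):
    # Nesterov momentum: evaluate the gradient at the look-ahead point
    v = mu * v - lr * grad(w + mu * v)
    return w + v, v

grad = lambda w: 2.0 * w            # gradient of f(w) = ||w||^2
w, v = np.ones(3), np.zeros(3)
for _ in range(100):
    w, v = sgd_nesterov(w, grad, v)
print(w)                            # approaches the minimum at 0
```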
Invariant Backpropagation: How to Train a Transformation-Invariant Neural Network
- arxiv: http://arxiv.org/abs/1502.04434
- github: https://github.com/sdemyanov/convnet
A Practical Theory for Designing Very Deep Convolutional Neural Networks
- kaggle: https://www.kaggle.com/c/datasciencebowl/forums/t/13166/happy-lantern-festival-report-and-code/69284
- paper: https://kaggle2.blob.core.windows.net/forum-message-attachments/69182/2287/a%20practical%20theory%20for%20designing%20very%20deep%20convolutional%20neural%20networks.pdf?sv=2012-02-12&se=2015-12-05t15%3a40%3a02z&sr=b&sp=r&sig=kfbqkdua1pdtu837y9iqyrp2vyittv0hcgoeook9e3e%3d
- slides: http://vdisk.weibo.com/s/3nfsznjlkn
Stochastic Optimization Techniques
- intro: SGD/Momentum/NAG/Adagrad/RMSProp/Adadelta/Adam/ESGD/Adasecant/vSGD/Rprop
- blog: http://colinraffel.com/wiki/stochastic_optimization_techniques
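As one concrete example of the update rules surveyed there, a minimal Adam step in NumPy (hyperparameters are the commonly cited defaults; this is a sketch, not a reference implementation):

```python
import numpy as np

def adam_step(w, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    # Adam: per-parameter step sizes from running estimates of the first (m)
    # and second (v) moments of the gradient, with bias correction
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v

w = np.ones(3)
m, v = np.zeros_like(w), np.zeros_like(w)
for t in range(1, 2001):
    w, m, v = adam_step(w, 2.0 * w, m, v, t)   # gradient of ||w||^2
print(w)                                       # hovers near the minimum at 0
```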
Alec Radford's animations for optimization algorithms
http://www.denizyuret.com/2015/03/alec-radfords-animations-for.html
Faster Asynchronous SGD (FASGD)
- arxiv: http://arxiv.org/abs/1601.04033
- github: https://github.com/doctorteeth/fred
An Overview of Gradient Descent Optimization Algorithms (★★★★★)
- blog: http://sebastianruder.com/optimizing-gradient-descent/
Exploiting the Structure: Stochastic Gradient Methods Using Raw Clusters
- arxiv: http://arxiv.org/abs/1602.02151
Writing Fast Asynchronous SGD/Adagrad with RcppParallel
- blog: http://gallery.rcpp.org/articles/rcpp-sgd/
Regularization
DisturbLabel: Regularizing CNN on the Loss Layer [University of California & MSR] (2016)
- intro: "an extremely simple algorithm which randomly replaces a part of the labels as incorrect values in each iteration"
- paper: http://research.microsoft.com/en-us/um/people/jingdw/pubs/cvpr16-disturblabel.pdf
Dropout
Improving Neural Networks by Preventing Co-adaptation of Feature Detectors (Dropout)
- arxiv: http://arxiv.org/abs/1207.0580
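A common way to implement the idea is "inverted" dropout: each unit is kept with probability p at training time and the survivors are rescaled by 1/p, so the forward pass needs no change at test time. A minimal sketch (the keep probability below is an arbitrary example):

```python
import numpy as np

def dropout(x, p_keep=0.5, training=True):
    # Inverted dropout: drop units at train time, rescale the survivors
    # by 1/p_keep so the expected activation matches test time
    if not training:
        return x
    mask = (np.random.rand(*x.shape) < p_keep) / p_keep
    return x * mask

h = np.random.randn(4, 6)
print(dropout(h, p_keep=0.5, training=True))
print(dropout(h, training=False))   # identity at test time
```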
Regularization of Neural Networks using DropConnect
- homepage: http://cs.nyu.edu/~wanli/dropc/
- gitxiv: http://gitxiv.com/posts/rjucpiqidhq7hkzox/regularization-of-neural-networks-using-dropconnect
- github: https://github.com/iassael/torch-dropconnect
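DropConnect applies the random mask to individual weights rather than to activations. A minimal training-time forward pass for one linear layer (the paper's Gaussian moment-matching at inference is omitted; shapes and keep probability are illustrative):

```python
import numpy as np

def dropconnect_forward(x, W, b, p_keep=0.5):
    # Mask individual weights (not activations) for this forward pass
    mask = np.random.rand(*W.shape) < p_keep
    return x @ (W * mask) + b

x = np.random.randn(8, 64)
W = np.random.randn(64, 10) * 0.01
b = np.zeros(10)
print(dropconnect_forward(x, W, b).shape)   # (8, 10)
```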
Regularizing neural networks with dropout and with DropConnect
- blog: http://fastml.com/regularizing-neural-networks-with-dropout-and-with-dropconnect/
Fast Dropout Training
- paper: http://jmlr.org/proceedings/papers/v28/wang13a.pdf
- github: https://github.com/sidaw/fastdropout
Dropout as Data Augmentation
- paper: http://arxiv.org/abs/1506.08700
- notes: https://www.evernote.com/shard/s189/sh/ef0c3302-21a4-40d7-b8b4-1c65b8ebb1c9/24ff553fcfb70a27d61ff003df75b5a9
A Theoretically Grounded Application of Dropout in Recurrent Neural Networks
- arxiv: http://arxiv.org/abs/1512.05287
- github: https://github.com/yaringal/bayesianrnn
Improved Dropout for Shallow and Deep Learning
- arxiv: http://arxiv.org/abs/1602.02220
Gradient Descent
Fitting a model via closed-form equations vs. gradient descent vs. stochastic gradient descent vs. mini-batch learning. What's the difference? (normal equations vs. GD vs. SGD vs. MB-GD)
http://sebastianraschka.com/faq/docs/closed-form-vs-gd.html
An Introduction to Gradient Descent in Python
- blog: http://tillbergmann.com/blog/articles/python-gradient-descent.html
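To make the comparison above concrete, here is a minimal mini-batch gradient descent for least-squares linear regression next to the closed-form normal-equation solution (synthetic data and illustrative hyperparameters, not taken from either article):

```python
import numpy as np

np.random.seed(0)
X = np.random.randn(200, 3)
true_w = np.array([2.0, -1.0, 0.5])
y = X @ true_w + 0.1 * np.random.randn(200)

# Closed-form solution (normal equations)
w_closed = np.linalg.solve(X.T @ X, X.T @ y)

# Mini-batch gradient descent on the mean squared error
w = np.zeros(3)
lr, batch_size = 0.1, 32
for epoch in range(100):
    idx = np.random.permutation(len(X))
    for start in range(0, len(X), batch_size):
        b = idx[start:start + batch_size]
        grad = 2.0 / len(b) * X[b].T @ (X[b] @ w - y[b])
        w -= lr * grad

print(w_closed.round(3), w.round(3))   # both close to true_w
```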
Train Faster, Generalize Better: Stability of Stochastic Gradient Descent
- arxiv: http://arxiv.org/abs/1509.01240
A Variational Analysis of Stochastic Gradient Algorithms
- arxiv: http://arxiv.org/abs/1602.02666
The Vanishing Gradient Problem: Oh no - an obstacle to deep learning!
- blog: https://medium.com/a-year-of-artificial-intelligence/rohan-4-the-vanishing-gradient-problem-ec68f76ffb9b#.50hu5vwa8
Gradient Descent for Machine Learning
http://machinelearningmastery.com/gradient-descent-for-machine-learning/
Revisiting Distributed Synchronous SGD
- arxiv: http://arxiv.org/abs/1604.00981
Accelerate Training
Acceleration of Deep Neural Network Training with Resistive Cross-Point Devices
- arxiv: http://arxiv.org/abs/1603.07341
Image Data Augmentation
DataAugmentation ver1.0: an image data augmentation tool for training image recognition algorithms
- github: https://github.com/takmin/dataaugmentation
caffe-data-augmentation: a branch of Caffe with the feature of data augmentation, using a configurable stochastic combination of 7 data augmentation techniques
- github: https://github.com/shaharkatz/caffe-data-augmentation
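The most common label-preserving transforms (random crops and horizontal flips) are easy to sketch in NumPy; the crop size below is an arbitrary example and is not tied to either tool:

```python
import numpy as np

def random_crop_and_flip(img, crop_h=224, crop_w=224):
    # img: (H, W, C). Take a random crop, then flip horizontally
    # with probability 0.5 -- a typical label-preserving augmentation.
    h, w, _ = img.shape
    top = np.random.randint(0, h - crop_h + 1)
    left = np.random.randint(0, w - crop_w + 1)
    crop = img[top:top + crop_h, left:left + crop_w]
    if np.random.rand() < 0.5:
        crop = crop[:, ::-1]
    return crop

img = np.random.randint(0, 256, size=(256, 256, 3), dtype=np.uint8)
print(random_crop_and_flip(img).shape)   # (224, 224, 3)
```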
Papers
Scalable and Sustainable Deep Learning via Randomized Hashing
- arxiv: http://arxiv.org/abs/1602.08194
Tools
Pastalog: Simple, realtime visualization of neural network training performance
- github: https://github.com/rewonc/pastalog
torch-pastalog: A Torch interface for pastalog - simple, realtime visualization of neural network training performance
- github: https://github.com/kaixhin/torch-pastalog