Practical recommendations for gradient-based training of deep architectures
Yoshua Bengio
Université de Montréal
Abstract: Learning algorithms related to artificial neural networks, and in particular deep learning, may seem to involve many bells and whistles, called hyper-parameters. This chapter is meant as a practical guide with recommendations for some of the most commonly used hyper-parameters, in particular in the context of learning algorithms based on back-propagated gradients and gradient-based optimization. It also discusses how to deal with the fact that more interesting results can be obtained when allowing one to adjust many hyper-parameters. Overall, it describes elements of the practice used to successfully and efficiently train and debug large-scale and often deep multi-layer neural networks. It closes with open questions about the training difficulties observed with deeper architectures.