I have been working on deep learning (DL) for about six months now, and have accumulated some experience from experiments along with some ideas and understanding of my own. I have been wanting to broaden and deepen my knowledge of the field.

Then I came across the deep learning book being published by MIT Press (http://www.iro.umontreal.ca/~bengioy/dlbook/), so I decided to read it and take notes to preserve some of its ideas. This series of posts will be in note form; please forgive whatever is poorly written, and comments and corrections from fellow readers are very welcome.

This post covers the first chapter of the book. Below are some of the points I personally feel are most important:

**Logistic regression can determine whether to recommend cesarean delivery** (an application example)
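As a minimal sketch of what such an application looks like mechanically, here is logistic regression fit by gradient descent on invented synthetic data (the two features carry no real clinical meaning; this only illustrates the algorithm, not the actual cesarean-delivery system):

```python
import numpy as np

# Toy, invented data: two features per case (the clinical meaning is hypothetical)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
true_w = np.array([2.0, -1.0])
y = (X @ true_w + 0.3 * rng.normal(size=200) > 0).astype(float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Fit weights by gradient descent on the logistic (cross-entropy) loss
w = np.zeros(2)
for _ in range(500):
    p = sigmoid(X @ w)
    w -= 0.1 * X.T @ (p - y) / len(y)   # gradient of the average loss

preds = (sigmoid(X @ w) > 0.5).astype(float)
accuracy = (preds == y).mean()
print(f"training accuracy: {accuracy:.2f}")
```

The output is a probability, so a deployed system can threshold it at whatever level of caution the application demands.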

Naive Bayes can **separate legitimate e-mail from spam** (an application example)
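A minimal multinomial Naive Bayes spam filter can be written from scratch in a few lines. The tiny corpus below is invented for illustration:

```python
import numpy as np

# Tiny invented corpus: 1 = spam, 0 = legitimate
docs = [
    ("win money now", 1),
    ("cheap money offer", 1),
    ("win cheap prize now", 1),
    ("meeting schedule tomorrow", 0),
    ("project meeting notes", 0),
    ("lunch tomorrow", 0),
]
vocab = sorted({w for text, _ in docs for w in text.split()})

def counts(text):
    words = text.split()
    return np.array([words.count(w) for w in vocab], dtype=float)

X = np.array([counts(t) for t, _ in docs])
y = np.array([label for _, label in docs])

# Multinomial Naive Bayes with Laplace (add-one) smoothing
log_prior = np.log(np.array([(y == c).mean() for c in (0, 1)]))
word_counts = np.array([X[y == c].sum(axis=0) + 1.0 for c in (0, 1)])
log_likelihood = np.log(word_counts / word_counts.sum(axis=1, keepdims=True))

def predict(text):
    return int(np.argmax(log_prior + counts(text) @ log_likelihood.T))

print(predict("win money"))         # → 1 (spam)
print(predict("meeting tomorrow"))  # → 0 (legitimate)
```

The "naive" part is the assumption that words occur independently given the class; despite that simplification, the method is a classic baseline for text classification.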

On features and representations:

It is not surprising that the choice of representation has an enormous effect on the performance of machine learning algorithms. What is true of input x is often also true of input x + epsilon for a small epsilon; this is called the smoothness prior, and it is exploited in most applications of machine learning that involve real numbers. Many artificial intelligence tasks can be solved by designing the right set of features to extract for that task and then providing these features to a simple machine learning algorithm. For example, a useful feature for speaker identification from sound is the pitch.

One solution to the feature-design problem is to use machine learning to discover not only the mapping from representation to output but also the representation itself. This approach is known as representation learning. When designing features or algorithms for learning features, our goal is usually to separate the factors of variation that explain the observed data. Deep learning is a particular kind of machine learning that achieves great power and flexibility by learning to represent the world as a nested hierarchy of concepts, with each concept defined in relation to simpler concepts. Representation learning algorithms can be supervised, unsupervised, or a combination of both (semi-supervised).

Deep learning has changed the field of machine learning and influenced our understanding of human perception; it has revolutionized application areas such as speech recognition and image understanding. Pylearn2 is a machine learning and deep learning library. A live online resource, http://www.deeplearning.net/book/guidelines, allows practitioners and researchers to share their questions and experience and keep abreast of developments in the art of deep learning.
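The pitch example can be made concrete. Here is a sketch of extracting a pitch feature by autocorrelation from a synthetic sine-wave "voice" (real speech would need framing, windowing, and voicing detection; this only shows the idea of a hand-designed feature):

```python
import numpy as np

# Hypothetical speaker-ID feature: estimate pitch (fundamental frequency)
# of a synthetic audio frame via autocorrelation.
sr = 16000                      # sample rate in Hz
f0 = 200.0                      # true pitch of the synthetic signal
t = np.arange(sr // 10) / sr    # a 100 ms frame
rng = np.random.default_rng(0)
frame = np.sin(2 * np.pi * f0 * t) + 0.1 * rng.normal(size=t.size)

def pitch(frame, sr, fmin=50.0, fmax=500.0):
    # Autocorrelation peaks at lags that are multiples of the pitch period
    ac = np.correlate(frame, frame, mode="full")[frame.size - 1:]
    lo, hi = int(sr / fmax), int(sr / fmin)
    lag = lo + np.argmax(ac[lo:hi])
    return sr / lag

print(f"estimated pitch: {pitch(frame, sr):.1f} Hz")
```

A simple classifier fed this one number (plus a few similar features) can already do useful speaker identification; representation learning aims to discover such features automatically instead.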

1.2 Machine Learning

Human brains also observe their own actions, which influence the world around them, and it appears that human brains try to learn the statistical dependencies between these actions and their consequences, so as to maximize future rewards. Bayesian machine learning attempts to formalize such priors as probability distributions; once this is done, Bayes' theorem and the laws of probability (discussed in Chapter 3) dictate what the right predictions should be.

Overfitting occurs when capacity is too large compared to the number of examples, so that the learner does a good job on the training examples (it correctly guesses that they are likely configurations) but a very poor one on new examples (it does not discriminate well between the likely configurations and the unlikely ones). Underfitting occurs when instead the learner does not have enough capacity, so that even on the training examples it is not able to make good guesses: it does not manage to capture enough of the information present in the training examples, perhaps because it does not have enough degrees of freedom to fit them all. The main reason we get underfitting (especially with deep learning) is not that we choose to have insufficient capacity, but that obtaining high capacity in a learner with strong priors often involves difficult numerical optimization. Numerical optimization methods attempt to find a configuration of some variables (often called parameters in machine learning) that minimizes or maximizes some given function of these parameters, which we call the objective function or training criterion.
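The capacity trade-off described above can be demonstrated with the standard polynomial-fitting illustration (my own example, using polynomial degree as a stand-in for capacity, not the book's):

```python
import numpy as np

# Fit polynomials of increasing capacity (degree) to a few noisy samples
# of a smooth function, and compare training vs. test error.
rng = np.random.default_rng(0)
x_train = np.linspace(-1, 1, 10)
y_train = np.sin(3 * x_train) + 0.1 * rng.normal(size=x_train.size)
x_test = np.linspace(-1, 1, 100)
y_test = np.sin(3 * x_test)

def mse(degree, x, y):
    coeffs = np.polyfit(x_train, y_train, degree)   # least-squares fit
    return float(np.mean((np.polyval(coeffs, x) - y) ** 2))

for degree in (1, 4, 9):
    print(f"degree {degree}: train MSE {mse(degree, x_train, y_train):.4f}, "
          f"test MSE {mse(degree, x_test, y_test):.4f}")
```

Degree 1 underfits (poor even on training data), degree 9 drives training error to nearly zero by interpolating the ten points (including their noise), while an intermediate degree balances the two.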
In learning algorithms, this difficulty in optimizing the training criterion is related to the fact that it is not **convex** in the parameters of the model. We believe that the issue of underfitting is central in deep **learning algorithms and deserves a lot more attention from researchers.** Another machine learning concept that turns out to be important for understanding many deep learning algorithms is that of manifold learning. **The manifold learning hypothesis (Cayton, 2005; Narayanan and Mitter) states that probability is concentrated around regions called manifolds, i.e., that most configurations are unlikely and that probable configurations are neighbors of other probable configurations. We define the dimension of a manifold as the number of orthogonal directions in which one can move and stay among probable configurations. This hypothesis of probability concentration seems to hold for most AI tasks of interest, as can be verified by the fact that most configurations of input variables are unlikely (pick pixel values randomly and you will almost never obtain a natural-looking image).**
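The random-pixels remark can be illustrated with a toy experiment. The "structured" criterion below is an invented, crude stand-in for "natural-looking" (natural images have strongly correlated neighboring pixels); random configurations essentially never satisfy even this weak requirement:

```python
import numpy as np

# Manifold-hypothesis intuition: uniformly random configurations are
# almost never "structured". Toy criterion: at least 90% of horizontally
# adjacent pixel pairs must be equal.
rng = np.random.default_rng(0)

def is_structured(img):
    agree = (img[:, :-1] == img[:, 1:]).mean()
    return agree >= 0.9

n_structured = 0
for _ in range(10000):
    img = rng.integers(0, 2, size=(8, 8))   # random 8x8 binary "image"
    if is_structured(img):
        n_structured += 1
print(f"{n_structured} of 10000 random images were structured")
```

Even with only 64 binary pixels and this very permissive definition, structured configurations occupy a vanishingly small fraction of the space; for megapixel natural images the concentration is incomparably more extreme.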

1.3 Historical Perspective and Neural Networks

Modern deep learning takes much of its inspiration from the neural network research of previous decades. Other major intellectual sources of the concepts found in deep learning include work on probabilistic modeling and graphical models, as well as work on manifold learning. The breakthrough came from a semi-supervised procedure: using unsupervised learning to learn one layer of features at a time and then fine-tuning the whole system with labeled data (Hinton et al., 2006; Bengio et al., 2007; Ranzato et al.), described in Chapter 10. This initiated a lot of new research, and other ways of successfully training deep nets emerged. Even though unsupervised pre-training is sometimes unnecessary for datasets with a very large number of labels, it was the early success of unsupervised pre-training that led many new researchers to investigate deep neural networks. In particular, **the use of rectifiers (Nair and Hinton, 2010b) as the non-linearity, together with appropriate initialization allowing information to flow both forward (to produce predictions from the input) and backward (to propagate error signals), was later shown to enable training very deep supervised networks (Glorot et al., 2011a) without unsupervised pre-training.**

1.4 Recent Impact of Deep Learning

Since then, deep learning has had spectacular practical successes. It has led to much better acoustic models that have dramatically improved the state of the art in speech recognition. Deep neural nets are now used in deployed speech recognition systems, including voice search on Android (Dahl et al., 2010; Deng et al., 2010; Seide et al., 2011; Hinton et al.). Deep convolutional nets have led to major advances in the state of the art for recognizing large numbers of different types of objects in images (now deployed in Google+ photo search). They have also had spectacular successes in pedestrian detection and image segmentation (Sermanet et al., 2013; Farabet et al., 2013; Couprie et al.) and yielded superhuman performance in traffic sign classification (Ciresan et al., 2012). An organization called Kaggle runs machine learning competitions on the web. Deep learning has had numerous successes in these competitions:

**http://blog.kaggle.com/2012/11/01/deep-learning-how-i-did-it-merck-1st-place-interview**

http://www.nytimes.com/2012/11/24/science/scientists-see-advances-in-deep-learning-a-part-of-artificial-intelligence.html

http://deeplearning.net/deep-learning-research-groups-and-labs

This has led Yann LeCun and Yoshua Bengio to create a new conference on the subject. They called it the **International Conference on Learning Representations (ICLR)** to broaden the scope from just deep learning to the more general subject of representation learning (which includes topics such as sparse coding, which learns shallow representations, because shallow representation learners can be used as building blocks for deep representation learners). In the examples of outstanding applications of deep learning described above, the impressive breakthroughs have mostly been achieved with supervised learning techniques for deep architectures. We believe that some of the most important future progress in deep learning will hinge on achieving a similar impact in the unsupervised and semi-supervised cases. **Even though the scaling behavior of stochastic gradient descent is theoretically very good in terms of computation per update, these observations suggest a numerical optimization challenge that must be addressed.** In addition to these numerical optimization difficulties, scaling up large and deep neural networks as they currently stand would require a substantial increase in computing power, which remains a limiting factor. Training much larger models on current hardware (or the hardware likely to be available in the next few years) would require a change in design and/or the ability to effectively exploit parallel computation. These raise non-obvious questions where fundamental research is also needed. Furthermore, some of the biggest challenges remain ahead of us regarding unsupervised deep learning. Powerful unsupervised learning is important for many reasons: unsupervised learning allows a learner to take advantage of unlabeled data. Most of the data available to machines (and to humans and animals) is unlabeled, i.e., without a precise and symbolic characterization of its semantics and of the outputs desired from a learner.
Humans and animals are also motivated, and this guides research toward learning algorithms based on a reinforcement signal, which is much weaker than the signal required for supervised learning.

To summarize, some of the challenges we view as important for future breakthroughs in deep learning are the following:

1. How should we **deal with the fundamental challenges behind** unsupervised learning, such as intractable inference and sampling (see Chapter 17).

2. How can we **build and train much larger** and more adaptive and reconfigurable deep architectures, thus maximizing the advantage one can draw from larger datasets (see Chapter 8).

3. How can we **improve the ability of deep learning algorithms to disentangle the underlying factors of variation, or put more simply, make sense of the world around us** (see the chapter on the very basic question of what is involved in learning a good representation).

Copyright notice: This is the blog author's original article. Please do not reproduce without permission.

[Deep Learning, an MIT Press book in preparation] Deep Learning for AI