Introduction to Deep Learning Algorithms


See the following article for a recent survey of deep learning:

Yoshua Bengio, Learning Deep Architectures for AI, Foundations and Trends in Machine Learning, 2(1), 2009

Depth

The computations involved in producing an output from an input can be represented by a flow graph: a graph representing a computation, in which each node represents an elementary computation and a value (the result of that computation, applied to the values at the children of the node). Consider a set of computations allowed at each node together with the possible graph structures; these define a family of functions. Input nodes have no children; output nodes have no parents.

The flow graph for the expression sin(a² + b/a) could be represented by a graph with two input nodes a and b, one node for the division b/a taking a and b as input (i.e. as children), one node for the square (taking only a as input), one node for the addition (whose value would be a² + b/a, taking as input the nodes for a² and b/a), and finally one output node computing the sine, with a single input coming from the addition node.

A particular property of such flow graphs is depth: the length of the longest path from an input to an output.
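
To make these definitions concrete, here is a minimal Python sketch (my illustration; the Node class and its fields are not from the text) that builds the flow graph for sin(a² + b/a) above and computes its depth as the longest path from an input to the output:

```python
# Sketch (not from the text): the flow graph for sin(a**2 + b/a).
import math

class Node:
    def __init__(self, op, *children):
        self.op = op              # elementary computation, or a name for inputs
        self.children = children  # input nodes have no children

    def value(self, env):
        """Evaluate this node given values for the input nodes."""
        if not self.children:
            return env[self.op]
        return self.op(*(c.value(env) for c in self.children))

    def depth(self):
        """Length of the longest path from an input to this node."""
        if not self.children:
            return 0
        return 1 + max(c.depth() for c in self.children)

a, b = Node("a"), Node("b")
square = Node(lambda x: x * x, a)        # a**2
div = Node(lambda x, y: x / y, b, a)     # b/a
add = Node(lambda x, y: x + y, square, div)
out = Node(math.sin, add)                # sin(a**2 + b/a)

print(out.value({"a": 2.0, "b": 3.0}))   # sin(5.5) ≈ -0.7055
print(out.depth())                       # 3 (e.g. a -> square -> add -> sin)
```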

Traditional feedforward neural networks can be considered to have depth equal to the number of layers (i.e. the number of hidden layers plus 1, for the output layer). Support Vector Machines (SVMs) have depth 2 (one level for the kernel outputs or the feature space, and one for the linear combination producing the output).

Motivations for deep architectures

The main motivations for studying learning algorithms for deep architectures are the following:

  • Insufficient depth can hurt
  • The brain has a deep architecture
  • Cognitive processes seem deep
Insufficient depth can hurt

Depth 2 is enough in many cases (e.g. with logical gates, formal [threshold] neurons, sigmoid neurons, or radial basis function [RBF] units as in SVMs) to represent any function to a given target accuracy. But this may come with a price: the required number of nodes in the graph (i.e. computations, and also the number of parameters, when we try to learn the function) may grow very large. Theoretical results have shown that there exist function families for which the required number of nodes grows exponentially with the input size. This has been shown for logical gates, formal neurons, and RBF units. In the latter case, Håstad has shown families of functions which can be represented efficiently (compactly) with O(n) nodes (for n inputs) when depth is d, but for which an exponential number of nodes is needed if depth is restricted to d−1.
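
As a concrete illustration (my example, not one from the survey): the parity of n bits has a compact deep representation as a balanced tree of 2-input XOR gates, while a depth-2 circuit (an OR of ANDs, i.e. DNF) needs one term per odd-parity input assignment, which is exponential in n:

```python
# Illustration (not from the survey): cost of computing n-bit parity.
import math

def deep_parity_cost(n):
    """Balanced tree of 2-input XOR gates: n-1 gates at depth ceil(log2 n)."""
    return n - 1, math.ceil(math.log2(n))

def shallow_parity_cost(n):
    """Depth-2 DNF: one AND term per odd-parity assignment, 2**(n-1) terms."""
    return 2 ** (n - 1), 2

for n in (8, 16, 32):
    print(n, deep_parity_cost(n), shallow_parity_cost(n))
# At n=32: 31 gates at depth 5 versus 2**31 (over two billion) terms at depth 2.
```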

One can see a deep architecture as a kind of factorization. Most randomly chosen functions cannot be represented efficiently, whether with a deep or a shallow architecture. But many functions that can be represented efficiently with a deep architecture cannot be represented efficiently with a shallow one (see the polynomials example in the Bengio survey paper). The existence of a compact and deep representation indicates that some kind of structure exists in the underlying function to be represented. If there were no structure whatsoever, it would not be possible to generalize well.

The brain has a deep architecture

For example, the visual cortex is well studied and shows a sequence of areas, each of which contains a representation of the input, with signals flowing from one to the next (there are also skip connections and, at some level, parallel paths, so the picture is more complex). Each level of this feature hierarchy represents the input at a different level of abstraction, with more abstract features further up in the hierarchy, defined in terms of the lower-level ones.

Note that representations in the brain are in between dense distributed and purely local: they are sparse. About 1% of neurons are active simultaneously in the brain. Given the huge number of neurons, this is still a very efficient (exponentially efficient) representation.
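
A back-of-the-envelope calculation (my illustration, not from the text) shows why a sparse code can still be exponentially efficient: with n units and k = 1% of them active, the number of distinct activity patterns is the binomial coefficient C(n, k), which grows astronomically with n:

```python
# Illustration (not from the text): capacity of a sparse binary code.
import math

for n in (1_000, 10_000):
    k = n // 100                     # ~1% of units active at once
    patterns = math.comb(n, k)       # distinct activity patterns
    print(f"n={n}, k={k}: about 10^{len(str(patterns)) - 1} patterns")
# n=1000 with k=10 already allows ~2.6e23 distinct patterns.
```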

Cognitive processes seem deep
  • Humans organize their ideas and concepts hierarchically.
  • Humans first learn simpler concepts and then compose them to represent more abstract ones.
  • Engineers break up solutions into multiple levels of abstraction and processing.

It would be nice to learn/discover these concepts (knowledge engineering failed, perhaps because of poor introspection?). Introspection of linguistically expressible concepts also suggests a sparse representation: only a small fraction of all possible words/concepts are applicable to a particular input (say, a visual scene).

Breakthrough in learning deep architectures

Before 2006, attempts at training deep architectures failed: training a deep supervised feedforward neural network tended to yield worse results (in both training and test error) than shallow ones (with 1 or 2 hidden layers).

Three papers changed that in 2006, led by Hinton's revolutionary work on deep belief networks (DBNs):

  • Hinton, G. E., Osindero, S. and Teh, Y., A Fast Learning Algorithm for Deep Belief Nets, Neural Computation 18:1527-1554, 2006
  • Yoshua Bengio, Pascal Lamblin, Dan Popovici and Hugo Larochelle, Greedy Layer-Wise Training of Deep Networks, in J. Platt et al. (Eds), Advances in Neural Information Processing Systems 19 (NIPS 2006), pp. 153-160, MIT Press, 2007
  • Marc'Aurelio Ranzato, Christopher Poultney, Sumit Chopra and Yann LeCun, Efficient Learning of Sparse Representations with an Energy-Based Model, in J. Platt et al. (Eds), Advances in Neural Information Processing Systems (NIPS 2006), MIT Press, 2007

The following key principles are found in all three papers:

  • Unsupervised learning of representations is used to (pre-)train each layer.
  • Unsupervised training proceeds one layer at a time, on top of the previously trained ones. The representation learned at each level is the input for the next layer.
  • Supervised training is then used to fine-tune all the layers (in addition to one or more additional layers that are dedicated to producing predictions). A sketch of this recipe follows the list.
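
Here is a minimal sketch of this recipe, using auto-encoders for the unsupervised stage (my illustration in PyTorch; the layer sizes, learning rates, epoch counts, and toy data are arbitrary assumptions, not values from the papers):

```python
# Sketch (not from the papers): greedy layer-wise pretraining with
# auto-encoders, then supervised fine-tuning. Sizes/rates are illustrative.
import torch
import torch.nn as nn

def pretrain_layer(layer, data, epochs=50, lr=1e-3):
    """Train `layer` as the encoder of a one-hidden-layer auto-encoder."""
    decoder = nn.Linear(layer.out_features, layer.in_features)
    opt = torch.optim.Adam(list(layer.parameters()) + list(decoder.parameters()), lr=lr)
    for _ in range(epochs):
        opt.zero_grad()
        code = torch.sigmoid(layer(data))       # bottleneck representation
        recon = decoder(code)                   # reconstruction of the input
        loss = nn.functional.mse_loss(recon, data)
        loss.backward()
        opt.step()
    return torch.sigmoid(layer(data)).detach()  # representation fed to the next layer

# Toy data: 256 examples, 64 features, 10 classes.
x = torch.randn(256, 64)
y = torch.randint(0, 10, (256,))

sizes = [64, 32, 16]
layers = [nn.Linear(sizes[i], sizes[i + 1]) for i in range(len(sizes) - 1)]

# 1) Unsupervised: pretrain one layer at a time, on top of the previous ones.
h = x
for layer in layers:
    h = pretrain_layer(layer, h)

# 2) Supervised: fine-tune the whole stack plus a dedicated prediction layer.
stack = []
for layer in layers:
    stack += [layer, nn.Sigmoid()]
model = nn.Sequential(*stack, nn.Linear(sizes[-1], 10))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for _ in range(100):
    opt.zero_grad()
    loss = nn.functional.cross_entropy(model(x), y)
    loss.backward()
    opt.step()
```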

The DBNs use RBMs for unsupervised learning of the representation at each layer. The Bengio et al. paper explores and compares RBMs and auto-encoders (a neural network that predicts its input, through a bottleneck internal layer of representation). The Ranzato et al. paper uses a sparse auto-encoder (which is similar to sparse coding) in the context of a convolutional architecture. Auto-encoders and convolutional architectures will be covered later in the course.
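
For concreteness, here is a minimal sketch of the contrastive-divergence (CD-1) update commonly used to train a binary RBM, the building block of the DBN (my illustration in NumPy; the sizes, learning rate, and variable names are assumptions):

```python
# Sketch (not from the papers): one CD-1 update for a binary RBM.
import numpy as np

rng = np.random.default_rng(0)
n_visible, n_hidden, lr = 64, 32, 0.1            # illustrative sizes/rate
W = rng.normal(0.0, 0.01, (n_visible, n_hidden))
b_v = np.zeros(n_visible)                        # visible biases
b_h = np.zeros(n_hidden)                         # hidden biases

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_update(v0):
    """One contrastive-divergence (CD-1) step on a batch of binary inputs."""
    global W, b_v, b_h
    # Positive phase: hidden probabilities given the data.
    p_h0 = sigmoid(v0 @ W + b_h)
    h0 = (rng.random(p_h0.shape) < p_h0).astype(float)
    # Negative phase: one step of Gibbs sampling back through the model.
    p_v1 = sigmoid(h0 @ W.T + b_v)
    p_h1 = sigmoid(p_v1 @ W + b_h)
    # Approximate gradient: data statistics minus model statistics.
    batch = v0.shape[0]
    W += lr * (v0.T @ p_h0 - p_v1.T @ p_h1) / batch
    b_v += lr * (v0 - p_v1).mean(axis=0)
    b_h += lr * (p_h0 - p_h1).mean(axis=0)

# Toy binary data: 16 random visible vectors.
v = (rng.random((16, n_visible)) < 0.5).astype(float)
for _ in range(100):
    cd1_update(v)
```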

Since 2006, a plethora of other papers on the subject of deep learning have been published, some of them exploiting other principles to guide the training of intermediate representations. See Learning Deep Architectures for AI for a survey.
