Neural networks and support vector machines for deep learning


Introduction: Neural networks and support vector machines (SVMs) are representative methods of statistical learning. Both can be seen as descendants of the perceptron, a linear classification model invented by Rosenblatt in 1958. The perceptron works well for linear classification, but real-world classification problems are usually nonlinear.
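
As a refresher on what a linear classifier of this kind looks like, here is a minimal NumPy sketch of the perceptron learning rule on a tiny linearly separable toy problem. The data, learning rate, and the helper name train_perceptron are illustrative assumptions, not part of the article.

```python
import numpy as np

def train_perceptron(X, y, epochs=10, lr=1.0):
    """X: (n_samples, n_features); y: labels in {-1, +1}."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            # Update only when the current linear rule misclassifies the example.
            if yi * (np.dot(w, xi) + b) <= 0:
                w += lr * yi * xi
                b += lr * yi
    return w, b

# A linearly separable toy problem: the class is the sign of the first feature.
X = np.array([[2.0, 1.0], [1.5, -0.5], [-1.0, 0.5], [-2.0, -1.0]])
y = np.array([1, 1, -1, -1])
w, b = train_perceptron(X, y)
print(np.sign(X @ w + b))  # reproduces y on this toy data
```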

Neural networks and support vector machines (together with kernel methods) are nonlinear classification models. In 1986, Rumelhart and McClelland popularized the back-propagation learning algorithm for neural networks; Vapnik and colleagues then introduced support vector machines in 1992. Neural networks are nonlinear models with multiple layers (usually three), while support vector machines use the kernel trick to turn a nonlinear problem into a linear one.
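
To make the "nonlinear turned linear" idea concrete, here is a hedged NumPy sketch: XOR-labelled points are not linearly separable in the original plane, but an explicit degree-2 feature map, the explicit counterpart of what a kernel computes implicitly, makes them separable. The toy data, the feature map phi, and the chosen hyperplane are illustrative assumptions, not from the article.

```python
import numpy as np

X = np.array([[1, 1], [1, -1], [-1, 1], [-1, -1]], dtype=float)
y = np.array([1, -1, -1, 1])  # XOR pattern: same signs -> +1, different signs -> -1

def phi(x):
    # Map (x1, x2) -> (x1, x2, x1*x2): the product coordinate linearizes XOR.
    return np.array([x[0], x[1], x[0] * x[1]])

Z = np.array([phi(x) for x in X])
w = np.array([0.0, 0.0, 1.0])   # a separating hyperplane in the mapped feature space
print(np.sign(Z @ w))           # matches y, so the mapped data is linearly separable
```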

Neural networks and support vector machines have been in a "competitive" relationship.

Schölkopf was a student of Vapnik and is a leading researcher on support vector machines and kernel methods. According to Schölkopf, Vapnik invented the support vector machine to "kill" neural networks ("He wanted to kill neural networks"). Support vector machines proved genuinely effective, and for a period they had the upper hand.

In recent years, the neural-network pioneer Hinton proposed deep learning algorithms for neural networks (2006), which greatly improved their capabilities and made them competitive with support vector machines again.

Deep learning assumes the neural network has many layers: it first uses Boltzmann machines (unsupervised learning) to learn the structure of the network, and then uses back-propagation (supervised learning) to learn the network weights.

On the name "deep learning", Hinton joked: "I want to call SVM shallow learning." (Note: "shallow" also means superficial.) In fact, "deep learning" simply refers to learning in deeper models, because it assumes the neural network has many layers.

In short, deep learning is a new statistical learning algorithm that deserves attention.

Deep learning is a new field of machine learning research, introduced with the goal of moving machine learning closer to one of its original aims: artificial intelligence. See the brief introduction to machine learning for AI and the introduction to deep learning algorithms.

Deep learning is about learning multiple levels of representation and abstraction, which help make sense of data such as images, sound, and text.

For more on deep learning algorithms, see:

The monograph or review paper Learning Deep Architectures for AI (Foundations and Trends in Machine Learning, 2009).

The ICML Workshop on Learning Feature Hierarchies webpage has a list of references.

The LISA public wiki has a reading list and a bibliography.

Geoff Hinton has readings from last year's NIPS tutorial.

This review focuses on some of the most important deep learning algorithms and will demonstrate how to run them with Theano.

Theano is a Python library that makes it easier to write deep learning models and also gives the option of training them on a GPU.
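
For a feel of what Theano code looks like, here is a minimal sketch using the classic Theano API (symbolic variables, theano.function, T.grad). It assumes an installed Theano and is only an illustration, not part of the tutorial itself.

```python
import theano
import theano.tensor as T

# Element-wise logistic sigmoid, compiled into a callable function.
x = T.dmatrix('x')
s = 1 / (1 + T.exp(-x))
logistic = theano.function([x], s)
print(logistic([[0, 1], [-1, -2]]))

# Symbolic differentiation: gradient of a**2 with respect to a.
a = T.dscalar('a')
da = T.grad(a ** 2, a)
grad_fn = theano.function([a], da)
print(grad_fn(3.0))   # 6.0
```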

This algorithm overview has some prerequisites. You should have some knowledge of Python and be familiar with NumPy. Since this review is about how to use Theano, you should read the Theano basics tutorial first. Once you have done that, read our Getting Started chapter: it introduces the notation, the datasets, and the way we optimize models with stochastic gradient descent.
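
As a rough preview of what "optimizing a model with stochastic gradient descent" means, the NumPy sketch below runs minibatch SGD on a synthetic logistic-regression problem. The data, batch size, learning rate, and epoch count are all illustrative assumptions, not values from the tutorial.

```python
import numpy as np

rng = np.random.RandomState(0)
X = rng.randn(200, 3)                                        # 200 examples, 3 features
true_w = np.array([1.5, -2.0, 0.5])
y = (X @ true_w + 0.1 * rng.randn(200) > 0).astype(float)    # binary labels

w = np.zeros(3)
lr, batch_size = 0.1, 20
for epoch in range(50):
    perm = rng.permutation(len(X))
    for start in range(0, len(X), batch_size):
        idx = perm[start:start + batch_size]
        p = 1.0 / (1.0 + np.exp(-(X[idx] @ w)))     # predicted probabilities
        grad = X[idx].T @ (p - y[idx]) / len(idx)   # gradient of the log-loss
        w -= lr * grad                              # one noisy descent step per minibatch
print(w)   # should roughly point in the direction of true_w
```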

The purely supervised learning algorithms are meant to be read in order:

Logistic regression - using Theano for something simple

Multilayer perceptron - an introduction to layers

Deep convolutional network - a simplified version of LeNet-5

The unsupervised and semi-supervised learning algorithms can be read in any order (auto-encoders can be read independently of RBMs/DBNs):

Auto-encoders, denoising auto-encoders - a description of auto-encoders

Stacked denoising auto-encoders - easy steps into unsupervised pre-training for deep nets

Restricted Boltzmann machines - a single-layer generative RBM model

Deep belief networks - unsupervised generative pre-training of stacked RBMs followed by supervised fine-tuning

Building towards the mcRBM model, there is also a new tutorial on sampling from energy models:

HMC sampling - hybrid (aka Hamiltonian) Monte Carlo sampling with scan()

The above is translated from http://deeplearning.net/tutorial/

See also the following paper:

Yoshua Bengio, Learning Deep Architectures for AI, Foundations and Trends in Machine Learning, 2(1), 2009

Depth

The computation involved in producing an output from an input can be represented by a flow graph: a flow graph is a graph representing a computation, in which each node represents an elementary computation and a value (the result of that computation applied to the values at the children of the node). Considering the set of computations allowed at each node, together with the possible graph structures, defines a family of functions. Input nodes have no children, and output nodes have no parents.

For example, the expression sin(a^2 + b/a) can be represented by a flow graph with two input nodes a and b: one node uses a and b as input (i.e. as children) to represent the division b/a; one node uses only a as input to represent the square; one node uses the square node and the division node as input to represent the addition (whose value is a^2 + b/a); and the final output node computes the sine, with the addition node as its only input.

A particular property of such a flow graph is its depth: the length of the longest path from an input to an output.
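
The small sketch below (my own illustration, not from the text) encodes the sin(a^2 + b/a) flow graph as a dictionary mapping each node to its inputs and computes its depth as the longest input-to-output path; the node names and the depth helper are assumptions made for the example.

```python
graph = {
    'a': [], 'b': [],             # input nodes have no inputs
    'square': ['a'],              # a ** 2
    'divide': ['b', 'a'],         # b / a
    'add': ['square', 'divide'],  # a ** 2 + b / a
    'sin': ['add'],               # final output node
}

def depth(node):
    # Depth of an input node is 0; otherwise 1 plus the deepest of its inputs.
    inputs = graph[node]
    return 0 if not inputs else 1 + max(depth(p) for p in inputs)

print(depth('sin'))   # 3: the longest path is a -> square -> add -> sin
```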

A traditional feedforward neural network can be seen as having depth equal to the number of layers (i.e. the number of hidden layers plus 1, for the output layer). SVMs have depth 2: one level corresponds to the kernel outputs or the feature space, and the other to the linear combination producing the output.

Motivations for deep architectures

The main motivations for studying learning algorithms based on deep architectures are:

Insufficient depth can be harmful;

The brain has a deep architecture;

Cognitive processes are deep.

Insufficient depth can be harmful

In many cases depth 2 is sufficient to represent any function to a given target accuracy (e.g. with logical gates, formal [threshold] neurons, sigmoid neurons, or radial basis function [RBF] units as in SVMs). But this may come at a price: the number of nodes required in the graph (i.e. the number of computations, and also the number of parameters when we try to learn the function) can grow very large. Theoretical results show that there exist families of functions for which the required number of nodes grows exponentially with the input size. This has been shown for logical gates, formal [threshold] neurons, and RBF units. In the latter case, Hastad showed families of functions that can be represented efficiently (compactly) with O(n) nodes (for n inputs) when the depth is d, but that require an exponential number of nodes, O(2^n), if the depth is restricted to d-1.
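
As a concrete, hedged illustration of this trade-off, the sketch below contrasts two ways of computing n-bit parity: a "deep" chain of two-input XOR gates, whose size grows linearly with n, and a flat, depth-2-style sum of minterms whose term count is 2^(n-1). The encodings are simplified illustrations of the general point, not constructions from the text.

```python
from itertools import product

def parity_deep(bits):
    # A chain of two-input XOR gates: n - 1 gates, depth grows with n.
    acc = 0
    for b in bits:
        acc ^= b
    return acc

def parity_flat(bits):
    # Depth-2 style sum of minterms: one term per odd-parity input pattern.
    n = len(bits)
    odd_minterms = [p for p in product([0, 1], repeat=n) if sum(p) % 2 == 1]
    return int(tuple(bits) in odd_minterms)

n = 4
odd_minterms = [p for p in product([0, 1], repeat=n) if sum(p) % 2 == 1]
print(len(odd_minterms))          # 2**(n - 1) = 8 terms already for n = 4
for bits in product([0, 1], repeat=n):
    assert parity_deep(bits) == parity_flat(bits)
```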

We can think of a deep architecture as a kind of factorization. Most randomly chosen functions cannot be represented efficiently, whether with a deep or a shallow architecture. But many functions that can be represented efficiently with a deep architecture cannot be represented efficiently with a shallow one (see the polynomials example in the Bengio survey paper). The existence of a compact and deep representation indicates that the function to be represented has some structure. If there were no structure whatsoever, it would not be possible to generalize well.

The brain has a deep architecture

The visual cortex, for example, has been studied extensively and shows a sequence of areas, each of which contains a representation of the input and signals flowing from one area to the next (ignoring the skip connections along some parallel paths in the hierarchy, which make things more complex). Each level of this feature hierarchy represents the input at a different level of abstraction, with more abstract features further up the hierarchy, defined in terms of the lower-level ones.

Note that representations in the brain are in between dense distributed and purely local: they are sparse, with about 1% of neurons active at the same time. Given the huge number of neurons, this is still a very efficient (exponentially efficient) representation.

Cognitive processes seem to be deep

Humans organize their ideas and concepts hierarchically;

Humans first learn simpler concepts and then compose them to represent more abstract ones;

Engineers break up solutions into multiple levels of abstraction and processing;

It would be nice to learn or discover these concepts (knowledge engineering has largely failed here, partly because introspection is poor). Introspection on linguistically expressible concepts also suggests a sparse representation: only a small fraction of all possible words/concepts apply to a particular input (say, a visual scene).

Breakthroughs in learning deep architectures

Before 2006, attempts to train deep architectures failed: training a deep supervised feedforward neural network tended to give worse results (both in training error and in test error) than a shallow one (with 1 or 2 hidden layers).

Three papers in 2006 changed the situation, led by Hinton's revolutionary work on deep belief networks (DBNs):

Hinton, G. E., Osindero, S. and Teh, Y., A Fast Learning Algorithm for Deep Belief Nets, Neural Computation 18:1527-1554, 2006

Yoshua Bengio, Pascal Lamblin, Dan Popovici and Hugo Larochelle, Greedy Layer-Wise Training of Deep Networks, in J. Platt et al. (Eds), Advances in Neural Information Processing Systems (NIPS 2006), pp. 153-160, MIT Press, 2007

Marc'Aurelio Ranzato, Christopher Poultney, Sumit Chopra and Yann LeCun, Efficient Learning of Sparse Representations with an Energy-Based Model, in J. Platt et al. (Eds), Advances in Neural Information Processing Systems (NIPS 2006), MIT Press, 2007

The following main principles are found in these three papers:

Unsupervised learning of representations is used to (pre-)train each layer;

Unsupervised training is done one layer at a time, on top of the previously trained layers; the representation learned at each level is used as input to the next layer;

Supervised training is then used to fine-tune all the layers (together with one or more additional layers dedicated to producing predictions).

DBNs use RBMs for unsupervised learning of the representation at each layer. The paper by Bengio et al. explores and compares RBMs and auto-encoders (neural networks that predict their input through a bottleneck internal layer of representation). The paper by Ranzato et al. uses sparse auto-encoders (related to sparse coding) in the context of a convolutional architecture. Auto-encoders and convolutional architectures will be covered later in the course.
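
The NumPy skeleton below is a rough, hedged sketch of that recipe: greedy layer-wise unsupervised pre-training, with plain tied-weight autoencoders standing in for RBMs, followed by a placeholder for supervised fine-tuning. The synthetic data, layer sizes, hyperparameters, and the substitution of autoencoders for RBMs are illustrative assumptions, not the exact procedure from the papers.

```python
import numpy as np

rng = np.random.RandomState(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def pretrain_autoencoder(X, n_hidden, epochs=50, lr=0.5):
    """Train one tied-weight autoencoder layer to reconstruct its own input."""
    n_in = X.shape[1]
    W = 0.01 * rng.randn(n_in, n_hidden)
    b = np.zeros(n_hidden)          # hidden biases
    c = np.zeros(n_in)              # reconstruction biases
    for _ in range(epochs):
        H = sigmoid(X @ W + b)      # encode
        R = sigmoid(H @ W.T + c)    # decode with the transposed (tied) weights
        dR = (R - X) * R * (1 - R)  # gradient at the reconstruction pre-activation
        dH = (dR @ W) * H * (1 - H) # back-propagated to the hidden pre-activation
        W -= lr * (X.T @ dH + dR.T @ H) / len(X)
        b -= lr * dH.mean(axis=0)
        c -= lr * dR.mean(axis=0)
    return W, b

# Greedy layer-wise pre-training: each layer trains on the previous layer's output.
X = rng.rand(500, 20)               # stand-in for unlabeled training data
weights, inputs = [], X
for n_hidden in [16, 8]:
    W, b = pretrain_autoencoder(inputs, n_hidden)
    weights.append((W, b))
    inputs = sigmoid(inputs @ W + b)

# Supervised fine-tuning would now put a classifier on top of `inputs` and run
# ordinary back-propagation through the whole stack; omitted here for brevity.
print([w.shape for w, _ in weights])   # [(20, 16), (16, 8)]
```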

Since 2006, a great many papers on deep learning have been published, some exploring other principles for guiding the training of intermediate representations; see Learning Deep Architectures for AI.

The English original of this section is at http://www.iro.umontreal.ca/~pift6266/H10/notes/deepintro.html
