Deep learning pioneer Yann LeCun explains convolutional neural networks
By Li Zun | 2016-08-23 18:39 | Co-translated by Blake and Fenny Gao
Lei Feng Net (public account: Lei Feng Net) note: a convolutional neural network (CNN) is a feedforward neural network whose artificial neurons respond to stimuli within a local region of the input; it performs exceptionally well on large-scale image processing.
Yann LeCun was born in France and was a postdoctoral researcher at the University of Toronto under Geoffrey Hinton, a founder of deep learning. As early as the late 1980s, while a researcher at Bell Labs, LeCun introduced convolutional network techniques and demonstrated how they could dramatically improve handwriting recognition. Around the turn of the century, when neural networks had fallen out of favor, LeCun was one of the few scientists who persisted. He became a professor at New York University in 2003, has led the development of deep learning since, and currently also works at Facebook's FAIR lab. This article presents LeCun's slides on convolutional neural networks.
Yann LeCun: the evolution of ConvNets (slides, 2015-2016)
The first convolutional neural network model (University of Toronto) (LeCun 88, 89)
Trained with backpropagation on a total of 320 examples
Strided convolution (subsampling)
No separate pooling layers
The first "real" convolutional network, built at Bell Labs (LeCun et al.)
Trained with backpropagation
USPS zip-code digits: 7,300 training examples, 2,000 test examples
Strided convolution
No separate pooling layers
Convolutional neural network (vintage 1990)
Filter bank -> tanh -> pooling -> filter bank -> tanh -> pooling
Multi-layer ConvNet architecture
The structure of convolutional neural networks
The convolution stage of a convolutional neural network proceeds as follows:
The input image is convolved with three trained filter banks and passed through a nonlinearity, producing a feature map at each layer. Groups of four pixels in each feature map are then summed, weighted, and given a bias; these values are subsampled in the pooling layer to produce the final output.
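As a rough illustration of that pipeline (the filters, sizes, and tanh squashing are illustrative assumptions, not the exact configuration from the slides), the sketch below runs a small NumPy image through a hypothetical bank of three filters, a nonlinearity, and a 2x2 sum-weight-bias subsampling step:

```python
import numpy as np

def conv2d_valid(image, kernel):
    # Naive "valid" 2-D cross-correlation: slide the kernel over the image.
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

def pool2x2(fmap, weight=1.0, bias=0.0):
    # Sum each non-overlapping 2x2 block of four pixels, apply a trainable
    # weight and bias, then squash -- the subsampling layer of early ConvNets.
    h, w = fmap.shape
    blocks = fmap[:h//2*2, :w//2*2].reshape(h//2, 2, w//2, 2).sum(axis=(1, 3))
    return np.tanh(weight * blocks + bias)

rng = np.random.default_rng(0)
image = rng.standard_normal((8, 8))
filters = rng.standard_normal((3, 3, 3))       # a bank of three 3x3 filters
fmaps = [np.tanh(conv2d_valid(image, f)) for f in filters]
pooled = [pool2x2(m) for m in fmaps]
print([m.shape for m in fmaps])    # three 6x6 feature maps
print([p.shape for p in pooled])   # three 3x3 subsampled maps
```

Each filter shrinks the 8x8 input to a 6x6 feature map, and pooling halves each spatial dimension, mirroring the convolve-then-subsample rhythm described above.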
The overall structure of convolutional neural networks:
Normalization -> filter bank -> nonlinearity -> pooling
Normalization: variations on whitening (optional)
Subtractive: mean removal, high-pass filtering
Divisive: local contrast normalization, variance normalization
Filter bank: dimension expansion, mapping
Nonlinearity: sparsifying, saturating, lateral inhibition
Rectification (ReLU), component-wise shrinkage, hyperbolic tangent, etc.
Pooling: aggregation over space or feature type
Max, Lp norm, log probability
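The three pooling aggregations named above (max, Lp norm, log probability) can be sketched as one helper; the `beta` sharpness parameter of the log-prob (log-sum-exp) variant and the toy activations are assumed details, not from the slides:

```python
import numpy as np

def pool(block, mode="max", p=2.0, beta=10.0):
    # Aggregate a spatial block of activations into one pooled value.
    v = block.ravel()
    if mode == "max":
        return v.max()                                  # max pooling
    if mode == "lp":
        return (np.abs(v) ** p).mean() ** (1.0 / p)     # Lp-norm pooling
    if mode == "logprob":
        # log-sum-exp ("log prob") pooling: a soft, differentiable max
        return np.log(np.mean(np.exp(beta * v))) / beta
    raise ValueError(mode)

block = np.array([[0.1, 0.9], [0.3, 0.5]])
print(pool(block, "max"))       # 0.9
print(pool(block, "lp"))        # between the mean magnitude and the max
print(pool(block, "logprob"))   # approaches the max as beta grows
```

As `beta` increases, log-prob pooling converges to max pooling, which is why the two are often treated as ends of the same spectrum.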
LeNet-5
A simple convolutional network model
MNIST (LeCun 1998)
Stage 1: filter bank -> squashing -> max pooling
Stage 2: filter bank -> squashing -> max pooling
Stage 3: standard 2-layer MLP
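Those stages can be checked with simple shape arithmetic. Assuming the classic LeNet-5 configuration (32x32 input, 5x5 filters, 2x2 pooling), the feature-map widths shrink as follows:

```python
def conv_out(size, kernel, stride=1):
    # Output width of a "valid" convolution.
    return (size - kernel) // stride + 1

def pool_out(size, window=2):
    # Output width after non-overlapping pooling.
    return size // window

sizes = [32]                            # input digit, padded to 32x32
sizes.append(conv_out(sizes[-1], 5))    # C1: 5x5 convolutions -> 28
sizes.append(pool_out(sizes[-1]))       # S2: 2x2 pooling      -> 14
sizes.append(conv_out(sizes[-1], 5))    # C3: 5x5 convolutions -> 10
sizes.append(pool_out(sizes[-1]))       # S4: 2x2 pooling      -> 5
print(sizes)   # [32, 28, 14, 10, 5]
```

The final 5x5 maps are then flattened and fed to the MLP of stage 3.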
Multiple-character recognition (Matan et al. 1992)
Every layer is convolutional
Single-character recognizer -> SDNN (Space Displacement Neural Network)
Applications of sliding-window ConvNets + weighted finite-state machines
Where convolutional neural networks apply
Signals that come as (multi-dimensional) arrays
Signals with strong local correlations
Signals whose features can appear anywhere
Signals whose targets are invariant to translation and distortion
1D convolutional networks: time-series signals, text
Text classification
Music genre classification
Acoustic models for speech recognition
Time-series prediction
2D convolutional networks: images, time-frequency representations (speech and audio)
Object detection, localization, recognition
3D convolutional networks: video, volumetric images, tomography
Video recognition/understanding
Biomedical image analysis
Hyperspectral image analysis
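For the 1-D case, the essence is a filter slid along a sequence. In this toy NumPy example (the signal and filter are made up for illustration), a two-tap difference filter responds exactly at the rising and falling edges of a time series:

```python
import numpy as np

signal = np.array([0., 0., 1., 1., 1., 0., 0.])   # a toy time series
edge = np.array([1., -1.])                        # difference filter
resp = np.convolve(signal, edge, mode="valid")
print(resp)   # [ 0.  1.  0.  0. -1.  0.] -- fires at the two edges
```

A learned 1-D ConvNet stacks many such filters with nonlinearities between them, but the sliding dot-product is the same.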
Face detection (Vaillant et al. 1993, 1994)
Convolutional networks for detection in large images
Heat maps at multiple scales
Non-maximum suppression over candidates
6 seconds per 256x256 image on a SPARC workstation
State-of-the-art face detection results at the time
Convolutional networks applied to biological image segmentation
Biological image segmentation (Ning et al., IEEE-TIP 2005)
Labeling pixels within a large context using a convolutional network
The network takes a window of pixels and labels the central pixel
Cleanup using a conditional random field
3D version for connectomics (Jain et al. 2007)
Scene parsing/labeling
Scene parsing/labeling: multi-scale convolutional network architecture
Each output value sees a large input context
46x46 window at full resolution; 92x92 at 1/2 resolution; 182x182 at 1/4 resolution
[7x7 convolution] -> [2x2 pooling] -> [7x7 convolution] -> [2x2 pooling] -> [7x7 convolution]
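The 46x46 context quoted above follows from receptive-field arithmetic over that conv/pool stack; a small helper (the standard recurrence, not code from the slides) confirms it:

```python
def receptive_field(layers):
    # layers: list of (kernel_size, stride) pairs, input to output.
    # Returns the receptive field of one output unit, in input pixels.
    rf, jump = 1, 1
    for k, s in layers:
        rf += (k - 1) * jump    # each layer widens the field by (k-1) strides
        jump *= s               # accumulated stride between adjacent units
    return rf

stack = [(7, 1), (2, 2), (7, 1), (2, 2), (7, 1)]   # the conv/pool stack above
print(receptive_field(stack))   # 46
```

Running the same stack on half- and quarter-resolution inputs yields the effective 92x92 and 182x182 contexts.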
Supervised training on fully-labeled images
Method: majority vote over superpixel regions
Input image -> superpixel boundary hypotheses -> superpixels -> majority vote over superpixels -> categories aligned with region boundaries
Multi-scale ConvNet -> ConvNet features (d=768 per pixel) -> classifier -> "soft" category scores
Scene parsing/labeling
No preprocessing
Frame-by-frame processing
The convolutional network runs in 50 ms per frame on Virtex-6 FPGA hardware
But communicating the features over Ethernet limits overall system performance
Convolutional networks for long-range adaptive robot vision (DARPA LAGR program, 2005-2008)
Input image
Labels
Classifier output
Very deep convolutional network architectures
Small kernels, little subsampling
VGG
GoogLeNet
ResNet
Object detection and localization with convolutional networks
Classification + localization: multiple sliding windows
Apply a convolutional network with sliding windows over the whole image
Important: applying a convolutional network densely over an image is very cheap
Simply compute the convolutions over the entire image and replicate the fully-connected layers
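Why dense evaluation is cheap: a linear "window classifier" applied at every location is exactly a valid convolution, so the work can be shared across windows. A minimal NumPy check with made-up sizes:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

rng = np.random.default_rng(1)
image = rng.standard_normal((6, 6))
w = rng.standard_normal((4, 4))    # weights of a linear 4x4-window classifier

# Naive sliding window: re-run the classifier at each of the 3x3 positions.
naive = np.array([[np.sum(image[i:i+4, j:j+4] * w) for j in range(3)]
                  for i in range(3)])

# Shared computation: score every window in one pass (a valid correlation).
windows = sliding_window_view(image, (4, 4))          # shape (3, 3, 4, 4)
shared = np.tensordot(windows, w, axes=([2, 3], [0, 1]))

print(np.allclose(naive, shared))   # True -- identical scores, one pass
```

The same trick extends to the fully-connected layers of a classifier: rewritten as 1x1 convolutions, they too can be evaluated at every window position at once.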
Classification + localization: sliding window + box regression
Apply a convolutional network with sliding windows over the whole image
For each window, predict a class and bounding-box parameters
Even when the object is not fully contained in the viewing window, the network can guess what it thinks the object is
DeepFace
Taigman et al., CVPR 2014
Alignment
Convolutional network
Metric learning
Facebook's automatic photo-tagging method
800 million photos per day
Pose estimation and attribute recovery with convolutional networks
Pose-Aligned Networks for Deep Attribute modeling
Zhang et al., CVPR 2014 (Facebook AI)
Body part detection and pose estimation
Tompson, Goroshin, Jain, LeCun, Bregler, arXiv (2014)
Drawing with supervised convolutional networks
Using a convolutional network to draw
Dosovitskiy et al., arXiv:1411.5928
Drawing with supervised convolutional networks
Generating chairs
Interpolating between chairs in feature space
Global (end-to-end) learning: energy-based models
Input -> convolutional network (or other deep architecture) -> energy module (latent variables, output) -> energy
Every module in the system is trainable
All modules are trained simultaneously so that a global loss function is optimized
Includes the feature extractor, the recognizer, and the contextual post-processor (graphical model)
Problem: backpropagating gradients through the graphical model
Deep convolutional networks (and other deep neural networks)
Training samples: (Xi, Yi), i = 1 to K
Objective function (hinge-type loss = ReLU)
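The hinge-type objective mentioned here can be written directly as a ReLU over an energy margin; a minimal sketch (the margin value and the energies are illustrative, not from the slides):

```python
import numpy as np

def hinge_energy_loss(e_correct, e_wrong, margin=1.0):
    # Push the energy of the correct answer below the best wrong answer
    # by at least `margin`; the hinge is literally a ReLU of the violation.
    return np.maximum(0.0, margin + e_correct - e_wrong)

print(hinge_energy_loss(0.2, 2.0))    # 0.0  -- margin satisfied, no loss
print(hinge_energy_loss(1.5, 1.75))   # 0.75 -- margin violated, positive loss
```

When the margin is satisfied the gradient is zero, so training effort concentrates on examples the model still gets wrong or barely gets right.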
Image from NewScientist.com