Introduction to Convolutional Neural Networks
A convolutional neural network (CNN) is a multi-layer neural network that specializes in machine learning problems related to images, especially large images.
The most typical convolutional network consists of convolution layers, pooling layers, and fully connected layers. The convolution layers work with the pooling layers to form multiple convolution groups that extract features layer by layer, and classification is finally completed through several fully connected layers.
The operations performed by the convolution layer can be seen as inspired by the concept of the local receptive field, while the pooling layer mainly aims to reduce the data dimension.
In summary, a CNN uses convolution to extract and distinguish features, uses shared convolution weights and pooling to reduce the number of network parameters, and finally completes classification and other tasks through a traditional neural network.
Basic Convolutional Neural Network Model
A convolutional neural network is a multi-layer neural network. Each layer is composed of multiple two-dimensional planes, and each plane is composed of multiple independent neurons.
Figure: convolutional neural network concept demonstration: the input image is convolved with three trainable filters, each with an added bias. After convolution, three feature maps are generated at the C1 layer. Then, each group of four pixels in a feature map is summed, weighted, and offset by a bias, and a sigmoid function is applied to obtain the three feature maps of the S2 layer. These maps are filtered again to obtain the C3 layer. This hierarchy then produces S4 in the same way S2 was produced. Finally, the pixel values are rasterized and concatenated into a vector that is fed into a traditional neural network, which produces the output.
Generally, the C layers are feature extraction layers. The input of each neuron is connected to a local receptive field of the previous layer, and a local feature is extracted; once a local feature is extracted, its positional relationship to other features is also determined. The S layers are feature mapping layers; each computing layer of the network is composed of multiple feature maps, each feature map is a plane, and the weights of all neurons on a plane are equal. The feature mapping structure uses the sigmoid function, which has a small influence function kernel, as the activation function of the convolutional network, so that the feature maps have shift invariance.
Parameter Reduction and Weight Sharing
If we use a traditional neural network to classify an image, we connect every pixel of the image to the hidden-layer nodes. For a 1000x1000 pixel image with 1M (10^6) hidden units, that is 10^12 parameters in total, which is obviously unacceptable.
However, in a CNN we can greatly reduce the number of parameters, based on the following two assumptions:
1) Bottom-layer features are local. That is, we can use a 10x10 filter to represent bottom-level features such as edges.
2) The features of different small patches, even patches from different images, are similar. That is, we can use the same set of filters to describe different parts of an image and different images.
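Under these assumptions, the savings are dramatic. A back-of-the-envelope check in Python, assuming the 1000x1000 image, 10^6 hidden units, and 10x10 filter from above:

```python
pixels = 1000 * 1000   # input dimension of a 1000x1000 image
hidden = 10**6         # one million hidden units

fully_connected = pixels * hidden   # every pixel connected to every unit
local_fields = hidden * 10 * 10     # each unit sees only a 10x10 patch
weight_sharing = 10 * 10            # one 10x10 filter shared by all units

print(f"{fully_connected:.0e}")  # 1e+12
print(f"{local_fields:.0e}")     # 1e+08
print(weight_sharing)            # 100
```

Local connectivity alone drops the count by four orders of magnitude, and sharing one filter across all positions leaves only 100 weights per filter.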
Convolution
In this process, we use a filter (convolution kernel) to scan each small region of the image and obtain a feature value for that region.
In actual training, the values of the convolution kernel are learned from the data.
In specific applications, there are often multiple convolution kernels. Each convolution kernel can be thought of as representing one image pattern: if an image patch produces a large value when convolved with a kernel, the patch is considered very close to that pattern.
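A minimal sketch of this filtering idea (a naive "valid" convolution in plain NumPy, not any framework's API; the edge filter is a made-up example):

```python
import numpy as np

def conv2d(image, kernel):
    """Naive 'valid' 2D convolution (cross-correlation, as CNN libraries compute it)."""
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            # each output value is the filter's response to one small image patch
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

img = np.zeros((5, 5))
img[:, 2:] = 1.0                 # left half dark, right half bright
edge = np.array([[-1.0, 1.0]])   # a made-up vertical-edge filter
print(conv2d(img, edge))         # responds only where the intensity changes
```

A real network stacks many such kernels and learns their values during training, as described above.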
Pooling
Pooling sounds very advanced, but in fact it is simply subsampling.
For example, if the original image is 20x20 and the sampling window is 10x10, subsampling reduces the image to a 2x2 feature map.
The reason for doing this is that, even after convolution, the image is still very large (because the convolution kernel is relatively small), so downsampling is performed to reduce the data dimension.
This works because, even though a lot of data is thrown away, the statistical properties of the features can still describe the image, and reducing the data dimension effectively helps avoid overfitting.
In practical applications, pooling methods are divided into max-pooling and mean-pooling.
Here we will talk about max pooling. The idea is very simple: for each 2x2 window, select the maximum value as the corresponding element of the output matrix. For example, if the maximum value in the first 2x2 window of the input matrix is 6, the first element of the output matrix is 6, and so on.
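A minimal NumPy sketch of this step, assuming non-overlapping 2x2 windows as in the example above:

```python
import numpy as np

def max_pool(feature_map, size=2):
    """Non-overlapping max pooling: keep the largest value in each size x size window."""
    h, w = feature_map.shape
    trimmed = feature_map[:h - h % size, :w - w % size]  # drop ragged edges if any
    blocks = trimmed.reshape(h // size, size, w // size, size)
    return blocks.max(axis=(1, 3))

x = np.array([[1, 6, 2, 3],
              [5, 4, 1, 0],
              [7, 2, 9, 8],
              [3, 1, 4, 5]])
print(max_pool(x))  # [[6 3]
                    #  [7 9]]
```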
CNN Layer Hierarchy
- Data input layer / INPUT layer
- Convolution layer / CONV layer
- ReLU activation layer / RELU layer
- Pooling layer / POOL layer
- Fully connected layer / FC layer
Data input layer
This layer is mainly used to pre-process the original image data, including:
- Mean removal: centers every dimension of the input data at 0. The purpose is to pull the center of the samples back to the origin of the coordinate system.
- Normalization: scales the amplitudes to the same range, to reduce the interference caused by differing value ranges across dimensions. For example, suppose we have two features, A and B; A ranges from 0 to 10 and B ranges from 0 to 10000. Using these two features directly is problematic; a better approach is to normalize them so that both A and B fall in the range 0 to 1.
- PCA/whitening: dimensionality reduction using PCA; whitening normalizes the amplitude of the data on each feature axis.
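A minimal NumPy sketch of these three steps (the data matrix below is made up, with feature A in [0, 10] and feature B in [0, 10000] as in the example above):

```python
import numpy as np

X = np.array([[2.0, 4000.0],      # made-up data: feature A in [0, 10],
              [8.0, 9000.0],      # feature B in [0, 10000]
              [5.0, 1000.0]])

X_centered = X - X.mean(axis=0)                  # mean removal: center each dimension at 0
X_scaled = X_centered / X_centered.std(axis=0)   # normalization: comparable ranges

# PCA whitening: rotate onto the principal axes, then equalize each axis's amplitude
cov = np.cov(X_centered, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)
X_white = (X_centered @ eigvecs) / np.sqrt(eigvals + 1e-5)
```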
Activation Layer
This layer applies a nonlinear mapping to the output of the convolution layer.
The activation function used by CNNs is generally ReLU (the Rectified Linear Unit). It features fast convergence and a simple gradient, but it can be fragile: a unit whose input stays negative receives zero gradient and stops updating ("dies").
Practical experience with the activation layer:
① Do not use sigmoid! Do not use sigmoid! Do not use sigmoid!
② Try ReLU first, because it is fast, but be careful: watch out for dying units.
③ If ② fails, use Leaky ReLU or Maxout.
④ In some cases tanh gives good results, but such cases are rare.
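A small sketch of the two most common choices above, in plain NumPy (alpha = 0.01 is just the conventional default slope for Leaky ReLU):

```python
import numpy as np

def relu(x):
    """ReLU: max(0, x) elementwise; cheap, and the gradient does not saturate for x > 0."""
    return np.maximum(0, x)

def leaky_relu(x, alpha=0.01):
    """Leaky ReLU: a small slope for x < 0 keeps units from 'dying'."""
    return np.where(x > 0, x, alpha * x)

z = np.array([-2.0, -0.5, 0.0, 1.5])
print(relu(z))        # [0.  0.  0.  1.5]
print(leaky_relu(z))  # [-0.02  -0.005  0.  1.5]
```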
Training Algorithms for Convolutional Neural Networks
1. As with general machine learning algorithms, we first define a loss function to measure the gap between the network's predictions and the actual results.
2. We then find the weights W and biases b that minimize the loss function. The algorithm used for CNNs is SGD (stochastic gradient descent).
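A minimal sketch of a single SGD step, using a linear model with squared-error loss as a stand-in; in a real CNN the gradients for W and b come from backpropagation through all the layers:

```python
import numpy as np

rng = np.random.default_rng(0)
W, b = rng.normal(size=3), 0.0   # stand-in model parameters
lr = 0.01                        # learning rate

def sgd_step(x, y, W, b, lr):
    """One SGD update for a linear model with squared-error loss 0.5 * (Wx + b - y)^2."""
    err = W @ x + b - y          # derivative of the loss w.r.t. the prediction
    return W - lr * err * x, b - lr * err

x, y = rng.normal(size=3), 1.0   # one randomly drawn training sample
W, b = sgd_step(x, y, W, b, lr)  # repeat over many samples until the loss converges
```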
Advantages and disadvantages of Convolutional Neural Networks
Advantages
- Shared convolution kernels handle high-dimensional data without strain
- No need to select features manually; the network trains the weights itself, and the learned features classify well
Disadvantages
- Hyperparameters need tuning, a large sample size is required, and training is best done on a GPU
- The physical meaning is unclear (that is, we do not know exactly what features each convolution layer extracts, and the neural network itself is a hard-to-interpret "black box" model)
Typical Convolutional Neural Networks
- LeNet, the earliest CNN, used for digit recognition
- AlexNet, the CNN that far outperformed the runner-up in the 2012 ILSVRC competition; compared with LeNet, it is deeper and replaces a single large convolution layer with multiple small convolution layers
- ZF Net, the 2013 ILSVRC champion
- GoogLeNet, the 2014 ILSVRC champion
- VGGNet, a model from the 2014 ILSVRC competition; slightly inferior to GoogLeNet in image recognition, but very effective in many transfer learning problems (such as object detection)
Fine-tuning of Convolutional Neural Networks
What is fine-tuning?
Fine-tuning means taking the weights (or part of the weights) of a model pre-trained for another task and using them as the initial values for training.
So why not just pick some random numbers as the initial weights? The reason is simple: first, training a convolutional neural network from scratch is prone to problems; second, fine-tuning converges quickly to a good state, saving time and effort.
How is fine-tuning done in practice?
- Reuse the weights of the layers that stay the same; newly defined layers get random initial weights
- Use a larger learning rate for the newly defined layers and a smaller learning rate for the reused layers
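A framework-agnostic sketch of these two rules (the layer names, weight shapes, and learning-rate values below are hypothetical, not any particular library's API):

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for weights loaded from a pre-trained model (shapes are made up).
pretrained = {"conv1": rng.normal(size=(32, 3, 3, 3)),
              "conv2": rng.normal(size=(64, 32, 3, 3))}

layers = {
    # reused layers: start from pre-trained weights, small learning rate
    "conv1":  {"W": pretrained["conv1"], "lr": 1e-4},
    "conv2":  {"W": pretrained["conv2"], "lr": 1e-4},
    # newly defined layer: random initial weights, larger learning rate
    "fc_new": {"W": rng.normal(size=(10, 64)) * 0.01, "lr": 1e-2},
}

# During training, each layer's update uses its own learning rate:
#   layer["W"] -= layer["lr"] * gradient_of_loss_wrt_W
```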
Common Frameworks for Convolutional Neural Networks
Caffe
- A mainstream CV toolkit from Berkeley; supports C++, Python, and MATLAB
- The Model Zoo offers a large number of pre-trained models ready for use
Torch
- The convolutional neural network toolkit used by Facebook
- The local interface for temporal convolution is very intuitive to use
- Defining a new network layer is simple
TensorFlow
- Google's deep learning framework
- TensorBoard makes visualization very convenient
- Good data and model parallelism; fast