1.Why Look at Case Studies
This week we'll look at some classic CNN models; studying them deepens our understanding of CNNs, and we can apply them directly in practice or draw inspiration from them.
2.Classic Networks
The LeNet-5 model was proposed by Professor Yann LeCun in 1998 and was the first convolutional neural network successfully applied to digit recognition. On the MNIST data its accuracy is approximately 99.2%.
Its structure is: conv layer - pool layer - conv layer - pool layer - fully connected layer - fully connected layer - softmax output layer.
Features: only around 60,000 parameters; the sigmoid/tanh activation functions of that era were used; pooling is average pooling; as the network gets deeper, n_H and n_W decrease while n_C increases.
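To make the structure concrete, here is a minimal PyTorch-style sketch of a LeNet-5-like network (the layer sizes assume the classic 32x32 grayscale input; tanh and average pooling are kept here to match the original, although modern code would usually use ReLU and max pooling):

```python
import torch.nn as nn

# A LeNet-5-style sketch (assumed 32x32 grayscale input; the softmax is
# usually folded into the loss function rather than written as a layer).
class LeNet5(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 6, kernel_size=5), nn.Tanh(),   # 32x32x1  -> 28x28x6
            nn.AvgPool2d(2),                             # 28x28x6  -> 14x14x6
            nn.Conv2d(6, 16, kernel_size=5), nn.Tanh(),  # 14x14x6  -> 10x10x16
            nn.AvgPool2d(2),                             # 10x10x16 -> 5x5x16
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(16 * 5 * 5, 120), nn.Tanh(),
            nn.Linear(120, 84), nn.Tanh(),
            nn.Linear(84, num_classes),
        )

    def forward(self, x):
        # n_H, n_W shrink while n_C grows as we go deeper
        return self.classifier(self.features(x))
```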
AlexNet
AlexNet is similar in structure to LeNet-5 but much larger; it uses the ReLU activation function and has roughly 60 million parameters.
AlexNet also involved more complex engineering details, such as training on multiple GPUs and exchanging data between them.
It also used Local Response Normalization (LRN), which was later shown to be of little benefit.
VGG-16
VGG-16 is a much larger network, with about 138 million parameters.
Features: every conv filter is 3x3 with stride = 1 and same padding;
every max-pool is 2x2 with stride = 2.
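As a sketch, the repeating VGG pattern (3x3 same convolutions followed by 2x2 max pooling) could be written like this; the channel counts shown are the first two VGG-16 stages:

```python
import torch.nn as nn

# One VGG-style stage: num_convs 3x3 same convolutions (stride 1) + 2x2 max pool.
def vgg_block(in_ch, out_ch, num_convs):
    layers = []
    for i in range(num_convs):
        layers += [nn.Conv2d(in_ch if i == 0 else out_ch, out_ch,
                             kernel_size=3, stride=1, padding=1),  # same padding
                   nn.ReLU(inplace=True)]
    layers.append(nn.MaxPool2d(kernel_size=2, stride=2))           # halves n_H, n_W
    return nn.Sequential(*layers)

stage1 = vgg_block(3, 64, num_convs=2)    # 224x224x3  -> 112x112x64
stage2 = vgg_block(64, 128, num_convs=2)  # 112x112x64 -> 56x56x128
```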
3.Residual Networks
As a neural network gets deeper, vanishing and exploding gradients make the model harder to train (for the reason, see www.cnblogs.com/Dar-/p/9379956.html).
ResNets address this problem and make it possible to train very deep networks. They connect non-adjacent layers directly, weakening the strict layer-by-layer dependence.
ResNets are built from residual blocks, which contain skip (shortcut) connections between neurons, specifically as follows:
A residual network is formed by stacking many residual blocks. In theory, making a plain network deeper should never hurt performance, but in practice the performance of a plain network eventually degrades as layers are added.
With ResNets, we get better performance even at great depth.
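A minimal sketch of one residual block (batch norm placement and channel counts are assumptions; the key point is adding the block input a[l] to z[l+2] before the final ReLU):

```python
import torch.nn as nn

# A basic residual block: main path of two 3x3 convolutions, plus a skip
# connection that adds the input a[l] to z[l+2] before the final ReLU.
class ResidualBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        shortcut = x                         # a[l]
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))      # z[l+2]
        return self.relu(out + shortcut)     # a[l+2] = g(z[l+2] + a[l])
```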
4.Why ResNets Work
Why are ResNets useful?
Suppose x is an input that, after passing through several layers, reaches layer l with activation a[l].
Two layers further on, a[l+2] = g(z[l+2] + a[l]) = g(W[l+2] a[l+1] + b[l+2] + a[l]). If the gradient vanishes, i.e. W[l+2] and b[l+2] are close to 0, then a[l+2] = g(a[l]); and when a[l] >= 0 with ReLU as the activation, a[l+2] = a[l].
This weakens the tight coupling between consecutive layers and connects non-adjacent layers: in effect the two layers after layer l are skipped, so the model can tolerate many more layers.
For a residual block, learning this identity function (a[l+2] = a[l]) is very easy, so even though the network is deep, performance is not hurt.
If the added layers do learn something new, network performance improves.
Of course, if the residual block really learns a useful non-linear mapping, the shortcut is effectively ignored and the block has the same effect as the plain network.
If the dimension of a[l+2] differs from that of a[l], a matrix W_s is introduced: a[l] is multiplied by W_s so that it has the same dimension as a[l+2]. W_s can be learned, or it can simply be fixed.
Skip connections are normally added only between conv layers of the same dimension: in the ResNet diagram, the first three skip connections stay within the 3x3 conv, 64 layers; the connection crossing from the 3x3 conv, 64 layers to the 3x3 conv, 128 layers is drawn as a dashed line (a plain identity cannot be added there), and then three more skip connections follow within the 3x3 conv, 128 layers. The reason is to keep z[l+2] and a[l] the same dimension.
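A small sketch of the W_s case (the channel counts and stride here are assumptions): a 1x1 convolution on the shortcut reshapes a[l] so it can be added to z[l+2]:

```python
import torch
import torch.nn as nn

# When the main path changes the shape (e.g. 64 -> 128 channels, spatial size
# halved), the shortcut cannot be a plain identity. W_s is commonly realized
# as a 1x1 convolution so that W_s * a[l] matches the shape of z[l+2].
w_s = nn.Conv2d(64, 128, kernel_size=1, stride=2)

a_l  = torch.randn(1, 64, 56, 56)        # a[l]
z_l2 = torch.randn(1, 128, 28, 28)       # z[l+2] coming from the main path
a_l2 = torch.relu(z_l2 + w_s(a_l))       # a[l+2] = g(z[l+2] + W_s a[l])
```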
5.Network in Network and 1x1 convolutions
A 1x1 convolution uses a 1x1 filter; it is like a per-pixel weighted sum over the channels and can be used to reduce the number of channels. (I don't fully understand this point: isn't the number of output channels simply adjusted by choosing the number of filters?)
A 1x1 convolution can also act like a fully connected layer applied at each position: it takes a 1x1xn_C slice of the input volume, multiplies it by the n_C weights of the 1x1xn_C filter, and passes the sum through ReLU to get the output value.
1x1 convolution applications:
- Dimension compression: use as many 1x1 filters as the target number of channels.
- Adding non-linearity: keep the number of 1x1 filters equal to the original number of channels, so the dimensions stay the same and only a non-linearity (ReLU) is added.
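A small sketch of channel compression with a 1x1 convolution (the input size follows the 28x28x192 example used later):

```python
import torch
import torch.nn as nn

# 32 filters of size 1x1x192: each output value is a ReLU of a weighted sum
# over the 192 channels at one pixel, like a tiny fully connected layer
# applied at every position. The channel count drops from 192 to 32.
x = torch.randn(1, 192, 28, 28)
compress = nn.Conv2d(192, 32, kernel_size=1)
y = torch.relu(compress(x))
print(y.shape)    # torch.Size([1, 32, 28, 28])
```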
6.Inception Network Motivation
In the CNNs studied so far, the number of filters, their size and the pooling parameters all have to be chosen by hand; Inception lets the network learn which combination of filters and pooling works best.
As shown: several filter sizes and pooling are applied in parallel (all with same padding) and their outputs are stacked.
Inception has one problem: the computational cost is high. For example, a 5x5 same convolution with 32 filters on a 28x28x192 input costs about 28x28x32 x 5x5x192 ≈ 120 million multiplications. To reduce this cost, a 1x1 convolution is introduced.
With the 1x1 convolution inserted first (16 filters), the cost becomes about 12.4 million multiplications, roughly one tenth of the original.
Looking at the process in detail, the first step is a 1x1 convolution that produces a smaller intermediate volume, called the bottleneck layer.
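The arithmetic behind those two numbers, as a quick sketch (counting one multiplication per filter weight per output value):

```python
# 28x28x192 input -> 28x28x32 output with a 5x5 same convolution.
direct = 28 * 28 * 32 * (5 * 5 * 192)            # ~120 million multiplications

# Bottleneck: 1x1 conv to 16 channels first, then the 5x5 conv to 32 channels.
bottleneck = 28 * 28 * 16 * 192 + 28 * 28 * 32 * (5 * 5 * 16)   # ~12.4 million

print(f"{direct:,} vs {bottleneck:,}")            # 120,422,400 vs 12,443,648
```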
7.Inception Network
The Inception module is as follows:
An Inception network is obtained by combining multiple Inception modules.
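A simplified Inception-module sketch (the branch channel counts are illustrative assumptions, not the values from the GoogLeNet paper); every branch keeps the 28x28 spatial size so the outputs can be concatenated along the channel axis:

```python
import torch
import torch.nn as nn

# Parallel 1x1, 3x3 and 5x5 convolutions plus a pooling branch, all "same"
# sized, concatenated channel-wise. The 1x1 convolutions inside the 3x3 and
# 5x5 branches are the bottleneck layers that cut the computational cost.
class InceptionBlock(nn.Module):
    def __init__(self, in_ch):
        super().__init__()
        self.branch1 = nn.Conv2d(in_ch, 64, 1)
        self.branch3 = nn.Sequential(nn.Conv2d(in_ch, 96, 1),
                                     nn.Conv2d(96, 128, 3, padding=1))
        self.branch5 = nn.Sequential(nn.Conv2d(in_ch, 16, 1),
                                     nn.Conv2d(16, 32, 5, padding=2))
        self.branch_pool = nn.Sequential(nn.MaxPool2d(3, stride=1, padding=1),
                                         nn.Conv2d(in_ch, 32, 1))

    def forward(self, x):
        return torch.cat([self.branch1(x), self.branch3(x),
                          self.branch5(x), self.branch_pool(x)], dim=1)

out = InceptionBlock(192)(torch.randn(1, 192, 28, 28))
print(out.shape)   # 1 x (64+128+32+32) x 28 x 28
```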
It is worth mentioning that several softmax classifiers are attached to intermediate layers to prevent overfitting: these side branches produce the same kind of output as the final layer, which makes fuller use of the network's intermediate representations, and the outputs from different depths can be compared to find the structure that gives the best result.
8.Using Open-source Implementation
When training a model we need, we can first look for existing open-source implementations on GitHub.
9.Transfer Learning
Before going into the details, a few additional notes:
We can freeze the part of a pretrained network that we want to reuse.
One approach is to save the frozen part's weights locally, run all inputs through the frozen layers once, and then use those final-layer activations as the training set for the new layers of our own architecture.
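A sketch of the simplest version, freezing a pretrained backbone and retraining only the last layer (the choice of resnet18 and the 5-class output are assumptions; caching the frozen layers' outputs to disk, as described above, is an optimization on top of this):

```python
import torch
import torch.nn as nn
from torchvision import models

model = models.resnet18(pretrained=True)        # pretrained backbone

for p in model.parameters():                    # freeze all pretrained weights
    p.requires_grad = False

model.fc = nn.Linear(model.fc.in_features, 5)   # new trainable output layer

# Only the new layer's parameters are given to the optimizer.
optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
```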
10.Data augmentation
A common problem for deep learning in computer vision is that we often cannot collect enough samples, so we process the existing samples to generate more.
Most commonly used: mirroring the collected samples, and random cropping.
Rarely used: rotation, local warping, shearing.
Another common technique is color shifting: the R, G, B channel values are randomly increased or decreased, changing the image's tint.
Besides changing the RGB channel values at random, a more targeted option is PCA color augmentation, i.e. principal component analysis applied to the image's colors.
- PCA color augmentation: the colors that dominate the image are changed more while the minor colors change less, keeping the overall tint consistent.
The specific PCA color augmentation procedure can be found in the AlexNet paper.
During training we can run data augmentation and training in two separate threads.
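A sketch of how this usually looks in practice (the dataset path is a placeholder); the DataLoader worker processes apply the random distortions in parallel with training:

```python
from torchvision import datasets, transforms
from torch.utils.data import DataLoader

train_tf = transforms.Compose([
    transforms.RandomResizedCrop(224),                       # random cropping
    transforms.RandomHorizontalFlip(),                       # mirroring
    transforms.ColorJitter(brightness=0.2, contrast=0.2,
                           saturation=0.2, hue=0.05),        # color shifting
    transforms.ToTensor(),
])

train_set = datasets.ImageFolder("path/to/train", transform=train_tf)
train_loader = DataLoader(train_set, batch_size=64, shuffle=True,
                          num_workers=4)   # augmentation runs in worker processes
```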
11.The State of Computer Vision
The amount of data needed to build different models varies; in general, object detection, image recognition and speech recognition require more and more data, in that order.
With less data we usually need more hand-engineering; with more data we can use simpler algorithms and less hand-engineering.
Some tips that can help improve model performance (useful in competitions):
Ensembling: train several networks independently and average their outputs to make predictions; this takes more memory.
Multi-crop at test time: feed cropped and mirrored versions of the image to the classifier and average the results as the prediction.
However, because both methods are computationally expensive, they are generally not used in real project development.
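For completeness, a sketch of the ensembling idea (the models passed in are assumed to be already-trained classifiers with identical output shapes):

```python
import torch

def ensemble_predict(models, x):
    """Average the softmax outputs of several independently trained networks."""
    with torch.no_grad():
        probs = [torch.softmax(m(x), dim=1) for m in models]
    return torch.stack(probs).mean(dim=0)   # averaged class probabilities
```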
We can also use open source code to help our projects: