AlexNet:
(ILSVRC Top 5 test error rate of 15.4%)
The first network architecture to successfully demonstrate the potential of convolutional neural networks.
Key points:
- Trained on a large amount of data for a long time; the resulting model performed remarkably well (it won the 2012 ILSVRC classification task)
- Used two GPUs, with the convolutions split into two groups
Since AlexNet, convolutional neural networks have developed rapidly.

VGGNet: (ILSVRC top-5 error rate of 7.3%)
After AlexNet, neural networks gradually became popular and many variations were tried; the most classic is VGGNet.
Key points:
- Uses 3x3 filters instead of 5x5 and 7x7; in practice, two stacked 3x3 convolutions perform better than one 5x5 and use fewer parameters (see the quick check below)
- As the network deepens, the number of channels gradually increases: each stage has twice as many convolution kernels as the previous one, while the spatial dimensions are reduced, so the depth grows
- The network performs well on both classification and localization tasks
- Data augmentation was used during training
This network shows that deep convolutional neural networks can achieve very good performance.
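To make the parameter comparison concrete, here is a quick check in PyTorch, assuming 64 input and output channels (a value picked only for illustration); two stacked 3x3 convolutions cover the same 5x5 receptive field with fewer weights:

```python
import torch.nn as nn

c = 64  # assumed channel count, for illustration only

one_5x5 = nn.Conv2d(c, c, kernel_size=5, padding=2)
two_3x3 = nn.Sequential(
    nn.Conv2d(c, c, kernel_size=3, padding=1),
    nn.Conv2d(c, c, kernel_size=3, padding=1),
)

count = lambda m: sum(p.numel() for p in m.parameters())
print(count(one_5x5))  # 25*c*c + c   = 102464
print(count(two_3x3))  # 2*(9*c*c + c) = 73856
```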
GoogLeNet: (ILSVRC top-5 error rate of 6.7%)
Besides extending a neural network vertically (making it deeper), it can also be extended horizontally (making it wider). GoogLeNet was the first network structure to scale out by stacking convolutions of different sizes in parallel.
In a traditional convolutional network, each layer extracts information from the previous layer in order to transform the input into a more useful representation. However, each type of layer extracts a different kind of information: the output of a 5x5 convolution kernel tells us something different from the output of a 3x3 kernel, which in turn differs from the output of a max-pooling layer, and so on. At any given level, how do we know which transform provides the most useful information?
Why not let the model choose?
So the Inception module runs 1x1, 3x3, 5x5 convolutions and max-pooling in parallel and concatenates the results, letting the network choose.
Inception module:
Entire Network:
The point of this module is that you may not know in advance whether a small receptive field or a large receptive field works better, so you can put both together and let the network decide.
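A minimal sketch of such a module in PyTorch; the branch channel counts are assumptions for illustration, not the values used in GoogLeNet:

```python
import torch
import torch.nn as nn

class InceptionBlock(nn.Module):
    def __init__(self, in_ch):
        super().__init__()
        self.branch1 = nn.Conv2d(in_ch, 16, kernel_size=1)            # 1x1
        self.branch3 = nn.Sequential(nn.Conv2d(in_ch, 16, 1),         # 1x1 bottleneck
                                     nn.Conv2d(16, 32, 3, padding=1)) # 3x3
        self.branch5 = nn.Sequential(nn.Conv2d(in_ch, 16, 1),
                                     nn.Conv2d(16, 32, 5, padding=2)) # 5x5
        self.branch_pool = nn.Sequential(nn.MaxPool2d(3, stride=1, padding=1),
                                         nn.Conv2d(in_ch, 16, 1))     # pool + 1x1

    def forward(self, x):
        # Every branch keeps the spatial size, so outputs can be
        # concatenated along the channel axis.
        return torch.cat([self.branch1(x), self.branch3(x),
                          self.branch5(x), self.branch_pool(x)], dim=1)

x = torch.randn(1, 64, 28, 28)
print(InceptionBlock(64)(x).shape)  # torch.Size([1, 96, 28, 28])
```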
Key points:
- Horizontal (width) expansion
- Uses 1x1 filters, which make it convenient to change the number of channels of the convolution output (the 1x1 filter is also called a bottleneck)

Xception:
Xception is a new network based on Inception.
The authors put forward the hypothesis that cross-channel correlations and spatial correlations are completely decoupled, so it is best not to map them jointly.
So the "deep separable convolution" operation is used in the Xception, first each channel does a spatial convolution, and then the 1x1 convolution together.
Its performance is better than Inception V3, while the number of parameters does not increase.
ResNet: (ILSVRC top-5 error rate of 3.6%)
Simply stacking convolution layers to increase network depth does not improve the model; it can even make the model worse and harder to train. (AlexNet had only 5 convolution layers.)
The vanishing gradient problem becomes very serious: as the gradient propagates back to earlier layers, repeated multiplication can make it vanishingly small. As a result, as the network gets deeper, its performance saturates and then begins to degrade rapidly.
To solve this problem, Microsoft (the ResNet authors) constructed shortcut connections to carry the gradient.
ResNet is not the first network to use shortcut connections; Highway Networks had a similar idea, but ResNet's final results are better.

Residual block:
The point of this block is that the residual network is easier to train than the original plain network. The reason is that during backpropagation, the addition at the shortcut lets gradients flow backward more easily.
When the network becomes very deep, a plain network is hard to train because the gradient vanishes.
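A minimal sketch of a basic residual block in PyTorch (identity shortcut only, so input and output channel counts match; the layer sizes are illustrative):

```python
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, ch):
        super().__init__()
        self.conv1 = nn.Conv2d(ch, ch, 3, padding=1)
        self.bn1   = nn.BatchNorm2d(ch)
        self.conv2 = nn.Conv2d(ch, ch, 3, padding=1)
        self.bn2   = nn.BatchNorm2d(ch)
        self.relu  = nn.ReLU(inplace=True)

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        # The shortcut: adding x lets the gradient flow straight
        # through the block during backpropagation.
        return self.relu(out + x)
```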
Key points:
- Earlier networks were not deep enough; ResNet has 152 layers
- Residual blocks make it possible to train much deeper networks

ResNeXt:
ResNeXt is a revision of ResNet and is more effective.
The three block forms shown above are equivalent.
ResNeXt is now state of the art in object recognition. It combines the ideas of Inception and ResNet: the block is very wide, with 32 parallel paths, and the paper shows that increasing this width improves the model. Compared with Inception, where the paths differ from one another (1x1, 3x3, 5x5 convolutions), all paths in ResNeXt are identical; the authors call the number of independent paths the cardinality (32 in the figure above).
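A minimal sketch of a ResNeXt block in its grouped-convolution form, following the 256-channel block described in the paper with cardinality 32 (batch norm omitted to keep the sketch short):

```python
import torch.nn as nn

class ResNeXtBlock(nn.Module):
    def __init__(self, ch=256, width=128, cardinality=32):
        super().__init__()
        self.reduce = nn.Conv2d(ch, width, 1)               # 1x1 down to 128
        self.group  = nn.Conv2d(width, width, 3, padding=1,
                                groups=cardinality)         # 32 parallel paths
        self.expand = nn.Conv2d(width, ch, 1)               # 1x1 back to 256
        self.relu   = nn.ReLU(inplace=True)

    def forward(self, x):
        out = self.relu(self.reduce(x))
        out = self.relu(self.group(out))
        out = self.expand(out)
        return self.relu(out + x)  # residual shortcut, as in ResNet
```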
DenseNet: Densely Connected CNN
Before DenseNet, other networks had tried adding more cross-layer connections, but DenseNet takes the most brute-force approach and directly connects every layer within a block to every other.
Dense block:
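A minimal sketch of a dense block in PyTorch; the growth rate and number of layers are assumptions for illustration:

```python
import torch
import torch.nn as nn

class DenseBlock(nn.Module):
    def __init__(self, in_ch, growth_rate=32, num_layers=4):
        super().__init__()
        self.layers = nn.ModuleList()
        for i in range(num_layers):
            self.layers.append(nn.Sequential(
                nn.BatchNorm2d(in_ch + i * growth_rate),
                nn.ReLU(inplace=True),
                nn.Conv2d(in_ch + i * growth_rate, growth_rate, 3, padding=1),
            ))

    def forward(self, x):
        features = [x]
        for layer in self.layers:
            # Each layer sees every earlier output, concatenated along channels.
            out = layer(torch.cat(features, dim=1))
            features.append(out)
        return torch.cat(features, dim=1)
```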
Comparison between DenseNet and ResNet:
As can be seen from the figure, DenseNet performs slightly better than ResNet and uses far fewer parameters; however, DenseNet consumes more memory during training, although some methods that improve on this memory problem already exist.
Key points:
- All layers within a block are directly connected to each other
- Fewer parameters than ResNet, but training requires more memory

Reference:
https://medium.com/towards-data-science/an-overview-of-resnet-and-its-variants-5281e2f56035
https://adeshpande3.github.io/adeshpande3.github.io/The-9-Deep-Learning-Papers-You-Need-To-Know-About.html
https://medium.com/towards-data-science/an-intuitive-guide-to-deep-network-architectures-65fdc477db41