Deep Residual Learning


Recently I worked on a classification task: the input is a 3-channel image, and these images have to be classified by model, with 30 categories in total.

We started by trying the lab's VGGNet model on this classification task. According to earlier experimental results, training from scratch reaches at most 92% accuracy, while fine-tuning from an ImageNet-pretrained model reaches 97%. Since I did not run the experiment for very long, my own run of a bit over 10 hours got to 92%.

Later I tried the approach from Deep Residual Learning for Image Recognition (hereafter ResNet). A bit over 10 hours of training reached 94% accuracy. Because the ResNet-50 model is quite large and I did not test it for long, I do not know whether it could eventually match the best ImageNet-pretrained result.

The following is a brief introduction to the ResNet model.

The principle of ResNet is as follows:

Let's start with a plain stack of layers: input, intermediate layers, output. If the intermediate layers compute a function F(x), the output of the stack is F(x). In ResNet, suppose the function we actually want to fit is H(x); we add a shortcut connection that skips the intermediate layers and connects the input directly to the output (see the original paper).

That is, the stacked layers now only need to fit the residual F(x) := H(x) - x; equivalently, the function the whole block represents is H(x) = F(x) + x. In the simple case, the two terms on the right-hand side of the H(x) equation have the same dimensions and can be added element-wise. The general formula in the paper is y = F(x, {Wi}) + Ws·x, where Ws is a linear projection whose only purpose is to adjust the dimensions of x: when the input and output dimensions differ, Ws maps the shortcut to the same dimensionality as F(x).
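To make the addition concrete, here is a minimal NumPy sketch of a residual block (my own illustration, not code from the paper): a toy two-layer transformation stands in for F(x), and a random projection matrix plays the role of Ws when the input and output dimensions differ.

import numpy as np

def relu(v):
    return np.maximum(v, 0.0)

def residual_block(x, W1, W2, Ws=None):
    # F(x): a toy two-layer transformation standing in for the stacked layers
    f = W2 @ relu(W1 @ x)
    # Identity shortcut, or the projection Ws @ x when dimensions differ
    shortcut = x if Ws is None else Ws @ x
    # H(x) = F(x) + x, with a nonlinearity after the addition
    return relu(f + shortcut)

rng = np.random.default_rng(0)
x = rng.standard_normal(8)

# Same input/output dimension: plain identity shortcut, no extra parameters.
W1 = rng.standard_normal((8, 8))
W2 = rng.standard_normal((8, 8))
print(residual_block(x, W1, W2).shape)        # (8,)

# Output dimension 16 differs from input dimension 8: Ws does the matching.
W2b = rng.standard_normal((16, 8))
Ws = rng.standard_normal((16, 8))
print(residual_block(x, W1, W2b, Ws).shape)   # (16,)

Note that when the dimensions match, the shortcut adds no parameters at all; Ws exists only to make the element-wise addition possible.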

So what is the benefit of doing this? In earlier experiments, researchers found that, in theory, the more layers a neural network has, the more complex the functions it can fit and the lower its error rate should be; but in practice they observed that this is not the case.

Comparing a 20-layer network with a 56-layer network, the 56-layer network shows clearly higher error than the 20-layer one on both the training and the test set, which contradicts the theoretical expectation. This is called the degradation problem. The root of the problem is that not all functions are equally easy to optimize.

The residual method adds x directly to the output. In theory, if the identity mapping were the optimal function, the solver would only need to drive the parameters of the nonlinear layers to zero, and the block would then represent the identity, i.e. the optimum. Normally the identity is not exactly optimal; but if the optimal function is close to an identity mapping, the shortcut makes it much easier to optimize. Each block then only has to fit a small residual correction to pass on to the later layers of the deep network.

In my experiment I used the ResNet-50 model. The entire model can be viewed at: http://ethereon.github.io/netscope/#/gist/db945b393d40bfa26006

Here I pick out one section of it (the first residual stage, as shown in the Netscope visualization) to introduce:

This is the first residual stage: at the top is the input layer, followed by a max-pooling layer. The res2a_branch layers are convolution layers, and BN stands for batch normalization. Notice that the author also applies a convolution to the identity branch, a 1*1 convolution. In the paper, the author explains that when the input and output dimensions differ, there are two options. Option A: the extra dimensions are zero-padded, which adds no parameters. Option B: a 1*1 convolution is used to match the dimensions.

In this model, option B is used.

When the dimensions already match, the input is connected directly to the output, and this projection module on the left-hand branch is absent.
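As a concrete sketch of this structure, here is a minimal PyTorch version of a ResNet bottleneck block (my own illustration based on the paper's description; the branch1/branch2 naming mirrors the Caffe layer names in the model above, and the 64-to-256 channel sizes are those of the standard ResNet-50 res2a stage, not numbers taken from this post).

import torch
import torch.nn as nn

class Bottleneck(nn.Module):
    # 1x1 -> 3x3 -> 1x1 convolutions, each followed by BN, plus a shortcut.
    # When input/output dimensions differ, the shortcut is a 1x1 conv + BN
    # (option B); otherwise it is the parameter-free identity.
    def __init__(self, in_ch, mid_ch, out_ch, stride=1):
        super().__init__()
        self.branch2 = nn.Sequential(          # F(x): the residual branch
            nn.Conv2d(in_ch, mid_ch, 1, stride=stride, bias=False),
            nn.BatchNorm2d(mid_ch), nn.ReLU(inplace=True),
            nn.Conv2d(mid_ch, mid_ch, 3, padding=1, bias=False),
            nn.BatchNorm2d(mid_ch), nn.ReLU(inplace=True),
            nn.Conv2d(mid_ch, out_ch, 1, bias=False),
            nn.BatchNorm2d(out_ch),
        )
        if stride != 1 or in_ch != out_ch:     # option B: project the identity
            self.branch1 = nn.Sequential(
                nn.Conv2d(in_ch, out_ch, 1, stride=stride, bias=False),
                nn.BatchNorm2d(out_ch),
            )
        else:                                  # dimensions match: plain identity
            self.branch1 = nn.Identity()

    def forward(self, x):
        return torch.relu(self.branch2(x) + self.branch1(x))  # H(x) = F(x) + x

# First residual stage (res2a): 64 -> 256 channels, so the shortcut
# carries the 1x1 projection described above.
block = Bottleneck(in_ch=64, mid_ch=64, out_ch=256)
print(block(torch.randn(1, 64, 56, 56)).shape)  # torch.Size([1, 256, 56, 56])

Only the first block of each stage needs the projection; the blocks after it keep the identity shortcut, which is why option B adds so few parameters.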

After testing for the same amount of time (a bit over 10 hours), VGG reached at most 92% accuracy, while ResNet-50's top-1 error was down to 6%, i.e. 94% accuracy.

Full-text reference: Deep Residual Learning for Image Recognition, Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun.

Attached implementation code: https://github.com/KaimingHe/deep-residual-networks
