Kaiming He's paper [1], the CVPR 2016 Best Paper, targets the difficulty of SGD optimization caused by gradient vanishing in deep networks. It proposes the residual structure, which alleviates the degradation problem and achieves very good results on networks of 50, 101, 152, and even 1202 layers.
The error rate of ResNet is significantly lower than that of other mainstream deep networks (Figure 1).
Figure 1. The ResNet model, champion network of ImageNet 2015
An obvious fact is that the deeper a network is, the stronger its representational power. However, as depth increases, gradient vanishing becomes more pronounced, SGD fails to converge, and the final accuracy drops (Figure 2).
Figure 2. A plain 56-layer network has higher training and test error than a 20-layer network
To solve this problem, the residual structure is proposed; it maintains a good training effect even for networks of more than 1000 layers (although signs of overfitting appear at such extreme depths).
Figure 3. The residual structure adds the input x directly to the output, which is equivalent to introducing an identity mapping
As shown in Figure 3, assuming the original network was going to learn a function H(x), the author decomposes it into H(x) = F(x) + x.
After the decomposition, the main network (the vertical, downward data flow in Figure 3) fits F(x), while the bypass branch (the curved shortcut connection in Figure 3) carries the identity mapping x.
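To make the decomposition concrete, here is a minimal PyTorch-style sketch of a residual block (my own illustration, not the authors' original Caffe release; the layer sizes are placeholder assumptions). The two stacked convolutions play the role of F(x), and the shortcut simply adds the input x back before the final activation.

```python
import torch
import torch.nn as nn

class BasicResidualBlock(nn.Module):
    """Minimal residual block: output = ReLU(F(x) + x).

    F(x) is two 3x3 conv layers with batch norm; the shortcut is the
    identity, so input and output must have the same shape.
    """
    def __init__(self, channels: int):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        residual = self.relu(self.bn1(self.conv1(x)))  # first half of F(x)
        residual = self.bn2(self.conv2(residual))      # second half of F(x)
        return self.relu(residual + x)                 # H(x) = F(x) + x


# Quick shape check: a 64-channel feature map passes through with its shape unchanged.
block = BasicResidualBlock(64)
y = block(torch.randn(1, 64, 56, 56))
print(y.shape)  # torch.Size([1, 64, 56, 56])
```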
Figure 4. Network structures of VGG-19, a plain 34-layer network, and the 34-layer network with residual connections added
As Figure 4 shows, compared with the plain network, ResNet only needs to add shortcut connections (a dotted shortcut means the number of channels is doubled there).
The authors' experiments show that a residual block needs at least two layers to be effective, and that the linear projection W_s in the following equation only serves to match the input and output dimensions; it does not improve the training results.
y = F(x, {W_i}) + W_s x
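The projection W_s in this equation corresponds to the dotted shortcuts in Figure 4 and is typically realized as a strided 1×1 convolution that only matches the channel count and spatial size of the two branches. A hedged PyTorch-style sketch (the specific layer sizes are illustrative assumptions, not the paper's exact configuration):

```python
import torch
import torch.nn as nn

class ProjectionResidualBlock(nn.Module):
    """Residual block whose shortcut is a linear projection W_s.

    Used only when F(x) changes the number of channels or the spatial
    resolution; W_s (a strided 1x1 conv) merely matches dimensions.
    """
    def __init__(self, in_channels: int, out_channels: int, stride: int = 2):
        super().__init__()
        self.f = nn.Sequential(  # F(x, {W_i})
            nn.Conv2d(in_channels, out_channels, 3, stride=stride, padding=1, bias=False),
            nn.BatchNorm2d(out_channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_channels, out_channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(out_channels),
        )
        self.w_s = nn.Sequential(  # W_s x: dimension matching only
            nn.Conv2d(in_channels, out_channels, 1, stride=stride, bias=False),
            nn.BatchNorm2d(out_channels),
        )
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.relu(self.f(x) + self.w_s(x))  # y = F(x, {W_i}) + W_s x


# A dotted shortcut in Figure 4: 64 -> 128 channels, spatial size halved.
block = ProjectionResidualBlock(64, 128)
print(block(torch.randn(1, 64, 56, 56)).shape)  # torch.Size([1, 128, 28, 28])
```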
Figure 5 shows the authors' experimental results with ResNet.
Figure 5. Training error of the plain network (left) versus ResNet (right)
Why ResNet Works
Two questions need to be answered: first, why decompose H(x) into H(x) = F(x) + x (so that the actual optimization target becomes F(x) = 0); second, why this decomposition solves the gradient vanishing problem.
For the first question, the author does not explain the underlying principle but shows experimentally that this choice works best. As for why the shortcut carries x rather than 0.5x or something else, the paper's answer is roughly that, in practice, the target function H(x) that machine learning has to fit is often very close to the identity mapping, though I still find this explanation hard to fully accept.
For the second question, there are three explanations:
1. Optimizing F(x) = 0 has a natural advantage, because network weights are usually initialized near zero.
An analogy: suppose the function to be fitted is (close to) a straight line, here H(x). Stacking the straight line carried by the shortcut (x) with some tiny polyline corrections (F(x)) is certainly easier to optimize than fitting the target from scratch with polylines or curves alone.
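In symbols, point 1 says the following (a small worked restatement, under the near-zero-initialization assumption above and the near-identity-target claim from the previous section):

```latex
% At initialization the weights are near zero, so each residual block starts
% out as (almost) the identity mapping -- which is already close to a
% near-identity target H(x); the optimizer only has to learn a small
% perturbation F(x) around zero.
\[
  W_i \approx 0 \;\Longrightarrow\; F(x,\{W_i\}) \approx 0
  \;\Longrightarrow\; H(x) = F(x) + x \approx x .
\]
```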
2. Consider, for example, mapping an input 5 to an output 5.1.
With a plain network, the layer must learn F'(5) = 5.1.
After introducing the residual, H(5) = 5.1 and H(5) = F(5) + 5, so F(5) = 0.1.
Now suppose the target output changes slightly, from 5.1 to 5.2. The plain mapping F' changes by only about 2% (5.1 → 5.2), whereas the residual mapping F changes by 100% (0.1 → 0.2). The residual thus amplifies small changes in the output, making the weights far more sensitive to adjustment during training.
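Written out, the relative changes of the two mappings when the target output moves from 5.1 to 5.2 are:

```latex
% Plain network: the mapping itself must move from 5.1 to 5.2.
% Residual network: only F must move from 0.1 to 0.2.
\[
  \frac{|5.2 - 5.1|}{5.1} \approx 2\% ,
  \qquad
  \frac{|0.2 - 0.1|}{0.1} = 100\% .
\]
```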
3. Another article understands residual networks from a different perspective: as a voting system.
Figure 6. A residual network can be unrolled into a combination of many paths of different lengths
As shown in Figure 6, a residual network is in fact a combination of many parallel sub-networks. So although ResNet appears very deep on the surface, most of the paths in this combination are actually of medium length.
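As a small worked example of this unrolling (my own illustration, in operator notation), three stacked residual blocks give rise to 2^3 = 8 parallel paths:

```latex
% Three stacked residual blocks:
%   y_1 = y_0 + F_1(y_0),  y_2 = y_1 + F_2(y_1),  y_3 = y_2 + F_3(y_2).
% Unrolling the recursion, every block can either be traversed (F_i) or
% skipped (identity), so the signal reaches y_3 along 2^3 = 8 distinct
% paths whose lengths range from 0 to 3 blocks.
\[
  y_3 = \bigl(\mathrm{Id} + F_3\bigr)\bigl(\mathrm{Id} + F_2\bigr)\bigl(\mathrm{Id} + F_1\bigr)\,y_0
\]
```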
Figure 7 multiplies the number of paths of each length by the gradient magnitude carried by paths of that length, thereby identifying the paths that actually contribute to ResNet's training.
Figure 7. The paths that actually contribute are mostly shorter than 20 layers
Therefore, "resnet only looks very deep on the surface, in fact the network is very shallow." "ResNet does not really solve the problem of the gradient of the depth network, its essence is a multiplayer voting system." Code Implementation
The authors have released the Caffe network models on GitHub, together with links to third-party implementations on other platforms.
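As one example of the kind of third-party port mentioned above (an illustration, not part of the original Caffe release), the torchvision library ships the ResNet family:

```python
import torch
from torchvision import models

# Build a 50-layer ResNet with the architecture from the paper; weights are
# randomly initialized here (pretrained ImageNet weights can also be
# downloaded through torchvision if desired).
resnet50 = models.resnet50()

# A single 224x224 RGB image -> 1000 ImageNet class scores.
dummy = torch.randn(1, 3, 224, 224)
print(resnet50(dummy).shape)  # torch.Size([1, 1000])
```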
"1" he K, Zhang X, Ren S, et al. Deep residual learning for image recognition[c]//proceedings of the IEEE Conference on Computer vision and Pattern recognition. 2016:770-778.