Paper study: Deep Residual Learning for Image Recognition

Directory

    • I. Overview
    • II. Degradation
    • III. Solution & Deep Residual Learning
    • IV. Implementation & Shortcut Connections

Home page:
https://github.com/KaimingHe/deep-residual-networks

TensorFlow implementation:
https://github.com/tensorpack/tensorpack/tree/master/examples/ResNet

In fact, TensorFlow has built-in ResNet:
https://github.com/tensorflow/tensorflow/blob/master/tensorflow/contrib/slim/python/slim/nets/resnet_v1.py

The paper won the CVPR 2016 Best Paper Award and, as of 2018, had been cited over 12,900 times.

Problem solved: make deep networks easier to train.

To ease the training of networks that are substantially deeper than those used previously.

I. Overview

First, stacking more layers does enrich the features a network can extract.

Deep networks naturally integrate low/mid/high-level features and classifiers in an end-to-end multi-layer fashion, and the "levels" of features can be enriched by the number of stacked layers (depth).

But the main difficulty with very deep networks is that gradients vanish or explode:

An obstacle to answering this question is the notorious problem of vanishing/exploding gradients [1, 8], which hamper convergence from the beginning.

Earlier work tackled this mainly with normalization layers and careful initialization:

This problem, however, has been largely addressed by normalized initialization and intermediate normalization layers, which enable networks with tens of layers to start converging for stochastic gradient descent (SGD) with backpropagation [22].

For the specifics of why normalization layers speed up training, refer to the related blog posts and papers.
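As a concrete illustration (a minimal sketch, not code from the paper; the function name, depth, filter counts, and CIFAR-like input shape are all assumptions), this is roughly what a plain "tens of layers" stack with normalized (He) initialization and intermediate normalization layers looks like in tf.keras:

```python
import tensorflow as tf

def plain_stack(depth=20, filters=64, num_classes=10):
    """A plain (non-residual) conv stack; each conv is followed by a
    normalization layer so SGD can start converging despite the depth."""
    inputs = tf.keras.Input(shape=(32, 32, 3))  # assumed CIFAR-like input
    x = inputs
    for _ in range(depth):
        x = tf.keras.layers.Conv2D(
            filters, 3, padding="same", use_bias=False,
            kernel_initializer="he_normal")(x)       # normalized initialization
        x = tf.keras.layers.BatchNormalization()(x)  # intermediate normalization layer
        x = tf.keras.layers.ReLU()(x)
    x = tf.keras.layers.GlobalAveragePooling2D()(x)
    outputs = tf.keras.layers.Dense(num_classes)(x)
    return tf.keras.Model(inputs, outputs)
```

With the normalization layers in place, a stack of this depth can be trained with plain SGD and backpropagation, which is exactly the point made in the quote above.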

When the network gets even deeper, a new problem arises. The paper calls it degradation:

As accuracy saturates, the training error of a deeper network becomes higher than that of a shallower network.
Experiments show that this degradation becomes more and more severe as the network deepens.

Is this caused by overfitting?
If it were overfitting, the training error should not rise as the network deepens (the training error should stay very low), so degradation is not an overfitting problem.

We continue to study the problem.

II. Degradation

We first train a shallower architecture, which can output the desired results.
Then we copy the shallower architecture and add one or more layers on top of it to get a deeper model.

We then train the deeper model.
Ideally, the added layers only need to implement the identity mapping, so the training error of the deeper model should be no higher than that of the shallower one.

However, experiments show that the deeper model either takes an unreasonably long time to train or ends up worse than expected.
This is an experimental demonstration of the deep-network degradation problem.
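As a toy illustration of the argument (a hypothetical sketch, not an experiment from the paper; the layer sizes are arbitrary), appending a layer whose weights are exactly the identity leaves the shallower model's outputs unchanged, so in principle the deeper model could always match the shallower one:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical "shallower architecture": a single linear layer + ReLU.
W_shallow = rng.normal(size=(16, 16))
def shallow(x):
    return np.maximum(W_shallow @ x, 0.0)

# "Deeper model": copy the shallow net and add one more layer on top.
# If the added layer's weights were exactly the identity, the deeper model
# would reproduce the shallow one, so its training error could not be worse;
# yet in practice solvers fail to find a comparably good solution.
W_added = np.eye(16)
def deeper(x):
    return np.maximum(W_added @ shallow(x), 0.0)

x = rng.normal(size=16)
print(np.allclose(shallow(x), deeper(x)))  # True: identical by construction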

III. Solution & Deep Residual Learning

To solve the degradation problem, the paper introduces deep residual learning. The fundamental idea is:

Instead of hoping each few stacked layers directly fit a desired underlying mapping, we explicitly let these layers fit a residual mapping.

For example, assuming the desired underlying mapping is \(\mathscr{H}(\mathrm{x})\), the mapping we actually let the stacked nonlinear layers learn is:
\[\mathscr{F}(\mathrm{x}) := \mathscr{H}(\mathrm{x}) - \mathrm{x}\]

Go back to the example in the previous section.
We want the added layers to learn the identity mapping, which is still very hard to train because they are stacks of nonlinear layers.
However, if we instead learn the residual mapping, the target becomes an all-zero residual, which is obviously much easier.
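To make the step explicit: if the mapping the added layers should realize is the identity, the residual they have to learn is exactly zero,

\[\mathscr{H}(\mathrm{x}) = \mathrm{x} \quad\Longrightarrow\quad \mathscr{F}(\mathrm{x}) = \mathscr{H}(\mathrm{x}) - \mathrm{x} = 0,\]

and pushing the weights of a few stacked layers toward zero is a far easier optimization target than reproducing the identity through them.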

The idea is similar in spirit to SVM, yet you just wouldn't think of it!!!

IV. Implementation & Shortcut Connections

Now that we have the idea, how do we actually implement it?
One has to say it: Kaiming He (He Dashen) is just amazing!!!!

Back to the example above. Assume:

    1. The target mapping of the added layers is \(\mathscr{H}\);
    2. The output of the original shallower architecture, \(\mathrm{x}\), is the input to \(\mathscr{H}\).

To force the stacked nonlinear layers to learn the residual, we treat their output as the residual \(\mathscr{F}(\mathrm{x})\).
We then add the original input \(\mathrm{x}\) to that output through a shortcut connection, so the block as a whole produces \(\mathscr{F}(\mathrm{x}) + \mathrm{x} = \mathscr{H}(\mathrm{x})\) before the loss is computed.
The resulting building block is the residual block with a shortcut connection shown in Figure 2 of the paper.

Such a shortcut connection is easy to add (it introduces no extra parameters), and the backpropagation algorithm can still be applied to the whole network as usual.
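Below is a minimal sketch of such a residual building block in tf.keras, written in the spirit of the paper's basic two-layer block; it is an assumed re-implementation, not the authors' code, and the helper name `residual_block` and the projection-shortcut choice for mismatched shapes are my own assumptions.

```python
import tensorflow as tf

def residual_block(x, filters, stride=1):
    """Two 3x3 conv layers learning F(x), plus a shortcut carrying x;
    the block outputs F(x) + x followed by a ReLU."""
    shortcut = x
    y = tf.keras.layers.Conv2D(filters, 3, strides=stride, padding="same",
                               use_bias=False)(x)
    y = tf.keras.layers.BatchNormalization()(y)
    y = tf.keras.layers.ReLU()(y)
    y = tf.keras.layers.Conv2D(filters, 3, strides=1, padding="same",
                               use_bias=False)(y)
    y = tf.keras.layers.BatchNormalization()(y)

    # When the shapes of F(x) and x differ, use a 1x1 projection shortcut;
    # otherwise the shortcut is a parameter-free identity.
    if stride != 1 or shortcut.shape[-1] != filters:
        shortcut = tf.keras.layers.Conv2D(filters, 1, strides=stride,
                                          use_bias=False)(shortcut)
        shortcut = tf.keras.layers.BatchNormalization()(shortcut)

    y = tf.keras.layers.Add()([y, shortcut])  # F(x) + x
    return tf.keras.layers.ReLU()(y)
```

Stacking such blocks gives a ResNet-style network; because the identity shortcut adds no parameters, gradients also flow straight through it during backpropagation.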

Of course, there is no detailed theoretical analysis of why learning an all-zero residual is simpler; the claim rests mainly on extensive experimental evidence.

The training curves in the paper show that the degradation visible for the plain networks (left graph) is effectively eliminated for the residual networks (right graph).
