Residual Networks <2015 ICCV, ImageNet image classification top1>


This article describes the Residual Networks from the champion team, Kaiming He's group at MSRA, in the ImageNet classification task. In fact, MSRA was this year's big winner on ImageNet: besides classification, MSRA also used residual networks to win the ImageNet detection and localization tasks, as well as detection and segmentation on the COCO dataset. This article gives a brief analysis of Residual Networks.

Directory
————————————
1. Motivation
2. Network structure
3. Experimental results
4. Important Reference

1. Motivation

The authors begin with a question: is a deeper neural network always better?
According to common experience, as long as the network does not blow up during training (that is, the vanishing/exploding gradient problem first raised in the LSTM work) and does not overfit, deeper should indeed be better.

But that is not the case: as the network deepens, accuracy actually declines. This situation is called degradation. As shown below (see [1]):




[Figure: training/testing error on CIFAR-10. As depth increases from 20 to 56 layers, error rises.]


In principle, if we take a shallow net that already fits and stack a few more layers on top, the result should be no worse than the shallow net, since the extra layers could simply learn the identity mapping. That degradation occurs anyway shows that not all networks are equally easy to optimize. The motivation of this paper is to solve the degradation problem with a "deep residual network".



2. Network structure

Shortcut Connections

In fact, this idea is very similar to Highway Networks (a Jürgen Schmidhuber paper); even the problem it solves (degradation) is the same. Highway Networks borrow the concept of the gate from LSTM: besides the usual nonlinear mapping H(x, W_H), they add a path directly from x to y, with T(x, W_T) acting as a gate that balances the weight between the two, as in the following formula:


y = H(x, W_H) · T(x, W_T) + x · (1 − T(x, W_T))

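To make the gating concrete, here is a minimal sketch of a highway layer in PyTorch; this is my own illustration, not code from either paper, and the fully connected form of H and the name HighwayLayer are assumptions:

    import torch
    import torch.nn as nn

    class HighwayLayer(nn.Module):
        """y = H(x, W_H) * T(x, W_T) + x * (1 - T(x, W_T))"""
        def __init__(self, dim):
            super().__init__()
            self.h = nn.Linear(dim, dim)   # nonlinear mapping H(x, W_H)
            self.t = nn.Linear(dim, dim)   # transform gate T(x, W_T)

        def forward(self, x):
            h = torch.relu(self.h(x))      # candidate transformation
            t = torch.sigmoid(self.t(x))   # gate value in [0, 1]
            return h * t + x * (1 - t)     # gated mix of transform and carry

If the gate is removed so that both the transform path and the carry path keep a fixed weight of 1, this reduces to the residual form discussed below.
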
Shortcut originally means a short cut; here it refers to a skip connection across layers, such as the direct connection from x to y in the Highway Networks above. In fact, this kind of structure appeared as early as the inception module of GoogLeNet:

[Figure: the inception module of GoogLeNet.]

In Residual Networks, the authors replace the parameterized, gated skip connection of Highway Networks with a fixed, parameter-free connection, i.e. the two paths are simply summed with weight 1:

y = H(x, W_H) + x

Residual learning

So far we have not explained the meaning of "residual" in Residual Networks. What does it mean? The idea is this: if a complex nonlinear mapping H(x) can be approximated by a few stacked layers, then those same layers can just as well approximate its residual function F(x) = H(x) − x, and the "hypothesis" is that optimizing the residual mapping F(x) is easier than optimizing H(x) directly; the stacked layers then output F(x) + x.
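As a minimal sketch of this idea (assuming PyTorch; the two-convolution basic-block shape follows the paper, but the exact sizes here are illustrative):

    import torch.nn as nn

    class ResidualBlock(nn.Module):
        """Output H(x) = F(x) + x: the stacked layers learn the residual F."""
        def __init__(self, channels):
            super().__init__()
            self.f = nn.Sequential(          # F(x): two 3x3 convolutions with BN
                nn.Conv2d(channels, channels, 3, padding=1, bias=False),
                nn.BatchNorm2d(channels),
                nn.ReLU(inplace=True),
                nn.Conv2d(channels, channels, 3, padding=1, bias=False),
                nn.BatchNorm2d(channels),
            )
            self.relu = nn.ReLU(inplace=True)

        def forward(self, x):
            return self.relu(self.f(x) + x)  # identity shortcut: add x back

If the optimal mapping is close to the identity, the block only has to push F(x) toward zero, which is presumably easier than fitting an identity mapping from scratch.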

Readers are encouraged to have a look at the referenced papers, listed at the end of this article. The authors state the advantages over Highway Networks as follows:

Gate parameters — Highway Network: a learned gate with parameters W_T. Residual Network: no gate parameters, the shortcut is fixed, which makes comparisons with and without the residual path convenient. Comment: being parameter-free and data-independent, the fixed shortcut is certainly not optimal; the experiments in the paper do compare the two, and the parameterized version actually reaches a smaller error, but W_T has nothing to do with solving the degradation problem.
Can the shortcut close? — Highway Network: yes, when the carry weight 1 − T(x, W_T) reaches 0. Residual Network: it never closes. In practice T(x, W_T) ∈ [0, 1] but generally does not reach the extreme.



So this comparison is still somewhat forced. Anyway, telling a clean story is never easy.


34-Layer Residual network

The idea behind the construction: keep the complexity of each layer essentially unchanged, i.e., whenever a layer down-samples, double its number of filters. The network is too large to paste here; see the paper, which draws a 34-layer fully convolutional network without the several FC layers at the end. No wonder the 152-layer network costs less computation than the 16-19 layer VGG.
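A sketch of that construction rule (my own illustration, assuming PyTorch; residual shortcuts, BN and ReLU are omitted for brevity): each time the feature map is halved by a stride-2 convolution, the filter count is doubled, keeping the per-layer time complexity roughly constant:

    import torch.nn as nn

    def make_stage(in_ch, out_ch, n_convs, downsample):
        """One stage of 3x3 convolutions; the first conv may down-sample."""
        stride = 2 if downsample else 1
        layers = [nn.Conv2d(in_ch, out_ch, 3, stride=stride, padding=1, bias=False)]
        for _ in range(n_convs - 1):
            layers.append(nn.Conv2d(out_ch, out_ch, 3, padding=1, bias=False))
        return nn.Sequential(*layers)

    # The 34-layer net: 64 -> 128 -> 256 -> 512 filters, doubling at each
    # down-sampling stage (6, 8, 12 and 6 convs; with the first 7x7 conv
    # and the final FC layer, that makes 34 weighted layers).
    stages = nn.Sequential(make_stage(64, 64, 6, downsample=False),
                           make_stage(64, 128, 8, downsample=True),
                           make_stage(128, 256, 12, downsample=True),
                           make_stage(256, 512, 6, downsample=True))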

Here are the tricks from the implementation section of the paper (a configuration sketch of items 6-9 follows the list):

    1. Image rescaling: the shorter side is sampled as random.randint(256, 480)
    2. Cropping: random 224×224 crops, with horizontal flipping
    3. Per-pixel mean subtraction
    4. Standard color augmentation [2]
    5. Batch normalization [3] after each convolution and before the activation;
      helps with the vanishing/exploding problem
    6. Mini-batch size: 256
    7. Learning rate: starts at 0.1, divided by 10 when the error plateaus
    8. Weight decay: 0.0001
    9. Momentum: 0.9
    10. No dropout [3]
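
A minimal sketch of items 6-9 as a PyTorch training setup; the placeholder model and the choice of ReduceLROnPlateau to realize "divide by 10 when the error flattens" are my assumptions:

    import torch
    import torch.nn as nn

    model = nn.Conv2d(3, 64, 7, stride=2, padding=3)  # placeholder for the real net

    # SGD with the listed hyperparameters: lr 0.1, momentum 0.9, weight decay 1e-4.
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1,
                                momentum=0.9, weight_decay=1e-4)

    # Divide the learning rate by 10 whenever the validation error plateaus;
    # call scheduler.step(val_error) once per epoch.
    scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer,
                                                           mode='min', factor=0.1)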

All in all, these are fairly standard methods.



3. Experimental results
  1. 34-layer vs. 18-layer networks: during training,
    the error of the 34-layer plain net (without residual functions) is higher than that of the 18-layer plain net;
    the error of the 34-layer residual net is lower than that of the 18-layer residual net, and lower than that of the 34-layer plain net by 3.5% (top-1);
    the 18-layer residual net converges faster than the 18-layer plain net.

  2. Settings of the residual function's shortcut (see the sketch after this list):
    A) when the dimensions of H(x) and x differ, pad x with zeros;
    B) when the dimensions of H(x) and x differ, use a learned projection W_T (a 1×1 convolution), identity otherwise;
    C) use a projection W_T on every shortcut.
    Error: A > B > C, i.e., C is slightly the best, but the differences are small.
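
A sketch of options A and B (assuming PyTorch; pad_shortcut and proj_shortcut are hypothetical names, and spatial down-sampling is omitted): when the channel count grows, A pads the identity with zero feature maps while B applies a learned 1×1 convolution:

    import torch.nn as nn
    import torch.nn.functional as F

    def pad_shortcut(x, out_channels):
        """Option A: parameter-free identity shortcut, zero-padding extra channels."""
        extra = out_channels - x.size(1)
        # F.pad lists pads from the last dim backward: (W, W, H, H, C, C).
        return F.pad(x, (0, 0, 0, 0, 0, extra))

    def proj_shortcut(in_channels, out_channels):
        """Options B and C: projection shortcut, a learned 1x1 convolution W_T."""
        return nn.Conv2d(in_channels, out_channels, kernel_size=1, bias=False)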


4. Important Reference

[1]. Highway Networks (Srivastava, Greff, Schmidhuber)
[2]. ImageNet Classification with Deep Convolutional Neural Networks (AlexNet)
[3]. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift
[4]. Very Deep Convolutional Networks for Large-Scale Image Recognition (VGG)

From: http://blog.csdn.net/abcjennifer/article/details/50514124
