Residual Networks <2015 ICCV, ImageNet image classification top1>


This article describes the Residual Networks from the champion team, Kaiming He's group at MSRA, in the ImageNet classification task. In fact, MSRA was this year's big winner on ImageNet: besides classification, MSRA also used residual networks to win the ImageNet detection and localization tasks, as well as detection and segmentation on the COCO dataset. This article gives a brief analysis of Residual Networks.

Directory
————————————
1. Motivation
2. Network structure
3. Experimental results
4. Important Reference

1. Motivation

The authors begin with a question: is a deeper neural network always better?
According to common experience, as long as the network does not blow up during training (that is, the vanishing/exploding gradient problem first raised in the LSTM work) and does not overfit, deeper should indeed be better.

But that is not the case: as the network deepens, accuracy actually declines. This situation is called degradation. As shown below (see [1]):




[Figure: training/testing error on CIFAR-10. As depth increases from 20 to 56 layers, error rises.]


In principle, if we take a shallow net that already fits and stack a few more layers on top, the result should be no worse than the shallow net, since the extra layers could simply learn the identity mapping. That degradation occurs anyway shows that not all networks are equally easy to optimize. The motivation of this paper is to solve the degradation problem with a "deep residual network".



2. Network structure

Shortcut Connections

In fact, this idea is very similar to Highway Networks (a Jürgen Schmidhuber paper); even the problem it solves (degradation) is the same. Highway Networks borrow the concept of the gate from LSTM: besides the usual nonlinear mapping H(x, W_H), they add a path directly from x to y, with T(x, W_T) acting as a gate that balances the weight between the two, as in the following formula:


y = H(x, W_H) · T(x, W_T) + x · (1 − T(x, W_T))

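To make the gating concrete, here is a minimal sketch of a highway layer in PyTorch; this is my own illustration, not code from either paper, and the fully connected form of H and the name HighwayLayer are assumptions:

    import torch
    import torch.nn as nn

    class HighwayLayer(nn.Module):
        """y = H(x, W_H) * T(x, W_T) + x * (1 - T(x, W_T))"""
        def __init__(self, dim):
            super().__init__()
            self.h = nn.Linear(dim, dim)   # nonlinear mapping H(x, W_H)
            self.t = nn.Linear(dim, dim)   # transform gate T(x, W_T)

        def forward(self, x):
            h = torch.relu(self.h(x))      # candidate transformation
            t = torch.sigmoid(self.t(x))   # gate value in [0, 1]
            return h * t + x * (1 - t)     # gated mix of transform and carry

If the gate is removed so that both the transform path and the carry path keep a fixed weight of 1, this reduces to the residual form discussed below.
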
Shortcut originally means a short cut; here it refers to a skip connection across layers, such as the direct connection from x to y in the Highway Networks above. In fact, this kind of structure appeared as early as the inception module of GoogLeNet:

[Figure: the inception module of GoogLeNet.]

In Residual Networks, the authors replace the parameterized, gated skip connection of Highway Networks with a fixed, parameter-free connection, i.e. the two paths are simply summed with weight 1:

y = H(x, W_H) + x

Residual learning

So far we have not explained the meaning of "residual" in Residual Networks. What does it mean? The idea is this: if a complex nonlinear mapping H(x) can be approximated by a few stacked layers, then those same layers can just as well approximate its residual function F(x) = H(x) − x, and the "hypothesis" is that optimizing the residual mapping F(x) is easier than optimizing H(x) directly; the stacked layers then output F(x) + x.
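As a minimal sketch of this idea (assuming PyTorch; the two-convolution basic-block shape follows the paper, but the exact sizes here are illustrative):

    import torch.nn as nn

    class ResidualBlock(nn.Module):
        """Output H(x) = F(x) + x: the stacked layers learn the residual F."""
        def __init__(self, channels):
            super().__init__()
            self.f = nn.Sequential(          # F(x): two 3x3 convolutions with BN
                nn.Conv2d(channels, channels, 3, padding=1, bias=False),
                nn.BatchNorm2d(channels),
                nn.ReLU(inplace=True),
                nn.Conv2d(channels, channels, 3, padding=1, bias=False),
                nn.BatchNorm2d(channels),
            )
            self.relu = nn.ReLU(inplace=True)

        def forward(self, x):
            return self.relu(self.f(x) + x)  # identity shortcut: add x back

If the optimal mapping is close to the identity, the block only has to push F(x) toward zero, which is presumably easier than fitting an identity mapping from scratch.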

Readers are encouraged to have a look at the referenced papers, listed at the end of this article. The authors state the advantages over Highway Networks as follows:

Gate parameters — Highway Network: a learned gate with parameters W_T. Residual Network: no gate parameters, the shortcut is fixed, which makes comparisons with and without the residual path convenient. Comment: being parameter-free and data-independent, the fixed shortcut is certainly not optimal; the experiments in the paper do compare the two, and the parameterized version actually reaches a smaller error, but W_T has nothing to do with solving the degradation problem.
Can the shortcut close? — Highway Network: yes, when the carry weight 1 − T(x, W_T) reaches 0. Residual Network: it never closes. In practice T(x, W_T) ∈ [0, 1] but generally does not reach the extreme.



So this comparison is still somewhat forced. Anyway, telling a clean story is never easy.


34-Layer Residual network

The idea behind the construction: keep the complexity of each layer essentially unchanged, i.e., whenever a layer down-samples, double its number of filters. The network is too large to paste here; see the paper, which draws a 34-layer fully convolutional network without the several FC layers at the end. No wonder the 152-layer network costs less computation than the 16-19 layer VGG.
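A sketch of that construction rule (my own illustration, assuming PyTorch; residual shortcuts, BN and ReLU are omitted for brevity): each time the feature map is halved by a stride-2 convolution, the filter count is doubled, keeping the per-layer time complexity roughly constant:

    import torch.nn as nn

    def make_stage(in_ch, out_ch, n_convs, downsample):
        """One stage of 3x3 convolutions; the first conv may down-sample."""
        stride = 2 if downsample else 1
        layers = [nn.Conv2d(in_ch, out_ch, 3, stride=stride, padding=1, bias=False)]
        for _ in range(n_convs - 1):
            layers.append(nn.Conv2d(out_ch, out_ch, 3, padding=1, bias=False))
        return nn.Sequential(*layers)

    # The 34-layer net: 64 -> 128 -> 256 -> 512 filters, doubling at each
    # down-sampling stage (6, 8, 12 and 6 convs; with the first 7x7 conv
    # and the final FC layer, that makes 34 weighted layers).
    stages = nn.Sequential(make_stage(64, 64, 6, downsample=False),
                           make_stage(64, 128, 8, downsample=True),
                           make_stage(128, 256, 12, downsample=True),
                           make_stage(256, 512, 6, downsample=True))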

Here are the tricks from the implementation section of the paper (a configuration sketch of items 6-9 follows the list):

    1. Image rescaling: the shorter side is sampled as random.randint(256, 480)
    2. Cropping: random 224×224 crops, with horizontal flipping
    3. Per-pixel mean subtraction
    4. Standard color augmentation [2]
    5. Batch normalization [3] after each convolution and before the activation;
      helps with the vanishing/exploding problem
    6. Mini-batch size: 256
    7. Learning rate: starts at 0.1, divided by 10 when the error plateaus
    8. Weight decay: 0.0001
    9. Momentum: 0.9
    10. No dropout [3]
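
A minimal sketch of items 6-9 as a PyTorch training setup; the placeholder model and the choice of ReduceLROnPlateau to realize "divide by 10 when the error flattens" are my assumptions:

    import torch
    import torch.nn as nn

    model = nn.Conv2d(3, 64, 7, stride=2, padding=3)  # placeholder for the real net

    # SGD with the listed hyperparameters: lr 0.1, momentum 0.9, weight decay 1e-4.
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1,
                                momentum=0.9, weight_decay=1e-4)

    # Divide the learning rate by 10 whenever the validation error plateaus;
    # call scheduler.step(val_error) once per epoch.
    scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer,
                                                           mode='min', factor=0.1)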

All in all, these are fairly standard methods.



3. Experimental results
  1. 34-layer vs. 18-layer networks: during training,
    the error of the 34-layer plain net (without residual functions) is higher than that of the 18-layer plain net;
    the error of the 34-layer residual net is lower than that of the 18-layer residual net, and lower than that of the 34-layer plain net by 3.5% (top-1);
    the 18-layer residual net converges faster than the 18-layer plain net.

  2. Settings of the residual function's shortcut (see the sketch after this list):
    A) when the dimensions of H(x) and x differ, pad x with zeros;
    B) when the dimensions of H(x) and x differ, use a learned projection W_T (a 1×1 convolution), identity otherwise;
    C) use a projection W_T on every shortcut.
    Error: A > B > C, i.e., C is slightly the best, but the differences are small.
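
A sketch of options A and B (assuming PyTorch; pad_shortcut and proj_shortcut are hypothetical names, and spatial down-sampling is omitted): when the channel count grows, A pads the identity with zero feature maps while B applies a learned 1×1 convolution:

    import torch.nn as nn
    import torch.nn.functional as F

    def pad_shortcut(x, out_channels):
        """Option A: parameter-free identity shortcut, zero-padding extra channels."""
        extra = out_channels - x.size(1)
        # F.pad lists pads from the last dim backward: (W, W, H, H, C, C).
        return F.pad(x, (0, 0, 0, 0, 0, extra))

    def proj_shortcut(in_channels, out_channels):
        """Options B and C: projection shortcut, a learned 1x1 convolution W_T."""
        return nn.Conv2d(in_channels, out_channels, kernel_size=1, bias=False)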


4. Important Reference

[1]. Highway Networks (Srivastava, Greff, Schmidhuber)
[2]. ImageNet Classification with Deep Convolutional Neural Networks (AlexNet)
[3]. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift
[4]. Very Deep Convolutional Networks for Large-Scale Image Recognition (VGG)

From: http://blog.csdn.net/abcjennifer/article/details/50514124
