Identity Mappings in Deep Residual Networks (translated)

This translation is for reference only; please point out any passages that are mistranslated or missing.

Paper: Identity Mappings in Deep Residual Networks

Original post: http://blog.csdn.net/wspba/article/details/60750007

Abstract

As a framework for very deep networks, deep residual networks have shown excellent accuracy and good convergence behavior. In this paper, we analyze the propagation formulation behind the residual building blocks, which suggests that when both the skip connection and the after-addition activation are identity mappings, the forward and backward signals can be propagated directly from one block to any other block. A series of ablation experiments also verify the importance of these identity mappings. This motivates us to propose a new residual unit, which makes training easier and improves generalization. We report results of a 1001-layer ResNet on CIFAR-10 (4.62% error) and CIFAR-100, and of a 200-layer ResNet on ImageNet. Code is available at https://github.com/KaimingHe/resnet-1k-layers.

Introduction

Deep residual networks (ResNets) consist of many stacked "residual units". Each unit (Fig. 1(a)) can be expressed as:
\begin{gather}
y_l = h(x_l) + \mathcal{F}(x_l, \mathcal{W}_l), \nonumber\\
x_{l+1} = f(y_l), \nonumber
\end{gather}
where $x_l$ and $x_{l+1}$ are the input and output of the $l$-th unit, and $\mathcal{F}$ is a residual function. In \cite{he2016}, $h(x_l) = x_l$ is an identity mapping and $f$ is ReLU.
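
To make the formulation concrete, here is a minimal sketch of such a unit, assuming PyTorch; the module name, fixed channel count, and the conv-BN-ReLU-conv-BN form of $\mathcal{F}$ are illustrative assumptions in the spirit of the original ResNet design, not details taken from this text.

```python
import torch.nn as nn
import torch.nn.functional as F


class OriginalResidualUnit(nn.Module):
    """Original unit (Fig. 1(a)): y_l = h(x_l) + F(x_l, W_l), x_{l+1} = f(y_l)."""

    def __init__(self, channels: int):
        super().__init__()
        # The residual function F: conv-BN-ReLU-conv-BN (a common choice, assumed here).
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)

    def forward(self, x):
        residual = self.bn2(self.conv2(F.relu(self.bn1(self.conv1(x)))))
        y = x + residual   # h(x_l) = x_l: the identity skip connection
        return F.relu(y)   # f = ReLU, applied after the addition
```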

Fig. 1. (a) The original residual unit; (b) the residual unit proposed in this paper; (right) training curves of 1001-layer ResNets on CIFAR-10. Solid lines correspond to test error (y-axis on the right), dashed lines to training loss (y-axis on the left). The proposed unit makes training ResNet-1001 easier.
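
For contrast, here is a sketch of the unit in Fig. 1(b) under the same assumptions: batch normalization and ReLU are moved in front of each convolution ("pre-activation"), so nothing follows the addition and both $h$ and $f$ act as identity mappings.

```python
import torch.nn as nn
import torch.nn.functional as F


class PreActResidualUnit(nn.Module):
    """Proposed unit (Fig. 1(b)): x_{l+1} = x_l + F(x_l, W_l), with f = identity."""

    def __init__(self, channels: int):
        super().__init__()
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)

    def forward(self, x):
        # BN-ReLU precede each convolution instead of following it.
        residual = self.conv1(F.relu(self.bn1(x)))
        residual = self.conv2(F.relu(self.bn2(residual)))
        return x + residual  # no activation after the addition
```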

ResNets with more than 100 layers have shown high accuracy on several challenging recognition tasks, such as the ImageNet \cite{russakovsky2015} and MS COCO \cite{lin2014} competitions. The core idea of ResNets is to learn an additive residual function $\mathcal{F}$ with respect to $h(x_l)$, and the key choice is to use the identity mapping $h(x_l) = x_l$. This is realized by attaching an identity skip connection ("shortcut").

In this paper, we analyze deep residual networks by focusing on creating a "direct" path for propagating information, not only within a single residual unit but through the entire network. Our derivation reveals that if both $h(x_l)$ and $f(y_l)$ are identity mappings, the signal can be propagated directly from one unit to any other unit, in both the forward and backward passes. Our experiments show that training becomes simpler as the architecture gets closer to these two conditions.
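
For concreteness, this derivation can be sketched as follows (the symbol $\mathcal{E}$ for the loss is notation assumed here; everything else follows the definitions above): when $h$ and $f$ are both identity mappings, the recursion unrolls into a sum, and the chain rule yields a gradient containing a term that flows back without passing through any weight layers:

\begin{gather}
x_{l+1} = x_l + \mathcal{F}(x_l, \mathcal{W}_l), \nonumber\\
x_L = x_l + \sum_{i=l}^{L-1} \mathcal{F}(x_i, \mathcal{W}_i), \nonumber\\
\frac{\partial \mathcal{E}}{\partial x_l} = \frac{\partial \mathcal{E}}{\partial x_L} \left( 1 + \frac{\partial}{\partial x_l} \sum_{i=l}^{L-1} \mathcal{F}(x_i, \mathcal{W}_i) \right). \nonumber
\end{gather}

The additive term $1$ means the gradient $\frac{\partial \mathcal{E}}{\partial x_L}$ reaches any shallower unit directly, so it is unlikely to vanish even in very deep networks.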

In order to understand the role of skip connections, we analyze and compare various forms of $h(x_l)$.
