This translation is for reference only; if anything is mistranslated, please point it out.
Paper: Identity Mappings in Deep Residual Networks
Translation source: http://blog.csdn.net/wspba/article/details/60750007

Abstract
Deep residual networks have emerged as a family of extremely deep architectures that show compelling accuracy and good convergence behavior. In this paper, we analyze the propagation formulation behind the residual building blocks, which suggests that when both the skip connections and the after-addition activation use identity mappings, the forward and backward signals can be propagated directly from one block to any other block. A series of ablation experiments also validates the importance of these identity mappings. This motivates us to propose a new residual unit, which makes training easier and improves the generalization of the network. We report results for a 1001-layer ResNet on CIFAR-10 (4.62\% error) and CIFAR-100, and for a 200-layer ResNet on ImageNet. Code is available at https://github.com/KaimingHe/resnet-1k-layers.

Introduction
Deep residual networks (ResNets) consist of many stacked "residual units". Each unit (Fig. 1(a)) can be expressed as:
\begin{gather} {y}_{l} = h({x}_{l}) + \mathcal{F}({x}_{l}, \mathcal{W}_{l}), \nonumber\\ {x}_{l+1} = f({y}_{l}) \nonumber \end{gather}
where ${x}_{l}$ and ${x}_{l+1}$ are the input and output of the $l$-th unit, and $\mathcal{F}$ is a residual function. In He2016, $h({x}_{l}) = {x}_{l}$ is an identity mapping and $f$ is ReLU.
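The residual unit above can be sketched in a few lines of NumPy. This is a minimal illustration, not the paper's implementation: the feature dimension, the weight initialization, and the choice of a two-layer residual function $\mathcal{F}$ are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8                                        # feature dimension (assumption)
W1 = rng.normal(size=(d, d)) * 0.1           # weights of F (hypothetical)
W2 = rng.normal(size=(d, d)) * 0.1

def relu(z):
    return np.maximum(z, 0.0)

def residual_unit(x_l):
    # F(x_l, W_l): the residual function (here, two small linear layers + ReLU)
    F = W2 @ relu(W1 @ x_l)
    y_l = x_l + F            # h(x_l) = x_l: the identity skip connection
    return relu(y_l)         # f = ReLU, applied after the addition (original unit)

x = rng.normal(size=d)
x_next = residual_unit(x)    # x_{l+1} = f(h(x_l) + F(x_l, W_l))
print(x_next.shape)
```

Note that in this original formulation $f$ is applied *after* the addition, so the output of a unit is not simply the input plus a residual; this is exactly the point the paper examines.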
Fig. 1: (a) the original residual unit; (b) the residual unit proposed in this paper. Right: training curves of 1001-layer ResNets on CIFAR-10. Solid lines denote test error (y-axis on the right); dashed lines denote training loss (y-axis on the left). The proposed unit makes ResNet-1001 easier to train.
ResNets with more than 100 layers have shown high accuracy on several challenging recognition tasks in the ImageNet \cite{russakovsky2015} and MS COCO \cite{lin2014} competitions. The core idea of ResNets is to learn an additive residual function $\mathcal{F}$ with respect to $h({x}_{l})$, and the key to this idea is the identity mapping $h({x}_{l}) = {x}_{l}$. This is realized by an identity skip connection ("shortcut").
In this paper, we analyze deep residual networks by focusing on creating a "direct" path for propagating information, not only within a residual unit but through the entire network. Our derivations reveal that if both $h({x}_{l})$ and $f({y}_{l})$ are identity mappings, then the signal can be propagated directly from one unit to any other, in both the forward and backward passes. Our experiments show that training becomes easier as the architecture gets closer to these two conditions.
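The "direct path" claim can be checked numerically. When both mappings are identities, the recursion ${x}_{l+1} = {x}_{l} + \mathcal{F}({x}_{l})$ unrolls to ${x}_{L} = {x}_{l} + \sum \mathcal{F}$, so the input reaches any deeper unit additively. The sketch below (dimensions and the stand-in residual outputs are assumptions for illustration) verifies that the unrolled recursion matches the direct-path form:

```python
import numpy as np

rng = np.random.default_rng(1)
d, L = 8, 50                                        # toy sizes (assumption)
Fs = [rng.normal(size=d) * 0.01 for _ in range(L)]  # stand-in residual outputs

x0 = rng.normal(size=d)
x = x0.copy()
for F in Fs:          # x_{l+1} = x_l + F_l, with h and f both identity mappings
    x = x + F

# Direct-path form: x_L = x_0 + sum of all residual functions.
x_direct = x0 + sum(Fs)
print(np.allclose(x, x_direct))  # True
```

The same additive structure means the gradient of the loss with respect to ${x}_{l}$ always contains a term propagated directly from ${x}_{L}$, which is why vanishing along the shortcut path is avoided.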
In order to understand the role of skip connections, we analyze and compare h(x