The residual network architecture is used in a recent paper, so let's look at how it works. Residual networks can be trained to extreme depths; I won't go into exactly how deep or how much this improves results here.
Background
We all know that a deeper network can produce better results, but training a very deep network has always been difficult, mainly because of vanishing/exploding gradients and poorly scaled initialization. Many researchers have proposed solutions to these problems, but none solved them very well. Kaiming He observed the following:
As network depth increases, performance degrades: not only does the test error grow, but the training error grows as well, which shows this is not caused by overfitting. This phenomenon is counterintuitive. Suppose there is a well-trained network A, and we build a deeper network B whose first part is exactly the same as A, with the remaining layers implementing only the identity mapping. Then B should, in the worst case, match A's performance and never be worse than A. This is the idea behind the deep residual network: since the extra part of B is a pure identity mapping, we can bake this prior into the network at construction time by adding shortcut connections, so that each layer's output is not the traditional mapping of its input, but that mapping plus the input itself: H(x) = F(x) + x.
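The shortcut connection can be sketched in plain NumPy. The two-layer residual branch F and the ReLU activation below are illustrative assumptions, not the paper's exact architecture; the point is that when the residual branch outputs zero, the block reduces to the identity mapping, so a deeper network can always fall back to matching a shallower one.

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def residual_block(x, w1, w2):
    """Residual block: output H(x) = F(x) + x,
    where F(x) = w2 @ relu(w1 @ x) is the learned residual branch
    and the '+ x' term is the shortcut connection."""
    f = w2 @ relu(w1 @ x)  # residual mapping F(x)
    return f + x           # shortcut adds the input back

x = np.array([1.0, 2.0, 3.0])

# If the residual branch is zeroed out, the block is exactly the
# identity mapping -- the "worst case" discussed above.
w_zero = np.zeros((3, 3))
print(np.allclose(residual_block(x, w_zero, w_zero), x))  # True
```

In other words, the block only needs to learn the residual F(x) = H(x) - x, which is easier to drive toward zero than learning a full identity mapping from scratch.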
This article draws on http://caffecn.cn/?/article/4
Deep Residual Network