The backpropagation algorithm was first proposed in the 1970s, but its importance was not fully appreciated until 1986, when David Rumelhart, Geoffrey Hinton, and Ronald Williams jointly published the famous paper "Learning Representations by Back-propagating Errors".
The most basic primer:
Understand the basic computation of backpropagation through a simple worked example:
English version: https://mattmazur.com/2015/03/17/a-step-by-step-backpropagation-example/
Chinese version: http://www.cnblogs.com/charlotte77/p/5629865.html
Note: The Chinese version also includes a detailed code implementation in Python (a minimal sketch in the same spirit appears below).
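The sketch below is not taken from the linked articles; it is a minimal illustration of the same idea: one forward pass and one hand-derived backward pass for a tiny 2-2-1 sigmoid network with squared-error loss. The network size, initialization, and learning rate are arbitrary assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Tiny 2-2-1 network trained on a single example.
np.random.seed(0)
x = np.array([0.05, 0.10])          # input
t = np.array([0.01])                # target
W1 = np.random.randn(2, 2) * 0.5    # input -> hidden weights
W2 = np.random.randn(1, 2) * 0.5    # hidden -> output weights
lr = 0.5                            # learning rate (arbitrary choice)

for step in range(1000):
    # forward pass
    h = sigmoid(W1 @ x)             # hidden activations
    y = sigmoid(W2 @ h)             # output activation
    loss = 0.5 * np.sum((y - t) ** 2)

    # backward pass: chain rule applied layer by layer
    dy = (y - t) * y * (1 - y)      # dL/d(pre-activation) at the output
    dW2 = np.outer(dy, h)
    dh = (W2.T @ dy) * h * (1 - h)  # dL/d(pre-activation) at the hidden layer
    dW1 = np.outer(dh, x)

    # gradient-descent update
    W2 -= lr * dW2
    W1 -= lr * dW1

print("final loss:", loss)
```

Running it, the loss shrinks steadily toward zero, which is exactly the behavior the step-by-step example above walks through by hand.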
A detailed explanation of the backpropagation algorithm; very good material:
http://www.offconvex.org/2016/12/20/backprop/
https://github.com/rasbt/python-machine-learning-book/blob/master/faq/visual-backpropagation.md
Extended reading:
English version: https://medium.com/@karpathy/yes-you-should-understand-backprop-e2f06eab496b
Chinese version:
https://mp.weixin.qq.com/s?__biz=MzA3MzI4MjgzMw==&mid=2650721602&idx=2&sn=f18e2d3a23dec485350611651e571031
Note: There is a small error in the Chinese version. When z = 0.5, the sigmoid's local gradient z*(1-z) reaches its maximum value of 0.25 (it should say maximum, since z*(1-z) is a downward-opening quadratic function). In other words, each time the signal passes through a sigmoid, the gradient is scaled down by at least a factor of 4, so the lower layers of the network train much more slowly than the higher layers.
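A quick numerical check of that claim (a small sketch using NumPy; the grid resolution is arbitrary) confirms that z*(1-z) peaks at 0.25 when z = 0.5:

```python
import numpy as np

z = np.linspace(0.0, 1.0, 10001)   # sigmoid outputs lie in (0, 1)
local_grad = z * (1 - z)           # local gradient of the sigmoid

print("max local gradient:", local_grad.max())    # ~0.25
print("achieved at z =", z[local_grad.argmax()])  # ~0.5
```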