The author says: I studied this material once before, but after some time many of the details had become blurry. I recently worked through the derivation again, and to preserve the line of reasoning as much as possible, I am writing this blog post. It serves both as a future reference for myself and as a way to exchange ideas with readers.
A few notes about this post:
1. I cannot guarantee that the derivation is completely correct; if you find a problem, please point it out.
2. You are welcome to reprint this post; my only request is that you cite the source.
This post starts from the basic structure of a neural network and derives, step by step, the complete process of training a neural network with the BP (backpropagation) algorithm, together with the derivations of the formulas used along the way.

Neural Network
The structure of the neural network is shown in the figure below. It consists of three parts: an input layer, a hidden layer (for convenience the figure shows a single hidden layer, but in practice there can be several), and an output layer. Each layer is made up of several units (neurons). Every neuron in one layer is connected to every neuron in the adjacent layers, but there are no connections between neurons within the same layer.

Now we describe the parameters. $x=\left[(x^{(1)})^T,(x^{(2)})^T,\ldots,(x^{(m)})^T\right]^T$ is the original input data set. A single input sample is $x^{(i)}=\left[x^{(i)}_1,x^{(i)}_2,\ldots,x^{(i)}_n\right]^T$, i.e. each sample has $n$ features, which correspond to the $n$ neurons in the input layer. Training the network usually requires a labeled training set $\left\{(x^{(1)},y^{(1)}),(x^{(2)},y^{(2)}),\ldots,(x^{(m)},y^{(m)})\right\}$, where the total number of samples is $m$. The actual parameters of the network are $\theta=(W,b)$, where $W$ denotes the connection weights between layers and $b$ denotes the biases; for example, $W^{(l)}_{ij}$
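To make the notation concrete, here is a minimal sketch in Python/NumPy of how these parameters might be laid out. The layer sizes, the random initialization, and the variable names are my own illustrative assumptions, not part of the original derivation.

```python
import numpy as np

# Illustrative sizes (assumptions, not from the post):
n = 4            # number of input features, one per input-layer neuron
hidden_size = 5  # number of neurons in the single hidden layer
output_size = 3  # number of neurons in the output layer
m = 10           # total number of training samples

rng = np.random.default_rng(0)

# x: the input data set, one n-dimensional sample x^{(i)} per row
x = rng.standard_normal((m, n))

# theta = (W, b): one weight matrix and one bias vector per layer transition.
# W[l][i, j] plays the role of W^{(l)}_{ij}, the weight on the connection
# from layer l into layer l+1; b[l] is the bias vector of layer l+1.
layer_sizes = [n, hidden_size, output_size]
W = [rng.standard_normal((layer_sizes[l + 1], layer_sizes[l]))
     for l in range(len(layer_sizes) - 1)]
b = [np.zeros(layer_sizes[l + 1]) for l in range(len(layer_sizes) - 1)]

# "Fully connected between adjacent layers": each W[l] is dense, so every
# unit in layer l feeds every unit in layer l+1, and no weights exist
# between units of the same layer.
for l, (Wl, bl) in enumerate(zip(W, b)):
    print(f"W[{l}] shape: {Wl.shape}, b[{l}] shape: {bl.shape}")
```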