I. Preparation
To understand neural networks more deeply, the author implements one by hand in pure C++, calling OpenCV for the matrix operations; the data come from the public a1a dataset.
Experimental environment: Visual Studio 2017, OpenCV 3.2.0, the a1a dataset.
This article follows directly from the previous one, Deep Learning Practice (I): Logistic Regression.

II. Neural network basics
The standard neural network structure is shown in the diagram below. It is essentially an enhanced version of the logistic regression model above (a few hidden layers are added), and the basic idea is unchanged. For a more detailed introduction to the principles, Andrew Ng's deep learning course series is recommended.
The following describes the general steps for building the three-layer neural network (pictured above) on the a1a dataset; a C++ sketch of the forward pass follows the list.

1. Initialize the parameters W1, W2, W3 and b1, b2, b3. Since the a1a dataset has 123 features, the input layer has dimension (123, m), where m is the number of samples (1065 for the training set). The three layers of the network built here have (64, 16, 1) neurons (the last being the output layer), so the parameter matrices are initialized as W1 (123, 64), W2 (64, 16), W3 (16, 1), together with the scalar biases b1, b2, b3.
2. Multiply W and X (matrix multiplication; X is the output of the previous layer, and for the first layer it is the sample input), then add the bias b (a real number) to get Z.
3. Activate Z. The hidden layers use the ReLU activation function (it helps avoid vanishing gradients and works well in practice); the output layer uses sigmoid to bound the output. Their graphs are shown below.
4. After the forward propagation above, define the loss function; the cross-entropy cost function is used here.
5. Back-propagate and update the parameters.
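To make these steps concrete, here is a minimal C++/OpenCV sketch of the forward pass. It is not the author's actual code; the helpers relu, sigmoid, and forward are hypothetical names for illustration. It follows the convention Z = W·a + b used in the formulas below, so the weights are stored transposed relative to the shapes listed above, i.e. W1 is (64, 123), W2 is (16, 64), W3 is (1, 16); if you keep the (123, 64), (64, 16), (16, 1) layout instead, multiply by W.t(). The matrices are assumed to be floating-point (CV_64F).

```cpp
// Minimal sketch (not the author's code) of the forward pass for the
// 123 -> 64 -> 16 -> 1 network, assuming CV_64F matrices.
#include <opencv2/core.hpp>

// Element-wise ReLU: max(0, z).
cv::Mat relu(const cv::Mat& z) {
    return cv::max(z, 0.0);
}

// Element-wise sigmoid: 1 / (1 + exp(-z)).
cv::Mat sigmoid(const cv::Mat& z) {
    cv::Mat neg = -z;
    cv::Mat e;
    cv::exp(neg, e);            // e = exp(-z)
    cv::Mat denom = 1.0 + e;    // 1 + exp(-z)
    return 1.0 / denom;         // element-wise reciprocal
}

// One forward pass. A0 is the (123, m) input; b1, b2, b3 are scalar biases.
// Returns the (1, m) sigmoid output of the last layer.
cv::Mat forward(const cv::Mat& A0,
                const cv::Mat& W1,   // (64, 123)
                const cv::Mat& W2,   // (16, 64)
                const cv::Mat& W3,   // (1, 16)
                double b1, double b2, double b3) {
    cv::Mat A1 = relu(W1 * A0 + b1);      // Z1 = W1*A0 + b1, shape (64, m)
    cv::Mat A2 = relu(W2 * A1 + b2);      // Z2 = W2*A1 + b2, shape (16, m)
    cv::Mat A3 = sigmoid(W3 * A2 + b3);   // Z3 = W3*A2 + b3, shape (1, m)
    return A3;
}
```

With the parameters initialized (for example, small random values via cv::randn), a single call to forward(X_train, W1, W2, W3, b1, b2, b3) produces the (1, m) predictions on which the cross-entropy cost and the backward pass then operate.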
Basic formulas of forward propagation:
Here the superscript $[l]$ denotes the layer, and the superscript $(i)$ denotes the $i$-th sample (for the a1a dataset, the $i$-th row); for example, $a^{[0]}$ denotes the layer-0 input, i.e., the sample input.
$$Z^{[1]} = W^{[1]} a^{[0]} + b^{[1]} \tag{1}$$

$$a^{[1]} = \mathrm{ReLU}\left(Z^{[1]}\right) \tag{2}$$

$$Z^{[2]} = W^{[2]} a^{[1]} + b^{[2]} \tag{3}$$