This article is produced by @xingchenbingbuyu. Please credit the author and indicate the source when reposting.
Article Link: http://blog.csdn.net/xingchenbingbuyu/article/details/53677630
Weibo: http://weibo.com/xingchenbing
The previous post, on the design of the Net class and the initialization of the neural network, was relatively simple, because the main task there was just to create the various matrices and initialize them. The focus and core of a neural network is the subject of this post: the two big computations of forward propagation and backward propagation. The forward pass of each layer consists of a linear operation, the weighted sum (convolution?), and the nonlinear operation of the activation function. Backpropagation mainly uses the BP algorithm to update the weights. This article is likewise divided into two parts.
First, the forward process
As mentioned above, the forward process consists of two parts, a linear operation and a nonlinear operation. Both are relatively simple.
The linear operation can be written as y = wx + b, where x is the input sample, here the single-column matrix of layer n; w is the weight matrix; y is the result of the weighted sum, with the same size as the single-column matrix of layer n+1; and b is the bias, initialized to all zeros by default. It is not hard to deduce (ghosts know how long it took me to work it out!) that the size of w is (n+1).rows x n.rows. This is exactly how the weight matrices were generated in the previous article.
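As a quick sanity check of the sizes (a toy example of my own, not code from the post): if layer n has 3 neurons and layer n+1 has 2, then w must be 2 rows by 3 columns for y = wx + b to come out as a 2-row column:

#include <iostream>
#include <opencv2/core.hpp>

int main()
{
    cv::Mat x = cv::Mat::ones(3, 1, CV_32FC1);    // layer n:   3 x 1 column
    cv::Mat w = cv::Mat::ones(2, 3, CV_32FC1);    // weights:   (n+1).rows x n.rows = 2 x 3
    cv::Mat b = cv::Mat::zeros(2, 1, CV_32FC1);   // bias:      2 x 1, all zeros by default
    cv::Mat y = w * x + b;                        // result:    2 x 1, same size as layer n+1
    std::cout << "y = " << y << std::endl;        // prints a 2 x 1 column, each entry 3
    return 0;
}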
The nonlinear operation can be written as O = f(Y), where Y is the y obtained above, O is the output of layer n+1, and f is the activation function we have been talking about. Activation functions are generally nonlinear; their value lies in giving the neural network its nonlinear modeling capability. There are many kinds of activation functions, such as the sigmoid, tanh, and ReLU functions. For the advantages and disadvantages of each, refer to more specialized papers and other professional material.
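The actual sigmoid(), tanh() and ReLU() helpers used below live in Function.cpp on GitHub. As a rough illustration only, element-wise cv::Mat versions might look like this (my sketch, not the repository code):

// Sketch only: possible element-wise cv::Mat activation functions.
// The real implementations are in Function.cpp on GitHub and may differ.
#include <opencv2/core.hpp>

cv::Mat sigmoid(const cv::Mat &x)
{
    cv::Mat neg_x = -x;
    cv::Mat e;
    cv::exp(neg_x, e);             // element-wise exp(-x)
    cv::Mat denom = 1.0 + e;
    return 1.0 / denom;            // element-wise 1 / (1 + exp(-x))
}

cv::Mat tanh(const cv::Mat &x)
{
    cv::Mat neg_x = -x;
    cv::Mat ep, en;
    cv::exp(x, ep);                // exp(x)
    cv::exp(neg_x, en);            // exp(-x)
    return (ep - en) / (ep + en);  // element-wise tanh
}

cv::Mat ReLU(const cv::Mat &x)
{
    return cv::max(x, 0.0);        // element-wise max(x, 0)
}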
Let's first look at the code of the forward function forward():
//Forward
void Net::forward()
{
    for (int i = 0; i < layer_neuron_num.size() - 1; ++i)
    {
        cv::Mat product = weights[i] * layer[i] + bias[i];
        layer[i + 1] = activationFunction(product, activation_function);
    }
}
The two statements inside the for loop are, respectively, the linear operation and the nonlinear operation of the activation function mentioned above.
The function activationFunction() implements several kinds of activation functions; which one is used is selected by its second parameter. The code is as follows:
//Activation Function
cv::Mat Net::activationFunction(cv::Mat &x, std::string func_type)
{
    activation_function = func_type;
    cv::Mat fx;
    if (func_type == "sigmoid")
    {
        fx = sigmoid(x);
    }
    if (func_type == "tanh")
    {
        fx = tanh(x);
    }
    if (func_type == "ReLU")
    {
        fx = ReLU(x);
    }
    return fx;
}
The more detailed parts of each function are in the Function.h and Function.cpp files. They are not listed here; if you are interested, visit GitHub.
As before, the Net class given in the previous post was streamlined, so there may be member variables here that did not appear in that earlier version. The complete definition of the Net class is still on GitHub.
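For reference, here is a rough sketch, reconstructed by me from the snippets in this post rather than taken from GitHub, of the members that the code below relies on:

// Rough sketch of the Net members assumed by the snippets in this post.
// The real class on GitHub may declare them differently.
#include <opencv2/core.hpp>
#include <string>
#include <vector>

class Net
{
public:
    std::vector<int> layer_neuron_num;   // number of neurons in each layer
    std::string activation_function;     // "sigmoid", "tanh" or "ReLU"
    float learning_rate;                 // eta in the weight-update formula
    float loss;                          // value of the objective function

    std::vector<cv::Mat> layer;          // one single-column matrix per layer
    std::vector<cv::Mat> weights;        // weights[i] maps layer[i] to layer[i + 1]
    std::vector<cv::Mat> bias;           // bias[i] added to the weighted sum
    std::vector<cv::Mat> delta_err;      // delta error of every non-input layer
    cv::Mat target;                      // training target for the output layer
    cv::Mat output_error;                // target - output

    // (the initialization members from the previous post are omitted here)
    void forward();
    void backward();

protected:
    cv::Mat activationFunction(cv::Mat &x, std::string func_type);
    void deltaError();
    void updateWeights();
};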
Second, the backpropagation process
The principle behind backpropagation is the chain rule of differentiation, which is really just the rule for differentiating composite functions from calculus. It is only needed when deriving the formulas. For the detailed derivation I recommend the tutorial below, which uses figures to show forward and backward propagation very clearly. Highly recommended!
Principles of training multi-layer neural network using backpropagation.
Below, a figure from that article will be used to illustrate the weight-update code. Before that, let's see what the backpropagation function backward() looks like:
//Backward
void Net::backward()
{
    calcLoss(layer[layer.size() - 1], target, output_error, loss);
    deltaError();
    updateWeights();
}
You can see that it is mainly three lines of code, calling three functions:
The first function, calcLoss(), computes the output error and the objective function: the mean of the squares of all output errors is used as the objective function to be minimized.
The second function, deltaError(), computes the delta error, i.e. the delta1*df() part in the figure below.
The third function, updateWeights(), updates the weights; that is, it applies the weight-update formula.
Here is a screenshot from the article highly recommended above:
(PS: As I write this, I keep feeling that there is a problem with my program here. I have gone over it several times, and the program really does train. It feels strange; if anyone spots the issue, please be sure to tell me.)
Let's look at the code of the updateWeights() function:
//Update weights
void Net::updateWeights()
{
    for (int i = 0; i < weights.size(); ++i)
    {
        cv::Mat delta_weights = learning_rate * (delta_err[i] * layer[i].t());
        weights[i] = weights[i] + delta_weights;
    }
}
The two core lines of code should reflect the weight-update formula in the figure fairly clearly. The eta (η) in the formula is what is usually called the learning rate, a parameter that often has to be tuned when training a neural network.
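Written out as a formula (my notation, taken directly from the two lines above rather than from the figure), the update for each weight matrix is:

$$ W_i \leftarrow W_i + \eta \,\big(\delta_i \cdot \mathrm{layer}_i^{T}\big) $$

where η is the learning rate, δ_i is delta_err[i] and layer_i is the output column of layer i.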
The parts that compute the output error and the delta error are pure mathematics and not particularly exciting. I am posting the code anyway, because this is exactly the part that feels strange to me; I hope someone more experienced can point out the problem.
The calcLoss() function is in the Function.cpp file:
//Objective function
void calcLoss(cv::Mat &output, cv::Mat &target, cv::Mat &output_error, float &loss)
{
    if (target.empty())
    {
        std::cout << "Can't find the target cv::Matrix" << std::endl;
        return;
    }
    output_error = target - output;
    cv::Mat err_sqrare;
    cv::pow(output_error, 2., err_sqrare);
    cv::Scalar err_sqr_sum = cv::sum(err_sqrare);
    loss = err_sqr_sum[0] / (float)(output.rows);
}
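In formula form (my notation), calcLoss() computes:

$$ e = t - o, \qquad loss = \frac{1}{N}\sum_{k=1}^{N} e_k^{2} $$

where t is the target column, o the network output, and N = output.rows.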
deltaError() is in Net.cpp (I keep feeling something is wrong here, and yet it also seems fine):
//Compute delta error
void Net::deltaError()
{
    delta_err.resize(layer.size() - 1);
    for (int i = delta_err.size() - 1; i >= 0; i--)
    {
        delta_err[i].create(layer[i + 1].size(), layer[i + 1].type());
        //cv::Mat dx = layer[i + 1].mul(1 - layer[i + 1]);
        cv::Mat dx = derivativeFunction(layer[i + 1], activation_function);
        //Output layer delta error
        if (i == delta_err.size() - 1)
        {
            delta_err[i] = dx.mul(output_error);
        }
        else  //Hidden layer delta error
        {
            cv::Mat weight = weights[i];
            cv::Mat weight_t = weights[i].t();
            cv::Mat delta_err_1 = delta_err[i];
            delta_err[i] = dx.mul((weights[i + 1]).t() * delta_err[i + 1]);
        }
    }
}
Note that the output layer and the hidden layers use different formulas in this calculation.
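In my notation (δ_l is the delta error of layer l, o_l = layer[l], W_l = weights[l], f' the derivative of the activation function, ⊙ element-wise multiplication), the two cases computed above are:

$$ \delta_{L} = f'(o_{L}) \odot (t - o_{L}), \qquad \delta_{l} = f'(o_{l}) \odot \big(W_{l}^{T}\,\delta_{l+1}\big) \ \text{for hidden layers} $$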
At this point, the core part of the neural network has been implemented. What remains is to figure out how to train it. If you like, you can already write a small test program that runs a few rounds of forward and backward propagation. Again, ghosts know how long I spent debugging before the propagation finally worked!
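Such a test program might look like the sketch below. This is my own sketch under assumptions, not code from the repository: "Net.h" and initNet() are placeholders for whatever the class header and the initialization code from the previous post actually provide.

// Toy driver (sketch only): run a few forward/backward passes on one sample.
#include <iostream>
#include <vector>
#include <opencv2/core.hpp>
#include "Net.h"   // assumed header for the Net class on GitHub

int main()
{
    Net net;
    std::vector<int> layer_neuron_num = { 2, 4, 1 };   // 2 inputs, 4 hidden, 1 output
    net.initNet(layer_neuron_num);   // placeholder: create and initialize layer/weights/bias
    net.learning_rate = 0.3f;
    net.activation_function = "sigmoid";

    // One toy sample: input column (1, 0) with target 1.
    net.layer[0] = (cv::Mat_<float>(2, 1) << 1.f, 0.f);
    net.target   = (cv::Mat_<float>(1, 1) << 1.f);

    for (int iter = 0; iter < 100; ++iter)
    {
        net.forward();
        net.backward();
        if (iter % 10 == 0)
            std::cout << "iteration " << iter << ", loss = " << net.loss << std::endl;
    }
    return 0;
}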
To be continued...
Deep Neural Networks in C++ from Scratch, Part 2: Forward Propagation and Backpropagation