Author: Han Xiaoyang
Date: December 2015
Source: http://blog.csdn.net/han_xiaoyang/article/details/50321873
Disclaimer: Copyright reserved. For reprints, please contact the author and cite the source.

1. Introduction
To be honest, when I first set out to write this part I was reluctant, because I felt it could be summed up in a few sentences: the usual intuitive take on the backpropagation algorithm is that it is just the chain rule for derivatives. But understanding this part and its details really does help with designing, tuning, and optimizing neural networks, so I bit the bullet and wrote it up.
Problem statement and motivation:
As we all know, we are given a function $f(x)$ whose input $x$ is a vector (e.g. of image pixel values), and we want to compute the gradient of $f$ with respect to $x$, namely $\nabla f(x)$.
We care about this problem because in a neural network, $f$ corresponds to the loss function $L$, and the input $x$ corresponds to the training data together with the network weights $W$. To give a concrete case, the loss function could be the SVM loss, and the input would be the training samples $(x_i, y_i),\ i = 1 \ldots N$ along with the weights $W$ and the bias term $b$. Note that in our setting we usually regard the training data as given and fixed, while the weights are the variables we can control. Therefore, to update the parameters and drive the loss function to its minimum, we usually compute the gradient of $f$ with respect to the parameters $W, b$. That said, the gradient with respect to $x_i$ is sometimes useful as well, for example when we want to visualize and understand what the neural network is doing.
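To make this concrete, here is a minimal sketch in Python/numpy (my own illustration, not code from the original notes): a multiclass SVM loss evaluated on toy data, plus a brute-force numerical gradient with respect to the weights via centered differences. All names and shapes (`svm_loss`, `numerical_gradient`, the toy `X`, `y`, `W`) are hypothetical placeholders.

```python
import numpy as np

def svm_loss(W, X, y, delta=1.0):
    """Multiclass SVM loss over N samples.
    W: (D, C) weights, X: (N, D) data, y: (N,) integer labels."""
    scores = X.dot(W)                       # (N, C) class scores
    correct = scores[np.arange(len(y)), y]  # score of each sample's true class
    margins = np.maximum(0, scores - correct[:, None] + delta)
    margins[np.arange(len(y)), y] = 0       # true class contributes no margin
    return margins.sum() / len(y)

def numerical_gradient(f, W, h=1e-5):
    """Centered-difference gradient of scalar f at W (slow but simple)."""
    grad = np.zeros_like(W)
    it = np.nditer(W, flags=['multi_index'])
    while not it.finished:
        idx = it.multi_index
        old = W[idx]
        W[idx] = old + h
        fp = f(W)              # loss with this weight nudged up
        W[idx] = old - h
        fm = f(W)              # loss with this weight nudged down
        W[idx] = old           # restore the original value
        grad[idx] = (fp - fm) / (2 * h)
        it.iternext()
    return grad

# Toy data: 5 samples, 4 features, 3 classes (placeholders).
rng = np.random.default_rng(0)
X = rng.standard_normal((5, 4))
y = rng.integers(0, 3, size=5)
W = rng.standard_normal((4, 3)) * 0.01
dW = numerical_gradient(lambda W: svm_loss(W, X, y), W)
```

In practice, backpropagation computes this same gradient analytically and far more cheaply; the numerical version is mainly useful as a correctness check on a backprop implementation.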
2. Gradient and Partial Derivative Basics

Okay, now let's head back to math class for a bit, starting with the simplest example. If $f(x,y) = xy$, we can take the partial derivatives of this function with respect to $x$ and $y$, as follows:
$$f(x,y) = xy \hspace{0.5in} \rightarrow \hspace{0.5in} \frac{\partial f}{\partial x} = y \hspace{0.5in} \frac{\partial f}{\partial y} = x$$
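As a quick sanity check (my own snippet; the sample point $x = 4,\ y = -3$ is arbitrary), these partials can be evaluated directly:

```python
def f(x, y):
    return x * y

df_dx = lambda x, y: y   # ∂f/∂x = y
df_dy = lambda x, y: x   # ∂f/∂y = x

x, y = 4.0, -3.0          # arbitrary sample point
print(f(x, y))            # -12.0
print(df_dx(x, y))        # -3.0: nudging x up by eps changes f by about -3*eps
print(df_dy(x, y))        #  4.0: nudging y up by eps changes f by about  4*eps
```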
2.1 Explanation
Let's recall what the partial derivative actually means: the rate of change of the function, near the current point, along the dimension of the given variable. That is:

$$\frac{df(x)}{dx} = \lim_{h \to 0} \frac{f(x+h) - f(x)}{h}$$
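The limit definition also suggests a direct way to approximate derivatives numerically: evaluate $f$ at two nearby points and divide by the spacing $h$. A small sketch (my own, reusing $f(x,y) = xy$ from above) compares the finite-difference estimate with the analytic answer $\partial f / \partial x = y$:

```python
def f(x, y):
    return x * y

x, y = 4.0, -3.0
for h in (1e-1, 1e-3, 1e-5):
    # Forward difference straight from the definition (h small but nonzero).
    approx = (f(x + h, y) - f(x, y)) / h
    print(h, approx)  # -3.0 each time: f is linear in x, so the estimate is exact here;
                      # for general f it converges to the true derivative as h -> 0
```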