Neural Networks in Detail

This post covers the BP algorithm of neural networks, gradient checking, and random initialization of the parameters (backpropagation algorithm, gradient checking, random initialization).

1. Cost function
For a training set $\{(x^{(1)}, y^{(1)}), \ldots, (x^{(m)}, y^{(m)})\}$, the cost function is defined as:

$J(\Theta) = -\frac{1}{m}\sum_{i=1}^{m}\sum_{k=1}^{K}\left[ y_k^{(i)} \log\left(h_\Theta(x^{(i)})\right)_k + (1 - y_k^{(i)}) \log\left(1 - (h_\Theta(x^{(i)}))_k\right) \right] + \frac{\lambda}{2m}\sum_{l=1}^{L-1}\sum_{i=1}^{s_l}\sum_{j=1}^{s_{l+1}} \left(\Theta_{ji}^{(l)}\right)^2$

where the last term is the regularization term; $K$ is the number of output units, i.e. the number of classes; $L$ is the total number of layers in the network; $s_l$ is the number of units in layer $l$ (excluding the bias unit); and $\Theta_{ji}^{(l)}$ is the weight on the edge from unit $i$ of layer $l$ to unit $j$ of layer $l+1$.
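To make the formula concrete, here is a minimal sketch of this cost in Python (my own illustration, not code from the post). It assumes sigmoid activations, `Thetas` a list of weight matrices of shape (s_{l+1}, s_l + 1) whose first column multiplies the bias unit, `X` the (m, n) design matrix, and `Y` the (m, K) one-hot label matrix.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def nn_cost(Thetas, X, Y, lam):
    """Regularized cross-entropy cost J(Theta), as defined above."""
    m = X.shape[0]
    # Forward propagation to get the hypothesis h = a^{(L)}.
    A = X
    for Theta in Thetas:
        A = np.hstack([np.ones((A.shape[0], 1)), A])  # prepend bias unit
        A = sigmoid(A @ Theta.T)
    # Cross-entropy term, summed over all K output units and m examples.
    J = -np.sum(Y * np.log(A) + (1 - Y) * np.log(1 - A)) / m
    # Regularization term: all weights except the bias column.
    J += lam / (2 * m) * sum(np.sum(Theta[:, 1:] ** 2) for Theta in Thetas)
    return J
```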
2. Error backpropagation (backpropagation, BP for short)
With the cost function in hand, what we obviously want is to find the parameters $\Theta$ that minimize $J(\Theta)$. To use gradient descent or another optimization algorithm, we need to compute:

$J(\Theta) \quad \text{and} \quad \frac{\partial}{\partial \Theta_{ij}^{(l)}} J(\Theta)$
$J(\Theta)$ can be computed directly from the formula above; the key question is how to compute the partial derivatives. The BP algorithm is what lets us compute them efficiently, which in turn lets gradient descent find good weights. The basic principle of the BP algorithm follows. First, a look at forward propagation (rather than drawing a new figure, here is Ng's diagram directly):
[Figure: forward propagation through a small feed-forward network, from Ng's machine learning course]
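To make the forward pass concrete, here is a minimal sketch for a single example (not from the original post), under the same assumed conventions as before: sigmoid activations and weight matrices of shape (s_{l+1}, s_l + 1) with the first column acting on the bias unit.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(Thetas, x):
    """Forward propagation for one example x (a 1-D array)."""
    a = x
    activations = [a]   # a^{(1)}, a^{(2)}, ..., a^{(L)}
    zs = []             # z^{(2)}, ..., z^{(L)}
    for Theta in Thetas:
        a = np.concatenate([[1.0], a])  # prepend bias unit a_0 = 1
        z = Theta @ a                   # z^{(l+1)} = Theta^{(l)} a^{(l)}
        a = sigmoid(z)                  # a^{(l+1)} = g(z^{(l+1)})
        zs.append(z)
        activations.append(a)
    return activations, zs
```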
Now for the BP algorithm itself. Define $\delta_i^{(l)}$ to be the "residual" (error term) of unit $i$ in layer $l$. The goal of the BP algorithm is to minimize $J(\Theta)$. For example, for a single sample $(x, y)$, the mean squared error is:

$J(\Theta; x, y) = \frac{1}{2}\left\| h_\Theta(x) - y \right\|^2$

In gradient descent, each iteration updates the weights according to:

$\Theta_{ij}^{(l)} := \Theta_{ij}^{(l)} - \alpha \frac{\partial}{\partial \Theta_{ij}^{(l)}} J(\Theta)$

where $\alpha$ is the learning rate.
The idea of the BP algorithm is as follows. Given a sample $(x, y)$, first compute all the activation values in the network via forward propagation. Then, for each node $i$ of layer $l$, we compute its "residual" $\delta_i^{(l)}$, which measures how much that node contributed to the error in the final output. For the output layer, we can compute the residual directly as the gap between the network's output and the actual label. What about the hidden layers? For a hidden node, we compute $\delta_i^{(l)}$ from a weighted average of the residuals of the layer-$(l+1)$ nodes that take $a_i^{(l)}$ as input. Concretely, for the output layer $L$ (writing $L$ for the index of the last layer):

$\delta_i^{(L)} = \frac{\partial J}{\partial z_i^{(L)}} = -\left(y_i - a_i^{(L)}\right) f'(z_i^{(L)})$

and for the hidden layers ($l = L-1, L-2, \ldots, 2$):

$\delta_i^{(l)} = \left( \sum_{j=1}^{s_{l+1}} \Theta_{ji}^{(l)} \delta_j^{(l+1)} \right) f'(z_i^{(l)})$

From the residuals, the partial derivatives follow by the chain rule: since $z_i^{(l+1)} = \sum_j \Theta_{ij}^{(l)} a_j^{(l)}$,

$\frac{\partial J}{\partial \Theta_{ij}^{(l)}} = \frac{\partial J}{\partial z_i^{(l+1)}} \cdot \frac{\partial z_i^{(l+1)}}{\partial \Theta_{ij}^{(l)}} = a_j^{(l)} \, \delta_i^{(l+1)}$
That completes the derivation of the BP formulas; you can work through it on paper yourself. The following summarizes the implementation of the BP algorithm (again borrowing Ng's diagram):
[Figure: summary of the BP algorithm implementation, from Ng's machine learning course]
The above covers the details of the BP algorithm's principle. In summary:
1. Use forward propagation to compute the "activation values" of every layer.
2. Compute the residual of each unit of the last layer, i.e. the output layer.
3. Working backwards, compute the residuals of the nodes in each hidden layer $l$.
4. Compute the partial derivatives we need from the residuals and activations.
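Here is a minimal sketch of these four steps for a single training example (my own illustration, not Ng's code). It assumes sigmoid units and the cross-entropy cost of section 1, for which the output-layer residual simplifies to $\delta^{(L)} = a^{(L)} - y$; averaging over $m$ examples and regularization are omitted.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def backprop(Thetas, x, y):
    """Gradients dJ/dTheta^{(l)} for one example (x, y), unregularized."""
    # 1. Forward propagation: store every activation (bias prepended).
    a = np.concatenate([[1.0], x])
    activations = [a]
    for Theta in Thetas:
        a = sigmoid(Theta @ a)
        a = np.concatenate([[1.0], a])
        activations.append(a)
    # 2. Output-layer residual (drop the bias entry of the last activation).
    delta = activations[-1][1:] - y
    grads = [None] * len(Thetas)
    # 3./4. Walk backwards: take the gradient, then propagate the residual.
    for l in range(len(Thetas) - 1, -1, -1):
        a_prev = activations[l]
        grads[l] = np.outer(delta, a_prev)  # dJ/dTheta^{(l)} = delta a^T
        if l > 0:
            # delta^{(l)} = (Theta^{(l)T} delta^{(l+1)}) .* g'(z^{(l)});
            # for sigmoid units g'(z) = a(1-a). Skip the bias row.
            da = a_prev * (1 - a_prev)
            delta = (Thetas[l].T @ delta)[1:] * da[1:]
    return grads
```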
Having said all this, let's look at Ng's worked example, which visualizes the details of how the BP algorithm runs:
[Figure: Ng's worked example visualizing forward and backward propagation]
3. Gradient checking
The BP algorithm has many moving parts, so implementations are error-prone and the bugs are hard to spot. That is why gradient checking is needed: it lets you verify with high confidence that your BP implementation is correct. The idea is shown below (picture from Ng's machine learning course):
[Figure: two-sided numerical approximation of the derivative, from Ng's machine learning course]
The gradient check is therefore applied to each parameter:

$\frac{\partial}{\partial \theta_i} J(\theta) \approx \frac{J(\theta_1, \ldots, \theta_i + \varepsilon, \ldots, \theta_n) - J(\theta_1, \ldots, \theta_i - \varepsilon, \ldots, \theta_n)}{2\varepsilon}$

with a small $\varepsilon$ (e.g. $\varepsilon = 10^{-4}$).
During the BP procedure we can compare this numerical approximation with the derivative computed by backpropagation; if the two agree or are very close, we can be confident that our implementation of the BP algorithm is correct. One point to note: when you actually train the neural network, be sure to turn gradient checking off, because it executes very, very slowly.
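A minimal sketch of the two-sided check above (the names `cost` and `numerical_gradient` are illustrative, not from the post; `cost` is any function of a flat parameter vector, e.g. the unrolled Thetas):

```python
import numpy as np

def numerical_gradient(cost, theta, eps=1e-4):
    """Two-sided finite-difference approximation of grad J(theta)."""
    grad = np.zeros_like(theta)
    for i in range(theta.size):
        theta_plus, theta_minus = theta.copy(), theta.copy()
        theta_plus[i] += eps    # J(..., theta_i + eps, ...)
        theta_minus[i] -= eps   # J(..., theta_i - eps, ...)
        grad[i] = (cost(theta_plus) - cost(theta_minus)) / (2 * eps)
    return grad

# Usage: compare with the gradient from backpropagation; the relative
# difference should be tiny (e.g. < 1e-7) if the implementation is correct:
# diff = np.linalg.norm(num_grad - bp_grad) / np.linalg.norm(num_grad + bp_grad)
```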

4. Random initialization of parameters
In linear and logistic regression it is fine to initialize the parameters to zeros(n,1). In a neural network that does not work: if the parameters are all initialized to the same value, the input weights of every hidden unit are identical, so after each update the hidden units' values remain identical to one another. That means all hidden units compute exactly the same feature, which is completely redundant. The figure below visualizes this:
[Figure: with symmetric initial weights, all hidden units compute the same function]
So when initializing a neural network's parameters (weights), random initialization is required to break this symmetry: initialize each $\Theta_{ij}^{(l)}$ to a random value in the range $[-\varepsilon_{init}, \varepsilon_{init}]$.
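A minimal sketch of this initialization (the default value of `eps_init` below is a commonly used heuristic from Ng's course exercises, not something the post specifies):

```python
import numpy as np

def rand_initialize_weights(L_in, L_out, eps_init=0.12):
    """Weights drawn uniformly from [-eps_init, eps_init) to break symmetry.

    Shape is (L_out, L_in + 1): the extra column is for the bias unit.
    A common heuristic (an assumption here) is
    eps_init = sqrt(6) / sqrt(L_in + L_out).
    """
    return np.random.rand(L_out, L_in + 1) * 2 * eps_init - eps_init
```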
That concludes this introduction to the neural network BP algorithm, gradient checking, and random parameter initialization. Combined with my earlier introductory post on neural networks (http://blog.csdn.net/u012328159/article/details/51143536), it should give you a basic understanding of neural networks.


Note: here is some reference material that can help you understand neural networks better:
    • The backpropagation algorithm (UFLDL tutorial): http://deeplearning.stanford.edu/wiki/index.php/%E5%8F%8D%E5%90%91%E4%BC%A0%E5%AF%BC%E7%AE%97%E6%B3%95
    • Video lectures on neural networks: http://work.caltech.edu/telecourse.html
