Stanford Machine Learning Open Course Notes (6) - Neural Network Learning


Course address: https://class.coursera.org/ml-003/class/index

Instructor: Andrew Ng

1. Cost Function

The last lecture introduced the multiclass classification problem. It differs from the binary classification problem in that the network has multiple output units, one per class, which is summarized as follows:
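Concretely, for a K-class problem the label of each example is recoded as a one-hot vector, so the target and the network output are both K-dimensional. This is the standard course convention, restated here because the original slide is not reproduced:

$$ y^{(i)} \in \{0,1\}^{K}, \qquad h_\Theta(x^{(i)}) \in \mathbb{R}^{K}, \qquad \text{e.g. } y^{(i)} = (0,0,1,0)^{T} \text{ for class 3 of } K = 4 $$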

Recall also the cost function of regularized logistic regression:
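For reference (the slide with the formula is not reproduced here), the regularized logistic regression cost from the earlier lectures is:

$$ J(\theta) = -\frac{1}{m}\sum_{i=1}^{m}\Big[\, y^{(i)}\log h_\theta(x^{(i)}) + \big(1-y^{(i)}\big)\log\big(1-h_\theta(x^{(i)})\big) \Big] + \frac{\lambda}{2m}\sum_{j=1}^{n}\theta_j^{2} $$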

The first term measures the difference between the true values and the hypothesis; the second term is the regularization term that penalizes the coefficients. In the same way, we can define the cost function of a neural network:
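Restated here since the slide image is missing:

$$ J(\Theta) = -\frac{1}{m}\sum_{i=1}^{m}\sum_{k=1}^{K}\Big[\, y_k^{(i)}\log\big(h_\Theta(x^{(i)})\big)_k + \big(1-y_k^{(i)}\big)\log\big(1-\big(h_\Theta(x^{(i)})\big)_k\big) \Big] + \frac{\lambda}{2m}\sum_{l=1}^{L-1}\sum_{i=1}^{s_l}\sum_{j=1}^{s_{l+1}}\big(\Theta_{ji}^{(l)}\big)^{2} $$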

The error part is the sum, over all training examples and all output classes, of the distance between the actual and predicted values; it is followed by the regularization term over the weights.

2. Backpropagation Algorithm

Now that the form of the cost function is given, the familiar goal is to minimize it with respect to the parameters Θ:

To perform gradient descent, we need to compute the inputs and outputs (activations) of each layer of the neural network via forward propagation, using the same notation as before:
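With the sigmoid activation g, forward propagation computes the activations layer by layer (standard course notation, restated here since the slide is not shown; a bias unit a_0^{(l)} = 1 is added to each layer):

$$ a^{(1)} = x, \qquad z^{(l+1)} = \Theta^{(l)} a^{(l)}, \qquad a^{(l+1)} = g\big(z^{(l+1)}\big), \qquad h_\Theta(x) = a^{(L)} $$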

Here we define an error term δ for each node, which measures how much that node contributes to the error in the final output. For the last layer the error can be computed directly; for the hidden layers in front of it, the error can only be obtained by propagating backwards from the layer after it, which is where the name backpropagation comes from.
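For the sigmoid activation used in the course, the error terms are as follows (⊙ denotes the elementwise product; restated here because the slide is missing):

$$ \delta^{(L)} = a^{(L)} - y, \qquad \delta^{(l)} = \big(\Theta^{(l)}\big)^{T}\delta^{(l+1)} \odot g'\big(z^{(l)}\big), \qquad g'\big(z^{(l)}\big) = a^{(l)} \odot \big(1 - a^{(l)}\big) $$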

For the detailed derivation, see Wikipedia:

http://en.wikipedia.org/wiki/Backpropagation

The full algorithm for computing the gradients used in gradient descent can be described as follows:

Here capital Δ accumulates the errors over all training examples, and each layer l has its own Δ^(l). From these accumulators we obtain D, the partial derivatives of the cost function with respect to the parameters. Whether the subscript j equals 0 (i.e., whether the weight multiplies the bias unit) determines whether the regularization term is added.
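As a concrete illustration, here is a minimal NumPy sketch of one pass of forward and backward propagation for a network with a single hidden layer, accumulating the errors and turning them into the gradients D. The function name, the vectorized batch form, and the network shapes are my own choices for the sketch, not the course's Octave code.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def backprop_gradients(Theta1, Theta2, X, Y, lam):
    """Gradients D1, D2 of the regularized cost for a 3-layer network.
    Theta1: (hidden, n+1), Theta2: (K, hidden+1), X: (m, n), Y: (m, K) one-hot."""
    m = X.shape[0]

    # Forward propagation (a bias unit is prepended at each layer)
    a1 = np.hstack([np.ones((m, 1)), X])
    z2 = a1 @ Theta1.T
    a2 = np.hstack([np.ones((m, 1)), sigmoid(z2)])
    a3 = sigmoid(a2 @ Theta2.T)                      # hypothesis h_Theta(x)

    # Backward propagation of the error terms delta
    d3 = a3 - Y                                      # output-layer error
    d2 = (d3 @ Theta2)[:, 1:] * sigmoid(z2) * (1 - sigmoid(z2))  # hidden-layer error

    # Accumulate Delta over all examples and convert to gradients D
    D1 = d2.T @ a1 / m
    D2 = d3.T @ a2 / m

    # Regularize every weight except the ones multiplying the bias unit (j = 0)
    D1[:, 1:] += (lam / m) * Theta1[:, 1:]
    D2[:, 1:] += (lam / m) * Theta2[:, 1:]
    return D1, D2
```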

3. Backpropagation Intuition

This section gives an example of applying the BP algorithm to a neural network. First, we define the network structure and some notation for the inputs and outputs of each unit:

To simplify, the cost function here omits the final regularization term (i.e., λ = 0):

For the i-th example, cost(i) is defined as follows; if you are familiar with the derivation of δ, you can see that:
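For reference (the slide is not reproduced), the per-example cost and the informal meaning of δ are:

$$ \text{cost}(i) = y^{(i)}\log h_\Theta(x^{(i)}) + \big(1-y^{(i)}\big)\log\big(1-h_\Theta(x^{(i)})\big), \qquad \delta_j^{(l)} = \frac{\partial}{\partial z_j^{(l)}}\,\text{cost}(i) $$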

Meanwhile, for each layer, a node's δ equals the weighted sum of the δ values of the next layer, where the weights are the corresponding Θ parameters on the connecting edges:
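In symbols, ignoring the derivative factor as the lecture's intuition does, a hidden node's error is the Θ-weighted sum of the errors of the nodes it feeds into:

$$ \delta_j^{(l)} = \sum_{k} \Theta_{kj}^{(l)}\,\delta_k^{(l+1)} $$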

4. Gradient Checking

While solving, we can check the gradient to make sure there is no problem with our code. As the figure shows, take two points on either side of Θ, namely (Θ + ε) and (Θ - ε); the derivative (gradient) at Θ is then approximately (J(Θ + ε) - J(Θ - ε)) / (2ε):

The same approximation is applied to each parameter in turn:
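A minimal NumPy sketch of this two-sided check; the function name is illustrative, and ε = 10⁻⁴ is a typical choice rather than a value fixed by the lecture:

```python
import numpy as np

def numerical_gradient(J, theta, eps=1e-4):
    """Two-sided finite-difference approximation of dJ/dtheta for a 1-D
    parameter vector theta (unroll weight matrices into a vector first)."""
    grad = np.zeros_like(theta)
    for i in range(theta.size):
        step = np.zeros_like(theta)
        step[i] = eps
        grad[i] = (J(theta + step) - J(theta - step)) / (2 * eps)
    return grad
```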

Since the BP algorithm already gives us the derivative D of J, we can compare this approximation with D: if the two results are close, the code is correct; otherwise there is an error.

Note in particular: once you have verified that the two gradients agree, turn gradient checking off before training, since the numerical approximation is far slower than backpropagation.

5. Random Initialization

For initializing the Θ parameters, the simplest idea is to set them all to 0 at first:

However, with this assignment the hidden nodes are all identical at the start of the computation: a1 and a2 are computed in exactly the same way and give the same result, which is equivalent to having a single node and wastes the network's capacity. To break this symmetry, initialize Θ randomly:
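A minimal sketch of such symmetry-breaking initialization; drawing from a uniform interval [-ε_init, ε_init] with ε_init = 0.12 is a common convention and is assumed here, not dictated by the lecture:

```python
import numpy as np

def rand_initialize_weights(l_in, l_out, epsilon_init=0.12):
    """Weight matrix for a layer with l_in inputs (plus a bias unit) and l_out
    outputs, with entries drawn uniformly from [-epsilon_init, epsilon_init]."""
    return np.random.uniform(-epsilon_init, epsilon_init, size=(l_out, l_in + 1))
```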

6. Putting It Together

What do we need to do to train a neural network?

First, select the network structure:

Then train the weights, which involves random initialization and the BP algorithm:
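Putting the pieces together, a rough training loop might look like the sketch below. It reuses the helper functions sketched earlier in these notes (rand_initialize_weights, backprop_gradients), and the layer sizes, placeholder data, learning rate, and iteration count are arbitrary illustrative choices:

```python
import numpy as np

# Illustrative network shape and hyperparameters (assumptions, not course values)
n_features, n_hidden, n_classes = 400, 25, 10
lam, alpha, n_iters = 1.0, 0.5, 400

# 1. Random initialization to break symmetry
Theta1 = rand_initialize_weights(n_features, n_hidden)
Theta2 = rand_initialize_weights(n_hidden, n_classes)

# Placeholder training data: X (m, n_features), Y (m, n_classes) one-hot
X = np.random.rand(100, n_features)
Y = np.eye(n_classes)[np.random.randint(0, n_classes, 100)]

# 2. Gradient descent using the gradients from backpropagation
for _ in range(n_iters):
    D1, D2 = backprop_gradients(Theta1, Theta2, X, Y, lam)
    Theta1 -= alpha * D1
    Theta2 -= alpha * D2
```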

Finally, use the gradient checking method described above to verify that the computed gradients (and hence the training code) are correct:

This completes the training process of a neural network.

-------------------------------------------------- Weak split line ----------------------------------------------

The focus of this lecture is the BP algorithm. However, since the video does not go into much detail on the derivation, the conclusions are hard to retain and you really have to work through the derivation yourself. With this algorithm in place, a neural network gains a self-learning process: you only need to define the network's structure and initial values.
