Derivation of Neural Networks and the Backpropagation Algorithm

Source: Internet
Author: User

Note: Because the illustrations are difficult to draw by hand, the basic figures in this article come from an online machine learning course. Please do not reprint.

1. Common Machine Learning Models

In fact, essentially all basic machine learning models share the same structure: a function maps the input features to an output. Graphically:

When g(h) is the sigmoid function, the model is a logistic regression classifier. When g(h) is a function that can only take the values 0 or 1, it is a perceptron. The problem is that this type of model has an obvious flaw: when the data are not linearly separable, or the selected features are incomplete (or inaccurate), such a classifier does not perform particularly well. Consider the following example:

We can easily use a perceptron model (see the earlier perceptron article) to implement logical AND, logical OR, and logical NOT, because these three functions are linearly separable. However, if we use a perceptron to implement XNOR (output 1 when the inputs are the same, 0 when they differ), no choice of weights classifies all four inputs correctly: the function is not linearly separable!
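As an illustration (hand-picked weights, one of many valid choices), a single step-activation perceptron handles AND and OR but cannot handle XOR/XNOR:

```python
import numpy as np

def perceptron(x, w, b):
    # Step-activation unit: outputs 1 if w.x + b > 0, else 0.
    return int(np.dot(w, x) + b > 0)

# One possible choice of weights for the linearly separable gates:
def AND(x): return perceptron(x, np.array([1.0, 1.0]), -1.5)
def OR(x):  return perceptron(x, np.array([1.0, 1.0]), -0.5)

# No single (w, b) can realize XOR or XNOR: the four input points cannot
# be split by one line, so a lone perceptron always misclassifies at least
# one of them.
```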


2. Neural Network Introduction:

We can construct the following models:

(where A represents logical AND, B logical NOR, and C logical OR)

The model above is a simple neural network: we have constructed three perceptrons and fed the outputs of two of them as inputs to the third, implementing XNOR and thereby solving the linearly non-separable problem posed above. Why does this work? In essence, each hidden layer of a neural network (every layer other than the input and output layers, introduced below) generates new features from the previous layer's features, and keeps doing so until the newest features represent the model well. This overcomes both linear non-separability and insufficient or imprecise feature selection. (As noted earlier, the essence of linear non-separability is that the features are insufficient.)
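The construction can be sketched with three perceptron units, using one possible choice of weights (A = AND, B = NOR, C = OR, matching the figure):

```python
import numpy as np

def unit(x, w, b):
    # A perceptron unit: fires (1) when w.x + b > 0.
    return int(np.dot(w, x) + b > 0)

def xnor(x1, x2):
    x = np.array([x1, x2])
    a = unit(x, np.array([1.0, 1.0]), -1.5)    # A: logical AND
    b = unit(x, np.array([-1.0, -1.0]), 0.5)   # B: logical NOR (1 only for 0,0)
    return unit(np.array([a, b]), np.array([1.0, 1.0]), -0.5)  # C: logical OR
```

The hidden units A and B are the "new features": in the (a, b) plane the four inputs become linearly separable, so the output unit C can finish the job.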

The model structure of the neural network is as follows:

(blue, red, and yellow represent the input layer, hidden layer, and output layer, respectively)

Each unit in the neural networks introduced here is a logistic regression model; that is, g(h) is the sigmoid function.

We can represent the neural network as follows:

3. Computing the Network's Prediction (Hypothesis Function) and Its Cost Function

Computing a prediction is not much different from ordinary logistic regression; the only change is that the outputs of some logistic regression units serve as inputs to other logistic regression units, as in the example above.
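This forward pass can be sketched as follows (hypothetical weight matrices; each layer is just a batch of logistic regressions applied to the previous layer's outputs):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, weights):
    """Feed the input through each layer in turn.

    `weights` is a list of weight matrices, one per layer; a constant
    bias input of 1 is prepended to each layer's input, so a matrix
    mapping n inputs to m outputs has shape (m, n + 1).
    Every unit is a small logistic regression: sigmoid of a weighted sum."""
    a = np.asarray(x, dtype=float)
    for W in weights:
        a = sigmoid(W @ np.concatenate(([1.0], a)))
    return a
```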

So how does the cost function differ from the cost function of logistic regression?

The cost function of logistic regression is as follows:
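For reference (the equation image is missing here), the standard logistic regression cost over m training examples is:

```latex
J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \Big[\, y^{(i)} \log h_\theta\big(x^{(i)}\big) + \big(1 - y^{(i)}\big) \log\big(1 - h_\theta\big(x^{(i)}\big)\big) \Big]
```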

The essence of this equation is to measure the error between the predicted results and the actual labels. But a neural network may have more than one output, so its error estimate must sum the cost over all units of the output layer.


k: indexes the output units.

Aside: how many classes can a neural network distinguish?

In theory, a single output unit can solve a 2-class problem; 2 output units can solve a 4-class problem; and so on.

In practice, with three output units we solve a three-class problem using one-hot labels ([1,0,0], [0,1,0], [0,0,1]). Why this design? We leave the question open for now and return to it later.

PS: An interview question: a machine outputs 1 with probability 15% and 0 with probability 85%. Construct a new machine whose outputs of 0 and 1 are equally likely. Answer: read the outputs in pairs; treat 01 as 0 and 10 as 1, and discard the rest (00 and 11). Since the two mixed pairs each occur with probability 0.15 × 0.85, the new output is unbiased.
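This trick (the von Neumann extractor) can be sketched as follows; `biased_bit` is a stand-in for the hypothetical machine in the question:

```python
import random

def biased_bit(p1=0.15):
    # Stand-in for the machine: outputs 1 with probability 15%, 0 with 85%.
    return 1 if random.random() < p1 else 0

def fair_bit():
    # Von Neumann extractor: read bits in pairs; map 01 -> 0 and 10 -> 1,
    # discard 00 and 11.  P(01) = P(10) = 0.15 * 0.85, so the result is fair.
    while True:
        a, b = biased_bit(), biased_bit()
        if a != b:
            return a
```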

4. Training Neural Networks

As with logistic regression, training means adjusting the weights W. Let us write out the neural network's cost function once more:

W denotes the weights of all layers; W_ij^(l) is the weight in layer l that connects the j-th unit of layer l to the i-th unit of layer l+1.

m is the number of training samples and K is the number of output units.

h_W(x^(i))_k is the k-th output of the network for the i-th sample, and y^(i)_k is the k-th component of the i-th sample's label.
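Putting this notation together, the cost function described above (a reconstruction; the regularization term is omitted) reads:

```latex
J(W) = -\frac{1}{m} \sum_{i=1}^{m} \sum_{k=1}^{K} \Big[\, y_k^{(i)} \log \big(h_W(x^{(i)})\big)_k + \big(1 - y_k^{(i)}\big) \log\Big(1 - \big(h_W(x^{(i)})\big)_k\Big) \Big]
```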

Then, as with logistic regression, gradient descent updates all the weights. The difficulty lies in computing the partial derivatives. First, recall the chain rule of differentiation:

So we have:

The next problem is computing δ, the error term. We begin with the last layer (the output layer), considering a single output neuron:

With multiple outputs, we simply sum the contributions.

So how is the error term computed for neurons in the middle (hidden) layers? We need to examine the relationship between layer l and layer l+1:

The activation a_i of layer l is obtained by applying the sigmoid function to z_i of layer l, and each z of layer l+1 is formed by multiplying every a of layer l by its corresponding weight w and summing the products. This is awkward to say in words, so look at the formulas:
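Written out in the notation above (a reconstruction, with g the sigmoid function):

```latex
z_i^{(l+1)} = \sum_j W_{ij}^{(l)} a_j^{(l)}, \qquad a_i^{(l)} = g\big(z_i^{(l)}\big)

\delta_j^{(l)} = \Big(\sum_i W_{ij}^{(l)} \delta_i^{(l+1)}\Big)\, g'\big(z_j^{(l)}\big), \qquad g'(z) = g(z)\big(1 - g(z)\big)
```

The second line is the chain rule applied through the first: each hidden unit's error is the weighted sum of the errors it feeds into, times the local slope of the sigmoid.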


That is the general idea; the concrete steps are:

1. Use forward propagation to compute the output of each neuron.

2. For each unit of the output layer, compute the corresponding error:

3. Back-propagate to compute the error term of each hidden neuron:

4. Compute the partial derivatives of the cost function, i.e.:
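The four steps above can be sketched in NumPy (a minimal illustration, not the author's code: a single training example, one hidden layer, and sigmoid activations throughout are assumed):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def backprop(W1, W2, x, y):
    """Gradients of the unregularized cross-entropy cost for one example.

    W1 maps the input (plus a bias unit) to the hidden layer; W2 maps the
    hidden layer (plus a bias unit) to the output layer."""
    # Step 1: forward propagation, caching each layer's values.
    a1 = np.concatenate(([1.0], x))             # input with bias unit
    z2 = W1 @ a1
    a2 = np.concatenate(([1.0], sigmoid(z2)))   # hidden with bias unit
    z3 = W2 @ a2
    a3 = sigmoid(z3)                            # output layer
    # Step 2: output-layer error (cross-entropy + sigmoid gives this form).
    d3 = a3 - y
    # Step 3: back-propagate the error to the hidden layer
    # (the bias column of W2 is skipped: the bias unit has no incoming error).
    d2 = (W2[:, 1:].T @ d3) * sigmoid(z2) * (1 - sigmoid(z2))
    # Step 4: partial derivatives are outer products of errors and activations.
    return np.outer(d2, a1), np.outer(d3, a2)
```

A numerical gradient check (perturb one weight, compare the finite difference of the cost against the returned gradient) is a good way to validate such an implementation.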

5. Code:

Not written up yet; a follow-up blog post with the full code will be attached.



