Fully-connected BP neural network


This article describes the forward propagation and error backpropagation of a fully connected BP neural network, using the notation from Andrew Ng's machine learning course. A diagram of a fully connected neural network is given.

1 Forward propagation

1.1 Forward propagation

Compute the input and output of the neurons in layer l.

1.1.1 When the bias is 1

Vector form:

Component form:
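As a sketch of the standard formulas (assuming Ng's convention of treating the bias as an extra input fixed at 1 with its own weight), the vector and component forms are commonly written as:

    z^{l} = W^{l} a^{l-1}, \qquad a^{l} = f(z^{l}), \qquad \text{with } a^{l-1}_{0} = 1

    z^{l}_{j} = \sum_{i} w^{l}_{ji} a^{l-1}_{i}, \qquad a^{l}_{j} = f(z^{l}_{j})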

1.1.2 When the bias is b

Vector form:

Component form:
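As a sketch of the standard formulas with an explicit bias vector b^l, the vector and component forms are commonly written as:

    z^{l} = W^{l} a^{l-1} + b^{l}, \qquad a^{l} = f(z^{l})

    z^{l}_{j} = \sum_{i} w^{l}_{ji} a^{l-1}_{i} + b^{l}_{j}, \qquad a^{l}_{j} = f(z^{l}_{j})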

1.2 Network error

1.2.1 When the bias is 1

For an input sample, given the network's output and the corresponding desired output, the error E for that sample is

(1)

Note that the output of the k-th neuron in the output layer can be calculated as follows:

(2)

The error E can then be expanded in terms of the hidden layer (layer L-1):

(3)

Note also that the output of the j-th neuron of the hidden layer (layer L-1) can be calculated as follows:

(4)

The error E can then be expanded further, to the hidden layer (layer L-2):

(5)

It can be seen that E is a function of the weights.
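As a sketch of the standard forms these expressions take (assuming the usual squared-error cost and the notation above):

    E = \frac{1}{2} \sum_{k} \left( y_{k} - a^{L}_{k} \right)^{2}

    a^{L}_{k} = f\!\left( \sum_{j} w^{L}_{kj} a^{L-1}_{j} \right)

    E = \frac{1}{2} \sum_{k} \left( y_{k} - f\!\left( \sum_{j} w^{L}_{kj} \, f\!\left( \sum_{i} w^{L-1}_{ji} a^{L-2}_{i} \right) \right) \right)^{2}

Expanding the hidden-layer activations in the same way exposes the weights of every earlier layer, which is why E depends on all the weights in the network.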

1.2.2 When the bias is b

For an input sample, given the network's output and the corresponding desired output, the error E for that sample is

(6)

Note that the output of the k-th neuron in the output layer can be calculated as follows:

(7)

The error E can then be expanded in terms of the hidden layer (layer L-1):

(8)

Note also that the output of the j-th neuron of the hidden layer (layer L-1) can be calculated as follows:

(9)

The error E can then be expanded further, to the hidden layer (layer L-2):

(10)

It can be seen that E is a function of the weights and biases.
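As a sketch with an explicit bias, the corresponding form is:

    E = \frac{1}{2} \sum_{k} \left( y_{k} - f\!\left( \sum_{j} w^{L}_{kj} a^{L-1}_{j} + b^{L}_{k} \right) \right)^{2}

Expanding the hidden-layer activations further shows that E depends on all the weights and all the biases.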

2 Sensitivity in error backpropagation

The sensitivity of a layer is defined as the partial derivative of the network error with respect to that layer's input, i.e.

2.1 When the bias is 1

2.1.1 Sensitivity of the output layer

The sensitivity of the k-th neuron in the output layer (layer L) is defined as follows:

To calculate this sensitivity, the chain rule is applied, introducing the output of the k-th neuron in layer L as an intermediate variable:

(11)

First, calculate:

Then, calculate:

Here f is the sigmoid function, so:

From this we obtain:

(12)

Then, the sensitivity of all neurons in layer L is:

(13)
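As a sketch of the standard result (assuming the squared-error cost and sigmoid activation above), the output-layer sensitivity works out to:

    \delta^{L}_{k} = \frac{\partial E}{\partial z^{L}_{k}} = \left( a^{L}_{k} - y_{k} \right) f'(z^{L}_{k}), \qquad f'(z) = f(z)\left( 1 - f(z) \right)

    \delta^{L} = \left( a^{L} - y \right) \odot f'(z^{L})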

2.1.2 Other Layers

The sensitivity of the j-th neuron in layer L-1 is calculated as follows:

To calculate this sensitivity, the chain rule is applied, introducing the output of the j-th neuron in layer L-1 as an intermediate variable:

(14)

First, calculate:

where

Then:

Then, calculate:

From this we obtain:

(15)

where

So, the sensitivity of all neurons in layer L-1 is

(16)

The above derivation computes the sensitivity of layer l-1 from the sensitivity of layer l, so recursion can be used to obtain the sensitivities of all layers (l = L-1, ..., 2):

(17)
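As a sketch of the standard recursion (assuming, as above, that W^l denotes the weights feeding layer l):

    \delta^{L-1}_{j} = \left( \sum_{k} w^{L}_{kj} \, \delta^{L}_{k} \right) f'(z^{L-1}_{j})

    \delta^{l-1} = \left( (W^{l})^{\mathsf{T}} \delta^{l} \right) \odot f'(z^{l-1}), \qquad l = L, L-1, \ldots, 3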

2.2 Sensitivity when the bias is b

In this derivation only one thing changes, namely the expression for the hidden-layer input; the result is the same, so the final sensitivity formula is unaffected:

3 Gradient calculation

3.1 Gradient for a single sample (when the bias is 1)

In this case the only parameters to be optimized are the elements of the weight matrices, so we compute the partial derivative of the error E with respect to the weight matrix of layer l:

For one of these elements, the calculation is as follows:

So, the entire derivative matrix is calculated as follows:

That is:
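As a sketch, with W^l the weight matrix feeding layer l and the sensitivities defined above, the standard result is:

    \frac{\partial E}{\partial w^{l}_{ji}} = \delta^{l}_{j} \, a^{l-1}_{i}, \qquad \frac{\partial E}{\partial W^{l}} = \delta^{l} \left( a^{l-1} \right)^{\mathsf{T}}

(with a^{l-1}_{0} = 1 supplying the bias column).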

3.2 Gradient for a single sample (when the bias is b)

In this case the parameters to be optimized are the elements of the weight matrices and the biases b.

First, the partial derivative of the error E with respect to the weight matrix of layer l is calculated:

For one of these elements, the calculation is as follows:

So, the entire derivative matrix is calculated as follows:

Next, the partial derivative of the error E with respect to the bias vector of layer l is calculated:

For one of these elements, the calculation is as follows:

So, the entire bias derivative is calculated as follows:
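As a sketch of the standard per-sample results with an explicit bias:

    \frac{\partial E}{\partial W^{l}} = \delta^{l} \left( a^{l-1} \right)^{\mathsf{T}}, \qquad \frac{\partial E}{\partial b^{l}} = \delta^{l}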

3.3 Gradient for m samples (with no other penalty terms added)

As mentioned earlier, the cost function for a single sample is E. With m training samples, the overall cost function is the mean of the per-sample costs. Let Ei denote the cost function of the i-th training sample (i.e., the cost function used above) and let E denote the cost function over all samples; they are related as follows:

Then:

(18)

If there is a bias b, then:

(19)

With m samples, the quantities computed above become matrices whose columns hold, for each sample, the sensitivity and output values of layer l. The gradients for the m samples can then be computed as follows:

(1) When the bias is 1

(20)

(2) When the bias is b

(21)

(22)
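As a sketch, writing \delta^{l,(i)} and a^{l,(i)} for the sensitivity and activation of layer l on the i-th sample, the standard m-sample averages are:

    \frac{\partial E}{\partial W^{l}} = \frac{1}{m} \sum_{i=1}^{m} \delta^{l,(i)} \left( a^{l-1,(i)} \right)^{\mathsf{T}}, \qquad \frac{\partial E}{\partial b^{l}} = \frac{1}{m} \sum_{i=1}^{m} \delta^{l,(i)}

Stacking the per-sample columns into matrices \Delta^{l} and A^{l-1}, the weight gradient becomes \frac{1}{m} \Delta^{l} (A^{l-1})^{\mathsf{T}}.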

4 After adding regularization and sparsity terms

4.1 Network error

After adding the regularization and sparsity terms, the network error is calculated as follows:

(23)

where

J1, J2, and J3 are calculated as follows:

The relative entropy (KL divergence) of the j-th neuron in the k-th hidden layer is calculated as follows:

(24)

Here, one quantity is the activation of the j-th neuron of the k-th hidden layer for a single input sample, and the other is the mean of that neuron's activations over all input samples.
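As a sketch (assuming the formulation used in Ng's UFLDL sparse-autoencoder notes, with weight-decay coefficient \lambda, sparsity weight \beta, and sparsity target \rho):

    J = J_{1} + \frac{\lambda}{2} J_{2} + \beta J_{3}

    J_{1} = \frac{1}{m} \sum_{i=1}^{m} E_{i}, \qquad J_{2} = \sum_{l} \sum_{j,i} \left( w^{l}_{ji} \right)^{2}, \qquad J_{3} = \sum_{k} \sum_{j} \mathrm{KL}\!\left( \rho \,\|\, \hat{\rho}^{k}_{j} \right)

    \mathrm{KL}\!\left( \rho \,\|\, \hat{\rho}^{k}_{j} \right) = \rho \log \frac{\rho}{\hat{\rho}^{k}_{j}} + (1 - \rho) \log \frac{1 - \rho}{1 - \hat{\rho}^{k}_{j}}, \qquad \hat{\rho}^{k}_{j} = \frac{1}{m} \sum_{i=1}^{m} a^{k,(i)}_{j}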

4.2 Partial derivative of network cost function

The partial derivatives of the network cost function are calculated as follows:

where

(1) When the bias is 1

(25)

(2) When the bias is b

(26)
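As a sketch, under the same assumptions, the weight gradient gains a weight-decay term while the bias gradient is unchanged:

    \frac{\partial J}{\partial W^{l}} = \frac{1}{m} \sum_{i=1}^{m} \delta^{l,(i)} \left( a^{l-1,(i)} \right)^{\mathsf{T}} + \lambda W^{l}, \qquad \frac{\partial J}{\partial b^{l}} = \frac{1}{m} \sum_{i=1}^{m} \delta^{l,(i)}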

4.3 Calculation of sensitivity

After adding the weight penalty and sparsity term, the sensitivity calculation for the output layer is unchanged, while the sensitivity formula for the remaining layers becomes:

(27)
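As a sketch, under the same assumptions, the hidden-layer sensitivity gains a sparsity correction:

    \delta^{l-1} = \left( (W^{l})^{\mathsf{T}} \delta^{l} + \beta \left( -\frac{\rho}{\hat{\rho}^{\,l-1}} + \frac{1 - \rho}{1 - \hat{\rho}^{\,l-1}} \right) \right) \odot f'(z^{l-1})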

5 Calculation process
    1. Use the forward propagation algorithm to calculate the activation values of each layer.

    2. Calculate the cost function of the entire network, using the network-error formula above.

    3. Use the backpropagation algorithm to calculate the sensitivity of each layer.

    4. Calculate the gradients of the cost function with respect to the weight matrices and the bias terms, using the gradient formulas above (a code sketch of these steps is given after this list).
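The following is a minimal NumPy sketch of these four steps for the simple case: sigmoid activations, a squared-error cost, explicit bias vectors, and optional weight decay (no sparsity term). The function names and the small example network are illustrative, not taken from the original.

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def forward(Ws, bs, X):
        """Step 1: forward propagation. X holds one sample per column.
        Returns the activations of every layer (acts[0] is the input)."""
        acts = [X]
        for W, b in zip(Ws, bs):
            acts.append(sigmoid(W @ acts[-1] + b))   # a^l = f(W^l a^{l-1} + b^l)
        return acts

    def cost(acts, Y, Ws, lam=0.0):
        """Step 2: mean squared error over the m samples, plus an optional
        weight-decay (regularization) term."""
        m = Y.shape[1]
        err = 0.5 * np.sum((acts[-1] - Y) ** 2) / m
        reg = 0.5 * lam * sum(np.sum(W ** 2) for W in Ws)
        return err + reg

    def backprop(Ws, acts, Y, lam=0.0):
        """Steps 3 and 4: compute the sensitivities of each layer and the
        gradients of the cost with respect to every W^l and b^l."""
        m = Y.shape[1]
        grads_W = [None] * len(Ws)
        grads_b = [None] * len(Ws)
        a_out = acts[-1]
        # output-layer sensitivity: delta^L = (a^L - y) .* f'(z^L), with f' = a(1-a)
        delta = (a_out - Y) * a_out * (1.0 - a_out)
        for l in range(len(Ws) - 1, -1, -1):
            grads_W[l] = delta @ acts[l].T / m + lam * Ws[l]
            grads_b[l] = np.sum(delta, axis=1, keepdims=True) / m
            if l > 0:
                a_prev = acts[l]
                # recursion: delta^{l-1} = (W^l)^T delta^l .* f'(z^{l-1})
                delta = (Ws[l].T @ delta) * a_prev * (1.0 - a_prev)
        return grads_W, grads_b

    # usage: a 2-3-1 network on 5 random samples, one gradient-descent step
    rng = np.random.default_rng(0)
    sizes = [2, 3, 1]
    Ws = [rng.standard_normal((sizes[i + 1], sizes[i])) for i in range(len(sizes) - 1)]
    bs = [np.zeros((sizes[i + 1], 1)) for i in range(len(sizes) - 1)]
    X = rng.standard_normal((2, 5))
    Y = rng.random((1, 5))
    acts = forward(Ws, bs, X)
    print("cost before:", cost(acts, Y, Ws))
    gW, gb = backprop(Ws, acts, Y)
    Ws = [W - 0.5 * g for W, g in zip(Ws, gW)]
    bs = [b - 0.5 * g for b, g in zip(bs, gb)]
    print("cost after: ", cost(forward(Ws, bs, X), Y, Ws))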
