Fully-connected BP neural network


This article describes the forward propagation and error backpropagation of a fully connected BP neural network, using the notation from Andrew Ng's machine learning course. A diagram of a fully connected neural network is given.

1 Forward propagation

1.1 Forward propagation

Compute the input and output of the neurons in layer l.

1.1.1 When the bias is 1

Vector form:

Component form:
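As a sketch of the standard formulas (assuming Ng's convention of treating the bias as an extra input fixed at 1 with its own weight), the vector and component forms are commonly written as:

    z^{l} = W^{l} a^{l-1}, \qquad a^{l} = f(z^{l}), \qquad \text{with } a^{l-1}_{0} = 1

    z^{l}_{j} = \sum_{i} w^{l}_{ji} a^{l-1}_{i}, \qquad a^{l}_{j} = f(z^{l}_{j})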

1.1.2 When the bias is b

Vector form:

Component form:
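As a sketch of the standard formulas with an explicit bias vector b^l, the vector and component forms are commonly written as:

    z^{l} = W^{l} a^{l-1} + b^{l}, \qquad a^{l} = f(z^{l})

    z^{l}_{j} = \sum_{i} w^{l}_{ji} a^{l-1}_{i} + b^{l}_{j}, \qquad a^{l}_{j} = f(z^{l}_{j})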

1.2 Network error

1.2.1 When the bias is 1

For an input sample, given the network's output and the corresponding desired output, the error E for that sample is

(1)

Note that the output of the k-th neuron in the output layer can be calculated as follows:

(2)

The error E can then be expanded in terms of the hidden layer (layer L-1):

(3)

Note also that the output of the j-th neuron of the hidden layer (layer L-1) can be calculated as follows:

(4)

The error E can then be expanded further, to the hidden layer (layer L-2):

(5)

It can be seen that E is a function of the weights.
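As a sketch of the standard forms these expressions take (assuming the usual squared-error cost and the notation above):

    E = \frac{1}{2} \sum_{k} \left( y_{k} - a^{L}_{k} \right)^{2}

    a^{L}_{k} = f\!\left( \sum_{j} w^{L}_{kj} a^{L-1}_{j} \right)

    E = \frac{1}{2} \sum_{k} \left( y_{k} - f\!\left( \sum_{j} w^{L}_{kj} \, f\!\left( \sum_{i} w^{L-1}_{ji} a^{L-2}_{i} \right) \right) \right)^{2}

Expanding the hidden-layer activations in the same way exposes the weights of every earlier layer, which is why E depends on all the weights in the network.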

1.2.2 When the bias is b

For an input sample, given the network's output and the corresponding desired output, the error E for that sample is

(6)

Note that the output of the k-th neuron in the output layer can be calculated as follows:

(7)

The error E can then be expanded in terms of the hidden layer (layer L-1):

(8)

Note also that the output of the j-th neuron of the hidden layer (layer L-1) can be calculated as follows:

(9)

The error E can then be expanded further, to the hidden layer (layer L-2):

(10)

It can be seen that E is a function of the weights and biases.
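As a sketch with an explicit bias, the corresponding form is:

    E = \frac{1}{2} \sum_{k} \left( y_{k} - f\!\left( \sum_{j} w^{L}_{kj} a^{L-1}_{j} + b^{L}_{k} \right) \right)^{2}

Expanding the hidden-layer activations further shows that E depends on all the weights and all the biases.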

2 Sensitivity in error backpropagation

The sensitivity of a layer is defined as the partial derivative of the network error with respect to that layer's input, i.e.

2.1 When the bias is 1

2.1.1 Sensitivity of the output layer

The sensitivity of the k-th neuron in the output layer (layer L) is defined as follows:

To calculate this sensitivity, the chain rule is applied, introducing the output of the k-th neuron in layer L as an intermediate variable:

(11)

First, calculate:

Then, calculate:

Here f is the sigmoid function, so:

From this we obtain:

(12)

Then, the sensitivity of all neurons in layer L is:

(13)
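As a sketch of the standard result (assuming the squared-error cost and sigmoid activation above), the output-layer sensitivity works out to:

    \delta^{L}_{k} = \frac{\partial E}{\partial z^{L}_{k}} = \left( a^{L}_{k} - y_{k} \right) f'(z^{L}_{k}), \qquad f'(z) = f(z)\left( 1 - f(z) \right)

    \delta^{L} = \left( a^{L} - y \right) \odot f'(z^{L})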

2.1.2 Other Layers

The sensitivity of the j-th neuron in layer L-1 is calculated as follows:

To calculate this sensitivity, the chain rule is applied, introducing the output of the j-th neuron in layer L-1 as an intermediate variable:

(14)

First, calculate:

where

Then:

Then, calculate:

From this we obtain:

(15)

where

So, the sensitivity of all neurons in layer L-1 is

(16)

The above derivation computes the sensitivity of layer l-1 from the sensitivity of layer l, so recursion can be used to obtain the sensitivities of all layers (l = L-1, ..., 2):

(17)
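As a sketch of the standard recursion (assuming, as above, that W^l denotes the weights feeding layer l):

    \delta^{L-1}_{j} = \left( \sum_{k} w^{L}_{kj} \, \delta^{L}_{k} \right) f'(z^{L-1}_{j})

    \delta^{l-1} = \left( (W^{l})^{\mathsf{T}} \delta^{l} \right) \odot f'(z^{l-1}), \qquad l = L, L-1, \ldots, 3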

2.2 Sensitivity when the bias is b

In this derivation only one thing changes, namely the expression for the hidden-layer input; the result is the same, so the final sensitivity formula is unaffected:

3 Gradient calculation

3.1 Gradient for a single sample (when the bias is 1)

In this case the only parameters to be optimized are the elements of the weight matrices, so we compute the partial derivative of the error E with respect to the weight matrix of layer l:

For one of these elements, the calculation is as follows:

So, the entire derivative matrix is calculated as follows:

That is:
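As a sketch, with W^l the weight matrix feeding layer l and the sensitivities defined above, the standard result is:

    \frac{\partial E}{\partial w^{l}_{ji}} = \delta^{l}_{j} \, a^{l-1}_{i}, \qquad \frac{\partial E}{\partial W^{l}} = \delta^{l} \left( a^{l-1} \right)^{\mathsf{T}}

(with a^{l-1}_{0} = 1 supplying the bias column).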

3.2 Gradient for a single sample (when the bias is b)

In this case the parameters to be optimized are the elements of the weight matrices and the biases b.

First, the partial derivative of the error E with respect to the weight matrix of layer l is calculated:

For one of these elements, the calculation is as follows:

So, the entire derivative matrix is calculated as follows:

Next, the partial derivative of the error E with respect to the bias vector of layer l is calculated:

For one of these elements, the calculation is as follows:

So, the entire bias derivative is calculated as follows:
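As a sketch of the standard per-sample results with an explicit bias:

    \frac{\partial E}{\partial W^{l}} = \delta^{l} \left( a^{l-1} \right)^{\mathsf{T}}, \qquad \frac{\partial E}{\partial b^{l}} = \delta^{l}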

3.3 Gradient for m samples (with no other penalty terms added)

As mentioned earlier, the cost function for a single sample is E. With m training samples, the overall cost function is the mean of the per-sample costs. Let Ei denote the cost function of the i-th training sample (i.e., the cost function used above) and let E denote the cost function over all samples; they are related as follows:

Then:

(18)

If there is a bias b, then:

(19)

With m samples, the quantities computed above become matrices whose columns hold, for each sample, the sensitivity and output values of layer l. The gradients for the m samples can then be computed as follows:

(1) When the bias is 1

(20)

(2) When the bias is b

(21)

(22)
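As a sketch, writing \delta^{l,(i)} and a^{l,(i)} for the sensitivity and activation of layer l on the i-th sample, the standard m-sample averages are:

    \frac{\partial E}{\partial W^{l}} = \frac{1}{m} \sum_{i=1}^{m} \delta^{l,(i)} \left( a^{l-1,(i)} \right)^{\mathsf{T}}, \qquad \frac{\partial E}{\partial b^{l}} = \frac{1}{m} \sum_{i=1}^{m} \delta^{l,(i)}

Stacking the per-sample columns into matrices \Delta^{l} and A^{l-1}, the weight gradient becomes \frac{1}{m} \Delta^{l} (A^{l-1})^{\mathsf{T}}.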

4 After adding regularization and sparsity terms

4.1 Network error

After adding the regularization and sparsity terms, the network error is calculated as follows:

(23)

where

J1, J2, and J3 are calculated as follows:

The relative entropy (KL divergence) of the j-th neuron in the k-th hidden layer is calculated as follows:

(24)

Here, one quantity is the activation of the j-th neuron of the k-th hidden layer for a single input sample, and the other is the mean of that neuron's activations over all input samples.
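As a sketch (assuming the formulation used in Ng's UFLDL sparse-autoencoder notes, with weight-decay coefficient \lambda, sparsity weight \beta, and sparsity target \rho):

    J = J_{1} + \frac{\lambda}{2} J_{2} + \beta J_{3}

    J_{1} = \frac{1}{m} \sum_{i=1}^{m} E_{i}, \qquad J_{2} = \sum_{l} \sum_{j,i} \left( w^{l}_{ji} \right)^{2}, \qquad J_{3} = \sum_{k} \sum_{j} \mathrm{KL}\!\left( \rho \,\|\, \hat{\rho}^{k}_{j} \right)

    \mathrm{KL}\!\left( \rho \,\|\, \hat{\rho}^{k}_{j} \right) = \rho \log \frac{\rho}{\hat{\rho}^{k}_{j}} + (1 - \rho) \log \frac{1 - \rho}{1 - \hat{\rho}^{k}_{j}}, \qquad \hat{\rho}^{k}_{j} = \frac{1}{m} \sum_{i=1}^{m} a^{k,(i)}_{j}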

4.2 Partial derivative of network cost function

The partial derivatives of the network cost function are calculated as follows:

where

(1) When the bias is 1

(25)

(2) When the bias is b

(26)
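As a sketch, under the same assumptions, the weight gradient gains a weight-decay term while the bias gradient is unchanged:

    \frac{\partial J}{\partial W^{l}} = \frac{1}{m} \sum_{i=1}^{m} \delta^{l,(i)} \left( a^{l-1,(i)} \right)^{\mathsf{T}} + \lambda W^{l}, \qquad \frac{\partial J}{\partial b^{l}} = \frac{1}{m} \sum_{i=1}^{m} \delta^{l,(i)}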

4.3 Calculation of sensitivity

After adding the weight penalty and sparsity term, the sensitivity calculation for the output layer is unchanged, while the sensitivity formula for the remaining layers becomes:

(27)
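As a sketch, under the same assumptions, the hidden-layer sensitivity gains a sparsity correction:

    \delta^{l-1} = \left( (W^{l})^{\mathsf{T}} \delta^{l} + \beta \left( -\frac{\rho}{\hat{\rho}^{\,l-1}} + \frac{1 - \rho}{1 - \hat{\rho}^{\,l-1}} \right) \right) \odot f'(z^{l-1})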

5 Calculation process
    1. Use the forward propagation algorithm to calculate the activation values of each layer.

    2. Calculate the cost function of the entire network, using the network-error formula above.

    3. Use the backpropagation algorithm to calculate the sensitivity of each layer.

    4. Calculate the gradients of the cost function with respect to the weight matrices and the bias terms, using the gradient formulas above (a code sketch of these steps is given after this list).
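The following is a minimal NumPy sketch of these four steps for the simple case: sigmoid activations, a squared-error cost, explicit bias vectors, and optional weight decay (no sparsity term). The function names and the small example network are illustrative, not taken from the original.

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def forward(Ws, bs, X):
        """Step 1: forward propagation. X holds one sample per column.
        Returns the activations of every layer (acts[0] is the input)."""
        acts = [X]
        for W, b in zip(Ws, bs):
            acts.append(sigmoid(W @ acts[-1] + b))   # a^l = f(W^l a^{l-1} + b^l)
        return acts

    def cost(acts, Y, Ws, lam=0.0):
        """Step 2: mean squared error over the m samples, plus an optional
        weight-decay (regularization) term."""
        m = Y.shape[1]
        err = 0.5 * np.sum((acts[-1] - Y) ** 2) / m
        reg = 0.5 * lam * sum(np.sum(W ** 2) for W in Ws)
        return err + reg

    def backprop(Ws, acts, Y, lam=0.0):
        """Steps 3 and 4: compute the sensitivities of each layer and the
        gradients of the cost with respect to every W^l and b^l."""
        m = Y.shape[1]
        grads_W = [None] * len(Ws)
        grads_b = [None] * len(Ws)
        a_out = acts[-1]
        # output-layer sensitivity: delta^L = (a^L - y) .* f'(z^L), with f' = a(1-a)
        delta = (a_out - Y) * a_out * (1.0 - a_out)
        for l in range(len(Ws) - 1, -1, -1):
            grads_W[l] = delta @ acts[l].T / m + lam * Ws[l]
            grads_b[l] = np.sum(delta, axis=1, keepdims=True) / m
            if l > 0:
                a_prev = acts[l]
                # recursion: delta^{l-1} = (W^l)^T delta^l .* f'(z^{l-1})
                delta = (Ws[l].T @ delta) * a_prev * (1.0 - a_prev)
        return grads_W, grads_b

    # usage: a 2-3-1 network on 5 random samples, one gradient-descent step
    rng = np.random.default_rng(0)
    sizes = [2, 3, 1]
    Ws = [rng.standard_normal((sizes[i + 1], sizes[i])) for i in range(len(sizes) - 1)]
    bs = [np.zeros((sizes[i + 1], 1)) for i in range(len(sizes) - 1)]
    X = rng.standard_normal((2, 5))
    Y = rng.random((1, 5))
    acts = forward(Ws, bs, X)
    print("cost before:", cost(acts, Y, Ws))
    gW, gb = backprop(Ws, acts, Y)
    Ws = [W - 0.5 * g for W, g in zip(Ws, gW)]
    bs = [b - 0.5 * g for b, g in zip(bs, gb)]
    print("cost after: ", cost(forward(Ws, bs, X), Y, Ws))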
