BP neural network algorithm Learning

Last Update:2014-09-20 Source: Internet

Author: User


A BP (back-propagation) network is a multi-layer feed-forward network trained with the error back-propagation algorithm. It was proposed in 1986 by a team of scientists led by Rumelhart and McClelland, and it is one of the most widely used neural network models. A BP network can learn and store a large number of input-output mapping relationships without the mathematical equations that describe those mappings having to be specified beforehand.

The structure of a neural network is shown below.

The topology of the BP neural network model consists of an input layer, one or more hidden layers, and an output layer. The number of neurons in the input layer is determined by the dimension of the sample attributes, and the number of neurons in the output layer is determined by the number of sample classes. The number of hidden layers and the number of neurons in each of them are specified by the user. Each layer contains several neurons, and each neuron has a threshold value that changes its activity. The arcs in the network represent the weights between the neurons of one layer and the neurons of the next layer. Each neuron has an input and an output. For the input layer, both the input and the output are the attribute values of the training sample.
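As a rough illustration of how the topology fixes the number of trainable values, the sketch below lists one weight-matrix shape per layer and counts weights plus thresholds. The function names and the example sizes (4 attributes, one hidden layer of 5 neurons, 3 classes) are assumptions chosen for illustration and do not come from the article.

```python
def layer_shapes(n_attributes, hidden_sizes, n_classes):
    """Return one (units, inputs) pair per hidden and output layer."""
    sizes = [n_attributes] + list(hidden_sizes) + [n_classes]
    return [(sizes[i + 1], sizes[i]) for i in range(len(sizes) - 1)]


def parameter_count(shapes):
    """Each unit has one weight per input plus one threshold."""
    return sum(units * inputs + units for units, inputs in shapes)


# Hypothetical network: 4 attributes, one hidden layer of 5 neurons, 3 classes.
shapes = layer_shapes(4, [5], 3)
print(shapes)                   # [(5, 4), (3, 5)]
print(parameter_count(shapes))  # 43
```

Only the hidden sizes are a free choice here; the first and last sizes follow from the data, as described above.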

For the hidden layers and the output layer, the net input of unit j is $I_j = \sum_i w_{ij} O_i + \theta_j$, where $w_{ij}$ is the weight of the connection from unit i on the previous layer to unit j, $O_i$ is the output of unit i on the previous layer, and $\theta_j$ is the threshold value of unit j.

The output of a neuron is calculated through an activation function, which represents the activity of the neuron. The activation function is generally the sigmoid (or logistic) function, so the output of a neuron is $O_j = \frac{1}{1 + e^{-I_j}}$.
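The computation for a single unit can be sketched in a few lines of Python. The function names and the numeric values are hypothetical and serve only to make the weighted sum and the sigmoid concrete.

```python
import math


def sigmoid(x):
    """Logistic (sigmoid) activation: maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def unit_output(prev_outputs, weights, threshold):
    """Return (net input, output) for one hidden- or output-layer unit."""
    net_input = sum(w * o for w, o in zip(weights, prev_outputs)) + threshold
    return net_input, sigmoid(net_input)


# Hypothetical values chosen only for illustration.
net, out = unit_output(prev_outputs=[1.0, 0.0, 1.0],
                       weights=[0.2, 0.4, -0.5],
                       threshold=-0.4)
print(round(net, 3), round(out, 3))  # -0.7 0.332
```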

In addition, a neural network has a learning rate (l), which usually takes a value between 0 and 1 and helps in finding the global minimum. If the learning rate is too small, learning will be very slow. If the learning rate is too large, the weights may oscillate between inadequate solutions.
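The effect of the learning rate can be seen on a toy problem. The sketch below runs plain gradient descent on f(w) = w**2. This function and the three rates are assumptions picked for illustration; it is not a neural network, but it shows the same slow and oscillating behaviour.

```python
def descend(learning_rate, steps=20, w=1.0):
    """Run gradient descent on f(w) = w**2 and return every visited w."""
    path = [w]
    for _ in range(steps):
        gradient = 2.0 * w              # derivative of w**2
        w = w - learning_rate * gradient
        path.append(w)
    return path


slow = descend(0.01)      # too small: still far from the minimum at 0
good = descend(0.4)       # moderate: converges quickly
swinging = descend(1.0)   # too large for this function: jumps between +1 and -1

print(slow[-1], good[-1], swinging[-3:])
```

In this toy case the largest rate never settles: each step overshoots the minimum by exactly the distance it started from.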

The basic elements of a neural network have now been explained. Let's look at the learning process of the BP algorithm:

BPTrain() {
    Initialize the weights and thresholds of the network.
    while (the termination condition is not met) {
        for each training sample X in samples {
            // Propagate the input forward
            for each unit j in the hidden or output layers {
                I_j = sum_i(w_ij * O_i) + theta_j;  // net input of unit j with respect to the previous layer i
                O_j = 1 / (1 + e^(-I_j));           // output of unit j
            }
            // Propagate the error backward
            for each unit j in the output layer {
                Err_j = O_j * (1 - O_j) * (T_j - O_j);  // compute the error; T_j is the true output
            }
            for each unit j in the hidden layers, from the last hidden layer to the first {
                Err_j = O_j * (1 - O_j) * sum_k(Err_k * w_jk);  // k is a neuron in the layer after j
            }
            for each weight w_ij in the network {
                delta_w_ij = l * Err_j * O_i;  // weight increment
                w_ij = w_ij + delta_w_ij;      // weight update
            }
            for each threshold theta_j in the network {
                delta_theta_j = l * Err_j;          // threshold increment
                theta_j = theta_j + delta_theta_j;  // threshold update
            }
        }
    }
}

The basic algorithm flow is:

1. Initialize the network weights and neuron thresholds (the simplest method is random initialization).

2. Forward propagation: calculate the input and output of the hidden-layer and output-layer neurons, layer by layer, according to the formulas.

3. Backward propagation: correct the weights and thresholds according to the formulas.

Steps 2 and 3 are repeated until the termination condition is met.
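The flow above can be sketched as a small, dependency-free Python program. This is a minimal sketch under stated assumptions and not the article's own implementation: the names (`init_network`, `train_sample`, `bp_train`), the initial range [-0.5, 0.5], the single hidden layer of 3 units, the fixed seed and the logical-OR toy data are all hypothetical. Weights are updated after every sample, using the standard sigmoid error terms.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def init_network(n_in, n_hidden, n_out, rng):
    """Step 1: random weights and thresholds in [-0.5, 0.5] (assumed range)."""
    def layer(n_prev, n_units):
        return {"w": [[rng.uniform(-0.5, 0.5) for _ in range(n_prev)]
                      for _ in range(n_units)],
                "theta": [rng.uniform(-0.5, 0.5) for _ in range(n_units)]}
    return [layer(n_in, n_hidden), layer(n_hidden, n_out)]


def forward(network, x):
    """Step 2: return the outputs of every layer, starting with the input."""
    outputs = [list(x)]
    for layer in network:
        prev = outputs[-1]
        outputs.append([sigmoid(sum(w * o for w, o in zip(ws, prev)) + theta)
                        for ws, theta in zip(layer["w"], layer["theta"])])
    return outputs


def train_sample(network, x, target, l):
    """Steps 2 and 3 for one sample: forward pass, errors, then updates."""
    outputs = forward(network, x)
    # Output layer: Err_j = O_j (1 - O_j) (T_j - O_j)
    errors = [[o * (1 - o) * (t - o) for o, t in zip(outputs[-1], target)]]
    # Hidden layers, last to first: Err_j = O_j (1 - O_j) sum_k Err_k w_jk
    for idx in range(len(network) - 1, 0, -1):
        next_w, next_err = network[idx]["w"], errors[0]
        hidden = []
        for j, o in enumerate(outputs[idx]):
            back = sum(next_err[k] * next_w[k][j] for k in range(len(next_w)))
            hidden.append(o * (1 - o) * back)
        errors.insert(0, hidden)
    # Updates: w_ij += l Err_j O_i and theta_j += l Err_j
    for idx, layer in enumerate(network):
        for j, err in enumerate(errors[idx]):
            for i, o in enumerate(outputs[idx]):
                layer["w"][j][i] += l * err * o
            layer["theta"][j] += l * err


def mse(network, samples):
    total = sum((t - o) ** 2
                for x, target in samples
                for t, o in zip(target, forward(network, x)[-1]))
    return total / len(samples)


def bp_train(samples, n_hidden=3, l=0.5, max_epochs=2000, mse_goal=0.01, seed=0):
    rng = random.Random(seed)
    n_in, n_out = len(samples[0][0]), len(samples[0][1])
    network = init_network(n_in, n_hidden, n_out, rng)
    for epoch in range(1, max_epochs + 1):
        for x, target in samples:
            train_sample(network, x, target, l)
        if mse(network, samples) < mse_goal:  # one possible termination test
            break
    return network, epoch


# Toy data (assumption): the logical OR of two inputs.
samples = [([0, 0], [0]), ([0, 1], [1]), ([1, 0], [1]), ([1, 1], [1])]
network, epochs = bp_train(samples)
print(epochs, mse(network, samples))
```

Keeping the layers in a list means the same loops would also handle more than one hidden layer, although `init_network` as written builds only one.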

Note the following points in the algorithm:

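1. The neuron error $Err_j$.

For a neuron in the output layer, $Err_j = O_j (1 - O_j)(T_j - O_j)$, where $O_j$ is the actual output of unit j and $T_j$ is the true output of unit j, based on the known class label of the given training sample.

For a neuron in a hidden layer, $Err_j = O_j (1 - O_j) \sum_k Err_k w_{jk}$, where $w_{jk}$ is the weight of the connection from unit j to unit k in the next higher layer and $Err_k$ is the error of unit k.

The weight increment is $\Delta w_{ij} = l \cdot Err_j \cdot O_i$ and the threshold increment is $\Delta \theta_j = l \cdot Err_j$, where $l$ is the learning rate and $O_i$ is the output of unit i on the previous layer.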

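These formulas are derived with the gradient descent algorithm. The derivation starts from the requirement that the squared error of the output units be minimized: $E = \frac{1}{2} \sum_{p=1}^{P} \sum_{j=1}^{M} (T_j^{(p)} - O_j^{(p)})^2$, where P is the total number of samples, M is the number of neurons in the output layer, $T_j^{(p)}$ is the actual output for sample p, and $O_j^{(p)}$ is the output of the neural network for that sample. The factor $\frac{1}{2}$ is the usual convention and cancels on differentiation. The sample index p is dropped below.

The idea of gradient descent is to take the derivative of E with respect to $w_{ij}$ and move the weight against it: $\Delta w_{ij} = -l \frac{\partial E}{\partial w_{ij}}$. Here $O_i$ is the output of unit i on the previous layer, $I_j$ is the net input of unit j, and $O_j$ is the sigmoid of $I_j$, so that $\frac{\partial O_j}{\partial I_j} = O_j (1 - O_j)$ and $\frac{\partial I_j}{\partial w_{ij}} = O_i$.

For the output layer:

$\frac{\partial E}{\partial w_{ij}} = \frac{\partial E}{\partial O_j} \cdot \frac{\partial O_j}{\partial I_j} \cdot \frac{\partial I_j}{\partial w_{ij}} = -(T_j - O_j) \cdot O_j (1 - O_j) \cdot O_i$

That is, $\Delta w_{ij} = l \cdot Err_j \cdot O_i$ with $Err_j = O_j (1 - O_j)(T_j - O_j)$.

For hidden layers, with $Err_k = -\frac{\partial E}{\partial I_k}$ for each unit k in the next layer:

$\frac{\partial E}{\partial w_{ij}} = \sum_k \frac{\partial E}{\partial I_k} \cdot \frac{\partial I_k}{\partial O_j} \cdot \frac{\partial O_j}{\partial I_j} \cdot \frac{\partial I_j}{\partial w_{ij}} = -\left( \sum_k Err_k w_{jk} \right) \cdot O_j (1 - O_j) \cdot O_i$

so $Err_j = O_j (1 - O_j) \sum_k Err_k w_{jk}$ is the formula for calculating the error of a hidden-layer unit.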
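One way to gain confidence in the output-layer result is to compare it with a numerical derivative. The sketch below does this for a single sigmoid output unit, assuming the error E = 1/2 (T_j - O_j)^2 for one sample. All numeric values and names are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def squared_error(weights, threshold, inputs, target):
    """E = 1/2 (T_j - O_j)^2 for one sigmoid output unit and one sample."""
    net = sum(w * o for w, o in zip(weights, inputs)) + threshold
    return 0.5 * (target - sigmoid(net)) ** 2


# Hypothetical values chosen only for illustration.
inputs = [0.3, 0.9]      # O_i: outputs of the previous layer
weights = [0.1, -0.2]    # w_ij
threshold = 0.05         # theta_j
target = 1.0             # T_j

# Analytic gradient: dE/dw_ij = -Err_j * O_i
net = sum(w * o for w, o in zip(weights, inputs)) + threshold
out = sigmoid(net)
err = out * (1 - out) * (target - out)
analytic = [-err * o for o in inputs]

# Numerical gradient by central differences
eps = 1e-6
numeric = []
for i in range(len(weights)):
    plus = list(weights)
    minus = list(weights)
    plus[i] += eps
    minus[i] -= eps
    numeric.append((squared_error(plus, threshold, inputs, target)
                    - squared_error(minus, threshold, inputs, target)) / (2 * eps))

print(analytic)
print(numeric)
```

The two lists should agree to many decimal places. A mismatch would indicate a sign or index error in the error term.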

2. The termination condition can take several forms:

§ All weight increments $\Delta w_{ij}$ in the previous epoch are smaller than a specified threshold.

§ The percentage of samples misclassified in the previous epoch is below a threshold.

§ The number of epochs exceeds a predefined limit.

§ The mean squared error between the output of the neural network and the actual output value is below a threshold.

In general, the last termination condition gives higher accuracy.
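The four conditions can be written as a small helper. This is only a sketch: the function name, the return labels and every default threshold are assumptions. In practice one would typically use a single condition or a combination suited to the task.

```python
def stop_reason(max_weight_change, error_rate, epoch, mse,
                min_weight_change=1e-4, max_error_rate=0.05,
                max_epochs=1000, mse_goal=0.01):
    """Return a label for the first stopping condition that holds, else None.

    max_weight_change: largest |delta w_ij| seen in the previous epoch
    error_rate:        fraction of samples misclassified in the previous epoch
    epoch:             number of epochs completed so far
    mse:               mean squared error between network and actual outputs
    """
    if max_weight_change < min_weight_change:
        return "weight increments below threshold"
    if error_rate < max_error_rate:
        return "misclassification rate below threshold"
    if epoch > max_epochs:
        return "epoch limit exceeded"
    if mse < mse_goal:
        return "mean squared error below threshold"
    return None


print(stop_reason(0.2, 0.30, 12, 0.08))    # None: keep training
print(stop_reason(0.2, 0.30, 12, 0.004))   # mean squared error below threshold
print(stop_reason(0.2, 0.30, 1001, 0.08))  # epoch limit exceeded
```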

Some practical issues also arise when actually using a BP neural network:

1. Sample processing. If there are only two classes, the target outputs are 0 and 1, but the sigmoid function reaches 0 and 1 only as its input tends to negative and positive infinity. The criterion can therefore be relaxed as needed: an output greater than 0.9 is regarded as 1, and an output less than 0.1 is regarded as 0. The input samples also need to be normalized.

2. Network structure selection. The number of hidden layers and the number of neurons in them determine the scale of the network, which is closely related to learning performance. A large network requires a large amount of computation and may lead to overfitting, whereas a small network may lead to underfitting.

3. Selection of initial weights and thresholds. The initial values affect the learning result, so it is important to choose appropriate ones.

4. Incremental learning and batch learning. The mathematical derivation above, which sums the error over all samples, is based on batch learning. Batch learning is suitable for offline learning and has good learning stability. Incremental learning is used for online learning; it is more sensitive to noise in the input samples and is not suitable for input patterns that change drastically.

5. Other choices of activation function and error function are also possible.
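The sample-processing point (item 1) can be illustrated with two small helpers. The article does not say which normalization to use, so min-max scaling to [0, 1] is an assumption here, and both function names are hypothetical. The 0.9 and 0.1 cut-offs are the ones mentioned above.

```python
def min_max_normalize(rows):
    """Scale every attribute (column) of the samples into [0, 1]."""
    columns = list(zip(*rows))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    return [[(v - lo) / (hi - lo) if hi > lo else 0.0
             for v, lo, hi in zip(row, lows, highs)]
            for row in rows]


def decode_output(value, high=0.9, low=0.1):
    """Relaxed two-class decision: 1, 0, or None when the output is undecided."""
    if value > high:
        return 1
    if value < low:
        return 0
    return None


raw = [[1, 10], [3, 20], [5, 30]]
print(min_max_normalize(raw))  # [[0.0, 0.0], [0.5, 0.5], [1.0, 1.0]]
print(decode_output(0.95), decode_output(0.05), decode_output(0.5))  # 1 0 None
```

Returning None for outputs between the two cut-offs is a design choice of this sketch; a caller could instead fall back to a 0.5 threshold.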

In general, the BP algorithm offers more options than the limit algorithm, and it usually leaves more room for optimization on specific training data.


This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
