The BP (Back Propagation) network was proposed in 1986 by a team of scientists headed by Rumelhart and McClelland. It is a multi-layer feed-forward network trained by the error back-propagation algorithm and is one of the most widely used neural network models. A BP network can learn and store a large number of input-output mapping relationships without requiring the mathematical equations that describe these mappings to be given beforehand.
The structure of a neural network is as follows.
The topology of the BP neural network model includes the input layer, the hidden layer, and the output layer. The number of neurons in the input layer is determined by the dimension of the sample attributes, and the number of neurons in the output layer is determined by the number of sample classes. The number of hidden layers and the number of neurons in each are specified by the user. Each layer contains several neurons, and each neuron carries a threshold value that modifies its activity. An arc in the network represents the weight between a neuron in one layer and a neuron in the next layer. Each neuron has an input and an output; for the input layer, both are simply the attribute values of the training sample.
The net input to a hidden-layer or output-layer unit $j$ is $I_j = \sum_i w_{ij} O_i + \theta_j$, where $w_{ij}$ is the weight of the connection from unit $i$ in the previous layer to unit $j$, $O_i$ is the output of unit $i$ in the previous layer, and $\theta_j$ is the threshold of unit $j$.
The output of each neuron is computed by applying an activation function to its net input; this function represents the activity of the neuron. The activation function is generally the sigmoid (logistic) function, so the output of a neuron is $O_j = \frac{1}{1 + e^{-I_j}}$.
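As a minimal sketch of these two formulas (Python/NumPy, with made-up values for the previous-layer outputs, weights, and threshold):

import numpy as np

def sigmoid(x):
    # Logistic activation: squashes the net input into (0, 1).
    return 1.0 / (1.0 + np.exp(-x))

# Hypothetical values: outputs O_i of three previous-layer units,
# the weights w_ij into unit j, and its threshold theta_j.
O_prev  = np.array([0.2, 0.7, 0.5])
w_ij    = np.array([0.4, -0.6, 0.1])
theta_j = 0.3

I_j = np.dot(w_ij, O_prev) + theta_j  # net input I_j = sum_i w_ij * O_i + theta_j
O_j = sigmoid(I_j)                    # output O_j = 1 / (1 + e^(-I_j))
print(I_j, O_j)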
In addition, a neural network has a learning rate $l$, which usually takes a value between 0 and 1 and helps the search find the global minimum. If the learning rate is too small, learning is slow; if it is too large, the weights may oscillate between inadequate solutions.
With the basic elements of the network explained, let's look at the learning process of the BP algorithm:
BPtrain() {
    Initialize the network weights and thresholds.
    while the termination condition is not met {
        for each training sample X {
            // Forward propagation of the input
            for each unit j in a hidden or output layer {
                I_j = sum_i(w_ij * O_i) + theta_j;  // net input of unit j from the previous layer i
                O_j = 1 / (1 + e^(-I_j));           // output of unit j
            }
            // Backward propagation of the error
            for each unit j in the output layer {
                Err_j = O_j * (1 - O_j) * (T_j - O_j);  // error of an output unit
            }
            for each unit j in a hidden layer, from the last to the first hidden layer {
                Err_j = O_j * (1 - O_j) * sum_k(Err_k * w_jk);  // k ranges over the units in the layer after j
            }
            for each weight w_ij in the network {
                delta_w_ij = l * Err_j * O_i;  // weight increment
                w_ij = w_ij + delta_w_ij;      // weight update
            }
            for each threshold theta_j in the network {
                delta_theta_j = l * Err_j;          // threshold increment
                theta_j = theta_j + delta_theta_j;  // threshold update
            }
        }
    }
}
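The pseudocode above translates almost line for line into NumPy. The following is a sketch, not the original author's code: one hidden layer, sigmoid units throughout, per-sample updates, and an illustrative XOR data set (the function name, hyperparameters, and data are all assumptions made for the example).

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def bp_train(X, T, n_hidden=4, l=0.5, max_epochs=10000, tol=1e-3, seed=0):
    # Train a one-hidden-layer BP network with per-sample updates.
    # X: (P, n_in) inputs; T: (P, n_out) targets in [0, 1].
    rng = np.random.default_rng(seed)
    n_in, n_out = X.shape[1], T.shape[1]
    # 1. Random initialization of weights and thresholds (small values).
    W1  = rng.uniform(-0.5, 0.5, (n_in, n_hidden))
    th1 = rng.uniform(-0.5, 0.5, n_hidden)
    W2  = rng.uniform(-0.5, 0.5, (n_hidden, n_out))
    th2 = rng.uniform(-0.5, 0.5, n_out)

    for epoch in range(max_epochs):
        sse = 0.0
        for x, t in zip(X, T):
            # 2. Forward propagation: net input and output of each layer.
            O_h = sigmoid(x @ W1 + th1)    # hidden-layer outputs
            O_o = sigmoid(O_h @ W2 + th2)  # output-layer outputs
            sse += np.sum((t - O_o) ** 2)
            # 3. Backward propagation of the error.
            err_o = O_o * (1 - O_o) * (t - O_o)    # output-layer error
            err_h = O_h * (1 - O_h) * (W2 @ err_o) # hidden-layer error
            # Weight and threshold updates (gradient-descent step, rate l).
            W2 += l * np.outer(O_h, err_o); th2 += l * err_o
            W1 += l * np.outer(x, err_h);   th1 += l * err_h
        if sse / len(X) < tol:  # terminate when the mean squared error is small
            break
    return W1, th1, W2, th2

# Hypothetical usage: learn XOR.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], float)
T = np.array([[0], [1], [1], [0]], float)
W1, th1, W2, th2 = bp_train(X, T)
pred = sigmoid(sigmoid(X @ W1 + th1) @ W2 + th2)
print(pred.round(2))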
The basic algorithm flow is:
1. Initialize the network weights and neuron thresholds (the simplest method is random initialization).
2. Forward propagation: compute the net input and output of the hidden-layer and output-layer neurons using the formulas above.
3. Back propagation: modify the weights and thresholds using the formulas above.
Steps 2 and 3 repeat until the termination condition is met.
Note the following points in the algorithm:
1. The neuron error.
For output-layer neurons: $Err_j = O_j(1 - O_j)(T_j - O_j)$, where $O_j$ is the actual output of unit $j$ and $T_j$ is the true output, taken from the known class label of the given training sample.
For hidden-layer neurons: $Err_j = O_j(1 - O_j)\sum_k Err_k\, w_{jk}$, where $w_{jk}$ is the weight of the connection from unit $j$ to unit $k$ in the next higher layer and $Err_k$ is the error of unit $k$.
The weight increment is $\Delta w_{ij} = l\, Err_j\, O_i$ and the threshold increment is $\Delta\theta_j = l\, Err_j$, where $l$ is the learning rate.
These formulas are derived with the gradient descent algorithm, under the premise of minimizing the mean squared error of the output units: $E = \frac{1}{2}\sum_{p=1}^{P}\sum_{m=1}^{M}(T_{pm} - O_{pm})^2$, where $P$ is the total number of samples, $M$ is the number of output-layer neurons, $T_{pm}$ is the actual (target) output of sample $p$ at unit $m$, and $O_{pm}$ is the corresponding network output.
The idea of gradient descent is to differentiate the error with respect to each weight and step against the gradient: $\Delta w_{ij} = -l\,\partial E / \partial w_{ij}$.
For the output layer: $\frac{\partial E}{\partial w_{ij}} = \frac{\partial E}{\partial O_j}\frac{\partial O_j}{\partial I_j}\frac{\partial I_j}{\partial w_{ij}} = -(T_j - O_j)\,O_j(1 - O_j)\,O_i$, which gives $\Delta w_{ij} = l\, Err_j\, O_i$.
For hidden layers, the error flows back through every unit $k$ of the next layer: $\frac{\partial E}{\partial O_j} = \sum_k \frac{\partial E}{\partial I_k}\frac{\partial I_k}{\partial O_j} = -\sum_k Err_k\, w_{jk}$, where $Err_j = O_j(1 - O_j)\sum_k Err_k\, w_{jk}$ is the formula for the hidden-layer error.
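To sanity-check this derivation, the analytic gradient $-Err_j\,O_i$ can be compared with a finite-difference estimate of $\partial E/\partial w$. A sketch for a single sigmoid output unit, with made-up numbers:

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def loss(w, theta, O_prev, T):
    # Squared error E = 1/2 (T - O_j)^2 of a single sigmoid unit.
    O_j = sigmoid(np.dot(w, O_prev) + theta)
    return 0.5 * (T - O_j) ** 2

# Made-up values for one output unit.
O_prev = np.array([0.2, 0.7, 0.5])
w = np.array([0.4, -0.6, 0.1])
theta, T = 0.3, 1.0

# Analytic gradient: dE/dw_ij = -(T - O_j) O_j (1 - O_j) O_i = -Err_j * O_i
O_j = sigmoid(np.dot(w, O_prev) + theta)
err_j = O_j * (1 - O_j) * (T - O_j)
analytic = -err_j * O_prev

# Numerical gradient by central differences.
eps = 1e-6
numeric = np.zeros_like(w)
for i in range(len(w)):
    wp, wm = w.copy(), w.copy()
    wp[i] += eps; wm[i] -= eps
    numeric[i] = (loss(wp, theta, O_prev, T) - loss(wm, theta, O_prev, T)) / (2 * eps)

print(np.allclose(analytic, numeric, atol=1e-8))  # expect True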
2. The termination condition can take several forms (a sketch combining them follows the list):
§ All weight increments $\Delta w_{ij}$ in the previous epoch were smaller than a specified threshold.
§ The percentage of samples misclassified in the previous epoch is below a threshold.
§ The number of epochs exceeds a predefined limit.
§ The mean squared error between the network output and the actual output falls below a threshold.
Generally, the last termination condition gives the higher accuracy.
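A sketch of how the four checks might be combined in a training loop (the names and threshold values here are illustrative, not from the original):

def should_stop(epoch, max_epochs, max_delta_w, misclassified_pct, mse,
                delta_tol=1e-5, err_pct_tol=0.05, mse_tol=1e-3):
    # Stop as soon as any of the four conditions above holds.
    return (max_delta_w < delta_tol             # all weight increments tiny
            or misclassified_pct < err_pct_tol  # few misclassified samples
            or epoch >= max_epochs              # epoch limit reached
            or mse < mse_tol)                   # mean squared error small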
Several practical issues arise when actually using a BP neural network:
1. Sample processing. For a two-class problem the target outputs are 0 and 1, but a sigmoid output only reaches 0 or 1 as the net input tends to negative or positive infinity. The condition can therefore be relaxed as needed: an output greater than 0.9 is treated as 1, and an output less than 0.1 is treated as 0. The input samples also need to be normalized. A sketch of both steps follows.
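Here min-max scaling is one possible normalization; the 0.9/0.1 cutoffs follow the text, while the data and function names are illustrative:

import numpy as np

def normalize(X):
    # Min-max scale each input attribute to [0, 1] (one common choice).
    mn, mx = X.min(axis=0), X.max(axis=0)
    return (X - mn) / (mx - mn)

def to_label(output):
    # Relaxed decoding: > 0.9 counts as 1, < 0.1 counts as 0, else undecided.
    if output > 0.9:
        return 1
    if output < 0.1:
        return 0
    return None  # the network is not confident enough yet

X = np.array([[1.0, 200.0], [2.0, 400.0], [3.0, 600.0]])
print(normalize(X))
print(to_label(0.95), to_label(0.03), to_label(0.5))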
2. Choosing the network structure. The number of hidden layers and the number of neurons in each determine the network scale, which is closely tied to learning performance: a large network costs more computation and may overfit, while a small one may underfit.
3. Choosing the initial weights and thresholds. The initial values affect the learning result, so selecting appropriate initial values is also important.
4. Incremental learning versus batch learning. The algorithm and derivation above are written for batch learning. Batch learning suits offline training and has good stability; incremental learning suits online training, but it is sensitive to noise in the input samples and unsuitable for inputs that change drastically. The sketch below contrasts the two update modes.
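Schematically, the two modes differ only in where the update is applied; backprop below is a hypothetical helper that returns the weight increments for one sample:

def train_incremental(weights, samples, targets, l, backprop):
    # Online mode: apply the increments after every single sample.
    for x, t in zip(samples, targets):
        for key, grad in backprop(weights, x, t).items():
            weights[key] = weights[key] + l * grad
    return weights

def train_batch(weights, samples, targets, l, backprop):
    # Batch mode: accumulate increments over all samples, apply once.
    total = {}
    for x, t in zip(samples, targets):
        for key, grad in backprop(weights, x, t).items():
            total[key] = total.get(key, 0) + grad
    for key, grad in total.items():
        weights[key] = weights[key] + l * grad
    return weights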
5. There are other choices for the activation function and the error function.
In general, the BP algorithm has many variants, and for specific training data there is usually a lot of room for optimization.
Full text reference: http://blog.csdn.net/sealyao/article/details/6538361