1. Neural networks

Roughly speaking, a neural network is a set of connected input/output units in which each connection has a weight associated with it. During the learning phase, the network learns by adjusting these weights so as to predict the correct class labels of the input tuples. Because of the connections between units, neural network learning is also referred to as connectionist learning.

Neural networks involve long training times and are therefore better suited to applications where this is feasible. They require a number of parameters, such as the network topology or "structure", that are typically best determined empirically. Neural networks have been criticized for their poor interpretability. Their advantages, however, include a high tolerance to noisy data and the ability to classify patterns on which they have not been trained. They can also be used when you have little knowledge of the relationships between attributes and classes. Unlike most decision tree algorithms, they are well suited to continuous-valued inputs and outputs.

2. Multi-layer feed-forward neural network

The backpropagation algorithm performs learning on a multilayer feed-forward neural network. It iteratively learns a set of weights for predicting the class label of a tuple. A multilayer feed-forward neural network consists of an input layer, one or more hidden layers, and an output layer.

Each layer is made up of units. The inputs to the network correspond to the attributes measured for each training tuple. The inputs are fed simultaneously into the units making up the input layer. They are then weighted and fed simultaneously to a second layer of "neuron-like" units known as a hidden layer. The outputs of the hidden-layer units can, in turn, be input to another hidden layer, and so on. The number of hidden layers is arbitrary, although in practice usually only one is used; in theory, a network with two hidden layers can approximate any nonlinear mapping. The weighted outputs of the last hidden layer are input to the units making up the output layer, which emits the network's prediction for the given tuples.

The units in the input layer are called input units. The units in the hidden layers and the output layer are sometimes referred to, because of their symbolic biological basis, as neurodes, or as output units. A network with one hidden layer and one output layer is therefore called a two-layer neural network (the input layer is not counted, because it serves only to pass the input values to the next layer). Similarly, a network containing two hidden layers is called a three-layer neural network. The network is feed-forward because none of the weights cycles back to an input unit or to an output unit of a previous layer. It is fully connected if each unit provides input to each unit in the next layer.

Each output unit takes as input a weighted sum of the outputs from the units in the previous layer and applies a nonlinear (activation) function to this weighted input. Multilayer feed-forward neural networks can model the class prediction as a nonlinear combination of the inputs. From a statistical point of view, they perform nonlinear regression. Given enough hidden units and enough training samples, a multilayer feed-forward network can closely approximate any function.
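The forward pass described above (weighted sum followed by a nonlinear activation, layer by layer) can be sketched in a few lines of NumPy. This is a minimal illustration, not a complete implementation; the layer sizes and random initialization are assumptions for the example.

```python
import numpy as np

def sigmoid(x):
    """Logistic (sigmoid) activation: squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

def forward(x, weights, biases):
    """Propagate an input vector x through each layer in turn.

    weights[l] has shape (units in layer l, units in previous layer);
    biases[l] has shape (units in layer l,).
    """
    activation = x
    for W, b in zip(weights, biases):
        net_input = W @ activation + b   # weighted sum of the previous layer's outputs
        activation = sigmoid(net_input)  # nonlinear activation function
    return activation

# A 2-3-1 network: 2 input units, one hidden layer of 3 units, 1 output unit.
rng = np.random.default_rng(0)
weights = [rng.normal(size=(3, 2)), rng.normal(size=(1, 3))]
biases = [np.zeros(3), np.zeros(1)]
output = forward(np.array([0.2, 0.8]), weights, biases)
print(output)  # a single value in (0, 1)
```

Because the last activation is a sigmoid, the output always lies strictly between 0 and 1, which is what makes it usable as a class indicator.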

3. Define the network topology

Before training can begin, you must decide on the network topology: the number of units in the input layer, the number of hidden layers, the number of units in each hidden layer, and the number of units in the output layer.

Normalizing the input values for each attribute measured in the training tuples will help speed up the learning phase. Typically, input values are normalized so as to fall between 0.0 and 1.0. Discrete-valued attributes may be re-encoded such that there is one input unit per domain value.
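Both preprocessing steps are easy to state concretely. The sketch below shows min-max normalization into [0.0, 1.0] and one-input-unit-per-domain-value (one-hot) re-encoding; the sample attribute values are made up for illustration.

```python
import numpy as np

def min_max_normalize(column):
    """Rescale a numeric attribute so its values fall in [0.0, 1.0]."""
    lo, hi = column.min(), column.max()
    return (column - lo) / (hi - lo)

def one_hot(values, domain):
    """Re-encode a discrete attribute: one input unit per domain value."""
    return np.array([[1.0 if v == d else 0.0 for d in domain] for v in values])

ages = np.array([25.0, 40.0, 55.0])
print(min_max_normalize(ages))   # [0.  0.5 1. ]

colors = ["red", "blue", "red"]
print(one_hot(colors, ["red", "green", "blue"]))
```

After one-hot encoding, a discrete attribute with k domain values contributes k input units, exactly one of which is 1.0 for any given tuple.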

Neural networks can be used for both classification (predicting the class label of a given tuple) and numeric prediction (predicting a continuous-valued output). For classification, one output unit may be used to represent two classes (where the value 1 represents one class and the value 0 the other). If there are more than two classes, one output unit per class is used.

4. Backward propagation

The training tuples are processed iteratively, comparing the network's prediction for each tuple with the actual known target value. The target value may be a known class label (for classification) or a continuous value (for prediction). For each training tuple, the weights are modified so as to minimize the mean-squared error between the network's prediction and the actual target value. These modifications are made in the "backwards" direction, that is, from the output layer, through each hidden layer, down to the first hidden layer (hence the name backpropagation). Although it is not guaranteed, in general the weights will eventually converge, and the learning process stops.

Algorithm: Backpropagation. Neural network learning for classification or prediction, using the backpropagation algorithm.

Input:

D: a data set consisting of the training tuples and their associated target values;

L: learning rate;

Network: multi-layer feed-forward network.

Output: The trained neural network.

Method:

Initialize all weights and biases in the network;

while terminating condition is not satisfied {

  for each training tuple X in D {

    // Propagate the inputs forward:

    for each input layer unit j {

      Oj = Ij; // the output of an input unit is its actual input value

    }

    for each hidden or output layer unit j {

      Ij = Σi wij * Oi + θj; // compute the net input of unit j with respect to the previous layer, i

      Oj = 1 / (1 + exp(-Ij)); // compute the output of unit j using the logistic (sigmoid) function, which maps any real number into (0, 1); its graph is the familiar S-shaped logistic growth curve

    }

    // Backpropagate the errors:

    for each unit j in the output layer {

      Errj = Oj * (1 - Oj) * (Tj - Oj); // compute the error of the output layer

    }

    for each unit j in the hidden layers, from the last to the first hidden layer {

      Errj = Oj * (1 - Oj) * Σk Errk * wjk; // compute the error with respect to the next higher layer, k

    }

    for each weight wij in the network {

      Δwij = (L) * Errj * Oi; // weight increment

      wij = wij + Δwij; // weight update

    }

    for each bias θj in the network {

      Δθj = (L) * Errj; // bias increment

      θj = θj + Δθj; // bias update

    }

  }

}

Terminating condition: training stops when

all Δwij in the previous epoch were so small as to fall below some specified threshold, or

the percentage of tuples misclassified in the previous epoch is below some threshold, or

a prespecified number of epochs has expired.
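The algorithm above can be turned into a short, runnable sketch. The version below trains a single-hidden-layer network with per-tuple (stochastic) weight updates, exactly mirroring the forward-propagation, error-backpropagation, and update steps; the hidden-layer size, learning rate, epoch count, and the logical-OR training set are assumptions chosen for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_backprop(X, T, hidden_units=3, lr=0.5, epochs=2000, seed=0):
    """Train a one-hidden-layer feed-forward network by backpropagation.

    X: training inputs, shape (n_tuples, n_attributes), values in [0, 1].
    T: target values, shape (n_tuples,), 0 or 1 for a single output unit.
    Returns the learned weights and biases (W1, b1, W2, b2).
    """
    rng = np.random.default_rng(seed)
    W1 = rng.normal(scale=0.5, size=(hidden_units, X.shape[1]))
    b1 = np.zeros(hidden_units)
    W2 = rng.normal(scale=0.5, size=hidden_units)
    b2 = 0.0
    for _ in range(epochs):                  # terminate after a fixed number of epochs
        for x, t in zip(X, T):               # process each training tuple in turn
            # Propagate the inputs forward.
            h = sigmoid(W1 @ x + b1)         # hidden-layer outputs Oj
            o = sigmoid(W2 @ h + b2)         # output-unit output
            # Backpropagate the errors.
            err_o = o * (1 - o) * (t - o)    # output-layer error: Oj(1-Oj)(Tj-Oj)
            err_h = h * (1 - h) * err_o * W2 # hidden-layer errors via the next higher layer
            # Update weights and biases.
            W2 += lr * err_o * h
            b2 += lr * err_o
            W1 += lr * np.outer(err_h, x)
            b1 += lr * err_h
    return W1, b1, W2, b2

def predict(x, params):
    W1, b1, W2, b2 = params
    return sigmoid(W2 @ sigmoid(W1 @ x + b1) + b2)

# Learn logical OR (linearly separable, so the network fits it easily).
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
T = np.array([0, 1, 1, 1], dtype=float)
params = train_backprop(X, T)
print([round(float(predict(x, params))) for x in X])  # should round to [0, 1, 1, 1]
```

Note that the per-tuple updates here implement the "case updating" variant of the algorithm, in which weights change after every training tuple rather than once per epoch.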

5. Validity

The computational efficiency depends on the time spent training the network. Given |D| tuples and w weights, each epoch requires O(|D| * w) time. In the worst case, however, the number of epochs can be exponential in n, the number of inputs. In practice, the time required for a network to converge is highly variable. A number of techniques exist to help speed up training; for example, simulated annealing can be used, which also helps ensure convergence to a global optimum.

6. How to classify

To classify an unknown tuple X, the tuple is input to the trained network, and the net input and output of each unit are computed. (There is no need to compute or backpropagate the error.) If there is one output node per class, then the output node with the highest value determines the predicted class label for X. If there is only a single output node, an output value greater than or equal to 0.5 may be regarded as the positive class, and a value below 0.5 as the negative class.
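Both decision rules (highest-valued node for multiple output units, a 0.5 threshold for a single unit) can be captured in one small helper. The class labels in the example are made up for illustration.

```python
import numpy as np

def classify(output_values, class_labels=None):
    """Turn trained-network outputs into a predicted class label.

    output_values: the output-layer activations for one input tuple X.
    With several output units (one per class), pick the highest-valued node;
    with a single output unit, threshold it at 0.5.
    """
    out = np.atleast_1d(np.asarray(output_values, dtype=float))
    if out.size == 1:
        return "positive" if out[0] >= 0.5 else "negative"
    labels = class_labels if class_labels is not None else list(range(out.size))
    return labels[int(np.argmax(out))]

print(classify([0.73]))                      # 'positive' (single unit, >= 0.5)
print(classify([0.1, 0.8, 0.3],
               ["setosa", "versicolor", "virginica"]))  # 'versicolor'
```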

Summary of classification with backpropagation neural networks