An artificial neural network (ANN) is a mathematical model of distributed, parallel information processing that mimics the behavioral characteristics of biological neural networks. Such a network relies on the complexity of the system: by adjusting the connections between a large number of nodes it processes information, and it has the ability to learn and adapt on its own. This article introduces the theoretical foundations of neural networks together with a Python implementation.
1. Multilayer feedforward neural networks
A multilayer feedforward neural network consists of three parts: the input layer, the hidden layers and the output layer; each layer is composed of units.
The input layer receives the feature vector of a training instance and passes it on through the weighted connections between nodes; the output of one layer is the input of the next. There can be any number of hidden layers, but only one input layer and one output layer.
The input layer is not counted: if the hidden layers and the output layer together number N, the network is called an N-layer neural network. A network with one hidden layer and one output layer, for example, is a 2-layer neural network.
Each layer computes a weighted sum of its inputs and transforms it with a nonlinear function to produce its output. In theory, given enough hidden units and a large enough training set, such a network can approximate any function.
2. Designing the neural network structure
Before using neural networks, it is necessary to determine the number of layers of the neural network and the number of units per layer.
In order to accelerate the learning process, feature values usually need to be normalized to the range between 0 and 1 before being passed to the input layer.
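As a sketch, min-max normalization of a small made-up feature matrix can be done with NumPy (the matrix `X` here is purely illustrative):

```python
import numpy as np

# Hypothetical feature matrix: 3 instances, 2 features
X = np.array([[2.0, 100.0],
              [4.0, 300.0],
              [6.0, 500.0]])

# Min-max scaling: map each feature (column) into [0, 1]
X_min = X.min(axis=0)
X_max = X.max(axis=0)
X_scaled = (X - X_min) / (X_max - X_min)

print(X_scaled)
```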
A discrete variable can be encoded by assigning one input unit to each of its possible values.
For example, if a feature A can take three values (A0, A1, A2), then three input units can be used to represent A:
if A = A0, the unit for A0 takes the value 1 and the rest take 0;
if A = A1, the unit for A1 takes the value 1 and the rest take 0;
if A = A2, the unit for A2 takes the value 1 and the rest take 0.
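The encoding scheme above can be sketched as follows (the helper `one_hot` is hypothetical, not part of the article's code):

```python
import numpy as np

# Hypothetical discrete feature A with three possible values A0, A1, A2
values = ['A0', 'A1', 'A2']

def one_hot(value, values):
    """Encode a discrete value with one input unit per possible value:
    the matching unit takes 1, the rest take 0."""
    encoding = np.zeros(len(values))
    encoding[values.index(value)] = 1
    return encoding

print(one_hot('A1', values))  # the unit for A1 is set to 1
```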
Neural networks can solve both classification and regression problems. For classification, if there are two classes, a single output unit (taking values 0 and 1) can represent them; if there are more than two classes, each class gets its own output unit, so the number of units in the output layer is usually the number of classes.
There is no clear rule for choosing the best number of hidden layers; it is generally tuned experimentally based on test error and accuracy.
3. Cross-validation
How is accuracy calculated? The simplest way is to split the data into a training set and a test set: train a model on the training set, feed the test set into the model, and compare the predictions with the true labels of the test set to obtain the accuracy.
A common approach in machine learning is cross-validation. Instead of dividing the data into 2 parts, it may be divided into, say, 10 parts:
Round 1: part 1 is the test set, the remaining 9 parts are the training set;
Round 2: part 2 is the test set, the remaining 9 parts are the training set;
......
After 10 rounds of training we obtain 10 accuracy values, and averaging them gives the final accuracy. Here 10 is just a special case: in general the data is divided into k parts and the algorithm is called k-fold cross-validation. Each round, one of the k parts serves as the test set and the remaining k-1 parts as the training set; this is repeated k times and the accuracies are averaged, which is a more scientific and reliable evaluation method.
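The k-fold procedure described above can be sketched as follows (`train_and_score` is a hypothetical callback standing in for any training routine; it is not part of the article's code):

```python
import numpy as np

def k_fold_accuracy(X, y, train_and_score, k=10):
    """Split the data into k parts; each part serves once as the test set
    while the remaining k-1 parts form the training set.
    `train_and_score` is a hypothetical callback:
    (X_train, y_train, X_test, y_test) -> accuracy.
    Returns the mean accuracy over the k rounds."""
    indices = np.arange(len(X))
    folds = np.array_split(indices, k)
    scores = []
    for i in range(k):
        test_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        scores.append(train_and_score(X[train_idx], y[train_idx],
                                      X[test_idx], y[test_idx]))
    return np.mean(scores)

# Tiny demo with a dummy scorer that reports a constant accuracy
X_demo = np.arange(20).reshape(10, 2)
y_demo = np.arange(10)
demo_score = lambda X_tr, y_tr, X_te, y_te: 0.9
print(k_fold_accuracy(X_demo, y_demo, demo_score, k=5))
```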
4. The BP (backpropagation) algorithm
The instances of the training set are processed iteratively;
after each instance passes through the network, the predicted value is compared with the true value;
the error is then propagated in the reverse direction (from the output layer through the hidden layers to the input layer), and the weight of each connection is updated so as to minimize the error.
4.1 Detailed description of the algorithm
Input: a data set, a learning rate, and a multilayer neural network architecture;
Output: a trained neural network;
Initialize the weights and biases: random values between -1 and 1 (or some other small range), with one bias per unit. For each training instance X, perform the following steps.
1. Forward propagation from the input layer:

The computation from the input layer to the hidden layer, and from the hidden layer to the output layer, has the same form, so both can be summarized in one formula:

    I_j = Σ_i w_ij · O_i + θ_j

Here I_j is the weighted input of unit j in the current layer, O_i is the output of unit i in the previous layer, w_ij is the weight of the connection between the two units, and θ_j is the bias of unit j. The weighted input of each unit then undergoes a nonlinear transformation, as follows:

    O_j = f(I_j)

where O_j is the output of the current unit and f is the nonlinear transformation function, also known as the activation function. Using the logistic function, it is defined as:

    f(x) = 1 / (1 + e^(-x))

That is, the output of each unit is:

    O_j = 1 / (1 + e^(-I_j))

In this way the output value of every layer can be computed forward from the input values.
2. Backward propagation of the error. For the output layer (where T_k is the true value and O_k is the predicted value):

    Err_k = O_k · (1 - O_k) · (T_k - O_k)

For a unit j in a hidden layer:

    Err_j = O_j · (1 - O_j) · Σ_k Err_k · w_jk

Weight update (where l is the learning rate):

    Δw_ij = l · Err_j · O_i
    w_ij = w_ij + Δw_ij

Bias update:

    Δθ_j = l · Err_j
    θ_j = θ_j + Δθ_j
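As an illustration with made-up numbers, here is one full update step for a single logistic output unit (all values, O_i = 0.5, w_ij = 0.4, θ_j = 0.1, T = 1, l = 0.5, are chosen purely for demonstration):

```python
import numpy as np

# Made-up numbers for one output unit with logistic activation
O_i = 0.5        # output of the previous layer's unit
w_ij = 0.4       # connection weight
theta_j = 0.1    # bias of the current unit
T = 1.0          # true value
l = 0.5          # learning rate

# Forward pass: weighted input, then logistic activation
I_j = w_ij * O_i + theta_j            # 0.4 * 0.5 + 0.1 = 0.3
O_j = 1 / (1 + np.exp(-I_j))          # logistic output

# Output-layer error: Err_j = O_j * (1 - O_j) * (T - O_j)
Err_j = O_j * (1 - O_j) * (T - O_j)

# Updates: Δw_ij = l * Err_j * O_i,  Δθ_j = l * Err_j
w_ij += l * Err_j * O_i
theta_j += l * Err_j
print(round(w_ij, 4), round(theta_j, 4))
```

Because the target T is larger than the output, the error is positive and both the weight and the bias increase, pushing the unit's output toward the target.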
3. Termination conditions. Training stops when one of the following holds:
the weight updates fall below some threshold;
the prediction error rate falls below some threshold;
a predetermined number of iterations is reached.
4. The nonlinear transformation function.

Two functions are commonly used for the nonlinear transformation f mentioned above:

(1) The tanh(x) function:

    tanh(x) = sinh(x) / cosh(x)
    sinh(x) = (e^x - e^(-x)) / 2
    cosh(x) = (e^x + e^(-x)) / 2

(2) The logistic function, which is the one used in the derivation above:

    f(x) = 1 / (1 + e^(-x))
5. Python implementation of the BP neural network
First import the NumPy module:

    import numpy as np
Define the nonlinear transformation functions; since their derivatives are also needed, both forms are defined:

    def tanh(x):
        return np.tanh(x)

    def tanh_deriv(x):
        # derivative of tanh: 1 - tanh(x)^2
        return 1.0 - np.tanh(x) * np.tanh(x)

    def logistic(x):
        return 1 / (1 + np.exp(-x))

    def logistic_derivative(x):
        # derivative of the logistic function: f(x) * (1 - f(x))
        return logistic(x) * (1 - logistic(x))
Next, design the BP neural network (the number of layers and the number of units per layer) in an object-oriented way; the constructor chooses the nonlinear function and initializes the weights. `layers` is a list containing the number of units in each layer.

    class NeuralNetwork:
        def __init__(self, layers, activation='tanh'):
            """
            :param layers: a list containing the number of units in each
                           layer; should contain at least two values
            :param activation: the activation function to be used;
                               can be 'logistic' or 'tanh'
            """
            if activation == 'logistic':
                self.activation = logistic
                self.activation_deriv = logistic_derivative
            elif activation == 'tanh':
                self.activation = tanh
                self.activation_deriv = tanh_deriv

            # randomly initialize the weights in [-0.25, 0.25];
            # the extra "+ 1" unit per layer holds the bias
            self.weights = []
            for i in range(1, len(layers) - 1):
                self.weights.append((2 * np.random.random((layers[i - 1] + 1, layers[i] + 1)) - 1) * 0.25)
                self.weights.append((2 * np.random.random((layers[i] + 1, layers[i + 1])) - 1) * 0.25)
Implementing the training algorithm:

    def fit(self, X, y, learning_rate=0.2, epochs=10000):
        X = np.atleast_2d(X)
        # append a column of ones to X for the bias unit
        temp = np.ones([X.shape[0], X.shape[1] + 1])
        temp[:, 0:-1] = X
        X = temp
        y = np.array(y)

        for k in range(epochs):
            # pick one training instance at random
            i = np.random.randint(X.shape[0])
            a = [X[i]]

            # forward propagation: compute the output of each layer
            for l in range(len(self.weights)):
                a.append(self.activation(np.dot(a[l], self.weights[l])))

            # backward propagation of the error
            error = y[i] - a[-1]
            deltas = [error * self.activation_deriv(a[-1])]
            for l in range(len(a) - 2, 0, -1):
                deltas.append(deltas[-1].dot(self.weights[l].T) * self.activation_deriv(a[l]))
            deltas.reverse()

            # update the weights
            for i in range(len(self.weights)):
                layer = np.atleast_2d(a[i])
                delta = np.atleast_2d(deltas[i])
                self.weights[i] += learning_rate * layer.T.dot(delta)
Implementing prediction:

    def predict(self, x):
        x = np.array(x)
        # append a one for the bias unit
        temp = np.ones(x.shape[0] + 1)
        temp[0:-1] = x
        a = temp
        for l in range(0, len(self.weights)):
            a = self.activation(np.dot(a, self.weights[l]))
        return a
Finally, we feed in a set of values to make predictions, with the program above saved in a file named BP.py:

    from BP import NeuralNetwork
    import numpy as np

    nn = NeuralNetwork([2, 2, 1], 'tanh')
    x = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
    y = np.array([1, 0, 0, 1])
    nn.fit(x, y, 0.1, 10000)
    for i in [[0, 0], [0, 1], [1, 0], [1, 1]]:
        print(i, nn.predict(i))
The results are as follows:

    ([0, 0], array([0.99738862]))
    ([0, 1], array([0.00091329]))
    ([1, 0], array([0.00086846]))
    ([1, 1], array([0.99751259]))