1 What is a neural network
Artificial Neural Networks (ANNs), also referred to as neural networks (NNs) or connection models, are mathematical models of distributed, parallel information processing that mimic the behavioral characteristics of animal neural networks. Such a network relies on the complexity of the system: by adjusting the interconnections among a large number of internal nodes, it achieves the purpose of processing information.
2 Neural Network algorithm
The BP (back propagation) network was proposed in 1986 by a group of scientists led by Rumelhart and McClelland. It is a multi-layer feedforward network trained by the error back-propagation algorithm and is one of the most widely used neural network models today. A BP network can learn and store a large number of input-output pattern mappings without requiring the mathematical equations that describe these mappings to be given beforehand.
The topology of a BP neural network includes an input layer, a hidden layer, and an output layer. The number of input-layer neurons is determined by the dimensionality of the sample attributes, and the number of output-layer neurons is determined by the number of sample categories. The number of hidden layers and the number of neurons per hidden layer are specified by the user.
Before examining this structure, you first need to understand the perceptron.
2.1 Perceptron
Suppose a sample x has three Boolean attributes {x1, x2, x3}, and we want to solve a two-class classification problem on x. The classification target is: when at least two of the three inputs are 0, y takes the value -1; otherwise, y takes the value 1.
The perceptron is one of the simplest neural network structures. It contains two types of nodes: several input nodes, which represent the input attributes, and one output node, which provides the model output. The nodes in a neural network are often called neurons or units. In the perceptron, each input node is connected to the output node through a weighted link, which simulates the strength of the synaptic connection between neurons. As in a biological nervous system, training a perceptron model amounts to continually adjusting the link weights until they fit the input-output relationships of the training data.
For this example, suppose all three weights are 0.3 and the output node has a bias factor of 0.4. The output of the model is then computed as follows:
y = 1  if 0.3x1 + 0.3x2 + 0.3x3 - 0.4 > 0
y = -1 if 0.3x1 + 0.3x2 + 0.3x3 - 0.4 < 0
In this way, we get a simple classifier.
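For illustration, here is a minimal Python sketch of this classifier; the function name is ours, and the weights of 0.3 and the bias factor of 0.4 are the values assumed above.

```python
def perceptron_classify(x1, x2, x3):
    """Perceptron with all weights 0.3 and bias factor 0.4 (assumed above)."""
    s = 0.3 * x1 + 0.3 * x2 + 0.3 * x3 - 0.4
    return 1 if s > 0 else -1

# At least two of the three inputs are 0 -> class -1; otherwise class 1.
print(perceptron_classify(0, 0, 1))  # -1
print(perceptron_classify(1, 1, 0))  # 1
```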
More generally, the output model of the perceptron is
y = sign(w1x1 + w2x2 + ... + wnxn - t)
where t is the bias factor (threshold).
So how do you get the weights and the bias factor?
2.2 Learning the perceptron model
During the training phase of the perceptron, the weight parameters w are adjusted repeatedly until the model outputs are consistent with the actual outputs of the training samples. The algorithm is outlined below:
Let D = {(x_i, y_i) | i = 1, 2, ..., N} be the training sample set
Initialize w with random values
repeat
    for each sample (x_i, y_i):
        compute the predicted output ŷ_i(k)
        for each weight w_j:
            update the weight: w_j(k+1) = w_j(k) + λ(y_i - ŷ_i(k)) x_ij    (1)
until the termination condition is met
Here w(k) denotes the weight vector after the k-th iteration, the parameter λ is the learning rate, and x_ij is the j-th attribute of the i-th sample. From the update formula (1) it can be seen intuitively that the new weight w_j(k+1) equals the old weight w_j(k) plus a quantity proportional to the prediction error. The learning rate λ lies between 0 and 1.
Note that the decision boundary of this perceptron model is linear. If the two classes are not linearly separable, the algorithm does not converge, because no linear hyperplane can separate the two classes.
Of course, to ensure convergence to an optimal solution, the learning rate λ should not be too large.
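Formula (1) translates directly into code. Below is a minimal Python (NumPy) sketch of this training loop, applied to the three-Boolean-attribute example from section 2.1; the learning rate of 0.1, the epoch limit, and the random seed are illustrative choices, not values prescribed by the text.

```python
import numpy as np

def train_perceptron(X, y, lr=0.1, max_epochs=100):
    """Perceptron training: w_j <- w_j + lr * (y_i - yhat_i) * x_ij  (formula (1))."""
    rng = np.random.default_rng(0)
    w = rng.uniform(-0.5, 0.5, X.shape[1])  # initialize w with random values
    t = rng.uniform(-0.5, 0.5)              # bias factor (threshold)
    for _ in range(max_epochs):
        errors = 0
        for xi, yi in zip(X, y):
            y_hat = 1 if np.dot(w, xi) - t > 0 else -1
            if y_hat != yi:
                w += lr * (yi - y_hat) * xi  # update each weight w_j
                t -= lr * (yi - y_hat)       # threshold moves the opposite way
                errors += 1
        if errors == 0:  # termination condition: every sample classified correctly
            break
    return w, t

# All 8 Boolean samples: y = -1 when at least two of the three inputs are 0, else 1.
X = np.array([[a, b, c] for a in (0, 1) for b in (0, 1) for c in (0, 1)], dtype=float)
y = np.array([-1 if row.sum() < 2 else 1 for row in X])
w, t = train_perceptron(X, y)
print(w, t)  # one separating hyperplane for this linearly separable task
```

Because this task is linearly separable, the loop terminates with all samples classified correctly.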
2.3 Multi-layer artificial neural network
An artificial neural network is more complex than the perceptron in the following respects:
1) Between the input and output layers, the network may contain multiple intermediate layers, called hidden layers; their nodes are called hidden nodes. In a feedforward neural network, the nodes of each layer connect only to the nodes of the next layer. The perceptron is a single-layer feedforward neural network, since it has only one layer of nodes (the output layer) that performs complex mathematical operations. In a recurrent neural network, connections within the same layer, or from one layer back to a previous layer, are also allowed.
2) Besides the sign function, the network can use other activation functions, such as the linear function, the sigmoid (S-shaped) function, and the hyperbolic tangent function. These activation functions allow the outputs of the hidden and output nodes to be nonlinearly related to the input parameters.
These additional complexities allow multi-layer neural networks to model more complex relationships between input and output variables. In the XOR problem, for example, the input space can be divided into the two classes by two hyperplanes. Because the perceptron can construct only one hyperplane, it cannot find the correct solution; the problem can, however, be solved by a two-layer feedforward neural network. Intuitively, each hidden node can be viewed as a perceptron that constructs one of the two hyperplanes, and the output node simply combines the results of these perceptrons to obtain the decision boundary. The feasibility of this method is described in (http://blog.csdn.net/pennyliang/article/details/6058645).
To learn the weights of an ANN, an effective algorithm is needed, one that converges to the correct solution when the training data is sufficient. A method for learning neural network weights based on gradient descent is described below.
2.3.1 Learning the ANN model
The ANN learning algorithm is designed to determine a set of weights w that minimizes the sum of squared errors:
E(w) = (1/2) Σ_i (y_i - ŷ_i)²
The sum of squared errors depends on w because the predicted class ŷ is a function of the weights assigned to the hidden and output nodes. In most cases, owing to the choice of activation function, the output of the ANN is a nonlinear function of its parameters. In this situation, a greedy method based on gradient descent can be used to solve the optimization problem effectively. The weight update formula used by gradient descent can be written as:
w_j ← w_j - λ ∂E(w)/∂w_j
The second term states that the weight should be changed in the direction that reduces the overall error. However, because the error function is nonlinear, gradient descent may get trapped in a local minimum.
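To make the gradient descent update concrete, here is a minimal sketch for the simplest case, a single linear output unit, where E(w) and its gradient have closed forms; the toy data, learning rate, and step count are assumptions for illustration.

```python
import numpy as np

# Squared error: E(w) = 1/2 * sum_i (y_i - yhat_i)^2, with yhat_i = w . x_i,
# so the gradient is dE/dw_j = -sum_i (y_i - yhat_i) * x_ij.
def gradient_descent(X, y, lr=0.05, steps=200):
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        y_hat = X @ w
        grad = -(y - y_hat) @ X  # dE/dw as derived above
        w -= lr * grad           # w_j <- w_j - lr * dE/dw_j
    return w

X = np.array([[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
y = np.array([1.0, 2.0, 3.0])  # generated by w = (2, 1), so E has its minimum there
print(gradient_descent(X, y))  # converges toward [2. 1.]
```

For a linear unit the error surface is convex, so gradient descent reaches the global minimum; it is the nonlinear activation functions of a multi-layer network that introduce the local minima mentioned above.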
Gradient descent can be used to learn the weights of the output and hidden nodes of a neural network. For hidden nodes, however, it is not obvious how to compute their error terms, since their desired outputs are unknown; a technique called back-propagation solves this problem (a minimal sketch is given after the design issues below). Each iteration of the algorithm consists of two phases: a forward phase and a backward phase. In the forward phase, the output of each neuron in the network is computed using the weights obtained in the previous iteration; the computation proceeds forward, so the outputs of the neurons at layer k are computed before the outputs at layer k+1. In the backward phase, the weights are updated in the opposite direction: the weights at layer k+1 are updated before those at layer k. With back-propagation, the errors of the neurons at layer k+1 can thus be used to estimate the errors of the neurons at layer k.
2.3.2 Design issues in ANN learning
Before training a neural network for a classification task, the following design issues should be considered:
1) The number of input-layer nodes. This is typically the number of attributes of the sample.
2) The number of output-layer nodes. For a two-class problem, one output node is sufficient; for a k-class problem, k output nodes are needed.
3) The network topology: for example, the number of hidden layers and hidden nodes, and whether the structure is feedforward or recurrent.
4) The initialization of weights and biases. These are usually assigned randomly.
5) The handling of training samples with missing values: remove them, or replace the missing values with the most plausible values.
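The two phases can be seen concretely in the following sketch: a two-layer feedforward network with two sigmoid hidden units, trained by back-propagation (gradient descent on the squared error of section 2.3.1) to solve the XOR problem. The 2-2-1 architecture, the learning rate, the iteration count, and the random seed are illustrative assumptions, not values prescribed by the text.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(1)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0.0], [1.0], [1.0], [0.0]])  # XOR targets

W1 = rng.uniform(-1, 1, (2, 2)); b1 = np.zeros(2)  # hidden layer: 2 nodes, one hyperplane each
W2 = rng.uniform(-1, 1, (2, 1)); b2 = np.zeros(1)  # output layer combines the hidden nodes
lr = 0.5

for _ in range(10000):
    # Forward phase: layer-k outputs are computed before layer k+1.
    h = sigmoid(X @ W1 + b1)    # hidden-layer outputs
    out = sigmoid(h @ W2 + b2)  # output-layer outputs

    # Backward phase: the error terms of layer k+1 are computed first
    # and then used to estimate the error terms of layer k.
    d_out = (out - y) * out * (1 - out)  # dE/dz at the output (squared error)
    d_h = (d_out @ W2.T) * h * (1 - h)   # hidden errors derived from d_out

    # Gradient descent update: w <- w - lr * dE/dw
    W2 -= lr * h.T @ d_out; b2 -= lr * d_out.sum(axis=0)
    W1 -= lr * X.T @ d_h;   b1 -= lr * d_h.sum(axis=0)

# Ideally approaches [0, 1, 1, 0]; as noted above, gradient descent
# may instead settle in a local minimum for some initializations.
print(out.round(2).ravel())
```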
Reference "Introduction to Data Mining" 5.4 Artificial neural networks
Deep Learning Preparatory Course: Neural network