Preface
I have been working with artificial neural networks (ANNs) for a long time: I learned the principles and once did a BPN exercise, but I never summarized them systematically. Recently, reading the torch source code gave me a better understanding of MLPs, so I am writing up what I learned as a summary.
Features of ANN
(1) High parallelism
An artificial neural network is built from a large number of identical, simple processing units working in parallel. Although each unit performs only a simple function, the parallel operation of so many simple units gives the network an astonishing capacity for processing information.
(2) Highly nonlinear global behavior
A neural network system is composed of a large number of simple neurons. Each neuron receives many inputs from other neurons and, through a nonlinear input-output relationship, produces an output that in turn influences other neurons. Through these mutual constraints the network implements a nonlinear mapping from the input state space to the output state space, and its evolution from the input state to the final state follows a global principle of action. Viewed globally, the behavior of the whole network is not a simple superposition of local behaviors but a collective one. A conventional computer, by contrast, works on a serial, local principle: each computation step depends closely on the previous one and affects the next, and the problem is solved step by step by an algorithm.
(3) Good fault tolerance and associative memory
An artificial neural network stores information in its own network structure: what is memorized is held in the weights between neurons, and the stored information cannot be read from any single weight, so the storage is distributed. This makes the network highly fault tolerant and supports information-processing tasks such as cluster analysis, feature extraction, and pattern restoration.
(4) Strong adaptability and self-learning ability
An artificial neural network can obtain its weights and structure through training and learning, and therefore shows a strong ability to learn on its own and to adapt to its environment.
ANN Classification
BPN (Back-Propagation Network)
Here I will introduce the Back-Propagation Network (BPN), which is driven by the back-propagation of errors. A back-propagation neural network is a multi-layer feed-forward network whose weights are trained for nonlinear differentiable functions. A BP network is mainly used for:
1) Function approximation and prediction analysis: train a network with input vectors and the corresponding output vectors to approximate a function or to predict unknown information;
2) Pattern recognition: associate a specific output vector with an input vector;
3) Classification: classify the input vectors into appropriately defined categories;
4) Data compression: reduce the dimension of the output vector for transmission and storage.
For example, a three-layer BPN structure is as follows:
It consists of three layers: an input layer, a hidden layer, and an output layer. Each unit in a layer is connected to all units of the adjacent layers, and units within the same layer are not connected to each other. After a learning sample is presented to the network, the neurons' activation values propagate from the input layer through the middle layer to the output layer, where the output neurons produce the network's response to the input. Then the connection weights are adjusted layer by layer, from the output layer back toward the input layer, so as to reduce the error between the target output and the actual output.
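Concretely, for a network of sigmoid units this adjustment follows the standard back-propagation rule. A minimal sketch (t_k is the target value, o a unit's output, x_ji the i-th input of unit j, η the learning rate, and α an optional momentum constant):

    output-unit error term:   δ_k = o_k (1 − o_k) (t_k − o_k)
    hidden-unit error term:   δ_h = o_h (1 − o_h) Σ_k w_kh δ_k
    weight update:            Δw_ji(n) = η δ_j x_ji + α Δw_ji(n−1),  then  w_ji ← w_ji + Δw_ji(n)

These are exactly the quantities computed as delta_out, delta_hidden, and delta_w_* in the C# example at the end of this article.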
Perceptron (transfer function) types (net denotes the unit's weighted input Σ_i w_i x_i):
Linear ---- linear (identity) transfer
Tanh ---- hyperbolic tangent function
Sigmoid ---- logistic function, 1 / (1 + exp(−net))
Softmax ---- exp(x_i − shift) / Σ_j exp(x_j − shift), where shift = max_j x_j keeps the exponentials numerically stable
Log-softmax ---- log( exp(x_i − shift) / Σ_j exp(x_j − shift) )
Exp ---- exponential function
Softplus ---- log(1 + exp(net))
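To make the list above concrete, here is a minimal C# sketch of a few of these transfer functions applied to a pre-computed net input; the class and method names are my own and are not taken from torch:

    using System;
    using System.Linq;

    static class Transfer
    {
        // Logistic sigmoid: 1 / (1 + exp(-net))
        public static double Sigmoid(double net) => 1.0 / (1.0 + Math.Exp(-net));

        // Hyperbolic tangent
        public static double Tanh(double net) => Math.Tanh(net);

        // Softplus: log(1 + exp(net)), a smooth approximation of max(0, net)
        public static double Softplus(double net) => Math.Log(1.0 + Math.Exp(net));

        // Softmax over a vector of net inputs; subtracting the maximum ("shift")
        // keeps the exponentials numerically stable.
        public static double[] Softmax(double[] net)
        {
            double shift = net.Max();
            double[] e = net.Select(x => Math.Exp(x - shift)).ToArray();
            double sum = e.Sum();
            return e.Select(x => x / sum).ToArray();
        }
    }

For example, Transfer.Softmax(new[] { 1.0, 2.0, 3.0 }) returns three probabilities that sum to 1.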
Gradient Descent
Delta learning rule
Incremental (stochastic) Gradient Descent (a minimal sketch of this update follows below)
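As a sketch of the difference between the two gradient-descent variants, the following C# fragment trains a single linear unit with the delta rule, updating the weights after every sample (incremental / stochastic gradient descent) instead of accumulating the gradient over the whole training set first. The toy data and variable names here are made up for illustration and are not the example used later in this article:

    using System;

    class DeltaRuleDemo
    {
        static void Main()
        {
            // Toy training set: two input vectors -> one target value each.
            double[][] x = { new[] { 1.0, 0.0 }, new[] { 0.0, 1.0 } };
            double[] t = { 1.0, 0.0 };
            double[] w = { 0.1, 0.1, 0.1 };   // w[0] is the bias weight
            double eta = 0.3;                 // learning rate

            for (int epoch = 0; epoch < 1000; epoch++)
            {
                // Incremental (stochastic) gradient descent: the weights are
                // adjusted after each individual sample rather than once per epoch.
                for (int i = 0; i < x.Length; i++)
                {
                    double o = w[0] + w[1] * x[i][0] + w[2] * x[i][1];  // linear unit output
                    double delta = t[i] - o;                            // delta-rule error term
                    w[0] += eta * delta;
                    w[1] += eta * delta * x[i][0];
                    w[2] += eta * delta * x[i][1];
                }
            }

            Console.WriteLine($"Trained weights: {w[0]}, {w[1]}, {w[2]}");
        }
    }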
MLP Defects
1. The choice of the number of hidden nodes in the network is still an open problem worldwide (search Google, Elsevier, or CNKI);
2. The stopping threshold, learning rate, and momentum constant have to be chosen by trial and error, which is extremely time-consuming (hands-on experimentation);
3. Learning is slow;
4. Training easily falls into local minima, so learning may be insufficient.
As an example application, the following program uses a BPN (a 2-1-1 network trained on two samples) to solve a simple mapping problem. The programming language is C#.
using System;

class Program
{
    static void Main(string[] args)
    {
        int row = 2;        // number of training samples
        int n_in = 2;       // number of input values
        int n_out = 1;      // number of output values
        int n_hidden = 1;   // number of hidden-layer units

        double eta = 0.3;   // learning rate
        double alfa = 0.9;  // momentum term

        int[,] t_ex = new int[row, n_in + n_out];   // training samples

        // initialize the training samples: (1, 0) -> 1 and (0, 1) -> 0
        t_ex[0, 0] = 1; t_ex[0, 1] = 0; t_ex[0, 2] = 1;
        t_ex[1, 0] = 0; t_ex[1, 1] = 1; t_ex[1, 2] = 0;

        double[] delta_out = new double[n_out];        // output-unit error terms
        double[] delta_hidden = new double[n_hidden];  // hidden-unit error terms

        double[] o_out = new double[n_out];        // outputs of the output units
        double[] o_hidden = new double[n_hidden];  // outputs of the hidden units

        double[,] w_out = new double[n_out, n_hidden + 1];    // output-unit weights (index 0 is the bias)
        double[,] w_hidden = new double[n_hidden, n_in + 1];  // hidden-unit weights (index 0 is the bias)

        double[,] delta_w_out = new double[n_out, n_hidden + 1];    // output weight updates
        double[,] delta_w_hidden = new double[n_hidden, n_in + 1];  // hidden weight updates

        // initialize all network weights to 0.1
        for (int i = 0; i < n_out; i++)
        {
            for (int j = 0; j < n_hidden + 1; j++)
            {
                w_out[i, j] = 0.1;
                delta_w_out[i, j] = 0;
            }
        }
        for (int i = 0; i < n_hidden; i++)
        {
            for (int j = 0; j < n_in + 1; j++)
            {
                w_hidden[i, j] = 0.1;
                delta_w_hidden[i, j] = 0;
            }
        }

        // iterative training
        bool over = true;
        int itera_time = 0;   // iteration counter

        while (over)
        {
            // train on every sample once per iteration
            for (int i = 0; i < row; i++)
            {
                // compute the outputs of the hidden-layer units
                double net = 0;
                for (int j = 0; j < n_hidden; j++)
                {
                    net = w_hidden[j, 0];   // bias weight
                    for (int r = 1; r < n_in + 1; r++)
                    {
                        net += w_hidden[j, r] * t_ex[i, r - 1];
                    }
                    o_hidden[j] = 1.0 / (1 + Math.Exp(-net));   // sigmoid
                }

                // compute the outputs of the output-layer units
                for (int j = 0; j < n_out; j++)
                {
                    net = w_out[j, 0];      // bias weight
                    for (int r = 1; r < n_hidden + 1; r++)
                    {
                        net += w_out[j, r] * o_hidden[r - 1];
                    }
                    o_out[j] = 1.0 / (1 + Math.Exp(-net));      // sigmoid
                }

                // compute the error terms of the output-layer units
                for (int j = 0; j < n_out; j++)
                {
                    delta_out[j] = o_out[j] * (1 - o_out[j]) * (t_ex[i, n_in + j] - o_out[j]);
                }

                // compute the error terms of the hidden-layer units
                for (int j = 0; j < n_hidden; j++)
                {
                    double sum_weight = 0;
                    for (int k = 0; k < n_out; k++)
                    {
                        // use only the weight connecting hidden unit j to output unit k
                        // (index j + 1, because index 0 is the bias weight)
                        sum_weight += delta_out[k] * w_out[k, j + 1];
                    }
                    delta_hidden[j] = o_hidden[j] * (1 - o_hidden[j]) * sum_weight;
                }

                // update the network weights
                // update the output-layer weights (with momentum)
                for (int j = 0; j < n_out; j++)
                {
                    delta_w_out[j, 0] = eta * delta_out[j] + alfa * delta_w_out[j, 0];
                    w_out[j, 0] = w_out[j, 0] + delta_w_out[j, 0];

                    for (int k = 1; k < n_hidden + 1; k++)
                    {
                        delta_w_out[j, k] = eta * delta_out[j] * o_hidden[k - 1] + alfa * delta_w_out[j, k];
                        w_out[j, k] = w_out[j, k] + delta_w_out[j, k];
                    }
                }

                // update the hidden-layer weights (with momentum)
                for (int j = 0; j < n_hidden; j++)
                {
                    delta_w_hidden[j, 0] = eta * delta_hidden[j] + alfa * delta_w_hidden[j, 0];
                    w_hidden[j, 0] = w_hidden[j, 0] + delta_w_hidden[j, 0];

                    for (int k = 1; k < n_in + 1; k++)
                    {
                        delta_w_hidden[j, k] = eta * delta_hidden[j] * t_ex[i, k - 1] + alfa * delta_w_hidden[j, k];
                        w_hidden[j, k] = w_hidden[j, k] + delta_w_hidden[j, k];
                    }
                }
            }

            // stop after 1000 training iterations
            itera_time++;
            if (itera_time == 1000) over = false;
        }

        // print the weights after training
        Console.WriteLine("Output-layer weights:");
        for (int i = 0; i < n_out; i++)
        {
            for (int j = 0; j < n_hidden + 1; j++)
            {
                Console.WriteLine(w_out[i, j]);
            }
        }
        Console.WriteLine("Hidden-layer weights:");
        for (int i = 0; i < n_hidden; i++)
        {
            for (int j = 0; j < n_in + 1; j++)
            {
                Console.WriteLine(w_hidden[i, j]);
            }
        }
    }
}
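The program above only prints the trained weights. To check what the network has learned, the forward pass can be run again with the final weights; the Predict helper below is my own addition (not part of the original program) and assumes the same 2-1-1 topology and weight layout:

    // Hypothetical helper (add it inside the Program class above):
    // forward pass through the trained 2-1-1 network, where w_hidden and
    // w_out use the same layout as above (index 0 is the bias weight).
    static double Predict(double[,] w_hidden, double[,] w_out, int[] input)
    {
        double net = w_hidden[0, 0] + w_hidden[0, 1] * input[0] + w_hidden[0, 2] * input[1];
        double h = 1.0 / (1 + Math.Exp(-net));   // hidden-unit output
        net = w_out[0, 0] + w_out[0, 1] * h;     // output-unit net input
        return 1.0 / (1 + Math.Exp(-net));       // network output in (0, 1)
    }

    // Usage, placed inside Main after the training loop:
    //   Console.WriteLine(Predict(w_hidden, w_out, new[] { 1, 0 }));   // should move toward the target 1
    //   Console.WriteLine(Predict(w_hidden, w_out, new[] { 0, 1 }));   // should move toward the target 0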