MLP (Multi-Layer Perceptron) Introduction

Preface

I have been working with artificial neural networks (ANNs) for quite a while: I learned the principles and worked through a BPN exercise, but I never summarized any of it systematically. Recently, while reading the Torch source code, I came to understand MLPs better, so I am writing down what I learned as a summary.

Features of ANNs

(1) High parallelism

An artificial neural network is composed of many identical simple processing units working in parallel. Although each unit performs only a simple function, the information-processing capability of a large number of such units computing in parallel is remarkable.

(2) Highly nonlinear global behavior

A neural network system is composed of a large number of simple neurons. Each neuron receives inputs from many other neurons and, through a nonlinear input-output relationship, produces an output that in turn influences other neurons. The units thus constrain one another and together realize a nonlinear mapping from the input state space to the output state space. The network's evolution follows a global principle of operation, proceeding from the input state to the final state; viewed globally, the network's behavior is not a simple superposition of local behaviors but a collective one. A conventional computer, by contrast, follows a serial, local principle of operation: each computation step depends closely on the previous one and influences the next, and the problem is solved step by step by an algorithm.

(3) Good fault tolerance and associative memory

An artificial neural network stores information in its own network structure: what is memorized is distributed over the weights between neurons, and no single weight reveals the stored information, so this is a distributed storage scheme. It makes the network highly fault tolerant and supports information processing such as cluster analysis, feature extraction, and pattern restoration.

(4) Strong adaptability and self-learning

An artificial neural network obtains its weights and structure through training and learning, and therefore shows strong self-learning ability and adaptability to its environment.

ANN Classification

BPN (Back-Propagation Network)

Here I introduce the Back-Propagation Network (BPN), which is driven by errors propagated backward through the network. A back-propagation neural network is a multi-layer feed-forward network with nonlinear, differentiable activation functions whose weights are trained by back-propagating errors. BP networks are mainly used for:

1) Function approximation and prediction analysis: train a network with input vectors and their corresponding output vectors so that it approximates a function or predicts unknown information;

2) Pattern recognition: associate a specific output vector with a given input vector;

3) Classification: classify input vectors according to an appropriately defined scheme;

4) Data compression: reduce the dimensionality of the output vector for transmission and storage.

For example, a three-layer BPN structure is as follows:

It consists of three layers: an input layer, a hidden layer, and an output layer. Every unit in a layer is connected to all units in the adjacent layers, and units within the same layer are not connected to one another. When a training sample is presented to the network, the neurons' activation values propagate from the input layer through the hidden layer to the output layer, where the output neurons produce the network's response to the input. The connection weights are then adjusted layer by layer, from the output layer back toward the input layer, so as to reduce the error between the target output and the actual output.
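To make this adjustment rule concrete, here is a minimal sketch of the standard update formulas for a three-layer BPN with sigmoid units and a momentum term (the notation is mine; the source text does not spell these formulas out, but the C# example at the end of this article implements exactly these steps):

\[
\begin{aligned}
net_j &= \sum_i w_{ji}\,x_i + b_j, \qquad o_j = \frac{1}{1 + e^{-net_j}} \\
\delta_k &= o_k\,(1 - o_k)\,(t_k - o_k) && \text{(output units)} \\
\delta_j &= o_j\,(1 - o_j)\,\sum_k w_{kj}\,\delta_k && \text{(hidden units)} \\
\Delta w_{ji}(n) &= \eta\,\delta_j\,x_i + \alpha\,\Delta w_{ji}(n-1), \qquad w_{ji} \leftarrow w_{ji} + \Delta w_{ji}(n)
\end{aligned}
\]

Here \(\eta\) is the learning rate, \(\alpha\) the momentum constant, and \(t_k\) the target output.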


Perceptron (Transfer Function) Types

The common transfer (activation) function types are listed below; a small code sketch of them follows the list.

Linear ---- linear (identity) function

Tanh ---- hyperbolic tangent function

Sigmoid ---- logistic function: 1 / (1 + e^(-net)), where net = Σ_i w_i * x_i

Softmax ---- e^(net_i - shift) / Σ_j e^(net_j - shift), where shift = max_j net_j (the max shift keeps the computation numerically stable)

Log-softmax ---- log(e^(net_i) / Σ_j e^(net_j)) = net_i - log Σ_j e^(net_j)

Exp ---- exponential function

Softplus ---- log(1 + e^(net))
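As a quick reference, here is a minimal C# sketch of these transfer functions applied to the net input of a unit (the class and method names are my own, chosen only to illustrate the formulas above):

using System;
using System.Linq;

static class Transfer
{
    public static double Linear(double net)   => net;                            // identity
    public static double Tanh(double net)     => Math.Tanh(net);                 // hyperbolic tangent
    public static double Sigmoid(double net)  => 1.0 / (1.0 + Math.Exp(-net));   // logistic function
    public static double Exp(double net)      => Math.Exp(net);                  // exponential
    public static double Softplus(double net) => Math.Log(1.0 + Math.Exp(net));  // smooth approximation of max(0, net)

    // Softmax over a whole layer of net inputs, using the usual max shift for numerical stability.
    public static double[] Softmax(double[] net)
    {
        double shift = net.Max();
        double[] e = net.Select(v => Math.Exp(v - shift)).ToArray();
        double sum = e.Sum();
        return e.Select(v => v / sum).ToArray();
    }

    // Log-softmax: the logarithm of the softmax, computed in a numerically stable form.
    public static double[] LogSoftmax(double[] net)
    {
        double shift = net.Max();
        double logSum = shift + Math.Log(net.Sum(v => Math.Exp(v - shift)));
        return net.Select(v => v - logSum).ToArray();
    }
}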

Gradient Descent

Delta learning method

Incremental Gradient Descent
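To tie these terms together: the delta rule adjusts each weight in proportion to the unit's error term and its input, and incremental (stochastic) gradient descent applies this update after every training sample rather than after a full pass over the data. A minimal sketch for a single sigmoid unit, with illustrative names of my own (not taken from the source):

using System;

static class DeltaRule
{
    // One incremental gradient-descent step for a single sigmoid unit.
    // w[0] is the bias weight; eta is the learning rate.
    public static void Step(double[] w, double[] x, double target, double eta)
    {
        double net = w[0];
        for (int i = 0; i < x.Length; i++) net += w[i + 1] * x[i];

        double o = 1.0 / (1.0 + Math.Exp(-net));    // sigmoid output
        double delta = o * (1 - o) * (target - o);  // error term of the unit

        w[0] += eta * delta;                        // bias update
        for (int i = 0; i < x.Length; i++) w[i + 1] += eta * delta * x[i];
    }
}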

MLP Defects

1. Choosing the number of hidden nodes is still an open problem worldwide (a search on Google, Elsevier, or CNKI will confirm this);

2. The stopping threshold, learning rate, and momentum constant must be set by trial and error, which is extremely time-consuming (hands-on experimentation);

3. Learning is slow;

4. Training easily falls into local extrema, leaving the network under-trained.

As an application example, the following program uses a BPN to learn a small training set of two samples. The programming language is C#.

using System;

class BPN
{
    static void Main(string[] args)
    {
        int row = 2;       // number of training samples
        int n_in = 2;      // number of input values
        int n_out = 1;     // number of output values
        int n_hidden = 1;  // number of hidden-layer units

        double eta = 0.3;  // learning rate
        double alfa = 0.9; // momentum constant
        int[,] t_ex = new int[row, n_in + n_out]; // training samples

        // Initialize the training samples: inputs (1, 0) -> target 1 and (0, 1) -> target 0
        t_ex[0, 0] = 1; t_ex[0, 1] = 0; t_ex[0, 2] = 1;
        t_ex[1, 0] = 0; t_ex[1, 1] = 1; t_ex[1, 2] = 0;

        double[] delta_out = new double[n_out];       // error terms of the output units
        double[] delta_hidden = new double[n_hidden]; // error terms of the hidden units

        double[] o_out = new double[n_out];           // outputs of the output units
        double[] o_hidden = new double[n_hidden];     // outputs of the hidden units

        double[,] w_out = new double[n_out, n_hidden + 1];   // output-unit weights (index 0 is the bias)
        double[,] w_hidden = new double[n_hidden, n_in + 1]; // hidden-unit weights (index 0 is the bias)

        double[,] delta_w_out = new double[n_out, n_hidden + 1];   // output-unit weight changes
        double[,] delta_w_hidden = new double[n_hidden, n_in + 1]; // hidden-unit weight changes

        // Initialize all network weights to 0.1
        for (int i = 0; i < n_out; i++)
        {
            for (int j = 0; j < n_hidden + 1; j++)
            {
                w_out[i, j] = 0.1;
                delta_w_out[i, j] = 0;
            }
        }
        for (int i = 0; i < n_hidden; i++)
        {
            for (int j = 0; j < n_in + 1; j++)
            {
                w_hidden[i, j] = 0.1;
                delta_w_hidden[i, j] = 0;
            }
        }

        // Iterative training
        bool over = true;
        int itera_time = 0; // number of passes over the training set

        while (over)
        {
            // Train on every sample
            for (int i = 0; i < row; i++)
            {
                // Compute the hidden-layer outputs
                double net = 0;
                for (int j = 0; j < n_hidden; j++)
                {
                    net = w_hidden[j, 0];
                    for (int r = 1; r < n_in + 1; r++)
                    {
                        net += w_hidden[j, r] * t_ex[i, r - 1];
                    }
                    o_hidden[j] = 1.0 / (1 + Math.Exp(-net));
                }
                // Compute the output-layer outputs
                for (int j = 0; j < n_out; j++)
                {
                    net = w_out[j, 0];
                    for (int r = 1; r < n_hidden + 1; r++)
                    {
                        net += w_out[j, r] * o_hidden[r - 1];
                    }
                    o_out[j] = 1.0 / (1 + Math.Exp(-net));
                }
                // Compute the error terms of the output units
                for (int j = 0; j < n_out; j++)
                {
                    delta_out[j] = o_out[j] * (1 - o_out[j]) * (t_ex[i, n_in + j] - o_out[j]);
                }
                // Compute the error terms of the hidden units
                for (int j = 0; j < n_hidden; j++)
                {
                    double sum_weight = 0;
                    for (int k = 0; k < n_out; k++)
                    {
                        // Use only the weight that connects hidden unit j to output unit k
                        // (w_out[k, j + 1]; index 0 is the bias weight).
                        sum_weight += delta_out[k] * w_out[k, j + 1];
                    }
                    delta_hidden[j] = o_hidden[j] * (1 - o_hidden[j]) * sum_weight;
                }
                // Update the network weights
                // Update the output-layer weights (with momentum)
                for (int j = 0; j < n_out; j++)
                {
                    delta_w_out[j, 0] = eta * delta_out[j] + alfa * delta_w_out[j, 0];
                    w_out[j, 0] = w_out[j, 0] + delta_w_out[j, 0];

                    for (int k = 1; k < n_hidden + 1; k++)
                    {
                        delta_w_out[j, k] = eta * delta_out[j] * o_hidden[k - 1] + alfa * delta_w_out[j, k];
                        w_out[j, k] = w_out[j, k] + delta_w_out[j, k];
                    }
                }
                // Update the hidden-layer weights (with momentum)
                for (int j = 0; j < n_hidden; j++)
                {
                    delta_w_hidden[j, 0] = eta * delta_hidden[j] + alfa * delta_w_hidden[j, 0];
                    w_hidden[j, 0] = w_hidden[j, 0] + delta_w_hidden[j, 0];
                    for (int k = 1; k < n_in + 1; k++)
                    {
                        delta_w_hidden[j, k] = eta * delta_hidden[j] * t_ex[i, k - 1] + alfa * delta_w_hidden[j, k];
                        w_hidden[j, k] = w_hidden[j, k] + delta_w_hidden[j, k];
                    }
                }
            }

            // Stop after 1000 passes over the training set
            itera_time++;
            if (itera_time == 1000) over = false;
        }

        // Print the trained weights
        Console.WriteLine("Output layer weights:");
        for (int i = 0; i < n_out; i++)
        {
            for (int j = 0; j < n_hidden + 1; j++)
            {
                Console.WriteLine(w_out[i, j]);
            }
        }
        Console.WriteLine("Hidden layer weights:");
        for (int i = 0; i < n_hidden; i++)
        {
            for (int j = 0; j < n_in + 1; j++)
            {
                Console.WriteLine(w_hidden[i, j]);
            }
        }
    }
}
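After training, the learned weights can be used for prediction by running only the forward pass. The original program stops at printing the weights; a minimal sketch of a forward pass that could be appended after the training loop (reusing the same variables as above; this addition is mine) looks like this:

// Forward pass with the trained weights for one input vector, e.g. the first training sample.
double[] x = { 1, 0 };
double[] h = new double[n_hidden];
for (int j = 0; j < n_hidden; j++)
{
    double net = w_hidden[j, 0];
    for (int r = 0; r < n_in; r++) net += w_hidden[j, r + 1] * x[r];
    h[j] = 1.0 / (1 + Math.Exp(-net));
}
for (int j = 0; j < n_out; j++)
{
    double net = w_out[j, 0];
    for (int r = 0; r < n_hidden; r++) net += w_out[j, r + 1] * h[r];
    Console.WriteLine("prediction: " + (1.0 / (1 + Math.Exp(-net))));
}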
