Getting Started with Neural Network Programming

Reprinted from http://www.cnblogs.com/heaad/archive/2011/03/07/1976443.html
The main contents of this article are: (1) an introduction to the fundamentals of neural networks, (2) how to implement a feedforward neural network with AForge.NET, and (3) how to implement a feedforward neural network with MATLAB.
Section 0: An Introductory Example
This article uses Fisher's Iris data set as the test data for the neural network programs. The Iris data set can be found at http://en.wikipedia.org/wiki/Iris_flower_data_set. Here is a brief introduction:
We have a number of iris flowers that are known to belong to 3 species, and we need to classify them. The sepal length, sepal width, petal length and petal width differ between the species, and we are given these four measurements for a number of flowers whose species is known.
One solution is to use the existing data to train a neural network and then use it as a classifier.
If you just want to use C# or MATLAB to quickly implement a neural network to solve your problem, or if you already understand the fundamentals of neural networks, skip ahead to Section 2, Neural Network Implementations.
Section 1: Neural Network Fundamentals
1. The artificial neuron model
Artificial neurons are the basic elements of a neural network; their working principle is illustrated below:
Figure 1. Artificial neuron model
Here x1~xn are input signals from other neurons, wij represents the connection weight from neuron j to neuron i, and θ represents a threshold, also called a bias. The relationship between the output of neuron i and its inputs is, in standard notation:
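$$net_i = \sum_{j=1}^{n} w_{ij} x_j - \theta_i, \qquad y_i = f(net_i)$$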
yi denotes the output of neuron i, the function f is called the activation function (or transfer function), and net is called the net activation. If the threshold is regarded as the weight wi0 of an additional input x0 of neuron i (taking the common convention x0 = -1, wi0 = θi), the formula above can be simplified to:
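$$net_i = \sum_{j=0}^{n} w_{ij} x_j, \qquad y_i = f(net_i)$$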
If the input vector is denoted by X and the weight vector by W:
X = [x0, x1, x2, ..., xn]
W = [wi0, wi1, wi2, ..., win]^T
then the output of the neuron can be expressed as a vector product:
yi = f(XW)
If the net activation net of the neuron is positive, the neuron is said to be in an activated or firing state; if the net activation is negative, the neuron is said to be inhibited.
The "threshold weighted sum" neuron model of Figure 1 is called the M-P model (McCulloch-Pitts model), and is also known as a Processing Element (PE) of the neural network.
2. Common activation functions
Choosing the activation function is an important step in constructing a neural network. The commonly used activation functions are briefly introduced below.
(1) Linear function
(2) Ramp function
(3) Threshold function
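Their typical textbook forms are given below (k, c and T are design constants; the exact forms shown in the original figures may differ slightly):

$$\text{linear: } f(x) = kx + c$$
$$\text{ramp: } f(x) = \begin{cases} T, & x > c \\ kx, & |x| \le c \\ -T, & x < -c \end{cases}$$
$$\text{threshold: } f(x) = \begin{cases} 1, & x \ge c \\ 0, & x < c \end{cases}$$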
The above 3 activation functions are linear or piecewise linear. Two commonly used nonlinear activation functions are described next.
(4) Sigmoid function (S-shaped function)
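In its standard form, with steepness parameter a > 0 (the same parameter a referred to later in this article), the function and its derivative are:

$$f(x) = \frac{1}{1+e^{-ax}}, \qquad f'(x) = \frac{a\,e^{-ax}}{(1+e^{-ax})^{2}} = a\,f(x)\,(1-f(x))$$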
(5) Bipolar sigmoid function (bipolar S-shaped function)
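In its standard form, with steepness parameter a > 0, the function and its derivative are:

$$f(x) = \frac{1-e^{-ax}}{1+e^{-ax}}, \qquad f'(x) = \frac{2a\,e^{-ax}}{(1+e^{-ax})^{2}} = \frac{a}{2}\,\bigl(1-f(x)^{2}\bigr)$$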
The graphs of the sigmoid function and the bipolar sigmoid function are shown below:
Figure 3. Graphs of the sigmoid and bipolar sigmoid functions
The main difference between the sigmoid function and the bipolar sigmoid function is their range: the bipolar sigmoid function takes values in (-1, 1), while the sigmoid function takes values in (0, 1).
Because both the sigmoid function and the bipolar sigmoid function are differentiable (their derivatives are continuous functions), they are suitable for use in BP neural networks. (The BP algorithm requires the activation function to be differentiable.)
3. Neural network model
Neural networks are networks formed by interconnecting a large number of neurons. According to how the neurons are interconnected, common network structures can be divided into the following 3 categories:
(1) Feedforward neural networks
Feedforward networks are also called forward networks. In this kind of network, feedback signals exist only during training; during classification the data can only propagate forward until it reaches the output layer, with no backward feedback between layers, hence the name feedforward network. The perceptron and the BP neural network both belong to this class.
Figure 4 shows a 3-layer feedforward neural network, where the first layer consists of the input units, the second layer is called the hidden layer, and the third layer is called the output layer (the input units are not neurons, so the network in the figure has 2 layers of neurons).
Figure 4. Feedforward Neural Networks
For a 3-layer feedforward neural network N, let X be the input vector of the network, let W1~W3 denote the connection weight matrices of each layer, and let F1~F3 denote the activation functions of the 3 layers.
Then the output of the first-layer neurons is:
O1 = F1(XW1)
The output of the second layer is:
O2 = F2(F1(XW1)W2)
The output of the output layer is:
O3 = F3(F2(F1(XW1)W2)W3)
If the activation functions F1~F3 are all chosen to be linear, then the output O3 of the neural network is still a linear function of the input X. Therefore, to approximate higher-order (nonlinear) functions, appropriate nonlinear activation functions should be chosen.
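To see why, take the identity function Fi(x) = x as an example; the composition then collapses into a single linear map of X:

$$O_3 = ((X W_1) W_2) W_3 = X\,(W_1 W_2 W_3)$$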
(2) Feedback neural networks
A feedback neural network has feedback connections from its outputs back to its inputs, so its structure is much more complex than that of a feedforward network. Typical feedback neural networks include the Elman network and the Hopfield network.
Figure 5. Feedback Neural Network
(3) Self-organizing networks (SOM, self-organizing neural networks)
A self-organizing neural network is an unsupervised learning network. It automatically changes its parameters and structure by self-organizing and adaptively discovering the intrinsic regularities and essential properties in the samples.
Figure 6. Self-Organizing Network
4. How neural networks work
The operation of a neural network is divided into two states: learning and working.
(1) The learning state of neural networks
Learning mainly means using a learning algorithm to adjust the connection weights between neurons so that the network output better matches reality. Learning algorithms fall into two categories: supervised learning and unsupervised learning.
A supervised learning algorithm feeds a set of training samples (the training set) into the network and adjusts the connection weights based on the difference between the actual output of the network and the expected output. The main steps of supervised learning are:
1) Take a sample (Ai, Bi) from the sample set;
2) Compute the actual output O of the network;
3) Compute the error D = Bi - O;
4) Adjust the weight matrix W according to D;
5) Repeat this process for each sample until, over the entire sample set, the error does not exceed the specified range.
The BP algorithm is an excellent supervised learning algorithm of this kind.
Unsupervised learning extracts the statistical regularities contained in the sample set and stores them in the network in the form of connection weights between neurons.
The Hebb learning rule is a classical unsupervised learning algorithm.
(2) The working state of neural networks
In the working state, the connection weights between neurons are held fixed, and the neural network is used as a classifier or predictor.
The Hebb learning rule and the Delta learning rule are briefly introduced below.
(3) Unsupervised learning algorithm: the Hebb learning rule
The core idea of the Hebb rule is that when two neurons are excited at the same time, the connection between them is strengthened; otherwise it is weakened.
To understand the Hebb rule, it helps to briefly recall the conditioned reflex experiment. In Pavlov's experiment, a bell was rung each time just before the dog was fed, so the dog came to associate the bell with food: it would salivate when the bell rang even when no food was given.
Figure 7. The conditioned reflex experiment of Pavlov
Inspired by this experiment, Hebb's theory holds that the connection between neurons that are excited at the same time is strengthened. For example, if one neuron is excited when the bell rings and, at the same time, the presence of food excites another nearby neuron, then the connection between the two neurons is strengthened, and the network thereby remembers that the two things are related. Conversely, if two neurons are rarely excited at the same time, the connection between them becomes weaker.
The Hebb learning rule can be expressed, in standard notation, as:
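$$\Delta w_{ij} = a \, y_i \, y_j$$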
Here wij denotes the connection weight from neuron j to neuron i, yi and yj are the outputs of the two neurons, and a is a constant that controls the learning speed. If yi and yj are activated at the same time, i.e. both are positive, then wij increases. If yi is activated while yj is inhibited, i.e. yi is positive and yj is negative, then wij decreases.
(4) Supervised learning algorithm: the Delta learning rule
The Delta learning rule is a simple supervised learning algorithm. It adjusts the connection weights according to the difference between the actual output of a neuron and its desired output. Its mathematical form, in standard notation, is:
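$$\Delta w_{ij} = a \,(d_i - y_i)\, x_j$$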
Here wij denotes the connection weight from neuron j to neuron i, di is the desired output of neuron i, and yi is its actual output; xj denotes the state of neuron j: if neuron j is activated, xj is 1, and if it is inhibited, xj is 0 or -1 (depending on the activation function); a is a constant that controls the learning speed. Suppose xj is 1: if di is greater than yi, then wij increases, and if di is smaller than yi, then wij decreases.
In plain words, the Delta rule says: if the actual output of a neuron is larger than the desired output, decrease the weights of all connections whose input is positive and increase the weights of all connections whose input is negative; conversely, if the actual output is smaller than the desired output, increase the weights of all connections whose input is positive and decrease the weights of all connections whose input is negative. The magnitude of the increase or decrease is given by the equation above.
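As a minimal sketch of how the Delta rule drives the supervised learning loop of steps 1)-5) above, here is MATLAB code that trains a single linear neuron (the data and variable names are illustrative, not from the original article):

% Delta-rule training of one linear neuron (illustrative sketch)
a = 0.1;                              % learning rate
X = [0 0; 0 1; 1 0; 1 1];             % training inputs, one sample per row
d = [0; 0; 0; 1];                     % desired outputs (here: logical AND)
w = zeros(2, 1);                      % connection weights
theta = 0;                            % threshold (bias)
for epoch = 1:100
    for k = 1:size(X, 1)
        y = X(k, :) * w - theta;      % actual output of the neuron
        e = d(k) - y;                 % error: desired minus actual
        w = w + a * e * X(k, :)';     % delta rule: dw = a * (d - y) * x
        theta = theta - a * e;        % threshold treated as the weight of a constant input -1
    end
end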
(5) Supervised learning algorithm: the BP algorithm
Feedforward neural networks trained with the BP learning algorithm are usually called BP networks.
Figure 8. Three-layer BP neural network structure
A BP network has a strong nonlinear mapping ability: a 3-layer BP neural network can approximate any nonlinear function (by Kolmogorov's theorem). A typical 3-layer BP neural network model is shown in Figure 8.
The learning algorithm of the BP network takes considerable space to describe, so I intend to present it in the next article.
Section 2: Neural Network Implementations
1. Data preprocessing
Before training a neural network, the data must be preprocessed, and an important preprocessing method is normalization. The principle and methods of normalization are briefly introduced below.
(1) What is normalization?
Data normalization maps the data into the interval [0, 1] or [-1, 1], or into some other small interval such as (0.1, 0.9).
(2) Why normalize?
<1> The input data have different units; some values may be particularly large, which makes the neural network converge slowly and training take a long time.
<2> Inputs with a large value range may play an excessively large role in pattern classification, while inputs with a small value range may play a correspondingly small one.
<3> Because the range of the activation function of the output layer is bounded, the target outputs of the training data must be mapped into that range. For example, if the output layer uses the sigmoid activation function, whose values are confined to (0, 1), the network outputs can only lie in (0, 1), so the outputs of the training data must be normalized to the [0, 1] interval.
<4> The sigmoid activation function is flat (saturated) far away from 0, where its sensitivity is too small. For example, for the sigmoid function f(x) with parameter a = 1, f(5) ≈ 0.9933 and f(100) ≈ 1.0000 differ by only 0.0067.
(3) Normalization algorithms
A simple and fast normalization algorithm is linear conversion. Two common forms of linear conversion are:
<1>
y = (x - min) / (max - min)
where min is the minimum of x and max is the maximum of x; x is the input vector and y is the normalized output vector. This formula normalizes the data to the [0, 1] interval, and applies when the activation function is the sigmoid function (whose range is (0, 1)).
<2>
y = 2 * (x - min) / (max - min) - 1
This formula normalizes the data to the [-1, 1] interval, and applies when the activation function is the bipolar sigmoid function (whose range is (-1, 1)).
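As a quick MATLAB check of both formulas (the values here are illustrative):

x = [2 4 10];
y1 = (x - min(x)) / (max(x) - min(x));          % [0 0.25 1], mapped to [0, 1]
y2 = 2 * (x - min(x)) / (max(x) - min(x)) - 1;  % [-1 -0.5 1], mapped to [-1, 1]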
(4) MATLAB data normalization functions
In MATLAB, data can be normalized with the 3 functions premnmx, postmnmx and tramnmx.
<1> premnmx
Syntax: [pn, minp, maxp, tn, mint, maxt] = premnmx(p, t)
Parameters:
pn: the matrix p normalized by rows
minp, maxp: the minimum and maximum value of each row of p
tn: the matrix t normalized by rows
mint, maxt: the minimum and maximum value of each row of t
Purpose: normalizes the matrices p and t to [-1, 1] by rows; mainly used to normalize the training data set.
<2> tramnmx
Syntax: [pn] = tramnmx(p, minp, maxp)
Parameters:
minp, maxp: the row minima and maxima computed by premnmx
pn: the normalized matrix
Purpose: mainly used to normalize the input data that is to be classified, using the scaling of the training set.
<3> postmnmx
Syntax: [p, t] = postmnmx(pn, minp, maxp, tn, mint, maxt)
Parameters:
minp, maxp: the per-row minima and maxima of p computed by premnmx
mint, maxt: the per-row minima and maxima of t computed by premnmx
Purpose: maps the matrices pn and tn back to their ranges before normalization; mainly used to map the output of the neural network back to the original data range.
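A minimal sketch of how the three functions fit together, using the signatures above (the data here is illustrative; in newer MATLAB versions these functions were superseded by mapminmax):

% Normalize training data, reuse the same scaling for new data,
% and map network outputs back to the original range.
p = [1 3 5; 10 20 30];                             % training inputs, one feature per row
t = [0.2 0.4 0.6];                                 % training targets
[pn, minp, maxp, tn, mint, maxt] = premnmx(p, t);  % rows scaled to [-1, 1]
pnew = [2 4; 15 25];                               % new samples to classify
pnewn = tramnmx(pnew, minp, maxp);                 % apply the training-set scaling
% after simulating the trained network on pnewn to get an output an:
% [p2, t2] = postmnmx(pn, minp, maxp, an, mint, maxt);  % map back to the original range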
2. Using MATLAB to implement neural networks
Building a feedforward neural network in MATLAB mainly uses the following 3 functions:
newff: feedforward network creation function
train: trains a neural network
sim: runs (simulates) the network
These 3 functions are briefly introduced below.
(1) The newff function
<1> newff syntax
The newff function has many optional parameters, which can be looked up in MATLAB's help documentation; only a simple form of the function is described here.
Syntax: net = newff(A, B, {C}, 'trainFun')
Parameters:
A: an n×2 matrix whose i-th row holds the minimum and maximum value of input signal xi;
B: a k-dimensional row vector whose elements are the number of nodes in each layer of the network;
C: a k-dimensional cell array of strings, each element being the activation function of the corresponding layer;
trainFun: the training algorithm used for learning.
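For example, a minimal sketch of creating a network for the 4-input, 3-output setting used later in this article (the hidden-layer size of 10 is an illustrative choice, not a prescription):

A = [-1 1; -1 1; -1 1; -1 1];                              % 4 inputs, each in [-1, 1] after premnmx
net = newff(A, [10 3], {'tansig' 'purelin'}, 'traingdx');  % 10 hidden nodes, 3 output nodes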
<2> Common activation functions
The common activation functions are:
a) Linear transfer function:
f(x) = x
The string for this function is 'purelin'.
b) Logarithmic sigmoid transfer function:
f(x) = 1 / (1 + e^(-x))
The string for this function is 'logsig'.
c) Hyperbolic tangent sigmoid transfer function:
f(x) = 2 / (1 + e^(-2x)) - 1
This is the bipolar S-shaped function mentioned above. The string for this function is 'tansig'.
The subdirectory toolbox\nnet\nnet\nntransfer under the MATLAB installation directory contains the definitions of all the activation functions.
<3> Common training functions
The common training functions are:
traingd: gradient descent BP training function (gradient descent backpropagation)
traingdx: gradient descent BP training function with adaptive learning rate
<4> Network configuration parameters
Some important network configuration parameters are as follows:
net.trainParam.goal: target error of neural network training
net.trainParam.show: period (in epochs) for displaying intermediate results
net.trainParam.epochs: maximum number of training iterations
net.trainParam.lr: learning rate
(2) The train function
Network training function.
Syntax: [net, tr, Y1, E] = train(net, X, Y)
Parameters:
X: actual input of the network
Y: expected output of the network
tr: training trace information
Y1: actual output of the network
E: error matrix
(3) The sim function
Syntax: Y = sim(net, X)
Parameters:
net: the network
X: input to the network, a K×N matrix, where K is the number of network inputs and N is the number of data samples
Y: output matrix, Q×N, where Q is the number of network outputs
(4) MATLAB BP network example
I divided the Iris data set into 2 groups of 75 samples each, with 25 samples of each flower species per group. One group served as the training set for the program below and the other as the test set. For convenience of training, the 3 species of flowers were numbered 1, 2 and 3.
This data is used to train a feedforward network with 4 inputs (corresponding to the 4 features) and 3 outputs (each output roughly expressing the likelihood that the sample belongs to the corresponding species).
The MATLAB program is as follows:
% Read training data
[f1, f2, f3, f4, class] = textread('trainData.txt', '%f%f%f%f%f', 150);

% Normalize the feature values
[input, minI, maxI] = premnmx([f1, f2, f3, f4]');

% Construct the output matrix
s = length(class);
output = zeros(s, 3);
for i = 1:s
    output(i, class(i)) = 1;
end

% Create the neural network
% (the hidden-layer size of 10 is adjustable; see <1> below)
net = newff(minmax(input), [10 3], {'logsig' 'purelin'}, 'traingdx');

% Set training parameters
net.trainParam.show = 50;
net.trainParam.epochs = 500;
net.trainParam.goal = 0.01;
net.trainParam.lr = 0.01;

% Start training
net = train(net, input, output');

% Read test data
[t1, t2, t3, t4, c] = textread('testData.txt', '%f%f%f%f%f', 150);

% Normalize the test data (with the training-set scaling)
testInput = tramnmx([t1, t2, t3, t4]', minI, maxI);

% Simulate
Y = sim(net, testInput);

% Compute the recognition rate
[s1, s2] = size(Y);
hitNum = 0;
for i = 1:s2
    [m, index] = max(Y(:, i));
    if (index == c(i))
        hitNum = hitNum + 1;
    end
end
sprintf('Recognition rate is %3.3f%%', 100 * hitNum / s2)
The recognition rate of the above program is stable at about 95%, and training converges after about 100 epochs; the training curve is shown below:
Figure 9. Training performance
(5) Effect of parameter settings on network performance
In my experiments, I adjusted the number of hidden-layer nodes, chose different activation functions, and set different learning rates. The observations follow.
<1> Number of hidden-layer nodes
The number of hidden-layer nodes has little effect on the recognition rate, but more nodes increase the amount of computation and make training slower.
<2> Choice of activation function
The activation function has a significant effect on both the recognition rate and the convergence speed. When approximating higher-order curves, the sigmoid functions are far more accurate than linear functions, but their computational cost is much larger.
<3> Choice of learning rate
The learning rate affects both the speed of convergence and whether the network converges at all. Setting the learning rate small guarantees convergence, but convergence is slow. Conversely, setting it large may make training fail to converge, which ruins the recognition performance.
3. Using AForge.NET to implement neural networks
(1) Introduction to AForge.NET
AForge.NET is an open-source C# framework for AI, computer vision and related fields. The Neuro directory of the AForge.NET source code contains a neural network class library.
AForge.NET home page: http://www.aforgenet.com/
AForge.NET code download: http://code.google.com/p/aforge/
The class diagram of the AForge.Neuro project is as follows:
Figure 10. Class diagram of the AForge.Neuro class library
Here are a few of the basic classes in Figure 10:
Neuron: abstract base class for neurons
Layer: abstract base class for layers; a layer is composed of multiple neurons
Network: abstract base class for neural networks, composed of multiple layers (Layer)
IActivationFunction: interface for activation functions
IUnsupervisedLearning: interface for unsupervised learning algorithms
ISupervisedLearning: interface for supervised learning algorithms
(2) Using AForge to build a BP neural network
Building a BP neural network with AForge uses the following classes:
<1> SigmoidFunction: the S-shaped activation function
Constructor: public SigmoidFunction(double alpha)
The parameter alpha determines the steepness of the S-shaped function.
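This is the same S-shaped function introduced in Section 1, with steepness a = alpha:

$$f(x) = \frac{1}{1+e^{-\alpha x}}$$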
<2> ActivationNetwork: the neural network class
Constructor and main method:
public ActivationNetwork(IActivationFunction function, int inputsCount, params int[] neuronsCount)
    : base(inputsCount, neuronsCount.Length)
public virtual double[] Compute(double[] input)
Parameter meanings:
inputsCount: the number of inputs
neuronsCount: the number of neurons in each layer
<3> BackPropagationLearning: the BP learning algorithm
Constructor:
public BackPropagationLearning(ActivationNetwork network)
Parameter meaning:
network: the neural network to be trained
The BackPropagationLearning class has 2 properties that need to be set by the user:
LearningRate: the learning rate
Momentum: the momentum coefficient
Below is code that builds a BP network with AForge:

// Create a multi-layer neural network with a sigmoid activation function:
// 4 inputs, a 5-neuron hidden layer and a 3-neuron output layer
ActivationNetwork network = new ActivationNetwork(
    new SigmoidFunction(2), 4, 5, 3);

// Create the training algorithm object
BackPropagationLearning teacher = new BackPropagationLearning(network);

// Set the learning rate and momentum coefficient of the BP algorithm
teacher.LearningRate = 0.1;
teacher.Momentum = 0;

int iteration = 1;

// Train iteratively, 500 epochs
while (iteration < 500)
{
    teacher.RunEpoch(trainInput, trainOutput);
    ++iteration;
}

// Use the trained network to classify; t is the input data vector
network.Compute(t)[0];
Classifying the Iris data with this program, the recognition rate can reach about 97%.
Article from: http://www.cnblogs.com/heaad/
If you repost, please keep the source. Thanks!
References
[1] Andrew Kirillov. Neural Networks on C#. [Online]. http://www.codeproject.com/KB/recipes/aforge_neuro.aspx, 2006.10
[2] Sacha Barber. AI: Neural Network for Beginners. [Online]. http://www.codeproject.com/KB/recipes/NeuralNetwork_1.aspx, 2007.5
[3] Richard O. Duda, Peter E. Hart and David G. Stork. Pattern Classification. China Machine Press, 2010.4
[4] Wikipedia. Iris flower data set. [Online]. http://en.wikipedia.org/wiki/Iris_flower_data_set