1. Data preprocessing
Before training a neural network, the data must be preprocessed, and an important preprocessing method is normalization. The following briefly introduces the principle and methods of normalization.
(1) What is normalization?
Data normalization maps the data onto the interval [0,1] or [-1,1], or onto a smaller interval such as (0.1,0.9).
(2) Why normalize?
<1> The input data have different units, and some values may be particularly large, which makes the neural network converge slowly and lengthens training time.
<2> Inputs with a large data range may play an excessively large role in pattern classification, while inputs with a small data range may play too small a role.
<3> Because the range of the activation function in the output layer of the neural network is limited, the target data for network training must be mapped into the range of the activation function. For example, if the output layer uses the S-shaped activation function, whose values are limited to (0,1), the output of the network is also limited to (0,1), so the target outputs of the training data must be normalized to the [0,1] interval.
<4> The S-shaped activation function is very flat outside the (0,1) interval, so it barely distinguishes between inputs there. For example, for the S-shaped function f(x) with parameter a=1, f(100) and f(5) differ by only 0.0067.
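The saturation described in point <4> can be checked numerically. The following is a minimal Python sketch, assuming the S-shaped function is the logistic function f(x) = 1 / (1 + e^(-a*x)); the function name sigmoid is chosen here for illustration only.

```python
import math

def sigmoid(x, a=1.0):
    """S-shaped (logistic) function with steepness parameter a."""
    return 1.0 / (1.0 + math.exp(-a * x))

# Two very different inputs produce almost the same output.
gap = sigmoid(100) - sigmoid(5)
print(f"f(5) = {sigmoid(5):.4f}, f(100) = {sigmoid(100):.4f}, gap = {gap:.4f}")
```

Because the outputs for 5 and 100 are nearly identical, unnormalized inputs of this size carry almost no usable information through the activation function.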
(3) Normalization algorithm
A simple and fast normalization algorithm is linear conversion. It has two common forms:
<1>
y = (x - min) / (max - min)
where min is the minimum value of x, max is the maximum value of x, x is the input vector, and y is the normalized output vector. This formula normalizes the data to the [0,1] interval and applies when the activation function is the S-shaped function (range (0,1)).
<2>
y = 2 * (x - min) / (max - min) - 1
This formula normalizes the data to the [-1,1] interval and applies when the activation function is the bipolar S-shaped function (range (-1,1)).
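The two linear conversions can be sketched in a few lines of Python. This is only an illustration of the formulas above: the function names are made up here, and the sketch assumes max > min (otherwise the division fails).

```python
def minmax_01(xs):
    """Form <1>: map the values linearly onto [0, 1]."""
    lo, hi = min(xs), max(xs)
    return [(x - lo) / (hi - lo) for x in xs]

def minmax_pm1(xs):
    """Form <2>: map the values linearly onto [-1, 1]."""
    lo, hi = min(xs), max(xs)
    return [2 * (x - lo) / (hi - lo) - 1 for x in xs]

x = [2.0, 4.0, 6.0, 10.0]
y01 = minmax_01(x)    # minimum maps to 0, maximum maps to 1
ypm1 = minmax_pm1(x)  # minimum maps to -1, maximum maps to 1
print(y01)
print(ypm1)
```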
(4) MATLAB data normalization functions
In MATLAB, data can be normalized with three functions: premnmx, postmnmx, and tramnmx.
<1> premnmx
Syntax: [pn,minp,maxp,tn,mint,maxt] = premnmx(p,t)
Parameters:
pn: the matrix p normalized by row
minp, maxp: the minimum and maximum value of each row of the matrix p
tn: the matrix t normalized by row
mint, maxt: the minimum and maximum value of each row of the matrix t
Function: normalizes the matrices p and t to [-1,1]; mainly used to normalize the training data set.
<2> tramnmx
Syntax: [pn] = tramnmx(p,minp,maxp)
Parameters:
minp, maxp: the row minima and maxima computed by the premnmx function
pn: the normalized matrix
Function: mainly used to normalize the input data to be classified.
<3> postmnmx
Syntax: [p,t] = postmnmx(pn,minp,maxp,tn,mint,maxt)
Parameters:
minp, maxp: the minimum and maximum value of each row of the matrix p, as computed by the premnmx function
mint, maxt: the minimum and maximum value of each row of the matrix t, as computed by the premnmx function
Function: maps the matrices pn and tn back to their ranges before normalization. The postmnmx function is mainly used to map the output of the neural network back to the data range before normalization.
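The division of labor among the three functions (compute the row minima and maxima on the training data, reuse them on new data, and invert the mapping on the outputs) can be sketched in Python. This is not the MATLAB implementation; the helper names fit_rows, apply_rows, and invert_rows are hypothetical, and the sketch assumes every row has max > min.

```python
def fit_rows(rows):
    """Per-row minimum and maximum (the role of minp/maxp)."""
    return [min(r) for r in rows], [max(r) for r in rows]

def apply_rows(rows, mins, maxs):
    """Map each row to [-1, 1] using stored minima and maxima."""
    return [[2 * (v - lo) / (hi - lo) - 1 for v in r]
            for r, lo, hi in zip(rows, mins, maxs)]

def invert_rows(rows_n, mins, maxs):
    """Map normalized rows back to their original range."""
    return [[(v + 1) * (hi - lo) / 2 + lo for v in r]
            for r, lo, hi in zip(rows_n, mins, maxs)]

# 2 features (rows) x 3 samples (columns), made-up numbers.
train = [[1.0, 2.0, 3.0], [10.0, 30.0, 50.0]]
mins, maxs = fit_rows(train)
train_n = apply_rows(train, mins, maxs)           # normalize training data
test_n = apply_rows([[2.5], [20.0]], mins, maxs)  # reuse training statistics
restored = invert_rows(train_n, mins, maxs)       # undo the normalization
```

The important design point is that the test data are scaled with the minima and maxima of the training data, not with their own.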
2. Using MATLAB to implement neural networks
Building a feedforward neural network in MATLAB mainly uses the following 3 functions:
newff: creates a feedforward network
train: trains a neural network
sim: simulates the network (computes its outputs for given inputs)
The following briefly introduces the use of these 3 functions.
(1) newff function
<1> newff function syntax
The newff function has many optional parameters, which are described in MATLAB's help documentation; only a simple form of newff is introduced here.
Syntax: net = newff(A, B, {C}, 'trainFun')
Parameters:
A: an n×2 matrix whose i-th row holds the minimum and maximum value of the input signal xi;
B: a k-dimensional row vector whose elements are the numbers of nodes in each layer of the network;
C: a k-dimensional row vector of strings, each component being the activation function of the corresponding layer;
trainFun: the training algorithm used for learning.
<2> Common activation functions
The commonly used activation functions are:
a) Linear transfer function
f(x) = x
The string for this function is 'purelin'.
b) Logarithmic sigmoid transfer function
f(x) = 1 / (1 + e^(-x))
The string for this function is 'logsig'.
c) Hyperbolic tangent sigmoid transfer function
f(x) = 2 / (1 + e^(-2x)) - 1
This is the bipolar S-shaped function mentioned above.
The string for this function is 'tansig'.
The toolbox\nnet\nnet\nntransfer subdirectory of the MATLAB installation directory contains the definitions of all activation functions.
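To compare the output ranges of the three activation functions, they can be evaluated at a few points. The Python sketch below uses the standard textbook definitions and borrows the MATLAB strings as function names purely for readability; it is not the toolbox code.

```python
import math

def purelin(x):
    """Linear function: output equals input, unbounded."""
    return x

def logsig(x):
    """Logarithmic sigmoid: output in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def tansig(x):
    """Hyperbolic tangent sigmoid: output in (-1, 1)."""
    return 2.0 / (1.0 + math.exp(-2.0 * x)) - 1.0

for x in (-3.0, 0.0, 3.0):
    print(f"x = {x:+.1f}: purelin = {purelin(x):+.4f}, "
          f"logsig = {logsig(x):.4f}, tansig = {tansig(x):+.4f}")
```

The tansig expression is mathematically the same function as tanh(x), which is why it is called the bipolar S-shaped function.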
<3> Common training functions
The common training functions are:
traingd: gradient descent backpropagation (BP) training function
traingdx: gradient descent training function with momentum and adaptive learning rate
<4> Network configuration parameters
Some important network configuration parameters are:
net.trainParam.goal: target error of neural network training
net.trainParam.show: number of iterations between displays of intermediate results
net.trainParam.epochs: maximum number of iterations
net.trainParam.lr: learning rate
(2) train function
Trains the network.
Syntax: [net, tr, Y1, E] = train(net, X, Y)
Parameters:
X: actual inputs to the network
Y: desired outputs of the network
tr: training record (tracking information)
Y1: actual outputs of the network
E: error matrix
(3) sim function
Syntax: Y = sim(net, X)
Parameters:
net: the network
X: the k×n input matrix, where k is the number of network inputs and n is the number of data samples
Y: the q×n output matrix, where q is the number of network outputs
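What a simulation step does with a k×n input matrix can be illustrated with a hand-written forward pass. The Python sketch below is an assumption-laden illustration, not the behavior of the MATLAB sim function: the network shape, weights, and helper names (forward, simulate) are all made up.

```python
import math

def logsig(x):
    return 1.0 / (1.0 + math.exp(-x))

def forward(weights, biases, activations, x):
    """Propagate one input vector x through all layers."""
    a = x
    for W, b, f in zip(weights, biases, activations):
        a = [f(sum(w * v for w, v in zip(row, a)) + bi)
             for row, bi in zip(W, b)]
    return a

def simulate(weights, biases, activations, X):
    """X holds one sample per column (k x n); the result is q x n."""
    columns = list(zip(*X))
    outputs = [forward(weights, biases, activations, list(c)) for c in columns]
    return [list(r) for r in zip(*outputs)]

# Hypothetical network: 2 inputs, 2 hidden nodes (logsig), 1 linear output.
weights = [[[1.0, -1.0], [0.5, 0.5]], [[1.0, 1.0]]]
biases = [[0.0, 0.0], [0.0]]
activations = [logsig, lambda v: v]

X = [[0.0, 1.0, 2.0],
     [0.0, 1.0, 0.0]]  # k = 2 inputs, n = 3 samples
Y = simulate(weights, biases, activations, X)
print(len(Y), "x", len(Y[0]))  # q x n
```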
(4) MATLAB BP network example
I divided the Iris dataset into 2 groups of 75 samples each, with 25 samples of each kind of flower per group. One group serves as the training samples for the program and the other as the test samples. For the convenience of training, the 3 categories of flowers are numbered 1, 2, 3.
These data are used to train a feedforward network with 4 inputs (corresponding to the 4 features) and 3 outputs (each indicating how likely the sample is to belong to the corresponding category).
The MATLAB program is as follows:
% Read the training data
[f1,f2,f3,f4,class] = textread('trainData.txt', '%f%f%f%f%f', 150);

% Normalize the feature values
[input,minI,maxI] = premnmx([f1, f2, f3, f4]');

% Construct the output matrix (one row per sample, one column per class)
s = length(class);
output = zeros(s, 3);
for i = 1:s
    output(i, class(i)) = 1;
end

% Create the neural network
% NOTE: the source shows only [3] for the layer sizes although two activation
% functions are given. hiddenNodes is an assumed placeholder for the lost
% hidden-layer size, not a value taken from the source.
hiddenNodes = 10;
net = newff(minmax(input), [hiddenNodes 3], {'logsig' 'purelin'}, 'traingdx');

% Set the training parameters
% (the values of show and epochs are missing in the source, so the toolbox
% defaults are kept; set them here if needed)
% net.trainParam.show = ...;
% net.trainParam.epochs = ...;
net.trainParam.goal = 0.01;
net.trainParam.lr = 0.01;

% Start training
net = train(net, input, output');

% Read the test data
[t1,t2,t3,t4,c] = textread('testData.txt', '%f%f%f%f%f', 150);

% Normalize the test data with the minima and maxima of the training data
testInput = tramnmx([t1,t2,t3,t4]', minI, maxI);

% Simulate
Y = sim(net, testInput)

% Compute the recognition rate
[s1, s2] = size(Y);
hitNum = 0;
for i = 1:s2
    [m, Index] = max(Y(:, i));
    if (Index == c(i))
        hitNum = hitNum + 1;
    end
end
sprintf('Recognition rate is %3.3f%%', 100 * hitNum / s2)
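Two steps of the program are easy to get wrong: building the 0/1 target matrix from the class numbers, and counting a sample as correct when the output with the largest value matches its class. A small Python sketch of both steps follows; the helper names and the score values are made up for illustration.

```python
def one_hot(labels, num_classes):
    """Turn 1-based class numbers (1, 2, 3) into 0/1 target rows."""
    return [[1.0 if c == k else 0.0 for k in range(1, num_classes + 1)]
            for c in labels]

def recognition_rate(outputs, labels):
    """Percentage of samples whose largest output matches the class number."""
    hits = 0
    for scores, c in zip(outputs, labels):
        predicted = scores.index(max(scores)) + 1  # back to 1-based classes
        if predicted == c:
            hits += 1
    return 100.0 * hits / len(labels)

labels = [1, 3, 2, 2]
targets = one_hot(labels, 3)

# Made-up network outputs, one row of 3 scores per sample.
scores = [[0.9, 0.1, 0.0],
          [0.2, 0.1, 0.7],
          [0.6, 0.3, 0.1],
          [0.1, 0.8, 0.1]]
rate = recognition_rate(scores, labels)
print(f"Recognition rate is {rate:.3f}%")
```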
The recognition rate of the above program is stable at about 95%, and training converges after about 100 iterations. The training curve is shown in the figure:
Figure 9. Training performance
(5) Effect of parameter setting on neural network performance
In my experiments I adjusted the number of hidden-layer nodes, chose different activation functions, and set different learning rates.
<1> Number of hidden-layer nodes
The number of hidden-layer nodes has little effect on the recognition rate, but more nodes increase the amount of computation and slow down training.
<2> Choice of activation function
The activation function has a significant effect on both the recognition rate and the speed of convergence. When approximating higher-order curves, the S-shaped function is much more accurate than the linear function, but it also requires much more computation.
<3> Choice of learning rate
The learning rate affects both how fast the network converges and whether it converges at all. A small learning rate ensures that the network converges, but convergence is slow. Conversely, a large learning rate may prevent training from converging and hurt the recognition rate.
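The effect of the learning rate can be reproduced on a toy problem. The Python sketch below runs plain gradient descent on f(w) = w^2 instead of a neural network, which is an assumption made to keep the example short; the three learning rates are arbitrary illustrative values.

```python
def descend(lr, steps=50, w=1.0):
    """Plain gradient descent on f(w) = w**2, whose gradient is 2*w."""
    for _ in range(steps):
        w -= lr * 2 * w
    return w

small = descend(0.01)    # converges, but slowly
moderate = descend(0.4)  # converges quickly
large = descend(1.1)     # each step overshoots further: divergence

print(f"lr=0.01 -> {small:.4f}")
print(f"lr=0.4  -> {moderate:.3e}")
print(f"lr=1.1  -> {large:.3e}")
```

With the small rate the weight is still far from the minimum at 0 after 50 steps, while the large rate makes the weight grow in magnitude instead of shrinking.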
3. Using AForge.NET to implement neural networks
(1) Introduction to AForge.NET
AForge.NET is an open-source framework implemented in C# for artificial intelligence, computer vision, and related fields. The Neuro directory of the AForge.NET source code contains a neural network class library.
AForge.NET home page: http://www.aforgenet.com/
AForge.NET code download: http://code.google.com/p/aforge/
The class diagram of the AForge.Neuro project is as follows:
Figure 10. Class diagram of the AForge.Neuro class library
Here are a few of the basic classes in Figure 10:
Neuron: abstract base class for neurons
Layer: abstract base class for layers, each consisting of multiple neurons
Network: abstract base class for neural networks, each composed of multiple layers
IActivationFunction: interface for activation functions
IUnsupervisedLearning: interface for unsupervised learning algorithms
ISupervisedLearning: interface for supervised learning algorithms
(2) Using AForge to build a BP neural network
Building a BP neural network with AForge uses the following classes:
<1> SigmoidFunction: the S-shaped activation function
Constructor: public SigmoidFunction(double alpha)
The parameter alpha determines the steepness of the S-shaped function.
<2> ActivationNetwork: neural network class
Constructor:
public ActivationNetwork(IActivationFunction function, int inputsCount, params int[] neuronsCount)
: base(inputsCount, neuronsCount.Length)
Method for computing the network output:
public virtual double[] Compute(double[] input)
Parameters:
inputsCount: number of inputs
neuronsCount: number of neurons in each layer
<3> BackPropagationLearning: BP learning algorithm
Constructor:
public BackPropagationLearning(ActivationNetwork network)
Parameters:
network: the neural network object to be trained
The BackPropagationLearning class has the following 2 properties that the user needs to set:
LearningRate: learning rate
Momentum: momentum factor
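How a learning rate and a momentum factor interact in a weight update can be sketched with the classical momentum rule. The Python code below is an assumption for illustration: it uses a one-weight toy problem, f(w) = w^2, and the textbook update delta = momentum * previous_delta - learning_rate * gradient, which is not necessarily the exact formula used inside AForge.NET.

```python
def run_updates(learning_rate, momentum, steps=100, w=1.0):
    """Gradient descent with classical momentum on f(w) = w**2."""
    delta = 0.0
    for _ in range(steps):
        grad = 2 * w
        # The new step blends the previous step with the current gradient.
        delta = momentum * delta - learning_rate * grad
        w += delta
    return w

without_momentum = run_updates(0.1, 0.0)  # momentum 0: plain gradient descent
with_momentum = run_updates(0.1, 0.5)     # part of the previous step is reused

print(f"momentum 0.0 -> {without_momentum:.3e}")
print(f"momentum 0.5 -> {with_momentum:.3e}")
```

Setting the momentum factor to 0 reduces the rule to plain gradient descent.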
Here is code that builds a BP network with AForge.
// Create a multilayer neural network with an S-shaped activation function and 4, 5, 3 neurons
// (4 is the number of inputs, 3 is the number of outputs, and 5 is the number of hidden-layer nodes)
ActivationNetwork network = new ActivationNetwork(new SigmoidFunction(2), 4, 5, 3);

// Create the training algorithm object
BackPropagationLearning teacher = new BackPropagationLearning(network);

// Set the learning rate and momentum factor of the BP algorithm
teacher.LearningRate = 0.1;
teacher.Momentum = 0;

int iteration = 1;

// Train for 500 iterations
while (iteration <= 500)
{
    teacher.RunEpoch(trainInput, trainOutput);
    ++iteration;
}

// Use the trained network to classify; t is the input data vector
// and result[0] is the first of the 3 outputs
double[] result = network.Compute(t);
This program classifies the Iris data with a recognition rate of about 97%.