In the preceding pattern recognition material, the parametric form of the discriminant function was assumed to be known; that is, the parametric form of the underlying probability density function was given. This section does not assume the exact form of the probability density function, and instead solves for the discriminant function directly, using a nonparametric method. Since linear discriminant functions have many desirable properties, we consider only discriminant functions of the following forms: they are either linear functions of the components of x, or linear functions of certain functions that take x as an argument. Before designing the perceptron, a few basic concepts need to be clarified:
First, the discriminant function: a function formed as a linear combination of the components of x:

g(x) = w^T x + w_0

where w is the weight vector and w_0 is the bias (threshold) term.
If there are c classes of samples, there are c discriminant functions; this section discusses only the classification of two classes of samples. The following decision rule is required:

decide x ∈ ω1 if g(x) > 0, and x ∈ ω2 if g(x) < 0

(if g(x) = 0, x can be assigned to either class or left undecided).
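As a minimal MATLAB illustration of this decision rule (the weight vector, bias, and sample below are arbitrary values chosen for the example, not taken from this post's data):

% Two-class decision rule for a linear discriminant (example values)
w  = [1; -2];          % example weight vector (assumed)
w0 = 0.5;              % example bias term (assumed)
x  = [3; 1];           % sample to classify
g  = w' * x + w0;      % evaluate g(x) = w'x + w0
if g > 0
    disp('x is assigned to class w1');
elseif g < 0
    disp('x is assigned to class w2');
else
    disp('x lies on the decision boundary');
end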
Second, linear separability of the samples: the samples are linearly separable if one or more linear decision surfaces in feature space can correctly separate the classes. For two classes of sample points ω1 and ω2, denote the sample set as {x_1, x_2, ..., x_N}. To separate ω1 from ω2 with a single discriminant function, these samples must be used to determine the weight vector a of the discriminant function. Working with the augmented sample vectors y, this means there must exist a suitable augmented weight vector a such that:

a^T y_i > 0  if y_i ∈ ω1
a^T y_i < 0  if y_i ∈ ω2

where y_i = (1, x_i^T)^T is the augmented sample vector. In that case the samples are said to be linearly separable; as illustrated in the accompanying figures, the first case is linearly separable while the second is not. All weight vectors that satisfy the condition are called solution vectors.
A constraint is usually imposed on the solution: introduce a margin b > 0, and require the solution vector to satisfy:

a^T y_i ≥ b > 0, i = 1, 2, ..., N.
Introducing the margin b helps prevent the optimization algorithm from converging to a point on the boundary of the solution region.
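A candidate weight vector can be checked against this margin condition directly. The sketch below is illustrative only: it assumes the normalized augmented samples are stored as rows of a matrix w12 (the same layout used by the code at the end of this post), and the values of a and b are made up:

% Check the margin condition a'*y_i >= b for every normalized sample
a   = [0.5 1.0 -0.3];              % candidate augmented weight vector (assumed)
b   = 0.1;                         % margin (assumed)
w12 = [1 2 1; 1 0 3; 1 -1 2];      % rows = normalized augmented samples (example data)
if all(w12 * a' >= b)
    disp('a is a solution vector satisfying the margin');
else
    disp('a violates the margin condition for at least one sample');
end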
Third, the Perceptron criterion function
Here we construct a criterion function from the system of linear inequalities, taking J(.) to be:

J_p(a) = Σ_{y ∈ Y} (−a^T y)

where Y is the set of samples misclassified by the weight vector a. Since a^T y ≤ 0 for every misclassified sample, J_p(a) is never negative. If and only if J_p(a*) = min_a J_p(a) = 0 is a* a solution vector. This is the perceptron criterion function.
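A direct MATLAB translation of this criterion might look as follows (the function name and variable layout are my own; w12 again holds the normalized augmented samples as rows):

% Perceptron criterion: sum of -a'*y over the misclassified samples
function jp = perceptroncriterion(a, w12)
    y  = w12 * a';            % a'*y_i for every sample
    jp = sum(-y(y <= 0));     % misclassified samples satisfy a'*y_i <= 0
end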
1. Basic Perceptron Design
The perceptron criterion function can be minimized with the gradient descent iterative algorithm:

a(k+1) = a(k) − η(k) ∇J_p(a(k))

where k is the iteration index and η(k) is the adjustment step size. The weight vector of the next iteration is obtained by moving the current weight vector in the negative gradient direction of the objective function. Since the gradient of the perceptron criterion is ∇J_p(a) = Σ_{y ∈ Y_k} (−y), where Y_k is the set of samples misclassified by a(k), the update becomes:

a(k+1) = a(k) + η(k) Σ_{y ∈ Y_k} y

That is, at each iteration the misclassified samples, scaled by the step size, are added to the weight vector. This yields the perceptron algorithm.
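In MATLAB, one iteration of this update can be written compactly; the sketch assumes a, eta, and the normalized augmented sample matrix w12 (rows = samples) are already defined:

% One gradient-descent step on the perceptron criterion
y   = w12 * a';                        % evaluate a'*y_i for every sample
mis = (y <= 0);                        % logical index of misclassified samples
a   = a + eta * sum(w12(mis, :), 1);   % add the scaled sum of misclassified samples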
2. Batch-processing Perceptron algorithm
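In outline, the batch algorithm simply repeats the update above until no sample is misclassified (or an iteration cap is reached). A condensed, runnable sketch, consistent with the full implementation given at the end of this post:

% Condensed batch perceptron (sketch; full version with plotting below)
function a = batchsketch(w12, eta, maxiter)
    a = zeros(1, size(w12, 2));             % start from the zero vector
    for k = 1:maxiter
        y   = w12 * a';                     % a'*y_i for every sample
        mis = (y <= 0);                     % misclassified samples
        if ~any(mis), break; end            % converged: nothing left to correct
        a = a + eta * sum(w12(mis, :), 1);  % batch correction
    end
end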
3. Fixed incremental perceptron algorithm
In general, correcting with all of the misclassified samples at once is not the most efficient practice; it is more common to use incremental methods that correct with only one sample (or a small batch of samples) at a time. The single-sample fixed increment rule is:

a(k+1) = a(k) + y^k

where y^k is any sample misclassified by a(k); the step size is fixed at η = 1, hence the name fixed increment.
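One pass of this rule over the training set, again with w12 holding the normalized augmented samples as rows:

% Single-sample fixed increment corrections (eta = 1), one pass
for i = 1:size(w12, 1)
    if w12(i,:) * a' <= 0       % sample i is misclassified
        a = a + w12(i,:);       % correct a with that single sample
    end
end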
Convergence analysis: as long as the training sample set is linearly separable, the algorithm is guaranteed to converge after a finite number of iterations for any initial value a(1). When the samples are not linearly separable, the perceptron algorithm cannot converge.
Conclusion: the perceptron is the simplest machine that can "learn"; it is the most basic method for solving linearly separable problems, and it is also the basis for many more complex algorithms. The perceptron algorithm has many generalizations, such as the variable-increment perceptron with margin, the batch margin relaxation algorithm, the single-sample margin relaxation algorithm, and so on.
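As one example of these generalizations, the variable-increment perceptron with margin corrects a whenever a'*y_k fails to exceed the margin b, not merely when it is negative, and lets the step size vary with the iteration. The sketch below uses a diminishing step size eta = 1/k, which is one common choice; the value of b is made up:

% Variable-increment perceptron with margin (one pass, sketch)
b = 0.5;                                % margin (assumed value)
k = 0;                                  % global correction counter
for i = 1:size(w12, 1)
    k = k + 1;
    eta = 1 / k;                        % diminishing step size (one common choice)
    if w12(i,:) * a' <= b               % margin violated, not merely misclassified
        a = a + eta * w12(i,:);         % scaled correction toward the margin
    end
end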
Below is MATLAB code implementing the batch perceptron algorithm and the fixed increment perceptron algorithm, together with four sets of data for testing:
% Batch perceptron algorithm
function batchperceptron(w1, w2)
figure;
plot(w1(:,1), w1(:,2), 'ro'); hold on; grid on;
plot(w2(:,1), w2(:,2), 'b+');
% Augmented feature vectors y for all training samples
one = ones(10,1);
y1 = [one w1];
y2 = [one w2];
w12 = [y1; -y2];                  % sample normalization: negate class-2 samples
y = zeros(size(w12,1), 1);        % a'*y for every sample, initialized to zero
% Initialize parameters
a = [0 0 0];                      % initial weight vector
eta = 1;                          % step size
time = 0;                         % number of steps to convergence
maxiter = 300;                    % iteration cap (assumed; the original value was lost)
while any(y <= 0)
    for i = 1:size(y,1)
        y(i) = a * w12(i,:)';
    end
    a = a + sum(w12(y <= 0, :), 1);   % correct a with the sum of misclassified samples
    time = time + 1;
    if (time >= maxiter)
        break;
    end
end
if (time >= maxiter)
    disp('The criterion function failed to converge within the specified maximum number of iterations');
    disp(['The solution vector a of the batch perceptron algorithm is: ', num2str(a)]);
else
    disp(['At convergence of the batch perceptron algorithm, the solution vector a is: ', num2str(a)]);
    disp(['The number of convergence steps k of the batch perceptron algorithm is: ', num2str(time)]);
end
% Determine the sample range so the decision boundary can be plotted over it
xmin = min(min(w1(:,1)), min(w2(:,1)));
xmax = max(max(w1(:,1)), max(w2(:,1)));
xindex = xmin-1 : (xmax-xmin)/100 : xmax+1;
yindex = -a(2)*xindex/a(3) - a(1)/a(3);   % boundary: a(1) + a(2)*x + a(3)*y = 0
plot(xindex, yindex);
title('Classification of two classes of data by the batch perceptron algorithm');
% Fixed increment perceptron algorithm
function fixedincrementperceptron(w1, w2)
[n, d] = size(w1);
figure;
plot(w1(:,1), w1(:,2), 'ro'); hold on; grid on;
plot(w2(:,1), w2(:,2), 'b+');
% Augmented feature vectors y for all training samples
one = ones(10,1);
y1 = [one w1];
y2 = [one w2];
w12 = [y1; -y2];                  % sample normalization: negate class-2 samples
% Initialize parameters
a = [0 0 0];                      % initial weight vector
time = 0;                         % number of steps to convergence
maxiter = 300;                    % iteration cap (assumed; the original value was lost)
y = a * w12';
while sum(y <= 0) > 0
    y = a * w12';
    rej = [];                     % indices of samples corrected in this pass
    for i = 1:2*n                 % a(k+1) = a(k) + y^j for each misclassified y^j
        if (y(i) <= 0)
            a = a + w12(i,:);
            rej = [rej i];
        end
    end
    time = time + 1;
    if (isempty(rej) || time >= maxiter)
        break;
    end
end
if (time >= maxiter)
    disp('The criterion function failed to converge within the specified maximum number of iterations');
    disp(['The solution vector a of the fixed increment perceptron algorithm is: ', num2str(a)]);
else
    disp(['At convergence of the fixed increment perceptron algorithm, the solution vector a is: ', num2str(a)]);
    disp(['The number of convergence steps k of the fixed increment perceptron algorithm is: ', num2str(time)]);
end
% Determine the sample range so the decision boundary can be plotted over it
xmin = min(min(w1(:,1)), min(w2(:,1)));
xmax = max(max(w1(:,1)), max(w2(:,1)));
xindex = xmin-1 : (xmax-xmin)/100 : xmax+1;
yindex = -a(2)*xindex/a(3) - a(1)/a(3);   % boundary: a(1) + a(2)*x + a(3)*y = 0
plot(xindex, yindex);
title('Classification of two classes of data by the fixed incremental perceptron algorithm');
close all;
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Perceptron experiment
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
w1 = [ 0.1  1.1;  6.8  7.1; -3.5 -4.1;  2.0  2.7;  4.1  2.8; ...
       3.1  5.0; -0.8 -1.3;  0.9  1.2;  5.0  6.4;  3.9  4.0];
w2 = [ 7.1  4.2; -1.4 -4.3;  4.5  0.0;  6.3  1.6;  4.2  1.9; ...
       1.4 -3.2;  2.4 -4.0;  2.5 -6.1;  8.4  3.7;  4.1 -2.2];
w3 = [-3.0 -2.9;  0.54  8.7;  2.9  2.1; -0.1  5.2; -4.0  2.2; ...
      -1.3  3.7; -3.4  6.2; -4.1  3.4; -5.1  1.6;  1.9  5.1];
w4 = [-2.0 -8.4; -8.9  0.2; -4.2 -7.7; -8.5 -3.2; -6.7 -4.0; ...
      -0.5 -9.2; -5.3 -6.7; -8.7 -6.4; -7.1 -9.7; -8.0 -6.3];
batchperceptron(w1, w2);
fixedincrementperceptron(w1, w3);