The Perceptron
Assumption: the input space is X ⊆ R^n and the output space is Y = {+1, -1}. The perceptron classifies an input x by function (2.1), f(x) = sign(w·x + b), where w is the weight vector and b is the bias; sign(·) is the sign function, defined below.
The perceptron is a linear classification model and belongs to the family of discriminant models. Its hypothesis space is the set of all linear classification models (linear classifiers) defined on the feature space, i.e., the function set {f | f(x) = w·x + b}. Perceptron learning obtains the model parameters w, b of model (2.1) from the training data; perceptron prediction then uses the learned model to give the corresponding output class for a new input instance.
    f(x) = sign(w·x + b)                                (2.1)

    sign(z) = +1 if z ≥ 0, -1 if z < 0
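As a concrete illustration, model (2.1) can be sketched in Python. This is a minimal sketch, not reference code from the book; the function name `perceptron_predict` is ours, and sign(0) is taken as +1, matching sign(z) = +1 for z ≥ 0:

```python
import numpy as np

def perceptron_predict(x, w, b):
    """Perceptron model f(x) = sign(w.x + b), returning +1 or -1."""
    # sign(0) is taken as +1 here, per sign(z) = +1 for z >= 0.
    return 1 if np.dot(w, x) + b >= 0 else -1
```

For example, with w = (1, 1) and b = -1, the point (1, 1) lies on the positive side of the hyperplane and the origin on the negative side.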
Perceptron Learning Strategy
Assuming that the training data set is linearly separable, the goal of perceptron learning is to find a separating hyperplane that completely separates the positive and negative instance points of the training set. To determine the model parameters w, b, we need a learning strategy, that is, we define an (empirical) loss function and minimize it. A natural choice of loss function is the total number of misclassified points, but such a loss function is not a continuous function of the parameters w, b and is not easy to optimize. Another choice, the one the perceptron uses, is the total distance from the misclassified points to the separating hyperplane S.
The distance from a point x0 to the hyperplane S is defined as follows:

    (1/||w||) |w·x0 + b|

where ||w|| is the L2 norm of w. Ignoring the factor 1/||w||, the loss function of the perceptron is defined as follows:
    L(w, b) = − Σ_{xi ∈ M} yi (w·xi + b)                (2.4)
where M is the set of misclassified points. This loss function is the empirical risk function of perceptron learning. The loss L(w, b) is non-negative: if there are no misclassified points, its value is 0, and the fewer the misclassified points and the closer they lie to the hyperplane, the smaller its value. For a particular sample point, the loss is a linear function of w, b when the point is misclassified and 0 when it is correctly classified; therefore, given a training data set T, the loss function L(w, b) is a continuous function of w, b. The strategy of perceptron learning is to select, in the hypothesis space, the model parameters w, b that minimize the loss function (2.4); this yields the perceptron model.
    min_{w,b} L(w, b) = − Σ_{xi ∈ M} yi (w·xi + b)
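Loss (2.4) can be sketched in Python as follows (the function name `perceptron_loss` is our own; points with margin exactly 0 are counted as misclassified, consistent with the update condition yi(w·xi + b) ≤ 0 used below, though they contribute 0 to the sum):

```python
import numpy as np

def perceptron_loss(X, y, w, b):
    """L(w, b) = -sum over misclassified points of y_i (w.x_i + b)."""
    margins = y * (X @ w + b)   # y_i (w.x_i + b) for every point
    mis = margins <= 0          # misclassified (or on-boundary) points
    return -np.sum(margins[mis])
```

The loss is 0 exactly when every point satisfies yi(w·xi + b) > 0, i.e., when the hyperplane separates the data.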
Perceptron Learning Algorithm
The perceptron learning algorithm is driven by misclassification and uses stochastic gradient descent. First, a hyperplane w0, b0 is chosen arbitrarily, and the objective function is then minimized by gradient descent. In the minimization process, rather than using the gradient over all misclassified points at once, a single misclassified point is randomly selected at each step and its gradient is used for the descent. The gradient of the loss function L(w, b) is given by:
    ∇w L(w, b) = − Σ_{xi ∈ M} yi xi
    ∇b L(w, b) = − Σ_{xi ∈ M} yi
A misclassified point (xi, yi) is then randomly selected and w, b are updated:
    w ← w + η yi xi
    b ← b + η yi
Here η (0 < η ≤ 1) is the step size, also called the learning rate in statistical learning. By iterating in this way, the loss function L(w, b) can be expected to keep decreasing until it reaches 0.
Perceptron algorithm (primal form)
Input: training data set T = {(x1, y1), (x2, y2), ..., (xN, yN)}, where xi ∈ X = R^n, yi ∈ Y = {-1, +1}, i = 1, 2, ..., N; learning rate η (0 < η ≤ 1);
Output: w, b; perceptron model f(x) = sign(w·x + b).
(1) Select initial values w0, b0.
(2) Select a data point (xi, yi) from the training set.
(3) If yi (w·xi + b) ≤ 0, update:

    w ← w + η yi xi
    b ← b + η yi

(4) Go to (2) until there are no misclassified points in the training set.
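The four steps above can be sketched in Python as follows. This is a simplified sketch, not the book's reference code: it cycles through the training points in a fixed order rather than selecting a misclassified point at random, and `max_epochs` is our own guard for non-separable data:

```python
import numpy as np

def train_perceptron(X, y, eta=1.0, max_epochs=1000):
    """Primal-form perceptron: update w, b on each misclassified point."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(max_epochs):
        errors = 0
        for xi, yi in zip(X, y):
            if yi * (np.dot(w, xi) + b) <= 0:  # step (3): misclassified
                w += eta * yi * xi             # w <- w + eta * yi * xi
                b += eta * yi                  # b <- b + eta * yi
                errors += 1
        if errors == 0:                        # step (4): no misclassified points
            break
    return w, b
```

Cycling deterministically instead of sampling at random does not affect convergence on linearly separable data; it only makes the result reproducible.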
In the dual form of the perceptron, w and b are represented as linear combinations of the training instances: w = Σ_i αi yi xi, b = Σ_i αi yi. The training instances then appear only through inner products, so these inner products can be computed in advance and stored as a matrix, the Gram matrix:

    G = [xi · xj]_{N×N}
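A corresponding sketch of the dual form (the function name is ours, and we assume the standard dual updates αi ← αi + η, b ← b + η yi, with the same fixed cycling order and epoch guard as the primal sketch):

```python
import numpy as np

def train_perceptron_dual(X, y, eta=1.0, max_epochs=1000):
    """Dual-form perceptron: w = sum_i alpha_i * y_i * x_i, using the Gram matrix."""
    N = X.shape[0]
    G = X @ X.T                  # Gram matrix: G[i, j] = x_i . x_j
    alpha = np.zeros(N)
    b = 0.0
    for _ in range(max_epochs):
        errors = 0
        for i in range(N):
            # y_i (sum_j alpha_j y_j x_j.x_i + b) <= 0 means x_i is misclassified
            if y[i] * (np.sum(alpha * y * G[:, i]) + b) <= 0:
                alpha[i] += eta
                b += eta * y[i]
                errors += 1
        if errors == 0:
            break
    w = (alpha * y) @ X          # recover w from the dual coefficients
    return w, b
```

Each dual update αi ← αi + η corresponds exactly to a primal update w ← w + η yi xi, so on the same data and visiting order the two forms produce the same hyperplane.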
Statistical Learning Methods (Perceptron)