Chapter 2: The Perceptron
The perceptron is a linear classification model for binary classification. Its input is the feature vector of an instance, and its output is the class of the instance. The perceptron corresponds to a separating hyperplane that divides instances in the input space (feature space) into positive and negative classes; it is a discriminative model. A loss function based on misclassification is introduced, and minimizing this loss function by stochastic gradient descent yields the perceptron model. The perceptron learning algorithm comes in an original form and a dual form, and it is the foundation of neural networks and support vector machines.
1. Perceptron model
Definition of the perceptron:
Assume the input space (feature space) is X and the output space is Y = {+1, -1}. The input x represents the feature vector of an instance, corresponding to a point in the input space (feature space); the output y represents the class of the instance. The following function from the input space to the output space:
f(x) = sign(w·x + b)
is called the perceptron. Here w and b are the model parameters: w is the weight (weight vector), b is the bias, and w·x denotes the inner product. Geometrically, w·x + b = 0 corresponds to a hyperplane in the feature space, where w is the normal vector of the hyperplane and b is its intercept. Learning amounts to finding a hyperplane that separates the positive and negative instances of the data.
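The model above can be sketched in a few lines of NumPy. The parameter values and the two test points below are hypothetical, chosen only to illustrate the decision rule; the convention sign(0) = +1 follows the usual definition.

```python
import numpy as np

def sign(z):
    """Sign function: +1 if z >= 0, else -1 (0 is conventionally mapped to +1)."""
    return 1 if z >= 0 else -1

def predict(w, b, x):
    """Perceptron model f(x) = sign(w·x + b)."""
    return sign(np.dot(w, x) + b)

# Hypothetical parameters: hyperplane x1 + x2 - 3 = 0
w = np.array([1.0, 1.0])
b = -3.0
print(predict(w, b, np.array([3.0, 3.0])))  # w·x + b = 3 > 0, so +1
print(predict(w, b, np.array([1.0, 1.0])))  # w·x + b = -1 < 0, so -1
```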
2. Perceptron Learning Strategy
2.1 Important definition: linear separability of the data set (the premise of perceptron learning is that the data set is linearly separable)
2.2 Loss function
To find this hyperplane, that is, to determine the model parameters w and b of the perceptron, a learning strategy is defined: choose a loss function and minimize it.
The loss function chosen is the total distance from the misclassified points to the hyperplane S. First, write out the distance from any point x0 in the input space to the hyperplane:

(1/||w||) |w·x0 + b|

where ||w|| is the L2 norm of w.
Next, for misclassified data (xi, yi), the inequality -yi(w·xi + b) > 0 holds, so the distance from a misclassified point xi to the hyperplane S can be written as:

-(1/||w||) yi(w·xi + b)
Thus, letting M denote the set of points misclassified by the hyperplane S, the total distance from all misclassified points to S is:

-(1/||w||) Σ_{xi∈M} yi(w·xi + b)
Dropping the factor 1/||w||, we obtain the loss function of the perceptron:

L(w, b) = -Σ_{xi∈M} yi(w·xi + b)
where M is the set of misclassified points. This loss function is the empirical risk function of perceptron learning. The fewer the misclassified points, and the closer the misclassified points are to the hyperplane, the smaller the loss value.
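The loss above can be computed directly. This is a minimal sketch; the toy data set at the bottom is hypothetical, chosen only so that some points are misclassified by the given (also hypothetical) parameters.

```python
import numpy as np

def perceptron_loss(w, b, X, y):
    """L(w,b) = -sum over misclassified points of y_i*(w·x_i + b)."""
    margins = y * (X @ w + b)      # y_i * (w·x_i + b) for every point
    misclassified = margins <= 0   # points on the hyperplane count as errors
    return -np.sum(margins[misclassified])

# Toy data: two positive points, one negative (illustrative only)
X = np.array([[3.0, 3.0], [4.0, 3.0], [1.0, 1.0]])
y = np.array([1.0, 1.0, -1.0])
print(perceptron_loss(np.array([-1.0, -1.0]), 0.0, X, y))  # both positives misclassified: 13.0
print(perceptron_loss(np.array([1.0, 1.0]), -3.0, X, y))   # separating parameters: loss 0
```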
3. Perceptron Learning Algorithm
3.1 Original Form
With the loss function in hand, the next step is to minimize it, namely:

min_{w,b} L(w, b) = -Σ_{xi∈M} yi(w·xi + b)
The method used is stochastic gradient descent: the minimization does not perform gradient descent on all the misclassified points in M at once, but randomly selects one misclassified point at a time and descends along its gradient. Assuming the misclassification set M is fixed, the gradients of the loss function L(w, b) are given by:

∇w L(w, b) = -Σ_{xi∈M} yi xi
∇b L(w, b) = -Σ_{xi∈M} yi
Randomly select a misclassified point (xi, yi) and update w and b:

w ← w + η yi xi
b ← b + η yi
where η (0 < η ≤ 1) is the learning rate. Through iteration, the loss function L(w, b) can be expected to decrease continually until it reaches 0, which gives the following algorithm:
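The original form of the algorithm can be sketched as follows. Rather than sampling misclassified points at random, this sketch sweeps the data in order and updates on each misclassified point it meets, which is a common deterministic variant; the toy data set is hypothetical, and `max_epochs` is a safety cap I added for non-separable inputs.

```python
import numpy as np

def perceptron_train(X, y, eta=1.0, max_epochs=100):
    """Original form of the perceptron learning algorithm.

    Starts from w = 0, b = 0 and applies the updates
    w <- w + eta*y_i*x_i, b <- b + eta*y_i on each misclassified point,
    stopping once a full pass makes no mistakes.
    """
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(max_epochs):
        errors = 0
        for xi, yi in zip(X, y):
            if yi * (np.dot(w, xi) + b) <= 0:  # misclassified (or on the hyperplane)
                w += eta * yi * xi
                b += eta * yi
                errors += 1
        if errors == 0:
            break
    return w, b

# Toy linearly separable data (illustrative only)
X = np.array([[3.0, 3.0], [4.0, 3.0], [1.0, 1.0]])
y = np.array([1.0, 1.0, -1.0])
w, b = perceptron_train(X, y)  # with eta=1 and zero init this yields w=(1,1), b=-3
```

Note that the solution depends on the scan order and the initial values, matching the remark below about the algorithm admitting different solutions.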
The perceptron learning algorithm can yield different solutions depending on the choice of initial values and the order in which misclassified points are selected.
3.2 Dual Form
When the training data set is linearly separable, the perceptron learning algorithm converges, and the number of misclassifications k made by the perceptron algorithm on the training data satisfies the inequality (Novikoff):

k ≤ (R/γ)²

where R is the largest norm of the (augmented) input vectors and γ is the minimum margin of a separating hyperplane over the training set.
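In the dual form, w is represented as a linear combination of the training instances, w = Σ_i αi yi xi, so the algorithm only needs inner products between instances, which can be precomputed as the Gram matrix. A minimal sketch under the same assumptions as before (in-order scan instead of random selection, hypothetical toy data, `max_epochs` as a safety cap):

```python
import numpy as np

def perceptron_train_dual(X, y, eta=1.0, max_epochs=100):
    """Dual form: track per-instance update counts via alpha instead of w.

    An update on instance i is alpha_i <- alpha_i + eta, b <- b + eta*y_i,
    which corresponds to the original-form update w <- w + eta*y_i*x_i.
    """
    n = X.shape[0]
    alpha = np.zeros(n)
    b = 0.0
    G = X @ X.T  # Gram matrix G[j, i] = x_j·x_i, computed once up front
    for _ in range(max_epochs):
        errors = 0
        for i in range(n):
            # Misclassification test: y_i * (sum_j alpha_j y_j x_j·x_i + b) <= 0
            if y[i] * (np.sum(alpha * y * G[:, i]) + b) <= 0:
                alpha[i] += eta
                b += eta * y[i]
                errors += 1
        if errors == 0:
            break
    w = (alpha * y) @ X  # recover w = sum_i alpha_i y_i x_i
    return w, b

# Same toy data as above (illustrative only)
X = np.array([[3.0, 3.0], [4.0, 3.0], [1.0, 1.0]])
y = np.array([1.0, 1.0, -1.0])
w, b = perceptron_train_dual(X, y)  # step-equivalent to the original form: w=(1,1), b=-3
```

Since each dual update mirrors an original-form update on the same point, both sketches produce the same hyperplane when they visit points in the same order.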
Notes on the perceptron chapter of Statistical Learning Methods (2nd edition).