It's been a while since I posted any solid content; the homepage has been taken over by the weekly lyrics, and I was about to turn into a complete slacker. Let's get to the point. The goal of this tutorial is obvious: practice. More precisely: once you have learned a piece of machine learning theory, how do you deepen your understanding of it through practice? As an example we take the perceptron from Chapter 2 of Dr. Hang Li's Statistical Learning Methods and distill a general approach to studying a model. Note that this tutorial is not meant to introduce the perceptron model itself, but to illustrate how to study and practice a model in general, so the explanation of the perceptron is deliberately brief. The content is quite basic and is aimed mainly at readers who are interested in machine learning and already have a preliminary understanding of it. Given the special purpose of this article, and the author's limited ability, imprecise or even wrong statements are unavoidable; pointers from the experts are very welcome. What the reader needs to follow along: MATLAB (to test the models) and a brain that loves machine learning (hey, keep the atmosphere serious here).

First: Understanding the Model

Model Type
When studying a model, the first thing to understand is what the model does and where it applies. Let us analyze the perceptron from this angle:
The perceptron is a linear classification model for binary classification: its input is the feature vector of an instance, its output is the class of the instance, taking the two values +1 and -1. This single sentence already sets up a general frame for the model. First, the perceptron is a binary classification model, meaning it can only separate two categories. Second, the perceptron is a linear classification model, meaning the data it is applied to must be linearly separable. So far we have learned quite a lot about the perceptron's scope: first, it is a discriminative model, it applies to classification problems, and the number of classes it can distinguish is 2; second, it is a linear classification model. If the type of problem the perceptron applies to is still unclear, here is an example: in two dimensions, a perceptron amounts to a straight line that divides the plane into two halves; in three dimensions, a perceptron amounts to a kitchen knife chopping through space, dividing it into two parts. Within their scope these two pictures are equivalent to the following sentence: the perceptron corresponds to a separating hyperplane in the input space (feature space) that divides instances into a positive and a negative class (in three dimensions, the "kitchen knife"), and it belongs to the discriminative models.
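To make the two-dimensional picture concrete, here is a tiny illustration (the numbers are made up): the line x1 + x2 - 3 = 0 plays the role of the perceptron, assigning each point to one half of the plane.

```matlab
% A line in the plane, written as w . x + b = 0 with w = [1; 1], b = -3.
w = [1; 1];
b = -3;
X = [3 3; 4 1; 0 1; 1 0];      % four 2D points, one per row
sign(X * w + b)                % -> [1; 1; -1; -1]: two points on each side
```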
Model

In the perceptron model, the function mapping the input space to the output space is:

f(x) = \text{sign}(w \cdot x + b)
Here w is the model's weight (weight), also called the weight vector, and b is the bias (bias). The sign function is defined as follows (written sgn in some places):

\text{sign}(x) = \begin{cases} +1, & x \ge 0 \\ -1, & x < 0 \end{cases}
Note that, in general, the value of sign(0) is taken to be 0; to guarantee that the model output is +1 or -1, we stipulate here that sign(0) = +1. From the model it is not hard to see the geometric meaning of the perceptron: the linear equation w \cdot x + b = 0 is the hyperplane that separates the space, where w is the normal vector of the hyperplane (geometrically) and b is its intercept.

Training: Loss Function
In short: minimize the loss function. First, define the (empirical) loss function (see section 2.2.2, p. 27 of the book for the detailed derivation):

L(w, b) = -\sum_{x_i \in M} y_i (w \cdot x_i + b)

where M is the set of misclassified points.
The loss function can be understood as a measure of how badly the perceptron misclassifies: every misclassified point contributes a positive term, so L(w, b) is non-negative and equals 0 exactly when nothing is misclassified. With the loss function in hand, the problem of training the perceptron becomes the problem of minimizing the loss function.
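For example (numbers made up): take w = (0, 1)^T, b = 0 and a single point x_1 = (1, -2)^T with label y_1 = +1. Then y_1(w \cdot x_1 + b) = 1 \times (-2) = -2 \le 0, so the point is misclassified and contributes -y_1(w \cdot x_1 + b) = 2 to L(w, b); a correctly classified point contributes nothing.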
Paradigm: Gradient Descent
Here we use the gradient descent method (gradient descent), or rather its variant, the stochastic gradient descent method (stochastic gradient descent), to minimize the loss function. The relationship between the two algorithms and their respective pros and cons are outside the scope of this article, so they are omitted. Taking partial derivatives yields the gradient:

\nabla_w L(w, b) = -\sum_{x_i \in M} y_i x_i

\nabla_b L(w, b) = -\sum_{x_i \in M} y_i
The algorithm's steps are as follows:

1. Choose an initial hyperplane, that is, choose initial values w_0 and b_0.
2. Randomly choose a misclassified point (x_i, y_i) and update w and b:

   w \leftarrow w + \alpha y_i x_i

   b \leftarrow b + \alpha y_i

   where \alpha is the step size of each iteration, also called the learning rate.
3. Repeat step 2 until there are no misclassified points left.
It is not hard to see that if the dataset is linearly separable, the loss function will eventually reach 0 (this is exactly what the perceptron convergence theorem in the book guarantees).

Second: Practice
Below we use MATLAB to implement the perceptron. The first piece is the model's prediction function, predict.m (to avoid clashing with MATLAB's built-in sign, our sign function is named sgn):

```matlab
function [y] = predict(X, w, b)
% Predict perceptron outputs (+1 / -1) for the rows of X.
    y = sgn(X * w + b);
end
```
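predict.m calls a helper sgn that is not shown in the post; here is a minimal sketch consistent with the convention adopted above, sign(0) = +1 (the name sgn also avoids shadowing MATLAB's built-in sign):

```matlab
function [s] = sgn(z)
% Sign function with sgn(0) = +1, so the output is always +1 or -1.
    s = ones(size(z));
    s(z < 0) = -1;
end
```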
Then the loss function, costfunc.m. Here a vectorized approach avoids explicit loops; note that only misclassified points, i.e. those with y_i(w \cdot x_i + b) \le 0, contribute to the loss, and the sum is negated:

```matlab
function [J] = costfunc(w, b, X, y)
% The cost function of the perceptron model.
% Vectorized: only misclassified points contribute to the loss.
    margin = (X * w + b) .* y;       % y_i * (w . x_i + b), one entry per row
    J = -sum(margin(margin <= 0));   % negate and sum the misclassified margins
end
```
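A quick sanity check with made-up numbers, one correctly classified point and one misclassified point:

```matlab
X = [3 3; 1 1];                % two points, one per row
y = [1; -1];                   % one positive, one negative sample
w = [1; 1]; b = 0;             % the line x1 + x2 = 0
J = costfunc(w, b, X, y)       % (1,1) is on the wrong side: J = 2
```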
Next comes the key part: training the model with the SGD algorithm (train_sgd.m), starting with the initialization.
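The original snippet is cut off at this point, so what follows is only a sketch of what train_sgd.m might look like, assuming zero initialization (w_0 = 0, b_0 = 0) and hypothetical parameters alpha (the learning rate) and max_iter (an iteration cap), with the update rule from the algorithm above:

```matlab
function [w, b] = train_sgd(X, y, alpha, max_iter)
% Train a perceptron by stochastic gradient descent.
% X: m-by-n data matrix (one sample per row); y: m-by-1 labels in {+1, -1}.
    w = zeros(size(X, 2), 1);                 % initial hyperplane: w = 0 ...
    b = 0;                                    % ... and b = 0
    for iter = 1:max_iter
        mis = find((X * w + b) .* y <= 0);    % indices of misclassified points
        if isempty(mis)
            break;                            % no misclassified points left: done
        end
        i = mis(randi(numel(mis)));           % randomly pick one misclassified point
        w = w + alpha * y(i) * X(i, :)';      % w <- w + alpha * y_i * x_i
        b = b + alpha * y(i);                 % b <- b + alpha * y_i
    end
end
```

With the toy data from the costfunc check above, [w, b] = train_sgd(X, y, 1, 100) should drive costfunc(w, b, X, y) to 0, since those two points are linearly separable.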