This lesson is mainly about how to judge whether the parameters fitted by a machine learning classification algorithm are good, which leads to the definitions of the functional margin and the geometric margin.
1. Functional margin
Consider the hypothesis h_{w,b}(x) = g(w^T x + b), where g(z) = 1 if z >= 0 and g(z) = -1 otherwise, so the labels are y ∈ {-1, 1}. When w^T x + b >> 0 we can be very confident that y = 1, and when w^T x + b << 0 we can be very confident that y = -1. So in a classification algorithm, if training yields parameters that classify the examples with this kind of confident, correct result, we know the chosen parameters fit the data well, and we can trust that our classifier is consistent with the facts of the data. This leads to the definition of the functional margin.
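As a minimal sketch of this hypothesis (the function names are my own, not from the lecture), the classifier just thresholds a linear score:

```python
def g(z):
    """Threshold function: +1 if z >= 0, else -1."""
    return 1 if z >= 0 else -1

def hypothesis(w, b, x):
    """Linear classifier h_{w,b}(x) = g(w^T x + b)."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return g(z)
```

For example, with w = [2, -1] and b = 0.5, the point x = [1, 1] gives z = 1.5 and is classified as y = 1.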
Given a training example (x^(i), y^(i)) and the hypothesis h_{w,b}(x) = g(w^T x + b) (where w is the weight vector, b is the intercept, and the output takes values in {-1, 1}), we define the functional margin of (w, b) with respect to this example as:

γ̂^(i) = y^(i) (w^T x^(i) + b)
Therefore, if you want the functional margin to be as large as possible, then when y^(i) = 1 you need w^T x^(i) + b to be a large positive number, and when y^(i) = -1 you need it to be a large negative number. Note also that γ̂^(i) > 0 exactly when the example is classified correctly.
When the functional margin is large, the parameters chosen by the algorithm model the training data confidently, which suggests better predictions on the test set.
On the entire training set S = {(x^(i), y^(i)); i = 1, ..., m}, the functional margin is the smallest of the individual margins:

γ̂ = min_{i=1,...,m} γ̂^(i)
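The two definitions above can be sketched directly (a toy illustration; the data and helper names are my own):

```python
def functional_margin(w, b, x, y):
    """Functional margin of one example: y * (w^T x + b)."""
    return y * (sum(wi * xi for wi, xi in zip(w, x)) + b)

def dataset_functional_margin(w, b, data):
    """Functional margin over a training set: the smallest per-example margin."""
    return min(functional_margin(w, b, x, y) for x, y in data)

# Toy set: one positive and one negative example, both classified correctly.
data = [([3.0, 0.0], 1), ([0.0, -2.0], -1)]
# Per-example margins with w = [1, 1], b = 0: 3.0 and 2.0, so the set margin is 2.0.
```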
2. Geometric margin
Figure 1
Suppose the hypothesis is represented by the line in Figure 1, called the separating hyperplane (the boundary used to separate the dataset, also called the decision boundary). All the data points in Figure 1 lie in a two-dimensional plane, so here the separating hyperplane is a straight line. If the data points were in three-dimensional space, the separating hyperplane would be an ordinary plane; in general, for data in n-dimensional space the separating hyperplane has dimension n - 1.
The farther a data point is from the decision boundary, the more credible its prediction. Point A in Figure 1 is farthest from the decision boundary, so we can state confidently that it belongs to y = 1; point C is closest to the decision boundary, and if the boundary shifted slightly its prediction could flip to y = -1. Therefore, the choice of separating hyperplane (decision boundary) depends on the distance between the hyperplane and the points closest to it; this distance is the geometric margin, and the closest points are called the support vectors. The larger the geometric margin, the more believable the classifier is.
Figure 2
Figure 2 lets us define the geometric margin. Let point A be a training example x^(i) with y^(i) = 1, and let the hypothesis be h_{w,b}(x) = g(w^T x + b). The vector w is normal to the separating hyperplane, so w/||w|| is the unit normal of the hyperplane. Let γ^(i) denote the distance AB from A to the hyperplane, so that B = x^(i) - γ^(i) · w/||w||. Since B lies on the decision boundary, it satisfies w^T x + b = 0, giving the equation:

w^T (x^(i) - γ^(i) · w/||w||) + b = 0
Solving for γ^(i) gives:

γ^(i) = (w/||w||)^T x^(i) + b/||w||
This solves only the y = 1 case; combining it with the y = -1 case, the geometric margin of an example is defined as:

γ^(i) = y^(i) ((w/||w||)^T x^(i) + b/||w||)
On the entire training set, the geometric margin is the smallest of the individual margins:

γ = min_{i=1,...,m} γ^(i)
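The geometric margin above is the functional margin divided by ||w||, which can be sketched as follows (a toy illustration with my own helper names):

```python
import math

def geometric_margin(w, b, x, y):
    """Geometric margin of one example: y * ((w/||w||)^T x + b/||w||)."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return y * ((sum(wi * xi for wi, xi in zip(w, x)) + b) / norm)

def dataset_geometric_margin(w, b, data):
    """Geometric margin over a training set: the smallest per-example margin."""
    return min(geometric_margin(w, b, x, y) for x, y in data)

# With w = [3, 4] (so ||w|| = 5) and b = 2, the point x = [1, 0] with y = 1
# has geometric margin (3 + 2) / 5 = 1.0.
```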
3. Relationship between the functional margin and the geometric margin
geometric margin = functional margin / ||w||
Functional margins scale with w and b: replacing (w, b) by (cw, cb) multiplies every functional margin by c without changing the classifier at all, so a large functional margin by itself says nothing about the quality of the chosen parameters. The geometric margin, by contrast, does not change when w and b are scaled.
Copyright notice: this is an original article by the blog author; please do not reproduce it without the author's permission.
Thoughts on Stanford "Machine Learning" Lesson 6 ——— 1. Functional margin and geometric margin