SVM is a widely used classifier; its full name is Support Vector Machine. Before studying it, I parsed the name as "support / vector machine", as if it described some machine built out of vectors; only after studying it did I realize the correct parsing is "support vector / machine". My understanding of this classifier now is: it obtains a good classifier from a sparse set of support vectors, and the "machine" in the name refers to that classifier. Below are the knowledge points that I think need to be understood and mastered after learning SVM theory:
- Functional margin and geometric margin
- Understanding support vectors
- Solving the SVM optimization problem
First, as a general introduction to SVM, let us start with a typical diagram:
From this picture it is very clear that SVM realizes the classification of two kinds of data. The line in the middle of the picture is the classifier we want to obtain: in the two-dimensional plane it is a straight line (linear classification) or a curve (non-linear classification), and in high-dimensional space it is a hyperplane.
Recall from Machine Learning (i) --- Regression in Supervised Learning the material on logistic regression. There, the classifier hypothesis for the two-class case is $h_\theta(x) = g(\theta^T x)$. For SVM we rewrite this function with the expression of the hyperplane, replacing $\theta^T x$ with $w^T x + b$, so the hypothesis becomes $h_{w,b}(x) = g(w^T x + b)$. For the two-class problem, what needs to be explained is:
Relating this formula to the label: when $w^T x + b > 0$, the data point's label is classified as $1$, and conversely it is set to $-1$. This leads to the definition of the functional margin: for a training example $(x^{(i)}, y^{(i)})$,
$$\hat{\gamma}^{(i)} = y^{(i)}\left(w^T x^{(i)} + b\right)$$
Here $y^{(i)}$ takes values in $\{+1, -1\}$. When $y^{(i)} = +1$, to achieve a good classification effect the value inside the parentheses, $w^T x^{(i)} + b$, needs to be positive, and preferably a large positive value; when $y^{(i)} = -1$, the value inside the parentheses needs to be negative, with a large absolute value.
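As a quick numeric illustration (my own toy example, not from the original post), the functional margin can be computed directly:

```python
# A small sketch: computing the functional margin y_i (w^T x_i + b)
# for a few labeled points. The values of w, b, X, y are arbitrary.
import numpy as np

w = np.array([1.0, 1.0])
b = -3.0
X = np.array([[3.0, 2.0], [0.0, 1.0]])   # one point on each side of the plane
y = np.array([1, -1])

functional_margins = y * (X @ w + b)
print(functional_margins)   # both positive => both points correctly classified
```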
To obtain a large functional margin, and thus good confidence (a more accurate and credible classification result), it would be possible to simply increase the values of $w$ and $b$. But such simple processing is meaningless, because our goal is to find a better hyperplane, which requires genuinely different $w, b$. In three-dimensional space, $w$ stands for the normal vector of the classification plane, and after multiplying $(w, b)$ by a common factor, the actual plane does not change, so this is not the result we need.
This is why we need to introduce the concept of the geometric margin.
Consider a training point A with coordinates $x^{(i)}$, and let B be the projection of A onto the hyperplane; finding the coordinates of point B is a very simple analytic-geometry problem. For the hyperplane's normal vector $w$, we can easily form its unit vector $\frac{w}{\|w\|}$. Letting $\gamma^{(i)}$ denote the distance from A to the hyperplane, the coordinates of B are easily obtained from those of A, namely $x^{(i)} - \gamma^{(i)}\frac{w}{\|w\|}$. Since point B falls on the hyperplane, the following can be obtained:
$$w^T\left(x^{(i)} - \gamma^{(i)}\frac{w}{\|w\|}\right) + b = 0$$
After a simple linear-algebra manipulation, we get:
$$\gamma^{(i)} = \frac{w^T x^{(i)} + b}{\|w\|} = \left(\frac{w}{\|w\|}\right)^T x^{(i)} + \frac{b}{\|w\|}$$
Does this formula look familiar? It is very similar to the point-to-plane distance formula, except that it may be negative, since the absolute-value sign is missing. After a simple adjustment, multiplying by the label, we get:
$$\gamma^{(i)} = y^{(i)}\left(\left(\frac{w}{\|w\|}\right)^T x^{(i)} + \frac{b}{\|w\|}\right)$$
This guarantees non-negativity for correctly classified points. This is the definition of the geometric margin.
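To make the projection derivation concrete, here is a small numeric check (my own sketch; the values are arbitrary) that the projected point B indeed lies on the hyperplane and that its distance to A matches the formula above:

```python
# Project point A onto the hyperplane w^T x + b = 0 and verify
# that B lies on the plane and that |AB| equals the margin formula.
import numpy as np

w = np.array([2.0, 1.0])
b = -4.0
A = np.array([3.0, 3.0])                          # a point off the plane

gamma = (w @ A + b) / np.linalg.norm(w)           # signed distance from A
B = A - gamma * w / np.linalg.norm(w)             # projection of A onto plane

print("w^T B + b =", w @ B + b)                   # ~ 0: B is on the hyperplane
print("|AB| =", np.linalg.norm(A - B), "gamma =", gamma)   # the two agree
```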
You can see that the geometric margin and the functional margin are very similar, but the distance from a point to the plane does not change when $(w, b)$ is rescaled. In this respect, the geometric margin is more meaningful and valuable than the functional margin.
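This scale invariance is easy to verify numerically; the following is my own sketch with arbitrary values:

```python
# Rescaling (w, b) inflates the functional margin but leaves the
# geometric margin unchanged.
import numpy as np

w, b = np.array([1.0, 1.0]), -3.0
x, y = np.array([3.0, 2.0]), 1

for c in (1.0, 10.0):                            # scale (w, b) by c
    wc, bc = c * w, c * b
    functional = y * (wc @ x + bc)               # grows linearly with c
    geometric = functional / np.linalg.norm(wc)  # stays the same
    print(c, functional, geometric)
```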
Therefore, for the above two-class problem, we can abstract the following model: define the minimum geometric margin over the training set,
$$\gamma = \min_{i=1,\dots,m} \gamma^{(i)}$$
That is, through learning and training we want to find a pair $(w, b)$ that makes $\gamma$ as large as possible:
$$\max_{w,b}\ \gamma \quad \text{s.t.}\quad y^{(i)}\left(\left(\frac{w}{\|w\|}\right)^T x^{(i)} + \frac{b}{\|w\|}\right) \ge \gamma,\quad i = 1,\dots,m$$
===================================================================================
Next, we need to discuss the problem of solving the model proposed above.
Through the relationship between the functional margin and the geometric margin, $\gamma = \hat{\gamma}/\|w\|$, the model can be transformed into:
$$\max_{w,b}\ \frac{\hat{\gamma}}{\|w\|} \quad \text{s.t.}\quad y^{(i)}\left(w^T x^{(i)} + b\right) \ge \hat{\gamma},\quad i = 1,\dots,m$$
Here there is a problem that needs in-depth discussion; it is also the point about SVM that was never quite clear to me. In SVM theory, one usually imposes the restriction $\hat{\gamma} = 1$.
For $(w, b)$ we can apply an arbitrary scaling factor (within certain limits, of course). What is certain is that by scaling we can always make $\hat{\gamma} = 1$, and the scaling has no effect on the geometric margin.
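The rescaling argument can also be demonstrated concretely (again a toy example of mine): dividing $(w, b)$ by the minimum functional margin makes that margin exactly 1 while leaving the geometric margin untouched:

```python
# Normalize (w, b) so that the minimum functional margin becomes 1,
# and check that the geometric margin is unchanged.
import numpy as np

w, b = np.array([1.0, 1.0]), -3.0
X = np.array([[3.0, 2.0], [4.0, 4.0], [0.0, 1.0]])
y = np.array([1, 1, -1])

gamma_hat = np.min(y * (X @ w + b))     # minimum functional margin
w2, b2 = w / gamma_hat, b / gamma_hat   # rescaled parameters

print(np.min(y * (X @ w2 + b2)))                        # exactly 1.0
print(np.min(y * (X @ w + b)) / np.linalg.norm(w),      # geometric margin...
      np.min(y * (X @ w2 + b2)) / np.linalg.norm(w2))   # ...is unchanged
```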
====================================== digression ===============================================
But there is one more problem. At first it is easy to get caught in a vicious circle over this restriction: for the minimum margin, it seems possible that in real situations the functional margin from some data points to the hyperplane is less than 1, and that adding the restriction "minimum margin equals 1" would filter out such points, so that they are not considered in the calculation.
My understanding of this problem is as follows. First of all, this reading of cause and effect is wrong: we first maximize the minimum margin, and from that we obtain $(w, b)$, i.e. the classifier. Before the expression of the plane is fixed, the functional margin can be rescaled freely, so there is no such thing as points being filtered out or not. Adding the "1" to the restriction is merely for convenience of processing.
========================================================================================
Having explained the value of $\hat{\gamma}$, let us get back to the topic of solving the model. With $\hat{\gamma} = 1$, maximizing $\frac{1}{\|w\|}$ is the same as minimizing $\frac{1}{2}\|w\|^2$, so the model above can be converted into:
$$\min_{w,b}\ \frac{1}{2}\|w\|^2 \quad \text{s.t.}\quad y^{(i)}\left(w^T x^{(i)} + b\right) \ge 1,\quad i = 1,\dots,m$$
This is a typical convex optimization problem, which can be solved as a quadratic program (QP).
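As a sketch of what such a solve looks like in practice (my own example; the original post does not give code), here is the hard-margin primal solved with scipy's SLSQP solver on a toy separable dataset:

```python
# Solve min 1/2 ||w||^2  s.t.  y_i (w^T x_i + b) >= 1 on toy 2-D data.
import numpy as np
from scipy.optimize import minimize

X = np.array([[2.0, 2.0], [3.0, 3.0], [2.5, 1.5],
              [0.0, 0.0], [-1.0, 0.5], [0.5, -1.0]])
y = np.array([1, 1, 1, -1, -1, -1])

def objective(p):
    w = p[:2]
    return 0.5 * np.dot(w, w)           # 1/2 ||w||^2; p[2] is b

constraints = [
    # y_i (w^T x_i + b) - 1 >= 0 for every training point
    {"type": "ineq", "fun": lambda p, i=i: y[i] * (X[i] @ p[:2] + p[2]) - 1.0}
    for i in range(len(y))
]

x0 = np.array([1.0, 1.0, -2.0])         # a feasible starting point
res = minimize(objective, x0, method="SLSQP", constraints=constraints)
w, b = res.x[:2], res.x[2]
print("w =", w, "b =", b)
# Points whose functional margin is exactly 1 are the support vectors.
print("functional margins:", y * (X @ w + b))
```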
At this point, the basic knowledge of SVM has been fully covered; the next post will explain how to solve the model above. But I keep feeling that a key term promised at the start has not been given yet. What is it? Yes, it is the concept of the support vector.
========================================================================================
In the previous discussion, we examined the value of $\hat{\gamma}$ and gave the restriction $\hat{\gamma} = 1$. Data points whose functional margin equals 1, i.e. those satisfying $y^{(i)}\left(w^T x^{(i)} + b\right) = 1$, are called support vectors. As shown in the following figure:
As can be seen from the first diagram of this article, the vectors satisfying this condition, that is, the support vectors, are few; in other words, they are sparse.
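To see this sparsity concretely, here is a sketch of mine using scikit-learn's SVC with a linear kernel and a large C to approximate the hard margin (the original post does not use scikit-learn), counting the support vectors on a toy dataset:

```python
# Fit an (approximately) hard-margin linear SVM and count support vectors.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# 200 points per class, well separated along the first coordinate.
X_pos = rng.normal(loc=[3.0, 0.0], scale=0.5, size=(200, 2))
X_neg = rng.normal(loc=[-3.0, 0.0], scale=0.5, size=(200, 2))
X = np.vstack([X_pos, X_neg])
y = np.hstack([np.ones(200), -np.ones(200)])

clf = SVC(kernel="linear", C=1e6)   # large C approximates the hard margin
clf.fit(X, y)
print("training points:", len(y))
print("support vectors:", len(clf.support_vectors_))  # typically a handful
```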