Author: Liangdas
Source: Simple and Popular Machine Learning, http://blog.csdn.net/liangdas/article/details/44251469
Introduction:
In 1995, Cortes and Vapnik first proposed the support vector machine (SVM). Because it adapts well to small-sample classification, classifies quickly, and performs no worse than artificial neural networks, SVM has since been applied in many fields, and a large number of papers using SVM models have appeared, both in China and abroad.
Yet even now it is hard to find a paper or tutorial that explains the principles of SVM accurately, concretely, and accessibly while remaining rigorous. So I decided to consult a large amount of material and, combining my own thinking and understanding, write a series of articles about SVM.
Linearly separable problems:
In classification, the simplest case is the binary classification problem, and the simplest binary problem is the linearly separable one. So what is the objective function of a linear binary classification problem? And once the objective function is determined, what method do we use to solve it?
In one-dimensional space, i.e. on a number line, to separate two linearly separable sets of points we only need to find a single point, as in Figure 1:
Figure 1: Linear separability in one dimension
In two-dimensional space, to separate two linearly separable sets of points, we need to find a separating line, as in Figure 2:
Figure 2: Linear separability in two dimensions
In three-dimensional space, to separate two linearly separable sets of points, we need to find a separating plane, as in Figure 3:
Figure 3: Linear separability in three dimensions
In n-dimensional space, to separate two linearly separable sets of points, we need to find a hyperplane (Hyper Plane).
To keep things intuitive, let us analyze the two-dimensional case as an example. In Figure 2 there are two kinds of points, light blue and black, representing the two different classes; clearly we can find a straight line that separates them. In high school the common equation of a line is y = ax + b. We now rewrite it with vectors: denote the vector (a, -1) by the symbol w, and the vector (x, y) by the symbol x (where x is now a vector). The line equation then becomes w·x + b = 0.
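To make the rewriting concrete, here is a minimal sketch (in Python with NumPy; the numbers are illustrative, not from the article) checking that a point on y = ax + b also satisfies w·x + b = 0 with w = (a, -1):

```python
import numpy as np

# A point on y = a*x + b should satisfy w . x + b = 0 when w = (a, -1).
a, b = 2.0, 1.0                     # illustrative slope and intercept
w = np.array([a, -1.0])             # w = (a, -1)

x0 = 3.0
point = np.array([x0, a * x0 + b])  # a point on the line y = ax + b

print(np.dot(w, point) + b)         # 0.0 -- the vector-form equation holds
```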
The objective function:
Look again at Figure 2. In this two-dimensional plane there are very many lines that separate the two kinds of points, as shown in Figure 4. Among all these lines, which one is the best? How do we choose the best line?
Figure 4: Candidate separating lines
In Figure 4, look first at the light blue points. Suppose the distance from these points to the separating line is large, that is, the line stays far away from the blue points. Now a new point arrives that was generated by the same process as the light blue set (in other words, it is not an outlier relative to the blue set). This point is then very likely to fall on the same side of the line as the blue set, and being on the same side means it belongs to the same class as the blue points.
By the same reasoning, for the black points in Figure 4 we also want the line to be as far from them as possible. So finding the best separating line means finding the line whose distance to both classes of points is as large as possible.
Recall the point-to-line distance formula from high school: for the line (in general form) ax + by + c = 0 and a point (x0, y0), the distance from the point to the line is d = |a·x0 + b·y0 + c| / sqrt(a^2 + b^2).
So for a line in the form w·x + b = 0 and a point x, the distance from the point to the line is d = |w·x + b| / ||w||, where the denominator is the two-norm of the vector w, ||w|| = sqrt(w1^2 + ... + wn^2).
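As a quick check, here is a minimal sketch of the vector-form distance d = |w·x + b| / ||w|| (the line and point are illustrative, not from the article):

```python
import numpy as np

# Distance from a point x to the line w . x + b = 0 is |w . x + b| / ||w||.
def distance_to_line(w, b, x):
    return abs(np.dot(w, x) + b) / np.linalg.norm(w)

w = np.array([3.0, 4.0])            # the line 3x + 4y - 5 = 0
b = -5.0
print(distance_to_line(w, b, np.array([0.0, 0.0])))  # 1.0 = |-5| / 5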
So the objective function of the two-dimensional linearly separable problem above can be abstracted into the following mathematical expression: find the (w, b) that maximizes the distance from the line to the closest sample points, i.e. max over (w, b) of min over i of |w·x_i + b| / ||w||.
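To illustrate the inner "min" of this max-min objective, here is a minimal sketch (illustrative data) that, for a fixed line (w, b), computes the smallest distance from any sample point to the line; the SVM then searches for the (w, b) that makes this value largest:

```python
import numpy as np

# For a fixed line (w, b), the quantity the SVM wants to maximize is the
# distance from the line to the closest sample point.
def smallest_margin(w, b, X):
    return np.min(np.abs(X @ w + b) / np.linalg.norm(w))

X = np.array([[3.0, 3.0], [1.0, 1.0], [0.0, 2.0]])  # illustrative points
w, b = np.array([1.0, 1.0]), -4.0                    # a candidate line
print(smallest_margin(w, b, X))     # ~1.414, the closest point's distance
```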
Simplification of the objective function:
Each class wants its own points to be classified correctly, so each wants the separating line to be as far from itself as possible. With two classes there is competition, and the end result of the competition is that the separating line sits at equal distances from the two classes. In life, too, this is usually how a balanced state is reached.
As in Figure 5, we draw two lines parallel to the separating line, passing through the boundary points of the blue set and the black set respectively, such that the two lines are at equal distances from the separating line in the middle. We can then write the two line equations as w·x + b = c and w·x + b = -c.
Figure 5: The two classes at equal distances from the separating line
Using the distance formula for two parallel lines, the distance from each class's boundary line to the separating line is d = |c| / ||w||; this is in fact the same as the point-to-line distance formula above.
For the line w·x + b = c, if we shrink its coefficients proportionally, for example dividing everything by c, it becomes (w/c)·x + b/c = 1, which is the same line as before; rewriting the equation does not move the line. In other words, for any line w·x + b = c we can always find w1 = w/c and b1 = b/c such that w1·x + b1 = 1, and the two equations express exactly the same line. Figure 5 above can thus be redrawn as Figure 6 below:
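A minimal numeric sketch (illustrative values) of this rescaling: dividing w and b by c leaves the line unchanged while turning its equation into w1·x + b1 = 1:

```python
import numpy as np

# Dividing both w and b by c rescales w . x + b = c into w1 . x + b1 = 1
# without moving the line.
w, b, c = np.array([2.0, 6.0]), 4.0, 2.0
w1, b1 = w / c, b / c

point = np.array([-1.0, 0.0])       # lies on w . x + b = c
print(np.dot(w, point) + b)         # 2.0  (= c)
print(np.dot(w1, point) + b1)       # 1.0  -- same line, rescaled equation
```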
Figure 6: The two classes at equal distances from the separating line (rescaled form)
From Figure 6, combining this with our idea that the best separating line is the one farthest from the two sets of points, we can now abstract the objective function: the closest points of each class lie on w1·x + b1 = 1 and w1·x + b1 = -1, whose distance to the separating line is 1/||w1||, so the objective is to maximize 1/||w1||.
Written this way, the objective function looks much simpler than before.
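In this rescaled form, the full gap between the two margin lines w1·x + b1 = 1 and w1·x + b1 = -1 is 2/||w1||, so maximizing the margin is the same as maximizing 1/||w1||. A one-line sketch (illustrative w1):

```python
import numpy as np

# Width of the band between w1 . x + b1 = 1 and w1 . x + b1 = -1.
w1 = np.array([0.5, 0.5])           # illustrative weight vector
print(2.0 / np.linalg.norm(w1))     # ~2.83, the total margin width
```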
Constraints on the objective function:
In Figure 6, the blue sample points that are not on the line w1·x + b1 = 1 must lie above it, so they satisfy w1·x + b1 > 1; hence all blue sample points satisfy w1·x + b1 >= 1. In the same way, all black sample points satisfy w1·x + b1 <= -1.
These two conditions are the constraints of our problem: when we maximize the objective function above, the two inequality constraints must be satisfied.
Merging the conditions:
Looking at the two inequality conditions above, don't you feel an urge to merge them into a single formula? In fact, for the inequality corresponding to the blue points, we multiply both sides by y = 1; for the inequality corresponding to the black points, we multiply both sides by y = -1. The two inequalities above then become one: y(w1·x + b1) >= 1.
But why can we multiply one inequality by 1 and the other by -1? Look at the inequality for the blue points: it is an expression in the sample point's feature vector x. Every training sample point also carries another known quantity, its class label, which we call y. For the blue points we set y = 1, and for the black points we set y = -1. Multiplying both sides of each inequality by its class label y then naturally merges the two inequalities into the single inequality above.
This is why, when we prepare SVM training samples, we label the two classes as 1 and -1!
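A minimal sketch (illustrative points, labels, and line) showing that the merged constraint y_i(w1·x_i + b1) >= 1 holds for both classes at once:

```python
import numpy as np

# The merged constraint y_i * (w . x_i + b) >= 1 covers both classes.
w, b = np.array([1.0, 1.0]), -3.0   # an illustrative separating line
X = np.array([[3.0, 2.0],           # a blue point,  label +1
              [0.0, 1.0]])          # a black point, label -1
y = np.array([1.0, -1.0])

print(y * (X @ w + b))              # [2. 2.] -- both values are >= 1
```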
At this point, our objective function can be written as: maximize 1/||w1|| over (w1, b1), subject to y_i(w1·x_i + b1) >= 1 for every training sample (x_i, y_i).
This objective function is an optimization problem with inequality constraints. How is it solved? The next article continues with this question.
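Although the solution method is the topic of the next article, a reader who wants to see a maximum-margin line found numerically can use an off-the-shelf solver. Here is a minimal sketch with scikit-learn (my addition, not part of the original article); a very large C approximates the hard-margin problem described above, and the data are illustrative:

```python
import numpy as np
from sklearn.svm import SVC

X = np.array([[3.0, 3.0], [4.0, 3.0],   # class +1
              [1.0, 1.0], [0.0, 2.0]])  # class -1
y = np.array([1, 1, -1, -1])

clf = SVC(kernel="linear", C=1e6).fit(X, y)  # near hard-margin linear SVM
w, b = clf.coef_[0], clf.intercept_[0]

print(w, b)                          # the learned line w . x + b = 0
print(y * (X @ w + b))               # margins: all >= 1 up to solver tolerance
```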
PS: Please indicate the source when using or reprinting this article. Commercial use is prohibited.
If you need a Word or PDF version, please contact me. QQ: 358536026
Copyright notice: this is an original article by the blogger and may not be reproduced without consent.
SVM Explained in Detail, Part 1: Linearly Separable Problems