Starting from this post I will attempt to translate Alexey Nefedov's "Support Vector Machines: A Simple Tutorial". This is an SVM textbook highly recommended by our tutor, and I have been meaning to work through it for a long time, so I am simply starting the translation now, partly to deepen my own understanding. After all, my grasp of the subject is still superficial, so if I translate anything incorrectly I hope more experienced readers will point it out; suggestions and discussion are welcome.
Well, that's it.
1. Introduction
In this section we describe some of the concepts used to define support vector machines (SVM); they are essential for understanding SVM. We assume that the reader is familiar with the inner product of vectors and with vector norms.
1.1 The problem to be solved
The problem is as follows:
Suppose there is a large (perhaps infinite) set of objects (observations, patterns, etc.) that can be divided into two classes (that is, labeled with one of two class labels), and that some of these objects have already been classified. What we need to do is design an algorithm that, after being trained on these already classified objects (the "samples", i.e. the training set), can classify the remaining objects of the set with as few errors as possible.
The objects to be classified are usually represented by vectors. Although in theory the inner products or kernel functions used by SVM can be computed in any vector space, the vector space V in which SVM classifies objects is usually the n-dimensional real coordinate space ("real coordinate space" in the original; I am not sure of the best translation, suggestions welcome) $\mathbb{R}^n$. In this space a vector x is an ordered set of n real numbers $x_i$, namely:

$$x = (x_1, x_2, \ldots, x_n)$$
The samples mentioned above that have already been classified (i.e. the training set) can be written as:

$$(x_1, y_1), (x_2, y_2), \ldots, (x_L, y_L), \qquad x_i \in \mathbb{R}^n,\; y_i \in \{-1, +1\}$$
where L is the total number of objects in the training set, and $y_i$ belongs to the set $\{-1, +1\}$ and indicates which of the two classes the object $x_i$ belongs to.
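To make the notation concrete, here is a minimal sketch (my own, not part of the original tutorial) of how such a training set could be represented in Python with NumPy; the array names X and y and the toy values are my own choices.

```python
import numpy as np

# A toy training set of L = 4 objects in R^2 (n = 2).
# Row i of X is the vector x_i; y[i] is its class label in {-1, +1}.
X = np.array([[2.0, 3.0],
              [3.0, 3.5],
              [-1.0, -2.0],
              [-2.0, -1.5]])
y = np.array([+1, +1, -1, -1])

L, n = X.shape  # L = 4 objects, n = 2 dimensions
print(L, n)
```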
A classification algorithm (that is, a classifier) can then be represented as a function of the form:

$$f(x): \mathbb{R}^n \rightarrow \{-1, +1\}$$
If f(x) = 1, the classifier assigns vector x to (labels it as) the first class; if f(x) = −1, the classifier assigns vector x to the second class.
1.2 Hyperplanes
In the coordinate space $\mathbb{R}^n$ described in the previous section, consider the following equation, where $\omega$ is an n-dimensional vector and $b$ is a scalar:

$$\langle \omega, x \rangle + b = 0 \qquad (1.1)$$

or, written out component by component,

$$\omega_1 x_1 + \omega_2 x_2 + \cdots + \omega_n x_n + b = 0 \qquad (1.2)$$
Given a non-zero vector $\omega$ and a scalar $b$, the set of all vectors x in $\mathbb{R}^n$ that satisfy equation (1.1) (equivalently (1.2)) forms an (n−1)-dimensional subset of the n-dimensional vector space; we call this set a hyperplane. For example, a hyperplane in the one-dimensional vector space $\mathbb{R}$ (i.e. the real line) is a point on that line, a hyperplane in the two-dimensional vector space $\mathbb{R}^2$ is a line in the plane, and so on.
In equations (1.1) and (1.2) the vector $\omega$ is called the normal vector of the hyperplane and the scalar b is called its intercept. The normal vector $\omega$ determines the orientation of the hyperplane in the vector space, while the ratio of b to $\|\omega\|$ (not b alone) determines the distance between the hyperplane and the origin of $\mathbb{R}^n$. The normal vector $\omega$ is orthogonal to every vector that is parallel to the hyperplane (this sounds a bit odd; my personal understanding is that it is just like the normal vector of a plane in solid geometry). In other words, if $x_1$ and $x_2$ both lie on the hyperplane, then $\langle \omega, x_1 - x_2 \rangle = 0$. As shown in the figure below:
[Figure: a hyperplane (a line) in $\mathbb{R}^2$; from the position of the line in the figure one can infer that $b < 0$, $\omega_1 > 0$, $\omega_2 > 0$.]
A hyperplane divides the vector space $\mathbb{R}^n$ into two parts lying on either side of it, called the positive half-space and the negative half-space, denoted $\mathbb{R}^n_+$ and $\mathbb{R}^n_-$. For any vector $x \in \mathbb{R}^n_+$ we have $\langle \omega, x \rangle + b > 0$, and likewise for any vector $x \in \mathbb{R}^n_-$ we have $\langle \omega, x \rangle + b < 0$, as shown in the figure:
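As a small illustration (my own sketch, not from the tutorial), the following code checks on which side of a hyperplane a few vectors lie by looking at the sign of $\langle \omega, x \rangle + b$; the particular values of w and b are arbitrary.

```python
import numpy as np

w = np.array([1.0, 2.0])   # normal vector ω (arbitrary example values)
b = -4.0                   # intercept

points = np.array([[4.0, 1.0],    # <ω, x> + b = 2  -> positive half-space
                   [1.0, 1.0],    # <ω, x> + b = -1 -> negative half-space
                   [2.0, 1.0]])   # <ω, x> + b = 0  -> on the hyperplane

for x in points:
    s = np.dot(w, x) + b
    if s > 0:
        side = "positive half-space"
    elif s < 0:
        side = "negative half-space"
    else:
        side = "on the hyperplane"
    print(x, s, side)
```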
A given hyperplane can be defined by infinitely many pairs $\omega$ and $b$: if two hyperplanes are defined by $\omega_1, b_1$ and $\omega_2, b_2$ respectively, and there is a constant $c > 0$ such that $\omega_1 = c\,\omega_2$ and $b_1 = c\,b_2$, then the two hyperplanes are one and the same hyperplane. In other words, we can scale the $\omega$ and $b$ of a hyperplane by any positive factor, and we can single out one particular scaling and call that pair $\omega, b$ the canonical parameters of the hyperplane. (Similarly, $-\omega$ and $-b$ define the same hyperplane, but the positions of the positive and negative half-spaces in the vector space are swapped.)
The distance between a vector x and a hyperplane $(\omega, b)$ in the vector space $\mathbb{R}^n$ can be calculated according to the following formula:
$$d(x) = \frac{\langle \omega, x \rangle + b}{\|\omega\|} \qquad (1.3)$$
It is worth mentioning that d(x) is a signed quantity: $d(x) > 0$ when the vector x lies in the positive half-space, $d(x) < 0$ when it lies in the negative half-space, and obviously $d(x) = 0$ when x is on the hyperplane. If $\|\omega\| = 1$, equation (1.3) becomes:

$$d(x) = \langle \omega, x \rangle + b$$
From equation (1.3) we can see that the distance between the origin O of the space and the hyperplane is $|b| / \|\omega\|$. From this we can work out how $\omega$ and b affect the position and orientation of the hyperplane in the vector space. The conclusions below will help us understand the maximum margin hyperplane later.
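Here is a short sketch (my own, with made-up values of w and b; the function name signed_distance is not from the tutorial) computing the signed distance of equation (1.3) and the distance from the origin to the hyperplane, $|b| / \|\omega\|$.

```python
import numpy as np

def signed_distance(w, b, x):
    """Signed distance from vector x to the hyperplane <w, x> + b = 0 (eq. 1.3)."""
    return (np.dot(w, x) + b) / np.linalg.norm(w)

w = np.array([3.0, 4.0])   # ||ω|| = 5
b = 10.0

print(signed_distance(w, b, np.array([0.0, 0.0])))    # 2.0 -> origin is in the positive half-space
print(signed_distance(w, b, np.array([-2.0, -1.0])))  # 0.0 -> this point lies on the hyperplane
print(abs(b) / np.linalg.norm(w))                     # 2.0 -> distance from the origin
```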
1. When b > 0, the origin of the coordinate space lies in the positive half-space of the hyperplane, i.e. $O \in \mathbb{R}^n_+$; when b < 0, the origin lies in the negative half-space, i.e. $O \in \mathbb{R}^n_-$; and when b = 0, the origin lies on the hyperplane itself.
2. When the absolute value of b increases (with $\omega$ fixed), the hyperplane moves away from the origin; when |b| decreases, the hyperplane moves toward the origin. As shown in the figure:
3. If we change the normal vector $\omega$ while keeping its norm $\|\omega\|$ (and b) unchanged, the hyperplane rotates around the origin, staying tangent to a sphere centered at the origin whose radius is $|b| / \|\omega\|$ (that is, the distance from the origin to the hyperplane mentioned above). As shown in the figure:
4. If we increase the norm of the normal vector $\omega$ while keeping its direction (and b) fixed, the hyperplane moves toward the origin, and if we decrease the norm it moves away from the origin. So the hyperplane can be translated not only by changing the absolute value of b, but also by rescaling $\omega$; see the short numerical sketch after this list. As shown in the figure:
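The numerical sketch below (my own illustration, with arbitrary example values) demonstrates observations 2 and 4: with $\omega$ fixed, increasing |b| moves the hyperplane away from the origin, and with b fixed, increasing $\|\omega\|$ moves it toward the origin.

```python
import numpy as np

def distance_from_origin(w, b):
    """Distance from the origin to the hyperplane <w, x> + b = 0, i.e. |b| / ||w||."""
    return abs(b) / np.linalg.norm(w)

w = np.array([3.0, 4.0])                     # ||ω|| = 5

# Observation 2: larger |b| (same ω) pushes the hyperplane away from the origin.
print(distance_from_origin(w, 5.0))          # 1.0
print(distance_from_origin(w, 10.0))         # 2.0

# Observation 4: larger ||ω|| (same b) pulls the hyperplane toward the origin.
print(distance_from_origin(2.0 * w, 10.0))   # 1.0
print(distance_from_origin(0.5 * w, 10.0))   # 4.0
```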
1.3 About the margin (I don't know how to translate this term into Chinese, but it feels very important)
To sum up, a hyperplane divides its vector space into two half-spaces in the following way, and in doing so separates the vectors of the space into two classes C1 and C2:

$$x \in C_1 \;\text{ if }\; \langle \omega, x \rangle + b > 0, \qquad x \in C_2 \;\text{ if }\; \langle \omega, x \rangle + b < 0$$

or, using the class labels $y \in \{-1, +1\}$ introduced earlier,

$$y\,(\langle \omega, x \rangle + b) > 0 \qquad (1.4)$$
And for the "samples" and the classified training sets that we are talking about in the first section, if there is at least one hyper-plane separating them into the corresponding c1,c2 of their own properties, we are linearly separable (linearly separable), for linear data sets, according to (1.4) can be derived corresponding to the decision function (the word is uncomfortable ...) ) as a classifier that can correctly classify the above vectors as follows:
$$f(x) = \operatorname{sgn}(\langle \omega, x \rangle + b) \qquad (1.5)$$
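A minimal implementation of the decision function (1.5) could look like the sketch below (mine, not from the tutorial); sgn(0) is ambiguous, so points lying exactly on the hyperplane are arbitrarily mapped to +1 here.

```python
import numpy as np

def decision_function(w, b, x):
    """Classifier of eq. (1.5): f(x) = sgn(<w, x> + b), returning -1 or +1."""
    s = np.dot(w, x) + b
    return 1 if s >= 0 else -1   # points exactly on the hyperplane mapped to +1

w = np.array([1.0, 1.0])   # arbitrary example hyperplane
b = -1.0

print(decision_function(w, b, np.array([2.0, 2.0])))   # +1 (class one)
print(decision_function(w, b, np.array([-1.0, 0.0])))  # -1 (class two)
```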
So the question is: since there are always infinitely many hyperplanes (with different $\omega$ and b) lying between two linearly separable classes in the vector space, which one should we choose as the classifier?
Answer: the SVM chooses the hyperplane with the largest margin as the classifier.
So the next question is: what is the margin?
A: The margin is defined as follows:
If a hyperplane $(\omega, b)$ separates the linearly separable classes C1 and C2, then we define the margin of the hyperplane as the sum of its distance to C1 and its distance to C2. The mathematical expression is as follows:

$$m(\omega, b) = d(\omega, b; C_1) + d(\omega, b; C_2)$$
Here $d(\omega, b; C)$ is defined as the minimum over all vectors classified as C of the absolute value of their distance to the hyperplane (that is, of equation (1.3)). The mathematical expression is as follows:

$$d(\omega, b; C) = \min_{x \in C} \frac{|\langle \omega, x \rangle + b|}{\|\omega\|}$$
Another way to understand the definition of the margin is that its value equals the distance between C1 and C2 in the direction of the normal vector $\omega$ of the hyperplane. Let $p_1$ be the set of projections onto the direction of $\omega$ of all vectors classified as C1, and $p_2$ the set of projections onto the direction of $\omega$ of all vectors classified as C2 (the phrasing is really clumsy, sorry). Then, assuming C1 lies in the positive half-space, we have:

$$m(\omega, b) = \min p_1 - \max p_2$$

where

$$p_1 = \left\{ \frac{\langle \omega, x \rangle}{\|\omega\|} : x \in C_1 \right\}, \qquad p_2 = \left\{ \frac{\langle \omega, x \rangle}{\|\omega\|} : x \in C_2 \right\}$$
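The two equivalent ways of computing the margin can be checked numerically; the sketch below is my own, reusing the toy classes from the earlier snippets and an arbitrary separating hyperplane.

```python
import numpy as np

def margin_via_distances(w, b, C1, C2):
    """Margin as the sum of the hyperplane's distances to the two classes."""
    d = lambda C: np.min(np.abs(C @ w + b)) / np.linalg.norm(w)
    return d(C1) + d(C2)

def margin_via_projections(w, C1, C2):
    """Margin as the gap between the projections of C1 and C2 onto the direction
    of ω (assuming C1 lies in the positive half-space)."""
    u = w / np.linalg.norm(w)
    return np.min(C1 @ u) - np.max(C2 @ u)

C1 = np.array([[2.0, 3.0], [3.0, 3.5]])      # class C1 (positive side)
C2 = np.array([[-1.0, -2.0], [-2.0, -1.5]])  # class C2 (negative side)
w = np.array([1.0, 1.0])
b = 0.0                                      # this hyperplane separates C1 and C2

print(margin_via_distances(w, b, C1, C2))    # ~5.657
print(margin_via_projections(w, C1, C2))     # same value
```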
As shown in the figure below:
From our definition of the margin we can derive the following properties:
1. When a hyperplane separates C1 and C2, the size of its margin depends only on the normal vector $\omega$ (in fact only on its direction) and is independent of the intercept b.
2. For any hyperplane that separates C1 and C2, we have

$$m(\omega, b) \le d(C_1, C_2)$$

where

$$d(C_1, C_2) = \min_{x_1 \in C_1,\; x_2 \in C_2} \|x_1 - x_2\|$$

is the distance between the two classes. (Both properties are checked numerically in the sketch after this list.)
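Continuing the toy example above, the following sketch (my own) checks both properties numerically: the margin does not change when b varies (as long as the hyperplane still separates the classes), and it never exceeds the distance between the two classes.

```python
import numpy as np

def margin(w, b, C1, C2):
    """Margin of the hyperplane (w, b) with respect to classes C1 and C2."""
    d = lambda C: np.min(np.abs(C @ w + b)) / np.linalg.norm(w)
    return d(C1) + d(C2)

C1 = np.array([[2.0, 3.0], [3.0, 3.5]])
C2 = np.array([[-1.0, -2.0], [-2.0, -1.5]])
w = np.array([1.0, 1.0])

# Property 1: any b for which the hyperplane still separates C1 and C2
# gives the same margin.
for b in (-2.0, 0.0, 2.0):
    print(b, margin(w, b, C1, C2))          # margin stays ~5.657

# Property 2: the margin never exceeds the distance between the two classes.
dist = min(np.linalg.norm(x1 - x2) for x1 in C1 for x2 in C2)
print(dist)                                  # ~5.831 >= margin
```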
It is not difficult to conclude that between C1 and C2 there can be infinitely many hyperplanes attaining the largest margin, as shown in the figure.
This concludes the first chapter; the next chapter is "Maximum margin hyperplane for linearly separable classes".