[Machine Learning] Mathematical principles of SVM---hard margin maximization

Source: Internet
Author: User
Tags: svm

Note: Binary (two-class) classification is assumed throughout.

1. SVM principles

(1) Mapping of input space to feature space

The input space is the set of input samples. In some cases the input space and the feature space coincide; in others they differ. The model itself is defined on the feature space, which consists of all the feature vectors: n-dimensional vectors that represent each input numerically. The mapping from input space to feature space is therefore a numerical quantification of the input's characteristics (my understanding). This is the same idea as a random variable in probability theory, which maps a sample space to the real numbers. For example, the sample space of a coin toss is {heads, tails}, which maps to the real set {1, 0}.
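As a minimal sketch of such a mapping (the encoding below is hypothetical, chosen only to illustrate the idea):

```python
# A minimal sketch of mapping raw inputs to numeric feature vectors.
import numpy as np

# Raw input space: outcomes of a coin toss.
samples = ["heads", "tails", "heads"]

# Feature mapping: {heads, tails} -> {1, 0}, as in the text.
encoding = {"heads": 1.0, "tails": 0.0}
X = np.array([[encoding[s]] for s in samples])  # shape (n_samples, 1)
print(X)  # [[1.], [0.], [1.]]
```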

(2) The maximum-margin optimal separating hyperplane

A separating hyperplane is a line or plane that divides all the feature vectors into two classes (with two features it is a line, with more it is a plane or hyperplane; the dimensionality matches the number of features). In the two-dimensional case, for example, a single line splits the coordinate plane into two regions. "Optimal" means all points are as far from the hyperplane as possible; margin maximization means the distance from the nearest point in each of the two regions to the hyperplane is as large as possible, and it determines a unique optimal separating hyperplane w*·x + b* = 0. Since the distance is determined by the normal vector and the intercept, w* is the normal vector and b* is the intercept. To explain this equation: the normal vector points in the direction perpendicular to the plane. In analytic geometry a plane has the general equation ax + by + cz + d = 0, which can be read as (a, b, c)·(x, y, z) + d = 0; thus w* = (a, b, c), the feature vector is x = (x, y, z), and b* = d. Likewise a line can be written ax + by + c = 0. So once we have found w* and b*, we have found the hyperplane.
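A minimal sketch (with arbitrary numbers) verifying that the general plane equation is just a dot product plus an intercept:

```python
# The plane a*x + b*y + c*z + d = 0 is w·x + b = 0
# with w = (a, b, c) and b = d.
import numpy as np

w = np.array([1.0, -2.0, 3.0])  # normal vector (a, b, c); arbitrary choice
b = -4.0                        # intercept d; arbitrary choice

p = np.array([4.0, 0.0, 0.0])   # satisfies 1*4 - 2*0 + 3*0 - 4 = 0
print(np.dot(w, p) + b)         # 0.0 -> p lies on the hyperplane

q = np.array([0.0, 0.0, 0.0])
print(np.dot(w, q) + b)         # -4.0 -> q lies on the negative side
```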

(3) Classification decision function

The classification decision function is simply a sign function f(x) = sign(w*·x + b*), where sign returns the sign (positive or negative) of its input. Once the normal vector and intercept have been learned, a new feature vector is classified by computing w*·x + b* and feeding the result into sign().
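A minimal sketch of this decision function; w_star and b_star are placeholders that a trained SVM would supply:

```python
# Decision function f(x) = sign(w*·x + b*).
import numpy as np

def decision_function(x, w_star, b_star):
    """Return +1 or -1 depending on which side of the hyperplane x falls."""
    return 1 if np.dot(w_star, x) + b_star >= 0 else -1

w_star = np.array([2.0, -1.0])  # hypothetical learned parameters
b_star = 0.5
print(decision_function(np.array([1.0, 1.0]), w_star, b_star))   # 2-1+0.5=1.5 -> +1
print(decision_function(np.array([-1.0, 2.0]), w_star, b_star))  # -2-2+0.5=-3.5 -> -1
```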

2. Basic concepts

(1) Functional margin

As stated above, we want the hyperplane that maximizes the distance to the nearest point, so measuring distance is a crucial step. In the point-to-plane distance formula the numerator is |w·x + b|; since the denominator is the same for every point, |w·x + b| already reflects relative distance. Here y denotes the class label of each feature vector, and from the decision function above we know that classification is decided by sign, so a point is correctly classified exactly when w·x + b and y have the same sign. Therefore y(w·x + b) measures both the correctness and the confidence of the classification. This is the functional margin:

Note: the functional margin of the hyperplane with respect to the feature space is the minimum of the functional margins of all the feature vectors with respect to the hyperplane.

$$\hat{\gamma}_i = y_i (w \cdot x_i + b), \qquad \hat{\gamma} = \min_{i=1,\dots,N} \hat{\gamma}_i$$
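To make this concrete, a minimal sketch computing functional margins on a made-up toy set (the hyperplane parameters below are hypothetical):

```python
# Functional margins y_i * (w·x_i + b) on a toy dataset.
import numpy as np

X = np.array([[3.0, 3.0], [4.0, 3.0], [1.0, 1.0]])  # feature vectors
y = np.array([1, 1, -1])                             # class labels

w = np.array([0.5, 0.5])  # hypothetical hyperplane parameters
b = -2.0

functional_margins = y * (X @ w + b)
print(functional_margins)        # per-point margins: [1.  1.5 1. ]
print(functional_margins.min())  # margin of the hyperplane w.r.t. the set: 1.0
```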

(2) Geometric margin

Measuring with the functional margin alone is problematic: if the normal vector and intercept are both scaled by the same factor (say, doubled), the hyperplane is unchanged but the functional margin doubles with them. The geometric margin is therefore introduced; it is simply the functional margin divided by the norm of the normal vector:

$$\gamma_i = y_i \left( \frac{w}{\|w\|} \cdot x_i + \frac{b}{\|w\|} \right) = \frac{\hat{\gamma}_i}{\|w\|}$$

Likewise, the geometric margin of the hyperplane with respect to the feature space is the minimum of the geometric margins of all the feature vectors:

$$\gamma = \min_{i=1,\dots,N} \gamma_i = \frac{\hat{\gamma}}{\|w\|}$$
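A minimal sketch demonstrating the scale invariance just described, reusing the toy data above: doubling (w, b) doubles the functional margin but leaves the geometric margin unchanged.

```python
# Functional vs. geometric margin under rescaling of (w, b).
import numpy as np

X = np.array([[3.0, 3.0], [4.0, 3.0], [1.0, 1.0]])
y = np.array([1, 1, -1])

for scale in (1.0, 2.0):
    w = scale * np.array([0.5, 0.5])
    b = scale * -2.0
    functional = (y * (X @ w + b)).min()
    geometric = functional / np.linalg.norm(w)
    print(scale, functional, geometric)
# scale=1: functional 1.0, geometric ~1.414
# scale=2: functional 2.0, geometric ~1.414 (unchanged)
```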

(3) Support vectors

In the linearly separable case, the feature vectors in the feature space closest to the separating hyperplane are the support vectors.
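Under the same toy setup as above, the support vectors can be read off as the points whose geometric distance to the hyperplane attains the minimum:

```python
# Support vectors = points at minimum geometric distance from the hyperplane.
import numpy as np

X = np.array([[3.0, 3.0], [4.0, 3.0], [1.0, 1.0]])
y = np.array([1, 1, -1])
w = np.array([0.5, 0.5])  # hypothetical hyperplane, as above
b = -2.0

distances = y * (X @ w + b) / np.linalg.norm(w)
support_vectors = X[np.isclose(distances, distances.min())]
print(support_vectors)  # [[3. 3.], [1. 1.]] -- the closest point(s) on each side
```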

3. Solving the hard margin maximization problem

First, what does hard margin maximization mean? It is a property relative to the training data set (feature space): if the data are completely linearly separable, the learned model is called a hard margin support vector machine. There are also soft margin support vector machines (for approximately linearly separable data), nonlinear support vector machines, and so on. In every case the ultimate goal is to find w* and b*.

(1) Derivation of the maximum margin method

From the discussion above, w* and b* must satisfy two conditions: first, the geometric margin with respect to the feature space is maximized; second, the geometric margin of every feature vector is at least the geometric margin of the hyperplane with respect to the feature space. The constrained optimization problem is therefore:

$$\max_{w,b} \ \gamma \qquad \text{s.t.} \quad y_i \left( \frac{w}{\|w\|} \cdot x_i + \frac{b}{\|w\|} \right) \ge \gamma, \quad i = 1, \dots, N$$

Using the relationship between the geometric margin and the functional margin (both taken with respect to the hyperplane), the problem can be rewritten as:

$$\max_{w,b} \ \frac{\hat{\gamma}}{\|w\|} \qquad \text{s.t.} \quad y_i (w \cdot x_i + b) \ge \hat{\gamma}, \quad i = 1, \dots, N$$

This can be simplified further. If the functional margin $\hat{\gamma}$ in the numerator of the objective is rescaled, the right-hand side of the constraints changes by the same factor, and w and b absorb that factor as well; the choice of $\hat{\gamma}$ therefore does not affect the optimization, so we may simply set $\hat{\gamma} = 1$. Maximizing $1/\|w\|$ is then equivalent to minimizing $\frac{1}{2}\|w\|^2$: one places $\|w\|$ in the denominator, the other in the numerator. As for why $\frac{1}{2}\|w\|^2$ is used rather than $\|w\|$, the square with the factor $\frac{1}{2}$ makes the derivative clean, since $\frac{1}{2} \cdot 2 = 1$. The problem becomes:

$$\min_{w,b} \ \frac{1}{2}\|w\|^2 \qquad \text{s.t.} \quad y_i (w \cdot x_i + b) - 1 \ge 0, \quad i = 1, \dots, N$$
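To make the optimization concrete, here is a minimal sketch that feeds this primal problem to a general-purpose solver (SciPy's SLSQP); a production SVM would use a dedicated QP method instead. The toy data are the made-up points used earlier:

```python
# Solve  min 1/2 ||w||^2  s.t.  y_i (w·x_i + b) - 1 >= 0  with a generic solver.
import numpy as np
from scipy.optimize import minimize

X = np.array([[3.0, 3.0], [4.0, 3.0], [1.0, 1.0]])
y = np.array([1, 1, -1])

def objective(v):
    w = v[:2]
    return 0.5 * np.dot(w, w)

def constraint(v):
    w, b = v[:2], v[2]
    return y * (X @ w + b) - 1.0  # every entry must be >= 0

res = minimize(objective, x0=np.zeros(3),
               constraints=[{"type": "ineq", "fun": constraint}],
               method="SLSQP")
w_star, b_star = res.x[:2], res.x[2]
print(w_star, b_star)  # approximately w* = (0.5, 0.5), b* = -2
```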

(2) The dual learning algorithm

By introducing Lagrangian duality, the optimal solution of the original problem is obtained by solving the dual problem instead. The advantages are: first, the dual is often easier to solve; second, it naturally introduces kernel functions, the tool for handling nonlinear data. Constructing the Lagrangian from the primal and deriving the dual (for details see Statistical Learning Methods) gives:

$$L(w, b, \alpha) = \frac{1}{2}\|w\|^2 - \sum_{i=1}^{N} \alpha_i \left[ y_i (w \cdot x_i + b) - 1 \right]$$

$$\min_{\alpha} \ \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_i \alpha_j y_i y_j (x_i \cdot x_j) - \sum_{i=1}^{N} \alpha_i \qquad \text{s.t.} \quad \sum_{i=1}^{N} \alpha_i y_i = 0, \quad \alpha_i \ge 0, \ i = 1, \dots, N$$

The following theorem is then used to recover w* and b* from the dual solution:

$$w^* = \sum_{i=1}^{N} \alpha_i^* y_i x_i, \qquad b^* = y_j - \sum_{i=1}^{N} \alpha_i^* y_i (x_i \cdot x_j)$$

where j is any index with $\alpha_j^* > 0$.

Therefore, the Lagrange multipliers α* can be obtained from the constrained optimization problem above, and w* and b* then follow.
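As a sanity check of the recovery theorem, here is a minimal sketch using scikit-learn (not the book's derivation): SVC solves the dual internally, and its dual_coef_ attribute stores the products α_i·y_i for the support vectors, so w* falls out as a single matrix product. A very large C approximates the hard margin case.

```python
# Recover w* = sum_i alpha_i y_i x_i from a dual solver's output.
import numpy as np
from sklearn.svm import SVC

X = np.array([[3.0, 3.0], [4.0, 3.0], [1.0, 1.0]])
y = np.array([1, 1, -1])

clf = SVC(kernel="linear", C=1e10).fit(X, y)  # huge C ~ hard margin

w_star = clf.dual_coef_ @ clf.support_vectors_  # w* = sum alpha_i y_i x_i
print(w_star)          # approximately [[0.5, 0.5]]
print(clf.intercept_)  # approximately [-2.]
```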
