"Machine Learning Basics" linear scalable support vector machines

Tags: SVM

Introduction

In this series of machine learning blog posts, I will introduce the commonly used algorithms, and along the way I hope to connect them with practical applications so as to understand their essence more deeply; I hope the effort will pay off.
The upcoming posts are based on my study of machine learning techniques and focus on the central tool of feature transforms, approached from the following three directions:

  1. When many feature transforms are available, how do we use them while keeping the complexity of the transform under control? This question motivates the development of the support vector machine (SVM) algorithm.
  2. How do we mix predictive features so that the whole model becomes more expressive? This question leads to the adaptive boosting (AdaBoost) model.
  3. How do we discover the hidden features in the data and let the machine learn them on its own, so that it performs better? This question has pushed the earlier neural networks into today's field of deep learning.

In this post, we start from the linearly separable support vector machine in direction 1, and later posts will extend it to more complex models.

The maximum-margin separating hyperplane (large-margin separating hyperplane)

Given a classification problem, the separating lines in all three figures classify the training data correctly (training error 0), and the complexity of a linear classifier is the same in each case (dimension plus one, d+1). How, then, can we explain why the separating line in the rightmost figure is better?
Because of Gaussian noise in the measurements, in the leftmost figure a point lying close to the separating line is easily misclassified even though it belongs to the same class as its neighbors. The real difference between the figures is therefore how much measurement error the separating line can tolerate: the rightmost line is the most robust to measurement noise.
So what we really want is a separating hyperplane that is as far away from the training data as possible.


To restate the problem: we want the separating hyperplane to generalize well, which means we want the distance from every training point to the hyperplane to be as large as possible. Our goal is therefore to find the maximum-margin separating hyperplane. This hyperplane must satisfy two conditions: first, it separates the two classes correctly (that is, yn = sign(w^T xn + b), so yn and w^T xn + b have the same sign); second, its margin is the distance from the hyperplane to the closest training point. This goal can be stated formally as follows.
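In the usual notation (a sketch of the formulation; dist(x, b, w) denotes the distance from point x to the hyperplane defined by (b, w)), the goal is:

    \max_{b, w} \ \text{margin}(b, w)
    \text{subject to}\ \ y_n (w^T x_n + b) > 0 \ \text{for every } n
    \text{where}\ \ \text{margin}(b, w) = \min_{n} \ \text{dist}(x_n, b, w)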


The standard maximum-margin problem (large-margin problem)

Above, we asked for a maximum-margin separating hyperplane and obtained an optimization problem to solve. Next, we work out how to compute the point-to-hyperplane distance that appears in this problem.
1. Large-margin separating hyperplane


2. Distance to the separating hyperplane
We define the vector w = (w1, ..., wd) and the vector x = (x1, ..., xd), and pull the intercept out as a separate term b = w0, so the bias is no longer part of w.


Now consider two points x' and x'' lying on the hyperplane, so that both satisfy w^T x' + b = 0 and w^T x'' + b = 0.
The difference x'' - x' is a vector lying in the hyperplane, and w is orthogonal to every such vector, so w is the normal vector of the hyperplane.
The distance from a point to the hyperplane is therefore the length of the projection, onto this normal vector, of the vector from any point on the hyperplane to the point in question.
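Concretely, taking any point x' on the hyperplane (so that w^T x' = -b), the distance works out to the standard formula:

    \text{dist}(x, b, w) = \left| \frac{w^T}{\|w\|} (x - x') \right| = \frac{1}{\|w\|} \left| w^T x + b \right|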



Since the separating hyperplane we are looking for correctly separates the positive and negative examples, the distance formula can be simplified further:
we require the score w^T xn + b to have the same sign as yn, so the absolute value in the distance formula above can be removed by multiplying by yn instead.


Our goal therefore becomes the following optimization problem:
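In the usual notation, the problem now reads (a reconstruction of the standard formulation):

    \max_{b, w} \ \text{margin}(b, w)
    \text{subject to}\ \ y_n (w^T x_n + b) > 0 \ \text{for every } n
    \text{where}\ \ \text{margin}(b, w) = \min_{n} \ \frac{1}{\|w\|} \, y_n (w^T x_n + b)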


3. Margin of a special separating hyperplane
This problem is still hard to solve directly, so we simplify it further.
Observe that w^T x + b = 0 and 3 w^T x + 3b = 0 describe the same hyperplane, so the coefficients (b, w) can be rescaled freely. In particular, we can scale them so that the closest point satisfies yn (w^T xn + b) = 1, and then the margin is simply 1/||w||.
This rescaling changes neither which hyperplanes satisfy the constraints nor which one optimizes the objective, so the resulting problem is equivalent.


The optimization problem of our linearly separable support vector machine now looks as follows:
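With that scaling, the problem becomes (a sketch of the standard intermediate form):

    \max_{b, w} \ \frac{1}{\|w\|}
    \text{subject to}\ \ \min_{n} \, y_n (w^T x_n + b) = 1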


4. The standard large-margin hyperplane problem
As the final step, we put the problem into a form that is easier to solve.
We relax the condition min_n yn (w^T xn + b) = 1 into the constraints yn (w^T xn + b) >= 1 for all n; this relaxation does not change the optimal solution, because if the optimum had every yn (w^T xn + b) strictly greater than 1, we could scale (b, w) down to obtain a larger 1/||w||, a contradiction.
Furthermore, maximizing 1/||w|| is equivalent to minimizing ||w||^2 / 2, which yields the final optimization problem:
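Written out, this is the standard hard-margin SVM primal:

    \min_{b, w} \ \frac{1}{2} w^T w
    \text{subject to}\ \ y_n (w^T x_n + b) \ge 1 \ \text{for every } n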


Support vectors of the optimal solution

It can be seen that only the points closest to the separating hyperplane determine the maximum-margin hyperplane: even if all the other, farther-away data points are removed, the solution stays the same. Because these closest points alone determine the hyperplane, they are called support vectors.

Solving the general SVM problem

We examine the optimization problem we need to solve and find that it has two properties:

  1. The objective function is a convex quadratic function of (b, w).
  2. The inequality constraints are linear (first-order) in (b, w).

An optimization problem with these properties is called a (convex) quadratic program (convex quadratic programming, QP).

Quadratic programming

By matching our problem against the standard form of a quadratic program, we obtain its representation in that standard form:
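One common way of writing the correspondence (a sketch; u collects the variables, and the QP reads: minimize (1/2) u^T Q u + p^T u over u, subject to a_n^T u >= c_n):

    u = \begin{bmatrix} b \\ w \end{bmatrix}, \quad
    Q = \begin{bmatrix} 0 & 0_d^T \\ 0_d & I_d \end{bmatrix}, \quad
    p = 0_{d+1}, \quad
    a_n^T = y_n \, [\, 1 \ \ x_n^T \,], \quad
    c_n = 1, \quad n = 1, \dots, N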


The rest can be handed to any software package that solves quadratic programs, for example as sketched below.
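As an illustration, here is a minimal sketch of solving this QP numerically. The original post does not name a particular solver, so the use of numpy and cvxopt here, as well as the toy data and variable names, are my own assumptions:

    # Hard-margin linear SVM primal solved as a QP (illustrative sketch).
    import numpy as np
    from cvxopt import matrix, solvers

    solvers.options['show_progress'] = False

    # Toy linearly separable data (purely illustrative).
    X = np.array([[1.0, 1.0], [2.0, 2.0], [2.0, 0.0],
                  [0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
    y = np.array([1.0, 1.0, 1.0, -1.0, -1.0, -1.0])
    N, d = X.shape

    # Variable u = (b, w); objective (1/2) u^T Q u with Q = diag(0, I_d), p = 0.
    Q = np.zeros((d + 1, d + 1))
    Q[1:, 1:] = np.eye(d)
    p = np.zeros(d + 1)

    # Constraints y_n (w^T x_n + b) >= 1, rewritten for cvxopt as G u <= h
    # with G having rows -y_n [1, x_n^T] and h = -1.
    G = -y[:, None] * np.hstack([np.ones((N, 1)), X])
    h = -np.ones(N)

    sol = solvers.qp(matrix(Q), matrix(p), matrix(G), matrix(h))
    u = np.array(sol['x']).ravel()
    b, w = u[0], u[1:]
    print("b =", b, "w =", w)

    # Support vectors are the points that attain y_n (w^T x_n + b) = 1.
    margins = y * (X @ w + b)
    print("support vectors:\n", X[np.isclose(margins, 1.0, atol=1e-4)])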

Summary

Given the quadratic-program coefficients, we solve for the model parameters (b, w) and obtain the SVM hypothesis g_SVM(x) = sign(w^T x + b).


A theoretical view: SVM and regularization

SVM is similar to the regularization we introduced earlier; the difference is that SVM takes w^T w as the objective to minimize and Ein = 0 as a constraint, whereas regularization minimizes Ein subject to a bound on w^T w.
So SVM behaves like a form of regularization: by asking for the largest margin, we are really asking for a final model that is more robust to measurement error.
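Side by side, the analogy can be sketched as:

    regularization:  minimize  E_in     subject to  w^T w <= C
    SVM:             minimize  w^T w    subject to  E_in = 0 (every point classified correctly, with margin)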


An explanation from VC theory

We do not give a rigorous proof here, but explain the idea qualitatively.
If we require the margin to be larger than some constant, then fewer dichotomies of the data are possible, which lowers the effective VC dimension and thus keeps the model complexity under control.


Next

From the VC interpretation above we can draw a simple conclusion: the SVM constraint effectively reduces the VC dimension, controls complexity, and yields better generalization ability; and if we combine it with nonlinear transforms, we can obtain complex boundaries. Next, we will extend this to SVMs for data that are not linearly separable, using the SVM way of controlling complexity to make better use of various feature transforms.


When reposting, please credit the author, Jason Ding, and the source.
GitHub blog homepage (http://jasonding1354.github.io/)
CSDN blog (http://blog.csdn.net/jasonding1354)
Jianshu homepage (http://www.jianshu.com/users/2bd9b48f6ea8/latest_articles)
Search "jasonding1354" on Baidu to reach my blog homepage.

"Machine Learning Basics" linear scalable support vector machines

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.