Andrew Ng Machine Learning Notes + Weka Algorithm Implementation (4): SVM and the Primal/Dual Problem

Source: Internet
Author: User
Tags: SVM, Andrew Ng, machine learning

This post covers lectures 6 and 7 of Andrew Ng's machine learning course: the functional margin and geometric margin, the optimal margin classifier, the primal/dual problem, and the dual problem of the SVM.

    • Functional margin and geometric margin

The functional margin and the geometric margin are the foundation for understanding SVMs.
If we use labels $y \in \{-1, 1\}$ instead of $\{0, 1\}$, the classifier can be written as:

$$h_{w,b}(x) = g(w^T x + b), \qquad g(z) = \begin{cases} 1 & \text{if } z \ge 0 \\ -1 & \text{otherwise} \end{cases}$$
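The parameter $b$ here plays the role of the former intercept term (the coefficient of $x_0 = 1$), so $w$ and $b$ together determine a unique hyperplane.
Given a training example $(x^{(i)}, y^{(i)})$, we define its functional margin:

$$\hat{\gamma}^{(i)} = y^{(i)} (w^T x^{(i)} + b)$$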
When $y^{(i)} = 1$, a large functional margin requires $w^T x^{(i)} + b$ to be a large positive number; when $y^{(i)} = -1$, it requires $w^T x^{(i)} + b$ to be a negative number of large magnitude.
Next we define the functional margin of the whole training set:

$$\hat{\gamma} = \min_{i = 1, \dots, m} \hat{\gamma}^{(i)}$$
In other words, the functional margin of the training set is determined by the example with the smallest functional margin.
There is a problem, however: scaling $w$ and $b$ by the same factor (for example replacing them with $2w$ and $2b$) scales the functional margin by that factor without moving the hyperplane at all. A large functional margin obtained this way is meaningless in practice, so we need to constrain $w$ and $b$ with a normalization condition.
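The scaling problem can be seen on a small example. The following is a minimal sketch in plain Python; the four 2-D points and the values of `w` and `b` are made up for illustration and do not come from the lectures.

```python
def functional_margin(w, b, x, y):
    """Functional margin y * (w . x + b) of one labelled example."""
    return y * (sum(wi * xi for wi, xi in zip(w, x)) + b)


def dataset_functional_margin(w, b, examples):
    """Smallest functional margin over all (x, y) examples."""
    return min(functional_margin(w, b, x, y) for x, y in examples)


# Hypothetical toy data: two positive and two negative points in 2-D.
examples = [((2.0, 2.0), 1), ((3.0, 3.0), 1), ((0.0, 0.0), -1), ((-1.0, 0.0), -1)]

margin = dataset_functional_margin((0.5, 0.5), -1.0, examples)
# Doubling w and b describes the same hyperplane but doubles the margin.
scaled_margin = dataset_functional_margin((1.0, 1.0), -2.0, examples)
```

Both parameter settings describe the hyperplane $x_1 + x_2 = 2$, yet `scaled_margin` is twice `margin`, which is why the functional margin alone is not a useful objective.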
Next we introduce the geometric margin:

In the original figure (not reproduced here), A is a training point and B is its projection onto the separating hyperplane. Call the distance between them $\gamma^{(i)}$. The direction of BA is the normal (gradient) direction of the hyperplane, whose unit vector is $w / \|w\|$: it has length 1 and points the same way as BA. So if A has coordinates $x^{(i)}$, the coordinates of B are:

$$x^{(i)} - \gamma^{(i)} \frac{w}{\|w\|}$$

Since B lies on the hyperplane, substituting its coordinates into the hyperplane equation $w^T x + b = 0$ gives:

$$w^T \left( x^{(i)} - \gamma^{(i)} \frac{w}{\|w\|} \right) + b = 0$$

Using $w^T w = \|w\|^2$ and solving for $\gamma^{(i)}$:

$$\gamma^{(i)} = \frac{w^T x^{(i)} + b}{\|w\|} = \left( \frac{w}{\|w\|} \right)^T x^{(i)} + \frac{b}{\|w\|}$$
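This expression is for a point on the positive side of the hyperplane. To cover both classes, we multiply by the label $y^{(i)}$:

$$\gamma^{(i)} = y^{(i)} \left( \left( \frac{w}{\|w\|} \right)^T x^{(i)} + \frac{b}{\|w\|} \right)$$

This is the geometric margin of a point with respect to the hyperplane. It is easy to see that when $\|w\| = 1$ the geometric margin equals the functional margin. As before, we define the geometric margin of the whole training set:

$$\gamma = \min_{i = 1, \dots, m} \gamma^{(i)}$$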
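Unlike the functional margin, the geometric margin does not change when $w$ and $b$ are rescaled together. The sketch below checks this in plain Python on hypothetical toy data; the points and parameter values are assumptions chosen for illustration.

```python
import math


def geometric_margin(w, b, x, y):
    """Signed distance of x from the hyperplane w . x + b = 0, times the label y."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return y * (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm


def dataset_geometric_margin(w, b, examples):
    """Smallest geometric margin over all (x, y) examples."""
    return min(geometric_margin(w, b, x, y) for x, y in examples)


# Hypothetical toy data: two positive and two negative points in 2-D.
examples = [((2.0, 2.0), 1), ((3.0, 3.0), 1), ((0.0, 0.0), -1), ((-1.0, 0.0), -1)]

gamma = dataset_geometric_margin((0.5, 0.5), -1.0, examples)
# Scaling w and b by 10 leaves the geometric margin unchanged.
gamma_scaled = dataset_geometric_margin((5.0, 5.0), -10.0, examples)
```

Here `gamma` is the distance from the closest points, (2, 2) and (0, 0), to the line $x_1 + x_2 = 2$, and `gamma_scaled` agrees with it up to floating-point rounding.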
    • Optimal margin classifier

Our goal is to find a hyperplane that maximizes the distance to its nearest training points; the distances to the other points do not matter.
Formally:

$$\max_{\gamma, w, b} \; \gamma \quad \text{s.t.} \quad y^{(i)} (w^T x^{(i)} + b) \ge \gamma, \; i = 1, \dots, m, \qquad \|w\| = 1$$
The next goal is to find the parameters $w$ and $b$ of the separating hyperplane. The constraint $\|w\| = 1$, however, restricts $w$ to the surface of a sphere, which is a non-convex set, so this is a non-convex optimization problem and hard to solve. We need a suitable transformation. Recall the relationship between the geometric margin and the functional margin:

$$\gamma = \frac{\hat{\gamma}}{\|w\|}$$
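We can rewrite the original problem as:

$$\max_{\hat{\gamma}, w, b} \; \frac{\hat{\gamma}}{\|w\|} \quad \text{s.t.} \quad y^{(i)} (w^T x^{(i)} + b) \ge \hat{\gamma}, \; i = 1, \dots, m$$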
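Because $w$ and $b$ can be rescaled freely without changing the hyperplane, we can fix the functional margin to $\hat{\gamma} = 1$.
The problem is then to maximize $1 / \|w\|$, which is the same as minimizing $\|w\|^2$ (not maximizing it). The original problem can therefore be converted to:

$$\min_{w, b} \; \frac{1}{2} \|w\|^2 \quad \text{s.t.} \quad y^{(i)} (w^T x^{(i)} + b) \ge 1, \; i = 1, \dots, m$$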
This is a standard quadratic programming (QP) problem with a convex objective and linear constraints, so the original problem can now be solved.
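To make the final formulation concrete, the sketch below solves it by brute force on a tiny hypothetical data set: it enumerates a coarse grid of $(w_1, w_2, b)$ values, keeps those satisfying $y^{(i)}(w^T x^{(i)} + b) \ge 1$, and picks the one with the smallest $\frac{1}{2}\|w\|^2$. Grid search is only a stand-in for illustration; it is not how a QP solver works, and the data and grid range are assumptions.

```python
from itertools import product

# Hypothetical toy data: two positive and two negative points in 2-D.
examples = [((2.0, 2.0), 1), ((3.0, 3.0), 1), ((0.0, 0.0), -1), ((-1.0, 0.0), -1)]


def is_feasible(w, b, examples):
    """True if every example satisfies y * (w . x + b) >= 1."""
    return all(y * (w[0] * x[0] + w[1] * x[1] + b) >= 1.0 for x, y in examples)


# Candidate values -2.0, -1.75, ..., 2.0 (multiples of 0.25 are exact in binary).
grid = [k * 0.25 for k in range(-8, 9)]

best = None  # (objective, w, b) of the best feasible candidate found so far
for w1, w2, b in product(grid, repeat=3):
    w = (w1, w2)
    if not is_feasible(w, b, examples):
        continue
    objective = 0.5 * (w1 * w1 + w2 * w2)
    if best is None or objective < best[0]:
        best = (objective, w, b)

best_objective, best_w, best_b = best
```

On this data the search returns `best_w = (0.5, 0.5)` and `best_b = -1.0`, the hyperplane $x_1 + x_2 = 2$ that lies halfway between the closest points of the two classes.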

    • Lagrange duality

To solve the problem above, we first look at the simplest case, a problem with equality constraints only:

$$\min_w \; f(w) \quad \text{s.t.} \quad h_i(w) = 0, \; i = 1, \dots, l$$
Problems of this kind are usually solved with the method of Lagrange multipliers. Introducing the multipliers $\beta_i$, we form the Lagrangian:

$$L(w, \beta) = f(w) + \sum_{i=1}^{l} \beta_i h_i(w)$$
With the Lagrangian constructed, we take its partial derivatives with respect to $w$ and $\beta$:

$$\frac{\partial L}{\partial w_i} = 0, \qquad \frac{\partial L}{\partial \beta_i} = 0$$

Setting these partial derivatives to zero and solving gives $w$ and $\beta$. The detailed mathematical proof is omitted here; it is covered in undergraduate calculus.
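As a quick worked example of the method (the objective and constraint here are chosen for illustration and are not from the lectures), minimize $w_1^2 + w_2^2$ subject to $w_1 + w_2 - 1 = 0$:

```latex
\begin{aligned}
L(w, \beta) &= w_1^2 + w_2^2 + \beta (w_1 + w_2 - 1) \\
\frac{\partial L}{\partial w_1} &= 2 w_1 + \beta = 0
  \quad\Rightarrow\quad w_1 = -\tfrac{\beta}{2} \\
\frac{\partial L}{\partial w_2} &= 2 w_2 + \beta = 0
  \quad\Rightarrow\quad w_2 = -\tfrac{\beta}{2} \\
\frac{\partial L}{\partial \beta} &= w_1 + w_2 - 1 = 0
  \quad\Rightarrow\quad -\beta = 1 \\
\beta &= -1, \qquad w_1 = w_2 = \tfrac{1}{2}, \qquad f(w) = \tfrac{1}{2}
\end{aligned}
```

The solution is the point on the line $w_1 + w_2 = 1$ closest to the origin, as expected.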
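Next we generalize from equality constraints to inequality constraints and consider the following problem:

$$\min_w \; f(w) \quad \text{s.t.} \quad g_i(w) \le 0, \; i = 1, \dots, k, \qquad h_i(w) = 0, \; i = 1, \dots, l$$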
Now there are inequality constraints as well. We again construct the Lagrangian:

$$L(w, \alpha, \beta) = f(w) + \sum_{i=1}^{k} \alpha_i g_i(w) + \sum_{i=1}^{l} \beta_i h_i(w)$$
Because there are two kinds of constraints, we introduce two sets of multipliers, $\alpha$ and $\beta$.
Proceeding exactly as before runs into a serious problem:
if some $g_i(w) < 0$, we can let $\alpha_i$ go to positive infinity, and the Lagrangian then goes to negative infinity, which makes no sense. To avoid this situation, we define:

$$\theta_P(w) = \max_{\alpha, \beta \,:\, \alpha_i \ge 0} L(w, \alpha, \beta)$$
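We require $\alpha_i \ge 0$. Then $\theta_P(w)$ equals $f(w)$ exactly when $w$ satisfies the constraints on $g$ and $h$, and is infinite otherwise:

$$\theta_P(w) = \begin{cases} f(w) & \text{if } w \text{ satisfies the constraints} \\ \infty & \text{otherwise} \end{cases}$$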
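So the original problem $\min_w f(w)$ is equivalent to $\min_w \theta_P(w)$.
We write:

$$p^* = \min_w \theta_P(w) = \min_w \max_{\alpha, \beta \,:\, \alpha_i \ge 0} L(w, \alpha, \beta)$$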
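Next, define another function:

$$\theta_D(\alpha, \beta) = \min_w L(w, \alpha, \beta)$$

and let:

$$d^* = \max_{\alpha, \beta \,:\, \alpha_i \ge 0} \theta_D(\alpha, \beta) = \max_{\alpha, \beta \,:\, \alpha_i \ge 0} \min_w L(w, \alpha, \beta)$$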
The following relationship holds:

$$d^* = \max_{\alpha, \beta \,:\, \alpha_i \ge 0} \min_w L(w, \alpha, \beta) \;\le\; \min_w \max_{\alpha, \beta \,:\, \alpha_i \ge 0} L(w, \alpha, \beta) = p^*$$

That is, the maximum of the minimum is always less than or equal to the minimum of the maximum. The max-min problem is the dual of the original (primal) problem; compared with the primal, only the order of min and max is swapped. Equality holds under the following conditions:

① The objective $f$ and the inequality constraint functions $g_i$ are convex (a linear function is convex)
② The equality constraint functions $h_i$ are affine (of the form $h_i(w) = a_i^T w + b_i$)
③ There exists a $w$ such that $g_i(w) < 0$ for all $i$

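Under these assumptions, there must exist $w^*, \alpha^*, \beta^*$ such that $w^*$ is the solution of the primal problem, $\alpha^*, \beta^*$ are the solution of the dual problem, and $p^* = d^* = L(w^*, \alpha^*, \beta^*)$. Moreover, $w^*, \alpha^*, \beta^*$ satisfy the Karush-Kuhn-Tucker (KKT) conditions:

$$
\begin{aligned}
\frac{\partial}{\partial w_i} L(w^*, \alpha^*, \beta^*) &= 0, & i &= 1, \dots, n \\
\frac{\partial}{\partial \beta_i} L(w^*, \alpha^*, \beta^*) &= 0, & i &= 1, \dots, l \\
\alpha_i^* g_i(w^*) &= 0, & i &= 1, \dots, k \\
g_i(w^*) &\le 0, & i &= 1, \dots, k \\
\alpha_i^* &\ge 0, & i &= 1, \dots, k
\end{aligned}
$$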
Conversely, if $w^*, \alpha^*, \beta^*$ satisfy the KKT conditions, then they are solutions to the primal and dual problems.
The condition $\alpha_i^* g_i(w^*) = 0$ tells us:
if $\alpha_i^* > 0$, then $g_i(w^*) = 0$.
When $g_i(w) = 0$, the constraint is active and $w$ lies on the boundary of the feasible region; only these constraints actually affect the solution. Constraints with $g_i(w) < 0$, where $w$ is an interior point, play no role. This leads to the concept of support vectors in the SVM.
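The duality and KKT statements above can be checked numerically on a one-dimensional toy problem: minimize $f(w) = w^2$ subject to $g(w) = 1 - w \le 0$. This problem, the grid ranges, and the grid search itself are assumptions made for illustration; the sketch only evaluates the definitions of $p^*$, $\theta_D$ and $d^*$ on a grid.

```python
def f(w):
    """Objective function."""
    return w * w


def g(w):
    """Inequality constraint g(w) <= 0, i.e. w >= 1."""
    return 1.0 - w


def lagrangian(w, alpha):
    return f(w) + alpha * g(w)


# w in [-4, 4] with step 0.25; alpha in [0, 4] with step 0.5, so that the
# unconstrained minimiser w = alpha / 2 of the Lagrangian lies on the w grid.
w_grid = [k * 0.25 for k in range(-16, 17)]
alpha_grid = [k * 0.5 for k in range(0, 9)]

# Primal optimum p*: smallest f(w) over the feasible grid points.
p_star, w_star = min((f(w), w) for w in w_grid if g(w) <= 0)


def theta_d(alpha):
    """Dual function: minimum of the Lagrangian over w."""
    return min(lagrangian(w, alpha) for w in w_grid)


# Dual optimum d*: largest theta_D(alpha) over alpha >= 0.
d_star, alpha_star = max((theta_d(a), a) for a in alpha_grid)

# KKT complementarity: alpha* > 0 forces the constraint to be active.
complementarity = alpha_star * g(w_star)
```

Here $f$ and $g$ are convex and $g(2) < 0$, so the conditions for equality hold. The search gives `p_star == d_star`, with `w_star = 1.0` on the boundary of the feasible region, `alpha_star = 2.0`, and `complementarity` equal to zero.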
