SVM Learning -- solving the quadratic programming problem


The last article arrived at the following optimization problem (maximizing the geometric margin with the functional margin fixed to 1):

    max 1/||w||    s.t.  y_i(w·x_i + b) >= 1,  i = 1, ..., n

Make a slight change to it, as shown below:

    min (1/2)||w||^2    s.t.  y_i(w·x_i + b) >= 1,  i = 1, ..., n

This is a constrained optimization problem; more precisely, it is a quadratic programming problem. Before tackling it, let's review constrained optimization problems:

Definition 1: The constrained nonlinear programming problem is

    min f(x),  x in R^n
    s.t. g_i(x) >= 0,  i = 1, ..., m,

where f and g_1, ..., g_m are real-valued continuous functions and at least one of them is nonlinear (otherwise it is a linearly constrained optimization problem), and m is a positive integer. f is called the objective function and the g_i are called the constraint functions. If the objective function is quadratic and all the constraints are linear, the problem is called a quadratic programming problem. The set of points that satisfy all the constraints is called the feasible region, and its points are called feasible points.

Definition 2 (Farkas' theorem): Given n-dimensional vectors a_1, ..., a_m and b, the inequality b·p >= 0 holds for every vector p satisfying a_i·p >= 0 (i = 1, ..., m) if and only if b lies in the convex cone formed by a_1, ..., a_m, that is, b = λ_1 a_1 + ... + λ_m a_m with all λ_i >= 0.

How can we understand the description "a vector is in the convex cone formed by several other vectors"? Picture two vectors a_1 and a_2 in the plane. By the parallelogram law, any vector b = λ_1 a_1 + λ_2 a_2 with λ_1, λ_2 >= 0 lies inside the wedge-shaped region spanned by a_1 and a_2. Extend this picture to three-dimensional space, and such a region does indeed look like a convex cone.
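To make this concrete, here is a tiny numeric check of my own (an assumed toy example, not from the original article): b = (2, 3) lies in the convex cone of a_1 = (1, 0) and a_2 = (0, 1), since b = 2a_1 + 3a_2 with nonnegative coefficients, so by Farkas' theorem every p with a_1·p >= 0 and a_2·p >= 0 must also satisfy b·p >= 0:

    # Numeric check of Farkas' theorem on an assumed toy example:
    # b = 2*a1 + 3*a2 with nonnegative coefficients, so b is in the convex
    # cone of a1, a2, and b.p >= 0 whenever a1.p >= 0 and a2.p >= 0.
    import numpy as np

    a1, a2, b = np.array([1.0, 0.0]), np.array([0.0, 1.0]), np.array([2.0, 3.0])
    rng = np.random.default_rng(0)
    for _ in range(10000):
        p = rng.normal(size=2)
        if p @ a1 >= 0 and p @ a2 >= 0:
            assert p @ b >= 0  # never fails: b lies inside the cone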

Definition 3: Let x* be a feasible point. If there exist multipliers λ_i >= 0 such that

    ∇f(x*) = Σ_{i in I(x*)} λ_i ∇g_i(x*),

where I(x*) = {i : g_i(x*) = 0} is the set of active (valid) constraints at x*, then x* is called a K-T point.

The geometric significance of K-T points: at a K-T point such as x_1, the gradient of the objective lies inside the convex cone formed by the gradients of the active constraints, while at a point such as x_2 it does not. In general nonlinear programming, the K-T condition is a necessary condition for an optimal solution, but not a sufficient one, so a K-T point is not necessarily an optimum. For a convex programming problem, however, the K-T condition is also a sufficient condition for an optimal solution. By the way, convex programming is a good comrade: its local optimal solutions are global optimal solutions, so its K-T points are global optima.
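As a concrete check of Definition 3, here is a small worked example of my own (not from the article). Minimize x_1^2 + x_2^2 subject to x_1 + x_2 - 1 >= 0:

    min f(x) = x_1^2 + x_2^2    s.t.  g(x) = x_1 + x_2 - 1 >= 0

    At x* = (1/2, 1/2) the constraint is active: g(x*) = 0. Here
    ∇f(x*) = (1, 1) and ∇g(x*) = (1, 1), so ∇f(x*) = λ∇g(x*) holds with
    λ = 1 >= 0, and x* is a K-T point. The problem is convex, so by the
    remark above x* is in fact the global minimum.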

Theorem 1: Let x* be a local minimum point of the constrained problem, and suppose that at x* the set of linearized feasible directions equals the set of sequential feasible directions. Then there exist multipliers λ_i >= 0 such that

    ∇f(x*) = Σ_{i in I(x*)} λ_i ∇g_i(x*),

where I(x*) is the set of active constraints at x*; that is, x* is a K-T point.

When is the condition "the set of linearized feasible directions equals the set of sequential feasible directions" satisfied? In particular, whenever all the active constraints are linear functions, so in that case a local minimum must be a K-T point.

Theorem 2 (first-order sufficient condition): Let x* be a feasible point at which the objective function and all active constraints are differentiable. If every nonzero sequential feasible direction d at x* satisfies ∇f(x*)·d > 0, then x* is a strict local minimum point. Intuitively, if moving from a point in any feasible direction strictly increases the objective value, that point is a strict local minimum.

Theorem 3 (second-order sufficient condition): Let x* be a K-T point and λ* the corresponding Lagrange multipliers. If d·∇²_xx L(x*, λ*) d > 0 for every nonzero linearized zero-constraint direction d at x*, then x* is a strict local minimum point.

Corollary 1: Let x* be a K-T point and λ* the corresponding Lagrange multipliers. If d·∇²_xx L(x*, λ*) d > 0 for every nonzero vector d, then x* is a strict local minimum point.

For the preceding constrained nonlinear programming problem, if the objective is a quadratic function and all the constraints are linear functions, it becomes a quadratic programming problem, written in the following form:

    min (1/2) x^T H x + c^T x
    s.t. a_i^T x = b_i,  i in E (equality constraints)
         a_i^T x >= b_i,  i in I (inequality constraints)
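To see the form in action, here is a minimal numeric sketch (a textbook-style example of my own, solved with SciPy's general-purpose SLSQP solver; it is not code from the article):

    # Solving a small QP of the standard form above with SciPy:
    #   min (1/2) x^T H x + c^T x   s.t.  A x >= b  (inequality constraints only)
    import numpy as np
    from scipy.optimize import minimize

    H = np.array([[2.0, 0.0], [0.0, 2.0]])   # positive definite, so convex QP
    c = np.array([-2.0, -5.0])
    A = np.array([[1.0, -2.0], [-1.0, -2.0], [-1.0, 2.0], [1.0, 0.0], [0.0, 1.0]])
    b = np.array([-2.0, -6.0, -2.0, 0.0, 0.0])

    obj = lambda x: 0.5 * x @ H @ x + c @ x
    cons = [{"type": "ineq", "fun": lambda x: A @ x - b}]  # A x - b >= 0
    res = minimize(obj, x0=np.zeros(2), method="SLSQP", constraints=cons)
    print(res.x)  # approximately [1.4, 1.7]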

Theorem 4: Let x* be a feasible point of the quadratic programming problem. Then x* is a local minimum point if and only if there exist Lagrange multipliers λ* such that

    Hx* + c = Σ_{i in E} λ_i* a_i + Σ_{i in I(x*)} λ_i* a_i,  with λ_i* >= 0 for i in I(x*),

holds (that is, x* is a K-T point), and in addition, for every vector d satisfying

    a_i^T d = 0,  i in E,
    a_i^T d = 0,  i in I(x*) with λ_i* > 0,
    a_i^T d >= 0,  i in I(x*) with λ_i* = 0

(E is the set of equality constraints and I(x*) is the set of active inequality constraints), we have d^T H d >= 0.

Theorem 5: If H is a positive semidefinite matrix (all eigenvalues greater than or equal to 0), then a point is a global minimum of the quadratic programming problem if and only if it is a local minimum point, or equivalently a K-T point.

When H is positive semidefinite, the objective function is a convex function and the quadratic program is called a convex quadratic program; any K-T point of a convex quadratic program is a global minimum point. Looking back at the problem we set out to solve at the beginning, its objective (1/2)||w||^2 is obviously a convex function and all its constraints are linear, so it is a convex quadratic programming problem, and a global minimum is guaranteed to exist (really good!).

At this point, we can start to solve the problem posed at the beginning:

We have established that it is a convex quadratic programming problem, so we can form its Lagrange function:

    L(w, b, α) = (1/2)||w||^2 − Σ_i α_i [y_i(w·x_i + b) − 1],  α_i >= 0.

Setting the partial derivatives of L with respect to w and b to zero gives:

    ∂L/∂w = 0  ⇒  w = Σ_i α_i y_i x_i
    ∂L/∂b = 0  ⇒  Σ_i α_i y_i = 0

Substituting these back into the original Lagrange function converts the problem into its dual:
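Written out in full (a standard derivation, reconstructed here for completeness), the substitution goes:

    L(w, b, α) = (1/2)||w||^2 − Σ_i α_i [y_i(w·x_i + b) − 1]

    With w = Σ_j α_j y_j x_j and Σ_i α_i y_i = 0:

      (1/2)||w||^2         = (1/2) Σ_i Σ_j α_i α_j y_i y_j ⟨x_i, x_j⟩
      Σ_i α_i y_i (w·x_i)  = Σ_i Σ_j α_i α_j y_i y_j ⟨x_i, x_j⟩
      Σ_i α_i y_i b        = b · 0 = 0

    so  L = Σ_i α_i − (1/2) Σ_i Σ_j α_i α_j y_i y_j ⟨x_i, x_j⟩ = W(α).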

In this way, the optimization problem with inequality constraints is transformed, through its dual form, into a problem with much simpler constraints (a single equality constraint plus nonnegativity bounds), namely the following optimization problem W:

    max_α W(α) = Σ_i α_i − (1/2) Σ_i Σ_j α_i α_j y_i y_j ⟨x_i, x_j⟩
    s.t. Σ_i α_i y_i = 0,  α_i >= 0,  i = 1, ..., n

After the optimal α* is obtained, w* and b* follow from it, which gives the maximum geometric margin and the optimal classification hyperplane.
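Here is a minimal runnable sketch of the whole pipeline (my own illustration with an assumed toy dataset and SciPy's general-purpose solver, not the article's code): maximize W(α), then recover w* and b*:

    # Solving the SVM dual W(alpha) for a toy 2-D dataset, then recovering
    # the hyperplane. Labels must be +1/-1.
    import numpy as np
    from scipy.optimize import minimize

    X = np.array([[2.0, 2.0], [2.5, 1.5], [0.5, 0.5], [1.0, 0.0]])
    y = np.array([1.0, 1.0, -1.0, -1.0])

    K = X @ X.T                          # data enters only via inner products
    Q = (y[:, None] * y[None, :]) * K

    def neg_W(a):                        # minimizing -W(alpha) maximizes W
        return 0.5 * a @ Q @ a - a.sum()

    cons = [{"type": "eq", "fun": lambda a: a @ y}]   # sum_i alpha_i y_i = 0
    bounds = [(0.0, None)] * len(y)                   # alpha_i >= 0
    res = minimize(neg_W, x0=np.zeros(len(y)), method="SLSQP",
                   bounds=bounds, constraints=cons)

    alpha = res.x
    w = (alpha * y) @ X                  # w* = sum_i alpha_i y_i x_i
    sv = alpha > 1e-6                    # support vectors have alpha_i > 0
    b = np.mean(y[sv] - X[sv] @ w)       # from y_i(w.x_i + b) = 1 on the margin
    print("alpha =", alpha.round(4), " w =", w.round(4), " b =", round(b, 4))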

K-T points must satisfy one more condition, complementary slackness:

    α_i [y_i(w·x_i + b) − 1] = 0,  for every i.

What does this condition tell us?

1. Obviously, input points whose functional margin is not equal to 1 must have Lagrange coefficient α_i = 0 (these points are not support vectors), and only input points whose functional margin is exactly 1 can have α_i ≠ 0 (these points are the support vectors). This means the final classification hyperplane is determined entirely by the boundary points whose functional margin is 1, which is why these input points are called support vectors. It also shows that a support vector machine has strong anti-interference capability: perturbing the non-support-vector points has no influence on the optimal solution (both this observation and the next are checked numerically in the continuation of the sketch after this list);

2. Since α_i* = 0 except where x_i is a support vector, w* = Σ_{i in SV} α_i* y_i x_i. For every support vector y_i(w*·x_i + b*) = 1, and the constraint Σ_i α_i* y_i = 0 holds, so our target quantity becomes

    ||w*||^2 = w*·w* = Σ_i α_i* y_i (x_i·w*) = Σ_i α_i* (1 − y_i b*)
             = Σ_i α_i* − b* Σ_i α_i* y_i = Σ_{i in SV} α_i*.

Therefore, through this dual conversion, the maximum geometric margin comes out as

    γ = 1/||w*|| = (Σ_{i in SV} α_i*)^(−1/2).
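Continuing the runnable sketch from above (it reuses X, y, alpha, w, b, and sv from that block), both observations can be verified numerically; the complementary slackness products should vanish, and the two margin computations should agree:

    # 1) Complementary slackness: alpha_i * (margin_i - 1) ~ 0 for every i,
    #    and alpha_i > 0 only where the functional margin is exactly 1.
    margins = y * (X @ w + b)
    for a_i, m_i in zip(alpha, margins):
        print(f"alpha = {a_i:.4f}   functional margin = {m_i:.4f}")

    # 2) The margin identity: ||w*||^2 equals the sum of alpha_i over the
    #    support vectors, so gamma = 1/||w*|| = (sum alpha_i)^(-1/2).
    print(1.0 / np.linalg.norm(w))       # gamma computed from w*
    print(alpha[sv].sum() ** -0.5)       # gamma computed from the alphas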

Through the above derivation, we can also see that converting the quadratic program into its dual problem simplifies the constraints, so let's remember optimization problem W; it is a great conversion. The formula also has one notable feature: the data only appear inside inner products.

In the discussion so far, we looked at how to find the maximum-margin hyperplane dividing two classes when the input samples are linearly separable. What should we do if the input samples are linearly inseparable; were all the previous discussions in vain? Not at all. If we can map the input samples, through some function φ, into another higher-dimensional space in which they become linearly separable, we can reuse all of the previous results. Remember that "the data only appear inside inner products": if the mapping is x ↦ φ(x), the inner product ⟨x_i, x_j⟩ becomes ⟨φ(x_i), φ(x_j)⟩. If there is a way to compute this inner product directly, then the two steps of "mapping the input samples from the low-dimensional space to the high-dimensional space" and "taking the inner product after the mapping" merge into a single step. Such a direct computation is the kernel function method. In the next article, let's study kernel functions. The theory behind SVM really is heavy on mathematics!
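As a small preview of that next article, here is a sketch of my own (with an assumed polynomial kernel; not the article's code) of how a kernel replaces the explicit mapping:

    # Since the data only appear inside inner products, replacing the Gram
    # matrix X X^T with a kernel matrix k(x_i, x_j) trains the same machine
    # in a high-dimensional feature space without ever computing the mapping.
    import numpy as np

    def poly_kernel(A, B, degree=2):
        # k(x, z) = (<x, z> + 1)^d equals <phi(x), phi(z)> for an explicit
        # polynomial feature map phi, but phi itself is never built.
        return (A @ B.T + 1.0) ** degree

    X = np.array([[0.0, 0.0], [1.0, 1.0], [0.0, 1.0], [1.0, 0.0]])  # XOR-like
    K = poly_kernel(X, X)   # drop-in replacement for X @ X.T in W(alpha)
    print(K)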
