Before SVM: convex optimization and the duality problem

This article collects the bits of optimization theory that SVM relies on. Since SVM is already complicated enough on its own, that optimization background is pulled out into this separate article. It does not go into the deeper aspects of optimization theory; it only covers, within the scope of my own understanding, the optimization problems that actually appear in SVM.

1. Convex optimization problems

Among optimization problems, convex optimization problems have been studied extensively because of an excellent property: every local optimum is also a global optimum.

For an optimization problem with constraints:

\[\left\{\begin{matrix}\underset{x}{\mathop{\min }}\,f(x) \\\begin{matrix}s.t. & x\in C \\\end{matrix} \\\end{matrix} \right.\]

Here $f(x)$ is a convex function and the feasible set $C$ of the variable $x$ is a convex set; an optimization problem of this form is called a convex optimization problem.

Writing the constraints more explicitly, a convex optimization problem can be stated as:

\[\left\{\begin{matrix}\underset{x}{\mathop{\min }}\,f(x) \\\begin{matrix}s.t. & {{g}_{i}}(x)\le 0 \\{} & {{h}_{i}}(x)=0 \\\end{matrix} \\\end{matrix} \right.\]

Here $f(x)$ is still a convex function, and the constraints must satisfy certain requirements: each ${{g}_{i}}(x)$ is a convex function and each ${{h}_{i}}(x)$ is an affine function. These requirements exist precisely to guarantee that the feasible region is a convex set.
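As a small running example (constructed here purely for illustration), consider the one-dimensional problem

\[\left\{\begin{matrix}\underset{x}{\mathop{\min }}\,{{x}^{2}} \\\begin{matrix}s.t. & 1-x\le 0 \\\end{matrix} \\\end{matrix} \right.\]

Here $f(x)={{x}^{2}}$ is convex and $g(x)=1-x$ is convex (in fact affine), and there is no equality constraint, so this is a convex optimization problem. Its solution is clearly ${{x}^{*}}=1$ with optimal value $1$; the same example will be reused below to illustrate the dual problem and the KKT conditions.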

For the inequality constraints, each ${{g}_{i}}(x)$ is a convex function, and the sublevel set of a convex function, $\{x\,|\,{{g}_{i}}(x)\le \alpha \}$, is a convex set (a basic property of convex functions); this is what makes the inequality constraints keep the feasible region convex.

The equality constraint ${{h}_{i}}(x)=0$ can be written as:

\[\left\{\begin{matrix}{{h}_{i}} (x) \le 0 \\{{h}_{i}} (x) \ge 0 \\\end{matrix} \right.\]

For the set of $x$ satisfying this condition to be convex, ${{h}_{i}}(x)$ must be both a convex function and a concave function, which forces ${{h}_{i}}(x)$ to be an affine function.

The above is the general form of a convex optimization problem. Common problems such as linear programming (LP), quadratic programming (QP), and quadratically constrained quadratic programming (QCQP) are all convex optimization problems.
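For example, a quadratic program has the form $\underset{x}{\mathop{\min }}\,\frac{1}{2}{{x}^{T}}Px+{{q}^{T}}x$ subject to $Ax\le b$; when $P$ is positive semidefinite the objective is convex and the constraints are affine, so it fits exactly the template above. (The SVM training problem itself is a quadratic program of this kind, which is why this background matters.)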

2. Lagrange duality

Set the convex optimization problem aside for now and return to the general optimization problem.

A general optimization problem can be written in the form:

$\left\{\begin{matrix}\underset{x}{\mathop{\min }}\,f(x) \\\begin{matrix}s.t. & {{g}_{i}}(x)\le 0 \\{} & {{h}_{i}}(x)=0 \\\end{matrix} \\\end{matrix} \right.$

This time no convexity requirements are placed on $f(x)$, ${{g}_{i}}(x)$, or ${{h}_{i}}(x)$.

According to the Lagrangian method, the corresponding Lagrangian function is:

$L(x,\alpha ,\beta )=f(x)+\sum\limits_{i}{{{\alpha }_{i}}{{g}_{i}}(x)}+\sum\limits_{i}{{{\beta }_{i}}{{h}_{i}}(x)}$

where $\alpha $ and $\beta $ are the Lagrange multipliers (both vectors, with lengths equal to the number of inequality constraints and the number of equality constraints, respectively), with ${{\alpha }_{i}}\ge 0$ and $\beta $ unrestricted.
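For the running example above, the Lagrangian is simply $L(x,\alpha )={{x}^{2}}+\alpha (1-x)$, with a single multiplier $\alpha \ge 0$ and no $\beta $ term, since there is no equality constraint.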

Define the function:

${{\theta }_{P}}(x)=\underset{\alpha ,\beta :{{\alpha }_{i}}\ge 0}{\mathop{\max }}\,L(x,\alpha ,\beta )$

It is easy to see that:

${{\theta }_{P}}(x)=\left\{\begin{matrix}f(x) & {{g}_{i}}(x)\le 0\ \text{and}\ {{h}_{i}}(x)=0\ \text{for all }i \\+\infty & {{g}_{i}}(x)>0\ \text{or}\ {{h}_{i}}(x)\ne 0\ \text{for some }i \\\end{matrix} \right.$

If the original constraints are satisfied, then in $L(x,\alpha ,\beta )=f(x)+\sum\limits_{i}{{{\alpha }_{i}}{{g}_{i}}(x)}+\sum\limits_{i}{{{\beta }_{i}}{{h}_{i}}(x)}$ the last term is zero, and the middle term must be maximized: since ${{g}_{i}}(x)\le 0$, the maximum is reached only by taking $\alpha =\overset{\to }{\mathop{0}}\,$, giving a maximum value of 0. Hence ${{\theta }_{P}}(x)=\underset{\alpha ,\beta :{{\alpha }_{i}}\ge 0}{\mathop{\max }}\,L(x,\alpha ,\beta )=f(x)$;

If the original constraints are violated, say some constraint has ${{g}_{i}}(x)>0$, then the corresponding ${{\alpha }_{i}}$ can be taken arbitrarily large, so ${{\theta }_{P}}(x)=+\infty $. Violating an equality constraint ${{h}_{i}}(x)=0$ works similarly.
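On the running example this case analysis reads as follows: if $1-x\le 0$, then $\underset{\alpha \ge 0}{\mathop{\max }}\,{{x}^{2}}+\alpha (1-x)$ is attained at $\alpha =0$ and equals ${{x}^{2}}=f(x)$; if $1-x>0$, letting $\alpha \to +\infty $ drives the value to $+\infty $.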

So ${{\theta }_{P}}(x)$ can be viewed as absorbing the constraints of the original optimization problem: the original constrained problem becomes an unconstrained one (unconstrained with respect to the original variable $x$), that is, the original optimization problem can be written as:

$\begin{align}\underset{x}{\mathop{\min }}\,{{\theta }_{P}}(x)=\underset{x}{\mathop{\min }}\,\underset{\alpha ,\beta :{{\alpha }_{i}}\ge 0}{\mathop{\max }}\,L(x,\alpha ,\beta )\end{align}$

Formula (1) is called the primal problem; it is equivalent to the original constrained problem.

The dual problem is obtained by exchanging the order of min and max in the primal problem (1):

$\begin{align}\underset{\alpha ,\beta :{{\alpha }_{i}}\ge 0}{\mathop{\max }}\,{{\theta }_{D}}(\alpha ,\beta )=\underset{\alpha ,\beta :{{\alpha }_{i}}\ge 0}{\mathop{\max }}\,\underset{x}{\mathop{\min }}\,L(x,\alpha ,\beta )\end{align}$

where ${{\theta }_{D}}(\alpha ,\beta )=\underset{x}{\mathop{\min }}\,L(x,\alpha ,\beta )$.
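For the running example, ${{\theta }_{D}}(\alpha )=\underset{x}{\mathop{\min }}\,{{x}^{2}}+\alpha (1-x)$; setting the derivative $2x-\alpha $ to zero gives $x=\alpha /2$, so ${{\theta }_{D}}(\alpha )=\alpha -{{\alpha }^{2}}/4$. The dual problem $\underset{\alpha \ge 0}{\mathop{\max }}\,\alpha -{{\alpha }^{2}}/4$ is maximized at $\alpha =2$, with optimal value $1$.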

Let ${{p}^{*}}$ denote the optimal value of the primal problem, attained at the optimal point ${{x}^{*}}$, so that ${{p}^{*}}=f({{x}^{*}})$;

let ${{d}^{*}}$ denote the optimal value of the dual problem, attained at the optimal point $({{\alpha }^{*}},{{\beta }^{*}})$, so that ${{d}^{*}}={{\theta }_{D}}({{\alpha }^{*}},{{\beta }^{*}})$.

The following shows that ${{d}^{*}}\le {{p}^{*}}$ (weak duality).

For any $\alpha, \beta $ ($\alpha \ge 0$):

$\begin{align}{{\theta }_{D}}(\alpha ,\beta )&=\underset{x}{\mathop{\min }}\,L(x,\alpha ,\beta ) \\& \le L({{x}^{*}},\alpha ,\beta ) \\& =f({{x}^{*}})+\sum\limits_{i}{{{\alpha }_{i}}{{g}_{i}}({{x}^{*}})}+\sum\limits_{i}{{{\beta }_{i}}{{h}_{i}}({{x}^{*}})} \\& \le f({{x}^{*}}) \\& ={{p}^{*}} \\\end{align}$

The first inequality is immediate from the definition of $\underset{x}{\mathop{\min }}\,L(x,\alpha ,\beta )$. The second inequality holds because ${{x}^{*}}$ is a feasible point, so the constraints ${{g}_{i}}(x)\le 0$ and ${{h}_{i}}(x)=0$ are satisfied, and therefore $\sum\limits_{i}{{{\alpha }_{i}}{{g}_{i}}({{x}^{*}})}\le 0$ and $\sum\limits_{i}{{{\beta }_{i}}{{h}_{i}}({{x}^{*}})}=0$.

Since $\alpha $ and $\beta $ were arbitrary in the derivation above, in particular ${{d}^{*}}={{\theta }_{D}}({{\alpha }^{*}},{{\beta }^{*}})\le {{p}^{*}}$, so the optimal value of the dual problem is a lower bound on the optimal value of the primal problem.
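The running example shows this bound concretely: any feasible multiplier gives a lower bound, e.g. ${{\theta }_{D}}(1)=1-1/4=3/4\le 1={{p}^{*}}$, and the best such bound, $\underset{\alpha \ge 0}{\mathop{\max }}\,{{\theta }_{D}}(\alpha )=1$, is attained at $\alpha =2$.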

Usually the dual problem has a nicer form than the primal (it is said that the dual is a convex optimization problem regardless of the form of the primal, though that is not proved here), so when the primal is hard to solve one can solve the dual instead. The catch is that in general only ${{d}^{*}}\le {{p}^{*}}$ holds, so solving the dual only yields a lower bound on the primal optimal value; ${{d}^{*}}={{p}^{*}}$ is not guaranteed.

When the primal problem satisfies certain conditions, ${{d}^{*}}={{p}^{*}}$ can be guaranteed.

Slater's condition: there exists an $x$ at which the inequality constraints ${{g}_{i}}(x)\le 0$ hold strictly, i.e. ${{g}_{i}}(x)<0$ (while the equality constraints are still satisfied).

When the primal problem is a convex optimization problem and satisfies Slater's condition, ${{d}^{*}}={{p}^{*}}$ holds, so the primal and dual problems have the same optimal value and the dual can be solved in place of the primal. In other words, Slater's condition is a sufficient condition for a convex optimization problem to be equivalent to its dual.
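In the running example, any $x>1$ (say $x=2$, where $1-x=-1<0$) shows that Slater's condition holds; the problem is convex, so ${{d}^{*}}={{p}^{*}}=1$, which matches the direct computation above.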

The KKT conditions are necessary conditions for the primal and dual problems to be equivalent. Consider a general optimization problem (not necessarily convex) and suppose ${{d}^{*}}={{p}^{*}}$:

\[\begin{align}{{d}^{*}}&={{\theta }_{D}}({{\alpha }^{*}},{{\beta }^{*}}) \\& =\underset{x}{\mathop{\min }}\,L(x,{{\alpha }^{*}},{{\beta }^{*}}) \\& \le L({{x}^{*}},{{\alpha }^{*}},{{\beta }^{*}}) \\& =f({{x}^{*}})+\sum\limits_{i}{{{\alpha }_{i}}^{*}{{g}_{i}}({{x}^{*}})}+\sum\limits_{i}{{{\beta }_{i}}^{*}{{h}_{i}}({{x}^{*}})} \\& \le f({{x}^{*}}) \\& ={{p}^{*}} \\\end{align}\]

Because ${{d}^{*}}={{p}^{*}}$, every "$\le $" in the derivation above must actually hold with equality. The first equality gives \[\underset{x}{\mathop{\min }}\,L(x,{{\alpha }^{*}},{{\beta }^{*}})=L({{x}^{*}},{{\alpha }^{*}},{{\beta }^{*}})\], which means ${{x}^{*}}$ is a minimizer (an extreme point) of \[L(x,{{\alpha }^{*}},{{\beta }^{*}})\], so its derivative vanishes there: $\frac{\partial L(x,{{\alpha }^{*}},{{\beta }^{*}})}{\partial x}{{|}_{{{x}^{*}}}}=0$. The second equality gives \[f({{x}^{*}})+\sum\limits_{i}{{{\alpha }_{i}}^{*}{{g}_{i}}({{x}^{*}})}+\sum\limits_{i}{{{\beta }_{i}}^{*}{{h}_{i}}({{x}^{*}})}=f({{x}^{*}})\], hence \[\sum\limits_{i}{{{\beta }_{i}}^{*}{{h}_{i}}({{x}^{*}})}=0\] and \[\sum\limits_{i}{{{\alpha }_{i}}^{*}{{g}_{i}}({{x}^{*}})}=0\]. The former is obvious, since ${{h}_{i}}({{x}^{*}})=0$ already holds; the real content is that \[\sum\limits_{i}{{{\alpha }_{i}}^{*}{{g}_{i}}({{x}^{*}})}\le 0\] now holds with equality. Since each term satisfies ${{\alpha }_{i}}^{*}{{g}_{i}}({{x}^{*}})\le 0$ and the terms sum to zero, every term must be zero, so ${{\alpha }_{i}}^{*}>0\Rightarrow {{g}_{i}}({{x}^{*}})=0$.

Combining these two points with the original constraints yields the KKT conditions:

$\left\{\begin{matrix}\frac{\partial L(x,{{\alpha }^{*}},{{\beta }^{*}})}{\partial x}{{|}_{{{x}^{*}}}}=0 \\\sum\limits_{i}{{{\alpha }_{i}}^{*}{{g}_{i}}({{x}^{*}})}=0 \\{{\alpha }_{i}}^{*}\ge 0 \\\begin{matrix}{{g}_{i}}({{x}^{*}})\le 0 \\{{h}_{i}}({{x}^{*}})=0 \\\end{matrix} \\\end{matrix} \right.$

The KKT conditions are necessary for ${{d}^{*}}={{p}^{*}}$; they describe the properties that the optimal points must satisfy when the primal and dual problems are equivalent.

Although the KKT conditions are only necessary for ${{d}^{*}}={{p}^{*}}$ in general, when the primal problem is convex they become sufficient as well: if ${{x}^{*}},{{\alpha }^{*}},{{\beta }^{*}}$ can be found satisfying the five conditions above, then the primal and dual problems attain the same optimal value, at ${{x}^{*}}$ and $({{\alpha }^{*}},{{\beta }^{*}})$ respectively.
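On the running example (an illustration only), the KKT system reads $2x-\alpha =0$, $\alpha (1-x)=0$, $\alpha \ge 0$, $1-x\le 0$. Taking $\alpha =0$ would force $x=0$, which violates $1-x\le 0$, so the constraint must be active: $x=1$ and $\alpha =2$. Since the problem is convex, this pair is exactly the common optimum of the primal and the dual found earlier.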

In addition, from the KKT condition \[\sum\limits_{i}{{{\alpha }_{i}}^{*}{{g}_{i}}({{x}^{*}})}=0\] it follows that ${{g}_{i}}({{x}^{*}})<0\Rightarrow {{\alpha }_{i}}^{*}=0$; conversely, only when ${{g}_{i}}({{x}^{*}})=0$ can ${{\alpha }_{i}}^{*}$ be nonzero. This is an important property used in SVM.
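Finally, a quick numerical sanity check of strong duality and complementary slackness (a minimal sketch, not part of the derivation above; it assumes the cvxpy package is installed, and the small problem below is made up purely for illustration):

import cvxpy as cp

# Primal problem: minimize x1^2 + x2^2
#   s.t.  1 - x1 - x2 <= 0   (will be active at the optimum)
#         x1 - 10     <= 0   (will be inactive at the optimum)
x = cp.Variable(2)
constraints = [1 - x[0] - x[1] <= 0,
               x[0] - 10 <= 0]
prob = cp.Problem(cp.Minimize(cp.sum_squares(x)), constraints)
prob.solve()

print("p* =", prob.value)  # about 0.5, attained at x* = (0.5, 0.5)
print("x* =", x.value)
# The problem is convex and Slater's condition holds (e.g. x = (5, 5)),
# so d* = p* and the optimal multipliers alpha_i* are available as the
# constraints' dual values.
for i, c in enumerate(constraints):
    print("constraint", i, "alpha* =", float(c.dual_value))
# Expected (up to solver tolerance): alpha_0* = 1 > 0 for the active
# constraint and alpha_1* close to 0 for the slack one, which is exactly
# the complementary-slackness property that SVM relies on.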
