The Lagrange multiplier method and the KKT (Karush-Kuhn-Tucker) conditions are important methods for solving constrained optimization problems: the Lagrange multiplier method is used when there are equality constraints, and the KKT conditions are used when there are inequality constraints. The premise is that only when the objective function is a convex function do these two methods guarantee that the optimal solution is obtained.
For unconstrained optimization problems there are many classical methods; see the article on unconstrained optimization methods.
Lagrange Multiplier Method
First, let's look at what the Lagrange multiplier method is, and then explain why it works.
$\min\;f(x)\\s.t.\;h_{i}(x)=0\;\;\;\;i=1,2,\dots,n$
This problem is converted to
\begin{equation}\min\;[f(x)+\sum_{i=1}^{n}\lambda_{i}h_{i}(x)]\label{lagrange}\end{equation}
where $\lambda_{i}\ne{0}$ is called a Lagrange multiplier.
Let's look at how Wikipedia explains the rationale behind the Lagrange multiplier method.
Consider a two-dimensional optimization problem:
$\min\;f(x,y)\\s.t.\;g(x,y)=c$
We can draw a picture to help us think about it.
The green line marks the trajectory of the points satisfying the constraint $g(x,y)=c$. The blue lines are the contours of $f(x,y)$. The arrows represent the gradients, which are parallel to the normals of the contour lines.
From the diagram you can see intuitively that at the optimal solution the gradients of $f$ and $g$ are parallel:
$\bigtriangledown[f(x,y)+\lambda(g(x,y)-c)]=0\;\;\;\;\lambda\ne{0}$
Once the value of $\lambda$ is calculated, it is easy to find the point corresponding to the unconstrained extremum, and the extremum itself, by plugging it into the formula below.
$F(x,y)=f(x,y)+\lambda(g(x,y)-c)$
The new function $F(x,y)$ is equal to $f(x,y)$ when the extremum is reached, because $g(x,y)-c$ is always equal to zero at any point satisfying the constraint.
\eqref{lagrange} attains its minimum where its derivative is 0, i.e. $\bigtriangledown{f(x)}+\bigtriangledown{\sum_{i=1}^{n}\lambda_{i}h_{i}(x)}=0$, which says that the gradients of $f(x)$ and $h(x)$ are collinear.
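As a quick worked example (my own illustration, not from the original post): minimize $f(x,y)=x^{2}+y^{2}$ subject to $h(x,y)=x+y-1=0$. Form $L(x,y,\lambda)=x^{2}+y^{2}+\lambda(x+y-1)$; setting the partial derivatives to zero gives $2x+\lambda=0$, $2y+\lambda=0$ and $x+y=1$, so $x=y=\frac{1}{2}$ and $\lambda=-1$. At this point $\bigtriangledown{f}=(1,1)$ and $\bigtriangledown{h}=(1,1)$, which are indeed collinear, just as the argument above predicts.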
KKT Conditions
First let's look at what the KKT conditions are, and then explain why they work.
\begin{equation}let\;L(x,\mu)=f(x)+\sum_{k=1}^{q}\mu_{k}g_{k}(x)\end{equation}
where $\mu_{k}\ge{0},\;g_{k}(x)\le{0}$
$\because \left.\begin{matrix}\mu_{k}\ge{0}\\g_{k}(x)\le{0}\end{matrix}\right\}$=>$\mu{g(x)}\le{0}$
$\therefore$ \begin{equation}\max_{\mu}L(x,\mu)=f(x)\label{a}\end{equation}
$\therefore$ \begin{equation}\min_{x}f(x)=\min_{x}\max_{\mu}L(x,\mu)\label{firsthalf}\end{equation}
$\max_{\mu}\min_{x}L(x,\mu)=\max_{\mu}[\min_{x}f(x)+\min_{x}\mu{g(x)}]=\max_{\mu}\min_{x}f(x)+\max_{\mu}\min_{x}\mu{g(x)}=\min_{x}f(x)+\max_{\mu}\min_{x}\mu{g(x)}$
and $\because\left.\begin{matrix}\mu_{k}\ge{0}\\g_{k}(x)\le{0}\end{matrix}\right\}$=>$\min_{x}\mu{g(x)}=\left\{\begin{matrix}0 & if\;\mu=0\;or\;g(x)=0\\-\infty & if\;\mu>0\;and\;g(x)<0\end{matrix}\right.$
$\therefore \max_{\mu}\min_{x}\mu{g(x)}=0$, which is attained when $\mu=0\;or\;g(x)=0$
\begin{equation}\therefore \max_{\mu}\min_{x}L(x,\mu)=\min_{x}f(x)+\max_{\mu}\min_{x}\mu{g(x)}=\min_{x}f(x)\label{secondhalf}\end{equation} again with $\mu=0\;or\;g(x)=0$
Combining \eqref{firsthalf} and \eqref{secondhalf}, we get $\min_{x}\max_{\mu}L(x,\mu)=\max_{\mu}\min_{x}L(x,\mu)$
i.e. $\left.\begin{matrix}L(x,\mu)=f(x)+\sum_{k=1}^{q}\mu_{k}g_{k}(x)\\\mu_{k}\ge{0}\\g_{k}(x)\le{0}\end{matrix}\right\}$=>$\min_{x}\max_{\mu}L(x,\mu)=\max_{\mu}\min_{x}L(x,\mu)=\min_{x}f(x)$
We call $\max_{\mu}\min_{x}L(x,\mu)$ the dual problem of the original problem $\min_{x}\max_{\mu}L(x,\mu)$. The formula above shows that when certain conditions are met, the solutions of the original problem and the dual problem are the same as the solution of $\min_{x}f(x)$, and $\mu=0\;or\;g(x^*)=0$ at the optimal solution $x^*$. Substituting $x^*$ into \eqref{a} gives $\max_{\mu}L(x^*,\mu)=f(x^*)$, and from \eqref{secondhalf} we get $\max_{\mu}\min_{x}L(x,\mu)=f(x^*)$, so $L(x^*,\mu)=\min_{x}L(x,\mu)$, which indicates that $x^*$ is also an extreme point of $L(x,\mu)$, that is $\frac{\partial{L(x,\mu)}}{\partial{x}}|_{x=x^*}=0$.
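To make the equality of the primal and dual values concrete, here is a small example of my own (not from the original post). Take $f(x)=x^{2}$ with the single constraint $g(x)=x-1\le{0}$, so $L(x,\mu)=x^{2}+\mu(x-1)$. The primal value is $\min_{x}\max_{\mu}L(x,\mu)=\min_{x\le{1}}x^{2}=0$ at $x^{*}=0$. On the dual side, $\min_{x}L(x,\mu)=-\frac{\mu^{2}}{4}-\mu$, and maximizing over $\mu\ge{0}$ gives $0$ at $\mu=0$. The two values agree, and $\mu{g(x^{*})}=0$ holds because $\mu=0$ (the constraint is inactive at $x^{*}$).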
Finally, to summarize:
$\left.\begin{matrix}L(x,\mu)=f(x)+\sum_{k=1}^{q}\mu_{k}g_{k}(x)\\\mu_{k}\ge{0}\\g_{k}(x)\le{0}\end{matrix}\right\}$=>$\left\{\begin{matrix}\min_{x}\max_{\mu}L(x,\mu)=\max_{\mu}\min_{x}L(x,\mu)=\min_{x}f(x)=f(x^*)\\\mu_{k}g_{k}(x^*)=0\\\frac{\partial{L(x,\mu)}}{\partial{x}}|_{x=x^*}=0\end{matrix}\right.$
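These conditions can be checked mechanically. Below is a minimal sketch (my own, not from the original post) that uses sympy to enumerate the stationarity and complementary slackness equations for a hypothetical toy problem, then filters the candidates by primal feasibility $g(x)\le 0$ and dual feasibility $\mu\ge 0$:

```python
import sympy as sp

x, mu = sp.symbols("x mu", real=True)

# Hypothetical toy problem (illustration only):
#   minimize   f(x) = (x - 3)^2
#   subject to g(x) = x - 1 <= 0
f = (x - 3) ** 2
g = x - 1
L = f + mu * g                           # L(x, mu) = f(x) + mu * g(x)

stationarity = sp.Eq(sp.diff(L, x), 0)   # dL/dx = 0 at the optimum
comp_slack = sp.Eq(mu * g, 0)            # mu * g(x) = 0 (complementary slackness)

for sol in sp.solve([stationarity, comp_slack], [x, mu], dict=True):
    x_v, mu_v = sol[x], sol[mu]
    print(sol,
          "primal feasible:", bool(g.subs(x, x_v) <= 0),
          "dual feasible:", bool(mu_v >= 0))
# Only the candidate x = 1, mu = 4 passes both checks, so the constrained
# minimum is f(1) = 4 and the constraint is active there (g(x*) = 0).
```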
The KKT conditions generalize the Lagrange multiplier method. If we include both the equality constraints and the inequality constraints, we obtain:
$\left.\begin{matrix}L(x,\lambda,\mu)=f(x)+\sum_{i=1}^{n}\lambda_{i}h_{i}(x)+\sum_{k=1}^{q}\mu_{k}g_{k}(x)\\\lambda_{i}\ne{0}\\h_{i}(x)=0\\\mu_{k}\ge{0}\\g_{k}(x)\le{0}\end{matrix}\right\}$=>$\left\{\begin{matrix}\min_{x}\max_{\mu}L(x,\lambda,\mu)=\max_{\mu}\min_{x}L(x,\lambda,\mu)=\min_{x}f(x)=f(x^*)\\\mu_{k}g_{k}(x^*)=0\\\frac{\partial{L(x,\lambda,\mu)}}{\partial{x}}|_{x=x^*}=0\end{matrix}\right.$
Note: $x, \lambda,\mu$ are vectors.
$\frac{\partial{L(x,\lambda,\mu)}}{\partial{x}}|_{x=x^*}=0$ indicates that the gradient of $f(x)$ at the extreme point $x^*$ is a linear combination of the gradients of the $h_{i}(x^*)$ and $g_{k}(x^*)$.
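In practice you rarely solve the KKT system by hand; a numerical solver does it for you. The sketch below (my own example with a made-up objective and constraints, not from the original post) uses scipy's SLSQP method to handle one equality and one inequality constraint at once; note that scipy's `'ineq'` convention is `fun(x) >= 0`, so a constraint $g(x)\le 0$ is passed as $-g(x)\ge 0$:

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical toy problem (illustration only):
#   minimize   f(x) = x0^2 + x1^2
#   subject to h(x) = x0 + x1 - 1  = 0   (equality constraint)
#              g(x) = x0 - 2      <= 0   (inequality constraint)
f = lambda x: x[0] ** 2 + x[1] ** 2
h = lambda x: x[0] + x[1] - 1.0
g = lambda x: x[0] - 2.0

constraints = [
    {"type": "eq", "fun": h},
    {"type": "ineq", "fun": lambda x: -g(x)},  # -g(x) >= 0  <=>  g(x) <= 0
]

res = minimize(f, x0=np.array([5.0, 5.0]), method="SLSQP", constraints=constraints)
print("x* =", res.x, " f(x*) =", res.fun)      # expect x* ~ [0.5, 0.5], f(x*) ~ 0.5
print("h(x*) =", h(res.x), " g(x*) =", g(res.x))
# g is inactive at the solution (g(x*) = -1.5 < 0), so its multiplier must be 0,
# which is exactly the complementary slackness condition mu_k * g_k(x*) = 0.
```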
Reprinted from: http://www.cnblogs.com/zhangchaoyang/articles/2726873.html (Lagrange Multiplier Method and KKT Conditions)