Supplement: Gradient Optimization
Extensions:
- The relationships among several concepts in machine learning
Successive approximation method
Problem 1: \(Ax = b\)
For problem 1, when the order of \(A\) is large and the system is a large sparse one with many zero elements, solving it by Gaussian elimination with pivoting is a great challenge. For this reason the successive approximation method (or iterative method) came into being; see the iterative method reference for details. (The conjugate gradient method, for example, is a good iterative method.)
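As a minimal sketch of this use case (assuming SciPy is available; the tridiagonal test system here is made up for illustration), the conjugate gradient method handles a sparse system that would be costly to solve by elimination:

```python
import numpy as np
from scipy.sparse import diags
from scipy.sparse.linalg import cg

# A large sparse symmetric positive definite system Ax = b: almost all
# entries of A are zero, the case where elimination is expensive but
# iteration is cheap.
n = 10000
A = diags([-1.0, 2.0, -1.0], offsets=[-1, 0, 1], shape=(n, n), format='csr')
b = np.ones(n)

x, info = cg(A, b)                      # info == 0 means cg converged
print(info, np.linalg.norm(A @ x - b))  # residual norm should be small
```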
Let's take a look at how the iterative method works concretely:
First rewrite \(Ax = b\) as \(x = Bx + f\), then iterate using the formula:
\[x^{k+1} = Bx^k + f\]
where \(k\) is the iteration count \((k = 0, 1, 2, \cdots)\).
This method of gradually approaching the solution is called the iterative method.
If \(\underset{k \to \infty}{\lim} x^k\) exists (denoted \(x^*\)), the iterative method is said to converge; clearly \(x^*\) is then a solution of the system. Otherwise the iterative method is said to diverge.
To study the convergence of \(\{x^k\}\), introduce the error vector:
\[\varepsilon^{k} = x^{k} - x^*\]
Substituting the iteration formula gives
\[\varepsilon^{k+1} = (Bx^k + f) - (Bx^* + f) = B\varepsilon^k, \quad \text{hence} \quad \varepsilon^{k} = B^{k}\varepsilon^0\]
Therefore, to study the convergence of \(\{x^k\}\), we only need to study the conditions under which \(\underset{k \to \infty}{\lim} \varepsilon^k = 0\), or equivalently \(\underset{k \to \infty}{\lim} B^k = 0\), holds.
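As a quick numerical check of this condition (a minimal sketch; \(B\) here is the iteration matrix used in the code below), the iteration converges for every starting point exactly when the spectral radius of \(B\), the largest eigenvalue magnitude, is less than 1:

```python
import numpy as np

# Spectral radius check: x^{k+1} = B x^k + f converges for every x^0
# iff max |eigenvalue of B| < 1 (then B^k -> 0 as k -> infinity).
B = np.array([[0.0, 3.0 / 8.0, -2.0 / 8.0],
              [-4.0 / 11.0, 0.0, 1.0 / 11.0],
              [-6.0 / 12.0, -3.0 / 12.0, 0.0]])
rho = max(abs(np.linalg.eigvals(B)))
print(rho, rho < 1)  # prints a value below 1, so the iteration converges
```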
Now let's implement and run the iteration in Python:
```python
import numpy as np
%pylab inline
```

```
Populating the interactive namespace from numpy and matplotlib
```
First, solve the system \(Ax = b\) by the elimination method:
```python
A = mat([[8, -3, 2], [4, 11, -1], [6, 3, 12]])
b = mat([20, 33, 36])
result = linalg.solve(A, b.T)
print(result)
```
```
[[3.]
 [2.]
 [1.]]
```
Now solve the original system by iteration: \(x^{k+1} = Bx^k + f\)
```python
B = mat([[0.0, 3.0 / 8.0, -2.0 / 8.0],
         [-4.0 / 11.0, 0.0, 1.0 / 11.0],
         [-6.0 / 12.0, -3.0 / 12.0, 0.0]])
m, n = shape(B)
f = mat([[20.0 / 8.0], [33.0 / 11.0], [36.0 / 12.0]])

error = 1.0e-7      # error threshold
steps = 100         # maximum number of iterations
xk = zeros((n, 1))  # initialize xk = x0
errorlist = []      # record the error of each successive approximation
for k in range(steps):  # main loop
    xk_1 = xk           # previous xk
    xk = B * xk + f     # current xk
    errorlist.append(linalg.norm(xk - xk_1))  # compute and store the error
    if errorlist[-1] < error:  # is the error below the threshold?
        print(k + 1)           # print the number of iterations
        break
print(xk)  # print the result
```
```
18
[[2.99999998]
 [2.00000003]
 [1.00000003]]
```
Plot the error convergence as a scatter plot:
```python
def drawScatter(plt, mydata, size=20, color='blue', mrkr='o'):
    # Scatter-plot mydata, transposing first if points are stored as rows
    m, n = shape(mydata)
    if m > n and m > 2:
        plt.scatter(mydata.T[0], mydata.T[1], s=size, c=color, marker=mrkr)
    else:
        plt.scatter(mydata[0], mydata[1], s=size, c=color, marker=mrkr)

matpts = zeros((2, k + 1))           # row 0: iteration index, row 1: error
matpts[0] = linspace(1, k + 1, k + 1)
matpts[1] = array(errorlist)
drawScatter(plt, matpts)
plt.show()
```
From the plot, it can be seen that the error converges very quickly: by about the fourth iteration it is already close to the final result, and the later iterations only fine-tune it.
The existence of a solution is judged by the convergence of the error: as long as the error converges, the system has a solution. When the objective function is nonlinear, however, in order to converge faster we need to find the direction of fastest descent, which is the gradient direction.
Gradient Descent
Reference: Gradient Descent method
Assume the objective function is convex; in optimization terms the problem is written as:
\[\underset{x}{\arg\min}\; f(x), \quad x \in \mathbb{R}^n\]
If \(f(x)\) is differentiable at \(x_0\), then \(\nabla f(x_0)\) is the direction in which \(f\) changes fastest at \(x_0\).
To find the minimum of \(f(x)\), pick any initial point \(x_0\) and move from \(x_0\) along the negative gradient direction, which makes \(f(x)\) decrease the fastest. Introducing a new parameter \(\rho_k\), called the step size, we have
\[x_{k+1} = x_k - \rho_k \frac{\nabla f(x_k)}{\|\nabla f(x_k)\|}\]
See the gradient-related code for details.
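As an illustration, here is a minimal sketch of the update rule above applied to a simple convex quadratic (the function \(f\), the step-size schedule \(\rho_k\), and the tolerance are illustrative choices, not part of the original):

```python
import numpy as np

# Minimize f(x) = 1/2 x^T A x - b^T x, whose gradient is A x - b.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 1.0])

def grad_f(x):
    return A @ x - b                # gradient of the quadratic objective

x = np.zeros(2)                     # arbitrary initial point x_0
for k in range(1000):
    g = grad_f(x)
    gnorm = np.linalg.norm(g)
    if gnorm < 1e-6:                # near a stationary point: stop
        break
    rho_k = 1.0 / (k + 1)           # diminishing step size rho_k
    x = x - rho_k * g / gnorm       # step along the normalized negative gradient

print(x)                            # approximate minimizer
print(np.linalg.solve(A, b))        # exact minimizer, for comparison
```

Because the gradient is normalized, a fixed step size would keep overshooting near the minimum; the diminishing schedule \(\rho_k = 1/(k+1)\) lets the iterates settle down.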