Supplement: Gradient Optimization
Extensions:
- The relationships among several concepts in machine learning
Successive approximation method
Problem 1: \(Ax = b\)
For problem 1, when the order of \(A\) is large and the system is a large sparse one with many zero elements, solving it by Gaussian elimination with pivoting is a great challenge. For this reason the successive approximation method (or iterative method) came into being; see the iterative method reference for details. (The conjugate gradient method, for example, is a good iterative method.)
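As a minimal sketch of this use case (assuming SciPy is available; the tridiagonal test system here is made up for illustration), the conjugate gradient method handles a sparse system that would be costly to solve by elimination:

```python
import numpy as np
from scipy.sparse import diags
from scipy.sparse.linalg import cg

# A large sparse symmetric positive definite system Ax = b: almost all
# entries of A are zero, the case where elimination is expensive but
# iteration is cheap.
n = 10000
A = diags([-1.0, 2.0, -1.0], offsets=[-1, 0, 1], shape=(n, n), format='csr')
b = np.ones(n)

x, info = cg(A, b)                      # info == 0 means cg converged
print(info, np.linalg.norm(A @ x - b))  # residual norm should be small
```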
Let's take a look at how the iterative method works concretely:
First rewrite \(Ax = b\) as \(x = Bx + f\), then iterate using the formula:
\[x^{k+1} = Bx^k + f\]
where \(k\) is the iteration count \((k = 0, 1, 2, \cdots)\).
This method of gradually approaching the solution is called the iterative method.
If \(\underset{k \to \infty}{\lim} x^k\) exists (denoted \(x^*\)), the iterative method is said to converge; clearly \(x^*\) is then a solution of the system. Otherwise the iterative method is said to diverge.
To study the convergence of \(\{x^k\}\), introduce the error vector:
\[\varepsilon^{k} = x^{k} - x^*\]
Substituting the iteration formula gives
\[\varepsilon^{k+1} = (Bx^k + f) - (Bx^* + f) = B\varepsilon^k, \quad \text{hence} \quad \varepsilon^{k} = B^{k}\varepsilon^0\]
Therefore, to study the convergence of \(\{x^k\}\), we only need to study the conditions under which \(\underset{k \to \infty}{\lim} \varepsilon^k = 0\), or equivalently \(\underset{k \to \infty}{\lim} B^k = 0\), holds.
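As a quick numerical check of this condition (a minimal sketch; \(B\) here is the iteration matrix used in the code below), the iteration converges for every starting point exactly when the spectral radius of \(B\), the largest eigenvalue magnitude, is less than 1:

```python
import numpy as np

# Spectral radius check: x^{k+1} = B x^k + f converges for every x^0
# iff max |eigenvalue of B| < 1 (then B^k -> 0 as k -> infinity).
B = np.array([[0.0, 3.0 / 8.0, -2.0 / 8.0],
              [-4.0 / 11.0, 0.0, 1.0 / 11.0],
              [-6.0 / 12.0, -3.0 / 12.0, 0.0]])
rho = max(abs(np.linalg.eigvals(B)))
print(rho, rho < 1)  # prints a value below 1, so the iteration converges
```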
Now let's implement and run the iteration in Python:
```python
import numpy as np
%pylab inline
```

```
Populating the interactive namespace from numpy and matplotlib
```
First, solve the system \(Ax = b\) by the elimination method:
```python
A = mat([[8, -3, 2], [4, 11, -1], [6, 3, 12]])
b = mat([20, 33, 36])
result = linalg.solve(A, b.T)
print(result)
```
```
[[3.]
 [2.]
 [1.]]
```
Now solve the original system by iteration: \(x^{k+1} = Bx^k + f\)
```python
B = mat([[0.0, 3.0 / 8.0, -2.0 / 8.0],
         [-4.0 / 11.0, 0.0, 1.0 / 11.0],
         [-6.0 / 12.0, -3.0 / 12.0, 0.0]])
m, n = shape(B)
f = mat([[20.0 / 8.0], [33.0 / 11.0], [36.0 / 12.0]])

error = 1.0e-7      # error threshold
steps = 100         # maximum number of iterations
xk = zeros((n, 1))  # initialize xk = x0
errorlist = []      # record the error of each successive approximation
for k in range(steps):  # main loop
    xk_1 = xk           # previous xk
    xk = B * xk + f     # current xk
    errorlist.append(linalg.norm(xk - xk_1))  # compute and store the error
    if errorlist[-1] < error:  # is the error below the threshold?
        print(k + 1)           # print the number of iterations
        break
print(xk)  # print the result
```
```
18
[[2.99999998]
 [2.00000003]
 [1.00000003]]
```
Plot the error convergence as a scatter plot:
```python
def drawScatter(plt, mydata, size=20, color='blue', mrkr='o'):
    # Scatter-plot mydata, transposing first if points are stored as rows
    m, n = shape(mydata)
    if m > n and m > 2:
        plt.scatter(mydata.T[0], mydata.T[1], s=size, c=color, marker=mrkr)
    else:
        plt.scatter(mydata[0], mydata[1], s=size, c=color, marker=mrkr)

matpts = zeros((2, k + 1))           # row 0: iteration index, row 1: error
matpts[0] = linspace(1, k + 1, k + 1)
matpts[1] = array(errorlist)
drawScatter(plt, matpts)
plt.show()
```
From the plot, it can be seen that the error converges very quickly: by about the fourth iteration it is already close to the final result, and the later iterations only fine-tune it.
The existence of a solution is judged by the convergence of the error: as long as the error converges, the system has a solution. When the objective function is nonlinear, however, in order to converge faster we need to find the direction of fastest descent, which is the gradient direction.
Gradient Descent
Reference: Gradient Descent method
Assume the objective function is convex; in optimization terms the problem is written as:
\[\underset{x}{\arg\min}\; f(x), \quad x \in \mathbb{R}^n\]
If \(f(x)\) is differentiable at \(x_0\), then \(\nabla f(x_0)\) is the direction in which \(f\) changes fastest at \(x_0\).
To find the minimum of \(f(x)\), pick any initial point \(x_0\) and move from \(x_0\) along the negative gradient direction, which makes \(f(x)\) decrease the fastest. Introducing a new parameter \(\rho_k\), called the step size, we have
\[x_{k+1} = x_k - \rho_k \frac{\nabla f(x_k)}{\|\nabla f(x_k)\|}\]
See the gradient-related code for details.
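As an illustration, here is a minimal sketch of the update rule above applied to a simple convex quadratic (the function \(f\), the step-size schedule \(\rho_k\), and the tolerance are illustrative choices, not part of the original):

```python
import numpy as np

# Minimize f(x) = 1/2 x^T A x - b^T x, whose gradient is A x - b.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 1.0])

def grad_f(x):
    return A @ x - b                # gradient of the quadratic objective

x = np.zeros(2)                     # arbitrary initial point x_0
for k in range(1000):
    g = grad_f(x)
    gnorm = np.linalg.norm(g)
    if gnorm < 1e-6:                # near a stationary point: stop
        break
    rho_k = 1.0 / (k + 1)           # diminishing step size rho_k
    x = x - rho_k * g / gnorm       # step along the normalized negative gradient

print(x)                            # approximate minimizer
print(np.linalg.solve(A, b))        # exact minimizer, for comparison
```

Because the gradient is normalized, a fixed step size would keep overshooting near the minimum; the diminishing schedule \(\rho_k = 1/(k+1)\) lets the iterates settle down.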