Deriving the vectorized form of gradient descent

In the vectorization section of week two, the vectorized form of gradient descent was not obvious to me at first, so I later worked through the derivation and recorded it here.

The following are the per-parameter update rules for gradient descent (assuming n = 2, i.e. features $x_1, x_2$ plus the bias term $x_0$):

Equation 1:

$\theta_0 := \theta_0 - \alpha \frac{1}{m}\sum_{i=1}^{m} (H_\theta(x^{(i)})-y^{(i)})\, x^{(i)}_0$

$\theta_1 := \theta_1 - \alpha \frac{1}{m}\sum_{i=1}^{m} (H_\theta(x^{(i)})-y^{(i)})\, x^{(i)}_1$

$\theta_2 := \theta_2 - \alpha \frac{1}{m}\sum_{i=1}^{m} (H_\theta(x^{(i)})-y^{(i)})\, x^{(i)}_2$
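Assuming a linear hypothesis $H_\theta(x) = \theta^T x$, these per-parameter updates can be sketched directly as a loop. This is an illustrative sketch, not code from the course; the function name and arguments are my own:

```python
import numpy as np

def step_elementwise(theta, X, y, alpha):
    """One gradient-descent step, updating each theta_j separately (Equation 1).

    Assumes a linear hypothesis H_theta(x) = theta^T x, with X holding one
    example x^(i) per row (first column x_0 = 1 for the bias term).
    """
    m, n = X.shape
    h = X @ theta                 # H_theta(x^(i)) for every example at once
    new_theta = theta.copy()      # simultaneous update: every j uses the old theta
    for j in range(n):
        grad_j = np.sum((h - y) * X[:, j]) / m
        new_theta[j] = theta[j] - alpha * grad_j
    return new_theta
```

Note the `theta.copy()`: all parameters must be updated from the *old* $\theta$, which is exactly what the stacked vector form below makes automatic.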

When the teacher covers this point, he mentions that the updates above can be vectorized as follows:

Equation 2:

$\theta := \theta - \alpha \delta$

Equation 3:

$\delta = \frac{1}{m}\sum_{i=1}^{m} (H_\theta(x^{(i)})-y^{(i)})\, x^{(i)}$
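For a linear hypothesis, Equation 3 collapses to the single matrix product $\delta = \frac{1}{m}X^T(X\theta - y)$. A quick numerical sanity check of that equivalence (function names and the random test data are my own, purely illustrative):

```python
import numpy as np

def delta_explicit(theta, X, y):
    """Equation 3 as a literal sum over the m training examples."""
    m = X.shape[0]
    h = X @ theta                                   # H_theta(x^(i)) for all i
    return sum((h[i] - y[i]) * X[i] for i in range(m)) / m

def delta_vectorized(theta, X, y):
    """The same quantity as one matrix product: (1/m) * X^T (X theta - y)."""
    m = X.shape[0]
    return X.T @ (X @ theta - y) / m
```

Both functions return the same vector for any $\theta$, X, y; the steps below establish this symbolically.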

So how does this result come about? Here is my derivation:

(1) First, stack the three updates of Equation 1 into a single vector equation:

$\begin{pmatrix} \theta_0 \\ \theta_1 \\ \theta_2 \end{pmatrix} := \begin{pmatrix} \theta_0 - \alpha \frac{1}{m} \sum_{i=1}^{m} (H_\theta(x^{(i)})-y^{(i)})\, x_0^{(i)} \\ \theta_1 - \alpha \frac{1}{m} \sum_{i=1}^{m} (H_\theta(x^{(i)})-y^{(i)})\, x_1^{(i)} \\ \theta_2 - \alpha \frac{1}{m} \sum_{i=1}^{m} (H_\theta(x^{(i)})-y^{(i)})\, x_2^{(i)} \end{pmatrix}$

(2) By the rules of matrix subtraction and scalar multiplication, the right-hand side splits as:

$\begin{pmatrix} \theta_0 \\ \theta_1 \\ \theta_2 \end{pmatrix} := \begin{pmatrix} \theta_0 \\ \theta_1 \\ \theta_2 \end{pmatrix} - \alpha \frac{1}{m} \begin{pmatrix} \sum_{i=1}^{m} (H_\theta(x^{(i)})-y^{(i)})\, x_0^{(i)} \\ \sum_{i=1}^{m} (H_\theta(x^{(i)})-y^{(i)})\, x_1^{(i)} \\ \sum_{i=1}^{m} (H_\theta(x^{(i)})-y^{(i)})\, x_2^{(i)} \end{pmatrix}$

(3) The parameter vectors on both sides of the assignment abbreviate to $\theta$; the key is the part after the minus sign, which we can expand:

$\theta := \theta - \alpha \frac{1}{m} \begin{pmatrix} (H_\theta(x^{(1)})-y^{(1)})\, x_0^{(1)} + (H_\theta(x^{(2)})-y^{(2)})\, x_0^{(2)} + \dots + (H_\theta(x^{(m)})-y^{(m)})\, x_0^{(m)} \\ (H_\theta(x^{(1)})-y^{(1)})\, x_1^{(1)} + (H_\theta(x^{(2)})-y^{(2)})\, x_1^{(2)} + \dots + (H_\theta(x^{(m)})-y^{(m)})\, x_1^{(m)} \\ (H_\theta(x^{(1)})-y^{(1)})\, x_2^{(1)} + (H_\theta(x^{(2)})-y^{(2)})\, x_2^{(2)} + \dots + (H_\theta(x^{(m)})-y^{(m)})\, x_2^{(m)} \end{pmatrix}$

(4) By the rule of matrix addition, the expanded matrix splits into a sum of m parts:

$\theta := \theta - \alpha \frac{1}{m} \left[ \begin{pmatrix} (H_\theta(x^{(1)})-y^{(1)})\, x_0^{(1)} \\ (H_\theta(x^{(1)})-y^{(1)})\, x_1^{(1)} \\ (H_\theta(x^{(1)})-y^{(1)})\, x_2^{(1)} \end{pmatrix} + \begin{pmatrix} (H_\theta(x^{(2)})-y^{(2)})\, x_0^{(2)} \\ (H_\theta(x^{(2)})-y^{(2)})\, x_1^{(2)} \\ (H_\theta(x^{(2)})-y^{(2)})\, x_2^{(2)} \end{pmatrix} + \dots + \begin{pmatrix} (H_\theta(x^{(m)})-y^{(m)})\, x_0^{(m)} \\ (H_\theta(x^{(m)})-y^{(m)})\, x_1^{(m)} \\ (H_\theta(x^{(m)})-y^{(m)})\, x_2^{(m)} \end{pmatrix} \right]$

(5) Factor the common scalar $(H_\theta(x^{(i)})-y^{(i)})$ out of each matrix:

$\theta := \theta - \alpha \frac{1}{m} \left[ (H_\theta(x^{(1)})-y^{(1)}) \begin{pmatrix} x_0^{(1)} \\ x_1^{(1)} \\ x_2^{(1)} \end{pmatrix} + (H_\theta(x^{(2)})-y^{(2)}) \begin{pmatrix} x_0^{(2)} \\ x_1^{(2)} \\ x_2^{(2)} \end{pmatrix} + \dots + (H_\theta(x^{(m)})-y^{(m)}) \begin{pmatrix} x_0^{(m)} \\ x_1^{(m)} \\ x_2^{(m)} \end{pmatrix} \right]$

(6) Notice that each remaining column matrix is just the feature vector $x^{(i)}$:

$\theta := \theta - \alpha \frac{1}{m} \left[ (H_\theta(x^{(1)})-y^{(1)})\, x^{(1)} + (H_\theta(x^{(2)})-y^{(2)})\, x^{(2)} + \dots + (H_\theta(x^{(m)})-y^{(m)})\, x^{(m)} \right]$

(7) Finally, write the sum of m terms in $\Sigma$ notation:

$\theta := \theta - \alpha \frac{1}{m} \sum_{i=1}^{m} (H_\theta(x^{(i)})-y^{(i)})\, x^{(i)}$

which is exactly the vectorized update of Equations 2 and 3.
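The final formula drops straight into code. Below is a minimal batch gradient-descent loop built on the vectorized update, assuming the linear hypothesis $H_\theta(x) = \theta^T x$; the function name and default hyperparameters are illustrative choices of mine, not from the course:

```python
import numpy as np

def gradient_descent(X, y, alpha=0.1, iters=5000):
    """Repeat theta := theta - alpha * (1/m) * X^T (X theta - y)  (Equations 2-3).

    X holds one training example per row, with the first column set to 1
    for the bias term x_0.
    """
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(iters):
        delta = X.T @ (X @ theta - y) / m   # Equation 3 in matrix form
        theta = theta - alpha * delta       # Equation 2
    return theta
```

On data generated by an exact linear rule, $\theta$ converges to the true parameters for a small enough $\alpha$, with no per-parameter loop anywhere in sight.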
