In the vectorization section of week two, the vectorized form of the gradient descent update was not obvious to me at first, so I later worked through the derivation and recorded it here.
Below are the per-parameter update rules for gradient descent (assuming n = 2 features):
Equation 1:
$\theta_0 := \theta_0 - \alpha \frac{1}{m}\sum_{i=1}^{m} (H_\theta(x^{(i)}) - y^{(i)}) x^{(i)}_0$
$\theta_1 := \theta_1 - \alpha \frac{1}{m}\sum_{i=1}^{m} (H_\theta(x^{(i)}) - y^{(i)}) x^{(i)}_1$
$\theta_2 := \theta_2 - \alpha \frac{1}{m}\sum_{i=1}^{m} (H_\theta(x^{(i)}) - y^{(i)}) x^{(i)}_2$
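For concreteness, here is a minimal NumPy sketch of this component-wise update, assuming the linear-regression hypothesis $H_\theta(x) = \theta^T x$; the names `X`, `y`, `theta`, and `alpha` are illustrative, not from the course materials.

```python
import numpy as np

def gradient_descent_step_loop(X, y, theta, alpha):
    """One gradient descent step, updating each theta_j separately (Equation 1).

    X     : (m, n+1) design matrix whose first column is all ones (x_0 = 1)
    y     : (m,) target values
    theta : (n+1,) current parameters
    alpha : learning rate
    """
    m, n_plus_1 = X.shape
    new_theta = theta.copy()                # keep the old theta for simultaneous updates
    for j in range(n_plus_1):               # one update per parameter theta_j
        grad_j = 0.0
        for i in range(m):                  # sum over all training examples
            h = X[i] @ theta                # H_theta(x^(i)) = theta^T x^(i)
            grad_j += (h - y[i]) * X[i, j]
        new_theta[j] = theta[j] - alpha * grad_j / m
    return new_theta
```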
When the instructor covers this, he mentions that Equation 1 can be vectorized as follows:
Equation 2:
$\theta := \theta - \alpha \delta$
Equation 3:
$\delta = \frac{1}{m}\sum_{i=1}^{m} (H_\theta (x^{(i)})-y^{(i)}) x^{(i)}$
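In code, Equations 2 and 3 collapse to a couple of lines. The sketch below continues the example above, with the same assumed hypothesis and the same illustrative names.

```python
def gradient_descent_step_vectorized(X, y, theta, alpha):
    """One gradient descent step in vectorized form (Equations 2 and 3)."""
    m = len(y)
    errors = X @ theta - y            # (m,) vector of H_theta(x^(i)) - y^(i)
    delta = (X.T @ errors) / m        # (n+1,) vector: (1/m) * sum_i error_i * x^(i)
    return theta - alpha * delta      # theta := theta - alpha * delta
```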
So how do Equations 2 and 3 follow from Equation 1? Here is my derivation:
(1) First, stack the three updates of Equation 1 into a single vector equation:
$\begin{pmatrix} \theta_0 \\ \theta_1 \\ \theta_2 \end{pmatrix} := \begin{pmatrix} \theta_0 - \alpha \frac{1}{m} \sum_{i=1}^{m} (H_\theta(x^{(i)}) - y^{(i)}) x_0^{(i)} \\ \theta_1 - \alpha \frac{1}{m} \sum_{i=1}^{m} (H_\theta(x^{(i)}) - y^{(i)}) x_1^{(i)} \\ \theta_2 - \alpha \frac{1}{m} \sum_{i=1}^{m} (H_\theta(x^{(i)}) - y^{(i)}) x_2^{(i)} \end{pmatrix}$
(2) By the rules of matrix subtraction and scalar multiplication, the right-hand side can be split as follows:
$\begin{pmatrix} \theta_0 \\ \theta_1 \\ \theta_2 \end{pmatrix} := \begin{pmatrix} \theta_0 \\ \theta_1 \\ \theta_2 \end{pmatrix} - \alpha \frac{1}{m} \begin{pmatrix} \sum_{i=1}^{m} (H_\theta(x^{(i)}) - y^{(i)}) x_0^{(i)} \\ \sum_{i=1}^{m} (H_\theta(x^{(i)}) - y^{(i)}) x_1^{(i)} \\ \sum_{i=1}^{m} (H_\theta(x^{(i)}) - y^{(i)}) x_2^{(i)} \end{pmatrix}$
(3) The parameter vectors on both sides of the equation are simply $\theta$, so the key is the part after the minus sign, which we can expand:
$\theta := \theta - \alpha \frac{1}{m} \begin{pmatrix} (H_\theta(x^{(1)}) - y^{(1)}) x_0^{(1)} + (H_\theta(x^{(2)}) - y^{(2)}) x_0^{(2)} + \cdots + (H_\theta(x^{(m)}) - y^{(m)}) x_0^{(m)} \\ (H_\theta(x^{(1)}) - y^{(1)}) x_1^{(1)} + (H_\theta(x^{(2)}) - y^{(2)}) x_1^{(2)} + \cdots + (H_\theta(x^{(m)}) - y^{(m)}) x_1^{(m)} \\ (H_\theta(x^{(1)}) - y^{(1)}) x_2^{(1)} + (H_\theta(x^{(2)}) - y^{(2)}) x_2^{(2)} + \cdots + (H_\theta(x^{(m)}) - y^{(m)}) x_2^{(m)} \end{pmatrix}$
(4) By the rule of matrix addition, the expanded matrix can be split into the following m parts:
$\theta := \theta - \alpha \frac{1}{m} \left[ \begin{pmatrix} (H_\theta(x^{(1)}) - y^{(1)}) x_0^{(1)} \\ (H_\theta(x^{(1)}) - y^{(1)}) x_1^{(1)} \\ (H_\theta(x^{(1)}) - y^{(1)}) x_2^{(1)} \end{pmatrix} + \begin{pmatrix} (H_\theta(x^{(2)}) - y^{(2)}) x_0^{(2)} \\ (H_\theta(x^{(2)}) - y^{(2)}) x_1^{(2)} \\ (H_\theta(x^{(2)}) - y^{(2)}) x_2^{(2)} \end{pmatrix} + \cdots + \begin{pmatrix} (H_\theta(x^{(m)}) - y^{(m)}) x_0^{(m)} \\ (H_\theta(x^{(m)}) - y^{(m)}) x_1^{(m)} \\ (H_\theta(x^{(m)}) - y^{(m)}) x_2^{(m)} \end{pmatrix} \right]$
(5) Factor the common scalar out of each matrix:
$\theta := \theta - \alpha \frac{1}{m} \left[ (H_\theta(x^{(1)}) - y^{(1)}) \begin{pmatrix} x_0^{(1)} \\ x_1^{(1)} \\ x_2^{(1)} \end{pmatrix} + (H_\theta(x^{(2)}) - y^{(2)}) \begin{pmatrix} x_0^{(2)} \\ x_1^{(2)} \\ x_2^{(2)} \end{pmatrix} + \cdots + (H_\theta(x^{(m)}) - y^{(m)}) \begin{pmatrix} x_0^{(m)} \\ x_1^{(m)} \\ x_2^{(m)} \end{pmatrix} \right]$
(6) Each of these column matrices is exactly the feature vector $x^{(i)}$, so:
$\theta := \theta - \alpha \frac{1}{m} \left[ (H_\theta(x^{(1)}) - y^{(1)}) x^{(1)} + (H_\theta(x^{(2)}) - y^{(2)}) x^{(2)} + \cdots + (H_\theta(x^{(m)}) - y^{(m)}) x^{(m)} \right]$
(7) Writing this as a summation ($\Sigma$) gives exactly Equations 2 and 3:
$\theta := \theta - \alpha \frac{1}{m} \sum_{i=1}^{m} (H_\theta(x^{(i)}) - y^{(i)}) x^{(i)}$
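A quick numerical check with made-up data, reusing the two sketch functions above, confirms that the looped and vectorized updates agree:

```python
rng = np.random.default_rng(0)
m, n = 5, 2
X = np.hstack([np.ones((m, 1)), rng.normal(size=(m, n))])  # prepend the x_0 = 1 column
y = rng.normal(size=m)
theta = np.zeros(n + 1)

looped = gradient_descent_step_loop(X, y, theta, alpha=0.1)
vectorized = gradient_descent_step_vectorized(X, y, theta, alpha=0.1)
print(np.allclose(looped, vectorized))  # True: both compute the same update
```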