Reference: openclassroom
Linear Regression
To fit the relationship between the age ($x_1$) and the height ($y$) of children under 10 years old, we assume a hypothesis function $h(x)$:
$$h(x) = \theta_0 + \theta_1 x_1 = \theta_0 x_0 + \theta_1 x_1 = \theta^T x \quad (x_0 = 1,\ x = [x_0, x_1])$$
Our goal is to find $\theta$ such that $h(x)$ is close to $y$.
Therefore, we need to minimize the squared error between $h(x)$ and $y$ over the $m$ training samples $(x^{(i)}, y^{(i)})$.
That is, we minimize
$$J(\theta) = \frac{1}{2m} \sum_{i=1}^{m} \left( h(x^{(i)}) - y^{(i)} \right)^2$$
The 2 in the denominator is there to cancel the factor of 2 produced by the square term during differentiation.
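As a concrete illustration, here is a minimal NumPy sketch of $h(x)$ and $J(\theta)$. The ages and heights below are made-up toy values, not data from the source.

```python
import numpy as np

def cost(theta, X, y):
    """J(theta) = 1/(2m) * sum_i (h(x^(i)) - y^(i))^2."""
    m = len(y)
    residuals = X @ theta - y          # h(x^(i)) - y^(i) for every sample
    return residuals @ residuals / (2 * m)

# Made-up toy data: x0 = 1 in the first column, age in the second.
ages = np.array([2.0, 4.0, 6.0, 8.0])
X = np.column_stack([np.ones_like(ages), ages])
y = np.array([0.90, 1.05, 1.15, 1.28])   # heights in meters

print(cost(np.zeros(2), X, y))           # cost at theta = [0, 0]
```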
Solution 1: Gradient Descent
$\theta$ moves in the negative gradient direction of $J(\theta)$ (i.e., along the partial derivatives of $J(\theta)$ with respect to $\theta$) until $J(\theta)$ reaches its minimum. For linear regression, $J(\theta)$ is a bowl-shaped convex function, so the local minimum is also the global minimum.
$\alpha$ is the step size (learning rate). $\alpha$ does not need to be decreased over time, because the partial derivative of $J(\theta)$ with respect to $\theta$ itself shrinks as $\theta$ approaches the minimum, so the update steps shrink automatically.
Repeat the following two update rules until convergence.
Note: the two updates must be performed simultaneously, using the current values of $\theta_0$ and $\theta_1$ on the right-hand side, rather than updating one and then using its new value to compute the other.
$$\theta_0 := \theta_0 - \frac{\alpha}{m} \sum_{i=1}^{m} \left( h(x^{(i)}) - y^{(i)} \right) x_0^{(i)}$$
$$\theta_1 := \theta_1 - \frac{\alpha}{m} \sum_{i=1}^{m} \left( h(x^{(i)}) - y^{(i)} \right) x_1^{(i)}$$
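Below is a minimal NumPy sketch of this procedure on the same made-up toy data as above. The vectorized update computes the gradient from the current $\theta$ before either component changes, which is exactly the simultaneous update required by the note.

```python
import numpy as np

def gradient_descent(X, y, alpha=0.01, iters=10000):
    """Batch gradient descent for linear regression.

    The gradient is computed from the current theta before either
    component is changed, so theta_0 and theta_1 update simultaneously.
    """
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(iters):
        # (1/m) * sum_i (h(x^(i)) - y^(i)) * x_j^(i), for all j at once
        gradient = X.T @ (X @ theta - y) / m
        theta = theta - alpha * gradient
    return theta

# Toy ages/heights again; x0 = 1 is the first column.
ages = np.array([2.0, 4.0, 6.0, 8.0])
X = np.column_stack([np.ones_like(ages), ages])
y = np.array([0.90, 1.05, 1.15, 1.28])
print(gradient_descent(X, y))   # approx [theta0, theta1]
```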
Solution 2: Normal Equations
Set the partial derivatives of $J(\theta)$ with respect to $\theta$ to 0, and solve the resulting system of equations:
$$\theta = (X^T X)^{-1} X^T Y$$
where the rows of $X$ are the samples $x^{(i)}$ and the elements of $Y$ are the targets $y^{(i)}$.
Note: $(X^T X)^{-1}$ does not always exist.
Case 1: each sample $x^{(i)}$ has dimension $n$. When $m < n$ (fewer samples than features), the $n \times n$ matrix $X^T X$ has rank at most $m$, so it is a singular matrix with no inverse.
Case 2: when the features of $x^{(i)}$ are linearly dependent, i.e., when the columns of $X$ are linearly dependent, $X^T X$ is again not of full rank and is a singular matrix with no inverse.
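Here is a closed-form sketch in NumPy, again on the toy data. `np.linalg.solve` avoids forming the inverse explicitly, and `np.linalg.lstsq` is shown as a fallback that still returns a least-squares solution (via the pseudo-inverse) when $X^T X$ is singular, as in the two cases above.

```python
import numpy as np

ages = np.array([2.0, 4.0, 6.0, 8.0])
X = np.column_stack([np.ones_like(ages), ages])   # rows are the x^(i)
y = np.array([0.90, 1.05, 1.15, 1.28])

# theta = (X^T X)^{-1} X^T y, solved without forming the inverse explicitly.
theta = np.linalg.solve(X.T @ X, X.T @ y)
print(theta)

# When X^T X is singular (cases 1 and 2 above), lstsq still returns a
# least-squares solution based on the pseudo-inverse.
theta_ls, *_ = np.linalg.lstsq(X, y, rcond=None)
print(theta_ls)
```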