1. What is linear regression?
Linear regression fits the relationship between input and output with a linear function.
With a single input x, the output is modeled as y = ax + b.
With multiple inputs, the model becomes y = b + a1x1 + a2x2 + ... + anxn.
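Writing the coefficients as θ, the model (the hypothesis) becomes:
$$h_\theta(x) = \theta_0 x_0 + \theta_1 x_1 + \dots + \theta_n x_n = \theta^T x$$
where $x_0 = 1$, so that $\theta_0$ plays the role of the intercept b.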
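The role of the constant x0 = 1 is easiest to see in code. The function below is a minimal sketch; its name `predict` and the convention of passing θ as a plain list with θ0 first are illustrative assumptions, not part of the original text.

```python
def predict(theta, features):
    """Return h_theta(x) = theta^T x for one sample.

    `features` holds x_1..x_n; x_0 = 1 is prepended here so that
    theta[0] acts as the intercept b.
    """
    x = [1.0] + list(features)  # x_0 = 1
    if len(x) != len(theta):
        raise ValueError("theta must have one more entry than features")
    return sum(t * xi for t, xi in zip(theta, x))


# y = 2 + 3*x1 evaluated at x1 = 1.5
print(predict([2.0, 3.0], [1.5]))  # 6.5
```

Prepending x0 = 1 lets the intercept be handled exactly like every other coefficient, which is why the compact form θᵀx works.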
2. What is linear regression used for?
For problems with continuous inputs and outputs, if linear regression fits the data well, the model can be used to predict the output for new inputs.
Conversely, if linear regression fits the data well, the output is strongly linearly correlated with the input, and the same idea can be used to identify redundant information in the input (for example, an input variable that is almost a linear function of another).
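One common way to quantify "strongly linearly correlated" is the Pearson correlation coefficient r, swapped in here for illustration; the text above does not name a specific measure. The sketch below computes r between two input variables. The data (a length recorded both in metres and in centimetres) is hypothetical.

```python
import math


def pearson_r(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


# Hypothetical inputs: the same length in metres and in centimetres
# (with a little rounding noise). |r| close to 1 flags one as redundant.
metres = [1.0, 2.0, 3.0, 4.0, 5.0]
centimetres = [100.0, 201.0, 299.0, 400.0, 500.0]
r = pearson_r(metres, centimetres)
print(f"r = {r:.4f}")
```

A value of |r| near 1 suggests that one of the two variables adds almost no information once the other is in the model.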
3. How do we tell whether the fit is good?
The first idea is to compare the model output with the actual output and choose a way to quantify how far apart they are.
Every sample fed into the model produces its own deviation.
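In linear regression, the overall deviation is measured by the mean of the squared deviations:
$$J(\theta) = \frac{1}{2m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2$$
Here y is the actual output, h is the model output, the superscript (i) indexes the samples, and m is the number of samples. The mean of squares is additionally divided by 2, which cancels the factor 2 that appears when the square is differentiated.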
This function for measuring the deviation is called the cost function. The smaller the deviation, the lower the value of the cost function and the better the fit.
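The cost function J(θ) = 1/(2m) · Σ (h(x⁽ⁱ⁾) − y⁽ⁱ⁾)² translates directly into a loop over the samples. This is a minimal sketch; the function name `cost` and the layout of `xs` (one list of features per sample, without x0) are assumptions made for illustration.

```python
def cost(theta, xs, ys):
    """J(theta) = 1/(2m) * sum_i (h_theta(x^(i)) - y^(i))^2.

    xs is a list of samples, each a list of features x_1..x_n;
    the x_0 = 1 term is handled by adding theta[0] directly.
    """
    m = len(ys)
    total = 0.0
    for features, y in zip(xs, ys):
        h = theta[0] + sum(t * f for t, f in zip(theta[1:], features))
        total += (h - y) ** 2
    return total / (2 * m)


xs = [[1.0], [2.0], [3.0]]
ys = [2.0, 4.0, 6.0]  # exactly y = 2x
print(cost([0.0, 2.0], xs, ys))  # perfect fit: 0.0
print(cost([0.0, 1.0], xs, ys))  # worse fit: 14/6, about 2.333
```

The perfect coefficients give J = 0, and any other choice gives a larger J, which is exactly the property training exploits.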
4. How do we train the model?
The purpose of training is to achieve a good fit, that is, to make the value of the cost function as small as possible.
Here, training means choosing a set of coefficients θ that achieves this goal (once the form of the model is fixed, its parameters are exactly the coefficients θ).
Using calculus, one can set the partial derivatives of J(θ) with respect to θ to zero and solve directly for the extremum.
According to Andrew Ng's course materials, however, when the number of parameters exceeds about 10,000, solving for the extremum directly takes too long, and another method is needed.
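For a single input, setting ∂J/∂θ0 = 0 and ∂J/∂θ1 = 0 gives a closed form: θ1 = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and θ0 = ȳ − θ1·x̄. The sketch below implements just this one-feature case; the function name `fit_closed_form` is illustrative. In the general case the same condition leads to θ = (XᵀX)⁻¹Xᵀy, and inverting that n×n matrix is what becomes expensive when n is large.

```python
def fit_closed_form(x, y):
    """Solve dJ/dtheta_0 = 0 and dJ/dtheta_1 = 0 for y = theta_0 + theta_1*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    theta1 = sxy / sxx                  # slope
    theta0 = mean_y - theta1 * mean_x   # intercept
    return theta0, theta1


print(fit_closed_form([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0]))  # (1.0, 2.0)
```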
5. How to train the model: gradient descent
As the name implies, gradient descent moves downhill along the gradient. Choose a suitable step size α and change θ one step at a time so that the value of the cost function decreases:
$$\theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta)$$
Here θj denotes each individual coefficient and := denotes assignment. All θj must be updated simultaneously: every partial derivative is computed from the old θ before any coefficient is changed.
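Differentiating J(θ) gives ∂J/∂θj = (1/m) · Σ (h(x⁽ⁱ⁾) − y⁽ⁱ⁾) · xj⁽ⁱ⁾, with x0⁽ⁱ⁾ = 1. The sketch below performs one update with this gradient and makes the simultaneous update explicit: all gradients are computed into a list before any coefficient changes. The function name `gradient_step` and the data layout are assumptions for illustration.

```python
def gradient_step(theta, xs, ys, alpha):
    """One simultaneous gradient-descent update of all theta_j.

    Every partial derivative is computed from the old theta before
    any coefficient changes; xs holds samples of features x_1..x_n.
    """
    m = len(ys)
    rows = [[1.0] + list(f) for f in xs]              # prepend x_0 = 1
    errors = [sum(t * v for t, v in zip(theta, row)) - y
              for row, y in zip(rows, ys)]            # h(x^(i)) - y^(i)
    grads = [sum(e * row[j] for e, row in zip(errors, rows)) / m
             for j in range(len(theta))]              # dJ/dtheta_j
    return [t - alpha * g for t, g in zip(theta, grads)]


xs = [[1.0], [2.0], [3.0]]
ys = [2.0, 4.0, 6.0]
theta = [0.0, 0.0]
for _ in range(3):
    theta = gradient_step(theta, xs, ys, alpha=0.1)
print(theta)
```

Updating θ0 in place and then reusing it to compute the gradient for θ1 would mix old and new values, which is exactly what the simultaneous update avoids.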
How many steps should be taken, and how do we know the model is trained well? It is best to watch how J(θ) changes after each update of the θj; once it stops decreasing noticeably, training can stop.
What should θj be at the start? At first, any set of values can be chosen.
How large should the step α be? Try values manually to "find" a suitable one.
Finally, after many iterations, the algorithm arrives at a set of θ that makes the value of the cost function small.
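Trying step sizes by hand amounts to running gradient descent with several α values and watching J(θ). The sketch below does this on a small, made-up dataset (y = 1 + 2x); the helper name `run`, the 200-iteration budget, and the α values are assumptions chosen only to show both behaviours. For this data, α = 0.6 is too large and J grows instead of shrinking.

```python
def run(alpha, x, y, iters=200):
    """Gradient descent on h = t0 + t1*x from (0, 0); return the J history."""
    m = len(y)
    t0 = t1 = 0.0
    history = []
    for _ in range(iters):
        errors = [t0 + t1 * xi - yi for xi, yi in zip(x, y)]
        history.append(sum(e * e for e in errors) / (2 * m))  # J before update
        g0 = sum(errors) / m
        g1 = sum(e * xi for e, xi in zip(errors, x)) / m
        t0, t1 = t0 - alpha * g0, t1 - alpha * g1  # simultaneous update
    return history


x = [0.0, 1.0, 2.0, 3.0]
y = [1.0, 3.0, 5.0, 7.0]
for alpha in (0.01, 0.1, 0.3, 0.6):
    J = run(alpha, x, y)
    trend = "decreasing" if J[-1] < J[0] else "diverging"
    print(f"alpha={alpha}: J {J[0]:.3g} -> {J[-1]:.3g} ({trend})")
```

A small α decreases J slowly, a moderate α decreases it quickly, and a too-large α makes it blow up, so the printed J values are a direct guide for choosing α.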
6. A MATLAB implementation of linear regression
% Input: a single feature
X1 = [0.50,0.75,1.00,1.25,1.50,1.75,1.75,2.00,2.25,2.50,2.75,3.00,3.25,3.50,4.00,4.25,4.50,4.75,5.00,5.50];
X0 = ones(size(X1));                  % x0 = 1 for the intercept term
X = [X0; X1]';                        % m-by-2 design matrix
Y = [10, 22, 13, 43, 20, 22, 33, 50, 62, 48, 55, 75, 62, 73, 81, 76, 64, 82, 90, 93]';
m = length(Y);                        % number of samples
% Gradient descent parameters
alpha = 0.001;   % with this data and update, alpha above roughly 0.009 makes the iteration diverge
theta = [2; 3];  % the choice of starting point seems to have little effect on convergence speed
times = 2000;    % number of iterations
J = zeros(times, 1);
for i = 1:times
    delta = X*theta - Y;                  % residuals h(x) - y
    J(i) = (delta'*delta) / (2*m);        % cost function value before this update
    theta = theta - alpha * (X'*delta);   % gradient descent step (the 1/m factor is folded into alpha)
end
% Observe how the cost function changes with the number of iterations
% plot(J);
% Observe the fit
stem(X1', Y);
hold on;
plot(X1', X*theta);
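For readers without MATLAB, the same computation can be reproduced in plain Python as a rough cross-check. This port keeps the script's data, starting point θ = [2; 3], α = 0.001, 2000 iterations, and the unscaled update θ ← θ − α·Xᵀδ; the cost is recorded as the 1/(2m)-scaled J. With these settings θ should end up close to the least-squares line for the data.

```python
# Same data, starting point, step size and iteration count as the MATLAB script.
X1 = [0.50, 0.75, 1.00, 1.25, 1.50, 1.75, 1.75, 2.00, 2.25, 2.50,
      2.75, 3.00, 3.25, 3.50, 4.00, 4.25, 4.50, 4.75, 5.00, 5.50]
Y = [10, 22, 13, 43, 20, 22, 33, 50, 62, 48,
     55, 75, 62, 73, 81, 76, 64, 82, 90, 93]

alpha = 0.001
theta = [2.0, 3.0]
times = 2000
m = len(Y)
J = []

for _ in range(times):
    # residuals X*theta - Y
    delta = [theta[0] + theta[1] * x - y for x, y in zip(X1, Y)]
    # cost before this update, scaled by 1/(2m)
    J.append(sum(d * d for d in delta) / (2 * m))
    # X' * delta, without a 1/m factor, as in the MATLAB update
    grad = [sum(delta), sum(d * x for d, x in zip(delta, X1))]
    theta = [theta[0] - alpha * grad[0], theta[1] - alpha * grad[1]]

print("theta =", theta)
print("final J =", J[-1])
```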
7. Practical use
When linear regression is used in practice, the input data is preprocessed first. This includes: (1) removing redundant and irrelevant variables; (2) for nonlinear relationships, using polynomial fitting to turn one variable into several (for example x, x², x³); (3) normalizing the range of each input.
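Steps (2) and (3) can be sketched together: expand one variable into polynomial terms, then scale each resulting column. The normalization shown is z-score standardization (zero mean, unit standard deviation), one common choice assumed here; the text does not specify a method. The function names are illustrative.

```python
def polynomial_features(x, degree):
    """Turn one variable x into [x, x^2, ..., x^degree] per sample."""
    return [[xi ** p for p in range(1, degree + 1)] for xi in x]


def standardize(rows):
    """Scale each column to zero mean and unit standard deviation.

    Returns the scaled rows plus the (mean, std) pairs, which must be
    reused on any new input before prediction.
    """
    n = len(rows)
    stats = []
    for col in zip(*rows):
        mu = sum(col) / n
        sd = (sum((v - mu) ** 2 for v in col) / n) ** 0.5
        stats.append((mu, sd if sd > 0 else 1.0))  # guard constant columns
    scaled = [[(v - mu) / sd for v, (mu, sd) in zip(row, stats)]
              for row in rows]
    return scaled, stats


feats = polynomial_features([1.0, 2.0, 3.0], degree=3)
print(feats)  # [[1.0, 1.0, 1.0], [2.0, 4.0, 8.0], [3.0, 9.0, 27.0]]
scaled, stats = standardize(feats)
```

Without scaling, the x³ column spans a much larger range than x, which forces a tiny α for gradient descent; standardizing puts the columns on a comparable scale.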
Summary
Linear regression starts from the assumption that there is a linear relationship between input and output.
It then uses the linear model $h_\theta(x) = \theta^T x$ and evaluates the quality of the fit with the cost function J(θ).
Applying gradient descent to J(θ) approximates a good set of parameters θ, which yields a suitable model h.
Linear regression is therefore used where input and output can be assumed to be linearly related; it maps a set of features to a single value.
In use, perhaps because the model is so simple, it does not feel much like "machine learning". Choosing the inputs also requires a lot of prior knowledge about the specific situation, much like solving an ordinary programming problem.
Only the iterative process of gradient descent gives a bit of a "learning" feel.
Andrew Ng Machine Learning (I): Linear Regression