Regression analysis is a statistical method for analyzing data. It is used to determine whether two or more variables are related, the direction and strength of that relationship, and to build a mathematical model that predicts the variable of interest to the researcher from observed values of the other variables. More specifically, regression analysis shows how much the dependent variable changes when only one independent variable changes.
Regression analysis builds a model of the relationship between the dependent variable Y and the independent variable X; there may be a single independent variable X or several (X1, X2, ..., Xi).
The purpose of regression analysis is to find a function (regression estimator) that is most representative of all observational data, and use this function to represent the relationship between the dependent variable and the independent variable.
Linear regression is the first type of regression analysis that has been rigorously studied and widely used in practical applications.
Linear regression has many practical uses, which fall into two broad categories:
1) If the goal is prediction or mapping, linear regression can be used to fit a predictive model to an observed data set of Y and X values. Once the model is fitted, it can predict a Y value for a new X value for which no paired Y is given.
2) Given a variable Y and variables X1, ..., Xp that may be related to Y, linear regression analysis can be used to quantify the strength of the relationship between Y and each Xj, to identify which Xj are unrelated to Y, and to identify which subsets of the Xj contain redundant information about Y.
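A multiple linear regression model is represented in the following form:
$$y_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_p x_{ip} + \varepsilon_i, \qquad i = 1, \ldots, n$$
We can use matrices to represent it:
$$Y = X\beta + \varepsilon$$
where Y is the n-vector of observations, X is the n x (p+1) matrix whose first column is all ones, β is the vector of p+1 parameters, and ε is the vector of errors.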
Fitting a regression formula means choosing the parameters so that the function model best fits the observed values. In general, the number of observations is much larger than the number of parameters.
The linear regression model is usually fitted by least squares, which finds the best-fitting function for the data by minimizing the sum of squared errors.
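As a rough illustration of a least-squares fit, the sketch below uses NumPy's `np.linalg.lstsq`. The data are hypothetical and generated without noise, so the fit should recover the coefficients that produced them; the names `X_raw`, `true_beta`, and `beta_hat` are chosen here for illustration and do not come from the original article.

```python
import numpy as np

# Hypothetical, noise-free data generated from y = 1 + 2*x1 - 3*x2.
rng = np.random.default_rng(0)
X_raw = rng.normal(size=(50, 2))
true_beta = np.array([1.0, 2.0, -3.0])

# Design matrix: a leading column of ones carries the intercept.
X = np.column_stack([np.ones(len(X_raw)), X_raw])
y = X @ true_beta

# Least squares: choose beta_hat to minimize ||X @ beta - y||^2.
beta_hat, residuals, rank, singular_values = np.linalg.lstsq(X, y, rcond=None)
print(beta_hat)
```

With 50 observations and only 3 parameters, this is the usual overdetermined setting in which least squares is applied.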
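For single-variable linear regression, y = β0 + β1·x, fitting means minimizing the following:
$$Q(\beta_0, \beta_1) = \sum_{i=1}^{n} \left( y_i - \beta_0 - \beta_1 x_i \right)^2$$
The two parameters can be found by taking the partial derivative of Q with respect to each of them, setting both to zero, and solving.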
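For the single-variable case, setting both partial derivatives of the squared-error sum to zero leads to the standard closed-form expressions for the slope and intercept. A minimal sketch follows; the function name `fit_line` and the sample points are made up for illustration.

```python
def fit_line(xs, ys):
    """Closed-form least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = sum((x - mean_x) * (y - mean_y)) / sum((x - mean_x)^2)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    # The fitted line always passes through the point (mean_x, mean_y).
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical points lying exactly on y = 1 + 2x.
intercept, slope = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
print(intercept, slope)  # 1.0 2.0
```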
Gradient descent is an optimization algorithm, often also called the method of steepest descent.
Gradient descent is based on the following observation: if a real-valued function F(x) is defined and differentiable at a point a, then F(x) decreases fastest from a in the direction opposite to the gradient, -∇F(a).
Therefore, we can iterate step by step from an initial value x_0 to approach the optimal value, that is:
$$x_{n+1} = x_n - \gamma \nabla F(x_n), \qquad n \ge 0$$
so that:
$$F(x_0) \ge F(x_1) \ge F(x_2) \ge \cdots$$
Here the following hold:
1) The sequence x_n approaches the optimal value of x
2) When x reaches the optimal value, ∇F(x) = 0
Here γ is the learning rate.
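The iteration can be sketched in a few lines of Python. This is only an illustrative version under stated assumptions: the function name `gradient_descent`, the stopping rule based on the gradient norm, the `tol` and `max_iter` defaults, and the quadratic test function are choices made here, not part of the original article.

```python
import numpy as np

def gradient_descent(grad, x0, gamma, tol=1e-8, max_iter=10_000):
    """Repeat x <- x - gamma * grad(x) until the gradient is nearly zero."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:  # near a stationary point, so stop
            break
        x = x - gamma * g
    return x

# Hypothetical test function F(x) = ||x - c||^2, whose gradient is 2 * (x - c).
c = np.array([3.0, -1.0])
x_min = gradient_descent(lambda x: 2.0 * (x - c), x0=[0.0, 0.0], gamma=0.1)
print(x_min)
```

The loop stops when the gradient is close to zero, which mirrors the condition that the gradient vanishes at the optimal value.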
Example: y=x^2/2-2x
Calculation process:
Take the initial starting point x0 = -4.
(1) First, set two parameters: the learning rate γ = 1.5 and a stopping threshold of 0.01;
(2) Compute the derivative: dy/dx = x - 2
(3) Evaluate the derivative at the current point: y' = -6
(4) Update the current point: x1 = -4 - 1.5 * (-6) = 5.0;
(5) Evaluate the derivative at the current point: y' = 3.0
(6) Update the current point: x2 = 5.0 - 1.5 * (3.0) = 0.5;
(7) Evaluate the derivative at the current point: y' = -1.5
(8) Update the current point: x3 = 0.5 - 1.5 * (-1.5) = 2.75;
(9) Evaluate the derivative at the current point: y' = 0.75
(10) Update the current point: x4 = 2.75 - 1.5 * (0.75) = 1.625;
(11) Evaluate the derivative at the current point: y' = -0.375
(12) Update the current point: x5 = 1.625 - 1.5 * (-0.375) = 2.1875;
...
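The hand calculation can be checked with a short script. It assumes the same function y = x^2/2 - 2x, a starting point of -4, and a learning rate of 1.5; the variable names are chosen here for illustration.

```python
def dy_dx(x):
    """Derivative of y = x**2 / 2 - 2 * x."""
    return x - 2.0

gamma = 1.5   # learning rate used in the worked example
x = -4.0      # initial starting point
trace = [x]
for _ in range(5):
    x = x - gamma * dy_dx(x)  # step against the derivative
    trace.append(x)

print(trace)  # [-4.0, 5.0, 0.5, 2.75, 1.625, 2.1875]
```

The iterates land alternately above and below the minimum at x = 2, and each one is half as far from it as the previous one.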
It can be seen that when γ = 1.5 the search oscillates, overshooting the extremum point on alternating sides while closing in on it. For this function it can be shown that when γ < 1.0 the search approaches the extremum point monotonically and does not oscillate, whereas when γ > 2.0 the search oscillates around the extremum point with growing amplitude and diverges instead of converging.
To ensure convergence, γ should not be too large. But if it is too small, convergence will be very slow. An adaptive adjustment of γ can be used to accelerate convergence without diverging.
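The effect of the learning rate can be seen by running the same update on y = x^2/2 - 2x with three rates. This is a small sketch: the helper name `trajectory`, the step count, and the rates 0.5, 1.5, and 2.5 are picked here as one representative from each regime.

```python
def trajectory(gamma, x0=-4.0, steps=20):
    """Iterates of x <- x - gamma * (x - 2) for y = x**2 / 2 - 2 * x."""
    xs = [x0]
    for _ in range(steps):
        xs.append(xs[-1] - gamma * (xs[-1] - 2.0))
    return xs

for gamma in (0.5, 1.5, 2.5):
    xs = trajectory(gamma)
    # Show the first few iterates and the final distance from the minimum at x = 2.
    print(gamma, [round(x, 3) for x in xs[:5]], "final error:", abs(xs[-1] - 2.0))
```

Each step multiplies the distance to the minimum by (1 - γ), so the error shrinks without changing sign for 0 < γ < 1, shrinks while alternating sign for 1 < γ < 2, and grows for γ > 2.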
References:
https://zh.wikipedia.org/wiki/%E6%9C%80%E5%B0%8F%E4%BA%8C%E4%B9%98%E6%B3%95
https://zh.wikipedia.org/wiki/%E7%B7%9A%E6%80%A7%E5%9B%9E%E6%AD%B8
Copyright notice: This is an original article by the blog author and may not be reproduced without the author's permission.
Linear regression and gradient descent