From a statistical point of view, most machine learning methods can be seen as engineering extensions of statistical classification and regression methods.
The term "regression" originates with the British scientist Francis Galton (1822-1911), who studied the relationship between children's heights and their parents' heights in an 1886 paper [1]. After observing 1087 couples, he found that an adult son's height could be described as height = 33.73 + 0.516 × (average height of the parents), in inches. He noticed that the children's heights were more moderate than their parents': if both parents were very tall, the child tended to be tall, yet shorter than the parents; if both parents were very short, the child tended to be short, yet taller than the parents. This discovery is known as "regression to the mean". It also shows that a regression model is a "soft" model: it describes correlation between quantities rather than causation, and is not as rigorous as a physical model or an exact functional law (such as Kepler's laws of planetary motion).
1. Starting with simple linear regression
To judge whether our weight is appropriate, we first have to measure our height, because, in both physical and aesthetic terms, weight and height are related. If we take the human body to be roughly homogeneous, the relationship between height and weight is approximately linear, so we hope to establish a simple linear regression model:
y = β0 + β1x + ε,
Here x is the current height, ε is the error term, and β0 and β1 are two constants. It is generally assumed that the errors ε at each height are independent and normally distributed with mean 0 and variance σ², written ε ~ i.i.d. N(0, σ²). Because of the error, the weight y at a given height x, written y|x, satisfies y|x ~ N(β0 + β1x, σ²). So, plugging in our own height x, we obtain the mean weight at that height, and with 99.73% confidence the weight at that height should lie in (β0 + β1x − 3σ, β0 + β1x + 3σ). A weight outside this interval can then be called non-standard, but this also requires that σ not be too large.
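The 3σ claim above can be checked by simulation. The following is a minimal sketch; the values of b0, b1, and sigma are made-up illustrative numbers, not fitted from any real data:

```python
import numpy as np

# Simulate the model y = b0 + b1*x + eps at one fixed height.
# b0, b1, sigma are illustrative assumptions, not real estimates.
rng = np.random.default_rng(0)
b0, b1, sigma = -100.0, 1.0, 3.0
x = 170.0                        # a fixed height in cm
n = 100_000
y = b0 + b1 * x + rng.normal(0.0, sigma, size=n)

# Fraction of simulated weights inside the 3-sigma band around the mean.
lo, hi = b0 + b1 * x - 3 * sigma, b0 + b1 * x + 3 * sigma
coverage = np.mean((y >= lo) & (y <= hi))   # close to 0.9973
```

With a large number of draws, `coverage` settles near the theoretical 99.73%.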
Simple linear regression estimates the two constants β0 and β1 from sample data. Which data to use is, of course, a matter of judgment: a thin person who wants to keep their figure would not want overweight samples interfering with the model, while an overweight person trying to control weight might select only the most standard person at each height. It would likewise be unreasonable to use boys' data when modeling the relationship between girls' height and weight. Suppose we choose height-weight data (x1, y1), (x2, y2), ..., (xn, yn) according to our own criteria and use the method of least squares to obtain the estimates of β0 and β1:

β̂1 = Σ(xi − x̄)(yi − ȳ) / Σ(xi − x̄)², β̂0 = ȳ − β̂1x̄.
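These closed-form least squares estimates are easy to compute directly. A sketch on invented height (cm) / weight (kg) pairs, cross-checked against NumPy's own least squares fit:

```python
import numpy as np

# Invented height (cm) / weight (kg) pairs -- not a real survey.
x = np.array([155., 160., 165., 170., 175., 180., 185.])
y = np.array([50., 54., 57., 62., 66., 70., 75.])

x_bar, y_bar = x.mean(), y.mean()
Lxx = np.sum((x - x_bar) ** 2)              # spread of the heights
Lxy = np.sum((x - x_bar) * (y - y_bar))     # cross deviations

beta1_hat = Lxy / Lxx                       # estimated slope
beta0_hat = y_bar - beta1_hat * x_bar       # estimated intercept

# Cross-check: polyfit of degree 1 also minimizes squared residuals.
slope, intercept = np.polyfit(x, y, 1)
```

Both routes give the same line, since `np.polyfit` with degree 1 solves the same least squares problem.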
Since the sample data are selected according to such rules, we may assume there is almost no noise in them, i.e., that σ is not too large, so the interval of standard weights at a given height narrows, making the model more accurate and effective. Applying least squares yields the empirical regression equation, i.e., the straight line

ŷ = β̂0 + β̂1x.

The empirical regression equation gives an estimate of the weight at any sample height xi, and the difference between the true weight and the estimated value is called the residual.
Since residuals can be positive or negative, to accumulate their effect we square each residual and sum them, obtaining the residual sum of squares. The least squares method solves the optimization problem of minimizing this residual sum of squares: it makes the empirical regression line conflict with the sample as a whole as little as possible. The empirical regression line may pass through none of the sample points, but it always passes through the sample mean point (x̄, ȳ).
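Two of the facts just stated are easy to verify numerically: the residuals of the least squares line sum to zero, and the line passes through the mean point (x̄, ȳ). A sketch on invented data:

```python
import numpy as np

# Invented height / weight data for checking properties of the fit.
x = np.array([150., 158., 163., 171., 176., 184.])
y = np.array([48., 53., 55., 63., 65., 72.])

x_bar, y_bar = x.mean(), y.mean()
beta1 = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
beta0 = y_bar - beta1 * x_bar

residuals = y - (beta0 + beta1 * x)
rss = np.sum(residuals ** 2)          # the quantity least squares minimizes

line_at_mean = beta0 + beta1 * x_bar  # the fitted line at x = x_bar
```

Up to floating-point rounding, `residuals.sum()` is zero and `line_at_mean` equals `y_bar`.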
2. Estimating the model parameters
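The estimation proceeds by the standard calculus argument: write the residual sum of squares as a function of the two parameters, set its partial derivatives to zero, and solve. A sketch of that derivation:

```latex
Q(\beta_0, \beta_1) = \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i)^2

\frac{\partial Q}{\partial \beta_0} = -2 \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i) = 0, \qquad
\frac{\partial Q}{\partial \beta_1} = -2 \sum_{i=1}^{n} x_i (y_i - \beta_0 - \beta_1 x_i) = 0

\Rightarrow \quad
\hat\beta_1 = \frac{\sum_{i=1}^{n} (x_i - \bar x)(y_i - \bar y)}{\sum_{i=1}^{n} (x_i - \bar x)^2}, \qquad
\hat\beta_0 = \bar y - \hat\beta_1 \bar x
```

The first normal equation forces the residuals to sum to zero, which is also why the fitted line passes through the mean point.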
3. Properties of the least squares estimator
First, the least squares estimates are linear: the estimates of β0 and β1 are linear combinations of y1, y2, ..., yn. Second, the estimates are unbiased: the expectations of the estimates of β0 and β1 are β0 and β1 themselves.
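Unbiasedness can be checked by a Monte Carlo experiment: generate many samples from a model with known parameters, fit each sample, and average the estimates. A sketch with arbitrarily chosen true parameters:

```python
import numpy as np

# True parameters are arbitrary choices for the simulation.
rng = np.random.default_rng(42)
beta0_true, beta1_true, sigma = 5.0, 0.5, 2.0
x = np.linspace(150, 190, 30)               # fixed design of heights

b0_samples, b1_samples = [], []
for _ in range(5000):
    y = beta0_true + beta1_true * x + rng.normal(0.0, sigma, x.size)
    b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    b0 = y.mean() - b1 * x.mean()
    b0_samples.append(b0)
    b1_samples.append(b1)

b0_mean = np.mean(b0_samples)   # close to beta0_true = 5.0
b1_mean = np.mean(b1_samples)   # close to beta1_true = 0.5
```

The averaged estimates land on the true parameters up to simulation noise, as unbiasedness predicts.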
To assess whether the model is reliable, we also need the variances of the estimates. Writing Lxx = Σ(xi − x̄)², they are

Var(β̂1) = σ²/Lxx, Var(β̂0) = σ²(1/n + x̄²/Lxx).

In sum, for a given x0 the estimate ŷ0 follows the normal distribution

ŷ0 ~ N(β0 + β1x0, σ²(1/n + (x0 − x̄)²/Lxx)).
This shows that within the empirical regression model, the estimates at different xi are all unbiased, but their variances generally differ. Among linear unbiased estimators, least squares has the smallest variance (the Gauss-Markov theorem), so the least squares estimate is the best in this class. From the distribution of ŷ0 we can see that to reduce the variance of the model we should enlarge the sample, i.e., increase n, and at the same time spread the sample heights as widely as possible to increase Lxx. Returning to the weight-height problem: if we select people of the same sex with standard weight-height ratios across a range of different heights, least squares easily estimates the linear relationship between the most standard weight and height for that sex.
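The two levers mentioned above, a larger n and a larger Lxx, can be compared directly through the theoretical slope variance σ²/Lxx. A sketch with made-up designs:

```python
import numpy as np

# Compare Var(beta1_hat) = sigma^2 / Lxx across sampling designs.
sigma = 2.0

def slope_variance(x, sigma):
    """Theoretical variance of the slope estimate for a fixed design x."""
    Lxx = np.sum((x - np.mean(x)) ** 2)
    return sigma ** 2 / Lxx

narrow = np.linspace(168, 172, 20)   # 20 heights bunched around 170 cm
wide = np.linspace(150, 190, 20)     # 20 heights spread over 40 cm
more = np.linspace(150, 190, 80)     # same spread, four times the points

var_narrow = slope_variance(narrow, sigma)
var_wide = slope_variance(wide, sigma)
var_more = slope_variance(more, sigma)
# Spreading the heights and enlarging the sample both shrink the variance.
```

The widely spread design beats the bunched one at the same sample size, and adding more points shrinks the variance further.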
[1] Francis Galton. Regression towards mediocrity in hereditary stature. Journal of the Anthropological Institute, 1886, 15:246–263.
Machine Learning from Statistics (I): Simple Linear Regression