Derivation of the Multivariate Linear Regression Formula and Its Implementation in R

Last Update:2018-08-28 Source: Internet

Author: User


Multivariate linear regression (multiple linear regression) model

In practice, many problems involve a dependent variable that is linearly correlated with several independent variables. Such a relationship can be represented by a multivariate linear regression equation.

To simplify the calculation, we write the model in matrix form:

Y = XW

Assume there are n independent variables (features):
W is the vector of coefficients of the independent variables, with subscripts 0 to n (W0 is the intercept).
X is the matrix of independent variables (a vector for a single sample). Each sample has n features; so that W0 has a matching term, a column of all 1s is inserted as the first column of X.
Y is the dependent variable.
The problem therefore becomes: given the sample matrix X and the corresponding values of the dependent variable Y, find a W that satisfies the equation. In general no single W satisfies the equation exactly for every sample, because real data contain a lot of noise, so we look for the best approximate solution instead. The most common way to find W is the method of least squares.
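Written out for m samples (the sample count m is notation introduced here; the text above uses n for the number of features), the system Y = XW reads:

```latex
\underbrace{\begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_m \end{pmatrix}}_{Y}
=
\underbrace{\begin{pmatrix}
1 & x_{11} & \cdots & x_{1n} \\
1 & x_{21} & \cdots & x_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
1 & x_{m1} & \cdots & x_{mn}
\end{pmatrix}}_{X}
\underbrace{\begin{pmatrix} w_0 \\ w_1 \\ \vdots \\ w_n \end{pmatrix}}_{W}
```

Row i states y_i = w_0 + w_1 x_{i1} + ... + w_n x_{in}. With m > n + 1 noisy samples, these m equations generally cannot all hold at once.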

Least squares

We want the W that comes closest to solving the linear system, where "closest" means that the sum of squared residuals is as small as possible. The residuals and the residual sum of squares are defined as follows:
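In the notation above, writing x_i for the i-th row of X and m for the number of samples (both notations introduced here), the standard definitions are:

```latex
\begin{aligned}
e &= Y - XW, \qquad e_i = y_i - x_i W \\
J(W) &= \sum_{i=1}^{m} e_i^2 = (Y - XW)^\top (Y - XW) \\
\hat{W} &= \arg\min_{W} J(W)
\end{aligned}
```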

The formula above comes from minimizing the sum of squared residuals. The same formula can also be derived from maximum likelihood, starting from the following assumptions about the model:

Equal-variance errors: the error of every sample has expectation 0, and all sample errors share the same variance σ^2.
The errors are independent and normally distributed: e ~ N(0, σ^2).

A brief derivation follows:
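A sketch of the standard argument, using x_i and m as defined for the residuals above and relying on the errors being independent so that the likelihood factorizes:

```latex
\begin{aligned}
p(y_i \mid x_i; W) &= \frac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left(-\frac{(y_i - x_i W)^2}{2\sigma^2}\right) \\
L(W) &= \prod_{i=1}^{m} p(y_i \mid x_i; W) \\
\log L(W) &= -\frac{m}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^{m} (y_i - x_i W)^2
\end{aligned}
```

The first term does not depend on W and the factor 1/(2σ^2) is positive, so maximizing log L(W) is equivalent to minimizing the sum of (y_i - x_i W)^2, which is exactly the residual sum of squares J(W).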

Applying the maximum likelihood principle therefore yields the same objective, and hence the same formula, as least squares.

Least squares Solution

The residual sum of squares is a quadratic, convex function of W, so its stationary point is the minimum. We only need to set the derivative with respect to W to 0 and solve for W.
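Expanding J(W) and differentiating with respect to W gives the normal equation. The final step assumes X^T X is invertible, which holds when the columns of X are linearly independent:

```latex
\begin{aligned}
J(W) &= Y^\top Y - 2 W^\top X^\top Y + W^\top X^\top X W \\
\frac{\partial J}{\partial W} &= -2 X^\top Y + 2 X^\top X W = 2 X^\top (XW - Y) = 0 \\
X^\top X W &= X^\top Y \\
W &= (X^\top X)^{-1} X^\top Y
\end{aligned}
```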

Simulation in R

Here we practice with R. Because the computation uses matrix operations, the same formula covers both the single-variable and the multivariate case. For ease of visualization, we fit a linear regression to the women dataset that ships with R; multivariate linear regression works in essentially the same way.
The women data are as follows:

```
> women
   height weight
1      58    115
2      59    117
3      60    120
4      61    123
5      62    126
6      63    129
7      64    132
8      65    135
9      66    139
10     67    142
11     68    146
12     69    150
13     70    154
14     71    159
15     72    164
```

Weight and height are linearly related, as a scatter plot shows:

Using the formula derived from least squares, we compute w as follows:

```r
> X <- cbind(rep(1, nrow(women)), women$height)  # leading column of 1s pairs with w0
> y <- women$weight                              # dependent variable
> X.T <- t(X)
> w <- solve(X.T %*% X) %*% X.T %*% y            # w = (X^T X)^{-1} X^T y
> w
          [,1]
[1,] -87.51667
[2,]   3.45000
> lm.result <- lm(women$weight ~ women$height)
> lm.result

Call:
lm(formula = women$weight ~ women$height)

Coefficients:
 (Intercept)  women$height  
      -87.52          3.45  
```

In the R code above, w is first computed with our formula, and then R's built-in linear regression function lm fits the same model. The two results agree, so our calculation is correct; lm simply prints the coefficients rounded to two decimal places. Finally, we draw the fitted regression line on the graph to see how well it fits.

The corresponding plotting code is below; R makes this remarkably concise.

```r
> png(file = "chart2.png")
> plot(women$height, women$weight)
> lines(women$height, X %*% w)
> dev.off()
```
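Since the multivariate case works the same way, here is a minimal sketch with two predictors. It assumes R's bundled mtcars dataset, with wt and hp chosen purely for illustration (they are not from the original article). model.matrix() builds X including the column of 1s, and the normal equation X^T X w = X^T y is solved with solve(A, b) rather than by forming the inverse explicitly, which is generally cheaper and numerically safer:

```r
# Normal-equation fit with two predictors, checked against lm().
# Dataset and predictors (mtcars: wt, hp) are illustrative choices.
Xm <- model.matrix(mpg ~ wt + hp, data = mtcars)  # first column is all 1s
ym <- mtcars$mpg

# Solve (X^T X) w = X^T y without computing (X^T X)^{-1}
wm <- solve(crossprod(Xm), crossprod(Xm, ym))

fit <- lm(mpg ~ wt + hp, data = mtcars)
print(cbind(normal_eq = as.numeric(wm), lm = coef(fit)))
```

The two columns should match to many decimal places. For badly conditioned X, a QR-based solver such as qr.solve(Xm, ym) is the safer choice; lm() itself fits via a QR decomposition.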

Gradient Descent method

Besides solving for W with the normal equation, we can obtain W with the most common iterative method, gradient descent. Because the least-squares objective is convex, the minimum that gradient descent finds is the global minimum. The R code is easy to write, but at first I set the step size too large and the iteration did not converge. I assumed the program had a bug, yet after checking it carefully I found nothing wrong; once I reduced the step to a very small value, the iteration converged. Ideally the step size would change during the run, larger at first and smaller later, which is the more principled approach. The fixed value I chose is very small, so convergence takes more than 4 million iterations.
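A standard way to see why the step size matters (this analysis is an addition, not part of the original experiment): for this quadratic loss, fixed-step gradient descent with the gradient t(X) %*% (X %*% W - Y) is stable only when step < 2 / λmax, where λmax is the largest eigenvalue of t(X) %*% X. The sketch below computes that bound, and the condition number λmax / λmin, for the women data:

```r
# Stability bound for fixed-step gradient descent on the least-squares loss.
# With grad = t(X) %*% (X %*% W - Y), the iteration converges only if
# step < 2 / lambda_max, where lambda_max is the largest eigenvalue of t(X) %*% X.
X <- cbind(rep(1, nrow(women)), women$height)  # same design matrix as the article
ev <- eigen(crossprod(X), symmetric = TRUE, only.values = TRUE)$values

max_step <- 2 / max(ev)        # upper limit on a stable step size
cond_num <- max(ev) / min(ev)  # large value -> slow progress along the flattest direction

print(max_step)
print(cond_num)
```

For these data the bound comes out at roughly 3.1e-05, so a step of 0.00003 sits just below it, and the condition number is close to one million, which is consistent with the millions of iterations the loop needs.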

Initialize W to the zero vector, or to a random vector.
Set a maximum number of iterations; this example uses a large value so that the method has time to converge.
Choose the step size: too small and convergence is very slow, too large and the iteration does not converge.
Compute the gradient of the loss function.
Update W(k+1) = W(k) - step * gradient, i.e. move along the negative gradient.
Repeat until the gradient is close to 0.
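In symbols, the loop computes the gradient of half the residual sum of squares; equivalently, the factor 2 from the derivative is absorbed into the step size (written α here). This is why the gradient in the code has no factor 2:

```latex
\begin{aligned}
\nabla_W \tfrac{1}{2} J(W) &= X^\top (XW - Y) \\
W^{(k+1)} &= W^{(k)} - \alpha \, X^\top \left(XW^{(k)} - Y\right)
\end{aligned}
```

The loop stops once the Euclidean norm of X^T (XW^{(k)} - Y) drops below 10^{-3}.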

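```r
X <- cbind(rep(1, nrow(women)), women$height)  # design matrix with intercept column
Y <- women$weight
maxIterNum <- 5000000  # large cap, because convergence is slow
step <- 0.00003        # fixed step size (learning rate)
W <- rep(0, ncol(X))   # start from the zero vector
for (i in 1:maxIterNum) {
    grad <- t(X) %*% (X %*% W - Y)  # gradient of the loss (factor 2 absorbed into step)
    if (sqrt(as.numeric(t(grad) %*% grad)) < 1e-3) {  # stop when the gradient norm is near 0
        print(sprintf('iter times=%d', i))
        break
    }
    W <- W - grad * step  # move along the negative gradient
}
print(W)
```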

Output

```
[1] "iter times=4376771"
> print(W)
           [,1]
[1,] -87.501509
[2,]   3.449768
```
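One common remedy, sketched here as an alternative rather than the author's method, is to center the height column before running gradient descent. With a centered feature, t(X) %*% X becomes diagonal (entries 15 and 280 for these data), so a step of 1 / λmax is stable and the loop converges in a few hundred iterations instead of millions. The names Xc, Wc, h_mean and the iteration cap are illustrative choices. The intercept on the original scale is recovered as b0 - b1 * mean(height):

```r
# Gradient descent on a centered feature (an alternative setup, not the
# article's original one). Centering makes t(Xc) %*% Xc diagonal, so a much
# larger step size is stable.
h_mean <- mean(women$height)
Xc <- cbind(1, women$height - h_mean)  # intercept column + centered height
Y <- women$weight
step <- 1 / max(eigen(crossprod(Xc), symmetric = TRUE, only.values = TRUE)$values)

Wc <- c(0, 0)
iters <- NA
for (i in 1:100000) {
    grad <- t(Xc) %*% (Xc %*% Wc - Y)
    if (sqrt(sum(grad^2)) < 1e-3) {  # same stopping rule as the loop above
        iters <- i
        break
    }
    Wc <- Wc - step * grad
}

# Map back to the original scale: weight = w0 + w1 * height
w1 <- Wc[2]
w0 <- Wc[1] - w1 * h_mean
print(c(iterations = iters, intercept = w0, slope = w1))
```

The recovered intercept and slope agree with the normal-equation result (about -87.52 and 3.45). Centering changes only the parameterization of the line, not the fitted values.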
