Multivariate linear regression: the multiple linear regression model

In many practical problems a dependent variable is linearly related to several independent variables, and we can represent this with a multivariate linear regression equation.
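As a sketch of what such an equation looks like for n independent variables (the symbols x_1, ..., x_n and w_0, ..., w_n are chosen here to match the W and X used in the rest of the article):

$$
y = w_0 + w_1 x_1 + w_2 x_2 + \cdots + w_n x_n
$$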

To simplify the calculation, we write it in matrix form:

Y = XW

- Assume the independent variable has dimension n
- W is the vector of coefficients of the independent variables, with subscripts 0 to n
- X is the independent variable vector or matrix, with dimension n; to line up with W0, X needs a column of all 1s inserted as its first column
- Y is the dependent variable
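Expanded, the matrix form might look like the following, assuming m samples (the sample count m and the double subscripts are notation introduced here for illustration). The leading column of 1s in X is what pairs with w_0:

$$
\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_m \end{bmatrix}
=
\begin{bmatrix}
1 & x_{11} & \cdots & x_{1n} \\
1 & x_{21} & \cdots & x_{2n} \\
\vdots & \vdots & & \vdots \\
1 & x_{m1} & \cdots & x_{mn}
\end{bmatrix}
\begin{bmatrix} w_0 \\ w_1 \\ \vdots \\ w_n \end{bmatrix}
$$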

The problem then becomes: given the sample matrix X and the corresponding values of the dependent variable Y, find a W that satisfies the equation. In general no single W satisfies the equation for every sample, since real samples contain a lot of noise. The most common way to solve for W is the least squares method.

Least squares

We want to find the W that comes closest to solving the linear equations, where "closest" is defined as minimizing the sum of squared residuals. The formulas for the residual and for the residual sum of squares are as follows:
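The original formula images are not available here, so the following is a reconstruction from the definitions above rather than the author's exact notation. It assumes m samples and writes x_i^T for the i-th row of X:

$$
e = Y - XW
$$

$$
\mathrm{RSS}(W) = e^{T} e = (Y - XW)^{T}(Y - XW) = \sum_{i=1}^{m} \left( y_i - x_i^{T} W \right)^2
$$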

The formula above comes from minimizing the sum of squared residuals. There is another line of reasoning: the same formula can be derived with maximum likelihood. First, the assumptions about the model:

- The equal-variance assumption for the error: the error of each sample has expectation 0, and every sample's error has the same variance, denoted σ^2
- The error follows a normal distribution, e ~ N(0, σ^2)

The simple derivation is as follows:
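A sketch of that derivation, under the two assumptions listed above plus the additional assumption (introduced here) that the errors of the m samples are independent; x_i^T denotes the i-th row of X:

$$
y_i = x_i^{T} W + e_i, \qquad e_i \sim N(0, \sigma^2)
$$

$$
L(W) = \prod_{i=1}^{m} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left( -\frac{(y_i - x_i^{T} W)^2}{2\sigma^2} \right)
$$

$$
\ln L(W) = -\frac{m}{2} \ln(2\pi\sigma^2) - \frac{1}{2\sigma^2} \sum_{i=1}^{m} (y_i - x_i^{T} W)^2
$$

The first term does not depend on W, so maximizing ln L(W) over W is the same as minimizing the sum of squared residuals.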

Applying the maximum likelihood principle therefore yields the same formula as least squares.

Least squares solution

The residual sum of squares is a quadratic function of W and is convex, so its extremum is the minimum. We only need to set the derivative to 0 and solve for W.
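Worked out step by step, and assuming X^T X is invertible (an assumption that holds when the columns of X are linearly independent), the derivation might read:

$$
\frac{\partial}{\partial W} (Y - XW)^{T}(Y - XW) = -2 X^{T} (Y - XW) = 0
$$

$$
X^{T} X W = X^{T} Y
$$

$$
W = (X^{T} X)^{-1} X^{T} Y
$$

The last line is the normal-equation formula that the R code in this article evaluates with solve and %*%.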

Simulation

Here we practice with a simulation in R. Because we use matrix operations, the same formula works for both the univariate and the multivariate case. For ease of visualization we run a linear regression on the women dataset that ships with R; multivariate linear regression works in essentially the same way.

The women data is as follows:

```r
> women
   height weight
1      58    115
2      59    117
3      60    120
4      61    123
5      62    126
6      63    129
7      64    132
8      65    135
9      66    139
10     67    142
11     68    146
12     69    150
13     70    154
14     71    159
15     72    164
```

Weight and height appear to be linearly related; a scatter plot makes this visible:

We use the formula derived from least squares to compute w as follows:

```r
> X <- cbind(rep(1, nrow(women)), women$height)  # column of 1s for the intercept
> y <- women$weight
> X.T <- t(X)
> w <- solve(X.T %*% X) %*% X.T %*% y
> w
          [,1]
[1,] -87.51667
[2,]   3.45000
> lm.result <- lm(women$weight ~ women$height)
> lm.result

Call:
lm(formula = women$weight ~ women$height)

Coefficients:
 (Intercept)  women$height
      -87.52          3.45
```

In the R code above, w is first computed with our formula, and then fitted with lm, R's built-in linear regression function. The two agree, so our calculation is correct; lm only prints the coefficients to two decimal places. Next we draw the fitted function on the plot to see how well the regression works.
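To compare the two results at full precision rather than by eye, something like the following could be used. It is a sketch, not the author's code: the names fit and max_diff are introduced here, and it calls solve(A, b) instead of forming the inverse explicitly, which gives the same solution of the normal equations.

```r
# Design matrix with a leading column of 1s, as in the article
X <- cbind(1, women$height)
y <- women$weight

# Solve the normal equations X^T X w = X^T y directly
w <- solve(t(X) %*% X, t(X) %*% y)

# Fit the same model with lm and extract the full-precision coefficients
fit <- lm(weight ~ height, data = women)

# Largest absolute difference between the two coefficient vectors
max_diff <- max(abs(as.numeric(w) - unname(coef(fit))))
print(max_diff)
```

A value of max_diff close to zero indicates that the difference seen in the printed output is only a matter of display precision.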

The R code for the plot is as follows. R really does feel elegant here.

```r
> png(file="chart2.png")
> plot(women$height, women$weight)
> lines(women$height, X %*% w)
> dev.off()
```

Gradient descent method

Besides solving for W with the normal equations, W can also be obtained with gradient descent, the most commonly used method. Because the least squares loss is a convex function, the local minimum found here is the global minimum. The code below is easy to write in R, but at first I set the step size too large and it did not converge. I thought it was a bug in the program, yet however I looked at it I could not find a mistake; once I set the parameter to a very small value, it converged. The step size should really vary, large at first and smaller later, which is the more principled approach. With the small fixed value I chose, it takes about 4.4 million iterations to converge.
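The divergence with a larger step can be checked numerically. For a quadratic loss and a fixed step, gradient descent converges only when the step is below 2 divided by the largest eigenvalue of X^T X (a standard result, not stated in the original article). The sketch below computes that bound for the women data; the names H, lambda_max and step_limit are introduced here for illustration.

```r
# Design matrix with a leading column of 1s, as in the article
X <- cbind(1, women$height)

# Hessian of the loss 0.5 * ||X W - Y||^2
H <- t(X) %*% X

# Largest eigenvalue of the (symmetric) Hessian
lambda_max <- max(eigen(H, symmetric = TRUE)$values)

# A fixed step must stay below this value for the iteration to converge
step_limit <- 2 / lambda_max
print(step_limit)
```

On this data the bound comes out at roughly 3.1e-5, so a step of 0.00003 sits just under the limit. The slow convergence is therefore less about the step being tiny and more about the problem being badly conditioned: the column of 1s and the raw heights have very different scales.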

- Initialize W to an all-zero vector, or to a random vector
- Set the maximum number of iterations; this example uses a large value so that the run can converge
- Set the step size: too small and convergence is very slow, too large and it does not converge
- Compute the gradient of the loss function
- Set W(k+1) = W(k) + step size × negative gradient of the loss function
- Loop until the gradient is close to 0
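The gradient and update steps in the list above can be written out as follows. The symbol α for the step size and the factor 1/2 in the loss are notation chosen here; the 1/2 is what makes the gradient match the expression t(X) %*% (X %*% W - Y) used in the code.

$$
J(W) = \tfrac{1}{2} (XW - Y)^{T} (XW - Y)
$$

$$
\nabla J(W) = X^{T} (XW - Y)
$$

$$
W^{(k+1)} = W^{(k)} - \alpha \, \nabla J\!\left(W^{(k)}\right)
$$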

```r
X <- cbind(rep(1, nrow(women)), women$height)
Y <- women$weight
maxIterNum <- 5000000
step <- 0.00003
W <- rep(0, ncol(X))
for (i in 1:maxIterNum) {
  # gradient of the loss (half the residual sum of squares)
  grad <- t(X) %*% (X %*% W - Y)
  # stop once the norm of the gradient is close to 0
  if (sqrt(as.numeric(t(grad) %*% grad)) < 1e-3) {
    print(sprintf('iter times=%d', i))
    break
  }
  W <- W - grad * step
}
print(W)
```

Output

```
[1] "iter times=4376771"
           [,1]
[1,] -87.501509
[2,]   3.449768
```
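One way to cut the iteration count, shown here only as a sketch and not as part of the original code, is to center height before running the same loop. Centering is a technique swapped in for illustration: it makes the two columns of the design matrix orthogonal, so a much larger step is stable. The names h_mean, Xc, Wc and iters, and the step value 0.003, are assumptions made for this example.

```r
# Center the height column (added preprocessing step, not in the original code)
h_mean <- mean(women$height)
Xc <- cbind(1, women$height - h_mean)
Y <- women$weight

step <- 0.003                    # assumed value, below the limit 2 / max eigenvalue of t(Xc) %*% Xc
Wc <- matrix(0, ncol(Xc), 1)     # start from the zero vector
iters <- NA

for (i in 1:10000) {
  # gradient of the loss (half the residual sum of squares)
  grad <- t(Xc) %*% (Xc %*% Wc - Y)
  # stop once the norm of the gradient is close to 0
  if (sqrt(sum(grad^2)) < 1e-3) {
    iters <- i
    break
  }
  Wc <- Wc - step * grad
}

# Convert back to coefficients for the uncentered height
W <- c(Wc[1] - Wc[2] * h_mean, Wc[2])
print(iters)
print(W)
```

With this setup the loop should stop after a few hundred iterations rather than millions, and W should agree closely with the normal-equation result.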
