Why do we need linear regression?
On the one hand, the relationships linear regression can model go far beyond strictly linear ones. "Linear" in linear regression refers to linearity in the coefficients: through nonlinear transformations of the features, and through extensions such as the generalized linear model, the functional relationship between the output and the features can be highly nonlinear. On the other hand, and more importantly, the easy interpretability of the linear model gives it an irreplaceable position in fields such as physics, economics, and business.
So, how do you implement linear regression in Python?
Because of the widespread popularity of the machine learning library scikit-learn, a common approach is to call linear_model from that library to fit the data. While this brings along the other advantages of a machine learning pipeline, such as data normalization, regularization of model coefficients, and passing the linear model on to a downstream model, it is often not the quickest or easiest route when a data analyst simply needs to determine the regression coefficients (and some basic associated statistics).
Below, I'll cover some faster and more concise approaches, but the amount of information and modeling flexibility they provide varies.
8 Methods for linear regression
Method One: scipy.polyfit() or numpy.polyfit()
This is one of the most basic least-squares polynomial fitting functions. It accepts a dataset and a polynomial of any degree (specified by the user) and returns a set of coefficients that minimizes the squared error. A detailed description can be found in the function's documentation. For simple linear regression, choose a degree-1 polynomial. If you want to fit a higher-order model, you can build polynomial features from the linear feature data and then fit the model.
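Here is a minimal sketch of a degree-1 fit (the synthetic data is illustrative, not from the article):

```python
import numpy as np

# Illustrative synthetic data: y = 2x + 1 plus Gaussian noise
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 100)
y = 2.0 * x + 1.0 + rng.normal(scale=0.5, size=x.size)

# deg=1 makes polyfit perform a simple linear regression;
# coefficients are returned highest degree first
slope, intercept = np.polyfit(x, y, deg=1)
print(slope, intercept)
```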
Method Two: scipy.stats.linregress()
This is a highly specialized linear regression function found in scipy's stats module. Its flexibility is quite limited, however, because it only optimizes a least-squares regression between two sets of measurements; it therefore cannot be used for generalized linear models or multivariate regression. But precisely because of this specialization, it is one of the quickest methods for simple linear regression. Besides the fitted coefficient and intercept, it also returns basic statistics such as the correlation coefficient (from which R² follows) and the standard error.
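A minimal sketch, again on illustrative synthetic data:

```python
import numpy as np
from scipy import stats

# Illustrative synthetic data
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 100)
y = 2.0 * x + 1.0 + rng.normal(scale=0.5, size=x.size)

result = stats.linregress(x, y)
print(result.slope, result.intercept)
print(result.rvalue ** 2)  # R-squared
print(result.stderr)       # standard error of the slope estimate
```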
Method Three: scipy.optimize.curve_fit()
This follows the same idea as polyfit but is more general in nature. This powerful function comes from the scipy.optimize module and can fit any user-defined function to a dataset by least-squares minimization.
For simple linear regression, you only need to write a linear function mx + c and call this estimator on it. It goes without saying that it also handles multivariate regression, and it returns an array of the function parameters that minimize the least-squares measure, along with the associated covariance matrix.
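A minimal sketch with a user-defined linear model (the helper name linear is my own):

```python
import numpy as np
from scipy.optimize import curve_fit

def linear(x, m, c):
    # User-defined model: y = m*x + c
    return m * x + c

# Illustrative synthetic data
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 100)
y = 2.0 * x + 1.0 + rng.normal(scale=0.5, size=x.size)

# popt holds the least-squares estimates of (m, c);
# pcov is the covariance matrix of those estimates
popt, pcov = curve_fit(linear, x, y)
print(popt)
```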
Method Four: numpy.linalg.lstsq()
This is the basic method of computing the least-squares solution to a system of linear equations via matrix factorization, provided by the simple linear algebra module of the NumPy package. It solves the equation Ax = b by finding the vector x that minimizes the Euclidean 2-norm ||b − Ax||².
The equation may have infinitely many solutions, a unique solution, or no solution. If A is square and of full rank, then x is (up to numerical precision) the "exact" solution of the equation.
You can use this method for simple or multivariate linear regression and get back the computed coefficients and residuals. A small trick is to append a column of ones to the X data before calling the function, so that the intercept term is estimated as well. This proves to be one of the faster ways of solving a linear regression problem.
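A minimal sketch, including the column-of-ones trick:

```python
import numpy as np

# Illustrative synthetic data
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 100)
y = 2.0 * x + 1.0 + rng.normal(scale=0.5, size=x.size)

# Append a column of ones so an intercept term is estimated as well
A = np.column_stack([x, np.ones_like(x)])

# Solve A @ coef = y in the least-squares sense
coef, residuals, rank, sv = np.linalg.lstsq(A, y, rcond=None)
slope, intercept = coef
print(slope, intercept)
```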
Method Five: statsmodels.OLS()
Statsmodels is a small Python package that provides classes and functions for estimating many different statistical models, as well as classes and functions for statistical tests and statistical data exploration. Each estimator comes with an extensive list of results, which are checked against existing statistical packages to ensure their correctness.
For linear regression, you can use the OLS (ordinary least squares) function in this package to obtain complete statistics on the estimation process.
A small trick to keep in mind is that you must manually add a constant column to the X data in order to compute the intercept; otherwise only the coefficients are reported by default. Fitting an OLS model yields a complete summary whose results are as rich as those of statistical languages such as R or Julia.
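A minimal sketch using statsmodels' api module:

```python
import numpy as np
import statsmodels.api as sm

# Illustrative synthetic data
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 100)
y = 2.0 * x + 1.0 + rng.normal(scale=0.5, size=x.size)

# add_constant performs the manual intercept trick mentioned above
X = sm.add_constant(x)
results = sm.OLS(y, X).fit()
print(results.params)     # intercept and slope
print(results.summary())  # the full R-style statistical summary
```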
Methods Six and Seven: using matrix inversion to compute the analytic solution
For a well-conditioned linear regression problem (at least where the number of data points exceeds the number of features), a simple closed-form matrix solution exists that minimizes the least-squares cost. It is given by the normal equation: β = (XᵀX)⁻¹XᵀY.
Here are two options:
(a) computing the solution directly with simple multiplication by the matrix inverse;
(b) computing the Moore-Penrose generalized pseudo-inverse of X first and then taking its dot product with Y. Because this second route involves the singular value decomposition (SVD), it is slower, but it works well for ill-conditioned datasets.
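A minimal sketch of both options (the design matrix and data are illustrative):

```python
import numpy as np

# Illustrative synthetic data
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 100)
y = 2.0 * x + 1.0 + rng.normal(scale=0.5, size=x.size)

# Design matrix with an intercept column
X = np.column_stack([np.ones_like(x), x])

# (a) normal equation with an explicit inverse: beta = (X^T X)^(-1) X^T y
beta_inv = np.linalg.inv(X.T @ X) @ X.T @ y

# (b) Moore-Penrose pseudo-inverse (SVD-based, slower but more robust)
beta_pinv = np.linalg.pinv(X) @ y

print(beta_inv)   # intercept, slope
print(beta_pinv)  # should agree with (a) on well-conditioned data
```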
Method Eight: sklearn.linear_model.LinearRegression()
This is the typical approach used by most machine learning engineers and data scientists. Of course, for real-world problems it is usually replaced by cross-validated and regularized algorithms such as lasso regression and ridge regression rather than being used directly, but the model itself sits at the core of those more advanced variants.
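A minimal sketch (scikit-learn expects a 2-D feature matrix):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Illustrative synthetic data; reshape gives the 2-D feature matrix
rng = np.random.default_rng(0)
X = np.linspace(0, 10, 100).reshape(-1, 1)
y = 2.0 * X.ravel() + 1.0 + rng.normal(scale=0.5, size=X.shape[0])

model = LinearRegression()  # fits an intercept by default
model.fit(X, y)
print(model.coef_, model.intercept_)
```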
Efficiency comparison of the 8 methods
As a data scientist, you should always look for accurate yet fast methods or functions to do the data modeling work. If a method is inherently slow, it will create an execution bottleneck on large datasets.
The solution based on the simple matrix inverse turns out to be faster.
As data scientists, we must always explore multiple solutions for analyzing and modeling the same task, and choose the best one for the specific problem at hand.
In this article, we discussed 8 methods of simple linear regression. Most of them can be extended to more general multivariate and polynomial regression modeling.
The objective of this article has been to compare the relative running speed and computational complexity of these methods. We tested each one on synthetic datasets of steadily increasing size (up to 10 million samples) and recorded the computation time for each method.
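A minimal sketch of how such a timing experiment might look (the sizes, repeat count, and helper names are my own assumptions, not the article's exact setup):

```python
import time
import numpy as np

def make_data(n):
    # Illustrative synthetic linear data of size n
    rng = np.random.default_rng(0)
    x = rng.uniform(0.0, 10.0, n)
    y = 2.0 * x + 1.0 + rng.normal(scale=0.5, size=n)
    return x, y

def time_fit(fit_fn, x, y, repeats=3):
    # Return the best wall-clock time of fit_fn over a few repeats
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fit_fn(x, y)
        best = min(best, time.perf_counter() - start)
    return best

for n in (10_000, 100_000, 1_000_000):
    x, y = make_data(n)
    t = time_fit(lambda a, b: np.polyfit(a, b, 1), x, y)
    print(f"polyfit, n={n}: {t:.4f} s")
```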