Learning Ridge Regression with Scikit-learn and pandas

This article uses a worked example to show how to learn ridge regression with Scikit-learn and pandas.

1. Loss function of Ridge regression

In my other article on linear regression, I introduced ridge regression and when it is appropriate to use it. If you are not yet clear on what ridge regression is, read that article first:

Summary of the principle of linear regression

The loss function of ridge regression is:

\(J(\mathbf{\theta}) = \frac{1}{2}(\mathbf{X\theta}-\mathbf{y})^T(\mathbf{X\theta}-\mathbf{y}) + \frac{1}{2}\alpha||\mathbf{\theta}||_2^2\)

where \(\alpha\) is a constant coefficient that needs to be tuned, and \(||\theta||_2\) is the L2 norm.
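For reference, \(||\theta||_2\) is the ordinary Euclidean length of the coefficient vector:

\(||\theta||_2 = \sqrt{\sum_{i}\theta_i^2}\)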

The algorithm's task is: for a given hyperparameter \(\alpha\), find the \(\theta\) that minimizes \(J(\mathbf{\theta})\). This minimization can generally be solved with gradient descent or with the least squares method; Scikit-learn uses least squares.
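Concretely, setting the gradient of \(J(\mathbf{\theta})\) with respect to \(\theta\) to zero gives the closed-form least squares solution \(\mathbf{\theta} = (\mathbf{X}^T\mathbf{X} + \alpha\mathbf{E})^{-1}\mathbf{X}^T\mathbf{y}\), where \(\mathbf{E}\) is the identity matrix. Below is a minimal NumPy sketch of this formula, for illustration only; ridge_solve is a hypothetical helper name, and Scikit-learn does this work for us:

import numpy as np

def ridge_solve(X, y, alpha):
    # closed-form ridge solution: theta = (X^T X + alpha * I)^{-1} X^T y
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ y)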

2. Data acquisition and preprocessing

Here we again run ridge regression on the open machine learning dataset from the UCI repository.

The dataset is described here: http://archive.ics.uci.edu/ml/datasets/Combined+Cycle+Power+Plant

The data can be downloaded here: http://archive.ics.uci.edu/ml/machine-learning-databases/00294/

The file contains data from a combined cycle power plant: 9,568 samples in total, each with 5 columns, namely AT (ambient temperature), V (exhaust vacuum), AP (ambient pressure), RH (relative humidity), and PE (output power). We don't need to dwell on the exact meaning of each column.

Our problem is to obtain a linear relationship in which PE is the sample output and the 4 columns AT/V/AP/RH are the sample features; that is, the goal of learning is to obtain the linear regression model:

\(PE = \theta_0 + \theta_1*AT + \theta_2*V + \theta_3*AP + \theta_4*RH\)

that minimizes the loss function \(J(\mathbf{\theta})\). What we need to learn are the 5 parameters \(\theta_0, \theta_1, \theta_2, \theta_3, \theta_4\).

The download is a compressed file; decompressing it reveals an xlsx file. Open it with Excel, then "Save As" CSV format. We will use this CSV file to run ridge regression.
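Alternatively, if you prefer to skip Excel, pandas can do the same conversion programmatically. A minimal sketch, assuming the xlsx inside the archive is named Folds5x2_pp.xlsx (check your own download) and that an Excel reader engine such as openpyxl or xlrd is installed:

import pandas as pd

# read the decompressed xlsx and re-save it as CSV (file names are illustrative)
sheet = pd.read_excel('.\CCPP\Folds5x2_pp.xlsx')
sheet.to_csv('.\CCPP\ccpp.csv', index=False)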

Note that this dataset is not necessarily a good fit for a ridge regression model; in fact the data is highly linear, and we use ridge regression with its regularization here purely for convenience of demonstration.

3. Reading the data and splitting it into training and test sets

Let's open IPython Notebook and create a new notebook. You could also type everything directly into Python's interactive command line, but the notebook is recommended. The examples and output below were all run in a notebook.

First declare the libraries we will import:

import matplotlib.pyplot as plt
%matplotlib inline
import numpy as np
import pandas as pd
from sklearn import datasets, linear_model

Then read the data with pandas:

# the argument to read_csv is the path to the CSV on your machine; here the
# CSV file is placed in the CCPP directory under the notebook's working directory
data = pd.read_csv('.\CCPP\ccpp.csv')
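Before going further, it is worth a quick sanity check that the file loaded as expected:

# peek at the first few rows; the shape should be (9568, 5)
print(data.head())
print(data.shape)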

We use the 4 columns AT, V, AP, and RH as the sample features, and PE as the sample output:

X = data[['AT', 'V', 'AP', 'RH']]
y = data[['PE']]

The data set is then divided into a training set and a test set:

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)
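To verify the split, we can print the dimensions of each piece; by default train_test_split reserves 25% of the samples for testing, so with 9,568 samples we expect 7,176 training rows and 2,392 test rows:

print(X_train.shape)
print(X_test.shape)
print(y_train.shape)
print(y_test.shape)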


4. Run Ridge regression with Scikit-learn

To run ridge regression, we must first specify the hyperparameter \(\alpha\). You might object: "I have no idea what value to use." Neither do I at this point, so let's just pick one arbitrarily, say 1. Later we will discuss how to use cross-validation to quickly select the optimal hyperparameter from a set of candidate \(\alpha\) values.

from sklearn.linear_model import Ridge
ridge = Ridge(alpha=1)
ridge.fit(X_train, y_train)

After training, you can see what the model parameters are:

print(ridge.coef_)
print(ridge.intercept_)

The output results are as follows:

[[-1.97373209 -0.2323016   0.06935852 -0.15806479]]
[447.05552892]

This means that the model we get is:

\(PE = 447.05552892 - 1.97373209*AT - 0.2323016*V + 0.06935852*AP - 0.15806479*RH\)
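As a side check, we can see how well this alpha=1 model generalizes by scoring it on the held-out test set. A minimal sketch using scikit-learn's mean_squared_error:

from sklearn.metrics import mean_squared_error

# evaluate the alpha=1 model on the test set
y_pred = ridge.predict(X_test)
print(mean_squared_error(y_test, y_pred))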

But we are not finished yet. Why? Because we simply assumed that the hyperparameter \(\alpha\) equals 1; in reality we do not know which value of \(\alpha\) is best. In actual learning we need to select the optimal value from a set of candidate \(\alpha\) values.

Couldn't we just run the program above n times, once for each of n candidate \(\alpha\) values, and compare the results (see the sketch below)? We could, but Scikit-learn provides a cross-validation API for selecting the optimal \(\alpha\), and we will use that API instead.
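For comparison, the brute-force approach just described might look like the following sketch, which simply refits once per candidate \(\alpha\) and scores each model on the held-out test set (RidgeCV in the next section instead cross-validates within the training data):

from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error

# naive hyperparameter search: one refit per candidate alpha
for a in [0.01, 0.1, 0.5, 1, 3, 5, 7, 10]:
    model = Ridge(alpha=a)
    model.fit(X_train, y_train)
    print(a, mean_squared_error(y_test, model.predict(X_test)))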

5. Selecting the ridge regression hyperparameter \(\alpha\) with Scikit-learn

Suppose we want to select the optimal value from among these 10 candidate \(\alpha\) values. The code is as follows:

from sklearn.linear_model import RidgeCV
ridgecv = RidgeCV(alphas=[0.01, 0.1, 0.5, 1, 3, 5, 7, 10, 20, 100])
ridgecv.fit(X_train, y_train)
ridgecv.alpha_

The output is 7.0, which indicates that among the given set of hyperparameters, 7 is the optimal \(\alpha\) value.
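With the optimal \(\alpha\) in hand, we would typically refit the final model. A minimal sketch:

# refit ridge regression using the alpha selected by cross-validation
ridge = Ridge(alpha=ridgecv.alpha_)
ridge.fit(X_train, y_train)

Note that the fitted RidgeCV object is itself refit on the full training set with the best \(\alpha\), so it can also be used directly for prediction.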

6. Studying the relationship between the hyperparameter \(\alpha\) and the regression coefficients \(\theta\) with Scikit-learn

From the loss function of ridge regression we can see that the larger \(\alpha\) is, the heavier the regularization penalty and the smaller the regression coefficients \(\theta\) become, approaching 0. The smaller \(\alpha\) is, i.e., the weaker the regularization term, the closer the regression coefficients \(\theta\) come to those of ordinary linear regression.

Here we use Scikit-learn to study this behavior of ridge regression, based on an example from Scikit-learn's official website. Start a fresh notebook or Python shell to run this example.

First, load the libraries:

import numpy as np
import matplotlib.pyplot as plt
from sklearn import linear_model
%matplotlib inline

We then generate a 10x10 matrix X, representing 10 samples with 10 features each, and a 10x1 vector y representing the sample outputs.

# X is a 10x10 matrix
X = 1. / (np.arange(1, 11) + np.arange(0, 10)[:, np.newaxis])
# y is a 10x1 vector
y = np.ones(10)

Now that we have the data, we prepare the hyperparameter \(\alpha\) values. We prepare 200 of them and run ridge regression once for each. The reason for preparing so many is to plot the relationship between \(\alpha\) and \(\theta\) afterwards.

# 200 alpha values, spaced logarithmically between 10^-10 and 10^-2
n_alphas = 200
alphas = np.logspace(-10, -2, n_alphas)

With these 200 hyperparameter values \(\alpha\), we run 200 iterations, each time solving for the corresponding \(\theta\) (10-dimensional), and save the results for plotting later.

clf = linear_model.Ridge(fit_intercept=False)
coefs = []
# loop 200 times
for a in alphas:
    # set the hyperparameter for this loop
    clf.set_params(alpha=a)
    # run ridge regression for this alpha
    clf.fit(X, y)
    # save the theta for this alpha
    coefs.append(clf.coef_)

Now that we have the 200 values of \(\alpha\) and their corresponding \(\theta\), we can plot them, with \(\alpha\) on the x-axis and the 10 dimensions of \(\theta\) on the y-axis. The code is as follows:

ax = plt.gca()
ax.plot(alphas, coefs)
# put alpha on a log scale to make it easier to plot
ax.set_xscale('log')
# reverse the x-axis so that alpha is displayed from large to small
ax.set_xlim(ax.get_xlim()[::-1])
plt.xlabel('alpha')
plt.ylabel('weights')
plt.title('Ridge coefficients as a function of the regularization')
plt.axis('tight')
plt.show()

The resulting figure is as follows:

[Figure: the 10 coefficient paths of \(\theta\) plotted against \(\alpha\) on a reversed log scale.]

As can be seen from the figure, when \(\alpha\) is relatively large, close to \(10^{-2}\), the 10 dimensions of \(\theta\) shrink toward 0; when \(\alpha\) is relatively small, close to \(10^{-10}\), the 10 dimensions of \(\theta\) approach the coefficients of ordinary linear regression.

(Reprints are welcome; please cite the source. Discussion is welcome: [email protected])
